Distributed-Processing Theory


A task must meet two requirements to work well in a distributed-processing environment: (1) concurrency and (2) a high processing-time-to-data-volume ratio. "Rarely is the speed of an algorithm limited by the speed of the arithmetic units [computers]. Instead, the best algorithms are defined by latency and communications bandwidth" (Hillis 1993, 7).

Concurrency is a high-level property that strongly affects performance on a given distributed-processing task (Ravindran 1993, 66); tasks that perform well have a large amount of inherent parallelism. Communications bandwidth refers to how much data must be exchanged during each part of the task. Successful distributed-processing tasks typically need to transmit only a small amount of data after each small piece of the whole; the relative size of each subtask is often referred to as its "granularity."

The power of simultaneous processing is lost when the communications and synchronization of the task become a larger job than the processing of the task itself.
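
To see how quickly communication overhead can erase the gains, consider a back-of-envelope model (our own illustration, not drawn from the sources cited above) in which the computation divides evenly among the workers but the task also pays a communication cost that grows as the work is chopped into smaller pieces:

#include <stdio.h>

/* Toy model of distributed elapsed time: the computation divides evenly
   among n workers, but the task also pays a communication cost that
   grows as the work is split into smaller pieces.  Illustrative numbers
   only. */
static double elapsed_secs(double compute, double communication, int workers)
{
    return compute / workers + communication;
}

int main(void)
{
    /* Coarse granularity: 100 s of computation, 0.1 s of communication. */
    printf("coarse grain, 10 workers: %6.2f s\n", elapsed_secs(100.0, 0.1, 10));
    /* Fine granularity: the same work in tiny pieces, 50 s of
       communication -- most of the tenfold speedup is gone. */
    printf("fine grain,   10 workers: %6.2f s\n", elapsed_secs(100.0, 50.0, 10));
    return 0;
}

In this model the coarse-grained split finishes in roughly 10 seconds, while the fine-grained split of the identical computation takes 60: the communication has become a larger job than the work itself.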

Many mathematical applications fit well into a distributed-processing model. Fractal generation has a huge processing-time-to-data-volume ratio, and each subrectangle of the total fractal can typically be computed entirely independently of other subrectangles. (See the "Example Plug-ins" section for a fuller description of our fractal generation.) Other mathematical applications are possible, ranging from numerical integration to factorization to Fourier transforms to matrix operations.
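
As a concrete sketch of that independence (our own minimal example, not PowerWeb's actual plug-in code), the escape-time loop below computes one subrectangle of the Mandelbrot set. Because a tile reads nothing but its own coordinates, any number of tiles can be handed to separate machines and the results simply collected:

#include <stdio.h>

#define MAX_ITER 256

/* Escape-time iteration count for the point c = cr + ci*i. */
static int mandel_point(double cr, double ci)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < MAX_ITER && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

/* Fill one w-by-h tile whose top-left pixel is (x0, y0) in the whole
   image.  The tile depends only on its own coordinates, so each tile
   is an independent subtask. */
static void mandel_tile(int *out, int x0, int y0, int w, int h,
                        double x_min, double y_min, double step)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            out[y * w + x] = mandel_point(x_min + (x0 + x) * step,
                                          y_min + (y0 + y) * step);
}

int main(void)
{
    enum { W = 32, H = 16 };
    int tile[W * H];
    /* Compute one tile of a 96x64-pixel view spanning x in [-2, 1). */
    mandel_tile(tile, 32, 24, W, H, -2.0, -1.0, 3.0 / 96.0);
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++)
            putchar(tile[y * W + x] == MAX_ITER ? '#' : '.');
        putchar('\n');
    }
    return 0;
}

The only data that moves is the tile's coordinates on the way out and its iteration counts on the way back, which is what gives the problem its high processing-time-to-data-volume ratio.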

Multicomputer supercomputers have been around for a long time, but they are expensive to buy, to support, and to write software for. "The truth is, you can use a network of workstations to simulate a large-grain parallel computer without ripping out operating system software, breaking and resetting programmers' arms, or hocking that prized collection of Peggy Lee recordings" (Lewis 1994, 5). A distributed-processing approach such as PowerWeb is extremely cost-effective, especially if a network of PCs or Macintoshes is already available.

Paul Fortier suggests that a distributed-processing scheme involves several process management issues: process creation, destruction, scheduling, dispatching, blocking, suspension, wakeup, resumption, adjustment, communication, and identification (Fortier 1986, 144).

Careful consideration showed that some of these are crucial to a distributed-processing foundation while others are optional. Those we felt were crucial (creation, destruction, scheduling, dispatching, communication, and identification) were implemented in the PowerWeb foundation; the optional tasks (blocking, suspension, wakeup, resumption, and adjustment) are left to the plug-ins themselves to implement if required.
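
As one way to picture that split, here is a hypothetical interface sketch in C; the names and signatures are our own illustration, not PowerWeb's published API. The foundation exports the crucial operations, and the optional ones are left for plug-ins to build on top of them:

/* Hypothetical sketch of a foundation's process-management interface.
   All names and signatures are illustrative assumptions, not the
   actual PowerWeb API. */

typedef int TaskID;

/* Crucial operations -- supplied by the foundation. */
TaskID TaskCreate(const char *plugin_name);              /* creation       */
void   TaskDestroy(TaskID task);                         /* destruction    */
void   TaskSchedule(TaskID task, int priority);          /* scheduling     */
void   TaskDispatch(TaskID task, int worker);            /* dispatching    */
int    TaskSend(TaskID to, const void *msg, int length); /* communication  */
TaskID TaskSelf(void);                                   /* identification */

/* Optional operations -- blocking, suspension, wakeup, resumption, and
   adjustment -- are deliberately not declared here; a plug-in that
   needs them can build them from the calls above. */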