Distributed-Processing Theory


A task must meet two requirements to work well in a distributed-processing environment: (1) concurrency and (2) a high processing-time-to-data-volume ratio. "Rarely is the speed of an algorithm limited by the speed of the arithmetic units [computers]. Instead, the best algorithms are defined by latency and communications bandwidth" (Hillis 1993, 7).

Concurrency is a high-level property that strongly affects performance on a given distributed-processing task (Ravindran 1993, 66); tasks that perform well have a large amount of inherent parallelism. Communications bandwidth refers to how much data must be exchanged during each part of the task. Successful distributed-processing tasks typically need to transmit only a small amount of data after each small piece of the whole; the relative size of each subtask is often referred to as its "granularity."

The power of simultaneous processing is lost when the communications and synchronization of the task become a larger job than the processing of the task itself.
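
To see how quickly communication overhead can erase the gains, consider a back-of-envelope model (our own illustration, not drawn from the sources cited above) in which the computation divides evenly among the workers but the task also pays a communication cost that grows as the work is chopped into smaller pieces:

#include <stdio.h>

/* Toy model of distributed elapsed time: the computation divides evenly
   among n workers, but the task also pays a communication cost that
   grows as the work is split into smaller pieces.  Illustrative numbers
   only. */
static double elapsed_secs(double compute, double communication, int workers)
{
    return compute / workers + communication;
}

int main(void)
{
    /* Coarse granularity: 100 s of computation, 0.1 s of communication. */
    printf("coarse grain, 10 workers: %6.2f s\n", elapsed_secs(100.0, 0.1, 10));
    /* Fine granularity: the same work in tiny pieces, 50 s of
       communication -- most of the tenfold speedup is gone. */
    printf("fine grain,   10 workers: %6.2f s\n", elapsed_secs(100.0, 50.0, 10));
    return 0;
}

In this model the coarse-grained split finishes in roughly 10 seconds, while the fine-grained split of the identical computation takes 60: the communication has become a larger job than the work itself.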

Many mathematical applications fit well into a distributed-processing model. Fractal generation has a huge processing-time-to-data-volume ratio, and each subrectangle of the total fractal can typically be computed entirely independently of other subrectangles. (See the "Example Plug-ins" section for a fuller description of our fractal generation.) Other mathematical applications are possible, ranging from numerical integration to factorization to Fourier transforms to matrix operations.
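
As a concrete sketch of that independence (our own minimal example, not PowerWeb's actual plug-in code), the escape-time loop below computes one subrectangle of the Mandelbrot set. Because a tile reads nothing but its own coordinates, any number of tiles can be handed to separate machines and the results simply collected:

#include <stdio.h>

#define MAX_ITER 256

/* Escape-time iteration count for the point c = cr + ci*i. */
static int mandel_point(double cr, double ci)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < MAX_ITER && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

/* Fill one w-by-h tile whose top-left pixel is (x0, y0) in the whole
   image.  The tile depends only on its own coordinates, so each tile
   is an independent subtask. */
static void mandel_tile(int *out, int x0, int y0, int w, int h,
                        double x_min, double y_min, double step)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            out[y * w + x] = mandel_point(x_min + (x0 + x) * step,
                                          y_min + (y0 + y) * step);
}

int main(void)
{
    enum { W = 32, H = 16 };
    int tile[W * H];
    /* Compute one tile of a 96x64-pixel view spanning x in [-2, 1). */
    mandel_tile(tile, 32, 24, W, H, -2.0, -1.0, 3.0 / 96.0);
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++)
            putchar(tile[y * W + x] == MAX_ITER ? '#' : '.');
        putchar('\n');
    }
    return 0;
}

The only data that moves is the tile's coordinates on the way out and its iteration counts on the way back, which is what gives the problem its high processing-time-to-data-volume ratio.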

Multicomputer supercomputers have been around for a long time, but they are expensive to buy, to support, and to write software for. "The truth is, you can use a network of workstations to simulate a large-grain parallel computer without ripping out operating system software, breaking and resetting programmers' arms, or hocking that prized collection of Peggy Lee recordings" (Lewis 1994, 5). A distributed-processing approach such as PowerWeb is extremely cost-effective, especially if a network of PCs or Macintoshes is already available.

Paul Fortier suggests that a distributed-processing scheme involves several process management issues: process creation, destruction, scheduling, dispatching, blocking, suspension, wakeup, resumption, adjustment, communication, and identification (Fortier 1986, 144).

Careful consideration showed that some of these are crucial to a distributed-processing foundation while others are optional. Those we felt were crucial (creation, destruction, scheduling, dispatching, communication, and identification) were implemented in the PowerWeb foundation; the optional tasks (blocking, suspension, wakeup, resumption, and adjustment) are left to the plug-ins themselves to implement if required.
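
As one way to picture that split, here is a hypothetical interface sketch in C; the names and signatures are our own illustration, not PowerWeb's published API. The foundation exports the crucial operations, and the optional ones are left for plug-ins to build on top of them:

/* Hypothetical sketch of a foundation's process-management interface.
   All names and signatures are illustrative assumptions, not the
   actual PowerWeb API. */

typedef int TaskID;

/* Crucial operations -- supplied by the foundation. */
TaskID TaskCreate(const char *plugin_name);              /* creation       */
void   TaskDestroy(TaskID task);                         /* destruction    */
void   TaskSchedule(TaskID task, int priority);          /* scheduling     */
void   TaskDispatch(TaskID task, int worker);            /* dispatching    */
int    TaskSend(TaskID to, const void *msg, int length); /* communication  */
TaskID TaskSelf(void);                                   /* identification */

/* Optional operations -- blocking, suspension, wakeup, resumption, and
   adjustment -- are deliberately not declared here; a plug-in that
   needs them can build them from the calls above. */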