Parallel processing is information processing that uses more than one computer processor simultaneously to perform work on a problem. This should not be confused with multitasking , in which many tasks are performed on a single processor by continuously switching between them, a common practice on serial machines. Computers that are designed for parallel processing are called parallel processors or parallel machines. Many parallel processors can also be described as supercomputers, a more general term applied to the class of computer systems that is most powerful at any given time.
The need to coordinate the work of the individual processors makes parallel processing more complex than processing with a single processor. The processing resources available must be assigned efficiently, and the processors may need to share information as the work progresses. Parallel processors are used for problems that are computationally intensive, that is, they require a very large number of computations. Parallel processing may be appropriate when the problem is very difficult to solve or when it is important to get the results very quickly.
Some examples of problems that may require parallel processing are image processing, molecular modeling, computer animations and simulations, and analysis of models to predict climate and economics. Many problems, such as weather forecasting, can be addressed with increasingly complex models as the computing power is developed to implement them, so there is always an incentive to create newer, more powerful parallel processors. Although early work in parallel processing focused on complex scientific and engineering applications, current uses also include commercial applications such as data mining and risk evaluation in investment portfolios. In some situations, the reliability added by additional processors is also important.
Parallel processors are one of the tools used in high-performance computing, a more general term that refers to a group of activities aimed at developing and applying advanced computers and computer networks. In 1991 a U.S. federal program, the HPCC (High Performance Computing and Communications) program, was introduced to support the development of supercomputing, gigabit networking , and computation-intensive science and engineering applications. The HPCC program uses the term "Grand Challenges" to identify computationally intensive tasks with broad economic and scientific impact that will only be solved with high performance computing technologies.
As of 2002, most of the world's fastest computers (as cited by the Top 500 Supercomputers list which is published on the Internet) are parallel processors. The number of processors may be from fewer than fifty to many thousands. Companies manufacturing these machines include IBM, SGI, Cray, Hitachi, and Fujitsu.
There are many possible ways of designing a parallel computer, and Michael Flynn in 1966 developed a taxonomy for parallel processors, a way of thinking about these alternatives. Flynn categorized them based on two parameters: the stream of instructions (the algorithm ) and the stream of data (the input). The instructions can be carried out one at a time or concurrently, and the data can be processed one at a time or in multiples. In Flynn's scheme, SISD is "Single Instruction stream, Single Data stream," and refers to a traditional sequential computer in which a single operation can be carried out on a single data item at a time.
The two main categories of parallel processor are SIMD and MIMD. In a SIMD (Single Instruction, Multiple Data) machine, many processors operate simultaneously, carrying out the same operation on many different pieces of data. In a MIMD (Multiple Instruction, Multiple Data) machine, the number of processors may be fewer but they are capable of acting independently on different pieces of data. The remaining category, MISD (Multiple Instruction, Single Data) is rarely used since its meaning is not clearly defined. Since it implies that several instructions are being applied to each piece of data, the term is sometimes used to describe a vector supercomputer in which data pass through a pipeline of processors each with a different instruction.
Sometimes Flynn's taxonomy is augmented with the SPMD category, which refers to "Single Program, Multiple Data" to describe a system in which there are many instances of a single type of process, each executing the same code independently. This is equivalent to implementing a SIMD operation in a MIMD machine.
Different parallel architectures have varying strengths and weaknesses depending on the task to be performed. SIMD machines usually have a very large number of simple processors. They are suited to tasks that are massively parallel, in which there are relatively simple operations to be performed on huge amounts of data. Each data stream is assigned to a different processor and the processors operate in lockstep (synchronously ), each performing the same operation on its data at the same time. Processors communicate to exchange data and results, either through a shared memory and shared variables or through messages passed on an interconnection network between processors, each of which has its own local memory. Array processors such as the ICL DAP (Distributed Array Processor) and the Connection Machine, produced by Thinking Machines Corporation, are well-known examples of SIMD machines.
There is greater variety in the design of MIMD machines, which operate asynchronously with each processor under the control of its own program. In general, MIMD machines have fewer, more powerful processors than SIMD machines. They are divided into two classes: multiprocessors (also called tightly coupled machines) which have a shared memory, and multicomputers (or loosely coupled machines) which operate with an interconnection network. Although many of the earlier, high-performance parallel processors used in government research were very large, highly expensive SIMD machines, MIMD machines can be built more easily and cheaply, often with off-the-shelf components. Many different experimental designs have been created and marketed.
Parallel Algorithms and Languages
In a serial algorithm, each step in the algorithm is completed before the next is begun. A parallel algorithm is a sequence of instructions for solving a problem that identifies the parts of the process that can be carried out simultaneously. To write a program for a parallel processor, the programmer must decide how each sub-task in the algorithm will be assigned to processors and in what order the necessary steps will be performed, and at what points communication is necessary. There can be many algorithms for a particular problem, so the programmer needs to identify and implement the one best suited for a particular parallel architecture.
Sometimes "software inertia," the cost of converting programming applications to parallel form, is cited as a barrier to parallel processing. Some systems automatically adapt a serial process for parallel processing but this may not result in the best performance that can be obtained for that problem. In general, it is difficult to write parallel programs that achieve the kind of high-speed performance of which parallel processors are theoretically capable. Programming languages have been developed specifically for use on parallel processors to handle parallel data structures and functions, scheduling, and memory management. In some cases, these are extensions of existing programming languages, such as parallel versions of C and Lisp; in other cases, they are new languages developed for use on specific parallel architectures.
Performance of Parallel Processors
Designers of parallel processors would like to be able to compare the performance of different machines. The usual measure is the number of floating point operations per second, or FLOPS, which is the rate at which a machine can perform single-precision floating point calculations. Many parallel machines are capable of GigaFLOPS (billions of floating point operations per second) performance and newer machines are aimed at performing in the TeraFLOPS range (trillions of floating point operations per second).
As a performance measure, FLOPS refers to the maximum performance of which the machine may be capable. However performance on most problems is dependent on the extent to which they can be parallelized, or broken down into concurrent activities. Another factor is the suitability of the task for a particular parallel architecture. Some problems may be better suited to a SIMD machine, others to some variant of a MIMD machine. A measure of performance relative to the problem to be solved is speedup, which is the ratio of two programs' execution times, usually on a single node and on P nodes of the same computer.
The speedup that can be obtained on a parallel machine depends on the number of processors available, and also on the size of the problem and the way in which it can be broken into parts. Ideally, speedup would be linear so that five processors would give a speedup of five, or ten processors a speedup of ten. However, a number of factors contribute to sub-linear speedup, including additional software overhead in the parallel implementation, load balancing to prevent idle processors, and time spent communicating data between processors. A critical limitation is the amount of parallel activity that the problem allows. Gene Amdahl's Law says that the speedup of a parallel algorithm is limited by the fraction of the problem that must be performed sequentially.
see also Algorithms; Procedural Languages; Supercomputers; Virtual Memory.
Flynn, Michael J. Computer Architecture: Pipelined and Parallel Processor Design. Boston: Jones & Bartlett, 1995.
Parhami, Behrooz. Introduction to Parallel Processing: Algorithms and Architectures. New York: Plenum, 1999.
Roosta, Seyed H. Parallel Processing and Parallel Algorithms: Theory and Computation. New York: Springer, 1999.