Measuring the Performance of Thread Architectures for Parallelizing Communication Subsystems

The demand for high-performance distributed communication systems (such as video-on-demand servers, global personal communication systems, and the underlying communication protocol stacks) is increasing dramatically. Distributing communication services throughout high-speed computer networks offers many potential benefits, including improved performance, scalability, and functionality. In particular, performing communication services in parallel helps to improve performance by increasing processing rates and reducing latency. To improve performance significantly, however, the speed-up obtained from parallel processing must outweigh the major sources of overhead associated with it. On shared-memory multiprocessors, these sources of overhead are primarily context switching, synchronization, and data movement.

Many communication systems (such as the standard layered protocol stacks specified by the TCP/IP and the ISO OSI reference models) decompose naturally into a series of hierarchically-related tasks. A number of thread architectures have been proposed as the basis for parallelizing these types of communication systems. There are two fundamental types of thread architectures: task-based and message-based. Task-based thread architectures are formed by binding one or more processing elements to the layers of tasks in a communication system. In contrast, message-based thread architectures are formed by binding the processing elements to the data messages and control messages that flow through the layers of tasks. Each type of thread architecture incurs different levels of context switching, synchronization, and data movement overhead. This overhead is affected by factors such as the application requirements, OS and hardware platform, and network characteristics.
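The difference between the two bindings can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ACE implementation: the three layer functions, the names `run_task_based` and `run_message_based`, and the worker count are all hypothetical. In the task-based version each layer owns a thread and messages hop between threads through queues; in the message-based version each worker thread carries one message through every layer.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical protocol layers: each one only tags the message it handles.
def link_layer(msg):
    return msg + ">link"

def network_layer(msg):
    return msg + ">net"

def transport_layer(msg):
    return msg + ">transport"

LAYERS = [link_layer, network_layer, transport_layer]
_STOP = object()  # sentinel that tells a layer thread to shut down


def run_task_based(messages, layers=LAYERS):
    """Task-based: one thread per layer, messages hop between threads."""
    # queues[i] feeds layer i; the last queue collects finished messages.
    queues = [queue.Queue() for _ in range(len(layers) + 1)]

    def layer_worker(layer, inbox, outbox):
        while True:
            msg = inbox.get()
            if msg is _STOP:
                outbox.put(_STOP)  # pass the shutdown signal downstream
                return
            outbox.put(layer(msg))  # hand-off to the next layer's thread

    threads = [
        threading.Thread(target=layer_worker,
                         args=(layer, queues[i], queues[i + 1]))
        for i, layer in enumerate(layers)
    ]
    for t in threads:
        t.start()
    for msg in messages:
        queues[0].put(msg)
    queues[0].put(_STOP)

    results = []
    while True:
        out = queues[-1].get()
        if out is _STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


def run_message_based(messages, layers=LAYERS, workers=4):
    """Message-based: each worker thread escorts one message through all layers."""
    def carry(msg):
        for layer in layers:
            msg = layer(msg)  # no hand-off between threads inside the stack
        return msg

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(carry, messages))  # map() keeps input order
```

Both functions produce the same processed messages; what differs is where the overhead falls. The task-based sketch pays a queue hand-off (synchronization and a likely context switch) at every layer boundary, while the message-based sketch avoids those hand-offs but would need locking around any state the layers share. Python threads are used here only to show the structure; they are not a basis for the kind of multiprocessor measurements described on this page.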

My ADAPTIVE Communication Environment (ACE) framework facilitates systematic performance experiments on parallel thread architectures for high-performance communication subsystems. Experiments based on the framework have empirically investigated the different levels of context switching, synchronization, and data movement overhead associated with different thread architectures. This knowledge has led to the development and deployment of distributed communication systems that effectively utilize parallelism to satisfy their performance requirements.

The following papers present the results of parallel protocol processing architecture measurements:

Back to Douglas C. Schmidt's home page.

Last modified 11:34:24 CDT 28 September 2006