|Figure 1. TAO Architectural Components||TAO Features|
Each component in Figure 1 is summarized below. Complete information about TAO's ORB endsystem architecture is available online.
To meet these challenges, we are developing a high-performance I/O subsystem for TAO that is designed to run over Washington University's Gigabit ATM network. The components in TAO's I/O subsystem include (1) a high-speed ATM Port Interconnect Controller (APIC), (2) a real-time I/O subsystem, (3) a Run-Time Scheduler, and (4) an admission controller, shown in Figure 2.
|Figure 2. TAO's Gigabit I/O Subsystem|
To guarantee the QoS needs of applications, TAO requires guarantees from the underlying I/O subsystem. To accomplish this task, we are developing a high-performance network I/O subsystem. The components of this subsystem are described below.
1.1. High-speed Network Adaptor
At the heart of our I/O subsystem is a daisy-chained interconnect comprising one or more ATM Port Interconnect Controller (APIC) chips. The APIC can be used both as an endsystem/network interface and as an I/O interface chip. It sustains an aggregate bi-directional data rate of 2.4 Gbps. In addition, TAO is designed with a layered architecture that can run on conventional embedded platforms linked via QoS-enabled networks (such as IPv6 with RSVP) and real-time interconnects (such as VME backplanes and multi-processor shared memory environments).
1.2. Real-time I/O Subsystem
TAO enhances the STREAMS model provided by Solaris and real-time operating systems like VxWorks. TAO's real-time I/O subsystem minimizes priority inversion and hidden scheduling problems that arise during protocol processing. Our strategy for avoiding priority inversion is to have a pool of kernel threads dedicated to protocol processing and to associate these threads with application threads. The kernel threads run at the same priority as the application threads, which prevents various real-time scheduling hazards such as priority inversion and hidden scheduling.
1.3. Run-Time Scheduler
TAO supports QoS guarantees via a real-time I/O scheduling class that supports periodic real-time applications. Once a thread of the real-time I/O class is admitted by the OS, the scheduler is responsible for (1) computing its priority relative to other threads in the class and (2) dispatching the thread periodically so that its deadlines are met.
TAO's real-time I/O scheduling class allows applications to specify their requirements in a high-level, intuitive manner. For instance, one implementation of TAO's real-time I/O scheduling class is based on rate monotonic scheduling, where applications can specify their processing requirements in terms of computation time C and period P. The OS then grants priorities to real-time I/O threads so that schedulability is guaranteed.
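As a rough illustration of the rate monotonic policy mentioned above, the following Python sketch assigns fixed priorities from the period P alone: the shorter the period, the higher the priority. The names `PeriodicTask` and `rate_monotonic_priorities` and the sample task set are assumptions made for this example; they are not part of TAO or of any OS interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicTask:
    """Hypothetical description of a periodic real-time I/O thread."""
    name: str
    computation_time: float  # C: worst-case computation time per period
    period: float            # P: period, in the same time unit as C


def rate_monotonic_priorities(tasks):
    """Map task name -> priority, where 0 is the highest priority.

    Rate monotonic scheduling orders tasks by period only; ties are
    broken by name so the assignment is deterministic.
    """
    ordered = sorted(tasks, key=lambda t: (t.period, t.name))
    return {task.name: rank for rank, task in enumerate(ordered)}


# Illustrative task set (values are made up for the example).
tasks = [
    PeriodicTask("nav", computation_time=5.0, period=50.0),
    PeriodicTask("sensor", computation_time=2.0, period=10.0),
    PeriodicTask("display", computation_time=10.0, period=100.0),
]
priorities = rate_monotonic_priorities(tasks)
```

Note that the computation time C plays no role in the ordering itself; it matters when deciding whether the resulting priority assignment is schedulable.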
1.4. Admission Controller
To ensure that application QoS requirements can be met, TAO performs admission control for the real-time I/O scheduling class. Admission control allows the OS to either guarantee the specified computation time or to refuse to admit the thread. Admission control is useful for real-time systems with deterministic and statistical QoS requirements.
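One way to picture a deterministic admission test is the classic Liu and Layland utilization bound for rate monotonic scheduling: a set of n periodic threads is schedulable if the sum of C/P does not exceed n(2^(1/n) - 1). The sketch below applies that bound; using it as the admission criterion is an assumption made for illustration, and the class name `AdmissionController` and its methods are hypothetical rather than TAO's actual interface.

```python
def utilization_bound(n):
    """Liu and Layland bound for n periodic tasks under rate monotonic scheduling."""
    return n * (2 ** (1.0 / n) - 1)


class AdmissionController:
    """Hypothetical admission controller: admit a thread or refuse it."""

    def __init__(self):
        self.admitted = []  # list of (computation_time, period) pairs

    def try_admit(self, computation_time, period):
        """Admit the thread only if the bound still holds with it included."""
        if computation_time <= 0 or period <= 0 or computation_time > period:
            return False
        candidate = self.admitted + [(computation_time, period)]
        total = sum(c / p for c, p in candidate)
        if total <= utilization_bound(len(candidate)):
            self.admitted = candidate
            return True
        return False  # refused: the guarantee could not be given


controller = AdmissionController()
decisions = [
    controller.try_admit(2.0, 10.0),
    controller.try_admit(5.0, 50.0),
    controller.try_admit(10.0, 100.0),
    controller.try_admit(50.0, 100.0),  # would push utilization to 0.9
]
```

The bound is sufficient but not necessary, so this test may refuse thread sets that an exact analysis would accept.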
|Figure 3. Components in TAO's ORB Core|
TAO's ORB Core is based on the high-performance, cross-platform ACE components such as Acceptors and Connectors, Reactors, and Tasks. Figure 3 illustrates how the client side of TAO's ORB Core uses ACE's Strategy_Connector to cache connections to the server, thus saving connection setup time and minimizing latencies between invocation and execution. The server side uses ACE's Strategy_Acceptor, in conjunction with Reactor, to accept connections. The acceptor delegates activation of the connection handler to one of ACE's various activation strategies (e.g., a threaded activation strategy shown in the diagram), which turns each handler into an Active Object. The connection handler extracts Inter-ORB Protocol (IOP) requests and hands them to TAO's Object Adapter, which dispatches the request to the servant operation.
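The client-side connection caching described above can be pictured with a small Python sketch. It is not ACE's Strategy_Connector; `CachingConnector`, `fake_open`, and the host name are hypothetical names invented for this example, and a real connector would also handle failures, concurrency, and eviction.

```python
class CachingConnector:
    """Hypothetical connector that reuses one connection per (host, port)."""

    def __init__(self, open_connection):
        self._open = open_connection  # callable that performs the real setup
        self._cache = {}              # (host, port) -> established connection

    def connect(self, host, port):
        key = (host, port)
        connection = self._cache.get(key)
        if connection is None:
            # Pay the setup cost only on the first request to this endpoint.
            connection = self._open(host, port)
            self._cache[key] = connection
        return connection


setups = []  # records every real connection setup, for illustration


def fake_open(host, port):
    """Stand-in for an expensive connection establishment."""
    setups.append((host, port))
    return object()


connector = CachingConnector(fake_open)
first = connector.connect("server.example", 10013)
second = connector.connect("server.example", 10013)
```

Here the second call returns the cached connection, so only one setup is recorded.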
|Figure 4. Layered and De-layered Demultiplexing|
Demultiplexing client requests through all these layers is expensive, particularly when a large number of operations appear in an IDL interface and/or a large number of objects are managed by an ORB. To minimize this overhead, TAO utilizes de-layered demultiplexing (shown in Figure 4(B)). This approach uses demultiplexing keys that the ORB assigns to clients. These keys map client requests to object/operation tuples in O(1) time without requiring any hashing or searching.
To further reduce the number of demultiplexing layers, the APIC can be programmed to directly dispatch client requests associated with ATM virtual circuits. This strategy reduces demultiplexing latency and supports end-to-end QoS on a per-request or per-object basis.
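The idea behind de-layered demultiplexing can be sketched as a flat table indexed directly by an ORB-assigned key, so that a request reaches its object/operation pair with a single array access instead of a per-layer search. The sketch below is an assumption-laden illustration in Python: `DemuxTable`, `Thermometer`, and the key layout are hypothetical and do not reflect TAO's actual key format.

```python
class DemuxTable:
    """Hypothetical flat table: key -> bound (object, operation) pair."""

    def __init__(self):
        self._slots = []  # the list index is the demultiplexing key

    def register(self, servant, operation_name):
        """Bind an operation once and return the key handed to clients."""
        self._slots.append(getattr(servant, operation_name))
        return len(self._slots) - 1

    def dispatch(self, key, *args):
        """Direct indexing: no hashing and no searching on the request path."""
        return self._slots[key](*args)


class Thermometer:
    """Example servant with two operations."""

    def read(self):
        return 21.5

    def reset(self):
        return "reset"


table = DemuxTable()
servant = Thermometer()
read_key = table.register(servant, "read")
reset_key = table.register(servant, "reset")
reading = table.dispatch(read_key)
```

A production ORB would also need to validate keys received from the network before indexing with them.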
|Figure 5. TAO's QoS Specification Model|
An RT_Operation is a scheduled operation, i.e., one that has expressed its scheduled resource requirements to TAO using an RT_Info struct. The attributes in an RT_Info include worst-case execution time, period, importance, and data dependencies. Using scheduling techniques like RMS and analysis approaches like RMA, TAO's Real-Time Scheduling Service determines if there is a feasible schedule based on knowledge of all RT_Info data for all the RT_Operations in an application.
This set of attributes is sufficient for rate monotonic analysis and is used by TAO to (1) validate the feasibility of the schedule and (2) allocate ORB endsystem and network resources. Currently, developers must determine these parameters manually and provide them to TAO's Real-time Scheduling Service through its CORBA interface. We are planning to enhance this process by creating a tool that (1) monitors the execution of applications in example scenarios and (2) automatically extracts the necessary run-time parameters. Likewise, instead of actual execution, simulation results could be used to define RT_Info attributes for each operation.
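To make the feasibility check more concrete, the sketch below runs a standard response-time analysis over a set of RT_Info-like records. It assumes independent operations, deadlines equal to periods, and rate monotonic priorities; it ignores the importance and data dependency attributes. The `RTInfo` dataclass, its field names, and the sample values are hypothetical and are not TAO's IDL definition.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RTInfo:
    """Hypothetical stand-in for the attributes listed above."""
    name: str
    worst_case_execution_time: float
    period: float
    importance: int = 0
    dependencies: tuple = ()


def response_time(task, higher_priority):
    """Worst-case response time, or None if it exceeds the period."""
    r = task.worst_case_execution_time
    while True:
        interference = sum(
            math.ceil(r / hp.period) * hp.worst_case_execution_time
            for hp in higher_priority
        )
        nxt = task.worst_case_execution_time + interference
        if nxt > task.period:
            return None  # deadline (assumed equal to the period) is missed
        if nxt == r:
            return r     # fixed point reached
        r = nxt


def feasible(infos):
    """True if every operation meets its deadline under rate monotonic priorities."""
    ordered = sorted(infos, key=lambda info: info.period)
    return all(
        response_time(task, ordered[:rank]) is not None
        for rank, task in enumerate(ordered)
    )


ok_set = [RTInfo("a", 1.0, 4.0), RTInfo("b", 2.0, 6.0), RTInfo("c", 3.0, 12.0)]
bad_set = [RTInfo("a", 1.0, 4.0), RTInfo("b", 2.0, 6.0), RTInfo("c", 6.0, 12.0)]
```

With these sample values, `feasible(ok_set)` holds while `feasible(bad_set)` does not, because the lowest-priority operation in the second set cannot finish within its period.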
|Figure 6. TAO's Scheduling Service|
A Work_Task is a unit of work that encapsulates application-level processing and communication activity. In some MDA projects, a work task is also called a module or process, but we avoid these terms because of their overloaded usage.
An RT_Task is a work task that has timing constraints. Each RT_Task is considered to be a ``method'' (function) that has its own QoS information specified in terms of the attributes in its run-time information (RT_Info) descriptor. Thus, an application-level object with multiple methods may require multiple RT_Tasks. An RT_Task can contain zero or more threads. An RT_Task that does not contain any of its own threads will only execute in the context of another RT_Task, i.e., it must ``borrow'' another task's (e.g., the Object Adapter's) thread of control to run.
Our analysis, based on RMA, assumes fixed priority, i.e., the operating system does not change the priority of a thread. This contrasts with time-shared OS schedulers, which typically age long-running processes by decreasing their priority over time. Thus, from the point of view of the OS dispatcher, the priority of each thread is constant.
An RT_Info structure specifies an RT_Task's scheduling characteristics (such as computation time and execution period). An RT_Info structure is kept for each RT_Task in the system. By using an RT_Info, the Run-Time Scheduler can be queried for scheduling characteristics (e.g., a task's priority) of the RT_Task. Currently, the data represented in the RT_Info structures are computed off-line, i.e., priorities are statically assigned prior to run-time.
6.1. Presentation Layer Optimizations
The transformation between IDL definitions and the target programming language is automated by TAO's IDL compiler. In addition to reducing the potential for inconsistencies between client stubs and server skeletons, this compiler supports innovative automated optimizations. TAO's IDL compiler is designed to generate and configure multiple strategies for marshaling and demarshaling IDL types. For instance, based on measures of a type's run-time usage, TAO can link in compiled and/or interpreted IDL stubs and skeletons. This flexibility can achieve an optimal tradeoff between interpreted code (which is slow, but compact in size) and compiled code (which is fast, but larger in size).
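The difference between the two stub styles can be illustrated with a toy marshaler in Python. The interpreted version walks a type description at run time, so one generic routine serves every type; the compiled version is specialized for a single type with the layout fixed in advance. The field layout, the names `READING_FIELDS`, `interpreted_marshal`, and `compiled_marshal_reading`, and the unaligned big-endian encoding are all assumptions for this sketch; the encoding is not CDR and the code is not output of TAO's IDL compiler.

```python
import struct

# Hypothetical description of an IDL struct { long id; double value; short flags; }
READING_FIELDS = (("id", "i"), ("value", "d"), ("flags", "h"))


def interpreted_marshal(fields, record):
    """Generic routine: interprets the type description field by field."""
    out = bytearray()
    for name, code in fields:
        out += struct.pack(">" + code, record[name])
    return bytes(out)


# Specialized routine: the whole layout is precompiled into one packer.
_READING = struct.Struct(">idh")


def compiled_marshal_reading(record):
    """Type-specific routine: faster per call, but one is needed per type."""
    return _READING.pack(record["id"], record["value"], record["flags"])


reading = {"id": 7, "value": 21.5, "flags": 3}
interpreted_bytes = interpreted_marshal(READING_FIELDS, reading)
compiled_bytes = compiled_marshal_reading(reading)
```

Both routines produce the same bytes; the tradeoff lies in how much code must be linked in versus how much work is done on each call.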
Likewise, TAO can cache premarshaled application data units (ADUs) that are used repeatedly. Caching improves performance when ADUs are transferred sequentially in ``request chains'' and each ADU varies only slightly from one transmission to the next. In such cases, it is not necessary to marshal the entire ADU every time. This optimization requires that the real-time ORB perform flow analysis of application code to determine which request fields can be cached.
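A minimal sketch of premarshaling, assuming that only a request identifier changes between consecutive ADUs: the message is marshaled once into a buffer, and each later transmission patches just the varying field in place. The header layout, the class name `PremarshaledRequest`, and the operation name are hypothetical and chosen only for illustration; they do not correspond to the IIOP wire format.

```python
import struct

# Assumed layout: 4-byte request id, 16-byte operation name, then the payload.
HEADER = struct.Struct(">I16s")


class PremarshaledRequest:
    """Hypothetical cache of a marshaled ADU whose request id varies."""

    def __init__(self, operation, payload):
        # Marshal everything once; the request id slot starts at zero.
        self._buffer = bytearray(HEADER.pack(0, operation.encode("ascii")) + payload)

    def next_message(self, request_id):
        # Patch only the field that changes; the rest is reused as-is.
        struct.pack_into(">I", self._buffer, 0, request_id)
        return bytes(self._buffer)


request = PremarshaledRequest("push_frame", b"\x01\x02\x03\x04")
message_1 = request.next_message(1)
message_2 = request.next_message(2)
```

The two messages differ only in their first four bytes, which is the portion a flow analysis would have to identify as non-cacheable.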
Although these techniques can significantly reduce marshaling overhead for the common case, applications with strict real-time service requirements often consider only worst-case execution. As a result, the flow analysis optimizations described above can only be employed under certain circumstances, e.g., for applications that can accept statistical real-time service or when the worst-case scenarios are still sufficient to meet deadlines.
6.2. Memory Management Optimizations
Conventional implementations of CORBA suffer from excessive dynamic memory management and data copying overhead. Dynamic memory management is problematic for hard real-time systems because heap fragmentation can yield non-uniform behavior for different message sizes and different workloads. Likewise, excessive data copying throughout an ORB endsystem can significantly lower end-to-end performance.
Existing ORBs use dynamic memory management for several purposes. The ORB Core typically allocates a memory buffer for each incoming client request. IIOP demarshaling engines typically allocate memory to hold the decoded request parameters. Finally, IDL skeletons dynamically allocate and delete copies of client request parameters before and after an upcall, respectively.
These memory management policies are important in some circumstances (e.g., to protect against corrupting internal CORBA buffers when upcalls are made in threaded applications that modify their input). However, they needlessly increase memory and bus overhead for real-time applications, as well as for streaming applications (such as satellite surveillance and teleconferencing) that consume their input immediately without modifying it.
TAO is designed to minimize or eliminate data copying at multiple points. For instance, TAO's ``zero-copy'' buffer management system allows client requests to be sent to and received from the network without incurring any data copying overhead. Moreover, these buffers can be preallocated and passed between various processing stages in the ORB. In addition, Integrated Layer Processing (ILP) can be used to reduce data movement. Because ILP requires maintaining ordering constraints, we are applying compiler techniques (such as control and data flow analysis) to determine where ILP can be employed effectively.
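The combination of preallocated buffers and copy-free hand-off between processing stages can be sketched in Python with `bytearray` and `memoryview`. This is only an analogy for the buffer management described above: `BufferPool`, `receive_into`, and `split_header` are hypothetical names, and `receive_into` merely stands in for a network read that fills a buffer in place.

```python
class BufferPool:
    """Hypothetical pool of fixed-size buffers allocated once, up front."""

    def __init__(self, count, size):
        self._free = [bytearray(size) for _ in range(count)]

    def acquire(self):
        return self._free.pop()  # raises IndexError if the pool is exhausted

    def release(self, buffer):
        self._free.append(buffer)


def receive_into(buffer, data):
    """Stand-in for a network read that fills a preallocated buffer in place."""
    count = len(data)
    memoryview(buffer)[:count] = data
    return count


def split_header(view, header_size=4):
    """Later stages receive views onto the same memory, not copies."""
    return view[:header_size], view[header_size:]


pool = BufferPool(count=2, size=64)
buffer = pool.acquire()
received = receive_into(buffer, b"GIOPbody")
header, body = split_header(memoryview(buffer)[:received])
```

Because `header` and `body` are views, a change made to `buffer` is visible through them, which shows that no stage holds a private copy.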
Using compiler techniques for presentation layer and memory management functionality allows us to optimize performance without modifying standard OMG IDL and CORBA applications.
Last modified 11:34:24 CDT 28 September 2006