Semantic Domain Integration for Embedded and Hybrid Systems


NSF Support

This research is supported by the National Science Foundation, grant CCF-0615341.


Project Description

Society is increasingly dependent on complex mission-critical engineered systems such as supervisory control and data acquisition (SCADA) systems for power grid management, industrial control systems for automated manufacturing, and medical device systems for patient monitoring and treatment. The potential failure of these systems puts safety, health, and economic concerns of vital national interest in jeopardy. To protect these vital interests, it is crucial that these engineered systems maintain rigorous control over physical properties such as power flows, drug release rates, and spatial positioning. Furthermore, controlling these physical properties requires precise control over systemic properties such as communication and computation latencies, sensor sampling rates, and actuation response times. The system software that manages these engineered systems must monitor, evaluate, and respond to changes in the engineered system, while also coordinating computation, communication, sensing, and actuation resources across heterogeneous and time-varying application requirements.

The current lack of integration among the following semantic domains limits the ability of system developers to exert precise control over the physical and systemic properties of engineered systems: (1) application: the application-specific quality of service (QoS) semantics required to ensure that high-fidelity control over the engineered system can be maintained; (2) system software: the QoS semantics of the system software components used to implement the application; (3) resource management: rigorous run-time resource management to ensure that application-level QoS requirements can be met within the context of the system software QoS semantics; and (4) behavioral information: information about the observed run-time behavior of the system software and of the engineered system itself. The problem that this research addresses is the disjointed manner in which these highly interdependent semantic domains are handled in the current state of the art, which limits system developers' ability to address key current challenges, such as preventing (or at least mitigating) cascading power grid failures.

Our research group at Washington University in St. Louis, in collaboration with researchers at the University of Kansas and the University of Missouri-Rolla, is developing a revolutionary approach to system software for complex systems in which application QoS requirements, system software QoS semantics, resource management, and behavioral information are integrated through (1) mutually consistent formal and verifiable models of each semantic domain; (2) novel policies and mechanisms for exerting precise run-time control across semantic domains; and (3) detailed, efficient, and timely collection and dissemination of behavioral information to improve run-time control fidelity. Through rigorous integration of these semantic domains we aim to achieve a much greater correspondence among their respective semantics, and establish a foundation for revolutionary improvements in the state of the art, particularly for increases in system accuracy and reliability in producing desired behaviors (and in preventing undesired behaviors) in complex mission-critical engineered systems.

Project Participants

Investigators

Christopher Gill, Associate Professor of Computer Science and Engineering, is PI for this research at Washington University in St. Louis.

Douglas Niehaus, Associate Professor, Electrical Engineering and Computer Science Department, is PI for collaborative research at the University of Kansas.

Bruce McMillin, Professor, Department of Computer Science, is PI and Mariesa Crow, Professor, Department of Electrical and Computer Engineering, is co-PI for collaborative research at the University of Missouri-Rolla.

Doctoral Students

Doctoral students at Washington University in St. Louis who have contributed to this research are Terry Tidwell and Yuanfang Zhang.


Doctoral Dissertation


Publications


Presentations


Discoveries

Our previous research (supported by NSF CAREER grant CCF-0448562) has shown that detailed models of essential middleware mechanisms can be developed, composed, and, for constrained examples, verified tractably using state-of-the-art timed-automata model checkers. However, to apply this approach to a wider range of real-time systems, particularly those involving more general forms of preemptive concurrency, new techniques are needed to address decidability and tractability concerns.

In particular, our previous work focused on the time and event semantics of middleware mechanisms whose interactions are by design (1) decoupled through explicit interfaces, and (2) temporally predictable at the time scales of interest to real-time middleware design; we were able to leverage both properties in modeling and verifying their semantics. Despite such design for predictability in the middleware mechanisms themselves, many soft real-time systems of interest also exhibit additional temporal variability and event asynchrony, introduced by the application and by the system's operating environment, that must be addressed.

In this current research, our discoveries include four main contributions toward this goal. First, we have developed formal models for the sharing of computational resources among different system activities. Second, we have defined a practical schedulability analysis for generalized sporadic tasks whose timing behavior is subject to multiple inter-arrival constraints. Third, we have developed reconfigurability extensions that allow real-time component middleware to accommodate and enforce the timing semantics of both periodic and aperiodic tasks. Fourth, we have designed and implemented (in MC-ORB) a new real-time middleware concurrency architecture tailored to real-time tasks on modern multi-processor and multi-core platforms.

1. Formal Models for Resource Sharing

In this first area of contribution, we have shown how bounded fair scheduling policies introduce a quasi-cyclic structure in the state space of multi-threaded real-time systems. We also have shown that bounds on the divergence of threads' execution can be determined for that quasi-cyclic structure, which then can be exploited to reduce the complexity of model checking. To demonstrate these features, we conducted a case study of progress-based fair scheduling of multi-threaded processing pipelines, through which we performed a preliminary evaluation of the approach.
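To make the policy concrete, the following minimal C++ sketch (our illustration, not the project's model-checking artifacts) runs a three-stage pipeline under a progress-based fair policy that always advances a least-progressed stage. The divergence between the most- and least-advanced stages then stays bounded (here, by one quantum), which is the kind of structure the model checking exploits.

```cpp
#include <algorithm>
#include <array>
#include <cstdio>

// Minimal sketch of a progress-based fair policy (our illustration, not the
// project's model-checking artifacts). Each pipeline stage has a progress
// counter; the scheduler always advances a least-progressed stage, so the
// divergence (max - min progress) never exceeds one quantum.
int main() {
    std::array<long, 3> progress{0, 0, 0};
    for (int step = 0; step < 30; ++step) {
        ++*std::min_element(progress.begin(), progress.end());
        long lo = *std::min_element(progress.begin(), progress.end());
        long hi = *std::max_element(progress.begin(), progress.end());
        std::printf("step %2d: divergence = %ld\n", step, hi - lo);
    }
    return 0;
}
```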

Further details about this contribution can be found in Washington University Department of Computer Science and Engineering Technical Report WUCSE-2007-34. These results also served as a starting point for a broader investigation of resource scheduling among multiple activities with non-preemptive stochastic intervals of exclusive resource use, which was conducted under support from NSF Cybertrust grant CNS-0716764.

2. Practical Schedulability Analysis for Generalized Sporadic Tasks

Our second area of contribution has focused on defining a practical schedulability analysis for a generalized sporadic task model, in which a particular set of inter-arrival constraints may restrict the execution of each task. Although its calculations remain reasonably straightforward, this model and analysis can evaluate workloads resulting from sophisticated combinations of multiple inter-arrival constraints more accurately than simpler models (e.g., the leaky bucket model), as the following figure illustrates.
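To convey the flavor of the analysis, the following C++ sketch (our simplification; the constraint form and the bound k * ceil(t / w) are assumptions, not the paper's exact analysis) computes a simple upper bound on a task's workload in an interval of length t under multiple constraints of the form "at most k releases in any window of length w", taking the tightest bound across constraints.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Hedged sketch (names and the bound k * ceil(t / w) are our simplification,
// not the paper's exact analysis). Each constraint limits a task to at most
// k releases in any window of length w; an upper bound on releases in an
// interval of length t is the minimum such bound over all constraints.
struct Constraint { double k; double w; };

double max_releases(double t, const std::vector<Constraint>& cs) {
    double best = INFINITY;
    for (const auto& c : cs)
        best = std::min(best, c.k * std::ceil(t / c.w));
    return best;
}

int main() {
    // e.g., at most 1 release per 10 ms AND at most 3 releases per 50 ms
    std::vector<Constraint> cs{{1, 10}, {3, 50}};
    double wcet = 2.0;  // worst-case execution time per job (ms)
    for (double t : {10.0, 50.0, 100.0})
        std::printf("t = %5.1f ms: workload bound = %4.1f ms\n",
                    t, wcet * max_releases(t, cs));
    return 0;
}
```

In this example the 50 ms window dominates over long intervals: at t = 100 ms the combined bound is 6 releases (12 ms of work), whereas the 10 ms constraint alone would allow 10 releases (20 ms), illustrating why combining constraints yields tighter workload estimates.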

The analysis includes schedulability tests both for single-processor cases and for end-to-end cases. For the end-to-end cases, a generalized release guard protocol must be used at run time to enforce subtask separation, as the following figure illustrates.
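The following C++ sketch (our simplification of the release guard idea, not the protocol's exact specification) shows how such a guard can delay a subtask's next release just enough that an "at most k releases in any window of length w" constraint continues to hold over the subtask's release history.

```cpp
#include <algorithm>
#include <cstdio>
#include <deque>

// Hedged sketch of a generalized release guard (our simplification): before
// releasing a subtask's next job, delay it just enough that the constraint
// "at most k releases in any window of length w" still holds.
struct Guard {
    double k, w;              // at most k releases per window of length w
    std::deque<double> hist;  // past release times, in order

    // Earliest time >= arrival at which the next release is permitted:
    // the k-th most recent release must fall outside the window ending
    // at the new release time.
    double earliest(double arrival) {
        if (hist.size() < (size_t)k) return arrival;
        return std::max(arrival, hist[hist.size() - (size_t)k] + w);
    }
    void record(double t) { hist.push_back(t); }
};

int main() {
    Guard g{1, 10, {}};  // classic release guard: min separation of 10 ms
    double arrivals[] = {0, 3, 25, 27};
    for (double a : arrivals) {
        double r = g.earliest(a);
        g.record(r);
        std::printf("arrival %5.1f -> release %5.1f\n", a, r);
    }
    return 0;
}
```

With k = 1 this reduces to the classic release guard (a minimum separation of w between consecutive releases); larger k admits bounded bursts while still limiting the longer-term release rate.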

To evaluate the generalized sporadic task (ST) model alongside the periodic task (PT) model that it generalizes, we ran simulations comparing them in the presence of release guards (RG).

The figures below illustrate the benefit of the generalized sporadic task model. As the figure on the left shows, when one task (T3) is allowed 30% jitter in its execution time, the bounds on the tasks' worst-case response times under the generalized sporadic task model are no worse than those under the periodic task model (and for the lowest-priority task, T1, they improve upon them). As the figure on the right illustrates, when that task's jitter increases to 60%, the sporadic task model affords significantly tighter bounds than the periodic task model for all tasks in our experiment.

The figure on the left below offers a similar comparison of the worst-case response times for the lowest-priority task (which is most significantly impacted by jitter in higher-priority tasks) as task T3's jitter percentage is steadily increased. The bounds remain consistent up to about 33% jitter, beyond which the response-time bound for task T1 grows and eventually becomes unbounded under the periodic task model. The figure on the right below shows a similar impact on the ratio of tasks that miss their deadlines, which also increases with increasing jitter percentage under the periodic task model but not under the generalized sporadic task model.

In addition to the smaller task set used for illustrative purposes in the experiments described above, we also evaluated a larger (ten-task) set that is representative of more complex real-world applications. As the figure on the left below illustrates, similar results were seen when one task was allowed significant jitter (in this case 75%): the generalized sporadic task model was at least as good as, and in many cases much better than, the periodic task model, while for multiple tasks the periodic model could not bound response times at that level of jitter. The figure on the right below shows a similar increase in the deadline miss ratio for the periodic task model (but not the generalized sporadic task model) as that task's jitter increases, for this task set as well.

We also examined the effect of increasing the number of simultaneous releases of a task, the number of separate time constraints on a task, and the number of tasks with multiple time constraints. As the figures below indicate, the deadline miss ratio was consistently better under the generalized sporadic task model than under the periodic task model, for this representative task set.

Further details about this contribution can be found in our ECRTS 2008 paper.

3. Reconfigurable Real-Time Middleware for Periodic and Aperiodic Tasks

Our third area of contribution has been the development of reconfigurability extensions that allow real-time component middleware to accommodate and enforce the timing semantics of both periodic and aperiodic tasks. In the component middleware architecture we have developed, illustrated by the figure below, each subtask of a distributed end-to-end task is hosted on an application processor, and duplicate components are provided as alternatives for balancing load among the application processors. A separate dedicated task manager provides admission control and load balancing services that interact with the application processors to ensure real-time performance of end-to-end tasks.

Within this architecture, admission control (AC), idle resetting (IR), and load balancing (LB) can be performed per-task or per-job, and idle resetting and load balancing also can be disabled (none). Idle resetting per job does not make sense if admission control is performed per task, so combinations involving those contradictory settings are disallowed by our configuration extensions. The figure below shows the different valid combinations of these strategy alternatives that can be configured in our approach.
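The following C++ sketch (the enum and function names are ours, purely illustrative) enumerates this configuration space and rejects the contradictory pairing, yielding the 15 valid combinations used in the experiments described below.

```cpp
#include <cstdio>

// Hedged sketch of the configuration space described above (names are ours):
// AC is performed per-task or per-job; IR and LB can also be disabled. The
// one contradictory pairing, per-job idle resetting with per-task admission
// control, is rejected, leaving 15 valid combinations.
enum class Gran { None, PerTask, PerJob };

bool valid(Gran ac, Gran ir, Gran /*lb*/) {
    if (ac == Gran::None) return false;                   // AC is mandatory
    return !(ac == Gran::PerTask && ir == Gran::PerJob);  // contradictory
}

int main() {
    const char* name[] = {"none", "per-task", "per-job"};
    int count = 0;
    for (Gran ac : {Gran::PerTask, Gran::PerJob})
        for (Gran ir : {Gran::None, Gran::PerTask, Gran::PerJob})
            for (Gran lb : {Gran::None, Gran::PerTask, Gran::PerJob})
                if (valid(ac, ir, lb))
                    std::printf("%2d: AC=%s IR=%s LB=%s\n", ++count,
                                name[(int)ac], name[(int)ir], name[(int)lb]);
    return 0;
}
```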

As the figure below illustrates, we have implemented this architecture atop the CIAO real-time component middleware by defining separate components for the LB, AC, and IR services and interconnecting them (through events pushed to component ports via a federated event channel) within a processing chain for the task. That chain begins with a task effector component, which initiates processing by the first subtask, and continues through the last subtask.

To allow different applications to be configured readily with different strategy options (by answering a few questions posed by a simple textual interface), we have developed a configuration engine that acts as a front-end to CIAO's deployment and configuration engine (DAnCE), as the figures below illustrate.

To validate our approach, and to evaluate the performance, overheads, and benefits resulting from it, we conducted several experiments on a testbed consisting of six single core (Pentium) machines (one for the task manager and the others for the application processors), all connected by a 100 megabit Ethernet switch.

In the first experiment, we generated random task sets, each with 4 aperiodic and 5 periodic tasks. We evaluated all 15 of the reasonable strategy combinations and measured the accepted utilization ratio: the ratio of the utilization of jobs that were accepted and released to run, to the utilization of all requested jobs (some of which were not admitted). The ideal value of this ratio is 1.0 (see the formalization below). The results of this experiment, shown in the figure on the left below, demonstrate the importance of performing both admission control and idle resetting per job, with an additional though less dramatic improvement for performing load balancing per job, for these task sets.
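For concreteness, one plausible formalization of this metric (our notation; the papers may define it differently) is

```latex
\mathrm{AUR} \;=\; \frac{\sum_{j \in \text{accepted}} C_j}{\sum_{j \in \text{requested}} C_j}
```

where C_j is the execution-time demand of job j, and both sums are taken over the same experiment interval (so the interval length cancels out of the utilizations).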

To evaluate conditions in which application loads are distributed disproportionately among servers, we conducted a second experiment in which all tasks were assigned to 3 of the 5 application processors, and duplicates were assigned to the other 2 processors. This configuration ensures that, at least initially, load is concentrated more heavily on the 3 application processors to which the tasks are assigned (though the duplicates allow load to be balanced onto the other 2 application processors if that strategy is configured). The results of this experiment, shown in the figure on the right above, demonstrate the increased importance of load balancing (per task or per job) when loads are imbalanced, in addition to confirming the importance of per-job admission control and idle resetting in these cases as well.

We also measured the overheads of the service components we developed, using the same platform configuration. The figure below illustrates the different paths through the service components, and our measurements showed that even the greatest accumulated delay incurred along any of these paths was less than 2 milliseconds, which is reasonable for many distributed real-time applications.

Further details about this contribution can be found in our ICDCS 2008 and TPDS 2010 papers.

4. Real-time Middleware for Multi-Processor and Multi-Core Platforms

Our fourth area of contribution has been the development of MC-ORB, a new real-time middleware whose concurrency architecture is specifically designed to host real-time tasks atop modern multi-processor and multi-core platforms. Because the existing off-the-shelf operating systems (e.g., Linux) upon which many distributed real-time systems are being built are still evolving to cope with the implications of increasingly commonplace multi-processor and multi-core platforms, we began by measuring the timing of different communication and migration semantics for threads on different cores. Our first experiment simply measured the round-trip delay for signal transmission between two threads running on separate cores. As the figure below illustrates, we found that the round-trip delays were similar regardless of which core initiated the exchange, and that they were distributed around 20 microseconds with a standard deviation of 4 microseconds.
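The following C++ sketch (our reconstruction of this kind of measurement, not the project's harness; Linux-specific, compile with -pthread) pins two threads to different cores with pthread_setaffinity_np and times a condition-variable ping-pong between them.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <pthread.h>
#include <thread>

// Hedged sketch: two threads pinned to different cores ping-pong through a
// condition variable; we report the mean round-trip time.
static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    std::mutex m;
    std::condition_variable cv;
    int turn = 0;  // 0: initiator's turn, 1: responder's turn
    const int rounds = 1000;

    std::thread responder([&] {
        pin_to_core(1);
        std::unique_lock<std::mutex> lk(m);
        for (int i = 0; i < rounds; ++i) {
            cv.wait(lk, [&] { return turn == 1; });
            turn = 0;           // answer the ping
            cv.notify_one();
        }
    });

    pin_to_core(0);
    auto t0 = std::chrono::steady_clock::now();
    {
        std::unique_lock<std::mutex> lk(m);
        for (int i = 0; i < rounds; ++i) {
            turn = 1;           // send the ping
            cv.notify_one();
            cv.wait(lk, [&] { return turn == 0; });
        }
    }
    auto t1 = std::chrono::steady_clock::now();
    responder.join();
    std::printf("mean round trip: %.2f us\n",
                std::chrono::duration<double, std::micro>(t1 - t0).count()
                    / rounds);
    return 0;
}
```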

We also evaluated the offsets between the clock values for the two cores by collecting an intermediate time stamp (y) in between the time stamps immediately before (x) and after (z) the round-trip interval, as the figure on the left below illustrates. The figure in the middle below shows the relative offsets calculated from the perspective of each core, and the figure on the right below shows the absolute offset obtained by reversing the signs of the values collected on the initial core (core 0). The results of these experiments show that the clock frequencies of the cores are similar and that a non-negligible offset (e.g., around 27 microseconds) between the cores may exist, which we continued to measure and account for in our further experiments.
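Assuming the two one-way delays are roughly symmetric (our assumption; the exact calculation in our experiments may differ), the relative offset of the second core's clock can be estimated from each (x, y, z) triple by the midpoint method:

```latex
\text{offset} \;\approx\; y - \frac{x + z}{2}
```

where x and z are taken on the initiating core and y on the other core; averaging over many triples reduces the error introduced by any asymmetry in the delays.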

To evaluate the performance implications of alternative thread allocation and migration strategies, we next conducted experiments to quantify the cost of thread migration under three distinct scenarios. In the first scenario, a currently running thread migrates itself. The results of that experiment, shown in the figure on the left below, revealed a migration delay of between 16 and 45 microseconds. In the second scenario, a thread migrates another currently running thread. The results of that experiment, shown in the figure in the middle below, revealed a migration delay of between 18 and 36 microseconds. In the third scenario, a thread migrates another thread that is asleep. The results of that experiment, shown in the figure on the right below, revealed a migration delay of between 4 and 10 microseconds. These results demonstrate that thread migration delays can be large compared to other mechanism costs, and that if migration is used there is a clear (and sometimes substantial) advantage to having another thread trigger the migration rather than having the thread migrate itself.
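The following C++ sketch (our reconstruction of the first scenario, self-migration; Linux-specific, and the core numbers are illustrative) re-pins the running thread to another core and times the change. The other two scenarios differ only in which thread calls pthread_setaffinity_np and whether the target thread is running or asleep at the time.

```cpp
#include <chrono>
#include <cstdio>
#include <pthread.h>
#include <sched.h>

// Hedged sketch of self-migration timing. For a running thread, the kernel
// migrates it within the pthread_setaffinity_np call, so the measured
// interval captures the call plus the migration itself.
static void pin(pthread_t t, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(t, sizeof(set), &set);
}

int main() {
    pin(pthread_self(), 0);
    while (sched_getcpu() != 0) {}  // ensure we actually start on core 0

    auto t0 = std::chrono::steady_clock::now();
    pin(pthread_self(), 1);         // self-migration to core 1
    while (sched_getcpu() != 1) {}  // spin until running on core 1
    auto t1 = std::chrono::steady_clock::now();

    std::printf("self-migration took %.1f us\n",
                std::chrono::duration<double, std::micro>(t1 - t0).count());
    return 0;
}
```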

We took the results of our experiments into account in designing the architecture of MC-ORB, which is illustrated by the figure on the left below: (1) a multi-core-aware task allocation service is used to reduce the number of thread migrations among cores; (2) that service is invoked exclusively by a single highest-priority manager thread to avoid priority inversions; and (3) task invocation requests are handed off to dedicated thread pools on each core to further reduce thread migrations. In contrast, the leader-followers architecture, which is used by previous-generation object request brokers such as TAO and nORB and is illustrated in the figure on the right below, may incur significantly greater thread migration overhead on multi-core platforms. The leader-followers architecture was designed to reduce thread context switches on uniprocessor platforms; when tasks are distributed among multiple cores, however, the leader thread that receives an invocation request may be running on a different core than the one on which the task to be invoked resides. MC-ORB addresses this problem through the series of inter-thread hand-offs described above, which, while undesirable in a uniprocessor setting (due to the additional context switches they incur), help to improve overall performance in a multi-core setting.
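The hand-off pattern can be sketched in C++ as follows (our simplification, not MC-ORB code; core pinning, as in the earlier sketches, is elided): a manager submits each invocation to a queue drained by a worker dedicated to the core where the target task resides, so the task's thread never migrates.

```cpp
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hedged sketch of per-core hand-off: each worker drains its own queue, so
// jobs run on the worker's core rather than migrating the requesting thread.
class CoreWorker {
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::function<void()>> q;
    bool done = false;
    std::thread t;
public:
    CoreWorker() : t([this] { run(); }) {}
    void submit(std::function<void()> job) {
        { std::lock_guard<std::mutex> lk(m); q.push(std::move(job)); }
        cv.notify_one();
    }
    void stop() {  // drain remaining jobs, then join
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
        t.join();
    }
private:
    void run() {
        std::unique_lock<std::mutex> lk(m);
        for (;;) {
            cv.wait(lk, [this] { return done || !q.empty(); });
            if (q.empty()) return;
            auto job = std::move(q.front()); q.pop();
            lk.unlock(); job(); lk.lock();  // run the task on this core
        }
    }
};

int main() {
    std::vector<CoreWorker> pools(2);  // one worker per core
    for (int i = 0; i < 4; ++i)
        pools[i % 2].submit([i] { std::printf("job %d\n", i); });
    for (auto& p : pools) p.stop();
    return 0;
}
```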

Further details about this contribution can be found in our RTCSA 2009 paper, and also on the MC-ORB web page, where links to the open-source download and installation instructions can also be found.