Robert Glaubius contributed to this research as a post-doctoral researcher.
Doctoral students at Washington University in St. Louis who have contributed to this research include Venkita Subramonian, Terry Tidwell, and Huang-Ming Huang.
Colleagues at other universities with whom we have collaborated based on the results of this research include Cesar Sanchez, Henny Sipma, and Zohar Manna at Stanford University.
The architecture diagram below depicts the different models we have developed, and how they cover different parts of the ACE architecture. In this figure, unshaded rectangles in the diagram represent models expressed directly as timed automata, while shaded rectangles represent models expressed as data structures that are shared among the automata:
However, simply maintaining a current thread identifier and continuing the execution of the automaton whose thread is current may accidentally over-constrain the model by eliminating concurrent execution that is possible in the actual system. The left side of the following figure illustrates this problem. We therefore refined the run-to-completion semantics further by adding a low priority automaton that only runs when the automata modeling threads are idle. As the right side of the following figure illustrates, this "idle catcher" automaton sets the current thread id back to nil, ensuring non-deterministic choice of the next thread to be run after an idle interval.
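The run-to-completion refinement above can be sketched in illustrative Python (not the timed-automata notation used in the actual models; class and variable names here are ours):

```python
import random

# Hypothetical sketch of the "idle catcher" refinement described above.
# Thread automata only advance while they hold the current-thread id;
# once every automaton holding it is idle, the low-priority idle catcher
# resets the id to nil (None), so the next thread to run is again chosen
# non-deterministically.

NIL = None

class Model:
    def __init__(self, thread_ids):
        self.thread_ids = list(thread_ids)
        self.current = NIL  # shared current-thread identifier

    def step(self, runnable):
        """Advance one step given the set of runnable thread ids."""
        if self.current in runnable:
            return self.current          # run-to-completion: keep running
        # current thread is idle: the idle catcher fires
        self.current = NIL
        if runnable:
            # non-deterministic choice, modeled here as a random pick
            self.current = random.choice(sorted(runnable))
        return self.current

m = Model(["A", "B"])
m.current = "A"
assert m.step({"A", "B"}) == "A"   # A still runnable: stays current
assert m.step({"B"}) == "B"        # A idle: catcher resets id, B chosen
assert m.step(set()) is NIL        # everyone idle: id back to nil
```

Resetting the id to nil (rather than handing it directly to another thread) is what preserves the non-deterministic interleavings the model checker must explore.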
As a representative example of verification scenarios for ACE-based applications, we used our models to verify a version of the Gateway example that is distributed with ACE, in which notification of event delivery establishes a potential for deadlock. We used model checking in the IF toolkit to explore the effects of alternative event dispatching strategies for the ACE Reactor ("wait-on-connection" in which remote method invocations block in the calling object until the reply is received, versus "wait-on-reactor" in which the thread of execution making the call is returned to the reactor and the reply is handled asynchronously) in that example.
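The two dispatching strategies can be contrasted with a small sketch (illustrative Python only, not ACE code; all names here are ours):

```python
# "wait-on-connection" blocks the calling thread until the reply arrives;
# "wait-on-reactor" returns the thread to the event loop and handles the
# reply later through a callback.

class Peer:
    def call(self, request):
        return f"reply({request})"   # stand-in for a remote invocation

def wait_on_connection(peer, request):
    # The caller blocks here; no other events are dispatched by this
    # thread until the reply arrives -- the source of the deadlock hazard.
    return peer.call(request)

class Reactor:
    def __init__(self):
        self.pending = []            # (callback, reply) pairs

    def wait_on_reactor(self, peer, request, on_reply):
        # The caller registers a callback and returns immediately, so the
        # reactor can keep dispatching other events in the meantime.
        self.pending.append((on_reply, peer.call(request)))

    def run_pending(self):
        while self.pending:
            cb, reply = self.pending.pop(0)
            cb(reply)

peer = Peer()
assert wait_on_connection(peer, "ping") == "reply(ping)"

got = []
r = Reactor()
r.wait_on_reactor(peer, "ping", got.append)
assert got == []                     # reply not yet handled
r.run_pending()
assert got == ["reply(ping)"]        # handled asynchronously
```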
The following figure illustrates the results of those studies, which demonstrate the use of our models in detecting timing and concurrency hazards in non-trivial software systems. The left side of the figure shows an event sequence in which a deadlock occurs with the "wait-on-connection" strategy, which was detected automatically by the IF model checker using our models. With the "wait-on-reactor" strategy, model checking verified deadlock freedom, as one representative trace shown on the right side of the figure below illustrates.
Because of the lower complexity (and accordingly lower overhead) of the "wait-on-connection" strategy, we have been collaborating with Cesar Sanchez, Henny Sipma, and Zohar Manna at Stanford University who are working to develop a suite of deadlock avoidance protocols that can overcome the potential for deadlock hazards noted above for the "wait-on-connection" strategy. We have implemented the simplest of these protocols, called the BASIC-P protocol, in ACE's thread pool reactor, and have developed timed automata models of the protocol operating within the reactor. We are also working to implement and model the other protocols in this family to offer applications having different characteristics and requirements a range of flexible and verifiable options. The following figure shows a comparison of the execution traces produced by checking the models, and by running the same example using our implementation in ACE.
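The admission test at the heart of a BASIC-P-style protocol can be sketched as follows. This is our simplified reading of the idea, not the ACE implementation: each incoming call carries an annotation `a` bounding the nested calls it may make, and a pool of `T` threads admits it only while `busy + a <= T`, otherwise the call waits:

```python
import threading

# Simplified sketch of a BASIC-P-style admission test (illustrative only).
class BasicPPool:
    def __init__(self, size):
        self.size = size             # T: total threads in the pool
        self.busy = 0                # t: threads currently in use
        self.cond = threading.Condition()

    def acquire(self, annotation, timeout=None):
        # Admit only if enough threads remain to cover this call's
        # potential nested calls; otherwise block (avoiding deadlock
        # rather than detecting it after the fact).
        with self.cond:
            ok = self.cond.wait_for(
                lambda: self.busy + annotation <= self.size, timeout)
            if ok:
                self.busy += 1
            return ok

    def release(self):
        with self.cond:
            self.busy -= 1
            self.cond.notify_all()

pool = BasicPPool(size=3)
assert pool.acquire(annotation=2)                     # 0 + 2 <= 3: admitted
assert not pool.acquire(annotation=3, timeout=0.01)   # 1 + 3 > 3: refused
pool.release()
assert pool.acquire(annotation=3, timeout=0.01)       # 0 + 3 <= 3: admitted
```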
Preliminary results developed under this grant (published in Washington University Department of Computer Science and Engineering Technical Reports WUCSE-2006-34 and WUCSE-2007-34) were significantly expanded with support from NSF grant CNS-0716764 (Cybertrust), titled CT-ISG: Collaborative Research: Non-bypassable Kernel Services for Execution Security. That work showed how non-preemptive scheduling of a resource between tasks whose durations of use are stochastic can, through modeling as a Markov Decision Process, be performed optimally with respect to a value measure that initially represented simple proportional sharing of the resource and was then generalized to a wider range of utility-based reward functions.
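A toy version of this MDP formulation can be sketched as follows (the numbers and reward are illustrative, not those of the cited reports): the state tracks each task's cumulative usage, actions pick which task runs next non-preemptively, run durations are stochastic, and the reward favors proportional (50/50) sharing:

```python
import itertools

DURATIONS = {1: 0.5, 2: 0.5}   # each non-preemptive run lasts 1 or 2 quanta
CAP = 6                        # truncate usage to keep the state space finite

def reward(u):
    return -abs(u[0] - u[1])   # 0 when the two tasks' shares are equal

def step(u, a, d):
    v = list(u)
    v[a] = min(v[a] + d, CAP)
    return tuple(v)

def value_iteration(horizon):
    states = list(itertools.product(range(CAP + 1), repeat=2))
    V = {s: 0.0 for s in states}
    policy = {}
    for _ in range(horizon):
        newV = {}
        for s in states:
            # expected value of running task a next, over random durations
            q = {a: sum(p * (reward(step(s, a, d)) + V[step(s, a, d)])
                        for d, p in DURATIONS.items())
                 for a in (0, 1)}
            policy[s] = max(q, key=q.get)
            newV[s] = q[policy[s]]
        V = newV
    return V, policy

V, policy = value_iteration(horizon=4)
# When one task is ahead, the value-optimal action is to run the other.
assert policy[(3, 0)] == 1
assert policy[(0, 3)] == 0
```

The real models are far larger, which is exactly the time and space complexity problem addressed in the final phase of the project described below.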
Although the concluding results of that research (published at RTSS 2010) established a good foundation for on-line use of time-utility-based value-optimal scheduling policies in real systems, they also suffered from significant time and space complexity in generating and storing the scheduling policies themselves. Several additional advances, pursued as the final phase of this project, were needed to make those techniques more applicable to real-world systems.
Pseudo 0 Heuristic
We evaluated the different heuristics under varying loads and with differently shaped utility curves. The figure on the left below illustrates "downward step" curves, in which a particular value is accrued by completion prior to a termination time, after which no value is accrued. The middle figure below illustrates "linear drop" utility curves, which give a fixed value for completion before a critical point and then linearly decreasing value from the critical point until a termination time, after which no value is accrued. The figure on the right below illustrates "target sensitive" utility curves, in which value increases linearly until a critical point, decreases linearly until the termination time, and is zero after that. In addition to these "soft real-time" utility curves, we also evaluated "hard real-time" versions in which, instead of remaining at zero after the termination time, the utilities were (randomly) assigned a large negative value.
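The three soft real-time curve shapes can be written down directly (parameter names are ours: `v` is the peak value, `c` the critical point, `t` the termination time; for the target-sensitive shape we take the value to rise from zero, one plausible reading of the description):

```python
def downward_step(finish, v, t):
    # full value before the termination time, nothing after
    return v if finish <= t else 0.0

def linear_drop(finish, v, c, t):
    if finish <= c:
        return v
    if finish <= t:
        return v * (t - finish) / (t - c)   # ramps linearly to 0 at t
    return 0.0

def target_sensitive(finish, v, c, t):
    if finish <= c:
        return v * finish / c               # rises linearly to v at c
    if finish <= t:
        return v * (t - finish) / (t - c)   # then falls to 0 at t
    return 0.0

assert downward_step(5, 10, 8) == 10
assert downward_step(9, 10, 8) == 0.0
assert linear_drop(6, 10, 4, 8) == 5.0      # halfway down the ramp
assert target_sensitive(2, 10, 4, 8) == 5.0 # halfway up the ramp
```

The hard real-time variants simply replace the post-termination value of 0.0 with a large negative penalty.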
In general, the deadline heuristic was the most effective for soft real-time cases with a downward step utility curve, as the upper left figure below illustrates. For soft real-time cases with linear drop or target sensitive utility curves under high or medium load, the pseudo 0 heuristic performed best, as the upper right figure below illustrates. For hard real-time cases, except for target sensitive utility curves under low load, the greedy algorithm performed best, as the lower left figure below illustrates. As the lower right figure below illustrates, none of the heuristics was able to capture a significant fraction of the optimal value for target-sensitive utility curves under low load (in either soft real-time or hard real-time versions).
The tables below summarize the recommended heuristics for the cases we evaluated (with no heuristic being suitable for target-sensitive utility curves under low load).
The middle figure below shows how decision trees can be used to reduce the size of the policy representation in many cases, by combining related states (using greedy entropy minimization) into tree nodes that are organized hierarchically according to constraints on the state variables. The decision trees also implicitly provide decisions for states the value-optimal policy may have omitted.
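The encoding step can be illustrated with a toy sketch (a deliberately simplified stand-in, not the actual implementation): greedily pick the (variable, threshold) split that minimizes the weighted entropy of the action labels, stopping when a node is pure.

```python
import math
from collections import Counter

def entropy(actions):
    n = len(actions)
    return -sum((c / n) * math.log2(c / n) for c in Counter(actions).values())

def build(states, actions):
    """Greedy entropy-minimizing decision tree over a tabular policy."""
    if len(set(actions)) == 1:                    # pure node: emit a leaf
        return actions[0]
    best = None
    for var in range(len(states[0])):
        for thr in sorted({s[var] for s in states}):
            lo = [a for s, a in zip(states, actions) if s[var] <= thr]
            hi = [a for s, a in zip(states, actions) if s[var] > thr]
            if not lo or not hi:
                continue
            h = (len(lo) * entropy(lo) + len(hi) * entropy(hi)) / len(actions)
            if best is None or h < best[0]:
                best = (h, var, thr)
    _, var, thr = best
    left = [(s, a) for s, a in zip(states, actions) if s[var] <= thr]
    right = [(s, a) for s, a in zip(states, actions) if s[var] > thr]
    return (var, thr,
            build([s for s, _ in left], [a for _, a in left]),
            build([s for s, _ in right], [a for _, a in right]))

def lookup(tree, state):
    while isinstance(tree, tuple):
        var, thr, lo, hi = tree
        tree = lo if state[var] <= thr else hi
    return tree

# A policy over states (x, y): run task 1 whenever x > 2, else task 0.
states = [(0, 0), (1, 5), (2, 3), (3, 0), (4, 2)]
actions = [0, 0, 0, 1, 1]
tree = build(states, actions)
assert lookup(tree, (4, 4)) == 1
assert lookup(tree, (1, 9)) == 0   # a state absent from the original table
```

Note how the last lookup returns an action for a state the tabular policy never listed: the tree's thresholds implicitly cover the whole state space, which is the "implicit decisions" property mentioned above.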
To evaluate the effectiveness of this approach, we conducted evaluations to determine the tree sizes (over randomly selected policies) at which the best encoding of the policy occurred. As the figure on the right below shows, in a noticeable fraction of the cases a tree of size at most 10 was sufficient, while in other cases a larger tree representation was needed.
The figure on the left below illustrates the approach, in which a leaf of the tree may use a heuristic to recommend an action, rather than having a fixed action always be specified. The figure on the right below shows the benefit of this approach: when the best combination of tree and heuristics at the leaves (labeled "best" in the figure) was used it outperformed the tree approach alone (labeled "tree" in the figure) in a small number of cases and was otherwise comparable, but significantly outperformed the best performing heuristic (labeled "heuristic" in the figure).
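The hybrid representation amounts to allowing a leaf to hold either a fixed action or a heuristic evaluated at lookup time, which a minimal sketch makes concrete (names and the stand-in heuristic are illustrative, not the actual ones):

```python
def deadline_heuristic(state):
    # hypothetical stand-in: pick the task with the smallest state value,
    # by analogy with choosing the earliest deadline
    return min(range(len(state)), key=lambda i: state[i])

# Internal nodes are (var, threshold, low, high); a leaf is either an
# int (a fixed action) or a callable (a heuristic applied to the state).
tree = (0, 2, 0, deadline_heuristic)

def lookup(tree, state):
    while isinstance(tree, tuple):
        var, thr, lo, hi = tree
        tree = lo if state[var] <= thr else hi
    return tree(state) if callable(tree) else tree

assert lookup(tree, (1, 7)) == 0    # fixed-action leaf
assert lookup(tree, (5, 3)) == 1    # heuristic leaf picks the smaller entry
```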
Further discussion of all of these culminating contributions, and details of the models and techniques used, is available in Terry Tidwell's doctoral dissertation.