This research investigates how to maintain survivable operation of such systems, even in the face of invasive attacks where computations are intentionally subverted to interfere with other computations' execution constraints. The goal of this research is to develop new techniques for isolating the effects of interactions among computations through specific resources in these systems, including: flexible specification and rigorous enforcement of computations' execution constraints; explicit control of all OS kernel components under a single scheduler; detailed on-line monitoring of computations and their supporting OS kernel components; automated learning to discover previously unknown interactions among computations; and formal modeling and verification of computations, execution constraints, and system components and resources.

The expected benefits of this project include: a novel approach to non-bypassable isolation of computations from the effects of adversarial attack in which isolation can be enforced flexibly according to the system-specific execution constraints that must be satisfied; a high quality open-source software implementation of kernel-level scheduling and monitoring services that provide and measure such non-bypassable isolation; new formal models, analyses, and methodologies for verifiably correct configuration and management of those services; and empirical studies of the services' ability to protect computations from interference under a wide range of adversarial attacks.

William D. Smart, Associate Professor of Computer Science and Engineering, is co-PI for this research at Washington University in St. Louis.

Douglas Niehaus, Associate Professor, Electrical Engineering and Computer Science Department, is PI for collaborative research at the University of Kansas.

- Robert Glaubius, Scheduling Policy Design using Stochastic Dynamic Programming, Washington University, December 2009

- Terry Tidwell, Robert Glaubius, Christopher Gill, and William D. Smart,
Optimizing Expected Time Utility in Cyber-Physical Systems Schedulers,
31st Real-Time Systems Symposium (RTSS), San Diego, CA, November 30-December 3, 2010.
- Robert Glaubius, Terry Tidwell, Christopher Gill, and William D. Smart,
Real-Time Scheduling via Reinforcement Learning,
26th Conference on Uncertainty in Artificial Intelligence (UAI 2010), Catalina Island, CA, July 8-11, 2010.
- Robert Glaubius, Terry Tidwell, Braden Sidoti, David Pilla, Justin Meden, Christopher Gill, and William D. Smart,
Scalable Scheduling Policy Design for Open Soft Real-Time Systems,
16th Real-Time and Embedded Technology and Applications Symposium (RTAS 2010), Stockholm, Sweden, April 12-15, 2010.
**Received RTAS 2010 best student paper award.**
- Robert Glaubius, Terry Tidwell, Christopher D. Gill, and William D. Smart,
Scheduling Policy Design for Autonomic Systems,
International Journal of Autonomous and Adaptive Communication Systems, 2(3), 2009.
- Robert Glaubius, Terry Tidwell, William D. Smart, and Christopher
Gill, Scheduling Design and Verification for
Open Soft Real-Time Systems, 29th Real-Time Systems Symposium
(RTSS 2008), Barcelona, Spain, November 30 - December 3, 2008.
- Christopher Gill, William D. Smart, Terry Tidwell, and Robert Glaubius,
Scheduling as a Learned Art,
4th International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2008), Prague, Czech Republic, July 1, 2008.
- Terry Tidwell, Robert Glaubius, Christopher Gill, and William D. Smart,
Scheduling for Reliable
Execution in Autonomic Systems, 5th International Conference on Autonomic and
Trusted Computing (ATC 2008), Oslo, Norway, June 23-25, 2008.

- Terry Tidwell, Robert Glaubius, Christopher Gill, and William D. Smart,
Optimizing Expected Time Utility in Cyber-Physical Systems Schedulers,
31st Real-Time Systems Symposium (RTSS), San Diego, CA, November 30-December 3, 2010.
- Robert Glaubius, Terry Tidwell, Christopher Gill, and William D. Smart,
Real-Time Scheduling via Reinforcement Learning,
26th Conference on Uncertainty in Artificial Intelligence (UAI 2010), Catalina Island, CA, July 8-11, 2010.
- Christopher Gill, CPS Scheduling Policy Design with Utility and Stochastic Execution,
GA Tech Cyber-Physical Systems Summer School, Atlanta, GA, June 23-25, 2010.
- Robert Glaubius, Terry Tidwell, Braden Sidoti, David Pilla, Justin Meden, Christopher Gill, and William D. Smart,
Scalable Scheduling Policy Design for Open Soft Real-Time Systems,
16th Real-Time and Embedded Technology and Applications Symposium (RTAS 2010), Stockholm, Sweden, April 12-15, 2010.
- Christopher Gill, Non-Preemptive Scheduling Policy Design for Tasks with Stochastic Execution Times,
Cyber-Physical Systems Seminar, University of Pennsylvania, Philadelphia, PA, USA, November 23, 2009.
- Christopher Gill, Non-Preemptive Scheduling Policy Design for Tasks with Stochastic Execution Times,
CS Systems Symposium, University of North Carolina, Chapel Hill, NC, USA, November 20, 2009.
- Robert Glaubius, Terry Tidwell, William D. Smart, and Christopher
Gill, Scheduling Design and Verification for
Open Soft Real-Time Systems, 29th Real-Time Systems Symposium
(RTSS 2008), Barcelona, Spain, November 30 - December 3, 2008.
- Christopher Gill, William D. Smart, Terry Tidwell, and Robert
Glaubius, Scheduling as a Learned Art, 4th
International Workshop on Operating Systems Platforms for Embedded
Real-Time Applications (OSPERT 2008), Prague, Czech Republic, July 1,
2008.
- Terry Tidwell, Robert Glaubius, Christopher Gill, and William D.
Smart, Scheduling for Reliable Execution in
Autonomic Systems, 5th International Conference on Autonomic and
Trusted Computing (ATC-08), Oslo, Norway, June 23-25, 2008.

- Robert Glaubius, Terry Tidwell, Christopher Gill, and William D. Smart,
Real-Time Scheduling via Reinforcement Learning,
26th Conference on Uncertainty in Artificial Intelligence (UAI 2010), Catalina Island, CA, July 8-11, 2010.
- Christopher Gill, Non-bypassable
Kernel Services for Execution Security, Cybertrust PI Meeting Poster Session,
Yale University, New Haven, CT, March 16-18, 2008.

The goal of this research has been to develop new techniques for non-bypassable specification and enforcement of execution guarantees for system tasks, leveraging and extending those prior results. In this research, our discoveries to date include seven major contributions towards this goal. First, we have established a novel approach to scheduling policy design and verification that is specifically designed for resource allocations to tasks with variable execution times. Second, we have developed a novel technique for using repeated (periodic) structure to wrap the state space into a tightly bounded form. Third, we have shown how a state space suitable for verification with existing timed automaton model checkers can be generated by first enumerating verification states that consist of sets of utilization states, and then defining suitable difference bound matrices (DBMs). Fourth, we have developed a new technique called Expanding State Policy Iteration (ESPI) that allows computation of an optimal policy for a wrapped state model. Fifth, we have shown that for highly structured examples like proportional sharing of a resource, the structure of the generated scheduling policies themselves can be approximated with high accuracy, leading to a scalable technique for generation of high quality scheduling policies for larger numbers of concurrent activities. Sixth, we have shown that even when distributions of tasks' execution times are unknown in advance, if the distributions are stable then reinforcement learning can be used to obtain suitable policies on-line. Seventh, we have extended our approach to include periodic tasks whose individual utility can be expressed as functions of their completion times.

The first contribution of our research is an approach to scheduling policy design and verification for systems in which: (1) multiple threads of execution require mutually exclusive use of a single common resource (i.e., a CPU) in order to run; (2) whenever a thread is granted the resource, it occupies the resource for a finite and bounded subsequent duration; (3) the duration for which a thread occupies the resource may vary from run to run of the same thread but overall obeys a known independent and bounded distribution over any reasonably large sample of runs of that thread (as the figures below illustrate); and (4) a scheduler initially chooses which thread to run according to a given scheduling policy, dispatches that thread, waits until the end of the duration during which the thread occupies the resource, and then repeats that sequence perpetually.

**Scheduling Policy Design and Verification**

Using the distributions of thread execution times, we define a Markov Decision Process (MDP) whose actions correspond to the decision to dispatch a particular available thread (in the system model described above we assume all threads are always available), and whose states correspond to the number of time quanta during which the resource was utilized by each thread (represented as integer-valued vectors). The MDP's transition function is defined in terms of the run-time distribution for each thread.
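A minimal sketch of this transition function, assuming discrete run-time distributions represented as probability maps (the names and data layout here are illustrative only, not our released implementation):

```python
# Illustrative sketch of the utilization-state MDP described above.
# Each state is a vector of time quanta used by each thread; dispatching
# a thread advances its component by a duration drawn from that thread's
# run-time distribution.

def successors(state, action, duration_dist):
    """Return {next_state: probability} for dispatching thread `action`.

    duration_dist maps a duration (in quanta) to its probability for
    the dispatched thread.
    """
    result = {}
    for duration, prob in duration_dist.items():
        nxt = list(state)
        nxt[action] += duration
        result[tuple(nxt)] = result.get(tuple(nxt), 0.0) + prob
    return result

# Two threads; thread 0 runs for 1 or 2 quanta with equal probability.
dist0 = {1: 0.5, 2: 0.5}
print(successors((0, 0), 0, dist0))  # -> {(1, 0): 0.5, (2, 0): 0.5}
```

Each dispatch thus induces a small probabilistic fan-out in the utilization state space, which is what the enumeration of verification states below explores.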

To bound the number of states, our initial approach introduced a termination time (our subsequent development of the wrapped state model, described in the next section, allows us to do away with this restriction), resulting in *absorbing states* from which further actions do not change the state of the system, as the figure below illustrates. We are then able to generate a verification state space, whose states consist of reachable subsets of the utilization states on each action, based on the threads' run-time distributions, as the following figure illustrates. In the figure below, utilization states are denoted by circles, which are grouped into (often overlapping) verification states denoted by the boxes around each such group.

To illustrate the use of these techniques, we generated an optimal policy for the two thread run-time distributions shown above, which is illustrated in the figure below. Important features of the figure below include (1) the specified target utilization boundary which the policy is intended to enforce, which is illustrated by the black ray in each panel of the figure, (2) the slight overlap of the edge of the dark grey region (in which thread 1 is dispatched) with the target utilization ray, which is shown in the left and bottom right panels of the figure below, and (3) the further deviation and then convergence of the edge of the dark grey region near the termination time, which is shown in the top right panel of the figure.

The target utilization is a specified input to the generation of the optimal policy. The overlap of the dark grey region with the target utilization ray is due to the threads' relative run-time distributions: near the ray, and even slightly below it, running thread 1 has a higher probability of keeping the utilization nearer the target ray than does running thread 2. The deviation from and then convergence back to the ray near the termination time is an artifact of the termination time itself (which in turn is a motivation for our development of the wrapped state model described next). This illustration of the approach highlights its ability to encode scheduling policies for time-varying thread executions, and to enforce them with reasonably good precision. Further discussion of the approach, and details of the models used, is available in our ATC 2008 paper.

Our second contribution is to overcome limitations of the initial version of our approach described above, by exploiting features of the original utilization state space as shown in the figure below, without introducing an artificial termination time restriction. In the figure below, transitions that move to the right (shown in red) correspond to dispatching thread 1, while transitions that move upward (shown in black) correspond to dispatching thread 2. The target utilization ray is shown with a dashed line, and the states through which that ray passes are marked with a hash pattern.

**Wrapped State Model**

The intuition on which our second contribution is based is that the marked states in the figure above are highly similar: they have the same reward, and the relative distribution of future states from them is the same, so we would expect the optimal policy to prefer the same action in these states. The key idea of our approach is to detect such similar states and collapse them together to obtain a smaller, more tractable MDP model. We say that a system is *periodic* if and only if the utilization ray passes through utilization states at regular intervals. This periodic property allows us to combine equivalence classes of the original utilization state space shown above into individual states of a *wrapped state model* shown in the figure below. Each state in the wrapped state model corresponds to one such equivalence class, and the transitions are again shown in red for thread 1 and in black for thread 2.

One further refinement is needed, since in the figure above the state space is still unbounded (though significantly reduced). We observe that the wrapped state model shown in the figure above consists of possible displacements from the utilization ray among integer-valued states. In states that are sufficiently far from the target utilization, we expect the greedy policy of choosing the action that is expected to result in a state nearest the target utilization to be a suitable proxy for the optimal policy. With the assumption that each thread cannot exceed some maximum run-time on a single execution, our strategy for bounding the state space is then to truncate the state space to include only the states in which the choice of which thread to schedule is not already determined with high probability. The resulting *bounded wrapped state model* is shown in the figure below, in which the absorbing states reflect the equivalence of further actions once the decision of which thread to schedule is already well determined.

The impact of wrapping the state model on the cost of verification of properties based on it is also significant. In the resulting verification state space, the initial exploration of which is illustrated in the figure shown below, each verification state consists of the utilization states reachable from a previous verification state under a particular action. In our previous approach, as the termination time (and with it the finite history of system states over which verification states were computed) was increased, the resulting verification state space grew exponentially, with a history size of 20 resulting in 2,883,441 verification states and 3,604,283 transitions, which took 10 hours 47 minutes to enumerate using the IF toolset. Our wrapping approach greatly reduces the size of the verification state space, so that for the same example an exploration using the IF toolset took less than a second, as the verification state space had only 201 states and 398 transitions.
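The wrapping step itself can be sketched as follows, assuming integer target shares whose multiples define the period vector along the utilization ray (a simplified, illustrative model rather than our actual implementation):

```python
# Illustrative sketch of state-space wrapping. Suppose the target
# utilization ray passes through integer utilization states at multiples
# of a positive integer period vector p (e.g. shares 1/3 and 2/3 give
# p = (1, 2)). States that differ by an integer multiple of p belong to
# the same equivalence class, so each state can be collapsed to a
# canonical representative of its class.

def wrap(state, period):
    """Collapse a utilization state to its equivalence-class
    representative by subtracting the period vector as many whole
    times as possible (period components assumed positive)."""
    k = min(s // p for s, p in zip(state, period))
    return tuple(s - k * p for s, p in zip(state, period))

period = (1, 2)
assert wrap((3, 7), period) == wrap((4, 9), period)  # same class
print(wrap((3, 7), period))  # -> (0, 1)
```

Collapsing states this way is what turns the unbounded grid of utilization states into the small set of displacements from the ray discussed above.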

The third contribution of our research is a further refinement of our previous verification approach, based on defining difference bound matrices (DBMs), that (1) can represent wrapped state models even more efficiently, and (2) allows us to verify systems in which some of the state variables have continuous rather than discrete domains.

**Verification with Continuous State Variables**

We define DBMs to represent salient features of both the wrapped state space and the scheduling policy itself. The figure above shows the general form of DBMs needed to represent the utilization regions of the wrapped state space, while the figure below shows decision regions of the policy itself, both encoded as DBMs.

We also define operators *run* and *wrap* over the DBMs, which transform a DBM according to the effects of (1) running a particular thread, or (2) wrapping the state space, respectively. The figure above shows the effect of applying the *run* operator to DBM D, transforming it into DBM D', which encodes the effects of running thread 2. The left side of the figure below shows the possible DBMs that may be obtained by then applying the *wrap* operator to D', using different values of a specified *compression parameter*. The right side of the figure below shows the results of intersecting those possible DBMs with the utilization regions encoded by the wrapped state space DBMs.

The state space wrapping technique and the use of DBMs for verification allow us to: (1) remove the artificial termination time, and its resulting artifacts; (2) produce a much more compact representation of the utilization and verification state spaces which significantly reduces the cost of defining and verifying properties of scheduling policies; and (3) verify properties over state variables with continuous values as well as variables with discrete ones. Further discussion of these refinements to our approach, and details of the models and techniques used, is available in our RTSS 2008 paper.
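To make the DBM machinery concrete, the following minimal sketch shows the textbook DBM representation of difference constraints and a zone intersection with shortest-path canonicalization; it is a standard illustration of the data structure, not our verification code:

```python
# Minimal illustration of difference bound matrices (DBMs): a zone over
# variables x1..xn is encoded as a matrix M where M[i][j] is an upper
# bound on xi - xj (index 0 is a reference variable fixed at 0).
# Intersection of two zones is an elementwise minimum followed by
# Floyd-Warshall tightening, which makes all bounds canonical.

def intersect(A, B):
    n = len(A)
    M = [[min(A[i][j], B[i][j]) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                M[i][j] = min(M[i][j], M[i][k] + M[k][j])
    return M

# Zone A: 0 <= x <= 5 (M[1][0] bounds x - 0, M[0][1] bounds 0 - x).
# Zone B: 2 <= x <= 9. Their intersection is 2 <= x <= 5.
A = [[0, 0], [5, 0]]
B = [[0, -2], [9, 0]]
print(intersect(A, B))  # -> [[0, -2], [5, 0]]
```

The *run* and *wrap* operators described above are further transformations over matrices of this same form.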

The fourth contribution of our research is a new technique called Expanding State Policy Iteration (ESPI) that allows computation of an optimal policy for a wrapped state model. The ESPI technique we have developed is based on an existing dynamic programming technique for solving MDPs called policy iteration (Puterman, 1994). Our fundamental contribution is to define policy evaluation and policy improvement steps over the wrapped utilization state space, resulting in a combined policy iteration technique that produces a locally optimal scheduling policy. We also provide an evaluation of ESPI, including conditions that guarantee termination. Further discussion of these refinements to our approach, and details of the models and techniques used, is available in our paper published in the International Journal of Autonomous and Adaptive Communication Systems, 2(3), 2009.

**Expanding State Policy Iteration**
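For readers unfamiliar with policy iteration, the following toy sketch (all names hypothetical) shows its two alternating steps on a small finite MDP with costs to be minimized; ESPI's novelty, not shown here, lies in performing these steps over an expanding wrapped state space rather than a fixed finite one:

```python
def policy_iteration(states, actions, P, R, gamma=0.9, tol=1e-9):
    """P[s][a] = {next_state: probability}; R[s][a] = expected immediate cost."""
    policy = {s: actions[0] for s in states}
    V = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: iterate the Bellman equation for the fixed policy.
        while True:
            delta = 0.0
            for s in states:
                a = policy[s]
                v = R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: act greedily (least expected cost) w.r.t. V.
        stable = True
        for s in states:
            best = min(actions,
                       key=lambda a: R[s][a] + gamma * sum(
                           p * V[t] for t, p in P[s][a].items()))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V

# Tiny example: from either state, action 'a0' is cheaper in the long run.
states, actions = ['s0', 's1'], ['a0', 'a1']
P = {'s0': {'a0': {'s0': 1.0}, 'a1': {'s1': 1.0}},
     's1': {'a0': {'s0': 1.0}, 'a1': {'s1': 1.0}}}
R = {'s0': {'a0': 0.0, 'a1': 2.0}, 's1': {'a0': 1.0, 'a1': 2.0}}
policy, V = policy_iteration(states, actions, P, R)
print(policy)  # -> {'s0': 'a0', 's1': 'a0'}
```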

The fifth contribution of our research is the development of a class of compactly parameterized policies for the utilization state models described above. This approach overcomes a major limitation of our MDP models, the explicit enumeration of exponentially growing state spaces, while retaining the ability to evaluate and compare policies in the value-based framework established by MDP-based policy generation techniques like ESPI (discussed above).

**Conic Approximation of Scheduling Decision Boundaries**

The figure below shows an approximation to the optimal scheduling policy for a problem instance with two tasks, where each point denotes a state in the wrapped state space. The target utilization ray (corresponding to utilization shares of 7/12 and 5/12) is shown as a dashed ray emanating from the bottom left corner. Each task has a different duration distribution supported on the interval (0, 8]. Task 1 advances along the horizontal axis, and task 2 advances along the vertical axis.

The scheduling policy selects task 1 in states denoted by closed circles, and selects task 2 in states denoted by closed squares. Notice that the *decision boundary* (the surface separating regions of the state space where the policy is homogeneous) can be described using a ray parallel to the utilization ray, with an offset from it labeled d in the figure above. The figure below illustrates an approximation to the optimal scheduling policy for a problem instance with three tasks, again with each distinctly shaped dot in the figure illustrating a decision to run a particular task in that state. The policy is further illustrated by plotting it at three different time horizons, with the leftmost figure showing the state space after 10 time steps, the middle figure after 20 time steps, and the rightmost figure after 30 time steps.

This leads to a compact parameterized model for n tasks consisting of n *action vectors* and a *decision offset* (labeled d as in the earlier example with two tasks). The leftmost figure below shows the parameterization of a two task example. The middle figure below illustrates the generalization of the model to three tasks, and the rightmost figure below shows how the action vectors and decision offset combine to define the decision boundaries between homogeneous regions.

To evaluate the quality of the policies generated by this new technique, we performed experiments comparing the results using this parameterized model with two different search strategies (hill climbing and policy gradient), to results obtained by a greedy policy heuristic and a heuristic of always scheduling the most underused task. With four tasks, we also found a case where we were able to compare these policies to a policy generated by our MDP-based approach, though the MDP-based approach was only tractable for a subset of the four task examples we tried.
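In the two-task case, such a parameterized policy reduces to a single signed-distance test against the ray shifted by the decision offset; the following sketch (with illustrative names and parameter values, not our implementation) shows that decision rule:

```python
import math

# Hypothetical two-task parameterized policy: the decision boundary is a
# ray parallel to the target utilization ray u = (u1, u2), shifted by an
# offset d. The signed perpendicular distance of a state from the ray
# through the origin decides which task to run (0 = horizontal-axis task,
# 1 = vertical-axis task).

def parameterized_policy(state, shares, d):
    u1, u2 = shares
    x, y = state
    # Positive distance means the vertical-axis task is ahead of its
    # share, so the horizontal-axis task should be dispatched.
    dist = (u1 * y - u2 * x) / math.hypot(u1, u2)
    return 0 if dist >= d else 1

shares = (7, 5)  # target shares 7/12 and 5/12, as in the example above
print(parameterized_policy((0, 5), shares, d=0.0))  # -> 0
```

With n tasks the same idea generalizes to comparing the state's displacement against the n action vectors and the shared offset d.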

The leftmost figure below shows results from the four task case, in which both search strategies converged towards the MDP-based bound, with performance comparable to the greedy heuristic and significantly outperforming the underused heuristic. The middle figure shows the results of a ten task case, with the parameterized policy significantly outperforming both the greedy and underused heuristics after about 20 iterations of search (the MDP-based approach was intractable for ten tasks). The rightmost figure below shows that with only two or three tasks the MDP, greedy, and parameterized approaches (after 100 iterations of search) perform similarly, though the MDP-based approach is slightly better; however, as the number of tasks increases (and the MDP-based approach becomes intractable) the parameterized approach increasingly outperformed the greedy and underused approaches.

Further discussion of these extensions of our approach, and details of the models and techniques used, is available in our RTAS 2010 paper.

The sixth contribution of our research is to consider task sets in which execution time distributions for tasks are unknown in advance and instead must be observed at run-time, as may be common in many systems where environmental factors and other complications make the distribution of execution times dependent on the system's operating context.

**Scheduling Policy Design with Unknown Execution Time Distributions**

We assume that each task's duration obeys some underlying but unknown stationary distribution, and provide *probably approximately correct* (PAC) bounds on the computational complexity of learning a near-optimal policy using balanced wandering. Our result is novel, as it extends established methods for learning finite Markov decision processes to a domain with a countably infinite state space with unbounded costs.

We compared the performance of several exploration strategies in the context of task scheduling by conducting experiments comparing optimistic interval-based, epsilon-greedy, and balanced wandering exploration strategies. To compare the performance of these exploration strategies, we generated 1000 random problem instances with two tasks. Duration distributions for these tasks consisted of discretized normal distributions supported on the interval [1,32], with means and variances selected uniformly at random from the respective intervals [1,32] and [1,8]. Utilization targets for each task were chosen according to u = (u1, u2)/(u1 + u2), where u1 and u2 were integers selected uniformly at random over [1,128]. We conducted experiments by initializing the model in the state x = (0, 0). The controller simulated a single trajectory over 20,000 decision epochs in each problem instance with each exploration strategy. We report the number of mistakes, i.e., the number of times the exploration strategy chooses an action that has greater expected cost than the greedy action. The results of these experiments are shown in the figures below. We report 95% confidence intervals on the mean number of mistakes each exploration strategy makes, averaged across the problem instances described above. Note that these plots have different scales due to the variation in mistake rates among exploration strategies.

The leftmost figure compares the performance of interval-based optimistic action selection to that of an "Exploit" policy that behaves greedily with respect to the estimated model at each decision epoch. All of the interval-based exploration settings we considered exhibited statistically similar performance. Interestingly, the exploitive strategy yields similar performance to the explorative strategies despite its lack of an explicit exploration mechanism. The exploitive policy's benefits are clear in comparison to the two other exploration strategies we consider.

The middle figure illustrates the performance of epsilon-greedy exploration. Notice that the mistake rate decreases along with the likelihood of taking exploratory actions, i.e., as epsilon approaches zero. Explicit exploration may not improve performance in this domain. This is further supported by our results for balanced wandering. The theory behind balanced wandering is that making a few initial mistakes early on will pay off in the long run due to more uniformly accurate models.

The rightmost figure shows that this is not the case in our scheduling domain, as a purely exploitive strategy with m = 0 outperforms each balanced wandering approach. These results suggest that the exploitive strategy may be the best available exploration method in the task scheduling problem. One plausible explanation is that the environment itself enforces rational exploration: if some task is never dispatched, the system will enter progressively more costly states as that task becomes more and more underused. Thus, eventually the estimated benefit of running that task will be substantial enough that the exploitive strategy must use it.
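As one concrete instance of the strategies compared above, epsilon-greedy action selection can be sketched as follows (names are hypothetical; setting epsilon to zero recovers the purely exploitive strategy):

```python
import random

# Epsilon-greedy action selection: with probability epsilon explore an
# action uniformly at random, otherwise exploit the action with the
# least estimated expected cost.

def epsilon_greedy(est_costs, epsilon, rng=random):
    """est_costs maps each action to its current estimated expected cost."""
    if rng.random() < epsilon:
        return rng.choice(list(est_costs))
    return min(est_costs, key=est_costs.get)

costs = {"task1": 0.4, "task2": 0.9}
print(epsilon_greedy(costs, epsilon=0.0))  # -> task1 (pure exploitation)
```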

Further discussion of these extensions of our approach, and details of the models and techniques used, is available in our UAI 2010 paper.

The seventh contribution of our research is to consider tasks whose individual utility can be expressed as functions of their completion times. Unlike our previous considerations of utility based on proportional resource shares, these utility functions do not inherently induce the regularity needed to form suitable state spaces. To re-introduce such regularity, we therefore restrict our attention to task models with periodic (rather than always-available) job releases.

**Utility-Aware Scheduling of Periodic Tasks**

Our resulting task model is illustrated in the figure below. The top represents a classic periodic real-time task with a single job released in each period and a relative deadline after which the utility of the job falls to zero. The middle represents a generalization in which multiple jobs may be active in the system at once, with utility extending beyond the period in which they were released. The bottom represents another example, where the utility is constant for a given interval and then decays linearly to 0. As is illustrated in these examples, our task model assumes that each job's utility falls to zero at some *termination time* that is the same duration after the release times for all jobs of a given task (but that these offsets may differ from task to task).

The figure below illustrates how these utility curves can be integrated within an MDP-based approach that is similar to, but differs in several important details from, our previous work. The first difference is that each task's utility function (shown at the top of the figure below) must be convolved with the distribution of execution times for the task to produce a distribution of *expected potential utility density*, which can be used to define a reward function. The second difference is that the state space is based on the available jobs at each discrete time step within the hyper-period (the least common multiple of all the task periods). As in our previous work, the transitions in this state space encode the scheduling of each job as a distinct action, but the third difference is that a special *idle* action that schedules no job during a particular time step is also always available. This last extension allows scheduling policies to be non-work-conserving, allowing execution of a job to be delayed until greater utility can be accrued from it.

The figures below illustrate a simple deterministic two-task system, with the leftmost and middle figures showing the transitions for choosing to schedule each of the tasks, and the rightmost figure showing the transitions for the idle action. Note that in each of these figures the hyper-period is only four time units long, after which the transitions wrap back to the beginning of the hyper-period.
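The convolution of a task's utility function with its execution-time distribution, described above, can be sketched as follows for discrete duration distributions (the names and the example utility curve are illustrative, not taken from our implementation):

```python
# Expected utility of dispatching a job at time t: combine the job's
# time-utility function with its execution-time distribution, evaluating
# utility at the resulting completion time t + duration.

def expected_utility(t, duration_dist, utility):
    return sum(p * utility(t + dur) for dur, p in duration_dist.items())

# Example utility curve: full value until time 4, decaying linearly to
# zero at the termination time 8.
def linear_decay(t):
    return max(0.0, min(1.0, (8 - t) / 4))

dist = {1: 0.5, 2: 0.5}
print(expected_utility(3, dist, linear_decay))  # -> 0.875
```

Evaluating this quantity at each candidate dispatch time is what lets the MDP compare scheduling a job now against the *idle* action of deferring it.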

The figure below shows the set of reachable states resulting from the transitions in the figures above.

To evaluate the efficacy of our MDP-based approach, particularly in comparison to heuristic methods, we conducted experiments using three distinct utility functions that are illustrated in the figures below. The leftmost figure represents a classic hard real-time task where utility is constant until its termination time. The middle figure represents a soft real-time task where utility is initially constant and then decays linearly until the termination time. The rightmost figure represents a target sensitive task (e.g., for processing audio-video frames) for which utility is zero initially, increases linearly to a peak, and then decays linearly to its termination time.

For each experiment, we generated task duration distributions using an upper and lower bound, and a threshold dividing the lower 80 percent and upper 20 percent of the distribution, as the figure below illustrates. Together with additional constraints, these bounds enforced a total offered load that remained mostly between 70 percent and 90 percent of system capacity, with occasional transient overloads up to 120 percent.

The figures below show the performance of the heuristics we examined, relative to the value-optimal policies generated by our MDP-based approach. The upper left figure was generated with 2 tasks, the upper right figure with 3 tasks, the lower left figure with 4 tasks, and the lower right figure with 5 tasks.

The figure below summarizes how the performance of the heuristics decays with increasing numbers of tasks, up to 5 tasks, beyond which our MDP-based approach became intractable. While these results demonstrate the feasibility of extending our techniques to small numbers of tasks with diverse individual utility characteristics, they also motivate significant future work towards addressing the questions of scalability and different forms of task availability that remain.

Further discussion of these extensions of our approach, and details of the models and techniques used, is available in our RTSS 2010 paper.