# Metastability Lecture

##### By Dr. Charles E. Molnar

 "This is the first segment..." QuickTime movie, 40 seconds.

This is the first segment of a discussion of the metastability problem that occurs in synchronizers in clocked systems. Some of the same analysis techniques, issues, and ideas arise in talking about arbiters in asynchronous systems, and, as has just been pointed out, in A-D converters. But for the purposes of this discussion, we will talk about the problem in the context of synchronizing inputs to clocked systems.

My first encounter with this problem arose in designing a laboratory instrument computer called the LINC in 1963. This was intended to be a laboratory machine for biological researchers, and its essential point of novelty was that the system was designed to be interconnected with laboratory equipment and interactive with the user, so as to be able to do data collection, experiment control, and so on, in the laboratory instead of in a remote computer system accessed with punched cards.

One of the things that was required in that setting was the ability of the computer system to detect an external binary signal, called the external level here, and to use it to control a skip instruction that, depending upon the value of the external level, would enable the incrementing of the program counter.

That is represented by these registers. The idea was that if the external level were present, or true, then in the execution of this simple instruction sequence the skip on external level would cause the program counter to be incremented an extra time, so that the next instruction would be taken from location twenty; if the external level was not asserted, then the program counter would be incremented only once and the next instruction would be taken from here.

The naive approach, which I blush to say was the way we first did it, was to allow the external level to enable the flip-flops in the program counter at an appropriate time, thereby enabling an extra stepping of the program counter. This is just the carry chain that will give the value P+1 as the data input to the program counter, given that the present value is P. And of course, you all know what can happen here for particular counts: for an initial count of 0-1-1-1-1, the next count up would be 1-0-0-0-0, if I counted right. If the external level happened to be changing just as the clock pulse's critical edge was occurring, then it is possible that some but not all of the flip-flops that were supposed to be complemented would be complemented, leading to an erroneous next instruction location.
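The failure mode is easy to sketch in a toy model (my own construction, not the LINC's actual circuitry): treat each flip-flop of the counter as independently deciding, at the marginal clock edge, whether it caught the enable.

```python
def next_count(p, enable_seen, width=5):
    """Next program-counter value when the raw external level gates each
    flip-flop's clock.  enable_seen[i] records whether flip-flop i
    happened to catch the enable at the critical clock edge."""
    carry = 1            # the increment command entering the LSB stage
    q = 0
    for i in range(width):
        bit = (p >> i) & 1
        if carry and enable_seen[i]:
            bit ^= 1     # this stage complements
        q |= bit << i
        carry &= (p >> i) & 1   # carry ripples through stages that held 1
    return q

# All flip-flops agree: 0-1-1-1-1 counts up to 1-0-0-0-0 as intended.
assert next_count(0b01111, [True] * 5) == 0b10000
# A marginal enable: one stage misses it, the rest see it -> wrong count.
assert next_count(0b01111, [True, False, True, True, True]) == 0b10010
```

The second assertion is exactly the loss of control described above: a mixture of complemented and uncomplemented stages yields a next-instruction address that was never intended.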

This of course is outside of what the hardware designer or the programmer intended and represents a loss of control of the system resulting in behavior that is outside the design.

Of course, the textbook solution (they did read textbooks even in those days) was to synchronize that input signal, and here a synchronizer was added in the path from the external level to the enable in order to make a single decision as to whether the external level was a 0 or a 1 at the time of the clock. This introduced an extra delay of one clock interval in making the decision, but the intent (the conventional wisdom) was that the flip-flop would either take the value 0 or the value 1, and by the time the next

 "We discovered even in those days..." QuickTime movie, 29 seconds.

clock pulse would be available to increment the program counter, this value would either be true or false and all of these flip-flops would see the same thing and behave consistently, either counting or not counting.
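The intended scheme can be sketched as follows (a toy model with names of my own choosing):

```python
class Synchronizer:
    """One-stage synchronizer: sample the raw external level on one
    clock edge, and present only the registered copy to the counter on
    the next edge, so every counter flip-flop sees the same value."""
    def __init__(self):
        self.q = 0
    def clock(self, external_level):
        sampled_last_edge, self.q = self.q, 1 if external_level else 0
        return sampled_last_edge

sync = Synchronizer()
assert sync.clock(1) == 0   # decision delayed by one clock interval
assert sync.clock(0) == 1   # the counter now sees last edge's sample
assert sync.clock(0) == 0
```

The catch, as the rest of the lecture explains, is the hidden assumption that the synchronizer flip-flop always settles to a clean 0 or 1 within that one clock interval.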

We discovered even in those days that it was not so simple, and that there were occasions even with the synchronizer present in which there would be losses of control. If we hadn't seen the behavior of the system without the synchronizer to begin with, we probably would not have known where to look, but would have said, "Gee, we're having the same kinds of loss of control on occasion even with the synchronizer present." These were slow circuits, made with pnp germanium transistors out of Philco's surface barrier transistor line, and the problem basically was that the synchronizer flip-flop was poorly designed and had absolutely atrocious behavior under these conditions. But the fact is that this got us to look at the problem, to think about it, and to recognize that it was fundamental.

In more recent times, in the early 1970's, we were involved in doing some asynchronous design involving decisions about which of two inputs arrived first. The earlier experience with the synchronizer problem made us aware that there was something to be worried about here, and this was with high speed ECL logic. After a considerable amount of very difficult work setting up measurement techniques, we could observe cases in which a latch was receiving a data input that was changing at the time that the latch was being switched from transparent to holding; after a lot of work we were able to jigger up a circuit that could capture the wayward trajectories. When a response wasn't resolved in the normal switching time, we could see trajectories that would have a long hang-time. Our attempts to publish materials at the time were rejected by the learned journals, and people did not want to believe that the problem existed.

 "One reviewer made a marvelous comment..." QuickTime movie, 42 seconds.

Circuit people said "Of course. So what," and system people said this can't be true because flip-flops only deal with zeros and ones. One reviewer made a marvelous comment in rejecting one of the early papers, saying that if this problem really existed it would be so important that everybody knowledgeable in the field would have to know about it, and "I'm an expert and I don't know about it, so therefore it must not exist."

Now, of course, it is respectable to talk about the problem and to try to analyze it. As those of you in this audience know, the problem goes beyond the reach of the usual tools of logic design and logical analysis.

This is a simple example of a bistable set/reset flip-flop in which initially both the set and the reset input (I have labeled them data and clock here) are asserted. Of course, the books say that you should never assert the set and reset inputs to a set/reset flip-flop together. Actually, what you should never do is to de-assert them at the same time.

 "If you make a standard, primitive flow table..." QuickTime movie, 59 seconds.

If you make a standard, primitive flow table for this, in which the columns are labeled with the input values D and C, and the rows are labeled with the output values, which determine the present state in the Huffman flow table analysis, then you can construct a flow table which tells you what happens when the data input changes from 0 to 1 first. The system is stable here in the circled state 1/1; the change of the D input to 1 takes us over to this column, which points to the next state 0/1, which is stable, and the system can then stay stable in this state independent of the subsequent value of C. If the C input changes first, then the system goes to state 1/0, and a subsequent later change of D leaves the system in the same state. So under the normal conditions of operation of a set/reset flip-flop, this flow table analysis tells you exactly what happens. Now apply this analysis and do what the fundamental mode of design introduced by Huffman prohibits, which is letting two inputs change together. But if

you just apply the mechanics, you see that if you change D and C both from 0 to 1 at the same time (whatever that means), the next state pointed to here is 0/0: in state 1/1 with the inputs 1/1, the next state is 0/0, and according to this analysis the circuit would oscillate. What happens if these two inputs change at not quite the same time, or what happens if the speeds of the gates are not perfectly well-matched, is of course outside the bounds of this analysis.
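The flow-table mechanics can be reproduced in a few lines. This is a sketch under one assumption of mine: that the latch is the usual cross-coupled NAND pair, with next-state equations X = NAND(D, Y) and Y = NAND(C, X).

```python
def nand(a, b):
    return 1 - (a & b)

def next_state(d, c, x, y):
    """One step of the Huffman flow table for the cross-coupled NAND pair."""
    return nand(d, y), nand(c, x)

# D changes first: 1/1 -> stable 0/1, unaffected by a later change of C.
assert next_state(1, 0, 1, 1) == (0, 1)
assert next_state(1, 1, 0, 1) == (0, 1)
# C changes first: 1/1 -> stable 1/0, unaffected by a later change of D.
assert next_state(0, 1, 1, 1) == (1, 0)
assert next_state(1, 1, 1, 0) == (1, 0)
# Both change together: the table bounces between 1/1 and 0/0 forever.
assert next_state(1, 1, 1, 1) == (0, 0)
assert next_state(1, 1, 0, 0) == (1, 1)
```

The last two assertions are the predicted oscillation: with inputs held at 1/1, state 1/1 maps to 0/0 and 0/0 maps back to 1/1, and the table never settles.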

To go a little bit further, trying to learn how to think about the problem, we could assume for the moment that the data and clock inputs change at exactly the same moment, and that the circuit is perfectly symmetric: these two sides and the interconnections are symmetric.

So if the voltages here and here [between "Data" and "Clock"] are exactly the same, we may as well connect the points; no current will flow. And if these inputs are the same and the circuit is symmetric, then these will be the same [between "X" and "Y"], and we may as well connect those together. If we wanted to analyze the perfectly symmetric case, then we could instead make up this chap with a "2" inside, which is two of these in parallel, and what we end up with is an inverter with a feedback around itself. At least part of the idealized problem then is to think about what an inverter with a

feedback does, which is to say what a two-input NAND with feedback does under conditions in which you change the input. If you go to the Boolean equations, you get frustrated here, because when D/C is at 1, this element is operating as an inverter. The Boolean equation for the inverter says that U-not is equal to X/Y, or X/Y is equal to U-not, because of the inversion. But the feedback path says that U is equal to X/Y, and there is no solution of these equations in the set of allowed values 0 and 1.
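The frustration can be checked exhaustively (a sketch in my own notation: X = NAND(D, X), the NAND with its output fed back to one input):

```python
def nand(a, b):
    return 1 - (a & b)

def fixed_points(d):
    """Boolean solutions of the feedback equation X = NAND(D, X)."""
    return [x for x in (0, 1) if x == nand(d, x)]

assert fixed_points(0) == [1]   # with D = 0 the loop rests at 1
assert fixed_points(1) == []    # with D = 1: X = not X, no solution
```

With D held at 1 the element acts as an inverter, the equation collapses to X = not X, and the search over {0, 1} comes back empty, which is the "literally no solution" of the next paragraph.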

So the logical analysis gives you no answer. Literally no solution. The next thing you might try is to make the circuit a little bit more realistic and say, well, there's really delay here; and if we wanted to be fancy we could put some delay here too, and analyze it, and then you can get a solution that would show the kind of oscillation suggested by the flow table analysis. Now of course, this is an easy experiment to do. You don't need any precise timing control; you just take a two-input NAND and connect one of the outputs back to the input. So like good engineers we went to the bench and did this, and we found that with some kinds of circuits it would sometimes oscillate, and with other kinds of circuits the outputs would go to a value that was somewhere in between the 0 and 1 values, and that the dynamics depended not only on the logic family but also on the particular sample we would select.
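That bench observation is easy to reproduce in a toy continuous model (my own parameters, not a transistor-level simulation of any real logic family): replace the Boolean inverter with a smooth inverting transfer curve and a first-order lag. With no transport delay, the loop slides to the in-between crossing where inv(v) = v; adding a delay element to the same loop is what admits the oscillatory solutions mentioned above.

```python
import math

def inv(v, gain=10.0):
    """A smooth, idealized inverter transfer curve between 0 V and 1 V."""
    return 1.0 / (1.0 + math.exp(gain * (v - 0.5)))

def settle(v0, tau=1.0, dt=0.01, steps=5000):
    """Inverter driving its own input through a first-order lag:
    dv/dt = (inv(v) - v) / tau, integrated with Euler steps."""
    v = v0
    for _ in range(steps):
        v += dt * (inv(v) - v) / tau
    return v

# From either rail, this delay-free model settles to the in-between
# value where inv(v) = v -- neither a legal 0 nor a legal 1.
assert abs(settle(0.0) - 0.5) < 1e-3
assert abs(settle(1.0) - 0.5) < 1e-3
```

Which behavior a real part exhibits, oscillation or an intermediate hang, depends on the relative strength of the delay and the lag, which is consistent with the sample-to-sample variation we saw at the bench.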