English Pages 219 Year 1995
Tutorial: The Systematic Design of Asynchronous Circuits
Michael Kishinevsky (The University of Aizu, Japan)
Luciano Lavagno (Politecnico di Torino, Italy; Cadence Berkeley Labs, USA)
Peter Vanbekbergen (Synopsys, Inc., USA)
ICCAD'95 tutorial
The Systematic Design of Asynchronous Circuits
Outline
Introduction and basics: what is asynchronous, advantages and disadvantages, design styles, hazards and races, control/data dichotomy
Specification and synthesis methodologies: event-based models, concurrent languages, FSMs
Synthesis from Signal Transition Graphs: synthesis flow, state assignment, logic synthesis, design-for-testability, comparison with synchronous
Verification and validation of asynchronous designs: formal verification, checking properties, validation in VHDL environment
Practical applications: interface design, asynchronous processors, DSP circuits, CAD tools
Motivation
A reliable design methodology for asynchronous circuits is coming
Asynchronous design is sometimes unavoidable (asynchronous interfaces, arbiters, etc.)
Asynchronous design is becoming more important:
"[In the microprocessor of the year 2000] we will see perhaps even asynchronous design..."
"Looking at the future", Anant Agrawal, Vice President, Engineering, Sun Microsystems SPARC Technology Business
What is asynchronous?
Asynchronous = not synchronous, so:
Synchronous: uses an external global clock for observing system states
Asynchronous: uses internal and external events for observing system states
Asynchronous circuits are also called self-timed
Synchronous communication
[Figure: Sender register, Logic and Receiver register driven by a common Clock; timing diagram of Clock and Data input with Tsetup and Thold marked]
Timing constraint: input data must stay unchanged within a setup/hold window around a clock event
Asynchronous communication
[Figure: Sender register, Logic and Receiver register connected by Data, Req and Ack]
A signal handshake is used instead of a global clock
The handshake can be implemented using delay padding and/or completion detection
Completion detection
[Figure: Sender register, Logic, completion detector and Receiver register connected by Data, Req and Ack]
Special redundant data encoding => area overhead
Completion detection forms a request => more area overhead
Completion detection can be done inside a receiver
Tolerates data skew
Delay padding
[Figure: Sender register, Logic and Receiver register connected by Data, Req and Ack, with a delay element d on Req]
Similar to synchronous
Measure or estimate maximum logic delay
Pad extra delay (if Req sets before Logic stabilizes) for mimicking completion detection
Time redundancy: timing margins are required
Potential advantages
(Top ten lists at Async94, from Al Davis)
No clock distribution
Average case performance
Low power consumption
Robustness to parameter variations
Modular composition
Metastability either does not occur or has time to resolve
Global synchrony does not exist anyway (hardware/software codesign)
Robustness
[Figure: plot of MOperations/sec versus Voltage (V), with curves at 77K and 300K; Alain Martin 1989, Christian Nielsen 1994]
Operating range: 5 op/s under 0.5V and 300K <-> 19 Mop/s under 11V and 77K
Problems
Have you ever seen asynchronous circuits?
Area overhead
Testability?
How to design them?
A systematic design methodology is needed
Part 1: Basics
Example: C-element design
Completion detection
Design styles
 - delay models
 - signalling methods
 - models of environment
Hazards and races: review
Asynchronous data path
Synchronization of parallel processes
[Figure: processes A (ai, ao) and B (bi, bo) feed process C (ci, co) through a synchronizing element marked "?"]
After A and B do C.
Both rising and falling transitions have to be synchronized:
 if a = b = 1 then c := 1
 if a = b = 0 then c := 0
 otherwise c := c
An AND-gate synchronizes only rising transitions
Synchronization can be done with a C-element
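
The synchronization rule on this slide can be sketched in a few lines of Python. This is only a behavioural model under the assumption of zero delays; the function names `c_element` and `and_gate` and the input sequence are illustrative and do not come from the tutorial.

```python
def c_element(a: int, b: int, c_prev: int) -> int:
    """Behavioural model of a C-element: follow the inputs when they agree, else hold."""
    if a == b:
        return a       # a = b = 1 sets c, a = b = 0 resets c
    return c_prev      # inputs disagree: keep the previous output


def and_gate(a: int, b: int) -> int:
    return a & b


# Apply the same (hypothetical) input sequence to both elements.
inputs = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
c = 0
c_trace, and_trace = [], []
for a, b in inputs:
    c = c_element(a, b, c)
    c_trace.append(c)
    and_trace.append(and_gate(a, b))

print(c_trace)    # C-element waits for both falling transitions
print(and_trace)  # AND-gate drops as soon as one input falls
```

In this model the AND-gate output falls as soon as one input falls, while the C-element output falls only after both inputs have fallen, which is the difference the slide points out.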
C-element on basic gates
[Figure: Karnaugh map of c over inputs a, b and feedback c, next to a gate-level implementation with gates 1 to 4]
Is this implementation correct?
Delay assumptions
[Figure: the gate-level C-element (gates 1 to 4, nodes n1 to n4, output c) connected to its environment, with delay1 and delay2 marked]
Delay 1 is a propagation delay from node 1 via the environment to node 3
Delay 2 is a propagation delay from node 1 via internal feedback to node 3
Delay 1 must be greater than Delay 2
Delay assumptions influence the correctness of asynchronous design
CMOS implementation of a C-element
[Figure: three transistor-level schematics with inputs a, b and output c: Static, Dynamic, and Quasi-static (with a weak feedback inverter)]
Are these implementations correct?
Careful design is required (layout, transistor sizing, charge sharing)
RS-latch with completion detection
[Figure: RS-latch with inputs S, R, outputs Q, Q' and a Completion output, shown as a circuit and as a symbol]
Completion = 0 <=> <S, R> = <Q, Q'>
[Gillies, Swartwout '61 (in Miller '65), Martin, van Berkel 80's]
Dual-rail four-phase encoding
[Figure: timing of Data (Valid, Empty) and Ack over one cycle, with transitions numbered 1 to 4 and split into Send and Reset phases]
X      X.t  X.f
True   1    0
False  0    1
Empty  0    0
Valid = {True, False}
Completion detection
[Figure: two completion detectors for an n-bit dual-rail word X (rails x1.t, x1.f, ..., xn.t, xn.f) producing completion(X): one with an n-input C-element (Cn), one built from two-input C-elements (C2)]
Completion detectors for dual-rail four-phase code [Armstrong et al. 69], [Varshavsky 75]
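
A minimal behavioural sketch of completion detection for a dual-rail four-phase word, assuming the common arrangement of one OR per bit (x.t OR x.f) followed by an n-input C-element. The names `encode`, `completion` and `EMPTY` are illustrative, and the exact gate structure in the cited detectors may differ.

```python
from typing import List, Tuple

Rail = Tuple[int, int]      # (x.t, x.f)
EMPTY: Rail = (0, 0)


def encode(bit: int) -> Rail:
    """Dual-rail code word for a Boolean value: True = (1, 0), False = (0, 1)."""
    return (1, 0) if bit else (0, 1)


def c_element(inputs: List[int], prev: int) -> int:
    """n-input C-element: 1 when all inputs are 1, 0 when all are 0, else hold."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev


def completion(word: List[Rail], prev: int) -> int:
    """completion(X): per-bit validity (t OR f) combined by a C-element."""
    return c_element([t | f for t, f in word], prev)


# One four-phase cycle on a 2-bit word: empty, partly valid, valid, partly empty, empty.
done = 0
trace = []
for word in ([EMPTY, EMPTY], [encode(1), EMPTY], [encode(1), encode(0)],
             [EMPTY, encode(0)], [EMPTY, EMPTY]):
    done = completion(word, done)
    trace.append(done)
print(trace)
```

The output rises only when every bit is valid and falls only when every bit is empty again, so skew between bits is tolerated.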
Completion detection vs delay padding
Completion detection
 - Completion detection is well developed
 - Requires an area overhead and an extra delay. Not practical for wide data paths
Bundled data
 - Padding delay makes it possible to mimic completion detection
 - Data path is synchronous
 - Requires timing redundancy for safe design
What influences the design methodology
Delay model
Signalling model
Environment model
Specification language
Delay models
Place: gates, wires, gates + wires
Timing properties: bounded and unbounded
 - Bounded = lower and upper bounds for circuit components are known
 - Unbounded = delays are unknown but finite
Memory properties: inertial and pure
 - Apply a train of short pulses to the input of a delay element before the output of the delay has changed its value.
 - Inertial = all pulses are filtered out
 - Pure = no pulses are filtered out
Intermediate models are possible
Gate vs wire delay models
Gate delay model = delays are in gates, no delays in wires
More realistic version: gates may have arbitrary delays, wires after a fork may have delays less than one gate delay
Wire delay model = gate + wire delay model: wires (and gates) may have arbitrary delays
Delay models and correctness
[Figure: logic view and transistor view of a C-element with inputs a, b, output c and wire delays d1 to d5]
This design is correct for the unbounded gate delay model
This design is incorrect for the unbounded wire delay model
If d1 > d3 + d5 or d2 > d3 + d4 then the implementation does not satisfy the specification
Correctness of this design under the bounded delay model depends on delay constraints.
Design styles for asynchronous control logic
Bounded delays: realistic delay model for gates and wires.
 - Technology mapping is easy.
 - Verification is difficult since time must be included in the model.
Speed-independent: pessimistic delay model for gates (unbounded) and often acceptable delay model for wires (delays after a fork are less than one gate delay).
 - Technology mapping is more difficult.
 - Verification is easy since time is excluded from the model.
Delay-insensitive: pessimistic delay model for gates and wires (both are unbounded).
 - The class of such circuits (built out of basic gates) is almost empty.
Quasi-delay-insensitive = delay-insensitive except critical wire forks, called isochronic forks.
 - In practice this is the same as speed-independent.
Signalling methods
Two-phase: rising and falling signal transitions are equivalent
Four-phase: rising and falling signal transitions are not equivalent
Two-phase signalling
[Figure: Sender register, Logic and Receiver register connected by Req, Ack and Data; timing diagram of Req, Ack and Data over two cycles with transitions numbered 1 to 3]
Two transitions on the control lines during one cycle
acknowledge => data change => request
Four-phase signalling
[Figure: Sender register, Logic and Receiver register connected by Req, Ack and Data; timing diagram of Req, Ack and Data over one cycle with transitions numbered 1 to 4 and split into Send and Reset phases]
Four transitions on the control lines during one cycle
Control is simpler
Reset phase: change data and reset control
Models of the environment
Asynchronous circuits operate in an environment.
Slow enough environment = Fundamental mode: inputs can change only after the asynchronous system has settled into a stable state
Reactive environment = Input/output mode: inputs can change as soon as the first output signal has changed due to the previous input change
Constraints on input changes for fundamental mode and I/O-mode:
 - Single input change (SIC): too slow
 - Multiple input change (MIC): difficult to design using classical asynchronous FSM techniques
 - Unrestricted input change (UIC): very difficult to design using classical asynchronous FSM techniques
Fundamental mode constraints are similar to setup and hold restrictions in synchronous design.
Fundamental mode is often too restrictive. Example: any pipeline operates in I/O-mode
Environment and correctness
[Figure: the gate-level C-element (gates 1 to 4, nodes n1 to n4, output c) connected to its environment, with delay1 and delay2 marked]
For fundamental mode this design is correct (necessary delay constraints hold by definition)
For I/O-mode this design may be incorrect unless "Delay 1 > Delay 2" type constraints hold
Assumptions on the environment influence the correctness of asynchronous design
Goals of tutorial
Will present a design methodology for
 - I/O-mode (most general)
 - Bounded and unbounded delay models (each of them may be useful for certain designs)
 - Four-phase signalling (more efficient and faster control). Two-phase signalling can also be handled.
Will not present
 - Specific methods for Fundamental mode
 - Specific methods for two-phase signalling
 - All the methods that have been devised in history ...
Hazards
[Figure: Karnaugh map of c over a, b with a transition cube, and an XOR implementation (gates 1, 2) where a slow path turns the output 1 -> 1 into 1 -> 0 -> 1]
Hazard example: static function 1-hazard for c = a xor b
Hazard: signal transition not specified by the designer
Hazards make asynchronous design difficult
Theory of hazards: [Unger, Kung, Nowick]
Hazards and delay model
[Figure: the same circuit annotated with numeric delays (5, 10, 20) under the two delay models]
Inertial gate delay model => no hazards
Wire delay model => hazards
Hazard analysis depends on the delay model
Taxonomy of hazards
[Figure: waveforms of static hazards and dynamic hazards]
Signal behavior: static (no transition is required) or dynamic (monotonous transition is required)
Circuit type: combinational (circuits without feedback) or sequential (circuits with feedback)
Due to specification or due to implementation: function or logic, essential or not essential
Due to the I/O mode of operation: delay hazards occur when two input transitions are too close together
Combinational hazards
Function hazards: inherent in the specification of the logic function.
 - Condition for 1-hazards: a transition cube contains 0
 - Cannot be eliminated by changing the logic
Logic hazards: depend on the particular implementation of the logic function
 - Can be eliminated by changing the logic (or sometimes the delays)
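
The function-hazard condition above can be checked mechanically. The sketch below enumerates the transition cube (every input vector reachable when the changing inputs switch in any order) and reports a static function hazard when the function is not constant inside the cube. The helper names are illustrative, not from the tutorial.

```python
from itertools import product


def transition_cube(start, end):
    """All input vectors between start and end: changing inputs are free, others fixed."""
    choices = [(s,) if s == e else (0, 1) for s, e in zip(start, end)]
    return list(product(*choices))


def has_static_function_hazard(f, start, end):
    """True if f(start) == f(end) but f takes the other value inside the cube."""
    if f(*start) != f(*end):
        return False            # the output must change: not a static case
    return any(f(*p) != f(*start) for p in transition_cube(start, end))


xor = lambda a, b: a ^ b
or2 = lambda a, b: a | b

print(has_static_function_hazard(xor, (1, 0), (0, 1)))  # both inputs change
print(has_static_function_hazard(or2, (1, 0), (1, 1)))  # single input change
```

For c = a xor b, the double input change (1,0) -> (0,1) passes through points where the function is 0, so no choice of logic can remove the hazard.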
Static logic hazard elimination
[Figure: two Karnaugh maps over ab and c with their two-level implementations: the original cover, and the cover extended with a redundant cube]
Static hazard: a signal which should remain constant changes
Static logic 1-hazard elimination condition: each transition cube is contained inside a cube of the cover
Static logic 0-hazards never occur in a two-level sum-of-products implementation
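
The containment condition for static logic 1-hazards can be sketched as follows. Cubes are written as strings over '0', '1' and '-' (one character per variable, here in the assumed order a, b, c); the example cover ab + a'c and its consensus term bc are a textbook illustration, not the cover drawn on the slide.

```python
def contains(cover_cube: str, trans_cube: str) -> bool:
    """True if every point of trans_cube lies inside cover_cube."""
    return all(c == "-" or c == t for c, t in zip(cover_cube, trans_cube))


def static_1_hazard_free(cover, trans_cube: str) -> bool:
    """Condition from the slide: the transition cube sits inside one cube of the cover.

    Assumes the function is 1 on the whole transition cube (no function hazard).
    """
    return any(contains(c, trans_cube) for c in cover)


cover = ["11-", "0-1"]            # f = a b + a' c
transition = "-11"                # a changes while b = c = 1
print(static_1_hazard_free(cover, transition))
print(static_1_hazard_free(cover + ["-11"], transition))  # add redundant cube b c
```

Adding the redundant cube keeps one AND gate at 1 for the whole transition, which is why the extended cover is hazard-free for it.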
Dynamic function hazards
[Figure: Karnaugh map over ab and cd with a transition cube in which the function changes non-monotonously]
Dynamic hazard: a signal which should change once changes multiple times
Dynamic function hazard: non-monotonous change of the function inside the transition cube
Dynamic logic hazard elimination
[Figure: two Karnaugh maps over ab and cd with their gate-level implementations; slow and very slow paths produce the waveforms 010 and 1010]
Dynamic logic 1-to-0 hazard elimination condition: if the transition cube intersects cube c of the cover, then c contains the start point of the transition cube
0-to-1 hazards are symmetrical (the same for the end point)
Sequential hazards
Sequential circuits = circuits with feedback
Sequential hazards occur because feedback propagates new values before the input net completes signal propagation to all inputs of the next-state logic
Essential hazard: inherent in the FSM specification.
 - cannot be eliminated by another state encoding => cannot be eliminated for unbounded delays
 - can be masked out by controlling the delays
Non-essential hazards (races): can be eliminated by a proper state encoding
Non-critical races
[Figure: state diagram with states s1, s2 and its encoded version showing the transient states]
Race: more than one state signal changes
Non-critical race: all transient states drive the FSM into the same final state
Critical races
[Figure: state diagram with states s1 to s4 and its encoded version; transient states are shared between two transitions]
Critical race: different transient states drive the FSM into different final states. The final state depends on the race winner
Race-free encoding
[Figure: re-encoded state graph without shared transient states]
Tracey encoding: two state transitions under the same input, s1 -> s2 and s3 -> s4 (both under b), must be distinguished by a stable state signal: 011 -> 000 and 110 -> 100 (both under input 10)
Types of race-free encoding
Single bit change encoding: simple, bad throughput
 - adjacent states differ by exactly one bit
 - may require a few FSM cycles to reach a stable state
Single transition time encoding: better throughput, more complex
 - adjacent states may differ by multiple bits
 - requires exactly one FSM cycle to reach a stable state
 - Example: Tracey encoding
Race-free FSMs may still have essential hazards
Mod-2 counter: essential hazards
[Figure: state diagram of a mod-2 counter with arcs labelled input/output (0/0, 1/0, 1/1) and transitions 1 to 3, next to an implementation with input x, state signals y1, y2, output z and a very slow path]
Unger's condition for essential hazards: the state after three transitions of an input signal ≠ the state after one transition of the same input signal
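
Unger's condition can be tested directly on a flow table. The sketch below assumes a single binary input x and a table that maps (state, x) to the next state, with a stable state mapping to itself. The four-state counter table and the state names s0 to s3 are hypothetical stand-ins for the mod-2 counter on the slide.

```python
def settle(table, state, x):
    """Follow the flow table under a fixed input until a stable state is reached."""
    while table[(state, x)] != state:
        state = table[(state, x)]
    return state


def has_essential_hazard(table, state, x):
    """Compare the state after one and after three changes of input x."""
    one = settle(table, state, 1 - x)
    two = settle(table, one, x)
    three = settle(table, two, 1 - x)
    return one != three


# Hypothetical mod-2 counter: every rising and falling edge of x advances the state.
counter = {
    ("s0", 0): "s0", ("s0", 1): "s1",
    ("s1", 1): "s1", ("s1", 0): "s2",
    ("s2", 0): "s2", ("s2", 1): "s3",
    ("s3", 1): "s3", ("s3", 0): "s0",
}
# Hypothetical two-state follower: the state simply tracks x.
follower = {
    ("s0", 0): "s0", ("s0", 1): "s1",
    ("s1", 1): "s1", ("s1", 0): "s0",
}

print(has_essential_hazard(counter, "s0", 0))
print(has_essential_hazard(follower, "s0", 0))
```

The counter ends in a different state after three input changes than after one, so it has an essential hazard; the follower does not.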
Why essential hazards occur
[Figure: input net reaching the next-state logic through delay1 and delay2; the State is fed back through delay_next and delay_feedback]
No essential hazards if and only if
$|delay_1 - delay_2| < \min(delay_{next}) + \min(delay_{feedback})$
Essential hazards can be removed by controlling the delays
Delay hazards
[Figure: the gate-level C-element (gates 1 to 4, nodes n1 to n4, output c) with its environment and one slow gate]
I/O-mode: input b changes after output c, but before the output of gate 3 has settled. This is a delay hazard
Delay hazards may occur under I/O-mode even in designs free from essential hazards
Hazard elimination methods
Changing delays (delay padding). Works only for the bounded delay model
Logic transformations (circuit redundancy). Works for both bounded and unbounded delay models
Encoding of inputs and outputs (code redundancy) and/or changing allowed input transitions. Works for both bounded and unbounded delay models
Fighting hazards always requires some area and time overhead
Masking hazards by delay padding (1)
[Figure: circuit annotated with delays 5, 10 and 20, before and after padding]
Controlling delays can eliminate hazards
Masking hazards by delay padding (2)
[Figure: Karnaugh map over a, b with two opposite transitions; one transition needs gate 1 fast and gate 2 slow, the other needs the reverse]
Contradictory delay constraints => controlling delays cannot eliminate hazards in general
Hazard elimination by redundant encoding
[Figure: single-rail XOR with a function hazard (1 -> 0 -> 1 on the output), and a Karnaugh map of c.t over b.t b.f and a.t a.f with don't-care (*) entries for unused codes]
Function hazards cannot be eliminated by changing the logic => change the specification
Hazards depend on the allowed input transitions: an XOR-gate is hazard-free under single input changes in the fundamental mode
Redundant encoding of input and output signals and a four-phase signalling protocol make it possible to eliminate even function hazards
[Figure: dual-rail four-phase XOR-gate with inputs a.t, a.f, b.t, b.f and outputs c.t, c.f]
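
A behavioural sketch of a dual-rail XOR, assuming the usual sum-of-products equations c.t = a.t·b.f + a.f·b.t and c.f = a.t·b.t + a.f·b.f. The function names are illustrative, and this zero-delay model says nothing about the transistor-level structure of the gate on the slide.

```python
EMPTY = (0, 0)


def dr(bit):
    """Dual-rail code word (x.t, x.f) for a Boolean value."""
    return (1, 0) if bit else (0, 1)


def xor_dual_rail(a, b):
    """Dual-rail XOR: the output is valid only when both inputs are valid."""
    at, af = a
    bt, bf = b
    ct = (at & bf) | (af & bt)
    cf = (at & bt) | (af & bf)
    return (ct, cf)


# Valid phase: every pair of valid inputs gives the code word of a xor b.
for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor_dual_rail(dr(x), dr(y)))

# While only one input has arrived, the output stays Empty.
print(xor_dual_rail(dr(1), EMPTY))
```

Because each phase only moves rails in one direction (Empty to Valid, then Valid to Empty), the output changes monotonically regardless of the order in which the inputs arrive.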
Modern view on hazards and races
The classical hazard/race taxonomy is complex for sequential circuits even in Fundamental Mode. It is much more complex for the I/O-mode:
 - Hazards may be viewed as signal transition disabling
 - Critical races may be viewed as incomplete (ambiguous) specifications
Hazard avoidance: design a circuit in which transition disabling cannot occur
Critical race avoidance: make the specification complete and unambiguous
Speed-independent design
[Figure: gate behaviour over time through the phases Stable, Enabling (input changes), Enabled (0*), Switching (output changes) and Stable again, with Disabling as the alternative to Switching]
For the unbounded gate delay model: disabling cannot occur => no hazards
No need for a complex theory of hazards for this delay model
Critical races as incomplete state coding
[Figure: encoded state graph in which code 010 occurs twice: in one occurrence y2 must change, in the other y2 must stay unchanged (Conflict)]
FSM spec is ambiguous, i.e. incomplete
State assignment conflict:
<y1 y2 y3>  next state y2'
010         0
010         1
...         ...
Complete state coding
[Figure: the state graph re-encoded with four state signals (codes 0010, 0000, 0110, 0100, 0101, 1101)]
Solving state assignment conflicts:
 - Add new state signals, or
 - Remove one of the states (by re-encoding, reducing concurrency)
Data path design
Two ways to design a data path:
 - Synchronous data path: bundled data
 - Asynchronous data path (speed-independent = quasi-delay-insensitive)
An asynchronous data path is useful if performance is data dependent and the average-case computation delay is much less than the worst-case delay
Two efficient methods for asynchronous data paths: reduced direct logic (RDL) and differential cascode voltage switch logic (DCVSL)
Indication
[Figure: OR-gate and C-element with inputs a, b and output c, with waveforms]
OR-gate: output indicates falling input transitions
C-element: output indicates falling and rising input transitions
Indication: A. Taubin (1980)
Transition c indicates transition a if and only if c cannot occur without a; c is a "completion detector" for a.
Design idea:
 - All input and internal transitions must be indicated by some output transitions
 - Choose indicating outputs to optimize performance/area
Differential Cascode Voltage Switch Logic
[Figure: N-transistor network driven by Inputs, with precharge transistors controlled by Req and outputs Output.t and Output.f]
Inputs are (typically) dual-rail
An additional request signal is used for precharging
Outputs are dual-rail four-phase (valid-empty protocol)
Staticizers (cross-coupled inverters) may be needed for output signals
Two cross-coupled pull-up transistors may improve performance
Reduced direct logic
[Figure: N-transistor network driven by Inputs, with p-transistors controlled by Input1.t and Input1.f and outputs Output.t and Output.f]
Inputs and outputs are four-phase dual-rail
No additional request signal
Full custom design only
Design idea:
 - Add a pair of p-transistors for each output and a pair of n-transistors for each dual-rail input
 - Optimize n-transistor logic
 - Distribute indication of the inputs among the outputs
Asynchronous ripple-carry adder
[Figure: transistor-level full-adder cell with dual-rail inputs a.t, a.f, b.t, b.f, c.t, c.f and dual-rail outputs sum.t, sum.f, cout.t, cout.f]
A. Martin (1991)
Asynchronous cell transistor count = 34
Synchronous cell transistor count (without pass transistors) = 28
Inputs and outputs are four-phase dual-rail
Critical delay in asynchronous adder
[Figure: example addition showing the critical delay path and the indication in the valid phase and in the empty phase]
Critical delay is proportional to log2 N on average (N = number of bits)
It is determined by the maximal number of contiguous different bits
32-bit adder delay (1.6 µm MOSIS CMOS):
 - asynchronous: 11 nsec
 - synchronous: 40 nsec
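
The data-dependent part of the delay can be illustrated by counting the longest run of bit positions where the two operands differ, since those are the positions through which a carry has to propagate. This is a sketch of that count only, under the assumption that "contiguous different bits" refers to positions with a_i != b_i; the function name and operands are illustrative.

```python
def longest_propagate_run(a: int, b: int, width: int) -> int:
    """Length of the longest run of adjacent bit positions where a and b differ."""
    diff = (a ^ b) & ((1 << width) - 1)
    best = run = 0
    for i in range(width):
        if (diff >> i) & 1:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


print(longest_propagate_run(0b1011, 0b0001, 4))   # isolated differing bits
print(longest_propagate_run(0b1111, 0b0000, 4))   # worst case: carry may ripple through all bits
```

A self-timed adder finishes as soon as its longest actual carry chain has settled, while a clocked adder must always allow for the full-width chain.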
Part 2: Specification and synthesis
Specification of control
 - CSP
 - FSM
 - Signal Transition Graphs (STG)
STG in detail: basics, limitations and extensions, timing
Synthesis overview
Simulation engine
Complete state coding = race elimination
CSP
CSP = communicating sequential processes.
Concurrent processes communicate by input and output commands on channels.
Semantics are based on trace theory [Udding, van de Snepscheut, Rem].
In general, techniques try to come up with quasi-delay-insensitive implementations.
Two main approaches: production-rule based [Martin, ...] and syntax-directed decomposition [Ebergen, Berkel]
CSP PROGRAM
*[[li]; ro↑; [ri]; ro↓; [¬ri]; lo↑; [¬li]; lo↓]
";" = sequencing. ro↑ = ro goes high, ro↓ = ro goes low.
[li] = wait until li is high, [¬li] = wait until li is low.
Program meaning: li has to be high before ro goes high. The circuit will wait until ri goes high before ro goes low again, and so on.
PRODUCTION RULES
li ↦ ro↑
ri ↦ ro↓
¬ri ↦ lo↑
¬li ↦ lo↓
production rule = boolean condition ↦ transition.
li ↦ ro↑: when li is high, ro will go high.
¬ri ↦ lo↑: when ri is low, lo will go high.
conflict: ro↑ and ro↓ are not mutually exclusive.
PRODUCTION RULES(2)
Add a state signal to solve the conflict:
*[[li]; ro↑; [ri]; x↑; [x]; ro↓; [¬ri]; lo↑; [¬li]; x↓; [¬x]; lo↓]
The new production rules:
¬x ∧ li ↦ ro↑
ri ↦ x↑
x ↦ ro↓
x ∧ ¬ri ↦ lo↑
¬li ↦ x↓
¬x ↦ lo↓
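
The six production rules with the state signal x can be exercised with a small simulator. The sketch below fires one enabled rule at a time and records the resulting transitions. The environment behaviour (li follows the complement of lo, ri follows ro) is an assumption made here to close the loop; it is not part of the slide.

```python
def next_transition(s):
    """Return (signal, value) for the first rule that is enabled and would change s."""
    li, ri, ro, lo, x = s["li"], s["ri"], s["ro"], s["lo"], s["x"]
    rules = [
        # circuit: production rules from the slide
        (not x and li, "ro", 1),
        (ri, "x", 1),
        (x, "ro", 0),
        (x and not ri, "lo", 1),
        (not li, "x", 0),
        (not x, "lo", 0),
        # environment: assumed four-phase partners on both sides
        (not lo, "li", 1),
        (lo, "li", 0),
        (ro, "ri", 1),
        (not ro, "ri", 0),
    ]
    for guard, sig, val in rules:
        if guard and s[sig] != val:
            return sig, val
    return None


state = {"li": 0, "ri": 0, "ro": 0, "lo": 0, "x": 0}
trace = []
for _ in range(10):                      # one full cycle has ten transitions
    sig, val = next_transition(state)
    state[sig] = val
    trace.append(sig + ("+" if val else "-"))
print(" ".join(trace))
```

With this environment exactly one transition is enabled at each step, and the trace follows the order of the handshaking expansion: ro rises and falls before lo rises and falls, with x separating the two halves.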
IMPLEMENTATION
[Figure: circuit with inputs li, ri, outputs ro, lo and a flip-flop (ff) holding the state signal x]
The production rules can now be identified with operations.
ro maps to an and-gate (ro = ¬x · li):
¬x ∧ li ↦ ro↑
x ↦ ro↓
FSM
[Figure: state diagram with states s1 (y1y2 = 00), s2 (10) and s3 (11); arcs are labelled with an input condition over x and C and the output o]
FSM(2)
Nodes = states, arcs = state transitions
Each arc is labeled with a pair of input/output symbols.
 - input = condition for the state transition to take place.
 - output = values for output signals after the state transition.
Mostly limited to fundamental mode [Unger].
BURST MODE FSM
[Figure: burst-mode state diagram with states s1 to s4 and arcs labelled a+/y+, b-/y+, b+/x+y-, b+/y- and a-b-/x-]
BURST MODE FSM
Based on signal transitions rather than signal levels [Nowick, Yun].
First a set of input transitions fires.
Second a set of output transitions fires.
Finally state variables change and take the system to another state.
Mostly limited to fundamental mode [Unger].
SIGNAL TRANSITION GRAPH
[Figure: timing diagram of inputs a, b and output c, next to the equivalent STG with transitions a+, b+, c+, a-, b-, c-]
SIGNAL TRANSITION GRAPH(2)
Vertices represent signal transitions.
Arcs represent causal relations between transitions.
Introduced by [Molnar 85], [Chu 87], [Rosenblum 85]
SIGNAL TRANSITION GRAPH (3)
[Figure: the STG with transitions a+, b+, c+, a-, b-, c- and the corresponding Petri net with places p1 to p8]
STG is an interpreted Petri net.
CONSISTENCY
[Figure: examples of consistent and inconsistent STG fragments on signal A]
Consistent: two up-transitions with a down-transition in between (and vice versa)
Inconsistent:
 - Two subsequent up-transitions without a down-transition in between
 - Two subsequent down-transitions without an up-transition in between
 - Parallel transitions of the same signal
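
The alternation rule can be checked along a single firing sequence with a few lines of Python. This is a sketch only: full consistency requires the rule to hold along every firing sequence of the STG, and the trace format (signal name followed by '+' or '-') is an assumption made for the example.

```python
def is_consistent(trace, initial):
    """Check that rising and falling transitions of each signal alternate in one trace."""
    level = dict(initial)
    for t in trace:
        sig, sign = t[:-1], t[-1]
        new = 1 if sign == "+" else 0
        if level[sig] == new:
            return False        # e.g. A+ while A is already high
        level[sig] = new
    return True


low = {"a": 0, "b": 0, "c": 0}
print(is_consistent(["a+", "b+", "c+", "a-", "b-", "c-"], low))
print(is_consistent(["a+", "a+"], low))
```

The first trace alternates correctly for every signal; the second contains two subsequent up-transitions of the same signal and is rejected.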
MODELING POWER
[Figure: STG fragments for concurrency (c+ followed by a- and b- in parallel) and for choice (a place with alternative successors a- and b-)]
CONCURRENCY: a- concurrent with b-
 - EITHER a- fires before b-
 - OR b- fires before a-
 - OR a- and b- fire simultaneously
CHOICE: EITHER a- fires OR b- fires but not both
 - choice made by environment
FIRING SEMANTICS
[Figure: legend (token, place, transition, enabled transition) and a Petri net fragment before and after c+ fires: tokens in p3 and p4 enable c+; after firing, the tokens are in p5 and p6]
SYNTHESIS
Signal Transition Graph -> (Translation) -> Petri Net -> (Token flow simulation) -> Reachability Graph -> (Encoding) -> State Graph -> Karnaugh Map -> (Logic synthesis) -> Logic equations
The figure also marks a "direct method" path.
REACHABILITY GRAPH
Vertices represent states the circuit can be in.
Arcs represent transitions from one state to another.
Can be derived from the Petri net by simulating the token flow.
Easiest representation to do analysis on.
Size may be exponential in the number of transitions.
Similar to trace structures [Dill], transition diagrams [Muller].
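
Token-flow simulation can be sketched as a breadth-first search over markings. The net below is an assumed reading of the Petri net figure for the a, b, c example (places p1 to p8), and markings are modelled as sets of places, which presumes a safe net with at most one token per place.

```python
from collections import deque

# transition -> (input places, output places); structure assumed from the figure
NET = {
    "a+": ({"p1"}, {"p3"}),
    "b+": ({"p2"}, {"p4"}),
    "c+": ({"p3", "p4"}, {"p5", "p6"}),
    "a-": ({"p5"}, {"p7"}),
    "b-": ({"p6"}, {"p8"}),
    "c-": ({"p7", "p8"}, {"p1", "p2"}),
}


def reachability_graph(net, initial):
    """Map every reachable marking to {fired transition: successor marking}."""
    start = frozenset(initial)
    arcs = {}
    queue = deque([start])
    while queue:
        m = queue.popleft()
        if m in arcs:
            continue
        arcs[m] = {}
        for t, (pre, post) in net.items():
            if pre <= m:                          # all input places hold a token
                nxt = frozenset((m - pre) | post)  # move the tokens
                arcs[m][t] = nxt
                queue.append(nxt)
    return arcs


graph = reachability_graph(NET, {"p1", "p2"})
print(len(graph))                                  # number of reachable markings
print(sorted(graph[frozenset({"p1", "p2"})]))      # transitions enabled initially
```

For this small net the graph stays tiny, but each additional pair of concurrent transitions can double the number of markings, which is the exponential growth mentioned on the slide.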
REACHABILITY GRAPH(2)
[Figure: token flow starting from marking {p1,p2}: firing a+ and b+ in either order leads to marking {p3,p4}, where c+ becomes enabled]
STATE GRAPH
[Figure: ENCODING maps the reachability graph with markings {p1,p2}, {p2,p3}, {p1,p4}, {p3,p4}, {p5,p6}, {p6,p7}, {p5,p8}, {p7,p8} onto the state graph with codes 000, 100, 010, 110, 111, 011, 101, 001; arcs are labelled a+, b+, c+, a-, b-, c-]
ENCODING
[Figure: state 000 = signals a and b low; after a+, state 100 = signal a high, signal b low; after b+, state 010 = signal a low, signal b high]
Every state is assigned a code.
This code is determined by signals in the Petri net.
Based on this code, logical equations can be derived.
EQUATIONS
[Figure: state graph with codes 000, 100, 010, 110, 111, 011, 101, 001 and arcs a+, b+, c+, a-, b-, c-]
K-MAP (next value of c)
c \ ab  00  01  11  10
0       0   0   1   0
1       0   1   1   1
LOGIC EQUATIONS
c = ab + ac + bc
EQUATIONS(2)
[Figure: the first states of the state graph (000, 100, 010, 110) and the corresponding entries of the K-map for c]
State 000: c is assigned 0 because c stays low in this state.
State 110 (a = b = 1, c = 0): c is assigned 1 because c goes high in this state (c+ is enabled).
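
The rule for filling the K-map can be replayed in code: in each state, the entry for c is 1 if c+ is enabled, 0 if c- is enabled, and the current value of c otherwise. The sketch below encodes the state graph of the a, b, c example (code order a, b, c, as read from the figures) and compares the derived entries with c = ab + ac + bc. The dictionary layout and helper names are illustrative.

```python
# state code "abc" -> {enabled transition: successor code}
ARCS = {
    "000": {"a+": "100", "b+": "010"},
    "100": {"b+": "110"},
    "010": {"a+": "110"},
    "110": {"c+": "111"},
    "111": {"a-": "011", "b-": "101"},
    "011": {"b-": "001"},
    "101": {"a-": "001"},
    "001": {"c-": "000"},
}


def next_value(code: str, signal: str, index: int) -> int:
    """Value the logic for `signal` must produce in this state."""
    enabled = ARCS[code]
    if signal + "+" in enabled:
        return 1
    if signal + "-" in enabled:
        return 0
    return int(code[index])       # no transition enabled: hold the current value


def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)


table = {code: next_value(code, "c", 2) for code in ARCS}
print(table)
print(all(table[code] == majority(*map(int, code)) for code in ARCS))
```

The derived table matches the majority function on every reachable state, which is the equation of a C-element.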
LIMITATIONS
The advantages of STG.
 - Concurrency of signal transitions.
 - Conditional behavior based on signal transitions.
The disadvantages of STG.
 - No conditional behavior based on signal levels.
 - No timing information, just relative ordering of events.
 - No don't care behavior.
 - No or-node behavior.
LEVELS AND TRANSITIONS
[Figure: four pairs of CK and SDA waveforms: 0 bit, 1 bit, stop command, start command]
Protocol distinguishing different situations.
STG WITH EXTENSIONS
[Figure: STG fragments on CK and SDA illustrating the extensions: condition on a signal level, condition on a transition, and don't care transitions (SDA#, SDAs)]
Extensions proposed by [Moon et al]
OR PRECEDENCE
[Figure: two STG fragments with a+ and b+ preceding c+: AND precedence and OR precedence]
AND: c goes high after a and b have gone high
OR: c goes high after a OR b has gone high. NO CHOICE!!
This behavior is hard to model with classical Petri nets.
DON'T CARE & UNDEFINED BEHAVIOR
[Figure: (a) STG fragment with CK+, I#, Is, CK-; (b) waveforms of CK and I]
I#: signal I becomes unstable.
Is: signal I becomes stable. The level is unknown.
LEVEL TRANSITIONS
[Figure: STG with pulses a+, a- and b+, b- driving out to 0 and 1, and waveforms of a, b and out]
Pulse on a sets out to 0, pulse on b sets out to 1.
Pulses may come in any order.
BOOLEAN GUARDS
X1, X2 are Boolean characteristic functions.
Functions determine (together with tokens) which transition is enabled.
[Figure: place p with two output transitions: t1 guarded by X1 (i = 0) and t2 guarded by X2 (i = 1)]
t1 is enabled if p contains a token AND signal i = 0.
t2 is enabled if p contains a token AND signal i = 1.
TRANSLATION TO STATE GRAPH
[Figure: translation rules from extended STG transitions on signal si to state graph arcs between markings m, m' and m'', with the value of the signal in each marking, e.g. m(i) = 0 and m'(i) = 1 for a rising transition]
Guards: if m(i) == 0 then t1 is enabled; if m(i) == 1 then t2 is enabled
TIMING: SPECIFICATION
[Figure: timing diagram of A, B and UP: B rises [20-40] after A; UP stays high for > 30]
Environment: if A goes high, B will go high 20 to 40 ns later.
Circuit: if A goes high, UP needs to go high before B and needs to stay high for at least 30 ns.
TIMING: SPECIFICATION(2)
[Figure: the same timing diagram annotated: [20-40] between A and B is a delay relation with tolerance; > 30 on UP is a specification relation]
TIMING: SPECIFICATION(3)
Specification relations indicate how the circuit should react, if synthesized correctly.
Delay relations indicate how an existing circuit (or environment) reacts.
Tolerance: delays are not fixed numbers, but intervals (even more complex symbolic representation is possible).
TIMING CONSTRAINTS
- Analysis of timing constraints [Rokicki, Hulgaard, Burns, Dill, Berthomieu, Amon, Walkup].
- Take into account analysis results [Myers].
- Active synthesis to satisfy timing constraints [Borriello, Vanbekbergen].
- Still a lot of research needed in this area.
TIMING: ANALYSIS
[Figure: timed event graph over t1 to t5 with delay intervals [135-inf], [30-40], [0-20], [0-inf], [50-80], [50-80]; query: MAX(t4-t5)?]
- Hard problem in its full generality [Berthomieu].
- Efficient solution by Rokicki (but might still be exponential).
- Polynomial solutions with various restrictions [Dill, Amon].
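To give a feel for the restricted (polynomial) end of the spectrum, here is a minimal sketch of interval propagation on an acyclic event graph. It is not any of the cited algorithms. It assumes that every event fires once, after all of its predecessors (max-causality), that source events fire at time 0, and that delays are independent. The resulting separation figure is only a conservative upper bound, because it ignores correlation through shared predecessors. The graph and its numbers are hypothetical and are not the figure from the slide.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]
Interval = Tuple[float, float]


def firing_windows(edges: Dict[Edge, Interval]) -> Dict[str, Interval]:
    """Earliest and latest firing time of each event in an acyclic graph."""
    preds: Dict[str, Dict[str, Interval]] = {}
    for (u, v), bounds in edges.items():
        preds.setdefault(u, {})
        preds.setdefault(v, {})[u] = bounds

    windows: Dict[str, Interval] = {}

    def window(v: str) -> Interval:
        if v not in windows:
            if not preds[v]:
                windows[v] = (0.0, 0.0)  # assumption: sources fire at time 0
            else:
                # An event waits for its slowest predecessor (max-causality).
                earliest = max(window(u)[0] + lo for u, (lo, _) in preds[v].items())
                latest = max(window(u)[1] + hi for u, (_, hi) in preds[v].items())
                windows[v] = (earliest, latest)
        return windows[v]

    for v in preds:
        window(v)
    return windows


def separation_upper_bound(windows: Dict[str, Interval], x: str, y: str) -> float:
    # Conservative bound on t(x) - t(y): latest x minus earliest y.
    return windows[x][1] - windows[y][0]


# Hypothetical example graph, delays in ns.
example = {("t1", "t2"): (50, 80), ("t1", "t5"): (0, 20), ("t2", "t4"): (50, 80)}
w = firing_windows(example)
```

Here `w["t4"]` is the window from 100 to 160 ns and `separation_upper_bound(w, "t4", "t5")` is 160 ns. Exact separation analysis on graphs with cycles or shared delays needs the methods referenced on the slide.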
TIMING: SYNTHESIS
[Figure: two ways to implement the A, B, UP specification ([20-40], >30): trigger to clock edges (CK, 20 ns period marks) and add a delay line (30 ns, with set and reset of UP)]
Solutions with varying restrictions [Borriello, Vanbekbergen].
SYNTHESIS: OVERVIEW
[Figure: synthesis flow, top to bottom: AFSM or STG (VHDL, simulation results), state graph, technology-independent equations, technology-dependent equations (with library, hazard list and tolerance as inputs), correction phase (VHDL, simulation results), layout]
SYNTHESIS: OVERVIEW(2)
- Simulation at high level (specification) and low level (circuit)
- Start from AFSM or STG
- Race elimination by state graph transformation
- First step of hazard elimination (technology independent)
- Second step of hazard elimination (technology dependent)
SYNTHESIS: OVERVIEW(3)
- Map to any library. The library may contain special elements to get better results, but synthesis should not rely on them.
- Library elements need to be hazard-free.
- Tolerance factor indicating how much the delays of cells in the library may vary.
- Back-annotation.
WHY VHDL?
- The need for a consistent language for all digital applications (synchronous and asynchronous).
- Tools (simulators) for free.
- The complete design can be validated at different levels of abstraction.
- Useful for debugging software prototypes ...
SIMULATION ENVIRONMENT
[Figure: boxes labeled STG, SYSTEM, USER, behavioral VHDL or gate-level VHDL, assertion VHDL and testbench VHDL feed the VHDL simulation, which produces results or assertion violations]
ARCHITECTURE
[Figure: the test bench drives the inputs of the behavioral or gate-level VHDL model; outputs and state signals go to the assertion VHDL, which produces error messages]
VHDL MODEL
[Figure: STG fragment with places p1, p2, p3 around transition s+]
State = set of tokens in places.
A vector of Booleans: places
- places(1) = 1 if place p1 contains a token.
- places(1) = 0 if place p1 does not contain a token.
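The marking-as-Boolean-vector idea can be mimicked outside VHDL. The Python sketch below is an illustration only: it assumes a 1-safe net in which s+ consumes tokens from p1 and p2 and produces one in p3 (the exact arcs are not recoverable from the figure), and it keeps the 1-based place numbering of the slide by using a dictionary.

```python
from typing import Dict, Iterable

Marking = Dict[int, bool]  # places[k] is True if place pk contains a token


def is_enabled(places: Marking, inputs: Iterable[int]) -> bool:
    # A transition is enabled when every input place holds a token.
    return all(places[p] for p in inputs)


def fire(places: Marking, inputs: Iterable[int], outputs: Iterable[int]) -> Marking:
    inputs, outputs = list(inputs), list(outputs)
    if not is_enabled(places, inputs):
        raise ValueError("transition is not enabled")
    nxt = dict(places)
    for p in inputs:
        nxt[p] = False   # consume tokens from input places
    for p in outputs:
        nxt[p] = True    # produce tokens in output places
    return nxt


# Assumed structure of s+: inputs p1, p2; output p3.
before = {1: True, 2: True, 3: False}
after = fire(before, inputs=[1, 2], outputs=[3])
```

After firing, `after` is `{1: False, 2: False, 3: True}`; the original marking is left untouched, which makes it easy to compare successive states.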
VHDL MODEL(2)
[Figure: STG fragment with places p1, p2, p3 around transition s+]
BLOCK (places(1) = '1' AND places(2) = '1') BEGIN places(1) ...

[Figure: timing annotations > 90 ns and > 40 ns over signals bcsl, brl, bgninl, bbsyl, asl, borgtl, basl, bwrl, writel, dtackl, bgnoutl]
The VMEbus master interface protocol
Decompose into arbitration and read/write protocol:
- simplify specification and synthesis tasks
- separate the arbitration component (meta-stability)
Both decomposed STGs must agree on the partial order of common transitions ("net contraction", Chu '87, or "hiding", Dill '88).
The arbitration STG
[Figure: arbitration STG with rising and falling transitions of bgninl, bgnoutl, bcsl, bbsyl and bbsyl90 and a dummy transition ε; the initial marking is indicated]
A meta-stability problem
We must latch and wait ... (or use a mutual exclusion element)
[Figure: waveforms of bcsl, bgninl, bgninl_del and bcsl_del; a D flip-flop clocked by CK with bcsl on D and bcsl_del on Q]
(Half of) the read/write STG
[Figure: STG with transitions on basl, bcsl, brl, borgtl, writel, borgtl40, bbsyl, aslin, aslout and dtackl, and dummy transitions ε]
State encoding
[Figure: the read/write STG with the added state signal is0 (transitions is0- and is0+); the initial marking is indicated]
Initial implementation
bgnoutl = bbsyl + bcsl bgnoutl + bgninl
bbsyl = bbsyl90 bgninl + bbsyl bcsl + bbsyl bgninl + bgnoutl
brl = bcsl + borgtl + is0
borgtl = aslin borgtl + bbsyl bcsl writel + borgtl is0
aslout = aslout is0 + basl + borgtl40 + bwrl writel + dtackl
writel = aslout is0 + borgtl + bwrl + dtackl
is0 = aslin borgtl + bbsyl is0 + borgtl is0 + brl is0
Final implementation (XILINX 3000 or SC)
[Figure: gate-level netlist over the signals bbsyl90, bbsyl, bgnoutl, bgninl_del, bcsl_del, is0, borgtl40, bwrl, writel, dtackl, basl, aslin, borgtl, bcsl, aslout and brl]
Hazard example
[Figure: STG fragment with dtackl-, basl+, bwrl+, writel+, aslout+, aslin+, bcsl+, borgtl+, borgtl40+, ε and dtackl+; gate for writel with inputs is0, aslout, borgtl, dtackl and bwrl]
If aslout+ dtackl+ ... then writel may have a 0 → 1 → 0 → 1 hazard.
is0 = 1, aslout = 0 → 1, borgtl = 0 (→ 1), dtackl = 0 → 1, bwrl = 0 (→ 1)
AMULET processor
AMULET is an asynchronous implementation of the ARM microprocessor.
Goal: microprocessor design with low power consumption (DEC Alpha - 30 watts)
- AMULET1: two-phase bundled data design (two-phase speed-independent control + synchronous data path)
- AMULET2: four-phase bundled data design (four-phase speed-independent control + synchronous data path)
[S. Furber 1995]
Micropipelines
Sutherland's micropipeline:
- Control path = Muller pipeline
- Data path = event-controlled latches (Capture and Pass)
[Figure: micropipeline structure (control and data blocks linked by Rin/Ain and Rout/Aout); Muller control pipeline built from C-elements; control operation, showing the stage-value patterns for transfer and no transfer]
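The behavior of the Muller control pipeline can be illustrated with a toy model. The sketch below is a simplification and not the circuit from the slide: it updates all stages in lock-step sweeps rather than with real gate delays, and the names `c_element` and `step` are hypothetical. Each stage is a C-element whose inputs are the output of the stage to its left (or Rin) and the inverted output of the stage to its right (or inverted Aout).

```python
from typing import List


def c_element(a: bool, b: bool, prev: bool) -> bool:
    # Muller C-element: the output follows the inputs when they agree,
    # otherwise it holds its previous value.
    return a if a == b else prev


def step(stages: List[bool], rin: bool, aout: bool) -> List[bool]:
    """One lock-step sweep over all stages, using the old stage values."""
    n = len(stages)
    new = []
    for i in range(n):
        left = rin if i == 0 else stages[i - 1]
        right = aout if i == n - 1 else stages[i + 1]
        new.append(c_element(left, not right, stages[i]))
    return new


# A single event (transition) on Rin ripples through an empty 3-stage pipeline.
state = [False, False, False]
trace = []
for _ in range(3):
    state = step(state, rin=True, aout=False)
    trace.append(list(state))
```

The trace shows the event advancing one stage per sweep until it reaches the last stage, where it waits for Aout to change. In this model Ain corresponds to the first stage output and Rout to the last.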
Two-phase control, two-phase data path
Sutherland's micropipeline with two-phase 'capture-pass' latch
[Figure: C-element control (Rin, Ain, Rout, Aout) with a latch between data in and data out; two-phase clocking (phi1, phi2) compared with data-flow operation, in which tokens and bubbles move data items D1 to D4; transfer and no-transfer cases]
Two-phase register cell is too large and too slow ⇒ level-sensitive (four-phase) transparent latches + two-phase to four-phase converters for controlling a register (Amulet1)
Two-phase control, four-phase data path
One transition at Rin forces two transitions at the latch control
[Figure: control circuit with C-element, toggle and weak feedback between Rin/Ain and Rout/Aout, event order numbered 1 to 6; latch with Enable between data in and data out]
Two-phase to four-phase control conversion is too expensive ⇒ four-phase control + level-sensitive (four-phase) transparent latches (Amulet2)
Four-phase control, four-phase data path
[Figure: latch controller built from asymmetric C-elements aC1 and aC2 with signals Rin, nAin, Rout, nAout and internal signals a, b; latch between data in and data out; STG with transitions Rout-, Rin+, nAout+, a+, b-, nAin-, Rout+, Rin-, nAout-, a-, b+, nAin+]
Asymmetric C-elements:
a = Rin*b + a (Rin + nAout + b)
b = a + b*nAout
- Only half of the cells of a Muller pipeline can be filled with data
- With four-phase control a pipeline can be made dense
- Extra memory is needed in the control to distinguish between the old and new value
- Day pipeline (with hazards) ⇒ speed-independent pipeline was obtained by synthesis (Forcage)
- Four-phase control: 30% faster and consumes 30% less power than two-phase
Amulet results
                Amulet1/ES2     ARM6
Process         1 µm            1 µm
Area (mm)       5.5 x 4.1       4.1 x 2.7
Transistors     58,374          33,494
Performance     20.5 kDhry      33.5 kDhry
Multiplier      5.3 ns/bit      25 ns/bit
MIPS/W          77              120
Amulet2 expectations: 3 times better than Amulet1 on the same fabrication process
Counterflow Pipeline Processor Architecture
- Conventional RISC pipeline: instructions and data flow in one direction
- Counterflow pipeline: instructions and data flow in opposite directions
[Sproull, Sutherland, Molnar 94] Sun Microsystems Labs.
CFPP advantages and disadvantages
Potential advantages:
- Local control: no need for global "pipeline stall" signals
- Regular and modular structure (proof of correctness is easy to get)
- Local communication between pipeline stages
Potential disadvantages:
- Synchronization between two flows requires arbiters (many points of metastability)
- Low latency
- Control hazards (traps, branches) require invalidating wrong results
A simplified counterflow pipeline structure
[Figure: stage i between registers. Instructions (Op, Dest and Source register/value pairs) come from instruction fetch and flow in one direction; results (Res1, Res2 register/value pairs) flow in the opposite direction; tags (register names) are compared (=) to garner sources and update results]
Control structure
[Figure: stage control block with an Exec unit and the event ports PI, AR, Ex, AI, PR]
Events:
- AI: accept instruction
- PI: pass instruction
- AR: accept result
- PR: pass result
- Ex: execute instruction
Design challenge: CFPP state graph
[Figure: state graph over the states E, I, R, F, C with transitions labeled AI, PI, AR, PR and Ex; an arbitration point is marked]
States:
- E: Empty
- I: Instruction
- R: Result
- F: Full
- C: Complete
Events:
- AI: accept instruction
- PI: pass instruction
- AR: accept result
- PR: pass result
- Ex: execute instruction
How to synthesize this state graph formally? A few solutions have been proposed based on STG and CSP.
Tangram
Approach: asynchronous for low power
- Tangram = CSP-based language
- Direct compilation to asynchronous circuits: one language construct ⇒ one circuit component + peephole optimization
- Using STGs allows resynthesis and optimization [Peña 95]
- Low-power parts for portable audio equipment (DCC Error Corrector) [Van Berkel 93, 94]
DCC Error Corrector results
Average case switching activity (correct words - 98%)