The Systematic Design of Asynchronous Circuits



English Pages 219 Year 1995


Tutorial: The Systematic Design of Asynchronous Circuits
Michael Kishinevsky, The University of Aizu, Japan
Luciano Lavagno, Politecnico di Torino, Italy, and Cadence Berkeley Labs, USA
Peter Vanbekbergen, Synopsys, Inc., USA

ICCAD'95 tutorial


Outline
Introduction and basics: What is asynchronous, advantages and disadvantages, design styles, hazards and races, control/data dichotomy
Specification and synthesis methodologies: Event-based models, concurrent languages, FSMs
Synthesis from Signal Transition Graphs: Synthesis flow, state assignment, logic synthesis, design-for-testability, comparison with synchronous
Verification and validation of asynchronous designs: Formal verification, checking properties, validation in VHDL environment
Practical applications: Interface design, asynchronous processors, DSP circuits, CAD tools

Motivation
Reliable design methodology for asynchronous circuits is coming
Asynchronous design is sometimes unavoidable (asynchronous interfaces, arbiters, etc.)
Asynchronous design is getting more important:
"[In the microprocessor of the year 2000] we will see perhaps even asynchronous design..."
"Looking at the future", Anant Agrawal, Vice President, Engineering, Sun Microsystems SPARC Technology Business

What is asynchronous?
Asynchronous = not synchronous
Synchronous: use an external global clock for observing system states
Asynchronous: use internal and external events for observing system states
Asynchronous circuits are also called self-timed

Synchronous communication
[Figure: Sender register, Logic and Receiver register driven by a common Clock; timing diagram of Clock and Data input with Tsetup and Thold marked]
Timing constraint: input data must stay unchanged within a setup/hold window around a clock event

Asynchronous communication
[Figure: Sender register, Logic and Receiver register connected by Data, with Req and Ack handshake signals]
A signal handshake is used instead of a global clock
Handshake can be implemented using delay padding and/or completion detection

Completion detection
[Figure: Sender register, Logic and Receiver register connected by Data; a completion detector generates Req, and Ack is returned to the sender]
Special redundant data encoding ⇒ area overhead
Completion detection forms a request ⇒ more area overhead
Completion detection can be done inside a receiver
Tolerates data skew

Delay padding
[Figure: Sender register, Logic and Receiver register connected by Data, with Ack and a delay element d on the Req line]
Similar to synchronous
Measure or estimate maximum logic delay
Pad extra delay (if Req sets before Logic stabilizes) for mimicking completion detection
Time redundancy: timing margins are required

Potential advantages
Top ten lists at Async94, from Al Davis
No clock distribution
Average case performance
Low power consumption
Robustness to parameter variations
Modular composition
Metastability either does not occur or has time to resolve
Global synchrony does not exist anyway (hardware/software codesign)

Robustness
[Figure: plot of MOperations/sec (0 to 20) versus Voltage, V (1 to 10), with curves at 77K and 300K; data from Alain Martin 1989 and Christian Nielsen 1994]
Operating range: 5 op/s under 0.5V and 300K ↔ 19 Mop/s under 11V and 77K

Problems
Have you ever seen asynchronous circuits?
Area overhead
Testability?
How to design them?
A systematic design methodology is needed

Part 1: Basics
Example: C-element design
Completion detection
Design styles
- delay models
- signalling methods
- models of environment
Hazards and races: review
Asynchronous data path

Synchronization of parallel processes
[Figure: processes A (ai, ao) and B (bi, bo) feeding process C (ci, co) through a synchronizing element marked "?"]
After A and B do C.
Both rising and falling transitions have to be synchronized:
if a = b = 1 then c := 1
if a = b = 0 then c := 0
otherwise c := c
AND-gate synchronizes only rising transitions
Synchronization can be done with a C-element
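
The synchronization rule above can be sketched behaviorally. This is a minimal illustration, not the tutorial's circuit: the function names and the input sequence are assumptions chosen for the example. It contrasts a C-element, which waits for both inputs in both directions, with an AND-gate, which drops its output as soon as one input falls.

```python
def c_element(a: int, b: int, c_prev: int) -> int:
    """Next output of a C-element: follow the inputs when they agree, else hold."""
    if a == b:
        return a
    return c_prev


def and_gate(a: int, b: int, _c_prev: int) -> int:
    """AND-gate with the same signature (the previous output is ignored)."""
    return a & b


def run(gate, inputs, state=0):
    """Apply a sequence of (a, b) input pairs and collect the outputs."""
    outputs = []
    for a, b in inputs:
        state = gate(a, b, state)
        outputs.append(state)
    return outputs


# Hypothetical input sequence: a rises, b rises, a falls, b falls.
sequence = [(1, 0), (1, 1), (0, 1), (0, 0)]
c_trace = run(c_element, sequence)    # output falls only after both inputs fell
and_trace = run(and_gate, sequence)   # output falls after the first falling input
```

The hold behavior is what makes the element sequential: its next value depends on the current output as well as on the inputs.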

C-element on basic gates
[Figure: Karnaugh map of c over inputs a, b and the current value of c; gate-level implementation with gates numbered 1 to 4]
Is this implementation correct?

Delay assumptions
[Figure: gate-level C-element (gates 1 to 4, internal nodes n1 to n4, inputs a and b, output c) connected to the Environment, with the delay1 and delay2 paths marked]
Delay 1 is a propagation delay from node 1 via the environment to node 3
Delay 2 is a propagation delay from node 1 via internal feedback to node 3
Delay 1 must be greater than Delay 2
Delay assumptions influence correctness of asynchronous design

CMOS implementation of a C-element
[Figure: transistor-level schematics of three C-element implementations: Static, Dynamic and Quasi-static (with a weak feedback inverter)]
Are these implementations correct?
Careful design is required (layout, transistor sizing, charge sharing)

RS-latch with completion detection
[Figure: RS-latch with inputs S and R, outputs Q and Q', and a Completion signal]
Completion = 0 ⇔ <S, R> = <Q, Q'>
[Gillies, Swartwout '61 (in Miller '65), Martin, van Berkel 80's]

Dual-rail four-phase encoding
[Figure: timing diagram of Data (Valid, Empty) and Ack over one cycle, with events numbered 1 to 4 and the Send and Reset phases marked]
X      X.t  X.f
True   1    0
False  0    1
Empty  0    0
Valid = {True, False}

Completion detection
[Figure: two completion detectors for a dual-rail word X = (x1.t, x1.f, ..., xn.t, xn.f), one built with an n-input C-element (Cn) and one built from two-input C-elements (C2), each producing completion(X)]
Completion detectors for dual-rail four-phase code [Armstrong et al. 69] [Varshavsky 75]
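
A behavioral sketch of dual-rail encoding with a completion detector follows. It assumes one common structure (an OR of the two rails of each bit, followed by an n-input C-element); the exact gate structure of the slide's detectors is not recoverable from the text, and the names `encode`, `EMPTY` and `completion` are illustrative.

```python
EMPTY = (0, 0)  # both rails low: no data


def encode(bit: int) -> tuple:
    """Dual-rail code (t, f): True -> (1, 0), False -> (0, 1)."""
    return (1, 0) if bit else (0, 1)


def completion(word, previous: int) -> int:
    """n-input C-element over per-bit validity.

    Rises when every bit is valid, falls when every bit is empty,
    and holds its previous value while the word is only partly valid.
    """
    valid = [t | f for (t, f) in word]  # a bit is valid when one rail is high
    if all(valid):
        return 1
    if not any(valid):
        return 0
    return previous


full_word = [encode(1), encode(0), encode(1)]
partial_word = [encode(1), EMPTY, encode(0)]  # one bit has not arrived yet
```

Because the detector holds while the word is partly valid, bits may arrive in any order, which is the sense in which completion detection tolerates data skew.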

Completion detection vs delay padding
Completion detection
- Completion detection is well developed
- Requires an area overhead and an extra delay. Not practical for wide data paths
Bundled data
- Padding delay allows mimicking the completion detection
- Data path is synchronous
- Requires timing redundancy for safe design

What influences the design methodology
Delay model
Signalling model
Environment model
Specification language

Delay models
Place: gates, wires, gates + wires
Timing properties: bounded and unbounded
- Bounded = lower and upper bounds for circuit components are known
- Unbounded = delays are unknown but finite
Memory properties: inertial and pure
Apply a train of short pulses to the input of a delay element before the output of a delay has changed its value.
- Inertial = all pulses are filtered out
- Pure = no pulses are filtered out
Intermediate models are possible
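
The difference between pure and inertial delays can be sketched on a list of input change events. This is a simplified model under stated assumptions: events are (time, new value) pairs in increasing time order, and an inertial delay of size d cancels any input change that is reversed less than d time units later. The function names are illustrative.

```python
def pure_delay(events, d):
    """Pure delay: every input change appears at the output d time units later."""
    return [(t + d, v) for t, v in events]


def inertial_delay(events, d):
    """Inertial delay: a change reversed in less than d is filtered out."""
    kept = []
    for t, v in events:
        if kept and t - kept[-1][0] < d:
            kept.pop()            # the previous change did not last d: drop the pulse
        else:
            kept.append((t, v))
    return [(t + d, v) for t, v in kept]


# Train of two short pulses (width 1) applied to a delay element with d = 5.
pulses = [(0, 1), (1, 0), (2, 1), (3, 0)]
pure_out = pure_delay(pulses, 5)          # all four changes pass through
inertial_out = inertial_delay(pulses, 5)  # both pulses are filtered out
```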

Gate vs wire delay models
Gate delay model = delays are in gates, no delays in wires
More realistic version: gates may have arbitrary delays, wires after fork may have delays less than one gate delay
Wire delay model = gate+wire delay model: wires (and gates) may have arbitrary delays

Delay models and correctness
[Figure: logic view and transistor view of a C-element with inputs a, b and output c, with wire delays d1 to d5 marked]
This design is correct for unbounded gate delay model
This design is incorrect for unbounded wire delay model
If d1 > d3 + d5 or d2 > d3 + d4 then the implementation does not satisfy the specification
Correctness of this design under the bounded delay model depends on delay constraints.

Design styles for asynchronous control logic
Bounded delays: realistic delay model for gates and wires.
- Technology mapping is easy.
- Verification is difficult since time must be included in the model.
Speed-independent: pessimistic delay model for gates (unbounded) and often acceptable delay model for wires (delays after fork are less than one gate delay).
- Technology mapping is more difficult.
- Verification is easy since time is excluded from the model.
Delay-insensitive: pessimistic delay model for gates and wires (both are unbounded). The class of such circuits (built out of basic gates) is almost empty.
Quasi-delay-insensitive = delay-insensitive except critical wire forks, called isochronic forks. In practice this is the same as speed-independent.

Signalling methods
Two-phase: rising and falling signal transitions are equivalent
Four-phase: rising and falling signal transitions are not equivalent

Two-phase signalling
[Figure: Sender register, Logic and Receiver register with Req, Ack and Data; timing diagram of Req, Ack and Data over two cycles, with events numbered 1 to 3]
Two transitions at the control lines during one cycle
acknowledge ⇒ data change ⇒ request

Four-phase signalling
[Figure: Sender register, Logic and Receiver register with Req, Ack and Data; timing diagram of Req, Ack and Data over one cycle, with events numbered 1 to 4 and the Send and Reset phases marked]
Four transitions at the control lines during one cycle
Control is simpler
Reset phase: change data and reset control
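
The transition counts of the two signalling methods can be compared with a small event generator. This is a sketch under the usual assumption that Req and Ack alternate; the event names (`req+`, `ack-`, and so on) are illustrative and data changes are not modeled.

```python
def four_phase(n_cycles: int) -> list:
    """Four-phase: each cycle raises and then resets both control lines."""
    events = []
    for _ in range(n_cycles):
        events += ["req+", "ack+", "req-", "ack-"]
    return events


def two_phase(n_cycles: int) -> list:
    """Two-phase: each cycle is one Req transition and one Ack transition,
    rising and falling transitions being equivalent."""
    events, req, ack = [], 0, 0
    for _ in range(n_cycles):
        req ^= 1
        events.append("req+" if req else "req-")
        ack ^= 1
        events.append("ack+" if ack else "ack-")
    return events


one_four_phase_cycle = four_phase(1)
two_two_phase_cycles = two_phase(2)
```

The two lists are identical: the same four control transitions carry one data item under four-phase signalling and two data items under two-phase signalling.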

Models of the environment
Asynchronous circuits operate in the environment.
Slow enough environment = Fundamental mode: inputs can change after the asynchronous system has settled into a stable state
Reactive environment = Input/output mode: inputs can change as soon as the first output signal has changed due to the previous input change
Constraints on input changes for fundamental mode and I/O-mode:
- Single input change (SIC): too slow
- Multiple input change (MIC): difficult to design using classical asynchronous FSM technique
- Unrestricted input change (UIC): very difficult to design using classical asynchronous FSM technique
Fundamental mode constraints are similar to setup and hold restrictions in synchronous design.
Fundamental mode is often too restrictive. Example: any pipeline operates in I/O-mode

Environment and correctness
[Figure: gate-level C-element (gates 1 to 4, internal nodes n1 to n4, inputs a and b, output c) connected to the Environment, with the delay1 and delay2 paths marked]
For fundamental mode this design is correct (necessary delay constraints hold by definition)
For I/O-mode this design may be incorrect unless "Delay 1 > Delay 2" type constraints hold
Assumptions on the environment influence the correctness of asynchronous design

Goals of tutorial
Will present design methodology for
- I/O-mode (most general)
- Bounded and unbounded delay models (each of them may be useful for certain designs)
- Four-phase signalling (more efficient and fast control). Two-phase signalling can also be handled.
Will not present
- Specific methods for Fundamental mode
- Specific methods for two-phase signalling
- All the methods that have been devised in history ...

Hazards
[Figure: Karnaugh map of c = a xor b with the transition cube marked; XOR gate with one input changing 1 -> 0 and the other, slow, input changing 0 -> 1, producing the output glitch 1 -> 0 -> 1]
Hazard example: function static 1-hazard for c = a xor b
Hazard: signal transition not specified by the designer
Hazards make asynchronous design difficult
Theory of hazards: [Unger, Kung, Nowick]

Hazards and delay model
[Figure: example circuit annotated with numeric delays under the two delay models]
Inertial gate delay model ⇒ no hazards
Wire delay model ⇒ hazards
Hazard analysis depends on the delay model

Taxonomy of hazards
[Figure: waveforms of static hazards and dynamic hazards]
Signal behavior: static (no transition is required) or dynamic (monotonous transition is required)
Circuit type: combinational (circuits without feedbacks) or sequential (circuits with feedbacks)
Due to specification or due to implementation: function or logic, essential or not essential
Due to the I/O mode of operation: delay hazards occur when two input transitions are too close together

Combinational hazards
Function hazards: inherent in the specification of the logic function.
- Condition for 1-hazards: a transition cube contains 0
- Cannot be eliminated by changing the logic
Logic hazards: depend on the particular implementation of the logic function
- Can be eliminated by changing the logic (or sometimes the delays)
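
The transition-cube condition for function hazards can be checked by enumeration. In this sketch the transition cube is taken to be all input vectors that agree with the start and end vectors wherever those two agree; the helper names and the chosen input changes are assumptions made for illustration.

```python
from itertools import product


def transition_cube(start, end):
    """All input vectors reachable while the changing inputs switch in any order."""
    axes = [(s,) if s == e else (0, 1) for s, e in zip(start, end)]
    return list(product(*axes))


def has_static_function_hazard(f, start, end):
    """True if f has the same value at both ends of the transition
    but a different value somewhere inside the transition cube."""
    if f(*start) != f(*end):
        return False  # the output must change: not a static case
    return any(f(*p) != f(*start) for p in transition_cube(start, end))


def xor(a, b):
    return a ^ b


# a: 1 -> 0 and b: 0 -> 1 change together; c = a xor b should stay at 1.
xor_hazard = has_static_function_hazard(xor, (1, 0), (0, 1))
```

The cube for this double input change contains (0, 0) and (1, 1), where the XOR is 0, so no implementation of the function can avoid the glitch for arbitrary delays.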

Static logic hazard elimination
[Figure: two Karnaugh maps over ab and c with the corresponding two-level AND-OR implementations; the second implementation has an additional product term]
Static hazard: a signal which should remain constant changes
Static logic 1-hazard elimination condition: each transition cube is contained inside a cube of the cover
Static logic 0-hazards never occur in a two-level sum-of-products implementation
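
The elimination condition can be checked mechanically on a sum-of-products cover. The example function f = ab + a'c is hypothetical (it is not taken from the slide's Karnaugh map); cubes are written as strings over '0', '1' and '-' in the variable order a, b, c.

```python
def transition_cube(start: str, end: str) -> str:
    """Smallest cube containing both input vectors ('-' marks a changing input)."""
    return "".join(s if s == e else "-" for s, e in zip(start, end))


def cube_contains(cover_cube: str, t_cube: str) -> bool:
    """True if every point of t_cube lies inside cover_cube."""
    return all(c == "-" or c == t for c, t in zip(cover_cube, t_cube))


def static_1_hazard_free(cover, start: str, end: str) -> bool:
    """Condition from the text: the transition cube is inside one cube of the cover."""
    t_cube = transition_cube(start, end)
    return any(cube_contains(c, t_cube) for c in cover)


cover = ["11-", "0-1"]             # f = ab + a'c (hypothetical example)
fixed_cover = cover + ["-11"]      # add the redundant term bc

# a falls while b = c = 1; f should stay at 1 throughout.
before = static_1_hazard_free(cover, "111", "011")
after = static_1_hazard_free(fixed_cover, "111", "011")
```

In the original cover the two end points are held by different AND gates, so the output can glitch while one gate turns off before the other turns on. The added term holds the output at 1 across the whole transition.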

Dynamic function hazards
[Figure: Karnaugh map over ab and cd with a transition cube marked; the function changes non-monotonically inside the cube]
Dynamic hazard: a signal which should change once changes multiple times
Dynamic function hazard: non-monotonous change of the function inside the transition cube

Dynamic logic hazard elimination
[Figure: two Karnaugh maps over ab and cd with the corresponding gate-level implementations on inputs a, b, c, d; slow and very slow paths produce extra output transitions in the first implementation]
Dynamic logic 1-to-0 hazard elimination condition: if the transition cube intersects cube c of the cover, then c contains the start point of the transition cube
0-to-1 hazards are symmetrical (the same for the end point)

Sequential hazards
Sequential circuits = circuits with feedbacks
Sequential hazards occur because feedbacks propagate new values before the input net completes signal propagation to all inputs of the next state logic
Essential hazard: inherent in the FSM specification.
- cannot be eliminated by another state encoding ⇒ cannot be eliminated for unbounded delays
- can be masked out by controlling the delays
Non-essential hazards (races): can be eliminated by a proper state encoding

Non-critical races
[Figure: FSM fragment (states s1, s2, inputs a, b) and its encoding; the transition between codes 000 and 011 under input 10 may pass through transient states 001 or 010]
Race: more than one state signal changes
Non-critical race: all transient states drive the FSM into the same final state

Critical races
[Figure: FSM fragment (states s1 to s4, inputs a, b) and its encoding with codes 000, 001, 011, 010 and 110 under input 10]
Critical race: different transient states drive the FSM into different final states. Final state depends on the race winner

Race-free encoding
[Figure: re-encoded state graph with codes 000, 001, 011, 010, 110 and 100 under input 10]
Tracey encoding: two state transitions under the same input b, s1 -> s2 and s3 -> s4, must be distinguished by a stable state signal: 000 -> 011 and 110 -> 100 under input 10

Types of race-free encoding
Single bit change encoding: simple, bad throughput
- adjacent states differ by exactly one bit
- may require a few FSM cycles to reach a stable state
Single transition time encoding: better throughput, more complex
- adjacent states may differ by multiple bits
- requires exactly one FSM cycle to reach a stable state
- Example: Tracey encoding
Race-free FSMs may still have essential hazards

Mod-2 counter: essential hazards
[Figure: state diagram of a mod-2 counter (states 00, 01, 11, 10, arcs labeled input/output, transitions 1 to 3 marked) and its implementation with input x, state signals y1, y2, output z and delay elements D; a very slow path causes the hazard]
Unger's condition for essential hazards: state after three transitions of an input signal ≠ state after one transition of the same input signal

Why essential hazards occur
[Figure: input net reaching the next state logic through delay1 and delay2; the next state logic (delay_next) drives the State, which feeds back through delay_feedback]
No essential hazards if and only if
$|delay_1 - delay_2| < \min(delay_{next}) + \min(delay_{feedback})$
Essential hazards can be removed by controlling the delays

Delay hazards
[Figure: gate-level C-element (gates 1 to 4, internal nodes n1 to n4, inputs a and b, output c) connected to the Environment; gate 3 is slow]
I/O-mode: input b changes after output c, but before the output of gate 3 has settled. This is a delay hazard
Delay hazards may occur under I/O-mode even in designs free from essential hazards

Hazard elimination methods
Changing delays (delay padding). Works only for bounded delay model
Logic transformations (circuit redundancy). Works both for bounded and unbounded delay models
Encoding of inputs and outputs (code redundancy) and/or changing allowed input transitions. Works both for bounded and unbounded delay models
Fighting hazards always requires some area and time overhead

Masking hazards by delay padding (1)
[Figure: example circuit annotated with numeric delays before and after padding]
Controlling delays can eliminate hazards

Masking hazards by delay padding (2)
[Figure: the same two-input circuit under two opposite input changes on a and b; one change requires a path to be slow and the other requires the same path to be fast]
Contradictory delay constraints ⇒ controlling delays cannot eliminate hazards in general

Hazard elimination by redundant encoding
[Figure: XOR gate with a function hazard (output glitch 1 -> 0 -> 1); Karnaugh map of c.t over a.t, a.f, b.t, b.f with don't-care entries; dual-rail four-phase XOR-gate computing c.t from a.t b.f and a.f b.t, with a second output c.f]
Function hazards cannot be eliminated by changing the logic ⇒ change the specification
Hazards depend on the allowed input transitions: XOR-gate is hazard-free under unique input changes in the fundamental mode
Redundant encoding of input and output signals and the four-phase signalling protocol make it possible to eliminate even function hazards
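
A behavioral sketch of a dual-rail XOR follows. The terms for c.t (a.t b.f + a.f b.t) appear in the slide's figure; the terms for c.f (a.t b.t + a.f b.f) are the complementary ones and are an assumption here, as are the helper names.

```python
EMPTY = (0, 0)


def encode(bit: int) -> tuple:
    """Dual-rail code (t, f): True -> (1, 0), False -> (0, 1)."""
    return (1, 0) if bit else (0, 1)


def dual_rail_xor(a: tuple, b: tuple) -> tuple:
    """Dual-rail XOR: each output rail is a monotonic (AND-OR) function of the input rails."""
    (at, af), (bt, bf) = a, b
    ct = (at & bf) | (af & bt)   # output is True
    cf = (at & bt) | (af & bf)   # output is False (assumed complementary terms)
    return (ct, cf)


results = {(a, b): dual_rail_xor(encode(a), encode(b)) for a in (0, 1) for b in (0, 1)}
half_arrived = dual_rail_xor(encode(1), EMPTY)  # only one operand is valid
```

In the four-phase protocol the inputs only move from Empty to Valid or back, so each rail changes at most once per phase and the output stays Empty until both operands have arrived, whatever their arrival order.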

Modern view on hazards and races
Classical hazard/race taxonomy is complex for sequential circuits even in Fundamental Mode. It is much more complex for the I/O-mode:
- Hazards may be viewed as signal transition disabling
- Critical races may be viewed as incomplete (ambiguous) specifications
Hazard avoidance: design a circuit in which transition disabling cannot occur
Critical race avoidance: make the specification complete and unambiguous

Speed-independent design
[Figure: gate states over time: Stable, Enabling, Enabled (0*), then Switching or Disabling, Stable; enabling and disabling are caused by input changes, switching changes the output]
For unbounded gate delay model: disabling cannot occur ⇒ no hazards
No need for a complex theory of hazards for this delay model

Critical races as incomplete state coding
[Figure: encoded state graph with codes 000, 001, 011, 010 and 110 under input 10; code 010 occurs in two situations, one where y2 must change and one where y2 must stay unchanged (Conflict)]
FSM spec is ambiguous, i.e. incomplete
State assignment conflict:
<y1 y2 y3>   next state y2'
010          0
010          1
...          ...

Complete state coding
[Figure: the state graph re-encoded with four state signals: 0000, 0010, 0110, 0100, 0101, 1101]
Solving state assignment conflicts:
- Add new state signals, or
- Remove one of the states (by re-encoding, reducing concurrency)

Data path design
Two ways to design data path:
- Synchronous data path: bundled data
- Asynchronous data path (speed-independent = quasi-delay-insensitive)
Asynchronous data path is useful if performance is data dependent and the average case computation delay is much less than the worst case delay
Two efficient methods for asynchronous data path: reduced direct logic (RDL) and differential cascode voltage switch logic (DCVSL)

Indication
[Figure: OR-gate and C-element with inputs a, b and output c, with their waveforms]
OR-gate: output indicates falling input transitions
C-element: output indicates falling and rising input transitions
Indication: A. Taubin (1980)
Transition c indicates transition a if and only if c cannot occur without a; c is a "completion detector" for a.
Design idea:
- All input and internal transitions must be indicated by some output transitions
- Choose indicating outputs to optimize performance/area

Differential Cascode Voltage Switch Logic
[Figure: DCVSL gate: N-transistor network driven by Inputs, precharged by Req, with outputs Output.t and Output.f]
Inputs are (typically) dual-rail
Additional request signal is used for precharging
Outputs are dual-rail four-phase (valid-empty protocol)
Staticizers (cross-coupled inverters) may be needed for output signals
Two cross-coupled pull-up transistors may improve performance

Reduced direct logic
[Figure: RDL gate: N-transistor network driven by Inputs, with transistors controlled by Input1.t and Input1.f, and outputs Output.t and Output.f]
Inputs and outputs are four-phase dual-rail
No additional request signal
Full custom design only
Design idea:
- Add a pair of p-transistors for each output and a pair of n-transistors for each dual-rail input
- Optimize n-transistor logic
- Distribute indication of the inputs among the outputs

Asynchronous ripple-carry adder
[Figure: transistor-level adder cell with dual-rail inputs a.t, a.f, b.t, b.f, c.t, c.f and dual-rail outputs sum.t, sum.f, cout.t, cout.f]
A. Martin (1991)
Asynchronous cell transistor count = 34
Synchronous cell transistor count (without pass transistors) = 28
Inputs and outputs are four-phase dual-rail

Critical delay in asynchronous adder
[Figure: example operands with the critical delay path marked; indication in the valid phase and indication in the empty phase]
Critical delay is proportional to log2 N on average (N = number of bits)
It is determined by the maximal number of contiguous different bits
32-bit adder delay (1.6 µm MOSIS CMOS): asynchronous: 11 ns, synchronous: 40 ns
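
The quantity that determines the critical delay can be computed directly: a carry ripples through exactly those bit positions where the two operand bits differ. The function name and the operands below are illustrative assumptions.

```python
def longest_propagate_run(a: int, b: int, n_bits: int) -> int:
    """Length of the longest run of contiguous positions where a and b differ."""
    differ = (a ^ b) & ((1 << n_bits) - 1)  # 1 where the operand bits differ
    longest = run = 0
    for i in range(n_bits):
        if (differ >> i) & 1:
            run += 1
            longest = max(longest, run)
        else:
            run = 0                          # equal bits stop the carry chain
    return longest


worst_case = longest_propagate_run(0b1011, 0b0100, 4)   # every position differs
no_ripple = longest_propagate_run(0b1010, 0b1010, 4)    # no position differs
short_runs = longest_propagate_run(0b0110, 0b0011, 4)   # positions 0 and 2 differ
```

A synchronous adder must budget for the worst case (a run of N), while the self-timed adder signals completion after the run that actually occurs.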

Part 2: Specification and synthesis
Specification of control
- CSP
- FSM
- Signal Transition Graphs (STG)
STG in details: basics, limitation and extensions, timing
Synthesis overview
Simulation engine
Complete state coding = race elimination

CSP
CSP = communicating sequential processes.
Concurrent processes communicating by input and output commands on channels.
Semantics are based on trace theory [Udding, van de Snepscheut, Rem].
In general, techniques try to come up with quasi-delay-insensitive implementations.
Two main approaches: production rule based [Martin, ...] and syntax-directed decomposition [Ebergen, Berkel]

CSP PROGRAM
[[li]; ro↑; [ri]; ro↓; [¬ri]; lo↑; [¬li]; lo↓]
";" = sequencing.
ro↑ = ro goes high, ro↓ = ro goes low.
[li] = wait until li is high, [¬li] = wait until li is low.
Program meaning: li has to be high before ro goes high. The circuit will wait until ri goes high before ro goes low again, and so on.

PRODUCTION RULES
li ↦ ro↑
ri ↦ ro↓
¬ri ↦ lo↑
¬li ↦ lo↓
Production rule = Boolean condition ↦ transition.
li ↦ ro↑: when li is high, ro will go high.
¬ri ↦ lo↑: when ri is low, lo will go high.
Conflict: ro↑ and ro↓ are not mutually exclusive.

PRODUCTION RULES(2)
Add a state signal to solve the conflict:
[[li]; ro↑; [ri]; x↑; [x]; ro↓; [¬ri]; lo↑; [¬li]; x↓; [¬x]; lo↓]
The new production rules:
¬x ∧ li ↦ ro↑
ri ↦ x↑
x ↦ ro↓
x ∧ ¬ri ↦ lo↑
¬li ↦ x↓
¬x ↦ lo↓

IMPLEMENTATION
[Figure: circuit with inputs li and ri, outputs ro and lo, and a flip-flop (ff) holding the state signal x]
The production rules can now be identified with operations. ro maps to an AND-gate (ro = ¬x ∧ li):
¬x ∧ li ↦ ro↑
x ↦ ro↓

FSM
[Figure: FSM with states s1 (y1y2 = 00), s2 (10) and s3 (11); arcs labeled with conditions over inputs x and C and output o]

FSM(2)
Nodes = states, arcs = state transitions
Each arc is labeled with a pair of input/output symbols.
- input = condition for the state transition to take place.
- output = values for output signals after the state transition.
Mostly limited to fundamental mode [Unger].

BURST MODE FSM
[Figure: burst-mode FSM with states s1 to s4 and arcs labeled input burst/output burst: a+/y+, b-/y+, b+/x+y-, b+/y-, a-b-/x-]

BURST MODE
FSM based on signal transitions rather than signal levels [Nowick, Yun].
First a set of input transitions fires.
Second a set of output transitions fires.
Finally state variables change and take the system to another state.
Mostly limited to fundamental mode [Unger].

SIGNAL TRANSITION GRAPH
[Figure: TIMING DIAGRAM of inputs a and b and output c, and the corresponding STG with transitions a+, b+, c+, a-, b-, c-]

SIGNAL TRANSITION GRAPH(2)
Vertices represent signal transitions.
Arcs represent causal relations between transitions.
Introduced by [Molnar 85] [Chu 87] [Rosenblum 85]

SIGNAL TRANSITION GRAPH (3)
[Figure: STG with transitions a+, b+, c+, a-, b-, c- and the equivalent PETRI NET with places p1 to p8]
STG is an interpreted Petri net.

CONSISTENCY
[Figure: example STG fragments over signals A and x for each case]
CONSISTENT
- Two up-transitions with a down-transition in between (and vice versa)
INCONSISTENT
- Two subsequent up-transitions without a down-transition in between
- Two subsequent down-transitions without an up-transition in between
- Parallel transitions of the same signal
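
For a single firing sequence, the alternation rule can be checked in a few lines. This sketch assumes transitions are written as strings such as "a+" or "a-" and that the initial signal values are given; it checks one trace only, whereas consistency of an STG requires the rule to hold on every firing sequence.

```python
def is_consistent(trace, initial) -> bool:
    """True if rising and falling transitions of every signal alternate."""
    value = dict(initial)
    for transition in trace:
        signal, direction = transition[:-1], transition[-1]
        new_value = 1 if direction == "+" else 0
        if value[signal] == new_value:
            return False  # e.g. a+ while a is already high
        value[signal] = new_value
    return True


good = is_consistent(["a+", "a-", "a+"], {"a": 0})
bad_up = is_consistent(["a+", "a+"], {"a": 0})
bad_down = is_consistent(["a+", "a-", "a-"], {"a": 0})
```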

MODELING POWER
[Figure: STG fragment with c+ followed by concurrent a- and b-; STG fragment with a choice between a- and b-]
CONCURRENCY: a- concurrent with b-
- EITHER a- fires before b-
- OR b- fires before a-
- OR a- and b- fire simultaneously
CHOICE: EITHER a- fires OR b- fires but not both; choice made by environment

FIRING SEMANTICS
[Figure: legend (token, place, transition, enabled transition); transition c+ with input places p3, p4 and output places p5, p6, shown enabled before c+ fires and after c+ has fired]

SYNTHESIS
[Figure: synthesis flow, with a "direct method" path also shown]
Signal Transition Graph -> (Translation) -> Petri Net -> (Token flow simulation) -> Reachability Graph -> (Encoding) -> State Graph -> Karnaugh Map -> (Logic synthesis) -> Logic equations

REACHABILITY GRAPH
Vertices represent states the circuit can be in.
Arcs represent transitions from one state to another.
Can be derived from the Petri net by simulating the token flow.
Easiest representation to do analysis on.
Size may be exponential in the number of transitions.
Similar to trace structures [Dill], transition diagrams [Muller].
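
Token-flow simulation can be sketched as a breadth-first search over markings. The net below is a reading of the tutorial's running example (places p1 to p8, transitions a+, b+, c+, a-, b-, c-) reconstructed from the markings listed on the neighboring slides, so its arcs are an assumption. Markings are modeled as sets of places, which is adequate only for nets with at most one token per place.

```python
from collections import deque

# transition -> (input places, output places); reconstructed, see the note above
NET = {
    "a+": ({"p1"}, {"p3"}),
    "b+": ({"p2"}, {"p4"}),
    "c+": ({"p3", "p4"}, {"p5", "p6"}),
    "a-": ({"p5"}, {"p7"}),
    "b-": ({"p6"}, {"p8"}),
    "c-": ({"p7", "p8"}, {"p1", "p2"}),
}


def enabled(marking):
    """Transitions whose input places all hold a token."""
    return [t for t, (pre, _post) in NET.items() if pre <= marking]


def fire(marking, transition):
    """Remove tokens from the input places and add them to the output places."""
    pre, post = NET[transition]
    return frozenset((marking - pre) | post)


def reachability_graph(initial):
    """Breadth-first exploration of all markings reachable from the initial one."""
    start = frozenset(initial)
    seen, arcs, queue = {start}, [], deque([start])
    while queue:
        marking = queue.popleft()
        for t in enabled(marking):
            successor = fire(marking, t)
            arcs.append((marking, t, successor))
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen, arcs


states, arcs = reachability_graph({"p1", "p2"})
```

For this small net the graph has 8 markings and 10 arcs; with more concurrent transitions the number of markings grows quickly, which is the exponential size noted above.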

REACHABILITY GRAPH(2)
[Figure: token flow in the Petri net fragment with places p1 to p4 and transitions a+, b+, c+. From STATE {p1,p2}, firing a+ gives STATE {p2,p3} and firing b+ gives STATE {p1,p4}; firing the remaining transition gives STATE {p3,p4}, in which c+ is enabled]

STATE GRAPH
[Figure: REACHABILITY GRAPH with markings {p1,p2}, {p2,p3}, {p1,p4}, {p3,p4}, {p5,p6}, {p6,p7}, {p5,p8}, {p7,p8} and, after ENCODING, the STATE GRAPH with codes 000, 100, 010, 110, 111, 011, 101, 001; arcs labeled a+, b+, c+, a-, b-, c-]

ENCODING
[Figure: state 000 (signals a and b low); a+ leads to 100 (signal a high, signal b low); b+ leads to 010 (signal a low, signal b high)]
Every state is assigned a code.
This code is determined by signals in the Petri net.
Based on this code, logical equations can be derived.

EQUATIONS
[Figure: STATE GRAPH with codes 000, 100, 010, 110, 111, 011, 101, 001 and arcs a+, b+, c+, a-, b-, c-]
K-MAP
c \ ab   00  01  11  10
0        0   0   1   0
1        0   1   1   1
LOGIC EQUATIONS
c = ab + ac + bc
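
The step from state graph to equation can be reproduced by tabulating the implied value of c in every state. The arc list below is the state graph of the running example with codes in the order abc; the helper names are assumptions. In each state, c is assigned 1 if c+ is enabled, 0 if c- is enabled, and its current value otherwise.

```python
# (state code abc, transition, next state code)
ARCS = [
    ("000", "a+", "100"), ("000", "b+", "010"),
    ("100", "b+", "110"), ("010", "a+", "110"),
    ("110", "c+", "111"),
    ("111", "a-", "011"), ("111", "b-", "101"),
    ("011", "b-", "001"), ("101", "a-", "001"),
    ("001", "c-", "000"),
]
SIGNALS = "abc"


def implied_value(state: str, signal: str) -> int:
    """Value the signal is driven to in this state."""
    for source, transition, _target in ARCS:
        if source == state and transition[0] == signal:
            return 1 if transition[1] == "+" else 0   # a transition is enabled
    return int(state[SIGNALS.index(signal)])          # otherwise the signal holds


def majority(a: int, b: int, c: int) -> int:
    """c = ab + ac + bc"""
    return (a & b) | (a & c) | (b & c)


all_states = sorted({s for s, _t, _n in ARCS})
k_map = {s: implied_value(s, "c") for s in all_states}
```

The table agrees with the majority function in all eight states, which is the C-element equation c = ab + ac + bc.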

EQUATIONS(2)
[Figure: upper part of the state graph (000, 100, 010, 110 with arcs a+, b+, c+)]
K-MAP
c \ ab   00  01  11  10
0        0   0   1   0
1
State 000: c is assigned 0 because c stays low in this state.
State 110: c is assigned 1 because c goes high in this state (c+ is enabled).

LIMITATIONS
The advantages of STG:
- Concurrency of signal transitions.
- Conditional behavior based on signal transitions.
The disadvantages of STG:
- No conditional behavior based on signal levels.
- No timing information, just relative ordering of events.
- No don't care behavior.
- No or-node behavior.

LEVELS AND TRANSITIONS
[Figure: CK and SDA waveforms for four situations: 0 bit, 1 bit, stop command, start command]
Protocol distinguishing different situations.

STG WITH EXTENSIONS
[Figure: STG fragments over CK and SDA illustrating the extensions: condition on a signal level, condition on a transition, and don't care (SDA#, SDAs)]
Extensions proposed by [Moon et al]

OR PRECEDENCE
[Figure: two STG fragments with a+, b+ and c+]
c goes high after a and b have gone high
c goes high after a OR b has gone high. NO CHOICE!!
This behavior is hard to model with classical Petri nets.

DON'T CARE & UNDEFINED BEHAVIOR
[Figure: (a) waveforms of CK and I; (b) STG fragment with CK+, I#, Is and CK−]
I#: signal I becomes unstable.
Is: signal I becomes stable. The level is unknown.

LEVEL TRANSITIONS
[Figure: waveforms of a, b and out; STG fragment with a+, a-, b+, b- and the level transitions out 0 and out 1]
Pulse on a sets out to 0, pulse on b sets out to 1.
Pulses may come in any order.

BOOLEAN GUARDS

X1, X2 are Boolean characteristic functions.
Functions determine (together with tokens) which transition is enabled.

[Figure: place p with output transitions t1 (guard X1: i = 0) and t2 (guard X2: i = 1)]

t1 is enabled if p contains a token AND signal i = 0.
t2 is enabled if p contains a token AND signal i = 1.
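The enabling rule with Boolean guards can be written down directly. The following is a minimal sketch, assuming a marking stored as a dictionary from place names to token counts and guards given as Python predicates over the signal values; the names `enabled` and `TRANSITIONS` are hypothetical.

```python
def enabled(transitions, marking, signals):
    """Return the transitions whose input places hold tokens and whose guard is true."""
    result = []
    for name, (input_places, guard) in transitions.items():
        has_tokens = all(marking.get(p, 0) > 0 for p in input_places)
        if has_tokens and guard(signals):
            result.append(name)
    return result


# The example from the slide: place p feeds t1 (guard i = 0) and t2 (guard i = 1).
TRANSITIONS = {
    "t1": (["p"], lambda s: s["i"] == 0),
    "t2": (["p"], lambda s: s["i"] == 1),
}

print(enabled(TRANSITIONS, {"p": 1}, {"i": 0}))
print(enabled(TRANSITIONS, {"p": 1}, {"i": 1}))
```

Because the two guards are mutually exclusive, at most one of t1 and t2 is enabled for any level of i, so the level of the signal resolves the conflict at p.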

TRANSLATION TO STATE GRAPH

[Figure: rules translating each transition type of signal si (+, #, s, 0, 1) from marking m to marking m', annotated with the values m(i) and m'(i) taken from 0, 1, ? and -; for example, si+ leads from m(i) = 0 to m'(i) = 1]

Boolean guards at marking m: if m(i) == 0 then t1 is enabled; if m(i) == 1 then t2 is enabled.

TIMING: SPECIFICATION

[Figure: waveforms of A, B and UP, with the interval [20-40] between A and B and the width > 30 on UP]

Environment: if A goes high, B will go high 20 to 40 ns later.
Circuit: if A goes high, UP needs to go high before B and needs to stay high for at least 30 ns.
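The two constraints can be checked against a recorded trace. This is a sketch only: event times are given in ns, the function and parameter names are hypothetical, and "at least 30 ns" is read as a non-strict bound.

```python
def environment_ok(t_a_rise, t_b_rise):
    """Delay relation: B rises 20 to 40 ns after A rises."""
    return 20 <= t_b_rise - t_a_rise <= 40


def circuit_ok(t_a_rise, t_b_rise, t_up_rise, t_up_fall):
    """Specification relation: UP rises after A but before B, and stays high >= 30 ns."""
    rises_in_time = t_a_rise <= t_up_rise < t_b_rise
    wide_enough = (t_up_fall - t_up_rise) >= 30
    return rises_in_time and wide_enough


# A rises at 0 ns, B at 25 ns; UP is high from 5 ns to 40 ns.
print(environment_ok(0, 25), circuit_ok(0, 25, 5, 40))
```

The first function describes what the environment is assumed to guarantee, the second what the synthesized circuit has to achieve.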

TIMING: SPECIFICATION(2)

[Figure: the A, B and UP waveforms again, labeling [20-40] as a delay relation with tolerance and > 30 as a specification relation]

TIMING: SPECIFICATION(3)

Specification relations indicate how the circuit should react, if synthesized correctly.
Delay relations indicate how an existing circuit (or environment) reacts.
Tolerance: delays are not fixed numbers, but intervals (even more complex symbolic representation is possible).

TIMING CONSTRAINTS

Analysis of timing constraints [Rokicki, Hulgaard, Burns, Dill, Berthomieu, Amon, Walkup].
Take analysis results into account [Myers].
Active synthesis to satisfy timing constraints [Borriello, Vanbekbergen].
Still a lot of research needed in this area.

TIMING: ANALYSIS

[Figure: timed event graph over t1 to t5 with delay intervals [135-inf], [30-40], [0-20], [0-inf], [50-80] and [50-80]; question: MAX(t4-t5)?]

Hard problem in its full generality [Berthomieu].
Efficient solution by Rokicki (but might still be exponential).
Polynomial solutions with various restrictions [Dill, Amon].

TIMING: SYNTHESIS

[Figure: two implementations of the A, B, UP specification ([20-40], > 30): trigger to clock edges (CK, 20 ns) or add a delay line (30 ns) driving a set/reset element whose output is UP]

Solutions with varying restrictions [Borriello, Vanbekbergen].

SYNTHESIS: OVERVIEW

[Figure: synthesis flow. AFSM or STG, then state graph, then technology-independent equations, then technology-dependent equations (using the library, hazard list and tolerance, with a correction phase), then layout. VHDL simulation results are produced for both the specification and the circuit.]

SYNTHESIS: OVERVIEW(2)

Simulation at high level (specification) and low level (circuit).
Start from AFSM or STG.
Race-elimination by state graph transformation.
First step hazard-elimination (technology independent).
Second step hazard-elimination (technology dependent).

SYNTHESIS: OVERVIEW(3)

Map to any library.
The library may contain special elements to get better results, but synthesis should not rely on it.
Library elements need to be hazard-free.
Tolerance factor indicating how much delays of cells in library may vary.
Back-annotation.

WHY VHDL?

The need for a consistent language for all digital applications (synchronous and asynchronous).
Tools (simulators) for free.
The complete design can be validated at different levels of abstraction.
Useful for debugging software prototypes ...

SIMULATION ENVIRONMENT

[Figure: simulation environment. Starting from the STG, behavioral VHDL or gate-level VHDL, assertion VHDL and testbench VHDL (provided by the system and the user) feed the VHDL simulation, which produces results or assertion violations.]

ARCHITECTURE

[Figure: the test bench drives the inputs of the behavioral or gate-level VHDL model; its outputs and state signals are observed by the assertion VHDL, which issues error messages.]

VHDL MODEL

[Figure: STG fragment with places p1, p2, p3 and transition s+]

State = set of tokens in places.
A vector of Booleans: places
- places(1) = 1 if place p1 contains a token.
- places(1) = 0 if place p1 does not contain a token.
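The Boolean-vector encoding of a marking can be sketched outside VHDL as well. The Python below is an illustration under assumptions: it supposes that s+ has input places p1 and p2 and output place p3, which the figure suggests but the extracted text does not confirm, and `fire_s_plus` is a hypothetical name.

```python
def fire_s_plus(places, s):
    """Fire transition s+ if it is enabled; return the new marking and signal value.

    places maps a place index to 0 or 1 (1 = the place holds a token).
    Assumption: s+ consumes tokens from p1 and p2 and puts one token into p3.
    """
    if places[1] == 1 and places[2] == 1:
        new_places = dict(places)
        new_places[1] = 0
        new_places[2] = 0
        new_places[3] = 1
        return new_places, 1      # s goes high
    return places, s              # not enabled: nothing changes


marking = {1: 1, 2: 1, 3: 0}
marking, s = fire_s_plus(marking, 0)
print(marking, s)
```

The enabling test plays the role of a guard expression over `places`, and the update of the vector is the token game for one transition.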

VHDL MODEL(2)

[Figure: STG fragment with places p1, p2, p3 and transition s+]

BLOCK (places(1) = '1' AND places(2) = '1')
BEGIN
places(1) [...]

[Figure: timing diagram with the constraints > 90 ns and > 40 ns over the signals bcsl, brl, bgninl, bbsyl, asl, borgtl, basl, bwrl, writel, dtackl, bgnoutl]

The VMEbus master interface protocol

Decompose into arbitration and read/write protocol:
- simplify specification and synthesis tasks
- separate the arbitration component (meta-stability)

Both decomposed STGs must agree on the partial order of common transitions ("net contraction", Chu '87, or "hiding", Dill '88).

The arbitration STG

[Figure: arbitration STG over the rising and falling transitions of bgninl, bgnoutl, bcsl, bbsyl and bbsyl90, with dummy transitions ε and the initial marking indicated]

A meta-stability problem

We must latch and wait... (or use a mutual exclusion element)

[Figure: waveforms of bcsl, bgninl, bgninl_del and bcsl_del; a D latch clocked by CK produces bcsl_del from bcsl]

(Half of) the read/write STG

[Figure: read/write STG over the transitions of basl, bcsl, brl, borgtl, borgtl40, writel, bbsyl, aslin, aslout and dtackl, with dummy transitions ε]

State encoding

[Figure: the read/write STG with the initial marking indicated and the inserted state signal is0 (transitions is0- and is0+)]

Initial implementation

bgnoutl = bbsyl + bcsl · bgnoutl + bgninl
bbsyl = bbsyl90 · bgninl + bbsyl · bcsl + bbsyl · bgninl + bgnoutl
brl = bcsl + borgtl + is0
borgtl = aslin · borgtl + bbsyl · bcsl · writel + borgtl · is0
aslout = aslout · is0 + basl + borgtl40 + bwrl · writel + dtackl
writel = aslout · is0 + borgtl + bwrl + dtackl
is0 = aslin · borgtl + bbsyl · is0 + borgtl · is0 + brl · is0

Final implementation (XILINX 3000 or SC)

[Figure: gate-level netlist of the final implementation over the signals bbsyl90, bbsyl, bgnoutl, bgninl_del, bcsl_del, is0, borgtl40, bwrl, writel, dtackl, basl, aslin, borgtl, bcsl, aslout and brl]

Hazard example

[Figure: STG fragment with dtackl-, basl+, bwrl+, writel+, aslout+, aslin+, bcsl+, borgtl+, borgtl40+, ε and dtackl+; gate with inputs is0, aslout, borgtl, dtackl, bwrl and output writel]

if aslout+ dtackl+ + dtackl writel  aslout writel
then writel may have a 0 → 1 → 0 → 1 hazard

is0 = 1, aslout = 0 → 1, borgtl = 0 (→ 1), dtackl = 0 → 1, bwrl = 0 (→ 1)

AMULET processor

AMULET is an asynchronous implementation of the ARM microprocessor.
Goal: microprocessor design with low power consumption (DEC Alpha - 30 watts).
AMULET1 - two-phase bundled data design: two-phase speed-independent control + synchronous data path
AMULET2 - four-phase bundled data design: four-phase speed-independent control + synchronous data path
S. Furber 1995

Micropipelines

Sutherland's micropipeline:
Control path = Muller pipeline
Data path = Event controlled latches (Capture and Pass)

[Figure: micropipeline structure (control blocks with Rin, Ain, Rout, Aout above the data path), the Muller control pipeline built from C-elements, and control operation examples showing stage values for "transfer" and "no transfer"]
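The behavior of the Muller control pipeline can be explored with a small simulation. The sketch below is an approximation under assumptions: each stage is a C-element whose inputs are the previous stage and the inverted next stage, `r_in` is the request level from the left environment, `a_out` is the acknowledge level from the right environment, and the function names are invented for this example.

```python
def c_element(x, y, prev):
    """Muller C-element: the output follows the inputs when they agree, else it holds."""
    return x if x == y else prev


def settle(stages, r_in, a_out):
    """Update the stages one at a time until no stage can change any more."""
    stages = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(stages)):
            left = r_in if i == 0 else stages[i - 1]
            right = a_out if i == len(stages) - 1 else stages[i + 1]
            new = c_element(left, 1 - right, stages[i])
            if new != stages[i]:
                stages[i] = new
                changed = True
    return stages


pipe = settle([0, 0, 0], r_in=1, a_out=0)   # first event travels to the end
pipe = settle(pipe, r_in=0, a_out=0)        # second event stops one stage earlier
pipe = settle(pipe, r_in=1, a_out=0)        # third event: the pipeline is full
print(pipe)
pipe = settle(pipe, r_in=1, a_out=1)        # acknowledge on the right: events move on
print(pipe)
```

With two-phase signalling every change of level is an event, so a full pipeline shows alternating stage values, and an acknowledge at the right end lets the stored events advance by one stage.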

Two-phase control, two-phase data path

[Figure: Sutherland's micropipeline with two-phase 'capture-pass' latches (C-element, Rin, Ain, Rout, Aout, data in, data out) next to two-phase clocking (phi1, phi2); data-flow operation with tokens and bubbles (D1 to D4) for "transfer" and "no transfer"]

Two-phase register cell is too large and too slow ⇒ level sensitive (four-phase) transparent latches + two-phase to four-phase converters for controlling a register (Amulet1)

Two-phase control, four-phase data path

One transition at Rin forces two transitions at the latch control.

[Figure: stage with C-element, toggle, Rin, Ain, Rout, Aout, latch Enable, data in and data out; the transitions are numbered 1 to 6]

Two-phase to four-phase control conversion is too expensive ⇒ four-phase control + level sensitive (four-phase) transparent latches (Amulet2)

Four-phase control, four-phase data path

[Figure: latch controller built from the asymmetric C-elements aC1 and aC2 (signals Rin, Rout, nAin, nAout, a, b; data in, data out, latch), the transistor-level aC1, and its STG with the transitions Rin+, Rin-, Rout+, Rout-, nAin+, nAin-, nAout+, nAout-, a+, a-, b+, b-]

Asymmetric C-elements:
a = Rin*b + a (Rin + nAout + b)
b = a + b*nAout

Only half of the cells of a Muller pipeline can be filled with data.
With four-phase control a pipeline can be made dense.
Extra memory is needed in the control to distinguish between the old and new value.
Day pipeline (with hazards) ⇒ speed-independent pipeline was obtained by synthesis (Forcage).
Four-phase control: 30% faster and consumes 30% less power than two-phase.

Amulet results

                Amulet1/ES2    ARM6
Process         1 μm           1 μm
Area (mm)       5.5 × 4.1      4.1 × 2.7
Transistors     58,374         33,494
Performance     20.5 kDhry     33.5 kDhry
Multiplier      5.3 ns/bit     25 ns/bit
MIPS/W          77             120

Amulet2 expectations: 3 times better than Amulet1 on the same fabrication process

Counterflow Pipeline Processor Architecture

Conventional RISC pipeline: instructions and data flow in one direction.
Counterflow pipeline: instructions and data flow in opposite directions.
[Sproull, Sutherland, Molnar 94] Sun Microsystems Labs.

CFPP advantages and disadvantages

Potential advantages:
- Local control: no need for global "pipeline stall" signals
- Regular and modular structure (proof of correctness is easy to get)
- Local communication between pipeline stages

Potential disadvantages:
- Synchronization between two flows requires arbiters (many points of metastability)
- Low latency
- Control hazards (traps, branches) require invalidating wrong results

A simplified counterflow pipeline structure

[Figure: stage i between the instruction fetch and the registers. Instructions (Op, Dest and Source register names with values) and results (Res1, Res2 register names with values) pass through the stage in opposite directions; tags (register names) are compared in order to garner sources and update results.]

Control structure

[Figure: stage control block with the Exec unit and the events AI, PI, AR, PR and Ex]

Events:
AI - accept instruction
PI - pass instruction
AR - accept result
PR - pass result
Ex - execute instr.

Design challenge: CFPP state graph

[Figure: state graph over the states E, I, R, F and C with transitions labeled AI, PI, AR, PR and Ex; arbitration is indicated at state E]

States:
E - Empty
I - Instruction
R - Result
F - Full
C - Complete

Events:
AI - accept instruction
PI - pass instruction
AR - accept result
PR - pass result
Ex - execute instr.

How to synthesize this state graph formally? A few solutions proposed based on STG and CSP.

Tangram

Approach: asynchronous for low power.
Tangram = CSP-based language.
Direct compilation to asynchronous circuits: one language construct ⇒ one circuit component + peephole optimization.
Using STGs allows resynthesis and optimization [Peña 95].
Low-power parts for portable audio equipment (DCC Error Corrector) [Van Berkel 93, 94].

DCC Error Corrector results

Average case switching activity (correct words - 98%)