VLSI Specification, Verification and Synthesis
ISBN 9781461291978, 9781461320074, 1461291976

Proceedings of a workshop held in Calgary, 12-16 January 1987.


VLSI SPECIFICATION, VERIFICATION AND SYNTHESIS

THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE
VLSI, COMPUTER ARCHITECTURE AND DIGITAL SIGNAL PROCESSING

Consulting Editor: Jonathan Allen

Other books in the series:

Introduction to VLSI Silicon Devices: Physics, Technology and Characterization, B. El-Kareh and R. J. Bombard. ISBN 0-89838-210-6.
Latchup in CMOS Technology: The Problem and Its Cure, R. R. Troutman. ISBN 0-89838-215-7.
Digital CMOS Circuit Design, M. Annaratone. ISBN 0-89838-224-6.
The Bounding Approach to VLSI Circuit Simulation, C. A. Zukowski. ISBN 0-89838-176-2.
Multi-Level Simulation for VLSI Design, D. D. Hill and D. R. Coelho. ISBN 0-89838-184-3.
Relaxation Techniques for the Simulation of VLSI Circuits, J. White and A. Sangiovanni-Vincentelli. ISBN 0-89838-186-X.
VLSI CAD Tools and Applications, Wolfgang Fichtner and Martin Morf, Editors. ISBN 0-89838-193-2.
A VLSI Architecture for Concurrent Data Structures, W. J. Dally. ISBN 0-89838-235-1.
Yield Simulation for Integrated Circuits, D. M. H. Walker. ISBN 0-89838-244-0.

VLSI SPECIFICATION, VERIFICATION AND SYNTHESIS

edited by Graham Birtwistle, University of Calgary, and P.A. Subrahmanyam, AT&T Bell Laboratories


KLUWER ACADEMIC PUBLISHERS Boston/Dordrecht/Lancaster

Distributors for North America: Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park, Norwell, Massachusetts 02061, USA
Distributors for the UK and Ireland: Kluwer Academic Publishers, MTP Press Limited, Falcon House, Queen Square, Lancaster LA1 1RN, UNITED KINGDOM
Distributors for all other countries: Kluwer Academic Publishers Group, Distribution Centre, Post Office Box 322, 3300 AH Dordrecht, THE NETHERLANDS

Library of Congress Cataloging-in-Publication Data
VLSI specification, verification, and synthesis / by Graham Birtwistle and P.A. Subrahmanyam [editors].
p. cm. (The Kluwer international series in engineering and computer science; SECS 35)
A collection of papers presented at a workshop held in Calgary, Canada, Jan. 12-16, 1987.
ISBN-13: 978-1-4612-9197-8    e-ISBN-13: 978-1-4613-2007-4
DOI: 10.1007/978-1-4613-2007-4
1. Integrated circuits-Very large scale integration-Design and construction. I. Birtwistle, G. M. (Graham M.) II. Subrahmanyam, P. A. III. Series.
TK7874.V564 1988  621.395-dc19

Copyright © 1988 by Kluwer Academic Publishers, Boston. Softcover reprint of the hardcover 1st edition 1988 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publishers, Martinus Nijhoff Publishing, 101 Philip Drive, Assinippi Park, Norwell, Massachusetts 02061.

Contents

Preface                                                                  vii

1  Implementing Safety-Critical Systems: The VIPER Microprocessor
   W.J. Cullyer                                                            1

2  A Proof of Correctness of the VIPER Microprocessor: The First Level
   A. Cohn                                                                27

3  HOL: A Proof Generating System for Higher-Order Logic
   M. Gordon                                                              73

4  Formal Verification and Implementation of a Microprocessor
   J. Joyce                                                              129

5  Toward a Framework for Dealing with System Timing in Very High Level
   Silicon Compilers
   P.A. Subrahmanyam                                                     159

6  BIDS: A Method for Specifying Bidirectional Hardware Devices
   D. Musser, P. Narendran, and W. Premerlani                            217

7  Hardware Verification in the Interactive VHDL Workstation
   P. Narendran and J. Stillman                                          235

8  Contextual Constraints for Design and Verification
   B.S. Davie and G. Milne                                               257

9  Abstraction Mechanisms for Hardware Verification
   T. Melham                                                             267

10 Formal Validation of an Integrated Circuit Design Style
   I. Dhingra                                                            293

11 A Compositional Model of MOS Circuits
   G. Winskel                                                            323

12 A Tactical Framework for Hardware Design
   S. Johnson, B. Bose, and C. Boyer                                     349

13 Verification of Asynchronous Circuits: Behaviors, Constraints, and
   Specifications
   C. Berthet and E. Cerny                                               385

Preface

The collection of papers in this book represents some of the discussions and presentations at a workshop on hardware verification held in Calgary, January 12-16, 1987. The thrust of the workshop was to give the floor to a few leading researchers involved in the use of formal approaches to VLSI design, and to provide them ample time to develop not only their latest ideas but also the evolution of these ideas.

In contrast to simulation, where the objective is to assist in detecting errors in system behavior in the case of some selected inputs, the intent of hardware verification is to formally prove that a chip design meets a specification of its intended behavior (for all acceptable inputs).

There are several important applications where formal verification of designs may be argued to be cost-effective. Examples include hardware components used in "safety critical" applications such as flight control, industrial plants, and medical life-support systems (such as pacemakers). The problems are of such magnitude in certain defense applications that the UK Ministry of Defence feels it cannot rely on commercial chips and has embarked on a program of producing formally verified chips to its own specification. Hospital, civil aviation, and transport boards in the UK will also use these chips.

A second application domain for verification is afforded by industry, where specific chips may be used in high volume or be remotely placed. The cost of a flawed telecommunications chip design might mean a prohibitive replacement program. Simply because of their locations, sensor chips installed on pipelines and surveillance chips in polar regions would be tremendously expensive to replace should their designs be faulty.

A third application area is product redesign.
Technology advances mean that chips have to be reworked into a new technology every two or three years, or even less. If a design has a verified design development tree, redesign to account for changes in technology will typically involve a replacement of the lower subtrees. There need be no change to the overall chip specification if the behaviors of the replacement designs conform to the specification of their plug-in slot in the new tree. In this case, the new versions of the same part can be inserted into existing systems with a guarantee that they will not introduce new bugs. The major themes of the workshop are presented here in four sections. First, the state of the art in hardware verification and specification is covered with two papers on the VIPER chip and its proof. VIPER is the first production microprocessor chip to be formally verified and implemented in silicon. The second section presents Mike Gordon's


Higher Order Logic System (HOL), which has been used for several substantial verification efforts. The papers explain how to construct proofs in HOL and give several examples of HOL in use as a hardware description language, describing circuits ranging in size from leaf cells through sub-systems all the way through to a complete (but small) microprocessor. The third section discusses some ideas for automated CAD systems using formal specifications. The final section covers transistor level modelling, a proof system for a development of Mossim, the verification of a dynamic CMOS circuit design style, and the use of abstraction mechanisms in verification.

The VIPER chip and its proof. The first paper, by John Cullyer, describes the design and development of the VIPER microprocessor. In order to satisfy certification authorities of the correctness of the processor's implementation, semi-formal mathematical methods were used, both to specify the overall behavior of the processor and to prove that two different gate level realizations conform to this top level specification.

The second paper on VIPER, by Avra Cohn (not read at the workshop), complements Cullyer's overall description and design insights. It details the formal mechanization of the first two levels of the semi-formal proof developed by RSRE (the Royal Signals and Radar Establishment, UK). The mechanical verification was a formidable task, which revealed several flaws in the original hand proof. Three of the flaws revealed were of a more serious nature: a phase error in the fetch cycle, and some incomplete checks for illegal instructions due to unforeseen conditions. Cohn's work would seem to vindicate the use of completely formal methods.

HOL. VIPER demonstrates that existing design and verification techniques can be applied to complex synchronous circuits in a practical manner. Future automated verification tools need to be interfaced to industrial VLSI systems and transform specifications to layout without human intervention. The second section concentrates upon Mike Gordon's HOL (a mechanization of Higher Order Logic). Many of the largest proofs completed to date have been constructed using either Mike's LCF-LSM system or its successor HOL. HOL has already shown (by construction) that it can handle circuits, sub-systems and complete architectures. Thus one notation can span the whole VLSI spectrum. What is lacking at the moment is the ability to incorporate electrical properties into HOL definitions; once formalized, such constraints should fit readily into the HOL pattern.
Gordon's paper gives an introduction to the properties of the HOL language, its proof support apparatus, and small examples of an interactive proof in HOL. Jeff Joyce discusses his reworking in HOL of Gordon's original LCF-LSM proof of a toy microprocessor.

Preface

ix

VLSI CAD environments. The Synapse environment is intended to support the development of provably correct designs, proceeding from high level behavioral specifications and optional performance criteria. Its features include various flavors of expert system support, as well as support for formal manipulations; ways of incorporating order of magnitude constraints along dimensions such as area and response time; and a leaf cell synthesizer that has generated novel CMOS circuits.

An important issue in the context of such high level design tools is that of system timing, which deals with the relative sequencing of subcomputations, the detailed behavior of signal voltages over time, and overall system performance. The design of a VLSI system typically involves many decisions relating to several levels of timing detail. Examples include choices relating to timing disciplines, architectural and circuit design paradigms, the number of clock phases, the number of amplification stages, the sizes of specific transistors, and the temporal behavior demanded of specific input signals (such as stability criteria they must obey).

Subrahmanyam's paper introduces a framework for dealing with system timing issues in the context of very high level silicon compilers. The paradigm considered allows the details of system timing to be gradually introduced as a design progresses. During the early stages of a design, no timing details need be explicit, and the only temporal constraints arise due to functionality, causality and stability criteria. As a design evolves, and decisions relating to resource allocation, computation schedules, timing and clocking disciplines are made, the temporal constraints on signals evolve correspondingly. If the external environment in which a system resides imposes any a priori temporal constraints, such constraints can be explicitly included in the initial specification, and accounted for in the synthesis process.
The framework supports synchronous and asynchronous (self-timed) disciplines, multiphase clocks, as well as techniques that bear on optimizations of performance metrics such as power and area.

The paper by Musser et al. outlines an approach to verification based on the formalism of abstract data types, as implemented in the Affirm system.

The paper by Narendran et al. describes work on the Interactive VHDL Workstation (IVW) project that uses the Rewrite Rule Laboratory (RRL) theorem prover. The main goals of this project are to furnish the user with textual and graphical capabilities to enter and modify his behavioral and structural specifications, and tools to relate these specifications to formulas for input to the Rewrite Rule Laboratory theorem prover.

The final paper in this section, by George Milne, sketches a technique where specifications, designs and contextual information are all described in a common language. Contextual knowledge is captured as just another expression in the language and may be used to simplify the representation of a design in a rigorous

x

Preface

manner.

Modelling and abstraction. Tom Melham addresses abstraction issues in specification. Abstract specifications of design components are used to derive more compact descriptions of behavior. By doing this at each level of a hierarchically structured design, the size and complexity of a large proof can be controlled. Thus, abstraction mechanisms complement structural hierarchy in handling proofs of large systems.

Glynn Winskel discusses several models of MOS circuits and points out some of their shortcomings. Bryant's model, used in Mossim II, tackles such issues in reasonable generality and has been used as the basis of several MOS switch-level simulators; however, this model is not compositional. Winskel presents a compositional model for the behavior of MOS circuits when the input is steady, and shows how this leads to a formal logic. He also indicates the difficulties in providing a full and accurate treatment for circuits with changing inputs.

Inder Dhingra formalizes some rules of thumb for popular CMOS design styles, and uses them to analyze the rules of the NORA design style. In this design style, there are two complementary clock lines, and p- and n-type gates. The design rules govern how the gates may be connected, what clock lines they are driven by, and whether the output is guaranteed. In order to make the design style synchronous, a CMOS latch is used as a dynamic register. This further complicates the rules but gives rise to a design style that generates smaller and faster circuits as compared to standard CMOS. Dhingra goes on to prove that the NORA design style can fail for large circuits, and develops CLIC, a refinement of NORA using a two phase non-overlapping clocking scheme that is guaranteed. Dhingra also gives a formal motivation for the design style using a simple transistor model with charge storage capability.
Steve Johnson discusses work-in-progress on the automation of digital-design synthesis from functional specifications via transformations. He describes work concerned with systematically obtaining correct physical implementations from iterative specifications, and with building aids to facilitate the process. The motivation is to develop a practical methodology with an algebraic flavor.

Christian Berthet presents a methodology for comparing switch-level circuits with their high-level specifications wherein circuits, user behavior and input constraints are modelled as asynchronous machines. The model is based on characteristic functions. A Boolean algebra of asynchronous machines is defined. Machine composition and internal variable abstraction are shown to correspond respectively to the product and sum operations of the algebra. Internal variables can be abstracted under the presence of a domain constraint, provided that the I/O behavior is preserved. The verification of a speed-independent fair arbiter is presented to exemplify the techniques.

Preface

xi

Acknowledgements. The workshop was funded by the Natural Sciences and Engineering Research Council of Canada through an Operating Grant, by the University of Calgary, and by the Department of Computer Science at the University of Calgary. The Alberta Microelectronics Center kindly furnished the location and kept us going with coffee throughout the week. The workshop organizers would like to thank those who made the job easy: the speakers who provided excellent written material beforehand, and the participants for many lively discussions which never got (completely) out of hand.

Graham Birtwistle, University of Calgary.
P.A. Subrahmanyam, AT&T Bell Laboratories.

1  Implementing Safety-Critical Systems: The VIPER Microprocessor

W. J. Cullyer
Computing Division, Royal Signals and Radar Establishment,
St. Andrews Road, Malvern, Worcs WR14 3PS, England

Introduction

During the last five years industry has progressively spread the use of commercial microprocessors into many areas of real-time control. This has been particularly marked in military projects and has happened in a number of civil applications, such as control of nuclear reactors and railway signalling. Some of these applications are 'safety-critical' in the sense that design errors could result in accidents and loss of life. Good examples of such critical applications are found in the avionics fitted in civil and military aircraft [5].

Based on work at RSRE it has been concluded that commercial microprocessors do not form a sound basis for equipments to be used in the most hazardous environments. The methods used to design mass produced general purpose devices are relatively informal and more often than not the resulting implementation contains design errors. Handbooks describing operation codes can be ambiguous and there is no guarantee that second and third sourced devices will have the same functionality as the parent. For these and other related reasons there is a requirement for high integrity processors and support chips produced by the most formal methods available. Such devices may find only a limited market but their use in vital 'black boxes' could be a major contribution to safety.

Overall, the RSRE research programme has three main limbs. Chronologically the first area tackled was the verification of computer programs using automated tools based on representation of the program text as directed graphs [1]. This has resulted in two separate sets of analysers for validation of programs, called MALPAS and SPADE, which are available commercially from UK companies. This topic is dealt with briefly in this paper, insofar as it affects the checking of programs written for the new VIPER microprocessor. The second limb of the programme is the research on hardware verification, described in this paper.
[This paper is Copyright © HMSO London 1986.]

Currently the techniques in use employ Michael Gordon's LCF-LSM, invented at Cambridge [9], and the ELLA logic description language, invented at RSRE [10]. The first chip produced by this means is the VIPER 32-bit microprocessor [6]. This has been conceived using the most formal methods available in order to satisfy certification authorities and equipment designers of the correctness of the practical devices. Two different realisations are being produced, based on specifications issued by RSRE. This paper describes the development of VIPER and points out some of the practical problems encountered over the two years of the project.

Informal proofs of correctness using LCF-LSM at the higher levels and ELLA nearer to the circuit level have been carried out at RSRE [7]. These methods are outlined in Section 2. Fully formal proofs, using Higher Order Logic (HOL), are being carried out by Avra Cohn and Michael Gordon at Cambridge, under contract to RSRE. It is intended to place these proofs in the public domain so that other scientists can check on the validity of the work.

The final and most recent limb of the RSRE programme is the study of safety-critical systems, up to the level of a complete 'black box', which may control some vital actuator. These studies embrace the whole area of n-channel systems (n = 2, 3, ...) and are intended to demonstrate how high integrity processors can be combined with comparators and error detection on memories to produce certifiable systems. Clearly the support (or 'glue') chips need to be formally specified and designed and this is itself an area of research. RSRE has placed a contract with Cambridge Consultants Ltd, Cambridge, UK, for work on these system topics.

In the end all of this activity is intended to produce safer equipment. So far there have been few deaths or injuries due to design faults in the hardware or software of computer-based systems.
However, if nothing is done to improve the standards of design and certainty in some application areas it is probable that the casualty rate due to flawed microelectronics will rise over the next decade.

It will be demonstrated in this paper that current methods of hardware verification are mature enough to be used for practical problems, on the scale of a simple 32-bit microprocessor. Based on this realisation there is every reason to suggest to certification authorities and to industry that formal methods should now be applied to critical devices. By paying equally close attention to the correctness of software it is believed that in the 1990's it should be possible to produce ultra-safe equipment for those regimes in which large numbers of human lives are at stake. However, the ability to do this will depend in part on scarce human resources, since knowledge of the requisite techniques has not spread far from the originating laboratories.

1  The Specification and Design of VIPER

The key feature about the new VIPER processor, Figure 1, is its simplicity. A more complex architecture would have taken the proofs beyond the current

state of the art.

[Figure 1: VIPER micro-processor architecture]

As will be seen from Figure 1, the device has an accumulator, A, two 'index registers', X and Y, and a program counter, P. There is a single-bit register, B, which holds the results of comparisons and can be concatenated with registers in shift operations. Unique to VIPER is the 'stop' signal from the ALU. Any illegal operation, arithmetic overflow or computation of an illegal address causes the device to stop and raise an exception on an external pin so that corrective action can be taken by the rest of the system.

All instructions have an identical format, shown in Figure 2. The logic of instruction decoding is outlined in Figure 3. Note that comparisons subsume all other operations. Table 1 gives a list of the comparison and ALU functions available in the prototype chips. The mf field of the instruction is used to select shift instructions when cf = 0 and ff = 12. This minor degree of overloading is acceptable since shifts are applied to registers and there is no 'memory read' in this context.

Table 1: Comparison and ALU Operations
(r = 32 bits from register, m = 32 bits from memory)

 ff | COMPARISON, cf = 1     | NOT COMPARISON, cf = 0
----+------------------------+---------------------------------
  0 | r < m                  | NEGATE m
  1 | r > m                  | CALL
  2 | r = m                  | READ from PERIPHERAL
  3 | r ≠ m                  | READ from MEMORY
  4 | r ≤ m                  | r + m, b := carry
  5 | r ≥ m                  | r + m, stop on overflow
  6 | unsigned r < m         | r - m, b := borrow
  7 | unsigned r > m         | r - m, stop on overflow
  8 | (r < m) OR b           | r XOR m
  9 | (r > m) OR b           | r AND m
 10 | (r = m) OR b           | r NOR m
 11 | (r ≠ m) OR b           | r AND NOT m
 12 | (r ≤ m) OR b           | mf = 0: SHIFT RIGHT, copy sign
    |                        | mf = 1: SHIFT RIGHT, through b
    |                        | mf = 2: SHIFT LEFT, stop on oflo
    |                        | mf = 3: SHIFT LEFT, through b
 13 | (r ≥ m) OR b           | illegal
 14 | (unsigned r < m) OR b  | illegal
 15 | (unsigned r > m) OR b  | illegal
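As an aside, the flavour of the cf = 0 column can be captured in a short executable sketch. This is illustrative only, not RSRE's specification: the function numbering follows Table 1, but the carry and overflow conventions shown here are assumptions.

```python
# Toy model of a few cf = 0 (NOT COMPARISON) functions from Table 1,
# illustrating the "stop on overflow" behaviour. Conventions are assumed.

WIDTH = 32
MASK = (1 << WIDTH) - 1

def signed(v):
    """Read a 32-bit pattern as a two's-complement integer."""
    return v - (1 << WIDTH) if v & (1 << (WIDTH - 1)) else v

def alu(ff, r, m, b):
    """Return (result, b, stop) for a handful of ff codes."""
    if ff == 4:                        # r + m, b := carry
        s = r + m
        return s & MASK, s > MASK, False
    if ff == 5:                        # r + m, stop on overflow
        s = signed(r) + signed(m)
        stop = not (-(1 << 31) <= s < (1 << 31))
        return s & MASK, b, stop
    if ff == 8:                        # r XOR m
        return (r ^ m) & MASK, b, False
    if ff == 11:                       # r AND NOT m
        return (r & ~m) & MASK, b, False
    raise ValueError("ff code not modelled in this sketch")
```

For example, adding 1 to the all-ones pattern under ff = 4 wraps to zero with the carry flag set, while adding 1 to the largest positive number under ff = 5 raises the stop condition.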

Some of the rationale for this architecture is as follows. One of the primary requirements was for the instruction decoder to be as simple as possible.

[Figure 2: Instruction format. An instruction comprises register select (rf), memory select (mf), destination select (df), comparison flag (cf) and function select (ff) fields, followed by a 20-bit constant, address or offset ('tail'). The fields decode as follows.

  REGISTER SELECT:    rf = 0 -> r = A;  rf = 1 -> r = X;  rf = 2 -> r = Y;  rf = 3 -> r = P (padded)
  MEMORY SELECT:      mf = 0 -> m = tail;  mf = 1 -> m = [tail];  mf = 2 -> m = [tail+X];  mf = 3 -> m = [tail+Y]
  COMPARISON FLAG:    cf = 0 -> NOT compare;  cf = 1 -> compare
  FUNCTION SELECT:    ff (see Table 1)
  DESTINATION SELECT: df = 0 -> destination = A;  df = 1 -> destination = X;  df = 2 -> destination = Y
                      df = 3 -> destination = P (unconditional)
                      df = 4 -> IF B THEN destination = P ELSE no-op
                      df = 5 -> IF B THEN no-op ELSE destination = P
                      df = 6 -> write to I/O;  df = 7 -> write to memory]

[Figure 3: Logic of instruction decoding. If cf = 1, the comparison is executed and its result assigned to B; if the instruction is a write (df = 6 or 7), the WRITE instruction is executed; otherwise the function defined by ff is executed.]


Secondly, a wide data word was needed for high resolution in digital filter applications. Thirdly, a rich set of comparison functions was desirable, based on the setting or clearing of a single flag, B.

No support has been provided for interrupts or for the manipulation of a stack. Both of these facilities are potentially hazardous. Interrupts mean that program validation is not feasible to the same depth as in a non-interruptable program. Stacks have the difficulties of overflow or trying to POP from an empty stack and are best avoided in safety-critical work.

As regards address range, 20 bits may seem excessive, since safety-critical programs should be short. However, the I/O space may need as much as one million words and to preserve uniformity the main memory space is made the same size, selection between the two being by a one-bit signal (MEM/IO) which essentially becomes a 21st address bit. In practice the ROM and RAM used with VIPER processors will probably be widely separated in the main memory address space, e.g. trusted ROMs at the lower end of memory and RAM at the top. Only static RAMs should be used with VIPER and the omission of any means of providing refresh to dynamic memories is deliberate.

The top level specification for VIPER was written in LCF-LSM and is reproduced as Annex A. This specification is based on the transformations of a vector:

state = (ram, P, A, X, Y, B, stop)

where ram implies the current contents of both address spaces; P, A, X, Y and B are the current contents of registers in VIPER; and stop is a Boolean which indicates if the machine has stopped. The function NEXT defined in Annex A gives all possible transitions for VIPER in the form:

NEXT(state) -> state
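A Python sketch may help fix intuitions about what such a transition function is: a pure function from one state vector to the next. The sketch below is illustrative only, not the LCF-LSM text of Annex A; the single "load constant" opcode is hypothetical.

```python
# Illustrative sketch of a NEXT-style state transformer. The state vector
# mirrors (ram, P, A, X, Y, B, stop); the opcode decoded here is invented.
from collections import namedtuple

State = namedtuple("State", "ram P A X Y B stop")

def next_state(s):
    """One instruction step: fetch at P, decode, execute."""
    if s.stop:                              # a stopped machine stays stopped
        return s
    inst = s.ram[s.P]
    op, tail = inst >> 20, inst & 0xFFFFF   # 12-bit fields, 20-bit tail
    if op == 0:                             # hypothetical opcode: A := tail
        return s._replace(A=tail, P=s.P + 1)
    return s._replace(stop=True)            # anything unmodelled stops the machine
```

Each call consumes one instruction; iterating next_state until stop holds runs the machine, which is exactly the sense in which NEXT describes "all possible transitions".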

The state which forms the argument is that just before the next instruction fetch. Notice that as illustrated in Annex A it proved possible to provide a formal specification of this processor in just six pages of A4, namely five pages of auxiliary functions and a single page definition of NEXT. Hierarchically, the next layer of documentation defines the 'major state machine' illustrated in Figure 4. This combination of elementary functions must be shown to conform precisely with the top level specification. The node 'dummy' has been included solely to act as the 'primary node' for the purposes of proof. It has to be established that all possible journeys round this diagram, from dummy back to dummy, produce the transformations required in the top level specification. To do this requires the augmented vector:

major = ((ram, P, A, X, Y, B, stop), (t, inst))

where t represents the contents of an internal register not accessible to the programmer and inst the current 12 bits of the instruction fields. Each of the

[Figure 4: State transitions in the major state machine]

nodes in Figure 4 produces the mapping:

MACHINE(major) -> (major, next_node)

where next_node is the number of one of the other nodes in the diagram.

The third layer of documents specifies the so-called 'electronic block model' shown in Figure 5. This is the lowest level of documentation issued by RSRE. Those wishing to fabricate VIPER in practice have to utilise their own in-house VLSI technology to produce each of the blocks shown. The specification for each of these blocks was defined initially in LCF-LSM and subsequently in the logic description language ELLA. There were a number of reasons for changing language. As the design moves closer towards circuit level, the definition of a Boolean variable has to be extended to allow for values such as x for 'irrelevant' and i for 'illegal'. ELLA allows this. Secondly, ELLA has been interfaced to various VLSI CAD systems using a package called 'ELLANET' which enables 'net lists' for chips to be transported to alien CAD environments.

In practice the translation of LCF-LSM into ELLA texts was straightforward. A library of LCF-LSM primitive functions was written in ELLA. This enabled LCF-LSM functions to be compiled in ELLA with only minor syntactic changes [11]. As a result there is considerable confidence that the two sets of texts are functionally identical. As regards complexity, the building blocks shown in Figure 5 can be specified completely in about thirty pages of A4. In addition, the industrial teams producing the prototype chips had a gate level design for each block, produced at RSRE, and this aided their task in understanding the functions of each module.

The bottom level of documentation is the 'circuit'. This level of documentation was produced by Ferranti, using their own design language (FDL), and Marconi, using HILO. In order to achieve comparison with the RSRE block model these texts were translated into ELLA, using software written at RSRE.
This was a vital step in the whole process of verification of the gate level design and is considered further in Section 2.3.

The above description gives the impression of 'top down' design. This was not the case in practice. There were several iterations of the top level specification before the final text of Annex A was agreed. Since no proofs were available in the early stages of the project, confidence was built up in the various texts largely by simulation, as described in Section 2.4. Based on this experience it is fair to conclude that at least three levels of documentation (top level specification, major state machine and electronic block model) have to be iterated in harmony to achieve a practical and utilitarian design. Detailed design at silicon level should be postponed until these three layers of documents have been frozen.

It is interesting to note that the manufacturers of the prototype VIPER chips made comparatively little use of the top level specification or the major state machine descriptions. Their designs were completed largely on the basis of the electronic block model. This suggests that the higher levels of specification can be left in the hands of small teams of specialists, with detailed design and

fabrication of chips being devolved to VLSI companies who do not necessarily need to employ experts in formal methods.

[Figure 5: VIPER electronic block model: timing control, instruction decoder, ALU, main data registers A, X, Y and P, B register and stop logic, and external interface]

2  Verification of the VIPER Design

2.1  Major State Machine to Top Level Specification

The informal proof of correspondence was done by the author at RSRE and took about three man weeks, by hand analysis, i.e. without the use of the Cambridge LCF-LSM tools. By deriving the spanning tree of all possible transitions of the major state machine, Figure 6, it proved possible to conduct a 24-limb proof in first-order logic. This established with a reasonable degree of certainty that the major state machine conformed with the top level specification. Subsequently, Avra Cohn and Michael Gordon, working with HOL versions of the texts at Cambridge and using their automated theorem prover have shown that this informal proof was flawed in some minor respects but overall was sound. This exercise has shown the importance of formal analysis using well founded axioms, rules and tactics to reveal the unexpected logical features of a specification. As this formal verification work in Cambridge has been completed it is safe to conclude that the major state machine is an accurate model which forms a rigorous basis for the lower levels of design [4].
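The spanning-tree style of case analysis can be sketched for a toy graph. The graph below is hypothetical (it is not the actual VIPER major state machine); the point is only that every round trip from the primary node back to itself becomes one limb of the proof.

```python
# Toy illustration of deriving the "limbs" of a case analysis: enumerate
# every simple path through a major-state graph from the primary node back
# to itself. The graph here is invented for illustration.

def limbs(graph, start):
    """All simple paths from start back to start (each a proof obligation)."""
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        for nxt in graph[path[-1]]:
            if nxt == start:
                paths.append(path + [nxt])
            elif nxt not in path:
                stack.append(path + [nxt])
    return paths

# Hypothetical miniature machine: dummy -> fetch -> {perform, stopstate} -> dummy
toy = {
    "dummy": ["fetch"],
    "fetch": ["perform", "stopstate"],
    "perform": ["dummy"],
    "stopstate": ["dummy"],
}
```

For this toy graph there are two limbs; for the real machine the analogous enumeration yields the 24 limbs mentioned above.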

2.2 Electronic Block Model to Major State Machine

This layer of proof is too extensive to carry out by hand and definitely requires automated theorem proving. At the time of writing (December 1986), this work is about to be started in Cambridge. Avra Cohn and Michael Gordon will be using their HOL tools on a SUN workstation to establish the correspondence of the electronic building blocks of Figure 5 with the major state machine. Inevitably this set of proofs will be voluminous if published in full, but it is intended to produce a detailed summary for review, probably as a monograph. The technique to be used relies on carrying out case analysis for the various node numbers in Figure 4. Details of how this was done for the simpler example of a 6-bit counter are given in Reference [3], in terms of the ML code defining the rules and tactics.

2.3 Circuit to Electronic Block Model

This proof uses the ELLA simulator and a technique devised by Clive Pygott of RSRE. The essence of the method, which is explained in full in Reference [11], is to carry out the equivalent of exhaustive simulation on the ELLA description of one block, such as the instruction decoder, in parallel with applying the same test vectors to an ELLA description of the implementation. To avoid the combinatorial explosion and totally unrealistic run times, extensive use is made

[Fig. 6: Spanning tree of possible major state transitions. Indexed nodes include 1. FETCH, 3. PRECALL, 4. PERFORM, 6. READIO, 7. READMEM, 8. STOPSTATE, 10. WRITEMEM, 11. WRITEIO and 16. DUMMY; paths which are not semantically feasible are marked.]


of the x (i.e. irrelevant) value of Boolean for all components of a test vector which do not affect the outcome of a particular test. By this means the quarter of a million tests which would have been required for the VIPER instruction decoder were thinned to just over 900 runs of the ELLA simulator. Indeed, the whole of the implementation of VIPER can be checked with only 5,500 or so such vectors. It is worth noting that all of the tests for one chip can be completed, on heavily loaded VAX 8600s, in about three hours of elapsed time.

In practice, both of the contractors fabricating the prototype chips made mistakes early on at circuit level when implementing particular blocks. The method outlined above proved to be powerful enough to detect the discrepancies. Both designs were corrected before net-lists were finalised and masks generated, with substantial savings in overall project timescales. It is believed that this method of 'intelligent exhaustive simulation' can be refined and turned into a practical tool for the VLSI industries. Work is in hand at RSRE towards this end.

At Cambridge this level of proof has been done in HOL. As an interesting comparison, Reference [3] shows how HOL can be used to prove the circuit of a 6-bit counter correct, whereas Reference [11] shows the same 'proof' done using the ELLA simulator. It is a relevant question to ask if a logic description language, such as ELLA, is needed as a stepping stone from the abstraction of HOL to current VLSI design languages. Experience on VIPER suggests that the introduction of ELLA was essential to gain industrial acceptance of the methods in use.
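The thinning idea can be illustrated with a toy sketch: a vector component set to 'x' stands for both Boolean values, so one thinned vector covers 2^k concrete tests. The multiplexer and the vectors below are invented for illustration; the real method operated on ELLA descriptions of VIPER blocks.

```python
from itertools import product

# Illustrative sketch of 'intelligent exhaustive simulation': an 'x'
# (irrelevant) component in a test vector stands for both Boolean values,
# so one thinned vector replaces many exhaustive tests. The block under
# test here is a toy 2-to-1 multiplexer, not a VIPER block.

def mux(sel, a, b, _unused):
    return a if sel else b

def expand(vector):
    """Expand a vector containing 'x' into all the concrete tests it covers."""
    choices = [(False, True) if v == "x" else (v,) for v in vector]
    return list(product(*choices))

# Two thinned vectors cover the sel=True behaviour entirely.
thinned = [
    (True, False, "x", "x"),   # sel=1, a=0: output must be 0 whatever b is
    (True, True, "x", "x"),    # sel=1, a=1: output must be 1 whatever b is
]

covered = 0
for vec in thinned:
    expected = mux(*expand(vec)[0])
    for concrete in expand(vec):
        assert mux(*concrete) == expected
        covered += 1
print(f"{len(thinned)} thinned vectors stood in for {covered} exhaustive tests")
```

The same principle, applied block by block, is what reduced a quarter of a million decoder tests to roughly 900 simulator runs.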

2.4 The Use of Simulation

In the absence of formal proofs of correspondence, extensive use was made of simulation to build up confidence in the various levels of documentation. Specifically, the following animations or simulations exist, running on VAX/VMS:

• An animation of the LCF-LSM top level specification, written in ALGOL68, based on a library of LCF-LSM primitives (WORDn, BITSn, etc.);
• An animation of the major state machine, written in ALGOL68;
• A simulation of the electronic block model in ELLA, based on a library of LCF-LSM primitives written in ELLA.

Software has been written to compare the outputs from these programs automatically when the three programs are driven with the same VIPER object code, as illustrated in Figure 7. Note also the links to the industrial VLSI CAD systems, which proved to be vital when interacting with circuit designers. It is important to comment on the relative speeds of these simulations. On VAX 8600 it is about 50 VIPER instructions per second at the top level, about 12 instructions per second for the major state machine and only 1 or 2 instructions per second for the ELLA simulation of the electronic block model.
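The automatic cross-comparison amounts to running the models in lock-step on the same object code and checking the visible state after each instruction. The following sketch is a hypothetical miniature: the three-instruction 'ISA' and both models are invented, standing in for the ALGOL68 animations and the ELLA simulation.

```python
# Hedged sketch of the automatic cross-comparison: drive two models of the
# same machine with one program and check the visible state after every
# instruction. The toy ISA below is invented for illustration; the real
# comparison drove the top level spec, the major state machine and the
# ELLA block model with genuine VIPER object code.

def top_level_step(state, op):
    a, p = state
    if op == "INC":
        return (a + 1, p + 1)
    if op == "DBL":
        return (a * 2, p + 1)
    return (a, p + 1)            # NOP

def block_model_step(state, op):
    # A 'lower level' model: same visible behaviour, different phrasing.
    a, p = state
    result = {"INC": a + 1, "DBL": a + a}.get(op, a)
    return (result, p + 1)

program = ["INC", "DBL", "NOP", "INC"]
s1 = s2 = (0, 0)
for op in program:
    s1, s2 = top_level_step(s1, op), block_model_step(s2, op)
    assert s1 == s2, f"models diverge after {op}: {s1} vs {s2}"
print("models agree:", s1)
```

A divergence at any instruction pinpoints the level (and roughly the construct) at which the documents disagree, which is what made the comparison software valuable during the iterations described above.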

[Fig. 7: Simulations of VIPER. Software written by RSRE for the VIPER project: the top level specification, major state machine and electronic block model simulations, with links to the industrial VLSI CAD systems, including Marconi's.]


As might be expected, many hundreds of hours of VAX time were used at RSRE, most of this on ELLA simulations. The so-called 'animations' of the LCF-LSM texts proved to be very valuable. When dealing with abstract definitions such simulators produce a great deal of engineering confidence and can be used for practical demonstrations to prospective customers for the ultimate device. It is not clear if this comment applies to work in HOL, but based on the experience of the RSRE team to date this is a topic worth pursuing. It should be straightforward to simulate higher level HOL documents but not so easy to animate HOL descriptions of circuits where existential quantifiers are used to hide internal wires.

3 Implementing VIPER

In principle, VIPER could be fabricated in any VLSI technology, including cell arrays, semi-custom and so on. The route chosen for the prototypes was the use of cell arrays, namely:

• The Ferranti Electronics Ltd Uncommitted Logic Arrays (ULA), based on a bipolar technology, and
• The UK5000 cell arrays, as fabricated by Marconi Electronic Devices Ltd, which is a CMOS technology.

The companies worked in parallel and independently to produce designs for the electronic building blocks which make up the VIPER processor. Neither company had ELLA in use, and therefore schematic capture and initial simulations were done on their current CAD systems, based on existing libraries. Once these designs had been produced to the level of 'net lists', the descriptions were sent back to RSRE on magnetic tape. Using software written at RSRE, these texts were converted into ELLA so that the 'intelligent exhaustive testing' method described in Section 2.3 could be applied to detect any gate level mistakes.

Given the large amount of detail available at RSRE, it then proved possible to use the ELLA simulations of the two designs to generate test pattern tapes in Malvern, as illustrated in Figure 7. These tapes were used by the contractors to run acceptance tests at the ends of the respective production lines. To achieve this, simple programs were written which exercised all of the registers and facilities in VIPER. These were run through the ELLA simulator to generate the necessary waveforms on the external pins of the chip, and these binary sequences were converted at RSRE into formats appropriate for the probe test machines in use by Marconi and Ferranti. This unusually close liaison between a customer for a committed logic array and the silicon house undoubtedly contributed a great deal to the success of the project overall.


4 Programming VIPER

4.1 Choice of Language

Clearly assemblers and compilers could be written for VIPER to implement virtually any language, and there is some point in sticking to existing languages to gain commercial acceptance. However, for the most safety-critical systems the collateral work at RSRE on software verification suggests that a fresh language is needed. This seemingly radical decision has been taken after study of the options for writing software for potentially hazardous regimes. Each language was considered under the following headings:

• Existence of 'dangerous' constructs in the language;
• Knowledge of the operational semantics of the language, which affects the ability to carry out static code analysis and/or formal verification;
• Existence of a rigorous numerical model underlying the arithmetic, and
• Existence of any 'trusted' subsets of the language.

Using these criteria, assembler level languages were compared with PASCAL, CORAL 66 and Ada. Overall it was concluded that none of these languages was satisfactory, although a subset of PASCAL produced by Carre shows great promise and has been used on several critical UK projects [2]. Accordingly, it was decided to produce a new high order language for programming VIPER, which has been called 'NewSpeak', since it was designed in 1984 and is calculated to stop programmers thinking 'wrong thoughts'.

If it is worth producing a new language, then its definition and implementation must show major improvements over anything in the market place; otherwise industry will not be prepared to accept the overhead of training. Ian Currie of RSRE has designed the NewSpeak language to fulfil this requirement. As well as a compiler for VIPER, the language may well be valuable for machines such as the 68000, and it is hoped that other NewSpeak compilers will emerge over the next three to four years.

4.2 NewSpeak

The language is described in more detail in Reference [8]. In this section an outline of the chief features is presented, sufficient to show the interactions which have taken place between the design of the VIPER architecture and the design of the NewSpeak language. Among the main features of NewSpeak are:

• Range-monitoring of both real and integer variables, with most of the work being done by the compiler, i.e. very little run-time overhead;


• Accuracy monitoring of real values, so that poor numerical significance in an algorithm is spotted at compile time;
• Assertions, so that the programmer can incorporate collateral knowledge (e.g. the accuracy of an algorithm) in a form which the compiler can use;
• The 'flavouring' of types, e.g. two real variables might be flavoured METRES and SECONDS; the compiler has rules for combining flavours, some of which deliver a new flavour (e.g. METRES/SECOND);
• All NewSpeak constructs are bounded in time and storage allocation. Time critical programs can be written without the use of interrupts by polling the external lines at regular intervals, using the compiler diagnostics to check that the time limits are not exceeded.

It will be realised from the above list that NewSpeak is a far more rigorous language than PASCAL or Ada. Its very strong typing, static storage allocation and guarantees on run time will make it uniquely suited for safety-critical work. It should be available commercially at the start of 1990. By combining this new language with the VIPER microprocessor, it is believed that new standards of certainty can be reached in the development of high integrity equipment.
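The flavour rules can be pictured with a small dynamic model: each value carries a flavour, addition demands identical flavours, and division delivers a new flavour such as METRES/SECOND. The sketch below is purely illustrative; NewSpeak applies such rules statically, at compile time, and its syntax is nothing like this.

```python
# Illustrative model of 'flavoured' values: a flavour is a mapping from unit
# names to integer powers, e.g. {"METRES": 1, "SECOND": -1} for METRES/SECOND.
# This is a hypothetical run-time analogue of checks NewSpeak does statically.

class Flavoured:
    def __init__(self, value, flavour):
        self.value = value
        self.flavour = flavour

    def __add__(self, other):
        # Addition is only legal between identically flavoured values.
        if self.flavour != other.flavour:
            raise TypeError("cannot add differently flavoured values")
        return Flavoured(self.value + other.value, self.flavour)

    def __truediv__(self, other):
        # Division combines flavours by subtracting unit powers.
        flavour = dict(self.flavour)
        for unit, power in other.flavour.items():
            flavour[unit] = flavour.get(unit, 0) - power
            if flavour[unit] == 0:
                del flavour[unit]
        return Flavoured(self.value / other.value, flavour)

distance = Flavoured(100.0, {"METRES": 1})
time = Flavoured(8.0, {"SECOND": 1})
speed = distance / time          # delivers the new flavour METRES/SECOND
print(speed.value, speed.flavour)

try:
    distance + time              # flavour clash: rejected
except TypeError as err:
    print("rejected:", err)
```

The point of doing this at compile time, as NewSpeak does, is that a dimensional error such as adding metres to seconds is ruled out before the program can ever run.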

5 Conclusions

The VIPER project has shown that the methods of hardware verification developed by Michael Gordon at Cambridge, when combined with the use of a modern hardware description language, ELLA, are powerful enough to use on practical projects. It is probable that HOL/ELLA will replace LCF-LSM/ELLA as the working tools at RSRE within the next twelve months. Two companies in the UK VLSI industry have proven that they can interface to this formal regime using their standard logic array technologies.

On the programming side, RSRE has concluded that no existing language is safe enough for the highest integrity systems. A new high order language, NewSpeak, has been defined for use with VIPER. By rigorous application of typing, range checks on variables and other features it is believed that an ultra-safe language can be produced, with commercial applications post-1990.

Aside from verified processors and a high integrity language, great care must be taken with the overall system design when implementing safety-critical systems. Multi-channel implementations will be mandatory, even if VIPER processors are in use in the individual lanes and NewSpeak has been used for programming. It is believed that all of the steps advocated in this paper will have to be taken if the public is to be protected from the consequences of design errors in computer-based real-time control systems.


Acknowledgements

VIPER has been developed by a team at RSRE made up of Dr J. Kershaw, Dr C. H. Pygott and the author. The formal verification work at Cambridge is being conducted by Dr A. Cohn and Dr M. J. C. Gordon. Mr I. F. Currie of RSRE is responsible for the design and specification of NewSpeak.

References

[1] Bramson B. D., "Malvern's Program Analysers", RSRE Research Review, 1985.
[2] Carre B. A. and Debney C., "SPADE Pascal", Program Validation Ltd., UK, July 1985.
[3] Cohn A. and Gordon M. J. C., "A Mechanized Proof of Correctness of a Simple Counter", University of Cambridge Computer Laboratory, Technical Report No. 94, July 1986.
[4] Cohn A., "A Proof of Correctness of the VIPER Microprocessor: The First Level", in: VLSI Specification, Verification and Synthesis (this volume).
[5] Cullyer W. J., "Hardware Integrity", Aeronautical Journal, Royal Aeronautical Society, UK, September 1985.
[6] Cullyer W. J., "VIPER Microprocessor: Formal Specification", RSRE Report 85013, October 1985.
[7] Cullyer W. J., "VIPER: Correspondence Between Specification and Major State Machine", RSRE Report 86004, January 1986.
[8] Cullyer W. J. and Kershaw J., "Programming the VIPER Microprocessor", Proceedings of Conference on Safety and Security, Glasgow, UK, October 1986.
[9] Gordon M. J. C., "LCF-LSM", University of Cambridge Computer Laboratory, Technical Report No. 41, 1983.
[10] Morison J. et al., "ELLA: Hardware Description or Specification?", Proceedings IEEE International Conference, CAD-84, Santa Clara, November 1984.
[11] Pygott C. H., "Formal Proof of Correspondence Between a Hardware Module and its Gate Level Implementation", RSRE Report 85012, June 1985.


Annex A: Top Level Specification of VIPER in LCF-LSM

VALUE : (word32 # bool # bool) -> word32
VALUE (result, carry, overflow) == result

OFLO : (word32 # bool # bool) -> bool
OFLO (result, carry, overflow) == overflow

CARRY : (word32 # bool # bool) -> bool
CARRY (result, carry, overflow) == carry

SVAL : (word32 # bool # bool) -> bool
SVAL (result, b, abort) == abort

BVAL : (word32 # bool # bool) -> bool
BVAL (result, b, abort) == b

TRIM33TO20 : word33 -> word20
TRIM33TO20 (w) == (WORD20 (V (SEG (0, 19) (BITS33 w))))

TRIM32TO20 : word32 -> word20
TRIM32TO20 (w) == (WORD20 (V (SEG (0, 19) (BITS32 w))))

TRIM34TO32 : word34 -> word32
TRIM34TO32 (w) == WORD32 (V (TL (TL (BITS34 w))))

PAD20TO32 : word20 -> word32
PAD20TO32 (w) == WORD32 (VAL20 w)

SIGNEXT : word32 -> word33
SIGNEXT (w) == LET bitlist = BITS32 w IN
  (WORD33 (V (CONS (EL 31 bitlist) bitlist)))

RIGHT : (bool # word32) -> word32
RIGHT (b, r) == (WORD32 (V (CONS b (SEG (1, 31) (BITS32 r)))))

LEFT : (word32 # bool) -> word32
LEFT (r, b) == LET twice = V (TL (BITS32 r)) IN
  (b => (WORD32 ((twice + twice) + 1)) | (WORD32 (twice + twice)))

RIGHTARITH : word32 -> word32
RIGHTARITH (r) == LET sign = EL 31 (BITS32 r) IN
  (WORD32 (V (CONS sign (SEG (1, 31) (BITS32 r)))))


NEG : (word33) -> num
NEG (m) == (((VAL33 m) = 0) => 0 | (VAL33 (NOT33 m) + 1))

ADD32 : (word32 # word32) -> (word32 # bool # bool)
ADD32 (r, m) ==
  LET sum = WORD34 ((VAL33 (SIGNEXT r)) + (VAL33 (SIGNEXT m))) IN
  LET opposite = (EL 31 (BITS32 r)) XOR (EL 31 (BITS32 m)) IN
  (TRIM34TO32 sum,
   (EL 32 (BITS34 sum)) XOR opposite,
   ((EL 32 (BITS34 sum)) XOR (EL 31 (BITS34 sum))))

SUB32 : (word32 # word32) -> (word32 # bool # bool)
SUB32 (r, m) ==
  LET dif = WORD34 ((VAL33 (SIGNEXT r)) + (NEG (SIGNEXT m))) IN
  LET opposite = (EL 31 (BITS32 r)) XOR (EL 31 (BITS32 m)) IN
  (TRIM34TO32 dif,
   (EL 32 (BITS34 dif)) XOR opposite,
   ((EL 32 (BITS34 dif)) XOR (EL 31 (BITS34 dif))))

INCP32 : (word20) -> word32
INCP32 (p) == (VALUE (ADD32 (PAD20TO32 p) (WORD32 1)))

COMPARE : (word4 # word32 # word32 # bool) -> bool
COMPARE (fsf, r, m, b) ==
  LET op = VAL4 fsf IN
  LET dif = WORD34 ((VAL33 (SIGNEXT r)) + (NEG (SIGNEXT m))) IN
  LET equal = (r = m) IN
  LET less = EL 32 (BITS34 dif) IN
  LET borrow = (EL 32 (BITS34 dif)) XOR
               ((EL 31 (BITS32 r)) XOR (EL 31 (BITS32 m))) IN
  (op = 0 => less |
   op = 1 => NOT less |
   op = 2 => equal |
   op = 3 => NOT equal |
   op = 4 => (less OR equal) |
   op = 5 => NOT (less OR equal) |
   op = 6 => borrow |
   op = 7 => NOT borrow |
   op = 8 => (less OR b) |
   op = 9 => ((NOT less) OR b) |
   op = 10 => (equal OR b) |
   op = 11 => ((NOT equal) OR b) |
   op = 12 => ((less OR equal) OR b) |


   op = 13 => ((NOT (less OR equal)) OR b) |
   op = 14 => (borrow OR b) |
   ((NOT borrow) OR b))

REG : (word2 # word32 # word32 # word32 # word20) -> word32
REG (rsf, a, x, y, p) ==
  LET r = VAL2 rsf IN
  (r = 0 => a | r = 1 => x | r = 2 => y | PAD20TO32 p)

OFFSET : (word2 # word20 # word32 # word32) -> word32
OFFSET (msf, addr, x, y) ==
  LET mf = VAL2 msf IN
  LET addr32 = PAD20TO32 addr IN
  (mf = 0 => addr32 |
   mf = 1 => addr32 |
   mf = 2 => (VALUE (ADD32 addr32 x)) |
   (VALUE (ADD32 addr32 y)))

INVALID : (word32) -> bool
INVALID (value) == (NOT (value = (PAD20TO32 (TRIM32TO20 value))))

INSTFETCH : (mem21_32 # word20) -> word32
INSTFETCH (ram, p) == (FETCH21 ram (WORD21 (V (CONS F (BITS20 p)))))

MEMREAD : (mem21_32 # word2 # word20 # word32 # word32 # bool # bool) -> word32
MEMREAD (ram, msf, addr, x, y, io, nil) ==
  LET m = VAL2 msf IN
  (nil => WORD32 0 |
   m = 0 => PAD20TO32 addr |
   (FETCH21 ram (WORD21
     (V (CONS io (BITS20 (TRIM32TO20 (OFFSET msf addr x y))))))))

MEMWRITE : (mem21_32 # word32 # word2 # word20 # word32 # word32 # bool)
           -> mem21_32


MEMWRITE (ram, source, msf, addr, x, y, io) ==
  LET m = VAL2 msf IN
  (m = 0 => ram |
   (STORE21 (WORD21
      (V (CONS io (BITS20 (TRIM32TO20 (OFFSET msf addr x y))))))
    source ram))

ALU : (word4 # word2 # word3 # word32 # word32 # bool) -> (word32 # bool # bool)
ALU (fsf, msf, dsf, r, m, b) ==
  LET ff = VAL4 fsf IN
  LET mf = VAL2 msf IN
  LET df = VAL3 dsf IN
  LET pwrite = (df = 3) OR (df = 4) OR (df = 5) IN
  (ff = 0 => ((NOT32 m), b, pwrite) |
   ff = 1 => (m, b, ((NOT pwrite) OR (INVALID m))) |
   ff = 2 => (m, b, pwrite) |
   ff = 3 => (m, b, (pwrite AND (INVALID m))) |
   ff = 4 => (LET sum = ADD32 (r, m) IN
              ((VALUE sum), (CARRY sum), pwrite)) |
   ff = 5 => (LET sum = ADD32 (r, m) IN
              ((VALUE sum), b,
               (OFLO sum) OR (pwrite AND (INVALID (VALUE sum))))) |
   ff = 6 => (LET dif = SUB32 (r, m) IN
              ((VALUE dif), (CARRY dif), pwrite)) |
   ff = 7 => (LET dif = SUB32 (r, m) IN
              ((VALUE dif), b,
               (OFLO dif) OR (pwrite AND (INVALID (VALUE dif))))) |
   ff = 8 => (((r OR32 m) AND32 (NOT32 (r AND32 m))), b, pwrite) |
   ff = 9 => ((r AND32 m), b, pwrite) |
   ff = 10 => ((NOT32 (r OR32 m)), b, pwrite) |
   ff = 11 => ((r AND32 (NOT32 m)), b, pwrite) |
   ff = 12 => (mf = 0 => ((RIGHTARITH r), b, pwrite) |
               mf = 1 => ((RIGHT (b, r)), (EL 0 (BITS32 r)), pwrite) |
               mf = 2 => (LET double = ADD32 (r, r) IN
                          ((VALUE double), b, (OFLO double) OR pwrite)) |


               (LEFT (r, b), (EL 31 (BITS32 r)), pwrite)) |
   ff = 13 => (r, b, T) |
   ff = 14 => (r, b, T) |
   (r, b, T))

WRITE : (word3 # word1) -> bool
WRITE (dsf, csf) ==
  LET df = VAL3 dsf IN LET cf = VAL1 csf IN
  ((cf = 0) AND ((df = 7) OR (df = 6)))

NILM : (word3 # word1 # word4) -> bool
NILM (dsf, csf, fsf) ==
  LET df = VAL3 dsf IN LET cf = VAL1 csf IN LET ff = VAL4 fsf IN
  ((cf = 0) AND NOT ((df = 7) OR (df = 6)) AND (ff = 12))

NOOP : (word3 # word1 # bool) -> bool
NOOP (dsf, csf, b) ==
  LET df = VAL3 dsf IN LET cf = VAL1 csf IN
  ((cf = 0) AND (((df = 5) AND b) OR ((df = 4) AND (NOT b))))

SPAREFUNC : (word3 # word1 # word4) -> bool
SPAREFUNC (dsf, csf, fsf) ==
  LET df = VAL3 dsf IN LET cf = VAL1 csf IN LET ff = VAL4 fsf IN
  ((cf = 0) AND NOT ((df = 6) OR (df = 7)) AND
   ((ff = 13) OR (ff = 14) OR (ff = 15)))

ILLEGALCALL : (word3 # word1 # word4) -> bool
ILLEGALCALL (dsf, csf, fsf) ==
  LET df = VAL3 dsf IN LET cf = VAL1 csf IN LET ff = VAL4 fsf IN
  ((cf = 0) AND (ff = 1) AND ((df = 0) OR (df = 1) OR (df = 2)))

ILLEGALPDEST : (word3 # word1 # word4) -> bool
ILLEGALPDEST (dsf, csf, fsf) ==
  LET df = VAL3 dsf IN LET cf = VAL1 csf IN LET ff = VAL4 fsf IN
  ((cf = 0) AND ((df = 3) OR (df = 4) OR (df = 5)) AND
   NOT ((ff = 1) OR (ff = 3) OR (ff = 5) OR (ff = 7)))


ILLEGALWRITE : (word3 # word1 # word2) -> bool
ILLEGALWRITE (dsf, csf, msf) ==
  LET mf = VAL2 msf IN
  ((WRITE dsf csf) AND (mf = 0))

OUTPUT : (word3 # word1) -> bool
OUTPUT (dsf, csf) ==
  LET df = VAL3 dsf IN LET cf = VAL1 csf IN
  ((cf = 0) AND (df = 6))

INPUT : (word3 # word1 # word4) -> bool
INPUT (dsf, csf, fsf) ==
  LET df = VAL3 dsf IN LET cf = VAL1 csf IN LET ff = VAL4 fsf IN
  ((cf = 0) AND NOT ((df = 7) OR (df = 6)) AND (ff = 2))

NEXT : (mem21_32 # word20 # word32 # word32 # word32 # bool # bool) ->
       (mem21_32 # word20 # word32 # word32 # word32 # bool # bool)
NEXT (ram, p, a, x, y, b, stop) ==
  LET instbits = BITS32 (INSTFETCH ram p) IN
  LET newp = TRIM32TO20 (INCP32 p) IN
  LET rsf = WORD2 (V (SEG (30, 31) instbits)) IN
  LET msf = WORD2 (V (SEG (28, 29) instbits)) IN
  LET dsf = WORD3 (V (SEG (25, 27) instbits)) IN
  LET csf = WORD1 (V (SEG (24, 24) instbits)) IN
  LET fsf = WORD4 (V (SEG (20, 23) instbits)) IN
  LET addr = WORD20 (V (SEG (0, 19) instbits)) IN
  LET df = VAL3 dsf IN LET cf = VAL1 csf IN LET ff = VAL4 fsf IN
  LET comp = (cf = 1) IN
  LET call = (cf = 0) AND (ff = 1) IN
  LET output = (OUTPUT dsf csf) IN
  LET input = (INPUT dsf csf fsf) IN
  LET io = (output OR input) IN
  LET writeop = (WRITE dsf csf) IN
  LET skip = (NOOP dsf csf b) IN
  LET notinc = INVALID (INCP32 p) IN
  LET illegaladdr = (NOT (NILM dsf csf fsf)) AND NOT skip AND
                    (INVALID (OFFSET msf addr x y)) IN


  LET illegalcl = (ILLEGALCALL dsf csf fsf) IN
  LET illegalsp = (SPAREFUNC dsf csf fsf) IN
  LET illegalonp = (ILLEGALPDEST dsf csf fsf) IN
  LET illegalwr = (ILLEGALWRITE dsf csf msf) IN
  LET source = (REG rsf a x y newp) IN
  (stop => (ram, p, a, x, y, b, T) |    {machine has stopped}
   ((notinc OR illegaladdr) OR (illegalcl OR illegalsp) OR
    (illegalonp OR illegalwr)) => (ram, newp, a, x, y, b, T) |
   comp => (ram, newp, a, x, y,
            COMPARE (fsf source (MEMREAD ram msf addr x y io F) b), F) |
   writeop => ((MEMWRITE ram source msf addr x y io), newp, a, x, y, b, F) |
   skip => (ram, newp, a, x, y, b, F) |
   (LET m = (MEMREAD ram msf addr x y io (NILM dsf csf fsf)) IN
    LET aluout = (ALU fsf msf dsf source m b) IN
    (df = 0 => (ram, newp, VALUE aluout, x, y, BVAL aluout, SVAL aluout) |
     df = 1 => (ram, newp, a, VALUE aluout, y, BVAL aluout, SVAL aluout) |
     df = 2 => (ram, newp, a, x, VALUE aluout, BVAL aluout, SVAL aluout) |
     call => (ram, (TRIM32TO20 (VALUE aluout)), a, x, (INCP32 p),
              BVAL aluout, SVAL aluout) |
     (ram, (TRIM32TO20 (VALUE aluout)), a, x, y,
      BVAL aluout, SVAL aluout))))


2 A Proof of Correctness of the Viper Microprocessor: The First Level

Avra Cohn
University of Cambridge Computer Laboratory
Corn Exchange Street
Cambridge, CB2 3QG, England

Abstract: The Viper microprocessor designed at the Royal Signals and Radar Establishment (RSRE) is one of the first commercially produced computers to have been developed using modern formal methods. Viper is specified in a sequence of decreasingly abstract levels. In this paper a mechanical proof of the equivalence of the first two of these levels is described. The proof was generated using a version of Robin Milner's LCF system.

1 Introduction

The Viper microprocessor designed at the Royal Signals and Radar Establishment (RSRE) is one of the first commercially produced computers to have been developed using modern formal methods. Viper is specified in a sequence of decreasingly abstract levels. In this paper a mechanical proof of the equivalence of the first two of these levels is described. The approach used for this proof is based on HOL, a version of Robin Milner's LCF proof generating system. There are two reasons for verifying at successive levels of abstraction. First, though it is ultimately a circuit that is being built, and the correctness of this circuit is the most important concern, successive levels of description represent successive stages in the development of the implementation, and one would like to know if these levels are correct. It is possible for the circuit to be correct even if a more abstract specification is incorrect, but the error is still worth knowing about. In fact, some minor errors were discovered, during the machine-checked proof, in the two specifications of Viper. As far as we know, the actual implementation does not reflect these difficulties. (See Section 7 for details.) Second, working along a sequence of levels makes the verification of the circuit a more tractable problem; there is a logical transformation and an introduction of detail at each stage, and these can then be treated separately. We intend to continue the machine-checked proof of Viper down to more circuit-like levels of abstraction in the future.


This paper is intended to be self contained, but is necessarily brief on background material. A description of Viper can be found in Kershaw [13]; the two specifications can be found in Cullyer [4] and Cullyer [5]. (See also Cullyer [6].) The machine proof is based on an informal proof outline given in [5]. A description of the HOL system is given in Gordon [10]. More generally, the LCF approach to proof is described in the successive manuals [7] and [15]. [2] is a study by Cohn and Gordon of a similar but very much simpler machine-checked proof (of the correctness of a counter); it is based on Cullyer and Pygott [3]. We know of three other machine-checked proofs of computers: Gordon [8] verified a PDP-8 style machine in a precursor of HOL called LCF-LSM. This was redone and improved in HOL by Joyce [12]. In the Boyer-Moore paradigm, Hunt [11] has verified a PDP-11 like machine. Neither of these machines was intended for serious use, as is Viper.

The design of Viper, the high-level specification and host machine are due to Cullyer, Kershaw and Pygott. The only differences in the specifications as they appear in this paper are that they have been put into HOL format, and have been corrected for errors. The idea of viewing the host machine as a transition graph is also due to Cullyer [5], as is the informal proof outline [5]. We have formalised the notion of traversing the graph, made the notion of time explicit, formulated the correctness statements, and generated a machine-checked proof.

1.1 The HOL System

1.1.1 Proof in HOL

HOL, like LCF, is a system for generating formal proofs. In HOL, a logic in which problems can be expressed is interfaced to a programming language called ML in which proof strategies can be encoded. The logic is conventional higher-order logic (Church, [1]). It is oriented towards hardware verification only in that it provides types, constants and axioms for representing bit strings. New types, constants and axioms can be introduced by the user, and organised into hierarchies of logical theories.

The type discipline of the programming language ensures that the only way to create theorems is by performing a proof; theorems have the ML type thm, objects of which type can only be constructed by the application of inference rules to other theorems or axioms. (Theorems are written with a turnstile, |-, in front of them.) Formal proofs (sequences of elements each of which is either an axiom or a theorem which follows from earlier elements of the sequence by a rule of inference) are generated in HOL in the sense that for each element of the sequence which is not an axiom, an ML procedure representing the rule of inference is executed to produce that element. That the final element, the fact proved, is actually a theorem is guaranteed by the type discipline of the programming language. The sequences themselves are not retained. LCF-style proof is discussed


further in Sections 5.1 and 5.4.

1.1.2 The Logic

The HOL system uses the ASCII characters -, \/, /\, ==>, ! and \ to represent the logical symbols ¬, ∨, ∧, ⊃, ∀ and λ respectively. For the purposes of this paper, a term of higher-order logic can be one of eleven kinds.

o A variable
o A constant such as T or F (which represent the truth-values true and false respectively)
o A function application of the form t1 t2 where the term t1 is called the operator and the term t2 the operand
o An abstraction of the form \x. t where the variable x is called the bound variable and the term t the body
o A negation of the form -t where t is a term
o A conjunction of the form t1/\t2 where t1 and t2 are terms
o A disjunction of the form t1\/t2 where t1 and t2 are terms
o An implication of the form t1==>t2 where t1 and t2 are terms
o A universal quantification of the form !x. t where the variable x is the bound variable and the term t is the body
o A conditional of the form t=>t1|t2 where t, t1 and t2 are terms; this has if-part t, then-part t1 and else-part t2
o A "local declaration" of the form let x=t1 in t2, where x is a variable and t1 and t2 are terms; this is provably equivalent to (\x. t2)t1 (see Section 5.1)

All terms in HOL have a type. The expression t:ty means t has type ty; for example, the expressions T:bool and F:bool indicate that the truth-values T and F have type bool for boolean, and 3:num indicates that 3 is a number. If ty is a type then (ty)list (also written ty list) is the type of lists whose components have type ty. If ty1 and ty2 are types, then ty1->ty2 is the type of functions whose arguments have type ty1 and results of type ty2. The cartesian product operator is represented by #, so that ty1#ty2 is the type of pairs whose first components have type ty1 and second, ty2.

The HOL system provides a number of predefined types and constants for reasoning about hardware. The types include wordn, the type of n-bit words, and memn1_n2 for memories of n2-bit words addressed by n1-bit words. #bn-1...b0 (where bi is either 0 or 1) denotes an n-bit word in which b0 is the least significant bit. The predefined constants used in this paper are shown below.


o V:(bool)list->num converts a list of truth-values to a number
o VALn:wordn->num converts an n-bit word to a number
o BITSn:wordn->(bool)list converts an n-bit word to a list of booleans
o WORDn:num->wordn converts a number to an n-bit word
o FETCHn:memn1_n2->(wordn1->wordn2) looks up a word in memory

List functions include HD for taking the head of a list and CONS for constructing lists. [t1;...;tn] denotes the list containing t1,...,tn. To make terms more readable, HOL uses certain conventions. One is that a term t1 t2...tn abbreviates (...(t1 t2)...tn); function application associates to the left. The product operator # associates to the right and binds more tightly than the operator ->. For example, the function used to model Viper's ALU (see Section 3.1) has type

word4#word2#word3#word32#word32#bool->word32#bool#bool

which abbreviates

(word4#(word2#(word3#(word32#(word32#bool)))))->(word32#(bool#bool))

From the axioms defining the various constants we can prove

Theorem 0: |- !b. HD(BITS1(WORD1(V[b]))) = b
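These conversion constants are simple enough to model executably. The Python analogue below assumes a least-significant-bit-first list ordering (an assumption of this sketch, not a claim about HOL's representation) and checks the counterpart of Theorem 0 by exhaustion.

```python
# A toy executable analogue of the HOL word-conversion constants.
# Assumption of this sketch only: bit lists put the least significant
# bit at the head, and an n-bit word is an int reduced mod 2**n.

def V(bits):                 # (bool)list -> num
    return sum(1 << i for i, b in enumerate(bits) if b)

def WORD(n, value):          # num -> wordn
    return value % (2 ** n)

def BITS(n, word):           # wordn -> (bool)list
    return [bool((word >> i) & 1) for i in range(n)]

def HD(xs):
    return xs[0]

# The analogue of Theorem 0, checked by exhaustion over both truth-values:
for b in (False, True):
    assert HD(BITS(1, WORD(1, V([b])))) == b
print("Theorem 0 analogue holds for both truth-values")
```

Of course, in HOL the theorem is proved once from the axioms defining the constants, rather than checked by running cases; the sketch merely shows what the statement says.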

What is Proved

At the most abstract level, Viper can be viewed as a function from a state of the machine to a new state, where a state reflects the configuration of the memory, registers and program counter. This is called the high-level or functional specification. The functional specification contains an operational semantics of the Viper instruction set. It carries an implicit notion of time in the concepts of current and next state. Each state transition at the top level is implemented by a sequence of events at the lower level (the host or major state machine), each of which may affect the internal state of the machine. The sequence is determined as the events are performed, according to the internal state and the current event. The possible sequences define a graph of events, each event pointing to one or more others, and one designated initial. The lower level can thus be viewed as a function from internal states and nodes in the graph to new internal states and new nodes in the graph. An internal state is more detailed than a high-level state; it includes the high-level state and also the state of some internal registers. The latter are collectively called the transient. The graph is traversed by starting at a fixed node at a fixed time, and moving from node to node until the initial node is

A Proof of Correctness of the Viper Microprocessor

31

reached again. The host machine is modelled with an explicit notion of time in the sequencing and accumulation of the effects. (This level of description is a first step toward a description of the Viper hardware.) The host and target machine time scales are different, but coincide at intervals. In fact, the host machine provides a specific event (at the beginning of each sequence) during which registers may be examined. Because the notions of time are related in this way, the proof that the two levels of description correspond is not logically difficult. It involves examining each possible sequence of events at the lower level and relating its cumulative effect (the visible part of it) to the corresponding state transition at the higher level. As it happens, there are twenty-four such sequences, hence twenty-four cases to consider in the proof. To give an idea of what is being proved, and how the proof is generated mechanically, one of the twenty-four cases is examined in some detail: the execution of a procedure call to a literal address.
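The shape of this two-level argument can be sketched with a toy Python model (entirely invented, not Viper itself): a "spec" machine takes one large step, a "host" machine takes several micro-steps through a small graph, and correctness says that when the host returns to its initial node, its accumulated state equals the spec's one-step result.

```python
# A miniature two-level machine (illustrative only, not Viper).
# The spec maps a state to a new state in one step; the host threads an
# extra "transient" and a graph node through several micro-steps.

def spec_next(state):
    # one abstract step: add 3 to a counter
    return state + 3

def host_next(state, transient, node):
    # three micro-steps, each adding 1; node 0 is the initial node
    if node == 0:
        return state + 1, transient, 1
    elif node == 1:
        return state + 1, transient, 2
    else:
        return state + 1, transient, 0   # back to the initial node

def run_host_until_initial(state, transient):
    """Traverse the graph from the initial node until it is reached again."""
    state, transient, node = host_next(state, transient, 0)
    steps = 1
    while node != 0:
        state, transient, node = host_next(state, transient, node)
        steps += 1
    return state, steps

final, d = run_host_until_initial(10, None)
# Correctness in the style of the top-level statement: after d host
# steps, the host state equals the spec's one-step result.
assert d == 3 and final == spec_next(10)
```

In the real proof the interesting work is exactly what this toy hides: showing, for each of the twenty-four possible paths, that the loop terminates after a known number of steps with the specified state.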

1.3 The Top Level Correctness Statement

We have formulated and proved the following statement of correctness for Viper. All concepts are explained in detail later in the paper.

Theorem 1: The Top Level Correctness of Viper
|- HOST_NEXT_SIG(state_sig,transient_sig,node_sig) /\
   AT node_sig #10000 n
   ==>
   let d = NUMBER_OF_STEPS(state_sig n) in
   NEXT_TIME(n,n+d)(AT node_sig #10000) /\
   (state_sig(n+d) = NEXT(state_sig n))

The terms ending with _sig are signals: functions from time (a number) to something else. state_sig is a function from time to states, and state_sig n means the state at time n; similarly for transient_sig and node_sig. HOST_NEXT_SIG is an abbreviation for:

!n. ((state_sig(n+1),transient_sig(n+1)),node_sig(n+1)) =
    HOST_NEXT((state_sig n,transient_sig n),node_sig n)

HOST_NEXT represents the host machine. HOST_NEXT_SIG holds of a state-, transient-

and node-signal if the host machine always (for every n) takes the state, transient and node at the current time and returns the state, transient and node at the next time. AT is defined by:

AT f x n = (f n = x)

It means: f at time n is x. NEXT is the high-level or functional specification. NEXT_TIME(n1,n2) P means that n2 is the next time after n1 that the predicate


P is true. NUMBER_OF_STEPS takes a state s and returns the total number of node transitions required to return to the initial node starting in state s. The initial node happens to be called #10000. Therefore the correctness statement means that if:

o HOST_NEXT applied to the state, transient and node at any time gives the state, transient and node at the next time, and

o the node at some particular time n is #10000

then

o d is the number of steps it takes to return to node #10000, and

o after d steps, the state attained by the host (HOST_NEXT) is the same as the state specified functionally (by NEXT).
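These temporal notions have direct executable analogues. Below is a Python sketch; the names at and next_time are invented counterparts of AT and NEXT_TIME, and the node signal is a toy, not Viper's.

```python
# Signals as functions from time (int) to values; AT and NEXT_TIME
# mimicked in Python under invented names.

def at(f, x):
    """AT f x: the predicate 'f at time n equals x'."""
    return lambda n: f(n) == x

def next_time(n1, n2, p):
    """NEXT_TIME(n1,n2) P: n2 is the first time after n1 at which P holds."""
    return n2 > n1 and p(n2) and all(not p(k) for k in range(n1 + 1, n2))

# A toy node signal that revisits node 16 every 4 steps.
node_sig = lambda n: 16 if n % 4 == 0 else n % 4

assert at(node_sig, 16)(8)
assert next_time(8, 12, at(node_sig, 16))
assert not next_time(8, 16, at(node_sig, 16))   # 16 is not the *next* time
```

The third assertion illustrates the "next" in NEXT_TIME: node 16 is visited at time 16, but time 12 intervenes, so 16 is not the next visit after 8.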

(The transient does not matter in this comparison; it is just used by HOST_NEXT in its graph traversal.) The correctness statement is deceptively compact. To prove it requires computing for each possible path through the graph the number of steps that path comprises and the final state accumulated. Each final state must be compared to the state specified at the higher level, under the conditions that caused that particular path to be chosen. The proof depends on the definitions of NEXT and HOST_NEXT, as well as on various properties of numbers. In this paper, one such path (the execution of a literal call instruction) is examined in detail to illustrate the nature of the proof. Two main theorems are proved for this path; the first is discussed in Section 5.2.3, and the second in Sections 5.3 and 5.5. In this case the first theorem has the form: Theorem 2: The Number of Steps for the Call Path C(state_sig n) /\ HOST_NEXT_SIG(state_sig,transient_sig,node_sig) /\ (node_sig n = #10000) ==> NEXT_TIME (n,n+4) (AT node_sig #10000) where C is some property of the state which causes the literal call path through the graph to be chosen. The theorem says: given that condition C applies to the state at some time n, and given that the state, transient and node at any time are computed by HOST_NEXT applied to the state, transient and node at the previous time, and given that the node at time n is #10000, then n+4 happens to be the next time at which the node is again #10000. The second theorem has the form:


Theorem 3: Equivalence of Host and Specification for the Call Case
C(state_sig n) /\
HOST_NEXT_SIG(state_sig,transient_sig,node_sig) /\
(node_sig n = #10000)
==>
(state_sig(n+4) = NEXT(state_sig n))

This states the correctness of Viper for the one kind of instruction. It says that under condition C, and given that HOST_NEXT computes the successive states, transients and nodes, and starting at node #10000 at time n, the state attained by the host machine at time n+4 agrees with the high-level transformation of the time-n state. A similar pair of theorems is proved for each of the twenty-four possible paths through the graph. To tie these together into the main theorem, it has to be shown that at each node, the conditions for choosing the next node cover all possibilities. Then the main theorem follows by analysis of all logical cases.

2 The Design of Viper

Viper has a memory of 32-bit words. Addresses are 20-bit words, but the memory is addressed by 21-bit words, where the most significant bit distinguishes main from peripheral memory; peripheral memory is for input-output operations. Instructions are interpreted in the context of a memory (ram) of 32-bit words, a 20-bit program counter (p), three 32-bit registers (accumulator a and index registers x and y), a 1-bit register (b, to hold the results of comparisons, etc) and a flag for stopping the machine (stop) should an anomalous situation arise. These seven components comprise the configurations (or states) described by the functional specification. Instructions are 32-bit words, segmented as follows:

  2 bits   source register field
  2 bits   memory address control field
  3 bits   destination control field
  1 bit    compare flag
  4 bits   function field
  20 bits  address

Each field is a bit string. The first five are represented by the variables rsf, msf, dsf, csf and fsf, respectively. The top twelve bits encode the register source, the memory source, the destination, and the function part of an instruction. The bottom twenty bits are the address. The source register field can hold the values 0, 1, 2 or 3. These values respectively indicate that the source register for the operation is the a, x, or y register, or the program counter. The memory address control field can also


hold the values 0, 1, 2 or 3; these indicate that the memory source is the literal address of the instruction, the contents of the address, or the contents of the address offset by the value in the x or y register. The destination control field can hold the values 0,...,7, which indicate the destination of the operation. The a, x and y registers are indicated by values 0, 1 and 2, respectively; the program counter by 3; the program counter if the b-flag is set, by 4; the program counter if the b-flag is not set, by 5; and the location given by the 20-bit address field (in peripheral or main memory) by 6 and 7, respectively. The 1-bit compare flag indicates a compare instruction if it holds the value 1, and a non-compare if 0. The function field can hold values 0,...,15. A call instruction is indicated by the value 1; a peripheral memory operation by 2; various other functions which require a memory source by 0 and 3,...,11; and functions which do not need a memory source by 12. The values 13, 14 and 15 are spare instructions whose attempted use indicates an error.
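The field layout can be exercised with a small Python decoder. This is a sketch: the exact bit positions within the top twelve bits are an assumption based on the order in which the fields are listed above.

```python
# Decode the six Viper instruction fields from a 32-bit word, assuming
# the layout rsf(2) msf(2) dsf(3) csf(1) fsf(4) addr(20), top to bottom.

def decode(instr):
    assert 0 <= instr < 2**32
    return {
        "rsf": (instr >> 30) & 0b11,     # source register field
        "msf": (instr >> 28) & 0b11,     # memory address control field
        "dsf": (instr >> 25) & 0b111,    # destination control field
        "csf": (instr >> 24) & 0b1,      # compare flag
        "fsf": (instr >> 20) & 0b1111,   # function field
        "addr": instr & 0xFFFFF,         # 20-bit address
    }

# A call (fsf=1) through a literal address (msf=0) to the pc (dsf=3):
instr = (0 << 30) | (0 << 28) | (3 << 25) | (0 << 24) | (1 << 20) | 0x12345
f = decode(instr)
assert (f["msf"], f["dsf"], f["csf"], f["fsf"], f["addr"]) == (0, 3, 0, 1, 0x12345)
```

The widths sum to 2+2+3+1+4+20 = 32, so the six masks partition the word exactly.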

3 The Specification in HOL

To specify the behaviour of Viper in HOL, a hierarchy of logical theories is constructed in which the new types, constants and definitions required can be neatly organized. All of the definitions in Sections 3.1, 3.2 and 4.1 are corrected HOL versions of those in [4] and [5].

3.1 Basic Definitions

A high-level state (ram,p,a,x,y,b,stop) is represented in HOL as an object with the following type:

mem21_32#word20#word32#word32#word32#bool#bool

The following logical constants are used in [4]:

o AND:bool->(bool->bool) for the logical conjunction
o OR:bool->(bool->bool) for the logical disjunction
o NOT:bool->bool for the logical negation

In this paper, we use HOL's /\ and \/ for AND and OR, respectively, to avoid having two equivalent symbols in different places; in the actual proof AND, OR and NOT are used in the places they occur in [4], and simple substitutions are made so that HOL's inference rules for /\, \/ and ~ apply. We retain NOT in this paper, however, as HOL's ~ is unfortunately rather unreadable. There are some function constants for acting on bit strings:

o PAD20TO32:word20->word32 for extending a 20-bit word to 32 bits
o TRIM32TO20:word32->word20 for truncating a 32-bit word to 20 bits


o INCP32:word20->word32 for padding a 20-bit word, then incrementing

The first two are constrained by the axiom

Axiom 1: Trim-Pad Axiom
|- !w. TRIM32TO20(PAD20TO32 w) = w

The function REG for selecting a source register according to the source register field rsf has the type

REG:word2#word32#word32#word32#word20->word32

and is defined by

|- REG(rsf,a,x,y,p) =
   let r = VAL2 rsf in
   ((r=0) => a | (r=1) => x | (r=2) => y | PAD20TO32 p)

The function INSTFETCH to fetch from main memory according to the program counter has the type

INSTFETCH:mem21_32#word20->word32

and is defined below. A 21-bit address is formed from p and F.

|- INSTFETCH(ram,p) = FETCH21 ram (WORD21(V(CONS F(BITS20 p))))

The boolean value (F in this case) distinguishes main from peripheral memory. There are functions to extract the various fields of an instruction:

o R:word32->word2 to extract the source register field
o M:word32->word2 to extract the memory address field
o D:word32->word3 to extract the destination field
o C:word32->word1 to extract the compare field
o FF:word32->word4 to extract the function field
o A:word32->word20 to extract the address.

(The name FF is used instead of F, as in [4], to avoid confusion with the truth value F.) There are constants for recognizing certain illegal instructions:

o INVALID:word32->bool for detecting invalid addresses
o ILLEGALCALL:word3#word1#word4->bool for detecting illegal call instructions
o ILLEGALPDEST:word3#word1#word4->bool for detecting illegal uses of the program counter as a destination
o ILLEGALWRITE:word3#word1#word2->bool for detecting illegal write instructions


o SPAREFUNC:word3#word1#word4->bool for detecting attempted uses of the spare ALU function fields

defined, for example, by:

|- INVALID value = NOT(value = PAD20TO32(TRIM32TO20 value))

|- ILLEGALCALL(dsf,csf,fsf) =
   (let df = VAL3 dsf in
    let cf = VAL1 csf in
    let ff = VAL4 fsf in
    (cf=0) /\ ((ff=1) /\ ((df=0) \/ ((df=1) \/ (df=2)))))

INVALID tests whether the top twelve bits of a word are actually being used; a valid address can only use the bottom twenty bits. From the definition of INVALID and Axiom 1, it follows that

Theorem 4: Validity of Padded Addresses
|- !w. NOT(INVALID(PAD20TO32 w))

which means that a 20-bit address padded to 32 bits is always valid. ILLEGALCALL tests whether an instruction is a call whose destination is the a, x or y register. If so it is illegal; the program counter must be the destination of the ALU result, so that the jump to the procedure can occur. There is a function NOOP for recognizing instructions that are non-operations; it has the type

NOOP:word3#word1#bool->bool

and is defined by

|- NOOP(dsf,csf,b) =
   let df = VAL3 dsf in
   let cf = VAL1 csf in
   (cf=0) /\ (((df=5) /\ b) \/ ((df=4) /\ (NOT b)))

An instruction is a non-operation if it is not a comparison and if its destination field holds the value 5 while the b flag is set, or 4 while it's not. (This allows for conditional call and jump instructions whose conditions fail.) Next, there is the important function that represents the behaviour of the ALU:

ALU:word4#word2#word3#word32#word32#bool->word32#bool#bool


ALU takes a function field, a memory address field and a destination field, a register source, a memory source and the b register, and returns a 32-bit result (a memory source), along with values for the b register and the stop flag. At the moment, we are only interested in the behaviour of ALU for call functions, in which case only the destination field and memory source matter. The computed result is the memory value given, the b-value is the value given, and the stop-value is true only if the destination is not the program counter, or if the memory source is invalid. (For calls, the memory source returned represents the location of the procedure being called.) "..." abbreviates parts of the definition not relevant to call instructions.

|- ALU(fsf,msf,dsf,r,m,b) =
   let ff = VAL4 fsf in
   let mf = VAL2 msf in
   let df = VAL3 dsf in
   let pwrite = (df=3) \/ ((df=4) \/ (df=5)) in
   ((ff=0) => ... |
    (ff=1) => (m,b,(NOT pwrite) \/ (INVALID m)) |
    (ff=2) => ... | (ff=3) => ... | (ff=4) => ... |
    (ff=5) => ... | (ff=6) => ... | (ff=7) => ... |
    (ff=8) => ... | (ff=9) => ... | (ff=10) => ... |
    (ff=11) => ... | (ff=12) => ... | (ff=13) => ... |
    (ff=14) => ... | ...)

The functions VALUE, BVAL and SVAL extract the respective components of the 3-tuple returned by ALU. They are defined simply by

|- VALUE(result,b,stop) = result
|- BVAL(result,b,stop) = b
|- SVAL(result,b,stop) = stop

It is easy to unfold the definition of ALU and prove:

Theorem 5: The ALU Result for the Call Case
|- (VAL4 fsf = 1) ==>
   (ALU(fsf,msf,dsf,r,m,b) =
    m,b,
    (NOT((VAL3 dsf = 3) \/ ((VAL3 dsf = 4) \/ (VAL3 dsf = 5)))) \/
    (INVALID m))
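The content of Theorem 5 — what the ALU does for a call — is small enough to transliterate into Python. This is a sketch with invented names: invalid mirrors INVALID (top twelve bits must be clear) and alu_call mirrors the ff = 1 branch of ALU.

```python
# The ALU's behaviour for calls (ff = 1): the result is the memory
# source, b passes through, and stop is raised unless the destination is
# some form of pc write (dsf in {3,4,5}) and m is a valid address.

def invalid(value):
    # a 32-bit value is invalid if it uses more than the bottom 20 bits
    return value != (value & 0xFFFFF)

def alu_call(dsf, m, b):
    pwrite = dsf in (3, 4, 5)        # destination is the program counter
    return m, b, (not pwrite) or invalid(m)

# Legal call: pc destination, valid 20-bit target address.
assert alu_call(3, 0x00012, True) == (0x00012, True, False)
# Stop raised: destination is the a register (dsf = 0).
assert alu_call(0, 0x00012, True)[2] is True
# Stop raised: the memory source is not a valid address.
assert alu_call(3, 0x100000, False)[2] is True
```

The three assertions correspond to the three clauses of the stop condition: a pc destination with a valid address succeeds, and either a wrong destination or an invalid address stops the machine.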


Finally, there are three more function constants:

o OFFSET:word2#word20#word32#word32->word32
o NILM:word3#word1#word4->bool
o MEMREAD:mem21_32#word2#word20#word32#word32#bool#bool->word32

defined below. (The definition of ADD32, which adds the contents of two 32-bit registers, is not given here; its importance for now is only that the addition may generate an invalid address. ADD32 returns a 32-bit result and two booleans, so VALUE can be used to extract the result.)

|- OFFSET(msf,addr,x,y) =
   let mf = VAL2 msf in
   let addr32 = PAD20TO32 addr in
   ((mf=0) => addr32 |
    (mf=1) => addr32 |
    (mf=2) => VALUE(ADD32(addr32,x)) |
    VALUE(ADD32(addr32,y)))

|- NILM(dsf,csf,fsf) =
   let df = VAL3 dsf in
   let cf = VAL1 csf in
   let ff = VAL4 fsf in
   (cf=0) /\ ((NOT((df=7) \/ (df=6))) /\ (ff=12))

|- MEMREAD(ram,msf,addr,x,y,io,nilm) =
   let mf = VAL2 msf in
   (nilm => ... |
    (mf=0) => PAD20TO32 addr |
    FETCH21 ram
      (WORD21(V(CONS io(BITS20(TRIM32TO20(OFFSET(msf,addr,x,y))))))))

OFFSET returns a memory value according to the memory address control field, an address, and the x and y registers. It either pads the address to 32 bits (if the memory address field indicates that the address is a literal or an indirect address) or adds the x or y register to the address (if it is an offset address). NILM is a predicate which is true of the parts of an instruction if they indicate that no memory source is required to interpret the instruction. It holds of non-comparisons with non-memory destinations whose ALU operations do not require a memory source. MEMREAD reads from memory; it takes a memory and a memory address control field, an address, two registers, a flag to distinguish main from peripheral memory, and a flag to indicate whether a memory source is required at all. For instructions which do require a memory source: if the memory address control field holds value 0, the literal address is returned (padded to 32 bits); otherwise the contents of the (possibly offset) address is fetched from the appropriate part of memory. (For instructions which do not require a memory source, MEMREAD returns some arbitrary value.)
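This addressing behaviour can be sketched in Python (all names invented): the memory is modelled as a dict keyed by 21-bit addresses, with the top bit selecting peripheral versus main memory.

```python
# OFFSET and MEMREAD in miniature: mf = 0/1 use the literal address
# (padded), mf = 2/3 offset it by the x or y register (mod 2**32).

def offset(mf, addr, x, y):
    addr32 = addr & 0xFFFFF                  # a padded 20-bit address
    if mf in (0, 1):
        return addr32
    return (addr32 + (x if mf == 2 else y)) % 2**32

def memread(ram, mf, addr, x, y, io):
    if mf == 0:                              # literal: the address itself
        return offset(0, addr, x, y)
    ea = offset(mf, addr, x, y) & 0xFFFFF    # trim to 20 bits
    return ram[(io << 20) | ea]              # top bit picks the memory bank

ram = {0x00005: 111, (1 << 20) | 0x00005: 222}
assert memread(ram, 0, 0x00005, 0, 0, 0) == 0x00005   # literal address
assert memread(ram, 1, 0x00005, 0, 0, 0) == 111       # main memory
assert memread(ram, 1, 0x00005, 0, 0, 1) == 222       # peripheral memory
assert memread(ram, 2, 0x00003, 2, 0, 0) == 111       # offset by x
```

Note that this sketch silently trims an overflowed offset; the real OFFSET keeps the full 32-bit sum precisely so that INVALID can detect the overflow and stop the machine.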

3.2 The Specification

The high-level specification of Viper can now be stated. It is called NEXT since it takes a state and returns the next state. The HOL definition (with parts not immediately relevant abbreviated as "...") is:

|- NEXT(ram,p,a,x,y,b,stop) =
   let fetched = INSTFETCH(ram,p) in
   let newp = TRIM32TO20(INCP32 p) in
   let rsf = R fetched in
   let msf = M fetched in
   let dsf = D fetched in
   let csf = C fetched in
   let fsf = FF fetched in
   let addr = A fetched in
   let df = VAL3 dsf in
   let cf = VAL1 csf in
   let ff = VAL4 fsf in
   let comp = (cf=1) in
   let call = ((cf=0) /\ (ff=1)) in
   let output = ((cf=0) /\ (df=6)) in
   let input = ((cf=0) /\ NOT((df=7) \/ (df=6)) /\ (ff=2)) in
   let io = (output \/ input) in
   let writeop = ((cf=0) /\ ((df=7) \/ (df=6))) in
   let skip = NOOP(dsf,csf,b) in
   let noinc = INVALID(INCP32 p) in
   let illegaladdr = (NOT(NILM(dsf,csf,fsf))) /\
                     ((INVALID(OFFSET(msf,addr,x,y))) /\ (NOT skip)) in
   let illegalcl = ILLEGALCALL(dsf,csf,fsf) in
   let illegalsp = SPAREFUNC(dsf,csf,fsf) in
   let illegalonp = ILLEGALPDEST(dsf,csf,fsf) in
   let illegalwr = ILLEGALWRITE(dsf,csf,msf) in
   let source = REG(rsf,a,x,y,newp) in
   (stop => (ram,p,a,x,y,b,T) |
    (noinc \/ illegaladdr) \/ ((illegalcl \/ illegalsp) \/
      (illegalonp \/ illegalwr)) => (ram,newp,a,x,y,b,T) |
    (comp => ... |
     (writeop => ... |
      (skip => (ram,newp,a,x,y,b,F) |
       let m = MEMREAD(ram,msf,addr,x,y,io,NILM(dsf,csf,fsf)) in
       let aluout = ALU(fsf,msf,dsf,source,m,b) in
       ((df=0) =>
         (ram,newp,VALUE aluout,x,y,BVAL aluout,SVAL aluout) |
        (df=1) =>
         (ram,newp,a,VALUE aluout,y,BVAL aluout,SVAL aluout) |
        (df=2) =>
         (ram,newp,a,x,VALUE aluout,BVAL aluout,SVAL aluout) |
        (call =>
         (ram,TRIM32TO20(VALUE aluout),a,x,INCP32 p,BVAL aluout,
          SVAL aluout) |
         ...))))))

NEXT first tests whether the stop flag is set, and if so, returns the original state unchanged. Otherwise, it fetches a new instruction from memory according to the program counter, and examines its various fields. It decodes the instruction with a series of tests. The new instruction is either an illegal instruction, a comparison, a write instruction, a non-operation, an ALU operation with the a, x or y register as its destination, a call instruction, or a jump. In each of these nine cases, a new state is determined, representing the state after the new instruction has been executed. The new state may have the memory changed, the program counter incremented or otherwise changed, and so on. Illegal instructions include the five sorts mentioned in 3.1, as well as instructions with illegal addresses. These latter are instructions for which the ALU requires a memory source, the address is invalid, and the operation indicated is not a non-operation. Not all of the possible new states are of interest at the moment; we are really only interested in the conditional branch for call. For a call, the memory source for the ALU is provided by MEMREAD (and the register source is selected by REG). The value computed by the ALU, trimmed to 20 bits, is the program counter of the new state, and the incremented original program counter is the y register of the new state. In this way the new state "points to" the address of the procedure being called, and carries information for an eventual return to the original location (plus 1) in the y register. The b register and stop flag of the new state are also set according to the result of the ALU operation.

To arrive at the call branch of the conditional, certain conditions must obviously apply. For one thing, the stop flag must be false. Also, the six illegal situations must be avoided: first, the incremented program counter must be valid. Second, the address of the new fetched instruction must be legal. (For the example case, the address is literal, i.e. the value of the memory address control field is 0, so OFFSET just pads the address; by Theorem 4 a padded address is necessarily valid. Thus the address is legal.) Third, the instruction must be a legal call; the new destination cannot be the a, x or y register. This excludes the values 0, 1 and 2 for the new destination field. The other three illegal conditions must also be avoided. The instruction cannot be a comparison (the compare field does not hold 1) or a write operation (the values 6 and 7 are also excluded for the destination field) or a non-operation. Finally, for the call branch to be selected, the function field must hold the value 1. The complete list of conditions is as follows:

1.  NOT stop
2.  NOT(INVALID(INCP32 p))
3.  VAL2(M(INSTFETCH(ram,p))) = 0
4.  NOT(ILLEGALCALL(D(INSTFETCH(ram,p)),C(INSTFETCH(ram,p)),
        FF(INSTFETCH(ram,p))))
5.  NOT(SPAREFUNC(D(INSTFETCH(ram,p)),C(INSTFETCH(ram,p)),
        FF(INSTFETCH(ram,p))))
6.  NOT(ILLEGALPDEST(D(INSTFETCH(ram,p)),C(INSTFETCH(ram,p)),
        FF(INSTFETCH(ram,p))))
7.  NOT(ILLEGALWRITE(D(INSTFETCH(ram,p)),C(INSTFETCH(ram,p)),
        M(INSTFETCH(ram,p))))
8.  VAL1(C(INSTFETCH(ram,p))) = 0
9.  NOT(VAL3(D(INSTFETCH(ram,p))) = 7) /\ NOT(VAL3(D(INSTFETCH(ram,p))) = 6)
10. NOT(NOOP(D(INSTFETCH(ram,p)),C(INSTFETCH(ram,p)),b))
11. VAL4(FF(INSTFETCH(ram,p))) = 1
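Setting aside the conditions that mention the machine state (the stop flag, the incremented program counter, the b flag), the purely instruction-shaped conditions can be bundled into one Python predicate. A sketch with an invented name:

```python
# Static (instruction-only) conditions for the literal-call path:
# msf = 0 (condition 3), csf = 0 (condition 8), dsf not 6 or 7
# (condition 9), fsf = 1 (condition 11), and a legal call destination,
# i.e. dsf not 0, 1 or 2 (condition 4, ILLEGALCALL).

def literal_call_shape(msf, dsf, csf, fsf):
    return (msf == 0 and csf == 0 and fsf == 1
            and dsf not in (6, 7)        # not a memory write
            and dsf not in (0, 1, 2))    # not an illegal call destination

assert literal_call_shape(msf=0, dsf=3, csf=0, fsf=1)      # call to pc
assert literal_call_shape(msf=0, dsf=4, csf=0, fsf=1)      # conditional call
assert not literal_call_shape(msf=0, dsf=0, csf=0, fsf=1)  # dest is a: illegal
assert not literal_call_shape(msf=1, dsf=3, csf=0, fsf=1)  # not a literal
```

Since dsf ranges over 0,...,7 and the values 0, 1, 2, 6 and 7 are all excluded, the predicate holds only for dsf in {3, 4, 5} — the inference drawn formally in Theorem 6 below.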

4

The Host Machine in HOL

4.1

Event Sequences

The functional specification of Viper gives the new state (after a new instruction is executed) directly from the current state. (The new instruction is implicit in the state because the state includes a memory and a program counter.) At the host machine level, however, there are several stages of computation for each state transition at the higher level. A new instruction is fetched and placed in internal registers. The state and these registers are then transformed in stages until the instruction is completely executed. Only then is a single transition at the time scale of the functional specification completed. For the interpretation of instructions by the host machine, five internal registers (of appropriate size) are used to hold the first five fields of an instruction; these (as mentioned in Section 2) are called rsf, msf, dsf, csf and fsf. In addition, a multi-purpose 32-bit register (t) is used to hold a (padded) address or various other information. These six registers together comprise what is called the transient, and are invisible to the functional specification. The type of the transient (t,rsf,msf,dsf,csf,fsf) is:

word32#word2#word2#word3#word1#word4

The host machine interprets an instruction by executing a sequence of (from three to seven) events. Each event in a sequence can affect the state and/or the transient. The event, in the context of the state and transient, determines whether there is a next event to be executed and if so what it is.


For example, a stop is a simple event during which the stop flag is set. It is always the last event in the sequence in which it occurs. An instruction fetch is an event during which a new instruction is found in memory according to the program counter, and its various fields placed in the appropriate registers of the transient. The new address is placed in the t register. The state is affected only insofar as the program counter is incremented, and possibly the stop flag set, depending on the new instruction and new program counter. Whether there is a next event in the sequence, and if so what it is, is determined by inspection of the new state and transient. The next event may be one of several, including the preparation for a call, or a stop event. The preparation for a call causes the y register of the state to contain the program counter, and the stop flag to be set to false since nothing new can go wrong at this point; the transient is not affected. The next event must be an ALU operation. (The purpose is to save the program counter so that it can be restored after the called procedure is finished.) Performing an operation can mean either performing a comparison or performing an ALU operation. Performing an ALU operation is an event during which either the a, x, or y register or the program counter receives a value computed by the ALU. The transient does not change. The next event is based on the result computed by the ALU; it is either a stop event, or the end of the sequence in which the ALU event occurred is signalled. Thus the following sequence of events is possible:

o Fetch a new instruction
o Prepare for a call
o Perform an ALU operation

To allow the stop flag to be noticed, a dummy event is placed at the beginning of the sequence, during which the state and transient remain unchanged. The event following a dummy event is an instruction fetch, unless the stop flag is set, so the sequence of events is:

o Dummy event
o Fetch a new instruction
o Prepare for a call
o Perform an ALU operation

Viper is modelled at the host level as continuously running. Its infinite sequence of events consists of repetitions of twenty-four possible finite sequences, each beginning with a dummy event and ending as soon as the end of the sequence is indicated (i.e. a dummy event is the necessary next event). Events are associated with numbers corresponding to nodes in the transition graph. (The numbers look random here because not all the possible events have been revealed.) The numbers are represented by 5-bit strings.

16. DUMMY    8. STOP    1. FETCH    3. PRECALL    4. PERFORM
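The call sequence's traversal of the graph can be simulated with a toy Python stepper. This is a sketch covering only the node transitions for this one path, with the branch choices fixed; the state and transient are ignored.

```python
# Node transitions for the non-stopping call path, using the node
# numbers 16 (DUMMY), 1 (FETCH), 3 (PRECALL), 4 (PERFORM).

SUCCESSORS = {16: 1, 1: 3, 3: 4, 4: 16}   # choices fixed for this path

def call_path(start=16):
    """Step from the initial node until it is reached again."""
    node, steps, trace = SUCCESSORS[start], 1, [start]
    while node != start:
        trace.append(node)
        node, steps = SUCCESSORS[node], steps + 1
    return trace, steps

trace, steps = call_path()
assert trace == [16, 1, 3, 4]
assert steps == 4    # the n+4 of Theorem 2
```

Counting the four transitions here is the informal counterpart of Theorem 2, which states that n+4 is the next time the node signal is at the initial node.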

In HOL, the functions DUMMY, STOP, FETCH, PRECALL and PERFORM formalize the five events mentioned so far. The auxiliary functions FMOVE and PMOVE compute the next event after a FETCH and PERFORM event respectively. As usual, "..." abbreviates parts of the definition not relevant to the call sequence.

|- DUMMY((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf) =
   let stopstate = WORD5 8 in
   let fetch = WORD5 1 in
   (stop => (((ram,p,a,x,y,b,T),t,rsf,msf,dsf,csf,fsf),stopstate) |
            (((ram,p,a,x,y,b,F),t,rsf,msf,dsf,csf,fsf),fetch))

|- STOP((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf) =
   let dummy = WORD5 16 in
   (((ram,p,a,x,y,b,T),t,rsf,msf,dsf,csf,fsf),dummy)

|- FMOVE(msf,dsf,csf,fsf,b) =
   let mf = VAL2 msf in
   let df = VAL3 dsf in
   let cf = VAL1 csf in
   let ff = VAL4 fsf in
   let b' = HD(BITS1 b) in
   let noop = NOOP(dsf,csf,b') in
   let precall = WORD5 3 in
   let stopstate = WORD5 8 in
   let dummy = WORD5 16 in
   let ... = ... in
   let ... = ... in
   ((cf=1) => ((mf=0) => ... | ...) |
    ((df=7) => ((mf=0) => ... | ((mf=1) => ... | ...)) |
     ((df=6) => ((mf=0) => ... | ((mf=1) => ... | ...)) |
      (noop => dummy |
       ((mf=0) => (((cf=0) /\ (ff=1)) => precall | ...) |
        ((mf=1) => (((cf=0) /\ (ff=2)) => ... |
                    (((cf=0) /\ (ff=12)) => ... | ...)) |
         (((cf=0) /\ (ff=12)) => ... | ...)))))))


|- FETCH((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf) =
   let fetched = INSTFETCH(ram,p) in
   let newp = TRIM32TO20(INCP32 p) in
   let newr = R fetched in
   let newm = M fetched in
   let newd = D fetched in
   let newc = C fetched in
   let newf = FF fetched in
   let newt = PAD20TO32(A fetched) in
   let notinc = INVALID(INCP32 p) in
   let illegalcl = ILLEGALCALL(newd,newc,newf) in
   let illegalsp = SPAREFUNC(newd,newc,newf) in
   let illegalonp = ILLEGALPDEST(newd,newc,newf) in
   let illegalwr = ILLEGALWRITE(newd,newc,newm) in
   let stopstate = WORD5 8 in
   let b' = WORD1(V[b]) in
   (notinc \/ (illegalcl \/ (illegalsp \/ (illegalonp \/ illegalwr))) =>
    (((ram,newp,a,x,y,b,T),newt,newr,newm,newd,newc,newf),stopstate) |
    (((ram,newp,a,x,y,b,F),newt,newr,newm,newd,newc,newf),
     FMOVE(newm,newd,newc,newf,b')))

|- PRECALL((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf) =
   let perform = WORD5 4 in
   (((ram,p,a,x,PAD20TO32 p,b,F),t,rsf,msf,dsf,csf,fsf),perform)

|- PMOVE halt =
   let stopstate = WORD5 8 in
   let dummy = WORD5 16 in
   (halt => stopstate | dummy)

|- PERFORM((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf) =
   let comp = (VAL1 csf = 1) in
   let df = VAL3 dsf in
   let source = REG(rsf,a,x,y,p) in
   let dummy = WORD5 16 in
   (comp => ((( ... , ... , ... , ... , ... , ... ,F), ... , ... , ... , ... , ... , ... ), ... ) |
    let result = ALU(fsf,msf,dsf,source,t,b) in
    let ans = VALUE result in
    let newb = BVAL result in
    let halt = SVAL result in
    ((df=0) =>
      (((ram,p,ans,x,y,newb,halt),t,rsf,msf,dsf,csf,fsf),PMOVE halt) |
     (df=1) =>
      (((ram,p,a,ans,y,newb,halt),t,rsf,msf,dsf,csf,fsf),PMOVE halt) |
     (df=2) =>
      (((ram,p,a,x,ans,newb,halt),t,rsf,msf,dsf,csf,fsf),PMOVE halt) |
     (((ram,TRIM32TO20 ans,a,x,y,newb,halt),t,rsf,msf,dsf,csf,fsf),
      PMOVE halt)))

4.2 The Transition Graph

Each of the events defined in Section 4.1 determines one or more subsequent events. Together they determine a transition graph of events, part of which is already known:

[Figure 1: The Graph of Events — DUMMY leads to FETCH or STOP; FETCH leads to PRECALL or STOP, among other events; PRECALL leads to PERFORM; PERFORM leads back to DUMMY or to STOP.]

Each possible path through the graph, starting and ending at the DUMMY node, can be shown in a tree:

[Figure 2: Paths through the Graph]

Of the five possible sequences, we only consider in this paper the one containing PRECALL which does not stop. It is clear from the event definitions that where there is a choice of next event, certain conditions must hold in order that the desired next event be chosen. As DUMMY is executed, FETCH will be chosen if


the stop flag is set to false. As FETCH is executed, STOP can be avoided if none of the five illegal conditions apply to the new fetched instruction. Furthermore, PRECALL will be chosen if the new value of the compare field is not 1 (i.e. is 0); the new value of the destination field is neither 7 nor 6; NOOP is false of the destination field, compare field and b flag; the new value of the memory address control field is 0; and the new value of the function field is 1. As PERFORM is executed, STOP is again avoided if either the current value of the compare field is 1 or the stop value returned by performing the ALU operation is false; that is, if

NOT(SVAL(ALU(fsf,msf,dsf,REG(rsf,a,x,y,p),t,b)))

holds for the values in the various registers when PERFORM is executed. PRECALL gives no choice of next node. To formalize these conditions, the predicates c1, c3 and c17, from states to boolean values, are introduced. (The conjunction of these was called C in Section 1.3. The twenty-four paths determine in all thirty-four different conditions, of which these are three.) The examinations of the fields that take place must take into account that in the particular sequence DUMMY, FETCH, PRECALL, PERFORM, after FETCH is executed, it is the fields of the new instruction that occupy the transient registers. To make the desired choice of next node during DUMMY, we require:

c1(ram,p,a,x,y,b,stop) = NOT stop

To make the correct choice during FETCH:

c3(ram,p,a,x,y,b,stop) =
  (NOT((INVALID(INCP32 p)) \/
       ((ILLEGALCALL(D(INSTFETCH(ram,p)),C(INSTFETCH(ram,p)),
                     FF(INSTFETCH(ram,p)))) \/
        ((SPAREFUNC(D(INSTFETCH(ram,p)),C(INSTFETCH(ram,p)),
                    FF(INSTFETCH(ram,p)))) \/
         ((ILLEGALPDEST(D(INSTFETCH(ram,p)),C(INSTFETCH(ram,p)),
                        FF(INSTFETCH(ram,p)))) \/
          (ILLEGALWRITE(D(INSTFETCH(ram,p)),C(INSTFETCH(ram,p)),
                        M(INSTFETCH(ram,p))))))))) /\
  ((NOT(VAL1(C(INSTFETCH(ram,p))) = 1)) /\
   ((NOT((VAL3(D(INSTFETCH(ram,p))) = 7) \/
         (VAL3(D(INSTFETCH(ram,p))) = 6))) /\
    ((NOT(NOOP(D(INSTFETCH(ram,p)),C(INSTFETCH(ram,p)),b))) /\
     ((VAL2(M(INSTFETCH(ram,p))) = 0) /\
      ((VAL1(C(INSTFETCH(ram,p))) = 0) /\
       (VAL4(FF(INSTFETCH(ram,p))) = 1))))))

and to make the correct choice during PERFORM:


c17(ram,p,a,x,y,b,stop) =
  (VAL1(C(INSTFETCH(ram,p))) = 1) \/
  (NOT(SVAL(ALU
    (FF(INSTFETCH(ram,p)),M(INSTFETCH(ram,p)),D(INSTFETCH(ram,p)),
     REG(R(INSTFETCH(ram,p)),a,x,PAD20T032(TRIM32T020(INCP32 p)),
         TRIM32T020(INCP32 p)),
     PAD20T032(A(INSTFETCH(ram,p))),b))))

Some elementary inference can be done on the combined conditions: since the function field holds 1, the compare field holds 0 and the call is not illegal, it follows that the destination field holds neither 0, 1 nor 2. Indeed, since it does not hold 6 or 7 either, and its value has to be 0, ..., 7, it must hold 3, 4 or 5. That is:

|- NOT(ILLEGALCALL(dsf,csf,fsf)) /\
   (VAL1 csf = 0) /\
   NOT(VAL3 dsf = 7) /\
   NOT(VAL3 dsf = 6) /\
   (VAL4 fsf = 1)
   ==>
   ((VAL3 dsf = 3) \/ (VAL3 dsf = 4) \/ (VAL3 dsf = 5)) /\
   NOT(VAL3 dsf = 0) /\ NOT(VAL3 dsf = 1) /\ NOT(VAL3 dsf = 2)

In particular it follows that:

Theorem 6: Legal Call Destinations
|- NOT(ILLEGALCALL(D(INSTFETCH(ram,p)),C(INSTFETCH(ram,p)),
                   FF(INSTFETCH(ram,p)))) /\
   (VAL1(C(INSTFETCH(ram,p))) = 0) /\
   NOT(VAL3(D(INSTFETCH(ram,p))) = 7) /\
   NOT(VAL3(D(INSTFETCH(ram,p))) = 6) /\
   (VAL4(FF(INSTFETCH(ram,p))) = 1)
   ==>
   ((VAL3(D(INSTFETCH(ram,p))) = 3) \/
    (VAL3(D(INSTFETCH(ram,p))) = 4) \/
    (VAL3(D(INSTFETCH(ram,p))) = 5)) /\
   NOT(VAL3(D(INSTFETCH(ram,p))) = 0) /\
   NOT(VAL3(D(INSTFETCH(ram,p))) = 1) /\
   NOT(VAL3(D(INSTFETCH(ram,p))) = 2)
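Since the destination field ranges over only eight values, this inference can be checked exhaustively. In the Python sketch below, illegal_call is a hypothetical stand-in for ILLEGALCALL on call instructions (destinations 0, 1 and 2 taken as the illegal ones, as the text implies):

```python
# Exhaustive check of the inference above over the eight possible
# values of the destination field.  illegal_call is a hypothetical
# rendering of ILLEGALCALL for a call instruction.

def illegal_call(dsf):
    # Call instructions may not write destinations 0, 1 or 2.
    return dsf in (0, 1, 2)

# A legal call destination that is also neither 6 nor 7 ...
legal = [dsf for dsf in range(8)
         if not illegal_call(dsf) and dsf != 7 and dsf != 6]
# ... must be 3, 4 or 5.
```

Under these assumptions the list `legal` works out to exactly [3, 4, 5], which is the content of Theorem 6.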

Because the value of the compare field is 0, it follows from c3 and c17 that:

NOT(SVAL(ALU
  (FF(INSTFETCH(ram,p)),M(INSTFETCH(ram,p)),D(INSTFETCH(ram,p)),
   REG(R(INSTFETCH(ram,p)),a,x,PAD20T032(TRIM32T020(INCP32 p)),
       TRIM32T020(INCP32 p)),PAD20T032(A(INSTFETCH(ram,p))),b)))


In fact, this last fact also follows from c3, Theorem 4, Theorem 5 and Theorem 6, so c17 is actually redundant for this path. We keep it as a reminder that there is a choice of next node during PERFORM. The three conditions and their corollaries give all of the conditions for choosing the call branch of the conditional in the specification, NEXT, in Section 3.2. (The instruction does not have an illegal address for the same reason as before.)

5  The Equivalence Proof of Specification and Host Machine

What has to be proved is that under the conditions c1, c3 and c17, the functional specification agrees with the host machine on the visible state. The first thing to establish is what that state is; the second is whether NEXT gives that state. The events comprising the host machine (Section 4.1) only say implicitly what the transformation is; they determine a sequence of events (a path through the graph shown in Section 4.2) which transforms the initial state and transient in stages. The transformation can be made more explicit by defining a "control" function (HOST_NEXT) linking the events. (Branches which are not needed at the moment are filled in by "...".) The type major abbreviates state#transient. HOST_NEXT has type major#node->major#node. Its partial definition is:

|- HOST_NEXT(maj,node) =
   (let nodenum = VAL5 node in
    (nodenum=0)  => ... |
    (nodenum=1)  => FETCH maj |
    (nodenum=2)  => ... |
    (nodenum=3)  => PRECALL maj |
    (nodenum=4)  => PERFORM maj |
    (nodenum=5)  => ... |
    (nodenum=6)  => ... |
    (nodenum=7)  => ... |
    (nodenum=8)  => STOP maj |
    (nodenum=9)  => ... |
    (nodenum=10) => ... |
    (nodenum=11) => ... |
    (nodenum=12) => ... |
    (nodenum=13) => ... |
    (nodenum=14) => ... |
    (nodenum=15) => ... |
    (nodenum=16) => DUMMY maj |
    (nodenum=17) => ... |
    (nodenum=18) => ... |
    (nodenum=19) => ... |
    (nodenum=20) => ... |
    (nodenum=21) => ... |
    (nodenum=22) => ... |
    (nodenum=23) => ... |
    (nodenum=24) => ... |
    (nodenum=25) => ... |
    (nodenum=26) => ... |
    (nodenum=27) => ... |
    (nodenum=28) => ... |
    (nodenum=29) => ... |
    (nodenum=30) => ... |
    (nodenum=31) => ...)

HOST_NEXT says how to step through the graph. It defines the host machine, which takes a state, transient and node to a new state, transient and node. The new state, transient and node are still not fully explicit; they are computed


by functions such as PRECALL which in turn may either call other functions or return a new major and node. In Sections 5.1 and 5.2 the transformation is made completely explicit.
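The way HOST_NEXT dispatches on the node number can be pictured executably. The Python below is a hypothetical sketch in which the event bodies are elided to identity functions and only the node sequencing of the literal call path (DUMMY, FETCH, PRECALL, PERFORM) is modelled:

```python
# Hypothetical sketch of HOST_NEXT's dispatch on the node number.
# A "major" is a pair (state, transient), as in the HOL definition.
# Event bodies are elided: each event here only selects the next node.

def dummy(maj):      # node 16
    return maj, 1    # next event: FETCH

def fetch(maj):      # node 1
    return maj, 3    # next event: PRECALL (on the literal call path)

def precall(maj):    # node 3
    return maj, 4    # next event: PERFORM

def perform(maj):    # node 4
    return maj, 16   # back to DUMMY: the sequence is ended

EVENTS = {16: dummy, 1: fetch, 3: precall, 4: perform}

def host_next(maj, node):
    # Dispatch to the event associated with the current node.
    return EVENTS[node](maj)
```

Starting at node 16 and applying host_next four times visits nodes 1, 3, 4 and 16 in turn, which is exactly the literal call path through the graph.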

5.1  A Digression on Forward Proof in HOL

First, each transition in the graph is characterized by saying exactly how HOST_NEXT changes a state and transient and selects a next node during that transition. It is clearly true, for example, by the definition of HOST_NEXT and the fact that VAL5 #00011 = 3, that

|- HOST_NEXT(((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf),#00011) =
   PRECALL((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf)

and that by the definition of PRECALL

|- PRECALL((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf) =
   let perform = WORD5 4 in
   ((ram,p,a,x,PAD20T032 p,b,F),t,rsf,msf,dsf,csf,fsf),perform

By definition of let and the fact that WORD5 4 = #00100, it follows that

|- (let perform = WORD5 4 in
    ((ram,p,a,x,PAD20T032 p,b,F),t,rsf,msf,dsf,csf,fsf),perform) =
   ((ram,p,a,x,PAD20T032 p,b,F),t,rsf,msf,dsf,csf,fsf),#00100

hence

Theorem 7: HOST_NEXT for Node 3
|- HOST_NEXT(((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf),#00011) =
   ((ram,p,a,x,PAD20T032 p,b,F),t,rsf,msf,dsf,csf,fsf),#00100

Simple as this reasoning appears, it represents a long chain of primitive inferences. To unfold HOST_NEXT the facts NOT(3=0), NOT(3=1), NOT(3=2) and 3=3, as well as (T=>t1|t2) = t1 and (F=>t1|t2) = t2, must be used to rewrite the definition. Furthermore, each expression of the form let z = t1 in t2 is logically equivalent to (\z.t2)t1. Taking that inference step on the definition of PRECALL gives:

|- PRECALL((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf) =
   (\z. (\perform.
         ((ram,p,a,x,PAD20T032 p,b,F),t,rsf,msf,dsf,csf,fsf),perform) z)
   (WORD5 4)


The next inference step is beta-conversion; expressions of the form (\z.t2)t1 can be reduced to t2[t1/z], the result of substituting t1 for free occurrences of z in t2 (subject to restrictions on free variable capture). Taking that step once gives:

|- PRECALL((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf) =
   (\perform.
     ((ram,p,a,x,PAD20T032 p,b,F),t,rsf,msf,dsf,csf,fsf),perform)
   (WORD5 4)

and taking it again

|- PRECALL((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf) =
   ((ram,p,a,x,PAD20T032 p,b,F),t,rsf,msf,dsf,csf,fsf),WORD5 4

Transitivity is needed to complete the chain: if t1 = t2 and t2 = t3 then t1 = t3. The fact that WORD5 4 = #00100 is also required. This chain of primitive inferences is the proof of Theorem 7. All of the proofs discussed in this paper are proofs in that sense: a chain of inferences justified by axioms and inference rules of the logic. Obviously, it is not practical to construct proofs of any size manually, nor would they be very interesting to look at, but one likes to know that they exist and could be displayed. (The whole correctness proof described in this paper comprises over a million primitive inferences, for example.) In HOL, the general purpose programming language (ML) interfaced to the logic allows the user to write procedures to generate the actual chains of inferences at some level of abstraction above primitive inference steps. (The level of abstraction depends on the cleverness of the procedure.) A general ML procedure which proves Theorem 7 unfolds (rewrites) the definitions of all new constants (HOST_NEXT, PRECALL, etc.), unfolds the definition of let, and does beta-conversion repeatedly until there are no lambda expressions left. Wherever possible, the procedure uses facts about numbers, substitutes equals for equals, uses the transitivity of equality, and so on. By executing this general procedure, the proof is generated with a single high-level command. In fact, over 7,000 primitive inferences are performed in the course of generating the proof using this procedure. The number is as large as that partly because the procedure is so general; a great deal of inference is done in the course of each unfolding (rewriting) in order to find the location of the replacement and build up a new theorem by inference.
This saves the user the trouble of specifying exactly each replacement to be made and its location in the structure of a term, or indeed of thinking very hard about how to prove the fact; the procedure is a general way of unfolding any definition phrased in terms of previous definitions and let expressions. This sort of proof by rewriting is central to HOL (and LCF) methodology; it is at once very powerful and quite expensive. For further discussion of this trade-off, see Section 8. For more on LCF-style rewriting see the LCF manual (Paulson, [15]) and also Paulson [14].
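The let-unfolding and beta-conversion steps can be imitated on a toy term representation. The Python below is a hypothetical sketch (terms as tagged tuples, no capture-avoiding machinery), not HOL's actual term structure:

```python
# Toy illustration of beta-conversion: terms are tuples
# ('var', name), ('lam', name, body) or ('app', fun, arg).
# let z = t1 in t2 is represented as ('app', ('lam', z, t2), t1).

def subst(term, name, value):
    """Substitute value for free occurrences of a variable (no
    capture-avoidance needed for this closed example)."""
    tag = term[0]
    if tag == 'var':
        return value if term[1] == name else term
    if tag == 'lam':
        _, v, body = term
        return term if v == name else ('lam', v, subst(body, name, value))
    _, f, a = term
    return ('app', subst(f, name, value), subst(a, name, value))

def beta(term):
    """One beta-reduction step: (\\z. body) arg  ->  body[arg/z]."""
    if term[0] == 'app' and term[1][0] == 'lam':
        _, (_, z, body), arg = term
        return subst(body, z, arg)
    return term

# let perform = four in (state, perform)  unfolds by one beta step
# to  (state, four):
let_term = ('app', ('lam', 'perform',
                    ('app', ('var', 'state'), ('var', 'perform'))),
            ('var', 'four'))
```

Applying beta to let_term yields ('app', ('var', 'state'), ('var', 'four')), the analogue of the final unfolded form of PRECALL above.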


This is an example of forward proof: the user supplies a general ML procedure with the definitions it will need (and the bit string #00011 in this case), and HOL unfolds definitions and lets and does routine inferences until there is no more to do. The user does not necessarily have to know in advance what the final term will be; the procedure computes it, and proves it equal to the initial term. This is one mode in which HOL may be used; the other is discussed in Section 5.4.

5.2  Stepping through the Graph of the Host Machine

5.2.1  Node to Node

Using the same procedure, the new state, transient and node that HOST_NEXT gives for DUMMY, STOP, FETCH and PERFORM can also be produced. For DUMMY all choice can be limited (by properties of conditionals) to the node component.

Theorem 8: HOST_NEXT for Node 16
|- HOST_NEXT(((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf),#10000) =
   ((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf),(stop => #01000 | #00001)

For STOP there is no choice.

Theorem 9: HOST_NEXT for Node 8
|- HOST_NEXT(((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf),#01000) =
   ((ram,p,a,x,y,b,T),t,rsf,msf,dsf,csf,fsf),#10000

For FETCH, the new stop flag is set if an illegal condition has arisen, and the next node (if not STOP) is selected by FMOVE. The transient receives parts of the new instruction. The choice can be limited to the stop and node components. The procedure uses Theorem 0 as a rewrite rule at the appropriate point. (Theorem 10 is easier to read with some of the let-expressions not unfolded.)

Theorem 10: HOST_NEXT for Node 1
|- HOST_NEXT(((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf),#00001) =
   let fetched = INSTFETCH(ram,p) in
   let new_msf = M fetched in
   let new_dsf = D fetched in
   let new_csf = C fetched in
   let new_fsf = FF fetched in
   ((ram,TRIM32T020(INCP32 p),a,x,y,b,
     (INVALID(INCP32 p)) \/
     ((ILLEGALCALL(new_dsf,new_csf,new_fsf)) \/
      ((SPAREFUNC(new_dsf,new_csf,new_fsf)) \/
       ((ILLEGALPDEST(new_dsf,new_csf,new_fsf)) \/


        (ILLEGALWRITE(new_dsf,new_csf,new_msf)))))),
    PAD20T032(A(INSTFETCH(ram,p))),R(INSTFETCH(ram,p)),new_msf,new_dsf,
    new_csf,new_fsf),
   ((INVALID(INCP32 p)) \/
    ((ILLEGALCALL(new_dsf,new_csf,new_fsf)) \/
     ((SPAREFUNC(new_dsf,new_csf,new_fsf)) \/
      ((ILLEGALPDEST(new_dsf,new_csf,new_fsf)) \/
       (ILLEGALWRITE(new_dsf,new_csf,new_msf)))))) => #01000 |
   ((VAL1 new_csf = 1) =>
     ((VAL2 new_msf = 0) => ... |
      ((VAL2 new_msf = 1) => ... | ...)) |
    ((VAL3 new_dsf = 7) =>
      ((VAL2 new_msf = 0) => ... |
       ((VAL2 new_msf = 1) => ... | ...)) |
     ((VAL3 new_dsf = 6) =>
       ((VAL2 new_msf = 0) => ... |
        ((VAL2 new_msf = 1) => ... | ...)) |
      (((VAL1 new_csf = 0) /\
        (((VAL3 new_dsf = 5) /\ b) \/
         ((VAL3 new_dsf = 4) /\ (NOT b)))) => #10000 |
       ((VAL2 new_msf = 0) =>
         (((VAL1 new_csf = 0) /\ (VAL4 new_fsf = 1)) => #00011 | ...) |
        ((VAL2 new_msf = 1) =>
          (((VAL1 new_csf = 0) /\ (VAL4 new_fsf = 2)) => ... |
           (((VAL1 new_csf = 0) /\ (VAL4 new_fsf = 12)) => ... | ...)) |
         (((VAL1 new_csf = 0) /\ (VAL4 new_fsf = 12)) => ... | ...))))))))

For PERFORM, only the memory component and the transient involve no choice.

Theorem 11: HOST_NEXT for Node 4
|- HOST_NEXT(((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf),#00100) =
   ((ram,
     ((VAL1 csf = 1) => ... |
      ((VAL3 dsf = 0) => p |
       ((VAL3 dsf = 1) => p |
        ((VAL3 dsf = 2) => p |
         TRIM32T020(VALUE(ALU(fsf,msf,dsf,REG(rsf,a,x,y,p),t,b))))))),
     ((VAL1 csf = 1) => ... |
      ((VAL3 dsf = 0) =>
        VALUE(ALU(fsf,msf,dsf,REG(rsf,a,x,y,p),t,b)) | a)),
     ((VAL1 csf = 1) => ... |


      ((VAL3 dsf = 0) => x |
       ((VAL3 dsf = 1) =>
         VALUE(ALU(fsf,msf,dsf,REG(rsf,a,x,y,p),t,b)) | x))),
     ((VAL1 csf = 1) => ... |
      ((VAL3 dsf = 0) => y |
       ((VAL3 dsf = 1) => y |
        ((VAL3 dsf = 2) =>
          VALUE(ALU(fsf,msf,dsf,REG(rsf,a,x,y,p),t,b)) | y)))),
     ((VAL1 csf = 1) => ... |
      BVAL(ALU(fsf,msf,dsf,REG(rsf,a,x,y,p),t,b))),
     ((VAL1 csf = 1) => F |
      SVAL(ALU(fsf,msf,dsf,REG(rsf,a,x,y,p),t,b)))),
    t,rsf,msf,dsf,csf,fsf),
   ((VAL1 csf = 1) => ... |
    (SVAL(ALU(fsf,msf,dsf,REG(rsf,a,x,y,p),t,b)) => #01000 | #10000))

5.2.2  Making Time Explicit

We now consider making time explicit. Times are represented as numbers. Signals are functions from times to registers:

ram_sig:num->mem21_32      p_sig:num->word20
a_sig:num->word32          x_sig:num->word32
y_sig:num->word32          b_sig:num->bool
stop_sig:num->bool         t_sig:num->word32
rsf_sig:num->word2         msf_sig:num->word2
dsf_sig:num->word3         csf_sig:num->word1
fsf_sig:num->word4

ram_sig n means the value of ram at time n, and so on.
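The convention that a signal is a function from times to values can be imitated directly. The Python below is a hypothetical sketch in which a whole signal is derived from a step function and an initial value, with a toy node signal (cycling through the literal call path) standing in for node_sig:

```python
# Hypothetical sketch: a signal is a function from time (a natural
# number) to a value.  Given a step function and an initial value,
# the whole signal is determined by iteration, mirroring the
# assumption that the signals at time n+1 are obtained from those
# at time n by one application of the control function.

def make_signal(step, initial):
    cache = {0: initial}
    def sig(n):
        if n not in cache:
            cache[n] = step(sig(n - 1))
        return cache[n]
    return sig

# Toy node signal cycling through the literal call path:
# DUMMY (16) -> FETCH (1) -> PRECALL (3) -> PERFORM (4) -> DUMMY.
PATH = {16: 1, 1: 3, 3: 4, 4: 16}
node_sig = make_signal(lambda node: PATH[node], 16)
```

Here node_sig(0) is 16 and node_sig(4) is 16 again, matching the four-step traversal described later in this section.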

A theorem of the following form expresses the behaviour of HOST_NEXT as it steps through a node #.....; a time unit is the time it takes for one event to happen at the host level.

|- (!n.
     ((ram_sig(n+1),p_sig(n+1),a_sig(n+1),x_sig(n+1),
       y_sig(n+1),b_sig(n+1),stop_sig(n+1)),t_sig(n+1),
      rsf_sig(n+1),msf_sig(n+1),dsf_sig(n+1),csf_sig(n+1),
      fsf_sig(n+1)),node_sig(n+1) =
     HOST_NEXT
      (((ram_sig n,p_sig n,a_sig n,x_sig n,y_sig n,b_sig n,stop_sig n),
        t_sig n,rsf_sig n,msf_sig n,dsf_sig n,csf_sig n,fsf_sig n),
       node_sig n)) ==>
   (node_sig n = #.....) ==>
   (((ram_sig(n+1) = ... n ...) /\
     (p_sig(n+1) = ... n ...) /\
     (a_sig(n+1) = ... n ...) /\
     (x_sig(n+1) = ... n ...) /\
     (y_sig(n+1) = ... n ...) /\
     (b_sig(n+1) = ... n ...) /\
     (stop_sig(n+1) = ... n ...)) /\
    (t_sig(n+1) = ... n ...) /\
    (rsf_sig(n+1) = ... n ...) /\
    (msf_sig(n+1) = ... n ...) /\
    (dsf_sig(n+1) = ... n ...) /\
    (csf_sig(n+1) = ... n ...) /\
    (fsf_sig(n+1) = ... n ...)) /\
   (node_sig(n+1) = ...)

The first antecedent could be abbreviated by use of HOST_NEXT_SIG as in Section 1.3. It gives a sequence of 14-tuples of signals such that HOST_NEXT applied to the fourteen signals at any time gives the fourteen signals at the next time. The second antecedent fixes a particular event (a node in the graph) at some time n. The fourteen equations then give the value of the signals at time n+1 in terms of the signals at time n, after that node is traversed. For PRECALL (event 3, node #00011), the theorem is:

Theorem 12: Timed HOST_NEXT for Node 3
|- (!n.
     ((ram_sig(n+1),p_sig(n+1),a_sig(n+1),x_sig(n+1),
       y_sig(n+1),b_sig(n+1),stop_sig(n+1)),t_sig(n+1),
      rsf_sig(n+1),msf_sig(n+1),dsf_sig(n+1),csf_sig(n+1),
      fsf_sig(n+1)),node_sig(n+1) =
     HOST_NEXT
      (((ram_sig n,p_sig n,a_sig n,x_sig n,y_sig n,b_sig n,stop_sig n),
        t_sig n,rsf_sig n,msf_sig n,dsf_sig n,csf_sig n,fsf_sig n),
       node_sig n)) ==>
   (node_sig n = #00011) ==>
   (((ram_sig(n+1) = ram_sig n) /\
     (p_sig(n+1) = p_sig n) /\
     (a_sig(n+1) = a_sig n) /\
     (x_sig(n+1) = x_sig n) /\
     (y_sig(n+1) = PAD20T032(p_sig n)) /\
     (b_sig(n+1) = b_sig n) /\
     (stop_sig(n+1) = F)) /\
    (t_sig(n+1) = t_sig n) /\
    (rsf_sig(n+1) = rsf_sig n) /\
    (msf_sig(n+1) = msf_sig n) /\
    (dsf_sig(n+1) = dsf_sig n) /\
    (csf_sig(n+1) = csf_sig n) /\
    (fsf_sig(n+1) = fsf_sig n)) /\
   (node_sig(n+1) = #00100)

This gives the effect over one time unit of passing through the PRECALL node: if the node signal at time n is #00011, then the y register at time n+1 holds the (padded) value of the program counter at time n, and so on. The relation to Theorem 7 is clear. To prove Theorem 12, it is assumed for some n that

|- node_sig n = #00011

The first antecedent is assumed and instantiated to that n:

Assumption 1: Sequence Assumption
|- ((ram_sig(n+1),p_sig(n+1),a_sig(n+1),x_sig(n+1),
     y_sig(n+1),b_sig(n+1),stop_sig(n+1)),t_sig(n+1),
    rsf_sig(n+1),msf_sig(n+1),dsf_sig(n+1),csf_sig(n+1),
    fsf_sig(n+1)),node_sig(n+1) =
   HOST_NEXT
    (((ram_sig n,p_sig n,a_sig n,x_sig n,y_sig n,b_sig n,stop_sig n),
      t_sig n,rsf_sig n,msf_sig n,dsf_sig n,csf_sig n,fsf_sig n),
     node_sig n)

Assumption 1 is unfolded using the first assumption and then by using Theorem 7 (which says what HOST_NEXT does at node #00011). The result can be expressed (using properties of tuples) as fourteen pairwise equalities with all assumptions discharged. This gives Theorem 12. The same ML procedure which generates the proof of Theorem 12 also generates theorems for DUMMY, STOP, FETCH and PERFORM, which are not shown here but bear a similar relation to Theorems 8-11 (respectively) as Theorem 12 does to Theorem 7. This is another example of a general procedure being used to generate a forward proof (in the case of Theorem 12, a proof of about 2,200 inferences); one does not have to know in advance the theorem being proved, but only its form.

5.2.3  End Results of Sequences in the Host Machine

Finally, the result of the whole sequence DUMMY, FETCH, PRECALL, PERFORM can be produced. To do this, it is assumed again that for some n, node_sig n = #10000 (we start at DUMMY at time n). The assumption is made which asserts that the signals at any time are the result of HOST_NEXT at the previous time. The three conditions (c1, c3 and c17) are assumed to hold of the state signals at time n, to ensure that the correct choices are made at each node. Then the path


is traversed by referring at each node to the corresponding theorem (Theorem 12 for node 3, etc.) which describes the effects over the one time unit taken to traverse that node. That causes certain changes in the fourteen components, which are compounded with the changes at the previous node (if there was one). At each stage, the assumed conditions are used to help make decisions about the node signal, i.e. the next event. For each node, a theorem follows, describing the accumulated changes at that time. Again, an ML procedure is written to generate the proof. It is a recursive procedure which refers initially to the theorem (analogous to Theorem 12) for node #10000, dismisses the antecedents because these have already been assumed, and then uses the pairwise equalities to rewrite the components and select the next node. Successive theorems are generated by examining the node, and referring to the theorem about HOST_NEXT for that node. The process continues until the first appearance of node #10000 as the next node. At each stage, more complex simplifications must also be done; for example, where there is a choice of next node, the conditional expression denoting the next node has to be reduced to a particular bit string by using assumed facts such as the fact that c17 holds of the state at time n. At some stages, the procedure must also use lemmas; for example, in the final stage of the literal call path, Theorem 6 is required to avoid choices involving a destination of 0, 1 or 2. Under the assumptions

1. !n. ((ram_sig(n+1),p_sig(n+1),a_sig(n+1),x_sig(n+1),
         y_sig(n+1),b_sig(n+1),stop_sig(n+1)),t_sig(n+1),
        rsf_sig(n+1),msf_sig(n+1),dsf_sig(n+1),csf_sig(n+1),
        fsf_sig(n+1)),node_sig(n+1) =
       HOST_NEXT
        (((ram_sig n,p_sig n,a_sig n,x_sig n,y_sig n,b_sig n,stop_sig n),
          t_sig n,rsf_sig n,msf_sig n,dsf_sig n,csf_sig n,fsf_sig n),
         node_sig n)
2. node_sig n = #10000
3. c1(ram_sig n,p_sig n,a_sig n,x_sig n,y_sig n,b_sig n,stop_sig n)
4. c3(ram_sig n,p_sig n,a_sig n,x_sig n,y_sig n,b_sig n,stop_sig n)
5. c17(ram_sig n,p_sig n,a_sig n,x_sig n,y_sig n,b_sig n,stop_sig n)

four theorems are proved. These, conjoined, are called Theorem 13. At time n+1, DUMMY has been executed without having to stop.

|- ((((ram_sig(n+1) = ram_sig n) /\
     (p_sig(n+1) = p_sig n) /\
     (a_sig(n+1) = a_sig n) /\
     (x_sig(n+1) = x_sig n) /\
     (y_sig(n+1) = y_sig n) /\
     (b_sig(n+1) = b_sig n) /\
     (stop_sig(n+1) = F)) /\
    (t_sig(n+1) = t_sig n) /\
    (rsf_sig(n+1) = rsf_sig n) /\
    (msf_sig(n+1) = msf_sig n) /\
    (dsf_sig(n+1) = dsf_sig n) /\
    (csf_sig(n+1) = csf_sig n) /\
    (fsf_sig(n+1) = fsf_sig n)) /\
   (node_sig(n+1) = #00001))

At time n+2 FETCH has been executed, so the program counter is incremented, the transient contains the fields of the new instruction, and the t register holds the new address.

|- ((((ram_sig(n+2) = ram_sig n) /\
     (p_sig(n+2) = TRIM32T020(INCP32(p_sig n))) /\
     (a_sig(n+2) = a_sig n) /\
     (x_sig(n+2) = x_sig n) /\
     (y_sig(n+2) = y_sig n) /\
     (b_sig(n+2) = b_sig n) /\
     (stop_sig(n+2) = F)) /\
    (t_sig(n+2) = PAD20T032(A(INSTFETCH(ram_sig n,p_sig n)))) /\
    (rsf_sig(n+2) = R(INSTFETCH(ram_sig n,p_sig n))) /\
    (msf_sig(n+2) = M(INSTFETCH(ram_sig n,p_sig n))) /\
    (dsf_sig(n+2) = D(INSTFETCH(ram_sig n,p_sig n))) /\
    (csf_sig(n+2) = C(INSTFETCH(ram_sig n,p_sig n))) /\
    (fsf_sig(n+2) = FF(INSTFETCH(ram_sig n,p_sig n)))) /\
   (node_sig(n+2) = #00011))

At time n+3 PRECALL has been executed, so the y register holds the last value of the program counter.

|- ((((ram_sig(n+3) = ram_sig n) /\
     (p_sig(n+3) = TRIM32T020(INCP32(p_sig n))) /\
     (a_sig(n+3) = a_sig n) /\
     (x_sig(n+3) = x_sig n) /\
     (y_sig(n+3) = INCP32(p_sig n)) /\
     (b_sig(n+3) = b_sig n) /\
     (stop_sig(n+3) = F)) /\
    (t_sig(n+3) = PAD20T032(A(INSTFETCH(ram_sig n,p_sig n)))) /\
    (rsf_sig(n+3) = R(INSTFETCH(ram_sig n,p_sig n))) /\
    (msf_sig(n+3) = M(INSTFETCH(ram_sig n,p_sig n))) /\
    (dsf_sig(n+3) = D(INSTFETCH(ram_sig n,p_sig n))) /\
    (csf_sig(n+3) = C(INSTFETCH(ram_sig n,p_sig n))) /\
    (fsf_sig(n+3) = FF(INSTFETCH(ram_sig n,p_sig n)))) /\
   (node_sig(n+3) = #00100))

At time n+4 PERFORM has been executed. By the definition of PERFORM, the program counter now holds the result (the first value) computed by the ALU if


the compare field of the new instruction does not hold 1 (which it doesn't) and if the destination field of the new instruction is not 0, 1 or 2 (which it is not). The b flag holds the b value (the second value) computed by the ALU, and the stop flag the stop value (the third). Based on Theorem 11, the memory source supplied to the ALU comes from the t register, which at time n+3 on this path held the (padded) new address. The other arguments are likewise dictated by Theorem 11 and the state and transient at time n+3. The whole ALU expression is:

ALU(FF(INSTFETCH(ram_sig n,p_sig n)),
    M(INSTFETCH(ram_sig n,p_sig n)),
    D(INSTFETCH(ram_sig n,p_sig n)),
    REG(R(INSTFETCH(ram_sig n,p_sig n)),a_sig n,x_sig n,
        PAD20T032(TRIM32T020(INCP32(p_sig n))),
        TRIM32T020(INCP32(p_sig n))),
    PAD20T032(A(INSTFETCH(ram_sig n,p_sig n))),b_sig n)

For convenience this is abbreviated by a new constant, ALU_ABBR6 (one of eight needed for the twenty-four cases). For any ram, p, a, x, y and b:

|- ALU_ABBR6(ram,p,a,x,y,b) =
   ALU(FF(INSTFETCH(ram,p)),M(INSTFETCH(ram,p)),D(INSTFETCH(ram,p)),
       REG(R(INSTFETCH(ram,p)),a,x,PAD20T032(TRIM32T020(INCP32 p)),
           TRIM32T020(INCP32 p)),PAD20T032(A(INSTFETCH(ram,p))),b)

Since it has been assumed that VAL4(FF(INSTFETCH(ram_sig n,p_sig n))) = 1, Theorem 5 can be used to compute the result of the ALU operation:

PAD20T032(A(INSTFETCH(ram_sig n,p_sig n))),b_sig n,
NOT((VAL3(D(INSTFETCH(ram_sig n,p_sig n))) = 3) \/
    ((VAL3(D(INSTFETCH(ram_sig n,p_sig n))) = 4) \/
     (VAL3(D(INSTFETCH(ram_sig n,p_sig n))) = 5))) \/
(INVALID(PAD20T032(A(INSTFETCH(ram_sig n,p_sig n)))))

The padded address cannot be invalid (Theorem 4), and the destination of the new instruction must be 3, 4 or 5 (Theorem 6), so the ALU returns:

PAD20T032(A(INSTFETCH(ram_sig n,p_sig n))),b_sig n,F

Thus at time n+4 PERFORM has been executed:


|- ((((ram_sig(n+4) = ram_sig n) /\
     (p_sig(n+4) = A(INSTFETCH(ram_sig n,p_sig n))) /\
     (a_sig(n+4) = a_sig n) /\
     (x_sig(n+4) = x_sig n) /\
     (y_sig(n+4) = INCP32(p_sig n)) /\
     (b_sig(n+4) = b_sig n) /\
     (stop_sig(n+4) = F)) /\
    (t_sig(n+4) = PAD20T032(A(INSTFETCH(ram_sig n,p_sig n)))) /\
    (rsf_sig(n+4) = R(INSTFETCH(ram_sig n,p_sig n))) /\
    (msf_sig(n+4) = M(INSTFETCH(ram_sig n,p_sig n))) /\
    (dsf_sig(n+4) = D(INSTFETCH(ram_sig n,p_sig n))) /\
    (csf_sig(n+4) = C(INSTFETCH(ram_sig n,p_sig n))) /\
    (fsf_sig(n+4) = FF(INSTFETCH(ram_sig n,p_sig n)))) /\
   (node_sig(n+4) = #10000))

The next node is #10000, so the sequence is ended. The program counter now holds the address of the procedure to be called, and the y register stores the return address (the original program counter's value plus one). These four theorems provide the basis for proving Theorem 2. The fourth is part of the proof of Theorem 3. (See Section 1.3). (Here, the theorems are stated in terms of the fourteen components. Various properties of n-tuples are required to formulate them as in Section 1.3.)
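The accumulated effect of the four events can be illustrated concretely. The Python sketch below is a hypothetical toy machine (ram maps addresses to (opcode, address) pairs; word widths and padding are ignored); its end state has the shape of the theorems above, with the program counter holding the call address and y the return address:

```python
# A concrete run of the four-event literal call path on a toy machine.
# ram is a hypothetical memory mapping addresses to (opcode, address)
# pairs; p is the program counter and y the return-address register.

def run_call(ram, p, y):
    # DUMMY: no state change on this path.
    # FETCH: latch the instruction and increment the program counter.
    fetched_addr = ram[p][1]
    p = p + 1
    # PRECALL: save the (incremented) program counter in y.
    y = p
    # PERFORM: load the call target into the program counter.
    p = fetched_addr
    return p, y

ram = {10: ('call', 42)}
```

With a call to address 42 stored at location 10, run_call(ram, 10, 0) ends with p holding 42 and y holding 11, i.e. the original program counter's value plus one.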

5.3  The Equivalence Proof

Now the high-level specification, NEXT (Section 3.2), is considered in relation to the results of the graph traversal. We wish to prove:

Theorem 14: Result of Specification for Call
|- c1(ram,p,a,x,y,b,stop) /\
   c3(ram,p,a,x,y,b,stop) /\
   c17(ram,p,a,x,y,b,stop) ==>
   (NEXT(ram,p,a,x,y,b,stop) =
    (ram,A(INSTFETCH(ram,p)),a,x,INCP32 p,b,F))

That is, for any state, NEXT specifies the same changes as have accumulated in the registers at the end of the sequence produced by the host machine (disregarding time). The proof consists in reducing the conditional in the definition of NEXT. Obviously, stop is false (c1 holds). The five illegal conditions mentioned in c3 are all false. Furthermore, the new address cannot be illegal: the new memory address control field holds 0, so OFFSET gives a padded 20-bit address. Since the new compare field also holds 0, comp is false; writeop and skip are also obviously false. The new destination field is neither 0, 1 nor 2 (by Theorem 6) and so the


call branch is arrived at. The program counter, the b flag and the stop flag of the new state in this case depend on the result (aluout) computed by ALU. The new state is:

(ram,TRIM32T020(VALUE aluout),a,x,INCP32 p,BVAL aluout,SVAL aluout)

In the fully unfolded definition of NEXT, aluout is

ALU(FF(INSTFETCH(ram,p)),M(INSTFETCH(ram,p)),D(INSTFETCH(ram,p)),
    REG(R(INSTFETCH(ram,p)),a,x,y,TRIM32T020(INCP32 p)),
    MEMREAD
     (ram,M(INSTFETCH(ram,p)),A(INSTFETCH(ram,p)),x,y,
      ((VAL1(C(INSTFETCH(ram,p))) = 0) /\
       (VAL3(D(INSTFETCH(ram,p))) = 6)) \/
      ((VAL1(C(INSTFETCH(ram,p))) = 0) /\
       NOT((VAL3(D(INSTFETCH(ram,p))) = 7) \/
           (VAL3(D(INSTFETCH(ram,p))) = 6)) /\
       (VAL4(FF(INSTFETCH(ram,p))) = 2)),
      ((VAL1(C(INSTFETCH(ram,p))) = 0) /\
       (NOT(VAL3(D(INSTFETCH(ram,p))) = 7) /\
        NOT(VAL3(D(INSTFETCH(ram,p))) = 6)) /\
       (VAL4(FF(INSTFETCH(ram,p))) = 12))),b)

which is abbreviated by a new constant ALU_ABBR2, so that the new state specified by NEXT is

(ram,TRIM32T020(VALUE(ALU_ABBR2(ram,p,a,x,y,b))),a,x,
 INCP32 p,BVAL(ALU_ABBR2(ram,p,a,x,y,b)),
 SVAL(ALU_ABBR2(ram,p,a,x,y,b)))

Since VAL4(FF(INSTFETCH(ram,p))) = 1, the reasoning is as for ALU_ABBR6 in Section 5.2.3: Theorem 5 is used to conclude that ALU in this case returns MEMREAD(...) as its result, b for the b flag, and

NOT((VAL3(D(INSTFETCH(ram,p))) = 3) \/
    ((VAL3(D(INSTFETCH(ram,p))) = 4) \/
     (VAL3(D(INSTFETCH(ram,p))) = 5))) \/
(INVALID(MEMREAD(...)))

for the stop flag, where "..." stands for the arguments to MEMREAD. The value of the new destination field is 3 or 4 or 5, so the stop flag becomes F \/ (INVALID(MEMREAD(...))). It remains to work out the value of MEMREAD(...). Its sixth argument is a boolean value which distinguishes main from peripheral memory, and the seventh, a boolean which indicates whether no memory source is required. The seventh works out to be false, since the value of the function


field is not 12 (but rather 1). The sixth also works out to be false, since the function field is not 2, nor is the destination field 6. Thus, since the memory address control field is 0, MEMREAD returns just PAD20T032(A(INSTFETCH(ram,p))). This cannot be invalid (Theorem 4). Therefore the ALU returns:

PAD20T032(A(INSTFETCH(ram,p))),b,F

so the state returned by NEXT is

(ram,TRIM32T020(PAD20T032(A(INSTFETCH(ram,p)))),a,x,INCP32 p,b,F)

which by Axiom 1 is just

(ram,A(INSTFETCH(ram,p)),a,x,INCP32 p,b,F)

which is what was wanted. In this case, the memory component (ram) and the a, x and y components of the state given by NEXT agree immediately with the end results of HOST_NEXT. That the other three registers agree depends on an argument about the ALU. The essence of the argument can be expressed by a theorem relating the two values computed by the ALU: the one referring to MEMREAD, in the case of the specification, and the one determined by Theorem 11 for the PERFORM node (at the correct point in the sequence), in the case of the host machine.

Theorem 15: Equivalence of ALU in Specification and Host
|- ((VAL2(M(INSTFETCH(ram,p))) = 0) /\
    (VAL1(C(INSTFETCH(ram,p))) = 0) /\
    NOT(VAL3(D(INSTFETCH(ram,p))) = 7) /\
    NOT(VAL3(D(INSTFETCH(ram,p))) = 6) /\
    (VAL4(FF(INSTFETCH(ram,p))) = 1)) ==>
   (ALU_ABBR2(ram,p,a,x,y,b) = ALU_ABBR6(ram,p,a,x,y,b))

5.4  A Digression on Goal-Oriented Proof

For the HOL proof of Theorem 14, we are starting for a change with a goal. We know that NEXT should unfold to the state specified by the host machine, in the circumstances; but we do not know whether it really does, nor how to attempt to prove that it does. Here, a goal-oriented proof is called for. One starts with a goal (a term with an associated list of assumptions), and applies strategies or tactics to the goal to produce subgoals; then tactics are applied to the subgoals, and so on, until subgoals are reached which are known to be true (that is, which correspond to previously proved theorems or axioms). The tactics justify each decomposition into subgoals so that a proof can be assembled in the end. The proof is built up by inferences, starting with the theorems corresponding to the final set of subgoals. For example, if a tactic T is applied to a goal (A, t) (term t and assumptions A) to produce two subgoals


then the ML procedure which justifies that decomposition expects two theorems

o |-t1 with no assumptions beyond A1, and
o |-t2 with no assumptions beyond A2

and infers a theorem |-t (with no assumptions beyond A). Tactics can be sequenced and combined in various ways to form compound tactics. In this way, arbitrarily complex proof strategies can be expressed as procedures which generate proofs. This style of proof revolves around the list of current assumptions in a natural way. Tactics use the same rewriting mechanism as forward proof. One frequently-used tactic generates a subgoal by rewriting the goal using all current assumptions of that goal as rewrite rules. Each rewrite is justified by a chain of inferences which is used later in assembling the proof. Sometimes groups of assumptions can logically imply a new assumption; a useful tactic finds new conclusions and adds them to the assumption list. This provides a simple kind of resolution in HOL. For example, it follows from the definition of invalidity (Section 3.1) that

= pl)

This has an instance 1- NOT(INVALID(INCP32 p»

==> (PAD20T032(TRIK32T020(INCP32 p»=INCP32 p)

which can be used to simplify the expression PAD20T032(TRIK32T020(INCP32 p»

during the proof of Theorem 14. The instance can be identified and its conclusion added as an assumption as soon as its antecedent, NOT(INVALID(INCP32 p», joins the assumptions. The conclusion is not a new assumption because it can always be dismissed by a series of simple inference steps.
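The tactic-and-justification discipline described in this digression can be sketched in miniature. The Python below is a hypothetical simplification (goals and theorems are plain values, and only a conjunction-splitting tactic is shown), not HOL's actual ML interface:

```python
# Hypothetical sketch of LCF-style goal-directed proof: a tactic maps
# a goal to (subgoals, justification), where the justification turns
# theorems proved for the subgoals back into a theorem for the goal.

def conj_tac(goal):
    """Split a conjunctive goal ('and', a, b) into two subgoals."""
    _, a, b = goal
    def justify(thms):
        ta, tb = thms
        return ('and', ta, tb)   # rebuild the proved conjunction
    return [a, b], justify

def prove(goal, tactic, solve):
    # Decompose the goal, solve each subgoal, then assemble the
    # final theorem via the tactic's justification.
    subgoals, justify = tactic(goal)
    return justify([solve(g) for g in subgoals])

# Trivial leaf solver: reflexivity-style subgoals prove themselves.
theorem = prove(('and', 'x=x', 'y=y'), conj_tac, lambda g: g)
```

The justification function is what guarantees that, however the goal tree was decomposed, the final object is assembled by inference from theorems for the leaves.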

5.5  The Equivalence Proof in HOL

To prove Theorem 14 in HOL, first the definitions of c1, c3 and c17 are unfolded. The definition of NEXT is unfolded, using a version of the definition without the let and lambda expressions (analogous to the version of the definition of PRECALL


in Section 5.1). (Part of this unfolding has been seen in the form of the ALU expression, Section 5.3.) The assumptions are "resolved" with Theorem 6, all of whose antecedents are by this point among the assumptions. This adds new members to the list of current assumptions of the subgoal, including:

o NOT(VAL3(D(INSTFETCH(ram,p))) = 0)
o NOT(VAL3(D(INSTFETCH(ram,p))) = 1)
o NOT(VAL3(D(INSTFETCH(ram,p))) = 2)

The subgoal can now be rewritten using all of its current assumptions, as well as facts such as Axiom 1 and Theorem 5 (to reduce the ALU expressions). By this process the NEXT expression (a conditional with ten branches) reduces to the correct case, and the state in that case to the desired state: both sides of the subgoal become equal. A subgoal with a term t=t is known to correspond to an axiom: namely, reflexivity. From this point HOL starts with the axiom of reflexivity and builds up the proof by using the justification of each decomposition into subgoals. The book-keeping is all done by HOL, the user only supplying the tactics. The whole proof can be generated with one command, given the correct (compound) tactic. The advantage over forward proof is that it would be very difficult to structure a forward proof effort starting from |- t=t and applying inference rules to produce a theorem as complex as Theorem 14. The only real tricks in this process are supplying the lemmas needed, and causing the rewriting to happen in an efficient way so that the proof can be done in a reasonable amount of time. The former requires the proof effort to be planned and structured sensibly. The latter is discussed further in Section 8.

6 The Rest of the Proof

So far, the specification and host machine have only been revealed insofar as they concern the literal call instruction. This in fact gives a good idea of what is claimed and proved in the other cases, and how it is stated and proved in HOL. The literal call proofs are about average in length and difficulty among the twenty-four cases. The other four paths suggested in Figure 2 are for states in which the stop flag is set; call instructions for which the ALU sets the stop flag; states for which fetching produces an illegal instruction; and states for which the newly fetched instruction is a non-operation. The other nineteen paths are for call instructions with indirect or offset addresses; ALU operations with the a, x or y register as destination; write operations to memory or output; read operations from memory or input; and variations of the above with different sorts of addressing and in which illegal situations arise and the machine stops. The tree of all possible paths can be seen in [5]. In fact, the definitions of NEXT and the host machines in the example proofs have been unfolded to their maximum depth, but that is not necessary for doing the proof. As soon as they are unfolded sufficiently, the host and specification are seen to be equal (by means of Theorem 15 and so on), and the proof is completed. A large number of other lemmas and intermediate results were proved en route to the main theorems and not mentioned here; none is very complicated, but they add up to a certain amount of unseen effort. Other results have been mentioned only in passing (see the tables in Section 8).

7 Errors Found in the Viper Specifications

Aside from certain HOL conventions, such as consistently currying or uncurrying a function, and some corrections of typographical errors, the HOL versions of the functional specification and the host differ from the definitions in [4] at a few places where we found errors in the host and high-level specification. There was a type error in the function FMOVE, which was passed a boolean b by FETCH when FMOVE was expecting a 1-bit string. More seriously, there were some other errors in FETCH, and one in NEXT. The function for FETCH had read:

|- FETCH((ram,p,a,x,y,b,stop),t,rsf,msf,dsf,csf,fsf) =
   let fetched = INSTFETCH(ram,p) in
   let newp = TRIM32T020(INCP32 p) in
   let newr = R fetched in
   let newm = M fetched in
   let newd = D fetched in
   let newc = C fetched in
   let newf = FF fetched in
   let newt = PAD20T032(A fetched) in
   let notinc = INVALID(INCP32 newp) in
   let illegalcl = ILLEGALCALL(dsf,csf,fsf) in
   let illegalsp = SPAREFUNC(dsf,csf,fsf) in
   let illegalonp = ILLEGALPDEST(dsf,csf,fsf) in
   let illegalwr = ILLEGALWRITE(dsf,csf,msf) in
   let stopstate = WORDS 8 in
   let b' = WORD1(V[b]) in
   (notinc \/ (illegalcl \/ (illegalsp \/ (illegalonp \/ illegalwr))) =>
      (((ram,newp,a,x,y,b,T),newt,newr,newm,newd,newc,newf),stopstate) |
      (((ram,newp,a,x,y,b,F),newt,newr,newm,newd,newc,newf),
       FMOVE(newm,newd,newc,newf,b')))

so that the phase of the fetch cycle was confused: it is actually the incremented program counter one wants to check for validity, not the twice-incremented counter, and it is the new instruction one wants to check for legality, not the instruction that rested in the transient before the FETCH operation. This error


became apparent in the course of doing the proof, because the host's results disagreed with the specification's results. The original specification had an incomplete check for illegal instructions; the possibility of the instruction being a non-operation was neglected. The relevant part read:

... in
let illegaladdr = (NOT(NILM(dsf,csf,fsf))
                   /\ (INVALID(OFFSET(msf,addr,x,y))))
in ...

This naturally caused a problem with interpreting non-operations whose fields met the two conditions above but which were meant to be legal instructions. Finally, we added definitions to form a complete problem statement: the simple definition of HOST_NEXT to traverse the graph, and the formal definitions of the conditions c1 and so on. There were more conditions than foreseen in the informal proof ([5]), similar to c17; these conditions always decide whether to stop because of an abnormality after an ALU operation, or whether to end the sequence naturally, depending on the stop value returned by the ALU. However, the inputs to the ALU are different in different paths through the graph, depending on which nodes have come before. In the literal call path the ALU expression was:

ALU (FF(INSTFETCH(ram_sig n,p_sig n)),
     M(INSTFETCH(ram_sig n,p_sig n)),
     D(INSTFETCH(ram_sig n,p_sig n)),
     REG (R(INSTFETCH(ram_sig n,p_sig n)),a_sig n,x_sig n,
          PAD20T032(TRIM32T020(INCP32(p_sig n))),
          TRIM32T020(INCP32(p_sig n))),
     PAD20T032(A(INSTFETCH(ram_sig n,p_sig n))),b_sig n)

but this instance was the result of having been through the DUMMY, FETCH and in particular the PRECALL node, and no others. In fact, there are seven other similar conditions for the twenty-four total paths. Furthermore, the conditions suggested did not make a distinction between the contents of the transient before and after a FETCH event. For example, in [5], c3 was defined to be something like

c3(p,dsf,csf,fsf,msf,b) =
  (NOT ((INVALID(INCP32 p)) \/
        ((ILLEGALCALL(dsf,csf,fsf)) \/
         ((SPAREFUNC(dsf,csf,fsf)) \/
          ((ILLEGALPDEST(dsf,csf,fsf)) \/
           (ILLEGALWRITE(dsf,csf,msf))))))) /\
  ((NOT(VAL1 csf = 1)) /\
   ((NOT((VAL3 dsf = 7) \/ (VAL3 dsf = 6))) /\
    ((NOT(NOOP(dsf,csf,b))) /\
     ((VAL2 msf = 0) /\
      ((VAL1 csf = 0) /\ (VAL4 fsf = 1))))))

This is not what is required for traversing the graph (see Section 4.2) and in fact does not make sense; it is not a function of the state and so has no meaning at the higher level.

8 Performing the Proof

We have tried to present the proof as simply as possible in this paper, but the actual proof effort deserves some discussion. The process of typing the definitions in [4] and [5] into HOL took about a week; it was straightforward since the definitions were written in a notation called LCF_LSM, a predecessor of HOL. The generation of the proof took about six person-months of work. (It was done on a Sun 3 workstation with eight megabytes of memory.) This may seem a long time for a proof which is not conceptually difficult, but it should be borne in mind what was actually generated: a sequence of over a million inferences which ensures that the correctness statement is really true (see Section 1.1). The hand proof done by Cullyer ([5]) took only about three weeks, but did not detect the errors described in Section 7. The generation of the theorems about HOST_NEXT (Sections 5.1, 5.2.1, 5.2.2 and 5.2.3) was relatively quick and easy, as they were done by forward proof, i.e. letting HOL continuously unfold and simplify a definition to compute an unforeseen theorem by a known method. Most of the time was spent in proving the theorems about NEXT, for which the desired result was known in advance but the way to prove it was not. The latter is a more typical sort of HOL or LCF proof effort, and it is a very interactive process. One typically tries a tactic, examines the subgoals, backs up, tries another, and so on, until a successful structure of tactics is discovered. The proof effort has to be planned carefully and the proof well understood in advance. Lemmas have to be anticipated and supplied in a useful form. The difficulty is partly because HOL (and LCF) are really just frameworks for performing proofs; the only proof tool which is in any sense automatic is rewriting (either forward or goal-oriented) and the mechanism which justifies it. Any proof tool or strategy one thinks of can be expressed in the programming language ML and then applied, but designing these is a research problem. One hopes that large proof efforts such as the present one will suggest new possible proof tools. The proofs are difficult also because of the sheer size of the theorems (for example, the versions of Theorem 13 before it is fully simplified are several pages long when printed in the style shown). A certain amount of the proof effort was directed toward structuring theorems into smaller ones, and defining abbreviations to control the expansion of theorems (such as ALU_ABBR6 and ALU_ABBR2).

The proofs are also large in terms of computation time; rewriting in particular can be costly in time, and can be avoided in various ways by investing more of the user's time in doing explicit local replacements or other ad hoc methods. Large theorems are even time-consuming to fetch from files or to pretty-print. This problem can probably be attacked both by increased computing resources and by research into efficient proof strategies. Computation time can also be reduced by identifying large inference steps which are computed more than once and proving them as lemmas. For example, in computing the end results of the host machine sequences, certain conditional subexpressions recur for choosing the next node. The reductions of these are proved once as lemmas rather than being computed several times in the course of the proof. This is also often more convenient than doing a proof within a larger proof. To give a very rough idea of the magnitude of the proof, the following tables give the number of primitive inference steps and the CPU time in seconds for the main theorems of the proof. (Garbage collection time has been ignored.) The remarks in Section 5.1 about HOL proofs all apply here; in particular, the apparent sizes and times of proofs can be greatly (and rather misleadingly) inflated by reliance on powerful, general rewriting strategies where more specific and local strategies could be found. Theorem 1, for example, has a large proof which is generated by a strategy which simply rewrites according to previously proved theorems; Theorem 14 has a relatively small proof which is generated by laborious instantiations and carefully specified substitutions. We mention this so that these figures are not taken too seriously. The last six theorems in the first table are referred to but not shown in this paper. "Unfolded NEXT" is the fully unfolded definition of NEXT with no lets; likewise for ALU. "Covering" refers to the theorem stating that at each node, the conditions for choosing the next node cover all possibilities. "Numbers" refers collectively to all the properties of numbers involved in reasoning about the numbers of steps in paths. "Tuples" refers collectively to the properties of n-tuples required to transform theorems phrased in terms of the fourteen components into theorems phrased in terms of a state, transient and node.

Theorems used for All 24 Paths

  Section   Theorem               Steps   CPU Seconds
  3.1       Theorem 4               215             3
  5.1       Theorem 7             7,664           230
  5.2.1     Theorem 8             8,474           315
  5.2.1     Theorem 9             7,649           280
  5.2.1     Theorem 10           19,587           456
  5.2.1     Theorem 11           24,428           649
  5.2.2     Theorem 12            2,195            64
            Similar for DUMMY     2,259            66
            Similar for STOP      2,179            57
            Similar for FETCH    10,291           232
            Similar for PERFORM   8,019           213
  5.5       Unfolded NEXT         7,486           292
            Unfolded ALU          4,492            82
            Total               104,938         2,939

Theorems Just for the Literal Call Path

  Section   Theorem               Steps   CPU Seconds
  3.1       Theorem 5               233            43
  4.2       Theorem 6             1,836            35
  5.3       Theorem 15            2,545            81
  5.2.3     Theorem 13           14,037           312
  5.3       Theorem 14            8,626           237
  1.3       Theorem 2             8,586           213
  1.3       Theorem 3             8,620           230
            Total                44,483         1,151

Theorems tying the 24 Cases Together

  Section   Theorem               Steps   CPU Seconds
  1.3       Covering             19,772           364
  1.3       Numbers              20,173           222
  5.2.3     Tuples                5,116           120
  1.3       Theorem 1            35,554           384
            Total                80,615         1,090


9 Conclusions

As the reader can tell, the machine-checked proof described in this paper was a very large and somewhat tedious project. Proofs of this sort, while perfectly feasible, require experts in theorem-proving rather than electrical engineers, and such experts are at present scarce. It is the experts' time rather than computation time which makes verification expensive.¹ At present, formal verification is only appropriate in selected applications where the cost can be justified. One clear conclusion of this work is that very much more basic research must be done before formal verification becomes practical and commonplace in real-life applications. Work to date shows that verification of real examples is still too time-consuming and tedious, and requires too much user guidance, for routine use. We believe that HOL is an excellent framework in which to research these problems: to design more automatic proof tactics, better ways to abstract and describe proofs, and more efficient and flexible rewriting strategies. Experimental proofs of real hardware like the Viper chip suggest many research areas which must be addressed before larger applications can be attempted. A second conclusion is that for theorem-proving experts to undertake hardware proofs, documentation is vital; the experts will not be knowledgeable in engineering matters, and informal explanations of the function of the systems will save a lot of time otherwise spent working out connections which are perhaps obvious to the engineer. Viper is to be manufactured and sold by two companies, for both civil and military applications. As we have shown, the HOL methodology can detect design errors at a certain level of abstraction from the actual hardware. We feel it necessary to warn against a false sense of security afforded by an HOL proof; there are many classes of errors not even visible in the models used. In particular, we would caution against a false sense of security in such hazardous applications as nuclear reactor control systems. The art of formal verification is in its early days. We believe that years of basic research are needed before the techniques can be reliably applied to life-critical systems; even then they can only be applied with an understanding of their limitations. Further, a proof that a design meets a specification is only as good as the specification, which, for a complex system, can be very difficult to produce. A perfectly correct proof may relate to a specification which misses some essential behaviour of the design, and so may not give much security. For an example, see [12] regarding the problems of specifying a simple computer design. Finally, for verification to be effective, we must first move to a situation in which the same sources are used by designers, verifiers and manufacturers. For example, as mentioned in Section 1, the errors we found in Viper's specification and host machine are apparently not present in the actual chip; hence the manufacturers cannot have used the specification which we have started to verify. Uniting these various communities is the aim of the research at RSRE, but there is a long way to go.

¹The first level of the Viper proof, for example, takes several hours of CPU time to run, and that is using an inefficient prototype system. It took six months, however, to organize the proof and carry it out. We expect that subsequent lower-level stages will be much more complex.

Acknowledgements Theorems 1, 2 and 3, which tie together the body of the proof (the twenty-four cases), and the basic lemmas about numbers and tuples were generated in HOL by Mike Gordon. The author is grateful to Elsa Gunter for assistance with typesetting and suggestions about the paper. The work described here was supported by a grant from RSRE, Research Agreement 2029/205(RSRE), and has been placed under Crown Copyright © HMSO London 1987. This paper has also appeared under the same title as a University of Cambridge Computer Laboratory Technical Report, No. 104, 1987.


References

[1] A. Church, "A Formulation of the Simple Theory of Types", Journal of Symbolic Logic 5, 1940.

[2] A. Cohn and M. Gordon, "A Mechanized Proof of Correctness of a Simple Counter", University of Cambridge, Computer Laboratory, Tech. Report No. 94, 1986.

[3] W. J. Cullyer and C. H. Pygott, "Hardware Proofs using LCF_LSM and ELLA", RSRE Memo. 3832, Sept. 1985.

[4] W. J. Cullyer, "Viper Microprocessor: Formal Specification", RSRE Report 85013, Oct. 1985.

[5] W. J. Cullyer, "Viper - Correspondence between the Specification and the 'Major State Machine'", RSRE Report No. 86004, Jan. 1986.

[6] W. J. Cullyer, "Implementing Safety-Critical Systems: The Viper Microprocessor", in: VLSI Specification, Verification and Synthesis, edited by G. Birtwistle and P. A. Subrahmanyam (this volume).

[7] M. Gordon, R. Milner and C. P. Wadsworth, "Edinburgh LCF", Lecture Notes in Computer Science, Springer-Verlag, 1979.

[8] M. Gordon, "Proving a Computer Correct", University of Cambridge, Computer Laboratory, Tech. Report No. 42, 1983.

[9] M. Gordon, "HOL: A Machine Oriented Formulation of Higher-Order Logic", University of Cambridge, Computer Laboratory, Tech. Report No. 68, 1985.

[10] M. Gordon, "HOL: A Proof Generating System for Higher-Order Logic", in: VLSI Specification, Verification and Synthesis, edited by G. Birtwistle and P. A. Subrahmanyam (this volume); also: University of Cambridge, Computer Laboratory, Tech. Report No. 103, 1987.

[11] W. A. Hunt Jr., "FM8501: A Verified Microprocessor", University of Texas, Austin, Tech. Report 47, 1985.

[12] J. J. Joyce, "Verification and Implementation of a Microprocessor", in: VLSI Specification, Verification and Synthesis, edited by G. Birtwistle and P. A. Subrahmanyam (this volume).

[13] J. Kershaw, "Viper: A Microprocessor for Safety-Critical Applications", RSRE Memo. No. 3754, Dec. 1985.

[14] L. Paulson, "A Higher-Order Implementation of Rewriting", Science of Computer Programming 3, 119-149, 1983.

[15] L. Paulson, "Interactive Theorem Proving with Cambridge LCF", Cambridge University Press, to appear 1987.

3 HOL: A Proof Generating System for Higher-Order Logic

Michael J.C. Gordon
University of Cambridge Computer Laboratory
Corn Exchange Street
Cambridge, CB2 3QG
England

Abstract: HOL is a version of Robin Milner's LCF theorem-proving system for higher-order logic. It is currently being used to investigate (1) how various levels of hardware behaviour can be rigorously modelled and (2) how the resulting behavioural representations can be the basis for verification by mechanized formal proof. This paper starts with a tutorial introduction to the meta-language ML. The version of higher-order logic implemented in the HOL system is then described. This is followed by an introduction to goal-directed proof with tactics and tacticals. Finally, there is a little example of the system in action which illustrates how HOL can be used for hardware verification.

1 Introduction to HOL

Higher-order logic is a promising language for specifying all aspects of hardware behaviour [8], [1]. It was originally developed as a foundation for mathematics [2]. Its use for hardware specification and verification was first advocated by Keith Hanna [9]. The approach to mechanising logic described in this paper is due to Robin Milner [6]. He originally developed the approach for a system called LCF, designed for reasoning about higher-order recursively defined functions. The HOL system is a direct descendant of this work. The original LCF system was implemented at Edinburgh and is called "Edinburgh LCF". The code for it was ported from Stanford Lisp to Franz Lisp by Gerard Huet at INRIA, and formed the basis for a French research project called "Formel". Huet's Franz Lisp version of LCF was further developed at Cambridge by Larry Paulson and eventually became known as "Cambridge LCF" [18]. The HOL system is implemented on top of Cambridge LCF and consequently many (good and bad) features of LCF are found in it. In particular, the axiomatization of higher-order logic used is not the classical one due to Church, but an equivalent formulation strongly influenced by LCF.


To make this paper self-contained, a brief introduction to ML (the LCF metalanguage) is included. The version of ML described here (and included as part of the HOL system) is not Standard ML [16] but the version of ML documented in the ML Handbook [4]. The acronym "HOL" refers to both the computer system and the version of higher-order logic that it supports; the former will be called the "HOL system" and the latter the "HOL logic". Because this paper is about the HOL system, the machine-readable syntax for the logic will be presented. For example, instead of the conventional logical symbols ∧, ∨, ¬, ∀, ∃ and λ, we use the following strings of ASCII characters: /\, \/, ~, !, ? and \ respectively. In various other papers on HOL (e.g. [8], [1]) conventional notation, rather than the ASCII notation, is employed. The two notations are, of course, equivalent.

2 Introduction to ML

This section is a very brief introduction to the metalanguage ML. The aim is just to give a feel for what it is like to interact with ML, and to introduce some features of the HOL system (e.g. the representation in ML of the types and terms of the HOL logic¹). ML is an interactive programming language like Lisp. At top level one can evaluate expressions and perform declarations. The former results in the expression's value and type being printed, the latter in a value being bound to a name. The boxes below contain a little session with the system. The interactions in these boxes should be understood as occurring in sequence. For example, variable bindings made in earlier boxes are assumed to persist to later ones. To enter the HOL system one types hol to Unix²; the HOL system then prints a sign-on message and puts one into ML. The ML prompt is #, so lines beginning with # are typed by the user and other lines are the system's response.

¹Types and terms are explained in detail in Section 3.2.
²The Unix prompt is %.


% hol

HIGHER ORDER LOGIC (Built on Sept 31)

#1. [2;3;4;5];;
[1; 2; 3; 4; 5] : int list

The ML expression 1. [2;3;4;5] has the form e1 op e2, where e1 is the expression 1 (whose value is the integer 1), e2 is the expression [2;3;4;5] (whose value is a list of four integers) and op is the infixed operator '.' which is like Lisp's cons function. Other list processing functions include hd (car in Lisp), tl (cdr in Lisp) and null (null in Lisp). The double semicolon ';;' terminates a top-level phrase. The system's response is shown on the line not starting with a prompt. It consists of the value of the expression followed, after a colon, by its type. The ML typechecker infers the type of expressions using methods invented by Robin Milner [15]. The type int list is the type of 'lists of integers'; list is a unary type operator. The type system of ML is very similar to the type system of the HOL logic, which is explained in Section 3.1. The value of the last expression evaluated at top level in ML is always remembered in a variable called it.

1

=

2

it;;

[1; 2; 3; 4; 5]

int list

t t l 1;;

[2; 3; 4; 5]

int list

#hd it;;

2 : int #tl(tl(t1(t1(t11»»;; [] : int list

Following standard λ-calculus usage, the application of a function f to an argument x can be written without brackets as f x (although the more conventional f(x) is also allowed). The expression f x1 x2 ... xn abbreviates the less intelligible expression (...((f x1) x2)...) xn (function application is left associative).


Declarations have the form let x1=e1 and ... and xn=en and result in the value of each expression ei being bound to the name xi.

#let l1 = [1;2;3] and l2 = ['a'; 'b'; 'c'];;
l1 = [1; 2; 3] : int list
l2 = ['a'; 'b'; 'c'] : string list

ML expressions like 'a', 'b', 'foo' etc. are strings and have type string. Any sequence of ASCII characters can be written between the quotes. The function words splits a single string into a list of single-character strings, using space as separator.

#words 'a b c';;
['a'; 'b'; 'c'] : string list

An expression of the form (e1,e2) evaluates to a pair of the values of e1 and e2. If e1 has type σ1 and e2 has type σ2 then (e1,e2) has type σ1#σ2. A tuple (e1,...,en) is equivalent to (e1,(e2,...,en)) (i.e. ',' is right associative). The brackets around pairs and tuples are optional; the system doesn't print them. The first and second components of a pair can be extracted with the ML functions fst and snd respectively.

#(1,true,'abc');;
1,true,'abc' : (int # bool # string)

#snd it;;
true,'abc' : (bool # string)

#fst it;;
true : bool

The ML expressions true and false denote the two truthvalues of type bool. ML types can contain the type variables *, **, ***, etc. Such types are called polymorphic. A function with a polymorphic type should be thought of as possessing all the types obtainable by replacing type variables by types. This is illustrated below with the function zip. Functions are defined with declarations of the form let f v1 ... vn = e, where each vi is either a variable or a pattern built out of variables³.

³The ML Handbook [4] gives exact details.

The function zip, below, converts a pair of lists ([x1;...;xn],[y1;...;yn]) to a list of pairs [(x1,y1);...;(xn,yn)].

#letrec zip(l1,l2) =
# if null l1 or null l2
#  then []
#  else (hd l1,hd l2).zip(tl l1,tl l2);;
zip = - : ((* list # ** list) -> (* # **) list)

#zip([1;2;3],['a';'b';'c']);;
[1,'a'; 2,'b'; 3,'c'] : (int # string) list

Functions may be curried, i.e. take their arguments 'one at a time' instead of as a tuple. This is illustrated with the function curried_zip below:

#let curried_zip l1 l2 = zip(l1,l2);;
curried_zip = - : (* list -> ** list -> (* # **) list)

#let zip_num = curried_zip [0;1;2;3;4;5;6;7;8;9];;
zip_num = - : (* list -> (int # *) list)

#zip_num ['a';'b';'c'];;
[0,'a'; 1,'b'; 2,'c'] : (int # string) list

Curried functions are useful because they can be 'partially applied', as illustrated above by the partial application of curried_zip to [0;1;2;3;4;5;6;7;8;9], which results in the function zip_num. The evaluation of an expression either succeeds or fails. In the former case, the evaluation returns a value; in the latter case the evaluation is aborted and a failure string (usually the name of the function that caused the failure) is passed to whatever invoked the evaluation. This context can either propagate the failure (this is the default) or it can trap it. These two possibilities are illustrated below. A failure trap is an expression of the form e1 ? e2. An expression of this form is evaluated by first evaluating e1. If the evaluation succeeds (i.e. doesn't fail) then the value of e1 ? e2 is the value of e1. If the evaluation of e1 fails, then the value of e1 ? e2 is obtained by evaluating e2.


#hd[];;
evaluation failed     hd

#hd[] ? 'hd applied to empty list';;
'hd applied to empty list' : string
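For readers more familiar with exception-based languages, the failure trap can be pictured as a handler. The Python fragment below is only an analogy, not HOL code: hd and trap are invented names, and Python exceptions stand in for ML's failure strings.

```python
# ML's failure trap  e1 ? e2  evaluates e1 and falls back to e2 if e1
# fails.  Here hd mimics ML's hd, failing on the empty list, and trap
# takes both alternatives as thunks so e2 is evaluated only on failure.

def hd(l):
    if not l:
        raise RuntimeError("hd")      # roughly, the failure string
    return l[0]

def trap(e1, e2):
    """Evaluate thunk e1; on failure, evaluate thunk e2  (e1 ? e2)."""
    try:
        return e1()
    except RuntimeError:
        return e2()
```

For example, trap(lambda: hd([]), lambda: 'hd applied to empty list') takes the second alternative, just as the session above does.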

Terms of the HOL logic are represented in ML by a type called term. For example, the expression "x /\ y ==> z" evaluates in ML to a term representing x∧y⊃z. Anything between a pair of quotes is parsed as a logical term. The quotation mechanism is described in Section 3.3. Terms can be manipulated by various built-in ML functions. For example, the ML function

dest_imp : term -> term # term

splits an implication into a pair of terms consisting of the antecedent and consequent, and the ML function

dest_conj : term -> term # term

splits a conjunction into its conjuncts.

#"x /\ y ==> z";;
"x /\ y ==> z" : term

#dest_imp it;;
"x /\ y","z" : (term # term)

#dest_conj(fst it);;
"x","y" : (term # term)

Terms of the HOL logic are quite similar to ML expressions and this can at first be confusing. Indeed, terms of the logic have types similar to ML expressions. For example, "(1,2)" is an ML expression with ML type term. The HOL type of this term is num#num. By contrast, the ML expression ("1", "2") has type term#term.

The types of HOL terms form an ML type called type. Expressions of the form ": ... " evaluate to logical types. The built-in function type_of has ML type term->type and returns the logical type of a term.


#"(1,2)";;
"1,2" : term

#type_of it;;
":num # num" : type

#("1","2");;
"1","2" : (term # term)

#type_of(fst it);;
":num" : type

To try to minimise confusion between the logical types of HOL terms and the ML types of ML expressions, we will call the former object language types and the latter meta-language types. For example, "(1,T)" is an ML expression that has meta-language type term and evaluates to a term with object language type ":num # bool".

#"(1,T)";;
"1,T" : term

#type_of it;;
":num # bool" : type

HOL terms can be input using explicit quotation, as above, or they can be constructed using ML constructor functions. The function mk_var constructs a variable from a string and a type. In the example below, three variables of type bool are constructed. These are used later.

#let x = mk_var('x',":bool")
#and y = mk_var('y',":bool")
#and z = mk_var('z',":bool");;
x = "x" : term
y = "y" : term
z = "z" : term

The constructors mk_conj and mk_imp construct conjunctions and implications respectively.


#let t = mk_imp(mk_conj(x,y),z);;
t = "x /\ y ==> z" : term

Theorems are represented in HOL by values of type thm. The only way to create theorems is by proof. For a logician, a formal proof is a sequence each of whose elements is either an axiom or follows from earlier members of the sequence by a rule of inference. In HOL (following LCF) a proof is a sequence in just that sense. There are five axioms of the HOL logic and eight primitive inference rules. The axioms are bound to ML names. For example, the Law of Excluded Middle is bound to the ML name BOOL_CASES_AX:

#BOOL_CASES_AX;;
|- !t. (t = T) \/ (t = F)

Theorems are printed with a preceding turnstile |- as illustrated above; the symbol "!" is the universal quantifier "∀". Rules of inference are ML functions that return values of type thm. For example, the rule of specialization (or ∀-elimination) is an ML function, SPEC, which takes as arguments a term "a" and a theorem |- !x.t[x] and returns the theorem |- t[a], the result of substituting a for x in t[x].

#SPEC;;
- : (term -> thm -> thm)
#SPEC "1=2" BOOL_CASES_AX;;
|- ((1 = 2) = T) \/ ((1 = 2) = F)

A proof in the HOL system is constructed by repeatedly applying inference rules to axioms or to previously proved theorems. Since proofs may consist of millions of steps, it is necessary to provide tools to make proof construction easier for the user. The proof generating tools in the HOL system are just those of LCF, and are described in Section 7.1. Theorems have assumptions and a conclusion. The inference rule ASSUME generates theorems of the form t |- t. The ML printer prints each assumption as a dot. The function dest_thm decomposes a theorem into a pair consisting of a list of assumptions and the conclusion. The ML type goal abbreviates (term)list # term; this is motivated in Section 7.1.


#let th1 = ASSUME "t1==>t2";;
th1 = . |- t1 ==> t2
#dest_thm th1;;
["t1 ==> t2"], "t1 ==> t2" : goal

The inference rule UNDISCH moves the antecedent of an implication to the assumptions.

#let th2 = UNDISCH th1;;
th2 = .. |- t2
#dest_thm th2;;
["t1 ==> t2"; "t1"], "t2" : goal

The rule DISCH takes a term and a theorem and 'discharges' the given term from the assumptions of the theorem. The functions hyp and concl select the list of hypotheses and the conclusion of a theorem, respectively. They just return the components of the pair computed by dest_thm.

#DISCH "t1==>t2" (DISCH "t1" th2);;
|- (t1 ==> t2) ==> t1 ==> t2
#let th3 = DISCH "t1==>t2" (DISCH "t1" th2);;
th3 = |- (t1 ==> t2) ==> t1 ==> t2
#hyp th3;;
[] : term list
#concl th3;;
"(t1 ==> t2) ==> t1 ==> t2" : term

In the sections that follow we systematically describe HOL. We then conclude with another session illustrating the various theorem-proving tools in action.

3 Syntax of the HOL logic

In this section, the structure of HOL terms and types is explained. The particular sets of types, constants and axioms that are available to a user of the HOL system are specified by the theory he or she is working in. Each theory T specifies sets TyopsT and ConstsT of type operators and constants. These sets then determine the sets TypesT and TermsT of types and terms. How to set up theories is described in Section 5. We start by giving fairly abstract definitions of the sets TypesT and TermsT. Various constructor and destructor functions are then described. Finally, we explain the quotation mechanism built into the ML parser. This mechanism includes a type inference algorithm that enables the user to input terms without having to write an excessive amount of explicit type information. In what follows, a name is a sequence of letters, digits or primes (') beginning with a letter. The set of names is denoted by Names. For example, x, fred23, fred23' and fred23'' are all members of Names.

3.1 HOL Types

Object language types are expressions that denote sets. Following tradition, we use σ, σ', σ'', ..., σ1, σ2, etc. to stand for arbitrary types. HOL terms will be defined in such a way that all well-formed terms are type-consistent. Before describing the terms of the HOL logic we must first specify its types. There are four kinds of object language types. These can be described informally as follows.

1. Type variables: These are sequences of asterisks, optionally followed by sequences of digits. The set of type variables is denoted by Tyvars; for example, *, **, ***, *1, ***24 are all members of Tyvars. Type variables provide the system with a limited amount of polymorphism (this is explained later). Small Greek letters, possibly with subscripts or primes, are used to stand for type variables.

2. Type constants: These are names like bool, num and tok. They denote fixed sets of values.

3. Function types: If σ1 and σ2 are types, then σ1->σ2 is the function type with domain σ1 and range σ2; it denotes the set of functions from the set denoted by its domain to the set denoted by its range.

4. Compound types: These have the form (σ1,...,σn)op, where the types σ1, ..., σn are the argument types and op is a type operator of arity n. Type operators must be names; they denote operations for constructing sets. The type (σ1,...,σn)op denotes the set resulting from applying the operation denoted by op to the sets denoted by σ1, ..., σn. For example, list is a type operator with arity 1. It denotes the operation of forming all finite lists of elements from a given set. Another example is the type operator prod of arity 2 which denotes the cartesian product operation.

Although these four kinds of types are logically distinct, the HOL system (following LCF) represents constant types as compound types built with 0-ary type operators and function types as compound types built with a 2-ary type operator called fun. Thus the constant type num is represented as the compound type ()num and σ1->σ2 is represented as (σ1,σ2)fun. In general, compound types must be written in the postfixed form described above, but there are two exceptions (in addition to ->).

• Cartesian product types of the form (σ1,σ2)prod can be written as σ1#σ2.

• Disjoint union types of the form (σ1,σ2)sum can be written as σ1+σ2.

Products and unions are not primitive, but because they are so useful, the HOL parser treats them specially. These types are explained in Section 5.5 and Section 5.6 respectively. It is useful to describe the set of types of a theory a bit more formally. Each theory T determines a set TyopsT of type operators. A type operator is a pair (op, n) where op is the name of the operator and n is its arity. The set TypesT of types of the theory T is the smallest set such that:

• Tyvars ⊆ TypesT.

• If σ1 ∈ TypesT and σ2 ∈ TypesT then σ1->σ2 ∈ TypesT.

• If σi ∈ TypesT (for all i between 1 and n) and (op, n) ∈ TyopsT then (σ1,...,σn)op ∈ TypesT.

Types containing type variables are called polymorphic; others are monomorphic. An instance 11' of a type IT is obtained by replacing all occurrences of a type variable in IT by a type. The only instance of a monomorphic type is itself. Note that Typesr is closed under the operation of taking instances.

3.2 HOL Terms

A theory T specifies a set ConstsT of constants. Each constant is a pair (c, σ) where c is a name and σ is a type of T (i.e. σ ∈ TypesT); σ is called the generic type of c. Distinct constants of a theory cannot have the same name (i.e. if (c, σ) and (c, σ') are both members of ConstsT then σ = σ'). The set TermsT of terms of the theory T is the smallest set such that:

1. If x is a name which is not the name of a constant in T (i.e. ConstsT does not contain a pair whose first component is x) and σ ∈ TypesT, then (x, σ) ∈ TermsT. Terms formed in this way are called variables.

2. If (c, σ) ∈ ConstsT and σ' ∈ TypesT is an instance of σ, then (c, σ') ∈ TermsT. Terms formed in this way are called constants.

3. If (t, σ'->σ) ∈ TermsT and (t', σ') ∈ TermsT then (comb, t, t', σ) ∈ TermsT. Terms formed in this way are called combinations or function applications.


4. If (x, σ) ∈ TermsT (where x is a name that is not the name of a constant in T) and (t, σ') ∈ TermsT and σ->σ' ∈ TypesT then (abs, x, t, σ->σ') ∈ TermsT. Terms formed in this way are called abstractions or λ-terms (\ is HOL's ASCII approximation to λ).

Members of the sets TypesT and TermsT correspond closely to the internal representations of ML values of type type and term in the HOL system. The following constructor functions (listed below with their ML types) can be used to input types and terms; they are explained in Section 3.3 below.

mk_vartype : string -> type
mk_type    : (string # type list) -> type
mk_var     : (string # type) -> term
mk_const   : (string # type) -> term
mk_comb    : (term # term) -> term
mk_abs     : (term # term) -> term

The HOL logic consists of the built-in theories bool and ind (see Sections 5.3 and 5.7) together with eight rules of inference. In the theory bool the type bool is introduced (we often name theories after the type which they introduce). Terms of type bool are called formulas. There are two constant formulas: T and F; these represent true and false respectively. It follows from the Law of Excluded Middle (an axiom of HOL) that any formula is either equivalent to T or to F.
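The abstract sets TypesT and TermsT above can be mirrored in a few lines of code. The following Python fragment is a hypothetical sketch (not the HOL system's own ML, and covering only variables and combinations for brevity); it shows how a checking mk_comb keeps every constructed term type-consistent:

```python
# Hypothetical sketch: types are type variables or type-operator
# applications; terms carry their types, and mk_comb checks them.
from dataclasses import dataclass

@dataclass(frozen=True)
class Tyvar:
    name: str                      # type variables: "*", "**", ...

@dataclass(frozen=True)
class Tyapp:
    op: str                        # constant types are 0-ary Tyapps;
    args: tuple = ()               # function types are 2-ary "fun" ones

BOOL = Tyapp("bool")
def fun_ty(dom, rng): return Tyapp("fun", (dom, rng))

@dataclass(frozen=True)
class Var:
    name: str
    ty: object

@dataclass(frozen=True)
class Comb:                        # combination (function application)
    rator: object
    rand: object

def type_of(tm):
    if isinstance(tm, Var):
        return tm.ty
    if isinstance(tm, Comb):
        return type_of(tm.rator).args[1]   # range of the operator's type
    raise ValueError("unknown term")

def mk_comb(f, x):
    """Build a combination only when the operand's type equals the
    operator's domain, so every term built is type-consistent."""
    fty = type_of(f)
    if isinstance(fty, Tyapp) and fty.op == "fun" and fty.args[0] == type_of(x):
        return Comb(f, x)
    raise TypeError("mk_comb: types do not agree")
```

For example, applying a variable of type bool->bool to one of type bool succeeds, while swapping the two raises an error, just as the real mk_comb rejects ill-typed combinations.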

3.3 Quotation

It would be extremely tedious to always have to input types and terms using the constructor functions. The HOL system has a special parser and type-checker that enable them to be input using a fairly standard syntax. The HOL printer also outputs types and terms using this syntax. For example, the ML expression ":bool->bool" denotes exactly the same value (of ML type type) as

mk_type('fun',[mk_type('bool',[]);mk_type('bool',[])])

and "\x.x+1" can be used instead of


mk_abs
 (mk_var('x', mk_type('num',[])),
  mk_comb
   (mk_comb
     (mk_const('+',
        mk_type('fun',
          [mk_type('num',[]);
           mk_type('fun',[mk_type('num',[]); mk_type('num',[])])])),
      mk_var('x', mk_type('num',[]))),
    mk_const('1', mk_type('num',[]))))

Notice that there is no explicit type information in "\x.x+1". The HOL type-checker knows that (1, num) ∈ Constsnum and (+, num->(num->num)) ∈ Constsnum, i.e. that the constants 1 and + have type num and num->(num->num), respectively. From the types of 1 and + the type-checker infers that both occurrences of x in "\x.x+1" could have type num. This is not the only type assignment possible; one could, for example, make the first occurrence of x have type bool and the second one have type num. In that case there would be two different variables with name x, namely (x, bool) and (x, num), the second of which is free. In fact, in HOL, the only way to construct a term with this second type assignment would be by using constructors, since the type-checker uses the heuristic that variables with the same name have the same type. The type-checker was designed by Robin Milner. It uses heuristics like the one above to infer a sensible type for all variables occurring in a term. It uses the types of any constants in making this inference. If there are not enough clues, then the system will complain with the message

evaluation failed     types indeterminate in quotation

To give the system a hint, one can explicitly indicate types by following any subterm by a colon and then a type. For example, "f(x:num):bool" will type-check with f and x getting types num->bool and num respectively. If there are polymorphic constants in a term, there must be enough type information to uniquely identify a type instance for each such constant. The type-checking algorithm used for the HOL logic differs from that used for ML. For example, the ML expression \x.x will get ML type *->*, but the HOL term "\x.x" will fail to type-check and will result in the message shown above. To get this term to type-check one must indicate explicitly the type of the variable x by writing, for example, "\x:*.x". This treatment of types is inherited from LCF. The table below shows ML expressions for various kinds of type quotations. The expressions in adjacent columns are equivalent.


Types

Kind of type     ML quotation        Constructor expression
Type variable    ":*..."             mk_vartype('*...')
Type constant    ":op"               mk_type('op',[])
Function type    ":σ1->σ2"           mk_type('fun',[":σ1";":σ2"])
Compound type    ":(σ1,...,σn)op"    mk_type('op',[":σ1"; ... ;":σn"])

Equivalent ways of inputting the four primitive kinds of term are shown in the next table.

Primitive terms

Kind of term    ML quotation    Constructor expression
Variable        "var:σ"         mk_var('var', ":σ")
Constant        "const:σ"       mk_const('const', ":σ")
Combination     "t1 t2"         mk_comb("t1", "t2")
Abstraction     "\x.t"          mk_abs("x", "t")
The HOL quotation mechanism can translate various standard logical notations into primitive terms. For example, if + has been declared an infix (as explained in Section 5), then "x+1" is translated to "$+ x 1". The escape character $ suppresses the infix behaviour of + and prevents the quotation parser getting confused. In general, $ can be used to suppress any special syntactic behaviour a constant name might have; this is illustrated in the table below, in which the terms in the column headed "ML quotation" are translated by the quotation parser to the corresponding terms in the column headed "Primitive term". Conversely, the terms in the latter column are always printed in the form shown in the former one. The ML constructor expressions in the rightmost column evaluate to the same values (of type term) as the other quotations in the same row.


Non-primitive terms

Kind of term        ML quotation        Primitive term      Constructor expression
Negation            "~t"                "$~ t"              mk_neg("t")
Disjunction         "t1\/t2"            "$\/ t1 t2"         mk_disj("t1","t2")
Conjunction         "t1/\t2"            "$/\ t1 t2"         mk_conj("t1","t2")
Implication         "t1==>t2"           "$==> t1 t2"        mk_imp("t1","t2")
Equality            "t1=t2"             "$= t1 t2"          mk_eq("t1","t2")
∀-quantification    "!x.t"              "$!(\x.t)"          mk_forall("x","t")
∃-quantification    "?x.t"              "$?(\x.t)"          mk_exists("x","t")
ε-term              "@x.t"              "$@(\x.t)"          mk_select("x","t")
Conditional         "(t->t1|t2)"        "COND t t1 t2"      mk_cond("t","t1","t2")
let-expression      "let x=t1 in t2"    "LET(\x.t2)t1"      mk_let("x","t1","t2")
The constants COND and LET are explained in Section 5.3. The constants \/, /\, ==> and = are examples of infixes. If c is declared to be an infix, then the HOL parser will translate "t1 c t2" to "$c t1 t2". The constants !, ? and @ are examples of binders (see Section 5.3 also). If c is declared to be a binder, then the HOL parser will translate "c x.t" to the combination "$c(\x.t)". In addition to the kinds of terms in the tables above, the parser also supports the following syntactic abbreviations.

Syntactic abbreviations

Abbreviated term    Meaning              Constructor expression
"t t1 ... tn"       "(...(t t1)...tn)"   list_mk_comb("t",["t1"; ... ;"tn"])
"\x1 ... xn.t"      "\x1. ... \xn.t"     list_mk_abs(["x1"; ... ;"xn"],"t")
"!x1 ... xn.t"      "!x1. ... !xn.t"     list_mk_forall(["x1"; ... ;"xn"],"t")
"?x1 ... xn.t"      "?x1. ... ?xn.t"     list_mk_exists(["x1"; ... ;"xn"],"t")

The parser will also convert "\(x1,x2).t" to "UNCURRY(\x1 x2.t)" (the constant UNCURRY is described in Section 5.5). This transformation is done recursively so that, for example, "\(x1,x2,x3).t" is converted to "UNCURRY(\x1. UNCURRY(\x2 x3.t))". More generally, if b is a binder, then "b(x1,x2).t" is parsed as "$b(\(x1,x2).t)". For example, "!(x,y).x>y" parses to "$!(UNCURRY(\x.\y.$> x y))" (> is an infixed constant of the theory num meaning "is greater than").
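The iterated-application and iterated-abstraction abbreviations in the table above are just folds over the primitive constructors. A hypothetical Python sketch (using tagged tuples as a stand-in term representation, not the real one):

```python
# Hypothetical sketch of the list_mk_... abbreviation-builders as folds
# over a tiny untyped term representation (tuples tagged by kind).
from functools import reduce

def mk_comb(f, a): return ("comb", f, a)
def mk_abs(v, b): return ("abs", v, b)

# "t t1 ... tn" abbreviates "(...(t t1)...) tn": a left fold.
def list_mk_comb(t, args):
    return reduce(mk_comb, args, t)

# "\x1 ... xn.t" abbreviates "\x1. ... \xn. t": a right fold.
def list_mk_abs(xs, t):
    for x in reversed(xs):
        t = mk_abs(x, t)
    return t
```

The left/right asymmetry mirrors the table: application nests to the left, abstraction to the right.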

3.4 Antiquotation

Within a quotation, expressions of the form ^(t) (where t is an ML expression of type term or type) are called antiquotations. An antiquotation ^(t) evaluates to the ML value of t. For example, "x \/ ^(mk_conj("y:bool","z:bool"))" evaluates to the same term as "x \/ (y /\ z)". The most common occurrence of antiquotation is when the term t is just an ML variable x; in this case ^(x) can be abbreviated by ^x. The following session illustrates antiquotation.

#let y = "x+1";;
y = "x + 1" : term
#let z = "y = ^y";;
z = "y = x + 1" : term
#"!x.?y. ^z";;
"!x. ?y. y = x + 1" : term

4 Sequents, Theorems and Proof

The HOL system supports proof by natural deduction. Assertions in the HOL logic are not individual formulas (i.e. boolean terms), but are sequents of the form (Γ, t), where Γ is a set of formulas called the assumptions and t a formula called the conclusion. A sequent (Γ, t) asserts that if all the formulas in Γ are true, then so is t. Using sequents, one can design a deductive system in which proofs are more 'natural' than in other (e.g. Hilbert-style) systems. Sequents are represented in ML by pairs with ML type (term list)#term; the first component represents the assumptions and the second component the conclusion. For example, (["x>y"; "y>z"], "x>z") is an ML expression representing the sequent with assumptions "x>y" and "y>z" and conclusion "x>z". A theorem is a sequent that has a proof. More precisely, a theorem is a sequent that is either an axiom, or follows from other theorems by a rule of inference. To guarantee that the only way to get theorems is by proof, the HOL system (following LCF) distinguishes sequents from theorems by having a separate ML type thm (in ML jargon, thm is an abstract type whose representing type is (term list)#term). There are five initial theorems corresponding to the five axioms of the HOL logic. The only way to generate other objects with ML type thm is by using one of the eight primitive inference rules of the logic (see Section 5.4). ML's type discipline ensures that only functions that represent inference rules are applicable to theorems [5].
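The security property just described — values of type thm can only be obtained by proof — can be sketched in miniature. The following hypothetical Python fragment (Python has no ML-style abstract types, so a private token crudely plays that role; formulas are modelled as nested tuples such as ("==>", t1, t2)) shows three rules as the sole ways of building a theorem:

```python
# Hypothetical miniature of the LCF idea: the inference-rule functions
# are the only way to construct a Thm, so every Thm is proved.
class Thm:
    def __init__(self, assumptions, conclusion, _token=None):
        if _token is not _KERNEL:
            raise ValueError("theorems can only be made by the rules")
        self.assumptions = frozenset(assumptions)
        self.conclusion = conclusion

_KERNEL = object()        # private capability held by the rules below

def ASSUME(t):
    """t |- t"""
    return Thm({t}, t, _token=_KERNEL)

def DISCH(t1, th):
    """From G |- t2 infer G - {t1} |- t1 ==> t2."""
    return Thm(th.assumptions - {t1}, ("==>", t1, th.conclusion),
               _token=_KERNEL)

def MP(th1, th2):
    """Modus ponens: from |- t1 ==> t2 and |- t1 infer |- t2."""
    tag, ante, conc = th1.conclusion
    if tag != "==>" or ante != th2.conclusion:
        raise ValueError("MP: conclusion does not match antecedent")
    return Thm(th1.assumptions | th2.assumptions, conc, _token=_KERNEL)

# A two-step forward proof of |- t1 ==> t1, in the style of the session above.
th = DISCH("t1", ASSUME("t1"))
```

Any code can inspect a theorem's assumptions and conclusion, but only the rules can create one — which is exactly the discipline ML's abstract types enforce for the real thm.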


Interesting theorems require large numbers of applications of the primitive inference rules for their proof. For example, a recent verification of the design of a simple microprocessor took over a million inference steps [3]. The HOL system provides tools to help the user generate such proofs. One such tool is a derived inference rule. This is an ML function which invokes sequences of applications of the eight primitive rules. There are about a hundred derived inference rules predefined in the system. A brief discussion of derived rules can be found in Section 6. Proofs can be generated in a goal-oriented style using tactics; these were invented (for LCF) by Robin Milner and are explained in Section 7.1. The goal-directed proof style starts with a formula (representing the goal one wants to prove) and then reduces this to sub-goals, sub-sub-goals etc. until the problem is decomposed into trivial goals (e.g. instances of axioms). This is in contrast to forward proof in which one works forward from the axioms, using the rules of inference, until the desired theorem is deduced.

4.1 Axioms and definitions

Each theory T has a set AxiomsT ⊆ TermsT of terms of type bool, called axioms. These axioms usually specify the meaning of the constants of T; however, the HOL system enables the user to assert any formula as an axiom. A definition is an axiom of the form c = t, where c is a constant and t is a closed term (a term is closed if it contains no free variables). A constant c is defined in T if there is exactly one axiom in T containing c and that axiom is a definition. If all the axioms of T are definitions, then T is a definitional theory. Definitional theories have the important property that they cannot introduce any new inconsistency that wasn't already present; they are what logicians call conservative extensions. Axioms of the form f x1 ... xn = t, where the free variables of t are included among x1, ..., xn, are also regarded as definitions. This is because the formula f x1 ... xn = t is equivalent to the ordinary definition f = \x1 ... xn.t. Definitions are also allowed to have the form f(x1,...,xn) = t. Another kind of conservative extension is a type definition.

4.2 Type definitions

There does not seem to be any standard notion of type definition in the literature on higher-order logic. The mechanism described here is experimental; it was suggested to me by Mike Fourman and may change soon. Recall that a constant type denotes a set and an n-ary type operator denotes an operation for combining n sets to form a set. The intuitive idea behind type definitions in the HOL logic is as follows:

• A constant type is defined by specifying it to be in one-to-one correspondence with a non-empty subset of an existing type.


• An n-ary type operator op is defined by specifying (α1,...,αn)op to be in one-to-one correspondence with a non-empty subset of an existing type constructed from α1, ..., αn.

All types in the HOL logic must denote non-empty sets. This is because the term @x:σ.T denotes a member of the set denoted by σ (for arbitrary σ). The binder @ is explained in Section 5.3. The HOL system has a single mechanism for defining types. The idea behind this mechanism is to define an n-ary type operator op (constant types are defined by taking n = 0) by specifying the set denoted by (α1,...,αn)op for all denotations of the type variables α1, ..., αn. This set is specified by giving a (polymorphic) term denoting its characteristic function (the characteristic function of a set is a boolean-valued function that maps an element to true if and only if the element is a member of the set). This idea can perhaps be made clearer with a concrete example. Suppose we want to define a unary type operator iso such that the type denoted by ":(*)iso" is the subset of the set denoted by ":*->*" consisting of those functions which have an inverse. The characteristic function of the set of functions with inverses is denoted by the term "\f:*->*. ?g. !x. g(f x) = x". To ensure that only non-empty sets are used to define types, the HOL system requires the user to supply a theorem asserting non-emptiness when making a type definition. If t_op is the term representing the characteristic function defining op, then the user is required to supply a theorem of the form |- ?z. t_op z. In the case of our example, op is iso and t_iso is "\f:*->*. ?g. !x. g(f x) = x", so the required theorem is:

|- ?f. (\f:*->*. ?g. !x. g(f x) = x) f

The ML function

new_type_definition : (string # term # thm) -> thm

is used to make a new type definition. Suppose op is a name, t_op is a term whose object language type has the form σ->bool (where σ contains n distinct type variables α1, ..., αn) and th is the theorem |- ?z. t_op z. Then

new_type_definition(op, t_op, th)

declares op to be a new n-ary type operator such that (α1,...,αn)op denotes the set with characteristic function t_op. This set is a subset of the set denoted by σ. The theorem th ensures that this subset is non-empty for all values of the type variables α1, ..., αn. The type σ is called the representing type of op. The result of the type definition is an axiom asserting the existence of a one-to-one function from (α1,...,αn)op onto the subset of σ characterized by t_op.

Suppose we have proved |- ?f. (\f:*->*. ?g. !x. g(f x) = x) f and bound this theorem to the ML name th. Then the new type iso is defined by executing the ML expression

new_type_definition('iso', "\f:*->*. ?g. !x. g(f x) = x", th)

The HOL system then automatically declares a new 1-ary type operator iso, a new constant REP_iso (whose name is generated by prefixing the name of the type operator being defined with the string 'REP_'), and finally, HOL automatically generates the axiom

|- ONE_ONE REP_iso /\ (!f. (\f. ?g. !x. g(f x) = x)f = (?f'. f = REP_iso f'))

This axiom has the form

|- ONE_ONE REP_op /\ (!f. t_op f = (?f'. f = REP_op f'))

where the function ONE_ONE is defined by

|- ONE_ONE f = (!x1 x2. (f x1 = f x2) ==> (x1 = x2))

The constant ONE_ONE is introduced in the theory bool (see Section 5.3) and so is built into the HOL system.

5 Theories

The set of types, constants and axioms that are available in HOL depends on the theory in which one is working (see Section 3). Theories form a hierarchy in which a theory T1 can be a parent of another theory T2, meaning that the types, constants and axioms of T1 will be available in T2. If T1 is a parent of T2, then T2 is an immediate descendant of T1. T2 is a descendant of T1 if the two theories are in the transitive closure of the immediate descendant relation. In this case we also say that T1 is an ancestor of T2. When first entering the HOL system one enters a theory called HOL; this is a descendant of several theories including bool and theories of numbers, lists and pairs. In the subsections below, we describe the ML functions provided for manipulating theories and the various theories built into the HOL system. First we describe a bit more abstractly exactly what a theory is.

5.1 Abstract Definition of a Theory

A theory T is characterized by a 4-tuple (ParentsT, TyopsT, ConstsT, AxiomsT)

where: • ParentsT is a (finite) set of all the parent theories of T. • TyopsT is a set of type operators (see Section 4.2). • ConstsT is a set of constants (see Section 3.2). • AxiomsT is a set of theorems which are the axioms of T (see Section 4.1).


5.2 ML Functions for Manipulating Theories

In the HOL system, the four components of a theory are stored on disk in files with names of the form name.th, where name is the name given to the theory (with new_theory) when it is created. Various additional pieces of information are stored in the name.th files, including the parsing status of the constants (i.e. whether they are infixes or binders), which axioms are definitions (see Section 4.1) and the theorems that have been proved and saved by the user. The ML functions for creating theories are listed below.

new_theory            : string -> void
new_parent            : string -> void
new_type              : int -> string -> void
new_constant          : (string # type) -> void
new_infix             : (string # type) -> void
new_binder            : (string # type) -> void
new_axiom             : (string # term) -> thm
new_definition        : (string # term) -> thm
new_infix_definition  : (string # term) -> thm
new_binder_definition : (string # term) -> thm
new_type_definition   : (string # term # thm) -> thm

The effect of these functions should be reasonably clear from their names and types. The first argument of new_type is the arity of the type operator being declared; the second argument is its name. The arguments of type string to new_axiom, new_definition etc. are the names of the corresponding axioms and definitions. These names are used when accessing theories with the functions axiom, definition, etc., described below. The various functions for setting up theories are illustrated in the example session in Section 8. A theory with no descendants can be extended by adding new parents, types, constants, axioms and definitions. Theories that are already the parents of other theories cannot be extended because it would be messy (though not impossible) to implement the necessary checks to ensure that added types, constants etc. did not invalidate declarations in the descendant theories. When one is creating a new theory or extending an existing theory one is said to be in draft mode. When one is working in a completed theory, one is said to be in working mode; in this mode the functions with prefix "new_" listed above are not available. In draft mode there is the danger that one is not prevented from asserting inconsistent axioms such as |- T = F. See the discussion of definitional axioms in Section 4.1. The functions for entering an already existing theory in either draft mode or working mode are, respectively:

extend_theory : string -> void
load_theory   : string -> void


To finish a session and write all new declarations to the theory file there is the function:

close_theory : void -> void

There are various functions for loading the contents of theory files:

parents     : string -> string list
types       : string -> (int # string) list
constants   : string -> term list
infixes     : string -> term list
binders     : string -> term list
axioms      : string -> (string # thm) list
definitions : string -> (string # thm) list
theorems    : string -> (string # thm) list
Once a theorem has been proved, it can be saved with the function save_thm : string # thm -> thm

Evaluating save_thm('Th1',th) will save the theorem th with name Th1 on the current theory. Individual axioms, definitions and theorems can be read (from ancestor theories) using the following ML functions:

axiom      : string -> string -> thm
definition : string -> string -> thm
theorem    : string -> string -> thm

Theories can be printed using the function print_theory, which takes a theory name (a value of type string) and then prints out the named theory in a readable format. In the remaining subsections of this section we describe all the theories built into the HOL system. The theory HOL has all of these as parents.
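The theory operations above amount to updates and lookups on a small record of named components. A hypothetical Python sketch (theorems and axioms stored as plain named strings standing in for thm values; the real system keeps this data in name.th files on disk):

```python
# Hypothetical sketch of a theory as an in-memory record.
class Theory:
    def __init__(self, name):
        self.name = name
        self.parents = []
        self.axioms = {}       # name -> formula
        self.theorems = {}     # name -> saved theorem

def new_theory(name):
    return Theory(name)

def new_axiom(thy, name, tm):
    thy.axioms[name] = tm
    return tm

def save_thm(thy, name, th):   # cf. save_thm('Th1', th) in the ML session
    thy.theorems[name] = th
    return th

def theorem(thy, name):        # read a saved theorem back by name
    return thy.theorems[name]
```

The point of the sketch is only the shape of the interface: names are the keys under which axioms and theorems are stored, which is why the string arguments to new_axiom and save_thm reappear as arguments to axiom and theorem.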

5.3 The Theory bool

The theory bool introduces the type bool and contains four of the five axioms for higher-order logic (the fifth axiom is in the theory ind). These axioms, together with the rules of inference described in Section 5.4, constitute the core of the HOL logic. (The actual theory structure in the HOL system is slightly different from that described in this paper: the types and definitions for pairs are included in the theory bool, rather than being in a separate descendant theory called prod. It is hoped that future versions of the system will correct this anomaly.) Because of the way the HOL system evolved from LCF (in order to simplify the porting of the LCF theorem-proving tools to the HOL system, the HOL logic was made as like PPLAMBDA, the logic built into LCF, as possible), the

particular axiomatization of higher-order logic it uses is superficially different from the classical axiomatization due to Church [2]. The biggest difference is that in Church's formulation, type variables are in the meta-language, whereas in the HOL logic they are part of the object language. There are three primitive constants in the theory bool: = (equality, an infix), ==> (implication, an infix) and @ (choice, a binder). Equality and implication are standard predicate calculus notions, but choice is more exotic: if t is a term having type σ->bool, then @x.t x (or, equivalently, $@ t) denotes some member of the set whose characteristic function is t. If the set is empty, then @x.t x denotes an arbitrary member of the set denoted by σ. The constant @ is a higher-order version of Hilbert's ε-operator; it is related to the constant ι in Church's formulation of higher-order logic. For more details, see Leisenring's book [12] and Church's original paper [2]. The logical constants T (truth), F (falsity), ~ (negation), /\ (conjunction), \/ (disjunction), ! (universal quantification) and ? (existential quantification) can all be defined in terms of equality, implication and choice. The definitions listed below are fairly standard; each one is preceded by its ML name. (Later definitions sometimes use earlier ones.)

T_DEF       |- T = ((\x:*. x) = (\x. x))

FORALL_DEF  |- $! = \P:*->bool. P = (\x. T)

EXISTS_DEF  |- $? = \P:*->bool. P($@ P)

AND_DEF     |- $/\ = \t1 t2. !t. (t1 ==> t2 ==> t) ==> t

OR_DEF      |- $\/ = \t1 t2. !t. (t1 ==> t) ==> (t2 ==> t) ==> t

F_DEF       |- F = !t. t

NOT_DEF     |- $~ = \t. t ==> F

There are four axioms in the theory bool:

BOOL_CASES_AX   |- !t. (t = T) \/ (t = F)

IMP_ANTISYM_AX  |- !t1 t2. (t1 ==> t2) ==> (t2 ==> t1) ==> (t1 = t2)

ETA_AX          |- !t. (\x. t x) = t

SELECT_AX       |- !P:*->bool x. P x ==> P($@ P)

The fifth and last axiom of the HOL logic is the Axiom of Infinity; this is in the theory ind described in Section 5.7. The theory bool supplies the definitions of a number of useful constants:


|- LET = \f x. f x

|- COND = \t t1 t2. @x. ((t = T) ==> (x = t1)) /\ ((t = F) ==> (x = t2))

|- ONE_ONE f = (!x1 x2. (f x1 = f x2) ==> (x1 = x2))

|- ONTO f = (!y. ?x. y = f x)

The constant LET is used in representing terms containing local variable bindings (i.e. let-terms, as in Section 3.3). The constant COND is used in representing conditionals. The constant ONE_ONE is used in type definitions (see Section 4.2). The constant ONTO is used in stating the Axiom of Infinity (see Section 5.7).

5.4 Primitive Rules of Inference of the HOL Logic

There are eight primitive rules of inference of the HOL logic. We will specify these using standard natural deduction notation. The metavariables t, t1, t2, etc. stand for arbitrary terms. The theorems above the horizontal line are called the hypotheses of the rule and the theorem below the line is called the result. Each rule asserts that its result can be deduced from its hypotheses, provided any restrictions (listed after the bullets) hold. The first three rules have no hypotheses; their results can always be deduced. The square brackets contain the ML names of the rules followed by their ML types.

5.4.1 Assumption introduction [ASSUME : term -> thm]

    ----------
     t |- t

ASSUME "t" evaluates to t |- t.

5.4.2 Reflexivity   [REFL : term -> thm]

    ----------
    |- t = t

REFL "t" evaluates to |- t = t.


5.4.3 Beta-conversion   [BETA_CONV : term -> thm]

    --------------------------
    |- (\x.t1)t2 = t1[t2/x]

• where t1[t2/x] denotes the result of substituting t2 for x in t1, with the restriction that no free variables in t2 become bound after substitution into t1.

BETA_CONV "(\x.t1)t2" evaluates to the theorem |- (\x.t1)t2 = t1[t2/x].

5.4.4 Substitution   [SUBST : (thm # term)list -> term -> thm -> thm]

    Γ1 |- t1 = t1'  ...  Γn |- tn = tn'    Γ |- t[t1,...,tn]
    ---------------------------------------------------------
    Γ1 ∪ ... ∪ Γn ∪ Γ |- t[t1',...,tn']

• where t[t1,...,tn] denotes a term t with some free occurrences of the terms t1, ..., tn singled out, and t[t1',...,tn'] denotes the result of simultaneously replacing each such occurrence of ti by ti' (for 1 ≤ i ≤ n), with the restriction that the context t[] must not bind any variable occurring free in either ti or ti' (for 1 ≤ i ≤ n).

• ∪ denotes set union.

The first argument to SUBST is a list [(|-t1=t1', x1); ... ;(|-tn=tn', xn)]. The second argument is a template term t[x1,...,xn] in which occurrences of the variable xi (where 1 ≤ i ≤ n) are used to mark the places where substitutions with |- ti=ti' are to be done.

5.4.5 Abstraction   [ABS : term -> thm -> thm]

    Γ |- t1 = t2
    -------------------------
    Γ |- (\x.t1) = (\x.t2)

• where x is not free in Γ.

If th is an ML name for the theorem Γ |- t1 = t2, then ABS "x" th returns the theorem Γ |- (\x.t1) = (\x.t2).

5.4.6 Type instantiation   [INST_TYPE : (type # type)list -> thm -> thm]

    Γ |- t
    ---------------------------------
    Γ |- t[σ1,...,σn/α1,...,αn]

• where t[σ1,...,σn/α1,...,αn] denotes the result of substituting (in parallel) the types σ1, ..., σn for the type variables α1, ..., αn in t, with the restriction that none of α1, ..., αn occur in Γ.

INST_TYPE[(σ1,α1); ... ;(σn,αn)]th returns the result of instantiating each occurrence of αi in the theorem th to σi (for 1 ≤ i ≤ n).

5.4.7 Discharging an assumption   [DISCH : term -> thm -> thm]

    Γ |- t2
    ----------------------
    Γ-{t1} |- t1 ==> t2

• Γ-{t1} denotes the set obtained by removing t1 from Γ.

If th is the theorem Γ |- t2, then the ML expression DISCH "t1" th evaluates to the theorem Γ-{t1} |- t1 ==> t2. Note that if t1 is not a member of Γ, then Γ-{t1} = Γ.

5.4.8 Modus Ponens   [MP : thm -> thm -> thm]

    Γ1 |- t1 ==> t2    Γ2 |- t1
    ----------------------------
    Γ1 ∪ Γ2 |- t2

MP takes two theorems (in the order shown above) and returns the result of applying Modus Ponens.
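The security of the whole system rests on the fact that theorems can only be created by these eight rules. That discipline can be mimicked in any language with data abstraction. The following Python sketch (not HOL's actual ML code; terms are simplified to strings and ('==>', a, b) tuples, and the token trick is our own) shows the LCF-kernel idea: a Thm can only be built by the rule functions, so every Thm in existence was obtained by some sequence of rule applications.

```python
# A toy LCF-style kernel: theorems can only be created by the
# inference-rule functions below, which all present the kernel token.
class Thm:
    def __init__(self, hyps, concl, _token=None):
        assert _token is _KERNEL, "theorems can only be made by inference rules"
        self.hyps = frozenset(hyps)
        self.concl = concl

_KERNEL = object()

def ASSUME(t):                       # t |- t
    return Thm({t}, t, _token=_KERNEL)

def DISCH(t, th):                    # G |- t2   gives   G-{t} |- t ==> t2
    return Thm(th.hyps - {t}, ('==>', t, th.concl), _token=_KERNEL)

def MP(th1, th2):                    # G1 |- a ==> b,  G2 |- a   gives   G1 u G2 |- b
    op, ante, concl = th1.concl
    assert op == '==>' and ante == th2.concl, "MP: theorems do not match"
    return Thm(th1.hyps | th2.hyps, concl, _token=_KERNEL)

# Deriving  |- p ==> p  by ASSUME followed by DISCH:
th = DISCH('p', ASSUME('p'))
assert th.hyps == frozenset() and th.concl == ('==>', 'p', 'p')
```

The point of the design is that soundness only depends on these few functions being correct; everything built on top of them (Sections 6 and 7) can be arbitrarily complicated without endangering the logic.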

5.5 The Theory prod

The binary type operator prod denotes the Cartesian product operator (see Section 3.1). Values of type (σ1,σ2)prod are ordered pairs whose first component has type σ1 and whose second σ2. The HOL parser recognises σ1#σ2 as an abbreviation for (σ1,σ2)prod. Cartesian products can be defined by representing a pair (t1,t2) as the function \x y. (x=t1)/\(y=t2). The representing type of σ1#σ2 is thus σ1->σ2->bool. To define pairs this way, we first evaluate the ML expressions:


new_definition
 ('MK_PAIR_DEF', "MK_PAIR (x:*) (y:**) = \a b. (a=x)/\(b=y)")

new_definition
 ('IS_PAIR_DEF', "IS_PAIR p = ?x:*. ?y:**. p = MK_PAIR x y")

We then prove that:

|- ?p:*->**->bool. IS_PAIR p

which is easily done (since |- IS_PAIR(MK_PAIR x y) is easily proved). This theorem is called PAIR_EXISTS. Next we define the type operator prod by evaluating new_type_definition with the representing predicate IS_PAIR and the theorem PAIR_EXISTS. This results in the constant REP_prod being declared and the following axiom (called DEF_prod) being asserted.

|- ONE_ONE REP_prod /\ (!p. IS_PAIR p = (?p'. p = REP_prod p'))

The infix constructor "," and the selectors FST:*#** -> * and SND:*#** -> ** can then be defined by:

new_infix_definition
 ('COMMA_DEF', "$, (x:*) (y:**) = @p. REP_prod p = MK_PAIR x y")

new_definition
 ('FST_DEF', "FST(p:(*,**)prod) = @x. ?y. MK_PAIR x y = REP_prod p")

new_definition
 ('SND_DEF', "SND(p:(*,**)prod) = @y. ?x. MK_PAIR x y = REP_prod p")

From these definitions and the axiom DEF_prod, the following theorems are proved and stored in the theory prod.

PAIR  |- !x. (FST x, SND x) = x
FST   |- !x y. FST(x,y) = x
SND   |- !x y. SND(x,y) = y

In addition to the constants just described, the theory prod also contains the definitions of CURRY and UNCURRY.

|- CURRY f x y = f(x,y)

|- UNCURRY f (x,y) = f x y


These constants are used for representing the generalized abstractions of the form \(x1,...,xn).t described in Section 3.3.
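The representation trick behind MK_PAIR can be replayed in ordinary functional terms. The Python sketch below is our own illustration, not HOL code: a pair (x,y) becomes the two-argument predicate that recognises exactly (x,y), the selectors are recovered by search over a finite universe (where HOL instead uses the choice operator @), and curry/uncurry behave as the two definitions above describe.

```python
# A pair (x, y) represented as its characteristic function
# \a b. (a = x) /\ (b = y), as in the MK_PAIR definition.
def mk_pair(x, y):
    return lambda a, b: (a == x) and (b == y)

# FST/SND recovered by search over a known universe of candidate
# components (HOL uses the choice operator @ instead of a search).
def fst(p, universe):
    return next(x for x in universe for y in universe if p(x, y))

def snd(p, universe):
    return next(y for x in universe for y in universe if p(x, y))

def curry(f):                        # CURRY f x y = f(x, y)
    return lambda x: lambda y: f((x, y))

def uncurry(f):                      # UNCURRY f (x, y) = f x y
    return lambda p: f(p[0])(p[1])

u = range(10)
p = mk_pair(3, 7)
assert fst(p, u) == 3 and snd(p, u) == 7

add = lambda pair: pair[0] + pair[1]
assert curry(add)(3)(4) == 7
assert uncurry(curry(add))((3, 4)) == 7
```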

5.6 The Theory sum

A type (σ1,σ2)sum (which may be abbreviated as σ1+σ2) denotes the disjoint union of types σ1 and σ2. The type operator sum can be defined just as prod was, but we omit the details here.¹⁰ From the user's point of view, all that is needed are the functions:

INL  : * -> *+**
INR  : ** -> *+**
OUTL : *+** -> *
OUTR : *+** -> **
ISL  : *+** -> bool
ISR  : *+** -> bool

INL and INR are the injections for inserting elements into the sum; OUTL and OUTR are the corresponding projections out of the sum. The predicates ISL and ISR test whether an element of a sum is in the left or right summand respectively. The following theorems are pre-proved in the system.

ISL         |- (!e. ISL(INL e)) /\ (!e. ~ISL(INR e))
ISR         |- (!e. ISR(INR e)) /\ (!e. ~ISR(INL e))
ISL_OR_ISR  |- !s. ISL s \/ ISR s
OUTL        |- !e. OUTL(INL e) = e
OUTR        |- !e. OUTR(INR e) = e
INL         |- !s. ISL s ==> (INL(OUTL s) = s)
INR         |- !s. ISR s ==> (INR(OUTR s) = s)

These theorems follow from the definitions.
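A disjoint union can be modelled concretely by tagging each value with the side it came from. The Python sketch below is an illustration only (HOL's actual definition builds sums from bool and pairs, as the footnote credits to Tom Melham); it turns the pre-proved theorems above into runnable assertions.

```python
# Disjoint union modelled as a tagged value: ('L', x) or ('R', y).
def INL(x): return ('L', x)
def INR(y): return ('R', y)
def ISL(s): return s[0] == 'L'
def ISR(s): return s[0] == 'R'

def OUTL(s):
    assert ISL(s), "OUTL applied to a right injection"
    return s[1]

def OUTR(s):
    assert ISR(s), "OUTR applied to a left injection"
    return s[1]

# The pre-proved theorems, checked on sample elements:
for e in (0, 'a', True):
    assert ISL(INL(e)) and not ISL(INR(e))        # ISL
    assert ISR(INR(e)) and not ISR(INL(e))        # ISR
    assert OUTL(INL(e)) == e and OUTR(INR(e)) == e  # OUTL, OUTR
for s in (INL(1), INR(2)):
    assert ISL(s) or ISR(s)                       # ISL_OR_ISR
    if ISL(s): assert INL(OUTL(s)) == s           # INL
    if ISR(s): assert INR(OUTR(s)) == s           # INR
```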

5.7 The Theory ind

The theory ind introduces the type ind of individuals and the Axiom of Infinity. This axiom states that the set denoted by ind is infinite. The four axioms of the theory bool, the rules of inference in Section 5.4 and the Axiom of Infinity are together sufficient for developing all of standard mathematics. Thus, in principle, the user of the HOL system should never need to make a non-definitional theory. In practice, it is often very tempting to take the risk of introducing new axioms because deriving them from definitions can be tedious; proving that 'axioms' follow from definitions amounts to giving a consistency proof of them. The Axiom of Infinity is called INFINITY_AX and states:

¹⁰The definition of disjoint unions in the HOL system is due to Tom Melham.


|- ?f:ind->ind. ONE_ONE f /\ ~(ONTO f)

This asserts that there exists a one-to-one map from ind to itself that is not onto. This implies that the type ind denotes an infinite set.
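The logical content of the axiom can be illustrated concretely: on a finite set, every one-to-one self-map is automatically onto, so a type admitting a one-to-one but non-onto self-map cannot be finite. The Python check below (our illustration, not part of HOL) verifies the finite claim exhaustively for small sets and exhibits the successor function as a one-to-one map that never hits 0.

```python
from itertools import product

# On a finite set, any one-to-one map of the set into itself is
# automatically onto; hence a one-to-one, non-onto self-map (like
# SUC on num, which never hits 0) forces the carrier to be infinite.
def one_one(f, dom):
    vals = [f(x) for x in dom]
    return len(set(vals)) == len(vals)

def onto(f, dom):
    return {f(x) for x in dom} == set(dom)

for n in range(1, 5):
    dom = list(range(n))
    for image in product(dom, repeat=n):      # all n**n self-maps
        f = dict(zip(dom, image)).__getitem__
        if one_one(f, dom):
            assert onto(f, dom)               # one-to-one forces onto

# Successor on the natural numbers is one-to-one but never hits 0:
suc = lambda n: n + 1
assert one_one(suc, range(100))
assert 0 not in {suc(n) for n in range(100)}
```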

5.8 The Theory num

The type num of natural numbers can be defined as equivalent to a countable subset of ind. Peano's axioms can then be proved as theorems. However, this has not yet been done in the HOL system. The type num is declared as a new primitive type and the constants 0 and SUC are taken as primitive constants with types num and num->num respectively. Peano's axioms are then asserted using new_axiom. The resulting theorems are:

           |- !n. ~(SUC n = 0)
           |- !m n. (SUC m = SUC n) ==> (m = n)
INDUCTION  |- !P. P 0 /\ (!n. P n ==> P(SUC n)) ==> (!n. P n)

In higher-order logic, Peano's axioms are sufficient for developing number theory because addition and multiplication can be defined; in first-order logic these must be taken as primitive. Note also that INDUCTION could not be stated as a single axiom in first-order logic because predicates (e.g. P) cannot be quantified.

5.9 The Theory prim_rec

In classical logic, unlike domain theory logics such as PPLAMBDA, arbitrary recursive definitions are not allowed. For example, there is no function f (of type num->num) such that

!x. f x = (f x) + 1

Certain restricted forms of recursive definition do, however, uniquely define functions. An important example is the class of primitive recursive functions. For any x and f, the Primitive Recursion Theorem tells us that there is a unique function fun such that:

(fun 0 = x) /\ (!m. fun(SUC m) = f(fun m)m)

The Primitive Recursion Theorem follows from Peano's axioms. When the HOL system is built, the following theorem is proved and stored in the theory prim_rec:


|- !x f. ?fun. (fun 0 = x) /\ (!m. fun(SUC m) = f(fun m)m)

From this it follows that there exists a function PRIM_REC such that:

|- !m x f. (PRIM_REC x f 0 = x) /\
           (PRIM_REC x f (SUC m) = f(PRIM_REC x f m)m)

The function PRIM_REC can be used to justify any primitive recursive definition. In higher-order logic a recursion of the form

fun 0       x1 ... xn = f1(x1,...,xn)
fun (SUC m) x1 ... xn = f2(fun m x1 ... xn, m, x1,...,xn)

is equivalent to:

fun 0       = \x1 ... xn. f1(x1,...,xn)
fun (SUC m) = \x1 ... xn. f2(fun m x1 ... xn, m, x1,...,xn)
            = (\f m x1 ... xn. f2(f x1 ... xn, m, x1,...,xn))(fun m)m

which is equivalent to the non-recursive definition:

fun = PRIM_REC (\x1 ... xn. f1(x1,...,xn))
               (\f m x1 ... xn. f2(f x1 ... xn, m, x1,...,xn))

For example, we can define addition and multiplication by:

|- + = PRIM_REC (\n. n) (\f m n. SUC(f n))

|- * = PRIM_REC (\n. 0) (\f m n. (f n) + n)
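The reduction of primitive recursion to the single operator PRIM_REC can be tried out directly. Below is a Python transcription (a sketch; in HOL, PRIM_REC is a logical constant, not a program): prim_rec x f builds the unique fun with fun(0) = x and fun(SUC m) = f (fun m) m. Because the recursion here happens at function type, addition and multiplication fall out exactly as in the two definitions above.

```python
# PRIM_REC x f is the unique fun with
#   fun 0       = x
#   fun (SUC m) = f (fun m) m
def prim_rec(x, f):
    def fun(n):
        acc = x
        for m in range(n):
            acc = f(acc, m)
        return acc
    return fun

# |- + = PRIM_REC (\n. n) (\f m n. SUC(f n))
# The value being recursed on is itself a unary function: x is \n. n,
# and each step builds a new unary function from the previous one.
add = prim_rec(lambda n: n,
               lambda f, m: (lambda n: f(n) + 1))

# |- * = PRIM_REC (\n. 0) (\f m n. (f n) + n)
mul = prim_rec(lambda n: 0,
               lambda f, m: (lambda n: f(n) + n))

assert add(3)(4) == 7
assert mul(6)(7) == 42
```

First-order recursions work too: for instance prim_rec(1, lambda acc, m: acc * (m + 1)) is the factorial function.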

To automate the use of the Primitive Recursion Theorem, HOL provides two functions:

new_prim_rec_definition       : (string # term) -> thm
new_infix_prim_rec_definition : (string # term) -> thm

Evaluating

new_prim_rec_definition
 ('fun',
  "(!x1 ... xn. fun 0 x1 ... xn = t1[x1,...,xn]) /\
   (!m x1 ... xn. fun (SUC m) x1 ... xn = t2[fun m x1 ... xn, m, x1,...,xn])")


automatically makes the (non-recursive) definition:

new_definition
 ('fun_DEF',
  "fun = PRIM_REC (\x1 ... xn. t1[x1,...,xn])
                  (\fun m x1 ... xn. t2[fun x1 ... xn, m, x1,...,xn])")

and then proves the theorem:

|- (fun 0 x1 ... xn = t1[x1,...,xn]) /\
   (fun (SUC m) x1 ... xn = t2[fun m x1 ... xn, m, x1,...,xn])

which is saved as the theorem with name 'fun'. The ML function new_infix_prim_rec_definition declares an infixed function by primitive recursion. For example, here is the definition of + used by the system:

new_infix_prim_rec_definition
 ('ADD',
  "($+ 0 n = n) /\
   ($+ (SUC m) n = SUC($+ m n))")

The $'s are there to indicate that + is being declared an infix. Evaluating this ML expression will create a definition of the following form in the current theory:

ADD_DEF  |- $+ = PRIM_REC (\n. n) (\f m n. SUC(f n))

The $ is now necessary since the call to new_infix_prim_rec_definition declares + as an infix. It also automatically proves the following theorem:

ADD  |- (0 + n = n) /\ ((SUC m) + n = SUC(m + n))

which is saved in the current theory with name ADD. We might hope to define the less-than relation [...]

[...] |- t1 ==> t2, then we can derive the theorem Γ ∪ {t1} |- t2 by Modus Ponens from the theorem t1 |- t1 (which follows from the primitive rule ASSUME). Derived inference rules enable proofs to be done using bigger and more natural steps. They can be defined in ML simply as functions that call the primitive inference rules. For example, the undischarging rule just described can be implemented by:

let UNDISCH th = MP th (ASSUME(fst(dest_imp(concl th))))

Each application of UNDISCH will invoke an application of ASSUME followed by an application of MP. Some of the predefined derived rules in HOL can invoke many thousands of primitive inference steps. The HOL system has all the standard introduction and elimination rules of Predicate Calculus predefined as derived inference rules. It is these, rather than the primitive rules, that one normally uses in practice. In addition, there are some special rules that do a limited amount of automatic theorem-proving. The most generally useful examples of these are a collection of rewriting rules developed by Larry Paulson [17]. Rewriting rules use equations of the form |- t1 = t2 to repeatedly replace subterms matching t1 by the corresponding instances of t2. The Cambridge LCF Manual [18] documents the rewriting rules in the HOL system, so we do not describe them here.
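The pattern "derived rule = ML function over primitive rules" is easy to imitate. In the Python sketch below (a toy encoding of our own, not HOL's ML: theorems are (hyps, concl) pairs and implications are ('==>', a, b) tuples), UNDISCH is just a function that calls ASSUME and MP, mirroring the ML one-liner above.

```python
# Toy theorems: (hyps, concl) pairs; implications are ('==>', a, b).
def ASSUME(t):                        # t |- t
    return (frozenset([t]), t)

def MP(th1, th2):                     # G1 |- a ==> b,  G2 |- a   gives   G1 u G2 |- b
    (h1, c1), (h2, c2) = th1, th2
    op, a, b = c1
    assert op == '==>' and a == c2
    return (h1 | h2, b)

# The derived rule, following the ML definition
#   let UNDISCH th = MP th (ASSUME(fst(dest_imp(concl th))))
def UNDISCH(th):
    concl = th[1]                     # concl th
    ante = concl[1]                   # fst(dest_imp ...)
    return MP(th, ASSUME(ante))

# Pretend  |- p ==> q  was proved; undischarging yields  p |- q:
th = (frozenset(), ('==>', 'p', 'q'))
hyps, concl = UNDISCH(th)
assert hyps == frozenset(['p']) and concl == 'q'
```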

7 Goal Directed Proof: Tactics and Tacticals

A tactic is an ML function which is applied to a goal to reduce it to subgoals. A tactical is a (higher-order) ML function for combining tactics to build new tactics.¹² For example, if T1 and T2 are tactics, then the ML expression T1 THEN T2 evaluates to a tactic which first applies T1 to a goal and then applies T2 to each subgoal produced by T1. The tactical THEN is an infixed ML function. The tactics and tacticals in the HOL system are derived from those in the Cambridge LCF system [18] (which evolved from the ones in Edinburgh LCF [6]).

7.1 Tactics

It simplifies the description of tactics if we use various type abbreviations. A type abbreviation is just a name given to a type; a type and its abbreviation can be used interchangeably, and the system prints types using any abbreviations that are in force. A type abbreviation is introduced by executing a declaration of the form

lettype name = type

For example:

lettype goal = term list # term

The following ML type abbreviations are also used in connection with tactics (they will be explained later).

proof      = thm list -> thm
subgoals   = goal list # proof
tactic     = goal -> subgoals
thm_tactic = thm -> tactic
conv       = term -> thm

¹²The terms "tactic" and "tactical" are due to Robin Milner, who invented the concepts.

If T is a tactic and g is a goal, then applying T to g (i.e. evaluating the ML expression T g) will result in an object of ML type subgoals, i.e. a pair whose first component is a list of goals and whose second component has ML type proof.

Suppose T g = ([g1;...;gn],p). The idea is that g1, ..., gn are subgoals and p is a 'justification' of the reduction of goal g to subgoals g1, ..., gn. Suppose further that we have solved the subgoals g1, ..., gn. This would mean that we had somehow proved theorems th1, ..., thn such that each thi (1 ≤ i ≤ n) 'achieves' the goal gi. The justification p (produced by applying T to g) is an ML function which, when applied to the list [th1;...;thn], returns a theorem, th, which 'achieves' the original goal g. Thus p is a function for converting a solution of the subgoals to a solution of the original goal. If p does this successfully, then the tactic T is called valid. Invalid tactics cannot result in the proof of invalid theorems; the worst they can do is result in insolvable goals or unintended theorems being proved. If T were invalid and were used to reduce goal g to subgoals g1, ..., gn, then one might prove theorems th1, ..., thn achieving g1, ..., gn, only to find that these theorems are useless because p[th1;...;thn] doesn't achieve g (i.e. it fails, or else it achieves some other goal).

A theorem achieves a goal if the assumptions of the theorem are included in the assumptions of the goal and if the conclusion of the theorem is equal (up to renaming of bound variables) to the conclusion of the goal. More precisely, we say a theorem t1,...,tm |- t achieves a goal ([u1;...;un],u) if and only if {t1,...,tm} is a subset of {u1,...,un} and t is equal to u (up to renaming of bound variables). For example, the goal (["x=y"; "y=z"; "z=v"], "x=z") is achieved by the theorem x=y, y=z |- x=z (the assumption "z=v" is not needed).


We say that a tactic solves a goal if it reduces the goal to the empty list of subgoals. Thus T solves g if T g = ([],p). If this is the case and if T is valid, then p[] will evaluate to a theorem achieving g. Thus if T solves g, then the ML expression snd(T g)[] evaluates to a theorem achieving g.

Tactics are specified using the following notation:

              goal
    -------------------------
    goal1  goal2  ...  goaln

For example, a tactic called CONJ_TAC is described by

     t1 /\ t2
    -----------
     t1     t2

Thus CONJ_TAC reduces a goal of the form (Γ,"t1/\t2") to subgoals (Γ,"t1") and (Γ,"t2"). The fact that the assumptions of the top-level goal are propagated unchanged to the two subgoals is indicated by the absence of assumptions in the notation.

Another example is INDUCT_TAC, the tactic for doing mathematical induction on the natural numbers:

          !n.t[n]
    ----------------------
    t[0]   {t[n]} t[SUC n]

INDUCT_TAC reduces a goal (Γ,"!n.t[n]") to a basis subgoal (Γ,"t[0]") and an induction step subgoal (Γ ∪ {"t[n]"},"t[SUC n]"). The extra induction assumption "t[n]" is indicated in our tactic notation with set brackets.

Tactics generally fail (in the ML sense) if they are applied to inappropriate goals. For example, CONJ_TAC will fail if it is applied to a goal whose conclusion is not a conjunction. Some tactics never fail; for example

      t
    -----
      t

is the 'identity tactic' ALL_TAC: it reduces a goal (Γ,t) to the single subgoal (Γ,t), i.e. it has no effect. ALL_TAC is useful for writing complex tactics using tacticals (e.g. see the definition of REPEAT in Section 7.3).
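The tactic discipline can be simulated with ordinary functions. In the Python sketch below (our illustration; HOL's goals and theorems are richer), a goal is an (assumptions, conclusion) pair, and a tactic returns a list of subgoals together with a justification mapping achieving theorems back to a theorem for the original goal. The CONJ_TAC analogue shows how the justification 'conjoins' the subgoal theorems.

```python
# Goals are (assumptions, conclusion); theorems are (hyps, concl) pairs.
# Conjunctions are ('/\\', t1, t2) tuples.
# A tactic maps a goal to (subgoal list, justification function).
def CONJ_TAC(goal):
    asms, concl = goal
    op, t1, t2 = concl
    assert op == '/\\', "CONJ_TAC: goal is not a conjunction"

    def justification(thms):          # combine achieving theorems
        (h1, c1), (h2, c2) = thms
        return (h1 | h2, ('/\\', c1, c2))

    return ([(asms, t1), (asms, t2)], justification)

goal = (frozenset(), ('/\\', 'A', 'B'))
subgoals, p = CONJ_TAC(goal)
assert subgoals == [(frozenset(), 'A'), (frozenset(), 'B')]

# Pretend each subgoal was solved, then apply the justification:
thA, thB = (frozenset(), 'A'), (frozenset(), 'B')
assert p([thA, thB]) == (frozenset(), ('/\\', 'A', 'B'))
```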


7.2 Using Tactics to Prove Theorems

Suppose one has a goal g to solve. If g is simple, one might be able to think up a tactic, T say, which reduces it to the empty list of subgoals. If this is the case, then executing

let gl,p = T g

will bind p to a function which, when applied to the empty list of theorems, yields a theorem th achieving g. (The declaration above will also bind gl to the empty list of goals.) Thus a theorem achieving g can be computed by executing

let th = p[]

After proving a theorem one usually wants to store it in the current theory. To do this one chooses an unused name, Thm say, and then executes:

save_thm('Thm', th)

This saves the theorem th with name Thm. If the current theory is called T, then to retrieve th (in a later session, in T or its descendants) one does:

let th = theorem 'T' 'Thm'

The ML function theorem has type string->string->thm. Its first argument should be the name of a theory and its second argument the name of a theorem on that theory. The named theorem is returned.

To simplify the use of tactics there is a standard function called prove_thm:

prove_thm : (string # term # tactic) -> thm

prove_thm('foo',t,T) proves the goal ([],t) (i.e. the goal with no assumptions and conclusion t) using tactic T and saves the resulting theorem with name foo on the current theory.

7.3 Tacticals

A tactical is an ML function that returns a tactic (or tactics) as result. Tacticals may take various parameters; this is reflected in the various ML types that the built-in tacticals have. Some important tacticals in the HOL system are listed below (these are all in LCF also).

7.3.1 ORELSE : tactic -> tactic -> tactic

The tactical ORELSE is an ML infix. If T1 and T2 are tactics, then the ML expression T1 ORELSE T2 evaluates to a tactic which first tries T1 and then, if T1 fails, tries T2. It is defined in ML by

let (T1 ORELSE T2) g = T1 g ? T2 g

7.3.2 THEN : tactic -> tactic -> tactic

The tactical THEN is an ML infix. If T1 and T2 are tactics, then the ML expression T1 THEN T2 evaluates to a tactic which first applies T1 and then applies T2 to all the subgoals produced by T1. Its definition in ML is tricky and not given here.

7.3.3 THENL : tactic -> tactic list -> tactic

If T is a tactic which produces n subgoals and T1, ..., Tn are tactics, then T THENL [T1;...;Tn] is a tactic which first applies T and then applies Ti to the ith subgoal produced by T. The tactical THENL is useful if one wants to do different things to different subgoals.

7.3.4 REPEAT : tactic -> tactic

If T is a tactic then REPEAT T is a tactic which repeatedly applies T until it fails. It is defined in ML by:

letrec REPEAT T g = ((T THEN REPEAT T) ORELSE ALL_TAC) g

The extra argument g is needed because ML does not use lazy evaluation.
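These combinators translate almost line-for-line into any functional language. Below is a Python sketch (the goal/tactic encoding is our own illustration, with exceptions standing in for ML failure): ORELSE catches failure, THEN applies a second tactic to every subgoal and composes the justifications, and REPEAT is defined from them just as in the letrec above. A small pair-splitting tactic exercises the combinators.

```python
# Tactic failure is modelled by an exception, as in ML.
class TacticFail(Exception):
    pass

def ALL_TAC(goal):                    # identity tactic: one subgoal, itself
    return ([goal], lambda thms: thms[0])

def ORELSE(T1, T2):                   # try T1, fall back to T2 on failure
    def tac(goal):
        try:
            return T1(goal)
        except TacticFail:
            return T2(goal)
    return tac

def THEN(T1, T2):                     # apply T1, then T2 to every subgoal
    def tac(goal):
        subgoals1, p1 = T1(goal)
        results = [T2(g) for g in subgoals1]
        subgoals = [g for sgs, _ in results for g in sgs]

        def justification(thms):      # hand each sub-justification its share
            out, i = [], 0
            for sgs, p2 in results:
                out.append(p2(thms[i:i + len(sgs)]))
                i += len(sgs)
            return p1(out)
        return subgoals, justification
    return tac

def REPEAT(T):   # letrec REPEAT T g = ((T THEN REPEAT T) ORELSE ALL_TAC) g
    return lambda goal: ORELSE(THEN(T, REPEAT(T)), ALL_TAC)(goal)

# A demo tactic: split a tuple-goal into its two components, else fail.
def SPLIT_TAC(goal):
    if not isinstance(goal, tuple):
        raise TacticFail()
    a, b = goal
    return ([a, b], lambda thms: tuple(thms))

subgoals, p = REPEAT(SPLIT_TAC)((('A', 'B'), 'C'))
assert subgoals == ['A', 'B', 'C']
```

As in ML, the eta-expanded definition of REPEAT matters: building the tactic must not recurse, only applying it may.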

7.3.5 EVERY_ASSUM : (thm -> tactic) -> tactic

Applying EVERY_ASSUM f to a goal ([t1;...;tn],t) is equivalent to applying the tactic:

f(ASSUME t1) THEN ... THEN f(ASSUME tn)

7.3.6 FIRST_ASSUM : (thm -> tactic) -> tactic

Applying FIRST_ASSUM f to a goal ([t1;...;tn],t) is equivalent to applying the tactic:

f(ASSUME t1) ORELSE ... ORELSE f(ASSUME tn)

7.4 Some Tactics Built into the HOL System

In the descriptions below, the ML type thm_tactic abbreviates thm->tactic, and the type conv¹³ abbreviates term->thm.

¹³The type conv comes from Larry Paulson's theory of conversions [17].


7.4.1 ACCEPT_TAC : thm_tactic

• Summary: ACCEPT_TAC th is a tactic that solves any goal that is achieved by th.

• Use: Interfacing forward and backward proofs. For example, one might reduce a goal g to subgoals g1, ..., gn using a tactic T and then prove theorems th1, ..., thn achieving these goals by forward proof. The tactic T THENL[ACCEPT_TAC th1; ... ;ACCEPT_TAC thn] would then solve g.

7.4.2 DISJ_CASES_TAC : thm_tactic

• Summary: DISJ_CASES_TAC |- u\/v splits a goal into two cases: one with "u" as an assumption and the other with "v" as an assumption.

          t
    -------------
    {u}t    {v}t

• Uses: Case analysis. The tactic ASM_CASES_TAC (see below) is defined in ML by

let ASM_CASES_TAC t = DISJ_CASES_TAC(SPEC t EXCLUDED_MIDDLE)

where EXCLUDED_MIDDLE is the theorem |- !t. t \/ ~t.

7.4.3 ASM_CASES_TAC : term -> tactic

• Summary: ASM_CASES_TAC "u" does case analysis on the boolean term "u".

          t
    --------------
    {u}t    {~u}t

• Uses: Case analysis.

7.4.4 COND_CASES_TAC : tactic

• Summary: Does a case split on the condition of a conditional term.

         t[p->u|v]
    ----------------------
    {p=T}t[u]  {p=F}t[v]

COND_CASES_TAC searches for a conditional term and then does cases on its if-part. It fails if the context t[] captures any free variables in the if-part p.


• Uses: Most useful when there is only one conditional term in the goal (otherwise it is difficult to predict which condition will be chosen for the case split). Successive case analysis on all conditions can be done using REPEAT COND_CASES_TAC.

7.4.5 REWRITE_TAC : thm list -> tactic

• Summary: REWRITE_TAC[th1;...;thn] simplifies the goal by rewriting it with the explicitly given theorems th1, ..., thn, and various built-in rewriting rules.

    {t1,...,tm}t
    -------------
    {t1,...,tm}t'

where t' is obtained from t by rewriting with

1. th1, ..., thn and
2. the standard rewrites held in the ML variable basic_rewrites.

• Uses: Simplifying goals using previously proved theorems.

• Other rewriting tactics (based on REWRITE_TAC):

1. ASM_REWRITE_TAC adds the assumptions of the goal to the list of theorems used for rewriting.
2. FILTER_ASM_REWRITE_TAC p [th1;...;thn] simplifies the goal by rewriting it with the explicitly given theorems th1, ..., thn, together with those assumptions of the goal which satisfy the predicate p, and also the built-in rewrites in the ML variable basic_rewrites.
3. PURE_ASM_REWRITE_TAC is like ASM_REWRITE_TAC, but it doesn't use any built-in rewrites.
4. PURE_REWRITE_TAC uses neither the assumptions nor the built-in rewrites.

7.4.6 ASSUME_TAC : thm_tactic

• Summary: ASSUME_TAC |-u adds u as an assumption.

       t
    ------
    {u}t

• Uses: Enriching the assumptions of a goal with previously proved theorems.


7.4.7 CONJ_TAC : tactic

• Summary: Splits a goal "t1/\t2" into two subgoals "t1" and "t2".

    t1 /\ t2
    --------
    t1    t2

• Uses: Solving conjunctive goals. CONJ_TAC is invoked by STRIP_TAC (see below).

7.4.8 DISCH_TAC : tactic

• Summary: Moves the antecedent of an implicative goal into the assumptions.

    u ==> v
    -------
     {u}v

• Uses: Solving goals of the form "u ==> v" by assuming "u" and then solving "v". STRIP_TAC (see below) will invoke DISCH_TAC on implicative goals.

7.4.9 GEN_TAC : tactic

• Summary: Strips off one universal quantifier.

    !x.t[x]
    -------
     t[x']

where x' is a variant of x not free in the goal or the assumptions.

• Uses: Solving universally quantified goals. REPEAT GEN_TAC strips off all universal quantifiers and is often the first thing one does in a proof. STRIP_TAC (see below) applies GEN_TAC to universally quantified goals.

7.4.10 IMP_RES_TAC : thm_tactic

• Summary: IMP_RES_TAC th 'resolves' (see below) th with the assumptions of the goal and then adds the results to the assumptions.

         {t1,...,tm}t
    ------------------------
    {t1,...,tm,u1,...,un}t

where u1, ..., un are derived by 'resolving' th with t1, ..., tm. Resolution in HOL is not classical resolution, but just Modus Ponens with a bit of one-way pattern matching (not unification). The usual case is where th is of the form |- !x1 ... xp. v1 ==> v2 ==> ... ==> vq ==> v. IMP_RES_TAC th then tries to specialize x1, ..., xp so that v1, ..., vq match members of {t1,...,tm}. If such a match is found, then the appropriate instance of v is added to the assumptions, together with all appropriate instances of vi ==> ... ==> vq ==> v (2 ≤ i ≤ q). IMP_RES_TAC can also be given a conjunction of implications, in which case it will do 'resolution' with each of the conjuncts. In fact, it applies a canonicalization rule to its argument to split it into a list of theorems. Each theorem produced by this canonicalization process is resolved with the assumptions. Full details can be found in the Cambridge LCF Manual [18].

• Uses: Deriving new assumptions from existing ones and previously proved theorems so that subsequent tactics (e.g. ASM_REWRITE_TAC) have more to work with.

7.4.11 STRIP_TAC : tactic

• Summary: Breaks a goal apart. STRIP_TAC removes one outer connective from the goal, using CONJ_TAC, DISCH_TAC, GEN_TAC, etc. If the goal has the form t1/\.../\tn ==> t, then STRIP_TAC makes each ti into a separate assumption.

• Uses: Useful for splitting a goal up into manageable pieces. Often the best thing to do first is REPEAT STRIP_TAC.

7.4.12 SUBST_TAC : thm list -> tactic

• Summary: SUBST_TAC[|-u1=v1; ... ;|-un=vn] converts a goal of the form t[u1,...,un] to a subgoal of the form t[v1,...,vn].

• Uses: To make replacements for terms in situations in which REWRITE_TAC is too general or would loop.

7.4.13 ALL_TAC : tactic

• Summary: Identity tactic for the tactical THEN (see end of Section 7.1).

• Uses:
1. Writing tacticals (see description of REPEAT in Section 7.3).
2. With THENL; for example, if tactic T produces two subgoals and we want to apply T1 to the first one but to do nothing to the second, then the tactic to use is T THENL[T1;ALL_TAC].


7.4.14 NO_TAC : tactic

• Summary: Tactic that always fails.

• Uses: Writing tacticals (see the example in Section 8).

8 An Example of HOL in Action

The example in this section has been chosen to give a flavour of what it is like to use the HOL system. Although the theorems proved are simple, the way we prove them illustrates the kind of intricate 'proof engineering' that is typical. The proofs below could be done more elegantly, but presenting them that way would defeat our purpose of illustrating various features of HOL. We have tried to use a small example to give the reader a feel for what it is like to do a big one. Readers who are not interested in hardware verification should be able to learn something about the HOL system even if they do not wish to penetrate the details of the parity-checking example we use. As in Section 2, the boxed interactions below should be understood as occurring in sequence.

These interactions comprise the specification and verification of a device that computes the parity of a sequence of bits. More specifically, we verify the implementation of a device with an input in and an output out, with the specification that the nth output on out is T if and only if there have been an even number of T's input on in. We shall construct a new theory called Parity in which we specify and verify the device. The first thing we do is start up the HOL system and then enter draft mode for this theory.

% hol

HIGHER ORDER LOGIC (Built on Sept 31)

#new_theory 'Parity';;
() : void

To specify the device we define a primitive recursive function PARITY such that PARITY n f is true if the number of T's in the sequence f(1), ..., f(n) is even.


#let PARITY_DEF =
# new_prim_rec_definition
#  ('PARITY_DEF',
#   "(PARITY 0 f = T) /\
#    (PARITY(SUC n)f = (f(SUC n) => ~(PARITY n f) | PARITY n f))");;
PARITY_DEF =
|- (PARITY 0 f = T) /\
   (PARITY(SUC n)f = (f(SUC n) => ~PARITY n f | PARITY n f))

The effect of new_prim_rec_definition is to store the definition of the constant PARITY on the theory Parity and to bind the defining theorem to the ML variable PARITY_DEF. The specification of the device can now be given as:

!t. out t = PARITY t in

It is intuitively clear that this specification will be satisfied if the signal¹⁴ functions in and out satisfy:

out(0) = T

and

!t. out(t+1) = (in(t+1) => ~(out t) | out t)

We can verify this formally in HOL by proving the following lemma:

!in out.
 (out 0 = T) /\
 (!t. out(SUC t) = (in(SUC t) => ~(out t) | out t))
 ==>
 (!t. out t = PARITY t in)
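Before replaying the HOL proof, the lemma can be sanity-checked by simulation. The Python sketch below (an illustration, not HOL) computes PARITY by its primitive recursion and runs the recurrence out(0) = T, out(t+1) = (in(t+1) => ~out(t) | out(t)) over random input signals, confirming out(t) = PARITY t in on every sample.

```python
import random

# PARITY n f: true iff the number of T's among f(1), ..., f(n) is even.
def parity(n, f):
    if n == 0:
        return True
    return (not parity(n - 1, f)) if f(n) else parity(n - 1, f)

# The register-style recurrence from the lemma:
#   out(0) = T,   out(t+1) = (in(t+1) => ~out(t) | out(t))
def run(inp, steps):
    out = [True]
    for t in range(1, steps + 1):
        out.append((not out[t - 1]) if inp(t) else out[t - 1])
    return out

random.seed(0)
for _ in range(20):
    bits = [random.choice([False, True]) for _ in range(30)]
    inp = lambda t: bits[t - 1]
    out = run(inp, 29)
    assert all(out[t] == parity(t, inp) for t in range(30))
```

Simulation only tests finitely many cases, of course; the HOL proof below establishes the lemma for all signals by induction.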

The proof of this is by Mathematical Induction and, although trivial, is a good illustration of how such proofs are done. We will prove this lemma interactively using HOL's subgoal package.¹⁵ We start the proof by putting the goal we want to prove on a goal stack using the function set_goal, which takes a goal as argument.

¹⁴Signals are modelled as functions from numbers (representing times) to booleans.

¹⁵The subgoal package is part of Cambridge LCF, but not Edinburgh LCF. It was implemented by Larry Paulson and provides proof building tools similar to those found in Stanford LCF [14]. We do not describe the subgoal package in this paper, but hope that the simple uses we make of it will be clear.


#set_goal
#([],
# "!in out.
#   (out 0 = T) /\ (!t. out(SUC t) = (in(SUC t) => ~(out t) | out t))
#   ==>
#   (!t. out t = PARITY t in)");;
"!in out.
  (out 0 = T) /\ (!t. out(SUC t) = (in(SUC t) => ~out t | out t)) ==>
  (!t. out t = PARITY t in)"

The subgoal package prints out the goal on the top of the goal stack. We now expand the top goal by stripping off the universal quantifier (with GEN_TAC) and then making the two conjuncts of the antecedent of the implication into assumptions of the goal (with STRIP_TAC). The ML function expand takes a tactic and applies it to the top goal; the resulting subgoals are pushed on to the goal stack. The message "OK.." is printed out just before the tactic is applied.

#expand(REPEAT GEN_TAC THEN STRIP_TAC);;
OK..
"!t. out t = PARITY t in"
    [ "out 0 = T" ]
    [ "!t. out(SUC t) = (in(SUC t) => ~out t | out t)" ]

Next we do induction on t using INDUCT_TAC, which does induction on the outermost universally quantified variable.

#expand INDUCT_TAC;;
OK..
"out(SUC t) = PARITY(SUC t)in"
    [ "out 0 = T" ]
    [ "!t. out(SUC t) = (in(SUC t) => ~out t | out t)" ]
    [ "out t = PARITY t in" ]

"out 0 = PARITY 0 in"
    [ "out 0 = T" ]
    [ "!t. out(SUC t) = (in(SUC t) => ~out t | out t)" ]

The assumptions of the two subgoals are shown in square brackets. The last goal printed is the one on the top of the stack, which is the basis case. This is solved by rewriting with its assumptions and the definition of PARITY.


#expand(ASM_REWRITE_TAC[PARITY_DEF]);;
OK..
goal proved
. |- out 0 = PARITY 0 in

Previous subproof:
"out(SUC t) = PARITY(SUC t)in"
    [ "out 0 = T" ]
    [ "!t. out(SUC t) = (in(SUC t) => ~out t | out t)" ]
    [ "out t = PARITY t in" ]

The top goal is proved, so the system pops it from the goal stack (and puts the proved theorem on a stack of theorems). The new top goal is the step case of the induction. This goal is also solved by rewriting.

#expand(ASM_REWRITE_TAC[PARITY_DEF]);;
OK..
goal proved
. |- out(SUC t) = PARITY(SUC t)in
. |- !t. out t = PARITY t in
|- !in out.
    (out 0 = T) /\ (!t. out(SUC t) = (in(SUC t) => ~out t | out t)) ==>
    (!t. out t = PARITY t in)

Previous subproof:
goal proved

The goal is proved, i.e. the empty list of subgoals is produced. The system now applies the justification functions (see Section 7.1) produced by the tactics to the lists of theorems achieving the subgoals (starting with the empty list). These theorems are printed out in the order they are generated (note that assumptions of theorems are printed as dots). The ML function save_top_thm saves the theorem just proved (i.e. on the top of the theorem stack) in the current theory with a given name.


#let UNIQUENESS_LEMMA = save_top_thm 'UNIQUENESS_LEMMA';;
UNIQUENESS_LEMMA =
|- !in out.
    (out 0 = T) /\
    (!t. out(SUC t) = (in(SUC t) => -out t | out t)) ==>
    (!t. out t = PARITY t in)

This saves the theorem in the theory Parity with name UNIQUENESS_LEMMA and
also binds it to an ML variable with the same name. The lemma just proved
suggests that the parity checker can be implemented by holding the parity
value in a register and then complementing the contents of the register
whenever T is input. To make the implementation more interesting, we will
assume that registers 'power up' storing F. Thus the output at time 0 cannot
be taken directly from a register, because the output of the parity checker
at time 0 is specified to be T. Another tricky thing to notice is that if
t>0, then the output of the parity checker at time t is a function of the
input at time t. Thus there must be a combinational path from the input to
the output. The schematic diagram below shows the design of a device that is
intended to implement this specification.

[Schematic diagram of the parity checker: input in, internal lines l1-l5,
output out]

This works by storing the parity of the sequence input so far in the lower of the two registers. Each time T is input at in, this stored value is complemented.


Registers are assumed to 'power up' in a state in which they are storing F.
The second register (connected to ONE) initially outputs F and then outputs T
forever. Its role is just to ensure that the device works during the first
cycle by connecting the output out to the device ONE via the lower
multiplexer. For all subsequent cycles out is connected to l3 and so either
carries the stored parity value (if the current input is F) or the complement
of this value (if the current input is T). The devices making up this
schematic will be modelled with predicates [8]. For example, the predicate
ONE is true of a signal out if for all times t, the value of out is T.

#let ONE_DEF =
# new_definition
#  ('ONE_DEF', "ONE(out:num->bool) = !t. out t = T");;
ONE_DEF = |- ONE out = (!t. out t = T)

Note that "ONE_DEF" is used both as an ML variable and as the name of the
definition in the theory Parity. The binary predicate NOT is true of a pair
of signals (in,out) if the value of out is always the negation of the value
of in. We thus model inverters as having no delay. This is appropriate for a
register-transfer level model, but would be wrong at a lower level.

#let NOT_DEF =
# new_definition
#  ('NOT_DEF', "NOT(in,out:num->bool) = !t. out t = -(in t)");;
NOT_DEF = |- NOT(in,out) = (!t. out t = -in t)

The final combinational device we need is a multiplexer. This is a 'hardware
conditional'; the input sw selects which of the other two inputs is to be
connected to the output out.

#let MUX_DEF =
# new_definition
#  ('MUX_DEF',
#   "MUX(sw,in1,in2,out:num->bool) =
#    !t. out t = (sw t => in1 t | in2 t)");;
MUX_DEF = |- MUX(sw,in1,in2,out) = (!t. out t = (sw t => in1 t | in2 t))


The remaining devices in the schematic are registers. These are unit-delay
elements; the value output at time t+1 is the value input at the preceding
time t, except at time 0 when the register outputs F.16

#let REG_DEF =
# new_definition
#  ('REG_DEF', "REG(in,out:num->bool) =
#   !t. out t = ((t=0) => F | in(t-1))");;
REG_DEF = |- REG(in,out) = (!t. out t = ((t = 0) => F | in(t - 1)))

The schematic diagram above can be represented as a predicate by conjoining
the relations holding between the various signals and then existentially
quantifying the internal lines. This technique is explained elsewhere (e.g.
see [8], [1]).

#let PARITY_IMP_DEF =
# new_definition
#  ('PARITY_IMP_DEF',
#   "PARITY_IMP(in,out) =
#    ?l1 l2 l3 l4 l5.
#     NOT(l2,l1) /\ MUX(in,l1,l2,l3) /\ REG(out,l2) /\
#     ONE l4 /\ REG(l4,l5) /\ MUX(l5,l3,l4,out)");;
PARITY_IMP_DEF =
|- PARITY_IMP(in,out) =
   (?l1 l2 l3 l4 l5.
     NOT(l2,l1) /\ MUX(in,l1,l2,l3) /\ REG(out,l2) /\
     ONE l4 /\ REG(l4,l5) /\ MUX(l5,l3,l4,out))
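Before carrying out the HOL proof, the composed structure can be
sanity-checked by direct simulation. The following Python sketch is ours, not
part of the HOL development; the function names simply mirror the HOL
predicates. It steps the schematic one cycle at a time and compares the
result against the PARITY specification:

```python
def parity_spec(t, inp):
    # PARITY t in: T at time 0, complemented at each later time i <= t
    # at which the input inp(i) is T.
    out = True
    for i in range(1, t + 1):
        out = (not out) if inp(i) else out
    return out

def parity_imp(inputs):
    # Step the schematic: REG(out,l2), NOT(l2,l1), MUX(in,l1,l2,l3),
    # ONE l4, REG(l4,l5), MUX(l5,l3,l4,out). Registers power up storing F.
    outs = []
    for t, i in enumerate(inputs):
        l2 = outs[t - 1] if t > 0 else False   # REG(out, l2)
        l1 = not l2                            # NOT(l2, l1)
        l3 = l1 if i else l2                   # MUX(in, l1, l2, l3)
        l4 = True                              # ONE l4
        l5 = (t > 0)                           # REG(l4, l5): F at 0, then T
        outs.append(l3 if l5 else l4)          # MUX(l5, l3, l4, out)
    return outs

ins = [False, True, True, False, True]
assert parity_imp(ins) == [parity_spec(t, lambda i: ins[i]) for i in [0]
                           for t in range(len(ins))]
```

The simulation agrees with the specification on every cycle, which is
exactly the correctness statement the HOL proof establishes for all inputs.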

What we shall eventually prove is

|- !in out. PARITY_IMP(in,out) ==> (!t. out t = PARITY t in)

This states that if in and out are related as in the schematic diagram (i.e.
as in the definition of PARITY_IMP), then the pair of signals (in,out)
satisfies the specification. We shall start by proving the following lemma:

16. Time 0 represents when the device is switched on.


|- !in out.
    PARITY_IMP(in,out) ==>
    (out 0 = T) /\ !t. out(SUC t) = (in(SUC t) => -(out t) | out t)

The correctness of the parity checker follows from this and
UNIQUENESS_LEMMA by the transitivity of ==>.

#set_goal
# ([], "!in out.
#       PARITY_IMP(in,out) ==>
#       (out 0 = T) /\ !t. out(SUC t) = (in(SUC t) => -(out t) | out t)");;
"!in out.
  PARITY_IMP(in,out) ==>
  (out 0 = T) /\ (!t. out(SUC t) = (in(SUC t) => -out t | out t))"

We start proving this by rewriting with definitions followed by a
decomposition of the resulting goal using STRIP_TAC. We use the rewriting
tactic PURE_REWRITE_TAC, which does no built-in simplifications, only the
ones explicitly given in the list of theorems supplied as an argument. One of
the built-in simplifications that REWRITE_TAC uses is |- (x = T) = x; we use
PURE_REWRITE_TAC to prevent rewriting with this being done.

#expand(PURE_REWRITE_TAC
#        [PARITY_IMP_DEF;ONE_DEF;NOT_DEF;MUX_DEF;REG_DEF]
#       THEN REPEAT STRIP_TAC);;
OK..
2 subgoals
"out(SUC t) = (in(SUC t) => -out t | out t)"
    [ "!t. l1 t = -l2 t" ]
    [ "!t. l3 t = (in t => l1 t | l2 t)" ]
    [ "!t. l2 t = ((t = 0) => F | out(t - 1))" ]
    [ "!t. l4 t = T" ]
    [ "!t. l5 t = ((t = 0) => F | l4(t - 1))" ]
    [ "!t. out t = (l5 t => l3 t | l4 t)" ]

"out 0 = T"
    [ "!t. l1 t = -l2 t" ]
    [ "!t. l3 t = (in t => l1 t | l2 t)" ]
    [ "!t. l2 t = ((t = 0) => F | out(t - 1))" ]
    [ "!t. l4 t = T" ]
    [ "!t. l5 t = ((t = 0) => F | l4(t - 1))" ]
    [ "!t. out t = (l5 t => l3 t | l4 t)" ]


The top goal is the one printed last; its conclusion is out 0 = T and its
assumptions are equations relating the values on the lines in the circuit. We
would like to expand the top goal by rewriting with the assumptions. However,
if we do this the system will go into an infinite loop because the equations
for out, l2 and l3 are mutually recursive. To prevent looping, we must be
more delicate and only rewrite with a non-looping subset of the assumptions.
To enable the assumptions corresponding to particular lines to be selected
for rewriting, we define an ML function lines such that lines 'l1 ... ln' t
is true if t has the form "!t. li t = ..." for some li in the set specified
by the string 'l1 ... ln'. The functions words and rator used below are
explained in the examples in Section 2.

#let lines tok t =
# (let x = fst(dest_var(rator(lhs(snd(dest_forall t)))))
#  in
#  mem x (words tok)) ? false;;
lines = - : (string -> term -> bool)

FILTER_ASM_REWRITE_TAC(lines 'out l4 l5')[] is a tactic which rewrites with
only those assumptions that involve out, l4 and l5 (see Section 7.4.5).

#expand(FILTER_ASM_REWRITE_TAC(lines 'out l4 l5')[]);;
OK..
goal proved
... |- out 0 = T

Previous subproof:
"out(SUC t) = (in(SUC t) => -out t | out t)"
    [ "!t. l1 t = -l2 t" ]
    [ "!t. l3 t = (in t => l1 t | l2 t)" ]
    [ "!t. l2 t = ((t = 0) => F | out(t - 1))" ]
    [ "!t. l4 t = T" ]
    [ "!t. l5 t = ((t = 0) => F | l4(t - 1))" ]
    [ "!t. out t = (l5 t => l3 t | l4 t)" ]

The first of the two subgoals is proved. Inspecting the remaining goal, we
see that it will be solved if we expand out(SUC t) using only the last
assumption (i.e. !t. out t = (l5 t => l3 t | l4 t)). If we just rewrite with
this assumption, then all the subterms of the form out t will also be
expanded. To prevent this we are forced to use the messy and ad hoc tactic
shown below.


FIRST_ASSUM
 (\th. if lines 'out' (concl th)
       then SUBST_TAC[SPEC "SUC t" th]
       else NO_TAC)

FIRST_ASSUM is explained in Section 7.3.6. The function

\th. if lines 'out' (concl th) then SUBST_TAC[SPEC "SUC t" th] else NO_TAC

converts a theorem of the form

|- !t. out t = tm[t]

to a tactic that replaces occurrences of out(SUC t) by tm[SUC t]. All other
theorems are mapped to NO_TAC, a tactic that always fails.

#expand
# (FIRST_ASSUM
#   (\th. if lines 'out' (concl th)
#         then SUBST_TAC[SPEC "SUC t" th]
#         else NO_TAC));;
OK..
"(l5(SUC t) => l3(SUC t) | l4(SUC t)) = (in(SUC t) => -out t | out t)"
    [ "!t. l1 t = -l2 t" ]
    [ "!t. l3 t = (in t => l1 t | l2 t)" ]
    [ "!t. l2 t = ((t = 0) => F | out(t - 1))" ]
    [ "!t. l4 t = T" ]
    [ "!t. l5 t = ((t = 0) => F | l4(t - 1))" ]
    [ "!t. out t = (l5 t => l3 t | l4 t)" ]

The fact that we had to resort to this messy use of FIRST_ASSUM illustrates
both the strengths and weaknesses of the HOL system. Trivial deductions
sometimes require elaborate tactics, but on the other hand one never reaches
an impasse. HOL experts can prove arbitrarily complicated theorems if they
are willing to use sufficient ingenuity. Furthermore, the type discipline
ensures that no matter how complicated and ad hoc the tactics are, it is
impossible to prove an invalid theorem. Inspecting the goal above, we see
that the next step is to unwind the equations for lines l1, l3, l4 and l5
and then, when this is done, unwind with the equation for line l2.


#expand(FILTER_ASM_REWRITE_TAC(lines 'l1 l3 l4 l5')[]
#       THEN FILTER_ASM_REWRITE_TAC(lines 'l2')[]);;
OK..
"(((SUC t = 0) => F | T) =>
  (in(SUC t) =>
   -((SUC t = 0) => F | out((SUC t) - 1)) |
   ((SUC t = 0) => F | out((SUC t) - 1))) |
  T) =
 (in(SUC t) => -out t | out t)"
    [ "!t. l1 t = -l2 t" ]
    [ "!t. l3 t = (in t => l1 t | l2 t)" ]
    [ "!t. l2 t = ((t = 0) => F | out(t - 1))" ]
    [ "!t. l4 t = T" ]
    [ "!t. l5 t = ((t = 0) => F | l4(t - 1))" ]
    [ "!t. out t = (l5 t => l3 t | l4 t)" ]

This goal can now be solved by rewriting with two standard theorems:

|- !n. -(SUC n = 0)
|- !m. (SUC m) - 1 = m

#expand(REWRITE_TAC[NOT_SUC;SUC_SUB1]);;
OK..
goal proved
...... |- (((SUC t = 0) => F | T) =>
           (in(SUC t) =>
            -((SUC t = 0) => F | out((SUC t) - 1)) |
            ((SUC t = 0) => F | out((SUC t) - 1))) |
           T) =
          (in(SUC t) => -out t | out t)
...... |- (l5(SUC t) => l3(SUC t) | l4(SUC t)) =
          (in(SUC t) => -out t | out t)
...... |- out(SUC t) = (in(SUC t) => -out t | out t)
|- !in out.
    PARITY_IMP(in,out) ==>
    (out 0 = T) /\ (!t. out(SUC t) = (in(SUC t) => -out t | out t))

Previous subproof: goal proved

We name the theorem just proved PARITY_LEMMA and save it in the current
theory.


#let PARITY_LEMMA = save_top_thm 'PARITY_LEMMA';;
PARITY_LEMMA =
|- !in out.
    PARITY_IMP(in,out) ==>
    (out 0 = T) /\ (!t. out(SUC t) = (in(SUC t) => -out t | out t))

We could have proved PARITY_LEMMA in one step with a single compound tactic.
To illustrate this we set up the goal again.

#set_goal
# ([], "!in out.
#       PARITY_IMP(in,out) ==>
#       (out 0 = T) /\
#       !t. out(SUC t) = (in(SUC t) => -(out t) | out t)");;
"!in out.
  PARITY_IMP(in,out) ==>
  (out 0 = T) /\ (!t. out(SUC t) = (in(SUC t) => -out t | out t))"

We then expand with a single tactic corresponding to the sequence of tactics
that we used interactively.

#expand
# (PURE_REWRITE_TAC[PARITY_IMP_DEF;ONE_DEF;NOT_DEF;MUX_DEF;REG_DEF]
#  THEN REPEAT STRIP_TAC
#  THENL
#   [FILTER_ASM_REWRITE_TAC(lines 'out l4 l5')[];ALL_TAC]
#  THEN FIRST_ASSUM
#        (\th. if lines 'out' (concl th)
#              then SUBST_TAC[SPEC "SUC t" th]
#              else NO_TAC)
#  THEN FILTER_ASM_REWRITE_TAC(lines 'l1 l3 l4 l5')[]
#  THEN FILTER_ASM_REWRITE_TAC(lines 'l2')[]
#  THEN REWRITE_TAC[NOT_SUC;SUC_SUB1]);;
OK..
goal proved
|- !in out.
    PARITY_IMP(in,out) ==>
    (out 0 = T) /\ (!t. out(SUC t) = (in(SUC t) => -out t | out t))

Previous subproof: goal proved


Armed with PARITY_LEMMA, we can now move quickly towards the final theorem.
We will do this proof in one step using the ML function prove_thm described
in Section 7.2.

#let PARITY_CORRECT =
# prove_thm
#  ('PARITY_CORRECT',
#   "!in out. PARITY_IMP(in,out) ==> (!t. out t = PARITY t in)",
#   REPEAT GEN_TAC
#    THEN DISCH_TAC
#    THEN IMP_RES_TAC PARITY_LEMMA
#    THEN IMP_RES_TAC UNIQUENESS_LEMMA);;
PARITY_CORRECT =
|- !in out. PARITY_IMP(in,out) ==> (!t. out t = PARITY t in)

#close_theory();;
() : void

This completes the proof of the parity checking device.

9 Conclusions

The example in the preceding section, though simple, is representative of
larger scale proofs using the HOL system. For descriptions of such proofs see
Avra Cohn's paper on the verification of the major state machine of the Viper
microprocessor [3], Jeff Joyce's paper on the verification of a simple
microcoded computer [11], John Herbert's proof of the ECL chip of the
Cambridge Fast Ring [10] and Tom Melham's proof of a simple local area
network [13]. All of these examples show that HOL can be used to prove
non-trivial digital systems correct. Current research at Cambridge is looking
at new ways of modelling hardware in higher-order logic (e.g. the
representation of low-level circuit behaviour); ways of expressing and
relating behavioural abstractions; and methods of executing formal
specifications (for animation and simulation). In addition, we are trying to
make the HOL technology available to the industrial community. This involves
improving (and possibly reimplementing) the research prototype system to make
it acceptably efficient and rugged. It also involves preparing documentation
and tutorial material.


Acknowledgements

Members of the Hardware Verification Group at Cambridge have contributed to
the development of HOL by using the system and suggesting improvements to it.
Special thanks to Avra Cohn, Inder Dhingra, Roger Hale, John Herbert, Jeff
Joyce and Tom Melham. Avra Cohn and Larry Paulson pointed out errors and
infelicities in an earlier version of this paper.

References

[1] A. J. Camilleri, T. F. Melham and M. J. C. Gordon, Hardware Verification
Using Higher-Order Logic, University of Cambridge Computer Laboratory,
Technical Report No. 91, 1986.

[2] A. Church, A Formulation of the Simple Theory of Types, Journal of
Symbolic Logic 5, 1940.

[3] A. J. Cohn, A Proof of Correctness of the Viper Microprocessor: The First
Level. In: VLSI Specification, Verification and Synthesis, edited by
G. Birtwistle and P. A. Subrahmanyam (this volume).

[4] G. Cousineau, G. Huet and L. Paulson, The ML Handbook, INRIA, 1986.

[5] M. Gordon, R. Milner, L. Morris, M. Newey and C. Wadsworth, A
Metalanguage for Interactive Proof in LCF, Fifth ACM SIGACT-SIGPLAN
Conference on Principles of Programming Languages, Tucson, Arizona, 1978.

[6] M. Gordon, R. Milner and C. P. Wadsworth, Edinburgh LCF: A Mechanised
Logic of Computation, Lecture Notes in Computer Science, Springer-Verlag,
1979.

[7] M. Gordon, HOL: A Machine Oriented Formulation of Higher-Order Logic,
University of Cambridge Computer Laboratory, Technical Report No. 68, 1985.

[8] M. Gordon, Why Higher-Order Logic is a Good Formalism for Specifying and
Verifying Hardware. In: Formal Aspects of VLSI Design, edited by G. Milne and
P. A. Subrahmanyam, North-Holland, 1986.

[9] F. K. Hanna and N. Daeche, Specification and Verification Using
Higher-Order Logic. In: Formal Aspects of VLSI Design, edited by G. Milne and
P. A. Subrahmanyam, North-Holland, 1986.

[10] J. Herbert, Ph.D. Thesis, University of Cambridge, to appear 1987.

[11] J. J. Joyce, Verification and Implementation of a Microprocessor. In:
VLSI Specification, Verification and Synthesis, edited by G. Birtwistle and
P. A. Subrahmanyam (this volume).

[12] A. Leisenring, Mathematical Logic and Hilbert's ε-Symbol, Macdonald &
Co. Ltd., London, 1969.


[13] T. Melham, Ph.D. Thesis, University of Cambridge, to appear.

[14] R. Milner, Implementation and Application of Scott's Logic for
Computable Functions, Proceedings of the ACM Conference on Proving Assertions
about Programs, SIGPLAN Notices 7(1), 1972.

[15] R. Milner, A Theory of Type Polymorphism in Programming, Journal of
Computer and System Sciences 17, 1978.

[16] R. Milner, A Proposal for Standard ML, Proceedings of the 1984 ACM
Symposium on LISP and Functional Programming, Austin, Texas, 1984.

[17] L. Paulson, A Higher-Order Implementation of Rewriting, Science of
Computer Programming 3, 119-149, 1983.

[18] L. Paulson, Logic and Computation, Cambridge University Press, to
appear, 1987.

4 Formal Verification and Implementation of a Microprocessor

Jeffrey J. Joyce
University of Cambridge Computer Laboratory
Corn Exchange Street
Cambridge CB2 3QG
England

Introduction

Mike Gordon has described the specification and verification of a microcoded
computer using the LCF_LSM hardware verification system [8]. We have
subsequently redone this example in higher-order logic using the HOL system
[10]. In this paper we present the specification of Gordon's computer in
higher-order logic and a brief explanation of its formal verification. A more
detailed discussion of the formal verification may be found in [16]. We also
describe several related examples of hardware verification based on Gordon's
computer and other microprocessor designs. Finally, we report experience in
using a formal specification to implement Gordon's computer as a
5,000-transistor CMOS microchip.

1 Specification-Driven Design

Previous discussions of this example in [8] and [16] have presented
"finished" specifications of the computer. However, formal specification can
also play an active role in the design process by precisely capturing
informal descriptions and motivating cleanly derived views of abstract
behaviour. In this paper, we describe the specification-driven design of
Gordon's computer, showing how an informal description can be refined into a
formal specification written in the HOL language.

Basic Operation

The principal function of the computer will be to execute programs. However,
we will also need a way to load programs into the memory of the computer as
well as a means of starting and interrupting the execution of programs. This
suggests a design with two modes of operation: an 'idle' mode for loading a
new program into the memory and a 'run' mode for program execution.



Figure 1: Finite State Machine Behaviour

To make the computer change from idle mode to run mode and vice versa, we
will include a push button on the front panel of the computer. A transition
from idle mode to run mode starts the execution of a program, while a
transition from run mode to idle mode corresponds to the interruption or
termination of an executing program. The state graph in Figure 1 informally
captures this first approximation to the basic operation of the computer.
This behaviour can also be described in higher-order logic by the following
HOL specification.

|- ABSTRACT_MACHINE (button_abs,idle_abs) =
   !t:time. idle_abs(t+1) =
    (idle_abs t -> (button_abs t -> F | T) |
                   (button_abs t -> T | F))
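Read as a transition function, the definition says that a button push simply
toggles between the two modes. A throwaway Python paraphrase (ours, not part
of the HOL text) makes the four cases explicit:

```python
def next_idle(idle, button):
    # idle_abs(t+1) as defined above: in idle mode a button push switches
    # to run mode (F); in run mode a button push switches back to idle (T).
    return (not button) if idle else button

assert next_idle(True, True) is False    # idle, button pushed -> run
assert next_idle(True, False) is True    # idle, no push -> stay idle
assert next_idle(False, True) is True    # running, button pushed -> idle
assert next_idle(False, False) is False  # running, no push -> keep running
```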

The above HOL specification defines a predicate ABSTRACT_MACHINE which
describes the basic operation of the computer. As the design of the computer
evolves we will revise this definition until we obtain the final behavioural
specification of the computer. As it appears above, the predicate
ABSTRACT_MACHINE constrains the behaviour of button_abs and idle_abs, which
are boolean signals. These signals are modelled as functions from discrete
time, isomorphic with the natural numbers, to the boolean values T and F.
Hence, idle_abs(t+1) is the boolean value of the idle_abs signal at time t+1.
The idle_abs signal indicates whether the computer is in idle mode or run
mode at any time. In addition to booleans, a signal may be a mapping to
bi-state and tri-state words or even to an entire memory state. '!', '?' and
'@' are ASCII-printable characters which denote universal and existential
quantification and Hilbert's ε-operator in the HOL system. A conditional
expression has the form "A -> B | C". Given this explanation, it is easy to
see that the above HOL specification formally captures the behaviour
represented by the state diagram in Figure 1.

We must now consider how a program is to be loaded into memory while the
computer is in idle mode. First, we need a set of sixteen switches so that
the user can input memory addresses and data values. We will also need a pair
of registers to buffer switch inputs. These are the 13-bit program counter PC
and the 16-bit accumulator ACC. To store a 16-bit value into a memory
location, a 13-bit address is read from the switches into the PC register and
a 16-bit data value read from the switches into the ACC register. Following
these two steps, the value of the ACC register can be stored in memory at the
location given by the PC register.

We refine our use of the button by adding a knob which selects one of four
possible operations when the button is pushed. These four possible operations
are: load PC, load ACC, store ACC at PC, and start program execution at PC.
The state graph is revised in Figure 2 to reflect these refinements. The
execution of an instruction is represented by a single transition in the
graph because we have not yet proposed an instruction set for the computer.

Figure 2: Revised Finite State Machine Behaviour

At this point our design consists of a memory with a 13-bit address space and
a 16-bit word size, a 13-bit PC register, a 16-bit ACC register, 16 switches,
a four-position knob and a push button. Collectively, these signals along
with the idle_abs signal completely describe the state of the abstract
computer at any time. The revised HOL specification not only captures the new
transitions introduced in the revised state graph but also provides precise
details about the change of state for each transition.
The HOL specification defines the next state of the computer as a function of its previous state and the current inputs.


|- ABSTRACT_MACHINE
    (button_abs,knob_abs,switches_abs,
     memory_abs,pc_abs,acc_abs,idle_abs) =
   !t:num.
    (memory_abs(t+1),pc_abs(t+1),acc_abs(t+1),idle_abs(t+1)) =
    (idle_abs t ->
      (button_abs t ->
        ((VAL2(knob_abs t) = 0) ->
          (memory_abs t, CUT16_13(switches_abs t), acc_abs t, T) |
         (VAL2(knob_abs t) = 1) ->
          (memory_abs t, pc_abs t, switches_abs t, T) |
         (VAL2(knob_abs t) = 2) ->
          (STORE13(pc_abs t)(acc_abs t)(memory_abs t),
           pc_abs t, acc_abs t, T) |
         (memory_abs t, pc_abs t, acc_abs t, F)) |
        (memory_abs t, pc_abs t, acc_abs t, T)) |
      (button_abs t ->
        (memory_abs t, pc_abs t, acc_abs t, T) |
        EXECUTE (memory_abs t, pc_abs t, acc_abs t)))

The above HOL specification describes the behaviour of the computer in terms
of a relationship between signals of the abstract computer. For example, when
the computer is in the idle state, the knob is in position 1 and the button
is pushed, the next state of the computer is given by the 4-tuple
(memory_abs t, pc_abs t, switches_abs t, T). This 4-tuple indicates that the
accumulator will be loaded with the current switch inputs while the memory
state and program counter will be unchanged and the computer will remain in
the idle state.

Our use of the suffix "_abs" to name signals in the abstract computer
emphasises the fact that we are describing these signals from an abstract
point of view with respect to time. This convention will also be useful in
distinguishing signals of the abstract computer from signals in the concrete
implementation. The definition of ABSTRACT_MACHINE implies that every
transition occurs in a single time step. However, when we consider a
microcoded implementation of the computer, we shall see that state
transitions actually require varying amounts of real time, or more precisely,
varying numbers of clock cycles. Nevertheless, it must be emphasized that
this abstract view, where a state transition occurs in a single time step, is
not inaccurate. Instead, much of our presentation is concerned with showing
that this abstract view of sequential behaviour is logically consistent with
the sequence of events in a concrete implementation of the computer.
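The idle-mode transitions just defined can be paraphrased operationally. In
the Python sketch below (an illustration of ours, not the paper's notation;
the word widths, the dict-based memory and the opaque execute callback are
our assumptions, since EXECUTE has not yet been defined at this point in the
text), the state is the 4-tuple (memory, pc, acc, idle):

```python
def abstract_step(state, button, knob, switches, execute):
    # One abstract-time transition of the revised ABSTRACT_MACHINE.
    # memory: dict of 13-bit addresses to 16-bit words.
    # execute: stand-in for the (as yet undefined) EXECUTE function.
    memory, pc, acc, idle = state
    if idle:
        if not button:
            return memory, pc, acc, True
        if knob == 0:                          # switches -> PC
            return memory, switches & 0x1FFF, acc, True
        if knob == 1:                          # switches -> ACC
            return memory, pc, switches, True
        if knob == 2:                          # ACC -> MEM(PC)
            mem = dict(memory); mem[pc] = acc
            return mem, pc, acc, True
        return memory, pc, acc, False          # knob 3: start execution
    if button:                                 # interrupt a running program
        return memory, pc, acc, True
    return execute(memory, pc, acc)            # run mode: execute instruction
```

For instance, with the knob in position 1 and the button pushed, the
accumulator receives the switch value while memory and PC are unchanged,
matching the 4-tuple discussed above.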


Instruction Set Semantics

Having designed the basic idle/run operation of the computer, we can now
proceed with the design of the instruction set. Using a 3-bit opcode and
13-bit operand address instruction format, the following instruction set is
proposed.

    000-address   HALT     Stops execution
    001-address   JMP L    Jump to address
    010-address   JZR L    Jump to address if ACC = 0
    011-address   ADD L    Add contents of address to ACC
    100-address   SUB L    Subtract contents of address from ACC
    101-address   LD L     Load contents of address into ACC
    110-address   ST L     Store contents of ACC in address
    111-address   SKIP     Skip to next instruction

We formally specify the semantics of this instruction set in terms of the
state changes which result from the execution of an instruction. EXECUTE is a
function which returns a 4-tuple consisting of the next values for the
memory, program counter, accumulator and idle/run status, given the current
state of the computer.

|- EXECUTE (memory_val,pc_val,acc_val) =
   let opcode = VAL3(OPCODE(FETCH13 memory_val pc_val)) in
   let address = CUT16_13(FETCH13 memory_val pc_val) in
   ((opcode = HALT) -> (memory_val, pc_val, acc_val, T) |
    (opcode = JMP) -> (memory_val, address, acc_val, F) |
    (opcode = JZR) ->
     ((VAL16 acc_val = 0) ->
      (memory_val, address, acc_val, F) |
      (memory_val, INC13 pc_val, acc_val, F)) |
    (opcode = ADD) ->
     (memory_val, INC13 pc_val,
      ADD16 acc_val (FETCH13 memory_val address), F) |
    (opcode = SUB) ->
     (memory_val, INC13 pc_val,
      SUB16 acc_val (FETCH13 memory_val address), F) |
    (opcode = LD) ->
     (memory_val, INC13 pc_val, FETCH13 memory_val address, F) |
    (opcode = ST) ->
     (STORE13 address acc_val memory_val, INC13 pc_val, acc_val, F) |
    (memory_val, INC13 pc_val, acc_val, F))

The above definitions of ABSTRACT_MACHINE and EXECUTE form a complete
target-level specification of the proposed computer. A functional simulation
of the abstract machine could easily be derived from this specification;
however, our interest lies in formally relating this target-level
specification to a detailed specification of a register-transfer level
implementation of the computer.
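One such functional simulation of the case analysis in EXECUTE can be
sketched in a few lines of Python (ours, purely illustrative; the bit masks
stand in for the 13- and 16-bit width operators and the dict-based memory is
an assumption):

```python
HALT, JMP, JZR, ADD, SUB, LD, ST, SKIP = range(8)

def execute(memory, pc, acc):
    # Next (memory, pc, acc, idle) 4-tuple, one case per opcode.
    instr = memory.get(pc, 0)
    opcode = (instr >> 13) & 0b111       # top 3 bits, cf. VAL3(OPCODE ...)
    address = instr & 0x1FFF             # low 13 bits, cf. CUT16_13
    inc = (pc + 1) & 0x1FFF              # cf. INC13
    if opcode == HALT:
        return memory, pc, acc, True
    if opcode == JMP:
        return memory, address, acc, False
    if opcode == JZR:
        return memory, (address if acc == 0 else inc), acc, False
    if opcode == ADD:
        return memory, inc, (acc + memory.get(address, 0)) & 0xFFFF, False
    if opcode == SUB:
        return memory, inc, (acc - memory.get(address, 0)) & 0xFFFF, False
    if opcode == LD:
        return memory, inc, memory.get(address, 0), False
    if opcode == ST:
        mem = dict(memory); mem[address] = acc
        return mem, inc, acc, False
    return memory, inc, acc, False       # SKIP
```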

2 Concrete Implementation

Many different implementations of the above computer are possible. For
example, Daniel Weise has described a "RISC" implementation of the computer
where every state transition occurs in exactly one clock cycle [18]. However,
we follow Gordon's original presentation of this example by implementing the
computer with a microcoded control unit.

The register-transfer level implementation of the computer is shown in
Figure 3. In addition to the PC and ACC registers which are a part of the
abstract computer, several other internal registers have been added. The
registers, the memory, an arithmetic-logic unit and the external switches are
connected to a 16-bit data bus through tri-state buffers. The control unit
consists of a 32-word by 30-bit microcode ROM, decoding logic and a 5-bit
microcode program counter. The microcode is written to realize the abstract
behaviour of the computer using this architecture. There are a total of
fifteen possible transitions in the revised state graph of Figure 2 when the
single transition for the execution of an instruction is expanded into eight
different transitions. Each of the fifteen transitions is implemented by a
sequence of microinstructions whose net effect is given by the definitions of
ABSTRACT_MACHINE and EXECUTE. Furthermore, each sequence will start and
finish at one of two locations in the microcode depending on the start and
finish states, i.e. 'idle' or 'run', of the corresponding transition. In the
final version of the microcode shown in Figure 4, the 'idle' and 'run' states
correspond to locations 0 and 5.

The register-transfer level implementation of the computer can be formally
described in the HOL language using hierarchical specification [4]. This
method of hardware specification within higher-order logic is based on the
composition of smaller circuits and hiding internal signals. Circuit
composition and internal signal hiding are modelled by logical conjunction
and existential quantification. The use of existential quantification to hide
internal signals is explained in [17]. Behavioural specification in HOL
involves the use of logical constants, e.g. ADD16, which are informally
understood to name certain operations.1 Unlike the specification of a
simulation model, it is not always necessary to provide details about these
operations. In other cases some properties of the operation may be stated or
the operation may even require a full definition. Whereas a simulation model
must be completely operational, it is possible to formally

1. The distinction between a formal theory and its interpretation is
discussed in [11].


[Schematic: the data path (switches, memory, PC, ACC, MAR, IR, ARG and BUF
registers and the ALU on a common data bus) and the microprogrammed control
unit (mpc and microcode ROM), with the knob, button, ready and idle signals.]

Figure 3: Register-Transfer Level Implementation of Gordon's Computer


Location   Control Signals   Next Location        Explanation

    0      ready idle        test_button (1,0)    begin idling cycle
    1                        jump_knob 1          decode knob position
    2      rsw wpc           jump 0               switches -> PC
    3      rsw wacc          jump 0               switches -> ACC
    4      rpc wmar          jump 7               PC -> MAR
    5      ready             test_button (0,6)    begin instruction execution
    6      rpc wmar          jump 8               PC -> MAR
    7      racc write        jump 0               ACC -> MEM(MAR)
    8      read wir          jump 9               MEM(MAR) -> IR
    9                        jump_opcode 10       decode instruction
   10                        jump 0               HALT
   11      rir wpc           jump 5               JMP, IR -> PC
   12                        test_acc (11,17)     JZR
   13      racc warg         jump 19              ADD, ACC -> ARG
   14      racc warg         jump 22              SUB, ACC -> ARG
   15      rir wmar          jump 24              LD, IR -> MAR
   16      rir wmar          jump 25              ST, IR -> MAR
   17      rpc inc           jump 18              SKIP, PC + 1 -> BUF
   18      rbuf wpc          jump 5               BUF -> PC
   19      rir wmar          jump 20              IR -> MAR
   20      read add          jump 21              ARG + MEM(MAR) -> BUF
   21      rbuf wacc         jump 17              BUF -> ACC
   22      rir wmar          jump 23              IR -> MAR
   23      read sub          jump 21              ARG - MEM(MAR) -> BUF
   24      read wacc         jump 17              MEM(MAR) -> ACC
   25      racc write        jump 17              ACC -> MEM(MAR)
   26                        jump 0               unused
   27                        jump 0               unused
   28                        jump 0               unused
   29                        jump 0               unused
   30                        jump 0               unused
   31                        jump 0               unused

Figure 4: Microcode Program Pseudo-Code
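The Next Location column is the only part of a microinstruction that affects
sequencing. Its decoding can be sketched in Python (the field encoding below
is our own illustrative assumption, not the paper's; the jump_knob dispatch
targets are read off the Explanation column of Figure 4):

```python
def next_mpc(microcode, mpc, button, acc, opcode, knob):
    # microcode maps a location to its (kind, arg) sequencing field.
    kind, arg = microcode[mpc]
    if kind == "jump":                    # unconditional jump
        return arg
    if kind == "test_button":             # (target if pushed, otherwise)
        return arg[0] if button else arg[1]
    if kind == "test_acc":                # (target if ACC = 0, otherwise)
        return arg[0] if acc == 0 else arg[1]
    if kind == "jump_opcode":             # 8-way dispatch on 3-bit opcode,
        return arg + opcode               # locations 10..17 in Figure 4
    # jump_knob: 4-way dispatch to the knob handlers (locations 2..5)
    return [2, 3, 4, 5][knob]
```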


reason about hardware from an abstract point of view where certain operational details are suppressed. At the leaf level, primitive components are specified in terms of their sequential behaviour. For example, the behavioural specification of a 16-bit register is shown below.

|- REG16 (input,load,output) =
   !t:time. output(t+1) = ((load t) -> (input t) | (output t))

The intermediate level of specification consists of structural specifications
for the data path and the control unit in terms of primitive components. The
top-level specification defines the structure of the entire computer as the
composition of the data path and the control unit. Existential quantification
is used to hide the control signals connecting the control unit to the data
path. For readability, CONCRETE_MACHINE is defined as a predicate on signals
grouped into target-level signals, internal state signals, and the ready
signal. Following the terminology used by Avra Cohn in the formal
specification of the VIPER microprocessor [5], the internal state signals,
mpc, mar, ir, arg and buf, are collectively known as the 'transient'. As we
shall see, the ready signal has a fundamental role in relating the behaviour
imposed by the definition of CONCRETE_MACHINE to the abstract behaviour of
the computer.

|- CONCRETE_MACHINE (button,knob,switches,memory,pc,acc,idle)
                    (mpc,mar,ir,arg,buf) ready =
   ?rsw wmar memcntl wpc rpc wacc racc wir rir warg alucntl rbuf.
     CONTROL_UNIT MICROCODE
       (button,knob,acc,idle,mpc,ir,ready,rsw,wmar,memcntl,
        wpc,rpc,wacc,racc,wir,rir,warg,alucntl,rbuf) /\
     DATA_PATH
       (switches,memory,pc,acc,mar,ir,arg,buf,rsw,wmar,memcntl,
        wpc,rpc,wacc,racc,wir,rir,warg,alucntl,rbuf)

The complete specification of the implementation is given in an appendix to this paper and a detailed explanation may be found in [16]. This specification results in a predicate, CONCRETE_MACHINE, which defines a relationship on a set of signals from one clock instant to the next. This behaviour is imposed by the structure of the implementation and the behaviour of primitive components. To summarize our progress so far, we have specified both the abstract and concrete behaviours for the computer. ABSTRACT_MACHINE defines the sequential behaviour of an abstract computer in terms of state transitions. On the


other hand, CONCRETE_MACHINE defines the sequential behaviour of a concrete implementation in terms of clocked sequential components. A central feature of these formal specifications is that the abstract computer and its concrete implementation operate at different time scales. To prove that the computer has been implemented correctly, we need to show that the abstract behaviour of the computer is logically consistent with its concrete implementation. A fundamental step in the formal verification of the computer will be to establish a mathematical relationship between the abstract time scale and the concrete time scale. The concrete behaviour of the computer consists of executing sequences of microinstructions whose net effects correspond to state transitions in the abstract computer. These sequences are represented in Figure 5, which shows the state graph for the concrete machine. Location 0 in the microcode corresponds to the start of a sequence of microinstructions implementing a transition which begins in the 'idle' state and finishes in either the 'idle' or 'run' state. Similarly, location 5 corresponds to the start of a transition from the 'run' state. Therefore, when the microinstruction program counter reaches either location 0 or location 5, the state of the concrete implementation must be consistent with the state of the abstract computer. The ready signal was specifically included in the implementation to signal when the microinstruction program counter reaches location 0 or location 5. This explains the significance of the ready signal in relating the behaviour of the concrete implementation to the behaviour of the abstract computer: the ready signal of the concrete implementation is the implicit clock signal of the abstract computer. To formally describe the relationship between the time scale of the abstract computer and the concrete time scale, we need to define a function which computes the concrete time of the nth point on the abstract time scale.
More precisely, this function will compute the concrete time of the nth occurrence of the ready signal. In these proceedings, Tom Melham defines a function, TimeOf, which constructs a temporal abstraction function from a signal such as ready [17]. (TimeOf ready) is a function which maps abstract time to points on the concrete time scale. For completeness, the HOL definitions of TimeOf and the relation IsTimeOf are shown below; however, the reader is referred to [17] for an explanation of these definitions and related theorems.


|- IsTimeOf p 0 t = p t /\ !t'. t' < t ==> ~(p t')

|- IsTimeOf p (n+1) t =
     p t /\ ?t'. t' < t /\ IsTimeOf p n t' /\
                 !t''. (t' < t'' /\ t'' < t) ==> ~(p t'')

|- TimeOf p n = @t. IsTimeOf p n t
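Operationally, (TimeOf ready) enumerates the concrete times at which ready is true. A small Python rendering (ours; the HOL definition above is the authoritative one) makes the idea concrete:

```python
def time_of(signal, n):
    """Concrete time of the nth occurrence (counting from 0) of a boolean signal."""
    count, t = -1, 0
    while True:
        if signal(t):
            count += 1
            if count == n:
                return t
        t += 1

# A toy 'ready' signal, true at concrete times 0, 4, 7, 10, 13, ...
ready = lambda t: t in (0, 4, 7) or (t > 7 and (t - 7) % 3 == 0)
print([time_of(ready, n) for n in range(4)])   # → [0, 4, 7, 10]
```

Like the HOL choice operator, this search is only well-defined when the signal is true infinitely often — exactly the property the proof establishes for ready.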


Figure 5: State Graph for the Concrete Implementation



Using function composition and a temporal abstraction function constructed from the ready signal by TimeOf, signals of the abstract computer can be formally related to the corresponding signals in the concrete implementation. A formal relationship between abstract and concrete signals allows the correctness of the implementation with respect to the abstract computer to be stated by the following theorem. The correctness of the implementation depends on assumptions about the stability of external inputs, in particular, the signals knob and switches. The predicate STABLE is used to formally express these assumptions.

|- STABLE (t1,t2) s =
     !t. ((t1 < t) /\ (t < t2)) ==> ((s t) = (s t1))

|- CONCRETE_MACHINE (button,knob,switches,memory,pc,acc,idle)
                    (mpc,mar,ir,arg,buf) ready /\
   (button_abs = button o (TimeOf ready)) /\
   (knob_abs = knob o (TimeOf ready)) /\
   (switches_abs = switches o (TimeOf ready)) /\
   (memory_abs = memory o (TimeOf ready)) /\
   (pc_abs = pc o (TimeOf ready)) /\
   (acc_abs = acc o (TimeOf ready)) /\
   (idle_abs = idle o (TimeOf ready)) /\
   (!t. STABLE (TimeOf ready t, TimeOf ready (t+1)) knob) /\
   (!t. STABLE (TimeOf ready t, TimeOf ready (t+1)) switches)
   ==>
   ABSTRACT_MACHINE (button_abs,knob_abs,switches_abs,
                     memory_abs,pc_abs,acc_abs,idle_abs)
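A small Python sketch (names and toy signals are ours) of the two ingredients of this statement — abstraction of a signal by composition with (TimeOf ready), and the STABLE side condition:

```python
def stable(signal, t1, t2):
    # STABLE (t1,t2) s: s holds its value at t1 throughout the open interval (t1,t2)
    return all(signal(t) == signal(t1) for t in range(t1 + 1, t2))

def abstract(signal, time_of_ready):
    # e.g. knob_abs = knob o (TimeOf ready): sample the concrete signal
    # at the concrete times corresponding to abstract clock ticks
    return lambda n: signal(time_of_ready(n))

ticks = [0, 4, 7, 10]                    # toy concrete times of 'ready'
time_of_ready = lambda n: ticks[n]
knob = lambda t: t // 4                  # a toy concrete knob signal
knob_abs = abstract(knob, time_of_ready)
print([knob_abs(n) for n in range(4)])   # → [0, 1, 1, 2]
print(stable(knob, 0, 4))                # knob unchanged between first two ticks → True
```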

3  Formal Verification

A detailed description of the formal verification of Gordon's computer may be found in [16]. In this paper, we simply sketch an outline of the proof. To prove the correctness statement given at the end of the previous section, we need to show that the values of the signals button_abs, knob_abs, switches_abs, memory_abs, pc_abs, acc_abs and idle_abs are related from one point in time to the next according to the definition of ABSTRACT_MACHINE. The first step is to rewrite the hierarchical specification of the concrete implementation into a set of output equations for each of the externally visible state


signals: memory, pc, acc, idle, mpc, mar, ir, arg, buf and ready. This step, called "expansion", is performed automatically by a derived rule of inference provided by the HOL system. For example, expansion results in the following equation for the output of the accumulator showing that ACC is controlled by bit 22 of the microinstruction word.²

|- !t. acc (t+1) =
     ((CNTL_BIT 22 (FETCH5 MICROCODE (mpc t))) => (bus t) | (acc t))
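Read operationally (and with the caveat that the bit-numbering convention below is our assumption), the equation says that ACC loads from the bus exactly when control bit 22 of the fetched microinstruction is set:

```python
def acc_next(acc_t, bus_t, microcode, mpc_t):
    """One tick of the expanded output equation for ACC."""
    ctrl_word = microcode[mpc_t]        # FETCH5 MICROCODE (mpc t)
    wacc = (ctrl_word >> 22) & 1        # CNTL_BIT 22 (bit position assumed)
    return bus_t if wacc else acc_t

rom = {0: 0, 1: 1 << 22}               # toy two-entry microcode ROM
print(acc_next(5, 9, rom, 0))          # bit 22 clear: ACC holds → 5
print(acc_next(5, 9, rom, 1))          # bit 22 set: ACC loads the bus → 9
```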

This set of output equations can be used to derive theorems about future values of signals. In effect, the register-transfer implementation can be symbolically simulated through a sequence of states using case analysis on the initial state and current inputs to examine specific sequences of microinstructions. The crux of the proof is to show that for every possible state transition at the target level, the net effect of the corresponding sequence of microinstructions preserves the effect of the target level state transition. For example, if the computer is at the beginning of the run cycle and about to execute an ADD instruction, then assuming the interrupt button is not pushed, the following theorem can be proved about the state of the computer at the end of the cycle with respect to its initial state at time t. Note that the end of the cycle occurs ten time units later.

... |- (memory (t+10), pc (t+10), acc (t+10), idle (t+10)) =
       (memory t, INC13 (pc t), ADD16 (acc t) OP, F)

       where OP = FETCH13 (memory t) (CUT16_13 (FETCH13 (memory t) (pc t)))

Another important step in the proof is to show that the ready signal is true infinitely often. This is a consequence of the fact that from any location in the microcode, control will eventually return to either location 0 or location 5, and the fact that the implicit clocking of register-transfer level primitives is assumed to be infinite. In practical terms, this implies that the computer will reset properly from a random location in the microcode on power-up. As a step in the formal proof, showing that the signal ready is infinite results in several useful theorems described in [17]. Using these theorems, results from symbolic simulation of the concrete implementation can be translated into statements about the behaviour of the implementation with respect to the abstract time scale. In this manner, we can obtain, for example, the following theorem about the abstract computer when an ADD instruction is executed.

² Complete expansion would actually expand the terms "CNTL_BIT" and "bus".


|- (memory_abs (t+1), pc_abs (t+1), acc_abs (t+1), idle_abs (t+1)) =
   (memory_abs t, INC13 (pc_abs t), ADD16 (acc_abs t) OP, F)

   where OP = FETCH13 (memory_abs t) (CUT16_13 (FETCH13 (memory_abs t) (pc_abs t)))

This strategy of case analysis and symbolic simulation is straightforward in the case of Gordon's computer because there are exactly fifteen different possible sequences of microinstructions between adjacent points of time on the abstract time scale. A more realistic design would involve more than just fifteen unique sequences of microinstructions. For instance, Warren Hunt's FM8501 processor has a 'wait state' where control loops until an asynchronous memory process responds to a request. In this case, analysis of a microinstruction sequence involving a memory request would require a proof by induction on the length of the wait state. The verification of the VIPER microprocessor [5] employs a similar strategy of case analysis and symbolic simulation to prove that event sequences at a finer grain of time correctly implement major state transitions in the target machine. In the next section we describe several other related examples of hardware verification beginning with Gordon's LCF-LSM version of the computer proof.

4  Related Examples of Hardware Verification

LCF-LSM

The LCF-LSM version of Gordon's computer example uses special terms to denote sequential behaviour. These terms consist of output and next state equations in terms of inputs and current state. A sequential behaviour for the concrete implementation is obtained from primitive sequential behaviours using the LCF-LSM analogues of circuit composition and internal signal hiding [7]. The target level behaviour is also specified as a sequential device using LCF-LSM device behaviour terms. As in the case of the HOL specification, the target level device and the concrete implementation operate at different but related time scales. A special LCF-LSM term, 'until c do t', is a temporal abstraction mechanism used to relate the sequential behaviour of the concrete implementation to the abstract behaviour of the target level computer. The semantics of this type of term are stated in [7]. Informally, an 'until c do t' term sequences a behaviour denoted by 't' until the condition 'c' becomes true. In the computer example, this term is used to step the concrete implementation through a sequence of microinstructions implementing a target level operation. This allows correctness to be stated as follows in the LCF-LSM language.


|- ABSTRACT_MACHINE (memory_state,pc_state,acc_state,idle_state) =
     until ready do
       CONCRETE_MACHINE
         (WORD5 (idle_state -> 0 | 5), memory_state, pc_state, acc_state,
          mar_state, ir_state, arg_state, buf_state)

The LCF-LSM correctness statement for the computer proof restricts the initial state of the microinstruction program counter to the start of either the idle or run cycles depending on the boolean variable idle_state. In effect, this statement of correctness only shows that the major state transitions of the abstract computer are correctly implemented by the corresponding microinstruction sequences of the concrete machine. This falls short of a statement of 'total' correctness for the computer implementation. For example, the LCF-LSM correctness statement ignores the problem of resetting the computer. The original specification of the computer implementation in [8] left the unused part of the microcode ROM (locations 26 to 31) unspecified, making it easy to construct a case where the computer fails to power up correctly, e.g. it could loop forever in the unused part of the microcode. Another shortcoming is that the need for stable switch and knob inputs is not made apparent by the LCF-LSM proof. The UNTIL rule implicitly introduces an assumption that all inputs remain stable between major state transitions whether or not this matters, while these assumptions must be stated explicitly in the HOL correctness statement.
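The sequencing performed by 'until c do t' can be pictured operationally with a small Python sketch (ours, not LCF-LSM's semantics): iterate the concrete step function until the condition signal becomes true, which yields one abstract-level transition.

```python
def until(cond, step, state, limit=100):
    """Run 'step' at least once, then keep stepping until 'cond' holds."""
    state = step(state)
    n = 1
    while not cond(state):
        state = step(state)
        n += 1
        assert n < limit, "condition never became true"
    return state

# Toy concrete machine: a counter whose 'ready' flag rises every 4 ticks.
step = lambda s: {'t': s['t'] + 1, 'ready': (s['t'] + 1) % 4 == 0}
print(until(lambda s: s['ready'], step, {'t': 0, 'ready': True}))
# → {'t': 4, 'ready': True}
```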

VERIFY

Harry Barrow's VERIFY system closely resembles the LCF-LSM style of hardware specification and has been used to verify Gordon's computer [1]. This includes a temporal abstraction mechanism which is similar to the 'until c do t' terms of LCF-LSM. The VERIFY system, implemented in Prolog, has been very successful in automating the verification of non-trivial designs using enumeration and symbolic manipulation strategies to prove the equivalence of an implementation and its desired behaviour. Automation is partially achieved by requiring the user to state behaviours (i.e. sub-goals) at most intermediate levels in a hierarchical design. Nevertheless, the verification of Gordon's computer required considerably less human intervention than the LCF-LSM version.

Silica Pithecus

Daniel Weise uses Lisp to describe the formal specification of a computer based on Gordon's example [18]. However, Weise has replaced the microcoded implementation in Gordon's example with an implementation that executes a target


level instruction every clock cycle (e.g. the program counter has its own incrementation circuit). This modification simplifies the proof of correctness at the register-transfer level; however, Weise's primary interest lies in establishing the correctness of the register-transfer implementation with respect to lower level circuit models. Weise does not discuss the target level specification of his computer or the formal verification of the register-transfer specification with respect to a target level specification.

Boyer-Moore

Using the Boyer-Moore theorem prover, Warren Hunt has formally verified the FM8501 microprocessor, which is similar in complexity to a PDP-11 [14]. Hunt also specifies behaviour using an approach similar to LCF-LSM; clocked sequential devices are modelled as recursive functions where each recursive call corresponds to a clock cycle. The significant difference between the FM8501 and Gordon's computer is that memory is modelled more realistically as an independent asynchronous process. In higher-order logic, an asynchronous memory process could be straightforwardly modelled using existential quantification. In a quantifier-free logic such as Boyer-Moore logic, free variables correspond to universally quantified variables while functions are used in place of existential quantification. The absence of existential quantification in Boyer-Moore logic motivates the use of an oracle to predict asynchronous responses from the memory process. Here, an oracle is a function from time to asynchronous events. In a higher-order logic version of the FM8501 example, induction would be used to prove that microcode 'wait states' terminate upon response from the memory process. In addition to verifying the register-transfer level behaviour with respect to the target level specification, Hunt establishes the correctness of gate level implementations for register-transfer level devices such as the arithmetic-logic unit.
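The oracle idea can be sketched in a few lines of Python (our illustration, not Hunt's Boyer-Moore text): the oracle is just a function from time to response events, and a microcode wait state spins until it fires.

```python
def make_oracle(response_times):
    """An oracle: a function from time to asynchronous memory-response events."""
    return lambda t: t in response_times

def wait_for_memory(oracle, t0, limit=1000):
    """Spin in a microcode 'wait state' until the oracle signals a response."""
    t = t0
    while not oracle(t):
        t += 1
        assert t - t0 < limit, "memory never responded"
    return t                            # concrete time at which the wait state exits

oracle = make_oracle({3, 11, 20})
print(wait_for_memory(oracle, 5))       # first response at or after t=5 → 11
```

An induction on the wait-state length corresponds here to the bound that makes the loop terminate whenever the oracle eventually fires.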

5  Specification-Driven Implementation

An 8-bit version of Gordon's computer was implemented as a 5,000 transistor CMOS microchip in December 1985 while the author was a visiting student at Xerox PARC. The purpose of this exercise was to study the role of formal specification in the implementation of a design. The design was small enough to be completed in two months but was sufficiently varied to be representative of many aspects of VLSI design. The fabricated chip was tested in April 1986 at the University of Calgary and found to be fully functional. Besides halving the data path to an 8-bit version, the most significant difference between the formally specified design and the actual implementation was the use of a two-phase non-overlapping clock. We are currently


studying how to prove that the clocked sequential behaviour of the register-transfer level specification is an abstract view of the two-phase non-overlapping clocking scheme. In addition to the formally specified control unit and data path, a test structure and a small static RAM memory were included in the implementation. The layout of the "TAMARACK" microprocessor, the name given to the implementation, is shown in Figure 6. Four separate subsystems are easily recognised in the layout. From top to bottom, these are: control unit, test structure, data path and 32-word by 8-bit RAM memory. Several other features are also easily identified, such as the control signal wires and the microcode ROM at the top right-hand corner of the layout. An important decision in the design of the TAMARACK microprocessor was to use ratioless, complementary, restored, static CMOS logic wherever possible. A very simple layout style was also used, even where this resulted in less than optimal space and time utilisation.³ Our desire for correct functionality over marginally better performance justified our use of simple circuitry and simple layout. Another motivation for simple circuitry was to see how much of a varied system could be designed using a simplified view of circuit behaviour. We found that only a very few parts of the design warranted more complex circuitry. The microcode ROM was implemented as a pseudo-NMOS NOR array and a 6-transistor XOR gate was used to implement the full-adder in the arithmetic-logic unit. Both of these departures resulted in considerable savings of circuit area using well-understood variations of CMOS logic [19]. However, a simple model of circuit behaviour could not be used to design the RAM memory. In this case, a detailed electrical model of the RAM memory was developed. The electrical model was not a structural representation of the RAM memory but rather a model based on analysis of worst-case behaviour.
In many respects the usefulness of our formal specification of Gordon's computer was less than expected. The register-transfer level specification did not significantly influence circuit design and layout at the transistor level. Furthermore, there was no way to machine-check the consistency of the layout with the formal specification. A slight mis-ordering of control signals in the layout was only discovered by visual inspection shortly before the design was submitted for fabrication. This underscores the need for a tighter integration of hardware verification systems with conventional VLSI CAD tools. One of the most interesting discoveries of this implementation exercise was that the formal verification of the design missed a design error: there is no reset button to initialise the microinstruction program counter (this is different from the function of the interrupt button). The HOL version of the computer proof improves upon the original LCF-LSM version by ensuring that for any

³ The same principles were used in the implementation of the much larger SECD chip, University of Calgary, 1986. The SECD chip is a 20,000 transistor processor which executes a purely functional language called "Lispkit" [12].



Figure 6: Microphotograph of the TAMARACK microprocessor


initial state of the microinstruction program counter, even an address into the 'unused' part of the microcode, the computer will eventually reach the start of either the 'idle' or 'run' cycles, i.e. locations 0 or 5. However, our formal specification implied that the microinstruction program counter would assume a proper state on power-up and not some undefined value. After the design was submitted for fabrication, and long after its formal verification, we decided to simulate the design using a switch-level simulator involving a more accurate model of a signal value [3]. The design failed to simulate properly because the initial state of the mpc signal was undefined and there was no way to force the signal to a defined state. Fortunately, when the actual chips were returned and tested we found that the mpc signal tended to address 0 due to electrical factors not modelled at the switch-level, and the chip worked correctly. Why did formal verification fail to discover this design error? Clearly, the actual verification process, i.e. formal deduction, was correct. The source of the problem was the incorrect assumption that the mpc signal was bi-stable. This assumption was introduced by our behavioural specification of the MPC as a primitive component. Bi-stability is an abstract view of more complex forms of data in digital circuits, e.g. tri-state data, voltage values, etc. This abstraction is valid only under certain conditions. For instance, we could prove that the output of a register such as the MPC is bi-stable provided that the input to the register is bi-stable. In the formal specification of the computer we simply assumed that the output of the MPC was bi-stable. If instead we had been required to establish that the output of the MPC was bi-stable by showing that the input to the MPC was bi-stable (following a reset), then the need for a reset button would inevitably have been discovered.
More generally, verification failed here because our register-transfer level primitives were not adequate models for reliable verification. Behaviours at the register-transfer level should be derived from some lower level model such as a switch-level model. [15] describes the verification of a CMOS full-adder using a data abstraction from a 4-value logic (low, high, float and error) up to bi-stable values. Behavioural models of register-transfer level components derived from a lower level model would clearly state the conditions under which the behaviour is valid. A library of register-transfer component behaviours and their respective validity conditions would then serve as a basis for a hardware verification methodology for VLSI design.
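The shape of such a data abstraction can be sketched as follows (a Python illustration of ours; [15] works in HOL): the abstraction from a 4-value signal to bi-stable values is partial, and its validity condition is exactly that the signal is driven to low or high.

```python
LO, HI, FLOAT, ERROR = 'lo', 'hi', 'float', 'error'   # 4-value signal model

def bistable(v):
    """Data abstraction: a partial function, undefined off {LO, HI}."""
    if v == LO:
        return False
    if v == HI:
        return True
    raise ValueError("abstraction invalid: signal is " + v)

def abstraction_valid(samples):
    # The bi-stable view of a signal is justified only when every sample is
    # actually driven (cf. the uninitialised mpc signal on power-up).
    return all(v in (LO, HI) for v in samples)

print(bistable(HI))                     # → True
print(abstraction_valid([LO, HI, HI]))  # → True
print(abstraction_valid([LO, FLOAT]))   # undriven sample → False
```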

6  Summary and Future Research

The verification procedure requires approximately six hours of running time on a lightly loaded, 14 megabyte VAX 11/780. The proof required about two man-months of effort which included learning how to specify and verify hardware in


higher-order logic and how to use the HOL proof assistant system. The main overhead of producing the initial specification and writing the ML programs to do the necessary inferences need only be done once. Thereafter, the finished proof could be easily modified to accommodate small changes in the hardware. The correctness of incremental changes to a design can be verified by editing the specification and verification procedure, usually in a very minor way, and re-running the proof as a batch job. This contrasts with the simulation approach, which may require less initial effort but where every modification requires the complete re-run and debugging of the entire simulation test suite. An experienced designer at Xerox PARC wisely observed that most 'bugs' in a VLSI design are due to last-minute modifications when time constraints do not permit complete re-simulation. Therefore, the initial effort to produce a formal verification may be rewarded in the long term.

Fundamental Circuit Models

Future research in this area will likely focus on better circuit models [20]. Our implementation of Gordon's computer suggests that it is possible to design non-trivial hardware with a simple model of circuit behaviour, especially when functionality is more important than performance, e.g. safety-critical systems. Therefore, a fundamental circuit model for the purposes of formally reasoning about circuit behaviour is a reasonable prospect.

Computer-Assisted Theorem Proving

The formal verification of non-trivial designs clearly requires computer assistance. Some of the theorems in the HOL version of Gordon's example were several pages in length but could be manipulated with ease using the HOL system. Nevertheless, the HOL system required considerable human intervention to guide the verification process. Other hardware verification systems such as VERIFY have largely automated the verification process, but there will always be designs for which automatic verification fails. Although the automation of tedious low-level inference steps is universally agreed upon, the complete automation of the verification process is not necessarily desirable. We believe that the mere generation of a theorem in a formal system is unconvincing when taken as proof that a circuit has been properly designed. Behavioural specifications of primitive components will always be approximations of physical reality. More importantly, the implications of a correctness statement may not be fully understood. Rather, it is the verification process itself and not the final result which is the more plausible guarantee of correct design. Future research in this area should focus on providing powerful tools which allow designers to reason accurately and efficiently about hardware designs.


Specification-Driven Design

Most examples of hardware verification are based on "finished" designs. Little is known about the role of formal specification and verification in the actual design process. For instance, it is unclear how the derivation of a behaviour from a structure would significantly influence the design of the structure (other than to reject an incorrect design). However, we suggest that hardware verification has a potentially useful role in clarifying the informal abstractions used by designers. The design of complex hardware is often viewed as a hierarchy of informal abstractions. For instance, Neil Weste and Kamran Eshraghian present a relatively simple working set of abstractions about the behaviour of integrated circuits, e.g. CMOS logic structures and timing strategies [19]. At a much higher level, the specification-driven design of Gordon's computer used abstraction to describe the target-level behaviour of the computer, suppressing details about its concrete implementation. Once the abstract behaviour was formally specified in HOL, a microcode program was developed according to a mathematically defined relationship between the abstract and concrete time scales. More complicated hardware designs would involve a hierarchy of abstractions about data, time and behaviour [17]. For example, various memory management strategies could be formally verified, resulting in simpler and more abstract models of memory which could then be used to consider the correctness of the system at a higher level. An abstract view of the record allocation process in a Lisp machine would be designed to suppress details about garbage collection. Establishing the validity of this abstraction would involve showing that an abstract view of the processor state is left untouched by garbage collection. Finally, we suggest that formal abstraction may serve as a metric for "good" hardware design. Abstractions are usually only valid under certain conditions.
For instance, the target-level behaviour of Gordon's computer is an abstract view of its register-transfer level behaviour only under assumptions about the stability of the knob and switch inputs. The formal verification of Gordon's computer results in a correctness statement in which these assumptions look untidy. It could be argued that a better design for the computer would have latched the knob and switch inputs at the start of the idle cycle, eliminating the need for assumptions about the stability of these inputs. Hence, a correctness statement with fewer validity conditions may be an indication of a "better" design.

Acknowledgements

I am grateful to Mike Gordon and Tom Melham for their helpful advice on this example and on using the HOL system. Xerox PARC generously fabricated the CMOS implementation of the TAMARACK microprocessor; thanks especially to Brian Preas, Rick Barth and Louis Monier. Rick Schediwy and Glenn Stone


worked on testing the fabricated chip and Graham Birtwistle has helped to clarify my presentation of this work. This research is funded by the Cambridge Commonwealth Trust, Government of Alberta Heritage Fund and the Natural Sciences and Engineering Research Council of Canada.

References

[1] Barrow, H., "VERIFY: A Program for Proving Correctness of Digital Hardware Designs", Artificial Intelligence, Vol. 24, No. 1-3, December 1984.
[2] Birtwistle, G., Joyce, J., Liblong, B., Melham, T., and Schediwy, R., "Specification and VLSI", Formal Aspects of VLSI Design: Proceedings of the 1985 Edinburgh Conference on VLSI, G.J. Milne and P.A. Subrahmanyam, eds., North-Holland, Amsterdam, 1986.
[3] Bryant, R., "An Algorithm for MOS Logic Simulation", Lambda Magazine, Fourth Quarter, 1980.
[4] Camilleri, A., Gordon, M., and Melham, T., "Hardware Verification using Higher Order Logic", Technical Report No. 91, Computer Laboratory, The University of Cambridge, June 1986.
[5] Cohn, A., "A Proof of Correctness of the Viper Microprocessor: The First Level", Specification, Verification and Synthesis, January 1987, (this volume).
[6] Gordon, M., "Representing a Logic in the LCF Metalanguage", Tools and Notions for Program Construction, edited by D. Neel, Cambridge University Press, 1982.
[7] Gordon, M., "LCF-LSM, A System for Specifying and Verifying Hardware", Technical Report No. 41, Computer Laboratory, The University of Cambridge, September 1983.
[8] Gordon, M., "Proving a Computer Correct using the LCF-LSM Hardware Verification System", Technical Report No. 42, Computer Laboratory, The University of Cambridge, September 1983.
[9] Gordon, M., and Herbert, J., "A Formal Hardware Verification Methodology and its Application to a Network Interface Chip", Technical Report No. 66, Computer Laboratory, The University of Cambridge, 1985.
[10] Gordon, M., "HOL: A Proof Generating System for Higher Order Logic", Specification, Verification and Synthesis, January 1987, (this volume).


[11] Hanna, F., and Daeche, N., "Specification and Verification of Digital Systems using Higher Order Predicate Logic", IEE Proceedings, Vol. 133, Part E, No. 5, September 1986.
[12] Henderson, P., "Functional Programming: Application and Implementation", Prentice-Hall, 1980.
[13] Herbert, J., "Application of Formal Methods to Digital System Design", Ph.D Thesis, Computer Laboratory, University of Cambridge, (forthcoming 1987).
[14] Hunt, W., "FM8501: A Verified Microprocessor", Ph.D. Thesis and Technical Report No. 47, Institute for Computer Science, The University of Texas at Austin, February 1986.
[15] Joyce, J., "Formal Verification of a CMOS Full-adder in the HOL System", Internal Report, University of Cambridge, April 1987.
[16] Joyce, J., Birtwistle, G., and Gordon, M., "Proving a Computer Correct in Higher Order Logic", Technical Report No. 100, Computer Laboratory, The University of Cambridge, December 1986.
[17] Melham, T., "Abstraction Mechanisms for Hardware Verification", Specification, Verification and Synthesis, January 1987, (this volume).
[18] Weise, D., "Formal Multilevel Hierarchical Verification of Synchronous MOS VLSI Circuits", Ph.D Thesis, Massachusetts Institute of Technology, August 1986.
[19] Weste, N., and Eshraghian, K., "Principles of CMOS VLSI Design", Addison-Wesley Publishing Company, 1985.
[20] Winskel, G., "Models and Logic of MOS Circuits", Specification, Verification and Synthesis, January 1987, (this volume).


Appendix

The formal specification of the register-transfer level implementation of Gordon's computer is a higher-order logic 'theory' consisting of constants, axioms, and definitions. The following constants are used to specify the behaviour and implementation of Gordon's computer.

EL          :num → * list → *                        nth element of a list
SEG         :num#num → * list → * list               sublist of a list
V           :bool list → num                         number denoted by a bit list
VAL         :word → num                              number denoted by a word
WORD        :num → word                              word representing a number
BITS        :word → bool list                        list of bits in an n-bit word

FLOAT16     :tri_word16                              high impedance value
DEST_TRI16  :tri_word16 → word16                     tri-state into bi-state
MK_TRI16    :word16 → tri_word16                     bi-state into tri-state
U16         :tri_word16 → tri_word16 → tri_word16    joining two tri-state signals

FETCH5_30   :mem5_30 → word5 → word30                ROM memory fetch
FETCH13_16  :mem13_16 → word13 → word16              RAM memory fetch
STORE13_16  :word13 → word16 → mem13_16 → mem13_16   RAM memory store

PAD13_16    :word13 → word16                         pad 3 most significant bits
CUT16_13    :word16 → word13                         cut 3 most significant bits

INC16       :word16 → word16                         16-bit incrementation
ADD16       :word16 → word16 → word16                16-bit addition
SUB16       :word16 → word16 → word16                16-bit subtraction

OPCODE      :word16 → word3                          extract opcode field

MICROCODE   :mem5_30                                 microcode ROM
The constants EL, SEG, V, VAL, WORD and BITS are built into the HOL system. These constants name operations on lists and conversions between natural numbers and their binary representations. Currently, complete theories about lists and n-bit words have not been developed. Instead, each of these constants has an improvised evaluation rule which evaluates applications of the constant to primitive data objects. The constants FETCH5_30, FETCH13_16, STORE13_16, INC16, ADD16, SUB16 and OPCODE are 'understood' to name certain operations but have no properties in the formal theory. Properties for the constants PAD13_16, CUT16_13, FLOAT16, DEST_TRI16, MK_TRI16, U16 and MICROCODE are described by the following axioms.

|- !w:word13. CUT16_13 (PAD13_16 w) = w

|- !w:word16. DEST_TRI16 (MK_TRI16 w) = w

|- !w:tri_word16. (FLOAT16 U16 w = w) /\ (w U16 FLOAT16 = w)

|- FETCH5 MICROCODE #00000 = #000000000000000110000000001001
|- FETCH5 MICROCODE #00001 = #000000000000000000001000000011
|- FETCH5 MICROCODE #00010 = #010001000000000000000000000000
|- FETCH5 MICROCODE #00011 = #010000010000000000000000000000
|- FETCH5 MICROCODE #00100 = #001000100000000000011100000000
|- FETCH5 MICROCODE #00101 = #000000000000000100011000000001
|- FETCH5 MICROCODE #00110 = #001000100000000000100000000000
|- FETCH5 MICROCODE #00111 = #000100001000000000000000000000
|- FETCH5 MICROCODE #01000 = #000010000100000000100100000000
|- FETCH5 MICROCODE #01001 = #000000000000000000101000000100
|- FETCH5 MICROCODE #01010 = #000000000000000000000000000000
|- FETCH5 MICROCODE #01011 = #000001000010000000010100000000
|- FETCH5 MICROCODE #01100 = #000000000000000001000101011010
|- FETCH5 MICROCODE #01101 = #000000001001000001001100000000
|- FETCH5 MICROCODE #01110 = #000000001001000001011000000000
|- FETCH5 MICROCODE #01111 = #001000000010000001100000000000
|- FETCH5 MICROCODE #10000 = #001000000010000001100100000000
|- FETCH5 MICROCODE #10001 = #000000100000010001001000000000
|- FETCH5 MICROCODE #10010 = #000001000000001000010100000000
|- FETCH5 MICROCODE #10011 = #001000000010000001010000000000
|- FETCH5 MICROCODE #10100 = #000010000000100001010100000000
|- FETCH5 MICROCODE #10101 = #000000010000001001000100000000
|- FETCH5 MICROCODE #10110 = #001000000010000001011100000000
|- FETCH5 MICROCODE #10111 = #000010000000110001010100000000
|- FETCH5 MICROCODE #11000 = #000010010000000001000100000000
|- FETCH5 MICROCODE #11001 = #000100001000000001000100000000
|- FETCH5 MICROCODE #11010 = #000000000000000000000000000000
|- FETCH5 MICROCODE #11011 = #000000000000000000000000000000
|- FETCH5 MICROCODE #11100 = #000000000000000000000000000000
|- FETCH5 MICROCODE #11101 = #000000000000000000000000000000
|- FETCH5 MICROCODE #11110 = #000000000000000000000000000000
|- FETCH5 MICROCODE #11111 = #000000000000000000000000000000

The rest of the formal theory consists of definitions. INC13, CNTL_BIT, CNTL_FIELD, B_ADDR, A_ADDR and TEST are defined to improve the readability of the formal specification. The remaining definitions describe structure and behaviour in the design.

%--------------------------------------------------------%
%-- Auxiliary Definitions -------------------------------%

|- INC13 w = CUT16_13(INC16(PAD13_16 w))

|- CNTL_BIT n w = EL n (BITS30 w)
|- CNTL_FIELD (m,n) w = WORD2 (V (SEG (m,n) (BITS30 w)))
|- B_ADDR w = WORD5 (V (SEG (3,7) (BITS30 w)))
|- A_ADDR w = WORD5 (V (SEG (8,12) (BITS30 w)))
|- TEST w = V (SEG (0,2) (BITS30 w))

%--------------------------------------------------------%
%-- Primitive Behaviours --------------------------------%

|- REG16 (input,load,output) =
   !t:time. output (t+1) = ((load t) -> (input t) | (output t))
|- ACC = REG16
|- IR = REG16
|- ARG = REG16

|- REG13 (input,load,output) =
   !t:time. output (t+1) = ((load t) -> (CUT16_13 (input t)) | (output t))
|- MAR = REG13
|- PC = REG13

|- BUF (input,output) = !t:time. output (t+1) = input t

|- GATE16 (input,control,output) =
   !t:time. output t = ((control t) -> MK_TRI16 (input t) | FLOAT16)
|- G0 = GATE16
|- G2 = GATE16
|- G3 = GATE16
|- G4 = GATE16
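Read operationally, REG16 latches its input when load is high and otherwise holds its previous value, and GATE16 drives the bus only when its control line is asserted. The following Python sketch of these two behaviours is our own rendering (signals are modelled as functions from integer time to values, with None standing for the high-impedance FLOAT16); it is an executable reading of the defining equations, not the HOL semantics itself.

```python
FLOAT16 = None  # high-impedance value

def REG16(inp, load, init=0):
    """output(t+1) = (load t -> inp t | output t); output(0) = init (assumed)."""
    def output(t):
        v = init
        for u in range(t):            # unfold the recurrence up to time t
            v = inp(u) if load(u) else v
        return v
    return output

def GATE16(inp, control):
    """output t = (control t -> MK_TRI16(inp t) | FLOAT16)."""
    return lambda t: inp(t) if control(t) else FLOAT16

# Example: load the value 7 at t=0, then hold it.
acc = REG16(inp=lambda t: 7, load=lambda t: t == 0)
assert acc(0) == 0 and acc(1) == 7 and acc(5) == 7

# The gate floats until its control line goes high.
g = GATE16(inp=acc, control=lambda t: t >= 1)
assert g(0) is FLOAT16 and g(2) == 7
```

The initial register value is an assumption of the sketch; the HOL definition constrains only output(t+1).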

|- GATE13 (input,control,output) =
   !t:time. output t = ((control t) -> MK_TRI16 (PAD13_16 (input t)) | FLOAT16)
|- G1 = GATE13

|- MEM (memory,mar,bus,memcntl,memout) =
   !t:time.
   (memout t =
      ((VAL2 (memcntl t) = 1) -> MK_TRI16 (FETCH13 (memory t) (mar t)) | FLOAT16)) /\
   (memory (t+1) =
      ((VAL2 (memcntl t) = 2) -> STORE13 (mar t) (bus t) (memory t) | memory t))

|- ALU (arg,bus,alucntl,alu) =
   !t:time.
   alu t = ((VAL2 (alucntl t) = 0) -> bus t |
            (VAL2 (alucntl t) = 1) -> INC16 (bus t) |
            (VAL2 (alucntl t) = 2) -> ADD16 (arg t) (bus t) |
            SUB16 (arg t) (bus t))

|- BUS (memout,g0,g1,g2,g3,g4,bus) =
   !t:time. bus t = DEST_TRI16 (memout t U16 g0 t U16 g1 t U16 g2 t U16 g3 t U16 g4 t)

|- ROM microcode (mpc,rom) = !t:time. rom t = FETCH5 microcode (mpc t)

|- MPC (nextaddress,mpc) = !t:time. mpc (t+1) = nextaddress t

|- DECODE (rom,knob,button,acc,ir,nextaddress,rsw,wmar,memcntl,
           wpc,rpc,wacc,racc,wir,rir,warg,alucntl,rbuf,ready,idle) =
   !t:time.
   (nextaddress t =
      (((TEST(rom t) = 1) /\ (button t)) -> B_ADDR(rom t) |
       ((TEST(rom t) = 2) /\ (VAL16(acc t) = 0)) -> B_ADDR(rom t) |
       (TEST(rom t) = 3) -> WORD5(VAL2(knob t) + VAL5(A_ADDR(rom t))) |
       (TEST(rom t) = 4) -> WORD5(VAL3(OPCODE(ir t)) + VAL5(A_ADDR(rom t))) |
       A_ADDR(rom t))) /\
   (rsw t = CNTL_BIT 28 (rom t)) /\
   (wmar t = CNTL_BIT 27 (rom t)) /\
   (memcntl t = CNTL_FIELD (25,26) (rom t)) /\
   (wpc t = CNTL_BIT 24 (rom t)) /\
   (rpc t = CNTL_BIT 23 (rom t)) /\
   (wacc t = CNTL_BIT 22 (rom t)) /\
   (racc t = CNTL_BIT 21 (rom t)) /\
   (wir t = CNTL_BIT 20 (rom t)) /\
   (rir t = CNTL_BIT 19 (rom t)) /\
   (warg t = CNTL_BIT 18 (rom t)) /\
   (alucntl t = CNTL_FIELD (16,17) (rom t)) /\
   (rbuf t = CNTL_BIT 15 (rom t)) /\
   (ready t = CNTL_BIT 14 (rom t)) /\
   (idle t = CNTL_BIT 13 (rom t))
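The control-field positions used by DECODE can be made concrete with a small decoder. The sketch below is our own encoding, not part of the formal theory: a 30-bit microinstruction is held as a Python integer, and we assume bit position 0 is the least significant bit, which is one possible reading of the EL/BITS30 convention (the HOL theory fixes this via BITS30, which is not spelled out here).

```python
def cntl_bit(n, w):
    """CNTL_BIT n w: bit n of a 30-bit microword (LSB = bit 0, assumed)."""
    return (w >> n) & 1

def cntl_field(m, n, w):
    """CNTL_FIELD (m,n) w: bits m..n of w as a small number."""
    return (w >> m) & ((1 << (n - m + 1)) - 1)

def decode(w):
    """Extract the named control signals at the positions DECODE uses."""
    return {
        "test":    cntl_field(0, 2, w),
        "b_addr":  cntl_field(3, 7, w),
        "a_addr":  cntl_field(8, 12, w),
        "idle":    cntl_bit(13, w),
        "ready":   cntl_bit(14, w),
        "rbuf":    cntl_bit(15, w),
        "alucntl": cntl_field(16, 17, w),
        "warg":    cntl_bit(18, w),
        "rir":     cntl_bit(19, w),
        "wir":     cntl_bit(20, w),
        "racc":    cntl_bit(21, w),
        "wacc":    cntl_bit(22, w),
        "rpc":     cntl_bit(23, w),
        "wpc":     cntl_bit(24, w),
        "memcntl": cntl_field(25, 26, w),
        "wmar":    cntl_bit(27, w),
        "rsw":     cntl_bit(28, w),
    }

# Microcode word at address #00000 from the table above.
word = int("000000000000000110000000001001", 2)
fields = decode(word)
assert fields["test"] == 1 and fields["ready"] == 1 and fields["idle"] == 1
```

Under this bit ordering the first microinstruction asserts ready and idle and performs a conditional jump (test field 1), which is a plausible idle loop; if BITS30 numbers bits from the other end, the field extraction would have to be mirrored.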

% -------------------------------------------------------- %
% -- Intermediate Level Specifications ------------------- %

|- DATA_PATH (switches,memory,pc,acc,mar,ir,arg,buf,rsw,wmar,
              memcntl,wpc,rpc,wacc,racc,wir,rir,warg,alucntl,rbuf) =
   ?g0 g1 g2 g3 g4 memout alu bus.
     MEM (memory,mar,bus,memcntl,memout) /\
     MAR (bus,wmar,mar) /\
     PC (bus,wpc,pc) /\
     ACC (bus,wacc,acc) /\
     IR (bus,wir,ir) /\
     ARG (bus,warg,arg) /\
     BUF (alu,buf) /\
     G0 (switches,rsw,g0) /\
     G1 (pc,rpc,g1) /\
     G2 (acc,racc,g2) /\
     G3 (ir,rir,g3) /\
     G4 (buf,rbuf,g4) /\
     ALU (arg,bus,alucntl,alu) /\
     BUS (memout,g0,g1,g2,g3,g4,bus)

|- CONTROL_UNIT microcode
     (button,knob,acc,idle,mpc,ir,ready,rsw,wmar,
      memcntl,wpc,rpc,wacc,racc,wir,rir,warg,alucntl,rbuf) =
   ?rom nextaddress.
     ROM microcode (mpc,rom) /\
     MPC (nextaddress,mpc) /\
     DECODE (rom,knob,button,acc,ir,nextaddress,rsw,wmar,memcntl,
             wpc,rpc,wacc,racc,wir,rir,warg,alucntl,rbuf,ready,idle)

% -------------------------------------------------------- %
% -- Top Level Specification ----------------------------- %

|- CONCRETE_MACHINE (button,knob,switches,memory,pc,acc,idle)
                    (mpc,mar,ir,arg,buf) ready =
   ?rsw wmar memcntl wpc rpc wacc racc wir rir warg alucntl rbuf.
     CONTROL_UNIT MICROCODE
       (button,knob,acc,idle,mpc,ir,ready,rsw,wmar,
        memcntl,wpc,rpc,wacc,racc,wir,rir,warg,alucntl,rbuf) /\
     DATA_PATH (switches,memory,pc,acc,mar,ir,arg,buf,rsw,wmar,
                memcntl,wpc,rpc,wacc,racc,wir,rir,warg,alucntl,rbuf)

5 Towards a framework for dealing with system timing in Very High Level Silicon Compilers¹

P.A. Subrahmanyam
AT&T Bell Laboratories

Abstract. The design of a VLSI system typically involves many decisions relating to several levels of timing detail. Examples include choices relating to timing disciplines, architectural and circuit design paradigms, the number of clock phases, the number of amplification stages, the sizes of specific transistors, and the temporal behavior demanded of specific input signals (such as stability criteria they must obey). This paper discusses a framework for dealing with system timing issues in the context of very high level silicon compilers intended to aid in mapping abstract behavioral specifications into VLSI circuits. The paradigm considered allows the details of system timing to be gradually introduced as a design progresses. During the early stages of a design, no timing details need be explicit, and the only temporal constraints arise due to functionality, causality and stability criteria. As a design evolves, and decisions relating to resource allocation, computation schedules, timing and clocking disciplines are made, the temporal constraints on signals evolve correspondingly. If the external environment in which a system resides imposes any a priori temporal constraints, such constraints can be explicitly included in the initial specification, and accounted for in the synthesis process. The framework supports synchronous and asynchronous (self-timed) disciplines, multiphase-clocks, as well as techniques that bear on optimizations of performance metrics such as power and area.

1. Introduction: System Timing Issues in VLSI Design

With an increase in the sophistication of tools dealing with lower level VLSI design tasks (such as layout editors and circuit simulators), there is growing interest in design tools that map higher level behavioral specifications into mask descriptions for circuit fabrication. An important issue in this context is that of system timing, which deals with the relative sequencing of subcomputations, the detailed behavior of signal voltages over time, and overall system performance.

1. May 1986, revised.


The VLSI design process spans the behavioral, structural and physical dimensions as well as several abstraction levels in each of these dimensions. The design of a non-trivial system typically involves many decisions relating to several levels of timing detail. Examples include choices relating to timing disciplines, such as synchronous versus asynchronous (or self-timed [21]) disciplines; architectural and circuit design paradigms, such as static versus dynamic circuits in CMOS; the number of clock phases; the number of amplification stages; the sizes of specific transistors; and the temporal behavior demanded of specific input signals, such as intervals of time over which they must be stable. Design decisions that concern these and other timing issues, particularly when taken in combination, clearly have a major impact on the performance attributes, if not the correct functioning, of a system. It is therefore important to be cognizant of the impact of each decision on the overall system timing in order to avoid making conflicting decisions that result in an erroneous functioning of the system. This requires a way to correlate timing decisions of various kinds from some uniform perspective. Unfortunately, however, no well-defined framework exists for dealing with the various aspects of system timing. In particular, no coherent formal framework exists that can serve as the basis for an agent dealing with system timing issues in the context of high level synthesis. In practice, therefore, considerations relating to overall system timing tend to be somewhat intangible, and are often introduced into a design in ad hoc ways and at ill-defined points. As a consequence, most work on high level design synthesis tools, such as very high level silicon compilers, has hitherto not addressed the issue of global system timing, let alone considered the issue of synthesizing circuits optimized for performance.
While several existing design tools deal with physical and geometric constraints, such as chip area and floorplanning constraints, few tools deal with the spectrum of temporal constraints relating to the overall system behavior. Even at the lower levels, designers of digital VLSI circuits have few tools available for the analysis of system timing and the optimization of circuit performance. Instead, a designer relies extensively on design intuition and low level circuit analysis tools such as circuit simulation and/or critical delay path analysis. Such analysis is often very labor intensive and somewhat error prone. Further, the results obtained tend to be suboptimal in terms of area/delay or power/delay tradeoffs. Only recently have some tools begun to evolve in this area that use mathematical programming techniques (such as linear programming, nonlinear programming, and convex programming) to address some limited aspects of these problems [18, 5]. This paper describes a rudimentary framework in which various aspects of system timing can be dealt with in a cohesive manner. The issues addressed here include the following: What are the different classes of external timing specifications, and how are such timing criteria specified? What are the


temporal abstractions that typically arise in a digital system design, and how are they specified and modeled? When, and how, are various timing details introduced into a design? Are there, or should there be, any criteria that influence these decisions? Is there a conceptual framework and an associated design paradigm that allows for an appropriate consideration of the temporal levels of abstraction involved? And what are the formal underpinnings of such a framework? More specifically, the objectives of the framework developed here are: • to identify some of the important levels of temporal abstraction involved in system design, and to indicate how they may be formally modeled; • to associate each level of temporal abstraction with a class of design decisions that can potentially influence system timing details at this level; • to provide a formal means to infer the constraints induced on system timing by some important classes of design decisions; and conversely, to suggest how existing timing constraints (either externally decreed or generated as a result of earlier design decisions) may influence subsequent design decisions; • to provide a way to correlate the details relating to system timing at various levels of abstraction via their influence on a common representation of timing information based on partial specifications of waveforms; • to relate the timing parameters introduced by the more "abstract" models to the more "concrete" timing parameters that are associated with lower level circuit models, and ultimately semiconductor models. In addition, we identify some categories of timing information that can be generated during the synthesis process, and that can serve as useful input to lower level timing analyzers.

1.1 Related Research

Some aspects pertaining to system timing have been studied in great depth; examples include simulation (at the functional, switch, electrical and device levels [3, 16, 10]), static timing analysis [25], and the impact of transistor sizing on circuit timing characteristics [5, 18]. Similarly, descriptive mechanisms for dealing with timing, such as temporal logic, have been examined in depth in various contexts, e.g., [14, 15]. Unfortunately, however, not much attention has yet been given to examining when and how different timing decisions are made in the context of the overall VLSI synthesis process, and how the cumulative effects of such design decisions may be reasoned about. The lack of research in this area may partly owe to the fact that design aids for high level behavioral


synthesis have not yet matured to a point where they demand a sophisticated approach to system timing. Before discussing the details of the proposed framework, we first make some relevant observations about the interrelation between circuit design and system timing. These observations lead to a list of characteristics that are useful in such frameworks.

1.2 Some Observations

It is desirable for design aids such as very high-level silicon compilers to avoid the premature introduction of explicit timing information during specification and/or synthesis. This is because typically, during the early stages of a system design/description, the major concern is one of functionality, and inessential details of timing are best suppressed. When dealing with system descriptions at a lower level, however, the temporal aspects assume greater importance and visibility. This suggests the need for a framework that supports gradual introduction of the relevant timing information as a design evolves. Further, it is important that any framework for dealing with system timing not impose any a priori restrictions on the timing discipline adopted. More specifically, it should allow for synchronous and asynchronous (self-timed) circuits, local and global clocks, multiple clocks, and multiphase-clocks (such as those used in dynamic CMOS circuits). Unfortunately, a majority of existing design aids typically (i) include explicit (and detailed) information about eventual circuit timing as part of their input specifications, or (ii) provide only for certain default mechanisms, or (iii) ignore timing issues altogether. For the reasons alluded to above, such approaches are somewhat unsatisfactory.

1.3 Characteristics of the proposed framework

The design paradigm and associated framework advocated here attempt to meet the criteria just enumerated, and allow the details of system timing to be gradually introduced as a design progresses. At the highest level, no timing details need be explicit, and the only temporal constraints are causal. As a design evolves, and decisions relating to the implementation and timing disciplines are made, the temporal constraints on signals evolve correspondingly. If, on the other hand, the external environment in which a system resides imposes any a priori temporal constraints, the framework allows for such constraints to be explicitly included in the initial specification, and accounted for in the synthesis process.


PIa8 ClllllWalDti GO tllewll8 wawIbnm -.y ICqI\InI dIIIIDI ceNID bdenaII JIMIducIs cIeII,y panIIIIIIIn Into wawdbrm IpeaI

QoIceol DeIIp ObjecUwc PanIaI order to ~ t--~-------;.,. OR{nb)coqab&Joal ID aIpl&bm

abebaYiar

Henn:bIeIIJ,J ftlIa&fJI

IJ*IIIc - - OR ..... 1Udl. cIocIaI

01' ft4UIII&/1daIawIedp

..... too&lllr {nb)coqab&Joal

IDIIbnIeI a reIIIIOR W-&bedeIV

panIIIIIIIn c:IIancIedIIIIc

~

&be IIIbaIIIdIIB .... III \be ~oiloa ODd &be maID IIDCIuIe

CIIIIIIdIra&IDI

Le.._\be boIIaYIar 01 &be ~

I

beCllllllll&llll wIIIl \be eaIInIIl

DIIIerIdDe &be IIUIIIber 01

uq)1IIIca&IOR....

~: •

;

beIIaYIar deIINd

I ~ pa&Ja

I ~SIIDI r+

deIq

lae:.-:I

I

R:MIdIII

I 1

Figure 1. The evolution of system timing details

2. Design evolution and the introduction or timing details Before elaborating on the technical details of the proposed framework, we will first motivate (and summarize) the approach adopted. We begin by indicating how increasing degrees of timing detail are introduced into the design description as a design evolves (cf. Figure 1). In order to do this, we


need to make certain assumptions about how a design evolves, and more specifically, we need to make assumptions about those aspects of the design process that have a major impact on system timing.² Some aspects of such a design evolution are listed in the first column of Figure 1; the associated effects on system timing details are listed in the rightmost column. The first two entries in the second column are examples of auxiliary considerations that are used in generating some of the temporal constraints; the remainder of the second column lists some of the local design objectives that influence the design decisions (in horizontal proximity) in the first column. The first two entries in the third column are examples of temporal abstractions that are associated with the design decisions in the first column. The last two entries in the third column are examples of models that are used in circuit analysis; such analysis influences the design decisions and procedures listed in the first column. Initially, a design commences with a behavioral specification that details the functionality desired of the circuit to be synthesized.³ The temporal constraints induced on signals at this stage stem from causality considerations implied by intrinsic data dependencies, as well as functionality and stability criteria. Inference and approximation techniques for computing such constraints are discussed in Section 4. It should be noted that the parameters characterising the constraints generated at this stage are not (yet) related to any underlying circuit models. The choice of a particular algorithm to implement a specified behavior induces a temporal partial order on the subcomputations involved. As architectural design decisions relating to resource trade-offs are made, additional temporal constraints are imposed by the need to share resources in the design. Examples of such constraints and their ramifications are discussed in Section 5.
The choice of timing disciplines (synchronous or asynchronous) introduces the next level of timing detail. A timing discipline provides a protocol for indicating the initiation and/or completion of a set of events that comprise a (sub)computation. A specific protocol establishes a relation between the duration of subcomputations and the states of some distinguished signals, e.g., clocks (for synchronous systems) and request/acknowledge signals (for self-timed systems). As a consequence, such protocols provide a mechanism for synchronizing the subcomputations in a system, and enable the temporal constraints to be developed hierarchically.

2. Although experienced human designers appear to have subliminal (and not always explicit) ways of accounting for such information, researchers in design automation do not appear to have hitherto addressed this problem in a rigorous manner (to our knowledge).
3. The initial specifications may additionally include the desired performance criteria (possibly along several dimensions such as area, response time, pin count, and throughput), and interfacing specifications such as interface protocols.

Some forms of circuit design styles are related to a specific choice of a timing discipline, and may in turn impose constraints on clocking disciplines. For instance, the decision to use clocked dynamic circuits in CMOS typically implies the need for a precharge phase and a compute phase, and consequently a certain minimal number of clock phases. Additionally, the structural details of a circuit that is used to implement a function may impose further temporal constraints on signals, such as stability constraints relating to setup and hold times for signals. These issues are dealt with in Section 6. Within the confines of a fixed architecture, the need to optimize certain performance metrics such as area and time (or some function thereof), and an analysis of the critical paths in an architectural design, may further influence design decisions relating to clocking disciplines (such as multiphase clocks, multiple clocks, local clocks, and global clocks). Lower level timing optimizations based on RC circuit models of increasing sophistication are performed once the overall architecture has been outlined. Such optimizations influence factors such as the number of amplification stages used by a logic design, and the sizes of the individual transistors in each stage. Further, they have a direct effect on controlling many of the parameters that influence the details of how various signals evolve over time. These issues are elaborated upon in Sections 7 and 8. Additionally, considerations at this level relate the more "abstract" timing parameters introduced during the earlier design phases to more "concrete" process parameters that are associated with semiconductor models.
The above scenario highlights some of the design decisions that can potentially have a major impact on system timing. To summarize, these include: the formulation of a behavioral specification for a system; design decisions pertaining to the system architecture (i.e., the choice of the set of modules used to implement a desired behavior); the choice of timing disciplines; the choice of clocking disciplines (including the number of clock phases); and further optimizations that influence the number of amplification stages and the sizes of the individual transistors in each stage. Further, a partial order on the evolution of these design decisions is suggested. Given this setting, we will, in subsequent sections, identify different levels of temporal abstraction in system design and indicate how they can be formally modeled in an algebraic/logical setting. We will also indicate how it is possible to mechanically infer some of the temporal constraints arising out of a design decision, as opposed to such constraints being explicitly specified. There are several noteworthy consequences of the existence of such a framework (consisting of the set of models and inferencing


techniques). Firstly, it provides a way of correlating the timing decisions at various levels via their (cumulative) effect on signal waveform behaviors over time. Secondly, the information inferred may be used in incrementally checking for the consistency of the timing constraints induced on a signal waveform from different sources. Should an inconsistency exist, it enables an early detection of this fact, and serves to identify the set of design decisions that contribute to this inconsistency. The framework also assists in computing relations between the delay parameters of the modules used in an implementation and the delays characterising the overall system; and in conveying information relating to circuit partitioning and timing protocols to lower level tools such as timing analyzers. Thirdly, the existence of such a formalism potentially implies that both the generation of the temporal constraints and the consistency checking process can benefit from mechanical assistance, provided appropriately constrained primitives are used; however, we do not elaborate on this issue here. The details of how the various design decisions are actually made is an important related concern that is beyond the scope of this paper; we occasionally comment, however, on some of the criteria influencing such decisions. An example illustrating some of these issues is contained in Section 9. It is important to bear in mind that the overall VLSI (system) design process is often iterative in nature. Consequently, design details that evolve at a later stage may potentially affect earlier decisions, for instance layout considerations may influence some aspects of the architectural design. The conceptual model presented here for the introduction of timing details accommodates this viewpoint, and nothing in our subsequent development precludes such iterations in design. 
Indeed, each level of timing elaboration can impact design decisions at earlier stages; in optimistic scenarios, such redesign is avoided (or at least localized) by appropriate decisions during the early stages of design. The application of the proposed framework in the context of analysis and synthesis is discussed in Section 10. We next elaborate on some facets of the proposed framework.

3. Representing Timing Characteristics

We first introduce a low level representation of the behavior of signals over time, and then discuss the correlation between such low level representations and the system behavior. Our subsequent discussions will relate to how the temporal aspects of design decisions at different levels of abstraction may be modeled, and how they (ultimately) influence this signal representation.

3.1 Signal and System Behavior


3.1.1 Signals

A signal may be viewed as a function from a time domain (denoted Time) that describes appropriate instants of time, to a set of signal voltage values (denoted Voltage). Thus,

Signal : Time → Voltage

Signal voltage values usually form a continuous domain. However, in the context of digital design, we will assume here that such values can be abstracted to a discrete domain of values, e.g., {0, 1, X} or {0, 1}. As a simple example, consider a function Discretize : Voltage → {0, 1, X} that maps the continuous range of voltages [0.0–5.0] to {0, 1, X}; voltages less than 0.5 are mapped to the value 0, voltages between 0.5 and 4.5 to the value X, and voltages above 4.5 are mapped to the value 1.
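The three-valued abstraction just described is straightforward to write down. The sketch below fixes the thresholds from the text (0.5 V and 4.5 V); treating the boundary voltages themselves as X is our own choice, since the text leaves the endpoints open.

```python
def discretize(v: float) -> str:
    """Map a voltage in [0.0, 5.0] to the abstract domain {'0', '1', 'X'}."""
    if v < 0.5:
        return '0'          # voltages below 0.5 denote logic 0
    if v > 4.5:
        return '1'          # voltages above 4.5 denote logic 1
    return 'X'              # indeterminate region between the thresholds

assert discretize(0.2) == '0'
assert discretize(4.9) == '1'
assert discretize(2.5) == 'X'
```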

As the details of a system evolve, constraints evolve on the detailed nature of various signals. A signal waveform is constrained when an explicit statement is made about any of its characteristics (such as its value or shape) during some interval of time. Such statements may be formally made by using appropriately defined predicates or functions, as we will elaborate upon later. Very often, these constraints are only partial, in that the behavior of the signal waveforms outside of these intervals is unrestricted. Further abstractions of such discrete representations are usually required to relate them to elements of other problem specific domains, such as integers, stacks of integers, and arrays. We will not be directly concerned with such data abstractions in this paper, although they are naturally handled within the current framework.

3.1.2 System Behavior

The behavior of a system may be described at many different levels of abstraction. At one extreme, system behavior may be described in terms of the values in some abstract input domains that are presented at the various input ports, and the functions computed on these values and made available at (some or all of) the output ports. At the other extreme, system behavior may be described in terms of the relations (that must hold) between the signal voltage waveforms at its input and output pins.

An important difference between these two perspectives is that in the first, there is no explicit mention of timing, whereas in the second there is a very detailed concern with how signals behave over time (since signal waveforms represent signal voltages as they evolve over time). One of the objectives of this paper is to explore a framework for timing in the context of synthesis that enables us to bridge this gap. As a first step, we examine the temporal implications of the functional/behavioral specification of a system.


4. Temporal Implications of Causality Considerations

A major assumption we make relates to causality: we assume that an output of a function (or module) can be computed (i.e., become available) only after the inputs it depends upon are available. While we will typically assume that a delay is associated with such a computation, we will make no a priori assumptions about the detailed nature of this delay. We now outline some of the ramifications of such an assumption. More specifically, we give examples of constraints on the input and output waveforms that are implied by such an assumption. We will not discuss in detail an algorithm that enables the computation of such causality constraints in general, but rather convey the intuition underlying it. Our subsequent discussion uses the notion of signals being stable over intervals of time having specified durations. We next review the definition of these notions before proceeding further.
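As an aside, the causality assumption can be illustrated operationally with a toy availability-time calculation. This max-plus model is our own simplification (the paper deliberately makes no assumption about the detailed nature of delays): an output becomes available no earlier than the latest of its inputs plus the module's delay.

```python
def output_ready(input_ready_times, delay):
    """Causality: an output can stabilize only after every input it depends
    on is available, plus the module's reaction delay (a simplified model)."""
    return max(input_ready_times) + delay

# A two-stage composition: module g's output feeds module f.
t_g = output_ready([2, 5], delay=3)      # g ready at max(2, 5) + 3 = 8
t_f = output_ready([t_g, 4], delay=2)    # f ready at max(8, 4) + 2 = 10
assert (t_g, t_f) == (8, 10)
```

Propagating such bounds through a dataflow graph yields the causality-induced partial order on subcomputations discussed above.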

4.1 Notation

The terminology used here draws upon work in temporal logic [6, 7, 15]. Two instants t1 and t2 of Time serve to define an interval i of duration t2 − t1. The interval is defined to be semi-open, starting at t1 and ending "just before" t2, and is sometimes denoted [t1,t2). A signal is said to be stable during an interval of time if its value remains unchanged during this interval. The stability of a signal can be captured by means of the predicate

    stable: (Time × Time) × Signal → Boolean

defined by

    stable(t1,t2)(s) ≡def ∀t . t1 ≤ t < t2 ⊃ s(t) = s(t1)

which embodies the fact that s maintains a constant value from time t1 to just before t2. We will denote the fact that a signal s has the constant value c in domain D during an interval [t1,t2) by constant(t1,t2)(s)(c). Formally, the predicate

    constant: (Time × Time) × Signal × D → Boolean

is defined by

    constant(t1,t2)(s)(c) ≡def ∀t . t1 ≤ t < t2 ⊃ s(t) = c

A framework for System Timing


which may also be written as

    constant(t1,t2)(s)(c) ≡def stable(t1,t2)(s) & s(t1) = c
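These two predicates are easy to model executably on a sampled time base. The sketch below is our illustration, not from the paper; the integer Time domain and the example waveform are assumptions made purely for concreteness.

```python
# Executable sketch of the stable/constant predicates on a sampled
# (integer) time base; an assumption made purely for illustration.

def stable(t1, t2, s):
    """stable(t1,t2)(s): s keeps a single value on the semi-open interval [t1, t2)."""
    return all(s(t) == s(t1) for t in range(t1, t2))

def constant(t1, t2, s, c):
    """constant(t1,t2)(s)(c) == stable(t1,t2)(s) & s(t1) = c."""
    return stable(t1, t2, s) and s(t1) == c

# A waveform that is 0 up to time 5 and 1 from then on.
w = lambda t: 0 if t < 5 else 1

assert stable(0, 5, w)       # [0,5) holds a single value
assert constant(0, 5, w, 0)  # ... namely 0
assert not stable(3, 8, w)   # the edge at t=5 falls inside [3,8)
assert constant(5, 9, w, 1)
```

Note that, as in the paper, both predicates speak only about the semi-open interval [t1, t2); the waveform is unconstrained outside it.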

4.2 Causality constraints and system timing

We now elaborate, via an example, the implications of causality assumptions on the timing behavior of the input/output signals of a module. We will denote by d1 the time taken for an output value (i.e., the signal at an output port) to stabilize once the stable input values are available. If there is a change in one or more inputs, the output may change; however, this change takes some time, which duration we will denote by d2. Intuitively, d1 and d2 can be thought of as being the times taken for the output to "react" to a change in the input values. This reaction time may of course be different for each output (and, as discussed below, for each combination of inputs/outputs); we will here restrict ourselves to considering only one output. It is relatively straightforward to generalize the arguments to the multiple output case.

If the cardinality of each of the input domains is D, and there are N inputs, the total number of input combinations (or input states) is D^N. The total number of input state transitions possible is therefore D^N × (D^N − 1). In general, the reaction times for an output may be different for each of these transitions. In such a case, we will denote by d1^i and d2^i the reaction times associated with the i-th transition. A transition into a given state c may be denoted by the pair <p,c>, whereas a transition out of the current state may be denoted by the pair <c,n>. For each state transition, the corresponding output signal may (1) remain unchanged; (2) change to a different value; or (3) exhibit a "glitch", i.e., change to a different value for some period and then return to the original value. Obviously, it is expected that the output(s) of a properly functioning circuit will not exhibit any glitches, and will exhibit a change to a different value only when this is consistent with the specified behavior.

Consider a NAND module which implements a 2-input NAND function. The NAND function may be defined, for instance, as shown below.

    nand(0,0) = 1
    nand(0,1) = 1
    nand(1,0) = 1
    nand(1,1) = 0

As an example, consider a time interval when the inputs of a two-input


NAND gate have the value 00. If p denotes the previous state and n the next state, the basic causality assumption associated with this interval yields the following constraint on the signals sx, sy and sz:

    ∃ t1, t2 such that constant(t1,t2)(sx)(0) & constant(t1,t2)(sy)(0)
        ⊃ constant(t1+d1^(p,00), t2+d2^(00,n))(sz)(1)

where ⊃ denotes logical implication.

Intuitively, this constraint says that if the inputs sx and sy of the NAND gate are stable at the value 00 during the interval from t1 to t2, then the output sz must acquire the value 1 at time t1+d1^(p,00), where the actual delay in responding to the state 00 may depend upon the previous state p. Further, this value of sz is maintained until a time t2+d2^(00,n), where the constant d2^(00,n) may depend upon the next state n.

It is important to note that there are two distinct factors that lead to this constraint. The first has to do with causality: if the value of sz is specified to depend on the values of sx and sy, then sz must, at some time, react (correctly or incorrectly) to the values of sx and sy in some (and all) intervals. The second has to do with functionality, i.e., sz reacting properly: this mandates that if sx and sy both have the value 0 then sz needs to attain the value 1, consistent with the definition of the NAND function. In essence, the causality assumption serves to "parameterize" a design by the variables d1^i and d2^i (for each possible transition index i), and introduces constraints on the signal waveform.

The form of the causality constraints that arise from considering a single output is given by the proposition below, which follows from the arguments we have gone through above. We denote the vector of signal waveforms corresponding to the inputs that comprise the input domain D by s̄D, and the signal waveform corresponding to the output on which a function F is computed by sF. The variables p, c, n that range over the domain D are intended to connote the previous, current, and next state of the inputs of the module. We use constant(t1,t2)(s̄D)(c̄) to denote the fact that all components of s̄D are stable during the interval [t1,t2) at the value in the corresponding component of c̄.

Proposition 1. Given a function F: D → R, the temporal constraints on the signal waveforms s̄D and sF that arise from causality considerations are given by


    ∀ c,p,n ∈ D ∃ t1, t2 . constant(t1,t2)(s̄D)(c̄) ⊃ constant(t1+d1^(p,c), t2+d2^(c,n))(sF)(F(c̄))

where the d1^(p,c) and d2^(c,n) are parameters that characterise the module implementing F. Further, ∀ c ∈ D, d1^(c,c) = 0 and d2^(c,c) = 0. ■

To simplify notation, we will henceforth drop explicit use of the vector symbol.

4.2.1 Reducing the number of independent parameters

For a specific function, the actual number of independent variables may be reduced owing to symmetries in the function implemented and/or the circuit structure used to implement it. Even for small values of N (where N denotes the number of inputs), this set can be quite large. For example, if N=2, and the domain of each input is {0,1}, the number of transitions is 2^2 × (2^2 − 1) = 4 × 3 = 12, and the number of parameters is 24 (i.e., independent values of d1 and d2 for each of the 12 transitions). To account for finite signal transition times, it is necessary to have at least a 3-valued domain, such as {0,1,X}. As a consequence, it seems desirable to reduce the ensuing complexity by approximating the behavior of the signals in some appropriate manner.

Given a function F: D1 × D2 × ... × Dn → R, where R may in some cases be decomposed as R1 × ... × Rm, the delay parameters characterising an implementation of F may depend, in the most general case, on all of the m+n values in the domain and range. A simplification may be achieved by reducing the number of values in the domain and range that the delay parameters are assumed to be dependent upon. Three obvious simplifications are to consider dependency only on (i) the domain values D1 × D2 × ... × Dn; (ii) the range values R; (iii) an explicitly specified subset of the range/domain values (particular to the function being considered). Additionally, it is possible to make the assumption that the delay parameters are completely independent of the actual values in the domain and range.

The example discussed earlier in this section in fact corresponds to the first approximation. That is, the assumption was that the delay parameters for a two-input NAND gate depended upon the state of both the input signals. In the rest of this section, we elaborate on the ramifications of the other approximations indicated above.

4.2.1.1 Approximations based on output values

Consider now the implication of the assumption that the delay parameters depend only upon the range R, i.e., upon the value computed by the function F. This approximation has the effect of partitioning all of the state transitions into equivalence classes that correspond to transitions into and out of different stable output values. Thus, in the case of Boolean valued functions, we can partition the set of transitions into two classes, corresponding to the stable


output values 0 and 1. Formally, this may be stated as follows.

Approximation R. The set of parameters d1^(p,c) and d2^(c,n), where p,c,n ∈ D, are approximated by the set of parameters d1^r and d2^r, where r ∈ R. ■

It should be noted that this approximation will reduce the number of independent parameters only if the cardinality of the range R is less than the cardinality of the domain D = D1 × ... × Dn. This condition is typically met for a nontrivial boolean valued function with more than one input. Let the domain D be partitioned into subsets Dk such that di and dj are in the same partition only if they are mapped into the same value by the function F. That is, di, dj ∈ Dk if and only if F(di) = F(dj). We will denote the set of such partitions Dk by DF. Note that this partition depends on the semantics of the function F.
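The partition DF can be computed mechanically from a tabulated function. The following sketch (our illustration; the names are assumptions) builds it for the 2-input NAND function of the running example:

```python
# Sketch: compute the partition D_F induced by approximation R,
# grouping input states by the output value they produce.
from collections import defaultdict

def partition_by_output(domain, f):
    """Group domain points d by f(d); the resulting classes form D_F."""
    classes = defaultdict(list)
    for d in domain:
        classes[f(d)].append(d)
    return dict(classes)

nand = lambda xy: 0 if xy == (1, 1) else 1
domain = [(0, 0), (0, 1), (1, 0), (1, 1)]

d_f = partition_by_output(domain, nand)
assert d_f[1] == [(0, 0), (0, 1), (1, 0)]  # states driving the output to 1
assert d_f[0] == [(1, 1)]                  # the single state driving it to 0
assert len(d_f) == 2                       # two classes: 2*2 = 4 delay parameters
```

With only two classes, approximation R needs one (d1, d2) pair per class, matching the reduction from 24 to 4 parameters discussed below.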

Proposition 2. If approximation R is used, the causality constraints on the signals of a module implementing a function F: D → R have the form

    ∀ Dk ∈ DF [ ∀ dj ∈ Dk ∃ t1, t2 . constant(t1,t2)(sD)(dj) ⊃ constant(t1+d1^(F(dj)), t2+d2^(F(dj)))(sF)(F(dj)) ] ■

As an illustration, consider again the 2-input NAND gate. There are only two stable values (0 and 1) that the output signal sz can assume. The output value 1 can result from any of the input combinations 00, 01, and 10. The output value 0, on the other hand, can result only from the input combination 11. In other words, we may interpret this as stating that:

• If a NAND gate is functioning properly, then

  i. if either of the inputs is 0 during some interval [t1,t2), then the output must attain the value 1 during a subsequent interval defined by [t1+d1^(NAND,1), t2+d2^(NAND,1)), where d1^(NAND,1) and d2^(NAND,1) are parameters characterising the NAND gate;

  ii. if both of the inputs are 1 during some interval, then the output must attain the value 0 during a subsequent interval defined by [t1+d1^(NAND,0), t2+d2^(NAND,0)), where d1^(NAND,0) and d2^(NAND,0) are parameters characterising the NAND gate.

This may be stated formally by the two formulas below:


    ∃ t1, t2 such that constant(t1,t2)(sx)(0) or constant(t1,t2)(sy)(0)
        ⊃ constant(t1+d1^(NAND,1), t2+d2^(NAND,1))(sz)(1)

    ∃ t1, t2 such that constant(t1,t2)(sx)(1) & constant(t1,t2)(sy)(1)
        ⊃ constant(t1+d1^(NAND,0), t2+d2^(NAND,0))(sz)(0)

In summary, using approximation R as discussed above, causality considerations for a 2-input NAND gate introduce the parameters d1^(NAND,0), d1^(NAND,1), d2^(NAND,1), d2^(NAND,0) into its signal behavior, and yield the following temporal constraints on the signal waveforms sx, sy, and sz:

    NAND_SIGNAL_BEH(d1^(NAND,1), d1^(NAND,0), d2^(NAND,1), d2^(NAND,0))(sx, sy, sz) ≡
        ∃ t1, t2 such that constant(t1,t2)(sx)(0) or constant(t1,t2)(sy)(0)
            ⊃ constant(t1+d1^(NAND,1), t2+d2^(NAND,1))(sz)(1)
        & ∃ t1, t2 such that constant(t1,t2)(sx)(1) & constant(t1,t2)(sy)(1)
            ⊃ constant(t1+d1^(NAND,0), t2+d2^(NAND,0))(sz)(0) ■

For future reference, and to reduce clutter, we will rename the parameters d1^(NAND,1), d2^(NAND,1), d1^(NAND,0), d2^(NAND,0) that characterise a NAND gate by n1, n2, n3, n4 (where the letter n is intended to be a mnemonic for NAND), and abbreviate the entire set as n̄. With this notational change, the causal constraints on the NAND signal behaviors may be expressed by

    NAND_SIGNAL_BEH(n1, n3, n2, n4)(sx, sy, sz) ≡
        ∃ t1, t2 such that constant(t1,t2)(sx)(0) or constant(t1,t2)(sy)(0)
            ⊃ constant(t1+n1, t2+n2)(sz)(1)
        & ∃ t1, t2 such that constant(t1,t2)(sx)(1) & constant(t1,t2)(sy)(1)
            ⊃ constant(t1+n3, t2+n4)(sz)(0) ■

Note that the approximations made above have reduced the number of independent parameters for a 2-input NAND gate from 24 to 4. Similarly, for a 3-input NAND gate, the number of parameters is reduced from 2 × 2^3 × (2^3 − 1) = 112 to 4. It is important to stress that causality assumptions merely serve to introduce these parameters into the signal behavior of a NAND gate, without imposing any constraints on their values.
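As an illustration (ours, not the paper's; the discrete time base, the delay values, and the example waveforms are assumptions), the two conjuncts of the NAND signal behavior can be checked directly on sampled waveforms:

```python
# Sketch: check the two conjuncts of NAND_SIGNAL_BEH on sampled waveforms.
# Discrete time and the example delays/waveforms are assumptions.

def constant(t1, t2, s, c):
    """constant(t1,t2)(s)(c): s holds the value c throughout [t1, t2)."""
    return all(s(t) == c for t in range(t1, t2))

def nand_signal_beh(n1, n2, n3, n4, sx, sy, sz, t1, t2):
    """If an input is 0 on [t1,t2), sz must be 1 on [t1+n1, t2+n2);
    if both inputs are 1 on [t1,t2), sz must be 0 on [t1+n3, t2+n4)."""
    ok = True
    if constant(t1, t2, sx, 0) or constant(t1, t2, sy, 0):
        ok = ok and constant(t1 + n1, t2 + n2, sz, 1)
    if constant(t1, t2, sx, 1) and constant(t1, t2, sy, 1):
        ok = ok and constant(t1 + n3, t2 + n4, sz, 0)
    return ok

# Both inputs held at 1 on [0,10); the output falls to 0 after 2 time units.
sx = sy = (lambda t: 1)
sz = lambda t: 1 if t < 2 else 0

assert nand_signal_beh(2, 2, 2, 2, sx, sy, sz, 0, 10)
```

A waveform that never falls (a "stuck-at-1" output) would fail the same check, which is exactly the sense in which the parameters constrain the waveforms without yet being tied to any technology.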
4.2.1.1.1 Some Remarks

It is interesting to note that Hanna and Daeche, in their analysis discussed in [6], commence with a signal behavior for a NAND module that is very similar to the derived behavior NAND_SIGNAL_BEH above. They observe that this is quite a complicated definition (although, with familiarity, readily comprehensible); however, they argue that this complexity is intrinsic to a digital designer's notion of the behavior of a NAND gate. They also state, in the context of such examples: "We note that, in formulating this definition, several choices had to (be) exercised, as anyone starting off to frame such a definition ab initio will very soon discover! Thus, while we believe that the proposed definition


is a reasonable one, it is not unique: many subtle variations are possible." Our discussion in this section indicates how signal behaviors such as those assumed in [6] can be inferred from a description of the function to be computed, given certain assumptions about causality and stability criteria. Further, the notion of "approximations" introduced above (for reducing the number of independent delay parameters that characterise a module) rationalises the use of the seemingly arbitrary parameters assumed in [6].

4.2.1.2 Approximations depending on identifying control signals

Consider a "conditional" function defined as follows:

    cond(P,X,Y) = if P then X else Y

Such a function may be implemented by a multiplexer with ports p, x, y, z corresponding to the inputs P, X, Y and the multiplexer output, denoted Z. It is possible to view the output of this function as being "controlled" by the value of the predicate P, and to assume that the delay parameters depend on the value of sp, but not on the values of the signals sx and sy. Such an approximation leads to the causality constraints

    constant(t1,t2)(sp)(0) & stable(t1,t2)(sy) ⊃ constant(t1+d1^(P=0), t2+d2^(P=0))(sz)(sy(t1))
    constant(t1,t2)(sp)(1) & stable(t1,t2)(sx) ⊃ constant(t1+d1^(P=1), t2+d2^(P=1))(sz)(sx(t1))

This may be generalized to approximations where the set {D1,...,Dn, R1,...,Rm} is partitioned into two sets, say {O1,...,Op} and {V1,...,Vq}, where the delay parameters are modeled as being dependent only on the values of the signals in the set {O1,...,Op} and not on the values of the signals in the set {V1,...,Vq}. Note, however, that the signals in {V1,...,Vq} may be required to obey other (independent) constraints, such as stability constraints. The resulting causality constraints may then be formulated in a manner similar to the example above; we omit the formal statement for brevity.
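A sketch of the control-signal approximation for the multiplexer, checked on sampled waveforms (our illustration; the discrete time base, the unit delay values, and the waveforms are assumptions):

```python
# Sketch: causality constraints for cond(P,X,Y) when the delays depend
# only on the control signal sp. Values and waveforms are assumptions.

def constant(t1, t2, s, c):
    return all(s(t) == c for t in range(t1, t2))

def stable(t1, t2, s):
    return all(s(t) == s(t1) for t in range(t1, t2))

def mux_causality(d1_p0, d2_p0, d1_p1, d2_p1, sp, sx, sy, sz, t1, t2):
    """sp = 0 selects sy; sp = 1 selects sx; delays depend only on sp."""
    ok = True
    if constant(t1, t2, sp, 0) and stable(t1, t2, sy):
        ok = ok and constant(t1 + d1_p0, t2 + d2_p0, sz, sy(t1))
    if constant(t1, t2, sp, 1) and stable(t1, t2, sx):
        ok = ok and constant(t1 + d1_p1, t2 + d2_p1, sz, sx(t1))
    return ok

# Control held at 1 on [0,8): the output copies x (= 0) after one time unit.
sp = lambda t: 1
sx = lambda t: 0
sy = lambda t: 1
sz = lambda t: 1 if t < 1 else 0
assert mux_causality(1, 1, 1, 1, sp, sx, sy, sz, 0, 8)
```

Note that the "data" input being copied is only required to be stable, not to have a specific value; this is exactly the partition into control-dependent and value-independent signals described above.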

4.2.1.3 Approximations independent of values in the Domain and Range

In order to further reduce the complexity of the delay model, the delay parameters may be assumed to be independent of the actual values at the input and output.

Proposition 3. (Independent Delay Approximation). If it is assumed that the delay parameters characterising a module are independent of the values in the domain and range, then the causality constraints on the signals of a module implementing a function F: D → R have the form:


    stable(t1,t2)(sD) ⊃ constant(t1+d1^F, t2+d2^F)(sF)(F(sD(t1))) ■

4.2.1.3.1 Approximating by a single delay

In the extreme case, each output may be characterised by a single delay parameter under the assumption that d1^F = d2^F = d^F. Such a model may be acceptable in situations where a reduction in the complexity of the model is the primary consideration, and where a finer degree of timing detail is not important.

Proposition 4. (Single Delay Approximation). The causality constraints on the signals of a module implementing a function F: D → R, when a single parameter d^F is used to approximate the delay induced by the module, have the form:

    stable(t1,t2)(sD) ⊃ constant(t1+d^F, t2+d^F)(sF)(F(sD(t1))) ■

4.2.2 Summary

In this section, we considered four categories of approximations aimed at reducing the number of delay parameters used to characterise a module. Such approximations enable a reduction in the complexity of the resulting delay model that needs to be analyzed during subsequent stages of a design. The disadvantage stems from the fact that the actual values of the delay parameters need to be conservative, in that they must reflect the worst case behavior of the entire set of signal transitions that would otherwise have been treated individually. In essence, a reduction in the complexity of the delay model is obtained at the expense of having to use worst case approximations for the actual values of the delays.

It is important to note that the nature of the delay parameters introduced into the module description at this stage has no a priori correlation with any underlying technology or implementation mechanism. However, when a design is fleshed out, the actual values that are assumed by the delay parameters depend on the details of an implementation, and ultimately on the process parameters of the underlying technology. This relation will be discussed in subsequent sections. We next examine some further temporal constraints on the signals that arise from stability considerations.

4.3 Stability Criteria

It is important that a properly functioning circuit be devoid of any spurious and perhaps transient outputs (sometimes called "glitches"). If the durations d1^i and d2^i are non-zero, then it becomes necessary to impose restrictions on signal waveforms to prevent such glitches. Consider, for example, a transition from input state p to c to n. We will assume that each state transition goes through the state ⊥.⁴ Consequently,


let the input states be p, ⊥, c, ⊥, n during the intervals [t1,t2), [t2,t3), [t3,t4), [t4,t5), [t5,t6) respectively. Consider an output signal sF that is intended to carry the value of a computed function F. Causality constraints imply that the value F(p) be maintained until time t2+d2^(p,c) and that the value F(c) be maintained from time t3+d1^(p,c) until t4+d2^(c,n). If F(p) ≠ F(c), then these two intervals must not overlap, in order to prevent a conflict in the values the output signal is expected to assume. That is, we must have t3 + d1^(p,c) ≥ t2 + d2^(p,c), i.e.,

    t3 − t2 ≥ d2^(p,c) − d1^(p,c)

where t3−t2 defines the transition time between the states p and c. Further, for the output F(c) to be perceived at the output, we must have t3+d1^(p,c) ≤ t4+d2^(c,n), i.e.,

    t4 − t3 ≥ d1^(p,c) − d2^(c,n)

where t4−t3 defines the time for which the input is stable at the value c. Such a stability criterion may be formally stated as follows.

Proposition 5. Given a function F: D → R, the temporal stability constraints arising out of a consideration of stability criteria are:

    ∀ p,c,n ∈ D ∃ t1 < t2 < t3 < t4 .
        constant(t1,t2)(sD)(p) & constant(t2,t3)(sD)(⊥) & constant(t3,t4)(sD)(c) & F(p) ≠ F(c)
        ⊃ t3−t2 ≥ d2^(p,c) − d1^(p,c) & t4−t3 ≥ d1^(p,c) − d2^(c,n) ■

Of course, other implications, quite unrelated to stability criteria, may also follow from this precondition, but this does not concern us here. It should be noted, in reference to this example, that we have presumed that the outputs are expected to respond to steady state inputs, and that this imposes constraints on the time taken for transitions between steady states. If, on the other hand, the outputs are intended (or designed) to respond to transitions in the input states, such as in an edge-triggered device, then analogous reasoning yields constraints on the durations that inputs are required to maintain a steady state. Further, the approximations

4. Intuitively, this is justified by the fact that transitions between voltages that represent a logical 0 and voltages that represent a logical 1 must involve intermediate voltage values that may be viewed as corresponding to an "undefined" state denoted X (or ⊥).


discussed in the preceding subsection in connection with causality criteria may be applied to the discussions related to stability in many cases; we refrain from elaborating on these issues any further.

4.3.1 An Example: Causality and Stability criteria for an inverter

The function not computed by an inverter (Figure 2) is defined by:

    not(0) = 1
    not(1) = 0

We will denote the signals at the ports corresponding to the input and output by s_in and s_out. The causality considerations, in conjunction with a "domain" approximation (cf. Proposition 1), lead to the following temporal constraints:

    constant(t1,t2)(s_in)(0) ⊃ constant(t1+d1^0, t2+d2^0)(s_out)(1)
    constant(t1,t2)(s_in)(1) ⊃ constant(t1+d1^1, t2+d2^1)(s_out)(0)

For concreteness, consider the following values for the delay parameters (obtained by running SPICE with some default parameters): d1^1 = 3.05 ns; d2^0 = 0.65 ns; d1^0 = 3.3 ns; d2^1 = 0.35 ns. For an input transition from 0 to 1, the value d1^1 is defined to be the time for the output to stabilize at not(1), i.e. 0 (i.e., reach 10 per cent of VDD = 0.5 volts), once the input is stable at 1 (i.e., has reached 90 per cent of VDD = 4.5 volts). The value d2^0 is defined to be the time at which the output is no longer equal to not(0), i.e., no longer equal to 1 (i.e., is less than 4.5 volts), after the time when the input is no longer 0 (i.e., increases beyond 0.5 volts). Stability criteria then imply that

1. the input transition time from 0 to 1 be ≥ d2^0 − d1^1 = 0.65 − 3.05 = −2.4 ns, and that the input be stable at 1 for a period exceeding d1^1 − d2^1 = 3.05 − 0.35 = 2.7 ns;

2. the input transition time from 1 to 0 be ≥ d2^1 − d1^0 = 0.35 − 3.3 = −2.95 ns, and that the input be stable at 0 for a period exceeding d1^0 − d2^0 = 3.3 − 0.65 = 2.65 ns.

Thus, the input transition times are in fact unconstrained, since the inequalities are always satisfied by virtue of the right hand sides being negative. On the other hand, the stability criteria imply that the input is constrained to be stable for a period of at least (approximately) 2.7 ns for the output to drive another gate. It is interesting to note that this agrees well with observed empirical simulations that indicate that the output has glitches when the input pulse widths are less than about 3 ns.
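The stability bounds are simple differences of the delay parameters. The following sketch (our illustration) computes them from the SPICE-derived values quoted in the text:

```python
# Delay parameters for the inverter (values quoted in the text, in ns).
d1 = {0: 3.3, 1: 3.05}   # d1^v: output settles to not(v) once input is stable at v
d2 = {0: 0.65, 1: 0.35}  # d2^v: output stops holding not(v) once input leaves v

# Minimum transition time p -> c:  t3 - t2 >= d2^p - d1^c   (Proposition 5)
# Minimum hold time at c:          t4 - t3 >= d1^c - d2^c
def min_transition(p, c):
    return d2[p] - d1[c]

def min_hold(c):
    return d1[c] - d2[c]

assert round(min_transition(0, 1), 2) == -2.4   # negative: unconstrained
assert round(min_transition(1, 0), 2) == -2.95  # negative: unconstrained
assert round(min_hold(1), 2) == 2.7             # input must stay at 1 this long
assert round(min_hold(0), 2) == 2.65            # input must stay at 0 this long
```

The negative transition bounds confirm that the transition times are unconstrained, while the hold-time bounds reproduce the roughly 3 ns minimum pulse width observed in simulation.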


[Plot: inverter input (In) and output (out) voltage waveforms, between 0.0 V and 5.0 V]

Figure 2. Inverter Behavior

4.4 Summary

In summary, we have indicated in this section the nature of the constraints induced on signal waveforms by causality, functionality and stability considerations.

• Causality constraints arise from the assumption of finite delays between cause (the availability of inputs) and effect (the availability of computed outputs). Causality considerations serve to introduce various delay parameters that characterize signal waveforms; they do not constrain the actual values of these parameters. Initially, these delay parameters have no direct correlation with any implementation details. As a design evolves, however, such a correlation also evolves; ultimately, the delay parameters depend on the details of the system architecture and the process parameters of the underlying technology. A spectrum of approximation techniques was introduced to control the number of delay parameters explicitly modeled.

• The need to exhibit a specified functionality imposes constraints on the values output waveforms can acquire during certain intervals, given that the input waveforms have certain values during related intervals.

• The need to avoid conflicts between the constraints arising from causality and functionality considerations gives rise to stability constraints on signal waveforms that govern the durations of some intervals (as a function of the delay parameters).

5. Further Levels of Abstraction in System Timing

5.1 Partial temporal orders induced by resource allocation

Up to this stage, we have outlined the effect on system timing behavior due to causality, functionality and stability criteria by indicating how they


impose constraints on various signal waveforms. Much of this level of timing detail depends only on the specified functionality of the system, and not on any implementation decisions. As a design progresses, the details of the system architecture gradually evolve. We will next comment on the impact of architectural design decisions on system timing.

The fully unraveled data flow graph corresponding to a computation embodies much of the available parallelism. The only partial ordering (i.e., sequencing) amongst the subcomputations arises from essential data dependencies. Resource allocation constraints may, in turn, impose further partial temporal orders amongst the subcomputations involved. As a very simple example, the functional description of the computation (a+b)+(c+d) permits both of the additions a+b and c+d to be performed in parallel; however, if only one adder module is available, the two additions have to be done in some (in this case, arbitrary) sequence. Analogously, a shared data bus imposes constraints on concurrent bus accesses. The temporal partial orders amongst subcomputations may be reasonably described using primitives that are characteristic of temporal logic. Alternatively, they may be explicitly described by means of constraints that must hold on the times when specific events occur. As an example, we next consider the computations corresponding to two classes of commonly occurring expressions in a generic functional language (or even in an imperative language).

Notation. Given an expression e, let t_e^init denote the time at which the computation corresponding to the expression e is initiated, let d_e denote the time taken for this computation, and let t_e^valid denote the time till when the result of e will be valid (i.e., beyond which the result of e will be invalid, for example, due to the discharging of a node). Further, if e has the form f(e1, ..., en), and f is a "strict" function, i.e., a function whose evaluation requires all of its arguments to be evaluated, we will denote by t_f^init the time at which the computation of f is commenced, and by d_f the duration of this computation.

Consider first the case when all of the subcomputations {ei, i=1..n} involved in the computation of e = f(e1, ..., en) are independent and may therefore be evaluated in parallel. If f is indeed a strict function, its computation cannot be initiated until after all of the results of the computations corresponding to ei are available; further, its computation must be initiated before they become invalid, i.e., during the intervals when such results are valid. Consequently, the temporal partial orders induced on the subcomputations involved may be expressed as follows:

    t_f^init ≥ t_ei^init + d_ei, for all i in {1,..,n}


    t_f^init ≤ t_ei^valid, for all i in {1,..,n}

Consider now the case when architectural design decisions lead to resource constraints such that two functions, say f_i and f_j, both need to share a resource to which they must have mutually exclusive access. Then in the computation of an expression having the form e = f(..., f_i(e_i), ..., f_j(e_j), ...), it follows that f_i and f_j cannot be executed in parallel. Stated differently, f_j cannot be initiated until the execution of f_i has completed, or the other way around. Formally, this implies that the following constraint embodying this temporal ordering must be satisfied:

    t_fj^init ≥ t_fi^init + d_fi  or  t_fi^init ≥ t_fj^init + d_fj

The intent of these examples was to convey some intuition relating to the nature of the partial temporal orders on the subcomputations in a system that are induced by resource trade-off decisions during architectural design. These temporal orderings are in addition to the temporal dependencies implied by the data flow graph corresponding to the underlying computation.
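The effect of such resource constraints on the (a+b)+(c+d) example can be sketched with a greedy earliest-start scheduler (our illustration; the operation names, unit adder delays, and the greedy policy are assumptions, not from the paper):

```python
# Earliest-start scheduling sketch for e = (a+b) + (c+d): the outer
# addition depends on both inner sums; a shared adder also serializes
# the inner sums themselves.

def schedule(num_adders, d_add=1):
    ops = {"s1": [], "s2": [], "top": ["s1", "s2"]}  # op -> data predecessors
    t_init, busy_until = {}, [0] * num_adders        # one entry per adder
    for op in ["s1", "s2", "top"]:                   # topological order
        data_ready = max((t_init[p] + d_add for p in ops[op]), default=0)
        k = min(range(num_adders), key=lambda i: busy_until[i])
        t_init[op] = max(data_ready, busy_until[k])  # wait for data AND an adder
        busy_until[k] = t_init[op] + d_add
    return t_init

# Two adders: the inner sums run in parallel and the result is ready at t=2.
assert schedule(2) == {"s1": 0, "s2": 0, "top": 1}
# One adder: the inner sums are serialized and the result is ready at t=3.
assert schedule(1) == {"s1": 0, "s2": 1, "top": 2}
```

The initiation times produced by the scheduler satisfy both the data-dependency inequalities and the mutual-exclusion constraint stated above.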

5.1.1 Timing and Allocation Functions

Given a data flow graph of a computation, a resource allocation function details how the operators corresponding to the nodes in such a graph are allocated to hardware modules that implement these operators. A timing or scheduling function details when (i.e., in what order) the various operations are actually performed in these modules. Such a schedule is typically implemented in hardware by state machines that may either be centralised (in the "control" part of an architecture) or distributed (as in systolic architectures). One of the tasks in synthesis involves the choice of allocation and timing functions that minimize an appropriate cost function. For certain classes of data flow graphs, e.g., graphs corresponding to uniform recurrence relations [19], timing and allocation functions may be computed algorithmically. In general, however, heuristic techniques or global optimization techniques are needed. Heuristic or rule-based techniques often use local optimization techniques. Global knowledge can, however, be brought to bear on the solution process by using analyses of partially fleshed out designs. Since the computation graph is hierarchical, this hierarchy can be exploited in the corresponding analysis. Clustering techniques are useful in obtaining a measure of proximity between the various modules.

Consider a set of modules Mi that implement the functions Fi corresponding to the nodes in the computation graph. Let di be the delay associated with an execution of the module Mi (using a single delay model), and let d be the delay associated with the overall computation. One aspect of the resource allocation problem consists of choosing the number Ni of


modules Mi to be used in the actual implementation. In addition, the variables corresponding to edges need to be allocated to signals, buses, registers, or memory. The cost C of an implementation may be defined in terms of a multi-dimensional vector that includes criteria such as area (depending upon the Ni, the Mi, and the area for interconnections), the delays di, and d. The timing schedule governs the overall delay d, while the resource allocation affects the flexibility available in choosing a timing schedule.

The resource allocation problem may be viewed as a multi-dimensional microcode placement problem. In the simplest case, a subcomputation is allocated to grid location (i,j) if it is executed during "time step" ti and uses module Mj. The interconnection density and the number of buses required may be approximated by appropriate functions on a specific allocation on such a grid. Such a placement problem may be solved by using optimization techniques developed in this context, e.g., [13]. Global optimization methods such as simulated annealing have also been found to yield reasonable solutions in this context. The dimensionality of the grid can be increased to allow for other independent factors of consideration, such as conditional sharing of resources. In a similar vein, the scheduling problem is related to that of microcode optimization [4]. If explicit values for the delays di are known, it is possible to use solution techniques (for example, integer programming techniques) to solve for the variables t^init and obtain an admissible schedule of computations that obey the specified constraints [27]. We will not elaborate any further on this issue here.

We next discuss how design decisions relating to timing disciplines influence a further level of timing detail. This process establishes a relation between temporal partial orders on computations and the evolution of signal waveforms.

6. Timing Disciplines

6.1 Events and Computations

We define an event to be a predicate "of interest" defined on (one or more) signal waveforms. An example of an "interesting" event is the transition in the value of a signal, say from representing a boolean value of 0 to a boolean value of 1, or vice versa. The overall computation performed by a system may typically be hierarchically decomposed into "smaller" (sub)computations. An operation or subcomputation that is "meaningful" in the application domain may then be defined in terms of a sequence of relevant "lower level" events. Further,


P.A.Subrahmanyam

and, more important, the initiation and the completion of such a (sub)computation need to be signaled by specific events that are explicitly intended for this purpose. Examples of such distinguished events are transitions in clocks (in synchronous system designs), and transitions in request/acknowledge signals (in self-timed, or asynchronous, systems). A timing discipline is essentially a protocol that governs the manner in which the initiation and completion of (sub)computations is indicated. A timing discipline therefore relates a computation to an identifiable (set of) event(s) on some distinguished signal(s), such as clocks or protocol lines.

6.2 Elaboration of Time Intervals; timing disciplines
An additional level of timing detail in a design is introduced when a specific timing discipline is chosen. It relates temporal orderings amongst subcomputations to event predicates defined on signal waveforms. Formally, the "next time instance" is interpreted (defined) as being the next "real" time instance when some specific function (predicate) p: Time → Boolean becomes true. While we provide some simple (but typical) examples below, such a predicate may in general be any arbitrary predicate defined on waveforms. In this context, it is useful to define the function NextTime as follows:

NextTime: Time × Time × (Time → Boolean) → Boolean

where

NextTime(t1,t2)(p) ≡def t1 < t2 & p(t2) & for all t, t1 < t < t2 ⊃ ¬p(t).

Intuitively, NextTime(t1,t2)(p) says that t2 is the first instance at which the predicate p is true after time t1; note that it does not say anything about the value of p at t1. We now indicate how this notion supports both synchronous (clocked) and self timed timing disciplines. Consider, for example, a clocked latch that is used to implement a register. The predicate p may be interpreted as being true at the times "when the clock rises". If the function defining the clock behavior is clk: Time → Boolean, then the first instance after t1 when the clock rises next is described by NextTime(t1,t2)(Rising(clk)), where Rising(clk) is defined to be true at time t if the signal clk is true at time t and false just before. In general, given a function s: Time → LogicalSignalValues,5 Rising(s) is defined to be true at time t if s is true at time t and false just before. The functionality of Rising is

Rising: (Time → Boolean) → (Time → Boolean).

Consider now an asynchronous circuit implementation, using, say, a request-acknowledge protocol as in [8]. The predicate p may in this case be defined to be true when the value of the request signal makes the desired transition.6
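The definitions above can be exercised directly on traces. The following sketch assumes a discrete, integer-valued Time and boolean waveforms (an assumption, not the paper's continuous formalism); next_time and rising mirror NextTime and Rising, and the clock shape is purely illustrative.

```python
# A waveform is a function from integer time to bool.
# next_time(t1, t2, p) holds iff t2 is the *first* instant after t1 at which p is true.

def next_time(t1, t2, p):
    return t1 < t2 and p(t2) and all(not p(t) for t in range(t1 + 1, t2))

def rising(s):
    # Rising(s) is true at t if s is true at t and false just before.
    return lambda t: s(t) and not s(t - 1)

# An illustrative clock: high for 5 units out of every 10 (rises at t = 0, 10, 20, ...).
clk = lambda t: (t % 10) < 5

print(next_time(3, 10, rising(clk)))   # 10 is the first rise after 3
print(next_time(3, 20, rising(clk)))   # 20 is a rise, but not the first one after 3
```

The second call is false precisely because an earlier rising edge (at t = 10) falsifies the "for all t, t1 < t < t2, ¬p(t)" conjunct of the definition.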

6.3 The clocked register
Consider a latch or a register. At a reasonably abstract level, the functionality of the latch that is most relevant is that it "stores" a value presented at its input port, and makes this value available at its output port at the (abstract) next time instance. In other words, the register may be characterized by

Reg(in,out) ≡ always for all t (out(t+1) = in(t)).

Once a decision about timing disciplines has been made, it is possible to amalgamate this into the register definition as follows:

Reg(in,out)(clk) ≡ always for all t1, t2 (NextTime(t1,t2)(Rising(clk)) ⊃ out(t2) = in(t1))

Note here that the parameter clk is now explicitly manifest in the definition of Reg, and the dependence on time of the signals in and out has now become more explicit; all of the signals in, out, and clk have the functionality Time → Boolean.
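The clocked-register characterization lends itself to finite trace checking. The sketch below (a hypothetical discrete trace with assumed signal shapes, not the paper's formalism) tests the property NextTime(t1,t2)(Rising(clk)) ⊃ out(t2) = in(t1) over a bounded horizon.

```python
# Check the clocked-register property over a finite sampled trace.

def rising(s):
    return lambda t: s(t) and not s(t - 1)

def reg_property_holds(inp, out, clk, horizon):
    edge = rising(clk)
    for t1 in range(horizon):
        # t2 = first rising clock edge strictly after t1 (NextTime with Rising(clk))
        t2 = next((t for t in range(t1 + 1, horizon) if edge(t)), None)
        if t2 is not None and out(t2) != inp(t1):
            return False
    return True

clk = lambda t: t % 4 in (2, 3)            # rising edges at t = 2, 6, 10, ...
inp = lambda t: ((t - 2) // 4) % 2 == 0    # data stable between successive edges
out = lambda t: inp(t - 4)                 # output holds the previously sampled value

print(reg_property_holds(inp, out, clk, 16))                  # conforming register
print(reg_property_holds(inp, lambda t: inp(t), clk, 16))     # a wire is not a register
```

Note that the property only holds if the data input is stable over each inter-edge interval — a trace-level reflection of the setup requirement that reappears as the stability constraint in the flip-flop specification later in the paper.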

6.4 On synchronous and self-timed disciplines A timing discipline is often used in an attempt to minimize (ideally, to eliminate) timing errors that can arise due to races and hazards in circuits. Some of these errors cannot always be caught by simulation, particularly in tightly designed circuits, where there is a low tolerance for deviations in timing margins. Such deviations may arise due to fabrication process variations, thermal noise, etc. Two commonly used classes of timing disciplines are self-timed and synchronous (or clocked) timing disciplines.

5. The basic formalism we are using allows for multiple valued logic; however, we do not consider this any further in this paper.
6. An algebraic characterization of Seitz's "weak conditions" for self timed circuits may be used to prove the correctness of such protocols[20].


Self timed timing disciplines have the advantage of yielding designs that are delay independent, and consequently modules that use self-timed protocols are composable. This delay independence is typically achieved by adding auxiliary logic elements, such as Muller C elements, that aid in detecting when the preconditions needed for initiating a computation (event) are met. Special signals for indicating the initiation and completion of events, called request and acknowledge signals, are often used in self timed protocols[8]. The drawback with using self timed mechanisms is the added area and delay that is introduced by the auxiliary logic elements. Synchronous timing disciplines, on the other hand, use special global signals called clocks in order to sequence the (sub)computations in a system. The use of a synchronous timing discipline tends to localize timing problems that may be encountered to the segments between clocked elements. In technologies such as CMOS, however, there are typically fairly stringent constraints relating to overlaps of clock lines that need to be met. For example, in very tight designs the overlap between clocks and the rise and fall times of clock edges may need to be less than a gate delay. Such requirements are difficult to ensure as geometries are scaled down. Indeed, they are further exacerbated by the increasingly dominant role of capacitive and resistive effects that tend to contribute to clock skew and slower transition times. As a consequence, one is either forced to adopt a conservative circuit design discipline that provides larger margins for error, or invest considerable effort in fine tuning clocks, using large clock drivers (to improve rise and fall times) and providing clock buffers (to minimise skewing effects). The drawback associated with the latter technique is the increased area taken up by clocking structures; the drawback associated with the former is the increase in delay that is introduced by conservative design styles.
This drawback is partially compensated for in "dynamic" design styles in CMOS by exploiting the presence of a continuously running clock in various ways in order to avoid the necessity of fully complementary logic (thereby reducing area). However, a deleterious secondary effect of this strategy is to increase the power consumed by such circuits. Some common examples of dynamic circuit design paradigms in CMOS include: DOMINO logic[11], which uses only n-transistors for computation and provides non-inverting logic; NORA logic[12], which allows the use of both n- and p-type transistors and provides both inverting and noninverting logic; and the CLIO design style[17] that addresses a problem associated with the NORA technique, namely extreme sensitivity to clock transition and overlap times.

6.4.1 Trade-offs. Rough analysis[23] indicates that synchronous disciplines may yield comparatively faster response times when the size of the circuits is relatively small, and that self timed disciplines may be better for larger


circuits. In small circuits with total delays of approximately 10-15 "normalized" time units,7 relatively tight design margins can be used; as a consequence, the overhead associated with self timed design is not justifiable. However, in larger circuits with larger total delays, the timing margins required in a synchronous design style may tend to exceed the delay introduced by auxiliary elements needed for a self timed protocol[23]; moreover, the area overhead of such auxiliary elements tends to be relatively low in this context.

An improvement in technology may easily invalidate a very tight synchronous design, whereas this is less likely to be the case for a self timed circuit. Indeed, the clocking structures in a tightly designed synchronous circuit may have to be substantially modified when a technology is scaled down, whereas a completely self timed circuit will, as a consequence of its delay independence, yield somewhat more robust designs across improvements in technology.

6.5 Considerations in Choosing Timing Disciplines
The actual decision relating to the timing discipline used depends on the environment a system will be embedded in, constraints on interfacing protocols, envisioned system speed, possible clock skews, etc. Quite often, it may be desirable to design subsystems of a system in a "locally synchronous" fashion, while the interfaces between such subsystems may be self-timed[21]. Additionally, in some high performance systems, designers choose to adopt self timed protocols in view of the fact that the specifications of commercial parts (such as memories) are usually conservative: adopting a self timed interface, if possible, enables the design to exploit the actual working speed of the system, which may be better than its specified speed[9]. In the context of automated synthesis systems, such considerations may conceivably be accounted for in an agent dealing with timing disciplines. We refrain from delving into any further details on this issue.

6.6 On the parameters characterising an implementation
Once the appropriate timing disciplines have been decided upon, the circuits used to implement the functional modules can be detailed. The interdependencies between the signals in such a circuit may impose further constraints on the behavior of signals over time, e.g., a signal might be required to be stable over some period of time. If si, xi denote

7. Where the unit of time corresponds to the delay through an inverter that is driven by an equally sized inverter.


corresponding objects in an implementation domain (e.g., signals) and an abstracted domain (e.g., {0,1,x}), and Abs is an abstraction relation (function) that relates (maps) objects in the implementation domain to corresponding objects in the abstract domain, then a "consistent" implementation R' of an n-ary relation R in the abstract domain is typically homomorphic, i.e.,

Abs(R',R) & Abs(si, xi) ⊃ Abs(R'(s1,...,sn), R(x1,...,xn))

In general, however, in order for this condition to hold, certain additional constraints relating si, xi, R and R' may have to be satisfied, i.e.,

Abs(R',R) & Abs(si, xi) & constraints(si, xi, R', R) ⊃ Abs(R'(s1,...,sn), R(x1,...,xn))

Thus, certain constraints may be imposed on the values of the parameters characterising R and R'. Here, we will be primarily concerned with the relations that must hold between the timing parameters characterising the modules used in the implementation and the timing parameters characterising the desired behavior. We next illustrate the nature of such constraints and parameter dependencies.

6.6.1 An Example. Consider, as an example, the specification of an edge-triggered D-type flip-flop, shown in Figure 3 (cf.[6]). This device has 3 ports: a data input port (D), a data output port (Q), and a strobe or clock port (C). It is typically characterised in manufacturer's specification sheets by a set of (idealized) waveforms, and associated with certain parameters. Examples of such parameters are maximum clock rates, and minimum data setup and hold times, as well as maximum propagation times.

Figure 3: The behavior of a D-type flip-flop

Let s_clk, s_d, s_q denote the signals at the ports corresponding to the clock, the input and the output. A detailed, but rigorous, specification of the D flip-flop may have the following form, where the annotations within braces


provide an informal explanation of the associated expressions.8

DFF_SIGNAL_BEH(a1,a2,a3,a4,a5,a6,a7,a8)(s_clk, s_d, s_q) ≡
for all t0,t1,t2,t3,t4,t5,t6
constant(t0,t1)(s_clk)(0) {if the clock is low from t0 to t1}
& t1 - t0 ≥ a1 {and the minimum clock low time is a1}
& NextTime(t1,t2)(s_clk)(1) {and the clock rises at t2}
& t2 - t1 ≤ a2 {and the maximum clock rise time is a2}
& constant(t2,t3)(s_clk)(1) {and the clock remains high from t2 to t3}
& t3 - t2 ≥ a3 {and the minimum high time is a3}
& NextTime(t3,t4)(s_clk)(0) {and the clock falls at t4}
& constant(t4,t5)(s_clk)(0) {and the clock remains low from t4 to t5}
& NextTime(t5,t6)(s_clk)(1) {and rises again at t6}
& stable(t1-a5, t1+a6)(s_d) {and the data is stable for a period overlapping the clock rise time}
⊃ {then}
stable(t1+a7, t5+a8)(s_q) {the output remains stable during this period}
& s_q(t1+a7) = s_d(t1) {and has the same value as the data input at t1}

Figure 4: An implementation of a D flip-flop

A standard implementation of such a D-type flip flop is indicated in Figure 4. Formally, this may be described by the predicate IMPL_SIGNAL_BEH below. Recall that we denote the 4 parameters characterising a NAND gate by n; since the implementation uses only NAND gates, the implementation is also parameterised by n.

8. This is a variant of the definition of a flip-flop used in [6].


IMPL_SIGNAL_BEH(n)(s_clk, s_d, s_q, s1, s2, s3, s4, s5) ≡def
NAND_SIGNAL_BEH(n)(s2, s4, s1)
& NAND_SIGNAL_BEH(n)(s1, s_clk, s2)
& NAND3_SIGNAL_BEH(n)(s2, s_clk, s4, s3)
& NAND_SIGNAL_BEH(n)(s3, s_d, s4)
& NAND_SIGNAL_BEH(n)(s2, s5, s_q)
& NAND_SIGNAL_BEH(n)(s_q, s3, s5)

Of course, it needs to be shown that this implementation indeed exhibits the behavior of a D-type flip flop. It can be shown that proving this involves proving the following implication:9

IMPL_SIGNAL_BEH(n)(s_clk, s_d, s_q, s1, s2, s3, s4, s5) ⊃ DFF_SIGNAL_BEH(a1,a2,a3,a4,a5,a6,a7,a8)(s_clk, s_d, s_q)

It should of course be immediately obvious that the parameters a1,...,a8 characterising the flip flop must be related in some manner to the parameters n (i.e., n1,...,n4) characterising the NAND gates used in the implementation. It can indeed be shown that this configuration of NAND gates (i.e., the implementation) satisfies the desired behavior only if the parameters n and a are appropriately constrained. The normal manner in which such constraints may be computed is by accumulating the weakest conditions that need to be made in order for a proof of correctness to be carried out. A set of the constraints relating the parameters of the flip-flop and those of the NAND gate is given by the relation DFF_CONSTRAINTS described below, and is obtained in the course of the proof of this implementation described in [6].

DFF_CONSTRAINTS(a, n) ≡
a1 > 2n1 + 2n3
a1 ≥ a5 + n1
a3 > 2n1 + 2n3
n4 + n2 ≥ a4 + n1
a5 > 2n1 + n3
a6 > n1 + n3
a7 ≥ n1 + 2n3
n4 + n2 ≥ a4 + a5
n2 > 0

9. This amounts to approximating the homomorphism function by an implication. We refrain from commenting in detail here on the general nature of proofs that need to be carried out. A skeleton of a detailed proof that this implementation satisfies the behavior of the flip flop (as embodied in the predicate DFF_SIGNAL_BEH) may be found in [6].


It is possible to express these constraints as a function that defines the parameters a in terms of the parameters of the implementation primitives, i.e., n. We illustrate this below.

DFF_CHARACTERISTICS(n) ≡
a1 > 2n1 + 2n3
a1 - a5 ≥ n1
a3 > 2n1 + 2n3
a4 ≤ n4 + n2 - n1
a5 > 2n1 + n3
a6 > n1 + n3
a7 ≥ n1 + 2n3
a4 + a5 ≤ n4 + n2
n2 > 0

Given a set of constraints that need to be satisfied, two possibilities arise. First, the required set of constraints may be unsatisfiable, i.e., there may exist no combination of values for the timing parameters (n and a in this example) that are consistent with the given set of constraints. In such a case the proposed implementation is an incorrect one. Alternatively, a set of values satisfying the constraints and consistent with available devices may in fact exist, in which case the implementation may provide the specified behavior, subject to using devices that have the appropriate parameters.
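The satisfiability question above can be tested mechanically. A minimal sketch follows, with the constraint relation transcribed as a boolean predicate; the numeric NAND parameters n1..n4 and the candidate assignment a1..a8 are assumed, illustrative values, not taken from the paper.

```python
# DFF_CONSTRAINTS as a predicate: n = (n1,n2,n3,n4) are assumed NAND-gate
# timing parameters; a = (a1,...,a8) is a candidate flip-flop parameterization.

def dff_constraints(a, n):
    a1, a2, a3, a4, a5, a6, a7, a8 = a
    n1, n2, n3, n4 = n
    return (a1 > 2*n1 + 2*n3 and a1 >= a5 + n1 and
            a3 > 2*n1 + 2*n3 and n4 + n2 >= a4 + n1 and
            a5 > 2*n1 + n3 and a6 > n1 + n3 and
            a7 >= n1 + 2*n3 and n4 + n2 >= a4 + a5 and
            n2 > 0)

n = (1, 2, 1, 10)                # hypothetical gate parameters
a = (6, 1, 5, 8, 4, 3, 3, 1)     # one admissible assignment for this n
print(dff_constraints(a, n))     # candidate satisfies every constraint
```

Shrinking the data-setup parameter a5 below 2n1 + n3 makes the predicate false — the "unsatisfiable with available devices" branch of the dichotomy above.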

7. What should the clock width be? Clocking disciplines
Given the semantics of an architectural description, its combinational and sequential components can be identified. The timing delays introduced by the elements in the combinational component of the circuit may then be computed. Such a delay may either be already available as an attribute of the circuit element (obtained, say, from a standard library), or computed from the layout details, given appropriate circuit timing models and the process fabrication parameters. The delays along various paths through the combinational circuit component can consequently be computed, and the worst case delay obtained. Existing critical path analysis tools, such as those found in [5,25], can be used in this context. In synchronous systems, this worst case delay typically tends to limit the speed of the overall system, by setting a limit on the minimum clock width that can be safely used. At this stage of development of the design, it may either be the case that the delays involved are acceptable (i.e., consistent with the desired timing requirements) or that they are not. In the latter case, two of the alternatives that may be considered are (i) complete architectural redesign at a higher level, or (ii) "local" architectural reconfigurations in the portions of the circuit that contain critical paths. In a synchronous design, it might be the case that (i) much of the circuit may have acceptable delays, except


for a few "critical" paths; or that (ii) a certain operation must be performed in a stipulated interval (say, a single virtual clock period), but that the design style adopted to implement it (such as pre-charged/dynamic logic), or a pre-existing subsystem being used for this operation (e.g., an off-the-shelf FIFO chip), consists of a sequence of sub-operations that need to be initiated by clock pulses. Such situations lead to the introduction of additional clocks (or clock phases). Whether these clock phases are distributed globally or regenerated locally is governed by trade-offs such as the clock skew that may arise versus the area needed to regenerate the phases, etc. Again, such decisions need to be made by an agent that is concerned with clocking disciplines. If the actual performance of an implementation is deemed unsatisfactory, the architectural design has to be suitably modified.10

8. Improving performance
Given a gate-level architecture for a module, it is typically possible to improve, within bounds, the response time of a circuit corresponding to this architecture. This involves reasoning about system timing details at even finer levels. The combinational parts of a circuit are of primary concern in the context of synchronous designs, since they determine the minimum clock width possible, and thereby govern the overall performance of a circuit. Performance improvements are usually guided by an appropriate circuit model. The correlation of the actual improvements obtained to the projected improvements derived from a model is obviously dependent on the accuracy of the model employed. We next indicate the flavor of some RC models that may be used to fine tune the performance of circuits; our intent here is neither to be exhaustive, nor to prescribe any specific set of models, but merely to indicate some of the possibilities.

8.1 Estimating the number of stages and the gain per stage
The delay incurred in each stage of a typical MOS circuit is directly proportional to the capacitive load driven, and inversely proportional to the current produced by the driver. While the current delivered by any stage can be increased by increasing the size of the transistors used by that stage (i.e., by widening the transistors used), this also has the effect of increasing the effective capacitance observed at the inputs of this stage, thereby slowing down the preceding stage.

10. Throughout this paper, the word "performance" refers only to the timing characteristics of a module; other attributes, such as area and power consumption, are explicitly used when needed.


A technique is needed, therefore, that enables a rapid computation of the optimal number of stages in a circuit, and further, the sizes of the individual transistors in each stage of such a circuit. Roughly speaking, if G is the total amplification required, d is the total delay incurred, and there are n stages in the circuit, it can easily be shown that the overall circuit delay is minimized by dividing the overall delay equally between the stages. Each stage would provide G^(1/n) of the amplification, and contribute d/n to the delay, where d/n is roughly proportional to G^(1/n). Thus, the total delay through the circuit is proportional to n*G^(1/n), and an estimate for the optimal value for the number of stages, for a specific value of G, is given by that value of n for which n*G^(1/n) = (n+1)*G^(1/(n+1)). It is therefore necessary to have a technique for computing the overall gain G that is required, and to provide appropriate guidelines for restructuring a circuit should an increased number of stages be required. When additional stages are added to a circuit, it is simplest to add them in the form of inverters, while making sure that proper functionality is maintained[24]. Such inverters provide the necessary electrical amplification while minimally affecting the logical function. It is of course possible to restructure the circuit while increasing the number of stages; while the restructured circuit may appear quite different, it is still performing basically the same function. We next provide examples of two such techniques. The first example indicates how performance improvements may be obtained by using conventional logic optimizations in conjunction with analysis techniques based on RC models of MOS circuits. Such an analysis technique accommodates the potential alteration of circuit structures, but does not directly assist in indicating when and how such a modification may be performed.
The second example indicates how multiple stages of amplification may be used in order to provide an improvement in the performance of the circuit.
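The balance condition n*G^(1/n) = (n+1)*G^(1/(n+1)) can be evaluated directly. The sketch below (an illustration, not the paper's procedure) simply minimizes the normalized delay over n, recovering the familiar result that the optimal per-stage gain approaches e as G grows.

```python
def optimal_stages(G, n_max=50):
    # Total normalized delay is proportional to n * G**(1/n); pick the n that
    # minimizes it (equivalently, the point where adding a stage stops helping).
    return min(range(1, n_max + 1), key=lambda n: n * G ** (1.0 / n))

for G in (10, 100, 1000):
    n = optimal_stages(G)
    print(G, n, round(G ** (1.0 / n), 2))   # per-stage gain G**(1/n)
```

For G = 1000 this yields seven stages with a per-stage gain of about 2.7, close to e, matching the approximation n ≈ ln G.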

8.2 Examples of logic optimization and cascaded gain stages
As indicated above, performance improvements may be obtained by logic optimizations and by cascading multiple gain stages. Detailed RC models may be used to compute the gain required per stage, depending on the attributes one desires to optimize, such as power dissipation or size. For concreteness, we first summarize computations relating to two examples, drawn from [26].

8.2.1 Restructuring a circuit for improved gain. Consider the implementation of an 8-input AND gate. Three alternative circuits for this purpose are shown in Figure 5. These consist of an 8-input NAND gate with an inverter; two 4-input NANDs and a 2-input NOR; and four 2-input


NANDs, two 2-input NORs, and an inverter. Using a straightforward RC model, it can easily be shown that the approximate average of the rise and fall times of an m-input NAND/NOR gate, normalised against an inverter's average rise and fall time, is a function of m, C_D, and C_L, where C_D = capacitance of a unit drain area, and C_L = load capacitance on the gate owing to routing and fanout. This expression indicates that the best circuit for implementing an 8-input AND gate is circuit 3 for small to medium loads (e.g., C_D = 0.5*C_L), while for dominant values of C_L, circuit 2 performs better.


Figure 5. Three alternative circuits for an 8-input AND gate

8.2.2 Cascading gain stages. As a second example, consider the implementation of an XOR gate using a 2-input NAND gate, an OR-AND gate, and an inverter; this circuit is shown in Figure 6. If the average stage ratio (i.e., the ratio by which successive transistor widths are multiplied) is denoted G, the average delay through the series of gates can be shown to be

T_AVERAGE = 0.5[2*G*R_n*C_L + G*R_p*C_L + 2*G*R_n*C_L + 2*G*R_p*C_L + (R_n*C_LOAD)/G^2 + (R_p*C_LOAD)/G^2]

where C_L is the load capacitance in each individual stage; C_LOAD is the final load capacitance; R_n and R_p are the individual stage n- and p-transistor resistances. Depending upon the value of C_LOAD/C_L, the average stage ratio that is optimal for this circuit may be determined. In this particular


Figure 6. Cascading gain stages

example, G = [C_LOAD*(R_n+R_p)/(2*R_n*C_L + 1.5*R_p*C_L)]^(1/3). For example, if C_LOAD = 100*C_L, G = 5 yields a circuit that is almost twice as fast as one in which G = 1. Once a transistor-level design has been finalized, performance may be improved locally by transistor resizing. A viable technique for transistor resizing is the one adopted in TILOS[5]; another alternative is the technique discussed in [18]. If A is the sum of transistor sizes and T the longest delay through the circuit and it is desired to minimize metrics such as A subject to T
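The closed-form stage ratio above follows from minimizing T_AVERAGE over G. A numeric sketch follows, with assumed device values (the particular R_n, R_p, C_L are illustrative, not from the paper):

```python
# Evaluate the average-delay model for the cascaded stages and the
# closed-form optimal stage ratio G derived from it.

def t_average(G, Rn, Rp, CL, CLOAD):
    return 0.5 * (2*G*Rn*CL + G*Rp*CL + 2*G*Rn*CL + 2*G*Rp*CL
                  + Rn*CLOAD/G**2 + Rp*CLOAD/G**2)

def g_optimal(Rn, Rp, CL, CLOAD):
    # Setting dT/dG = 0 gives G^3 = CLOAD*(Rn+Rp) / (2*Rn*CL + 1.5*Rp*CL).
    return (CLOAD*(Rn + Rp) / (2*Rn*CL + 1.5*Rp*CL)) ** (1.0/3.0)

Rn, Rp, CL = 1.0, 2.0, 1.0      # assumed stage resistances and load
CLOAD = 100 * CL                # final load, as in the paper's example
g = g_optimal(Rn, Rp, CL, CLOAD)
speedup = t_average(1, Rn, Rp, CL, CLOAD) / t_average(g, Rn, Rp, CL, CLOAD)
print(round(g, 2), round(speedup, 2))   # optimal ratio and gain over G = 1
```

With these assumed values the optimal ratio is near 4 and the speedup over G = 1 is severalfold; with the paper's device values the corresponding figures are G = 5 and roughly a factor of two.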

IOR(t + D_MARE^rise + D_Memory^access) = Memory(SystemBus(t))

In words, this says:


IF
• the enable signal MARE is raised at time t (and takes time D_MARE^rise to rise)
• and the system bus maintains a valid address for the required hold interval following t + D_MARE^rise
• and a memory READ is requested by holding the ReadWrite signal at 1 during this interval

THEN
• the word at the address specified on the system bus will be made available in the register IOR after a delay D_Memory^access following the time when the enable signal MARE becomes high.

Note that a somewhat more complicated memory system accommodating the notion of "fast" and "slow" memory fetches can also be specified; however, we do not consider this system here. Given the above assumptions relating to the behavior of the memory subsystem, we next describe the desired behavior of the GetNextWord operation, and analyze the evolution of a plausible implementation.

9.1.2 Desired Behavior of GetNextWord. Informally, the objective of the GetNextWord operation is to fetch the word in memory at a computed (effective) address, and store this word either in the register IOR or in the register file. For concreteness, we assume that the word is stored in a register called IOB (for the I/O Buffer) that is in the register file. We also assume that the effective address is calculated by adding the contents of the register denoted PC (for the program counter) and the register denoted OA (for offset address); further, both these registers are assumed to be in the register file.

The behavior desired of the GetNextWord function may be described as follows:

GetNextWord(RegFile, Memory) ≡ Next IOB = Memory[EffectiveAddress]

which basically reflects the fact that, following the execution of the GetNextWord operation, the content of the register IOB is the same as the content of memory at the computed effective address.

9.1.3 A computation method. A straightforward algorithm for achieving the desired behavior involves the following steps:

• Compute the effective address by adding the contents of the registers PC and OA in the register file;


• Issue a read operation to the memory subsystem;
• Write the result of the memory read operation into the IOB register.

Functionally, this may be expressed by

Write(RegFile, IOB, Read(Memory, EffectiveAddress(RegFile)))

where

EffectiveAddress(RegFile) = Add(Read(RegFile,PC), Read(RegFile,OA))

Alternatively, using more conventional imperative notation, the same computation may be expressed as:

RegFile[IOB] := MemoryRead[RegFile[PC] + RegFile[OA]]
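The functional expression can be animated over simple finite maps. The sketch below is a toy model (dict-based stores with hypothetical contents), not the paper's formal semantics; note the purely functional update, which is what exposes the potential parallelism discussed next.

```python
# Toy model of the register file and memory as dictionaries.

def read(store, addr):
    return store[addr]

def write(store, addr, value):
    return {**store, addr: value}      # functional (non-destructive) update

def effective_address(regfile):
    return read(regfile, "PC") + read(regfile, "OA")

def get_next_word(regfile, memory):
    # Write(RegFile, IOB, Read(Memory, EffectiveAddress(RegFile)))
    return write(regfile, "IOB", read(memory, effective_address(regfile)))

regfile = {"PC": 4, "OA": 2, "IOB": 0}
memory = {6: 0xBEEF}                   # word stored at address PC + OA = 6
print(hex(get_next_word(regfile, memory)["IOB"]))
```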

9.1.4 Temporal Consequences of a computation method. The choice of this algorithm implies that the effective address must be computed prior to the memory read operation.11 However, the two reads from the register file may potentially be concurrent. Finally, the write operation into IOB follows the memory read. This partial order is embodied by the graph shown in Figure 8.


Figure 8. Temporal ordering induced by the algorithm

9.1.5 Resource Allocation Decisions. Hardware resources are needed for implementing (i) the read operation on the register file, (ii) the Add operation, (iii) the processor-memory interface, and (iv) the memory subsystem. We assume the following design decisions pertaining to the resource allocation:

11. The actual temporal orders induced are of course independent of the notation in which the algorithm is expressed. However, a functional notation has the advantage of providing a more explicit representation of the potential parallelism.


• The logic associated with the register file allows two simultaneous reads onto the two input ports of the ALU;
• The ALU is capable of performing the addition operation;
• A single system bus is shared by the microprocessor and the memory subsystem;
• The data and address are multiplexed on the system bus.

The first two allocation decisions do not impose any additional temporal constraints beyond the ones shown in Figure 8, since they allow the full parallelism permitted by the algorithm choice. However, because a single bus is used for carrying both the data and the address (and because the system bus may be shared by other subsystems), a protocol is needed for obtaining access to the system bus. We assume a simple protocol here that uses a distinguished signal denoted BusRequest for staking claim to the bus. This protocol stipulates that if the microprocessor wants to gain control of the bus, it should raise the BusRequest signal high. The set of temporal partial orders arising as a result of the resource allocation constraints is depicted in Figure 9.


Figure 9. Temporal orders due to resource constraints (multiplexed bus)

9.1.6 Timing Requirements for the operation GetNextWord. In the computation shown in Figure 8, there are 3 steps involved.

We assume that the mandated requirement is that the computation of the effective address be completed during the first major cycle, and that the memory read and store into the register IOB be completed during the next major cycle. This is indicated in Figure 11. The requirement may be formally stated as follows, where


Instruction(t) denotes the instruction in the instruction register at time t:

[GetNextWord Timing Requirement]
Rises(t)(φ1) & Instruction(t) = GetNextWord ⊃
Bus(t+Tp) = Add(RegFile[PC](t), RegFile[OA](t))
& IOB(t+2*Tp) = Memory[RegFile[PC](t) + RegFile[OA](t)](t)

The constraints implied by this protocol are that the delay incurred by the computation of Add(RegFile[PC](t), RegFile[OA](t)) is less than Tp, and that the memory access (and store into IOB) from the time the address is available on the system bus takes less time than Tp.

9.1.7 Timing/Clocking Discipline. The clocking scheme adopted in a system typically corresponds to some of its "naturally occurring" logical operations. In the case of a microprocessor, the basic logical interval of concern is the microprocessor's microinstruction cycle. Depending upon the details of the processor, however, a microcycle may be further divided into two, three, or more phases. We assume here, for the sake of illustration, a three phase cycle.12 In the data path, we assume that the register file is read during phase one, that the ALU performs its operation during phase two, and that the result is written onto the register file (or multiplexed onto the system bus) during phase three. The control path may, for instance, update the microaddress latch during the first phase, access the AND plane during the second phase, and access the OR plane and update the microinstruction register during phase three. We next describe the behavior assumed of a 3-phase clock, and relevant aspects of the control path behavior.

9.1.7.1 Three Phase Overlapping Clock Behavior. The behavior of a three phase overlapping clock with overlap and single-phase periods of To and Ts each is shown in Figure 10, and may be described as follows:

12. A 2 phase nonoverlapping clock is sometimes difficult to guarantee in tightly designed high performance systems, while a 2 phase overlapping clock leads to overly stringent timing constraints that restrict both the minimum and the maximum path delays. On the other hand, a three phase overlapping clock is relatively easy to generate, and does not lead to unreasonable constraints provided that the three phases do not simultaneously overlap during any one time interval.


for all t:

Rises(t)(φ1) ⊃ constant(t, t+Ts+2*To)(φ1)(1) & Falls(t+Ts+2*To)(φ1) & Rises(t+Ts+To)(φ2)
Falls(t)(φ1) ⊃ constant(t, t+2*Ts+To)(φ1)(0) & Rises(t+2*Ts+To)(φ1)
Rises(t)(φ2) ⊃ constant(t, t+Ts+2*To)(φ2)(1) & Falls(t+Ts+2*To)(φ2) & Rises(t+Ts+To)(φ3)
Falls(t)(φ2) ⊃ constant(t, t+2*Ts+To)(φ2)(0) & Rises(t+2*Ts+To)(φ2)
Rises(t)(φ3) ⊃ constant(t, t+Ts+2*To)(φ3)(1) & Falls(t+Ts+2*To)(φ3) & Rises(t+Ts+To)(φ1)
Falls(t)(φ3) ⊃ constant(t, t+2*Ts+To)(φ3)(0) & Rises(t+2*Ts+To)(φ3)

The above specification ignores the rise and fall times of the individual clocks; this may be explicitly modeled if necessary, but we refrain from doing so here. The "major" cycle has a period Tp = 3*(To+Ts). Each phase has a high time of Ts+2*To, and a low time of 2*Ts+To.
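These rules can be sanity-checked numerically. The Python sketch below is ours, not part of the chapter; the values chosen for Ts and To are illustrative. It generates the waveform each phase would follow under the rules above and confirms the stated period, high and low times, and the requirement that the three phases never overlap simultaneously.

```python
# Sketch (ours): check the three-phase clock timing above with
# illustrative durations. Ts = single-phase period, To = overlap period.
Ts, To = 3, 1
Tp = 3 * (To + Ts)  # the "major" cycle period

def phase(i, t):
    # phase i (0, 1, 2) rises at i*(Ts+To) and stays high for Ts+2*To,
    # matching Rises(t)(phi_i) => constant(t, t+Ts+2*To)(phi_i)(1)
    return ((t - i * (Ts + To)) % Tp) < (Ts + 2 * To)

for i in range(3):
    high = sum(phase(i, t) for t in range(Tp))
    assert high == Ts + 2 * To           # high time of each phase
    assert Tp - high == 2 * Ts + To      # low time of each phase
    assert phase(i, 0) == phase(i, Tp)   # periodic with period Tp

# successive phases overlap, but all three are never high at once
for t in range(Tp):
    assert not (phase(0, t) and phase(1, t) and phase(2, t))
```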


Figure 10. A 3 phase overlapping clock

9.1.8 Assumptions relating to the Control Unit Behavior

We assume that the output of the microinstruction register MIR is stable for a period Tp, beginning with the time when φ1 rises. The register MIR contains the subfields [R1,R2,W,OP,CNTRL]. These fields denote the registers R1,R2 to be read from the register file (during phase 1); the operation OP to be performed by the ALU (during phase 2); the gating control CNTRL for the multiplexer indicating whether IOB or ALU is to be gated onto the multiplexer output; and the register W in the register file that is written with the value on MuxOut (during phase 3). If the functions NextMIR and NextState define the next microinstruction and next state (as a function of the current state and the value on MuxOut), then the behavior of the control unit may be described as follows:


Control(State) =def
    Rises(t)(φ1) & stable(t, t+Ts+To)(MuxOut) ⊃
        constant(t+Tp, t+2*Tp)(MIR)(NextMIR(State, MuxOut)) &
        constant(t+Tp, t+2*Tp)(State)(NextState(State, MuxOut))

The actual details of the functions NextMIR and NextState are not of immediate concern to us. We further assume the existence of an "Instruction Register" somewhere in the control that contains the next "main" instruction to be executed. The content of this register at time t is denoted Instruction(t).

9.1.9 Hierarchical temporal decomposition

We restrict our discussion in the remainder of this section to a consideration of the operation GetNextWord. As discussed earlier, the execution of the GetNextWord instruction involves subcomputations whose details may be hierarchically considered. The protocol for each of the subcomputations shown in Figure 9 can be enumerated and their temporal implications analyzed using the techniques introduced earlier. The timing diagram in Figure 11 indicates some of the constraints generated in a graphical fashion. The temporal constraints arising out of a hierarchical analysis of the subcomputations involved in an execution of the operation GetNextWord are summarized below. We have assumed an independent delay approximation for the ReadRegFile, ALU and MUX operations, with the associated minimum and maximum delay constants being d_min^ReadReg, d_max^ReadReg; d_min^ALU, d_max^ALU; d_min^MUX, d_max^MUX respectively. d^BusOut denotes the time for the result from MuxOut to be driven onto the system bus; t5 is the time at which a bus request is initiated.

Assume Rises(t1)(φ1). Then:

    d_min^ReadReg > 0        d_max^ReadReg < Ts + 2*To
    d_min^ALU > 0            d_max^ALU, d_min^MUX, d_max^MUX, t1
> 0 then if value(in(s)) = true then c(s) else 0
     else if (2 * Q(s) + 2 * charge(in(s)) > c(s) + capacit(in(s)))
          then c(s) else 0;

axiom out(s) == if (Q(s) > 0) then wire(0,true,c(s),c(s))
                else wire(0,false,c(s),0);
end;

3.6 Transistor

type transistor;
declare s:T;
interface p(s):T;
interface in1(s),out1(s),in2(s),out2(s),incntl(s),outcntl(s):wire;
axiom out2(s) == if value(incntl(s)) = true then in1(s) else wire(0,unc,0,0);
axiom out1(s) == if value(incntl(s)) = true then in2(s) else wire(0,unc,0,0);
axiom outcntl(s) == wire(0,unc,0,0);
end;

Note that incntl and outcntl stand for the wires in the gate port of the transistor.

3.7 Join

The join has three ports, and the output wire of a port depends on the input wires of the other two roughly as follows: if the strength of one input wire dominates over the other, then its strength and value are carried over to the output wire under consideration. If the strengths are both equal and non-zero, and the values are different, then the value of the output wire is uncertain. If the strengths are both zero then the value of the output wire is true if the combined charge of both inputs exceeds half the combined capacitance. The charge and capacitance of an output wire are, respectively, the sums of the charges and capacitances of the input wires of the other two ports.

BIDS: A METHOD FOR BIDIRECTIONAL DEVICES


type join;
declare s:join;
declare a,b: wire;
interface in1(s),out1(s),in2(s),out2(s),in3(s),out3(s): wire;
interface sumcap(a,b), sumchg(a,b): Integer;
interface joinout(a,b): wire;
axiom sumcap(a,b) == capacit(a) + capacit(b);
axiom sumchg(a,b) == charge(a) + charge(b);
axiom joinout(a,b) ==
    if strength(a) > strength(b)
        then wire(strength(a),value(a),sumcap(a,b),sumchg(a,b))
    else if strength(a) < strength(b)
        then wire(strength(b),value(b),sumcap(a,b),sumchg(a,b))
    else if strength(a) > 0
        then if value(a) = value(b)
                 then wire(strength(a),value(a),sumcap(a,b),sumchg(a,b))
             else wire(strength(a),unc,sumcap(a,b),sumchg(a,b))
    else if 2 * sumchg(a,b) > sumcap(a,b)
        then wire(strength(a),true,sumcap(a,b),sumchg(a,b))
    else wire(strength(a),false,sumcap(a,b),sumchg(a,b));
axiom out1(s) == joinout(in2(s),in3(s));
axiom out2(s) == joinout(in3(s),in1(s));
axiom out3(s) == joinout(in1(s),in2(s));
end;
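The joinout resolution rule can be paraphrased executably. The Python sketch below is our own rendering (not Affirm-85); the string "unc" stands in for the uncertain value, and the branches mirror the axiom's cases.

```python
from collections import namedtuple

# Our executable paraphrase of the joinout axiom; "unc" marks an
# uncertain value, as in the wire type above.
Wire = namedtuple("Wire", "strength value capacit charge")
UNC = "unc"

def joinout(a, b):
    cap, chg = a.capacit + b.capacit, a.charge + b.charge
    if a.strength > b.strength:
        return Wire(a.strength, a.value, cap, chg)
    if a.strength < b.strength:
        return Wire(b.strength, b.value, cap, chg)
    if a.strength > 0:  # equal non-zero strengths
        v = a.value if a.value == b.value else UNC
        return Wire(a.strength, v, cap, chg)
    # both undriven: charge sharing decides the value
    return Wire(0, 2 * chg > cap, cap, chg)

# a strength-2 low driver dominates a strength-1 high driver
assert joinout(Wire(2, False, 0, 0), Wire(1, True, 0, 0)).value is False
# equal drivers that disagree yield an uncertain value
assert joinout(Wire(1, True, 0, 0), Wire(1, False, 0, 0)).value == UNC
# undriven wires: combined charge above half the capacitance reads true
assert joinout(Wire(0, True, 4, 3), Wire(0, False, 4, 2)).value is True
```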

4 Verification

Verification is carried out in Affirm-85 by

1. specifying the circuit in terms of its desired properties as an abstract data type (the behavioral specification),

2. specifying the actual internals of the circuit as an "implementation" of the above type (the structural specification), and

3. verifying the implementation (showing that the properties of the behavioral specification are satisfied by the structural specification).

Affirm-85 has commands that enable the user to generate the verification conditions (i.e., theorems that must be proved for verification) and carry through the proof interactively. The environment in which the verification is carried out would typically include the primitive types discussed earlier and

(possibly) other types that the user may have defined on his own. These user-defined types could in turn be behavioral specifications of circuits that were verified earlier. For large circuits, this allows us to break the problem down into several smaller ones in a structured manner; for once a circuit is verified, we only need its behavioral specification in later applications that involve this circuit as a component.

Figure 2: An Inverter Circuit

5 Examples

We now discuss the verification of three circuits, namely inverter, norgate and multiplexor ("mux"), using the BIDS model. We discuss them in that order because (a) they are of increasing complexity, and (b) the inverter is used in both norgate and multiplexor. Net capacitances are ignored in all three for simplicity.

5.1 Inverter

The inverter has two ports, say A and B, and can be behaviorally specified as follows: if the value of the input wire of port A is true then the output wire of B has strength 2 and value false; if the value is false, then the output wire of B has strength 1 and value true. The charge and capacitance values of the output wires of both ports are zero. Hence the type definition for the behavioral specification is


Figure 3: Norgate Circuit

type inverter;
declare s:T;
interface p(s):T;
interface in1(s),out1(s),in2(s),out2(s):wire;
axiom out2(s) == if value(in1(s)) = false then wire(1,true,0,0)
                 else wire(2,false,0,0);
axiom out1(s) == wire(0,unc,0,0);
end;

The circuit consists of a power source, a resistor, a transistor, a ground and a join (see Figure 2). The implementation is specified as

implement inverter;
interface tranA(s): transistor;
interface powerA(s): power;
interface resistorA(s): resistor;
interface groundA(s): ground;
interface joinA(s): join;
axiom in1(tranA(s)) == out(groundA(s));
axiom in(groundA(s)) == out1(tranA(s));
axiom in2(tranA(s)) == out1(joinA(s));
axiom in1(joinA(s)) == out2(tranA(s));
axiom incntl(tranA(s)) == in1(s);
axiom out1(s) == outcntl(tranA(s));
axiom in2(joinA(s)) == in2(s);
axiom out2(s) == out2(joinA(s));
axiom in3(joinA(s)) == out2(resistorA(s));
axiom in2(resistorA(s)) == out3(joinA(s));
axiom in1(resistorA(s)) == out(powerA(s));
axiom in(powerA(s)) == out1(resistorA(s));
axiom good(s) == true;
end;

Once all the necessary primitive types (namely, wire, join, transistor, resistor, power and ground) and the above specifications are entered, we can generate the verification conditions using a genimpvcs command (generate implementation verification conditions). The circuit is verified once these are proved correct.
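What the proof of these verification conditions establishes can be previewed with a small switch-level simulation. The sketch below is ours, not Affirm-85: the component models are simplified (a resistor drops strength by one, and charge and capacitance are omitted), but the composition reproduces the behavioral axioms of the inverter.

```python
# A switch-level preview (ours, not Affirm-85) of what the inverter proof
# establishes: ground pulled through the transistor fights the
# resistor-weakened power source at the join; the stronger side wins.
POWER = (2, True)    # (strength, value); capacitance/charge omitted
GROUND = (2, False)
FLOATING = (0, None)

def resistor(w):
    # assumption: a resistor passes the value but drops the strength by one
    s, v = w
    return (max(s - 1, 0), v)

def transistor(gate_value, w):
    # an n-type switch: conducts only while its gate is high
    return w if gate_value else FLOATING

def join(a, b):
    # the stronger driver determines the output (no conflicting ties here)
    return a if a[0] >= b[0] else b

def inverter(gate_value):
    return join(transistor(gate_value, GROUND), resistor(POWER))

assert inverter(True) == (2, False)   # matches axiom: wire(2,false,0,0)
assert inverter(False) == (1, True)   # matches axiom: wire(1,true,0,0)
```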

5.2 Norgate

There are two input ports, say A and B, and one output port. The behavioral specification is as follows:

type norgate;
declare s:T;
interface p(s):T;
interface in(s),out(s),ina(s),outa(s),inb(s),outb(s):wire;
interface uncertain(s): Boolean;
axiom outa(s) == wire(0,unc,0,0);
axiom outb(s) == wire(0,unc,0,0);
axiom out(s) == if value(ina(s)) = true then wire(2,false,0,0)
                else if value(inb(s)) = true then wire(2,false,0,0)
                else wire(1,true,0,0);
end;

The structural specification consists of an inverter, a transistor, a ground and a join (see Figure 3).

implement norgate;
interface join(s):join;
interface tran(s): transistor;
interface inv(s):inverter;
interface ground(s):ground;
axiom in1(join(s)) == out2(inv(s));
axiom in2(inv(s)) == out1(join(s));
axiom in2(join(s)) == out2(tran(s));
axiom in2(tran(s)) == out2(join(s));
axiom in3(join(s)) == in(s);
axiom out(s) == out3(join(s));
axiom in1(inv(s)) == ina(s);
axiom outa(s) == out1(inv(s));
axiom incntl(tran(s)) == inb(s);
axiom outb(s) == outcntl(tran(s));
axiom in1(tran(s)) == out(ground(s));
axiom in(ground(s)) == out1(tran(s));
axiom good(s) == true;
end;

Figure 4: Multiplexor Circuit

Note that during the proof we only need the behavioral specification of the inverter.
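This hierarchical reuse can be illustrated with a sketch of ours, in which the inverter appears only through its verified behavioral specification (all names and the simplified strength model are our illustration, not the Affirm-85 proof):

```python
# Hierarchy sketch (ours): the norgate reuses the already-verified
# inverter through its behavioral spec alone, plus one extra pull-down
# transistor on the shared output node.
def inverter_behavior(a):
    # the verified spec: output is (strength, value)
    return (2, False) if a else (1, True)

def norgate(a, b):
    inv_out = inverter_behavior(a)
    pulldown = (2, False) if b else (0, None)  # n-transistor to ground
    # join: the stronger driver wins; ties here always agree in value
    return inv_out if inv_out[0] >= pulldown[0] else pulldown

for a in (False, True):
    for b in (False, True):
        expected = (2, False) if (a or b) else (1, True)
        assert norgate(a, b) == expected
```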

5.3 Multiplexor

The circuit is shown in Figure 4. CONTROL is used only as input, but A, B, and C are used bidirectionally. The main behavioral specification of the device is that if the value of the CONTROL input is true, then A and C are connected; if it is false, B and C are connected. However, we found that we cannot assert this in its full generality; the above characterization of the mux/demux device holds good only when the strength of the CONTROL input is non-zero. If the strength of the CONTROL input is zero, then all that we can say is that one of the lines A, B will be connected to line C. Thus our behavioral specification only specifies


the case when the input strength is non-zero; when the strength is zero, it is "immaterial" which wire is connected to which.

type mux;
declare s:T;
interface p(s):T;
interface in(s),out(s),ina(s),outa(s),inb(s),outb(s),inc(s),outc(s):wire;
{"in" and "out" are the input and output wires of the CONTROL port.}
axiom outc(s) == if strength(in(s)) > 0
                 then if value(in(s)) = true then ina(s) else inb(s)
                 else wire(imm,unc,0,0);
axiom outa(s) == if strength(in(s)) > 0
                 then if value(in(s)) = true then inc(s) else wire(0,unc,0,0)
                 else wire(imm,unc,0,0);
axiom outb(s) == if strength(in(s)) > 0
                 then if value(in(s)) = false then inc(s) else wire(0,unc,0,0)
                 else wire(imm,unc,0,0);
axiom out(s) == wire(0,unc,0,0);
end;

The case when the input strength is zero can be given as a separate theorem and proved, and this is what we actually did.

theorem t#1, all s ((strength(in(s)) = 0) imp
                    ((outc(s) = ina(s)) or (outc(s) = inb(s))));

(imp stands for implies.) The structural description of the mux/demux circuit is in terms of two transistors and an inverter as shown in Figure 4.

implement mux;
interface tran1(s), tran2(s):transistor;
interface join1(s), join2(s):join;
interface inv(s):inverter;
axiom incntl(tran2(s)) == out2(inv(s));
axiom in2(inv(s)) == outcntl(tran2(s));
axiom in1(join1(s)) == in(s);
axiom out(s) == out1(join1(s));
axiom incntl(tran1(s)) == out2(join1(s));
axiom in2(join1(s)) == outcntl(tran1(s));
axiom in1(inv(s)) == out3(join1(s));
axiom in3(join1(s)) == out1(inv(s));
axiom in1(join2(s)) == out2(tran2(s));
axiom in2(tran2(s)) == out1(join2(s));
axiom in2(join2(s)) == out2(tran1(s));
axiom in2(tran1(s)) == out2(join2(s));
axiom in1(tran1(s)) == ina(s);
axiom outa(s) == out1(tran1(s));
axiom in1(tran2(s)) == inb(s);
axiom outb(s) == out1(tran2(s));
axiom in3(join2(s)) == inc(s);
axiom outc(s) == out3(join2(s));
axiom good(s) == true;
end;

We also found that the following assumptions are needed in the proofs:

assume a#1, all w (strength(w) >= 0);
assume a#2, all xx ((strength(in(xx)) = 0) imp
                    ((capacit(in(xx)) = 0) and (charge(in(xx)) = 0)));

where the types of w and xx are wire and mux respectively. The assumption a#1 merely states the obvious fact that a wire cannot have negative strength. Since strengths of wires can be only 0, 1 or 2 in our model, we could have given a stronger assumption, but we found that the above weaker form is enough. The second assumption is that if the input wire of the CONTROL port has no strength, then it can have no capacitance or charge associated with it either. Assumption a#1 is needed in the proofs of the verification conditions, whereas a#2 is needed in the proof of theorem t#1. It must be noted that the mux example needed more user interaction than the relatively simpler inverter and norgate examples. In larger examples even more user interaction may be needed. Finding ways of reducing the amount of interaction needed is a topic of ongoing research.

6 Conclusion

We have presented a method called BIDS for specifying and verifying low-level circuits. This method is distinguished from previous methods in that it allows a relatively complete treatment of issues such as bidirectionality and stored charge. Many circuits can be dealt with using simpler approaches, as evidenced by a number of examples in the recent literature, without the rather high overhead of specification and proof complexity that the BIDS method incurs. However, we feel that it is important to have available a method that can


handle bidirectionality and stored charge in full generality for those cases in which a designer has good reasons to take advantage of them. In any case, some of the complexity of the BIDS method could be reduced by having a preprocessor that constructs the set of connection equations from input in some simpler form. One of the most promising approaches is to construct the structural specifications from graphical input, and in fact such a facility has already been partially implemented as part of an Interactive VHDL Designer's Workstation under development at General Electric Corporate Research and Development Center. A graphical editor has been developed to enter structural circuit specifications, from which equations and other formulae for the theorem prover are generated (and the proofs have been carried out in several examples). Thus far only a simpler version of the state-transition approach has been used, but we plan to experiment soon with automatic generation of BIDS model equations from graphical circuit representations. Although Affirm-85 was used in the experiments described in this paper, we have since shifted our attention to RRL, a "Rewrite Rule Laboratory" currently being developed at CRD. RRL has a more powerful state-of-the-art theorem prover and is more naturally suited for reasoning with equations. The latter reason is quite important since equations play a central role in abstract data type specifications. Another feature of RRL is a better and more automatic way to prove theorems that require the use of induction. On the other hand, RRL currently lacks some of the proof interaction features of Affirm-85 that permit the user to provide guidance in cases where automatic proof attempts fail. We plan to adapt these Affirm-85 features for use in RRL.

References

[1] Affirm Reference Manual (D. H. Thompson and R. W. Erickson, eds.), USC Information Sciences Institute, Feb. 1981.
[2] Barrow, H. G., "Proving the Correctness of Digital Hardware Designs," VLSI Design, July 1984.
[3] Gordon, M., "A Very Simple Model of Sequential Behaviour of nMOS," Proc. VLSI International Conference, J. Gray (ed.), Academic Press, London and New York, 1981.
[4] Gordon, M., "Why Higher-Order Logic is a Good Formalism for Specifying and Verifying Hardware," Technical Report, University of Cambridge, Cambridge, U.K., 1985.
[5] Guttag, J. V., and Horning, J. J., "The Algebraic Specification of Abstract Data Types," Acta Informatica 10, 27-52 (1978).
[6] Guttag, J. V., Horowitz, E., and Musser, D. R., "Abstract Data Types and Software Validation," CACM 21(12), pp. 1048-1064, December 1978.
[7] Hunt, W. A., Jr., "FM8501: A Verified Microprocessor," Technical Report 47, University of Texas at Austin, December 1985.
[8] Moszkowski, B., "A Temporal Logic for Multilevel Reasoning about Hardware," IEEE Computer, February 1985.
[9] Musser, D. R., "Abstract Data Type Specification in the Affirm System," IEEE Transactions on Software Engg., Vol. SE-6(1), Jan. 1980.
[10] Musser, D. R., and Cyrluk, D. A., "Affirm-85 Reference Manual," General Electric Corporate Research and Development Center, Schenectady, NY 12301, August 1985.
[11] Sunshine, C. A., Thompson, D. H., Erickson, R. E., Gerhart, S. L., and Schwabe, D., "Specification and Verification of Communication Protocols in AFFIRM Using State Transition Models," IEEE Transactions on Software Engg., Vol. SE-8, No. 5, September 1982.

7

HARDWARE VERIFICATION IN THE INTERACTIVE VHDL WORKSTATION*

Paliath Narendran

G.E. Corporate Research and Development Center Schenectady, NY 12345 and Jonathan Stillman

State University of NY at Albany (Consultant to GE-CRD) Albany, NY 12222

1. INTRODUCTION

Formal hardware verification has gained a lot of attention recently as a viable alternative to simulation. Several notable achievements have been reported, such as Hunt [4] and Birtwistle et al. [1]. In this document we describe one aspect of an ongoing project aimed at developing an integrated environment for hardware description, simulation, and verification. We report here on the current efforts toward developing tools for formal verification of hardware and how these tools interact with other parts of the IVW workstation being developed at General Electric Research and Development Center (GE CRD). This is of necessity a preliminary document, as our work in developing an environment for highly automatic verification of hardware is at an early stage. There have been several ongoing activities on software specification, verification, and automated reasoning at the CRD. Of these, our activities have centered on the Affirm-85 verification system and the Rewrite Rule Laboratory theorem prover. Most of the experimentation at the initial stages of the project was done with the Affirm-85 system and will be reported in another paper (Musser, Narendran, and Premerlani [12]). Here we report on the work currently being done with the Rewrite Rule Laboratory (RRL). This work is part of the Interactive VHDL workstation project (IVW) under Air Force contract number F33615-85-C-1862.

* Work partially supported by Contract F33615-85-C-1862, AFWAL, Wright-Patterson Air Force Base, Ohio.


2. THE IVW WORKSTATION

The Interactive VHDL Workstation project (IVW) is aimed at building a prototype VHDL workstation environment for design capture. The designs can be entered textually as well as graphically, although use of graphic editors is encouraged. The IVW integrates several graphical editors and specialized expert systems with a common object-oriented design database, thus providing a more flexible medium for capturing design ideas than text entry systems. Once a design database is built, VHDL code can be automatically generated. Furthermore, a graphic editor, called the structure editor, is provided to maintain a correlation between several views of an object; changes made in one view are reflected in the others, both textually and graphically. (See Appendix C.) The design database enables the user to store his designs and recall them later in designing higher level circuits. The major goal of the hardware verification project is to provide the user with tools to (a) enter the behavioral description of the circuit, and (b) automatically verify whether the circuit satisfies the behavior. The connectivity information for the structural description is generated from the graphical input. Verification is done using the Rewrite Rule Laboratory, an experimental theorem prover currently being developed at GE CRD. Emphasis is placed on the following issues: (a) The user should have a fairly powerful and easy-to-use front end for entering circuit descriptions. Furthermore, this should be language-independent, and should allow the user to build his circuits up hierarchically. The structure editor and the underlying design database are designed with this goal in view. (b) The verification should be automatic, or as automatic as possible. Our current efforts have been directed toward freeing the user from any detailed knowledge of the workings of the theorem prover. This approach has certain limitations which will be explained later.
Note that currently we do not propose to build tools for the user to go directly from VHDL to RRL. As outlined in an earlier paper, work has also been done on modeling CMOS/NMOS circuits at the transistor level. The circuits are specified as abstract data types. We have not incorporated this into the verification framework at present. This will be done in the near future.


3. THE REWRITE RULE LABORATORY

The Rewrite Rule Laboratory is a theorem proving program based on equational reasoning. The main approach is to treat axioms, given as equations, as rewrite rules; the loss of bidirectionality that this results in is compensated for by adding new rules. The goal is to find a 'complete' system where every two equivalent terms can be rewritten to a single term which is called a 'normal form' or 'canonical form.' Once such a complete system is found, every theorem in our theory can be proved - the proof involves rewriting the terms to their normal forms and checking if they are the same. RRL works as follows: given a set of equational axioms, the program helps the user turn them into rewrite rules that are finitely terminating, which essentially means that rewriting will never go on indefinitely. This part of the program is highly interactive, since finite termination is established by setting certain precedences among the function symbols. The user is queried for these precedences. Searching for a complete set of rules is done using the Knuth-Bendix completion procedure (Knuth and Bendix [7]), which searches for ambiguities in rewriting by overlapping the left-hand sides of rules. Appendix A gives a typical RRL session where the input is the equational theory of groups. RRL also has a theorem prover for the full first-order predicate calculus where quantified formulae are allowed. The approach is refutational - the conclusion is negated and a contradiction is searched for. The proof procedure is equational in nature (Kapur and Narendran [6]). The formulae are represented as polynomials over the boolean ring whose operators are "exclusive-or" and "and." Searching for a contradiction is done using a procedure similar to the Knuth-Bendix completion procedure. For formulae without equality, this procedure is almost completely automatic. RRL has a fairly powerful method for proving inductive theorems about equational specifications.
This is called the "inductionless induction method" or "proof by consistency" (Kapur and Musser [5]). The approach is, in some sense, the dual of the "proof-by-contradiction" approach: the inductive theorem to be proved is added to the set of axioms, and if there is no inconsistency, we can assert the theorem to be true under certain conditions. However, this component of RRL needs more user interaction than the two mentioned before; in many cases the user has to coax the system into proving the theorem by proving several intermediate lemmas. The typing facility of RRL is very simple and primitive and still under development. Currently, variables are not typed; therefore the user cannot "overload" his functions. Apart from all this, RRL also gives the user the option to choose from several strategies in overlapping, rewriting, addition of new variables, etc. Statistics are provided about the times spent in most of the activities. The examples in the appendices illustrate many of these features.
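The boolean-ring representation mentioned above is easy to sketch. The Python fragment below is our illustration, far simpler than RRL's implementation: a formula becomes a set of monomials (each a frozenset of variables) combined with exclusive-or, and a formula is a tautology exactly when it normalizes to the polynomial 1.

```python
# Sketch (ours) of the boolean-ring ("xor"/"and") normal form used by the
# refutational prover: a polynomial is a set of monomials, each monomial a
# frozenset of variables; xor is symmetric difference, since p xor p = 0.
ONE = frozenset()  # the empty monomial is the constant 1

def xor(p, q):
    return p ^ q  # monomials appearing twice cancel

def mul(p, q):
    out = set()
    for m1 in p:
        for m2 in q:
            out ^= {m1 | m2}  # x*x = x, and duplicate monomials cancel
    return out

def var(name): return {frozenset([name])}
def neg(p): return xor(p, {ONE})                  # not p = p xor 1
def disj(p, q): return xor(xor(p, q), mul(p, q))  # p or q = p xor q xor pq

# "p or not p" normalizes to the polynomial 1, so it is a tautology
p = var("p")
assert disj(p, neg(p)) == {ONE}
```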


4. A MODEL FOR CIRCUIT DESCRIPTION AND VERIFICATION

Our approach to circuit description has been to provide a set of predefined elements as part of the structure editor's library; these elements have ports and behavioral descriptions, but no "internals". For instance, an "and" gate might be considered primitive at some point, with 2 input ports, 1 output port, and a behavioral description defining the output value as the logical "and" of the 2 input values. (Thus far we have restricted our behavioral descriptions to first-order predicate calculus formulae in order to exploit the first-order capabilities of RRL, but this is not an inherent limitation.) New circuits are built by specifying their ports, the components (from the library) that make up the internals of the new circuit, and how those components are connected to one another and to the ports of the circuit being created. Finally, a behavioral description must be entered to describe the input-output behavior of the new circuit. Each behavioral definition is broken up into three parts: initialization assumptions, operational assumptions, and the behavioral statement. Initialization assumptions are those properties which follow from the circuit having been properly initialized; that is, we may want to assume that the circuit is in some given state at the time we begin to examine it. Operational assumptions involve constraints placed on the circuit from without; they often involve restrictions on the possible states of the circuit, such as stability of input signals, constraints on the values that input signals may have, or the behavior of an external clock. The behavioral statement describes how the circuit is expected to behave given that the initialization and operational assumptions hold. Once this information has been entered, the new circuit becomes part of the design library and can be used as a component of other circuits.
Before we can be certain that the circuit behaves as specified, however, we must verify that the behavior specified indeed follows from the structure of the circuit. The structural specification of the circuit can be obtained by examining each of its components in turn, "instantiating" its behavioral description by substituting the actual names of the wires hooked up to the ports in the particular instance for the port names. This results in a set of axioms which describe how the components force the circuit to behave. Since we are using a refutational approach to verification, the behavioral description of the circuit is negated and entered as an axiom. The resulting set of axioms is submitted to RRL where the actual proof takes place.
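The instantiation step can be illustrated with a toy substitution. The sketch below is ours (and uses naive string replacement, where the real workstation works symbolically); the "nor" behavior and wire names anticipate the RS latch example that follows.

```python
# Toy illustration (ours): "instantiating" a component's behavioral
# description by substituting actual wire names for its formal port names.
NOR_BEHAVIOR = "all x (output(f(x)) equ (not (input1(x) or input2(x))))"

def instantiate(template, binding):
    # binding maps formal port names to the wires of one gate instance
    for port, wire in binding.items():
        template = template.replace(port, wire)
    return template

axiom1 = instantiate(NOR_BEHAVIOR, {"output": "q", "input1": "r", "input2": "qbar"})
axiom2 = instantiate(NOR_BEHAVIOR, {"output": "qbar", "input1": "s", "input2": "q"})
assert axiom1 == "all x (q(f(x)) equ (not (r(x) or qbar(x))))"
assert axiom2 == "all x (qbar(f(x)) equ (not (s(x) or q(x))))"
```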


5. EXAMPLES

5.1 The RS latch

A simple example, the RS latch, is shown below:

- Figure 1: an RS latch built from two cross-coupled "nor" gates, with inputs r and s and outputs q and qbar -

The latch is made up of two "nor" gates, connected as shown. Assuming a unit delay in propagating the input values of a gate to the output value, the "nor" gate has the behavioral definition:

all x (output(f(x)) equ (not (input1(x) or input2(x))))

where the variable x and the function f are used to express time (for instance, x is the initial time, f(x) is one time unit later, f(f(x)) is two time units later, etc.). When instantiated to reflect the actual names of the wires connected to the input and output ports of each instance of a "nor" gate, the axioms derived from the structural description of the latch are:

1. all x (q(f(x)) equ (not (r(x) or qbar(x)))), and
2. all x (qbar(f(x)) equ (not (s(x) or q(x))))

In this example the only initialization assumption is that the q and qbar values start out complements of one another. The operational assumptions are that the r and s input lines are stable long enough for the circuit to stabilize and that the


input lines are not simultaneously high. Finally, the behavior of the circuit given that the initialization and operational assumptions hold is given. Thus the behavioral definition of the latch is:

all x (((q(x) equ (not (qbar(x)))) and
        (all x (not (r(x) and s(x)))) and
        (r(x) equ r(f(x))) and (s(x) equ s(f(x))))
    -> ((q(f(f(x))) equ (s(x) or (q(x) and (not (r(x)))))) and
        (qbar(f(f(x))) equ (not (q(f(f(x))))))))

That is, if q and qbar start out complemented, and the r and s inputs are stable for two time units, then when the circuit stabilizes, the q and qbar outputs are still complements, and furthermore the value of q is high if and only if either the s input was high or q was already high and the r input wasn't high. The input to RRL consists of the two derived axioms and the negation of the behavioral definition. RRL runs the Knuth-Bendix completion procedure on this set of axioms; if an inconsistent system results, we may conclude that the behavioral definition is correct. (A full transcript of the proof is given in Appendix B.)
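The behavioral definition can also be checked exhaustively by simulation. The Python sketch below is ours, not the RRL transcript: it steps the unit-delay "nor" model for two time units over all allowed inputs and initial states and asserts the behavioral statement.

```python
# Simulation check (ours, not the RRL proof) of the latch's behavioral
# definition: two steps of the unit-delay nor model under stable,
# non-conflicting inputs restore qbar = not q and set q = s or (q and not r).
def step(r, s, q, qbar):
    # each nor gate propagates its inputs with a one-unit delay
    return (not (r or qbar)), (not (s or q))

for r in (False, True):
    for s in (False, True):
        if r and s:
            continue  # operational assumption: r and s never both high
        for q0 in (False, True):
            qbar0 = not q0  # initialization assumption
            q1, qbar1 = step(r, s, q0, qbar0)
            q2, qbar2 = step(r, s, q1, qbar1)  # inputs stable for two units
            assert q2 == (s or (q0 and not r))
            assert qbar2 == (not q2)
```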

5.2 A CMOS full adder

In the above example, gates are considered to be primitive since no description of their architecture was entered. At another point, however, we might choose to use transistors as the primitive level, and would thus have to build gates from them. In the next example we show how one might use transistors as the primitive level, building a full adder circuit using transistors as the components. The two types of transistors we use are the n-transistor and the p-transistor. A description of these is given in the figure below:

p-transistor behavior: all x (not(a(x)) -> (b(x) equ c(x)))

n-transistor behavior: all x (a(x) -> (b(x) equ c(x)))

- Figure 2 -


One way of implementing a full adder in CMOS is shown in the figure below (Gordon [3]):

- Figure 3 -

Given such a circuit description we would like to ascertain whether it corresponds to our conception of what a full adder does. The behavioral definition follows:

(all x ((all x (power(x) and (not (ground(x))))) ->
    ((sum(x) equ (a(x) xor b(x) xor c-in(x))) and
     (c-out(x) equ ((a(x) and b(x)) or (a(x) and c-in(x)) or (b(x) and c-in(x)))))))

In other words, given that the power signal is always high and that the ground signal is always low, the circuit being described behaves like a full adder (with the result available at the same instant that the input signals arrive). The axioms (both the generated structural axioms and the negation of the behavioral definition) given to RRL are listed below:


(all x (not(p1(x)) -> (power(x) equ p2(x))))
(all x (not(a(x)) -> (p2(x) equ p4(x))))
(all x (not(b(x)) -> (p3(x) equ p2(x))))
(all x (not(c-in(x)) -> (power(x) equ p3(x))))
(all x (not(p1(x)) -> (p3(x) equ p4(x))))
(all x (not(p4(x)) -> (power(x) equ sum(x))))
(all x (not(a(x)) -> (power(x) equ p7(x))))
(all x (not(c-in(x)) -> (p7(x) equ p1(x))))
(all x (not(b(x)) -> (power(x) equ p7(x))))
(all x (not(a(x)) -> (power(x) equ p8(x))))
(all x (not(b(x)) -> (p8(x) equ p1(x))))
(all x (not(p1(x)) -> (power(x) equ c-out(x))))
(all x (a(x) -> (p4(x) equ p5(x))))
(all x (p1(x) -> (p5(x) equ ground(x))))
(all x (b(x) -> (p6(x) equ p5(x))))
(all x (p1(x) -> (p4(x) equ p6(x))))
(all x (c-in(x) -> (p6(x) equ ground(x))))
(all x (p4(x) -> (sum(x) equ ground(x))))
(all x (c-in(x) -> (p1(x) equ p9(x))))
(all x (a(x) -> (p9(x) equ ground(x))))
(all x (b(x) -> (p9(x) equ ground(x))))
(all x (b(x) -> (p1(x) equ p10(x))))
(all x (a(x) -> (p10(x) equ ground(x))))
(all x (p1(x) -> (c-out(x) equ ground(x))))
(not (all x ((all x (power(x) and not(ground(x)))) ->
  ((sum(x) equ (a(x) xor b(x) xor c-in(x))) and
   (c-out(x) equ ((a(x) and b(x)) or (a(x) and c-in(x)) or (b(x) and c-in(x))))))))

The verification is completely automatic and takes approximately 2 minutes.
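Because the structural axioms are just conditional equalities over Boolean node values, the refutation RRL performs can be mirrored by exhaustive model checking. The Python sketch below (our illustration, not the paper's method) enumerates all assignments to the named nodes and asserts that every assignment consistent with the axioms exhibits full-adder behaviour:

```python
from itertools import product

# Conditional-equality axioms for the CMOS full adder: (gate, polarity, n1, n2)
# means "if node `gate` has value `polarity`, nodes n1 and n2 are connected".
# p-transistors conduct when the gate is low, n-transistors when it is high.
AXIOMS = [
    ("p1", 0, "power", "p2"), ("a", 0, "p2", "p4"), ("b", 0, "p3", "p2"),
    ("cin", 0, "power", "p3"), ("p1", 0, "p3", "p4"), ("p4", 0, "power", "sum"),
    ("a", 0, "power", "p7"), ("cin", 0, "p7", "p1"), ("b", 0, "power", "p7"),
    ("a", 0, "power", "p8"), ("b", 0, "p8", "p1"), ("p1", 0, "power", "cout"),
    ("a", 1, "p4", "p5"), ("p1", 1, "p5", "ground"), ("b", 1, "p6", "p5"),
    ("p1", 1, "p4", "p6"), ("cin", 1, "p6", "ground"), ("p4", 1, "sum", "ground"),
    ("cin", 1, "p1", "p9"), ("a", 1, "p9", "ground"), ("b", 1, "p9", "ground"),
    ("b", 1, "p1", "p10"), ("a", 1, "p10", "ground"), ("p1", 1, "cout", "ground"),
]

NODES = ["a", "b", "cin", "sum", "cout"] + [f"p{i}" for i in range(1, 11)]

def consistent(env):
    return all(env[n1] == env[n2] for g, pol, n1, n2 in AXIOMS if env[g] == pol)

# Every model of the axioms must exhibit full-adder behaviour -- this is the
# statement RRL establishes by refuting the negated specification.
for values in product((0, 1), repeat=len(NODES)):
    env = dict(zip(NODES, values), power=1, ground=0)
    if consistent(env):
        assert env["sum"] == env["a"] ^ env["b"] ^ env["cin"]
        assert env["cout"] == (env["a"] & env["b"]) | (env["a"] & env["cin"]) \
                            | (env["b"] & env["cin"])
print("all consistent assignments implement a full adder")
```

The enumeration (2^15 assignments) takes a fraction of a second, but it does not scale; the rewrite-rule refutation, by contrast, works symbolically and extends to circuits where enumeration is hopeless.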

6. OTHER VERIFIED CIRCUITS

At this point we have also automatically verified a number of other circuits, including circuits specified hierarchically and circuits such as clocked flip-flops with fairly complex timing patterns. We have also used this system to verify reasonably large combinational circuits at the gate level, even though the behavioral descriptions may be very different from the actual circuit description. This has proved useful in verifying sections of a systolic chip currently under development at GE. Work is presently underway on development of some methods for handling integer reasoning in RRL.


7. PROBLEMS AND ISSUES

Our experiments thus far have uncovered several problems in the Rewrite Rule Laboratory and with automatic theorem proving in general. We briefly discuss some of them here. One is the inability of the prover to handle integers efficiently. At present, the only recourse is to specify integers and the operations on them in Peano arithmetic. Since many circuits (such as ALUs and multipliers) operate on integers, this is a serious drawback. Thus an important research problem in automated reasoning is to develop ways of handling fragments of integer arithmetic (using built-in decision procedures, for instance). Progress in this direction has been reported in (Boyer and Moore [2]), where linear arithmetic has been incorporated into the basic logic of the theorem prover. We also feel that some interaction will be needed when the circuits grow larger in size. Some of this is reflected even in RRL, since proofs of many theorems would involve induction and, as mentioned before, the induction component of RRL requires a certain amount of user interaction. Interactive capabilities would also be a useful option for knowledgeable users who could guide proofs that otherwise might either finish much more slowly or not finish at all. One fundamental problem in using an automatic theorem prover such as RRL for verification of circuits is that one's first description is unlikely to be correct; when this happens, the prover may never halt. Since the formalism used will probably be at least as powerful as first-order predicate calculus this problem cannot be avoided, but developing methods for obtaining meaningful diagnostic information about incorrect circuit specifications is critical if such a tool is to be of practical value.
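To see why the Peano encoding is costly, consider addition defined by the usual two rewrite rules. The Python sketch below (our illustration) shows that normalizing plus(m, n) takes a number of rewrite steps proportional to the magnitude of m, which is impractical for the operand widths found in ALUs and multipliers:

```python
# Peano naturals as nested tuples: zero is 'z', successor is ('s', n).
def s(n):
    return ('s', n)

def to_peano(k):
    t = 'z'
    for _ in range(k):
        t = s(t)
    return t

def plus(m, n):
    # rewrite rules: plus(z, n) -> n ; plus(s(m), n) -> s(plus(m, n))
    # each loop iteration is one rewrite step, so plus takes |m| steps
    while m != 'z':
        m, n = m[1], s(n)
    return n

assert plus(to_peano(3), to_peano(4)) == to_peano(7)
```

Adding two 32-bit operands this way would take on the order of 2^32 rewrite steps, which is why built-in decision procedures for arithmetic fragments matter.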

8. CONCLUSION

As mentioned earlier, one of the issues on which we have placed a lot of emphasis so far is making the verification process as automatic as possible. We believe we have made some progress in that direction, though more research is needed to incorporate integer capabilities. We have also been able to successfully handle feedback, as the flip-flop example illustrates. The overall approach has been to utilize the full power of first-order predicate calculus. The absence of usable theorem provers has prevented us from using other paradigms such as temporal logic (Moszkowski [9]). We have been able to provide the user with an easy-to-use front-end for entering his structural information. The graphic editor allows him to completely "draw" the circuit on the screen; the structural specification to be input to RRL is generated automatically. We do not have such facilities at present for behavioral specification. The obvious possibility is that of using VHDL itself for behavioral


specification. Due to reasons that are beyond the scope of this paper, we have decided against it. We are currently working on designing a user interface for behavioral specification.

We would like to acknowledge the contributions of several of our co-workers whose efforts are making this work possible. Among them, notably, are Art Duncan, Wendy Haas-Murren, Kevin Spier and Rich Welty. We got considerable help in the use of RRL from Dave Cyrluk and Deepak Kapur. We would also like to thank Jim Caldwell, Bob Hardy, Dave Musser, Bill Premerlani, Dan Rosenkrantz, Joel Sturman and Pathy Yassa for contributing several ideas and suggestions. Above all, we would like to acknowledge Dave McGonagle, who is a strong motivating force for this work as well as a constant source of encouragement.

9. REFERENCES

[1] Birtwistle, G., Joyce, J., Liblong, B., Melham, T., and Schediwy, R. (1985). Specification and VLSI design. Research Report No. 85/21£J/33, Dept. of Computer Science, University of Calgary, Calgary, Alberta, Canada.

[2] Boyer, R. and Moore, J.S. (1985). Integrating Decision Procedures into Heuristic Theorem Provers: A Case Study of Linear Arithmetic. Technical Report ICSCA-CMP-44, Institute for Computing Science, University of Texas at Austin.

[3] Gordon, M. (1985). Why Higher-Order Logic is a good formalism for specifying and verifying hardware. Technical Report 77, Computer Laboratory, University of Cambridge, Cambridge, U.K.

[4] Hunt, W.A., Jr. (1985). FM8501: A Verified Microprocessor. Technical Report 47, Institute for Computing Science, Univ. of Texas at Austin.

[5] Kapur, D. and Musser, D.R. (1984). Proof by Consistency. To appear in Artificial Intelligence.

[6] Kapur, D. and Narendran, P. (1985). An Equational Approach to Predicate Calculus. In: Proceedings of IJCAI-85, Los Angeles.

[7] Kapur, D., Sivakumar, G., and Zhang, H. (1986). RRL: A Rewrite Rule Laboratory. In: Proceedings of 8th Conference on Automated Deduction (CADE-86), Oxford, U.K.

[8] Knuth, D.E. and Bendix, P.B. (1970). Simple Word Problems in Universal Algebras. In: Computational Problems in Abstract Algebras (ed. J. Leech), Pergamon Press, 263-297.

[9] Moszkowski, B. (1985). A Temporal Logic for Multilevel Reasoning about Hardware. IEEE Computer.

[10] Musser, D.R. (1980). Proving Inductive Properties of Abstract Data Types. In: Proc. of 7th Principles of Programming Languages Conf., Las Vegas, Nevada.

[11] Musser, D.R., and Cyrluk, D.A. (1985). Affirm-85 Installation Guide and Reference Manual Update. GE Corporate R&D Center, Schenectady, NY 12345.

[12] Musser, D.R., Narendran, P. and Premerlani, W. (1987). BIDS: A Method for Specifying and Verifying Bidirectional Hardware Devices. Appears elsewhere in this volume.

10. APPENDIX A

WELCOME TO REWRITE RULE LABORATORY (RRL)
(c) Copyright 1985,1986 G.Sivakumar. All rights reserved.
(c) Copyright 1985,1986 Hantao Zhang. All rights reserved.
(c) Copyright 1985,1986 General Electric Company. All rights reserved.
Type Add, Akb, Auto, Break, Clean, Delete, Dump, Grammar, Init, Kb, List, Log, Makerule, Narrow, Norm, Order, Option, Operator, Prove, Quit, Read, Refute, Stats, Suffic, Undo, Unlog, Write or Help.
RRL-> add
Type your equations, enter a ']' when done.

x + (y + z) == (x + y) + z
0 + x == x
i(x) + x == 0
]

Equations read in are:
1. (X + (Y + Z)) == ((X + Y) + Z) [USER, 1]
2. (0 + X) == X [USER, 2]
3. (I(X) + X) == 0 [USER, 3]
New constant set is: { 0 }
Time = 62.75 sec
Type Add, Akb, Auto, Break, Clean, Delete, Dump, Grammar, Init, Kb, List, Log, Makerule, Narrow, Norm, Order, Option, Operator, Prove, Quit, Read, Refute, Stats, Suffic, Undo, Unlog, Write or Help.
RRL-> kb {stands for "Knuth-Bendix"}
----- Step 1 -----
Trying to orient equation:
(X + (Y + Z)) == ((X + Y) + Z) [USER, 1]
To prove: (X + (Y + Z)) > ((X + Y) + Z)
or: (X + (Y + Z)) < ((X + Y) + Z)
I have no precedence suggestions. Try doing Equiv or Status.
Type Abort, Display, Equiv, Twoway, Operator, Status, LR, RL, Postpone, MakeEq, Quit, Undo or Help.
RRL>KB> status
Type operator you wish to set status for: +
Type LR, RL, HELP --> lr {stands for "left-to-right"}
Operator, +, given status: LR
----- Step 2 -----
Adding rule: [1] ((X + Y) + Z) ---> (X + (Y + Z)) [USER, 1]
Adding rule: [2] (0 + X) ---> X [USER, 2]
Trying to orient equation:
(I(X) + X) == 0 [USER, 3]
To prove: (I(X) + X) > 0
Here are some precedence suggestions:
1. + > 0
2. I > 0

Either type a list of numbers or .
Type Abort, Display, Equiv, Twoway, Operator, Status, LR, RL, Postpone, MakeEq, Quit, Undo or Help.
RRL>KB> 1 2 { + and I are given higher precedence over 0 }
----- Step 3 -----
Adding rule: [3] (I(X) + X) ---> 0 [USER, 3]
Computing critical pairs with: [2] (0 + X) ---> X [USER, 2]
Computing critical pairs with: [3] (I(X) + X) ---> 0 [USER, 3]
Computing critical pairs with: [1] ((X + Y) + Z) ---> (X + (Y + Z)) [USER, 1]
Adding rule: [4] (I(Y) + (Y + Z)) ---> Z [3, 1]
Computing critical pairs with: [4] (I(Y) + (Y + Z)) ---> Z [3, 1]

Adding rule: [5] (I(0) + Z) ---> Z [2, 4]
Adding rule: [6] (I(I(Z)) + 0) ---> Z [3, 4]
Adding rule: [7] (I((X + Y)) + (X + (Y + Z))) ---> Z [1, 4]
Adding rule: [8] (I(I(Y)) + Z) ---> (Y + Z) [4, 4]
Deleting rule: [6] (I(I(Z)) + 0) ---> Z [3, 4]
Adding rule: [9] (Z + 0) ---> Z [DELETED, 6]
Computing critical pairs with: [9] (Z + 0) ---> Z [DELETED, 6]
Adding rule: [10] I(0) ---> 0 [3, 9]
Deleting rule: [5] (I(0) + Z) ---> Z [2, 4]
Computing critical pairs with: [10] I(0) ---> 0 [3, 9]
Computing critical pairs with: [8] (I(I(Y)) + Z) ---> (Y + Z) [4, 4]
Adding rule: [11] I(I(Y)) ---> Y [9, 8]
Deleting rule: [8] (I(I(Y)) + Z) ---> (Y + Z) [4, 4]
Computing critical pairs with: [11] I(I(Y)) ---> Y [9, 8]
Adding rule: [12] (Y + I(Y)) ---> 0 [11, 3]
Adding rule: [13] (Y + (I(Y) + Z)) ---> Z [11, 4]
Computing critical pairs with: [12] (Y + I(Y)) ---> 0 [11, 3]
Adding rule: [14] (X + (Y + I((X + Y)))) ---> 0 [1, 12]
Computing critical pairs with: [13] (Y + (I(Y) + Z)) ---> Z [11, 4]
Adding rule: [15] (X + (Y + (I((X + Y)) + Z))) ---> Z [1, 13]
Computing critical pairs with: [14] (X + (Y + I((X + Y)))) ---> 0 [1, 12]
Adding rule: [16] (X + (Y + (Y1 + I((X + (Y + Y1)))))) ---> 0 [1, 14]
Adding rule: [17] (Y + I((Y1 + Y))) ---> I(Y1) [14, 4]
Deleting rule: [14] (X + (Y + I((X + Y)))) ---> 0 [1, 12]
Computing critical pairs with: [17] (Y + I((Y1 + Y))) ---> I(Y1) [14, 4]
Adding rule: [18] (X + (Y + I((Y1 + (X + Y))))) ---> I(Y1) [1, 17]
Deleting rule: [16] (X + (Y + (Y1 + I((X + (Y + Y1)))))) ---> 0 [1, 14]
Adding rule: [19] (Y + I((X + (Y1 + Y)))) ---> I((X + Y1)) [1, 17]
Deleting rule: [18] (X + (Y + I((Y1 + (X + Y))))) ---> I(Y1) [1, 17]
Adding rule: [20] (X + (I((Y + X)) + Z)) ---> (I(Y) + Z) [17, 1]
Deleting rule: [15] (X + (Y + (I((X + Y)) + Z))) ---> Z [1, 13]
Trying to orient equation:
(I(Y) + I(Y1)) == I((Y1 + Y)) [17, 4]
To prove: (I(Y) + I(Y1)) > I((Y1 + Y))
or: (I(Y) + I(Y1)) < I((Y1 + Y))
Here are some precedence suggestions:
1. + > I
2. I > +
Either type a list of numbers or .
Type Abort, Display, Equiv, Twoway, Operator, Status, LR, RL, Postpone, MakeEq, Quit, Undo or Help.
RRL>KB> 2
----- Step 4 -----
Adding rule: [21] I((Y + Y1)) ---> (I(Y1) + I(Y)) [17, 4]
Deleting rule: [7] (I((X + Y)) + (X + (Y + Z))) ---> Z [1, 4]
Deleting rule: [17] (Y + I((Y1 + Y))) ---> I(Y1) [14, 4]
Deleting rule: [19] (Y + I((X + (Y1 + Y)))) ---> I((X + Y1)) [1, 17]
Deleting rule: [20] (X + (I((Y + X)) + Z)) ---> (I(Y) + Z) [17, 1]


Computing critical pairs with: [21] I((Y + Y1)) ---> (I(Y1) + I(Y)) [17, 4]
Your system is canonical.
[1] ((X + Y) + Z) ---> (X + (Y + Z)) [USER, 1]
[2] (0 + X) ---> X [USER, 2]
[3] (I(X) + X) ---> 0 [USER, 3]
[4] (I(Y) + (Y + Z)) ---> Z [3, 1]
[9] (Z + 0) ---> Z [DELETED, 6]
[10] I(0) ---> 0 [3, 9]
[11] I(I(Y)) ---> Y [9, 8]
[12] (Y + I(Y)) ---> 0 [11, 3]
[13] (Y + (I(Y) + Z)) ---> Z [11, 4]
[21] I((Y + Y1)) ---> (I(Y1) + I(Y)) [17, 4]

Processor time used                  = 31.375 sec
Number of rules generated            = 21
Number of critical pairs             = 73
Time spent in normalization          = 0.25 sec   (0.7968128 percent of time)
Time spent while adding rules        = 2.625 sec  (8.366534 percent of time)
  (keeping rule set reduced)
Time spent in unification            = 0.5 sec    (1.5936255 percent of time)
Time spent in ordering               = 27.375 sec (87.25099 percent of time)
Total time (including 'undo' action) = 31.75 sec
Time = 38.25 sec

Type Add, Akb, Auto, Break, Clean, Delete, Dump, Grammar, Init, Kb, List, Log, Makerule, Narrow, Norm, Order, Option, Operator, Prove, Quit, Read, Refute, Stats, Suffic, Undo, Unlog, Write or Help.
RRL-> list
No equations in current system.
Rules:
[1] ((X + Y) + Z) ---> (X + (Y + Z)) [USER, 1]
[2] (0 + X) ---> X [USER, 2]
[3] (I(X) + X) ---> 0 [USER, 3]
[4] (I(Y) + (Y + Z)) ---> Z [3, 1]
[9] (Z + 0) ---> Z [DELETED, 6]
[10] I(0) ---> 0 [3, 9]
[11] I(I(Y)) ---> Y [9, 8]
[12] (Y + I(Y)) ---> 0 [11, 3]
[13] (Y + (I(Y) + Z)) ---> Z [11, 4]
[21] I((Y + Y1)) ---> (I(Y1) + I(Y)) [17, 4]
Time = 12.875 sec
Type Add, Akb, Auto, Break, Clean, Delete, Dump, Grammar, Init, Kb, List, Log, Makerule, Narrow, Norm, Order, Option, Operator, Prove, Quit, Read, Refute, Stats, Suffic, Undo, Unlog, Write or Help.
RRL-> prove
Type equation to prove in the format: L == R (if C)
Enter a ']' to exit when no equation is given.

I(I(I(z))) == I(z)

Yes, it is an equational theorem.
Time = 75.75 sec
Type Add, Akb, Auto, Break, Clean, Delete, Dump, Grammar, Init, Kb, List, Log, Makerule, Narrow, Norm, Order, Option, Operator, Prove, Quit, Read, Refute, Stats, Suffic, Undo, Unlog, Write or Help.
RRL-> quit
Goodbye.
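The ten rules reported as canonical above form a complete rewrite system for free groups. As an illustration (ours, not an RRL facility), they can be transcribed into a small Python normalizer, which confirms the equational theorem proved at the end of the session:

```python
def rewrite_once(t):
    """Try each rule of the canonical group system at the root; None if none applies."""
    if isinstance(t, tuple) and t[0] == '+':
        _, l, r = t
        if isinstance(l, tuple) and l[0] == '+':            # [1] (x+y)+z -> x+(y+z)
            return ('+', l[1], ('+', l[2], r))
        if l == '0':                                        # [2] 0+x -> x
            return r
        if r == '0':                                        # [9] x+0 -> x
            return l
        if isinstance(l, tuple) and l[0] == 'I':
            if l[1] == r:                                   # [3] I(x)+x -> 0
                return '0'
            if isinstance(r, tuple) and r[0] == '+' and l[1] == r[1]:
                return r[2]                                 # [4] I(y)+(y+z) -> z
        if isinstance(r, tuple) and r[0] == 'I' and r[1] == l:
            return '0'                                      # [12] y+I(y) -> 0
        if isinstance(r, tuple) and r[0] == '+' and isinstance(r[1], tuple) \
                and r[1][0] == 'I' and r[1][1] == l:
            return r[2]                                     # [13] y+(I(y)+z) -> z
    if isinstance(t, tuple) and t[0] == 'I':
        u = t[1]
        if u == '0':                                        # [10] I(0) -> 0
            return '0'
        if isinstance(u, tuple) and u[0] == 'I':            # [11] I(I(y)) -> y
            return u[1]
        if isinstance(u, tuple) and u[0] == '+':            # [21] I(y+y1) -> I(y1)+I(y)
            return ('+', ('I', u[2]), ('I', u[1]))
    return None

def normalize(t):
    """Rewrite innermost-first until no rule applies; this terminates because
    the completed system is canonical (terminating and confluent)."""
    if isinstance(t, tuple):
        t = tuple([t[0]] + [normalize(s) for s in t[1:]])
    step = rewrite_once(t)
    return normalize(step) if step is not None else t

# the theorem proved in the session: I(I(I(z))) == I(z)
assert normalize(('I', ('I', ('I', 'z')))) == ('I', 'z')
```

Because the system is canonical, two terms are equal in the theory exactly when their normal forms coincide, which is how RRL's `prove` command decides such equations.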

11. APPENDIX B

WELCOME TO REWRITE RULE LABORATORY (RRL)
(c) Copyright 1985,1986 G.Sivakumar. All rights reserved.
(c) Copyright 1985,1986 Hantao Zhang. All rights reserved.
(c) Copyright 1985,1986 General Electric Company. All rights reserved.
Type Add, Akb, Auto, Break, Clean, Delete, Dump, Grammar, Init, Kb, List, Log, Makerule, Narrow, Norm, Order, Option, Operator, Prove, Quit, Read, Refute, Stats, Suffic, Undo, Unlog, Write or Help.
RRL-> read latch.eqn
Equations read in are:
1. ALL X ((Q(F(X)) EQU NOT((R(X) OR QBAR(X))))) [USER, 1]
2. ALL X ((QBAR(F(X)) EQU NOT((S(X) OR Q(X))))) [USER, 2]
3. NOT(ALL X ((((Q(X) EQU NOT(QBAR(X))) AND (ALL X (NOT((R(X) AND S(X)))) AND ((R(X) EQU R(F(X))) AND (S(X) EQU S(F(X)))))) -> ((Q(F(F(X))) EQU (S(X) OR (Q(X) AND NOT(R(X))))) AND (QBAR(F(F(X))) EQU NOT(Q(F(F(X))))))))) [USER, 3]
Time = 6.3125 sec


Type Add, Akb, Auto, Break, Clean, Delete, Dump, Grammar, Init, Kb, List, Log, Makerule, Narrow, Norm, Order, Option, Operator, Prove, Quit, Read, Refute, Stats, Suffic, Undo, Unlog, Write or Help.
RRL-> kb
----- Step 1 -----
Adding rule: [1] Q(F(X)) & QBAR(X) & R(X) ---> (Q(F(X)) & QBAR(X) + Q(F(X)) & R(X)) [USER, 1]
Adding rule: [2] QBAR(X) & R(X) ---> (TRUE + Q(F(X)) + QBAR(X) + R(X)) [USER, 1]
Deleting rule: [1] Q(F(X)) & QBAR(X) & R(X) ---> (Q(F(X)) & QBAR(X) + Q(F(X)) & R(X)) [USER, 1]
Adding rule: [3] Q(X) & QBAR(F(X)) & S(X) ---> (Q(X) & QBAR(F(X)) + QBAR(F(X)) & S(X)) [USER, 2]
Adding rule: [4] Q(X) & S(X) ---> (TRUE + Q(X) + QBAR(F(X)) + S(X)) [USER, 2]
Deleting rule: [3] Q(X) & QBAR(F(X)) & S(X) ---> (Q(X) & QBAR(F(X)) + QBAR(F(X)) & S(X)) [USER, 2]
S1 is a skolem function for X in the assertion:
NOT(ALL X ((((Q(X) EQU NOT(QBAR(X))) AND (ALL X (NOT((R(X) AND S(X)))) AND ((R(X) EQU R(F(X))) AND (S(X) EQU S(F(X)))))) -> ((Q(F(F(X))) EQU (S(X) OR (Q(X) AND NOT(R(X))))) AND (QBAR(F(F(X))) EQU NOT(Q(F(F(X)))))))))
Adding rule: [5] Q(S1) & QBAR(S1) ---> FALSE [USER, 3]
Adding rule: [6] QBAR(S1) ---> (TRUE + Q(S1)) [USER, 3]
Deleting rule: [5] Q(S1) & QBAR(S1) ---> FALSE [USER, 3]
Adding rule: [7] R(X) & S(X) ---> FALSE [USER, 3]


Adding rule: [8] R(S1) & R(F(S1)) ---> R(S1) [USER, 3]
Adding rule: [9] R(F(S1)) ---> R(S1) [USER, 3]
Deleting rule: [8] R(S1) & R(F(S1)) ---> R(S1) [USER, 3]
Adding rule: [10] S(S1) & S(F(S1)) ---> S(S1) [USER, 3]
Adding rule: [11] S(F(S1)) ---> S(S1) [USER, 3]
Deleting rule: [10] S(S1) & S(F(S1)) ---> S(S1) [USER, 3]
Adding rule: [12] QBAR(F(S1)) & QBAR(F(F(S1))) & R(S1) ---> (Q(F(F(S1))) + Q(F(F(S1))) & QBAR(F(S1)) + Q(F(F(S1))) & QBAR(F(F(S1))) + QBAR(F(S1)) & QBAR(F(F(S1))) + Q(F(F(S1))) & R(S1) + Q(F(F(S1))) & QBAR(F(S1)) & R(S1) + QBAR(F(F(S1))) & R(S1)) [USER, 3]
Computing critical pairs with: [6] QBAR(S1) ---> (TRUE + Q(S1)) [USER, 3]
Adding rule: [13] Q(S1) & R(S1) ---> (Q(S1) + Q(F(S1))) [2, 6]
Computing critical pairs with: [11] S(F(S1)) ---> S(S1) [USER, 3]
Adding rule: [14] Q(F(S1)) & S(S1) ---> (TRUE + Q(F(S1)) + QBAR(F(F(S1))) + S(S1)) [4, 11]
Computing critical pairs with: [9] R(F(S1)) ---> R(S1) [USER, 3]
Adding rule: [15] QBAR(F(S1)) & R(S1) ---> (TRUE + Q(F(F(S1))) + QBAR(F(S1)) + R(S1)) [2, 9]
Deleting rule: [12] QBAR(F(S1)) & QBAR(F(F(S1))) & R(S1) ---> (Q(F(F(S1))) + Q(F(F(S1))) & QBAR(F(S1)) + Q(F(F(S1))) & QBAR(F(F(S1))) + QBAR(F(S1)) & QBAR(F(F(S1))) + Q(F(F(S1))) & R(S1) + Q(F(F(S1))) & QBAR(F(S1)) & R(S1) + QBAR(F(F(S1))) & R(S1)) [USER, 3]
Adding rule: [16] QBAR(F(F(S1))) ---> Q(F(F(S1))) [DELETED, 12]
The right hand side is reducible:
[14] Q(F(S1)) & S(S1) ---> (TRUE + Q(F(S1)) + QBAR(F(F(S1))) + S(S1)) [4, 11]
Computing critical pairs with: [16] QBAR(F(F(S1))) ---> Q(F(F(S1))) [DELETED, 12]
Adding rule: [17] Q(F(F(S1))) & R(F(F(S1))) ---> (TRUE + Q(F(F(S1))) + Q(F(F(F(S1)))) + R(F(F(S1)))) [2, 16]
Computing critical pairs with: [13] Q(S1) & R(S1) ---> (Q(S1) + Q(F(S1))) [2, 6]
Derived: [18] TRUE ---> FALSE [4, 13]
Your system is not consistent.

Processor time used                  = 7.0 sec
Number of rules generated            = 18
Number of critical pairs             = 8
Time spent in normalization          = 0.9375 sec (13.392857 percent of time)
Time spent while adding rules        = 2.625 sec  (37.5 percent of time)
  (keeping rule set reduced)
Time spent in unification            = 0.0 sec    (0.0 percent of time)
Time spent in ordering               = 0.3125 sec (4.464286 percent of time)
Time spent in boolean translation    = 1.0625 sec (15.178572 percent of time)
Total time (including 'undo' action) = 7.03125 sec
Time = 8.9375 sec


12. APPENDIX C

Structure editor representation of the full adder: shown are both a hierarchical view of the circuit and a CMOS implementation.


8 Contextual Constraints for Design and Verification

Bruce S. Davie
Department of Computer Science
University of Edinburgh, U.K.

George J. Milne
Texas Instruments Ltd.
Manton Lane, Bedford MK41 7PA, U.K.

Abstract. A new approach to the design and verification of VLSI structures capitalizing on knowledge of the surrounding environment is proposed. This approach formalizes the use of contextual information as a means of simplifying the design and verification process and provides us with a new way of viewing the hierarchical evolution of a design. The proposed design process illustrates the different uses of contextual constraints to aid and guide the designer. In particular, constraints may be introduced to simplify verification and must in turn be satisfied by the design itself. Hence we find that design choices may be made to aid validation, an interesting reversal in the usual post-design role of simulation and formal verification.

1. INTRODUCTION

Design may be viewed as the repeated transformation of an abstract functional description through progressively more detailed levels of description until a concrete level of design is reached. Design is the process whereby information is added to a continuously developing design description, introducing more and more detail while preserving the functionality of the design. The design evolution involves transforming a design description at a given level of abstraction (level of design detail) into a more detailed description which captures the implementation of 'internal' properties belonging to an adjacent level. The behaviour of the design representations at adjacent levels should be closely related if not equivalent, and this property should be demonstrable by mathematical proof, called verification, or by simulation. The design and validation of a circuit will generally require knowledge of more than just the intended device specification; it will require knowledge of the context in which the device is intended to function. Some contextual information will necessarily be detailed, where the particular context in which the device is to operate is known. Alternatively some contexts will be unknown, and in this case the device must be designed to function correctly


under all environments. Between the two, varying amounts of contextual information may be specified. A context is similar to a specification in that both are used to define information relating to the design. While for verification purposes we are primarily concerned with behavioural information, contexts and specifications may also refer to structural and geometrical properties. In an integrated design and verification framework we therefore require a rigorous way in which to represent specifications, contexts and evolving designs. We will propose design and verification within contexts as a new approach to VLSI design. The essence of this approach will be illustrated using a flowchart to picture the design cycle, together with a simple example to indicate where contextual constraints fit into this design cycle.

2. THE DESIGN PROCESS

We will begin by considering a hierarchical design methodology for VLSI in which design and validation are carried out in an integrated way. The advantages of such an approach have been discussed in detail in [1], [2]. The key advantage is the ability to detect errors as soon as they are made in the design process, rather than needing to design a circuit fully before finding that it may be incorrect. Figure 1 is a flow chart which shows how a designer might proceed through a design exercise in a hierarchical way, validating his design decisions at each successive level in the hierarchy before moving on to the next lowest level. We assume that at any stage his aim is to design a box; 'design' here means to partition the box into a number of smaller cells, specifying their interconnection and respective behaviours. A box is simply an abstract circuit whose internal details are not yet specified. The required behaviour of the box will be its spec; the behaviour that is attached to a cell is referred to in the diagram as a cellspec. When the designer initially sets out to design a circuit, that whole circuit will be the box at level 1 in the hierarchy. As the design proceeds, the cells in turn become the boxes, which must also be designed. We have assumed for the purposes of our illustration that this repeated partitioning terminates when a level is reached in the hierarchy where the cells are parts which already exist in a library of predefined parts. This would be the case in a standard cell design system. In a full custom design it might terminate when the cells are of suitable complexity to be laid out as leaf cells. There are various points worth noting in this somewhat idealized representation of the design process. The partitioning of a box into cells (which includes specifying their interconnections) is presented as a completely separate operation from the description of cell behaviours. The former


operation is purely structural, the latter purely behavioural. This distinction may be considered artificial, as a designer will almost certainly have some idea of the intended behaviour of the cells when he does the partitioning. However, this association of behaviour with structure is not made formal until the specification phase, and so we consider the two operations separately here. It should also be noted that we are not by any means proposing a fully automated design process here. While we would hope that some machine assistance would be available for many of the tasks, and the role of constraints in providing some of this assistance will be discussed below, operations such as partitioning will certainly require some amount of skill on the part of the designer. There are various possibilities for iterative design here. If the behaviour of the box's implementation imp, which is obtained by composing the cellspecs, is not in some way equivalent¹ to the required behaviour spec, then it is necessary to make a new partitioning of the box. If it is eventually found that the box cannot be implemented successfully with any partitioning then we must move back up through the design hierarchy; the 'parent' box will then have to be redesigned. At the end of the design the device which we set out to implement, and perhaps several intermediate devices as well, will have been fully designed down to some level of 'primitive' parts. So that these implementations may be used in later designs, they should be added to a library, as shown in the last stage on the chart. Having now outlined a formal design and verification methodology, we are in a position to discuss the ways in which contextual constraints may be of assistance in such a methodology. In the following Section the concept of contextual constraint will be presented and its role in design and validation outlined.

3. CONTEXTUAL CONSTRAINTS

Knowledge of the environment or context in which a design is to be used can drastically aid the design and verification process. This is an approach which is informally taken by designers at present, although the lack of a formal design/verification framework prevents the technique from being elevated into a powerful design methodology. As an example consider

¹ Note that the notion of equivalence is not defined here. It is dealt with in detail in [5].



Figure 1: The Design Process


synchronous devices which use certain clocking regimes to simplify their design. While a context involving a specific system clock simplifies the design process by restricting the class of environments in which the circuit must work correctly, it also greatly simplifies the verification process. The circuit behaviour is restricted by the limited interactions which the environment or context will permit; it will evolve into a much reduced set of states through time, while in any state only a limited set of possible actions will be permitted by the environment. We propose a technique where specifications, designs and contextual information are all described in a common language. Contextual knowledge is captured as just another expression in the language and may be used to simplify the representation of a design in a rigorous manner. This uses properties of the mathematical model which underlies the descriptive language, so also simplifying design verification. To illustrate the central ideas behind using contextual knowledge to constrain parts of the design process, consider the task of verifying that a flip-flop constructed out of a number of gates is correct. While this at first sight appears to be a trivial example due to the structural simplicity of a flip-flop, it should be noted that it has sophisticated time-related behaviour due to feedback. Describing the flip-flop behaviour, a necessary prerequisite for verification, requires that the responses to all possible patterns of stimuli on its set and reset ports be represented. Certain patterns are known to cause a flip-flop to react in strange ways; 'spikes' on the inputs and simultaneous set and reset operations are two such examples. But we will frequently know something about the context in which the flip-flop is to be used. A system clock can be used to control the latching of signals such that spikes and simultaneous set and reset actions will never be generated.
What is then required is to verify the flip-flop design only under the legitimate patterns of set and reset actions which we assume that the context will generate. At some stage the context must also be verified to assure ourselves that it generates the restricted behaviour as intended, but this is a separate part of the design process and may be considered in isolation. The significance of contextual restriction is illustrated in Section 4, where a D-type flip-flop is designed and verified using contextual constraints where possible.

The Role of Constraints

Four distinct uses of constraints may be isolated, and their role in the total design process can be indicated by relating them to the flowchart in Section 2.

(a) They dictate, at least partially, the context in which a design component may be used and in so doing they restrict the design choices available when designing that component.


(b) They aid the component specification task by allowing us to ignore some of the possible behavioural patterns which an unconstrained environment might generate.

(c) They may be introduced to aid component verification by reducing the state-space under consideration, as indicated in the above discussion on the design of a flip-flop.

(d) They may be generated when a verification proof is performed and will then be necessary to ensure design correctness. The context must then be modified to reflect this additional constraint, and we have here an example of 'design for verification'.

The first two uses of constraints are primarily for design reasons, while the latter two fulfill a validation role. However, the use of constraint information always applies to both design and validation; what aids one will generally aid the other.

Constraints for Design

As mentioned previously, we imagine design to center around the task of creating and validating interconnected design components called boxes, as indicated by the flowchart of Section 2. We can assume that each box representation held in the design library should include the constraint information pertaining to that box. Hence boxes are represented by an identifier name, structural information which is captured by the sort, a behavioural specification written as a CIRCAL expression, possibly some geometrical layout information and finally a (possibly null) contextual constraint expression which is also represented in CIRCAL. If no constraints are to be imposed on the box, i.e., if it is designed to function correctly in any context, then the constraint is null. The reader is referred to [3] for details of the representation of circuit structure and behaviour by CIRCAL expressions.

Property (a) above aids design partitioning, which is the creative step in the design process. It therefore fits into the flow diagram at the partitioning stage, as indicated by the (a) in the flowchart. Property (b) makes the specification task more straightforward and relates to the fundamentally important behavioural description stage of the design of a design component.

Constraints for Verification

Contextual Constraints for Design and Verification

Constraints may be imposed on a design component in one of two ways: the context that the component is to be used within may be known prior to design, for instance a certain clocking scheme may be predetermined. Alternatively, constraints may be found to be necessary to allow component verification to proceed by imposing restrictions on the possible environmental interactions that the component can respond to. This constraint must then be met by the surrounding context, i.e., by the other components which will interact with it. Property (c) above relates to constraints imposed prior to verification, introduced either in anticipation of the verification to come or else by the given design partitioning. It fits into the flowchart in the partitioning stage or in the verification stage respectively. Property (d), the discovery of the need for constraints to aid the verification process, obviously fits into the verification stage of the flowchart, where we attempt to construct a proof that the component implementation meets its specification. Various CIRCAL validation techniques are presented by Milne [4] and Traub [5].

Figure 2: D flip-flop implementation (a DFF box with inputs D and CLK and output Q)

4. AN EXAMPLE

We will consider the situation where a designer is attempting to design a simple D flip-flop. While this example may seem too simple or may appear contrived since the design of a flip-flop is already well understood, it does represent a situation which could perhaps arise if designing in some new technology or in an attempt to fit the device into some awkwardly shaped or sized area of a chip. Furthermore, it is hoped that the ideas illustrated here could be applicable to more general and complex design situations. It should also be noted that this example is intended to illustrate only some of the various ways in which contextual constraints may be of assistance to a designer. In the terminology of the above flowchart, the D flip-flop is the box which the designer wishes to design. Suppose he wishes to partition it into an RS flip-flop and some gates. As mentioned in Section 2, this is an essentially structural operation, and must be followed by formally associating behaviours with each of the cells. In specifying the behaviour of the RS flip-flop, it is necessary either to specify its behaviour fully (i.e., taking account of all possible patterns of inputs) or partially. As stated in Section 3, writing a full specification of an RS flip-flop is actually a nontrivial task, as its behaviour when presented with simultaneous ones on the


set and reset inputs is quite complex and is usually undesirable. Thus we may wish to specify its behaviour without taking account of such undesirable input patterns. If this approach is to be taken, then steps must also be taken to ensure that the context in which the RS flip-flop will be placed will never generate this pattern of inputs. Thus in addition to the flip-flop's partial specification, we would define the constraint on the context in which it may legitimately be used. The partial specification and the corresponding constraint on the environment are as follows:

RSFF0{res,set,q} ⇐ res1 res0 RSFF0 + set1 q1 set0 RSFF1

RSFF1{res,set,q} ⇐ res1 q0 res0 RSFF0 + set1 set0 RSFF1

ENV{res,set} ⇐ res1 res0 ENV + set1 set0 ENV

A full discussion of the CIRCAL syntax and its semantics is presented in [3]. In essence, the contextual constraint specified in the last line of the above description states that the context or environment (ENV) must only present certain patterns of stimuli on the ports 'res' and 'set'. These are specified in such a way that res and set are never simultaneously presented with ones. Note that for the sake of clarity in this illustration we have not dealt with the other constraint mentioned, that of preventing input 'spikes'. So we have shown that a contextual constraint may make the task of writing specifications simpler by allowing us to ignore certain patterns of inputs. Similarly, a part in a standard cell library, for example, might have a constraint associated with it, such as the set-up time required on its inputs. In this situation, as in the above, we see the need for a formal way of both specifying and manipulating constraints. Having written the constraints on the context of a device in a formal way, a designer must then ensure that the context does in fact meet those constraints. In the case where the context is not yet designed, this means that knowledge of the constraints will influence the designer's choices. In the case of the simple D flip-flop design, knowledge of the constraint that the set and reset lines must never be simultaneously one might lead the designer to place an inverter between the two inputs; also, knowing that spikes on the inputs are undesirable, he might choose to clock the inputs using the arrangement of AND gates shown in Figure 2. Having chosen to implement the context of the RS flip-flop in this way, the designer would formally associate behaviours with the components which make up the context and then verify that they do in fact meet the contextual constraint specified above.
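The check described in the last sentence can be sketched executably. The Python fragment below is a modern illustration, not part of the paper (CIRCAL itself is not used): it models a hypothetical gate-level context, an inverter feeding one of two clock-gating AND gates, and exhaustively confirms that it never presents simultaneous ones on 'set' and 'res'.

```python
# A sketch of verifying that a context meets a contextual constraint.
# The gate arrangement is our reading of Figure 2; names are illustrative.

def context(d, clk):
    """Gate-level model of the RS flip-flop's environment:
    set = d AND clk, res = (NOT d) AND clk."""
    set_ = d and clk
    res = (not d) and clk
    return set_, res

def meets_constraint():
    # Exhaustively enumerate the (d, clk) input space.
    for d in (False, True):
        for clk in (False, True):
            set_, res = context(d, clk)
            if set_ and res:          # the forbidden simultaneous ones
                return False
    return True

print(meets_constraint())  # True: this context satisfies the ENV constraint
```

The inverter guarantees the constraint by construction: set and res can only both be one if d and not d hold at once, which is impossible.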


In this example there is still another way in which constraints can be seen to be of assistance to the designer. This is the area of validation of the design. When the D flip-flop is the box in the terminology of Figure 1, the designer must verify that the composition of the behaviours of the RS flip-flop and the three gates satisfies the behavioural specification of the D flip-flop. The greatly simplified specification of the RS flip-flop which was allowed by the formal specification of a constraint on its context leads directly to a simpler proof of this behavioural equivalence. Furthermore, in the next level down the hierarchy, when the RS flip-flop becomes the box and is in turn designed using a number of cells (e.g., NOR gates), the verification process will also be simplified. This is again a direct result of the use of a simpler specification where certain patterns of input stimuli are disallowed by the use of a contextual constraint.

5. Concluding Remarks

We have informally presented contextual techniques for design and verification and indicated their utility as a practical tool for circuit design. This exercise has also illustrated the necessity of using a rigorous description and proof framework when we wish to take a design rule-of-thumb and extend and develop it into a powerful and far-reaching methodology. Here we have taken the informally used concept of contextual design and outlined how it may be used as a central feature of a design system.

REFERENCES

[1] Davie, B. S. Hardware Description Languages: Some Recent Developments, Departmental Report CSR-198-86, Edinburgh University, 1986.

[2] Davie, B. S. and Milne, G. J. The Role of Behaviour in VLSI Design Languages, in Proc. IFIP WG 10.2 Conference, From HDL Descriptions to Guaranteed Correct Circuit Designs, Grenoble, Sept. 1986, (North-Holland).

[3] Milne, G. J. CIRCAL: A Calculus for Circuit Description, Integration, the VLSI Journal, vol. 1, nos. 2&3, pp. 121-160, October 1983.

[4] Milne, G. J. Simulation and Verification: Related Techniques for Hardware Analysis, in Proc. 7th Int. Symposium on Computer Hardware Description Languages and their Applications, CHDL '85, Tokyo, Koomen and Moto-oka (eds.), (North-Holland), 1985.

[5] Traub, N. G. A Formal Approach to Hardware Analysis, PhD thesis, Edinburgh University, 1986.

9
Abstraction Mechanisms for Hardware Verification

Thomas F. Melham
University of Cambridge Computer Laboratory
Corn Exchange Street
Cambridge, CB2 3QG
England

Abstract: It is argued that techniques for proving the correctness of hardware designs must use abstraction mechanisms for relating formal descriptions at different levels of detail. Four such abstraction mechanisms and their formalisation in higher order logic are discussed.

Introduction

Recent advances in microelectronics have given designers of digital hardware the potential to build electronic devices of unprecedented size and complexity. With increasing size and complexity, however, it becomes increasingly difficult to ensure that such systems will not malfunction because of design errors. This problem has prompted some researchers to look for a firm theoretical basis for correct design of hardware systems. Mathematical methods have been developed to model the functional behaviour of electronic devices and to verify, by formal proof, that their designs meet rigorous specifications of intended behaviour. Proving the correctness of a design using such hardware verification techniques typically involves:

• Writing a set of formulas S which express the specification of the device whose design is to be proven correct.

• Describing the design or implementation of the device by a second set of formulas I.

• Proving that a 'satisfaction' or correctness relation holds between the sets of formulas I and S, i.e. that the implementation satisfies the specification.

This process is usually carried out at each level of the structural hierarchy of the design. The top level specification is shown to be satisfied by some connection of components; the specifications of these components are in turn shown to be satisfied by their implementations, and so on, until the level of primitive components is reached.


If this technique is to be used to verify large and complex designs, it is clear that the 'satisfaction relation' that is used cannot be strict equivalence. Otherwise, at each level of the design hierarchy, the specifications will contain all the information present in the design descriptions at the level below. For large and complex designs this means that the specifications at the upper levels of the hierarchy will themselves become so large and complex that they can no longer be seen to reflect the intended behaviour of the device. Furthermore, proofs of equivalence between such complex formulas will become unmanageable, even when automated theorem proving tools are used. A satisfaction or correctness relation based on the idea of abstraction, rather than equivalence, is the key to making formal verification of large designs tractable. The aim of this paper is to discuss the role of abstraction in hardware verification and to show how various types of abstraction can be formalised and used to control the complexity of specifications and correctness proofs. The organisation of the paper is as follows. Section 1 introduces the idea of abstraction and its role in controlling the complexity of hardware verification. In section 2 an informal description is given of four basic abstraction mechanisms that can be used in proofs of hardware correctness. Section 3 contains a brief introduction to the logic that will be used to prove devices correct, and reviews the standard techniques for formally specifying the behaviour and structure of hardware in logic. Sections 4 through 7 show how each of the four basic abstraction mechanisms can be formalised in logic; in each section, an example is given to illustrate the use of the abstraction mechanism being discussed.

1 Abstraction

Abstraction involves the suppression of irrelevant detail or information, in order to concentrate on the things of interest. In hardware verification, the larger and more complex the system we are describing, the more irrelevant detail must be ignored to keep specifications small. Thus, in general, specifications will be abstractions of the devices actually implemented. This means that the satisfaction or correctness relation that is used must involve abstraction mechanisms that, by removing information, relate the more detailed implementation descriptions to their abstract specifications. The specification of a microprocessor, for example, would include a description of the effect of each instruction on the machine registers. The description of a microcoded implementation of the system would contain far more information, including the exact sequence of microinstructions which implements each macroinstruction. To show that this design is correct, we would have to prove that each microcode sequence produces the effect on the registers that is required by the specification of the corresponding macroinstruction. Here, the abstraction mechanism used to relate the levels of description is the composition of state changes; by composing a sequence of low level state changes to get


a single transition, we 'hide' from the top level specification information about the intermediate states that occur during macroinstruction execution. At the abstract level, we only know what each instruction does; at the more detailed implementation level, we also know how the instruction does it. Another example is the correctness proof of a multiplier circuit. A multiplier is constructed from hundreds of transistors, each of which exhibits quite complex behaviour. Furthermore, this device performs multiplication by some particular method, e.g. repeated addition. All this information will be explicit in the formulas that describe the multiplier's design but will be irrelevant to its abstract specification, which will simply state that it correctly performs multiplication. The correctness proof of this device will show that the logical formulas expressing its specification represent a valid abstract view of the formulas describing its design. The design description will include, for example, assertions about voltages present at certain points in the device; these must be translated into assertions in the specification about the numbers being multiplied. Such a translation is an abstraction mechanism for relating numbers in the specification to the voltages in the design which represent them. If we have proved that a formal specification represents a valid abstract view of a device, we can use it in place of the more detailed implementation description when proving the correctness of a larger system which contains the device as a component. This will allow us to derive a more clear and concise behavioural description of the larger system's implementation than would otherwise be possible. If we do this at each level of the proof of a hierarchically structured design, we can control the size and complexity of the entire proof. In this way, abstraction mechanisms combine with hierarchical structuring to make it possible to handle proofs of large systems.

2 Four Types of Abstraction

The type of abstraction most fundamental to hardware verification is structural abstraction-the suppression of information about a device's internal structure. The idea of structural abstraction is that the specification of a device should not reflect its internal construction, but only its externally observable behaviour. The description of a device's implementation, however, must contain explicit information about its structure; the mechanism of structural abstraction therefore involves formalising the idea that such information concerns 'internal' structure. A second type of abstraction, behavioural abstraction, concerns specifications that only partially define a device's behaviour, leaving unspecified its behaviour in certain states or for certain input values. Such partial specification is appropriate, for example, when it is known that a device will never have to operate in certain environments and it is therefore unnecessary to specify its expected behaviour in all environments. The mechanism of behavioural abstraction serves


to relate partial specifications to implementation descriptions that fully define the device's behaviour, by showing that they agree on the device's behaviour for all states and inputs that are of interest, i.e. that are defined by the specification. The concept of data abstraction is well known from programming language theory. There are several abstract data types which are useful for formal specification of hardware; a simple example is the type of boolean truth values, a data abstraction from analog signals routinely used by hardware designers to reason about circuits. Another example is the type of n-bit integers, represented at the less abstract level by vectors of booleans. A data abstraction step consists of constructing a mapping from the data types of an implementation description to the more abstract data types of the specification. This mapping is then used to show that the operations carried out on the low level data types correctly implement the desired operations on the high level types. A final type of abstraction is temporal abstraction, in which the sequential or time-dependent behaviour of a device is viewed at different 'grains' of discrete time. An example of temporal abstraction is the unit delay or register, implemented using an edge triggered flip flop. At the abstract level of description the device is specified as a unit delay, one 'unit' of discrete time corresponding to the clock period; at the detailed level of description the grain of time is finer, several units of time corresponding to the delay through a gate. The proof of a microcoded computer design, mentioned above, also involves temporal abstraction; at the low level there is one microcode step per unit of discrete time, and at the higher level there is one machine instruction step per unit of discrete time. 
Relating the levels of temporal abstraction involves mapping points or periods of low level time to points or periods of high level time and showing that a many-step low level computation implements a one-step high level computation.
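The data abstraction step described above can be illustrated with a small executable sketch. Everything here is our own illustration, not from the paper: boolean vectors stand in for n-bit integers, a valuation map relates the two levels, and the map is used to check that a low level incrementer implements the intended arithmetic.

```python
# Data abstraction sketch: relate boolean vectors (implementation level)
# to natural numbers (specification level) and use the mapping to check
# that a ripple-carry increment circuit implements "+1 mod 2^n".

def value(bits):
    """Data abstraction: map a boolean vector (LSB first) to a number."""
    return sum(1 << i for i, b in enumerate(bits) if b)

def increment(bits):
    """Implementation: ripple an incoming carry of 1 through the bits."""
    out, carry = [], True
    for b in bits:
        out.append(b != carry)   # sum bit = b XOR carry-in
        carry = b and carry      # carry out
    return out

n = 4
ok = all(value(increment([(x >> i) & 1 == 1 for i in range(n)])) == (x + 1) % 2**n
         for x in range(2**n))
print(ok)  # True: the low level operation implements the high level one
```

The correctness claim has exactly the shape described in the text: applying the abstraction map after the low level operation agrees with applying the high level operation after the abstraction map.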

3 Specification using Higher Order Logic

The formalism that we use to specify and verify hardware is a variety of higher order logic, adapted for this purpose by Mike Gordon at the University of Cambridge. In this section, we first give a very brief introduction to this logic; a more complete and formal presentation is given by Gordon in [2,3]. We then review how the behaviour and structure of digital hardware can be specified using higher order logic.

3.1 Introduction to Higher Order Logic

Higher order logic includes terms corresponding to the standard notation of predicate calculus. Thus, 'P x' expresses the proposition that x has property P and 'R(x, y)' means that the relation R holds between x and y. Higher order


logic has the usual logical operators ¬, ∧, ∨, ⊃ and ≡ denoting negation, conjunction, disjunction, implication and equivalence respectively. The universal and existential quantifiers ∀ and ∃ are used to express the concepts of every and some; '∀x. P x' means that P holds for every value of x and '∃x. P x' means that P holds for some (i.e. at least one) value of x. Terms of the form '(c ⇒ t1 | t2)' denote the conditional 'if c then t1 else t2'. The term 'f ∘ g' denotes the composition of the functions f and g defined by (f ∘ g)(x) = f(g(x)). The constants T and F denote the truth values true and false respectively. The logic we use is higher-order. That is, variables are allowed to range over functions and predicates. The well ordering property of the natural numbers, for example, can be expressed using a variable P ranging over predicates:

∀P. (∃n. P n) ⊃ (∃n. P n ∧ ∀n'. n' < n ⊃ ¬P n')

This states that if any predicate P is true of at least one number then there is a smallest number for which P is true. In higher order logic, functions and predicates can take other functions and predicates as arguments. Functions can also yield functions as results. Consider, for example, the function Rise defined as follows:

(Rise ck)(t) ≡ ¬ck(t) ∧ ck(t+1)

Rise is intended to express the notion that a clock ck rises at time t. The clock, modelled by a function ck from natural numbers to booleans, is given as the argument to Rise. The result of applying Rise to ck is also a function from numbers to booleans. Thus Rise is an example of a function that both takes a function as an argument and yields a function as a result. Higher order logic is a typed logic; every term of the logic has a type, though these are usually omitted when a term is written down. Informally, types can be thought of as denoting sets of values and the value of a term as an element of the set denoted by the term's type. The basic types of the logic include the type of natural numbers, num, and the type of boolean truth values, bool. Types can be built from other types using type operators. For example, the type of functions from num to bool is denoted by 'num→bool', using the infix type operator '→'. Writing 'tm:ty' indicates explicitly that a term tm has type ty. For example, the term 'Rise ck' can be written with explicit type information as (Rise ck):num→bool or even as

((Rise:(num→bool)→(num→bool)) (ck:num→bool)):num→bool
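The higher-order character of Rise is easy to mimic outside the logic. The following Python sketch is an informal analogue, not HOL: a function that takes a clock (a function from numbers to booleans) and returns another such function. The particular clock is a made-up example.

```python
# Rise as a higher-order function: clock in, predicate out (a sketch).

def rise(ck):
    """Return the predicate 'ck rises at time t': not ck(t) and ck(t+1)."""
    return lambda t: (not ck(t)) and ck(t + 1)

# Example clock with period 4: low at t % 4 in {0, 1}, high in {2, 3}.
ck = lambda t: (t % 4) >= 2
r = rise(ck)
print([t for t in range(8) if r(t)])  # rises at t = 1 and t = 5
```

As in the logic, both the argument ck and the result r have the type of functions from numbers to booleans.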

though it is seldom necessary to give this much type information. An important primitive constant in the logic we are using is the ε-operator, the informal semantics of which are as follows. If P[x] is a predicate involving a


variable x of type ty then 'εx. P[x]' denotes some value of type ty such that P is true of that value. If there is no such value (i.e. P[v] is false for each value v of type ty) then 'εx. P[x]' denotes some fixed but arbitrarily chosen value of type ty. Thus, for example, 'εn. 4 …

… N, there will not exist points of time at which p is true for the nth time and 'Timeof p n' will be undefined. Thus, the value of 'Timeof p' will be a partial function. Such partial functions, however, are not directly definable in the logic; all functions defined must be total. The definition of Timeof will therefore be based on a relation, Istimeof, which can be defined directly. The term 'Istimeof p n t' will have the meaning 'p is true for the nth time at time t'. The formal definition of the relation Istimeof is done by primitive recursion on the variable n. In the case where n is zero, the definition is:

Istimeof p 0 t ≡ p(t) ∧ ∀t'. t' < t ⊃ ¬p(t')

The predicate Inf can be used to express the fact that if p is true infinitely often then the time at which p is true for the nth time exists for all n:

Inf(p) ⊃ ∀n. ∃t. Istimeof p n t

(1)

The proof of this statement proceeds by induction on n and use of the well ordering property of natural numbers:

∀P. (∃n. P n) ⊃ (∃n. P n ∧ ∀n'. n' < n ⊃ ¬P n')

The function Timeof can then be defined using the ε-operator:

Timeof p n = εt. Istimeof p n t

From (1) and this definition it follows that:

Inf(p) ⊃ Istimeof p n (Timeof p n)   (2)

which gives the following easily-proved lemma:

Inf(p) ⊃ p(Timeof p n)

Using (2) the following two lemmas can also be proven:

Inf(p) ⊃ (Timeof p n) < (Timeof p (n+1))

Inf(p) ⊃ ∀t. (Timeof p n) < t ∧ t < (Timeof p (n+1)) ⊃ ¬p(t)

These lemmas can be conveniently collected together using the predicate Next defined by:

Next t1 t2 p ≡ (t1 < t2) ∧ p(t2) ∧ ∀t. t1 < t ∧ t < t2 ⊃ ¬p(t)

giving the following theorem:

Inf(p) ⊃ ∀n. Next (Timeof p n) (Timeof p (n+1)) p

(3)

This theorem states that if p is true infinitely often then 'Timeof p' is well defined and denotes the desired function from high level time points to low level time points.
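A bounded executable reading of Istimeof and Timeof may help fix intuitions. This is a sketch only: the logic's Timeof is defined with the choice operator over all of time, whereas here we simply search low level time up to a horizon, so p must occur often enough within it.

```python
# Bounded models of Istimeof and Timeof (illustrative, not the HOL definitions).

def istimeof(p, n, t):
    """p is true for the nth time (counting from 0) at time t."""
    return p(t) and sum(p(u) for u in range(t)) == n

def timeof(p, n, horizon=1000):
    """Least t with istimeof(p, n, t); assumes such a t exists below horizon."""
    for t in range(horizon):
        if istimeof(p, n, t):
            return t
    raise ValueError("p not true n+1 times within horizon")

p = lambda t: t % 3 == 0                        # true at 0, 3, 6, ...
print([timeof(p, n) for n in range(4)])         # [0, 3, 6, 9]
print(all(p(timeof(p, n)) for n in range(10)))  # lemma: p holds at Timeof p n
```

The final line checks, within the horizon, the easily-proved lemma that p is true at 'Timeof p n' for every n.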

7.1.2 Using Timeof to Define Abstractions

Having formally defined the function Timeof and shown that it is well defined for functions p that are true at an infinite number of points, it is possible to use Timeof to define temporal abstraction functions for statements of correctness. A temporal abstraction function that maps high level time to low level time is just an increasing function f:num→num. Any such function can be defined using Timeof and an appropriate function p that indicates the points of low level time that are abstracted to points of high level time. Formally:

∀f. Incr(f) ≡ ∃p. Inf(p) ∧ (f = Timeof p)

This theorem follows from the definition of Incr and the properties of Timeof given by theorem (3). With this theorem in mind, we can use Timeof to construct an alternative formulation of temporal abstraction as follows. If p is a term that denotes a function indicating the points of low level time that are to be abstracted to points of high level time, then the correctness relation that must hold between a specification Spec and an implementation Imp is:

Inf(p) ∧ ∀a b. Imp(a, b) ⊃ Spec(a ∘ (Timeof p), b ∘ (Timeof p))

(4)

This states that Imp is correct with respect to the temporally abstract specification Spec if whenever a and b satisfy Imp then the abstract signals constructed by sampling a and b when p is true will satisfy Spec. If 'when' is an infix operator defined as follows:²

∀s t. s when t = s ∘ (Timeof t)

then this correctness statement can be written:

Inf(p) ∧ ∀a b. Imp(a, b) ⊃ Spec(a when p, b when p)

Since ∀p. Inf(p) ⊃ Incr(Timeof p), the correctness statement (4) implies a correctness statement of the old form:

Incr(f) ∧ ∀a b. Imp(a, b) ⊃ Spec(a ∘ f, b ∘ f)

with f = Timeof p. Furthermore, since every increasing function f can be expressed using 'Timeof p' with an appropriate function p, this alternative form of correctness relation for temporal abstraction is essentially equivalent to the original one. The following example shows how Timeof can be used in the correctness proof of a unit delay register, and describes some ways in which the ideal form of correctness given above must be modified in practice.

'Vst. s when t = so (Timeoft) then this correctness statement can be written: Inf(p) 1\ 'Vab.lmp(a,b):J Spec(awhen p, bwhen p) Since 'Vp.lnf(p) :J Incr(Timeof p), the correctness statement (4) implies a correctness statement of the old form: Incr(f) 1\ 'Vab.lmp(a,b):J Spec(aof, bof) with f= Timeof p. Furthermore, since every increasing function f can be expressed using 'Timeof p' with an appropriate function p, this alternative form of correctness relation for temporal abstraction is essentially equivalent to the original one. The following example shows how Timeof can be used in the correctness proof of a unit delay register, and describes some ways in which the ideal form of correctness given above must be modified in practice. 21n previous work, I have called this operator' Abs'. The mnemonicly superior name 'when' was suggested by the work of Halbwachs, Lonchampt and Pilaud [5].

7.2 An Example

A commonly used register-transfer level device is the unit delay:

in → Del → out

which can be specified in higher order logic by:

Del(in, out) ≡ ∀t. out(t+1) = in(t)

The unit delay device is a temporal abstraction; its implementation is specified at a more detailed grain of time. One possible implementation is the rising edge triggered D-type flip flop:

d, ck → Dtype → q

which, ignoring detailed timing information,³ can be specified by the predicate Dtype defined as follows:

Dtype(ck, d, q) ≡ ∀t. q(t+1) = (Rise ck t ⇒ d(t) | q(t))

where the term 'Rise ck t' has the meaning 'the clock, ck, rises at time t':

Rise ck t ≡ ¬ck(t) ∧ ck(t+1)

We want to show that the D-type flip flop is a correct implementation of the register-transfer level unit delay. Using the formulation of correctness for temporal abstraction described above, we must prove a correctness statement of the following form:

Inf(p) ∧ ∀ck d q. Dtype(ck, d, q) ⊃ Del(d when p, q when p)

where p is a term that denotes a function of type 'num→bool'. It is not possible, however, to prove a correctness statement of this form; for we can show that:

¬∃p. Inf(p) ∧ ∀ck d q. Dtype(ck, d, q) ⊃ Del(d when p, q when p)

³For clarity, a greatly simplified specification of Dtype is used in this example. For D-type flip flop specifications that include information about detailed timing behaviour see [6] or [8].


The trouble is that the abstract time scale for Del must be defined in terms of the value of the clock, ck. Informally, Dtype implements a unit delay by sampling its input when the clock rises and holding this value on the output until the next rise of the clock. In this way the D-type delays by one clock period the sequence consisting of the values of the signal d at the times of successive clock rises. This suggests that the temporal abstraction function for this proof should map successive points of high level time to the successive points of low level time at which the clock rises. Using 'Rise ck', the required temporal abstraction function is given, for each value of ck, by the term 'Timeof (Rise ck)'. Using this, we can reformulate the correctness statement for the Dtype implementation of Del as:

∀ck. Inf(Rise ck) ∧ ∀d q. Dtype(ck, d, q) ⊃ Del(d when (Rise ck), q when (Rise ck))

This states that the clock rises infinitely often, and whenever the signals ck, d and q are in the Dtype relationship, the abstracted signals, i.e. the signals d and q sampled at successive rises of the clock, will be in the Del relationship. Thus there is a 'family' of sampling functions defined by 'Rise ck', giving a temporal abstraction from Dtype to Del for each value of the clock ck. Because we are not interested in constructing temporal abstraction functions for signals that do not satisfy Dtype(ck, d, q), this correctness statement can be further modified to have the following form:

∀ck d q. Dtype(ck, d, q) ⊃ Inf(Rise ck) ∧ Del(d when (Rise ck), q when (Rise ck))

As it stands, however, this correctness statement is false; for Dtype does not constrain the value of ck, and it is not the case that Inf(Rise ck) holds for all possible values of ck. The correctness statement will therefore be modified to have Inf(Rise ck) as an assumption, giving:

∀ck. Inf(Rise ck) ⊃ ∀d q. Dtype(ck, d, q) ⊃ Del(d when (Rise ck), q when (Rise ck))

(5)

which is the final form of the correctness statement for this device. The proof of (5) is straightforward. The main step is an induction on the number of time steps between adjacent rises of the clock, showing that the value of d that is sampled at a rise of ck is held on q until the next rise. The correctness statement follows easily from this result and the properties of Timeof given by (3). The assumption that the clock rises infinitely often is a validity condition on the temporal abstraction; the correctness statement asserts that Del presents a valid abstract view of Dtype provided the condition 'Inf(Rise ck)' is satisfied. To use the Del abstraction, this condition on the clock must be met by the


environment in which the Dtype is placed. The validity condition 'Inf(Rise ck)' is as unrestrictive as possible, given the simple D-type model used. The clock ck is not required to be regular or to have a minimum clock period; the assumed liveness condition is sufficient for the proof to go through. This proof gives a very simple example of the most common type of temporal abstraction, where contiguous intervals of low level time correspond to successive units of high level time. Examples involving detailed timing information or several different temporal abstractions in the same proof are more complex but involve the same general approach as this simple example. There are other types of temporal abstraction that can be used to formulate correctness statements, but that can not be constructed using Timeof. In one form of temporal abstraction, for instance, the occurrence of an event at any time during an interval of low level time corresponds to a signal being true at a point of high level time. This type of abstraction can be used to deal with asynchronous inputs to synchronous systems, hiding from the system's abstract specification irrelevant information about the exact timing of the asynchronous input signals.
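Although nothing replaces the formal proof of (5), its content can be checked by simulation. The Python sketch below is illustrative only, with a made-up clock and data signal: it runs the Dtype recurrence over fine-grained time, samples d and q at rises of ck, and confirms that the sampled signals are in the Del relation.

```python
# Simulation check of correctness statement (5): Dtype implements Del
# under the temporal abstraction 'sample at rises of ck' (a sketch, not a proof).

def simulate_dtype(ck, d, steps):
    """Dtype recurrence: q(t+1) = d(t) if ck rises at t, else q(t).
    q(0) is left arbitrary (False here)."""
    q = [False]
    for t in range(steps):
        rise = (not ck(t)) and ck(t + 1)
        q.append(d(t) if rise else q[t])
    return lambda t: q[t]

ck = lambda t: (t % 4) >= 2      # free-running clock, period 4 (illustrative)
d = lambda t: (t % 5) < 2        # arbitrary data signal (illustrative)
q = simulate_dtype(ck, d, 200)

rises = [t for t in range(100) if (not ck(t)) and ck(t + 1)]
d_abs = [d(t) for t in rises]    # d when (Rise ck)
q_abs = [q(t) for t in rises]    # q when (Rise ck)

# Del(d when (Rise ck), q when (Rise ck)): the sampled output is the
# sampled input delayed by one unit of abstract time.
print(all(q_abs[n + 1] == d_abs[n] for n in range(len(rises) - 1)))  # True
```

Note that the chosen clock satisfies the validity condition within the simulated window: it rises repeatedly, mirroring the assumption 'Inf(Rise ck)'.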

8 Conclusion

In hardware verification, designs that are proven correct ought to have formal descriptions that model reality reasonably well. The more accurate the model of device behaviour, the less likely it is for design errors to escape discovery by formal verification. Formal specifications, on the other hand, must be clear and concise, in order to be seen to reflect the designer's intent. This means that most of the details of a device's behaviour must be left out of its formal specification; only the essentials can be included. The idea of abstraction is therefore fundamental to a formalisation of hardware design correctness. This paper has shown how four basic abstraction mechanisms for hardware verification can be formalised in higher order logic and used to control the size and complexity of device specifications. For clarity, the examples given were kept very simple; each example involved only one simple abstraction step. More substantial examples have been done, in which a series of nested abstractions is used to relate the bottom level design description to the top level abstract specification. The role of abstraction in hardware verification is not restricted to that of complexity control. Structuring a correctness proof to span several levels of abstraction allows it to serve as structured documentation for a design. The top level specification gives a concise description of the essential features of the device; if more information about its operation is required, this is provided by the specifications at the next lower level of abstraction. If still more information is desired, it can be obtained from the next level of abstraction, and so on, right down to the level of primitive components. The abstraction mechanisms that


Abstraction Mechanisms for Hardware Verification

are used in the proof serve to explain how each level of abstraction is related to the next lower level.

Abstractions often involve validity conditions that state when a specification represents a valid abstract view of a more detailed design description. Formal correctness proofs make such abstraction validity conditions explicit; for the proofs to go through, they must either be satisfied or included as assumptions. This means that a formal proof clearly documents validity assumptions that might otherwise lead to misunderstandings by remaining implicit. The conditions under which a device's abstract specification reflects the actual behaviour of the device will be clear from the assumptions present in its proof of correctness.

Validity conditions that arise in the correctness proofs of some of the components of a device can sometimes help to simplify the task of writing specifications for other components of the device. For example, the validity condition 'Inf(Rise ck)' generated in the Dtype proof simplifies the task of specifying the desired behaviour of clock generator circuits for use with the Dtype device. We know that the clock signal must satisfy this condition; we can therefore take the abstract specification of the clock generator to be just the validity condition itself.

Some abstractions can be seen as specifications of the 'protocol' for communication at the external ports of a device. Consider, for instance, the temporal abstraction given for the D-type flip flop implementation of the unit delay. From the correctness statement, it is clear that values are input to the delay device only at rising edges of the clock. The abstract specification of this device states that the value input to the device is delayed by one unit of time; the temporal abstraction used in the correctness statement explains what it means for a value to be 'input to the device'.
Abstraction mechanisms similar to the ones discussed in this paper are likely to play an important role in the automatic synthesis of designs. In this paper, abstraction mechanisms were described as removing information from design descriptions to yield abstract specifications. Synthesis of designs involves the opposite process: adding detail to formal specifications until an implementable design description is obtained. The abstraction mechanisms commonly used in post-design verification may also suggest heuristics for design synthesis from formal specifications.

Acknowledgements

Thanks are due to the members of the Cambridge Hardware Verification Group, especially Albert Camilleri, Francisco Corella, Inder Dhingra, Mike Gordon and Jeff Joyce, who made valuable comments on drafts of this paper. I would also like to thank Graham Birtwistle of the University of Calgary for his suggestions. This research is funded by the Royal Commission for the Exhibition of 1851.


References

[1] Camilleri, A.J., Gordon, M.J.C. and Melham, T.F., 'Hardware Verification Using Higher Order Logic,' to appear in: Proceedings of the IFIP International Conference: From H.D.L. Descriptions to Guaranteed Correct Circuit Designs, Grenoble, September 9-11, 1986.

[2] Gordon, M.J.C., 'HOL: A Machine Oriented Formulation of Higher Order Logic,' Technical Report No. 68, Computer Laboratory, University of Cambridge, 1985.

[3] Gordon, M.J.C., 'HOL: A Proof Generating System for Higher Order Logic,' in: VLSI Specification, Verification and Synthesis, G.M. Birtwistle and P.A. Subrahmanyam, eds., (this volume).

[4] Gordon, M.J.C., 'Why Higher Order Logic is a Good Formalism for Specifying and Verifying Hardware,' in: Formal Aspects of VLSI Design: Proceedings of the 1985 Edinburgh Conference on VLSI, G.J. Milne and P.A. Subrahmanyam, eds., North-Holland, Amsterdam, 1986.

[5] Halbwachs, N., Lonchampt, A. and Pilaud, D., 'Describing and Designing Circuits by means of a Synchronous Declarative Language,' to appear in: Proceedings of the IFIP International Conference: From H.D.L. Descriptions to Guaranteed Correct Circuit Designs, Grenoble, September 9-11, 1986.

[6] Hanna, F.K. and Daeche, N., 'Specification and Verification using Higher-Order Logic: A Case Study,' in: Formal Aspects of VLSI Design: Proceedings of the 1985 Edinburgh Conference on VLSI, G.J. Milne and P.A. Subrahmanyam, eds., North-Holland, Amsterdam, 1986.

[7] Hanna, F.K. and Daeche, N., 'Specification and verification of digital systems using higher-order predicate logic,' IEE Proceedings, Vol. 133, Part E, No. 5, September 1986, pp. 242-254.

[8] Herbert, J.M.J., Ph.D. Thesis, The University of Cambridge, to appear in 1987.

[9] Hoare, C.A.R., 'A Calculus of Total Correctness for Communicating Processes,' Science of Computer Programming, Vol. 1, No. 1, 1981.

[10] Leisenring, A., Mathematical Logic and Hilbert's ε-Symbol, Macdonald and Co. Ltd., London, 1969.

[11] Winskel, G., 'Lectures on models and logic of MOS circuits,' Proceedings of the Marktoberdorf International Summer School on Logic Programming and Calculi of Discrete Design, July 1986.

10 Formal Validation of an Integrated Circuit Design Style

I. S. Dhingra¹
University of Cambridge Computer Laboratory
Corn Exchange Street
Cambridge CB2 3QG

December, 1986

1 Introduction

A specific design style is only ever used if it meets the needs of the task in hand. The task in hand now is that of generating large, complex, application specific systems on silicon in a fairly short space of time, with confidence that they will perform to the required specification. In the past the development of a large circuit might have been done by a team of engineers over a period of a few years, e.g. the development of the 68000 microprocessor. This method of circuit development is not acceptable in the present day, owing to the time and manpower spent in iterating to get the design correct. What is needed is a design technique which is easy to follow and gives a very high degree of confidence in the first-time-correct implementation of the circuit.

Before such a design technique can be developed we need to look at the sort of mistakes that are made which slow down the development of a circuit. The errors that are generally made can be classified into two categories: timing errors and logical errors. Designers use simulators to model the behaviour of circuits before their implementation. The critical point to note here is that the degree of confidence one has in the design after it has been simulated is no more than the confidence one has in the model of the primitives used in the simulation. The errors that are not easy to catch at the simulation stage are the complex timing errors. This is because the process of manufacturing is not ideal, and so there are slight differences from one batch of devices to the next.

So, the sort of design technique needed is a synchronous design scheme. Such a design scheme dictates that all storage elements be updated by the use of a

¹The author is jointly funded by the British Science and Engineering Research Council, and Racal Research based in Reading, England.


Formal Verification of a Design Style

global clock. This reduces the timing problems to the areas between the clocked elements, and the problems of identifying difficult timing paths and race hazards are simplified. It also frees the designer to think more about the algorithm and the logical design of the system rather than the detailed timing requirements of the logic. The price paid for this simplification of the design style is an increase in the size of the chip. This price is probably worth paying, since the adoption of a difficult asynchronous design would probably lead to the introduction of obscure timing errors which may be difficult to model and may not be discovered until the chip has been fabricated.

The technology we are interested in is CMOS, and associated with it are hazards not encountered with the simpler NMOS technology. For example, in CMOS all latches are clocked with a pair of clock lines which are inverses of each other. The constraints on the clock lines are that the overlap between the clocks and the rise and fall times of the clock edges should be less than the delay across a gate (i.e. less than a few nanoseconds). This is a stringent requirement for a VLSI technology and may be very difficult to meet as the device geometries get smaller and the capacitive and resistive effects of the clock lines become more significant. This not only introduces clock stagger across the chip but also degrades the clock rise and fall times, both of which are hazardous and may cause the latches to become transparent for a short time and hence corrupt data. To overcome the problem of clock stagger on a large chip requires fine tuning of the clock delays across the chip, which would be difficult to model at the design stage and would still provide no guarantee of working. To overcome the poor clock rise and fall times requires excessively large clock drivers and clock buffers at regular intervals, which leads to rather complex clock structures.
Unfortunately it is also desirable not to have very fast clock rise and fall times, since these generate electromagnetic interference and hence cause neighbouring lines to get corrupted, a highly undesirable feature! The solution to these problems lies in the design style, not in the deployment of clever distributed clock drivers.

Presented here is a technique for formally analysing such design styles. Our work is based on a design style known as CLIC [8], which is tolerant of considerable clock overlap and slow clock rise and fall times. It is an extension of the work done by Goncalves and De Man [4], where they present a design style they call NORA, which generalises yet another design style known as DOMINO [7]. Common to all of these design styles is the use of dynamic logic rather than static logic. The reason for this comes from the fact that in static CMOS all logic functions are duplicated: once using the p-type transistors and again using the n-type transistors. This has the advantage that the static power consumption of the circuit is very low, but it also means that almost twice as much silicon area is used as necessary. However in the context


of a synchronous system we have a continuously running global clock which can be put to other uses: instead of using static logic blocks we use dynamic logic blocks which are clocked by the global clock. This increases the power consumption a little, but it is still less than the power that would be used by a similar NMOS circuit. This dynamic design feature was used in the DOMINO design style, which used only n-type gates and so provided only non-inverting logic. The NORA design style on the other hand used both n-type and p-type devices and so provided the full freedom of inverting and non-inverting logic. However a requirement of the NORA design technique was that the clock rise and fall times should be kept fairly small to avoid data corruption by the latches becoming transparent for a short time. This problem begins to dominate as one approaches VLSI, as described above. The CLIC design style overcomes it by use of a two-phase clocking scheme.

After a short overview of the CLIC design style and its design rules, we give formal means of developing correct CLIC gates. This is based on a simple transistor model with charge storage capability and a simple four-value model of the signals on wires. The CLIC design style is fully described in [8]. The notation used there is quite different from ours, and the clock labeling scheme is completely different and does not easily lend itself to formal analysis. So a summary of this design technique is first given.

2 Overview of the CLIC Design Style

2.1 Introduction

In this section a brief overview is given of synchronous design methodology and dynamic logic, together with how these two ideas are married together to form the basis of the CLIC design style.

2.1.1 Synchronous Design Methodology

The basic principle behind any synchronous design philosophy is that the system is separated into blocks of purely combinatorial logic with no data storage facility, interleaved by register latches which hold the data between clock pulses. There is a global system clock which is used to clock the register latches, and its period is long enough to allow all combinatorial logic blocks to finish evaluating their results. So, on the tick of the clock, new data appears on the inputs of the combinatorial logic blocks, and the old results are passed as inputs


to the next stage by use of the register latches. By definition there is no feedback within the purely combinatorial blocks. This principle is illustrated in figure 1.

Figure 1: Synchronous Logic Concept
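The register-transfer discipline just described can be sketched in a few lines of executable pseudocode. This is an illustrative model only (the function name and the two example stages are invented for the sketch): pure combinational functions with no internal state, separated by latches that all update on the same global tick.

```python
# A minimal sketch of the synchronous structure of Figure 1: combinational
# blocks separated by latches, all clocked by one global tick.

def step(latches, logics, primary_input):
    """Advance the pipeline by one global clock tick.

    `latches` holds the values captured at the previous tick;
    `logics` is a list of pure (stateless) combinational functions.
    """
    new_latches = []
    value = primary_input
    for logic, old in zip(logics, latches):
        value = logic(value)        # combinational block evaluates ...
        new_latches.append(value)   # ... and its result is latched
        value = old                 # next stage sees the PREVIOUS latched value
    return new_latches

# Two stages: increment, then double.  A value takes one tick per stage
# to move down the pipeline, reflecting one latch delay per stage.
logics = [lambda x: x + 1, lambda x: 2 * x]
latches = [0, 0]
latches = step(latches, logics, 3)   # [4, 0]
latches = step(latches, logics, 7)   # [8, 8]
```

On each tick every latch captures the value its combinational block computed from the previous latch's output; there is no feedback inside the combinational blocks, matching the principle stated above.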

2.1.2 Dynamic Logic

A dynamic logic gate has two phases of operation, precharge and evaluate, which are controlled by a system clock. For example, a simple pseudo-NMOS dynamic gate is shown in figure 2. The output is precharged high while the clk input is low. Then as clk goes high the path to Vdd is turned off and the path to Gnd is turned on. Now the output will either remain floating high or will be discharged low, depending on the inputs.

Figure 2: A simple dynamic logic gate

Many design styles use such dynamic gates. In particular the DOMINO logic design style [7] uses this sort of structure with each output terminated by a static inverter. The advantage of this is that the output of the inverter is low while the gate is in its precharged state. So it can be fed as input to other such gates, resulting in a chain of dynamic gates separated by static inverters.
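The chain-of-dominoes behaviour of such a structure can be sketched as follows. The model is illustrative and not from the paper: each stage is reduced to a buffer so that only the one-way ripple of evaluation is visible, and the inverter outputs (which are what feed forward) are modelled as booleans.

```python
# Illustrative sketch of DOMINO evaluation: after precharge every inverter
# output is Lo (False), and during evaluation each output may only rise
# (False -> True), in order, down the chain.

def precharge(n):
    """Precharge phase: all dynamic outputs Hi, so all inverter outputs Lo."""
    return [False] * n          # inverter outputs (the values fed forward)

def evaluate(stage_outputs, first_input):
    """Evaluation phase: ripple the result down the chain.

    Each stage here just passes its input on, purely to show the
    domino-like, one-way propagation of rising transitions.
    """
    value = first_input
    for i in range(len(stage_outputs)):
        # A stage output may rise (False -> True) but never fall back.
        stage_outputs[i] = stage_outputs[i] or value
        value = stage_outputs[i]
    return stage_outputs

chain = precharge(4)            # [False, False, False, False]
chain = evaluate(chain, True)   # every domino falls in turn: all True
```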


Now when the clock goes high, all the gates change to the evaluation phase. Since all inputs are initially low, no output of a gate will change state unless the gate at the start of the chain changes. Effectively what we have here is a chain of evaluation running from the start of the chain to the end, much as the fall of one domino causes the next to fall, which in turn causes the next to fall, and so on.

2.1.3 The CLIC Design Style

This design style uses both dynamic logic features and the principles of synchronous design. It comes with a set of rules which guarantee that there will be no timing hazards. The basic principle behind this is very similar to that of the NORA design style as described in [4]. The major difference is that instead of using the simple clock and its inverse, a two-phase non-overlapping clocking scheme is used, which results in four clock lines: φ1, φ1', φ2 and φ2'. The remainder of this section gives a brief overview of this design style. A more detailed electrical analysis is given in [8].

2.2 Clock Definition and Generation for CLIC

The two-phase clock for the CLIC design style is generated from a single square wave clock input. On every rising edge of the external clock input an internal narrow pulse is generated on the φ1 clock line. Similarly on the falling edge of the external clock another pulse is generated on the φ2 clock line. These two internal clock lines are then inverted to form the remaining two internal clock lines, namely φ1' and φ2' respectively. These are shown in figure 3 together with the external clock. For the purposes of analysing the clock scheme we can divide the clock cycle

Figure 3: The external and the internal clock relationships


into eight distinct intervals as shown in figure 4. The shaded regions represent uncertainty in the value on the clock lines, i.e. the value could be Hi, Lo or something in between. The essential requirement of the clock scheme is that the durations of the clock pulses, i.e. the intervals t3 and t7, should be long enough for the internal gates of the chip to precharge their outputs.

Figure 4: Full definition of the internal clock (one cycle divided into the intervals t1 ... t8)
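The clock scheme can be sketched by deriving the four internal lines from samples of the external square wave. The pulse width and the 0/1 sampling below are illustrative choices, not taken from the paper; the point is only the edge-triggered pulses and the inverted pairs.

```python
# Sketch of the CLIC clock scheme: a narrow pulse on phi1 at each rising
# edge of the external clock, a pulse on phi2 at each falling edge, and
# phi1', phi2' as their inverses.

def clic_clocks(ext, pulse=1):
    """ext: list of 0/1 samples of the external square-wave clock."""
    n = len(ext)
    phi1 = [0] * n
    phi2 = [0] * n
    for t in range(1, n):
        if ext[t - 1] == 0 and ext[t] == 1:          # rising edge
            for k in range(t, min(t + pulse, n)):
                phi1[k] = 1
        if ext[t - 1] == 1 and ext[t] == 0:          # falling edge
            for k in range(t, min(t + pulse, n)):
                phi2[k] = 1
    phi1b = [1 - v for v in phi1]                    # phi1'
    phi2b = [1 - v for v in phi2]                    # phi2'
    return phi1, phi1b, phi2, phi2b

ext = [0, 1, 1, 1, 0, 0, 0, 1]
phi1, phi1b, phi2, phi2b = clic_clocks(ext)
# phi1 pulses at t=1 and t=7; phi2 pulses at t=4; the phi1 and phi2
# pulses never coincide, as the non-overlapping scheme requires.
```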

2.3 CLIC Primitive Gates

All the logic gates used in the CLIC environment, except for the static inverter, are dynamic logic gates which are driven by one of the four clock lines. The primitives used in realising a logic function are a combination of these gates. There are four basic building blocks used in CLIC circuits: the N_Shell, the P_Shell, the Latch, and the Stat_Inv. These are illustrated in figure 5. Both the N_Shell and the P_Shell devices need extra components, namely transistors, to be "wired" into them to make n-type and p-type gates respectively. It is these n-type and p-type gates which form the primitive gates as used in the CLIC design scheme. In the remainder of this section we give a brief account of how each of these devices works. The working of the static inverter, however, is trivial and will not be discussed here.


Figure 5: The primitive building blocks of CLIC: (a) N_Shell, (b) P_Shell, (c) Latch, (d) Static Inverter

2.3.1 The Latch

The Latch as used in the CLIC environment is a dynamic device. It needs to be updated at regular intervals, otherwise the value held on its output may deteriorate. There are two types of latches in the CLIC system, one driven by φ1 and φ1', and the other driven by φ2 and φ2'. The two differ only in the clocks that drive them. A typical latch is illustrated in figure 6 together with the clocks which drive it. The input is required to be stable during the times when there is a positive pulse on the φ line and a negative pulse on the φ' line. By stable is meant that the input should either remain Hi or Lo, but should not be in the process of changing. This constraint on the input is necessary to stop the latch from latching onto a false value.

Figure 6: The Latch

If a latch is correctly clocked then it can be in one of two modes: Transparent mode or Latched mode.

Transparent mode (φ = Hi, φ' = Lo): The two transistors driven by the clock lines are switched on, so the output of the latch is charged to the inverse of the input. It is also important to note that during this time the latch behaves as a static


inverter, i.e. if there were any changes on the input then they would be reflected by a change on the output.

Latched mode (φ = Lo, φ' = Hi): The two transistors driven by the clock lines are switched off, so the output node of the latch is isolated from the power rails and retains the previously charged value. During this time any changes on the input have no effect on the output. The output node is designed to hold the value long enough until the latch is refreshed again by going into the transparent mode for a short time. The latch only needs to be in the transparent mode for a short time, long enough for the output node to be charged to the correct value. Even if there is considerable clock overlap and the clock edges are poor, the latch will still lock onto the correct value provided the input is stable during the interval when there is a positive pulse on the φ line and a negative pulse on the φ' line. This restriction ensures that the latch behaves correctly as regards locking onto the input value.
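The two modes can be summarised in a small executable sketch. The Hi/Lo encoding and function name are illustrative; charge decay and the four-value signal model are ignored here.

```python
# Minimal sketch (not the paper's formal model) of the latch's two modes:
# transparent when phi = Hi and phi' = Lo, in which the output follows the
# inverted input; latched otherwise, in which the output holds its stored
# charge and ignores the input.

HI, LO = "Hi", "Lo"

def latch(phi, phib, ip, held):
    """Return the new output value, given the stored value `held`."""
    if phi == HI and phib == LO:           # transparent: acts as an inverter
        return LO if ip == HI else HI
    return held                            # latched: isolated, holds charge

out = latch(HI, LO, HI, LO)    # transparent: inverts Hi -> Lo
out = latch(LO, HI, LO, out)   # latched: input change ignored, output stays Lo
```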

2.3.2 P-type and N-type Logic Gates

The names p-type and n-type logic gate arise from the fact that these gates use only p-type or only n-type transistors respectively, except for the precharging and enabling transistors. All p-type gates are driven by either φ1 or φ2, and all n-type gates are driven by either φ1' or φ2'. Perhaps the best way to understand the working of these gates is by example. Consider a simple two input n-type Nand gate as shown in figure 7. The working of this gate has two distinct phases: the Precharge period and the Evaluation period.

Precharge (φ' = Lo): During this period the output of the gate is pulled Hi by the enabled p-transistor. Any changes on the inputs during this time have no effect on the output, since the bottom n-transistor is off and so the path to Gnd is effectively cut, i.e. the output cannot be pulled Lo.

Evaluation (φ' = Hi): When φ' goes Hi the top p-transistor turns off and the bottom n-transistor turns on. If both the inputs now go Hi then the output will be pulled down to logic level Lo; otherwise it will remain floating at Hi. The correct answer is generated on the output of the gate at the end of the evaluation period and is held static until the next precharge period.


Figure 7: Two input N-type Nand gate (the clock waveforms mark the Precharge, Evaluate and Hold periods)

Note that the output may hold the wrong answer if the inputs are allowed to go Hi and then go Lo during the evaluation period. Since there is no pull-up during the evaluation period, if the output goes Lo then it will remain so until the next precharge period. So, for example, if the inputs are initially Hi and then go Lo, then at the end of the evaluation period the output and the inputs will all be Lo, which is the wrong answer for a Nand gate. To overcome this problem a restriction is imposed which states that "there should be no Hi to Lo transitions on the inputs of n-type gates during the evaluation period."

Similarly the working of p-type gates can be understood by considering a two input p-type Nor gate. The structure of this is shown in figure 8. As in the case of n-type gates, this too has the precharge period and the evaluation period.

Precharge (φ = Hi): During this period the output of the gate is pulled Lo by the enabled n-transistor. Any changes on the inputs during this time have no effect on the output, since the top p-transistor is off and so the path to Vdd is effectively cut, i.e. the output cannot be pulled Hi.

Evaluation (φ = Lo): When φ goes Lo the bottom n-transistor turns off and the top p-transistor turns on. If both the inputs now go Lo then the output will be pulled up to logic level Hi; otherwise it will remain floating at Lo.
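The precharge/evaluate behaviour of the n-type gate, and the failure mode that motivates the input restriction, can be sketched as follows. This is an illustrative model, not the paper's formal transistor model; the class and value names are invented for the sketch.

```python
# Sketch of the two-input n-type Nand gate: the output is precharged Hi
# while phi' is Lo; during evaluation it can only be pulled Lo (there is
# no pull-up), which is why Hi -> Lo input transitions are forbidden
# during the evaluation period.

HI, LO = "Hi", "Lo"

class NTypeNand:
    def __init__(self):
        self.out = HI

    def precharge(self):
        self.out = HI                       # pulled Hi through the p-transistor

    def evaluate(self, ip1, ip2):
        # The path to Gnd conducts only if both inputs are Hi; once the
        # output has been discharged it stays Lo until the next precharge.
        if ip1 == HI and ip2 == HI:
            self.out = LO
        return self.out

g = NTypeNand()
g.precharge()
g.evaluate(HI, HI)      # output discharged: Lo, the correct Nand answer
g.evaluate(LO, HI)      # inputs fell during evaluation: output STAYS Lo,
                        # now the wrong answer -- hence the input restriction
```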


Figure 8: Two input P-type Nor gate (the clock waveforms mark the Precharge, Evaluate and Hold periods)

Just as before, the correct answer is generated on the output of the gate at the end of the evaluation period and is held static until the next precharge period. For similar reasons to those of the n-type gates, we need to impose the restriction that "there should be no Lo to Hi transitions on the inputs of p-type gates during the evaluation period."

2.4 Composition Rules for CLIC

From the previous section's work we note that the outputs of the n-type gates cannot have Lo to Hi transitions during the evaluation period, which is exactly the requirement on the inputs of the p-type gates. Similarly the outputs of the p-type gates cannot have Hi to Lo transitions during the evaluation period, which meets the exact requirement on the inputs of the n-type gates. This is not by accident but is designed into the style so that we can compose these gates together. Rules like this, and others which are necessary to make sure that a circuit designed with CLIC dynamic gates does not contain any timing hazards, are presented in this section. Perhaps the best method of presenting the CLIC composition rules is simply by listing them. Before giving these rules we introduce a few shorthands for the various types of gates and the clocks by which they may be driven.
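This matching of output properties to input requirements can be sketched as a small table-driven check. The names and the boolean encoding are invented for illustration; the point is only that an n-type output satisfies a p-type input's requirement directly, while driving a gate of the same sort requires an intervening static inverter to flip the transition property.

```python
# Illustrative sketch of the CLIC composition discipline: during
# evaluation an n-type output makes no Lo->Hi transition, which is exactly
# what a p-type input requires, and vice versa; a static inverter flips
# the property.  A connection is legal when the driver's output property
# matches the receiver's input requirement.

NO_RISE, NO_FALL = "no Lo->Hi", "no Hi->Lo"

OUTPUT_PROPERTY = {"n_gate": NO_RISE, "p_gate": NO_FALL}
INPUT_REQUIREMENT = {"n_gate": NO_FALL, "p_gate": NO_RISE}

def legal(driver, receiver, via_inverter=False):
    prop = OUTPUT_PROPERTY[driver]
    if via_inverter:                        # the inverter swaps the polarity
        prop = NO_FALL if prop == NO_RISE else NO_RISE
    return prop == INPUT_REQUIREMENT[receiver]

legal("n_gate", "p_gate")                     # True: n-type may drive p-type
legal("n_gate", "n_gate")                     # False: needs an inverter
legal("n_gate", "n_gate", via_inverter=True)  # True
```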


N_Nand_Imp(φ, ip1, ip2, op) =def ∃p1 p2 p3. N_Shell(φ, p1, p3, op) ∧ N_Tran(ip1, p1, p2) ∧ N_Tran(ip2, p2, p3)   (23)

Now by using theorems 18, 19, 20, 21 and 22 we can derive the following two properties. These state that the output of a two input Nand gate is "well behaved" and that the output is also "well defined" over a certain interval of time with reference to the point when φ1 is Hi.

(24)


(25)

So far we have only demonstrated that the output of n-type gates is "well behaved" and "well defined," but nothing has been said about the derivation of the logical behaviour of these gates. For this we need theorems considerably more complex than those given for Opt_Link. These properties include Link, No_Link and WB_Link, which state under what circumstances a "link" exists across the two nodes of the cluster of devices inserted in the N_Shell. The line of thought regarding these is very similar to that followed for Opt_Link, so we shall not deal with them here. However the other two theorems for the two input Nand gate, giving its logical behaviour, are presented below. They use an abstraction function Val_Abs which maps the values Hi and Lo to true and false respectively.
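As a sketch, Val_Abs and the Def predicate can be modelled as follows. The names X and Z for the remaining two signal values are an assumption made for the sketch, since the four-value model's value names are not spelled out in this summary.

```python
# Sketch of the Val_Abs abstraction: the defined signal values Hi and Lo
# map to the booleans true and false; the remaining values of the
# four-value model (called X and Z here, names assumed) have no boolean
# abstraction, which is what the Def predicate guards against.

def val_abs(v):
    if v == "Hi":
        return True
    if v == "Lo":
        return False
    raise ValueError("Val_Abs is only defined on Hi and Lo")

def is_def(v):
    """The Def predicate: the value is a proper logic level."""
    return v in ("Hi", "Lo")

val_abs("Hi")      # True
is_def("Z")        # False: a floating node cannot be abstracted
```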

Clock(φ1, φ1', φ2, φ2') ∧ N_Nand_Imp(φ1', ip1, ip2, op) ∧ WB ip1 φ1' ∧ WB ip2 φ1' ∧ φ1(t) = Hi ∧ Def ip1 (t+1) ∧ Def ip2 (t+1) ⊃ (Val_Abs op (t+1) = ¬(Val_Abs ip1 (t+1) ∧ Val_Abs ip2 (t+1)))   (26)

Clock(φ1, φ1', φ2, φ2') ∧ N_Nand_Imp(φ1', ip1, ip2, op) ∧ WB ip1 φ1' ∧ WB ip2 φ1' ∧ φ1(t) = Hi ∧ Def ip1 (t+1) ∧ Def ip2 (t+1) ∧ Def ip1 (t+2) ∧ Def ip2 (t+2) ⊃ (Val_Abs op (t+2) = ¬(Val_Abs ip1 (t+2) ∧ Val_Abs ip2 (t+2)))   (27)

The treatment for p-type logic gates follows exactly the same line of argument, even to the point where a considerable number of the intermediate results are common to both.

The Latch

This is also known as the C²MOS latch, and its structure is shown in figure 11. It is formally captured as follows:


Figure 11: The Latch as used in CLIC

Latch_Imp(φ, φ', ip, op) =def ∃p0 p1 p2 p3 p4 p5 p6. Gnd(p0) ∧ Vdd(p1) ∧ N_Tran(ip, p0, p2) ∧ N_Tran(φ, p2, p4) ∧ P_Tran(φ', p3, p5) ∧ P_Tran(ip, p1, p3) ∧ Join(p4, p5, p6) ∧ Cap3(p6, op)   (28)

Since this is simply a one-off result the derivation is not important; what is important is the result, so only that is presented. The full behaviour of the Latch device is summarised in the following three theorems.

Clock(φ1, φ1', φ2, φ2') ∧ Latch_Imp(φ1, φ1', ip, op) ∧ φ1(t) = Hi ∧ Def ip t ⊃ (WB op φ1' ∧ WB op φ1)   (29)

Clock(φ1, φ1', φ2, φ2') ∧ Latch_Imp(φ1, φ1', ip, op) ∧ φ1(t) = Hi ∧ Def ip t ⊃ (Def op t ∧ Def op (t+1) ∧ Def op (t+2) ∧ Def op (t+3))   (30)

Clock(φ1, φ1', φ2, φ2') ∧ Latch_Imp(φ1, φ1', ip, op) ∧ φ1(t) = Hi ∧ Def ip t ⊃ (Val_Abs op t = ¬Val_Abs ip t ∧ Val_Abs op (t+1) = ¬Val_Abs ip t ∧ Val_Abs op (t+2) = ¬Val_Abs ip t ∧ Val_Abs op (t+3) = ¬Val_Abs ip t)   (31)

The first of these captures the fact that the output may drive either p-type or n-type gates, even both at the same time. The second theorem states that the output is "well defined," so the results can be abstracted into the boolean domain by use of the Val_Abs abstraction function. Finally the third gives the logical behaviour between the input and the output at the abstract level, i.e. on the clock tick the input is inverted and passed to the output, where it is held static until the next clock tick.

3.6.3 The Static Inverter

This is the only device in the entire CLIC design style which does not need one of the clock lines in order to function correctly. It has only two external ports, namely the input (ip) and output (op) ports, and its behaviour could perfectly well be defined without the use of the Clock predicate. However, to enable it to be used in conjunction with other dynamic CLIC devices, its correctness statement has to be given in the same form. So we begin by giving the formal definition for the structure of the gate as follows:

Stat_Inv_Imp(ip, op) =def ∃p0 p1 p2 p3. Gnd(p0) ∧ Vdd(p1) ∧ N_Tran(ip, p0, p2) ∧ P_Tran(ip, p1, p3) ∧ Join(p2, p3, op)   (32)

Now the usual three properties can be derived for this gate, the first being that its output is "well behaved."

Clock(φ1, φ1', φ2, φ2') ∧ Stat_Inv_Imp(ip, op) ⊃ ((WB ip φ1 ⊃ WB op φ1') ∧ (WB ip φ1' ⊃ WB op φ1) ∧ (WB ip φ2 ⊃ WB op φ2') ∧ (WB ip φ2' ⊃ WB op φ2))   (33)


Inspecting this theorem reveals that it is in a different form from the others so far. In fact it is not so different as to prevent logical inferences being made using the same techniques. However, if it were to be put in the same form as the others, we would have four different theorems giving rise to the four different clauses. Remember that the inverter is used to invert the polarity of a gate so that a gate may drive its own sort, e.g. a p-type gate may drive another p-type gate only if it is buffered by an inverter. Since there are two different sorts of gates, n-type and p-type, and two clock phases, the need arises for four very similar theorems, or for one containing all four clauses.

The remaining two theorems for this device are fairly standard. In fact they are even simplified a little to take advantage of the fact that this device is not clocked. The first simply states that "if the input is defined then so is the output." The second gives the logical behaviour of the gate, appropriately abstracted to the boolean level using the Val_Abs predicate.

Stat_Inv_Imp(ip, op) ∧ Def ip t ⊃ Def op t   (34)

Stat_Inv_Imp(ip, op) ∧ Def ip t ⊃ (Val_Abs op t = ¬Val_Abs ip t)   (35)

3.7 Formalising the CLIC Circuit Design Methodology

So far we have outlined a method for deriving the correctness statement of any logic gate designed in the CLIC design style. If we are to design real circuits with these correctness statements, rather than just admire their elegance and still use the old rules of thumb, then we must provide formal means of doing so, i.e. a formal method of combining the correctness statements of an arbitrary number of gates to give a new correctness statement for the resulting circuit. What the designer is interested in is simply obtaining the logical behaviour of the system so that he may satisfy himself that the system does what he intended it to do. So a technique is needed which allows the designer to compose the logical behaviour components of the specifications and leave the rest of the "checking" to the system. In our system simple logical inferences are used to check the validity of connecting the output of one gate to the input of another. In fact the rule involved is the resolution of the predicates and their arguments governing the constraints on the inputs of devices against the predicates and their arguments governing the properties of the outputs. Work is in hand at present to automate this process. To illustrate the use of this, consider the implementation of a simple logical AND function with delay. Firstly, in order to get delay we will have to use a latch

318

Formal Verification of a Design Style

since this is the only device in the CLIC design style which allows behaviours between input and output to be mapped across time giving a controlled unit delay with respect to the clock. So the solution we shall choose to implement is to drive the output of an n-type nand gate, such as the one already used in an earlier example, by a latch. This is illustrated in figure 12.

Figure 12: A simple CLIC Circuit (an n-type nand gate driving a latch; input ip, output op)

The theorems needed for these two devices have already been derived earlier in this paper, namely theorems 27 and 31 for the nand gate and the latch respectively. The correctness statement for the nand gate is exactly as needed, but that of the latch needs to be massaged into a form which allows the two to be combined. Given below are the theorems for the two devices in the form they take just before they are combined. There are a number of important steps involved in reaching this state, involving a lemma about the clock, but these are not covered here. Further details regarding these intermediate steps can be found in [3].

    Clock(φ1, φ̄1, φ2, φ̄2) ∧
    N_Nand_Imp(φ̄1, ip1, ip2, op) ∧
    WB ip1 φ1 ∧ WB ip2 φ1 ∧
    φ1(t) = Hi ∧
    Def ip1 (t+1) ∧ Def ip2 (t+1) ∧ Def ip1 (t+2) ∧ Def ip2 (t+2)
    ⊃ (Val_Abs op (t+2) = ¬(Val_Abs ip1 (t+2) ∧ Val_Abs ip2 (t+2)))       (36)


Formal Verification of a Design Style

    Clock(φ1, φ̄1, φ2, φ̄2) ∧
    Latch_Imp(φ2, φ̄2, ip, op) ∧
    φ1(t) = Hi ∧
    Def ip (t+2)
    ⊃ ( Val_Abs op (t+2) = ¬Val_Abs ip (t+2) ∧
        Val_Abs op (t+3) = ¬Val_Abs ip (t+2) ∧
        Val_Abs op (t+4) = ¬Val_Abs ip (t+2) ∧
        Val_Abs op (t+5) = ¬Val_Abs ip (t+2) )                            (37)

Now we can combine these two by using the Modus Ponens and Conjunction rules together with theorem 25, which satisfies the input constraint for the latch. The final result, with the internal line hidden using the existential quantifier, looks like the following:

    Clock(φ1, φ̄1, φ2, φ̄2) ∧
    (∃z. N_Nand_Imp(φ̄1, ip1, ip2, z) ∧ Latch_Imp(φ2, φ̄2, z, op)) ∧
    WB ip1 φ1 ∧ WB ip2 φ1 ∧
    φ1(t) = Hi ∧
    Def ip1 (t+1) ∧ Def ip2 (t+1) ∧ Def ip1 (t+2) ∧ Def ip2 (t+2)
    ⊃ ( Val_Abs op (t+2) = (Val_Abs ip1 (t+2) ∧ Val_Abs ip2 (t+2)) ∧
        Val_Abs op (t+3) = (Val_Abs ip1 (t+2) ∧ Val_Abs ip2 (t+2)) ∧
        Val_Abs op (t+4) = (Val_Abs ip1 (t+2) ∧ Val_Abs ip2 (t+2)) ∧
        Val_Abs op (t+5) = (Val_Abs ip1 (t+2) ∧ Val_Abs ip2 (t+2)) )      (38)

Note that this theorem, which gives the combined behaviour of the n-type nand gate and the latch, is in the same form as the correctness statements for the two devices from which it is built. This feature allows our formal approach of combining CLIC gates to be extended to cover circuits of arbitrary complexity. The correctness statements will grow in size as large and complex circuits are combined, but will not change in their inherent structure.
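The conclusion of theorem (38) describes an AND gate with delay: the output over cycles t+2 to t+5 equals the conjunction of the inputs sampled at t+2. A small operational sketch of this behaviour (the signal encoding and names are ours, not the HOL formalisation):

```python
# Sketch of the behaviour asserted by (38), viewed operationally.
# Signals are modelled as functions from cycle number to booleans.

def and_with_delay(ip1, ip2, t):
    """Value the nand-plus-latch circuit holds on its output over
    t+2 .. t+5, given inputs defined (stable) at t+2."""
    return ip1(t + 2) and ip2(t + 2)

ip1 = lambda t: t >= 3   # an arbitrary input waveform
ip2 = lambda t: True

t = 1   # a cycle at which phi1 is high
out = and_with_delay(ip1, ip2, t)
print([(t + k, out) for k in range(2, 6)])   # the same value on all four cycles
```

The point of the uniform structure noted above is that this combined behaviour can itself be fed into further compositions without change of form.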

4 Discussion and Future Work

A full formal presentation has been given of the CLIC design style, which we believe to be suitable for VLSI. In particular, a form for the correctness statement of CLIC gates has been developed which maintains uniformity of specifications across the many levels of hierarchy of circuit design, from the very simple logic gates to fairly complex structures using many macro blocks. The use of these formal techniques has been demonstrated on a simple example which uses many CLIC gates, both simple and complex. A major case study based on the design of a digital phase-locked loop is in progress which demonstrates the use of these techniques on large systems.

Naturally the work presented is only as good as the axioms on which it is based. Current models used for the primitive devices of integrated circuits are simplistic, with the view to making the proofs of correctness easier. These models are not inaccurate, but merely incomplete: they have only the features which are relevant to the design style; other properties are not modelled. Too simplistic a model of these devices, however, may allow a failure mode to pass unobserved, and proofs based on such models then become void. More realistic models are needed for these primitives, together with means of showing that the simple models suffice in controlled environments. It is hoped that research in this area will support most of the work done to date using simpler models, by formally showing that the simpler models are adequate in the environment in which they are used. Some results in this area are already available [9], where the simulation model used in [1] is embedded in logic. However this is not yet developed to the point where it is usable for the dynamic behaviour of circuits.

Acknowledgements I would like to thank all members of the Cambridge Hardware Verification Group. In particular Tom Melham and Jeff Joyce who made valuable comments on earlier drafts of this paper. Special thanks are due to Mike Gordon who provided the necessary stimulus for this work and who was a continuous source of encouragement. I would also like to thank Dave Orton of Racal Research and Glynn Winskel of Cambridge University for providing many stimulating discussions in hardware and formal methods respectively.


References

[1] Randal Everitt Bryant, "A Switch-Level Simulation Model for Integrated Logic Circuits," Ph.D. thesis, also available as Technical Report MIT/LCS/TR-259, Laboratory for Computer Science, MIT, Massachusetts, March 1981.

[2] A. Camilleri, M. Gordon and T. Melham, "Hardware Verification using Higher-Order Logic," in Proceedings of the IFIP International Conference: From H.D.L. Descriptions to Guaranteed Correct Circuit Designs, Grenoble, September 9-11, 1986.

[3] I. S. Dhingra, Ph.D. thesis, Computer Laboratory, University of Cambridge, 1987.

[4] Nelson F. Goncalves and Hugo J. De Man, "NORA: A Racefree Dynamic CMOS Technique for Pipelined Logic Structures," IEEE Journal of Solid-State Circuits, SC-18 (3), June 1983, pp. 261-266.

[5] M. J. C. Gordon, "HOL: A Machine Oriented Formulation of Higher-Order Logic," Technical Report 68, Computer Laboratory, University of Cambridge, 1985.

[6] M. J. C. Gordon, "Why Higher-Order Logic is a Good Formalism for Specifying and Verifying Hardware," in Formal Aspects of VLSI Design, edited by G. Milne and P. A. Subrahmanyam, North-Holland, 1986.

[7] R. H. Krambeck, Charles M. Lee and Hung-Fai Stephen Law, "High-Speed Compact Circuits with CMOS," IEEE Journal of Solid-State Circuits, SC-17 (3), June 1982, pp. 614-619.

[8] D. W. R. Orton, "Clocked Dynamic Logic for CMOS," Racal Research Internal Memo of 10th January 1984, Worton Drive, Worton Grange Industrial Est., Reading RG2 0SB, England.

[9] Glynn Winskel, "Models and Logic of MOS Circuits," in VLSI Specification, Verification and Synthesis, edited by G. Birtwistle and P. A. Subrahmanyam, Kluwer, 1987. (These proceedings.)

11 A Compositional Model of MOS Circuits by Glynn Winskel, University of Cambridge, Computer Laboratory, Corn Exchange Street, Cambridge CB2 3QG, England.

1. Introduction.

This paper describes a compositional model for MOS circuits. Following Bryant in [1], it covers some effects of capacitance and resistance used frequently in designs. Bryant's model has formed the basis of several hardware simulators. From the point of view of verification, however, it suffers the inadequacy that it is not compositional, a term which will be explained further. This makes it hard to use the model to reason in a structured way. In this paper we shall restrict our attention to the static behaviour of circuits. Roughly, a circuit's behaviour will be represented as the set of possible steady (or stable) states the circuit can settle into, ignoring oscillatory behaviour. Certainly a good understanding of such static behaviour is necessary in order to treat sequential circuits where the clock is taken to be slow enough that the circuit settles into a steady state between changes. Because sequential circuits often use capacitance to precharge isolated parts of a circuit, we shall include a treatment of charge sources due to capacitance in our analysis of static behaviour. Unfortunately, while the model appears to generalise well to sequential circuits when all the memory is dynamic, there are difficulties in describing static memory correctly. This problem is discussed in the concluding section. Note that Bryant can avoid this problem because a simulator need only produce one possible sequence of steady states of a circuit; the difficulty comes in accounting for all possible behaviours. While the model introduced here owes a great deal to Bryant's work, the reader is warned that we have not stuck closely to Bryant's development or terminology. We take the view that it is useful to have a language to describe the construction of circuits. A term in the language describes a circuit, and the term's structure can be used to direct reasoning about the circuit's behaviour.
Naturally there should be atomic terms for basic components like transistors, and then these are composed together using operations. As a guide in choosing the operations, we borrow ideas from CCS and CSP [7, 5]. A circuit can be viewed as a process which communicates with its environment at its connection points. Assuming the connection points are named, we can compose two circuits by connecting them at those points with names in common. In this way we can construct large circuits from smaller ones. Often, once a connection point has been used to connect circuits, we no longer want to connect to it, so we remove its name, and in effect hide, or insulate, the point.

To summarise and make this more precise, a circuit is to be associated with a set of named connection points; the set of such points we shall call the sort of the circuit. There is an operation of composition on circuits. Two circuits are composed by connecting those points which share the same name. The sort of the composition is thus the union of the sorts of its two constituent circuits. There is an operation of hiding; we can cease regarding a point of a circuit as a connecting point and remove its name. The sort of the resulting circuit is thus the sort of the original circuit with the name of the hidden point removed. These two operations, with atomic terms for basic components, generate a language whose terms describe circuits.

To define the language, we assume a countably infinite set of point names Π, ranged over by α, β, γ, ..., a set of capacitance strengths K, with typical element k, and a set of conductance strengths G, with typical element g. Strengths will give a qualitative measure of the relative sizes of basic devices, in a way to be explained. The syntax of the language is given by:

    c ::= Pow(α) | Gnd(α) | cap_kH(α) | cap_kL(α) | pt(α) | wre(α, β) |
          res_g(α, β) | ntran(α, β, γ) | ptran(α, β, γ) | c • c | c \ α.
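The grammar above, with sorts computed structurally (the sort of a composition is the union of the sorts; hiding removes a name), can be sketched as a small datatype. The class and field names here are ours, not the paper's:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pow:
    a: str
    def sort(self): return {self.a}

@dataclass(frozen=True)
class Gnd:
    a: str
    def sort(self): return {self.a}

@dataclass(frozen=True)
class NTran:
    a: str
    b: str
    g: str
    def sort(self): return {self.a, self.b, self.g}

@dataclass(frozen=True)
class PTran:
    a: str
    b: str
    g: str
    def sort(self): return {self.a, self.b, self.g}

@dataclass(frozen=True)
class Compose:
    c0: object
    c1: object
    # points with names in common become connected
    def sort(self): return self.c0.sort() | self.c1.sort()

@dataclass(frozen=True)
class Hide:
    c: object
    a: str
    # the hidden point is no longer a connection point
    def sort(self): return self.c.sort() - {self.a}

# A CMOS inverter skeleton: pull-up and pull-down sharing gate i and
# output o, with the supply points p and g hidden.
inv = Hide(Hide(Compose(Compose(Pow("p"), PTran("p", "o", "i")),
                        Compose(Gnd("g"), NTran("g", "o", "i"))), "p"), "g")
print(sorted(inv.sort()))   # ['i', 'o']
```

Only the sort calculation is modelled here; the behavioural semantics is developed later in the paper.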

In the terms wre(α, β), res_g(α, β), ntran(α, β, γ) and ptran(α, β, γ) we insist that the points α, β, γ are distinct; so in these basic devices names are associated with unique connection points, a property that is inherited by any circuit described by a compound term. Circuits are built up by composition; the term c0 • c1 represents the composition of the circuits represented by c0 and c1. Hiding a point α in a circuit described by c is represented by the term c \ α.

The atomic term Pow(α) stands for a power source connected to the point α, while Gnd(α) stands for a connection to ground at α. We draw these as sources labelled H and L respectively, where H signifies that power provides a high value of voltage and L that ground provides a low value. Another kind of source arises through charge storage. The term cap_kH(α) represents a capacitance of strength k, charged high, connected to point α at one end and to ground at the other. The term cap_kL(α) is a similar capacitance but charged low.

Note that all the capacitance terms we consider are associated with only one connection point, because they are to be used simply as sources of charge.


The terms pt(α), wre(α, β) and res_g(α, β) are various forms of connectors. The simplest is pt(α), a point named α. The term wre(α, β) represents a wire connecting points α and β, while the term res_g(α, β) stands for a resistance with conductance strength g ∈ G connecting points α and β; it is intended to model components like d-type transistors, or to be included in situations where resistance effects may count. We draw a point pt(α) as •α and a wire wre(α, β) as α───β.

The term ntran(α, β, γ) stands for an n-type transistor with gate γ which, when the gate is high, connects points α and β with perfect conductance. The term ptran(α, β, γ) stands for a p-type transistor with gate γ which connects points α and β, again with perfect conductance, when the gate γ is low.

Should the effect of resistance be significant we can insert a suitable resistance between α and β, and, should the capacitance of the gate be significant, capacitances can be connected to model this. This is an idealised model of a transistor: it treats transistors as switches and ignores the fact that in reality transistors have thresholds (see section 4).

We now explain the idea of qualitative strength on which Bryant's model, and ours, which follows his, are based. The capacitance strengths K are assumed to form a finite totally ordered set, to which we adjoin a least element 0:

    0 < k1 < k2 < ⋯ < km.

(It seems most MOS designs can be described with m = 3.) The capacitances in a design are assigned a strength which says how they combine with other capacitances. The value 0 corresponds to negligible capacitance. The meaning of k < k′ in the strength order is that a capacitance assigned strength k is negligible with respect to one assigned strength k′. More exactly, the assumption made of the strength order can be stated: if a capacitance of strength k is combined with one of strength k′, the result is equivalent to a capacitance of strength the maximum of k and k′. Accordingly, if a capacitance of strength k is charged low and connected to one of strength k′ charged high, the result is, in effect, a capacitance of strength k′ charged high. Correspondingly, if neither of two capacitances can be guaranteed to override the other, they are assigned the same strength.

Of course, a problem arises with this somewhat naive explanation. Two capacitances of the same strength k should combine to give a capacitance of strength k; the capacitance of one alone is not negligible relative to that of the combination. But if we combine many capacitances of the same strength k, the result may not be a capacitance of strength k; it may not be negligible relative to one of strength k′ even though k < k′. Still, for any particular circuit term, the number of such combinations is bounded. It follows that, with respect to a particular circuit term, we can interpret capacitance strengths so that when a capacitance of strength k combines with one of strength k′ the result is a capacitance of strength the maximum of k and k′. In more detail, there is a small number ε < 1, dependent on the circuit term, so that we can interpret the capacitance strengths as intervals of positive real numbers (capacitance values), where each strength k is associated with an interval (c_k, c̄_k) with end points c_k < c̄_k, so that

    c̄_{k_i} < ε · c_{k_{i+1}}   for 0 ≤ i < m.

With respect to this interpretation a capacitance value C has strength k if C ∈ (c_k, c̄_k). By taking the intervals wide enough and the number ε small enough, there are choices of capacitances for the basic components cap_k(α) which ensure that the assumptions of the strength order are never invalidated by compositions in the term.

To see how capacitances compose, assume a capacitance of size C, charged low to a voltage v with charge Q, is connected to a capacitance C′, charged high to voltage v′ with charge Q′. From the equation Q = vC, relating charge to voltage and capacitance, we see

    v_res = (Q + Q′) / (C + C′).

If now we assume C has strength k, and C′ strength k′ with k < k′, we know C/C′ < ε. By Taylor's theorem we derive

    v_res = v′ − (v′ − v)·(C/C′ − (C/C′)² + higher order terms).

Hence v_res is to within ε(v′ − v) of v′. We cannot conclude, in general, that if v′ is high then v_res is high also. However, because the number of compositions of capacitances in a term is bounded, we can choose ε small enough so that whenever such a composition occurs in the circuit described by the term the resultant voltage is high.

There is a similar strength order on the conductances G of resistances, again assumed finite, to which is adjoined a maximum element ∞:

    g1 < g2 < ⋯ < gn < ∞.
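Returning to the capacitance calculation, the bound above can be checked numerically. The capacitance and voltage values below are arbitrary illustrations, not taken from the paper:

```python
# Numeric check of the charge-sharing bound: when C/C' < eps,
# the resultant voltage lands within eps*(v'-v) of v'.

def share(v, C, vp, Cp):
    """Voltage after connecting capacitance C charged to v with
    capacitance C' charged to v' (conservation of charge, Q = v*C)."""
    return (v * C + vp * Cp) / (C + Cp)

eps = 0.01
C, Cp = 1e-15, 2e-13      # C/C' = 0.005 < eps, so C is negligible w.r.t. C'
v, vp = 0.0, 5.0          # a low-charged capacitance meets a high one
v_res = share(v, C, vp, Cp)

assert C / Cp < eps
assert abs(vp - v_res) <= eps * (vp - v)   # v_res is within eps*(v'-v) of v'
print(v_res)
```

This is exactly the situation where a small capacitance charged low fails to disturb a much larger one charged high.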


In this configuration γ ↠ α and γ ↠ β, but not α ↠ γ or β ↠ γ:

    α ──g── γ ──g── β     (γ at strength ∞)

This shows that conducting paths, in the sense of paths conducting values, are not just chains of resistances according to this model; here there is a chain of conductances between α and β and yet no flow relation between them. In Bryant's words the path is "blocked" through the point γ having a higher strength than the endpoints α and β. Despite the chain of resistances linking α to β they are effectively disconnected, and the value at one point does not affect the value at the other. Connecting two resistances in series between two points, without hiding the connection point, is not equivalent to, nor does it imply, the effect of putting a single resistance between them (a phenomenon analogous to that due to interruption points in parallel programs). It is partly for this reason that the model is based on the relation ↠ rather than on conductance matrices.

All these arguments suggest the form of a static configuration of a circuit of sort A: it should consist of a strength function, a value function, an internal-value function and a flow relation. Of course it remains to detail the axioms which relate the different attributes, and later to demonstrate that the attributes are sufficient to give a compositional model for the static behaviour of circuits.

Definition. Let A be a set of points in Π. A static configuration of sort A is a structure

    (S, V, I, ↠)

where S: A → S (the strength function), V: A → V (the value function), I: A → V (the internal value function) and ↠ is a reflexive, transitive relation on A (the flow relation), which satisfy

    (i)   S(α) = 0 ↔ V(α) = Z,
    (ii)  I(α) ≤ V(α),
    (iii) α ↠ β → S(α) ≥ S(β),
    (iv)  α ↠ β → I(α) ≥ I(β) ∧ V(α) ≥ V(β),
    (v)   α ↠ β ∧ S(β) ∈ K ∪ {0} → S(α) = S(β),
    (vi)  α ↠ β ∧ S(α) = S(β) → β ↠ α,

for any points α, β in A. Write sort(σ) for the sort of a configuration σ. Write Sta[A] for the set of static configurations of sort A, and Sta for the set of all static configurations. Most of the properties have appeared in the earlier discussion, but it is worth commenting again on why they hold. Property (i) says a signal of no


strength has value Z and vice versa; this fits the interpretations of 0 as minimum strength and Z as negligible value. Naturally the contribution to the value from internal sources cannot exceed the resultant value, which is property (ii). Property (iii) says flow cannot go from a point at weaker strength to a point at greater strength. This is clear by considering how a relation α ↠ β is built out of a chain of relations α ↠₁ ⋯ ↠₁ β and noting that (iii) holds across each single step ↠₁. Property (iv) formalises the idea that ↠ represents the direction in which values at points contribute to values at other points. Property (v) states that if α ↠ β and S(β) is a capacitance strength then S(α) is the same strength. This follows as capacitance strengths are always ranked below those of conductances in the strength order. Assume α ↠₁ β and S(β) ∈ K. Then S(α)·g = S(β) for some conductance strength g for which S(β) ≤ g. This can only occur with S(α) = S(β). The property follows when α ↠ β because this relation is built out of steps ↠₁. Property (vi) follows in a similar way. It expresses the fact that if α ↠ β and α and β are at the same strength then β ↠ α, so α and β are connected. To justify (vi) assume that α ↠₁ β and S(α) = S(β). Then α ↠₁ β arises iff there is a conductance, of strength g say, between α and β so that S(α)·g = S(β). Hence S(β)·g = S(α) too, making β ↠₁ α.

There are sufficiently many axioms, and the idea of static configuration is sufficiently complicated, to raise the question of their completeness: do static configurations all share some property that does not follow from the axioms (i)-(vi)? We have argued for their soundness. To show completeness we need an argument showing that there are no properties shared by all static configurations of circuit terms which do not follow from those written down in the definition. Later, we shall show that every structure (S, V, I, ↠) on a finite set of points which satisfies all the axioms (i)-(vi) can be realised as the static configuration of a circuit built up from resistances and sources connected to those points. Then any property of structures (S, V, I, ↠) which holds of all static configurations of circuits must also hold of all finite structures satisfying the definition above.
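As a working aid, a finite structure can be checked against the definition mechanically. The sketch below follows our reconstruction of axioms (i)-(vi) from the garbled source, so the exact inequality directions are an assumption; strengths are encoded as integers with 0 minimal, and values use an order with Z below H and L and X on top (H and L incomparable):

```python
from itertools import product

# Value order: Z below H and L, X on top; H and L incomparable.
VAL_LE = {("Z", "Z"), ("Z", "H"), ("Z", "L"), ("Z", "X"),
          ("H", "H"), ("H", "X"), ("L", "L"), ("L", "X"), ("X", "X")}

def val_le(u, v):
    return (u, v) in VAL_LE

def check_static(A, S, V, I, flow, cap_strengths):
    """Check axioms (i)-(vi); flow is a set of pairs,
    assumed reflexive and transitive."""
    for a in A:
        if (S[a] == 0) != (V[a] == "Z"):
            return False                                        # (i)
        if not val_le(I[a], V[a]):
            return False                                        # (ii)
    for a, b in product(A, A):
        if (a, b) not in flow:
            continue
        if S[a] < S[b]:
            return False                                        # (iii)
        if not (val_le(I[b], I[a]) and val_le(V[b], V[a])):
            return False                                        # (iv)
        if (S[b] == 0 or S[b] in cap_strengths) and S[a] != S[b]:
            return False                                        # (v)
        if S[a] == S[b] and (b, a) not in flow:
            return False                                        # (vi)
    return True

# A strong source point p driving a weaker point o through a resistance:
A = {"p", "o"}
S = {"p": 9, "o": 3}
V = {"p": "H", "o": "H"}
I = {"p": "H", "o": "Z"}
flow = {("p", "p"), ("o", "o"), ("p", "o")}
print(check_static(A, S, V, I, flow, cap_strengths={1, 2}))   # True
```

Such a checker only validates candidate configurations; it does not compute them from a circuit term.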

In a static configuration, two points α and β may be both in the relation α ↠ β and β ↠ α. In this case the points α and β must have the same S, V and I values. Sometimes neither α ↠ β nor β ↠ α, i.e. the two points are incomparable with respect to the flow relation ↠. This means that the points are effectively disconnected. We introduce notation to describe these circumstances.

Definition. Let σ = (S, V, I, ↠) be a static configuration of sort A. For α, β ∈ A define

    α ∼ β ≡ α ↠ β ∧ β ↠ α,
    α ∥ β ≡ ¬(α ↠ β) ∧ ¬(β ↠ α).

In the special case where the effects of resistance are ignored there is only one perfect conductance strength ∞, and the flow relation ↠ coincides with the equivalence relation ∼. In this situation any two points are either connected or disconnected.

Proposition. Assume a static configuration σ has the property that for all points x in the sort of σ, either S(x) = 0 or S(x) = ∞. Then in σ the relation ↠ is symmetric, and so an equivalence relation on points, with x ↠ y iff x ∼ y for all points x, y of σ.

Proof. The proof is a little exercise in using the axioms which define a static configuration.

Now we present the obvious first-order syntax for describing static configurations. Static configurations are structures σ = (S, V, I, ↠). As such they can be described by predicates based on the functions S, V, I and the relation ↠, together with relations of equality. The axioms for static configurations are examples of the kind of predicate we have in mind. So too, for example, is the predicate

    (V(γ) = H) → (α ↠ β ∧ β ↠ α)

which is true of all those static configurations where, if the point γ is high, then the points α and β are connected. As this predicate suggests, such assertions will be useful for giving the meaning of circuit terms as sets of static configurations and for specifying their behaviour. The treatment is a little different from that usual for predicate calculus because we shall allow point names in assertions which apply to static configurations where the names do not appear in the sort. However, variables bound by quantifiers only range over points in the sort. Actually we are presenting a simple example of a "free logic", where terms need not necessarily denote; it is of the kind discussed in [8] for instance.

Notation. Because we shall work with partial functions, mathematical expressions may sometimes be undefined. We indicate that a mathematical expression e is defined by writing e!.

We first present the formal syntax of the language.

The syntax of assertions:

Variables: We assume a set of variables Var which range over points Π, with typical members x, y, z, ...

Value terms: Terms denoting values in {H, L, Z, X} have the form

    t ::= H | L | Z | X | V(π) | I(π),

where π is a point term, i.e. an element of Π or a variable in Var.

Strength terms: Terms denoting values in S = K ∪ G ∪ {0, ∞} have the form

    e ::= s | S(π) | e·e | e + e,

where s ∈ S and π is a point term.

Assertions for static configurations: The set of assertions is generated by

    φ ::= π0 = π1 | π0 ↠ π1 | t0 = t1 | t0 ≤ t1 | e0 = e1 | e0 ≤ e1 |
          tt | ff | φ ∧ φ | φ ∨ φ | ¬φ | ∃x.φ | ∀x.φ,

where π0, π1 are point terms, t0, t1 are value terms and e0, e1 are strength terms. We shall regard φ → ψ as abbreviating ¬φ ∨ ψ, and φ ↔ ψ as abbreviating (φ → ψ) ∧ (ψ → φ), and use previously introduced abbreviations like x ∼ y for x ↠ y ∧ y ↠ x.

The semantics of assertions: We give the semantics of closed assertions. For brevity, assume a static configuration σ has the form (Sσ, Vσ, Iσ, ↠σ). First we treat value terms, which are denoted by partial functions Sta ⇀ V, as follows:

    ⟦H⟧σ = H, ⟦L⟧σ = L, ⟦Z⟧σ = Z, ⟦X⟧σ = X, for all σ ∈ Sta,
    ⟦V(α)⟧σ = Vσ(α) if σ ∈ Sta and α ∈ sort(σ), and is undefined otherwise,
    ⟦I(α)⟧σ = Iσ(α) if σ ∈ Sta and α ∈ sort(σ), and is undefined otherwise.

Strength terms are denoted by partial functions Sta ⇀ S, as follows:

    ⟦s⟧σ = s for all σ ∈ Sta,
    ⟦S(α)⟧σ = Sσ(α) if σ ∈ Sta and α ∈ sort(σ), and is undefined otherwise,
    ⟦e0·e1⟧σ = s0·s1 if ⟦e0⟧σ = s0 and ⟦e1⟧σ = s1, and is undefined otherwise,
    ⟦e0 + e1⟧σ = s0 + s1 if ⟦e0⟧σ = s0 and ⟦e1⟧σ = s1, and is undefined otherwise.

Each assertion is denoted by the subset of static configurations at which it is true, defined by the following induction on the length of assertions:

    ⟦α0 = α1⟧ = {σ ∈ Sta | α0 ∈ sort(σ) and α1 ∈ sort(σ) and α0 = α1}
    ⟦α0 ↠ α1⟧ = {σ ∈ Sta | α0 ∈ sort(σ) and α1 ∈ sort(σ) and α0 ↠σ α1}
    ⟦t0 = t1⟧ = {σ ∈ Sta | ⟦t0⟧σ! and ⟦t1⟧σ! and ⟦t0⟧σ = ⟦t1⟧σ}
    ⟦t0 ≤ t1⟧ = {σ ∈ Sta | ⟦t0⟧σ! and ⟦t1⟧σ! and ⟦t0⟧σ ≤ ⟦t1⟧σ}
    ⟦e0 = e1⟧ = {σ ∈ Sta | ⟦e0⟧σ! and ⟦e1⟧σ! and ⟦e0⟧σ = ⟦e1⟧σ}
    ⟦e0 ≤ e1⟧ = {σ ∈ Sta | ⟦e0⟧σ! and ⟦e1⟧σ! and ⟦e0⟧σ ≤ ⟦e1⟧σ}
    ⟦tt⟧ = Sta and ⟦ff⟧ = ∅
    ⟦φ0 ∧ φ1⟧ = ⟦φ0⟧ ∩ ⟦φ1⟧ and ⟦φ0 ∨ φ1⟧ = ⟦φ0⟧ ∪ ⟦φ1⟧
    ⟦¬φ⟧ = Sta \ ⟦φ⟧
    ⟦∃x.φ⟧ = {σ | ∃α ∈ sort(σ). σ ∈ ⟦φ[α/x]⟧}
    ⟦∀x.φ⟧ = {σ | ∀α ∈ sort(σ). σ ∈ ⟦φ[α/x]⟧}


This completes our formal language for describing static configurations. We shall write σ ⊨ φ, where φ is an assertion on static configurations, to mean σ ∈ ⟦φ⟧; thus ⟦φ⟧ = {σ | σ ⊨ φ}. Define the semantic function ⟦ ⟧ from circuit terms to P(Sta) to be the map defined by the following structural induction.


    ⟦Pow(α)⟧ = {σ ∈ Sta[α] | S(α) = ∞ ∧ I(α) = H}

    ⟦Gnd(α)⟧ = {σ ∈ Sta[α] | S(α) = ∞ ∧ I(α) = L}

    ⟦cap_kU(α)⟧ = {σ ∈ Sta[α] | S(α) ≥ k ∧ (S(α) = k → I(α) = U) ∧
                   (S(α) > k → I(α) = Z)}

    ⟦pt(α)⟧ = {σ ∈ Sta[α] | I(α) = Z}

    ⟦wre(α, β)⟧ = {σ ∈ Sta[α, β] | I(α) = Z ∧ I(β) = Z ∧ α ∼ β}

    ⟦res_g(α, β)⟧ = {σ ∈ Sta[α, β] | I(α) = Z ∧ I(β) = Z ∧
                     S(α)·g ≤ S(β) ∧ S(β)·g ≤ S(α) ∧
                     (S(α)·g = S(β) ↔ α ↠ β) ∧ (S(β)·g = S(α) ↔ β ↠ α)}

    ⟦ntran(α, β, γ)⟧ = {σ ∈ Sta[α, β, γ] | I(α) = Z ∧ I(β) = Z ∧ I(γ) = Z ∧
                        γ ∥ α ∧ γ ∥ β ∧ (α ∥ β ∨ α ∼ β) ∧
                        (V(γ) = H → α ∼ β) ∧ (V(γ) = L → α ∥ β)}

    ⟦ptran(α, β, γ)⟧ = {σ ∈ Sta[α, β, γ] | I(α) = Z ∧ I(β) = Z ∧ I(γ) = Z ∧
                        γ ∥ α ∧ γ ∥ β ∧ (α ∥ β ∨ α ∼ β) ∧
                        (V(γ) = L → α ∼ β) ∧ (V(γ) = H → α ∥ β)}

    ⟦c • d⟧ = {σ • ρ | σ ∈ ⟦c⟧ and ρ ∈ ⟦d⟧ and (σ • ρ)!}

    ⟦c \ α⟧ = {σ \ α | σ ∈ ⟦c⟧ and (σ \ α)!}

As the reader will notice, even capacitance sources have fairly complicated behaviours involving conditional assertions, in order to cope with strengths and the fact that they can be overridden. Sources of power and ground have simpler definitions because they cannot be overridden. Resistances introduce the flow relation in a manner dependent on strengths, as we explained when introducing the flow relation. The behaviour of wires is got as a special case, by assuming perfect conductance. We stipulate that a transistor either connects or disconnects according to the voltage on the gate. Notice for both kinds of transistors a value of X on the gate can be associated with a static configuration where it is connecting or with one where it is disconnecting. The denotations of compositions and hiding of terms are obtained in a pointwise fashion.


Further terms can be defined. For instance, we can define a general source sce_sU(α), at α, of strength s and value U ∈ V, where s = 0 iff U = Z, to have denotation

    ⟦sce_sU(α)⟧ = {σ ∈ Sta[α] | S(α) ≥ s ∧ (S(α) = s → I(α) = U) ∧
                   (S(α) > s → I(α) = Z)}

Such a source only makes a contribution at α if the strength of α is exactly s. If s is a capacitance strength k then such a weakened source with value U = H or L can be realised by the charged capacitance cap_kU(α). Similarly, if s is a conductance strength g it can be realised by passing signals from ground or power through a resistance of strength g. Sources with value X can be got by connecting two sources with values H and L at a common point. Writing c = c′ iff ⟦c⟧ = ⟦c′⟧, for circuit terms c, c′, we have

    sce_kH(α) = cap_kH(α),
    sce_kL(α) = cap_kL(α),
    sce_gH(α) = (Pow(β) • res_g(β, α)) \ β,
    sce_gL(α) = (Gnd(β) • res_g(β, α)) \ β,
    sce_gX(α) = sce_gH(α) • sce_gL(α).

These are examples of some basic equivalences between circuit terms, which can be proved from the denotational semantics. Of course, such ideas were used informally in motivating the definition of static configuration. The one remaining case, sce_0Z(α), can be realised as a single connection point standing alone, i.e.

    sce_0Z(α) = pt(α).

To explain the way sources combine, it is helpful to use an ordering between pairs sU, where s ∈ S and U ∈ V and s = 0 iff U = Z. On such pairs define

    sU ≤ s′U′ iff s < s′ or (s = s′ and U ≤ U′).

This forms a finite distributive lattice (meet · and join +), mentioned in Bryant's thesis and in Hayes' work [4], which may be drawn as:

          ∞X
        ∞H   ∞L
          ⋮
        k1H   k1L
          0Z

The way sources compose can now be summarised by:

    sce_sU(α) • sce_s′U′(α) = sce_(sU + s′U′)(α).
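The join in this lattice of strength-value pairs can be sketched concretely: the stronger source wins outright, and equal-strength sources with conflicting definite values give X. The encoding (integer strengths, string values) is ours:

```python
# Sketch of the join on strength-value pairs in the source lattice.

def val_join(u, v):
    if u == v:
        return u
    if u == "Z":
        return v
    if v == "Z":
        return u
    return "X"   # conflicting definite values, or anything joined with X

def join(su, sv):
    """Join sU + s'U' of two strength-value pairs."""
    (s, u), (sp, v) = su, sv
    if s > sp:
        return (s, u)
    if sp > s:
        return (sp, v)
    return (s, val_join(u, v))

print(join((3, "H"), (1, "L")))   # (3, 'H'): the stronger source overrides
print(join((2, "H"), (2, "L")))   # (2, 'X'): equal strengths conflict
```

This mirrors the informal rule given earlier for capacitances: the maximum strength prevails, and irresolvable conflicts are recorded as X.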


As further examples of equivalences on terms which follow from the semantics we look at composing resistances in series and parallel:

    (res_g(α, β) • res_g′(β, γ)) \ β = res_(g·g′)(α, γ),
    res_g(α, β) • res_g′(α, β) = res_(g+g′)(α, β).

In other words, combined in series two resistances yield a resistance of conductance strength the meet (minimum) of the strengths of the components, while combined in parallel they are equivalent to one of strength the join. Again this agrees with the informal explanation given earlier, but this time it follows from the general definition of the semantics of circuit terms.

Some equivalences highlight problems with the model. For example, according to the model we could produce a wire by sealing in a source to the gate of a transistor:

    wre(α, β) = (Pow(γ) • ntran(α, β, γ)) \ γ = (Gnd(γ) • ptran(α, β, γ)) \ γ.

A first complaint is that this is unrealistic because transistors have significant resistance. But this objection can be catered for simply by building in resistances. More importantly, it indicates how effects caused by transistors having thresholds are ignored by the model, a topic we shall return to in the final section.

Earlier we pointed out the problem of knowing whether or not we had written down sufficient axioms for static configurations. It is sufficient to show that every finite static configuration can be realised as a static configuration of a circuit term. This is established in the following proposition.

Proposition. Any static configuration σ whose sort A is a finite subset of Π is the static configuration of some circuit c, i.e. σ ∈ [[c]].

Proof. Given σ, the idea is to define the required circuit to be the composition of the following finite set of components:

{sce_(S(α)U)(α) | α ∈ A and I(α) = U ≠ Z} ∪ {res_g(α, β) | α ∼ β and g ∈ G is the least such that S(α)·S(β) ≤ g}.

This uses our knowledge of how to build all sources sce_sU(α) as circuits. From the definition of sources, resistances and composition, it follows that σ ∈ [[c]]. ∎
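The proof's recipe can be written out concretely. The following Python sketch is ours, not the chapter's: a configuration is given by maps S (strength) and I (source value) on points, plus a set of connected pairs, and strengths form a chain so the meet S(α)·S(β) is just the minimum.

```python
def realise(points, S, I, connected):
    """Build a component list for a static configuration, following the
    proof's recipe: one source per point carrying a non-Z value, and one
    resistance per connected pair, of the least adequate conductance."""
    components = []
    for a in points:
        if I[a] != 'Z':                     # a source for each non-Z point
            components.append(('sce', S[a], I[a], a))
    for a, b in connected:                  # a resistance for each a ~ b
        g = min(S[a], S[b])                 # least g with S(a).S(b) <= g on a chain
        components.append(('res', g, a, b))
    return components

S = {'a': 2, 'b': 1}
I = {'a': 'H', 'b': 'H'}
assert realise(['a', 'b'], S, I, [('a', 'b')]) == \
    [('sce', 2, 'H', 'a'), ('sce', 1, 'H', 'b'), ('res', 1, 'a', 'b')]
```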



We would like a language of assertions with which to specify properties of circuits. A circuit denotes a subset of the static configurations Sta. A circuit specification Spec should pick out the circuits which satisfy it, and so semantically should denote a subset [[Spec]] ⊆ P(Sta) of the powerset of static configurations. We have already introduced assertions φ describing static configurations and, like circuits, they denote subsets [[φ]] ⊆ Sta, so as they stand they do not have the right type to be taken as circuit specifications. However, there is a standard way to take assertions of type Sta and make them into


assertions of type P(Sta). For φ an assertion about static configurations, define ◇φ and □φ to be circuit specifications with denotations

[[◇φ]] = {C ∈ P(Sta) | C ∩ [[φ]] ≠ ∅},
[[□φ]] = {C ∈ P(Sta) | C ⊆ [[φ]]}.
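Over finite sets, the two denotations are just a nonempty-intersection test and an inclusion test. A small Python sketch of ours (configurations stand in as dictionaries, assertions as predicates):

```python
def sat_can(denotation, phi):
    # [[<>phi]] membership: [[c]] meets [[phi]] (some configuration satisfies phi)
    return any(phi(sigma) for sigma in denotation)

def sat_must(denotation, phi):
    # [[[]phi]] membership: [[c]] included in [[phi]] (every configuration satisfies phi)
    return all(phi(sigma) for sigma in denotation)

# An inverter-like denotation with two static configurations (illustrative).
inverter = [{'gamma': 'H', 'beta': 'L'}, {'gamma': 'L', 'beta': 'H'}]
complement = lambda s: s['beta'] != s['gamma']

assert sat_can(inverter, complement)
assert sat_must(inverter, complement)

# A term denoting the empty set satisfies every []phi vacuously, but no <>phi.
assert sat_must([], lambda s: False)
assert not sat_can([], lambda s: True)
```

The last two assertions anticipate a point made at the end of this section: an empty denotation makes every must-specification hold vacuously.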

A circuit term c satisfies ◇φ iff [[c]] ∈ [[◇φ]], i.e. ∃σ ∈ [[c]]. σ ⊨ φ, while c satisfies □φ iff [[c]] ⊆ [[φ]], i.e. ∀σ ∈ [[c]]. σ ⊨ φ. Thus a circuit satisfies ◇φ means it can satisfy φ, whereas it satisfies □φ means it must satisfy φ, which is consistent with the way these two words were used informally in section 2. Compound specifications may be made by combining these basic specifications using the usual logical operations. The operators ◇ and □ are then each definable from the other, as ◇φ is equivalent to ¬□¬φ (i.e. they are both satisfied by the same things), and □φ is equivalent to ¬◇¬φ. However, both ◇ and □ specifications characterise circuit behaviour in the sense of the following proposition. A circuit specification Spec denotes a subset [[Spec]] ⊆ P(Sta) and, for a circuit term c, we define

c ⊨ Spec ⇔ [[c]] ∈ [[Spec]],

when we say a circuit term c satisfies Spec.

Proposition. Let c and c′ be circuit terms. Define

c ≲can c′ ⇔ ∀ assertions φ. c ⊨ ◇φ ⇒ c′ ⊨ ◇φ, and
c ≲must c′ ⇔ ∀ assertions φ. c ⊨ □φ ⇒ c′ ⊨ □φ.

Then

c ≲can c′ ⇔ [[c]] ⊆ [[c′]], and
c ≲must c′ ⇔ [[c′]] ⊆ [[c]].

Hence if two circuit terms c, c′ satisfy the same specifications ◇φ, or the same specifications □φ, then c = c′ (i.e. they have the same denotation).

Proof. The proof uses the fact that any finite configuration σ is characterised by an assertion A_σ in the sense that

ρ ⊨ A_σ ⇔ ρ = σ.

Suppose c ≲can c′. If σ ∈ [[c]] then c ⊨ ◇A_σ, so c′ ⊨ ◇A_σ, whence σ ∈ [[c′]]. Hence c ≲can c′ ⇒ [[c]] ⊆ [[c′]], and the converse is obvious. Suppose c ≲must c′. Clearly c ⊨ □ ⋁_{σ∈[[c]]} A_σ. It follows that c′ ⊨ □ ⋁_{σ∈[[c]]} A_σ. If ρ ∈ [[c′]] then ρ ⊨ ⋁_{σ∈[[c]]} A_σ, which implies ρ = σ for some σ ∈ [[c]]. Hence c ≲must c′ ⇒ [[c′]] ⊆ [[c]], and the converse is obvious. ∎

Now we see how to give a specification of the NMOS inverter. The specification takes account of the fact that the strengths of its outputs are not the same. Its circuit is constructed by the term

c ≡ (ntran(α, β, γ) • res_g(β, δ) • Pow(δ) • Gnd(α)) \α \δ.


Informally, its output β behaves like a direct connection to ground when its input γ is high, and like a weakened power source when γ is low. Formally, if we define

Sce_sU(x) ≡ s ≤ S(x) ∧ (s = S(x) → I(x) = U) ∧ (s < S(x) → I(x) = Z),

then we can say c satisfies the specification

□ (V(γ) = H → Sce_∞L(β) ∧ V(γ) = L → Sce_gH(β)).

We would like a proof system in which to demonstrate that a circuit meets a specification. In [10] a complete proof system is presented for proving that a circuit term satisfies a specification of the form □φ. A proof system incorporating specifications ◇φ will most likely follow similar lines, though it has not been done yet.

To conclude this section, we note an inadequacy which arises if we restrict to specifications of the form □φ. A related difficulty appears in the model of [2,3], where terms like Pow(α) • Gnd(α) cause problems because they denote the empty relation and satisfy any specification. We have

[[Pow(α) • Gnd(α)]] ≠ ∅,

and so, when we come to the logic, the circuit Pow(α) • Gnd(α) will not satisfy □ff. However, there are other circuit terms which do denote ∅ and so will satisfy □ff, and indeed any specification □φ. These are terms which represent circuits whose only possible behaviour is to oscillate. For example the term

c ≡ (ntran(α, β, γ) • res_g(β, δ) • Pow(δ) • Gnd(α) • wre(γ, β)) \α \β \γ \δ,

with g a conductance strength strictly between 0 and ∞, ties back the output of an NMOS inverter to its input, and then insulates all points from the environment. It can be drawn as an inverter whose output is wired back to its own input.

The circuit c denotes the empty set, [[c]] = ∅. The circuit is a little peculiar in that it has an empty sort. However, composing it with any circuit with nonempty sort yields a circuit with nonempty sort which denotes ∅. For this reason the proof system for circuits in [10], where all specifications have the form □φ, is akin to the logic of partial correctness assertions (Hoare logic); there a purely oscillating circuit satisfies □φ for any assertion φ, just as a diverging program satisfies all partial correctness assertions. However, a purely oscillating circuit like that above will not satisfy ◇true, so a proof system extended to include specifications ◇φ would not have this inadequacy.

4. Problems.

A compositional model for the static behaviour of MOS circuits has been presented. Certainly such a model is necessary for a satisfactory treatment of dynamically changing circuits. Their behaviour can often be viewed as going through a sequence of static configurations as data changes in synchrony with one or several clocks. The data and the clock pulses are held long enough for the circuit to settle into a static configuration. It seems obvious therefore to represent a possible history for such a circuit as a sequence of static configurations. Copying the approach in [2], we could try to extend our work to the dynamic case by simply including a time parameter t, a natural number, and associating devices with assertions expressing time dependencies between static configurations at different times. For example, we could then express the fact that a capacitance will contribute a charge at time t + 1 if it was precharged at time t. While this works reasonably well for circuits with purely dynamic memory, there is a hard problem in modelling static memory accurately. The main difficulty, it seems, is that short-term capacitance effects influence the physical behaviour but are not captured easily in any extension of our model. The strength orders we use assume that the environment is stable long enough for sources of current to override charges stored by capacitance. This assumption breaks down in some stages needed to explain the behaviour of devices like the following register:

When enable is high, a strong signal at in overrides that already present, and its value is established, after two inversions, at out. When enable becomes low the value is preserved at out (the input value is stored). Speaking loosely, the latter stage relies on capacitance to maintain the value at in until the feedback loop takes over. This occurs over a very short time when the assumptions behind the strength order are violated. I cannot, at present, see how to extend the model to account for effects over such a short time. Through failing to cope adequately with such short-term effects, the obvious model I have sketched allows more possibilities than are physically possible. By the way, the work of Bryant does not address this problem; a simulator need only generate one possible course of behaviour and Bryant does this by making a unit-delay assumption. (In his simulators all transistors switch with the same delay after their gates change.) Even for just static behaviour there are inadequacies in the model. Chief among them, it seems, is that it fails to cope with the fact that transistors do not behave purely as switches which connect or disconnect according to the voltage on the gate. An n-type transistor ntran( a, {3, 1) only connects when the difference between the voltage on its gate 1 and the minimum of that on a and


β exceeds a certain positive threshold. This is ignored in switch level models like those of [1] and the model here. Consequently the model we have described allows certain static configurations which are not physically feasible. Consider the term

Pow(α) • ntran(α, β, γ) • wre(γ, β).

It has a static configuration in which

I(α) = I(β) = V(α) = V(β) = H and α ∼ β.

This is not possible for a real n-type transistor. The gate γ and the source β have the same voltage, so the difference in their voltages cannot exceed the positive threshold necessary for α and β to connect. Similarly the term

Pow(α) • ntran(α, β, γ) • wre(γ, β) • Gnd(β)

has a static configuration in which

I(α) = I(β) = V(α) = V(β) = X and α ∼ β.

Again this is not realistic. Such inaccuracies can lead to an incorrect circuit being proved correct and a correct circuit being proved incorrect. For example, according to the model the enable-controlled register circuit sketched above can store high and low, though showing it acts like a register encounters the same problems as for the real register above. In reality it will not, because of thresholds. The following diagram describes a circuit which does compute exclusive-or (it is taken from [9], Figure 8.7).

However, according to the model, when both inputs in1 and in2 are high, not only may the output be low, as required, but also X, a possibility that is ruled out in reality because of thresholds. Interestingly, although the model is inaccurate, we could not prove the design incorrect according to the model if we just use specifications of the form □φ. Certainly if the design satisfies □φ according to the model then it will in reality, because the model allows more static configurations than are really possible. We could not prove it correct either. There is a ◇-assertion which


holds of the circuit according to our model, viz. ◇A_σ, where A_σ is the assertion characterising the unrealistic static configuration we have described, and which does not hold of the real design. It looks as if the model can be extended to cope with thresholds while keeping the same qualitative style. It appears to need another attribute to be attached to static configurations: a measure at each point of how strongly the value has been degraded. N-type transistors can transmit high values, but they do so at the cost of degrading the voltage to roughly the gate voltage minus the threshold. The measure of degradation would count the number of such degradations. The idea needs to be worked through more carefully, however, and a decision taken as to how to manage different threshold values and their relationship to each other. Much needs to be done. Extending the model to sequential circuits with static memory seems hard. Coping with thresholds to give a more accurate treatment of static behaviours seems more tractable. Dealing with it will surely press home the need for a more careful analysis of the relationship between qualitative models of the kind here and explanations based more closely on the physics of devices. Some work, [6] and [11], has been done on the relationship between different models of hardware. But the development of a general framework in which to carry out the verification of hardware at all stages in its development, from logical design to layout, calls for a spectrum of models at different levels of abstraction with formal relations between them. We cannot claim to have anything like this at present.

Acknowledgements. I would like to thank the members of the Hardware Verification Group here in Cambridge for help and encouragement, and especially Mike Gordon for laying bare some of the issues in modelling and verifying hardware which led to this work.
Mark Greenstreet helped explain the difficulties which arise from ignoring switching thresholds in transistors. Randy Bryant has been encouraging. Finally many thanks to Graham Birtwistle for his organisation and encouragement.


References.

[1] Bryant, R.E., "A switch-level model and simulator for MOS digital systems", IEEE Transactions on Computers C-33 (2), pp. 160-177, February 1984.
[2] Camilleri, A., Gordon, M., and Melham, T., "Hardware verification using higher order logic", to appear in the proceedings of the IFIP International Working Conference, Grenoble, France, September 1986. Also available as Report 91, Computer Laboratory, University of Cambridge, 1986.
[3] Gordon, M.J.C., "Why higher order logic is a good formalism for specifying and verifying hardware", in "Formal Aspects of VLSI Design: Proceedings of the 1985 Edinburgh Conference on VLSI", G.J. Milne and P.A. Subrahmanyam, eds., North-Holland, Amsterdam, 1986.
[4] Hayes, J., "A unified switching theory with applications to VLSI design", Proc. IEEE 70 (10), pp. 1140-1155, October 1982.
[5] Hoare, C.A.R., "Communicating Sequential Processes", Prentice-Hall, 1985.
[6] Melham, T.F., "Abstraction mechanisms for hardware verification", in: VLSI Specification, Verification and Synthesis, G.M. Birtwistle and P.A. Subrahmanyam, eds. (this volume).
[7] Milner, R., "A Calculus of Communicating Systems", Springer Lecture Notes in Computer Science, vol. 92, 1980.
[8] Scott, D.S., "Identity and existence in intuitionistic logic", in Proc. of the Durham Conference on Applications of Sheaves 1977, Lecture Notes in Mathematics, Springer-Verlag, 1979.
[9] Weste, N., and Eshraghian, K., "Principles of CMOS VLSI Design", Addison-Wesley, 1985.
[10] Winskel, G., "Lectures on models and logic of MOS circuits", Proceedings of the Marktoberdorf Summer School, July 1986. To be published by Springer.
[11] Winskel, G., "Relating two models of hardware", submitted to the conference "Category Theory and Computer Science", Edinburgh, September 1987.

12 A Tactical Framework for Hardware Design

A Tactical Framework for Hardware Design *

Steven D. Johnson, Bhaskar Bose, and C. David Boyer
Computer Science Department, Indiana University, Bloomington, Indiana

Introduction This work explores an algebraic approach to digital design. A collection of transformations has been implemented to derive circuits, but "hardware compilation" is not a primary goal. Paths to physical implementation are needed to explore the approach at practical levels of engineering. The potential benefits are more descriptive power and a unified foundation for design automation. However, these prospects mean little when the algebra is done manually. Hence, a basic, automated algebra is prerequisite to our experimentation with design methods. The transformation system discussed in this paper supports a basic design algebra. At this early stage of its development, the system is essentially an editor for deriving digital system descriptions from certain applicative specifications and for interactively manipulating them. It is integrated with existing assemblers for programmable hardware packages, such as the PAL components used to implement the examples presented in Sections 3 and 5. The transformation system and back-end assemblers are implemented in Scheme [18], a Lisp dialect, by Bose [2]. The coherent manipulation of design descriptions is a significant problem. Digital engineering is governed by a variety of theories addressing orthogonal problem aspects. It is not enough to unify these treatments in some

* Research reported herein was supported, in part, by the National Science Foundation, under grants numbered DCR 84-05241 and DCR 85-21497, and by the Westinghouse Educational Foundation.


metalanguage, for they must also be integrated. In any design instance, independent aspects become dependent facets, and hence, designs simultaneously decompose in distinct hierarchies. In this paper, we look at the separation of algorithm and architecture, the logical hierarchy of functional units, and the organization of physical components. The direction of this work is to find ways to maintain orthogonal views of a description as it is manipulated toward an implementation. The examples in this paper are implemented as externally clocked systems, often called "synchronous systems" or "clocked sequential systems." The underlying model of sequential behavior is simply realized by a global clock. There is nothing to preclude other realizations of timing, although we have not done so. The examples show a transformational approach to synchronous design. They follow a structured digital design strategy to obtain comparable implementations. They also reflect a more general design methodology. Section 1 is an informal discussion of how the approach is related to standard methods. The key idea, and central subject of this paper, is a class of transformations called system factorizations. These are used to isolate a level of detail while maintaining the global coherence of a circuit description. We use an algebra on applicative notation. Section 2 gives a brief look at this language and its use to describe sequential systems. Functional recursion equations are used as specifications. Part of the algebra is to correctly translate these specifications into sequential-system descriptions, and Section 2 concludes with a simple characterization relating the two forms of description. Section 3 is a first example of design derivation, and it serves several purposes. Most important is the illustration of the transformation algebra, and in particular, the factorization of subsystems. At the same time, a generic treatment of synchronous communication is discussed. 
We then explain how the transformation system is integrated with PAL assemblers to obtain physical implementations. Integration with VLSI tools is in progress. Section 3.3 is a brief discussion promoting the free use of functional abstraction in describing sequential systems. This notion is not explicitly used later, so Section 3.3 may be skipped. Section 4 explores system factorizations in more detail, laying out basic laws from which transformations are composed. The parameters in factorization are discussed. Section 5 reviews the derivation and implementation of a garbage collector. This example shows how the basic algebra presented in this paper applies

A Tactical Framework for Hardware Dellign

351

to a non-trivial design exercise. The accompanying figures (5-1 through 5-6) are developed directly from representations used by the transformation system. It is the form of these figures, not their content, that is of interest. They are included to convey the sense of engineering in this syntactic framework. Boyer gives details of the collector's design [4]. The garbage collector's derivation is targeted to a bit-slice implementation in PAL components. Repeated factorizations lead to an appropriate physical decomposition and bring a substantial portion of the design into programmable technology. This derivation strategy also bypasses the problem of incorporating representations in higher level descriptions. This issue is discussed in the conclusion.

1. The Relationship to Conventional Synchronous Design

Synchronous design isolates the algorithmic aspect of a problem, often specified as a finite-state machine, flowchart, or procedure, and refines a preliminary architecture once control is understood. We view this as a translation between dialects of recursive expression, a perspective that is projected in the ideogram

S_Θ/Γ → I_Θ/Γ → C_Θ/Γ → C′ ∥ Θ_Γ → ⋯

This formula expresses nothing formal beyond the notion of transformation. S is a problem specification, which for us is a system of function definitions. S is expressed in terms of a vocabulary, Θ/Γ, of primitive constants, operations, predicates, and so forth, called the basis, or "ground type". The basis is an aggregate of complex objects (Θ) and concrete objects (Γ). Some examples are arrays of integers, queues of characters, sets of points. The complex/concrete distinction is subjective and hierarchical; Γ represents an intended level of description. The first engineering task is transforming S to a form conducive to hardware implementation. Previous papers develop this phase of design derivation [12, 14, 13]. I stands for "iterative", that class of recursion schemata that characterizes sequential control. In software, S → I is sometimes called "recursion removal", but implementing recursion in Θ is not precluded. An explicit treatment of control is one argument for the power of a functional algebra, but that is apart from the subject here. In this paper, all specifications are iterative to begin with.

352

A Tactical Framework for Hardware Design

C is a sequential-system description. We have shown that such descriptions are mechanically derivable from iterative specifications [14]. The transformation I → C is implemented, and this phase of design derivation is automatic. C has the same ground type as I, but its interpretation is "behavioral". Where before, variables ranged over values, in C they denote sequences of values. We are interested in other interpretations of C as well, such as its description of physical connectivity or graphical layout [16, 3]. C is an abstract description because its sequences hold complex objects. It is refined to a more concrete description by factoring Θ as an abstract component. The formula C′ ∥ Θ is meant to suggest a communicating system. C′ is subject to further refinement and Θ to further decomposition. What makes this kind of transformation a factorization is that terms inside the sub-system Θ may be retained in C′.

One purpose of a system factorization is to segregate problem-dependent concrete terms from terms that are intrinsic to the type abstraction. The entailed analysis may depend both on the structure of the basis and also on prior assumptions of how Θ is implemented. Our present concern is exploring how to parameterize syntactic transformations in support of such analysis. For the purpose of contrast, we would describe standard synchronous design methods as follows. The engineer makes an estimate of the architecture required to solve a problem, then develops a control algorithm I to govern that architecture. A hardware realization of control is then compiled or systematically derived; that is, C′ is Θ's controller. Communication is implicit in the control specification


language. For example, in finite-state [10] and ASM [22] notations, variables denote the presence of a value now; and this is also true in higher level hardware description languages. The basis Θ/Γ of a functional specification corresponds to the estimate of architecture. The initial transformations secure the global design description as control is isolated. That is, a correct description of connectivity is maintained as the control is incorporated in the architecture. However, complex (i.e., complex-typed) behaviors arise in descriptions thus derived. Factorizations isolate concrete terms that are actually implemented from those that serve to preserve the global coherence of a description. It is a thesis of this work that many aspects of digital engineering can be addressed in this manner.

2. Notation and Background

Descriptions are built from applicative terms, such as f(x, c), where operation symbols like f and constant symbols like c come from a given basis and x is a variable. Functional values are allowed; the lambda-expression λx.t denotes the function with rule t and parameter x. In function definitions it is conventional to suppress the lambda, writing

AndNot(u, v) = and(u, not(v))

instead of

AndNot = λ(u, v). and(u, not(v)).

Function names are written in slanted font, variables in italic font and constants in sans serif font. The names of defined functions, such as AndNot, are capitalized. Product formation is expressed with brackets. For instance, we might describe a multiple-output function

BitSum(a, b, cin) = [s, cout]
where
    s    = parity(a, b, cin)
    cout = majority(a, b, cin)
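Assuming the usual readings of parity (an odd number of true inputs) and majority (at least two true inputs), BitSum is a one-bit full adder. A Python check of ours:

```python
def parity(a, b, cin):
    # Sum bit of a full adder: true iff an odd number of inputs is true.
    return a ^ b ^ cin

def majority(a, b, cin):
    # Carry-out: true iff at least two of the inputs are true.
    return (a and b) or (a and cin) or (b and cin)

def bit_sum(a, b, cin):
    """BitSum(a, b, cin) = [s, cout]."""
    return parity(a, b, cin), majority(a, b, cin)

# 1 + 1 + 0 is binary 10: sum bit 0, carry 1.
assert bit_sum(True, True, False) == (False, True)
assert bit_sum(True, False, False) == (True, False)
```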

A recursion equation is a simultaneous system of function-defining equations. There are examples throughout this paper. Function definitions usually involve conditional expressions, sometimes written as

p → E, E′


and sometimes as case-statements such as

per instruction
    G : E1
    R : E2
    W : E3

This form expresses a simple selection; the per-expression is always a variable and the case labels are always constants. An expression is simple if it contains no recursively defined function symbols; otherwise it is recursive. A simple function definition is called a combination. A recursive term is tail-recursive if its only defined symbol is in the outermost applied position. A recursion equation is iterative if each branch of every conditional is either simple or tail-recursive. All but one of the examples in this paper are iterative. Circuits are also described with applicative expressions, but the interpretation is sequential. The signal |f|(X, |c|) denotes the sequence S in which S_i = f(X_i, c). The boxes are a convention to distinguish the sequential interpretation and are omitted later. Signal variables are upper case. Abstraction is extended so that, for example,

|AndNot|(A, B) == |and|(A, |not|(B))

and

|λ(u, v). f(u, g(u, v))|(X, Y) == |f|(X, |g|(X, Y)).
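The boxed reading of an operation is pointwise application over signals. A small Python model of ours, with signals as finite lists:

```python
def lift(f):
    # |f|: apply f pointwise, so that lift(f)(X, Y)[i] == f(X[i], Y[i])
    return lambda *signals: [f(*vals) for vals in zip(*signals)]

AND = lift(lambda u, v: u and v)
NOT = lift(lambda u: not u)

def and_not(A, B):
    # |AndNot|(A, B) == |and|(A, |not|(B))
    return AND(A, NOT(B))

A = [True, True, False]
B = [False, True, True]
assert and_not(A, B) == [True, False, False]
```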

|if| is a sequential conditional, used as a two-way selector. If X = |if|(P, S, S′) then

    X_n = S_n   if P_n is true,
    X_n = S′_n  if P_n is false.

The operator '!' prefixes a value to a signal. It expresses storage or delay. If X = v ! S then X_0 = v and X_{n+1} = S_n.

A system description is a collection of simultaneous signal-defining equations. For example,

Toggle(C, T) = Q
where
    Q = false ! |if|(C, |false|, Z)
    Z = |if|(T, |not|(Q), Q)


describes a storage element that is cleared to false whenever signal C is true and otherwise inverts its content whenever signal T is true. The symbol