Synthesis of Embedded Software
Sandeep K. Shukla
Jean-Pierre Talpin
Editors
Synthesis of Embedded Software: Frameworks and Methodologies for Correctness by Construction
Editors
Dr. Sandeep K. Shukla, Virginia Tech, Bradley Department of Electrical & Computer Engineering, Whittemore Hall 302, Blacksburg, VA 24061, USA, [email protected]
Dr. Jean-Pierre Talpin, INRIA Rennes-Bretagne Atlantique, Campus de Beaulieu, 35042 Rennes Cedex, France, [email protected]
ISBN 978-1-4419-6399-4    e-ISBN 978-1-4419-6400-7
DOI 10.1007/978-1-4419-6400-7
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2010930045
© Springer Science+Business Media, LLC 2010
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Introduction

Sandeep Shukla and Jean-Pierre Talpin
Embedded systems are ubiquitous. In applications ranging from avionics, automotive, and industrial process control all the way to handheld PDAs, cell phones, and bio-medical prosthetic devices, we find embedded computing devices running embedded software systems. Growth in embedded systems is exponential, and there are orders of magnitude more embedded processors and embedded applications in deployment today than other forms of computing. Some of the applications running on such embedded computing platforms are also safety-critical and real-time, and require absolute guarantees of correctness, timeliness, and dependability.

The design of such safety-critical applications requires utmost care. These applications must be verified for functional correctness, the satisfaction of real-time constraints must be ensured, and they must be properly endowed with reliability and dependability properties. These requirements pose hard challenges to system designers.

A large body of research done over the last few decades exists in the field of designing safety-critical hardware and software systems. A number of standards have evolved, specifications have been published, architecture analysis and design languages have been proposed, techniques for optimal mapping of software components to architectural elements have been studied, modular platform architectures have been standardized, and so on. Certification by various authorities and strict requirements for satisfying certification goals have been developed. Today's avionics, automotive, process control, and many other safety-critical systems are usually developed based on this large body of knowledge and technology. However, there is still no guarantee that a system will not crash, that an automotive throttle-by-wire system malfunction will not cause problems, or that a process control system will not fail to work properly.
S. Shukla, Fermat Laboratory, Virginia Tech, Blacksburg, VA, USA, e-mail: [email protected]
J.-P. Talpin, INRIA, Centre de Recherche Rennes – Bretagne Atlantique, Campus de Beaulieu, Rennes, France, e-mail: [email protected]
Model-based development processes, specifications, and strict certification are meant to eliminate the possibility of such eventualities, but cannot eliminate them completely. Formal verification can provide more certainty than test-based certification, but even formal verification is only as good as the abstract models and the properties that one verifies on those models.

Even though a 100% guarantee is not possible, we believe that a rigorous formal-methods-based approach, which transforms abstract specifications into implementations via refinement steps, will be the methodology of choice for such systems, if it is not already so. While the refinement steps add further complexity in order to satisfy various constraints, the only refinements to be used are those that can be mathematically proven to preserve the correctness already established at higher abstraction levels.

Towards the goal of designing safety-critical systems in this way, especially their embedded software, a number of disciplines have been developed and are being researched at the moment. These research areas include programming models, abstractions, semantics-preserving refinements, model elaboration driven by real-time and other non-functional constraints, automated synthesis of code, formal verification, etc. In Europe, a number of programming models, and languages based on such programming models, have evolved for this purpose. Synchronous and polychronous programming models constitute important classes among them; stream-based programming models, Petri nets, process algebras, and various semantically enriched UML models also belong to this family. In the USA, actor-based modeling paradigms, tuple-space-based modeling, and guarded command languages are the main ones, besides automata-based composition of processes.

Defining a programming model, its semantics, and translation schemes based on operational semantics into execution languages such as C is only one part of the task. The other part is to create an entire methodology around the language and programming model, and tools for creating executable software from it.

In 2009, we organized a one-day tutorial at the ACM/IEEE Conference on Design, Automation and Test in Europe (DATE 2009) where a number of speakers presented various languages, models, and tools for correct-by-construction embedded software design. Following up on this successful tutorial, the present book introduces approaches to the high-level specification of embedded software that are supported by correct-by-construction automated synthesis and code generation techniques. It focuses on approaches based on synchronous models of computation and on the many programmatic paradigms they can be associated with to provide integrated, system-level design methodologies.

Chapter 1, on the "Compilation of Polychronous Data-Flow Equations", gives a thorough presentation of the program analysis and code generation techniques present in Polychrony, a compiler for the data-flow synchronous language Signal. Introduced in the late 1980s, Signal and its polychronous model of computation stand among the most developed concepts of synchronous programming. It makes it possible to model concurrent embedded software architectures using high-level multi-clocked synchronous data-flow equations.
The chapter defines a formal methodology consisting of all the program analysis and transformation techniques required to automatically generate the sequential or concurrent code suiting the target architecture (embedded or distributed) or the compilation goals (modularity or performance).

Chapter 2, on "Formal Modeling of Embedded Systems with Explicit Schedules and Routes", investigates data-flow process networks as a formal framework to model data-flow signal processing algorithms (e.g., multimedia) for networks of processors (e.g., multi-cores). While not readily a programming model, a process network provides a suitable model of computation to reason about the issues posed by the efficient implementation of data-intensive processing algorithms on novel multi-processor architectures. Process networks provide a bridge between the challenging scheduling and routing issues posed by multi-core architectures and the still very conventional programming language concepts that are commonly used to devise algorithms. In this aim, process networks act as a foundational principle from which to define domain-specific yet high-level programming and design concepts that match the main concern of data-processing algorithms, which is to minimize communication latencies (with external memory or between computation units) as early as possible during system design.

Chapter 3, on "Synoptic: A Domain-Specific Modeling Language for Space Onboard Application Software", is a perfect illustration of a related effort to design and structure high-level programming concepts from the specific requirements of an application domain. Synoptic is a modeling language developed in the framework of the French ANR project SPaCIFY. Its aim is to provide a programming environment for real-time embedded software on board space applications, and especially control and command software. Synoptic consists of an Eclipse-based modeling environment that covers the many aspects of aerospace software design: control-dominated modules are described using imperative synchronous programs, commands using mode automata, data-processing modules using data-flow diagrams, and the partitioning, timing, and mapping of these modules onto satellite architectures using AADL-like diagrams. As such, it is a domain-specific framework relying on standard modeling notations such as Simulink/Stateflow and AADL to provide the engineer with a unified modeling environment that handles all the heterogeneous analysis, design, implementation, and verification tasks, as defined in collaboration with the industrial end users of the project.

Chapter 4, on "Compiling SHIM", provides a complete survey of another domain-specific language, whose aim is to help engineer software interfaced with asynchronous hardware: "the shim under the leg of a table". In the design of Shim, emphasis is put on so-called scheduling independence as a principle guaranteeing correctness by construction while assuming a minimum of programmatic concepts.
Shim is a concurrent imperative programming language in which races and non-determinism are avoided at compile time by enforcing simple analysis rules that ensure a deterministic input–output behavior regardless of the way program threads are scheduled: scheduling independence. The chapter gives a complete survey of Shim, the programming principles it is built upon, the program analysis techniques it uses, and its code generation strategies for both shared-memory and message-passing architectures.

Chapter 5, on "A Module Language for Typing SIGNAL Programs by Contracts", brings up the polychronous model of computation again, to present a means to modularly and compositionally support assume-guarantee reasoning in that framework. Contract-based design has become a popular reasoning concept in which contracts are used to negotiate the correctness of assumptions made on the definition of a component at the point where it is used, and to provide guarantees to its environment. The chapter first elaborates formal foundations by defining a Boolean algebra of contracts in a trace-theoretical framework. Based on that contract algebra, a general-purpose module language is then specified. The algebra and module system are instantiated to the framework of the synchronous data-flow language Signal. This presentation is illustrated with the specification of a protocol for Loosely Time-Triggered Architectures (LTTA).

The aim of Chap. 6, on "MRICDF: A Polychronous Model for Embedded Software Synthesis", is to define a simple modeling language and framework to reason about synchronous multi-clocked actors, as an alternative to the formal methods in the Polychrony toolbox, for use by a larger community of software engineers. As the leitmotif of MRICDF can be summarized as "polychrony for the masses", a systematic emphasis is put on presenting a modeling language that uses programming concepts as simple as possible to implement a complete synchronous design workbench. The chapter gives a thorough presentation of the design modeling language and of the issues encountered in implementing MRICDF. The use of this formalism, together with its visual editor EmCodeSyn, is illustrated through case studies in the design of safety-critical applications.

Chapter 7, on "The Time Model of Logical Clocks Available in the OMG MARTE Profile", gives a detailed presentation of CCSL, the Clock Constraint Specification Language defined in the OMG profile MARTE for specifying timing relations in UML. The aim of CCSL is to capture some of the essence of logical time and encapsulate it into a dedicated syntax that can be used to annotate UML diagrams in order to refine them by formalizing this critical design aspect. Multi-form or polychronous logical time, introduced and made popular through its central role in the theory of synchronous languages, is already present in many formalisms pertaining to embedded system design, sometimes in a hidden fashion. Logical time considers time bases that can be generated from any sort of sequences
of events, not necessarily equally spaced in physical time. CCSL provides ways to express loose or strict constraints between distinct logical clocks. Solving such clock constraints amounts to relating clocks to a common reference, which can then be thought of as closer to the physical clock.

Finally, Chap. 8, entitled "From Synchronous Specifications to Statically Scheduled Hard Real-Time Implementations", addresses the issue of transforming high-level data-flow specifications expressed in a synchronous model of computation down to sequential or distributed implementations expressed in a real-time model of computation. Hard real-time embedded systems are often designed as automatic control systems that can include both continuous and discrete parts. The functional specification of such systems is usually done using data-flow formalisms such as Simulink or Scade. These formalisms are either quasi-synchronous or synchronous, but go beyond the classical data-flow model by introducing means of conditional execution, hence allowing the description of hierarchical execution modes. Specific real-time implementation approaches have been proposed for such formalisms, which exploit the hierarchical conditions to improve the generated code. This last chapter presents one such approach. It takes data-flow synchronous specifications as input and uses static scheduling heuristics to automatically produce efficient distributed real-time code. The chapter puts particular emphasis on improving the analysis of the hierarchical conditions in order to obtain more efficient code.

This book is intended to provide readers with a sampling of advanced topics and of the state of the art in various fields related to safety-critical software synthesis. Hence, it is obviously not an exhaustive handbook. Our hope is that the reader will find the areas of research covered in this book important and interesting, and will seek out other interesting research being done on related topics from various other sources.

We wish to thank all the contributors of this book for participating in preparing this volume, and for demonstrating their patience with the entire publication process. We also thank Charles Glaser from Springer for his support with this project. Finally, we thank the National Science Foundation, the US Air Force Office of Scientific Research, INRIA, and the ARTIST network of excellence for their support and funding, which enabled us to work on this project.
Contents
Introduction .................................................................. v
Sandeep Shukla and Jean-Pierre Talpin

1  Compilation of Polychronous Data Flow Equations ...................... 1
   Loïc Besnard, Thierry Gautier, Paul Le Guernic, and Jean-Pierre Talpin

2  Formal Modeling of Embedded Systems with Explicit Schedules and Routes ... 41
   Julien Boucaron, Anthony Coadou, and Robert de Simone

3  Synoptic: A Domain-Specific Modeling Language for Space On-board Application Software ... 79
   A. Cortier, L. Besnard, J.P. Bodeveix, J. Buisson, F. Dagnat, M. Filali, G. Garcia, J. Ouy, M. Pantel, A. Rugina, M. Strecker, and J.P. Talpin

4  Compiling SHIM ...................................................... 121
   Stephen A. Edwards and Nalini Vasudevan

5  A Module Language for Typing SIGNAL Programs by Contracts ........... 147
   Yann Glouche, Thierry Gautier, Paul Le Guernic, and Jean-Pierre Talpin

6  MRICDF: A Polychronous Model for Embedded Software Synthesis ........ 173
   Bijoy A. Jose and Sandeep K. Shukla

7  The Time Model of Logical Clocks Available in the OMG MARTE Profile .. 201
   Charles André, Julien DeAntoni, Frédéric Mallet, and Robert de Simone

8  From Synchronous Specifications to Statically Scheduled Hard Real-Time Implementations ... 229
   Dumitru Potop-Butucaru, Robert de Simone, and Yves Sorel

Index .................................................................... 263
Contributors
Charles André, Laboratoire I3S, UMR 6070 CNRS, Université de Nice-Sophia Antipolis, 06903 Sophia Antipolis Cedex, France; INRIA, Centre de Recherche de Sophia-Antipolis Méditerranée, 06902 Sophia Antipolis, France, [email protected]

Loïc Besnard, CNRS/IRISA, Campus de Beaulieu, Rennes, France, [email protected]

Jean-Paul Bodeveix, IRIT-ACADIE, Université de Toulouse, site Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse Cedex 9, France, [email protected]

Julien Boucaron, INRIA, Centre de Recherche de Sophia-Antipolis Méditerranée, 06902 Sophia Antipolis, France, [email protected]

Jérémy Buisson, VALORIA, Écoles de St-Cyr Coëtquidan, Université Européenne de Bretagne, 56381 Guer Cedex, France, [email protected]

Anthony Coadou, INRIA, Centre de Recherche de Sophia-Antipolis Méditerranée, 06902 Sophia Antipolis, France, [email protected]

Alexandre Cortier, IRIT-ACADIE, Université de Toulouse, site Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse Cedex 9, France, [email protected]

Fabien Dagnat, Institut Télécom, Télécom Bretagne, Université Européenne de Bretagne, Technopôle Brest Iroise, CS83818, 29238 Brest Cedex 3, France, [email protected]

Julien DeAntoni, INRIA, Centre de Recherche de Sophia-Antipolis Méditerranée, 06902 Sophia Antipolis, France; Laboratoire I3S, UMR 6070 CNRS, Université de Nice-Sophia Antipolis, 06903 Sophia Antipolis Cedex, France, [email protected]

Robert de Simone, INRIA, Centre de Recherche de Sophia-Antipolis, Sophia Antipolis Cedex, France, [email protected]

Stephen A. Edwards, Columbia University, New York, NY, USA, [email protected]

Mamoun Filali, IRIT-ACADIE, Université de Toulouse, site Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse Cedex 9, France, [email protected]

Gerald Garcia, Thales Alenia Space, 100 Boulevard Midi, 06150 Cannes, France, [email protected]

Thierry Gautier, INRIA, Centre de Recherche de Rennes, Campus de Beaulieu, Rennes, France, [email protected]

Yann Glouche, INRIA, Centre de Recherche de Rennes, Campus de Beaulieu, Rennes, France, [email protected]

Bijoy A. Jose, FERMAT Lab, Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA, [email protected]

Paul Le Guernic, INRIA, Centre de Recherche de Rennes, Campus de Beaulieu, Rennes, France, [email protected]

Frédéric Mallet, INRIA, Centre de Recherche de Sophia-Antipolis Méditerranée, 06902 Sophia Antipolis, France; Laboratoire I3S, UMR 6070 CNRS, Université de Nice-Sophia Antipolis, 06903 Sophia Antipolis Cedex, France, [email protected]

Julien Ouy, IRISA-ESPRESSO, Campus de Beaulieu, 35042 Rennes Cedex, France, [email protected]

Marc Pantel, IRIT-ACADIE, Université de Toulouse, site Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse Cedex 9, France, [email protected]

Dumitru Potop-Butucaru, INRIA, Centre de Recherche de Paris-Rocquencourt, Le Chesnay Cedex, France, [email protected]

Ana Rugina, EADS Astrium, 31 rue des Cosmonautes, Z.I. du Palays, 31402 Toulouse Cedex 4, France, [email protected]

Sandeep K. Shukla, FERMAT Lab, Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA, [email protected]

Yves Sorel, INRIA, Centre de Recherche de Paris-Rocquencourt, Le Chesnay Cedex, France, [email protected]

Martin Strecker, IRIT-ACADIE, Université de Toulouse, site Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse Cedex 9, France, [email protected]

Jean-Pierre Talpin, INRIA, Centre de Recherche de Rennes, Campus de Beaulieu, Rennes, France, [email protected]

Nalini Vasudevan, Columbia University, New York, NY, USA, [email protected]
Acronyms
IFFT   Inverse Fast Fourier Transform
DMA    Direct Memory Access
FSM    Finite State Machine
RTL    Register Transfer Level
HLS    High Level Synthesis
EDA    Electronic Design Automation
HDL    Hardware Description Language
CAOS   Concurrent Action Oriented Specifications
CDFG   Control Data-Flow Graph
HTG    Hierarchical Task Graph
GCD    Greatest Common Divisor
BSC    Bluespec Compiler
BSV    Bluespec System Verilog
LPM    Longest Prefix Match
VM     Vending Machine
LTL    Linear-time Temporal Logic
TLA    Temporal Logic of Actions
MCS    Maximal Concurrent Schedule
ACS    Alternative Concurrent Schedule
MNS    Maximum Non-conflicting Subset
MIS    Maximum Independent Set
MLS    Minimum Length Schedule
FFD    First Fit Decreasing
PTAS   Polynomial Time Approximation Scheme
AES    Advanced Encryption Standard
UC     Upsize Converter
Chapter 1
Compilation of Polychronous Data Flow Equations

Loïc Besnard, Thierry Gautier, Paul Le Guernic, and Jean-Pierre Talpin
1.1 Introduction

High-level embedded system design has gained prominence in the face of rising technological complexity, increasing performance requirements, and shortening time-to-market demands for electronic equipment. Today, the installed base of intellectual property (IP) further stresses the requirements for adapting existing components with new services within complex integrated architectures, calling for appropriate mathematical models and methodological approaches to that purpose.

Over the past decade, numerous programming models, languages, tools and frameworks have been proposed to design, simulate and validate heterogeneous systems within abstract and rigorously defined mathematical models. Formal design frameworks provide well-defined mathematical models that yield a rigorous methodological support for the trusted design, automatic validation, and systematic test-case generation of systems. However, they are usually not amenable to direct engineering use, nor do they seem to satisfy the present industrial demand. Despite overwhelming advances in embedded systems design, existing techniques and tools merely provide ad hoc solutions to the challenging issue of the so-called productivity gap [1]. The pressing demand for design tools has sometimes hidden the need to lay mathematical foundations below design languages. Many illustrative examples can be found, e.g., the variety of very different formal semantics found in state-diagram formalisms [2]. Even though these design languages benefit from decades of programming practice, they still give rise to diverging interpretations of their semantics.
L. Besnard, CNRS/IRISA, Campus de Beaulieu, Rennes, France, e-mail: [email protected]
T. Gautier, P. Le Guernic, and J.-P. Talpin, INRIA, Centre de Recherche de Rennes, Campus de Beaulieu, Rennes, France, e-mail: [email protected]; [email protected]; [email protected]
S.K. Shukla and J.-P. Talpin (eds.), Synthesis of Embedded Software: Frameworks and Methodologies for Correctness by Construction, DOI 10.1007/978-1-4419-6400-7_1, © Springer Science+Business Media, LLC 2010
The need for higher abstraction levels and the rise of stronger market constraints now make the need for unambiguous design models more obvious. This challenge requires models and methods to translate a high-level system specification into (distributed) cooperating (sequential) processes, and to implement high-level semantics-preserving transformations such as hierarchical code structuration, sequentialization or desynchronization (protocol synthesis).

The synchronous hypothesis, in this aim, has focused the attention of many academic and industrial actors. This synchronous paradigm consists of abstracting the non-functional implementation details of a system. In particular, latencies due to effective computing and communications depend on the actual implementation architecture; thus they are handled when low-level implementation constraints are considered. At a higher level, time is abstracted as sequences of (multiple) events in a logical time model. The designer can therefore forget those details and focus his or her attention on the functionalities of the system, including logical synchronizations. With this point of view, synchronous design models and languages provide intuitive models for embedded systems [3]. This affinity explains the ease of generating systems and architectures, and of verifying their functionalities, using compilers and related tools that implement this approach.

Synchronous languages rely on the synchronous hypothesis: computations and behaviors of a synchronous process are divided into a discrete sequence of atomic computation steps which are equivalently called reactions or execution instants. In itself this assumption is rather common in practical embedded system design. But the synchronous hypothesis adds to this the fact that, inside each instant, the behavioral propagation is well-behaved (causal), so that the status of every signal or variable is established and defined prior to being tested or used. This criterion ensures strong semantic soundness by allowing universally recognized mathematical models to be used as supporting foundations. In turn, these models give access to a large corpus of efficient optimization, compilation, and formal verification techniques.

The polychronous model [4] extends the synchronous hypothesis to the context of multiple logical clocks: several synchronous processes can run asynchronously until some communication occurs; all communications satisfy the synchronous hypothesis. The resulting behaviors are then partial orders of reactions, which is obviously more general than simple sequences. This model goes beyond the domain of purely sequential systems and synchronous circuits; it embraces the context of complex architectures consisting of synchronous circuits and desynchronization protocols: globally asynchronous and locally synchronous (GALS) architectures.

The SIGNAL language [5] supports the polychronous model. Based on data flow and equations, it goes beyond the usual scope of a programming language, allowing for specifications and properties to be described. It provides a mathematical foundation to a notion of refinement: the ability to model a system from the early stages of its requirement specifications (relations, properties) to the late stages of its synthesis and deployment (functions, automata). The inherent flexibility of the abstract notion of signal handled in the SIGNAL language invites and favors the design of
correct-by-construction systems by means of well-defined model transformations that preserve both the intended semantics and the stated properties of the architecture under design.

The integrated development environment POLYCHRONY [6] provides SIGNAL program transformations that draw a continuum from synchrony to asynchrony, from specification to implementation, from abstraction to refinement, from interface to implementation. SIGNAL gives the opportunity to seamlessly model embedded systems at multiple levels of abstraction while reasoning within a simple and formally defined mathematical model. It is being extended by plugins to capture SystemC modules or real-time Java classes within the workbench. It makes it possible to perform validation and verification tasks, e.g., with the integrated SIGALI model checker [7], or with the Coq theorem prover [8]. Code generators are provided for C, C++, multi-threaded and real-time Java, and SYNDEX [9].

This chapter focuses on formal transformations, based on the polychronous semantic model [4], that can be provided by a safe methodology to generate "correct-by-construction" executable code from SIGNAL processes. It gives a thorough presentation of the program analysis and code generation techniques that can be implemented to transform synchronous multi-clocked equation systems into various execution schemes, such as sequential and concurrent programs (C) or object-oriented programs (C++). Most of these techniques are available in the POLYCHRONY toolset to design embedded real-time applications.

This chapter is structured as follows: Sect. 1.2 presents the main features of the SIGNAL language and introduces some of the mathematical properties on which program transformations are based. In Sect. 1.3, some of the modularity features of SIGNAL are first introduced; then an example, used in the rest of this chapter, is presented. Section 1.4 is dedicated to the Data Control Graph models that support program transformations, and to their use in guiding various code generation schemes that are correct by construction. Section 1.5 introduces those various modes of code generation, illustrated on the considered example.
1.2 SIGNAL Language

SIGNAL [10–13, 15, 19] is a declarative language expressed within the polychronous model of computation. SIGNAL relies on a handful of primitive constructs, which can be combined using a composition operator. These core constructs are of sufficient expressive power to derive other constructs for comfort and structuring. In the following, we present the main features of the SIGNAL language and its associated concepts. We give a sketch of the primitive constructs and of a few derived constructs that are often used. For each of them, the corresponding syntax and definition are given. Since the semantics of SIGNAL is not the main topic of this chapter, we give simplified definitions of the operators. For further details, we refer the interested reader to [4, 5].
1.2.1 Synchronized Data Flow

Consider as an example the following expression in some conventional data flow formalism:

if a > 0 then x = a endif; y = x + a

Considering the data flow semantics given by Kahn [16] as functions over flows, y is the greatest sequence of values a'_t + a_t, where a' is the subsequence of strictly positive values in a. Thus, in an execution where the edges are considered as FIFO queues [17], if a is a sequence with infinitely many non-positive values, the queue associated with a grows forever, or (if a is a finite sequence) the queue associated with x remains eventually empty although a is non-empty. Now, suppose that each FIFO queue consists of a single cell [18]. Then as soon as a negative value appears on the input, the execution (of the + operator) can no longer go on because its first operand (cell) does not hold a value (is absent): there is a deadlock. These results are not acceptable in the context of embedded systems, where not only deadlocks but also uncontrolled response times can be dramatic.

Synchronized data flow introduces synchronizations between occurrences of flows to prevent such effects. In this context, the lack of value is usually represented by nil or null; we represent it by the symbol # (standing for "no event"). To prevent deadlocks from occurring during execution, it is necessary to be able to verify timing properties before runtime. In the framework of synchronized data flow, the # corresponds to the absence of value at a given logical instant for a given variable (or signal). In particular, to reach a high level of modularity allowing, for instance, internal clock rate increasing (time refinement), it must be possible to insert #'s between two defined values of a signal. Such an insertion corresponds to some resynchronization of the signal. However, the main purpose of synchronized data flow is to completely handle the whole synchronization at compile time, in such a way that the execution phase has nothing to do with #. This is ensured by a static representation of the timing relations expressed by each operator. Syntactically, the overall detailed timing is implicit in the language. SIGNAL describes processes which communicate through (possibly infinite) sequences of (typed) values with implicit timing: the signals.
1.2.2 Signal, Execution, Process in SIGNAL

A pure signal s is a (total) function T → D, where T, its time domain, is a chain in a partial order (for instance an increasing sequence of integers), and D is some data type; we name pure flow of such a pure signal s the sequence of its values in D. For all chains TT such that T ⊆ TT, a pure signal s: T → D can be extended to a synchronized signal ss: TT → D# (where D# = D ∪ {#}) such that, for all t in T, ss(t) = s(t), and ss(t) = # when t is not in T; we name synchronized flow of a synchronized signal ss the sequence of its values in D#; conversely, we
name pure signal of the synchronized flow ss the unique pure signal s from which it is extended, and pure flow of ss the pure flow of s. The pure time domain of a synchronized signal is the time domain of its pure signal. When it is clear from the context, one may omit the pure or synchronized qualifications. Given a pure signal s (respectively, a synchronized signal ss), s_t (respectively, ss_t) denotes the t-th value of its pure flow (respectively, its synchronized flow).

An execution is the assignment of a tuple of synchronized signals, defined on the same time domain (as synchronized signals), to a tuple of variables. Let TT be the time domain of an execution; a clock in this execution is a "characteristic function" clk: TT → {#, true}; notice that it is a synchronized signal. The clock of a signal x in an execution is the (unique) clock that has the same pure time domain as x; it is denoted by x̂.

A process is a set of executions defined by a system of equations over signals that specifies relations between signal values and clocks. A program is a process.

Two signals are said to be synchronous in an execution iff they have the same clock (or, equivalently, the same pure time domain in this execution). They are said to be synchronous in a process (or simply synchronous) iff they are synchronous in all executions of this process. Consider a given operator which has, for example, two input signals and one output signal, all being synchronous. They are logically related in the following sense: for any t, the t-th token on the first input is evaluated with the t-th token on the second input to produce the t-th token on the output. This is precisely the notion of simultaneity. However, for two occurrences of a given signal, we can say that one is before the other (chronology). Then, for the synchronous approach, an event is associated with a set of instantaneous calculations and communications.
1.2.3 SIGNAL Data Types

A flow is a sequence of values that belong to the same data type. Standard data types such as Boolean, integer, etc. (or more specific ones such as event; see below) are provided in the SIGNAL language. One can also find more sophisticated data types such as sliding windows on a signal, or bundles (structures the fields of which are signals that are not necessarily synchronous), used to represent union types or signal multiplexing.

The event type: to be able to compute on (or to check properties of) clocks, SIGNAL provides a particular type of signals called event. An event signal is true if and only if it is present (otherwise, it is #).

Signal declaration: tox x declares a signal x whose common element type is tox. Such a declaration is a process that contains all executions that assign to x a signal the image of which is in the domain denoted by tox. In the remainder of this chapter, when the type of a signal does not matter or when it is clear from the context, we may omit it.
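As a brief illustration (a sketch of ours; the process name ClockDemo and its signals are arbitrary), the event type makes clocks values that equations can define and constrain:

process ClockDemo =
  ( ? integer x; boolean b;
    ! event e; )
  (| e := ˆx    % e is the clock of x: present, with value true, iff x is present %
   | x ˆ= b    % constrains x and b to be synchronous %
   |);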
1.2.4 SIGNAL Elementary Processes

An elementary process is defined by an equation that associates with a signal variable an expression built on operators over signals; the arguments of operators can be expressions and variables.

Stepwise extensions. Let f be a symbol denoting an n-ary function [[f]] on values (e.g., a Boolean, arithmetic or array operation). Then, the SIGNAL expression

y := f(x1,..., xn)

defines the process equal to the set of executions that satisfy:
  – the signals y, x1, ..., xn are synchronous;
  – their pure flows have the same length l and satisfy, for all t ≤ l, y_t = [[f]](x1_t, ..., xn_t).
If f is a function, its stepwise extension is a pure flow function. Infix notation is used for usual operators.

Derived operator
  ◦ Clock of a signal: ˆx returns the clock of x; it is defined by (ˆx) = (x = x), where = denotes the stepwise extension of the usual equality operator.

Delay. This operator defines the signal whose t-th element is the (t−1)-th element of its (pure flow) input, at any instant but the first one, where it takes an initialization value. Then, the SIGNAL expression

y := x $ 1 init c

defines the process equal to the set of executions that satisfy:
  – y and x are synchronous;
  – their pure flows have the same length l and satisfy, for all t ≤ l:
      (t > 1) ⇒ y_t = x_{t−1},
      (t = 1) ⇒ y_t = c.
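For instance (the process name Counter is ours), the delay operator is the construct that introduces state, as in this event counter:

process Counter =
  ( ? event tick;
    ! integer n; )
  (| n := zn + 1           % current count %
   | zn := n $ 1 init 0    % previous count; 0 at the first instant %
   | n ˆ= tick             % n is produced at the pace of tick %
   |) where integer zn; end;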
The delay operator is thus a pure flow function.

Derived operator
  ◦ Constant: x := v; when x is present, its value is the constant value v; x := v is a derived equation equivalent to x := x $ 1 init v. Note that this equation does not have an input: it is a pure flow function with arity 0.

Sampling. This operator has one data input and one Boolean "control" input. When one of the inputs is absent, the output is also absent; at any logical instant where both input signals are defined, the output is present (and equal to the current data input value) if and only if the control input holds the value true. Then, the SIGNAL expression
y := x when b

defines the process equal to the set of executions that satisfy:
  – y is present iff x is present, b is present and b holds the value true (so that dom(y) = dom(x) ∩ dom([b]));
  – at any such instant, y takes the current value of x.

The sampling operator is thus not a pure flow function: the flow of y depends on the relative presence of x and b.

Merge. The SIGNAL expression

z := x default y

defines the process whose executions satisfy dom(z) = dom(x) ∪ dom(y); at any instant of its clock, z takes the value of x when x is present, and the value of y otherwise.

Composition. The expression P1 | P2 (also written (| P1 | P2 |)) denotes the parallel composition of P1 and P2: the set of executions whose restrictions to the signals of P1 and of P2 are executions of P1 and P2, respectively.

Restriction. The expression P where x makes the signal x local to P, hiding it from the environment of P.
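The following traces (our own illustration) align synchronized flows by logical instant, with # marking absence:

x                : 1    2     #    4
b                : true false true #
y := x when b    : 1    #     #    #
c                : #    9     8    #
z := x default c : 1    2     8    4

At the second instant, y is absent because b is false; at the third, because x is absent. The merge z always takes the value of x when x is present, and the value of c otherwise.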
A SIGNAL program for if a > 0 then x = a endif; y = x + a is the following "DeadLocking Specification":

DLS: (| x := a when a > 0
      | y := x + a
      |)

DLS denotes the set of executions in which a_t > 0 for all t. Safe execution thus requires this program to be rejected if one cannot prove (or if it is not asserted) that the signal a remains strictly positive when it is present.

Embedding DLS, without changing it, within the following front-end process results in a safe program that accepts negative values for some occurrences of a; those negative values are not transmitted to DLS:

(| ap := a when a > 0
 | (| DLS | a := ap |) where a
 |)

Process expressions in normal form. The following properties of parallel composition are intensively used to compile processes:
  – Associativity: (| P | Q |) | R ≡ P | (| Q | R |).
  – Commutativity: P | Q ≡ Q | P.
  – Idempotence: P | P ≡ P is satisfied by processes that do not produce side effects (for instance due to calls to system functions). This property makes it possible to replicate processes.
  – Externalization of restrictions: if x is not a signal of P, then P | (| Q where x |) ≡ (| P | Q |) where x.

Hence, a process expression can be normalized, modulo required variable substitution, as a composition of elementary processes, enclosed in terminal restrictions.

Signal expressions in normal form. Normalizations can be applied to expressions on signals thanks to properties of the operators, for instance:
  – when is associative and right-commutative:
      (a when b) when c ≡ a when (b when c) ≡ a when (c when b)
  – default can be written in exclusive normal form:
      a default b ≡ a default (b when (b ˆ- a))
  – default is associative, and default commutes in exclusive normal form:
      if a and b are exclusive, then a default b ≡ b default a
  – when is right-distributive over default:
      (a default b) when c ≡ (a when c) default (b when c)
  – Boolean normalization: logical operators can be written as expressions involving only not, false and operators on the event type. For example:
      AB := a or b ≡ (| AB := (when a) default b | a ˆ= b |)

Process abstraction. A process Pa is, by definition, a process abstraction of a process P if P | Pa = P. This means that every execution of P restricted to the variables of Pa is an execution of Pa, and thus all safety properties satisfied by the executions of Pa are also satisfied by the executions of P.
1.3 Example

As a full example to illustrate the SIGNAL features and the compilation techniques (including code generation), we propose the description of a process that iteratively solves equations aX² + bX + c = 0. This process has three synchronous signals a, b, c as inputs. The output signals x1, x2 are the computed solutions. When there
is no solution at all or there are infinitely many solutions, their synchronized flows hold # and an output Boolean x_st is set. Before presenting this example, let us introduce the modularity features that can be found in SIGNAL.
1.3.1 SIGNAL Modularity Features

Process model: Given a process (a set of equations) P_body, a process model MP associates an interface with P_body, such that P_body can be expanded using this interface. A process model is a SIGNAL term

process MP = ( ? t_I1 I1; ...; t_Im Im;
               ! t_O1 O1; ...; t_On On )
  P_body

that specifies its typed input signals after "?" and its typed output signals after "!". Assuming that there is no name conflict (such a conflict is solved by trivial renaming), the instantiation of MP is defined by:

(| (Y1, ..., Yn) := MP(E1, ..., Em) |)
  = (| I1 := E1 | ... | Im := Em | P_body | Y1 := O1 | ... | Yn := On |)
    where t_I1 I1; ...; t_Im Im; t_O1 O1; ...; t_On On
Example. When a is equal to 0, the second-degree equation becomes bX + c = 0. This first-degree equation is solved using the FirstDegree process model.

process FirstDegree =
  ( ? real b, c;
    ! boolean x_st; real x; )
  (| b ˆ= c
   | b1 := b when (b/=0.0)
   | c1 := c when (b/=0.0)
   | x := -(c1/b1)
   | x_st := (c/=0.0) when (b=0.0)
   |) where real b1, c1; end

Example: FirstDegree SIGNAL process
When the factor b is not 0, the output signal x holds the value of the solution (-c/b) and the x_st Boolean signal is absent. Conversely, when b is 0, x is absent and x_st is either true, when the equation has no solution (c is not 0), or false, when the equation has infinitely many solutions (c is 0). This process is activated when the input parameter a equals 0. This is achieved by:

(x_st_1, x11) := FirstDegree (b when (a=0.0), c when (a=0.0))
The interface of a process model can begin with a list of static parameters given between "{" and "}"; see for instance real epsilon in the interface of rac (Example p. 12).

Local process model: A process model can be declared local to another process model, as process rac is local to process SecondDegree in Example p. 13.
Shared variables: A variable x that has partial definitions (Sect. 1.2.4) in a process model MP, or in process models local to MP, is declared as a shared variable (shared real x) local to MP. A shared signal x cannot appear in the interface of any process.
1.3.2 Full Example with Time Refinement

In our example, the resolution of the equation aX² + bX + c = 0 uses the iterative Newton method: starting from Δ/2, the computation of R = √Δ is defined as the limit of the series (R_n)_{n≥0}:

Δ = b² − 4ac,   R_0 = Δ/2,   R_{n+1} = (R_n + Δ/R_n)/2   (n ≥ 0).
The iterative method for computing the square root of the discriminant is implemented in SIGNAL using a time refinement of the clock of the discriminant. The process model rac computes in R the sequence of square roots R_t of the values S_t of a signal assigned to S (corresponding to the discriminant when it is not negative); epsilon is a static threshold parameter used to stop the Newton iteration.

process rac = { real epsilon; }
  ( ? real S;
    ! boolean stable; real R; )
  (| (| S_n := var S
      | R_n := (S/2.0) default (next_R_n $1 init 1.0)
      | next_R_n := (((R_n+(S_n/R_n))/2.0) when loop) default R_n
      |) where real S_n; end
   | (| loop := (ˆS) default (not stable)
      | next_stable := abs(next_R_n - R_n) < epsilon
      | stable := next_stable $1 init true
      |)
   | R := R_n when stable
   |) where real R_n, next_R_n; boolean loop, next_stable; end;

The term a --> b when c is an elementary process in the SIGNAL syntax. It denotes the precedence relation [c]: a → b. This constraint is usually implicit and related to data dependencies (for instance, in x := a+b the constraints a → x and b → x hold) and to clock/signal dependencies (the value of a signal x cannot be computed before x̂, the clock of x, hence the implicit precedence relation x̂ → x). Precedences can be freely added by the programmer. They can also be computed by the compiler and made available in specifications associated with processes (Sect. 1.4.6).

Derived expressions: a --> b is a simplified term that means a --> b when (a ˆ* b). Local precedence constraints can be combined, as in {b,c} --> {x_st,x}, meaning that for all pairs (u in {b,c}, v in {x_st,x}), u --> v holds.

The SIGNAL term ll::P is a derived process expression formally defined with SIGNAL primitive features. For the purpose of this chapter, it is enough to know
that ll::P associates the label ll with the process P. A label ll is a special event signal that "activates" P. It may occur in synchronization and precedence constraints like any other signal, with a specific meaning for precedence: if a process P contains ll1::P1 and ll2::P2, then ll1 --> ll2 in P means that every node in P1 precedes all nodes of P2, while for a signal x, x --> ll2 (resp. ll1 --> x) means that x precedes (resp. is preceded by) all nodes of P2 (resp. P1).
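For instance (a sketch of ours, where f and g stand for arbitrary functions), labels can serialize two otherwise independent computations:

(| ll1:: x := f(a)
 | ll2:: y := g(b)
 | ll1 --> ll2    % every node of the first block precedes the second %
 |)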
1.4.3 Synchronization Relation

Table 1.1 gives the synchronization relation associated with a SIGNAL expression P, as a SIGNAL expression C(P) (column 2) and as a time domain constraint T(P) (column 3). The time domain constraints have to be satisfied by all executions of P. The synchronization relations associated with derived expressions (and normalizable ones) are deduced from this table. In the table, "[E]" stands for "when E", "ˆ0" is the "never present" clock, and the pure time domain of a signal x is noted dom(x); other notations are standard ones and SIGNAL notations.

The transformation C defined in Table 1.1 satisfies the following property, making C(P) a process abstraction of P: (| C(P) | P |) = P.

The clock of a Boolean signal b is partitioned into its exclusive sub-clocks [b] and [not b], which denote the instants at which the signal b is present and carries the values true and false, respectively, as shown in Fig. 1.2.

Table 1.1 Synchronization relation associated with a process

Construct P          Clocks: C(P)                              Pure time domains: T(P)
boolean b            [b]ˆ+[not b] ˆ= b | [b]ˆ*[not b] ˆ= ˆ0    T(C(P))
event b              [b] ˆ= b | [not b] ˆ= ˆ0                  T(C(P))
y := f(x1,...,xn)    y ˆ= x1 ˆ= ... ˆ= xn                      T(C(P))
y := x $1 init c     y ˆ= x                                    dom(y) = dom(x)
y := x when b        xˆ*[b] ˆ= y                               dom(y) = dom(x) ∩ dom([b])
y := x when not b    xˆ*[not b] ˆ= y                           dom(y) = dom(x) ∩ dom([not b])
z := x default y     xˆ+y ˆ= z                                 dom(z) = dom(x) ∪ dom(y)
P1 | P2              C(P1) | C(P2)                             T(C(P1)) ∧ T(C(P2))
P where x            C(P) where x                              ∃x T(C(P))
Fig. 1.2 Time subdomains and hierarchy for the FirstDegree process (clock tree: b, c present or absent; under presence, the exclusive subdomains [b=0], where x_st is present, and [b/=0], where x is present)
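Applying Table 1.1 to the equations of the FirstDegree process (a worked example of ours) yields, after normalization, the relations depicted in Fig. 1.2:

(| b ˆ= c
 | b1 ˆ= c1 ˆ= x ˆ= when (b/=0.0)
 | x_st ˆ= when (b=0.0)
 |)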
1.4.3.1 Clock Hierarchy
The synchronization relation provides the necessary information to determine how clocks can be computed. From the synchronization relation induced by a process, we build a so-called clock hierarchy [20]. A clock hierarchy is a relation ≽ (dominates) on the quotient set of signals by ˆ= (x and y are in the same class iff they are synchronous). Informally, a class C dominates a class D (C is higher than D), or equivalently a class D is dominated by C (D is lower than C), written D ≼ C, if the clock of D is computed as a function of Boolean signals belonging to C and/or to classes recursively dominated by C.

To compute this relation, we consider a set V of free-value Boolean signals (the free variables of the process); this set contains the variables the definition of which cannot be rewritten using some implemented rewriting rule system: in the current POLYCHRONY version, input signals, delayed signals, and the results of most of the non-Boolean predicates are elements of this set V.

Depending on V, the construction of the relation ≽ is based on the following rules (for x a signal, Cx denotes its class; ≽* is the transitive closure of ≽):

1. If x1 is a free variable such that the clock of x is defined by sampling that of x1 (x̂ = [x1] or x̂ = [not x1]), then Cx1 ≽ Cx: the value of x1 is required to compute the clock of x.
2. If x̂ = f(x̂1, ..., x̂n), where f is a Boolean/event function, and there exists C such that C ≽* Cxi for all xi in x1, ..., xn, then x̂ is written in a canonical form x̂ = cf(ŷ1, ..., ŷm); cf(ŷ1, ..., ŷm) is either the null clock, or the clock of C, or is transitively dominated by C; in POLYCHRONY, BDDs are used to compute canonical forms.
3. If x̂ = cf(ŷ1, ..., ŷm) is a canonical form such that m ≥ 2 and there exists Cz such that Cz ≽* Cyi for all yi in y1, ..., ym, then there exists a lowest class C that dominates those Cyi, and C ≽ Cx.

When the clock hierarchy has a unique highest class, like the class of Bx in Fig. 1.3b, the process has a fastest rated clock and the status (presence/absence) of all signals is a pure flow function: this status depends neither on communication delays nor on computing latencies. The process is then endochronous.
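As a small illustration of rule 1 (an example of ours, with arbitrary names), consider:

(| k := x > 0    % k: free Boolean, result of a non-Boolean predicate %
 | y := x when k
 |)

The clock of y is defined by sampling the value of k (ŷ = x̂ ˆ* [k]), so the class of x and k dominates the class of y: the value of k must be known to decide whether y is present.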
Fig. 1.3 Endochronization of y := x when b: (a) the clock hierarchy of y := x when b is not a tree (the classes of x and b are independent, with the product clock x̂ ˆ* [b] below them); (b) the endochronized hierarchy, rooted at the common class of Bx and Bb, with [Bx], [¬Bx], [Bb], [¬Bb] below; (c) the clock container Bx ˆ= Bb | x ˆ= when Bx | b ˆ= when Bb composed with y := x when b
The "clock calculus" provided by POLYCHRONY determines the clock hierarchy of a process. It has a triple purpose:
  – It verifies that the process is well-clocked: the synchronization relation of the process can be written as a set of acyclic definitions.
  – It assigns to each clock a unique normalized definition when possible.
  – It structures the control of the process according to its clock hierarchy.
1.4.3.2 "Endochronization"
A process that is not endochronous can be embedded in a container (Fig. 1.3c) such that the resulting process is endochronous. For instance, consider the clock hierarchy associated with the process y := x when b; it has two independent classes, the class of ˆx and the class of ˆb (Fig. 1.3a); the third one is defined by a clock product. To get an endochronous process, one can introduce a new highest clock and two new input Boolean signals Bx and Bb, synchronous with that clock; the samplings of Bx and Bb define respectively the clocks of x and b (Fig. 1.3b). This embedding can be made by the designer, or heuristics can be applied to instrument the clock hierarchy with a default parameterization. Building such a container is useful not only to make a process endochronous, but also to ensure that the pure flow function associated with a KPN remains a synchronized flow function (i.e., synchronizations are preserved despite varying communication delays).
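In SIGNAL, the container of Fig. 1.3c can be sketched as follows (assembled by us from the constraints displayed in the figure):

(| Bx ˆ= Bb        % the new, fastest clock %
 | x ˆ= when Bx    % x is present iff Bx is true %
 | b ˆ= when Bb    % b is present iff Bb is true %
 | y := x when b
 |)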
1.4.4 Precedence Relation

Table 1.2 gives the precedence relation associated with a SIGNAL expression P, as a SIGNAL expression S(P) (column 2) and as a path algebra →(P) (column 3). Notice that the delay equation (line 3) does not order the involved signals. The table is completed by relations coming from the clock hierarchy. The transformation S defined in Table 1.2 satisfies the following property, making S(P) a process abstraction of P: (| S(P) | P |) = P.
1.4.4.1 Path Algebra

From the basic precedence relation, a simple path algebra can be used to verify deadlock freeness, to refine precedences, and to produce information for modularity.
Table 1.2 Precedence relation associated with a process; the transitive closure →*(P) is, more precisely, the path algebra presented in Sect. 1.4.4.1

Construct P             Precedence: S(P)                    Path algebra: →(P)
{a --> b when c}        {a --> b when c}                    [c]: a → b
y := f(x1,...,xn)       x1 --> y | ... | xn --> y           →(S(P))
y := x $1 init c
y := x when b           x --> y                             x → y
z := x default y        x --> z | y --> z when (ˆy ˆ- ˆx)   →(S(P))
P1 | P2                 S(P1) | S(P2)                       (→(P1) ∪ →(P2))*
P where x               S(P) where x                        →(P)

Clock construct P       Precedence: S(P)                    Path algebra: →(P)
x, a signal             ˆx --> x                            x̂ → x
x ˆ= when b             b --> ˆx                            b → x̂
x ˆ= when not b         b --> ˆx                            b → x̂
y ˆ= cf(ˆx1,...,ˆxn)    ˆx1 --> ˆy | ... | ˆxn --> ˆy       →(S(P))

The path algebra is given by the following rules, which define → for expressions c: x → y in normalized form:
  ◦ Rule of series: c: x → y and d: y → z imply c ˆ* d: x → z.
  ◦ Rule of parallel: c: x → y and d: x → y imply c ˆ+ d: x → y.

A pseudo cycle in the precedence relation → associated with a process P is a sequence c1: x1 → x2, c2: x2 → x3, ..., cn−1: xn−1 → x1 in →. P is deadlock free iff, for every pseudo cycle c1: x1 → x2, c2: x2 → x3, ..., cn−1: xn−1 → x1 in its precedence relation, the product c1 ˆ* c2 ˆ* ... ˆ* cn−1 is null (= ˆ0).
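As a small illustration (an example of ours; the synchronization constraints between a, b, u, v, x and y are omitted for brevity), consider two equations that depend on each other under exclusive conditions:

(| x := ((y+1) when b) default u
 | y := ((x+1) when not b) default v
 |)

The precedence relation contains the pseudo cycle [b]: y → x and [not b]: x → y. By the rule of series, its product is [b] ˆ* [not b] = ˆ0: the cycle can never be active at any single instant, so the process is deadlock free.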
1.4.4.2 Precedence Refinement
To generate sequential code for a (sub)process P, it is usually necessary to add new dependencies c1: x1 → x2 to P, thereby getting a new process PS which refines the precedences of P. It is desirable that if the composition of P with some process Q is deadlock free, then the composition of PS with the same process Q remains deadlock free; a refinement that satisfies this property is said to be cycle consistent by composition. In general, the maximal cycle consistent refinement is not unique. The union of all maximal cycle consistent refinements is actually a preorder that can be computed using the path algebra [21]. The preorder associated with the process FirstDegree is shown in Fig. 1.4b (clocks are omitted): one can, for instance, compute b1 before c1 or conversely.
Fig. 1.4 FirstDegree process DCG and its precedence refinement preorder: (a) the clock hierarchy and conditioned graph for FirstDegree, with h standing for "when b=0" and k for "when b/=0" (under [b/=0]: b1 := b when k, c1 := c when k, x := -(c1/b1); under [b=0]: x_st := (c/=0.0) when h); (b) the scheduling refinement preorder
1.4.5 Data Control Graph

The Data Control Graph (DCG), composed of a clock hierarchy and a Conditioned Precedence Graph (CPG) as shown in Fig. 1.4a, not only associates with every clock node the set of signals that have this node as clock, but also produces a hierarchy of the CPG in which every clock node is associated with the subgraph of the computations that must be processed when this clock is present (for instance, the computations of b1, c1, x are associated with the node [b/=0] in Fig. 1.4a).
1.4.6 SIGNAL Process Abstraction

The interface of a process model MP can contain, in a process P_abst, a specific description of relations, typically clock and precedence relations, applying to its inputs and outputs:

process MP (? I1, ..., Im; ! O1, ..., On) spec (| P_abst |) P_body
When the process body P_body of MP is a set of equations, the actual process associated with MP is not P_body but (| P_abst | P_body |). Hence, P_abst is by construction an abstraction of (| P_abst | P_body |). When MP is an external process model, P_abst is assumed to be an abstraction of the process associated with MP. Supplementary properties are required to correctly use those external processes. Thus, a process is (or can be) declared:

Safe: a safe process P is a single-state deterministic, endochronous automaton; such a process does not contain occurrences of the delay operator and cannot "call" external processes that are not safe; in practice, the code generated to interact with P (to call P) can be freely replicated.

Deterministic: a deterministic process P is a deterministic, endochronous automaton (it is a pure flow function); such a process cannot "call" external
processes that are not safe; in practice, the code generated to interact with P (to step P) can be replicated only if the internal states of P are also replicated.

Unsafe: a process is unsafe by default.

For instance, the FirstDegree function (Example p. 11) is a safe process. Its synchronization and precedence relations are made explicit in its specification: the signal x_st is synchronized with the clock at which b is zero, the signal x is present iff b is non-zero, and all output signals depend on all input signals ({b, c} --> {x, x_st}). The abstraction of its generated C code is then given by the FirstDegree SIGNAL external process presented below. For a SIGNAL process P, such an abstraction can be automatically computed by the POLYCHRONY SIGNAL compiler: it is the restriction to the input/output signals (of MP) of the process (| C(P) | S'(P) |), where S'(P) is the precedence refinement of S(P) taking into account the precedences introduced by code generation.

When the specifications are related to imported processes, their source code may not be available (written in another language such as C++ or Esterel [22]). Legacy code can be used in applications developed in SIGNAL, in which it is considered as external processes via specifications. Besides, POLYCHRONY provides a translator from C code in SSA form to SIGNAL processes [23]. The translated process can be used to abstract the original C behavior. Each sequence of instructions in the SSA form is associated with a label. Each individual instruction is translated to an equation, and each label is translated to a Boolean signal that guards its activation. As a result, the SIGNAL interpretation has a behavior identical to that of the original C program.

void FirstDegree (float b, float c, float *x, int *x_st) {
  if (b != 0.0)
    *x = -(c/b);
  else
    *x_st = (c != 0.0);
}
process FirstDegree = ( ? real b, c; ! boolean x_st; real x; )
  safe spec
    (| b --> c              %precedence refinement%
     | {b, c} --> {x, x_st}
     | b ^= c
     | x ^= when (b/=0.0)
     | x_st ^= when (b=0.0)
     |)
  external "C";
Example: Legacy code and SIGNAL abstraction for FirstDegree
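To make the distinction concrete: since FirstDegree is declared safe, the step code generated for it is stateless and can be freely replicated, whereas a process that is merely deterministic carries a state record (one variable per delay, such as zx := x$1 in Fig. 1.5) that must be replicated along with the code. The following C sketch is purely illustrative, not Polychrony output:

/* Safe process: stateless step, so calling code can be replicated freely. */
void FirstDegree_step(float b, float c, float *x, int *x_st) {
  if (b != 0.0f) *x = -(c / b);
  else *x_st = (c != 0.0f);
}

/* Deterministic process: the step reads and updates a state record;
   replicating the calling code requires replicating the state too.   */
typedef struct { float zx; } delay_state;   /* hypothetical: zx holds x$1 */

float delay_step(delay_state *s, float x) {
  float zx = s->zx;   /* current value of the delayed signal */
  s->zx = x;          /* state update for the next step      */
  return zx;
}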
Syntactic sugar is available in SIGNAL to handle usual functions easily. For instance, the arithmetic C function abs used in rac can be declared as function abs = (? real x; ! real s;); the keyword function means that the process is safe, that its input and output signals are synchronous, and that every input precedes every output.
1.4.7 Clustering

The DCG (Sect. 1.4.5), with its CH (Sect. 1.4.3.1) and its CPG (Sect. 1.4.4), is the basis of various techniques applied to structuring the generated code while preserving the semantics of the SIGNAL process, thanks to formal properties of the SIGNAL
language (Sect. 1.2.6). In this section we give a brief description of several important techniques, which are illustrated in Sect. 1.5.

1.4.7.1 Data Flow Execution
The pure data flow code generation provides maximal parallel execution. If we consider the FirstDegree example, we can see that some elementary equations are not pure flow functions, so the result of their pure data flow execution is not deterministic. This is the case for c1 := c when (b/=0.0): if an occurrence of c has arrived but b has not, b may be wrongly assumed absent. A simple solution to this problem is to consider only endochronous nodes as candidates for data flow execution. One can obtain this structure by duplicating the synchronization b ^= c. The resulting code for FirstDegree, in which each line is a pure flow function, is then:

(| b1 := b when (b/=0.0)
 | (| b ^= c | c1 := c when (b/=0.0) |)
 | x := -(c1/b1)
 | (| b ^= c | x_st := (c/=0.0) when (b=0.0) |)
 |)

Example: SIGNAL code with endochronous nodes
This example illustrates a general approach that uses properties of the SIGNAL language (Sect. 1.2.6) to obtain and manage endochronous subprocesses as sources for further transformations. Building primitive endochronous nodes does not change the semantics of the initial process.

1.4.7.2 Data Clustering
Data clustering is an operation that leaves the semantics of a process unchanged. It consists in splitting processes according to various criteria, thanks to commutativity, associativity and other properties of process operators (Sect. 1.2.6). It is thus possible to isolate:

– The management of state variables (defined, directly or not, as delayed signals) in a “State variables” process (required synchronizations are associated with those variables)
– The management of local shared variables used instead of signal communications between subprocesses in “Local variables” blocks
– The computation of specific data types towards multicore heterogeneous architectures

1.4.7.3 Phylum
Atomic nodes can be defined w.r.t. the following criterion: when two nodes both depend on (are transitively preceded by) the same set of input variables, they can
both be executed only after those inputs have arrived. This criterion defines an equivalence relation whose classes we name phylums: for a set of input variables A, we name phylum of A the set of nodes that are preceded by A and only A. It results from this definition that if a phylum P1 is the phylum of A1, and P2 that of A2, then P1 precedes P2 iff A1 ⊆ A2. As an example, the FirstDegree function (Example p. 22) has four phylums: the phylum of the empty set of inputs, which is empty; the phylum of b, which contains b1 := b when (b/=0.0); the phylum of c, which is empty; and finally the phylum of {b,c}, which contains the three remaining nodes. Due to its properties, an endochronous process can be split into a network of endochronous phylums that can be executed atomically (as function calls, for instance).
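Restating the definition in symbols (the notation is ours): writing In(n) for the set of input variables that transitively precede a node n,

\mathrm{phylum}(A) \;=\; \{\, n \mid \mathrm{In}(n) = A \,\},
\qquad
\mathrm{phylum}(A_1) \preceq \mathrm{phylum}(A_2) \iff A_1 \subseteq A_2.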
1.4.7.4 From Cluster to Scheduling of Actions
Using data clustering and phylums, a SIGNAL process can be transformed into an equivalent process whose local communications are made invisible to a user context. This transformation is illustrated with the FirstDegree process: two phylums are created, Phylum_B (the phylum of b) and Phylum_BC (the phylum of {b,c}), and two labels L_PH_B and L_PH_BC are created to define the “activation clock” and the precedence related to these phylums. One can notice that this SIGNAL code is close to low-level imperative code.

process FirstDegree = ( ? real b, c; ! boolean x_st; real x; )
  (| b ^= c ^= b1 ^= bb ^= L_PH_B ^= L_PH_BC
   | L_PH_B :: Phylum_B(b)
   | L_PH_BC :: (x_st, x) := Phylum_BC(c)
   | L_PH_B --> L_PH_BC
   |)
  where
    shared real b1, bb;
    process Phylum_B = ( ? real b; )
      (| bb ::= b | b1 ::= b when (b/=0.0) |)
    process Phylum_BC = ( ? real c; ! boolean x_st; real x;)
      (| c1 := c when (b/=0.0)
       | x := -(c1/b1)
       | x_st := (c/=0.0) when (bb=0.0)
       |)
      where real c1; end
  end

Example: Labels and actions in the FirstDegree SIGNAL process
A grey box is an abstraction of such a clustered process: the grey box of a process contains a specification that describes synchronization and precedence over signals and labels. The phylums synchronized by these labels are declared as abstract processes. The grey box of a process P can be imported in another process PP; the global scheduling generated for PP then includes the local scheduling of P. In the case of object-oriented code generation, this inclusion can be achieved by inheritance and redefinition of the scheduling, the methods associated with abstract processes remaining unchanged. The grey box associated with FirstDegree can be found below.
process FirstDegree = ( ? real b, c; ! boolean x_st; real x; ) %grey box%
  safe spec
    (| b ^= c ^= L_PH_B ^= L_PH_BC
     | L_PH_B :: Phylum_B(b)
     | L_PH_BC :: (x_st, x) := Phylum_BC(c)
     | b --> L_PH_B --> c --> L_PH_BC --> {x, x_st}
     | x ^= when (b/=0.0)
     | x_st ^= when (b=0.0)
     |)
  where
    process Phylum_B = ( ? real b; ) external;
    process Phylum_BC = ( ? real c; ! boolean x_st; real x;) external;
  end
  external;

Example: Grey box abstraction for the FirstDegree SIGNAL process
1.4.8 Building Containers

The construction of containers previously proposed to build endochronous processes (Sect. 1.4.3.2) is used to embed a given process in various execution contexts. The designer can, for instance, define input/output functions in some low-level language and import their description in a SIGNAL process, providing to the interfaced process an abstraction of the operating system; this is illustrated below for the FirstDegree process:

process EmbeddedFirstDegree = ( )
  (| (| L_SCAN :: (b,c) := scan() | b ^= c ^= L_SCAN |)
   | (x_st, x) := FirstDegree(b,c)
   | (| L_EA :: emitAlarm() | L_EA ^= x_st |)
   | (| L_PRINT :: print(x) | L_PRINT ^= x |)
   |)
  where
    real b, c, x; boolean x_st;
    process scan = ( ! real b, c; ) spec (| b ^= c |)
    process emitAlarm()
    process print = ( ? real x )
    process FirstDegree = ( ? real b, c; ! boolean x_st; real x; )
      %description of FirstDegree%
  end

Example: Container for the FirstDegree SIGNAL process
The statement (b,c) := scan(), synchronized to L_SCAN, delivers new values of b and c each time it is activated. The statement print(x) prints the result when it is present. The statement emitAlarm() is synchronized with x_st, so an alarm is emitted each time the equation does not have a unique solution. The construction of containers can be used to execute processes in various contexts, including resynchronization of asynchronous communications, thanks to the var operator (Sect. 1.2.5) or, more generally, to bounded FIFOs, which can be built in SIGNAL.
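For intuition, the behavior of this container corresponds roughly to the following C loop. This is only an illustrative sketch (not the code generated by POLYCHRONY), where scan, print and emitAlarm stand for the designer-supplied low-level I/O functions assumed above:

/* Designer-supplied OS-level I/O, cf. the container above (assumed). */
extern void scan(float *b, float *c);
extern void print(float x);
extern void emitAlarm(void);
extern void FirstDegree(float b, float c, float *x, int *x_st);

void embedded_first_degree(void) {
  float b, c, x;
  int x_st;
  for (;;) {
    scan(&b, &c);                  /* L_SCAN: one synchronous read of b and c */
    FirstDegree(b, c, &x, &x_st);  /* x present iff b != 0; x_st iff b == 0   */
    if (b != 0.0f) print(x);       /* L_PRINT is synchronized with x          */
    else           emitAlarm();    /* L_EA is synchronized with x_st          */
  }
}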
1.5 Code Generation in the POLYCHRONY Toolset

Section 1.4 introduced the compilation of a SIGNAL process as a set of transformations. In this section, we describe how code can be generated, as the final step of such a sequence of transformations, following different schemes. When a process P is composed of interconnected endochronous subprocesses, free of clock constraints, it is a pure flow function (a KPN). One could then generate code for P following the Kahn semantics; nevertheless, the execution of the generated code may deadlock. If this KPN process is also acyclic, then deadlock cannot occur and the code generation functionalities of POLYCHRONY can be applied. Code is generated for different target languages (C, C++, Java) on different architectures, preserving various semantic properties (at least the defined pure flow function). However, it is possible to produce reactive or defensive code when the graph is acyclic but there are remaining clock constraints. In these modes, all input configurations are accepted: for reactive code, inputs that satisfy the constraints are selected; for defensive code, alarms are emitted when a constraint is violated during simulation.
1.5.1 Code Generation Principle

Code generation is based on the formal transformations presented in the previous sections. It is strongly guided by the clock hierarchy resulting from the clock calculus, to structure the target language program, and by the conditioned precedence graph, not only to locally order elementary operations in sequences, but also to schedule component activations in a hierarchical target code. Code generation follows more or less the structure presented in Fig. 1.5.

Fig. 1.5 Code generation general scheme

The “step block” contains a step scheduler that drives the execution of the step component and updates the state variables (corresponding to delays). The step component may be hierarchically decomposed as a set of sub-components (clusters), scheduled by the step scheduler,
and each sub-component has, in the same way, its own local step scheduler. The step block communicates with its environment through the IO container and is controlled by a main program. Target language code is generated in different files. For example, for C code generation we have, for a process P, a main program P_main.c, a program body P_body.c (that contains the step block) and an input-output module P_io.c (the IO container). The main program calls the initialization function defined in the program body, then keeps calling the step function. The IO container defines the input and output communications of the program with the operating system. Each component of the target code (except the main program, which is an explicit loop) may be seen as a SIGNAL process. Every such component, generated in C for instance, may be abstracted in SIGNAL for reuse in an embedding SIGNAL process. When the target language is object oriented, a class is generated for each component. This class can be specialized to fit new synchronizations resulting from embedding the original process in a new context process.
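As a rough sketch, the interfaces of the three generated files fit together as follows; the prototypes are illustrative, and the exact signatures in POLYCHRONY's output may differ:

/* P_io.c, the IO container */
void P_OpenIO(void);     /* open all input/output streams */
void P_CloseIO(void);    /* close them */
/* plus one r_P_<in> / w_P_<out> function per input/output signal */

/* P_body.c, the step block */
int P_initialize(void);  /* initialize the state variables */
int P_step(void);        /* one synchronous step; returns 0 when inputs end */

/* P_main.c, the main program: an explicit loop calling the step function */
int main(void) {
  int code;
  P_OpenIO();
  code = P_initialize();
  while (code) code = P_step();
  P_CloseIO();
  return 0;
}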
1.5.1.1 Step Function
Once the program and its interface are initialized, the step function is responsible for performing the execution steps that read data from input streams, calculate, and write results along output streams. There are many ways to implement this function starting from the clock hierarchy and conditioned precedence graph produced by the front-end of the compiler. Various code generation schemes [21, 24, 25] are implemented in the POLYCHRONY toolset. They are detailed in the subsequent sections on the solver example:

Global code generation
– Sequential (Sect. 1.5.2)
– Clustered with static scheduling (Sect. 1.5.3)
– Clustered with dynamic scheduling (Sect. 1.5.4)

Modular code generation
– Monolithic (Sect. 1.5.5.1)
– Clustered (Sect. 1.5.5.2)

Distributed code generation (Sect. 1.5.6)
1.5.1.2 IO Container
If the process contains input and/or output signals (that is, if the designer did not build his or her own IO container), the communication of the generated program with the execution environment is implemented in the IO container. In the simulation code generator, each input or output signal is interfaced with the operating system by a stream connected to a file containing input data or collecting output data. The IO container
(Example p. 27) declares global functions for opening (eqSolve_OpenIO) and closing (eqSolve_CloseIO) all files, and for reading (r_eqSolve_a) and writing (w_eqSolve_x1) data along all input and output signals.

void eqSolve_OpenIO() {
  fra = fopen("Ra.dat","rt");
  if (!fra) { fprintf(stderr, "Can't open %s\n","Ra.dat"); exit(1); }
  fwx1 = fopen("Wx1.dat","wt");
  if (!fwx1) { fprintf(stderr, "Can't open %s\n","Wx1.dat"); exit(1); }
  /* ... idem for b, c, x2, x_st */
}

void eqSolve_CloseIO() { fclose(fra); ... fclose(fwx1); }

int r_eqSolve_a(float *a) { return (fscanf(fra,"%f",a) != EOF); }

void w_eqSolve_x1(float x1) {
  fprintf(fwx1,"%f ",x1); fprintf(fwx1,"\n"); fflush(fwx1);
}
/* ... idem for b, c, x2, x_st */
Example: An extract of the C code generated for the solver: the IO container
The IO container is the place where the interface of the generated code with an external visualization and simulation tool can be implemented. The POLYCHRONY toolset supports default communication functions for various operating systems and middlewares; these can be modified or replaced by the user. The r_xx_yy functions return an error status that should be removed in embedded programs.
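For instance, the file-based simulation reader for the input a could be replaced by a blocking POSIX message-queue reader. The sketch below is a user-written variant, not toolset output, and assumes a queue q_a created with mq_msgsize equal to sizeof(float):

#include <sys/types.h>
#include <mqueue.h>

extern mqd_t q_a;   /* hypothetical queue carrying the values of signal a */

int r_eqSolve_a(float *a) {
  /* Block until a value of a is available; keep the error status
     expected by the simulation main loop. */
  return mq_receive(q_a, (char *)a, sizeof *a, NULL) == (ssize_t)sizeof *a;
}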
1.5.1.3 Main Program
The main program (Example p. 27) initializes the input/output files in eqSolve_OpenIO and the state variables in eqSolve_initialize, then iterates calls to the step function eqSolve_step. In the case of simulation code, the infinite loop is stopped when the step function returns error code 0 (meaning that the input streams are empty), after which the main program closes the communications (eqSolve_CloseIO).

extern int main() {
  int code;
  eqSolve_OpenIO();                    /* input/output initializing        */
  code = eqSolve_initialize();         /* initializing the state variables */
  while (code) code = eqSolve_step();  /* the steps                        */
  eqSolve_CloseIO();                   /* input/output finalizing          */
}
Example: Generated C code of the solver: the main program
1.5.2 Sequential Code Generation

This section describes the basic, sequential, inlining code generation scheme, which directly interprets the SIGNAL process obtained after clock hierarchization. The description is illustrated by the eqSolve process. Figure 1.6 contains an extract
of the clock hierarchy resulting from the clock calculus applied to the solver.

Fig. 1.6 Clock hierarchy of the solver process

Example p. 28 shows the C code generated for this solver. The code of the step block is structured according to the clock hierarchy; the precedence relation is the source of local deviations from this structure. Let us give some details about the clock hierarchy, the interface and the state variables of this example. In the eqSolve_step function, an original SIGNAL identifier xxx has a Boolean clock named C_xxx. The master clock, which is the clock of the stable variable, ticks every time the step function is called. Inputs are read as soon as their clock is evaluated and is true (see for example r_eqSolve_a(&a), called when the signal stable is true). Outputs are sent as soon as they are evaluated (see for example w_eqSolve_x_st(x_st), called when the signal C_x_st is true).

int eqSolve_step() {
  if (stable) {
    if (!r_eqSolve_a(&a)) return 0;
    if (!r_eqSolve_b(&b)) return 0;
    if (!r_eqSolve_c(&c)) return 0;
    C_ = a == 0.0;
    C_delta = !(a == 0.0);
    if (C_delta) {
      delta = b * b - (4.0*a)*c;
      C_231 = delta < 0.0;
      C_x1_1 = delta == 0.0;
      C__250 = delta > 0.0;
      if (C_x1_1) x1_1 = -b/(2.0*a); }
    C_234 = (C_delta ? C_231 : 0);
    if (C_) {
      C_x_st_1 = b == 0.0;
      C_b1 = !(b == 0.0);
      if (C_x_st_1) x_st_1 = c != 0.0; }
    C_x_st_1_220 = (C_ ? C_x_st_1 : 0);
    C_x_st = C_x_st_1_220 || C_234;
    if (C_x_st) {
      if (C_234) x_st = 1; else x_st = x_st_1;
      w_eqSolve_x_st(x_st); }
    /* ... */
  }
  eqSolve_step_finalize();
  return 1; }

void eqSolve_step_initialize()
{ C_ = 0;
  C_delta = 0;
  C_b1 = 0;
  C_x1_1 = 0;
  C__250 = 0; }

int eqSolve_step_finalize()
{ stable = next_stable;
  C_x_st_1 = 0;
  C_231 = 0;
  eqSolve_step_initialize();
  return 1; }

static float a, b, c;
static int x_st; ...

int eqSolve_initialize()
{ stable = 1;
  S_n = 0.0e0; next_R_n = 1.0;
  XZX_162 = 0.0e0; U = 0.0e0;
  eqSolve_step_initialize();
  return 1; }
Example: Generated C code of the solver: the step block
The state variables are updated at the end of the step (eqSolve_step_finalize). One can notice the tree structure of conditional if-then-else statements, which directly translates the clock hierarchy. For instance, the computation of x1_1 is executed only if stable is true, a/=0 (C_delta is true) and delta=0 (C_x1_1 is true). One can also notice the precedence refinement presented in Sect. 1.4.4.2: to generate sequential code, it is usually necessary to add serializations. To illustrate this, consider the abstraction, reduced to the precedence relations, of the solver (Example p. 29-a). To generate the sequential code, the reading actions are ordered: a precedes b and b precedes c. This refinement is expressed in SIGNAL in the abstraction given in Example p. 29-b. Internal actions are also serialized when required. Thus the generated code is a specialization of the original process.

a - From the original process:
process eqSolve_ABSTRACT = ( ? real a, b, c;
                             ! boolean x_st; real x1, x2; )
  spec (| {a,b,c} --> {x_st,x1,x2}
        | %clocks ... %
        |)

b - Precedence refinement:
  spec (| (| a --> b | b --> c | x_st --> x2 | x2 --> x1 |)
        | {a,b,c} --> {x_st,x1,x2}
        | %clocks ... %
        |)
Example: Abstractions (reduced to precedence relations) of the solver
Note that in this code generation scheme, the scheduling and the computations are merged.
1.5.3 Clustered Code Generation with Static Scheduling

The scheme presented here uses the result of clustering in phylums to generate code. This method is particularly relevant in code generation scenarios such as modular compilation and distribution. Figure 1.7 displays the clusters obtained for the eqSolve process. Since there are three inputs a, b, c, clustering in phylums can lead to at most 2^3 = 8 clusters, plus one for state variables. Fortunately, clustering usually remains far from this worst combinatoric case: in the case of the solver, five clusters are non-empty. The clusters are subject to inter-cluster precedences that appear in the figure as solid
30
L. Besnard et al.
arrows. Serializations, represented as dotted arrows, are added for static scheduling.

Fig. 1.7 The phylums of the solver

Cluster_delays is the “State variables” component (Fig. 1.5), in charge of updating the state variables; it is preceded by all other clusters. Example p. 30 presents the code generated for this structuring of the solver into five phylums. The function eqSolve_step encodes a static scheduler for these clusters.

int eqSolve_step() {
  if (stable) {
    if (!r_eqSolve_a(&a)) return 0;
    if (!r_eqSolve_b(&b)) return 0;
    if (!r_eqSolve_c(&c)) return 0;
  }
  eqSolve_Cluster_4();
  eqSolve_Cluster_3();
  if (stable) eqSolve_Cluster_2();
  eqSolve_Cluster_1();
  if (stable)
    if (C_x_st) w_eqSolve_x_st(x_st);
  if (C_x1_2) w_eqSolve_x2(x2);
  if (C_x1) w_eqSolve_x1(x1);
  eqSolve_Cluster_delays();
  return 1;
}

Example: Generated C code of the solver: statically scheduled clusters
In contrast to the previous code generation method, which relies globally on the clock hierarchy and locally (for each equivalence class of the clock hierarchy) on the conditioned precedence graph, clustering relies globally on the conditioned precedence graph and locally (for each cluster of the graph) on the clock hierarchy.

Note. In general, any clustering is suitable with respect to some arbitrary criterion, provided that each cluster remains insensitive to communication delays or operator latencies. Monolithic clustering (one cluster for the whole process) minimizes scheduling overhead but has poor concurrency. The finest clustering (one cluster per signal) maximizes concurrency but, unfortunately, also scheduling overhead. Heuristics are used in POLYCHRONY to limit the exploration cost.
1.5.4 Clustered Code Generation with Dynamic Scheduling

Clustered code generation can be used for multi-threaded simulation by equipping it with dynamic scheduling artifacts. The code of a cluster is encapsulated in a component implemented as a task. A component is structured as outlined in Fig. 1.8a: each component Ti has a semaphore Si used to manage the precedence relation between the components. Each task Ti starts by waiting on its Mi predecessors with wait(Si) statements, and ends by signaling all its successors j = 1, ..., Ni with signal(Sij). Moreover, one component is generated for each input-output function. The step scheduler (Fig. 1.8b) is implemented by a
particular task T0: it starts execution by signaling all source tasks j = 1, ..., N0 with signal(S0j), and ends by waiting on its sink tasks with wait(S0) statements. The semaphores and tasks are created in the P_initialize function of the generated code. When simulation code is generated, a P_terminate task is also added to kill all tasks of the application.

void * P_Ti() { /* task Ti */
  while (true) {
    pK_Isem_wait(Si);
    ...
    pK_Isem_wait(Si);
    P_Cluster_i();
    pK_Isem_signal(Si1);
    ...
    pK_Isem_signal(SiNi);
  }
}
a - A step component

void * P_step_Task() { /* task T0 */
  while (true) {
    /* signal to the clusters without predecessors */
    pK_Isem_signal(S01);
    ...
    pK_Isem_signal(S0N0);
    /* wait for the signal of the clusters without successors */
    pK_Isem_wait(S0);
    ...
    pK_Isem_wait(S0);
  }
}
b - The step scheduler

Fig. 1.8 Code template of a cluster task and of the step function
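The pK_Isem_* calls above belong to POLYCHRONY's runtime kernel. For concreteness, the same template can be sketched with POSIX threads and semaphores; the predecessor and successor counts are hypothetical placeholders:

#include <pthread.h>
#include <semaphore.h>

#define M_I 2                    /* hypothetical: number of predecessors of Ti */
#define N_I 2                    /* hypothetical: number of successors of Ti   */

extern sem_t S_i;                /* semaphore owned by this cluster   */
extern sem_t *S_succ[N_I];       /* semaphores of the successor tasks */
extern void P_Cluster_i(void);   /* the cluster's computation         */

void *P_Ti(void *arg) {          /* task Ti, run as a pthread */
  (void)arg;
  for (;;) {
    for (int k = 0; k < M_I; k++)
      sem_wait(&S_i);            /* one wait per predecessor */
    P_Cluster_i();
    for (int j = 0; j < N_I; j++)
      sem_post(S_succ[j]);       /* release every successor  */
  }
  return NULL;
}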
1.5.5 Modular Code Generation

Modular compilation consists of compiling a process, exporting its model, and using it in another process. For the purpose of exporting and importing the model of a process whose code has been compiled separately, the POLYCHRONY SIGNAL compiler provides an annotation mechanism to associate a compiled process with a profile (Sect. 1.4.6). This profile is an abstraction of the original process model consisting of the synchronization and precedence relations of the input and output signals of the process. These properties may be provided by the designer (in the case of legacy C code, for instance) or computed by the compiler (in the case of a compiled SIGNAL process). The annotations also give the possibility to specify the language in which the process is compiled (C, C++, Java), since the function call conventions and variable binding may vary slightly from one language to another. Starting from a SIGNAL process, modular compilation supports the code generation strategies, with or without clusters, outlined previously. Considering the specification of the solver, we detail possible compilation scenarios in which the sub-processes FirstDegree and SecondDegree are compiled separately. The process SecondDegree has been adapted to the context of modular compilation: it does not assume that the values of a are different from 0. The clock of the signal stable is the master clock of the SecondDegree process.
1.5.5.1 Sequential Code Generation for Modular Compilation
A naive compilation method consists in associating each of the sub-processes of the solver with a monolithic step function. Example p. 32 gives the SIGNAL abstraction (black box abstraction) that is inferred, or could be user-provided, in order to use the SecondDegree process as an external function.

process SecondDegree_ABSTRACT =
  ( ? real a, b, c;
    ! boolean x_st; real x21, x2;
      boolean stable, C_x1_2, C_delta, C_x_st, C_x21; )
  spec (| (| stable --> {a,b,c} ... |)
        | (| stable ^= C_x1_2 ^= C_x21 | ... |)
        |)
  pragmas BlackBox "SecondDegree" end pragmas
  external "C";

Example: Black box abstractions of the FirstDegree and SecondDegree process models
The interface of the process SecondDegree has been modified: it is necessary to export the Boolean signals that are used to define the clocks of the output signals. The interface of SecondDegree is then endochronous. The process eqSolve_bb, Example p. 32, is the SIGNAL process of the solver in which the separately compiled processes FirstDegree (the abstraction and generated code of which can be found in Example p. 21) and SecondDegree are called.

process eqSolve_bb = {real epsilon}
  ( ? real a, b, c; ! boolean x_st; real x1, x2; )
  (| a ^= b ^= c ^= when stable
   | (x_st_1, x11) := FirstDegree_ABSTRACT(b when (a=0.0), c when (a=0.0))
   | (x_st_2, x21, x2, stable, C_x1_2, C_delta, C_x_st2, C_x21)
       := SecondDegree_ABSTRACT(a, b, c)
   | x1 := x11 default x21
   | x_st := when x_st_2 default x_st_1
   |)
  where ...

Example: Importing separately compiled processes in the specification of the solver
Unfortunately, the code from which SecondDegree_ABSTRACT is an abstraction is sequential code, and it turns out that the added precedence relations, put in the context of process eqSolve, generate a cycle that is reported by the compiler (Example p. 32): the abstraction of SecondDegree exhibits the precedence stable --> a, and the function call to SecondDegree_ABSTRACT implies that the input signal a precedes the output signal stable, hence the cycle. Code generation cannot proceed further.

process eqSolve_bb_CYC = ( )
  (| (x_st_2, x21, x2, stable, C_x1_2, C_delta, C_x_st2, C_x21)
       := SecondDegree_ABSTRACT(a,b,c)
   | stable --> a
   |)

Example: Cycle in the solver displayed by the compiler
1.5.5.2 Clustered Code Generation for Modular Compilation
To avoid introducing such spurious cycles, it is better to apply clustered code generation techniques. The profile of a clustered and separately compiled process (grey box abstraction), Example p. 33, provides more information than a black box abstraction: it makes the clusters apparent and details the clock and precedence relations between them. In turn, this profile contains sufficient information to schedule the clusters and, hence, call them in the appropriate order dictated by the calling context.

process SecondDegree_ABSTRACT =
  ( ? real aa, bb, cc;
    ! boolean x_st; real x21, x2;
      boolean stable, C_x1_2, C_delta, C_x_st, C_x21; )
  pragmas GreyBox "SecondDegree" end pragmas
  (| (| Tick := true
      | when Tick ^= stable ^= C_x1_2 ^= C_x21
      | when stable ^= aa ^= bb ^= cc ^= C_delta
      | when C_x1_2 ^= x2
      | when C_delta ^= C_x_st
      | when C_x_st ^= x_st
      | when C_x21 ^= x21 |)
   | (| lab :: (x_st,x21,x2,C_x1_2,C_x_st,C_x21) := SecondDegree_Cluster_1()
      | lab ^= when Tick |)
   | (| lab_1 :: C_delta := SecondDegree_Cluster_2(aa)
      | lab_1 ^= when Tick |)
   | (| lab_2 :: SecondDegree_Cluster_3(bb)
      | lab_2 ^= when stable |)
   | (| lab_3 :: SecondDegree_Cluster_4(cc)
      | lab_3 ^= when stable |)
   | (| lab_4 :: stable := SecondDegree_Cluster_delays()
      | lab_4 ^= when Tick |)
   | (| cc --> lab | bb --> lab | aa --> lab
      | lab_1 --> lab_3 | lab_1 --> lab_2 | lab_1 --> lab | aa --> lab_1
      | lab_2 --> lab | bb --> lab_2 | aa --> lab_2
      | lab_3 --> lab | cc --> lab_3 | aa --> lab_3
      | lab_3 --> lab_4 | lab_2 --> lab_4 | lab_1 --> lab_4 | lab --> lab_4 |)
   |)
  where %Declarations of the clusters% end;

Example: The grey box abstraction of the SecondDegree process model
The calling process schedules the generated clusters in the order best suited to its local context, as shown in Example p. 34. The step associated with FirstDegree and the clusters of the separately compiled process SecondDegree are called in the very order dictated by the scheduling constraints of the solver process, avoiding the introduction of any spurious cycle.

extern int eqSolve_gb_step() {
  if (stable) {
    if (!r_eqSolve_gb_a(&a)) return 0;
    if (!r_eqSolve_gb_b(&b)) return 0;
    if (!r_eqSolve_gb_c(&c)) return 0;
    C_ = a == 0.0;
    if (C_) {
      FirstDegree_step(&FirstDegree1,b,c,&x_st_1,&x11);
      C_x11 = !(b == 0.0);
    }
    C_x_st_1_227 = (C_ ? (b == 0.0) : 0);
  }
  SecondDegree_Cluster_2(&SecondDegree2,a,&C_delta);
  C_x11_236 = (C_ ? C_x11 : 0);
  if (stable) {
    SecondDegree_Cluster_3(&SecondDegree2,b);
    SecondDegree_Cluster_4(&SecondDegree2,c);
  }
  SecondDegree_Cluster_1(&SecondDegree2,&x_st_2,&x21,&x2,
                         &C_x1_2,&C_x_st2,&C_x21);
  ...
  if (C_x1_2) w_eqSolve_gb_x2(x2);
  if (C_x1) {
    if (C_x11_236) x1 = x11; else x1 = x21;
    w_eqSolve_gb_x1(x1);
  }
  eqSolve_gb_step_finalize();
  return 1;
}

Example: Generated C code for the first and second degree clusters steps
1.5.6 Distributed Code Generation

Distributed code generation in POLYCHRONY follows the same principles as dynamically scheduled clustered code generation (Sect. 1.5.4). The final embedded application implemented on a distributed architecture can be represented as in Fig. 1.9. While clustered code generation is automatic, distribution requires additional information provided by the user, namely:

– A block-diagram and topological description of the target architecture
– A mapping of software diagrams onto the target architecture blocks
Automated distribution consists of a global compilation script that proceeds with the following steps:

1. DCG computation for the main process structure (Sect. 1.4.5)
2. Booleanization: extension of event signals to Boolean signals according to the clock hierarchy
3. Clustering of the main process according to the given target architecture and mapping (Sect. 1.5.6.1)
4. Endochronization of each cluster (Sect. 1.4.3.2); this is done by importing/exporting Boolean signals between clusters, according to the global DCG
5. Addition of communication information (Sect. 1.5.6.2)
6. Individual compilation of all clusters
7. Generation of the IO container (Sect. 1.5.6.3) and the global main program

Fig. 1.9 Overview of a distributed code
1.5.6.1 Topological Annotations
The distribution methodology is applied in Example p. 35. The user defines the mapping of the elementary components over the target architecture with a few pragmas. The pragma RunOn specifies the processor on which a set of components has to be located. For example, the expression RunOn {e1} "1" specifies that the component labelled e1 is mapped on location 1 (e1 is declared in the statement e1 :: PROC1{} as the label of the component consisting of the subprocess PROC1).

process eqSolve = {real epsilon}
  ( ? real a, b, c; ! boolean x_st; real x1, x2; )
  pragmas
    Topology {b,a} "1"    Topology {c} "2"
    Topology {x1,x2} "2"  Topology {x_st} "1"
    Target "MPI"
    RunOn {e1} "1"  RunOn {e2} "2"
  end pragmas
  (| e1 :: PROC1{} | e2 :: PROC2{} |)
process PROC1 = ( ? real a, b, c, x11;
                    boolean x_st_1;
                  ! boolean stable;
                    real x2, x1;
                    boolean x_st; )
  (| (x_st_2, x21, x2, stable) := SecondDegree{.}(...)
   | x1 := x11 default x21
   | x_st := x_st_2 default x_st_1
   |)

process PROC2 = ( ? boolean stable;
                    real b, c;
                  ! real x11;
                    boolean x_st_1; )
  (| a ^= b ^= c ^= when stable
   | (x_st_1, x11) := FirstDegree(...)
   |) ;

Example: Functional clustering of the solver
The pragma Topology associates the input and output signals of the process with a location. For example, the pragma Topology {b,a} "1" tells that the input signals a and b must be read on location 1. The pragma Target specifies the API used to generate the code that implements communication: for instance, Target "MPI" tells that the MPI library is used for that purpose.
1.5.6.2 Communication Annotations
Example p. 36 displays the process transformation resulting from the clustering requested by the user. A few signals have been added to the interface of each component: signals and clocks produced on one part of the system and used on the other have to be communicated.
process eqSolve_EXTRACT_1 =
  ( ? boolean x_st_1;
      real x11, c, b, a;
      boolean C_b1, C_x_st_1_490,
              C_, C_x_st_1;
    ! boolean x_st;
      real x1, x2;
      boolean stable; )
  pragmas
    RunOn "1"
    Environment {c} "1"
    Environment {b} "3"
    Environment {a} "5"
    Environment {x_st} "7"
    Environment {x1} "8"
    Environment {x2} "9"
    Sending {stable} "10" "eqSolve_EXTRACT_2"
    Receiving {x_st_1} "11" "eqSolve_EXTRACT_2"
    Receiving {x11} "12" "eqSolve_EXTRACT_2"
    Receiving {C_} "13" "eqSolve_EXTRACT_2"
    Receiving {C_x_st_1} "14" "eqSolve_EXTRACT_2"
    Receiving {C_x_st_1_490} "15" "eqSolve_EXTRACT_2"
    Receiving {C_b1} "16" "eqSolve_EXTRACT_2"
  end pragmas
  ...

process eqSolve_EXTRACT_2 =
  ( ? real c, b, a;
      boolean stable;
    ! boolean x_st_1;
      real x11;
      boolean C_, C_x_st_1,
              C_x_st_1_490, C_b1; )
  pragmas
    RunOn "2"
    Environment {c} "2"
    Environment {b} "4"
    Environment {a} "6"
    Receiving {stable} "10" "eqSolve_EXTRACT_1"
    Sending {x_st_1} "11" "eqSolve_EXTRACT_1"
    Sending {x11} "12" "eqSolve_EXTRACT_1"
    Sending {C_} "13" "eqSolve_EXTRACT_1"
    Sending {C_x_st_1} "14" "eqSolve_EXTRACT_1"
    Sending {C_x_st_1_490} "15" "eqSolve_EXTRACT_1"
    Sending {C_b1} "16" "eqSolve_EXTRACT_1"
  end pragmas;

Example: The components annotated by the compiler
Required communications are automatically added using additional pragmas. The pragma Environment associates an input or output signal with the location of a communication channel. For instance, Environment {c} "1" means that signal c is communicated along channel 1. The pragma Receiving associates an input signal with a channel location and its sender process. To receive the signal x11, sent by process eqSolve_EXTRACT_2 along channel 12, the following pragma is written: Receiving {x11} "12" "eqSolve_EXTRACT_2". Similarly, the pragma Sending associates an output signal with a channel location and its receiving processes. For example, the output signal stable of process eqSolve_EXTRACT_1 is sent along channel 10 to process eqSolve_EXTRACT_2 by: Sending {stable} "10" "eqSolve_EXTRACT_2".
1.5.6.3 IO Code Generation
Multi-threaded, dynamically scheduled code generation, as described in Sect. 1.5.4, is applied to the process resulting from the transformations performed for automated distribution. The information carried by the pragmas Environment,
Receiving and Sending is used to generate the communications. The communications of the signal C_ (Example p. 36), between the sender eqSolve_EXTRACT_2 and the receiver eqSolve_EXTRACT_1, are implemented using the MPI library (Example p. 37).

(From the file eqSolve_EXTRACT_1_io.c)
int r_eqSolve_EXTRACT_1_C_(int *C_) {
  MPI_Recv(C_,                 /* name */
           1, MPI_INT,         /* type */
           eqSolve_EXTRACT_2,  /* received from */
           13,                 /* the logical tag of the receiver */
           MPI_COMM_WORLD,     /* MPI specific parameter */
           MPI_STATUS_IGNORE); /* MPI specific parameter */
  return 1;
}

(From the file eqSolve_EXTRACT_2_io.c)
void w_eqSolve_EXTRACT_2_C_(int C_) {
  MPI_Send(&C_,                /* name */
           1, MPI_INT,         /* type */
           eqSolve_EXTRACT_1,  /* sent to */
           13,                 /* the logical tag of the sender */
           MPI_COMM_WORLD);    /* MPI specific parameter */
}
Example: Communication code generated for the signal C_
1.6 Conclusion

The POLYCHRONY workbench is an integrated development environment and technology demonstrator consisting of a compiler (a set of services for, e.g., program transformations, optimizations, formal verification, abstraction, separate compilation, mapping, code generation, simulation and temporal profiling), a visual editor and a model checker. It provides a unified model-driven environment to perform embedded system design exploration by using top-down and bottom-up design methodologies formally supported by design model transformations from specification to implementation and from synchrony to asynchrony. In order to bring the synchronous multi-clock technology into the context of model-driven environments, a metamodel of SIGNAL has been defined and an Eclipse plugin for POLYCHRONY is being integrated in the open-source platforms TopCased from Airbus [26] and OpenEmbeDD [27]. The POLYCHRONY workbench is now freely distributed [6]. In parallel with the POLYCHRONY academic set of tools, an industrial implementation of the SIGNAL language, called SILDEX, was developed by the TNI company, now part of Dassault Systèmes. This commercial toolset, now called RTBUILDER, is supplied by Geensoft [28]. POLYCHRONY supports the polychronous data flow specification language SIGNAL. It is being extended by plugins to capture within the workbench specific
modules written in usual software programming languages such as SystemC or Java. It provides a formal framework:

1. To validate a design at different levels
2. To refine descriptions in a top-down approach
3. To abstract properties needed for black-box composition
4. To assemble predefined components (bottom-up with COTS)
To reach these objectives, POLYCHRONY offers services for modeling application programs and architectures starting from high-level and heterogeneous input notations and formalisms. These models are imported in POLYCHRONY using the data flow notation SIGNAL. POLYCHRONY operates on these models by performing global transformations and optimizations (hierarchization of control, desynchronization protocol synthesis, separate compilation, clustering, abstraction) in order to deploy them on mission-specific target architectures. In this chapter, we meant to illustrate the application of a general principle: that of correct-by-construction design of systems, from the early stages of the design to the code generation phases on a given architecture. This is obtained by means of formally defined transformations, based on the mathematical polychrony model of computation, that may be expressed as source-to-source program transformations. This has several beneficial consequences for practical usability. In particular, scenarios of transformations can be fully controlled by an application designer. Among possible scenarios, a designer will have at his or her disposal predefined ones allowing, for instance, simulation of the application following different options, or safe code generation. This transformation-based mechanism makes it possible to formally validate the final result of a compilation. Moreover, it provides a suitable level of consideration for traceability purposes. Source-to-source transformation of programs is also used for temporal analysis of SIGNAL processes on their implementation platform [29]. Basically, it consists of the formal transformation of a process into another SIGNAL process that corresponds to a so-called temporal interpretation of the initial process. The temporal interpretation is characterized by quantitative properties of the implementation architecture. The new process can serve as an observer of the initial one. Here, we focused more particularly on the formal context for code generation (Sect. 1.4) and on the code generation strategies available in POLYCHRONY (Sect. 1.5), by considering a SIGNAL process solving second degree equations (Sect. 1.3). While mathematically simple, this process exhibits non-trivial modes, synchronization relations and precedence relations, which we analyzed and transformed to illustrate several usage scenarios, and for which we applied the following code generation strategies:

– Sequential code generation, Sect. 1.5.2, consists of producing a single step function for a complete SIGNAL process.
– Clustered code generation with static scheduling, Sect. 1.5.3, consists of partitioning the generated code into one cluster per set of input signals. The step function is a static scheduler of these clusters.
– Clustered code generation with dynamic scheduling, Sect. 1.5.4, consists of dynamic scheduling of a set of clusters.
– Distributed code generation, Sect. 1.5.6, consists of physically partitioning a process across several locations and of installing point-to-point communications between them.
– Sequential code generation for separate compilation, Sect. 1.5.5.1, consists of associating the sequential generated code of a process with a profile describing its synthesized synchronization and precedence relations. The calling context of the process is passed to it as a parameter.
– Clustered code generation for separate compilation, Sect. 1.5.5.2, consists of associating the clustered generated code of a process with a profile describing the synchronization and precedence relations of and between its clusters. The scheduler of the process is generated in each call context.

These code generation strategies are based on formal operations, such as abstractions, that make separate compilation, code substitutability and reuse of legacy code possible. The multiplicity of these strategies demonstrates the flexibility of the compilation and code generation tools provided in POLYCHRONY. It is also an indicator of the possibility offered to software developers to create new generators using the open-source version of POLYCHRONY. Such new generators will be needed for developing new execution schemes or for adapting the current ones to enlarged contexts providing, for example, a design-by-contract methodology [30].
References

1. S.K. Shukla, J.-P. Talpin, S.A. Edwards, and R.K. Gupta. High level modeling and validation methodologies for embedded systems: bridging the productivity gap. In VLSI Design 2003, pp. 9–14.
2. M. Crane and J. Dingel. UML vs. classical vs. Rhapsody statecharts: not all models are created equal. In Software and Systems Modeling, 6(4):415–435, December 2007.
3. A. Benveniste, P. Caspi, S. Edwards, N. Halbwachs, P. Le Guernic, and R. de Simone. The synchronous languages twelve years later. In Proceedings of the IEEE, 91(1):64–83, January 2003.
4. P. Le Guernic, J.-P. Talpin, and J.-C. Le Lann. Polychrony for system design. In Journal for Circuits, Systems and Computers, 12(3):261–304, April 2003.
5. L. Besnard, T. Gautier, and Paul Le Guernic. SIGNAL V4-Inria Version: Reference manual. http://www.irisa.fr/espresso/Polychrony.
6. The Polychrony platform. http://www.irisa.fr/espresso/Polychrony.
7. M. Le Borgne, H. Marchand, E. Rutten, and M. Samaan. Formal verification of programs specified with Signal: application to a power transformer station controller. In Science of Computer Programming, 41:85–104, 2001.
8. M. Kerbœuf, D. Nowak, and J.-P. Talpin. Specification and verification of a steam-boiler with Signal-Coq. In Theorem Proving in Higher Order Logics (TPHOLs'2000). Lecture Notes in Computer Science. Springer, Berlin, 2000.
9. T. Grandpierre and Y. Sorel. From algorithm and architecture specifications to automatic generation of distributed real-time executives: a seamless flow of graphs transformations. In Formal Methods and Models for Codesign Conference, Mont-Saint-Michel, France, June 2003.
10. P. Le Guernic. SIGNAL: Description algébrique des flots de signaux. In Architecture des machines et systèmes informatiques, pp. 243–252. Hommes et Techniques, Paris, 1982.
11. P. Le Guernic and A. Benveniste. Real-time, synchronous, data-flow programming: the language SIGNAL and its mathematical semantics. Technical Report 533 (revised version: 620), INRIA, June 1986.
12. P. Le Guernic and T. Gautier. Data-flow to von Neumann: the SIGNAL approach. In J.L. Gaudiot and L. Bic, editors, Advanced Topics in Data-Flow Computing, pp. 413–438. Prentice Hall, Englewood Cliffs, NJ, 1991.
13. A. Benveniste, P. Le Guernic, and C. Jacquemot. Synchronous programming with events and relations: the SIGNAL language and its semantics. In Science of Computer Programming, 16:103–149, 1991.
14. P. Le Guernic, T. Gautier, M. Le Borgne, and C. Le Maire. Programming real-time applications with SIGNAL. In Proceedings of the IEEE, 79(9):1321–1336, September 1991.
15. A. Gamatié. Designing Embedded Systems with the SIGNAL Programming Language. Springer, Berlin, 2009.
16. G. Kahn. The semantics of a simple language for parallel programming. In J.L. Rosenfeld, editor, Information Processing 74, pp. 471–475. North-Holland, Amsterdam, 1974.
17. Arvind and K.P. Gostelow. Some Relationships Between Asynchronous Interpreters of a Dataflow Language. North-Holland, Amsterdam, 1978.
18. J.B. Dennis, J.B. Fossen, and J.P. Linderman. Data flow schemas. In A. Ershov and V.A. Nepomniaschy, editors, International Symposium on Theoretical Programming, pp. 187–216. Lecture Notes in Computer Science, vol. 5. Springer, Berlin, 1974.
19. M. Le Borgne. Dynamical systems over Galois fields: applications to DES and to the Signal language. In Lecture Notes of the Belgian-French-Netherlands Summer School on Discrete Event Systems. Spa, Belgium, June 1993.
20. T. Amagbegnon, L. Besnard, and P. Le Guernic. Arborescent Canonical Form of Boolean Expressions. INRIA report no. 2290, 1994.
21. O. Maffeïs and P. Le Guernic. Distributed implementation of SIGNAL: scheduling & graph clustering. In 3rd International School and Symposium on Formal Techniques in Real-Time and Fault-Tolerant Systems, pp. 547–566. Lecture Notes in Computer Science, vol. 863. Springer, Berlin, 1994.
22. D. Potop-Butucaru, S.A. Edwards, and G. Berry. Compiling Esterel. Springer, Berlin, 2007.
23. L. Besnard, T. Gautier, M. Moy, J.-P. Talpin, K. Johnson, and F. Maraninchi. Automatic translation of C/C++ parallel code into synchronous formalism using an SSA intermediate form. In L. O'Reilly and M. Roggenbach, editors, Ninth International Workshop on Automated Verification of Critical Systems (AVOCS'09), 2009.
24. T. Gautier and P. Le Guernic. Code generation in the SACRES project. In Towards System Safety, Proceedings of the Safety-critical Systems Symposium, SSS'99, Huntingdon, UK, Springer, 1999, pp. 127–149.
25. P. Aubry, P. Le Guernic, and S. Machard. Synchronous distribution of SIGNAL programs. In Proc. of the 29th Hawaii International Conference on System Sciences, vol. 1, pp. 656–665. IEEE Computer Society Press, Los Alamitos, CA, 1996.
26. The Topcased platform. http://www.topcased.org.
27. The OpenEmbeDD platform. http://www.openembedd.org.
28. Geensoft RT-Builder. http://www.geensoft.com/fr/article/rtbuilder.
29. A. Kountouris and P. Le Guernic. Profiling of SIGNAL programs and its application in the timing evaluation of design implementations. In Proceedings of the IEE Colloq. on HW-SW Cosynthesis for Reconfigurable Systems, pp. 6/1–6/9, Bristol, UK, February 1996. HP Labs.
30. Y. Glouche, T. Gautier, P. Le Guernic, and J.-P. Talpin. A module language for typing SIGNAL programs by contracts. In this book.
Chapter 2
Formal Modeling of Embedded Systems with Explicit Schedules and Routes

Julien Boucaron, Anthony Coadou, and Robert de Simone
A main goal of compilation is to efficiently map application programs onto execution platforms, while hiding the details of the latter from the programmer through high-level programming languages. Of course, this is only feasible inside a certain range of constructs, and the judicious design of sequential programming languages and computer architectures that match one another has been a decades-long process. Now the advent of multicore processors brings radical changes to this topic, bringing forth concurrency as a key element of efficiency, both for application design and for architecture computing power. The shift is mostly prompted by technological factors, namely the ability to cram several processors onto a single chip, and the diminishing gains of the Instruction Level Parallelism techniques used in former architectures. Still, the definition of high-level programming (and, more generally, application design) formalisms matching the new era is a largely unsolved issue. In the face of this new situation, formal models of concurrency will have to play a role. While not readily programming models, they can, first, provide intermediate representation formats that offer a bridge towards novel architectures. Second, they can act as foundational principles to devise new programming and design formalisms, especially when the matching domains have a natural concurrent representation, such as dataflow signal processing algorithms. Third, their emphasis on communication and data transport, as much as on actual computations, makes them fit to deal with communication latencies (to/from external memory or between computation units) at an early design stage. As a result, such Models of Communication and Computation (MoCCs) may hold a privileged position, at the crossing point of executable programs and analyzable models. In the current chapter, we investigate dataflow Process Networks. They seem especially relevant as a formal framework for modeling situations where execution platforms are themselves networks of processors, while applications are prominently dataflow signal processing algorithms. In addition, they form a privileged setting in which to express and study issues of static scheduling and routing patterns.

J. Boucaron, A. Coadou, and R. de Simone
INRIA Sophia Antipolis Méditerranée, 06902 Sophia Antipolis, France
e-mail: [email protected]
Outline. In the first section, we recall and position relative to one another some of the many Process Network variants introduced in the literature, and partially illustrate them on an example. Sections 2.2 and 2.3 provide more detailed formal definitions and list a number of formal properties: some classical, some recently obtained, some as our original contribution. Section 2.2 focuses on pure dataflow models, while Section 2.3 introduces conditional control switches, provided they do not introduce conflicts between individual computations. We conclude on open perspectives.
2.1 General Presentation

2.1.1 Process Networks

As their name indicates, Process Networks (PN) consist of local processing (or computation) nodes, connected together in a network by point-to-point communication links. Generally, the links are supposed to carry data flow streams. Most Process Network models are considered to be dataflow, meaning here that computations are triggered upon the arrival of enough data on incoming channels, according to specific semantics. Historically, the most famous PN models are Petri Nets [65], Karp-Miller's Parallel Program Schemata [50], and Kahn Process Networks [47]. The Ptolemy [39] environment is famous as a development framework based on a rich variety of PN models. New models are emerging due to a renewed interest in such streaming models of computation, prompted by manycore concurrent architectures and dataflow concurrent algorithms in signal processing and embedded systems. By nature, PNs allow design methodologies based on the assembly of components with ports into composite subsystems; hierarchical descriptions come naturally. When interconnect fabrics more complex than simple point-to-point channels are meant, they need to be realized as another component. Local components (computation nodes or composite subsystems) may contain states. Even though PNs usually accept a native self-timed semantics (which only states that computations are triggered whenever the channel data occupancy allows), a great deal of research in PN semantics consists in establishing when "optimal" schedules may be designed, and how they can be computed (statically, if possible). Such a schedule turns the dataflow semantics into a more traditional control-flow one, as in von Neumann architectures, while guaranteeing that each computation will be performed exactly when its data arguments are located in the proper places (here, channel queues instead of registers). More generally, we believe that Process Networks can be used as key formal models from Theoretical Computer Science, ready to explain concurrency phenomena in these days of embedded applications and manycore architectures. Also, they may support efficient analyses and
transformations, as part of a general parallelizing compilation process, involving distributed scheduling, optimized mapping, and real-time performance optimization altogether.
2.1.2 Pure Dataflow

A good starting point for dataflow PN modeling is that of Marked Graphs [32] (also called Event Graphs in the literature). They form the simplest model, the one for which the most results have been established, and in the simplest form. For instance, the questions of liveness (absence of deadlocks and livelocks), boundedness (executability with finite queue buffers), and optimal static scheduling have positive answers in this model (especially in the case of strongly connected graphs underlying the directed-communication network). Other models of computation and communication can then be considered as natural extensions of Marked Graphs with additional features. Timed Marked Graphs [68] add minimal prescribed latencies on channel transport and computation duration. Synchronous Data Flow [56] (SDF) graphs, also called Weighted Event Graphs, require that data are produced and consumed on input and output channels not individually, but in a prescribed fixed number per channel. The case of single token consumption and production is called "homogeneous" in SDF terminology, so that Marked Graphs are also called homogeneous SDF processes. Bounded Channel Marked Graphs limit the buffering capacity of channels, possibly to a value below the bound prescribed by safety criteria, so that further constraints on data traffic are imposed by congestion control. The recent theory of Latency-Insensitive Design [21] and Synchronous Elastic processes attempts to introduce Bounded Channel, Timed Marked Graphs for the modeling of Systems-on-Chip and the analysis of timing closure issues; it focuses on the modeling of the various elements involved in typical hardware components. In all cases the models can be expanded into Marked Graphs, but the exact relations between properties of the original models and those of their reflection into Marked Graphs often need to be carefully stated. We shall consider this as we go through detailed definitions of the various such Process Networks in the next section. The syntactic inclusions are depicted in Fig. 2.1.

Fig. 2.1 Syntactic expressivity of Pure DataFlow

Any circuit in a Marked Graph supports a notion of structural throughput, which is the ratio of the number of tokens involved in the circuit over its length (this is an invariant). A Marked Graph then contains critical circuits, and a critical structural throughput, which provides an upper bound on the allowable frequency of computation firings. A master result on Marked Graphs is that they can indeed be scheduled in an ultimately repetitive way which reaches the speed of this critical throughput. This result can be adapted to other extended Process Network models, and the schedules can additionally be optimized against other criteria. We shall describe our recent contributions in that direction in further sections.
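In symbols (a standard formulation; the notation is ours, not the chapter's): for a circuit c carrying m(c) tokens along |c| arcs, and with \mathcal{C} the set of circuits of the graph,

\tau(c) \;=\; \frac{m(c)}{\lvert c \rvert},
\qquad
\tau^{*} \;=\; \min_{c \,\in\, \mathcal{C}} \tau(c),

so that no node of a strongly connected Marked Graph can fire at an average rate greater than \tau^{*}.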
2.1.3 Static Control Preserving Conflict-Freeness

The previous kinds of models could be tagged as "purely dataflow", as data (or rather their token abstractions) always follow the same paths. In this chapter, we shall consider extended models, where some of the nodes may select on which channels they consume or produce their input or output tokens (instead of producing/consuming on each channel uniformly). Nevertheless, some restrictions remain: choices should be made according to the node's current internal state, and not upon the availability of tokens on various channels. Typically, the "choice between input guards" construct of CSP [44] is forbidden, just as any synchronous preemption scheme. As a result, the extended class of Process Networks retains the "conflict freeness" property as defined in Petri Net terminology, and the various self-timed executions of a system all result in the same partially ordered trace of computations. In other words, a computation once enabled must eventually be triggered or executed, and cannot be otherwise disabled. Conflict freeness can also be closely associated with the notion of "confluence" (in the terms of R. Milner for process algebra), or with monotonicity and continuity in Kahn Process Networks. Conflict freeness implies the important property of latency-insensitivity: the input/output semantics of the system as a whole never depends upon the speed of travelling through channels; only the timings are incidentally affected. While pure dataflow models (without conditional control) are guaranteed to be conflict-free, this property is also preserved when conditional control that bears only on internal local conditions (and not on the state of connected buffer channels) is introduced. This is the case for general Kahn Process Networks, but also for so-called Boolean DataFlow [18] (BDF) graphs, which introduce two specific routing nodes for channel multiplexing and demultiplexing. We shall here name these nodes Merge and Select respectively (other authors use a variety of different names, and we only reserve the name Switch for the general feature of switching between dataflow streams according to a conditional switch pattern). This is also the case in Cyclo-Static Data Flow (CSDF) [9] graphs, where the weights allowed in SDF may now change values according to a statically pre-defined cyclic pattern; since some weights are allowed to take a null value, Merge and Select may be encoded directly in CSDF. While the switching patterns in BDF are classically supposed to be either dynamically computed or abstracted away in some deterministic condition (as for Kahn Process Networks), one can consider the case where the syntactic simplicity of BDF (only two additional Merge/Select node types) is combined with the predictive routing patterns of CSDF. This led to our main contribution in this chapter, which we call K-periodically Routed Graphs (KRGs). Because the switching patterns are now explicit, in the form of ultimately periodic binary words (with 0 and 1 referring to the two different switching positions), the global behaviors can now be estimated, analyzed and optimized. We shall focus in some depth on the algebraic and analytic properties that may be obtained from such a combination of scheduling and routing information. The syntactic inclusions are depicted in Fig. 2.2.

Fig. 2.2 Syntactic expressivity of MG, SDF, CSDF, BDF and KRG

To conclude, one global message we would like to pass to the reader is that Process Networks (at least in the conflict-free case) can indeed be seen as supporting two types of semantics: self-timed before scheduling and routing, and synchronous (possibly multirate) after scheduling and routing have been performed. The syntactic objects representing actual schedules and routing patterns (in our case, mostly infinite ultimately periodic binary words) should be seen as first-class design objects, to be used in the development process for design, analysis, and optimization altogether.
2.1.4 Example

We use the Kalman filter as a supporting example throughout this chapter. This filter estimates the state of a linear dynamic system from a set of noisy measurements. Kalman filters are used in many fields, especially for guidance, positioning and radar tracking. In our case, we consider its application to a navigation system coupling a Global Positioning System (GPS) and an Inertial Navigation System (INS) [20]. We do not actually need to understand fully how the Kalman filter works; what really matters is to understand the corresponding block diagram shown in Fig. 2.3. This block diagram shows the data dependencies between the operators used to build the Kalman filter. Boxes denote dataflow operators: for instance, the top left box is an array of multiply-accumulates (MACs), with two input flows of 2D arrays of size [n : n], and an output flow of 2D arrays of size [n : n]. The number of states (position, velocity, etc.) is denoted by n. The number of measurements (number of satellites in view) is denoted by m. Arcs denote communication channels and flow dependencies. Initial data are shown as gray filled circles.
Fig. 2.3 Block diagram of the Kalman filter
We refer the interested reader to Brown et al. [16] and Campbell [20] for details on the Kalman filter. Just notice that Hₖ is the GPS geometry matrix, δzₖ is a velocity corrected with INS data, and x̂ₖ|ₖ is the estimated current state.
2.2 Pure Dataflow MoCCs

Pure dataflow MoCCs are models where the communication topology of the system is static during execution, and data follow the same paths during each execution. There exists a partial order over the events of such a system, which leads to a deterministic concurrent behavior. First, we present the simplest pure dataflow MoCC with the synchronous paradigm. Next, we detail Latency-Insensitive Design. Then, we detail Marked Graphs and Synchronous Data Flow.
2.2.1 Synchrony

The synchronous paradigm is the de facto standard for digital hardware design:

- Time is driven by a clock.
- At each clock cycle, each synchronous module samples all its inputs and produces results on all its outputs.
- Computations and communications are instantaneous.
- Communications between concurrent modules are achieved through signals broadcast throughout the system.

Synchrony requires that designs are free of instantaneous loops (also called causality cycles): each loop shall contain at least one memory element (e.g., latch or flip-flop), such that when all memory elements are removed the resulting graph is acyclic. This implies a partial order of execution among signals at each instant. The behavior is deterministic with respect to concurrency. Synchrony is control-free: control can be encoded through if-conversion, where a control dependency is turned into a data dependency using predicates. For instance, "if (a) b = c else b = d" is transformed into b = (a ∧ c) ∨ (¬a ∧ d). From a digital hardware view, the previous code is equivalent to a 2-to-1 multiplexer with conditional input a, data inputs c and d, and data output b.

Example 1. Implementing the Kalman filter as a synchronous system is straightforward: to each box in the initial block diagram in Fig. 2.3, we assign a corresponding set of combinatorial gates; to each initial amount of data, we assign a set of memory elements (latches, flip-flops). Then, we check the absence of causality cycles: all circuits (in the graph-theoretical sense) of the design must contain at least one memory element.

In real life, computation and communication take time. When we implement a synchronous design, performance bottlenecks are caused by the combinatorial paths with the highest delays. Such paths are called critical paths; they limit the maximum achievable clock rate. Using multiple clocks with different rates can help implement a design more efficiently, given a set of constraints on performance and resources: such systems are sometimes called multiclock. A multiclock implementation introduces synchronizers for crossing clock domains; a synchronizer resamples a signal crossing a clock domain. There also exist polychronous implementations, which introduce logical clock domains. A logical clock in this context represents an activation condition for a module, given by a signal; such a signal abstracts a measure of time that can be multi-form: a distance, a fixed number of elements, etc. For instance, our block diagram has different data path widths (e.g., [n : n], [n : m], [m : m]); we can assign to each one a logical clock corresponding to its data width. Polychronous modules are connected together using logical synchronizers that enable or disable each module with respect to the availability of input data, output storage, etc. We refer the reader to Chaps. 1, 5 and 6 for further details.
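As a small illustration (a Python sketch with hypothetical names, not part of any design flow described here), the if-conversion above can be simulated as an ordinary function evaluated once per clock cycle:

```python
# Minimal sketch of if-conversion: the control dependency
# "if (a) b = c else b = d" becomes the dataflow equation
# b = (a and c) or ((not a) and d), evaluated at every clock cycle.

def mux(a: bool, c: bool, d: bool) -> bool:
    """2-to-1 multiplexer: selects c when a is true, d otherwise."""
    return (a and c) or ((not a) and d)

# One synchronous instant per iteration: inputs sampled, output produced.
for a, c, d in [(True, True, False), (False, True, False)]:
    print(a, c, d, "->", mux(a, c, d))  # a=True yields c, a=False yields d
```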
Further Reading

Synchronous languages [7] have been built on top of the synchronous and/or polychronous hypotheses. These include Esterel [66], SyncCharts [3], Quartz [70], Lustre [25], Lucid Synchrone [26, 27], and Signal [53]. They are mainly used for critical systems design (avionics, VLSI), prototyping, implementation and formal verification [10]. Tools such as SynDEx [43] have been designed to obtain efficient mapping and scheduling of synchronous applications while taking into account both computation and communication times on heterogeneous architectures. Synchrony implicitly corresponds to the synthesizable subset of hardware description languages such as (System-)Verilog, VHDL or SystemC. The logic synthesis tools [37] used daily by digital hardware designers rely on synchrony as their formal MoCC. This model allows checking the correctness of non-trivial optimizations [57, 72]. It enables checking design correctness using, for instance, formal verification tools with model-checking techniques [17, 62, 67]. Synchronous systems can contain false combinatorial cycles, as detailed in [69]. This is a very interesting topic, since such circuits are needed to obtain digital designs with smaller delay and area. The correctness of such designs is checked using tools for formal verification, based on binary decision diagrams [17, 54] or SAT solvers [38].
2.2.2 Latency-Insensitive Design

In VLSI designs, when geometry shrinks, gate delays decrease and wire delays increase, due mainly to increased resistivity, as detailed in Anceau [2]. In today's synchronous chips, signals can take several clock cycles to propagate from one corner of the die to the other. Such long global wires cause timing-closure issues, and are not compatible with the synchronous hypothesis of null signal propagation times.

Latency-Insensitive Design (LID) [21] (also known as Synchronous Elastic in the literature) is a methodology created by Carloni to cope with the timing-closure issues caused by long global wires. LID introduces a communication protocol that is able to handle any latency on communication channels. LID ensures the same behavior as the equivalent synchronous design, modulo timing shifts. LID enables modules to be designed and built separately, and then linked together with the latency-insensitive protocol. The protocol requires patient synchronous modules: the behaviour of a patient module depends only on signal values, and not on reception times; i.e., given an input sequence, a patient module always produces the same output sequence, whatever the arrival instants are. The composition of patient modules is itself a patient module, as shown in Carloni et al. [22]. This requirement is a strong assumption, not fulfilled by all synchronous modules; those that do not fulfill it need a Shell wrapper. LID is built around two building blocks:

Shells. A Shell wraps each synchronous module (called a pearl in LID jargon) to obtain a patient pearl. The Shell function is two-fold: (1) it executes the pearl as soon as all input data are present and there is enough storage to receive results in all downstream receivers (called Relay Stations); (2) it implements part of the latency-insensitive protocol: it stores and forwards downstream backpressure due to traffic congestion. Backpressure consists of stalling an upstream emitter until a downstream receiver is ready to store more data.

Relay Stations. A Relay Station is a patient module: it implements the storage of results (a part of the latency-insensitive protocol) through a backpressure mechanism. A chain of Relay Stations actually implements a distributed FIFO. The minimal buffering capacity of a Relay Station is two pieces of data, to avoid throughput slow-down, as detailed in Carloni et al. [22]. There is at most one initial datum in each Relay Station. Wires having a delay greater than one clock cycle are split into shorter ones using Relay Stations until timing closure is reached.

The core assumption in LID is that pearls can be stalled within one clock cycle. Block placements and communication latencies are not known in advance; they have to be estimated during floor-planning, and refined further during placement and routing. Different works [11, 13, 15, 21, 28, 33, 75, Harris, unpublished] have addressed how to implement LID Relay Stations and Shells using the backpressure flow-control mechanism. This solution is simple and compositional, but it suffers from a wiring overhead that may be difficult to place and route in a design.

Example 2. Implementing the Kalman filter using LID is similar to the synchronous implementation. Operators, or sets of operators (arrays of multiply-accumulates, for instance) are wrapped by Shells. Long wires are split by Relay Stations. We assume there is at least one Relay Station in each circuit of the design, for correctness purposes. The reader should notice that LID is a simple polychronous system, where there is a single physical global clock, and logical clocks generated by Shells and Relay Stations to execute each pearl.
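The following Python sketch illustrates the backpressure mechanism described above; the class name, the cycle-level API and the stop encoding are our own illustrative assumptions, not Carloni's actual hardware description:

```python
# Hedged sketch of a Relay Station: a two-slot buffer that forwards one
# datum per cycle and raises backpressure (stop) when both slots are full.

class RelayStation:
    CAPACITY = 2  # minimal capacity avoiding throughput loss [22]

    def __init__(self):
        self.slots = []  # in-order storage, oldest datum first

    def cycle(self, data_in, stop_in):
        """One clock cycle: data_in is the upstream datum (or None),
        stop_in is the downstream backpressure.
        Returns (data_out, stop_out)."""
        data_out = None
        if self.slots and not stop_in:   # downstream ready: forward
            data_out = self.slots.pop(0)
        if data_in is not None:          # accept upstream datum
            # protocol contract: upstream must respect our stop signal
            assert len(self.slots) < self.CAPACITY
            self.slots.append(data_in)
        stop_out = len(self.slots) == self.CAPACITY  # stall the emitter
        return data_out, stop_out
```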
Further Readings

The theory is described in detail in [22]. Performance analyses are discussed in [14, 15, 19, 23, 24, 29]. Formal verification of latency-insensitive systems is described in [14, 73, 74]. Some optimizations are described in [14, 15, 19, 23, 24, 29]. LID is also used in network fabrics (Networks-on-Chip), as described in [34, 41, 45]. The handling of variable latencies is described in [5, 6, 30].
2.2.3 Marked Graphs

Marked Graphs (MG) [32] are a conflict-free subset of Petri nets [65], also known as Event Graphs in the literature. MGs are widely used, for instance in discrete-event simulators, in the modeling of asynchronous designs, or in job-shop scheduling.
We briefly introduce the definition and operational behavior of MGs. We present results on deadlock freeness, static schedulability and maximum achievable throughput. Then, we briefly recall Timed Marked Graphs, and the case of Marked Graphs with bounded capacities on places; we show the impact of such capacities on the throughput of the graph. After that, we present two optimizations on MGs. The first is an algorithm on MGs with capacities that computes the capacity of each place such that the graph reaches the maximum achievable throughput of its equivalent MG without capacities. The second is a throughput-aware algorithm, called equalization, that slows down the fastest parts of the graph while ensuring the same global throughput. This algorithm identifies the best return-on-investment locations at which to alter the schedule of the system; this can be used, for instance, to postpone the execution of a task in order to minimize power hotspots.

Definition 1 (Marked Graph). A Marked Graph is a quadruplet ⟨N, P, T, M₀⟩, such that:

- N is a finite set of nodes.
- P is a finite set of places.
- T ⊆ (N × P) ∪ (P × N) is a finite set of edges between nodes and places.
- M₀ : P → ℕ is the function that assigns an initial marking (a quantity of data abstracted as tokens) to each place.¹
In a MG, nodes model processing components: they can represent simple gates or complex systems such as processors. Each place has exactly one input and one output, and acts as a point-to-point communication channel such as a FIFO. The behavior of a MG is as follows:

- When a computation node has at least one token in each input place, it can be executed. Such a node is said to be enabled, or activated.
- If we execute the node, it consumes one token in each input place and produces one token in each output place.

A MG is deterministic in case of concurrent behaviour. A MG is confluent: for any set of activated nodes, the firing of a node does not disable any other node of this set. This leads to a partial order of events. We recall some key results on Marked Graphs from the seminal paper of Commoner et al. [32], where the associated proofs can be found.

Lemma 1 (Token count). The token count of a circuit does not change by node execution.

Theorem 1 (Deadlock-freeness). A marking is deadlock-free if and only if the token count of every circuit is positive.
¹ We recall that ℕ = {0, 1, 2, …} and ℕ* = ℕ \ {0}.
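As an illustration of Theorem 1, the following Python sketch checks deadlock-freeness by enumerating circuits. It assumes the networkx library is available and uses our own encoding of each place as a token-weighted edge:

```python
# Sketch: checking Theorem 1 on a Marked Graph given as an edge list
# (src, dst, tokens), where each edge stands for a place with its marking.
import networkx as nx

def is_deadlock_free(edges):
    """True iff every directed circuit carries at least one token."""
    g = nx.DiGraph()
    for src, dst, tokens in edges:
        g.add_edge(src, dst, tokens=tokens)
    for cycle in nx.simple_cycles(g):        # Johnson's algorithm
        count = sum(g[u][v]["tokens"]
                    for u, v in zip(cycle, cycle[1:] + cycle[:1]))
        if count == 0:
            return False                      # empty circuit: deadlock
    return True
```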
Fig. 2.4 Coarse-grain abstraction of the Kalman filter as a MG
Example 3. Figure 2.4 shows a basic model of the Kalman filter as a MG. The graph has five circuits: three in the left strongly connected component, and two in the right one. They all contain one token: the system is deadlock free.
2.2.3.1 Static Schedulability and Throughput
Now, we recall useful results on the static schedulability of a MG and on how to compute its throughput. Static schedulability is an interesting property that enables computing a performance metric at compilation time. The As Soon As Possible (ASAP) firing rule is such that when a node is enabled, it is executed immediately. The ASAP firing rule is also known as the earliest firing rule in the literature. Since a MG is confluent, any firing rule generates a partial order of events compatible with the ASAP firing rule. The ASAP firing rule generates the fastest achievable throughput for a system.

Definition 2 (Rate). Given a circuit C in a MG, we denote the node count of the circuit C as L(C), and the token count of the circuit C as M₀(C). The rate of the circuit C is the ratio M₀(C)/L(C).

Notice that the circuits of a strongly connected component have side effects on each other: circuits with low rates slow down those with higher rates.

Theorem 2 (Throughput). The throughput of a strongly connected graph equals the minimum rate among its circuits. The throughput of an acyclic graph is 1.

Proof. Given in [4].

Strongly connected graphs reach a periodic steady regime after a more-or-less chaotic initialization phase. Such MGs are said to be k-periodic, and are statically schedulable as shown in [4].

Definition 3 (Critical circuit). A circuit is critical if its rate equals the graph throughput.
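Under the same illustrative edge-list encoding as the previous sketch, a direct (non-optimized) reading of Theorem 2 enumerates all circuits and takes the minimum rate; for large graphs a minimum cycle mean algorithm [36, 49] would be preferred:

```python
# Sketch of Theorem 2: throughput = min over circuits of tokens/length.
from fractions import Fraction
import networkx as nx

def throughput(edges):
    g = nx.DiGraph()
    for src, dst, tokens in edges:
        g.add_edge(src, dst, tokens=tokens)
    rates = []
    for cycle in nx.simple_cycles(g):
        pairs = list(zip(cycle, cycle[1:] + cycle[:1]))
        tokens = sum(g[u][v]["tokens"] for u, v in pairs)
        rates.append(Fraction(tokens, len(pairs)))  # tokens / node count
    return min(rates) if rates else Fraction(1)     # acyclic graph: 1
```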
2.2.3.2 Timed Marked Graphs
Timed Marked Graphs (TMG) are an extension of MGs introduced by Ramchandani [68], where time is introduced through weights on places and nodes. These weights represent abstract amounts of time needed to move a set of tokens from inputs to outputs.

Definition 4 (Timed Marked Graph). A Timed Marked Graph is a quintuplet ⟨N, P, T, M₀, L⟩, such that:

- ⟨N, P, T, M₀⟩ is a Marked Graph.
- L : N ∪ P → ℕ is a function that assigns a latency to each place and computation node.

Ramchandani showed that TMGs have the same expressivity as MGs [68]. He provides a simple transformation from a TMG to an equivalent MG. The transformation is shown in Fig. 2.5 for both a place with latency b and a node with latency c. The place of latency b is expanded as a succession of the pattern of a node followed by a place with unitary latency; this succession has the same overall latency b. The transformation for the node of latency c is similar, with a succession of the pattern of a unitary-latency place followed by a node. All theoretical results on MGs hold for TMGs.
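A rough sketch of the place-expansion half of this transformation, under an ad hoc edge-list encoding (the helper `fresh` and the choice of putting all initial tokens on the last unit place are our own illustrative assumptions):

```python
# Sketch of the TMG-to-MG place expansion of Fig. 2.5: a place of
# latency m becomes m - 1 zero-latency dummy nodes and m unit places.

def expand_place(src, dst, latency, marking, edges, fresh):
    """Append to `edges` the (src, dst, tokens) chain replacing the
    place (src -> dst); `fresh` returns a new unique node name."""
    prev = src
    for _ in range(latency - 1):
        node = fresh()                 # zero-latency dummy node
        edges.append((prev, node, 0))  # unit-latency place, no token
        prev = node
    # token placement along the chain is one of several equivalent choices
    edges.append((prev, dst, marking))
```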
2.2.3.3 Bounded Capacities on Places
Real-life systems have finite memory resources. We introduce place capacities in MGs to represent the maximal number of data items that can be stored in a place: we add to Definition 4 a function K : P → ℕ that assigns to each place the maximum number of tokens it can hold.
Fig. 2.5 Example of latency expansion: a place of latency m expands into m − 1 nodes and m places; a node of latency n expands into n + 1 nodes and n places
A MG with capacities can be transformed into an equivalent one without capacities through the introduction of complementary places [4]: a place p₁ with capacity K(p₁) from node n₁ to node n₂ is equivalent to a pair of places p₂ and p̄₂, where p₂ is from n₁ to n₂, and p̄₂ is from n₂ to n₁. We write P̄ for the set of complementary places of P, and T̄ for their corresponding arcs.

Definition 5 (Complemented graph). Given G = ⟨N, P, T, M₀, K⟩ a connected MG with finite capacities, we build the complemented graph G′ = ⟨N′, P′, T′, M₀′⟩ as follows:

- N′ = N;
- P′ = P ∪ P̄;
- T′ = T ∪ T̄;
- ∀p ∈ P, M₀′(p) = M₀(p), and M₀′(p̄) = K(p) − M₀(p).

The hint for the proof of correctness of this transformation is as follows: when we introduce a complementary place, we introduce a new circuit holding the place and its complementary one. This new circuit has a token count equal to the capacity of the original place. Since the token count is invariant by node execution, both places together can never hold more tokens than the capacity of the original place.

In a regular MG, a node can produce data when it has one token on all inputs; it stalls only when an input is missing. In a MG with finite capacities, nodes wait on inputs and also on outputs. This means that a finite-capacity graph can lower the maximum achievable throughput of the topologically equivalent regular graph.

Example 4. Initially, in Fig. 2.6, the graph has two circuits: the left circuit has a rate of 5/5 = 1, and the right one a rate of 4/5. The capacity of each place equals two. After application of the transformation, the transformed graph without capacities has more circuits, due to the complementary places. In particular, there is
Fig. 2.6 A strongly connected TMG with unitary latencies whose throughput is limited to 3/4 due to 2-bounded places. Plain places belong to the graph with finite capacities; we only show the most significant complementary places with dotted lines
a new circuit whose throughput equals 3/4. This circuit slows down the system throughput, due to the lack of capacity on places. Notice that the property of acyclicity loses its soundness when considering capacity-bounded graphs: introducing complementary places creates circuits, so an acyclic graph can have a throughput lower than 1. The throughput of a connected graph with finite buffering resources equals the minimum rate among the circuits of its complemented equivalent.

Corollary 1 (Throughput w.r.t. capacities). Let G = ⟨N, P, T, M₀, K⟩ be a connected MG with finite capacities, and G′ = ⟨N′, P′, T′, M₀′⟩ its complemented graph. The maximum reachable throughput Θ(G) of G is

Θ(G) = min_{C ∈ G′} min( M₀′(C) / N′(C), 1 ),   (2.1)

where N′(C) denotes the node count of the circuit C.
Proof. Given in [4].
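The transformation of Definition 5 is mechanical; the following sketch (same illustrative edge-list encoding as before) produces the complemented graph, whose circuits can then be fed to the throughput computation of Corollary 1:

```python
# Sketch of Definition 5: each place (src, dst, m0, cap) yields a
# complementary reverse place holding the unused capacity cap - m0.

def complement(places):
    """places: list of (src, dst, m0, capacity). Returns the edge list
    (src, dst, tokens) of the complemented, capacity-free graph."""
    edges = []
    for src, dst, m0, cap in places:
        edges.append((src, dst, m0))        # original place
        edges.append((dst, src, cap - m0))  # complementary place
    return edges

# The result can be passed to throughput() above to evaluate (2.1).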
2.2.3.4 Place Sizing
Previously, we have seen how to compute the throughput of a MG and the throughput of a MG with bounded capacities. The first one depends only on communication and computation latencies, while the second one can be lower due to capacity bounds. Now, we address the issue of sizing places so as to reach the maximal throughput of the graph. Our goal is to minimize the overall sum of place capacities in the MG. We state it as the following Integer Linear Programming (ILP) problem (a similar ILP formulation for the same problem is given in Bufistov et al. [19]):

minimize Σ_{p ∈ P} k(p),   (2.2)

where P is the set of places and k(p) is the additional capacity granted to place p. It is subject to the following set of constraints in the complemented graph: for each circuit C that contains at least one complementary place,

(Σ_C tokens + Σ_C k) / (Σ_C latencies) − (Σ_{C_critic} tokens) / (Σ_{C_critic} latencies) ≥ 0,   (2.3)

where the left part of the constraint is the throughput of the circuit C under consideration, and the right part is the maximum achievable throughput of the graph without capacities. This program states that we want to minimize the global capacity count: we add additional capacities to places, through the Σk terms in the markings of the complementary places found in each circuit C, until we reach the maximal throughput of the graph. We assume k ≥ 0.
Now, we provide the algorithm that computes the minimum capacity of each place in order to reach the maximum throughput of the system:

1. Compute the maximum throughput of the non-complemented graph using, for instance, the Bellman–Ford algorithm, or any minimum cycle mean algorithm [36, 49].
2. Build the complemented graph using the transformation of Definition 5.
3. Enumerate all directed circuits having at least one complementary place in the complemented graph. We can use Johnson's algorithm [46] with a modification: when a circuit is found, we check that it contains at least one complementary place.
4. Build and solve the ILP formulation above.

As the reader understands, this optimization does not come for free: additional resources are needed to reach the maximum throughput of the system.
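A hedged sketch of step 4, using the PuLP library (assumed available); the circuit enumeration of step 3 is taken as input, and each constraint below is the linearized form of (2.3):

```python
# ILP sketch of the place-sizing step (Sect. 2.2.3.4) with PuLP.
import pulp

def size_places(places, circuits, theta):
    """places: ids of the original places; circuits: list of
    (tokens, latency, complementary_places_in_circuit) for the
    circuits found at step 3; theta: target throughput (a number)."""
    prob = pulp.LpProblem("place_sizing", pulp.LpMinimize)
    k = pulp.LpVariable.dicts("k", places, lowBound=0, cat="Integer")
    prob += pulp.lpSum(k[p] for p in places)         # objective (2.2)
    for tokens, latency, comps in circuits:          # constraint (2.3),
        # linearized: tokens + sum(k) >= theta * latency
        prob += tokens + pulp.lpSum(k[p] for p in comps) \
                >= float(theta) * latency
    prob.solve()
    return {p: int(k[p].value()) for p in places}
```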
2.2.3.5 Equalization
Now, we recall an algorithm called equalization [14]. Equalization slows down the fastest circuits in the graph as much as possible, while maintaining the same global system throughput. This transformation gives hints on potential slack. Such slack can be used to add more pipeline stages while ensuring the same performance of the whole system. It can also be used to postpone an execution, for instance to smooth dynamic power and flatten temperature hot-spots. We provide a revised equalization that minimizes the amount of additional latency, in order to minimize resource overhead. An additional latency here is a dummy node followed by a single place, attached to an existing place. The problem is stated as the following Integer Linear Program:

maximize Σ_{p ∈ P} weight(p) · a(p),   (2.4)

where a(p) is the additional latency assigned to place p, and weight(p) is the number of occurrences of place p in all circuits of the graph. We have the following constraint for each non-critical circuit C:

(Σ_C tokens) / (Σ_C latencies + Σ_C a) − (Σ_{C_critic} tokens) / (Σ_{C_critic} latencies) ≥ 0,   (2.5)

where the left part of the constraint is the rate of the fast circuit that we slow down with an amount of additional latencies a, and the right part is the critical throughput of the MG. We also ensure that a(p) ≥ 0. The weights in the objective force the choice of places shared by several circuits, so as to minimize the total additional latency.
The algorithm is as follows:

1. Compute the maximum throughput of the MG using the Bellman–Ford algorithm, or any minimum cycle mean algorithm.
2. Enumerate all directed circuits, using for instance Johnson's algorithm.
3. Build and solve the ILP formulation above.
Links with Other MoCCs

A MG is a self-timed system; there is no global clock as in synchrony: each component has its own clock. A MG is a polychronous system where each component clock is driven by the presence of tokens on its inputs and by its firing rule. Synchrony is a special case of MG running under an ASAP firing rule. The transformation from synchrony to MG is as follows: we assign to each node in the MG a directed acyclic graph of combinatorial gates; we assign to each place a corresponding memory element (flip-flop, latch) and we put an initial token on it. The obtained MG behaves as a synchronous module: at each instant, all nodes are executed as in the synchronous model. LID is, in a way, a special case of MG with bounded capacities. A Shell behaves as a node does: both require data on all their inputs and enough storage on their outputs to be executed. Relay Stations carry and store data in order, as places do. But not all implementations of LID behave like a MG, especially when optimizations are made on the Shell, whose buffering capacity can be used to extend the capacity of the input Relay Stations: this can avoid further slow-down of the system.
2.2.4 Synchronous Data Flow

Synchronous Data Flow (SDF), also known as Weighted Event Graphs in the literature, was introduced by Lee and Messerschmitt [55, 56]. SDF is a generalization of the previous Marked Graphs, where a strictly positive weight is attached to each input and each output of a node. Such a weight represents how many tokens are consumed or produced, respectively, when the node is executed. SDF is a widely used MoCC, especially in signal processing. It describes performance-critical parts of a system using processes (actors) with different constant data rates. SDF is confluent and deterministic with respect to concurrency. The weights in SDF raise the problem of deciding whether places are bounded during execution: if not, there is an infinite accumulation of tokens. Lee and Messerschmitt [56] provide an algorithm to decide if an SDF graph is bounded, through balance equations. The idea is to check whether there exists a static schedule where token production equals token consumption.
Balance equations are of the form:

weight(n) · executions(n) − weight(n′) · executions(n′) = 0,   (2.6)

where n and n′ are, respectively, the producer and the consumer of the considered place, and the executions are unknown variables. To solve this system of equations, we can use an Integer Linear Programming solver or, better yet, a Diophantine equation solver [64]. If there is no solution, then there exists an infinite accumulation of tokens in the SDF graph. Otherwise, there exists an infinite number of solutions and the solver gives us one of them. This solution holds the number of executions of each node during the period of the schedule. After the boundedness check, we can perform a symbolic simulation to compute a static schedule using the previous solution; we can then check for deadlock-freedom and compute place capacities.

Example 5. Figure 2.7 represents an SDF graph corresponding to the Kalman filter (Fig. 2.3), where we consider n = 8 states and m = 5 measurements. Solving the balance equations gives the trivial solution of one execution for each node during the period.
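The following Python sketch (our own encoding; it assumes a connected graph) solves the balance equations by propagating rational firing ratios along the edges and scaling them to the smallest integer repetition vector:

```python
# Sketch: solving SDF balance equations. Each edge is
# (producer, prod_weight, consumer, cons_weight).
from fractions import Fraction
from math import lcm

def repetition_vector(nodes, edges):
    rate = {nodes[0]: Fraction(1)}
    changed = True
    while changed:                        # propagate r(n)*w = r(m)*w2
        changed = False
        for n, w, m, w2 in edges:
            if n in rate and m not in rate:
                rate[m] = rate[n] * w / w2; changed = True
            elif m in rate and n not in rate:
                rate[n] = rate[m] * w2 / w; changed = True
    for n, w, m, w2 in edges:             # consistency check
        if rate[n] * w != rate[m] * w2:
            return None                   # unbounded token accumulation
    scale = lcm(*(r.denominator for r in rate.values()))
    return {n: int(r * scale) for n, r in rate.items()}
```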
Further Readings

Several scheduling techniques have been developed to optimize different metrics such as throughput, schedule size, or buffer requirements [8, 42]. Some results on deadlock-freeness can be found in Marchetti et al. [60, 61].
Fig. 2.7 SDF graph of the Kalman filter (m = 5, n = 8)
Fig. 2.8 Expressiveness of Pure DataFlow MoCCs: Synchrony ⊂ Latency-Insensitive Design ⊂ Marked Graphs ⊂ Synchronous DataFlow
Retiming is a commonly used optimization in logic synthesis [57]. In the case of SDF, retiming is used to minimize the cycle length or to maximize the throughput of the system [59, 63]. The correctness of the retiming algorithm relies on a useful transformation from a bounded SDF graph to an equivalent MG, described in Bhattacharyya et al. [8]. StreamIt [1, 48, 52, 71, 76] is a language and compiler for stream computing where SDF is the underlying MoCC. StreamIt exposes the concurrency found in stream programs and introduces some hierarchy to improve programmability. The StreamIt compiler can exploit this concurrency for efficient mapping on multicore, tiled and heterogeneous architectures.
Summary

This section provided an overview of pure dataflow MoCCs. Their relative expressiveness is depicted in Fig. 2.8. Such MoCCs have the following properties:

- Deterministic behavior with respect to concurrency.
- Decidability of buffer boundedness during execution: structural in the case of Latency-Insensitive Design and Marked Graphs; via the balance-equation algorithm in the case of Synchronous Data Flow.
- Static communication topology, which leads to a partial order of events.
- Static schedulability: the schedule of a bounded system can be computed at compilation time, together with the needed buffer sizes.
- Deadlock-freedom checks, using a structural criterion in the case of Latency-Insensitive Design and Marked Graphs, or a bounded-length symbolic simulation in the case of Synchronous Data Flow.
2.3 Statically Controlled MoCCs

Previously, we introduced pure dataflow MoCCs, where the communication topology is static during execution. Such a topology does not allow the reuse of resources to implement different functionalities.
Now, we introduce statically controlled MoCCs, where the communication topology is dynamic and allows resources to be reused. Static in this context means that the control (the routing of data, in our case) is known at compilation time. The main interest of such statically controlled MoCCs is that they allow greater expressivity while conserving all the key properties of pure dataflow MoCCs. Such MoCCs are deterministic and confluent. We can decide at compilation time whether buffers are bounded during execution. We can also check whether the system is deadlock-free. We can statically schedule a bounded system to know its throughput and the size of its buffers. Such MoCCs also generate a partial order on events; this allows us, for instance, to check formally whether a transformation applied to an instance of such a MoCC is correct. We briefly recall the MoCC Cyclo-Static Dataflow (CSDF). Then, we detail K-periodically Routed Graphs (KRGs).
2.3.1 Cyclo-Static Dataflow Graphs

Cyclo-Static DataFlow (CSDF) [9, 40] extends SDF in that the weights associated with each node can change during execution, according to a repetitive cyclic pattern. CSDF thus introduces control modes in SDF, also called phases in the literature. The behavior of a CSDF graph is as follows:

- A node is enabled when each input holds at least the amount of tokens given by the current index in the repetitive cyclic pattern associated with that input.
- When a node is executed, it consumes on each input, and produces on each output, the amount of tokens given by the current index in the cyclic pattern associated with that input or output. Finally, the index of each pattern is incremented modulo the pattern length, to prepare the next execution.

Example 6. Figure 2.9a shows how we can model a branch node in CSDF. When the system starts, every index is 0. When this node is executed, it always consumes one input token, since its input pattern is (1). But this is different for the outputs: at the first execution, it produces only one token, on the upper output, since the index of the
Fig. 2.9 Routing examples in CSDF: (a) Branching; (b) Loop
pattern points to a 1, while the lower output's index points to a 0. At the end of this first execution, we increment all the indices of the node. Now, we execute this node a second time. Nothing changes for the input, whose index still points to a 1, but it changes for the outputs: now, the upper output produces 0 tokens, and the lower one produces 1 token. We increment all the indices of the node, and we get back to the same behaviour as in the first execution. Figure 2.9b shows how to model a simple for loop: when an input token arrives on the upper left input, we start the loop, where the token loops three times in the circuit before exiting. CSDF is confluent and deterministic with respect to concurrency. It can be decided whether memories are bounded, and if so the CSDF graph is statically schedulable. Static routing can also be modeled using CSDF, although it is implicit: routing components are denoted as computation nodes.
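A minimal Python sketch of this firing rule, with our own ad hoc encoding of patterns and FIFOs:

```python
# Sketch of the CSDF firing rule: each port carries a cyclic pattern of
# weights, indexed by the number of completed firings of the node.

class CsdfNode:
    def __init__(self, in_patterns, out_patterns):
        self.in_patterns = in_patterns    # one tuple of weights per input
        self.out_patterns = out_patterns  # one tuple of weights per output
        self.phase = 0                    # number of completed firings

    def enabled(self, input_fifos):
        return all(len(f) >= p[self.phase % len(p)]
                   for f, p in zip(input_fifos, self.in_patterns))

    def fire(self, input_fifos, output_fifos):
        for f, p in zip(input_fifos, self.in_patterns):
            del f[:p[self.phase % len(p)]]             # consume tokens
        for f, p in zip(output_fifos, self.out_patterns):
            f.extend([None] * p[self.phase % len(p)])  # produce tokens
        self.phase += 1                                # next phase
```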
2.3.2 K-Periodically Routed Graphs

In CSDF, it is difficult to check the correctness of a transformation applied to the topology of a graph. The issue is that we do not know clearly whether a node performs a computation, routes data, or a combination of both. KRGs have been introduced to tackle this issue. KRGs are inspired by CSDF; the key difference is that nodes are split into two classes, as done in BDF: one for stateless routing nodes, and another for computation nodes. In a KRG, only routing nodes have cyclic repetitive patterns as in CSDF, while computation nodes behave as in a MG. In a KRG, there are two kinds of routing nodes, called Select and Merge, akin to a demultiplexer and a multiplexer. Such routing nodes have at most two outputs and two inputs, respectively; this allows the use of binary words to describe the cyclic repetitive patterns, instead of positive integers. This section is longer and more technical than the previous ones. After the definition of KRGs, we show the decidability of buffer boundedness through an abstraction to SDF. Then, we show how to check deadlock-freedom through symbolic simulation, and also through the use of a dependency graph. Such a dependency graph leads to a transformation similar to the one from an SDF graph to a MG; such a transformation is needed, for instance, to apply a legal retiming in the case of SDF. Finally, we present KRG transformations on routing nodes, and show that they are legal through the use of the on and when operators applied to routing patterns. Before providing the definition of a KRG, we recall some definitions borrowed from the n-synchrony theory [31]:
- 𝔹 = {0, 1} is the set of binary values.
- 𝔹* is the set of finite binary words; 𝔹⁺ is the set of such words except the empty word ε; 𝔹^ω is the set of infinite binary sequences.
- We write |w| for the length of the binary word w.
- We write |w|₀ and |w|₁, respectively, for the number of occurrences of 0 and 1 in w.
- We write wᵢ for the i-th letter of w; we write w_head = w₁ and w_tail such that w = w_head · w_tail.
- We write [w]ᵢ for the position of the i-th "1" in w. For instance, [0101101110]₄ = 7. As a rule, [0]₁ = +∞.
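These notations translate directly into code; the following Python sketch (words encoded as strings of '0'/'1') is only illustrative:

```python
# Sketch of the binary-word notations |w|_1, w_i and [w]_i.

def ones(w: str) -> int:            # |w|_1
    return w.count("1")

def letter(w: str, i: int) -> str:  # w_i, 1-indexed
    return w[i - 1]

def pos_of_one(w: str, i: int):     # [w]_i: position of the i-th '1'
    seen = 0
    for pos, c in enumerate(w, start=1):
        if c == "1":
            seen += 1
            if seen == i:
                return pos
    return float("inf")             # convention: no i-th '1' => +infinity

assert pos_of_one("0101101110", 4) == 7
```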
A sequence s in 𝔹^ω is said to be ultimately periodic if and only if it is of the form u·v^ω, where u ∈ 𝔹* and v ∈ 𝔹⁺. We call u the initial part and v the steady (or periodic) part. We write 𝒫ₚᵏ for the set of ultimately periodic binary sequences (or k-periodic sequences) with a steady part of p letters, including k occurrences of 1. We call k the periodicity and p the period. We write 𝒫 for the set of all ultimately periodic sequences.

Definition 6 (K-periodically Routed Graph). A K-periodically Routed Graph (KRG) is a quintuple ⟨N, P, T, M₀, R⟩, where:

- N is a finite set of nodes, divided into four distinct subsets:
  – N_x is the set of computation nodes.
  – N_c is the set of stateless copy nodes. Each copy node has exactly one input and at least two outputs.
  – N_s is the set of select nodes. Each select node has exactly one input and two outputs.
  – N_m is the set of merge nodes. Each merge node has exactly two inputs and one output.
- P is a finite set of places. Each place has exactly one producer and one consumer.
- T ⊆ (N × P) ∪ (P × N) is a finite set of edges between nodes and places.
- M₀ : P → ℕ is a function assigning an initial marking to each place.
- R : N_s ∪ N_m → 𝒫 is a function assigning a routing sequence to each select and merge node.
Places and edges model point-to-point links between nodes. A KRG is conflict-free. Given a place p, we denote by •p and p• its input and output node, respectively. Similarly, for a node n, we denote by •n and n• its sets of input and output places. The behavior of a KRG is as follows:

- A computation node has the same behavior as in a MG: when a token is present on each input place, the node is enabled. An enabled node can be executed; when executed, it consumes one token on each input place and produces one token on each output place.
- A copy node is a special case of stateless computation node, where the tokens of the input place are duplicated on each output place.
Fig. 2.10 Correcting INS velocity using GPS position: (a) KRG of the system; (b) SDF abstraction of the KRG
- A select node splits a token flow in two. It is enabled when there is a token on its unique input place; when executed, it consumes the input token and produces a token on one of its output places. The output is selected by the current index of the routing pattern: if the index points to a 1 (resp. a 0), the token is produced on the output labeled 1 (resp. 0). At the end of the execution, the index of the routing pattern is incremented modulo the length of the steady part of the routing pattern. See Fig. 2.9 and Example 6 for a similar select node in CSDF.
- A merge node interleaves two token flows into one. It is enabled when a token is available on the input place pointed to by the index of the routing pattern. When executed, the input token is consumed and routed to the output place. At the end of the execution, the index is incremented modulo the length of the steady part of the routing pattern.

Example 7. We come back to our Kalman filter example. Now we integrate it into a navigation system that computes the velocity of a vehicle, as shown in Fig. 2.10a. Inertial Navigation Systems suffer from integration drift: accumulating small errors in the measurement of accelerations induces large errors over time. The INS velocity is corrected at a low rate (1 Hz) with integrated error states based on GPS data at a higher rate (10 Hz). Integrating ten error states is performed with a for loop: at the beginning of each period, the merge node consumes a token on its right input, when the result is reset to 0. Then, data loop through the left input of the merge node for the next nine iterations. Conversely, the select node implements the break condition: the nine first tokens are routed to its left output, while the tenth exits the loop. The SDF abstraction of this KRG is given in Fig. 2.10b.
2.3.2.1 Buffer Boundedness Check
As in CSDF, buffer boundedness is decided through an SDF abstraction: we solve the balance equations of the abstracted SDF graph in order to compute node firing rates and buffer bounds.
Given a KRG, the corresponding SDF graph is constructed as follows:

- The communication topology of the graph is kept identical for places and edges.
- Computation and copy nodes are abstracted as SDF nodes with a weight of 1 on all inputs and all outputs.
- Select and merge nodes are also abstracted as SDF nodes, where the weight of each labeled input or output is the number of occurrences of its label in the steady part of the routing pattern. For an unlabeled input or output, the associated weight is the length of the steady part of the routing pattern.

Figure 2.10 shows the result of applying this abstraction. Solving the balance equations to show that the KRG is bounded is left as an exercise.
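For a select node, for instance, this abstraction can be sketched as follows (illustrative Python; the steady part is encoded as a string):

```python
# Sketch of the SDF abstraction of a KRG select node: a labeled output
# gets the count of its label in the steady part of the routing
# sequence; the unlabeled input gets the length of the steady part.

def abstract_select(steady: str):
    """Returns the SDF weights (input, output_0, output_1) of a select
    node whose routing sequence has the given steady part."""
    return len(steady), steady.count("0"), steady.count("1")

# The select of Fig. 2.10 (steady part 0^9.1): 10 tokens per period,
# 9 routed to the output labeled 0, 1 to the output labeled 1.
assert abstract_select("0000000001") == (10, 9, 1)
```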
2.3.2.2 Deadlock-Freedom Check
Given a bounded KRG, deadlock-freedom only depends on its initial marking. We can use a bounded-length symbolic simulation with the following halting conditions:

- Deadlock: the simulation ends when no node is enabled.
- Periodic and live behavior: the simulation ends when we find a periodic behavior. When we get back to an already reached marking, at index x in the list of reached markings, we check for equal marking windows: [x : x+j−1] = [x+j : x+2j−1], where j is the distance from x to the new marking in the list of reached markings. If the windows are equal, then we have found a periodic behavior [x : x+j−1], with an initialization given by [0 : x−1].
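The periodicity condition can be sketched as a scan over the trace of reached markings (illustrative Python; it assumes the trace already contains two full periods):

```python
# Sketch of the periodicity halting condition: find x and j such that
# the window [x, x+j) of the marking trace repeats immediately after.

def find_period(markings):
    """markings: list of hashable markings in simulation order.
    Returns (x, j) = (start of steady part, period length), or None."""
    for i in range(1, len(markings)):
        x = markings.index(markings[i])  # first occurrence of marking i
        j = i - x
        if j > 0 and markings[x:x + j] == markings[x + j:x + 2 * j]:
            return x, j                  # initialization [0:x), period j
    return None
```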
2.3.2.3 Static Schedulability and Throughput
A KRG is confluent, due to the conflict-freedom property of places: when a node is enabled, it remains enabled until executed. We can introduce latency constraints in our model; we call the result a Timed KRG. We define a new function associated with a KRG: L : N ∪ P → ℕ is a latency function, assigning a non-negative integer to places and nodes. The latencies of copy, merge and select nodes are supposed to be null. N.B.: we assume that there is no circuit where the sum of latencies is null; otherwise we have a combinational cycle, as in synchrony. Computation and communication latencies are integer and possibly non-unitary. However, the synchronous paradigm assumes evaluations at each instant. This problem is solved by graph expansion of the KRG, in a similar way as for TMGs [68]: each node with non-null latency (resp. each place with latency higher than 1) is split using intermediate nodes with null latency and places with unit latency. A non-reentrant node can be modeled with a data dependency (or self-loop) [4], as shown in Fig. 2.5.
Fig. 2.11 Example of KRG scheduling, in bold font. We have arbitrarily set place latencies to 1 and node latencies to 0
Timed KRGs have the same expressivity as KRGs. As for MGs, we introduce an ASAP firing rule such that when any node is enabled, it is executed immediately. Since a KRG is confluent, any firing rule generates a partial order of events compatible with the ASAP firing rule. The ASAP firing rule generates the fastest achievable throughput for a given KRG.

Definition 7. The rate of a binary word w is defined as rate(w) = 1 if w = ε, and rate(w) = |w|₁ / |w| otherwise. The rate of a k-periodic sequence s = u·v^ω equals the rate of its steady part: rate(s) = rate(v).

Definition 8. The throughput of a node is the rate of the steady part of its schedule. The throughput of a KRG is the list of all its input and output node throughputs. Unlike for Marked Graphs, the throughput of a KRG may differ from the throughput of its slowest circuit (or path).

Example 8. We introduce latencies on the KRG in Fig. 2.11. We assume here that place latencies equal 1, and node latencies equal 0. The bold sequences next to each node are their respective schedules, under an ASAP firing rule and with respect to boundedness constraints.
2.3.2.4 Dependency Analysis
Dependency analysis [51] is one of the most useful tools for compilation and automatic parallelization [35]. Some of its techniques can be transposed to KRGs to address token flow dependencies. Dependency analysis in our case can provide an expansion from a bounded KRG to a MG with the same behavior. Such a transformation is useful to check whether a transformation applied to a KRG is legal or not. It can also be used to show that two instances of a KRG have the same behavior modulo timing shifts. First, we introduce some definitions on relations and ordering properties of token flows. Next, we explain data dependencies in token flows and their representation as dependency graphs. Then, we show how to build a behavior-equivalent MG.
Definition 9 (Token flow). Let E be the set of tokens passing through a node n or a place p of a KRG within a period. Sequential productions then consumptions define a total order.

[…]

… > 0 labels a skip that materializes a synchronized sequence of actions. The interpretation ⟦A⟧_{s,m,g,E} = ⟨⟨P⟩⟩_{n,h,F} of an action A (Fig. 3.13) takes as parameters the state variable s, the state m of the current section, the guard g that leads to it, and the environment E. It returns a process P, the state n and guard h of its continuation, and an updated environment F. The set of variables defined in E is written V(E). We write use_E^g(x) for the expression that returns the definition of the variable x at the guard g, and def_g(E) for storing the final values of all variables x defined in E at the guard g:

use_E^g(x) = if x ∈ V(E) then ⟨⟨E(x)⟩⟩ else ⟨⟨(x pre 0) when g⟩⟩;
def_g(E) = ∏_{x ∈ V(E)} (x = use_E^g(x)).
3 Synoptic: A DSML for Space On-board Application Software
⟦do A⟧^{r,t} = ⟨⟨(P | s sync t | r = (s = 0)) / s⟩⟩  where ⟨⟨P⟩⟩_{n,h,F} = ⟦A; end⟧_{s,0,((s pre 0)=0),∅}
⟦end⟧_{s,n,g,E} = ⟨⟨s = 0 when g | def_g(E)⟩⟩_{0,0,∅}
⟦skip⟧_{s,n,g,E} = ⟨⟨s = n+1 when g | def_g(E)⟩⟩_{n+1,((s pre 0)=n+1),∅}
⟦x!⟧_{s,n,g,E} = ⟨⟨x = 1 when g⟩⟩_{n,g,E}
⟦x = y f z⟧_{s,n,g,E} = ⟨⟨x = e⟩⟩_{n,g,E⊎{x↦e}}  where e = ⟨⟨f(use_E^g(y), use_E^g(z)) when g⟩⟩
⟦A; B⟧_{s,n,g,E} = ⟨⟨P | Q⟩⟩_{n_B,g_B,E_B}  where ⟨⟨P⟩⟩_{n_A,g_A,E_A} = ⟦A⟧_{s,n,g,E} and ⟨⟨Q⟩⟩_{n_B,g_B,E_B} = ⟦B⟧_{s,n_A,g_A,E_A}
⟦if x then A else B⟧_{s,n,g,E} = ⟨⟨P | Q⟩⟩_{n_B,(g_A default g_B),(E_A ⊎ E_B)}  where ⟨⟨P⟩⟩_{n_A,g_A,E_A} = ⟦A⟧_{s,n,(g when use_E^g(x)),E} and ⟨⟨Q⟩⟩_{n_B,g_B,E_B} = ⟦B⟧_{s,n_A,(g when not use_E^g(x)),E}

Fig. 3.13 Interpretation of timed sequential actions
Execution is started with s = 0 upon receipt of a trigger t. It is also resumed from a skip at s = n by a trigger t; hence the signal t is synchronized to the state s of the action. The signal r is used to inform the parent block (an automaton) that the execution of the action has finished (it is back to its initial state 0). An end resets s to 0, stores all variables x defined in E with an equation x = use_E^g(x), and finally stops (its returned guard is 0). A skip advances s to the next label n+1 when it receives control upon its guard, and flushes the variables defined so far. It returns a new guard (s pre 0) = n+1 to resume the actions past it. An action x! emits x when its guard is true. A sequence A; B evaluates A to the process P and passes its state n_A, guard g_A and environment E_A to B. It returns P | Q with the state, guard and environment of B. Similarly, a conditional evaluates A with the guard g when x to P, and B with g when not x to Q. It returns P | Q, but with the guard g_A default g_B. All variables x defined in both E_A and E_B are merged in the environment E_A ⊎ E_B. In Fig. 3.13, we write E ⊎ F to merge the definitions of the environments E and F. For all variables x ∈ V(E) ∪ V(F),

(E ⊎ F)(x) = E(x) if x ∈ V(E) \ V(F);  F(x) if x ∈ V(F) \ V(E);  E(x) default F(x) if x ∈ V(E) ∩ V(F).

Note that an action cannot be reset by the parent clock, because it is not synchronized to it. A sequence of emissions x!; x! yields only one event along the signal x, because both occur at the same (logical) time, as opposed to x!; skip; x!, which sends the second one during the next trigger.
1   x1 = 0;
2   if y then x2 = x1 + 1
3   else {
4     x3 = x1 - 1;
5     x = x3;
6     skip;
7     x4 = x - 1
8   };
9   x = φ(x2, x4)
10  end

x1 = 0 when (s = 0)
x2 = x1 + 1 when (s = 0) when y
x3 = x1 - 1 when (s = 0) when not y
x4 = (x pre 0) - 1 when (s = 1)
x  = x2 when (s = 0) when y
     default x3 when (s = 0) when not y
     default x4 when (s = 1)
s′ = 0 when (s = 1)
     default 0 when (s = 0) when y
     default 1 when (s = 0) when not y
s  = s′ pre 0

Fig. 3.14 Tracing the interpretation of a timed sequential program
Example Consider the simple sequential program of the introduction. Its static single assignment form is depicted in Fig. 3.14. x D 0I if y then fx D x C 1g else fx D x 1I skip I x D x 1gI end As in GCC, it uses a -node, line 9 to merge the possible values x2 and x4 of x flowing from each branch of the if . Our interpretation implements this by a default equation that merges these two values with the third, x3 , that is stored into x just before the skip line 6. The interpretation of all assignment instructions in the program follows the same pattern.2 Line 2, for instance, the value of x is x1 , which flows from line 1. Its assignment to the new definition of x, namely x2 , is conditioned by the guard y on the path from line 1 to 2. It is conditioned by the current state of the program, which needs to be 0, from line 1 to 6 and 9 (state 1 flows from line 7 to 9, overlapping on the -node). Hence the equation x2 D x1 C 1 when .s D 0/ when y.
3.3.5 Interpretation of Automata

Automata schedule the execution of operations and blocks by performing timely guarded transitions. An automaton receives control from its trigger and reset signals
x.trigger and x.reset, as specified by its parent block. When an automaton is first triggered, or when it is reset, it starts execution from its initial state, specified as initial state S. In any state S: do A, it performs the action A. From this state, it may perform an immediate transition to a new state T, written S → on x T, if the value of the current variable x is true. It may also perform a delayed transition to T, written S ⇢ on x T, that waits for the next trigger before resuming execution (in state T). If no transition condition applies, it waits for the next trigger and resumes execution in state S. States and transitions are composed as A | B. The timed execution of an automaton combines the behaviors of actions and dataflows: the execution of a delayed transition or of a stutter is controlled by an occurrence of the parent trigger signal (as for a dataflow), while an immediate transition is performed without waiting for a trigger or a reset (as for an action).

(automaton) A, B ::= state S: do A | S → on x T | S ⇢ on x T | A | B

An automaton describes a hierarchic structure consisting of actions that are executed upon entry in a state by immediate and delayed transitions. An immediate transition occurs during the period of time allocated to a trigger; hence, it does not synchronize to it. Conversely, a delayed transition occurs upon synchronization with the next occurrence of the parent trigger event. As a result, an automaton is partitioned into regions. Each region corresponds to the amount of calculation that can be performed within the period of a trigger, starting from a given initial state.
Notations

We write →_A and ⇢_A for the immediate and delayed transition relations of an automaton A. For a relation R among these, we write pred_R(S) = {T | (T, x, S) ∈ R} and succ_R(S) = {T | (S, x, T) ∈ R} for the predecessor and successor states of a state S; in particular, pred_{→A}(S) and succ_{→A}(S) (resp. pred_{⇢A}(S) and succ_{⇢A}(S)) for the immediate (resp. delayed) transitions of A. Finally, we write S̃ for the region of a state S. It is defined by the equivalence relation

∀S, T ∈ S(A), ((S, x, T) ∈ →_A) ⇔ S̃ = T̃.

For any state S of A, written S ∈ S(A), it is required that the restriction of →_A to the region S̃ is acyclic. Notice that, still, a delayed transition may take place between two states of the same region.
Interpretation

An automaton A is interpreted by a process ⟦automaton x A⟧^{r,t}, parameterized by its parent trigger and reset signals. The interpretation of A defines a local state s, synchronized to the parent trigger t. It is set to 0, the initial state, upon receipt of a reset signal r; otherwise, it takes the previous value of s′, which denotes the next state. The interpretation of all states is performed concurrently. We give all states S_i of an automaton A a unique integer label i = ⌈S_i⌉ and designate by ⌈A⌉ its number of states. S_0 is the initial state and, for each state of index i, we call A_i its action and x_ij the guard of an immediate or delayed transition from S_i to S_j.

⟦automaton x A⟧^{r,t} = ⟨⟨ t sync s | s = (0 when r) default (s′ pre 0) | ∏_{S_i ∈ S(A)} ⟦S_i⟧^s ⟩⟩ / s s′

The interpretation ⟦S_i⟧^s of all states 0 ≤ i < ⌈A⌉ of an automaton (Fig. 3.15) is implemented by a series of mutually recursive equations that define the meaning of each state S_i depending on the results obtained for its predecessors S_j in the same region. Since a region is by definition acyclic, this system of equations has a unique solution. The interpretation of state S_i starts with that of its action A_i. An action A_i defines a local state s_i synchronized to the parent state s = i of the automaton. The automaton stutters, with s′ = s, if the evaluation of the action is not finished, i.e., if it is in a local state s_i ≠ 0. Interpreting the action A_i requires the definition of a guard g_i and of an environment E_i. The guard g_i defines when A_i starts: it requires the local state to be 0, or the state S_i to receive control from a predecessor S_j in the same region (with the guard x_ji). The environment E_i is constructed by merging the environments F_j returned by the immediate predecessors S_j. Once these parameters are defined, the interpretation of A_i returns a process P_i together with an exit guard h_i and an environment F_i holding the values of all variables it defines.
∀i < ⌈A⌉, ⟦S_i⟧^s = ⟨⟨ P_i | Q_i | s_i sync when (s = i) | s′ = s_i′ ⟩⟩ / s_i,  where
  ⟨⟨P_i⟩⟩_{n,h_i,F_i} = ⟦A_i⟧_{s_i,0,g_i,E_i}
  Q_i = ∏_{(S_i,x_ij,S_j) ∈ ⇢_A} def_{h_i when (use_{F_i}(x_ij))}(F_i)
  E_i = ⊎_{S_j ∈ pred_{→A}(S_i)} F_j
  g_i = 1 when (s_i pre 0 = 0) default ⋁_{(S_j,x_ji,S_i) ∈ →_A} (use_E(x_ji))
  g_ij = h_i when (use_{F_i}(x_ij)), ∀(S_i, x_ij, S_j) ∈ ⇢_A
  s_i′ = (s when s_i ≠ 0) default ⋁_{(S_i,x_ij,S_j) ∈ ⇢_A} (j when g_ij)

Fig. 3.15 Recursive interpretation of a mode automaton
Upon evaluation of A_i, the delayed transitions from S_i are checked. This is done by the definition of a process Q_i which, first, checks whether the guard x_ij of a delayed transition from S_i evaluates to true with F_i. If so, the variables defined in F_i are stored with def_{h_i}(F_i). All delayed transitions from S_i to S_j are guarded by h_i (one must have finished evaluating state i before moving to j) and by a condition g_ij, defined by the value of the guard x_ij. The default condition is to stay in the current state s while s_i ≠ 0 (i.e., until mode i is terminated). Hence, the next state from i is defined by the equation s′ = s_i′. The next-state equation of each state is composed with the others to form the product ∏_i ⟦S_i⟧^s.

[…]

⟦data y pre v → x⟧_C = { b ∈ B|_{x,y} | C^v = C^r ∪ {min(T(b(x)))}; ∀t ∈ C^v, b(x)(t) = v; ∀t ∈ C^x \ C^v, b(x)(t) = b(y)(pred_{C^x}(t)) }
⟦data y f z → x⟧_C = { b ∈ B|_{x,y,z} | ∀t ∈ T(b(x)) = T(b(y)) = T(b(z)), b(x)(t) = f(b(y)(t), b(z)(t)) }
⟦event y → x⟧_C = { b ∈ B|_{x,y} | T(b(x)) = T(b(y)) }
⟦A | B⟧_C = ⟦A⟧_C | ⟦B⟧_C
Denotation of Actions

Given a state b, the execution c^k = ⟦A⟧_b of an action A returns a new state c and a status k whose value is 1 if a skip occurred and 0 otherwise. We write b↓x = b(x)(min T(b)) and b↑x = b(x)(max T(b)) for the first and last values of x in b. A sequence first evaluates A to c^k and then evaluates B, with store b·c, to d^l. If k is true, then a skip occurred in A, meaning that c and d belong to different instants; in this case, the concatenation of c and d is returned. If k is false, then the execution that ended in A continues in B at the same instant; hence, the variables defined in the last reaction of c must be merged with the variables defined in the first reaction of d. To this end, we write (b ⋈ c)(x) = b(x) ⊎ c(x) for all x ∈ V(b), when t = max(T(b)) = min(T(c)).
⟦do A⟧_b = b·c,  where c^k = ⟦A⟧_b;
⟦skip⟧_b = ∅¹;
⟦x!⟧_b = {(x, t, 1)}⁰;
⟦x = y f z⟧_b = {(x, t, f(b↑y, b↑z))}⁰;
⟦if x then A else B⟧_b = if b↑x then ⟦A⟧_b else ⟦B⟧_b;
⟦A; B⟧_b = if k then (c·d)^l else (c ⋈ d)^l,  where c^k = ⟦A⟧_b and d^l = ⟦B⟧_{b·c}.

Denotation of Automata

An automaton x receives control from its trigger at the clock C^t and is reset at the clock C^r. Its meaning is hence parameterized by C = (C^t, C^r). The meaning of its specification ⟦A⟧^s_b is parameterized by the finite trace b, which represents the store, and by the variable s, which represents the current state of the automaton. At the i-th step of execution, given that the automaton is in state j ((b_i)↑s = j), it produces a finite behavior b_{i+1} ∈ ⟦S_j⟧^s_{b_i}. This behavior must match the timeline of the trigger: T(b_{i+1}) ⊆ C^t. It must return to the initial state 0 if a reset occurs: ∀t ∈ C^r, b_i(s)(t) = 0.

⟦automaton x A⟧_C = { (b_0 · b_1 · b_2 ⋯) / s  |  b_0 = {(x, t_0, 0) | x ∈ V(A) ∪ {s}};  ∀i ≥ 0, b_{i+1} ∈ ⟦S_j⟧^s_{b_i}, T(b_{i+1}) ⊆ C^t;  j = if min(T(b_{i+1})) ∈ C^r then 0 else (b_i)↑s }

When an automaton is in the state S_i, its action A_i is evaluated to c given the store b; then the immediate or delayed transitions departing from S_i are evaluated to return the final state:

⟦S_i⟧^s_b =
  ⟦S_j⟧^s_c,        if (S_i, x_ij, S_j) ∈ →_A and c↑x_ij;
  c ⊎ {(s, t, j)},   if (S_i, x_ij, S_j) ∈ ⇢_A and c↑x_ij;
  c ⊎ {(s, t, i)},   otherwise;
where c / s = ⟦do A_i⟧_b and t = max T(c).
3.4 Middleware Aspect

In order to support, at runtime, the execution of code generated from Synoptic models, a middleware is embedded in the satellite. This SPaCIFY middleware not only implements common middleware services such as communication or naming, but also offers domain-specific services for satellite flight control software. There are then
two difficulties that must be addressed. The first is the scarceness of resources in a satellite. The second is the large variety³ of satellite platforms from one project to another. To take the scarceness of resources into account, this middleware is tailored to the domain and adapted to each specific project. This notion of generative middleware is inherited from the ASSERT project, which studied proof-based engineering of real-time applications, but is here specialised to the area of satellite software. ASSERT defines a so-called virtual machine, which denotes an RTOS kernel along with a middleware, and provides a language-neutral semantics of the Ravenscar profile [14]. It relies on PolyORB-HI [24], a high-integrity version of PolyORB refining the broker design pattern [15], which fosters the reuse of large chunks of code when implementing multiple middleware personalities. Satisfying the restrictions of the Ravenscar profile, PolyORB-HI can suitably be used as a runtime support for applications built with AADL code generators. In our context, the precise adaptation of the middleware to the application's needs is obtained through its generation, based on the requirements expressed in the Synoptic models (mainly through the interaction contracts attached to external variables). Some of the services of the middleware cannot be reused and must be reimplemented for each satellite. For example, the AOCS⁴ implements control laws that are specific to the expected flight and to the mission. By providing a framework, the middleware helps capitalize on the experience of domain experts, giving a general architecture for the AOCS. The models of services can belong either to the middleware or to the application. In the latter case, we use the same process as for the rest of the application, including the Synoptic language. Furthermore, as services have a well-defined interface (i.e., an API), other services or applications using the AOCS are not coupled to any particular implementation. This section starts by presenting the architecture of the SPaCIFY middleware. Then, the middleware kernel and the communication between applications generated from Synoptic models and their hosting middleware are explained. Lastly, the reconfiguration service, which was one of the main focuses of the project, is described.
3.4.1 Architecture of the Execution Platform

Figure 3.17 depicts the overall architecture of the SPaCIFY middleware. Following previous work on middleware for real-time embedded systems [8, 31], the SPaCIFY middleware has a microkernel-like, service-based architecture. This flexibility allows embedding only the services that are required, depending on the
³ This diversity is due to the high specialisation of platforms for a specific mission and the broad range of missions.
⁴ Attitude and Orbit Control System.
[Figure 3.17 depicts a layered execution platform: the synchronous islands at the top communicate through external variables; a layer for the management of external variables sits on an abstraction of the RTOS and middleware kernel; the services layer offers naming, time, persistency, reconfiguration, redundancy, PUS TM/TC, PUS event, AOCS, etc.; the middleware kernel and the RTOS kernel sit below; at the bottom lies the asynchronous environment: bus & devices, the on-board/ground link, and persistent memory.]
Fig. 3.17 Architecture of the middleware
satellite. While previous work has focused on techniques to reduce the time during which the reconfigured system is suspended, along with an analysis to bound that time, our work aims at:

- Using application models to specify reconfiguration
- Proposing a contract approach to the specification of the relation between applications and the middleware
- Providing on-demand generation of the needed subset of the middleware for a given satellite

The RTOS kernel offers the usual basic services such as task management and synchronisation primitives. The core middleware is built on top of this RTOS and provides core services of composition and communication. They are not intended to be used by application-level software. Rather, they provide means to structure the middleware itself in smaller composable entities, the services. Upon these abstractions are built general-purpose services and domain-specific services. These services are to be used (indirectly) by the application software (here the various synchronous islands). They can be organized in two categories as follows.

The first layer is composed of general-purpose services that may be found in usual middleware. Among them, the naming service implements a namespace suitable for distributed systems. The persistency service provides persistent storage for keeping data across system reboots. The redundancy service helps increase system reliability through transparent replication management. The reconfiguration service, further described in a dedicated subsection below (Sect. 3.4.3), adds flexibility to the system by allowing the software to be modified at runtime. The task and event service provides real-time dispatching of processors according to the underlying RTOS scheduler. It provides skeletons for
various kinds of tasks, including periodic, sporadic and aperiodic event-triggered tasks, possibly implementing sporadic servers or similar techniques [25, 32].

The second layer contains domain-specific services that capture the expertise in the area of satellite flight control software. These services are often built following industry standards such as PUS [17]. The TM/TC service implements the well-established telemetry/telecommand link with ground stations or other satellites. The AOCS (attitude and orbit control system) controls actuators in order to ensure the proper positioning of the satellite. As discussed earlier, services may be implemented by synchronous islands.

To support execution, the platform uses hardware resources provided by the satellite. As the hardware platform changes from one satellite to another, it must be abstracted even for the middleware services. Only the implementation of specific drivers must be done for each hardware architecture. During the software development lifecycle, the appropriate service implementations are configured and adapted to the provided hardware.

As already explained, one particularity of the SPaCIFY approach is the use of the synchronous paradigm. To support the use of the middleware services within this approach, we propose an abstraction, the external variable, as shown in Sect. 3.2.2.2. Such a variable abstracts the interaction between synchronous islands, or between a synchronous island and the middleware, relaxing the requirements of synchrony. In the model, when the software architect wants to use a middleware service, he provides a contract describing his requirements on the corresponding set of external variables, and the middleware is in charge of meeting these requirements. Clearly, this mediation layer in charge of managing the external variables is specific to each satellite. Hence the contractual approach drives the generation of the proper management code. This layer is made of backends that capture all the asynchronous concerns, such as accessing a device or running an aperiodic task, hence implementing the asynchronous communications of the GALS approach. The middleware is in charge of orchestrating the exchanges between external variables, their managing backends and the services, while ensuring that the quality-of-service constraints (such as temporal ones) specified in their contracts are respected.
3.4.2 The Middleware Kernel and External Variables

As stated above, the middleware is built around an RTOS providing tasks and synchronisation. As the RTOS cannot be fixed, due to industrial constraints, the middleware kernel must provide a common abstraction. It therefore embeds its own notion of task, and an adaptor must be provided for each specific RTOS. The implementation of such adaptors has been left for future work. Notice that the use of this common abstraction forbids the use of specific and sophisticated services offered by some RTOSes. The approach here is to adapt the services offered by the middleware to the business needs rather than to use low-level, high-performance services.
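To fix ideas, the following C sketch shows one possible shape for such a task abstraction with per-RTOS adaptors. All names and signatures here (mw_task, mw_task_ops, and so on) are hypothetical illustrations, not the actual SPaCIFY kernel API.

#include <stdint.h>

typedef struct mw_task mw_task;

/* Each supported RTOS supplies one adaptor implementing these
   operations; the rest of the middleware never calls the RTOS
   directly. */
typedef struct mw_task_ops {
  int  (*create)(mw_task *t, void (*entry)(void *), void *arg);
  int  (*set_period)(mw_task *t, uint32_t period_ms);
  int  (*set_deadline)(mw_task *t, uint32_t deadline_ms);
  void (*suspend)(mw_task *t);
  void (*resume)(mw_task *t);
} mw_task_ops;

struct mw_task {
  const mw_task_ops *ops;            /* RTOS-specific adaptor */
  uint32_t period_ms;                /* temporal features inherited */
  uint32_t deadline_ms;              /*   from the Synoptic model   */
  void (*miss_handler)(mw_task *t);  /* run when a deadline is missed */
};

The cost of such an indirection layer is that only the least common denominator of RTOS services is exposed, which is exactly the trade-off the text describes.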
The task is the unit of execution. Each task can contain Synoptic components according to the specification of the dynamic architecture of the application. Tasks have temporal features (processor provisioning, deadline, activation period) inherited from the Synoptic model. The middleware kernel is in charge of their execution and their monitoring. It is also in charge of provisioning resources for the aperiodic and sporadic tasks. Communication inside a task results from the compilation of the synchronous specification of the various components it must support. All communication outside a task must go through external variables, limiting interaction to a single abstraction. External variables are decomposed into two sub-abstractions:

- The frontend, identified in the Synoptic model, constitutes the interaction point. It appears as a usual signal in the synchronous model and may be used as input or output. The way it is provided or consumed is abstracted in a contract that specifies the requirements on the signal (such as its timing constraints). An external variable is said to be asynchronous because no clock constraint is introduced between the producer of the signal and its consumer. In the generated code, accesses to such variables are compiled into getter and setter functions implemented by the middleware (a sketch of such accessors is given after this list). The contract must include usage requirements specifying the way the signal is used by the task (either as an input or as an output). It may also embed requirements on the freshness of the value or on events notifying a value change.
- The backend, specified using stereotypes in the contract, configures the middleware behavior for non-synchronous concerns. For example, a persistency contract specifies that the external variable must be saved in persistent memory. Acquisition contracts can be used by a task to specify which data it must get before its execution. Such backends are collected, and a global acquisition plan is built and executed by the middleware.

As the middleware supports the reconfiguration of applications at runtime, tasks can be dynamically created. The dynamic modification of task features is also provided by the middleware, which has to ensure that the corresponding constraints are respected. Beware that any modification must be made on the model and validated by the SPaCIFY tools so as to introduce only viable constraints. Each task has a miss handler defined. This defensive feature makes the middleware execute corrective actions and report an error whenever the task does not respect its deadline.
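To make the frontend and its contract concrete, here is a hedged C sketch of what the generated accessors could look like. The identifiers (ext_var, ev_get, ev_set) and the contract fields are invented for illustration and are not taken from the SPaCIFY code base.

#include <stdint.h>

/* Illustrative contract attached to an external variable. */
typedef struct {
  int      is_input;      /* usage: read (1) or written (0) by the task */
  uint32_t max_age_ms;    /* freshness requirement on the value */
  int      persistent;    /* backend stereotype: save to persistent memory */
} ev_contract;

typedef struct {
  ev_contract contract;
  uint32_t    timestamp_ms;  /* when the value was last refreshed */
  int32_t     value;         /* buffered value exchanged with the island */
} ext_var;

/* Getter generated for an island reading the variable: no clock
   constraint ties the island to the producer; it simply reads the
   last value the middleware made available. */
static int32_t ev_get(const ext_var *v) { return v->value; }

/* Setter generated for an island writing the variable: the backends
   (acquisition plan, persistency, ...) later propagate the value. */
static void ev_set(ext_var *v, int32_t x, uint32_t now_ms) {
  v->value = x;
  v->timestamp_ms = now_ms;
}

A task compiled from a synchronous island would call ev_get and ev_set wherever the model uses the external variable, while the backends refresh or persist the buffered value according to the contract.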
3.4.3 Reconfiguration Service

Among the domain-specific services offered by the middleware, the reconfiguration service is the one that had the most impact on the design of the middleware. The design of the reconfiguration service embedded in the SPaCIFY middleware is driven by the specific requirements of satellite onboard software. Reconfiguration is typically controlled from ground stations via telemetry and telecommands.
[Figure 3.18 depicts the process and responsibilities for reconfigurations: a developer works offline on the onboard software model, whose compilation produces the OBSW image; an operator designs an abstract reconfiguration plan against the model and the satellite state; at the ground station, this plan is compiled, according to the OBSW image, into a concrete reconfiguration plan that is uploaded to the satellite over the ground-satellite link.]
Fig. 3.18 Process and responsibilities for reconfigurations
Human operators not only design reconfigurations, they also decide the right time at which reconfigurations should occur, for instance while the mission is idle. Due to resource shortage in satellites, the reconfiguration service must be economical with memory in its choice of embedded metadata. Last, in families of satellites, software versions tend to diverge as workarounds for hardware damage are installed. Nevertheless, some reconfigurations should still apply to a whole family despite the possible differences between the deployed software versions. In order to tackle these challenges, the reconfiguration service relies on the structural model (Synoptic models) of the OBSW, enriched with the current state of the satellite, as described in Fig. 3.18. Using the structure of the software allows abstracting the low-level implementation details when performing reconfigurations. While designing reconfigurations, operators can work on models of the software, close to those used at development time, and specify so-called abstract reconfiguration plans like: change the flow between blocks A and B into a flow between A and C. An abstract reconfiguration plan uses high-level elements and operations, which may increase the CPU consumption of reconfigurations compared to low-level patches. Typical operations such as pattern matching of components make the design of reconfigurations easier, but they are compute-intensive. Instead of embedding the implementation of those operations into the satellite middleware, patterns may be matched offline, at the ground station, thanks to the knowledge of the flying software. We therefore define a hierarchy of reconfiguration languages, ranging from high-level constructs presented to the reconfiguration designer to low-level instructions implemented in the satellite runtime. Reconfigurations are compiled into a so-called concrete reconfiguration plan before being sent to the satellite. For instance, the abstract plan upgrade A may be compiled to: stop A and B; unbind A and B; patch the implementation of A; rebind A and B; restart A and B. This compilation process uses known techniques to reduce the size of the patch sent to the satellite and the time to apply it [34]. Another interest of this compilation scheme is to enable the application of the same abstract reconfiguration plan to a whole family of satellites. While traditional patch-based approaches make it hard to apply a single reconfiguration to a family, each concrete plan may be adapted to the precise state of each family member.
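For instance, the concrete plan for upgrade A could be encoded as a flat list of low-level instructions interpreted onboard. This C sketch is purely illustrative; the opcode names and the plan format are our invention rather than the actual SPaCIFY encoding.

/* Hypothetical encoding of a concrete reconfiguration plan as a
   sequence of low-level instructions. */
typedef enum { R_STOP, R_UNBIND, R_PATCH, R_REBIND, R_RESTART } r_opcode;

typedef struct {
  r_opcode    op;
  int         block_a, block_b;  /* component ids in the OBSW image */
  const void *payload;           /* binary patch for R_PATCH, else NULL */
} r_instr;

/* "upgrade A", compiled against an image where A is bound to B: */
static const r_instr upgrade_A[] = {
  { R_STOP,    0 /* A */, 1 /* B */, 0 },
  { R_UNBIND,  0,         1,         0 },
  { R_PATCH,   0,        -1,         0 /* patch bytes elided */ },
  { R_REBIND,  0,         1,         0 },
  { R_RESTART, 0,         1,         0 },
};

Because the plan refers to components by their identity in a specific image, the same abstract plan can be recompiled against each family member's own image and state.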
Applying the model-driven approach described above to reconfigurations of satellite software raises a number of other issues, which are described in [10]. Work remains to be done to make the complete reconfiguration tool chain available, the biggest challenge being the specification of reconfiguration plans. Indeed, considering reconfiguration at the level of the structural model implies including the reconfigurability concern in the metamodel of Synoptic. If reconfigurability is instead designed as an abstract framework, independent of the modeling language, higher modularity is achieved when deciding which elements are reconfigurable. First experiments with the Fractal component model [11] allow us to claim that, by focusing on the points of interest in application models, the metadata for reconfiguration become more efficient. Indeed, separating reconfigurability from Synoptic allows metadata to be downloaded on demand, or dropped at runtime, depending on the requested reconfigurations. Lastly, applying a reconfiguration usually requires that the software reach a quiescent state [27]. Basically, the idea consists in ensuring that the pieces of code to update are neither active nor will become active during their update. For reactive and periodic components, as found in the OBSW, such states where reconfiguration can be applied may never be reached. We have therefore proposed another direction [12]: we consider that active code can be updated consistently. Doing so runs into low-level technical issues such as adjusting instruction pointers, and reshaping and relocating stack frames. Building on previous work on control operators and continuations, we have proposed to deal with these low-level difficulties using the notion of continuation and operators to manipulate continuations. This approach does not make updating easier, but it gives the opportunity to relax the constraints on update timing and allows updates that have not been anticipated.
3.5 Conclusion

Outlook

The SPaCIFY ANR exploratory project proposes a development process and associated tools for hard real-time embedded space applications. The main originality of the project is to combine model-driven engineering, formal methods and synchronous paradigms in a single homogeneous design process. The domain-specific language Synoptic has been defined in collaboration with the industrial end-users of the project, combining functional models derived from Simulink and architectural models derived from AADL. Synoptic provides several views of the system under design: software architecture, hardware architecture, dynamic architecture and mappings between them. Synoptic is especially well adapted to control and command algorithm design. The GALS paradigm adopted by the project is also a key point of the promoted approach. The Synoptic language allows modeling synchronous islands and specifying how these islands exchange asynchronous information by using the services of a dedicated middleware.
In particular, the SPaCIFY project proposed a contract approach to the specification of the relations between applications and the middleware. This middleware does not only implement common services such as communication or naming, but also offers domain-specific services for satellite flight control software. Reconfiguration is one of these domain-specific services. The SPaCIFY project studied this service with a particular focus on the use of application models to specify reconfigurations.
The Synoptic Environment

The SPaCIFY design process has been equipped with an Eclipse-based modeling workbench. To ensure the long-term availability of the tools, the Synoptic environment relies on open-source technologies: this guarantees the durability and the adaptability of the tools for space projects, which can last more than 15 years. We hope this openness will also facilitate adaptations to the requirements of other industries. The development of the Eclipse-based modeling workbench started with the definition of the Ecore meta-model [2] of the Synoptic language. The definition of this meta-model relied on the experience gained during the GeneAuto project and is the result of a collaborative and iterative process. In a first step, a concrete syntax relying on the meta-model was defined using academic tools such as TCS (Textual Concrete Syntax) [26]. This textual syntax was used to validate the usability of the language through a pilot case study. These models helped to improve the Synoptic language and to adapt it to industrial know-how. Once the language was stabilized, a graphical user editor was designed. A set of structural and typing constraints was formalized, encoded in OCL (Object Constraint Language), and integrated into the environment. In parallel with these activities, the semantics of the language was defined. The formal semantics of the Synoptic language relies on the polychronous paradigm. This semantics was used to define the transformation of Synoptic models into SME models, following an MDE approach. This transformation makes it possible to use the Polychrony platform for verification, for providing semantic model-transformation hints to the end user (such as the splitting of the software into several synchronous islands), and for simulation code generation. The Synoptic environment is being used to develop case studies of industrial size.
Future Investigations

The Synoptic environment is based on model transformations; verifying these transformations is thus a key point. It has been addressed in the GeneAuto project to certify sequential code generation from a Stateflow/Simulink-based language. This work must be extended to take into account features of the execution platform such as timers, preemption-based schedulers, multi-threading, multi-processors, etc. Work is in progress on a subset of the Synoptic language.
The Synoptic environment provides a toolset supporting a development process. The experience acquired during the SPaCIFY project with industrial partners has highlighted two different processes, and thus the need to parameterize the platform by the process. A SPEM-based specification of the development process could be used as input to a generic platform so that it could be configured to match the end users' current and future development methods. The Synoptic environment offers limited support for a refinement-based development process. This support could be extended and coupled with versioning to allow refinement checking between any pair of models or submodels. This implies support for defining, and partially automatically generating, gluing invariants between models, from which proof obligations could then be generated.
References

1. Altarica Project. http://altarica.labri.u-bordeaux.fr/wiki/.
2. Eclipse Modeling Framework project (EMF). http://www.eclipse.org/modeling/emf/.
3. RT-Builder. Solutions for real-time design, modeling and analysis of complex, multiprocessor and multi-bus systems and software. http://www.geensys.com/?Outils/RTBuilder.
4. Simulink. Simulation and model-based design. http://www.mathworks.com/.
5. ASSERT Project. Automated proof-based system and software engineering for real-time systems. http://www.assert-project.net/, 2007.
6. As-2 Embedded Computing Systems Committee SAE. Architecture Analysis & Design Language (AADL). SAE Standards no. AS5506, November 2004.
7. Albert Benveniste, Patricia Bournai, Thierry Gautier, Michel Le Borgne, Paul Le Guernic, and Hervé Marchand. The Signal declarative synchronous language: controller synthesis and systems/architecture design. In 40th IEEE Conference on Decision and Control, December 2001.
8. U. Brinkschulte, A. Bechina, F. Picioroagă, and E. Schneider. Open System Architecture for embedded control applications. In International Conference on Industrial Technology, volume 2, pages 1247–1251, Slovenia, December 2003.
9. Christian Brunette, Jean-Pierre Talpin, Abdoulaye Gamatié, and Thierry Gautier. A metamodel for the design of polychronous systems. Journal of Logic and Algebraic Programming, 78(4):233–259, 2009.
10. Jérémy Buisson, Cecilia Carro, and Fabien Dagnat. Issues in applying a model driven approach to reconfigurations of satellite software. In HotSWUp '08: Proceedings of the 1st International Workshop on Hot Topics in Software Upgrades, pages 1–5, New York, NY, USA, 2008. ACM.
11. Jérémy Buisson and Fabien Dagnat. Experiments with fractal on modular reflection. In SERA '08: Proceedings of the 2008 Sixth International Conference on Software Engineering Research, Management and Applications, pages 179–186, Washington, DC, USA, 2008. IEEE Computer Society.
12. J. Buisson and F. Dagnat. ReCaml: execution state as the cornerstone of reconfigurations. In The 15th ACM SIGPLAN International Conference on Functional Programming, pages 27–29, Baltimore, Maryland, USA, September 2010.
13. Alan Burns. The Ravenscar profile. ACM Ada Letters, 4:49–52, 1999.
14. Alan Burns, Brian Dobbing, and Tullio Vardanega. Guide for the use of the Ada Ravenscar Profile in high integrity systems. Ada Letters, XXIV(2):1–74, 2004.
15. F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal. Pattern-Oriented Software Architecture: A System of Patterns. Wiley, New York, 1996.
16. F.-X. Dormoy. Scade 6: a model based solution for safety critical software development. In Proceedings of the 4th European Congress on Embedded Real Time Software (ERTS '08), pages 1–9, Toulouse, France, January–February 2008.
17. ESA. European Space Agency. Ground systems and operations – telemetry and telecommand packet utilization (ECSS-E-70), January 2003.
18. Sanford Friedenthal, Alan Moore, and Rick Steiner. A Practical Guide to SysML: The Systems Modeling Language. Morgan Kaufmann, San Francisco, CA, 2008.
19. James Gosling, Bill Joy, Guy Steele, and Gilad Bracha. Java(TM) Language Specification, 3rd edition. Addison-Wesley, New York, 2005.
20. Object Management Group. CORBA Component Model 4.0 Specification. Specification Version 4.0, Object Management Group, April 2006.
21. Paul Le Guernic, Jean-Pierre Talpin, Jean-Christophe Le Lann, and Projet Espresso. Polychrony for system design. Journal for Circuits, Systems and Computers, 12:261–304, 2002.
22. N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous dataflow programming language LUSTRE. In Proceedings of the IEEE, pages 1305–1320, 1991.
23. David Harel. Statecharts: a visual formalism for complex systems. Science of Computer Programming, 8(3):231–274, 1987.
24. Jerome Hugues, Bechir Zalila, and Laurent Pautet. Combining model processing and middleware configuration for building distributed high-integrity systems. In ISORC '07: 10th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing, pages 307–312, Washington, DC, USA, 2007. IEEE Computer Society.
25. Damir Isovic and Gerhard Fohler. Efficient scheduling of sporadic, aperiodic, and periodic tasks with complex constraints. In 21st IEEE Real-Time Systems Symposium (RTSS 2000), pages 207–216, Orlando, USA, November 2000.
26. Frédéric Jouault, Jean Bézivin, and Ivan Kurtev. TCS: a DSL for the specification of textual concrete syntaxes in model engineering. In GPCE '06: Proceedings of the 5th International Conference on Generative Programming and Component Engineering, pages 249–254, New York, NY, USA, 2006. ACM.
27. J. Kramer and J. Magee. The evolving philosophers problem: dynamic change management. IEEE Transactions on Software Engineering, 16(11):1293–1306, November 1990.
28. Object Management Group. A UML profile for MARTE, beta 2. Technical report, June 2008.
29. Peter J. Robinson. Hierarchical Object-Oriented Design. Prentice-Hall, Upper Saddle River, NJ, 1992.
30. Jean-François Rolland, Jean-Paul Bodeveix, Mamoun Filali, David Chemouil, and Thomas Dave. AADL modes for space software. In Data Systems In Aerospace (DASIA), Palma de Majorca, Spain, May 2008. European Space Agency (ESA Publications), http://www.esa.int/publications.
31. E. Schneider. A middleware approach for dynamic real-time software reconfiguration on distributed embedded systems. PhD thesis, INSA Strasbourg, 2004.
32. Brinkley Sprunt, John P. Lehoczky, and Lui Sha. Exploiting unused periodic time for aperiodic service using the extended priority exchange algorithm. In IEEE Real-Time Systems Symposium, pages 251–258, Huntsville, AL, USA, December 1988.
33. Andres Toom, Tonu Naks, Marc Pantel, Marcel Gandriau, and Indra Wati. GeneAuto: an automatic code generator for a safe subset of Simulink/Stateflow. In European Congress on Embedded Real-Time Software (ERTS), Toulouse, France, 2008. Société des Ingénieurs de l'Automobile, http://www.sia.fr.
34. Carl von Platen and Johan Eker. Feedback linking: optimizing object code layout for updates. SIGPLAN Notices, 41(7):2–11, 2006.
Chapter 4
Compiling SHIM

Stephen A. Edwards and Nalini Vasudevan
4.1 Introduction

Embedded systems differ from traditional computing systems in their need for concurrent descriptions to handle simultaneous activities in their environment or to exploit parallel hardware. While it would be nice to program such systems in purely sequential languages, this greatly hinders both expressing and exploiting parallelism. Instead, we propose a fundamentally concurrent language that, by construction, avoids many of the usual pitfalls of parallel programming, specifically data races and non-determinism.

Most sequential programming languages (e.g., C) are deterministic: they produce the same output for the same input. Inputs include usual things such as files and command-line arguments, but for reproducibility and portability, things such as the processor architecture, the compiler, and even the operating system are not considered inputs. This helps programmers by making it simpler to reason about a program, and it also simplifies verification because if a program produces the desired result for an input during testing, it will do so reliably.

By contrast, concurrent software languages based on the traditional shared memory, locks, and condition variables model (e.g., pthreads or Java) are not deterministic by this definition because the output of a program may depend on such things as the operating system's scheduling policy, the relative execution rates of parallel processors, and other things outside the application programmer's control. Not only does this demand a programmer consider the effects of these things when designing the program, it also means testing can only say a program may behave correctly on certain inputs, not that it will.

That deterministic concurrent languages are desirable and practical is the central hypothesis of the SHIM project. That they relieve the programmer from considering different execution orders is clear; whether they impose too many constraints is not something we attempt to answer here.
S.A. Edwards and N. Vasudevan, Computer Science Department, Columbia University, New York, NY, USA. e-mail: [email protected]; [email protected]
In this chapter, we demonstrate that determinism also benefits code synthesis, optimization, and verification by making it easier for an automated tool to understand a program's behavior. The advantage is particularly helpful for formal verification algorithms, which can ignore different execution interleavings of SHIM programs. Although statements in concurrently running SHIM processes may execute in different orders, SHIM's determinism guarantees this will not affect any result and hence most properties. This is in great contrast to the motivation for the SPIN model checker [1], one of whose main purposes is to check different execution interleavings for consistency. SHIM has no need for SPIN.

SHIM is an asynchronous concurrent language whose programs consist of independently running threads, coded in an imperative C-like style, that communicate exclusively through rendezvous channels. It is a restriction of Kahn networks [2] that replaces Kahn's unbounded buffers with the rendezvous of Hoare's CSP [3]. Kahn's unbounded buffers would make the language Turing-complete [4], and are difficult to schedule [5], so the restriction to rendezvous makes the language easy to analyze. Furthermore, since SHIM is a strict subset of Kahn networks, it inherits Kahn's scheduling independence: the sequence of data values passed across each communication channel is guaranteed to be the same for all correct executions of the program (though potentially input-dependent).

We started the SHIM (Software/Hardware Integration Medium) project after observing students having difficulty making systems that communicated across the hardware/software boundary [6]. Our first attempt [7] focused exclusively on this by providing variables that could be accessed by either hardware processes or software functions (both written in a C dialect). We found the inherent nondeterminism of this approach a key drawback. The speed at which software runs on processors is rarely known, let alone controlled, and since software and hardware run in parallel and communicate using shared variables, the resulting system was nondeterministic, making it difficult to test.

After our first attempt, we started again from the wish list of Table 4.1. Our goal was to design a concurrent, deterministic (i.e., scheduling-independent) model of computation, and we started looking around. The synchronous model [8] embodied in languages like Lustre or Esterel assumes a single clock or harmonically related clocks and thus would not work well for software. The Signal language is based on a richer
Table 4.1 The SHIM wish list

Trait                                       Motivation
Concurrent                                  Hardware/software systems fundamentally parallel
Mixes synchronous and asynchronous styles   Software slower and less predictable than hardware;
                                            need something like multirate dataflow
Only requires bounded resources             Fundamental restriction on hardware
Formal semantics                            No arguments about meaning or behavior
Scheduling-independent                      I/O should not depend on program implementation
model whose clocks’ rates can be controlled by data, but many find its syntax and semantics confusing. Furthermore, Signal does not guarantee determinism; establishing it requires sometimes-costly program-specific analysis. In the rest of this chapter, we describe a series of code-generation techniques suitable for both sequential and parallel processors. Each actually works on a slightly different dialect of the SHIM language, although all use the Kahn-with-rendezvous communication scheme. The reason for this diversity is historical; we added features to the SHIM model as we discovered the need for them.
4.2 SHIM with Processes and Networks

Programs in our first Kahn-with-rendezvous dialect of SHIM consisted of sequential processes and hierarchical network blocks (Fig. 4.1). This dichotomy (which we later removed – see Sect. 4.5) came from mimicking a similar division in hardware description languages like Verilog. The body of a network block consisted of instantiations (written in a function-call style) of processes or other networks (recursion was not permitted), which all ran in parallel. For succinctness, the compiler inferred the names of communication channels from process and network arguments, although this could be overridden. Processes consisted of C-like code without pointers. Process arguments were input or output channels. Following C++ syntax, outputs were marked with
process sink(uint32 D) {
  int v;
  for (;;) v = D;        /* Read from D */
}

process sender(uint32 &C) {
  int d, e;
  d = 0;
  while (d < 4) {
    e = d;
    while (e > 0) {
      C = 1;             /* Write to C */
      C = e;             /* Write to C */
      e = e - 1;
    }
    C = 0;               /* Write to C */
    d = d + 1;
  }
}

process receiver(uint32 C, uint32 &D) {
  int a, b, r, v;
  a = b = 0;
  for (;;) {
    r = 1;
    while (r) {
      r = C;             /* Read from C */
      if (r != 0) {
        v = C;           /* Read from C */
        a = a + v;
      }
    }
    b = b + 1;
    D = b;               /* Write to D */
  }
}

network main() {
  sender();
  receiver();
  sink();
}

Fig. 4.1 The dialect of SHIM on which the tail-recursive (Sect. 4.3) and static (Sect. 4.4) code generators worked. A process contains imperative code; its parameters are channels. A network runs processes or other networks in parallel
ampersands (&), suggesting they were passed by reference. References to arguments would be treated as blocking write operations if they appeared on the left of an assignment statement and blocking reads otherwise. The overall structure, then, of a SHIM program in this dialect was a collection of sequential processes running in parallel and communicating through point-to-point channels. The compiler rejected programs in which a channel was an output on more than one process.
4.3 Tail-Recursive Code Generation

Our first code-generation technique produces single-threaded C code for a uniprocessor [9]. The central challenge is efficiently simulating concurrency without (costly, non-portable) operating system support (we present a parallel code generator in Sect. 4.6). Our technique uses an extremely simple scheduler – a stack of function pointers – that invokes fragments of concurrently-running processes using tail-recursion.

In tail-recursive code generation, we translate the code for each process into a collection of C functions. In this dialect of SHIM, every program amounted to a group of processes running in parallel (Sect. 4.2). The boundaries of these functions are places where the process may communicate and have to block, so each such process function begins with code just after a read or a write and terminates at a read, a write, or when the process itself terminates.

At any time, a process may be running, runnable, blocked on a channel, or terminated. These states are distinguished by the contents of the stack, channel meta-data structs, and the program counter of the process. When a process is runnable, a pointer to one of its functions is on the stack and its blocked field (defined in its local variable struct) is 0. A running process has control of the processor and there is no pointer to any of its functions on the stack. When a process is blocked, its blocked field is 1 and the reader or writer function pointer of at least one channel has a function pointer to one of the process's functions. When a process has terminated, no pointers to it appear on the stack and its blocked field is 0.

Normal SHIM processes may only block on a single channel at once, so it would seem wasteful to keep a function pointer per channel to remember where a process is to resume. In Sect. 4.4, we relax the block-on-single-channel restriction to accommodate code that mimics groups of concurrently-running processes.

Processes communicate through channels that consist of two things: a struct channel that contains function pointers to the reading or writing process that is blocked on the channel, and a buffer that can hold a single object being passed through the channel. A non-null function pointer points to the process function that should be invoked when the process becomes runnable again.

Figure 4.2 shows the implementation of a system consisting of a source process that writes 42 to channel C and a sink process that reads it. The synthesized C code consists of data structures that maintain a set of functions whose execution is
void (*stack[3])(void);             /* runnable process stack */
void (**sp)(void);                  /* stack pointer */

struct channel {
  void (*reader)(void);             /* process blocked reading, if any */
  void (*writer)(void);             /* process blocked writing, if any */
};
struct channel C = { 0, 0 };
int C_value;

struct {                            /* local state of source process */
  char blocked;                     /* 1 = blocked on a channel */
  int tmp1;
} source = { 0 };

struct {                            /* local state of sink process */
  char blocked;                     /* 1 = blocked on a channel */
  int v;
  int tmp2;
} sink = { 0 };

void source_1(void);
void sink_1(void);

/* process source(int32 &C) { C = 42; }  -- send on C */

void source_0() {                   /* (1) | (2) */
  source.tmp1 = 42;
  C_value = source.tmp1;            /* write to channel buffer */
  if (sink.blocked && C.reader) {   /* if reader blocked */
    sink.blocked = 0;               /* mark reader unblocked */
    *(sp++) = C.reader;             /* schedule the reader */
    C.reader = 0;                   /* clear the channel */
  }
  source.blocked = 1;               /* block us, the writer */
  C.writer = source_1;              /* to continue at source_1 */
  (*(--sp))(); return;              /* run next process */
}

void source_1() { (*(--sp))(); return; }  /* (3) | (4) */

/* process sink(int32 C) { int v = C; }  -- receive */

void sink_0() {                     /* (2) | (1) */
  if (source.blocked && C.writer) { /* if writer blocked */
    sink_1(); return;               /* go directly to sink_1 */
  }
  sink.blocked = 1;                 /* block us, the reader */
  C.reader = sink_1;                /* to continue at sink_1 */
  (*(--sp))(); return;              /* run next process */
}

void sink_1() {                     /* (3) */
  sink.tmp2 = C_value;              /* read from channel buffer */
  source.blocked = 0;               /* unblock the writer */
  *(sp++) = C.writer;               /* schedule the writer */
  C.writer = 0;                     /* clear the channel */
  sink.v = sink.tmp2;
  (*(--sp))(); return;              /* run next process */
}

void termination_process() {}

int main() {
  sp = &stack[0];
  *(sp++) = termination_process;
  *(sp++) = source_0;
  *(sp++) = sink_0;
  (*(--sp))(); return 0;
}

Fig. 4.2 Synthesized code for two processes (in the boxes) that communicate and the main() function that schedules them
pending, a buffer and state for each communication channel, structs that hold the local variables of each process, a collection of functions that hold the code of the processes broken into pieces, a placeholder function called termination_process that is called when the system terminates or deadlocks, and finally a main function that initializes the stack of pending function pointers and starts the system.

Processes are scheduled by pushing the address of a function on the stack and performing a tail-recursive call to a function popped off the top of the stack. The C code for this is as follows.

void func1() {
  ...
  *(sp++) = func2;        /* schedule func2() */
  ...
  (*(--sp))(); return;    /* run a pending function */
}

void func2() { ... }
Under this scheme, each process is responsible for running the next; there is no central scheduler code. Because this code generator compiles a SHIM dialect that uses only point-to-point channels for blocking rendezvous-style communication (see Sect. 4.2), the first process that attempts to read or write on a channel blocks until the process at the other end of the channel attempts the complementary operation. Communication is the only cause of blocking behavior in SHIM systems (i.e., the scheduler is non-preemptive), so processes control their peers' execution at communication events. The sequence of actions at read and write events in the process is fairly complicated but still fairly efficient. Broadly, when a process attempts to read or write, it unblocks its peer if its peer is waiting; otherwise it blocks on the channel. Annotations in Fig. 4.2 illustrate the behavior of the code. There are two possibilities: when the source runs first (1), it immediately writes the value to be communicated into the buffer for the channel (C_value; the code maintains the invariant that a reader only unblocks a writer after it has read data from the channel buffer) and checks to see if the reader (the sink process) is already blocked on the channel. Since we assumed the source runs first, the sink is not blocked, so the source blocks on the channel. Next, the source records that control should continue at the source_1 function (the purpose of setting C.writer) when the sink resumes it. Finally, it pops and calls the next waiting process function from the stack. Later (2), the sink checks if the source is blocked on C. In this source-before-sink scenario, the source is blocked, so sink_0 immediately jumps to sink_1, which fetches the data from the channel buffer, unblocks and schedules the writer, and clears the channel before calling the next process function, source_1 (3). When the sink runs first (1), it finds the source is not blocked and then blocks. Later, the source runs (2), writes into the buffer, discovers the waiting sink process, and unblocks and schedules sink before blocking itself. Later, sink_1 runs (3), which reads data from the channel buffer, and unblocks and schedules the writer, which eventually sends control back to source_1 (4).
The main challenge in generating the code described above is identifying the process function boundaries. We use a variant of extended basic blocks (see Fig. 4.3c): a new function starts at the beginning of the process, at a read or write operation, and at any statement with more than one predecessor. This divides the process into single-entry, multiple-exit subtrees, which is finer than it needs to be, but is sufficient and fast. The algorithm is simple: after building the control-flow graph of a process, a DFS is performed starting from each read, write, or multiple-fanin node that goes until it hits such a node. The spanning tree built by each DFS becomes the control-flow graph for the process function, and code is generated mechanically from there.

Figure 4.3 illustrates the code generation process for a simple process with some interesting control flow. The process (Fig. 4.3a) consists of two nested loops. We translate the SHIM code into a fairly standard linear IR (Fig. 4.3b). Its main novelty is await, a statement that represents blocking on one or more channels. E.g., await write C goto 6 indicates the process wants to communicate with its environment on channel C and will branch to statement 6 once this has occurred. Note that the instruction itself only controls synchronization; the actual data transfer takes place in an earlier assignment statement. Although this example (and in fact every SHIM process) only ever blocks on a single channel at a time, our static scheduling procedure (Sect. 4.4) uses the ability to block on multiple channels simultaneously. Our generated C code uses the following macros:

#define BLOCKED_READING(r, ch) r.blocked && ch.reader
#define RUN_READER(r, ch) \
  r.blocked = 0, *(sp++) = ch.reader, ch.reader = 0
#define BLOCK_WRITING(w, ch, succ) w.blocked = 1, ch.writer = succ
#define BLOCKED_WRITING(w, ch) w.blocked && ch.writer
#define RUN_WRITER(w, ch) \
  w.blocked = 0, *(sp++) = ch.writer, ch.writer = 0
#define BLOCK_READING(r, ch, succ) r.blocked = 1, ch.reader = succ
#define RUN_NEXT (*(--sp))(); return
BLOCKED_READING is true if the given process is blocked on the given channel. RUN_READER marks the given process, which is blocked on the given channel, as runnable. BLOCK_WRITING marks the given process – the currently running one – as blocked writing on the given channel. The succ parameter specifies the process function to be executed when the process next becomes runnable. BLOCKED_WRITING, RUN_WRITER, and BLOCK_READING are the mirror images of these for the other side of the channel. Finally, RUN_NEXT runs the next runnable process.
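To see the macros in action, here is a sketch (ours, not actual compiler output) of how a blocking receive on channel C by the sink process can be phrased, reusing the declarations of Fig. 4.2; sink_recv and sink_cont are hypothetical function names.

void sink_cont(void);

void sink_recv(void) {
  if (BLOCKED_WRITING(source, C)) {  /* writer already waiting? */
    sink_cont();                     /* rendezvous completes now */
    return;
  }
  BLOCK_READING(sink, C, sink_cont); /* otherwise park ourselves on C */
  RUN_NEXT;                          /* and run some other process */
}

void sink_cont(void) {
  sink.v = C_value;                  /* fetch data from the channel buffer */
  RUN_WRITER(source, C);             /* unblock and schedule the writer */
  RUN_NEXT;
}

This is exactly the shape of sink_0 and sink_1 in Fig. 4.2, with the handshaking spelled out through the macros.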
4.4 Code Generation from Static Schedules

The tail-recursive code generator we presented above uses a clever technique to reduce run-time scheduling to little more than popping an address off a stack and jumping to it, but even this amount of overhead can be high for an extremely simple process such as an adder. In this section, we describe how to eliminate even this low scheduling overhead by compiling together groups of concurrently-running processes into a single function.
process source(int32 &C) {
  bool b = 0;
  for (int32 a = 0 ; a < 100 ; ) {
    if (b) {
      C = a;
    } else {
      for (int32 d = 0 ; d < 10 ; ++d)
        a = a + 1;
    }
    b = ~b;
  }
}

(a)

0   b = 0
1   a = 0
2   ifnot a < 100 goto 14
3   ifnot b goto 7
4   C = a
5   await write C goto 6
6   goto 12
7   d = 0
8   ifnot d < 10 goto 12
9   a = a + 1
10  d = d + 1
11  goto 8
12  b = 1 - b
13  goto 2
14  Exit

(b)
(c) [Fig. 4.3c: the control-flow graph of this IR, with new process functions rooted at statement 0 (source_0), at the merge at statement 2 (source_1), at the post-communication point at statement 6 (source_2), and at the merges at statements 8 (source_3) and 12 (source_4).]

(d)

struct channel C = {0, 0};
int C_val;
struct {
  bool blocked;
  bool b;
  int32 a;
  int32 d;
} source = { 0 };

static void source_0() {
  source.b = 0;
  source.a = 0;
  source_1(); return;
}

static void source_1() {
  if (!(source.a < 100)) goto L9;
  if (!(source.b)) goto L7;
  C_val = source.a;
  if (BLOCKED_READING(sink, C)) RUN_READER(sink, C);
  BLOCK_WRITING(source, C, source_2);
  RUN_NEXT;
L7:
  source.d = 0;
  source_3(); return;
L9:
  RUN_NEXT;
}

static void source_2() { source_4(); return; }

[the remainder of Fig. 4.3d, functions source_3 and source_4, is truncated in the source]

void sink(chan uint32 D) {
  for (;;) recv D;                  /* receive from D */
}

void sender(chan uint32 &C) throws Done {
  for (uint32 d = 0 ; d < 4 ; d = d + 1) {
    for (uint32 e = d ; e > 0 ; e = e - 1) {
      next C = 1;
      next C = e;
    }
    next C = 0;
  }
  throw Done;
}

void receiver(chan uint32 C, chan uint32 &D) {
  uint32 a = 0, b = 0;
  for (;;) {
    uint32 r = 1;
    while (r) {
      r = next C;                   /* receive from C */
      if (r != 0)
        a = a + next C;             /* receive from C */
    }
    b = b + 1;
    next D = b;                     /* send on D */
  }
}

void main() {
  chan uint32 C, D;
  try
    sender(C); par receiver(C, D); par sink(D);
  catch (Done) {}
}
Fig. 4.5 The program of Fig. 4.1 coded in the latest SHIM dialect. We added par, chan, send, recv, next, try, catch, and throw and removed the distinction between processes and networks: both are now functions
(the previous policy confused many users). We also found we were often reading a value from a channel and storing it locally so we could refer to it multiple times. We attempted to retain the communication-in-an-expression syntax by introducing the next keyword, which sends on a channel if it appears on the left side of an assignment and receives when it appears elsewhere. While it can make for very succinct code (next b = next a is a succinct way of writing a buffer), users continue to find it confusing and prefer the send and recv syntax. Figure 4.5 shows the program of Fig. 4.1 coded in this new dialect. We also added the facility for multiway rendezvous. Although a channel may be passed by reference only once, it may be passed by value an unlimited number of times. In this case, each function that receives a pass-by-value copy of the channel is required to participate in any rendezvous on the channel. A primary motivation for this was to facilitate debugging – it is easy now to add processes that monitor channels without affecting a system's behavior. In retrospect, this facility is sparsely used, difficult to implement, and slows down the more typical point-to-point communication unless carefully optimized away.
4.5.1 Recursion

When we introduced function calls, we had to consider how to handle recursion. Our main goal was to make basic function calls work, allowing the usual re-use of code, but we also found that recursion, especially bounded recursion, was an interesting mechanism for specifying more complex structures.
void buffer(int i, int &o) {
  for (;;) {
    recv i;
    o = i;
    send o;
  }
}

void fifo(int i, int &o, int n) {
  int c;
  int m = n - 1;
  if (m)
    buffer(i, c) par fifo(c, o, m);
  else
    buffer(i, o);
}

Fig. 4.6 An n-place FIFO specified using recursion, from Tardieu and Edwards [12]
void fifo3(chan int i, chan int &o) {
  fifo(i, o, 3);
}

void fifo(chan int i, chan int &o, int n) {
  if (n > 1) {
    chan int c;
    buf(i, c); par fifo(c, o, n-1);
  } else
    buf(i, o);
}

void buf(chan int i, chan int &o) {
  int tmp;
  for (;;) {
    tmp = recv i;
    send o = tmp;
  }
}

(a)

void fifo3(chan int i, chan int &o) {
  chan int c1, c2, c3;
  buf(i, c1); par buf(c1, c2); par buf(c2, o);
}

void buf(chan int i, chan int &o) {
  int tmp;
  for (;;) {
    tmp = recv i;
    send o = tmp;
  }
}

(b)

Fig. 4.7 Removing bounded recursion, controlled by the n variable, from (a) gives (b). After Edwards and Zeng [13]
Figure 4.6 illustrates this style. The recursive fifo procedure calls itself repeatedly in parallel, effectively instantiating buffer processes as it goes. This recursion runs only once, when the program starts, to set up a chain of single-place buffers. We developed a technique for removing bounded recursion from SHIM programs [13]. One goal was to simplify SHIM’s translation into hardware, where general recursion would require memory for a stack and choosing a size for it, but it has found many other uses. In particular, if all the recursion in a program is bounded, the program is finite-state, simplifying other analysis steps. The basic idea of our work was to unroll recursive calls by exactly tracking the behavior of variables that control the recursion. Our insight was to observe that for a recursive function to terminate, the recursive call must be within the scope of a conditional. Therefore, we need to track the predicate of this conditional, see what can affect it, and so forth. Figure 4.7 shows the transformation of a simple FIFO. Our procedure produces the static version in Fig. 4.7b by observing that the n variable controls the predicate
around fifo's recursive call. Then it notices n is set first to 3 by fifo3 and generates three specialized versions of fifo – one with n = 3, n = 2, and n = 1 – simplifies each, then inlines each function, since each is only called once. Of course, in the worst case our procedure could end up trying to track every variable in the program, which would be impractical, but in programs written with this idiom in mind, recursion control only involved a few variables, making it easy to resolve.
4.5.2 Exceptions

At this stage, we also added exceptions [14], without question the most technically difficult addition we have made. Inspired by the Esterel language [15], where exceptions are used not just for occasional error handling but as widely as, say, if-then-else, we wanted our exceptions to be widely applicable and be concurrent and scheduling-independent.

For sequential code, the desired exception semantics were clear: throwing an exception immediately sends control to the most-recently-entered handler for the given exception, terminating any functions that were called in between (Fig. 4.8). For concurrently running functions, the right behavior was less obvious. We wanted to terminate everything leading up to the handler, including any concurrently running relatives, but we insisted on maintaining SHIM's scheduling independence, meaning we had to carefully time when the effect of an exception was felt. Simply terminating siblings when one threw an exception would be nondeterministic: the behavior would then depend on the relative execution rates of the processes and thus not be scheduling independent.

Our solution was to piggyback the exception mechanism on the communication system. The idea was that a process would only learn of an exception when it attempted to communicate, since it is only at rendezvous points that two processes

void main() {
  int i;
  i = 0;
  try {
    i = 1;
    throw T;
    i = i * 2;    // not executed
  } catch(T) {
    i = i * 3;    // i = 3
  }
}
(a)
void main() {
  int i;
  i = 0;
  try {                    // thread 1
    throw T;
  } par {                  // thread 2
    for (;;) i = i + 1;    // never terminated
  } catch (T) {}
}
(b)
Fig. 4.8 (a) Sequential exception semantics are classical. (b) Thread 2 never feels the effect of the exception because it never communicates. From Tardieu and Edwards [14]
void main() {
  chan int i = 0, j = 0;
  try {                            // task 1
    while (i < 5) send i = i + 1;
    throw T;                       // poisons itself
  } par {                          // task 2
    for (;;) send j = recv i + 1;  // poisoned by task 1
  } par {                          // task 3
    for (;;) recv j;               // poisoned by task 2
  } catch (T) {}
}
Fig. 4.9 Transitive poisoning: throw T poisons the first task, which poisons the second when the second attempts recv i. Finally the third is poisoned when it attempts recv j and the whole group terminates
agree on what time it is. Thus, SHIM's exception mechanism is layered on the inter-process communication mechanism to preserve determinism while providing powerful sequential control. To accommodate exceptions, we introduced a new "poisoned" state for a process, which represents when it has been terminated by an exception and is waiting for its relatives to terminate. Any process that attempts to communicate with a poisoned process will itself become poisoned. In Fig. 4.9, the first thread throws an exception; the second thread is poisoned when it attempts to rendezvous on i, and the third is poisoned by the second when it attempts to rendezvous on j. The idea was simple enough, and the interface it presented to the programmer could certainly be used and explained without much difficulty, but implementing it turned out to be a huge challenge, despite there being a fairly simple set of structural operational semantics rules for it. The real complexity came from the combination of having to implement exception scope, which limits how far the poison propagates (it does not propagate outside the scope of the exception), and how that interacts with the scopes of multiple, concurrently thrown exceptions.
4.6 Generating Threaded Code

To handle multiway rendezvous and exceptions on multiprocessors, we needed a new technique. Our next backend [16] generates C code that calls the POSIX thread library. Here, the challenge is minimizing overhead. Each communication action acquires the lock on a channel, checks whether every connected process has also blocked (i.e., whether the rendezvous can occur), and then checks whether the channel is connected to a poisoned process (i.e., whether an exception has been thrown).
void h(chan int &A) {
  A = 4; send A;
  A = 2; send A;
}

void j(chan int A) throws Done {
  recv A;
  throw Done;
}

void f(chan int &A) throws Done {
  h(A); par j(A);
}

void g(chan int A) {
  recv A;
  recv A;
}

void main() {
  try {
    chan int A;
    f(A); par g(A);
  } catch (Done) {}
}
Fig. 4.10 A SHIM program with exceptions
4.6.1 An Example

We will use the example in Fig. 4.10 to illustrate threaded code generation. There, the main function declares the integer channel A and passes it to tasks f and g; then f passes it to h and j. Tasks f and h send data with send A; tasks g and j receive it with recv A. Task h sends the value four to tasks g and j. Task h blocks on the second send A because task j does not run a matching recv A. As we described earlier, SHIM's exceptions enable a task to gracefully interrupt its concurrently running siblings. A sibling is "poisoned" by an exception only when it attempts to communicate with a task that raised an exception or with a poisoned task. For example, when j in Fig. 4.10 throws Done, it interrupts h's second send A and g's second recv A, resulting in the death of h and g. An exception handler runs after all the tasks in its scope have terminated or been poisoned.
4.6.2 The Static Call-Graph Assumption

For efficiency, our compiler assumes the communication and call graph of the program is known at compile time. We reject programs with unbounded recursion and can expand programs with bounded recursion [13], allowing us to transform the call graph into a call tree. This duplicates code to improve performance: fewer channel aspects are managed at run time. We encode in a bit vector the subtree of functions connected to a channel. Since we know at compile time which functions can connect to each channel, we assign a unique bit to each function on a channel. We check these bits at run time with logical mask operations. In the code, a name such as A_f is a constant that holds the bit our compiler assigns to function f connected to channel A, such as 0x4.
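To make the encoding concrete, here is a small self-contained sketch. The bit values and the channel structure are our own assumptions for illustration; only the A_f-style naming follows the fragments quoted in this chapter.

#include <stdio.h>

/* Hypothetical bits for the functions connected to channel A (cf. Fig. 4.10). */
#define A_main 0x1u
#define A_f    0x4u
#define A_g    0x8u
#define A_h    0x10u
#define A_j    0x20u

struct channel {
    unsigned connected;  /* functions currently connected to the channel */
    unsigned blocked;    /* functions currently blocked on the channel */
    unsigned poisoned;   /* functions poisoned on the channel */
};

int main(void) {
    struct channel A = { A_main | A_f | A_g | A_h, A_f | A_g | A_h, 0 };

    /* The rendezvous can fire only when every connected function is blocked. */
    if ((A.connected & ~A.blocked) == 0)
        printf("rendezvous ready\n");
    else
        printf("still waiting for functions 0x%x\n", A.connected & ~A.blocked);
    return 0;
}

The payoff of the representation is that questions such as "is every connected function blocked?" become single mask operations.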
4.6.3 Implementing Rendezvous Communication

Implementing SHIM's multiway rendezvous communication with exceptions is the main code generation challenge. The code at a send or receive is straightforward: it locks the channel, marks the function and its ancestors as blocked, calls the event function for the channel to attempt the communication, and blocks until communication has occurred. If it was poisoned, it branches to a handler. Figure 4.11 is the code for send A in h in Fig. 4.10.

For each channel, our compiler generates an event function that manages communication. Our code calls an event function when the state of a channel changes, such as when a task blocks or connects to a channel. Figure 4.12 shows the event function our compiler generates for channel A in Fig. 4.10. While complex, the common case is quick: when the channel is not ready (some connected task is not blocked on the channel) and no task is poisoned, A.connected != A.blocked and A.poisoned == 0, so the bodies of the two if statements are skipped.

If the channel is ready to communicate, A.blocked == A.connected, so the body of the first if runs. This clears the channel (blocked = 0), and main's value for A (passed by reference to f and h) is copied to g or j if connected.

If at least one task connected to the channel has been poisoned, A.poisoned != 0, so the body of the second if runs. This code comes from unrolling a recursive procedure at compile time, which is possible because we know the structure of the channel (i.e., which tasks connect to it). The speed of such code is a key advantage over a library. This exception-propagation code attempts to determine which tasks, if any, connected to the channel should be poisoned. It does this by manipulating two bit vectors. A task can die if and only if it is blocked on the channel and all its children connected to the channel (if any) can also die. A poisoned task may kill its sibling tasks and their descendants. Finally, the code kills each task in the kill set that can die and was not poisoned before, by setting its state to POISON and updating the channel accordingly (A.poisoned |= kill).
lock(A.mutex);                    /* acquire lock for channel A */
A.blocked |= (A_h|A_f|A_main);    /* block h and ancestors on A */
event_A();                        /* alert channel of the change */
while (A.blocked & A_h) {         /* while h remains blocked */
  if (A.poisoned & A_h) {         /* were we poisoned? */
    unlock(A.mutex);
    goto poisoned;
  }
  wait(A.cond, A.mutex);          /* wait on channel A */
}
unlock(A.mutex);                  /* release lock for channel A */
Fig. 4.11 C code for send A in function h() of Fig. 4.10
void event_A() {
  unsigned int can_die = 0, kill = 0;
  if (A.connected == A.blocked) {          /* communicate */
    A.blocked = 0;
    if (A.connected & A_g) A.g = A.main;
    if (A.connected & A_j) A.j = A.main;
    broadcast(A.cond);
  } else if (A.poisoned) {                 /* propagate exceptions */
    /* compute can_die set */
    can_die = A.blocked & (A_g|A_h|A_j);
    if ((can_die & (A_h|A_j)) == (A.connected & (A_h|A_j)))
      can_die |= A.blocked & A_f;
    if (A.poisoned & (A_f|A_g)) {          /* compute kill set */
      kill |= A_g;
      if (can_die & A_f) kill |= (A_f|A_h|A_j);
    }
    if (A.poisoned & (A_h|A_j)) {
      kill |= A_h;
      kill |= A_j;
    }
    if (kill &= can_die & ~A.poisoned) {   /* poison some tasks? */
      unlock(A.mutex);
      if (kill & A_g) {                    /* poison g if in kill set */
        lock(g.mutex);
        g.state = POISON;
        unlock(g.mutex);
      }
      /* also poison f, h, and j if in kill set ... */
      lock(A.mutex);
      A.poisoned |= kill;
      broadcast(A.cond);
    }
  }
}

Fig. 4.12 C code for the event function for channel A of Fig. 4.10
lock(main.mutex); main.state = POISON; unlock(main.mutex);
lock(f.mutex);    f.state = POISON;    unlock(f.mutex);
lock(j.mutex);    j.state = POISON;    unlock(j.mutex);
goto poisoned;

Fig. 4.13 C code for throw Done in function j() of Fig. 4.10
Code for throwing an exception (Fig. 4.13) marks as POISON all its ancestors up to where it will be handled. Because the compiler knows the call tree, it knows how far to “unroll the stack,” i.e., how many ancestors to poison.
4.6.4 Starting and Terminating Tasks

It is costly to create and destroy a POSIX thread: each thread has a separate stack, creation requires interaction with the operating system's scheduler, and it usually requires a system call. To minimize this overhead, and because we know the call graph of the program at compile time, our compiler generates code that creates, at the beginning, as many threads as the SHIM program will ever need. These threads are only destroyed when the SHIM program terminates; if a SHIM task terminates, its POSIX thread blocks until it is re-awakened.
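A minimal sketch of this idiom follows, assuming a task structure like the one in Figs. 4.14 and 4.15; the struct layout and wrapper are ours, not the compiler's actual output.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

enum state { IDLE, RUN, STOP, POISON };

struct task {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    enum state      state;
};

/* Each generated task body is wrapped in a loop like this one:
   the POSIX thread is created once and parked between runs. */
static void *task_wrapper(void *arg) {
    struct task *t = arg;
    for (;;) {
        pthread_mutex_lock(&t->mutex);
        while (t->state != RUN)               /* parked until re-awakened */
            pthread_cond_wait(&t->cond, &t->mutex);
        pthread_mutex_unlock(&t->mutex);

        printf("task body runs here\n");      /* ... SHIM task body ... */

        pthread_mutex_lock(&t->mutex);
        t->state = STOP;                      /* task ends; thread survives */
        pthread_mutex_unlock(&t->mutex);
    }
    return 0;
}

int main(void) {
    struct task f = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, IDLE };
    pthread_t th;
    pthread_create(&th, 0, task_wrapper, &f); /* created once, up front */
    pthread_mutex_lock(&f.mutex);
    f.state = RUN;                            /* "call" the task */
    pthread_cond_signal(&f.cond);
    pthread_mutex_unlock(&f.mutex);
    sleep(1);                                 /* let it run, then exit */
    return 0;
}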
lock(A.mutex);                      /* connect */
A.connected |= (A_f|A_g);
event_A();
unlock(A.mutex);

lock(main.mutex);
main.attached_children = 2;
unlock(main.mutex);

lock(f.mutex);                      /* pass args */
f.A = &A;
unlock(f.mutex);
/* A is dead on entry for g, so do not pass A to g */

lock(f.mutex);                      /* run f() */
f.state = RUN;
broadcast(f.cond);
unlock(f.mutex);

lock(g.mutex);                      /* run g() */
g.state = RUN;
broadcast(g.cond);
unlock(g.mutex);

lock(main.mutex);                   /* wait for children */
while (main.attached_children)
  wait(main.cond, main.mutex);
if (main.state == POISON) {
  unlock(main.mutex);
  goto poisoned;
}
unlock(main.mutex);
Fig. 4.14 C code for calling f() and g() in main() of Fig. 4.10
Figure 4.14 shows the code in main that runs f and g in parallel. It connects f and g to channel A, sets its number of live children to 2, passes function parameters, then starts f and g. The address for the pass-by-reference argument A is passed to f. Normally, a value for A would be passed to g, but our compiler found this value is not used so the copy is avoided (discussed below). After starting f and g, main waits for both children to return. Then main checks whether it was poisoned, and if so, branches to a handler. Reciprocally, Fig. 4.15 shows the code in f that controls its execution: an infinite loop that waits for main, its parent, to set its state field to running, at which point it copies its formal arguments into local variables and runs its body. If a task terminates normally, it cleans up after itself. In Fig. 4.15, task f disconnects from channel A, sets its state to STOP, and informs main it has one less running child. By contrast, if a task is poisoned, it may still have children running and it may also have to poison sibling tasks so it cannot entirely disappear yet. In Fig. 4.15, task f, if poisoned, does not disconnect from A but updates its poisoned field. Then, task f waits for its children to return. At this time, f can disconnect its (potentially poisoned) children from channels, since they can no longer poison siblings. Finally, f informs main it has one less running child. For IBM’s CELL processor, we developed a backend [17] that is a direct offshoot of the pthreads backend but allowed the user to assign certain (computationally intensive) tasks directly to the CELL’s eight synergistic processing units (SPUs); the rest of the tasks ran on the CELL’s standard PowerPC core (PPU). We did this by replacing these offloaded functions with wrapper functions that communicated across the PPU–SPU boundary. Function calls across the boundary turned out to be fairly technical because of data alignment restrictions on function arguments, which we would have preferred to be stack-resident. This, and many more fussy aspects
int A;                              /* value of channel A */
restart:
  lock(f.mutex);
  while (f.state != RUN)
    wait(f.cond, f.mutex);
  A = f.A;                          /* copy arg. */
  unlock(f.mutex);

  /* body of the f task */

terminated:
  lock(A.mutex);                    /* disconnect f */
  A.connected &= ~A_f;
  event_A();
  unlock(A.mutex);
  lock(f.mutex);                    /* stop */
  f.state = STOP;
  unlock(f.mutex);
  goto detach;

poisoned:
  lock(A.mutex);                    /* poison A */
  A.poisoned |= A_f;
  A.blocked &= ~A_f;
  event_A();
  unlock(A.mutex);
  lock(f.mutex);                    /* wait for children */
  while (f.attached_children)
    wait(f.cond, f.mutex);
  unlock(f.mutex);
  lock(A.mutex);                    /* disconnect j, h */
  A.connected &= ~(A_h|A_j);
  A.poisoned &= ~(A_h|A_j);
  event_A();
  unlock(A.mutex);

detach:                             /* detach from parent */
  lock(main.mutex);
  main.attached_children--;
  broadcast(main.cond);
  unlock(main.mutex);
  goto restart;
Fig. 4.15 C code in function f() controlling its execution
of coding for the CELL, convinced us that a language at a higher level than C is appropriate for such heterogeneous multicore processors.
4.7 Detecting Deadlocks

SHIM, although race free, is not immune to deadlocks. A simple example is { recv a; recv b; } par { send b; send a; }, which deadlocks because the first task attempts to communicate on a first, yet the second task, which is also connected to a, expects b to come first. Fortunately, SHIM's scheduling independence means that, for a given input sequence, a SHIM program behaves the same for all executions, and thus either always deadlocks or never deadlocks. In particular, SHIM does not need to be analyzed under an interleaved model of concurrency, rendering the partial-order-reduction tricks of model checkers such as Holzmann's SPIN [1] unnecessary for SHIM. Our first attempt at detecting deadlocks in SHIM [18] employs the symbolic synchronous model checker NuSMV [19] – an interesting choice, since SHIM's concurrency model is fundamentally asynchronous. Our approach abstracts away data operations and chooses a specific schedule in which each communication event takes a single cycle. This reduces the SHIM program to a set of communicating state machines suitable for the NuSMV model checker.
We have since continued our work on deadlock detection in SHIM [20]. Here we take a compositional approach in which we build an automaton for a complete system piece by piece. Our insight is that we can usually abstract away internal channels and simplify the automaton without introducing or removing deadlocks. The result is that even though we are doing explicit model-checking, we can often do it much faster than a brute-force symbolic model checker such as NuSMV.

Figure 4.16 shows our technique in action. Starting from the (contrived) program

void main() {
      chan int a, b, c, d;
      for (;;) { recv a; b = a + 1; send b; }
  par for (;;) { recv b; c = b + 1; send c; }
  par for (;;) { recv c; d = c + 1; send d; }
  par for (;;) { recv d; a = d + 1; send a; }
}

(the automata of panels (b)-(m) in Fig. 4.16 are not reproduced here; the caption below summarizes them), we first abstract the behavior of the first two tasks into simple automata. The first task communicates on channel a, then on channel b, then repeats; the second task
Fig. 4.16 Analyzing a four-task SHIM program (a) for deadlock. Composing the automata for the first (b) and second (c) tasks gives a product automaton (d). Channel b only appears in the first two tasks, so we abstract away its effect by identifying (e) and merging (f) equivalent states. Next, we compose this simplified automaton with that for the third task (g) to produce another (h). Now, channel c will not appear again, so again we identify (i) and merge (j) states. Finally, we compose this with the automaton for the fourth task (k) to produce a single, deadlocked state (l) because the fourth task insists on communicating first on d but the other three communicate first on a. The direct composition of the first three tasks without removing channels (m) is larger – eight states
does the same on channels b and c. We compose these automata by allowing either to take a step on unshared channels but insisting on a rendezvous when a channel is shared. Then, since channel b is local to these two tasks, we abstract away its behavior by merging two states. This produces a simplified automaton that we then compose with the automaton for the third task. This time, channel c is local, so again we simplify the automaton and compose it with the automaton for the fourth task. The automaton we obtain for the first three tasks insists on communicating first on a and then on d; the fourth task communicates on d and then on a. This is a deadlock, which manifests itself as a state with no outgoing arcs. For programs that follow such a pipeline pattern, the number of states grows exponentially with the number of pipeline stages (precisely, n stages produce 2^n states), yet our analysis only builds machines with 2n states before simplifying them to n + 1 states at each step. Although we still have to step through and analyze each of the n stages (leading to quadratic complexity), this is still a substantial improvement. Of course, our technique cannot always reduce an exponential state space to a polynomial one, but we found it often did on the example programs we tried.
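The core product construction can be sketched compactly. The encoding below (deterministic transition tables, one successor per channel, and an arbitrary choice when an automaton could step alone) is our own simplification for illustration, not the implementation of [20]; the demo automata encode the two-task deadlock example from the start of this section.

#include <stdio.h>

#define MAX 64                           /* max product states (illustration) */
enum { CH_A, CH_B, NCHAN };              /* channel alphabet */

/* delta[s][c] = successor of state s on channel c, or -1 if none */
typedef struct { int n; int delta[MAX][NCHAN]; } Aut;

/* Compose p and q: rendezvous on shared channels, step alone otherwise. */
static Aut compose(const Aut *p, const Aut *q, const int shared[NCHAN]) {
    Aut r = { p->n * q->n, {{0}} };
    for (int s = 0; s < p->n; s++)
        for (int t = 0; t < q->n; t++)
            for (int c = 0; c < NCHAN; c++) {
                int ps = p->delta[s][c], qt = q->delta[t][c];
                int succ = -1;
                if (shared[c]) {
                    if (ps >= 0 && qt >= 0)   /* both must rendezvous */
                        succ = ps * q->n + qt;
                } else if (ps >= 0) {         /* p steps alone */
                    succ = ps * q->n + t;
                } else if (qt >= 0) {         /* q steps alone */
                    succ = s * q->n + qt;
                }
                r.delta[s * q->n + t][c] = succ;
            }
    return r;
}

int main(void) {
    /* Task 1: a then b, repeating; task 2: b then a (forcing deadlock). */
    Aut t1 = { 2, { { 1, -1 }, { -1, 0 } } };
    Aut t2 = { 2, { { -1, 1 }, { 0, -1 } } };
    int shared[NCHAN] = { 1, 1 };
    Aut prod = compose(&t1, &t2, shared);
    for (int s = 0; s < prod.n; s++) {
        int stuck = 1;
        for (int c = 0; c < NCHAN; c++)
            if (prod.delta[s][c] >= 0) stuck = 0;
        if (stuck) printf("state %d has no outgoing arc: deadlock\n", s);
    }
    return 0;
}

Running this reports that the initial product state is stuck: the two tasks disagree on which channel should rendezvous first, exactly the deadlock described above.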
4.8 Sharing Buffers

We also applied the model-checking approach from the previous section to search for situations where buffer memory can be shared [21]. In general, each communication channel needs its own space to store any data being communicated over it, but in certain cases it is possible to prove that two channels can never be active simultaneously. In the program in Fig. 4.17, the main task starts four tasks in parallel. Tasks 1 and 2 communicate on a. Then, tasks 2 and 3 communicate on b, and finally tasks 3 and 4 on c. The value of c received by task 4 is 8.

void main() {
  chan int a, b, c;
  {                   // Task 1
    send a = 6;       // Send a (synchronize with task 2)
  } par {             // Task 2
    recv a;           // Receive a (synchronize with task 1)
    send b = a + 1;   // Send 7 on b (synchronize with task 3)
  } par {             // Task 3
    recv b;           // Receive b (synchronize with task 2)
    send c = b + 1;   // Send 8 on c (synchronize with task 4)
  } par {             // Task 4
    recv c;           // Receive c (synchronize with task 3)
                      // c = 8 here
  }
}

Fig. 4.17 A SHIM program that illustrates the possibility of buffer sharing. Channels a, b, and c are never active simultaneously and can therefore share buffer space

Communication on a cannot occur simultaneously with communication on b because task 2 forces them to occur sequentially. Similarly, communications on b and c are forced to be sequential by task 3. Communications on a and c cannot occur together because they are forced to be sequential by the communication on b. Our tool understands this pattern and reports that a, b, and c can share buffers because their communications never overlap, thereby reducing the total buffer requirements for this program by 66%.
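The sharing decision itself reduces to coloring channels under a "may be simultaneously active" relation. The toy allocator below is our own illustration: the overlap matrix is hard-coded to the Fig. 4.17 result, whereas the real relation is computed by model checking [21].

#include <stdio.h>

#define NCH 3
static const char *name[NCH] = { "a", "b", "c" };

/* overlap[i][j] = 1 if channels i and j may be active simultaneously. */
static const int overlap[NCH][NCH] = {
    { 1, 0, 0 },   /* a never overlaps b or c (Fig. 4.17) */
    { 0, 1, 0 },
    { 0, 0, 1 },
};

int main(void) {
    int buffer[NCH];  /* buffer[i] = buffer assigned to channel i */
    int nbuf = 0;
    for (int i = 0; i < NCH; i++) {
        buffer[i] = -1;
        /* Greedily reuse the first buffer none of whose channels overlap i. */
        for (int b = 0; b < nbuf && buffer[i] < 0; b++) {
            int ok = 1;
            for (int j = 0; j < i; j++)
                if (buffer[j] == b && overlap[i][j]) ok = 0;
            if (ok) buffer[i] = b;
        }
        if (buffer[i] < 0) buffer[i] = nbuf++;
        printf("channel %s -> buffer %d\n", name[i], buffer[i]);
    }
    printf("%d channels share %d buffer(s)\n", NCH, nbuf);
    return 0;
}

On this input, all three channels map to a single buffer, matching the 66% reduction reported above.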
4.9 Conclusions

The central hypothesis of the SHIM project is that its simple, deterministic semantics helps both programming and automated program analysis. That we have been able to devise truly effective mechanisms for clever code generation (e.g., static scheduling) and analysis (e.g., deadlock detection) that can gain deep insight into
the behavior of programs vindicates this view. The bottom line: if a programming language does not have simple semantics, it is really hard to analyze its programs quickly or precisely.

Algorithms in which there is a large number of little, variable-sized, but independent pieces of work do not mesh well with SHIM's scheduling-independent philosophy as it currently stands. The obvious way to handle them is to maintain a bucket of tasks and assign each processor a new task once it has finished its last one. The order in which the tasks are performed therefore depends on their relative execution rates, but this does not matter if the tasks are independent. It would be possible to add scheduling-independent task distribution and scheduling to SHIM (i.e., provided the tasks are truly independent or, equivalently, confluent); exactly how is an open research question.

Exceptions have been even more painful than multiway rendezvous. They are extremely convenient from a programming standpoint (e.g., SHIM's rudimentary I/O library wraps each program in an exception to allow it to terminate gracefully; virtually every compiler test case includes at least one exception), but extremely difficult both to implement and to reason about.

An alternative is to turn exceptions into syntactic sugar layered on the exception-free SHIM model. We always had this in the back of our minds: an exception would just put a process into an unusual state in which it would communicate its poisoned status to any process that attempts to communicate with it. The problem is that the complexity tends to grow quickly when multiple, concurrent exceptions and scopes are considered. Again, exactly how to translate exceptions into a simpler SHIM model remains an open question.
That buffering is mandatory for high-performance parallel applications is hardly a revelation; we confirmed it anyway. The SHIM model has always been able to implement FIFO buffers (e.g., Fig. 4.6), but we have realized that they are sufficiently fundamental to be a first-class type in the language. We are currently working on a variant of the language that replaces pure rendezvous communication with bounded, buffered communication. Because it will be part of the language, it will be easier to map to unusual environments, such as the DMA mechanism for inter-core communication on the CELL processor. SHIM has already been an inspiration for aspects of some other languages. We ported its communication model into the Haskell functional language [22] and proposed a compiler that would impose its scheduling-independent view of the work on arbitrary programs [23]. Certain SHIM ideas, such as scheduling analysis [24], have also been used in IBM’s X10 language. Acknowledgements Many have contributed to SHIM. Olivier Tardieu created the formal semantics, devised the exception mechanism, and instigated endless (constructive) arguments. Jia Zeng developed the static recursion removal algorithm. Baolin Shao designed the compositional deadlock detection algorithm. The NSF has supported the SHIM project under grant 0614799.
References 1. Holzmann, G.J.: The model checker SPIN. IEEE Transactions on Software Engineering 23(5) (May 1997) 279–294 2. Kahn, G.: The semantics of a simple language for parallel programming. In: Information Processing 74: Proceedings of IFIP Congress 74, Stockholm, Sweden, North-Holland (August 1974) 471–475 3. Hoare, C.A.R.: Communicating sequential processes. Communications of the ACM 21(8) (August 1978) 666–677 4. Buck, J.T.: Scheduling dynamic dataflow graphs with bounded memory using the token flow model. PhD thesis, University of California, Berkeley (1993). Available as UCB/ERL M93/69 5. Parks, T.M.: Bounded scheduling of process networks. PhD thesis, University of California, Berkeley (1995). Available as UCB/ERL M95/105 6. Edwards, S.A.: Experiences teaching an FPGA-based embedded systems class. In: Proceedings of the Workshop on Embedded Systems Education (WESE), Jersey City, New Jersey (September 2005) 52–58 7. Edwards, S.A.: SHIM: A language for hardware/software integration. In: Proceedings of Synchronous Languages, Applications, and Programming (SLAP). Electronic Notes in Theoretical Computer Science, Edinburgh, Scotland (April 2005) 8. Benveniste, A., Caspi, P., Edwards, S.A., Halbwachs, N., Guernic, P.L., de Simone, R.: The synchronous languages 12 years later. Proceedings of the IEEE 91(1) (January 2003) 64–83 9. Edwards, S.A., Tardieu, O.: Efficient code generation from SHIM models. In: Proceedings of Languages, Compilers, and Tools for Embedded Systems (LCTES), Ottawa, Canada (June 2006) 125–134 10. Edwards, S.A., Tardieu, O.: SHIM: A deterministic model for heterogeneous embedded systems. In: Proceedings of the International Conference on Embedded Software (Emsoft), Jersey City, New Jersey (September 2005) 37–44 11. Edwards, S.A., Tardieu, O.: SHIM: A deterministic model for heterogeneous embedded systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 14(8) (August 2006) 854–867
12. Tardieu, O., Edwards, S.A.: R-SHIM: Deterministic concurrency with recursion and shared variables. In: Proceedings of the International Conference on Formal Methods and Models for Codesign (MEMOCODE), Napa, California (July 2006) 202 13. Edwards, S.A., Zeng, J.: Static elaboration of recursion for concurrent software. In: Proceedings of the Workshop on Partial Evaluation and Program Manipulation (PEPM), San Francisco, California (January 2008) 71–80 14. Tardieu, O., Edwards, S.A.: Scheduling-independent threads and exceptions in SHIM. In: Proceedings of the International Conference on Embedded Software (Emsoft), Seoul, Korea (October 2006) 142–151 15. Berry, G., Gonthier, G.: The Esterel synchronous programming language: Design, semantics, implementation. Science of Computer Programming 19(2) (November 1992) 87–152 16. Edwards, S.A., Vasudevan, N., Tardieu, O.: Programming shared memory multiprocessors with deterministic message-passing concurrency: Compiling SHIM to Pthreads. In: Proceedings of Design, Automation, and Test in Europe (DATE), Munich, Germany (March 2008) 1498–1503 17. Vasudevan, N., Edwards, S.A.: Celling SHIM: Compiling deterministic concurrency to a heterogeneous multicore. In: Proceedings of the Symposium on Applied Computing (SAC), Volume III, Honolulu, Hawaii (March 2009) 1626–1631 18. Vasudevan, N., Edwards, S.A.: Static deadlock detection for the SHIM concurrent language. In: Proceedings of the International Conference on Formal Methods and Models for Codesign (MEMOCODE), Anaheim, California (June 2008) 49–58 19. Cimatti, A., Clarke, E.M., Giunchiglia, E., Giunchiglia, F., Pistore, M., Roveri, M., Sebastiani, R., Tacchella, A.: NuSMV version 2: An opensource tool for symbolic model checking. In: Proceedings of the International Conference on Computer-Aided Verification (CAV), Copenhagen, Denmark (July 2002). Volume 2404 of Lecture Notes in Computer Science, Springer, Berlin, pp. 359–364 20. Shao, B., Vasudevan, N., Edwards, S.A.: Compositional deadlock detection for rendezvous communication. In: Proceedings of the International Conference on Embedded Software (Emsoft), Grenoble, France (October 2009) 21. Vasudevan, N., Edwards, S.A.: Buffer sharing in CSP-like programs. In: Proceedings of the International Conference on Formal Methods and Models for Codesign (MEMOCODE), Cambridge, Massachusetts (July 2009) 22. Vasudevan, N., Singh, S., Edwards, S.A.: A deterministic multi-way rendezvous library for Haskell. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), Miami, Florida (April 2008) 1–12 23. Vasudevan, N., Edwards, S.A.: A determinizing compiler. In: Proceedings of Program Language Design and Implementation (PLDI), Dublin, Ireland (June 2009) 24. Vasudevan, N., Tardieu, O., Dolby, J., Edwards, S.A.: Compile-time analysis and specialization of clocks in concurrent programs. In: Proceedings of Compiler Construction (CC), York, United Kingdom (March 2009). Volume 5501 of Lecture Notes in Computer Science, Springer, Berlin, pp. 48–62
Chapter 5
A Module Language for Typing SIGNAL Programs by Contracts

Yann Glouche, Thierry Gautier, Paul Le Guernic, and Jean-Pierre Talpin
5.1 Introduction

Methodological guidelines for the design of real-time embedded systems advise validating specifications as early as possible. Moreover, in a refinement-based development methodology for large embedded systems, an iterative validation of each refinement or modification made to the initial specification, until the implementation of the system is finalized, is highly desirable. Additionally, cooperative component-based development requires using and assembling components, which have been developed by different suppliers, in a safe and consistent way [11, 17]. These components have to be provided with their conditions of use and with guarantees that they have been validated when these conditions are satisfied. These conditions of use and guarantees represent a notion of contract. Contracts are now often required as a useful mechanism for validation in robust software design. Design by Contract, as advocated in [26], is being made available for usual languages like C++ or Java. Assertion-based contracts express program invariants, pre- and post-conditions, as Boolean expressions that have to be true for the contract to be validated. We adopt here a different paradigm of contract to define a component-based validation technique in the context of a synchronous modeling framework. In our theoretical model, a component is represented by an abstract view of its behaviors. It has a finite set of input/output variables to cooperate with its environment. Behaviors are viewed as multi-set traces on the variables of the component. The abstract model of a component is thus a process, defined as a set of such behaviors. A contract is a pair (assumptions, guarantees). Assumptions describe properties expected by a component to be satisfied by the context (the environment) in which this component is used; conversely, guarantees describe properties that are satisfied by the component itself when the context satisfies the assumptions. Such
Y. Glouche, T. Gautier, P. Le Guernic, and J.-P. Talpin
INRIA, Centre de Recherche de Rennes, Campus de Beaulieu, Rennes, France
e-mail: [email protected]; [email protected]; [email protected]; [email protected]
a contract may be documentary; however, when a suitable formal model exists, contracts can be supplied to a formal verification tool. We want to provide designers with such a formal model, allowing "simple" but powerful and efficient computation on contracts. Thus, we define a novel algebraic framework to enable formal reasoning on contracts. The assumptions and guarantees of a component are defined as process-filters: assumptions filter the processes (sets of behaviors) a component may accept, and guarantees filter the processes a component provides. A process-filter is the set of processes, whatever their input and output variables are, that are compatible with some property (or constraint) expressed on the variables of the component. Foremost, we define a Boolean algebra to manipulate process-filters. This yields an algebraically rich structure that allows us to reason about contracts (to abstract, refine, combine and normalize them). This algebraic model is based on a minimalist model of execution traces, allowing one to adapt it easily to a particular design framework. A main characteristic of this model is that it allows one to precisely handle the variables of components and their possible behaviors. This is a key point. Indeed, assumptions and guarantees are expressed, as usual, by properties constraining or relating the behaviors of some variables. What has to be considered very carefully is thus the "compatibility" of such constraints with the possible behaviors of other variables. This is the reason why we introduce partial order relations on processes and on process-filters. Moreover, having a Boolean algebra on process-filters allows one to formally, unambiguously and finitely express complementation within the algebra. This is, in turn, a real advantage compared to related formalisms and models. We put this algebra to work for the definition of a general-purpose module system whose typing paradigm is based on the notion of contract. The type of a module is a contract holding assumptions made on, and guarantees offered by, its behaviors. It makes it possible to associate a module with an interface, which can be used in a variety of scenarios such as checking the composability of modules or efficiently supporting modular compilation. The corresponding module language is generic in that processes and contracts may be expressed in an external target language. In the context of real-time, safety-critical applications, we consider here the synchronous language SIGNAL to specify processes.
Organization

We start by highlighting some key features of our module system, considering the specification of a protocol for Loosely Time-Triggered Architectures, Sect. 5.2. This example is used throughout this chapter to illustrate our approach. We give an outline of our contract algebra, Sect. 5.3, and demonstrate its capabilities for logical and compositional reasoning on the assumptions and guarantees of component-based embedded systems. This algebra is used, Sect. 5.4, as a foundation for the definition of a strongly-typed module system: contracts are used to type components
with behavioral properties. Section 5.5 demonstrates the use of our module system by considering the introductory example and by illustrating its contract-based specification.
5.2 A Case Study

We illustrate our approach by considering a protocol that ensures a coherent system of logical clocks on top of Loosely Time-Triggered Architectures (LTTA). This protocol has been presented in [7]. We define contracts to characterize properties of this protocol.
5.2.1 Description of the Protocol

In general, a distributed real-time control system has a time-triggered nature simply because the physical system under control is bound to physics. An LTTA features quasi-periodic, non-blocking bus access and independent read-write operations. The LTTA is composed of three devices, a writer, a bus, and a reader (Fig. 5.1). Each device is activated by its own, approximately periodic, clock. At the $n$th clock tick (time $t_w(n)$), the writer generates the value $x_w(n)$ and an alternating flag $b_w(n)$ such that

$$b_w(n) = \begin{cases} \mathit{false} & \text{if } n = 0, \\ \lnot\, b_w(n-1) & \text{otherwise.} \end{cases}$$

Both values are stored in its output buffer, denoted by $y_w$. At any time $t$, the writer's output buffer $y_w$ contains the last value that was written into it:

$$y_w(t) = (x_w(n), b_w(n)), \quad \text{where } n = \sup\{n' \mid t_w(n') < t\}. \tag{5.1}$$

[Fig. 5.1 The three devices of the LTTA: the writer produces $(x_w, b_w)$ at ticks $t_w$; the bus forwards $(x_b, b_b)$ at ticks $t_b$; the reader consumes $(x_r, b_r)$ at ticks $t_r$]
At $t_b(n)$, the bus fetches $y_w$ and stores it in the input buffer of the reader, denoted by $y_b$. Thus, at any time $t$, the reader input buffer is defined by:

$$y_b(t) = y_w(t_b(n)), \quad \text{where } n = \sup\{n' \mid t_b(n') < t\}. \tag{5.2}$$

At $t_r(n)$, the reader loads the input buffer $y_b$ into the variables $x(n)$ and $b(n)$: $(x(n), b(n)) = y_b(t_r(n))$. Then, the reader extracts $x(n)$ iff $b(n)$ has changed. This defines the sequence $m$ of ticks:

$$m(0) = 0, \qquad m(n) = \inf\{k > m(n-1) \mid b(k) \neq b(k-1)\}, \qquad x_r(k) = x(m(k)). \tag{5.3}$$
In any execution of the protocol, the sequences $x_w$ and $x_r$ must coincide, i.e., $\forall n,\ x_r(n) = x_w(n)$. In [7] it is proved that this is the case iff the following conditions hold:

$$w \geq b \quad\text{and}\quad \left\lfloor \frac{w}{b} \right\rfloor \geq \frac{r}{b}, \tag{5.4}$$

where $w$, $b$ and $r$ are the respective periods of the clocks of the writer, the bus and the reader (for $x \in \mathbb{R}$, $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$). Conditions (5.4) are abstracted by conditions on the ordering of events. The first condition, $w \geq b$, is abstracted by the predicate:

$$w \geq b \iff \text{never two } t_w \text{ between two } t_b. \tag{5.5}$$

The abstraction of the second condition, $\lfloor w/b \rfloor \geq r/b$, requires the definition of the first instant (of the bus) at which the bus can fetch the $n$th writing: $\hat{t}_b(n) = \min\{t_b(p) \mid t_b(p) > t_w(n)\}$. The second condition is then restated as the requirement (5.6) that no two successive $\hat{t}_b$ can occur between two successive $t_r$:

$$\left\lfloor \frac{w}{b} \right\rfloor \geq \frac{r}{b} \iff \text{never two } \hat{t}_b \text{ between two successive } t_r. \tag{5.6}$$
Under the specific conditions (5.5) and (5.6), the correctness of the protocol reduces to the assumption:

$$\forall n \in \mathbb{N},\ \exists k \in \mathbb{N} \quad\text{s.t.}\quad \hat{t}_b(n) < t_r(k) \leq \hat{t}_b(n+1).$$

It guarantees that all written values are actually fetched by the bus ($\hat{t}_b(n)$ always exists, and $\hat{t}_b(n+1) \neq \hat{t}_b(n)$ since there is at least one instant $t_r(k)$ occurring in between them), and that all fetched values are actually read by the reader ($\hat{t}_b(n) < t_r(k) \leq \hat{t}_b(n+1)$): see Fig. 5.2.
Fig. 5.2 Correctness of the protocol
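To see the protocol run, here is a small discrete-time simulation; it is our own illustration (not part of [7] or of this chapter's tooling). The periods are chosen to satisfy conditions (5.4), and for simplicity ticks within one instant are ordered writer, then bus, then reader.

#include <stdio.h>

int main(void) {
    const int w = 4, b = 2, r = 3;       /* periods: w >= b, floor(w/b) >= r/b */
    int xw = 0, bw = 0;                  /* writer value and alternating flag */
    int yw_x = 0, yw_b = 0;              /* writer output buffer yw */
    int yb_x = 0, yb_b = 0;              /* reader input buffer yb */
    int last_b = 0;

    for (int t = 1; t <= 40; t++) {
        if (t % w == 0) {                /* writer tick tw */
            xw++;
            bw = !bw;
            yw_x = xw; yw_b = bw;        /* (5.1): store (xw, bw) in yw */
        }
        if (t % b == 0) {                /* bus tick tb */
            yb_x = yw_x; yb_b = yw_b;    /* (5.2): fetch yw into yb */
        }
        if (t % r == 0) {                /* reader tick tr */
            if (yb_b != last_b) {        /* (5.3): extract x iff flag changed */
                last_b = yb_b;
                printf("read %d\n", yb_x);  /* prints 1, 2, 3, ... */
            }
        }
    }
    return 0;
}

With these periods the reader's output stream reproduces the writer's exactly, as the correctness argument above predicts; shrinking w below b makes values disappear.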
5.2.2 Introduction to the Module Language

Considering first the writer and the bus, the protocol will be correct only if $w \geq b$, so that the data flow emitted by the bus equals the data flow emitted by the writer ($\forall n,\ x_b(n) = x_w(n)$). In the module language, a specification is designated by the keyword contract. It defines a set of input and output variables (the interface) subject to a contract. The interface defines the way the component interacts with its environment through its variables. In addition, it embeds properties that are modeled by a composition of contracts. For instance, the specification of a bus controller could be defined by the assumption $w \geq b$ and the guarantee $\forall n,\ x_b(n) = x_w(n)$. An implementation of the specification, designated by the keyword process, contains a compatible implementation of the above contract.
module WriterBus : WriterBusType = process input real w, b; boolean xw output boolean xb; (| ... |) end;
The specification of the properties we consider for the whole LTTA consists of two contracts. Each contract applies to a given component (bus or reader) of the LTTA. It is made of a clock relation as assumption and an equality of flows as guarantee. Instead of specifying two separate contracts, we define them as two instances of a generic one. To this end, we define an encapsulation mechanism to generically represent a class of specifications or implementations sharing a common pattern of behavior up to that of the parameters. In the example of the LTTA, for instance, we have used a functor to parameterize it with respect to its clock relations.
module type LTTAProperty =
  functor (real c1, c2)
    contract
      input boolean xwb
      output boolean xbr;
      assume c1 >= c2
      guarantee xbr = xwb
    end;

module type LTTAClockConstraints =
  contract
    input real w, b, r; boolean xw
    output boolean xr;
    LTTAProperty(w, b)(xw, xb) and
    LTTAProperty(floor(w/b), r/b)(xb, xr)
  end;
The generic contract, called LTTAProperty, is parameterized with two clocks. When the clock constraint associated with the context of the considered component (bus or reader) of the LTTA is respected, the preservation of the flows is ensured by this component. The contract of the LTTA is defined by the composition "and" of two applications of the generic LTTAProperty contract. Each application defines a property of the LTTA with its appropriate clock constraint. The composition defines the greatest lower bound of both contracts. Each application of the LTTAProperty functor produces a contract, which is composed with the other ones in order to produce the expected contract. A module is hence viewed as a pair M : I, consisting of an implementation M that is typed by (or viewed as) a specification I with a satisfiable contract. The semantics of the specification I, written [[I]], is a set of processes (in the sense of Sect. 5.3) whose traces satisfy the contract associated with I. The semantics [[M]] of the implementation M is a process contained in [[I]].
5.3 An Algebra of Contracts for Assume-Guarantee Reasoning

Section 5.3.1 introduces a suitably general algebra of processes. A contract (A,G) is viewed as a pair of logical devices filtering processes: the assumption A filters processes to select (accept or conversely reject) those that are asserted (accepted or conversely rejected) by the guarantee G. Process-filters are defined in Sect. 5.3.2 and contracts in Sect. 5.3.3. The proofs of properties presented in this section are provided in [12]. Section 5.3.4 discusses some related approaches for contracts.
5.3.1 An Algebra of Processes

We start with the definition of a suitable algebra for behaviors and processes. We deliberately choose an abstract definition of behavior as a function from a set of variable names to a domain of typed values. These typed values may themselves be functions of time to some domain of data values: this is the case, for instance, when we consider the SIGNAL language, where a behavior describes the trace of a discrete process.
Definition 1 (Behavior). Let $V$ be an infinite, countable set of variables, and $D$ a set of values; for $Y$ a nonempty finite set of variables included in $V$, a $Y$-behavior is a function $b : Y \to D$. The set of $Y$-behaviors is denoted by $B_Y = Y \to D$; $B_\emptyset = \emptyset$ denotes the set of behaviors (which is empty) on the empty set of variables. The notation $c_{|X}$ is used for the restriction of a $Y$-behavior $c$ to $X$, a (possibly empty) subset of $Y$.

A process is defined as a set of behaviors on a given set of variables.

Definition 2 (Process). For $X$ a finite set of variables, an $X$-process $p$ is a nonempty set of $X$-behaviors. The unique $\emptyset$-process, on the empty set of variables, is denoted by $\Omega = \{\emptyset\}$. It can be seen as the universal process; it has no effect when composed with other processes. The empty process, defined by the empty set of behaviors, is denoted by $\emptyset$. It can be seen as the null process; when composed (intersected) with other processes, it always results in the empty process. The set of $X$-processes is denoted by $P_X = \mathcal{P}(B_X) \setminus \{\emptyset\}$ and $P^\emptyset_X = P_X \cup \{\emptyset\}$. The set of all processes is denoted by $P = \bigcup_{X \subseteq V} P_X$ and $P^\emptyset = P \cup \{\emptyset\}$. For an $X$-process $p$, the domain $X$ of its behaviors is denoted $\mathit{var}(p)$, and $\mathit{var}(\emptyset) = V$.

Complement, restriction and extension of a process have the expected definitions:

Definition 3 (Complement, restriction and extension). For $X$ a finite set of variables, the complement $\widetilde{p}$ of a process $p \in P_X$ is defined by $\widetilde{p} = B_X \setminus p$. Also, $\widetilde{\emptyset} = B_X$. For $X$, $Y$ finite sets of variables such that $X \subseteq Y \subseteq V$, $q_{|X} = \{c_{|X} \mid c \in q\}$ is the restriction $q_{|X} \in P_X$ of $q \in P_Y$, and $p^{|Y} = \{c \in B_Y \mid c_{|X} \in p\}$ is the extension $p^{|Y} \in P_Y$ of $p \in P_X$. Also, $\emptyset_{|X} = \emptyset$ and $\emptyset^{|V} = \emptyset$.

Note that the extension of $p \in P_X$ to $Y \supseteq X$ is the process on $Y$ that has the same constraints as $p$. The set $P^\emptyset_X$, equipped with union, intersection and complement, is a Boolean algebra with supremum $B_X$ and infimum $\emptyset$. The extension operator induces a partial order on processes, such that $p \preceq q$ if $q$ is an extension of $p$ to the variables of $q$; the relation $\preceq$, used to define filters, is studied below.

Definition 4 (Process extension relation). The process extension relation $\preceq$ is defined by: $(\forall p \in P)(\forall q \in P)\; (p \preceq q) \iff ((\mathit{var}(p) \subseteq \mathit{var}(q)) \wedge (p^{|\mathit{var}(q)} = q))$.

Thus, if $p \preceq q$, $q$ is defined on more variables than $p$; on the variables of $p$, $q$ has the same constraints as $p$, and its other variables are free. This relation extends to $P^\emptyset$ with $\emptyset \preceq \emptyset$.

Property 1. $(P^\emptyset, \preceq)$ is a poset. In this poset, the upper set of a process $p$, called its extension upper set, is the set of all its extensions; it is denoted by $p^{\uparrow} = \{q \in P \mid p \preceq q\}$.
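A tiny instance of these definitions, ours rather than the chapter's, may help fix intuitions. Let $D = \{0,1\}$, $X = \{x\}$ and $Y = \{x,y\}$. The $X$-process $p = \{[x \mapsto 0]\}$ has the extension

$$p^{|Y} = \{[x \mapsto 0, y \mapsto 0],\ [x \mapsto 0, y \mapsto 1]\} \in P_Y,$$

so $p \preceq p^{|Y}$: the extension constrains $x$ exactly as $p$ does and leaves $y$ free, and restriction recovers $p$, i.e., $(p^{|Y})_{|X} = p$.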
Fig. 5.3 Controlled (left) and non-controlled (right) variable y in a process q
To study properties of extension upper sets, we characterize semantically the set of variables that are constrained by a given process: a process $q \in P$ controls a variable $y$ if $y$ belongs to $\mathit{var}(q)$ and $q$ is not equal to the extension on $\mathit{var}(q)$ of its projection on $\mathit{var}(q) \setminus \{y\}$. Formally, a process $q \in P$ controls a variable $y$, written $q \rhd y$, iff $y \in \mathit{var}(q)$ and $q \neq (q_{|\mathit{var}(q)\setminus\{y\}})^{|\mathit{var}(q)}$. A process $q \in P$ controls a variable set $X$, written $q \rhd X$, iff $(\forall x \in X)(q \rhd x)$. Also, $\rhd$ is extended to $P^\emptyset$ with $\emptyset \rhd V$. This is illustrated in Fig. 5.3 (left): there is some behavior $b$ in $q$ that has the same restriction on $\mathit{var}(q)\setminus\{y\}$ as some behavior $c$ in $B_{\mathit{var}(q)}$ such that $c$ does not belong to $q$; thus $q$ is strictly included in $(q_{|\mathit{var}(q)\setminus\{y\}})^{|\mathit{var}(q)}$. We define a reduced process (the key concept used to define filters) as a process that controls all of its variables.

Definition 5 (Reduced process). A process $p \in P^\emptyset$ is reduced iff $p \rhd \mathit{var}(p)$.

Reduced processes are minimal in $(P, \preceq)$. We denote by $\mathring{q}$, called the reduction of $q$, the (minimal) process such that $\mathring{q} \preceq q$ ($p$ is reduced iff $p = \mathring{p}$).

Property 2. The complement $\widetilde{p}$ of a nonempty process $p$ strictly included in $B_{\mathit{var}(p)}$ is reduced iff $p$ is reduced; then $\widetilde{p}$ and $p$ control the same set of variables $\mathit{var}(p)$.

The extension upper set $\mathring{p}^{\uparrow}$ of the reduction of $p$ is composed of all the sets of behaviors, defined on variable sets that include the variables controlled by $p$, as maximal processes (for union of sets of behaviors) that have exactly the same constraints as $p$ (variables that are not controlled by $p$ are also not controlled in the processes of $\mathring{p}^{\uparrow}$). We also observe that $\mathit{var}(\mathring{q})$ is the greatest subset of variables such that $\mathring{q} \rhd \mathit{var}(\mathring{q})$. Then we define the inclusion lower set of a set of processes to capture all the subsets of behaviors of these processes. Let $R \subseteq P^\emptyset$; $R^{\downarrow}$ is the inclusion lower set of $R$ for $\subseteq$, defined by $R^{\downarrow} = \{p \in P^\emptyset \mid (\exists q \in R)(p \subseteq q)\}$.
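Continuing the same toy domain (again our own illustration), take $q = p^{|Y} = \{[x \mapsto 0, y \mapsto 0],\ [x \mapsto 0, y \mapsto 1]\}$. Then $q$ controls $x$ (behaviors with $x = 1$ are excluded) but not $y$, since extending its projection on $\{x\}$ reconstructs $q$ exactly: $(q_{|\{x\}})^{|\{x,y\}} = q$. Its reduction is the $\{x\}$-process $\mathring{q} = \{[x \mapsto 0]\}$, which controls all of its variables and satisfies $\mathring{q} \preceq q$.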
5.3.2 An Algebra of Filters

In this section, we define a process-filter as the set of processes that satisfy a given property. We define an order relation $\sqsubseteq$ on the set $\Phi$ of process-filters. With this relation, $(\Phi, \sqsubseteq)$ is a Boolean algebra. A process-filter $R$ is a subset of $P^\emptyset$ that filters processes: it contains all the processes that are "equivalent" with respect to some constraint or property, so that all processes in $R$ are accepted or all of them but $\Omega$ are rejected. A process-filter is built from a unique generator process by extending it to larger sets of variables and then by including subprocesses of these "maximal allowed behavior sets".

Definition 6 (Process-filter). A set of processes $R$ is a process-filter iff $(\exists r \in P^\emptyset)\;((r = \mathring{r}) \wedge (R = r^{\uparrow\downarrow}))$. The process $r$ is a generator of $R$ ($R$ is generated by $r$).

The process-filter generated by the reduction of a process $p$ is denoted by $\widehat{p} = \mathring{p}^{\uparrow\downarrow}$. The generator of a process-filter $R$ is unique; we refer to it as $\mathring{R}$. Note that $\Omega$ generates the set of all processes (including $\emptyset$) and belongs to all filters. Formally, for all $p, r, s \in P^\emptyset$, we have:

$$(p \in \widehat{r}) \implies (\mathit{var}(\mathring{r}) \subseteq \mathit{var}(p)), \qquad \widehat{r} = \widehat{s} \iff \mathring{r} = \mathring{s}, \qquad \Omega \in \widehat{r} \iff \widehat{r} = P^\emptyset.$$
Figure 5.4 illustrates how a process-filter is generated from a process $p$ (depicted by the bold line) in two successive operations. The first operation consists of building the extension upper set of the process: it takes all the processes that are compatible with $p$ and that are defined on a larger set of variables. The second operation proceeds using the inclusion lower set of this set of processes: it takes all the processes that are defined by subsets of behaviors of processes in the extension upper set (in other words, those processes that remain compatible when adding constraints, since adding constraints removes behaviors). We denote by $\Phi$ the set of process-filters. We call strict process-filters the process-filters that are neither $P^\emptyset$ nor $\{\emptyset\}$. The filtered variable set of a process-filter $R$ is $\mathit{var}(R)$, defined by $\mathit{var}(R) = \mathit{var}(\mathring{R})$.
Fig. 5.4 Example of process-filter
We define an order relation on process-filters, which we call relaxation, and write $R \sqsubseteq S$ to mean that $S$ is a relaxation of $R$.

Definition 7 (Process-filter relaxation). For $R$ and $S$ two process-filters, let $Z = \mathit{var}(R) \cup \mathit{var}(S)$. The relation "$S$ is a relaxation of $R$", written $R \sqsubseteq S$, is defined by:

$$(R \sqsubseteq S) \iff (\mathring{R}^{|Z} \subseteq \mathring{S}^{|Z}), \qquad \{\emptyset\} \sqsubseteq S, \qquad (R \sqsubseteq \{\emptyset\}) \iff (\{\emptyset\} = R).$$
jV
jV O
two process-filters, V D var(R) [ var(S), RV D R and SV D S . Conjunction R u S, disjunction R t S and complement e R are respectively defined by: O
‚ …„ ƒ R u S D .RV \ SV /" # ,
fg u R D fg,
O
‚ …„ ƒ R t S D .RV [ SV /" # ; e O e R D R" # ;
fg t R D R,
e
fg D P ? ;
f? D fg: P
Let us comment on the definitions of these operators. The conjunction of two strict process-filters $R$ and $S$, for instance, is obtained by first building the extensions of the generators $\mathring{R}$ and $\mathring{S}$ on the union of the sets of their controlled variables; then the intersection of these processes, which is also a process (a set of behaviors), is considered; since this operation may result in some variables becoming free (not controlled), the reduction of this process is taken; and finally, the result is the process-filter generated by this reduction. The process-filter conjunction $R \sqcap S$ of two strict process-filters $R$ and $S$ is the greatest process-filter $T = R \sqcap S$ that accepts all processes that are accepted by both $R$ and $S$. The same mechanism, with union, is used to define disjunction. The process-filter disjunction $R \sqcup S$ of two strict process-filters $R$ and $S$ is the smallest process-filter $T = R \sqcup S$ that accepts all processes that are accepted by $R$ or by $S$. And the complement of a strict process-filter $R$ is the process-filter generated by the complement of its generator $\mathring{R}$. Finally, we state a main result for process-filters: process-filters form a Boolean algebra.

Theorem 1. $(\Phi, \sqsubseteq)$ is a Boolean algebra with $P^\emptyset$ as 1, $\{\emptyset\}$ as 0, and complement $\widetilde{R}$.
5.3.3 An Algebra of Contracts

From process-filters, we define the notion of assume/guarantee contract and propose a refinement relation on contracts.
Fig. 5.5 A process p satisfying a contract (A,G)
Definition 8 (Contract). A contract $C = (A, G)$ is a pair of process-filters. The variable set of $C = (A, G)$ is defined by $\mathit{var}(C) = \mathit{var}(A) \cup \mathit{var}(G)$. $\mathcal{C} = \Phi \times \Phi$ is the set of contracts.

Usually, an assumption $A$ is an assertion on the behavior of the environment (it is typically expressed on the inputs of a process) and thus defines the set of behaviors that the process has to take into account. The guarantee $G$ defines properties that should be guaranteed by a process running in an environment whose behaviors satisfy $A$. A process $p$ satisfies a contract $C = (A, G)$ if all its behaviors that are accepted by $A$ (i.e., that are behaviors of some process in $A$) are also accepted by $G$. Figure 5.5 depicts a process $p$ satisfying the contract $(A, G)$ ($\widehat{p}$ is the process-filter generated by the reduction of $p$). This is made more precise and formal by the following definition.

Definition 9 (Satisfaction). Let $C = (A, G)$ be a contract; a process $p$ satisfies $C$, written $p \models C$, iff $(\widehat{p} \sqcap A) \sqsubseteq G$.

Property 3. $p \models C \iff \widehat{p} \sqsubseteq (\widetilde{A} \sqcup G)$.

We define a preorder relation that allows contracts to be compared. A contract $(A_1, G_1)$ is finer than a contract $(A_2, G_2)$, written $(A_1, G_1) \leq (A_2, G_2)$, iff all processes that satisfy the contract $(A_1, G_1)$ also satisfy the contract $(A_2, G_2)$:

Definition 10 (Satisfaction preorder). $(A_1, G_1) \leq (A_2, G_2)$ iff $(\forall p \in P)\;((p \models (A_1, G_1)) \implies (p \models (A_2, G_2)))$.

The preorder on contracts satisfies the following property:

Property 4. $(A_1, G_1) \leq (A_2, G_2)$ iff $(\widetilde{A_1} \sqcup G_1) \sqsubseteq (\widetilde{A_2} \sqcup G_2)$.

Refinement of contracts is further defined from the satisfaction preorder:

Definition 11 (Refinement of contracts). A contract $C_1 = (A_1, G_1)$ refines a contract $C_2 = (A_2, G_2)$, written $C_1 \preccurlyeq C_2$, iff $(A_1, G_1) \leq (A_2, G_2)$, $(A_2 \sqsubseteq A_1)$ and $G_1 \sqsubseteq (A_1 \sqcup G_2)$.

Refinement of contracts amounts to relaxing assumptions and reinforcing promises under the initial assumptions. The intuitive meaning is that, for any $p$ that satisfies a contract $C$, if $C$ refines $D$ then $p$ satisfies $D$. Our relation of refinement formalizes substitutability. Among contracts that could be used to refine an
existing contract $(A_2, G_2)$, we choose those contracts $(A_1, G_1)$ that "scan" more processes than $(A_2, G_2)$ ($A_2 \sqsubseteq A_1$) and that guarantee fewer processes than those of $A_1 \sqcup G_2$. The refinement relation can be expressed as follows in the algebra of process-filters:

Property 5. $(A_1, G_1) \preccurlyeq (A_2, G_2)$ iff $A_2 \sqsubseteq A_1$, $(A_2 \sqcap G_1) \sqsubseteq G_2$ and $G_1 \sqsubseteq (A_1 \sqcup G_2)$.

The refinement relation $\preccurlyeq$ defines the poset of contracts, which is shown to be a lattice. In this lattice, the union (or disjunction) of contracts is defined by their least upper bound and the intersection (or conjunction) of contracts is defined by their greatest lower bound. These operations provide two compositions of contracts.

Lemma 2 (Composition of contracts). Two contracts $C_1 = (A_1, G_1)$ and $C_2 = (A_2, G_2)$ have a greatest lower bound $C = (A, G)$, written $C_1 \sqcap C_2$, defined by:

$$A = A_1 \sqcup A_2 \quad\text{and}\quad G = (A_1 \sqcap \widetilde{A_2} \sqcap G_1) \sqcup (\widetilde{A_1} \sqcap A_2 \sqcap G_2) \sqcup (G_1 \sqcap G_2),$$

and a least upper bound $D = (B, H)$, written $C_1 \sqcup C_2$, defined by:

$$B = A_1 \sqcap A_2 \quad\text{and}\quad H = (\widetilde{A_1} \sqcap G_1) \sqcup (\widetilde{A_2} \sqcap G_2) \sqcup (A_1 \sqcap G_2) \sqcup (A_2 \sqcap G_1).$$

A Heyting algebra $H$ is a bounded lattice such that, for all $a$ and $b$ in $H$, there is a greatest element $x$ of $H$ such that the greatest lower bound of $a$ and $x$ is smaller than $b$ [5]. For all contracts $C_1 = (A_1, G_1)$ and $C_2 = (A_2, G_2)$, there is a greatest element $X$ of $\mathcal{C}$ such that the greatest lower bound of $C_1$ and $X$ refines $C_2$. Our contract algebra is therefore a Heyting algebra (in particular, it is distributive):

Theorem 2. $(\mathcal{C}, \preccurlyeq)$ is a Heyting algebra with supremum $(\{\emptyset\}, P^\emptyset)$ and infimum $(P^\emptyset, \{\emptyset\})$.

Note that it is not a Boolean algebra, since it is not possible in general to define a complement for each contract. The complement exists only for contracts of the form $(A, \widetilde{A})$, and it is then equal to $(\widetilde{A}, A)$.
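As a small sanity check on these definitions (our own derivation, not the chapter's), fix a guarantee $G$ and assumptions with $A_2 \sqsubseteq A_1$, i.e., $A_1$ relaxes $A_2$. Then

$$A_2 \sqsubseteq A_1, \qquad (A_2 \sqcap G) \sqsubseteq G, \qquad G \sqsubseteq (A_1 \sqcup G),$$

where the last two inequalities hold in any lattice, so Property 5 gives $(A_1, G) \preccurlyeq (A_2, G)$: a component offering the same guarantee under a relaxed assumption refines (and may thus substitute for) one specified under the stronger assumption.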
5.3.4 Contracts: Some Related Approaches

The use of contracts has been advocated for a long time in computer science [1, 16] and, more recently, has been successfully applied in object-oriented software engineering [25]. In object-oriented programming, the basic idea of design-by-contract is to consider the services provided by a class as a contract between the class and its caller. The contract is composed of two parts: requirements made by the class upon its caller and promises made by the class to its caller. In the theory of interface automata [2], the notion of interface offers benefits similar to our notion of contract and for the purpose of checking interface compatibility
between reactive modules. In that context, it is irrelevant to separate the assumptions from the guarantees, and only one contract needs to be, and is, associated with a module. Separation and multiple views become important in a more general-purpose software engineering context. Separation allows more flexibility in finding (contravariant) compatibility relations between components. Multiple views allow better isolation between modules and hence favor compositionality. This is discussed in Sect. 5.5.3. In our contract algebra, as in interface automata, a contract can be expressed with only one filter. To this end, a filtering equivalence relation [12] (which defines the equivalence class of contracts that accept the same set of processes) may be used to express a contract with only one guarantee filter and with its hypothesis filter accepting all processes (or, conversely, with only one hypothesis filter and a guarantee filter that accepts no process). In [6], a system of assume-guarantee contracts with similar aims of genericity is proposed. By contrast to our domain-theoretical approach, the EC SPEEDS project considers an automata-based approach, which is indeed dual but makes notions such as the complement of a contract more difficult to express from within the model. The proposed approach also leaves the role of variables in contracts unspecified, at the cost of some algebraic relations such as inclusion. In [18], the authors show that the framework of interface automata may be embedded into that of modal I/O automata. This approach is further developed in [27], where modal specifications are considered. This consists of labelling transitions that may be fired and others that must. Modal specifications are equipped with a parallel composition operator and a refinement order which induces a greatest lower bound. The glb allows addressing multiple-viewpoint and conjunctive requirements. With the experience of [6], the authors notice the difficulty of handling interfaces having different alphabets. Thanks to modalities, they propose different alphabet equalizations depending on whether parallel composition or conjunction is considered. Then they consider contracts as residuations G/A (the residuation is the adjoint of parallel composition), where assumptions A and guarantees G are both specified as modal specifications. The objectives of this approach are quite close to ours. Our model deals carefully with alphabet equalization. Moreover, using synchronous composition for processes and greatest lower bounds for process-filters and for contracts, our model captures both substitutability and multiple viewpoints (see Sect. 5.5.3). In [22], a notion of synchronous contracts is proposed for the programming language LUSTRE. In this approach, contracts are executable specifications (synchronous observers) paced in time by a clock (the clock of the environment). This yields an approach which is satisfactory for verifying safety properties of individual modules (which have a clock) but can hardly scale to the modeling of globally asynchronous architectures (which have multiple clocks). In [9], a compositional notion of refinement is proposed for a simpler stream-processing data-flow language. By contrast to our algebra, which encompasses the expression of temporal properties, it is limited to reasoning on input-output types and the input-output causality graph.
The system Jass [4] is somewhat closer to our motivations and solution. It proposes a notion of trace, and a language to talk about traces. However, it seems to evolve mainly towards debugging and defensive code generation. For embedded systems, we prefer to use contracts for validating composition, and hope to use formal tools once we have a dedicated language for contracts. As in JML [21], the notion of an agent with inputs/outputs does not exist in Jass; the language is based on class invariants and pre/post-conditions associated with methods. Our main contribution is to define a complete domain-theoretical framework for assume-guarantee reasoning. Starting from a domain-theoretical characterization of behaviors and processes, we build a Boolean algebra of process-filters and a Heyting algebra of contracts. This yields a rich structure which is (1) generic, in that it can be implemented or instantiated for specific models of computation; (2) flexible, in the way it can help structuring and normalizing expressions; and (3) complete, in the sense that all combinations of propositions can be expressed within the model. Finally, a temporal logic that is consistent with our model, such as ATL (Alternating-time Temporal Logic [3]), can directly be used to express assumptions about the context of a process and guarantees provided by that process.
5.4 A Module Language for Typing by Contracts

In this section, we define a module language to implement our contract algebra and apply it to the validation of component-based systems. For our applications, it is instantiated in the context of the synchronous language SIGNAL, yet it could equally be used in the context of related programming languages manipulating processes or agents. Its basic principle is to separate the interface, which declares properties of a program using contracts, from the implementation, which defines an executable specification satisfying them.
5.4.1 Syntax

We define the formal syntax of our module language. Its grammar is parameterized by the syntax of programs, noted p or q, which belong to the target specification or programming language. Names are noted x or y. Types t are used to declare parameters and variables in the interface of contracts. Assumptions and guarantees are described by expressions p and q of the target language. An expression exp manipulates contracts and modules to parameterize, apply, reference and compose them.
x; y p; q b; c WWD event j boolean j short j integer j : : : t WWD b j input b j output b j x j t t dec WWD t x Œ; dec def WWD module Œtype x D exp j module x ŒW t D exp j def I def ag WWD Œassume p guarantee qI j ag and ag j x.y / exp WWD contract decI ag end j process decI p end j functor .dec/ exp j exp and exp j x .exp / j let def in exp
161
(name) (process) (datatype) (type) (declaration) (definition)
(contract) (process) (contract) (process) (functor) (composition) (application) (scoping)
5.4.2 A Type System for Contracts and Processes

We define a type system for contracts and processes in the module language. In the syntax of the module language, contracts and processes are associated with names x. These names can be used to type formal parameters in a functor and become type declarations. Hence, in the type system, type names that stand for a contract or a process are associated with a module type T. A base module type is a tagged pair (I, C). The tag, or kind κ, distinguishes the type of a process from the type of a contract. The set I consists of pairs x : t that declare the types t of the input and output variables x. The contract C is a pair of predicates (p, q) that represent its assumptions p and guarantees q. The type of a functor (x : S).T consists of the name x and of the types S and T of its formal parameter and result.

S, T ::= t | (I, C) | S × T | (x : S).T   (type)
κ    ::= process | contract               (kind)
5.4.3 Subtyping as Refinement

We define a subtyping relation on types t to extend the refinement relation of the contract algebra to the type algebra. To that aim, we apply the subtyping principle S ≤ T (S is a subtype of T) to mean that the semantic objects denoted by S are contained in the semantic objects denoted by T (S refines T). Hence, a module of type T can safely be replaced by a module of type S. Figure 5.6 depicts a process P with one long input x and two short outputs a, b, and a process Q with two integer inputs x, y and one integer output a, such that P refines Q.
Fig. 5.6 Example of module refinement: P has input x : long and outputs a : short, b : short; Q has inputs x : integer, y : integer and output a : integer
Then the type of a module M encapsulating P is a subtype of the type of a module N encapsulating Q; thus M can replace N. The subtyping relation is defined inductively with axioms for datatypes, rules for declarations and rules for each kind of module type. The complete rules are described in [14]. In particular, a module type S = (I, C) is a subtype of T = (J, D), written S ≤ T, iff the inputs in J subtype those in I, the outputs in I subtype those in J, and the contract C refines D (written C ≼ D). We can interpret the relation C ≼ D as a means to register the refinement constraint between C and D in the typing constraints. It corresponds to a proof obligation in the target language, whose meaning is defined by the semantic relation [[C]] ⊆ [[D]] in the contract algebra, and whose validity may for instance be proved by model checking (the decidability of the subtyping relation essentially reduces to that of the refinement of contracts).
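For illustration, consider the module types of Fig. 5.6, assuming the datatype subtypings integer ≤ long and short ≤ integer: the inputs of Q, {x : integer, y : integer}, subtype the single input {x : long} of P (P demands no more than Q does, and integer ≤ long for the common input x), while the outputs of P, {a : short, b : short}, subtype the output {a : integer} of Q (P offers at least what Q offers, and short ≤ integer for the common output a). Hence the module type of M is a subtype of that of N.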
5.4.4 Composition of Modules

Just as the subtyping relation implements and extends the refinement relation of the contract algebra in the type algebra, the operations that define the greatest lower bound (glb) and least upper bound (lub) of two contracts are extended to module types by induction on the structure of types (we write ⋏ and ⋎ for the glb and lub of contracts, and ⊓ and ⊔ for those of module types):

Greatest lower bound:
– Modules:  (I, C) ⊓ (J, D) = (I ⊓ J, C ⋏ D)
– Products: (S × T) ⊓ (U × V) = (S ⊓ U) × (T ⊓ V)
– Functors: (x : S).T ⊓ (y : U).V = (x : (S ⊔ U)).(T ⊓ V[y/x])

Least upper bound:
– Modules:  (I, C) ⊔ (J, D) = (I ⊔ J, C ⋎ D)
– Products: (S × T) ⊔ (U × V) = (S ⊔ U) × (T ⊔ V)
– Functors: (x : S).T ⊔ (y : U).V = (x : (S ⊓ U)).(T ⊔ V[y/x])

Note that the intersection and union operators have to be extended to combine the sets of input and output declarations of a module. For the greatest lower bound, for instance, the resulting set of input variables is obtained by intersection of the input sets, and the type of an input variable is defined as the lub of the corresponding types. For the greatest lower bound again, the resulting set of output variables is obtained by the union of the output sets, and for common output variables,
their type is defined as the glb of the corresponding types. Similar rules are defined for the least upper bound (union of sets for inputs with glb of types, intersection of sets for outputs with lub of types). The composition of modules is made available in the module language through the "and" operator. The operands of this operator can be contracts, in which case the resulting type is the greatest lower bound ⋏ of both contracts; or they can be module expressions, in which case the resulting type is the greatest lower bound ⊓ of both module types.
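For instance (an illustration, again assuming short ≤ integer ≤ long): ({x : input integer, a : output short}, C) ⊓ ({x : input long, y : input integer, a : output integer}, D) = ({x : input long, a : output short}, C ⋏ D): the input set is the intersection {x}, typed with the lub long of integer and long, while the output set is the union {a}, typed with the glb short of short and integer.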
5.5 Application to SIGNAL

We illustrate the distinctive features of our contract algebra by reconsidering the specification of the LTTA and its translation into observers in the target language of our choice: the multi-clocked synchronous (or polychronous) data-flow language SIGNAL [19, 20].
5.5.1 Implementation of the LTTA

We model the LTTA protocol in SIGNAL by specifying the abstraction of all functionalities that write and read values on the bus. Refer to [8] for a description of the operators of the SIGNAL language. In particular, the cell operator y := x cell b init y0 memorizes in y the latest value carried by x, when x is present or when b is true. It is defined as follows:

(| y := x default (x$1 init y0)
 | y ^= x ^+ (when b)
 |)

We consider first the following basic functionalities:

process current = { boolean v0; }
  ( ? boolean wx; event c;
    ! boolean rx; )
  (| rx := (wx cell c init v0) when c |);

process interleave =
  ( ? boolean x, sx; )
  (| b := not (b$1 init false)
   | x ^= when b
   | sx ^= when (not b)
   |)
  where boolean b; end;

The process current defines a cell in which values are stored at the input clock ^wx and loaded on rx at the output clock c (the parameter v0 is used as
initial value). The other functionality is the process interleave, which desynchronizes the signals x and sx by synchronizing them to the true and false values of an alternating Boolean signal b. A simple buffer can be defined from these functionalities:

process shift_1 =
  ( ? boolean x;
    ! boolean sx; )
  (| interleave(x, sx)
   | sx := current{false}(x, ^sx)
   |);

It represents a one-place FIFO, the contents of which is the last value written into it. Thanks to the interleave, the output (signal sx) may only be read/retrieved strictly after it was entered. Also, there is no possible loss or duplication of data. For the purpose of the LTTA, a couple of signals have to be memorized together: the value to be transmitted and its associated Boolean flag. So we define the process shift_2, in which both values are memorized at some common clock:

process shift_2 =
  ( ? boolean x, b;
    ! boolean sx, sb; )
  (| interleave(x, sx)
   | sx := current{false}(x, ^sb)
   | sb := current{true}(b, ^sb)
   |);

The shift processes ensure there is necessarily some delay between the input of a data and its output. But for a more general buffer, some data may be lost if a new one is entered, and memorized values have to be sustained. Using shift_2 (for the LTTA), we may write:

process buffer =
  ( ? boolean x, b; event c;
    ! boolean bx, bb, sb; )
  (| (sx, sb) := shift_2(x, b)
   | bx := current{false}(sx, c)
   | bb := current{true}(sb, c)
   |)
  where boolean sx; end;

The signal c provides the clock at which data are retrieved from the buffer. The clock of the output signal sb (which is the clock resulting from the internal shift) represents the clock of the first instants at which the buffer can fetch a new value. It will be used to express some assumptions on the protocol. Then the process ltta is decomposed into a reader, a bus and a writer:

process ltta =
  ( ? boolean xw; event cw, cb, cr;
    ! boolean xr; )
  (| (xb, bb, sbw) := bus(xw, writer(xw, cw), cb)
   | (xr, br, sbb) := reader(xb, bb, cr)
   |)
  where boolean bw, xb, bb, sbw, br, sbb; end;
Using the buffer process, the components have the following definition:

process writer =
  ( ? boolean xw; event cw;
    ! boolean bw; )
  (| bw ^= xw ^= cw
   | bw := not(bw$1 init true)
   |);

process bus =
  ( ? boolean xw, bw; event cb;
    ! boolean xb, bb, sbw; )
  (| (xb, bb, sbw) := buffer(xw, bw, cb) |);

process reader =
  ( ? boolean xb, bb; event cr;
    ! boolean xr, br, sbb; )
  (| (yr, br, sbb) := buffer(xb, bb, cr)
   | xr := yr when switched(br)
   |)
  where boolean yr; end;

The switched basic functionality selects the values for which the Boolean flag has alternated:

process switched =
  ( ? boolean b;
    ! boolean c; )
  (| zb := b$1 init true
   | c := (b and not zb) or (not b and zb)
   |)
  where boolean zb; end;
5.5.2 Contracts in SIGNAL

In this section, we specify assumptions and guarantees as SIGNAL processes representing generators of the corresponding process-filters. The behavior of the LTTA is correct if the data flow extracted by the reader is equal to the data flow emitted by the writer (∀n, x_r(n) = x_w(n)). It is the case if the following conditions hold:

w ⊴ b   and   ⌊w⌋_b ⊴ r

where x ⊴ y means that there are never two events of x between two successive events of y, and ⌊w⌋_b(n) denotes the first instant at which the bus can fetch the nth writing (both constraints are restated below).
We will consider this property as a contract to be satisfied by a given implementation of the protocol. Here, we again use the SIGNAL language to specify this contract, with the help of clock constraints or of signals used as observers [15]. In general, the generic structure of observers specified in contracts finds a direct and compositional translation into the synchronous multi-clocked model of computation of SIGNAL [20]. Indeed, a subtlety of the SIGNAL language is that
an observer not only talks about the value, true or false, of a signal, but also about its status, present or absent. Considered as observers, the assumption and guarantee of the contract for LTTA could be described as follows:
A_LTTA = (w ⊴ b) ∧ (⌊w⌋_b ⊴ r);   G_LTTA = (x_r(n) = x_w(n)).

For instance, G_LTTA is true when x_r(n) = x_w(n) and it is false when x_r(n) ≠ x_w(n). By default, it is absent when the equality cannot be tested. Notice that the complement of an event (a given signal, e.g. xr, is present and true) is that it is absent or false. The signal G_LTTA is present and true iff xr is present and the condition x_r(n) = x_w(n) is satisfied. For a trace of the guarantee G_LTTA, the set of possible traces corresponding to its complement ¬G_LTTA is infinite (and dense), since ¬G_LTTA is not defined on the same clock as G_LTTA. For example, for G_LTTA = 1 0 1 0 1 0 1 0 1, ¬G_LTTA may be 0 0 0 0 0 or 0 1 1 1 0 1 1 0 1 1 0 1 0 1 ...

Let cw, cb and cr be the signals representing, respectively, the clocks of the writer, the bus and the reader. If the signal sbw has the clock of the first instants at which the bus can fetch a new value (from the writer) – it has the same period as cw but is shifted from it – the constraint "never two t_w between two successive t_b" (w ⊴ b) can be expressed in SIGNAL by: cb ^= sbw ^+ cb. If the signal sbb has the clock of the first instants at which the reader can fetch a new value (from the bus), and its values represent the values of the Boolean flag transmitted along the communication path, then the constraint "never two ⌊w⌋_b between two successive t_r" (⌊w⌋_b ⊴ r) – recall that ⌊w⌋_b(n) is the first instant at which the bus can fetch the nth writing – can be expressed in SIGNAL by: cr ^= (when switched(sbb)) ^+ cr. Then the assumptions of the contract for the LTTA may be expressed as the synchronous composition of the above two clock constraints.

Consider now the property that has to be verified: ∀n, x_r(n) = x_w(n). Let xr and xw represent the corresponding signals. The property that these two signals represent the same data flows (recall that they do not have the same clock) can be expressed in SIGNAL by comparing xr (the output of the reader) with a signal – call it xok – which is the output of a FIFO queue on the input signal xw, such that xok can be freely resynchronized with xr. The signal xok can be defined as xok := fifo_3(xw), with fifo_3 a FIFO with enough memory so that the clock of xr is not indirectly constrained when xr and xok are synchronized:
process fifo_3 =
  ( ? boolean x;
    ! boolean xok; )
  (| xok := shift_1(shift_1(shift_1(x))) |);

The observer of the guarantee is expressed as: obs := (xr = xok).
This contract can be used as a type, spec_LTTA, for specifying a LTTA protocol:

module type spec_LTTA =
  contract input boolean xw;
           event cw, cb, cr
           output boolean xr, sbw, sbb;
    assume
      (| cb ^= sbw ^+ cb
       | cr ^= cr ^+ (when switched(sbb))
       |)
    where
      process switched = ...
    end;
    guarantee
      (| xok := fifo_3(xw)
       | obs := xr = xok
       |)
    where boolean xok;
      process fifo_3 = ...
    end;
  end;

A possible implementation of this protocol is the one described in Sect. 5.5.1 using the SIGNAL language:

module ltta : spec_LTTA =
  process input boolean xw;
          event cw, cb, cr
          output boolean xr, sbw, sbb;
    (| (xb, bb, sbw) := bus(xw, writer(xw, cw), cb)
     | (xr, br, sbb) := reader(xb, bb, cr)
     |)
  where boolean bw, xb, bb, br;
    process writer =
      ( ? boolean xw; event cw;
        ! boolean bw; )
      (| ... |);
    process bus =
      ( ? boolean xw, bw; event cb;
        ! boolean xb, bb, sbw; )
      (| ... |);
    process reader =
      ( ? boolean xb, bb; event cr;
        ! boolean xr, br, sbb; )
      (| ... |);
  end;
  end;
Needless to say, a sophisticated solver, based for instance on Presburger arithmetic, would help us verify the consistency of the LTTA property. Nonetheless, an implementation of the above LTTA specification, for the purpose of simulation, can straightforwardly be derived. As a by-product, it defines an observer which may be used as a proof obligation against an effective implementation of the LTTA controller, to verify that it implements the expected LTTA property. Alternatively, it may be used as a medium to synthesize a controller enforcing the satisfaction of the specified property on the implementation of the model.
5.5.3 Salient Properties of Contracts in the Synchronous Context

In the context of component-based or contract-based engineering, refinement and substitutability are recognized as fundamental requirements [10]. Refinement allows one to replace a component by a finer version of it. Substitutability allows one to implement every contract independently of its context of use. These properties are
essential for considering an implementation as a succession of refinement steps, until the final implementation. As noticed in [27], other aspects might be considered in a design methodology: in particular, shared implementations for different specifications, multiple viewpoints and conjunctive requirements for a given component. Considering the synchronous composition of SIGNAL processes, the satisfaction relation ⊨ of contracts and the greatest lower bound ⋏ as a composition operator for contracts, we have the following properties:

Property 6. Let p, q ∈ P be two processes and C1, C2, C'1, C'2 ∈ C be contracts.

(1) C1 ≼ C2 ⟹ ((p ⊨ C1) ⟹ (p ⊨ C2))
(2) C1 ≼ C2 ⟺ ((p ⊨ C1) ⟹ (p ⊨ C2))
(3) ((C'1 ≼ C1) ∧ (C'2 ≼ C2)) ⟹ ((C'1 ⋏ C'2) ≼ (C1 ⋏ C2))
(4) ((p ⊨ C1) ∧ (q ⊨ C2)) ⟹ ((p | q) ⊨ (C1 ⋏ C2))
(5) ((p ⊨ C1) ∧ (p ⊨ C2)) ⟺ (p ⊨ (C1 ⋏ C2))
(1) and (2) relate to refinement and implementation; (3) and (4) allow for substitutability in composition; (5) addresses multiple viewpoints. (1) and (2) illustrate the substitutability induced by the refinement relation. For relation (1), if a contract C1 refines a contract C2, then a process p which satisfies C1 also satisfies C2. Consequently, the set of processes satisfying C1 being included in the set of processes satisfying C2, a component which satisfies C2 can be replaced by a component which satisfies C1. For relation (2), a contract C1 is finer than a contract C2 if and only if the processes which satisfy C1 also satisfy C2. (3) and (4) illustrate the substitutability in composition. For relation (3), if a contract C'1 refines a contract C1 and a contract C'2 refines a contract C2, then the greatest lower bound of C'1 and C'2 refines the greatest lower bound of C1 and C2. Relation (4) expresses that a subsystem can be developed in isolation. Then, when developed independently, subsystems can be substituted for their specifications and composed as expected. If a SIGNAL process p satisfies a contract C1 and a SIGNAL process q satisfies a contract C2, then the synchronous composition of p and q satisfies the greatest lower bound of C1 and C2. Thus, each subsystem of a component can be analyzed and designed with its specific frameworks and tools. Finally, the composition of the subsystems satisfies the specification of the component. Property (4) can be illustrated on the LTTA example as follows: define the implementation of the LTTA as a functor parameterized by two components, bus and reader, respectively associated with the types (i.e., contracts) busType and readerType, such that the greatest lower bound of busType and readerType is equal to the type spec_LTTA associated with the LTTA implementation (a sketch is given at the end of this section). (5) illustrates the notion of multiple viewpoints: a process p satisfies a contract C1 and a contract C2 if and only if p satisfies the greatest lower bound of contracts C1 and C2. This property is a solution for the need for modularity coming from the concurrent development of systems by different teams using different
frameworks and tools. An example is the concurrent handling of the safety or reliability aspects and the functional aspect of a system. Other aspects may have to be considered too. Each of these aspects requires specific frameworks and tools for its analysis and design. Yet, they are not totally independent but rather interact. The issue of dealing with multiple aspects or multiple viewpoints is thus essential.
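Returning to property (4) and the LTTA, a hedged sketch of the functor-based decomposition in the module language, where busType and readerType are the hypothetical contract types mentioned above:

module ltta_gen =
  functor (busType bus, readerType reader)
    process input boolean xw;
            event cw, cb, cr
            output boolean xr, sbw, sbb;
      (| (xb, bb, sbw) := bus(xw, writer(xw, cw), cb)
       | (xr, br, sbb) := reader(xb, bb, cr)
       |)
    where boolean xb, bb, br;
      process writer = ...
    end;

Typing an application of ltta_gen then yields the proof obligation that busType ⋏ readerType refines (here, equals) spec_LTTA.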
5.5.4 Implementation

The module system described in this chapter, embedding data-flow equations defined in SIGNAL, has been implemented in OCaml. It produces a proof tree that consists of (1) an elaborated SIGNAL program, which hierarchically renders the structure of the system described in the original module expressions; (2) a static type assignment, which is sound and complete with respect to the module type inference system; (3) a proof obligation consisting of refinement constraints, which are compiled as an observer or a temporal property in SIGNAL. The property is then passed on to SIGNAL's model-checker, Sigali [24], which can prove or disprove that it is satisfied by the generated program. Satisfaction implies that the type assignment and the produced SIGNAL program are correct with respect to the initially intended specification. The generated property may, however, be used for other purposes. One is to use the controller synthesis services of Sigali [23] to automatically generate a SIGNAL program that enforces the property on the generated program. Another, in the case of an infinite-state system (e.g., over numbers), would be to generate defensive simulation code in order to produce a trace if the property is violated.
5.6 Conclusion

Starting from an abstract characterization of behaviors as functions from variables to a domain of values (Booleans, integers, series, sets of tagged values, continuous functions), we introduced the notion of process-filter to formally characterize the logical device that filters behaviors from processes, much like the assumption and guarantee of a contract do. In our model, a process p fulfils its requirements (or satisfies a contract) (A, G) if either it is rejected by A (i.e., if A represents assumptions on the environment, these are not satisfied for p) or it is accepted by G. The structure of process-filters is a Boolean algebra and the structure of contracts is a Heyting algebra. These rich structures allow for reasoning on contracts with great flexibility to abstract, refine and combine them. In addition, the negation of a contract can formally be expressed from within the model. Moreover, contracts are not limited to expressing safety properties, as is the case in most related frameworks, but encompass the expression of liveness properties [13]. This is again due to the central notion of process-filter. Our model deals with constraints
or properties possibly expressed on different sets of variables, and takes variable equalization into account when combining them. In this model, assumption and guarantee properties are not necessarily restricted to formulas in some logic, but are rather considered as sets of behaviors (generators of process-filters). Note that such a process can represent a constraint expressed in some temporal logic. We introduced a module system based on the paradigm of contracts for a synchronous multi-clocked formalism, SIGNAL, and applied it to the specification of a component-based design process. The paradigm we put forward is to regard a contract as the behavioral type of a component and to use it for the elaboration of the functional architecture of a system, together with a proof obligation that validates the correctness of assumptions and guarantees made while constructing that architecture.

Acknowledgement Partially funded by the EADS Foundation.
References

1. Abadi, M., Lamport, L.: Composing specifications. ACM Transactions on Programming Languages and Systems 15(1), 73–132 (1993)
2. de Alfaro, L., Henzinger, T.A.: Interface automata. ACM SIGSOFT Software Engineering Notes 26(5), 109–120 (2001)
3. Alur, R., Henzinger, T.A., Kupferman, O.: Alternating-time temporal logic. Journal of the ACM 49(5), 672–713 (2002)
4. Bartetzko, D., Fischer, C., Möller, M., Wehrheim, H.: Jass – Java with assertions. Electronic Notes in Theoretical Computer Science 55(2), 1–15 (2001)
5. Bell, J.L.: Boolean algebras and distributive lattices treated constructively. Mathematical Logic Quarterly 45, 135–143 (1999)
6. Benveniste, A., Caillaud, B., Passerone, R.: A generic model of contracts for embedded systems. Tech. Rep. 6214, INRIA Rennes (2007)
7. Benveniste, A., Caspi, P., Le Guernic, P., Marchand, H., Talpin, J.P., Tripakis, S.: A protocol for loosely time-triggered architectures. In: J. Sifakis, S.A. Vincentelli (eds.) EMSOFT '02: Proceedings of the Second International Conference on Embedded Software, Lecture Notes in Computer Science, vol. 2491, pp. 252–265. Springer, Berlin (2002)
8. Besnard, L., Gautier, T., Le Guernic, P., Talpin, J.P.: Compilation of polychronous data flow equations. In this book
9. Broy, M.: Compositional refinement of interactive systems. Journal of the ACM 44(6), 850–891 (1997)
10. Doyen, L., Henzinger, T.A., Jobstmann, B., Petrov, T.: Interface theories with component reuse. In: EMSOFT '08: Proceedings of the 8th ACM International Conference on Embedded Software, pp. 79–88. ACM (2008)
11. Edwards, S., Lavagno, L., Lee, E.A., Sangiovanni-Vincentelli, A.: Design of embedded systems: formal models, validation, and synthesis. Proceedings of the IEEE 85(3), 366–390 (1997)
12. Glouche, Y., Le Guernic, P., Talpin, J.P., Gautier, T.: A Boolean algebra of contracts for logical assume-guarantee reasoning. Tech. Rep. 6570, INRIA Rennes (2008)
13. Glouche, Y., Talpin, J.P., Le Guernic, P., Gautier, T.: A Boolean algebra of contracts for logical assume-guarantee reasoning. In: 6th International Workshop on Formal Aspects of Component Software (FACS 2009) (2009)
14. Glouche, Y., Talpin, J.P., Le Guernic, P., Gautier, T.: A module language for typing by contracts. In: E. Denney, D. Giannakopoulou, C.S. Păsăreanu (eds.) Proceedings of the First NASA Formal Methods Symposium, pp. 86–95. NASA Ames Research Center, Moffett Field, CA, USA (2009)
15. Halbwachs, N., Lagnier, F., Raymond, P.: Synchronous observers and the verification of reactive systems. In: AMAST '93: Proceedings of the Third International Conference on Methodology and Software Technology, pp. 83–96. Springer, Berlin (1994)
16. Hoare, C.A.R.: An axiomatic basis for computer programming. Communications of the ACM 12(10), 576–580 (1969)
17. Kopetz, H.: Component-based design of large distributed real-time systems. Control Engineering Practice 6(1), 53–60 (1997)
18. Larsen, K.G., Nyman, U., Wasowski, A.: Modal I/O automata for interface and product line theories. In: R. De Nicola (ed.) ESOP, Lecture Notes in Computer Science, vol. 4421, pp. 64–79. Springer, Berlin (2007)
19. Le Guernic, P., Gautier, T., Le Borgne, M., Le Maire, C.: Programming real-time applications with SIGNAL. Proceedings of the IEEE 79(9), 1321–1336 (1991)
20. Le Guernic, P., Talpin, J.P., Le Lann, J.C.: Polychrony for system design. Journal for Circuits, Systems and Computers 12(3), 261–304 (2003)
21. Leavens, G.T., Baker, A.L., Ruby, C.: JML: A notation for detailed design. In: H. Kilov, B. Rumpe, W. Harvey (eds.) Behavioral Specifications of Businesses and Systems, pp. 175–188. Kluwer, Dordrecht (1999)
22. Maraninchi, F., Morel, L.: Logical-time contracts for reactive embedded components. In: EUROMICRO, pp. 48–55. IEEE Computer Society (2004)
23. Marchand, H., Bournai, P., Le Borgne, M., Le Guernic, P.: Synthesis of discrete-event controllers based on the Signal environment. Discrete Event Dynamic Systems: Theory and Applications 10(4), 325–346 (2000)
24. Marchand, H., Rutten, E., Le Borgne, M., Samaan, M.: Formal verification of programs specified with Signal: application to a power transformer station controller. Science of Computer Programming 41(1), 85–104 (2001)
25. Meyer, B.: Object-Oriented Software Construction (2nd ed.). Prentice-Hall, New York (1997)
26. Mitchell, R., McKim, J., Meyer, B.: Design by Contract, by Example. Addison Wesley Longman, Redwood City, CA (2002)
27. Raclet, J.B., Badouel, E., Benveniste, A., Caillaud, B., Passerone, R.: Why are modalities good for interface theories? In: Proc. of the 9th International Conference on Application of Concurrency to System Design (ACSD'09), pp. 119–127. IEEE Computer Society Press (2009)
Chapter 6
MRICDF: A Polychronous Model for Embedded Software Synthesis

Bijoy A. Jose and Sandeep K. Shukla
6.1 Introduction

Safety-critical applications require embedded software that can guarantee deterministic output and on-time results during execution. Synchronous programming languages are by nature focused on providing the programmer with deterministic output at fixed points in time [1, 2]. Synchronous programming languages such as Esterel [3], LUSTRE [4], SIGNAL [5], etc., have successfully generated sequential software for embedded applications targeting avionics, rail transport, etc. The underlying formalism of these languages is robust, and the associated software tools provide correctness-preserving refinement from specification to implementation. But even within the class of synchronous programming languages, there are very distinctive Models of Computation (MoC) [6]. Esterel and LUSTRE use an external global tick (or global clock) as the reference for events that occur in the system. SIGNAL has an entirely different Model of Computation, where no assumptions are made on the global tick while designing the system. Each event has its own tick associated with it, and for compilation the analysis of the specification has to yield a totally ordered tick from the specification. In its absence, a variable (also known as a signal) is constructed which has an event happening in synchrony with every event occurring in the system. This signal can be viewed as a root clock of the system, with a hierarchical tree structure relating it to every other signal. Since events can occur on each signal at different rates, this MoC is also known as polychronous or multi-rate.
B.A. Jose () and S.K. Shukla FERMAT Lab, Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA e-mail: [email protected]; [email protected]
6.1.1 Motivation for Polychronous Formalism

Embedded software systems interact indefinitely with their environments, by sampling continuous inputs (such as temperature, pressure, etc.) or by getting discrete inputs from other digital systems. Hence all the inputs, internal variables and outputs of such systems can be thought of as infinite streams of values. If one had a global discrete clock, one could also associate time stamps with each occurrence of a value on each stream. But in the polychronous MoC, there is no assumption of an a priori global clock, and hence the occurrence of a new value on a variable is termed an event. We now discuss a few software design requirements, along with their possible implementations in polychronous and non-polychronous MoCs.

Example 1 (Priority-Merge). Let a and b be two signals, and suppose we need software that merges these two streams of events into one stream such that, if a and b have events at the same instant, the value from a's event is sent to the output. If b has an event at an instant when a does not, then the value from b's event is sent to the output.

In a single-clocked specification language such as Esterel, time is globally divided into discrete instants at which inputs are sampled and outputs are computed with no time delay. So the term 'instant' in Example 1 refers to a discrete time point in a linear global timeline, at which sampling a and b determines whether any events are present and, if so, what their values are. Thus if a is absent at the sampling instant and b is present, the output at that instant takes the value from the stream b. However, in a multi-clocked specification, no externally defined global linear timeline is present, and hence there is no unique implementation for Example 1. One implementation would be that the software waits for events on both inputs: if an event occurs on a, it outputs that event; if an event occurs on input b, it outputs b's event. But an 'instant' in the specification corresponds to a period of real execution; thus, if a has an event during the period after b has had one, one should output a's event. Since the length of this period is not predetermined, the implementation will be nondeterministic. Does that mean multi-clock specifications are problematic? The discussion of the next example gives more insight into this problem.

Example 2 (Mutually exclusive signals). Suppose that every time an event on a signal c arrives with value true, an event on a arrives. When an event on c arrives with value false, only an event on b arrives. Events on a or b are always accompanied by an event on c.

Here the implementation should remain suspended until an event occurs on c, and determine which of the two streams (a or b) must be sampled based on the value carried by this event. If no event on c occurs, the system does not react at all. Now we see the advantage of multi-rate specification. In a global-synchronization-based model of computation, every time the global tick occurs, at least c must be sensed, and hence the program needs to wake up repeatedly even when no event on c has occurred, leading to a less efficient implementation (e.g., the system cannot go to sleep mode).
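For concreteness, both requirements can be written in SIGNAL, the polychronous language discussed later in this chapter; a minimal sketch, assuming Boolean payloads:

process pmerge =
  ( ? boolean a, b;
    ! boolean o; )
  (| o := a default b |);   % prioritized merge: a's event wins when both are present %

process exclusive =
  ( ? boolean c, a, b; )
  (| a ^= when c            % a arrives exactly when c is present and true %
   | b ^= when (not c)      % b arrives exactly when c is present and false %
   |);

In the polychronous reading, pmerge by itself leaves the relative alignment of a and b unconstrained – precisely the nondeterminism discussed above – whereas exclusive pins the presence of a and b to the value of c, so an implementation may sleep until c arrives.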
Example 3 (Memory). Suppose you want a system that accepts a stream of values from signal a and outputs the same value. However, you want the system to have memory, so that if its output is sampled by another system and no event on a occurs at that sampling time, the output must provide the last output value (memorized).

In this case, the system reads a only if an event on a occurs or another system requests an output. If a does have an event, the new value is forwarded as output. If the checking was on a request, and a does not have an event during that request, the system sends the old memorized value. In a single-rate-driven system, at every global tick, the system must check for an event on a, even when not requested, as the notion of global tick is external to the specification of the program. In a multi-rate MoC, the only times the program wakes up are when a has a new event or when a request for an output arrives. (A SIGNAL sketch of this example is given after Example 4 below.) These examples show that having an a priori notion of global tick which is external to the specification results in the generation of inefficient code. If the inter-arrival time between environment events or requests is unpredictable, a global-tick-driven program will be less efficient. The multi-rate formalism of SIGNAL realizes this requirement and generates code with fewer sampling instances as opposed to single-clock frameworks. Recently, various multi-rate extensions to Esterel have been proposed as well [7], affirming the importance of multi-rate systems in the embedded systems context. Let us conclude this discussion with another example, where an implementation based on multi-rate specifications can be more efficient due to relative performance differences between different parts of a program.

Example 4 (Razor [8]). Consider a system that computes a function f on two synchronized input signals a and b. In order to check that the software implementation of f is correct, one can have multiple implementations of f, say Pf and Qf. The inputs taken from a and b are simultaneously passed to both Pf and Qf, and the outputs are compared as shown in Fig. 6.1. If the outputs are always equal, then Pf and Qf both implement the same function f, otherwise not (cf. program checking [9]). If we have global synchronization, or ticks at which inputs are fed to Pf and Qf, by the synchrony hypothesis the outputs from both Pf and Qf come out instantaneously and are compared instantaneously. Now consider the case where the implementation of Pf runs much faster than that of Qf. If the specification was in a global-tick-based formalism, either the inter-tick interval must be
Fig. 6.1 Comparison of two different implementations – Razor example [8]: inputs a and b feed both Pf and Qf, whose outputs c and d are checked for equivalence (c = d?)
max(WCET(Pf), WCET(Qf)), or Pf must be suspended until the tick at which Qf completes. This requires suspension code around Pf, watching at every tick whether Qf has completed, among other synchronizations. In the polychronous formalism, one can directly state that Pf and Qf must synchronize. A 'completion' signal from Qf is enough for this synchronization, and there is no need to check at predetermined intervals. This might remind the reader of the advantages of asynchronous over synchronous design in hardware. The above examples illustrate cases where global-synchronization-based specifications unnecessarily restrict some obvious optimization opportunities that can be exploited using a polychronous formalism. Software synthesis from the polychronous formalism involves a clock calculus of greater complexity, which is discussed in the next section.
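Before moving on, here are the promised SIGNAL sketches for Examples 3 and 4 (hypothetical signatures; cell is the memorizing operator recalled in Sect. 5.5.1):

process memory =
  ( ? boolean a, req;
    ! boolean o; )
  (| o := a cell req init false |);  % o is present when a occurs or req is true, carrying the latest value of a %

process razor =
  ( ? boolean c, d;
    ! boolean ok; )
  (| c ^= d             % state directly that Pf's and Qf's outputs synchronize %
   | ok := (c = d)      % equality observer; no polling at predetermined ticks %
   |);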
6.1.2 Software Synthesis from Polychronous Formalism

Synchronous languages such as Esterel and LUSTRE are relatively more popular than the polychronous-formalism-based language SIGNAL. Programming using single-clocked languages is easier, since having an external reference of time is more natural for users and follows conventional software synthesis methodology. Hence code synthesis tools like Esterel Studio [10] and SCADE [11] are used in industry more often than Polychrony [12]. SIGNAL's MoC is reactive; in other words, computation takes place only when an event occurs at any of the signals in the system. Signals related to these events are computed next, and once all data-dependent signals have completed execution, one round of computation is over. The reactive response happens over an interval of time, and the start of such an interval is considered to be the global software tick of the system, if one exists. This is very useful in low-power applications, where an embedded system has to be put to sleep once computation has been completed. An embedded system with a non-polychronous MoC has to check for new inputs at every tick of the external global clock, as it does not have any knowledge about events occurring in the system.

SIGNAL [13] and its toolset Polychrony [12] provide a framework for multi-rate data flow specification and are the inspiration for our Multi-Rate Instantaneous Channel Connected Data Flow (MRICDF) framework. However, MRICDF envisions a visual framework to represent the data flow, with control specification only through epoch relations (defined in later sections). The execution semantics and implementability analysis also differ from Polychrony. Polychrony analyzes the system by solving a set of clock equations [14], which only apply to Boolean signals. It uses ROBDDs to create canonical forms for all clock variables, and then constructs the clock hierarchy in the form of tree-like data structures. This makes the analysis unnecessarily complex. The MRICDF synthesis methodology seeks to remain in the Boolean equational domain for analysis, but uses various techniques, such as prime implicate generation, for obtaining a total order on the events occurring in the network.
This chapter is organized as follows. The semantics of MRICDF requires terminology that is introduced first, in Sect. 6.2. The MRICDF formalism is described in Sect. 6.3. Issues in implementing MRICDF as embedded software are discussed in Sect. 6.4. A visual framework called EmCodeSyn, which implements MRICDF models as embedded software, is discussed in Sect. 6.5. Sections 6.4 and 6.5 include case studies which demonstrate how MRICDF and EmCodeSyn are used to design safety-critical applications. The chapter is concluded in Sect. 6.6 with a discussion of possible extensions and future work.
6.2 Synchronous Structure Preliminaries

The semantics of MRICDF is explained in terms of synchronous structures. The preliminaries of synchronous structures are given in [15]. Here we review some of the definitions that are required to explain the semantics of MRICDF.

Definition 1 (Event, signals). An occurrence of a value is called an event. We denote by Ξ the set of all events that can occur in a synchronous system. A signal is a totally ordered set of events. If 'a' is a signal in the MRICDF formalism, the events on 'a' are denoted by E(a).

The occurrence of events on signals can be described in terms of causality relations. Here we define three relations on events, namely precedence, equivalence and preorder.

Definition 2 (Precedence, equivalence, preorder). We define a precedence ≺ on events such that, ∀a, b ∈ Ξ, a ≺ b if and only if the event a occurs before b. An equivalence relation ∼ exists between events a, b ∈ Ξ if and only if a ⊀ b and b ⊀ a. This means that neither event happens before the other; in other words, both events occur simultaneously. They are also called synchronous events. A relation '≼' on events a, b ∈ Ξ defines a preorder, such that a ≼ b if and only if a's occurrence is a prerequisite for the occurrence of b, or the two are synchronous. Here event b occurs after a or together with a.

With these relations on events in place, we can see that for two events a, b ∈ Ξ, a ≺ b means that a ≼ b and a ≁ b. From a code synthesis perspective, a data dependence is a binary relation on events of precedence type. We denote data dependence by the notation '↪'. So for any two events a, b ∈ Ξ, a ↪ b implies a ≼ b. In the absence of a global time reference, an instant or a point in time does not make sense. That is why an instant is defined as a maximal set of events, such that all events in that set are related by the '∼' relation with each other. In other words, all events that are synchronized with each other define an instant.

Definition 3 (Instant). Once the quotient of the set of all events is taken with respect to ∼, we obtain equivalence classes, each of which is an instant. The set of instants is denoted by Θ = Ξ/∼.
Each set S ∈ Θ contains events with the property ∀a, b ∈ S, a ∼ b. Since instants are equivalence classes, we can define a precedence order between them: for two sets S, T ∈ Θ, S ≺ T if and only if ∀(s, t) ∈ S × T, s ≺ t.

Definition 4 (Signal behavior). The behavior of a signal 'x' is an infinite sequence of totally ordered events. In other words, the behavior of signal 'x', or β(x), is E(x), such that for any two events a, b ∈ E(x), either a ≼ b or b ≼ a (total order).

One can also associate a function with a signal a, as σ(a) : ℕ → E(a), where ℕ is the set of natural numbers. Even though ℕ is a countably infinite set, it does not represent a discrete global timeline, but rather the sequence number of a value on signal a. Thus one can talk about the first event on a and the nth event of a, but the nth event of another signal b is not necessarily synchronized with that of a. Note that our definition of an instant eliminates the need for any 'absent' event, as opposed to the standard polychrony literature [13].

Definition 5 (Epoch). The epoch of a signal defines a possibly infinite set of instants at which the signal has events. Given a signal 'a', I(a) (or â) represents its epoch (or rate). Formally, I(a) = {θ | θ ∈ Θ ∧ (E(a) ∩ θ ≠ ∅)}.

Given two signals a and b, the intersection of I(a) and I(b) denotes those instants where both signals have events together, and the union of I(a) and I(b) denotes those instants where either of the two signals has an event.

Definition 6 (Synchronous signals). Two signals 'a' and 'b' are said to be synchronous if and only if I(a) = I(b). In other words, the signals have unique events that are part of the same equivalence classes S ∈ Θ.

For code synthesis, a partial order on events is required, relating the events occurring in a synchronous structure. For two events e, f, a relation e ≼ f can hold for two reasons. One possibility is the existence of a data dependence between the events, i.e., e ↪ f. The other possibility is the case where the two events are part of distinct instants e ∈ S, f ∈ T with S, T ∈ Θ and S ≺ T.
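As a small illustration: suppose signal a has events e1 ≺ e2, signal b has a single event f1, and e1 ∼ f1. Then Θ contains the instants S1 = {e1, f1} and S2 = {e2}, with S1 ≺ S2; the epochs are I(a) = {S1, S2} and I(b) = {S1}, and a and b are not synchronous since I(a) ≠ I(b).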
6.3 MRICDF Actor Network Model

Multi-Rate Instantaneous Channel-connected Data Flow (MRICDF) is a data flow network model consisting of several actors communicating with each other via instantaneous channels [16]. Each actor in the network contains input and output ports and a specified computation that is performed on the input signals. Events on different signals may occur at different instants, and there is no external global time reference. As a result, signals may or may not be synchronized
Fig. 6.2 An MRICDF network model: four actors connected by instantaneous channels, with inputs x and y, output w, and intermediate signals u, v and z
with each other. Synchronization information between events may either be implicit in the model or explicitly added as part of the model, as exogenous constraints. Actors can be of two types: primitive actors, or hierarchically composed composite actors. All actors compute their reaction to a trigger condition in an instant. The communication between actors using an instantaneous channel is assumed to complete within an instant. Figure 6.2 shows a simple MRICDF model consisting of four actors that communicate through their input and output signals. The MRICDF model here has signals x, y as inputs and w as output. The intermediate signals are u, v and z. This entire network can again be used as an actor in another MRICDF model; thus hierarchical specification is enabled. What triggers an actor to react (or compute) is not fixed a priori, and is computed using epoch analysis. These trigger conditions are derived from the model, along with some epoch constraints imposed by the modeler. If they can be derived for all actors in the model, then software synthesis is possible. If not, the user has to add extra constraints to help determine the triggering conditions. A triggering event at any actor can spark a series of computations, with event generation at intermediate signals and output interfaces throughout the network.
6.3.1 Primitive and Composite Actors for MRICDF

An actor is structurally represented by A = ⟨T, I, O, N, G⟩, where I and O are, respectively, the sets of input ports and output ports. T is the type of the actor, which is either one of the primitive actor types (B, S, M, F(n, m)), discussed later, or a composite actor type Tc. N is the set of internal actors of the actor A, where each n ∈ N is of type t ∈ T. G denotes an instantaneous-channel connection graph between the internal actors n ∈ N and the input and output ports (I, O) of the actor A. Primitive actors do not have any internal actors, and hence for primitive actors N is empty. So a primitive actor can be represented as A = ⟨Tp, I, O, ∅, G⟩, where Tp ∈ {B, S, M, F(n, m)}. The four types of primitive actors are described below:
Fig. 6.3 MRICDF primitive operators [16]: Function F(n,m) with inputs i1, ..., in and outputs o1, ..., om; Buffer B with input i and output o; Sampler S with inputs i1, i2 (boolean) and output o; Merge M with inputs i1, i2 and output o
1. A Buffer node B is a node which has a single input port and a single output port. An event arriving into a buffer node is consumed and stored in the buffer node, and the event that was consumed before this particular event is emitted on the output port. However, for the very first input event, a pre-specified default value is emitted. An input event and the corresponding output event belong to the same instant. A pictorial visualization of such a primitive node is in Fig. 6.3. If x is the input signal of a B-type actor and y is its output, then σ(y)(i) ∼ σ(x)(i), the value of σ(y)(i) is that of σ(x)(i−1) for all i ∈ ℕ ∧ i ≠ 1, and σ(y)(1) carries a default value.
2. A Sampler node S has two input ports and one output port. The second input port must always have Boolean-valued events. If the first input has an event on it, and if the second input has a 'true'-valued Boolean event belonging to the same instant, then the first input event is emitted on the output port. If events appear on its first input port, and no event or a 'false' Boolean input appears on its second input port at that instant, then no output is emitted. However, even the 'false'-valued Boolean event is consumed and discarded. Whenever an S node produces an output event, it belongs to the same instant as the two input events it consumes. A visualization of a sampler node is in Fig. 6.3. If x is the first input signal, c the second input and y the output signal, then the following must be true: ∀i ∈ ℕ, if there exist j ∈ ℕ and T ∈ Θ such that σ(x)(i), σ(c)(j) ∈ T and σ(c)(j) carries the Boolean value 'true', then there exists l ∈ ℕ such that σ(y)(l) ∈ T and the values of σ(y)(l) and σ(x)(i) are the same. If σ(c)(j) carries the value 'false', or if σ(x)(i) ∈ T but σ(c)(j) ∉ T for any j, then no l ∈ ℕ exists such that σ(y)(l) ∼ σ(x)(i). Also, if σ(x)(i) ∉ T for any i and σ(c)(j) ∈ T, there will be no l ∈ ℕ such that σ(c)(j) ∼ σ(y)(l).
3. A Prioritized merge node M also has two input ports and one output port. If an input event appears on the first input port, then it is emitted on the output port, irrespective of whether the second input port has any event appearing on it. However, in the absence of an event at the first input port, if the second input port has an event on it, then that event is emitted on the output port. The output event belongs to the same instant as the input event that is responsible for the output
Table 6.1 MRICDF operators and their epoch relations

MRICDF actor     | MRICDF expression                 | Epoch relation
Function F(n,m)  | {o1, ..., om} = F{i1, ..., in}    | ô1 = ... = ôm = î1 = ... = în
Buffer B         | o = B{i}                          | ô = î
Sampler S        | o1 = S{i1, i2}                    | ô1 = î1 ∩ [i2]
Merge M          | o1 = M{i1, i2}                    | ô1 = î1 ∪ î2
event. A visualization of a merge node is in Fig. 6.3. Let x and y be the input signals, and z be the output signal; then the following must be true: If there exist i, j ∈ ℕ such that σ(x)(i), σ(y)(j) ∈ S for S ∈ Θ, then there exists an l ∈ ℕ such that σ(z)(l) ∼ σ(x)(i) and they carry the same value. If for some j ∈ ℕ, σ(y)(j) ∈ S for some S ∈ Θ, and there is no i ∈ ℕ such that σ(x)(i) ∈ S, then there exists l ∈ ℕ such that σ(z)(l) ∼ σ(y)(j), where σ(z)(l) and σ(y)(j) carry the same value.
4. A Function node F(n, m) of input arity n and output arity m computes the function F on n events, one event arriving on each of its input ports, and produces m output events, one on each of the m output ports. Graphically, one can depict a function node F as a circle with n arrows coming into it and m arrows going out of it (see Fig. 6.3). Let x1, x2, ..., xn be the input signals and y1, y2, ..., ym be the output signals; then the following must hold: for each i ∈ ℕ and instant S ∈ Θ, ∀j ∈ {1, ..., n}, σ(xj)(i) ∈ S and ∀l ∈ {1, ..., m}, σ(yl)(i) ∈ S, and the vector of values carried by ⟨σ(y1)(i), σ(y2)(i), ..., σ(ym)(i)⟩ is obtained from F(σ(x1)(i), σ(x2)(i), ..., σ(xn)(i)).

Table 6.1 summarizes the four MRICDF primitive actors and their epoch relations. If the input port names of a Function F(n, m) node are ij for j = 1..n, and the output port names are ok for k = 1..m, then usage of the F(n, m) node implies that the events on these signals occur at the same instant. The epoch relation is: î1 = î2 = ... = în = ô1 = ô2 = ... = ôm. If i is the input port name of a Buffer primitive node B, and o is its output port, then î = ô is implied. If i1 and i2 are the input port names of a Sampler node, and o1 is its output port name, then ô1 = î1 ∩ [i2] is implied. [x] denotes the set of instants at which the port x has an event occurrence carrying the Boolean 'true' value. So for a Boolean variable/signal one can write x̂ = [x] ∪ [¬x] and [x] ∩ [¬x] = ∅. If i1 and i2 are the input port names of a prioritized merge node, and o1 is its output port name, then ô1 = î1 ∪ î2 is implied.
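Since MRICDF is inspired by SIGNAL, each primitive actor has a close SIGNAL counterpart; a hedged correspondence, written for Boolean ports and single outputs:

(| o := i $1 init v0 |)    % Buffer B: delayed value, same epoch as i %
(| o := i1 when i2 |)      % Sampler S: o present when i1 is present and i2 is true %
(| o := i1 default i2 |)   % Merge M: i1 takes priority over i2 %
(| o := i1 and i2 |)       % Function F(2,1), with f = and: all ports share an epoch %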
6.4 Embedded Software Synthesis from MRICDF

The implementation of MRICDF models as embedded software involves a detailed analysis of the model and several transformation steps to reach the code generation phase. A correct-by-construction software synthesis methodology is put in place to form a total order on all the events occurring in an MRICDF model. Obtaining a
unique order of execution is important for sequential software synthesis. Apart from these transformation steps, issues like deadlock detection have to be handled as well. We discuss the implementation issues of MRICDF models first, and then explain how the Epoch Analysis and master trigger identification steps are used to resolve them.
6.4.1 Issues in the Transformation from MRICDF Models to Embedded Software

The instantaneous reaction or communication is part of the abstract semantics of MRICDF. When software is synthesized, an instant has to be mapped to an interval of time. There may be structural cycles in the MRICDF network, which means one has to determine possible deadlocks during the real software execution. Since different instants will be mapped to different periods of real time, the only deadlock possibilities are cycles within an instant. So, given the data dependence and precedence relations, verification has to be done to check for any cyclic dependencies within an instant: recall that e ∼ f ⟺ ((e ≼ f) ∧ (f ≼ e)); also, e ↪ f could imply e ≼ f and f ↪ e could imply f ≼ e. This means that two events in the same instant may have a cyclic dependency, which may result in a deadlock while executing the software that mimics that instant. So one has to take the transitive closure of the ↪ relation and check for such cyclic dependencies. However, any other cyclic dependency that spans two different instants is not a deadlock.

Mapping the abstract instants into real-world time periods requires a special signal which has events in each equivalence class of Θ. This special signal, called the master trigger, needs to be found for a possible sequential implementation of an MRICDF model. The existence/non-existence of a master trigger, proofs for identifying a signal as a master trigger, and how to use it in the conversion into embedded software are explained in later sections.

The event set Ξ is an infinite set. Even its quotient, the set of instants with respect to ∼, is an infinite set. To generate software that is supposed to mimic the instants in their right precedence order, the synthesis tool must find a way to analyze the MRICDF specification in finite time.

Even if the two previously described issues are resolved, there could be a case where no deterministic sequential implementation is possible from the given MRICDF model. It might indicate that more constraints on the components or signals must be provided (exogenous constraints) for code synthesis to be possible.

For synthesizing sequential software from MRICDF models, a total order has to be achieved on Θ through the precedence operator '≺'. The software executes in rounds, each round corresponding to one Si ∈ Θ. Within each Si, there will be data dependences between events, implied by ↪. If ≺ is only a partial order, the specification does not have enough information for a unique sequential implementation, and hence the synthesized software will have a behavior that is one of the possible
behaviors from the specification. To obtain a unique behavior, there needs to be a master trigger signal having events in every instant of the MRICDF model.

Definition 7 (Master trigger [16]). Let M be an MRICDF model with Θ denoting the instants of the model. Let 't' be a signal with E(t) as its set of events, with the property that for each S ∈ Θ there exists an event eS ∈ E(t) such that eS ∈ S, and there is no S ∈ Θ which has a causal cycle with respect to the relation ↪. Then 't' is termed the master trigger for M, and M is said to be sequentially implementable.

Informally, the signal t can be used as a master trigger for the software implemented from MRICDF, to indicate each round. If a master trigger 't' exists, then E(t) is totally ordered, and it is implied that Θ is totally ordered with respect to ≺. Sequential implementability is analogous to the 'endochrony' concept in the SIGNAL literature [17]. An endochronous SIGNAL program is one which has a deterministic sequential schedule, or whose schedule can be statically computed. Multi-threaded implementability could be defined from the events occurring simultaneously, but here we limit the discussion to sequential code generation. Note that the existence of a master trigger is a necessary, but not a sufficient, condition for sequential implementability. If no master trigger has been identified, exogenous constraints will be applied on the MRICDF model to form a master trigger.
6.4.2 Epoch Analysis: Determining Sequential Implementability of MRICDF Models

In this section, we explain how the Epoch Analysis technique solves the three major implementation issues described in the previous section. Epoch Analysis involves a deadlock detection technique, a master trigger identification test on the MRICDF signals, and finally an optional 'endochronization' process. The endochronization step is taken when the master trigger identification test fails and an external master trigger needs to be inserted.

6.4.2.1 Deadlock Detection
In an actor network model, the generation of tokens (or events) happens at input ports and at Buffer output ports, where a stored event occurs. A deadlock situation can happen when a computation loop lies within an instant, and hence there should not be any Buffer actor within such a loop. In Fig. 6.4, an MRICDF network with a deadlock condition is shown. The computation of the output signal t is dependent on the current value of the same signal. Once the MRICDF network is translated into epoch equations, a value chain can be formed to identify deadlock situations. For the network in Fig. 6.4, the epoch equations are: x̂ = b̂, ŷ = t̂ ∩ [a], ẑ = x̂ ∩ [y], t̂ = ẑ ∩ [c]. Had any of the actors within the loop been a Buffer actor, the loop would have been broken, since its data dependence is between instants.
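A hedged SIGNAL rendering of this network (with not standing in for the Function actor) makes the instantaneous cycle explicit:

(| x := not b        % Function actor F: x has the epoch of b %
 | y := t when a     % y needs t within the same instant %
 | z := x when y     % z needs x and y within the same instant %
 | t := z when c     % t needs z within the same instant: cycle t -> y -> z -> t %
 |)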
Fig. 6.4 MRICDF model with deadlock: a Function actor F computes x from input b, and Sampler actors compute y from t and a, z from x and y, and t from z and c, forming a cycle within an instant

6.4.2.2 Master Trigger Identification
From the definition of master trigger/sequential implementability, for deterministic sequential software synthesis from MRICDF, a mapping from the sequence of rounds to software code has to be performed. It is required that all events of Si occur during the execution of the software before any event of Sj, where Si, Sj ∈ Θ and Si ≺ Sj. The synthesized sequential software must execute in rounds, each round corresponding to one Si ∈ Θ. Note that events in Si may have data dependences between them; hence events within a single Si must also execute in a way that preserves the order implied by ↪. To identify a master trigger which has events in all instants of an MRICDF model, the epoch equations of the MRICDF model are analyzed in terms of Boolean equations. For a Boolean signal x, [¬x] and [x] represent signals which have an event at the same instant as x with the Boolean value 'false' and 'true', respectively. The epoch relations between these signals are as follows: x̂ = [¬x] ∪ [x], [¬x] ∩ [x] = ∅. If a signal a has an event that belongs to T ∈ Θ, then in the Boolean domain b_a = true, else b_a = false. When Boolean equations are constructed using Boolean variables for each type of epoch relation, three types of Boolean equations can be obtained. For x̂ = ŷ, the Boolean equation becomes b_x = b_y. If x̂ = ŷ ∪ ẑ, then b_x = b_y ∨ b_z, and finally for x̂ = ŷ ∩ ẑ the Boolean equation is b_x = b_y ∧ b_z. For a Boolean signal c, there exist two additional Boolean equations, b_c = b_[c] ∨ b_[¬c] and b_[c] ∧ b_[¬c] = false. For signals of other types, such as integer, float, etc., their values are ignored, and in the Boolean domain they are represented by separate signals which denote their presence and absence. From such a system of Boolean equations, if there exists a variable b_x such that, when b_x = false, all variables in the equation system must be false for the equations to be satisfied, then its corresponding signal x is the master trigger. If no such b_x exists, then there is no master trigger signal, and hence the given MRICDF model is not sequentially implementable.
Theorem 1 (Master Trigger Identification [16]). Given an MRICDF model M, let B_M denote the system of Boolean equations obtained by converting each signal in terms of its Boolean signals. Then a signal x in M is a master trigger if and only
if there exists an input signal x in the model M such that its corresponding Boolean variable b_x in B_M has the property that if b_x is false, every other variable is false.

Proof sketch. Recall that a master trigger is one which has one event in every instant of the set of abstract instants. The instants are constructed by partitioning the union of events from all signals. Thus if x has to have an event in every instant, its instant set must be a superset of all instants in the model. In other words, every signal's instant set must be a subset of x's instant set. By definition, b_x should then be implied by every variable b_y for all signals y in the system. b_x encodes the presence of an event of x in an arbitrary instant T, and if b_y → b_x, then if y has an event in T, so does x. So if the solution set of this equation system implies that for every signal y in M, b_y → b_x, then when we set b_x to false, all left hand sides of those implications have to be set to false. □

Note that we are considering MRICDF models which are sequentially implementable at this point, where there is a master trigger signal available from the environment. There are cases where an internal signal has the highest epoch in the system, which is called oversampling in the SIGNAL language [18]. We do not consider models with oversampling as sequentially implementable at this point. Instead we add external information about input signals to create a master trigger.

6.4.2.3 Exogenous Constraints for Code Synthesis
For a master trigger signal to exist, there should be a signal which has events in all instants of the MRICDF network. For a simple Sampler actor, C = Sampler(A, X), where B = [X], the epoch equation is as follows: Ĉ = Â ∩ B̂. If the signals A and B are independent, there is a possibility that no master trigger exists for an MRICDF network with this single actor. Let a ∈ A be an event belonging to an instant where there is no b ∈ B. Similarly, there could be an event b ∈ B in an instant where there is no a ∈ A. Since C can have events only when both A and B have events, there are several instants of A and B that are not part of the instants of C. In short, there is no signal that has events in all instants of the network. Hence there is no master trigger. In MRICDF models with no master trigger, exogenous rate constraints have to be applied to construct an external one. For the Sampler actor, the exogenous constraints are x̂ = ŷ, â = [x], b̂ = [y]. Now the set of Boolean equations for the actor is as follows: b_x = b_y, b_a = b_[x], b_b = b_[y], b_x = b_[x] ∨ b_[¬x], b_y = b_[y] ∨ b_[¬y] and b_c = b_a ∧ b_b. In this system of Boolean equations, we can observe that the external signal x (or y) can become the master trigger, since when the Boolean variable b_x (or b_y) is set to false, all the variables become false. Now that a master trigger has been identified, a schedule for computing the signals in the network has to be constructed, which is called its follower set. To obtain the follower set from the set of Boolean equations, set the master trigger signal to true, simplify the Boolean equations and perform the master trigger identification process again. Repeated execution of these steps will form the follower set for the whole network, which is the order of execution.
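The follower-set loop just described can be sketched by reusing the master_trigger_candidates() helper from the previous fragment (again an illustration under our assumed encoding, not tool code): fix the chosen trigger to true, simplify, and re-run the identification on the remaining signals.

def follower_set(variables, equations):
    order, pending, fixed = [], list(variables), {}
    while pending:
        # specialize each equation with the signals already scheduled
        eqs = [(lambda env, eq=eq: eq({**env, **fixed}))
               for eq in equations]
        candidates = master_trigger_candidates(pending, eqs)
        if not candidates:       # no trigger: not sequentially implementable
            return None
        step = candidates[0]     # equal-rate candidates: arbitrary choice
        order.append(step)
        fixed[step] = True       # the trigger is present in this round
        pending.remove(step)
    return order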
6.4.3 Case Study of Producer–Consumer Problem

The Producer–Consumer problem, or the bounded-buffer problem, is a classic computer science example used to demonstrate synchronization problems between multiple processes. We select this example due to its inherent multi-rate nature and to demonstrate the hierarchical specification feature of MRICDF. Producer and consumer have separate rates of operation and they must be synchronized for read/write access to a single-stage buffer. A pseudo code representation of the model is shown in Listing 6.1. The SIGNAL representation of this model was given in [17]. The Producer and Consumer models are represented as individual composite actors connected to another actor PCmain which synchronizes them and also takes care of the storage of the data being sent. In the MRICDF representation of the example shown in Fig. 6.5, PCmain is separated into multiple composite actors, Cell and Activate. Figure 6.6 shows the MRICDF composite actor representing the Producer.
Listing 6.1 Pseudo code for Producer–Consumer model

Actor Producer (input boolean ptick; output integer dvalue) =
  (| counter   := Sampler((prevcount + 1) mod 7, ptick)
   | prevcount := Buffer(counter, 0) |)

Actor Consumer (input boolean ctick, u; integer bvalue;
                output integer dread; boolean v) =
  (| v     := PriorityMerge(Sample(true, u), false)
   | dread := PriorityMerge(Sample(bvalue, v), -1) |)

Actor PCmain (input boolean p, c;
              output integer pcdata; boolean pcvalid) =
  (| d  := Producer(p)
   | b  := PriorityMerge(c, Buffer(d, 0))
   | pc := PriorityMerge(Buffer(p, false), false)
   | (pcdata, pcvalid) = Consumer(c, pc, b) |)
Fig. 6.5 Producer–Consumer example
Fig. 6.6 Producer Actor model
Fig. 6.7 Cell Actor model
Fig. 6.8 Activate Actor model
The Producer actor performs a counter function whose value is sent to the output on every input trigger represented by ptick. So for every event in the signal ptick, the Producer emits a value at the port dvalue. The PCmain actor in the pseudocode combines the synchronization tasks along with the storage of the generated value. These tasks are separated and shown in Figs. 6.7 and 6.8. The Consumer actor is triggered by events arriving at the signal ctick. The aliases for the trigger signals of the Producer and Consumer actors are p and c respectively. The Activate MRICDF actor shown in Fig. 6.8 accepts these signals and produces a combined trigger for synchronizing all actors. This trigger signal pc denotes instants where there is either a production or a consumption of data from the buffer. In the Consumer actor shown in Fig. 6.9, pc is used to indicate whether valid data is present or absent in the buffer. The consumed data is obtained through the port b connected to the storage actor Cell. Figure 6.7 shows the structure of this single data storage block. For every Consumer tick c, either the newly generated value d from the Producer or the value stored in the Buffer is passed onto the output port b. There is a loop with a Buffer primitive actor which refreshes the previously stored value. Prior to the code synthesis stage, the master trigger has to be identified and a follower set has to be built. In this model, the master trigger has to be enforced
Fig. 6.9 Consumer Actor model
externally, since p and c are independent signals. The follower set will be decided on the basis of the values present at these ports. Once a unique sequential schedule is decided in the form of a follower set, code generation can be performed. More information about the producer–consumer model and other examples implemented using MRICDF is available in [19].
6.5 Visual Framework for Code Synthesis from MRICDF Specification

One of the goals of MRICDF is to popularize the use of the polychronous formalism among engineers in the embedded software field. Embedded Code Synthesis, or EmCodeSyn, is a software synthesis tool which provides an environment to specify MRICDF models and utilities to perform debugging and code generation from the specification [20]. In this section, we explain the design methodology of the EmCodeSyn tool and show, using a case study, each step in achieving code synthesis from an MRICDF specification. The design methodology of EmCodeSyn is shown in Fig. 6.10. The MRICDF model specified using the GUI is stored in a Network Information File (NIF). This file contains information about the types of actors, their inter-connections, the computation to be performed, and other GUI information such as placement of actors, lines drawn, etc. Next, Epoch Analysis is performed on the model, which breaks down the composite actors in terms of primitive actors and translates the model into epoch equations and their corresponding Boolean equations as explained in Sect. 6.4.2. Then the master trigger identification test based on Theorem 1 is performed to compute a sequential schedule for computation called the follower set. If a master trigger cannot be found within the system, exogenous rate constraints will have to be provided to force a master trigger on the MRICDF model. Once a follower set is established for the MRICDF model, code generation can begin. Section 6.4.2 described how
Fig. 6.10 EmCodeSyn design methodology
to perform a test for identifying a master trigger in an MRICDF model and how to provide exogenous information to force one in the absence of a master trigger. In this section, we explain how a prime implicate based strategy is used to identify a master trigger from a system of Boolean equations.
6.5.1 Prime Implicates for Master Trigger Computation

An implicate is defined as a sum term that covers max terms of a function. By covering a max term we mean that the sum term can be simplified into the individual max terms. A prime implicate is defined as a sum term that is not covered by another implicate of the function. Jackson and Pais [21] survey some of the prime implicate computation algorithms in the literature. Research into artificial intelligence and verification tools has contributed to the development of better algorithms and tools to compute Prime Implicates (PIs). De Kleer [22] used a specialized data structure called tries to improve the performance of prime implicate algorithms. These PI generation algorithms provide us with solutions to a set of Boolean clauses. In EmCodeSyn, Boolean equations for the signals in the network are generated. From this system of Boolean equations, SAT equations in Conjunctive Normal Form (CNF) can be formed. Once a PI generator has been used to solve these equations, the result is an array of solutions in their reduced form. Now if any of the prime implicates is a single positive literal, that particular signal is the master trigger. The selection of the master trigger helps in identifying the order of execution of actors, which is called the follower set of the model. This is computed by setting the current master trigger value to true and by repeating the PI generation process. The next set of PIs will be the candidates that follow the previous ones in the follower set, and so on. The prime implicate generator we use in our software tool was developed by Matusiewicz et al. [23].
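To make the PI-based test concrete, here is a naive enumeration sketch (ours; real generators such as the trie-based one of [23] are far more efficient, and the variable count here must stay tiny):

from itertools import combinations, product

def prime_implicates(variables, holds):
    # holds: env -> bool, the conjunction of all Boolean equations
    envs = [dict(zip(variables, bits))
            for bits in product([False, True], repeat=len(variables))]
    models = [e for e in envs if holds(e)]
    literals = [(v, pol) for v in variables for pol in (True, False)]
    def covers(clause, env):               # is the clause true under env?
        return any(env[v] == pol for v, pol in clause)
    implicates = [frozenset(c)
                  for n in range(1, len(literals) + 1)
                  for c in combinations(literals, n)
                  if all(covers(c, m) for m in models)]
    # prime implicates: no proper sub-clause is itself an implicate
    return [c for c in implicates
            if not any(o < c for o in implicates)]

# Function actor (second column of Listing 6.2): b = a, c = a, a+b+c = true.
holds = lambda e: (e["b"] == e["a"] and e["c"] == e["a"]
                   and (e["a"] or e["b"] or e["c"]))
pis = prime_implicates(["a", "b", "c"], holds)
singles = [v for c in pis if len(c) == 1 for (v, pol) in c if pol]
print(sorted(singles))   # ['a', 'b', 'c']; dropping output a leaves b or c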
6.5.2 Sequential Implementability of Multi-rate Specification Using Prime Implicates

Similar to the endochrony property in the SIGNAL/Polychrony framework, an equivalent property of sequential implementability exists in the MRICDF formalism, which guarantees that the scheduling of computation based on input events can be inferred correctly. In other words, the order in which inputs are read is known from the program itself. For deterministic code synthesis, either the MRICDF design has to be endochronous or exogenous constraints have to be applied to force sequential implementability of the design. Here we show how a PI based strategy is used to identify the master trigger for MRICDF primitive actors. Also, in the absence of a master trigger, we demonstrate how exogenous constraints are applied using the Prime Implicate method to force sequential implementability.
6.5.2.1 Endochronous Primitive Actors: Buffer and Function
The Buffer actor has the simplest of epoch equations, where the input and output rates are equated. Listing 6.2 shows the epoch equations based on Table 6.1 and the resulting SAT equations for PI generation in the first column. The union and intersection operations are represented by '+' and '&' respectively. A trivial solution exists where a 'false' value for all variables would result in the whole system giving a false output, as required by Theorem 1. To eliminate this solution, we add the constraint a + b = true. Each variable (a, b) is assigned a number in CNF format (1, 2) and a two variable system of SAT equations is formed.
Listing 6.2 Epoch and Boolean equations for endochronous actors Buffer and Function

MRICDF textual representation:  a = Buffer b
Epoch equations:    a = b, a+b = true
Boolean equations:  1: a, 2: b
p sat 2
(( +((1 2)(-1 -2))
   +(1 2) ))
Prime implicates: {a}, {b}
Master trigger: {b}

MRICDF textual representation:  a = b Function c
Epoch equations:    c = a, b = a, a+b+c = true
Boolean equations:  1: b, 2: a, 3: c
p sat 3
(( +((1 2)(-1 -2))
   +((3 2)(-3 -2))
   +(1 2 3) ))
Prime implicates: {a}, {b}, {c}
Master trigger: {b}, {c}
The SAT equations in each line correspond to the epoch equations shown just above them in Listing 6.2. For the external PI generator we used, SAT equations are expressed in sum-of-products form, so +((1 2)(-1 -2)) represents {(a · b) + (¬a · ¬b)}. The SAT equations are used to find the prime implicates {a} and {b}. Two single positive literals are obtained from the PI generation, which satisfies the sequential implementability requirement. Since a is an output port, it is not a candidate for master trigger. Hence b is chosen as the master trigger for the Buffer actor. The Function actor equates the rates of all of its input and output ports through its epoch equations. Listing 6.2 shows the epoch relations for a double input, single output Function actor in the second column. From the epoch equations a 3 variable system of Boolean equations is formed. The PI generator result contains all possible single literals {a}, {b} and {c}. From this endochronous result, we remove the output port {a} and narrow down the choices to {b} or {c}. Since they are of equal rate, the order of execution can be chosen arbitrarily.

6.5.2.2 Non-endochronous Primitive Actors: Sampler and Merge
The Sampler actor performs an intersection operation on the rates of two input signals to compute the rate of an output signal, as shown in Table 6.1. Listing 6.3 shows the epoch and Boolean equations of a Sampler actor with inputs b, c and output a.
Listing 6.3 Epoch and Boolean equations for non-endochronous actor Sampler

MRICDF textual representation:  a = b Sampler c
Epoch equations:
  a = b inter [c]
  c = [c] union [¬c]
  [c] inter [¬c] = false
  a+b+c = true
Boolean equations: 1: a, 2: b, 3: c, 4: [c], 5: [¬c]
p sat 5
(( +((1 (2 4))(-1 +(-2 -4)))
   +((3 +(4 5))(-3 (-4 -5)))
   +(-4 -5)
   +(1 2 3) ))
PI: {b, c}, {b, [c], [¬c]}, ...
Master trigger: {}

Endochronized epoch equations:
  a = b inter [c]
  c = [c] union [¬c]
  [c] inter [¬c] = false
  x = y
  b = [x]
  [c] = [y]
  x = [x] union [¬x]
  y = [y] union [¬y]
  [x] inter [¬x] = false
  [y] inter [¬y] = false
  a+b+c+x+y = true
Boolean equations: 1: a, 2: b, 3: c, .., 10: [y], 11: [¬y]
p sat 11
(( ... ))
PI: {x}, {y}, {b, c}, ...
Master trigger: {x}, {y}
The true valued Boolean input requires additional epoch equations to define the relation between the variables c, [c] and [¬c]. A 5 variable system of Boolean equations is solved to produce the PIs, but from Listing 6.3 we can see that none of the PIs is a single positive literal. Hence there are no master trigger candidates. To find a master trigger, exogenous constraints are forced on the system using the new variables x and y, as shown in the second column of Listing 6.3. The variables x and y are equated, but [x] and [y] are independent of each other; x and y act as the exogenous information about the input signals. The inputs of the Sampler actor are equated to these variables as b = [x] and [c] = [y]. In this way, we can simulate all combinations of presence/absence of b and [c] occurring at the input. Once the PIs are generated, {x} or {y} can be chosen as the master trigger. In the subsequent follower set, either [x] is true or [¬x] is true, and we can be sure whether to read b or not. Thus a master trigger and a follower set are established to avoid any ambiguity in the order of reading and writing signals for the Sampler actor. The Merge actor performs the union operation on the input signals to produce the output signal, as shown in Table 6.1. Listing 6.4 shows the epoch and Boolean equations of a Merge actor with inputs b, c and output a. A 3 variable system of Boolean equations is solved using the prime implicate generator, which identifies a candidate {b, c}. Since there is no single positive literal as a candidate, the model is not endochronous. The endochronized version of the Merge actor is shown in the second column of Listing 6.4. In a similar manner as in the Sampler case, new variables x, y are introduced with accompanying epoch equations. A 9 variable system of Boolean equations is solved to identify the PIs and the master trigger is chosen
Listing 6.4 Epoch and SAT equations for non-endochronous actor Merge

MRICDF textual representation:  a = b Merge c
Epoch equations:
  a = b union c
  a+b+c = true
Boolean equations: 1: a, 2: b, 3: c
p sat 3
(( +((1 +(2 3))(-1 (-2 -3)))
   +(1 2 3) ))
PI: {b, c}
Master trigger: {}

Endochronized epoch equations:
  a = b union c
  x = y
  b = [x]
  c = [y]
  x = [x] union [¬x]
  y = [y] union [¬y]
  [x] inter [¬x] = false
  [y] inter [¬y] = false
  a+b+c+x+y = true
Boolean equations: 1: a, 2: b, .., 8: [y], 9: [¬y]
p sat 9
(( ... ))
PI: {x}, {[x], [¬x]}, {y}, {b, c}, ...
Master trigger: {x}, {y}
from {x}, {y}. The value of [x] or [y] will tell us the presence/absence of b and c. If both are present, the order of reading them can be arbitrary since we are assured that the input will arrive. If only one is present, we can avoid waiting for the other signal. Effectively, we obtain a deterministic order for reading the signals.
6.5.3 Sequential Code Synthesis

Epoch Analysis analyzed the epochs of the individual signals in the network. A master trigger having events in all instants of the MRICDF model was identified from, or enforced on, the MRICDF model. The software will execute as rounds mapped to instants of the MRICDF model, with computation within each round following the order given in the follower set. The computation within each instant is made up of primitive actors interacting with each other. At the code generation phase of EmCodeSyn, each primitive actor is represented in terms of its equivalent C code. Figure 6.11 shows the equivalent C code for the 4 endochronized primitive actors in isolation. Endochronization allows us to know the order of reading the input signals a and x for the Merge actor. The MRICDF network is built by connecting the signals using these pieces of C code according to the scheduling order specified in the follower set. EmCodeSyn generates three C files for an MRICDF model. The Main file lists the computation according to the scheduling order obtained from the follower set. The Header file contains all the functions and primitives to be used, and the Function Definition file contains the code for each function that was declared. Note that the code generation stage is reached after computation of the master trigger and follower set. The template for the generated Main file is shown in Listing 6.5. Each iteration of the "while (1)" loop represents one round of computation, or an instant at the MRICDF abstraction level.
Fig. 6.11 MRICDF primitive actors represented in C code
Listing 6.5 Code generation format for Main C file

#include "syn_header.h"
int main(void) {
  // Select manual, random, or inputs from file
  while (1) {
    Read master trigger signal
    Compute follower set
    // Code for buffer, sampler, function, merge actors
    // Sends buffer's inputs to temporary values
    // Sends values of all the inputs and outputs
    Finish computation of follower set
  }
  return 0;
}
The scheduling of actors within the loop follows the order in the follower set of the MRICDF network. Each iteration of the loop begins with the master trigger signal and schedules actors according to the nature of the input signals. Finally, the buffer values and the output values are updated. To demonstrate the capabilities of EmCodeSyn, we take you through the process of designing an MRICDF model, creating its NIF, forming equations for Epoch Analysis and finally performing code generation. The model used is a simplified version of a height supervisory unit of a miniature aircraft (STARMAC). This will serve as a guide in following the EmCodeSyn design methodology.
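The round-based execution that the template of Listing 6.5 produces can be illustrated on the Producer actor of Listing 6.1. The following is a behavioral sketch in Python (EmCodeSyn itself emits C), with one loop iteration per instant:

def producer_rounds(pticks):
    prevcount = 0                    # Buffer(counter, 0): initial value 0
    outputs = []
    for ptick in pticks:             # one loop iteration = one round/instant
        if ptick:                    # master trigger present in this instant
            counter = (prevcount + 1) % 7   # Sampler fires with ptick
            outputs.append(counter)         # value emitted on port dvalue
            prevcount = counter             # Buffer refreshed at end of round
    return outputs

print(producer_rounds([True] * 9))   # [1, 2, 3, 4, 5, 6, 0, 1, 2]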
6.5.4 Case Study on Implementation of an MRICDF Model using EmCodeSyn

STARMAC is a multi-agent control project around a small Unmanned Aerial Vehicle (UAV) at Stanford University [24]. The model presented in this section is a simplified version of the height supervisory unit of the STARMAC UAV. The block diagram of the intended design is shown in Fig. 6.12. There is a supervisory phase which takes care of all computation and a monitoring phase which verifies that the height co-ordinate of the UAV is above a minimum. The design requirement would need a parallel implementation, as shown in Fig. 6.12a. A sequential implementation of the requirement can also perform the verification of the elevation co-ordinate if the monitoring phase is placed after the supervisory part. This particular case is shown in Fig. 6.12b. The MRICDF model for the requirement is shown in Fig. 6.13. The height supervisory unit is represented in terms of two composite actors combined with a Merge actor. The composite actors are designed using the GUI of EmCodeSyn and saved in the actor library. They are reused in the top level model of the STARMAC design
Fig. 6.12 Concurrent design and sequential implementation of STARMAC unit
Fig. 6.13 MRICDF model in EmCodeSyn for the STARMAC example
along with the Merge primitive actor. The composite actors are shown separately in Fig. 6.14. The function actors in Monitor and Supervisor implement the comparison operations F1: ZinM > Zm and F2: ZinS < ZrefS respectively. During Epoch Analysis, the epoch equations of these actors are generated. The epoch equations of the STARMAC model are shown in Listing 6.6. The union and intersection operations are represented as '+' and '&' respectively.
Fig. 6.14 MRICDF models of monitor and supervisor for STARMAC unit

Listing 6.6 Epoch equations for STARMAC model in EmCodeSyn

1   F2.o2 = F2.ZinS
2   F2.o2 = F2.ZrefS
3   S2.o3 = S2.i3 & [S2.i4]
4   S2.i4 = [S2.i4] + [¬S2.i4]
5   true = ¬[S2.i4] + ¬[¬S2.i4]
6   F1.o1 = F1.ZinM
7   F1.o1 = F1.ZrefM
8   S1.o4 = S1.i1 & [S1.i2]
9   S1.i2 = [S1.i2] + [¬S1.i2]
10  true = ¬[S1.i2] + ¬[¬S1.i2]
11  M3.ZoutM = M3.i6 + M3.Zm
12  M2.Zout = M2.ZoutS + M2.ZoutM
13  M1.ZoutS = M1.i5 + M1.ZrefS
14  F2.o2 = S2.i4
15  S2.o3 = M1.i5
16  F1.o1 = S1.i2
17  S1.o4 = M3.i6
18  M3.ZoutM = M2.ZoutM
19  M1.ZoutS = M2.ZoutS
20  S2.i3 = F2.ZinS
21  F1.ZinM = F2.ZinS
22  F1.ZinM = S1.i1
23  M1.i10 = F2.ZrefS
The epoch equations in lines 3 and 8 show the Sampler epoch equations of the actors S2 and S1, while the Merge actors are represented in lines 11, 12 and 13. The Boolean signal S2.i4 is separated into a true value signal [S2.i4] and a false value signal [¬S2.i4] in lines 4 and 5. From an examination of the MRICDF epoch equations and from Fig. 6.14, it can be observed that the epochs of Zin, ZinS, ZinM, ZrefM and ZrefS are equated due to the Function epoch relations. The only other independent input to the system is the minimum elevation to be maintained, Zm. Due to these two independent inputs to the system, the MRICDF model is non-endochronous. An external master trigger has to be computed. EmCodeSyn adds exogenous constraints x and y to the system. Here x = y, [x] = F2.ZinS, [y] = M3.Zm. So based on the values of [x] and [y], the independent inputs can be read and the follower set can be computed. The follower set of the system will be as follows: (1) {x}, {y}; (2) {[x], [y]} or {F2.ZinS, M3.Zm}; (3) {[S1.i2], [S2.i4]}; and (4) M2.Zout. Each set in the follower set shown contains signals which can be replaced with a different signal of the same rate in the network. In other words, signals with the same rate are computed in the same set, but are not all shown here to maintain readability. A few examples in the second set are the signals (F2.ZrefS, F2.o2, S2.i3, S2.i4), which have the same rate as F2.ZinS or [x]. With a sequential scheduling order in place for all signals in the network, code synthesis can be performed. The generated output files and intermediate files of this model can be found in [19].
6.6 Conclusions and Future Direction

An alternate polychronous model of computation, MRICDF, was discussed in this chapter. MRICDF provides a visual representation for the primitives used in SIGNAL in the form of actors in a synchronous data flow network. These actors interact within the network through instantaneous communication channels. The transformation of MRICDF models to embedded software goes through an Epoch Analysis step which looks for deadlocks and for sequential implementability. Sequential implementability of a model relies on finding a master trigger, a signal within the network which can act as a master tick. A unique sequential implementation is achieved using a prime implicate based master trigger identification technique. The existence of a master trigger is proved and a sequential schedule is created in the form of a follower set. EmCodeSyn, a visual framework for designing MRICDF models, is also discussed. EmCodeSyn provides the utilities for each step in the transformation of the specification into embedded software. The capabilities of MRICDF and EmCodeSyn have been demonstrated in the form of two case studies in this chapter. Together, MRICDF and EmCodeSyn aim to make the design and synthesis of code targeting safety-critical applications easier. Synchronous programming languages have concurrency at the specification level, which can be used to generate multi-threaded code [25]. The sequential implementability or endochrony property discussed in this chapter has an extended counterpart called 'weak endochrony', which allows multiple behaviors
for a program as long as the output remains the same. Another area of research is real-time simulation of computation, which has been partially implemented in EmCodeSyn. Computation of prime implicates is a time consuming process. Since PI computation time depends on the number of Boolean equations, an actor elimination technique to reduce the number of Boolean equations is under development [26].
References

1. N. Halbwachs. Synchronous programming of reactive systems. Kluwer, Dordrecht, 1993.
2. E. A. Lee and A. Sangiovanni-Vincentelli. A framework for comparing models of computation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(12):1217–1229, 1998.
3. G. Berry and G. Gonthier. The ESTEREL synchronous programming language: design, semantics, implementation. Science of Computer Programming, 19(2):87–152, 1992.
4. N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous data flow programming language LUSTRE. Proceedings of the IEEE, 79(9):1305–1320, 1991.
5. P. Le Guernic, T. Gautier, M. Le Borgne, and C. Le Maire. Programming real-time applications with Signal. Proceedings of the IEEE, 79(9):1321–1336, 1991.
6. A. Benveniste, P. Caspi, S. Edwards, N. Halbwachs, P. Le Guernic, and R. de Simone. The synchronous languages twelve years later. Proceedings of the IEEE: Special Issue on Modeling and Design of Embedded Systems, 91(1):64–83, 2003.
7. G. Berry and E. Sentovich. Multiclock Esterel: correct hardware design and verification methods. Lecture Notes in Computer Science, volume 2144, pages 110–125. Springer, Berlin, 2001.
8. T. Austin, D. Blaauw, T. Mudge, and K. Flautner. Making typical silicon matter with Razor. Computer, 37(3):57–65, 2004.
9. M. Blum and S. Kannan. Designing programs that check their work. In Proc. of the 21st ACM Symposium on Theory of Computing, pages 86–97, New York, NY, USA, 1989. ACM.
10. Synfora Inc. Esterel Studio EDA Tool. http://www.synfora.com/products/esterelstudio.html.
11. ESTEREL Technologies. The SCADE suite. http://www.esterel-technologies.com/products/scade-suite.
12. ESPRESSO Project, INRIA. The Polychrony toolset. http://www.irisa.fr/espresso/Polychrony.
13. T. Gautier, P. Le Guernic, and L. Besnard. SIGNAL: A declarative language for synchronous programming of real-time systems. In Proc. of a Conf. on Functional Programming Languages and Computer Architecture, pages 257–277, 1987.
14. T. P. Amagbegnon, L. Besnard, and P. Le Guernic. Implementation of the data-flow synchronous language SIGNAL. In ACM Symp. on Prog. Languages Design and Implementation (PLDI'95), volume 1, pages 163–173, 1995.
15. D. Nowak. Synchronous structures. Information and Computation, 204(8):1295–1324, 2006.
16. B. A. Jose and S. K. Shukla. An alternative polychronous model and synthesis methodology for model-driven embedded software. In Asia and South Pacific Design Automation Conference (ASP-DAC 2010), pages 13–18, Jan. 2010.
17. B. A. Jose, S. K. Shukla, H. D. Patel, and J.-P. Talpin. On the multi-threaded software synthesis from polychronous specifications. In Formal Models and Methods in Co-Design (MEMOCODE), Anaheim, CA, USA, pages 129–138, Jun. 2008.
18. B. Houssais. The synchronous programming language SIGNAL. A tutorial. Technical report, IRISA ESPRESSO project, 2004.
19. B. A. Jose, L. Stewart, J. Pribble, and S. K. Shukla. Technical report on EmCodeSyn models: STARMAC and producer–consumer examples. Technical Report 2009-02, FERMAT Lab, Virginia Tech, 2009.
20. B. A. Jose, J. Pribble, L. Stewart, and S. K. Shukla. EmCodeSyn: a visual framework for multi-rate data flow specifications and code synthesis for embedded applications. In 12th Forum on Specification and Design Languages (FDL'09), pages 1–6, Sept. 2009.
21. P. Jackson and J. Pais. Computing prime implicants. In CADE-10: Proc. of the Tenth Intl. Conf. on Automated Deduction, pages 543–557, New York, NY, USA, 1990. Springer, New York.
22. J. de Kleer. An improved incremental algorithm for computing prime implicants. In Proc. AAAI-92, pages 780–785, San Jose, CA, USA, 1992.
23. A. Matusiewicz, N. Murray, and E. Rosenthal. Prime implicate tries. In Proc. of 18th Intl. Conf. on Automated Reasoning with Analytic Tableaux and Related Methods, Oslo, Norway, 2009. Lecture Notes in Computer Science, volume 5607, pages 250–264. Springer, Berlin, 2009.
24. STARMAC Project Group. The Stanford testbed of autonomous rotorcraft for multi-agent control overview. http://www.hybrid.stanford.edu/starmac/overview.
25. B. A. Jose, H. D. Patel, S. K. Shukla, and J.-P. Talpin. Generating multi-threaded code from polychronous specifications. Electronic Notes in Theoretical Computer Science, 238(1):57–69, 2009.
26. B. A. Jose, J. Pribble, and S. K. Shukla. Faster software synthesis using actor elimination techniques for polychronous formalism. In 10th International Conference on Applications of Concurrency to System Design (ACSD 2010), June 2010.
Chapter 7
The Time Model of Logical Clocks Available in the OMG MARTE Profile

Charles André, Julien DeAntoni, Frédéric Mallet, and Robert de Simone

C. André, J. DeAntoni, F. Mallet, and R. de Simone
Laboratoire I3S, UMR 6070 CNRS, Université Nice-Sophia Antipolis, INRIA Sophia Antipolis Méditerranée, 06902 Sophia Antipolis, France
e-mail: [email protected]; [email protected]; [email protected]; Robert.De [email protected]
7.1 Introduction

Embedded System Design is progressively becoming a field of choice for Model-Driven Engineering techniques. There are fundamental reasons for this trend:

1. Target execution platforms in the embedded world are often heterogeneous, flexible or reconfigurable by nature (as opposed to the conventional Von Neumann computer architecture). Such architectures can even sometimes be decided upon and customized to the specific kind of applications meant to run on them. Early architecture modeling allows exploring possible variations, possibly optimizing the form of future execution platforms before they are actually built. Architecture exploration is becoming a key ingredient of embedded system design, where applications and execution platforms are essentially designed jointly and concurrently at model level.
2. Applications are often reactive by nature, that is, meant to react repeatedly with an external environment. The main design concern goes with handling data or control flow propagation, which frequently includes streaming and pipelined processing. In contrast, the actual data values and computation contents are of lesser importance (for design, not for correctness). Application models such as process networks, reactive components and state/activity diagrams are used to represent the structure and the behavior of such applications (more than usual software engineering notions such as classes, objects, and methods).
3. Designs are usually subject to stringent real-time requirements, imposed "from above", while they are constrained by the limitations of their components and the availability of execution platform resources, "from below". Allocation models can here serve to check at early stages whether there exist feasible mappings and
scheduling of application functions to architecture resources and services that may match the requirements under the given constraints.

The model-driven approach to embedded system engineering is often tagged as Y-Chart methodology, or also Platform-Based Design. In this approach, application and architecture (execution platform) models are developed and refined concurrently, and then associated by allocation relationships, again at a virtual modeling level. The representation of requirements and constraints in this context becomes itself an important issue, as they guide the search for optimal solutions inside the range of possible allocations. They may be of various natures, functional or extra-functional. In this article, we focus on requirements and constraints which we call logical functional timing, and which we feel to be an important (although too often neglected or misconceived) aspect of embedded system modeling. Timing information in modeling is often used as extra-functional, real-time annotations analyzed mostly by simulation. But the relevance of these timing figures in later implementations then has to be further assessed. On the other hand, time information also carries functional intent, as it selects some behaviors and discards others. Very often this can be done not only by using single-form physical time, but also logical, multiform time models. For instance, it may be sufficient to state that a process runs twice as fast as another, or at least as fast as the second, without providing concrete physical time figures. The selected solutions will work for any such physical assignment that matches the logical time constraints. Also, durations may be counted with events that are not regular in physical time: the number of clock cycles of a processor in a low-power design context may vary according to frequency scaling or clock gating; processing functions may be indexed by the engine crankshaft revolution angle in the case of automotive applications. Other examples abound in the embedded design world. Modeling with logical time partial ordering was advocated in [16]. The notion of multiform (or polychronous) logical time has been exploited extensively in the theory of synchronous languages [2], in HDLs (Hardware Description Languages), but also importantly in the many approaches of model-based scheduling theories around process networks [6, 26] and formal data/control-flow graphs synthesized from nested loop programs [8]. Software pipelining and modulo scheduling techniques are based on such logical time counted in parallel processor execution cycles. The important common feature of all these approaches is that (logical) time is an integral part of the functional design, to be used and maintained along compilation/synthesis, and not only in simulation. In many cases the designer is confronted with time decisions in order to specify correct behaviors. This is of course largely true also of classical physical real-time requirements, but their operational demand is usually downplayed as they are only considered for analysis, not synthesis. Many of the techniques are still shared between both worlds (for instance, consider zero-time abstraction of atomic behaviors, and progress all behaviors in correct causal order in a run-to-completion mode before time may pass in a discrete step). Some techniques are still different, and certainly the goals are, but the underlying models are similar. Large and heterogeneous
systems require a single common environment to integrate all these models while still preserving the semantics and the analysis techniques of each of them. We consider here the use of existing modeling tools to integrate all these models. The UML appears as a good candidate since it has unified in a common syntactic notation most of the underlying formal models generally used for embedded systems: state machines, data-flow graphs (UML activities), and static models to describe the execution platforms (block diagrams). It is even more relevant because the UML profile for MARTE, recently adopted by the OMG, proposes extensions to UML specifically targeting real-time and embedded systems. Our goal is to provide an explicit formal semantics to the UML elements so that the model can be referred to as a golden model by external tools and becomes amenable to formal analysis, code generation or synthesis. An explicit and formal semantics within the model is fundamental to prevent different analysis tools from giving different semantics to the same model. Each analysis technique relies on a specific formal model that has its own model of computation and communication (MoCC). We propose a language, called the Clock Constraint Specification Language (CCSL), specifically devised to equip a given model (conformant to the UML or to any domain-specific language) with a timed causality model and thus define a specific MoCC. When considering UML models, CCSL relies on the MARTE time subprofile to identify the model elements on which CCSL constraints apply. In this chapter, we select different models from different domains to illustrate possible uses of CCSL. First, we show how it can be used to express classical constraints of the real-time and embedded domain by expressing East-ADL [27] extra-functional properties in CCSL. East-ADL proposes a set of time requirements classical in automotive applications (duration, deadline, jitter), on which the time requirements for AUTOSAR (http://www.autosar.org) are being built. The second illustration falls in the avionics domain and focuses on AADL (Architecture Analysis and Design Language) [9]. AADL is an Architecture Description Language (ADL) adopted by the SAE (Society of Automotive Engineers) that offers specific support for schedulability analysis. It also considers classical computation (periodic, sporadic, aperiodic) and communication (immediate/delayed, event-based or time-triggered) patterns. However, it departs from East-ADL because it explicitly considers the execution platform to which the application is allocated. Our illustration uses MARTE (and notably its allocation subprofile) to build a model amenable to architecture exploration and schedulability analysis. These first two examples consider models that combine logical and physical time. The last example considers a purely logical case. A CCSL library that specifies the operational semantics of Synchronous Data Flow (SDF [15]) is built. This library is applied to several diagrammatic views, equipping purely syntactic models with an explicit behavioral semantics. Section 7.2 gives a general overview of MARTE, a detailed view of the time subprofile and of its facilities to annotate UML models with (logical) time. Then, Sect. 7.3 describes the CCSL syntax and semantics together with the mechanisms for building libraries. It also introduces TimeSquare, the dedicated environment
we have built to analyze and verify UML/MARTE/CCSL models. The following sections address several possible usages of CCSL and describe CCSL libraries for the different subdomains: automotive and East-ADL in Sect. 7.4, avionics and AADL in Sect. 7.5, static analysis and SDF in Sect. 7.6.
7.2 The UML Profile for MARTE

7.2.1 Overview

The Unified Modeling Language (UML) [23] is a general-purpose modeling language specified by the Object Management Group (OMG). It proposes graphical notations to represent all aspects of a system from the early requirements to the deployment of software components, including design and analysis phases, structural and behavioral aspects. As a general-purpose language, it does not focus on a specific domain and maintains a weak, informal semantics to widen its application field. However, when targeting a specific application domain, and especially when building trustworthy software components or critical systems where life may be at stake, it is absolutely required to extend the UML and attach a formal semantics to its model elements. The simplest and most efficient extension mechanism provided by the UML is the definition of profiles. A UML profile adapts the UML to a specific domain by adding new concepts, modifying existing ones and defining a new visual representation for others. Each modification is done through the definition of annotations (called stereotypes) that introduce domain-specific terminology and provide additional semantics. However, the semantics of stereotypes must be compatible with the original semantics (if any) of the modified or extended concepts. The UML profile for Modeling and Analysis of Real-Time and Embedded systems (MARTE [22]) extends the UML with concepts related to the domain of real-time and embedded systems. It supersedes the UML profile for Schedulability, Performance and Time (SPT [20]), which extended UML 1.x and had limited capabilities. MARTE has three parts: Foundations, Design and Analysis. The foundation part is itself divided into five chapters: CoreElements, NFP, Time, Generic Resource Modeling and Allocation. CoreElements defines configurations and modes, which are key parameters for analysis. In real-time systems, preserving the non-functional (or extra-functional) properties (power consumption, area, financial cost, time budget, . . . ) is often as important as preserving the functional ones. The UML proposes no mechanism at all to deal with non-functional properties and relies on mere strings for that purpose. NFP (Non-Functional Properties) offers mechanisms to describe the quantitative as well as the qualitative aspects of properties and to attach a unit and a dimension to quantities. It defines a set of predefined quantities, units and dimensions and supports customization. NFP comes with a companion language called VSL (Value Specification Language) that defines the concrete syntax to be used in expressions of non-functional properties. VSL also recommends
syntax for user-defined properties. Time is often considered as an extra-functional property that comes as a mere annotation after the design. These annotations are fed into analysis tools that check conformity without any actual impact on the functional model: e.g., whether a deadline is met, or whether the end-to-end latency is within the expected range. Sometimes, though, time can also be of a functional nature and have a direct impact on what is done and not only on when it is done. All these aspects are addressed in the time chapter of MARTE. The next section elaborates on the time profile. The design part has four chapters: High Level application modeling, Generic component modeling, Software Resource Modeling, and Hardware Resource Modeling. The first chapter describes real-time units and active objects. Active objects depart from passive ones by their ability to send spontaneous messages or signals, and react to event occurrences. Normal objects, the passive ones, can only answer the messages they receive. The three other parts provide support to describe the resources used, and in particular the execution platforms on which applications may run. A generic description of resources is provided, including stereotypes to describe communication media, storage and computing resources. Then this generic model is refined to describe software and hardware resources along with their non-functional properties. The analysis part also has a chapter that defines generic elements to perform model-driven analysis on real-time and embedded systems. This generic chapter is specialized to address schedulability analysis and performance analysis. The chapter on schedulability analysis is not specific to a given technique and addresses various formalisms like the classic and generalized Rate Monotonic Analysis (RMA), holistic techniques, or extended timed automata. This chapter provides all the keywords usually required for such analyses. In Sect. 7.5, we follow a rather different approach: instead of focusing on the syntactic elements usually required to perform schedulability analysis (periodicity, task, scheduler, deadline, latency), we show how we can use the MARTE time model and its companion language CCSL to build libraries of constraints that reflect the exact same concepts. Finally, the chapter on performance analysis, even if somewhat independent of a specific analysis technique, emphasizes concepts supported by queueing theory and its extensions. MARTE extends the UML for real-time and embedded systems but should be refined by more specific profiles to address specific domains (avionics, automotive, silicon) or specific analysis techniques (simulation, schedulability, static analysis). The three examples addressed here consider different domains and/or different analysis techniques to motivate the demand for a fairly general time model, which has justified the creation of the MARTE time subprofile.
7.2.2 The Time Profile

Time in SPT is a metric time with implicit reference to physical time. As a successor of SPT, MARTE supports this model of time. UML 2, issued after SPT,
has introduced a model of time called SimpleTime [23, Chap. 13]. This model also makes implicit reference to physical time, but is too simple for use in real-time applications, and was initially devised to be extended in dedicated profiles. MARTE goes beyond SPT and UML 2. It adopts a more general time model suitable for system design. In MARTE, time can be physical, and considered as continuous or discretized, but it can also be logical, and related to user-defined clocks. Time may even be multiform, allowing different times to progress in a non-uniform fashion, and possibly independently of any (direct) reference to physical time. In MARTE, time is represented by a collection of Clocks. Each clock specifies a totally ordered set of instants. There may be dependence relationships between the instants of different clocks. Thus this model, called the MARTE time structure, is akin to the Tagged Systems [16]. To cover continuous and discrete times, the set of instants associated with a clock can either be dense or discrete. In this paper, most clocks are discrete (i.e., they represent discrete time). In this case the set of instants is indexed by natural numbers. For a clock c, c[k] denotes its k-th instant. The MARTE Time profile defines two stereotypes, ClockType and Clock, to represent the concept of clock. A ClockType gathers common features shared by a family of clocks. The ClockType fixes the nature of time (dense or discrete), says whether the represented time is linked to physical time or not (respectively identified as chronometric clocks and logical clocks), and chooses the type of the time units. A Clock, whose type must be a ClockType, carries more specific information such as its actual unit, and values of quantitative (resolution, offset, etc.) or qualitative (time standard) properties, if relevant. TimedElement is another stereotype introduced in MARTE. A timed element is explicitly bound to at least one clock, and thus closely related to the time model. For instance, a TimedEvent, which is a specialization of TimedElement extending UML Event, has a special semantics compared to usual events: it can occur only at instants of the associated clock. In a similar way, a TimedValueSpecification, which extends UML ValueSpecification, is the specification of a set of time values with explicit references to a clock, taking the clock unit as time unit. Thus, in a MARTE model of a system, the stereotype TimedElement or one of its specializations is applied to model elements which have an influence on the specification of the temporal behavior of this system. The MARTE Time subprofile also provides a model library named TimeLibrary. This model library defines the enumeration TimeUnitKind, which is the standard type of time units for chronometric clocks. This enumeration contains units like s (second), its submultiples, and other related units (minute, hour, . . . ). The library also predefines a clock type (IdealClock) and a clock (idealClk) whose type is IdealClock. idealClk is a dense chronometric clock with the second as time unit. This clock is assumed to be an ideal clock, perfectly reflecting the evolutions of physical time. idealClk should be imported in user models that reference physical time concepts (i.e., frequency, physical duration, etc.). This is illustrated in Sects. 7.4 and 7.5.
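The clock-and-instant vocabulary can be made concrete with a tiny sketch (illustrative Python of ours, not MARTE's metamodel; the tag values are assumptions):

class DiscreteClock:
    def __init__(self, name, tags):
        self.name = name
        self.tags = list(tags)       # strictly increasing instant tags

    def __getitem__(self, k):        # c[k] is the k-th instant, k >= 1
        return self.tags[k - 1]

# A chronometric clock discretized from idealClk at 100 Hz (tags in ms),
# and a logical clock ticking every third instant of it: multiform time
# needs no common physical reference, only relations between instants.
c100 = DiscreteClock("c100", (10 * k for k in range(1, 10)))
every3 = DiscreteClock("c100/3", c100.tags[2::3])
print(c100[1], every3[1])   # 10 30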
7.3 CCSL Time Model

7.3.1 The Clock Constraint Specification Language

CCSL is a language to impose dependence relationships between the instants of different clocks. This dependency is specified by clock constraints. A ClockConstraint is a TimedElement that extends UML Constraint. The constrained elements are clocks. A clock constraint imposes relationships between the instants of its constrained clocks. So, to understand clock constraints, we first have to define relations on instants.
7.3.1.1 Instant Relations
The precedence relation ≼ is a reflexive and transitive binary relation on a set of instants. From ≼ we derive four new relations: Coincidence (≡ ≜ ≼ ∩ ≽), [. . .]

[. . .] delay. The delay first read operations actually get tokens that are initially available and that do not match an actual write operation. The CCSL operator delayedFor can represent such a delay (7.8). To represent SDF arcs, we propose to create, in a library, a new relation definition called token. Such a relation has three parameters: two clocks (write and read) and an integer (delay). The token relation applies the adequate constraint (7.8). Note that, when delay = 0, (7.8) reduces to write ≺ read.

def token(clock write, clock read, int delay) ≜
    write ≺ (read delayedFor delay)                          (7.8)
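On finite traces, the effect of (7.8) can be checked directly, as in the sketch below (ours; it assumes the relation in (7.8) is strict precedence, and uses illustrative event tags):

def satisfies_token(write_times, read_times, delay):
    # The k-th write must strictly precede the (k+delay)-th read; the
    # first `delay` reads consume initial tokens and are unconstrained.
    usable = min(len(write_times), max(len(read_times) - delay, 0))
    return all(write_times[k] < read_times[k + delay]
               for k in range(usable))

# One initial token: read #1 is free, read #2 must follow write #1, etc.
print(satisfies_token([2, 5, 9], [1, 4, 8, 11], delay=1))   # True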
7.6.2.3 Inputs
Inputs require a packet-based precedence (keyword by in (7.9)). A relation definition, called input, has three parameters. The clock actor represents the actor with which the input is associated. The clock read represents the reading of tokens on the incoming arc. The strictly positive integer weight represents the input weight.

def input(clock actor, clock read, int weight) ≜
    (read by weight) ≺ actor                                 (7.9)
Here again, the packet-based precedence can be built with the filtering operator (7.10). When weight = 1, it reduces to read ≺ actor.

read ▼ (0^(weight−1).1)^ω ≺ actor                            (7.10)
7.6.2.4 Outputs
Outputs are represented by the relation definition output, which has three parameters. The clock actor represents the actor with which the output is associated. The clock write represents the writing of tokens to the outgoing arc. The strictly positive integer weight represents the output weight. The CCSL operator filteredBy (▼) is used (7.11). When weight = 1, (7.11) simplifies into actor = write.

def output(clock actor, clock write, int weight) ≜
    actor = write ▼ (1.0^(weight−1))^ω                       (7.11)

7.6.2.5 Arcs
Arcs can globally be seen as a conjunction of one output, one token and one input. The library relation definition arc (7.12) specifies a complete set of constraints for an arc with a delay delay, from an actor source with an output weight out, to another actor target with an input weight in. In that case, the CCSL clocks write and read are local clocks.

def arc(int delay, clock source, int out, clock target, int in) ≜
    clock read, write :
      output(source, write, out)
    | token(write, read, delay)
    | input(target, read, in)                                (7.12)
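Operationally, the conjunction in (7.12) reproduces classical SDF token counting, which can be mimicked with a small simulation sketch (ours, not TimeSquare's constraint solver; firing enabled actors greedily in a fixed order is one arbitrary choice among the executions the constraints allow):

def simulate(arcs, steps):
    # arcs: (delay, source, out, target, in) tuples, as in (7.12)
    tokens = {i: a[0] for i, a in enumerate(arcs)}   # initial tokens = delay
    actors = {a[1] for a in arcs} | {a[3] for a in arcs}
    trace = []
    for _ in range(steps):
        fired = False
        for actor in sorted(actors):                 # fixed, arbitrary order
            ins = [i for i, a in enumerate(arcs) if a[3] == actor]
            if all(tokens[i] >= arcs[i][4] for i in ins):
                for i in ins:
                    tokens[i] -= arcs[i][4]          # consume input tokens
                for i, a in enumerate(arcs):
                    if a[1] == actor:
                        tokens[i] += a[2]            # produce output tokens
                trace.append(actor)
                fired = True
        if not fired:
            break                                    # no actor is enabled
    return trace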
7.6.3 Applying the SDF Semantics

The previous subsection defines, for each SDF model element, the CCSL clocks that must be created and the clock constraints to apply. This section illustrates the use of our CCSL library for SDF. Our purpose is to explicitly add the SDF semantics to an existing model. In this example, we use UML activities (Fig. 7.9a) and state machines (Fig. 7.9c) to build, with the Papyrus editor, a simple SDF graph (Fig. 7.9b). Our intent is NOT to extend the semantics of UML activities and state machines but rather to use Papyrus as a mere graphical editor to build graphs, i.e., nodes connected by arcs. Our proposal is to attach CCSL clocks to some UML elements represented by the modeling editor. Papyrus gives the concrete graphical syntax and the clock constraints explicitly give the expected execution rules. By instantiating elements of our CCSL library for SDF, these two models can be given the execution semantics as classically understood for SDF graphs. The idea is to add the semantics within the model explicitly, without being implicitly bound to some external description. In our case, the semantics is given by the CCSL specification and by explicit associations between CCSL clocks and model elements.
Fig. 7.9 Paradigms of deadlocks: (a) Activity in Papyrus; (b) SDF graph; (c) State machine in Papyrus
Fig. 7.10 A simulation output with TimeSquare
When using the syntax of activities, CCSL clocks that represent actors are associated with actions, and CCSL clocks that represent readings from (resp. writings to) the arc queues are associated with input (resp. output) pins. All other CCSL clocks are left unbound and are just used as local clocks with no direct interpretation on the model. When using the syntax of state machines, only the CCSL clocks that represent actors are bound to the states; the other clocks are left unbound. Equation (7.13) uses our library to give the models in parts (a) and (c) the same semantics as understood when considering the SDF graph in the middle part (b). Clocks are named after the model elements with which they are associated, i.e., the clock for actor A is named A. However, this rule is for clarity only and the actual association is done explicitly in the CCSL model.

S ≜ arc(0, A, 1, B, 2) | arc(0, B, 2, C, 1) | arc(2, C, 1, B, 2)    (7.13)
Fig. 7.10 shows one possible execution of this specification produced by TimeSquare. Intermediate clocks are hidden and only actors are displayed. The relative execution of the actors is what matters when considering SDF graphs.
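For readers who prefer an operational reading of (7.13), the following C sketch (ours, independent of TimeSquare) simulates the token-based firing rule it encodes: an actor fires when every incoming arc holds at least its input weight in tokens. The naive round-robin loop produces one of the many executions allowed by the specification.

    #include <stdio.h>

    typedef struct { int src, dst, out_w, in_w, tokens; } Arc;

    enum { A, B, C, NACTORS };
    static const char *names[] = { "A", "B", "C" };

    /* The three arc() instances of equation (7.13). */
    static Arc arcs[] = {
        { A, B, 1, 2, 0 },   /* arc(0, A, 1, B, 2)                     */
        { B, C, 2, 1, 0 },   /* arc(0, B, 2, C, 1)                     */
        { C, B, 1, 2, 2 },   /* arc(2, C, 1, B, 2): two initial tokens */
    };
    enum { NARCS = sizeof arcs / sizeof arcs[0] };

    /* An actor may fire when every incoming arc holds enough tokens. */
    static int can_fire(int a)
    {
        for (int i = 0; i < NARCS; i++)
            if (arcs[i].dst == a && arcs[i].tokens < arcs[i].in_w)
                return 0;
        return 1;
    }

    static void fire(int a)
    {
        for (int i = 0; i < NARCS; i++) {
            if (arcs[i].dst == a) arcs[i].tokens -= arcs[i].in_w;
            if (arcs[i].src == a) arcs[i].tokens += arcs[i].out_w;
        }
        printf("%s fires\n", names[a]);
    }

    int main(void)
    {
        /* Round-robin scheduling; per period A fires twice, B once, C twice. */
        for (int step = 0; step < 8; step++)
            for (int a = 0; a < NACTORS; a++)
                if (can_fire(a))
                    fire(a);
        return 0;
    }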
7.7 Conclusion

The UML profile for MARTE extends the UML to model real-time and embedded systems at the appropriate level of description. However, the real-time and embedded domain is vast, and much effort has already been devoted to providing dedicated proprietary UML extensions [5, 11, 25]. MARTE clearly does not cover the whole domain; it should be refined to tackle specific aspects and to provide larger support for analysis and synthesis tools. Our contribution is to illustrate the use of logical time for system specification, by using the MARTE time subprofile conjointly with CCSL. Logical time is thus an integral part of the functional design that can be exploited by compilation/synthesis tools, rather than restricting time to annotations for performance evaluation or simulation.

At the same time, there is a large demand for standards, because the community, tool vendors and manufacturers alike, wants to rely on industry-proven, perennial languages. Many specifications in various subdomains are issued by various organizations to answer this demand (AADL, AUTOSAR, East-ADL, IP-Xact, ...), even though these subdomains have long covered separate markets and were addressed by different communities. With the emergence of large systems and systems of systems, we need to combine several aspects of these subdomains into a common modeling environment. The automotive and avionic industries, which used dedicated processors, are now integrating generic processors and need interoperability between their own models and the models used by electronic circuit manufacturers. Therefore, we need formalisms able to compose these models both syntactically and semantically, and a common semantic framework is required to ensure the interoperability of tools.

By selecting several examples from some of these subdomains, we have shown that the MARTE time profile and CCSL can be used to tackle different aspects of some of these emerging specifications. We advocate that the composed model must provide structural composition rules but also a description of the intended semantics, so that some analysis can be done at the model level. This model can then serve as a golden specification for equivalence comparison with other formal models suitable for specific analysis or synthesis techniques.
References

1. C. André. Syntax and semantics of the clock constraint specification language (CCSL). Research Report 6925, INRIA and University of Nice, May 2009.
2. A. Benveniste, P. Caspi, S. Edwards, N. Halbwachs, P. Le Guernic, and R. de Simone. The synchronous languages twelve years later. Proceedings of the IEEE, 91(1):64–83, 2003.
3. A. Cohen, M. Duranton, C. Eisenbeis, C. Pagetti, F. Plateau, and M. Pouzet. N-synchronous Kahn networks: a relaxed model of synchrony for real-time systems. In J. G. Morrisett and S. L. P. Jones, editors, POPL, pages 180–193. ACM, January 2006.
4. P. Cuenot, D. Chen, S. Gérard, H. Lönn, M.-O. Reiser, D. Servat, C.-J. Sjöstedt, R. T. Kolagari, M. Törngren, and M. Weber. Managing complexity of automotive electronics using the East-ADL. In Proc. of the 12th IEEE Int. Conf. on Engineering Complex Computer Systems (ICECCS'07), pages 353–358. IEEE Computer Society, 2007.
5. B. P. Douglass. Real-time UML. In W. Damm and E.-R. Olderog, editors, FTRTFT, volume 2469 of Lecture Notes in Computer Science, pages 53–70. Springer, Berlin, 2002.
6. J. Eker, J. Janneck, E. Lee, J. Liu, X. Liu, J. Ludvig, S. Neuendorffer, S. Sachs, and Y. Xiong. Taming heterogeneity – the Ptolemy approach. Proceedings of the IEEE, 91(1):127–144, 2003.
7. M. Faugère, T. Bourbeau, R. de Simone, and S. Gérard. Marte: also an UML profile for modeling AADL applications. In ICECCS – UML&AADL, pages 359–364. IEEE Computer Society, 2007.
8. P. Feautrier. Compiling for massively parallel architectures: a perspective. Microprocessing and Microprogramming, 41:425–439, 1995.
9. P. Feiler, D. Gluch, and J. Hudak. The architecture analysis and design language (AADL): an introduction. Technical report CMU/SEI-2006-TN-011, CMU, 2006.
10. P. H. Feiler and J. Hansson. Flow latency analysis with the architecture analysis and design language. Technical report CMU/SEI-2007-TN-010, CMU, June 2007.
11. S. Graf. OMEGA: correct development of real time and embedded systems. Software and System Modeling, 7(2):127–130, 2008.
12. R. Johansson, H. Lönn, and P. Frey. ATESST timing model. Technical report, ITEA, 2008. Deliverable D2.1.3.
13. G. Kahn. The semantics of a simple language for parallel programming. In Information Processing, pages 471–475, 1974.
14. R. M. Karp and R. E. Miller. Properties of a model for parallel computations: determinacy, termination, queueing. SIAM Journal on Applied Mathematics, 14(6):1390–1411, 1966.
15. E. A. Lee and D. G. Messerschmitt. Static scheduling of synchronous data flow programs for digital signal processing. IEEE Transactions on Computers, 36(1):24–35, 1987.
16. E. A. Lee and A. L. Sangiovanni-Vincentelli. A framework for comparing models of computation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(12):1217–1229, 1998.
17. S.-Y. Lee, F. Mallet, and R. de Simone. Dealing with AADL end-to-end flow latency with UML Marte. In ICECCS – UML&AADL, pages 228–233. IEEE CS, April 2008.
18. F. Mallet and R. de Simone. MARTE vs. AADL for Discrete-Event and Discrete-Time Domains, volume 36 of LNEE, chapter 2, pages 27–41. Springer, Berlin, 2009.
19. F. Mallet, M.-A. Peraldi-Frati, and C. André. Marte CCSL to execute East-ADL timing requirements. In ISORC, pages 249–253. IEEE Computer Society, March 2009.
20. OMG. UML Profile for Schedulability, Performance, and Time Specification, v1.1. Object Management Group, January 2005. formal/05-01-02.
21. OMG. Systems Modeling Language (SysML) Specification, v1.1. Object Management Group, November 2008. formal/08-11-02.
22. OMG. UML Profile for MARTE, v1.0. Object Management Group, November 2009. formal/2009-11-02.
23. OMG. Unified Modeling Language, Superstructure, v2.2. Object Management Group, February 2009. formal/2009-02-02.
24. C. Petri. Concurrency theory. In W. Brauer, W. Reisig, and G. Rozenberg, editors, Petri Nets: Central Models and Their Properties, volume 254 of Lecture Notes in Computer Science, pages 4–24. Springer, Berlin, 1987.
25. B. Selic. The emerging real-time UML standard. Computer Systems Science and Engineering, 17(2):67–76, 2002.
26. S. Sriram and S. S. Bhattacharyya. Embedded Multiprocessors: Scheduling and Synchronization, second edition. CRC, Boca Raton, FL, 2009.
27. The ATESST Consortium. East-ADL2 specification. Technical report, ITEA, March 2008. http://www.atesst.org, 2008-03-20.
28. The East-EEA Project. Definition of language for automotive embedded electronic architecture approach. Technical report, ITEA, 2004. Deliverable D.3.6.
29. T. Weilkiens. Systems Engineering with SysML/UML: Modeling, Analysis, Design. Morgan Kaufmann/The OMG Press, Burlington, MA, 2008.
Chapter 8
From Synchronous Specifications to Statically Scheduled Hard Real-Time Implementations

Dumitru Potop-Butucaru, Robert de Simone, and Yves Sorel
8.1 Introduction

The focus of this chapter is on the development of hard real-time embedded control systems. The evolution of this field has lately been driven by two major trends:
1. The increasing synergy between traditional application areas for hard real-time (avionics, automotive, robotics) and the ever-growing consumer electronics industry
2. The need for new development techniques to meet the challenges of productivity in a competitive environment

These trends are the result of industrial needs: In avionics and automotive applications, specification size and complexity increase rapidly, while the development cycle needs to be shortened. Meanwhile, the pervasive introduction of real-time embedded control systems in ordinary-life objects, such as phones or home appliances, implies a need for higher reliability, including strict timing requirements, if only because of the large product recall costs. Thus, in both consumer electronics and more traditional safety-critical areas there is a need to build larger, more reliable hard real-time embedded control applications in shorter time frames.

New development techniques have been proposed in response to this need. In this chapter, we present one such technique, allowing the synthesis of efficient hard real-time embedded systems [31]. Our technique has two specific points:
- It allows the synthesis of correct-by-construction distributed embedded systems in a fully automatic way, including both partitioning and real-time scheduling.
- The strength of our approach derives from the use of Synchronous Reactive (S/R) languages and formalisms and the associated analysis and code generation techniques, adapted of course to our real-time framework.

D. Potop-Butucaru and Y. Sorel, INRIA, Centre de Recherche de Paris-Rocquencourt, Le Chesnay Cedex, France; e-mail: [email protected]; [email protected]
R. de Simone, INRIA, Centre de Recherche de Sophia-Antipolis, Sophia Antipolis Cedex, France; e-mail: [email protected]
S/R languages [2–4, 20] are high-level modeling formalisms relying on the synchronous hypothesis, which lets computations and behaviors be divided into a discrete sequence of computation steps, equivalently called reactions or execution instants. In itself, this assumption is rather common in practical embedded system design. But the synchronous hypothesis adds to this the fact that, inside each instant, the behavioral propagation is well-behaved (causal), so that the status of every signal or variable is established and defined prior to being tested or used. This criterion, which may be seen at first as an isolated technical requirement, is in fact the key point of the approach. It ensures strong semantic soundness by allowing universally recognized mathematical models, such as Mealy machines and digital circuits, to be used as supporting foundations. In turn, these models give access to a large corpus of efficient optimization, compilation, and formal verification techniques. The synchronous hypothesis also guarantees full equivalence between various levels of representation, thereby avoiding altogether the pitfalls of non-synthesizability of other similar formalisms.

Synchronous reactive formalisms have long been used [3] in the design of digital circuits and in the design of safety-critical embedded systems in fields such as avionics, nuclear I&C, and railway control. Novel synchronous formalisms and associated development techniques are currently gaining recognition for their modeling adequacy to other classes of embedded applications.

The S/R formalisms used in our approach belong to the class of so-called dataflow formalisms, which shape applications based on intensive data computation and data-flow organization. There are two reasons for this. The first one is that hard real-time (embedded) systems are often designed as automatic control systems, and the de facto standard for the design of automatic control systems is the data-flow formalism Simulink [38]. When a real-time (embedded) implementation is desired, the (continuous-time) Simulink specification is discretized. To represent the resulting discrete-time specification, it is then convenient to use a data-flow synchronous language such as Scade, which is very close in syntax to Simulink, and at the same time gives access to the various analysis and code generation techniques mentioned above. The second reason for relying on data-flow formalisms is that they are similar to the classical dependent task models used in hard real-time scheduling.

However, synchronous dataflow formalisms go beyond the dependent task models and the classical dataflow model of Dennis [14] by introducing a form of hierarchical conditional execution allowing the description of execution modes. Execution conditions are represented with so-called logical clocks, which are (internally generated) Boolean expressions associated with data-flow computation blocks. The clocks prescribe which data computation blocks are to be performed as part of the current reaction. Efficient real-time implementation of such specifications relies on two essential ingredients:

Clock analysis determines the relations between the various hierarchical execution conditions (clocks). For instance, we can determine that two computation blocks can never be executed in parallel, because their execution conditions are never true at the same execution cycle. We say in this case that their clocks are exclusive.
Real-time scheduling consists in the efficient (spatial) allocation and (temporal) scheduling of the various data-flow blocks, taking into account the clock relations determined above. For instance, two computation blocks having exclusive clocks can be allocated on the same processor at the same date (the conditional execution ensures that only one is active at a time).

In this chapter we present an offline (static) scheduling technique based on these principles. The schedule is computed offline as a schedule table where all the computations and communications of an execution cycle are allocated in time and space. To allow correct-by-construction implementation, the schedule table must respect consistency rules such as causality. Causality means that all the information needed to compute the clock of a dataflow block, as well as its data inputs, are available before the start date of the dataflow block on the processor where it will run. Once a consistent schedule table is computed, a real-time implementation will have to periodically run the dataflow blocks, as prescribed by the schedule table, on the target architecture. The real-time period at which the schedule table is run must be large enough to ensure that successive runs of the table (which correspond to successive clock cycles of the specification) do not interfere.

The remainder of the chapter is structured as follows: We first give a general view of the synchronous model and of synchronous formalisms. We explain the rationale behind their introduction, present the basic principles and underlying mathematical models, compare them with neighboring models, and briefly present their implementation issues. Then, we give a brief overview of the problems related to the construction of real-time systems. We then present our solution for the development of distributed real-time implementations from synchronous specifications. This presentation starts with an intuitive example which explains how a synchronous dataflow specification can be transformed into a statically scheduled real-time implementation, and how clock analysis can be used to improve the generated implementations. The remainder of the chapter introduces the formalism allowing the representation of real-time schedules of synchronous specifications, defines the correctness properties such a schedule must satisfy, and provides correctness-preserving optimization techniques.
8.2 The Synchronous Hypothesis

8.2.1 What For?

Program correctness (the process performs as intended) and program efficiency (it performs as fast as possible) are major concerns in all of computer science, but they are even more stringent in the embedded area, as on-line debugging is difficult or impossible, and time budgets are often imperative (for instance in safety-critical systems and multimedia applications).
Program correctness is sought by introducing appropriate syntactic constructs and dedicated languages, making programs more easily understandable by humans, as well as allowing high-level modeling and associated verification techniques. Provided semantic preservation is ensured down to the actual implementation code, this provides reasonable guarantees on functional correctness. While this might sound obvious for traditional software compilation schemes, the development of an embedded control system is often not "seamless", involving manual rewriting and/or the use of hardware, operating systems, and middleware for which the semantics are either undefined, or too complex to be fully integrated in the semantic analysis without some level of abstraction.

Program efficiency is traditionally handled in the software world by algorithmic complexity analysis, expressed in terms of individual operations. But in modern systems, due to a number of phenomena, this "high-level" complexity reflects rather imperfectly the "low-level" complexity in numbers of clock cycles spent. In the real-time scheduling community, the main solution has been the introduction of various abstraction levels (known as task models and architecture models) allowing the various timing analyses that provide real-time guarantees. Such task models are often very abstract. In the seminal model defined by Liu and Layland [27], control, synchronization, and communication are not explicitly represented, meaning that their costs cannot be directly accounted for. What the model defines are repetitive pieces of code (the tasks), bounds on task durations, and real-time constraints (deadlines). The strength of the model (its simplicity) is also its major problem, because timing (schedulability) analyses tend to be overly pessimistic.

The repetitive, cycle-based execution specified by task models is natural in real-time, because it corresponds to the natural implementation of automatic control systems. But cycle-based execution models are not restricted to the real-time community. They are also the norm in the design of digital synchronous circuits, and simulation environments in many domains (from scientific engineering to Hardware Description Language (HDL) simulators) often use lockstep computation paradigms. The difference is that, in these settings, cycles represent logical steps, not physical time. Of course, timing analysis is still possible afterwards, and is in fact often simplified by the previous division into cycles.

In this chapter we advocate a similar approach for the RT/E domain. The focus of synchronous languages is to allow modeling and programming of such systems where cycle (computation step) precision is needed. The objective is to provide domain-specific structured languages for their description, and to study matching techniques for efficient design, including compilation/synthesis/real-time scheduling, optimization, and analysis/verification. The strong condition ensuring the feasibility of these design activities is the synchronous hypothesis, described next.
8.2.2 Basic Notions

What has come to be known as the Synchronous Hypothesis, laying foundations for S/R systems, is really a collection of assumptions of a common nature, sometimes adapted to the framework considered. We shall avoid heavy mathematical formalization in this presentation, and refer the interested reader to the existing literature, such as [2, 3]. The basics are:

Instants and reactions. Behavioral activities are divided according to (logical, abstract) discrete time. In other words, computations are divided according to a succession of non-overlapping execution instants. In each instant, input signals possibly occur (for instance by being sampled), internal computations take place, and control and data are propagated until output values are computed and a new global system state is reached. This execution cycle is called the reaction of the system to the input signals. Although we used the word "time" just before, there is no real physical time involved, and instant durations need not be uniform (or even considered!). All that is required is that reactions converge and computations are entirely performed before the current execution instant ends and a new one begins. This supports the obvious conceptual abstraction that computations are infinitely fast ("instantaneous", "zero-time") and take place only at discrete points in (physical) time, with no duration. When presented without sufficient explanations, this strong formulation of the Synchronous Hypothesis is often discarded by newcomers as unrealistic (while, again, it is only an abstraction, amply used in other domains where "all-or-nothing" transaction operations take place).

Signals. Broadcast signals are used to propagate information. At each execution instant, a signal can either be present or absent. If present, it also carries some value of a prescribed type ("pure" signals exist as well, which carry only their presence status). The key rule is that a signal must be consistent (same present/absent status, same data) for all read operations during any given instant. In particular, reads from parallel components must be consistent, meaning that signals act as controlled shared variables.

Causality. The crucial task of deciding when a signal can be declared absent is of utter importance in the theory of S/R systems, and an important part of the theoretical body behind the Synchronous Hypothesis. This is of course especially true of local signals, which are both generated and tested inside the system. The fundamental rule is that the presence status and value of a signal should be defined before they are read (and tested). This requirement takes various practical forms depending on the actual language or formalism considered, and we shall come back to this later. Note that "before" refers here to causal dependency in the computation of the instant, and not to physical or even logical time between successive instants [5]. The Synchronous Hypothesis ensures that all possible schedules of operations amount to the same result (convergence); it also leads to the definition of "correct" programs, as opposed to ill-behaved ones where no causal scheduling can be found.
Activation conditions and clocks. Each signal can be seen as defining (or generating) a new clock, ticking when it occurs; in hardware design, these are called gated clocks. Clocks and sub-clocks, either external or internally generated, can be used as control entities to activate (or not) component blocks of the system. We shall also call them activation conditions.
8.2.3 Mathematical Models

If one temporarily forgets about data values, and accepts the duality of present/absent signals mapped to true/false values, then there is a natural interpretation of synchronous formalisms as synchronous digital circuits at schematic gate level, or "netlists" (roughly RTL level with only Boolean variables and registers). In turn, such circuits have a straightforward behavioral expansion into Mealy Finite State Machines (FSMs). The two slight restrictions above are not essential: the adjunction of types and values into digital circuit models has been successfully attempted in a number of contexts, and S/R systems can also be seen as contributing to this goal. Meanwhile, the introduction of clocks and presence/absence signal status in S/R languages departs drastically from the prominent notion of sensitivity list generally used to define the simulation semantics of Hardware Description Languages (HDLs). We now comment on the opportunities made available through the interpretation of S/R systems into Mealy machines or netlists.

Netlists. We consider here a simple form: Boolean equation systems defining the values of wires and Boolean registers as Boolean functions of other wires and previous register values. Some wires represent input and output signals (with the value true indicating signal presence), others are internal variables. This type of representation is of special interest because it can provide exact dependency relations between variables, and thus the right representation level to study causality issues with accurate analysis. Notions of "constructive" causality have been the subject of much attention here. They attempt to refine the usual crude criterion for synthesizability, which forbids cyclic dependencies between non-register variables (so that a variable seems to depend upon itself in the same instant), but does not take into account the Boolean interpretation, nor the potentially reachable configurations. Consider the equation x = y ∨ z, where it has been established that y is the constant true. Then x does not really depend on z, since its (constant) value is forced by y's. Constructive causality seeks the best possible faithful notion of true combinatorial dependency, taking the Boolean interpretation of functions into account. For details, see [36].

Another equally important aspect of the mathematical model is that a number of combinatorial and sequential optimization techniques have been developed over the years in the context of hardware synthesis approaches. The main ones are now embedded in the SIS and MVSIS optimization suites from UC Berkeley
[16, 34]. They are a great help in allowing programs written in high-level S/R formalisms to be compiled into efficient code, either software- or hardware-targeted [35].

Mealy machines. These are finite-state automata corresponding strictly to the synchronous assumption. In a given state, given a certain input valuation (a subset of present signals), the machine reacts by immediately producing a set of output signals before entering a new state. Mealy machines can be generated from netlists (and, by extension, from any S/R system). The Mealy machine construction can then be seen as a symbolic expansion of all possible behaviors, computing the space of reachable states (RSS) on the way. But while the precise RSS is gained, the precise causal dependency relations are lost, which is why Mealy FSM and netlist models are both useful in the course of S/R design [41]. When the RSS is extracted, often in symbolic BDD form, it can be used in a number of ways: We already mentioned that constructive causality only considers dependencies inside the RSS; similarly, all activities of model-checking formal verification and test coverage analysis are strongly linked to the RSS construction [1, 7, 8, 37].
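As a concrete illustration of both models, a netlist step can be written as a handful of Boolean equations evaluated in a causally compatible order, plus register updates; iterating this step function realizes the corresponding Mealy machine. The wires and equations in the C sketch below are invented for the example.

    #include <stdbool.h>
    #include <stdio.h>

    /* One Boolean register of the netlist; false is its initial value. */
    static bool reg = false;

    /* One synchronous step: the equations are listed in an order compatible
     * with causality, so every wire is defined before being read. */
    static bool step(bool a, bool b)
    {
        bool x   = a || reg;   /* wire x: depends on input a and the register */
        bool y   = x && b;     /* wire y: depends on wire x and input b       */
        bool out = y || !a;    /* output wire                                 */
        reg = y;               /* register update for the next instant        */
        return out;
    }

    int main(void)
    {
        bool as[] = { true, false, true }, bs[] = { true, true, false };
        for (int i = 0; i < 3; i++)
            printf("instant %d: out = %d\n", i, step(as[i], bs[i]));
        return 0;
    }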
8.2.3.1 Synchronous Hypothesis vs. Neighboring Models

Many quasi-synchronous formalisms exist in the field of embedded system (co-)simulation: the simulation semantics of SystemC and other HDLs at RTL level, the discrete-step Simulink/Stateflow simulation, or the official StateCharts semantics, for instance. Such formalisms generally employ a notion of physical time in order to establish when to start the next execution instant. Inside the current execution instant, however, delta-cycles allow zero-delay activity propagation, and potentially complex behaviors occur inside a given single reaction. The main difference here is that no causality analysis (based on the Synchronous Hypothesis) is performed at compilation time, so that an efficient ordering/scheduling cannot be pre-computed before simulation. Instead, each variable change recursively triggers further re-computations of all depending variables in the same reaction.
8.2.4 Synchronous Languages

Synchronous netlists and Mealy machines provide not only the main semantic models, but also the two fundamental modeling styles followed by the various synchronous languages and formalisms. By adding more types and arithmetic operators, as well as activation conditions to introduce some amount of control flow, the modeling style of netlists can be extrapolated to the previously mentioned data-flow (block-diagram) networks that are common in automatic control and multimedia digital signal processing. The data-flow synchronous languages can be seen
as attempts to provide structured programming to compose large systems modularly in this class of applications. Similarly, imperative synchronous languages provide ways to program in a structured way hierarchical systems of interacting Mealy FSMs.

The imperative style is best illustrated by the Esterel language [6]. Its syntax includes classical imperative control flow statements (sequence, if-then-else, loop), explicit concurrency, and statements that operate on the signals and state of the program by:
- Emitting a signal, i.e., making it present for the current execution cycle, and giving it a value if it carries one
- Defining a halting point where the control flow stops between execution cycles
- Conditionally starting, stopping, or suspending for a cycle the execution of a (sub-)statement based on the status of a signal

The clock (activation condition) of a statement in an Esterel program is determined by the position of the statement in the syntax tree, and by the signals controlling its execution through enclosing start/stop/suspend statements. Signals are therefore the language construct allowing explicit clock manipulation.

The data-flow, or declarative, style is best illustrated by the Signal/Polychrony [19] and Lustre [21] languages. Both languages structure programs as hierarchical networks of dataflow blocks that communicate through signals, also called flows. Special blocks allow the encoding of state and control. State is represented through delay blocks that transmit values from one execution instant to the next. Control is encoded by (conditional) sampling and merge blocks. The clocks controlling the execution and communication of all blocks and flows are defined through clock constraints. Clock constraints are either implicitly associated with the dataflow blocks (e.g., an adder block and all its input and output flows have the same clock), or explicitly defined by the programmer (e.g., data never arrives on the data input without an accompanying opcode on the instruction input, but an opcode can arrive without data). The main difference between the various data-flow formalisms comes from the restrictions they place on sampling and merge, and from the clock analyses that:
- Determine that the clock constraints are not contradictory
- Transform clock constraints into actual clocks that can be efficiently computed at run-time

The Signal/Polychrony language is a so-called multi-clock synchronous language, where clocks can be defined and used without specifying their relation to the base clock of the specification (the activation condition of the entire program, which is true at every execution cycle). Complex clock constraints can therefore be defined that often have more than one solution. Efficient heuristics are then used to choose one such solution allowing code generation. The Lustre language can be seen as the single-clock subset of Signal/Polychrony, because it only allows the definition of clocks that are constructed by sampling from the base clock.
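To fix intuition about sampling and merge, here is a C sketch (ours, not the output of any actual compiler) of how a single-clock specification with one sub-clock may execute: the block sampled by condition c is activated only when c is true, and a merge recombines the two branches at the base clock.

    #include <stdio.h>

    static int f(int x) { return 2 * x; }     /* block on the sub-clock  */
    static int g(int x) { return x + 1; }     /* block on the base clock */

    /* One base-clock cycle of a two-clock dataflow:
     *   y = f(x) when c        -- y exists only when c is true
     *   z = merge(c, y, g(x))  -- z exists at every base-clock cycle */
    static int cycle(int c, int x)
    {
        int y = 0;
        if (c)                 /* f is activated only on its clock */
            y = f(x);
        return c ? y : g(x);   /* merge back onto the base clock   */
    }

    int main(void)
    {
        for (int i = 0; i < 4; i++)
            printf("cycle %d: z = %d\n", i, cycle(i % 2, i));
        return 0;
    }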
The Esterel, Signal, and Lustre languages mentioned above have a particular position in synchronous programming, in both research and industrial areas. Their development started in the early 1980s, and they have been the subject of sustained research ever since. All three languages have been incorporated into commercial products: Esterel Studio for Esterel, Scade for Lustre, and Sildex and RTBuilder for Signal. However, synchronous language research has also produced a large array of formalisms adapted to specific tasks, such as formal verification (e.g., the Esterel variant Quartz [33]), the programming of interactive systems such as virtual realities, GUIs, or simulations of large reactive systems [40], or the development of video signal processing systems, where the n-synchronous Kahn networks [11] allow the natural representation of clocks involving regular activation patterns (e.g., every eight samples, the filter computes one output as the average of samples 1, 3, and 7) and define a notion of consistent buffering allowing the direct connection of dataflows with different clocks but the same throughput. We mention here only a few synchronous languages, and we invite the interested reader to consult reference papers dedicated exclusively to the presentation of the synchronous approach [3].

In the remainder of the chapter, we shall focus on a particular, very simple synchronous formalism, which is used as input by the real-time scheduling tool SynDEx [17]. Like Lustre, it is a single-clock synchronous language, but with a simpler clock definition language. After defining the formalism in Sect. 8.4.2, we fully present in the remainder of the chapter a technique for the optimized real-time implementation of such specifications.
8.2.5 Implementation Issues

The problem of implementing a synchronous specification mainly consists in defining the step reaction function that will implement the behavior of an instant, as shown in Fig. 8.1. The global behavior is then computed by iterating this function over successive instants and successive input signal valuations. Following the basic mathematical interpretations, the compilation of an S/R program may either consist in the expansion into a flat Mealy FSM, or in the translation into a flat netlist (with more types and arithmetic operators, but without activation conditions).
Fig. 8.1 The reaction function is called at each instant to perform the computation of the current step
    reaction() {
        decode_state();
        read_input();
        compute();
        write_output();
        encode_state();
    }
The run-time implementation consists here in the execution of the resulting Mealy machine or netlist. In the first case, the automaton structure is implemented as a big top-level switch between states. In the second case, the netlist is totally ordered in a way compatible with causality, and all the equations in the ordered list are evaluated at each execution instant. These basic techniques are at the heart of the first compilers, and still of some industrial ones. In the last decade, fancier implementation schemes have been sought, relying on the use of activation conditions: During each reaction, execution starts by identifying the "truly useful" program blocks, which are marked as "active". Then only the actual execution of the active blocks is scheduled (a bit more dynamically) and performed, in an order that respects the causality of the program. In the case of dataflow languages, the activation conditions come in the form of a hierarchy of clock under-samplings – the clock tree, which is computed at compile time. In the case of imperative formalisms, activation conditions are based on the halting points (where the control flow can stop between execution instants) and on the signal-generated (sub-)clocks.

When the implementation is distributed over a network of processors, the specified functionality is implemented not by one, but by several iterating functions that run in lockstep. Input reading is distributed among the iterating functions running on the various processors. Similarly, state storage, encoding, and decoding are also distributed. The information transmitted through the communication media must allow not only the computation of the execution condition of each dataflow block, but also the reliable communication between blocks on different processors (e.g., after being sent, no message remains in the communication stack due to unmatched execution conditions, because this could block or confuse the system). To ensure temporal predictability, we require that communications on the bus(es) be statically scheduled (modulo conditional communication). This requires extra synchronization, but communication time must remain small, so that the real-time implementation can meet its deadlines. Balancing these two objectives – functional correctness and communication efficiency – necessarily relies on the fine manipulation of both time and execution conditions, which is the objective of the technique presented in Sect. 8.5.
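The following C sketch (ours; the in-memory FIFO merely stands in for a real bus and its message-passing primitives) illustrates the lockstep scheme just described: the value carrying the execution condition travels first, so that sender and receiver agree, cycle by cycle, on whether the conditional communication takes place.

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* A toy in-memory "bus": a byte FIFO standing in for a reliable
     * broadcast medium and its message-passing primitives. */
    static unsigned char bus[64];
    static int wr, rd;
    static void bus_send(const void *p, int n) { memcpy(bus + wr, p, n); wr += n; }
    static void bus_recv(void *p, int n)       { memcpy(p, bus + rd, n); rd += n; }

    /* Producer processor: one execution cycle. */
    static void cycle_producer(bool c)
    {
        bus_send(&c, sizeof c);    /* the execution condition travels first */
        if (c) {                   /* conditional communication: the data   */
            int v = 42;            /* is sent only when its clock is true   */
            bus_send(&v, sizeof v);
        }
    }

    /* Consumer processor: one cycle, in lockstep with the producer. */
    static void cycle_consumer(void)
    {
        bool c;
        bus_recv(&c, sizeof c);    /* both sides now agree on the clock */
        if (c) {
            int v;
            bus_recv(&v, sizeof v);
            printf("received %d\n", v);
        } else {
            printf("no data this cycle\n");
        }
    }

    int main(void)
    {
        for (int i = 0; i < 3; i++) {
            wr = rd = 0;           /* a new cycle starts with an empty bus */
            cycle_producer(i % 2 == 0);
            cycle_consumer();
        }
        return 0;
    }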
8.3 Real-Time Embedded (RT/E) Systems Implementation

The digital electronic (software and hardware) part of complex applications found in domains such as automobiles, aeronautics, telecommunications, etc., is growing rapidly. On the one hand, it increasingly replaces mechanical and analog devices, which cost a lot and are too sensitive to failures; on the other hand, it offers end-users new functionalities which may evolve more easily. These applications share the following main features:
Automatic-control and discrete-event: they include control laws, possibly using signal and image processing, as well as discrete events, in order to schedule these control laws through finite state machines.

Critical real-time constraints: they must satisfy the well-known deadlines, often equal to a sampling rate (period), or/and end-to-end delays (latencies [12]), which are more difficult to master; otherwise the application may fail, leading to a human, ecological, or financial disaster.

Embedding constraints: they are mobile and rely on limited resources because of weight, size, energy consumption, and price limitations.

Distributed and heterogeneous hardware architecture: they are distributed in order to provide enough computing power through parallelism, but also for the purpose of modularity, and to keep the sensors and actuators close to the computing resources. Furthermore, fault tolerance imposes redundant architectures to cope with failures of hardware components. They are also heterogeneous because different types of resources (processors, ASICs, FPGAs, various communication lines) are necessary to implement the intended functionalities while satisfying the constraints. The reader must be aware that distributed real-time systems are considerably more difficult to tackle than centralized ones (with only one type of resource), which is why the most significant results in the literature are given for the latter case.

Taking all these features into account is a great challenge that only a formal (mathematically based) methodology may properly achieve. The typical approach consists in decomposing into real-time tasks a large C program (produced, for instance, from a higher-level model), and then using an RTOS (Real-Time Operating System) for the implementation. This approach is no longer efficient enough to cope with the complexity of today's applications, mainly because there is a gap between the specification step and the implementation step. This does not mean that the application cannot be carried out with respect to the constraints, but that the development cycle will take too long, essentially because of the real-time tests, which must cover as many cases as possible. This is the reason why we propose to automatically generate an implementation satisfying the real-time and resource constraints by transforming the synchronous specification into a statically scheduled and distributed program running on the embedded architecture. There are two main issues when distributed real-time implementation is considered:

Real-time scheduling: Liu & Layland's pioneering work provided in 1973 [27] the main principles for the schedulability of real-time independent preemptive periodic task systems in the case of a uniprocessor architecture. This schedulability analysis allows the designer to decide whether a real-time task system will be schedulable by performing simple tests involving the timing characteristics of every task, i.e., its period, duration, and deadline (a small sketch of such a test is given at the end of this section). We recall that a task is simply an executable program which is temporally characterized. The schedulability analyses are all based on simple fixed-priority scheduling algorithms such as rate monotonic (RM) or deadline monotonic (DM), or on
variable-priority scheduling algorithms such as earliest deadline first (EDF) or least laxity first (LLF). These works have been extensively used and improved over the years to take into account dependent tasks sharing resources, aperiodic and sporadic tasks, etc. However, there are several levels of dynamicity in these approaches which induce indeterminacy. The first one is related to the difficulty the designer has in assessing the cost of the scheduler, which is obviously a program, like a task. This cost can be decomposed into two parts: a constant part corresponding to the scheduling policy, and a variable part corresponding to the preemption mechanism. The constant cost is more difficult to assess in the case of variable priorities than in the case of fixed ones. The cost of preemption is very difficult to assess since, at best, only an upper bound on the number of preemptions [9] is known. As stated by Liu & Layland, the usual solution consists in including its worst-case cost inside the task durations. Another solution is the use of non-preemptive scheduling policies, such as the one presented in this chapter. The problem with non-preemptive scheduling algorithms is that some task systems that are schedulable using a preemptive algorithm become non-schedulable. The second level of dynamicity is due to the fact that some tasks may have parts conditioned by the result of a test on variables whose values are known only at execution time. Thus, depending on the path taken for each condition, the execution cost of the task may differ. The solution is again to include the worst-case run-time of each task in the task duration, but this is often very pessimistic and leads to over-dimensioning of the systems.

Resource management: Multiprocessor (distributed) embedded architectures feature more than one computing resource. Consequently, the real-time scheduling problem increases in difficulty. There are two possible approaches to multiprocessor real-time scheduling. Partitioned multiprocessor scheduling is based on a specific scheduling algorithm for every processor and on an assignment (distribution) algorithm used to assign tasks to processors; this is of bin-packing complexity, which implies the use of heuristics since these problems are NP-hard. Global multiprocessor scheduling is based on a unique scheduling algorithm for all the processors. A task is put in a queue that is shared by all the processors and, since migration is allowed, a preempted task can return to the queue to be allocated to another processor. Tasks are chosen from the queue according to the unique scheduling algorithm, and executed on whichever processor is idle. The two approaches cannot be opposed, since one may solve a problem where the other fails, and vice versa [26]. On the other hand, some scheduling algorithms such as EDF have their multiprocessor variants [13, 28]. Nevertheless, the global multiprocessor approach induces a cost for migrating tasks that is too high for the processors available nowadays (this may change in the following years due to improvements in integrated circuit technology). Currently, the partitioned multiprocessor scheduling approach is the one used for realistic applications in the industrial world. Here again, indeterminacy can be minimized using static approaches rather than dynamic ones. In addition, since the problem must be solved
with heuristics, criteria other than time, e.g., memory, power consumption, etc., can be added to the optimization problem, possibly under constraints such as deadlines, task dependences, etc. [22–24].
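As a concrete instance of the "simple tests" mentioned above for the uniprocessor case, the classical Liu & Layland sufficient condition for rate-monotonic scheduling of independent periodic tasks (deadlines equal to periods) checks that the total utilization does not exceed n(2^(1/n) − 1). The C sketch below (ours) implements this test on a made-up task set.

    #include <math.h>
    #include <stdio.h>

    typedef struct { double wcet, period; } Task;

    /* Liu & Layland sufficient schedulability test for rate-monotonic
     * scheduling of n independent periodic tasks with deadlines equal to
     * periods: sum(Ci/Ti) <= n * (2^(1/n) - 1). (For EDF, the bound is 1.) */
    static int rm_schedulable(const Task *t, int n)
    {
        double u = 0.0;
        for (int i = 0; i < n; i++)
            u += t[i].wcet / t[i].period;
        return u <= n * (pow(2.0, 1.0 / n) - 1.0);
    }

    int main(void)
    {
        Task set[] = { { 1, 4 }, { 2, 8 }, { 1, 10 } };  /* invented task set */
        printf("RM schedulable (sufficient test): %s\n",
               rm_schedulable(set, 3) ? "yes" : "inconclusive");
        return 0;
    }

Note that the test is only sufficient: when it fails, the task set may still be schedulable, which is why the negative answer is reported as "inconclusive".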
8.4 The SynDEx Specification Formalism

Our technique for optimized real-time implementation has evolved from needs in the development of the SynDEx optimized distributed real-time implementation tool [17]. Hence, our presentation is based on the specification formalism of the SynDEx real-time scheduling tool, which we slightly adapt to suit the needs of our presentation. The results of the chapter can easily be generalized to cover the hierarchical conditioned dataflow models used in embedded systems design, such as Lustre/Scade [21, 32] or Simulink [38]. The definitions of this section will be exemplified in Sect. 8.5.

The SynDEx tool takes as input three types of information: the topology of the target hardware architecture, the specification of the behavior to be implemented (a synchronous data-flow program), and the timing information giving the cost of the basic data-flow constructs on the various computing and communication resources of the architecture.
8.4.1 The Architecture

In order to focus on the relation between time and execution conditions, we consider in this chapter simple architectures formed of a unique broadcast bus, denoted B, that connects a set of processors P = {P_i | i = 1..n}. The architecture example of Fig. 8.3 is formed of three processors connected to the central bus. We assume all communication and synchronization is done through asynchronous message passing. We also make the simplifying assumption that the bus is reliable (no message losses, no duplications, no data corruption). The reliability assumption amounts to an abstraction of real-life asynchronous buses, such as CAN [29], which is realistic in certain settings [17] (pending the use of fault-tolerant communication libraries and a reliability analysis that are not covered in this chapter).

We further assume, for the scope of this chapter, that no form of global or local time is used to control execution (e.g., through timeouts), whether synthesized by clock synchronization protocols or integrated into the hardware platform through the use of time-triggered buses such as TTA [29] and FlexRay. However, the optimization technique we develop can be extended (with weaker optimality results) to such cases.
The formal model of our execution mechanism will be given in the form of constraints on the possible schedules in Sect. 8.8. We give in Sect. 8.5 the intuition behind these constraints.
8.4.2 Functional Specification

8.4.2.1 Syntax

In SynDEx, functional specification is realized using a synchronous formalism very similar to Scade/Lustre. A specification, also called algorithm in SynDEx jargon, is a synchronous hierarchic dataflow diagram with conditional execution. It is formed of a hierarchy of dataflow nodes, also called operations in SynDEx jargon. Each node has a finite, possibly void set of named and typed input and output ports. We denote with I(n) and O(n) the sets of input and output ports of node n, respectively. To each input or output port p we associate its type (or domain) D_p.

An algorithm is a hierarchic description, where hierarchic decomposition is defined by means of dataflow graphs. A dataflow graph is a pair G = (N_G, A_G), where N_G is a finite set of nodes and A_G is a subset of (∪_{n∈N_G} O(n)) × (∪_{n∈N_G} I(n)) satisfying two consistency properties:

Domain consistency: For all (o, i) ∈ A_G, we have D_o = D_i.
Static single assignment: For all i ∈ ∪_{n∈N_G} I(n) there exists a unique o such that (o, i) ∈ A_G.

The nodes are divided into basic nodes, which are the leaves of the hierarchy tree (the elementary dataflow executable operations), and composed nodes, which are formed by composition of other nodes (basic and composed). There are two types of basic nodes: dataflow functions, which represent elementary dataflow computations, and delays, which are the state elements. Each delay d has exactly one input and one output port of the same type, denoted D_d, and also has an initializing value d_0 ∈ D_d.

Each composed node has one or more expansions, which are dataflow graphs. We denote with E(n) the set of expansions of node n. The expansion(s) of a composed node define its possible behaviors. Composed nodes with more than one expansion are called conditioned nodes, and they need to choose between the possible dataflow expansions at execution time. This choice is done based on the values of certain input ports called condition input ports. We denote with C(n) ⊆ I(n) the set of condition ports of a conditioned node n. The association of expansions to condition port valuations is formally described using a partial function:

    cond_n : ∏_{i∈C(n)} D_i → E(n)
The function need not be total, because not all input valuations are necessarily feasible. However, the function must be defined for all feasible combinations of conditioning inputs. We assume this has already been checked. An expansion G = (N_G, A_G) of a node n must satisfy the following consistency properties:

- There exists an injective function from I(n) to N_G associating to each port i the basic node i_G having no input ports and one output port of domain D_i. We call i_G an input node.
- There exists an injective function from O(n) to N_G associating to each port o the basic node o_G having no output ports and one input port of domain D_o. We call o_G an output node.

We assume the hierarchy of nodes is complete, the algorithm being the topmost node. We also assume that the algorithm has no input or output ports. This implies that interaction between the system and its environment is represented by dataflow functions. To simplify notations, we assume that the algorithm is used in no expansion, and that each other node is used in exactly one expansion. We denote with N_n the set of nodes used in the hierarchic description of n. (A possible concrete data-structure view of these definitions is sketched below.)
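Purely for illustration, here is one possible C rendering of these definitions as data structures; the actual SynDEx representation may differ, and all field names are our own.

    /* Illustrative data structures for algorithms, nodes, and graphs. */

    enum NodeKind { DATAFLOW_FUNCTION, DELAY, COMPOSED };

    struct Port { const char *name; const char *domain; };  /* typed port */

    struct Graph;                           /* a dataflow graph (below)   */

    struct Node {
        enum NodeKind kind;
        struct Port *inputs;  int n_inputs;    /* I(n)                    */
        struct Port *outputs; int n_outputs;   /* O(n)                    */

        /* COMPOSED nodes only: expansions E(n), condition ports C(n),
         * and the partial function cond_n mapping a valuation of the
         * condition ports to the index of the chosen expansion
         * (-1 where undefined). */
        struct Graph **expansions;  int n_expansions;
        int *condition_ports;       int n_condition_ports;
        int (*cond)(const void **condition_values);

        const void *init_value;                /* DELAY nodes only: d0    */
    };

    /* Arcs obey domain consistency and static single assignment
     * (exactly one writer per input port). */
    struct Arc   { struct Node *src; int out_port; struct Node *dst; int in_port; };
    struct Graph { struct Node **nodes; int n_nodes; struct Arc *arcs; int n_arcs; };

    int main(void) { return 0; }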
8.4.2.2 Operational Semantics
The operational semantics is the classical cycle-based synchronous execution model, which is also used in Scade/Lustre. The execution of a node (the algorithm included) is an infinite sequence of cycles where the node reads all its inputs and computes all its outputs. The behavior of a node depends on its type:

- The computation of a dataflow function is atomic: All inputs are waited for and read, then the non-interruptible computation is performed, and then the outputs are all produced.
- The delays are state elements. When executed in a cycle, a delay delivers the value stored in the previous cycle. In the first cycle, no previous value is available, and the delay d delivers the initializing value d_0. Then, it waits for and reads its input, and concludes the execution of the cycle by storing this new value for the next cycle.
- If a composed node has condition input ports, then its execution starts by waiting for and reading the values of the conditioning inputs. These values are used to choose one expansion, and execution proceeds as though the node were replaced by its expansion. In particular, no atomicity is required, and two nodes can be executed in parallel if no dependency exists between them.

As mentioned above, it is assumed that any possible combination of conditioning inputs of a node n is assigned an expansion through cond_n. This amounts to a partition of the possible configurations of the conditioning inputs among expansions.
This requirement ensures an important correctness property: every value that is read in a cycle has been produced in that cycle (conditional execution never leads to the use of uninitialized data). Whenever a data-flow block or communication arc x is executed at an instant, if m is an expansion of a conditioned node n that encloses x, then the current values of the input ports in C(n) are in cond_n^{-1}(m). Conversely, the clock of x is the conjunction, over all expansions m ∈ E(n) enclosing x, of the predicates requiring that the current values of the input ports in C(n) be in cond_n^{-1}(m).
8.5 Motivating Example

We give in this section an example of a full SynDEx [17] specification, including the definition of the architecture, the dataflow, and the timing information. We then explain what type of implementation SynDEx generates, and show that the generated real-time schedule is sub-optimal. Finally, the type of optimizations we can realize is illustrated with a version of the SynDEx schedule where a finer clock analysis and clock-driven schedule optimizations result in a significantly shorter cycle duration. We thus motivate the following sections, which introduce the formal apparatus needed to perform these clock analyses and optimizations.
8.5.1 Example of SynDEx Specification

Figures 8.2 and 8.3 give our SynDEx specification example in an intuitive graphical form that is more suited for presentation than the original one.
[Figure: dataflow graph. Input nodes FS IN and HS IN produce the ports FS and HS. Conditioned node C1, conditioned by HS, has an HS=false expansion containing F1, F2 (output port V), and F3, and an HS=true expansion containing G; its output port is ID. Conditioned node C2, conditioned by FS, has an FS=false expansion containing M (which consumes ID) and an FS=true expansion containing N.]

Fig. 8.2 The data-flow of a simple SynDEx specification
Fig. 8.3 The architecture and timing information of a simple SynDEx specification:
P1: HS IN=1, FS IN=1, F1=3, F2=8
P2: G=3, F3=3
P3: M=3, N=3, F3=2
Asynchronous broadcast bus: boolean=2, V type=2, ID type=5
8.5.1.1 Dataflow

Dataflow nodes are represented by boxes on which the ports (black boxes) are placed. Circle boxes are the input and output nodes. The remaining boxes are the other basic and composed nodes. The labels "HS=true" and "HS=false" identify the two expansions of the conditioned node C1 and their clocks. Similarly, the labels "FS=true" and "FS=false" identify the two expansions of conditioned node C2 and their clocks. For clarity reasons, we only name four output ports: HS, FS, V, and ID. The first two correspond to the values acquired from the environment by the nodes "FS IN" and "HS IN". The output of F2 is V, and the output of C1 is ID.

Our specification represents a system with two switches (Boolean inputs) controlling its execution: high-speed (HS=true) vs. low-speed (HS=false), and fail-safe (FS=true) vs. normal operation (FS=false). In the low-speed mode, more operations can be executed, whereas in the fail-safe mode the operation that gets executed (N) does not use any of the inputs (which corresponds to the use of default values). The specified behavior is the following: At each execution cycle, read FS and HS. Depending on the value of HS, either execute F1, followed by F2 and F3, or execute G. Both F1 and G compute the output value ID of the conditioned node C1. Depending on the value of FS, we either wait for ID and then execute M, or execute N. Dataflow blocks having no dependency between them can be executed in parallel. For instance, if FS=true then N can be executed and output as soon as FS is read. On the contrary, the computation of M must wait until both FS and ID have arrived.
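Abstracting from distribution and parallelism, one execution cycle of this behavior can be read as the sequential C sketch below (ours). The stub bodies and the exact wiring among F1, F2, and F3 are assumptions made for illustration; only the conditional structure is given by the specification.

    #include <stdbool.h>

    /* Stubs standing in for the dataflow nodes; their bodies are invented. */
    static bool HS_IN(void) { return false; }
    static bool FS_IN(void) { return false; }
    static int  F1(void)    { return 1; }
    static int  F2(int id)  { return id + 1; }
    static int  F3(int v)   { return v * 2; }
    static int  G(void)     { return 7; }
    static void M(int id)   { (void)id; }
    static void N(void)     { }

    /* One execution cycle of the specification, written sequentially. */
    static void cycle(void)
    {
        bool HS = HS_IN();
        bool FS = FS_IN();

        int ID;                 /* output port of conditioned node C1      */
        if (!HS) {              /* low-speed expansion: F1, then F2 and F3 */
            ID = F1();
            int V = F2(ID);     /* V is the output of F2                   */
            F3(V);              /* F3's output port is not named           */
        } else {                /* high-speed expansion                    */
            ID = G();
        }

        if (!FS)                /* normal operation: M waits for ID */
            M(ID);
        else                    /* fail-safe: N uses no inputs      */
            N();
    }

    int main(void) { cycle(); return 0; }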
8.5.1.2 Architecture and Timing Information

The architecture model we use is formed of three processors connected to an asynchronous broadcast bus, as explained in Sect. 8.4.1. The timing information specifies:

- The atomic dataflow nodes that can be executed by each processor, and their WCET (Worst-Case Execution Time) on that processor. Each processor has a list of timing information of the form "node=wcet".
- The communication operations that can be executed by the bus (the data types that can be sent atomically), and the worst-case duration of the communication, assuming the bus is free. The bus has a list of timing information of the form "data type=wcet".

Timing information for one atomic node can be present on several processors, to denote the fact that the node can be executed by several processors. For instance, node F3 can be executed on P2 with WCET=3, and on P3 with WCET=2. We assume that all processors are sequential (no parallel execution is possible), and that the bus can be used for no more than one Send operation at a time.
8.5.2 Schedule Generated by SynDEx

The SynDEx tool takes the previously defined specification and produces a static schedule of all computations and communications for one execution cycle. The schedule of a full execution is obtained by repeating the execution cycle schedule for as many cycles as needed. The tool tries to minimize the worst-case duration of one execution cycle (the latency), and not the input and output rates (the throughput).

The left-hand part of Fig. 8.4 gives the schedule generated by SynDEx for our example. The schedule is formed of one lane for each processor, and one lane for the bus. Each atomic node is scheduled onto one processor, while the bus carries inter-processor communications. Given that the bus is broadcast, communication operations are of the form "Send(P,OP)", where P is the sender processor and OP is the output port whose value is sent. The clock of each scheduled node and communication is figured after the name of the operation. For instance, HS IN is executed in all cycles to read the HS value from the environment, because its clock ("@true") is always true. Similarly, F1 is only executed when the value read for HS is false. Execution conditions (clocks) are predicates over the port names of the dataflow.

The width of an operation (its support) inside its lane intuitively represents its clock (the wider it is, the simpler its clock). Much like in Venn diagrams, the logical relations between the clocks of various operations are given by the horizontal overlapping of supports. Given operations O1@P1 and O2@P2 inside a lane, their supports are disjoint to show that P1 ∧ P2 = false. In this case, the two operations are never both executed in a single cycle. They can be scheduled independently and can be placed side by side in a lane without contradicting the sequentiality of processors and buses. The last lane in Fig. 8.4 contains an instance of such independent operations. Whenever we cannot prove that P1 ∧ P2 = false, we must assume that the operations O1@P1 and O2@P2 can both be executed in the same cycle. Such operations have overlapping supports in Fig. 8.4. Non-conditioned operations, like "HS IN@true", use the full width of the lane.

Each scheduled operation (dataflow node or communication) is assigned a start date and an end date.
Fig. 8.4 The static schedule generated by SynDEx and what we want to obtain through optimizations (we only show the optimized lanes that differ from the SynDEx-generated ones). Time flows from top to bottom. We give here the schedule for one execution cycle; an execution of the system is an infinite repetition of this pattern. The width of an operation inside its lane intuitively represents its execution condition (clock), as explained in Sect. 8.5.2
While this presentation of the schedules may suggest a time-triggered implementation, this is not the case. Currently, SynDEx produces schedules that are implementable as latency-insensitive systems [39], where the communication operations allow the preservation of the scheduled sequence of node computations on each processor and of the sequence of communication operations on the bus (static scheduling modulo conditional execution). For instance, M can be executed only after both FS and ID have been received by P3 and FS=false. The start and end dates of a scheduled operation should therefore be interpreted as worst-case start and end dates for operations in a latency-insensitive system (thus, timing errors will not affect the functional correctness and determinism of the system).

Intuitive constraints define the correctness of a schedule (they will be formalized later, in Sect. 8.8):

- For each scheduled node, the values needed to compute its clock, as well as its inputs, are available on the processor where the operation has been scheduled.
- For each Send operation on the bus, the data to send is already available on the sending processor, and the values needed to compute its clock are available on all processors (having been sent on the bus previously). This way, when a Send occurs, all processors know what to do with the incoming value: keep it or discard it.
8.5.3 Optimizing Communications

In Fig. 8.4, note that the schedule generated by SynDEx contains redundant bus communications:

- Two send operations of signal ID exist, with execution conditions "HS=false" and "FS=false". Therefore, ID is sent twice on the bus in cycles where both HS and FS are false.
- In cycles where HS=false and FS=true, ID is sent over the bus even though the node M, which uses ID, is not executed.

The first problem here is to determine such redundancies. This amounts to verifying the satisfiability of predicates over the (possibly non-Boolean) ports of the dataflow. This can be done using approximate satisfiability checks that determine a subset of these redundancies (but never report false redundancies). This chapter determines, in Sect. 8.9, the predicates corresponding to the redundancies we seek to remove. We do not discuss the arbitrarily complex satisfiability verification techniques that can be used on these predicates. A variety of techniques exist, ranging from the low-complexity one implicitly used in SynDEx to BDD-based techniques like those of Wolinski [25]. We simply explain here how this computation can be done on our example.

Once a set of redundant communications has been determined, we optimize the schedule accordingly by (1) recomputing the clocks of the communications to remove the redundancies, and (2) compacting the schedule based on the new exclusiveness properties. Section 8.9 defines such an optimization technique, consisting
in a sequence of low-complexity transformations of the initial schedule. The output of the transformations is optimal, in the sense that no further minimization of the clocks can be done while guaranteeing the functional correctness of the implementation, on the basis of the redundancies provided by the previous step.

We exemplify these optimizations in Fig. 8.4, which gives in its right-hand part the optimized version of the SynDEx-generated schedule (we only show the lanes that change after optimization). Here, (1) the clock of Send(P1,ID) is changed, so that ID is never sent on the bus when M is not executed, (2) the clock of Send(P2,ID) is changed, so that ID is not sent a second time by P2 when P1 has already sent it, and (3) Send(P2,ID) is moved earlier in the schedule, which is now possible. Like in a Tetris game (falling tiles), the operations Send(P1,V), M, and F3 can then be moved earlier in the schedule. The optimized schedule has a worst-case cycle duration that is 20% (four time units) shorter than the SynDEx-generated one. Note that our transformations do not alter the allocation or the order of operations of the initial schedule (just the clocks). For instance, we do not exchange the slots of HS IN and FS IN, even though the existing choice is not optimal.

The key point of our formal approach is the use of a notion of execution condition, defined in the next section. An execution condition represents both a clock and the executable code used in the implementation to compute this clock (a hierarchy of conditionals, and the associated data dependencies). Manipulating execution conditions amounts to both defining new clocks and transforming executable code.
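For the two Boolean inputs of our example, the satisfiability checks underlying redundancy detection can be done by brute-force enumeration. The following sketch (ours) tests the two redundancy predicates discussed above:

    from itertools import product

    # Clocks over the Boolean ports HS and FS as Python predicates; check
    # satisfiability by enumerating all valuations (fine for two variables;
    # real tools would use approximate checks or BDDs, as noted above).
    def satisfiable(pred) -> bool:
        return any(pred(hs, fs) for hs, fs in product((False, True), repeat=2))

    send_p1_id = lambda hs, fs: not hs        # Send(P1,ID)@(HS=false)
    send_p2_id = lambda hs, fs: not fs        # Send(P2,ID)@(FS=false)
    m_runs     = lambda hs, fs: not fs        # M@(FS=false)

    # Redundancy 1: ID can be sent twice in the same cycle.
    assert satisfiable(lambda hs, fs: send_p1_id(hs, fs) and send_p2_id(hs, fs))
    # Redundancy 2: P1 can send ID in a cycle where M does not execute.
    assert satisfiable(lambda hs, fs: send_p1_id(hs, fs) and not m_runs(hs, fs))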
8.6 Execution Conditions

An execution condition is a pair $c = \langle clk(c), supp(c) \rangle$ formed of:

- A clock $clk(c)$, which is a predicate over the values produced by the ports of the dataflow.¹ This predicate is defined using a set of operators given in this section, allowing us to also see it as the sequence of operations for computing the clock.
- The support $supp(c)$ of the execution condition, consisting of the set of ports whose value is needed to compute $clk(c)$, and, for each such port, the execution condition defining the cycles where the value is needed.

To give an intuition of this definition, assume in Fig. 8.4, last lane, that the guarded execution of Send(P2,ID)@(FS=false ∧ HS=true) is implemented as:

if (FS=false) then if (HS=true) then Send(P2,ID)
¹ Clocks can easily be extended to include cycle indices, which is of help when dealing with multi-periodic systems.
Then, it is natural to use the execution condition $c$ with $clk(c) = [FS=false][HS=true]$, which specifies that FS is evaluated first, and that if its value is false we evaluate HS; if HS=true, then the predicate is true, otherwise it is false. The support of $c$ is $supp(c) = \{FS@true,\ HS@\langle [FS=false], \{FS@true\} \rangle\}$, which specifies that we can compute the predicate provided FS is available at each cycle (FS@true), and provided HS is available in cycles where FS=false. The support information is necessary to check the correctness of the communications, and possibly to optimize them. Other execution conditions with different supports can compute the same Boolean function (FS=false ∧ HS=true), for instance $c'$ with $clk(c') = [FS=false \wedge HS=true]$ and $supp(c') = \{FS@true, HS@true\}$, which intuitively corresponds to the implementation:

if (FS=false and HS=true) then Send(P2,ID)
8.6.1 Basic Operations

We naturally extend to execution conditions the basic Boolean operations on predicates ($\wedge$, $\vee$, $\neg$):

$$c_1 \wedge c_2 = \langle clk(c_1) \wedge clk(c_2),\ supp(c_1) \cup supp(c_2) \rangle$$
$$c_1 \vee c_2 = \langle clk(c_1) \vee clk(c_2),\ supp(c_1) \cup supp(c_2) \rangle$$
$$\neg c = \langle \neg clk(c),\ supp(c) \rangle$$

We define the difference operator on predicates and execution conditions by $p_1 \setminus p_2 = p_1 \wedge \neg p_2$. If $C$ is a finite set of predicates or execution conditions, we also define $\bigwedge C$ as the conjunction of all the elements of $C$, and $\bigvee C$ as their disjunction. Clocks are partially ordered, with $p_1 \le p_2$ whenever $p_1$ implies $p_2$. We extend $\le$ to a preorder on execution conditions by setting $c_1 \le c_2$ whenever $clk(c_1) \le clk(c_2)$ and for all $s@c \in supp(c_1)$ there exists $s@c' \in supp(c_2)$ with $c \le c'$. We shall say that $c_1$ is true (respectively false) when $clk(c_1)$ is true (resp. false).
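A minimal executable rendering of these definitions (our own Python sketch; the chapter itself stays at the mathematical level) keeps the clock as a predicate over signal valuations, and the support as a map from signal names to the execution conditions under which their values are needed:

    class ExecCond:
        """An execution condition <clk, supp>: clk maps a valuation (dict) to
        a Boolean; supp maps each needed signal to the ExecCond under which
        its value is required."""
        def __init__(self, clk, supp):
            self.clk, self.supp = clk, supp

        @staticmethod
        def _merge(s1, s2):
            # Union of supports; a signal needed under two conditions is
            # needed under their disjunction.
            out = dict(s1)
            for sig, c in s2.items():
                out[sig] = out[sig] | c if sig in out else c
            return out

        def __and__(self, o):   # c1 ^ c2
            return ExecCond(lambda e: self.clk(e) and o.clk(e),
                            ExecCond._merge(self.supp, o.supp))

        def __or__(self, o):    # c1 v c2
            return ExecCond(lambda e: self.clk(e) or o.clk(e),
                            ExecCond._merge(self.supp, o.supp))

        def __invert__(self):   # not c (same support)
            return ExecCond(lambda e: not self.clk(e), dict(self.supp))

    TRUE = ExecCond(lambda e: True, {})

    # The condition of Send(P2,ID): clk = [FS=false][HS=true]; FS is needed
    # in every cycle, HS only in cycles where FS=false.
    fs_false = ExecCond(lambda e: not e["FS"], {"FS": TRUE})
    c = ExecCond(lambda e: not e["FS"] and e["HS"],
                 {"FS": TRUE, "HS": fs_false})
    assert c.clk({"FS": False, "HS": True}) and not c.clk({"FS": True, "HS": True})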
8.6.2 Signals

Port values mentioned above are formally represented by objects named signals. A signal $s$ is a value that is computed, and can be used in computations, at precise execution cycles given by an execution condition $cond(s)$. At every such cycle, $s$ is assigned a unique value. Inside each cycle, the value of $s$ can be read only after being assigned a value. A signal $s$ has a type $D_s$. Given a signal $s$ and an execution condition $c \le cond(s)$, we define the sub-sampling of $s$ on $c$, denoted $s@c$, as
the signal of execution condition $c$ and domain $D_s$ that has the same values as $s$ at cycles where $c$ is true.

In a dataflow graph, each value is produced by exactly one output port in an execution cycle (due to the single assignment rule). However, hierarchical decomposition and conditional execution mean that several ports may in fact correspond to the same signal. In Fig. 8.2, the signal named ID represents the values produced by the output ports of F1, G, C1,² and the input node labelled ID of C2, and is consumed by the input ports of M, C2, and the output node labelled ID of C1. More precisely, we need to create one signal in $Sig(n)$ for each:

- Input and output port of the algorithm node. The input ports of the other nodes use the signal associated with the output port feeding them.
- Output port of a dataflow node, whenever the port is not connected to the input port of an output node. The remaining output ports produce values of the signal associated with the output port they feed.

For a signal $s$, we denote with $node(s)$ the node whose input or output port defines $s$. By construction, $cond(s) = cond(node(s))$. We denote with $sig(p)$ the unique signal associated with a port $p$ of a node in the algorithm.
8.6.3 Conditioning

Given a set of signals $S$ and $X \subseteq \prod_{s \in S} D_s$, we shall write $S \in X$ to denote the condition that the valuation of the signals of $S$ belongs to $X$. Consider now an execution condition $c$ and a finite set of signals $S$ such that $cond(s) \ge c$ for all $s \in S$. Consider $X \subseteq \prod_{s \in S} D_s$ a set of valuations of the signals in $S$. Then, we define the conditioning of $c$ by $S \in X$, denoted $c[S \in X]$, by the clock $clk(c[S \in X]) = clk(c)[S \in X]$ and the support $supp(c) \cup \{s@c \mid s \in S\}$. Using on clocks the new operator $p[S \in X]$ gives us a syntax to represent hierarchical conditions, whereas for verification purposes $p[S \in X] = p \wedge (S \in X)$.

Using the conditioning operator, it is easy to define the hierarchical execution conditions of all the nodes in a SynDEx algorithm. The condition of the algorithm node is $\langle true, \emptyset \rangle$, also denoted true. We also abbreviate $true[S \in X]$ as $[S \in X]$. Given the execution condition $cond(n)$ of a composed node $n$:

- If $n$ is not conditioned, then the execution condition of all the nodes in its unique expansion is $cond(n)$.
- If $n$ is conditioned, then the execution condition of all the nodes in expansion $G \in E(n)$ is $cond(n)[C(n) \in cond_n^{-1}(G)]$, where $cond_n^{-1}(G) = \{x \in \prod_{c \in C(n)} D_c \mid cond_n(x) = G\}$.

² This does not contradict the single assignment rule, because the dataflow functions F1 and G that produce ID cannot both be executed in the same cycle.
The resulting execution conditions are all of the form $[S^1 \in X^1] \ldots [S^k \in X^k]$, which are naturally organized in a tree with true as root.
8.7 Flattened Dataflow

The hierarchy of a SynDEx specification facilitates the specification of complex control patterns, ensuring through simple construction rules that every value that is consumed has been produced. However, actual dataflow computations are only performed by the leaves of the hierarchy tree (dataflow functions and delays), while composed nodes only represent control (conditional execution). We want to be able to represent schedules that specify the spatial and temporal allocation at the level of the hierarchy leaves.³

To facilitate the description of such flat schedules, we identify in this section the set $Op(a)$ of operations that need to appear in a correct schedule of the algorithm $a$. Each $o \in Op(a)$ is associated with an execution condition $cond(o)$ defining its execution, a set $IS(o)$ of signals it reads, and a set $OS(o)$ of signals it produces. The set $Op(a)$ is formed of:

- All the atomic dataflow functions that are not input or output nodes. For such a node $n$ we set $IS(n) = \{sig(i) \mid i \in I(n)\}$ and $OS(n) = \{sig(o) \mid o \in O(n)\}$.
- For each delay node $n$:
  – One read operation $read(n)$ with $cond(read(n)) = cond(n)$, $IS(read(n)) = \emptyset$, and $OS(read(n)) = \{sig(o)\}$, where $o$ is the output port of $n$.
  – One store operation $store(n)$ with $cond(store(n)) = cond(n)$, $OS(store(n)) = \emptyset$, and $IS(store(n)) = \{sig(i)\}$, where $i$ is the input port of $n$.
- For each input port $i$ of the algorithm node $a$, an input operation $read(i)$ with $cond(read(i)) = true$, $IS(read(i)) = \emptyset$, and $OS(read(i)) = \{sig(i)\}$.
- For each output port $o$ of the algorithm node $a$, an output operation $send(o)$ with $cond(send(o)) = true$, $OS(send(o)) = \emptyset$, and $IS(send(o)) = \{sig(o)\}$.

We associate no operation to nodes that are not leaves, as (1) all control information is represented by execution conditions, and (2) we assume control to take no time, so that we do not need to attach its cost to any object.
8.7.1 Timing Information

To allow real-time scheduling, we assume each operation $op \in Op(a)$ is assigned a worst-case execution time $d_P(op)$ on each processor $P$ it can be allocated on.

³ Such a formalism can also represent coarser-grain allocation policies where allocation is done at some other hierarchy level.
To represent the fact that a processor $P$ cannot execute an operation $op$, we set $d_P(op) = \infty$. This last convention is particularly important for the operations corresponding to algorithm inputs and outputs. These operations must be allocated on the processor having the corresponding input or output channel.

In addition to operation timings, we need communication timings. We associate a positive value $d_B(D)$ to each domain $D$. This value represents the worst-case duration of transmitting one message of type $D$ over the bus (in the absence of all interference).
8.8 Static Schedules

In this section, we define our model of static real-time schedule of an algorithm over the bus-based architecture defined in Sect. 8.4.1. To simplify the developments, we assume that scheduling is done for one cycle, the global schedule being an infinite repetition of the schedule of one cycle. This corresponds to the case where the computations of the different cycles do not overlap in time in the real-time implementation.⁴

The remainder of the section defines our formal model of static schedule. The definition includes some very general choices, such as the use of data messages for synchronization. These supplementary hypotheses will be clearly identified in the text. The resulting formalism is general enough to allow the representation of schedules generated by several common static scheduling techniques, including the ones of SynDEx.
8.8.1 Basic Definitions

Consider an algorithm $a$. A static schedule $S$ of $a$ on the bus-based architecture is a set of scheduled operations. Each $so \in S$ has (1) one execution resource, denoted $Res(so)$, which is either the bus or one of the processors, (2) the execution condition $cond(so)$ of the operation, (3) a real-time date $t_{so}$ giving the latest (worst-case) real-time date at which the execution of the operation will start, and (4) the scheduled operation itself, which is resource-specific:

- If $Res(so) = B$, the scheduled operation is the emission of a signal, denoted $sig(so)$, by processor $emitter(so)$.
- If $Res(so) = P_i$, the scheduled operation is the execution of an operation $op(so) \in Op(a)$ with $cond(so) = cond(op(so))$.
⁴ The results of this chapter can be extended to the case where cycles can overlap, by considering modulo scheduling techniques.
For convenience, we define $S_B = \{so \in S \mid Res(so) = B\}$ and $S_{P_i} = \{so \in S \mid Res(so) = P_i\}$. We also define the duration of a scheduled operation as $d(so) = d_B(D_{sig(so)})$ if $Res(so) = B$, and $d(so) = d_{Res(so)}(op(so))$ otherwise.

Note that the definition of $S_B$ implicitly assumes that we make no use of specialized synchronization messages, nor of data encoding, only allowing the transmission of messages containing signal values. We assume that each operation is scheduled exactly once in $S$, meaning that no optimizing replication such as inlining is done. This hypothesis is often used in real-time scheduling. Formally, we require that for all $o \in Op(a)$, there exists a unique $so \in S$ with $op(so) = o$.
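Concretely, a scheduled operation could be recorded as follows; this is a hypothetical encoding of the definition above, with clocks reduced to plain predicates and with made-up dates:

    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class ScheduledOp:
        res: str                      # "B" for the bus, or a processor name
        cond: Callable[[dict], bool]  # execution condition (clock only, here)
        t: float                      # worst-case start date
        op: Optional[str] = None      # operation name, when res is a processor
        sig: Optional[str] = None     # emitted signal, when res == "B"
        emitter: Optional[str] = None # sending processor, when res == "B"

    # One bus emission and one computation from the example schedule
    # (the dates 8 and 12 are illustrative):
    send_id = ScheduledOp("B", lambda e: not e["HS"], t=8,
                          sig="ID", emitter="P1")
    run_m   = ScheduledOp("P3", lambda e: not e["FS"], t=12, op="M")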
8.8.2 Data Availability

The execution condition defining the execution cycles where signal $s$ is sent on the bus before date $t$ is

$$cond(s,t,B) = \bigvee \{ cond(so) \mid (so \in S_B) \wedge (sig(so) = s) \wedge (t_{so} + d_B(D_s) \le t) \}.$$

Recall that each signal is computed on a single processor at each cycle. Then, $cond(s,t,B)$ is the execution condition giving the cycles where $s$ is available system-wide at all dates $t' \ge t$. The execution condition defining the cycles where $s$ is available on $P$ at date $t$ is

$$cond(s,t,P) = cond(s,t,B) \vee \bigvee \{ cond(so) \mid (so \in S_P) \wedge (s \in OS(op(so))) \wedge (t_{so} + d_P(op(so)) \le t) \}.$$

The second term of the disjunction corresponds to the local production of $s$. Given a schedule $S$, a signal $s$, and an execution condition $c \le cond(s)$, we denote with $ready\_date(P,s,c)$ the minimum date $t$ such that $cond(s,t,P) \ge c$, and with $ready\_date(B,s,c)$ the minimum date $t$ such that $cond(s,t,B) \ge c$.
8.8.3 Correctness Properties

The following properties define the correctness of a static schedule $S$ with respect to the initial algorithm $a$ (viewed as a flattened graph) and the timing information that was provided. This concludes the definition of our model of real-time schedule, and allows us, in the next section, to state the aforementioned optimality properties and define the corresponding optimization transformations.
8.8.3.1 Delay Consistency
The read and store operations of a delay node must be scheduled on the same processor, to allow the use of local memory for storing the value. Formally, for every delay node $n$ and all $so_1, so_2 \in S$, if $op(so_1), op(so_2) \in \{read(n), store(n)\}$, then $Res(so_1) = Res(so_2)$.
8.8.3.2 Exclusive Resource Use
A processor or bus cannot be used by more than one operation at a time. Formally:

- On processors: If $so_1, so_2 \in S_P$ are such that $so_1 \ne so_2$ and $cond(so_1) \wedge cond(so_2) \ne false$, then either $t_{so_1} \ge t_{so_2} + d_P(op(so_2))$ or $t_{so_2} \ge t_{so_1} + d_P(op(so_1))$.
- On the bus: If $so_1, so_2 \in S_B$ are such that $so_1 \ne so_2$ and $cond(so_1) \wedge cond(so_2) \ne false$, then either $t_{so_1} \ge t_{so_2} + d_B(D_{sig(so_2)})$ or $t_{so_2} \ge t_{so_1} + d_B(D_{sig(so_1)})$.
8.8.3.3 Causal Correctness
Intuitively, to ensure causal correctness, our schedule must ensure in a static fashion that when a computation or communication uses a signal $s$ at time $t$ under execution condition $c$, the signal has been computed or transmitted on the bus at an earlier time and under a greater execution condition:

- On a processor: For all $so \in S_P$ we have:
  – If $s@c \in supp(cond(so))$, then $cond(s, t_{so}, P) \ge c$.
  – If $s \in IS(op(so))$, then $cond(s, t_{so}, P) \ge cond(so)$.
- On the bus: For all $so \in S_B$ we have:
  – If $s@c \in supp(cond(so))$, then $cond(s, t_{so}, B) \ge c$.
  – $cond(sig(so), t_{so}, emitter(so)) \ge cond(so)$.
8.9 Scheduling Optimizations

In this section, we use the previously defined model of static schedule to state some optimality properties which we expect of all schedules, yet which are not easy to state or ensure in the presence of complex execution conditions. We provide algorithms ensuring these properties. The schedule optimizations we are interested in are based on the removal of redundant idle time and redundant communications. This is done only through manipulations of the execution conditions of the communications on the bus, and through adjustments of the dates of the scheduled operations. In particular, our optimizations preserve the ordering of the computations on processors, as well as
the ordering of communications on the bus (except that only the first send of a given signal is performed in a given cycle, the others being optimized out).
8.9.1 Static Dependencies

To simplify notations, we define the sets of bus operations and processor operations that produce a given signal $s$: $S_B(s) = \{so \in S_B \mid sig(so) = s\}$ and $S_P(s) = \{so \in S_P \mid s \in OS(op(so))\}$.

To allow the optimization of a static schedule, we need to understand which data dependencies must not be broken through the removal of communications. Our first remark is that a signal $s$ can be used on all processors as soon as it has been emitted once on the bus. Subsequent emissions bring no new information, so that readers of $s$ should not depend on them. We shall denote with $dep_B(s@c)$ the subset of operations of $S_B$ that can (depending on execution conditions) be the first communication of signal $s$ on the bus in cycles where $c$ is true:

$$dep_B(s@c) = \big\{ so \in S_B(s) \mid clk\big(cond(so) \wedge c \setminus \bigvee\{cond(so') \mid (so' \in S_B(s)) \wedge (t_{so'} < t_{so})\}\big) \ne false \big\}.$$

On a processor, things are more complicated, as information can be locally produced. However, local productions are always the source of the data, and cannot be redundant. We denote the set of local dependencies with $dep\_loc_P(s@c) = \{so \in S_P(s) \mid clk(cond(so) \wedge c) \ne false\}$, and we set $c_{loc} = \bigvee\{cond(so) \mid so \in dep\_loc_P(s@c)\}$. With these notations,

$$dep_P(s@c) = dep\_loc_P(s@c) \cup dep_B(s@(c \setminus c_{loc})).$$

Now, we can define the static dependencies between the various scheduled operations. The set of scheduled operations upon which $so \in S_B$ depends is

$$dep(so) = dep_{emitter(so)}(sig(so)@cond(so)) \cup \bigcup_{s@c \in supp(cond(so))} dep_B(s@c).$$

Similarly, for $so \in S_P$:

$$dep(so) = \Big( \bigcup_{s \in IS(op(so))} dep_P(s@cond(so)) \Big) \cup \Big( \bigcup_{s@c \in supp(cond(so))} dep_P(s@c) \Big).$$
To these data dependencies, we must add the dependencies due to sequencing on the processors and on the bus, and we obtain the set of static dependencies that must be preserved by any optimization. We denote with $pre(so)$ the set of predecessors of a scheduled operation, which includes $dep(so)$ and the dependencies due to sequencing:

$$pre(so) = dep(so) \cup \{ so' \in S \mid (Res(so) = Res(so')) \wedge (clk(cond(so) \wedge cond(so')) \ne false) \wedge (t_{so} > t_{so'}) \}.$$

We call the scheduling DAG (directed acyclic graph) the set $S$ endowed with the order relation given by the $pre()$ relation. The computation of $pre(so)$ relies on computing the satisfiability of predicates. As explained in Sect. 8.5.3, we assume these satisfiability computations are not exact, so that the computed $pre(so)$ contains more dependencies than the exact one. The closer the approximation of $pre(so)$ to the exact one, the better the results given by the following optimizations.
8.9.2 ASAP Scheduling

The first optimality property we want to obtain is the absence of idle time. In our static scheduling framework, this amounts to recomputing the dates of all the scheduled operations by performing a MAX+ computation on the scheduling DAG defined in the previous section. More precisely, starting from the scheduled operations without predecessors, we set

$$t_{so} = \max_{so' \in pre(so)} \big( t_{so'} + d(so') \big).$$

This operation is of polynomial complexity, and it does not change other aspects of the schedule, meaning that the same implementation will function. The gain is a tighter worst-case bound for the execution of a cycle.
8.9.3 Removal of Redundant Communications

A stronger result is the absence of redundant communications. We explore two complementary approaches to this problem here.

8.9.3.1 Removal of Subsequent Emissions of the Same Signal

The simpler of the two approaches seeks to reduce the execution condition of a signal communication based on previously scheduled
communications of the same signal. The minimization works by replacing, for each $so \in S_B$, the execution condition $cond(so)$ with

$$cond(so) \setminus \bigvee \{ cond(so') \mid (so' \in S_B(sig(so))) \wedge (t_{so} > t_{so'}) \}.$$

The transformation of the schedule is a traversal of the bus scheduling DAG (of polynomial complexity). Due to the minimization of execution conditions (meaning that some communications are no longer performed in certain cycles), the output of this transformation should be put in ASAP form. To do this, the scheduling DAG must be recomputed, through a re-computation of the dependencies due to sequencing on the bus. The data dependencies are not changed, as this form of redundancy has already been taken into account during the computation of the data dependencies.
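With clocks as valuation sets, this minimization is a single pass over the bus operations in date order. In the sketch below (ours), the clocks are those of the two sends of ID from the example:

    def remove_subsequent_emissions(bus):
        """bus: list of (signal, clock, start_date) records.  Replace each
        clock by cond(so) minus the union of the clocks of earlier sends
        of the same signal."""
        already = {}                  # signal -> union of earlier clocks
        out = []
        for sig, clk, t in sorted(bus, key=lambda r: r[2]):
            new_clk = clk - already.get(sig, frozenset())
            out.append((sig, new_clk, t))
            already[sig] = already.get(sig, frozenset()) | clk
        return out

    hs_false = frozenset({(0, 0), (0, 1)})   # HS=false, over (HS, FS)
    fs_false = frozenset({(0, 0), (1, 0)})   # FS=false
    bus = [("ID", hs_false, 8), ("ID", fs_false, 11)]
    # The second emission of ID keeps only the cycles not covered by the
    # first one, i.e. FS=false ^ HS=true, as in the optimized Fig. 8.4:
    assert remove_subsequent_emissions(bus)[1][1] == frozenset({(1, 0)})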
8.9.3.2 Removal of Useless Communication Operations
The removal of subsequent emissions reduces to 1 the maximal number of emissions of a signal in a cycle. However, it does not try to restrict emissions to cycles where the signal is needed. Doing this amounts to a reduction of the execution condition on which a signal $s$ is sent over the bus in a cycle to $\bigvee_{so \in S_B(s)} cond(so)$. We propose here the simplest such transformation, which consists in removing the communication operations that appear in none of the sets $pre(so)$, for $so \in S$. It is important to note that the removal of one operation may enable the removal of another, and so on. The removal and propagation process can be done in polynomial complexity. The modified schedule is easily implemented by removing pieces of code from the initial implementation.

For a given schedule, the removal of subsequent emissions, followed by the removal of useless communication operations, and by a transformation into ASAP form, leads to a schedule where none of the three techniques can result in further improvement.
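The removal of useless communications thus behaves like dead-code elimination on the scheduling DAG: repeatedly drop any bus send that no remaining operation depends on. A sketch (ours):

    def remove_useless_sends(ops, pre, is_send):
        """ops: set of scheduled operations; pre: op -> set of predecessors;
        is_send: op -> bool.  Iteratively drop sends nobody depends on."""
        ops = set(ops)
        changed = True
        while changed:
            changed = False
            used = set().union(*(pre[o] & ops for o in ops)) if ops else set()
            for o in list(ops):
                if is_send(o) and o not in used:
                    ops.discard(o)    # removing one send may free another
                    changed = True
        return ops

    ops = {"F1", "Send_A", "Send_B"}
    pre = {"F1": set(), "Send_A": {"F1"}, "Send_B": {"Send_A"}}
    # Send_B feeds nobody; once it is gone, Send_A is also useless:
    assert remove_useless_sends(ops, pre, lambda o: o.startswith("Send")) == {"F1"}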
8.10 Related Work

To our knowledge, no work exists on the optimization of given real-time schedules for conditioned dataflow programs. Meanwhile, an important corpus of heuristic algorithms exists for the real-time scheduling of conditioned dataflow programs (and similar formalisms). We mention here only a few approaches.

Our approach is most closely related to the work by Eles et al. [15] on the scheduling of conditional process graphs onto multiprocessor systems. As in our approach, the difficulty is the handling of execution conditions. However, the chosen approach is different. Eles et al. start from schedules corresponding to each execution mode of
the system, and unify them into a global schedule covering all modes. The enumeration of execution modes allows a finer real-time analysis than is possible in our model of static schedule (because a given operation has not one worst-case starting date, but one date per mode). The main drawback of the approach is the enumeration of all execution modes of the system, which can be intractable for complex systems. A better approach here could be an intermediate one between ours and that of Eles et al., where mode-dependent execution dates can be manipulated, but the manipulation is done not on individual modes, but on sets of modes identified by execution conditions. A second drawback of the approach, from a communication optimization point of view, is that for most communications the value that is transmitted is not specified.

Our work is based on existing work on SynDEx [17]. The SynDEx tool allows the real-time scheduling of conditioned dataflow specifications onto hardware architectures involving multiple processors and communication lines (point-to-point and buses). When applied to architectures with a single bus, SynDEx generates schedules that can be represented using our formalism, but where each communication has either the execution condition of the sender or the execution condition of the intended receiver. This simplifies our calculus, but potentially pessimizes communication, meaning that our techniques can be directly applied.

Also related is the work by Kountouris and Wolinski [25] on the scheduling of hierarchical conditional dependency graphs, a formalism allowing the representation of data dependencies and execution conditions, whose development is related to the implementation of the Signal/Polychrony language [19]. Kountouris and Wolinski focus on the development of scheduling heuristics that use clock calculus to drive classical optimization techniques. We focus on the determination of optimality properties in a more constrained solution space, and on a schedule analysis based on a calculus involving both logical clocks and time.

We also mention here the large corpus of work on the optimized distributed scheduling of dataflow specifications onto time-triggered architectures [18, 42]. Given the proximity of the specification formalism, we highlight the work of Caspi et al. on the distributed scheduling of Scade/Lustre specifications onto TTA-based architectures [10]. The main difference is that in TTA-based systems communications must be realized in time slots that are statically assigned to the various processors. Therefore, communications from different processors cannot be assigned overlapping time frames even if their execution conditions are exclusive. This implies that the computation of an ASAP schedule is no longer a simple MAX+ computation, as it is in our case. Closer to our approach is the scheduling of the FlexRay dynamic segment, where time overlapping is possible (but ASAP scheduling still does not correspond to MAX+). Finally, in time-triggered systems, execution conditions need not all be computed starting from "true", because a notion of time and communication absence is defined.

Our model of static schedule can also represent schedules for systems based on time-triggered buses such as TTA or FlexRay [29]. Our technique for the removal of redundant communications still works, but the ASAP minimization does not.
8.11 Conclusion and Future Work

In this chapter, we presented a technique for the real-time implementation of dataflow synchronous specifications. The technique is based on the introduction of a new format for the representation of static real-time schedules of synchronous specifications. At the level of this formalism, we define the correctness of the schedule with respect to the initial functional specification, which enables us to define correctness-preserving optimization algorithms that improve the real-time latency of the computation of one synchronous cycle.

We envision several directions for extending the work presented here. Most important is the extension of our results to allow optimization over time-triggered buses. One specific problem here is how to use execution conditions in order to better determine which communications of the FlexRay dynamic segment can be guaranteed. Another is to take into account the latency of the protocol layers on the sender and receiver sides (in our model, this latency is assumed to be 0). A second direction is the extension of the model to cover architectures with multiple buses. A natural question in this case is which of the optimality results still stand. A third direction consists in extending the clock calculus to cover specifications where the clocks are less hierarchized, or are defined by external events, such as periodic input arrivals. A good starting point in this direction is the use of endochronous synchronous systems [30] specified in the Signal/Polychrony language [19]. We also intend to improve our schedule representation formalism to allow a finer characterization of worst-case execution dates, à la Eles et al. [15].

More work is also needed to deal with multi-rate synchronous specifications and their schedules. A classical (single-clock) synchronous specification fully defines the temporal allocation of all operations with cycle precision. The only degrees of freedom left to a scheduling algorithm concern (1) the temporal allocation inside a cycle, and (2) the distribution (which is what SynDEx does). However, in multi-clock specifications, the various operations can have different periods, which amounts to having not one global cycle-based execution, but several interlocked cycle-based executions running in parallel. One approach here (already used with SynDEx, and thus amenable to our optimizations) is to transform multi-rate specifications into single-rate ones by determining a so-called hyperperiod (the least common multiple of the periods of all operations).
References

1. Arditi, L., Boufaïed, H., Cavanié, A., Stehlé, V.: Coverage-directed generation of system-level test cases for the validation of a DSP system. In: FME 2001: Formal Methods for Increasing Software Productivity. Lecture Notes in Computer Science, vol. 2021. Springer, Berlin (2001)
2. Benveniste, A., Berry, G.: The synchronous approach to reactive and real-time systems. Proceedings of the IEEE 79(9), 1270–1282 (1991)
3. Benveniste, A., Caspi, P., Edwards, S.A., Halbwachs, N., Guernic, P.L., de Simone, R.: The synchronous languages 12 years later. Proceedings of the IEEE 91(1), 64–83 (2003)
4. Berry, G.: Real-time programming: general-purpose or special-purpose languages. In: G. Ritter (ed.) Information Processing 89, pp. 11–17. Elsevier, Amsterdam (1989)
5. Berry, G.: The constructive semantics of Pure Esterel. Esterel Technologies. Electronic version available at http://www.esterel-technologies.com (1999)
6. Berry, G., Gonthier, G.: The Esterel synchronous programming language: design, semantics, implementation. Science of Computer Programming 19(2), 87–152 (1992)
7. Bouali, A.: XEVE, an Esterel verification environment. In: Proceedings of the Tenth International Conference on Computer Aided Verification (CAV'98). UBC, Vancouver, Canada. Lecture Notes in Computer Science, vol. 1427. Springer, Berlin (1998)
8. Bouali, A., Marmorat, J.P., de Simone, R., Toma, H.: Verifying synchronous reactive systems programmed in ESTEREL. In: Proceedings FTRTFT'96. Lecture Notes in Computer Science, vol. 1135, pp. 463–466. Springer, Berlin (1996)
9. Burns, A., Tindell, K., Wellings, A.: Effective analysis for engineering real-time fixed priority schedulers. IEEE Transactions on Software Engineering 21(5), 475–480 (1995)
10. Caspi, P., Curic, A., Maignan, A., Sofronis, C., Tripakis, S., Niebert, P.: From Simulink to SCADE/Lustre to TTA: a layered approach for distributed embedded applications. In: Proceedings LCTES (2003)
11. Cohen, A., Duranton, M., Eisenbeis, C., Pagetti, C., Plateau, F., Pouzet, M.: N-synchronous Kahn networks: a relaxed model of synchrony for real-time systems. In: ACM International Conference on Principles of Programming Languages (POPL'06). Charleston, SC, USA (2006)
12. Cucu, L., Pernet, N., Sorel, Y.: Periodic real-time scheduling: from deadline-based model to latency-based model. Annals of Operations Research 159(1), 41–51 (2008). http://www-rocq.inria.fr/syndex/publications/pubs/aor07/aor07.pdf
13. Danne, K., Platzner, M.: An EDF schedulability test for periodic tasks on reconfigurable hardware devices. In: Proceedings of the Conference on Language, Compilers, and Tool Support for Embedded Systems, ACM SIGPLAN/SIGBED. Ottawa, Canada (2006)
14. Dennis, J.: First version of a dataflow procedure language. In: Lecture Notes in Computer Science, vol. 19, pp. 362–376. Springer, Berlin (1974)
15. Eles, P., Kuchcinski, K., Peng, Z., Pop, P., Doboli, A.: Scheduling of conditional process graphs for the synthesis of embedded systems. In: Proceedings of DATE. Paris, France (1998)
16. Gao, M., Jiang, J.H., Jiang, Y., Li, Y., Sinha, S., Brayton, R.: MVSIS. In: Proceedings of the International Workshop on Logic Synthesis (IWLS'01). Tahoe City (2001)
17. Grandpierre, T., Sorel, Y.: From algorithm and architecture specification to automatic generation of distributed real-time executives. In: Proceedings MEMOCODE (2003)
18. Gu, Z., He, X., Yuan, M.: Optimization of static task and bus access schedules for time-triggered distributed embedded systems with model-checking. In: Proceedings DAC (2007)
19. Guernic, P.L., Talpin, J.P., Lann, J.C.L.: Polychrony for system design. Journal of Circuits, Systems and Computers 12(3), 261–303 (2003). Special Issue on Application Specific Hardware Design
20. Halbwachs, N.: Synchronous programming of reactive systems. In: Computer Aided Verification (CAV'98), pp. 1–16 (1998). citeseer.ist.psu.edu/article/halbwachs98synchronous.html
21. Halbwachs, N., Caspi, P., Raymond, P., Pilaud, D.: The synchronous dataflow programming language Lustre. Proceedings of the IEEE 79(9), 1305–1320 (1991)
22. Kermia, O., Cucu, L., Sorel, Y.: Non-preemptive multiprocessor static scheduling for systems with precedence and strict periodicity constraints. In: Proceedings of the 10th International Workshop on Project Management and Scheduling, PMS'06. Poznan, Poland (2006). http://www-rocq.inria.fr/syndex/publications/pubs/pms06/pms06.pdf
23. Kermia, O., Sorel, Y.: Load balancing and efficient memory usage for homogeneous distributed real-time embedded systems. In: Proceedings of the 4th International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems, SRMPDS'08. Portland, Oregon, USA (2008). http://www-rocq.inria.fr/syndex/publications/pubs/srmpds08/srmpds08.pdf
24. Kermia, O., Sorel, Y.: Schedulability analysis for non-preemptive tasks under strict periodicity constraints. In: Proceedings of the 14th International Conference on Real-Time Computing Systems and Applications, RTCSA'08. Kaohsiung, Taiwan (2008). http://www-rocq.inria.fr/syndex/publications/pubs/rtcsa08/rtcsa08.pdf
25. Kountouris, A., Wolinski, C.: Efficient scheduling of conditional behaviors for high-level synthesis. ACM Transactions on Design Automation of Electronic Systems 7(3), 380–412 (2002)
26. Leung, J., Whitehead, J.: On the complexity of fixed-priority scheduling of periodic real-time tasks. Performance Evaluation 2(4), 237–250 (1982)
27. Liu, C., Layland, J.: Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM 20(1), 46–61 (1973)
28. Lopez, J.M., Garcia, M., Diaz, J.L., Garcia, D.F.: Worst-case utilization bound for EDF scheduling on real-time multiprocessor systems. In: Proceedings of the 19th Euromicro Conference on Real-Time Systems, ECRTS'00. Stockholm, Sweden (2000)
29. Obermaisser, R.: Event-Triggered and Time-Triggered Control Paradigms. Springer, Berlin (2005)
30. Potop-Butucaru, D., Caillaud, B., Benveniste, A.: Concurrency in synchronous systems. Formal Methods in System Design 28(2), 111–130 (2006)
31. Potop-Butucaru, D., de Simone, R., Sorel, Y., Talpin, J.P.: Clock-driven distributed real-time implementation of endochronous synchronous programs. In: Proceedings EMSOFT'09. Grenoble, France (2009)
32. The Scade tool page: http://www.esterel-technologies.com/products/scade-suite/
33. Schneider, K.: Proving the equivalence of microstep and macrostep semantics. In: 15th International Conference on Theorem Proving in Higher Order Logics (2002)
34. Sentovich, E., Singh, K.J., Lavagno, L., Moon, C., Murgai, R., Saldanha, A., Savoj, H., Stephan, P., Brayton, R., Sangiovanni-Vincentelli, A.: SIS: a system for sequential circuit synthesis. Memorandum UCB/ERL M92/41, UCB, ERL (1992)
35. Sentovich, E., Toma, H., Berry, G.: Latch optimization in circuits generated from high-level descriptions. In: Proceedings of the International Conference on Computer-Aided Design (ICCAD'96) (1996)
36. Shiple, T., Berry, G., Touati, H.: Constructive analysis of cyclic circuits. In: Proceedings of the International Design and Testing Conference (ITDC). Paris (1996)
37. de Simone, R., Ressouche, A.: Compositional semantics of Esterel and verification by compositional reductions. In: Proceedings CAV'94. Lecture Notes in Computer Science, vol. 818. Springer, Berlin (1994)
38. The Simulink tool page: http://www.mathworks.com/products/simulink/
39. Singh, M., Theobald, M.: Generalized latency-insensitive systems for single-clock and multi-clock architectures. In: Proceedings DATE (2004)
40. Susini, J.F., Hazard, L., Boussinot, F.: Distributed reactive machines. In: Proceedings RTCSA (1998)
41. Touati, H., Berry, G.: Optimized controller synthesis using Esterel. In: Proceedings of the International Workshop on Logic Synthesis (IWLS'93). Lake Tahoe (1993)
42. Zheng, W., Chong, J., Pinello, C., Kanajan, S., Sangiovanni-Vincentelli, A.: Extensible and scalable time triggered scheduling. In: Proceedings ACSD (2005)