REAL-TIME SYSTEMS
REAL-TIME SYSTEMS
Implementation of Industrial Computerised Process Automation
Wolfgang A. Halang University of Groningen, The Netherlands
Krzysztof M. Sacha Warsaw University of Technology, Poland
with contributions by:
Marek Drozdz, Academy of Mining and Metallurgy, Cracow
Marten D. van der Laan, University of Groningen
Jan Niziolek, Silesian University of Technology, Gliwice
Jean J. Schwartz, Institut National des Sciences Appliquees de Lyon
Jacques J. Skubich, Institut National des Sciences Appliquees de Lyon
Alexander D. Stoyenko, New Jersey Institute of Technology
Tomasz Szmuc, Academy of Mining and Metallurgy, Cracow
Ronald M. Tol, University of Groningen
Consulting Editor: David J. Leigh, Staffordshire University
World Scientific Singapore • New Jersey • London • Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 9128
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 73 Lynton Mead, Totteridge, London N20 8DH
Library of Congress Cataloging-in-Publication Data
Halang, W. A. (Wolfgang A.), 1951-
  Real-time systems : implementation of industrial computerised process automation / Wolfgang A. Halang, Krzysztof M. Sacha, with contributions by Marek Drozdz ... [et al.].
  p. cm.
  Includes bibliographical references and index.
  ISBN 9810210639
  1. Real-time data processing. 2. Process control - Automation. I. Sacha, Krzysztof M. II. Title.
  QA76.54.H353 1992
  629.8'95--dc20    92-41026 CIP
Copyright © 1992 by World Scientific Publishing Co. Pte. Ltd. All rights reserved This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
Printed in Singapore by Utopia Press.
Preface

The Trans-European Mobility Scheme for University Studies (TEMPUS) is a programme of the European Communities aimed at transferring state-of-the-art technology and scientific knowledge to the former Communist countries of Central and Eastern Europe. To this end, scientists and graduate students are exchanged and common teaching activities are pursued.

In the framework of TEMPUS, the three-year Joint European Project 0962-90/1 "Education in Control Systems and Information Technology" is carried out by a consortium of some ten European Community universities and three Polish universities. The main teaching activities of this project are three summer schools at Warsaw (1991), Cracow (1992), and Gliwice (1993). Teaching material is prepared for and presented for the first time during these summer schools. Afterwards it is introduced into the normal curricula of Polish and other Central and Eastern European universities.

Real-time computing and industrial computerised process automation was the major topic of the summer school held at the Academy of Mining and Metallurgy, Cracow, in June 1992. During the preparation for this event no suitable textbook written in English could be found. The shortcomings of the reviewed books for the envisaged purpose are either their focus on different application areas or their lack of comprehensiveness, or even their lack of a solid conceptual foundation, which culminates in the common misconception of real-time computing as fast computing. Therefore, it was decided to write the present book. It is based on the lecture notes prepared by the first author for a course held at the University of Dortmund in the summer semester 1985, and updated several times. Supported by a TEMPUS mobility flow grant, the second author spent the time from November 1991 to April 1992 in Groningen, where he brought the existing material into a consistent form and included considerable extensions.
Thus, to the best of our knowledge, this book represents the first comprehensive text in English on real-time and embedded computing systems. It is addressed to engineering students of universities and polytechnics as well as to practitioners, and provides the knowledge required for the implementation of industrial computerised process control and manufacturing automation systems. The book avoids mathematical formalisms and illustrates the relevance of the concepts introduced by practical examples and case studies. Special emphasis is placed on a sound conceptual basis and on methodologies and tools for the development of high quality control software, since software dependability has been identified as the major problem area of computerised process automation.

The first three chapters of the book lead the reader into the application area of industrial automation and lay the conceptual foundation for real-time computing and computerised control of continuous and discrete processes. The following three chapters are concerned with the hardware aspects of process control computers, viz., with architectural issues, process interfacing, and communication networks in distributed systems. Then, the real-time specific features of operating systems are presented, followed by a comparison of some well-known representatives of real-time operating systems. The common theme of the following five chapters is application software. First, with special emphasis on the language PEARL, and to a lesser extent Ada, a detailed treatment of high level real-time programming is presented. Proving that given programs satisfy their timing requirements is an indispensable step towards dependability of real-time systems; to this end, the methodology of schedulability analysis is applied as described in Chapter 10. Then, general methods for the development of highly reliable software are discussed, accompanying each phase of the software life cycle. The presentation focuses on software quality assurance schemes and a sound requirements engineering methodology. Since formal specification and verification methods are considered to be tools with a promising potential, a short introduction into this area is provided in Chapter 14. After that, hardware and programming aspects of programmable logic controllers are highlighted, because these devices enjoy an increasing popularity in industrial automation, but appear to be neglected by academia. The book concludes with two case studies on a software development environment for distributed real-time applications and on a workbench for the graphical engineering of real-time software, respectively.

Finally, the authors should like to express their appreciation to the following gentlemen for their contributions to the text of the book: M.D. van der Laan (Chapter 3), A.D. Stoyenko (Chapter 10), R.M. Tol (Chapter 13), T. Szmuc (Chapter 14), and J. Niziolek, J.J. Schwartz, J.J. Skubich (Chapter 16). Special thanks go to M. Drozdz, who devoted great effort to the preparation of the book's many detailed illustrations. The quality of the language, and especially the use of English, was assured by D.J. Leigh and through a final thorough check carried out by J. Wichmann.
Contents

Preface
List of Figures
List of Tables
Authors

1 Real-Time Computing and Industrial Process Automation
  1.1 Introduction
  1.2 Industrial Control Systems
  1.3 Example: A Chemical Process
  1.4 Historical Perspective
2 Conceptual Foundations
  2.1 Real-Time System Characteristics
  2.2 Continuous and Discrete Time
  2.3 Engineering Approach to Hard Real-Time System Design
3 Digital Control of Continuous Processes
  3.1 Introduction
  3.2 Linear System Theory
  3.3 Control System Analysis and Design
  3.4 Digitising Analogue Signals
  3.5 New Developments
4 Hardware Architectures
  4.1 Classical Process Automation
  4.2 Centralised Direct Digital Control
  4.3 Redundant Configurations
  4.4 Multi-Level Control Systems
  4.5 Network-Based Distributed Systems
5 Process Interfacing
  5.1 Technical Process Infrastructure
  5.2 Interfacing Digital Signals
  5.3 Analogue Outputs
  5.4 Analogue Inputs
  5.5 Serial Interface
6 Communication Networks
  6.1 Network Architecture
  6.2 LAN Technology
  6.3 LAN Medium Access Control
  6.4 LAN Logical Link Control
  6.5 MAP/TOP Protocol
7 Real-Time Operating Systems Principles
  7.1 Operating System Requirements
  7.2 Synchronous and Asynchronous Task Execution
  7.3 Multi-Tasking
  7.4 Task Synchronisation and Communication
  7.5 Time and Event Handling
  7.6 Distributed Operating Systems
8 Comparison of Some Real-Time Operating Systems
  8.1 System iRMX88
  8.2 System iRMX86
  8.3 System QNX
  8.4 System PORTOS
  8.5 Comparison of Real-Time Operating Systems
9 High Level Real-Time Programming
  9.1 Real-Time Features in High Level Languages
  9.2 A Closer Look at Ada and PEARL
  9.3 Requirements for New High Level Language Features
  9.4 High-Integrity PEARL
  9.5 Advanced Features of High-Integrity PEARL
10 Schedulability Analysis
  10.1 Schedulability Analyser
  10.2 Front-End of the Schedulability Analyser
    10.2.1 A Segment Tree Example
    10.2.2 Front-End Statistics
  10.3 Back-End of the Schedulability Analyser
  10.4 Program Transformations
  10.5 Empirical Evaluation
11 System and Software Life Cycle
  11.1 System Development
  11.2 Software Life Cycle
  11.3 Software Development Economy
  11.4 Classical Software Development Methods
  11.5 Prototyping
  11.6 Object-Oriented Development
  11.7 Transformational Implementation
  11.8 Evaluation of the Development Paradigms
12 Software Quality Assurance
  12.1 Software Quality Assurance Planning
  12.2 Reviews and Audits
  12.3 Structured Walkthrough and Inspections
  12.4 Software Testing
13 Computer Aided Software Engineering Tools
  13.1 Software Development Environments
  13.2 The EPOS System
  13.3 Example: A Chemical Process
14 Formal Specification and Verification Methods
  14.1 Introduction
  14.2 Sequential and Parallel Description
  14.3 Petri Nets
  14.4 Properties of Petri Nets
  14.5 Temporal Logic
  14.6 Correctness Verification Using Temporal Logic
15 Programmable Logic Controllers
  15.1 Hardwired and Programmable Logic Control
  15.2 PLC Hardware Architecture
  15.3 Programming Languages
  15.4 Sequential Function Charts
  15.5 PLC Peripherals
16 Case Studies and Applications
  16.1 Distributed System Design in CONIC
    16.1.1 Main Goals and Characteristics
    16.1.2 CONIC Module Programming Language
    16.1.3 CONIC Configuration Language
    16.1.4 CONIC Distributed Operating System
    16.1.5 The CONIC Communication System
    16.1.6 Conclusion
  16.2 Graphical Design of Real-Time Applications in LACATRE
    16.2.1 Main Goals and Objectives
    16.2.2 Objects of the LACATRE Language
    16.2.3 Configurable Objects
    16.2.4 Programmable Objects
    16.2.5 Example: A Robotics Application
    16.2.6 Real-Time Software Engineering Workbench
Bibliography
Index
List of Figures

1.1 Technical process
1.2 Real-time control system
1.3 Chemical process
2.1 Continuous-time process
3.1 Model of a physical system
3.2 A simple hydraulic system
3.3 Impulses
3.4 Block diagram of a closed loop control system
3.5 Block diagram of a digital controller
3.6 Spectra of original and sampled signal
3.7 Zero-order hold
3.8 Undersampling of a signal, causing aliasing
3.9 The frequency response of an ideal low-pass filter
3.10 The second-order B-spline
3.11 Representation of a signal with respect to a basis of first-order B-splines
3.12 An example of a (uniform) quantisation characteristic, K = 5
3.13 Modeling the quantisation noise by an independent additive source
3.14 Incorporating the delay time by extrapolation of the sampled input
4.1 Control system of a chemical process
4.2 Continuous controller
4.3 The structure of a classical process automation system
4.4 The structure of a centralised control system
4.5 Centralised control of the chemical process
4.6 Centralised control system with a stand-by computer
4.7 Back-up loop controllers
4.8 Parallel computers with voting
4.9 Safeguarding circuit
4.10 Set-point control system
4.11 Three-level control system structure
4.12 Local area network based control system
4.13 Layered structure of a distributed control system
4.14 Computer integrated manufacturing system
5.1 Interface between a process and a control system
5.2 Two-state contact switch
5.3 Scale platform
5.4 Galvanic decoupling
5.5 Relay driver
5.6 Parallel input-output interface
5.7 Bouncing of an electrical contact
5.8 Pulse width modulator scheme (a), and timing diagrams (b)
5.9 Resistor ladder (a) and digital-to-analogue converter (b)
5.10 Galvanic decoupling of analogue signal
5.11 Voltage to current converter
5.12 Mechanical digital-to-analogue converter
5.13 Triac driver scheme (a) and timing diagrams (b)
5.14 Analogue-to-digital converter
5.15 Converter with an amplifier, sample and hold, and multiplexer
5.16 Flying capacitor multiplexing
5.17 Dual-slope converter scheme (a) and timing diagrams (b)
5.18 Voltage-to-frequency converter scheme (a) and timing diagrams (b)
5.19 Serial transmission link
5.20 RS-232-C connection from a computer to a video display terminal (a), and to a signal conditioning circuit (b)
5.21 Byte (10001101) transmission waveform
5.22 Current-loop: active-to-active converter
5.23 Two-wire connection for a differential RS-422-A signal
6.1 End stations and communication network
6.2 Point-to-point connections: star (a) and irregular (b)
6.3 Multi-point connections: ring (a) and bus (b)
6.4 Layered network architecture
6.5 Baseband signaling
6.6 Baseband bus network interface
6.7 Ring network interface
6.8 Broadband bus LAN: twin-cable bus (a) and tree topology (b)
6.9 Standard ISO 8802
6.10 ISO 8802.3 frame
6.11 ISO 8802.4 frame
6.12 ISO 8802.2 Logical link control frame (a) and the control field (b)
6.13 MAP/TOP/PROFIBUS network hierarchy
7.1 Synchronous execution of periodic and event-driven tasks
7.2 Synchronous executive
7.3 Quasi-simultaneous task execution
7.4 Task state transition diagram (T - Terminated, R - Ready, S - Suspended, E - Running)
7.5 Model of a real-time multi-tasking operating system
7.6 Model of a multi-tasking executive
7.7 Task state transition algorithms
7.8 Active semaphore operations
7.9 Incorrect task synchronisation in a control system
7.10 Correct task synchronisation in a control system
7.11 Mutual exclusion of contending tasks
7.12 Semaphore operations
7.13 Synchronised (a) versus mailbox (b) message passing scheme
7.14 Rendezvous
7.15 Periodic task execution
7.16 Layered network operating system
7.17 Distributed operating system
8.1 iRMX implementation of the synchronised task communication
8.2 iRMX task state diagram
8.3 Periodic task in iRMX
8.4 iRMX86 implementation of the mutual exclusion of tasks
8.5 QNX implementation of the rendezvous
8.6 QNX task state diagram
8.7 QNX virtual circuit
8.8 QNX periodic task
8.9 PORTOS system architecture
10.1 A block of high level language statements
10.2 The corresponding block of assembly statements
10.3 The corresponding tree of segments
10.4 The corresponding converted subtree of segments
10.5 Four conditional clustering tests
10.6 An irreducible conditional
11.1 System life cycle model
11.2 Software life cycle model
11.3 Software cost trends
11.4 Dataflow diagram of the vat control system
11.5 State transition diagram of the transformation "Control vat workcell"
11.6 Detailed diagram of the transformation "Maintain vat"
11.7 R-net for the message "pH sample"
11.8 Object-oriented decomposition diagram of the pH controller
11.9 Transformational implementation model
11.10 The pH monitoring process and the links to co-operating processes
12.1 Sample program structure
13.1 Schematic representation of a CASE system
13.2 EPOS project model
13.3 EPOS-R: problem statement
13.4 Definition of a lexicon term
13.5 EPOS-S input: Action
13.6 EPOS-S input: Data
13.7 EPOS-S input: Event
13.8 EPOS-S input: Device
13.9 Functional view of the control system
13.10 Control flow diagram
13.11 EPOS-P input: activity
13.12 EPOS-P input: team-member
13.13 EPOS-P input: resource-unit
13.14 EPOS-P input: error-report
13.15 Project structure
13.16 Project organisation
13.17 Milestone diagram
14.1 A process as a description of a program execution
14.2 Flow of markers through a transition
14.3 Parallel (a) and sequential (b) specifications of a Petri net
14.4 Mutual exclusion in a critical region
14.5 A Petri net describing the producer-consumer problem
14.6 A Petri net with a deadlock
15.1 Bottle-filling line and its state and control signals
15.2 Timing diagram of the bottling line control
15.3 Contact scheme
15.4 Gate logic scheme
15.5 PLC architecture
15.6 PLC cycle
15.7 Multiprocessor PLC architecture
15.8 Distributed network of Modicon 984
15.9 Function block representation
15.10 Sequential function charts: sequential steps with alternative branches (a), and parallel steps (b)
15.11 CASE tools for PLC programming
16.1 The layers of the CONIC Distributed Operating System
16.2 CONIC system topology
16.3 Message passing scheme in the CONIC Communication System
16.4 Semaphores and mailboxes; state bars (a), and action bars (b)
16.5 Resources; state bar (a), and action bars (b)
16.6 Tasks; state bar and body bar (a), and action bars (b)
16.7 Interrupts
16.8 Algorithmic forms
16.9 A part of the robot application
16.10 The creation sequence
16.11 The structure of the RTSEW prototype
List of Tables

1.1 Hard and soft real-time environments
1.2 Types of coupling between process and computer control system
1.3 Basic steps in development technology of real-time systems
3.1 Types of control
5.1 Basic characteristics of analogue-to-digital converters
5.2 Basic RS-232-C signals
5.3 Popular serial transmission standards
6.1 Relationship between medium and signaling method
6.2 MAP protocols
8.1 The relation between task and interrupt priorities
8.2 Comparison of four real-time operating systems
9.1 Survey on Real-Time Elements in Ada, Industrial Real-Time FORTRAN, LTR, PEARL, Real-Time PL/1, and Real-Time Euclid
9.2 Desirable real-time features
11.1 Objects and operations in the pH controller
12.1 Software documentation, reviews and audits
15.1 The family of Modicon PLCs
15.2 Modicon 984: Technical data
Authors

Wolfgang A. Halang studied mathematics and theoretical physics at Ruhr-Universität Bochum, Germany. He received a doctorate with a thesis on numerical analysis in 1976. Subsequently, he studied computer science at Universität Dortmund, concluding with a dissertation on function oriented structures for process control computers. Simultaneously with the latter, he worked with Coca-Cola GmbH in Essen, where he was in charge of real-time applications in the chemical Research and Development Department. From 1985 until 1987, he was an assistant professor for Systems Engineering at the King Fahd University of Petroleum & Minerals in Dhahran, Saudi Arabia, teaching courses mainly on real-time and computer control systems. During the academic year 1987/88 he was a visiting assistant (research) professor with the Department of Electrical and Computer Engineering and with the Computer Systems Group of the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign. From June 1988 till May 1989 he was back in Germany, working in the process control division of Bayer AG. In June 1989 he became a full professor at the University of Groningen in the Netherlands, holding the chair for applications-oriented computing science in the Department of Computing Science, which he headed between May 1990 and October 1991. Since July 1992 he has held the chair for information technology in the Department of Electrical Engineering at FernUniversität Hagen in Germany. His research interests comprise all major areas of hard real-time systems, with special emphasis on innovative and function oriented architectures and on application specific peripheral components for process control computers. Presently, he concentrates his interest on predictable system behaviour and on rigorous software verification and safety licensing.

He is the founder and European editor-in-chief of "Real-Time Systems", the international journal of time critical computing systems, and the director of the 1992 NATO Advanced Study Institute on Real-Time Computing. He has authored numerous publications and given many guest lectures on topics in this area, is active in various professional organisations and technical committees, and is involved in the programme committees of many conferences.
Krzysztof M. Sacha studied electrical engineering and automatic control at Warsaw University of Technology, Warsaw, Poland, and received the M.Sc. degree with honours in 1973. Then he studied computer science at Warsaw University of Technology, until receiving his Ph.D. in 1976. Simultaneously, he worked with the ERA Minicomputer Research and Development Centre in Warsaw as a member of the computer peripherals development group. Since 1976 he has been with the Institute of Automatic Control, Warsaw University of Technology. From 1987 till 1990 he served as a Software Engineering Consultant for the PNEFAL Industrial Automation Enterprise in Warsaw and has been involved in a number of research and industrial projects on the design and development of embedded control systems. During the academic year 1991/92 he was a visiting researcher at the University of Groningen in the Netherlands. Currently he is an assistant professor at the Warsaw University of Technology, with responsibility for the courses on Computer Control Systems and Software Engineering. His research interests include software development technology for real-time applications, with emphasis on software specification and design methods, and distributed operating systems. He has published 28 papers and 8 books.
Chapter 1
Real-Time Computing and Industrial Process Automation

1.1 Introduction
The term real-time system is, unfortunately, often misused. Therefore, it is necessary to define precisely the notions relating to real-time systems. The fundamental characteristic of this field is the real-time operating mode, which is defined in the German industry standard DIN 44300 [1] under number 9.2.11 as follows:

Real-Time Operation: The operating mode of a computer system in which the programs for the processing of data arriving from the outside are permanently ready, so that their results will be available within predetermined periods of time; the arrival times of the data can be randomly distributed or be already a priori determined, depending on the different applications.

It is, therefore, the task of digital computers working in this operating mode to execute programs which are associated with external processes. The program processing must be temporally synchronised with events occurring in the external processes and must keep pace with them. Hence, real-time systems are always to be considered as embedded in a larger environment and, correspondingly, are also called embedded systems. The general idea of operating in real time is not new, and it can easily be illustrated by examples from everyday life.

• A driver who drives a car in a town must permanently observe the environment consisting of streets, traffic lights, other vehicles, pedestrians, animals, and so on, and must be permanently ready to react quickly enough to prevent an accident. Essentially, the driver processes data arriving from his or her environment at random, and responds within periods of time determined by the speed of the car and the relative position of other objects in the environment. If he or she fails, a crash occurs.

• A housewife performing her everyday duties must observe a washing-machine, prepare a meal for dinner, and look after children. The signals arriving from her environment force her to switch among the tasks and respond with the speed determined by the type of signal. In an emergency, when a child is endangered, she must react immediately.

• A control system of a power plant permanently monitors the status of the technical processes running in the plant and responds with control signals changing some parameters of the processes. The response time is strictly limited, since the processes run continually and cannot be temporarily suspended. If the control signals do not come in time, a loss or even a catastrophe may happen.
The last example is taken from the technical world, but it is easy to notice that the control system of a power plant is exactly in the same position as the driver or the housewife: it must sense the environment, process the requests, and respond. The precision of a response as well as timeliness is essential. To expand this concept, consider two further examples: • In an airline reservation system the environment consists of clients who request flight reservations. The computer which supports the system cannot prevent them from making the requests, and it must respond within a time period determined by the clients' patience. If it fails, clients may refrain from buying tickets which means a loss for the air carrier. • In a banking system for recording, processing, and displaying the results of transactions, the transactions are entered by bank clerks through video display terminals, following a random distribution. The response time should be short enough to maintain the desired efficiency, but the costs of lateness increase mildly with waiting time, and no sharp, ultimate deadline exists. The examples given above are similar in that they involve time-bounded computer operation. Upon request from the external process, data acquisition, evaluation, and appropriate reactions must be performed on time. The mere processing speed is not decisive for this, but the important aspect is the timeliness of the reactions within predefined time bounds. Hence, it is characteristic for real-time systems that their correctness depends not only upon the processing results, but also upon the time when these results become available. Correct timing is determined by the environment, which cannot be forced to yield to the computer's processing speed.
Table 1.1: Hard and soft real-time environments.

  Hard real-time environment:
    Consequences of not responding in time: useless results, or danger for people or the process
    Examples: car driver, housewife, power plant, flight guidance system, steel mill
    Analysis objective: worst case performance

  Soft real-time environment:
    Consequences of not responding in time: costs increasing with waiting time
    Examples: beginning of a party, airline reservation, bank, library system
    Analysis objective: average performance
Despite the similarity, the last two examples differ significantly from the others. A driver, a housewife, and a power plant control system operate in very rigid environments. They must always respond in time and cannot cease operation even for a moment. The reservation and bank supporting systems work in much more flexible environments. The computer should on average respond quickly, but occasional delays are acceptable. Moreover, in the case of a failure, a temporary suspension of the system operation may be tolerated provided that the data recorded in a database are preserved. The difference is fundamental. It may be expressed by referring, respectively, to hard and soft real-time conditions. The bank and airline reservation systems are examples of soft real-time business applications, where the problem of data integrity has priority over the problem of timeliness. The power plant control system is an example of a hard real-time process control application, where the problem of timeliness is prevalent. Hard and soft real-time systems can be distinguished by the consequences of violating the timeliness requirement: whereas soft real-time environments are characterised by costs rising with increasing lateness of results, such lateness is not permitted under any circumstances in hard real-time environments, because late computer reactions are either useless or even dangerous. In other words, the costs for missing deadlines in hard real-time environments are infinitely high. Hard real-time constraints can be determined precisely, and typically result from the physical laws governing the technical processes being controlled [2]. Brief characteristics of environments and systems of both types are assembled in Table 1.1.
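The cost interpretation of the hard/soft distinction can be captured in a small sketch. This is an illustration of ours, not from the text; the function name and the linear cost rate are invented for the purpose:

```python
import math

def lateness_cost(completion, deadline, hard, cost_rate=1.0):
    """Cost of delivering a result at time `completion` (seconds).

    Soft environment: the cost grows with lateness beyond the deadline.
    Hard environment: any lateness makes the result useless or even
    dangerous, so the cost is modelled as infinite.
    """
    lateness = max(0.0, completion - deadline)
    if lateness == 0.0:
        return 0.0                       # on time: no penalty in either case
    return math.inf if hard else cost_rate * lateness

# A bank transaction answered 2 s late merely costs some goodwill,
print(lateness_cost(12.0, deadline=10.0, hard=False))  # -> 2.0
# while a power plant control signal 2 s late counts as a failure.
print(lateness_cost(12.0, deadline=10.0, hard=True))   # -> inf
```

The infinite cost is exactly why the analysis objective in Table 1.1 differs: a hard environment must be analysed for worst case performance, a soft one only on average.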
The book is organised as follows. Chapter 1 provides the reader with introductory in formation concerning problems, structures, and historical perspective of industrial real-time systems. The conceptual foundations are presented in Chapters 2 and 3. The next three Chapters: 4, 5, and 6, deal with hardware architecture problems and solutions. Chapters 7 and 8 describe principles, basic elements, and examples of oper-
Seal-Time Systems
4
State variables Input materials, \ energy or —7 information
Technical process
\ /
Output products
Control variables Figure 1.1: Technical process. ating systems. Real-time programming languages are discussed in Chapters 9 and 10. In the next part, consisting of Chapters 11 through 14, the methodology of embedded software development is presented. Industry standard programmable logic controllers are introduced in Chapter 15 and case studies are presented in Chapter 16.
1.2 Industrial Control Systems
To establish precisely the scope of our further considerations, it is necessary to define the notions of a process and, more specifically, a technical process, which determine the operational environment of an industrial real-time system. Both definitions are given in the German industry standard DIN 66201 [3] as follows:

Process: The totality of activities in a system which are influencing each other and by which material, energy, or information is transformed, transported, or stored.

Technical Process: A process whose physical quantities can be acquired and influenced by technical means.

Technical processes can be classified as production, distribution, and storage processes. The view of a process given by the above definitions can be summarised as shown in Figure 1.1. The process runs inside an installation, represented by a rectangle. This can be, e.g., a set of pipes, valves, and distillation columns in a refinery, or the engine and a set of on-board equipment in a spacecraft. There is a stream of materials, part-products, and energy which are input to the process in order to be transformed, transported, or stored; and there is a stream of products which are output from the process. This is represented by horizontal double-line arrows in Figure 1.1. In a refinery the input stream consists mostly of petroleum and cooling water, while in
Real-Time Computing and Industrial Process Automation
5
Table 1.2: Types of coupling between process and computer control system. Type off-line on-line, open loop on-line, open loop on-line, closed loop
Link process to computer human operator human operator direct coupling direct coupling
Link computer to process human operator direct coupling human operator direct coupling
the output stream one can find natural gas, petrol, oil, and other useful products. The spacecraft example illustrates a transportation process, where the notions of input and output are somewhat more abstract and refer to the craft's position and speed. The current status of a technical process is reflected by a set of state variables, such as, e.g., temperature, pressure, and concentration of a chemical reaction, which can be measured by means of sensors attached to certain elements of the installation, and are interpreted by people and devices. At the same time, the process can be influenced by a set of control variables, e.g., liquid flows and temperatures. Control variables can effect physical changes in the installation by means of actuators, which are attached to elements of the installation and can change, e.g., valve positions. The measurement and control flows are depicted in Figure 1.1 as vertical arrows. State and control variables will be jointly referred to as process variables or process data. To maintain a process in a desired state, one must control the process according to certain rules. These are implemented by a control system, which may be manual (human operator), automatic, or mixed. According to the current state of the art in automation technology, the term "automatic" is used as a synonym of "computerbased" (except for very simple or very special applications). The reason for using computer control systems is that they are much more reliable, much more accurate, and much faster than human operators. To do its job, a control system must be somehow connected with the controlled process to read values of state variables and adjust values of control variables. Depending on the type of this connection, computer control systems may be classified as off-line, on-line with open loop, or on-line with closed loop. The classification is assembled in Table 1.2. 
Systems with a manual input (the first two lines in Table 1.2) cannot be regarded as hard real-time systems, because the time scale of the manual measurements and input operations is extremely slow by computer standards. Off-line systems are often referred to as advisory systems and are used mostly for strategic planning rather than for control. On-line control systems with a manual input are seldom used; when they occur, it is most frequently in the starting phase of a process automation project.
Systems with direct coupling from a process to a computer (the last two lines in Table 1.2) are distinctly within the scope of this book. The main application areas of such systems are in:

• data acquisition and monitoring of technical processes (open loop), and

• process control and automated manufacturing (closed loop).

It should be noted that only the last option (on-line, closed loop) corresponds to a fully automatic control system, in which total responsibility for process maintenance is delegated from human interaction to a computer system.

The signals generated by sensors and accepted by actuators attached to a process installation do not generally comply with the electrical and logical standards established for a computer interface. Depending on the type of the process and the equipment, the signals can be:

• Analogue: Signals whose parameters (e.g., voltage or current) change continuously with time, reflecting continuous changes of values of process variables.

• Two-state: Static, binary signals (e.g., contact open or closed, voltage low or high) which reflect current values of binary variables.

• Pulse: Dynamic, two-state signals whose parameters (e.g., frequency or number of pulses in a series) change in compliance with the values of process variables.

Moreover, the magnitude of the signals may be as low as a few millivolts or as high as several kilovolts. This creates the need for special capabilities to interface process variables to a computer control system.

Now we are in a position to recognise the basic elements of an industrial control system (Figure 1.2). These are:

Hardware elements:

• Process interface: A set of hardwired and programmable devices which interfaces sensors and actuators attached to the process installation to the computer.

• Computer: A centralised or distributed computer system which examines the status of the controlled process and evaluates control signals.
• Operator station: A set of devices implementing a convenient, preferably graphic, interface for man-machine communication.

Software elements:
• Operating system: A managerial set of programs which supports the execution of all other programs in the computer system.

• Application software: A set of programs implemented for monitoring and controlling a particular process.

• Utility programs: A set of programs necessary for the development and implementation of application programs.

It should be emphasised that the last element (utility programs) differs significantly from the others. It enables off-line preparation of application programs, but in general does not participate in on-line operation of the system.

[Figure 1.2: Real-time control system. The control system comprises the process interface, the computer with its operating system, run-time software, application programs, and utility programs, and the operator station; sensors and actuators connect it to the technical process.]
1.3 Example: A Chemical Process
Control systems cannot readily be studied outside the context of a controlled process. To provide the reader with the appropriate background information, the example of a simple chemical installation (Figure 1.3) will be described and discussed in some detail. Although simply stated, the problem contains many of the elements of actual real-time systems, illustrates basic concepts, and is used as an example throughout the book. The example describes a modified version of a chemical system introduced in [68]. The installation consists of a single vat, in which two liquid chemicals react, and a number of bottle-filling lines fed by the vat.
The liquid to be bottled consists of about 90% of a solution A and the remaining 10% of a solution B, which are mixed to maintain the liquid at a constant pH value. The concentrations of chemicals A and B in the input solutions are not stabilised and can vary with time. The reaction speed depends on the temperature and may become explosive in the case of overheating, so that the vat temperature must be carefully controlled.

Bottles to be filled on a particular line are drawn one by one from a supply of bottles and positioned on a scale platform by a special mechanism. As soon as the bottle is at the required position, a contact sensor attached to the platform is depressed and the bottle-filling valve is opened. The scale platform measures the weight of the bottle plus its contents. When the bottle is full, the bottle-filling valve is shut off, and the bottle is removed from the line. Removing the bottle releases the contact sensor, thus enabling the next bottle to be drawn from the bottle supply.

The lines can bottle one and two litre bottles. The bottles filled on one line are of the same size, and may differ in size from line to line, e.g., one line might be filling one-litre bottles, while another might be filling two-litre bottles at the same time. (For simplicity, Figure 1.3 has been drawn as if there were only one bottling line.)
The control system is required to control the level, the pH, and the temperature of the liquid in the vat, to release the bottles from the bottle supplies on the various lines, to open and close bottle-filling valves on the various lines, and to display the current status of the installation for a human operator.

The above description defines the plant and establishes an imaginary boundary between the control system to be developed and the environment to be controlled. The boundary is defined by a set of signals carrying the values of process variables. There are three continuous state variables in the process:

- Liquid level in the vat is measured by a level sensor which yields an electrical signal in the range from 0 V (empty vat) to 10 V (full vat).

- pH is measured by a pH sensor which yields an electric signal normalised to the same range from 0 V (pH 1) to 10 V (pH 14).

- Temperature is measured by a thermocouple with an output signal ranging from 0 V to 100 mV.

There are also three digital signals coming from each bottling line:

- Weight is measured by a scale platform which yields a series of pulses with the pulse number being equal to the weight in grams.

- A contact of the platform sensor is open when the sensor is released and is closed when the sensor is depressed.
[Figure 1.3: Chemical process. The vat is fed through an A input valve and a B input valve and is heated by a heater; a level sensor, a pH sensor, and a thermocouple monitor the vat contents. A bottling line below the vat comprises a bottle-filling valve, a positioning mechanism, and a scale platform.]
- A contact switch is open when the line is bottling one-litre bottles and is closed when the line is filling two-litre bottles.

The vat apparatus is controlled by three continuous signals: two settings for the input valves of the ingredients A and B, and one setting for the heater; while each bottling line is controlled by two digital signals: one for opening and closing the output valve, and the other for releasing the bottles from the bottle supply. The total number of continuous signals is six, while the total number of digital and pulse signals depends on the number of bottling lines, and may be, e.g., twenty, providing for four bottling lines to be fed by a single vat apparatus. This proportion is typical for industrial applications: usually the majority of process variables are two-state variables, and only a few of them are continuous.

There are several lessons that can be learned from the example. In particular, the role of the components of the control system shown in Figure 1.2 can be illustrated.

• The interface between a control system and its environment is complex and consists of a number of signals with different physical characteristics. It is the role of a process interface to convert the signals to digital representation, normalise the voltage levels, and input the values to a computer system.

• The required functionality of the control system is complex, as different tasks for monitoring the devices, evaluating the status of the process, adjusting control variables, and communicating with humans must be performed simultaneously. To manage the complexity, all such tasks can be implemented by separate application programs executed by a computer. Planning of the task execution and supporting intertask communication is the function of an operating system.

• Process data should be structured before displaying, to help operators in capturing details of the process status. Graphic capabilities are highly recommended.
• Control systems can be extraordinarily hard to test and debug. The obstacles are the complexity of the interface between the system and its environment and the fact that programs can hardly ever be tested in their operational environment. This is why specialised utility programs for software development are essential.
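The first lesson, conversion of raw signals by the process interface, can be sketched for the example's sensors. The scaling functions below are our own illustration based on the ranges quoted in the text, assuming the sensor characteristics are linear; the function names are invented for the example.

```python
def level_fraction(volts):
    """Convert the level sensor signal (0 V = empty vat, 10 V = full vat)
    to a fill fraction in the range 0..1."""
    return volts / 10.0

def ph_value(volts):
    """Convert the pH sensor signal (0 V = pH 1, 10 V = pH 14) to a pH value,
    assuming the normalisation is linear."""
    return 1.0 + (14.0 - 1.0) * volts / 10.0

def bottle_weight_grams(pulse_count):
    """The scale platform emits one pulse per gram, so counting pulses
    gives the weight of the bottle plus its contents directly."""
    return pulse_count

if __name__ == "__main__":
    print(level_fraction(5.0))   # a half-full vat
    print(ph_value(5.0))         # mid-range voltage maps to pH 7.5
```

A real process interface would of course perform these conversions on sampled and digitised values, and would also have to handle out-of-range and failed-sensor readings.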
1.4 Historical Perspective
The vast majority of the early uses of computer technology in the nineteen forties and nineteen fifties were in numerical computation and business applications. At that time many open-loop and closed-loop control systems were implemented using
electrical and pneumatic analogue devices. The acceptance of the new technology for real-time applications was relatively slow. This reluctance was caused mainly by the fact that the performance demands on the technology (e.g., processor speed and reliability requirements) were much more stringent for typical real-time applications than in the case of other systems. However, as the technology matured, first attempts to computerise industrial, avionics, space, and military applications appeared.

Early control systems which were developed in the nineteen sixties were dominated by hardware problems. Technological limitations, such as the requirements for an air-conditioned computer room and qualified computer operators, forced a rigid centralised architecture with a computer and its process interface placed in the centre of a star-shaped network of cabling to the process sensors and actuators. Such systems suffered from extremely high computer and cabling costs, and frequent failures. The situation changed dramatically in the early nineteen seventies when minicomputers produced on a large scale (e.g., PDP-8 and PDP-11) became available. The costs of hardware devices dropped rapidly and reliability increased by many orders of magnitude. The star-shaped cabling organisation was replaced by bus structures with decentralised capabilities for gathering, processing, and outputting process data. Reliable video terminals opened new perspectives for application-driven communication with the computer system.

But the technological threshold was not crossed until the explosive growth of microprocessor capabilities in the late nineteen seventies. Distributed control systems became economically feasible, providing for extremely high processing capabilities, fast response, and fault tolerance. The sharp boundary between an intelligent computer and an unintelligent process interface became blurred.
Networks consisting of microcomputers, smart sensors and actuators, programmable controllers, intelligent terminals, and communication peripherals created the possibility of local processing of process data. Cabling was simplified by shifting the intelligent equipment very close to the data sources. However, complex hardware architectures still demanded more powerful and sophisticated software support. Control systems became dominated by software problems.

Advances in hardware technology were accompanied by continuous progress in software technology. Initially, the development was directed towards operating systems and programming languages. Early real-time applications were developed without any operating systems, and were implemented in assembly languages. However, the need for a clear distinction between application programs and a relatively application-independent operating system was quickly recognised. When minicomputers became available on the market, multitasking operating systems offered capabilities for virtual concurrency and time-dependent task scheduling. In the middle of the nineteen seventies user-friendly systems, e.g., UNIX [4], became more widespread. At the same time a number of high-level languages for process control applications had been developed and marketed. Some of them, e.g., Industrial FORTRAN and Real-time
BASIC, have evolved from well-known general-purpose languages, while others, e.g., Ada, LTR, and PEARL, have been specifically developed (Chapter 9). The advent of microcomputer networks made available technologies that were many orders of magnitude more powerful than those used earlier. Following demand, distributed operating systems, e.g., QNX (Section 8.3), and programming languages, e.g., Distributed PEARL (Section 9.2), with provision for physical concurrency and fault tolerance have been developed. Typical applications in process control have been covered by parameterised software packages. Despite these efforts, software technology has not kept pace with the growing demand for new software products. Software engineers have been faced with the problem of devising a methodology suitable for the development of real-time applications.

The earliest research of interest in software methodology was done in the middle of the nineteen sixties and was concerned with a discipline for programming [71]. The methods developed, known as structured programming, consist of a set of guidelines for coding groups of computer instructions. The results relate closely to the development of high-level programming languages. Nearly ten years later the methods for structured design [6] provided a set of guidelines for structuring programs into manageable modules. Following the methodology, software tools for computer-aided software engineering (CASE) appeared on the market.

Structured programming as well as structured design methods were dominated by an implementation-oriented approach to software development. This was recognised as a bottleneck in software development. It is the problem formulation which should drive the implementation, and not vice versa. Structured implementation and design techniques assumed that programs are developed to satisfy well-defined requirements for a problem solution.
As the complexity of control systems increased, this assumption became untrue. Hence, the methods for requirements engineering [7] developed in the late nineteen seventies and early nineteen eighties provided rigorous guidelines for formulating problems to be solved by computer systems. Current efforts in software engineering are directed towards integrating the requirements specification, the design, and the implementation into a consistent process, supported by computerised development environments, powerful enough to cover the full life-cycle of real-time software. Table 1.3 shows the milestones in the historical development of the technologies and methods over the past three decades.
Table 1.3: Basic steps in development technology of real-time systems.

  Hardware                        Software                        Methodology

  Big centralised computers       No operating system             Ad-hoc procedures
  Star-shaped cabling             Assembly language
  Primitive terminals
  Primitive process interface

  Minicomputers                   Multi-tasking executives        Structured programming
  Multicomputer systems           High-level languages
  Bus structures
  Graphic terminals

  Microcomputers                  User-friendly interface to      Structured design
  Distributed networks              system programs               CASE tools
  Intelligent terminals           Packages of application
  Intelligent process interface     programs

                                  Distributed operating systems   Structured analysis
                                  Distributed real-time           Full life-cycle software
                                    languages                       development tools
Chapter 2

Conceptual Foundations

2.1 Real-Time System Characteristics
Real-time operation distinguishes itself from other forms of data processing by the explicit involvement of the dimension time. This is expressed by the following two fundamental user requirements, which real-time systems must fulfill:

• timeliness and

• simultaneity.

Many different systems require inputs from multiple sources, and in many cases these inputs may overlap in time. However, a true requirement for simultaneity usually involves correlated processing of data from more than one input within the same time interval. Returning to an example, an aircraft control system may be required to correlate characteristics of position and speed, and simultaneously to control temperature and humidity in the passenger compartment. The requirement for concurrent processing results from internal characteristics of the problem itself, and is essentially different from the requirement for overlapping processing of transactions in a multi-user air reservation system. In the latter example simultaneity arises because of the typical implementation strategy employed, and not from the problem requirement, which is only for fast response to all transactions, and not for their concurrent processing.

Upon request from the external process, data acquisition, evaluation, and appropriate reactions must be performed on time. In hard real-time systems this requirement for timeliness must be fulfilled even under extreme load conditions and despite errors and failures. This leads to more specific demands for:
• predictability and

• dependability.
Data to be processed may arrive randomly distributed in time, according to the definition of the real-time operating mode. This fact has led to the widespread conclusion that the behaviour of real-time systems cannot and may not be deterministic. This conclusion is based on a conceptual error! Indeed, the external technical process may be so complex that its behaviour appears to be random. The reactions to be carried out by the computer, however, must be precisely planned and fully predictable. This holds particularly in the case of the simultaneous occurrence of several events, leading to a situation of competition for service; it also includes transient overload and other error situations. In such a circumstance the user expects that the system will degrade its performance gracefully, in a way transparent and foreseeable to him. Only fully deterministic system behaviour will ultimately enable the safety licensing of programmable devices for safety-critical applications. The notion of predictability and determinism appropriate in this context may be illustrated by the example of a house catching fire: we do not know when it will burn, but under normal circumstances we expect the arrival of the fire brigade, after being called, within some short time.

Predictability of system behaviour is, as we have seen, of central importance for the real-time operating mode. It supports the timeliness demand, for the latter can only be guaranteed if the system behaviour is precisely predictable, both in time and with respect to external event reactions. Despite the best planning of a system, there is always the possibility of a transient overload in a node resulting from an emergency situation. To handle such cases, load sharing schemes have been devised which allow tasks to migrate between the nodes of distributed systems. In industrial embedded real-time systems, however, such ideas are generally not applicable, because they only hold for computing tasks.
In contrast to this, control tasks are typically highly I/O bound, and the permanent wiring of the peripherals to certain nodes makes load sharing impossible.

The definition of the real-time operating mode implies the requirement for the dependability of real-time systems, since the permanent readiness demanded of computers working in this mode can only be provided if the system can preserve its operational capabilities in the presence of failures and despite inadequate handling. This requirement holds both for hardware and software, and is particularly important for those applications in which a computer malfunction not only causes loss of data, but also endangers people and major investments. Naturally, expecting high dependability does not mean the unrealistic demand for a system which will never fail, since no technical system is absolutely reliable. However, using appropriate measures, one must strive to make the violation of deadlines and the corresponding damage quantifiable and as unlikely as possible.
Some prevailing misconceptions (additional to the ones mentioned in [8]) about real-time systems need to be overcome: neither time-sharing systems nor systems which are merely fast are necessarily real-time systems. Commencing the processing of tasks in a timely fashion is much more significant than speed. Thinking in probabilistic or statistical terms, which is common in computer science with respect to questions of performance evaluation, is as inappropriate in the real-time domain as is the notion of fairness for the handling of competing requests, or the minimisation of average reaction times as an optimality criterion of system design. Instead, worst cases, deadlines, maximum run-times, and maximum delays need to be considered. For the realisation of predictable and dependable real-time systems, reasoning in static terms and the acceptance of physical constraints is a must — all dynamic and "virtual" features are considered harmful.

Maximum processor utilisation is a major issue in the conceptual criteria of classical computer science, and is still the subject of many articles. For embedded real-time systems, however, it is totally irrelevant whether the processor utilisation is optimal or not, as costs have to be seen in a larger context, viz., in the framework of the controlled external process and with regard to the safety requirements. Taking into account the cost of a technical process and the possible damage which a processor overload may cause, the cost of a processor is usually negligible; in the light of still constantly declining hardware costs, its absolute price is also of decreasing significance. (E.g., a one-hour production stoppage of a medium-sized chemical facility due to a computer malfunction and the required clean-up costs some $50,000. This is about the price of the computer controlling the process itself. A processor board costs only a minor fraction of that amount.)
Hence, processor utilisation is not a critical design criterion for embedded real-time systems. Lower-than-optimum processor utilisation is a cheap price to pay for the system and software simplicity that is a prerequisite for achieving dependability and predictability.
2.2 Continuous and Discrete Time
The requirement for timeliness discussed in Section 2.1 can easily be defined in discrete-event systems, where the timing of events is explicitly given. The speed of computer response can be expressed in terms of dependencies between the events which have happened and the events which will happen. E.g., consider a bottling line from the example described in Section 1.3. Whenever a bottle becomes filled, i.e., its weight crosses a threshold value set by an operator, the filling valve must be closed immediately, otherwise the bottle will overflow. The maximum length of the time interval between the event "bottle full" and the event "valve closed" can be precisely calculated. If a one-litre bottle is filled within five seconds and the required accuracy of the filling process is one percent (i.e., the bottle contents should be 1 l ± 10 ml), then the time interval cannot be longer than fifty milliseconds.

[Figure 2.1: Continuous-time process. The controller, parameterised by controller settings, compares the set point y0(t) with the state variable y(t) and outputs the control variable u(t) to the technical process, which is subject to disturbances.]

In continuous-time systems the situation is quite different. State variables as well as control variables are continuous functions of time, denoted y(t) and u(t) in Figure 2.1. An ideal controller is an analogue device which reads the functions y(t) and y0(t), and produces the function u(t). There are no distinct time instants or events which could impose a time scale. The requirements for control system performance can be most naturally stated in terms of accuracy: the function u(t) should be chosen to minimise the difference between the actual process state y(t) and the desired (optimal) value y0(t). Consider once more the chemical process example. The vat level can be controlled by monitoring a level sensor and adjusting the ingredient A input valve. The demanded liquid level in the vat is constant. However, the current level as well as the valve position will change continuously because of varying demands for the liquid from the bottling lines. The better the quality of the control system, the smaller the difference will be between the current level and its desired value (set point).

The timeliness problem arises because of the inability of digital computers to deal with continuous functions. A computer-based control system does the job by repeating periodically, every time interval Δt, the following three steps:

1. Input: At time instant t the current value y(t) of the state variable is read from the process.

2. Evaluation: A new value of the control variable is computed.

3. Output: At time instant t + τ the computed value u(t + τ) of the control variable is sent to the process.

The output value u(t + τ) is maintained through the time interval Δt, until the next sample u(t + Δt + τ) is computed and output. As a result, the functions y(t) and
u(t) are approximated by sequences of values:

  y0, y1, ..., yn, ...   read at time instants t, t + Δt, ..., t + nΔt, ...

and

  u0, u1, ..., un, ...   output at time instants t + τ, t + Δt + τ, ..., t + nΔt + τ, ...

There are two delays in these formulae: τ, which is an inevitable delay between the input operation and the output operation, and Δt, which is the repetition period of the sampling operation. Both are harmful and limit the speed of the computer reaction to changes in the process state: τ, because the process state evolves continuously, causing the computer response to be always somewhat late with respect to the current process status; and Δt, because the value un which is maintained through the time interval Δt was computed with respect to the state yn and becomes obsolete as the time interval passes. The problem of timeliness is then equivalent to the problem of how long the delays may become which can be safely tolerated by the controlled process.

The time instants of sampling the process state make the time flow discrete. This is the key point of the discussion and it will be developed more precisely using an appropriate mathematical notation. Consider the basic problem of observing a process state reflected by a single state variable y(t). In the simplest case the process behaviour is described by a differential equation:
  dy/dt = f(y, u)                                                    (2.1)

with an initial value y(0) = y0. The analytical solution of the equation cannot be directly employed by a digital computer, which is unable to deal with continuous functions. Instead, the functions must be approximated by sequences of values: u0, u1, ..., un, un+1, ... and y0, y1, ..., yn, yn+1, .... A numerical solution can be effectively computed by converting Equation 2.1 to the equivalent integral form:

  y(t) = ∫₀ᵗ f(y, u) dτ

and by changing the differentials to differences and the integral to a sum:

  yn+1 = Σ (i = 0 to n) f(yi, ui) Δt

The sequence of values y0, y1, ..., yn, yn+1, ... approximates the function y(t). The accuracy of the numerical evaluation depends on the length of the sampling interval Δt: the shorter the interval, the greater the accuracy.
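The discretisation can be illustrated with a short sketch that applies one Euler step, yn+1 = yn + f(yn, un)·Δt, per sampling interval, closing the input-evaluation-output loop described above. The first-order plant model f, the proportional controller, its gain, and all numeric values are our own invented assumptions, not taken from the book.

```python
def f(y, u):
    # Invented first-order plant dynamics: dy/dt = -0.5*y + u.
    return -0.5 * y + u

def simulate(y0, setpoint, dt, steps, gain=20.0):
    """Sampled closed loop: read y, evaluate u, output u, and hold it for dt.
    Each pass through the loop is one sampling interval."""
    y = y0
    for _ in range(steps):
        u = gain * (setpoint - y)   # evaluation step (proportional controller)
        y = y + f(y, u) * dt        # plant evolves under the held control value
    return y

# With a small sampling interval the loop settles near the set point
# (a proportional controller leaves a small steady-state offset):
y_final = simulate(y0=0.0, setpoint=1.0, dt=0.01, steps=2000)
```

Re-running the same loop with Δt = 0.15 makes it diverge, although the underlying continuous system is stable: the admissible sampling interval depends on the process dynamics, exactly as the text argues next.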
The admissible length of the sampling interval Δt depends on the dynamics of the process. This can be qualitatively illustrated by the chemical process example. Assume that the vat capacity is about 1 m³. If the ingredient A is supplied through a 15 mm pipe (a typical diameter for the pipes in a domestic water supply installation) and three bottling lines are supplied through 8 mm pipes, then the speed of change of the liquid level in the vat is small, and a relatively long interval Δt is acceptable. But if the vat is fed through a 25 cm pipe and twelve bottling lines are supplied through 22 mm pipes, then the process becomes very fast, and a much smaller sampling interval must be considered.
2.3 Engineering Approach to Hard Real-Time System Design
The design of real-time systems must strive to construct systems for which the quality appropriate for safety-critical applications can be guaranteed. The requirements formulated in Section 2.1 represent the goals which should be achieved in the development process. To meet these successfully, the design should be based on a rational engineering approach with a clear understanding of the major goals, constraints, and risks.

There are a number of operations which must be performed by a real-time system. Some of them are periodic and known in advance, while others are related to random events. Thus, the load the system must carry varies with time. At least some of the operations will have deadlines for their execution. To meet deadlines and to fulfill the timeliness requirement, the operations must be carefully planned and a schedule for their execution must be constructed at run-time. However, the best scheduling methods may fail if the system throughput does not match the peak load it has to carry.

Therefore, prior to system design, a clear understanding of the peak load the system is expected to handle is essential. As it is not possible to design, e.g., a bridge without knowing the load it is expected to carry, it is not possible to design a predictable real-time system without estimating the load parameters and task execution times. As the load increases, system performance is further affected by increasing management overhead. When the load and performance curves cross, the system becomes unable to proceed. Nevertheless, its behaviour must still remain predictable, and the degradation of its function must be carefully controlled. Graceful degradation (e.g., suspending secondary tasks or diminishing precision) is the only acceptable option in a situation in which overload occurs. Therefore, the design methodology must integrate timing analysis of the system environment and the system itself as early in the development cycle as possible.
This creates the basis for the analysis of the scheduling which can verifiably guarantee that the deadlines are met, provided
that the execution times can be met by the implementation in the first instance.

The environment of a process control system consists of the process installation and the computer hardware itself. System correctness is defined with respect to assumptions about the behaviour of the environment within which it operates. However, for most real-time systems, acceptable behaviour must be maintained under all possible conditions, even when the assumptions about the environment are not fulfilled, e.g., in case of a failure in the installation within which the process is running. This requirement for robustness means that the system should be able to detect and respond appropriately to improper behaviour of the environment. This in turn implies a very strict demand for completeness of the system requirements specification. The specification should include all possible scenarios of events and should clearly define the acceptable behaviour in the presence of failures or inadequate handling, when it becomes impossible to continue achieving the goals of the system.

Methods of handling system failures depend on the characteristics of the application. Basically, two different approaches can be distinguished:

• Fault-stop. In some applications there exists a safe state which may be entered in the case of a failure or malfunction — e.g., in many robot-based manufacturing processes robot arms can be withdrawn to base positions and all moving tools can be stopped. Although safety is guaranteed by assumption, such a reaction of a control system is often expensive due to production losses.

• Fault-tolerance. In applications where a safe state does not exist, e.g., in flight guidance systems, continuous service is essential. However, preserving full operational capabilities in the presence of a failure can be achieved only at the price of maintaining redundant hardware subsystems and software modules, which, in practice, is extraordinarily hard to obtain and extremely expensive.
In most practical cases a merger of the two approaches is employed, which assumes graceful, i.e., controlled, degradation of the system's functionality. In doing so, the individual limitations of a real-time system must be recognised and the risk of its utilisation must be carefully pondered. This approach relies on software rather than hardware reserves. However, it is clear that a distributed hardware architecture makes the system much less susceptible to hardware malfunctions. Regardless of the approach, a clear understanding of the faults the system must be able to handle is necessary. This should be clearly stated in the requirements specification prior to design and implementation.

The requirement for simultaneous processing of external process requests can be fulfilled by different implementation strategies, e.g., by allocating a microcomputer to each source of data, or by running sets of tasks on one or more processors controlled by a multitasking operating system, as well as by combining the activities inside
one task on one processor. However, clarity and dependability will be much greater when a distributed architecture providing parallel processing capabilities is employed. Therefore, a design methodology should support physical distribution of the hardware as well as of the software system components. A key concept of a distributed design methodology is the ability to systematically decompose and allocate the requirements to particular system components. Starting from a complete set of requirements, the system under design should be gradually decomposed into smaller units which can operate relatively independently. To preserve predictability, clear rules for synchronisation and communication must be defined, and the impact of communication overheads on the system performance must be evaluated. In the implementation phase, the potential of the operating system and the language for concurrent and fault-tolerant resource usage must be properly employed.

After the system has been designed and implemented, it must be verified against the requirements and design specifications. In the current state of development technology, verification of real-time systems is performed by testing, which accounts for approximately half of the total development cost. Therefore, the design methodology must be integrated with a suitable test methodology [9]. This implies that the requirements should be stated quantitatively, and that the hardware and software architecture must make the designed system testable, and preferably easy to test. Adequacy of testing methods contributes greatly to the overall system dependability; hence, one cannot rely on ad hoc procedures. Instead, systematic test metrics and procedures must be employed.
Summarising, a rational engineering approach to real-time systems development must address the following list of issues:

• An integrated view of the system and the software design, beginning with a complete specification of all possible scenarios of the system and environment behaviour.

• Performance evaluation based on a clear understanding of the peak load the system must handle, and worst-case schedulability analysis prior to design and implementation.

• Specification and design methods which can guarantee testability of the requirements specification and the implementation.

• Design and implementation methods and tools supporting distributed system design with fault-tolerant characteristics.

• Distributed hardware architectures with operating system and language support which can handle simultaneous processing, resource constraints, and deadlines.
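The worst-case schedulability analysis called for above can be sketched with the classical rate-monotonic utilisation bound of Liu and Layland, a standard sufficient test not derived in this chapter; the task set and its execution times below are hypothetical illustration values:

```python
# Rate-monotonic schedulability check (sufficient test): n periodic
# tasks with worst-case execution time C and period T (deadline =
# period) are schedulable under fixed-priority rate-monotonic
# scheduling if total utilisation <= n * (2**(1/n) - 1).

def rm_schedulable(tasks):
    """tasks: list of (C, T) pairs; True if the utilisation-bound
    test passes."""
    n = len(tasks)
    utilisation = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1.0 / n) - 1)
    return utilisation <= bound

# Three hypothetical tasks, (execution time, period) in milliseconds:
# utilisation 0.45 is below the three-task bound of about 0.78.
tasks = [(10, 50), (15, 100), (20, 200)]
print(rm_schedulable(tasks))
```

The test is conservative: a task set failing the bound may still be schedulable, which is why exact response-time analysis is used in practice.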
Chapter 3

Digital Control of Continuous Processes

3.1 Introduction
In the chemical process example, which was introduced in Section 1.3, among others the vat level has to be controlled. In principle, this is a hydraulic system consisting of a buffer, an input flow, and an output flow. The input and output flow rates, and the vat level, are variables which change continuously with time and must, therefore, be modeled by continuous, real-valued functions of time. Hence, the hydraulic system itself is a time continuous system. Classically, this kind of system was controlled using analogue control systems. More and more, control is taken over by digital control systems, which use a computer to evaluate the state of the system and to generate accurate control signals. It was pointed out before that this approach has many advantages. On the other hand, in order to enable digital processing of analogue signals, conversions must be made from analogue to digital signals and vice versa, which cannot be done without loss of information.

This chapter provides an overview of the theory of digital control systems, which can be used as background information to other subjects in this book. As a full discussion of control systems lies outside the book's scope, the reader is referred to various textbooks on this subject [11, 12]. In the following section, the theory of linear systems is discussed. Section 3.3 deals with the analysis and design of control systems. In Section 3.4 the conversion between digital and analogue signals, consisting of sampling and quantisation, is discussed. Besides the traditional sampling according to the sampling theorem, an alternative method is outlined. Some problems related to real-time control, and possible solutions, are addressed in Section 3.5.
Figure 3.1: Model of a physical system.
3.2 Linear System Theory
Many physical systems can be modeled by linear system theory, which describes the behaviour of linear, time invariant systems. This theory has proved to be a powerful tool in many engineering applications and provides a formal foundation for the analysis and development of control systems.

A physical system is modeled by a black box F with input signal x(t) and output signal y(t), shown in Figure 3.1. The system is linear if the following conditions hold for two different input signals x1(t) and x2(t), and a constant λ:

    F(x1(t) + x2(t)) = F x1(t) + F x2(t)
    F(λ x1(t)) = λ F x1(t)                                            (3.1)
Time invariance can roughly be interpreted by the statement that the system properties do not change with time. It is assumed that the system was created at time t = −∞ and has not changed since its creation. From now on, linear, time invariant systems will be referred to by the term linear systems. Two different subclasses of linear systems can be distinguished:

• time continuous systems and

• time discrete systems.

A time continuous system deals with continuous functions x(t) and y(t), and can be modeled by a differential equation with constant coefficients or, alternatively, by a set of first-order differential equations. Time continuous systems are widespread in everyday life. Various electrical, mechanical, hydraulic, pneumatic, and thermal systems can be regarded as systems of this class and can be analysed appropriately.

Consider, e.g., the hydraulic system shown in Figure 3.2, consisting of a vat with input and output pipes. The input flow is qi. The output flow qo is controlled by a valve with resistance R. The liquid level inside the vat is h. Assuming incompressible liquid, simple mechanics lead to the following expression:

    qi − qo = A dh/dt
Figure 3.2: A simple hydraulic system. (From [11].)

where A is the cross-sectional area of the vat. An alternative equation can be derived by introducing the resistance of the valve:

    qo = ρh / R

where R is the resistance, ρh the liquid pressure, and ρ the density. Substituting for h gives:

    qi − qo = (AR/ρ) dqo/dt = CR dqo/dt                               (3.2)

where C = A/ρ is the liquid capacitance. This equation is the standard differential equation of a first-order buffering system.

In addition, a liquid flow has a certain inertia, i.e., the pressure difference between two points that is needed to cause a flow acceleration of unity. A differential equation can easily be derived by considering the following situation: two equal cross-sections of area A at a distance L apart have a pressure difference of Δp. Applying Newton's second law, it follows:

    A Δp = m dv/dt

Since the flow rate is q = Av and the mass m = ρAL, this gives:

    Δp = (ρL/A) dq/dt = I dq/dt                                       (3.3)

where I = ρL/A is defined as the liquid flow inertia. Combining (3.2) and (3.3), the system becomes a second-order system. In many hydraulic systems the inertia is neglected.
Figure 3.3: Impulses: δ(t) for time continuous systems (a), and δ[k] for time discrete systems (b).

Time discrete systems, on the other hand, are systems which evaluate the input and generate output only at equidistant time instants tk, tk+1, tk+2, ... The value of the inputs between two evaluation times is not considered. Therefore, time discrete systems can be interpreted as systems acting on number sequences x(tk), x(tk+1), ... Commonly, x(tk) is written as x[k]. Time discrete systems can be modeled by difference equations. As an example, the system smoothing the input by a moving average is described by the following difference equation:

    y[k] = (1/3) (x[k−1] + x[k] + x[k+1])
In case the differential or difference equation can be solved, the output of the system can be derived for any input. An important step in this procedure is the concept of impulse response. The impulse response, denoted by h(t) for continuous systems and h[k] for discrete systems, is the output of the system when an impulse δ(t) (or δ[k], respectively) is applied. The impulses for both system classes are shown in Figure 3.3. In mathematics, δ(t) is known as the Dirac δ-distribution.

The impulse response has practical importance in linear system analysis, because if it is known, one is able to find the responses to other signals. In case of continuous systems, the input signal can be decomposed into a sum of rectangular pulses. In the limit case, the rectangular pulses become impulses and the sum becomes an integral:

    x(t) = ∫_{−∞}^{∞} x(τ) δ(t − τ) dτ

According to the first linearity condition (3.1), the output due to a sum of inputs equals the sum of the outputs due to the individual inputs. Knowing the response to a single δ-function, the total output can be regarded as a sum (integral) of impulse responses. This relation between input and output is called the convolution integral:

    y(t) = F( ∫_{−∞}^{∞} x(τ) δ(t − τ) dτ )
         = ∫_{−∞}^{∞} x(τ) F(δ(t − τ)) dτ
         = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ                                (3.4)
In the time discrete case, a similar relationship exists. The decomposition is:

    x[k] = Σ_{n=−∞}^{∞} x[n] δ[k − n]

It can be noted that this decomposition is exact; every bounded number sequence can be decomposed this way. The system output is expressed by the convolution sum:

    y[k] = Σ_{n=−∞}^{∞} x[n] h[k − n]                                 (3.5)
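The convolution sum (3.5) translates directly into code; the sketch below assumes finite-length sequences, with values outside the stored range taken as zero:

```python
# Direct evaluation of the convolution sum (3.5) for finite-length
# sequences x and h; samples outside the stored range are zero.

def convolve(x, h):
    """Return y with y[k] = sum over n of x[n] * h[k - n]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for m, hm in enumerate(h):
            y[n + m] += xn * hm
    return y

# The moving-average system above has impulse response 1/3 at three
# consecutive indices (shifted here to start at index 0); convolving
# it with a short step shows the smoothed ramp at the edges.
print(convolve([1, 1, 1, 1], [1/3, 1/3, 1/3]))
```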
In many digital signal processing applications, this expression is directly used in the algorithms.

Another, powerful description of a system is the so-called transfer function. The transfer function for time continuous systems is defined as follows:

    H(s) = Y(s) / X(s)                                                (3.6)
where X(s) and Y(s) are the Laplace transforms of x(t) and y(t), respectively. For time discrete systems, the z-transform is used, giving:

    H(z) = Y(z) / X(z)                                                (3.7)
The transfer function is a system representation in the frequency domain, whereas the differential/difference equation or the impulse response are representations in the time domain. The frequency domain has a major advantage over the time domain: instead of differentiation or integration, the response can be obtained by simple arithmetic operations. Moreover, the frequency domain is very useful for characterising systems and signals, e.g., a high-pass filter is a filter which suppresses low frequencies and passes high frequencies.

There is a straightforward relation between the impulse response and the transfer function: H(s) is the Laplace transform of h(t) or, respectively, H(z) the z-transform of h[k]. The other direction is also possible, using the inverse transformations. In the example of Figure 3.2 the transfer function:

    H(s) = Qo(s) / Qi(s) = 1 / (1 + CRs)                              (3.8)

can easily be derived by taking the Laplace transform of the differential equation (3.2).
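Substituting s = jω into this transfer function gives the frequency response of the vat; a short numerical check (with an arbitrary time constant CR = 1, an illustration value only) shows the expected low-pass behaviour, with gain 1/√2 at the corner frequency ω = 1/(CR):

```python
# Frequency response of H(s) = 1/(1 + CR*s) on the imaginary axis
# s = j*omega.  CR = 1 is an arbitrary illustration value.
import math

def gain(omega, cr=1.0):
    return abs(1.0 / (1.0 + 1j * omega * cr))

print(gain(0.0))               # DC gain: 1.0
print(round(gain(1.0), 4))     # corner frequency: 1/sqrt(2) ~ 0.7071
print(round(gain(100.0), 4))   # high frequencies are strongly damped
```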
3.3 Control System Analysis and Design
The main function of a control system in an industrial environment is to support the process in achieving a product of constant high quality at low cost. When analysing and designing a control system, the following topics have to be taken into account:

• Controllability. A process is controllable if and only if every internal state can be reached by changing the inputs in a finite time.

• Observability. The control system must be able to determine the internal state of the controlled process from the values of its inputs, i.e., the controlled process must be observable. If this is not the case, a so-called observer must be constructed, which can provide the required information [19].

• Stability. The total system must be stable, i.e., a bounded input sequence must never lead to unlimited growth of the outputs. In an industrial environment, stability problems can have serious consequences.

• System behaviour. In general, the response of the total system to its external inputs is specified. The control system must be designed to meet the specification.

The first three items are basic requirements for any practical system. The real challenge is to accomplish the desired system behaviour. This topic will be discussed in the remainder of this section.

Analysis. When a specific system is developed for a given application, it must satisfy certain requirements which involve the system response. For instance, reviewing the example of Figure 3.2, it might be required that the vat level never exceeds a maximum level, or that after opening the output valve, the vat level is restored to its desired value within a certain time. In order to meet the requirements, a control system must be included, which enforces the proper behaviour of the controlled process. A classification of control systems was given in Table 1.2. Figure 3.4 shows the block diagram of a general closed loop system, consisting of the following signals and blocks:

Signals:

    R(s) = reference input (set-point input)
    B(s) = feedback signal
    E(s) = error signal
    M(s) = actuating signal
    C(s) = output signal

Blocks:
Figure 3.4: Block diagram of a closed loop control system.

    Gc(s) = transfer function of the controller
    Gp(s) = transfer function of the system to be controlled (the technical process, or the plant)
    H(s)  = transfer function of the feedback element
It is important to notice that all signals and blocks are represented in the frequency domain. The signals shown in Figure 3.4 can be related to each other by the following set of equations:

    E(s) = R(s) − B(s)
    C(s) = G(s) E(s)
    B(s) = H(s) C(s)

where G(s) = Gc(s) Gp(s) is the forward loop transfer function. Substituting the expressions for E(s) and B(s) into the second equation, we obtain:

    C(s) = G(s) [R(s) − H(s) C(s)]

hence

    C(s) [1 + G(s) H(s)] = G(s) R(s)

and finally:

    C(s) / R(s) = G(s) / (1 + G(s) H(s))                              (3.9)

Equation (3.9) defines the overall closed loop transfer function, which characterises the behaviour of the closed loop control system, and particularly the system's response C(s) to an arbitrary input signal R(s). By changing Gc(s) and/or H(s), it is possible to alter the overall transfer function, thus changing the system's behaviour.
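Equation (3.9) can be exercised numerically. Taking the first-order plant of Figure 3.2, a proportional controller Gc(s) = Kp, and unity feedback H(s) = 1 (CR = 1 and Kp = 9 are arbitrary illustration values, and the function names are ours), the closed-loop DC gain at s = 0 becomes Kp/(1 + Kp):

```python
# Closed-loop transfer function (3.9): C/R = G/(1 + G*H) with
# G(s) = Gc(s)*Gp(s).  Gp is the first-order plant 1/(1 + CR*s),
# Gc a proportional controller Kp, and H = 1.

def closed_loop(s, kp=9.0, cr=1.0):
    gp = 1.0 / (1.0 + cr * s)     # plant transfer function
    g = kp * gp                   # forward loop: Gc * Gp
    return g / (1.0 + g)          # (3.9) with H(s) = 1

# DC gain (s = 0): Kp/(1 + Kp) = 0.9, i.e. a 10% steady state error
# remains -- one motivation for the integral control term treated
# later in this section.
print(closed_loop(0.0))
```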
In this situation the transfer function was obtained by straightforward substitutions. For more complex system structures special methods exist which lead to a systematic derivation of the transfer function [11].

Design. The goal of control system design is to define the transfer functions Gc(s) and H(s) which can guarantee the desired system behaviour. In the first step, the desired behaviour must be precisely specified. In this step the following three issues are addressed:

• Specification of the system's response. This can be done either in the time domain or in the frequency domain. In the time domain, it is often specified what the response to a step input must be. This response is specified in terms of peak overshoot, rise time, delay time, settling time, damping ratio, and undamped natural frequency. In the frequency domain, one can think of the position of poles in the complex plane, the bandwidth, or the shape of the transfer function.

• Specification of the steady state response, i.e., the state of the system after a long time has elapsed. This steady state response is often specified for different inputs, e.g., a step input or a ramp input (r(t) = ct for t > 0, 0 for t < 0). Evaluation of the steady state response is important for so-called servo systems, i.e., systems whose output must follow certain input signals, as for example in a sun tracking system.

• Specification of the system sensitivity to parameter variations. A system design is useless if the system response changes dramatically after a slight variation of the parameters, which will always occur in practice.

Given the desired system response, the controller Gc(s) can be designed and included in the feedback loop. In the majority of industrial applications only three types of control are applied:

• proportional control,

• integral control, and

• derivative control.
The characteristics of the controller for these three types of control are summarised in Table 3.1, specified in the time domain by the relation between m(t) and e(t), and in the frequency domain by the transfer function of the controller.

Using proportional control, the actuating signal is proportional to the error signal. The gain coefficient Kp is to be selected according to the requirements. Often a
Table 3.1: Types of control.

    Type of control   Time domain                      Frequency domain
    Proportional      m(t) = Kp e(t)                   Gc(s) = Kp
    Integral          m(t) = Ki ∫_{−∞}^{t} e(t)dt      Gc(s) = Ki/s
    Derivative        m(t) = Kd de(t)/dt               Gc(s) = Kd s
compromise is reached so that the overshoot and the steady state error are both within acceptable limits. If this is not possible, the controller can be extended with one or both of the other types of control.

Integral control strongly reduces the steady state error. For this type of control, the actuating signal is proportional to the integral of the error. Hence, if the error is constant, the actuating signal is linearly increasing, which will reduce the error via the feedback path. The drawback of integral control is that the system will respond slowly to a variation of the error. Moreover, for some gains Ki the system may become unstable.

For derivative control, the actuating signal is proportional to the rate of change of the error. This form of control is useful when it is necessary to increase the system damping. It should always be used in combination with other types of control, because the actuating signal m(t) remains zero when the error is constant.

The three types of control are often combined into a so-called PID-controller:
    m(t) = Kp e(t) + Ki ∫_{−∞}^{t} e(t)dt + Kd de(t)/dt               (3.10)

or, equivalently, in the frequency domain:

    Gc(s) = Kp + Ki/s + Kd s                                          (3.11)

The usage of PID-controllers in different hardware architectures will be discussed in Chapter 4.
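A digital controller evaluates (3.10) on sampled error values, replacing the integral by a running sum and the derivative by a backward difference. The sketch below shows one common discretisation, not the book's implementation; the gains and sampling time are placeholder values:

```python
# Discrete PID controller: integral as a running sum, derivative as
# a backward difference, both scaled by the sampling time dt.
# Gains kp, ki, kd and dt are placeholder illustration values.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        # m[k] = Kp*e[k] + Ki*sum(e)*dt + Kd*(e[k] - e[k-1])/dt,
        # a discretised form of (3.10)
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)

# First step with a constant error of 1: the derivative term reacts
# to the initial jump; on later steps it vanishes and the integral
# term grows.
pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.01)
print(pid.update(1.0))
```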
Figure 3.5: Block diagram of a digital controller.
3.4 Digitising Analogue Signals
In modern control systems, computers play a central role. As already discussed in Chapter 1, the use of a computer has several advantages, but also introduces new problems. In Section 2.2 it was mentioned that a digital computer cannot deal with continuous signals. Since the technical process to be controlled has continuous output signals, the latter have to be prepared for computer processing. This consists of two different actions:

• sampling, i.e., converting a time continuous function into a time discrete function (which is actually a sequence of numbers), and

• quantisation of the amplitude, i.e., rounding the numbers obtained in the sampling phase to the nearest integers.

On the other hand, the output of the computer has to be converted back into a continuous signal. This task is carried out by a hold circuit. In this section the theory of sampling and quantisation will be discussed.

Ideal sampling. In case of a computer based control system, the controller in Figure 3.4 consists of three parts: the sampler, the computer, and the hold circuit (see Figure 3.5). Assume that the computer is able to handle real-valued inputs, i.e., no amplitude quantisation is needed. The ideal sampler is drawn as a switch, which closes at certain time instants:

    ..., t − Δt, t, t + Δt, t + 2Δt, ...

where Δt is the sampling time, and its inverse fs = (Δt)^(−1) is the sampling frequency. The output of the sampler e*(t) is the infinite sequence:

    ..., e(t − Δt), e(t), e(t + Δt), e(t + 2Δt), ...
Mathematically, the ideal sampler can best be described using a so-called impulse train:

    imp(t) = Σ_{n=−∞}^{∞} δ(t − nΔt)

where δ(t − nΔt) is an impulse at t = nΔt. Sampling can now be expressed as multiplication with this impulse train:

    e*(t) = e(t) · imp(t) = Σ_{n=−∞}^{∞} e(nΔt) δ(t − nΔt)            (3.12)
To gain more insight into the properties of the ideal sampler, the sampler output will be analysed in the frequency domain. The Laplace transform of e*(t) is:

    E*(s) = L[e*(t)] = Σ_{n=−∞}^{∞} e(nΔt) exp(−nΔt s)                (3.13)

Although the Laplace transform is expressed as a series, it can be written in closed form for some specific input signals. It can be shown [12] that E*(s) has two interesting properties. First, E*(s) is periodic in s with period fs.¹ Moreover, E*(s) can be expressed as:

    E*(s) = (1/Δt) Σ_{n=−∞}^{∞} E(s + j2πnfs)

This means that the spectrum of the sampled signal consists of duplications of the original frequency spectrum, shifted in frequency. The properties of E*(s) are illustrated in Figure 3.6. In Figure 3.6.a a signal with a spectrum E(s) limited to the interval [−fmax, fmax] is considered. According to the first property, E*(s) is periodic with period fs. In Figure 3.6.b we have fs > 2fmax: the output of the sampler exhibits no overlapping of its components. On the other hand, if fs < 2fmax, overlapping does occur (Figure 3.6.c).

It should be noted that an expression for the output of the sampler in the frequency domain is found by Laplace transformation of e*(t). Since this output signal is time discrete, it can be Z-transformed, too. Fortunately, the frequency domain representation obtained by the Laplace transform is equivalent to the representation obtained by the Z-transform, which can easily be verified by substituting z = exp(sΔt).

¹Note that the s occurring in E*(s) refers to the complex Laplace variable, whereas the s in fs refers to the sampling frequency.

Hold. Having a sampled signal, the computer can process it and prepare a control signal for the plant. As the plant can only handle continuous signals, a continuous
Figure 3.6: Spectra of the original signal (a), and of signals sampled with fs > 2fmax (b) and with fs < 2fmax (c).
Figure 3.7: Zero-order hold.
Figure 3.8: Undersampling of a signal, causing aliasing.
signal must be constructed from the samples. This is carried out by the hold circuit in Figure 3.5. Mathematically, the hold circuit performs an interpolation between the samples. Commonly used is the zero-order hold, which holds the output within the period Δt at the constant level e[k], the value of the sample at time tk (see Figure 3.7). Other types of interpolation are used as well.

Sampling theorem. Intuitively, one would think that information is lost during the sampling process, because the signal value between two consecutive samples is unknown. The problem whether functions exist which can be completely described by their samples has been investigated by numerous researchers. This finally led to Shannon's sampling theorem [18], which will be discussed now.

It was shown that the spectrum of the sampler output is a sum of duplications of the original spectrum, shifted in frequency. A necessary condition for error-free reconstruction is that the components of the spectrum do not overlap, as is the case in Figure 3.6.b. In Figure 3.6.c components do overlap: the upper part of the spectrum is superimposed by a part of the next component. This effect is called aliasing: higher frequencies are folded back into the original spectrum, causing distortion in the reconstructed signal.

The effect of aliasing is shown in Figure 3.8. The original signal is a sine wave with period T, drawn with a solid line. Samples are taken at intervals of 0.75T. If one tries to reconstruct the signal, the original sine does not show, but instead a sine with period 3T (dashed). The conclusion is that the sampling frequency should be at least twice as high as the maximum frequency in the signal. Equivalently, given a sampling frequency fs, the signal should not contain frequencies above the so-called Nyquist frequency fN = fs/2. If the above rule is obeyed, Shannon proved that it is indeed possible to make a perfect reconstruction from the samples.
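The situation of Figure 3.8 can be reproduced numerically: sampling a sine of period T every 0.75T (i.e., at fs = 4/(3T) < 2/T) yields exactly the samples of a period-3T sine, here with inverted sign because the aliased frequency is 1/T − fs = −1/(3T). A small sketch:

```python
# Aliasing: a sine of period T sampled every 0.75*T produces the same
# sample values as a (sign-inverted) sine of period 3*T, so the two
# signals cannot be told apart after sampling.
import math

T = 1.0
dt = 0.75 * T                  # sampling interval, below the Nyquist rate
for n in range(8):
    t = n * dt
    original = math.sin(2 * math.pi * t / T)
    alias = -math.sin(2 * math.pi * t / (3 * T))
    assert abs(original - alias) < 1e-12
print("samples of the period-T sine equal those of a period-3T sine")
```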
The sampling theorem states that:
Figure 3.9: The frequency response of an ideal low-pass filter.
If

1. a signal e(t) is sampled by an ideal sampler,

2. the frequency spectrum of the signal is bandlimited, i.e., it has a maximum frequency fmax, and the signal is sampled at a frequency higher than 2fmax, and

3. the signal is observed during an infinitely long time,

then it is possible to reconstruct the signal from its samples without error, by applying the interpolation formula:

    ê(t) = Σ_{n=−∞}^{∞} e(nΔt) sin(2πfN t − nπ) / (2πfN t − nπ)       (3.14)

where ê(t) is the reconstruction of the original signal e(t). This reconstruction can be seen as filtering the sampled signal with a perfect low-pass filter, i.e., a filter with a frequency response as shown in Figure 3.9.

Practical sampling. Though Shannon's sampling theorem and related work are very often referred to, the practical value of the theorem is limited, because the requirements cannot be met:

• A signal is only observed during a finite time interval. From the interpolation formula (3.14) it follows immediately that a finite observation time will cause a reconstruction error.

• A real signal is not bandlimited, because a time-limited signal can never be bandlimited [14]. Consequently, aliasing will always occur, regardless of the sampling frequency.

• Ideal samplers cannot be realised, because any practical measuring device has a (small) integration time. In practice, sampling is carried out by an analogue-to-digital converter, or shortly A/D-converter. This device will be discussed in detail in Section 5.4. In contrast to the ideal sampler, the "switch" is closed during a small time interval, causing a local integration of the signal. If the
signal is not constant during this integration, the result will be different from that of the ideal sampler.

• The reconstruction cannot be carried out, because ideal low-pass filters do not exist. Often, a zero-order hold device is used, followed by a low-pass filter to smooth the result. The practical device is a digital-to-analogue converter, which will be discussed in detail in Section 5.3.

It is clear that there is a discrepancy between theory and practice. However, in many applications signals are sampled according to the sampling theorem as if the requirements were met. Usually, a maximum frequency is obtained experimentally, which gives sufficient results in most cases.

Local integral sampling. Considering the gap between theory and practice, it is desirable to develop a theory which is closer to the practical situation. The problem can be seen from the following viewpoint: a signal is observed during a finite time interval, say [0, T], and N parameters (the samples) should be extracted in such a way that the original signal can be regained from the samples with minimum error. In signal theory, it turned out that it is sometimes useful to reformulate the problem using linear algebra. The set of all possible signals is considered as a vector space of infinite dimension. The N coefficients will span an, at most, N-dimensional subspace of this space of signals. In this light, sampling becomes an approximation problem [16]. Minimum error is equivalent to minimum distance. The reconstructed signal, which is an element of the subspace, can be written as:

    ê(t) = Σ_{n=0}^{N−1} cn wn,   cn ∈ ℝ

where the wn's are basis functions of the subspace. However, the problem is not solved, because the following questions remain:

• Which subspace is optimal for a given signal type?

• How should the coefficients cn be calculated?

• How large should the number N be chosen?

Trying to answer these questions, one should be aware of the fact that the sampling operation is to be carried out by a computer and/or an analogue device. In order to minimise the processing expense, some additional requirements are introduced:

• The sampling operator should be linear (in the mathematical sense). Only in this case can linear operations on analogue signals be approximated by linear operations on the sampled signals. Existing (linear) continuous time control strategies can then be directly carried out by the computer.
Figure 3.10: The second-order B-spline.

• Every sample should be obtained using the same procedure, i.e., the sampling operator should be invariant to time shifts.

• In order to operate in real-time, the sampling operator should be causal, i.e., future values are not to be used.

• The parameters should be calculated at equidistant points in [0, T], otherwise additional timing data have to be provided.

In many approximation problems, polynomial splines give good results. Therefore, a space of polynomial splines is used as a subspace of the signal space. An appropriate set of basis functions are B-splines as defined by the following equation:
w_m(x) = (1/m!) Σ_{k=0}^{m+1} (−1)^k (m+1 choose k) (x + (m+1)/2 − k)_+^m,   x ∈ ℝ

in which:

(x)_+^m = x^m for x ≥ 0, and (x)_+^m = 0 for x < 0

and where m is the degree of the spline. The B-spline of degree 2 centered at 0 is shown in Figure 3.10. Figure 3.11 shows how a function is represented in a first-order spline space. In case the coefficients equal the function values at the evaluation points, this representation is known as linear interpolation. Splines have some advantageous properties. First, they are easy to evaluate, because every spline consists of polynomial pieces. For the same reason, splines are easy to differentiate, which can be very useful in applications where the sampled signal should be analysed for the location of peaks, etc. Moreover, B-splines have a compact support, i.e., the interval where the function is non-zero is compact. Owing to this compact support, the influence of erroneous samples remains local.
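The truncated-power formula above is straightforward to turn into code. The sketch below is our illustration (the function name is an assumption, not from the book); it evaluates the degree-m B-spline centred at 0 and reproduces, e.g., the peak value 0.75 of the quadratic spline of Figure 3.10 and its compact support [−1.5, 1.5].

```python
from math import comb, factorial

def b_spline(m, x):
    """Degree-m B-spline centred at 0, via the truncated-power formula."""
    total = 0.0
    for k in range(m + 2):                       # k = 0 .. m+1
        t = x + (m + 1) / 2 - k                  # shifted argument
        total += (-1) ** k * comb(m + 1, k) * (t ** m if t >= 0 else 0.0)
    return total / factorial(m)

print(b_spline(2, 0.0))   # → 0.75 (peak of the quadratic B-spline)
print(b_spline(2, 2.0))   # → 0.0  (outside the compact support)
```

Summing integer shifts of the spline reproduces the partition of unity, which is what makes the representation in Figure 3.11 work.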
Digital Control of Continuous Processes
Figure 3.11: Representation of a signal with respect to a basis of first-order B-splines.

In standard sampling theory as well as in spline approximation theory, function values are used as coefficients. However, it was stressed before that a practical measuring device cannot take point measurements, but will always locally integrate the signal. This can be carried out explicitly, using the following equation for the coefficients:

c_n = ∫_{nΔt}^{(n+1)Δt} e(t) dt    (3.15)
where again Δt = T/N is the sampling time. This has the advantage that theory conforms with practice. Moreover, integration inherently cancels out high-frequency noise and spikes, thereby increasing the robustness of the sampling operation. It can be shown that, if the integration length is chosen to be a multiple of the power-net cycle, the sensitivity to noise at multiples of the power frequency is reduced to zero [17]. Combining the local integral method (3.15) with a B-spline representation, the following reconstruction formula is obtained:
ê(t) = Σ_{n=−m}^{N−1} [ ∫_{nΔt}^{(n+1)Δt} e(τ) dτ ] w_n(t)    (3.16)
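The power-net noise-rejection property of the integral coefficients (3.15) can be checked numerically. The sketch below is our illustration (the 50 Hz mains frequency, the hum amplitude, and the function names are assumptions): with the integration window equal to exactly one power-net cycle, the hum integrates to zero in every coefficient.

```python
import math

def integral_samples(e, n_samples, dt, steps=1000):
    """c_n = integral of e(t) over [n*dt, (n+1)*dt], midpoint rule with `steps` sub-intervals."""
    h = dt / steps
    return [sum(e(n * dt + (i + 0.5) * h) for i in range(steps)) * h
            for n in range(n_samples)]

f_power = 50.0                      # power-net frequency (Hz)
dt = 1.0 / f_power                  # integration window = one full mains cycle (20 ms)
clean = lambda t: 2.0 * t           # slowly varying signal of interest
noisy = lambda t: 2.0 * t + 0.5 * math.sin(2 * math.pi * f_power * t)  # plus mains hum

c_clean = integral_samples(clean, 4, dt)
c_noisy = integral_samples(noisy, 4, dt)
# Every full hum period integrates to zero, so the two coefficient sets coincide.
```

A point sampler, by contrast, would pick up the full 0.5 amplitude of the hum at unlucky sampling instants.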
The m basis functions occurring additionally in this representation are necessary to achieve a uniform approximation quality on [0, T]. The corresponding coefficients must be calculated from the signal outside of [0, T] or by other methods. It was shown that the reconstruction is a good approximation of the original signal [15]. For this reason and the noise suppression properties, this local integral sampling is a serious alternative to the traditional method.

Quantisation of the signal value. Quantisation is a mapping of the infinitely many values of the input signal x(t) into a finite number of output values y(t).
Figure 3.12: An example of a (uniform) quantisation characteristic, K = 5.

The corresponding operator is non-linear. The relation between input and output is given by the quantisation characteristic. An example of such a characteristic is shown in Figure 3.12. The x-axis is divided into K intervals, separated by K + 1 decision levels x_k. The output is restricted to one of the quantisation levels: if x ∈ [x_k, x_{k+1}), the output becomes y_k, k ∈ {0, 1, ..., K − 1}. If all intervals have the same size, the quantisation is uniform. From the figure it is clear that the output signal cannot be larger than the largest quantisation level. For inputs higher than x_K, the quantiser becomes overloaded. The mapping is irreversible, i.e., during the quantisation operation information is lost. One can define the quantisation error by

n(t) = x(t) − y(t)

This error is completely determined by the momentary input x(t). The error introduces distortion, which is denoted as quantisation noise or quantisation distortion. It is convenient to model the noise by an additive source, independent of the signal (see Figure 3.13).
*(t) t ( j) y{t) Figure 3.13: Modeling the quantisation noise by an independent additive source.
Another type of error, which can occur in practical devices, is caused by dissimilarities between the specified quantisation characteristic and the real characteristic.
Decision levels may be shifted or, in case of a uniform characteristic, the interval size may vary. In these cases, the quantised value is different from the expected one. In practical (uniform) quantisers, this type of error is referred to as the non-linearity error. If a digital controller as in Figure 3.5 is included in a closed loop system, the quantisation error cannot be neglected. Systems which are designed to have certain characteristics can show a different behaviour, or even may become unstable, due to the quantisation error. For a comprehensive discussion of quantisation, see for instance [13]. A practical analogue-to-digital converter carries out both sampling and quantisation. The output is a word of n bits, giving 2^n quantisation levels. The type of quantisation is uniform.
3.5 New Developments
In the previous sections, the use of computers in a control system was discussed. The introduction of a digital system inside the path of a closed loop system implies two data conversions, which were previously denoted as sample and hold, respectively. This has a major drawback: sampling inhibits real-time control of a system. If a sampler is included in a control system, the actual state of the system is unknown between successive samples. Of course, the sampling time should be chosen in such a way that the actual state can be estimated from the samples with small error. However, this cannot always be achieved, because the computer needs the time between two samples to generate its actuating signal. In other words: there is a minimum sampling time, which equals the execution time of the control algorithm. In case the required sampling time is smaller than the execution time, either the control algorithm must be simplified in such a way that it takes less time to process, or a faster computer must be used. Both actions have consequences. Reducing the complexity of the control algorithm will, in general, affect the system performance. If, for instance, integer arithmetic is used instead of floating point arithmetic, the quantisation error will increase and the system response will change. Use of a faster system will certainly affect the costs and is restricted by the available technology. In general, a control system supervises more than one variable. In Section 4.2 the structure of a centralised control system will be discussed. In such a system, the controller generates an output for every variable to be controlled. These actuating signals are based on the complete state of the system, i.e., the i-th actuating signal m_i is a function of all error signals e_0, ..., e_n and reference inputs r_0, ..., r_n. In a digital controller, the error signals are sampled signals. In order to optimise the overall system performance, all signals must be sampled synchronously. In a small
Figure 3.14: Incorporating the delay time by extrapolation of the sampled input.
system, this can easily be done by providing a central clock. If the physical system is distributed over different locations, synchronisation is more complicated, due to different delay times in the signal lines. External synchronisation with a precise clock, e.g., a satellite clock, can solve the problem. Samples are then taken at predefined times. It is easy to supply the computer a priori with the timing information of the different variables. This concept is, as far as we know, not yet adopted in real-time control systems. Another important topic, which was not discussed so far, is the influence of the delay caused by the execution of the control algorithm on the performance of the system. Classically, a delay element is introduced in the loop to model this delay time. However, the situation has not improved, because the standard analysing tools are no longer sufficient for systems with time delay. To analyse systems of this type, the so-called modified Z-transform is defined, see [12]. It should be emphasised that a delay can cause major changes in the system response. Systems may become unstable when the delay is not properly modeled. The problem can also be addressed in a different way. Instead of locating the delays and modelling them, delays can be incorporated in the input signals of the controller. Suppose, for instance, that a signal is sampled with sampling time Δt, and the execution time of the control algorithm is τ, see Figure 3.14. Suppose the last sample was taken at time t_k. In a delay-free system, an actuating signal is generated at time t_k, whereas the computer will generate its response not before time t_k + τ. The actuating signal is based on the last sample and a number of previous samples, the number depending on the order of the controller.
If this output signal was present in time, the controller could be considered as delay-free, which would simplify analysis and design. In order to achieve this, the computer should start executing its algorithm at t_k − τ. In this case the computer must know in advance e[k], the value of the signal at t_k, a demand that cannot be satisfied in real-time. Therefore, the required value has to be extrapolated from the past samples in some way.
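The simplest such predictor extrapolates linearly from the last two samples. The sketch below is our illustration (the book does not prescribe a particular extrapolation method); for a ramp input the prediction is exact, and for other signals it is an approximation whose error grows with the signal curvature.

```python
def predict_next(samples):
    """Extrapolate the next sample linearly from the last two measured ones."""
    if len(samples) < 2:
        return samples[-1]           # fall back to a zero-order (hold) prediction
    return 2 * samples[-1] - samples[-2]

print(predict_next([1.0, 2.0, 3.0]))   # → 4.0 (exact for a ramp)
```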
Chapter 4

Hardware Architectures

4.1 Classical Process Automation
The classical approach to the implementation of a control system is based on the concept of separated control loops. The key point of the concept is an attempt to decompose the system into a set of independent clusters (loops), each of which is related to a single state variable and a single control variable. Particular state variables are controlled by corresponding control variables within the loops. To illustrate this abstract concept, consider the example of the control structure of the simple chemical process described in Section 1.3. The requirements expressed in the problem statement can be accomplished by the following four control loops (Figure 4.1):

1. The vat level is controlled by monitoring the level sensor attached to the vat and adjusting the ingredient A input valve accordingly.
2. A constant pH is maintained by monitoring the pH sensor attached to the vat and adjusting the ingredient B input valve.
3. The liquid temperature is controlled by monitoring the temperature sensor attached to the vat and adjusting the heater.
4. On each bottling line, the filling valve is opened when the sensor attached to the platform is depressed and is shut off when the weight of the bottle crosses a threshold value set by an operator, according to the bottle size. Positioning a bottle on the platform is accomplished by an autonomous mechanism which is triggered by a release signal from the sensor attached to the platform.
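Loop 4 is a pure two-state (logic) controller. A minimal sketch of its decision rule (our formulation; the signal and parameter names are assumptions, not from the book):

```python
def filling_valve(platform_depressed, bottle_weight, threshold):
    """Two-state filling-valve logic for one bottling line:
    open while a bottle is on the platform and still below the weight threshold."""
    return platform_depressed and bottle_weight < threshold

print(filling_valve(True, 120.0, 500.0))   # → True  (bottle present, still filling)
print(filling_valve(True, 510.0, 500.0))   # → False (threshold crossed: shut off)
```

Each evaluation of this rule uses only the two local sensor signals, which is exactly the independence property of a classical control loop.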
Figure 4.1: Control system of a chemical process.
y_1, y_2, ..., y_n — state variables; u_1, u_2, ..., u_n — control variables

Figure 4.3: The structure of a classical process automation system.

(Section 15.1). Unlike the continuous case, the mapping between input and output signals cannot be unified and must be individually designed for each application. More generally, classical process automation systems have the structure shown in Figure 4.3. The important structural properties of such systems are as follows:

• The system is inherently distributed and splits into independent control loops with very limited information exchange between the loops.
• There is a very clear assignment of data processing devices (the controllers) to particular subprocesses of the controlled process.

It can be noticed that both properties refer to the hardware structure of the control system, and do not depend on particular control algorithms. The advantages and disadvantages of the classical structure can be derived from the above properties as follows:

The advantages:

• The distributed nature of the system contributes to increased reliability, as some local faults can be safely handled by other controllers. Consider, e.g., what will happen if the level controller in the bottling system fails. The level of the liquid in the vat will begin to vary, but the pH and the temperature will still be maintained properly. If the level drops, the bottling lines will be stopped, because the weight of empty bottles will not cross the threshold values. In the opposite case, the excess of the liquid will flow out through a safety-pipe. Although some production losses are possible, safety as well as quality of products are preserved.
• Clear assignment of the controllers to subprocesses contributes greatly to enhancing system maintainability, since local changes to the process or the process
installation have only local effects on the control system. If we decide, e.g., to add a further bottling line to the vat apparatus, then only a single additional logic controller must be added, and the rest of the control system need not be affected.

The disadvantages:

• The concept of separated control loops can be effectively applied only in those applications which can be decomposed into a set of independently controllable loops.
• Lack of cross-coupling between separate control loops makes the global optimisation of the control algorithm impossible. Consider once more the example chemical process. The rate of reaction between the ingredients A and B depends on the liquid temperature. This means that the temperature should depend on the flow of liquid through the vat. When the flow is slow, the temperature can be decreased to economise the heating costs. When the flow is fast, the temperature should be raised to ensure completeness of the reaction. This rule cannot be implemented without global co-ordination between the separate control loops.

The classical architecture and the PID regulatory control technique were developed before computers were widely applied to industrial applications. Controllers were implemented using analogue devices, e.g., transistors and operational amplifiers. The limited processing capabilities of the controllers reflect the possibilities of the technology available at the time. Despite these limitations, PID-controllers became a standard in industrial automation. Currently, controllers are implemented as specialised microcomputer systems that preserve the functionality of earlier analogue devices. A controller contains a central processing unit, a set of analogue input (y, y_0) and output (u) circuits, a serial interface to a host computer, and an operator panel attached to the device.
There are keys on the operator panel to enter the settings (including y_0), and there is an LCD display which shows the current value of the controlled state variable. The settings can also be provided by a host computer through a serial link. The requirement for displaying the current status of an installation to a human operator is usually fulfilled by gathering all controllers inside a control room. All sensors and actuators which are mounted directly at the installation must be connected to the control room. The controllers are arranged in a logical order and are mounted in such a way that their displays are parts of a schematic drawing of the controlled installation. This enables the operator to follow with ease the scheme and the status of the installation.
y_1, y_2, ..., y_n — state variables; u_1, u_2, ..., u_n — control variables

Figure 4.4: The structure of a centralised control system.
4.2 Centralised Direct Digital Control
The extended processing capabilities of contemporary computers remove the need for decomposing a control system into separate control loops. On the contrary, monitoring of all state variables, as well as adjusting of all control variables, can be accomplished by the same computer system. This enables the information concerning particular subprocesses of the controlled process to be shared and creates the possibility of global optimisation. The structure of a centralised control system is shown in Figure 4.4. The system consists of a computer, a process interface, and an operator station. The process interface and the operator station are connected to the computer as special purpose peripherals. Other peripherals can include mass storage devices, auxiliary terminals, printers, and communication modems. The signals from process sensors and actuators are connected to the process interface and can be accessed by computer software. The software reads the values of the state variables at regular intervals, calculates new values of the control variables, and outputs the values to process actuators. This applies to continuous as well as to two-state process variables. In particular, the computer can perform the tasks of logic and continuous controllers. In the latter case, data processing need not be limited to the PID-algorithm. On the contrary, advanced
control and optimisation methods can be implemented. In industrial applications with both types of signals, the continuous and binary subsystems need not be separated but can influence each other, thus enhancing the quality of control. It should be noted, however, that unlike a typical PID-controller, which is a ready-made device, computer software must be specifically designed for each particular application. As an example, consider the chemical process and the problem of adjusting the vat temperature in relation to the flow of liquid through the vat. A computer control system (Figure 4.5) can maintain the temperature according to the PID-algorithm. But instead of having the required value y_0 set constant by an operator, the computer system can calculate an optimal value depending on the average value of the flow. This cannot be measured directly, but can be calculated as a sum of outflows to the bottling lines. The outflow to a particular line is the product of the bottle size, which can be read from the contact switch, and the number of bottles filled in a time unit, which is equal to the number of pulses issued by the platform sensor. Centralised control systems can be effectively applied for controlling discrete-event installations, like a train signaling system, which cannot be decomposed into separately stabilised loops. The concept of a control room, outlined in the previous section, remains unchanged: separate controllers are replaced by a computer with a process interface and an operator station. However, communication between the control system and an operator can be reorganised, as the processing capabilities of the computer enable a more detailed and, at the same time, more user-friendly dialog. The data relevant to the current process monitoring and maintenance can be displayed in the control room either on an installation scheme or on an appropriate video display terminal.
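The flow-based set-point computation described above can be sketched in a few lines. This is our illustration only: the units, function names, and the linear set-point rule are assumptions, not the book's design.

```python
def average_flow(bottle_sizes, bottles_per_interval, interval_s):
    """Total outflow = sum over lines of (bottle size × bottles filled), per second."""
    volume = sum(size * count
                 for size, count in zip(bottle_sizes, bottles_per_interval))
    return volume / interval_s

def temperature_setpoint(flow, t_min, t_max, flow_max):
    """Hypothetical rule: raise the vat set-point linearly with the average flow."""
    fraction = min(flow / flow_max, 1.0)
    return t_min + fraction * (t_max - t_min)

# Two lines: 0.5 l bottles (10 filled) and 1.0 l bottles (4 filled) in the last minute.
flow = average_flow([0.5, 1.0], [10, 4], 60.0)
print(flow)                                          # → 0.15 (litres/s)
print(temperature_setpoint(flow, 60.0, 90.0, 0.3))   # → 75.0 (°C)
```

This is precisely the kind of cross-loop co-ordination that the classical architecture of Section 4.1 cannot provide.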
Process data needed for reporting and planning purposes can be made accessible to plant management by means of remote video terminals and hard-copy devices. Process maintenance documentation can be automatically created and archived on mass storage devices. The advantages and disadvantages of centralised systems can be derived from their structural characteristics as follows:

The advantages:

• Centralised process data acquisition does not create barriers to information flow inside the control system, so that global optimisation of the controlled process may be achieved.
• Computer-based man-machine interfaces create the possibility for flexible and intelligent communication between the control system and the human operators.

The disadvantages:

• The reliability of a centralised control system depends on the reliability of a
Figure 4.5: Centralised control of the chemical process.
single computer; in the case of computer malfunction a total system crash may occur.
• The large number and diversity of tasks performed by the control computer results in enormous software complexity and makes the design, the verification, and the error tracing extraordinarily difficult.

It should be strongly emphasised that the differences between classical and centralised architectures refer to the physical hardware structure, and not to the logical structure of a control system. Software programs can simulate the parallel operation of a number of PID-controllers inside a single computer. This approach decreases the complexity of the controlling software and makes possible rapid development of new applications. In fact, such software structures are widely applied and are offered by many vendors in the form of ready-made parameterisable application packages for process monitoring and regulatory control. In general, centralised systems do not fulfill the dependability requirement discussed in Chapter 2, and are reluctantly applied in the industry for controlling large installations. In particular, they are hardly ever employed in continuous process control applications, and only in those cases when the control system cannot be decomposed into separate loops or when the system operation is not safety-critical. However, the ideas of centralised systems are employed in programmable logic controllers, numerical machine tool controllers, and robot controllers used for discrete manufacturing automation. The basic component of such installations, the programmable logic controller, is a small factory-hardened computer providing interfaces for a large number of two-state signals and a limited number of analogue signals. The primary task of the controller is to control tool movements and operations. The task is described in terms of logic functions and machine sequencing and cannot be decomposed into independent loops.
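Simulating several loop controllers in one computer can be as simple as keeping one controller object per loop. Below is a textbook discrete positional PID sketch — our code, not the book's; the gains and sampling period are assumptions.

```python
class DiscretePID:
    """Positional PID: u = Kp*e + Ki*integral(e) + Kd*de/dt, evaluated every dt seconds."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# One instance per simulated loop; e.g., a temperature loop with assumed gains:
pid = DiscretePID(kp=2.0, ki=1.0, kd=0.0, dt=0.1)
print(pid.update(setpoint=1.0, measurement=0.0))   # → 2.1
```

A real-time executive would call each instance's `update` once per sampling period, reproducing the parallel operation of independent controllers in software.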
An important application area of centralised systems lies in process monitoring and laboratory experiment automation.
4.3 Redundant Configurations
The methods of increasing the reliability of control systems are based on the idea of having back-up modules in the most critical places of a system. If a primary module fails and can no longer operate properly, its functions are taken over by a back-up module. In a centralised computer control system the most critical element is the computer itself. In this light, an architecture with a redundant stand-by computer has been developed (Figure 4.6).
Figure 4.6: Centralised control system with a stand-by computer.
There are two computers and an output switch in the system. Both computers read the same process data and execute the same programs. However, only one of them sends control signals to the controlled process through the output switch. If that computer fails, the switch operates and the secondary computer takes over. It should be noted that a stand-by computer does not increase the overall system throughput, as it does exactly the same processing as the primary computer. Although simple in principle, the idea of duplicating a control system is extraordinarily hard to implement, and the costs of implementation are very high. A closer look at Figure 4.6 reveals that there are a number of potential sources of malfunctions in the system. These are:

- sensors and actuators,
- cabling (usually a cable does not "fail", but it can be torn open or cut — accidentally or maliciously),
- the switch,
- the process interface,
- the central computer,
- auxiliary elements, e.g., power supplies or cooling systems (if required).

All such elements must operate properly in order to guarantee proper operation of the control system, and each malfunctioning device can cause a process break-down. To gain a real increase in system reliability, all the elements, or at least all safety-critical elements, have to be duplicated. However, neither the switch nor the actuators can be
duplicated. It is not feasible, for example, to back up a valve by mounting a second one on the same pipeline: if they are mounted one after the other on the same pipe, then closing one of them closes the pipeline permanently; otherwise, if they are mounted in parallel on two branches of a pipeline, opening one of them opens the flow through the pipeline on a permanent basis. The presence of an output switch creates the problem of controlling its position. It is obvious that the switch cannot be controlled by the primary computer, since switching is accomplished when it fails. But it cannot be controlled by the stand-by computer either: if the switch were controlled by the stand-by computer, it could be switched as a result of a malfunction of this computer. It should also be taken into account, as noted, that the switch itself can fail. In industrial practice the configuration with a stand-by computer is not recognised as cost-effective and is seldom used. In the rare instances of employing a stand-by computer, the switch problem can be solved in one of two ways, depending on the application characteristics. One method applies to those applications in which the required speed of computer response is relatively slow, and the switch can be operated manually by a human operator who is expected to recognise a computer malfunction. Otherwise, a special switch controller is employed which monitors both computers — e.g., receiving prompts from each of them at regular intervals. If a prompt does not come in time, then the corresponding computer is regarded as malfunctioning. It can be seen that these methods are quite safe in that a switch controller malfunction is not dangerous, provided that neither the primary nor the stand-by computer has failed. As far as the switch is concerned, its reliability is assumed to be higher than the reliability of the computers.
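The prompt-monitoring rule for such a switch controller reduces to a deadline check per computer. A minimal sketch (ours; the prompt period and slack margin are assumed values):

```python
def healthy(last_prompt, now, period, margin):
    """A computer counts as alive if its latest prompt is no older than period + margin."""
    return (now - last_prompt) <= (period + margin)

# Prompts are due every 0.5 s (+0.1 s slack). Primary prompted 0.3 s ago, stand-by 1.2 s ago.
print(healthy(9.7, 10.0, 0.5, 0.1))   # → True  (primary alive)
print(healthy(8.8, 10.0, 0.5, 0.1))   # → False (stand-by regarded as malfunctioning)
```

The switch controller would run this check cyclically and operate the switch only when the currently active computer fails the test.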
An alternative approach, which is useful when a system can be split logically into a number of control loops, is to back up particular loops rather than the entire control system (Figure 4.7). The approach is very flexible, as not all tasks of the system are of equal relevance. Usually some of them can be temporarily suspended when a failure occurs. Basically, the technique can be applied to continuous as well as to two-state control loops; however, it is used mostly for continuous loops, in which case PID-controllers can be employed. Occasionally, not only the controller but also sensors and cables are duplicated. Switching a particular loop over can be accomplished manually by an operator, or automatically by a special switch controller. The idea of having back-up loop controllers is not limited to centralised computer systems, but can be applied in any other system architecture as well, e.g., the classical one. It has proved cost-effective and is frequently applied in industrial practice. A special case in this discussion is that of space and military applications, e.g., for air defence or ballistic missile control, where the highest reliability is required, and the costs are considered a less important factor. The demanded reliability can
y_1, ..., y_k, y_{k+1}, ..., y_n — state variables; u_1, ..., u_k, u_{k+1}, ..., u_m — control variables; loops 1 to k are the critical control loops

Figure 4.7: Back-up loop controllers.
be achieved by applying three or more independent computers operating in parallel (Figure 4.8). All computers receive the same data from a common environment and must respond to the same requests. The responses are compared, and the one supplied by the majority of computers is chosen by a voting logic.

Figure 4.8: Parallel computers with voting.

The voting logic is a critical part of the system and, therefore, it is usually implemented using special fail-safe techniques and technologies. It is worth noting that the computers in a voting system need not be of the same type. They only have to implement the same behavioural functionality, while the details of the hardware as well as the software implementation can be different. This enhances the system immunity to design and coding errors. Parallel systems with voting are seldom employed in industrial process control applications, particularly in continuous control. However, they can be more naturally applied within discrete-event environments of command-and-control applications. So far, the discussion in this section has focused on the methods for providing a continuous control system service in the presence of a failure. Implementations of this fault-tolerant approach have turned out to be difficult and expensive. The second approach is to stop the controlled process in a safe state and at minimum costs. This corresponds to the fault-stop approach outlined in Section 2.3. As an example consider the chemical process shown in Figure 4.5. According to the process description given in Section 1.3, the most critical part of the control system is the temperature control loop, as overheating may bring the process to the point of explosion. This may be prevented by an additional safeguarding circuit consisting
of a threshold temperature sensor and an additional heater switch (Figure 4.9). The switch is on if the temperature is below the threshold value, and is off when the temperature exceeds the threshold level. Switching the heater off can be accompanied by an alarm signal to a human operator. Usually the off state is permanent and can be reset only by manual operation. It should be clear from the above example that a safeguarding circuit is not a redundant control loop. In particular, it does not take over control functions in case of a controller malfunction. Instead, it stops the process as a last resort, to prevent serious damage and casualties. This can be compared to the role of electric fuses in an electric power supply installation. It is also worth noting that a safeguarding circuit is two-state in nature, and that it can be implemented using binary signals handled by a logic controller. The use of safeguarding circuits has been generally accepted and has become a part of control systems technology. The technique is compatible with other methods for increasing system dependability and can be used in any control system architecture.
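The latching behaviour of such a safeguarding circuit can be sketched in a few lines. This is our illustration (the threshold value and names are assumptions); the latch models the permanent off state that only a manual reset can clear.

```python
def safeguard_step(temperature, threshold, latched_off):
    """One evaluation of the safeguarding circuit: once the threshold is exceeded,
    the heater stays off (latched) until a manual reset clears `latched_off`."""
    if latched_off or temperature > threshold:
        return False, True           # (heater enabled?, latch state)
    return True, False

heater_on, latch = safeguard_step(95.0, 90.0, False)
print(heater_on, latch)              # → False True (tripped)
heater_on, latch = safeguard_step(80.0, 90.0, latch)   # still latched despite cooling
print(heater_on, latch)              # → False True
```

In a real installation this rule would be wired as a threshold sensor and relay, independent of the control computer, which is exactly why it still protects the process when the computer fails.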
4.4 Multi-Level Control Systems
The advantages and disadvantages of the classical and centralised structures can be balanced within the framework of a two-level hierarchical architecture (Figure 4.10). The lower level consists of local controllers for particular control loops, and resembles the classical process control structure. The higher level consists of a central computer which supervises the controllers of the lower level. The main criterion for decomposing a control system into two levels is the urgency of the tasks which are performed on each level. Usually the time-critical tasks of current control are performed by the controllers of the lower level, while the higher level is responsible for long-term optimisation and communication with human operators. Most of the safety-critical functions are delegated to the loop controllers. However, the central computer monitors the process status and informs the operator when a dangerous situation occurs. The distributed hierarchical structure matches the structure of the controlled process: separate objects of the process installation are controlled by separate control loops, while global information processing and displaying are centralised in the main computer system. In industrial applications for continuous process automation, the lower level consists of PID-regulators which stabilise state variables in the control loops. The required values, as well as other regulator settings for particular controllers, are determined by the central computer. This is why the structure is widely known under the traditional name of a set-point control system. Modern PID-controllers are microcomputer-based devices which contain a serial interface for communication with a host computer. All of the settings, including
Hardware Architectures
[Figure: chemical process with input valves A and B, vat with heater, central controller, an independent threshold switch for the heater, and a bottle on a scale platform]
Figure 4.9: Safeguarding circuit.
Real-Time Systems
[Figure: supervising (central) computer with operator station and mass storage, connected by digital links to local controllers 1, 2, ..., n, which are connected by analogue wiring to the technical process]
y1, y2, ..., yn — state variables; u1, u2, ..., un — control variables
Figure 4.10: Set-point control system.
the demanded value y0i can be sent through this interface from the host. In the opposite direction the controller sends consecutive values of a state variable y read from a process sensor. It can be seen from the above description that there are two different communication interfaces in the system. The communication between process sensors and actuators, and the local controllers is based on the transmission of analogue signals, while the communication between the local controllers and the central computer uses digital transmission media and protocols. There are no direct connections between local PID-controllers. The classical set-point architecture of Figure 4.10 is too restrictive for contemporary industrial applications. The reasons are twofold. First, the majority of process variables are in practice two-state and not continuous, and second, there are usually many more process variables which are to be monitored by a control system than control variables which are to be adjusted. Therefore, the idea of a two-level control system architecture has been significantly extended to incorporate data acquisition stations and programmable logic controllers in the lower level. For example, in the chemical process of Figure 4.9 the system should, in fact, monitor six continuous variables: two flows of ingredients A and B, liquid level in the vat, heating power, temperature, and pH, while controlling only three of them. The additional data can be displayed to help an operator in making decisions, can be stored on disks as process maintenance documentation, and can be used by computer programs for process optimisation.
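The division of labour just described, with fast local loops below and a slow supervisor above, can be illustrated with a small simulation. This is a sketch, not material from the book: the class and function names are invented, each loop uses a plain PI law, and the controlled process is reduced to a first-order lag.

```python
# Sketch of a two-level set-point architecture: local controllers run the
# time-critical loops, while the central computer only distributes set-points.

class LocalPIController:
    """Lower level: stabilises one state variable around its set-point."""
    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.setpoint = 0.0
        self.integral = 0.0

    def update(self, y):
        e = self.setpoint - y
        self.integral += e * self.dt
        return self.kp * e + self.ki * self.integral   # control variable u

class CentralComputer:
    """Higher level: supervises the controllers, no time-critical work."""
    def __init__(self, controllers):
        self.controllers = controllers

    def send_setpoints(self, targets):
        for c, sp in zip(self.controllers, targets):
            c.setpoint = sp                            # "digital link"

def simulate(steps=2000, dt=0.01):
    # One first-order process per loop: dy/dt = (u - y) / tau, tau = 1
    loops = [LocalPIController(kp=2.0, ki=1.0, dt=dt) for _ in range(3)]
    central = CentralComputer(loops)
    central.send_setpoints([1.0, 5.0, -2.0])
    y = [0.0, 0.0, 0.0]
    for _ in range(steps):
        for i, c in enumerate(loops):
            u = c.update(y[i])
            y[i] += (u - y[i]) * dt                    # analogue side of the loop
    return y
```

Note that the supervisor runs outside the time-critical path: it could update the set-points once a minute without disturbing the loops, which is exactly why the urgency-based decomposition works.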
It can be seen that there is no centralised process interface in the system architecture. Instead, process interface capabilities are distributed among the low level local controllers. The idea of a control room is still supported, as a high level computer together with an associated operator station and other peripherals constitute a monitoring and controlling center. However, the physical placement of the hardware within a plant can be distributed, in that local controllers can be placed inside the control room or outside of it, close to the controlled objects. The latter option shortens the analogue communication lines inside the control loops and is better for two reasons. First, because analogue signals are more susceptible to disturbances than digital signals. Second, because the communication inside a control loop is more safety-critical than the communication between the loop and a central computer. Unfortunately, spreading local controllers within a plant is difficult and expensive. Here the main problem is that the environment of an engine room is not appropriate for computer-based equipment. The obstacles include dust, a high level of electromagnetic noise, the possibility of mechanical damage, and sometimes a corrosive atmosphere. The situation is even worse if the plant is spread out of doors (e.g., in the case of railway systems and refineries). To adapt the computer equipment to industrial conditions, special mechanical and electrical construction is required. This increases the costs of the control hardware significantly. The problem is difficult and has no unique solution. In each particular case the reliability requirements and budget limitations must be carefully balanced. In fact, there are no reasons to limit the control system layering to exactly two levels. In vast plants more levels may match the decision structure of the enterprise in a better way. Consider, e.g., a train signaling system.
At the lowest level of the structure there are local controllers for particular devices, which are supervised by a control system for a railway station. Station systems can be co-ordinated for a high throughput main line or within a region by a supervisory regional computer. Regional computers can be connected within a state to the central computer placed at the top of the structure. Similarly, at least three levels are required in a refinery: local controllers for regulatory process control, local control rooms for particular distillery columns, and a central facility for the whole distillery installation (Figure 4.11). It should be noted that both examples given above emphasise the physical distribution of the control hardware: local (station) control rooms are located close to the controlled installation, while the central facility is located in departmental (regional) headquarters. In fact, one of the main reasons for employing architectures of more than two levels is the need for cost-effective communication between a controlled plant and the remote decision center. Hierarchical hardware architectures have been proved successful in the majority
[Figure: planning system and central depository at the top; control rooms 1, 2 and 3 acting as process supervisors over local controllers for subprocesses 1, 2, ... of the technical process]
Figure 4.11: Three-level control system structure.

of industrial applications. In most cases they can benefit from the advantages of classical as well as centralised structures.
• The distributed nature of a system contributes to increasing reliability, as some local faults can be safely handled by other controllers and/or the central computer.
• Clear assignment of the controllers to subprocesses contributes to enhancing system maintainability, since local changes to the process or the process installation cause only local effects to the control system.
• Centralised processing of the global data enables global optimisation and supervision of the controlled process.
• Computer-based man-machine interfaces create the appropriate environment for flexible and intelligent communication between the control system and the human operators.
It can be seen that the reliability of a multi-level system can be further increased by applying the methods described in Section 4.3. The disadvantages of the multi-level architecture are the following:
• A decrease in speed caused by communication delays and a managerial overhead inevitably connected with each level of data processing.
• Lack of direct information flow between controllers (computers) within each level, thus diminishing system flexibility and creating unnecessary design constraints.
It is important to notice that the above list of advantages and disadvantages refers to the physical hardware structure and not to the logical structure of a control system. Hierarchical structuring of system functions and of software programs has been recognised as an effective method of handling the design complexity and is frequently applied within centralised as well as distributed hardware environments.
4.5 Network-Based Distributed Systems
Multilevel architectures considered in Section 4.4 have been devised to support particular logical architectures. This approach creates a rigid relationship between the physical (hardware) and the logical structures, and imposes unnecessary constraints on system and software design. This observation relates particularly to the communication system which enables only "vertical" information flow between different levels of the architecture, while providing no information flow between controllers (computers) within a particular level. The current technology of high speed communication removes barriers to information flow within a system and enables efficient implementation of an arbitrary logical architecture within the framework of a computer local area network (Figure 4.12). A network consists of a communication medium, i.e., coaxial cable or fibre optics, a network interface, i.e., transmitters and receivers, and a number of network stations, which can be chosen from process data acquisition units, PID-controllers, programmable logic controllers, specialised controllers (e.g., for robots and numerically controlled tools), general data processing computers, storage devices, video display operator stations, and communication peripherals. Communication within a network is based on messages carried through a cable. The cable connects the network stations and imposes, theoretically, no limitations on communication between the stations: in principle each station can send a message to each other station. Extremely high transmission speed, ranging from 200 Kbits/s up to 100 Mbits/s, provides high network throughput and enables effective handling of heavy communication loads. More details on network communication systems will be given in Chapter 6. The network approach opens new perspectives for control system design. The assignment of tasks to network stations is only partially determined by the permanent
[Figure: local area network cable connecting operator stations, controllers, computers and storage, with separate analogue wiring to the process and a gateway to other networks]
Figure 4.12: Local area network based control system.

wiring of the peripherals, as the extremely high speed of data transmission enables broadcasting of process data through the network. This gives the designer great freedom in balancing performance and safety requirements with budget limitations. Multi-level logical architectures can be easily implemented by spreading control and supervisory tasks among a variety of stations within a network. Different schemes of redundant architectures can be mixed within the same network. Consider Figure 4.12, which gives more details. Apart from the network cable, there is separate analogue wiring from process sensors and actuators to the stations of the control system. The physical span of each connection depends on the location of the stations within a plant. As in the case of the two-level architecture, the stations can be placed inside a control room, or outside of it, near to the controlled process. The advantages of the latter option are significant: elimination of analogue transmission over long distances, decreasing cabling costs, and increasing flexibility. Its disadvantages are the much higher costs of the control hardware, which must be suitable for operation in a rugged industrial environment. To balance the needs with the costs, the concept of a network hierarchy has been invented (Figure 4.13). There are field-busses at the lowest level of the hierarchy, which couple local data acquisition stations, smart (intelligent) sensors and actuators, and local controllers from within particular work cells to work cell controllers. The
[Figure: control room(s) linked by a process bus to engine rooms, where field busses serve subprocesses 1, 2 and 3 of the technical process]
Figure 4.13: Layered structure of a distributed control system.

field busses run across engine rooms, thus making analogue wiring as short as possible. A process-bus constitutes the next layer, coupling work cell controllers with general data processors, mass storage devices, and video operator stations. System software provides for network transparency, in that process data can be made available to each station of the network. Physical transmission media as well as communication protocols within the field and the process busses may, in general, be different. So far, the discussion has focussed on the automation of a particular technical process within a factory department. However, in a real plant, a number of relatively independent processes must be maintained at the same time. Consider, e.g., a car factory. The plant consists of a number of departments for assembling engines and transmissions, finishing car bodies, etc., with the final assembly of cars at the end. The technical processes running in the departments are relatively independent, so that they are usually controlled by separate control systems. However, they are not completely independent, as the products of certain processes are passed as inputs to other processes. This means that the flow of parts and part-products, as well
as order management, billing, production monitoring, inventory control, etc., should be jointly planned for the entire factory. The tasks of planning and supervising the manufacturing operations are traditionally included in the tasks of plant management,
and handled separately from current control. However, on each level of the control and management structure, computer equipment is used. This creates the possibility of integrated data processing within a plant, which can be accomplished by connecting control and management networks through inter-network bridges (Figure 4.14). System software provides for network transparency and enables distribution of process data to the control as well as to the management network stations. This creates a large network which integrates the microcomputers and minicomputers of the control system with the personal computers and mainframes of the management and planning system. In the next step, the computer systems of all particular factories within a corporation can be connected with each other to form a very large network supervising the manufacturing processes within that organisation. This supports the idea of Computer Integrated Manufacturing (CIM) and is the topic of significant research and standardisation efforts [21]. The aim of CIM is to produce varying numbers of products in the same cost-effective way and to be able to vary products using the same machinery. Three levels of industrial networks can be classified as the communication support for CIM:
1. Specialised cell networks for process control, logic and numerical control, and robot control, as well as CAD, factory management and office automation.
2. Factory-wide networks connecting particular working cells within a factory.
3. An organisation-wide network connecting factories and the management of the corporation.
The technology for implementing these communication systems is provided within the framework of the MAP/TOP architecture, which will be discussed in some detail in Section 6.5.
[Figure: headquarters administrative network (corporate planning, long-term scheduling, market analysis, investment planning, payroll, order processing, central depository) connected through X.25 modems to factory networks; the Factory 1 control and management system comprises engineering (CAD), production planning, storage, and local control rooms linked by routers to technical processes 1 and 2 in departments 1 and 2; Factory 2 is similar]
Figure 4.14: Computer integrated manufacturing system.
Chapter 5

Process Interfacing

5.1 Technical Process Infrastructure

An interface between a technical process and a control system is complex and the boundary between elements of a process and elements of a control system may be established arbitrarily. In traditional architectures the criterion was based on the physical distribution of the hardware within a plant: the devices inside a control room belonged to the control system, while the remainder outside belonged to the process (Figure 5.1). However, in an integrated network architecture, the boundary established according to that criterion becomes blurred. To investigate the problem one can view the interface as consisting of several layers. The layer closest to the process installation consists of sensors and actuators. Sensors are devices which convert the physical characteristics of a process, such as temperature, intensity of liquid flow, pressure, and revolution speed, into electrical signals. Depending on the type of characteristic and the construction of the corresponding sensor, output signals from a sensor can occur at discrete points in time or can be produced continuously. Examples of sensors are thermocouples, flowmeters, tachometers, contact probes, optical scanners, and video cameras. It can be said that the functioning of sensors is analogous to the functioning of senses in a living creature. Actuators are devices which, when driven by electrical signals, can effect the physical world in quite a literal way, by changing temperatures, valve positions, speed of movement or rotation, and so on. Examples of actuators are electric motors, pumps, and electromagnetic valves. Electrical signals generated by sensors and accepted by actuators can be as low as a few millivolts or even microvolts, as in the case of some medical applications, and as high as some kilovolts DC or AC.
It may be difficult or even dangerous to transmit such signals over long distances from the controlled process to the control system.
[Figure: computer control system coupled through a process interface and analogue/digital signals to the normalising converters, sensors and actuators of the technical process]
Figure 5.1: Interface between a process and a control system.
Therefore the next layer of the interfacing devices typically consists of amplifiers and converters which normalise the signals to fall within some standard ranges. The normalised signals define the boundary between a control system and its environment. They can be connected to a control room (or to the nearest front-end processor) and coupled with a control system through a process interface, which constitutes the last layer of the interface between a process and a control system. The normalised signals which input to and output from a process interface fall, in general, into two distinct categories: analogue and digital signals. An analogue signal represents the value of a continuous process variable by the value of its voltage or current. In other words, the variations of a continuous process variable are represented by the amplitude modulation of a continuous signal. Typically, standard ranges of analogue signals in industrial applications are 0...+10 V and 4...20 mA. However, the standards are not always adhered to, and the voltage ranges ±10 V (i.e., -10...+10 V), ±5 V, and ±1 V, as well as the current range 0...5 mA can also be encountered, particularly in laboratory experiment automation. Moreover, a process interface may occasionally be required to handle the signals generated by typical sensors directly, such as thermocouples in the chemical industry and resolvers in tool and robot control. It is worth noting that the 4...20 mA current-based representation is superior to the others. First, because current signals are less susceptible to noise than voltage signals. Second, because all valid values of a signal require some current flow, the opportunity is given to discover the most frequent cause of failure, i.e., when the circuit is broken and no current flow occurs. A digital signal is characterised by two values of its voltage or current, which can be either low or high. The signal can be used to represent values of either two-state or
[Figure: contact switch with a pull-up resistor R to +U, producing a binary signal voltage]
Figure 5.2: Two-state contact switch.
continuous process variables. In the former case a static signal value is meaningful, while in the latter case a pulse representation with frequency modulation or pulse width modulation is employed. These two cases will be referred to as two-state and pulse signals, respectively. The two values of a digital signal are usually referred to as logic 0 and 1. It can thus be noted that a set of two-state signals can be read as a binary number representing the actual values of a multi-state or continuous variable. There are no definite standards for ranges of digital signals; however, the most typical representation in industrial applications is 0 V (low) and 24 V (high), which gives a voltage sufficient for driving a typical relay. Another representation in common use is 0–5 V, the range compatible with the standard voltage levels of TTL logic, which have been adopted by microcomputer peripherals. More precisely, the TTL voltage levels are defined as follows. A transmitter drives the voltage 0...+0.4 V for logic 0 and the voltage +2.4...+5 V for logic 1. A receiver recognises 0 when the voltage is below +0.8 V and recognises 1 when the voltage is above +2 V. The 0.4 V gap between the transmitter output and the receiver input provides protection against noise. A two-state signal represents the value of a two-state process variable. As an example, consider the chemical process described in Section 1.3. The two-state signals generated by the platform sensor and the contact switch were defined as "open" or "closed". These correspond to the states of an electrical contact. It can be seen that such a signal can be easily converted into a two-state voltage signal by connecting a resistor from the contact to a power supply (Figure 5.2). The contact can also be used to switch current on and off.
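The TTL thresholds quoted above can be captured directly in code. The sketch below is illustrative (the function name is invented); it classifies a received voltage and shows where the 0.4 V noise margin comes from.

```python
# Sketch of the TTL receiver thresholds quoted in the text.

def ttl_receive(v):
    """Classify a received voltage per the TTL input thresholds."""
    if v < 0.8:
        return 0
    if v > 2.0:
        return 1
    return None        # forbidden zone: level is undefined

# A worst-case transmitter output (0.4 V for logic 0, 2.4 V for logic 1)
# can pick up almost 0.4 V of noise and still be read correctly:
assert ttl_receive(0.4 + 0.39) == 0
assert ttl_receive(2.4 - 0.39) == 1
```

That remaining headroom between what the transmitter guarantees and what the receiver demands is exactly the noise margin the text describes.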
The current or the voltage level of the output signal does not depend on the construction of the contact itself but on the current or voltage of the power supply (however, the construction of the contact limits the maximum voltage and current which can safely be switched by the contact). A pulse signal can represent the value of a multi-state or continuous process variable by the number of pulses issued in a time interval, by the length of a single
[Figure: bottle on a spring-supported scale platform closing contacts at regular intervals to produce a pulse signal]
Figure 5.3: Scale platform.

pulse, or by the frequency or pulse width modulation of a continuous pulse stream. Consider the scale platform from the chemical process example (Figure 5.3). A bottle presses down on the platform, which is supported by a spring. The platform moves down a distance proportional to the weight of the bottle. During the movement, the platform closes electrical contacts mounted at regular intervals and causes pulses to be output in proportion to the distance and, hence, to the weight. A similar idea can be employed to measure the speed of a motor. An electrical contact mounted on the motor axle can output a pulse every revolution. The frequency of the continuous stream of pulses will be proportional to the speed of revolution. Analogue and digital signals exhibit different characteristics when transmitted over long distances. In general, digital representation is superior to analogue representation, because analogue signals are more susceptible to electromagnetic noise, whose influence increases with processing speed. Moreover, the resistance of transmission lines and the drift of electronic circuits cause additional loss of accuracy when using analogue signals. Pulse signals are much more resistant to noise and deformation, as the waveforms can be reconstructed by simple electronic circuits (Schmitt triggers).
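Turning such a pulse stream back into a speed is a matter of counting pulses over a known gate time. A minimal sketch (the names and the one-pulse-per-revolution figure are assumptions for illustration, not from the text):

```python
# Sketch: recovering a motor speed from a pulse stream.

def speed_rpm(pulse_count, gate_time_s, pulses_per_rev=1):
    """Speed in rpm from the number of pulses counted in a fixed gate time."""
    revolutions = pulse_count / pulses_per_rev
    return revolutions / gate_time_s * 60.0

# 50 pulses in a 2 s gate, one pulse per revolution -> 1500 rpm
```

The trade-off hidden in `gate_time_s` is typical of pulse interfaces: a longer gate gives better resolution but a slower measurement update.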
5.2 Interfacing Digital Signals
Interfacing digital signals to computer peripherals is relatively easy as it requires only simple electrical transformation. The only problems are:
• Adaptation of voltage levels which must comply with TTL standards on the computer side of the interface.
• Protection of the computer system from excessive voltages, which can easily occur in a harsh industrial environment.
[Figure: opto-coupler with an LED on the process side (process power supply, +24 V) and a phototransistor with amplifier on the computer side (+5 V), giving an electrically isolated TTL logic signal]
Figure 5.4: Galvanic decoupling.

The definition of what constitutes an "excessive" voltage depends on the technology, but it is safe to assume that a voltage outside the range 0...+5 V may be dangerous and can cause serious damage to delicate computer circuits. Both problems can be solved by dividing interfacing circuits into two parts and electrically isolating the process portion of the circuit from the computer portion of the circuit. (In the remainder of this chapter, electrical isolation will be referred to as galvanic decoupling.) This can be implemented by means of a relay or an opto-coupler (Figure 5.4). An opto-coupler (opto-isolator) consists of a light-emitting diode (LED) and a phototransistor placed in a common package. The phototransistor acts as an electrical contact, which is closed when the LED emits light and is open otherwise. The phototransistor drives an amplifier which provides the final shape for the signal. If an excessive voltage occurs in the process portion of the circuit, the LED will be destroyed, but the other part of the circuit will not be damaged. The scheme in Figure 5.4 shows an input interface, but the same technique can be used for output signals. The scheme is simple; however, two points should be emphasised, as they are not obvious.
• Power supplies in both parts of the circuit, i.e., the process part and the computer part, must be separated. If the same power supply were used, then both circuits would be coupled through the common supply.
• If the signal wiring runs through an engine room, decoupling is necessary also in the case when the amplitude of the process signal does not exceed the safe value of +5 V. This protects the system from short circuits between different wires
[Figure: auxiliary transistor driver with a protection diode across the relay coil, switching the +24 V process circuit from the +5 V computer circuit]
Figure 5.5: Relay driver.

and from voltage peaks which can be induced when switching high powered tools on and off. As an alternative to an opto-coupler, a relay can be considered. The advantage of a relay is that it can drive high powered devices using both DC and AC. The disadvantage is a long switching time, which can be as much as tens of milliseconds. Relays are rarely used for simple galvanic decoupling of input signals; however, they are commonly applied in high power two-state output circuits. Usually, a relay cannot directly be driven by a logic gate, and an auxiliary transistor driver (amplifier) is needed (Figure 5.5). The diode shown in the figure is essential as it prevents destruction of the transistor by a voltage peak induced when the current through the coil is switched off. Static two-state signals are usually grouped into words which can be connected to a microcomputer bus through standard registers or parallel input-output (PIO) peripherals. The details of the connection depend on the bus structure of the particular microcomputer used. An example for an Intel-like bus is illustrated in Figure 5.6. Dynamic input signals, i.e., signals where transitions (and not the levels) are essential, can be entered via a static interface and examined by microcomputer software; or, alternatively, they can be connected to the microprocessor interrupt inputs. The implementation is straightforward, as ready-made programmable interrupt controllers are offered for any microprocessor by many vendors. Pulse signals can be most conveniently handled by means of counters. Ready-made programmable counter-timer peripherals offer a number of possibilities for signal processing. Input pulses can be counted within a given time interval, or the length of a single pulse can be measured. Output pulses can be generated in a single series of any length, or repetitively as a stream of pulses with a desired frequency. A single pulse of an arbitrary length can also be produced.
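The counter-timer operations listed above reduce to simple arithmetic once a reference clock frequency is fixed. A sketch under the assumption of a 1 MHz reference clock (the constant and function names are invented for illustration):

```python
# Sketch: what a programmable counter-timer computes, in software terms.

CLOCK_HZ = 1_000_000          # assumed 1 MHz reference clock

def pulse_length_s(counts):
    """Pulse length from the number of reference clocks counted while
    the measured pulse was high."""
    return counts / CLOCK_HZ

def divisor_for_frequency(f_out_hz):
    """Reload (divisor) value that makes the counter emit a repetitive
    pulse stream at approximately f_out_hz."""
    return round(CLOCK_HZ / f_out_hz)

# 2500 counts at 1 MHz -> a 2.5 ms pulse; a 20 kHz output needs divisor 50
```

The same arithmetic runs in the opposite direction for frequency measurement: pulses counted in a gate interval divided by the gate length give the input frequency.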
Occasionally, pulse signals can be
[Figure: microcomputer bus with address decoding and an I/O read pulse connecting process interface registers for signals to and from a process]
Figure 5.6: Parallel input-output interface.
interfaced through a static or an interrupt interface and handled directly by software programs. This decreases hardware costs, but significantly increases the load on the microprocessor. It is easy to see that a real-time clock is a special case of a counter-based device. It consists of a number of counters for microseconds, milliseconds, seconds, minutes, etc., fed by a pulse wave with a precisely defined frequency. The wave can be produced, e.g., by a quartz stabilised oscillator. Stepping a counter up or down is usually assumed to be an instantaneous operation. In fact, however, it takes some finite time and an intermediate state between the previous valid value and the next valid value exists. In this state the counter contents can be a random number. If a microprocessor reads a counter exactly at the time of a transition, the result could be undefined. To prevent such situations it is necessary to block the counter input for the time of reading. Counter-timer peripherals of a microcomputer are always protected in this way, but one must be aware of the problem when reading values from external counters. Another problem can arise when attempting to count pulses issued by a relay or a contact switch. The electrical contact can rebound a few times during switching on and off ("bouncing"), thus emitting a series of pulses, instead of a single edge (Figure 5.7). A filter to suppress additional pulses must be applied before the signal
[Figure: contact bounce turning a single switching edge into a short burst of pulses on the output signal]
Figure 5.7: Bouncing of an electrical contact.

enters a counter. This is the role of the capacitor shown in Figure 5.4. Generation of a pulse width modulated signal is slightly more difficult. The simplest circuit (Figure 5.8) consists of an n-bit register R, a counter C and an n-bit comparator. The counter is fed by a pulse wave x with a constant frequency fx. The signal y generated by the comparator is high as long as C < R, and is low when the reverse is true. It is easy to see that the frequency fy of the output signal y is given by:
fy = fx / 2^n

and the duty cycle k of the output signal is proportional to the contents of the register R:

k = R / 2^n
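The behaviour of this register/counter/comparator circuit is easy to reproduce in software. The sketch below (function names invented) generates periods of y and confirms the duty-cycle relation k = R/2^n stated above:

```python
# Sketch of the PWM circuit: an n-bit counter compared against register R;
# the output y is high while the counter is below R.

def pwm_wave(r, n, cycles=1):
    """Sample the output y over `cycles` full counter periods."""
    period = 1 << n                       # counter wraps every 2**n clocks
    return [1 if c % period < r else 0 for c in range(cycles * period)]

def duty_cycle(wave):
    return sum(wave) / len(wave)

# n = 8, R = 64: the output is high for 64 of every 256 counts, so k = 0.25
```

Writing a new value into R changes the duty cycle on the next period without touching the frequency, which is exactly why this circuit is convenient as an output stage.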
5.3 Analogue Outputs
Analogue signals are created by digital-to-analogue conversion of binary numbers (i.e., values of control variables) calculated by a computer. The simplest converter is an electronic circuit based on a resistor ladder (Figure 5.9). Consider first a resistor ladder connected as in Figure 5.9a. It can easily be seen that in each node of the ladder the resistance of the right hand part of the ladder as well as of the "vertical" resistor are always equal to 2R. Hence, from Ohm's law, the current which flows to a node from the left hand part of the circuit is always divided in half. In general, the current in a branch i is equal to:

Ii = 2^-i I    (5.1)

The operational amplifier shown in Figure 5.9b issues an output voltage which is proportional to the difference between the voltages on the inputs U+ and U-, i.e.:
U0 = k(U+ - U-)
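The halving rule of equation (5.1) is what makes the ladder a digital-to-analogue converter: each bit of the input word switches one branch current into the summing amplifier. A software sketch (an illustration only; the convention that branch 1 carries the most significant bit is an assumption):

```python
# Sketch: summing the R-2R ladder branch currents selected by the bits
# of an n-bit input word. Branch i carries I / 2**i per equation (5.1).

def dac_output(word, n_bits, i_ref=1.0):
    """Total ladder current for an n-bit input word (branch 1 = MSB)."""
    total = 0.0
    for i in range(1, n_bits + 1):
        bit = (word >> (n_bits - i)) & 1   # pick the bit feeding branch i
        total += bit * i_ref / 2**i
    return total

# Full scale for 4 bits: 0b1111 -> I/2 + I/4 + I/8 + I/16 = 0.9375 * I
```

The geometric weighting also shows why full scale is always one step short of the reference: the missing 2^-n of I is the converter's resolution.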
[Figure 5.8: pulse width modulator built from a register R on the microcomputer bus, a counter C, and a comparator producing the output signal y]

[Table: types of analogue-to-digital converters]

Type of converter          Conversion time   Cost     Resolution and comments
Successive approximation   —                 medium   12...16 bits; stable input required
Dual-slope                 > 1 ms            medium   more than 20 bits; noise suppression
Parallel                   10 ns...1 us      high     4...8 bits; outputs always available
Single-slope               > 1 ms            low      8 bits; lacks stability with time and temperature
transmission lines, as well as electromagnetic noise, influences the signal value, and results in a decrease of the accuracy of the analogue data representation. An alternative approach is to perform analogue-to-digital and digital-to-analogue conversion as close to sensors and actuators as possible, and to transmit the data in a digital form. The conversion can be done within an intelligent (smart) sensor or actuator, by a front-end processor, or by a special signal conditioning module. The results of conversion are binary numbers which can be interfaced to a computer system through parallel or serial data links. A parallel interface consists of a number of data lines, each carrying a single bit of data, plus a few lines carrying control information. Parallel data can be easily entered to a microcomputer through parallel peripheral ports (Figure 5.6). However, parallel data representation is not convenient for long haul transmission. The disadvantages of this method are the following:
- A number of signal lines are required (a "line" implies not only a wire, but also a transmitter and a receiver of the signal).
- There are no defined electrical standards for transmitting parallel signals (TTL level voltage signals can be transmitted only for 2...7 m, depending on the type of logic gate and the transmission speed; in general, high power gates and low transmission speed enable longer transmission distances, but different sources cite different data and no agreement exists).
- There are no defined logical standards for structuring parallel data words (e.g., binary or BCD numbers; all bits transferred simultaneously or sequentially byte by byte; if sequentially, then the most or the least significant byte first, etc.).
As a result, parallel transmission of digital data is employed only over short distances,
between a computer and its closest peripherals. For transmitting data over longer distances serial interfaces are used.

Figure 5.19: Serial transmission link.

A serial data link consists of a transmitter which converts a data byte into a sequential bit stream, a single data line which carries the bits one after another, and a receiver which assembles the data byte from the stream of bits. The parallel-to-serial and serial-to-parallel conversion is accomplished by shift registers embedded in the transmitter as well as in the receiver (Figure 5.19). To ensure synchronous operation of both registers, the transmitter and the receiver clocks should run at the same rate. This can be achieved in one of two ways:

• Asynchronous transmission. The clocks are separate, but the receiver clock is tuned to the same frequency and is synchronised with the transmitter clock at the beginning of each data byte.

• Synchronous transmission. The value of the transmitter clock is sent to the receiver through a separate wire or is encoded with the data in a single self-synchronising signal.

Interfacing process peripherals is usually accomplished by asynchronous rather than synchronous links. The most popular interface for low speed serial data transmission has been standardised by the Electronic Industries Association (EIA) under the name RS-232-C. The standard defines the connector type, pin numbers, line names
Table 5.2: Basic RS-232-C signals.

Pin  RS  V.24   Name                         Direction
1    AA  101    GND: Protective Ground       -
7    AB  102    0V: Signal Ground            -
2    BA  103    TxD: Transmitted Data        Output
3    BB  104    RxD: Received Data           Input
4    CA  105    RTS: Request To Send         Output
5    CB  106    CTS: Clear To Send           Input
6    CC  107    DSR: Data Set Ready          Input
20   CD  108.2  DTR: Data Terminal Ready     Output
8    CF  109    DCD: Data Carrier Detected   Input
and functions, and electrical signal characteristics. RS-232-C is compatible with the interface standards V.24 (functional characteristics) and V.28 (electrical characteristics) issued by the Consultative Committee on International Telegraph and Telephone (CCITT).

The RS-232-C standard was originally intended for connecting computers and terminals (which are jointly called DTE, Data Terminal Equipment) to modems (which are called Data Sets): it defines 21 signal lines, which are used for establishing logical links between co-operating devices, for transmitting data, and for handling error situations. When the interface is used for connecting peripherals directly to a computer, most of the signals are unnecessary. (It should be noted, however, that RS-232-C was never intended for such DTE-to-DTE connections.) In the majority of typical applications only a small subset of 2...9 signals is accepted and generated by the equipment. These 9 signals of the DTE interface are shown in Table 5.2. The signals are numbered according to three systems: the pin numbering used in the CANNON DB-25 connector, the RS-232-C numbering, and the V.24 numbering.

The Protective Ground (GND) is for safety and is connected to the equipment chassis at both ends of the link. The Signal Ground (0V) establishes a common ground reference voltage for all data signals. Transmitted Data (TxD) is the data path for transmitting data to a co-operating device, while Received Data (RxD) is the path in the reverse direction. Data Terminal Ready (DTR) and Data Set Ready (DSR) have been designed as handshake signals which show that power is on and that the devices (i.e., a computer and a modem) are ready to operate. The Request To Send (RTS) and Clear To Send (CTS) signals have been designed to control data transfer from a computer to a modem. The computer generates RTS when it wishes to transmit, while the CTS signal indicates that the modem is ready to receive characters from the transmitter.
The Data Carrier Detected (DCD) signal is used to start the computer receiving data.
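The parallel-to-serial and serial-to-parallel conversion performed by the shift registers of Figure 5.19 can be modelled in a few lines (a toy sketch with hypothetical helper names; clock synchronisation is ignored):

```python
def transmit(byte, width=8):
    """Transmitter shift register: shift the byte out one bit per clock
    pulse, least significant bit first, onto the serial line."""
    stream = []
    for _ in range(width):
        stream.append(byte & 1)   # the bit at the register's serial output
        byte >>= 1                # shift the register by one position
    return stream

def receive(stream):
    """Receiver shift register: shift the arriving bits in, one per clock
    pulse, until the bit counter signals a complete byte."""
    byte = 0
    for position, bit in enumerate(stream):
        byte |= bit << position
    return byte
```

A byte survives the round trip unchanged: `receive(transmit(0x5A))` yields `0x5A` again, which is all the shift-register pair has to guarantee.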
Figure 5.20: RS-232-C connection from a computer to a video display terminal (a), and to a signal conditioning circuit (b).
Process peripherals usually use only the ground and data signals. However, communication peripherals used on the computer side of the interface examine the CTS, DSR and DCD signals and delay any transmission until they are in the "on" state. Because of this, the inputs cannot be left open. Two recommended connections are shown in Figure 5.20.

A transmitted data byte (Figure 5.21) consists of a START bit, 5...8 data bits (least significant bit first), an optional parity bit, and a STOP mark. The time slots for transmitting the bits are all of the same length, with the exception of the STOP mark, which can be as long as 1, 1½, or 2 bit slots. Obviously, the parameters must be tuned to be the same for both co-operating devices.

The electrical specifications of RS-232-C are not compatible with TTL voltage levels. Control signals use positive logic. An RS-232-C transmitter generates a voltage
Figure 5.21: Byte (10001101) transmission waveform.
of +5...+15 V to signal the "on" line condition, and a voltage of -5...-15 V to signal the "off" condition. A receiver recognises voltages above +3 V as "on" and voltages below -3 V as "off". On the TxD and RxD lines the data are sent in negative logic, i.e.:

0 - above +5 V (+3 V), equivalent to the "on" state,
1 - below -5 V (-3 V), equivalent to the "off" state.

When a signal changes from one level to the other, it is allowed to spend, at most, 4% of a bit period in the transition region.

RS-232-C specifies that the amount of stray capacitance allowable in the transmission link must not exceed 2500 pF, and, because ordinary cables have a capacitance of 120...160 pF per meter, RS-232-C limits cables to 15 m. The transmission speed is limited to 20 000 bps (bits per second), with the typical values being 75, 150, 300, 600, 1200, 2400, 9600, and 19 200 bps.

The major drawback of RS-232-C is its limited transmission distance of 15 m. In practice, transmissions can be made over longer distances, but always at some risk. Furthermore, the RS-232-C voltage levels are not particularly convenient, because they are not the same as those in standard TTL and MOS technologies. This means that an additional power supply is needed. To overcome these problems, a current-loop interface, designed originally for teletype terminals, is frequently used to transmit serial data over longer distances. The current-loop interface is not a proper standard, in that it has not been officially sanctioned. In general, it contains only the Transmit and Receive data signals. The data bits are represented by switching a current on and off. The interface comes in two versions, in which a 20 mA or a 60 mA current is switched, i.e.:

0 - 0 mA,
1 - 20 or 60 mA.

The interface works properly at rates of up to 9600 bps over distances of up to 450 m (1500 feet).
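As a sketch of the two conventions just described, the frame format of Figure 5.21 and the negative-logic line coding might be modelled as follows (hypothetical helpers; even parity, a two-slot STOP mark and a nominal ±12 V transmitter level are assumed):

```python
def frame_byte(byte, data_bits=8, parity='even', stop_slots=2):
    """Logic-level bit sequence for one byte: a START bit (0), the data
    bits least significant first, an optional parity bit, and a STOP mark
    (1, held here for a whole number of bit slots)."""
    data = [(byte >> i) & 1 for i in range(data_bits)]
    bits = [0] + data
    if parity is not None:
        ones = sum(data)
        bits.append(ones % 2 if parity == 'even' else 1 - ones % 2)
    return bits + [1] * stop_slots

def line_level(bit):
    """Negative logic on TxD/RxD: a 0 is sent as a positive ('on') level,
    a 1 as a negative ('off') level; +/-12 V is one common choice."""
    return 12.0 if bit == 0 else -12.0

frame = frame_byte(0b10001101)            # the byte of Figure 5.21
waveform = [line_level(b) for b in frame]
```

For the byte 10001101 the data bits leave the transmitter in the order 1, 0, 1, 1, 0, 0, 0, 1, exactly as drawn in Figure 5.21.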
A lower transmission speed, e.g., 4800 bps, enables a longer transmission distance, e.g., 1200 m, but different sources cite different data and no agreement
exists.

Figure 5.22: Current-loop: active-to-active converter.

The problem with the current-loop interface is that it is provided in two varieties: active, which actually generates the current, and passive, which switches or detects it. Either the receiver or the transmitter can be active. Because of the convenience of possessing a power supply, the computer usually has active interface circuits. If two active interfaces are to be connected, a simple active-to-active converter is needed (Figure 5.22). It can be noted that the converter provides galvanic isolation between the transmitter and the receiver.

The limitations of RS-232-C result from its electrical specifications, and particularly its earthing arrangements. Different ground potentials at the ends of the link cause a ground current to flow through the Signal Ground wire. The inevitable resistance in this wire ensures that a potential difference exists between the Signal Grounds, which could cause the data to be received incorrectly. These problems can be overcome by sending each signal differentially on two wires (Figure 5.23).

This idea of balanced transmission has been implemented in the RS-422-A interface, which supersedes the electrical part of RS-232-C. The difference between the voltages on the wires determines whether a 0 or a 1 is read: if the difference is positive and more than 200 mV, the receiver reads 0 (or "on"), while if it is negative and less than -200 mV, the receiver reads 1 (or "off"). The transition region between the values is thus only 400 mV wide. With the elimination of the ground potential problem, the use of such a narrow transition region becomes possible. This approach allows suitable transmitters as well as receivers to be implemented with a typical ±5 V power supply. The electrical specifications of RS-422-A have been incorporated into the RS-485 standard interface, which is widely used for multi-point connections between computers and process peripheral equipment.
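The receiver's decision rule, and the reason common-mode ground offsets no longer matter, can be sketched as follows (a hypothetical helper using the thresholds quoted above):

```python
def rs422_receive(v_a, v_b):
    """Balanced receiver: only the voltage difference between the two
    wires matters, so an equal (common-mode) offset on both cancels out."""
    diff = v_a - v_b
    if diff > 0.2:       # more positive than +200 mV: read 0 ("on")
        return 0
    if diff < -0.2:      # more negative than -200 mV: read 1 ("off")
        return 1
    return None          # inside the 400 mV transition region
```

Shifting both wires by the same ground offset leaves the result unchanged, which is precisely the property that single-ended RS-232-C lacks.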
Figure 5.23: Two-wire connection for a differential RS-422-A signal.

Table 5.3: Popular serial transmission standards.

Name          Date  Speed/Distance             Purpose
RS-232-C      1969  20 000 bps and 15 m        Interface specification for modem control,
                                               including electrical and mechanical
                                               characteristics, and functional
                                               definitions of signals
RS-422-A      1975  10 million bps and 12 m    Electrical specification only, two-wire
                    100 000 bps and 1200 m     connection for each signal
RS-423-A      1975  100 000 bps and 12 m       Electrical specification only, one-wire
                    1000 bps and 1200 m        connection for each signal with common
                                               return wire
Current loop  -     10 000 bps and 450 m       Electrical specification of data path only,
                    5 000 bps and 1200 m       two-wire connections for 20/60 mA current
Because of the much smaller transition region, RS-422-A devices are completely incompatible with RS-232-C devices. This poses such a serious disadvantage that the intermediate standard RS-423-A has been introduced, which can be used in both environments. Basic parameters of the interfaces are listed in Table 5.3. The details can be found in [25].
Chapter 6

Communication Networks

6.1 Network Architecture
A distributed computer system consists of a collection of application-oriented end stations, e.g., computers running application programs, terminals or controllers, and a communication network which carries messages from station to station (Figure 6.1). The basic components of the communication network are network interfaces, which will be referred to as nodes, and transmission lines. All traffic to or from a station goes via its node which, in general, is a specialised computer or a special purpose peripheral device. The boundary between a station and its node can be blurred in contemporary computer architectures: a station is usually a general purpose or specialised computer with its node implemented by plugging board(s) into the station computer bus, plus a modem (if required). The network interfacing functions are divided between the node hardware, the on-board processor software, and the station computer software.

There can be a variety of network structures. Two general types of design for a network are point-to-point connections and multi-point transmission lines. In the first case the network contains a number of separate lines, each one connecting a pair of
Figure 6.1: End stations and communication network.
Figure 6.2: Point-to-point connections: star (a) and irregular (b).
nodes (Figure 6.2). If two nodes that are not connected directly wish to communicate, they can do this indirectly via one or more intermediate nodes. In the other case a single line is shared by all nodes (Figure 6.3). In principle, a message sent by any node can be received by all other nodes.

Figure 6.3: Multi-point connections: ring (a) and bus (b).

Depending on its physical size, i.e., the maximum distance between nodes, a network can be classified as a local area network (LAN) or a wide area network (WAN). The main characteristics of LANs are the following:

• The network is owned by a single organisation (e.g., a factory).

• Interconnected computers co-operate in distributed processing applications.

• The physical size is usually less than 1...10 km in length.

• Transmission lines are dedicated.

• Data rates are high (up to 100 Mbps).

The characteristics of WANs are the opposite: they are large in size and, in general, they rely on public transmission lines such as those used by telephones or by satellite
communication systems. The most typical LAN topologies are bus and ring; WANs are typically irregular or star-shaped. Moreover, a number of LANs can be merged together with autonomous mainframes within a very large heterogeneous WAN. In fact, a strong tendency to interconnect LANs into broader, possibly world-wide, environments has been observed recently. From this perspective the network architecture can be viewed hierarchically, with a number of particular LANs at the bottom, and a global WAN which joins the LANs together at the top. Distributed control systems are usually limited in size to the factory floor, and fall into the category of LANs. However, as pointed out in Section 4.5, there exists a need for inter-network communication on, at least, a company-wide basis.

Regardless of their sizes and topologies, communication networks are constructed in a highly structured way as a series of layers, each one built upon its predecessor. The purpose of each layer is to offer certain services to the higher layers, while hiding the details of how the services are actually implemented. The convention is that layer n on one node carries on a conversation with layer n on another node. The rules established for this conversation are jointly referred to as the layer n protocol. In reality, data are directly transferred on the lowest layer of physical communication only. The conversation on layer n is performed indirectly via all the layers lying below layer n.

The most general model of network architecture is given by the Reference Model for Open Systems Interconnection [26] issued by the International Standards Organisation (ISO). The model is not an implementation specification for services and protocols, but it provides a conceptual and functional framework for describing communication networks and for the development of protocol specifications. The model identifies seven related layers (Figure 6.4), arranged in such a way that they are partially independent.
The detailed standards for each layer are defined separately.

The physical layer (1) transmits a raw stream of bits supplied by the data link layer over a communication medium. It performs all functions associated with signaling, modulation and synchronisation at the bit level. The layer definition comprises the transmission medium characteristics, the electrical and mechanical connections to the medium, and the data rates. The layer can perform error detection by signal quality monitoring. An example of a physical layer definition is the electrical part of the RS-232-C specification (Section 5.5).

The data link layer (2) establishes access to the medium, and transmits packets supplied by the network layer. It can also provide error detection and correction, so that the line appears free of transmission errors. The tasks of the data link layer include breaking the input packets up into data frames, transmitting the frames sequentially, processing the positive as well as negative acknowledgements sent back by the recipient, and retransmitting erroneous frames. This can be summarised as converting the raw physical medium into an error-free point-to-point logical link between two
Figure 6.4: Layered network architecture.
directly connected nodes. Data link frames contain addresses which uniquely identify the nodes sharing a common transmission line.

The network layer (3) routes the messages supplied by the transport layer throughout the network. This layer is concerned with addressing networks and nodes, and it isolates the higher layers from all the peculiarities of the network topology: whether a point-to-point line, a multi-point LAN or even a system of interconnected networks. In the latter case the communication is achieved by means of routers, i.e., devices that provide links between networks that differ at the data link and physical layers. The tasks of the network layer include converting messages into packets, establishing the route (possibly through intermediate nodes) and directing packets toward their destinations, and ensuring that all packets are received correctly and in the proper order. Network packets contain addresses which uniquely identify stations within the network. This can be summarised as converting the physical structure of the network into a reliable point-to-point connection between the source and the destination stations. The network layer can also deal with the accounting and billing of the network customers.

The transport layer (4) controls the transport of messages supplied by the session layer to the destination nodes and optimises the use of communication resources to provide the required performance at minimum cost. It isolates the higher layers from limitations imposed by, and from variations in, the hardware technology. The tasks of the transport layer include splitting unlimited messages from the session layer into smaller units acceptable to the network layer (defined by the packet size), and passing them to the destination in the most efficient way possible.
Depending on the requirements formulated by the higher layers, the optimisation may involve creating multiple network connections for one transport connection, if high throughput is required, or alternatively, multiplexing several transport connections onto the same network connection, if cost optimisation is required. Transport messages contain addresses which uniquely identify end users, i.e., processes executed by end stations.

The session layer (5) establishes a connection between individual end users (processes), provides recovery to a known state in case the connection is lost, and terminates the connection when it is no longer needed. The connection between end users is usually called a session. The session layer can provide facilities for controlling authentication and negotiating various dialogue options.

The presentation layer (6) interprets the transmitted data and converts them into a format that is acceptable to the end user. The typical tasks of the layer include data transformation, formatting, encryption, and compression. Most of these functions are application-dependent, and the boundary between the presentation and the application layers is not very clear.

The application layer (7) interfaces end users to the network. The tasks of the layer include standard services such as file transfer or application-oriented message
exchange.
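The layer-by-layer wrapping described above can be illustrated with a toy sketch (the layer names and header strings here are purely illustrative, not any real protocol format):

```python
LAYERS = ['transport', 'network', 'data link']    # headers added on the way down

def encapsulate(payload):
    """On the sending node each layer treats the unit handed down from
    above as opaque data and prepends its own header."""
    unit = payload
    for layer in LAYERS:
        unit = f'{layer}-hdr[{unit}]'
    return unit

def decapsulate(unit):
    """On the receiving node the peer layers strip the headers again,
    in the reverse order."""
    for layer in reversed(LAYERS):
        prefix = f'{layer}-hdr['
        assert unit.startswith(prefix) and unit.endswith(']')
        unit = unit[len(prefix):len(unit) - 1]
    return unit
```

Each layer sees only its own header and its peer's data, which is exactly the sense in which layer n "converses" with layer n on the other node while the real transfer happens at the bottom.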
6.2 LAN Technology
Many important characteristics of computer networks, including transmission distance, data rate and error rate, result from the characteristics of the physical layer, which is responsible for the transmission of data bits between the network nodes. Typical components of the transmission system are:

• Transmission lines, i.e., the medium which transfers signals over a distance; a line should be closed at both ends with appropriate terminators.

• Modems, i.e., devices which convert the data into the signal transmitted over transmission lines, and recover the data from the received signal.

• Repeaters, i.e., devices placed between separated segments of transmission lines which regenerate the transmitted signals.

The most important issues concerning the physical layer are the types of transmission lines, the signaling methods, and the network topology.

Transmission media. The most frequently used transmission media are twisted pair wire, coaxial cable, and fibre optic cable. A twisted pair is the cheapest type of transmission medium. It has a limited bandwidth, and is thus only suitable for transmitting data at rates not exceeding a few megabits per second (Mbps) over distances of a few hundred meters. It can only support a limited number of a few dozen stations and is typically used for point-to-point connections. A coaxial cable is relatively expensive, but it offers a much higher transmission rate of up to 100 Mbps over a distance of approximately 1 km. It can support a few hundred stations and can be used for point-to-point as well as for multi-point connections, as tapping into the cable is very easy. Coaxial cable is the most typical communication medium used in high speed LANs.
This is why it has been promoted for several years as a panacea for transmission problems, especially in harsh industrial environments. The problem is, however, that tapping into a fibre optic cable is very difficult, which limits the application of fibre optics to point-to-point connections. Signaling m e t h o d s . There are a variety of methods for converting a digital bit stream into the signals to be transmitted over a distance. Two general types of sig naling techniques are baseband and broadband. Baseband transmission uses digital
signaling in which a digital pulse signal at its original frequency is directly fed to the transmission line. The transmission can be asynchronous or synchronous (Section 5.5), depending on the required data rate. High speed LANs use synchronous transmission and a special phase-encoding technique, known as Manchester II. Data bits are coded by signal transitions rather than by voltage or current levels (Figure 6.5). This guarantees a signal transition during each bit time slot, which enables the data to be reconstructed and also provides bit clocking pulses.

Figure 6.5: Baseband signaling.

Most baseband systems use a special 50 Ω coaxial cable, which provides better protection against low frequency noise than standard 75 Ω cable. Twisted pair wires are sometimes used in low-cost, low-performance applications.

Broadband transmission uses analogue signaling in which a digital signal is used to modulate a carrier wave sent along the transmission line. The basic modulation methods are: amplitude modulation, frequency modulation, and phase modulation. The latter two methods are also known as frequency shift keying (FSK) and phase shift keying (PSK), respectively. It is possible to employ combinations of these basic methods, which leads to more than one bit being encoded in a single signal element. Most broadband systems use standard 75 Ω coaxial cable and the cable television frequency band of 5...300 MHz, which is divided into a number of channels with a standard 6 MHz bandwidth. Because of the considerable signal attenuation at high frequencies, in most practical cases amplifiers are required.

A special case of a broadband system, which will be referred to as carrierband, is a low-cost system which uses only a single channel. Transmission is generally at a low frequency of a few MHz. The attenuation at these frequencies is relatively low, and amplifiers are not required for distances of up to 1 km.
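The mid-slot transition coding can be sketched as follows (one common polarity convention is assumed, with a 1 sent as a high-to-low half-cell pair; the opposite assignment is equally valid):

```python
def manchester_encode(bits):
    """Split each bit slot into two half-cells so that a transition
    occurs in the middle of every slot, carrying the clock with the data."""
    signal = []
    for bit in bits:
        signal += [1, 0] if bit else [0, 1]
    return signal

def manchester_decode(signal):
    """Recover the bits from the half-cell pairs; every pair must contain
    a transition, which is how the receiver stays in step."""
    bits = []
    for i in range(0, len(signal), 2):
        first, second = signal[i], signal[i + 1]
        assert first != second, 'missing mid-slot transition'
        bits.append(first)
    return bits
```

Note that a long run of identical data bits still produces a transition in every slot, which is the property that makes the code self-clocking.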
The broadband transmission technique has a number of advantages over carrierband and baseband. It is based on the proven technology of cable television and can use many of the cable TV components. A typical cable with a bandwidth of 300 or even 450 MHz can support multiple users simultaneously, and enables the integration of voice and video as well as digital data within the same cabling system. Moreover, broadband is suitable for longer distances, of tens of kilometers, if amplifiers are used. On the other hand, broadband systems are more expensive and, as we shall see, cannot be as reliable and as fast as baseband and carrierband buses. The advantage of carrierband over baseband is that it uses standard 75 Ω coaxial cable, which enables easy upgrading if needs grow and broadband capacity is required. On the other hand, baseband has the advantages of simplicity and low cost. Typical characteristics of transmission media, in conjunction with signaling techniques for LANs, are gathered in Table 6.1.

Table 6.1: Relationship between medium and signaling method.

Transmission medium   Signaling method   Max data rate (Mbps)   Max range (km)   Number of stations
Twisted pair          Baseband           1...4                  1                10's
Coaxial cable 50 Ω    Baseband           20                     1                100's
Coaxial cable 75 Ω    Baseband           50                     1                10's
Coaxial cable 75 Ω    Carrierband        50                     1                10's
Coaxial cable 75 Ω    Broadband          20                     10's             1000's
Fibre optics          Baseband           100                    10               10's

Network topology. The most popular LAN topologies are the multi-point bus and ring (Figure 6.3) and, occasionally, the point-to-point star (Figure 6.2a). Point-to-point star shaped LANs are cost-effective in those hierarchical systems in which short messages are to be sent over relatively short distances. The advantages of this design are a very simple node structure and a short response time achieved with moderate data rates, which allows the use of cheap twisted pair cables. However, for longer distances the total cable length, and hence the cost of cabling, becomes unacceptable.

Consider a multi-point communication line carrying data bits at the rate of 10 Mbps. A time slot for transmitting a single bit of information has a duration of 100 ns. A signal sent from one end of a 10 km long line must travel for about 33 μs until it reaches the other end (the speed of propagation is assumed to be roughly 300,000 km/s). This means that each bit propagates through the line on its own, in a way similar to a parcel in a pneumatic pipeline. Moreover, each node connected to the line receives data bits at constant intervals, and the last bit of a short message may be sent before the first bit has been received by the node at the other end of the line. This observation is a key point in the analysis of high speed LAN characteristics.

In baseband and carrierband bus networks, a node is interfaced to the line by a medium access unit (Figure 6.6), which consists of a transceiver (line driver) and a tap. Depending on the technology, the tap can be passive or active. A passive tap does not break the line and does not affect the flow of messages to and from other nodes through the line. This promotes network reliability, since a node failure does not prevent other nodes from communicating over the bus. An active tap breaks the
A passive tap does not break the line and does not affect the flow of messages to and from other nodes through the line. This promotes network reliability, since a node failure does not prevent other nodes from communicating over the bus. An active tap breaks the
Communication Networks
107
Network mode Physical signaling
Tap
Transceiver
Medium access unit
Physical medium
line into two pieces and actively conveys (repeats) the messages from one segment of the line to the other. A failure or disabling of the node breaks the bus into two separate sections.

Figure 6.6: Baseband bus network interface.

Passive taps are commonly available for coaxial cable, which is the basic type of transmission medium used in high speed LANs. The optical couplers which connect the nodes in an optical fibre are active. For this reason, fibre optics are seldom used in bus networks.

In a ring network each node is interfaced to the line by a transmitter-receiver circuit which separates two adjacent segments of the ring (Figure 6.7). If a node is not transmitting its own message, its transmitter and receiver act as a repeater, shifting messages from the previous segment to the next segment of the ring. This means that communication can only be in one direction round the ring, and that the signal is regenerated at each network interface. To prevent the infinite circulation of a message within the ring, the message must be removed in some fashion when it has fulfilled its purpose. This can be accomplished by the recipient or by the originator of the message. The advantage of the ring architecture is that the overall length of the ring is not limited by the line-driving capability, because the signal is regenerated by each network interface. The disadvantage is that the interface between a node and the transmission line is active (transmitter-receiver), which means that a power failure at a node causes total disabling of the ring, unless hardware redundancy is included. Fibre optic cable may be used in a ring network without any problems.

In a broadband bus network the situation is more complicated, as the amplifiers are usually unidirectional, which makes the bus a one-way communication channel. This means in turn that the signal transmitted by a node in the middle of the line can travel towards one end of the line, but not towards the other (Figure 6.8a).
Two channels are needed with a head-end repeater to distribute the signal to all the nodes
connected to the bus. In practice, a single cable is used, with the two channels being provided by different frequency bands. The head-end is a remodulator which converts the input frequency from the transmitter band to the receiver band.

Figure 6.7: Ring network interface.

A great advantage of a broadband bus is that the cable need not be a single line without branches (as in the case of baseband buses). Instead, it may have a branched tree-like structure (Figure 6.8b). Amplifiers and a head-end remodulator are crucial elements of the system: care must be exercised in their provision, as a failure can disable communication over the bus. Fortunately, as the 40-year history of cable television shows, the mean time between failures (MTBF) of an average amplifier has been found to be of the order of 18 years [27].
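The parcel-in-a-pipeline observation made in the topology discussion above can be checked with a little arithmetic (using the 10 Mbps, 10 km figures from the example):

```python
DATA_RATE = 10_000_000             # bits per second
LINE_LENGTH = 10_000               # metres
PROPAGATION_SPEED = 300_000_000    # m/s, the rough figure assumed in the text

bit_slot = 1 / DATA_RATE                    # duration of one bit slot: 100 ns
delay = LINE_LENGTH / PROPAGATION_SPEED     # one-way propagation delay: ~33 us
bits_in_flight = delay / bit_slot           # bits on the line at the same time

# A message shorter than about 333 bits has therefore been transmitted
# completely before its first bit reaches the far end of the line.
```

The ratio of propagation delay to bit slot, here a few hundred, is the quantity that dominates the behaviour of the medium access methods discussed next.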
6.3 LAN Medium Access Control
Comparing the ISO reference model with the multi-point LAN characteristics, one can notice that the network and transport layers are superfluous, as their functions are essentially performed by the data link layer. The highest layers (application, presentation, and session) are usually incorporated into the application software. The remaining two layers (physical and data link) have been defined by the IEEE 802 committee, which issued a document authorised as the ISO 8802 standard. The standard divides the data link layer into two sublayers for medium access control (MAC) and logical link control (LLC). The document is organised into several parts, defining physical media and medium access control mechanisms for a variety of bus and ring LANs. The logical link protocol is independent of the type of medium and of the medium access control. The structure of the ISO 8802 standard and its relation to the reference model is shown in Figure 6.9.

The prevalent topology of LANs applied in distributed control systems is the bus. The remainder of this section therefore focuses on bus networks, omitting the details of rings. Since the communication medium (a line) in a bus LAN is shared, and not all nodes can talk at the same time, a mechanism for medium access control is needed.
Communication
Networks
109
[Figure 6.8 shows two layouts: (a) a twin-cable bus with taps, amplifiers, a head-end repeater, and separate transmitter and receiver channels; (b) a tree topology with a head-end remodulator, amplifiers, and transmitter/receiver frequency bands.]

Figure 6.8: Broadband bus LAN: twin-cable bus (a) and tree topology (b).
Real-Time Systems
110
[Figure 6.9 maps the ISO reference model onto the ISO local area network standards: the network layer corresponds to 8802.1 (internetworking); the data link layer to 8802.2 (logical link control) over 8802.3 (CSMA/CD), 8802.4 (token bus) and 8802.5 (token ring); each MAC option has its own physical layer specification.]
Figure 6.9: Standard ISO 8802.

The mechanism is, in general, independent of the signalling method. Two algorithms recommended for the bus topology are carrier sense multiple access with collision detection (CSMA/CD), defined in ISO 8802.3, and token passing, defined in ISO 8802.4. Unfortunately, both standards are ambiguous and offer a wide range of options.
CSMA/CD. The physical layer of the network is defined as a baseband bus carrying Manchester II phase-encoded data through a coaxial cable. Network nodes are attached to the cable by means of passive taps. The maximum length of the cable is limited to 500 m, and the maximum number of taps to 100. These numbers are for the recommended data rate of 10 Mbps. Other rates (e.g., 1, 5, or 20 Mbps) can result in different engineering trade-offs involving cable length and number of taps. To extend the length of the network, it is possible to have branches from the bus, with the segments connected by means of repeaters. A repeater regenerates and passes the signals between segments, but does no buffering and is fully transparent to the rest of the system. There must be no loops in the network (i.e., there must be only one path between any pair of nodes). A maximum of 5 segments can be connected together, resulting in a maximum network length of 2.5 km. The medium access control is fully distributed (i.e., there is no distinguished network master) and is based on the following set of rules. Each node listens to the medium before it tries to transmit. If the medium is idle, the node may transmit; otherwise, the node waits until the medium is clear. It is possible that two or more nodes will find the medium free and start their transmissions. This means that a collision will occur, and the transmitted data are lost. Therefore a node continues to listen to the medium while it is transmitting. If a collision is detected, the node stops the transmission and sends a short jamming signal to ensure that all stations recognise
[Figure 6.10 layout: Preamble | Destination address | Source address | Data length | Data (LLC frame) | CRC]
Figure 6.10: ISO 8802.3 frame.

the collision. After transmitting the jamming signal the node waits a random amount of time and then retries.
A data link layer protocol data unit is commonly called a frame. For correct operation of the CSMA/CD algorithm, the minimum size of the frame should be sufficient to ensure collision detection in the worst case. This means that the time taken to transmit the frame should be greater than the length of the collision slot, which is twice the propagation delay between the most distant nodes. The waiting time after a collision is a random number within the range 0 ... 2^n Δt, where n is the number of retries and Δt is the length of the collision slot. Thus the waiting time increases exponentially with the number of collisions, which diminishes the communication traffic when the network becomes heavily loaded. The data frame which is generated by the medium access control sublayer and transmitted over the bus by the physical layer consists of six fields (Figure 6.10):
- Preamble — a bit pattern used to achieve bit synchronisation with the receiver.
- Destination address — to identify the destination node for the frame.
- Source address — to identify the source node of the frame.
- Data length — to give the length of the data field.
- Data field — to contain the LLC frame supplied by the LLC sublayer. This is a variable length field.
- CRC — the cyclic redundancy check control word for error detection.
The source and destination address fields may be 2 or 6 bytes long. The 2-byte addressing scheme is intended for purely local node addressing within a LAN, while 6-byte addresses are recommended when inter-network (global) addressing capabilities are desired. There exists a lower bound on the frame length, and the transmitter must pad the data field if the LLC frame is not long enough. ISO 8802.3 networks are intended for office automation. The standard is based on the specification of the Ethernet [28] system developed and promoted by Xerox. The advantages and disadvantages of the CSMA/CD algorithm can be summarised as follows. The advantages:
• Simplicity of the algorithm.
• Low overhead under light load conditions.
The disadvantages:
• Probabilistic access time to the medium.
• Low throughput under heavy load (> 50%).
• Inefficient handling of short messages.
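The timing relations above can be made concrete. The following sketch (Python, added for illustration; the 2.5 km span and 10 Mbps rate come from the segment limits quoted earlier, while the signal speed of 2e8 m/s and the use of a uniform random delay are assumptions of this sketch) derives the collision slot, the implied minimum frame length, and a random backoff delay:

```python
import random

def collision_slot(max_distance_m, signal_speed_mps=2e8):
    """Collision slot: twice the propagation delay between the two
    most distant nodes (the worst-case round-trip time)."""
    return 2 * max_distance_m / signal_speed_mps

def min_frame_bits(slot_s, data_rate_bps):
    """A frame must outlast the collision slot, so that the sender is
    still transmitting when a worst-case collision comes back to it."""
    return slot_s * data_rate_bps

def backoff_delay(n_retries, slot_s):
    """After the n-th collision, wait a random time in 0 .. 2**n * slot."""
    return random.uniform(0, 2 ** n_retries) * slot_s

slot = collision_slot(2_500)                      # 2.5 km network span
print(round(min_frame_bits(slot, 10_000_000)))    # -> 250 (bits)
```

The actual 8802.3 minimum frame length is larger, since the standard's slot time also budgets for repeater delays and the jamming signal, which this simple round-trip model omits.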
Token bus. The physical layer of the network is defined as a broadband bus using 75 Ω coaxial cable. Network nodes are attached to the bus by means of passive taps. The maximum length of the cable and the maximum number of taps are not limited, provided that amplifiers are used. The recommended data rate is 10 Mbps; however, slower options providing data rates of 1 or 5 Mbps are available. Simplified, low-cost carrierband versions are permitted by the standard ISO 8802.4. A carrierband bus does not use amplifiers or a head-end remodulator, hence it has the advantages of a passive bus and is potentially more reliable than the broadband version. However, running at 5 Mbps, the bus is limited to a length of 700 m and at most 30 taps. Other data rates (of 1 or 10 Mbps) can result in different recommendations. The medium access control is distributed and, basically, collision-free. The nodes on the bus form a logical ring. A control frame, called the token, is passed from node to node round the ring. The node possessing the token controls access to the medium for a specified period of time, during which it may transmit frames, poll other nodes and receive responses. When the node has finished, or the time has expired, it transfers the token to the next node in the ring. The logical sequence of the nodes in the ring is independent of their physical ordering on the bus. The ring is implemented by means of address pointers maintained within the nodes: each node knows the identity of its predecessor and successor in the ring. Token management is complex because many exceptional situations have to be handled effectively. The basic ones are: addition to the ring, deletion from the ring, successor failure, duplicate tokens, and ring initialisation. Addition to the ring is initiated periodically by the current token holder, which issues the solicit-successor control frame.
This invites the non-participating nodes with an address between the token holder and its successor to insert themselves into the ring. The token holder waits for a period of time equal to the response slot, which is twice the propagation delay between the most distant nodes. If there is no response, it passes the token to its successor. If one node responds, the token holder adjusts its successor address pointer and passes the token to the new successor. More than one response results in a collision on the bus. The token holder then issues the resolve-contention control frame and waits for four response slots. Each applying node can respond within one slot, depending on the value of the first two bits of its address. If
a collision occurs again, the process is continued based on the next two address bits. This continues until a single valid response is received. Deletion from the ring is initiated by the node which wishes to leave. When holding the token, the leaving node transmits the set-successor control frame to its predecessor. The predecessor node changes its successor address pointer, excluding the leaving node from the ring. A successor failure condition can result from a successor node failure or from distortion of the token frame. When the token is passed to the successor, the latter must immediately transmit a data or token frame. If this does not happen, the former token holder retransmits the token to the same successor. After two failures, the token holder assumes that the successor has failed and issues a who-follows control frame. The second node down the ring responds with the set-successor control frame. The token holder adjusts its successor address pointer and passes the token to the new successor. If no response comes, the token holder tries to reconnect the ring using the addition-to-ring procedure (but within the full address range). If this fails, the node ceases activity and listens to the bus. Duplicate tokens can result from a collision on the bus. If the token holder detects another node transmitting a frame, it immediately discards its token by reverting to listener mode. This drops the number of tokens in the ring to 1, which is the correct condition, or to 0, which is equivalent to the loss of the token and requires ring initialisation. Ring initialisation is equivalent to restoring the token. This occurs at start-up or after a token holder failure. When a node recognises a lack of bus activity, it issues a claim-token control frame, padded with arbitrary data to a length of 0, 1, 2 or 3 response slots, depending on the value of the first two bits of its address. After transmission, the node waits one response slot and then listens to the medium.
If another transmission is detected, the claim is dropped. If no transmission is sensed, the transmit-listen sequence is repeated using the next two address bits. If all address bits have been used, the last sequence uses a pair of random bits to resolve duplicate node addresses. If no transmission has been detected, the node considers itself the token holder. The data frame which is generated by the medium access control sublayer and transmitted over the bus by the physical layer consists of eight fields (Figure 6.11):
- Preamble — a bit pattern used to achieve bit synchronisation with the receiver.
- Begin-of-frame marker — to identify the beginning of the frame.
- Type field — to identify the frame contents: information, control or token.
- Destination address — to identify the destination node for the frame.
- Source address — to identify the source node of the frame.
- Data field — to contain the LLC frame supplied by the LLC sublayer. This is a
[Figure 6.11 layout: Preamble | Begin of frame | Type of frame | Destination address | Source address | Data (LLC frame) | CRC | End of frame]
Figure 6.11: ISO 8802.4 frame.

variable length field.
- CRC — the control word for error detection.
- End-of-frame marker — to identify the end of the frame.
ISO 8802.4 networks are intended for real-time applications. The frame format is the same for broadband and carrierband networks. The advantages and disadvantages of the token passing algorithm can be summarised as follows. The advantages:
• Deterministic access time to the medium.
• High throughput under heavy load.
• No minimum frame length requirement.
The disadvantages:
• Complexity of the algorithm.
• High overhead involved in passing tokens, particularly under light load conditions.
The maximum time for one complete circulation of the token around the logical ring can be given as:

    T = N·Th + Σ(i=1..N) Δti

where N is the number of nodes, Th is the maximum holding time of the token by any node, and Δti is the propagation time between logically adjacent nodes in the logical ring. The time T defines the worst-case medium access time for a node. It can be seen that the access time in a very large broadband bus network can be, although predictable, unexpectedly long.
Polling. The algorithm is not part of the ISO 8802 standard. Nevertheless, it is occasionally used in star and bus networks. The algorithm is centralised, and it assumes the existence of a distinguished node (the network master) which controls the right of other nodes to transmit messages. The polling method consists of the master sending a message to each other node in turn, inquiring whether or not the node has anything to send. Each node recognises its own poll and responds to it, while ignoring polls directed to the other nodes. The nodes which have data ready to send
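The worst-case circulation time, T = N·Th + Σᵢ Δtᵢ (writing Th for the maximum token holding time of a node), can be evaluated directly. In the sketch below (Python; the node count, holding time and hop delays are invented for illustration), fifty nodes holding the token for at most 5 ms each dominate the propagation delays completely:

```python
def worst_case_rotation(n_nodes, max_hold_s, hop_delays_s):
    """T = N * Th + sum(dt_i): each node may hold the token for up to
    max_hold_s, and the token crosses every hop of the logical ring once."""
    assert len(hop_delays_s) == n_nodes   # closed ring: one hop per node
    return n_nodes * max_hold_s + sum(hop_delays_s)

# 50 nodes, 5 ms maximum holding time, 10 us propagation per hop
t_worst = worst_case_rotation(50, 5e-3, [10e-6] * 50)
print(f"{t_worst * 1e3:.1f} ms")   # -> 250.5 ms
```

Even with optimistic figures the bound grows linearly with the number of nodes, which is why the access time in a very large network, although predictable, can be long.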
are granted the line in turn by the network master. The advantages of the algorithm are:
• Simplicity of the implementation.
• Deterministic access time to the medium.
• No minimum frame length requirement.
The disadvantages:
• Low reliability, since a failure of the network master stops all communication.
• High overhead incurred by the polling procedure.
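A minimal sketch of the master's cycle (Python; the node names and the `has_data` predicate are hypothetical stand-ins for the poll/response frame exchange on the bus):

```python
def poll_cycle(has_data, nodes):
    """One polling cycle: the master queries each node in turn and
    grants the line, one node at a time, to those with pending data."""
    transmitted = []
    for node in nodes:                 # fixed, deterministic order
        if has_data(node):             # poll: "anything to send?"
            transmitted.append(node)   # grant the line; node transmits
    return transmitted

pending = {"barcode": True, "robot": False, "plc": True}
print(poll_cycle(lambda n: pending[n], ["barcode", "robot", "plc"]))
# -> ['barcode', 'plc']
```

The cycle length is bounded by the number of nodes, which is what makes the access time deterministic, and also why the polling overhead is paid in full even when no node has anything to send.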
6.4
LAN Logical Link Control
The logical link sublayer defined by ISO 8802.2 is independent of the type of medium and of the medium access protocol. The logical link protocol can provide either a connectionless datagram service, without determining whether the data were successfully transmitted, or a connection-oriented service that guarantees reliable frame delivery. The difference between the two types of service is fundamental. The datagram service allows a frame to be sent without a logical link between the source and the destination of the frame having been established first. This provides point-to-point as well as multi-destination and broadcast message transmission. However, there is no feedback to the source node as to whether or not a particular frame was delivered. This means that there can be no frame sequencing, frame acknowledgement, flow control, or error recovery. These missing functions must be provided by the transport layer. The connection-oriented service provides a reliable point-to-point transmission between two network nodes and is based on the High-level Data Link Control (HDLC) protocol [29]. There are three phases of operation. In the first phase, a connection between two nodes is established; in this phase the frame numbers in the transmitter as well as in the receiver are reset. In the last phase, the connection is cancelled. In the middle phase, data frames can be sent in both directions. The transmitted frames are numbered modulo 8, with the ordinal numbers contained in the frame headers. The receiver acknowledges a received frame by sending back the ordinal number of the next frame expected. In fact, not every frame needs to be acknowledged: by convention, an acknowledgement with the frame number n acknowledges all previously transmitted frames with ordinal numbers less than n. There is also no need
[Figure 6.12 layout: (a) LLC frame: DSAP | SSAP | CTRL | Data; (b) control field of a data frame: 0 | N(S) | P/F | N(R)]
Figure 6.12: ISO 8802.2 logical link control frame (a) and the control field (b).

for special acknowledgement frames. Since the communication is bidirectional, the receiver may include the number of the acknowledged frame in the header of its own frame. Thus the frame header must contain two frame numbers:
- N(S): the ordinal number of the data frame itself, and
- N(R): the ordinal number of the next frame expected, which acknowledges the receipt of the frames up to N(R) − 1.
To prevent a node from overwhelming another node by sending more frames than the other one can handle, some form of flow control is necessary. This is implemented by the rule that a node may transmit the frame numbered N(S) if it has received the acknowledgement N(R) and:

    N(S) < N(R) + W   (modulo 8) .

The predefined parameter W, called the acknowledgement window size, specifies the maximum number of frames which can be transmitted in advance, prior to acknowledgement. It can be seen that for W = 0 the next frame cannot be transmitted until the previous frame has been acknowledged. The LLC frame format recommended by the ISO 8802.2 standard is shown in Figure 6.12. The frame consists of four fields:
- DSAP — the destination service access point, i.e., the logical identifier of the frame recipient.
- SSAP — the source service access point, i.e., the logical identifier of the frame originator.
- CTRL — the control field which specifies the type of the frame.
- Data — the sequence of data bytes comprising the message from the originator to the recipient of the frame.
The control field (CTRL) of a data frame (Figure 6.12b) contains two ordinal numbers, N(S) of the transmitted frame itself and N(R) of the next frame expected, and the Poll/Final (P/F) bit. The P/F bit is used to request an immediate response from the partner. No other frame can be transmitted on this connection until the response has been sent.
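The send-permission rule can be exercised with modulo-8 arithmetic. A sketch (Python; interpreting the inequality on the cyclic distance (N(S) − N(R)) mod 8 is an assumption of this sketch, since the raw comparison is ambiguous once the sequence numbers wrap):

```python
def may_transmit(ns, nr, w):
    """N(S) < N(R) + W (modulo 8): frame ns may be sent iff fewer than
    w frames are outstanding beyond the last acknowledgement nr."""
    return (ns - nr) % 8 < w

# With W = 3 and acknowledgement N(R) = 6, frames 6, 7 and 0 may be
# sent ahead; frame 1 must wait for a further acknowledgement.
print([ns for ns in range(8) if may_transmit(ns, 6, 3)])  # -> [0, 6, 7]
```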
The frame with the most significant bit in the control field set to 1 is a control frame; it conveys supervisory messages rather than data. The following supervisory messages are provided:
- SABM (Set asynchronous balanced mode) — requests a connection to be established between the originator and the recipient of the frame.
- DISC (Disconnect) — terminates the connection.
- UA (Unnumbered acknowledgement) — acknowledges positively, i.e., accepts the SABM or DISC commands.
- DM (Disconnected mode) — acknowledges negatively, i.e., rejects the SABM command.
- RNR (Receiver not ready) — requests the originator to suspend the transmission of frames temporarily.
- RR (Receiver ready) — requests the originator to resume the transmission of frames after a temporary suspension.
- REJ (Reject) — acknowledges negatively the receipt of information frames, i.e., requests the originator to resend the information frames starting with ordinal number N(R).
- XID (Exchange identification) — passes state information, e.g., the acknowledgement window size.
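The control field of Figure 6.12b can be packed into a single octet. The bit positions below are one plausible reading of the figure (taking the leading 0 as the most significant bit, consistent with the rule that a set most significant bit marks a control frame); the exact assignment should be checked against the ISO 8802.2 text:

```python
def pack_ctrl(ns, pf, nr):
    """Pack a data-frame control field, reading Figure 6.12b from the
    most significant bit down: 0 | N(S) | P/F | N(R)."""
    assert 0 <= ns < 8 and 0 <= nr < 8 and pf in (0, 1)
    return (ns << 4) | (pf << 3) | nr

def unpack_ctrl(ctrl):
    """Recover (N(S), P/F, N(R)) from a data-frame control field."""
    assert ctrl & 0x80 == 0, "most significant bit set: a control frame"
    return (ctrl >> 4) & 0b111, (ctrl >> 3) & 1, ctrl & 0b111

assert unpack_ctrl(pack_ctrl(5, 1, 2)) == (5, 1, 2)
```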
6.5
MAP/TOP Protocol
Early computer control systems introduced to the factory floor in the nineteen-seventies were largely confined to particular work cells, such as process control systems or numerically controlled machine tools. These formed so-called "islands of automation", with devices linked together mostly by means of point-to-point cabling. When computer and control technologies matured, the need for connecting the particular "islands" together was recognised. The major obstacle to accomplishing this goal was the inability of individual manufacturing work cells to communicate with other work cells — mostly because the LANs of those cells were proprietary and did not conform to any common communication standard. By the end of the nineteen-seventies, an investigation by General Motors (GM) showed that half of GM's automation budget was spent on interfaces between incompatible machines [21]. To overcome these problems, the MAP project was originated by GM to establish a worldwide standard for factory floor communication systems, known as the Manufacturing Automation Protocol. At about the same time the Proway project was initiated by the Instrument Society of America (ISA) with the same goals as MAP. A third project conducted at that time was TOP, promoted by Boeing, and intended to establish a communication standard for office automation, known as the Technical Office Protocol. The work of the particular committees converged to make all controllers, computers, robots, vision systems, and all other computer-based devices communicate in a language that would be compatible with every other element on the factory floor and in the front office.

[Figure 6.13 shows the network hierarchy: office and management workstations (electronic mail, file server) as TOP end stations on a TOP 8802.3 baseband bus; production scheduling and directory service as MAP end stations on a MAP 8802.4 broadband bus, linked to the TOP bus by a MAP/TOP router and to a wide area public network via an X25 modem; Mini-MAP nodes (bar code reader, robot controller, numeric controller, programmable logic controllers) attached through EPA gateways; and a PROFIBUS master/slave segment connecting process field devices and loop control through a gateway.]

Figure 6.13: MAP/TOP/PROFIBUS network hierarchy.

The result of these efforts is a set of compatible specifications for factory floor (MAP) and office (TOP) communication systems [30]. The extension of MAP towards process automation equipment is the fieldbus, which continues digital communication from business management over plant supervisory management down to process end devices and their physical interface to the process itself. An advanced fieldbus standard developed in the late nineteen-eighties by the International Electrotechnical Commission (IEC) is PROFIBUS [31]. The relation between MAP, TOP and PROFIBUS is shown in Figure 6.13. It can be noticed that the structure is compatible with the hierarchical network architecture of Figure 4.14. MAP as well as TOP adhere to the seven-layer ISO reference model described in Section 6.1 and accept the ISO 8802 specifications described in Sections 6.3 and 6.4.
The standards are fully compatible, and the only difference between them is at the lowest layers (physical and medium access control). The reason for this diversity lies in the different requirements of factory floor and office networks. The main issues in designing communication systems for distributed real-time applications are those of high reliability and of deterministic response times under heavy load. These demands can be related to the physical structure of the network in the form of the following three requirements:
• There should be no distinguished network master node.
• A break-down of a single node must not affect the communication abilities of other nodes.
• Each node must have guaranteed access to the transmission line within a deterministic time.
Moreover, the network should handle a large number of relatively short messages efficiently. The requirements in an office automation system are for high average throughput and efficient handling of relatively large data files. For these reasons, TOP uses the ISO 8802.3 CSMA/CD bus specification, and MAP uses the ISO 8802.4 token passing bus specification, for the physical and medium access control layers.
MAP architecture. The physical layer specification of MAP provides two alternatives: a 10 Mbps broadband bus using three-level duo-binary amplitude modulation with phase shift keying for backbone networks, and a 5 Mbps carrierband bus using frequency shift keying for small subnets. The transmission medium is consequently 75 Ω coaxial cable. The broadband version uses a single cable with a head-end remodulator, and needs two transmission channels, each with 12 MHz bandwidth. The carrierband version uses neither amplifiers nor a head-end remodulator, which makes the bus bidirectional and entirely passive. The lower half of the data link layer (medium access control) implements the token passing scheme. The upper half of the data link layer (logical link control) is specified to provide a connectionless datagram service.
This speeds up the frame transfer, but provides no flow control and no error recovery functions. The LLC protocol can be used to link together networks which differ at the physical and MAC levels, e.g., token passing subnetworks with different tokens. The network layer is responsible for routing messages between network users, and becomes important if messages have to be sent between different subnetworks interconnected within a wider network. For reasons of speed, the network layer of MAP is defined by the ISO 8473 connectionless Internet protocol. Internetworking with connection-oriented networks, particularly with TOP, is achieved by means of
Table 6.2: MAP protocols.

No  Layer name          Standard documents
1.  Physical layer      ISO 8802.3, ISO 8802.4
2.  Data link layer     ISO 8802.2, ISO 8802.3, ISO 8802.4
3.  Network layer       ISO 8473
4.  Transport layer     ISO 8073
5.  Session layer       ISO 8326, ISO 8327
6.  Presentation layer  ISO 8823
7.  Application layer   ISO 8649, ISO 8650
routers. The interface between a MAP network and public WANs can be achieved by the network layer via an X25 link [29]. The transport layer is defined by the ISO 8073 protocol and provides connection-oriented services with the functions of packet sequencing, flow control and error recovery. The higher layers (session and presentation) make use of kernel functions provided by the ISO 8326, ISO 8327 and ISO 8823 protocols. The application layer, which is in fact the network-user interface, is defined by the ISO 8649 and ISO 8650 protocols and includes network management services, a directory service for name-to-address mapping for all "objects" attached to the network, and two application-oriented services implemented by the file transfer, access and management protocol (FTAM) and the manufacturing message standard (MMS). FTAM provides for the transfer of information between application processes and file stores. In doing so, it resolves the problems of different file structures and organisations in the various user systems which use the network. MMS provides sophisticated mechanisms for accessing data structures in a wide variety of different types of equipment. It allows an application process to work independently of the internal data representation of a particular control or machine device. The ISO documents defining communication protocols for the particular layers of MAP are listed in Table 6.2. Conformance to these standards enables different systems and devices to exchange information and to work together in an interconnected network system.
Mini-MAP architecture. The versatility of broadband MAP has been achieved at the expense of speed and simplicity. The overhead imposed by the network hardware, as well as by the multi-layered software, makes MAP networks expensive and not fast enough for time-critical control at the work cell level. This was the reason why an alternative network architecture, called Mini-MAP, was introduced.
Mini-MAP achieves speed at the price of departing from the ISO seven-layer model. It consists of 2½ layers (physical, data link and a simplified application layer) while dispensing with the network,
session and presentation layers. At the physical layer, Mini-MAP is a passive carrierband bus running at 5 Mbps, and is restricted to a local segment of cable no longer than 700 m. It differs from MAP in the logical link control sublayer. This is because (time-critical) applications running on Mini-MAP have to interface directly with the data link layer and must rely exclusively on the limited services provided by this layer. The modification provides the application layer with a reliable connection-oriented service complying with the ISO 8802.2 specification. Mini-MAP nodes can effectively communicate with nodes outside the local bus segment only in an indirect way, viz., via Enhanced Performance Architecture (EPA) nodes, which have the dual architecture of the seven-layer and the 2½-layer devices and are, in fact, inter-network gateways. EPA nodes can communicate with Mini-MAP nodes within a local segment and with full MAP nodes within the entire MAP network. The Mini-MAP architecture is intended for low-cost intra-cell communication, while the EPA architecture can be implemented for a cell controller.
PROFIBUS. The general architecture of PROFIBUS consists of 3 layers (physical, data link and application) while dispensing with the other layers of the ISO reference model. The physical layer is basically defined as the multi-point RS-485 interface using shielded twisted-pair cables. The communication is asynchronous and can easily be implemented by means of widely available standard microcomputer peripherals. Optionally, galvanic isolation and duplicated communication links can be provided. Further options for an ISO 8802.4 based token bus and for fibre optic point-to-point connections are available. The data link layer is based on the master-slave communication principle. A master can query the slaves in an order defined by a polling list. The slaves respond to the master's requests by sending and receiving data, or by requesting additional services from the master.
There can be more than one master in a network; in such a case the communication medium is shared among the masters according to the token passing protocol. The logical link protocol intended for the token bus architecture implements connectionless as well as connection-oriented services. The application layer provides network management services and a fieldbus message standard, which enables data structures to be accessed by a variety of process devices, regardless of the internal data representation of a particular device. The relation between PROFIBUS and Mini-MAP is not clear at the moment, and the two standards may turn out to be competitors at the work cell level. Furthermore, it seems that support for PROFIBUS (or other low-cost specialised fieldbus networks) mainly comes from the process industry, and that the standard is mainly intended for continuous process control applications, while Mini-MAP is more oriented towards discrete manufacturing systems. It also appears that, despite the efforts being made to standardise factory floor communication, some widely installed networks — such as
the Allen-Bradley Data Highway, the Gould Modbus, DECnet, IBM SNA and HP DS/1000 — will not be eliminated, and that interconnection facilities between those systems must be maintained.
Chapter 7

Real-Time Operating Systems Principles

7.1
Operating System Requirements
An operating system can be defined as a collection of software, firmware, and sometimes even hardware elements that controls the execution of computer programs and provides facilities for computer resource allocation. Depending on the type of the operating system and the intended application area, the operating system programs can provide additional services, such as job control, communication with a human operator, and network communication. The functioning of an operating system can be described most coherently in terms of resource management. The basic resources of a computer system are processors, main memory, peripheral devices and data files. At least two of them (processor time and main memory space) are necessary for any program execution. In a multi-programming environment a number of programs are executed simultaneously or quasi-simultaneously, which implies that the resources must be shared among them. This creates the problems of allocating resources to programs, protecting resources allocated to particular programs, and defining means for inter-program co-operation. An exhaustive discussion of various aspects of resource management in general purpose operating systems can be found in [34]. A real-time operating system intended for control applications differs significantly from general purpose and business-oriented systems. This is caused by differences in the hardware environment as well as in the requirements for operating system functions and services. The most important requirement a real-time operating system has to fulfil is that it can respond to external and internal events within predefined time limits. An event can be informally defined as a condition, arising in
the computer system or its environment, which requires some specific processing by an appropriate software program. (In this and the following discussion, software programs are understood to include firmware programs.) It is good engineering practice to structure real-time software around particular events, or groups of related events, into a set of software tasks. Each task implements the processing required by the corresponding event or group of events. Tasks can be organised as separate program units managed by the operating system and executed by the hardware. It can be seen from the above description that the notion of a task closely corresponds to the notion of a program. In several contexts the words can be used interchangeably. Nevertheless, an important difference between them should be clear. The term program refers to static program code which can be loaded into a computer memory and executed. The term task refers to the dynamic process of program execution. The distinction is essential, particularly in those systems which include re-entrant multi-access software routines: a single copy of program code can be executed at the same time by a number of tasks. According to this naming convention, the term multi-tasking will be preferred to multi-programming. A brief survey of those features which are common to general purpose and real-time operating systems, and of those which are different, can be structured with respect to the characteristics of particular resources and resource management mechanisms.
Processor management. In a multi-tasking environment, tasks compete with each other for system resources and, on the other hand, co-operate to achieve common goals in a distributed application. The main problems in the construction of general purpose operating systems are those of sharing system resources among the tasks created by multiple users.
The sharing strategy should be fair, and particular users should be carefully protected against accidental or malicious interference by other users. The overall processing performance should be high in average terms, but occasional delays are of minor importance. To achieve these goals, the operating system optimises the ordering of task executions, which makes the execution times of individual tasks unpredictable to the software designer. The requirements for security and protection are implemented by maintaining a large volume of additional data describing the resources allocated to particular tasks. Processing security-related data increases the overhead and slows down the operating system response.

The requirements for real-time operating systems are different. The resource allocation strategy, and particularly the task scheduling strategy, must promote the most urgent tasks, and average throughput must be sacrificed in favour of meeting deadlines. The mutual protection of tasks is of secondary importance, as in most cases the control computer is dedicated to an application and no other user exists. The main requirement is for mechanisms of task synchronisation and communication, which give the designer full control over the ordering of task executions.

Memory management. There are two different aspects of memory management
Real-Time Operating Systems Principles
in a computer system. The basic one is that of sharing main memory space among the various tasks executed by the system. The other, more sophisticated one is that of swapping programs and data between main memory and backing storage, to allow execution of programs which require more immediate-access storage than is actually available in main memory. If the swapping is handled by the operating system in a way that is transparent to the application tasks, it creates the illusion of a vast, practically unlimited and homogeneous space, known as virtual memory. The mechanism of virtual memory is very convenient, as it frees the software designer and the implementer from memory space limitations. The price for this is the time overhead spent in transferring programs and data between the various levels of system memory. The price is not high when related to the performance of an average system, as the transmission can be accomplished in parallel with the execution of other tasks in a multi-tasking environment. This makes the mechanism of virtual memory highly desirable in most general purpose and commercial systems. However, the evaluation changes dramatically when considered from the viewpoint of a particular task: the time needed for transferring data and program code delays the task execution and slows down the system response times. Moreover, the transparent handling of the swapping operations by the operating system makes the response times unpredictable and is, therefore, unacceptable in hard real-time applications. The preferred strategy for memory allocation in a real-time system is static allocation of memory space to tasks, with provision for overlaying controlled by explicit requests from each task.
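The static-allocation strategy can be illustrated by a minimal fixed-block pool (a sketch under illustrative assumptions; the class and sizes are invented, not taken from the text): all blocks are reserved when the system starts, so allocation takes constant time and can never trigger swapping.

```python
class FixedBlockPool:
    """A fixed-block memory pool: all space is reserved at system start,
    so allocation time is constant and no swapping can ever occur."""

    def __init__(self, block_size, block_count):
        self.block_size = block_size
        # Every block exists from the start; tasks only borrow and return them.
        self.free_blocks = [bytearray(block_size) for _ in range(block_count)]

    def allocate(self):
        # Constant-time: pop a preallocated block, or fail immediately.
        if not self.free_blocks:
            return None   # deterministic failure, never an unpredictable delay
        return self.free_blocks.pop()

    def release(self, block):
        self.free_blocks.append(block)

pool = FixedBlockPool(block_size=64, block_count=2)
a = pool.allocate()
b = pool.allocate()
c = pool.allocate()   # pool exhausted: None is returned at once
pool.release(a)
```

The point of the sketch is the timing behaviour, not the data structure: a task either receives a block immediately or is refused immediately, which keeps response times predictable.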
It will be seen that there is a large number of embedded control systems which are based on diskless microcomputers, with a fixed set of programs stored in read-only memory (ROM) and data statically allocated in random access memory (RAM). In such applications no memory management is needed. In any case, the problem of memory management is relatively simple to analyse and solve, and is of only secondary importance.

Device management. The most important peripheral devices in a control system are the process interfacing devices described in Chapter 5. Such devices are usually dedicated to particular tasks, and are directly serviced by the application routines. Standard devices, such as terminals and disks, are serviced by the operating system in a standard way, but special mechanisms for device reservation and allocation to tasks are hardly ever needed. Many embedded systems do not use standard devices at all — e.g., on-board avionic systems or computer-based numeric controllers. The absence of disk memory and an operator terminal makes coding and debugging application programs in embedded systems relatively difficult. The problem is solved by employing two different computer systems in different phases of the software life-cycle:
• The host computer creates the environment for software development by providing services for editing, storing, and compiling application programs.

• The target system creates the run-time environment for the application software execution.

In the first phase the application programs are developed and debugged in the host system. After preliminary testing, the programs are placed into ROM and plugged into the target hardware for final testing and operation. The two systems can be based on the same or different operating systems. The former option (the same operating system) is better, as it provides for more reliable software debugging and testing in the host environment. However, this creates an additional requirement for the configurability of the operating system, i.e., the ability to tailor the operating system structure for different hardware architectures. In a distributed system, particular computers can be booted through the network from a disk-based server station.

File management. The mechanisms for file management in a disk-based real-time operating system do not significantly differ from the standard mechanisms used in general purpose systems.

The above discussion shows that the most important problems of real-time operating system development and application relate to problems of processor management, i.e., time and event handling, task scheduling, synchronisation and communication. Other problems are either of secondary importance, or do not significantly differ from those faced in the design and application of general purpose systems. This is why the rest of this chapter is devoted mainly to the problems of processor management.

The operating system becomes more complicated when a computer network is considered instead of a single centralised machine. The lack of common memory in a distributed system limits the options for storing common data of the operating system and changes the style and timing of inter-task communication.
Basic problems of network systems will be briefly discussed in Section 7.6.
7.2
Synchronous and Asynchronous Task Execution
Application tasks executed in a control system are relatively independent. Most of them are processed periodically, while others are triggered by events. In those applications in which the majority of tasks is periodic and the execution times have known upper bounds, it is possible to schedule the tasks statically, based on explicit computer time allocation. In the simplest case, when all tasks require processing at the same frequency, the
following method can be implemented. A "frame" is defined as the basic timing interval triggered by the timer interrupt. To each periodic task a slot (a segment) is assigned in each frame. Within its slot a task is run to completion, and has exclusive access to all global data. The order in which the tasks are executed is used to synchronise data transmission among them. Event-driven tasks can be activated in the time interval between the end of the last slot in the current frame and the beginning of the next frame. The recognition of the events can be based on a hardware interrupt mechanism or on a software polling procedure. In the latter case a software routine called an event scheduler examines the status of peripheral devices and checks it against the conditions which have been defined as "events".

Figure 7.1: Synchronous execution of periodic and event-driven tasks.

In a more realistic setting the tasks require processing at different frequencies. This can be implemented by defining a multi-level structure of time frames. A single frame in an upper level consists of a fixed number of lower level frames. In a two-level structure, which is shown in Figure 7.1, the low level frames are referred to as "minor frames", while the high level frames are called "major frames". To each task executing at the lower frequency, i.e., the frequency of major frames in the two-level example, a slot in one of the minor frames of a major frame is assigned. If a task is too large to fit into an available slot, it is broken into subtasks and distributed over several minor frames. In both cases the tasks are executed at a constant frequency, controlled by the system timer. This is why the approach to real-time software construction described above is sometimes called the synchronous task execution approach. Since the allocation of frame slots to tasks can be quite complex, it is usually centralised in an executive program.
To perform its job, this "cyclic executive" usually employs a precompiled list of tasks to determine which tasks will be invoked in each minor frame, as well as the order in which they will be invoked. The partitioning of the lower frequency tasks into subtasks can be explicit, performed by a software designer, or it can be automatically accomplished by the executive. The basic structure of a two-level synchronous (cyclic) executive with automatic
task partitioning is shown in Figure 7.2. Tasks executed at the frequency of minor frames are referred to as minor tasks, while tasks executed at the frequency of major frames are referred to as major tasks. The lists of tasks of both types are called the "minor" and the "major" task list respectively. The acronym PSW stands for program status word, and refers to the contents of the processor registers (program counter, accumulators, etc.) which are involved in the execution of each computer program. The PSW can also be called a task context.

The layout of frames implemented by the synchronous executive must be feasible, i.e., it must obey the following rules:

• The length of the minor frame must be greater than the maximum of the sum of the execution times of all minor tasks, plus the time slot reserved for event-driven tasks.

• The length of the major frame must be greater than the maximum time used by the minor tasks, plus the slots reserved for event-driven tasks, plus the maximum of the sum of the major tasks' execution times.

• The size of the slots reserved for event-driven tasks, and the ordering of the tasks, must ensure that the deadlines defined for event processing are met.

The synchronous approach is an efficient method for handling task execution in those applications in which the tasks do not need to wait for resources (other than processor time) and are always ready for execution. When a task must wait for the allocation of a resource or for the occurrence of a specific event, it should relinquish the processor in the current frame and check the corresponding conditions again in the next frame. Otherwise, processor time is wasted, and the maximum duration of task executions cannot be guaranteed. When a task execution depends on several conditions, each combination must be explicitly handled by the task's program code. This leads to a very high complexity in the task structure, which makes task debugging, testing and modification difficult.
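The frame layout and the first feasibility rule can be sketched as follows (task names and worst-case times here are invented for illustration; a real cyclic executive is driven by timer interrupts, whereas this is a plain single-threaded simulation of the dispatching order):

```python
def feasible(minor_frame, minor_tasks, event_slot):
    """First feasibility rule: the worst-case execution times of all
    minor tasks plus the event slot must fit within one minor frame."""
    return sum(minor_tasks.values()) + event_slot < minor_frame

def run_major_frame(minor_tasks, major_tasks, minors_per_major, log):
    """Run one major frame: every minor task runs in each minor frame,
    and at most one major task runs per minor frame, taken in order
    from a precompiled list (the 'major task list')."""
    major_list = list(major_tasks)
    for frame in range(minors_per_major):
        for task in minor_tasks:            # minor tasks: every minor frame
            log.append((frame, task))
        if frame < len(major_list):         # one slot per frame for a major task
            log.append((frame, major_list[frame]))

minor = {"sample_inputs": 2, "update_control": 3}   # name -> worst-case time
major = ["log_trends", "update_display"]
log = []
run_major_frame(minor, major, minors_per_major=4, log=log)
```

The precompiled lists correspond to the "minor" and "major" task lists of the text; the simulation merely records the dispatch order that the timer-driven executive would impose.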
Moreover, the conditions must be checked anew in each frame, which results in significant overhead and unproductive processor use.

Synchronous task handling is frequently used for implementing embedded software for relatively simple ROM-based specialised controllers, such as industrial PID controllers or on-board controllers for video recorders. In applications with more complicated patterns of dependencies between tasks, resources, events and time, task scheduling and execution must be organised by an operating system dynamically and asynchronously.

The key concept in this case is the parallel and simultaneous execution of a number of tasks under the control of a multi-tasking operating system. The operating system provides the means for
Figure 7.2: Synchronous executive.
synchronising task execution with respect to other tasks, events, and time. When a task must wait for the allocation of a resource or the occurrence of an event, its execution is temporarily suspended, and the processor is re-scheduled to another task which is ready for execution. The execution of a suspended task is automatically resumed when the condition for which it is waiting is fulfilled.

Figure 7.3: Quasi-simultaneous task execution.
7.3
Multi-Tasking
Simultaneous execution of tasks can be implemented in two ways. True simultaneity can be achieved in a multiple processor system with different processors executing different tasks at the same time. Otherwise, the resource provided by a single processor can be shared among tasks contending for execution (Figure 7.3). Since the number of tasks is usually greater than the number of processors, even if multiple processor hardware is employed, there is no need to distinguish between these two cases.

The life-cycle of a task can most conveniently be described using the notion of task states. During its life, a task can be in one of the following four states:

Terminated. The task is registered and entered in the list of tasks maintained by the operating system. Resources can neither be requested by nor allocated to the task. A task is in this state before starting and after finishing execution.

Ready. The task is ready for execution and waits for the processor. This means that all conditions for starting or continuing the task are fulfilled and all resources requested by the task have been allocated, with the exception of the processor, which is busy and is executing one of the other tasks.

Running. The processor has been allocated and the task is being executed. This means that the processor is executing the program code of this particular task.

Suspended. The task is waiting for the allocation of a resource, for the occurrence of an event or for the expiration of a specified time interval.
Figure 7.4: Task state transition diagram (T - Terminated, R - Ready, S - Suspended, E - Running).

The state transition diagram showing the states and the operations for the changes of state is shown in Figure 7.4. It is important to notice a fundamental difference between the states Terminated, Suspended and Ready, and the state Running. The former three states reflect the relation between a task and its environment, and the degree of the task's readiness for execution: a terminated task does not want to run, a suspended task must wait for something, and only a ready task can be executed. The distinction between the states Ready and Running reflects an implementational limitation of the computer system (lack of processors), rather than a status of the task itself. Unlike other transitions, which are caused by changes in the task environment, i.e., in another task or in a related peripheral device, the transition between these two states cannot be forced by the environment, but results from internal decisions of the operating system.

There can be a number of tasks which are ready for execution, but only one of them can actually be running in a single processor system. The decision — which of several tasks it is to be — is dynamically taken by the operating system scheduler. An ideal criterion for selecting a task for execution should be based on the analysis of the deadlines for the tasks' executions. The simplest algorithm of that type assumes that in each state the ready task with the earliest deadline should be selected for execution before the others. However, in many applications it is considered easier to define not the deadlines, but the relative urgencies of particular tasks, which may be reflected by an ordered set of task priorities. The ready task with the highest priority is selected for execution.
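Both selection criteria can be sketched in a few lines (the task records below are hypothetical and far simpler than a real task state table; only ready tasks are eligible, exactly as in the text):

```python
# Task state codes, matching the notation of Figure 7.4.
READY, SUSPENDED, TERMINATED = "R", "S", "T"

def select_edf(tasks):
    """Earliest-deadline-first: among ready tasks, the one whose
    deadline expires soonest is selected."""
    ready = [t for t in tasks if t["state"] == READY]
    return min(ready, key=lambda t: t["deadline"], default=None)

def select_priority(tasks):
    """Fixed-priority selection: among ready tasks, the one with the
    highest priority number (priority increases with urgency)."""
    ready = [t for t in tasks if t["state"] == READY]
    return max(ready, key=lambda t: t["priority"], default=None)

tasks = [
    {"name": "logger",  "state": READY,     "deadline": 50, "priority": 1},
    {"name": "control", "state": READY,     "deadline": 10, "priority": 5},
    # Suspended tasks are never selected, however urgent they look:
    {"name": "backup",  "state": SUSPENDED, "deadline": 5,  "priority": 9},
]
```

Note that the suspended task is skipped by both criteria even though it has the earliest deadline and the highest priority: readiness is checked before urgency.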
State changes are triggered by events occurring inside the computer system or outside of it, in the controlled environment. Events can be classified into the following four groups:

External events correspond to conditions arising in the computer environment. The events are usually signaled by interrupts presented by peripheral devices (e.g., an interrupt from a two-state threshold sensor). However, they can also be generated by software routines (e.g., by a program which periodically monitors the state of a sensor).

Timing events correspond to the expiration of specified time intervals (e.g., expiration of the time Δt specified in a DELAY(Δt) instruction). These events are usually signaled by the system timer handler, which is triggered by hardware timer interrupts.

Internal events correspond to errors and exceptions arising during program execution, generated within the computer system. The events can be signaled by interrupts generated by computer hardware (e.g., a divide-by-zero interrupt from a numerical coprocessor) or can be raised by software routines (e.g., a divide-by-zero error code from a floating-point arithmetic subprogram).

Program events correspond to special conditions occurring within a task during its execution. Such events are generated by explicit task requests to the operating system, issued by means of program traps or other calls to operating system subprograms.

A multi-tasking operating system consists of an executive (system nucleus, kernel), which is responsible for task and event handling, and a number of system tasks (shell), which provide a variety of operating system services. System tasks are run under the control of the executive, just like other application tasks (Figure 7.5). The partitioning of functions between system tasks and the executive can differ significantly from system to system, but the basic operations of task state transitions are always implemented by the executive.
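The routing of the four event classes above to their handling routines can be sketched as a dispatch table (the handler names and example events are invented for illustration; they are not taken from the text):

```python
handled = []   # records which handler processed which event

def handle_external(ev): handled.append(("external", ev))
def handle_timing(ev):   handled.append(("timing", ev))
def handle_internal(ev): handled.append(("internal", ev))
def handle_program(ev):  handled.append(("program", ev))

# One entry per event class, mirroring the fourfold classification.
DISPATCH = {
    "external": handle_external,   # e.g. threshold-sensor interrupt
    "timing":   handle_timing,     # e.g. expiry of a DELAY interval
    "internal": handle_internal,   # e.g. divide-by-zero exception
    "program":  handle_program,    # e.g. explicit operating system call
}

def interrupt_monitor(event_class, event):
    """Recognise the event class and start the corresponding handler."""
    DISPATCH[event_class](event)

interrupt_monitor("timing", "delay_expired")
interrupt_monitor("external", "sensor_high")
```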
Other functions which are often implemented by the executive correspond to memory management.

The logical structure of a multi-tasking executive which is responsible for processor management can be viewed as in Figure 7.6. The executive is entered via an interrupt request or a program trap which signals an event. The interrupt monitor module recognises the type of interrupt and starts the corresponding event handling routine. The reaction to the event usually involves one or more state transitions, e.g., to stop an erroneous task or to activate tasks related to an external event. As its last step, the task scheduler module selects a task for execution and releases control to that task. The bottom part of the executive consists of a set of system tables which describe the particular tasks maintained in the system, and a set of state transition subroutines. Each task is represented within the executive by a task state table (TST)
Figure 7.5: Model of a real-time multi-tasking operating system.
Figure 7.6: Model of a multi-tasking executive.
which contains data describing the task state, the task priority and the resources allocated to the task. Other data items which can be placed in the TST will be introduced below. The task state table is also called a task control block.

The state transition subroutines implement the operations Start, Abort, Suspend and Resume shown in Figure 7.4, and the operation Select which selects the highest priority task ready for execution. It should be noted that the operation Stop is a special case of Abort. The operations Run and Preempt are implemented by the task scheduler. The state transition operations can be described in terms of the processing of data items stored in task state tables. To speed the algorithms up, the operating system executive usually organises the tables into lists. The basic lists are:

1. the task directory (TD), which contains all tasks registered in the system and enables the task identifiers to be translated to the addresses of the task state tables,

2. the ready list (RL), comprising the tasks which are ready for execution.

Other lists comprise the tasks which are suspended for particular reasons. Usually a separate list is organised for each reason of suspension, e.g., a list of tasks waiting for a particular resource, or a list of tasks waiting for the expiration of specified time intervals. The lists are usually linked by pointers, stored as items in the task state tables.

We now consider the task state tables and the state transition subroutines in more detail. The minimum set of data items which must be stored in a TST comprises the following:

State — the task state code,
Priority — a number increasing with task urgency,
PSWo — the initial contents of the PSW,
PSW — the current PSW.

The algorithms describing the subprograms Start, Abort, Suspend and Resume, and of the task scheduler, are shown in Figure 7.7.
The initials RT used in the figure stand for running task, and refer to the system variable which stores the identifier of the currently running task. The task state codes T, S, R are consistent with the notation in Figure 7.4. There is no specific code for the running task, which is distinguished by the variable RT.

It should be emphasised that the example of the TST and the state transition algorithms given above has been simplified significantly. In practical applications the volume of data stored in the TST is much greater, and the algorithms are more complex. The additional data items describe the time and resource requirements of particular tasks. The data are maintained by operations which reflect the status of, and the relations between, the task and its resources. For example, the operation
Figure 7.7: Task state transition algorithms.
Abort must deallocate all resources which have been allocated to the aborted task, and must remove the task from the list(s) of tasks waiting for events or resource allocation. If a task is to be dynamically created in a disk-based system, then the operation Start must allocate initial memory space, create a TST, and load the task's program code from disk into the allocated main memory area.

Another problem related to the implementation of the executive is that of protecting the operating system, and particularly the executive, from accidental or malicious damage. The problem can be solved radically by restricting each task to within an allocated memory area, and preventing it from accessing memory segments outside this area. This may be implemented at the hardware level by a memory management unit which controls the processor's access to main memory and allows access to the specified memory segments only. An attempt to violate the rules causes a "memory protection violation" interrupt, which breaks into task execution and enters the executive program. The processor can operate in one of two states: user and supervisor. The instructions referring to the memory management unit are privileged, i.e., they can only be executed in the supervisor state. The switch from the user state to the supervisor state is usually triggered by an interrupt signal. The return to the user state can often be accomplished under software control. Other privileged instructions are those for controlling peripheral devices and the interrupt system.

A memory protection mechanism prevents tasks from accessing system data structures, but does not entirely solve the problem of exclusive access to the system data. Consider an interrupt occurring at a time when the executive is running. If the interrupt system were enabled, the current execution of the executive could be interrupted, and a new execution would be started from its initial point.
This could create the phenomenon of a "doubled executive" operating concurrently on the same data structures. The solution in a single processor system is simple: interrupts must be disabled while the executive is running. (In fact, not all interrupts must be disabled, but this refinement of the solution is not dealt with here.) In a multiple processor system the situation is, however, more complicated, as two or more processors can be using the same executive program code at the same time. To solve this problem, some mechanism for reserving exclusive access of an individual processor to the common memory is necessary. In the simplest case, exclusive access to common data structures can be arranged using a hardware mechanism to reserve the system bus (hardware supported Bus lock and Bus unlock operations). Unfortunately, this approach is extremely inefficient, as it prevents all other processors from accessing the common resources, and not only the particular data structures which actually need to be reserved. A much better approach is based on the concept of a binary semaphore which can selectively protect the required device or data structure. A binary semaphore is a two-state variable located in the common memory space: it indicates whether the resource corresponding to the semaphore is free or busy. Simple algorithms for semaphore operations are shown
in Figure 7.8. The operation Reserve(B) examines the semaphore B and suspends the processor when the resource is busy (B = 0). The operation Release(B) marks the resource free (B = 1). The algorithm for the operation Reserve applies an active form of waiting in a loop, and makes use of the hardware system bus reservation mechanism. However, the bus is reserved for as short a time interval as possible.

Figure 7.8: Active semaphore operations.

The problems of synchronising concurrent activities (tasks) will be discussed in greater detail in the next section.
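The Reserve/Release pair of Figure 7.8 can be approximated in a short sketch. Here a threading.Lock merely stands in for the hardware Bus lock / Bus unlock pair, which is only an analogy: the essential point is that testing and clearing B happen as one indivisible step, while the lock itself is held as briefly as possible.

```python
import threading

class ActiveSemaphore:
    """Sketch of the active (busy-waiting) binary semaphore of Figure 7.8.
    B = 1 means the resource is free, B = 0 means it is busy."""

    def __init__(self):
        self.B = 1                      # resource initially free
        self._bus = threading.Lock()    # stands in for Bus lock / Bus unlock

    def reserve(self):
        while True:
            with self._bus:             # "Bus lock": test-and-set atomically
                if self.B == 1:
                    self.B = 0          # mark busy, reservation succeeded
                    return
            # "Bus unlock" has happened; keep spinning (active waiting)

    def release(self):
        with self._bus:
            self.B = 1                  # mark free

sem = ActiveSemaphore()
sem.reserve()           # first reservation succeeds immediately
busy = (sem.B == 0)
sem.release()
```

Note that a processor executing reserve() on a busy semaphore accomplishes no useful work: it is the processor, not a task, that waits, which is exactly the contrast drawn with the suspending semaphore of the next section.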
7.4
Task Synchronisation and Communication
Application tasks in a multi-tasking system are executed concurrently. The actual ordering of the task executions depends on the distribution of events and is essentially unpredictable. This indeterminism, however, which is inherent to the nature of concurrent systems, must not lead to unpredictable results during computation. Therefore some mechanisms for synchronising task executions and forcing a desired ordering of actions are necessary. The synchronising mechanisms need to be fully
controlled by the application designer and the implementer. Basic tools for task synchronisation and communication are implemented by the operating system executive. More sophisticated operations can be implemented by system tasks and library subroutines of high level languages. The discussion in this section gives an overview of the basic concepts of task synchronisation, without paying attention to the actual implementation method.

To begin with, consider two co-operating tasks in a process control system (it is assumed that there are at least two tasks in the system). The first one monitors the state of the process variables, compares the values against some predefined limits, and produces error messages when the limits are violated. The second one waits for an error message and outputs it through a relatively slow serial interface to a remote operator station. Each function (monitoring and displaying messages) has to be performed by a separate task, because the average message transmission time is much longer than the required repetition period of the monitoring function.

An attempt to solve the problem of co-operation between the tasks is shown in Figure 7.9. The messages are queued in a buffer served according to the FIFO (first in, first out) algorithm, and inter-task synchronisation is based on explicit use of the operations Suspend and Resume described in Section 7.3. The monitoring task MON is executed periodically. It reads the values of process variables, checks the values against the limits, and writes error messages to the buffer. The transmitting task COM suspends itself waiting for a message. When a message appears in the buffer the task is resumed. It reads the message and transmits it through the serial link. Since new messages can arrive during the transmission of one of the previous messages, they are counted by the counter S.
After transmitting a message, the task COM checks the counter S and processes the next message, or suspends itself if the buffer is empty.

The synchronisation in Figure 7.9 is incorrect. If a new message appears precisely at the time instant after the check has been made but before the suspension of COM (the place is indicated by an arrow in Figure 7.9), then it will not be transmitted before the arrival of the next message. This error can be corrected by combining the elementary operations of checking and suspending on a condition into a single, indivisible check-and-suspend operation. This leads to the concept of a semaphore, which is a special tool for task synchronisation.

A semaphore is defined as an integer variable which can be referenced by tasks using only the following two operations:

WAIT(S): if S > 0 then S = S - 1 else suspend the running task.

SIGNAL(S): if no task is suspended on S then S = S + 1 else resume one task.

Both operations are indivisible, which means that no two references to a semaphore
Figure 7.9: Incorrect task synchronisation in a control system.
S can be executed at the same time. The task co-operation in the control system described above can be correctly synchronised by means of a semaphore, as shown in Figure 7.10.

Figure 7.10: Correct task synchronisation in a control system.

Another typical problem of inter-task synchronisation is that of competition between two or more tasks which try to access a common resource, such as a printer or a data file. This mutual exclusion problem can also be solved by means of a semaphore, as exemplified in Figure 7.11. The initial value of the semaphore is equal to the number of available resource items. The value S = 1, provided in the example, indicates that initially one printer is available. The task which executes the WAIT(S) operation before any other task can do so changes the value to S = 0, indicating that the device is reserved. Any other task which attempts to request the printer by executing the operation WAIT(S) will become suspended. This state will last until the device is released by executing the operation SIGNAL(S). This will either resume one of the suspended tasks, or set the semaphore value to S = 1 if no task has been suspended.

To be indivisible, semaphore operations are usually implemented within the multi-tasking executive. The data structure related to the implementation of a semaphore
[Figure 7.11 shows several tasks contending for a printer through a semaphore S (initial value S = 1). Each task executes WAIT(S); prints (a message, a report, ...); SIGNAL(S).]
Figure 7.11: Mutual exclusion of contending tasks.
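The mutual exclusion pattern of Figure 7.11 can be sketched in Python (an illustrative rendering, not the book's notation); the standard-library semaphore plays the role of S, initialised to one available printer:

```python
import threading

s = threading.Semaphore(1)      # initial value S = 1: one printer available
log = []                        # lines actually "printed", in output order

def use_printer(name, lines):
    s.acquire()                 # WAIT(S): reserve the printer or suspend
    for ln in lines:            # critical section: the device is ours
        log.append((name, ln))
    s.release()                 # SIGNAL(S): release the printer

t1 = threading.Thread(target=use_printer, args=("T1", ["a", "b"]))
t2 = threading.Thread(target=use_printer, args=("T2", ["c", "d"]))
t1.start(); t2.start(); t1.join(); t2.join()
```

Whichever task acquires S first prints both of its lines before the other task can start; the two print jobs never interleave.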
consists of an integer variable S and a list QS of tasks suspended on the semaphore S. Basic algorithms of the operations WAIT(S) and SIGNAL(S) are shown in Figure 7.12. These algorithms are compatible with those in Figure 7.7. The task list QS can be serviced according to the FIFO strategy, or organised otherwise using task priorities. Each solution has advantages and disadvantages, and no consensus on the best one has been reached. However, the FIFO strategy is usually preferred because of its fairness and simplicity. The task list can be organised in a dedicated memory area, or the tasks can be linked by pointers stored in TSTs. It is interesting to compare the concept and the implementation of the semaphore introduced above with the more basic concept of the active binary semaphore in Figure 7.8. It should be noticed that suspending a task on a semaphore does not prevent the processor from executing other tasks, while it is the processor which is suspended on the active semaphore. It can also be seen that the operations of testing and modifying the binary semaphore B have been placed within the Bus lock — Bus unlock section to make the operations indivisible. Semaphore operations can be viewed as operations which send (signal) and receive (wait for) abstract synchronisation signals. Sending a signal is equivalent to the occurrence of a program event. Semaphores provide an efficient way to synchronise concurrent task execution, but do not provide means for inter-task communication. This is why a message buffer had to be implemented in the application program in the example of Figure 7.10. There are a variety of methods to provide task communication based on the concept of messages exchanged between co-operating tasks. The most basic distinction between the different methods is that of synchronous and asynchronous message passing schemes.
The synchronised operations SEND(T2,M) and RECEIVE(T1,M), executed by the tasks T1 and T2, respectively, carry the message M directly from the address space of the task T1 to the address space of the task T2 (Figure 7.13a). This means
[Figure 7.12 shows the flowcharts of the two operations. WAIT(S): enter the running task T into QS and suspend it, i.e., call Suspend(T). SIGNAL(S): remove the first task T from QS and resume it, i.e., call Resume(T). Both paths end in the task scheduler.]
Figure 7.12: Semaphore operations.
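The WAIT(S)/SIGNAL(S) algorithms of Figure 7.12, with an explicit FIFO task list QS, can be sketched as follows; this is an illustrative Python model (each per-waiter event stands in for a suspended task entry in QS):

```python
import threading, collections

class FifoSemaphore:
    """Semaphore with an explicit FIFO task list QS, as in Figure 7.12:
    WAIT enters the caller into QS when S = 0; SIGNAL removes and resumes
    the first waiter, or increments S if QS is empty."""

    def __init__(self, initial=0):
        self._s = initial
        self._qs = collections.deque()      # QS: tasks suspended on S
        self._lock = threading.Lock()       # makes each operation indivisible

    def wait(self):
        with self._lock:
            if self._s > 0:
                self._s -= 1                # S = S - 1, no suspension
                return
            ticket = threading.Event()      # stands for the suspended task
            self._qs.append(ticket)         # enter the task into QS
        ticket.wait()                       # suspend until resumed

    def signal(self):
        with self._lock:
            if self._qs:
                self._qs.popleft().set()    # resume the first task from QS
            else:
                self._s += 1                # S = S + 1
```

Because QS is a deque, tasks are resumed in arrival order, i.e., the FIFO strategy the text says is usually preferred; a priority-ordered list could be substituted without changing the interface.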
that no intermediate buffering is provided, but that the operations synchronise the tasks in such a way that a pair of related operations is effectively executed at the same time instant. In other words, the task which comes to the SEND or RECEIVE operation before the other one becomes suspended, and must wait until its partner is ready to accept or to send the message. It is important to note that each task must explicitly know the identity of its partner. However, a variant of the RECEIVE operation which allows the receipt of any message, regardless of its originator, is also possible (the originator of a message must know the identity of the recipient in any case). Asynchronous message passing operations provide for the buffering of messages on their way from their source to their destination. This removes the need for bidirectional synchronisation of message transfer. The task which sends a message simply puts it into a buffer and continues, without waiting for the recipient. The receiving task, however, must wait until the message becomes available. The messages stored in the buffer are queued according to the FIFO algorithm. There are two types of designs for buffered message transfer. For the first, the message buffer (message queue) is integrated into the receiving task. The operation RECEIVE(M) executed by the task T2 suspends the task if the buffer is empty, or fetches the message if it is already available. The operation SEND(T2,M) executed by the task T1 puts the message M into the buffer of the task T2 and resumes this task if it is suspended. The state of the task T1 is not changed in any case. It should
[Figure 7.13a shows SEND(T2,M) in task T1 synchronised directly with RECEIVE(T1,M) in task T2; Figure 7.13b shows SEND(P,M) and RECEIVE(P,M) communicating through the buffer of a mailbox P.]

Figure 7.13: Synchronised (a) versus mailbox (b) message passing scheme.
be noted that the message originator needs to know the identity of the recipient, but not vice versa. For the other method, the message buffer is related neither to the originator nor to the recipient, but works as an independent mailbox. The addressing scheme is symmetric, in that both communicating tasks only need to know the identity (i.e., the name or number) of the mailbox. The message transfer operations RECEIVE(P,M) and SEND(P,M), where P is the mailbox identifier, are partially synchronised in exactly the same way as in the previous case. The construction of a mailbox can easily be recognised as a direct generalisation of the semaphore concept. The difference between the two is that data items are exchanged between the co-operating tasks, instead of pure synchronisation signals. This makes the mailbox implementation more complex than the implementation of a semaphore, as a message buffer must be used to store the messages, instead of having only a simple signal counter. In both cases the list of suspended tasks must be maintained in the same way. Semaphores as well as message passing operations are based on the concept of unidirectional transfer of information (i.e., signals or messages) between the co-operating tasks. Quite a different concept is that of exporting and importing services offered by tasks to or from other tasks. The original idea of a remote procedure call has evolved and matured into the form of an extended rendezvous, strongly promoted by the definition of the Ada language.
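The mailbox scheme of Figure 7.13b can be sketched in Python; `queue.Queue` already provides the FIFO message buffer plus the list of suspended receivers, so it serves as a direct, illustrative model of a mailbox P (names are ours, not the book's):

```python
import queue, threading

# A mailbox P: both tasks only need to know the mailbox, not each other.
P = queue.Queue()

def sender(messages):
    for m in messages:
        P.put(m)                  # SEND(P, M): buffer the message, continue

def receiver(out, n):
    for _ in range(n):
        out.append(P.get())       # RECEIVE(P, M): suspend if P is empty
```

The sender never blocks (asynchronous side), while the receiver suspends on an empty mailbox, exactly the partial synchronisation described above.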
[Figure 7.14 shows task T1 calling rendezvous (x:in, y:out) and task T2 executing accept rendezvous (x:in, y:out) do ... end rendezvous.]
Figure 7.14: Rendezvous.
Consider the diagram in Figure 7.14. Task T1 invokes the rendezvous, specifying the name of the entry point defined in the co-operating task (rendezvous in Figure 7.14), the input parameters (in variables) and the results (out variables). Task T2 accepts the rendezvous, which causes the transfer of input data, the execution of the program routine related to this entry point, and the reverse transfer of results. The procedure is synchronised, which means that the task which comes to the invocation or acceptance point before the other one becomes suspended, and must wait until the partner is ready for the rendezvous. Between the beginning and the end of the rendezvous the invoking task (T1) remains suspended, while the accepting task (T2) performs the processing. When the processing is finished, the suspended task is resumed and both tasks continue their execution concurrently. Comparing the methods for task synchronisation and communication described in this section, one can see that all of them are logically equivalent, in that each method can be easily modelled using another method. We invite the reader to model the rendezvous using only semaphore operations (two semaphores are required to provide bidirectional synchronisation). The practical usefulness of the methods, however, depends on the type of application. Semaphores represent the most primitive mechanism, as they only offer a flow of pure synchronisation signals. The implementation in a centralised system with a common memory is fast and efficient. Therefore semaphores are commonly implemented as a basic synchronisation mechanism, which can either be made directly accessible to application tasks or be used by the operating system internally for implementing more sophisticated communication mechanisms. Unfortunately, semaphores do not fit well into the structure of a distributed system, in which there is no natural location for common data structures.
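One possible answer to the exercise posed above — modelling the rendezvous of Figure 7.14 with two semaphores — can be sketched in Python (an illustrative solution under our own naming: `call` signals that the in-parameters are ready, `done` that the results are ready):

```python
import threading

call = threading.Semaphore(0)        # T1 -> T2: rendezvous invoked
done = threading.Semaphore(0)        # T2 -> T1: rendezvous completed
shared = {}                          # stands for the parameter transfer

def t1_invoke(x):
    shared['x'] = x                  # pass the in-parameter
    call.release()                   # invoke the rendezvous
    done.acquire()                   # remain suspended during processing
    return shared['y']               # fetch the out-parameter

def t2_accept():
    call.acquire()                   # accept: wait for the invocation
    shared['y'] = shared['x'] * 2    # body of the accept block (example)
    done.release()                   # end rendezvous: resume the invoker
```

Whichever task arrives first suspends on its semaphore, which reproduces the bidirectional synchronisation of the rendezvous.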
The asynchronous mailbox message passing scheme is particularly convenient for implementing inter-task co-operation, as it offers flexible solutions for task communication problems. However, an efficient implementation requires a common memory
area for the task list and message buffer storage. Synchronised communication methods, i.e., message passing and the rendezvous, are less flexible than asynchronous methods, as they offer no easy way to model asynchronous communication, and the techniques which can be applied for this purpose are not particularly efficient. This will be demonstrated in Section 9.2. However, synchronised tools match distributed system architectures, as they do not need any common data structures, but rely entirely on messages exchanged directly between the tasks.
7.5
Time and Event Handling
Application tasks executed in a real-time system permanently and repetitively interact with their environments, and explicitly depend on time. This implies that the task programs are usually structured as never-ending loops with strictly defined periods for the loop executions. The mechanisms for time measurement and signaling are provided by the operating system. The most primitive tool for time signaling is the operation DELAY(At), which causes task suspension for a period of time At. An example of a periodic task using the DELAY(At) instruction has been shown in Figure 7.10. The problem with this task structure is that the actual cycle time equals the period At plus the task execution time, which cannot be defined precisely. The accuracy of such time measurement is poor. It is easy to see that the problem cannot be appropriately solved at the task level, but that a specific mechanism for periodic task execution must be implemented by the operating system. The most direct method for that purpose is to prompt the task at regular time intervals using the system timer module. To start the procedure, a task performs the operation CYCLE(At), which defines a frame (a time interval) of length At and requests the system to resume the task every frame. From now on the task does its job and suspends for the rest of the frame; when the frame expires, the task is automatically resumed. The process is repeated endlessly (Figure 7.15). Quite another use of the system timing mechanism is that of controlling the duration of task suspension periods. When a task requests an input-output operation or another service from either the operating system or another task, it is usually suspended — e.g., on a semaphore — until the end of the operation. A problem arises when the device or the co-operating task fails, and the signal to resume the suspended task never comes.
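The difference between DELAY and CYCLE can be sketched as follows; this illustrative Python function (names are ours) schedules each activation at an absolute instant start + k*period, so the job's own execution time does not lengthen the cycle, whereas a plain DELAY after each run would give a cycle of period plus execution time:

```python
import time

def run_periodic(job, period, n):
    """Drift-free periodic execution in the spirit of CYCLE(At):
    suspend for the *rest of the frame*, not for a fixed delay."""
    start = time.monotonic()
    for k in range(1, n + 1):
        job()                                  # the periodic function
        # sleep until the end of frame k, however long job() took
        time.sleep(max(0.0, start + k * period - time.monotonic()))
```

With a 50 ms period and a 20 ms job, the k-th activation still begins at k * 50 ms, as the operating system's timer module would arrange for a cyclic task.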
The problem is particularly severe in a distributed system in which a failure in either a co-operating computer or the network interface can affect a group of tasks, while preserving full operational capabilities of the others. A mechanism to prevent expanding the "disaster area" is essential.
[Figure 7.15 shows a periodic task T performing CYCLE(At) and then looping: periodic function; Suspend(T); go to loop. The system timer resumes the task at the start of each frame of length At.]

Figure 7.15: Periodic task execution.

The problem stated above is usually solved by means of timeouts. An optional argument of the operation, e.g., WAIT(S,At), specifies the maximum time interval within which the task can be suspended. If the task has been suspended, and the signal to resume has not come within the time At, then the operating system will cancel the operation and resume the task when the time expires. Despite the functional differences, the implementation of all of the timing mechanisms introduced above is similar. The operating system maintains the list of tasks which have been suspended for specified time intervals. The list is usually ordered with respect to the length of the time intervals. A hardware timer regularly prompts the software timer module so that the intervals may be counted down. When the interval specified for a task expires, the task is immediately resumed. Further processing depends on the type of the operation:

• DELAY(At):
the task is removed from the timer list.
• CYCLE(At): the task is removed from the timer list and reentered with the initial value of time At.

• Timeout: the task is removed from the timer list as well as from the other lists of tasks suspended on a semaphore or on an input-output operation.

The timing mechanisms are usually implemented by the multi-tasking executive. However, it is also possible to disperse the functions between the executive, system tasks which drive the input-output devices, and a dedicated system timer task. External events correspond in most cases to interrupts originating in the computer environment. For this reason, the traditional style of event handling follows the style of programming the interrupt system. Basic operations available at the task level are the following:
CONNECT(Int,Ev): the interrupt identified by the code Int is defined as the event named Ev; this informs the interrupt monitor to pass the information about the occurrence of the interrupt Int to the event handling module.

ENABLE(Ev): the interrupt corresponding to the event Ev is enabled; if the event occurs, it will be noted by the event handling module.

AWAIT(Ev): the task performing this operation suspends execution and waits for the event Ev, or continues if the event has already occurred; information concerning the previous occurrences of the event is cancelled.

DISABLE(Ev): the interrupt corresponding to the event Ev is disabled, and no information about the occurrence of the interrupt is passed to the event handling module.

DISCONNECT(Ev): the correspondence between the event Ev and the interrupt is cancelled; no further operation concerning the event can be executed.

The modern style of event handling is significantly different. The tendency is to suppress the technical aspects of interrupt systems, and to emphasise the logical aspect of synchronising tasks and events. Therefore an attempt is made to unify the event handling tools with the tools for inter-task synchronisation. Depending on the type of synchronisation operations implemented by a particular operating system, an event is viewed as a hardware-initiated SIGNAL operation on a semaphore, a SEND of an empty message, or a hardware invocation of a rendezvous.
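The ENABLE/DISABLE/AWAIT semantics described above can be sketched as a small event object; this is an illustrative Python model (class and method names are ours), in which `interrupt()` stands for the hardware-initiated signal:

```python
import threading

class Ev:
    """A named event: an interrupt is a hardware-initiated SIGNAL,
    AWAIT suspends until the next (or an already noted) occurrence."""

    def __init__(self):
        self._occurred = threading.Event()
        self._enabled = False

    def enable(self):                    # ENABLE(Ev)
        self._enabled = True

    def disable(self):                   # DISABLE(Ev)
        self._enabled = False

    def interrupt(self):                 # hardware-initiated SIGNAL
        if self._enabled:
            self._occurred.set()         # note the occurrence

    def await_ev(self):                  # AWAIT(Ev)
        self._occurred.wait()            # suspend, or continue if the event
        self._occurred.clear()           # already occurred; cancel previous
                                         # occurrences, as the text specifies
```

A disabled interrupt is simply not recorded, so a later AWAIT will suspend until the next enabled occurrence.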
7.6
Distributed Operating Systems
Distributed operating systems are usually defined as ones that control multiple-computer networks in a transparent way, i.e., which make the use of multiple processors invisible (transparent) to the user. This implies that the user of a truly distributed system should not be aware on which computer particular tasks are running, where files are stored, etc. This view of a distributed system does not match the characteristics of real-time control systems, in which permanent wiring of process interface modules enforces, at least partially, a fixed allocation of tasks to processors, and where the most important decisions concerning resource allocation must remain with the application designers. Therefore we define a distributed operating system as one which controls a multiple-computer network and provides unified, homogeneous means for inter-task synchronisation and communication. This definition pushes the issues of distributed file systems out of the scope of this section. An illuminating discussion of file management in distributed systems can be found in [35]. The computers forming a distributed system do not share main memory. This renders methods for task synchronisation and communication which are based on shared memory techniques, such as semaphores and mailboxes, generally inapplicable.
Instead, methods based on message passing must be used. The basic framework for message passing systems is the ISO reference model, described in Section 6.1, which enables computers with widely different hardware, operating systems, and data representations to be connected. Unfortunately, the overhead created by passing messages through all seven layers and running complex protocols is substantial, and the price for the generality of the model is very high, especially in fast LANs consisting of similar microcomputers which do, in fact, use the same internal representations and operating system conventions. This is why the MAP/TOP company-wide networks have accepted the ISO model, while the factory floor FIELDBUS and Mini-MAP have not. Another fundamental design decision that must be made is the choice between unreliable (connectionless) and reliable (connection-oriented) message passing services. The former choice means that the SEND operation simply puts a message out to the network and wishes it good luck! No acknowledgement is expected and no automatic retransmission is attempted if the message is lost. The latter option means that a connection between the source and destination tasks is established and the SEND and RECEIVE operations control the flow of messages internally. This includes message acknowledgement, error control, and retransmission of erroneous or lost messages. When SEND terminates, the task which issued the operation can be sure that the message has been transmitted over the network and received in the other node. An algorithm for message flow control has been described in Section 6.4, and an example of the connection-oriented message delivery service of a real-time operating system will be given in Section 8.3. Comparing these methods, one can see that the unreliable, connectionless service creates much less overhead and is much faster than the other.
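The connectionless case can be illustrated with a loopback datagram exchange; this Python sketch (socket details are illustrative, not from the text) shows SEND as a fire-and-forget `sendto` with no acknowledgement, whereas a connection-oriented service would instead use a stream socket over an established connection:

```python
import socket

def datagram_roundtrip(payload: bytes) -> bytes:
    """Connectionless service: the sender simply puts the message on the
    network; on a real network it might be lost, with no retransmission."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))              # receiver, ephemeral port
    rx.settimeout(5)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.sendto(payload, rx.getsockname())   # SEND: fire and forget
    data, _addr = rx.recvfrom(1024)        # RECEIVE: here loopback is safe
    tx.close(); rx.close()
    return data
```

On the loopback interface delivery is effectively certain, which is precisely why this sketch can omit the acknowledgement and retransmission machinery a reliable service would have to provide.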
Moreover, a connectionless service provides an easy and natural way to broadcast messages throughout the network. The alternative is that a connection-oriented scheme not only guarantees reliable message delivery under normal operating conditions, but also enables system behaviour to be controlled in the presence of task failures. If a connection (a virtual circuit) has been established prior to message transmission, a failure in one of the co-operating tasks can be automatically signaled to its partner, e.g., in the form of an exception (an internal event). This enables recovery action, and prevents the errors from spreading across the system. For this reason, operating systems as well as high-level languages intended for real-time applications (e.g., QNX and Ada) tend to use reliable rather than unreliable message passing schemes. Another point worth noting is the way in which a distributed operating system is implemented. One way to realise the system is to put a layer of software on top of the native operating systems of the individual computers. This could be a special library package which would intercept all of the system calls and decide whether each one was local or remote. Local calls are directed to the native operating system, which handles them in a normal way, while the remote ones are emulated by the library
[Figure 7.16 shows the layered structure: application and system tasks issue system calls; in a multi-computer operating system the calls are intercepted by the network system layer, which passes local calls to the native executive and emulates global calls via the network handler, whereas in a single-computer operating system the system calls enter the executive directly.]
Figure 7.16: Layered network operating system.
package procedures (Figure 7.16). The method mentioned above is usually the only way of implementing a network system which evolves from an operating system originally written for a centralised computer. One of the reasons for this is the mechanism for passing parameters between different modules of the operating system. To avoid copying data, parameters are passed within the system by reference, while they can only be passed by value over the network. The price paid for the network service is the overhead introduced by an additional layer of the operating system. Quite a different way of building a distributed operating system is based on the client-server model. The operating system structure is flat: it consists of an executive and system tasks handled by the executive. No additional layers are present. The executive implements unified, message-based communication between the system and the application tasks. All other operating system functions and services are shifted to the system tasks. In this view, system tasks are servers, which respond to requests from clients, which can be applications as well as other system tasks. The key feature is that the message passing mechanism works transparently through the network: each task can send a message to another task within the network. At least in principle, all resources of the network are equally well available to each task, regardless of the actual network configuration (Figure 7.17). However, one should be aware that constructing a system in principle is usually much easier than constructing it in practice. A serious problem which is inherent to the nature of distributed systems is the one
[Figure 7.17 shows two nodes, each consisting of an executive with its system and application tasks; a remote request from a task in one node is carried as network messages to a server task in the other node.]

Figure 7.17: Distributed operating system.
of time service. Some distributed algorithms depend on having a single value of the global time, which enables the global ordering of separated events. This requires a global time service which can be asked by other tasks what time it is, or which broadcasts the correct time periodically. However, conventional implementations of both mechanisms rely on transmitting timing messages over the network, which involves an unpredictable, variable delay. The problem cannot be adequately solved in this way, which suggests the conclusion that having a single global time in a distributed system is impossible. A discussion of problems caused by the lack of global time can be found in [36]. However, a reliable time base of highest precision can be provided in each node of a distributed system by means of radio receivers for standard time signals, which are derived from the official atomic time and which are broadcast by official agencies [37]. One of the potential sources of the timing signals is the Global Positioning System (GPS), which disseminates the official atomic time of the U.S. National Bureau of Standards over the entire planet via 14 satellites (18 in the final stage). Other sources which can be considered are terrestrial stations in various countries, e.g., DCF 77 [38] of the Physikalisch-Technische Bundesanstalt in Braunschweig, Germany, which has already been used for many years by industry, government, railways, etc. Based on a global timing system, local clocks can be synchronised with the international time standard to within an accuracy of 100 nsec. The global time system has already proved its usefulness under difficult conditions, e.g., during the 1991 Gulf war. The solution is quite cheap, as it requires only the integration of a receiver, a local quartz-controlled clock, and appropriate synchronisation logic in a single chip.
Chapter 8

Comparison of Some Real-Time Operating Systems

8.1
System iRMX88
Intel's iRMX88 is a simple multi-tasking, real-time operating system intended for embedded applications built around the Intel 8088 and 8086 sixteen-bit microprocessors. The system has been designed by Intel Corporation for the iSBC88 and iSBC86 line of single board computers, and does not create a software framework for general purpose (commercial or numerical) data processing applications. The expansion software package MMX800 enables the iRMX88 to control a multiprocessor system with a common memory bank. Distributed, i.e., network based, hardware is not supported by the system. The iRMX88 system does not create an environment for programming application tasks, in that it offers neither a flexible file system nor a set of utility programs and compilers. New applications have to be designed and coded on a host computer, preferably controlled by the ISIS II or the iRMX86 (Section 8.2) operating systems. The iRMX88 is delivered by Intel in the form of an object module library which must be linked together with the application modules. The resulting load module, comprising the system and the application programs, can be moved to the target computer for final testing and operation. The iRMX88 best suits diskless applications, in which the load module is placed in read only memory chips (ROMs).

iRMX88 architecture. The system consists of a multi-tasking executive (a nucleus) and a set of system tasks (Figure 7.5), which offer services to application tasks. Tasks can be created and destroyed dynamically and no arbitrary limitation on their number exists. To each task a priority is assigned in the range from 0 (maximum priority) to 255 (minimum priority). Ready tasks with equal priorities are queued
[Figure 8.1 shows task T1 executing Send(P2,M) followed by Receive(P1,Mr), and task T2 executing Receive(P2,M) followed by Send(P1,Mr), with P1 and P2 as mailboxes.]
Figure 8.1: iRMX implementation of the synchronised task communication.
for execution according to the FIFO strategy. Task priorities are statically defined during task creation. The system executive is responsible for the creation and destruction of tasks, concurrent task execution, task communication and timing, and event (interrupt) handling. The other services of the operating system are implemented by a set of system tasks. Each of the services can be requested by sending a message to an appropriate system task. The list of tasks and services comprises the following:

1. The Free Space Manager dynamically allocates and deallocates main memory segments requested by tasks.

2. The Terminal Handler allows system and application tasks to input text from the operator keyboard and to display text on the screen.

3. The I/O Manager handles all peripheral devices, in particular disk memory devices and files.

4. The Idle Task is an empty task which runs only when there is no other task ready for execution.

All system tasks and services except the Idle Task are optional, and can be included or not, depending on the particular application requirements. The only mechanism for task communication provided by the system executive is asynchronous message passing through mailboxes, which is implemented by two operations:

Send: Send a message to the indicated mailbox.

Receive: Receive a message from the indicated mailbox, or suspend execution if no message has been sent to the mailbox.

Interaction between application tasks and system tasks is usually synchronised by sending a pair of messages between the co-operating tasks (Figure 8.1). The task state transition diagram (Figure 8.2) differs slightly from that introduced
Figure 8.2: iRMX task state diagram.

in Section 7.3: the state Terminated has been removed, while two additional states (Blocked (B) and Blocked-Suspended (BS)) have been added¹. The other three states (Ready (R), Running (E), and Suspended (S)) are defined consistently with the model introduced in Section 7.3. When a task is created, it enters the Ready state directly. This means that the task code and data must have been loaded into the main memory and the task's Program Status Word (PSW) must have been initialised before the task is created. A task enters the Suspended state when it is waiting for a message, and the task is resumed when the message arrives. The "blocked" states enable the temporary deactivation of an erroneous task, and are intended as an aid in task debugging. Both states can only be entered or left as a result of explicit operations. The transition from Blocked-Suspended to Blocked corresponds to the transition from the state Suspended to Ready. This makes it possible to trace the status of the deactivated tasks during debugging.

Task communication. The iRMX88 executive implements a uniform mechanism for task communication through mailboxes. To each task a private (permanent) mailbox can be assigned, which can be read by the owner only. Furthermore, a number of public mailboxes can be created, each of which can be read by an arbitrary number of tasks. The public mailboxes are created and destroyed dynamically, and no arbitrary limitation on the number of public mailboxes exists. Messages sent between co-operating tasks can convey data and synchronisation

¹ In the original iRMX88 terminology, our state Suspended is called Asleep, and the term Suspended is used instead of our Blocked. Mailbox operations are originally called sndm and wait, and have slightly different syntax.
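The pair-of-mailboxes interaction of Figure 8.1 can be sketched in Python; this is an illustrative model (not iRMX syntax), with two queues standing for the private mailboxes P1 and P2 and an uppercasing server as a stand-in system service:

```python
import queue, threading

p1, p2 = queue.Queue(), queue.Queue()   # private mailboxes of T1 and T2

def server():                           # task T2, e.g. a system task
    request = p2.get()                  # Receive(P2, M)
    p1.put(request.upper())             # Send(P1, Mr): the reply

def client(msg):                        # task T1
    p2.put(msg)                         # Send(P2, M)
    return p1.get()                     # Receive(P1, Mr): wait for reply
```

Because the client suspends on its own mailbox until the reply arrives, the pair of asynchronous Send/Receive operations yields the synchronised request-reply interaction the text describes.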
Table 8.1: The relation between task and interrupt priorities.

  Priority of the running task   Active interrupt levels
    0 ...  16                    none
   17 ...  32                    0
   33 ...  48                    0 ... 1
   49 ...  64                    0 ... 2
   65 ...  80                    0 ... 3
   81 ...  96                    0 ... 4
   97 ... 112                    0 ... 5
  113 ... 128                    0 ... 6
  129 ... 255                    all
signals. To speed up communication, the contents of messages are not copied within the memory, but only pointers to the message data are passed from the originator to the destination task. This rule must be violated in multiprocessor systems, in which inter-processor messages must be copied from the local memory bank of a microprocessor to the common memory bank. This is performed by the MMX800 expansion package. A message can be empty. This makes mailbox operations logically equivalent to semaphore operations, which convey only pure synchronisation signals. However, message processing is less efficient than signal processing, as empty messages are still queued, instead of being counted. Tasks can also communicate through common data areas in the main memory. However, this form of communication is outside the control of the iRMX88 system.

Event processing. The iRMX88 system does not use the logical notion of an event, but it offers the possibility of application-dependent handling of external interrupts, which are connected to the system through an 8259A interrupt controller chip. The executive can service eight levels of interrupt signals with the priorities ranging from 0 (maximum priority) to 7 (minimum priority). The interrupt levels can be active or inactive during task execution, depending on the relation between the interrupt and the running task priorities (Table 8.1). Only active interrupt signals can effectively interrupt the task execution. However, the active interrupt levels can be selectively enabled or disabled under control of the running task. There are two methods of interrupt servicing implemented by the iRMX88 executive: by a task, or by an interrupt subprogram. In the first case a mailbox is assigned to each interrupt level. A message is sent to the appropriate mailbox when an interrupt is received by the system interrupt handler. In this way, mailbox messages can be used to signal external events to tasks. In the second case, the system
interrupt handler can be temporarily replaced by an interrupt subprogram supplied by the running task for a given interrupt level. When an interrupt occurs, the subprogram is called, and no task state transition is performed. The subprogram returns to the interrupted task, after which execution is continued in the normal way. Basic operations which control the interrupt system are the following:

Elvl: Enable the indicated interrupt level.

Dlvl: Disable the indicated interrupt level.

Setv: Set the interrupt vector, i.e., define the interrupt subprogram which will be called to service an interrupt request.

Endi: Return from the interrupt subprogram.

Task execution timing. The only timing mechanism implemented in iRMX88 is a timeout for the Receive operation. Using an additional argument to call the Receive operation, a task can specify the maximum time interval within which it can wait for a message. When the time interval expires, the operating system will send an empty message to the corresponding mailbox, thus causing the suspended task to resume. The timeout mechanism can also be used to organise periodic task execution. The structure of a periodic task is exemplified in Figure 8.3 (loop: Receive(P,M,At); periodic function; go to loop). It can be seen that this structure is the same as the structure for event-driven tasks.

Figure 8.3: Periodic task in iRMX.
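The periodic structure of Figure 8.3 can be sketched in Python; this is an illustrative model (names are ours), in which catching the empty-queue timeout stands in for the empty message iRMX88 sends when the Receive interval expires:

```python
import queue

def periodic_irmx_style(job, period, n, mailbox=None):
    """Periodic execution in the iRMX style: wait on a mailbox with a
    timeout; if no message arrives within the period, the Receive 'times
    out' and the periodic function runs anyway, so the same loop serves
    both periodic and event-driven tasks."""
    p = mailbox if mailbox is not None else queue.Queue()
    for _ in range(n):
        try:
            msg = p.get(timeout=period)   # Receive(P, M, At)
        except queue.Empty:               # timeout: the 'empty message'
            msg = None
        job(msg)                          # periodic function
```

With an empty mailbox the job runs once per period; posting real messages to the same mailbox turns the identical loop into an event-driven task.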
8.2 System iRMX86
Intel's iRMX86 is a large and powerful multi-tasking, multi-user, real-time operating system intended for embedded applications built around the 8086 microprocessor family. The system has been designed by Intel Corporation for the iSBC86 and iSBC286 line of single board computers. The expansion software package MMX800 enables the iRMX86 to control a multiprocessor system with a common memory bank. More precisely, a set of single processor microcomputers coupled by a common memory,
156
Real-Time Systems
and running the iRMX80 (eight bit microcomputers), iRMX88, and iRMX86 operating systems, can be co-ordinated by the MMX800 software package. Distributed, i.e., network based, hardware is not supported by the system.

Apart from the basic services for task and event management and timing, iRMX86 offers sophisticated mechanisms for device independent input-output handling, file management, and high level man-machine communication. The system is configurable, and it can be tailored to create a program development environment for application tasks on a host computer. When the coding and testing process is finished, the unnecessary system modules can be removed, and the system and application modules can be moved to the target computer for final testing and operation. The iRMX86 system can be used in compact diskless applications, but it is best suited to large applications which require disk file processing and an extensive man-machine dialogue.

iRMX86 architecture. The system consists of a multi-tasking executive (a nucleus) and a set of system tasks (Figure 7.5), which offer services to application tasks. Tasks can be created and destroyed dynamically, and no arbitrary limit is placed on the number of tasks. To each task a priority is assigned, which can be changed dynamically during task execution. Tasks with equal priorities are queued for execution according to the FIFO strategy. The task state transition diagram is the same as that of the iRMX88 system (Figure 8.2).

The system executive is responsible for the creation and destruction of tasks, concurrent task execution, task communication, synchronisation and timing, event (interrupt) handling, and the dynamic allocation and deallocation of main memory segments for tasks. The other services of the operating system are implemented by a set of system tasks. Each service can be requested by sending a message to the appropriate system task. The list of tasks and services comprises the following:

1. Terminal Handler, allowing system and application tasks to input text from the operator keyboard and to display text on the screen.

2. Debugger, allowing the designer to interact with prototype system and application software, trace its status, and isolate and correct program errors.

3. Basic I/O System, handling all peripheral devices, including disk memories and files, and implementing the low level interfaces between tasks and devices.

4. Extended I/O System, implementing the high level interfaces between tasks and devices, including automatic buffering of I/O operations.

5. Application Loader, enabling the overlaying of large application programs in a limited main memory area.

6. Human Interface, allowing the system operator to load application programs into main memory, set up tasks, and start their execution; moreover, it provides a set of utility routines for file copying, disk formatting, etc.
7. Idle Task, an empty task which runs only when there is no other task ready for execution.

The internal architecture of iRMX86 is object-orientated: the operating system is organised as a collection of building blocks that are manipulated by operators. The building blocks are called objects, and are of several types, such as tasks, jobs, mailboxes, semaphores, regions, and main memory segments. The operators are software procedures. A great advantage of this architectural approach is the ease with which the operating system can be tailored to an application. If there is no need for a given object type in an application, all operators for that object type are excluded from the final configuration. This means that not only system tasks, but also particular functions, implemented either by the executive or by system tasks, can be included in or excluded from the final configuration, depending on the particular application requirements.

Support for multi-user applications is provided by the job object. A job can be defined as a group of related tasks (and other objects) which co-operate to achieve a common goal. The operating system is expected to provide the means for synchronisation and communication of the tasks within a job. On the other hand, different jobs are devoted to different users (different applications) and generally interact only occasionally. Therefore, the operating system is expected to protect jobs as much as possible from other jobs, and to prevent one job from accidentally or maliciously destroying another job. The basic mechanism for task communication is asynchronous message passing through mailboxes, which can be used for communication between tasks within a job or between tasks belonging to different jobs. Other mechanisms, such as communication through common data areas in the main memory, can only be applied to tasks within the same job.

Task communication and synchronisation.
The iRMX86 executive implements three mechanisms (types of objects) for task synchronisation and communication: mailboxes, semaphores, and regions. Such objects can be created and destroyed dynamically, and no arbitrary limit on the number of objects maintained by the system exists.

Mailboxes. The semantics of the mailbox operations Send and Receive have been defined in Section 7.4. Tasks which perform the Receive operation on an empty mailbox become suspended, and must wait until a message has been sent. When a message arrives, one of the suspended tasks is resumed. In the traditional implementation, suspended tasks were queued for resumption according to the FIFO algorithm. The advantage of this strategy is fairness; the disadvantage is that it is not consistent with the idea of task priorities: a high priority task may be waiting in the queue while a lower priority task receives a message and resumes execution. The iRMX86 system offers two variants of the mailbox implementation and allows the designer to
choose either FIFO or priority-based queueing. In the latter case, an arriving message is passed to the highest priority task waiting in the mailbox queue.

Semaphores. The semantics of the semaphore operations Wait and Signal² have been defined in Section 7.4. The extensions offered by the iRMX86 system are the "multiple" operations Wait(S,N) and Signal(S,N), which are equivalent to an indivisible sequence of N operations Wait(S) or Signal(S), respectively.

Regions. Consider the mutual exclusion problem discussed in Section 7.4 (Figure 7.11), and assume that three tasks with different priorities are executed concurrently. The following sequence of events may happen:

1. Task T1 (low priority) performs a Wait(S) operation and reserves the resource associated with the semaphore.

2. Task T2 (high priority) is resumed, performs a Wait(S) operation and becomes suspended, waiting for the resource.

3. Task T3 (medium priority) is resumed, and because of its higher priority it preempts the running task T1.

After the last step, the task T3 prevents the task T1 from releasing the semaphore which blocks the task T2. In other words, the high priority task T2 waits for the lower priority tasks T3 and T1. This violates the rule of priority-based task execution. To solve the problem, the iRMX86 system introduces the notion of a region, which is a special tool for reserving and releasing resources. A region can be defined as a portion of program code which can be entered and left (Figure 8.4) only through the operations:

Receive-control: Check the region status, and continue if it is free, or suspend execution if it is busy.

Send-control: Leave the region, and resume one of the tasks (if any) waiting for entry to that region.

It can be seen that the above operations correspond to the Wait and Signal operations on a semaphore with an initial value equal to one.
The difference, however, is that the priority of a task which has passed the Receive-control operation successfully is temporarily increased to the priority level of the highest priority task waiting for entrance to the region. In the example considered above, the sequence of events is now the following:

1. Task T1 (low priority) performs a Receive-control(R) operation and reserves the resource.

2. Task T2 (high priority) is resumed, performs a Receive-control(R) operation and becomes suspended; the priority of the task T1 is increased to the priority level of the task T2.

3. Task T3 (medium priority) is resumed, and enters the Ready state; the execution of the task T1 is continued.

After the last step, the execution of the task T1 continues until it leaves the region. At that time, the high priority task T2 will be resumed, and the priority of the task T1 will be decreased to its original level. This procedure guarantees consistency with the mutual exclusion as well as the priority scheduling rules.

Figure 8.4: iRMX86 implementation of the mutual exclusion of tasks.

    Task T1 (low priority)    Task T2 (high priority)    Task T3 (medium priority)
    Receive-control(R)        Receive-control(R)
    use the resource          use the resource
    Send-control(R)           Send-control(R)

² In the original iRMX86 terminology, the semaphore operations are called receive and send, respectively.

Event processing. The iRMX86 system preserves the interrupt servicing scheme of iRMX88 as described in Section 8.1. The only difference is the number of interrupt levels, which has been increased to 64.

Exception handling. Two types of exceptions are recognised by iRMX86. The first is the programmer error condition, which stems from an error in the coding of a system call. The second is an environmental condition arising from factors outside the programmer's control (e.g., insufficient memory, or an input-output error). Each of these exception types can either be handled in-line by checking a status code upon return from the call, or can cause a user-supplied exception handler to be called by the operating system. Exception handlers can be defined for each task separately.

Task execution timing. The only timing mechanism implemented by the iRMX86 executive is a timeout for the mailbox Receive and semaphore Wait operations. The timeout mechanism can also be used to organise periodic task execution. The structure of a periodic task is similar to that shown in Figure 8.3; however, the more efficient Wait operation can be used instead of Receive.
8.3 System QNX
QNX is a multi-tasking, multi-user, distributed, real-time operating system intended for applications based on local area networks of IBM PC and PS/2 microcomputers. The system has been designed by Quantum Software Ltd., and has been installed in more than 100 000 applications, ranging from office automation to process control and robotics.

QNX architecture. The system is based on the client-server model. It consists of a multi-tasking executive (a nucleus) and a set of system tasks (Figure 7.17), which offer services to application tasks. Up to 150 tasks can be executed on a single computer. To each task a priority is assigned, in the range from 1 (maximum priority) to 15 (minimum priority). Tasks with equal priorities run concurrently with fair sharing of processor time. Task priorities are not permanent, but can be changed dynamically.

The system executive is only responsible for concurrent task execution, basic task communication through rendezvous, and event handling. The other services of the operating system are implemented by a set of system tasks. Each of these services can be requested by invoking a rendezvous with the appropriate system task. However, the rendezvous are not visible at the programmer's level, as they are hidden within the C language library package. The list of tasks and services comprises the following:

1. Task Administrator, priority 1, to create and destroy other tasks, and to maintain the task directory.

2. Device Administrator, priority 2, to handle all character-oriented devices, e.g., terminals and printers.

3. File System Administrator, priority 3, to handle disk memory devices and to maintain the file system.

4. Idle Administrator, priority 15, an empty task which runs only when there is no other task in the Ready state.

5. Network Administrator, priority 3, to handle the network interface and to implement the Arcnet LAN communication protocol.

6. Timer Administrator, to signal and handle timing events.

7. Spooldev Administrator, to intercept commands directed to shared character-oriented devices, to buffer transmitted data in disk files, and to handle off-line output to physical devices.

8. Locker Administrator, to manage the simultaneous access of concurrent tasks to shared disk files.

9. Queue Administrator, to implement asynchronous task communication through mailboxes.

Tasks 1 to 4 must be present in any QNX system configuration. Tasks 6 to 9 are
optional, and can be included or excluded, depending on the particular application requirements. Task 5, the Network Administrator, must be included in the network version of the system. The functioning of the Network Administrator is transparent to the other tasks: they direct requests for communication to the executive, and need not be aware that such a task exists.

The basic mechanism for task communication is the rendezvous, which is implemented by three primitive operations (Figure 8.5):

Send: Send a message to the co-operating task, and suspend execution while waiting for the reply.

Receive: Receive a message, or suspend execution if no message to the task has been sent.

Reply: Send the reply message.

Figure 8.5: QNX implementation of the rendezvous.

    Task T1                          Task T2
    Send(T2, M, Mr)  --message-->    Receive(M)
        (blocked)    <--reply----    Reply(Mr)

Messages are not confined to a single computer, but can be sent between any pair of tasks within the network. A task which has been suspended as a result of a Send or Receive operation is immediately resumed when the co-operating task is terminated.

The task state transition diagram (Figure 8.6) is relatively complex, as several "suspended" states have to be maintained. The inner cycle in the diagram describes the sequence of state transitions of a task invoking a rendezvous. When a Send operation is performed before the corresponding Receive, the sending task is suspended in the state Ss, in which it waits for the acceptance of the rendezvous. After the Receive operation has been performed, the sending task enters the state Sr, in which it waits for the reply message. The outer cycle in the diagram describes the state transitions of the task accepting the rendezvous. When a Receive operation is performed before the corresponding Send, the receiving task is suspended in the state S, in which it waits for the message. Tasks which wait for an event or for the expiration of a time interval also enter the state S.

Task communication and synchronisation. The QNX executive implements
Figure 8.6: QNX task state diagram (transitions labelled with the Send and Receive operations).
two mechanisms for task synchronisation and communication: the rendezvous, and event signaling through ports. In addition, the Queue Administrator task provides buffered message passing through mailboxes.

Rendezvous. As illustrated in Figure 8.5, the rendezvous is implemented by two messages sent between the co-operating tasks. The message passing mechanism in a network system is based on a reliable, connection-oriented protocol. This means that a connection, called a virtual circuit, should be established between the co-operating tasks before the first message is sent, and should be cancelled when it is no longer needed. A connection can carry messages in both directions. There are two primitive operations for establishing and destroying connections:

VCcreate: Create a connection to an indicated task.

VCrelease: Release the connection.

The actual semantics of these operations depend on whether the tasks involved are executed on the same computer, or on different computers within the network. In the former case, the connection is maintained entirely by the executive. In the latter case, the VCcreate operation creates two virtual tasks on the co-operating computers. A virtual task is the local representation of the corresponding remote partner (Figure 8.7). The identifiers of the virtual tasks are used as parameters in the communication functions Send, Receive, and Reply. When a task finishes, i.e., when it enters the Terminated state, all virtual tasks which represent the finished task anywhere in the network are aborted. Each of the co-operating tasks is notified
Figure 8.7: QNX virtual circuit. A task on Computer 1 communicates with a local virtual task representing its remote partner on Computer 2; messages are transferred automatically through the virtual circuit.
that the connection has been discontinued. It can be seen that the implementation of each of the operations described so far requires a series of physical messages to be sent between the network nodes. The reader is encouraged to compare the semantics of the operations introduced in this section with the logical link protocol described in Section 6.4.

Ports. Basically, ports are similar to the semaphores described in Section 7.4. The difference is that ports, unlike semaphores, cannot function as independent objects. Each port must be attached to a task, which becomes the "owner" of the port. Any task may signal to a port, but only the owner can wait for the signals. There are four primitive operations implemented to handle synchronisation through ports:

Attach: Attach a port to the running task; the operation can request a specific port, or an arbitrary port from the pool of ports.

Detach: Release the port; the operation is only effective when performed by the owner of the port.

Await: Wait for a signal to the port; the operation is analogous to the semaphore operation Wait, and is effective only when performed by the owner.

Signal: Send a signal to the port; the operation is analogous to the semaphore operation Signal.

Unlike the rendezvous, communication through ports can only be used locally, within a single computer. No signals can be passed between tasks executed on different computers. The basic mechanisms for task communication, i.e., rendezvous and ports, are not completely independent. The Signal operation sends not only a signal to the port, but also an empty message to the port owner. If the task which owns the port was
suspended by a Receive operation, then the message will cause the task to be resumed immediately. As we shall see below, this mechanism can be applied to control the duration of a task's suspension period.

Mailboxes. The QNX system implements classical mailbox operations which enable buffered message passing between tasks. In general, many tasks can write to a mailbox, and many tasks can read from a mailbox. To prevent chaos, a task which intends to use a mailbox should declare in advance whether it intends to write or to read messages. There are three operations which support this form of communication:

Queue-open: Connect the running task to the indicated mailbox; an additional parameter specifies the intended use of the mailbox (read or write).

Queue-write: Send a message; the messages are queued in the order of their arrival at the mailbox.

Queue-read: Receive a message; the task fetches the oldest message from the mailbox, or suspends execution if the mailbox is empty.

Mailboxes are serviced by the system's Queue Administrator task. Each mailbox operation is implemented by means of a rendezvous between the task which uses the mailbox and the Queue Administrator. This implies that the operations work transparently throughout the network. Unfortunately, the efficiency of mailbox communication is poor, and this mechanism is not recommended for time-critical applications.

Event processing. External events can be interfaced to tasks through ports. The point is that the Signal operation can be performed not only by a task, but also by a system interrupt handler. Thus, port signals can be used to signal external events to tasks. In the QNX operating system architecture, ports serve as the primary mechanism for linking interrupts (which are serviced by the executive) to device handlers (which are system tasks).

Exception handling. The QNX executive handles 32 exceptions, which inform tasks about special situations that can occur during their execution.
Sixteen exceptions have been predefined as system errors, such as a serial interface error, a memory protection violation, division by zero, or the lack of a shared library. The other sixteen can be used by tasks for signaling application-dependent errors. Two primitive operations have been implemented for this purpose:

Exc-handler: Define a software module which is to be used as the exception handler.

Set-exception: Raise an exception in the indicated task; if the operation is addressed to a virtual task, the exception is raised in the remote task at the other end of the pertaining connection.

When an exception has been raised in a task, normal task execution is interrupted and the exception handler is started. If no exception handler has been defined, the task is aborted.
Task execution timing. Basic timing facilities are provided by the system's Timer Administrator. The timing mechanisms are based on the idea of delayed functions, which tasks can request from the Timer Administrator. Three types of delayed functions can be requested by means of a Set timer operation:

- resuming a task,
- raising an exception,
- signaling to a port.

A delay can be specified in the form of an absolute time, i.e., a number of seconds from midnight, 1 January 1980, or incrementally from the current time. In the latter case, the resolution of the time measurement is 1 ms. The first of the above functions can be used for scheduling periodic task execution. The structure of a periodic task is exemplified in Figure 8.8.

Figure 8.8: QNX periodic task.

    Periodic task T
    loop: Set timer(dt)
          periodic function
          Suspend(T)
          go to loop

The last function, i.e., delayed signaling to a port, can be used to implement timeouts. To do this, the delayed signal to the attached port should be requested prior to a Send or Receive operation. A signal sent after the specified time period will cause the task waiting for a rendezvous to be resumed if the rendezvous has not come into effect.
8.4 System PORTOS
PORTOS is a powerful multi-tasking, real-time operating system designed by GPP mbH for embedded applications built around the 8086 family of microprocessors. Unlike the iRMX and QNX systems, which are tied to particular hardware architectures, PORTOS can easily be configured to support a wide variety of hardware architectures, ranging from IBM PC microcomputers, through the Intel iSBC86 and iSBC286 single board computers, to factory hardened systems such as the Siemens UFC400 or the Dornier DP400. However, neither multiprocessor nor distributed hardware
Figure 8.9: PORTOS system architecture (layers, from top to bottom: application tasks; language processor and shell; operating system nucleus).
architectures are supported by the system. Apart from the basic services of task and event management and timing, PORTOS offers mechanisms for input-output handling, file management, and man-machine communication. The system is configurable, and may be extended from a very small executive with only a few functions to an operating system with a medium or large range of functions. However, it cannot create an environment for programming application tasks, in that it does not offer an appropriate set of utility programs and compilers. New applications should be designed and coded on a host computer, which can be an IBM PC under MS-DOS, a VAX under VMS, an iSBC286 under iRMX86, or a Siemens 7000 under the BS2000 operating system.

PORTOS architecture. The system has a layered architecture (Figure 8.9), fundamentally different from that described in Section 7.3. It consists of a hardware-dependent nucleus and a hardware-independent shell. The nucleus is responsible for concurrent task execution, task communication, synchronisation and timing, event (interrupt) handling, dynamic allocation and deallocation of main memory segments, and low level input-output handling. The shell is organised as a set of procedures, which can be invoked by application tasks, and which are executed as parts of these tasks. The functions offered by the shell deal with high level, device independent input-output handling, file management, and event scheduling. An additional layer is a language processor which implements language-specific operations. The language processor forms the basis for the execution of task code produced by the PEARL, Ada or ATLAS compilers.

The structure of application tasks is static in that tasks can be created only in an initialisation phase, before the application software is started. This saves both storage space for PORTOS code and program execution time. To each task a priority is assigned, which can be dynamically changed during task execution.
The task state transition diagram is the same as that introduced in Section 7.3 (Figure 7.4). An additional component of the PORTOS system is a debugger, which allows the user to
trace and monitor the execution of the system and the application tasks. Complying with the overall system architecture, the debugger has not been implemented as a system task (as in iRMX86) but as a package of subroutines, which can be incorporated into the bodies of the application tasks. During the operator's dialogue with the debugger, the whole system is "frozen", and it is possible for the user to take an instantaneous "snapshot" of the entire system.

Task communication and synchronisation. The PORTOS nucleus implements three mechanisms for task synchronisation and communication: message queues, binary semaphores, and event flags. These objects are created statically during system initialisation.

Message queues. A message queue is assigned to each task; it can be read by its owner, but any other task can write into it. The semantics of the message passing operations Send and Receive have been defined in Section 7.4.

Binary semaphores. A binary semaphore is a restricted form of the semaphore defined in Section 7.4. In the PORTOS environment, it is intended to solve the mutual exclusion problem only. A binary semaphore is defined as a two-state variable, with the initial value 1. The variable can be referenced by tasks using the following operations:

Request(S): if S = 1 then S := 0, else suspend the running task.

Release(S): if no task is suspended on S then S := 1, else resume one task.

It can be seen that the above operations correspond to the Wait and Signal operations defined in Section 7.4.

Event flags. The concept of an event flag is similar to that of a binary semaphore. A flag is a two-state variable which can pass a synchronisation signal from one task to another. The main difference between the two mechanisms lies in the broader set of operations to test or to change the state of a flag:

Set-flag: Set the indicated flag to the value 1.

Reset-flag: Reset the indicated flag to the value 0.

Test-flag: Test the value of the indicated flag.
Wait-flag: Test the value of the indicated flag, and suspend execution if the flag is reset; the value of the flag is not changed.

Event processing. PORTOS does not use the logical notion of an event, but refers to the more technology-oriented notion of an external interrupt. An interrupt can either be serviced by an interrupt subprogram, or it can cause a series of task state transitions. In the first case, the system interrupt handler can be replaced by an interrupt subprogram supplied by the running task. When the interrupt is requested, the subprogram is called, and no task state transition is performed. The subprogram returns to the interrupted task, whose execution is continued in the normal way. The basic
operations which control the interrupt system are the following:

Enable: Enable the indicated interrupt signal.

Disable: Disable the indicated interrupt signal.

Connect: Define the interrupt subprogram which will be called to service the indicated interrupt signal.

Disconnect: Disconnect the interrupt subprogram, and pass the interrupt service back to the system interrupt handler.

End: Return from the interrupt subprogram.

The second option is more complicated. The PORTOS system allows tasks and interrupt signals to be related, and the state transition which is to be performed on an interrupt occurrence to be declared in advance. Many tasks can be related to one interrupt, and many interrupts can be related to the same task. The basic element of the event scheduling mechanism is the schedule, which is defined as a triple: (task identifier, interrupt number, task state transition code). The event scheduler maintains a list of schedules and performs the required task state transitions when interrupt signals occur. The basic operations which control the event scheduler activity are the following:

Create-schedule: Enter the indicated schedule into the schedule list.

Delete-schedule: Delete the indicated schedule from the schedule list.

Prevent-task: Delete all schedules related to the indicated task from the schedule list.

Task execution timing. PORTOS implements a sophisticated timing mechanism, which allows cyclic task execution, delayed task execution, and timeouts. The basic element of the time scheduling mechanism is the time schedule, which is defined as a quadruple: (task identifier, repetition period or time delay, time limits, task state transition code). The time scheduler maintains a list of time schedules and performs the required task state transitions either periodically, according to the specified repetition periods, or only once, after a specified time delay. Periodic transitions are performed within the time interval defined by the time limits.
The structure of a periodic task is similar to that shown in Figure 7.15. The basic operations which control the time scheduler activity are the same as those which control the event scheduler, i.e., Create-schedule, Delete-schedule, and Prevent-task. A timeout in task communication can be implemented by means of a time schedule which requests the delayed resumption of a suspended task. A timeout in input-output operations can be requested directly by means of an additional argument to the I/O shell functions.
8.5 Comparison of Real-Time Operating Systems
The functionality of the real-time operating systems described in Sections 8.1 through 8.4 can be compared with respect to a variety of characteristics. A short summary of the most important operating system features is given in Table 8.2.

Hardware environment. The systems are intended for different hardware architectures, ranging from a network of standard off-the-shelf personal computers (QNX), through multiprocessor systems built around companion standard single board computers (iRMX), to custom designed single processor systems using standard microprocessor chips (PORTOS). It can be noticed that PORTOS is a slightly older design than the other systems and, hence, supports neither multiprocessor nor network architectures. On the other hand, configurability for a variety of hardware structures is a highly desirable feature, and it can be regarded as a great advantage of PORTOS.

Application development. Basically, application software for an embedded system is developed off-line, using a powerful host computer, which provides an appropriate programming environment. The operating systems used on the host and target computers can generally be different. However, the iRMX86 and QNX systems can be tailored to create host as well as target environments. This allows reliable debugging of the developed software in the host environment, which saves time and money during the integration and testing steps. Comparing the spectrum of programming languages for application software development, one can notice that PORTOS offers the widest range of languages. However, SYSLAN is not in common use, and Ada can be implemented in the PORTOS environment only partially (e.g., PORTOS does not offer a dynamic task creation facility).

Configurability. The high configurability of the iRMX and PORTOS systems is to be strongly appreciated. It enables the efficient tailoring of system functions to specific application requirements, and saves program code (ROM and RAM space) and time.
For example, the PORTOS nucleus can be as small as 2 KB of code, and as large as 18 KB, while the PORTOS shell can be tailored within the range from 2 to 100 KB! It is not a surprise that the possibilities for tailoring the QNX system are much smaller, as the system is tied to a rigid hardware architecture and is not intended for ROM-based applications.

Table 8.2: Comparison of four real-time operating systems.

| Feature                        | iRMX88                    | iRMX86                    | QNX                                  | PORTOS                                            |
| Hardware platform              | iSBC88/86                 | iSBC86/286                | IBM PC, PS/2                         | configurable for a variety of systems             |
| Hardware architecture          | multiprocessor            | multiprocessor            | network                              | uniprocessor                                      |
| Host/target environment        | target                    | host/target               | host/target                          | target                                            |
| Primary programming languages  | PL/M, assembler           | PL/M, assembler           | C, assembler                         | PEARL, Ada, SYSLAN, C, PL/M, ATLAS                |
| Functional configurability     | high                      | high                      | medium                               | high                                              |
| Task creation                  | dynamic                   | dynamic                   | dynamic                              | static                                            |
| Task priorities                | static                    | dynamic                   | dynamic, time sharing within a level | dynamic                                           |
| Task communication             | mailbox, common memory    | mailbox, common memory    | rendezvous                           | message queue, common memory                      |
| Task synchronisation           | —                         | semaphore, region         | signaling through ports              | binary semaphore, event flags                     |
| Event processing               | simple interrupt handling | simple interrupt handling | event signaling through ports        | sophisticated interrupt handling                  |
| Exception handling             | —                         | user-supplied handlers    | user-supplied handlers               | —                                                 |
| Task timing                    | timeout                   | timeout                   | timeout, delayed functions           | timeout, delayed functions, periodic task execution |

Task scheduling. The iRMX and PORTOS systems are similar in that they offer many task priority levels, so that a different priority can be assigned to each task. If two or more tasks happen to share the same priority level, they are scheduled for execution according to the FIFO strategy. This facilitates a very detailed differentiation of task urgencies, and guarantees that more urgent tasks are always executed before less urgent ones. The QNX solution is different, as the system offers only a few priority levels and implements a round-robin time sharing strategy within each priority level. This results from a compromise between the requirements of control and of general purpose, multi-access applications. The QNX scheduling strategy makes the system inappropriate for applications which require very short response times.

Task priorities can be dynamically changed in all systems except iRMX88. However, this feature is of minor importance, as in all practical cases dynamic priorities can easily be simulated by the co-operation of a set of tasks with different priorities. Tasks can be dynamically created in the iRMX and QNX systems, while only static task creation is allowed in PORTOS. This feature is of minor importance as well, as industrial systems deal with static structures of their environment and their algorithms, so that dynamic task creation is hardly ever needed. Moreover, as was pointed out in Section 2.1, dynamic task creation may be harmful, as it decreases real-time system predictability.

Task communication and synchronisation. The basic form of communication implemented by the four operating systems is message passing. However, messages are sent synchronously by QNX, and asynchronously by the other systems. The asynchronous task communication in QNX is extremely inefficient. The second difference between QNX and the other systems is that QNX does not allow communication through common memory areas. This is the price paid for the network capabilities of QNX. However, the price does not seem to be very high in relation to the advantages of the network-based architecture.

Event and time handling. The iRMX88 and iRMX86 systems implement only a minimum set of simple operations for interrupt servicing and timeouts. The PORTOS implementation is more elaborate and allows more direct planning of periodic and event-driven task executions. However, the efficiency of the iRMX and PORTOS event and time processing is similar. The QNX implementation is fundamentally different.
This system offers a more elegant, and in some sense higher level, mechanism unified with the task synchronisation tools. This makes the structure of the application software clearer, and improves software testability and maintainability. The other side of the coin is that response to external events is slower than for the other systems.

Exception handling. Only the iRMX86 and QNX systems offer mechanisms for exception handling. The QNX implementation is clearly superior, as exceptions are automatically passed among co-operating tasks. This prevents error conditions from spreading across the system. This feature is particularly important in the context of networking, as it enables system recovery to take place after failures in a particular network station.

Conclusion. It is difficult to say which of the operating systems is the best, as the application areas of the four systems presented in this chapter are, in fact, disjoint. QNX will probably be the cost-effective choice for a laboratory automation system and
for the upper layer computer in a layered hardware architecture (Sections 4.4 and 4.5). iRMX will be superior in applications based on standard Intel boards, as it is tuned to that particular architecture. PORTOS represents another design strategy and is not dedicated to, or restricted by, any particular hardware architecture. It should be considered for custom designed systems using standard microprocessor chips. The efficiency of operating systems is not easy to measure, as it depends strongly on the performance of the underlying hardware (microcomputer clock rate). Usually, one can assume that the typical latency time of an interrupt driver is a few tens of microseconds, and a typical task switching time is a few hundred microseconds.
Chapter 9

High Level Real-Time Programming

9.1
Real-Time Features in High Level Languages
Longer than in other areas of data processing, assembly languages prevailed for the formulation of real-time applications. The reason for this was the high price of processors and memories, which enforced optimal programming with respect to execution time and storage utilisation. However, the demand for high level languages also grew for the development of real-time systems. The most natural approach was to augment existing and proven languages oriented towards scientific and technical applications with real-time features. This resulted in a large number of vendor-specific real-time dialects of FORTRAN, BASIC, and a few other languages.

The idea of creating new languages especially dedicated to programming real-time software was first realised in the sixties in a rather curious way: in Britain the two languages Coral 66 and RTL/2 were developed, which had no real-time features. Finally, since 1969, new languages actually possessing real-time elements were devised and later standardised, viz., PEARL in Germany, PROCOL in France, which was later replaced by LTR, and Ada in France and the U.S.A.

A third group of universal, procedural programming languages is often used for real-time programming purposes, too, although they contain, at most, rudimentary real-time features. Such languages are, e.g., C, Modula-2, and PL/M. The reasons for their application may be that they are better known and more easily available than genuine real-time languages.

In parallel to the three approaches outlined above, non-procedural languages and fill-in-the-blanks program generators have been devised for a number of particular application areas. Examples of the first group are ATLAS for automatic test systems,
EXAPT for machine tool controls, and STEP5 for programmable logic controllers. Program generators are generally vendor-specific and contain preprogrammed function blocks for measurement, control, and automation, which need to be configured and parameterised by the user. They are available for a number of distributed process control systems and programmable logic controllers, and have proved to be relatively successful in their respective application areas.

Within the scope of this chapter it is naturally impossible to consider all languages aiming at applications in real-time processing. The selection of the languages reviewed here is determined by their dissemination and their suitability in industrial environments. The above mentioned languages Coral 66 and RTL/2 were designated as process control languages. But the actual languages only comprise algorithmic elements and, as far as real-time features, synchronisation, and input-output are concerned, they rely totally on operating system calls. Therefore, they will not be considered here, nor will Concurrent Pascal and Modula. The latter two are operating system implementation languages and lack process control oriented elements. In particular, Modula is not suited for industrial applications, since no input-output and timing facilities are incorporated, and since it does not provide as language constructs the features necessary to model a set of tasks on a technical process and to control this task set accordingly. Instead, Modula allows the writing of machine dependent peripheral driver routines and of the users' own timing, synchronisation, and resource sharing mechanisms. Formulating real-time application programs in Modula would therefore require a considerable amount of non-problem-oriented work and would yield non-portable software.
The above statements on real-time facilities hold even more for Modula-2, especially with respect to tasking, since only the coroutine concept is implemented to express concurrency. There are many dialects of BASIC available, incorporating a minimum of real-time features and aimed at undemanding scientific applications, where the control and evaluation of experiments is to be programmed quickly. Hence, these BASIC derivatives are also outside our scope. Finally, there are the newer languages Occam and Occam2, which are often considered to be real-time languages. They support communication and processor configuration in transputer based distributed systems. Parallelism, however, can only be generated in a static way. Since the languages do not allow exception handling and provide only a non-deterministic guarded command feature for reacting to several possible events, they do not fulfill the predictability requirements of hard real-time systems.

The main languages for real-time systems programming which will be discussed in this section are: Ada, Industrial Real-Time FORTRAN, LTR, PEARL, Real-Time PL/1, and Real-Time Euclid. To compare the languages, a classification of the real-time elements has been defined, and the availability of real-time features in the six languages under
consideration has been compiled in Table 9.1.

Conventional elements. The first category comprises elements required in process control applications but of a conventional character. This may seem contradictory as far as process input/output is concerned. Apart from their time scheduling, however, these operations are carried through by corresponding driver routines of the operating system in a manner similar to standard input-output.

Multitasking. In all languages considered, parallel processing is organised on the basis of tasks or processes. These can be hierarchically ordered, as in Ada and PEARL. Each language is founded on a different model of task states, but it is not possible to interrogate which state a task is presently in. Only Ada and PL/1 allow it to be determined whether a task is active or already terminated. To control the execution of tasks, task state transition operations are provided as language features. In this respect PEARL, FORTRAN, and Real-Time Euclid offer a wide range of capabilities, which in PEARL may even be scheduled. In Ada and PL/1 it is only possible to activate and abort tasks, whereas in LTR a wait operation may additionally be requested. Furthermore, LTR allows task activation to be scheduled for the occurrence of simple timing and other events. All languages except FORTRAN imply that the appropriate dispatching decisions are then made on the basis of a strategy. The majority use task priorities, which may be dynamically changed, except in LTR, FORTRAN, and Ada. Only Real-Time Euclid employs the more appropriate concept of deadlines to schedule tasks.

Task synchronisation. All languages considered provide means for the synchronisation of task executions and for the reservation of resources to be used exclusively. The most common synchronisation feature is the semaphore. In addition, various other concepts are implemented in the different languages. The availability of resources can be checked in all languages except PEARL.
The resource allocation schemes employed are mostly either priority based or operating system dependent. Real-Time Euclid allocates resources by deadline or priority, depending on the nature of the resources. Only Ada systems work according to the FIFO strategy. Deadlock prevention as an objective of the language has only been found in PL/1.

Event handling. The handling of interrupts is a common feature. However, interrupts cannot be enabled or disabled in Ada and LTR programs.

Task execution timing. As will be shown in the sequel, all languages are lacking in important timing features. The cumulative run-time of a task, which must be known to perform deadline driven scheduling, is only available in the preliminary version of Ada and in Real-Time Euclid. Whereas the capabilities of Ada, PL/1, and LTR for the time scheduling of task executions and operations are very limited, Ada, PL/1, and Euclid allow the supervision of whether synchronisation operations are completed within predefined time frames. The other languages and Euclid offer a wide range of time scheduling features.
Table 9.1: Survey of real-time elements in Ada, Industrial Real-Time FORTRAN, LTR, PEARL, Real-Time PL/1, and Real-Time Euclid.

| Feature                        | Ada                        | FORTRAN              | LTR       | PEARL                               | PL/1                 | Euclid                 |

Conventional elements:
| Bit processing                 | Partially                  | Yes                  | Yes       | Yes                                 | Yes                  | Yes                    |
| Reentrant procedures           | Yes                        | Yes                  | Yes       | Yes                                 | Yes                  | Yes                    |
| File handling                  | Yes                        | Yes                  | Yes       | Yes                                 | Yes                  | Yes                    |
| Process I/O                    | Yes                        | Yes                  | Yes       | Yes                                 | Yes                  | Yes                    |

Tasking:
| Hierarchy of tasks             | Yes                        | No                   | No        | Yes                                 | No                   | No                     |
| Task state available           | Partially                  | No                   | No        | No                                  | Partially            | No                     |
| Task scheduling                | Yes                        | Yes                  | Yes       | Yes                                 | Yes                  | Yes                    |
| Task control                   | Poor                       | Complete             | Limited   | Complete                            | Poor                 | Complete               |
| Scheduling strategy            | Priority, FIFO             | —                    | Priority  | Priority                            | Priority, OSD        | Deadline               |
| Usage of priorities            | Yes                        | Yes                  | Yes       | Yes                                 | Yes                  | No                     |
| Dynamic priorities             | No                         | No                   | No        | Yes                                 | Yes                  | No                     |

Task synchronisation:
| Semaphores                     | No                         | Yes                  | Yes       | Yes                                 | No                   | No                     |
| Further synchronisation means  | Rendezvous, shared variable | Resource mark       | Block     | Region                              | Shared object, lock  | Monitor, event         |
| Resource reservation           | Implicitly                 | Yes                  | Yes       | Yes                                 | Yes                  | Yes                    |
| Resource state available       | Yes                        | Yes                  | Yes       | No                                  | Yes                  | Yes                    |
| Resource allocation strategy   | FIFO                       | OSD                  | Priority  | Priority                            | OSD                  | Deadline, priority     |
| Deadlock prevention            | No                         | No                   | No        | No                                  | Yes                  | No                     |

Event handling:
| Interrupt handling             | Yes                        | Yes                  | Yes       | Yes                                 | Yes                  | Yes                    |
| Enable/disable interrupts      | No                         | Yes                  | No        | Yes                                 | Yes                  | Yes                    |

Timing:
| Time available                 | Yes                        | Yes                  | Yes       | Yes                                 | Yes                  | Yes                    |
| Cumulative run time available  | No                         | No                   | No        | No                                  | No                   | Yes                    |
| Forms of time scheduling       | Delay                      | Various              | Delay, fixed date | Various                     | Delay, cyclic        | Various                |
| Timeouts                       | Yes                        | No                   | No        | No                                  | Yes                  | Yes                    |

Verification and exception handling:
| Exception handling             | Yes                        | No                   | No        | No                                  | Yes                  | Yes                    |
| Testing aids                   | Raise exception            | Call interrupt entry | Set event | Trigger interrupt, induce exception | Trace event, flag display | Raise exception, assert, mapping |

OSD — Operating System Dependent.
Real-Time Euclid is unique among the languages compared in that it is schedulability analysable. This means that, given a Real-Time Euclid program, it can always be ascertained statically whether the program will meet its timing constraints. To allow for schedulability analysis, Real-Time Euclid has no constructs taking arbitrarily long to execute, the number of tasks in the system may not be arbitrarily high, and the timing and activation/deactivation constraints of each task are expressed explicitly.

Verification and exception handling. Only Ada and Euclid provide a full exception handling mechanism. PEARL has the possibility of simulating interrupts under program control for testing purposes.

Language availability. As far as actual implementations are concerned, the situation is quite different for the six languages considered here. There is a vast number of Ada compilers available, whose target systems include all major microcomputers. Various dialects of FORTRAN and PL/1 with a broad range of different real-time capabilities are on the market. Suitable for consideration in a language comparison, however, are only the corresponding proposals for standardisation. Of the subroutine package constituting Industrial Real-Time FORTRAN, only subsets have been implemented experimentally. For extended PL/1, unfortunately, no implementations could be ascertained. LTR can be employed on SEMS' line of Mitra computers. Despite its good standing in this language comparison, it is being replaced by Ada for political reasons. PEARL programming systems are commercially distributed for some 50 different target computers, ranging from microcomputer to mainframe systems. They are based on some 10 compilers and corresponding real-time operating systems providing the required features.
Single board microcomputer systems, either containing complete PEARL programming environments or only object programs generated from PEARL sources, are shipped in large quantities and used in industrial, scientific, and military applications. Real-Time Euclid has been implemented at the University of Toronto. The system includes a compiler, a schedulability analyser, and a run-time nucleus. The target hardware is a distributed microprocessor system. The implementation and use of the system have taken place in a research environment only. Thus, Real-Time Euclid is not available commercially.
9.2
A Closer Look at Ada and PEARL
According to the availability considerations given in Section 9.1, Ada [44] and PEARL [45] are the only high level real-time languages readily applicable in industrial control environments, since they have been implemented on a wide range of computers, including the major 16 and 32 bit microprocessor series on which most contemporary process control computers are based. That is the reason to take a closer look at these two languages and to discuss their merits and deficiencies.
Evaluation of Ada. The programming language Ada emanated from an effort of the U.S. Department of Defence to make available a software production tool suited to programming embedded systems. Ada is supposed to save considerable software costs by replacing the vast variety of languages presently in use for DoD projects. The evaluation of the latter gave rise to requirements for a common DoD language, which have been refined several times. Correspondingly, Ada claimed to be the language of the eighties, also being apt for the formulation of real-time applications. Owing to its military background, one expects that Ada should provide remarkable properties with regard to safety, reliability, predictability of program behaviour, and especially with regard to timing features, because military and technical processes are rather time sensitive, and often require the observation of tight timing constraints with only small tolerances.

Remarkable in Ada are the language's elegant task synchronisation concept, viz., the rendezvous, and the possibility to control whether a rendezvous takes place within a given time frame. The semantics of the rendezvous is asymmetric: an invocation from within a task looks like a procedure call, and for the acceptance by the co-operating task there is a special accept statement. A non-deterministic selection among several rendezvous, provided by the select statement, enables the easy implementation of server tasks which offer services requested by other tasks through rendezvous invocations. The following example illustrates the use of the rendezvous concept by implementing a server task which models a semaphore.

   TASK semaphore IS
      ENTRY wait;
      ENTRY signal;
   END;

   TASK BODY semaphore IS
      s: INTEGER := 1;   -- initial semaphore value
   BEGIN
      LOOP
         SELECT
            WHEN s > 0 =>
               ACCEPT wait DO
                  s := s - 1;
               END wait;
         OR
            ACCEPT signal DO
               s := s + 1;
            END signal;
         END SELECT;
      END LOOP;
   END semaphore;

The semaphore operations are invoked by the statements semaphore.wait; and semaphore.signal;, respectively. The WHEN clause defines a precondition which enables or disables the corresponding ACCEPT statement. Current experience shows [46], however, that the rendezvous takes too much time to be carried through. On the other hand, the non-deterministic selection of an entry call by select statements, in case several rendezvous are simultaneously possible, introduces unpredictability into the program execution.

In the sequel, a series of shortcomings is mentioned making it questionable whether Ada's claims can be maintained as far as real-time features are concerned. Although Ada was intended to support the development of large software systems in a well structured way, experience shows that it "does not have the adequate capabilities to express and to enforce good system design" [47]. The reason for this may be that a language must reflect the users' way of thinking, and the users generally are engineers and technicians, not computer scientists. So, the language features must be easily conceivable, safe to handle, and application oriented. Ada has disadvantages in this respect, because some of its constructs, although very elegant, are very hard to understand.

With regard to conventional elements, bit handling is only indirectly possible. Initiation and termination are provided as the sole tasking elements. The only way of expressing time dependencies is to delay the execution of tasks by given periods. On the other hand, absolute time specifications are impossible. Hence, it is necessary to schedule the execution of tasks explicitly within their bodies, according to external demands whose arrivals are actively and, if need be, also selectively awaited. With an interrupt only a single task can be reactivated. The operations of enabling or disabling interrupts are not contained in the language.
Furthermore, the prevention of deadlocks is not supported. Another design goal of Ada was to support distributable and dynamically reconfigurable applications. Since the language does not provide any related expressive means [48, 49], it is impossible to formulate distributed software within its framework. The semantics of Ada imply that processors and other resources are dispatched according to the FIFO strategy with regard to priorities. Tasks are immediately activated when their declarations are elaborated. Task priorities may not be changed dynamically. Hence, also considering the above mentioned shortcomings and the absence of most of the features, discussed below, which ought to be present in a contemporary real-time language, it must be concluded that Ada does not constitute progress as far as its suitability for real-time applications is concerned.

The deficiencies of Ada summarised above have now been widely recognised. Some developers of large embedded real-time systems have even expressed their concern in recent workshops that Ada cannot be called a real-time language. Therefore, a number of international conference series are in progress, which discuss ways of appropriate
remedies. The goal is a substantial revision of Ada, called Ada-9X, to take effect in the late nineties.

Evaluation of PEARL. Although conceived a decade earlier, the "Process and Experiment Automation Real-time Language" PEARL provides much more expressive power for formulating real-time and process control applications than Ada. The reason may be that it was defined by electrical, chemical, and control systems engineers on the basis of their practical experience with industrial automation problems, and that special attention has been devoted to the timing aspects of the applications. The development of PEARL began in the late sixties and was sponsored by the Ministry of Research and Technology of the Federal Republic of Germany. There are two standardised (DIN) versions of PEARL, i.e., Basic PEARL and the superset Full PEARL.

Evaluating Table 9.1 and the comparison above, PEARL appears to be the most powerful high level real-time programming language for the automation of technical processes. As far as its algorithmic part is concerned, PEARL has approximately the same facilities as Pascal, although the notation is different. It provides the additional data types CLOCK and DURATION and corresponding arithmetic operations.

PEARL programs are composed of modules, which may be compiled separately. This structure supports the modular composition of complex software systems. Each module may contain a system division and a number of problem divisions. In the system divisions, the hardware configuration is described, with special regard to the process peripherals. User-defined identifiers are associated with hardware devices and addresses. Only these identifiers are referenced in the problem divisions, in order to enhance the documentation value and the portability of PEARL programs. When moving to different hardware, only the system divisions need to be modified.
On the other hand, PEARL modules thus contain complete information necessary to run them, relinquishing the need for the description of devices and files in the job control language of an operating system outside of PEARL. Besides high level file oriented input-output statements, PEARL also provides low level constructs to exchange information with process peripherals, which are necessary for automation and control applications. Neither of these input-output statements, however, communicates directly with peripherals, but with virtual data stations or "dations". They are mapped onto real devices in the system divisions, where all implementation details are encapsulated for portability reasons.

As already stated above, PEARL features the most comprehensive application-oriented range of elements for task scheduling and control, and for expressing time behaviour. Two types of schedules are generally available: one for single-event scheduling and the other for repetitive-event scheduling.

   schedule ::= single-event-schedule | repetitive-schedule

   single-event-schedule ::=
        AT clock-expression
      | AFTER duration-expression
      | WHEN interrupt-name

   repetitive-schedule ::=
        AT clock-expression ALL duration-expression UNTIL clock-expression
      | AFTER duration-expression ALL duration-expression DURING duration-expression
      | WHEN interrupt-name ALL duration-expression
           { UNTIL clock-expression | DURING duration-expression }

The following five statements for the control of task state transitions can be placed within the scope of a schedule clause. A task is transferred from the Terminated to the Ready state by an activation, in the course of which the task's priority may also be changed:

   [schedule] ACTIVATE task-name [PRIORITY positive-integer];

The operation inverse to this is termination:

   [schedule] TERMINATE task-name;

Running or ready tasks can temporarily be suspended with the help of:

   [schedule] SUSPEND task-name;

which is reversed by:

   [schedule] CONTINUE task-name;

Applied to the calling task itself, the last two operations are combined into:

   single-event-schedule RESUME;

All scheduled future activations of a task are annihilated by the execution of:

   PREVENT task-name;
In conclusion, it can be said that PEARL combines the advantages of general purpose and of special purpose real-time languages. Despite its merits, PEARL also has a number of deficiencies, as the interpretation of Table 9.1 reveals. The most significant one is the lack of well structured synchronisation primitives with temporal supervision. The experience gained with the application of PEARL has also suggested some further enhancements of the present PEARL standards. The features to be included in these new versions ("PEARL-90") comprise a few minor modifications of some non-real-time language elements, process graphics facilities, and the ones which are discussed in the next section.

PEARL for Distributed Systems. Distinguishing it from all other available
real-time languages, a third part of PEARL has recently been standardised in Germany [45], which allows for the programming of distributed applications and for dynamic system reconfiguration. This language extension contains elements for describing the hardware configuration of a distributed system, i.e., the stations, the physical communication network between them, and the attachments to the peripherals and the technical process, and elements for describing the software properties, viz., the distribution of software units among the stations, the logical communication channels and the corresponding transmission protocols, as well as how to react to system malfunctions.

Distributed PEARL programs are structured by grouping modules into collections, which are distributed to the single stations depending on the system state. These collections are the basic units for dynamic reconfiguration. Since collections may be moved between different stations, the system divisions describing the access to peripherals may not be contained in the collections. Instead, there is only one system division for each station in a given configuration. In addition, there is a global system division to enable access to devices attached to other stations. For the purposes of dynamic reconfiguration in response to changing system states, Distributed PEARL provides configuration divisions and corresponding executable statements for the loading and unloading of collections and for the establishment and discontinuation of logical communication paths. The detailed discussion of these facilities is omitted here, because they are very similar to the reconfiguration language elements that will be introduced below.

The communication between collections is performed solely by message passing employing the port concept. It avoids the direct referencing of communication objects in other collections and decouples the communication structure from the logic of message passing.
Ports represent the interface between collections and the outside world. There are input and output ports, and a collection may have several of them. In the problem divisions of collections, messages are routed exclusively by mentioning ports. The logical communication paths between collections in a certain configuration are established by executable statements connecting ports of different collections with each other. One-to-many and many-to-one communication structures may be set up. The logical transmission links can be mapped to the physical ones in three ways, i.e., by selecting certain lines, by specifying preferred lines, or by leaving the selection to the network operating system. Three different protocol types are available to carry out communications: the asynchronous no-wait-send protocol, the synchronised blocking-send protocol, and the rendezvous-type send-reply protocol. When using the asynchronous protocol for sending a message, the latter is passed to the network operating system for transmission and, if need be, buffering at the receiver. Applying the blocking-send protocol, after transmission the sender waits until it obtains an acknowledgement generated by the corresponding input operation in the receiving task. For the third type of communication protocol a one-to-many connection is not possible. Here, the sending task waits until the receiving one has processed the message and returned a result. The send and receive operations can be endowed with timeout mechanisms if a synchronous protocol is selected.

High Level Real-Time Programming

Table 9.2: Desirable real-time features.

- Application oriented synchronisation constructions
- Surveillance of the occurrences of events within time windows
- Surveillance of the sequences in which events occur
- Timeout of synchronisation operations
- Timeout of resource claims
- Availability of current task and resource state
- Inherent prevention of deadlocks
- Feasible scheduling algorithms
- Early detection and handling of transient overloads
- Determination of entire and residual task run-times
- Task oriented look-ahead virtual storage management
- Accurate real-time
- Exact timing of operations
- Dynamic reconfiguration of distributed systems when failures occur
- Support of software diversity
- Application oriented simulation regarding the operating system overhead
- Interrupt simulation and recording
- Event recording
- Tracing
- Usage of only static features if necessary
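The behaviour of the three protocol types just described can be illustrated outside PEARL; the following Python sketch models ports as queues and is purely illustrative (all names are invented):

```python
# Illustrative sketch of the three port protocols: no-wait-send returns
# immediately, blocking-send waits for an acknowledgement, and the
# rendezvous-type send-reply waits for the receiver's result.
import queue, threading

def no_wait_send(port, msg):
    port.put(msg)              # handed to the "network operating system"; sender continues

def blocking_send(port, ack, msg):
    port.put(msg)
    ack.get()                  # wait for the receiver's input operation to acknowledge

def send_reply(port, reply, msg):
    port.put(msg)
    return reply.get()         # wait until the receiver has processed the message

port, ack, reply = queue.Queue(), queue.Queue(), queue.Queue()

def receiver():
    m = port.get(); ack.put("ok")            # acknowledge receipt (blocking-send)
    m2 = port.get(); reply.put(m2.upper())   # process and return a result (send-reply)

t = threading.Thread(target=receiver); t.start()
blocking_send(port, ack, "hello")
result = send_reply(port, reply, "hello")
t.join()
print(result)   # HELLO
```

Note that only the rendezvous-type protocol returns a value to the sender, which is why it cannot serve one-to-many connections.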
9.3 Requirements for New High Level Language Features
When comparing the requirements for real-time languages and systems with the capabilities of existing languages, it becomes obvious that various elements especially important for the production of reliable software are still missing or are only rudimentarily present in available languages. We shall discuss these elements and give arguments for their necessity in the sequel. For ease of reference, a survey of these features is provided in Table 9.2. The real-time features considered in this section can be divided into three groups.
The first group comprises constructions easing the formulation of frequent applications, serving to supervise the states of tasks and resources, and controlling the duration of synchronisation and resource claim operations. The second group consists of desirable operating system services that should be provided for the purpose of reliable and predictable software performance. To be able to control these features, several language elements need to be added. Finally, software verification measures are collected in the third group of features. Their utilisation only requires a few control statements.

The leading idea behind all the proposals described is to facilitate reliable, predictable, and fault-tolerant program execution by providing robust and inherently safe language constructs, which generally aim to prevent software malfunctions. All this is a prerequisite for the safety licensing of software in hard real-time environments. This approach is oriented towards the objective of modern quality assurance, according to which not the detection of faults, but their systematic prevention is to be achieved.

Despite their fundamental significance in real-time applications, almost none of the languages considered allows the specification of completion deadlines for task executions. But processors ought to be scheduled using algorithms capable of guaranteeing the observation of the tasks' deadlines. This goal can generally not be achieved by dynamic priority schemes under control of the programmer, as supported by most languages and operating systems. Deadline driven scheduling algorithms also allow for the early recognition of whether it will be possible to process a task set in time. Otherwise, parts of the workload have to be removed.
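The early recognition referred to here can be sketched as a simple check: under deadline driven ordering on a single processor, a ready task set is processable in time exactly when each response time is at least the accumulated execution times of all tasks due up to it. A minimal Python illustration (the task attributes are invented):

```python
# Feasibility of a ready task set under deadline driven scheduling on a
# single processor: sort by response time a_k and verify that each a_k
# is at least the accumulated execution times l_1 + ... + l_k.
def feasible(tasks):
    # tasks: list of (response_time, execution_time) pairs
    tasks = sorted(tasks)                # index by increasing response time
    accumulated = 0
    for a_k, l_k in tasks:
        accumulated += l_k
        if a_k < accumulated:
            return False                 # this deadline would be missed
    return True

print(feasible([(3, 1), (5, 2), (9, 3)]))   # True: all deadlines can be met
print(feasible([(3, 1), (4, 2), (5, 3)]))   # False: transient overload detected
```

The second call flags an overload before any deadline is actually missed, which is exactly the point at which parts of the workload would have to be removed.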
In order to carry this through in an orderly and predictable manner, it is desirable that the language allows a specification of which tasks could be terminated, or at least be replaced by ones with shorter run-times, when required by an emergency or an overload situation. The implementation of due date observing scheduling algorithms must be supported by language elements for the determination of the tasks' run-times, or upper bounds for them, and for the updating of the residual run-times needed before the tasks' final completion.

To provide information about date and time, existing operating systems, and thus also real-time languages, completely rely on interval timers under control of a corresponding interrupt servicing routine. Since the processor load caused by the latter must be kept reasonably small, the resolution of the time information remains rather poor. The unpredictable operating system overhead introduces further inaccuracies, which become the more intolerable the longer systems run after restarts. Therefore, highly accurate real-time clocks should be provided as hardware components. The attention of the supervising software will be initiated by an interrupt, to be generated when the actual time equals the next critical moment that has been loaded into a comparison register. Thus, the requirements of scientific and technical applications, also with regard to very short time intervals, can be met. Provisions should be made in the language to suspend tasks just before performing certain operations, e.g. inputs. When the clock's interrupt signal can be directly utilised to resume the execution of tasks waiting in such a manner, the accurate timing of single operations is made possible. In this way, for example, measurements can be carried out at precisely equidistant or Gaussian points.

Based on this timing hardware, several time related surveillance features, which otherwise would have to be explicitly programmed, should be expressible within a language. Thus, in control applications it needs to be guaranteed that events occur within given time frames or in predefined sequences. As can be seen in Table 9.1, Ada, PL/1, and Real-Time Euclid already provide a timeout feature for synchronisation operations. Beyond that, the process of claiming resources and assigning them to tasks must be supervised in a real-time environment.

When developing process control programs, not only the software correctness, in the sense of mathematical mappings as in batch processing environments, has to be proved; also, its intended behaviour in the dimension time, and the interaction of concurrently active tasks, need verification. A step towards enabling safety licensing of real-time programs is to create the possibility of application oriented simulation. A test procedure oriented at hard real-time environments must yield exact information on whether time constraints and due dates can be observed, in distinct contrast to the prevailing "hope and pray" practice of assuming that computers are so fast that they will easily master the given workload. This would also allow determination of the adequate computer capacity needed for a certain application. Moreover, the processing time for the operating system overhead has to be taken into consideration in the course of a simulation.
In the test phase, interrupts must be simulated and logged, in order to check whether a task set can cope with the requirements of a hard real-time environment. On the other hand, all events occurring during the execution of such a task set are to be recorded for subsequent error analysis. This feature is also useful in the later routine application of a software package, where it can ease the post mortem error search.

To guarantee a predictable program execution, every task should be fully expressible in terms of static language elements wherever necessary. It has already been mentioned that requested resources must be granted within predefined time frames. Finally, it is to be stressed that real-time languages ought to make the occurrence of deadlocks impossible by employing a priori prevention schemes.
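At its core, the event recording demanded here is an append-only, timestamped trace that can be replayed for post mortem analysis. A minimal Python sketch of such a recorder (all names are invented; a sequence counter breaks timestamp ties):

```python
# Append-only event trace for post mortem error analysis: every event,
# including simulated interrupts, is recorded with a timestamp and a
# sequence number and can be replayed in order of occurrence.
import itertools, time

class EventRecorder:
    def __init__(self):
        self._seq = itertools.count()
        self.trace = []
    def record(self, source, event):
        self.trace.append((time.monotonic(), next(self._seq), source, event))
    def post_mortem(self):
        # replay the recorded history in order of occurrence
        return [(src, ev) for _, _, src, ev in sorted(self.trace)]

recorder = EventRecorder()
recorder.record("task_measure", "activated")
recorder.record("interrupt_7", "raised")          # a simulated interrupt
recorder.record("task_measure", "deadline missed")
print(recorder.post_mortem())
```

A real implementation would record into pre-allocated storage to keep the recording overhead itself predictable.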
9.4 High-Integrity PEARL
The requirements stated in Section 9.3 define a paradigm of computing that is very different from traditional computing paradigms. To bring those ideas to the industrial real-time software writer, i.e., to facilitate the writing of real-time programs with predictable behaviour, an extension of PEARL, referred to as High-Integrity PEARL or HI-PEARL, is defined. The original description of HI-PEARL can be found in [50].
Monadic Operators. State information on tasks and synchronisers should be accessible at the programming language level. The proposed functions yield results of type fixed. Given a certain model of task states, the function:

TSTATE(task-identifier)

returns the code of the actual task state. Similarly, the function:

SYNC(shared-object)

interrogates the state of the shared-object, and returns the value 0 or -1 in the unreserved or exclusive access state, respectively, or the number of tasks having shared reading access.

Surveillance of the occurrence of events. For the surveillance of whether, and in which sequence, events occur, we propose a new language feature. In this context we use a wider notion of events, comprising:

- interrupts,
- exceptions,
- time events,
- state transitions of synchronisers and tasks, and
- the assumption of certain relations between shared variables and given values.

These events may be stated according to the following syntax rules:

event ::= WHEN interrupt-expression |
          ON exception-expression |
          AT clock-expression |
          AFTER duration-expression |
          state-function-call relational-operator expression |
          shared-variable relational-operator expression

where state-functions are the ones introduced in the previous paragraph. In the case of the last two of the above alternatives, the events are raised when the corresponding relations turn true. Now the new language element is defined as the statement:

EXPECT
alternative-list
FIN;

with:

alternative ::= AWAIT event-list DO statement-list
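The intended semantics — suspend until one of the awaited events occurs, execute the associated statement list, then wait again until QUIT transfers control behind FIN — can be approximated outside PEARL. A hedged Python sketch under these assumptions (event names and handlers are invented):

```python
# Sketch of EXPECT semantics: the task blocks until an awaited event
# occurs, runs the matching DO list (alternatives are examined in
# textual order, which keeps the construct deterministic), and keeps
# waiting until QUIT is executed.
class Quit(Exception):
    pass

def expect(alternatives, event_source):
    # alternatives: list of (awaited-event set, handler); handlers may raise Quit
    try:
        for occurred in event_source:              # suspend until an event occurs
            for events, handler in alternatives:   # deterministic textual order
                if occurred in events:
                    handler(occurred)
    except Quit:
        pass                                       # control passes behind FIN

log = []
def on_alarm(e): log.append("alarm handled")
def on_done(e): log.append("finished"); raise Quit()

expect([({"alarm"}, on_alarm), ({"done"}, on_done)],
       iter(["alarm", "alarm", "done", "alarm"]))
print(log)   # ['alarm handled', 'alarm handled', 'finished']
```

The final "alarm" event is not reacted to, because QUIT has already left the EXPECT block.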
When the program flow of a task monitoring events reaches an EXPECT block, the expressions contained in the event specifications are evaluated and the results stored. The task is suspended until any one of the events mentioned in the AWAIT clauses occurs. Then the statement list following the associated DO keyword will be executed. In case several events listed in different AWAIT clauses occur simultaneously, the corresponding DO clauses will be performed in the sequence in which they are written down. When the operations responding to the just occurred event(s) have been executed, the task is again suspended, expecting further events. For leaving the described construction after finishing a surveillance function, and to transfer control behind the block end FIN, the statement:

QUIT;

has to be applied. When this has been done, or when the block has been left otherwise, there will be no further reaction to the events mentioned in the EXPECT block. The EXPECT statement allows one to formulate the surveillance of input-output operations' execution and time behaviour. It corresponds to the selective wait statement of Ada but, this being especially important for real-time systems, it has fully deterministic semantics.

Protection of Resources and Temporal Surveillance of Synchronisation Operations. For the purpose of securing data integrity when several tasks access the same resource, most languages only provide very basic means, such as semaphores and regions. Applying them for the locking of resources veils the nature of the operation to be performed, since there is no obvious and verifiable relation between resource and synchroniser. Furthermore, programming errors become possible by not releasing requested resources, which cannot be detected by the compiler owing to the missing syntactical connection between the mutually inverse synchronisation operations. On the other hand, sophisticated concepts like monitors, or tasks to be accessed by rendezvous, are also error-prone owing to their complexity.

The access to shared objects can be protected with the help of implicit, "invisible" regions to be generated by the compiler. For instructing it to do so in the case of shared basic objects, arrays, and structures, we introduce the optional attribute:

SHARED

as part of the pertaining declaration syntax. As data types of shared variables, array elements, and structure components, respectively, only basic ones are admitted, because sharing others either leads to difficulties or is meaningless. For encapsulating the access to protected resources, and to enforce the release of synchronisers, we introduce a LOCK statement with a timeout clause:
LOCK synchronisation-clause-list [NONPREEMPTIVELY] [timeout-clause] [exectime-clause];
PERFORM
statement-list
UNLOCK;

The three clauses of the above statement are defined by:

synchronisation-clause ::= EXCLUSIVE(shared-object-expression-list) |
                           SHARED(shared-object-expression-list)
timeout-clause ::= TIMEOUT {IN duration-expression | AT clock-expression}
                   OUTTIME statement-list FIN
exectime-clause ::= EXECTIMEBOUND duration-expression
The task executing a LOCK statement waits until the listed shared objects can be requested in the specified way. By providing a TIMEOUT attribute, the waiting time can be limited. If the lock cannot be carried through before the time limit expires, the statements of the OUTTIME clause will be executed. Otherwise, control passes to the statement sequence of the PERFORM clause as soon as the implied request operations become possible. The corresponding releases will be performed automatically upon reaching UNLOCK, or when terminating the construction with the help of the:

QUIT;

statement. For reasons of program efficiency, seized resources ought to be freed as early as possible. To this end, the instruction:

UNLOCK shared-object-expression-list;

can be applied already before dynamically reaching the end of the surrounding LOCK statement, where the remaining releases will be carried through.
can be applied already before dynamically reaching the end of the surrounding LOCK statement, where the remaining releases will be carried through. The optional exectime clause is introduced to enhance the predictability and safety of real-time systems. It limits the time, during which a task is in a critical region. When the time specified by duration-expression expires, a system exception is raised. Thus, it is prevented that programming errors resulting in infinite loops can cause a blockage of the whole system. The optional attribute NONPREEMPTIVELY serves to improve performance. By specifying it, the operating system is instructed not to preempt the execution of the LOCK statement due to reasons of the applied processor scheduling strategy. Thus, superfluous and time-consuming context switching operations can be saved in the case, where a more urgent task requesting one of the locked resources commences execution before termination of the LOCK statement.
Except for their appearance in synchronisation clauses, shared objects may only be referenced within the framework of LOCK statements. They must be locked for exclusive access if the reference is used in any assignment context. In addition, deadlock prevention schemes are well supported by the LOCK statement, as will be shown below. An illuminating discussion of the more advanced features of the LOCK statement can be found in [51].

Exact timing of operations. Although most languages allow time schedules to be specified for tasking operations, one cannot be sure when these actually take place. Since this situation is unacceptable in many industrial and scientific applications, a language element appears to be necessary to request the punctual execution of tasking operations, i.e., the critical moments need to be actively awaited. For that purpose an optional attribute:

EXACTLY

to be placed in time schedules will be used.

Parallel processing and precedence relations of task sets. Although parallelism can be expressed and implemented with the task concept, it is sometimes necessary to request parallel execution of certain activities in a more explicit and specific way. That is the rationale for introducing the following language feature:

PARALLEL
activity-list
FIN;

with:

activity ::= ACTIVITY statement-list
Naturally, PARALLEL statements may be nested and applied in sequence. By including task activation statements in the various ACTIVITY statement lists, they are particularly useful for expressing the bundling of task executions and for specifying precedence relations holding between sets of tasks comprising arbitrary numbers of predecessor and successor tasks.
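The bundling expressed by PARALLEL corresponds to fork-join execution: control passes beyond FIN only when every activity has finished, which is what establishes precedence between successive bundles. A sketch using Python threads (the activity bodies are invented):

```python
# Fork-join sketch of the PARALLEL statement: all ACTIVITY bodies run
# concurrently; control continues past FIN only after all have finished,
# so sequenced PARALLEL statements express predecessor/successor sets.
import threading

def parallel(*activities):
    threads = [threading.Thread(target=a) for a in activities]
    for t in threads:
        t.start()              # ACTIVITY bodies begin concurrently
    for t in threads:
        t.join()               # FIN: wait for the whole bundle

done = []
# predecessor set: both activities must complete before the next bundle
parallel(lambda: done.append("measure"), lambda: done.append("filter"))
# successor set: starts only after the predecessors have finished
parallel(lambda: done.append("control_output"))
print(sorted(done))            # ['control_output', 'filter', 'measure']
```

Within a bundle the completion order of "measure" and "filter" is nondeterministic, but "control_output" is guaranteed to come last.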
9.5 Advanced Features of High-Integrity PEARL
Inherent prevention of deadlocks. Employing the LOCK statement introduced in Section 9.4 for claiming resources, it already becomes possible for the compiler to verify the correct application of either of the two best known deadlock prevention schemes [39].

According to the complete resource allocation procedure, all required shared resources must be reserved en bloc before entering the critical region where they are used. This can easily be accomplished by mentioning them in the synchronisation-clause-list of the LOCK statement. To ensure deadlock-free operation, the compiler now only has to examine whether no further resource requests appear within the PERFORM clause.

In order to apply the hierarchical resource allocation or partial ordering method, an ordering of the shared objects needs to be declared. An appropriate facility for when this deadlock prevention scheme is to be applied is introduced as the statement:

RESOURCE HIERARCHY sync-object hierarchy-clause-list;

with:

hierarchy-clause ::= > sync-object

Nesting of LOCK statements can then be allowed, as long as the sequence in which the resources are claimed complies with the predefined hierarchical ordering.

Application of feasible scheduling algorithms. As was pointed out in Section 9.3, tasks ought to be scheduled employing procedures capable of guaranteeing the observation of strict deadlines, as usually set in hard real-time environments. The implementation of feasible scheduling algorithms, like the deadline driven one, which is optimal for single processor systems, needs to be accompanied by language elements for specifying the due dates, and for determining total and residual run-times, or at least upper bounds thereof, required by tasks to reach their completion. Furthermore, it should be possible to state in the source program which tasks could be terminated, or at least be replaced by ones with shorter run-times, when required by an emergency or an overload situation.

We begin by replacing priority specifications in task declarations and in task activation statements by an optional deadline specification of the form:

DUE AFTER duration-expression

When the condition for a task activation is fulfilled, this duration is added to the actual time, yielding the task's due date. As an additional parameter, the deadline driven algorithm requires the task's (residual) run-time, which is stated in its declaration in the form:

RUNTIME {duration-expression | SYSTEM}

In general, the given duration can only be an upper bound for the required processing time. If the latter is not known, the keyword SYSTEM can be inserted, thus instructing the compiler to supply it according to a method outlined below. For use by the scheduling algorithm, three variables have to be allocated for each task T:
T.DUE is a constant of type clock, which stores the task's due date,

T.TIME is a variable of type duration, which is initially set to zero and is increased to accumulate the task execution time,

T.RESIDUAL is a variable of type duration, which is initialised with the RUNTIME parameter, and then decremented to provide the residual time interval needed to complete the task properly.

When the due dates and execution times of tasks are a priori available, a feasible scheduling algorithm is able to detect, using the following necessary and sufficient conditions ([52]), whether a task set given at a certain moment can be executed meeting the specified deadlines. For any time t, 0 ≤ t < ∞, and any task T with deadline t_z > t, let

a(t) = t_z − t

be its response time, l(t) the (residual) execution time required before completion, and

s(t) = a(t) − l(t)

its laxity (slack-time, margin). Then, necessary and sufficient conditions that a task set, indexed according to increasing response times of its n elements, can be carried through on m processors meeting all deadlines are:

a) for m = 1, i.e., for single processor systems:

a_k ≥ Σ_{i=1}^{k} l_i ,   k = 1, ..., n ,     (9.1)

b) and for m > 1, i.e., for homogeneous multiprocessor systems:

a_k ≥ (1/m) [ Σ_{i=1}^{k} l_i + Σ_{i=k+1}^{n} max(0, a_k − s_i) ] ,   k = m, ..., n − m + 1 ,     (9.2)

a_k ≥ (1/(n − k + 1)) [ Σ_{i=1}^{k} l_i + Σ_{i=k+1}^{n} max(0, a_k − s_i) − Σ_{i=n−m+1}^{k−1} a_i ] ,   k = n − m + 2, ..., n .     (9.3)

For k = 1, ..., m − 1 Equation 9.2 must be valid, except if there are j tasks with a_k > s_i for k < i ≤ n and j + k ≤ m. Then, the condition:

a_k ≥ (1/(k + j)) [ Σ_{i=1}^{k} l_i + Σ_{i=k+1}^{n} max(l_i, a_k − s_i) ]     (9.4)

must be fulfilled.

If at least one of the above inequalities is violated, a transient overload situation must be handled. In order to carry this through in an orderly and predictable manner, all interrupts will be masked, all tasks will be terminated, and the schedules for further activations of those tasks, whose declarations do not contain the optional attribute:

KEEP

will be deleted. Then the remaining tasks, together with the emergency tasks scheduled for the event of an overload condition, will be processed. An overload, and consequently a discharge of load, may also arise as an effect of an error recovery. Since the recovery from an error which occurred in a certain task requires additional time, the ready task set may lose its feasible executability. The KEEP attribute provides the appropriate means to cope with such a situation and to implement fault-tolerant software behaviour.

Run-time estimation of tasks. As long as the program flow is strictly sequential, there is no problem with the run-time evaluation, since only the processing times of the corresponding compiler generated instructions need to be summed up. The language elements covered by this argument are assignment statements, in-line and access function evaluations, interrupt masking statements, i.e., ENABLE and DISABLE, branching to and from procedures, and operations invisible to the programmer which are caused by the block structure. Whereas run-time determination is exact for the above mentioned features, in all other cases one cannot expect more than to obtain an upper bound by carrying out an appropriate estimation.

As far as IF and CASE statements are concerned, an upper bound is yielded by adding the time required for the evaluation of the conditional expressions occurring in these statements to the maximum of the execution times of the two or more alternatives provided.

The LOOP statement causes difficulties, because the number of times the instructions within the loop are performed generally depends on the values of variables and is, therefore, a priori not determined. In order to enable the run-time estimation for LOOP statements, we augment its syntax by the following clause:

MAXLOOP fixed-literal EXCEEDING statement-list FIN;

If the number of iterations exceeds the limit set with this clause, the loop execution is terminated and the statement list specified after the keyword EXCEEDING is carried out, before control passes to the first statement behind the loop. At first, this restriction seems to be a serious drawback. But, on the other hand, it enhances the reliability essential for real-time software, because faulty programming cannot lead to infinite loops and thereby to system hang-ups.

From the viewpoint of run-time determination, most concern is caused by the GOTO statement, in addition to its harmfulness resulting from the fact that its unrestricted use may produce unstructured and, hence, error-prone programs. This was perceived at an early stage and gave rise to the development of structured programming. Since this discipline has revealed that in the majority of cases the application of jumps can be avoided, provided appropriate structured language elements are available, we can confine the usage of GOTO statements to the purpose of leaving loops, blocks, and other constructions when some code is to be skipped. Thus, both the software reliability is enhanced and the run-time estimation is enabled, because jumps can simply be disregarded. The restricted use of the GOTO statement can be enforced by the compiler.

The span between the times a synchronisation operation requesting resources is reached and finally carried through is generally undetermined. An execution time bound for the LOCK statement introduced in Section 9.4, however, can easily be obtained by adding the time required to claim the specified resources, the maximum waiting time specified in the TIMEOUT clause, and the maximum of the execution times for the OUTTIME statement list on the one hand and for the PERFORM statement list, including the release of the synchronisers, on the other.

By combining the above stated estimation techniques and applying them recursively, run-time bounds for program modules such as blocks, procedures, and task bodies can be determined. In particular, the method can also be applied to procedures which are part of a language's run-time package. Such procedures are invoked by the input-output statements and the tasking statements. From the viewpoint of run-time determination, temporal suspensions of tasks need not be considered, since for the scheduler only the residual run-time required after a task's resumption to reach its final completion is a significant quantity. Statements simulating external interrupts or exceptions can be excluded in the context of run-time estimations, because they are only useful in the software testing phase.

In order to take the run-time of exception handling into account, an exception handler is regarded as a procedure whose maximum run-time can be estimated. This is added to the execution times of those parts of the task which are processed when the exception handler is invoked. Finally, taking the maximum of the task run-time without exception handling and of all paths through the task which include calls of the exception handlers yields an overall upper bound for the task's execution time. According to [53], however, the necessity of exception handlers ought to be avoided wherever possible. This can be achieved by incorporating, into the application programs, appropriate checks before the statements whose execution may cause the invocation of an exception handler, and by providing suitable alternatives. This procedure is also advantageous from the standpoint of run-time determination.

A run-time estimation for an EXPECT block can be derived by considering the feature itself, the preceding, and the succeeding parts of the surrounding task as separate tasks. At first, the initial part of the task surrounding the EXPECT feature is performed, finishing with the activation of all alternative actions of the latter as separate tasks, scheduled for the occurrence of the events specified in the respective WHEN clauses. Taking the maximum execution time of all alternatives present in the EXPECT block provides an upper bound for its processing time upon each activation or resumption, respectively. Finally, the task's continuation will be scheduled for the end of the EXPECT element. As additional WHEN clauses, actions can also be specified which are performed in case certain time limits are exceeded. Thus, utilising the given time conditions, the overall run-time estimation of the task surrounding an EXPECT construct becomes possible.

The above discussion has shown that an exact calculation of a task's run-time will be an exceptional case. But it will often be possible to improve the estimation of a task's residual run-time in the course of its execution, e.g., when control reaches a point where two alternative program sequences of different length join again. To this end, the statement:

UPDATE task-identifier.RESIDUAL := duration-expression;

is introduced for setting the residual run-time to a new and lower value.

Owing to inaccuracies in the run-time estimation, overloads may be detected and handled although they would not occur when executing the corresponding ready task sets. This is the price which has to be paid for the predictable and safe operation gained. It was the rationale for the provision of the UPDATE statement to minimise the occurrence frequency of the mentioned effect. By the thinking criteria of classical computer science, the lower processor utilisation achieved with this run-time estimation technique may be considered a serious drawback. However, as pointed out in Section 2.1, processor utilisation is not a feasible design criterion for embedded real-time systems, as costs have to be seen in the framework of the controlled external process and with regard to the latter's safety requirements.

The run-time estimation techniques outlined above can be implemented in the form of a pre-processor working on the source code and a post-processor interpreting the compiler generated assembly language code. The detailed algorithms and structures of the automatic software analyser will be given in Chapter 10.

Support of task oriented virtual storage management. In [51] nearly optimal look-ahead algorithms for virtual storage administration were given, which employ the code of entire tasks as paging elements and which are closely related to deadline driven scheduling. The idea behind these algorithms is that the mentioned scheduling method orders the elements of ready task sets according to increasing response times. Since the tasks are processed in this sequence, an ordering of the corresponding storage accesses to program code and data by increasing forward distance is also implied. This observation suggests basing a virtual storage administration scheme on task objects as paging elements, thus representing an application oriented realisation of working sets. It is even possible to take not only the ready tasks into account, since, for utilisation in a more sophisticated look-ahead algorithm, information is available in process control systems specifying when temporarily non-ready tasks will be activated. When the latter event occurs, these tasks may supersede other ones that became ready at earlier points in time.

The information mentioned above can be extracted by considering buffered task activations and the schedules for task activations and continuations. The calculation of the required parameters is easily possible for time dependent schedules only. To enable a similar determination of future task activations in the case of interrupt driven schedules, the user must supply an average occurrence frequency for each interrupt. This can be achieved by augmenting the interrupt definition in the system division with a corresponding optional attribute:

INTERVAL duration-literal;

Some other features can also be utilised for providing further directives for the storage management. System data and shared objects with GLOBAL scope should be placed, together with the supervisor, in a non-paged storage area. The same semantics may be assigned to the attributes RESIDENT and REENTRANT of tasks and procedures, respectively.

Dynamic reconfiguration of distributed systems. An objective for the utilisation of a distributed computer system is to provide the ability to react to erroneous states of some of its components. To this end, the available hardware redundancy can be employed. It necessitates the capability of dynamic reconfiguration, i.e., an automatic migration of software modules from a malfunctioning unit to an operational one. However, it should be kept in mind that the applicability of dynamic reconfiguration for maintaining the operability of a process control system is relatively limited. This is due to the fact that control applications are heavily input-output oriented and that the relocation of software to other processors is meaningless when the peripherals hard-wired to a faulty processor cannot be used any more.
In order to apply the concept of dynamic reconfiguration, a language construct is needed which serves to specify how software modules are assigned to the various processors initially, and how this assignment is to be rearranged upon the occurrence of different faults. Such a configuration statement is introduced as follows:

CONFIGURATION
initial-part
[reconfiguration-part-list]
ENDCONFIG;

with:

initial-part ::= load-clause-list
reconfiguration-part ::= STATE Boolean-expression
                         remove-clause-list
                         load-clause-list
                         ENDRECONF;
load-clause ::= LOAD task-identifier TO processor-identifier;
remove-clause ::= REMOVE task-identifier FROM processor-identifier;
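The load/remove semantics can be modelled as a table of task-to-processor assignments that is rewritten whenever a fault condition becomes true. A Python sketch under this assumption (the processor and task names are invented):

```python
# Sketch of CONFIGURATION semantics: an initial task-to-processor
# assignment, rewritten by each reconfiguration part whose STATE
# condition (a fault indicator) holds.
def configuration(initial_loads, reconfiguration_parts, faults):
    assignment = dict(initial_loads)               # initial-part: load-clauses
    for condition, removes, loads in reconfiguration_parts:
        if condition(faults):                      # STATE Boolean-expression
            for task in removes:                   # REMOVE task FROM processor
                assignment.pop(task, None)
            assignment.update(loads)               # LOAD task TO processor
    return assignment

initial = {"control": "cpu1", "logging": "cpu2"}
# if cpu1 fails, migrate the control task to cpu2
reconf = [(lambda f: "cpu1" in f, ["control"], {"control": "cpu2"})]

print(configuration(initial, reconf, faults=set()))      # initial assignment unchanged
print(configuration(initial, reconf, faults={"cpu1"}))   # control now runs on cpu2
```

As the chapter notes, such migration only helps when the migrated task does not depend on peripherals hard-wired to the failed processor.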
The CONFIGURATION statement contains an initial part and may contain reconfiguration parts. The assignment of tasks to the processors present in the system for undisturbed normal operation is specified in the initial-part by a list of load-clauses. For an error condition, represented by a Boolean-expression in the STATE block of a reconfiguration-part, the latter determines how the software is to be redistributed when the error occurs. This is carried out by specifying, with the help of remove-clauses and load-clauses, which tasks are to be removed from certain processors and which tasks are to be loaded onto other ones.

Graceful degradation. The concept of graceful performance degradation by utilising imprecise results of computation was proposed in [54]. In a system that supports imprecise computations, intermediate results produced by prematurely terminated server tasks are made available to their client tasks. By making results of poorer quality available when the results of desirable quality cannot be obtained in time, real-time services, possibly of degraded quality, are provided on a timely basis. This approach makes it possible to guarantee schedulability of tasks while the system load fluctuates.

In order to utilise the imprecise computation approach, real-time tasks need to have the monotone property: the accuracy of their intermediate results is non-decreasing as more time is spent to obtain the results. The result produced by a monotone task when it terminates normally is the desired result and is called the precise one. External events such as timeouts and failures may cause a task to terminate prematurely. If the intermediate result produced by a server task when it terminates prematurely is saved and made available, a client task may still find the result usable and acceptable. Such a result is then called imprecise.
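A monotone task in this sense can be sketched as an iterative refinement whose latest intermediate result is saved, so that premature termination still yields a usable, if imprecise, value. The pi-approximating server task below is an invented example chosen only because its accuracy grows monotonically with the time spent:

```python
# Sketch of the imprecise computation approach: a monotone server task
# (Leibniz series for pi) saves every intermediate result; terminating
# it prematurely yields an imprecise but still usable value.
def monotone_pi(max_iterations):
    intermediate = 0.0
    for i in range(max_iterations):
        intermediate += 4.0 * (-1) ** i / (2 * i + 1)
        yield intermediate            # each refinement is made available

def run_with_budget(task, budget):
    result = None
    for _, result in zip(range(budget), task):
        pass                          # the scheduler may stop the task at any step
    return result                     # precise if the task finished, else imprecise

imprecise = run_with_budget(monotone_pi(1_000_000), budget=100)   # terminated early
precise = run_with_budget(monotone_pi(1000), budget=1000)         # ran to completion
print(abs(precise - 3.14159) < abs(imprecise - 3.14159))          # True
```

The scheduler's freedom lies in choosing the budget: any budget at or above the task's minimum execution time yields an acceptable result.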
The imprecise computation approach makes scheduling hard real-time tasks significantly easier for the following reason: to ensure that all deadlines are met, it is only required that the processor time assigned to every task, i.e., the amount of time during which the processor is assigned to run a task, be equal to or larger than the amount of time required for the task to produce an acceptable result, which is called
its minimum execution time. The scheduler may choose to terminate a task any time after it has produced an acceptable result. It is not necessary to wait until the execution of the task is finished. Consequently, the amounts of processor time assigned to tasks in a valid schedule can be less than the amounts of time required to finish executing the tasks.

The prerequisite for employing the imprecise computation approach is that real-time tasks possess the monotone property. Iterative algorithms, multiphase protocols, and dynamic programming are examples of methods giving rise to such tasks. Unfortunately, tasks with the monotone property occur relatively seldom in process control applications. This prevents the utilisation of the very promising concept of imprecise results in its basic form. In the next paragraph a modified version is presented which makes the concept feasible for control and automation applications. In order to detect and handle errors on the basis of diversity and to cope with transient overloads, the altered concept is employed to provide graceful degradation of a system in response to the occurrence of faults.

Transient overloads. Despite the best planning of a system, there is always the possibility of a transient overload of a node resulting from an emergency situation. To handle such a case, load sharing schemes which migrate tasks between the nodes of distributed systems have been considered. In industrial process control environments, however, such schemes are generally not applicable. Unlike computing tasks, control tasks are highly input-output bound, and the permanent wiring of the peripherals to certain nodes makes load sharing impossible. In addition to the overload handling scheme using the KEEP attribute for tasks, which was described earlier in this section, a more sophisticated method based on the concept of imprecise results can be introduced.
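As an illustration (not taken from the text), a monotone task can be sketched as an iterative computation whose accuracy only improves with the iteration budget the scheduler grants; terminating it after its minimum execution time still yields an acceptable, imprecise result. The function name and the accuracy thresholds are assumptions of this sketch:

```python
# Illustrative sketch of a monotone task: square-root refinement by
# Newton's method. Accuracy is non-decreasing in the time (iterations)
# spent, so a prematurely terminated run still yields a usable result.

def monotone_sqrt(x, budget_iters):
    """Return the best estimate reachable within the iteration budget."""
    estimate = x  # crude initial (imprecise) result
    for _ in range(budget_iters):
        estimate = 0.5 * (estimate + x / estimate)  # each step only improves
    return estimate

precise = monotone_sqrt(2.0, 20)    # ample processor time: precise result
imprecise = monotone_sqrt(2.0, 2)   # terminated early: imprecise result
assert abs(precise - 2.0 ** 0.5) < 1e-12
assert abs(imprecise - 2.0 ** 0.5) < 0.01   # still acceptable to many clients
```

Here the scheduler's decision reduces to choosing the iteration budget at or above the task's minimum; any early termination point still delivers a result of non-decreasing quality.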
The method's basic idea is that the programmer provides, for each task, alternatives of its executable part, i.e., the task body. To this end, the run-time attribute of a task declaration may be given in a third form:

RUNTIME SELECTABLE

Such a task provides alternatives of different run-times for its executable part. They are made available for selection by the task scheduler in the framework of the following language construct:

TASK BODY alternative-list FIN;

with:

alternative ::= ALTERNATIVE statement-list
                WITH RUNTIME {duration-expression | SYSTEM};
The run-time parameters of each alternative are specified in the same form as for
tasks, which was described earlier. The compiler sorts the alternatives in decreasing order of their run-times. Thus, when a task with this feature is activated, the task scheduler can select for execution the first task body alternative in the ordered list which allows feasible schedulability. Since the order generally conforms with decreasing quality and accuracy of the results produced by a task, this scheme represents an implementation of the requirement of graceful degradation of system performance when a transient overload occurs.

The usefulness of the above approach can be shown by considering the following example. In process control applications, a frequently occurring task is the periodic sampling of analogue or digital external values. In the presence of a transient overload, for one or a few measuring cycles, a good estimation of an external value can be provided to client tasks by extrapolating the value from the most recent measurements. Since no driver programs need to be executed, the task's execution time is reduced to a very small fraction. Nevertheless, acceptable results are delivered and, thus, a system break-down is prevented.

Diversity based error detection and handling. Software diversity can be used to detect and cope with programming errors. In general, however, this approach is rather costly, because several different versions of a program module need to be provided and executed either in parallel or sequentially. Here, savings can be achieved by employing the concept of imprecise results again. Instead of having different modules, which are to yield the same results, only one module is provided that determines the precise results. The latter are compared with estimations, i.e., imprecise results, generated by less complex and faster to execute software modules. This is performed in the framework of the following language structure:

DIVERSE alternative-list ASSURE statement-list
FIN;
with:

alternative ::= ALTERNATIVE statement-list
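A possible reading of these semantics, sketched in Python with invented names (the sensor history, the extrapolation, and the margin are illustrations, not part of the language proposal): the precise module's result is verified against a cheaper diverse estimate, and the imprecise estimate is used when verification fails.

```python
# Hypothetical sketch of the DIVERSE ... ASSURE ... FIN semantics:
# one alternative delivers the precise result, a simpler diverse
# alternative a fast estimate; the ASSURE part compares them.

def assure(precise_value, estimate, margin):
    """Release the precise value if both results agree within the margin;
    otherwise signal an error and fall back on the imprecise estimate."""
    if abs(precise_value - estimate) <= margin:
        return precise_value, False     # verified; no error detected
    return estimate, True               # error detected; degrade gracefully

# A plausible sensor history and a linear extrapolation from it:
history = [20.0, 20.4, 20.8]
estimate = history[-1] + (history[-1] - history[-2])   # 21.2

assert assure(21.1, estimate, margin=0.5) == (21.1, False)  # sample released
value, error = assure(35.0, estimate, margin=0.5)           # implausible sample
assert error and value == estimate                          # estimate used
```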
Each alternative constitutes a diverse software module. When all occurring alternatives are executed, the comparison of their results is carried through in the ASSURE clause. Its function is to verify the precise results. If, however, an error is detected, imprecise results can be assigned to the result variables, which will then be used in the continuation of the program. The rationale behind this approach is that the processes of obtaining the approximations are more reliable, since the corresponding software modules are less complex.

Naturally, the above described semantics could also be implemented with if-then-else statements. The proposed language feature, however, provides the possibility
for compiler and operating system to distribute the single alternatives to different processors and, thus, for their parallel execution.

Consider again the example of the last paragraph. One can notice that the approach corresponds to the well known procedure for validating a sampled external value. The measured value is compared with the result supplied by an extrapolation procedure; if both values differ only within a predetermined margin, the ASSURE clause releases the sampled one for further usage in the program. Otherwise, an error in the external data source, the process peripheral, or in an invoked software module is detected, and the approximated value is used for further calculations.

Tracing and event recording. For the purpose of checking whether a real-time software package works in the intended way, it is necessary to record intermediate results and the occurrence of events influencing the single activities. The first feature to be provided in this context is tracing, which is not specific to process control systems. Hence, we can refer here to other languages like FORTRAN and LTR, where statements were defined instructing the compiler to generate additional code for writing specified traces into certain files. In order to avoid frequent changes in the source code, a compiler option appears helpful that selects whether the tracing control statements are to be considered or to be treated as comments.

When the behaviour of a real-time system is to be understood, it must be known when the events determining the state transitions of tasks have occurred. The kinds of events to be considered here are:
- interrupts, exceptions, and changes of masking states,
- state transitions of tasks and synchronised variables,
- reaching and actual execution of tasking and synchronisation operations.
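Events of the kinds just listed might be captured by a recorder along the following lines; the class, record layout, and example event names are invented for illustration:

```python
# Minimal sketch of an event recorder for post-mortem analysis: every
# event is appended with a timestamp; the in-memory list stands in for
# the mass storage file a real system would write.

import time

class EventRecorder:
    def __init__(self):
        self.records = []

    def record(self, kind, task, detail):
        self.records.append((time.monotonic(), kind, task, detail))

    def post_mortem(self, task):
        """Return one task's recorded history in order of occurrence."""
        return [(kind, detail)
                for _, kind, t, detail in self.records if t == task]

rec = EventRecorder()
rec.record("interrupt", "sampler", "timer expired")
rec.record("transition", "sampler", "dormant -> runnable")
rec.record("transition", "sampler", "runnable -> running")
assert rec.post_mortem("sampler")[0] == ("interrupt", "timer expired")
```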
These events, or specified subsets thereof, should be recorded on a mass storage device during routine operation as well, to enable the post-mortem analysis of software malfunctions. When simulations are carried out in the test phase, such files represent the corresponding output, requiring no further specification.

Conclusion. In the past, and especially in the case of Ada, concessions have been made when designing real-time languages to enable their implementation under existing operating systems. This is the opposite of a sound procedure, because operating systems ought to support the language features in an inconspicuous manner and to bridge the gap between language requirements and hardware capabilities. Hence, the development of a process control system should commence with the definition of a suitable language incorporating a host of features enhancing software reliability. Then the hardware architecture is to be designed, enabling the implementation of the language as easily and efficiently as possible and, thus, keeping the operating system and its overhead relatively small. In the case where the implementation and operating efficiencies of a real-time system are in conflict with the efficiency and the
safety of the process to be controlled, the latter naturally has to be valued higher.

In this chapter, some shortcomings of presently available real-time languages with regard to process control applications were indicated, and various real-time features were proposed to overcome these drawbacks, making it possible to express all process control programs fully in terms of a high level language. The proposals emphasise language constructs which are inherently safe and which facilitate the development of fault-tolerant and robust software.
Chapter 10

Schedulability Analysis

10.1 Schedulability Analyser
A typical real-time application involves the use and control of expensive equipment, possibly in situations where a missed deadline may well lead to substantial damage, including loss of human life. For many real-time systems it is thus of paramount importance that they adhere predictably to their critical timing constraints. A real-time system must be tested, analysed, and checked to adhere to its critical timing constraints before the system is put in place.

A real-time system consists of hardware and software components. In most cases, the hardware components are composed of a relatively small number of subcomponents. It is uncommon to develop new hardware for real-time applications. Thus, considerably more effort is usually invested in the development of the software components of a system. The software components are typically programmed in a high level language, with some lower level functions possibly written in assembly language. As the software is written, the programmer attempts to follow the timing specifications of the system to the best of his ability. The resulting code is subjected to analysis for adherence to its critical timing constraints. This form of analysis is commonly referred to as schedulability analysis (a term introduced in [56]).

Naturally, schedulability analysis can be done by hand or through ad hoc techniques. However, this approach is not desirable. Consider the following analogy with compilation. The earliest programs were hand-assembled, or even written directly at machine level. As application software size grew, the need for high level language programming became apparent, such languages were developed, and compilers were written. It is still possible to hand-assemble any program. However, hand-assembly is extremely time-consuming, error-prone, and produces code that is not inherently better than that generated by a compiler. Assuming that automated,
accurate schedulability analysis technology can be made available to the real-time system developer, there is no excuse for the continuing wastage of the developer's time on ad hoc analysis activities.

Accurate, language level schedulability analysis techniques and a prototype software tool have been described in [56, 57]. In short, the analyser consists of two parts: a partially language-dependent front-end and a language-independent back-end. The front-end is embedded in the code generating/storage allocating back-end of a high level language compiler. As instructions are generated, the front-end computes the amount of time they will take to execute and builds program trees of segments (a tree per compilation unit, such as a subprogram or a task — see below for an example of a program tree), keeping track of intra- and inter-procedural and modular dependencies, concurrency control, and control flow. The back-end reduces some inter-tree relations (such as unrolling a subprogram call), and then undertakes the analysis of irreducible inter-tree relations, generating worst-case bounds on the amount of inter-task contention (the amount of time tasks spend while waiting to enter a critical region, in the application or the runtime kernel) and deriving guaranteed response times and other schedulability information (see below).

To keep the analyser language- and environment-independent, hardware and software dependencies are confined to tables. Thus, for instance, a table is used to store functions that compute how long each machine instruction takes to execute. The analyser "walks" through the tables as needed, with each step triggered by either a newly emitted instruction (the front-end) or a particular tree transition (the back-end).

When the analysis is complete, the analyser reports the schedulability information, per system, task, subprogram, statement, even instruction, to the programmer.
Should the report indicate that the system is not guaranteed to meet its critical timing constraints, the programmer decides, on the basis of the detailed information provided, where and how the program may be modified.

While the predicted response times derived by the analysis are very accurate, the analysis may take very long to execute on some programs. Consequently, care must be taken to keep the execution time of the analyser tool acceptably short. Special techniques which virtually eliminate the conditions under which the analysis takes too long have been developed for that purpose. The new techniques first prepare the program through deadline-preserving transformations, which eliminate timing discrepancies on most irreducible program tree branches. The program trees are then subjected to even more effective reduction than before, resulting in much reduced trees. Both the transformations and the reduction execute very fast. The program developer is then given a forecast of the analysis running time before the remainder of the analysis proceeds. Should the forecast be acceptably short, the remaining analysis is run. Otherwise, the developer is offered a choice of program transformations which may change (for better or for worse, and typically only slightly)
the worst-case response times of the program. Once some of these transformations are made (again fast), the sources of long execution are eliminated and the analysis will run quickly. A prototype schedulability analyser has been thoroughly evaluated in a realistic environment. It has been found to considerably shorten the development time of real-time software, and to serve as a convenient, practical tool. However, to fully benefit from the advantages of the new approach, it is necessary to interface the analyser to the compiler of an appropriate real-time programming language, such as Real-Time Euclid, which was used in the experimental environment. This approach will be applied in the next section to define the concept of an automatic schedulability analyser for High-Integrity PEARL, the new PEARL language version described in Sections 9.4 and 9.5. Armed with the analyser/transformer, HI-PEARL will be a tool highly suitable for industrial real-time process control.
10.2 Front-End of the Schedulability Analyser
The front-end of the schedulability analyser is incorporated into the coder, and its task is to extract timing information and calling information from each compilation unit. Clearly, the front-end of the analyser is compiler- and machine-architecture-dependent. The output of the front-end, however, is free of such dependencies. As statements are translated into assembly instructions in the coder, the front-end of the analyser builds a tree of segments for each procedure or task. The various types of segments are as follows.

Simple segment contains the amount of time it takes to execute an explicit section of code. As each assembly instruction is generated in the straight-line section, the front-end of the schedulability analyser adds its execution time to the segment time. Instruction execution time is computed, in a table-driven fashion, as a function of the opcode(s), the operands and addressing modes, following the procedure described in the appropriate hardware manual. All times are recorded in processor clock cycles instead of in absolute time.

Internal-call segment contains the name of a called internal routine, and will eventually be resolved by substituting the amount of time it takes for that routine to execute.

External-call segment and System-call segment are similar to an internal-call segment. If the call attempts to enter or exit a critical region (see below), the identifier of the region and a pointer to the region state descriptor are stored in the segment record. Furthermore, if the call has a timeout, then the time after which to time out is recorded.

If segment splits a tree into two subtrees, where one subtree corresponds to the then-
clause, and the other one to the else-clause of the corresponding if-statement. The segment will eventually be resolved, and only the subtree which takes longer to execute will stay. Case-statements result in a multi-way generalisation of two-way if segments.

A loop does not by itself cause a tree sub-branching. If the body of the loop is all pure explicit code, then the loop just results in a simple segment incorporating the amount of time it takes to execute a single iteration times the number of iterations. If the body of the loop contains a call segment or an if segment, then the loop is unwound into a segment chain, i.e., its body is repeated the number of iterations times.

As an assembly instruction is generated, the front-end of the analyser records its execution time. The time is computed as a function of the operation code, the operands, and the addressing modes. A procedure explaining how to compute such timing information for a particular machine architecture is typically given in a machine instructions manual. The time is added to the current simple segment being built.

Apart from building segment trees, the front-end of the analyser also records information on the organisation of critical regions as well as calling information. A critical region is defined as the body of a LOCK statement or a call serialised by the runtime kernel. For every critical region its start and end are recorded, as well as which routines and tasks are using it. For every routine it is recorded who calls it, whom it calls, what critical regions it uses, and what I/O operations it executes. Finally, for each task its frame (i.e., the task's minimum activation period — for instance, a sensor may not be probed more often than once per second) and activation information are recorded, as well as whom the task calls, what critical regions it uses, and what I/O operations it executes.
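The table-driven timing just described can be sketched as follows; the opcodes and cycle counts are invented here, since a real table follows the processor's instruction timing manual:

```python
# Sketch of the front-end's table-driven timing: one function per opcode
# maps an instruction to its cycle count, which is accumulated into the
# simple segment currently being built.

TIMING_TABLE = {
    # opcode -> function of the addressing mode, yielding clock cycles
    "add":  lambda mode: 2 if mode == "register" else 6,
    "mul":  lambda mode: 10,
    "jump": lambda mode: 3,
}

def time_segment(instructions):
    """Accumulate the execution time of a straight-line section,
    in processor clock cycles rather than absolute time."""
    cycles = 0
    for opcode, mode in instructions:
        cycles += TIMING_TABLE[opcode](mode)
    return cycles

section = [("add", "register"), ("mul", "register"), ("add", "memory")]
assert time_segment(section) == 18   # 2 + 10 + 6 clock cycles
```

Keeping the table as data, outside the analyser's logic, is what confines the machine dependence to one replaceable component.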
10.2.1 A Segment Tree Example
To demonstrate how segment trees are built, let us consider the example of Figures 10.1, 10.2 and 10.3.

H1    x := y + z
H2    y := 2 * y
H3    B (x, y)
H4    if y > x then
H5        z := z + 5
H6    else
H7        enter-critical-region (R)
H8    end if
H9    z := x * y

Figure 10.1: A block of high level language statements.

A01   <Add y and z and assign to x>
A02   <Shift y left one bit>
A03   <Push x and y on stack; jump to B>
A04   <Restore stack>
A05   <Compare y and x>
A06   <If greater, jump to A09>
A07   <Add 5 to z>
A08   <Jump to A11>
A09   <Push R on stack; jump to Kernel.EnterCriticalRegion>
A10   <Restore stack>
A11   <Multiply x and y and assign to z>

Figure 10.2: The corresponding block of assembly statements.

The high level statements H1 and H2 translate into assembly sections A01 and A02, respectively. H3, a call to a procedure B, generates sections A03 and A04. The procedure execution will take place right after A03, a section comprising a push on stack of the variables x and y and a jump to B, and before A04, a stack restore section. The if-test of H4 results in a compare A05 and a conditional jump A06. The assignment H5 maps to A07. The else H6 becomes the unconditional jump A08. The enter-critical-region H7 generates A09, an assembly section encompassing a push on stack of the region identifier R and a jump to Kernel.EnterCriticalRegion, the kernel routine supporting the enter-critical-region operation, and A10, a stack restore section. The last assignment statement, H9, generates section A11.

The segment tree corresponding to the high level code block is formed as follows. A01, A02, and A03 form a contiguous section of straight-line code. Thus, they comprise a single simple segment S1. The amount of time A01, A02, and A03 take to execute is recorded in the segment. The call to B forms S2, an internal-call segment.¹ A04 and A05 form a simple segment S3. Now we must form two branches. If the conditional jump A06 fails, then the sections A07, A08, and A11 will be executed. The amount of time it takes to execute A06 if the test fails is thus combined with the times of A07, A08, and A11 to form a simple segment S4. If the test succeeds, then the sections A09, A10, and A11 will be executed. The right branch is therefore formed as a list of three segments: a simple segment S5 encompassing A06 (if it succeeds) and A09, a kernel-call segment S6 corresponding to the enter-critical-region, and S7, a simple segment comprising A10 and A11. For simplicity, in this example we omitted a discussion of how the region identifier name and other supplementary information are stored.
10.2.2 Front-End Statistics
The front-end of the schedulability analyser provides the following statement-level timing information. The statistics for each statement involved consist of its line number coupled with the worst amount of time, in clock cycles, the statement may take to execute. The information is produced for the following statements.

1. Each language-level expression or non-iterative statement (such as assignments, asserts, binds, returns, and exits).
2. Each selector (if- or case-) statement consisting of straight-line code, on the basis of which it is possible to determine the longest (timewise) clause.
3. Each iteration statement consisting of a body of straight-line code.

The programmer can use this statement-level information to determine which statements and groups of statements run longer than he expects. The information can also be used at a later stage, when the rest of the schedulability information, such as guaranteed response times, becomes available. Then the programmer can decide which program parts have to be re-written to increase their speed.

¹ Or an external-call segment. It does not really matter for the purpose of the example.
S1:  A01, A02, A03
S2:  Call B
S3:  A04, A05
S4:  A06 (jump failed), A07, A08, A11
S5:  A06 (jump succeeded), A09
S6:  Call Kernel.EnterCriticalRegion
S7:  A10, A11

Figure 10.3: The corresponding tree of segments.
10.3 Back-End of the Schedulability Analyser
The back-end of the schedulability analyser is a separate program, and its task is to resolve all information gathered by the front-end, and to predict guaranteed response times and other schedulability information and diagnostics for the entire real-time application. The back-end constructs a map from a program, or more precisely, from the segment trees built by the front-end, to an instance of a schedulability analysable model. The instance is then solved (analysed) for guaranteed schedulability. The back-end of the analyser is independent of the compiler, the machine architecture and so on.

The back-end of the analyser starts off by resolving the segment trees built by the front-end. All individual segment trees and critical region, routine, and task records are concatenated into one file, including the segment trees and records pertinent to predefined kernel and I/O routines. All non-kernel calls are recursively resolved by substituting the corresponding routine segment tree in place of each call segment. As many if segments as possible are resolved. After all non-system calls are resolved, only task segment trees are left. The trees are converted to different segment trees, where each possible segment is one of the following.

Regular segment is one containing the amount of time it takes to execute a section of code that is not in a critical region, in the absence of contention.

Critical region segment is one containing the amount of time it takes to execute a critical region of code in the absence of contention.

Specified-delay segment corresponds to a delay specified in any execution time bound clause or in a schedule in the program.

Queue-delay segment corresponds to a delay while waiting to enter a critical region, and is implicitly present in the segment tree just before every critical region segment.

If segment has the same semantics as before.

We now demonstrate how front-end segment trees are converted, continuing with the example of Figures 10.1, 10.2 and 10.3.
Figure 10.4 has the corresponding converted subtree, now a part of a process tree. The simple segment S1 maps to a regular segment N1. The call to B is resolved and N2, the tree of B, is substituted. We now form two branches (the new if segment is implicit). Since one branch of the subtree involves a kernel call, both branches are kept. The segment S3 is combined with the first regular segment of each branch. Thus, S3 and S4 together form N3, a regular segment corresponding to the failed jump case, and S3 also becomes a part of the first regular segment of the right branch. The way Kernel.EnterCriticalRegion operates is as follows: interrupts are turned off, queues are updated, and interrupts are
restored. The regular segment N4 thus comprises S3, S5, and the interrupt disabling code of the signal. The queue-updating code of Kernel.EnterCriticalRegion forms a critical region segment N5 (preceded by an implicit queue-delay segment). The return from Kernel.EnterCriticalRegion is combined with S7 to form a regular segment N6. Each leaf of the tree of B is extended with the two branches we just formed.

When segment tree conversion is complete, the tasks are ready to be mapped to an instance of a schedulability analysable model. The exact form of the model depends on the topology of the system on which the real-time application is intended to be run. Details of the model are beyond the scope of this chapter, but are similar to those described in [58]. Once mapped to the model (represented as sets of equations), the model is solved for intermediate (such as queue sizes) and ultimate (such as guaranteed response times) schedulability metrics using frame superimposition [56]. Frame superimposition simply means fixing a single task's frame at its starting time and positioning the frames of other tasks along the time line in such a way as to maximise the amount of resource contention the task incurs. Frame superimposition uses relative positions of segments and distances between segments on the time line. The algorithm shifts frames exhaustively, for every time unit, for every task, and for every combination of frames possible.

The frame superimposition algorithm is clearly of exponential complexity. In fact, finding the optimal worst-case bound for resource contention in the presence of deadlines is NP-complete, given that even the most basic static deadline scheduling problems are [59, 60]. However, this part of the analysis operates on a small number of objects (segments, each combining a great many statements), in a relatively small number of arrangements.
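A toy rendition of frame superimposition, under a deliberately simplified task model (each interfering task is reduced to a period and a critical-region hold time; the model in the text carries far more structure):

```python
# Fix the analysed task's frame at time 0 and shift the other tasks'
# frames exhaustively, one time unit at a time, to find the phasing
# that maximises contention for a shared critical region.

from itertools import product

def worst_contention(frame, others):
    """others: (period, hold) pairs; returns the maximum total time the
    region can be held by the other tasks within the analysed frame."""
    worst = 0
    for phases in product(*(range(period) for period, _ in others)):
        held = 0
        for (period, hold), phase in zip(others, phases):
            t = phase
            while t < frame:                    # every activation in the frame
                held += min(hold, frame - t)    # clip holds at the frame end
                t += period
        worst = max(worst, held)                # keep the worst phasing
    return worst

# Two interfering tasks: period 5 holding 1 unit, period 7 holding 2 units.
assert worst_contention(10, [(5, 1), (7, 2)]) == 6
```

The exhaustive enumeration of phase combinations is precisely what makes the procedure exponential in the number of tasks.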
Moreover, it has been demonstrated that considerably better delay bounds are derived by this algorithm than by closed-form or polynomial complexity algorithms [56, 57].

Once the metrics are available, they are reported to the programmer. The programmer then studies these metrics (detailing, if need be, the timing of the performance of every routine, call, critical region access, even every statement), and decides whether the program should be run as it exists, or whether it should be modified. Should the program need to be modified, the metrics are of great help in determining where and how.
10.4 Program Transformations
N3:  Regular
N4:  Regular
N5:  Critical region
N6:  Regular

Figure 10.4: The corresponding converted subtree of segments.

The reason why accurate schedulability analysis algorithms (such as frame superimposition) may sometimes be prohibitively time-consuming is that the number of possibilities they have to consider grows combinatorially with the number of control flow alternatives (in conditional statements). Specifically, all execution paths must be considered where it is not known which may take longer to execute or which may generate more inter-task contention. Each such pair of alternatives then doubles the number of possibilities to consider. Consequently, the schedulability analyser was extended with a program transformer, whose purpose it is to reduce such alternatives. The transformer is partially based on the techniques reported in [61, 62].

The semantics of real-time programs must include deadline satisfaction. We say that a program transformation is:

Deadline-isomorphic if the transformed program will meet deadlines if and only if the original program did;

Deadline-preserving if the transformed program will meet deadlines if the original program did;

Deadline-extending if the original program would have met deadlines whenever the transformed program does;

Deadline-destroying if it satisfies none of these properties.

We employ a number of deadline-preserving transformations of conditional code, subject to a number of standard symbolic execution and other assumptions. Essentially, we can view a sequence of delay-free statements (i.e., statements that do not request a shared service) as a single statement taking as long to execute as the sum of the lengths of the individual statements. Likewise, a conditional with two delay-free branches will certainly not take longer than the longer branch and, by the above assumptions, could take that long. We can also condense a loop without delays into a node taking as long as the loop would have, i.e., approximately, the size of the range times the sum of the length of the loop body and the time for incrementing the range variable. In a loop with delays, the delay-free tail of an iteration of the loop can be merged with the delay-free head of the next iteration.
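The condensation rules just described can be sketched on a toy representation of flow-graph nodes as (time, is_delay) pairs; this representation is invented here, whereas the analyser proper works on the segment trees described earlier:

```python
# Three deadline-preserving condensations of delay-free code.

def merge_delay_free(nodes):
    """Merge each run of consecutive delay-free nodes into one node
    whose time is the sum of the run."""
    merged = []
    for time_, is_delay in nodes:
        if merged and not is_delay and not merged[-1][1]:
            merged[-1] = (merged[-1][0] + time_, False)
        else:
            merged.append((time_, is_delay))
    return merged

def collapse_conditional(then_time, else_time):
    """Two delay-free branches: only the longer one can matter."""
    return max(then_time, else_time)

def condense_loop(iterations, body_time, increment_time):
    """A delay-free loop becomes a single node of the loop's total time."""
    return iterations * (body_time + increment_time)

assert merge_delay_free([(3, False), (4, False), (2, True), (1, False)]) \
       == [(7, False), (2, True), (1, False)]
assert collapse_conditional(5, 8) == 8
assert condense_loop(10, 6, 1) == 70
```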
We can, however, do more: we can insert fixed delays into flow branches to make accesses to a critical region happen at the same time as in other branches. The critical region will then start and finish at the same time in both branches. Inserted delays that are needed to balance a non-critical region node only block the task they are inserted in. On the other hand, inserted delays that are needed to balance a critical region node extend the shorter critical region and, thus, do not only block the task they are inserted in, but also other tasks seeking entry to a critical region.

Moreover, we can insert varying delays into a shorter branch that does not access a critical region to mimic the critical region access done in a longer branch. The insertion of delays requires that, during schedulability analysis as well as at run-time², when an inserted delay is reached, the task acts and has the same effect on other tasks as if it were trying to execute a critical region of the size of the delay. The
² That is, during both symbolic and real execution.
insertion of appropriate delays is our principal deadline-preserving transformation. The transformation does two things: first, it makes the resulting flow graph smaller and simpler (and thus faster to analyse); and second, the transformed program is easier to pre-schedule at compile-time.

There are at least three dimensions on tests for conditionals:

1. What structure (in particular, loops or conditionals) can the branches involve?
2. What can the delay structure of the branches be?
3. How can we compare the times of the two branches?

We can easily either (a) compare two delay-free branches, or (b) take any branch in place of an empty branch. A small extension of (b) is (b') to take a branch with arbitrary structure and time T in place of a delay-free branch with smaller time T'. In either case, we can insert delays; in the first case, the delays are constant, but in the second they can be symbolic, but still cannot result in a program which fails to satisfy a deadline if the original program (subject to the symbolic execution assumptions) would. Otherwise, our system will compare branches only if neither contains a conditional or loop statement which itself contains a delay, i.e., if both branches are effectively linear. Observe that it is acceptable to have requests to multiple critical resources along the same branch, since the requests have to follow a static resource hierarchy, as defined in HI-PEARL (Section 9.5). In this case, we may be able to select one branch and insert delays in the other, to constrain calls to the critical region to occur exactly at times when calls would be made in the other branch, without violating a deadline.

We use two versions of such tests. The first (c) applies to branches of the same length, and compares the sequences of execution times. We say that one sequence dominates another one if its elements are pairwise greater than or equal to the other's.
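The two dominance tests can be sketched as follows; the greedy grouping used for test (d) is an assumption of this sketch, since the text does not fix how the partition is searched for:

```python
def dominates(longer, shorter):
    """Test (c): same length, elements pairwise greater than or equal."""
    return len(longer) == len(shorter) and \
        all(a >= b for a, b in zip(longer, shorter))

def partition_dominates(longer, shorter):
    """Test (d): group consecutive elements of the longer sequence until
    each group sum reaches the corresponding element of the shorter one."""
    groups, acc = 0, 0
    for x in longer:
        acc += x
        if groups < len(shorter) and acc >= shorter[groups]:
            groups += 1
            acc = 0
    return groups == len(shorter)

assert dominates([4, 5, 6], [3, 5, 2])
assert not dominates([4, 1], [3, 5])
assert partition_dominates([2, 3, 4, 1], [5, 5])   # groups (2+3) and (4+1)
assert not partition_dominates([1, 1], [5, 5])
```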
If the sequence for the branch with the greater total execution time dominates the other branch's sequence, then delays can be inserted in a deadline-isomorphic manner. The second (d) applies to branches of different lengths: if the longer branch also has the greater total execution time, and if the sequence of times for that branch can be partitioned (by adding consecutive elements together) so as to dominate the other sequence, then we can still insert delays in a deadline-isomorphic manner. We illustrate the tests (a), (b'), (c), and (d) in Figure 10.5. The i-th non-critical region straight-line node is represented as a circle labeled s_i = x on the inside, where clock(node) = x. The i-th critical region straight-line node is represented as a square labeled c_i = x on the inside, where clock(node) = x. The i-th delay prior to entering a critical region is represented as a black disk labeled d_i on the outside. Without loss of generality, all fork and join nodes have size 0. All resulting delays are generated and inserted in the right branches.

Figure 10.5: Four conditional clustering tests.

In (a), a fixed delay of 2 is inserted between s3 and s4. In (b'), a varying delay of 4 + d1 is inserted between s3 and s4. In (c), a fixed delay of 3 is inserted between s3 and d2, another fixed delay of 2 is prepended to d4, and yet another fixed delay of 2 is inserted between s5 and s6. In (d), a varying delay of 6 + d1 is inserted between s3 and d3, and a fixed delay of 3 is inserted between s6 and d7.
Although our tests can insert delays and eliminate conditionals from schedulability analysis, some simple conditionals may remain irreducible. Consider, for instance, the graph in Figure 10.6. The graph was obtained by taking graph (c) of Figure 10.5 and setting c1 and c2 to 2 and 6, respectively. Now the two branches remain different, but neither dominates the other. Thus, delay insertions are no longer guaranteed to be deadline-preserving. However, if we insert a fixed delay of 3 between s1 and s3 and prepend c1 with a fixed delay of 4, then the new conditional is reducible (by inserting a fixed delay of 2 between s5 and s6). Unfortunately, the effect of the two insertions is neither deadline-isomorphic nor deadline-preserving, but deadline-extending.

Figure 10.6: An irreducible conditional.

In general, deadline-extending transformations involve delay insertions on multiple branches. Another interesting class of transformations involves introducing only some steps of a complete deadline-extending transformation. In the same example, a simpler (than the original) graph with a partially-reduced conditional results if only the delay of 3 between s1 and s3 is inserted, but c1 is not prepended with a delay of 4. This transformation, while potentially deadline-destroying, nevertheless preserves more of the original conditional structure. The transformation is of interest because even such partially-reduced graphs may already be analysed for schedulability in polynomial time using incremental techniques. The resulting, transformed program is, of course, much easier to analyse than the original program. Furthermore, when a deadline is not satisfied, we can undo the deadline-extending transformations one at a time until the deadline is satisfied. In some sense, this is analogous to dual algorithms for linear programming: we begin with an optimal program and iterate until we find a feasible one. The early steps of this algorithm are quite inexpensive, since most tasks are represented by straight-line flow.
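The two branch-comparison tests, (c) pairwise dominance for equal-length branches and (d) dominance after partitioning a longer branch into consecutive groups, can be sketched as follows. This is a minimal illustration assuming execution times are given as plain integer lists; the function names are ours, not taken from the Real-Time Euclid analyser.

```python
def dominates(a, b):
    """Test (c): for branches of the same length, sequence a dominates
    sequence b if its elements are pairwise >= b's."""
    return len(a) == len(b) and all(x >= y for x, y in zip(a, b))

def partition_dominates(longer, shorter):
    """Test (d): can `longer` be partitioned into len(shorter) groups of
    consecutive elements (by adding consecutive elements together) whose
    sums pairwise dominate `shorter`?  Simple search over split points."""
    def solve(i, j):
        # i: next unused index in `longer`; j: next element of `shorter`
        if j == len(shorter):
            return i == len(longer)
        total = 0
        # leave at least one element for each remaining group
        for k in range(i, len(longer) - (len(shorter) - j - 1)):
            total += longer[k]
            if total >= shorter[j] and solve(k + 1, j + 1):
                return True
        return False
    return sum(longer) >= sum(shorter) and solve(0, 0)
```

If `partition_dominates` succeeds, delays can be inserted in a deadline-isomorphic manner; if neither branch dominates the other, the conditional is irreducible in the sense of Figure 10.6.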
10.5 Empirical Evaluation
A detailed description of the evaluation of the schedulability analyser is outside the scope of this chapter, but can be found in [56]. Briefly, a comprehensive distributed Real-Time Euclid system (compiler, run-time system, schedulability analyser) has been implemented and evaluated. Teams of senior undergraduate students have written medium-size (under 5K lines of code) Real-Time Euclid programs. Each program performs data acquisition via simulated sensors and keyboards; processes, analyses and reduces the data; and outputs commands via simulated output devices and terminal screens. The schedulability analyser has been used to predict response times and to compute other schedulability information for these programs. We have run these programs, recorded the observed actual response times and other statistics, and compared the actual statistics to those guaranteed by the analyser. It has been found that in the absence of overloads (i.e., missed deadlines) the worst-case guaranteed statistics differ very little (usually under 10%, and never over 26%) from those observed. In the presence of missed deadlines, discrepancies of 25% to 44% have been recorded. In contrast, when used on the same programs, the state-of-the-art method [63] predicted bounds that differed by 40% to 200% from the observed statistics. Through schedulability analysis and program transformations, a real-time application can be, to a significant extent, guaranteed to perform in a predictable, timely manner. Schedulability analysis is thus an important part of the real-time application development process. To be used for writing real-time software, a programming language should be schedulability-analysable. This guiding principle is followed in the definition of HI-PEARL.
Chapter 11

System and Software Life Cycle

11.1 System Development
Any development process consists of three types of activities: the identification of needs, the design of a solution, and the implementation. The activities are distributed in time, turning into one another, looping back and overlapping. This observation divides the development process into three major phases. The division helps the developer to separate different aspects of the development, and forms the basis for any systematic development methodology. The phases can be characterised in terms of the questions which must be answered in each phase:
1. What should be done?
2. How should it be done?
3. Does it really work?
Distinguishing subsequent phases divides the development process in the time dimension into a sequence of smaller tasks, which are more manageable and easier to handle than the original problem. The development process can also be divided in the space dimension, by decomposing the developed system into a set of co-operating components. Each component, which will be referred to as a configuration item, can be developed separately, provided that its interface to the environment has been defined carefully. To make the approach more systematic, it is necessary to precisely define a sequence of steps performing particular activities, and to establish rules for the transition from one step to the next. It is common practice to define the borderlines between steps in terms of the documents which should be issued. The documents are evaluated according to predefined criteria, and the acceptance of the documents enables the transition to the next step. This means that the definition of each step consists of a list of tasks
to be performed, a list of documents which should be produced, and the definition of criteria for evaluating and accepting the documents. A comprehensive model of the system life cycle is shown in Figure 11.1. Apart from the beginning and end steps, the model is split into two distinct paths for the development of hardware and software items.

Figure 11.1: System life cycle model. (The figure shows the sequence: system requirements analysis, system design, parallel software and hardware item development with design reviews and functional and physical audits, system integration and testing with a formal qualification review, evaluation and acceptance, and system production and deployment.)

System requirements analysis. In the first step the goals of the system are identified and documented. The primary result of the step is a document called the System Requirements Specification, which defines the system functions and the accuracy and performance required, and indicates all pre-developmental decisions which can constrain the freedom of the designer and the implementer by forcing, for instance, the use of particular hardware devices, operating systems or design methods. The system requirements specification is accompanied by the project schedule and budget, and the set of criteria for the final product acceptance. The documents are subject to the formal System Requirements Review, which examines the documents for understandability, internal consistency and adequacy.

System design. The second step roughly corresponds to the system decomposition into hardware and software configuration items. This means that the functional architecture of the system is defined, and the appropriate hardware architecture is selected. System functions identified in the requirements specification step are allocated to hardware as well as to software items. The primary result of the step is the System Specification, which describes the system's decomposition into a set of configuration items, allocates system functions to particular items, and defines the interfaces between particular items and their environments. The system specification is accompanied by the system test plan, which defines the methods for testing particular items and their co-operation. The documents are subject to the formal System Design Review, which examines the documents for understandability, completeness and consistency, and traceability to the system requirements specification. After acceptance, the system specification forms the functional baseline, which cannot be changed without a formal approval from the project management, and becomes the basis for the development of the particular configuration items.

Configuration item development. Configuration items identified in the system design step can be designed and implemented separately. This splits the step into a set of parallel actions, dealing with the development of particular items. In general, each item must be specified, designed, implemented (fabricated) and tested. The result of this step is the set of particular items accompanied by their full requirements, design and test documentation. The items themselves and the documents are subject to the Functional Configuration Audit and the Physical Configuration Audit to check for completeness and consistency. The details of reviews and audits will be explained in Chapter 12.

System integration and testing.
The configuration items implemented in the item development step are assembled according to the interface definitions given in the system specification. The integrated system components and the system itself are tested according to the system test plan. Errors in system design and implementation are detected and removed. The results of the step are a working system and the system Test Reports. After finishing the testing process and correcting the errors, the system documentation is updated and subjected to the Formal Qualification Review, which examines the completeness and adequacy of the tests conducted, and checks whether the system is ready for the final evaluation.
Evaluation and acceptance. In the last step, final system testing and evaluation are conducted based on the criteria established in the requirements step. The goal of the evaluation step is to ensure that the implemented system fulfills the requirements stated in the system requirements specification. It is strongly recommended that the evaluation and acceptance be performed by an independent qualification team.
Relating the five steps of the system life cycle model to the three questions stated at the beginning of the current section, one can notice that the requirements specification step answers the question "what?", the design step answers the question "how?", and the last two steps (testing and evaluation) answer the last question, "does it work?". The remaining step (item development) implements the solution defined in the design step. It should be emphasised that distinguishing the requirements from the design and implementation is essential. The requirements specification should describe the problem to be solved using the vocabulary of the application domain as much as possible. This facilitates review and verification of the specification by subject-matter experts and limits the risk of building the wrong system, i.e., a system which does not fulfill the users' expectations. Moreover, omitting the implementation details prevents favouring one technology over another from the very beginning, which could result in not finding the optimal solution. The methods for developing hardware and software items are, in general, different. Industrial practice shows that the majority of system hardware can in most cases be configured from ready-made computer-based components, and only a small fraction of specialised devices must be developed specifically. Software items, however, and particularly application programs, must usually be customised for each particular application. Therefore, the remainder of this chapter is devoted to software development paradigms and methods.
11.2 Software Life Cycle
The engineering approach to software development is based on the same principle as general system development, viz., decomposition. The typical development procedure consists in dividing the software to be developed into a hierarchy of modules, which can be designed, implemented and tested separately. To make the discussion of software development paradigms more precise, the following naming convention will be used.
Software configuration item (or item) is a collection of software components with a well-defined functionality, e.g., the software of an on-board computer which is part of a distributed avionics system.
Software component is a distinct part of a software configuration item, which can be further subdivided into other components and units.
Software unit is the smallest software component specified in the software design description that can be tested separately, i.e., for which test requirements have been defined in the test plan; in other words, it is the smallest software component which is visible at the design level.
Particular software configuration items are developed separately. The starting point for the development process is a preliminary (or original) requirements statement, which indicates the needs and expectations of the software user. If the software is developed within the framework of a larger system, the preliminary requirements statement results from the system design step and is documented in the system specification. The classical software life cycle model, known as the "waterfall model" [66], divides the development project into a sequence of seven steps following one after the other, allowing backtracking to correct the errors made in previous steps (Figure 11.2). Each step produces a set of documents, which are used by different participants of the development process for different purposes:
- The documents describe the current status of the development and are the basis for any further development activities.
- They define the borderlines between particular steps and enable the project management to control the progress of the development.
- They record all important choices and decisions which have been made, and show the relationships between the requirements, the design, and the implementation components and units, thus forming the basis for any further modification and maintenance of the software.
Maintaining comprehensive documentation is tedious and time-consuming, but the benefits outweigh the expenses, and each practical development method defines certain documentation standards. It is recommended that the documents be maintained in a machine-processable form and stored in a computer data base.
Requirements analysis and specification. In the first step the user needs are analysed and documented. The feasibility of particular requirements is studied and proved. Any pre-developmental decisions which constrain further development, such as the requirements for particular hardware, an operating system or a programming language, are documented. The primary result of the step is the document:
- Software Requirements Specification.
The requirements specification precisely defines the functions performed by the software, the external interface, and the accuracy, time limits and performance required. An important part of the specification is the definition of a complete set of criteria for the final qualification testing and acceptance. Apart from the requirements, the project schedule and budget are specified. The specifications are subject to the formal Software Requirements Review and are evaluated using the criteria of internal consistency and understandability, traceability to the preliminary requirements statement, quality of the methods used for the requirements analysis, and adequacy of the qualification test definition. After acceptance, the software requirements specification forms the allocated baseline, which cannot be changed without a formal approval from the project management, and becomes the basis for the design and implementation activities.

Figure 11.2: Software life cycle model. (The figure shows the waterfall sequence: software requirements analysis, preliminary design, detailed design, implementation, integration and testing, formal qualification testing, and maintenance, together with the associated reviews: software requirements review, preliminary design review, detailed design review, and test readiness review.)

Preliminary design. The step corresponds to the high-level decomposition of a software item into main components. The requirements documented in the software requirements specification are allocated to particular components, and the interfaces between the components are defined. The qualification criteria established in the requirements step are elaborated to identify the particular tests to be conducted and to describe the way of generating (simulating) test conditions, gathering test data, and evaluating test results. The primary results of the step are two documents:
- Preliminary Design Description, and
- Software Test Plan.
It is recommended that, apart from specifying final design decisions, the documents include additional information that is essential to understanding the design. The information may include rationale, results of analyses, and trade-off studies. The preliminary design description is accompanied by the test requirements for conducting component integration and testing. It is important that the test requirements include stressing the software at the limits of its accuracy, time and performance requirements. The description and the test plan are subject to the Preliminary Design Review and are evaluated using the criteria of internal consistency and understandability, traceability to the software requirements specification, and quality of the design methods.
Detailed design. The decomposition of software components is continued down to the level of individual software units, and the interfaces between components and units are defined. The requirements are allocated to particular software units, and the algorithms and data structures for each unit are elaborated. The software test plan is extended to describe test cases in terms of input data, output data, and evaluation methods. Similarly, test cases for particular component and unit testing are defined. The primary results of the step are three documents:
- Software Design Description,
- Software Test Description, and
- Component and Unit Test Descriptions.
The documents are subject to the Critical Design Review and are evaluated using the same criteria as in the preliminary design step. The test descriptions are additionally examined for adequate coverage of the requirements.
Implementation. The main goal of the implementation step is coding and debugging the programs of particular software units according to the software design description. Moreover, test procedures for higher-level components are developed.
The procedures are detailed instructions for the set-up, execution, and evaluation of results for particular test cases defined in the component test description. The primary results of the step are the following:
- Source Code for each software unit,
- Test Procedures and Results for each software unit, and
- Test Procedures for each software component.
The test procedures are evaluated for consistency and adequate coverage of requirements, and the test results for completeness of testing and conformance to expectations. All revisions and updates to the source code and design documentation which result from correcting errors are then effected.
Integration and testing. In the integration step, software units and components are integrated upwards into higher-level components, up to the entire configuration item. All components are tested according to the test procedures defined in the component test description. In case of errors, the source code as well as the design description are updated. Furthermore, test procedures for the software configuration item are developed and recorded in the software test description. The primary results of the step are the following:
- Source Code for the configuration item, updated and free from implementation errors,
- Software Test Description updated with the item test procedures, and
- Test Results for each software component.
All revisions and updates to the source code and design documentation which result from correcting errors are then effected. The results of testing and retesting (after a component modification) are evaluated for completeness and conformance to expectations. The documents are subject to the formal Test Readiness Review and are evaluated for understandability, internal consistency, and traceability to the design description and the requirements specification. Source code listings are evaluated for the quality of the design and coding methods.
Formal qualification testing (validation). In the last step the configuration item is validated, i.e., checked as to whether or not it fulfills the acceptance criteria defined in the requirements step. This is accomplished by testing the item according to the software test description. In case of errors, the source code as well as the design description are updated, and the modified components are retested. The primary results of the step are working software, final versions of the basic design documents (requirements specification, design description, source code listing and test reports), and the set of user documentation, including:
- Software Product Description, and
- Operation and Support Manuals.
Maintenance. Software validation and acceptance are not the last development activities. After being put into operation and before its retirement, software is still
subject to changes and modifications caused by the necessity of correcting errors which have not been revealed during the testing and qualification steps, and by changing user requirements. These activities, which are referred to as "maintenance", constitute the last step in the software life cycle.
It is important to emphasise the difference between system testing in the integration and testing step, which corresponds to software verification, and the final qualification testing, which corresponds to software validation. The difference is fundamental.
Verification is the process of evaluating a software component to determine whether it satisfies the conditions imposed at the beginning of the implementation step. This means that the software correctness is checked with respect to the description resulting from the previous (i.e., design) step. In other words, the verification process attempts to answer the question whether the component has been built correctly.
Validation is the process of evaluating a software item to determine whether or not it fulfills the user requirements. This means that the software correctness is checked with respect to the requirements specification, which is usually a part of the legal contract between the customer, who orders the software development, and the development team. In other words, the validation process attempts to answer the question whether a correct software item has been built.
It is worth noting that testing is not the only method for software verification and validation. An alternative approach is based on formal methods for proving program correctness. The latter approach has the advantage over software testing that, at least in theory, it is possible to prove the absence of errors, while testing cannot. The problem is, however, that proving program correctness is extremely difficult, and there is no evidence that such proofs can be conducted by average programmers without containing errors themselves.
Formal methods for program development and verification will be treated in more detail in Chapter 14. The software life cycle model does not constitute a method for software development, as it gives no guidelines on how to perform the activities required within the single steps. It only creates a management framework within which particular methods can be used. The methods can apply to single steps, or they can cover several steps or even the entire development process.
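As an illustration of the idea, used throughout this section, of defining test cases in terms of input data, expected output data, and an evaluation method, a unit test procedure might be represented roughly as follows. This is a hypothetical sketch; the names and data structures are ours, not part of any standard cited in the text.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class TestCase:
    # One entry of a component/unit test description:
    # a name, the input data, and the expected output data.
    name: str
    input: Any
    expected: Any

def run_test_procedure(unit: Callable[[Any], Any],
                       cases: List[TestCase]) -> Dict[str, bool]:
    """Execute a unit against its test cases and record pass/fail
    results, as a test procedure prescribes."""
    return {c.name: unit(c.input) == c.expected for c in cases}

# Example: testing a trivial unit that clamps a sensor reading to 0..100,
# including cases that stress the unit at the limits of its range.
def clamp_unit(x):
    return max(0, min(100, x))

results = run_test_procedure(clamp_unit, [
    TestCase("nominal", 42, 42),
    TestCase("below range", -5, 0),
    TestCase("above range", 250, 100),
])
```

The recorded results would then feed the Test Results documents evaluated at the Test Readiness Review.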
11.3 Software Development Economy
The software life cycle model described in the previous section is unbalanced in that it underestimates the scope and cost of the maintenance phase. To investigate the problem it is necessary to estimate the average distribution of the development costs
among the particular steps of the life cycle. The problem is difficult, as the data can be very different for different projects. Nevertheless, the average costs of the particular steps can be roughly related to the original project development costs (100%) as follows:

(i) Requirements analysis: 15%
(ii) Preliminary design: 10%
(iii) Detailed design: 15%
(iv) Implementation: 20%
(v) Integration and testing: 20%
(vi) Qualification: 20%
(vii) Maintenance: 70 - 200%
The above data apply to rather big projects, conducted by multi-person teams over periods of at least one year. Looking at the table, one notices the surprisingly high cost of the maintenance step. Many studies report examples of maintenance absorbing twice as many resources as the original development. Moreover, only a small fraction of the maintenance costs is due to error corrections, while the majority is caused by changes in the user requirements. This observation is particularly true for embedded real-time systems, which are relatively long-lived and subject to continuous changes caused by the evolution of their operating environments. Personnel turnover makes the problem of maintenance even harder. As a second observation, the table shows unexpectedly high costs of software testing, steps (v) and (vi), which account for 40 - 50% of the total development costs. However, this represents not only the effort to run the tests, but also the activities required to rework deficiencies in requirements, design, or code which have been revealed during the testing process. What makes the problem really difficult is that real-time embedded software is usually extremely hard to test. This is caused by the complexity of the interface between a system and its environment, and by the inability to test the software in its operating environment. It is not feasible, e.g., to test a flight guidance system by switching it on and recording the number of crashes. A detailed analysis of the costs spent on correcting errors has shown that the vast majority of these expenses, about 80%, is caused by errors made in the requirements and design steps, while only 20% are due to implementation errors. This emphasises the role of the early steps of the software development cycle, and explains why they absorb twice as much in costs and effort as the code implementation step.
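The cost figures above can be put into a small back-of-the-envelope model. The percentages are taken directly from the table; the names are illustrative only.

```python
# Average cost of each development step, relative to the original
# development cost (= 100%), as given in the table above.
DEVELOPMENT_COSTS = {
    "requirements analysis": 15,
    "preliminary design": 10,
    "detailed design": 15,
    "implementation": 20,
    "integration and testing": 20,
    "qualification": 20,
}

def total_lifecycle_cost(maintenance_pct: float) -> float:
    """Total cost of ownership relative to the original development
    cost, for a given maintenance cost (70-200% in the table)."""
    return sum(DEVELOPMENT_COSTS.values()) + maintenance_pct

# Testing, steps (v) and (vi), accounts for 40% of the development cost,
# and heavy maintenance (200%) triples the total expenditure.
testing_share = (DEVELOPMENT_COSTS["integration and testing"]
                 + DEVELOPMENT_COSTS["qualification"])
```

With maintenance at its upper bound of 200%, the total lifecycle cost reaches three times the original development cost, which is why the measures below concentrate on testability and maintainability.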
The discussion leads to the conclusion that practical approaches to software development should pay great attention to decreasing the testing and maintenance costs, by increasing the testability and maintainability of the requirements and the design, as well as of the implementation of software items and their components. The means for achieving these goals are the following:
• Rigorous management of software projects, comprising periodic reviews and verification of the results. This should prevent the building of systems which do not conform to the requirements and expectations.
• Comprehensive documentation and recording of the requirements, design, and implementation choices and decisions. It is recommended to maintain the documents not only in plain textual form, but to provide graphic diagrams, charts and indices.
• Careful decomposition and modularisation of the software during design and implementation, leading to understandable and manageable structures which are much easier to document, test and modify.
• Reuse of various project elements. This applies not only to software code units, but also to elements of the requirements, the design, and the documentation.
It is worthwhile to have a closer look at the most desirable features of project documents, which serve as the main vehicle for conveying ideas between the users and the developers. The quality of the documents significantly influences the costs of the original development as well as of the project maintenance. It is recommended that the documents be understandable, complete, unambiguous, consistent, and traceable. The first three terms refer to general rules of good writing style and do not require additional comment. Consistency may be interpreted in two ways. Internal consistency means that no two statements in a given document contradict one another, that a given term, acronym, or abbreviation means the same throughout the document, and that a given object or concept is always referred to by the same name. External consistency refers to the same features, but applies to separate documents that are not related hierarchically.
Traceability has a slightly broader meaning than external consistency, and applies to documents which are in a hierarchical predecessor-successor relationship, like a requirements specification and a design description. Apart from consistency, it means that a given document supports all stipulations of its predecessor document, and that all material in the successor document has its basis in the predecessor document, i.e., that no untraceable material has been introduced. The relations between the documents should be made visible by means of explicit cross-references.
The motivation for studying the economic aspects of software development is to improve productivity and to decrease costs. To evaluate potential savings, it is interesting to know the total expenditure for software worldwide. Recent and projected statistics for the USA and worldwide ([67]) are displayed in Figure 11.3.

Figure 11.3: Software cost trends (software costs in 10^9 $ per year, for the USA and worldwide, 1980-2000).
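The traceability property described above, that every stipulation of the predecessor document is supported and that the successor introduces no untraceable material, amounts to a simple cross-reference check. The following sketch assumes a hypothetical data model in which requirements carry identifiers and each successor item records the identifiers it traces to.

```python
def check_traceability(predecessor_ids, successor_refs):
    """predecessor_ids: requirement ids stated in the predecessor document.
    successor_refs: mapping from each successor item (e.g. a design
    component) to the set of predecessor ids it cross-references.
    Returns (unsupported, untraceable): predecessor stipulations that no
    successor item supports, and successor items that cite nothing or
    cite unknown ids (untraceable material)."""
    predecessor_ids = set(predecessor_ids)
    supported = set()
    untraceable = []
    for item, refs in successor_refs.items():
        refs = set(refs)
        if not refs or not refs <= predecessor_ids:
            untraceable.append(item)
        supported |= refs & predecessor_ids
    return predecessor_ids - supported, untraceable

# Example: requirement R3 is not supported by any design item, and
# design item D3 cites an unknown requirement R9.
unsupported, untraceable = check_traceability(
    ["R1", "R2", "R3"],
    {"D1": ["R1"], "D2": ["R2"], "D3": ["R9"]},
)
```

Such a check is mechanical once the cross-references are maintained in a machine-processable form, as recommended earlier in this chapter.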
11.4 Classical Software Development Methods
The traditional approach to software development supports the life cycle model described in Section 11.2. The requirements specification is a document, usually long and boring, written in a natural language and precisely stating what the system should do, while avoiding statements about how it should be done. The document is readable to the user, who is required to validate it, i.e., to review and accept its contents. After the validation, the document serves as a basis for system design and implementation. To play this role, the requirements specification should be complete, unambiguous, consistent, traceable, verifiable, and modifiable. A weak point in this procedure is that the validation of the requirements specification relies entirely on the user's imagination. The ambiguity of natural language results in ambiguity of the specification. Moreover, users rarely know precisely what they really want. The probability of building a wrong system which needs immediate modification (i.e., maintenance) is high. In the design and implementation steps, the structure and algorithms of the system to be developed are determined and globally optimised according to selected criteria. Both steps are performed manually and require great creativity of the designers and the implementers. To verify consistency between the requirements specification and the implementation, comprehensive testing is needed. A weak point in this procedure is that only implemented programs may be tested, and that the correction of design errors found in the testing step is usually very expensive. Moreover, testing can never
prove the absence of errors.

The maintenance step is mostly performed on the implementation level. The optimised structure is modified and only local optimisation on the modified segments is performed. The requirements specification and the design description are not always updated, as this is a tedious and time-consuming task. This makes both documents inadequate and causes enormous problems during further maintenance.

Major disadvantages of the traditional approach may be briefly summarised as follows:

• Lack of an appropriate communication vehicle between analyst and user, resulting in insufficient recognition of user needs.

• Inconsistency and ambiguity of the requirements specification introduced by the (natural) language of the specification.

• Low development productivity, caused by the fact that the major steps of requirements analysis, design and implementation are labour-intensive and not amenable to automation.

• Low reliability and very high costs of the verification step, which entirely relies on testing.

• Extremely high costs of maintenance, which is performed on implementation code.

It should be emphasised that the disadvantages are not related to any particular method, but are inherent characteristics of the development philosophy. Several methods which address the problems listed above have been developed and marketed. Most of the methods applied in the requirements analysis step build a more or less formal conceptual model of the system being developed. The model is decomposed with respect to data flow, control flow or data structure. The goal of the decomposition made in the requirements analysis step is structuring the problem description to make it more understandable. Implementational performance is not considered a decomposition premise. The methods make extensive use of graphic techniques and tools, which help the analyst in understanding the problem, make the specification less ambiguous, and support communication with the user.
The most commonly used decomposition tools are data flow graphs, which underlie a number of methods, e.g., Structured Analysis [68], SADT [69] and CORE [70]. However, while data flow graphs are particularly well-suited for describing data processing and commercial applications, they cannot sufficiently describe real-time systems, in which the control flow must be represented explicitly. To extend the application area
[Figure 11.4: Data flow diagram of the vat control system, with the flows Enable workcell, Disable workcell, Vat status, Temperature, A valve control, B valve control, and Heater control.]
of these methods, data flow graphs have been supplemented with additional notations to describe the flow of control.

According to the notation introduced in [68], an extended data flow graph consists of circles, which represent data transformations (i.e., logical processes or actions), open rectangles, which represent data stores (i.e., sets or data structures), and directed arcs, which represent flows of data or control signals. The graph shown in Figure 11.4 describes the requirements for controlling the operation of the vat in the chemical process discussed in Section 1.3. The requirements for controlling the bottling lines fed by the vat have been omitted for simplicity.

The arcs bearing double arrowheads correspond to continuous data flows (continuous process variables), which are always present and which can be read at any instant of time. The same rule applies to the flows leading from the data stores, which can also be read at any time. The transformation Maintain vat monitors the continuous state variables pH, Level, and Temperature, reads the set-points from the data store Set-Points, and adjusts the control variables A valve control, B valve control, and Heater control. In principle, continuous variables are maintained continuously. The details of how it is done are not relevant at the requirements level.

The arcs bearing single arrowheads correspond to discrete data flows (discrete
[Figure 11.5: State transition diagram of the transformation "Control vat workcell", with the states "Workcell disabled" and "Workcell enabled"; the event Enable triggers the actions Enable "Maintain vat" and Disable "Set pH", and the event Disable triggers Disable "Maintain vat" and Enable "Set pH".]
process variables), which can be read at particular time instants only. The set-points are put into the store by three data transformations: Set pH, Set temperature and Set level, which accept external discrete messages carrying the values of the current set-points. The messages trigger the activity of the respective data transformations.

The diagram in Figure 11.4 describes the requirements more precisely and gives more details than the original requirements statement given in Section 1.3. In particular, it highlights the relation between controlling the chemical process in the vat and setting the set-points for the vat controller. This problem has not been raised in the original requirements statement; nevertheless, it exists in reality. The set-points for the vat level and temperature can be changed at any time, as they are relevant to the performance of the controlled process, but do not change the product quality. The set-point for the pH, however, changes the product quality and can only be changed when the process is stopped.

The flow of control is represented in the diagram by the control transformation Control vat workcell and the control flow arcs, drawn as dashed lines with single arrowheads. In fact, the control transformation implements a finite state machine and, as such, it can be fully described by a state transition diagram (Figure 11.5).

To provide the designer with a complete requirements specification, the requirements are described hierarchically, with increasing level of detail. A refined view of the transformation Maintain vat is shown in Figure 11.6. There are no limitations on the number of levels in the hierarchy of diagrams.

The methods applied in the design steps convert the structure of the conceptual model established in the requirements analysis step and add the technology-oriented details to achieve the performance required of the software.
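The control transformation "Control vat workcell" of Figure 11.5 can be sketched as an explicit finite state machine. The transition structure follows the diagram; the coding style, and the idea of recording the enable/disable commands in a list, are our own illustration.

```python
# A sketch of the control transformation "Control vat workcell" as a
# finite state machine, following the transitions of Figure 11.5.
# Action names are taken from the diagram; the representation is ours.

class ControlVatWorkcell:
    def __init__(self):
        self.state = "Workcell disabled"
        self.actions = []            # enable/disable commands issued so far

    def signal(self, event):
        if self.state == "Workcell disabled" and event == "Enable":
            self.actions += ['Disable "Set pH"', 'Enable "Maintain vat"']
            self.state = "Workcell enabled"
        elif self.state == "Workcell enabled" and event == "Disable":
            self.actions += ['Disable "Maintain vat"', 'Enable "Set pH"']
            self.state = "Workcell disabled"
        # any other event is ignored in the current state
```

Note that repeating the Enable event in the enabled state has no effect, exactly as in the diagram, where no such transition exists.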
The result is an implementational model, described in the design description, which shows precisely how
[Figure 11.6: Detailed diagram of the transformation "Maintain vat" (inputs: Level, Temperature, Vat status; outputs: A valve control, B valve control, Heater control).]
the system performs its functions. Basic concepts underlying the methods used in the design and implementation steps are top-down decomposition and various techniques of structured programming. In most cases decomposition is performed with respect to functions [71], data flow (Structured Design [68], SADT, MASCOT [72]) and data structure (JSP [73]). The control flow between software modules is always defined explicitly. The methods are aimed at structuring the implementation code in such a way as to achieve the performance required and to make the system reliable and maintainable.

Some of the design methods can be directly interfaced to particular specification methods, e.g., structured design to structured analysis, or MASCOT to CORE. This means that the methods rely on a similar conceptual background and use similar notations. Referring to the diagram in Figure 11.6, the problems which could be considered in the design step are, e.g.: "Should the level, the temperature and the pH be maintained by three different processes or by one controlling process?" or "Should monitoring and maintenance of the temperature be performed by two separate processes or by one process?" From the requirements point of view, these problems are irrelevant, but from the design viewpoint they are important and can result in a variety of different implementations.

Apart from the graphic notation, some textual documents are still indispensable. Documentation techniques and tools provide for computerised support of document creation and storage. To make documents less ambiguous, natural language is frequently restricted to a form of structured English, supported by data and control flow diagrams, data dictionaries, cross-references, etc. Traceability of the requirements specification and the design description, which helps in keeping track of successive modifications, is strongly emphasised.
The problem of consistency between documents and implementation, however, remains unsolved: document updating is still treated as secondary within the current maintenance cycle, since it only becomes vital in the next one.
11.5 Prototyping
Requirements specification was recognised early on as crucial for the success of the development process. Errors in the specification, when revealed after the implementation step, are usually extremely expensive to correct. Good recognition of the user needs is of vital importance for the final success. However, in spite of the efforts towards improving the quality of requirements analysis and specification, the problems with incompleteness, inconsistency and ambiguity of specification remained unsolved. This suggested an inadequacy in the very general analytical approach to the requirements analysis.
The prototyping approach involves building a working model (a prototype) of the system to be developed. The prototype is presented to the user, who is to get a feel for the system and to see early results. Before the specification is endorsed, the prototype may be modified to examine different options. There are two main differences between the prototype and the final product: first, the prototype does not meet performance requirements and only deals with the system functionality; second, it may be incomplete and may only tackle one facet of the system to be developed, e.g., the conventions of man-machine communication.

Prototyping fits well into the life cycle model described in Section 11.2. Prototypes are built and examined during the requirements analysis step. After validation, the prototypes can be thrown away and the design and implementation process can commence. The main advantage of having a prototype is better recognition of user needs and more reliable early validation of the system behaviour and of its interface to the external world. This reduces the need for further maintenance.

From another point of view, developing a prototype is a project in its own right, in that it must be specified informally, designed and then implemented. To be acceptable for the project management, the prototype development should be rapid and its costs should not exceed the usual costs of the requirements analysis step. This is feasible using appropriate software tools: prototyping languages and application packages. In many small data processing applications, where there is only little concern for machine efficiency, a prototype may meet all system requirements, thus becoming the implementation. Prototyping of real-time systems is more difficult, as functionality and performance cannot be separated. This leads to the concept of an executable specification, which describes the requirements and, at the same time, constitutes a system prototype to be examined by simulation.
Methods and tools for building prototypes mostly rely on problem space modeling and decomposition. Control flow aspects are directly addressed by showing the flow in an explicit way. An example of a well-established prototyping specification method is SREM [74], supported by the powerful software tool REVS. The formal foundation of the method is the structured finite-state machine model, represented in the form of a network of stimulus-response paths, or R-nets. Problem decomposition is performed with respect to data inputs, structured in the form of messages. Each R-net describes in detail the transformation of a single input message into some number of output messages. The requirements specification is built as a network of R-nets describing the processing required for all input messages. Performance requirements are expressed in terms of response times and accuracies expected in distinguished nodes of the R-nets, called validation points.

SREM is best suited for describing discrete systems of command and control, but it can also be applied for defining continuous process control systems, like the one
[Figure 11.7: R-net for the message "pH sample".]
shown in Figures 11.4 through 11.6. The input flows of pH, Level and Temperature correspond to input messages, which are interpreted as discrete samples. The R-net describing the processing of the pH input message is shown in Figure 11.7. The message is input through the interface ADC1. The processing is split into two parallel branches. In one branch the value of the control variable B valve control is computed and output through the interface DAC1. In the other branch the value of pH is compared with predefined limits (taken from an inter-net data base) and either the processing is finished or an error message is output through the interface Error display. The accuracy and timing of the computation can be verified by recording the values and time instants at the validation points V1 and V2.

The above example shows a very clear distinction between a prototype and a design and its implementation. Each R-net can be coded and executed, i.e., it constitutes a prototype of a particular software function. However, the set of all R-nets, which describes the requirements for an entire software item, does not impose any practical
software structuring and does not correspond to the software design and implementation.

Prototyping can also be considered in the broader context of an entire software development project. If the structure designed for the implementation has been prototyped, it may be examined experimentally. This helps in identifying and evaluating the risks related to particular design decisions. Unsatisfactory results of the prototype evaluation may force modification and refinement of the designed structure. An exhaustive discussion of this paradigm can be found in [75].
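The idea that each R-net is directly executable can be illustrated by coding the R-net of Figure 11.7 as straight-line code: the "pH sample" message is split into a control branch (compute B valve control, output via DAC1) and a checking branch (compare against limits, possibly output an error message). The limit values, the set-point and the proportional control law below are hypothetical; the book specifies only the structure of the processing, not the numbers.

```python
# Sketch of the R-net of Figure 11.7. Interface names (ADC1, DAC1,
# Error display) come from the text; PH_LOW, PH_HIGH, setpoint and the
# proportional gain are invented for illustration.

PH_LOW, PH_HIGH = 6.5, 7.5          # assumed inter-net data-base limits

def process_ph_sample(ph, setpoint=7.0, gain=10.0):
    """Process one 'pH sample' message received through interface ADC1."""
    outputs = {}
    # branch 1: compute and output the control variable (interface DAC1)
    outputs["DAC1: B valve control"] = gain * (setpoint - ph)   # validation point V1
    # branch 2: validate the sample against the predefined limits
    if not (PH_LOW <= ph <= PH_HIGH):                           # validation point V2
        outputs["Error display"] = "pH %.2f out of range" % ph
    return outputs
```

Running such a function against recorded samples is exactly the sense in which an R-net "constitutes a prototype of a particular software function".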
11.6 Object-Oriented Development
Design is the second critical part of the software life cycle. The program structure forced by the design method is the most important determinant of the life cycle costs, particularly of the costs borne in the maintenance phase.

The traditional life cycle model inherently relies on a top-down decomposition and stepwise refinement approach. In each development step a complete description of the system to be developed is produced at an increased level of detail. The motivation for such an approach is reducing the complexity from a large system to a number of small modules which are easier to capture. At the very end, during program packaging, the logical modules are placed into physical modules, thus defining the physical structure of the implementation code. Packaging is performed with performance optimisation in mind. A weak point in this procedure is that the most critical decisions concerning the overall system structure have to be made very early, when the problem is still largely unknown. Moreover, since decomposition is primarily performed in the functional domain, the most important data tend to be global to the entire system. This causes significant problems in the maintenance phase, when data representations must be changed in response to changes in the environment or in user requirements.

The object-oriented approach explores a different paradigm, inherently based on the concepts of information hiding, data encapsulation, and abstract data types [76]. The key observation is that a program structure should model the structure of the real world in which the problem is defined. The real world consists of distinct entities which interact with each other. Following this structure, the software to be developed is viewed as a set of objects, each of which consists of a portion of data and a set of operations performed by the object and upon the object. The latter operations are the only actions that modify the data contained inside the object.
The representation details of the state data are hidden, i.e., neither visible nor accessible to other objects. Each software module denotes an object or class of objects derived from the problem space.
[Figure 11.8: Object-oriented decomposition diagram of the pH controller, showing the objects Clock, pH sensor, pH monitor, pH controller, Error display and A valve, connected by the operations (1) to (5).]

In its current form, object-oriented design is a partial life cycle method applied in the design and implementation steps within the traditional life cycle model. The major steps of the method are the following [77]:

1. Identify the objects.

2. Identify the operations suffered by and required of each object.

3. Establish the interfaces between the objects.

4. Design and implement the operations.

An important distinction between traditional design and the object-oriented method is the inversion of program packaging, which determines the physical structure of a software item (steps 1, 2), and functional refinement (steps 3, 4).

For an example refer to the pH controller described in Figure 11.7. An object-oriented decomposition diagram of the controller is depicted in Figure 11.8. It should be noticed that the arcs in the diagram do not denote data flows between the objects, but a balance between the operations performed by and on the objects. The list of objects and operations which can be defined in the problem space is given in Table 11.1. The numbers in Table 11.1 correspond to the numbers of operations in Figure 11.8.

If a software item is to be executed in a multitasking environment, the objects may be implemented as co-operating parallel processes. In a sequential environment a transformation function which explicitly defines the control flow must be developed.

The object-oriented approach has been recognised as an extremely useful development method in the context of Ada. Unlike traditional languages such as FORTRAN,
Table 11.1: Objects and operations in the pH controller

  Object          Operations performed by          Operations performed on
  -------------   ------------------------------   ------------------------
  Timer           Force cycle (1)
  pH monitor      Get sample (2), Put value (3),   Start cycle (1)
                  Display error (4)
  pH sensor                                        Get sample (2)
  pH controller   Adjust A valve (5)               Get value (3)
  Error display                                    Display error (4)
  A valve                                          Adjust A valve (5)

COBOL or C, Ada offers packages as an excellent mechanism for object definition. A package is a program unit that encompasses data and operations on the data, defined in the form of exported subprograms and task entries. Generic packages facilitate the definition of object classes and the creation of new instances of objects from specified classes.

The advantages of the object-oriented approach are evident in the maintenance phase. A network of co-operating objects forms a very natural model of the problem space, which is the most stable part of the software to be developed. All important characteristics of real objects are encapsulated in their computational models, and any changes in the environment only lead to local consequences inside particular objects. Moreover, all important data are encapsulated within object modules, and changes in data representation do not spread across object boundaries.
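The objects and operations of Table 11.1 can be sketched as encapsulated modules. The class and operation names follow the table; the sensor reading and the proportional adjustment of the valve are placeholder behaviour, since the book specifies only the decomposition, not the control law.

```python
# The objects of Table 11.1 as classes: each object hides its state and
# exposes only the operations listed in the table. The internal values
# and the control law are hypothetical.

class PHSensor:
    def __init__(self, value=7.0):
        self._value = value                 # hidden representation
    def get_sample(self):                   # operation (2)
        return self._value

class AValve:
    def __init__(self):
        self._position = 0.0                # hidden representation
    def adjust(self, delta):                # operation (5)
        self._position += delta

class PHController:
    def __init__(self, valve, setpoint=7.0):
        self._valve, self._setpoint = valve, setpoint
    def put_value(self, ph):                # operation (3)
        # placeholder control law: move the valve towards the set-point
        self._valve.adjust(self._setpoint - ph)

class PHMonitor:
    def __init__(self, sensor, controller):
        self._sensor, self._controller = sensor, controller
    def start_cycle(self):                  # operation (1), forced by the timer
        self._controller.put_value(self._sensor.get_sample())
```

Changing, say, the sensor's internal representation stays local to `PHSensor`, which is precisely the maintenance advantage argued above.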
11.7 Transformational Implementation
The most creative part of a software project is requirements analysis and specification. Once a complete and unambiguous specification is constructed, it describes in detail the behaviour of the system to be developed. The semantics of this description must not be changed in the design and implementation process. This observation leads to the concept of automatic transformation of the requirements specification into the implementation.

The central point in the transformational implementation approach [78, 79] is the executable specification, which constitutes the first prototype of the system to be developed. After validating the specification, it may be subjected to a series of transformations that have been proved correct, i.e., which refine the functions and data definitions, and change the structure and performance, while preserving the functionality of the system being designed. The transformations may be selected manually, but are performed automatically. The result of a transformation is executable, and forms the
next prototype of the system to be developed. Successive prototypes are examined and evaluated, and compromises between functionality, performance and costs may be studied. When the requirements are met, the design process terminates with the formal and executable design description, which is automatically translated into the implementation.

[Figure 11.9: Transformational implementation model: requirements analysis produces an executable specification, which is evaluated against the requirements and for performance, repeatedly transformed, and finally subjected to implementation generation, yielding the implementation.]

The advantages of this new approach are obvious. Labour-intensive steps of design, implementation and testing disappear, and the development process is viewed as consisting of requirements analysis and specification, evaluation, transformation, and generation of implementation code (Figure 11.9). The only labour-intensive phase is requirements specification. The productivity of the development process may be increased by an order of magnitude. The primary system documentation comprises the executable requirements specification. Maintenance is no longer performed on code. It is the specification which is modified, then transformed and reimplemented, just like during the original development project.

As an example consider once more the vat control system. The system functions can be decomposed into a number of co-operating processes for:
[Figure: decomposition of the vat control system into co-operating processes: a timing process, a pH monitoring process, a vat model, and an error handling process.]
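The transformation step of Figure 11.9 can be illustrated in miniature: an executable specification of a function, a refined version produced by a correctness-preserving transformation, and the equivalence obligation that the transformation must discharge. The example function is ours, not from the book.

```python
# A toy instance of transformational implementation: the "specification"
# is clear but linear-time; the "refinement" is a closed-form rewrite.
# A correct transformation must preserve the result relation.

def spec_sum_of_squares(n):
    # executable specification: direct statement of the requirement
    return sum(i * i for i in range(n + 1))

def refined_sum_of_squares(n):
    # refined version: same function, constant-time closed form
    return n * (n + 1) * (2 * n + 1) // 6

# the proof obligation, here checked by exhaustive test on a finite range
assert all(spec_sum_of_squares(n) == refined_sum_of_squares(n)
           for n in range(100))
```

In the transformational approach this equivalence would be established by a proved rewrite rule rather than by testing; the sketch only shows what is being preserved.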
Formal Specification and Verification Methods
Selected examples of the safety property considered in [114] are presented below.

Partial correctness. A process (program) is called partially correct if every finite computation ends in a state in which the program's post-condition is satisfied. Since any computation starts in a state which satisfies the program's pre-condition, partial correctness defines a relation between the initial values of the variables (determined by b(x)) and the values of the variables in the final state. This relation is called the result relation of a program, and is denoted Res(x, y). Partial correctness can be expressed by the formula:

    ⊨ b(x) ⇒ □((∀n) at(l_f^n) ⇒ Res(x, y))
Mutual exclusion. Let C1, C2 be subsets of labels defining the critical regions of processes P1, P2, respectively. Mutual exclusion in the critical regions can be expressed using the formula:

    ⊨ b(x) ⇒ □ ¬(at(C1) ∧ at(C2))

where at(C) = ∨_{l ∈ C} at(l).

Deadlock freedom. Any label l^n different from the final one is called a waiting location if there exists a state s such that:

    at(l^n) ∧ ¬En(l^n)

where En(l^n) holds if and only if h_n(s) ∈ Dom T_n. Consider a vector of waiting locations l = (l^1, ..., l^m). Deadlock freedom with respect to the vector l is expressed by the formula:

    ⊨ b(x) ⇒ □((at(l^1) ∧ ... ∧ at(l^m)) ⇒ (En(l^1) ∨ ... ∨ En(l^m)))
This means that there is no state with all processes waiting for an event that will never occur.

Liveness. The liveness property can be informally described by the requirement that when some condition holds, then as a result something good will happen [118]. The good event may be, e.g., an access to a resource, responsiveness (in real-time systems), etc. Formally, the liveness property can be expressed as follows:

    ⊨ A1 ⇒ ◇A2

This means that when a state occurs such that A1 is satisfied, then a state in which A2 holds will occur in the future. Selected examples of this property are presented below [114].

Total correctness. Total correctness tightens the condition defined for partial correctness of programs in such a way that the set of computations contains only finite computations. Total correctness is expressed by the formula:

    ⊨ b(x) ⇒ ◇(at(l_f) ∧ Res(x, y))
This means that each computation starting in a state which satisfies b(x) ends at the final state (at(l_f)), which satisfies the program post-condition Res(x, y).

Accessibility. Consider the mutual exclusion problem. A user process which enters a critical region sends a request to the executive which implements inter-process synchronisation. It is important to guarantee that each request will be serviced, and that the access to the critical region will eventually be granted. Let l be a label assigned to the request operation. The accessibility property can be formulated in the following way:

    ⊨ at(l) ⇒ ◇at(C)

Responsiveness. A critical region may be interpreted as a resource which may be assigned to at most one of the requesting processes at any time. The property called responsiveness can be defined by the following formula:

    ⊨ R_n ⇒ ◇A

where (R_n = true) means that a process P_n has sent a request for a resource, and (A = true) means that the executive has serviced the request.

Precedence. Precedence operators are defined to describe properties of selected states connected by semicomputations. The two classes of the corresponding properties can be defined [114] as:

• ⊨ A U B, the until property, which means that for every semicomputation starting in the initial state there exists a state satisfying B, and for all states preceding this event the formula A is satisfied.

• ⊨ A P B, the precede property, which means that every occurrence of a state satisfying B is preceded by a state in which A is satisfied.

A relationship between the U and P operators can be written as follows:

    A P B if and only if ¬((¬A) U B)

Safe liveness. An example of the until property, the so-called safe liveness [114], may be informally stated as: nothing bad will happen until something good occurs. The property is expressed by the formula:

    ⊨ A U B

where A specifies the safety property, e.g., mutual exclusion, which should be preserved, and B describes the liveness requirements, e.g., total correctness or accessibility.
Absence of unsolicited response. The responsiveness property states that each request has to be serviced: (R ⇒ ◇A). On the other hand, the complementary requirement that each service A has to be preceded by a request R for service can be defined as:

    ⊨ R P A

Temporal logic is widely applied for formal verification of concurrent systems. Verification is usually carried out in two steps:

- a specification of the system and the properties which have to be proven is defined using the temporal logic language, and then

- the required properties are proved using the system specification and temporal formulae transformation rules [113, 114].
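The meaning of the □ and ◇ operators, and of properties such as mutual exclusion and responsiveness, can be made concrete by evaluating them over a finite computation, where states are assignments of truth values to propositions and the modal operators quantify over suffixes of the computation. This is only an illustration of the definitions over finite traces, not a model checker; the sample trace is invented.

```python
# Toy evaluation of temporal formulae over a finite computation.
# box  = []  (holds in every state of the trace)
# diamond = <> (holds in some state of the trace)

def box(trace, prop):
    return all(prop(s) for s in trace)

def diamond(trace, prop):
    return any(prop(s) for s in trace)

def leads_to(trace, p, q):
    # p => <> q, checked from every position where p holds
    return all(diamond(trace[i:], q) for i, s in enumerate(trace) if p(s))

# a hypothetical computation: c1/c2 mark the critical regions,
# req/served a request to the executive and its service
trace = [{"c1": False, "c2": False, "req": True,  "served": False},
         {"c1": True,  "c2": False, "req": False, "served": True},
         {"c1": False, "c2": False, "req": False, "served": False}]

# mutual exclusion: [] ~(at(C1) /\ at(C2))
mutex = box(trace, lambda s: not (s["c1"] and s["c2"]))
# responsiveness: R => <> A
responsive = leads_to(trace, lambda s: s["req"], lambda s: s["served"])
```

Over infinite computations these checks become proof obligations discharged with the transformation rules of [113, 114]; the finite-trace version only shows what the formulae assert.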
Chapter 15

Programmable Logic Controllers

15.1 Hardwired and Programmable Logic Control
Consider the chemical plant described in Section 1.3. The installation consists of a vat, and a set of bottling lines fed by the vat. The processes running in different parts of the plant exhibit different characteristics. The chemical process running in the vat is continuous, i.e., the state and control variables of the process are continuous, and correspond to analogue input and output signals. The bottling processes are discrete in nature, i.e., the state and control variables are discrete, and correspond to two-state input and output signals.

A bottling line and the set of its input and output signals is shown in Figure 15.1. The output signals from the line, which are issued by the sensors attached to the installation, can be listed as follows:

p: bottle in position, a two-state signal from the electrical contact of the platform sensor; the contact is open when the sensor is released, and is closed when the sensor is depressed;

s: bottle size, a two-state signal from an electrical contact which is open when the line is bottling 1 l bottles, and is closed when the line is bottling 2 l bottles;

w: weight increments, a pulse signal issued by the scale platform.

The pulse signal w conveys the information related to the weight of the bottle which is just being filled. More precisely, the bottle weight corresponds to the number of pulses issued by the scale platform. It will be assumed in the sequel that the pulse signal w is fed to a hardware counter which counts the pulses, and drives two output contacts which can be sensed by external devices:
[Figure 15.1: Bottle-filling line and its state and control signals, showing the bottle-filling valve, the scale platform, the release-bottle gate and the bottle-size contact.]
[Figure 15.2: Timing diagram of the bottling line control, showing, for a 1 l and a 2 l bottle, the events bottle arrives and bottle is removed, the clearing of the counter, the contacts w1 and w2, and the opening and closing of the valve (coil energised).]
w1: the contact is open when the bottle contents exceed 1 l, and is closed otherwise;

w2: the contact is open when the bottle contents exceed 2 l, and is closed otherwise.

The two-state (binary) signals p, s, w1, and w2 describe the line operation, and constitute the set of state variables of the bottling process. They can be used by a control circuit to yield three control signals:

r: which releases the next bottle from the bottle supply,

c: which clears the counter, and

v: which drives the bottle-filling valve.

It is easy to see that the next bottle can be released after the removal of the previous one, i.e., that the pulse r which triggers the bottle supply gate should be generated by the opening of the contact p. The counter of the scale pulses can be cleared when no bottles stay on the platform, i.e., when the contact p is open.

The control of the bottle-filling valve is more complex. The electromagnetic valve is open when its coil is energised, and is closed otherwise. The timing diagram in Figure 15.2 shows that the valve should be open when:

(1) the contact p is closed (bottle in position), and

(2a) the contact w1 is closed (the bottle contains less than 1 l), or

(2b) the contact s is closed and the contact w2 is closed (a 2 l bottle is being filled, and it contains less than 2 l).
[Figure 15.3: Contact scheme.]

A control circuit of the bottle-filling line must transform the state signals into the control signals. The most complicated part of the controller, i.e., the bottle-filling valve control, can be accomplished in the most direct way, by connecting the contacts p, s, w1, and w2 with the coil of the valve as shown in Figure 15.3. The open-closed signals from the electrical contacts can easily be converted into two-state TTL signals. If the method shown in Figure 5.2 is used, then the logic 1 corresponds to the "open", and the logic 0 to the "closed" contact state. The timing diagram in Figure 15.2 and the contact scheme in Figure 15.3 both represent the Boolean function:

    v = p ∧ (w1 ∨ s ∧ w2)        (15.1)
The function can equally well be implemented using logic AND and NAND gates, as is shown in Figure 15.4. The binary TTL signal v has to be amplified to drive the coil of the electromagnetic valve (Figure 5.5).

[Figure 15.4: Gate logic scheme.]

The Boolean function 15.1 formally defines the requirement for the bottle-filling valve controller. The requirement can be implemented in a number of ways. Both implementations considered above fall into the category of hardwired logic, since the Boolean function is encoded in the structure of a hardware circuit. The advantage of the hardwired approach is the speed of signal processing: all input signals of the controller are processed simultaneously. The disadvantage is the inherent inflexibility of the implementation. Any modification to the implemented function requires a suitable modification of the hardware circuit. It is clear that if four bottle-filling lines are to be controlled, then four identical controllers must be used.

Quite another approach to the construction of logic controllers is based on programmed logic, i.e., software implementation of Boolean functions. The idea is that, instead of a specialised controller, a programmable processor can be used which reads the binary state signals, evaluates the required Boolean functions, and outputs the evaluated binary control signals to the controlled process. A sample program for evaluating the function 15.1 can be written as follows:
LDA w2    load the accumulator with w2               (A = w2)
AND s     and the accumulator with s                 (A = s ∧ w2)
OR  w1    or the accumulator with w1                 (A = w1 ∨ s ∧ w2)
AND p     and the accumulator with p                 (A = p ∧ (w1 ∨ s ∧ w2))
OUT v     output contents of the accumulator to v    (v = A)
The program is stored in the program memory of a programmable logic controller (PLC), and executed repetitively by a specialised or universal processor of the PLC. If four bottle-filling lines are to be controlled, then the program must be repeated four times, once for each bottling line. The real power of industrial PLCs is usually much greater than shown in the example, as they offer the possibility for implementing not only Boolean functions, but arbitrary finite-state machines. In particular, the hardware counter which was used for counting scale pulses could be easily replaced by an internal PLC function. The advantage of the programmable logic is its flexibility, as the programs can be modified relatively easily and quickly. The disadvantage is that the operations are performed sequentially, introducing certain delays between the changes of input and output signals and rendering the method inapplicable to very fast processes.
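The repetitive evaluation described above can be sketched as a scan loop. The function and the process-image layout below are hypothetical stand-ins for the PLC's implicit input-output handling; the program body is simply repeated once per bottling line:

```python
# A sketch of cyclic program scanning for four identical bottling lines
# (assumed process-image layout; a real PLC reads and writes the signals
# implicitly, and the loop below would run forever at the cycle rate).

def scan_cycle(inputs):
    """One scan: take the input image, evaluate the logic for each line,
    and return the output image (one valve signal per line)."""
    outputs = []
    for p, s, w1, w2 in inputs:
        v = p and (w1 or (s and w2))     # equation 15.1, once per line
        outputs.append(v)
    return outputs

# One snapshot of the input image: (p, s, w1, w2) per line.
image = [
    (True,  False, True,  False),   # line 0: 1 l bottle being filled
    (True,  True,  False, True),    # line 1: 2 l bottle between 1 l and 2 l
    (False, False, True,  True),    # line 2: no bottle on the platform
    (True,  True,  False, False),   # line 3: 2 l bottle full, valve closed
]
assert scan_cycle(image) == [True, True, False, False]
```

The sequential evaluation is exactly what introduces the input-to-output delay mentioned above: an input change is only noticed on the next pass of the loop.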
15.2 PLC Hardware Architecture
The hardware structure of a programmable logic controller (PLC) is similar to the structure of a general-purpose computer. A PLC consists of a processor, main memory and input-output interfaces, which are coupled together by a common bus. Despite this structural similarity, the functional architecture of PLCs (Figure 15.5) is significantly different from that of general-purpose systems. The input and the output interfaces enable information to be exchanged between the PLC and the technical process. Originally, PLCs were designed to deal with two-state binary and pulse signals. Depending on the type of a PLC, the number of inputs and outputs can vary from 16 in very small controllers up to 16,000 in large systems. Modern PLCs offer the additional capability to effectively process analogue signals. The types of input-output signals, and of the interface modules, have been
Figure 15.5: PLC architecture (processor, program memory, data memory, input and output interfaces).
Table 15.1: The family of Modicon PLCs.

Model               I/O signals   I/O bus length             I/O modules
Modicon A 010       16...20       local installation         discrete only
Modicon A 020       36...72       local installation         discrete only
Modicon A 030       40...640      distributed up to 1.2 km   discrete and analogue
Modicon A 120       8...256       local installation         discrete and analogue
Modicon A 130       16...512      distributed up to 1.2 km   discrete and analogue, intelligent modules
Modicon A 350       16...2048     distributed up to 1.2 km   discrete and analogue, intelligent modules
Modicon A 500       16...5088     distributed up to 1.2 km   discrete and analogue, intelligent modules
Modicon C 984-380   8...256       local installation         discrete and analogue
Modicon C 984-480   8...1024      distributed up to 1.5 km   discrete and analogue, intelligent modules
Modicon C 984-680   8...2048      distributed up to 1.5 km   discrete and analogue, intelligent modules
Modicon C 984-A     8...2048      distributed up to 4.5 km   discrete and analogue, intelligent modules
Modicon C 984-B     8...16384     distributed up to 4.5 km   discrete and analogue, intelligent modules
described in Chapter 5. Basic characteristics of a sample family of typical PLCs are given in Table 15.1. Usually, PLCs use neither operator terminals nor mass memories, such as disks or tapes, which are sensitive to the harsh industrial environment. The basic operating software resides entirely in main memory, and a PLC operates autonomously without human intervention. The main memory of a PLC is divided into specialised segments designated for different purposes. Depending on the intended use, the segments can be organised in bits or words. Basic segments contain the following types of information:

Operating system. The segment is organised in words and contains the code and the data of the operating system; the system code is usually stored in a ROM memory, while the data are stored in a RAM segment with battery backup, which is carefully protected against incursion from the other programs.

Application programs. The segment is organised in words and contains the instructions of application programs.

Application data. The segment is subdivided into three areas containing the variables into which the machine puts sensor data, the intermediate variables, and the output variables associated with the actuators. PLCs offering only logic functions have their data segment organised in bits, while the more advanced machines have part of this area made up of words. There is a strict correspondence between the positions of the bits of the input and output data areas and the layout of input and output signals at the connectors and cables leading into and out of the PLC.

All the input-output operations are implicit on PLCs. This means that the application programs refer to the data areas in memory only, while the physical data transfer is handled entirely by the system software or firmware. In most cases, all input and output data are transferred in and out of the machine at regular time intervals through direct memory access (DMA) channels.
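The bit-organised data areas and the implicit input-output can be sketched as follows; the 16-signal word layout and the function names are assumptions for illustration, since the actual correspondence between bit positions and terminals is fixed by each vendor:

```python
# Sketch of a bit-organised input area: 16 discrete input signals packed
# into one 16-bit word of the data memory. The system software would refresh
# this word periodically via DMA; the application program only reads bits.

def pack_inputs(bits):
    """Pack a list of 16 booleans into one word, bit position 0 first."""
    word = 0
    for position, bit in enumerate(bits):
        if bit:
            word |= 1 << position
    return word

def read_input(word, position):
    """What an application program does: refer to the data area, not the wires."""
    return bool(word & (1 << position))

signals = [False] * 16
signals[0], signals[3] = True, True       # e.g., two contacts currently closed
image_word = pack_inputs(signals)
assert image_word == 0b1001
assert read_input(image_word, 3) is True
assert read_input(image_word, 1) is False
```

The same scheme in reverse (an output word unpacked onto the actuator terminals) explains why the program never performs explicit I/O instructions.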
The processor is the operational unit designated to control the machine as a whole and to execute instructions of the programs. The instructions are fetched from consecutive words of the program segments of the main memory, which are addressed by the program counter. The data (variables) are addressed in the data segment of the main memory using a variety of addressing modes. The operations of the instructions take place in the accumulator register. Partial results of the operations are stored on a stack, which constitutes an internal memory of the processor. The processors intended for small controllers contain only 1-bit accumulators and can execute operations only on bits. More advanced processors contain 8...32-bit accumulators and can execute operations on bits or words. The basic instruction set of a PLC processor contains the instructions for loading and storing data from and to the memory, logical operations, and conditional and
Table 15.2: Modicon 984: Technical data.

                           984-380         984-480          984-A              984-B
Word length                16 bits         16 bits          16 bits            24 bits
Program memory (words)     1.5...6 K       4...8 K          16...32 K          32...64 K
Data memory (words)        1920            1920             1920               9999
Extended memory (words)    —               —                —                  32...96 K
Cycle time                 5 ms/K          5 ms/K           0.75 ms/K          0.75 ms/K
Self-diagnostic unit       yes             yes              yes                yes
Real-time clock            no              yes              yes                yes
Network interfaces         1               2                3                  3
Digital I/O signals        256             1024             2048               8192 in, 8192 out
Analogue I/O signals       32 in, 32 out   224 in, 224 out  1920 in, 1920 out  2048 in, 2048 out
Distributed I/O stations   —               6                32                 32
unconditional jumps. The processors intended for larger systems with digital as well as analogue signal processing can additionally execute typical instructions for processing numerical data and for inter-computer communication. The most typical mode of operation of a PLC is cyclic scanning of an application program, which is loaded into the application program area in the memory. The instruction pointer scans consecutively all the words of the program memory down to the last, beginning again with the first. Unlike in general-purpose computers, the branch instructions are applied only occasionally. The execution time of 1 K words of instructions is called the cycle of a PLC, and is the most important measure of PLC performance. Basic parameters of a typical PLC family are given in Table 15.2. A real cycle of the PLC has two phases: the system phase and the processing phase (Figure 15.6). The application program is executed in the second phase, while the first phase is implicit and hidden from the user. The system phase is devoted to input-output operations and self-diagnostic programs of the PLC. The PLCs at the top of the performance range enable logic control as well as continuous, e.g., PID, control. The duration of the numerical computations can be quite long and, if executed within the cycle of a PLC, can prolong it unacceptably. Therefore, following the idea of multi-tasking (Chapter 7), advanced PLCs allow quasi-simultaneous execution of several cycles. The execution of particular tasks can be time or event driven. Multiprocessor PLCs with the potential for physically simultaneous operation have been introduced to increase system performance and to enhance the reliability of
system operation. It should be emphasised that the need for multiprocessing did not arise from the need for more powerful logic processing, but from the needs of applications which go far beyond the original design limits of PLCs. These are:
- numerical processing (statistics, technical and administrative data processing),
- communication (man-machine and machine-machine within a network),
- analogue signal processing (measurement, regulation, control).
Multiprocessing greatly increases the complexity of system programming. To preserve the initial advantages of PLCs, i.e., simplicity, reliability, and maintainability, PLC designers introduced specialised intelligent units for the various tasks to be carried out by logic controllers. Basically, any call of a specialised processing unit is implicit, and transparent to the PLC programmer. This approach can be exemplified by the analogue input-output operations: the functions of periodic sampling, supervision of the sensor, linearisation of the measurements, scaling, and comparison with thresholds can be delegated to an intelligent processor which stores the final results in the memory of the PLC in an area reserved for this purpose. In most cases, the specialised processing units can be classified into one of the following types:
• numerical processors for fixed and floating point operations,
• supervisory processors for bus traffic and fault diagnosis,
• input-output processors for input signal pre-processing, and for operations such as axis positioning, feedback regulation (e.g., PID), motor control, etc.,
• communication processors for simple RS-232-C or complex network communication.
In fact, the majority of hardware structures described in Chapter 4 has been adopted in a variety of PLC constructions. An example of a multiprocessor architecture with a stand-by computer is shown in Figure 15.7.

Figure 15.6: PLC cycle (system phase: input-output and self-diagnosis, followed by the processing phase: program execution).
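With the per-kiloword figures from Table 15.2, the cycle time of a given program can be estimated; the program sizes below are illustrative, and the system-phase overhead is ignored (an assumption, since it adds to the real cycle):

```python
# Estimating the PLC cycle from the per-kiloword execution time in Table 15.2.

def cycle_time_ms(program_kwords, ms_per_kword):
    """Processing-phase duration for a program of the given size."""
    return program_kwords * ms_per_kword

# A 6 K word program on a Modicon 984-380 (5 ms/K):
assert cycle_time_ms(6, 5.0) == 30.0
# The same program on a 984-A (0.75 ms/K):
assert cycle_time_ms(6, 0.75) == 4.5
```

Such an estimate also bounds the reaction time: an input pulse shorter than one cycle may fall entirely between two scans and be missed.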
Figure 15.7: Multiprocessor PLC architecture (two PLCs on a MAP network, each with a communication processor, PLC processor, program and data memory, and I/O bus interface; redundancy controllers, linked by a fast redundancy-controller bus, connect both PLCs to the process interface modules of the technical process).
Figure 15.8: Distributed network of Modicon 984 (general-purpose computers on Modnet 3/MAP, 10 Mbps broadband; a gateway to Modnet 2/MiniMAP, 5 Mbps carrier-band, linking Modicon 984 PLCs, an IBM PC programming console, an IM 1062 operator console, and a Modicon 884 loop controller; I/O modules attached via the Modnet 1 I/O bus, 1.5 Mbps, up to 4.5 km).
PLCs can be connected into local area networks controlling vast installations. An example of a large control network composed of Modicon 984 PLCs and associated devices is shown in Figure 15.8. It is interesting to compare this network with the structure shown in Figure 4.13. It is easy to see that the Modnet 1 LAN corresponds to a field-bus which joins the input-output modules with a workcell controller, while the Modnet 2 and 3 LANs are factory floor networks which connect particular workcell controllers. Owing to their high degree of specialisation in the control of mainly discrete processes, PLCs have been designed to be used by persons unspecialised in computer or software engineering. Behind the hardware architecture of a multiprocessor PLC a system software is hidden which co-ordinates application program execution. This makes the machines easy to use, since the processors are so specialised that programming them reduces to selecting a configuration and setting parameters.
15.3 Programming Languages
Programming languages and, more generally, computer aided software development tools for PLCs are presently only available in vendor specific form. The commonly used
programming languages can be subdivided into two categories: textual languages, based on instruction lists, and graphical languages employing ladder diagrams or function charts. The International Electrotechnical Commission (IEC) has made a considerable effort towards standardisation, and has worked out a very detailed draft standard for a set of PLC programming languages [119]. The languages are apt for all performance classes of PLCs. Since they provide a range of capabilities which is larger than would be necessary for just covering the classical application area of PLCs, i.e., logic processing, they are also suitable for the front-end part of distributed process control systems. This, however, does not hold for their communication features and the user interface level. The standard defines a family of system independent languages, with textual as well as graphical capabilities. The languages are equivalent in that they can be transformed into one another:

IL   Instruction List,
LD   Ladder Diagram,
ST   Structured Text,
FBD  Function Block Diagram, and
SFC  Sequential Function Chart.
Common features of the IEC languages. In the languages proposed by the IEC standard, there is a unified form for keywords, identifiers, simple and composed variables, numerical and character string literals, time literals, as well as elementary and derived data types comprising various Boolean, integer, real, string and time formats, and enumerated types, subranges, structures and arrays. Furthermore, the declaration, specification, and initialisation of objects in the various program organisation units can only be carried out with standardised language constructs. The program organisation units are: function, function block, and the program itself. When executed, a function yields as result exactly one data element, which may be multi-valued. Functions do not contain internal state information, i.e., the invocation of a function with the same arguments always yields the same result. Functions and their invocations can be represented either graphically or textually. The standard states a wide range of intrinsic functions, which are also common to all languages. Some of them are extensible, i.e., they are allowed to have variable numbers of inputs, and are considered as applying the indicated operation to each input in turn. Examples of extensible functions are the Boolean AND and OR. The available intrinsic functions can be subdivided into the following groups:
• Boolean functions,
• bit and character string processing,
• selection and comparison functions,
• operations on time data,
• type conversion, and
• numerical functions, including logarithmic and trigonometric operators.

The second type of program organisation units, the function block, yields one or more values upon execution. Multiple, named instances, i.e., copies, of a function block can be created. Each instance has an associated identifier, and a data structure containing its outputs and internal variables, and possibly its input variables. All the values of the output variables and the necessary internal variables of this data structure persist from one execution of the function block to the next one. Therefore, invocation of a function block with the same arguments does not necessarily yield the same output values. Only the input and output variables are accessible outside of a function block instance, i.e., the function block's internal variables are hidden from the outside. Function block instances can be created either textually or graphically. As for functions, the standard predefines a number of function blocks providing:
• bistable elements,
• pulse generators,
• counters,
• timers, and
• message transfer and synchronisation operators.

A program is a collection of functions and function blocks, which can be invoked sequentially from the program body, or can be executed quasi-simultaneously, employing the task concept. As in the real-time languages considered in Chapter 9, a task represents a thread of code, which is to be executed in parallel with other ones. It is capable of invoking the execution of a set of program organisation units, either on a periodic basis (time-driven tasks) or upon occurrence of a change of a certain Boolean variable (event-driven tasks). The general form of a task is similar to that of a function block. It provides a priority attribute to determine the order of task processing in case there is a conflict.

Instruction List.
Instruction list languages are, in fact, assembly languages of the processors utilised in PLCs. They are inherently machine and vendor specific, and constitute the lowest end of the whole spectrum of PLC programming languages. An example of an IL program using a traditional syntax has been introduced in Section 15.1.
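The accumulator scheme underlying such instruction lists can be sketched as a small interpreter. The mnemonics and the data representation below are hypothetical and cover only the Boolean subset; real PLC instruction sets are vendor specific:

```python
# A toy interpreter for a Boolean-only instruction list with a single
# accumulator, in the spirit of the sample program of Section 15.1.

def run_il(program, variables):
    """Execute (opcode, operand) pairs against a dict of named variables."""
    acc = False
    for opcode, operand in program:
        if opcode == "LDA":                      # load the accumulator
            acc = variables[operand]
        elif opcode == "AND":
            acc = acc and variables[operand]
        elif opcode == "OR":
            acc = acc or variables[operand]
        elif opcode == "OUT":                    # store the accumulator
            variables[operand] = acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return variables

# The bottle-filling program from Section 15.1:
program = [("LDA", "w2"), ("AND", "s"), ("OR", "w1"), ("AND", "p"), ("OUT", "v")]
state = {"p": True, "s": True, "w1": False, "w2": True, "v": False}
assert run_il(program, state)["v"] is True    # v = p AND (w1 OR (s AND w2))
```

A PLC processor does essentially this in firmware or silicon, cycling over the program memory instead of a Python list.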
To make the language less machine and vendor specific, and to ease the migration from the IL language to higher level languages, the IEC standard provides an Instruction List which defines an assembly language for a hypothetical processor possessing an accumulator and using a symbolic single address scheme. Most operators of the standard IL language have the form:

result := result OP operand

i.e., the value of a result variable, or the accumulator, is replaced by its current value operated upon by the operator OP with respect to the operand. The classes of standard operators available in the IEC IL language are: load and store the accumulator, set and reset a Boolean variable, elementary Boolean and arithmetic operations, comparisons, as well as instructions for jumping and subroutine control.

Ladder Diagrams. LD languages are the most characteristic languages for PLC programming. They represent a formalisation of electric circuit diagrams to describe relay based binary controls. Ladder diagrams follow the traditional path of engineering education and are in widespread use among people concerned with the power part of process installation and instrumentation. The typical notation of a LD language is very similar to that used in the contact scheme shown in Figure 15.3. A great advantage of the LD languages is that, at least in principle, they are machine independent. The disadvantage is, however, that if other than binary operations are to be programmed, then the ladder diagram language already needs to be extended by special purpose function blocks. To benefit from the advantage and make the language vendor independent, the IEC standard defines a unified syntax of the basic language elements, i.e., make and break contacts, coils, and connections, as well as a set of typical function blocks.

Structured Text. The ST language has a Pascal-like syntax and functionality.
The language is machine and vendor independent, and constitutes a high level textual programming language, designed for people with a computer and software oriented education. The ST language features general mathematical expressions including function and function block invocations, and the following statement types: assignments, premature exits from functions and function blocks, IF and CASE selections, and FOR, WHILE and REPEAT iterations.

Function Block Diagram. The FBD language has been derived from blueprints of digital circuits, in which each chip represents a certain module of the overall functionality. The direct generalisation of this concept leads to functions and function blocks, which are depicted by rectangular boxes and which may have inputs and outputs of any data type. The FBD language is machine independent, and it supports the concepts of object-oriented modular design and module reusability.
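The distinction between stateless functions and stateful function block instances can be sketched in Python. The up-counter below is a hypothetical model only loosely following the behaviour of the standard's counter blocks, not IEC syntax:

```python
# Function: no internal state, so the same arguments always give the same result.
def add(a, b):
    return a + b

# Function block: each instance keeps internal state between invocations,
# so equal inputs may yield different outputs (modelled here as a class).
class UpCounter:
    def __init__(self, preset):
        self.preset = preset
        self.count = 0
        self.last_cu = False

    def __call__(self, cu: bool, reset: bool = False) -> bool:
        if reset:
            self.count = 0
        elif cu and not self.last_cu:      # count on the rising edge only
            self.count += 1
        self.last_cu = cu
        return self.count >= self.preset   # output Q

c = UpCounter(preset=2)
assert c(True) is False      # first rising edge: count = 1
assert c(True) is False      # input still high, no new edge: count stays 1
assert c(False) is False     # falling edge
assert c(True) is True       # second rising edge: count = 2, output on
assert add(2, 3) == add(2, 3) == 5   # the function, by contrast, is stateless
```

Note that the last two calls of `c` receive the same argument but return different values: exactly the persistence of internal variables that the standard ascribes to function blocks.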
The IEC standard distinguishes between functions and function blocks. A function is an object which generates for given inputs one function value at its output using a certain algorithm. The language provides elementary Boolean and arithmetic functions which can be combined to yield new derived library functions. The latter are stored under identifying symbols. Function blocks may have more than one output. Furthermore, and also in contrast to functions, upon multiple invocation with equal input values, function blocks may yield different outputs. This is necessary to be able to express internal feedback and storage behaviour. Like functions, function blocks can be hierarchically defined to represent the internal structure at various levels of detail (Figure 15.9). The figure reveals that the FBD language requires only four different structural elements:
- function and function block frames, i.e., symbols,
- names, i.e., identifiers,
- connectors, and
- connections.
Functions and function blocks may perform binary, numerical, analogue, or character string processing. The schematic representation of logical and functional relationships by symbols and connecting lines provides easy conceivability in contrast to alphanumerical ones. However, this form of representation also turns out to be difficult to understand when the complexity increases. A function diagram is a process oriented representation of a control problem, independent of its realisation. It serves as a means of communication between different interest groups concerned with the engineering and the utilisation of PLCs, which usually represent different technical disciplines. The system independent software engineering for PLCs is carried out in two steps:
1. Set-up of a function and function block library.
2. Interconnection of functions and function block instances.
In the second step, the interaction of functions and function block instances is determined in the form of a function block diagram in order to yield the solution to an application problem. To carry out this constructive step of PLC programming, i.e., the linking of function block instances, the user invokes, places, and interconnects predefined functions and function block instances from his or her library. A function block library consists of standard functions, which are universally applicable for automation purposes and which are usually provided by the system vendor, and of project specific elements compiled by the user himself. New library entries can be generated either by defining new functions or function blocks directly from
scratch or by editing and modifying already existing ones. Once a library has reached a certain size, the second method is employed more frequently. The interconnection of functions or function block instances is carried out interactively by pointing to the respective inputs and outputs on the screen and by selecting a polygonal line connecting feature of the applied graphical CASE tool.

Figure 15.9: Function block representation.

Sequential Function Chart. A special class of PLC applications are sequence control systems, in which the control algorithm is naturally defined as a sequence of steps, which should be performed one after another, or, sometimes, in parallel. For the purpose of performing sequential control functions, the standard defines the special SFC language, which serves to partition PLC program organisation units, i.e., programs and function blocks, into sets of steps and transitions interconnected by directed links. Associated with each step is a set of actions, and with each transition a transition condition. A step is a state in which the behaviour of a program organisation unit with respect to its inputs and outputs follows a set of rules defined by the actions associated with the step. A step can be either active or inactive. At any given moment, the state of a program organisation unit is defined by the set of active steps and the values of its internal and output variables. As outputs, each step has a Boolean variable indicating its activity status, and a variable of the type time specifying the time elapsed during its execution. The initial state of a program organisation unit structured with SFC elements is represented by the initial values of its internal and output variables, and by its set of initial steps, i.e., the steps which are initially active. A transition represents the condition whereby control passes from one or more steps preceding the transition to one or more successor steps along the corresponding directed links.
Each transition has an associated transition condition, which is the result of the evaluation of a single Boolean expression. Upon initiation of a program or function block organised as a sequential function chart, its initial steps are activated. Then, evolutions of the active states of steps take place along the directed links when caused by the clearing of one or more transitions. A transition is enabled when all the preceding steps, connected to the corresponding transition by directed links, are active. The clearing of a transition occurs when the transition is enabled and when the associated transition condition is true. The clearing of a transition simultaneously leads to the activation of all the immediately following steps connected to the corresponding transition by directed links, and to the deactivation of all the immediately preceding steps.

The scope of high level graphical and textual languages. Since the degree of programming detail achievable in the graphical FBD and SFC languages is limited, the structured text language ST is used for the formulation of libraries of project specific software modules in the form of function blocks containing, and at the same time hiding, all implementation details. These modules are then utilised and
interconnected in the graphical languages FBD and SFC for expressing solutions to automation and control problems. Thus, the advantages of graphical programming, i.e., orientation at the engineer's way of thinking, inherent documentation value, clearness, and easy conceivability, are combined with those of textual programming, i.e., unrestricted expressibility of syntactic details, of control structures, of algorithms, and of time behaviour.
15.4 Sequential Function Charts
The SFC language defined by IEC is, in fact, a slight modification of the well known GRAFCET [120] language¹. It is used for describing the structure of sequence control systems, and ranks above the other four languages described in Section 15.3. The representation method of the SFC language can be considered as an application of the general Petri net concept, which has been briefly introduced in Section 11.7. The interpretation of the basic Petri net elements is the following:

Places correspond to the SFC steps, in which some actions are performed.

Transitions correspond to SFC transitions, which define the criteria for moving from one step to another.

The Petri net structuring rules make it possible to express sequential as well as parallel relationships between steps, and to describe the co-ordination of parallel steps. A sequential function chart generally consists of steps, which are linked to other steps by connectors and transitions. One or more actions are associated with each step. The transition conditions are Boolean expressions formulated as Function Block Diagrams or in Structured Text. The main elements of sequential function charts are employed under observation of the following boundary conditions:
- any step may be associated with one or more actions,
- there is one transition condition for any transition.
Examples of sequential function chart structures are provided in Figure 15.10. At any given point in time during program execution, a step can be either active or inactive. With each step a Boolean variable is associated, expressing with the values "1" or "0" its active or inactive state, respectively. A step remains active and causes the associated actions as long as the conditions of the subsequent transitions are not fulfilled. The initial state of a process is characterised by initial steps, which are activated upon commencement of the process. There must be at least one initial step.

¹ The first version of the language was introduced by Siemens under the name GRAPH 5.
Figure 15.10: Sequential function charts: sequential steps with alternative branches (a), and parallel steps (b).
A transition is either enabled or not. It is considered to be enabled when all steps connected to the transition and directly preceding it are set. A transition between steps cannot be fired (performed) unless:
- it is enabled, and
- its transition condition has turned true.
The firing of a transition causes the simultaneous setting of the directly following step(s) and the resetting of the immediately preceding ones. Transitions which can be fired concurrently must be synchronised.
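The enabling and firing rules above can be sketched as a small step/transition engine. The data structures and the step numbering below are hypothetical (the numbers echo the parallel branch of Figure 15.10(b)), and transition conditions are plain booleans here rather than FBD or ST expressions:

```python
# Minimal sequential-function-chart evolution: a transition fires when all of
# its preceding steps are active and its condition is true; firing resets the
# preceding steps and sets the following ones simultaneously.

def evolve(active, transitions, conditions):
    """One evolution: fire every enabled transition whose condition holds."""
    to_clear, to_set = set(), set()
    for name, (pre, post) in transitions.items():
        if pre <= active and conditions.get(name, False):   # enabled and true
            to_clear |= pre
            to_set |= post
    return (active - to_clear) | to_set

# A parallel branch: step 11 diverges into steps 12 and 14; the branches
# later synchronise from steps 13 and 15 into step 16.
transitions = {
    "t1": ({"11"}, {"12", "14"}),       # divergence into parallel steps
    "t2": ({"13", "15"}, {"16"}),       # synchronising convergence
}
active = {"11"}
active = evolve(active, transitions, {"t1": True})
assert active == {"12", "14"}
# t2 is not yet enabled: steps 13 and 15 are not both active.
assert evolve(active, transitions, {"t2": True}) == {"12", "14"}
```

The subset test `pre <= active` is exactly the enabling rule, and returning a new active set models the simultaneous setting and resetting on firing.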
15.5 PLC Peripherals
A typical PLC constitutes an autonomous controller which operates, at least in principle, without human intervention. However, the application of a PLC requires, in the development and testing phase, communication with the designer. The most important device which serves this purpose is a programming console, i.e., a privileged device which allows one to enter, modify, and test the application programs of the PLC. There are two types of designs for programming consoles. A basic console merely acts as a terminal of the PLC computer and offers only limited possibilities for PLC programming. In the simplest case the console looks like a pocket calculator with its keys labelled with IL language mnemonics and digits. The console display shows the contents of the current memory location. Pressing the keys enters the corresponding instructions into the PLC memory. Input, output and intermediate variables are identified by numbers. A few functional keys enable single-step or continuous program execution. The IL language is usually the only language which can be used with a basic console. The reasons are twofold. First, the processing power of the basic console and of the specialised PLC processor is insufficient for more sophisticated compilers, and second, the lack of mass memory devices and of good graphical capabilities makes the software development process inefficient. In some cases, a subset of LD may be available. An advanced console is a specialised or general-purpose microcomputer with a high resolution graphical display and a hard disk. The heart of the console is a comprehensive software development toolkit comprising an intelligent editor, a set of language compilers, a program unit library manager, and a linker. Consoles at the top of the performance range can offer additional CASE tools for rapid prototyping and for simulating the environments of PLCs.
Advanced programming consoles and PLCs can operate independently, except when a program is transferred to the PLC through a dedicated link or through a general-purpose network interface (Figure 15.8). Because of its broad functionality, an advanced console is usually more expensive than the central unit of the PLC itself. This explains why primitive basic consoles are still in use. On the other hand, the same console can serve several PLCs, and a console can usually be hired from the PLC manufacturer. A survey of the functionality of an advanced console is given in Figure 15.11.

When new software is to be developed, the programmer uses a graphical editor to set up drawings of function blocks and sequential function charts. In particular, he or she fetches appropriate graphical objects from a library, places them in a work sheet, and links these objects by connectors. After the drawing process is complete, the work sheet is stored, and then lists of all objects and all interconnection nodes occurring in the drawing are generated. These lists constitute a textual representation which is fully equivalent to the contents of the considered work sheet. The object lists are then submitted to a compiler generating code in the ST language. In particular, the compiler produces complete program units with the declarations of input, output, and local variables, of tasks, and of instantiated function blocks, with calling sequences of functions, function block instances, and tasks, and with the encoded description of steps, transitions, and actions. The texts of these program units are stored in a library, which, of course, can also be filled by a text editor with hand-coded modules. The library serves as input for a post-processor, i.e., a machine-dependent compiler, that generates an executable program from the module specified by the user as the main program, by including from the library the source codes of all functions and function blocks invoked in that module, and any subprograms thereof.
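The list-generation step performed by the graphical editor can be sketched roughly as follows; the work-sheet encoding and all names are hypothetical, chosen only to show how a drawing reduces to an object list and an interconnection-node list.

```python
def generate_lists(worksheet):
    """From a work sheet of blocks and connectors, build the object list
    and the interconnection-node list (node -> connected input pins)."""
    objects = sorted(worksheet["blocks"])
    nodes = {}
    for (src, out_pin), (dst, in_pin) in worksheet["connectors"]:
        node = f"{src}.{out_pin}"
        nodes.setdefault(node, []).append(f"{dst}.{in_pin}")
    return objects, nodes

# A tiny work sheet: an AND block feeding an on-delay timer
sheet = {
    "blocks": ["AND_1", "TON_1"],
    "connectors": [(("AND_1", "OUT"), ("TON_1", "IN"))],
}
objs, nodes = generate_lists(sheet)
print(objs)   # ['AND_1', 'TON_1']
print(nodes)  # {'AND_1.OUT': ['TON_1.IN']}
```

The two returned lists are, as the text says, fully equivalent to the drawing: the graphical form could be reconstructed from them.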
When functions and function blocks are directly coded in the ST language and stored in the text library, it is desirable to have a tool available which automatically generates from their source codes the corresponding graphical symbols to be used in the drawing process. Thus, a second editing step, otherwise needed to set up entries for the graphical editor's library, can be saved. To this end, a further tool interprets the name and the input and output declarations of a function or function block in order to generate the textual description of an appropriately sized graphical symbol. The output of this tool is finally submitted to the library manager of the CASE system, which translates the descriptions into an internal form and places the latter into its component library.

Another type of PLC peripheral is an operating console, which provides capabilities for operating PLC-automated systems. The primary functions of the operating console are displaying the status of the controlled process and facilitating human intervention into the process. As with programming consoles, two types of designs can be distinguished.
Figure 15.11: CASE tools for PLC programming (graphical library, intelligent graphical editor, library manager, connection lists, symbol descriptions, PLC).
A basic operating console merely consists of a small set of displays and specialised keys. It allows process variables to be selected, their current values to be displayed, and the current values of selected control variables to be changed. Basic operating consoles are usually manufactured as small, factory-hardened devices, which can be placed in the immediate surroundings of the controlled process.

Advanced operating consoles are usually industry-standard microcomputers with high graphical capabilities and mass storage devices. Such a console is usually placed in the control room of the controlled process. Its primary functions are to display a synopsis and the status of the controlled process, to store the process history, and to facilitate the set-up of the controlled variables. Appropriate software packages provide statistical analyses of the process data and display forecasts of future changes.
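As a hedged illustration of the kind of statistical support such packages might offer, the sketch below fits a least-squares trend to recent samples of one process variable and extrapolates one step ahead; it is purely an assumption about such tools and describes no particular product.

```python
def forecast_next(samples):
    """Fit y = a + b*x to the sample indices and extrapolate one step."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return intercept + slope * n  # value predicted for the next sample index

print(forecast_next([10.0, 12.0, 14.0, 16.0]))  # 18.0
```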
Chapter 16

Case Studies and Applications

16.1 Distributed System Design in CONIC

16.1.1 Main Goals and Characteristics
CONIC is a software development environment for distributed real-time applications, designed in the Department of Computing, Imperial College of Science and Technology, London [122]. The CONIC project was supported by the British National Coal Board and was aimed at the design and development of real-time computer systems for the monitoring and control of coal mines. The main result of the project is a package of four software tools which can be used for the design and implementation of network-based control systems of large industrial installations. The components of the CONIC system are:
• Module Programming Language,
• Configuration Language,
• Distributed Operating System, and
• Communication System.

The components of the CONIC system will be briefly described in Sections 16.1.2 through 16.1.5. A short discussion of the microcomputer hardware suitable for embedded system design is given in Section 16.1.6. Industrial applications addressed by the CONIC project are, in general:
- very large (hundreds of computer stations),
- physically distributed (distances up to 20 km), and
- real-time in nature.
The systems must fulfill the requirements stated for hard real-time systems in Section 2.1. The main goal of CONIC is to help the designer in fulfilling those requirements, and more specifically, in achieving the following characteristics of the designed systems:
- safety, predictability, and fault-tolerance,
- self-diagnosis capabilities,
- flexibility and extensibility,
- high performance, and
- maintainability.
The features listed above are difficult to achieve simultaneously. However, they are necessary in real-time applications to provide reliable operation of the control system. In particular, fault-tolerance and flexibility are more important than high performance. The reasons to stress maintainability are twofold:
- the life-time of a control system is usually as long as the life-time of the industrial process which is controlled by it, and
- the cost of maintenance is very high.
16.1.2 CONIC Module Programming Language
The CONIC Module Programming Language is based on Pascal and has been extended to incorporate multitasking capabilities. The basic module of each application developed in CONIC contains a sequential program which defines a task type. The instances of a task type, which are called tasks, correspond to program executions. A task type definition can be developed and compiled independently, as it does not reference other task type definitions directly. Multiple task instances can be created from a single task type definition. Tasks can communicate with each other by means of messages. An interface for message passing consists of an exitport and an entryport. A message can be sent to an exitport, and a message can be received from an entryport. A port definition describes the structure of the messages which can be passed through, and introduces the port name, which is used within the task (the port owner) as a local variable. The message structure is defined in terms of data types. The interfaces which join exitports with entryports of co-operating tasks are defined in the system configuration phase, using the CONIC Configuration Language.
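The exitport/entryport scheme can be sketched as follows; queues stand in for the CONIC runtime, the linking function mimics the configuration step, and all names are illustrative assumptions rather than CONIC syntax.

```python
import queue

class Task:
    """A task owns only local port names; it never names its peers."""
    def __init__(self, name):
        self.name = name
        self.exitports = {}    # local name -> queue of the linked entryport
        self.entryports = {}   # local name -> queue

    def send(self, exitport, msg):          # SEND msg TO exitport
        self.exitports[exitport].put(msg)

    def receive(self, entryport):           # RECEIVE msg FROM entryport
        return self.entryports[entryport].get()

def link(src, exitport, dst, entryport):
    """Configuration-time LINK: join an exitport to an entryport."""
    q = dst.entryports.setdefault(entryport, queue.Queue())
    src.exitports[exitport] = q

producer, consumer = Task("producer"), Task("consumer")
link(producer, "out", consumer, "inp")
producer.send("out", {"temp": 21.5})
print(consumer.receive("inp"))  # {'temp': 21.5}
```

Because the tasks reference only their own port names, relinking them in a new configuration requires no change to the task code, which is exactly the maintainability property the text describes.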
This makes task type definitions independent from the system architecture, and contributes significantly to software maintainability and extensibility. Any modification of an existing task type does not influence the rest of the system, provided that the definitions of the ports have been preserved. Adding a task, or removing an existing one, requires some changes in the system configuration, but need not influence the task type definitions. CONIC provides for synchronous and asynchronous communication operations. The basic syntax of the synchronous operations is as follows:

RECEIVE msg-inp FROM name-of-entryport
SEND msg-out TO name-of-exitport

where name-of-entryport and name-of-exitport are local names of an entryport and an exitport, respectively; msg-out is the name of the variable whose contents will be sent in the message; and msg-inp is the name of the variable which will hold the received message data. The CONIC system supports many communication operation variants, including the asynchronous operations SEND and RECEIVE, and the TIMEOUT option. The TIMEOUT statement provides a backstop in case of a communication error. An example of the TIMEOUT function in the receive operation is shown below:

RECEIVE msg-inp FROM name-of-entryport
  process-message-statement-list
ELSE TIMEOUT(time-interval)
  alternative-statement-list

If a message is received before the specified time interval has expired, then the process-message-statement-list is executed, and the alternative-statement-list is omitted. Otherwise, the alternative-statement-list is executed to handle a communication error. The time interval should be carefully tuned to the characteristics of the communication system. A small value makes the system susceptible to delays in communication lines, whilst a large value makes error recovery very slow.
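The RECEIVE ... ELSE TIMEOUT structure can be mirrored in a short sketch; queue.Queue stands in for the entryport, and the two callbacks play the roles of the two statement lists. Names are illustrative, not CONIC syntax.

```python
import queue

def receive_with_timeout(entryport, interval, on_message, on_timeout):
    """Wait up to `interval` seconds for a message on the entryport."""
    try:
        msg = entryport.get(timeout=interval)
    except queue.Empty:
        return on_timeout()          # alternative-statement-list
    return on_message(msg)           # process-message-statement-list

port = queue.Queue()
port.put("reading")
print(receive_with_timeout(port, 0.05, lambda m: m, lambda: "timed out"))
# prints: reading
print(receive_with_timeout(port, 0.05, lambda m: m, lambda: "timed out"))
# prints: timed out  (the queue is now empty, so the interval expires)
```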
16.1.3 CONIC Configuration Language
The primary function of the CONIC Configuration Language is to create tasks from the task type definitions. Initial parameters can be passed to the tasks during their creation. The CONIC Configuration Language allows the system to be configured, and permits changes to be made in the working system by removing the running tasks, or creating new tasks, without the need to recompile any unchanged tasks.
The command for creating task instances from a task type definition has the following form:

CREATE task-type(parameter-1, . . . , parameter-n) task-1, task-2, . . .

The above command creates the tasks named task-1, task-2, . . . , which share the code of the task-type. The CONIC Configuration Language provides the means for creating connections between entryports and exitports in the system. A connection between tasks executed on the same station or on different stations in the network is established using the LINK command. The connections between ports and stations are independent of the particular hardware architecture. The same configuration can be used in a system consisting of one or two stations, or in a large distributed network-based application. The connections between tasks can be changed, if desired. The syntax of the LINK command is as follows:

LINK source.exitport TO destination.entryport

Here exitport and entryport are the local names of the respective ports in the source and destination tasks. The LINK command defines a connection between the exitport and the entryport and enables messages to be passed from the source to the destination task.
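The effect of CREATE (several named task instances sharing one task-type definition, each receiving the same initial parameters) can be sketched as follows; the bookkeeping shown is an illustrative assumption, not the CONIC implementation.

```python
class TaskType:
    """A compiled task type: one shared code body for many instances."""
    def __init__(self, name, body):
        self.name, self.body = name, body

def create(task_type, parameters, *instance_names):
    """Mimic CREATE: register named instances sharing one task type."""
    return {n: {"type": task_type.name, "params": parameters}
            for n in instance_names}

sampler = TaskType("sampler", body=lambda params: None)
tasks = create(sampler, {"period_ms": 100}, "task_1", "task_2")
print(sorted(tasks))  # ['task_1', 'task_2']
```

All instances point at the same type name, so replacing the task type's code updates every instance without touching the configuration, as described above.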
16.1.4 CONIC Distributed Operating System
The CONIC Distributed Operating System consists of a nucleus, which supports multitasking, resource management, and interrupt handling, and a set of utilities, which are executed as system tasks. The operating system programs have been implemented using the CONIC Module Programming Language. The actual functionality of a CONIC station is not fixed, but can be defined by the host station during the system configuration phase. To this end, a small ROM is provided for each station, containing a simple loader program. The loader enables the selected operating system and application tasks to be downloaded into the station's RAM. The structure of the CONIC Distributed Operating System is shown in Figure 16.1. The system consists of four layers, which are briefly described in the subsequent paragraphs.

The Station Nucleus supports multitasking, interrupt handling, and the low-level functions used by the Local Management layer for device control. This is the lowest layer of the CONIC Distributed Operating System, and must be installed on each station.

Figure 16.1: The layers of the CONIC Distributed Operating System. From top to bottom: Configuration Manager (implemented in the host system only), Utility Package (may be installed on each station, but is not obligatory), Local Manager (obligatory on each station), and Station Nucleus (the station executive, obligatory on each station).

The Local Manager is installed on each station. It consists of the following four components:
- A module manager, which loads the task type definition code and creates the tasks which are to be executed on the station;
- A link manager, which co-operates with the CONIC Communication System to link ports within the station and ports on remote stations;
- A store access manager, which supports the writing and reading of blocks of memory, as required by the remote debugger;
- An error manager to detect errors, and to send error messages to the respective system and application tasks.

The Utility Package comprises a set of tasks for controlling peripheral devices, for file management, and for application software debugging. The most important constituents of the Utility Package are:
- the file server, which maintains the file structure, creates directories, and stores data on discs;
- the loader, which loads the compiled task type code into the target stations, provides translation from language-independent code to the final code, and prepares the tasks for execution (the loader co-operates closely with the module manager);
- the debugger, which supports remote detection and correction of programming errors;
- device drivers, which control peripheral devices attached to the station (floppy disk, user keyboard and screen, etc.).

The Utility layer of the operating system can be tailored to the requirements of particular stations. The Configuration Manager is the highest layer of the operating system; it is implemented on the host computer only. The Configuration Manager co-operates
Figure 16.2: CONIC system topology (S denotes the network stations of the distributed system; local subnets are connected via gateways).

with the Local Manager to load and create (or delete and remove) tasks, and to link (or unlink) ports. The functioning of the Configuration Manager is controlled by the commands of the CONIC Configuration Language. The system configuration is stored in a database, which contains information about task type definitions, the configuration specification, code and symbol tables, and the physical net configuration. The database is automatically modified when changes to the application software are introduced.
16.1.5 The CONIC Communication System
The CONIC Communication System supports a variety of network topologies, from simple bus or ring LANs to irregular wide networks composed of local subnets connected via gateways. Where possible, it handles redundant communication paths within the network. The Communication System enables new stations to be added to the network without the need for changing the existing software components. An example of a network topology is shown in Figure 16.2. The CONIC Communication System is an independent component, not included in the operating system. It is an intermediate layer between the operating system and an application. The Communication System implements the ISO model of the network architecture (Chapter 6).

The primary constituents of the CONIC Communication System are the tasks IPCOUT and IPCIN, which support the functions of message formatting and addressing, routing, and error handling. The functional structure of the Communication System is shown in Figure 16.3.

Figure 16.3: Message passing scheme in the CONIC Communication System (SENDER exitport, via IPCOUT on station 1, network message, via IPCIN on station 2, RECIPIENT entryport).

The messages sent to an exitport of the SENDER task are placed in the communication buffer, which is read by the IPCOUT task. IPCOUT fetches the messages from the buffer, establishes the route, supplies port and station addresses, formats the frames, and puts the frames onto the network. On the other side of the network, the message is received by the IPCIN task. IPCIN examines the destination port address (entryport) contained in the message and places the message in the communication buffer corresponding to that entryport. During synchronised message operations, IPCIN takes the reply from the RECIPIENT task, swaps the addresses of the SENDER and the RECIPIENT, and puts the reply onto the network. The IPCOUT and IPCIN tasks are implemented on each station.
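The framing and address-swapping performed by IPCOUT and IPCIN can be sketched as below; the frame layout is an assumption for illustration only, not the CONIC wire format.

```python
def format_frame(src_station, src_port, dst_station, dst_port, payload):
    """IPCOUT's step: wrap a message with station and port addresses."""
    return {"src": (src_station, src_port),
            "dst": (dst_station, dst_port),
            "payload": payload}

def swap_addresses(frame, reply_payload):
    """IPCIN's step for synchronised operations: return the reply the
    same way the request came, by exchanging source and destination."""
    return {"src": frame["dst"], "dst": frame["src"],
            "payload": reply_payload}

frame = format_frame(1, "exit_a", 2, "entry_b", "request")
reply = swap_addresses(frame, "reply")
print(reply["dst"])  # (1, 'exit_a')
```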
16.1.6 Conclusion
The components of the CONIC system constitute a development environment for real-time applications. The Module Programming Language facilitates structured and modular programming, and supports multitasking and inter-task communication. The Configuration Language allows the system to be configured as a collection of tasks. The Distributed Operating System supports the station executive environment, the set of utilities, and the configuration management. It provides all necessary functions for efficient task and resource management, and for interrupt handling. The Communication System supports connections between tasks which can be installed on different stations connected by the communication network. It implements the ISO model of open system interconnections.
A major disadvantage of the CONIC system is its relatively low efficiency. The reason is that the stations were implemented using the low-performance microprocessors which were available at that time. New microprocessors offer not only much higher performance, but also ready-made facilities for implementing real-time operating systems. As an example, consider the µPD 79011 (V25™) single-chip microcomputer with built-in input/output capabilities, and ROM and RAM memory banks. The microcomputer is produced by NEC [123]. The on-chip ROM contains the nucleus of a Real-Time Operating System (RTOS) which supports multitasking, and inter-task communication and synchronisation tools, including semaphores and mailboxes. A priority-based scheduling method has been adopted to manage the tasks. The designers of the RTOS provided the C programming language for applications development. The µPD 79011 can be employed directly in a single-station system, as the on-chip executive implements all functions of the station nucleus. However, the higher layers of the distributed operating system, as well as the communication system, have to be created by the users.
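The priority-based dispatching such an on-chip RTOS performs can be sketched minimally as a ready queue (lower number meaning higher priority, a common but here assumed convention); nothing in this sketch is specific to the µPD 79011.

```python
import heapq

class ReadyQueue:
    """Ready tasks ordered by priority; the highest-priority task runs next."""
    def __init__(self):
        self._heap = []

    def make_ready(self, priority, task_name):
        # lower priority value = higher urgency (assumed convention)
        heapq.heappush(self._heap, (priority, task_name))

    def dispatch(self):
        """Remove and return the name of the next task to run."""
        return heapq.heappop(self._heap)[1]

rq = ReadyQueue()
rq.make_ready(3, "logger")
rq.make_ready(1, "alarm_handler")
rq.make_ready(2, "sampler")
print(rq.dispatch())  # alarm_handler
```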
16.2 Graphical Design of Real-Time Applications in LACATRE

16.2.1 Main Goals and Objectives
The design and development of industrial real-time applications raise a number of specific difficulties. While significant improvements have recently been introduced into the specification and design approach (Section 11.4), the transition to detailed design is still a problem. Programming of real-time applications usually requires multitasking (single-CPU or multiprocessor-based) capabilities. The available software development tools can be classified as modelling tools, used in the specification step, on the one hand, and operating systems and languages, used in the implementation step, on the other. To bridge the gap between the software model and the implementation, it seems necessary to introduce complementary tools to ease the design, the implementation, and the maintenance of multitasking applications based on real-time operating systems. LACATRE (Langage d'Aide à la Conception d'Applications Temps-RÉel) is a software tool which appears as an over-layer to an industry-standard real-time operating system. In the software life cycle, LACATRE covers the end of the preliminary design step and the beginning of detailed design: it allows the designer to express the dynamic
behaviour and the relations between tasks. It facilitates the graphical design of a skeleton of a multitasking program, but is not able to describe the algorithms for data processing. The tool can also be helpful in higher education: the method described in this section was originally developed to promote new and more efficient teaching techniques [124]. The remainder of this section is organised in three parts: a brief presentation of the graphical language and its textual form is given in Sections 16.2.2 through 16.2.4, an example of a medium-complexity application is developed in Section 16.2.5, and a brief presentation of the Real-Time Software Engineering Workbench based on the iRMX operating system is given in the final Section 16.2.6.
16.2.2 Objects of the LACATRE Language
LACATRE or La4¹ objects represent the software objects handled by standard real-time multitasking executives, such as tasks, semaphores, mailboxes, and events, with some extensions. The graphical symbols which are used to represent the objects and their interactions follow certain grammatical rules. The description of the grammar lies, however, outside the scope of this short description. An object of a defined type (a mailbox, for instance) always has the same graphical symbol. Particular instances of an object type are characterised by identifiers and descriptors.

Graphical characteristics of the objects. A graphical symbol of an object is composed of a set of primitive elements called bars, which are connected to each other. Three types of bars can be distinguished: The state bar represents the object itself, e.g., a mailbox as shown in Figure 16.4a. The bar has literal attributes, such as an identifier, a priority, or other management parameters. The action bar represents an action performed upon the object, e.g., the wait or send bar of the mailbox object in Figure 16.4b. The body bar defines the object behaviour. An action bar linked to a state bar indicates the appropriate action which can modify the status of the object.

A configurable object is composed only of a state bar and action bars. The application programmer can only act on the attributes of such an object. The behaviour of a configurable object is defined by the semantics of the corresponding object implemented by the real-time operating system. The configuration of the object is static and occurs at the time of creation of that object. The La4 configurable objects are
¹ La4 is a short form of a French phonetic equivalent of LACATRE.
semaphores, mailboxes, events, and resources.
A programmable object is composed of a state bar, action bars, and a body bar. The behaviour of a programmable object is defined by its body, which must be filled in by the programmer in the form of a sequence of actions or instructions. The behaviour of programmable objects is application dependent. The La4 programmable objects are tasks (Figure 16.6) and interrupts (Figure 16.7). Connections between objects are represented graphically by action bars linked by connecting lines to the state bars of the respective objects. For example, a task can send a message to a mailbox if the state bar of that task is linked to the mailbox's state bar through a SEND_MBX action bar.

The textual mode (named La4_t) complements the graphical mode (named La4_g) of LACATRE. This mode enables the automatic generation of application program skeletons in a target language, with system calls matching the actual real-time operating system's call convention.
16.2.3 Configurable Objects
Semaphores and mailboxes (Figure 16.4). A semaphore is subject to the deposit and withdrawal of synchronisation signals. Mailboxes are concerned with messages. The SEND bar and the WAIT bar are the two action bars of the semaphore and mailbox objects. Both objects can be implemented according to a first-in-first-out or a priority-based algorithm. To be used with a mailbox, a message must be created, and later deleted. The creation of a message involves the dynamic allocation of a memory block, with which a data type of the target language, e.g., an array, record, or structure, is associated. Data types cannot be represented in La4, other than in comment form (La4 ignores data types). Semaphores and mailboxes are objects of very similar behaviour. Therefore, their handling instructions are graphically and textually similar. The withdrawal instructions, i.e., WAIT_on_MBX and WAIT_on_SEM, can be used with an optional parameter which specifies the maximum waiting time for completion of the operation.

Resource object (Figure 16.5). Real-time operating systems place the basic objects (such as semaphores and regions) necessary to correctly manage access to critical resources at the application programmer's disposal. Even so, the program documentation should describe the resource to be protected, rather than the protection method. The La4 resource object has two configuration attributes:
Figure 16.4: Semaphores and mailboxes; state bars (a), and action bars (b). The state bars shown are SEMAPHORE(sem_id), MAILBOX(mbx_id), and MESSAGE(mes_id); the action bars are WAIT_on_SEM(sem_id, [Δt]), WAIT_on_MBX(mbx_id, mes_id, [Δt]), SEND_to_SEM(sem_id), and SEND_to_MBX(mbx_id, mes_id), with infinite or finite waiting time Δt.
Figure 16.5: The resource object. RESOURCE(resource_id, [RT], [PT]); resource_id is the resource identifier, and RT is the resource type: S (structure, the default), F (file), or D (device).
Figure 16.9: A part of the robot application.
take place in a logical manner, i.e., the sequence is ordered from 1 to 4.
16.2.6 Real-Time Software Engineering Workbench
The LACATRE method, as described here, is used in a "free-hand" way: no special computer-aided tool is required. Therefore, it applies to a great variety of applications and multitasking real-time operating systems. The existence of dual representations, graphical and textual, allows the foundations of a Real-Time Software Engineering Workbench (RTSEW) to be established. A detailed description of the prototype implementation of the workbench can be found in [125]. The most important components of the workbench are:
• A CAD system giving on-line assistance and verification during the preliminary design phase. The CAD software must be able to produce a textual output of the drawings.
• A compiler to translate the textual output into a program of La4_t instructions.
• Tools for making cross-reference lists, for deadlock detection, etc.
• A compiler to translate the La4_t program into a target program accepted by the target computer.
• A tool for the automatic generation of textual and graphical documentation.

Except for the last element, the La4 RTSEW prototype implements all of the components listed above (Figure 16.11). The graphical part is implemented using Cadware's Graphical Method Generator (GMG) Sylva Foundry. This generator runs on PC AT-compatible microcomputers under the MS-DOS operating system. With the help of its diagram editor and the "rule tool" it is possible to develop the La4 graphical method with an interactive application design. The Sylva graphical tool yields the required textual description of the drawing. The textual description is translated into a La4_t program using the compiler design language (CDL) STARLET [126]. This compiler also deals with the cross-reference analyses. Finally, the last component of the workbench is a La4_t-to-PLM compiler (also implemented using STARLET CDL). It generates skeletons of PLM286 programs, including all of the requested declarations and the iRMX operating system calls.
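One of the analysis tools listed above, deadlock detection, can be sketched as a cycle search in a wait-for graph extracted from the La4_t program (task A blocked on an object held by task B, and so on); the graph encoding is an illustrative assumption, not the workbench's actual representation.

```python
def has_deadlock(wait_for):
    """wait_for: dict mapping each blocked task to the task it waits on.
    A deadlock exists if and only if following the wait-for edges from
    some task leads back to a task already visited on that walk."""
    for start in wait_for:
        seen, node = set(), start
        while node in wait_for:
            if node in seen:
                return True
            seen.add(node)
            node = wait_for[node]
    return False

print(has_deadlock({"T1": "T2", "T2": "T1"}))  # True  (mutual wait)
print(has_deadlock({"T1": "T2", "T2": "T3"}))  # False (T3 is not blocked)
```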
The implemented prototype proved the feasibility of the workbench. However, some limitations of the prototype revealed the need for a few improvements. The