Gopinath Karmakar • Amol Wakankar • Ashutosh Kabra • Paritosh Pandya

Development of Safety-Critical Systems: Architecture and Software
Gopinath Karmakar RCnD Bhabha Atomic Research Centre Mumbai, India
Amol Wakankar RCnD Bhabha Atomic Research Centre Mumbai, India
Ashutosh Kabra RCnD Bhabha Atomic Research Centre Mumbai, India
Paritosh Pandya Department of Computer Science and Engineering Indian Institute of Technology Bombay Mumbai, India
ISBN 978-3-031-27900-3    ISBN 978-3-031-27901-0 (eBook)
https://doi.org/10.1007/978-3-031-27901-0
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Paper in this product is recyclable.
“Om ajnana-timirandhasya jnananjana-salakaya caksur-unmilitam yena tasmai sri-gurave namah” (Respectful obeisances to the Gurus who opened our eyes with the lighted torch of knowledge from the darkness of ignorance) Dedicated to our Gurus who blessed us with their teachings!
Foreword
Safety is simultaneously a cultural problem, a management problem and an engineering problem. Therefore, a well-thought-out safety engineering approach has to address not only hardware and software issues in isolation but also their interfaces and man-machine interface problems. No single approach can be considered sufficient to achieve the safety features required in many safety applications. The authors of this book intend to contribute to the creation of adequate awareness of these problems and to illustrate technical solutions applied or being developed.

What makes this book interesting is that before delving into the software-related aspects of safety, it brings out the importance of the role played by system architecture in achieving system dependability. This allows it to offer a holistic view of the development of safety-critical systems—with practical approaches covering architectural analysis, the development process, compliance with regulatory guidelines and safety standards, as well as software qualification. To this end, the book presents the conventional approach as well as the use of formal methods. The discussion of the challenges involved in making formal techniques applicable to the development of real-world systems, along with suggested solutions, is another attractive feature of this book. I am sure readers will appreciate both these aspects.

Furthermore, the authors correctly point out that, in practice, most safety-critical functions are fairly simple and can be realized with only a small fraction of a large safety-critical software system. That is, well-crafted, large safety-critical software will make use of many not-so-complex features of the subsystems to reduce the complexity of what remains to be covered to complete the solution.
This led to the development and deployment of partitioned operating environments, so that less critical components of a large safety-critical software system can be partitioned from those which actually perform safety-critical functions. This aspect of large software needs to be exploited before one starts to tackle the full complexity of the overall software. It permits less rigorous verification and validation of the less critical modules while still obtaining the necessary approval for safety-critical systems from regulatory authorities.

The book’s authors have between them extensive experience in the field of instrumentation and control for safety-critical systems in nuclear power plants and research reactors, which includes the development of
hard real-time embedded systems, operating systems for safety-critical applications, software engineering for Class IA and IB systems, programmable controllers and system engineering. I am happy to note that Karmakar’s depth of understanding of these topics has been bolstered by his having completed his PhD (I am proud to say, under my guidance) on energy management while ensuring safety and timeliness properties. Not surprisingly, the book reflects the strength of the authors’ background in these topics, and the lucid prose makes the content accessible even to readers who are not that familiar with safety systems. In addition, case studies from real-world applications help keep the feet of the authors as well as the readers on the ground.

I am delighted that this book will add a new perspective to the coverage of safety systems and serve as a must-read reference in the area. I wish the authors the very best.

Krithi Ramamritham
Distinguished Professor of Computer and Data Science, Sai University, Chennai, India
Preface
The development of computer-based systems (CBS) for safety-critical applications is challenging, and it can even be intimidating for a beginner to figure out the right path forward through the complex maze of (i) complying with a plethora of application domain-specific standards, (ii) meeting the strict verification and validation requirements, (iii) making the best use of tools and techniques during the process of development and (iv) finally generating documentary evidence that the right path of development has been followed to meet the requirements of the regulatory authority. It has also been observed that in the absence of a well-documented and well-defined development process for addressing the above issues, the execution of a project can get delayed. For example, standards usually specify the means, which can be interpreted differently by different users. Therefore, if standards are left to an individual’s interpretation, the implementation can differ from the true intentions of the standards. This can make the implementation unacceptable to an independent verification and validation team. This book aims at providing practical guidance to professionals to deal with the aforementioned issues (i) through (iv).

The dependability of computer-based systems begins with the system architectural design. Even a system with high reliability may not be suitable for safety-critical applications unless it is supported by redundancy in its architecture. This is because redundancy can tolerate the failure of components (sensor, actuator, controller, physical communication media, etc.), and it facilitates maintainability without affecting plant availability. Higher plant availability and reliability help in achieving the desired (i.e. the target) dependability.
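As an illustrative sketch (not taken from the book), the effect of redundancy on the two dependability attributes mentioned here can be quantified for independent, identical channels: the probability of failure on demand (PFD) of a k-out-of-n voting architecture, and the steady-state availability of a repairable unit. The per-channel PFD and the MTBF/MTTR figures below are assumed values chosen for illustration only.

```python
# Illustrative dependability arithmetic for redundant architectures.
# Assumes independent, identical channels; p is one channel's PFD.
from math import comb

def pfd_koon(k: int, n: int, p: float) -> float:
    """PFD of a k-out-of-n (kooN) architecture: the system fails on
    demand when fewer than k channels work, i.e. more than n-k fail."""
    return sum(comb(n, f) * p**f * (1 - p)**(n - f)
               for f in range(n - k + 1, n + 1))

def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability of a repairable unit."""
    return mtbf / (mtbf + mttr)

if __name__ == "__main__":
    p = 1e-3  # assumed per-channel PFD
    print(f"1oo1 PFD: {pfd_koon(1, 1, p):.2e}")
    print(f"1oo2 PFD: {pfd_koon(1, 2, p):.2e}")  # redundancy lowers PFD
    print(f"2oo3 PFD: {pfd_koon(2, 3, p):.2e}")
    print(f"Availability (MTBF=1000 h, MTTR=10 h): "
          f"{availability(1000.0, 10.0):.4f}")
```

With these assumed figures, a 1oo2 or 2oo3 arrangement reduces the PFD by roughly three orders of magnitude over a single channel, which is the quantitative intuition behind the architectural redundancy discussed above.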
Chapter 2 is dedicated to discussing the dependability of safety systems and how architectural design at the system level helps deal with failures and yet achieve the target dependability attributes, viz. the probability of failure on demand (PFD) and the availability. The discussion also offers insights into the challenges involved in analysing the dependability of real-world systems and presents practical solutions with the help of a case study.

Following a well-defined process, which includes verification and validation (V&V) at every stage, is fundamental to the development of software for systems
performing safety functions. The discussion on the conventional software development process for safety systems identifies a set of activities that can facilitate the building of dependable software. The other goal of the discussion is to explain how the process helps in developing a safety case that can be independently verified and validated. This is essential to generate adequate documentary evidence to secure approval from the regulatory authorities for the software’s worthiness in safety-critical applications.

A safety-critical computer-based system cannot be deployed without strict adherence to the applicable standards at every phase of development—right from requirements analysis to design and implementation, along with V&V. The standards are many, and they differ for systems belonging to different types of industries. Some domain-specific standards prescribe the means to be followed, and some prescribe the objectives to be fulfilled during the process of development. In addition, country-specific guidelines exist, which also endorse specific standards to be followed. Thus, a detailed analysis is required to arrive at a common ground and establish correlations among various standards, which are either complementary or redundant. This will help users meet the design objectives using the means prescribed by various complementary standards. This topic is taken up in Chap. 4.

Chapter 5 discusses the steps towards complying with the standards at every phase of development. It offers a guided tour traversing the path of software qualification by exploring the necessary steps towards achieving the goal with the help of case studies. This includes independent verification and validation (IV&V) of SDLC documentation, static and dynamic analysis of code and an introduction to how formal verification can support the qualification of software.

Formal methods have come of age for their use in the development of real-world safety-critical systems.
However, a developer requires the relevant background knowledge to maximize the benefit of applying formal techniques. Chapter 6 presents how the use of formal methods can enhance quality while simplifying the process of software development and its verification and validation (V&V). High-level modeling and relating it to the logical requirements specification lead to the early detection of design errors. Moreover, this approach allows quality-enhancing attributes such as soft requirements and robustness to failures to be incorporated into the development cycle in a principled manner. The chapter highlights the application of formal methods in safety system software development and introduces a few available tools which assist the process. It provides exposure to the emerging area of automatic synthesis of high-quality controllers as well as run-time enforcement shields from logical requirement specifications. The chapter also describes the well-established techniques of static code analysis and rigorous testing (both at the unit and the system level). The use of such techniques is important in safety systems, as it facilitates certification of safety-critical systems. The formal model-based development process, which includes formal requirements capture, high-level modeling, code synthesis, code analysis and automated testing, is explained with the help of case studies.

Chapter 7 presents a detailed discussion on the importance and the advantages of a qualified platform for safety system application development. This includes PLC
and a formal model-based development platform. The advantages of a model-based development platform over PLC are also highlighted. Though it makes practical sense to develop PLC as a qualified platform, it does not provide support for statecharts, which are a powerful tool for the development of reactive systems—the category to which most safety-critical systems belong. We include case studies to explain the development process of safety-critical applications using real-world qualified platforms.

The targeted readers of this book are practitioners and students interested in knowing the art and science of developing computer-based systems for safety-critical applications. Practitioners as well as students will get insights into the tools and techniques along with the latest developments in design, analysis and qualification, which are constrained by the regulatory and compliance requirements mandated by the applicable standards. The book also aims to address the needs of professionals and young graduates who would like to go beyond the development of safety-critical systems and venture into the development of the necessary tools and qualified platforms.

Depending on the availability of resources in terms of access to the tools and techniques, we would like to group the targeted readers into two—(i) those who want to follow the conventional path of development and (ii) those who wish to derive the benefits of formal specification, automated synthesis of code and qualified platforms. For the first group, some basic knowledge of Unified Modeling Language (UML) is a prerequisite. In fact, they can skip the last two chapters altogether. Readers belonging to the second group are expected to have basic knowledge of first-order logic and formal specification. However, practitioners from the latter group who want to limit their scope to the domain of qualified PLC need to be familiar only with PLC application programming (conforming to IEC 61131-3).
Mumbai, India January 2023
Gopinath Karmakar Amol Wakankar Ashutosh Kabra Paritosh Pandya
Acknowledgements
We thankfully acknowledge the contributions of a number of individuals for their encouragement, support and care during the course of writing this book.

We thank R. K. Patil and Anita Behere, our ex-colleagues at the Bhabha Atomic Research Centre, for their encouragement and support for this work. We would like to mention that R. K. Patil instilled the idea of writing a book on the development of safety-critical systems about a decade ago, in order to facilitate hands-on training of young professionals with the help of a practical guide. We are happy that it has now materialized, with a larger scope.

We would like to thank an excellent group of people at the Bhabha Atomic Research Centre—Anup Bhattacharjee, Manoj Kumar, S. K. Sen, Ajith K. J., Vikas Chauhan and Yogesh Nirgude—whose work with us contributed to creating some of the building blocks of this book. Our sincere thanks to Prateek Saxena and Suraj Mukade for carefully reviewing a few chapters and providing us with valuable feedback. Our special thanks to Puneet Panwar for giving his feedback on a few chapters and volunteering to improve the aesthetic quality of a number of figures used in this book.

In addition, our sincere thanks to D. A. Roy and Siddhartha Mukhopadhyay at the Bhabha Atomic Research Centre for their unconditional support to the first three authors by providing a favourable environment for this endeavour. The last author would like to thank the Tata Institute of Fundamental Research (TIFR) and IIT Bombay for providing him with a conducive home for writing this book. Many of the insights presented here were garnered during the author’s research career with these institutes.

We are grateful to Ralf Gerstner, executive editor, Springer, for giving us the opportunity and helping in making this book a reality. Our gratitude to Daniel Ignatius Jagadisan and Ramya Prakash, Springer Nature, for their support in publishing the book.
Finally, our special thanks go to our families for their love and care and also for bearing with us while we spent time at home writing this book.
Contents
1 Introduction
   1.1 Computer-Based Systems for Safety Applications
      1.1.1 What Is a Safety-Critical System?
      1.1.2 What Are the Advantages and Challenges?
   1.2 Steps Towards the Development
   1.3 Safety System and Its Architecture
   1.4 Software in Safety Systems
      1.4.1 The Software Development Process
   1.5 Functional Safety and the Guiding Standards
      1.5.1 Functional Safety: What and Why?
      1.5.2 The Safety Standards
   1.6 Qualification of Safety System Software
   1.7 Automated Development and Formal Verification
   1.8 Qualified Platform

2 System Architecture and Dependability
   2.1 Redundancy, Reliability and Availability
      2.1.1 Redundancy and Reliability
      2.1.2 Redundancy and Availability
      2.1.3 Availability
      2.1.4 Plant Safety and Safety System
   2.2 Redundancy: How Far We Should Go and Why
      2.2.1 Failure Modes and Dependability Parameters
      2.2.2 Comparison Between 2oo3 and 2oo4 Architectures
      2.2.3 Markov Analysis: Implementation Technique
      2.2.4 Analysis for Safety and Availability
   2.3 Architecture Model-Driven Dependability Analysis
      2.3.1 The Background
      2.3.2 Architecture-Driven Dependability: A Formal Approach
      2.3.3 System Architecture Modeling in AADL
      2.3.4 AADL Fault Model
      2.3.5 AADL Fault Model for Dependability Analysis
      2.3.6 Model-Based Dependability Analysis: Safety and Availability
      2.3.7 Compositional Analysis Methodology
      2.3.8 Automatic Translation of AADL Fault Model to a PRISM DTMC Model
   2.4 Case Studies
      2.4.1 Case Study 1: Reactor Trip System (RTS) of a PWR
      2.4.2 Case Study 2: Engineered Safety Feature Actuation System (ESFAS) of a PWR
   2.5 Summary and Takeaways

3 Software Development Process
   3.1 Development Plan
      3.1.1 Software Project Management Plan (SPMP)
      3.1.2 Software Quality Assurance Plan (SQAP)
      3.1.3 Software Verification and Validation Plan (SVVP)
      3.1.4 Software Configuration Management Plan
      3.1.5 Software Safety Plan
      3.1.6 System Security Plan
      3.1.7 SDLC Approaches/Models
   3.2 UML in Software Development
   3.3 Capturing Requirements: The Requirements Definition Phase
      3.3.1 System Context
      3.3.2 User Requirements Definition Phase
      3.3.3 Impact on Plan Documents
      3.3.4 Software Requirements Analysis
      3.3.5 Software Requirements Specification (SRS)
      3.3.6 Completeness of Software Requirements
      3.3.7 Consistency
      3.3.8 Impact on Plan Documents
      3.3.9 Tools and Techniques
   3.4 Architectural and Detailed Design
      3.4.1 Input to Plan Documents
      3.4.2 Tools and Techniques
   3.5 Implementation
      3.5.1 Tools and Techniques
   3.6 Verification and Validation
      3.6.1 The V-Model and the V&V Activities
      3.6.2 Verification Activities in Different SDLC Phases
      3.6.3 Validation Activities
   3.7 Software Quality Assurance
      3.7.1 SQA and the Software Development Process
      3.7.2 Software Metrics
   3.8 Software Configuration Management
      3.8.1 Configuration Identification and Baseline
      3.8.2 Configuration Control
      3.8.3 Configuration Status Accounting
      3.8.4 Tools and Techniques
   3.9 Development of AAS: A Case Study
      3.9.1 User Requirements
      3.9.2 System Specification and System Architecture
      3.9.3 Capturing Software Requirements
      3.9.4 Use Case and Scenario Model
      3.9.5 Detailed Design
      3.9.6 Concurrency Model
      3.9.7 Implementation Snippets
   3.10 Summary and Takeaways

4 Complying with Standards and Guides
   4.1 Codes, Safety Standards and Guides
      4.1.1 Codes
      4.1.2 Safety Standards
      4.1.3 Guides
   4.2 Safety Classification/Categorization
      4.2.1 IEC 61508 Functional Safety Classification
      4.2.2 Safety Classification in NPP
      4.2.3 Avionics
      4.2.4 Safety Classification Under Various Standards
   4.3 Codes, Regulatory Guides and Standards for CBS in NPP
      4.3.1 IAEA Safety Guides
      4.3.2 USNRC Codes and Regulatory Guides (RG)
      4.3.3 AERB Codes and Guides
      4.3.4 Safety Standards for the Development of CBS
      4.3.5 Standards for Software Development Process
      4.3.6 Avionics Standards
   4.4 Case Studies
      4.4.1 Case Study 1: Complying with the Safety System Criteria and General Design Criteria (GDC)
      4.4.2 Case Study 2: AERB SG D-25 and IEC 60880 for Licensing of a Computer-Based System (Software) in Safety Systems
      4.4.3 IEC 60880 Versus AERB SG D-25: Recommendations and Documentation
   4.5 Summary and Takeaways

5 Qualification of Safety System Software
   5.1 Regulatory Requirements and Safety Case
      5.1.1 Software Safety Case
      5.1.2 Generating Evidences
   5.2 Verification and Validation Processes
      5.2.1 The Art of Verification
      5.2.2 Validation
      5.2.3 Testing
   5.3 Safety Case Docket and Process Objectives
      5.3.1 Safety Case Documents: Planning
      5.3.2 Safety Case Documents: Development
      5.3.3 Safety Case Documents: V&V of Requirements
      5.3.4 Safety Case Documents: V&V of Design
      5.3.5 Safety Case Documents: V&V of Implementation
      5.3.6 Safety Case Documents: Verification of Verification (Code Review)
      5.3.7 Safety Case Documents: Configuration Management
      5.3.8 Safety Case Documents: SQA
   5.4 Software Audit
   5.5 Formal Specification and Verification of Software
      5.5.1 Formal Verification Approaches
      5.5.2 Verification Using Model Checking
   5.6 Summary and Takeaways

6 Formal Modeling, Verification and Automated Synthesis
   6.1 Formal Model-Based Design
      6.1.1 Need for the Model
      6.1.2 Application Development with Model-Based Design Framework
   6.2 Formal Requirements Specification
      6.2.1 Traditional Method of Capturing Requirements
      6.2.2 Formal Requirements Capture
      6.2.3 Quantified Discrete-Time Duration Calculus (QDDC)
      6.2.4 Formalization of Requirements in QDDC
      6.2.5 Formalizing Visual Requirements in Logic
   6.3 Verification and Analysis of Logic-Based Requirements
      6.3.1 Requirements Analysis: Consistency Checking, Realizability and Refinement
   6.4 Automatic Synthesis from Formal Requirements
      6.4.1 Correct-by-Construction Synthesis
      6.4.2 Controller Synthesis: From Correctness to Quality
      6.4.3 Synthesizing Robust Controllers from QDDC Specification
      6.4.4 Synthesis of Run-Time Enforcement Shields
   6.5 Formal Verification
      6.5.1 Model Checking
      6.5.2 Theorem Proving
   6.6 Static Analysis of Code
      6.6.1 Syntactic Analysis
      6.6.2 Data Flow Analysis
      6.6.3 Abstract Interpretation-Based Analysis
      6.6.4 Theorem Proving-Based Analysis
   6.7 Dynamic Analysis of Code
      6.7.1 Black Box Testing
      6.7.2 White Box Testing
      6.7.3 Automation in Testing
   6.8 Case Study
      6.8.1 Logic-Based Specification of AAS
   6.9 Summary and Takeaways

7 Development of Qualified Platform
   7.1 Qualified Programmable Platform
   7.2 Qualifying a Programmable Platform
      7.2.1 Qualification of Software Items in a Programmable Platform
      7.2.2 Categories of Qualified Platform
   7.3 Qualified PLC: Why and How?
      7.3.1 Why Programmable Controller?
      7.3.2 How to Qualify a PLC Application Program?
      7.3.3 A Scheme for ST to Lustre Translation
      7.3.4 C Code Generation from a PLC Program
      7.3.5 Regulatory Review of PLC
   7.4 Qualified Platform for Safety-Critical Applications
      7.4.1 QPSA and Its Constituent Components
      7.4.2 Managing Tasks in QPSA
      7.4.3 A Safe Programming Environment for Real-Time Systems
   7.5 Qualification of Application Program
   7.6 Case Study 1: PLC Application Program Verification
      7.6.1 Verification of Cooling Water Supply Control System
   7.7 Case Study 2: PLC Supporting Space and Time Partitioning
      7.7.1 Partition System Model
. . . . . . . . . . . . . . . . . . . . . . 7.7.2 Space Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.7.3 Time Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.7.4 Inter-partition Communication (IPC). . . . . . . . . . . . . . . . . . . . . . . . . 7.7.5 Handling Interrupt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.7.6 System Clock Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.8 Case Study 3: Software Development of Reactor Protection System. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.8.1 Model Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.8.2 Testing of Model with Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . .
275 276 277
6.7
6.8 6.9 7
xix
277 279 280 281 282 283 288 295 296 296 297 298 306 307 307 312 312 313 314 318 319 319
320 320 322
xx
Contents
7.9
7.8.3 Formal Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.8.4 Code Generation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.8.5 Report Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary and Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
323 323 326 326
A Markov Analysis: Implementation Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 B A PLC Program and Its Formalization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 B.1 Textual Representation of the PLC Program . . . . . . . . . . . . . . . . . . . . . . . . . 331 B.2 Lustre Specification of the PLC Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332 C Temporal Partitioning: Proof of TDC-FP Schedulability . . . . . . . . . . . . . . . 337 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
About the Authors
Prof. Paritosh Pandya is an adjunct professor at IIT Bombay and former Dean of the School of Technology and Computer Science at the Tata Institute of Fundamental Research (TIFR), where he worked as a researcher for most of his academic career. After receiving a PhD in Computer Science at TIFR, he worked as a research officer in the Oxford University Computing Laboratory under Prof. C.A.R. Hoare and subsequently as a visiting scientist at the United Nations University’s International Institute for Software Technology in Macau. Prof. Pandya is the recipient of the prestigious IEEE RTS 2020 “Test of Time Award” for his pioneering contributions to the theory of schedulability in hard real-time systems. He is known for his work on Duration Calculus and for the tools he developed, which include DCVALID, a validity and model checker, and DCSynth for the automatic synthesis of robust controllers.

Dr. Gopinath Karmakar is a Scientific Officer-H at the Bhabha Atomic Research Centre (BARC). He has nearly 35 years of experience in the field of instrumentation and control (I&C) for safety-critical applications. He obtained his PhD very late in his career, after spending more than two decades in the design, installation and commissioning of I&C systems of nuclear power plants (NPP) and research reactors. His field of expertise includes the development of hard real-time systems, operating systems for safety-critical applications, software engineering for Class IA and IB systems, programmable controllers and system engineering. Karmakar is also an adjunct faculty member in the BARC Training School. He enjoys creative writing and cooking. Karmakar’s recent research interest in the area of smart homes and smart grids led him to co-author the book titled SMART Energy Management: A Computational Approach, World Scientific, 2022.

Dr. Amol Wakankar has been working at the Bhabha Atomic Research Centre (BARC), Mumbai, India, as a Scientific Officer since 2006.
Amol has been working in the area of safety-critical system development for nuclear power plants for more than 15 years. His field of expertise includes the analysis and compilation of synchronous data flow programs and the application of formal methods in the safety-critical application domain. His current research interests include the application of formal methods for architecture-centric dependability analysis and automated synthesis from formal requirements. He has developed the DCSynth tool with Paritosh Pandya, which offers automatic synthesis of robust controllers and run-time enforcement shields.

Ashutosh Kabra has been working as a Scientific Officer at the Bhabha Atomic Research Centre (BARC) since 2007. He has over a decade of working experience in the development of computer-based I&C systems for Indian nuclear power plants and research reactors. His expertise includes embedded software development, PLC programming and its formal verification, system dependability analysis and software qualification. His current research activities are reliability analysis using system architecture, the use of distributed systems in safety-critical applications and formal verification.
Chapter 1
Introduction
A ship in harbor is safe, but that is not what ships are built for. – John A. Shedd
Computer-based systems (CBS) in safety-critical applications encompass a wide range of industries, including nuclear, avionics and automobiles. The dependability of this category of systems is of utmost importance, because any failure due to hardware or software can have a disastrous effect on human life. Further, if the system belongs to a category of applications like nuclear safety, its failure may have far-reaching consequences over a long period of time. The development of safety-critical systems is a process that demands constant evolution, keeping pace with technological advancement in a number of areas, viz. (i) system architectural design, (ii) the software development process, (iii) qualification of software towards meeting the regulatory requirements and (iv) the use of tools and techniques, including formal methods, associated with the development of real-world safety-critical systems. Furthermore, in addition to the development of dependable software, formal methods can also be applied in carrying out dependability analysis of system architecture to ascertain its suitability for applications important to safety. However, what a practitioner needs is the following.
– A well-established development process
– Clear identification of the applicable standards at different stages of development
– An unambiguous things-to-do list that will lead to generating documentary evidence, so that the software under development can be claimed as qualifiable and admissible for obtaining regulatory approval
The practitioner’s perspective stated above sets the goal of this book.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. Karmakar et al., Development of Safety-Critical Systems, https://doi.org/10.1007/978-3-031-27901-0_1
The Goal
This book aims to:
(i) Play the role of a useful guide for practitioners, supported by numerous case studies
(ii) Help graduate students who may be interested in exploring the recent trends in the development of computer-based systems performing safety functions
(iii) Keep its focus on dependable system architecture and the development of software for safety applications
To this end, this book is designed to address practitioners and students interested in knowing the latest developments in the tools and techniques pertaining to requirements analysis, design, development, testing and qualification of computer-based systems performing safety functions. Practitioners will find it useful as a practical guide to develop CBS for safety-critical applications, and it will offer a guided tour to graduate students who may be interested in exploring the path of developing safety systems.

Before proceeding further, let us mention that the terms safety and safety-critical are used interchangeably in this book. Terminologies used for systems performing safety functions—preventing accidents or mitigating their fallout—differ across industries. The terminology used in the nuclear domain is safety [121, 122] system; in avionics, it is safety-critical [21]; and in the automobile industry, it is safety-related [39]. However, what is important is that such systems, across the industries, are designed to perform safety functions at different levels.

This book is about computer-based systems performing safety functions, and its focus is on system architecture and software, not on hardware. This is because the reliability of hardware is an independent area, which is well studied, and there is no dearth of literature and books on the subject. However, the dependability of a system cannot be achieved by improving the reliability of hardware alone. It is the system architecture that plays one of the most important roles in achieving dependability. Therefore, a separate chapter is dedicated to this topic. The discussion on the what and how of developing software for safety-critical applications is distributed throughout the rest of the chapters.
In this chapter, we introduce (i) functional safety and the key features of safety systems, which make the system architecture and the software development process unique; (ii) the importance of system architecture and the role it plays in the design of safety-critical systems; (iii) the software development process; (iv) qualification of software; (v) the guiding standards; (vi) the established tools and techniques for development and qualification; (vii) model-based development and synthesis of code; and (viii) qualified platforms for safety system development.
1.1 Computer-Based Systems for Safety Applications

In today’s world, computer-based systems (CBS) dominate the instrumentation and control (I&C) applications in the industry, which include safety systems. However, in addition to their many advantages, computer-based systems bring their own challenges in making them dependable. The unique characteristics of CBS for safety applications and the related topics are introduced in this section. In this book, we use the term CBS, which, unless stated otherwise, always refers to computer-based systems responsible for performing safety functions—preventing accident and/or mitigating its fallout.
1.1.1 What Is a Safety-Critical System?

A safety-critical or safety system is one which is responsible for carrying out safety actions in the event of any postulated failure that can lead to an accident causing loss of human life or property and/or severe damage to the environment. A safety system is designed to perform one or both of the following safety functions:
(i) Prevent an accident by taking appropriate measures, e.g. opening of a safety valve if pressure goes beyond its operating limit.
(ii) Mitigate the fallout of an accident, if it occurs.
The task can be a simple or a complex one. For example, actuation of an automatic water sprinkling system in case of fire in a closed space is a simple system compared to the initiation of emergency core cooling in the event of an accidental loss of coolant in a nuclear power plant, which is a highly complex process.
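The first safety function above often amounts to a simple threshold check. The following sketch illustrates the idea; the limit value and all names are hypothetical, not taken from any real system.

```python
# Sketch of a "prevent" safety function as a threshold check.
# The limit value and names are hypothetical, for illustration only.

PRESSURE_LIMIT_KPA = 850.0  # assumed operating limit for this sketch

def safety_valve_demand(pressure_kpa):
    """Demand opening of the safety valve when pressure exceeds the limit."""
    return pressure_kpa > PRESSURE_LIMIT_KPA

assert safety_valve_demand(800.0) is False  # within limit: no demand
assert safety_valve_demand(900.0) is True   # beyond limit: safety action
```

The real engineering effort lies not in this trivial logic but in guaranteeing that it executes dependably, which is the subject of the rest of the book.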
1.1.2 What Are the Advantages and Challenges?

The advantages of using CBS are many and so are its challenges. On the one hand, it offers flexibility in design and supports changes and functional testing before installation. In addition, it helps reduce the installation and maintenance cost by means of (i) better monitoring and health diagnostics and (ii) reduction in cables and wiring, where distributed systems supported by reliable communication are used. On the other hand, a CBS throws challenges in quantifying its dependability, which is affected by the quality of the software in it. While the reliability figures for hardware are often available, quantifying software reliability is a challenge. In order to generate confidence and convince the regulatory authority, conventionally, a well-defined software development process, semi-formal design using the Unified Modeling Language (UML) and a rigorous verification and validation process are followed across the various types of industries where safety is a concern.
The use of formal methods in developing software is not new. For example, static and dynamic analyses of code are widely applied in the industry today. However, this is still an active area of research. In this book, we will also explore how a practitioner can potentially exploit the recent developments in formal verification tools and techniques.
1.2 Steps Towards the Development

At the outset, let us outline the steps involved in the development of safety systems at a higher level of abstraction, as shown in Fig. 1.1. This, in our opinion, will help most of the readers to appreciate the upcoming chapters in the right context. The system development steps, in a nutshell, are the following:

(i) The development process begins with understanding and analysing the user requirements, which results in the production of a system requirements document.
(ii) The system architecture design is the next step, and during this phase, the hardware and software requirements are identified. This leads to branching off the development into two distinct paths:
    – Development or selection of qualified hardware
    – Development of software
    The important activities in this phase are the following:
    (a) Design of the system architecture and also, in the process, partitioning the system requirements into hardware and software requirements.
    (b) Analysis of the architecture to ascertain its suitability, given that the reliability and maintainability figures of the hardware are available and the reliability of the software is assumed to be 1. The purpose is to compute and find out if the system meets the safety goals of achieving the desired reliability and availability to perform its safety functions when the demand arises.
    (c) Arriving at the appropriate system architecture supported by the analysis.
(iii) Documentation of:
    (a) Hardware requirements
    (b) Software requirements

As has been already discussed, the focus of this book is not hardware, and it is realistically assumed that a set of qualified hardware for safety system application is available. Therefore, only those steps which are involved in the development of software and validation of the system against user requirements are outlined further.
Fig. 1.1 Steps for the development of a computer-based system (CBS)
(iv) To follow a well-defined process of software development in accordance with the chosen software development life cycle (SDLC) model. The SDLC process includes:
(a) Software requirements analysis
(b) Software design:
    – Architectural design
    – Detailed design
(c) Implementation
(d) Testing, including the system validation test
(v) Verification and validation (V&V) at every stage of the SDLC stated above. One of the important V&V activities is testing. It is worth noting here that, in addition to testing, the use of formal verification, wherever feasible, helps build a higher level of confidence in the system. This is because testing can only identify the presence of some bugs but cannot guarantee the absence of bugs.

This book dwells on the process of arriving at a suitable system architecture and developing software for safety systems as outlined above. In addition, the development of a qualified platform that helps enforce the above steps, by design, is also discussed.
1.3 Safety System and Its Architecture

System architecture design identifies the system’s building blocks and the structural as well as functional relationships among them, which helps realize the system’s requirements. The building blocks of a safety system architecture include (i) a hardwired electronic (analog) module, (ii) a computer-based sub-system and/or (iii) a pure hardware-based module (e.g. a module built using electro-mechanical relays). Therefore, a discussion only on the development of software is not enough to offer a global view of the development of a safety system. At the same time, it is not impractical for a designer to assume that some set of hardware is available for designing a given system. Rather, in practice, it is often the case that the use of a set of available and proven hardware comes as one of the constraint requirements in the system design. Further, architectural design decisions like the use of diverse controllers and the communication among redundant channels/trains in a safety system influence the development of software. Thus, the dependability of a system is closely coupled with its architecture and the quality of its software, given that a set of proven hardware is available.
The dependability of a system cannot be achieved by improving the reliability of hardware alone. This is because dependability is something that also takes into account probable failures. Hardware reliability is a measure of its failure over a period of time. It is recognized that even highly reliable hardware has a finite probability of failure, and therefore it can fail at any point of time. A computer-based system, like any other system, can fail. Dealing with failure requires built-in redundancy in the system architecture, i.e. the ability to execute safety functions in the event of failure of one or more redundant channels/trains. Therefore, it is the system architecture that plays one of the most important roles in achieving dependability.
Dependability and System Architecture
The dependability of a system depends largely on its architectural design at the early stage of development. It requires built-in redundancy in the system architecture, i.e. the ability to execute safety functions in the event of failure of one or more redundant channels/trains.
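The redundancy described above is commonly exercised through m-out-of-n voting, in which the safety action is taken when at least m of the n redundant channels demand it. A minimal, generic sketch (not tied to any particular system discussed later):

```python
# Generic m-out-of-n voter: the safety action is taken when at least m of
# the n redundant channels demand it, so a limited number of failed
# channels can be tolerated.

def vote(demands, m):
    """demands: per-channel trip demands; True means the channel trips."""
    return sum(demands) >= m

# 2oo3 voting: one failed-silent channel does not block the safety action,
# and one spuriously tripping channel does not cause it.
assert vote([True, True, False], m=2) is True
assert vote([True, False, False], m=2) is False
```

The choice of m and n trades off the probability of failure on demand against the probability of spurious actuation, a trade-off revisited in the dependability analysis of Chap. 2.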
A discussion on the role and importance of the architecture of CBS performing safety functions is presented in Chap. 2. In addition, the ideas of robust controller design and run-time enforcement shields presented in the formal verification in Chap. 6 provide some potent novel approaches to achieving reliability under transient failures.
1.4 Software in Safety Systems

The challenge in the development of software for safety-critical applications stems from the very purpose for which such systems are designed. In the event of any accidental condition, the system must perform to meet the design objective of ensuring safety. Any failure in executing the safety functions can be catastrophic, causing loss of human life and severe damage to property and to the environment. Failures in a safety system can be of two types—failure to meet the functional requirements and failure to meet the temporal requirements. For example, in the event of a plant parameter (say, primary coolant pressure) reaching beyond its safety limit, the correct processing of safety logic alone cannot prevent catastrophe in a nuclear power plant. It must also be ensured that the reactor protection system generates output for dropping down the safety rods within a specified time. Therefore, it is of utmost importance that the software in a safety system not only meets its functional requirements but also does so within a specified upper bound of time (temporal requirement), which is referred to as a hard deadline.
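The twin functional and temporal requirements can be made concrete with a small sketch: a control cycle is correct only if its output is right and it completes within the hard deadline. In a real system this check is enforced by hardware or RTOS mechanisms such as watchdog timers; the 50 ms figure and all names here are invented for illustration.

```python
# Sketch: a cyclic safety task meets its requirement only if it produces
# the right output AND finishes within its hard deadline. The deadline
# value and names are illustrative; real systems enforce this with
# watchdog timers / RTOS deadline monitoring.
import time

CYCLE_DEADLINE_S = 0.050  # assumed 50 ms hard deadline

def run_cycle(compute_trip_logic):
    start = time.monotonic()
    demand = compute_trip_logic()
    elapsed = time.monotonic() - start
    if elapsed > CYCLE_DEADLINE_S:
        # A missed deadline is itself a failure, even if `demand` is
        # correct; a real design would force the fail-safe action here.
        return None
    return demand

assert run_cycle(lambda: True) is True
```

The point of the sketch is that temporal failure is treated on a par with functional failure: a late trip demand is as unacceptable as a wrong one.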
1.4.1 The Software Development Process

A structured and well-defined process is necessary to develop software for safety systems. The development process must identify a set of activities that will facilitate the building of dependable software. At the same time, the process should also help in developing a safety case¹ that can be independently verified and validated, so that adequate documentary evidence can be generated to secure approval from the regulatory authorities for its worthiness in safety-critical applications. This topic is discussed substantially in Chap. 3.

Software Development Process
The software development process is a structured and measured set of activities designed to produce a specific software output for a particular safety class of computer-based systems.
1.5 Functional Safety and the Guiding Standards

A safety system has to carry out its functions automatically in order to mitigate the effect of accidental conditions so that it eliminates the potential risk of a catastrophe. While the goal is to protect the user from potential harm, functional safety and the identified criticality levels of safety can be different for different industries, which has led to the development of a number of industry-specific standards. Some of the important safety standards are introduced in this section and further discussed in Chap. 4 with a focus on nuclear power plants (NPP) and avionics systems.
¹ A safety case is the documentary evidence which is used to establish that (i) the right process (well defined and conforming to the applicable standards) has been followed in developing the safety system and (ii) the right product (meeting the user requirements) has been developed.
1.5.1 Functional Safety: What and Why?

The terms safety and functional safety² are defined by IEC Standard 61508, which is the general safety standard covering electric/electronic/programmable electronic (E/E/PE) safety systems.
“Safety is the freedom from unacceptable risk of physical injury or of damage to the health of people, either directly, or indirectly as a result of damage to property or to the environment”. “Functional safety is part of the overall safety that depends on a system or equipment operating correctly in response to its inputs”. Source: IEC TR 61508-0 ed.1.0 [128]
Let us explain the term functional safety with the help of an example. In an automobile system, automatic activation of airbags under an accident condition is an instance of functional safety, which helps eliminate or reduce the potential risk to human life in a moving car. Here, the input(s) to the system is(are) the accident condition detected by the crash/impact sensors. Functional safety is important as it protects the user/consumer from the potential harm that a system can cause in the event of any accident condition. Complex computer-based systems are an integral part of industries today, and assurance of functional safety is important for users and even for non-users, who can be affected simply because of proximity.
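As a toy rendering of the airbag example, the functional-safety function can be viewed as a mapping from crash-sensor inputs to the actuation output. A production restraint controller performs far more elaborate sensor fusion; the thresholds and names below are invented purely for illustration.

```python
# Toy version of the airbag functional-safety example: map crash-sensor
# inputs to the actuation output. Thresholds and names are invented;
# real restraint controllers use elaborate multi-sensor fusion.

DEPLOY_THRESHOLD_G = 40.0  # assumed deceleration threshold
MIN_SPEED_KMH = 25.0       # assumed: ignore very low-speed bumps

def airbag_demand(impact_g, speed_kmh):
    return impact_g >= DEPLOY_THRESHOLD_G and speed_kmh >= MIN_SPEED_KMH

assert airbag_demand(60.0, 50.0) is True   # crash-level impact: deploy
assert airbag_demand(10.0, 50.0) is False  # pothole: no deployment
```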
1.5.2 The Safety Standards

The purpose of a safety standard, like any other industry standard, is to guide designers, developers and manufacturers to make different products which are compatible with each other, cost-effective, functionally safe and user-friendly.
² Copyright ©2005 IEC Geneva, Switzerland. www.iec.ch. The authors thank the International Electrotechnical Commission (IEC) for permission to reproduce information from its international standards. All such extracts are copyright of the IEC, Geneva, Switzerland. All rights reserved. Further information on the IEC is available from www.iec.ch. The IEC has no responsibility for the placement and context in which the extracts and content are reproduced by the author, nor is the IEC in any way responsible for the other content or accuracy therein.
However, the primary focus of safety standards remains on functional safety. Thus, conformance to applicable safety standards is mandatory in the development of systems performing safety functions.
A safety standard typically classifies systems by the level of reliability they are required to achieve. It also specifies the means of achieving and establishing such reliability. Thus, a safety standard would specify the management and development process to be followed through the development life cycle, as well as the type of documentation and its verification to be carried out at each stage of design to validate the reliability of the developed system.

Though the fundamental principles and the safety goals of all industries remain the same, a number of industry-specific standards evolved from IEC Standard 61508, the general safety standard for electric, electronic and programmable electronic (E/E/PE) safety systems. This is because different types of industries developed their own methodologies to address the uniqueness of their application domain. For example, fail-safe design is an important criterion applicable to all safety-critical systems across industries, but the fail-safe action may vary from one application to another. This is because, on failure of a computer-based system, it may not be feasible to generate output that will lead the plant to its safe state automatically or manually. In the nuclear industry, safety systems are designed such that any detected failure leads to automatic shutdown or tripping of the reactor. However, in the case of an aircraft, it would be catastrophic to shut down the engine if some failure were detected in the flight controller or in the landing gear system. In such a scenario, the feasible fail-safe action is to generate an alarm and depend on the crew for safe manual operation, if feasible, or opt for an emergency landing.

Safety Standard
A safety standard is the collective wisdom of the experts aimed at guiding the stakeholders towards making equipment/plants functionally safe, compatible with each other and cost-effective and driving innovations towards safe and user-friendly holistic solutions.
Further, certification of suitability is demanded by the regulatory authority (local, national or international) before any system having safety implications can be put to use. It is necessary that some standard development procedure is followed to help facilitate certification by an independent and qualified team or agency. Though IEC 61508 covers all industries, specific industries belonging to a particular domain (avionics, nuclear, automobile, etc.) have their own uniqueness, and therefore many of them have developed their own standards, as discussed earlier. We refer to these standards as domain-specific standards. In this section,
Table 1.1 IEC 61508 safety integrity level (low-demand mode)

Integrity level    PFD
SIL 4              ≥ 10⁻⁵ to < 10⁻⁴
SIL 3              ≥ 10⁻⁴ to < 10⁻³
SIL 2              ≥ 10⁻³ to < 10⁻²
SIL 1              ≥ 10⁻² to < 10⁻¹
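The SIL bands of Table 1.1 can be checked mechanically. The sketch below simply encodes the standard IEC 61508 low-demand PFD bands and maps a computed PFD value to its SIL.

```python
# Encode the low-demand PFD bands of IEC 61508 (Table 1.1) and map a
# computed PFD value to its safety integrity level.

def sil_for_pfd(pfd):
    bands = [(4, 1e-5, 1e-4), (3, 1e-4, 1e-3),
             (2, 1e-3, 1e-2), (1, 1e-2, 1e-1)]
    for sil, lo, hi in bands:
        if lo <= pfd < hi:
            return sil
    return None  # outside the tabulated range

assert sil_for_pfd(5e-5) == 4
assert sil_for_pfd(5e-3) == 2
```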
(nFailedUU(TB) >= 1) OR
((nOperational(TB) >= 1) AND (nFailedUU(RPS_Controller) = 2))
The RTS train will be in the Failed_Safe state if any one of the following conditions is satisfied:
(i) Both the TBs are in the Failed_Safe state.
(ii) No TB is in the Failed_UU state AND at least one of the RPS controllers is in the Failed_Safe state.

(nFailedSafe(TB) = 2) OR
((nFailedUU(TB) = 0) AND (nFailedSafe(RPS_Controller) >= 1))
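The train-level Failed_Safe condition can be evaluated directly from the component states. A sketch, with state names as in the text and two TBs and two RPS controllers per train as implied by the counting formulas:

```python
# Direct encoding of the train-level Failed_Safe predicate. Component
# states are "Operational", "Failed_Safe" or "Failed_UU"; counts mirror
# the nFailedSafe/nFailedUU functions of the fault model.

def count(states, s):
    return sum(1 for x in states if x == s)

def rts_failed_safe(tb_states, ctrl_states):
    # (nFailedSafe(TB) = 2) OR
    # ((nFailedUU(TB) = 0) AND (nFailedSafe(RPS_Controller) >= 1))
    return (count(tb_states, "Failed_Safe") == 2) or (
        count(tb_states, "Failed_UU") == 0
        and count(ctrl_states, "Failed_Safe") >= 1)

assert rts_failed_safe(["Failed_Safe", "Failed_Safe"],
                       ["Operational", "Operational"])
assert rts_failed_safe(["Operational", "Operational"],
                       ["Failed_Safe", "Operational"])
assert not rts_failed_safe(["Failed_UU", "Operational"],
                           ["Failed_Safe", "Operational"])
```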
Fault Model of the RPS Controller

The RPS controller receives outputs of the sensing modules of the other trains using three P2P links and applies 2oo4 voting logic to generate a controller-level reactor trip. The RPS controller will be in the Failed_UU state if any one of the following conditions is satisfied:
(i) At least three sensing modules (including corresponding P2P link failure) are in the Failed_UU state.
(ii) Two sensing modules (including corresponding P2P link failure) are in the Failed_UU state AND at least one is in the Failed_Safe state.

(nFailedUU(Sensing_Module or P2P) >= 3) OR
((nFailedUU(Sensing_Module or P2P) = 2) AND (nFailedSafe(Sensing_Module or P2P) >= 1))
On the other hand, the RPS controller will be in the Failed_Safe state if the following condition is satisfied: at least three sensing modules (including the corresponding P2P link failure) are in the Failed_Safe state.

nFailedSafe(Sensing_Module or P2P) >= 3
Fault Model of the Sensing Module The sensing module consists of a sensor and a comparator. The sensing module will be in the Failed_UU state if any one of the following conditions is satisfied:
2.4 Case Studies
(i) The comparator is in the Failed_UU state. (ii) The comparator is in the Operational state AND the sensor is in the Failed_UU state.

isFailedUU(Comparator) OR
(isOperational(Comparator) AND isFailedUU(Sensor))

The sensing module will be in the Failed_Safe state if any one of the following conditions is satisfied: (i) The comparator is in the Failed_Safe state. (ii) The comparator is in the Operational state AND the sensor is in the Failed_Safe state.

isFailedSafe(Comparator) OR
(isOperational(Comparator) AND isFailedSafe(Sensor))
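Because the fault model is hierarchical, the predicates compose naturally: a sensing module's state is first derived from its comparator and sensor, and the controller-level 2oo4 conditions are then evaluated over the four derived module/link states. The following sketch is illustrative only (the state encoding and function names are invented for this example):

```python
OPERATIONAL, FAILED_SAFE, FAILED_UU = "Operational", "Failed_Safe", "Failed_UU"

def sensing_module_state(comparator, sensor):
    """Derive the sensing-module state from its comparator and sensor,
    following the conditions in the text: a comparator failure dominates;
    otherwise the sensor's state is propagated."""
    if comparator == FAILED_UU:
        return FAILED_UU
    if comparator == FAILED_SAFE:
        return FAILED_SAFE
    return sensor  # comparator Operational: module takes the sensor's state

def rps_controller_failed_uu(module_or_link_states):
    """Controller-level Failed_UU over four sensing-module/P2P states:
    >= 3 Failed_UU, or exactly 2 Failed_UU with >= 1 Failed_Safe."""
    n_uu = module_or_link_states.count(FAILED_UU)
    n_fs = module_or_link_states.count(FAILED_SAFE)
    return n_uu >= 3 or (n_uu == 2 and n_fs >= 1)

def rps_controller_failed_safe(module_or_link_states):
    """Controller-level Failed_Safe: at least 3 Failed_Safe states."""
    return module_or_link_states.count(FAILED_SAFE) >= 3

states = [sensing_module_state(OPERATIONAL, FAILED_UU),   # sensor failed
          sensing_module_state(FAILED_UU, OPERATIONAL),   # comparator failed
          FAILED_UU,                                      # P2P link failed
          sensing_module_state(OPERATIONAL, OPERATIONAL)]
print(rps_controller_failed_uu(states))  # → True (three Failed_UU states)
```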
2.4.1.3 Compositional Dependability Analysis of the Reactor Trip System
The two models, viz. the abstract RTS and the refined RPS controller, described in Sect. 2.4.1.2 are analysed separately, and the CLS(s) of all atomic components that achieve the required TDA for RTS are obtained using the compositional dependability analysis algorithm.

Analysis of the Abstract RTS Model  The DTMC model corresponding to the abstract RTS model (presented in Fig. 2.22) is generated. The CLSs of the RPS controller and TB are obtained using the PRISM model checker such that these CLSs achieve the required PFD and SUA of RTS. In this study, the PCTL17 specification is used for the computation of PFD. The PCTL specification for the PFD computation is as follows:

S=? [RTS_Is_Failed_UU]

This PCTL formula is the specification for determining the steady-state probability of RTS going into the Failed_UU state, which gives us the PFD. Similarly, the PCTL specification to compute the SUA is as follows:

S=? [RTS_Is_Failed_Safe]
17 Probabilistic computation tree logic (PCTL) is an extension of computation tree logic (CTL). PCTL allows for the probabilistic quantification of the stated properties.
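The S =? operator queries the steady-state (long-run) probability of residing in the named failure states. As an intuition aid, the toy three-state chain below (with invented per-step transition probabilities, not taken from the case study) computes the same kind of quantity by power iteration in plain Python rather than with PRISM:

```python
def steady_state(P, iters=10_000):
    """Approximate the stationary distribution of a DTMC with transition
    matrix P (each row sums to 1) by repeated left-multiplication,
    starting from a uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# States: 0 = Operational, 1 = Failed_Safe, 2 = Failed_UU.
# Per-step transition probabilities are invented for illustration only.
P = [[0.9990, 0.0008, 0.0002],   # failures out of Operational
     [0.0400, 0.9600, 0.0000],   # repair returns Failed_Safe to Operational
     [0.0100, 0.0000, 0.9900]]   # proof test reveals and repairs Failed_UU
pi = steady_state(P)
print(f"SUA ~ {pi[1]:.2e}, PFD ~ {pi[2]:.2e}")
```

PRISM computes the stationary distribution exactly (by solving the balance equations) over state spaces far too large for hand analysis; the power iteration above merely illustrates what the S =? query denotes.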
2 System Architecture and Dependability
Table 2.6 CLS values considered for the analysis of the reactor trip system

Parameter name                   Start value  End value  Step
λS for TB                        10^-4        10^-6      Multiplying factor 10^-1
λUS for TB                       10^-4        10^-6      Multiplying factor 10^-1
λS for RPS controller            10^-4        10^-6      Multiplying factor 10^-1
λUS for RPS controller           10^-4        10^-6      Multiplying factor 10^-1
λS for sensor                    10^-3        10^-5      Multiplying factor 10^-1
λUS for sensor                   10^-3        10^-5      Multiplying factor 10^-1
λS for comparator                10^-3        10^-6      Multiplying factor 10^-1
λUS for comparator               10^-3        10^-6      Multiplying factor 10^-1
λS for P2P communication link    10^-3        10^-5      Multiplying factor 10^-1
λUS for P2P communication link   10^-3        10^-5      Multiplying factor 10^-1
DC                               0            0.9        Increased by 0.1
Tp                               720
MTTR                             24
This PCTL formula is the specification to determine the steady-state probability of RTS being in the Failed_Safe state, which gives us the SUA.

Experiments are carried out to obtain the PFD and SUA for various values of DC ranging from 0.2 to 0.9. The other parameters, viz. the rates of failure of the RPS controller and TB, are varied with a step size of 10^-1 as shown in Table 2.6. The voting logic component is assumed to be highly dependable, and its rate of failure is taken as 0. Thus, the fault model of RTS is considered to be independent of the rate of failure of the voting logic component. The values of MTTR (1/μ) and Tp are taken as 24 h and 720 h, respectively, based on plant operating experience. The PFD and SUA values between two consecutive DC values are estimated by linear interpolation. All the values of PFD and SUA were plotted, out of which the relevant values which achieve the required TDA (PFD < 10^-5 and SUA < 10^-5) are shown in Figs. 2.24 and 2.25. The corresponding results are highlighted in Table 2.7. The main observations from Table 2.7 are summarized18 below.

– The rate of failure of TB should be of the order of 10^-5/h with a DC of at least 0.5 in order to achieve the required TDA for RTS. A further decrease in the rate of failure of TB does not provide any significant improvement in PFD and SUA.
– The rate of failure of the RPS controller should be within 10^-5/h to 10^-4/h with DC between 0.1 and 0.5.
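Since PFD and SUA are evaluated only at the discrete DC grid points of Table 2.6, values at intermediate DC settings are estimated by linear interpolation, which can be sketched as follows (the sample PFD values below are made up for illustration):

```python
def interp_linear(x, x0, y0, x1, y1):
    """Linearly interpolate y at x between grid points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical PFD values at two consecutive DC grid points
pfd_at_dc = {0.5: 8.84e-6, 0.6: 8.10e-6}   # illustrative numbers only
pfd_055 = interp_linear(0.55, 0.5, pfd_at_dc[0.5], 0.6, pfd_at_dc[0.6])
print(f"{pfd_055:.2e}")  # → 8.47e-06 (midpoint: average of the two grid values)
```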
18 Copyright ©2019, Elsevier, reprinted from [209], with permission.
Fig. 2.24 PFD of RTS with λS = λUS = 10^-5 for TB, MTTR = 24 h and Tp = 720 h

Fig. 2.25 SUA of RTS with λS = λUS = 10^-5 for TB, MTTR = 24 h and Tp = 720 h
Table 2.7 Experimental results for RTS architecture analysis

Sr. no.  Rate of failure of RTS component (per h)  DC   MTTR (h)  Tp (h)  PFD of RTS    SUA of RTS
         RPS controller  TB
1        10^-6           10^-4                     0.9  24        720     1.05 × 10^-5  1.61 × 10^-7
2        10^-4           10^-5                     0.5  24        720     8.84 × 10^-6  2.76 × 10^-6
3        10^-5           10^-5                     0.1  24        720     6.73 × 10^-6  3.02 × 10^-9
4        10^-4           10^-6                     0    24        720     3.87 × 10^-6  2.54 × 10^-6
5        10^-5           10^-6                     0    24        720     2.05 × 10^-8  3.12 × 10^-9
Being an atomic component, the CLS of TB is finalized based on the above observations. However, for the RPS controller, which is a composite component, the analysis is repeated in order to obtain the CLS of its sub-components. From the calculated CLS requirements of the RPS controller (λS = λUS = 10^-4 with DC = 0.5), the corresponding TDA is computed using Tables 2.3 and 2.4. This TDA is used as the required TDA to calculate the CLS of the sub-components of the RPS controller in the further analysis presented in the next section.

Analysis of the Refined RPS Controller Model  The DTMC model for the RPS controller with its sub-components (the refined RPS controller model shown in Fig. 2.23) is generated, and the CLSs of its sub-components, viz. sensor, P2P communication link and CM, are computed using the PRISM model checker such that the required TDA for the RPS controller (PFD = 1.76 × 10^-2 and SUA = 2.35 × 10^-3, as obtained from Tables 2.3 and 2.4, respectively) is achieved. The following PCTL formulas are used with the PRISM DTMC model of the RPS controller to determine the PFD and SUA:

S=? [RPS_Controller_Is_Failed_UU]

S=? [RPS_Controller_Is_Failed_Safe]
Dependability analysis for this refined model was carried out using the same procedure followed for the analysis of RTS. The PFD and SUA of the RPS controller for various values of DC (between 0 and 0.9) were obtained, with MTTR = 24 h and Tp = 720 h. The rates of failure of the sub-components, viz. sensors, P2P communication links and CM (comparator), were varied with a step size of 10^-1 as shown in Table 2.6. The relevant results are highlighted in Table 2.8. The following are the important observations derived from the experimental results presented19 in Table 2.8.
19 Copyright ©2019, Elsevier, reprinted from [209], with permission.
Table 2.8 Experimental results for RPS controller architecture analysis

Sr. no.  Rate of failure of RPS component (per h)  DC     MTTR (h)  Tp (h)  PFD of RPS controller  SUA of RPS controller
         Sensor  P2P link  CM
1        10^-5   10^-5     10^-3                   ≈0.80  24        720     1.0 × 10^-2            2.36 × 10^-4
2        10^-4   10^-4     10^-4                   ≈0.20  24        720     1.0 × 10^-2            5.5 × 10^-6
3        10^-4   10^-5     10^-4                   0      24        720     1.0 × 10^-2            2.45 × 10^-6
4        10^-5   10^-4     10^-4                   0      24        720     1.0 × 10^-2            1.67 × 10^-6
5        10^-4   10^-4     10^-4                   ≈0.70  24        720     1.0 × 10^-3            6.6 × 10^-6
6        10^-4   10^-5     10^-4                   ≈0.60  24        720     1.0 × 10^-3            2.93 × 10^-6
7        10^-5   10^-5     10^-4                   ≈0.20  24        720     1.0 × 10^-3            5.16 × 10^-7
– The rates of failure of the RPS controller, sensors, P2P communication links and CM modules should be within 10^-4/h with DC = 0.2 to achieve the required TDA of the RPS controller.
– A further decrease in the rate of failure of the P2P communication link does not significantly improve dependability.
2.4.1.4 Results
From the results of the analysis obtained for the RTS architecture (Fig. 2.20), it is concluded that the following design requirements are to be met in order to achieve the target dependability (PFD ≤ 10^-5 and SUA ≤ 10^-5):

(i) The rates of failure of the atomic components, viz. sensors, CMs and P2P communication links, should be ≤ 10^-4/h with a minimum DC of 20%.
(ii) The rates of failure of the TBs should be ≤ 10^-5/h with a minimum DC of 50%.
(iii) The values of 24 h and 720 h for MTTR and Tp, respectively, are suitable for all the atomic components.
2.4.2 Case Study 2: Engineered Safety Feature Actuation System (ESFAS) of a PWR

ESFAS is a safety system of a PWR which is engineered to sense postulated initiating events (PIEs) or accident conditions (such as a loss-of-coolant accident (LOCA), a steam line break, etc.) and actuate the engineered safety features (ESF) in order to mitigate the effect of the accident.
Fig. 2.26 Architecture of engineered safety feature actuation system (ESFAS)
As shown in the architecture depicted in Fig. 2.26, ESFAS uses the RPS controllers to obtain all the field inputs it requires. The processing of the inputs is also done by the RPS to detect engineered safety feature (ESF) actuation conditions. The hierarchical architecture of ESFAS is shown in Fig. 2.27. The RPS controllers, P2P communication links (from RPS to ESFAS), component-level control (CLC) modules and ESF actuators are assumed to be atomic components in this analysis.
Fig. 2.27 Hierarchical model of ESFAS
The compositional analysis methodology illustrated in Sect. 2.4.1.3 is used for the analysis of ESFAS. The detailed fault model and analysis are not discussed here, as the compositional analysis technique remains the same as for RPS; interested readers may refer to [210]. Only the results obtained are presented in brief.
2.4.2.1 Experimental Results
The important findings are as follows:

(i) In order to achieve the desired TDA (PFD < 10^-5 and SUA < 10^-5) of ESFAS, the ESF actuators, CLCs, P2P communication links and RPS controllers are required to be designed such that their rates of failure are ≤ 10^-5/h, ≤ 10^-5/h, ≤ 10^-4/h and ≤ 10^-4/h, respectively, with DC ≥ 0.2.
(ii) The MTTR and Tp should not exceed 24 h and 720 h, respectively, for any component of ESFAS.
(iii) Since the results of the analysis (of both RTS and ESFAS) establish the same CLS requirements for the sensors as well as the RPS controllers, the required TDA of both RTS and ESFAS can be achieved by sharing the RPS controllers and sensors between the two systems.
2.4.2.2 Comparative Study of Different Architectural Options
The system development may start with a number of candidate architectures. The dependability analysis framework can be used to compare the dependability of alternative architectures, which helps in selecting the most suitable option. Let us consider the architectures of RTS and ESFAS, denoted as ArchRTS and ArchESFAS, respectively. The effect of changing or replacing a component in ArchRTS and ArchESFAS is discussed below.

Effect of Local 2oo4 Voting Logic in RPS Controllers of RTS  An alternate architecture of RTS without local 2oo4 voting logic (termed ArchRTS_NoLocal2oo4) was compared with the original architecture ArchRTS. It was observed that if the CLSs of all the RTS components are kept the same, the SUA of ArchRTS_NoLocal2oo4 is degraded when compared to the ArchRTS architecture: it deteriorates to 9.0 × 10^-2, whereas the SUA of ArchRTS is found to be 2.76 × 10^-6. This leads to the following conclusion:
Use of local 2oo4 voting logic in ArchRTS increases the availability by reducing the probability of spurious actuation.
ESFAS Train with One Controller  The comparison of ArchESFAS with an alternate architecture of ESFAS having only one controller in each ESFAS train was carried out. This alternate architecture is referred to as ArchESFAS_SingleController. It is observed that ArchESFAS_SingleController improves the availability (SUA = 4.61 × 10^-7) when compared with ArchESFAS (SUA = 2.94 × 10^-6), but safety decreases, with a PFD of 1.0 × 10^-3, which is not desirable. Further, in order to achieve the target PFD (≤ 10^-5) with ArchESFAS_SingleController, the rate of failure of the ESFAS controller is required to be ≤ 10^-5/h, which is difficult to achieve. Thus, the following conclusion can be drawn:
The architecture with two controllers in each ESFAS train with 1oo2 voting logic is more suitable because it offers lower PFD.
2.5 Summary and Takeaways

The role of a system architecture in achieving the desired dependability goal of a safety system has been discussed, covering various levels of abstraction. By the end of this chapter, a practitioner should be able to appreciate the rationale behind adopting a particular safety system architecture at the highest level of abstraction. In addition, the formal techniques that can help derive the dependability attributes of all the constituent components of a system architecture have been discussed in detail. Furthermore, the technique of compositional analysis, which overcomes the scalability issue associated with applying formal techniques to real-world applications, has also been addressed. For aspiring students, this chapter offers an insight into the challenges involved in analysing the dependability of a real-world system using formal techniques and a direction towards a practical solution.
Review Questions
1. What is system dependability? How can system dependability be defined quantitatively?
2. Why is reliability not enough to achieve high dependability in designing a safety-critical system?
3. Can you derive the reliability of a system using the two-out-of-four (2oo4) voting logic if the reliability of each individual train is R?
4. Why is the availability of a plant equally as important as plant safety?
5. Can you justify the use of the 2oo4 voting logic in comparison to the 2oo3 voting logic, in spite of the additional cost of providing the fourth train?
6. What does the term diagnostic coverage (DC) mean to you?
7. Which configuration of redundant components—series or parallel—will be helpful in increasing system reliability? Can you justify your answer?
8. What is the significance of the term probability of failure on demand (PFD)?
9. Construct a Markov model for a one-out-of-two (1oo2) system, where the safe failure rate, unsafe failure rate, diagnostic coverage, repair rate and proof test interval of the individual trains are λs, λus, DC, μ and Tp, respectively.
10. What are the challenges you may face if you use a probabilistic model checker for the analysis of a large real-world system?
Chapter 3
Software Development Process
Think horizontally, implement vertically. – Bruce Powel Douglass
Software, like any other engineering product, is today developed by adhering to a well-established development process involving a well-defined set of activities. Over the decades, it has been established that building software of industrial quality, let alone software for safety- or mission-critical systems, is not just about talented programmers writing code. Industries have developed their own processes with the aim of producing quality software. A process must facilitate assessing the quality of the software. In addition, it is also necessary to (i) make the development of software person-independent and (ii) make the software, especially large software, easier to maintain and manage. The means to achieve these goals, in simple terms, are (i) analysis and design before writing code and (ii) verification and validation (V&V) at every stage, i.e. from requirements analysis and design to implementation and finally system integration and acceptance, up to the release of the software. However, these activities must conform to the applicable standards and follow a development plan. In this chapter, we briefly discuss (i) the development plans—a set of plan documents covering project management, software quality assurance, configuration management, and verification and validation—and (ii) the development procedures conforming to the relevant standards, along with the software development life cycle (SDLC) models for execution. The phases of software development are further elaborated along with a case study. There is no dearth of books on software engineering, including practical ones like the ESA1 Software Engineering Standards [168]. Therefore, we will be brief in discussing software engineering, bringing out the essence of the topic and attempting to provide a guided tour of its practical aspects with the help of a real-world case study.
1 The ESA Software Engineering Standards is an example which decodes the relevant software engineering standards and defines a set of practices for all the phases of development. It is one of the most comprehensive practical guides available.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
G. Karmakar et al., Development of Safety-Critical Systems, https://doi.org/10.1007/978-3-031-27901-0_3
In this chapter, we will focus on requirements analysis and software design, involving its structural and behavioural modeling, without getting into the theoretical aspects of these topics. Verification and validation, one of the important phases of development, will be introduced in this chapter and discussed further in Chap. 5, as V&V is an essential part of software and system qualification.
3.1 Development Plan

The goal of a software development plan is to steer a software project towards delivering a product. Therefore, a project management plan has to be in place, defining the technical and managerial processes necessary to meet the project requirements. The requirements include conformance with the standards and the regulatory requirements to meet the end goal of developing software of adequate quality for use in safety systems. Before getting into the nitty-gritty of the development process, let us look at a simple analogy: the prescriptions for good cooking can be considered analogous to those for good software development. It can be observed from Table 3.1 that, whether it is cooking or the development of software, following a disciplined approach is essential to create a product of high quality. The software quality assurance (SQA) process demands planned and systematic actions necessary to provide adequate confidence that the software development processes are appropriate and produce a software product of suitable quality for its intended purpose. The SQA process ensures this by verifying that (i) the plans (including the project management plan) are described according to the standards, (ii) the procedures are carried out according to the plans and (iii) the products (software) are developed in conformance with the standards.
Table 3.1 Cooking and software development

Cooking | Software development
Required ingredients of good quality | Well-formed requirements
Well-defined recipe | Well-defined and matured development process
Conforming to a specific cuisine (local or continental) | Conforming to guides and standards for a specific domain of application (country specific and international)
Proper tools/equipment and utensils | Development environment
Decomposition of the cooking process (chop/mix/grind, marinate, bake/boil/fry/stir-fry, etc.) | Requirements analysis, modeling and architectural design
Right choice of spices | Use of a suitable and safe subset of a programming language (e.g. MISRA-C/C++)
Affirmation of taste, texture and colour (physical verification—whether well done) | Analysis, verification and testing (white box and black box testing)
The development plan begins with a software project management plan (SPMP), which also states that software quality assurance (SQA) is a part of the plan. Thus, a software quality assurance plan (SQAP) has to be in place to ensure the quality of the software by verifying the procedural requirements mandated by the applicable standards at every phase of development. This includes the identification and review of plan documents such as the software verification and validation plan (SVVP) as part of the SQAP, among other SQA activities. Furthermore, IEC Std. 60880-2006 [121] mandates that verification is carried out at every phase of software development involving safety systems. The software configuration management plan (SCMP)2 is also identified as an essential plan document, which comes under the managerial review requirements of the SQAP.
3.1.1 Software Project Management Plan (SPMP)

The software project management plan is essential for large and medium software projects. It is advisable to have an SPMP even for a small software project, because, in its absence, the development team is likely to start without a clear vision, and clarity comes at a later date at the cost of errors, with more errors revealed as the development progresses. Furthermore, it is important that the development team is well aware of the steps involved in meeting the end goal, and in the process, it also learns about project management. The SPMP identifies the project deliverables, describes the organizational structure and explains the process model of the project. The process model (i) identifies the major project functions and activities and (ii) defines the relationships among them by specifying the timelines. These include deliverables and milestones, baselines and reviews at various stages of project execution. In addition, the identification and assessment of the risks involved in the execution of the project pertaining to not-so-well-known/well-understood areas, especially in the beginning phase, come under the ambit of the SPMP. The risk factors can be technological or managerial, including human resource management.

Software Project Management Plan (SPMP)
“The controlling document for managing a software project. A software project management plan defines the technical and managerial project functions, activities, and tasks necessary to satisfy the requirements of a software project, as defined in the project agreement”.
Source: IEEE Std. 1058.1-1987 [9]
2 Software configuration management (SCM) helps prevent unrecorded changes from finding their way into the software, which can lead to system failure or instability.
Examples of technological risk include (i) interfaces with new/custom hardware and third-party software, (ii) making a choice among the available software solutions (based on their performance and computational time), (iii) risk due to the complexity of the requirements and (iv) evolving software requirements in some cases. The risks that come under the managerial category include contractual risks (time of delivery and customer acceptance) and risks involved in the acquisition/allocation and retention of skilled human resources. The SPMP should define a software development life cycle approach and a documentation plan in order to achieve the end goal of the project.

SPMP in a Nutshell
– Identify and define milestones and deliverables along with the timelines.
– Identify and assess both the technological and managerial risks.
– Define a life cycle approach and provide clear guidelines on:
  – Software development life cycle documentation
  – Mechanisms to resolve issues related to requirements—ambiguity, completeness, constraints as well as evolving requirements
  – Coordination between the development team and the verification team
– Specify:
  – Plans for SQA, V&V and SCM
  – Applicable standards and regulatory requirements
  – The tools and techniques to be used in all the steps of development, which include requirements analysis, modeling, implementation (according to the programming guideline), code analysis and testing
To put it succinctly, the important technical aspects of the SPMP involve monitoring and controlling the processes/mechanisms of review, audit and information flow. The SPMP shall also define/specify the tools and techniques to be used in monitoring and controlling these activities and in ensuring adherence to the SPMP. The format and content of an SPMP can be found in IEEE Std. 1058-1998 [13], which states that the standard can be applied to software projects of any size—small, medium or large—and of any level of criticality or complexity.
3.1.2 Software Quality Assurance Plan (SQAP)

The software quality assurance plan (SQAP) [25] has many facets, which include (i) management; (ii) documentation; (iii) applicable standards, practices, conventions
and statistical techniques; (iv) software reviews; (v) tests; (vi) problem reporting; (vii) tools, techniques and methodologies; (viii) risk management; and (ix) record collection and retention. The SQAP depicts the organizational structure and defines the roles and responsibilities with the objective of controlling and monitoring the quality of the software. It describes the level of organizational freedom necessary to assure evaluation and monitoring of the quality of the software and also to verify problem resolutions.

SQAP
The SQAP identifies the:
(i) SQA tasks and activities as well as the roles and responsibilities to carry out the same
(ii) Documentation standards, design standards, programming guidelines/standards and testing standards and practices
(iii) Tools, techniques and methodologies to be used during the development life cycle
and helps in providing adequate confidence in the software quality by verifying that:
(i) The plans (SPMP, SVVP, SCMP) are described according to the standards
(ii) Procedures are carried out according to the plans
(iii) The software is implemented to meet the requirements of the standards
The identification of the documentation governing the software development along with its verification and validation (V&V), maintenance and use (e.g. the user manual) accounts for the major activities of SQA. IEEE Std. 730-2014 [25] identifies a minimum set of documents: (i) software requirements specification (SRS), (ii) software design description (SDD), (iii) verification and validation (V&V) plans, (iv) verification results report and validation results report, (v) software configuration management plan (SCMP) and (vi) user documentation (UD). In addition to the review, (i) the documents to be audited are also identified and (ii) the criteria by which the adequacy of reviews and audits is to be confirmed are specified in the SQAP.
3.1.3 Software Verification and Validation Plan (SVVP)

Let us first discuss the terms verification and validation. Depending on the context, verification can mean:
– Documenting the evidence by reviewing, auditing, testing and taking any other suitable measures in order to establish conformance with the specified requirements, which include the appropriateness of the software/product for its desired criticality level
– Providing formal proof of the correctness of the code developed or generated using formal techniques

Validation is the process of evaluating a system during the course of the development, or at the end of it, in order to determine whether the product (system/component or software) satisfies the requirements as specified. Verification and validation are essential to assure the quality of software. The two processes are interrelated and complement each other in ensuring software quality.

Verification and Validation
Verification: “The process of providing objective evidence that the system, software, or hardware and its associated products conform to requirements (e.g., for correctness, completeness, consistency, and accuracy) for all life cycle activities during each life cycle process (acquisition, supply, development, operation, and maintenance); satisfy standards, practices, and conventions during life cycle processes; and successfully complete each life cycle activity and satisfy all the criteria for initiating succeeding life cycle activities. Verification of interim work products is essential for proper understanding and assessment of the life-cycle phase product(s)”.
Validation: “The process of providing evidence that the system, software, or hardware and its associated products satisfy requirements allocated to it at the end of each life cycle activity, solve the right problem (e.g., correctly model physical laws, implement business rules, and use the proper system assumptions), and satisfy intended use and user needs”.
Source: IEEE Std. 1012-2016 [133]
The software verification and validation plan (SVVP) documents the tasks to be executed for the effective verification and validation of a software product. The SVVP shall document the V&V activities pertaining to all the phases—requirements analysis, design, implementation and testing (unit, integration, system and acceptance). It is necessary that every life cycle process is completed only after the development products are verified and validated in accordance with the V&V activities defined and specified in the SVVP. The standard IEEE Std. 1012-2016 [133] (i) suggests the required table of contents for producing an SVVP and (ii) recommends approaches to verification and validation (V&V) planning.
3.1.4 Software Configuration Management Plan

The software configuration management plan (SCMP) documents the software configuration management (SCM) activities to be carried out and specifies how these activities are to be executed.

Software Configuration Management (SCM)
“SCM is a formal engineering discipline that, as part of overall system configuration management, provides the methods and tools to identify and control the software throughout its development and use. SCM activities include the identification and establishment of baselines; the review, approval, and control of changes; the tracking and reporting of such changes; the audits and reviews of the evolving software product; the management of software release and delivery activities, and the control of interface documentation and project supplier SCM”.
Source: IEEE Std. 828-2005 [16]
SCM activity begins with the identification of the configuration items (CIs). CIs can be grouped into two categories:

(i) The outputs of the SDLC, such as requirements specifications, design documents, code (source as well as executable), databases, test plans, test cases, user documents, interface control documents (with hardware, system software and support software) and the plan documents
(ii) The support items, which include compilers, programming tools, operating systems and test beds

CI identification involves the naming convention to be used for the unique naming of each CI and its subsequent versions. The SCMP describes how CIs shall be stored, tracked and retrieved. The details of the process of CI identification and configuration control can be found in IEEE Std. 828-2012 [23].
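As an illustration of the unique-naming requirement, a project might adopt a convention such as project-type-name-vMAJOR.MINOR and enforce it mechanically. The convention and the checker below are hypothetical, not prescribed by IEEE Std. 828:

```python
import re

# Hypothetical CI naming convention: project-type-name-vMAJOR.MINOR
CI_NAME = re.compile(r"^(?P<project>[a-z0-9]+)-"
                     r"(?P<ci_type>srs|sdd|src|tst|doc)-"
                     r"(?P<name>[a-z0-9_]+)-"
                     r"v(?P<major>\d+)\.(?P<minor>\d+)$")

def parse_ci(name):
    """Validate a CI identifier against the convention and return its fields."""
    m = CI_NAME.match(name)
    if not m:
        raise ValueError(f"CI name violates naming convention: {name!r}")
    return m.groupdict()

ci = parse_ci("rps-src-trip_logic-v1.3")
print(ci["ci_type"], ci["major"], ci["minor"])  # → src 1 3
```

Enforcing the convention with a tool at check-in time prevents malformed CI names from ever entering the configuration baseline.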
3.1.5 Software Safety Plan

In the context of the development of safety-critical software, the software safety plan aims at meeting the software safety requirements throughout the software development life cycle and maintenance. It is to be noted that software safety is to be considered within the context of system safety, as required by IEEE Std. 1228-1994 [11]. Examples of software safety planning include the following:

(i) Determining specific outputs in the event of the failure of the system/software assigned to perform safety functions
(ii) Imposing restrictions on inter-operation with other systems
3 Software Development Process
In addition to generating output for an operator alert, it is necessary to have a planned/predefined system output if the system fails due to either hardware or software failure. For example, a reactor protection system must generate a "fail-safe" output, i.e. a predetermined set of outputs that will facilitate the dropping of shut-off rods to shut down the reactor and maintain it in the shutdown state. However, in avionics systems, it may not be feasible to generate a fail-safe output. Consider the example of a landing gear system. If its controller fails, the feasible option is to generate an alarm so that the operator can take the system under manual control, if permissible, and follow the standard operating procedure (SOP) in an emergency. This scenario can be called "fail-move-on" as against the "fail-safe" operation.
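The fail-safe behaviour described above can be sketched as follows; the output signals and the self-test interface are illustrative assumptions, not taken from any actual protection system.

```python
# Illustrative sketch of fail-safe output selection for a protection system.
# FAIL_SAFE_OUTPUT is a hypothetical predetermined output: de-energising the
# rod-holding circuits lets the shut-off rods drop, taking the reactor to a
# safe state.
FAIL_SAFE_OUTPUT = {"rod_hold_power": False, "alarm": True}

def select_output(self_test_ok: bool, computed_output: dict) -> dict:
    """Pass the computed output through only when the system is healthy;
    on any detected hardware/software failure, force the predetermined
    fail-safe output instead."""
    if not self_test_ok:
        return FAIL_SAFE_OUTPUT
    return computed_output

normal = {"rod_hold_power": True, "alarm": False}
print(select_output(True, normal))    # healthy: computed output passes through
print(select_output(False, normal))   # failure detected: fail-safe output forced
```

The essential design point is that the safe output is fixed at design time and requires no computation when a failure is detected.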
3.1.6 System Security Plan Computer-based systems (CBSs) used in safety applications should be protected against potential cyberattacks resulting in data theft, manipulation, falsification, etc. Such attacks can cause the unavailability of systems, which may lead to catastrophe. A system security plan specifies the procedural and technical measures to be incorporated in the design and operation of a CBS to (i) prevent, detect and respond to such attacks, (ii) mitigate their effects and (iii) recover from them. In general, a plant-specific system security plan is prepared, which defines the security measures for safety-critical CBSs, also referred to as critical digital assets (CDAs). Security planning for CDAs includes the following:
(i) Critical digital assets (CDAs) are identified and categorized based on their safety criticality.
(ii) Vulnerability analysis is carried out to identify the potential threats, vulnerabilities and risks to the CDAs.
(iii) Based on the results of the vulnerability analysis, security requirements are defined for maintaining the confidentiality, integrity and availability of CDAs.
(iv) Security controls are defined to meet the security requirements. Examples of security controls include the following:
(a) Access control to allow only authenticated persons to access the hardware/software of CDAs.
(b) Well-defined security policies and procedures for the development and operation of CDAs.
(c) A configuration management process for implementing any change in CDAs.
(d) Periodic audit of access to the CDAs.
(e) Communication with only pre-identified systems using secure and authenticated communication paths.
(f) Removal of unused hardware or software interfaces.
3.1 Development Plan
(v) Security policies and procedures are specified and enforced during the development of the system and its operation.
(vi) In the case of loss of the system due to a security-related incident, countermeasures, including recovery procedures, are specified.
The security of computer-based I&C systems performing safety-critical functions is a wide and challenging topic which deserves more attention—preferably in a separate book. Since security is not the main focus of this book, the topic is only introduced briefly.
3.1.7 SDLC Approaches/Models The software development life cycle (SDLC) approach began with the classic waterfall model, and thereafter, various software development life cycle models have been defined for the development of real-world software. Every development process model outlines a series of steps to be followed towards the successful development of software. Since there is no dearth of books on software engineering offering in-depth descriptions of various SDLC models, we will only introduce the well-known and popular models in this section. However, we will focus more on the V-model, as this is the model recommended by IEC Std. 60880-2006 [121] for the development of safety-critical software in nuclear power plants. The V-model, also known as the verification and validation model, is found to be the most suitable for the development of software performing safety functions. The most important and prevalent SDLC models in the software industry are the following.
3.1.7.1 Waterfall Model
This is the earliest SDLC approach that was adopted for software development. In the waterfall model, also known as the linear sequential model, the phases are executed in sequence as shown in Fig. 3.1. In the waterfall approach, each phase is executed once, and the next phase begins only after the completion of the previous phase. However, as can be observed from Fig. 3.1, the model allows the iteration of part of a phase, which may be necessary to correct any error discovered in the next phase. The waterfall model offers a simple approach, but it assumes that all the requirements are explicitly stated in the requirements phase, which is often not the case in real-world software projects. Initially, some level of vagueness about the requirements exists in the mind of the user, which gets dispelled slowly as the development progresses, often as a result of interaction with the software development team. Any modification of the requirements at a later phase caused by this initial lack of clarity can be costly to address in the waterfall model.
Fig. 3.1 The waterfall model
3.1.7.2 Incremental Delivery Model
The incremental delivery model supports the management of a large software project where it may not be practical to develop the complete software with full functionality in one go. This can happen if (i) early delivery of usable software with essential features becomes a necessity or (ii) the size of the development team is small, which calls for the subdivision of the project into smaller, manageable deliveries. This model can be considered an extension of the waterfall model that offers flexibility from the design phase onwards. It facilitates the development of software in repetitive cycles with incremental delivery—the initial version with essential features only, then returning to the design phase for the next delivery with additional features, as shown in Fig. 3.2. The process is repeated until all the user's requirements are met and the software/system is ready to be deployed. This model is also called the iterative and incremental delivery model because the approach is based on the idea of developing software iteratively (in repeated cycles) by enhancing its features/functions with each incremental delivery. At each iteration, the design is modified to add new functional capabilities. The success of this approach depends on the clear definition of (i) a useful subset that can be developed incrementally and (ii) interfaces for adding functions in every iteration to facilitate smooth implementation. Note that this exercise has to be done during the analysis and design phase.
Fig. 3.2 The incremental and iterative model
However, the main drawbacks of this approach include the following: (i) it may be hard to add features later if provision for them is not made during the design phase, and (ii) it may lead to an increased amount of regression testing at every incremental delivery to confirm that the existing functions and capabilities are not affected or impaired.
3.1.7.3 Spiral Model
The spiral model attempts to derive the best of the iterative and waterfall models. The idea of iterative development supporting incremental delivery is used for risk
Fig. 3.3 The spiral model
analysis based on the user's evaluation at every iteration. At the same time, the waterfall model is used for every incremental development for its systematic and tightly controlled development approach. The spiral model has four phases—(i) planning, (ii) risk analysis, (iii) engineering (design, implementation and testing) and (iv) user evaluation. A software project repeatedly passes through these phases in iterations, leading to a spiral path of development as shown in Fig. 3.3. This model facilitates the early involvement of the user in the development process. This helps in (i) risk analysis, which covers technical feasibility as well as management risks like slipping timelines and cost overruns, (ii) accommodating changed and evolving user requirements and (iii) taking up risky parts of the software early, which improves risk management. However, the complexity of this model makes it difficult to manage. Further, it requires excessive documentation, as the spiral grows larger with a large number of intermediate releases, and care is required to keep the growth of the spiral within feasible limits.
3.1.7.4 V-Model
The V-model is widely used in the development of safety-critical software, and its name is derived from the execution sequence of the process, which forms a V-shape. This model is also referred to as the verification and validation model. The V-model is essentially an avatar of the waterfall model where each development stage is associated with a corresponding verification phase as shown in Fig. 3.4. In other words, verification/testing is performed at every phase of the
Fig. 3.4 The V-model of software development
development life cycle. As in the waterfall model, each phase in the V-model begins only after the completion of its previous phase. This makes the model highly disciplined and suitable for the development of safety systems, as suggested in IEC 60880 [121]. It can be observed from Fig. 3.4 that the process of verification starts at the beginning of the project execution with the preparation of the acceptance test plan (ATP) during the user requirements definition phase. The ATP is used to carry out the acceptance test, which is essential for the acceptance of the system by the user. The software requirements analysis phase and the system validation plan (SVP) go hand in hand in facilitating the system validation test at a later stage. The software design phase has two distinct sub-phases—architectural design and detailed design. During software architectural design (SAD), various software components3 are identified along with their interfaces. Therefore, during this phase, the software integration test plan (SITP) is prepared, which is required to carry out the software integration test. The software detailed design decomposes the software requirements and assigns them to small, manageable modules or units. All the modules are implemented and tested in accordance with the software unit test plan (SUTP) prepared in this phase. The unit test can be considered the leaf-level validation following implementation. 3 Components, in UML (Unified Modeling Language) parlance, are the run-time artefacts, viz. the executables, libraries and databases.
3.1.7.5 Other Models
The other related methodologies are the Agile development model, RAD (rapid application development) and prototyping. At the heart of all these approaches lies the iterative and incremental process model. Prototyping refers to building a software application which presents the functionality of the software to the user in order to get feedback and understand the user's requirements accurately. Prototyping is also used to test design ideas and their feasibility, which is useful in reducing risk through practical experience. A prototype is working software with limited functionality and may not always implement the actual software logic/algorithm as stated in the software requirements specification. The Agile SDLC model, as the name suggests, focuses on adaptability, and it does not prescribe the rigorous documentation that conventional models require to enable software quality assurance. This model sets guidelines to rapidly develop and deliver the immediately required software in close interaction with the user. The Agile model supports the delivery of software products in a series of increments while providing scope for iterations before each delivery. Thus, it provides customer satisfaction through quick delivery of a working product. The period of each iteration is designed to be small—a few weeks only—and usually does not cross a month. Teams involved in various areas—planning, requirements analysis, design, coding, unit testing and acceptance testing—work in tandem until a working product is developed and delivered to the user/customer. The Agile development model is popular and used extensively in various software industries. However, it is not yet considered fit for wider application in the domain of software for safety-critical systems. The main reason behind this is the emphasis on working software over comprehensive documentation.
Documentary evidence of following a well-defined development process, along with traceability from every phase to its previous phase and finally to the user requirements, is among the most important elements essential to build a safety case for the certification/licensing of software. Traditionally, the Agile methodology is not considered suitable for the development of safety-critical systems, and some researchers [19, 24] have even declared it incompatible with the safety-critical domain. However, recent extensions [60, 110] of Agile methods have generated renewed interest. Considering the main focus of this book, the Agile methodology is not addressed further.
3.2 UML in Software Development

Modeling is an essential tool in software development and is applied in every phase of it, which includes the development of (i) a logical model of the requirements, (ii) a software architectural/structural model and (iii) a behavioural model.
UML (Unified Modeling Language) is an object-oriented analysis technique which is the most widely used, rather ubiquitous, in the software industry. Plenty of books are available on UML. However, in order to make this book self-contained, UML modeling diagrams are introduced in this section. Subsequently, the artefacts corresponding to various modeling diagrams, along with their graphical notations, are presented under the relevant sections, where the various phases of the software development life cycle processes are introduced. UML was originally designed by Jim Rumbaugh, Grady Booch and Ivar Jacobson at Rational Corporation by merging their ideas, viz. the object-oriented analysis and design (OOAD), aka the Booch method, the object-modeling technique (OMT) of Rumbaugh and the object-oriented software engineering (OOSE) of Jacobson. In 1997, UML was adopted as a standard for object-oriented analysis and design (OOAD) by the Object Management Group (OMG). In 2005, the International Organization for Standardization established UML as an ISO standard, which is used in many industries for designing object-oriented models. UML is a general-purpose graphical modeling language, which specifies a standard approach to using the best practices of software engineering. It is used throughout the software life cycle to specify, construct and document the software artefacts involved in the development. The following nine modeling diagrams constitute the core of UML:
(i) Use case diagram
(ii) Class diagram
(iii) Object diagram
(iv) Statechart
(v) Activity diagram
(vi) Sequence diagram
(vii) Collaboration diagram
(viii) Component diagram
(ix) Deployment diagram
As discussed, the artefacts and the graphical notations pertaining to the above modeling diagrams will be presented in the upcoming sections as and when they become applicable. In addition, a case study at the end of this chapter will demonstrate how to make use of UML modeling in a real-world software development project. UML CASE Tools A number of commercial as well as open-source and/or free CASE tools for UML modeling are available. Examples of commercial tools include IBM Engineering Systems Design Rhapsody and Borland Together. Though plenty of open-source tools are available, the free versions often come with limited features. The freely available tools include ArgoUML, which is used to create some of the UML models presented in this book. Rhapsody is used for the case study presented at the end of this chapter.
– Rhapsody: Rhapsody is a commercially available modeling environment supporting UML. It provides a graphical environment for systems engineers and software developers creating real-time or embedded systems and software.
– ArgoUML: ArgoUML is an open-source UML modeling tool, which can be used for learning. However, one of the drawbacks of this tool is that it does not have a report generation facility.
3.3 Capturing Requirements: The Requirements Definition Phase Software development begins with the "problem definition". The purpose of this phase is to refine the user's idea about a requirement and define it as a task expected to be performed by a computer-based system. The outcome of interactions between the user and the developer through discussions, studies and prototyping is often utilized in formulating user requirements. The information generated during the user requirements definition phase is used for assessing the feasibility and planning the basic approach to project execution. A user often does not view a computer-based system in terms of hardware, software or firmware. It is viewed as a single entity, which must meet the user requirements. However, a software engineer is concerned with the subset of user requirements which are allocated to software. Therefore, following a feasibility study, a user requirements document (URD) is generated. This is followed by system requirements analysis to generate a system requirements specification (SyRS) from the user requirements. Then the system architectural design produces a system architectural design document, which partitions the system requirements into two—one set of requirements is allocated to hardware and the other is allocated to software. Readers may refer back to Fig. 1.1 to locate the software requirements analysis/specification phase in context with the steps involved in the development of a CBS. The next activity is to analyse the statements of the user requirements and specify a set of software requirements, which should be unambiguous, correct, consistent and as complete as achievable.
3.3.1 System Context The place of the software in the real-world system must be determined before defining the user requirements. In other words, the system context or the understanding of the operating environment must be clear before stepping in to define the user requirements. Interfaces to the environment, e.g. user interfaces like keys/buttons, display of equipment status as well as process parameters, and communication
interface with surveillance or an external health monitoring system should be specified at the beginning of the project. While the details of the interfaces can be developed throughout the subsequent phases—software requirements analysis and design—the type of interfaces must be identified. This is usually done with the help of a context diagram consisting of the system block diagram and its interfaces with the external systems. An example of a context diagram can be found in Sect. 3.9.2. User Requirements and System Engineering
– Feasibility study and identification of system context – User requirements definition – System requirements specification (SyRS) accounting for all the user requirements – System architectural design (SyAD) partitioning the system requirements to hardware and software and the identification of the subset of system requirements allocated to software
3.3.2 User Requirements Definition Phase The user requirements can be broadly categorized into two—(i) capability requirements and (ii) constraint requirements. Capability requirements include functional requirements as well as performance requirements, which a system must meet so that the desired system objective (as envisaged and expected by the user) can be achieved. There can be other requirements like security requirements, which can be clubbed under capability requirements. The simplest example of a security requirement can be the password-based authentication of users. Users can also impose some constraints on solutions to the problem by limiting the designer’s choice on how some of the requirements are to be met. For example, users may ask for the use of a specific set of hardware (which may be existing already), dictate the use or non-use of the operating system or enforce the use of some specific communication protocol, say, MIL-STD-1553-B [172].
3.3.2.1 User Requirement Attributes
The user requirement must be well formed as defined in IEEE/ISO/IEC Std. 29148 [132], which states that it shall be possible to (i) validate the requirements statement aimed at solving a user (customer) problem or achieving a user (customer) objective and (ii) qualify it by measurable conditions (monitoring physical parameters) within the bounds of the constraints, if any.
Let us consider the following user requirement: On reactor pressure reaching 170 bar, drive down the safety/control rods with a constant speed of 2 mm/sec until the pressure goes below the stated limit.
Here, (i) the capability part of the requirement is driving down the shut-off rods, (ii) the conditions are to start driving down when the pressure goes ≥ 170 bar and to continue till the pressure is < 170 bar and (iii) the constraint is that the speed is to be maintained at 2 mm/sec. Well-Formed Requirement A well-formed requirement is a statement that (i) can be verified, (ii) has to be met or possessed by a system to solve a stakeholder problem or to achieve a stakeholder objective, (iii) is qualified by measurable conditions and bounded by constraints and (iv) defines the performance of the system when used by a specific stakeholder or the corresponding capability of the system, but not a capability of the user, operator or other stakeholder. Source: IEEE/ISO/IEC Std. 29148 [132]
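As a sketch, the requirement analysed above can be restated as executable logic; the function name and the tuple form of the output are illustrative, while the threshold and speed come from the example itself.

```python
# Illustrative restatement of the example requirement:
# start driving the rods down when pressure >= 170 bar (condition),
# continue until pressure < 170 bar, at a constant 2 mm/sec (constraint).
PRESSURE_LIMIT_BAR = 170.0
ROD_SPEED_MM_PER_SEC = 2.0

def rod_drive_command(pressure_bar: float) -> tuple:
    """Return (drive_down, speed_mm_per_sec) for the current pressure."""
    drive = pressure_bar >= PRESSURE_LIMIT_BAR
    return drive, (ROD_SPEED_MM_PER_SEC if drive else 0.0)

print(rod_drive_command(171.0))  # (True, 2.0)  -> rods driven down
print(rod_drive_command(169.5))  # (False, 0.0) -> rods held
```

Writing the requirement this way makes each part of the analysis (capability, condition, constraint) directly visible and testable.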
The essential user requirement attributes are the following.
Identification Every user requirement shall have an identifier so that it can be traced through the subsequent phases of development.
Unambiguity User requirements must be clear and unambiguous such that they can have only one interpretation.
Importance The importance of each requirement, especially the essential and non-negotiable requirements, must be clearly specified.
Verifiability Verifiability is one of the most important attributes of user requirements. It must be possible to (i) ascertain that the requirement will be implemented in software, (ii) verify that the requirement finds its place in the design and (iii) ensure by testing that the software implements the requirement.
Priority If the incremental delivery approach is followed, then each user requirement must have a priority attached to it so that incremental delivery can be scheduled accordingly.
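The attributes above can be captured as a simple record; the field names and the example values below are illustrative assumptions, not a prescribed schema.

```python
# Illustrative record of a user requirement carrying the essential attributes.
from dataclasses import dataclass

@dataclass
class UserRequirement:
    identifier: str   # unique ID for traceability through later phases
    text: str         # the unambiguous requirement statement
    importance: str   # e.g. "essential" or "negotiable"
    verifiable: bool  # can it be confirmed by review/analysis/test?
    priority: int     # used to schedule incremental deliveries

ur = UserRequirement(
    "UR-017",
    "Drive down shut-off rods at 2 mm/sec when pressure >= 170 bar",
    importance="essential",
    verifiable=True,
    priority=1,
)
print(ur.identifier)  # UR-017
```

Keeping attributes in a structured record like this is what makes mechanical checks (completeness of IDs, presence of priorities, etc.) possible later in the life cycle.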
3.3.3 Impact on Plan Documents The activity of defining user requirements has a role to play in providing inputs to the planning and the plan documents [168], which include the following.
Acceptance Test Plan The preparation of a user acceptance test plan (ATP) starts in parallel with the user requirements definition phase itself. This is because it is the acceptance test results that demonstrate whether the software meets the user requirements or not. SQAP The user may ask for certain quality monitoring procedures conforming to the application domain-specific standard as well as organization-level SQAP requirements. SVVP The user requirements document can potentially provide input to the procedure for review and traceability. For example, a user from the aerospace industry would ask for the verification of DO-178C objectives.
3.3.4 Software Requirements Analysis After studying the software section of SyRD, the software requirements are analysed and documented generating a software requirements specification (SRS). Before the construction of the software requirements, an implementation-independent logical model is built, which provides an abstract description of what a system is expected to do. It is to be noted that the logical model must be implementation independent, i.e. it will only describe what to do and not how to.
3.3.4.1 Logical Model Construction
In order to arrive at a logical model, the following techniques are adhered to. Functional Decomposition First, the key functional requirements are identified. This can be done by decomposing the higher-level functional requirements into smaller lower-level requirements. In real-world projects, it is not surprising to find lower-level requirements scattered around the user requirements document (URD), which can be grouped and related to their higher-level requirements. Once the key functional requirements are identified and grouped into a number of domains attached to specific system functions, a domain model is constructed. A domain model is a graphical depiction of the various domains at the highest level of requirements abstraction and the dependencies among them. Even a simple application, for example, can have two domains at the highest level of abstraction, viz. the application domain and the user interface domain. The requirements of each domain are decomposed into smaller groups of requirements as use cases. The functional decomposition of requirements can be modeled as an object diagram of packages (domain model) and use case diagram(s) of each domain using the Unified Modeling Language (UML) [184]. We will discuss this topic further in Sect. 3.9, presenting the case study.
Performance Analysis The user may specify some performance attributes in the URD, like accuracy, response time, etc. These performance attributes will be attached to specific functions or groups of functions. It is necessary to analyse the logical model so that no conflict in performance requirements arises. For example, the rate at which the input data are acquired and processed should match the performance requirement of the function that is responsible for sending the processed data through communication at a given frequency. However, during the software requirements analysis phase, how a performance attribute (e.g. response time) will be apportioned between the functions—acquiring inputs and doing the processing (e.g. algorithm or logic processing)—shall be left to the designer and taken up during the design phase.
Criticality Analysis The criticality level of software usually depends on the safety category of the system. However, all the functional requirements of a safety-critical system may not belong to the safety-critical category, and some may not be important to safety. For example, in a nuclear reactor, the protection system is responsible for taking the reactor to a safe state and maintaining it there in the event of any off-normal condition of over-power or over-pressure, which is a safety-critical function. But communication with the operator information system may not be a safety-critical function. The URD may specify the availability requirements of some system capabilities depending on their criticality level. The logical model should be analysed to propagate the criticality level to all the functions that together achieve the system availability requirement.
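The rate-matching consideration discussed under performance analysis can be checked mechanically; the function name and the numeric rates below are illustrative assumptions.

```python
# Illustrative consistency check between acquisition, processing and
# transmission rates in a data pipeline, as discussed under performance
# analysis. The figures are made up for the example.
def rates_consistent(acquisition_hz: float,
                     processing_time_s: float,
                     transmit_hz: float) -> bool:
    """The pipeline is consistent only if inputs are acquired at least as
    fast as they must be transmitted, and each sample can be processed
    within one acquisition period."""
    return (acquisition_hz >= transmit_hz
            and processing_time_s <= 1.0 / acquisition_hz)

print(rates_consistent(acquisition_hz=100.0,
                       processing_time_s=0.005,
                       transmit_hz=50.0))   # True: 100 Hz in, 50 Hz out
print(rates_consistent(acquisition_hz=20.0,
                       processing_time_s=0.005,
                       transmit_hz=50.0))   # False: cannot feed a 50 Hz output
```

Such checks belong to the analysis of the logical model; the actual apportioning of response time between functions is still left to the design phase, as noted above.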
3.3.5 Software Requirements Specification (SRS) The first step in specifying software requirements involves its classification into a number of categories [168]. It is also necessary that every software requirement includes some specific attributes so that it can be qualified and verified. The reader may refer to the user requirement attributes (discussed in Sect. 3.3.2.1), which are also applicable to software requirements. Furthermore, software requirements specification (SRS) as a whole shall possess two important features, viz. completeness and consistency, which will be explained later in this section.
3.3.5.1 Classification of Software Requirements
The logical model of software requirements is examined and the software requirements are classified [12, 168] in terms of:
1. Functional requirements
2. Interface requirements
(a) Hardware interface requirements
(b) Software interface requirements
(c) User interface requirements
(d) Communication interface requirements
3. Performance requirements
(a) Response time requirements
(b) Accuracy requirements
4. Design constraints (requirements)
5. Operational requirements
6. Safety requirements
7. Resource requirements
8. Verification requirements
9. Acceptance testing requirements
10. Documentation requirements
11. Security requirements
12. Portability requirements
13. Quality requirements
14. Maintainability requirements
Software requirements should be specified rigorously, and therefore, the use of semi-formal languages and/or formal specification, where feasible, is desirable. In order to facilitate verifiability, software requirements should be stated quantitatively wherever feasible.
3.3.5.2 Attributes of Software Requirements
The attributes of software requirements are no different from those of user requirements. A software requirement should include the following attributes.
Identifier: An identifier is essential to trace a software requirement through the subsequent phases of development.
Clarity: A software requirement must be clear and unambiguous and have only one interpretation.
Importance: The importance of each requirement, especially the essential and non-negotiable requirements, must be clearly specified.
Verifiability: It must be possible to (i) verify that the requirement finds its place in the design and (ii) ensure by testing that the software implements the requirement.
Priority: This becomes important if the incremental delivery approach is followed; each requirement must then have a priority attached to it so that incremental delivery can be scheduled accordingly.
Stability: It is expected that software requirements should be stable throughout the software development life cycle. Therefore, if some requirements are subject to change based on feedback from the design phase, then such unstable requirements shall be flagged.
3.3.6 Completeness of Software Requirements The completeness of software requirements dictates that every user requirement has been accounted for and nothing is left out or overlooked in the SRS. In order to achieve this, a traceability matrix is generated in SRS, where every user requirement (allocated to the software) can be traced from the software requirements identified in the traceability matrix. Also, each software requirement should be sufficiently described to meet the user’s need without any further amplification. For example, the requirement—generation of alarm for parameter x going above a threshold (set point) h—may not be complete unless a hysteresis (.δh) is specified that the alarm shall be cleared only if .X ≤ h − δh.
3.3.7 Consistency The set of requirements specified as software requirements is consistent if (i) no requirement is in conflict with any other requirements and (ii) the same term is used for the same item in all the requirements. In addition to logical conflict, inconsistency can also arise either if different terms are used for the same thing or if the same term is used for different things. Therefore, it is important that consistency checking of requirements specifications is carried out during the verification phase. In this context, it may be noted that manual verification of the consistency of text-based requirements specification is tedious and error-prone. How formal specification can help resolve this issue is discussed in Sect. 6.3.1.
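A crude automated aid for the terminology part of consistency checking might look like the following sketch; the glossary mapping synonyms to a canonical term is a hypothetical project artefact, and a real tool would need far more sophisticated text analysis.

```python
# Illustrative terminology-consistency check over plain-text requirements.
# GLOSSARY maps forbidden synonyms to the single canonical term chosen for
# the project (hypothetical entries).
GLOSSARY = {"shut-off rod": "safety rod", "SOR": "safety rod"}

def term_inconsistencies(requirements: dict) -> list:
    """Flag requirements that use a non-canonical term for a known item."""
    findings = []
    for req_id, text in requirements.items():
        for synonym, canonical in GLOSSARY.items():
            if synonym in text:
                findings.append(
                    f"{req_id}: uses '{synonym}', canonical term is '{canonical}'")
    return findings

reqs = {
    "SR-01": "Drop the safety rod on over-pressure.",
    "SR-02": "Interlock the shut-off rod drive during startup.",
}
for finding in term_inconsistencies(reqs):
    print(finding)  # SR-02: uses 'shut-off rod', canonical term is 'safety rod'
```

Even such a simple check removes one class of inconsistency (different terms for the same item) from the tedious manual review; logical conflicts between requirements, however, need the formal techniques discussed in Sect. 6.3.1.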
3.3.8 Impact on Plan Documents The plan documents, which will derive their inputs from the requirements defined in the software requirements specification, include the following. System Validation Test Plan (SVTP) The system validation test plan will outline the approach towards meeting the software requirements. Software Verification and Validation Plan (SVVP) The traceability of the software architecture and the detailed design of the software requirements will be governed by the SRS (software requirements specification).
3.3.9 Tools and Techniques

3.3.9.1 Use Case Diagram
A use case diagram is used to represent the system context; it shows the general cases of the system's interaction with external entities or objects. A use case diagram can be imagined as a black box where only the input, output and functionality of the black box are known. The main purposes of creating a use case diagram in the requirements analysis phase are to:
(i) Capture the use cases, i.e. the expectations of the users of the system, which can be functionally grouped together in one or more groups
(ii) Identify the participating actors
(iii) Identify the events and the messages for describing the interaction between actors and the system
A use case is a coherent piece of system functionality, which interacts with the external objects. A use case is strictly behavioural and is used to capture system requirements. It describes only what to do (not how to do it) when the system interacts with the external objects called actors. An actor can be a human user, an external system or a device. An actor interacts with the use case by exchanging messages. Use case diagrams are helpful in the following aspects of software development.
(i) Communication with system users: A use case diagram helps developers communicate with system users and understand their expectations.
(ii) Identification of derived requirements: Use cases also help identify new requirements as the requirements analysis progresses.
(iii) Generation of test cases: The set of scenarios for a use case helps in designing test cases for the verification of the corresponding functional requirements.
In a use case diagram, a rectangle represents the system boundary, oval shapes represent use cases, and human-like stick figures represent actors. Links between use cases and actors represent association. Important graphical notations pertaining to the UML use case are given in Table 3.2. A use case diagram for a typical embedded system is shown in Fig. 3.5.
The system processing (Sys Proc) use case represents the execution of control logic functions. The figure shows that the Sys Proc interacts with two actors—Field Inputs and Field Outputs. Also, the Sys Proc use case includes other use cases, viz. User Interface and Operator Comm, to facilitate interaction with the system user and operator information system, respectively.
3.3.9.2 Sequence Diagram
A scenario is an instance of a use case. Multiple scenarios are possible for any given use case of a system. In other words, a scenario is a specific instance of a use
Table 3.2 Artefacts of a use case diagram
case, which provides a mechanism for validating the software requirements against the user's expectations. A UML sequence diagram is used to depict a scenario with the help of objects, actors and an ordered list of messages exchanged among them. A sequence diagram shows how the participants interact with each other in a particular scenario with the help of the following graphical notations.
(i) Participants of a sequence diagram are represented either as rectangles (if the participant is an object) or as stick figures (if it is an actor).
(ii) A vertical dashed line below each participant represents its lifeline—the progression of time from top to bottom.
(iii) A rectangle on the lifeline represents the execution of a function performed by the participant.
(iv) A labeled arrow with a solid arrowhead represents a synchronous message.
(v) An arrow with an open arrowhead represents an asynchronous message.
(vi) A dashed arrow with an open arrowhead represents a return message sent in reply to a calling participant.
Important graphical notations pertaining to the UML sequence diagram are shown in Table 3.3.
Fig. 3.5 Use case diagram of a typical embedded system

Table 3.3 Artefacts of a sequence diagram
3.4 Architectural and Detailed Design

The next step following the development of the logical model of the software requirements is to define the software architecture. The SRS is examined, and a physical model (architecture) of the software is constructed. The physical model provides a framework supporting the development of lower-level components during the detailed design phase, which can be carried out independently once the physical model is constructed. The architecture of the software is built by assigning functions to software components, identifying the interfaces—including networked communication interfaces among these components—and defining the control (including hierarchy in control) and the data flow. This is followed by the detailed design of the software to facilitate the realization of the software components and their interfaces, coding and testing. Some organizations like ESA [168] prefer to produce separate documents for the software architectural design (SAD) and the software detailed design; together, the two documents constitute the software design description (SDD). Alternatively, a single SDD document can include the architectural design as well. Usually, in a new project, more than one architectural design is proposed and reviewed for design modularity and robustness in terms of maintainability, leading to the selection of the best architecture. As already pointed out, following the architectural design, i.e. the construction of the physical model, all the internal design details of the components and the modules are clearly and minutely defined, producing the SDD.
3.4.1 Input to Plan Documents

The plan documents, which will be governed by the architectural and the detailed design (carried out in this phase), include the following.

Integration Test Plan (ITP) The integration test plan shall be defined during this phase, because the integration test depends on the architectural design of both the system and the software.

Verification and Validation Plan The verification pertaining to the software requirements assigned to individual modules, and the plan for its validation, shall be defined in this phase.

Acceptance Test Plan (ATP) The design, the generation of test cases and the test procedures for the acceptance test shall be specified in this phase, because the acceptance test specification (ATS), which is a part of the ATP, can be derived from the detailed design only.
3.4.2 Tools and Techniques

Architectural Design The component and deployment models are useful for software architectural design. Architectural design addresses the following aspects of a CBS:
(i) Mapping software packages to processors, inter-processor communication via bus and protocol selection (component model)
(ii) Identifying the tasks and their interactions (concurrency model)
(iii) Mapping of components to the physical hardware (deployment model)
Thus, for architectural design, a component diagram and a deployment diagram are used.

Detailed Design Class diagrams and object diagrams are widely used UML artefacts for the design of structural views, whereas activity diagrams and statecharts are used when designing behavioural views.
3.4.2.1 Component Model
A component in UML is a software artefact that exists at run-time. The component model does the job of packaging the generated object artefacts that exist at run-time into executables, libraries and databases. A component may call other components through interfaces. A component may have multiple instances, which can be deployed on multiple processors. For example, instances of a communication component may exist on all processors in a distributed system. A component is a standalone replaceable entity in the system, and a revision of it can be substituted without recompiling other components as long as its interfaces are intact. Therefore, the component modeling of a system facilitates easy updating, maintenance and reuse. Graphical notations pertaining to the component diagram are shown in Table 3.4.

Table 3.4 Artefacts of a component diagram
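The replaceability property described above rests on a stable interface. As an illustrative sketch in C (the counter component is hypothetical), clients depend only on an opaque handle and a set of functions, so the implementation behind them can be revised without recompiling the clients:

```c
#include <stdlib.h>

/* --- public interface (would live in counter.h) --- */
typedef struct counter counter_t;   /* opaque: the layout is hidden from clients */
counter_t *counter_create(void);
void       counter_increment(counter_t *c);
int        counter_value(const counter_t *c);
void       counter_destroy(counter_t *c);

/* --- one possible implementation (counter.c, freely replaceable) --- */
struct counter { int value; };

counter_t *counter_create(void)        { return calloc(1, sizeof(counter_t)); }
void counter_increment(counter_t *c)   { c->value++; }
int  counter_value(const counter_t *c) { return c->value; }
void counter_destroy(counter_t *c)     { free(c); }
```

A new revision of counter.c can change the struct layout or algorithm; clients relink but need not recompile, which is the property the component model relies on.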
3.4.2.2 Deployment Model
Components are run-time entities, and a deployment model specifies where these components execute and how they interact. This is where the tyre (software) meets the road (hardware). The design decisions pertaining to the deployment of software components are made in accordance with the number of available devices, especially the processors and the physical communication media. A deployment diagram represents the physical architecture with its interconnected nodes. Nodes represent processors, sensors, actuators, input devices, displays or any other physical objects. A connection represents a physical interconnection between nodes. In a deployment diagram, components that run on the same processing unit are drawn inside a single cuboid (node). An example of a deployment diagram is presented in the case study.
3.4.2.3 Class Diagrams
A class diagram presents the structure of a system in UML from the perspective of the developer by showing its classes and the relationships among them. A UML class is represented by a rectangle, which is divided into three parts: the class name, its attributes (the data encapsulated within a class) and its operations. An operation has a name and implements the behaviour of a class; the implementation of an operation is called a method. A connecting link represents the relationship between two classes. The following types of relationships among classes are defined for a class diagram.
Dependency: Dependency between two classes indicates that a change in either the structure or the behaviour of one class affects the other class. It is a unidirectional relationship.
Association: An association between two classes exists if an instance of one class must be aware of an instance of the other in order to perform its task. An association is represented by a link connecting two classes, which by default denotes a bidirectional association. An association can also be unidirectional, which is denoted by an arrow; the direction of the arrow shows the direction of a query. Associations with no navigability arrows are bidirectional. The multiplicity of an association shows the number of objects that may participate in a particular relationship, such as one to one, one to many, many to one and many to many. The multiplicity at one end of an association is the number of possible instances of the class that may be associated with a single instance of the other associated class. The symbolic representations of multiplicities and their meanings are presented in Table 3.5.
Aggregation: Aggregation is a special type of association that implies a has-a or whole-part relationship between the aggregate (whole) and its parts or components; one class belongs to the aggregate class.
Aggregation is a weak association, where a child (component) can exist
Table 3.5 Multiplicity of association

Multiplicities   Meaning
0..1             Zero or one instance
n..m             n to m instances
0..* or *        Zero or more instances
1                One and only one instance
1..*             One or more instances
Table 3.6 Artefacts of a class diagram
independent of the parent. The UML symbol for an aggregation relationship is a line with a hollow diamond at the end attached to the aggregate class.
Composition: Composition is a special case of aggregation, where a child cannot exist independently of the parent. It is known as a part-of relationship.
Generalization: Generalization indicates the superclass-subclass (base class-derived class) relationship between classes. The derived class inherits all the properties of the base class. A generalization is represented by a triangle pointing to the base class.
Important graphical notations pertaining to the UML class diagram are shown in Table 3.6.
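Although these relationships are usually illustrated with object-oriented languages, the ownership distinction can be sketched even in C. In this hypothetical example, the embedded member mirrors composition (the part cannot outlive the whole), while the pointer member mirrors aggregation (the part is owned elsewhere and exists independently):

```c
/* Composition vs aggregation, mapped onto C ownership.
 * The point_t and circle_t types are purely illustrative. */
typedef struct { double x, y; } point_t;

typedef struct {
    point_t  centre;     /* composition: embedded by value, dies with the circle */
    point_t *label_pos;  /* aggregation: refers to a point owned elsewhere */
} circle_t;
```

When a circle_t is destroyed, its centre goes with it, but the point referenced by label_pos lives on — the same lifetime rule the hollow versus filled diamond notation expresses.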
3.4.2.4 Object Diagram
An object is an instance of a class, which contains data specific to the particular object, and its behaviour is defined by the class. Similarly, an object diagram represents an instance of a class diagram, which helps in visualizing the particular
functionality of a system. An object diagram depicts the nature and structure of the system at an instant in time to provide a clearer view of the relationships among objects.
3.4.2.5 Activity Diagrams
An activity diagram, which is essentially a flowchart, specifies the flow of control among the activities of a process. Activities are connected by transitions, which are labeled with guard expressions (inside [ ]). A transition may lead to two or more mutually exclusive branches based on a decision, which is depicted as a hollow diamond before branching. An activity diagram can be used for modeling multitasking environments using fork and join transitions. A fork transition leads to two or more parallel activities (threads), which are shown using parallel swimlanes for the individual threads. Multiple threads are merged by a join transition. Important graphical notations pertaining to the UML activity diagram are presented in Table 3.7.
3.4.2.6 Statechart
A statechart models the dynamic behaviour of a system, which is represented by the states that an object passes through in response to various events. Objects have behaviours and states. A statechart of an object (an object with a statechart is known as an active object) shows its possible states and the transitions that cause changes of state. While a finite-state machine (FSM) is flat, a statechart supports hierarchical states and concurrency as well. The following are the basic artefacts used in a statechart.

Table 3.7 Artefacts of an activity diagram
State: The state of an object is represented by a rectangle with rounded corners. A state may have entry actions (performed at the time of entry to a state), activities (performed as long as the state is active) and exit actions (performed at the time of exit from a state).
Transition: A transition is the change of control from one state of an object to another due to the occurrence of some event. It is represented by an arrow labeled with the event that triggers the transition. Transitions may also have parameters (passed with the event signal), a guard condition (a Boolean expression that must evaluate to TRUE for the transition to take place) and actions (operations to be executed as a result of the transition being taken). Note that an action may be carried out by this or another object.
Initial state: It defines the initial (default) state of a system and is represented by a filled black circle.
Final state: It represents the final (end) state of a system and is denoted by a filled circle within a circle.
Composite state: A composite state is one that has been decomposed into sequential or concurrent substates (with a statechart at the next level of the hierarchy).
History state: It defines the state of re-entry into a composite state. Since a composite state is composed of substates, a history state is essential to define re-entry to the particular substate from which the composite state was exited.
Important graphical notations pertaining to the UML statechart are given in Table 3.8. An example of a statechart is shown in Fig. 3.6. The following are some good practices for the use of statecharts in the modeling of system behaviour:
• Use states to hold behaviour that takes significant time (activities). Activities may take a long time, and in principle, they are interruptible. Hence, activities should not be attached to transitions, state entry or state exit.
• Use actions on entry or exit if the action is always taken when a state is entered or exited, respectively. • Use actions on transitions if the actions are executed only on some paths into a state.
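A statechart of the kind described above is commonly realized in C as a switch-based state machine. The following sketch is hypothetical (a pump with a guarded start transition); entry actions for each state would be placed in the enter() helper:

```c
/* Switch-based realization of a simple statechart:
 * states, events and a guard condition on one transition. */
typedef enum { ST_IDLE, ST_RUNNING } state_t;
typedef enum { EV_START, EV_STOP } event_t;

static state_t state = ST_IDLE;

static void enter(state_t s)
{
    /* entry actions for the new state would go here */
    state = s;
}

state_t dispatch(event_t ev, int interlock_ok)
{
    switch (state) {
    case ST_IDLE:
        if (ev == EV_START && interlock_ok)   /* guard condition */
            enter(ST_RUNNING);
        break;
    case ST_RUNNING:
        if (ev == EV_STOP)
            enter(ST_IDLE);
        break;
    }
    return state;
}
```

When every transition of the statechart is specified with its event, guard and actions, this dispatch function can be written almost mechanically, which is the person-independence argued for above.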
3.5 Implementation

In this stage of the SDLC, the actual development of the software begins, and the product is built. The realization of a design into code depends largely on the quality of the SDD in terms of its level of detail and the manner in which the design is organized. For example, coding becomes straightforward and almost person-independent if the dynamic behaviour of a reactive system is modeled using a statechart with every
Table 3.8 Artefacts of statechart
detail of the event(s) and the action(s) associated with each and every transition from one state to another. A real-world example along with the code listings can be found in Sect. 3.9. In addition to adhering to the software’s detailed design, the developer is required to follow the programming guideline (PG) to write the code. Usually, a standard PG, e.g. MISRA-C [174], is chosen so that some tool like LDRA [5] can be used to
Fig. 3.6 Statechart
check compliance of the code with the PG. Furthermore, the development tool chain, such as compilers, interpreters and debuggers, is also standardized by the developing organization for the chosen language. Higher-level languages like C and C++ are mostly used in the development of safety-critical systems.
3.5.1 Tools and Techniques

Any software development environment provides an editor for writing code and supports compilation and linking to build the executable binary. However, we need to do more in order to facilitate the implementation of software for a safety system, which includes the use of a verified compiler and the static and dynamic analysis of the code under development.
3.5.1.1 Verified/Certified Compiler
CompCert C [162] is an example of a formally verified optimizing C compiler. The biggest advantage of this compiler is that students and researchers can use it for free. Its intended use is compiling safety-critical and mission-critical software written in C and meeting a high level of assurance. It accepts most of the ISO C 99 language, with some exceptions and a few extensions. It produces machine code for PowerPC, x86, ARM, AArch64 and RISC-V architectures.
Among the commercially available compilers, the certified compiler of the MULTI environment offered by Green Hills Software is widely used in safety-critical applications. The Green Hills tool chain is certified for ASIL D, the highest safety class under ISO 26262 [39].
3.5.1.2 Static Analysis
Manual code reviews are time-consuming and not very effective for large and complex code bases, because their effectiveness depends on factors such as the reviewer's experience and familiarity with the programming language used for implementation. Therefore, automatic techniques to detect common programming errors are desirable. Static analysis is an automatic analysis technique to detect errors in the source code without executing it.
Static analysis of code is performed early in the development process, before software testing (other than the unit-level tests by implementer) begins.
Thus, static analysis techniques play an important role in finding potential bugs. They can be applied during the implementation phase itself to detect programming errors such as (i) violation of programming guidelines, (ii) invalid arithmetic operations like division by zero, (iii) out-of-bound array access, (iv) buffer overflow, (v) memory leak, (vi) dereferencing of a null pointer, (vii) use of uninitialized variables/pointers, (viii) non-terminating loops, (ix) presence of unreachable code and (x) data race conditions. This helps in saving a considerable amount of time and reducing cost during software testing.

Compliance Check to Coding Standard High-level programming languages like C and C++ have a vast set of constructs and features. Some of them are very prone to cause errors because their semantics are either ambiguous or open to interpretation by compiler developers. When these high-level languages are used for safety-critical software development, such error-prone constructs/features should be avoided by using a safe subset of the language, such as MISRA-C [174], MISRA-C++ [173] or CERT C [76]. The use of static analysis tools is desirable to enforce compliance with these coding standards. Static analysis tools such as LDRA TBvision and Astrée are widely used to check for compliance with programming guidelines and report violations, if any.

Worst-Case Execution Time Analysis The correctness of a real-time system depends not only on the correctness of the result computed but also on the timeliness of the output (result). Therefore, it is required to perform a schedulability analysis in order to demonstrate that all tasks shall complete their execution before their
respective deadline. For this purpose, the worst-case execution time (WCET)4 of each task should be known. AbsInt’s aiT WCET Analyzer statically analyses the task’s intrinsic cache and pipeline behaviour and provides tight upper bounds for the WCET. Important tools and techniques for static analysis are discussed in Sect. 6.6.
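Once the WCET of each task is known, a simple schedulability check can be carried out. The sketch below uses the classic utilization-based test for EDF scheduling on a single processor (total utilization must not exceed 1.0); the task set is hypothetical, and a real analysis would apply the bound appropriate to the chosen scheduling policy:

```c
/* Utilization-based schedulability sketch: each task contributes
 * C_i / T_i (WCET over period) to the total processor utilization. */
typedef struct {
    double wcet;    /* worst-case execution time C_i */
    double period;  /* period (and implicit deadline) T_i */
} task_t;

double utilization(const task_t *tasks, int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += tasks[i].wcet / tasks[i].period;
    return u;
}

/* EDF on a single processor: schedulable iff total utilization <= 1.0 */
int schedulable_edf(const task_t *tasks, int n)
{
    return utilization(tasks, n) <= 1.0;
}
```

The value of a tool such as aiT lies in making the wcet inputs to such a check tight and trustworthy; a loose WCET bound makes the analysis pessimistic, and an optimistic one makes it unsound.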
3.5.1.3 Dynamic Analysis
Dynamic code analysis is a technique to find software bugs by monitoring internal states and parameters during execution. The process of dynamic analysis involves (i) preparing an input data set, (ii) launching the test program, (iii) gathering data (internal states, parameters and outputs) and (iv) analysing the data with automated tools such as debuggers or simulators. Important tools and techniques for dynamic analysis are discussed in Sect. 6.7.
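Steps (i)–(iv) above can be sketched as a minimal test driver in C; the unit under test here is a hypothetical stand-in:

```c
#include <math.h>

/* Stand-in for the real code being exercised. */
static double unit_under_test(double x)
{
    return x * x;
}

/* Minimal dynamic-analysis loop: feed a prepared input set through the
 * unit, gather the outputs and compare them against expectations.
 * Returns the number of failing cases. */
int run_dynamic_checks(void)
{
    const double inputs[]   = { 0.0, 1.0, -2.0, 3.0 };   /* (i) input set  */
    const double expected[] = { 0.0, 1.0,  4.0, 9.0 };
    int failures = 0;
    for (int i = 0; i < 4; i++) {
        double out = unit_under_test(inputs[i]);          /* (ii) launch    */
        if (fabs(out - expected[i]) > 1e-9)               /* (iii)/(iv)     */
            failures++;                                   /* gather+analyse */
    }
    return failures;
}
```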
3.6 Verification and Validation

As introduced in Sect. 3.1.7.4, verification and validation (V&V) is one of the most important activities in the development of software performing safety functions. Furthermore, it is necessary that V&V is carried out by a team independent of the development team as well as of the developing organization. V&V is an essential component of software qualification and is therefore taken up in a separate chapter (Chap. 5) for the more detailed discussion it deserves. In this section, we briefly introduce the activities involved in verification and validation at the different phases of development. In addition to meeting the functional requirements of an I&C system, software in a safety system also performs functions introduced by the (safety) system design, which include (i) initialization of hardware, (ii) hardware diagnostics and surveillance and (iii) communication between safety sub-systems, their synchronization and handling the effects of communication failure. This makes the verification of the software and the system inseparable. In other words, the software safety life cycle is integrated with the system safety life cycle because the software requirements are derived from the system requirements specification and its architecture. For example, the software requirement of redundancy resolution cannot be separated from the system's architectural design. Similarly, software validation is also a part of system validation.
4 It is the maximum time required for a task to complete its execution on a specific hardware resource.
3.6.1 The V-Model and the V&V Activities

As already discussed in Sect. 3.1.7.4, the V-model is the preferred development approach for safety-critical software, as it recognizes and identifies the interrelationship between the software and system safety life cycles. Development activities and the software safety life cycle have been identified and presented in IEC 60880-2006 [121] using the V-model. The model presented in Fig. 3.7 shows the V&V activities associated with the different phases of the SDLC.
3.6.2 Verification Activities in Different SDLC Phases

3.6.2.1 Requirements Analysis Phase
During the user requirements definition phase, all the user requirements including user constraints are gathered, and a user requirements document (URD) is produced. It is during this phase that the acceptance test plan (ATP) is also prepared. The verification activities during this phase are: – Review of the user requirements document (URD) – Review of the acceptance test plan (ATP) During the software requirements specification phase, the system validation plan (SVP) is also prepared. This is because software not only meets the functional
Fig. 3.7 The V -model and the V&V activities
requirements of a system but also carries out functions introduced by the safety system design as already discussed. The verification activities in this phase are:
– Review of the software requirements specification (SRS)
– Review of the system validation plan (SVP)
3.6.2.2 Software Design Phase
Once the user requirements are specified clearly, a system is designed—its architecture and the components of the product are identified. The hardware requirements and the software requirements are separated out of the system requirements as discussed in Sect. 3.3. This is followed by the software requirements analysis, paving the way for software design. Software design begins with the software architectural design, which is a high-level design identifying the software components and their interfaces. Therefore, the software integration test plan (SITP) is prepared at this stage of the SDLC. The next step is the detailed design of the individual components and the modules, which is low-level design. Note that while components are the units of organization at the architectural level, modules are the code-level units that facilitate the implementation of software. Thus, this is the time when the software unit test plan (SUTP) is prepared. The verification activities in this phase are:
(i) Review of the software detailed design (SDD) document
(ii) Review of the software integration test plan (SITP)
(iii) Review of the software unit test plan (SUTP)
3.6.3 Validation Activities

3.6.3.1 Software Unit Test (SUT)
A unit test is carried out with the help of the unit test cases designed and documented in the SUTP (produced during the detailed design). Unit tests are performed on the individual modules and components. This test, performed by the software developer during the implementation phase, helps identify software defects at the lowest level.
3.6.3.2 Software Integration Test (SIT)
The individually tested software modules and the components are integrated together to develop the complete software. This calls for integration testing on the modules and the components, which is performed using integration test cases
designed and documented in the SITP (produced during architectural design). Integration testing is usually performed by the testers to validate the developer's claim. However, in the case of a small project with a small team, this can be done by the developer. In any case, independent verification and validation (IV&V) of the SITP test results is a must for any safety-critical software.

3.6.3.3 System Validation Test
Once the software is incorporated into the system, the system validation test (SVT) is performed in order to validate that the system is able to execute the entire system functionality correctly. The SVT cannot be separated from the validation of the software requirements specified during the requirements analysis phase. In other words, the validation of software requirements is a subset of system validation.

3.6.3.4 Acceptance Test
An acceptance test is performed/witnessed by the user, and therefore, it is associated with the user requirements definition (URD) phase. Acceptance testing is carried out in a setup providing an actual user interface and/or an environment dictated by the user in accordance with the acceptance test plan.
3.7 Software Quality Assurance

Unlike mechanical systems, the quality of software cannot be assured by testing alone. For software, quality has to be built into the development process right from the beginning—including the requirements specification.

Software Quality Assurance (SQA) "SQA is a planned and systematic pattern of all actions necessary to provide adequate confidence that the item or product conforms to established technical requirements". Source: IEEE Std. 730-2014 [25]
3.7.1 SQA and the Software Development Process

Well-known facts about software development include the following:
– Most software defects are attributed to incorrect requirements specifications.
– The simpler the software is kept, the smaller the chance of a deficiency.
– It is the process that makes all the difference.
A well-defined development process and adherence to software development standards are required to assure the quality of the software. Objective evidence of adherence to the standards is to be obtained during all phases of the SDLC. This calls for verification and validation throughout the SDLC (software development life cycle), which involves:
– SDLC documents to be obtained, reviewed and audited.
– Source code to be checked for conformance to coding standards (e.g. MISRA-C).
– Wherever possible, the quality of the software to be established using well-established metrics. The quality parameters include complexity (measured by cyclomatic complexity and the Halstead difficulty level), maintainability, number of defects, number of software change requests (SCR) and number of review item discrepancies (RID).
3.7.2 Software Metrics

Software metrics deal with the measurement of the software produced and the process by which it is developed. Software metrics help in software quality control by providing a quantitative assessment. The fundamental qualities required for any technical/engineering system are its:
– Functionality: correctness and dependability
– Performance: response time and throughput
– Cost-effectiveness
Software metrics are mainly concerned with functionality and economy (i.e. cost-effectiveness). We introduce here two important product metrics used in assessing software: size and complexity. While the metrics related to software size have a bearing on the production (development, testing and qualification) cost, the complexity metric tells us about the maintainability and verifiability of the software.
3.7.2.1 Halstead Metric Using Software Science Model
In general, the complexity of a program cannot be judged by its size alone. A program with fewer lines of code (LOC) is not always less complex than a program with more LOC; indeed, even a small program can be complex, while a lengthy program can be straightforward. An effective metric representing the complexity of a program has been defined by Halstead based on operators and operands. Halstead's metrics can be generated by software tools that count the
tokens and determine which are operators and which are operands. The following base measures are obtained:

n1 = number of distinct operators
n2 = number of distinct operands
N1 = total number of operators
N2 = total number of operands

From these, the Halstead metrics are computed as follows.

Program vocabulary: n = n1 + n2

Program length: N = N1 + N2

Estimated length: N̂ = n1 log2 n1 + n2 log2 n2

Program volume: V = N log2 n

The difficulty level of the software: D = (n1 / 2) × (N2 / n2)

The effort required to develop the software: E = D × V
3.7.2.2 Cyclomatic Complexity
McCabe’s cyclomatic complexity [169] is a software quality metric that quantifies the logical complexity of a software program in terms of the independent paths in the program. The higher the number of this metric, the more complex the code. This software metric is mainly used for: – Accessing and controlling the complexity of a function or module – Determining the number of test cases required for covering all the paths This metric is based on the control flow representation of the program. Control flow depicts a program as a directed graph, where nodes represent the processing tasks and the edges represent control flow between the nodes. The notion of a flow graph is used to measure and control the number of independent paths in a program.
The complexity of a computer program can be correlated with the topological complexity of its flow graph.

Construction of Flow Graph

A flow graph represents the control flow of a program: the nodes denote the program statements, and the directed edges show the flow of control. Standard notations used in constructing a flow graph for the basic artefacts of a program are shown in Table 3.9.
Table 3.9 Flow graph notations
Fig. 3.8 Source code for bubble sort algorithm
Computing Cyclomatic Complexity

Once a flow graph G is constructed for the given program P, the cyclomatic complexity V(G) can be computed by any one of the following methods.

Method 1: If R is the total number of closed regions in G, then V(G) = R + 1.

Method 2: If E is the total number of edges and N is the total number of nodes in the flow graph G, then V(G) = E − N + 2.

Method 3: If P is the total number of predicate nodes5 contained in the flow graph G, then V(G) = P + 1.
Example

The computation of cyclomatic complexity is illustrated with the example of the bubble sort algorithm. The source code and flow graph for the algorithm are shown in Figs. 3.8 and 3.9. The following information for computing the cyclomatic complexity can be derived from Fig. 3.9:

Total number of closed regions in the flow graph: R = 3
Total number of edges in the flow graph: E = 9
Total number of nodes in the flow graph: N = 7
Total number of predicate nodes in the flow graph: P = 3

The cyclomatic complexity V(G) can be computed in three different ways as follows:
5 Predicate nodes are conditional nodes, which lead to the generation of two branches in the control flow graph.
Fig. 3.9 Flow graph for the bubble sort algorithm
(i) R + 1 = 3 + 1 = 4
(ii) E − N + 2 = 9 − 7 + 2 = 4
(iii) P + 1 = 3 + 1 = 4

Note that a maximum of four test cases will be required for covering all the paths, as given below:

(i) 1, 7
(ii) 1, 2, 6, 1, 7
(iii) 1, 2, 3, 4, 5, 2, 6, 1, 7
(iv) 1, 2, 3, 5, 2, 6, 1, 7
3.8 Software Configuration Management

Changes are inevitable in software because of changes in the requirements and/or problems found in the software during its use. In order to maintain the quality of the software, it is necessary to analyse the changes and to record and implement software change requests in a controlled manner. This is necessary to maintain the correctness and consistency of the software. Software configuration management (SCM) is a techno-administrative process to (i) identify and define configuration items (CIs) in a software product, (ii) control the release of and changes to these CIs throughout the software development life cycle, (iii) record and report the status of the CIs and (iv) verify compliance with the requirements specified in the software configuration management plan (SCMP).

Configuration Item

The configuration item (CI) mentioned above refers to each element of the software products, covering computer code, SDLC documentation and data. SCM manages the evolution of these software products throughout the development phases as well as during the maintenance phase. All the software tools used during the development life cycle are also controlled by SCM.
SCM Activities

SCM includes the following activities during the software development life cycle:

(a) The identification of configuration items (CIs) and the creation of baselines
(b) Configuration control for making changes to CIs
(c) Keeping configuration status records and their accounting
(d) Ensuring that the correct version of the entities/software product is released and used
3.8.1 Configuration Identification and Baseline

This activity establishes schemes for the identification of the items to be controlled and their versions. It includes identifying, naming and describing the documented physical and functional characteristics of the items to be controlled during software development. A controlled item can be (i) an intermediate or a final output (such as an executable, source code, an SDLC document, a database, a test case, a test plan, etc.) or (ii) an element of the support environment (such as a compiler, operating system, programming tool, analysis tool, etc.).

Baselines are created to establish and maintain the approved milestones (documents, code, parameter data items, etc.) at every stage of the SDLC. These baselines are used for the review, testing and release of the product. A baseline, along with approved changes, constitutes the current configuration of any CI. In general, there are four baselines [179], viz. (i) functional, (ii) allocated, (iii) developmental and (iv) product. Each baseline implies the completion of a milestone and is used as a base document for the next phase of the development process. The items to be controlled under each baseline are shown in Table 3.10.
3.8.2 Configuration Control

During the software life cycle, changes are inevitable. Configuration control manages the changes made to the CIs after the baselines have been established. It includes (i) request for change, (ii) evaluation of the requested change, (iii) approval or disapproval of the change and (iv) implementation of the required changes to the baselines. Changes may arise due to (i) a requirement change request, (ii) an enhancement or (iii) an error correction. The development team leader manages and controls all the CIs.
Table 3.10 Baselines for configuration control

Baseline: Functional (requirements)
Items to be controlled: System requirements specification (SyRS)
Event for creating baseline: Release of the SyRS

Baseline: Allocated (design)
Items to be controlled: System architectural design (SAD), software requirements specification (SRS)
Event for creating baseline: Release of the SAD and SRS

Baseline: Developmental (testable)
Items to be controlled: Software design descriptions (SDD), computer software (model, code, executable file, etc.), software unit test plan and reports (SUTP&R), system integration test plan and reports (SyITP&R)
Event for creating baseline: Review and approval of software products

Baseline: Product (delivered)
Items to be controlled: SyRS, SAD, SRS, SDD, SUTP&R, SyITP&R, user manual, system build document, executable code
Event for creating baseline: Delivery of the system after testing
3.8.3 Configuration Status Accounting

This activity keeps a record of the information on the status of configuration items (status of documents, reports of approved changes, status of proposed changes and status of product versions). Configuration item status is maintained as per the format specified in the SCMP.
3.8.4 Tools and Techniques

Version control tracks changes in the source code or any other files in order to facilitate collaboration and concurrent development among multiple developers, and it helps in keeping backups on a remote repository. A version control system makes it possible for developers to roll back to a previous working state or to the last version of the source code if needed. The system records all the changes made to a file so that a specific version can be retrieved later if required.
3.8.4.1 Concurrent Versions System (CVS)
Concurrent Versions System (CVS) creates and uses a simple repository, maintaining all the versions of a file in a single repository file by storing only the differences between successive versions. Its client/server architecture allows more than one developer to work on the same file(s) from different geographic locations.
3.8.4.2 Subversion (SVN)
Subversion (SVN) is a free/open-source software version control system. It uses a central repository, from which users check out files/directories, work on them and check them in after modification. The core concepts of SVN are as follows.

• Repository: A shared database where all the project artefacts, such as design, source code, documentation and test cases, are stored together with their complete revision history.
• SVN checkout: Makes a copy of the project artefacts from the central repository on the local machine.
• Working copy: The set of directories and files on a local machine is called a working copy. The working copy is the individual's work area, which neither incorporates other people's changes nor makes local changes available to others until explicitly told to do so. Multiple users can work simultaneously in their individual working copies.
• SVN update: If the repository has changed after checkout, svn update fetches those changes from the repository into the working copy.
• Commit: Saves the changes made in the working copy to the repository as a single atomic transaction. During the commit, a meaningful commit message should be provided to describe the purpose of the changes.

The important steps of a typical SVN workflow (as shown in Fig. 3.10) for source code management are as follows.

(i) A contributor (programmer) copies the project into a working copy using svn checkout.
(ii) The working copy is edited using svn add, svn delete, svn copy, svn move, etc. Changes can be tested, reviewed and, if required, reverted using svn revert.
(iii) Changes made in the repository after checkout are fetched using svn update.
(iv) Conflicts are resolved using svn resolve during merging.
(v) Changes are committed to the repository using svn commit.
3.8.4.3 Git
Git is a distributed version control system designed to manage projects with multiple contributors. Git allows every contributor to have his/her own local
Fig. 3.10 SVN workflow
Fig. 3.11 Git workflow
repository, and a master copy of the project is maintained at a central location which serves as a remote repository for all the contributors. As every contributor works on a local repository, (s)he can track and commit the changes locally. Once the changes are stable, they can be pushed/merged with the remote repository. The typical workflow for source code management using Git is shown in Fig. 3.11. The contributor first clones the master repository from remote to create a local repository. Then (s)he switches to one of the branches (or creates a new branch) which is required to be modified using the git checkout command. The required changes are made and added to the staging area. These changes are first
committed to the local repository using the git commit command. Generally, one of the contributors is designated as the maintainer, who is responsible for merging the code from the different contributors and resolving any conflicts that arise while merging their local changes into the master branch. When a contributor has made the appropriate changes on his/her branch and committed them locally, the changes made by other contributors on the same branch are pulled from the remote repository and merged locally. The result is then pushed back to the remote repository so that all the other contributors working on the same branch can pull the changes. Finally, a request is made to the maintainer to merge the changes made on the branch into the master branch. The maintainer accepts (or rejects) the request, and the changes are merged into (or dropped from) the master branch after resolving any conflicts and considering inputs from the testing team, as applicable.
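The clone/edit/commit/push steps of this workflow can be exercised end to end with a purely local setup, using a bare repository in place of the remote. Paths, file names and the user identity below are illustrative, and git is assumed to be installed.

```shell
# Sketch of the Git workflow with a local bare repository acting as the remote.
set -e
tmp=$(mktemp -d)
git init --bare -q "$tmp/remote.git"

# Contributor clones the remote to create a local repository
git clone -q "$tmp/remote.git" "$tmp/work"
cd "$tmp/work"
git config user.email "dev@example.com"
git config user.name  "Contributor"

# Edit, stage, commit locally, then push the stable change to the remote
echo "int main(void){return 0;}" > main.c
git add main.c
git commit -q -m "Add main.c"
git push -q origin HEAD
```

In a real multi-contributor project, a `git pull` before the push would merge the changes made by others on the same branch, as described above.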
3.9 Development of AAS: A Case Study

Let us take you along on a journey through a real-world, but simple, software development project, right from understanding the user requirements, capturing the software requirements and designing the software to writing the code. The project under discussion is the development of an alarm annunciation system (AAS).
3.9.1 User Requirements

We need to develop an alarm annunciation system (AAS) conforming to the ISA standard 18.1 ring-back alarm sequence [140]. The requirement is the development of software for a small, scalable AAS consisting of alarm cards and one communication card sitting on an I2C backplane, with the following details:

(i) The maximum number of alarm cards is limited to eight. In other words, the backplane has only eight slots for the alarm cards.
(ii) All the alarm cards are identical, except that each card is assigned a card ID based on its slot number.
(iii) An alarm card receives potential-free contact inputs from the field and from the operator console (or simply the console) directly wired to it. It sends outputs to the console in the form of potential-free contacts for audio and visual alarms.
(iv) The communication card receives alarm data from n alarm cards (1 ≤ n ≤ 8) and sends data to an operator console (a PC screen) through an RS-232 link.
(v) The operator console software and hardware are beyond the scope of this project.
Note that the user often does not specify the requirements as clearly as presented above. This is the result of across-the-table discussions by the system designer/developer with the user. Nevertheless, for a software developer, even the above requirements specification is not complete as we will find out in the requirements analysis phase of the software development life cycle (SDLC). UML (Unified Modeling Language) [120, 184], a universally accepted modeling language, is used throughout the discussion on the development life cycle.
3.9.2 System Specification and System Architecture

Arriving at the system specification and the system architecture is almost straightforward from the user requirements stated in Sect. 3.9.1. The following inputs for the system specification and the design of its architecture can be derived from the user requirements.

1. The system consists of the following:
(a) A set of alarm cards, the number of which is derived from the given number of alarm parameters (inputs) along with the spare input requirements
(b) One communication card
(c) The associated hardware, which includes the I2C backplane, the RS-232C communication interface along with cables, and the power supply
2. Alarm processing is done in an alarm card, which interacts with the field inputs and outputs (IOs) as well as the operator IOs.
3. The communication card receives the alarm status data of the field parameters (inputs) from one or more alarm cards. In addition, it may also receive the operator command inputs. Note that whether the operator actions (command inputs) will be logged by the alarm card is not yet clear from the user requirements.
4. The operator console receives all the data it requires through the communication card.

The system architecture of the AAS is shown in Fig. 3.12.
3.9.3 Capturing Software Requirements

In order to capture the software requirements from the user requirements, let us first carry out the software requirements analysis by identifying manageable domains and their interfaces, which will help us to:

– Allocate the software functions to small independent modules along with their interfaces
Fig. 3.12 Alarm annunciation system (AAS): system architecture
– Arrive at the operational profile, which specifies how users will use the software/system

The first step in the software requirements analysis/specification phase is to identify the key functional requirements.

3.9.3.1 Key Functional Requirements
(i) Read the alarm parameter contact inputs.
(ii) Read the operator inputs.
(iii) Send the visual and audio alarm outputs in conformance with the ISA 18.1 ring-back sequence [140].
(iv) Perform card health diagnostics.
(v) Send alarm data to the communication card over the communication bus (on request).
(vi) Send data to the operator console (PC) on demand over the RS-232 serial line.

Given the above system functions, let us try to identify the various domains attached to specific system function(s) and their dependencies on other domains. In the process, we will also refine the user requirements to arrive at software requirements that are complete, unambiguous and verifiable. For example, reading an alarm contact input also involves its validation, to take care of contact bouncing6 and to discard any momentary spurious contact actuation.

Alarm Annunciation Domain: Its key system functions are:

(i) Read the alarm contact inputs and validate the data.
(ii) Send the alarm window output.
6 When a mechanical switch or relay contact closes, the switch elements often bounce for a brief period (typically a few milliseconds) before making final contact.
(iii) Detect operator actions, i.e. actuation of the ACK, RST and TEST push buttons.7
(iv) Execute the ring-back alarm annunciation sequence, identifying the alarm input state and responding to operator actions at each and every appropriate stage.

Dependent on: None

I2C Bus Interface Domain: Its key system functions are:

Send alarm data to the communication card over a data bus (I2C bus) on request.

Dependent on:
(a) Alarm annunciation domain
(b) Communication domain

Communication Domain: Its key system functions are:

Acquire data (alarm as well as diagnostics data) from the alarm cards through a data bus (I2C bus) and send it to the PC (operator console) over the RS-232C serial link, on demand.

Dependent on: Alarm annunciation domain

Card Health Diagnostics Domain: Its key system functions are:

(i) Carry out the periodic FIT (fine impulse test).
(ii) Carry out the read-back test.
(iii) Send health status data.

Dependent on: None

3.9.3.2 Refinement of the Functional Requirement Domains
Note that the two distinct pieces of hardware, the alarm card and the communication card, communicate with each other over the I2C bus. Thus, the bus interface domain has two distinct functional requirements: the communication card will have a piece of software acting as the I2C master, and the alarm card will have the I2C slave software. So, we can refine the I2C bus interface domain and split it into two distinct domains as follows.
7 The ACK, RST and TEST push buttons are used by the operator to (i) acknowledge a new alarm, (ii) reset when the alarm condition clears and (iii) test the functioning of the alarm system, respectively.
I2C Slave Domain: Its key system functions are:

Receive a request and send alarm data to the communication card over the I2C bus.

Dependent on:
(a) Alarm annunciation domain
(b) I2C master domain

I2C Master Domain: Its key system functions are:

Send a request to the alarm card and receive data from it over the I2C bus.

Dependent on:
(a) I2C slave domain
(b) Communication domain

3.9.3.3 The Domain Model: Packages and Object Diagram
In UML, the requirement domains can be modeled as packages, and we can develop an object model to show the dependencies and interfaces among the packages, as shown in Fig. 3.13.

3.9.3.4 Review of the Domain Model
The development of software requires a process that supports dynamic review at various stages of its development.

Fig. 3.13 Object diagram: the domain model of the alarm annunciation system (AAS)
It is necessary to conduct a review of the domain model with the domain experts (the application experts) and the team members.

Checklist for Domain Review

The following checklist can be handy for the review of the domain model:

1. Are all the key system functions assigned to a domain?
2. Are there any missing interfaces between domains?
3. Are the interfaces between domains logically correct vis-à-vis the system requirements?
4. Are all the names and descriptions clearly written? In other words, are the descriptions unambiguous?
5. Are all the names consistent throughout the documentation?
3.9.4 Use Case and Scenario Model

As defined by Ivar Jacobson himself in [142]:

A use case is all the ways of using a system to achieve a particular goal for a particular user. Taken together the set of all the use cases gives you all of the useful ways to use the system, and illustrates the value that it will provide.
A use case is a coherent piece of functionality of a system that is visible from outside the system as a black box. All the use cases together define the system from the perspective of users’ requirements. Use cases are strictly behavioural, and they neither define nor imply a set of objects or classes. Use cases represent system behaviour that interfaces with the objects in the system’s external world. These objects are called actors.
3.9.4.1 Use Case Diagram
Use case diagrams describe what a system is expected to do from the user's point of view. The emphasis is on what is expected from a system rather than how it is to be achieved. During the phase of specifying the user requirements and understanding them better, use case diagrams are helpful mainly in the following areas.

Deriving system features: The functional grouping of the requirements helps bring out the features of the system, which can also be categorized as the most essential, immediate requirements, etc. in the development timeline. New use cases can also be generated as the requirements analysis progresses.

Ease of communication with users/clients: The pictorial simplicity of use case diagrams makes it easier for the developer to communicate with the user effectively without getting into implementation details.
Fig. 3.14 Use case model of the alarm annunciation system (AAS)
Generation of test cases: The scenarios identified against a use case often pave the way for generating test cases.

The functional grouping of the requirements and the domain model help us arrive at the use cases and the actors for the AAS and develop the use case model of our system. It can be observed from Fig. 3.14 that the set of use cases for the AAS comprises the following:

– AlarmAnnunciation
– I2CslaveCom
– Diagnostics

Note that the domain model suggests the need for a communication bus interface, which is an I2C bus; we found from the user requirements that the AAS shall act as an I2C slave, and hence the use case I2CslaveCom is identified. In addition, more detailed requirements reveal themselves during the identification of use cases. For example, the requirement of time-stamping the alarms and assigning this responsibility to a particular software component can be resolved in this phase itself. Specifically, a decision can be made on whether time-stamping will be done by the alarm card or by the communication card, depending on the requirements as well as the hardware capabilities.
The actors (represented by the human figures), i.e. the external objects that interface with the system, are:

– FieldInput, representing the alarm contacts
– Operator, representing the operator commands through the ack, rst and test push buttons
– AlarmWindow, for the visual alarm
– Hooter, for the audio alarm
– Display (alphanumeric display), for the card health diagnostics display
– I2Cbus, for communication between the communication card and the set of alarm cards

In addition to depicting the interfaces of the system with the external objects (actors), the use case diagram (Fig. 3.14) also brings out the interfaces among the use cases. This leads us to specify the interface requirements, which come under the category of non-functional requirements. Let us figure out the other non-functional requirements of the AAS.
3.9.4.2 Non-functional Requirements

In addition to the interface requirements, the non-functional requirements include (i) the performance requirements and (ii) the constraint requirements.

Interface requirements: The following interface requirements stem straightaway from the use case model:

1. I2C slave interface
2. Operator interface
3. Field input interface
4. Alarm interface, both audio and visual
5. Alphanumeric display interface

The details of these interface requirements are not presented, as this involves specific implementations, which does not add much value here.

Performance requirements: One of the key performance requirements of almost all software-based systems is response time. In the case of the AAS, it is the time taken by the system to generate the alarm annunciation output following an event of a plant parameter going beyond its limit. Depending on the criticality level of the system and the assumed operator response, the response time of a typical alarm annunciation system can vary from 100 to 500 ms. However, if the AAS also has to play the dual role of alarm
annunciator as well as an event sequence recorder,8 the user can demand a faster response time of 5 to 10 ms.

Going by the system architecture and hardware capabilities, the alarm processing along with its time-stamping and audio-visual annunciation is done by the alarm card, which saves (keeps in a buffer) the relevant alarm data for the communication card to acquire. Thus, the rate at which the communication card periodically acquires alarm data from the individual alarm cards generates one more performance requirement. While the user requirement drives the rate, the developer should also keep in mind that it should not cause any buffer overflow in the alarm cards.

Constraint requirements: In safety system software, the constraint requirements often originate from the use of specific hardware, a development tool chain, a specific programming language and the use or non-use of a real-time operating system (RTOS). For the AAS, the following constraint requirements can be specified:

– The use of hardware: In order to facilitate faster alarm processing and keep it independent of the less critical job of communicating data for archiving and operator information (other than the hardwired audio-visual alarm), it is required that:
· Alarm processing and annunciation be done in the individual alarm cards
· Communication for archiving and operator information be processed in the communication card
– Programming language: The use of MISRA-C [174], a safe subset of C, is mandatory for this software development.
– The use of RTOS: No RTOS or real-time executive is to be used.
3.9.4.3 Realization of Use Cases: Sequence Diagram

The realization of use cases starts with the identification and textual description of the interactions between the user and the system, represented by the use case descriptions. Such a description is termed a scenario. Sequence diagrams (SDs) take a scenario further by providing a pictorial description of how a particular system behaviour, represented by a use case, is realized by interactions among a group of objects. Here is a scenario of the AAS derived from the user requirements specification.
8 An event sequence recorder records the events of some selected (critical) parameters going beyond their specified limits, at a faster rate. This helps in post-incident analysis to figure out the initiating event, which often becomes the cause of the occurrence of the subsequent events.
Alarm annunciation follows the ISA 18.1 ring-back sequence (ISA 18.1R) as follows:

1. When there is an alarm (read from the contact input status), the corresponding alarm window flashes at a high frequency, and the high-frequency hooter buzzes.
2. Next, when the operator presses the Ack push button, and only then, the alarm window glows steadily, and the hooter is silenced.
3. Thereafter, when the alarm gets cleared (read from the contact input status again), the corresponding alarm window flashes at a lower frequency, and the low-frequency hooter buzzes.
4. Finally, when the operator presses the Reset push button, the corresponding alarm window goes dark, and both hooters are silenced.

A sequence diagram (SD) is basically an instance of a use case realizing a particular system function. As already discussed, an SD depicts a set of objects and a sequential exchange of messages between the objects meeting a specific system requirement. Thus, in the process of developing a scenario model, many objects, viz. AlarmMon, FieldIntfc, OperatorIntfc, ALwindow and Hooter, are identified/discovered, as shown in Fig. 3.15. It may be observed that the figure has a clarifying note, which provides additional information for improved clarity. Such notes can be attached to any UML diagram.

Note that the above alarmFunction scenario is a simple one. However, scenarios may have branching points, at which several responses may be available to either the actors or the system. Every path with a unique set of branches constitutes a separate SD. Thus, many SDs may be required to fully elaborate a use case.
3.9.5 Detailed Design

The detailed design is carried out in two stages: first the high-level architectural design, followed by the detailed design of the software packages pertaining to each architectural component and its interfaces.
3.9.5.1 Architectural Design: Component and Deployment Models

The component and deployment models are useful for the software architectural design. The architectural design decides on the key strategies for the large-scale organization of the system. These strategies are:

– Mapping software packages to processors, inter-processor communication via the bus and protocol selection (the component model)
– Identifying the tasks and their interactions (the concurrency model)
– Mapping the components to the physical hardware (the deployment model)
Fig. 3.15 Sequence diagram: alarm sequence of the use case alarmFunction
Component Diagram

A component in UML is a software artefact that exists at run-time. The component model does the job of packaging the generated object artefacts that exist at run-time into executables, libraries and databases. Components are important in any system because they facilitate easy updating, maintenance and reuse. We will construct the AAS software components as participating layers of abstraction.
Fig. 3.16 Component diagram: the alarm annunciation system (AAS)
The most abstract layers (closest to the application domain) are at the top. They are:

(i) AlarmAnnunciation
(ii) Communication

The most concrete layer (closest to the hardware) is at the bottom, and it is:

HardwareAbstraction

Components in UML are shown as rectangles with two tabs at the upper left. The component diagram (Fig. 3.16) shows the software components participating in the various layers of the AAS software and their relationships.

Deployment Diagram

The components are the run-time objects that execute or provide data. The deployment model is used to describe where these components execute and how they interact across processors. This is where the idea (software) finds its expression in the stone (hardware), creating a sculpture. The decision of deploying software modules on the relevant hardware components, such as processors, communication devices, etc., is presented in the deployment diagram. The deployment diagram (Fig. 3.17) shows the relationships between the software and hardware components involved in the AAS. It can be observed from the figure that the components AlarmAnnunciation, HardwareAbstraction and I2Cslave are
Fig. 3.17 Deployment diagram: the alarm annunciation system (AAS)
deployed in the alarm card, i.e. they run on the processor in the alarm card. The components Communication, HardwareAbstraction and I2Cmaster run on the processor in the communication card. Each piece of physical hardware communicating with the others is termed a node, and each component belongs to a specific node.
3.9.5.2 Structural Model: Object Diagram
Objects start getting identified when use cases are realized in sequence diagrams or sequence charts. We also know that an object is an instance of a class. The static structure of the classes is presented in a class diagram, and it only displays the associations among them, i.e. which of them interacts with each other and not what happens when they do interact. The class diagram shown in Fig. 3.18 gives an overview of the alarm annunciation system by depicting its classes and the relationships among them.
3.9.5.3 Behavioural Model
Any UML model of a software will have at least one active object, and an active object is either associated with a statechart or an activity diagram. The behaviour of an active object is dynamic and exhibits unique behaviour specific to its states. In other words, the dynamic behaviour of the piece of software during its execution in a processor is represented by an active object. It is an independent engine9 that
9 Theoretically, each active object can be run by an independent processor.
[Figure content: classes AlarmMon, RecvnSendAlarmData, I2Cslave, IOintfc, ALwindow, Hooter, RecvOpInpData, FieldIntfc and OperatorIntfc, with their associations (e.g. AlarmMon talks to I2Cslave, flashes one or more ALwindow instances and buzzes the Hooter) and multiplicities.]
Fig. 3.18 Class diagram: the alarm annunciation system
changes its state depending on events, which can be a change in a process parameter or an operator input that the system acquires or a change in some internal variable demanding a change in the state. The statechart of the active class AlarmMon that models its dynamic behaviour is presented in Fig. 3.19.
3.9.6 Concurrency Model
Concurrency is inherent in most real-world systems, where a set of system functions is required to execute simultaneously and often independently. In real-time system parlance, these system functions or tasks, which can be executed concurrently, are assigned to individual threads of control for a processor to execute. Actions (statements) within a thread are executed sequentially. A task is often associated with a priority and a deadline to complete its execution once the thread/task is ready. Real-time systems usually have more than one task, and the exchange of information (by passing messages or sharing variables) between them may be necessary to meet the system requirements. The concurrency model identifies and establishes the relationship between the tasks and the messages they exchange. In other words, the concurrency model brings out the following:
(i) The message arrival pattern (synchronous or asynchronous)
(ii) Deadlines within which the system must respond to the external events
[Figure content: states Normal, Alarm, Acknowledged and RingBack; transitions on the events ALARM, ACK and RESET, with a choice point on [ALARM==TRUE]/[ALARM==FALSE] after acknowledgement; transition actions invoke FlashWindow and BuzzHooter with the appropriate window and hooter modes.]
Fig. 3.19 Statechart of the active class AlarmMon
In UML, concurrency can be modeled in a couple of ways. Object diagrams can show the tasks (represented as active objects) directly. In UML, an active object is the root of a thread. We can develop a task diagram (the object model diagram of active classes or objects), concurrent activity diagrams, etc. to capture tasks and their interactions.
3.9.6.1
Defining Tasks for AAS
Since it was a requirement not to use classes and objects in the code,10 no object diagram was developed showing concurrency. But the use cases were analysed, and the tasks were defined using the following strategies, considering that those tasks perform a set of actions in response to a set of related events. (i) Event Source: Events from a common source were grouped together, and they were put into one thread: Thread1: Operator input (reading and validating) Thread2: Field inputs, i.e. alarm contact inputs (reading and validating) (ii) Sequential processing
10 This is due to the user constraint of writing the code in C conforming to the MISRA-C guidelines.
Thread3: AAS follows the ISA 18.1R sequence, which calls for a series of steps to be performed sequentially. Hence, this set of actions was grouped within a single thread to enforce the requirement.
(iii) Interface Device
Thread4: The control of a specific interface was assigned to a single thread. So, handling the I2C data becomes another thread.
(iv) Timing Characteristics
Thread5: Various card health diagnostics (FIT,11 read-back) were required to be performed periodically and hence were assigned to a single periodic task.
3.9.6.2 Reviewing the Tasks for AAS
We had a constraint of not using an RTOS for this software. This constraint was not a regulatory requirement but arose from resource availability. It was preferred to execute the software on a single thread. So, the grouping of the tasks defined in Sect. 3.9.6.1 was refined. It was found that Thread1, Thread2 and Thread3 could be grouped under a single thread of execution, which is assigned to task τ1. This refinement also did not come in the way of enforcing the requirements. What about the remaining concurrent tasks? An alarm card acts as an I2C slave to send alarm and health data to the communication card over the I2C communication bus for logging. This remains a concurrent task, as does the periodic health diagnostics of the hardware. Therefore, the final set of tasks is as follows:
(i) τ1: Alarm processing
(a) Read and validate the alarm contact inputs.
(b) Read and validate the operator inputs (ack, rst and test).
(c) Perform the ISA 18.1R alarm sequence.
(ii) τ2: Logging
Handle the I2C data (interrupt) request for alarm and health diagnostics data logging.
(iii) τ3: Diagnostics
Carry out periodic health diagnostics of the hardware.
The requirements of (i) processing and annunciating alarms every 20 ms, (ii) carrying out the alarm card health diagnostics every 100 ms and (iii) sending alarm and health data to the communication card every 30 ms determine the periodicity of these tasks.
11 Fine impulse test (FIT) is carried out to check the health of a digital input board. The input at the board level is toggled for a very short duration (a fine impulse) to verify that the input is read correctly and reverted fast enough to ensure that the board does not actually update the input data.
Considering the observed worst-case execution time (WCET) of these tasks, the following task set τ emerges:

τ = {τ1(5, 20), τ2(2, 30), τ3(1, 100)}
Note that a task is represented by τi(Ci, Ti), where Ci is its worst-case execution time and Ti is the periodicity in milliseconds, which is equal to its deadline.
Further Refinement This task set can be refined further when we look into the very small execution time of the diagnostics task τ3 in conjunction with the large slack time of the alarm processing task τ1. These two tasks can be combined into a single task, in which diagnostics and alarm processing are executed sequentially. So we can have a new task τ1(7, 20) using a simple trick: a counter ensures that the code pertaining to diagnostics is executed every five cycles of τ1, keeping the periodicity of diagnostics unaltered at 100 ms. This reduces the number of independent tasks in τ to two, which are easily managed by the use of timed interrupts:

τ = {τ1(7, 20), τ2(2, 30)}
3.9.6.3 Handling Babbling Idiot
The alarm annunciation and the communication of alarm and health data for logging have been separated by deploying them on two different cards/boards. But considering that alarm processing and annunciation are more critical than information logging, it is necessary to use defensive computing so that even if the communication board becomes a babbling idiot due to failure or a security vulnerability, the alarm board continues functioning as specified. This is achieved by enforcing a time delay between two successive requests from the communication board. Once a request (query) is made by the communication board, it generates an I2C interrupt at the alarm card (the I2C slave). The alarm card ensures that (i) the I2C interrupt is disabled following the acknowledgement of the query, (ii) a timer is started at the same time and (iii) the interrupt is re-enabled only after the expiry of the time delay, which is 30 ms in this case.
3.9.7 Implementation Snippets
Harel's statecharts [111] form the basis of UML state diagrams. The primary feature of a state diagram is its support for finite-state machines (FSM). However, it may be noted that a statechart has more expressive power than a flat one-dimensional FSM. The example of AAS helps simplify the exposition of realizing a model into code, as its dynamic behaviour can be modeled simply by an FSM. In order to implement an FSM, we use the switch-case statement of the C language, where the value of the case determines the state of the FSM. The code within a particular case performs
the activities in the state and also responds to the events causing transitions. Let us look at the example of realizing the statechart of the active class AlarmMon using C code, as presented in Figs. 3.20 and 3.21.
Fig. 3.20 Realization of the AAS statechart: Listing 1
Fig. 3.21 Realization of the AAS statechart: Listing 2
3.10 Summary and Takeaways
This chapter introduced the software development process, defining its two aspects: the management plan and the development approaches. Considering the larger scope of this book, a concise yet complete description of various software development life cycle models, along with the phases of development, is presented. The tools and techniques applicable to various phases of development are also introduced for the benefit of practitioners. Though it has been stressed that, in conjunction with a clearly defined software development process, strict adherence to the applicable standards is essential for the development of safety-critical software, the discussion on standards is kept to a minimum in this chapter. This conscious approach was followed so that the reader can better appreciate the next chapter, dedicated to standards and their compliance, without getting entangled in the web of guides and standards from the beginning. Following the same approach, the real-world case study presented at the end of this chapter is also aimed at offering a guided tour of development with minimum discussion of the standards.
Review Questions
1. What are the limitations of the waterfall model of software development?
2. When the requirements are not fully analysed (e.g. the efficiency of your algorithm), which life cycle model will be the best choice?
Fig. 3.22 Code for the binary search algorithm
3. What is the difference between verification and validation in the context of software development?
4. Can you identify the applicable UML artefacts at different phases of software development?
5. Statechart can be used in which phases of software development? Can you justify its use in the software requirements analysis/specification phase itself?
6. What is the use of a UML sequence diagram?
7. How can static analysis tools help improve software quality?
8. What do you mean by baseline in the software configuration management process?
9. Can you explain the workflow for any change in a CI after its baselining?
10. Can you draw the flow graph for the code of the binary search algorithm given in Fig. 3.22 and compute the cyclomatic complexity?
Chapter 4
Complying with Standards and Guides
If you want to date an engineer, raise your STANDARDS. – Anonymous
A safety standard, like any other industry standard, is the collective wisdom of experts aimed at guiding stakeholders (designers, developers, manufacturers and users) towards making products suitable for their domain of application and driving innovation towards safe and user-friendly holistic solutions. However, a safety standard focuses primarily on functional safety, and this has led to the development of a number of industry-specific standards. Historically, industries have grown more or less independently of each other and have developed and improved their own methodologies for the development of safety systems. This is evident from the domain-specific standards followed by industries. For example, the standards and techniques followed by the avionics, nuclear and automobile industries are different, and naturally so, because of the uniqueness of their application domains. However, the underlying principles and the safety goals of all these industries remain the same, and all the industry-specific standards evolved from IEC 61508 [128], the umbrella standard for electric, electronic and programmable electronic (E/E/PE) safety systems. Further, certification of suitability by a regulatory authority (local, national or international) is essential before any equipment/plant system having a safety implication can be put to use. It is necessary that a standard development procedure be followed so as to facilitate certification by an independent and qualified team or agency. Some of the important domain-specific safety guides and standards, which are to be followed in order to obtain regulatory approval, include the following:
– Nuclear safety guides and standards
  – USNRC guides (e.g. [40, 203])
  – IAEA safety guides (e.g. [15, 119])
  – IEC 60880 [121]
  – IEEE 603 [136]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. Karmakar et al., Development of Safety-Critical Systems, https://doi.org/10.1007/978-3-031-27901-0_4
– Avionics
  – DO-178C [21]
  – ARINC 653 [59]
– Automotive
  – ISO 26262 [30–39]
  – MISRA-C [174]
In addition to domain-specific guides and standards, a large number of industry standards have been developed by the professional organizations IEEE and IEC, which specify the means to fulfil the requirements of regulatory guides. Some domain-specific industry standards are also endorsed by the corresponding regulatory guides. In this book, these industry standards are referred to as working standards. The role of these working standards is described along with the identification of the overlaps and extensions among the related ones. The number of working standards is considerably large, and the system-level generic standards are further elaborated in the form of application-specific standards as well as standards specific to various software development phases. This chapter offers a holistic approach towards compliance with standards, accompanied by detailed discussions on their applicability at various stages of development.
4.1 Codes, Safety Standards and Guides
Safety codes are intended to establish objectives and to set the minimum requirements that must be met to provide adequate assurance of safety in any safety-critical system/facility. Safety guides provide guidelines and suggest methods for implementing specific requirements that help conform to the relevant safety code(s). Correctness of safety-critical systems in the nuclear, avionics, automotive and other industries (railways, medical, etc.) is of prime importance. This is because system malfunction, due to any reason, may result in serious consequences in the form of loss of human life, damage to the environment, or huge economic loss as well as loss of credibility. Therefore, safety-critical systems must be designed, implemented and tested to ensure correct, robust and efficient performance and to eliminate any possibility of potential system failure. However, various countries have developed their own codes owing to country-specific legal requirements. These codes are intended to establish a set of objectives and minimum requirements which are to be fulfilled to provide adequate assurance of safety. Regulatory (international as well as country-specific) and technical/industrial bodies have produced various guides and standards, respectively, for the development of safety-critical systems in conformance with safety codes.
4.1.1 Codes
Codes are the set of rules which are imposed on the industry to ensure safety, quality, compatibility and other benefits. It is to be noted that the term code in this chapter refers to the set of rules for safety system development and has nothing to do with the software program (code). Codes and regulations spell out the requirements that must be fulfilled in order to ensure the safety of people and the environment. They set mandatory legal requirements, which are enforced by different levels of government and regulatory authorities.
4.1.2 Safety Standards
Safety standards provide the fundamental principles, requirements and recommendations to ensure safety. These standards are used worldwide for safety-critical system development in order to protect people and the environment from undue risk. They have been developed by gathering, integrating and sharing the knowledge and experience gained over the years by domain experts. Compliance with all the code requirements is necessary for building and operating a facility in a safe manner. Safety standards provide recommendations and guidance for compliance with codes.
4.1.3 Guides
In order to build and operate a system/facility in compliance with code requirements, guides provide recommendations and guidance on meeting these requirements in respect of the safety case. Guides can be of two types: safety guides and regulatory guides.
Safety Guides detail the methods, procedures or provisions to meet the safety requirements mandated by safety standards and codes. In other words, safety guides can be seen as supporting and explanatory documents to safety standards and codes, meant for user guidance. Safety guides present well-established best practices to help stakeholders striving to achieve high levels of safety.
Regulatory Guides provide guidance on how to comply with safety requirements by implementing specific regulations in line with the law of the land. They also describe techniques used by the regulator in evaluating (i) specific emergency conditions or postulated accidents and (ii) the relevant data, when reviewing applications before granting licenses.
4.2 Safety Classification/Categorization
The classification of safety-critical systems based on the severity of risk to human beings and the environment that can be caused by their failure to perform the intended safety functions is essential. This forms the basis of the applicability of the provisions mandated in codes, regulatory guides and standards for developing a system belonging to a particular safety class. In this context, note that safety functions are assigned a category, whereas a class is assigned to the systems performing safety functions. For example, in the event of any safety parameter going beyond the specified limit, the reactor protection system (RPS) drops the safety rods to make the nuclear reactor sub-critical. This is a category A safety function, and the RPS is a Class 1 system (as defined in IEC 61226 [122]) or Class IA (as defined in AERB SG D-1 [52]). Various standards have their own nomenclature, but the underlying principle of classification remains the same: the degree of risk to human life. Let us discuss the safety classifications specified by various domain-specific standards along with the umbrella standard IEC 61508.
4.2.1 IEC 61508 Functional Safety Classification
A safety integrity level (SIL) is assigned to a system performing safety functions based on risk assessment. Accordingly, four SIL levels are identified by IEC 61508-1 [129]. A safety instrumentation and control (I&C) system consists of sensors, logic solvers/controllers and actuators. All of these sub-systems (hardware and software) in the loop must meet the required SIL. As introduced in Sect. 1.5.2.1, SIL levels are defined in terms of the probability of failure on demand (PFD). Note that the PFD is a dependability attribute, which depends on the system reliability, redundancy, health diagnostics and maintainability. However, for continuous operation, the failure rate, quantified as the probability of failure per hour (PFH), is more relevant. Table 4.1 presents the SIL levels along with the corresponding PFD and PFH ranges.
Table 4.1 SIL levels with PFD and PFH
Integrity level | PFD            | PFH
SIL 4           | 10^-5 to 10^-4 | 10^-9 to 10^-8
SIL 3           | 10^-4 to 10^-3 | 10^-8 to 10^-7
SIL 2           | 10^-3 to 10^-2 | 10^-7 to 10^-6
SIL 1           | 10^-2 to 10^-1 | 10^-6 to 10^-5
Table 4.2 IEC 61226: safety classification in NPP

IEC 61226       | Systems important to safety          | Systems not important to safety
Safety function | Category A | Category B | Category C | Non-classified
Safety system   | Class 1    | Class 2    | Class 3    | –

Table 4.3 Country-specific safety system classification in NPP

Country | Systems important to safety           | Systems not important to safety
Canada  | Category 1 | Category 2 | Category 3  | Category 4
France  | Class F1A  | Class F1B  | Class F2    | Non-classified
India   | Class IA   | Class IB   | Class IC    | Not important to nuclear safety (NINS)
USA     | Safety related | –      | –           | Non-safety related
4.2.2 Safety Classification in NPP
IEC 61226 is the standard that defines the safety levels in the domain of nuclear power plants, as introduced in Sect. 1.5.2.2. However, IEC 61226 does not specify PFD values against safety classes/levels. Still, a correlation between the safety class of I&C systems in NPPs and the SIL can be established based on standard practices and real-world implementations in various nuclear power plants. Such a correlation is presented in Sect. 4.2.4 of this chapter. According to IEC 61226, instrumentation and control (I&C) systems which perform category A safety functions are classified as Class 1. Correspondingly, I&C systems performing category B and category C safety functions are classified as Class 2 and Class 3, respectively, as shown in Table 4.2. However, individual countries use their own nomenclature, as shown in Table 4.3. It can be observed from the table that the USNRC1 has only two classifications: it classifies systems important to safety as safety related and the rest as non-safety related, which is in contrast with the classification by other countries.
4.2.3 Avionics
DO-178C [21] classifies avionics computer-based systems into five levels based on the potential consequences of their failure, as shown in Table 4.4.
1 USNRC and IEEE classify functions and not systems. They do not have separate classifications for systems important to safety.
Table 4.4 DO-178C safety classification

Safety class | Description
Level A | Catastrophic: Failure can cause aircraft crash and fatal injuries to almost all occupants
Level B | Hazardous: Failure has a large negative impact on safety and performance, results in higher workloads or stress reducing the ability of the crew and thus preventing safe operation of the flight, and causes serious or fatal injuries among occupants
Level C | Major: Failure causes significant reduction in safety and significant increase in the workload impairing crew efficiency, and may lead to occupants' discomfort or cause possible injuries
Level D | Minor: Failure results in reduced safety margins but, within the capabilities of the crew, may cause inconvenience to the occupants
Level E | No effect: Failure does not affect the safety of the aircraft at all
Table 4.5 Comparison: the safety classification of various standards and mapping of the corresponding PFDs and PFHs

IEC 61508 | IEC 61226 | DO-178C | ISO 26262 | PFD            | PFH
SIL 4     | A         | A       | ASIL D    | 10^-5 to 10^-4 | 10^-9 to 10^-8
SIL 3     | B         | B       | ASIL C, B | 10^-4 to 10^-3 | 10^-8 to 10^-7
SIL 2     | C         | B       | –         | 10^-3 to 10^-2 | 10^-7 to 10^-6
SIL 1     | –         | C       | –         | 10^-2 to 10^-1 | 10^-6 to 10^-5
–         | –         | D       | –         | –              | ≥10^-5
–         | –         | E       | –         | –              | –
4.2.4 Safety Classification Under Various Standards
In order to offer a global view of safety classification, a comparison of various classifications under different guides and standards is presented in Table 4.5.
4.3 Codes, Regulatory Guides and Standards for CBS in NPP
There are many standards in the nuclear domain. A number of standards originated from various national and international regulatory authorities as well as industrial bodies; they often overlap and, in some cases, complement each other. For the use of CBS in I&C systems performing safety functions in nuclear power plants (NPP), the International Atomic Energy Agency (IAEA) has developed the regulatory guide IAEA SSG-39, "IAEA Safety Guide, Design of Instrumentation and Control Systems for Nuclear Power Plants" [27]. While countries like the UK, Sweden
and India follow the safety guide IAEA SSG-39, many countries follow additional guides, including country-specific guides such as USNRC RG 1.152 (followed by countries other than the USA as well); AERB SG D-25 (India); T/AST/046, Technical Assessment Guide "Computer-Based Systems" (UK); UNESA CEN-6, "Guía para la implementación de sistemas digitales en centrales nucleares" (Spain); etc. These guides led to the development of industry standards by IEEE and IEC, which are endorsed by the regulatory authorities of various countries as well. For example, IEEE Std. 7-4.3.2, "IEEE Criteria for Digital Computers in Safety Systems of Nuclear Power Generating Stations" [137], is endorsed by USNRC Regulatory Guide 1.152. Another example is the endorsement of IEC Std. 60880 and IEC 62138 by the regulatory guide AERB SG D-25, "Computer-Based Systems of Pressurised Heavy Water Reactors". The evaluation of a safety-critical I&C system cannot be done without considering system safety. Therefore, in developing I&C systems in the nuclear domain, domain-specific industry standards are to be followed, e.g. IEEE Std. 603, "IEEE Standard Criteria for Safety Systems for Nuclear Power Generating Stations" [136], and IEC 61513, "Nuclear power plants—Instrumentation and control important to safety—General requirements for systems" [126]. The applicable guides and standards most widely used in the nuclear industry are summarized in Table 4.6.
Safety-critical system developers need to arrive at a set of standards to be followed as it may require customization in order to meet the requirements of country-specific guides.
In addition to the general guides and standards mentioned in Table 4.6, various industry standards are followed during the different stages of development of the software and the system, which include software quality assurance (SQA) and the associated verification and validation (V&V). A collection of industry standards has been developed by both IEEE and IEC in this domain, and large sections of overlap can be found in their contents, which is to be expected. But, with IEEE and IEC being two different bodies, the sub-topics covered by the corresponding standards produced by them are, in many cases, significantly different. In this book, we will mainly focus on (i) the regulatory guidelines laid down by IAEA, USNRC and AERB and (ii) avionics standards, for reasons of their wide applicability across countries. We will also discuss the relevant IEEE and IEC standards, which specify means to meet the recommendations of the guidelines. It is interesting to note that there is a fundamental difference between the avionics standard DO-178C and the rest of the industry standards in what they prescribe to achieve the common goal of developing dependable computer-based systems for safety applications.
Table 4.6 Applicable standards and guides for CBS in NPP

Item | USNRC | IAEA | AERB
Guide for CBS | USNRC RG 1.152 | IAEA NS-G-1.1; IAEA SSG-39; IAEA Technical Reports Series No. 367 | AERB SG D-25
System safety classification | USNRC RG 1.201 | IAEA SSG-30, IEC 61226 | AERB SG D-1, IEC 61226
Endorsed industry standard | IEEE 7-4.3.2 | IEC 60880 | IEC 60880, IEC 62138
Software development life cycle | IEEE 12207 | IAEA SSG-39 | AERB SG D-25
Software testing | USNRC RG 1.169 | IAEA Technical Reports Series No. 367 | AERB SG D-25
Verification and validation | USNRC RG 1.170; USNRC RG 1.168; IEEE 1012 | IAEA Technical Report Series No. 384 | AERB SG D-25
Security | USNRC RG 5.71 | IAEA Nuclear Security Series No. 33-T | AERB SG D-25
While DO-178C identifies and defines a set of objectives pertaining to the software development processes, the corresponding IEEE and IEC standards specify the means to follow during the process of software development for safety systems.
We will discuss the following codes and guides applicable for the development of safety systems for NPPs:
(i) IAEA safety guides
(ii) A few country-specific codes and guides
(a) United States Nuclear Regulatory Commission (USNRC) (also followed by a number of countries other than the USA)
(b) Atomic Energy Regulatory Board (AERB), India
4.3.1 IAEA Safety Guides
The International Atomic Energy Agency (IAEA) is a centre for cooperation in the nuclear field and seeks to promote the safe, secure and peaceful use of nuclear technologies. Thus, IAEA safety guides are applicable to all the NPPs across the world. In this section, we will discuss the IAEA safety guide for the development of software for computer-based systems performing safety functions in nuclear power plants.
4.3.1.1 IAEA NS-G-1.1: Software for Computer-Based Systems Important to Safety in Nuclear Power Plants
IAEA NS-G-1.1 [15] provides guidance on generating evidence for the demonstration of the safety and reliability of software used in a computer-based safety system (CBS) performing safety functions. This standard emphasizes the preparation of documents demonstrating the safety and reliability of CBS and details the technical considerations in their development. The following are some of the important requirements for the management of safety.
Simplicity in Design The software of safety systems should contain only what is necessary to perform the safety functions. Functions belonging to a lower safety category should be assigned to other systems. To the extent possible, the simpler alternative should be chosen to implement a function. The hardware of the system should be selected with sufficient capacity and performance in order to avoid unnecessary complexity in software. Simplicity in interfaces, design and coding should be preferred in software development for computer-based safety systems.
Safety Culture An environment of safety consciousness should be established in the organization developing safety systems. Sound procedures should be developed for ensuring strict adherence to safety requirements.
Safety Classification A safety classification scheme should be adopted to define the safety significance of a system, which forms the basis for the level of rigour applied during the system development process by designers, operators and regulatory authorities, with the aim of providing a balance between risk and development effort. The industry standard IEC 61226 [122] has been endorsed by the IAEA for safety classification.
Defence in Depth Defence in depth should be applied in the development of the CBS and its software.
It is mandated that, in addition to computer-based protection systems, a purely hardwired electronics-based (or electro-mechanical relay-based) system is provided as a necessary backup in the event of common cause failure of the computer-based systems. This is an example of defence in depth.

4 Complying with Standards and Guides

Common Cause Failure  Common cause failure (CCF) of redundant trains of computer-based safety systems is a major concern because identical software runs in all redundant trains. Therefore, it is important to ensure and demonstrate the correctness of the software by following a well-defined and well-documented process. To avoid common cause failures of CBS, functional and equipment diversity should be incorporated into the system architecture.

Diversity (Equipment and Functional)  The application of functional diversity and equipment diversity in the design of CBS is effective in enhancing system dependability, as it reduces the potential for common cause failures. For example, redundant controllers can be designed and developed using diverse technologies: one controller can be general-purpose processor-based and the other FPGA-based. In addition, diversity in methods, programming languages, software tools and personnel also plays an important role in achieving high dependability of CBS.

Diversity (Software)  Software diversity is achieved by developing multiple variants of the software implementing a safety function. The use of diverse software may provide some level of protection against common cause software errors arising from errors in development tools, library functions, operating systems, etc. Diversity in software is effective against CCF if the variants are developed by different and independent development teams, and diversity is employed in every possible area, such as the use of (i) diverse design methodologies (object-oriented design vs structured design), (ii) different programming languages and compilers and (iii) different operating systems.

Fault Detection and Fault Tolerance  Safety systems should be designed with self-supervision capability to detect their own faults, which include (i) faults in the input, output and processing hardware modules, (ii) communication failures and (iii) software failures. In addition, the system architecture should provide fault tolerance so that the system can continue its operation without compromising safety even in the presence of limited failures.
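As a toy illustration of such m-out-of-n fault tolerance (not drawn from the guide; the train count and the fail-safe treatment of self-diagnosed faulty trains are assumptions made for the sketch), a two-out-of-four voter can be written as:

```c
#include <stdbool.h>

/* Hypothetical 2oo4 voter: the safety action is initiated when at least
 * two of the four redundant trains demand it. A train whose self-diagnostics
 * have flagged it as invalid is counted as demanding the action, i.e. a
 * failed train biases the vote toward the safe side. */
bool vote_2oo4(const bool demand[4], const bool valid[4])
{
    int count = 0;
    for (int i = 0; i < 4; i++) {
        if (!valid[i] || demand[i])   /* faulty train counts as a demand */
            count++;
    }
    return count >= 2;                /* 2oo4: two demands actuate */
}
```

With this bias, two healthy trains can still actuate the safety function even after two trains have failed, which is the tolerance discussed below.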
For example, a system with four redundant trains and two-out-of-four (2oo4) voting logic can tolerate the failure of two trains, as already discussed in Chap. 2.

Fail-Safe Design  A failure can prevent a CBS from carrying out its safety functions.
“If such a situation occurs, the computer system should default to a ‘safe state’, even in the event of a loss of electrical power to the computer system and its output actuators.” Source: IAEA Safety Guide NS-G-1.1
4.3 Codes, Regulatory Guides and Standards for CBS in NPP

If a system detects a severe fault, which may prevent it from carrying out the safety function, the system should automatically initiate action to put the plant into a safe state. For example, fail-safe design is applied to the nuclear reactor trip system (RTS), where the safety shut-off rods are held at the top of the reactor core by a magnetic clutch. On failure of the RTS controller or its power supply, default/predefined outputs ensure that the magnetic clutch is de-energized and the shut-off rods drop under gravity, which quickly takes the reactor to its safe state by making it sub-critical.

It may be noted that a safe state cannot be determined straightforwardly in all cases, and it may vary with the situation in the plant. In some cases, the fail-safe action of the CBS is only to generate an alarm so that the operator can take safe action manually, if feasible. For example, if the landing gear control system of an aircraft fails, an alarm is generated, and the crew makes every effort to keep the plane flying and to carry out the standard operating procedures (SOP) for an emergency; otherwise, the result can be catastrophic. Wherever feasible, on failure, the system outputs are driven to a predefined safe state. This can be achieved by detecting the failure using an independent watchdog timer, which is re-triggered periodically by the software. A watchdog timeout indicates failure of either the software or the processor.

Separation of Safety Function from Non-safety Function  A safety-critical system should be dedicated to performing only safety functions. If it becomes inevitable (due to space or resource constraints) to assign some non-safety functions to the same processor/controller performing safety functions, the qualification requirements of the non-safety functions shall be raised to the same level/class of safety criticality. In addition, an analysis should be performed to justify that the safety functions are not compromised by the non-safety functions.

Reliability  Safety systems should be designed to achieve the target reliability or higher. Higher reliability can be achieved by the use of redundancy, diversity, fail-safe design, independence,2 protection against CCF and testability (self-supervision as well as surveillance tests).
In this context, note that attempts are also made to evaluate software reliability. For this purpose, statistical random testing based on expected operational profiles can be used effectively. However, software reliability is a specialized topic and beyond the scope of this book.

Maintainability  Provisions should be made for the detection and localization of failures in the design of a safety system so that it can be repaired and put back into operation within a short time. This reduces system downtime and increases system availability.

Testability in Operation  Provisions should be made for self-diagnostic tests as well as periodic surveillance tests of a system's functional capabilities in order to ensure safe operation of the system. The capability for online testing of safety systems, along with automatic self-tests, improves system maintainability.

Security  Provisions should be made to protect computer-based systems against physical attack, cyberattack, intentional and non-intentional intrusion and viruses. It is preferred not to connect safety systems to external/public networks.

2 Independence is necessary (i) between safety and process control systems as well as (ii) among redundant trains of a safety system.
Human Factor Consideration  Human-machine interfaces (HMI) should be designed to provide sufficient (but not overwhelming) information in a structured way. If an operator command is required for any safety action, sufficient time should be provided for analysing the information presented, deriving the manual actions and finally executing them. The possibility of human error in plant operation should also be taken into account in the design. For example, in the case of avionics systems, where the "plant" cannot be brought into a safe state on failure, it is the crew members who are expected to either bring the aircraft back to a safe state or carry out emergency operating procedures following any accident condition. Therefore, a human-operator interface and operating environment that are not designed with human factors in mind can cause severe stress to the crew members, which may be disastrous.

Quality Assurance  Quality assurance (QA) is to be applied throughout all development activities to ensure a high level of confidence in declaring that the product meets the specified requirements.
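Several of the recommendations above (fail-safe design, fault detection, self-supervision) combine in one common pattern: an independent watchdog that drives the outputs to their predefined safe (de-energized) state if the software stops re-triggering it. A hypothetical software model of this pattern (the names and tick-based timing are invented for the sketch; a real watchdog is an independent hardware timer):

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a watchdog: the application must call wdg_kick() within
 * 'timeout_ticks' timer ticks, otherwise the outputs are latched into
 * their predefined safe (de-energized) state. */
typedef struct {
    uint32_t timeout_ticks;   /* allowed interval between kicks            */
    uint32_t elapsed;         /* ticks since the last kick                 */
    bool     outputs_safe;    /* latched: outputs driven to the safe state */
} watchdog_t;

void wdg_init(watchdog_t *w, uint32_t timeout_ticks)
{
    w->timeout_ticks = timeout_ticks;
    w->elapsed = 0;
    w->outputs_safe = false;
}

void wdg_kick(watchdog_t *w)          /* called from the main control loop */
{
    w->elapsed = 0;
}

void wdg_tick(watchdog_t *w)          /* driven by an independent timer    */
{
    if (w->outputs_safe)
        return;                       /* the fail-safe state is latched    */
    if (++w->elapsed > w->timeout_ticks)
        w->outputs_safe = true;       /* timeout: de-energize the outputs  */
}
```

A timeout here signals that either the processor or the software has ceased normal execution, so the de-energized state is entered without relying on the failed software itself.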
4.3.2 USNRC Codes and Regulatory Guides (RG)

Let us look at the code establishing the general design criteria (GDC) and the other regulatory guides (RG) issued under the USNRC for the construction and licensing of water-cooled nuclear power plants. These GDC and regulations form the basis for designing CBS for nuclear applications in the USA and a number of other countries.
4.3.2.1 Codes
The following USNRC codes are to be followed.

USNRC 10 CFR Appendix A to Part 50  Appendix A to Part 50 of USNRC 10 CFR [204] provides the general design criteria (GDC) for nuclear power plants (NPPs). These GDC establish minimum requirements for the construction and licensing of water-cooled NPPs. The highlights of this code are as follows:
– The GDC are general design guidelines targeted at the safety systems of an NPP, with the aim of ensuring their reliability, quality, testability, independence and safe operation during postulated failures.
– Safety systems are to be designed with an extremely high probability of accomplishing their safety functions whenever required. This calls for system design with high reliability and availability.
– In order to meet the reliability goal, it is to be ensured that a protection system meets the single failure criterion by applying appropriate redundancy, diversity and independence.
– Furthermore, the protection system is to be designed with test features permitting self-diagnostics and periodic testing of redundant trains independently.

USNRC 10 CFR Part 73.54  This code provides regulation for the protection of digital computers, communication systems and networks. Safety systems are required to be protected from cyberattacks, which may adversely impact the integrity and availability of the system. For protection against such cyberattacks, 10 CFR 73.54 [205] provides the following broad guidelines:
– Analyse and identify the critical assets and the security threats to the safety systems.
– Define and implement physical security and cybersecurity programs for the protection of critical assets.

USNRC 10 CFR 50.55a(h)  Section (h) of USNRC 10 CFR 50.55a [8] makes it mandatory for the protection system(s) of an NPP to meet the standard design criteria set by IEEE Std. 603 [136].
4.3.2.2 Regulatory Guides
The USNRC regulatory guides relevant to the development of computer-based safety systems include the following:

USNRC RG 1.152  Criteria for Use of Computers in Safety Systems of Nuclear Power Plants [203]. This guide provides the USNRC's regulations that should be adhered to while using digital computers in the safety systems of an NPP. It endorses IEEE Std. 603 [136], which provides standard design criteria for the safety systems of an NPP. IEEE Std. 7-4.3.2 [137] complements IEEE Std. 603 by specifying additional requirements for the use of digital computers in the safety systems of an NPP. Compliance with these requirements ensures high dependability of CBS.

NUREG/CR-6303  Method for Performing Diversity and Defence-in-Depth Analyses of Reactor Protection Systems [10]. This regulatory guide describes a method for analysing computer-based reactor protection systems against design vulnerabilities which may lead to common cause failure. Diversity and defence-in-depth (D3) analysis should be performed for CBS to determine the defence against common cause failures based on the multiple "echelons of defence" and different types of diversity, such as (i) human diversity: different designers and operators; (ii) diversity in design: the use of a general-purpose CPU and an FPGA (field-programmable gate array) for controller design; (iii) functional diversity: the use of different principles to implement the same requirement, e.g. an RTD (resistance temperature detector) and a thermocouple for temperature measurement; (iv) diversity in process signals: e.g. the use of both pressure and temperature signals to detect the high power condition in a nuclear
reactor; (v) software diversity; and (vi) equipment diversity. USNRC Branch Technical Position BTP-7-19 [44], "Guidance for Evaluation of Defence in Depth and Diversity to Address Common-Cause Failure due to Latent Design Defects in Digital Safety Systems", is another important reference document for evaluating D3 provisions against CCF of CBS.

USNRC RG 5.71  Cybersecurity Programs for Nuclear Facilities [18]. This regulatory guide provides guidance on how to comply with the regulations defined by USNRC 10 CFR 73.54 for the protection of CBS against any possible cyberattack. Cyberattacks may result in (i) loss of confidentiality and integrity of data; (ii) denial of access to systems, services and data; and, most importantly, (iii) loss of safety functions. This guide specifies how to establish and implement a cybersecurity plan for protecting the digital infrastructure, involving computer-based control systems, data servers, networks and communication systems associated with the safety functions, from such cyberattacks.
4.3.2.3 Endorsement of Standards by USNRC RGs
It is important to note that the USNRC regulatory guides (RG) and the Code of Federal Regulations (CFR) endorse the use of (i) IEEE Std. 603 to meet the prescribed criteria for safety systems of NPP in general and (ii) IEEE Std. 7-4.3.2 to meet the prescribed guidelines for the development of computer-based safety systems. IEEE Std. 7-4.3.2, "Standard Criteria for Digital Computers in Safety Systems of Nuclear Power Generating Stations", has been endorsed by USNRC RG 1.152. This standard specifies additional software-specific requirements to complement the standard design criteria set by IEEE Std. 603 for CBS performing safety functions.
4.3.3 AERB Codes and Guides

In India, radiation safety of the public and of occupational workers during the development, control and use of atomic energy for peaceful purposes is ensured through the enforcement of the safety provisions of the Atomic Energy Act, 1962, and the rules framed thereunder. All activities related to the civilian utilization of nuclear power, radioactive sources and the associated services are therefore governed by these rules and regulations. AERB is the regulatory body in India entrusted with developing and issuing safety codes and guides for nuclear and radiation facilities and other related activities covered by the Atomic Energy Act, 1962. The safety code AERB/NPP-PHWR/SC/D [51] provides the design requirements for the structures, systems and components important to nuclear safety. These general design requirements are extended and elaborated in the AERB safety guide
AERB/NPP-PHWR/SG/D-10 [53] for the safety systems of pressurized heavy-water reactors (PHWRs). The specific design requirements for the instrumentation and control (I&C) systems performing functions involving safety are outlined in the AERB safety guide AERB/NPP-PHWR/SG/D-20 [54]. However, AERB/NPP-PHWR/SG/D-25 [50] is the document applicable specifically to the design and development of CBS for use in NPPs, which we refer to as AERB SG D-25, or simply SG D-25, for brevity.

4.3.3.1 AERB SG D-25
AERB SG D-25 establishes the regulatory requirements for computer-based systems (CBS) belonging to different safety classes and lays down a standard regulatory review process (SRRP) for the qualification of newly developed software as well as pre-developed software (PDS). To this end, it draws on international good practices (in line with IAEA NUSS, the Nuclear Safety Standards series) in its recommendations, which form the basis for the acceptance of computer-based systems. The guide also recognizes the difficulty of quantitatively estimating software reliability. Therefore, it places a high level of importance on a development process that helps demonstrate the qualitative attributes of the software. In a nutshell, SG D-25 recommends the following for the development of CBS for NPP:
(i) Categorization of the safety class as per AERB SG D-1 [52]
(ii) Following a well-defined software development life cycle process
(iii) Building a safety case for the certification/acceptance of a computer-based system commensurate with its safety class
(iv) Qualification of PDS before use
4.3.4 Safety Standards for the Development of CBS

In this section, we discuss the IEEE and IEC standards, which specify the means to meet the requirements of the codes and RGs.

4.3.4.1 IEEE Std. 603: IEEE Standard Criteria for Safety Systems for Nuclear Power Generating Stations
This standard establishes the minimum considerations, called standard criteria, for designing safety systems to be used in NPPs. It provides guidance on establishing the design basis of a safety system, i.e. the postulated events the system must cope with, so that the adequacy of the safety system can be determined against these events. It also establishes standard design criteria, which detail the design considerations for compliance with the requirements of USNRC 10 CFR Part 50, Appendix A.
The safety criteria specified by IEEE Std. 603 have been further extended by IEEE Std. 7-4.3.2 for the programmable digital devices performing safety functions.
4.3.4.2 IEEE Std. 7-4.3.2: IEEE Standard Criteria for Programmable Digital Devices in Safety Systems for Nuclear Power Generating Stations, 2016
The IEEE 7-4.3.2 standard specifies additional requirements for computer-based safety systems to complement the criteria and requirements of IEEE Std. 603, and it should be used in conjunction with IEEE Std. 603 for the design and development of a programmable digital device (PDD) performing safety functions. It specifies the criteria for ensuring software quality through a well-defined software development life cycle process, the use of software tools, verification and validation, and the qualification of existing commercial PDDs.

Single Failure Criterion  A single detectable failure, in combination with all identifiable but non-detectable failures and all failures caused by that single failure within the safety system, must not prevent the initiation and accomplishment of a safety function. It is also to be ensured that a single failure within the safety system does not result in spurious actuation. The functionality of a CBS should be distributed such that a malfunction of a single programmable digital device (PDD) or a software error shall not result in spurious actuation of a safety system.

Common Cause Failure Criterion  A safety system must be able to perform its function in the presence of a single common cause failure (CCF). A major concern with the use of digital systems for safety functions is CCF due to a design error in the software, which may result in the loss of the safety function in all redundant trains at the same time. Diversity is recommended as a defence against common cause failures. Manual operator actions are also effective against CCF when the principles of diversity are applied to the design of the monitoring and control system of a plant using (i) a combination of digital, analog electronics-based and LED lamp displays/indications and (ii) diverse controls (hardwired push buttons or switches for initiating safety functions to back up manual controls through digital computers).
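To make the diversity idea concrete, here is a toy sketch (the trip thresholds and the two algorithm variants are invented; real diverse variants would be produced by independent teams with different languages and toolchains): the same trip condition is coded in two deliberately different styles and cross-checked, with any demand from either variant resolving toward the safe action.

```c
#include <stdbool.h>

/* Toy illustration of software diversity: one trip condition coded in
 * two deliberately different styles. Thresholds are hypothetical. */
static bool trip_variant_a(int temp_c, int pressure_bar)
{
    return (temp_c > 350) || (pressure_bar > 160);   /* direct comparison */
}

static bool trip_variant_b(int temp_c, int pressure_bar)
{
    int margin = 350 - temp_c;                       /* margin-based form */
    int margin_p = 160 - pressure_bar;
    if (margin_p < margin)
        margin = margin_p;
    return margin < 0;        /* an exhausted margin demands a trip */
}

/* Fail toward safety: trip if either diverse variant demands it, so a
 * disagreement between the variants also results in the safe action. */
bool diverse_trip(int temp_c, int pressure_bar)
{
    return trip_variant_a(temp_c, pressure_bar) ||
           trip_variant_b(temp_c, pressure_bar);
}
```

A single design error common to both variants is what diversity tries to make unlikely; the OR-combination merely ensures that when they do disagree, the plant errs on the side of tripping.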
Qualification of Tools If software tools are used for developing the PDD, these tools should be qualified based on the type of tool, the tasks performed by the tool and the outputs of the tools. For example, the qualification requirement for a compiler is more stringent than the qualification of a static analysis tool. This is because the output of a compiler is used in a system at run-time and any error due to the compiler will result in the incorrect implementation of safety functions.
Verification and Validation  The standard recommends independent verification and validation (IV&V) and adopts IEEE Std. 1012 for the IV&V processes, activities and tasks. The IV&V of a CBS should be carried out throughout development and includes the assessment of products by means of review, inspection, evaluation, analysis and testing to confirm that (i) the output of each SDLC phase and its activities conform to the requirements specified in the previous phase and (ii) the system meets the user requirements as intended.

Software Quality  The standard recommends the use of software quality metrics throughout the SDLC, which facilitates the quantitative assessment of quality and thus compliance with regulatory requirements.

Software Configuration Management  Software configuration management should be performed, which includes the identification of configuration items (CIs), change control for any modification of the software and status accounting for software releases. IEEE Std. 828 is endorsed for preparing the software configuration management plan (SCMP). Any change to the software should be analysed, documented and approved in line with the SCMP.

Testability  The integrity of computer-based safety systems should be continuously checked by self-diagnostic features, which facilitate continuous health monitoring of hardware components such as the CPU, memory, inputs, outputs, communication modules, etc. Self-diagnostic features should not affect safety system operations or the independence among redundant trains. In addition, surveillance test features should be provided to allow the periodic testing of specifically those functions which are not automatically tested through self-diagnostics. Examples of such functions are manual actions and the operation of actuators.

Integrity of Software  In order to ensure integrity, the development of software for a computer-based safety system must be carried out following the IEC 60880 guidelines.
In addition, the adopted software life cycle process should conform to IEEE Std. 12207. A CBS should be designed to survive and perform its safety functions even in the presence of limited failures like I/O processing failures, round-off errors, communication failures, etc. Self-diagnostic features should be provided to detect system faults and failures (as many as feasible) and notify the user in a timely manner. It should also be ensured that self-diagnostic features do not adversely affect the performance of safety functions or cause spurious actuation of any safety function.

Independence: Physical Separation, Isolation and Functional Independence  Physical separation and electrical isolation should be provided among the redundant trains of a safety system as well as between a safety system and other safety/safety-related/non-safety systems. Provisions should be made to prevent fault propagation among redundant systems. A buffering function can be provided between the communications interface and the safety function so that faults and failures of other redundant trains do not propagate to the safety function. Also, the redundant trains of the safety system should be functionally independent of each other and of systems of a lower safety class. If any safety function uses inputs received from other redundant systems, then provisions, such as voting logic, should be made to ensure the validity and correctness of the received inputs.

System Security  Provisions should be made for the protection of the safety system from malicious security threats. Locating the safety systems in the vital area of the NPP and providing physical security, access controls, etc. are recommended to meet this requirement.

Note that the GDCs applicable to CBS under USNRC 10 CFR Part 50, Appendix A and the IAEA NS-G-1.1 recommendations can be mapped to IEEE Std. 7-4.3.2 as presented in Tables 4.7 and 4.8, respectively.

Table 4.7 Mapping of GDC to the IEEE 7-4.3.2 clauses
GDC 19: Control room → Clause 5.8: Information displays
GDC 20: Protection system functions → Clause 4: Safety system design basis
GDC 21: Protection system reliability and testability → Clause 5.1: Single failure criterion; Clause 5.5.3: Fault detection and self-diagnostics; Clause 5.7: Testability
GDC 22: Protection system independence → Clause 5.6.1: Independence between redundant portions of a safety system
GDC 23: Protection system failure modes → Clause 5.5.1: Design for computer integrity
GDC 24: Separation of protection and control systems → Clause 5.6: Independence between safety systems and other systems

Table 4.8 Mapping of IAEA NS-G-1.1 recommendations to the IEEE 7-4.3.2 clauses
(6.7–6.11) Separation of safety aspects from non-safety aspects → 5.6.3: Independence between safety systems and other systems
6.12: Redundancy → 5.1: Single failure criterion
6.13: Channelization and voting logic → 5.6.4.2: Communications independence
(6.15–6.16) Diversity → 5.16: Common cause failure criterion
6.17: Fault detection → 5.5.3: Fault detection and self-diagnostics
(6.21–6.22) Testability in operation → 5.5.2: Design for test and calibration
System process monitoring and equipment monitoring → 5.8: Information displays
10: Verification and analysis; 12: Validation of computer system → 5.3.3: Verification and validation
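The requirement that data received from other redundant trains be validated before use can be sketched as follows (a toy example: the message layout, the integrity word and the plausibility limits are all invented, and a real design would use a proper CRC plus sequence and timing checks):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical inter-train message: a process value plus an integrity
 * word computed by the sender. */
typedef struct {
    int32_t  value;      /* process value from the redundant train */
    uint32_t checksum;   /* integrity word computed by the sender  */
} train_msg_t;

static uint32_t msg_checksum(int32_t value)
{
    return (uint32_t)value ^ 0xA5A5A5A5u;   /* toy integrity word */
}

/* Returns true and writes *out only if the message passes both the
 * integrity check and a plausibility (range) check; otherwise the
 * caller must fall back to its predefined fail-safe substitute. */
bool validate_msg(const train_msg_t *msg, int32_t lo, int32_t hi, int32_t *out)
{
    if (msg->checksum != msg_checksum(msg->value))
        return false;                        /* corrupted in transit */
    if (msg->value < lo || msg->value > hi)
        return false;                        /* implausible value    */
    *out = msg->value;
    return true;
}
```

On a false return, the receiving train would substitute a fail-safe value or raise a diagnostic rather than act on the suspect data, so a fault in one train does not propagate into the safety function of another.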
4.3.4.3 Working Standards: Application of Standard Criteria of IEEE 7-4.3.2
The working standards corresponding to each standard criterion specified in IEEE 7-4.3.2 are presented here. These working standards explain the applicable criteria and provide guidance on applying them. Important recommendations of these standards are presented in Table 4.9.
Table 4.9 Working standards corresponding to the IEEE 7-4.3.2 criteria

Single failure criterion (IEEE 379):
• Provide redundant systems to perform safety functions
• Independence among redundant systems
• Detection of all failures using test features
• Protection against cascaded failures by providing functional independence (to the extent possible) and physical separation among different systems
• Qualification of the system to make it immune to those design basis events which may necessitate the safety functions implemented by the system
• Protection against propagation of failures to/from other systems
(Refer to Sects. 4.4.1.1, 4.4.1.3 and 4.4.1.4 for the application of these recommendations)
Common cause failure criterion (IEC 62340):
• Defence in depth
• Functional diversity
• Independence and physical separation of systems
• Environmental robustness
• Tolerance against postulated latent software faults
• Protection against failure propagation via communication paths
• Prohibition of simultaneous maintenance activities in multiple redundant trains
• Manual operator action as a backup to safety functions disabled by a postulated CCF
• Diverse automation as a backup to safety functions disabled by a postulated CCF
(Refer to Sects. 4.4.1.3 and 4.4.1.8 for the application of these recommendations)
Quality (IEC 60880, IEEE Std. 730, IEEE 1061, IEEE 1012, IEEE 1042, IEEE 828):
• Software quality assurance plan (SQAP) based on the guidelines of IEC 60880 and IEEE Std. 730 for software development, modification or acceptance (refer to Sect. 3.1.2)
• Qualification and configuration management of software tools used in software development (recommended by IEC 60880)
• Use of software quality metrics as per IEEE Std. 1061 throughout the software life cycle to assess and improve software performance
• Independent V&V of software in accordance with IEEE Std. 1012 (refer to Sect. 3.6)
• Software configuration management for establishing baselines, change control and version control in accordance with IEEE Std. 1042 and IEEE Std. 828 (refer to Sects. 3.1.4 and 3.8)

Equipment qualification (IEEE/IEC 60780-323, IEC 62342, IEC 60880):
• Type testing (testing of sample equipment)
• Condition monitoring to determine whether equipment in use is suitable for further service
• Testing of the CPU, memory, input and output modules, diagnostic functions, human-system interfaces and the communication network to ensure that the performance requirements are met under the specified environmental conditions

System integrity (IEC 61508, IEC 60880); the recommendations of these two standards, developed by two different bodies, are summarized as follows:
• Derive software requirements from the safety system requirements to specify functions, different modes of behaviour, interfaces, performance, constraints, etc.
• Self-supervision of the hardware and software behaviour
• Periodic surveillance testing of those components which are not covered adequately by self-supervision
• Design followed by implementation of new software in high-level or application-oriented languages (a formal model-based design is discussed at length in Sect. 6.1)
• Configuration of pre-developed software
Independence (IEEE 384, IEC 60709); the recommendations of these two standards, developed by two different bodies, are summarized as follows:
• A buffering function to ensure that faults and failures on the communication originating in one train do not propagate to the other trains
• No communication among redundant safety trains unless that communication supports or enhances the performance of the safety function
• Data exchanged between redundant safety trains should be validated before use
Capability for test and calibration (IEC 60671):
• Testing of actuation logic using simulated signals
• Verification of actuation set points
• Self-supervision to confirm the integrity of the stored program, address and data buses
• Use of watchdog timers to detect cessation of program execution
• Software performing self-supervision functions shall be assigned to the same safety class category as the system it is testing
(Refer to Sect. 4.4.1.4 for the application of these recommendations)

Capability for test and calibration (IEEE 338):
• Testing of the entire logic system by a series of sequential, overlapping or total system tests
• Independent testing of the redundant portions of the safety system
• Design of tests to generate the data necessary for an objective assessment of the performance and availability of the system
• Online monitoring of the system's health status
• Response time verification tests
(Refer to Sect. 4.4.1.4 for the application of these recommendations)
Displays for manually controlled actions (IEC 60964):
• Functional design of the human-machine interface and its V&V
• Application of human factors engineering principles

Control of access (IEC 62859):
• Physical security of equipment
• Two-factor access controls to prevent any unauthorized access
• One-way communication from safety to non-safety systems
• Consideration of potential security vulnerabilities during each SDLC phase
• Provisions for access control through authentication for any software update and change in configuration/parameter data items
• Provisions for logging of all user actions and their audit

Use of commercial digital equipment (IEC 60880):
• A dedicated team should perform evaluations and tests to confirm that the commercial off-the-shelf (COTS) item meets the design requirements and performs its intended safety function

4.3.4.4 Relationship of IEC 61508 with Domain-Specific Safety Standards
IEC 61508 is an umbrella standard for systems comprising electrical, electronic and/or programmable electronic (E/E/PE) elements that are used to perform safety functions. Part 3 of this standard, IEC 61508-3 [130], provides a generic approach to safety life cycle activities and recommendations for software design, analysis and testing. It specifies requirements for achieving safety integrity for both embedded and application software through a combination of fault avoidance (quality assurance) and fault tolerance provisions. IEC 61508 provides the core requirements for safety system development (hardware and software) and provides a framework for domain-specific standards, such as IEC 61513 (nuclear applications) [126], DO-178C (avionics) [21] and ISO 26262 (automotive) [30–39]. These domain-specific standards are further complemented by other standards for the realization of safety requirements. IEC 60880 [121] is derived from IEC 61513 and provides specialized requirements for safety-critical software development. These specialized requirements refine the IEC 61513 specifications in the form of additional
4.3 Codes, Regulatory Guides and Standards for CBS in NPP
165
requirements for requirements specification, design, V&V, defence against CCF, use of software tools for developing software, use of pre-developed software, etc. IEC Standard 62138—“Software aspects for computer-based systems performing category B or C functions” [127]—provides requirements for the software of computer-based I&C systems of safety Class 2 or Class 3. The relationship between IEC 61508 and domain-specific safety standards is shown in Fig. 4.1.
4.3.4.5 IEC Standard 60880: Software Aspects for Computer-Based Systems Performing Category A Functions
IEC 61513, which deals with the system aspects of high-integrity computer-based I&C systems used to perform safety functions in nuclear power plants, refers extensively to IEC Std. 60880. This standard details the process and requirements for developing highly reliable software based on the best available practices.
– It refines the system safety life cycle defined by IEC 61513 and introduces the concepts of the software safety life cycle for developing computer-based safety systems.
– It provides the requirements for software project management, software quality assurance and software verification.
– It provides recommendations for requirements specification, design, implementation, installation, operation and maintenance of software, along with documentation.
– It provides the requirements for defence against software errors leading to CCF, calling for diversity and defence in depth.
– It strongly recommends the use of computer-aided software engineering (CASE) tools for the development of software and provides guidelines for qualifying these tools.
4.3.4.6 IEC Standard 62138: Software Aspects for Computer-Based Systems Performing Category B or C Functions
This standard provides the requirements for the software of computer-based I&C systems of safety Class 2 or Class 3, which are designed to perform category B or C functions. It provides the nuclear domain-specific interpretation of IEC 61508-3.
Fig. 4.1 Evolution of safety standards
4.3.5 Standards for Software Development Process

4.3.5.1 ISO/IEC/IEEE 12207: Systems and Software Engineering—Software Life Cycle Processes
ISO/IEC/IEEE Std. 12207 [28] is the specialization of “ISO/IEC Std. 15288 System Life Cycle Processes” [26] for software development. A system architectural design allocates system requirements to the various components of the system, and software requirements are derived from the system requirements allocated to the software component(s). Software requirements serve as the basis for the development of the individual software components. This standard defines the various processes for software development, review and audit and describes the activities, tasks and outcomes of each process in detail. It is complemented by other IEEE standards, which provide guidance for the implementation, execution and documentation of these software development processes. Applicable standards for each life cycle phase are given in Table 4.10.
Table 4.10 Applicable standards for software development phases/processes

Software development process | Applicable standards | Description
Software requirements analysis | IEEE 830 [12] | Provides the content and characteristics of a software requirements specification
Software architectural design | ISO/IEC/IEEE 42010 [45] | Provides a conceptual framework and content for the architectural description of software-intensive systems
Software detailed design | IEEE 1016 [17] | Provides the content and organization of a software design description
Software construction | ISO/IEC/IEEE 29119 [41–43, 46] | Provides a set of standards for software testing describing test process, test techniques and test documentation
Software qualification | IEEE 730 [25] | Provides the format and content of a software quality assurance plan
Software configuration management | IEEE 828 [23] | Provides the content of a software configuration management plan
Software verification and validation | IEEE 1012 [133] | Provides the guidance for software verification and validation activities
Software review | IEEE 1028 [134] | Provides the requirements for systematic reviews and audit of software
User documentation | ISO/IEC/IEEE 26512 [29] | Provides the documentation process from the user’s standpoint as well as the supplier’s standpoint
4.3.5.2 IEC Std. 60880: Nuclear Power Plants—Instrumentation and Control Important to Safety (Software Aspects for Computer-Based Systems Performing Category A Functions)
As discussed previously, the IEC 61513 standard provides a detailed description of the system safety life cycle for the overall I&C architecture as well as for individual systems. The system safety life cycle is further specialized for the development of safety-critical software in the IEC 60880 standard as the software safety life cycle. IEC 60880 describes the software development process and related activities in the system safety life cycle, which is based on the traditional “V” model described in Sect. 3.1.7.4. It also describes other management processes, viz. software project management, software quality assurance and quality control, software configuration management and software verification and validation, which support the development.
4.3.6 Avionics Standards

4.3.6.1 ESA Software Engineering Standards
The ESA Software Engineering Standards are more than just a set of standards developed by the European Space Agency (ESA). They can also be described as practitioners’ guides, which detail not only what is to be done but also how it is to be done. ESA PSS-05-0 is the top-level (Level-1) document, which describes the software engineering standards to be applied to the software in all ESA projects. It presents different software life cycle approaches, defines the software development life cycle phases and recommends standard practices. It categorizes the standard practices as (i) mandatory (containing the word shall), (ii) recommended (containing the word should) and (iii) guidelines (containing the word may). Level-2 documents are designed to provide guidance in carrying out the standard practices described in the Level-1 document (ESA PSS-05-0). A list of Level-2 documents and their important guidelines is summarized in Table 4.11. It can be observed from the table that the use of CASE tools in the development process has been recommended for all the applicable phases—from software requirements analysis to design. Also note that ESA recommends applicable IEEE standards for individual phases of the software development life cycle (SDLC), especially for the table of contents (TOC) of the relevant documents.
Table 4.11 ESA software development life cycle (Level-2) documents

ESA PSS-05-01: Guide to the Software Engineering Standards (standard endorsed for document TOC: none)
– Explains the structure of the ESA document tree involving two main branches: software project management and software development
– Explains the document identification scheme

ESA PSS-05-02: Guide to the User Requirements Definition Phase (standard endorsed for document TOC: IEEE Std. 830)
– Defines what user requirements (UR) are and provides suggestions on how to capture them
– Recommends the use of CASE tools for building models in order to capture UR
– Recommends a table of contents (TOC) for the user requirements document (URD)
– Suggests software management activities to be carried out during the UR definition phase

ESA PSS-05-03: Guide to the Software Requirements Definition Phase (standard endorsed for document TOC: IEEE Std. 830)
– Covers construction of software requirements by examining the URD and building a logical model
– Suggests how to construct a logical model, which describes the high-level essential requirements
– Recommends the use of CASE tools in constructing the logical model using recognized methods
– Recommends the categorization of software requirements in terms of functional, performance, interface, operational, resource, verification, acceptance testing, documentation, security, portability, quality, reliability, maintainability and safety requirements
– Recommends a TOC for the specification of software requirements in a software requirements document (SRD)—also referred to as software requirements specification (SRS)

ESA PSS-05-04: Guide to the Architectural Design Phase (standard endorsed for document TOC: IEEE Std. 1016)
– Provides guidelines on how to produce the architectural design by examining the SRD and building a physical model
– Recommends construction of the physical model by defining software components (derived from the software requirements), along with their interfaces showing control and data flow
– Recommends starting the design process with top-down decomposition of the software into major components, which are developed by assigning software requirements (both functional and non-functional) to these components
– Recommends the use of recognized methods and CASE tools for architectural design
– Recommends building “experimental prototypes” in order to verify the correctness of a technical solution
– Recommends a TOC for the specification of the architectural design in an architectural design document (ADD)—also referred to as a software architectural design (SAD) document

ESA PSS-05-05: Guide to the Detailed Design and Production Phase (standards endorsed for document TOC: IEEE Std. 1016 (SDD), IEEE Std. 1063 (SUM))
– Provides guidance on how to produce a detailed design document (DDD)—a comprehensive specification for implementation
– Suggests how to carry the decomposition further from the architectural components so that they can be coded as software modules in the programming language of choice
– Suggests care to be taken in case of reuse of software: standardizing interfaces, enhancing portability and avoiding minor modifications to a library module itself while it is under reuse
– Recommends adopting defensive design principles, viz. mutual suspicion (modules should be designed to remain unaffected by erroneous inputs from other modules, assuming that they are error-prone), immediate detection and redundancy
– Recommends prototyping to assess the feasibility of the design and to compare alternative designs to arrive at the best choice
– Recommends a “falsificationist” approach to testing, which encourages testers to design test cases that capture potential failures, not passes
– Recommends the use of recognized methods and CASE tools for detailed design and establishing coding standard(s) for the chosen programming language(s)
– Recommends a TOC for the detailed design document (DDD)—also referred to as a software detailed design (SDD) document
– Suggests generating inputs for the software user manual (SUM) in parallel with design so that the SUM is produced as an output of the detailed design phase

ESA PSS-05-06: Guide to the Transfer Phase (standard endorsed for document TOC: none)
– Suggests how to build and install software
– Suggests how to carry out acceptance tests to demonstrate that the software meets the user requirements specified in the URD
– Suggests how to produce a software transfer document (STD)

ESA PSS-05-07: Guide to the Operation and Maintenance Phase (standard endorsed for document TOC: none)
– Recommends user training to operate the software based on the user experience and the complexity or novelty of the software
– Suggests how to generate software problem reports (SPRs) in case of any software problem identified during operation
– Recommends problem diagnosis and analysis of the required changes based on their effect on performance, resource consumption, cohesion, coupling, complexity, consistency, portability, reliability, maintainability, safety and security before actually modifying the software
– Recommends following the change control process defined in the SCMP for changing the software
– Recommends installation, testing and validation of the modified software, as well as changes in the documents, before releasing the modified software
– Suggests how to produce the “project history document” (PHD)
– Suggests carrying out a review for final acceptance at the end of the warranty period

ESA PSS-05-08: Guide to Software Project Management (standard endorsed for document TOC: IEEE Std. 1058)
– Describes software project management: assigning roles and responsibilities within an organization; systematic estimation of resources, time duration, effort and cost; and provision of schedule and total cost
– Recommends risk management with consideration of experience, technology, maturity of the supplier, planning and external factors like stability of user requirements, availability of external systems, etc.
– Recommends measurement of process and product using software metrics

ESA PSS-05-09: Guide to Software Configuration Management (standard endorsed for document TOC: IEEE Std. 828)
– Defines and explains software configuration management (SCM) and the contents of the SCMP and provides guidelines on how to carry out SCM
– Establishes a relationship between the SCM activities and software component development and system integration; in addition, identifies and explains document as well as code configuration change control activities
– Recommends the use of CASE tools for configuration management
– Recommends a TOC for the SCMP

ESA PSS-05-010: Guide to Software Verification and Validation (standard endorsed for document TOC: IEEE Std. 1012)
– Defines and explains software verification and validation (V&V) activities and the contents of the SVVP and explains the V-model of the software life cycle verification approach
– Recommends the use of formal proofs for the verification of correctness of software
– Recommends the use of CASE tools for the static analysis of code and for testing
– Recommends a TOC for the specification of the SVVP for unit test (SVVP/UT), integration test (SVVP/IT), system test (SVVP/ST) and acceptance test (SVVP/AT)

ESA PSS-05-011: Guide to Software Quality Assurance (standard endorsed for document TOC: IEEE Std. 730 (SQAP))
– Defines and explains software quality assurance (SQA) and provides guidelines on how to carry out SQA activities
– Identifies SQA activities for various phases of software development as well as management of the software project
– Recommends a TOC for the SQAP
4.3.6.2 DO-178C: Software Considerations in Airborne Systems and Equipment Certification
The DO-178 standard (the latest version of which is DO-178C) is used by the aviation industry for the development and certification of software. While most standards prescribe the means to satisfy safety objectives, DO-178C prescribes the objectives and leaves the means to the designers and developers. This makes DO-178C unique: it provides guidelines in the form of a defined set of objectives pertaining to the software development life cycle processes, viz.:
(a) Planning process
(b) Development process
(c) Verification process
(d) Configuration management process
(e) Quality assurance (QA) process
The standard:
(i) Identifies the objectives of the software development life cycle processes mentioned above
(ii) Describes the activities and design considerations to achieve these objectives
Conformance to DO-178C guidelines requires evidence that these objectives have been satisfied. Prescribing objectives allows flexibility in the development approach and accommodates a large variety of technologies, including emerging ones. The flip side is that it leaves scope for individual interpretations of the objectives that differ from the intent of the standard. To avoid this, the objectives can be mapped to industry standards, as presented in Sect. 5.3, where the software qualification process is discussed; standards, when applied strictly, meet all their specified conditions and therefore the intent of their authors.
DO-178 has evolved from 178A to 178B and then to 178C in 2012. It is important to note that DO-178C (i) clarifies and removes a number of known inconsistencies in 178B and (ii) adds new topics such as the parameter data item (PDI) file and the verification of the PDI file [212]. Parameter data items (PDIs) are used to configure software, or they can form a database used by the executable software. For example, (i) the set points of process parameters and (ii) the maximum number of inputs and outputs and their types are stored in a table as parameter data items. PDIs are stored in files known as configuration files. These files are introduced as software life cycle data items and are considered as important as the executable object code, because PDIs can influence the software behaviour without any modification of the executable code. As already pointed out, DO-178C objectives will be discussed in detail in Sect. 5.3 (next chapter).
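The verification demanded of a PDI file can be illustrated with a minimal sketch. The field names, limits and JSON format below are purely illustrative assumptions, not taken from DO-178C or any real system; the point is that configuration data is integrity-checked and range-checked before use, since it changes behaviour as surely as code does.

```python
import hashlib
import json

# Hypothetical PDI (configuration) file content; names and limits are invented.
pdi_text = '{"trip_setpoint_degC": 320.0, "max_inputs": 64, "max_outputs": 32}'

def load_pdi(text, expected_sha256):
    """Verify a PDI file before use: check its integrity against a reference
    digest, then check that each parameter lies in its verified range."""
    if hashlib.sha256(text.encode()).hexdigest() != expected_sha256:
        raise ValueError("PDI file corrupted or tampered with")
    pdi = json.loads(text)
    if not (0.0 < pdi["trip_setpoint_degC"] < 400.0):
        raise ValueError("set point outside its verified range")
    return pdi

reference_sha = hashlib.sha256(pdi_text.encode()).hexdigest()
pdi = load_pdi(pdi_text, reference_sha)
print(pdi["max_inputs"])  # 64
```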
However, we discuss MC/DC (modified condition/decision coverage) here in detail, because this prescription of a structural coverage testing technique is unique to DO-178C, though its applicability is not limited to avionics alone.
Modified Condition/Decision Coverage (MC/DC)
One of the important clarifications of DO-178B made in 178C is the updated definition of modified condition/decision coverage (MC/DC) [93], supporting masking and short circuit in addition to unique cause. MC/DC needs detailed discussion, as issues related to the independent effect of conditions can be confounding [114] and may lead to difficulty in generating MC/DC test cases if not understood properly. Before discussing further, let us introduce a few useful terms with the help of examples. Consider three conditions A, B and C, and denote the vectors representing the rows of their truth table as 0 = (0, 0, 0) to 7 = (1, 1, 1). Also note that (i) the term condition is used for a Boolean expression without any Boolean operator and (ii) the term decision is used for a Boolean expression with zero or more Boolean operators. Furthermore, if a condition appears more than once within a decision, each occurrence is considered a distinct condition [114]. Thus, every condition is a decision, but not every decision is a condition.

Truth Vector
The truth vector for a Boolean function/expression is a specific combination of conditions together with the result of the function/expression. For example, for the expression

A and (B or C)

one truth vector is 3: F(False)-(0, 1, 1), where the condition combination 3: (0, 1, 1) applied to the expression results in an F.

Independence and Independence Pair
A condition possesses independence if two truth vectors exist such that:
(i) The two truth vectors produce different results—if one is T(true), the other is F(false)
(ii) The condition of interest assumes T(1) and F(0) in these two truth vectors
(iii) All other conditions either remain unchanged or are masked
The combination of two truth vectors—one true and one false—demonstrating the independence of a condition in a Boolean expression is termed an independence pair. We will use the example in Fig. 4.2 to explain independence. The independence of condition A within the Boolean expression (the if condition here) can be shown if the sub-expression ensures that either B or C remains true while A toggles between true and false. The truth vectors 1: F-(0, 0, 1) and 5: T-(1, 0, 1) are an example of an independence pair for A, because these two truth vectors, when applied to the if condition, result in two different outcomes (F and T), ensuring coverage of both branches.
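The search for unique cause independence pairs just described can be sketched in a few lines of Python. The decision and the vector numbering follow the text; the helper names are ours:

```python
from itertools import product

def truth_table(expr, n=3):
    """Map each vector index 0..2**n - 1 to (assignment, outcome), the index
    encoding the assignment in binary, as in 0 = (0,0,0) .. 7 = (1,1,1)."""
    return {int("".join(map(str, v)), 2): (v, bool(expr(*v)))
            for v in product((0, 1), repeat=n)}

def independence_pairs(expr, pos, n=3):
    """Unique cause independence pairs for the condition at position pos:
    the condition of interest toggles, every other condition stays fixed,
    and the decision outcome changes."""
    table = truth_table(expr, n)
    return [(i, j)
            for i, (vi, oi) in table.items() for j, (vj, oj) in table.items()
            if i < j and oi != oj and vi[pos] != vj[pos]
            and all(vi[k] == vj[k] for k in range(n) if k != pos)]

# Decision from the text: A and (B or C); condition of interest: A (position 0)
pairs = independence_pairs(lambda a, b, c: a and (b or c), pos=0)
print(pairs)  # [(1, 5), (2, 6), (3, 7)]; includes the pair (1, 5) from the text
```

Note that the candidate pair (0, 4) is correctly rejected: with B = C = 0, the decision is false regardless of A, so toggling A produces no observable effect.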
Fig. 4.2 A representative code structure for independence
Masking
Consider the expression

A and B

If B is set to 0, the effect of A gets masked, i.e. the output remains the same irrespective of A. Similarly, in the case of

A or B

if B is set to 1, the output of the expression is independent of A.

MC/DC with Short Circuit
It can be observed from the discussion on masking that short circuit3 logic is similar to masking. Short circuit is used in MC/DC to generate only those test cases whose outputs are observable, i.e. have an effect on the outputs. Consider the following expression:

A and B

If B is set to 1, the output of the expression depends only on A, which is a short circuit. This is analogous to a 2-input and-gate with B as the control input, shorted to logical 1, and A as the input of interest. Similarly, in the case of (A or B), if B is set to 0, the output of the expression depends only on A.
Short circuit establishes a direct path from input of interest to the output and observes its outcome.
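Both effects can be observed directly by evaluating the expression, as in this small sketch:

```python
# Decision under test: A and B
a_and_b = lambda a, b: bool(a and b)

# Masking: with B held at 0, the output stays False whatever A is,
# so the effect of A is masked (not observable at the output).
outputs_b0 = [a_and_b(a, 0) for a in (0, 1)]
print(outputs_b0)  # [False, False]

# Short circuit: with B held at 1 (the control input "shorted" to logical 1),
# the output tracks the input of interest A directly.
outputs_b1 = [a_and_b(a, 1) for a in (0, 1)]
print(outputs_b1)  # [False, True]
```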
3 The short circuit control form evaluates the LHS first and then conditionally evaluates the RHS of a Boolean expression. For example, if the LHS of the or operator is true, then the RHS is not evaluated, as the outcome becomes independent of the RHS. Some languages like C and C++ follow this form.

Coupling
Consider the listing in Fig. 4.3. Let X represent (A and B) and Y represent (!A and B) in the if condition of the listing. If A in X is changed, it will also affect Y. This makes X and Y coupled conditions, because changes in one condition cause changes in the other. Due to the coupling, these conditions are not free to assume every possible combination of values between them.

Fig. 4.3 A representative code structure for coupling
The condition A in X can be evaluated independently for structural coverage only if the condition Y is masked. Similarly, the condition A in Y can be evaluated independently for structural coverage only if the condition X is masked.
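The coupling can be demonstrated concretely. In this sketch X and Y are the two coupled conditions defined above:

```python
# Coupled conditions from the listing: X = (A and B), Y = (!A and B).
X = lambda a, b: bool(a and b)
Y = lambda a, b: bool((not a) and b)

# Toggling the shared condition A (with B held at 1) changes both X and Y:
coupled_values = [(X(a, 1), Y(a, 1)) for a in (0, 1)]
print(coupled_values)  # [(False, True), (True, False)]

# Because of the coupling, the combination X == Y == True is unreachable;
# the two conditions cannot be exercised as if they were independent inputs.
both_true = any(X(a, b) and Y(a, b) for a in (0, 1) for b in (0, 1))
print(both_true)  # False
```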
MC/DC Test Cases
MC/DC aims at structural coverage using a set of unique test cases without unnecessary repetition. A test set conforming to MC/DC consists of test cases which ensure that:
(i) Every entry and exit point in the code has been invoked at least once
(ii) Every decision in the code takes all possible outcomes at least once
(iii) Every condition in a decision affects the outcome of the decision independently
Requirement (iii) of MC/DC can only be met with test cases involving independence pairs of truth vectors corresponding to each condition in a decision. These test cases are known as unique cause MC/DC test cases.

Unique Cause MC/DC
Unique cause MC/DC test cases are generated so that the effect of each and every condition on the outcome can be evaluated independently by keeping all other conditions constant. A short circuit can be used to generate unique cause test cases. Let us elaborate MC/DC further with the help of an example. Consider the expression

(A and B) or C

which decides the code path. In order to show the independence of C, the sub-expression A and B must be set to false, and C can be toggled between 1(T) and 0(F). This can generate three unique cause pairs by ensuring that the expression A and B remains false while keeping A and B constant:

0: (0, 0, 0) and 1: (0, 0, 1)
2: (0, 1, 0) and 3: (0, 1, 1)
4: (1, 0, 0) and 5: (1, 0, 1)

Out of these three independence pairs, any one can be used to satisfy the MC/DC requirement for condition C. To show the independence of A, conditions B and C are to be short circuited to 1 and 0, respectively. This can generate only one independence pair:

2: (0, 1, 0) and 6: (1, 1, 0)

Similarly, only one independence pair (test case) can be generated for the condition B to meet the MC/DC requirement.

Effect of Masking
Let us now look into the effect of masking in showing the independence of C. If masking is allowed, then A and B can assume any two of the combinations (0, 0), (0, 1), (1, 0), while C toggles between 0 and 1. This generates nine masking MC/DC pairs, any one of which can be used:

0: (0, 0, 0) and 1: (0, 0, 1)
0: (0, 0, 0) and 3: (0, 1, 1)
0: (0, 0, 0) and 5: (1, 0, 1)
2: (0, 1, 0) and 3: (0, 1, 1)
2: (0, 1, 0) and 5: (1, 0, 1)
2: (0, 1, 0) and 1: (0, 0, 1)
4: (1, 0, 0) and 5: (1, 0, 1)
4: (1, 0, 0) and 3: (0, 1, 1)
4: (1, 0, 0) and 1: (0, 0, 1)

Thus, masking helps in generating more test cases for coverage testing. A larger test set is advantageous, as it helps in carrying out requirements-based tests while satisfying structural coverage (MC/DC) by picking a suitable test case that meets some specific requirements.

Documents Supplementing DO-178C
A number of supplement documents are introduced by DO-178C to address the rapid advances in technology. These are:
(i) DO-330: Software Tool Qualification Considerations
(ii) DO-331: Model-Based Development and Verification
(iii) DO-332: Object-Oriented Technology and Related Techniques
(iv) DO-333: Formal Methods
The need for qualification of tools stated in DO-330 is also recommended by IEC 60880 [121] and IEEE Std. 7-4.3.2 [137]. Model-based development (MBD), recommended by DO-331, is an emerging technique of software development that uses formal specifications as models. Such a model, being executable, can be subjected to simulation, testing and analysis for finding logical flaws in the system design. MBD and code generation from the high-level model specification make use of formal methods, as recommended by DO-333. MBD and formal methods go hand in hand, and they call for in-depth discussion, which is taken up in Chap. 6.
4.3.6.3 ARINC 653
The basic philosophy behind the ARINC 653 specification rests on:
– Integrated modular avionics (IMA) partitioning
– Software decomposition

ARINC 653 [59] is a set of documents consisting of Part 0 to Part 5, as shown in Table 4.12, which specifies the operating environment supporting IMA. As is clear from its name, the IMA system was invented for avionics applications. Its need arises from practical considerations unique to avionics systems, namely the limitations in available space and weight. This is not the case for land-based safety-critical I&C systems, as in nuclear power plants, where physically separated safety-critical systems are designed to be independent from non-safety systems or systems of lower safety category, so that any failure in those systems does not affect the functioning of the safety systems. In addition, redundant I&C trains within a safety-
Table 4.12 ARINC Specification 653

ARINC 653 | Title
Part 0 | Overview of ARINC 653
Part 1 | Required services
Part 2 | Extended services
Part 3A | Conformity test specification for ARINC 653 required services
Part 3B | Conformity test specification for ARINC 653 extended services
Part 4 | Subset services
Part 5 | Core software recommended capabilities
critical system are designed to be independent, which includes physical separation right from the sensors, through the processors, to the modules generating outputs to the actuators. However, the fundamental design philosophy of independence of safety-critical systems from systems of lower category, as well as independence among redundant channels/trains, remains the same in the aviation industry as well. This led to the development of integrated modular avionics. The modularity of the IMA system also makes the verification and validation of software-based systems easier, as the V&V of individual modules can be carried out independently.

IMA Partitioning
The IMA system supports independent execution of one or more avionics applications. This is achieved by providing partitioning, so that a fault in a partitioned function is contained within the partition. In other words, any fault in a partitioned application is prevented from causing a failure in some other function executed by an application in a different partition. In addition, partitioning reduces the cost of verification and validation in terms of both time and effort, because applications belonging to different partitions can be verified and validated separately, and the rigours of V&V for applications of different levels of criticality are not the same.

Software Decomposition
The software in an IMA system consists of:
(a) Application software, which can be developed and verified independently for each partition.
(b) An execution environment, which includes an operating system and a general-purpose APEX (APplication EXecutive) interface between the OS of a computer hardware resource and the application software. An APEX interface provides an environment that enables independently developed applications to execute concurrently on the same hardware.
ARINC 653 thus specifies the operating environment for avionics application software used within an integrated modular avionics (IMA) system, which supports hosting multiple applications of different criticality levels on a single computer using the concepts of time partitioning (access to hardware resources like the CPU and communication bus) and space (memory) partitioning. Thus, IMA partitioning has two dimensions:
(a) Time partitioning
(b) Space partitioning
To achieve time partitioning, a cyclic table-driven scheduler schedules the individual partitions of an IMA system so as to guarantee uninterrupted access to the common resources (CPU, communication bus) during their allocated time in every cycle. Providing real-time guarantees to applications running in partitions has its challenges, which will be discussed in Chap. 7 with the help of a case study.
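The cyclic table-driven scheduling described above can be sketched as follows. The partition names and window lengths are invented for illustration and do not come from the ARINC 653 specification:

```python
# Hypothetical major frame: (partition, time window in ms), executed cyclically.
SCHEDULE = [("P1_flight_control", 20), ("P2_display", 10), ("P3_maintenance", 10)]
MAJOR_FRAME_MS = sum(window for _, window in SCHEDULE)  # 40 ms

def partition_at(t_ms):
    """Return the partition that owns the CPU at time t_ms. Because the table
    is fixed offline, each partition's access to the CPU within every major
    frame is deterministic and guaranteed (time partitioning)."""
    t = t_ms % MAJOR_FRAME_MS
    for name, window in SCHEDULE:
        if t < window:
            return name
        t -= window

print(partition_at(5), partition_at(25), partition_at(45))
# prints: P1_flight_control P2_display P1_flight_control
```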
4.4 Case Studies

Two case studies are presented in this section:
– The first demonstrates compliance with various standards at the system architectural level.
– The second demonstrates the synergy between a safety guide and the corresponding industry standard.
4.4.1 Case Study 1: Complying with the Safety System Criteria and General Design Criteria (GDC)

This case study presents how to comply with the general design criteria (GDC) in the design of a reactor trip system (RTS). The architecture of the RTS is described in Sect. 2.4.1.1. This architecture, shown in Fig. 2.20, has been considered for this study, where four-train redundancy, functional diversity, independence among redundant trains and the testability features play important roles in its compliance with the GDC.
4.4.1.1 Single Failure Criterion

Applicable Clauses/Criteria: 5.1 of IEEE 603-2009, 5.1 of IEEE 7-4.3.2-2010, 21 of USNRC 10 CFR Part 50, Appendix A, 3.2 of AERB/NPP-PHWR/SG/D-10, IEEE 379 (complete standard)
A safety system is required to maintain the ability to perform its safety functions in the presence of any credible single failure of a sensor, controller or actuator. In order to conform to the single failure criterion, four redundant and independent trains with 2oo4 coincidence voting logic for the initiation of the safety action are used in the RTS architecture. It is known that sensor failure is a dominant cause of I&C system failure. The provision of separate sensors for all redundant trains contributes to the fault tolerance capability of the RTS and thus helps meet the single failure criterion. The use of local 2oo4 coincidence voting logic at the train level, for the validation of signals received from the redundant trains, prevents software fault propagation and avoids cascaded failures. Voting logic at any stage of processing filters out the effect of an error/fault that might have occurred in the preceding stage. In the RTS, parameter-wise local (i.e. within a train) coincidence logic is designed to negate the adverse effect of a single sensor failure on reactor trip generation.
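The train-level 2oo4 coincidence logic amounts to the following minimal sketch, which also shows why a single failed train can neither cause nor block a trip:

```python
def vote_2oo4(trip_demands):
    """2-out-of-4 coincidence voting: a trip is initiated when at least two
    of the four redundant trains demand it. trip_demands holds one boolean
    per train."""
    if len(trip_demands) != 4:
        raise ValueError("2oo4 voting needs exactly four train inputs")
    return sum(bool(d) for d in trip_demands) >= 2

# A single spuriously tripping train does not cause a spurious reactor trip:
print(vote_2oo4([True, False, False, False]))  # False
# A single failed (stuck non-tripping) train cannot block a genuine demand:
print(vote_2oo4([False, True, True, True]))    # True
```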
Fig. 4.4 2oo4 voting logic arrangement for trip breakers
A single failure of actuators is addressed by using appropriate voting logic ladders for the initiation of safety action. A 2oo4 voting logic ladder for a reactor trip function is shown in Fig. 4.4. In this configuration of the reactor trip breakers (RTB), even if the RTBs belonging to one train fail to open, the initiation of a reactor trip is not prevented. Also, any spurious actuation (opening) of RTBs belonging to one train can be tolerated, as it does not result in a spurious reactor trip. The provision of separate power supplies to the individual redundant trains ensures that failure of a single electric power source does not lead to non-availability of the safety system. Adequate redundancy is also provided for other components like communication links, the safety bus, etc. This ensures that malfunctioning of a single link neither results in spurious actuation nor inhibits the safety function when demanded.
4.4.1.2 Completion of Protective Action
Applicable Clauses/Criteria: 5.2 and 4(k) of IEEE 603-2009, 5.5 of AERB/NPP-PHWR/SG/D-10
Protective actions, once initiated, must proceed to completion irrespective of the changed status of the actuating signal; deliberate operator intervention is required to return to the normal state of operation. To meet this requirement, an actuation signal is latched, and resetting of the latched signal is prohibited for a specified interval of time in order to allow completion of the safety action. In this case study, design provisions are made such that following a reactor trip (RT) generated by an RPS train, the train-level RT signal is latched, and manual action is required to reset it. The latched signal can be reset/cleared only if 10 s have elapsed after the actuation of the RT and the plant variable that initiated the trip has returned to its acceptable/operating range. The delay of 10 s ensures that the RT can be reset only after all the safety rods reach the bottom, hitting the bottom limit switch (BLS). It is to be noted here that the specified rod drop time is around 3 s in a typical PWR. This provision ensures that a reactor trip, once initiated, proceeds to complete the safety action, i.e. all the safety rods reach the bottom and the reactor shuts down.
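A minimal sketch of this latch-and-reset rule in C; the names, the struct layout and the once-per-second call convention are assumptions for illustration, not the actual RPS code.

```c
#include <stdbool.h>

/* Reset is inhibited for this many seconds after trip actuation. */
#define RESET_INHIBIT_S 10u

typedef struct {
    bool latched;           /* train-level reactor trip latched */
    unsigned since_trip_s;  /* seconds elapsed since trip actuation */
} trip_latch_t;

/* Called once per second. The latched trip can be cleared only by a
 * manual reset, and only when the inhibit time has elapsed AND the
 * initiating plant variable is back in its acceptable range. */
void latch_update(trip_latch_t *t, bool trip_demand, bool param_in_range,
                  bool manual_reset)
{
    if (trip_demand) {
        t->latched = true;
        t->since_trip_s = 0;  /* (re)start the inhibit timer */
    } else if (t->latched) {
        t->since_trip_s++;
        if (manual_reset && param_in_range &&
            t->since_trip_s >= RESET_INHIBIT_S) {
            t->latched = false;  /* reset permitted only now */
        }
    }
}
```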
4.4.1.3 Independence
Applicable Clauses/Criteria: 5.6 of IEEE 603-2009, 22 of USNRC 10 CFR Part 50-Appendix A, 4.2 of AERB/NPP-PHWR/SG/D-10, 3.1 of AERB/NPP-PHWR/SG/D-20
It is obvious that redundancy is of no use unless the redundant trains are immune to each other's faults/failures. For this reason, all redundant trains must be functionally and physically independent of each other to the extent feasible. Four functionally independent trains are designed for the reactor trip system so that a failure of one train does not prevent the other redundant trains from performing their safety functions. The use of optical fibre in data links and communication networks (that connect the redundant trains) provides electrical isolation among the redundant trains. An independent power supply and physical separation of the redundant trains (placing them in separate rooms) help in meeting the independence criteria in accordance with IEEE Std. 384 [135].
4.4.1.4 System Integrity and Testability
Applicable Clauses/Criteria: 5.5 and 5.7 of IEEE 603-2009, 21 of USNRC 10 CFR Part 50-Appendix A, 4.5 of AERB/NPP-PHWR/SG/D-10
The software of a safety system is required to be designed with self-diagnostic features in order to continuously check, detect and report computer system faults and failures in a timely manner. In addition, there should be a provision for surveillance tests to facilitate the periodic testing of all the safety functions that are not automatically tested through self-diagnostics.
In the RTS, design provisions are made to permit testing of its components (from sensors to actuators) and their functionalities by either online periodic self-tests or periodic surveillance tests as per IEEE Std. 338 [139] and IEC Std. 60671 [131]. For ensuring the integrity of the RPS during operation, provisions for self-diagnostic features, viz. memory check, code integrity check, signal validity check, configuration integrity check, software version check and periodic testing of all hardware modules (CPU, input modules, output modules and communication modules), are made in the software. These diagnostic tests detect system fault(s) in a timely manner and notify the operator so that necessary repair work can be carried out to restore the system integrity and make the system ready to perform all its safety functions correctly whenever any demand arises.

In addition, provisions for surveillance tests are also made in the RTS to allow the testing of safety functions (e.g. reactor trip generation based on the RT parameter status, integrity check of the self-diagnostic functions, manual safety action, actuation of RTBs, etc.), which are not automatically tested through self-diagnostics. The surveillance test can be initiated by an operator for one train at a time. Although it is preferable to provide a test scheme for end-to-end testing of the reactor trip function of the RTS, which would allow the simulation of a sensor input to cause opening of the RTBs, it would require the simulation of sensor inputs in two trains at the same time, because RT generation is based on the local coincidence logic (e.g. 2oo4 voting logic) of individual RT parameters. Simulation in two redundant trains is not allowed because it violates the independence criteria, and simulation of the same RT parameter in two trains at a time may initiate a spurious reactor trip during testing. Therefore, overlapped testing is designed for the reactor trip function as follows.
First, the sensor input of one train is simulated to verify partial trip (refer to Definition 2.3) generation in that particular train. This is followed by a separate test where manual test inputs are used to generate a train-level reactor trip. The train-level RT is confirmed by the opening of the RTBs corresponding to the train under test.
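Among the self-diagnostic features listed above, a code integrity check is typically implemented by recomputing a checksum over the program image and comparing it with a reference value stored at build time. The CRC-32 sketch below is illustrative only; the reflected polynomial 0xEDB88320 and the function names are assumptions, not taken from the RPS software.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected form). */
uint32_t crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int b = 0; b < 8; ++b)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

/* Periodic self-check: true if the freshly computed CRC over the
 * code region matches the build-time reference. */
bool code_integrity_ok(const uint8_t *image, size_t len, uint32_t reference)
{
    return crc32(image, len) == reference;
}
```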
4.4.1.5 Information Display
Applicable Clauses/Criteria: 5.8 of IEEE 603-2009, 10 of AERB/NPP-PHWR/SG/D-10
A reactor trip system is designed with a provision to send all important parameter values, safety function actuation indications, status of actuators, bypass and inoperable status indications of equipment/functions as well as testing and calibration information to safety-related display systems. The information is displayed in the control room to assist the operator in the execution of the required safety actions, particularly in the case of failure of automatic safety systems.
4 Complying with Standards and Guides
4.4.1.6 Control of Access
Applicable Clauses/Criteria: 5.9 of IEEE 603-2009, 5.9 of IEEE 7-4.3.2-2010, 4.8 of AERB/NPP-PHWR/SG/D-10
A safety system should be designed and operated with appropriate security measures, based on the assessed security threats, so that the system can be protected against any security breach. System security control is designed for the RTS to protect the integrity and availability of the system. Access to RPS hardware and software is administratively controlled. An alarm is generated in the control room whenever a hardware rack of any RPS train is opened. Any change in the configurable parameters (also known as parameter data items, PDI), e.g. pre-trip set points, of the RPS is allowed only after authentication. The system is designed with two levels of authentication (passkey switch and software password) for any such changes.
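The two-level authentication can be modelled as a simple logical conjunction: a parameter change is permitted only when the physical passkey switch is in the enable position and the software password is verified. The sketch below is hypothetical; in particular, a real system would verify a credential rather than compare plaintext strings.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical two-level gate for changing a configurable parameter:
 * level 1 is a hardware passkey switch, level 2 a software password.
 * Both must pass before the change is accepted. */
bool pdi_change_allowed(bool passkey_switch_on, const char *password,
                        const char *expected)
{
    if (!passkey_switch_on)
        return false;  /* hardware permissive missing */
    return strcmp(password, expected) == 0;
}
```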
4.4.1.7 Maintenance and Repair
Applicable Clauses/Criteria: 5.10 of IEEE 603-2009, 3.38 of AERB/NPP-PHWR/SG/D-20
The RTS software design permits bypassing of individual input channels (parameter-level bypass) or individual trains (train-level bypass) for maintenance activities while the plant is operational. In addition, the system is developed on a modular hardware platform for easy repair by replacing the faulty module.
4.4.1.8 Common Cause Failure Criteria
Applicable Clauses/Criteria: 5.16 of IEEE 603-2009, 5.16 of IEEE 7-4.3.2-2010, 4.1 of AERB/NPP-PHWR/SG/D-10, IEC 62340 (complete standard)
The RTS is designed with measures to prevent common cause failure (CCF). This is achieved by providing (i) functional independence of the redundant trains, (ii) functional diversity, by detecting any abnormal condition from at least two diverse signals, (iii) diverse controllers (one CPU-based and the other FPGA-based) within a train of the RPS and (iv) diverse mechanisms (undervoltage and shunt trip attachments) for tripping the RTBs. In the case of a common cause failure affecting all digital computer-based systems, a hardwired system called the diverse protection system (DPS) provides reactor trip and other safety functions based on selected critical parameters.
4.4.1.9 Fire Protection
Applicable Clauses/Criteria: 3 of USNRC 10 CFR Part 50-Appendix A
Redundant trains of the RPS are kept in physically separate rooms so that a fire in one room does not prevent the protective function. Other components, viz. wires, cables and connectors, as applicable, are made from flame-resistant materials to minimize the potential propagation of fire. In addition, a separate backup control room (BCR) with essential monitoring and control capability is provided, located away from the main control room (MCR). This enables the operator to take necessary safety actions even when the MCR becomes non-habitable or inaccessible due to fire, steam release, etc.
4.4.1.10 Protection System Failure Modes
Applicable Clauses/Criteria: 23 of USNRC 10 CFR Part 50-Appendix A, 4.3 of AERB/NPP-PHWR/SG/D-10
A safety system should be designed to fail into either a safe state or a state which is acceptable on some other defined basis. Each train of the RPS is designed to fail into a safe state and generate a reactor trip signal in the case of software failure, CPU failure or failure of the power supply. The RTBs are also designed to open in the event of power supply failure, which leads to a safe state, as it causes dropping of the safety rods, making the reactor sub-critical.
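This fail-safe (de-energize-to-actuate) principle can be sketched as a trip output that is held energized only while every health check passes; any detected failure drops the output, which opens the RTBs. The struct fields and function names below are illustrative assumptions.

```c
#include <stdbool.h>

typedef struct {
    bool cpu_ok;       /* CPU self-test passed */
    bool software_ok;  /* software integrity checks passed */
    bool power_ok;     /* power supply healthy */
} health_t;

/* true = output energized (breaker held closed),
 * false = de-energized (breaker opens, i.e. trip). */
bool trip_output(const health_t *h, bool trip_demand)
{
    if (!h->cpu_ok || !h->software_ok || !h->power_ok)
        return false;    /* any detected failure fails safe */
    return !trip_demand; /* a genuine trip demand also de-energizes */
}
```

Note the design choice: the "healthy, no demand" case is the only one that keeps the output energized, so loss of power or processing capability defaults to the safe (tripped) state.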
4.4.1.11 Separation of Protection and Control System
Applicable Clauses/Criteria: 24 of USNRC 10 CFR Part 50-Appendix A
The RTS, a protection system, is designed with dedicated safety sensors and RPS controllers, which are physically separate and independent from the sensors and hardware used in implementing the control systems. Unidirectional flow of information from the RTS to the control systems and electrical isolation by the use of optical fibre for communication links ensure that the protection system is decoupled from the control systems.
4.4.2 Case Study 2: AERB SG D-25 and IEC 60880 for Licensing of a Computer-Based System (Software) in Safety Systems

In this case study, the Atomic Energy Regulatory Board (AERB) safety guide AERB SG D-25 and the industry standard IEC 60880 are taken up (i) to establish and define the relationship between them and (ii) to show how this relationship can be helpful in the process of obtaining regulatory approval before deploying software in a computer-based system for safety-critical application in an NPP.

The necessary condition for the regulatory approval of software for computer-based systems performing safety-critical functions is generating evidences, which include the following:
(a) A well-defined software development process has been followed.
(b) The software development process has been followed in conformance with the regulatory guides and industry standards identified in the software project management as well as software quality assurance plans, which include verification and validation.

In other words, it is necessary that the software is qualified for its use in safety-critical systems. While the next chapter of this book is dedicated to software qualification, this case study is limited to establishing the relationship between a country-specific safety guide and the recommended industry standard. It is necessary to seek answers to the following pertinent questions that may be asked by the developers of software performing Class IA I&C safety functions [50]:
– Is there any additional regulatory requirement in AERB SG D-25 vis-à-vis IEC 60880?
– Is there any difference in the stringency of requirements between these two documents?
Table 4.13 Documents and recommendations: software development plan

Documents | IEC 60880 | AERB SG D-25 | Working standards
Software quality assurance plan (SQAP) | Requires SQAP at an early stage of the software life cycle and details out its attributes | Requires SQAP within the framework of organizational-level QA plan | IEEE Std. 730
Software configuration management plan (SCMP) | Describes general guidelines for SCMP | Explicit details of SCMP are provided | IEEE Std. 828
Software verification and validation plan (SVVP) | Specific guidelines for the verification of software and software aspects of system validation | No significant difference | IEEE Std. 1012
Programming guidelines (PG) | Detailed guidelines for software coding are provided | Refers to IEC 60880 | Depends on the language and the adopted guideline, e.g. MISRA-C [174] is recommended in India
– What are the necessary documentary evidences to be produced for a safety-critical software in obtaining AERB approval?

To this end, the requirements of IEC 60880 and AERB SG D-25 are compared against each identified document as in [152].
4.4.3 IEC 60880 Versus AERB SG D-25: Recommendations and Documentation

In addition to comparing and mapping the recommendations of IEC 60880 with AERB SG D-25, the applicable standards pertaining to the production of the relevant documents are also identified. These standards are referred to as the working standards. The essential documentary evidence needed to build a case for approval, along with the corresponding working standards, is captured in Tables 4.13, 4.14, 4.15, 4.16, and 4.17, which are reproduced from [152] (©2013 Springer India, reprinted with permission) with minor modifications.

It can be observed from Table 4.13 that, so far as the SQAP is concerned, AERB SG D-25 leaves it to the QA policy of the development organization, whereas IEC 60880 details out the attributes of the SQAP. In contrast, AERB SG D-25 gives explicit details of the SCMP, while IEC 60880 provides only general guidelines.
Table 4.14 Documents, objectives and recommendations: software requirements

Documents and objectives | IEC 60880 | AERB SG D-25 | Working standards
Software requirements specifications (SRS) | Specific details of software requirements, including self-supervision and periodic testing, are provided | No significant difference. Brief description under system architectural design is provided | IEEE Std. 830
Requirements analysis | Recommends the use of formal or application-oriented language for SRS | No specific recommendation | Depends on selected specification or modeling language, e.g. UML
Traceability to user/system requirements | Traceability to previous SDLC document required | No difference | IEEE Std. 830 (part of SRS)
Table 4.15 Documents, objectives and recommendations: software design

Documents | IEC 60880 | AERB SG D-25 | Working standards
Software architectural design (SAD) | Required | Guidelines provided | IEEE Std. 1016 (architectural viewpoints)
Software detailed design (SDD): (a) design of dynamic behaviour, (b) traceability to SRS | Detailed mandatory requirements provided; (a) required, (b) required | Brief guideline; (a) required, (b) required | IEEE Std. 1016
From Table 4.14, it can be observed that there are no significant differences between AERB SG D-25 and IEC 60880 so far as the stringency of the software requirements specification and the key objectives of the requirements phase are concerned. However, it is noteworthy that IEC 60880 provides more detailed guidelines pertaining to the software requirements to be documented in the SRS. This is to be expected, as AERB SG D-25 is a guide and not a standard like IEC 60880.

The comparison of the key objectives pertaining to software design documentation, as specified by AERB SG D-25 and IEC 60880, is presented in Table 4.15. It can be observed from Table 4.15 that conformance to IEC 60880 requires mandatory compliance with the detailed requirements specified for the SDD. In comparison, AERB SG D-25 provides only brief guidelines.

Table 4.16 shows that both IEC 60880 and AERB SG D-25 recommend the use of static analysis tools. But it should be mandatory to use static analysis tools, for the reasons discussed in Sect. 4.4.3.2.
Table 4.16 Objectives and recommendations: software implementation

Objectives | IEC 60880 | AERB SG D-25 | Working standards
Compliance to PG | Required | Required | As per PG, e.g. MISRA-C
Static analysis report for compliance with the quality metrics specified in SQAP (acceptable nesting depth, cyclomatic complexity, comments per lines of code, etc.) | Recommends the use of static analysis tools | Static analysis tools are recommended | Nil
Table 4.17 Documents, objectives and recommendations: software testing

Documents and objectives | IEC 60880 | AERB SG D-25 | Working standards
Software unit test plan and report (SUTP/R): (a) traceability to SDD, (b) statement coverage, branch coverage | Required | Required | IEEE Std. 1008, IEEE Std. 829
Software integration test plan and report (SITP/R): (a) functional testing, (b) traceability to SDD, (c) code coverage, branch coverage | Software integration is to be considered as part of system integration | Refers to IEC 60880 | IEEE Std. 829
Software user manual (SUM) | Required | Required | IEEE Std. 1063
It can be observed from Table 4.17 that AERB SG D-25 endorses IEC 60880 for compliance with its regulatory requirements pertaining to software testing.
4.4.3.1 IEC 60880 Compliance
It is important to note that AERB SG D-25 makes it mandatory to produce an IEC 60880 compliance matrix for the regulatory approval of Class IA software. This implies that wherever any regulatory requirement in AERB SG D-25 is not specified or needs elaboration, a developer should look for the corresponding IEC 60880 recommendation.
4.4.3.2 Use of CASE Tools
The use of computer-aided software engineering (CASE) tools is recommended by both IEC Std. 60880 and AERB SG D-25, because such tools help improve the process of software development, including verification and validation. The use of CASE tools is essential to carry out (i) requirements analysis and design using the artefacts provided by a modeling language like UML and (ii) static analysis of code for branch coverage and statement coverage as well as for generating quality metrics like cyclomatic complexity. In addition, documents generated by CASE tools help build the safety case for regulatory approval.

Compliance to Programming Guidelines (PG) Without a standard guideline (e.g. MISRA-C [174]) and a compliance checking tool, it becomes a humongous task to verify a large code base for its adherence to the PG.

Requirements Analysis and Design The use of a CASE tool is necessary for requirements modeling and design using a modeling language like UML (Unified Modeling Language), for the simple reason that it is not practical to carry out requirements analysis and design using UML manually. Also, if formal methods are adopted for requirements specification, verification and automated synthesis of code, the use of tools becomes a must.

Quality Metrics A static analysis tool is necessary to generate metrics related to the testability and complexity (a measure of maintainability) of code.
4.5 Summary and Takeaways

This chapter offered a holistic approach towards compliance with the guides and standards. Given the plethora of standards and guides published by various institutions, it was necessary to bring out the essence of the relevant standards and to unify the prevalent standards to the extent achievable. This has been done with the help of a number of comparative studies of various industry-specific standards, from different perspectives, by identifying their commonality as well as their uniqueness. This remains essential especially for young developers, who are likely to be overwhelmed by the large number of standards. The chapter also offered a road map for the development of safety system software by identifying the standards applicable at the various stages.
Review Questions

1. Why do we need to follow the standards?
2. What do you understand by means-prescriptive and objective-prescriptive standards for software development?
3. Why is the classification of safety systems important?
4. Why is the use of a custom PG (programming guideline) not a good practice?
5. What are the key USNRC GDC (general design criteria) that help improve the reliability and availability of safety systems?
6. Which standards are endorsed by USNRC regulatory guides for the development of computer-based safety systems?
7. What are the effective measures to protect computer-based systems from common cause failures?
8. What is the fail-safe design principle?
9. What is the single failure criterion? What are the important measures taken in the design to meet this criterion?
10. Design test cases to provide 100% MC/DC coverage for the following C code.

    if ((A && B) || C) {
        func1();
    } else {
        func2();
    }
Chapter 5
Qualification of Safety System Software
The man of science has learned to believe in justification, not by faith, but by verification. – Thomas H. Huxley
This chapter dwells on the process that helps develop a safety case—the documentary evidences demonstrating the application of appropriate design and development standards, necessary safety principles and the verification process for the development of computer-based systems performing safety-critical functions. This is essential to generate adequate documentary evidence in order to secure approval from the regulatory authority for its worthiness in safety-critical applications. As already discussed in Sect. 3.1.7.4, the verification and validation (V&V) of software are part of the software safety life cycle. However, the software safety life cycle is strongly integrated with the safety system life cycle. This is because it is necessary that the software not only meets its functional requirements but also supports functions like initialization, system/hardware configuration, diagnostics/surveillance of the system communication interfaces, etc., and all these requirements are introduced by the system design. Software-related activities based on the traditional V-model of development recommended by IEC 60880 [121] demand verification at every stage of development, viz. requirements specification, design and implementation, which goes hand in hand with the verification of the software aspects of system integration as well as system validation. Adherence to the recommended development process and verification at every stage of development by an independent team are the essential components of software qualification. This chapter explains, step by step, the process of software qualification for safety systems. In addition to the conventional methods of generating documentary evidences supporting the quality of the software, this chapter also discusses how formal design and verification can improve the process towards meeting the regulatory requirements and finally getting approval. To this end, we also introduce the formal
model-based design and verification of software for safety systems in this chapter, as it can play a very important role in software qualification. However, considering that the topic deserves more attention, we include Chap. 6, where we discuss the recent developments in the domain of formal modeling, verification and automated synthesis.
5.1 Regulatory Requirements and Safety Case

The acceptance of computer-based systems (CBS) for safety applications is determined based on the documented demonstration (the safety case) of the design and operational safety principles, standards and/or criteria applicable to the safety system belonging to a particular safety class. The review of this demonstration comes under the purview of the regulators. A general guideline for the constituents of an acceptable safety case demonstration has been provided in [117]. However, as a part of the qualification process, it is necessary to prepare a framework for the review of the safety case specific to the software.

Safety Case: The documented demonstration of the design and operational safety principles, standards and/or criteria applicable to the safety system belonging to a particular safety class.
5.1.1 Software Safety Case

Let us now discuss how to prepare a safety case for a computer-based system and demonstrate its suitability for use in applications belonging to a particular class of safety.
5.1.1.1 The Basis
The acceptance criteria and the rigour of the qualification process for safety system software depend on its safety class. In the case of a pre-developed software, previous experience—evidence of demonstration from previous use—can also form the basis of acceptance if the system belongs to the same class of safety.

Classification of the Safety System A safety system is classified based on its importance, i.e. the severity of harm it can cause in case of its failure to perform the safety functions. For example, the failure of a landing gear system of an aircraft can be catastrophic, preventing safe landing, which can potentially cause many fatal injuries. Such a system is classified as level A and, as per DO-178B/DO-178C [21, 92], demands a demonstration of its worthiness by meeting the requirements of the highest safety level. In contrast, the qualification requirement of a level B (hazardous) system is relatively less stringent.

Previous Experience In the absence of documentary evidence that a suitable process was followed in the development of software, previous experience can be considered as an acceptance criterion by the approving authority, as suggested in IEC 60880 [121]. The evidence of demonstration associated with a previous system of the same safety class finds its applicability in making a case for approval in the absence of documentary evidence of a suitable development process. This is often the case where legacy software or commercial off-the-shelf (COTS) components/products are used, for which the SDLC documentation is not available for a review/audit.
5.1.1.2 Demonstration
In order to prepare a safety case for computer-based safety systems, it is necessary to produce documentary evidence demonstrating the following:
(i) The system performs its functions as specified.
(ii) The correctness of the software requirements specification is verified.
(iii) A well-defined process of development is followed right from the requirements analysis to the design to all the phases of testing.
(iv) Applicable standards are used during the development. This evidence is also required post-development if maintenance involves changes in software.
(v) It is ensured that each and every system functionality has been tested.
(vi) The verification and validation of each and every output of all the development phases are carried out.
(vii) The capability of monitoring and testing to ensure in-service integrity is established.
(viii) Software quality assurance (SQA) tasks and activities are carried out to ensure the quality of the software being developed, and SQA reports are generated.
(ix) All the outputs (code and documents) generated in the process are managed using a well-defined software configuration management (SCM) process. Note that SCM includes the software change control process as well.
5.1.2 Generating Evidences

The evidences against the demonstration areas, listed in Sect. 5.1.1.2, are generated at different phases of the software development life cycle.
The purpose is to:
(i) Identify the documentation steering:
• The development
• Verification and validation
• The use of the software
• Its maintenance
(ii) Prepare a list of a minimum number of documents, which are to be reviewed as well as audited.

Generating evidences and their documentation begins with a plan, and therefore it becomes a part of the software project management plan (SPMP). In order to ensure that the quality of the software under development meets the required level, the SPMP includes a software quality assurance plan (SQAP), introduced in Sect. 3.1.2. An SQAP ensures that the following activities are carried out and the corresponding documentary evidences are generated.
5.1.2.1 Plan Documents
(i) Software project management plan (SPMP) (refer to Sect. 3.1.1 and IEEE Std. 1058 [9])
(ii) Software development plan, which includes a description of the software engineering methods/procedures/tools
(iii) Software quality assurance plan (SQAP) (refer to Sect. 3.1.2 and IEEE Std. 730), which includes a description of the software development standards (refer to Chap. 4)
(iv) Software verification and validation plan (SVVP) (refer to Sect. 3.1.3 and IEEE Std. 1012 [133])
(v) Software configuration management plan (refer to Sect. 3.1.4 and IEEE Std. 828 [16])
(vi) Software safety plan (refer to IEEE Std. 1228)
5.1.2.2 Documents Pertaining to Requirements Analysis and Design
(i) Software requirements specification (SRS) (refer to IEEE Std. 830 [12])
(ii) Software design description (SDD) (refer to IEEE Std. 1016 [17])
5.1.2.3 Verification and Validation Reports
Verification and validation (V&V) and the documentation of the V&V results are essential for preparing a safety case. The detailed description of V&V activities to be conducted is documented in the software verification and validation plan (SVVP).
The verification and validation activities are interrelated, and their process outputs are used by each other to prepare a better case for qualification, as this helps establish better criteria for completion and for analysis, evaluation, review, inspection, assessment and testing [133]. In this context, it is to be noted that nuances in the descriptions given by various standards and guides can blur the distinction between the terms verification and validation if they are not interpreted carefully, critically and contextually. However, the key points that distinguish the one from the other are their primary objectives.

Objectives of Verification and Validation Activities

(i) Verification activities are carried out to ensure that the product has been developed correctly, i.e. following the right process as agreed upon. The goal is to provide objective evidences that the product, throughout its development life cycle, (i) conforms to the correctness, completeness, consistency and accuracy requirements; (ii) conforms to the relevant standards, practices and conventions; and (iii) ensures that all the life cycle activities are completed following the laid-down criteria [133].
(ii) Validation activities establish that the correct/right product has been developed. Thus, validation activities are associated with requirements and are aimed at providing evidence of whether (i) the system requirements are satisfied; (ii) the user requirements, e.g. the operational environment, are satisfied; and (iii) the modeling of physical laws, wherever applicable, is done correctly [133].
Verification activities involve a review of the SDLC documents corresponding to the requirements, design, code and testing, which also includes the following:
(i) Checking the traceability of the software requirements (SR) to the user requirements (UR) and its completeness
(ii) Checking the traceability of the software design (SD) to the software requirements (SR) and its completeness
(iii) Checking the traceability of the code to the software design (SD) and its completeness
(iv) Checking the correctness of algorithms and formal proofs, as applicable

The result reports of the minimum verification activities include the following:
(i) Review report of the user requirements document (URD)
(ii) Review report of the software requirements specification (SRS)
(iii) Review report of the software design description (SDD)
(iv) Code review report
(v) Review of the software unit test plan and (test) reports (SUTP/R)
(vi) The software integration test plan and (test) reports (SITP/R)
5 Qualification of Safety System Software
Validation activities include the following:
(i) System validation testing
(ii) User acceptance testing

The corresponding documents to be generated and reviewed are the following:
(i) The system validation plan and test report (SVP/TR)
(ii) The acceptance test plan and report (ATP/R)

Audits of all the above documents involving software V&V activities are also carried out, and the audit reports are generated.
5.1.2.4 Document Related to Software Use
Software user guide
5.1.2.5 Software Maintenance
In accordance with the maintenance plan, it is necessary to carry out the following tasks and generate the corresponding documentation:
(i) Reporting of software change requests (SCR)
(ii) Review of the SCR and its impact analysis
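The content of an SCR is not prescribed here, but the workflow above can be pictured as a small data structure; the field names below are illustrative, not mandated by any standard:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a software change request (SCR) record.
@dataclass
class ChangeRequest:
    scr_id: str
    description: str
    affected_items: list          # configuration items touched by the change
    impact_analysis: str = ""     # filled in during review
    approved: bool = False

    def ready_for_implementation(self) -> bool:
        # A change may proceed only after its impact analysis is
        # documented and the review has approved it.
        return bool(self.impact_analysis) and self.approved

scr = ChangeRequest("SCR-042", "Raise trip threshold", ["SDD", "source code"])
print(scr.ready_for_implementation())  # False until reviewed and approved
```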
5.2 Verification and Validation Processes

In this section, the verification and validation activities (identified in Sect. 5.1.2.3) will be elaborated further to help identify a comprehensive things-to-do list from a practitioner's perspective.
5.2.1 The Art of Verification

The verification process involves (i) reviews, (ii) tracing and (iii) formal proof (wherever practical).

5.2.1.1 Review
In order to evaluate an element pertaining to software, three kinds of reviews are carried out, viz. technical review, walk-through and inspection. A software element
(a configuration item, popularly known by its acronym CI) can be a document, a source code or a module. Technical review and the review reports provide evidence that a particular software element:
(i) Conforms to the specifications documented in the previous phases
(ii) Has been produced following the standards and procedures identified in the plan documents
(iii) Has incorporated the changes in accordance with the document change record (DCR) or software change record (SCR), as applicable, supported by the impact analysis of the change

A walk-through can literally mean a tour from one end to the other of a document, a model or a proposed solution (design and code). A walk-through helps identify defects and deficiencies at an early stage of the requirements analysis and design phase, where possible solutions are discussed and debated before taking a decision. For example, a trade-off between computational efficiency (time) and accuracy in selecting an algorithm for a given system requirement demands discussion.

Software inspection is also aimed at identifying defects but does not discuss solutions. It is mainly carried out to find any non-conformance with the applicable standards and specifications. For example, non-conformance to modified condition/decision coverage (MC/DC) in a level A avionics software unit test is a violation that can be identified by inspection.
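As an illustration of what MC/DC demands, the following sketch checks, for a small invented decision a AND (b OR c), whether each condition can independently affect the outcome: for every condition it searches for a pair of test vectors that differ only in that condition and flip the decision. Real coverage tools work on instrumented code rather than truth tables; this is only a conceptual sketch.

```python
from itertools import product

# Invented decision used for illustration: d = a and (b or c).
def decision(a, b, c):
    return a and (b or c)

def mcdc_pairs(cond_index, vectors):
    # Pairs of vectors that differ only in condition 'cond_index'
    # and produce different decision outcomes.
    pairs = []
    for v1, v2 in product(vectors, repeat=2):
        differs_only_here = all(
            (v1[i] == v2[i]) == (i != cond_index) for i in range(3))
        if differs_only_here and decision(*v1) != decision(*v2):
            pairs.append((v1, v2))
    return pairs

vectors = list(product([False, True], repeat=3))
for i, name in enumerate("abc"):
    # MC/DC is achievable for a condition iff at least one such pair exists.
    print(name, "independently affects the decision:", bool(mcdc_pairs(i, vectors)))
```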
5.2.1.2 Tracing
Tracing is mainly aimed at (i) justifying the existence and (ii) assuring the completeness of a software element (SDLC documents and code) during every phase of development. The software requirements need to be traced back to the user requirements. Similarly, the software design is to be traced back to the requirements, and the code back to the design. However, backward tracing alone is not enough for verification, because it may fail to detect that a requirement of the previous phase has been left out in the present phase, leading to incompleteness. Therefore, both forward traceability and backward traceability are essential to demonstrate the completeness and adequacy of a software element. For example, forward traceability of each and every requirement to the design phase ensures that every requirement has been considered in the design, while backward traceability finds out the existence of superfluous design considerations, if any, which are not traceable to any requirement. Backward traceability associates with each item a statement justifying its existence. It can also reveal that a requirement itself is not complete, which can and should be resolved.
Table 5.1 Traceability list and the corresponding documents

Traceability (and the corresponding documents):
(i) User requirements traced to software requirements and vice versa: URD and SRS
(ii) Software requirements traced to component descriptions and vice versa: SRS and SDD (ADD, if a separate document exists)
(iii) Software requirements traced to software functional modules and vice versa: SRS and SDD
(iv) Unit test items traced to software functional modules: SUTP and SDD
(v) Integration test items traced to architectural components and vice versa: SITP and SDD (ADD, if a separate document exists)
(vi) System tests traced to software requirements and vice versa: SVP and SRS
(vii) Acceptance tests traced to user requirements and vice versa: ATP and URD
Traceability is normally documented in the form of matrices showing cross-references between the inputs to a phase and its outputs. A list of the required minimum traceability and the corresponding documents is shown in Table 5.1.
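The forward/backward check described above can be sketched in a few lines of code; the requirement and design identifiers below are invented for illustration:

```python
# Minimal sketch of bidirectional traceability checking between two phases.
requirements = {"SR-1", "SR-2", "SR-3"}
design_trace = {            # design element -> requirements it satisfies
    "SD-A": {"SR-1"},
    "SD-B": {"SR-2"},
    "SD-C": set(),          # superfluous: traces back to nothing
}

covered = set().union(*design_trace.values())
# Forward traceability: every requirement must reach some design element.
missing = requirements - covered
# Backward traceability: every design element must justify its existence.
superfluous = [d for d, reqs in design_trace.items() if not reqs]

print("Requirements with no design element:", sorted(missing))
print("Design elements with no requirement:", superfluous)
```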
5.2.1.3 Formal Proof
Providing a formal proof of the correctness of software can be considered the best approach for obtaining regulatory approval. It takes well-defined semantics and a calculus to construct a proof. Formal methods can be applied with two different objectives: (i) generating a proof that a given code meets a set of requirements and (ii) generating code from a model developed from the requirements specification. In the second case, the generated code can be claimed to be correct by construction. The correctness of code can be proved using techniques like bounded model checking [158] and Hoare logic [115], among others. Code generation from requirements calls for formal specifications using formal languages, which include Lustre [108] and SCADE [1]. Though a formal proof is very attractive and makes a strong case for obtaining regulators' approval, it can be challenging, and it is still an active area of research. Formal design and verification techniques are introduced in Sect. 5.5 of this chapter. The topics of the automated synthesis of code and the development of a qualified platform deserve further discussion and are therefore taken up in Chaps. 6 and 7, respectively.
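To give a flavour of what a proof obligation asserts, the following toy sketch checks a Hoare-style triple {pre} clamp {post} exhaustively over a small finite domain; real proof tools discharge such obligations symbolically, not by enumeration, and the program and conditions here are invented:

```python
# Toy Hoare triple {pre} clamp {post}, checked by exhaustive enumeration
# over a small, finite input domain.
def clamp(x, lo, hi):                            # program under verification
    return max(lo, min(x, hi))

pre = lambda x, lo, hi: lo <= hi                 # precondition
post = lambda x, lo, hi, r: lo <= r <= hi        # postcondition

domain = range(-5, 6)
ok = all(post(x, lo, hi, clamp(x, lo, hi))
         for x in domain for lo in domain for hi in domain
         if pre(x, lo, hi))
print("Triple holds on the sampled domain:", ok)  # True
```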
5.2.2 Validation

As already discussed in Sect. 5.1.2.3, validation is carried out to generate evidence that the requirements are satisfied by the product as specified. The first step in requirements validation can be the process of generating evidence that the user requirements have been interpreted correctly by the software engineer [71]. This starts with the review of the software requirements specification and its validation against the user requirements document. Prototyping is another method of validation, which can reveal any wrong assumptions made by software engineers, especially about safety-critical requirements. Further, the quality of the requirements model needs to be validated during the requirements analysis phase. This is known as model validation. The model can be a UML [184]/SysML [185] object model or scenario, which can be validated against specific user requirements, e.g. the operator interface or the field input (data) interface. If formal specifications are used, the specifications of the required safety properties can be validated by formal reasoning.
The system validation test is carried out at the developer's end to ensure that the system meets the specified requirements. Many non-functional requirements like response time, reliability and security can be tested at the system level only. Validation testing not only generates evidence of system performance but also prepares the ground for the final acceptance by the user.
The acceptance test is carried out by the user to determine whether the system satisfies the laid-down acceptance criteria, which essentially concern the system behaviour/performance against the user requirements (UR). Acceptance testing can be conducted with or without involving the software/system developers.
5.2.3 Testing

Testing is involved in both the verification and validation processes. Testing is the process of evaluating a system, and it is applied during the various phases of development as well as at the end of it, i.e. before delivery and final acceptance. Furthermore, testing is also essential during software maintenance.
Testing can be broadly classified into two categories: construction-level testing and system-level testing. Any software maintenance involving changes will include both of these categories of testing. Furthermore, regression testing is also necessary to ensure that there is no ripple effect due to changes in the software. In other words, it is necessary to verify that the modifications have not impaired the functionality and performance of the software.
Construction-level testing involves:
(i) Unit testing
(ii) Integration testing
Table 5.2 Software development life cycle and testing

Phase in which the test plan is prepared, and the corresponding test (plan/report):
(i) User requirements definition: acceptance test (ATP/ATR)
(ii) Software requirements specification: system validation test (SVP/SVR)
(iii) Architectural design: integration test (SITP/SITR)
(iv) Detailed design and implementation: unit test (SUTP/SUTR)
System-level testing involves:
(i) System validation testing
(ii) Acceptance testing

Testing can be carried out manually or automatically with the help of tools in order to meet the objectives of (i) confirming that the requirements are satisfied as specified and (ii) identifying the differences between the expected and the actual system behaviour that may occur [71].

SDLC and Testing
The timelines of documenting test plans, the test design and its execution at various phases of the software development life cycle (SDLC) can be derived directly from the V-model of development presented in Sect. 3.1.7.4 and are summarized in Table 5.2.
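As a minimal illustration of construction-level and regression testing, the sketch below uses Python's unittest module as a stand-in for whatever test framework a project mandates; the unit under test and its threshold value are invented:

```python
import unittest

# Hypothetical unit under test: a simple threshold check.
def over_threshold(reading, threshold=100):
    return reading >= threshold

class TripLogicTests(unittest.TestCase):
    def test_nominal(self):
        self.assertFalse(over_threshold(99))

    def test_boundary(self):
        # Regression guard: an earlier (hypothetical) defect used '>'
        # instead of '>=' at the boundary.
        self.assertTrue(over_threshold(100))

# Re-running the whole suite after every modification is the essence of
# regression testing: previously passing behaviour must keep passing.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TripLogicTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("All tests pass:", result.wasSuccessful())
```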
5.3 Safety Case Docket and Process Objectives

Standards and guides in all the safety-critical system domains emphasize the need to follow a well-defined development process, adherence to the standards and verification at every stage of design and production. Regulatory authorities demand documentary evidences of the same as the basis for the certification/licensing of computer-based systems performing safety functions. However, the avionics standard DO-178B and its latest version DO-178C are unique in that they identify and define a set of objectives against the processes to be followed for compliance. The five DO-178C development processes are:
(i) Software planning
(ii) Software development
(iii) Software verification
(iv) Software configuration management
(v) Software quality assurance (SQA)
In addition, DO-178C defines the objectives of the software certification liaison process, which is not discussed here as the focus of the book is the development of safety-critical systems.
It is important to note that the process objectives defined by DO-178C are equally applicable to the safety systems in other domains like nuclear power plants.
A minimum list of documentary evidences to build a safety case, which we call a safety case docket, along with (i) the corresponding SDLC activities and (ii) the associated process objectives is presented in Tables 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, and 5.10 in the following sections.
5.3.1 Safety Case Documents: Planning

The required minimum number of documents for developing the safety case pertaining to software development planning is shown in Table 5.3. The table also shows the corresponding SDLC activities against each document and the applicable DO-178C objectives associated with the same.
5.3.2 Safety Case Documents: Development

The minimum set of documents pertaining to software development is shown in Table 5.4. The correspondence between the documents and the SDLC activities, along with the associated DO-178C objectives, is also presented in the table.
5.3.3 Safety Case Documents: V&V of Requirements

The list of required documentary evidences pertaining to the verification and validation (V&V) of requirements, along with the associated DO-178C objectives, is presented in Table 5.5.
5.3.4 Safety Case Documents: V&V of Design

The SDLC documents pertaining to the verification and validation of the software design are presented in Table 5.6. The DO-178C objectives corresponding to each of these documents are also shown in the table.
Table 5.3 Safety case docket: software planning

Document/deliverable: Software project management plan (SPMP); software quality assurance plan (SQAP)
SDLC activities: Software planning, including (i) identification of the documentation standard, design standard, programming guidelines and standards, and testing standards and practices; (ii) identification of the tools, techniques and methodologies to be used during the development life cycle
Applicable DO-178C objectives: (i) Software life cycle process activities are defined; (ii) the software development life cycle environment is selected and defined; (iii) the software development plan complies with this document

Document/deliverable: Software configuration management plan (SCMP)
SDLC activities: Version control
Applicable DO-178C objectives: The development and revision process of software is coordinated

Document/deliverable: Software verification and validation plan (SVVP)
SDLC activities: Software V&V planning
Applicable DO-178C objectives: Software life cycle process activities are defined

Document/deliverable: Acceptance test plan (ATP)
SDLC activities: Defining the (user) acceptance test criteria

Document/deliverable: System validation plan (SVP)
SDLC activities: Defining the system validation criteria from the software aspects of the system requirements document (SRD)

Document/deliverable: Software integration test plan (SITP)
SDLC activities: Software architectural design

Document/deliverable: Software unit test plan (SUTP)
SDLC activities: Software detailed design
5.3.5 Safety Case Documents: V&V of Implementation

The list of safety case documents pertaining to the V&V of the software implementation and its validation against low-level as well as system-level requirements, as specified under the DO-178C objectives, is shown in Table 5.7.
5.3.6 Safety Case Documents: Verification of Verification (Code Review)

The documents pertaining to the verification of the code review and test reports are shown in Table 5.8 along with the corresponding DO-178C objectives.
Table 5.4 Safety case docket: software development

Document/deliverable: User requirements document (URD)
SDLC activities: User requirements documentation and defining acceptance test criteria
Applicable DO-178C objectives: High-level requirements are developed

Document/deliverable: Software requirements specification (SRS)
SDLC activities: Software requirements documentation and planning for system validation (software aspects)
Applicable DO-178C objectives: Derived high-level requirements are defined and provided to the system processes, including the system safety assessment process

Document/deliverable: Software architectural design (SAD)*
SDLC activities: Design of the software architecture and planning for the software integration test
Applicable DO-178C objectives: Software architecture is developed

Document/deliverable: Software detailed design (SDD)
SDLC activities: Detailed design of the software and planning for the software unit test
Applicable DO-178C objectives: Derived low-level requirements are defined and provided to the system processes, including the system safety assessment process

Document/deliverable: Source code
SDLC activities: Software implementation
Applicable DO-178C objectives: Source code is developed

Document/deliverable: Executable code
SDLC activities: Executable code generation
Applicable DO-178C objectives: Executable code and parameter data item files, if any, are produced and loaded in the target hardware

* SAD can be either a separate document or a part of the software detailed design document
5.3.7 Safety Case Documents: Configuration Management

The safety case documentation related to the process of software configuration management (SCM) and the applicable DO-178C objectives are presented in Table 5.9.
5.3.8 Safety Case Documents: SQA

Table 5.10 shows the documents related to SQA (software quality assurance), which are essential for developing the safety case. The applicable DO-178C objectives are also presented in the table.
Table 5.5 Safety case docket: verification and validation of requirements

Document/deliverable: Review report of the user requirements document (URD); review report of the acceptance test plan (ATP)
V&V activities: Review of URD; review of ATP
Applicable DO-178C objectives: High-level requirements (i) comply with system requirements; (ii) are correct and consistent; (iii) are compatible with the target computer; (iv) are verifiable; (v) conform to standards; (vi) are traceable to system requirements; and (vii) their algorithms are correct

Document/deliverable: Review report of the software requirements specification (SRS)
V&V activities: Review of SRS
Applicable DO-178C objectives: Derived and low-level requirements (i) comply with system requirements; (ii) are correct and consistent; (iii) are compatible with the target computer; (iv) are verifiable; (v) conform to standards; (vi) are traceable to user requirements; and (vii) their algorithms are correct
5.4 Software Audit

A software audit is essential before releasing software. Auditing is the process of conducting independent reviews to evaluate the compliance of software products and processes (planning, requirements specification, design and development, verification and validation, quality assurance and configuration management) with the applicable standards and guides as well as contractual and licensing requirements. The process involves both physical and functional audits.
A physical audit checks that all the identified configuration items (documents, source code, executables, databases, etc.) actually exist as baselines. In addition, it checks whether the processes are followed in compliance with the applicable standards.
A functional audit checks that the unit tests, integration tests and system validation tests have been carried out and that their successes and failures are documented.
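One step of a physical audit can be pictured as a simple set comparison between the baselined configuration items and the artefacts actually present; the item names below are hypothetical, and a real audit would query a configuration management repository:

```python
# Sketch of a physical audit step: every baselined configuration item
# must exist as an actual artefact.
baseline = {"URD", "SRS", "SDD", "source code", "executable"}
repository = {"URD", "SRS", "SDD", "source code"}   # artefacts actually present

missing = sorted(baseline - repository)
print("Configuration items missing from baseline:", missing)  # ['executable']
```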
Table 5.6 Safety case docket: verification and validation of design

Document/deliverable: Review report of the software architectural design (SAD)
V&V activities: Review of the software architecture
Applicable DO-178C objectives: Software architecture is (i) compatible with high-level requirements; (ii) consistent; (iii) compatible with the target computer; (iv) verifiable; and (v) consistent with the standards. One more objective, mainly applicable to the avionics industry, is to ensure that the software partitioning integrity is confirmed

Document/deliverable: Review report of the software detailed design (SDD)
V&V activities: Review of SDD
Applicable DO-178C objectives: (i) Algorithms are accurate; (ii) low-level requirements are verifiable; (iii) the parameter data item (PDI) file is correct
5.5 Formal Specification and Verification of Software

Traditional approaches rely mainly on simulation and testing to demonstrate that a system will perform its safety functions correctly. However, this cannot give any firm guarantee of correctness with respect to the specified requirements. By contrast, in the formal method-based design approach, the requirements are first expressed in a suitable mathematical logic. This allows rigorous formal analysis of the requirements themselves, such as checking their correctness, consistency and realizability. Consistency pertains to the absence of contradiction among the different elements of the requirements; it ensures that at least one behaviour (i.e. sequence of inputs and outputs) satisfies the requirement. A more useful quality is the realizability of a requirement, which states that at least one implementation exists which fully satisfies the requirement. Realizability demands that for every input sequence, there exists an output sequence which satisfies the requirement. The formal verification of a system against specified requirements is well established [94]. Furthermore, the automatic synthesis of a system from logic-based requirements is feasible, and it is currently an active area of research. However, despite these advantages, many existing formal methods also have a few disadvantages:
(a) Often the logical notations used are cumbersome, complex and far removed from practically usable visual notations for the requirements specification.
Table 5.7 Safety case docket: verification and validation of implementation (coding)

Document/deliverable: Review report of the software integration test plan (SITP)
V&V activities: Review of SITP
Applicable DO-178C objectives: Software code complies with the software architecture

Document/deliverable: Source code review report
V&V activities: Source code review
Applicable DO-178C objectives: Software code (i) complies with low-level requirements; (ii) complies with the software architecture; (iii) conforms to standards; (iv) is verifiable and traceable to low-level requirements; (v) is accurate and consistent; (vi) the output of the software integration process is complete and correct; (vii) the parameter data item (PDI) file is correct and complete; (viii) the PDI file is verified

Document/deliverable: Review report of the system validation test (SVT) report; review report of the acceptance test report
V&V activities: System validation test; review of the user acceptance test
Applicable DO-178C objectives: Executable object code (i) complies with high-level as well as low-level requirements; (ii) is robust with high-level as well as low-level requirements; (iii) is compatible with the target hardware
(b) Scalability is a major concern in the applicability of formal methods because of the infamous state space explosion problem.
(c) The automatic synthesis methods and tools from logical specifications do not really consider the quality of the synthesized controller.

In this section, we will only introduce formal methods by answering the following questions:
(i) What is formal specification, and how does one specify formally?
(ii) What is formal verification, and what are the practical approaches to applying formal verification techniques?
Table 5.8 Safety case docket: the verification of verification (code review) process

Document/deliverable: Review report of the software unit test plan (SUTP); source code review report
V&V activities: Review of SUTP; source code review
Applicable DO-178C objectives: (i) Test procedures are correct; (ii) test results are correct and discrepancies are explained; (iii) test coverage of low-level requirements is achieved; (iv) test coverage of the software structure is achieved, pertaining to (a) modified condition/decision coverage, (b) decision coverage and (c) statement coverage; (v) test coverage of the software structure is verified
Further discussions on this topic focusing on the formal model-based design, its verification and the automated synthesis of code will be taken up in Chap. 6. It will also introduce scalable verification techniques such as run-time monitoring and the synthesis of run-time enforcement shields.
5.5.1 Formal Verification Approaches

Two distinct approaches to formal verification are necessary to meet the requirements of real-world applications. These are:
(i) Verification of the implementation with respect to the requirement
(ii) Automatic synthesis of the implementation from requirements specified in formal languages
Table 5.9 Safety case docket: the software configuration management process

Document/deliverable: Review report of the software configuration management plan (SCMP)
V&V activities: Review of SCMP
Applicable DO-178C objectives: (i) Configuration items (CI) are defined; (ii) baselines and traceability are established; (iii) problem reporting, change control, change review and configuration status accounting are established; (iv) archive, retrieval and release are established; (v) software load control is established; (vi) software life cycle environment control is established
Table 5.10 Safety case docket: software quality assurance

Document/deliverable: Software quality assurance (SQA) reports; software verification and validation (V&V) reports
V&V activities: Internal audit of SQA reports; internal audit of V&V reports
Applicable DO-178C objectives: Assurance is obtained that (i) software plans are available and reviewed for compliance with the SQAP; (ii) applicable standards are identified as per the SQAP; (iii) software life cycle processes comply with software plans; (iv) software life cycle processes comply with software standards; (v) transition criteria for the software life cycle processes are satisfied; (vi) the software conformity review is conducted
Verification of Implementation
There are two techniques for the verification of an implementation: model checking and theorem proving. Of the two, model checking is the more widely used, owing to the availability of automated analysis tools that are applicable to a large subset of real-world problems. In this section, we introduce this verification technique briefly.
In this approach, the implementation is either written manually or generated from domain-specific application programming languages. For example, in the case of a programmable controller, the application program is written using one of the programming languages specified in IEC 61131-3. Once the implementation is available, verification proceeds by defining the properties it is expected to satisfy.

Synthesis of Implementation
In this approach, the properties expected to be satisfied by the implementation are specified using a formal language, and the implementation is generated (synthesized) automatically. However, this is still an active area of research, and the available synthesis techniques are not yet mature enough to be widely applicable to practical systems.
5.5.2 Verification Using Model Checking

The aim of formal verification techniques is to guarantee mathematically that the system under consideration satisfies the desired requirements. Formal verification involves three entities:
1. The model of the system
2. The requirement to be satisfied by the model
3. The methods/tools for checking if the model satisfies the requirement

In the following sections, each of these entities is discussed briefly.
5.5.2.1 The Model of the System

The formal model of a system is an abstraction, which can be represented as a mathematical structure such as an automaton capturing the temporal evolution of the system. In general, the model is described using some logic-based specification, which is then translated to an equivalent automaton; the model can also be represented directly as an automaton or extracted from an existing implementation. It is important to note that an automaton-based representation of the system enables tool development for the automated analysis of reachable states, which is used to check the satisfiability of the required property specification. Interested readers can refer to [103, 178] for the formal relationship between logic and automata, which is used by the tools.
Formally, the model M is represented by the triple (S, →, L), where S is the set of states; → is a transition relation on S such that every s ∈ S has some s' ∈ S with s → s'; and L is a labeling function from S to the power set 2^AP, where AP is a set of atomic propositions. Thus, the function L maps each state s ∈ S to the set of atomic propositions which hold in s.
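The triple (S, →, L) can be rendered directly as data; the states, transitions and labels below are invented for illustration, and the reachability routine shows the kind of exploration a model checker automates:

```python
# The triple (S, ->, L) as Python data.
S = {"s0", "s1", "s2"}
R = {"s0": {"s1"}, "s1": {"s0", "s2"}, "s2": {"s2"}}   # transition relation
L = {"s0": set(), "s1": {"req"}, "s2": {"grant"}}      # labeling function

# The totality requirement: every state must have at least one successor.
assert all(R[s] for s in S)

def reachable(start):
    # States reachable from 'start': the set a model checker explores.
    seen, frontier = {start}, [start]
    while frontier:
        for nxt in R[frontier.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

print(sorted(reachable("s0")))  # ['s0', 's1', 's2']
```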
5.5.2.2 Specification of Requirements or Property
The second element of formal verification is the property or requirement which the model is supposed to satisfy. Model checking is a technique to check the satisfiability of a temporal specification against the model. We need a property specification language which can designate the set of good states in the automaton, i.e. those states which satisfy the required property. Several temporal logics have been studied in the literature for the specification of desired properties, but we will concentrate on the most widely used one, linear temporal logic (LTL).

Linear Temporal Logic (LTL)
LTL describes the evolution of system variables with time. Time is discrete and measured in clock cycles or steps of computation. Thus, a behaviour is an infinite sequence of states. The model specifies all the behaviours which can arise in system execution. Unlike in propositional or predicate logic, the truth value of a formula in temporal logic is not fixed; it varies with the time point inside the model [118, 192]. A formula in LTL allows us to refer to future time points. For example, a desirable property of the system model could be that it never enters a deadlock, i.e. no execution starting from the initial state should be able to reach a state where deadlock is true.
Syntax of LTL Formula
The syntax of an LTL formula φ over a finite set of atomic propositional variables AP is given by

φ := ⊤ | ⊥ | p | (¬φ) | (φ ∨ φ) | (φ ∧ φ) | (Gφ) | (Xφ) | (Fφ) | (φ_1 U φ_2)

where p ∈ AP and ⊤, ⊥ represent the truth values true and false, respectively. The operators ¬, ∨ and ∧ have their usual meaning. The temporal operators G, X, F and U are read as Globally, neXt, eventually (Finally) and Until, respectively.

Semantics of LTL Formula
The formal semantics of these operators can be found in [118, 192]. Here, we give an intuitive description of each of the temporal operators. The truth value of an LTL formula is evaluated over a path in the model M = (S, →, L). A path ψ in M is an infinite sequence of states, i.e. ψ = s_1, s_2, s_3, ..., such that s_i → s_{i+1} for every i ≥ 1. We write ψ^i for the suffix of ψ starting from state s_i. Conceptually, the truth of a formula φ at time point i in ψ is encoded as the truth of φ in the suffix ψ^i. The satisfaction relation of an LTL formula φ over a path ψ is given as follows:
• Path ψ satisfies ⊤ and violates ⊥.
• Path ψ satisfies p ∈ AP iff p ∈ L(s_1).
Fig. 5.1 Visual representation of a path satisfying an LTL formula
• Path ψ satisfies Gφ iff φ holds on every suffix path of ψ (see Fig. 5.1a).
• Path ψ satisfies Xφ iff φ holds for the suffix path starting from the next state (see Fig. 5.1b).
• Path ψ satisfies Fφ iff φ holds for at least one suffix path of ψ (see Fig. 5.1c).
• Path ψ satisfies φ_1 U φ_2 iff φ_1 holds at least until φ_2 becomes true at some future state in ψ (see Fig. 5.1d).
• The satisfaction of an LTL formula over the logical connectives ¬, ∨ and ∧ takes its usual meaning.

Definition 5.1 For a model M = (S, →, L), a state s ∈ S and a given LTL formula φ, M satisfies φ in state s iff every execution path ψ starting from s satisfies φ.

Safety and Liveness Properties
In general, LTL properties can be categorized as either safety properties or liveness properties:
1. A safety property specifies that the execution never ends up in a bad state. For example, in a system with concurrent processes, the system never enters a deadlock state. Assuming that the propositional variable deadlock is true iff there is a deadlock in the system, this property can be expressed as the following LTL formula:

G(¬deadlock)    (5.1)
2. A liveness property specifies that something good happens eventually. For example, in a system with concurrent processes, whenever any process requests a critical resource, it will eventually be granted that resource. Assuming that there are n concurrent processes, req_i represents that the ith process has requested a critical resource and grant_i represents that the ith process has been granted the critical resource. This property can be expressed as the following LTL formula:

G(req_i ⇒ F grant_i), ∀ 1 ≤ i ≤ n    (5.2)
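Both kinds of properties can be exercised on concrete traces. The sketch below is our illustration, not from the text: it evaluates LTL formulas over a *finite* prefix of a path, which only approximates the infinite-path semantics but is enough to exercise the definitions, including the safety formula (5.1).

```python
# Minimal LTL evaluator over a *finite* trace (an approximation of the
# infinite-path semantics; assumes the prefix is long enough for the
# verdict to be meaningful).  A trace is a list of sets of atomic
# propositions; formulas are nested tuples.  'or' is analogous to 'and'.

def holds(phi, trace, i=0):
    """Does formula phi hold on the suffix trace[i:]?"""
    op = phi[0]
    if op == 'ap':                      # atomic proposition p in L(s_i)
        return phi[1] in trace[i]
    if op == 'not':
        return not holds(phi[1], trace, i)
    if op == 'and':
        return holds(phi[1], trace, i) and holds(phi[2], trace, i)
    if op == 'G':                       # phi holds on every suffix
        return all(holds(phi[1], trace, k) for k in range(i, len(trace)))
    if op == 'F':                       # phi holds on some suffix
        return any(holds(phi[1], trace, k) for k in range(i, len(trace)))
    if op == 'X':                       # phi holds from the next state on
        return i + 1 < len(trace) and holds(phi[1], trace, i + 1)
    if op == 'U':                       # phi1 holds until phi2 becomes true
        return any(holds(phi[2], trace, k) and
                   all(holds(phi[1], trace, j) for j in range(i, k))
                   for k in range(i, len(trace)))
    raise ValueError(op)

# G(not deadlock) -- the safety property (5.1) -- on a deadlock-free trace:
trace = [{'run'}, {'run'}, {'run'}]
safety = ('G', ('not', ('ap', 'deadlock')))
print(holds(safety, trace))             # True: no state satisfies deadlock
```

The propositions and trace here are invented for illustration; a model checker evaluates the same clauses symbolically over all paths of the model rather than one finite trace.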
5.5.2.3 Model Checking
For model checking, either the system can be directly represented as an automaton using some high-level language or it can be generated from a given logic-based specification. In the model checking of a temporal logic formula φ against a model M, the formulas may change their truth values as the system evolves from state to state along an execution path. The general approach adopted by model checking algorithms for an LTL formula φ over a model M = (S, →, L), where s ∈ S, consists of the following steps:

1. Construct an automaton equivalent to the LTL formula ¬φ, i.e. the automaton which accepts only those traces (paths) which satisfy the formula ¬φ. Thus, the automaton encodes all the traces which do not satisfy the formula φ. The theory behind the construction of an automaton for a given LTL formula is beyond the scope of this book; interested readers can refer to [103].
2. Take the product of the automaton constructed in the previous step with the system model M. The resulting automaton contains exactly those paths of M which do not satisfy φ.
3. Check whether there exists any path from state s in the resulting transition system. If no such path exists, then the formula φ is satisfied by model M; otherwise, the feasible path can be used as a counterexample.

The detailed algorithm for model checking LTL formulas is not presented in this book and can be found in [118]. The major challenge in model checking is dealing with state space explosion. As the number of state variables in the system increases, the state space of the model grows exponentially. Further, the product of the property automaton and the model gives rise to an exponential blowup in the state space on which the model checking is to be performed. This is called the state space explosion problem. The techniques used to deal with this problem and other model checking approaches are discussed in Sect. 6.5.1.
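The three-step recipe can be exercised on a toy case. For the safety formula Gp, the negated-formula automaton accepts exactly the traces that reach a state without p, so the product-and-emptiness check collapses to a reachability search. The sketch below is our simplification (not a full Büchi construction, which a general LTL checker needs for accepting-cycle detection); the model is invented.

```python
# Toy version of the automata-theoretic recipe, specialized to G p:
# the negation F(not p) is violated as soon as a reachable state
# without p is found, so the product/emptiness check reduces to a
# breadth-first reachability search that also records a counterexample.
from collections import deque

def check_Gp(trans, label, init, p):
    """Return (True, None) if M, init |= G p, else (False, counterexample)."""
    seen, queue = {init}, deque([[init]])
    while queue:
        path = queue.popleft()
        s = path[-1]
        if p not in label[s]:            # reachable bad state found:
            return False, path           # the path is a counterexample
        for t in trans.get(s, []):
            if t not in seen:
                seen.add(t)
                queue.append(path + [t])
    return True, None                    # no bad state reachable: M |= G p

# M: s0 -> s1 -> s2 -> s0, with p holding everywhere except s2
trans = {'s0': ['s1'], 's1': ['s2'], 's2': ['s0']}
label = {'s0': {'p'}, 's1': {'p'}, 's2': set()}
ok, cex = check_Gp(trans, label, 's0', 'p')
print(ok, cex)                           # False ['s0', 's1', 's2']
```

The returned counterexample path is exactly the artefact step 3 refers to: a concrete execution demonstrating why the property fails.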
5.5.2.4 Automated Synthesis of Implementation
The automatic synthesis of a program can be broadly categorized into two types: (i) functional synthesis and (ii) reactive synthesis. Functional program synthesis deals with the synthesis of functional programs, i.e. programs which accept inputs, process them and terminate after producing the results. Reactive programs, on the other hand, continuously interact with the environment; they scan inputs, process them and produce outputs in a cyclic manner, i.e. they do not terminate after computing the output in one cycle. Thus, the outputs in any cycle depend not only on the current inputs but also on the previous inputs and outputs. This is why a reactive system has states, which represent its temporal evolution. The synthesis of reactive systems from temporal logic has a special significance in the safety-critical application domain, as these systems are inherently reactive. The theory behind the synthesis of reactive systems from temporal logic specifications is well studied [70]. The synthesis of a reactive system assumes finite-state data structures but can generate non-terminating reactive programs, i.e. programs that transform an input data stream into an output data stream. The input specification is given by a temporal logic specification such as LTL. The aim of the synthesis algorithm is to produce a program that satisfies the specification for any sequence of inputs. The algorithms use techniques based on automata theory on infinite words [178] and automata-based game theory [103], wherein the synthesis algorithm can be considered as a two-player game. One player is the environment, which generates the inputs, and the other player is the controller, which generates the outputs such that the requirements are always met. The theoretical results proved by Rosner [198] show that the complexity of synthesis from LTL properties is doubly exponential in the size of the formula.
In order to deal with this complexity, the approach followed is to find the subsets of LTL which are expressive enough for specifying reactive systems and can be synthesized efficiently. The details of LTL-based synthesis algorithms are not within the scope of this book. The interested readers can refer to [70] for a survey of various LTL-based synthesis methods along with their applicability and complexity. However, in Chap. 6 we will discuss the automatic synthesis from regular properties specified using another formal language QDDC and show how effectively QDDC [187] can be used for formal verification and synthesis.
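The game-theoretic view can be made concrete for the simplest case, a finite-state safety game: the controller's winning region is a greatest fixed point. The sketch below is our toy illustration of that fixed-point computation (it is not an LTL synthesis algorithm, and the plant, inputs and outputs are invented for the example).

```python
# Safety game solved by a greatest fixed point (our toy sketch):
# the controller wins from a state if, for *every* environment input,
# it can pick *some* output whose successor stays in the winning set.

def winning_region(inputs, outputs, step, safe):
    """step(s, i, o) -> next state; safe: set of allowed states."""
    win = set(safe)
    while True:
        # keep states from which every input can be answered inside win
        nxt = {s for s in win
               if all(any(step(s, i, o) in win for o in outputs)
                      for i in inputs)}
        if nxt == win:
            return win                  # fixed point reached
        win = nxt

# Invented toy plant: a counter the environment may bump (+1) while the
# controller may reset it; the safety objective is "counter stays < 3".
step = lambda s, i, o: 0 if o == 'reset' else min(s + i, 3)
win = winning_region(inputs=[0, 1], outputs=['keep', 'reset'],
                     step=step, safe={0, 1, 2})
print(sorted(win))      # the controller can always reset, so [0, 1, 2]
```

A winning strategy read off such a fixed point is precisely a correct-by-construction controller for the safety part of a specification; handling full LTL additionally requires the automata constructions cited above.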
5.6 Summary and Takeaways

The qualification of a software-based system requires the generation of documentary evidence, known as building a safety case, that the system and the software have been built following an established and well-defined process. This involves compliance with the relevant standards and carrying out the necessary verification and validation activities at every stage of development. This chapter explains the above activities with a practical approach by offering a things-to-do list at every stage of development. The formal specification and formal verification approaches are introduced in this chapter, because evidence generated using such mathematically rigorous approaches can make a more convincing case in obtaining regulatory approval. This includes the introduction of formal verification and automated code generation, which play a very important role in software qualification. Considering that the topic of formal verification and automated code generation deserves more attention, Chap. 6 is included in this book, where the developments in the domain of model-based development and synthesis from a temporal logic called QDDC are discussed with a focus on automated software development approaches.
Review Questions

1. What do you understand by software qualification?
2. How is the process of software qualification different from hardware qualification?
3. What do you understand by building a safety case?
4. What are the steps involved in software qualification?
5. How do verification and validation help in qualifying software?
6. Can you identify and describe the various types of testing associated with the different SDLC phases?
7. What are the DO-178C objectives corresponding to various SDLC activities?
8. Can you correlate IEC 60880 recommendations (Sect. 4.4.3) with the corresponding DO-178C objectives (Sect. 5.3)?
9. What role can formal verification play in software qualification? How?
10. What are the steps involved in formal verification using model checking?
Chapter 6
Formal Modeling, Verification and Automated Synthesis
The most important property of a program is whether it accomplishes the intention of its user. – C.A.R. Hoare
The traditional software development approach follows the well-known V development model discussed in Chap. 3. The typical flow associated with the V development model is to first capture the requirements, followed by architecture/detailed design and then implementation. Simulation/testing is used to ensure the correctness of the design and implementation. However, testing can only reveal the presence of bugs; it cannot assure their absence in the design or implementation. Therefore, it is very difficult to provide firm guarantees using the traditional development approach. This chapter outlines the use of formal techniques and tools for the modeling and analysis of safety systems. When used judiciously, this approach can have a marked influence on system reliability. Moreover, such formal modeling and analysis is often supported by tools which can enhance the automated development of systems that are correct by construction. The central elements of the formal modeling and verification approach are a formal requirement and a high-level model, leading to a model-based design. Such a development involves the specification of requirements using formal notations which are based on a suitable mathematical logic or an abstract computational model such as an automaton. A formal specification provides a rigorous and mathematically unambiguous articulation of the desired properties of the system structure and behaviour. Moreover, a logical specification makes possible the automated analysis of requirements, such as checking for consistency and realizability. It is well established that inconsistencies, ambiguities and incompleteness in requirements are among the major causes of deep-seated bugs in critical systems. In model-based design, a high-level model capturing the structural and behavioural aspects of the system is constructed.
Such a model, being executable, can be subjected to simulation, testing and analysis for finding logical flaws in the system design. Moreover, in a separate phase of design (called the code generation),
the high-level model can be systematically transformed into an efficient implementation on available platforms and technologies. Often, such code generation can be carried out automatically (or semi-automatically) using reliable tools which ensure correct-by-construction implementation of the model. The twin elements of logical specification and high-level model also enable the formal verification of the model to ensure that under all circumstances, the model satisfies the logical specification. Extensive research has gone into developing techniques, algorithms and tools for such formal verification. Moreover, lightweight verification techniques such as run-time monitoring, symbolic simulation and concolic testing have been developed to address the issues of scalability of formal verification. The formal verification of the implementation against the specified requirements is well established in the literature and increasingly finding use in safety-critical industries. Furthermore, automatic synthesis of high-level model from logic-based requirements is also being attempted. Such synthesis can also incorporate quality enhancement by optimizing the expected frequency of occurrences of soft requirements.1 As an added benefit, the availability of a logical requirement and high-level model allows us to systematically incorporate robustness against partial failures of the environment/system right at the logical design phase. A pragmatically useful idea is to shield manually constructed systems with run-time enforcement shields enforcing critical requirements. This chapter gives an overview of the current practices useful in safety-critical applications, which provide firm guarantees on the implementation. Despite firm guarantees provided by the use of formal verification techniques, the traditional testing method of verification cannot be replaced completely by formal verification. This is because of the well-known state space explosion problem associated with it. 
Therefore, systematic testing is still one of the most important methods to validate the system. This chapter also covers the different kinds of testing and the recent advances in the area of automating the test suite generation process.
6.1 Formal Model-Based Design

A formal model-based design is an approach wherein a domain-specific formal modeling language is used to capture the functional and non-functional requirements of the system. This formal model provides a basis for verification and validation activities through provably correct guarantees on the behaviour of the model. Often, it is possible to automatically transform this high-level model into implementable code using automatic code generation techniques with high integrity. Therefore, such an approach is also called the correct-by-construction method of system design.
1 These are requirements which are not guaranteed to be satisfied in every cycle, but the frequency of satisfaction over a long period will be maximized.
The model-based design approach leads to more portable and maintainable designs, as a change in requirement can be introduced in the model directly. This model can be verified (via both simulation/testing and formal verification) before code is generated, thereby detecting logical design errors earlier in the development cycle. Automated code generation from the model keeps the implementation synchronized with the design. The model-based design methodology for the development of software used in safety class applications also conforms to the design guidelines recommended by the IEC 60880 standard [121]. The important criterion, however, is to select or design a modeling language which is suitable for the targeted application domain. It can be observed that most safety-critical embedded applications are reactive in nature. Reactive systems are systems which continuously interact with their environment rather than producing some final value upon termination. Another feature of such systems is that they consist of several concurrent activities addressing different parts of the system, and these activities must interact and synchronize. Typical examples of such systems include nuclear reactor power control systems, air traffic control systems, etc. The desirable features of a modeling language for such systems include the following.

Deterministic Behaviour Safety-critical system models are required to be deterministic, i.e. functionally, an identical sequence of inputs should give rise to identical behaviour (i.e. the sequence of outputs). Thus, the model should specify behaviour which is unaffected by the underlying implementation architecture, schedulers and run-time environments. This makes the system more understandable and predictable.

Parallelism Safety-critical systems are often highly parallel in nature, i.e. logically several tasks can be executed simultaneously. Systems may be composed of several concurrent sub-systems which may interact and synchronize.

Hierarchy A system may be made out of sub-systems, and each sub-system may further be composed from smaller sub-systems. The modeling language should promote such hierarchical and modular design. System components should be reusable and parameterized to avoid duplication.

Predictable Execution Time Almost all safety-critical systems are time-critical, which imposes stringent response time requirements on the systems.

Dependability Being safety-critical, the system should have very high reliability and availability, requiring very careful design and verification methods.
6.1.1 Need for the Model

The model of any system is an abstraction of the system behaviour. A computational model describes the behaviour of a system through the behaviour of its constituent components. As the system behaviour may belong to various broad categories
such as reactive, functional, concurrent, etc., an appropriate modeling paradigm should be selected to suit the application domain. There are various computation models having different characteristics, making these models suitable for a particular application domain. The relevant characteristics of safety systems and the corresponding computational models supporting these characteristics are presented in the following section.
6.1.1.1 Computational Models and Their Characteristics
The most widely used model for safety systems (which, by definition, are reactive systems) that continuously interact with the environment is the synchronous-reactive (SR)2 model of computation [108]. This model is designed for modeling applications with complicated control logic embedded in one or more periodic tasks occurring concurrently, while still preserving determinism. A fundamental concept in such concurrent systems is synchrony, which assumes that the system is able to react to all the events before any further event occurs. The important characteristics of SR models are the logical representation of concurrency and time, fault detection and recovery. Another important feature of reactive systems is that they operate on streams of data acquired periodically, which closely matches the data flow model of computation. Data flow models provide an abstraction for handling streams of data, such as periodic readings from a sensor in a process control system. Thus, a synchronous data flow model is data driven, and data flows through a network of operators at each logical clock cycle. By contrast, the event-driven SR model is good at handling sporadic data, which is consumed or used at an event. In the SR model, the reaction of a system is assumed to be simultaneous and instantaneous at each event (a tick of the clock or a logical notion of time).
The notion simultaneous represents the fact that all actors in this model react all at the same time and any dependency is inherently taken care of by the model. The term instantaneous means that outputs are produced as soon as inputs are received. Practically, the model assumes that the system is able to produce the output before any further input (event) occurs. This assumption is known as synchrony hypothesis, where it is assumed that actors are executing in no time and the communication between actors also takes zero time. This is similar to the design and analysis of digital circuits where it is assumed that the gate and wire delays are zero (negligible).
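Operationally, the synchrony hypothesis amounts to executing each reaction as one atomic step: read all inputs, compute all outputs as a pure function of state and inputs, then advance to the next tick. A minimal sketch of such a tick loop (our illustration; the `event`/`alarm` signals are invented):

```python
# A minimal synchronous-reactive execution loop (our sketch): at each
# logical tick the whole reaction -- reading inputs, computing outputs,
# updating state -- is treated as one atomic, "instantaneous" step.

def reaction(state, inputs):
    """Pure step function: (state, inputs) -> (state', outputs)."""
    count = state + (1 if inputs['event'] else 0)
    return count, {'alarm': count >= 3}     # alarm after 3 events

def run(ticks):
    """Deterministic by construction: same input sequence, same outputs."""
    state, log = 0, []
    for inputs in ticks:
        state, outputs = reaction(state, inputs)
        log.append(outputs['alarm'])
    return log

print(run([{'event': True}] * 4))   # [False, False, True, True]
```

Because the reaction is a pure function, the observable behaviour is independent of schedulers and run-time environments, which is exactly the determinism requirement stated earlier.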
2 The acronym SR used in this chapter should not be confused with software requirements.
Let us now discuss the desired features of a modeling language, which suit the requirements of a safety system design discussed in the previous section. (i) The modeling language should provide a determinism feature, which is achieved by formally defining the semantics of every construct of the language so that no ambiguity arises during the compilation/interpretation of the model. (ii) Modeling parallelism: • Data flow is a high-level paradigm in the domain of reactive programming. The systems are modeled by means of networks of operators which transform the flows (streams) of input data into streams of output data by means of equations or functions. • The data flow approach provides parallelism between equations. Each equation in the data flow model transforms an input stream into a stream of output independently. If there is data dependency, then the valid schedule which satisfies all the data dependency constraints is automatically figured out at compile time. Considering the system is to be implemented on a uniprocessor system, the parallelism is modeled at logical level, but it provides the application developer the freedom to specify what is required to be done instead of how it should be done. (iii) The requirement of execution time predictability can be met by synchronous languages.3 Such modeling languages provide primitives, which allow the developer to assume that their program is instantaneously reacting to external events. Thus, each internal event takes place at a known time with respect to the external events that occurred previously. This feature, along with the deterministic constructs, results in deterministic programs from both functional and temporal points of view. In practical terms, the synchrony hypothesis amounts to assuming that the program is able to react to an external event before any further external events occur. 
This is an important criterion as the synchrony hypothesis can only be met if the modeling language allows only those constructs which are guaranteed to terminate in bounded time. (iv) Achieving dependability: Declarative models allow for concise and high-level specifications of the system. This is because the developer only needs to specify what functions are expected from the system without specifying the details of how (in what sequence) the system will perform these functions. For example, a developer need not specify the sequence of execution of the specified function. The compiler automatically figures out the order based on data dependency. Following a formal data flow-based modeling approach provides the functional model with mathematically clean semantics. Such a model is easily comprehended by the programmer, and it can be well adapted to formal verification. This offers a high level of confidence in the model and thus dependability of the implementation in accordance with the formal model.
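The compile-time scheduling mentioned in point (ii) — deriving a valid execution order from data dependencies and rejecting circular ones — can be sketched as a topological sort over the equations. The variable names below are invented for illustration:

```python
# Compile-time scheduling of dataflow equations (our sketch): each
# equation names the variable it defines and the variables its
# right-hand side reads; a topological sort yields a valid sequential
# order, and a cycle (a causality error) is rejected.
from graphlib import TopologicalSorter, CycleError  # Python 3.9+

equations = {          # defined variable -> variables its RHS reads
    'y': {'x'},        # y = f(x)
    'z': {'x', 'y'},   # z = g(x, y)
    'x': set(),        # x is an external input
}
order = list(TopologicalSorter(equations).static_order())
print(order)           # dependencies come before their users

try:                   # a circular dependency is a causality error
    list(TopologicalSorter({'a': {'b'}, 'b': {'a'}}).static_order())
except CycleError:
    print('causality error: circular data dependency rejected')
```

This is the sense in which the developer specifies *what* is computed while the compiler decides *how* (in what order) it is computed.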
3 The languages which follow the synchrony hypothesis.
6.1.1.2 Uses of the Model
The high level of dependability required by safety systems can be realized by:

– A rigorous analysis of the functional requirements
– A correct-by-construction synthesis of the implementation from functional requirements
– A verification of whether the implementation faithfully captures the requirements

The correct choice of the model can help designers in each of these activities, including the automatic generation of validation artefacts (e.g. test suites) that can be submitted to the regulatory authorities for obtaining license/approval. This section focuses on each of the design activities in order to facilitate the design and development of the system to comply with the stringent regulatory requirements imposed on the safety system development industry.
6.1.2 Application Development with Model-Based Design Framework

A model-based design (MBD) framework allows system designers to express complex application logic as visual/textual models having formally defined semantics. The applications developed in such a framework can be synthesized into equivalent software programs in a low-level language such as C or C++. Model-based design provides graphical/textual modeling environments consisting of constructs such as functional block diagrams and state machines. The model is used to specify, analyse, simulate and synthesize the software algorithms within the embedded systems. Figure 6.1 shows the steps to be followed when developing applications with the model-based design framework in conformance with IEC 60880 [121]. The process starts with requirements specification, followed by the conceptual design and specification of I&C (instrumentation and control) systems and functions. Engineering tools then generate code automatically as per the specifications. The generated code is functionally validated against the specifications, and then compiled and deployed to the target system. The following are the advantages of model-based design (MBD):

1. The MBD framework can be used to generate code automatically. Automatic code generation eliminates human coding errors, keeping the model and the actual implementation consistent.
2. It is easy for domain experts to express designs as graphical models using MBD frameworks.
3. Model-based design helps reduce low-level testing effort, since one can verify designs during the early stages of development. This makes early bug removal possible and increases overall productivity.
Fig. 6.1 Application development with model-based design framework [121] (specification activities: requirement specification, conceptual design of I&C system, specification of I&C system and I&C functions; model-based design framework: engineering tools, automatic coding, compilation, integration on target system; validation activities: model-level validation, functional validation, system validation via test field tests and simulator/input signal transients)
However, it is important to note that software programs synthesized using a model-based design framework need to be verified to ensure that they are correct with respect to the specifications.
6.1.2.1 Formal Model-Based Development Language and Tool (SCADE)
One of the widely used tools in the industry which provides a model-based design environment with the features mentioned in Sect. 6.1.1.1 is the SCADE tool suite [1]. The SCADE tool provides a graphical modeling language with formally defined semantics based on the synchronous data flow model of computation discussed before. The SCADE tool suite also assists in validation and verification activities by facilitating the simulation of the model and through a coverage analyser module that determines the parts of the model covered during simulation. It also provides formal verification of critical properties using an SMT solver. Finally, SCADE has a qualified code generator which is certified to generate C or Ada code fully compatible with the model. Another important aspect is that SCADE models are intuitive and easy to understand, as they are built using graphical notations familiar to domain engineers. This simplifies the review of the models by control system engineers. The SCADE [1] modeling notation supports data flow equations to specify the logical relationship between inputs and outputs and safe state machines to describe the control modes of the system in terms of states and state transitions. Conceptually,
each clock tick has an operative set of data flow equations which are computed by combining equations from each concurrent component based on its current mode (state). There are transitions with conditions under which the change of mode (state) takes place. The SCADE language has a mathematically precise interpretation of the correct behaviour of a given model. A set of equations is evaluated in parallel, respecting the data dependencies between them. Circular data dependencies (called causality errors) are not permitted. Consider the example of a basic counter that counts up starting from 0. This can be expressed in the SCADE textual language as the following equation:

N = 0 -> pre (N) + 1;

In the above example, the variable N stores the value of the counter at any given time. The operator init, denoted by "->", is used for initialization. The operator pre applied to a variable refers to the value of that variable at the previous instant of time. Therefore, the above equation specifies that the initial value of variable N is 0, and thereafter, its value is one more than its previous value. A network of equations in SCADE can also be encapsulated as a new reusable operator, which is called a user-defined operator. The following example defines a node which outputs a sequence of natural numbers on the output stream N and a sequence of even numbers on the output stream M. The streams are synchronized so that at any instant the value on M is exactly twice the value on N.

node COUNTER () RETURNS (M,N:int)
let
  N = 0 -> pre (N) + 1;
  M = 2*N;
tel
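The stream semantics of `->` and `pre` in the COUNTER node can be imitated step by step. The following sketch is our illustration of the semantics, not of the code a qualified generator such as KCG would emit:

```python
# The SCADE/Lustre equations above define synchronized streams.  The
# first values of N = 0 -> pre(N) + 1 and M = 2*N can be computed cycle
# by cycle (our illustration of the stream semantics only).

def counter(cycles):
    outs, pre_n = [], None
    for _ in range(cycles):
        n = 0 if pre_n is None else pre_n + 1   # N = 0 -> pre(N) + 1
        m = 2 * n                                # M = 2*N, same instant
        outs.append((n, m))
        pre_n = n                                # N's value one cycle ago
    return outs

print(counter(4))    # [(0, 0), (1, 2), (2, 4), (3, 6)]
```

Note that M is computed from N *within the same tick* (the synchrony hypothesis), while pre reaches back exactly one tick.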
A model is described as a collection of such operators or nodes, including a MAIN node which transforms the input streams of data into the output streams. The MAIN node may invoke other nodes which provide a specific functionality. The SCADE suite provides a simulator module to test the model using simulation. Another module called SCADE MTC can be used to analyse the model coverage, based on structural coverage criteria such as MC/DC or branch coverage. Support for formal verification is provided by the SCADE design verifier. It allows the safety (invariance) properties of nodes to be model checked. After having the application verified satisfactorily using simulation and formal verification, the final step is to automatically generate the functionally equivalent program. SCADE provides a certified C code generation module called KCG for this purpose. The generated C code is then cross-compiled using a suitable cross-compiler for the desired target.

SCADE Example (A Bus Arbiter) A synchronous bus arbiter, sharing a critical resource among n devices, has n request lines req_1, . . . , req_i, . . . , req_n as input and acknowledgement lines ack_1, . . . , ack_i, . . . , ack_n as output. At any clock cycle, a subset of the request lines can be high, signifying that the requesting devices want access to the shared bus for
Fig. 6.2 Arbitration logic for one cell (a digital circuit with inputs ReqIn, TokIn, GrantIn, OverrideIn; outputs AckOut, TokOut, GrantOut, OverrideOut; and two "fby" delay latches)
the current cycle. The task of an arbiter is to set at most one of the corresponding acknowledgement lines high to grant access to at most one requesting device. McMillan [170] has proposed an arbiter design as a set of interconnected cells, where each cell is given by a digital circuit. The same cell circuit can be encoded in SCADE as shown in Fig. 6.2. The circuit elements are standard. The square box labeled "fby" denotes a D-latch which delays the signal by one clock cycle. There are four input signals (Request, TokenIn, GrantIn and OverrideIn) and four output signals (Acknowledgement, TokenOut, GrantOut and OverrideOut) in this circuit. To obtain arbitration among n request lines, n instances of this cell circuit are connected in series in such a way that the arbiter has n inputs corresponding to the n request lines and n outputs corresponding to the n acknowledgement lines, i.e. the arbiter makes the output line ack_i high when it grants access to the shared resource for req_i. All the other signals are connected internally. This cell circuit of Fig. 6.2, as a SCADE operator (named arbiter logic), is then used for arbitration between five devices, as shown in Fig. 6.3. We analysed the following basic properties of the arbiter for correctness on the SCADE model, using the SCADE design verifier.

1. MutualExclusion: The arbiter has the property that at most one of the acknowledgements will be true at any cycle.
2. NoLostCycle: A cycle is called lost if there exists one or more requests, but there is no acknowledgement in this cycle. This property is true if there is no lost cycle.
3. Response Time for req_i: If in any observation interval of length 20 cycles the request req_i is continuously true, then ack_i must be true at least once during this observation interval.
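To make the intent of the first two properties concrete, the sketch below simulates a *behavioural abstraction* of a token-based arbiter — our simplified round-robin policy, not McMillan's gate-level cell — and checks MutualExclusion and NoLostCycle exhaustively on short input sequences:

```python
# Behavioral abstraction of an n-cell arbiter (our sketch, *not*
# McMillan's circuit): a token rotates one cell per cycle; the token
# holder wins if it requests, otherwise the lowest-index requester
# does.  MutualExclusion and NoLostCycle are checked on *all* input
# patterns for a few cycles, in the spirit of exhaustive verification.
from itertools import product

def arbiter_run(n, request_rows):
    token, acks = 0, []
    for reqs in request_rows:               # reqs: tuple of n booleans
        ack = [False] * n
        if reqs[token]:
            ack[token] = True               # token holder has priority
        elif any(reqs):
            ack[reqs.index(True)] = True    # else lowest-index requester
        acks.append(ack)
        token = (token + 1) % n             # token rotates every cycle
    return acks

n, cycles = 3, 2
for rows in product(product([False, True], repeat=n), repeat=cycles):
    for reqs, ack in zip(rows, arbiter_run(n, rows)):
        assert sum(ack) <= 1                      # MutualExclusion
        assert not (any(reqs) and not any(ack))   # NoLostCycle
print('both properties hold on all', (2 ** n) ** cycles, 'input patterns')
```

A model checker such as the SCADE design verifier performs the same exhaustive exploration symbolically, over *all* cycle counts rather than a fixed bound.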
Fig. 6.3 Five-cell arbiter
Fig. 6.4 Verification node for arbiter
The SCADE operator for the first two of these properties is presented in Fig. 6.4. The SCADE design verifier could verify these properties, which proves that in our design (the SCADE model), these properties will always hold. These kinds of strong guarantees are required to increase the confidence in safety system design.
The third property called Response Time illustrates a requirement for which building a SCADE operator manually is cumbersome and error-prone. Section 6.2.2 presents a language of logical assertions where such properties can be conveniently and precisely stated. Moreover, the SCADE operators (synchronous observers or property monitors) for verifying such properties can be automatically constructed as shown in Sect. 6.4.
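What such an automatically constructed observer computes for the Response Time property can be sketched as a simple counting monitor. This is our illustration; a real synchronous observer would emit an `ok` output every cycle rather than return a single verdict:

```python
# Counting monitor for the response-time property (our sketch): the
# property is violated iff some window of 20 consecutive cycles has
# req_i continuously true with ack_i never true inside the window.

def response_time_monitor(trace, window=20):
    pending = 0                      # consecutive req cycles without ack
    for req, ack in trace:           # one (req_i, ack_i) pair per cycle
        if ack or not req:
            pending = 0              # any ack (or gap in req) resets it
        else:
            pending += 1
            if pending >= window:    # 20 requests, no acknowledgement
                return False         # property violated
    return True

# 19 unanswered requests then an ack: still within the bound
ok = response_time_monitor([(True, False)] * 19 + [(True, True)])
bad = response_time_monitor([(True, False)] * 20)
print(ok, bad)                       # True False
```

Writing this counter by hand as a network of SCADE gates is exactly the cumbersome, error-prone task the text refers to, which is why synthesizing the observer from a logical formula is attractive.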
6.2 Formal Requirements Specification

The development of any system starts by first writing down all the functional as well as non-functional requirements. Capturing the requirements in an unambiguous manner is of utmost importance, as any contradictory, partial or misunderstood requirement may affect all the subsequent development phases. Therefore, the literature [64, 192, 199] and several industry standards have identified the need to appropriately capture the requirements using formal methods. This section focuses on effective ways to capture requirements, based on industry-proven methods as well as evolving methods from current research in formal methods.
Traditional approaches to show the correctness of the system with respect to the requirements rely mainly upon simulation and testing. Whether a simulation or a test case satisfies the requirement is left to the designer’s interpretation of the requirements, as they are informally stated. Moreover, the system can only be tested with a limited set of test cases. Hence, these methods cannot give any firm guarantee on correctness with respect to the specified requirements. In contrast to this, in the formal method-based design approach, the requirements are first expressed in a suitable logic. This allows a rigorous formal analysis of the requirements such as checking for their consistency, realizability and formal verification of a system against the specified requirements.
6.2.1 Traditional Method of Capturing Requirements

Prominently, designers of safety systems use either textual requirements or various visual requirements specification notations to capture and specify functional requirements addressing different aspects of the system. For example, the temporal relationship among different signals is specified using timing diagrams [183] or message sequence charts. Statecharts [111] efficiently specify the event-based dynamic behavioural aspects of the system. These methods of requirements capture are called informal or semi-formal, as their meaning is not unambiguously defined, and they do not support the generation of evidence to prove the correctness of the developed system with respect to the specified requirements. However, for safety systems, the ability to generate provable guarantees of correctness is essential for acceptance of the system. The next section discusses how formal requirements capture helps in achieving this goal.
6.2.2 Formal Requirements Capture

In contrast to traditional methods, formal requirements capture uses mathematical logic-based notation for the specification of requirements. This provides several advantages in terms of unambiguity and the possibility of automatic analysis, validation and synthesis. In practice, formalizing requirements in logic is often complex and cumbersome. To cope with this issue, researchers have worked mainly in two areas, viz.:
• The development of logical notations which are expressive enough to naturally capture the requirements for safety systems. Moreover, the notation should be easily understandable by the user.
• The creation of methods to formalize traditional requirements specification notations such as timing diagrams, message sequence charts, etc. into a logical framework to make them amenable to formal analysis.
Once the requirements are captured in a logic-based framework, the advantages associated with formal requirements specification can be exploited. Several logics have been studied in the literature for the unambiguous specification of functional requirements. Temporal logics such as LTL [192] and CTL [85] are among the most prominently used logics for the requirements specification of reactive systems. Syntax, semantics, formal verification and synthesis from LTL and CTL are all well studied in the literature, as discussed in Sect. 5.5.2. LTL and its variants such as PSL [97], which enhances LTL with regular expressions, and MTL [57], which allows the specification of real-time properties, have been successfully used for formal specification in industrial examples [22, 102]. The desirable properties of LTL which enable its widespread application in requirements specification include its clean syntax and intuitive semantics. However, over the years, some major shortcomings of LTL have been identified: its inability to compose specifications [61], its inability to succinctly specify complex quantitative properties, and its inability to effectively express robustness in specification [180]. Logics such as MSO [202] and duration calculus [78] are often found better suited for building complex specifications in a modular fashion and for specifying quantitative properties. One such property (Property no. 3) was given in the SCADE example described in Sect. 6.1.2.1. Quantified discrete-time duration calculus (QDDC) [187] is a discrete-time variant of duration calculus (DC), originally proposed by Zhou et al. [78] for modeling real-time requirements. The use of DC in real-time system design was explored by several authors (see Zhou et al. [77, 78]). The expressiveness of these logics is also well studied in the literature. For example, LTL is expressively complete with respect to first-order logic [62], and it is equivalent to star-free ω-regular languages [202]. It is possible to obtain a language-equivalent Büchi automaton for any LTL formula (but not vice versa, as the ω-regular languages are strictly more expressive than LTL). The logic MSO (over finite words) and QDDC are expressively equivalent to regular languages, and there exists a language-equivalent DFA for each formula specified in these logics. The effective construction of language-equivalent automata for these logics is one of the most groundbreaking results in computer science [75, 107], as it enables the automata-theoretic algorithmic analysis of logical specifications. QDDC, being based on interval temporal logic, provides a rich vocabulary for the visual and modular specification of properties. Moreover, it also allows quantitative timing constraints on the time distance between events (or their duration) to be captured effectively. A number of practical visual requirements capture formalisms such as message sequence charts, timing diagrams or statecharts can be succinctly and
modularly translated to QDDC [167]. The following sections briefly describe the logic QDDC and its usage in formalizing requirements. The rest of this chapter mainly uses QDDC to give case studies of requirements specification and to bring out various facets of formal verification and synthesis from logical requirements.
6.2.3 Quantified Discrete-Time Duration Calculus (QDDC)

A behaviour of a reactive system is a sequence of states recording how the observable variables of the system evolve with time. Formally, let PV be a finite non-empty set of propositional variables. A behaviour σ is an infinite word over the alphabet 2^PV. It has the form σ = P0 P1 ···, where each Pi ⊆ PV. Each Pi gives the observable state of the system at position i: it records the set of propositions which are true at time point i. Let the set of positions dom(σ) = ℕ = {0, 1, ...}, i.e. the set of natural numbers. An observation point i ∈ dom(σ) identifies a reference position where we may observe the behaviour. By contrast, an observation interval [b, e] ∈ ℕ × ℕ with the restriction b ≤ e identifies a section of the behaviour which is of interest. Let σ[i] = Pi and σ[b, e] = Pb ··· Pe. Given an infinite behaviour σ, a partial behaviour is any of its finite prefixes.

Example: Behaviour. The following partial behaviour σ[0, 9] records the evolution of propositions P, Q, R in the first ten cycles. Here, time is measured in clock cycles, and each position in the behaviour gives a time instance.
    Pos    0  1  2  3  4  5  6  7  8  9
    σ(P)   0  1  1  0  1  0  0  1  0  1
    σ(Q)   1  1  0  0  0  0  1  0  1  1
    σ(R)   1  0  1  1  0  1  0  0  1  1
In formal specification, we often focus on the properties of a given system behaviour as seen from a given reference position. Such a point property ψ can be evaluated to be true or false in a behaviour σ at a position i. The notation σ, i ⊨ ψ denotes that the property evaluates to true at position i. Such point properties can be further classified as follows. A local property specifies the current state of the system at position i. It is essentially a proposition ϕ which holds at position i. In the bus arbiter example given in Sect. 6.1.2.1, the mutual exclusion property ∧_{j≠k} ¬(ack_j ∧ ack_k) is a local property.
A temporal property specifies how the behaviour looks from a given current position. For example, the property may state that from the current point, if req remains continuously true, then ack must occur infinitely often. Logic LTL (augmented with past-time operators) is widely used to specify temporal properties. An important subclass of temporal properties is the past-time properties, which only specify some aspect of the behaviour fragment σ[0, i] up to the current position i. For example, "in the past, every occurrence of error is followed by a reset within the next 2 cycles" is a past-time property. A large class of properties of interest turn out to be past-time properties. Moreover, such properties are causal and naturally monitorable. For a past-time property ψ, a property monitor A(ψ) is a Mealy machine (or a program) which continuously observes the system inputs and outputs, and at each point outputs a Boolean value indicating whether the behaviour so far satisfies the property. Formally, at any position i, the Mealy machine outputs true if and only if σ, i ⊨ ψ. Such property monitors play a central role in the formal verification of programs [75, 206]. In this section, we describe an expressive and feature-rich logic called quantified discrete-time duration calculus (QDDC) for specifying past-time properties. The logic has associated algorithms for automatically constructing the property monitor Mealy machine for any given formula of the logic. In logic QDDC, there are two types of entities: propositions, denoted ϕ, and QDDC formulas, denoted D. A proposition ϕ is evaluated at a position in the word, and it records a local property of the variable values at that position, whereas a QDDC formula D specifies a property of the behaviour in a specified observation interval [b, e]. Thus, the notation σ, [b, e] ⊨ D states that property D holds for the observation interval [b, e] in behaviour σ.
We generalize this to past satisfaction. A formula D is past-satisfied at a specified position i if it holds for the observation interval [0, i]. This is denoted σ, i ⊨_p D. (Notice the subscript p for past satisfaction, which differentiates it from interval satisfaction of D.) We first give examples of some QDDC formulas with an intuitive explanation of what their truth value for a given interval [b, e] states. This is followed by a formal definition.

• Formula [[ϕ]] states that proposition ϕ holds invariantly at all positions within the interval [b, e] (i.e. for all i with b ≤ i ≤ e). Its variant [ϕ] states that proposition ϕ holds invariantly at all positions i with b ≤ i < e.
• The term scount ϕ counts the number of occurrences of ϕ within the interval [b, e]. Similarly, the term slen gives the length e − b of the observation interval [b, e].
• An interval of the form [b, b] is called a point interval, and the formula pt marks such intervals. We can define pt = (slen = 0). Similarly, we can define unit = (slen = 1) and ext = (slen > 0).
• Formula ⟨ϕ⟩ is defined as pt ∧ [[ϕ]]. It holds for all point intervals where ϕ is true.
• Formula D1ˆD2 states that interval [b, e] can be chopped into two parts [b, m] and [m, e] such that the first sub-interval satisfies D1 and the second sub-interval
satisfies D2. Thus, the formula ⟨P⟩ˆ[!Q]ˆ⟨Q⟩ holds for all intervals which start with a P and end with the next occurrence of Q.
• Formula EP(ϕ) states that the local property ϕ holds at the endpoint of the observation interval, and formula BP(ϕ) states that ϕ holds at the beginning point of the interval. We can define EP(ϕ) = trueˆ⟨ϕ⟩ and BP(ϕ) = ⟨ϕ⟩ˆtrue.
• Formula <>D states that D must hold for some sub-interval [b′, e′] of the current interval [b, e], where b ≤ b′ ≤ e′ ≤ e. It can be defined as <>D = (trueˆDˆtrue).
• Formula []D states that D must hold for all sub-intervals of the current interval [b, e]. Its variant pref(D) states that for all prefix intervals of the form [b, i] with b ≤ i ≤ e, the formula D holds. Similarly, suff(D) states that for all suffix intervals of the form [i, e] with b ≤ i ≤ e, the formula D holds. The definitions of these formulas are given in the following paragraph.

Example: QDDC Formula. The formula D = []([[!A]] => slen < 2) holds for an interval [b, e] provided for all its sub-intervals [b′, e′] with b ≤ b′ ≤ e′ ≤ e, if !A is true throughout the sub-interval, then its length e′ − b′ must be less than 2. Hence, σ, [0, 5] ⊨ D, since all sub-intervals with A continuously false span at most two cycles. Interval [5, 8] does not satisfy D, since it has a sub-interval [6, 8] of length 2 (i.e. three cycles) where A is invariantly false. Also, using the definition of past satisfaction, the formula D holds at position 5. It also holds at position 7. But it does not hold at positions 8 or 9. Formally, σ, 5 ⊨_p D but σ, 9 ⊭_p D. In summary, the formula D as a past-time property states that in the behaviour seen so far, there is no interval where A remains continuously false for more than two cycles.
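The interval semantics above can be made concrete with a small interpreter. The following Python sketch (our own illustration, not part of any QDDC toolset) evaluates a tiny fragment of the logic over a finite behaviour represented as a list of sets of true propositions. The behaviour chosen for A is hypothetical, picked to match the outcomes described in the example.

```python
# A behaviour is a list of sets: position i -> set of propositions true at i.
# Formulas are nested tuples covering only a small QDDC fragment.

def holds(sigma, b, e, D):
    """Interval satisfaction: sigma, [b, e] |= D."""
    op = D[0]
    if op == "pt":                      # point interval
        return b == e
    if op == "slen<":                   # slen < n, i.e. e - b < n
        return e - b < D[1]
    if op == "inv":                     # [[p]]: p true at every position in [b, e]
        return all(D[1] in sigma[i] for i in range(b, e + 1))
    if op == "ninv":                    # [[!p]]: p false at every position
        return all(D[1] not in sigma[i] for i in range(b, e + 1))
    if op == "=>":
        return (not holds(sigma, b, e, D[1])) or holds(sigma, b, e, D[2])
    if op == "chop":                    # D1 ^ D2: split at some midpoint m
        return any(holds(sigma, b, m, D[1]) and holds(sigma, m, e, D[2])
                   for m in range(b, e + 1))
    if op == "box":                     # []D: D holds for every sub-interval
        return all(holds(sigma, b2, e2, D[1])
                   for b2 in range(b, e + 1) for e2 in range(b2, e + 1))
    raise ValueError(op)

# The running example: D = []([[!A]] => slen < 2).
D = ("box", ("=>", ("ninv", "A"), ("slen<", 2)))

# A hypothetical behaviour for A: true at positions 0-5 and 9, false at 6-8.
sigma = [{"A"}] * 6 + [set(), set(), set()] + [{"A"}]
```

Past satisfaction is then just interval satisfaction over the prefix [0, i]: with this trace, `holds(sigma, 0, 5, D)` and `holds(sigma, 0, 7, D)` succeed, while `holds(sigma, 0, 9, D)` fails, matching the example.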
Property Monitor. A property monitor A(D) for a given QDDC property D is a device which continuously computes the past satisfaction of D as the system evolves. Thus, A(D) observes the evolution of a system with time, and at each position i, it accepts the behaviour fragment σ[0, i] if D is past-satisfied at i. Alternatively, A(D) is a Mealy machine or a digital circuit which outputs a value OK, where OK is true at position i if and only if D is past-satisfied at i.

Theorem 6.1 ([187]) For every formula D over propositional variables PV, we can construct a deterministic finite automaton (DFA) A(D) over alphabet 2^PV such that L(A(D)) = {σ[0, i] | σ, i ⊨_p D}. We call A(D) a formula automaton for D or the property monitor for D.
Fig. 6.5 Monitor automaton for the QDDC formula []([[!A]] => slen < 2)
The DCVALID tool implements this formula automaton construction and generates the minimal deterministic automaton (DFA) for the formula D. A detailed description of QDDC and its model checking tool DCVALID can be found in [187, 188].

Example: Property Monitor. Consider the formula D = []([[!A]] => slen < 2) given earlier. The property monitor automaton A(D) observing the past satisfaction of this formula D (as constructed by the DCVALID tool) is shown in Fig. 6.5. Each transition is labeled with the value of proposition A, and X denotes a don't-care value (either 0 or 1). The automaton is in an accepting state if and only if the past behaviour has satisfied the formula.
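The monitor for this particular formula is small enough to code by hand. The Python sketch below is our own rendering, not DCVALID's output: it tracks the length of the current run of A-false cycles and latches failure once a false run spans three cycles (slen = 2), which is exactly when D is first violated in the past.

```python
class MonitorD:
    """Hand-written property monitor for D = []([[!A]] => slen < 2).

    step() returns OK=True at position i iff the behaviour seen so far
    contains no interval of slen >= 2 (three cycles) with A continuously
    false. Once violated, OK stays False forever (absorbing reject state).
    """
    FAIL = 3  # absorbing state: a false-run of three cycles was observed

    def __init__(self):
        self.run = 0  # length of the current run of A-false cycles

    def step(self, a):
        if self.run != self.FAIL:
            self.run = 0 if a else self.run + 1
            if self.run >= self.FAIL:
                self.run = self.FAIL
        return self.run != self.FAIL  # the OK output of the Mealy machine
```

On the hypothetical trace of the earlier example (A true at positions 0-5 and 9, false at 6-8), the OK output is true at positions 5 and 7 and false from position 8 onward.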
6.2.4 Formalization of Requirements in QDDC

Capturing the requirements properly (unambiguously and consistently) is the first and most important step in any safety system development. We shall use QDDC formulas with the past-time interpretation to formally capture requirements. Given a past-time QDDC formula D, recall that an infinite behaviour σ invariantly satisfies D (denoted σ ⊨ G(D)) if for all positions i ∈ ℕ, we have σ, i ⊨_p D. A typical reactive system M exhibits many different behaviours, based on input from its environment as well as uncertainties such as scheduling. Let Beh(M) denote the set of all infinite behaviours exhibited by a system M. We say that G(D) is valid for a system M (denoted M ⊨ G(D)) if for all infinite behaviours σ of M, we have σ ⊨ G(D). Thus, no possible behaviour of M can at any point violate D. In this sense, D is called an invariant property of system M. In a typical specification, an assumption formula A states the assumed properties of the environment of the controller. It also includes the properties of how the plant behaves. A commitment formula C states the desired behaviour of the controller.
We shall assume that both A and C are past-time properties given in logic QDDC. At any point i in a given behaviour, we can evaluate whether A is true at position i. Similarly, we can evaluate whether C is true at position i. Note that both A and C may hold intermittently during the behaviour. Formula pref(A) holds at a position i if assumption A has been continuously satisfied at all positions j ≤ i. One common usage [68], called BeCorrect(A, C), requires that for the given system M, we should have

    M ⊨ G(pref(A) ⇒ C)
Thus, in any behaviour of M, at every position where assumption A has been continuously true in the past, the commitment C must also hold. With such a usage, the formula (pref(A) ⇒ C) is called a hard requirement D^h: it must never be violated by M in any possible circumstance. Establishing that a system M invariantly satisfies this hard requirement using mathematical or algorithmic analysis is called formal verification. There has been much progress in developing theories, techniques and tools for such formal verification. The reader may note that the BeCorrect(A, C) requirement is rather stringent: even one intermittent violation of the assumption in the past is sufficient to invalidate the commitment forever. In a later section, we shall give examples of robust system specifications which can tolerate a limited number of assumption failures and still guarantee the commitment.

Soft Requirement. In practice, some requirements are hard requirements which must be guaranteed, while others are merely desirable. The desirable requirements may not be satisfied invariantly, but the frequency of their satisfaction should be maximized. We call such a desirable requirement, captured as a QDDC formula D^s, a soft requirement. Formally, a system (controller) with input variables I and output variables O is specified by giving a hard requirement D^h which must be invariantly satisfied and a soft requirement D^s which should be satisfied as often as possible. The specification thus has the form (I, O, D^h, D^s). Moreover, the hard requirement D^h is typically constructed out of a pair (A, C) of assumption and commitment formulas and a correctness criterion such as BeCorrect(A, C). We will see other forms of correctness criteria in the robustness section. However, when not explicitly stated, we shall assume, in keeping with the literature, that the BeCorrect criterion is used.
The correctness of the controller is determined by checking that D^h can never be violated, whereas the quality of the controller is given by how frequently the soft requirement D^s is satisfied on average across behaviours. More precisely, quality is measured by the probability of D^s holding, where this probability is averaged over all positions in all behaviours of the controller. We refer the reader to [100, 207] for a formal definition of this probability (also called the expected value). Given a requirement (I, O, D^h, D^s), there are algorithmic techniques to automatically construct "optimal" controllers guaranteeing BeCorrect(A, C) and maximizing the probability of occurrence of D^s (see [208]). These are discussed more fully in the following sections. We illustrate how requirements can be captured in logic QDDC with two examples, adapted from the authors' original papers [189, 208].

Example: Mine Pump Specification. The mine pump controller (see [187]) has two input sensors, the high water level sensor HH2O and the methane leakage sensor HCH4, and one output, PUMPON, which keeps the pump on. The objective of the controller is to safely operate the pump in such a way that the water level never remains high continuously for more than w cycles. Thus, the mine pump controller specification has input and output variables ({HH2O, HCH4}, {PUMPON}). We have the following assumptions on the mine and the pump. Their conjunction is denoted MineAssume(ϵ, ζ, κ), with integer parameters ϵ, ζ, κ. Being of the form []D, each formula states that the property D (described in text) holds for all observation intervals in the past.

– Pump capacity: []!(slen = ϵ && [[PUMPON && HH2O]]). In any observation interval, if the pump is continuously on for ϵ + 1 cycles, then the water level cannot be continuously high.
– Methane release: [](([HCH4]ˆ[!HCH4]ˆ⟨HCH4⟩) ⇒ (slen > ζ)) and []([[HCH4]] ⇒ slen < κ). The first formula states that the minimum separation between two leaks of methane is ζ cycles, while the second states that a methane leak cannot persist for more than κ cycles.

The commitments are as follows. Their conjunction is denoted MineCommit(w); in the absence of the assumption, they may hold only intermittently.
– Safety conditions: EP((HCH4 || !HH2O) ⇒ !PUMPON) states that if there is a methane leak or an absence of high water in the current cycle, then the pump should be off in the current cycle. Formula !(trueˆ([[HH2O]] && slen = w)) states that the water level does not remain continuously high in the last w + 1 cycles.

The mine pump specification, denoted MinePump(w, ϵ, ζ, κ), is given by the assumption-commitment pair (MineAssume(ϵ, ζ, κ), MineCommit(w)). Additionally, we can specify EP(!PUMPON) as the soft requirement to get a controller which keeps the pump off as much as possible to save energy. On the other hand, if we change the soft requirement to EP(PUMPON), we get a controller which keeps the pump on as much as possible to get rid of high water as soon as possible (even if the hard requirement permits a longer latency).
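The "as often as possible" reading of a soft requirement such as EP(!PUMPON) can be estimated on concrete finite runs by averaging, over all positions of all runs, whether the soft requirement holds. The sketch below is our own finite-trace stand-in for the expected-value measure used in the synthesis tools; the trace data is made up.

```python
def soft_quality(traces, soft):
    """Fraction of positions, across all given finite traces, at which
    the soft requirement (a per-position predicate on the state) holds.
    This approximates the probability measure averaged over positions."""
    outcomes = [soft(state) for tr in traces for state in tr]
    return sum(outcomes) / len(outcomes)

# Soft requirement EP(!PUMPON): the pump is off in the current cycle.
soft = lambda state: not state["PUMPON"]
```

For example, a single trace in which the pump is off for three of four cycles yields a quality of 0.75 for EP(!PUMPON); a controller maximizing this measure keeps the pump off as often as the hard requirement allows.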
Example: Arbiter Specification. We now give the formal specification of a synchronous bus arbiter. This is an enhanced version of the example discussed in Sect. 6.1.2.1. An n-cell synchronous bus arbiter has inputs {req_i} and outputs {ack_i}, where 1 ≤ i ≤ n. In any cycle, a subset of {req_i} is true, and the controller must set one of the corresponding ack_i to true. The arbiter commitment, ArbCommit(n, k), is a conjunction of the following properties:

    Mutex(n)       = EP( ∧_{i≠j} ¬(ack_i ∧ ack_j) )
    NoLoss(n)      = EP( (∨_i req_i) ⇒ (∨_j ack_j) )
    NoSpurious(n)  = EP( ∧_i (ack_i ⇒ req_i) )
    Response(n, k) = ∧_{1≤i≤n} Resp(req_i, ack_i, k)

where

    Resp(req, ack, k) = suff( ([[req]] && (slen = k − 1)) ⇒ (scount ack > 0) )

The past QDDC formula EP(P) holds at a point i in an execution if the proposition P holds at that point. Thus, Mutex(n) gives mutual exclusion of acknowledgements; NoLoss(n) states that if there is at least one request, then there must be an acknowledgement; and NoSpurious(n) states that an acknowledgement is only given to a requesting cell. Formula Resp(req, ack, k) holds at a position if, whenever the request has been continuously true for the previous k cycles, there is an acknowledgement during these last k cycles. Thus, Response(n, k) says that each cell requesting continuously for the last k cycles must get an acknowledgement within the last k cycles. A controller can invariantly satisfy ArbCommit(n, k) if n ≤ k. For example, the DCSynth tool for automatic controller synthesis [208] gives us a concrete controller for the commitment D^h = ArbCommit(6, 6). It is easy to see that there is no controller which can invariantly satisfy ArbCommit(n, k) if k < n. To see this, consider the case when all req_i are continuously true. Then, it is not possible to give a response to every cell in fewer than n cycles, due to mutual exclusion of the ack_i. To handle such desired but unrealizable requirements, we make an assumption. Let the proposition Atmost(n, i) be defined as ∨_{S ⊆ {1...n}, |S| ≤ i} ∧_{j ∉ S} ¬req_j. It states that at most i out of the total of n requests are true simultaneously. Then, the arbiter assumption is the formula ArbAssume(n, i) = trueˆ⟨Atmost(n, i)⟩, which states that Atmost(n, i) holds in the current cycle. The specification of the synchronous arbiter is the assumption-commitment pair (ArbAssume(n, i), ArbCommit(n, k)), with the BeCorrect correctness criterion. This is denoted Arbiter(n, k, i).
Here, n denotes the number of clients, k is the response time and i is the maximum number of requests that can be true simultaneously. This constitutes the hard requirement. We can take the commitment ArbCommit(n, k) as a soft requirement D^s. Thus, even when the assumption is violated, the controller should try to maintain the commitment as often as possible.
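One concrete strategy realizing ArbCommit(n, n) is round-robin arbitration. The sketch below is our own illustration, not the controller synthesized by DCSynth: granting the requesting cell closest (cyclically) after the last granted cell gives Mutex and NoSpurious by construction, NoLoss because some requester is always found, and Response(n, n) because the cyclic distance to a continuously requesting cell strictly decreases on every grant that passes it by.

```python
def round_robin_arbiter(n):
    """A controller realizing ArbCommit(n, n): in each cycle, grant the
    requesting cell closest (cyclically) after the last granted cell."""
    last = n - 1  # index of the most recently granted cell

    def step(reqs):
        """reqs: list of n booleans; returns the list of n acks."""
        nonlocal last
        acks = [False] * n
        for d in range(1, n + 1):          # scan cyclically after `last`
            i = (last + d) % n
            if reqs[i]:
                acks[i] = True             # at most one ack: Mutex
                last = i
                break
        return acks

    return step
```

With all three cells requesting continuously, a 3-cell instance grants cells 0, 1, 2 in rotation, so every cell is acknowledged within 3 cycles, as Response(3, 3) demands.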
6.2.5 Formalizing Visual Requirements in Logic

During the development of a safety system, several heterogeneous elements come into play at each design phase. Designers prominently use various visual requirements specification notations to capture and specify functional requirements addressing different aspects of the system. For example, the temporal relationship among different signals is specified using timing diagrams [183] or message sequence charts, and statecharts [111] efficiently specify the event-based dynamic behavioural aspects of the system.
6.2.5.1 Timing Diagram Requirements as Logical Formulas
One of the most prominently used visual formalisms is the timing diagram: a collection of binary signals and the associated timing constraints among them. The major application areas of timing diagrams are the specification of communication protocols, the specification of timing constraints on embedded controllers, etc. The timing diagram notation is found to be very useful, since designers can easily visualize the waveforms of signals and can specify the timing and ordering constraints between events. Several attempts have been made to formalize timing diagram constraints in the framework of various temporal logics, such as LTL [79], timing diagram logic [101], synchronous regular timing diagrams [58] and QDDC [167].4 The main motivation behind these efforts is to exploit the automatic verification techniques for temporal logics and to use them for validation and automatic circuit synthesis.

Example: Ordered Stack. Let us consider the timing diagram shown in Fig. 6.6. There are three binary signals a, b and c. The rise and fall of these signals follow a stack discipline. The behaviour described by it is given by the following QDDC formula:
    ([!a] ˆ ⟨ua⟩ ˆ [a] ˆ ⟨va⟩ ˆ [!a]) &&
    ([!b] ˆ ⟨ub⟩ ˆ [b] ˆ ⟨vb⟩ ˆ [!b]) &&
    ([!c] ˆ ⟨uc⟩ ˆ [c] ˆ ⟨vc⟩ ˆ [!c]) &&
    (ext ˆ ⟨ua⟩ ˆ ext ˆ ⟨ub⟩ ˆ true) &&
    (true ˆ ⟨ub⟩ ˆ ext ˆ ⟨uc⟩ ˆ true) &&
    (true ˆ ⟨vc⟩ ˆ ext ˆ ⟨vb⟩ ˆ true) &&
    (true ˆ ⟨vb⟩ ˆ ext ˆ ⟨va⟩ ˆ true)
4 Examples given in this section are adapted from [167] with a few modifications.
Fig. 6.6 Timing diagram example
It should be noted that the first three conjuncts correspond to the three waveforms of signals a, b and c. The four conjuncts after that correspond to the four arrows specifying the ordering constraints between the waveforms. In Fig. 6.6, ua, ub and uc denote the positions of the events where the waveforms of a, b and c take transitions from low to high, respectively. Similarly, va, vb and vc denote the positions of the events where the waveforms of a, b and c take transitions from high to low, respectively; these are shown by arrows d, e and f in the figure.
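The four ordering conjuncts amount to strict orderings ua < ub < uc and vc < vb < va on the event positions. For a trace where each signal pulses exactly once (an assumption of this sketch, matching the diagram), the stack discipline can be checked directly; the Python below is our own illustration, not a translation produced by any tool.

```python
def rise_fall(w):
    """Positions of the single low->high and high->low transitions of a
    waveform given as a list of 0/1 samples (assumes exactly one pulse)."""
    rise = next(i for i in range(1, len(w)) if w[i - 1] == 0 and w[i] == 1)
    fall = next(i for i in range(rise, len(w)) if w[i - 1] == 1 and w[i] == 0)
    return rise, fall

def stack_discipline(a, b, c):
    """Check the ordering constraints of the ordered-stack diagram:
    the pulses of a, b, c rise in order a, b, c and fall in reverse order."""
    ua, va = rise_fall(a)
    ub, vb = rise_fall(b)
    uc, vc = rise_fall(c)
    return ua < ub < uc and vc < vb < va
```

A nested set of pulses (a outermost, c innermost) satisfies the check, while exchanging the waveforms of b and c violates the rise ordering.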
6.2.5.2 Finite-State Machine Requirements as Logical Formulas
A finite-state machine (FSM) is one of the most widely used visual notations for specifying the various modes of a system. It consists of states and the transitions among these states. A transition is taken on a particular event in the system, and the transitions describe how the system behaviour changes in response to events. In a given state, the system is supposed to follow a particular behaviour; the expected behaviour is thus determined by the current state of the system. In order to encode the state machine notation in logic QDDC, two formulas, namely persist and transit, are defined. The formula transit takes three arguments, S1, E and S2, and specifies that the state changes from S1 to S2 on event E. The formula persist takes two arguments, S1 and E, and specifies that if the current state is S1, then this state persists unless there is an event E. Using these two formulas, we can encode the state machine in QDDC. The statechart notation for visual requirements specification was introduced by David Harel [111] as an extension of finite-state machines. It should be noted that the statechart notation is better suited than simple finite-state automata for the behavioural modeling of complex real-world systems: statecharts offer hierarchy, concurrency and communication, which allow complex systems to be represented succinctly and modularly. Extending the approach of this section, QDDC can also be used to formalize the statechart notation. The case study in Sect. 6.8 encodes the state machine specification for the alarm annunciation system example introduced in Sect. 3.9.1 and depicted in Fig. 6.14. The QDDC specification for the AAS system is given in Fig. 6.15.
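The intent of transit and persist can be mimicked as past-time predicates over finite traces. The following Python rendering is our own sketch (the book defines these as QDDC formulas, whose exact form is not reproduced here); a trace is a list of (state, events) pairs, one per cycle.

```python
def transit_ok(trace, s1, e, s2):
    """transit(S1, E, S2): whenever the machine is in state s1 and event
    e occurs, the state in the next cycle is s2."""
    return all(trace[i + 1][0] == s2
               for i in range(len(trace) - 1)
               if trace[i][0] == s1 and e in trace[i][1])

def persist_ok(trace, s1, e):
    """persist(S1, E): state s1 persists into the next cycle unless
    event e occurs."""
    return all(trace[i + 1][0] == s1
               for i in range(len(trace) - 1)
               if trace[i][0] == s1 and e not in trace[i][1])
```

For a hypothetical two-state toggle machine (OFF to ON and back on event "press"), a conforming trace satisfies both predicates for each state; a state machine is then encoded as the conjunction of one transit per edge and one persist per state.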
6.3 Verification and Analysis of Logic-Based Requirements

Having specified the requirements in a suitably expressive logic, it becomes possible to formally analyse them. Three main forms of analysis are prevalent. Firstly, a set of requirements can be checked for consistency, i.e. that the requirements are not mutually contradictory. Secondly, given an actual system M (or its abstracted high-level model), we can try to formally check that in all possible circumstances, the behaviours of M satisfy the requirement. This is called formal verification. Thirdly, algorithms can be developed which construct prototype systems M that are guaranteed to satisfy the formal requirement. This is often referred to as automatic controller synthesis. The fact that a specification admits at least one controller satisfying it is called the realizability of the specification; this is a more sophisticated check than mere consistency. A controller synthesis method also solves the realizability problem. This and the subsequent two sections discuss these three analysis problems, using the QDDC-based formal specification of systems introduced earlier. However, it must be mentioned that such formal analysis is widely carried out in other temporal logics such as LTL and CTL too.5 When used judiciously, these three forms of analysis can have a marked influence on the reliability of a safety system.
6.3.1 Requirements Analysis: Consistency Checking, Realizability and Refinement

A logical requirement is typically structured as a set of properties, all of which must be satisfied by the system under consideration. These represent the mandatory properties of the system. Different requirements may address separate features of the system. Even before a system is built, it is important to check that these diverse requirements are consistent, i.e. that there is no mutual contradiction between them. A QDDC formula D is called satisfiable if for some behaviour σ and some position i, we have σ, i ⊨ D. The formula is called valid if for every σ and every position i, we have σ, i ⊨ D. It is possible to establish that a QDDC formula is satisfiable by analysing its formula automaton: the formula is satisfiable if its formula automaton has an accepting path. Similarly, it is also possible to check whether a formula is valid. The DCVALID tool, which constructs the formula automaton, also analyses the formula for satisfiability and validity. By using the satisfiability/validity checking ability of tools such as DCVALID, we can carry out analysis of requirements early in the life cycle, independent of the system being constructed.
5 The logic LTL and satisfiability checking for LTL properties are discussed in Sect. 5.5.2.
Consistency. Given a set of requirements D1, ···, Dn, we can check their consistency by establishing that D1 ∧ ··· ∧ Dn is satisfiable. This shows that the requirements are not mutually contradictory.6 Consistency is a weak check: it only establishes that there is at least one scenario allowed by the specification. However, the user may want to establish that each use case of the system is consistent with the requirements. This can be carried out by formulating the use case as a formula, say CASE, and checking that CASE ∧ D1 ∧ ··· ∧ Dn is satisfiable. For example, in a hypothetical car specification, a requirement R1 may state that the door must be locked whenever the engine is on. A requirement R2 may state that in a crash situation, the door must be unlocked. These two requirements together are unsatisfiable for the use case where a crash happens while the engine is on. However, the specification is satisfiable in scenarios where a crash does not happen.

Realizability. Just because a set of requirements is consistent does not imply that there exists a system which can correctly implement them. The existence of at least one system which meets all the requirements is called realizability; this is a more stringent property than consistency. We shall discuss the automatic synthesis of a prototype controller from a given requirements specification quite extensively in Sect. 6.4. Such a synthesis solves the realizability problem and thereby provides evidence that the requirements are adequate. When a set of requirements turns out to be contradictory (or unrealizable), we have the option of turning some of them into soft requirements, which are then no longer guaranteed. The other possibility is to reduce the scope of applicability of the system by adding an assumption under which consistency (or realizability) can be established. We saw examples of these techniques in the synchronous bus arbiter specification earlier.
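For requirements over a handful of Boolean state variables, the consistency check reduces to a satisfiability search over states, which can be done by brute force. The sketch below (our own stand-in for a tool such as DCVALID, using the hypothetical car example) shows both the overall check and the use-case-restricted check.

```python
from itertools import product

def consistent(requirements, varnames, use_case=lambda s: True):
    """Return True iff some assignment of the Boolean variables satisfies
    every requirement and the optional use-case constraint - a brute-force
    stand-in for a satisfiability check over single states."""
    return any(all(r(s) for r in requirements) and use_case(s)
               for s in (dict(zip(varnames, vals))
                         for vals in product([False, True],
                                             repeat=len(varnames))))

# Hypothetical car specification.
V = ["engine_on", "crash", "door_locked"]
R1 = lambda s: not s["engine_on"] or s["door_locked"]   # locked while engine on
R2 = lambda s: not s["crash"] or not s["door_locked"]   # unlocked in a crash
```

The pair {R1, R2} is consistent overall (any crash-free state with the engine off works), but becomes unsatisfiable under the use case of a crash with the engine on, exactly as described above.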
Refinement Given a hard requirement (A, C) consisting of assumption A and commitment C, we can make a design decision which is written as a formula DES. Then, we can verify that ((pref(A) ∧ DES) ⇒ C) is valid. This reduces the problem of designing a system meeting BeCorrect(A, C) to that of designing a system which meets G(DES). For example, for the mine pump specification MinePump(w, ε, ζ, κ) with given constants w = 8, ε = 3, ζ = 8, κ = 2, we can decide that the pump must be started within two cycles of water being high and methane being absent. Also, the pump must be stopped as soon as (in the same cycle as) methane is detected or the water level
6 A more sophisticated consistency check requires us to show that there exists an infinite behaviour which invariantly satisfies D1 ∧ · · · ∧ Dn. Technically, this can be stated as the goal EG(D1 ∧ · · · ∧ Dn). There are well-established algorithms for checking this, given a property monitor automaton for (D1 ∧ · · · ∧ Dn). The DCSynth tool implements such an algorithm.
stops being high. This is given by the following formula DES:

    DES = (!([[HH2O && !HCH4]] ∧ ((slen = 2) ^ ![[PUMPON]])))
          ∧ [[(HCH4 || !HH2O) ⇒ !PUMPON]]

The DCVALID tool can then automatically establish the validity of the following verification condition:

    ⊨ MineAssume(3, 8, 2) ∧ DES ⇒ MineCommit(8)
From this, we have the guarantee that if we implement a controller meeting G(DES), then the mine pump requirement is met. DES can be further refined to even simpler formulas which are easier to implement. This method of system design is called the top-down refinement approach [49].
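The intuition behind the refinement step can be sketched with a toy bounded check. The controller below is a hand-written implementation of the informal design decision (start the pump by the second consecutive cycle of high water with no methane; stop immediately on methane or low water), and commit_ok is a deliberately simplified stand-in for MineCommit; the real verification is done by DCVALID over QDDC semantics, not by this exhaustive loop:

```python
from itertools import product

# A controller following the design decision DES (sketch, our own encoding).
def controller(trace):
    out, pending = [], 0
    for h2o, ch4 in trace:
        if ch4 or not h2o:
            out.append(False)         # [[(HCH4 || !HH2O) => !PUMPON]]
            pending = 0
        else:
            pending += 1
            out.append(pending >= 2)  # pump on by the 2nd consecutive good cycle
    return out

# Simplified commitment: whenever water is high and methane absent for two
# consecutive cycles, the pump must be on at the second cycle.
def commit_ok(trace, pump):
    for i in range(1, len(trace)):
        if all(h and not m for h, m in trace[i - 1:i + 1]) and not pump[i]:
            return False
    return True

# Bounded "validity" check: the commitment holds on every trace of length 6.
assert all(commit_ok(list(t), controller(t))
           for t in product(product([False, True], repeat=2), repeat=6))
print("bounded verification passed")
```

The exhaustive enumeration over all 4096 input traces of length 6 plays the role, at toy scale, of proving the verification condition for all behaviours.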
6.4 Automatic Synthesis from Formal Requirements

Automatic program synthesis7 is the technique of obtaining an implementation automatically from requirements given in a suitably expressive logic. Because of the unambiguous nature of logic-based (formal) requirements, it becomes possible to automatically synthesize a program which satisfies them. As described in Sect. 5.5.2.4, synthesis can be broadly classified as either functional synthesis or reactive synthesis. In the context of safety systems, reactive synthesis is the focus of this section, as nearly every safety system is reactive in nature.
6.4.1 Correct-by-Construction Synthesis

Program synthesis deals with the problem of algorithmically obtaining an implementation for a given logical specification, such that the program exhibits exactly those behaviours which meet the specification. The problem was originally posed by Church [194] for specifications given in monadic second-order (MSO) logic, the aim being to synthesize an implementation as a Mealy machine which realizes the given specification. One of the most useful problems in the implementation of safety systems is the synthesis of a reactive program from a given specification. Reactive programs are programs that continuously interact with their environment. Büchi et al. [74] and Rabin [194] independently presented solutions to the synthesis problem.

7 This section draws upon the previous work of the author published in [207, 208]. The reader may refer to the original sources for the complete details of this approach.
Specifying the behaviour of reactive systems in MSO is cumbersome, and hence several alternatives have been proposed in the literature. One of the most widely accepted alternatives is LTL, which is used in the formal verification and synthesis community [193]. However, synthesis from LTL properties was proved to be doubly exponential in the size of the formula by Rosner [197]. To make synthesis more efficient, researchers have investigated useful subsets of LTL. Piterman et al. [157, 191] have proposed an efficient polynomial-time symbolic algorithm to automatically synthesize controllers for the subset of LTL called GR(1). Similarly, Wolff et al. [211] have identified a useful subset of LTL for efficient controller synthesis over non-deterministic transition systems and Markov decision processes. Reactive synthesis from specifications given in linear temporal logic (LTL) has been widely studied, and theories and tools have been developed [56, 68] for such synthesis. The related problem of synthesis of supervisory control for a discrete event system (DES) was introduced by Ramadge and Wonham [195, 196]. Here, the plant (i.e. the system to be controlled) is modeled as a finite-state automaton, and the required behaviour is also specified as a finite-state automaton. They proved that a controller can be synthesized in linear time for such specifications. The aim of supervisory control is to synthesize a controller for a given specification which restricts the behaviour of the plant such that the resulting behaviour always meets the specification. Ramadge and Wonham formulated the necessary algorithm to automatically construct such a controller. The connection between supervisory control theory and reactive synthesis was formally studied by Ehlers et al. [95].
6.4.2 Controller Synthesis: From Correctness to Quality

Controller synthesis aims at algorithmically constructing an implementation (say, a Mealy machine or a digital circuit of the controller) from a given specification of the desired behaviour of the controller. The specification is typically given using formulas of a suitable temporal logic. As we saw in our framework, a system with input and output variables (I, O) is specified as (I, O, D^h, D^s). The correctness of a controller M for this specification requires that the controller behaviour in response to any arbitrary input must satisfy D^h at every point in execution. Moreover, a high-quality controller must maximize the relative frequency of the points where D^s holds in executions, when averaged over all executions of the system. In order to meaningfully define this notion of quality, we assume that the input to the controller is defined by a Markov process, and we take the expected (average) value of the fraction of points where D^s holds during the execution, averaged across all executions. In other words, we define the probability E(D^s) of D^s holding across all time points in all executions of a given system. Such a notion of quality is quite well established in the performance analysis literature; here, we do not go into its mathematical formulation [100]. When the input sequences are generated by a finite-state Markov process, and the controller itself is a finite-state
system, there are well-known algorithms which compute this expected value [100]. A simplifying assumption can be that inputs are fully random and each input is equally likely at every point. Under this assumption, using well-known techniques, the DCSynth tool allows us to compute the expected value (probability) E(D^s) of D^s holding across all positions of all behaviours of a given controller Mealy machine. A controller is called optimal if it gives the maximum value of E(D^s) among all controllers which guarantee the hard requirement.

Now we discuss a technique which allows the synthesis of a discrete controller from a logical requirement specified as a tuple (I, O, D^h, D^s), where D^h and D^s are QDDC formulas over a set of input and output propositions (I, O). Here, D^h and D^s are the hard and the soft requirement, respectively. The aim is to synthesize a controller which (a) invariantly satisfies D^h and (b) meets D^s at "as many points as possible". Meeting D^s "at as many points as possible" is achieved by synthesizing a controller which maximizes (optimizes) the cumulative count of D^s holding in the next H moves, averaged over all the inputs of length H. Such a controller implementation is called H-optimal for D^s. The algorithm for the synthesis of a controller implementation from the specification given as tuple (I, O, D^h, D^s) consists of the following steps:

• First, we define the term supervisor, which is a non-blocking Mealy machine that may non-deterministically produce one or more outputs for each input. A supervisor may be refined to a sub-supervisor by resolving (pruning) the non-deterministic choice of outputs. A controller is a deterministic supervisor.
• The algorithm starts by computing the property monitor automaton for D^h.
• In the second step, we compute the maximally permissive supervisor (MPS) from the monitor automaton.
The supervisor MPS contains all the behaviours which invariantly satisfy D^h, and it is non-blocking.
• Then, the maximally permissive H-optimal supervisor (MPHOS) is computed by pruning MPS. This is done by keeping only those outputs of MPS which H-optimally satisfy D^s. Such an H-optimal supervisor maximizes the expected value (probability) of D^s holding in the next H steps across all behaviours of the supervisor. The computation of such an H-optimal supervisor draws upon the technique of the finite horizon controller for Markov decision processes, pioneered by Bellman [65].
• Finally, MPHOS is turned into a controller by pruning the choice of outputs based on a user-given preference ordering.

This algorithm is efficiently implemented by the DCSynth tool (see [207, 208]).
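The MPS computation step can be sketched as the standard iterative pruning of a safety game. The monitor automaton below (states, inputs, outputs and the transition function delta) is a hand-made toy, not DCSynth output; it only illustrates how losing states are discarded and how every remaining non-losing output choice is kept:

```python
# Toy safety monitor over (input, output) pairs; reaching "bad" violates D^h.
states = {"s0", "s1", "bad"}
inputs = [False, True]    # e.g. a request signal
outputs = [False, True]   # e.g. a grant signal

def delta(s, i, o):
    # Informal D^h: a request may stay ungranted for at most one cycle.
    if s == "bad":
        return "bad"
    if i and not o:
        return "s1" if s == "s0" else "bad"  # one pending request allowed
    return "s0"

# MPS: iteratively discard states from which, for some input, every
# output choice leads to a losing state.
safe = {s for s in states if s != "bad"}
changed = True
while changed:
    changed = False
    for s in list(safe):
        if any(all(delta(s, i, o) not in safe for o in outputs) for i in inputs):
            safe.discard(s)
            changed = True

# The MPS keeps, in each surviving state, all non-losing output choices;
# it is non-blocking because every (state, input) retains at least one output.
mps = {(s, i): [o for o in outputs if delta(s, i, o) in safe]
       for s in safe for i in inputs}
print(sorted(safe))
```

The subsequent MPHOS step would further prune each output list, keeping only the choices that H-optimally satisfy the soft requirement; a final preference ordering then makes the supervisor deterministic.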
6.4.3 Synthesizing Robust Controllers from QDDC Specification

Robust controller synthesis deals with the automatic synthesis of a controller which continues to function (i.e. maintain its commitment) even under limited failures
of the assumptions about the environment and the plant. Under transient failures, the synthesized controller should be able to recover by re-establishing the commitment in bounded time [69, 166]. Several authors have investigated various notions of robustness [96, 166]. We now consider the specification of a robust controller using the logic QDDC.8

As discussed in Sect. 6.2.4, a controller specification consists of a pair of QDDC formulas (A, C), giving the assumption and the commitment, along with a correctness criterion. A standard correctness criterion [68, 157, 191] called BeCorrect(A, C) gives rise to the hard requirement D^h = (pref(A) ⇒ C). This mandates that in any behaviour of the controller M, if A has held continuously in the past, then C must hold at that point. Formally, M ⊨ G(D^h) must be true. The reader may note that BeCorrect(A, C) is a rather stringent requirement: even one intermittent violation of the assumption is sufficient to invalidate the commitment forever afterwards. A more robust criterion [68], called BeCurrentlyCorrect(A, C), requires that the system M should have the property

    M ⊨ G(A ⇒ C)

In this case, the hard requirement D^h is (A ⇒ C). It states that at every position where the assumption A is currently true, the commitment must also hold. Thus, even if A has been false at some time in the past, if A has recovered, then C must recover. A controller which satisfies the BeCurrentlyCorrect(A, C) criterion guarantees that C holds in more circumstances than a controller which satisfies the BeCorrect(A, C) criterion. It is more robust in the presence of violations of the assumption A. The disadvantage is that the BeCurrentlyCorrect(A, C) criterion may make the controller specification unrealizable: it may become impossible to construct a controller satisfying it. But, if realizable, such a controller is preferable.

In general, robustness pertains to the ability of the commitment C to hold even when the assumption A has not invariantly held in the past [68]. A relaxed assumption denoted by Rb(A) specifies a weaker condition than pref(A), and the robust specification is given by the formula G(Rb(A) ⇒ C). Thus, C should hold whenever the relaxed assumption Rb(A) holds. We term this hard robustness, with the hard requirement D^h = (Rb(A) ⇒ C). Different QDDC formulas (called robustness criteria) give rise to different notions of robustness. For the BeCurrentlyCorrect criterion, we have Rb(A) = A, which specifies that whenever the assumption A is true, the commitment should also be met, irrespective of whether A was satisfied in the past.
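The difference between the two criteria can be seen by checking them on a finite trace. The following sketch uses a simplified point-wise reading of pref(A) ("A has held at every cycle so far, including the current one"); it is our own illustration, not the tool semantics:

```python
def be_correct(A, C):
    """G(pref(A) => C): C must hold wherever A has held continuously so far."""
    ok, a_so_far = True, True
    for a, c in zip(A, C):
        a_so_far = a_so_far and a
        ok = ok and (c or not a_so_far)
    return ok

def be_currently_correct(A, C):
    """G(A => C): C must hold at every point where A holds,
    even after past failures of A."""
    return all(c or not a for a, c in zip(A, C))

# A fails once at cycle 1; C fails at cycle 2, where A has already recovered.
A = [True, False, True, True]
C = [True, True, False, True]
print(be_correct(A, C), be_currently_correct(A, C))  # prints: True False
```

The trace shows the point of the text: once A has failed, BeCorrect no longer constrains C at all, while BeCurrentlyCorrect still demands C whenever A has recovered.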
8 Examples in this section are adapted from the previous work of the authors published in [189, 207] with a few modifications.
Yet another notion of hard robustness is given by the criterion LenCntInt(A, K, B) with integer parameters K, B. This criterion is given by the formula

    Rb(A) = suff(slen < B ⇒ (scount !A < K))

It states that at any position i, if the assumption A has failed to hold fewer than K times within the last B cycles, then the commitment must hold at position i. A controller for such a criterion is more robust than the BeCorrect(A, C) controller, and it is likely to be realizable even when the BeCurrentlyCorrect(A, C) controller is not. It should be noted that this criterion mandates the recovery of the commitment C under transient violations of the assumption A. This is ensured by specifying that C should be re-established once the assumption A has held sufficiently often in the recent past.

Hard robustness provides firm guarantees on when the commitment must hold. Soft robustness, on the other hand, pertains to the controller's ability to try to meet the commitment C even when the relaxed assumption Rb(A) does not hold. Bloem et al. have called this notion "never give up" [69]. To exploit the concept of soft robustness, we synthesize a controller by specifying C as the soft requirement, which gives rise to a controller that satisfies C "as much as possible". As explained in Sect. 6.4.2, this maximizes the expected value of the count of C over the next H steps. Several diverse robustness criteria Rb(A) and their effect on the quality of the controller are studied in [207]. The following section provides a brief summary of this study. The reader will appreciate that, rather than hardwiring notions of robustness in handcrafted code, the logical specification-based approach provides a flexible and modular way to specify and automatically synthesize robust, high-quality controllers with the desired features.
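The LenCntInt relaxed assumption is easy to evaluate on a finite trace. The sketch below is a direct, 0-based reading of the formula (every suffix interval ending at position i with length below B must contain fewer than K violations of A); it is our own paraphrase of the QDDC semantics, not tool output:

```python
def rb_lencntint(trace_A, i, K, B):
    """Rb(A) = suff(slen < B => scount !A < K) at position i (sketch)."""
    for j in range(max(0, i - B + 1), i + 1):
        # interval [j..i] has slen = i - j < B
        if sum(1 for a in trace_A[j:i + 1] if not a) >= K:
            return False
    return True

def check_hard_robustness(trace_A, trace_C, K, B):
    """G(Rb(A) => C): C must hold wherever the relaxed assumption holds."""
    return all(c or not rb_lencntint(trace_A, i, K, B)
               for i, c in enumerate(trace_C))

# A fails at cycles 1 and 3: with K = 2, B = 3, the two failures fall in one
# window at position 3, so Rb(A) is false there and C is not mandated.
A = [True, False, True, False, True]
print([rb_lencntint(A, i, 2, 3) for i in range(len(A))])
```

Since the violation count is monotone in the window length, checking only the longest suffix of length B would suffice; the inner loop simply mirrors the suff quantifier literally.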
6.4.3.1 Experimental Evaluation
High-quality, robust controllers can be automatically synthesized from QDDC specifications using the techniques presented in the earlier sections. These techniques have been implemented in the DCSynth tool. This section presents such synthesis carried out using DCSynth for the two case studies given earlier, the mine pump and the arbiter specifications. Experimental results giving the effect of hard and soft robustness on the quality of the controller are presented. The quality is measured by E(C), the probability with which the commitment holds in the long run over random inputs. For each case study and for each robustness criterion, two controllers, namely MPS and MPHOS, are synthesized, and the corresponding values of E(C) are computed. Both controllers enforce the commitment whenever the relaxed assumption Rb(A) holds. However, the MPS controller ignores the commitment when Rb(A) does not hold. By contrast,
Table 6.1 Expected value of commitment formula C holding in the long run over random inputs

                          Arbiter(4,3,2) (k = 1, b = 3)    Minepump(8,2,6,2) (k = 2, b = 8)
  Robustness criteria     E(C) MPS      E(C) MPHOS         E(C) MPS       E(C) MPHOS
  BeCorrect               0.000000      –                  0.000000       0.997070
  LenCntInt(K,B)          0.768066      0.998175           –              0.997070
  BeCurrentlyCorrect      0.687500      0.992647           0.0027342      0.997070
the MPHOS controller further optimizes the controller by trying to maximize the frequency of occurrence of C even when Rb(A) does not hold (soft robustness).9

An examination of Table 6.1 is quite enlightening. We state the following main findings:
• In both case studies, for the MPS controllers, the expected value of the commitment C depends on the robustness criterion used.
• The soft robustness used in the MPHOS controller has an overwhelming impact on the measured expected value of C holding under random inputs. In both case studies, this value is above 99% irrespective of the robustness criterion used. Thus, soft robustness vastly improves the expected case performance of the controller, and MPHOS should be preferred over MPS.
• The MPHOS controllers synthesized with different robustness criteria may have a similar or the same value of E(C). However, they provide very different hard guarantees of when the commitment must hold. For example, for Minepump(8, 2, 6, 2), all the MPHOS controllers have the same expected value, but they are not identical.
• In summary, hard robustness provides a conditional guarantee of meeting the commitment C under the relaxed assumptions, and soft robustness improves the performance of the controller by optimizing it to meet the commitment C even when the relaxed assumptions are not met. Therefore, the combination of hard and soft robustness is useful. Tools such as DCSynth support the automatic synthesis of high-quality controllers with both hard and soft robustness.
6.4.4 Synthesis of Run-Time Enforcement Shields

A run-time monitor for a safety-critical property is a system component which continuously observes the inputs (I) and outputs (O) of a system under consideration (SUC), and it checks the compliance of the behaviour so far with the specified
9 Experiments were performed on an Ubuntu system with a 64-bit Intel i5 2.5 GHz processor and 4 GB memory. The synthesis was completed within 3 s for both examples.
Fig. 6.7 Run-time monitor
property. At every point in execution, the property monitor outputs a signal, say SUCOK, which is true if the property holds and false otherwise (see Fig. 6.7). When a system is augmented with a run-time monitor, we get a self-aware system. Such self-aware systems can provide valuable insights into the correct functioning of systems. They can flag property violations during simulation, testing or field usage. The reader may note that for every QDDC formula D, a property monitor (or formula automaton) A(D) can be automatically synthesized. This automaton can be encoded as a Mealy machine or a digital circuit and integrated with the system under consideration to provide run-time monitoring.

A run-time enforcement shield [190] is a more sophisticated device than a run-time monitor. It observes the inputs and outputs of a (manually developed) system under consideration and, like a run-time monitor, it checks the correctness of the system behaviour with respect to a given critical property. However, instead of merely outputting the SSEOK signal, the shield rectifies the system-generated output when it does not satisfy the given critical property. Let us define a system with sporadic errors (SSE) as a controller which produces desirable output for any given input, but due to design errors, it may sporadically violate a critical system requirement. Here, we assume that the critical requirement is specified by a QDDC formula REQ(I, O), where I and O are the sets of input and output propositions. Usually, manually designed controllers also include in their behaviour unspecified optimizations and properties not captured by REQ, and they are often preferred over automatically synthesized controllers. However, they may exhibit obscure design errors due to oversight or a misunderstood problem specification.
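A run-time monitor is simply the property automaton run in lockstep with the system, emitting SUCOK each cycle. The sketch below hand-codes a monitor for a hypothetical safety property ("an input request must be acknowledged by the output within one cycle"); in practice the automaton would be generated from a QDDC formula D by the monitor synthesis mentioned above:

```python
# Hand-coded monitor automaton for a hypothetical property (a stand-in for
# an automatically synthesized A(D)).
class RuntimeMonitor:
    def __init__(self):
        self.state = "idle"      # "idle" | "pending" | "violated"

    def step(self, i, o):
        """Observe one (input, output) pair; return the SUCOK signal."""
        if self.state == "violated":
            return False         # safety violations are irrecoverable
        if self.state == "pending" and not o:
            self.state = "violated"
            return False
        self.state = "pending" if (i and not o) else "idle"
        return True

mon = RuntimeMonitor()
obs = [(True, False), (False, True), (True, False), (False, False)]
print([mon.step(i, o) for i, o in obs])  # prints: [True, True, True, False]
```

The last observation leaves a request unacknowledged for two cycles, so SUCOK drops to false and stays false, exactly the behaviour a self-aware system would flag.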
A run-time enforcement shield for a critical requirement given by REQ(I, O) is a controller (Mealy machine) which observes both the input and the output (I, O) generated by the SSE and produces an output O′ which is guaranteed to meet the critical requirement REQ(I, O′) invariantly. Moreover, the shield output O′ must deviate from the SSE output O as little as possible, in order to maintain quality. This ensures that the shield minimizes the deviation from the system designer's optimizations (see Fig. 6.8).
Fig. 6.8 Run-time enforcement shield
A central issue in designing run-time enforcement shields is the underlying notion of "deviating as little as possible" from the SSE output. Let Deviation denote a proposition encoding the fact that the system output O and the shield output O′ are not equal. A hard deviation constraint is a QDDC formula over the propositions SSEOK and Deviation which specifies when the shield is permitted to deviate from the system output. The shield must invariantly satisfy this constraint at every point in execution. For example, the following formula HDC(e, d) with integer parameters e, d specifies one such hard deviation constraint:

    HDC(e, d) =   []((scount !SSEOK ≤ e) ⇒ (scount Deviation ≤ d))
                ∧ []((BP(!Deviation) ∧ [[SSEOK]]) ⇒ [[!Deviation]])
The first line states that in any observation interval where the system violates the requirement cumulatively for at most e cycles, the shield can deviate for at most d cycles. The second line specifies the property of no spurious deviation, which states that a deviation cannot begin without encountering an error (but a deviation can last even after the error clears). In addition to invariantly satisfying HDC, the shield must choose outputs which minimize the cumulative Hamming distance between the shield output and the system output. This property can in fact be encoded as a QDDC formula EP(!Deviation), which can serve as the soft requirement (the soft deviation constraint SDC). By maximizing the relative frequency of occurrence of this property, we obtain minimally deviating shields.

A run-time monitor for REQ generates an SSEOK signal. Similarly, we can construct a monitor for the signal Deviation. By combining the hard deviation constraint HDC and the soft deviation constraint SDC over these signals, it is possible to arrive at a controller specification (I, O, D^h, D^s) which gives rise to a correct-by-construction shield with minimal deviation. The synthesis method described earlier can then be used to automatically synthesize such a shield. An experimental evaluation of the quality of the resulting shields shows that the method is capable of generating high-performance run-time enforcement shields. The detailed
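The constraint HDC(e, d) can be checked on a recorded behaviour. The sketch below follows the prose reading: the first loop checks every observation interval [j..i] for the error/deviation budget, and the second applies a simplified point-wise reading of the no-spurious-deviation conjunct (a deviation may only begin in a cycle where the system output is erroneous). It is our own approximation of the QDDC semantics, for illustration only:

```python
def hdc_holds(sseok, deviation, e, d):
    """Check HDC(e, d) on a finite behaviour given per-cycle booleans (sketch)."""
    n = len(sseok)
    for j in range(n):
        errs = devs = 0
        for i in range(j, n):
            errs += not sseok[i]
            devs += deviation[i]
            # First conjunct: few errors in [j..i] must imply few deviations.
            if errs <= e and devs > d:
                return False
    # Second conjunct (simplified): a deviation can only begin
    # in a cycle where the system output is erroneous.
    for i in range(n):
        prev_dev = deviation[i - 1] if i > 0 else False
        if deviation[i] and not prev_dev and sseok[i]:
            return False
    return True

# One system error at cycle 1, one shield deviation at the same cycle:
# allowed with budget d = 1 per error-free window.
print(hdc_holds([True, False, True], [False, True, False], 0, 1))
```

A real shield synthesizer would not check traces after the fact; it would build the HDC monitor automaton and synthesize a controller that can never violate it.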
notions of various HDC constraints, the detailed shield synthesis technique and the experimental evaluation of the quality of the resulting shields can be found elsewhere [190, 207].
6.5 Formal Verification

Formal verification involves three entities, viz. (a) the model of the system, (b) the requirement to be satisfied by the model and (c) the method to check whether the model satisfies the requirement. The aim of formal verification techniques is to show mathematically that the model satisfies the formula representing the desired requirement. The system under consideration is typically modeled either in an expressive logic or as a finite-state automaton/state machine. The requirement over this model is generally specified in a suitable logic.

The verification methods can broadly be classified into two categories, viz. proof-based (theorem proving) and model-based (model checking). In a proof-based approach, the system description is given as a conjunction of formulas (say Φ) in a suitable logic, and the specification is given as another formula (say ψ). Formal verification then essentially establishes that the set of formulas Φ logically implies the requirement ψ. The reader may note the similarity of this to the refinement step discussed earlier. In the theorem proving approach, the logical implication is established by giving a proof. This involves finding a sequence of deduction rules which, when applied to Φ, infer ψ. Applying such a method requires familiarity with the logic and its inference rules. In the model checking approach, the system is represented by a model M, usually a finite-state machine. The specification is again represented by a formula (say ψ), and the verification method consists of algorithmically checking whether the model M satisfies ψ. Suitable algorithms have to be developed for the specification logic and the modeling language used. The theorem proving technique is usually manual, whereas model checking is automatic for finite models. Both model checking and theorem proving can be applied at the model level as well as at the program level.
When these techniques are applied at the program level, a model needs to be extracted from the program to make it suitable for formal verification. In Sect. 6.1.2.1, we discussed an example of formal verification of a property at the model level (for the SCADE model). In this section, we will discuss various formal verification approaches, the associated techniques and the tool(s) for verification at the program level. For formal verification at the program level, the program is generally written in a syntactically restricted subset of the programming language (e.g. C, Java, etc.), avoiding the use of constructs such as infinite loops, pointers and unbounded array references. These restrictions make the program amenable to formal verification.
There are formal verification tools and techniques, available in both the commercial and academic domains, for performing formal verification of hardware and software designs. CBMC [158] is a program-level bounded model checker for C programs. VHDL Bounded Model Checker [144] is a tool used for verifying the functional properties of hardware designs described in VHDL. NuSMV [81] is a symbolic model checking tool aimed at the reliable verification of industrial designs; it supports a very wide variety of model checking techniques. SPIN [116] is a model-level verification tool used for verifying models of distributed software systems. ABC [72] is a tool for the synthesis and verification of binary sequential logic circuits appearing in synchronous hardware designs. Towards theorem proving, the most widely used technique is the pre-condition/post-condition-based approach known as Hoare logic, proposed by C.A.R. Hoare [115]. In the subsequent sections, we will discuss two prominent approaches to formal verification: (a) model checking using the NuSMV and CBMC tools and (b) theorem proving using Hoare triples.
6.5.1 Model Checking

Model checking involves algorithmically answering whether the set of behaviours exhibited by the given system model (often given as a finite-state automaton) is included in the set of behaviours which satisfy the logical formula. The model checking problem for LTL [206] and MSO [75] against a system specification given as a finite-state automaton is well studied in the literature, and several efficient tools exist [80, 90, 156]. These tools have significantly contributed to showing the importance of formal methods in providing the correctness guarantees required by high-integrity systems.

Model Checking Approaches Model checking involves exploring the state space of the model to check whether a given property is satisfied by the model. We shall confine ourselves to safety property verification, where an erroneous set of states of the system is specified. The property to be established is that no erroneous state can be reached from the initial state in any possible execution. It should be noted that as the model becomes more complex, the state space of the model grows exponentially, and hence safety verification sometimes becomes intractable. This problem is called state space explosion in the model checking domain. To overcome this problem, researchers have over the years devised various model checking methods, which are described briefly as follows. Although this is not a complete list of available model checking techniques, it covers most of the practically useful ones.

(i) Explicit state model checking: In this method, the system (high-level model) is explicitly represented as a state machine encoding the global state transition graph. The validity of a temporal property, given in some temporal logic (such as LTL and CTL), over a given model is evaluated by interpreting its
global state transition graph as a Kripke structure [67]. A Kripke structure is a mathematical structure derived from a state machine, whose nodes represent all the possible states of the system and whose edges represent state transitions. A reachability analysis algorithm over this structure is used to prove the specified safety properties. The reachability analysis is implemented as a breadth-first (or depth-first) search from a given initial state.

(ii) Symbolic model checking: Instead of listing reachable states one at a time, the state space can be traversed more efficiently by considering a large number of states at a time. Symbolic model checking represents sets of states and transition relations as logic formulas instead of creating a graph explicitly. The process works roughly as follows: the set of initial states is represented as a formula. The procedure then starts an iterative process where, at each step i, the set of states that can be reached in i steps from an initial state is added to the formula. This can be done efficiently by logically transforming the formula representing the set of states reachable in i − 1 steps into the one representing those reachable in i steps. At each such step, the set of new states is intersected with the set of erroneous states. If the resulting set is non-empty, an error has been detected. This process terminates when the set of newly added states is empty or an error is found. The first case indicates that the property holds, because no reachable state contradicts it. In the latter case, the path from the initial state to the error state gives a counterexample. The reader may refer to [84, 86, 118] for details. The two main approaches for symbolic model checking are based on two different logical representations of Boolean functions, namely, binary decision diagrams (BDD) and SAT/SMT formulas. These representations have their distinctive algorithms for analysing Boolean formulas.
Typically, such algorithms allow for finding satisfying assignments of a formula or checking the equivalence of two Boolean formulas [118].

(iii) Bounded model checking (BMC): The basic idea in BMC is to search for a counterexample whose length is bounded by some user-provided integer bound k. Thus, BMC finds bugs in partial executions of user-specified length. If no bug is found within that bound, then k is increased until either a bug is encountered or some user-specified bound is reached. BMC is performed by reducing the problem to a satisfiability problem, which can therefore be solved by SAT/SMT solvers [201]. It may be noted that in BMC, the user has to provide a bound on the number of cycles that should be explored, which implies that the method is incomplete if the bound is not high enough. While this method is practically quite effective at finding bugs, it cannot detect bugs which occur in executions deeper than the user-specified bound k.

(iv) IC3 (Incremental Construction of Inductive Clauses for Indubitable Correctness): Compared to BMC, IC3 [82] follows a different approach for the reduction to SAT. Instead of an unwinding, it computes step-wise over-approximations of reachability information. It then focuses on single states and proves or disproves their reachability in a backward-search fashion. During
this procedure, only single steps of the transition relation are considered. As a result, IC3 makes the SAT solver deal with many simple queries.

Model Checking Tools:

1. NuSMV (Symbolic Model Checker)
NuSMV [6] is one of the most widely used tools in the formal verification domain. It has a rich programming notation which allows complex system models with concurrency, hierarchy and modularity to be encoded as a labeled transition system (or a Kripke structure). It facilitates the symbolic model checking of LTL and CTL properties on a specified model and supports a very wide variety of state-of-the-art model checking algorithms. NuSMV has been used to model check numerous examples, especially those pertaining to digital circuits, including cache controllers, bus controllers and communication protocols.

A NuSMV model consists of various modules which can be instantiated any number of times in other modules. This allows the system specification to be broken into several simple modules. Like a C program, it has a unique and mandatory module named main, which is the starting point of the model. The NuSMV modeling language allows the specification of synchronous as well as asynchronous concurrent composition of modules. A module is structured with sections called VAR, ASSIGN and SPEC. The VAR section is used to define the variables; ASSIGN is used to assign values to the variables, using the init and next keywords to provide the initial value and the value in the next cycle, respectively. Finally, the LTLSPEC section is used to specify the required safety properties using the logic LTL discussed in Sect. 5.5.2.

Example NuSMV Model Consider the following standard NuSMV model (taken from [118]). This is a simple example with only the main module. The model has a Boolean variable request and an enumerated variable status. The variable request is an input, controlled by the external environment, and hence the NuSMV model does not define its value.
The variable status is an output partially defined by the model. Its initial value is ready, and its value in the next cycle is busy if the input is true; otherwise, it takes a non-deterministic value. The LTL specification in the example describes the property that "whenever the request input becomes true, the output status eventually becomes busy". When the model along with the LTL property is given as input to the NuSMV model checker, it tries to verify the property over the transition system represented by the model and reports whether the property is satisfied on this transition system or not. The transition system (non-deterministic) corresponding to the example specification given in Fig. 6.9 is shown in Fig. 6.10. There are four states in the transition system, each depicting an instance of the values of the variables req and status, with S1 and S3 as the initial states. From the transition system, it is evident that the specified property is met, and the same is reported by the NuSMV model checker.
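The model of Fig. 6.9 appears only as an image in the book; the following listing is a sketch of it, following the standard request/status example distributed with the NuSMV documentation (variable names as described above):

```nusmv
MODULE main
VAR
  request : boolean;
  status  : {ready, busy};
ASSIGN
  init(status) := ready;
  next(status) := case
                    request : busy;
                    TRUE    : {ready, busy};  -- non-deterministic otherwise
                  esac;
-- whenever request becomes true, status eventually becomes busy
LTLSPEC G (request -> F status = busy)
```

Note that request carries no ASSIGN entry, so NuSMV treats it as an unconstrained environment input, which is exactly what makes the resulting transition system non-deterministic.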
6.5 Formal Verification
Fig. 6.9 Sample NuSMV model [118]

[Diagram for Fig. 6.10: four states, S1 (req=TRUE, status=ready), S2 (req=TRUE, status=busy), S3 (req=FALSE, status=ready) and S4 (req=FALSE, status=busy)]

Fig. 6.10 Transition system corresponding to the NuSMV example model
2. CBMC (Bounded Model Checker for C Programs)

CBMC is a tool for the formal verification of C and C++ programs based on the bounded model checking technique. It supports most of the C syntax, including C89, C99 and most of C11. The tool also allows the analysis of programs with dynamic memory allocation and checks for pointer safety, exceptions, buffer overflow violations and user-defined assertions.

CBMC verifies the absence of violations of user-defined assertions. It converts the C program into a Boolean formula by considering branches and unrolling loops to a given depth, and it solves that formula to prove the correctness or violation of the assertion. CBMC uses state-of-the-art SAT/SMT solvers to check the satisfiability of these formulas: the tool reduces BMC to a SAT instance and uses various SAT solvers to finally generate the counterexample (if any) for the specified property.

The C program is fed into CBMC, along with a property as an assert statement and the maximal number of "unrollings" to be applied to the program loop(s) (say k). CBMC unrolls the program k times and performs symbolic execution to translate it into a formula called a verification condition. This verification condition is encoded as
Fig. 6.11 Sample C program for verification in CBMC
a conjunctive normal form (CNF) formula, which is then analysed by a SAT solver to generate the counterexample. CBMC is quite effective at finding bugs. However, it should be noted that the method is incomplete: it cannot detect bugs which occur only in executions deeper than the user-specified bound k.
Example C Program A sample C program given to CBMC for verification is shown in Fig. 6.11. The property to be proved for this program is "y==x" (given as an assert in the program). CBMC finds it to be always true and reports the result. Figure 6.12 shows the verification result reported by CBMC for the given property.
6.5.2 Theorem Proving

Theorem proving is a method of formal verification in which the system model and the requirements specification are expressed as formulas in a suitable logic, such as the Hoare logic discussed below. A deduction-based technique is used to prove that the model logically entails the specification. The proof system consists of a set of axioms and inference rules that guide the deduction through rewriting, simplification and induction. Unlike model checking, theorem proving in general cannot be fully automated, because of undecidability results for the practically useful logical specifications. However, in theorem proving we do not need to exhaustively check every state that the system can get into. Therefore, theorem proving can be applied to systems where program variables have infinite
Fig. 6.12 CBMC example verification result
domains and can have infinitely many interacting values (states), unlike model checking, where the model is assumed to have finitely many states.

In this section, we will concentrate on an assertion-based proof technique which can be applied directly at the program level. One of the most widely accepted such techniques is Hoare logic [115] for the verification of pre-/post-condition-based specifications of transformational programs. In Hoare logic, the specification of a program or program fragment S is given using a pre-condition P and a post-condition Q. These are formulas of first-order logic stating conditions on the values of the program variables. For example, the condition (a = r + q*b) ∧ (0 ≤ r < b) states that r is the remainder and q the quotient of a divided by b. The correctness formula takes the form of a Hoare triple {{P}} S {{Q}}. This correctness formula states that any execution of S, if started with condition P true, must terminate, and the condition Q must hold on termination. Such a formula is called a total correctness formula.
See the following example of a program fragment which divides an integer a by an integer b, giving remainder r and quotient q. Hence, the example condition above must hold on termination. The program functions correctly only if started with the pre-condition a ≥ 0 ∧ b > 0.

{{a >= 0 /\ b > 0}}                        -- pre-condition
r = a; q = 0;
-- loop invariant: (a = r + q*b) /\ (r >= 0)
while (r >= b) {
    r = r - b;
    q = q + 1;
}
{{(a = r + q*b) /\ (0 <= r) /\ (r < b)}}   -- post-condition

… ≥ PSH.
(v) Signal V_CLOSE is generated to close the valve V if both the following conditions are satisfied:
    (a) V_CLOSEC pressed.
    (b) Pump P is not running.
Safety Property for Cooling Water Supply Control System
Valve V must open if ≥ 3 s have elapsed after pump P started.
7.6.1.2 Interlock Logic Using PLC Language and Its Translation
The logic of the cooling water supply control system, specified using the IEC 61131-3 programming language (SFC), is shown⁴ in Fig. 7.15. The notations used in the SFC program are listed in Table 7.3. It can be observed from Fig. 7.15 that the SFC program has five steps, viz. INIT, PUMP_ON, PUMP_OFF, VLV_CLOSE and VLV_OPEN.

The above PLC application program is translated into a synchronous model in Lustre as described in Sect. 7.3.4. In the translation, the step-active flags and the step-time variables corresponding to all steps are defined in the Lustre program node. A step-active flag denotes whether the step is active or not. The value of a step-time variable shows how long the step has been active. The following step-active flags and step-time variables are defined for the translation of the PLC program depicted in Fig. 7.15:

(i) Step-active flags: INIT__X, PUMP_ON__X, PUMP_OFF__X, VLV_CLOSE__X and VLV_OPEN__X
(ii) Step-time variables: INIT__T, PUMP_ON__T, PUMP_OFF__T, VLV_CLOSE__T and VLV_OPEN__T
The textual representation of the PLC program in ST and its equivalent Lustre code is presented in Appendix B.
⁴ ©2012 IEEE. Reprinted, with permission, from [147].
7.6 Case Study 1: PLC Application Program Verification
Fig. 7.15 SFC program for the control system of the cooling water supply system

Table 7.3 Notation used in the SFC program for the CWSS control system

Notation   Description
P_STARTC   Manual command to start pump P
P_STOPC    Manual command to stop pump P
P_ON_OFF   Signal is generated to run the pump P
V_OPENC    Manual command to open valve V
V_CLOSEC   Manual command to close valve V
V_OPEN     Signal is generated to open valve V
V_CLOSE    Signal is generated to close valve V
T_LEVEL    Current water level of storage tank
LOW        Set point for low water level in storage tank
HIGH       Set point for high water level in storage tank
P_DIS_PR   Set point for high discharge pressure of pump P
PSH        Set point corresponding to the maximum allowed water level
It is to be noted that the synchronous languages have a notion of logical time, which denotes the number of (periodic) time ticks that have arrived. In order to maintain the synchrony hypothesis, the minimum difference between two consecutive logical time ticks should be more than the worst-case execution time (WCET) of the application program. For specifying safety properties having real-time constraints, the physical time is required to be translated into logical time. For example, if an application has a cycle time of 500 ms, then a physical time of, say, 5 s is represented by ten logical ticks. In the case study, the time interval between two successive logical ticks is assumed to be 1 s (the periodicity of the application). Therefore, N logical ticks represent a physical time of N seconds.
7 Development of Qualified Platform
Once the PLC program is translated into a Lustre specification, formal verification with respect to different safety properties is carried out by introducing observer nodes. In an observer node, the safety property is specified as a Lustre equation. The model checker tool Lesar takes the observer node and the translated node as inputs and verifies the safety property. It returns true if the PLC program satisfies the safety property; otherwise, it returns false along with a counterexample.
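As an illustration of the observer pattern, the property verified in Sect. 7.6.1.3 could be wrapped in a node of the following shape (a hypothetical sketch: the node name and interface are assumptions, while the signal names follow the case study):

```lustre
node observer(PUMP_ON__X : bool; PUMP_ON__T : int; V_OPEN : bool)
returns (property : bool);
let
  -- the safety property as a Boolean equation over the translated signals
  property = not (PUMP_ON__X and (PUMP_ON__T >= 3)) or V_OPEN;
tel
```

Lesar then checks that property is invariantly true on the composition of the observer with the translated program node.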
7.6.1.3 Verification of Property
Consider the safety property "Valve V must open if ≥ 3 s have elapsed after pump P started". To verify this safety property, the following Lustre equation is specified in the observer node:

property = (PUMP_ON__X and (PUMP_ON__T >= 3)) => V_OPEN

or, equivalently,

property = not (PUMP_ON__X and (PUMP_ON__T >= 3)) or V_OPEN
A formal verification of this safety property by the model checker tool Lesar results in false. It also gives the following counterexample, which represents the control flow that leads to the safety property violation:

TRANSITION 1: true
TRANSITION 2: P_STARTC or (T_LEVEL <= 10)
TRANSITION 3: (P_STOPC and not V_OPENC and (P_DIS_PR >= 100)) or
              (not V_OPENC and (PUMP_ON__T >= 3) and (P_DIS_PR >= 100)) or
              (not V_OPENC and (T_LEVEL >= 50) and (P_DIS_PR >= 100))
TRANSITION 4: not V_OPENC and P_STARTC and (T_LEVEL >= 50) and (P_DIS_PR >= 100)
TRANSITION 5: P_STARTC and (PUMP_ON__T >= 3) or (PUMP_ON__T >= 3) and (T_LEVEL <= 10)
This counterexample denotes that if both the steps PUMP_OFF and VLV_CLOSE are concurrently active and the condition

(not V_OPENC) and P_STARTC and (T_LEVEL >= 50) and (P_DIS_PR >= 100)
is true, then transition T3 and transition T6 take place simultaneously. This leads to the violation of the safety property. It is caused by a design error in the PLC program, which is corrected by changing the guard conditions of transition T3 and transition T4 as follows:

T3 = (P_STARTC AND V_CLOSE) AND NOT (T_LEVEL >= HIGH)
T4 = (V_OPENC OR (P_DIS_PR >= PSH)) AND NOT (T_LEVEL >= HIGH)
7.6.1.4 Verification of Unreachability Property
A step of an SFC program is considered an unreachable step if it never becomes an active step. In other words, a step is unreachable if its step-active flag never becomes true. In order to detect any unreachable step in the SFC program (shown in Fig. 7.15), the following specification is used:

property = never(INIT__X) or never(PUMP_ON__X) or never(PUMP_OFF__X) or
           never(VLV_CLOSE__X) or never(VLV_OPEN__X)

If the property holds true for the program, it implies that there exists an unreachable step in the program.

7.6.1.5 Verification of Mutual Exclusion
Another class of property can be checked to see whether the program conforms to the mutual exclusion requirement of different step activations. For example, to verify that the steps PUMP_OFF and VLV_OPEN of the SFC shown in Fig. 7.15 are mutually exclusive, the following Lustre specification is used in the observer node:

property = never(PUMP_OFF__X and VLV_OPEN__X);

7.6.1.6 Verification for Deadlock
A deadlock exists in an SFC program if a step is not deactivated after its activation. In other words, the step-active flag of a step never becomes false once it is set. In order to check the SFC program (shown in Fig. 7.15) for the presence of deadlock, the following specification is used:

property_DL = never(not (INIT__X) and pre(INIT__X)) or
              never(not (PUMP_ON__X) and pre(PUMP_ON__X)) or
              never(not (PUMP_OFF__X) and pre(PUMP_OFF__X)) or
              never(not (VLV_CLOSE__X) and pre(VLV_CLOSE__X)) or
              never(not (VLV_OPEN__X) and pre(VLV_OPEN__X))
If this property holds true, it implies that there is a deadlock in the SFC program.
7.7 Case Study 2: PLC Supporting Space and Time Partitioning

A prototype PLC supporting an integrated modular avionics (IMA) system is discussed in this section. We will refer to this PLC supporting space and time partitioning (discussed in Sects. 1.5.2.3 and 4.3.6.3) as partitioned PLC, or pPLC in short. This case study highlights the IMA partitioning requirements of pPLC and how to deal with the challenges in meeting those requirements. While space (i.e. memory) partitioning techniques are simpler, well established and supported by hardware design, time partitioning has its own challenges.
7.7.1 Partition System Model

The partitioned real-time system consists of a number of partitions, with the applications in the individual partitions running independently, as shown in Fig. 7.16. The applications are independent in terms of assured processor time and allocated memory space, which are protected from overrun by applications in other partitions. The partitions, however, can interact with each other, and the interacting tasks within a partition are managed by independent schedulers. The system employs two-level hierarchical scheduling: the partitions are scheduled by the SParK μ-kernel [104] using a table-driven cyclic (TDC) scheduler, and the tasks within a partition are scheduled by a fixed priority (FP) preemptive scheduler. Thus, the architecture supports the use of a commercial off-the-shelf (COTS) real-time operating system (RTOS). It can be observed from Fig. 7.16 that it is the SParK μ-kernel which manages all the interactions with the hardware resources and the management of partitions. SParK provides all the necessary services to access hardware resources, which include:

(a) Hardware initialization
(b) Timer services
(c) Support for interrupt services
(d) Device driver support
(e) Memory management
(f) Support for hypercalls
Fig. 7.16 Partition system model
[Diagram: tasks 1, 2, …, k grouped into partitions P1, P2, …, Pn, all running above the partition kernel, which in turn runs on the hardware layer]
In addition, and most importantly, SParK carries out partition management, which involves:

(a) Partition scheduling
(b) Inter-partition communication

In this context, note that a guest OS running within a partition receives its clock tick from the SParK timer service, and the execution of any privileged instruction is mediated by SParK only. As already discussed in Sect. 4.3.6.3, in order to ensure dependability, it is necessary to implement both (i) space partitioning and (ii) time partitioning.
7.7.2 Space Partitioning

Space partitioning is achieved by providing appropriate memory management, so that memory overrun into the physical space of a partition due to the failure of an application in any other partition is prevented. The SParK memory management is shown in Fig. 7.17. SParK utilizes the hardware mediation provided by the memory management unit (MMU) of the target processor in order to ensure that the memory allocated to the guest OS in a partition is protected from any possible interference by applications running in other partitions.
Fig. 7.17 Space partitioning in SParK
[Diagram: partitions P1, P2, …, Pn, each containing tasks W1, …, Wk and a guest OS kernel; the partition kernel maps each partition onto a disjoint region of physical memory, e.g. the memory allocated to the guest OS of partition P1]
7.7.3 Time Partitioning

As discussed in Sect. 4.3.6.3, the IMA architecture supports the time partitioning of hardware resources using a cyclic table-driven scheduler [59]. This can be achieved by implementing a two-level hierarchical scheduler, where:

(i) At the core software module level, a spartan real-time operating system (RTOS) manages partition scheduling and inter-partition communications.
(ii) At the partition level, an individual RTOS or general-purpose OS manages the processes within a partition (process management: scheduling, message handling) and the communication between the constituent processes (intra-partition communication).

Let us refer to the scheduling of tasks within a partition as in-partition scheduling. The OS within a partition is termed the guest OS. Works on software architectures supporting the IMA system include the Strongly Partitioned Integrated Real-Time System (SPIRIT) [155] and the Safety Partitioned Kernel (SParK) [104]. Both these μ-kernels work at the bottom layer, ensuring spatial and temporal isolation among the partitions at the top layer. In other words, the μ-kernel gives (i) a temporal guarantee to each partition, by ensuring that the applications running in individual partitions receive the allocated CPU time, and (ii) a spatial guarantee, by preventing memory overruns. A partition can have its own RTOS scheduling a set of tasks in an application, or it may simply run a single-threaded application. The temporal guarantee to individual partitions allows the hosting of applications of different criticality in different partitions.
The design and implementation techniques of temporal partitioning used in the pPLC and discussed under this section are derived mainly from [104] and [151].
7.7.3.1 Modeling and Design of Temporally Partitioned System
In pPLC, the best of both SPIRIT and SParK is utilized. However, the partition schedulability analysis in pPLC [151] extends and utilizes the worst-case response time (WCRT) analysis technique of Joseph and Pandya [146].

Partition A partition is a real-time virtual processor, which is allocated only an α_k fraction of the CPU time and is responsible for executing a set of tasks

τ_k = {τ_ik(T_ik, C_ik) | i = 1, 2, …, n}

where (i) T_ik and C_ik are the period and the worst-case execution time (WCET) of task τ_ik, respectively, in partition P_k, (ii) k is the partition id and (iii) n is the number of tasks in the task set τ_k.

Assumption It is assumed that the relative deadline of a task, D_ik, is equal to its period T_ik. Therefore, these two terms will be used interchangeably in further discussions.

Partition Scheduling The pPLC employs table-driven cyclic (TDC) scheduling of partitions, as recommended by ARINC [59], for reasons of simplicity in implementation, which makes verification and validation (V&V) easier.

In-Partition Scheduling A task set within a partition is scheduled by any RTOS, referred to as the guest OS, using a fixed priority (FP) preemptive scheduling policy. Note that safety-critical tasks are often simple and can run as a single-threaded application. Such an application can be modeled as a single-threaded task within a partition.

Hierarchical Scheduling Combination In view of the above discussions, the hierarchical scheduling architecture can have a combination of fixed priority (FP) preemptive scheduling for in-partition (top-level) scheduling of tasks and TDC scheduling of partitions (bottom level or partition level). It is to be ensured that when it is time to switch partitions, a running task in the active partition will be preempted to give processor time to the partition for which it is due.

Definition 7.1 TDC-FP scheduling refers to the hierarchical scheduling combination which adopts fixed priority preemptive scheduling of tasks within a partition and a TDC policy for the scheduling of partitions.

The key idea behind the pPLC task scheduling is as follows. Since a single processor is responsible for executing the tasks of all partitions, only a fraction of processor
Fig. 7.18 Partition cycle
[Diagram: a partition cycle of length η, split into a fraction α allocated to the partition and the remaining (1 − α) available to the other partitions]
time, α (< 1), can be allocated to each partition.

…

TRANSITION FROM VLV_CLOSE TO VLV_OPEN := (V_OPENC OR (P_DIS_PR > PSH)) AND NOT (T_LEVEL >= HIGH);
END_TRANSITION
TRANSITION FROM VLV_OPEN TO VLV_CLOSE := V_CLOSEC AND NOT (P_ON_OFF);
END_TRANSITION
TRANSITION FROM (PUMP_OFF, VLV_CLOSE) TO INIT := T_LEVEL >= HIGH;
END_TRANSITION
END_PROGRAM
B.2 Lustre Specification of the PLC Program

The Lustre specification of the PLC program for the cooling water supply control system, generated using the ST to Lustre translator, is reproduced below.

include "math.lus"
include "boolmisc.lus"

node INIT (V_OPENC: bool; P_STOPC: bool; pre__LOW: int; P_DIS_PR: int; pre__V_CLOSE: bool; pre__PSH: int; P_STARTC: bool; V_CLOSEC: bool; T_LEVEL: int; pre__V_OPEN: bool; pre__P_ON_OFF: bool; pre__HIGH: int)
returns (P_ON_OFF: bool; V_OPEN: bool; V_CLOSE: bool)
let
  P_ON_OFF = 0;
  V_OPEN = 0;
  V_CLOSE = 1;
tel

node PUMP_ON (V_OPENC: bool; P_STOPC: bool; pre__LOW: int; P_DIS_PR: int; pre__V_CLOSE: bool; pre__PSH: int; P_STARTC: bool; V_CLOSEC: bool; T_LEVEL: int; pre__V_OPEN: bool; pre__P_ON_OFF: bool; pre__HIGH: int)
returns (P_ON_OFF: bool)
let
  P_ON_OFF = 1;
tel

node VLV_CLOSE (V_OPENC: bool; P_STOPC: bool; pre__LOW: int; P_DIS_PR: int; pre__V_CLOSE: bool; pre__PSH: int; P_STARTC: bool; V_CLOSEC: bool; T_LEVEL: int; pre__V_OPEN: bool; pre__P_ON_OFF: bool; pre__HIGH: int)
returns (V_CLOSE: bool)
let
  V_CLOSE = 1;
tel

node PROCESS (V_OPENC: bool; P_STOPC: bool; P_DIS_PR: int; P_STARTC: bool; V_CLOSEC: bool; T_LEVEL: int)
returns (property: bool; V_CLOSE: bool; V_OPEN: bool; P_ON_OFF: bool)
var
  P_ON_OFF__INIT: bool; V_OPEN__INIT: bool; V_CLOSE__INIT: bool;
  P_ON_OFF__PUMP_ON: bool; INIT__T: int; PUMP_ON__T: int; PUMP_OFF__T: int;
  VLV_OPEN__T: int; VLV_CLOSE__T: int; LOW: int; INIT__X: bool;
  PUMP_ON__X: bool; PUMP_OFF__X: bool; VLV_OPEN__X: bool; VLV_CLOSE__X: bool;
  V_CLOSE__VLV_CLOSE: bool; V_CLOSE__PUMP_ON__VLV_CLOSE: bool;
  V_CLOSE__PUMP_OFF__VLV_CLOSE: bool; PSH: int; V_OPEN__VLV_OPEN: bool;
  V_OPEN__PUMP_ON__VLV_OPEN: bool; V_OPEN__PUMP_OFF__VLV_OPEN: bool;
  P_ON_OFF__PUMP_OFF: bool; HIGH: int;
let
  property = not (PUMP_ON__X and (PUMP_ON__T >= 4)) or V_OPEN;

  (V_OPEN__PUMP_ON__VLV_OPEN) = (0) -> VLV_OPEN(V_OPENC, P_STOPC, pre(LOW), P_DIS_PR, pre(V_CLOSE), pre(PSH), P_STARTC, V_CLOSEC, T_LEVEL, pre(V_OPEN), P_ON_OFF__PUMP_ON, pre(HIGH));
  (V_CLOSE__PUMP_ON__VLV_CLOSE) = (0) -> VLV_CLOSE(V_OPENC, P_STOPC, pre(LOW), P_DIS_PR, pre(V_CLOSE), pre(PSH), P_STARTC, V_CLOSEC, T_LEVEL, pre(V_OPEN), P_ON_OFF__PUMP_ON, pre(HIGH));
  (V_OPEN__PUMP_OFF__VLV_OPEN) = (0) -> VLV_OPEN(V_OPENC, P_STOPC, pre(LOW), P_DIS_PR, pre(V_CLOSE), pre(PSH), P_STARTC, V_CLOSEC, T_LEVEL, pre(V_OPEN), P_ON_OFF__PUMP_OFF, pre(HIGH));
  (V_CLOSE__PUMP_OFF__VLV_CLOSE) = (0) -> VLV_CLOSE(V_OPENC, P_STOPC, pre(LOW), P_DIS_PR, pre(V_CLOSE), pre(PSH), P_STARTC, V_CLOSEC, T_LEVEL, pre(V_OPEN), P_ON_OFF__PUMP_OFF, pre(HIGH));
  (P_ON_OFF__INIT, V_OPEN__INIT, V_CLOSE__INIT) = INIT(V_OPENC, P_STOPC, 10 -> pre(LOW), P_DIS_PR, 0 -> pre(V_CLOSE), 100 -> pre(PSH), P_STARTC, V_CLOSEC, T_LEVEL, 0 -> pre(V_OPEN), 0 -> pre(P_ON_OFF), 50 -> pre(HIGH));

  INIT__X = 1 -> if pre(INIT__X) and ((T_LEVEL <= pre(LOW)) or P_STARTC) then false
                 else if pre(PUMP_OFF__X) and pre(VLV_CLOSE__X) and (T_LEVEL >= pre(HIGH)) then true
                 else pre(INIT__X);
  INIT__T = 0 -> if INIT__X and not pre(INIT__X) then 0
                 else if INIT__X then pre(INIT__T) + 1
                 else pre(INIT__T);

  (P_ON_OFF__PUMP_ON) = (0) -> PUMP_ON(V_OPENC, P_STOPC, pre(LOW), P_DIS_PR, pre(V_CLOSE), pre(PSH), P_STARTC, V_CLOSEC, T_LEVEL, pre(V_OPEN), pre(P_ON_OFF), pre(HIGH));

  PUMP_ON__X = 0 -> if pre(INIT__X) and ((T_LEVEL <= pre(LOW)) or P_STARTC) then true
                    else if pre(PUMP_ON__X) and (P_STOPC or (pre(PUMP_ON__T) >= 3 and pre(V_CLOSE)) or (T_LEVEL >= pre(HIGH))) then false
                    else if pre(PUMP_OFF__X) and (P_STARTC and pre(V_CLOSE) and not (T_LEVEL >= pre(HIGH))) then true
                    else pre(PUMP_ON__X);
  PUMP_ON__T = 0 -> if PUMP_ON__X and not pre(PUMP_ON__X) then 0
                    else if PUMP_ON__X then pre(PUMP_ON__T) + 1
                    else pre(PUMP_ON__T);

  (V_CLOSE__VLV_CLOSE) = (0) -> VLV_CLOSE(V_OPENC, P_STOPC, pre(LOW), P_DIS_PR, pre(V_CLOSE), pre(PSH), P_STARTC, V_CLOSEC, T_LEVEL, pre(V_OPEN), pre(P_ON_OFF), pre(HIGH));

  VLV_CLOSE__X = 0 -> if pre(INIT__X) and ((T_LEVEL <= pre(LOW)) or P_STARTC) then true
                      else if pre(VLV_CLOSE__X) and ((V_OPENC or (P_DIS_PR > pre(PSH))) and not (T_LEVEL >= pre(HIGH))) then false
                      else if pre(VLV_OPEN__X) and (V_CLOSEC and not (pre(P_ON_OFF))) then true
                      else if pre(PUMP_OFF__X) and pre(VLV_CLOSE__X) and (T_LEVEL >= pre(HIGH)) then false
                      else pre(VLV_CLOSE__X);
  VLV_CLOSE__T = 0 -> if VLV_CLOSE__X and not pre(VLV_CLOSE__X) then 0
                      else if VLV_CLOSE__X then pre(VLV_CLOSE__T) + 1
                      else pre(VLV_CLOSE__T);
  (V_OPEN__VLV_OPEN) = (0) -> VLV_OPEN(V_OPENC, P_STOPC, pre(LOW), P_DIS_PR, pre(V_CLOSE), pre(PSH), P_STARTC, V_CLOSEC, T_LEVEL, pre(V_OPEN), pre(P_ON_OFF), pre(HIGH));

  VLV_OPEN__X = 0 -> if pre(VLV_CLOSE__X) and ((V_OPENC or (P_DIS_PR > pre(PSH))) and not (T_LEVEL >= pre(HIGH))) then true
                     else if pre(VLV_OPEN__X) and (V_CLOSEC and not (pre(P_ON_OFF))) then false
                     else pre(VLV_OPEN__X);
  VLV_OPEN__T = 0 -> if VLV_OPEN__X and not pre(VLV_OPEN__X) then 0
                     else if VLV_OPEN__X then pre(VLV_OPEN__T) + 1
                     else pre(VLV_OPEN__T);

  (P_ON_OFF__PUMP_OFF) = (0) -> PUMP_OFF(V_OPENC, P_STOPC, pre(LOW), P_DIS_PR, pre(V_CLOSE), pre(PSH), P_STARTC, V_CLOSEC, T_LEVEL, pre(V_OPEN), pre(P_ON_OFF), pre(HIGH));

  PUMP_OFF__X = 0 -> if pre(PUMP_ON__X) and (P_STOPC or (pre(PUMP_ON__T) >= 3 and pre(V_CLOSE)) or (T_LEVEL >= pre(HIGH))) then true
                     else if pre(PUMP_OFF__X) and (P_STARTC and pre(V_CLOSE) and not (T_LEVEL >= pre(HIGH))) then false
                     else if pre(PUMP_OFF__X) and pre(VLV_CLOSE__X) and (T_LEVEL >= pre(HIGH)) then false
                     else pre(PUMP_OFF__X);
  PUMP_OFF__T = 0 -> if (PUMP_OFF__X and not pre(PUMP_OFF__X)) then 0
                     else if PUMP_OFF__X then pre(PUMP_OFF__T) + 1
                     else pre(PUMP_OFF__T);

  V_CLOSE = V_CLOSE__INIT -> if (PUMP_ON__X and VLV_CLOSE__X) then V_CLOSE__PUMP_ON__VLV_CLOSE
                             else if (PUMP_OFF__X and VLV_CLOSE__X) then V_CLOSE__PUMP_OFF__VLV_CLOSE
                             else if VLV_CLOSE__X then V_CLOSE__VLV_CLOSE
                             else if INIT__X then V_CLOSE__INIT
                             else if (pre(VLV_CLOSE__X) and not (VLV_CLOSE__X)) then 0
                             else pre(V_CLOSE);

  V_OPEN = V_OPEN__INIT -> if (PUMP_ON__X and VLV_OPEN__X) then V_OPEN__PUMP_ON__VLV_OPEN
                           else if (PUMP_OFF__X and VLV_OPEN__X) then V_OPEN__PUMP_OFF__VLV_OPEN
                           else if VLV_OPEN__X then V_OPEN__VLV_OPEN
                           else if INIT__X then V_OPEN__INIT
                           else if (pre(VLV_OPEN__X) and not (VLV_OPEN__X)) then 0
                           else pre(V_OPEN);

  P_ON_OFF = P_ON_OFF__INIT -> if (PUMP_ON__X and VLV_OPEN__X) then P_ON_OFF__PUMP_ON
                               else if (PUMP_ON__X and VLV_CLOSE__X) then P_ON_OFF__PUMP_ON
                               else if (PUMP_OFF__X and VLV_OPEN__X) then P_ON_OFF__PUMP_OFF
                               else if (PUMP_OFF__X and VLV_CLOSE__X) then P_ON_OFF__PUMP_OFF
                               else if PUMP_OFF__X then P_ON_OFF__PUMP_OFF
                               else if PUMP_ON__X then P_ON_OFF__PUMP_ON
                               else if INIT__X then P_ON_OFF__INIT
                               else pre(P_ON_OFF);

  LOW = 10;
  PSH = 100;
  HIGH = 50;
tel

node VLV_OPEN (V_OPENC: bool; P_STOPC: bool; pre__LOW: int; P_DIS_PR: int; pre__V_CLOSE: bool; pre__PSH: int; P_STARTC: bool; V_CLOSEC: bool; T_LEVEL: int; pre__V_OPEN: bool; pre__P_ON_OFF: bool; pre__HIGH: int)
returns (V_OPEN: bool)
let
  V_OPEN = 1;
tel

node PUMP_OFF (V_OPENC: bool; P_STOPC: bool; pre__LOW: int; P_DIS_PR: int; pre__V_CLOSE: bool; pre__PSH: int; P_STARTC: bool; V_CLOSEC: bool; T_LEVEL: int; pre__V_OPEN: bool; pre__P_ON_OFF: bool; pre__HIGH: int)
returns (P_ON_OFF: bool)
let
  P_ON_OFF = 0;
tel
Appendix C
Temporal Partitioning: Proof of TDC-FP Schedulability
The proof of the TDC-FP combination of partition schedulability is presented here. For the convenience of the readers, the schedulability criterion is reproduced from Sect. 7.7.3.2 before discussing the proof.

Schedulability Criteria The TDC-FP feasibility of scheduling as studied in [151] states that a task set τ_k = {τ_ik(T_ik, C_ik) | i = 1, 2, …, n} is schedulable in a partition P_k allocated a partition capacity α_k (< 1) in a static partition schedule cycle η_k, if

(a) α_k ≥ (Σ_{i=1}^{n_k} C_ik/T_ik) / U_k
(b) ∀i, η_k < T_ik^sp/(1 − α_k) − ⁿR_ik
(c) η_k < min(T_ik)

where (i) T_ik^sp is the time that a task τ_ik can spare considering its worst-case response time and (ii) U_k is the upper bound of the utilization of the task set τ_k feasible under the rate monotonic (RM) priority assignment [164] to individual tasks.

Proof It has been explained in Sect. 7.7.3.1 that in the partition system model, (i) the available time for other partitions is simulated as an additional task τ_0k(η_k, (1 − α_k)η_k) and (ii) the highest priority is assigned to it among all the tasks in partition P_k. In accordance with the rate monotonic priority assignment, the condition η_k < min(T_ik) ensures the highest priority of the task τ_0k. This, in turn, ensures that no task under execution in a partition can prevent partition switching when it is time for doing so. Thus, it guarantees the allocated processor time to every partition. The schedulability guarantee to the additional task τ_0k (having the highest priority) can be provided if all the invocations of τ_0k during the time period ⁿR_ik, obtained using Eq. 7.1, receive their CPU time before the expiry of the deadline D_ik.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 G. Karmakar et al., Development of Safety-Critical Systems, https://doi.org/10.1007/978-3-031-27901-0
The time T_ik^sp that can be spared by any task τ_ik in P_k for other partitions can be computed as

T_ik^sp = D_ik − ⁿR_ik    (C.1)
The number of invocations of τ_0k(η_k, (1 − α_k)η_k) within ⁿR_ik will be ⌈ⁿR_ik/η_k⌉. Therefore, if the following condition is fulfilled, the schedulability of the applications running in any partition will be guaranteed:

ⁿR_ik + ⌈ⁿR_ik/η_k⌉ (1 − α_k)η_k ≤ D_ik

Let us take ⁿR_ik = mη_k + δ, where 0 ≤ δ < η_k. Substituting ⁿR_ik with mη_k + δ in the second term, we get

LHS = ⁿR_ik + ⌈(mη_k + δ)/η_k⌉ (1 − α_k)η_k
    ≤ ⁿR_ik + (m + 1)(1 − α_k)η_k
    = ⁿR_ik + mη_k(1 − α_k) + (1 − α_k)η_k
    = ⁿR_ik + (ⁿR_ik − δ)(1 − α_k) + (1 − α_k)η_k,   because mη_k = ⁿR_ik − δ
    = ⁿR_ik + (1 − α_k)(ⁿR_ik + η_k − δ)

Replacing ⁿR_ik from Eq. C.1,

    = D_ik − T_ik^sp + (1 − α_k)(ⁿR_ik + η_k − δ)
    < D_ik,   if (1 − α_k)(ⁿR_ik + η_k − δ) < T_ik^sp

Since δ ≥ 0, the condition can be modified as

(1 − α_k)(ⁿR_ik + η_k) < T_ik^sp

i.e., η_k < T_ik^sp/(1 − α_k) − ⁿR_ik, which is criterion (b).