Dependable Computing - EDCC 2020 Workshops: AI4RAILS, DREAMS, DSOGRI, SERENE 2020, Munich, Germany, September 7, 2020, Proceedings [1st ed.] 9783030584610, 9783030584627

This book constitutes the refereed proceedings of the workshops of the 16th European Dependable Computing Conference, EDCC 2020: AI4RAILS, DREAMS, DSOGRI, and SERENE, held in Munich, Germany, in September 2020 (moved to a virtual event due to the COVID-19 pandemic).


Simona Bernardi · Valeria Vittorini · Francesco Flammini · Roberto Nardone · Stefano Marrone · Rasmus Adler et al. (Eds.)

Communications in Computer and Information Science

1279

Dependable Computing EDCC 2020 Workshops AI4RAILS, DREAMS, DSOGRI, SERENE 2020 Munich, Germany, September 7, 2020 Proceedings

Communications in Computer and Information Science

Commenced Publication in 2007

Founding and Former Series Editors: Simone Diniz Junqueira Barbosa, Phoebe Chen, Alfredo Cuzzocrea, Xiaoyong Du, Orhun Kara, Ting Liu, Krishna M. Sivalingam, Dominik Ślęzak, Takashi Washio, Xiaokang Yang, and Junsong Yuan

Editorial Board Members
Joaquim Filipe, Polytechnic Institute of Setúbal, Setúbal, Portugal
Ashish Ghosh, Indian Statistical Institute, Kolkata, India
Igor Kotenko, St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia
Raquel Oliveira Prates, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
Lizhu Zhou, Tsinghua University, Beijing, China


More information about this series at http://www.springer.com/series/7899

Simona Bernardi · Valeria Vittorini · Francesco Flammini · Roberto Nardone · Stefano Marrone · Rasmus Adler · Daniel Schneider · Philipp Schleiß · Nicola Nostro · Rasmus Løvenstein Olsen · Amleto Di Salle · Paolo Masci (Eds.)

Dependable Computing EDCC 2020 Workshops AI4RAILS, DREAMS, DSOGRI, SERENE 2020 Munich, Germany, September 7, 2020 Proceedings


Editors Simona Bernardi University of Zaragoza Zaragoza, Spain

Valeria Vittorini University of Naples Federico II Naples, Italy

Francesco Flammini Linnaeus University Växjö, Sweden

Roberto Nardone University of Reggio Calabria Reggio Calabria, Italy

Stefano Marrone University of Naples Federico II Naples, Italy

Rasmus Adler Fraunhofer IESE Kaiserslautern, Germany

Daniel Schneider Fraunhofer IESE Kaiserslautern, Germany

Philipp Schleiß Fraunhofer IKS Munich, Germany

Nicola Nostro Resiltech s.r.l. Pontedera, Italy

Rasmus Løvenstein Olsen Aalborg University Aalborg, Denmark

Amleto Di Salle University of L’Aquila L’Aquila, Italy

Paolo Masci National Institute of Aerospace, Langley Research Center Hampton, USA

ISSN 1865-0929 ISSN 1865-0937 (electronic)
Communications in Computer and Information Science
ISBN 978-3-030-58461-0 ISBN 978-3-030-58462-7 (eBook)
https://doi.org/10.1007/978-3-030-58462-7

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Workshop Editors

AI4RAILS

Valeria Vittorini
University of Naples Federico II, Italy
[email protected]

Francesco Flammini
Mälardalen University & Linnaeus University, Sweden
[email protected]

Roberto Nardone
University Mediterranea of Reggio Calabria, Italy
[email protected]

Stefano Marrone
University of Naples Federico II, Italy
[email protected]

DSOGRI

Nicola Nostro
Resiltech s.r.l., Italy
[email protected]

Rasmus Løvenstein Olsen
Aalborg University, Denmark
[email protected]

DREAMS

Rasmus Adler
Fraunhofer IESE, Germany
[email protected]

Daniel Schneider
Fraunhofer IESE, Germany
[email protected]

Philipp Schleiß
Fraunhofer IKS, Germany
[email protected]

SERENE

Amleto Di Salle
University of L’Aquila, Italy
[email protected]

Paolo Masci
National Institute of Aerospace, Langley Research Center, USA
[email protected]

Preface

The European Dependable Computing Conference (EDCC) is an international annual forum for researchers and practitioners to present and discuss their latest research results on theory, techniques, systems, and tools for the design, validation, operation, and evaluation of dependable and secure computing systems. Traditionally, one-day workshops precede the main conference: the workshops complement the main conference by addressing dependability or security issues in specific application domains or by focusing on specialized topics, such as system resilience.

The 16th edition of EDCC was initially planned to be held in Munich, Germany, during September 7–10, 2020. Due to the COVID-19 pandemic, it was finally changed to a virtual conference. The schedule and timings were unchanged, and the workshops day was held on Monday, September 7, 2020.

Six workshop proposals were submitted to EDCC 2020 and, after a thorough review process led by the workshops chair, all of them were accepted. The evaluation criteria for workshop selection included the relevance to EDCC, the timeliness and expected interest in the proposed topics, the organizers’ ability to lead a successful workshop, and their balance and synergy. One of the workshops was later canceled by the organizers, so the EDCC workshop program finally included five workshops. These joint proceedings include the accepted papers from the following four workshops:

– Workshop on Artificial Intelligence for RAILwayS (AI4RAILS)
– Workshop on Dynamic Risk managEment for Autonomous Systems (DREAMS)
– Workshop on Dependable SOlutions for Intelligent Electricity Distribution GRIds (DSOGRI)
– Workshop on Software Engineering for Resilient Systems (SERENE)

The four workshops together received a total of 35 submissions from 19 different countries. Each workshop had an independent Program Committee, which was in charge of reviewing and selecting the papers submitted to the workshop. All the workshops adopted a single-blind review process, and the workshop papers received 3 reviews per paper on average (106 reviews in total). Out of the 35 submissions, 19 papers were selected for presentation at the workshops (an acceptance rate of 54.3%) and 16 papers were included in these proceedings (an acceptance rate of 45.7%).

Many people contributed to the success of the EDCC 2020 workshops day, and I would like to express my gratitude to all those who supported this event. I thank all the workshop organizers for their dedication and commitment, the authors who contributed to this volume, the reviewers for their help in the paper assessment, and the workshop participants. I would also like to thank all the members of the EDCC Steering and Organizing Committees, in particular Michael Paulitsch and Mario Trapp (the general chairs), who made this virtual edition possible and free of charge for all the participants with the support of Intel Deutschland GmbH and Fraunhofer IKS, and Miguel Pardal (the publication chair) for his help in the preparation of these proceedings. Finally, many thanks to the staff of Springer, who provided professional support through all the phases that led to this volume.

September 2020

Simona Bernardi

Organization

EDCC Steering Committee
Karama Kanoun (Chair), LAAS-CNRS, France
Jean-Charles Fabre, LAAS-CNRS, France
Felicita Di Giandomenico, Institute ISTI, Italy
Johan Karlsson, Chalmers University of Technology, Sweden
Henrique Madeira, University of Coimbra, Portugal
Miroslaw Malek, Università della Svizzera Italiana, Switzerland
Juan Carlos Ruiz, Technical University of Valencia, Spain
Janusz Sosnowski, Warsaw University of Technology, Poland

EDCC 2020 Organization

General Chairs
Michael Paulitsch, Intel, Germany
Mario Trapp, Fraunhofer IKS, Germany

Program Chair
Elena Troubitsyna, KTH, Sweden

Web Chair
Nikolaj Schack Holmkvist Pedersen, Intel, Germany

Local Organization Chairs
Veronika Seifried, Fraunhofer IKS, Germany
Kerstin Alexander, Intel, Germany

Workshops Chair
Simona Bernardi, University of Zaragoza, Spain

Students Forum Chair
Marcello Cinque, University of Naples Federico II, Italy

Fast Abstracts Chair
Barbara Gallina, Mälardalen University, Sweden


Industry Track Chair
Simon Burton, Bosch, Germany

Publication Chair
Miguel L. Pardal, Universidade de Lisboa, Portugal

Publicity Chair
Alexander Romanovsky, Newcastle University, UK

Workshop on Artificial Intelligence for RAILwayS (AI4RAILS)

Workshop Chairs and Organizers
Francesco Flammini, Mälardalen University and Linnaeus University, Sweden
Stefano Marrone, University of Naples Federico II, Italy
Roberto Nardone, Mediterranea University of Reggio Calabria, Italy
Valeria Vittorini, University of Naples Federico II, Italy

Technical Program Committee
László Ady, NextTechnologies, Hungary
Ali Balador, RISE, Sweden
Shahina Begum, Mälardalen University, Sweden
Nikola Bešinović, Delft University of Technology, The Netherlands
Jens Braband, Siemens Mobility GmbH, Germany
Aida Causevic, Bombardier Transportation, Sweden
Domenico Di Nardo, SYENMAINT, Italy
Pasquale Donadio, Comesvil SpA, Italy
Francesco Flammini, Mälardalen University and Linnaeus University, Sweden
Rob Goverde, Delft University of Technology, The Netherlands
Luca Hudasi, NextTechnologies, Hungary
Leonardo Impagliazzo, Hitachi Rail STS, Italy
Zhiyuan Lin, University of Leeds, UK
Stefano Marrone, University of Campania Luigi Vanvitelli, Italy
Stefano Marrone, University of Naples Federico II, Italy
Claudio Mazzariello, Hitachi Rail STS, Italy
Roberto Nappi, SYENMAINT, Italy
Roberto Nardone, Mediterranea University of Reggio Calabria, Italy
Stefano Olivieri, The MathWorks, Italy
Antonio Picariello, University of Naples Federico II, Italy
Egidio Quaglietta, Delft University of Technology, The Netherlands
Alexander Romanovsky, Newcastle University, UK

Mehdi Saman Azari, Linnaeus University, Sweden
Carlo Sansone, University of Naples Federico II, Italy
Stefania Santini, University of Naples Federico II, Italy
Dániel Tokody, NextTechnologies, Hungary
Valeria Vittorini, University of Naples Federico II, Italy

Additional Reviewers
Giovanni Cozzolino, University of Naples Federico II, Italy
Sharmin Sultana Sheuly, Mälardalen University, Sweden

First Workshop on Dynamic Risk managEment for AutonoMous Systems (DREAMS)

Program Chairs
Rasmus Adler, Fraunhofer IESE, Germany
Philipp Schleiß, Fraunhofer IKS, Germany
Daniel Schneider, Fraunhofer IESE, Germany

Program Committee
Eric Armengaud, AVL, Austria
Gordon Blair, Lancaster University, UK
Patrik Feth, Sick AG, Germany
Roman Gansch, Bosch, Germany
Lydia Gauerhof, Bosch, Germany
Daniel Görges, Technical University Kaiserslautern, Germany
Ibrahim Habli, University of York, UK
Naoki Ishihama, JAXA, Japan
Phil Koopman, Carnegie Mellon University, USA
Ayhan Mehmed, TTTech Auto AG, Austria
Nicola Paltrinieri, Norwegian University of Science and Technology, Norway
Yiannis Papadopoulos, University of Hull, UK
John Rushby, SRI International, USA

Additional Reviewer
Marc Zeller, Siemens, Germany


Workshop on Dependable Solutions for Intelligent Electricity Distribution Grid (DSOGRI)

Program Chairs
Nicola Nostro, ResilTech s.r.l., Italy
Rasmus Løvenstein Olsen, Aalborg University, Denmark

Program Committee
Magnus Almgren, Chalmers University, Sweden
Jan Dimon Bendtsen, Aalborg University, Denmark
Silvia Bonomi, University of Rome La Sapienza, Italy
Mislav Findrik, Liberty Global, Austria
Rune Hylsberg Jacobsen, Aarhus University, Denmark
Karsten Handrup, Kamstrup A/S, Denmark
Maximilian Irlbeck, Zentrum Digitalisierung.Bayern, Germany
Paolo Lollini, University of Florence, Italy
Giulio Masetti, ISTI-CNR, Italy
Leonardo Montecchi, Universidade Estadual de Campinas, Brazil
Hans-Peter Schwefel, GridData GmbH, Germany
Kamal Shahid, Aalborg University, Denmark
Hamid Shaker, South Danish University, Denmark
Nuno Pedro Silva, Critical Software SA, Portugal
Nuno Silva, GridData GmbH, Germany
Christoph Winter, Fronius, Austria

International Workshop on Software Engineering for Resilient Systems (SERENE)

Program Chairs
Amleto Di Salle, University of L’Aquila, Italy
Paolo Masci, National Institute of Aerospace, USA

Steering Committee
Didier Buchs, University of Geneva, Switzerland
Henry Muccini, University of L’Aquila, Italy
Patrizio Pelliccione, University of L’Aquila, Italy, and Chalmers University of Technology, Sweden
Alexander Romanovsky, Newcastle University, UK
Elena Troubitsyna, KTH, Sweden


Program Committee
Marco Autili, University of L’Aquila, Italy
Radu Calinescu, University of York, UK
Andrea Ceccarelli, University of Florence, Italy
Felicita Di Giandomenico, CNR-ISTI, Italy
Nikolaos Georgantas, Inria, France
Jeremie Guiochet, Université de Toulouse, LAAS-CNRS, France
Linas Laibinis, Åbo Akademi University, Lithuania
Istvan Majzik, Budapest University of Technology and Economics (BME), Hungary
Raffaela Mirandola, Politecnico di Milano, Italy
Henry Muccini, University of L’Aquila, Italy
Andras Pataricza, Budapest University of Technology and Economics (BME), Hungary
Patrizio Pelliccione, University of L’Aquila, Italy, and Chalmers University of Technology, Sweden
Cristina Seceleanu, Mälardalen University, Sweden
Alin Stefanescu, University of Bucharest, Romania
Elena Troubitsyna, KTH, Sweden
Karthik Vaidhyanathan, Gran Sasso Science Institute, Italy
Marco Vieira, University of Coimbra, Portugal
Apostolos Zarras, University of Ioannina, Greece

Additional Reviewer
Ioannis Stefanakos, University of York, UK

Publicity Chair
Claudio Pompilio, University of L’Aquila, Italy

Web Chair
Francesco Gallo, University of L’Aquila, Italy

Contents

Workshop on Artificial Intelligence for RAILwayS (AI4RAILS)

Safe Recognition A.I. of a Railway Signal by On-Board Camera (Jean François Boulineau) 5
Audio Events Detection in Noisy Embedded Railway Environments (Tony Marteau, Sitou Afanou, David Sodoyer, Sébastien Ambellouis, and Fouzia Boukour) 20
Development of Intelligent Obstacle Detection System on Railway Tracks for Yard Locomotives Using CNN (Andrey Chernov, Maria Butakova, Alexander Guda, and Petr Shevchuk) 33
Artificial Intelligence for Obstacle Detection in Railways: Project SMART and Beyond (Danijela Ristić-Durrant, Muhammad Abdul Haseeb, Marten Franke, Milan Banić, Miloš Simonović, and Dušan Stamenković) 44
Anomaly Detection for Vision-Based Railway Inspection (Riccardo Gasparini, Stefano Pini, Guido Borghi, Giuseppe Scaglione, Simone Calderara, Eugenio Fedeli, and Rita Cucchiara) 56
Rolling Stocks: A Machine Learning Predictive Maintenance Architecture (Roberto Nappi, Valerio Striano, Gianluca Cutrera, Antonio Vigliotti, and Giuseppe Franzè) 68
Analysis of Railway Track Irregularities with Convolutional Autoencoders and Clustering Algorithms (Julia Niebling, Benjamin Baasch, and Anna Kruspe) 78
UIC Code Recognition Using Computer Vision and LSTM Networks (Roberto Marmo) 90
Deep Reinforcement Learning for Solving Train Unit Shunting Problem with Interval Timing (Wan-Jui Lee, Helia Jamshidi, and Diederik M. Roijers) 99

Workshop on Dynamic Risk managEment for Autonomous Systems (DREAMS)

Enforcing Geofences for Managing Automated Transportation Risks in Production Sites (Muhammad Atif Javed, Faiz Ul Muram, Anas Fattouh, and Sasikumar Punnekkat) 113
Safety Cases for Adaptive Systems of Systems: State of the Art and Current Challenges (Elham Mirzaei, Carsten Thomas, and Mirko Conrad) 127

Workshop on Dependable SOlutions for Intelligent Electricity Distribution GRIds (DSOGRI)

Drafting a Cybersecurity Framework Profile for Smart Grids in EU: A Goal-Based Methodology (Tanja Pavleska, Helder Aranha, Massimiliano Masi, and Giovanni Paolo Sellitto) 143

Workshop on Software Engineering for Resilient Systems (SERENE)

An Eclipse-Based Editor for SAN Templates (Leonardo Montecchi, Paolo Lollini, Federico Moncini, and Kenneth Keefe) 159
Interplaying Cassandra NoSQL Consistency and Performance: A Benchmarking Approach (Anatoliy Gorbenko, Alexander Romanovsky, and Olga Tarasyuk) 168
Application of Extreme Value Analysis for Characterizing the Execution Time of Resilience Supporting Mechanisms in Kubernetes (Szilárd Bozóki, Jenő Szalontai, Dániel Pethő, Imre Kocsis, András Pataricza, Péter Suskovics, and Benedek Kovács) 185
Concepts and Risk Analysis for a Cooperative and Automated Highway Platooning System (Carl Bergenhem, Mario Majdandzic, and Stig Ursing) 200

Author Index 215

Workshop on Artificial Intelligence for RAILwayS (AI4RAILS)

Workshop Description

AI4RAILS is a new international workshop series specifically addressing topics related to the adoption of Artificial Intelligence (AI) in the railway domain. In the last few years, there has been a growing interest in AI applications to railway systems. Such interest is a consequence of the potential and opportunities enabled by AI-powered solutions in combination with other prominent technologies based on cloud computing, big data analytics, and the Internet of Things. The results already achieved in other relevant transport sectors, mainly automotive, have further supported the development of AI in railways. This trend within the railway industry is also witnessed by industrial research and innovation initiatives as well as by the growing number of scientific publications addressing AI techniques applied to the rail sector. Relevant applications include intelligent surveillance, automatic train operation, smart maintenance, timetable optimization, and network management.

The application of AI to railways is expected to have a significant impact in a medium- to long-term perspective, especially within autonomous and cooperative driving, predictive maintenance, and traffic management optimization. For example, railway line capacity, Life Cycle Cost (LCC), human error detection and avoidance, efficiency and performance, and automation and self-adaptation, among other things, could significantly benefit from artificial intelligence and machine learning. This opens up unprecedented scenarios for railway systems, but it also raises concerns regarding system dependability and the new threats associated with a higher level of autonomy. Therefore, one of the first steps towards the adoption of AI in the railway sector is understanding to what extent AI can be considered reliable, safe, and secure (i.e., what is sometimes referred to as “trustworthy AI”, including “explainable AI”) given the peculiarities and reference standards of the railway domain. At the same time, it is extremely relevant to understand to what extent AI can help achieve higher levels of reliability, safety, and security within the railway domain. We can summarize these two opposite yet strictly interconnected aspects as “dependable AI for smarter railways” and “AI for more dependable railways”. Hence the connection between AI4RAILS and the hosting European Dependable Computing Conference (EDCC).

The AI challenge has been tackled by the European Union’s Shift2Rail programme with several research and innovation projects addressing aspects of digitalization, automation, and optimization. In particular, the aim of the ongoing Shift2Rail project named RAILS (Roadmaps for AI integration in the raiL Sector) is to investigate the potential of AI in the rail sector and contribute to the definition of roadmaps for future research in next-generation signalling systems, operational intelligence, and network management. This workshop was part of the dissemination activities planned in the RAILS project.

The ambition of AI4RAILS is to be a reference forum for researchers, practitioners, and business leaders to discuss and share new perspectives, ideas, technologies, experience, and solutions for the effective and dependable integration of AI techniques in rail-based transportation systems, in the general context of intelligent and smart railways.

The format of this year’s edition of AI4RAILS included two keynotes, a tutorial, and four technical sessions. The first keynote speech was given by Jens Braband from Siemens Mobility GmbH, addressing the issues of AI and machine learning within railway safety assessment, an extremely relevant topic given the safety-criticality of several railway control and supervision functions. The second keynote speech was given by Giorgio Travaini, Head of Research & Innovation at the European Union’s Shift2Rail Joint Undertaking, highlighting the strong connection between the Shift2Rail programme and the research and innovation challenges and opportunities given by AI in railways. The tutorial was provided by The MathWorks, a worldwide leading company in the field of data science and machine learning platforms, as recognised in Gartner’s 2020 Magic Quadrant, mainly known for their Matlab and Simulink software applications.

This year we received many quality submissions by contributors from 10 distinct countries. Each paper was reviewed by at least 3 reviewers from diverse institutions and nations, including reputable academic and industry representatives. After the blind peer-review process, we selected 9 papers for presentation at the workshop and publication in the book of proceedings; the final acceptance rate was lower than 50%. The papers presented in this edition represent an interesting blend of approaches and techniques addressing several challenges in the application of AI to railways, such as signal recognition and obstacle detection, smart surveillance, asset management and maintenance, and logistics and optimization.

The organization of this workshop was supported by the aforementioned Shift2Rail project RAILS, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 881782.

We would like to thank the keynote speakers and tutorial organizers, the international members of the technical program committee, and the additional reviewers for kindly accepting our invitations. We are very grateful to them and to all workshop speakers for their outstanding contributions, which made the first edition of AI4RAILS a success exceeding our expectations. We also thank all the people who supported this workshop and helped us in its organization, including the EDCC chairs and organizers. In particular, we would like to thank the EDCC Workshops Chair, Simona Bernardi, and the EDCC Publication Chair, Miguel Pardal, for their invaluable support.

Safe Recognition A.I. of a Railway Signal by On-Board Camera

Jean François Boulineau
RATP - Régie Autonome des Transports Parisiens, Val de Fontenay, Paris, France
[email protected]

Abstract. Some railway solutions are the result of technology push. The development of low-cost computers and cameras makes it possible to automate detection tasks in different industrial domains. In this article, we develop the blocking points that arise for the adoption of A.I. technologies for functions involving safety. We recall the elements needed for a safety demonstration, and for the definition of tests or simulations that complement this validation. We propose a paradigm shift for the demonstration of safety, in a framework where a formal demonstration is no longer possible, based on two methods: “proven in use” and NABS (not absolute but sufficient).

Keywords: Railway signaling system · Safety validation · Machine learning · Certification · NABS (not absolute but sufficient)

1 Introduction

Image analysis allows manufacturers in the automotive sector to make significant progress towards driving automation (via active driving assistance functions, ADAS - Advanced Driver Assistance Systems). The introduction of these functions is associated with continuous technological progress in video, radar, and lidar sensors and in the processing of the associated inputs. These improvements are all opportunities for a delegation of driving, with in particular the emergence of traffic signal recognition applications for autonomous vehicles. The development of low-cost computers and cameras is an opportunity to automate certain relevant detection tasks. There are many other applications of image recognition (medicine, metallurgy, the textile industry, the food industry, the aeronautical and defense sector, …), but A.I. applications are little used in the railway domain and, up to now, are not involved in safety. There are several major obstacles to this: difficulties in safety demonstration, difficulties in elaborating methods for these technological developments that are compatible with the state of the art of the railway, in particular the railway standards EN 50126 [9], EN 50128 [7] and EN 50129 [8], and difficulties in proving “GAME” performance (Globalement Au Moins Equivalent, a French approach meaning an overall at least equivalent safety level [1]).

The RATP engineering department assesses the safety of equipment and systems before commissioning. The internal safety culture is strong and built around solid elements covering: safety studies, robust software validation, demonstration by formal proof, search for the most complete possible control of system behavior after equipment failure, event analysis (which covers accident precursors), and testing of every new piece of equipment, including proven equipment used in a context outside the RATP network. All of these points imply automatically rejecting any insufficiently known technology until more feedback is available. However, and this is the purpose of this paper, there may be possible solutions to some fundamental questions. We will explain here how the RATP engineering department handles the following questions: What validation means and techniques would allow such systems to be addressed satisfactorily even if there are no guaranteed and reproducible behaviors? Are we able to know, appreciate, and identify the limits of the demonstration of safety of such systems? Finally, can we still speak of a “safety level” as we do for existing systems, and if so, under what conditions?

2 The Subject

2.1 It Is Not a New Subject in the Automotive Sector

The first work on automatic detection of automobile signaling began in the late 80s and was then based on image processing and vision by conventional signal processing (imperative programming) with basic graphical computation (edge detection, pattern matching, …). As time went on, the use of machine learning techniques applied to advanced image descriptors became widespread until, now, the supremacy of deep learning techniques [2].

2.2 Useful for Rail

Quite naturally, the perception by A.I. of fixed color signals seemed to be the subject most directly transferable to rail, with the use of an “intelligent” sensor, that is to say, one capable of perceiving a fixed color signal and declaring the state of this signal. By recognizing the fixed color signal, it is therefore possible to imagine an automatic signaling detection function. The project in question is called SCVS: Safety reCognition by Video of the railway Signal.

2.3 A More Complex Subject Than It Seems for Rail

Human perception of a traffic light, although conceptually simple, requires a complex analysis of information which may involve logical reasoning and specific knowledge. An example is the detection of an extinguished or questionable lateral fixed signal: only an experienced driver can detect it, because it requires a good knowledge of the line. Certain situations therefore require a significant interpretation effort and great concentration by the operator. We can recall some other difficulties. A 2D camera vision cannot determine which is the first fixed signal when two fixed signals with different colors follow one another. Also, many objects constitute artefacts and can be erroneously seen as a valid signal.

2.4 Challenges

Beyond the opportunity of using signal recognition functions, the main motivation of this study remains the appreciation of the adaptations of safety analysis procedures needed to treat the behavior of systems integrating an A.I. processing part. The study of A.I. signal recognition is a textbook case, a perfect example.

3 Safety Analysis Approach

3.1 Classical Articulation of Studies

Investigations involving safety are generally carried out in the following order: development and testing of a POC (Proof Of Concept) for functional feasibility, then observations and analysis. The last step, to consolidate the concept, is a definition of the safety principles.

3.2 No Need to Go Further if Safety Is Not Provable

We have chosen to treat the subject in a different order, starting with a safety analysis relative to a concept (leading to the determination of a set of essential requirements allowing a certain level of confidence), followed by the launch of a POC integrating these requirements. Two reasons led us to this different sequencing:

1. safety remains a primary criterion (we must consider the viability of any concept as soon as possible and reframe the activities as soon as there are obvious incompatibilities or unrealistic solutions in terms of realization and foreseeable costs),
2. the type of use is not imposed by a clear expression of need, but by an exploration of technological opportunities (techno push). Depending on the result of the safety analysis, the possible uses will be examined in a second step.

3.3 Safety Paradigm Shift

One of the difficulties to be overcome is the paradigm shift on the demonstration of safety. With the introduction of the A.I. component, there is a suspicion of insufficiently controlled safety, due to a strong dependence between the results and the choice of the data for learning.

– Can we use, for rail safety applications, technological bricks with non-predictable behaviors, and therefore consubstantially contrary to any safety approach?
– Do the unpredictable behaviors of image processing still allow possibilities of control of the train in a given context of use?

The only way out is to move from a culture of almost absolute, even absolute, safety (the case of fail-safe components or of a formal proof approach) to an approach of “sufficient” and “acceptable”. We use the acronym NABS (Not Absolute But Sufficient) for this approach, which is an extension of risk-based assessment by way of an expression of the functional safety capability (SOTIF - Safety Of The Intended Functionality, and ODD - Operational Design Domain).


4 Method and Activities

4.1 Articulation of Activities

The study is structured in the following steps:

1. Several visits to the metro cabin and observations of manual driving; training of RATP stakeholders in the “world” of A.I.; processing of a video for a first learning base,
2. Establishment of a first unreliable, unsafe solution (from typical workflows),
3. FMEA with the experts of the technologies involved,
4. Enrichment of the first solution to improve reliability and safety,
5. Successive iterations of architectural proposals and FMEAs,
6. Stabilization and convergence on a technical and functional architecture which should provide certain guarantees on safety,
7. Finally, special treatment of common modes and of the (supervised) learning processes.

4.2 Functional Description of the System Studied

The SCVS integrates a perception component and an interpretation of the visual environment. A first functional breakdown makes it possible to retain the following functions (a minimal interface sketch follows the list):

F1 to acquire
• Acquire the environment in front of the train in the form of a video stream

F2 to interpret
• F2-1 Detect the presence of a signal
• F2-2 Recognize the shape of the signal in the panel
• F2-3 Determine the color
• F2-4 Confirm the presence of the traffic light

F3 to act
• light status display on the dashboard, or/and
• execution of an adapted action
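To make the breakdown concrete, here is a minimal interface sketch of the F1/F2/F3 chain in Python. All class, field, and function names are illustrative assumptions for this discussion, not the actual SCVS design.

```python
# Minimal sketch of the F1/F2/F3 chain as typed interfaces.
# All names are illustrative assumptions, not the SCVS design.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Aspect(Enum):
    RED = "red"
    YELLOW = "yellow"
    GREEN = "green"
    EXTINGUISHED = "extinguished"  # treated as restrictive, like red


@dataclass
class Frame:  # F1: one image acquired from the video stream
    timestamp_ms: int
    pixels: bytes


@dataclass
class SignalObservation:  # F2: interpretation of one frame
    panel_detected: bool      # F2-1: a signal panel is present
    shape_ok: bool            # F2-2: the shape inside the panel is recognized
    aspect: Optional[Aspect]  # F2-3: the color (None if undetermined)
    confirmed: bool           # F2-4: presence of the light is confirmed
    score: float              # empirical score in [0, 1], not a probability


def act(obs: SignalObservation) -> str:
    """F3: display the light status and/or trigger an adapted action."""
    if obs.confirmed and obs.aspect in (Aspect.RED, Aspect.EXTINGUISHED):
        return "display RED on dashboard / request an adapted action"
    return "display current status on dashboard"
```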

4.3 Functional Model

The model is reduced to a functional description including the support components (block diagram). The working principle was to start from an elementary functional architecture and then, over the course of discussions and observations, to build step by step a more successful and more complex functional architecture, representing the minimum structure with sufficient guarantees of robustness and safety.

5 SCVS Risk Analysis

5.1 Risk Analysis

The safety study consists of a critical review of the architecture and functions. Examples of topics covered: What happens if a signal is not detected, or when there is no signal? What happens if the color of a traffic light is not determined? What are the parameters to know for learning that allows a high level of confidence? On which criteria are the numerical scores (percentage of credibility) based, and how are they used? What specific performances are expected? Etc.

5.2 The Dangerous Events

If we limit the description to a function of recognition of red lights with an estimation of the distance to the signal, the dangerous event is the miss of a red light or a wrong estimation of distance, which can be broken down into: red light seen as a green light, red light seen as a yellow light, burned-out light bulb seen as a cancelled signal (cancelled signals are indicated by a white Saint Andrew’s cross), burned-out red light not seen, red light detected but considered further away than it actually is. All these events can be summarized in a generic undesired event: UE 1 - Result corresponding to loss of detection of a red or extinguished signal, or declaration of an erroneous distance to the signal.

5.3 Failure Model

The faults considered result from a black-box type approach on the various functionalities implemented, with the intervention of specialists for a technological approach which specifies the type of breakdown. Examples of failures or defects on cameras, or of effects induced by the environment: evolution of gain - loss of sensitivity, evolution of gain - excessive sensitivity, external influence by vibration, glare (external source), frozen image, loss of video stream, displacement of the axes of the cameras (shocks, handling errors, fixing fault, etc.), dead pixels (on a camera), dust deposits on the lens (on one or two cameras), rain, mist, smoke, … (common mode), strobe effects (frequency of image acquisition = multiple of the frequency of the light source), drift of the spectrum acquired by the cameras, reflections (on the lens or on a wall), reflected signal image, incorrect pre-processing of the image, presence of artifacts, non-optimal thresholding (net shape) => incorrect detection of the shape of the signal, untimely detection of the signal on the panel, non-detection of the signal on the panel.

5.4 FMEA (Failure Mode and Effect Analysis)

We tackled this question with a very traditional approach to analyzing failures and malfunctions. The FMEA clarifies the scenarios of dangerous situations and establishes a list of associated safety requirements (RRM - Risk Reduction Measures). For breakdowns and effects of the events, requirements are grouped according to the following types of protection: learning, calibration, failure detection, validation, experimentation, functional safety (SOTIF), installation, software, preventive maintenance actions, minimum performances, reinforcement of redundancy (the requirement of multiple processing chains and the need for a diversification of the sensors), and consistency tests. The FMEA reveals the weak points of the initial architecture. For all unsuitable and unsafe processing, changes are introduced which lead to a new and more complex architecture and to new functional requirements. Then, the final FMEA introduces a specific analysis of common modes. To deal with common modes, the following types of RRM are identified: detection of situations outside previous domains, detection of latent failures, software diversification, hardware diversification, process diversification, principles of measurement, protection from the environment, qualification, consistency tests, new reinforcement of redundancy (3 processing chains), and validation. Note: the exported constraints (for example: processing of information which validates or not the result) are rare, which is a desired result in order to offer a flexible solution.

5.5 Final Architecture

The FMEA identifies the functionalities to be added, while always limiting itself to what is strictly necessary. It is therefore proposed to complete the first functional model with the following elements: knowledge of the distance to the sign (LIDAR and cartography), introduction of cartography, consistency checks of the intermediate results of the 3 processing chains, and tracking to target the presence area of the light support panel. The SCVS system behaves like a complex sensor which ensures pre-processing and restores information relative to the presence or not of a panel, the presence of a light, the state of the light (including burned-out light = red), the distance to the signal, and a consistency test result which can be used as a validator (a sketch of such an output interface follows). It is the user of this information (an external computer) which takes a decision by vote (voting on several channels, and voting with time redundancy via consideration of several analyzed images). Knowing the position of the train also allows the SCVS to be completely autonomous and not interfaced with other sensors or information from the train. This meets the objective of limiting the constraints of integration and minimizing the dependency of driving systems on their environment. We propose an autonomous, “plug and play” equipment.
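As an illustration of this “complex sensor” view, the sketch below shows what the information restored by the SCVS to the external voting computer could look like. The field names are assumptions made for illustration; the actual interface is not specified in the paper.

```python
# Sketch of the information the SCVS restores to the external computer
# (Sect. 5.5). Field names are illustrative assumptions; note that a
# burned-out light is deliberately reported on the safe side, as red.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScvsOutput:
    panel_present: bool          # a signal support panel is perceived
    light_present: bool          # a light is perceived on the panel
    state_is_red: bool           # red OR burned-out light (= treated as red)
    distance_m: Optional[float]  # estimated distance to the signal, if known
    consistency_ok: bool         # cross-channel consistency test result,
                                 # usable as a validator by the external voter
```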

6 The Possible Blocking Points

6.1 Filtering the Artefacts

The quality of the results is very dependent on the coverage of situations during learning. Outdoors, the visibility of lateral signals can be impaired by most climatic phenomena: snow, ice, rain, hail, fog. A complex infrastructure can also induce difficulties in perceiving (discriminating) RATP signals due to the presence and proximity of other signaling means, for example: possible disruption by road signs, or possible disruption by SNCF signaling. Many other artefact light sources can appear and complicate the task of perception and detection of the signal. In a non-exhaustive manner, the following light sources can be found, for example: reflections on infrastructure (rails, station platform, …), lanterns of another train, either in front on track 1 or on track 2 or on another crossed track, a signal for another track, a spotlight, publicity carrying a light (typically screens on platforms), special stations, ghost stations, site or work vehicles, lighting in the tunnel, paintings illuminated under certain conditions (typically a red line indicating the reduced width of certain track sections). A particular case is where an artefact taken for a green light would be more intense than a red signal nearby in the image to be processed, and also the case of successive traffic signals (one red and the other green). The appreciation of the distance to the signal remains the easiest solution to deal with the case of misinterpretation of the order of the signals.

6.2 Validation of Products Using Machine Learning Principles

There is great difficulty in defining the validation coverage, given the always partial nature of a validation which is not associated with structural tests. There is a strong dependence of the algorithms on the representativeness of the learning and on the validation of this learning. Machine learning techniques are used precisely in cases where there is no formal specification of the objects to be detected, but simply examples of such objects. The validation of these products is not established from a specification (but only through consistency with an expected performance specification).

6.3 Depth and Accuracy of Supervised Learning

The FMEA identified a second major undesired event, which is related to the particular learning process: UE 2 - “Incorrect learning creating an inability to correctly interpret a signal”. This concerns a “critical” element of the process, the labelling (examples of entries that are labelled with the desired outputs). Supervised learning can be subject to errors having the following possible causes: forgetfulness, confusion, recording errors, incompleteness, erroneous manual acceptance of an image, a sample not sufficiently representative (the number of images used for learning is not sufficient to recognize the panel) or, on the contrary, an image surplus (=> poorly oriented treatment of related picture elements that were actually unrelated to a signal), an environment in front of the train which evolves relative to the environment considered for learning (brighter, darker, snow, rain, a new building, etc.), particular lighting, position in the metro line, etc., which do not permit identification of the panels, or particular shapes of panels.

6.4 Expression of Validation Criteria

Validation refers to the evaluation of the output of the software and to the measure of the performance of an algorithm. The term validation does not have the same meaning in computer vision or machine learning as in software safety. We speak here of validation in the sense of validation of safety, and not in the sense of validation of predictive models during training. The outputs of a machine learning or computer vision model consist mostly of a numerical value between 0 and 1, often wrongly interpreted as a probability.

It is in fact a simple empirical numerical value which provides a useful and practical indicator for the decision on the result. At the end of the test phase, the model does not provide any guarantee as to its future detection rate on new or unseen images. Only the actual on-site test can confirm the empirical estimate of such a detection rate. Detection algorithms produce, most of the time, false negatives or false positives, with no means of detecting interpretation errors. A machine learning or computer vision algorithm is evaluated only through its empirical performance, and general models (or algorithms) are too complex to be inspected and possibly corrected in case of prediction error. The performance of an algorithm is thus closely linked: to the richness (completeness) of the database, to the quality of the data (the examples must be correctly labelled), to the versatility of the model (its ability to adapt to new data), and to its robustness to disturbed inputs. A machine learning model does not “work” in absolute terms but has a certain performance relative to a learning and test set, and therefore remains conditioned only by the examples seen. This model is trained by minimizing a global error: it can therefore be mistaken on many examples and still display a performance which seems good. At inference time, these models do not estimate a guarantee on the inputs or on their adequacy (proximity) in relation to the learning data. In their most classic version, machine learning or computer vision algorithms return only an arbitrary value and not an actual measure of the uncertainty of the prediction. Uncertainty would be computable by unusual methods introducing credible intervals in the Bayesian sense. These more rigorous methods are complex in detection applications (Bayesian neural networks), they are extremely computation-intensive, and they often rely on non-deterministic algorithms (Markov Chain Monte Carlo in particular): they have not reached a sufficient level of maturity to be integrated into critical systems. Deep learning has the particularity of being able to establish erroneous correlations without anyone knowing which ones (the system characterizes in its own way). In fact, nobody knows how to explain and understand how a result is reached. For this reason alone, someone could already say that it is useless to go further, but we are more nuanced, and we are looking at how we can make the use of an A.I. safer.

6.5 Non-certifiable Components

The paradigm of fault finding and of “corner cases” specific to dependability is orthogonal to machine learning and computer vision algorithms, which have as a performance measure an average error “inside” the nominal domain. The classic V cycle is replaced by a simple training (adjustment of parameters) of a given model on a learning base, and therefore the limits of verification and validation are blurred for this type of algorithm. The final challenge (or even major incompatibility) with EN 50128 is the separation between the data and the software. Machine learning and computer vision algorithms build their internal parameters using data. The conventional notions associated with generic software, application software, and specific software (= software with parameters) are no longer applicable. It is indeed not possible to conclude on the validity of generic software, because there is no specification but just reference data.

For all these reasons and many others, current safety standards and paradigms lead to the rejection of these algorithms for applications involving safety. For now, it seems difficult to imagine standards upgrades that would integrate the use of AI directly in safety applications. However, it is useful to go further than this first judgement.

7 The Proposed Solutions on the Hard Points

We therefore retain the following hard points, which compromise the realization of a safe SCVS system (Table 1):

Table 1. Treatment of hard points

Hard point: Filtering of artifacts
Suggested response: Discrimination by locating the train and mapping => rejection of artefacts (Sect. 7.1).

Hard point: Learning depth
Suggested response: Double independent learning and double “validation” (Sect. 7.2). Site tests and fixes (Sect. 7.3).

Hard point: Safety validation type
Suggested response: Interpretation hardened by vote-type analysis on several processed images (video stream analysis, not a single frame) (Sect. 7.4). Redundancy and consistency (Sect. 7.5). NABS approach (Sect. 7.9).

Hard point: Certification
Suggested response: Diversification to downgrade component SILs (Safety Integrity Levels) (Sect. 7.6). Safety-oriented approach to configuration and class T3 tools (Sect. 7.7). “Proven in use” oriented approach (Sect. 7.8).

7.1 Discrimination by Localization

Prior knowledge of the position of the signal support panels is the most effective solution, with the highest guarantee, for rejecting any artefact or for finding a faulty perception. It is therefore proposed to add a LIDAR: the information extracted from the processing of the LIDAR, combined with consultation of the cartography, permits determining the presence or not of a perceived panel, and keeping a history of the events which reveal inconsistencies of results across the multiple processing chains. The untimely detection of panels by camera must be avoided and filtered using the recorded cartography data and the acquisition of the on-board LIDAR. The principle chosen is not to consider a panel in a different position than the position expected from the LIDAR and the mapping. The analysis of the panels remains associated with a notion of distance and order of presentation. An important 3D mapping step must be carried out. This step consists in locating, in the cloud of points, the various items of the RATP rail infrastructure on the line considered. Note that the LIDARs used for the establishment of the mapping need not be mounted on trains. A minimal sketch of this map-based filter follows.
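The sketch assumes a one-dimensional curvilinear position along the line; the panel positions, names, and tolerance value are illustrative, not taken from the RATP cartography.

```python
# Minimal sketch of the localization-based filter of Sect. 7.1: a camera
# detection is kept only if the track map, consulted at the estimated train
# position, expects a signal panel at that distance. All values are
# illustrative assumptions.
from bisect import bisect_left

# Map: curvilinear positions (meters along the line) of signal panels.
PANEL_POSITIONS_M = [120.0, 860.0, 1540.0, 2310.0]

TOLERANCE_M = 15.0  # assumed acceptance window around the mapped position


def expected_panel_near(train_pos_m: float, detected_dist_m: float) -> bool:
    """Return True if the map predicts a panel where the camera saw one."""
    target = train_pos_m + detected_dist_m
    i = bisect_left(PANEL_POSITIONS_M, target)
    candidates = PANEL_POSITIONS_M[max(0, i - 1):i + 1]
    return any(abs(p - target) <= TOLERANCE_M for p in candidates)


def filter_detection(train_pos_m: float, detected_dist_m: float) -> str:
    if expected_panel_near(train_pos_m, detected_dist_m):
        return "accept detection"
    return "reject as artefact (no panel expected here)"


# Example: train at 700 m sees a light 160 m ahead -> matches panel at 860 m.
assert filter_detection(700.0, 160.0) == "accept detection"
```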

7.2 Double (Redundant) Learning

Learning by manual labelling consists in qualifying the quality of the images through a process of double labelling and double validation from the same set of images (two different operators): this is to avoid a bad classification and therefore a poor performance of the detection model. A third person analyzes and corrects the discrepancies. A learning process generally represents a large amount of work, and such an approach increases the investment involved; but this solution remains necessary to limit the errors linked to the learning. The picture analysis concerns images of panels and signals which are standardized in their characteristics. This quickly offers good results, even with short learning. Results for each image (issued from a same video stream) are quickly satisfactory.

7.3 Site Tests and Corrections

The classic approach, in computer vision and machine learning, is to impose verification and validation (in the formal sense) of the software implicitly, by ensuring that the learning domains comply with the specifications. Thus, if the formal specification is “Detection of red lights at a speed below 40 km/h underground”, the verification and validation process for the design of a detection system based on computer vision and machine learning must ensure that the learning base used to train such an algorithm includes: a significant share of red lights, and a sufficient number of images covering the potential use cases (difficult light conditions, glare, input disturbances, etc.), both for learning and for testing. And, during use, the system must be able to detect some cases that are not or cannot be present in the database. Continuous improvement is possible.

7.4 Voting with Multiple Images (Video Stream)

In our first experiments with video processing on a single processing channel, we observe random dropouts, but these dropouts are limited and can easily be filtered. The quality of the results increases if the interpretation is based on the establishment of a condition over N images. This principle corresponds to a principle of temporal redundancy.

7.5 Redundancy and Consistency

Classical RAMS (Reliability Availability Maintainability Safety) techniques are applicable to address the random failures. The introduction of diversified redundancies associated with permanent consistency tests, for a system which ensures continuous processing with high dynamization, makes it possible to find quite conventional safety solutions. An architecture associated with a diversification of the hardware and of the treatments offers, in theory, all the capacity to provide a sufficient level of safety, as soon as the consistency tests are effective. This type of architecture corresponds to the composite safety solution as described in its main principles in the CENELEC standard EN 50129. The decision criteria of the vote between the 3 compared results can be properly defined on an experimental basis, which is the only way to find an acceptable compromise between safety (no false negatives, that is to say, no wrongly missed detection of a red light) and availability (no false positives: a light wrongly declared red). A combined sketch of the temporal vote and the 2-out-of-3 vote follows.
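The sketch below combines the two mechanisms in a simplified form: a per-channel temporal vote over the last N frames (Sect. 7.4), then a 2-out-of-3 vote with a consistency check across three diversified channels (Sect. 7.5). The thresholds n and k are illustrative assumptions; as stated above, the real decision criteria can only be tuned experimentally.

```python
# Illustrative combination of temporal redundancy (Sect. 7.4) and a
# 2-out-of-3 vote across diversified channels (Sect. 7.5). Thresholds are
# assumptions, not values from the paper.
from collections import deque
from typing import Deque, List


class TemporalVoter:
    """Declare 'red' only if at least k of the last n frames say red."""

    def __init__(self, n: int = 5, k: int = 4) -> None:
        self.window: Deque[bool] = deque(maxlen=n)
        self.k = k

    def update(self, frame_says_red: bool) -> bool:
        self.window.append(frame_says_red)
        return sum(self.window) >= self.k


def two_out_of_three(channel_red: List[bool]) -> bool:
    """2oo3 vote over three diversified processing chains.

    Any disagreement is recorded as a consistency-test event; the safe side
    (red) wins as soon as two channels agree on it.
    """
    assert len(channel_red) == 3
    if len(set(channel_red)) > 1:
        print("consistency test: channels disagree ->", channel_red)
    return sum(channel_red) >= 2
```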

Safe Recognition A.I. of a Railway Signal by On-Board Camera

15

7.6 Safety Integrity Level (SIL) Downgrading SIL downgrading consists only in revising the dangerous random failure rate thresholds for each of the processing chains in regard of the apportionment (confer 50126: 22017 §10.2.2). The SIL level relating to systematic faults and common modes remains addressed by treating the entire SCVS product. Their apportionment modifies therefore not the overall requirement of the functions concerned considering such an integration and validation process product that inherits the overall SIL level required and not SIL parts deducted from each channel TFFR – Tolerable Functional Failure Rate. It remains necessary to consider that if the software is a common mode then we treat (for example) 2 processing chains as a single function and therefore the software cannot be downgraded. The solution then consists in functionally diversifying the processing chains. 7.7 Class T3 Tools Learning can correspond to the configuration of a software with the use of T3 tools (confer EN 50128), the weights can be interpreted as configuration data. Given the strong dependence to the learning results, we can classify the tools and the learning process as a T3 tool (Tool class T3, according to EN 50128: 2011, a tool that generates outputs likely to contribute directly or indirectly, to the executable code - including data - of the safety system). The option chosen is an approach called “dual channel” corresponding to an option considered in the EN 50128: 2011 criterion mentioned in item a, § 6.7.4.4 and to the criteria for double chain mentioned in paragraph c of the same section. 7.8 Proven Product (Proven in Use) This notion of proven in use is associated to the notion of “high confidence established by use”, that means, acknowledging the results achieved with rules of acceptance or refusal. The essential element of the result obtained, apart from the RAMS aspects, is the good response of the system to the different environments. It can be shown that a product is proven if it is possible to obtain observations covering a substantial number of years of safe operation. A proven product must just meet the requirements imposed by any new operational or technical environment (see [10] - §6.5). This expression, deduced from common sense, does not in itself allow us to approach the notion of representativeness of the elements of observation. The representation depends on the duration of observation done and also the number of different use cases and operational situations. Performance algorithms of Machine Learning or computer vision are based on training images that may not be completely representative of the operational reality. In our case, the tunnel situations are reproducible and subject to little variability. This point is less true outdoors but is still relatively controllable by studying behavior in certain borderline cases (rain, snow, sunshine). If the given situations are well considered, it is possible to justify by the use for a given learning – a satisfactory capacity to recognize signals. Here again it is a


There is no acquired or generic demonstration, but an observation to be made on a result for a given line, under the conditions taken for learning. Contexts too different from these defined conditions must therefore be handled with care. The data processing of the qualification and test processes concerns the recording of false positives and false negatives: a specific validation and qualification strategy still needs to be defined to obtain indicators for a NABS approach. We are confident about the possibility of obtaining results quickly on metro lines with little variability of the inputs (not outdoors). It remains to establish criteria of acceptability and the level of risk deduced (see the automobile case in [9]).

7.9 The Concept of NABS and Sufficient Use

With new AI-based processing technologies, there is no absolute approach to demonstrate or guarantee a level of performance. Expressing a level of safety therefore remains a priori undecidable. The NABS (Not Absolute But Sufficient) approach integrates all the limits of a demonstration and validation of safety and aims to explicitly extract these limits in order to confine the use of the system to its safe areas. The influence of the learning phase and the significant influence of the environmental conditions on the behavior of the software make the classic validation approaches inapplicable. In a validation process, the classic approach aims at an obligation of means and results: means, to deal with suspected errors not seen during validation by application of criteria in the process (design, coding, etc.); results, by observation of the absence of errors during tests or during proven use. To reach a level of confidence about the safety of the SCVS system, we must rely on a sufficient commitment on means: a definition as exhaustive as possible of the environmental conditions in operation (to establish a learning base), and a definition of what was not seen during learning (for example: circulation outdoors in the snow), thus clarifying the ODD (Operational Design Domain). With the ODD it is possible to add a function which detects whether the real conditions are inside or outside the definition domain. The question to be addressed is then to know and define the rare situations (those which slip through learning) and to define a perimeter or limitation of use (the ODD), or to have another system which detects and declares a parameter outside the domain (for example, detection of snow presence) and can inhibit the function. The commitment on results is established on the basis of onsite observations: onsite experimentation with a maximum coverage of external conditions, with the forecast of a continuous learning process which takes into account the errors of the algorithms during operation to improve them (enriching the learning base), simultaneously enhancing learning and applying a reliability growth model. The verification and validation of a detection system based on Computer Vision and Machine Learning should therefore not be understood as a fixed process but rather as a continuous and operational improvement of detection systems, making it possible to gain confidence in their use and ultimately validate them.
The NABS objective is then to obtain a characterization of all the parameters included in the observations (tests, trials, feedback), to define the areas covered and representative of a demonstration by use, and to confine the system to the "comfort zone" thus defined. An illustration of the resulting ODD guard is sketched below.
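As a minimal illustration of such an ODD guard, the sketch below inhibits the recognition function when a monitored environmental parameter is declared outside the definition domain. The parameter names and thresholds are purely hypothetical assumptions, not values from the paper.

```python
# Minimal ODD-guard sketch: inhibit the recognition output when the observed
# conditions fall outside the domain covered by learning. The parameters
# (snow flag, illuminance bounds) are hypothetical examples.
ODD = {"snow": False, "min_lux": 5.0, "max_lux": 100000.0}

def in_odd(conditions):
    return (conditions.get("snow", False) == ODD["snow"]
            and ODD["min_lux"] <= conditions.get("lux", 0.0) <= ODD["max_lux"])

def guarded_detection(detector_output, conditions):
    """Return the detector output only inside the ODD; otherwise declare the
    function inhibited (fallback to a safe state / driver takeover)."""
    if not in_odd(conditions):
        return {"status": "inhibited", "reason": "out of ODD"}
    return {"status": "ok", "red_light": detector_output}

print(guarded_detection(True, {"snow": True, "lux": 300.0}))
```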


8 Conclusion - Towards a POC (Proof of Concept)

8.1 First Off-Line Findings

Signal detections are predicted on an HD test video (full course of line 8), using BBox Label Tool as the labelling tool. The tool allows objects to be easily drawn around manually and assigned a class (panel, green light, red light). We processed and analyzed a total of 2147 images, separated into two sets: 10% of the images for the validation set (214 images) and 90% for the learning base (1933 images). For information, labelling more than 2000 images takes a little more than a day of work, which remains very reasonable. After relearning on L8 (no more red light and a larger learning base, following an inconclusive first version), the first observations were as follows (Table 2):

Table 2. Findings on Line 8 RATP (from an HD video) - lessons from the first demonstrator

Finding: No error in interpretation.
Analysis: Very encouraging result given the rather limited learning effort.

Finding: Momentary loss of signal interpretation in a part of a curve.
Analysis: Given the proposed architecture, which considers not a condition on a single image but a red-light detection among several images, fugitive losses appear not to be problematic.

Finding: Possible interpretation of a signal visible from the other trackside.
Analysis: As soon as a consistency test with the LIDAR and the mapping is done to confirm the presence of a panel, the artefact can be filtered.

Finding: Confusion or difficulty in differentiating circular and rectangular signals.
Analysis: To be investigated, to determine whether this behaviour is permanent or fugitive.

Finding: High dependence on image quality and on the capacity to detect distant signals.
Analysis: The maximum distance is an important factor. Solutions are to be found via different focal lengths and an ad-hoc image stabilization.

It will remain possible to establish a quantification of the frequency of dangerous failures of an SCVS system with high redundancy and with care taken for the diversification of the components. This makes it possible to define an FFR (Functional Failure Rate) and a corresponding SIL reached, covering random failures. But this approach remains of course insufficient. For systematic failures: execution errors related to bugs will be detectable by the consistency tests added in the proposed redundant solution; on the other hand, errors remain consubstantial with the quality of the (supervised) learning. This quality of learning can be observed onsite, with the reservation that it remains difficult to decide on the representativeness of the situations that arise during the tests. Only an experiment over an extended period and area can clarify the level of confidence.


It is however possible to improve the safety level by increasing the accuracy and the coverage of the learning base, and by detecting the inputs "distant" from the learning examples, thereby reducing the risk of undesired events. The major limiting point is that it will not be possible to certify a machine learning algorithm against the criteria of the standards for software safety. An absolute or almost absolute result is of course not demonstrable on such a system. A possible solution to explore is the concept of proven in use, with appreciation and judgment of the result through sufficient testing. The fundamental element is to know the limits of the system by defining a safe domain according to its capabilities (ODD - Operational Design Domain). The ODD constitutes a limitation of use covering everything that could not be covered by the learning.

8.2 Induced Limitation of Use

Automatic recognition of a railway signal is of different interest depending on the Grade of Automation:

– in GOA2 (with driver => SIL 2), automatic signal detection can be useful for correcting an erroneous driver response (barrier), improving the overall safety of lines not equipped with stop control, or ensuring this control farther upstream of the traffic signal than some repetition loops (check before crossing, with the additional treatment of a speed limit condition);
– in GOA4 (total automation without human presence => SIL 4), the technical device is made to bear part of the responsibility for driving the train safely (the visual traffic instructions are used for automatic train control).

All the observations made on the difficulties of demonstrating safety lead, for the moment, to applications limited to support functions: the function can be used for controlling driver errors (it represents a functional barrier that improves the global safety despite its default options). Applications are restricted to improving safety and not to supporting safe functions alone... at first!

8.3 Usefulness of a Future Experiment

Onsite experimentation is now necessary to master the safety-reliability compatibility with regard to false negatives and false positives. We must evaluate the relationship between the number of images to be processed to obtain a result with vote, the interpretation errors, the perception distances, the impact of external conditions, the response times and the effects of the consistency tests. All these elements cannot be modelled and are not predictable.

References

1. Principe «GAME» (Globalement Au Moins Équivalent) - STRMTG - version 2
2. Temel, D., Chen, M.H.: Traffic sign detection under challenging conditions: a deeper look into performance variations and spectral characteristics. IEEE Trans. Intell. Transp. Syst., 1–11 (2019)


3. Marmo, R., Lombardi, L.: Railway sign detection and classification. In: IEEE Intelligent Transportation Systems Conference, pp. 1358–1363 (2006)
4. Birch, J., et al.: Safety cases and their role in ISO 26262 functional safety assessment. In: Bitsch, F., Guiochet, J., Kaâniche, M. (eds.) SAFECOMP 2013. LNCS, vol. 8153, pp. 154–165. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40793-2_15
5. Li, G., Hamilton, W.I.: Driver detection and recognition of lineside signals and signs at different approach speeds. Cogn. Technol. Work 8, 30–40 (2006)
6. Ingibergsson, J.T.: Explicit image quality detection rules for functional safety in computer vision. In: VISIGRAPP (6: VISAPP), pp. 433–444 (2017)
7. NF EN 50128: Railway applications - Communication, signalling and processing systems - Software for railway control and protection systems, 01 October 2011
8. NF EN 50129 (E), prepublished (2018-11-23): Railway applications - Communication, signalling and processing systems - Safety related electronic systems for signalling
9. NF EN 50126: Railway applications - The specification and demonstration of reliability, availability, maintainability and safety (RAMS) - Part 1: generic RAMS process; Part 2: systems approach to safety
10. Kalra, N., Paddock, S.M.: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Transp. Res. Part A Policy Pract. 94, 182–193 (2016)
11. Proposed cross acceptance for railway signalling systems and equipment. Committee report No. 6 - IRSE - Institution of Railway Signal Engineers

Audio Events Detection in Noisy Embedded Railway Environments

Tony Marteau1(B), Sitou Afanou1(B), David Sodoyer2, Sébastien Ambellouis2, and Fouzia Boukour2

1 SNCF Voyageurs, Centre d'Ingénierie du Matériel, Le Mans, France
{tony.marteau,sitou.afanou}@sncf.fr
2 COSYS-LEOST, Univ Gustave Eiffel, IFSTTAR, Univ Lille, 59650 Villeneuve d'Ascq, France
{david.sodoyer,sebastien.ambellouis,fouzia.boukour}@univ-eiffel.fr

Abstract. Ensuring passengers' safety is one of the daily concerns of railway operators. To do this, various image and sound processing techniques have been proposed in the scientific community. Since the beginning of the 2010s, the development of deep learning has made it possible to advance these research areas, the railway field included. Thus, this article deals with the audio event detection task (screams, glass breaks, gunshots, sprays) using deep learning techniques. It describes the methodology for designing a deep learning architecture that is both suitable for audio detection and optimised for embedded railway systems. We describe how we designed from scratch two CRNNs (Convolutional Recurrent Neural Networks) for the detection task. And since the creation of a large and varied training database is one of the challenges of deep learning, this article also deals with the innovative methodology used to build a database of audio events in the railway environment. Finally, we show the very promising results obtained during the experimentation of the model in real conditions.

Keywords: Audio event detection · Abnormal event · Transport environment · Railway · Deep learning · CRNN

1 Introduction

Surveillance in the railway field is an expensive task. It requires deploying huge resources, both human and material, to ensure the safety of passengers. A whole framework dedicated to this task must be deployed: CCTV cameras and microphones, patrol and surveillance agents, barriers, etc. Nowadays, most autonomous surveillance systems still require a human operator. With recent image and signal processing techniques such as neural networks (NN) and deep learning (DL), robust surveillance automation becomes possible. The automation's aim is to help railway operators reduce security issues by detecting an event very early and allowing the prompt intervention of the railway police.


Developing audio and video algorithms to detect critical events is not a new research action, but with recent NN innovations this research area has grown very quickly in the last years. Moreover, smart video-based event recognition is an active research field but is more difficult inside a railway vehicle due to occlusion issues. In this context, analysing the audio environment of a railway has yielded promising results in the past [16,25] with classical machine learning techniques. In this paper, we present a work in line with this question: how to detect critical events by analysing the audio environment inside a train? We propose to design an event detection system based on NN and DL techniques that relies on the existing audio equipment of actual commercial trains. We aim at increasing the capabilities of the actual surveillance systems with the automatic detection and identification of some abnormal sounds. Automatic sound classification and recognition are two active areas of study [2,17,22], present in various fields of application such as speaker recognition [18], speech emotion classification [23,24], urban sound analysis [19], audio surveillance of roads [9], acoustic scene classification [8], event detection [21] and localisation [4]. The task aims at detecting the onset and offset times, and the label, of each sound event in an audio sequence. The prolific research is due to advances in NN and DL that have deeply changed the way automatic detection systems are designed and used, for both sound and image streams. In this paper, we study sound classification algorithms to deal with abnormal audio event recognition. As previously mentioned, several research projects were already conducted between 2005 and 2015: the European research project BOSS, and the French projects SURTRAIN and DEGIV [16,25]. These works did not use DL and NN because the computing power of computers did not allow considering an on-board setup. Recently, Laffitte et al. showed the way and presented studies on the automatic detection of screams and shouts in subway trains using deep neural networks (DNN) [12,13]. A supervised DNN requires a large amount of data for the training of the model. Obtaining a database combining both quantity (thousands of samples) and quality (all the events to be recognised precisely labelled) becomes a complex paradigm. Some audio databases are publicly available in the scientific community, like the databases from the different challenges on Detection and Classification of Acoustic Scenes and Events (DCASE) [1], but the embedded railway environment is generally not considered. This is understandable, since it is difficult to collect huge amounts of recordings in a train. Indeed, railway is a highly regulated environment, where very strict rules must be followed, in particular concerning fire risk assessment, electromagnetic compatibility, vibrations and personal data protection. Therefore, only certified on-board computers can be used in trains; these computers have limited computing resources for heat dissipation considerations. The present paper addresses two problems. The first problem deals with the building of a railway synthetic database dedicated to abnormal audio event detection. This database has been built by mixing sound patterns and real embedded railway background sounds. The second problem focuses on the design of a convolutional and recurrent neural network for abnormal event detection, trained from this database.


In the introduction we presented some studies on audio classification and detection. The second section presents an original database dedicated to abnormal event detection in the railway environment. Two architectures of convolutional and recurrent neural networks are presented in the third section. The following sections are dedicated to the experiment description and the detection results, respectively. Finally, the last section presents the conclusions.

2 A Railway Database for Abnormal Events Detection

An embedded railway environment is a very specific place where new acoustic constraints have to be considered. This acoustic environment is very noisy and not stationary: it is a mixture of many acoustic sources emitted by mechanical, electrical and electronic sub-systems working simultaneously, and also emitted by passengers. In this context, we propose to build a dedicated database by mixing railway background and abnormal event sounds. Both are presented in the following sections, followed by a description of the mixing method we use.

2.1 Railway Background Sounds

Railway background sounds have been recorded during technical rolling on board of multiple (suburban, regional and high-speed) SNCF trains, to create variability and make our system less specific. The mobile capture equipment was placed in the middle and at the tail of the train. Six hours of background sounds have been recorded. The audio signal was recorded on a single 32-bit channel sampled at 44.1 kHz. These background sounds are a mixture of engine sounds, friction of the wheels on the rails, air conditioning, commercial audio messages, etc. This railway background signal is clearly polyphonic and not stationary.

2.2 Abnormal Events and Additional Sounds

Four types of abnormal sound events to detect have been chosen: gunshots, screams, glass breaks and sprays. The samples of these sound classes are extracted from the Freesound website [10]; in order to check the audio content, all the samples were listened to before being incorporated into the final dataset. These abnormal sounds are recorded on a 32-bit mono channel sampled at 44.1 kHz. In a commercial train, other operational sounds appear, such as buzzers, door opening/closing, passenger conversations, etc. Because these sounds are not recorded during technical rolling, we added all these additional sounds from another railway audio dataset. The duration distributions of the abnormal events and additional sounds are presented in Table 1.


Table 1. Duration distribution of the abnormal event sequences and additional sounds.

Class       | Number | Total   | Min    | Mean   | Max
Gunshot     | 358    | 404.6 s | 0.13 s | 1.13 s | 2.68 s
Scream      | 339    | 335.8 s | 0.28 s | 1.14 s | 1.99 s
Glass break | 175    | 237.9 s | 0.38 s | 1.35 s | 2.98 s
Spray       | 310    | 253.6 s | 0.13 s | 0.81 s | 1.92 s
Add. sounds | 459    | 910.8 s | 0.53 s | 1.98 s | 2.0 s

2.3 Database Samples Generation Process

Here we detail how we mix the background, abnormal and additional sounds to generate one audio sequence of our new database. The duration of each generated sound sequence is 10 s, and below is the workflow we follow to produce each sound sequence in the dataset (a sketch of this generation process is given after the list):

1. Random selection of a background sound in the background dataset. The gain of the audio sample is selected randomly between 0 and -10 dB to create variability without introducing audio saturation.
2. Selection of 0 up to 3 abnormal events to detect. The temporal localisation is fixed randomly within the 10 s of background. Overlapping of samples is allowed, and a random gain between -5 and -15 dB is applied.
3. Random choice of the presence or not of another abnormal event (repeated for the three other abnormal sounds). These other events can occur while the dedicated events take place. In order to make the system more reliable against these other events, which is necessary for a correct identification, the detector learns to consider all the other sounds as patterns not to be detected. The temporal localisation is randomly set within the 10 s background; overlapping of samples is allowed, and a random gain between -5 and -15 dB is applied.
4. Random choice of the presence or not of other normal events. The temporal localisation and the gain are chosen randomly under the same interval conditions.

The labels are generated at the same time as the integration of the samples of the events to be detected. The labelling is a one-hot encoding: each sequence associated with an event to detect has its own label tensor. Each component of this tensor is initialized to 0, except for the frames where an event to detect occurs, where the element is set to 1. The length of this label vector is equal to the length of the model output. For example, Fig. 1 shows an overview of a spectrogram and its label: in this sequence, a 2-s scream sample is inserted at the 7.5th second of the background file, so the start timestamp is 7.5 and the end timestamp is 9.5.
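The following Python sketch illustrates this generation workflow under simplifying assumptions (sounds handled as NumPy arrays at 44.1 kHz, gains applied as scalar factors, a single event mixed in); the function and variable names are illustrative, not from the authors' code.

```python
import numpy as np

FS = 44100          # sampling rate (Hz)
SEQ_LEN = 10 * FS   # 10 s sequences

def db_to_gain(db):
    return 10.0 ** (db / 20.0)

def make_sequence(background, event, rng):
    """Mix one abnormal event into a 10 s background and build its frame labels.
    background, event: 1-D float arrays at 44.1 kHz."""
    mix = background[:SEQ_LEN] * db_to_gain(rng.uniform(-10.0, 0.0))
    label = np.zeros(SEQ_LEN, dtype=np.float32)
    start = rng.integers(0, SEQ_LEN - len(event))        # random localisation
    mix[start:start + len(event)] += event * db_to_gain(rng.uniform(-15.0, -5.0))
    label[start:start + len(event)] = 1.0                # one-hot frame labels
    return mix, label

rng = np.random.default_rng(0)
bg = rng.standard_normal(SEQ_LEN) * 0.01                 # stand-in background
scream = rng.standard_normal(2 * FS) * 0.1               # stand-in 2 s event
seq, lab = make_sequence(bg, scream, rng)
print(seq.shape, lab.sum() / FS, "s of labelled event")
```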


Fig. 1. Spectrogram 10 log10 |x(f, t)| of an audio sequence of 5511 frames (10 s), with the associated label vector as a red line. (Color figure online)

3 Abnormal Event Detection

The algorithm has to label each abnormal event that appears during one 10 s sequence. The whole sequence is considered as the input of our CRNN. We adopt a One-vs-All (OvA) strategy to predict the labels: one CRNN is viewed as a binary classifier designed and trained for one event to detect (scream, gunshot, glass break or spray). We can thus take into account the polyphonic detection problem, i.e. the cases where several events appear at the same time. Finally, by using dedicated "less complex" networks, we can expect faster detection by exploiting the multi-processor capacity of the computer. Each CRNN extracts the feature map of each sequence (convolution layers) and analyzes the temporal coherence of the frequency activity (recurrent layers). Finally, it computes one event activity probability for every frame of the sequence. The final detection is done by applying a threshold σ = 0.5; a sketch of this OvA thresholding is given below.
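A minimal sketch of this One-vs-All decision stage, assuming each per-class CRNN returns a vector of frame-wise probabilities (names and values are illustrative):

```python
import numpy as np

SIGMA = 0.5  # decision threshold on the frame-wise activity probability

def detect_events(frame_probs, sigma=SIGMA):
    """Binarize per-frame probabilities and return (onset, offset) frame pairs."""
    active = frame_probs >= sigma
    edges = np.flatnonzero(np.diff(active.astype(np.int8)))
    bounds = np.concatenate(([0], edges + 1, [len(active)]))
    return [(int(b), int(e)) for b, e in zip(bounds[:-1], bounds[1:]) if active[b]]

# One binary detector per class (One-vs-All); probabilities are placeholders.
probs = {"scream": np.array([0.1, 0.2, 0.8, 0.9, 0.7, 0.3]),
         "gunshot": np.array([0.0, 0.1, 0.2, 0.1, 0.6, 0.9])}
for cls, p in probs.items():
    print(cls, detect_events(p))
```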

3.1 Model Architecture

We propose two models based on the Convolutional Recurrent Neural Network (CRNN) developed in the recent papers [3,5,6,14]. These studies show that the combination of convolutional and recurrent layers allows to jointly capture the invariance in the frequency domain and to model short- and long-term temporal dependencies. The first model consists of the following layers: two convolutional, two Gated Recurrent Unit (GRU) [7] and three fully connected (FC) layers. GRU is preferred to Long Short-Term Memory (LSTM) to reduce the number of parameters and avoid the vanishing gradient problem. In many works on audio event detection, MEL coefficients (MFCCs) are generally used as input features [13]. In [6], the input of the network is a log time-frequency representation of the data in MEL band energies over frames. In our work, we use directly the magnitudes of the spectrum [11] and let the first layers optimise the extraction of higher-level parameters. N spectra are computed on each 10 s sequence of the database.


The basic version of the first model (CRNN 1) is defined as follows (Fig. 2a):

– 1 convolutional layer, composed of 32 filters with a k × 15 kernel, where k is the total number of frequency bins in the time-frequency representation. We use a stride of 4 samples to reduce the dimensionality of the resulting feature map; a stride is used rather than pooling to obtain better computational performance [20]. The layer is activated by a ReLU function.
– 1 convolutional layer, composed of 32 filters with a 32 × 15 kernel and a stride of 1, activated by a ReLU function.
– 2 GRU layers, used to extract temporal information from the feature map output by the second convolutional layer. Each GRU layer has 32 units.
– 3 FC layers. The first two layers have respectively 128 and 64 neurons activated through a ReLU function; the last one is a single-neuron layer activated through a Sigmoid function. These layers gradually reduce the size of the output and are distributed over time. The last layer computes the activity probability for each class.

The second model (CRNN 2) is inspired by the network architecture of [3]. For this configuration, the convolution operations of the first and second layers do not integrate all the frequencies as before, i.e. they use a 3 × 15 kernel, 3 along the frequency range as in [3]. A third convolutional layer is added, and a max-pooling operation is used after each convolutional layer. More precisely, the basic version of this second model (CRNN 2) is defined as follows (Fig. 2b):

– 1 convolutional layer, composed of 32 filters with a 3 × 15 kernel. As in CRNN 1, we use a stride of 4 samples to reduce the dimensionality of the feature map. A 5 × 1 max-pooling is applied to reduce the output dimension along the frequency range.
– 1 convolutional layer, composed of 32 filters with a 3 × 15 kernel and a stride of 1, followed by a 2 × 1 max-pooling.
– 1 convolutional layer, composed of 32 filters with a 3 × 3 kernel and a stride of 1, followed by a 2 × 1 max-pooling.
– 2 GRU layers, used to extract temporal information from the feature map of the third convolution. Each layer has 32 units.
– 3 FC layers. The first two are activated through a ReLU function and the last one through a Sigmoid function. These layers gradually reduce the size of the output and are distributed over time as in CRNN 1. The last layer computes the activity probability for each class.

For both networks, the last FC layer does not provide one probability per input frame: because of the 4-sample stride in the first layer of each network, the length of the output vector is divided by 4, i.e. it is equal to N/4. A Keras sketch of CRNN 1 is given below.
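The following Keras sketch is one possible reading of the CRNN 1 description above (a minimal sketch, assuming a 101 × 5511 magnitude spectrogram input, a time-oriented interpretation of the second convolution, and time-distributed FC layers; paddings and layer options are assumptions, not the authors' published code):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

K, N = 101, 5511  # frequency bins and frames of the input magnitude spectrogram

inp = layers.Input(shape=(K, N, 1))
# 1st conv: 32 filters, k x 15 kernel spanning all frequencies, stride 4 in time.
x = layers.Conv2D(32, kernel_size=(K, 15), strides=(1, 4), activation="relu")(inp)
x = layers.Reshape((-1, 32))(x)            # -> (time steps ~ N/4, 32 features)
# 2nd conv: interpreted here as a width-15 convolution over time on the 32 maps.
x = layers.Conv1D(32, 15, padding="same", activation="relu")(x)
# 2 GRU layers, 32 units each, keeping the full sequence for frame-wise output.
x = layers.GRU(32, return_sequences=True)(x)
x = layers.GRU(32, return_sequences=True)(x)
# 3 time-distributed FC layers: 128 and 64 ReLU neurons, then 1 sigmoid neuron.
x = layers.TimeDistributed(layers.Dense(128, activation="relu"))(x)
x = layers.TimeDistributed(layers.Dense(64, activation="relu"))(x)
out = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(x)

crnn1 = models.Model(inp, out)
crnn1.compile(optimizer=tf.keras.optimizers.Adam(1e-2), loss="binary_crossentropy")
crnn1.summary()  # output length is N/4 = 1375 frame probabilities
```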

Fig. 2. The two basic CRNN architectures: (a) CRNN 1, (b) CRNN 2.

4 Evaluation

The experiments consist firstly in evaluating the influence of the number of GRU units and of the structure of the convolutional layers for CRNN 1 and CRNN 2. Secondly, we compare the performances of both architectures. For both studies, the evaluations have been realised with the synthetic mixtures described in Sect. 2.3. Finally, in a last run, we present a preliminary result on the portability and the feasibility of detection in the real conditions of the railway environment.

4.1 Features Extraction Description

The input features of the networks are the modules of the complex-valued spectra computed on a T-second audio signal x(t). The spectra x(f, n) are calculated by a Fast Fourier Transform using a sliding Hamming window of 200 samples with 60% overlap; f and n denote respectively the frequency and the frame index. Finally, the input of the network is a matrix composed of N magnitude vectors |x(f, n)|. In our experiments, T = 10 s, the sampling rate is fs = 44.1 kHz and the input is a 101 × 5511 matrix composed of 5511 vectors of k = 101 frequencies. A sketch of this feature extraction is given below.
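A minimal NumPy sketch of this feature extraction (a 200-sample Hamming window with 60% overlap, i.e. a hop of 80 samples; names are illustrative):

```python
import numpy as np

FS = 44100
WIN, HOP = 200, 80            # 200-sample Hamming window, 60% overlap -> hop 80

def magnitude_spectrogram(x):
    """Return the |FFT| matrix (k x N) of a 1-D signal x."""
    window = np.hamming(WIN)
    n_frames = 1 + (len(x) - WIN) // HOP
    frames = np.stack([x[i * HOP:i * HOP + WIN] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T   # k = WIN//2 + 1 = 101 bins

x = np.random.randn(10 * FS)  # stand-in 10 s signal
S = magnitude_spectrogram(x)
print(S.shape)                # (101, 5511)
```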

4.2 Evaluation Procedure in Synthetic Mixtures Case

We evaluate both CRNN architectures by modifying their parameters.


In a first step, we study the influence of the number of units per recurrent layer: 0 (simple convolutional network), 32 (the basic CRNN described previously) and 64 units per recurrent layer. For these three cases, the number of convolution filters is fixed to 64. This step is realised for the architecture CRNN 1. In a second step, we test the influence of the number of filters per convolutional layer for two configurations: 32 filters (the basic CRNN described previously) and 64 filters. In both cases, the number of units of the GRU layers is fixed to 32. This step is realised for the architectures CRNN 1 and CRNN 2. Each CRNN is trained and tested independently for each event using the dedicated synthetic database. In total, we generated for each event 11000 10-s sequences: 7000 sequences for training, 2000 for validation, and 2000 for testing. This corresponds to 30 h of sounds for each class. For the learning phase, we use a 0.01 learning rate and the Adam optimizer with a binary cross-entropy loss function. Early stopping is triggered after twenty iterations without loss improvement on the validation database. Batch normalisation, a 0.2 dropout and layer normalisation are applied on each convolutional or recurrent layer during the training phase. The test phase consists in presenting the 2000 sequences of 5511 spectra and comparing the N predictions with the N truth labels for each sequence. With a sequence length of 5511 frames, the length of the output vector of the CRNN is equal to N = 1375. The evaluation is made by computing the accuracy, precision and recall rates and the F1 score [15] for the 2000 × N = 2750000 predictions. The rates are calculated as:

accuracy = (TP + TN) / (TP + FP + TN + FN)   (1a)
recall = TP / (TP + FN)   (1b)
precision = TP / (TP + FP)   (2a)
F1score = (2 · precision · recall) / (precision + recall)   (2b)

where TP, FP, TN and FN are respectively the numbers of true positive, false positive, true negative and false negative predictions; a minimal sketch of these rates is given below.
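A minimal Python sketch of Eqs. (1a)-(2b), computed from pooled frame-wise counts (an illustrative helper, not the authors' evaluation code; the counts are placeholders):

```python
def rates(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 score from pooled frame counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts summing to the 2 750 000 test predictions.
print(rates(tp=200000, fp=18000, tn=2500000, fn=32000))
```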

4.3 Detection Description in Real Environment

To check the performance of our CRNN architectures under the real conditions of the railway environment, we carried out tests at the SNCF Technicentre des Ardoines, in Vitry-sur-Seine. There, we were able to access an Île-de-France region suburban train (Z2N train) operating on lines C and D. The train used for the test was stationary for maintenance reasons; its engine, however, was running, and many trains nearby produced a huge amount of noise. We set up the system inside the train and used only one IP camera. When the train is set ON, the system launches automatically with the onboard computer: it assigns itself a fixed IP address, starts its autotests, connects to the IP camera and starts reading the audio stream from it. Some abnormal events were played using a speaker placed at different locations in the train:


1. Speaker at 2 m from the microphone, playing 3 different samples
2. Speaker at 4 m from the microphone, playing 3 different samples
3. Speaker at 6 m from the microphone, playing 3 different samples

The audio stream is captured and stored in a 10 s FIFO memory refreshed every 0.5 s. The detection tests were performed using the models learned with the synthetic mixtures presented in Sect. 2.3 (without a new learning phase). In this real context, the evaluation consisted in checking the detection of the corresponding event by monitoring the log files of the system. For these experiments, only two critical abnormal events have been tested: the scream and the gunshot events.

4.4 Results in Synthetic Environment

Table 2 and Table 3 present the results of the experimentation plan described in Sect. 4.2. For both tables, Target and BG refer respectively to the event class and the background. The first conclusion is that both architectures yield good results in our railway environment, with an accuracy over 90%. On the one hand, regarding the impact of the recurrent layers (Table 2), we observe that the performance decreases without recurrent layers for all rates. This confirms that we need to take into account the temporal evolution of the spectral patterns extracted by the convolutional layers. The GRU layers do not benefit the spray events, which have a really complex spectral-temporal structure. For the three other classes, the recurrent layers improve the target recall: by 12% for Scream and by up to 92% for Gunshot. On the other hand, the number of units in the recurrent layers does not significantly influence the detection quality. Table 3 presents the effect of the number of filters on the performance. It is difficult to highlight a major improvement with respect to the number of filters used. Nevertheless, for CRNN 1, 32 or 64 filters yield quite similar rates, while for CRNN 2, increasing the number of filters seems to severely decrease the performance for all events except for scream.

4.5 Results in Real Environment

The results in Table 4 present the number of detected events for scream and gunshot. In a general manner, all events are correctly detected. However, it appears clearly that the detection rate depends on the distance between the source and the microphone. This sensitivity effect can be reduced by increasing the number of microphones; in this case, the microphones have to be distributed in the railway vehicle, ensuring that the distance between passengers and one microphone is less than 6 m.


Table 2. Detection performances for the architecture CRNN 1 as a function of the absence or the complexity of the GRU layers. "0 GRU" stands for no GRU.

Event   | Config | Accuracy | F1 Target | F1 BG | Prec. Target | Prec. BG | Recall Target | Recall BG
Scream  | 0 GRU  | 0.90 | 0.82 | 0.93 | 0.92 | 0.90 | 0.73 | 0.97
Scream  | 32 GRU | 0.93 | 0.87 | 0.95 | 0.92 | 0.93 | 0.82 | 0.97
Scream  | 64 GRU | 0.93 | 0.88 | 0.95 | 0.93 | 0.93 | 0.83 | 0.97
Gunshot | 0 GRU  | 0.80 | 0.54 | 0.87 | 0.82 | 0.80 | 0.40 | 0.96
Gunshot | 32 GRU | 0.91 | 0.83 | 0.94 | 0.89 | 0.91 | 0.77 | 0.96
Gunshot | 64 GRU | 0.90 | 0.82 | 0.93 | 0.89 | 0.90 | 0.75 | 0.96
Spray   | 0 GRU  | 0.95 | 0.87 | 0.97 | 0.92 | 0.95 | 0.83 | 0.98
Spray   | 32 GRU | 0.95 | 0.88 | 0.97 | 0.96 | 0.95 | 0.81 | 0.99
Spray   | 64 GRU | 0.96 | 0.90 | 0.97 | 0.94 | 0.96 | 0.86 | 0.99
Glass   | 0 GRU  | 0.91 | 0.71 | 0.95 | 0.86 | 0.92 | 0.60 | 0.98
Glass   | 32 GRU | 0.95 | 0.86 | 0.97 | 0.91 | 0.96 | 0.82 | 0.98
Glass   | 64 GRU | 0.95 | 0.86 | 0.97 | 0.92 | 0.96 | 0.81 | 0.99

Table 3. Detection performances as a function of the complexity of the convolutional layers for architectures CRNN 1 and CRNN 2.

Event   | Config           | Acc. | F1 Target | F1 BG | Prec. Target | Prec. BG | Recall Target | Recall BG
Scream  | CRNN1 32 filters | 0.92 | 0.87 | 0.95 | 0.89 | 0.94 | 0.85 | 0.96
Scream  | CRNN1 64 filters | 0.93 | 0.87 | 0.95 | 0.92 | 0.93 | 0.82 | 0.97
Scream  | CRNN2 32 filters | 0.90 | 0.81 | 0.93 | 0.94 | 0.89 | 0.72 | 0.98
Scream  | CRNN2 64 filters | 0.91 | 0.83 | 0.94 | 0.92 | 0.91 | 0.76 | 0.97
Gunshot | CRNN1 32 filters | 0.90 | 0.81 | 0.93 | 0.91 | 0.90 | 0.73 | 0.97
Gunshot | CRNN1 64 filters | 0.91 | 0.83 | 0.94 | 0.89 | 0.91 | 0.77 | 0.96
Gunshot | CRNN2 32 filters | 0.89 | 0.80 | 0.93 | 0.87 | 0.90 | 0.74 | 0.96
Gunshot | CRNN2 64 filters | 0.85 | 0.71 | 0.90 | 0.79 | 0.86 | 0.64 | 0.93
Spray   | CRNN1 32 filters | 0.95 | 0.89 | 0.97 | 0.93 | 0.96 | 0.85 | 0.98
Spray   | CRNN1 64 filters | 0.95 | 0.88 | 0.97 | 0.96 | 0.95 | 0.81 | 0.99
Spray   | CRNN2 32 filters | 0.96 | 0.91 | 0.98 | 0.94 | 0.97 | 0.89 | 0.98
Spray   | CRNN2 64 filters | 0.87 | 0.61 | 0.93 | 0.96 | 0.87 | 0.45 | 0.99
Glass   | CRNN1 32 filters | 0.94 | 0.81 | 0.96 | 0.86 | 0.95 | 0.76 | 0.97
Glass   | CRNN1 64 filters | 0.95 | 0.86 | 0.97 | 0.91 | 0.96 | 0.82 | 0.98
Glass   | CRNN2 32 filters | 0.96 | 0.88 | 0.97 | 0.89 | 0.97 | 0.86 | 0.98
Glass   | CRNN2 64 filters | 0.82 | 0.62 | 0.88 | 0.50 | 0.96 | 0.83 | 0.82


Table 4. Event detection results in real environment for three distances between events and microphone.

Distance | Scream | Gunshot
2 m      | 3/3    | 3/3
4 m      | 3/3    | 3/3
6 m      | 3/3    | 2/3

5 Conclusions

In this paper, we present a new railway audio database and two CRNN architectures designed for abnormal audio event detection. Our evaluation shows that using a kernel of the same size as the number of frequency bands (CRNN 1) yields better rates. As in [3,6], the detection results show that catching the temporal structure of the spectrum improves the performance rates. Increasing the number of filters has a weak impact on the detection performance, and only for CRNN 1. The complexity of CRNN 1 and its number of parameters are lower than for CRNN 2; it therefore seems to be a quite promising embedded solution for real railway conditions.

Acknowledgement. We would like to thank Helmi REBAI and Martin OLIVIER for strongly contributing to the advancement of this study.

References

1. http://dcase.community/challenge2019/
2. Abeßer, J.: A review of deep learning based methods for acoustic scene classification. Appl. Sci. 10(6) (2020)
3. Adavanne, S., Pertilä, P., Virtanen, T.: Sound event detection using spatial features and convolutional recurrent neural network. In: IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, LA, USA, 5–9 March 2017, pp. 771–775 (2017)
4. Adavanne, S., Politis, A., Nikunen, J., Virtanen, T.: Sound event localization and detection of overlapping sources using convolutional recurrent neural networks. IEEE J. Sel. Top. Signal Process. 13(1), 34–48 (2019)
5. Adavanne, S., Parascandolo, G., Pertilä, P., Heittola, T., Virtanen, T.: Sound event detection in multichannel audio using spatial and harmonic features. In: Detection and Classification of Acoustic Scenes and Events Workshop, Budapest, Hungary, 3 September 2016
6. Cakir, E., Parascandolo, G., Heittola, T., Huttunen, H., Virtanen, T.: Convolutional recurrent neural networks for polyphonic sound event detection. IEEE/ACM Trans. Audio Speech Lang. Process. 25(6), 1291–1303 (2017)
7. Cho, K., van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches. In: Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014, pp. 103–111. Association for Computational Linguistics (2014)


8. Drossos, K., Magron, P., Virtanen, T.: Unsupervised adversarial domain adaptation based on the Wasserstein distance for acoustic scene classification. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 20–23 October 2019, pp. 259–263 (2019)
9. Foggia, P., Petkov, N., Saggese, A., Strisciuglio, N., Vento, M.: Audio surveillance of roads: a system for detecting anomalous sounds. IEEE Trans. Intell. Transp. Syst. 17(1), 279–288 (2016)
10. Font, F., Roma, G., Serra, X.: Freesound technical demo. In: ACM International Conference on Multimedia, Barcelona, Spain, 21 October 2013, pp. 411–412 (2013)
11. Huzaifah, M.: Comparison of time-frequency representations for environmental sound classification using convolutional neural networks. CoRR abs/1706.07156 (2017)
12. Laffitte, P., Sodoyer, D., Tatkeu, C., Girin, L.: Deep neural networks for automatic detection of screams and shouted speech in subway trains. In: IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, 20–25 March 2016, pp. 6460–6464 (2016)
13. Laffitte, P., Wang, Y., Sodoyer, D., Girin, L.: Assessing the performances of different neural network architectures for the detection of screams and shouts in public transportation. Expert Syst. Appl. 117, 29–41 (2019)
14. Lim, H., Park, J., Lee, K., Han, Y.: Rare sound event detection using 1D convolutional recurrent neural networks. In: Detection and Classification of Acoustic Scenes and Events Workshop, Munich, Germany, 16 November 2017
15. Mesaros, A., Heittola, T., Virtanen, T.: Metrics for polyphonic sound event detection. Appl. Sci. 6(6), 162 (2016)
16. Pham, Q.C., et al.: Audio-video surveillance system for public transportation. In: 2nd International Conference on Image Processing Theory, Tools and Applications, Paris, France, 7–10 July 2010. https://doi.org/10.1109/ipta.2010.5586783
17. Purwins, H., Li, B., Virtanen, T., Schlüter, J., Chang, S., Sainath, T.: Deep learning for audio signal processing. IEEE J. Sel. Top. Signal Process. 13(2), 206–219 (2019)
18. Ravanelli, M., Bengio, Y.: Speaker recognition from raw waveform with SincNet. In: IEEE Spoken Language Technology Workshop, Athens, Greece, 18–21 December 2018, pp. 1021–1028 (2018)
19. Salamon, J., Bello, J.P., Farnsworth, A., Kelling, S.: Fusing shallow and deep learning for bioacoustic bird species classification. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, New Orleans, LA, USA, 5–9 March 2017, pp. 141–145 (2017)
20. Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.A.: Striving for simplicity: the all convolutional net. In: 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015
21. Turpault, N., Serizel, R., Salamon, J., Shah, A.P.: Sound event detection in domestic environments with weakly labeled data and soundscape synthesis. In: Detection and Classification of Acoustic Scenes and Events Workshop, New York University, NY, USA, October 2019, pp. 253–257 (2019)
22. Virtanen, T., Plumbley, M.D., Ellis, D. (eds.): Computational Analysis of Sound Scenes and Events, 1st edn. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-63450-0
23. Xie, Y., Liang, R., Liang, Z., Huang, C., Zou, C., Schuller, B.: Speech emotion classification using attention-based LSTM. IEEE/ACM Trans. Audio Speech Lang. Process. 27(11), 1675–1685 (2019)


24. Zhang, Z., Coutinho, E., Deng, J., Schuller, B.: Cooperative learning and its application to emotion recognition from speech. IEEE/ACM Trans. Audio Speech Lang. Process. 23(1), 115–126 (2015)
25. Zouaoui, R., et al.: Embedded security system for multi-modal surveillance in a railway carriage. In: Optics and Photonics for Counterterrorism, Crime Fighting, and Defence XI and Optical Materials and Biomaterials in Security and Defence Systems Technology XII. SPIE, Toulouse, France, 21 October 2015

Development of Intelligent Obstacle Detection System on Railway Tracks for Yard Locomotives Using CNN

Andrey Chernov(B), Maria Butakova, Alexander Guda, and Petr Shevchuk

Rostov State Transport University, Rostov-on-Don 344038, Russia
{avcher,butakova,guda}@rgups.ru

Abstract. The paper proposes an approach to the development of obstacle detection systems on railway tracks for yard locomotives. The proposed approach is illustrated by a full-stack technology comprising hardware construction and software implementation. An original video capture device with double cameras performing stereoscopic image recording in real time has been developed. A novel modified edge detection algorithm recognizes railway tracks and obstacles with on-line noise filtering. A pretrained object detection model containing a deep convolutional neural network, able to distinguish and classify obstacles by type and size, has been implemented. Thus, a yard locomotive equipped with the proposed system can be classified as an intelligent vehicle, achieving an autonomous safe-operating unit.

Keywords: Obstacle detection system · Intelligent transportation system · Railway industry · Convolution neural network

1 Introduction

Any incident that occurs on the railway requires close attention and the identification of the relevant circumstances. Due to the extremely complex railway infrastructure, intelligent awareness systems [1–3] are designed and implemented. The essential way of developing autonomous units is by designing [4] and implementing smart transportation vehicles, up to smart transportation systems [5]. Various transportation sectors raise different problems concerning self-driving cars [6], automated and autonomous vehicles [7], unmanned aerial vehicles [8], autonomous trains [9], etc. The automation levels and architectures of such systems may be rather different, but literally every intelligent transportation unit requires an onboard computer (machine) vision subsystem [10]. It is possible to divide computer vision into two categories: active vision and passive vision systems. The first type, active vision, involves interaction with the environment by various types of actions, for example by saying a word and getting the object's reaction. This way is less common than the second one, which is implemented in many technical applications with sensor devices such as range finders, cameras, SONAR, LIDAR, RADAR, and other devices to measure and record sound, light and radio waves.


Beyond technical aspects, the final goal of every computer vision system is reconstructing the overall scene and distinguishing the objects of interest. The methods to reach this goal are quite different and depend on many parameters, and a cost-effective financial estimation plays an essential role in this balance. We leave the detailed explanation of computer vision methods outside the scope of this paper and refer the reader to the exhaustive monographs [11, 12]. Obstacle recognition systems on railways can be simpler than conventional computer vision systems because the train travels along strictly allocated tracks. The problems concerning parallel railway track recognition [13] and 3D reconstruction [14] are still extensively studied. Many attempts to build an obstacle detection system on railroad tracks are based on different types of measurement tools and techniques. In [15], multi-sensor systems and a technology of environment perception for obstacle detection have been proposed. The automatic radar target recognition procedure presented in [16] can be used for detecting objects lying on the railway tracks. Another important problem, safety at railway level crossings, was raised in [17]; this solution uses laser-based scanning that records highly dense point clouds with scene reconstruction, enabling the detection of small obstacles on the crossings. A study in [18] and its implementation enable the detection of obstacles on railway crossings using optical flow and clustering. Algorithms and software for obstacle detection by cameras remain popular approaches to this day. Paper [19] presented a robust image segmentation algorithm with mathematical morphology techniques and binary large object analysis. A background subtraction method working on the sequence of image frames recorded by moving cameras installed on the train has been proposed in [20]. Algorithms and software for obstacle detection systems differ greatly in feature detection, preliminary filtering, and object matching. Feature detection approaches are broadly classified into the following categories: key-point-based, gradient-based, pattern-based, color-based, and deep-learning-based features. The latter category is now under intensive research in various railway technologies [21–23]. However, there is a lack of modern obstacle detection systems suitable for implementation on yard locomotives, which can be old but are still exploited nowadays. Those systems must have a simple but reliable design without complex construction, while having a low cost. The paper proposes an approach to the development of obstacle detection systems on railway tracks for yard locomotives. The approach is illustrated by a reproducible full-stack technology comprising hardware construction and software implementation. The system consists of a video capture device using double video cameras mounted on a yard locomotive and a computing unit that implements a convolutional neural network (CNN). The paper is structured as follows. Section 2 provides information on the hardware construction of the obstacle detection system. Section 3 contains the algorithms and software description for that system. In Sect. 4, we present an example of the system implementation and discuss the results. Section 5 concludes this paper.


2 Mechanical Construction and Hardware

This section describes the required mechanical construction and hardware with a video capture device. It consists of the following components: 1) an onboard computing module in the locomotive cab; 2) a pairing camcorder synchronization unit; 3) telescopic bar mounts with an extension drive and a fixed processing unit for incoming video signals, to which the outputs of the cameras are connected. There are two cameras on each side, mounted on gyroscopic platforms at the ends of the telescopic bar sections. The output of the video signal processing unit is connected to a computing module. The module transfers control signals to the onboard locomotive safety system. The first output of the computing module is connected via the interface unit to the control input of the telescopic bar extension drive; the second output is connected to the input of the video synchronizer. A general view of the proposed mechanical construction mounted on a yard locomotive is shown in Fig. 1a, and the connection diagram of the hardware in Fig. 1b.

Fig. 1. A general view of the proposed mechanical construction mounted on a yard locomotive: a) construction, b) hardware connection diagram

In Fig. 1 there are: 1 – telescopic bar; 2 – the right section of the bar; 3 – the left section of the bar; 4, 5 – electric drives; 6, 7 – gyro-stabilized platforms; 8, 9, 10, 11 – multidirectional high-speed cameras; 12 – video synchronization unit; 13 – interface unit; 14 – video signal processing unit; 15 – onboard computing module; 16 – onboard locomotive safety system.


Before the start of the movement, the control system of the yard locomotive performs a hardware activation of the automatic collision avoidance system. The initialization of the system modules includes the following procedures: starting the gyro-stabilized camera platforms; extending them to the working position by the automatic drives of both sections of the mounting bar; calculating the overall dimensions of the bar; and dynamically adjusting, in real time, the direction of the cameras by the automatic drives according to the locomotive movement. The maximum dimension of the bar sections relative to the dimensions of the rolling stock does not exceed 15 cm and stays within the permitted vehicle gauge for railway transport. After the start of the automatic collision avoidance system, the interface unit starts the synchronization of the video streams by generating clock pulses, after which all the cameras simultaneously deliver images to the motion control software. If the obstacle detection system registers unknown objects on the track that interfere with the safe use of railway transport, the motion control system transmits a video image from the corresponding pair of cameras and a braking command to the onboard locomotive safety system. If necessary, the operator can use the locomotive safety system to control the video surveillance units, and to display on the onboard system, in turn, the images from the cameras located on the bar on both sides of the cab and directed to the front and rear hemispheres; this gives a full view to the driver and reduces the influence of "blind spots", especially in the case of single-person operation of a locomotive without an assistant driver. The realtime locomotive safety system transmits a geographical map and information about the rail track infrastructure, along with the locomotive motion control data, to the global vehicle movement control system, so that the program can verify the coordinates of the detected obstacles. The main feature of the developed obstacle detection system construction is that it provides a full view to the driver along the rail section, which significantly increases the level of passive safety of railways. An example of the image obtained from a locomotive camera is shown in Fig. 2. Such a system design, including the cameras and the synchronization of their video streams, allows implementing stereoscopic vision processing algorithms and determining the safe distance from the locomotive to the car coupling device.


Fig. 2. An example of an image from locomotive cameras

3 Algorithms and Software

We have developed a procedure for detecting unknown obstacles on a rail track, used by the yard locomotive control system described in the previous section. The procedure includes several steps, with algorithms implemented at each step; below, we present each step at a sufficient level of detail. At the first stage, we developed and programmed an algorithm for checking the rail track integrity, which can be used in the operating modes of the locomotive motion control system described in the previous section. This algorithm is intended to determine that the rail track has no physical gaps and to provide the ability to control the position of rail switches along the track. We used the widely known Sobel operator for edge detection and applied a non-parametric framework for dominant point detection [24]. In Fig. 3, a sample of dominant points (N1 and N2 and their projections P1 and P2 on the X axis) for the edge-detected image of Fig. 2 is presented. The Sobel operator calculates the approximate value of the brightness gradient of the image in vector form and produces the image edges. The dominant point detection method [24] converts that vector form into a digital curve containing indexed essential dominant points while minimizing the integral square error; this speeds up further realtime computations. The algorithm processes the input video stream in 1080p format using the H.264 codec and RGB channels, with an image resolution of 1920 × 1080 pixels at 60 frames per second. If the algorithm finds a gap in the set of dominant points, the system decides on the presence of an obstacle; otherwise, the system goes to the next stage (a sketch of this first stage is given below). The second stage finds obstacles lying on the rail track surface. To this end, a CNN has been developed and trained. We used a CNN (InceptionV3) that has a maximum input image size of 299 × 299 × 3 pixels in RGB format; therefore, for the effective operation of the network, dynamic image resizing from 1920 × 1080 pixels to 299 × 299 pixels is used.
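A minimal OpenCV sketch of this first stage (a Sobel edge map followed by a gap check on the detected rail points); the thresholds and the simplified gap test are illustrative assumptions, not the authors' implementation:

```python
import cv2
import numpy as np

def rail_edge_points(frame_bgr):
    """Sobel gradient magnitude of a frame, thresholded to candidate rail edges."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    mag = cv2.magnitude(gx, gy)
    ys, xs = np.nonzero(mag > 120.0)          # illustrative threshold
    return np.column_stack([xs, ys])

def has_gap(points, max_jump=25):
    """Declare a gap if consecutive points (sorted along the image Y axis) are
    farther apart than max_jump pixels - a simplified stand-in for the
    dominant-point continuity test of [24]."""
    if len(points) < 2:
        return True
    pts = points[np.argsort(points[:, 1])]
    jumps = np.linalg.norm(np.diff(pts.astype(np.float32), axis=0), axis=1)
    return bool(np.any(jumps > max_jump))

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # stand-in 1080p video frame
print("gap detected:", has_gap(rail_edge_points(frame)))
```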


Fig. 3. A sample of dominant points for an image in Fig. 2

The development consists of adding hidden neuron layers defining special features of the rail track intersection, by adding convolutional layers based on a vertical Sobel filter. Structural fully connected layers were also added, and, to increase the quality of recognition and reduce the effect of overfitting, dropout layers were applied. Thus, based on the aforementioned convolution operations, the initial image of 299 × 299 × 3 pixels is converted into a layer of 768 feature maps of 7 × 7 pixels, which then narrows down to the binary output, i.e. the presence or absence of obstacles on the rail track. Next, an algorithm for training the CNN with forward propagation was implemented. It is designed to update and timely correct the internal synapse weights between neurons. The prediction process takes input data and, using the neurons, calculates the output data so as to be as close as possible to the desired result. Training takes place directly by checking whether the output matches the expected label, updating the weights so as to fit the current model to the required form. The parameters of the developed CNN are shown in Table 1. The next step is to evaluate the convolutional operations of the model, for which the operation of outputting intermediate convolutional data is used. For the convolutions, the operation described above in the theoretical section is used. The result is a map of the edges compiled in different versions, depending on the filter applied to the image: the comparison of pixels searches in one case for horizontal edges and, in the case of a transposed filter, for vertical edges. The resulting edge maps of the images are easy to analyze and represent in any case, because they represent a mathematical gradient and are a combination of the recognition of horizontal and vertical edges. However, for some filters it turns out that the gradient is the same (the edges of the image are not recognized, and therefore such areas are highlighted on the convolution in the same monotonous color). Thus, the map of the edges gives regions where the gradient vector has approximately the same direction. However, to recognize most of the features used, it is necessary to use a sufficiently large number of convolutions, as well as a sufficiently large number of layers, to recognize as many features of the input image as possible.

Table 1. CNN parameters

Layer type             | Output            | Weights  | Connected to
conv2d_68 (Conv2D)     | (None, 7, 7, 192) | 258048   | activation_67[0][0]
conv2d_69 (Conv2D)     | (None, 7, 7, 192) | 147456   | average_pooling2d_6[0][0]
batch_normalization_60 | (None, 7, 7, 192) | 576      | conv2d_60[0][0]
batch_normalization_63 | (None, 7, 7, 192) | 576      | conv2d_63[0][0]
batch_normalization_68 | (None, 7, 7, 192) | 576      | conv2d_68[0][0]
batch_normalization_69 | (None, 7, 7, 192) | 576      | conv2d_69[0][0]
activation_60          | (None, 7, 7, 192) | 0        | batch_normalization_60[0][0]
activation_63          | (None, 7, 7, 192) | 0        | batch_normalization_63[0][0]
activation_68          | (None, 7, 7, 192) | 0        | batch_normalization_68[0][0]
activation_69          | (None, 7, 7, 192) | 0        | batch_normalization_69[0][0]
mixed7 (Concatenate)   | (None, 7, 7, 768) | 0        | activation_60[0][0], activation_63[0][0], activation_68[0][0], activation_69[0][0]
flatten (Flatten)      | (None, 37632)     | 0        | mixed7[0][0]
dense (Dense)          | (None, 1024)      | 38536192 | flatten[0][0]
dropout (Dropout)      | (None, 1024)      | 0        | dense[0][0]
dense_1 (Dense)        | (None, 1)         | 1025     | dropout[0][0]
Total: 47512481

However, to recognize most of the features of the input image, it is necessary to use a sufficiently large number of convolutions, as well as a sufficiently large number of layers. The convolution process is shown in Fig. 4, where the obstacle on the rail track is marked in dark blue.

To prepare the training data, the three-dimensional image arrays are flattened into one-dimensional arrays, since each pixel is treated as a separate input feature. Pre-processed input images of size 299 × 299 are thus converted into arrays containing 89401 elements. The pixel values are grayscale intensities in the range from 0 to 255. For effective training of neural networks, it is almost always recommended to scale the input values; the pixel values are therefore normalized to the range from 0 to 1 by dividing each value by the maximum value of 255.
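A minimal sketch of this flattening and normalization step (the stand-in image is illustrative):

```python
import numpy as np

# Stand-in for one pre-processed grayscale frame, already resized to 299 x 299.
img = np.random.randint(0, 256, (299, 299), dtype=np.uint8)
features = img.reshape(-1).astype(np.float32) / 255.0  # 89401 values in [0, 1]
```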


Fig. 4. Convolution process of images containing an obstacle (dark blue) on the rail track (Color figure online)

4 Example and Discussion

We used TensorFlow to implement the model of the developed CNN. The final model consists of 242 hidden layers, and the total number of trainable weights is 47512481 (see Table 1). The hidden layers use the semi-linear ReLU activation function, and the output layer uses the sigmoid activation function to convert the output values into categorical values, which allows choosing one of two options as the output of the model. As the loss function, cross-entropy is used rather than the quadratic error; in our case it reduces to binary cross-entropy, since there are two output options. In probabilistic terms, this function is designed to maximize the model's confidence in the correct class, and it does not care about the probability distribution of the sample over the other classes. The model is trained for 20 epochs, and each time the weights are updated, 1800 input images are used.
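Under these choices, the training setup could be configured roughly as follows in Keras, continuing the sketch above; the optimizer and the data variable names are illustrative assumptions, as the text does not state them:

```python
model.compile(optimizer="adam",              # optimizer is an assumption
              loss="binary_crossentropy",    # two output options
              metrics=["accuracy"])
history = model.fit(train_x, train_y,
                    epochs=20,                       # 20 training epochs
                    validation_data=(val_x, val_y),  # test data used for validation
                    verbose=2)                       # one log line per epoch
```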


Such a large amount of input data in one sample is selected based on the specifics of the data: since most of the data come from the video stream of the mounted cameras, consecutive images remain quite similar for a certain amount of time, and their use is advisable in this system. The test data, used as a validation dataset, show the recognition quality of the model as it is trained. The value verbose = 2 is used to reduce the output to one line per training epoch. The CNN training was performed on a computer with the following configuration: CPU Intel(R) Core(TM) i7-3770K @ 3.50 GHz; GPU Nvidia GeForce GTX 1070; RAM 24 GiB; SSD 512 GiB; OS Microsoft Windows 10 Pro. Training takes approximately 9 h for 20 epochs. The system achieves an accuracy of approximately 90.72% (an error of 9.28%) on the test dataset. A peculiarity of training this CNN is that the test-data accuracy curve intersects and even exceeds the training-data accuracy curve; this is due to the input data, most of the errors being recorded on some sections of the rail track while other sections contain no erroneous data. General obstacle detection results of the developed system are shown in Table 2.

Table 2. Results of obstacle detection

| Correctly recognized, % | False-positive recognized, % | False-negative recognized, % |
| --- | --- | --- |
| 84 | 6 | 10 |

Since the main task is to assist the yard locomotive driver, and since the program additionally verifies neighboring model outputs to eliminate false positives, this accuracy is sufficient for the CNN to work effectively.
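The paper does not detail how the neighboring results are verified; one plausible realization is a majority vote over a sliding window of per-frame predictions, sketched below with a hypothetical window size k:

```python
from collections import deque

def smoothed_alarm(frame_predictions, k=5):
    """Raise an alarm only when a majority of the last k frame-level CNN
    outputs (1 = obstacle) agree, suppressing isolated false positives."""
    window = deque(maxlen=k)
    for p in frame_predictions:
        window.append(p)
        yield sum(window) > len(window) // 2
```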

5 Conclusion

The paper proposes a system for detecting obstacles on the railway, based on technical means for image capture and filtering. A mechanical design, as well as algorithms and CNN-based obstacle recognition software, has been developed. Further development of the obstacle detection system involves optimizing both the structure of the neural network and the system for converting the input video stream into images. Deploying the trained network is straightforward and can be completed in a short time. The results allow us to conclude that the future use of obstacle detection systems for the additional control of the movement of yard locomotives is promising.

Acknowledgment. The reported study was funded by the Russian Foundation for Basic Research, according to the research projects No. 19-07-00329-a, 19-01-00246-a, 18-08-00549-a, 18-01-00402-a.


References

1. Butakova, M.A., Chernov, A.V., Shevchuk, P.S.: An approach for distributed reasoning on security incidents in critical information infrastructure with intelligent awareness systems. In: Silhavy, R., Silhavy, P., Prokopova, Z. (eds.) CoMeSySo 2019. AISC, vol. 1046, pp. 248–255. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30329-7_23
2. Chernov, A.V., Savvas, I.K., Butakova, M.A.: Detection of point anomalies in railway intelligent control system using fast clustering techniques. In: Abraham, A., Kovalev, S., Tarassov, V., Snasel, V., Sukhanov, A. (eds.) IITI'18. AISC, vol. 875, pp. 267–276. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-01821-4_28
3. Butakova, M.A., Chernov, A.V., Guda, A.N., Vereskun, V.D., Kartashov, O.O.: Knowledge representation method for intelligent situation awareness system design. In: Abraham, A., Kovalev, S., Tarassov, V., Snasel, V., Sukhanov, A. (eds.) IITI'18. AISC, vol. 875, pp. 225–235. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-01821-4_24
4. Chernov, A.V., Butakova, M.A., Vereskun, V.D., Kartashov, O.O.: Mobile smart objects for incidents analysis in railway intelligent control system. In: Abraham, A., Kovalev, S., Tarassov, V., Snasel, V., Vasileva, M., Sukhanov, A. (eds.) IITI 2017. AISC, vol. 680, pp. 128–137. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-68324-9_14
5. Xu, H., Lin, J., Yu, W.: Smart transportation systems: architecture, enabling technologies, and open issues. In: Sun, Y., Song, H. (eds.) Secure and Trustworthy Transportation Cyber-Physical Systems. SCS, pp. 23–49. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-3892-1_2
6. Yoganandhan, A., Subhash, S.D., Hebinson Jothi, J., Mohanavel, V.: Fundamentals and development of self-driving cars. In: Materials Today: Proceedings (2020). https://doi.org/10.1016/j.matpr.2020.04.736
7. Hancock, P.A., Nourbakhsh, I., Stewart, J.: On the future of transportation in an era of automated and autonomous vehicles. Proc. Natl. Acad. Sci. 116(16), 7684–7691 (2019)
8. Kovacevic, M.S., Gavin, K., Stipanovic Oslakovic, I., Bacic, M.: A new methodology for assessment of railway infrastructure condition. Transp. Res. Procedia 14, 1930–1939 (2016)
9. Fantechi, A.: Connected or autonomous trains? In: Collart-Dutilleul, S., Lecomte, T., Romanovsky, A. (eds.) RSSRail 2019. LNCS, vol. 11495, pp. 3–19. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-18744-6_1
10. Banic, M., Miltenovic, A., Pavlovic, M., Ciric, I.: Intelligent machine vision based railway infrastructure inspection and monitoring using UAV. Facta Univ. Ser. Mech. Eng. 17(3), 357–364 (2019)
11. Computer Vision: Concepts, Methodologies, Tools, and Applications by Information Resources Management Association. IGI Global, Hershey (2018). 2451 p.
12. Davis, E.R.: Computer Vision. Principles, Algorithms, Applications, Learning. Academic Press, London (2018). 859 p.
13. Qi, Z., Tian, Y., Shi, Y.: Efficient railway tracks detection and turnouts recognition method using HOG features. Neural Comput. Appl. 23(1), 245–254 (2013)
14. Yang, B., Fang, L.: Automated extraction of 3-D railway tracks from mobile laser scanning point clouds. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 7(12), 4750–4761 (2014)
15. Wang, R., Chen, Y., Wang, H.: Research on environment perception and obstacle detection for unmanned vehicle based on machine vision. In: Qin, Y., Jia, L., Liu, B., Liu, Z., Diao, L., An, M. (eds.) EITRT 2019. LNEE, vol. 639, pp. 565–575. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-2866-8_54
16. Mroué, A., Heddebaut, M., Elbahhar, F., Rivenq, A., Rouvaen, J.-M.: Automatic radar target recognition of objects falling on railway tracks. Meas. Sci. Technol. 23(2), 025401 (2012)


17. Amaral, V., Marques, F., Lourenço, A., Barata, J., Santana, P.: Laser-based obstacle detection at railway level crossings. J. Sens. (2016). https://doi.org/10.1155/2016/1719230
18. Silar, Z., Dobrovolny, M.: The obstacle detection on the railway crossing based on optical flow and clustering. In: 36th International Conference on Telecommunications and Signal Processing, TSP, pp. 755–759 (2013)
19. Kano, G., Andrade, T., Moutinho, A.: Automatic detection of obstacles in railway tracks using monocular camera. In: Tzovaras, D., Giakoumis, D., Vincze, M., Argyros, A. (eds.) ICVS 2019. LNCS, vol. 11754, pp. 284–294. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34995-0_26
20. Mukojima, H., et al.: Moving camera background-subtraction for obstacle detection on railway tracks. In: Proceedings of the International Conference on Image Processing, ICIP, pp. 3967–3971 (2016)
21. Karagiannis, G., Olsen, S., Pedersen, K.: Deep learning for detection of railway signs and signals. In: Arai, K., Kapoor, S. (eds.) CVC 2019. AISC, vol. 943, pp. 1–15. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-17795-9_1
22. James, A., et al.: TrackNet – a deep learning based fault detection for railway track inspection. In: International Conference on Intelligent Rail Transportation, ICIRT 2018 (2019)
23. Xia, T., et al.: DeepRailway: a deep learning system for forecasting railway traffic. In: Proceedings of the IEEE 1st Conference on Multimedia Information Processing and Retrieval, MIPR 2018, pp. 51–56 (2019)
24. Prasad, D.K., Quek, Ch., Leung, M.K.H., Cho, S.Y.: A novel framework for making dominant point detection methods non-parametric. Image Vis. Comput. 30(11), 843–859 (2012)

Artificial Intelligence for Obstacle Detection in Railways: Project SMART and Beyond

Danijela Ristić-Durrant1(B), Muhammad Abdul Haseeb2, Marten Franke1, Milan Banić3, Miloš Simonović3, and Dušan Stamenković3

1 Institute of Automation, University of Bremen, Otto-Hahn-Allee 1, 28359 Bremen, Germany
{ristic,franke}@iat.uni-bremen.de
2 IKOS Consulting Deutschland GmbH, Kemperpl. 1, 10785 Berlin, Germany
[email protected]
3 Faculty of Mechanical Engineering, University of Niš, Aleksandra Medvedeva 14, 18000 Niš, Serbia
{milan.banic,milos.simonovic,dusan.stamenkovic}@masfak.ni.ac.rs

Abstract. In this paper, an AI-based system for detection and distance estimation of obstacles on rail tracks ahead of a moving train is presented, as developed within the H2020 Shift2Rail project SMART. The system software includes a novel machine learning-based method that is applicable to long-range obstacle detection, the distinguishing challenge of railway applications. The development of this method used a novel long-range railway dataset, which was generated during the project lifetime, as described in the paper. Evaluation results of reliable obstacle detection using the SMART on-board cameras are presented. The paper also discusses the possible use of the SMART software for obstacle detection in images taken by a drone camera, for a future extension of the SMART on-board system to a holistic system for obstacle detection in railways, as planned for SMART2, the follow-up project to SMART.

Keywords: Vision-based obstacle detection · Machine learning · Dataset generation for obstacle detection in railways

1 Introduction

Obstacle detection is crucial for the safety of a wide range of applications involving moving elements, ranging from robotic manipulators to manned and unmanned vehicles for land, sea, air, and space. As a result of developments in sensor technology and in Artificial Intelligence (AI), recent years have seen a rapid expansion in research and development of obstacle detection for road transport. Although railways are the other principal means of transport over land, research and development of obstacle detection in railways has to date lagged behind that for road transport.

Rail is statistically by far the safest mode of land transport in Europe. Nevertheless, there is still scope for improvement of rail safety, as each year there is a large number of collisions between trains and objects located on or close to rail tracks [1].


Among the various obstacles that can obstruct railway traffic are objects such as road vehicles, large fallen objects such as stones and trees, humans, and animals (e.g. moose, deer, cows). Collisions with such objects adversely affect train passenger safety and in most cases kill or severely injure any living being collided with. Such collisions can furthermore cause infrastructure breakdown, lead to costly repairs of trains and rails, and cause delays in rail traffic. Autonomous obstacle detection therefore has significant potential to reduce the number of these collisions.

In contrast to road vehicles, which can change direction to avoid an obstacle, the only way for a train to avoid a collision is to come to a complete stop before making contact with the obstacle. This is only possible if the detection distance exceeds the stopping distance of the vehicle. A freight train travelling at 80 km/h has a stopping distance of about 700 m, so it is often far too late for the locomotive driver to brake, bring the train to a stop and avoid a collision when an unexpected object is detected on the rail tracks in front of the train [2]. Nevertheless, even if a train needs 1000 m to reach a full stop, the capability to detect certain obstacles at shorter distances can be valuable for prompting the locomotive driver to decrease the train speed and so reduce the severity of the collision.

One of the main objectives of the Shift2Rail project SMART (SMart Automation of Rail Transport) [3] was to develop an on-board sensor-based obstacle detection system able to detect an obstacle ahead of the locomotive at distances of up to 200 m (mid-range) and up to 1000 m (long-range). Different camera types (zoomed RGB cameras, thermal and night vision cameras) were integrated into the SMART Obstacle Detection (OD) system and were evaluated within the SMART project, in order to investigate the possibilities of the individual sensors and to find a good practical combination of these camera types that takes advantage of the benefits of each type. The chosen vision sensors were supported by a novel machine learning-based method for object detection and distance estimation from a single camera. In this paper, the obstacle detection and distance estimation results with the SMART RGB cameras are presented.

While the problem of autonomous obstacle detection for railways may at first glance look similar to the much-researched obstacle detection problem for road transport, a closer look reveals that the rail obstacle detection problem is somewhat different and remains unaddressed. Relevant datasets for the rail obstacle detection problem are missing as well. This paper therefore makes two contributions: a novel system for the problem of obstacle detection on railways is presented, and the generation of a novel dataset for obstacle detection and distance estimation in railways, including images of distant objects, is described.

The paper is organized as follows. In Sect. 2, an overview of related work focusing on datasets used for machine learning-based methods for autonomous driving is given. The SMART AI-based on-board Obstacle Detection (OD) system, including a description of the long-range dataset generation, is presented in Sect. 3. In Sect. 3.2, the evaluation results are shown. The possible use of the SMART on-board OD software for obstacle detection in images taken by a drone camera is considered in Sect. 4. The main conclusions are given in Sect. 5.


2 Related Work

In the last decade, there has been significant progress in the research and development of machine learning-based methods for object detection for intelligent transportation, mainly in the automotive field [4]. Intelligent vehicle research is critically dependent on vast quantities of real-world data for the development, testing and validation of algorithms before deployment on public roads. Following the benchmark-driven approach of the computer vision community, a number of vision-based autonomous driving datasets have been released, including [5, 6]. Most of the existing datasets, however, contain no information on potential obstacles and their location/distance with respect to the vehicle. Due to the lack of this information, most existing datasets cannot be used for distance estimation methods; distance, however, beside recognition of the objects on the road, is crucial information for drivers/autonomous cars to avoid collisions and to adjust speed for safe driving.

The KITTI dataset [7] addresses these issues with object annotations in both 2D and 3D. The dataset includes high-resolution grayscale and color stereo cameras, a LiDAR and fused GNSS/IMU sensor data. The accurate ground truth in the benchmarks is provided by the Velodyne laser scanner and the GPS localization system. Each object in the dataset is described by a number of parameters, including the class of the object (8 object types are available), the 2D bounding box of the object, the dimension and location of the 3D object, as well as its orientation in the camera coordinate system. The Z coordinate of the object location represents the object distance to the camera. The distance range in the KITTI dataset is from 0 to roughly 120 m, which makes this dataset appropriate for applications where looking some 200 m ahead [8] is sufficient. However, the KITTI dataset is not sufficient for railway applications, where long-range obstacle detection is crucial.

Within intelligent transport research, the rail domain has received few contributions. As a consequence, relevant datasets are missing. To the best of the authors' knowledge, RailSem19 (rail semantics dataset 2019) is the first public dataset for semantic scene understanding for trains and trams [9]. RailSem19 offers 8500 unique images taken from the ego-perspective of a rail vehicle (trains and trams). Extensive semantic annotations are provided, both geometry-based (rail-relevant polygons, all rails as polylines) and dense label maps with many Cityscapes-compatible road labels. Many frames show areas of intersection between road and rail vehicles (railway crossings, trams driving on city streets). RailSem19 is thus useful for rail and road applications alike. However, the dataset focuses on railway infrastructure and, though some objects on the rail tracks such as humans and vehicles are annotated, they are not labeled as obstacles and they do not have an assigned distance. The first dataset that contains railway obstacles and related distance annotations, including long distances, was developed within the Shift2Rail project SMART. This dataset was used for the development of the novel SMART machine learning-based obstacle detection and distance estimation method, as explained in the following.

3 SMART AI-Based Obstacle Detection

A novel method was developed in the SMART project to support the vision hardware in directly estimating object distances. It consists of two parts: the first is deep learning-based object detection and the second is neural network-based distance estimation (Fig. 1).


The details of the development of both parts are given in [10]; in the following, an overview of the complete system is given.

Fig. 1. DisNet-based object distance estimation system.

The object detector can be any bounding box-based deep learning method that extracts the bounding box (BB) of a detected object in the input image. In the SMART project, one of the state-of-the-art Convolutional Neural Network (CNN)-based models for object BB prediction, YOLOv3 [11], was used, which is suitable for real-world applications. The distance estimator developed in the SMART project is a feedforward neural network named DisNet with 3 hidden layers, each containing 100 hidden units. DisNet estimates the distance between each detected object and the on-board camera based on the object BB features.

The YOLO network was originally trained on the COCO dataset [12], which consists of images of complex everyday scenes containing common objects in their natural context. Although the COCO dataset contains some images of railway scenes, such as about 3500 images annotated with the class "train", it does not contain images of explicit scenes of objects on rail tracks and, moreover, does not contain images of distant objects. In order to enable the YOLO network to detect objects in railway scenes, with a particular focus on distant objects, a Transfer Learning method was used to quickly re-train the YOLO network on SMART data without the need to re-train the entire network. Given an input image, the objects' bounding boxes can be directly extracted from the outputs of the re-trained model.

Once objects have been isolated by YOLO, information pertaining to these objects and their bounding boxes is extracted as features for subsequent object distance estimation by DisNet. The following features of an extracted object's BB are calculated:

Height: $B_h$ = height of the object BB in pixels / image height in pixels
Width: $B_w$ = width of the object BB in pixels / image width in pixels
Diagonal: $B_d$ = diagonal of the object BB in pixels / image diagonal in pixels

For each extracted object bounding box, a six-dimensional feature vector v is calculated:

$$v = \left[ \frac{1}{B_h}, \frac{1}{B_w}, \frac{1}{B_d}, C_h, C_w, C_d \right] \qquad (1)$$

The ratios of the object bounding box dimensions to the image dimensions, $B_h$, $B_w$ and $B_d$, enable the reusability of the trained DisNet model with a variety of cameras, independent of image resolution.
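A minimal sketch of the feature computation of Eq. (1); the helper name, the example bounding box and the restriction to two classes are illustrative:

```python
import numpy as np

# Average object dimensions (height, width, depth) in cm, as given in the text.
CLASS_PRIORS = {"person": (175.0, 55.0, 30.0), "car": (160.0, 400.0, 180.0)}

def disnet_features(box, image_size, class_name):
    """box = (x_ul, y_ul, x_br, y_br) in pixels; image_size = (width, height)."""
    w_img, h_img = image_size
    bw = (box[2] - box[0]) / w_img                 # relative BB width
    bh = (box[3] - box[1]) / h_img                 # relative BB height
    bd = np.hypot(box[2] - box[0], box[3] - box[1]) / np.hypot(w_img, h_img)
    c_h, c_w, c_d = CLASS_PRIORS[class_name]
    return np.array([1 / bh, 1 / bw, 1 / bd, c_h, c_w, c_d])  # Eq. (1)

v = disnet_features((900, 400, 960, 580), (1920, 1080), "person")  # example BB
```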


Parameters $C_h$, $C_w$ and $C_d$ in (1) are the values of the average height, width and depth of an object of the particular class. For example, for the class "person" $C_h$, $C_w$ and $C_d$ are respectively 175 cm, 55 cm and 30 cm, and for the class "car" 160 cm, 400 cm and 180 cm. The features $C_h$, $C_w$ and $C_d$ are assigned to the objects labelled by the YOLO classifier as belonging to the particular class, in order to complement the 2D bounding box information and so give more information to distinguish different objects. The above feature vector is the input to DisNet. After the training of DisNet, given an object's bounding box feature vector, the object-specific distance can be directly extracted from the outputs of the trained model. In order to re-train the YOLO network and to train DisNet, a dataset was created by manually extracting bounding boxes of objects in images of rail scenes, recorded in field tests with the SMART cameras, and by assigning ground truth distances to the labelled objects, as explained in the following.

3.1 SMART Dataset Generation

In order to collect relevant image data to support the development of the SMART software for obstacle detection and distance estimation in railways, dynamic field tests were performed on Serbian railways test sites, upon obtaining the relevant permission from the Serbian railway infrastructure manager. For this purpose, the sensors were integrated into a sensor housing mounted on the front profile of the locomotive (Fig. 2(a)). The sensor housing was vibration-isolated to prevent the transmission of vibrations from the locomotive to the cameras, as moving-vehicle vibration can severely deteriorate the quality of acquired images. The vibration isolation system was designed with rubber-metal springs, as described in [13].


Fig. 2. SMART field tests for dataset generation (a) Vision sensors for obstacle detection integrated into sensors’ housing mounted on the frontal profile of a locomotive below the headlights. (b) Train route on the Serbian part of the pan European corridor X towards Thessaloniki during the SMART dynamic field tests.

For all dynamic field tests, the SMART OD system was mounted onto the train locomotive in the locomotive depot workshop at the Niš junction in Serbia.


The locomotives then ran to the "Red cross" station, which was the starting point for all dynamic run tests. The following dynamic field tests were performed:

• "Red cross" station – Niš Marshalling Yard – Ristovac on 16.07.2018. The test was performed with an in-service train of the operator Serbia Cargo, Locomotive 444-018, pulling 21 wagons with a total mass of 1194 t and a total train length of 458 m. The wagons were attached to the locomotive in Niš Marshalling Yard. The test length was 120 km on the Serbian part of the pan-European corridor X (Fig. 2(b)), the average speed was 34 km/h and the run over the whole length lasted 3.5 h. On the straight rail-track sections, between Niš Marshalling Yard and the station Grdelica, the maximal speed was 80 km/h. In the Grdelica gorge, the speed was limited to 30 km/h due to the highway construction works that were being performed in the gorge at the time of the tests. Upon leaving the gorge, the maximal train speed was again 80 km/h. SMART team members mimicked objects (obstacles) at two crossings along the route according to previously adopted test protocols. During the rest of the test, as the train was in real traffic, accidental objects were detected along the route. These objects represented possible obstacles that could cause an accident, for example a truck crossing the unsecured crossing at the station "Momin Kamen" while the train was approaching (Fig. 3, first column middle).
• "Red cross" station – Leskovac – Ristovac on 08.05.2019. The test was performed with an in-service train (Locomotive 444-003) of the operator Serbia Cargo pulling 16 wagons with a total mass of 998 t and a total train length of 224 m, on the same route as the previous dynamic test performed in July 2018. As the highway construction works had finished, the maximal speed on the whole section was 80 km/h. The wagons were attached to the locomotive at the station Leskovac. The dynamic test run ended at dusk, with the train arriving at Ristovac station at 19:35, which allowed dynamic testing in different lighting conditions (Fig. 3, first column bottom). As in the July 2018 tests, SMART team members mimicked objects (obstacles) at several crossings along the route according to previously adopted test protocols. During the rest of the test, as the train was in real traffic, accidental objects were detected along the route.

All dynamic tests were finished successfully, according to the test plans. The mounting and dismounting of the OD system demonstrator was performed in under 30 min in all dynamic tests, and there were no disruptions to Serbia Cargo's operations.

The SMART dataset comprises approximately 8 h of raw sensor data from the train-mounted SMART OD system in the dynamic field tests. The series of recorded videos was converted into sequential image frames. In the dataset images, both static and moving obstacles are present, including humans, vehicles, bicycles and animals (some examples are shown in Fig. 3). As given in Table 1, each object in the dataset is described by a number of parameters, including the class of the object, the 2D bounding box (BB) information and the ground truth distance to the cameras. $(x_{ul}, y_{ul})$ and $(x_{br}, y_{br})$ are respectively the image coordinates of the upper-left corner and the bottom-right corner of the object bounding box. The ground truth distances were calculated off-line, using the recorded GPS coordinates of the running train and Google Maps GPS coordinates of the obstacle locations (e.g. at crossings and at known locations near railway infrastructure).


The obstacle distance range covered by the recordings in the dynamic field tests was determined by the real-world operational environment. It is important to note that, because of the rail-track configuration, during the dynamic field tests there were no segments longer than 600 m in view of the on-board sensors in which obstacles could be recorded, so the SMART real-world railway dataset covers distances up to 600 m.

Table 1. Dataset structure

| Image frame no. | Object class | (xul, yul) | (xbr, ybr) | BB width (pixels) | BB height (pixels) | BB diagonal (pixels) | Ground truth distance (meters) |
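The paper does not detail the off-line distance computation; assuming a great-circle distance between the train GPS fix and the obstacle's map coordinates, it could be sketched as follows (the coordinates in the usage example are illustrative):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS fixes."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# e.g. distance between a train GPS fix and a crossing annotated on the map
d = haversine_m(43.3209, 21.8958, 43.3254, 21.8961)
```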

Fig. 3. Examples from SMART railway dataset. Different object classes on the/near the rail tracks (humans, different vehicles, animals).

3.2 Evaluation

Some results of DisNet object distance estimation in RGB images recorded in operational field tests, which were not used for training the modules of the SMART AI-based OD system, are given in Fig. 4. The estimated distances to the objects detected in the images are given in Tables 2 and 3. The results illustrate the capability of the SMART on-board OD system to reliably recognize objects belonging to different object classes located on or near the rail tracks.


Fig. 4. DisNet estimation of distances to objects in a rail track scene from the RGB camera image.

Table 2. Estimated distances vs. ground truth (Fig. 4(a)).

| Object | Ground truth | DisNet | Error |
| --- | --- | --- | --- |
| Person 1 | 266,69 m (station middle point) | 231,42 m | 13,22% |
| Person 2 | 266,69 m (station middle point) | 281,44 m | 5,53% |
| Person 3 | 266,69 m (station middle point) | 280,87 m | 5,31% |
| Car | 597,87 m | 593,71 m | 0,69% |

The evaluation results demonstrate the accuracy of the SMART OD system. Larger errors, such as the one for Person 1 in Fig. 4(a), are mainly caused by imprecision of the object bounding box extraction. Concretely, in the case of Person 1, the object detector extracted a bounding box larger than the person region in the image, so DisNet estimated an object distance smaller than the real one.

Table 3. Estimated distances vs. ground truth (Fig. 4(b)).

| Object | Ground truth | DisNet | Error |
| --- | --- | --- | --- |
| Person | 48,21 m | 50,41 m | 4,56% |
| Car | 120,44 m | 128,59 m | 6,76% |
| Horse | 48,21 m | 43,39 m | 9,99% |

This problem will be addressed by the authors in future work by refining the extracted bounding boxes using traditional computer vision techniques, or by replacing the bounding box-based deep learning object detector with an instance segmentation deep learning-based object detector. In the case of the horse detection in Fig. 4(b), the larger error in distance estimation is a consequence of the small number of annotated images with a labelled "horse" object in the SMART dataset. To cope with this problem, the authors will in future work extend the SMART railway dataset with a larger number of images per object class over a wider variety of object classes. In particular, objects specific to obstacle detection in railways, such as fallen trees, will be included in the dataset extension.

4 Beyond the SMART Project

One specific challenge of the railway application is that even a perfect on-board long-range OD system, which gives no false positives and no false negatives, is not sufficient, as it cannot "see" potential obstacles around curves. As one possible solution to this problem, the authors suggested including drones to monitor the rail track sections at critical locations such as curves, complementing the on-board OD system with drone-based detection of objects (potential obstacles) on or near the rail tracks. Such an extension of the on-board OD system to a holistic OD system for railways is one of the objectives of the recently started Shift2Rail project SMART2 [14], the follow-up project to SMART.

As preliminary work for the SMART2 research and development of a drone-based OD system, the authors investigated the possibility of using the SMART AI-based on-board OD software for the detection of objects in rail track scenes viewed from a drone. Some preliminary results are shown in Fig. 5. As can be seen from Fig. 5(a), as long as the drone camera's view of the scene is close to the on-board (frontal) camera's view, the SMART on-board OD system reliably detects the objects (the probability of classifying objects is close to 1) and estimates the distances from the detected objects to the drone camera. However, with larger deviation of the drone camera's view from the on-board camera's view, the on-board SMART OD system fails to detect the objects. As can be seen in Fig. 5(b), representing a vertical bird's-eye (drone) view image of the same scene as in Fig. 5(a), only a person has been detected, though with a low classification probability, while the cars were not detected. These preliminary results indicate the necessity to either extend the SMART on-board dataset with drone camera images or to develop a separate drone-based OD system using a drone-specific dataset.

In order to enable the design of new algorithms for reliable drone-based obstacle detection, access to appropriate data is needed.


Fig. 5. Object detection and distance estimation in drone images of the same railway scene using SMART on-board OD software.

To the best of the authors' knowledge, the Stanford Campus Dataset [15] is the very first large-scale dataset that collects images and videos of various types of objects (e.g. pedestrians, bicyclists, skateboarders, cars, buses, and golf carts) that navigate in a real-world outdoor environment such as a university campus. However, the Stanford dataset contains only so-called vertical bird-view images (images taken by a drone camera with the optical axis perpendicular to the ground). A drone dataset that gives more variety in drone view angles is the VisDrone dataset [16], which also gives more variety in background scenes (e.g. road scenes in urban areas, coasts and nature scenes). However, the VisDrone dataset does not contain images of railway scenes that would be useful for the development of drone-based detection of obstacles on rail tracks. Even though drones are already being used in railways for tasks such as the examination of infrastructure elements [17], to the best of the authors' knowledge there are no drone datasets specific to railway applications. Therefore, there is a need to generate such a dataset, which will be one of the objectives of the authors' future work within the SMART2 project.


Generation of such a dataset will certainly involve particular constraints, as specific permission from railway infrastructure managers will be needed. Building on the positive experience and knowledge gained in the SMART project in performing real-world field tests for data recording, the authors are confident that positive results will also be achieved in future real-world field tests for drone vision data recording.

5 Conclusion

In this paper, an AI-based system for on-board obstacle detection (OD) and distance estimation in railway applications is presented. The system was developed and evaluated in the Shift2Rail project SMART. The object detector part is based on a state-of-the-art CNN-based object detector, indicating that OD systems for railways can benefit from the vast amount of existing datasets used in the computer vision community for object detection. However, this paper demonstrates that OD railway applications can benefit from existing datasets only up to a certain level. In order to cope with the challenges of OD in railways, such as long-range OD, it is necessary to re-train existing networks with railway-specific data, such as the data generated in the SMART project and presented in this paper. The paper also discusses the possible use of the SMART on-board OD method for OD in images taken by a drone camera, for a future extension of the SMART on-board system to a holistic system for obstacle detection in railways, as planned for SMART2, the follow-up project to SMART. It is stressed that, in order to cope with the challenges of drone-based obstacle detection, the object detector part of the on-board OD system has to be re-trained with drone images of rail scenes.

Acknowledgements. This research received funding from the Shift2Rail Joint Undertaking under the European Union's Horizon 2020 research and innovation programme under grant agreement No 730836. Special thanks to Serbian Railways Infrastructure and Serbia Cargo for support in the realization of the SMART on-board OD field tests and the preliminary drone-based field tests.

References

1. GoSAFE RAIL project. http://www.gosaferail.eu. Accessed 14 Apr 2020
2. Berg, A., Öfjäll, K., Ahlberg, J., Felsberg, M.: Detecting rails and obstacles using a train-mounted thermal camera. In: Paulsen, R.R., Pedersen, K.S. (eds.) SCIA 2015. LNCS, vol. 9127, pp. 492–503. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19665-7_42
3. Shift2Rail project SMART. http://www.smartrail-automation-project.net. Accessed 24 June 2020
4. Dairi, A., Harrou, F., Senouci, M., Sun, Y.: Unsupervised obstacle detection in driving environments using deep-learning-based stereovision. Robot. Auton. Syst. 100, 287–301 (2018)
5. Maddern, W., Pascoe, G., Linegar, C., Newman, P.: 1 year, 1000 km: the Oxford RobotCar dataset. Int. J. Robot. Res. 36, 3–15 (2017)
6. Janai, J., Güney, F., Behl, A., Geiger, A.: Computer vision for autonomous vehicles: problems, datasets and state-of-the-art. arXiv:1704.05519v1 [cs.CV] (2017)


7. Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. (IJRR) 32, 1231–1237 (2013)
8. Pinggera, P., Franke, U., Mester, R.: High-performance long range obstacle detection using stereo vision. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (2015)
9. Zendel, O., Murschitz, M., Zeilinger, M., Steininger, D., Abbasi, S., Beleznai, C.: RailSem19: a dataset for semantic rail scene understanding. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop (2019)
10. Haseeb, M.A., Guan, J., Ristić-Durrant, D., Gräser, A.: DisNet: a novel method for distance estimation from monocular camera. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems – IROS (2018)
11. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv:1804.02767v1 [cs.CV] (2018)
12. COCO dataset. https://arxiv.org/pdf/1405.0312.pdf. Accessed 19 Feb 2019
13. SMART project, Deliverable 2.2. http://www.smartrail-automation-project.net/images/Attachment_0_2.pdf. Accessed 29 Dec 2019
14. Shift2Rail project SMART2. https://smart2rail-project.net/. Accessed 20 June 2020
15. Robicquet, A., Sadeghian, A., Alahi, A., Savarese, S.: Learning social etiquette: human trajectory prediction in crowded scenes. In: European Conference on Computer Vision (ECCV) (2016)
16. Zhu, P., et al.: Vision meets drones: past, present and future. arXiv preprint arXiv:2001.06303 (2020)
17. Flammini, F., Pragliola, C., Smarra, G.: Railway infrastructure monitoring by drones. In: 2016 International Conference on Electrical Systems for Aircraft, Railway, Ship Propulsion and Road Vehicles & International Transportation Electrification Conference (ESARS-ITEC) (2016)

Anomaly Detection for Vision-Based Railway Inspection

Riccardo Gasparini1, Stefano Pini1, Guido Borghi1(B), Giuseppe Scaglione2, Simone Calderara1, Eugenio Fedeli2, and Rita Cucchiara1

1 AIRI - Artificial Intelligence Research and Innovation Center, Università di Modena e Reggio Emilia, Modena, Italy
{riccardo.gasparini,s.pini,guido.borghi,rita.cucchiara}@unimore.it
2 RFI - Rete Ferroviaria Italiana, Gruppo Ferrovie dello Stato, Florence, Italy
{g.scaglione,e.fedeli}@rfi.it

Abstract. The automatic inspection of railways for the detection of obstacles is a fundamental activity in order to guarantee the safety of train transport. Therefore, in this paper, we propose a vision-based framework that is able to detect obstacles during the night, when the train circulation is usually suspended, using RGB or thermal images. Acquisition cameras and external light sources are placed in the frontal part of a rail drone and a new dataset is collected. Experiments show the accuracy of the proposed approach and its suitability, in terms of computational load, to be implemented on a self-powered drone.

Keywords: Railway inspection · Anomaly detection · Computer vision · Deep learning · Self-powered drone

1 Introduction

A crucial element in guaranteeing the safety of rail transport is the visual inspection of railways, in order to ensure the absence of obstacles placed on the railroad track that could cause damage or even the derailment of trains. These inspection activities are generally conducted during nighttime, when the train circulation is usually suspended. In this context, due to the vastness of railroads, an automatic inspection system is strongly needed.

Therefore, in this paper, we propose a vision-based framework to tackle the obstacle detection task in videos acquired from a rail drone, i.e. a self-powered light-weight vehicle moving along railways and operated by remote control, which computes the analysis locally. To deal with the night time, the rail drone is equipped with thermal and RGB cameras, in addition to external light sources. The proposed framework is a combination of two sequential deep networks, an autoencoder and a binary classifier, as shown in Fig. 1. From the point of view of the computer vision research field, we interpret the detection of obstacles as an anomaly detection task, i.e. the ability to identify samples that exhibit significant differences with respect to a regularity.


Fig. 1. Overall view of the proposed framework (using thermal data). From the left, the acquired frame is cropped, then fed into the autoencoder. The reconstructed frame is then used to compute the absolute and the gradient difference images that are stacked and fed into the classifier network. This classifier outputs the presence or absence of anomalies in the frame.

This task is a key element in many real-world applications, such as video surveillance [18], defect detection [20], reinforcement learning [27] and medical imaging [31]. In these applications, the acquisition sensor is often assumed to be in a raised and fixed position, resulting in images and videos with a static background [33]. In particular, this condition holds in industrial video-based systems [14] and video surveillance ones [30]. Furthermore, many approaches are based on supervised learning [3,16], which often requires manual, time-consuming and expensive annotations, along with the assumption that all anomalies are known during the training process.

Differently from these works, in this paper we investigate the anomaly detection task using images taken from a moving camera. Indeed, the acquisition devices are placed in the frontal part of the rail drone, close to the railroad. Due to the lack of public railway datasets focused on the anomaly detection task, we collect more than 30k frames from a rail drone moving on the track during the night. The new dataset contains more than 50 recordings, with and without anomalies, acquired with multiple synchronized cameras, i.e. RGB and thermal cameras (used in this paper) in addition to stereo and depth sensors. As we focus on railroad safety, we consider anomalies consisting of many categories of objects which are usually employed in rail yards, such as track lifting jacks, pickaxes, rail signals and so on. Samples of anomalous objects are depicted in Fig. 2 in both the RGB and thermal domains.

2 Related Work

2.1 Anomaly Detection on Railways

At the time of writing, there are no works that address the task of anomaly detection through visual data in the railway scenario during nighttime. Only similar tasks have been addressed, such as track detection [12,17,36] and collision prediction [22,23]. Unfortunately, the datasets are often not publicly available.


To detect obstacles on railways, many literature works exploit infrared (IR) or ultrasonic range sensors, usually placed in the frontal part of the train. For instance, [26] proposed a system based on a range sensor to perform obstacle detection. Specifically, an infrared emitter is exploited and a light turns on when an object is detected within the (limited) working distance. A framework using GSM and GPS modules is proposed in [29]: similar to the previous work, an infrared emitter, in combination with the other modules, is exploited to detect obstacles in front of the train. A LiDAR is exploited in [24]: the sensor is coupled with a camera to detect obstacles on railway tracks. In [15], pairs of infrared sensors are placed on both sides of the railway: a lack of connection between the two devices, specifically an emitter and a receiver, reveals the presence of obstacles.

We note the scarcity of publicly released datasets in this research field. Only recently, a public dataset for semantic scene understanding, acquired from the point of view of a train and a tram, namely RailSem19 [34], has been introduced. RailSem19 contains specific annotations collected for a variety of tasks, including the classification of trains, switch plates, buffer stops and other elements related to the railway scenario, but not anomalies and obstacles. In general, existing works addressing anomaly detection on railways are often ad hoc systems, created for a specific scenario and employing specific infrared emitters; purely vision-based systems are lacking.

2.2 Anomaly Detection in Computer Vision

From a general point of view, literature works are categorized into two different approaches: reconstruction-based models and probabilistic methods. The former learn a parametric reconstruction of normal data through different methods, such as sparse-coding algorithms [11,35], deep encoder-decoder architectures [18] or GANs [13,30]. A similar approach is future frame prediction, in which anomalies are detected by comparing the differences between a predicted future frame and the current one [21]. The latter approximate a density function of motion features and normal appearance. In this case, optical flow and trajectory analysis, exploiting non-parametric [2] and parametric [5] estimators, are often used.

Highly dynamic scenarios, such as images taken from a moving rail drone for railway inspection, represent a tough challenge for these state-of-the-art methods based on fixed cameras. Only recently, an unsupervised approach has been proposed for traffic accident detection [33], in which the acquisition device is a dashboard camera. In [10], a dataset of crowd-sourced dashcam images is presented and a supervised method that detects anomalies, in terms of motorbike and car collisions, is proposed. Abati et al. [1] introduce an anomaly detection method capable of working in the automotive scenario [25]. However, the visual content is purposely discarded and only eye fixations are employed.

(a) Fuel Tank; (b) Lifting jack; (c) Balise; (d) LPG Tank; (e) Traffic light; (f) Insulating stick; (g) Rail signal; (h) Pickaxe

Fig. 2. Some examples of anomalies included in the acquired data. The upper row contains frames acquired with the RGB camera (and external illuminators), while the lower row reports the same classes collected through a thermal device.

3 Data Acquisition

As mentioned above, we record a new dataset to overcome the lack of public railway datasets. Data have been collected by placing multiple sensors in the frontal part of a rail drone, very close to the railroad. The acquisition activity was carried out during the night: to the best of our knowledge, this is the first dataset collected for the anomaly detection task in the nighttime railway scenario. Therefore, the acquisition system needs to comply with three main requirements, derived from the automotive context [7,8]:

– Fast Acquisition: since the cameras are placed on a rail drone, the frame rate and the shutter speed of the acquisition devices must be sufficiently high to avoid motion blur caused by the high speed of the drone (up to 100 km/h).
– Night Vision: the acquisition devices must deal with the night time. In this context, the adoption of external light sources and the use of thermal cameras is required. Since the acquisition system is placed on a self-powered rail drone, it is important to limit the power consumption of the light sources.


– High Resolution: in order to detect even small-sized anomalies at long distances, the sensors must have a high spatial resolution.

To conform to these requirements, the following cameras and light sources are employed:

– Basler acA800-510uc¹: an industrial camera with an extremely high frame rate (more than 500 fps) that is, however, limited to a low spatial resolution (800 × 500 pixels). We equipped this camera with a 12.5–75 mm zoom lens. With this camera, external light sources are needed.
– Light sources: we use two types of light source. The first is the LED Light Bar 470²: this headlamp is a compact lightweight bar with a low profile and a power consumption of only 35 W; it is useful to illuminate wide areas close to the drone. The second is the Comet 200 LED³: being a high-beam headlamp with a power consumption of 13 W and a weight of only 495 g, it is useful to illuminate areas far from the drone.
– Flir Boson 640⁴: a high-resolution thermal camera, with a spatial resolution of 640 × 480 pixels, which is able to acquire up to 60 frames per second. Its small form factor (21 × 21 × 11 mm), limited weight (7.5 g) and low energy consumption (only 500 mW) make it suitable for installation on a rail drone. The camera is equipped with a 14 mm lens.
– Zed Stereo camera⁵: a stereo camera carefully designed for outdoor settings. The spatial resolution is 4416 × 1242 pixels, the acquisition range is up to 20 m and the acquisition rate ranges from 15 to 100 frames per second (depending on the resolution). Real-time performance at the maximum resolution requires a dedicated graphics processing unit (GPU).

In the acquired data, anomalies are objects placed on the railroad track. We select and employ the following objects, which are common tools used in construction sites along the railways:

– Electrical Insulator
– Fuel Tank
– Rail Signal
– Pickaxe
– Locking turnout
– Track lifting jack
– Traffic light
– Insulating stick
– LPG tank
– Balise
– Oiler

¹ https://www.baslerweb.com/en/products/cameras/area-scan-cameras/ace/aca800-510uc
² https://www.hella.com/truck/it/LED-LIGHT-BAR-470-Single-Twin-3950.html
³ https://www.hella.com/offroad/it/Comet-200-LED-1626.html
⁴ https://www.flir.it/products/boson
⁵ https://www.stereolabs.com/zed


A sample of each of these classes is depicted in Fig. 2 in RGB and thermal domains. Every frame is annotated with two labels: the presence of one or more obstacles (i.e. whether the frame contains an anomaly) and the location, expressed with bounding boxes, of each visible obstacle.

4 Proposed Framework

We propose a deep learning-based framework composed of two sequential modules. The first is an autoencoder network [6], i.e. an encoder-decoder architecture whose goal is to reconstruct the input frame, while the second is a binary classifier network [4], predicting whether the input frame contains an anomaly (i.e. an object placed on the railway). The whole system is depicted in Fig. 1 and described in the following.

4.1 Autoencoder

As mentioned above, the first module of the framework is an autoencoder that aims to reconstruct the input frame passing through an intermediate bottleneck. The input of the model is a single frame, while the output is the reconstructed one. During training, the network receives as input only regular frames, i.e. frames without any anomaly. In this way, the network should learn to reconstruct only normal frames; thus, the output should always be a clean image devoid of any anomaly, even if the input frame contains anomalies. Finally, the reconstructed frame is compared with the original input frame through an absolute and a gradient difference, i.e. a difference computed on the gradients of the two images. The resulting two difference images are stacked and used as input for the second module, the classifier.

Model. This neural network accepts as input images with a spatial resolution of 192 × 192 pixels. The encoder architecture consists of 9 convolutional layers with kernel size 3 × 3. The first and the last two layers have stride s = 1, while the other layers have stride s = 2. The decoder architecture is symmetrical: it is composed of 9 transposed convolutional layers, which up-sample the input feature maps, with kernel size 3 × 3. The first two and the last layer have stride s = 1, while the remaining ones have s = 2. Regarding the feature maps, their number is doubled (and then halved) at each layer, except for the first one, starting from 16, reaching 1024 in the bottleneck, and then reducing down to 16 again at the end of the decoder. The final output is a 192 × 192 pixel image. We exploit the Leaky ReLU [32] activation function with slope $\alpha = 10^{-2}$. This deep architecture has ≈22M parameters.
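The paper does not state the implementation framework; a PyTorch sketch of the encoder, under one plausible reading of the layer widths (the exact channel progression, and hence the parameter count, is an assumption), could look as follows:

```python
import torch.nn as nn

def block(c_in, c_out, stride):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1),
                         nn.LeakyReLU(0.01))

# 9 conv layers: the first and the last two keep stride 1; the six middle layers
# halve the spatial resolution, so a 192 x 192 input reaches a 3 x 3 bottleneck.
encoder = nn.Sequential(
    block(1, 16, 1),
    block(16, 32, 2), block(32, 64, 2), block(64, 128, 2),
    block(128, 256, 2), block(256, 512, 2), block(512, 1024, 2),
    block(1024, 1024, 1), block(1024, 1024, 1),
)
```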


Training Procedure. We train the autoencoder with an unsupervised approach since, as mentioned above, the network receives only frames without anomalies during the training procedure. We adopt two different loss functions. The first is the Mean Squared Error loss, here referred to as $L_{MSE}$, defined as:

$$L_{MSE} = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} \left\| I_I(m,n) - I_R(m,n) \right\|_2^2 \qquad (1)$$

where $I_I$, $I_R$ are the input and the reconstructed image, respectively, of size M × N pixels. In addition, we propose to use a Gradient Loss ($L_G$) defined as:

$$L_G = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} \left\| G_{I_I}(m,n) - G_{I_R}(m,n) \right\|_2^2 \qquad (2)$$

where $G_{I_I}$ and $G_{I_R}$ are the gradients computed on the input ($I_I$) and the reconstructed ($I_R$) images with a spatial resolution of M × N pixels:

$$G = \sqrt{G_x^2 + G_y^2} \qquad (3)$$

$$G_x = I \ast \begin{bmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 1 & 0 & -1 \end{bmatrix}, \qquad G_y = I \ast \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix} \qquad (4)$$

in which the ∗ symbol is the convolution operator. These equations, introduced in [28], compute the gradients along both the horizontal and vertical dimensions of an image. Minimizing this loss function is equivalent to improving the definition of lines and contours in the reconstructed frames. Finally, the overall loss L is defined as a weighted sum of $L_{MSE}$ and $L_G$, taking inspiration from [9]:

$$L = \alpha \cdot L_{MSE} + \beta \cdot L_G \qquad (5)$$

In our experiments, we set α = β = 1, the learning rate is set to $10^{-3}$ and the Adam [19] optimizer is used.
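A sketch of the loss of Eqs. (2)–(5), again assuming PyTorch and single-channel inputs; the small epsilon inside the square root is an added numerical safeguard, not part of the paper:

```python
import torch
import torch.nn.functional as F

# Kernels from Eq. (4), shaped (out_ch, in_ch, 3, 3) for conv2d.
KX = torch.tensor([[1., 0., -1.], [1., 0., -1.], [1., 0., -1.]]).view(1, 1, 3, 3)
KY = torch.tensor([[1., 1., 1.], [0., 0., 0.], [-1., -1., -1.]]).view(1, 1, 3, 3)

def grad_mag(img):
    """Gradient magnitude G of a single-channel image batch, Eq. (3)."""
    gx = F.conv2d(img, KX.to(img.device), padding=1)
    gy = F.conv2d(img, KY.to(img.device), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)  # eps keeps sqrt differentiable

def total_loss(x, x_rec, alpha=1.0, beta=1.0):
    """L = alpha * L_MSE + beta * L_G (Eq. 5), with alpha = beta = 1."""
    return (alpha * F.mse_loss(x_rec, x)
            + beta * F.mse_loss(grad_mag(x_rec), grad_mag(x)))
```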

4.2 Classifier

This module is a deep binary classifier, predicting whether a frame contains anomalies. The input consists of the two (absolute and gradient) difference images stacked together. The output is a binary label representing the presence or absence of any anomaly. Using the two difference images as input, the network can exploit both the variations in terms of textures and the variations in terms of contours and lines.
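A sketch of how this two-channel input could be assembled, reusing the grad_mag helper from the loss sketch above; whether the gradient difference is signed or absolute is not stated, so the absolute value is an assumption:

```python
import torch

def classifier_input(x, x_rec):
    """Stack the absolute and gradient difference images into the
    2-channel classifier input; x, x_rec have shape (B, 1, 192, 192)."""
    abs_diff = (x - x_rec).abs()
    grad_diff = (grad_mag(x) - grad_mag(x_rec)).abs()  # assumption: absolute
    return torch.cat([abs_diff, grad_diff], dim=1)     # (B, 2, 192, 192)
```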


Model. This neural network is a lightweight CNN that shares the architecture with the encoder module described previously, but the number of filters is halved and thus ranges from 8 to 256. Moreover, the last convolutional layer is removed and replaced with a flatten operation and 2 sequential linear layers with 48 and 2 units. We add a dropout regularization with drop probability p = 0.3 between the linear layers and, as in the autoencoder model, we exploit the Leaky ReLU [32] activation function with slope α = 10^−2. This network contains approximately 700k parameters. The output of the model is a binary classification which corresponds to 1 if the frame contains anomalies and to 0 if it does not.

Training Procedure. We train the binary classifier with a supervised approach, using both frames with anomalies and frames without anomalies during the training procedure. The Binary Cross Entropy ($L_{BCE}$) loss is employed as objective function:

$$L_{BCE} = -\left( y \log(p) + (1 - y) \log(1 - p) \right) \quad (6)$$

in which log is the natural logarithm, p is the predicted probability of a class, and y is the binary label corresponding to anomalous and non-anomalous frames.

Table 1. Results of the proposed framework for both RGB and thermal data. The system achieves satisfactory results, confirming the applicability to real-world applications. The usage of thermal data results in higher scores.

Input type | Accuracy | Precision | Recall | F1-score
RGB        | 0.811    | 0.979     | 0.719  | 0.825
Thermal    | 0.966    | 0.989     | 0.957  | 0.973

5 Experimental Evaluation

In this section, we evaluate the proposed framework using as input the RGB images (converted to gray-scale), acquired with the Basler camera (supported by the external light sources), and the thermal images, collected by the Flir Boson thermal camera. Further details about the acquisition devices are reported in Sect. 3. In order to train and test the proposed framework, we create appropriate splits: we group all the frames containing anomalies and randomly sample about 80% of the frames for the training and validation phases and the remaining 20% for the testing one. Then, we randomly sample an equivalent number of regular frames from the dataset and add them to the training, validation and testing splits.


Fig. 3. Sample output of the autoencoder network of the proposed system, both for the intensity (left) and the thermal (right) domains. The first row represents the acquired frames used as input. The second row contains the reconstructed frames, while the third and fourth rows show the absolute and the gradient difference images, respectively. The last column reports the case in which no obstacles are placed on the railway.

For all experiments, we exploit the following common metrics: prediction accuracy, precision, recall and F1-score. We report the obtained results in Table 1, comparing the use of RGB and thermal images as input data. We note that performance is generally good for both data domains, revealing that the framework is able to deal with different data types and that the use of an autoencoder combined with the analysis of absolute and gradient difference images is a suitable approach for detecting anomalies on the railways, as depicted in Fig. 3. Thermal data are probably a better choice than RGB data for achieving the best overall results: thermal cameras do not depend on external light sources (hence the energy consumption of the system is lower), but they are usually more expensive than RGB cameras and have a limited acquisition framerate and resolution. As shown in Fig. 2, anomalies appear more evident, with a better contrast with respect to the railroad: this element could contribute to the better performance of the framework using thermal data. Experiments are conducted on sequences acquired during good weather conditions. We also test the speed performance of the proposed framework, computing how many frames the architecture can process each second. In order to meet the


requirement imposed by the use of a rail drone in terms of energy consumption and computation performance, we run the tests on a PC equipped with an Intel i7-8700 CPU (3.60 GHz, 60 W) and an Nvidia P4000 GPU (100 W). The deep networks are implemented in PyTorch. The framework runs in real-time, reaching about 190 frames per second. This result has been obtained by carefully designing the two architectures, balancing the number of layers and total parameters against the computational load of the overall system.
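A throughput figure of this kind could be measured with a loop such as the following. This is our own sketch, not the authors' benchmarking code: it assumes batch size 1, synthetic 192 × 192 frames and a CUDA device.

    import time
    import torch

    def measure_fps(model, n_frames=1000, device="cuda"):
        model = model.to(device).eval()
        x = torch.randn(1, 1, 192, 192, device=device)   # one synthetic frame
        with torch.no_grad():
            for _ in range(10):                          # warm-up iterations
                model(x)
            torch.cuda.synchronize()
            t0 = time.time()
            for _ in range(n_frames):
                model(x)
            torch.cuda.synchronize()
        return n_frames / (time.time() - t0)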

6 Conclusion

In this paper, we propose a deep vision-based framework capable of detecting anomalies (i.e. obstacles) on railways that could affect the safety of train transport. The proposed system combines an autoencoder and a binary classifier in order to label input frames as normal or anomalous. Experiments are carried out on a dataset acquired on the railways during the night and confirm the feasibility and the accuracy of the proposed approach. In addition, the proposed system can operate in real-time. Future work will address the introduction of stereo data in the framework and the usage of a GPU-based embedded board equipped with an ARM processor, such as the Nvidia Jetson TX2 (https://developer.nvidia.com/embedded/jetson-tx2). Moreover, future work will focus on the localization and classification of the detected anomalies, as well as on adverse weather conditions that may influence the acquisition process.

Acknowledgements. We thank Ivan Mazzoni (RFI), Marco Plano (RFI) and Mattia Bevere (RFI) for the technical support and accurate annotations.

References

1. Abati, D., Porrello, A., Calderara, S., Cucchiara, R.: Latent space autoregression for novelty detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 481–490 (2019)
2. Adam, A., Rivlin, E., Shimshoni, I., Reinitz, D.: Robust real-time unusual event detection using multiple fixed-location monitors. IEEE Trans. Pattern Anal. Mach. Intell. 30(3), 555–560 (2008)
3. Akcay, S., Atapour-Abarghouei, A., Breckon, T.P.: GANomaly: semi-supervised anomaly detection via adversarial training. In: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (eds.) ACCV 2018. LNCS, vol. 11363, pp. 622–637. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20893-6_39
4. Ballotta, D., Borghi, G., Vezzani, R., Cucchiara, R.: Head detection with depth images in the wild. In: 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. SCITEPRESS (2017)
5. Basharat, A., Gritai, A., Shah, M.: Learning object motion patterns for anomaly detection and improved object detection. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008)
6. Borghi, G., Fabbri, M., Vezzani, R., Cucchiara, R., et al.: Face-from-depth for head pose estimation on depth images. IEEE Trans. Pattern Anal. Mach. Intell. (2018)



7. Borghi, G., Frigieri, E., Vezzani, R., Cucchiara, R.: Hands on the wheel: a dataset for driver hand detection and tracking. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp. 564–570. IEEE (2018)
8. Borghi, G., Gasparini, R., Vezzani, R., Cucchiara, R.: Embedded recurrent network for head pose estimation in car. In: 2017 IEEE Intelligent Vehicles Symposium (IV), pp. 1503–1508. IEEE (2017)
9. Borghi, G., Pini, S., Vezzani, R., Cucchiara, R.: Driver face verification with depth maps. Sensors 19(15), 3361 (2019)
10. Chan, F.-H., Chen, Y.-T., Xiang, Y., Sun, M.: Anticipating accidents in dashcam videos. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds.) ACCV 2016. LNCS, vol. 10114, pp. 136–153. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54190-7_9
11. Cong, Y., Yuan, J., Liu, J.: Sparse reconstruction cost for abnormal event detection. In: CVPR 2011, pp. 3449–3456. IEEE (2011)
12. Espino, J.C., Stanciulescu, B.: Rail extraction technique using gradient information and a priori shape model. In: 2012 15th International IEEE Conference on Intelligent Transportation Systems, pp. 1132–1136. IEEE (2012)
13. Fabbri, M., Borghi, G., Lanzi, F., Vezzani, R., Calderara, S., Cucchiara, R.: Domain translation with conditional GANs: from depth to RGB face-to-face. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 1355–1360. IEEE (2018)
14. Filev, D.P., Chinnam, R.B., Tseng, F., Baruah, P.: An industrial strength novelty detection framework for autonomous equipment monitoring and diagnostics. IEEE Trans. Ind. Inform. 6(4), 767–779 (2010)
15. García, J.J., et al.: Dedicated smart IR barrier for obstacle detection in railways. In: 31st Annual Conference of IEEE Industrial Electronics Society, IECON 2005, 6 pp. IEEE (2005)
16. Görnitz, N., Kloft, M., Rieck, K., Brefeld, U.: Toward supervised anomaly detection. J. Artif. Intell. Res. 46, 235–262 (2013)
17. Gschwandtner, M., Pree, W., Uhl, A.: Track detection for autonomous trains. In: Bebis, G., et al. (eds.) ISVC 2010. LNCS, vol. 6455, pp. 19–28. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17277-9_3
18. Hasan, M., Choi, J., Neumann, J., Roy-Chowdhury, A.K., Davis, L.S.: Learning temporal regularity in video sequences. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 733–742 (2016)
19. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
20. Kumar, A.: Computer-vision-based fabric defect detection: a survey. IEEE Trans. Ind. Electron. 55(1), 348–363 (2008)
21. Liu, W., Luo, W., Lian, D., Gao, S.: Future frame prediction for anomaly detection - a new baseline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6536–6545 (2018)
22. Maire, F.: Vision based anti-collision system for rail track maintenance vehicles. In: 2007 IEEE Conference on Advanced Video and Signal Based Surveillance, pp. 170–175. IEEE (2007)
23. Maire, F., Bigdeli, A.: Obstacle-free range determination for rail track maintenance vehicles. In: 2010 11th International Conference on Control Automation Robotics & Vision, pp. 2172–2178. IEEE (2010)


24. Mockel, S., Scherer, F., Schuster, P.F.: Multi-sensor obstacle detection on railway tracks. In: IEEE IV2003 Intelligent Vehicles Symposium, Proceedings (Cat. No. 03TH8683), pp. 42–46. IEEE (2003)
25. Palazzi, A., Abati, D., Solera, F., Cucchiara, R., et al.: Predicting the driver's focus of attention: the DR(eye)VE project. IEEE Trans. Pattern Anal. Mach. Intell. 41(7), 1720–1733 (2018)
26. Passarella, R., Tutuko, B., Prasetyo, A.P.: Design concept of train obstacle detection system in Indonesia. IJRRAS 9(3), 453–460 (2011)
27. Pathak, D., Agrawal, P., Efros, A.A., Darrell, T.: Curiosity-driven exploration by self-supervised prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 16–17 (2017)
28. Prewitt, J.M.: Object enhancement and extraction. Pict. Process. Psychopictorics 10(1), 15–19 (1970)
29. Punekar, N.S., Raut, A.A.: Improving railway safety with obstacle detection and tracking system using GPS-GSM model. Int. J. Sci. Eng. Res. 4(8), 282–288 (2013)
30. Sabokrou, M., Khalooei, M., Fathy, M., Adeli, E.: Adversarially learned one-class classifier for novelty detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3379–3388 (2018)
31. Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: Niethammer, M., et al. (eds.) IPMI 2017. LNCS, vol. 10265, pp. 146–157. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59050-9_12
32. Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853 (2015)
33. Yao, Y., Xu, M., Wang, Y., Crandall, D.J., Atkins, E.M.: Unsupervised traffic accident detection in first-person videos. arXiv preprint arXiv:1903.00618 (2019)
34. Zendel, O., Murschitz, M., Zeilinger, M., Steininger, D., Abbasi, S., Beleznai, C.: RailSem19: a dataset for semantic rail scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2019)
35. Zhao, B., Fei-Fei, L., Xing, E.P.: Online detection of unusual events in videos via dynamic sparse coding. In: CVPR 2011, pp. 3313–3320. IEEE (2011)
36. Zwemer, M.H., van de Wouw, D.W., Jaspers, E.G., Zinger, S., et al.: A vision-based approach for tramway rail extraction. In: Video Surveillance and Transportation Imaging Applications 2015, vol. 9407, p. 94070R. International Society for Optics and Photonics (2015)

Rolling Stocks: A Machine Learning Predictive Maintenance Architecture

Roberto Nappi1(B), Valerio Striano1, Gianluca Cutrera2, Antonio Vigliotti2, and Giuseppe Franzè3

1 SYENMAINT srl, Torre del Greco, Naples, Italy
{roberto.nappi,valerio.striano}@syenmaint.it
2 RFI - Rete Ferroviaria Italiana, Rome, Italy
{g.cutrera,a.vigliotti}@rfi.it
3 DIMES, University of Calabria, Rende, Italy
[email protected]

Abstract. In this paper, a model-based maintenance approach is developed for rolling stock vehicles operating along railway networks. By considering the high management costs of modern and complex railway fleets as a primary requirement, the key goal of the proposed approach is to efficiently integrate maintenance actions with the capability to satisfactorily maintain railway services. Here, this is achieved by means of a multi-layer approach that combines the following ingredients into a single framework: interpolation procedures, machine learning algorithms and prediction arguments that take advantage of an accurate model description of the rolling stock dynamics. Experiments on a PV7 EVO - Matisa, owned by the Italian Railways Network, have been conducted with the aim to show the effectiveness of the proposed maintenance architecture.

Keywords: Machine learning · Predictive maintenance · Rolling stock vehicles

1 Introduction

Many complex systems in different engineering application fields (railways, aerospace, aeronautics, the nautical industry and so on) operate in specific environmental conditions under which they are required to be compliant with requirements such as Reliability, Availability, Maintainability and Safety (RAMS). In particular, maintainability concerns the maximization of the so-called system life-time under a minimum global cost, i.e. the Life Cycle Cost, see [7,8,17]. Moreover, maintenance as a function, compared to other areas in operations, is considered to be of a fuzzy nature. Maintenance has not lent itself to systematization due to the fact that its activities are not repetitive in the same manner as operations tasks. Hence, there is a need for an iterative and systematic approach to maintenance practice. It has been observed that decision makers in maintenance often seek to be efficient before being effective.


In recent years, system maintenance has gained an increasingly relevant and strategic role in improving economic competitiveness within several industrial markets [16]. In fact, a correct and low-cost operation of highly complex processes requires that constant maintenance actions be performed in order to avoid undesired production interruptions. To comply with the latter, traditional approaches are essentially designed via a cyclical preventive logic whose task, most of the time, reduces to repairing actions, see [9] and references therein. More recently, condition-based maintenance architectures, where decision making is based on the observed "condition" of an asset, have been proposed with the aim to improve the system lifecycle as much as possible, see e.g. [10–12]. Compared with the cyclic preventive maintenance prevailing in current practice, condition-based maintenance is more efficient, as it is able to suggest timely and crucial actions by predicting the evolution of the deterioration process [13]. Besides this, the condition-based maintenance methodology aims at reducing as much as possible the uncertainty on the timing of maintenance operations. The idea is to perform adequate actions according to the monitored system status (condition monitoring). Predictive maintenance schemes are in principle able to exploit such information for efficiently planning the maintenance schedule, in order to mitigate failure occurrences and to modify the time interval of the maintenance cycle during on-line operations. In this context, the so-called prognostic parameters have a key role because, by exploiting the plant mathematical model, they are in charge of monitoring the system behavior and providing a measure of the deviation of the plant from admissible operating conditions. This essentially allows estimating the Remaining Useful Life (RUL) of the system [14]. Hence, the development of a reliable condition-monitoring maintenance architecture prescribes the use of Artificial Intelligence, Neural Networks, and Machine & Deep Learning techniques capable of handling large amounts of data. Starting from these premises, here a model-based maintenance management scheme has been developed for rolling stock vehicles [15]. Specifically, the architecture exploits three different diagnostic methodologies for estimating the system RUL:

– a model-based algorithm compares the real monitored condition with a model of the object in order to predict the fault behavior [6];
– a case-based algorithm takes advantage of historical records of maintenance cases to provide an interpretation for the actual monitored conditions of the equipment [5];
– a rule-based algorithm detects and identifies faults in accordance with rules representing the relation of each possible fault with the actual monitored equipment condition [4].

The major contribution of this paper can be summarized as follows. First, an accurate mathematical formulation of the rolling stock vehicle for prediction purposes is derived; then, a novel multi-layer scheme for condition-based maintenance scheduling is developed. The proposed architecture is able to determine


adequate actions for a long track over a long prediction horizon, and provides an efficient work plan based on individual defects for the recommended maintenance actions at each time step.

2 Problem Formulation

In the sequel, the interest is devoted to developing an ad-hoc maintenance strategy for rolling stock vehicles whose dynamics is fully detailed in [2]. By taking into account that one of the relevant requirements for a vehicle operating in railway networks is to extend its lifetime as much as possible, the problem to deal with can be stated as follows:

Problem Statement - Find a management strategy for rolling stock vehicles such that the frequency of maintenance actions is minimized according to the following criteria:

1. reduction of machine downtime;
2. cost mitigation in response to fault occurrences.

Essentially, the problem will be addressed as follows: derive an accurate model description of the rolling stock dynamics to be used within a predictive philosophy for identifying in advance the remaining useful life of the vehicle. The next sections will be devoted to properly detailing such an idea.

3 The SYENMAINT PLATFORM RAIL

The Syenmaint Platform Rail (SPR) is a multi-layer architecture that, by resorting to innovative tools, has been conceived with the aim of managing predictive maintenance processes for railway systems [18]. Roughly speaking, the key idea can be summarized as follows. By properly positioning a set of sensors on rolling stock, the SPR built-in functions allow the detection of anomalous dynamical behaviours, which in turn activate pre-specified maintenance countermeasures on the vehicle of interest. As one of its main features, the SPR is capable of transmitting relevant data to the control station, e.g. the class of the detected anomaly and the vehicle geopositioning. Essentially, the SPR structure consists of four layers: data are collected by a sensor set powered via energy harvesting mechanisms (Layer 1); the Hardware unit acquires such information (Layer 2) and the Firmware processes it by exploiting machine learning and artificial intelligence algorithms with the main aim of estimating the Remaining Useful Life (RUL) of the Device Under Test (DUT) (Layer 3). Finally, the estimated RUL value is transmitted to the Software unit on the remote side, which is in charge of implementing the integrated management of the maintenance process (Layer 4) (Fig. 1). From a methodological perspective, a model-based approach is fully exploited. The main idea can be summarized as follows. First, a formal mean-value mathematical description of the rolling stock vehicle dynamics is derived; then, the resulting set of continuous-time differential equations is characterized


Fig. 1. The SYENMAINT PLATFORM RAIL architecture

in terms of its coefficients and validated by means of the procedure reported in Fig. 2. There, the process and the model are excited by the same input sequences in order to evaluate the so-called degradation patterns, which in turn are used by the Decision Support System (DSS) to estimate the timing of anomalous behavior occurrences.

Fig. 2. Vehicle model validation

Finally, a machine learning algorithm [1] is used to predict the maintenance cycle. Specifically, the algorithm considers a target/outcome variable (or dependent variable) which is to be predicted from a given set of predictors (independent variables). Using these sets of variables, a function that maps inputs to desired outputs is generated. The training process continues until the model achieves a desired level of accuracy on the training data. By referring to Fig. 3, two sequences of stored data (measurements and outputs) are manipulated


by the Interpolator unit with the aim of deriving the interpolating law according to a least-squares criterion [3]. Then, the Predictor is in charge of predicting the variable behavior by exploiting a new data set and the above interpolating function.
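As a rough illustration of the Interpolator/Predictor pair of Fig. 3, assuming for simplicity that the interpolating law is a polynomial fitted by least squares (the paper does not fix its exact form), the scheme could look as follows; all numbers are hypothetical.

    import numpy as np

    # Hypothetical stored data: degradation metric vs. traversed kilometers
    km_stored = np.array([0.0, 5.0, 10.0, 15.0, 20.0]) * 1000.0
    wear_stored = np.array([0.0, 0.5, 1.0, 1.6, 2.1])

    # Interpolator: least-squares fit of the interpolating law [3]
    coeffs = np.polyfit(km_stored, wear_stored, deg=1)
    law = np.poly1d(coeffs)

    # Predictor: evaluate the law on new data sequences
    km_new = np.array([25.0, 30.0]) * 1000.0
    print(law(km_new))   # predicted degradation at the new points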

Fig. 3. Machine learning based scheme

4 Experimental Set-Up

In this section, the SPR will be experimentally tested by considering a rolling stock vehicle (PV7 EVO - Matisa), owned by the RFI (Italian Railways Network) company.

4.1 Operating Scenario

As shown in Fig. 4, the use of the SPR unit prescribes the on-board installation of the following devices:

– a set of sensors (accelerometers, gyroscopes, inclinometers) that allow the detection of the vehicle dynamics;
– encoders and/or GPS receivers for localization purposes;
– data acquisition systems;
– synchronization units in charge of allowing a correct execution of the firmware pertaining to processing, on-board storing and packet communication (vehicle-to-PC/Server and vehicle-to-mobile devices) tasks.


Fig. 4. The PV7 EVO with the on-board devices provided by Syenmaint.

5 Experiments

In this section, the main experimental results are collected in Figs. 5, 6, 7 and 8. First, the analysis of the mean value of the axle displacement Y [mm], pertaining to the displacement dynamical evolution, highlights that it is possible to predict the condition under which Y(t) will overcome the critical threshold of 10 mm. As a consequence, a re-scheduling of the prescribed maintenance actions can be dynamically carried out. By analyzing the behavior of the axle displacement (Figs. 5 and 6), it results that the SPR architecture is capable of identifying the condition under which a maintenance action must be performed, i.e. once the axle displacement slope with respect to the traversed kilometers overcomes a pre-assigned upper bound of 10 mm, see Fig. 6. A second important experiment is summarized in Figs. 7 and 8, where the active facet hem slope is determined by exploiting the equivalent conicity depicted in Fig. 7. There, it is important to underline that a linear law describes the relationship between the wear and the traversed kilometers:

$$\Delta\delta = 0.1^{\circ} / 1000\ \text{km} \quad (1)$$

Notice that in Fig. 8 the red continuous line represents the admissible lower bound on the active facet hem slope, i.e. qR > 6.5 mm. By looking at the evolution of qR with respect to the traversed kilometers, one can again predict the qR disrepair. In particular, in the present experiment, a maintenance action (re-turning or wheel replacement) is required after approximately 30,000 km traversed.
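A minimal sketch of this kind of prediction, with hypothetical qR measurements chosen only to reproduce the reported figure of roughly 30,000 km under the linear wear law:

    import numpy as np

    km = np.array([0.0, 5000.0, 10000.0, 15000.0, 20000.0])
    qR = np.array([9.5, 9.0, 8.5, 8.0, 7.5])         # hypothetical measurements [mm]

    slope, intercept = np.polyfit(km, qR, 1)          # linear disrepair law
    km_limit = (6.5 - intercept) / slope              # crossing of the 6.5 mm bound
    print(f"maintenance due around {km_limit:.0f} km")   # -> 30000 km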

Fig. 5. The axle displacement - mean value at rest: 6 mm

Fig. 6. The axle displacement slope vs traversed kilometers

Fig. 7. Equivalent conicity

Fig. 8. The active facet hem slope qR vs traversed kilometers (Color figure online)

6 Conclusions

In this paper, an ad-hoc management scheme for maintenance purposes of a class of rolling stock vehicles has been presented. As one of its main features, the proposed strategy makes it possible to predict the RUL specification by resorting to model-based arguments. The latter allows scheduling the required maintenance process in an "optimal" way. Experiments have been conducted on a PV7 EVO - Matisa vehicle with a twofold aim: first to validate the rolling stock mathematical model, then to show the effectiveness of the proposed predictive architecture. Although preliminary, the achieved results are encouraging and open the doors to addressing maintenance processes in industrial contexts by a formal approach capable, in principle, of significantly mitigating economic costs.

6.1 Future Developments

Future research directions will consider the use of rolling stock models as a physics-based digital twin to train an Artificial Neural Network (ANN), in order to obtain a Machine Learning (ML) based digital twin. Even if the ML-based approach is not an exact mathematical representation of the system under observation, it can offer several advantages with respect to the classical physics-based approach. To cite a few:

– improvement of the descriptive capability of the model over both time and data lines (reinforcement learning);
– access to knowledge of complex relations and patterns;
– no physics domain knowledge required;
– easy customization in response to design changes;
– low computational demand.

On the other hand, ML digital twin development has to deal with a main challenging problem: the availability of large amounts of training data and labeled datasets. The training of the ML digital twin of the rolling stock would require knowledge of the information related to all the states of the system (taking into account normal and abnormal operating conditions) and time-intensive activities for labeling the large amount of data mandatory for any ANN training. The latter highlights the fact that synthetic datasets, generated via the rolling stock physical model, will be used to properly train ML models in a cost- and time-effective way. The physics-based digital twin can be used not only to derive proper datasets for each operative state of the system, but also to automate the data labelling process, which will feed the ANN. Under such an approach, it is expected to obtain a triple representation of the rolling stock system based on the physical model, real data and the ML digital twin, allowing also the development of a Physics-Guided Neural Network (PGNN) model of the rolling stock. This will allow comparing the performance of the developed models in terms of accuracy levels of the RUL prediction.


Acknowledgments. The SYENMAINT PLATFORM RAIL has been developed at the SYstems ENgineering for MAINTenance (SYENMAINT s.r.l.) laboratory within the Campania NewSteel Startup Incubator at the Business Innovation Center of the Naples Science Center. In particular, the SPR device has been designed under the SIMBAS (Smart Innovative Model Based Approach) grant in collaboration with the Italian Railways Network company, FS Innovation (Dr. Franco Stivali, Dr. Rita Casalini, Dr. Ugo Cerretani) and OPENITALY-ELIS (Dr. Luciano De Propris, Dr. Federico Merlo, Dr. Gabriele Barbalace, Dr. Riccardo Panunzio).

References

1. Li, H., Parikh, D., He, Q., Qian, B., Li, Z.: Improving rail network velocity: a machine learning approach to predictive maintenance. Transp. Res. Part C Emerg. Technol. 45, 17–26 (2014)
2. Iwnicki, S.: Handbook of Railway Vehicle Dynamics. CRC Press, Boca Raton (2006)
3. Söderström, T., Stoica, P.: System Identification. Prentice-Hall Inc., Upper Saddle River (1998)
4. Tse, P.: Neural networks based robust machine fault diagnostic and life span predicting system. Ph.D. thesis, University of Sussex, UK (1998)
5. Milne, R.: Strategies for diagnosis. IEEE Trans. Syst. Man Cybern. 17(3), 333–339 (1987)
6. Butler, K.L.: An expert system based framework for an incipient failure detection and predictive maintenance system. In: International Conference on Intelligent System Application to Power Systems Proceedings, pp. 321–326. IEEE (1996)
7. Bucher, C., Frangopol, D.M.: Optimization of lifetime maintenance strategies for deteriorating structures considering probabilities of violating safety, condition, and cost thresholds. Probab. Eng. Mech. 21(1), 1–8 (2006)
8. Kumar, U.D., Crocker, J., Knezevic, J., El-Haram, M.: Reliability, Maintenance and Logistic Support: A Life Cycle Approach. Springer, Heidelberg (2012). https://doi.org/10.1007/978-1-4615-4655-9
9. Dindia, K., Baxter, L.A.: Strategies for maintaining and repairing marital relationships. J. Soc. Pers. Relat. 4(2), 143–158 (1987)
10. Blanchard, B.S.: Maintenance and support: a critical element in the system life cycle. In: Systems Engineering and Management for Sustainable Development - Volume II, p. 12 (2009)
11. Karlsson, V.: An overall view of maintenance. Eur. Railway Rev. 11(3), 11–17 (2005)
12. Fararooy, S., Allan, J.: Condition-based maintenance of railway signalling equipment. In: IET (1995)
13. Jardine, A.K.S., Lin, D., Banjevic, D.: A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech. Syst. Signal Process. 20(7), 1483–1510 (2006)
14. Si, X.-S., Wang, W., Hu, C.-H., Zhou, D.-H.: Remaining useful life estimation - a review on the statistical data driven approaches. Eur. J. Oper. Res. 213(1), 1–14 (2011)
15. Nappi, R.: Integrated maintenance: analysis and perspective of innovation in railway sector. arXiv preprint arXiv:1404.7560 (2014)
16. Strano, S., Terzo, M.: Review on model-based methods for on-board condition monitoring in railway vehicle dynamics. Adv. Mech. Eng. 11(2), 1–14 (2019)


17. Vitiello, V., Nappi, R., Strano, S., Terzo, M.: Mathematical modeling applied to predictive maintenance in the railway field. Master Degree thesis, Department of Industrial Engineering, University of Naples Federico II (2019)
18. Nappi, R., Florio, G.: Innovation in the software development platforms for safety critical systems. Italian Railway Engineering Magazine, CIFI, LXVII, pp. 323–334 (2012)

Analysis of Railway Track Irregularities with Convolutional Autoencoders and Clustering Algorithms

Julia Niebling1(B), Benjamin Baasch2, and Anna Kruspe1

1 Deutsches Zentrum für Luft- und Raumfahrt, Institute of Data Science, Mälzerstraße 3, 07745 Jena, Germany
[email protected]
2 Deutsches Zentrum für Luft- und Raumfahrt, Institute of Transportation Systems, Rutherfordstr. 2, 12489 Berlin, Germany

Abstract. Modern maintenance strategies for railway tracks rely more and more on data acquired with low-cost sensors installed on in-service trains. This quasi-continuous condition monitoring produces huge amounts of data, which require appropriate processing strategies. Deep learning has become a promising tool for analyzing large volumes of sensory data. In this work, we demonstrate the potential of artificial intelligence to analyze railway track defects. We combine traditional signal processing methods with deep convolutional autoencoders and clustering algorithms to find anomalies and their patterns. The methods are applied to real-world data gathered with a multi-sensor prototype measurement system on a shunter locomotive operating on the industrial railway network of the inland harbor of Braunschweig (Germany). This work shows that deep learning methods can be applied to find patterns in railway track irregularities and opens a wide area of further improvements and developments.

Keywords: Defect detection · Deep learning · Convolutional autoencoder · Clustering

1 Motivation and Introduction

The development of low-cost monitoring systems for condition-based and predictive maintenance has gained increasing importance in recent years. In the railway sector, the use of in-service trains to monitor the track is considered a promising tool to collect data necessary for now- and forecasting of the track health status [13]. In this context, axle box acceleration (ABA) sensors play an important role and many different studies have shown promising results (see, for example, [7,11] and references therein). It has been shown that track defects can be identified based on spectral analysis and time-frequency representations of ABA data [11]. Molodova et al. [8] proposed the use of the scale-averaged wavelet power for the automatic detection of squats. Baasch et al. [1] described a time-frequency signal separation algorithm to detect singular track defects. Furthermore, by means of time-frequency representations, the ABA data are transformed into 2D images (Fig. 2a). This means that defect clustering can be framed as an image clustering problem. This fact motivates the use of deep neural networks in this study. Deep learning is often used for supervised learning tasks (e.g. classification) that rely on massive amounts of labeled data. In the case of railway track inspection, especially for small to mid-size infrastructure operators, the problem is that labeled data in sufficient quantity and quality are rarely available. Therefore, in this paper, we examine how deep learning methods can be applied to analyze railway track irregularities without the use of human-labeled data. The idea is that once clusters in a set of identified track irregularities are found, only representative examples of each cluster need to be manually inspected by the asset manager. At best, the results of this inspection can be generalized for all members in a cluster. This would drastically reduce the visual inspection effort. Our approach to analyzing the data is the following: First, signal pre-processing is applied such that the input is well-shaped for a convolutional autoencoder (CNN-AE). Then, this CNN-AE is trained on our data set. In this way, the autoencoder learns a lower-dimensional representation of the data which can be decoded to an output similar to the input. The representations of the input data are used for further data analysis. At this point, we apply a clustering method to find similarities in the data. We further examine how outlier detection prior to the application of the CNN-AE affects the results of dimensionality reduction and clustering. In Sect. 2, we describe the experimental set-up for our approach and explain the concept of autoencoders and the used clustering algorithm in more detail. Next, we evaluate and interpret the results in Sect. 3. Finally, Sect. 4 concludes and gives an outlook for further research.

(The work on signal processing presented in this paper has received funding from the European Union's Horizon 2020 research and innovation programme and from the European Global Navigation Satellite Systems Agency under grant agreement #776402.)

2 Experimental Design

In this section, we explain all steps from acquiring the data to detecting railway track irregularities. After real-world data collection, the pre-processing steps and further data analysis are done with Python. The utilized deep learning framework is Keras, which uses the TensorFlow 2.0 backend [2].

2.1 Data Acquisition

The data used in this study were acquired with a low-cost multi-sensor prototype system developed at the German Aerospace Center (DLR, [5]). It is installed on a shunter locomotive (Fig. 1) operating at the Braunschweig harbor industrial railway network in Germany, which has a total track length of 15 km and a connection to the national mainline railway network. The ABA data are measured with a triaxial accelerometer at one of the axle boxes. The ABA sensor measures vibrations in a range of 0.8 Hz to 8,000 Hz at a sampling rate of 20,625 Hz. A low-cost global navigation satellite system (GNSS) receiver allowing GNSS raw data acquisition and a multi-band antenna that also covers EGNSS frequency bands are used for positioning and timing. Additionally, an inertial measurement unit (IMU) at the car body (above the suspension) measures accelerations and turn rates at 100 Hz. GNSS and IMU data together can be fused with a digital map in an offline process to assign locations (track ID and distance on track) and train speed to the ABA data [10]. The data analyzed here come from 60 consecutive train journeys with lengths between 12 and 372 s acquired in April 2016.

Fig. 1. DLR prototype of multi-sensor-measurement system (right) installed on a shunter locomotive (left) with triaxial accelerometer at the front right axle box (middle), taken from [5].

2.2 Signal Processing

The aim of the signal processing is to reduce unwanted noise in the data and to detect relevant track irregularities that can be further analyzed. The pre-processing methodologies presented here are mainly based on mathematical transformations in the field of Fourier analysis. The outlier detection uses simple thresholding.

Pre-processing. The measured ABA data are preprocessed as follows.

1. Time-frequency representation: In order to analyze the frequency content of the data, a short-time Fourier transform (STFT) with a window length of 2000 samples (approx. 0.1 s) and 1000 samples overlap is performed. From the complex STFT the magnitude is taken and logarithmized (Fig. 2a); a code sketch of this step is given after Fig. 2.


2. Reduction of periodic noise: Imperfections of the wheel produce periodic impacts that lead, together with other rotating parts, to periodic noise. This noise can be removed in the cepstral domain by calculating the real cepstrum from the STFT amplitude spectra and applying a low-pass filter (also called a lifter).
3. Removal of natural frequencies of rail and wheel: The frequency content of the ABA data is dominated by the natural frequencies of the rail and the wheel excited by the unevenness of the wheel-rail contact patch. This results in prominent horizontal bands in the time-frequency representation (Fig. 2a). Those bands can be modeled by time-variant scaling of the frequency response of the dynamic wheel-rail interaction. By assuming that the frequency response is linear and the wheel-rail roughness is white, the spectrum of the system response equals the spectrum of the ABA time series multiplied by a constant. This time-variant scaler is found through linear regression of the local spectrum at each time window and the average of all spectra in the time-frequency representation. The scaler is also used later on as a feature to detect outliers indicating track irregularities. Once the frequency bands are modeled, they can simply be subtracted from the time-frequency representation. The remaining signals can then be attributed to local track irregularities (Fig. 2b).

Fig. 2. Time-frequency representation before (a) and after (b) pre-processing.
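As referenced in step 1 above, a minimal sketch of the time-frequency transformation with SciPy, using the window parameters given in the text (the synthetic signal stands in for one ABA channel):

    import numpy as np
    from scipy.signal import stft

    fs = 20625                              # ABA sampling rate [Hz]
    aba = np.random.randn(fs * 10)          # placeholder for a 10-s ABA channel

    # Window of 2000 samples (approx. 0.1 s) with 1000 samples overlap
    f, t, Z = stft(aba, fs=fs, nperseg=2000, noverlap=1000)
    log_mag = np.log(np.abs(Z) + 1e-10)     # logarithmized magnitude
    print(log_mag.shape)                    # (frequency bins, time frames)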

Outlier Detection. The outlier detection aims to find time windows in the time-frequency representation that contain track irregularities. Those irregularities excite strong vibrations of the wheel and rail. The time-variant scaler calculated in step 3 of the pre-processing sequence is a measure of the strength of these vibrations and can therefore be used as a feature for the outlier detection. Here, an outlier is defined as a scaler with an amplitude that is higher than the average amplitude plus two times the standard deviation of all scalers; a minimal sketch of this rule is given below.
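The thresholding rule amounts to a few lines of NumPy; the scaler values below are placeholders:

    import numpy as np

    def detect_outliers(scalers):
        """Flag time windows whose scaler exceeds mean + 2 * std."""
        threshold = scalers.mean() + 2 * scalers.std()
        return np.where(scalers > threshold)[0]

    scalers = np.abs(np.random.randn(1000))   # one scaler per STFT time frame
    print(detect_outliers(scalers))           # indices of outlier windows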

2.3 Dimensionality Reduction

In this work, we use autoencoders for the dimensionality reduction. This kind of neural network was first introduced in [6]. The aim of an autoencoder is to learn the identity map between inputs and outputs with a specific neural network. Hence, autoencoders are a type of unsupervised learning method, as the desired output is given by the input data. An autoencoder consists of an encoder and a decoder. While the encoder maps the input data to a lower-dimensional or latent representation, also named code, the decoder reconstructs this representation back to the original data. The smallest part of an autoencoder is typically the last layer of the encoder and the input layer of the decoder, also known as the bottleneck. The word bottleneck is also used for the latent space which contains the code. Figure 3 shows the schematic structure of a typical autoencoder. There are many variations and applications of autoencoders. Besides dimensionality reduction, autoencoders can be used for anomaly detection, image denoising, and much more. For a more detailed overview, see, for example, [4].

Fig. 3. The structure of a typical autoencoder.

The input to the dimensionality reduction of our data set is the data after signal processing, namely the spectra of the processed time-frequency representations for each ABA component of all 60 journeys. Thus, each input sample consists of one frame of the time-frequency representation with the size of 1 × 1000 × 3. The encoder of this CNN-AE consists of six 1D convolutional layers and five 1D max-pooling layers in an alternating order. The decoder has six 1D convolutional layers and five 1D upsampling layers, nearly symmetrical to the encoder layers. The dimension reduction only happens in the max-pooling layers. The bottleneck has an output size of 16 × 1, i.e., the lower-dimensional representation of the input data has dimension 16. We chose this extreme dimensionality reduction to force the model to only learn the most important features. The network consists of 1D layers because we want to examine each time frame of the time-frequency representation independently. This is due to the fact that we only investigate short track defects here. For longer and repeating irregularities, several frames could be regarded together using 2D layers in future experiments. The specific architecture can be found in Fig. 4; a code sketch is given at the end of this subsection.

Fig. 4. Network architecture of the utilized CNN-AE.

As the input data are not further scaled, in order not to lose amplitude variations, a linear activation function is chosen for the first and last convolutional layers. A sigmoid activation function in the bottleneck ensures that the code is contained in the 16-dimensional unit cube of the latent space, so that the single feature vectors can be examined later. All other activation functions are tanh, to include nonlinearities. The number of filters of the convolutional layers is specified in Fig. 4. For all convolutional layers except the first and last one, the kernel size is 3 and the padding is "same". The first and last convolutional layers have kernel sizes 489 and 25, respectively, and both have "valid" padding. The max-pooling layers always have a pool size of 2. All upsampling layers except the last one enlarge the data by doubling the entries. The last upsampling layer enlarges the dimension by 4.

Since the goal of this experiment is to cluster track irregularities, we tested whether the clustering provides better results when the outliers are extracted before the features are learned and clustered. Therefore, we conducted two separate training experiments. In the first experiment, the training of the autoencoder was done on the whole data set with 143420 samples for 200 epochs. In the second experiment, only the identified outliers were used in training (the outlier detection is explained in the last paragraph of Subsect. 2.2). This reduced data set consists of 7642 samples, on which the training was done for 200 epochs. In both training experiments, the loss function is the mean squared error, as it is a typical loss function for autoencoders. The utilized minimization algorithm is the ADAM algorithm with default parameters from Keras, i.e., learning rate 0.001, β1 = 0.9, β2 = 0.999 and ε = 10^−7.
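Since the exact filter counts of Fig. 4 are not reproducible here, the following Keras sketch fills them in with assumed values (16 to 64); the layer counts, kernel sizes 489 and 25, pooling/upsampling factors and activations follow the description above, and the shapes work out to a 16 × 1 bottleneck and a 1000 × 3 output.

    from tensorflow import keras
    from tensorflow.keras import layers

    inp = keras.Input(shape=(1000, 3))          # one time frame, 3 ABA components

    x = layers.Conv1D(16, 489, activation="linear")(inp)   # valid padding: 1000 -> 512
    x = layers.MaxPooling1D(2)(x)                           # 256
    for f in (32, 64, 64, 64):                              # assumed filter counts
        x = layers.Conv1D(f, 3, padding="same", activation="tanh")(x)
        x = layers.MaxPooling1D(2)(x)                       # 128, 64, 32, 16
    code = layers.Conv1D(1, 3, padding="same", activation="sigmoid")(x)  # (16, 1)

    y = layers.Conv1D(64, 3, padding="same", activation="tanh")(code)
    for f in (64, 64, 32):
        y = layers.UpSampling1D(2)(y)
        y = layers.Conv1D(f, 3, padding="same", activation="tanh")(y)
    y = layers.UpSampling1D(2)(y)
    y = layers.Conv1D(16, 3, padding="same", activation="tanh")(y)
    y = layers.UpSampling1D(4)(y)                           # 256 -> 1024
    out = layers.Conv1D(3, 25, activation="linear")(y)      # valid padding: 1024 -> 1000

    autoencoder = keras.Model(inp, out)
    autoencoder.compile(optimizer=keras.optimizers.Adam(1e-3), loss="mse")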

2.4 Clustering

For clustering tasks, one can choose from a large pool of clustering algorithms. Many of them are available in libraries: for example, the scikit-learn library [9] contains many different clustering algorithms. A popular approach is to find a probability distribution of a Gaussian mixture model, i.e., a finite weighted sum of multidimensional Gaussian distributions, from which the given data set is most likely sampled. Each Gaussian distribution is defined by a mean and a covariance matrix, which can typically be determined by an expectation-maximization algorithm [3]. The expectation-maximization algorithm alternates between two steps, the expectation and the maximization step. Initially, the means of the single Gaussian distributions are set randomly or by a more advanced strategy. Then the expected values of the weights of the single Gaussian distributions are computed in the expectation step. These get fixed to those expected values. The maximization step determines the expected values of the parameters of the single Gaussian distributions by maximizing the probabilities of the occurrences of the data points. Then the procedure repeats until a maximum number of iterations or a convergence threshold is reached. Once a Gaussian mixture model is built, depending on an input set and a predefined number of components, the input samples are assigned to one cluster: for each sample, the probabilities for each mixture component are computed, and the component with the highest probability defines the cluster for the sample.

In our experiments, we use Gaussian mixture models to find suitable clusters in the set of 16-dimensional representations of the input data, which are obtained by the trained encoder. For the analysis, we test different numbers of mixture components and evaluate the Bayesian information criterion (BIC) [12] to examine which number of components/clusters suits best. The Bayesian information criterion depends on the likelihood function of the Gaussian mixture model, the number of estimated parameters $k$ and the sample size $n$:

$$\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L})$$

Here, $\hat{L}$ is the maximized value of the likelihood function of the Gaussian mixture model. Note that for each component, the 16-dimensional mean and a 16 × 16-dimensional covariance matrix have to be computed. In general, a lower BIC is preferred.
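With scikit-learn [9], the model selection by BIC can be sketched as follows (the random codes stand in for the encoder outputs):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    codes = np.random.rand(7642, 16)      # stand-in for the 16-dim encoder codes

    bics = []
    for k in range(2, 31):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              random_state=0).fit(codes)
        bics.append(gmm.bic(codes))

    best_k = int(np.argmin(bics)) + 2     # offset: range starts at 2 components
    labels = GaussianMixture(n_components=best_k,
                             random_state=0).fit_predict(codes)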

3 Results and Interpretation

Convolutional Autoencoder. The training of the CNN-AE was done for 200 epochs on the full data set and on a fraction of the data set, i.e., the outliers (see Subsect. 2.2), respectively. The autoencoder trained on the full data set is named AE full and the one trained on the outlier set is named AE outlier. In both training experiments, the losses showed converging behaviour. For AE full, the overall loss was 0.1906 and the validation loss 0.1859. The training of AE outlier on the data set with only outlier samples delivered an overall loss of 0.3733 and a validation loss of 0.3850. These higher losses compared to those of AE full are caused by the generally higher amplitudes and greater diversity within the outlier samples. Thus, it is harder for the model AE outlier to reconstruct a sample. From an analytical point of view, it is also interesting how well the input data are reproduced by the autoencoders. As the code produced by the bottleneck is very low-dimensional in comparison to the input data, we can expect that the reconstruction will be smoother, resulting in further noise reduction. This is confirmed by the example in Fig. 5. In order to examine the bottleneck in more detail, the unit vectors of the 16-dimensional latent space are decoded with the trained decoder. For both autoencoders, we observe that each dimension of the bottleneck covers a certain range of frequencies, which is visualized in Fig. 6. In this way, the bottleneck acts as a band-pass or band-stop filter.


(a) Prediction after training of AE full, mean squared error of 0.2054 (0.1866, 0.2012, 0.2284). (b) Prediction after training of AE outlier, mean squared error of 0.2450 (0.2529, 0.2382, 0.2437).

Fig. 5. Comparison between input data and predictions of one sample (three channels). The mean squared errors of all three channels (and each single channel) are given in the sub-captions.

(a) Decoded unit vectors after training of AE full.

(b) Decoded unit vectors after training of AE outlier.

Fig. 6. Decoded unit vectors of the 16-dimensional latent space.


Clustering. Given the two trained autoencoders AE full and AE outlier, we cluster the codes of the input data by fitting a Gaussian mixture model with a predefined number of components. By varying this number and evaluating the BIC, it is possible to infer a suitable number of components. For the autoencoder AE full, the number of components is varied from 2 to 40. The resulting BIC curve is shown in Fig. 7a. There, it can be seen that the BIC more or less decreases with an increasing number of components. This suggests that the input data and its code do not form a clear cluster structure with a small number of clusters.

(a) BIC for AE full, code of the full data set as input set. (b) BIC for AE outlier, code of the outlier data set as input set. (c) BIC for AE full, code of the outlier data set as input set.

Fig. 7. Results of computing Gaussian mixture models.

In Fig. 7b, we visualize the BIC for Gaussian mixture models with 2 to 30 components computed from the code of the outlier data set obtained from the autoencoder AE outlier. From that, it follows that 9 components are a suitable choice. To compare both trained autoencoders, we additionally compute Gaussian mixture models for the codes of the outlier data set obtained by the encoder of AE full. Figure 7c reports the BIC curve for that setting. The smallest BIC was obtained for 7 components. For the settings where the number of components is small and the BIC is smallest, we further examine the corresponding Gaussian mixture models: to visualize the centers of the obtained clusters, the means of the best Gaussian mixture models are decoded by the decoder of the respective autoencoder (Fig. 8). The differences between those decoded clusters are slight and mainly noticeable in the overall amplitude and in the slope of the amplitude from lower to higher frequencies. In general, a broad spectrum with a steep amplitude slope at low frequencies corresponds to a short wavelet in the time domain that could indicate a short, isolated track irregularity. In contrast, a smoother slope could indicate an increase in rail roughness. It is important to mention that this interpretation needs to be verified by means of ground truth data. The clusters built on the code obtained by AE full show higher diversity due to the higher variation in the training set (Fig. 8b).

(a) Decoded centers of GMM with 9 components built on code obtained by AE outlier.

(b) Decoded centers of GMM with 7 components built on code obtained by AE full.

Fig. 8. Decoded centers of Gaussian mixture model built on code of outlier data set obtained by different autoencoders.

Plotting Cluster Assignments Back to the Railway Track. By means of georeferencing, the cluster assignment of each sample can be associated with a position on the corresponding track and can hence be visualized on the railway network map (Fig. 9). This information can be used to guide specific maintenance actions. Here, we use the cluster assignments of the outlier data set obtained by the Gaussian mixture model with 9 components, which was computed from the codes of the encoder of the trained model AE outlier. It can be seen that some clusters accumulate at certain sections of the network. In the north-west of the network, cluster 1 is predominant. In this area, different scrap and coal loading sites are situated. The loading leads to dirt on the track that increases the track roughness and can also lead to track defects. It still needs to be verified whether the clusters can be associated with specific track defects.

Fig. 9. Cluster assignments mapped on industrial railway network at the Braunschweig harbour.

4 Conclusion and Outlook

In this paper, we presented an approach to analyze real-world data, i.e., ABA data from train journeys on an industrial railway network, with signal pre-processing and unsupervised machine learning methods, in order to find irregularities on railway tracks. Our unsupervised machine learning approach based on deep learning, namely convolutional autoencoders, and Gaussian mixture models is capable of clustering ABA anomalies. The clustering algorithm was applied to the full data set and to outliers detected with signal processing methods only. Clustering the full data set did not lead to reliable results. In contrast, when only outliers were considered, an optimal number of clusters was found. Nevertheless, the differences between the clusters were small, which made the interpretation difficult. Therefore, validation with ground truth data is important and should be considered in the future. If the clusters match existing failure modes, this would also motivate adapting the presented CNN-AE methodology towards a supervised classification approach. Furthermore, it could be tested whether the clustering could be improved by integrating it into the deep learning architecture, i.e., using more advanced techniques such as deep embedded clustering, as, for example, in [14,15]. In this sense, one prospective aim is to unite all steps after the data acquisition, namely the pre-processing including denoising and a first outlier detection, the dimensionality reduction and the clustering, into one deep learning model. For this, a suitable architecture and loss function have to be found. For further research, it is also of interest to examine more consecutive time frames together, in order to track longer irregularities. From the methodological point of view, anomaly detection in sequential data is not yet well studied and offers the chance for developing new deep learning methods.

References

1. Baasch, B., Roth, M., Havrila, P., Groos, J.C.: Detecting singular track defects by time-frequency signal separation of axle-box acceleration data. In: WCRR 2019 (2019)
2. Chollet, F., et al.: Keras (2015). https://keras.io
3. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc.: Ser. B (Methodol.) 39(1), 1–22 (1977)
4. Dong, G., Liao, G., Liu, H., Kuang, G.: A review of the autoencoder and its variants: a comparative perspective from target recognition in synthetic-aperture radar images. IEEE Geosci. Remote Sens. Mag. 6(3), 44–68 (2018)
5. Groos, J.C., Havrila, P., Schubert, L.: In-service railway track condition monitoring by analysis of axle box accelerations for small to mid-size infrastructure operators. In: The British Institute of Non-Destructive Testing (ed.) Proceedings of WCCM 2017 Congress (2017)
6. Hinton, G.E., Zemel, R.S.: Autoencoders, minimum description length and Helmholtz free energy. In: Advances in Neural Information Processing Systems, pp. 3–10 (1994)
7. Li, Z., Molodova, M., Núñez, A., Dollevoet, R.: Improvements in axle box acceleration measurements for the detection of light squats in railway infrastructure. IEEE Trans. Ind. Electron. 62(7), 4385–4397 (2015)
8. Molodova, M., Li, Z., Núñez, A., Dollevoet, R.: Automatic detection of squats in railway infrastructure. IEEE Trans. Intell. Transp. Syst. 15(5), 1980–1990 (2014)
9. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
10. Roth, M., Baasch, B., Havrila, P., Groos, J.C.: Map-supported positioning enables in-service condition monitoring of railway tracks. In: International Conference on Information Fusion (FUSION), pp. 2346–2353 (2018)
11. Salvador, P., Naranjo, V., Insa, R., Teixeira, P.: Axlebox accelerations: their acquisition and time-frequency characterisation for railway track monitoring purposes. Measurement 82, 301–312 (2016)
12. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
13. Weston, P.F., Roberts, C., Goodman, C.J., Ling, C.S.: Condition monitoring of railway track using in-service trains. In: 2006 IET International Conference on Railway Condition Monitoring, pp. 26–31 (2006)
14. Xie, J., Girshick, R., Farhadi, A.: Unsupervised deep embedding for clustering analysis. In: International Conference on Machine Learning, pp. 478–487 (2016)
15. Yang, J., Parikh, D., Batra, D.: Joint unsupervised learning of deep representations and image clusters. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5147–5156 (2016)

UIC Code Recognition Using Computer Vision and LSTM Networks

Roberto Marmo

Computer Vision and Multimedia Lab, University of Pavia, Pavia, Italy
[email protected]
https://vision.unipv.it/

Abstract. The UIC code is a key piece of data for railway operations. This paper presents a method for UIC code recognition on locomotives and wagons. The approach is based on computer vision, to gain high-level understanding from digital images, and LSTM, a specific neural network with relevant performance in optical character recognition. Experimental results show that the proposed method has good localization and recognition performance in complex scenes, improving the logistics and safety of a railway infrastructure.

Keywords: OCR · Logistic · Wagon · UIC

1 Introduction

UIC codes are identifiers printed on railway wagons and a key piece of data for railway operations, because they help railway operators, infrastructure companies and transportation authorities identify and track railway wagons. UIC stands for the French name of the International Union of Railways (Union Internationale des Chemins de Fer); the number system is defined in UIC leaflet 920-14. The complete wagon number comprises 12 digits. The standard wagon number has the following template: [XXYYZZZZTTT-Q], for example 61838890999-5. The individual digits within the number have the following meaning:

– first and second position: interoperability code (on multiple units: type code);
– third and fourth position: owner's code (since 2006: UIC country code);
– fifth to eighth position: type number;
– ninth to eleventh position: serial number;
– twelfth position: self-check digit.

In the case of goods wagons, the UIC number is printed in three lines (web page [1] contains the UIC classification of goods wagons and their meanings); in other cases it is printed in one line. There is a variety of font shapes and colors. This number enables a railway wagon to be positively identified and forms a common language between railway operators, infrastructure companies and the responsible state authorities.


What do these trains carry? Are they on time? Are they where they should be? Are any carriages missing? To obtain these answers, which are relevant for logistics in rail transport, railway system operators must manually register the UIC codes of passing railway carriages (carrying cargo or passengers), but this human recording may introduce errors. With an automatic railway code recognition system, it is possible to answer these questions easily. Moreover, the gathered data can be automatically stored and processed for statistical and system management purposes. RFID identification requires installing chips on the wagons, whose production cost is high. Several approaches based on cameras, computers and digital image processing software have been proposed. Artificial intelligence is useful in this kind of recognition: the better the algorithms are, the higher the quality of the character recognition software is, the wider the range of picture quality it can handle, and the more tolerant it is against distortions of the input data. The rest of this paper is organized as follows. Section 2 reviews related research and technology on similar problems. Section 3 describes the proposed solution. Section 4 presents the experimental results. Section 5 suggests how to install the solution on a multifunctional railway portal or smartphone.

2 Related Research and Technology

A computer vision system [2] is composed of hardware, such as CCTV IP cameras and computers, and software, based on image processing and pattern recognition algorithms, ensuring a convenient environment for visual control. Optical character recognition (OCR) based on computer vision is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, or a scene photo. The technologies applied to number recognition on goods wagons are inspired by automatic license plate recognition (ALPR), a technology used in all fields where car information is recognized. This topic has been extensively researched worldwide to improve the performance of ALPR in real-world scenarios. Current ALPR algorithms achieve exemplary performance in controlled environments; however, performance decreases when dealing with complex scenes. The current challenges in ALPR include multinational ALPR and dealing with uncontrolled conditions such as uneven illumination, weather (snow, fog, rain, etc.), image distortion, image blurring, occlusions, etc. [3–5]. With the development of image processing and pattern recognition technology, the recognition of railway wagon numbers based on image processing methods has also been applied in rail transport. However, the application of similar systems at railway stations and terminals implies significant adaptation difficulties, since the following challenges have to be addressed [6]: variety of color combinations of digits and background, presence of various digit outlines, various kinds of contamination of the object's surface, and the necessity to operate in both artificial and natural lighting. In this sense, the common recognition


methods based on template matching and pattern recognition are unable to recognize the characters of wagon IDs. The localization methods for car license plates are mostly based on color, texture and edge methods. As opposed to license plate recognition systems, designed for scanning quite compact and clear-cut license plates, the UIC code can exist anywhere on the wagon surface, which increases the processing time. Industrial solutions in railway monitoring are available in the global market for goods wagons with white characters and black, gray or red background [7–10]. The approaches in railway monitoring can be broadly divided into two main categories: traditional image processing methods [11, 12] and deep learning methods. Railway oil tank wagon ID recognition in the industrial scene is discussed in [11], based on maximum stable extremal regions to obtain the extremal regions containing the characters, in order to overcome the interference from camera installation angles. This approach makes use of a four-point correction method to correct ID areas, an improved projection method to segment characters, and the Tesseract-OCR text recognition classifier to recognize ID characters. Experimental results show that the proposed method has good localization and recognition performance for railway oil tank wagon IDs in the complex industrial scene. The scale-invariant feature transform (SIFT) is a feature detection algorithm in computer vision to detect and describe local features in images [2]. The probability that a set of features indicates the presence of an object is computed, given the accuracy of fit and the number of probable false matches. As a result of the comparison, if there exist any matching areas, it is possible to check whether that area contains a wagon number with 12 digits or not [12]. Moreover, these works are related to generic number recognition. The main contribution of this paper concerns the specific context of the UIC code, where artificial intelligence is used to significantly improve the results and increase their reliability.

3 Proposed Solution

The suggested methodology is based on the LSTM neural network model and the Tesseract OCR library. Neural networks, or connectionist systems, are computing systems vaguely inspired by the biological neural networks that constitute animal brains [14]; they are therefore part of Artificial Intelligence (AI). Such systems learn to perform tasks by considering examples, generally without being programmed with task-specific rules; this is also called machine learning. Humans do not start their thinking from scratch every second: you do not throw everything away and start thinking from scratch again, your thoughts have persistence. Traditional neural networks cannot do this [14], which is a major shortcoming. Recurrent neural networks (RNNs) address this issue [15]. They are neural networks with loops in them, allowing information to persist. Long Short-Term Memory (LSTM) is a kind of RNN which works, for many tasks, much better than the standard version, because it is capable of learning long-term dependencies [16]. Unlike standard feedforward neural networks, LSTM has feedback connections; it can not only process single data points, but also entire sequences of data. It is


composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell. Tesseract OCR [13] is an optical character reading engine developed by HP laboratories in 1985 and open sourced in 2005; since 2006 it has been developed by Google. Tesseract has Unicode (UTF-8) support and can recognize more than 100 languages "out of the box", and thus can also be used for building scanning software for different languages. The latest Tesseract version is Tesseract 4. Tesseract works best when there is a very clean segmentation of the foreground text from the background. Tesseract does not do a very good job with dark boundaries and often assumes them to be text; if we help Tesseract a bit by cropping out the text region, it gives perfect output. In practice, it can be extremely challenging to guarantee good segmentation using image processing. Tesseract provides an OCR engine based on LSTM neural networks. Based on these assumptions, the proposed approach for UIC recognition consists of the following steps:

1. Color image acquisition;
2. Luminosity and contrast enhancement;
3. Thresholding to obtain a B/W image;
4. Locating text-line regions on the wagon;
5. Image correction techniques;
6. Connected component analysis;
7. Morphological transformations;
8. Tesseract OCR;
9. Reconstruction of the specific UIC code;
10. Computation of the self-check digit.

Step (1) regards collecting color images using a camera, as suggested in Sect. 5. The output of this stage is a wagon image. Step (2) enhances the edges along characters and increases the difference between adjacent colors, so that the characters stand out better from the background, by increasing contrast by 50% and reducing luminosity by 40%. Step (3) performs a conversion to a gray image and, subsequently, applies a threshold value of 127 to obtain a B/W image with white characters and a black background. Step (4) regards locating text-line regions on the wagon. This is challenging because of the variety of positions, colors of characters and backgrounds, and font types and sizes. A method related to license plates is proposed in [17], where image edges are extracted and projected horizontally with a Gaussian smoothing filter. The positions of the local maxima and minima of the smoothed histogram are found; from each local maximum, the top and bottom position of each text-line region can be obtained. Vertical and horizontal projections, connected component analysis [18], or contour analysis [19] are applied to obtain the position of each character. Step (5) regards angle correction and image correction techniques used to increase the recognition rate. A realignment algorithm for irregular character strings on color


documents, such as inclined or curved texts, sometimes with distortion, is proposed in [20]. Step (6) performs noise reduction and discards objects not useful for OCR, based on connected component analysis of binary object measurements [21], using geometric information such as height, width, area and aspect ratio of each connected component. In order to prepare the image for OCR, a set of thresholds is applied to these measurements, based on the typical values assumed by characters, to remove non-text components. Step (7) is based on morphological transformations [21], which are simple operations based on the image shape. Erosion and dilation are applied with a 5 × 5 kernel. In this way, characters are detached and spurious white pixels are removed. Step (8) is computed using Tesseract; as mentioned before, all detected characters are extracted as output. Step (9) discards characters and digits not involved in the UIC code using a regular expression [22], that is, a sequence of characters that defines a search pattern. Usually such patterns are used by string searching algorithms for "find" or "find and replace" operations on strings, or for input validation; it is a technique developed in theoretical computer science and formal language theory. As an example, \d{4}[ ]\d{2}[-]\d{2}[ ]\d{3}[-]\d{1} can detect 6183 88-90 999-5. Some small variations are considered, such as 61838890999-5. Step (10) computes the self-check digit, which can detect errors in code recognition. It is based on the Luhn algorithm: the check digit is derived from the sum of the digits that arise when the digits are alternately multiplied by 2 and 1, and the difference between this sum and the next multiple of ten produces the check digit. This step also deals with the selection of unique codes and the deletion of redundant ones. Experiments are conducted with the Python 3 programming language, the OpenCV library [23] for image processing and computer vision in steps (1)–(7), and the PyTesseract library in step (8).
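As an illustration of the pipeline, the following is a minimal Python sketch of steps (2), (3), (7), (8), (9) and (10), assuming the OpenCV and PyTesseract libraries mentioned above; the enhancement parameters and helper names are illustrative, not the paper's exact implementation.

```python
import re
import cv2
import pytesseract

# Regular expression from step (9).
UIC_PATTERN = re.compile(r"\d{4}[ ]\d{2}[-]\d{2}[ ]\d{3}[-]\d{1}")

def check_digit(first11: str) -> int:
    # Luhn-style check from step (10): alternately multiply the digits by
    # 2 and 1, sum the digits of the products, and take the distance to
    # the next multiple of ten.
    total = 0
    for i, ch in enumerate(first11):
        d = int(ch) * (2 if i % 2 == 0 else 1)
        total += d // 10 + d % 10
    return (10 - total % 10) % 10

def recognize_uic(path: str):
    img = cv2.imread(path)                               # step (1)
    img = cv2.convertScaleAbs(img, alpha=1.5, beta=-40)  # step (2)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, bw = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)  # step (3)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    bw = cv2.dilate(cv2.erode(bw, kernel), kernel)       # step (7)
    text = pytesseract.image_to_string(bw)               # step (8)
    codes = []
    for m in UIC_PATTERN.finditer(text):                 # step (9)
        digits = re.sub(r"\D", "", m.group())            # keep the 12 digits
        if check_digit(digits[:11]) == int(digits[11]):  # step (10)
            codes.append(digits)
    return codes
```

For the example number 6183 88-90 999-5, the check over the first eleven digits yields 5, which matches the printed check digit.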

4 Experimental Results

50 images extracted from complex scenes were collected from web sites, social media and a personal collection; they were manually verified and annotated. There are 10 locomotives, 20 freight wagons and 20 passenger wagons. Data augmentation is the technique of increasing the size of the data used for training a model; it is useful to generate new images and thereby increase the amount of data. The transforms are:

• position augmentation: scaling, cropping, flipping, padding, rotation, translation;
• color augmentation: brightness, contrast, saturation, hue.

In this way, by varying the parameters, 20 images can be obtained from each image, resulting in a total of 1,000 images with characters on a lighter background or vice versa. Figure 1 shows some input images with different visual features.
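The following is a minimal sketch of such augmentation transforms, assuming OpenCV; the parameter ranges and the input file name are illustrative, not the values used to produce the data set.

```python
import cv2
import numpy as np

def augment(image: np.ndarray) -> np.ndarray:
    h, w = image.shape[:2]
    # Position augmentation: small random rotation and translation.
    angle = np.random.uniform(-10, 10)
    tx, ty = np.random.uniform(-0.05, 0.05, size=2) * (w, h)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    M[:, 2] += (tx, ty)
    image = cv2.warpAffine(image, M, (w, h))
    # Color augmentation: random brightness and contrast.
    alpha = np.random.uniform(0.8, 1.2)   # contrast gain
    beta = np.random.uniform(-30, 30)     # brightness offset
    return cv2.convertScaleAbs(image, alpha=alpha, beta=beta)

# Twenty augmented variants of one (hypothetical) wagon image.
augmented = [augment(cv2.imread("wagon.jpg")) for _ in range(20)]
```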


Fig. 1. Some input images, with different rotation, background and character colors, orientation and inclination of characters.

1,000 images of various trains without UIC codes were also considered; no UIC code was detected in them. Image parameters: RGB color, 1024 × 768 pixels, JPEG file format. The experiments show an accuracy of 97%. The worst recognition performance regards blurred and partially occluded characters, low contrast, corrupted shapes and closely arranged characters. Given a frontal camera position and suitable background and character colors, this approach can detect the code area accurately; otherwise, the recognition rate is reduced. Surveys performed at the commission of the Canadian Ministry of Transport reveal an accuracy of slightly above 90% [24] for systems in the global market; therefore, the results are comparable with these systems. These experiments were implemented in Python 3.6.5 on a traditional computer without GPU or hardware accelerator, and timing measurements reveal a processing time of a few seconds. The UIC code can be recognized on wagons moving in both directions, with a recording range of 5 to 10 m in width and speeds between 2 and 15 km/h. Figure 2 shows the input and output images obtained from the previous steps.


Fig. 2. From left to right, top to bottom: original image, rotated image, B/W image, morphological operations, Tesseract OCR, UIC code detected.

5 Installation

The acquisition parameters of the camera, such as the type of camera, camera resolution, shutter speed, orientation and light, must be considered [5]. A multifunctional railway portal can contribute to improving the safety of a railway infrastructure [10]. The portal can detect in real time the conformity of the trains traveling along the tracks and can transfer the status information to a traffic control center [25]. A digital camera can be placed on this kind of portal to monitor the UIC code in real time during transit; it is possible to examine both sides of the train at the same time. Additional structured lighting, such as an LED optical element, ensures high illumination uniformity across the whole field of vision [6]. With ongoing advances in mobile camera technology, smartphones are being increasingly used to capture images of documents in a variety of consumer and business applications [26]. Examples of such documents include bank applications, checks, insurance claims, receipts, etc. A variety of mobile apps for document scanning are available in the market. These apps return as output all the text located in images, without the filtering suitable for railway applications. The proposed software can be installed in a mobile application for capturing images of printed multi-page documents with a smartphone camera. Due to the availability of Tesseract (as the Tess4J API) and OpenCV libraries in the Java language and the Android architecture, the computation can be performed on the smartphone as an Android app; it is not necessary to transfer images from the phone client to a cloud service.


It is important to place the smartphone in front of the wagon, with vertical alignment, in order to detect a wide area. Any rotation or skew must be corrected [26]. An additional LED spotlight can be added to the smartphone to highlight the wagon. Human operators need to place the smartphone on a photographic easel and press a button to start the app when the train is approaching the platform. Another acquisition mode consists of capturing single images by a human operator walking along the train. The output file can be sent via Wi-Fi or SMS to the traffic control room. New design solutions based on the integrated, coordinated design of all hardware and software components make it possible to increase the acceptable speed of train passage through these checkpoints. Using two cameras, it is possible to examine both sides of the train at the same time, in order to correct errors.

6 Conclusions

This paper describes an algorithm based on computer vision and an LSTM neural network, which makes possible the recognition of the UIC code from a wagon image, an important part of modern transportation control systems. The high performance and strong practicability show that the proposed method can satisfy the real-time requirements of practical railway applications. This approach can also be easily adapted to a multifunctional portal or a smartphone app. In future work, container code recognition will be considered for logistics management.

References

1. UIC classification of goods wagons. https://en.wikipedia.org/wiki/UIC_classification_of_goods_wagons; UIC wagon numbers. https://en.wikipedia.org/wiki/UIC_wagon_numbers. Accessed 01 June 2020
2. Davies, E.R.: Computer Vision: Principles, Algorithms, Applications, Learning. Academic Press, Cambridge (2017)
3. Castro-Zunti, R.D., Yépez, J., Ko, S.-B.: License plate segmentation and recognition system using deep learning and OpenVINO. Intell. Transp. Syst. 14(2), 119–126 (2020)
4. Henry, C., Yoon, A.S., Lee, S.-W.: Multinational license plate recognition using generalized character sequence detection. IEEE Access 8, 35185–35199 (2020)
5. Du, S., Ibrahim, M., Shehata, M., Badawy, W.: Automatic license plate recognition (ALPR): a state-of-the-art review. IEEE Trans. Circuits Syst. Video Technol. 23(2), 311–325 (2013)
6. Kazanskiy, N.L., Popov, S.B.: Integrated design technology for computer vision systems in railway transportation. Pattern Recogn. Image Anal. 25(2), 215–219 (2015)
7. Carmen® UIC code recognition. https://adaptiverecognition.com/products/carmen-uic-railway-code-recognition/. Accessed 01 June 2020
8. MULTIRAIL® IDentify: fully automatic railcar number recognition. https://www.schenckprocess.com/products/multirail-identify. Accessed 01 June 2020
9. Recognition of UIC wagon/coach numbers. http://www.ocrtech.com/uic_recognition.html. Accessed 01 June 2020
10. Rail OCR Portal. https://www.camco.be/products/rail-ocr-portal/. Accessed 01 June 2020


11. Xiang, X., Yang, F., Wang, M., Bao, W., Sheng, Y.: ID localization and recognition for railway oil tank wagon in the industrial scene. In: Proceedings of the 12th World Congress on Intelligent Control and Automation (WCICA), pp. 826–829 (2016)
12. Sanver, U.: Identification of train wagon numbers. In: Proceedings of the IEEE NW Russia Young Researchers in Electrical and Electronic Engineering Conference, pp. 63–68 (2014)
13. Smith, R.: An overview of the Tesseract OCR engine. In: Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR), vol. 2 (2007)
14. Aggarwal, C.C.: Neural Networks and Deep Learning: A Textbook. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94463-0
15. Mandic, D., Chambers, J.A.: Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability. Wiley, Toronto (2001)
16. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
17. Abolghasemi, V., Ahmadyfard, A.: An edge-based color-aided method for license plate detection. Image Vis. Comput. 27, 1134–1142 (2009)
18. Li, G., Zeng, R., Lin, L.: Research on vehicle license plate location based on neural networks. In: International Conference on Innovative Computing, Information and Control, vol. 3, pp. 174–177 (2006)
19. Anagnostopoulos, C., Anagnostopoulos, I., Loumos, V., Kayafas, E.: A license plate-recognition algorithm for intelligent transportation system applications. IEEE Trans. Intell. Transp. Syst. 7, 377–392 (2006)
20. Hase, H., Yoneda, M., Shinokawa, T., Suen, C.Y.: Alignment of free layout color texts for character recognition. In: Proceedings of the Sixth International Conference on Document Analysis and Recognition, pp. 932–936 (2001)
21. Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Prentice Hall, Upper Saddle River (2007)
22. Goyvaerts, J., Levithan, S.: Regular Expressions Cookbook. O'Reilly Media, Sebastopol (2012)
23. Brahmbhatt, S.: Practical OpenCV. Apress, New York (2013)
24. AEI/OCR system verification. Technical report no. TP 14143E, DTI Telecommunications (2003)
25. Bersani, C., Guerisoli, C., Mazzino, N., Sacile, R., Sallak, M.: A multi-criteria methodology to evaluate the optimal location of a multifunctional railway portal on the railway network. J. Rail Transp. Plann. Manage. 5, 78–91 (2015)
26. Pundlik, S., Singh, A., Baghel, G., Baliutaviciute, V., Luo, G.: A mobile application for keyword search in real-world scenes. IEEE J. Transl. Eng. Health Med. 7, 1–10 (2019)

Deep Reinforcement Learning for Solving Train Unit Shunting Problem with Interval Timing

Wan-Jui Lee¹, Helia Jamshidi², and Diederik M. Roijers³,⁴

¹ R&D Hub Logistics, Dutch Railways, Utrecht, The Netherlands [email protected]
² CiTG, TU Delft, Delft, The Netherlands [email protected]
³ Microsystems Technology, HU University of Applied Sciences Utrecht, Utrecht, The Netherlands [email protected]
⁴ AI Research Group, Vrije Universiteit Brussel, Brussels, Belgium

Abstract. The Train Unit Shunting Problem (TUSP) is a hard combinatorial optimization problem faced by the Dutch Railways (NS). An earlier study has shown the potential to solve the parking and matching sub-problem of TUSP by formulating it as a Markov Decision Process and employing a deep reinforcement learning algorithm to learn a strategy. However, the earlier study did not take into account service tasks, which is one of the key components of TUSP. Service tasks inject additional time constraints, making it an even more challenging application to tackle. In this paper, we formulate the time constraints of service tasks within TUSP to enable deep reinforcement learning. Using this new formalization, we compare two learning strategies, DQN and VIPS, to evaluate the most suitable one for this application. The results show that by assigning extra triggers to agents at fixed time intervals, the agent accurately learns based on VIPS to send the trains to the service tracks in time to comply with the departure schedule.

Keywords: Train Unit Shunting · Deep reinforcement learning · Value iteration

1 Introduction

The Dutch Railways (NS) operates 4,800 domestic trains daily, serving more than 1.2 million passengers each day. When trains are temporarily not needed to transport passengers, they are maintained and cleaned at dedicated shunting yards. Here, NS is dealing with so-called shunting activities [2]. The Train Unit Shunting Problem (TUSP) is a computationally hard sequential decision-making problem. At a service site, train units need to be inspected,


cleaned, repaired and parked during the night. Furthermore, specific types of train units need to depart at specific times. Together, these requirements and constraints make the TUSP a highly challenging scheduling and planning problem. In 2018, NS started to explore the possibility of solving the TUSP with Deep Reinforcement Learning [5], which has shown great potential in solving the parking problem using Deep Q-learning. In [5] the authors only focused on matching incoming trains to outgoing trains and sending trains to parking locations. However, the service tasks, which form a key component and often are the bottleneck, were not yet taken into account. More generally, time constraints were out of scope. For a complex planning and scheduling problem like TUSP, a proper design of the environment, reward functions, state representation and actions is critical to keep the problem not only tractable, but learnable at all. To better understand and monitor the learning of an agent on the TUSP problem, we keep track of specific movements and violations during training. For instance, we monitor correct departures, sending trains to the service track if there is an undone service task, and parking at tracks with sufficient space. These visualizations and this monitoring of the agent behaviors have helped a lot in iteratively improving the design of the agent, as well as of the environment representing the TUSP problem. We believe that this monitoring strategy will also be helpful for other real-world applications that are as complex as TUSP. In the following sections, we first describe the TUSP problem and its environment in detail, followed by the Deep Reinforcement Learning (DRL) methods our agents employ. Afterwards, we empirically compare different learning methods. We close with a discussion of the effectiveness of our formulation and learning methods.

2 Train Unit Shunting Problem (TUSP)

First, let us discuss an example to get a feel for what TUSP problems are. Figure 1 and Fig. 2 illustrate a small TUSP problem. Figure 1 shows a service site. Trains enter and exit the site over gateway track G and can be parked on tracks 1 to 4. The tracks are connected by the switches a and b. The length of the parking tracks is displayed in meters. A cleaning platform allows internal cleaning tasks to be performed on trains positioned on track 2. Figure 2 is a planning instance. Figure 2(b) shows the lists of incoming and departing trains together with their arrival and departure times. The departure trains specify the composition of subtypes instead of the train units, since the assignment is part of the matching problem. The ordering of the train units or subtypes indicates, from left to right, the order of the train units or subtypes in the train on the service site. Figure 2(a) describes the type of service tasks required for specific train units and the duration of such service tasks. The goal is to find a feasible shunting plan to move train units onto a shunting yard and make sure they can be serviced and parked in suitable locations, and subsequently depart at the desired time.


Please note that this example is only for illustration purposes, and real shunting yards and problem instances faced by NS are larger and more complex.

Fig. 1. An example of a service site.

(a) Train units

(b) Arrivals and departures

Fig. 2. An example scenario

2.1 Design of the TUSP Environment

The design of the environment is essential for an agent to solve the TUSP problem effectively and efficiently. Specifically, we need to encode the essential components of the sequential decision-making process of making a shunting plan in a compact, yet informative way. The process starts with a train arrival at the gate track, and a decision needs to be made to move the arriving train immediately from the gate track to a parking track. If the train requires certain services, it will need to be moved to the corresponding service track from its current position before its expected departure time. If no service is needed, the train will park on a parking track and wait for departure. At a certain moment, a train with a specific type and composition will be requested for departure, and the chosen train will move from the parking track to the gate track to depart. When all arriving trains are handled properly and all departing trains depart in time with all service done, the planning process terminates and all the decisions form a feasible solution. Therefore, the TUSP environment has to provide (1) proper triggers to indicate to agents when to take an action, (2) a proper action set to allow agents to make the necessary decisions, (3) rewards and punishments to teach agents whether an


action is good or bad, and (4) the yard status and expected train arrivals/departures to provide agents with sufficient information for decision making. All these elements are introduced in the following.

Triggers. The environment maintains a trigger list in which each trigger has a countdown timer indicating how much time is left until its execution. After a trigger is processed, it is deleted from the trigger list. The types of trigger events include:

1. An Arrival trigger is described by the train and time. When it is processed, the corresponding train appears on the gate track.
2. A Departure trigger is described by the train type and time.
3. An End-of-Service trigger is described by the train unit and time. It can only be added to the trigger list by starting a service.
4. An End-of-Movement trigger is added after each movement state change. When End-of-Movement is triggered, the agent can take a new action.

These trigger events alone are not sufficient for the agent to learn to send trains to service. Therefore, we propose the wait action to wait until the next trigger. In this setting, the wait action can result in time being moved forward by anywhere from a minute to several hours. While full information on how far away the next trigger is, is encoded in the state, in early training the agent cannot exploit that information. This makes the outcome of the wait action highly unpredictable, and the agent is less likely to learn it, despite it being crucial to solving an episode. To address this phenomenon, it is best to make a trade-off between uniform and reactive sampling of time. This can be achieved by limiting the maximum interval between triggers: if the duration between triggers is long, extra triggers are sampled at a uniform time interval. It should be mentioned that this condition does not change or enhance the optimal solution to the planning problem; however, it proved to push the training of the agent in the direction of the optimal policy.

Action Space. In this work, the action space consists of track-to-track movements, plus the wait action. No combination or splitting actions of train units are considered in this study, and the compositions of arrival and departure trains are assumed to remain the same. Each movement has a start track and an end track. Movement actions only exist when these tracks are inherently connected; hence the agent can never violate routing between tracks. We note that this holds given that no simultaneous movements can be performed. Since all track pairs (start track to end track) are connected to each other with either a movement to the right or a movement to the left, there can be a maximum of one train that can move from the start track to the end track. Wait actions forward the state of the yard in time to a fixed time interval, the next train arrival, a required train departure, or the end of service (cleaning) of a train. Via this action space, the agent has the freedom to move each train around. Note that if a train unit has an undone service task and is sent to the corresponding service track, the service task is assumed to start automatically after the movement.
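As a minimal illustration of the trigger-interval capping described under Triggers above, the following Python sketch inserts extra, uniformly spaced triggers between distant ones; the function name and the 30-minute cap are illustrative assumptions, not values from the paper.

```python
def cap_trigger_intervals(trigger_times, max_interval=30):
    """Given sorted trigger times (in minutes), insert extra triggers so
    that no two consecutive triggers are more than max_interval apart."""
    capped = []
    for prev, nxt in zip(trigger_times, trigger_times[1:]):
        capped.append(prev)
        t = prev + max_interval
        while t < nxt:
            capped.append(t)   # extra, uniformly spaced trigger
            t += max_interval
    capped.append(trigger_times[-1])
    return capped

print(cap_trigger_intervals([0, 10, 100], max_interval=30))
# [0, 10, 40, 70, 100]
```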


Rewards and Violations. When the agent chooses an action that causes one of the following violations, the episode ends:

1. Choosing a start track that is empty and therefore has no train to move.
2. Choosing to wait when there is an arrival or departure train.
3. Parking a train on a gate track or relocation track.
4. Choosing the wrong time or a wrong train type for departure.
5. Choosing a train with undone service for departure.
6. Moving a train while in service.
7. Violating the track length.
8. Missing a departure or arrival while doing other movements.

For actions that result in certain situations, the following rewards are given:

– Moving to the relocation track: −0.3
– Moving to the service track while no service is required: −0.5
– Right departure: +2.5
– Wait for service to end: +(duration in minutes)/60
– End service: +(duration in minutes)/60
– Find a solution: +5
– Others: 0

State Representation. For the state representation, all data describing train arrival and departure schedules, and the required service time on each track, is incorporated. The current positions and order of trains on tracks are also included. The type of a train unit is not communicated directly in the state; however, to match leaving train units with the required train type, for each train in the yard and each train which has not arrived yet, the opportunities when this train can leave the yard are indicated. To summarize, the following information is encoded into the state representation and then concatenated into a vector:

– Position of train units on the track.
– Required service time of train units.
– Whether a train unit is under service.
– Length of train units.
– Time to arrival of train units.
– Is it the arrival time of a train unit?
– Next 3 departure opportunities of the same train type.
– Is it the departure time of the same train type?

3 Deep Reinforcement Learning for TUSP

In this section, we describe our methodology for learning initial solutions (i.e., policies) for the improved formulation of the TUSP, as described in the previous section. As mentioned earlier, using DRL methods to solve the parking and matching subproblem of TUSP has been proposed before [5]. However, the new TUSP


formulation is significantly harder, due to the inclusion of temporal constraints as well as service tasks. Consider the general setting shown in Fig. 3 where an agent interacts with an environment. At each time step t, the agent observes a state signal st , and performs an action at . Following the action, the state of the environment transitions to st+1 and the agent receives reward rt . We note that in our formulation, the state signal is sufficiently Markovian to model the problem as fully observable, i.e., the state transitions and rewards can be stochastic and are assumed to have the Markov property; i.e. the state transition probabilities and rewards depend only on the state of the environment st and the action taken by the agent at . It is important to note that the agent can only control its actions, and has no prior knowledge of which state the environment would transition to or what the reward may be. By interacting with the environment, during training, the agent can learn about these quantities. The of learning is to maximize goal ∞ the expected cumulative discounted reward: E[ t=0 γ t rt ], where γ ∈ (0, 1] is the factor discounting future rewards. In our application γ is set to 0.99.

Fig. 3. Interaction of the DRL agent with the environment

The agent picks actions based on a policy, defined as a probability distribution over actions: $\pi(s, a)$ is the probability that action $a$ is taken in state $s$. As there is at least one optimal deterministic stationary policy in a fully observable infinite-horizon discounted MDP [3] (which our TUSP formulation is), policies are often written as a mapping from states to actions, $\pi(s)$. In many popular reinforcement learning methods [8], this policy is derived from a state-action-value function $Q(s, a)$, i.e., $\pi(s) = \arg\max_a Q(s, a)$. However, in most problems of practical interest, there are many possible (state, action) pairs. Hence, it is impossible to store the state-action-value function in tabular form, and it is common to use function approximators. A function approximator has a manageable number of adjustable parameters, $\theta$; we refer to these as the policy or network parameters. The justification for approximating the Q-function is that similar states should have similar state-action values. Deep Neural Networks (DNNs) are a popular choice and have been shown to be usable as Q-function approximators for solving large-scale reinforcement learning tasks [4]. An advantage of DNNs is that they do not need hand-crafted features.

3.1 DQN

In the previous study [5], the Deep Q-Network (DQN) [4] was adopted to solve the parking and matching subproblem of the TUSP. The architecture is divided into two parts. First, a series of convolutional layers learns to detect increasingly abstract features based on the input. Then, dense layers map the set of features present in the current observation to an output layer with one node for every action possible in the environment. The Q-values correspond to how good it is to take a certain action given a certain state; this can be written as $Q(s, a)$. Deep neural networks can be adopted as a function approximator for the Q-values of each (state, action) pair. This enables generalization from seen states to unseen states. For their subproblem of TUSP, [5] needed to make slight changes to standard DQN. The reason for this is that many actions are invalid at given timesteps. As DQN needs an equal number of outputs for each timestep, and must learn not to perform invalid actions, taking an invalid action is modelled as receiving a highly negative reward and ending the episode. However, because the action space in TUSP is very large, this preempts efficient learning: invalid actions are taken too often, so actually ending an episode with only valid actions becomes a rare occurrence under an ε-greedy exploration strategy. Therefore, the first three times an invalid action is chosen, the state is reverted to the previous state (before the offending action), and the strategy is switched to greedy, excluding the invalid action that was just taken. Note that the greedy action according to the network might also be invalid; in this case, the second-best action according to the action-value network Q is chosen. [5] found that this was an essential adaptation to make DQN work for their subproblem of TUSP. Because our formulation of TUSP has similar action spaces, but is even more complex (i.e., has more challenging constraints), we make use of this adaptation as well. A sketch of this action-selection adaptation is given below.
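The following is a minimal sketch of the adaptation, assuming a network `q_net(state)` that returns one Q-value per action and an environment hook `env.is_valid(state, action)`; both names are illustrative, and reverting to the previous state is simplified to never advancing the state on an invalid choice.

```python
import numpy as np

def pick_action(q_net, env, state, max_invalid=3):
    """Greedy action selection that tolerates up to `max_invalid` invalid
    choices by excluding them and retrying the next-best action."""
    q_values = np.asarray(q_net(state))
    excluded = set()
    for _ in range(max_invalid + 1):
        candidates = [a for a in range(len(q_values)) if a not in excluded]
        action = max(candidates, key=lambda a: q_values[a])
        if env.is_valid(state, action):
            return action
        # The state is never advanced, which plays the role of reverting
        # to the pre-action state; exclude the offending action and retry.
        excluded.add(action)
    return action  # still invalid: the environment ends the episode
```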

3.2 Value Iteration with Post-decision State (VIPS)

In DQN, the agent had difficulty handling the transition function for TUSP. This is mainly due to the large action space and the many invalid actions (leading to terminal states), which made learning slow. As our TUSP formulation is more complex, we found that DQN became too slow again. However, there are some observations we can exploit. In general, the transition function determines how likely it is that we reach the next state $s_{t+1}$. However, in a deterministic problem like TUSP, if we are in $s_t$, the probability of reaching a certain next state given an action is either zero or one. Furthermore, the transition function is known to us. Therefore, rather than using a Q-learning algorithm, we can employ a planning algorithm that exploits this. We extend value iteration for TUSP to show the deep learning agent the deterministic transition function in advance. It should be emphasized, though, that unlike DQN, after the agent has converged in value iteration, it cannot be used independently of the environment simulator to pick actions, as it requires the transition function to operate.


Value Function. The value function $V(s)$ measures how good it is to be in a specific state. By definition, it is the total expected discounted reward collected when following the specific policy:

$$V^{\pi}(s_t) = \sum_{t'=t}^{T} \mathbb{E}_{\pi_\theta}\left[ \gamma^{t'-t} \, r(s_{t'}, a_{t'}) \mid s_t \right] \qquad (1)$$

Approximate Value Iteration. First, dynamic programming can be used to calculate the optimal value function $V^*$ iteratively:

$$V^*(s) = \max_a \mathbb{E}\left[ r_{t+1} + \gamma V^*(s_{t+1}) \mid s_t = s, a_t = a \right] \qquad (2)$$

Then, the value function can be used to derive the optimal policy.

The Post-decision State Variable. The post-decision state variable is the state of the system after we have made a decision (deterministic) but before any new information (stochastic in general) has arrived. For a wide range of applications, the post-decision state is no more complicated than the pre-decision state, and for many problems it is much simpler [6]. For the remainder of this section, $s_t$ is the pre-decision state variable and $s_t^a$ is the post-decision state variable; in the case of deterministic information, or when using a forecast for the information, $s_t^a$ is the same as $s_t$.

3.3 Training Procedures of DQN and VIPS

In this paper, we propose Value Iteration with Post-decision States (VIPS) as a method to solve TUSP. The main difference between VIPS and DQN lies in the action selection. Specifically, VIPS either takes an exploratory action, or the greedy action with respect to the sum of the immediate reward and the value of the next state. Note that the agent takes exploratory actions (even though it is a planning problem) so that the function approximator learns to generalize across more of the reachable state space. The target value is now based upon the VI Bellman backup rule [1] rather than the Q-learning update rule [9].
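The following sketch contrasts the two target computations, assuming a Q-network `q_net`, a value network `v_net`, and a simulator hook `env.simulate(state, action)` that exposes the deterministic TUSP transition function; all names are illustrative, not the paper's implementation.

```python
import numpy as np

GAMMA = 0.99  # discount factor used in this application

def dqn_target(q_net, reward, next_state, done):
    # Q-learning backup: r + gamma * max_a' Q(s', a').
    if done:
        return reward
    return reward + GAMMA * np.max(q_net(next_state))

def vips_target(v_net, env, state, valid_actions):
    # Value-iteration (Bellman) backup exploiting the known deterministic
    # transition: V(s) <- max_a [ r(s, a) + gamma * V(s^a) ], where s^a is
    # the post-decision state reached by action a.
    best = -np.inf
    for a in valid_actions:
        next_state, reward, done = env.simulate(state, a)
        value = reward if done else reward + GAMMA * v_net(next_state)
        best = max(best, value)
    return best
```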

4 Experiment Setup

4.1 Instance Generation

To test the performance of the DRL models in various scenarios, 5,000 problem instances are generated for 4, 5, 6 and 7 trains separately, resulting in a total of 20,000 problem instances. From these 20,000 problem instances, 1,000 are randomly withdrawn as test instances, while the rest are used for training the DRL agents. The shunting yard studied in this work is 'de Kleine Binckhorst', which is one of the smallest yards of NS. In all instances, at least one internal cleaning task is assigned.

4.2 Neural Network Architecture

For both DQN and VIPS, dense neural networks are constructed. The input to the neural network is a 1610-dimensional vector. The first and second hidden layers are fully connected layers with 256 and 128 units, respectively, that use the ReLU activation function. The difference between DQN and VIPS in the neural network architecture lies in the output layer: DQN has a fully connected linear layer with 53 outputs (one for each action), while VIPS has a fully connected linear layer with 1 output. The other hyperparameters for DQN and VIPS are the same, with a batch size of 128, a memory length of 400,000, random exploration of 400,000 steps, a discount factor of 0.99, and the ADAM optimizer. A sketch of the two network variants is given below.
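This is a minimal sketch assuming the Keras API; the layer sizes follow the description above, while everything else (loss, variable names) is an illustrative assumption.

```python
from tensorflow.keras import layers, models, optimizers

def build_network(n_outputs: int) -> models.Model:
    # 1610-dim state vector -> 256 ReLU -> 128 ReLU -> linear output layer.
    model = models.Sequential([
        layers.Dense(256, activation="relu", input_shape=(1610,)),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_outputs, activation="linear"),
    ])
    model.compile(optimizer=optimizers.Adam(), loss="mse")
    return model

dqn_net = build_network(53)  # one Q-value per action (DQN)
vips_net = build_network(1)  # a single state value (VIPS)
```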

4.3 Off-Policy Action Filtering

The main challenge of training on the TUSP problem is that most actions (i.e., possible movements in the yard) are invalid at any given state. Therefore, it is interesting to know what the effect would be of filtering for feasible actions while training the DRL agent, to constrain its search space. As mentioned earlier, there are many possible violations in the TUSP environment that lead to immediate termination of the episode. Therefore, an off-policy rule is proposed to filter out actions that would result in immediate violations. It should be noted that the neural network still estimates a value for a filtered action; hence, theoretically, it could be chosen as the optimal action (especially during early training).

5 Results

5.1 Convergence and Problem Solving Capability

Figure 4 and Fig. 5 show the convergence of the Q values and losses of DQN and VIPS learned on either filtered actions or all actions. The light orange lines are the actual values, and the dark orange lines show the exponential moving average of those values. It is clear that when DQN is learned on filtered actions, its Q values and losses explode after 60,000 episodes, which is about 90 h of execution. Because of this apparently bad result, it was terminated earlier compared to the other models. VIPS, on the other hand, did converge when learned only on filtered actions. However, when learned on all actions, VIPS converges faster than DQN. VIPS learned on all actions and VIPS learned on filtered actions both converge around 40 h of execution, and therefore there is no clear benefit to learning only on filtered actions. The capability to solve TUSP problems is essential for the evaluation of the different models. All four off-policy models are used to find shunting plans for 5 sets of test instances which are randomly withdrawn from the 1,000 test instances. Their average performance is listed in Table 1. Models learned on all actions clearly have a better problem solving capability, and VIPS learned on all actions significantly outperforms the others. Figure 4 also supports the results of Table 1.


(a) VIPS learned on filtered actions; (b) VIPS learned on all actions; (c) DQN learned on filtered actions; (d) DQN learned on all actions

Fig. 4. Q values of different models in episodes

(a) VIPS learned on filtered actions; (b) VIPS learned on all actions; (c) DQN learned on filtered actions; (d) DQN learned on all actions

Fig. 5. Losses of different models in hours of execution

VIPS learned on all actions converges to a higher Q value compared to the other methods, which indicates that a better strategy is learned. DQN learned on filtered actions, on the other hand, fails to converge and therefore fails to learn a proper strategy.

5.2 Monitoring of Learning Behaviour and Violations

Table 1. Average percentage of solved instances and standard deviations of different models on solving 5 sets of 200 test instances.

Model                     Percentage of solved instances
VIPS, filtered actions    0.601 ± 0.030
VIPS, all actions         0.905 ± 0.037
DQN, filtered actions     0 ± 0
DQN, all actions          0.782 ± 0.016

As discussed earlier, the main target of this work is to enable DRL agents to learn the concept of time in order to send the trains to service tracks at the proper moment. More specifically, the agent should generate a policy in which trains only move around when necessary; if there is no need, or it is not a good moment to move, they should choose to wait on the tracks. Therefore, the learning behaviour of the agents with respect to the concept of waiting (which also implies the concept of time) is also monitored during the training process, as


given in Fig. 6. After the random exploration period, VIPS learned on all actions very quickly converges on the concept of waiting, and it also learns to wait more often compared to the other models. This suggests that it not only solves more problems, as shown in Table 1, but also solves them with a better strategy.

(a) VIPS learned on filtered actions; (b) VIPS learned on all actions; (c) DQN learned on filtered actions; (d) DQN learned on all actions

Fig. 6. Wait actions of different models in episodes

In addition to the monitoring of movement/wait actions, the learning behaviour with respect to violations is also monitored during training. The monitoring of violations and actions has helped a lot in the iterative adjustment of the design of the TUSP environment and in understanding the decisions made by the DRL agents. Figure 7 shows some examples of the learning behaviour of VIPS learned on all actions with respect to some violations. For instance, Fig. 7(d) shows that the agent did not learn to send the trains to service tracks at the beginning (after random exploration), and therefore trains often left with undone service. After some episodes this violation starts to drop, and it is fully learned by the agent after 300,000 episodes. By monitoring various violations at the same time, we can better understand which tasks are easy or difficult for the agent to learn, and ultimately understand the bottleneck of the problem instance or the yard itself.

(a) Request train at an empty track; (b) Move a train under service; (c) Depart a wrong train type; (d) Depart a train with undone service

Fig. 7. Monitoring of different violations during the training (in episodes) of VIPS learned on all actions

6 Discussion and Conclusion

In this paper, we formulated the time constraints of service tasks within TUSP to enable deep reinforcement learning. Using this new formalization, we compared various learning strategies to evaluate the most suitable one for this application. The results show that by assigning extra triggers to agents at fixed time intervals, the agent accurately learns to send the trains to the service tracks in time to comply with the departure schedule. Specific movements and violations are monitored during training to keep track of the agent's capability to learn certain tasks effectively. These visualizations and this monitoring of the agent behaviors have helped a lot in iteratively improving the design of the agent and the environment for the TUSP problem. We believe that this monitoring strategy will also be helpful for other real-world applications that are as complex as TUSP. In this paper, we have focused on how many instances can be solved successfully, i.e., adhering to all (hard) constraints. However, when this is not possible, there are going to be trade-offs between different constraints. To empower the users of our planning system, we aim to model this using multiple objectives [7] in future work.

Acknowledgements. This research was in part supported by funding from the Flemish Government under the "Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen".

References

1. Bellman, R.: A Markovian decision process. J. Math. Mech. 6, 679–684 (1957)
2. Boysen, N., Fliedner, M., Jaehn, F., Pesch, E.: Shunting yard operations: theoretical aspects and applications. Eur. J. Oper. Res. 220(1), 1–14 (2012)
3. Howard, R.A.: Dynamic Programming and Markov Processes. Wiley, New York (1960)
4. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)
5. Peer, E., Menkovski, V., Zhang, Y., Lee, W.J.: Shunting trains with deep reinforcement learning. In: 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 3063–3068 (2018)
6. Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd edn. Wiley Series in Probability and Statistics (2011)
7. Roijers, D.M., Whiteson, S.: Multi-objective decision making. Synthesis Lectures on Artificial Intelligence and Machine Learning 11(1), 1–129 (2017)
8. Sutton, R.S., Barto, A.G.: Introduction to Reinforcement Learning, 1st edn. MIT Press, Cambridge (1998)
9. Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)

Workshop on Dynamic Risk managEment for AutonoMous Systems (DREAMS)

First Workshop on Dynamic Risk managEment for AutonoMous Systems (DREAMS)

Workshop Description

Autonomous systems have enormous potential, and they are bound to be a major driver in future economic and societal transformations. Their key trait is that they pursue and achieve their more or less explicitly defined goals independently and without human guidance or intervention. In contexts where safety, or other critical properties, need to be guaranteed, it is, however, presently hardly possible to exploit autonomous systems to their full potential. Unknowns and uncertainties are induced by the high complexity of the autonomous behaviors, the utilized technology and the volatile and highly complex system contexts. These characteristics render the base assumptions of established assurance methodologies (and standards) void; hence, new approaches need to be investigated. One general approach to deal with such unknowns and uncertainties is to shift parts of the development-time assurance activities to runtime. Giving autonomous systems risk management skills means empowering them to monitor their environments (i.e., other collaborating systems as well as the physical environment), analyze and reason about implications regarding risks, and execute actions to control risks, ensuring that all risks are acceptable at any time – thus conducting Dynamic Risk Management (DRM). DRM has the potential not only to outright enable certain types of systems or applications, but also to significantly increase the performance of already existing ones. This is due to the fact that, by resolving unknowns and uncertainties at runtime, it will be possible to get rid of worst-case assumptions that typically detriment the system's performance properties.

The DREAMS workshop intends to explore concepts and techniques for realizing DRM. It invites experts, researchers, and practitioners for presentations and in-depth discussions about prediction models for risk identification, integration between strategic, tactical and operational risk management, architectures for dynamic risk management, and V&V of dynamic risk management. It aims at bringing together communities from diverse disciplines, such as safety engineering, runtime adaptation, predictive modelling and control theory, and from different application domains such as automotive, healthcare, manufacturing, agriculture and critical infrastructures.

Enforcing Geofences for Managing Automated Transportation Risks in Production Sites

Muhammad Atif Javed, Faiz Ul Muram, Anas Fattouh, and Sasikumar Punnekkat

School of Innovation, Design and Engineering, Mälardalen University, Västerås, Sweden
{muhammad.atif.javed,faiz.ul.muram,anas.fattouh,sasikumar.punnekkat}@mdh.se

Abstract. The key to system safety is the identification and elimination/mitigation of potential hazards and the documentation of evidence for safety cases. This is generally done during the system design and development phase. However, for automated systems, there is also a need to deal with unknowns and uncertainties during the operational phase. This paper focuses on virtual boundaries around geographic zones (i.e., geofences) that can serve as an active countermeasure for the dynamic management of risks in automated transportation/production contexts. At first, hazard analysis is performed using the Hazard and Operability (HAZOP) and Fault Tree Analysis (FTA) techniques. Based on the hazard analysis, appropriate measures, such as geofences, for the elimination/mitigation of hazards are defined. Subsequently, they are translated into safety requirements. We leverage simulation-based digital twins to perform verification and validation of the production site by incorporating the safety requirements in them. Finally, to manage risks in a dynamic manner, the operational data is gathered, deviations from specified behaviours are tracked, possible implications of control actions are evaluated and necessary adaptations are performed. Risk management is assured in situations such as communication loss, subsystem failures and unsafe paths. This approach provides a basis to fill the gaps between the safety cases and the actual system safety emanating from system/environment evolution as well as obsolescence of evidence. The applicability of the proposed framework is exemplified in the context of a semi-automated quarry production scenario.

Keywords: Geofence enforcement · Risk management · Automated transportation · Safety assurance · Digital twin · Quarry site

1 Introduction

In safety-critical production sites, an unplanned event or sequence of events can potentially harm humans (injuries or even deaths) or create damage to machines, property or the environment. System safety is a basic part of the risk management process that emphasizes the identification of hazardous events and their causes, mechanisms for elimination or mitigation of the causes, and documentation of evidence for safety cases. This is primarily done during the system design and development phase. Safety analysis and risk management are much more cost effective during system design and development than trying to inject safety after the occurrence of an accident or mishap [5]. However, hazard analysis conducted at the system design and development phase may not be sufficient for evolving systems with enhanced automation, digitalization and connectivity. The capability to manage risks at the operational phase is essential for them.

The virtual boundary around a geographic zone, usually called a geofence, can serve as an active countermeasure against operational mishap risks. The Global Positioning System (GPS) is used for tracking and navigation purposes, and its information is used for triggering alerts in circumstances when the device enters or exits the geographical boundary of a point of interest [12]. During the past decade, there has been increasing attention to geofences. They are enforced in various domains, such as smart city, healthcare, road transport, smartphones, security, forestry and aerospace [16]. This is particularly useful for automated transportation, for example, to control the movement of machines in hazardous situations and areas, though existing studies have not considered them for managing risks in a dynamic manner.

This paper considers geofences as a measure for elimination or mitigation of automated transportation risks. Hazard analysis was carried out as a first step towards a safe production site, and appropriate mitigation mechanisms such as geofences were established. They were then translated into the safety requirements, which were implemented in digital twins as code scripts. The digital twins were leveraged for performing verification and validation of the production site. The static, dynamic, time-based and conditional geofences were enforced by defining different geometric areas. To carry out the dynamic risk management, the operational data was gathered and the deviations from specified behaviours were tracked. The risk management is assured in situations such as connection loss, subsystem failures and unsafe movements. Besides that, the gaps between the safety cases and the actual system safety were handled. The applicability of the proposed framework using geofences for dynamic risk management is demonstrated in a simulated quarry production scenario. In the next phase, after studying the effects, accuracy, failure modes and design/standardisation requirements, we plan to implement them in real machines.

The rest of this paper is organized as follows: Sect. 2 presents a risk management framework for automated materials transportation. Section 3 demonstrates the effectiveness of the proposed framework in a quarry production scenario. Section 4 discusses related work. Section 5 concludes the paper and presents future research directions.

2 Managing Automated Transportation Risks

Risk management is an overarching process that begins during the earliest phase of a system design and continues throughout its entire life cycle. This section describes our proposed overall framework for managing automated transportation risks, as shown in Fig. 1. The framework consists of essentially three stages. In the first stage, the safety analysis during the design and development phase is carried out through the identification of hazards, the assessment of risks, and the control of hazard risks. In the second stage, digital twins are utilized in which measures for elimination or mitigation of hazards are implemented. Geofences are supported and given special consideration for risk management. During the verification and validation with the digital twin, additional hazards can be detected. In the third stage, the dynamic safety assurance during the operational phase, in particular the risk management and the update of safety cases, is carried out.

Fig. 1. A framework for managing automated transportation risks

2.1 Safety Analysis During Design and Development Phase

This subsection presents the system safety risk management process at the design and development phase. The initial step is based on establishing the system context in risk analysis (e.g., system design and operation) and identifying hazards and their causes. The next step involves the risk assessment, with the aim of evaluating the causes of identified hazards and assessing hazard severity and the probability of occurrence. The last step is the risk control, which focuses on the establishment of mechanisms to eliminate or mitigate those causes which may result in unexpected or collateral damage, and the documentation of the safety cases to demonstrate the acceptable safety of the system. The argumentation editor of the OpenCert tool platform (https://www.polarsys.org/opencert/) is used for modelling and visualizing the safety cases; it is based on the Goal Structuring Notation (GSN) [14], although the Common Assurance and Certification Metamodel (CACM) implemented in OpenCert internally uses the Structured Assurance Case Metamodel (SACM) [11].

The hazard analysis is performed by using the HAZOP and FTA techniques. HAZOP is an inductive technique for identifying and analysing the potential deviations from the design intention or operating conditions of a system, whereas FTA is a deductive analysis approach for modelling, analysing and evaluating failure paths in large complex dynamic systems [5,9]. The hazard analysis focuses not just on the individual behaviour of the machines used in the production site and their emergent interactions; the server and other working equipment are also considered.

The advanced production site can be divided into a set of zones, such as parking, loading, charging, transporting, and unloading (dumping). In production sites, automated vehicles follow defined travel paths to move from one zone to another. However, several potential risks can arise in advanced production sites due to the simultaneous presence of automated vehicles, human-driven machines, and human workers at the same site/zone. Congested zones occur when multiple machines arrive simultaneously at the loading, dumping, charging or parking zones from different paths. Geofences can provide a mechanism to control the movement of machines in such zones. Usually, the machines' movements are controlled by a fleet management server: the machines inform the server of their locations, and the server permits or restricts them from entering a specific zone. However, if the connection with the server is lost, a machine can enter the congested zone at high speed and collide with another machine even in the presence of a collision-avoidance system, as the machine is approaching the same location and does not have enough distance to stop. In this case, geofences are particularly useful for enforcing a zero-speed limit at the entry of the congested zone. There is also the possibility that an automated vehicle arrives early at a specific zone and does not maintain the speed limits due to the failure of speed sensors and brakes. Moreover, the failure of the obstacle detection and avoidance mechanism caused by Light Detection and Ranging (LIDAR) or camera failure may also lead to a collision of automated vehicles. Hazards like collisions of automated vehicles with static objects (e.g., ladder, stone, fallen pallet) or dynamic obstacles (e.g., other automated vehicles, human workers, or other working equipment) appearing in travel paths and specific zones, caused by communication loss, subsystem failures and unsafe paths, can also be avoided by enforcing geofences. The safety requirements and mitigation techniques (i.e., geofences) derived from the results of the risk assessment are used for designing and configuring the site environment, which serves as a digital twin of the production site.

2.2 Digital Twin Based Safety Analysis

The digital twin brings out a virtual depiction of real-world systems; the functions, behaviour, and communication capabilities are mirrored in the digital twin. Digital twins are perceived as an integral part of systems with enhanced automation, digitalization and connectivity. Automated vehicles are utilised for transportation and distribution of materials in advanced production sites. In addition to the automated vehicles, human-driven machines can also be considered in the digital twin. Besides that, the emergent interactions with machines, the site server and other devices are fundamental aspects of the digital twin. The Volvo CE (Construction Equipment) simulators (https://www.volvoce.com/europe/en/services/volvo-services/productivity-services/volvo-simulators/), with unique digital twins of their construction machines, were adapted and extended for use in the case study of this paper. A number of different kinds of machines were selected to build a typical production scenario in the simulator. In the scenario, several zones were defined and the paths between these zones were specified. A detailed list of parameters was used to support automated operations of the scenario; these parameters can be accessed and changed during the operational phase, if necessary. The safety requirements and hazard mitigation recommendations were also implemented in the scenario as code scripts. This not only gives the possibility to detect deficiencies, such as additional hazards and risks, but also to identify, monitor, evaluate, and resolve deviations from specified behaviours during the operational phase. There are some risks that can only be identified when a mishap occurs; however, risks may not be detected if their probability is small, so they never materialize.

2.3 Geofence Enforcement for Managing Risks

Geofences can serve as a countermeasure that either eliminates encountered hazards or reduces the risk of a mishap to an acceptable level. A geofence is a virtual boundary (shape and dimension) defined for each zone (e.g., loading, unloading and transportation zones), which in turn is divided into segments or edge paths. The automated vehicles are continuously monitored within the geofenced area. To specify the virtual boundaries around geographic zones (i.e., geofences), geometric shapes can be drawn in the site zones. To control entry, avoiding hazardous conditions due to the presence of multiple vehicles and uncertainties in collaborative interactions between machines, communication with the server is required; the default parameter value of the specific path point is set to restricted. Speed limits are also specified. For instance, an automated vehicle needs to maintain its position within the drawn boundary of a loading point and a zero speed limit. Besides GPS, additional devices such as site cameras have also been used for locating machines in geofenced areas. Geofencing can be classified as follows:


– Static geofences are constant or fixed. In the sites, the loading, dumping and charging areas may not change over time. Besides that, the movement of automated vehicles needs to be restricted in various other fixed locations, such as the storage place of dangerous materials and areas/regions in which humans are working.
– Dynamic geofences move over time. The capsule shape is used to widen the boundary for collision avoidance; for example, vehicles that are not equipped with obstacle detection devices, have faulty detection devices or failed hardware, or are transporting dangerous materials (e.g., explosive, toxic, etc.) are hazardous. In addition, automated vehicles are not allowed to go for loading when a maintenance team is present on site. In case of adverse environmental conditions, such as a slippery surface, the movement of automated vehicles can also be prevented by means of dynamic geofences.
– Periodic geofences are only active or inactive for specific time periods. Therefore, they can be enforced to restrict the movement in certain areas for a specified time period. An example is the termination of operation at the end of the day, after which movement towards areas other than parking places is restricted.
– Conditional geofences: The permissions associated with a geofence depend on certain factors, such as the number of vehicles that can be allowed together, i.e., as a platoon for efficient operation. In addition, to deal with problems such as path blockage, the movement of an automated vehicle with a certain human-driven vehicle can be marked as conditional, so that it follows that vehicle along an alternative travel path.
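To illustrate the four categories, the following minimal Python sketch shows one possible representation of geofences with a time window, a moving centre and a conditional predicate; all class names, fields and values are hypothetical simplifications and are not taken from the actual site implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class CircularGeofence:
    """Circular fence: a centre point and a radius (the static case)."""
    cx: float
    cy: float
    radius: float
    active_from: float = 0.0       # periodic fences: active time window (s)
    active_until: float = math.inf
    condition: object = None       # conditional fences: predicate on site state

    def is_active(self, now, site_state=None):
        # Periodic fences are only enforced inside their time window;
        # conditional fences additionally evaluate a site-state predicate.
        in_window = self.active_from <= now <= self.active_until
        if self.condition is not None:
            return in_window and self.condition(site_state)
        return in_window

    def contains(self, x, y):
        return math.hypot(x - self.cx, y - self.cy) <= self.radius

class DynamicGeofence(CircularGeofence):
    """Fence whose centre follows a moving machine (the dynamic case)."""
    def update_position(self, x, y):
        self.cx, self.cy = x, y

# Example: a parking fence active only after the end of the shift (periodic),
# and a widened fence around a hauler with a faulty LIDAR (dynamic).
end_of_shift = 8 * 3600.0
parking_fence = CircularGeofence(0.0, 0.0, 30.0, active_from=end_of_shift)
faulty_hauler_fence = DynamicGeofence(120.0, 45.0, 15.0)
faulty_hauler_fence.update_position(125.0, 47.0)   # follows the machine
print(parking_fence.is_active(now=9 * 3600.0))     # True: shift has ended
print(faulty_hauler_fence.contains(130.0, 50.0))   # True: inside widened zone
```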

2.4 Dynamic Risk Management in Production Site

This subsection discusses the dynamic risk management in a production site. The uncertainty sources, which include the loss of connection with a server, system failures and unsafe paths, need to be continuously monitored. Contracts can be derived for the uncertainty sources. In particular, they define the behaviour in such a way that assumptions (conditions) are made on the environment, and if they hold, certain behaviours/properties are guaranteed [2,7]. The site parameter values/ranges are retrieved from the simulation environments for monitoring the uncertainty sources. In circumstances when deviations from specified behaviours are detected, the implications of failure vulnerabilities are determined and defences against them are performed. Besides the other safety measures, the enforcement of geofences at the operational phase is supported. Dynamic geofences are enforced to reduce the mishap risk of emergent and evolving hazards to an acceptable level; for example, travelling to a particular area is blocked and the machines present in the area are directed to drive away. The conditional geofence is used as a countermeasure against unsafe paths: the automated vehicle waits for a human-driven vehicle and follows it along an alternative travel path. By considering the deviations, implications, and respective control actions, the safety cases modelled in the OpenCert platform are updated. For this reason, the guidance presented in McDermid et al. [8] for safety assurance of automated systems is followed.
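As a schematic illustration of such assumption/guarantee monitoring, the Python sketch below checks telemetry retrieved from the digital twin against the assumptions of simple safety contracts and returns the control actions for violated ones; the field names, thresholds and action strings are invented for the example and do not reflect the project's actual parameters.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SafetyContract:
    """Assumption/guarantee pair for one uncertainty source."""
    name: str
    assumption: Callable[[dict], bool]   # condition on monitored telemetry
    control_action: str                  # response if the assumption fails

# Contracts for the uncertainty sources discussed above; values illustrative.
contracts: List[SafetyContract] = [
    SafetyContract("server_link",
                   lambda t: t["ms_since_last_heartbeat"] < 500,
                   "enforce zero-speed geofence at zone entries"),
    SafetyContract("speed_sensor",
                   lambda t: abs(t["sensor_speed"] - t["map_derived_speed"]) < 2.0,
                   "estimate speed from map position; enforce dynamic geofence"),
    SafetyContract("path_clear",
                   lambda t: not t["path_blocked"],
                   "wait for human-driven machine; follow alternative path"),
]

def evaluate(telemetry: dict) -> List[str]:
    """Return the control actions for all violated contracts."""
    return [c.control_action for c in contracts if not c.assumption(telemetry)]

# One tick of the monitoring loop with data retrieved from the digital twin.
actions = evaluate({"ms_since_last_heartbeat": 900,
                    "sensor_speed": 18.0,
                    "map_derived_speed": 17.2,
                    "path_blocked": False})
print(actions)  # ['enforce zero-speed geofence at zone entries']
```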

3 Case Study

3.1 Electric Quarry Site

This section describes an operational quarry site [15], which solely produces stone and/or gravel in various dimensions. The quarry operation is carried out using different kinds of machines, for instance, an excavator, a mobile primary crusher, a wheel loader, autonomous haulers, and a stationary secondary crusher. In particular, they collaborate to realize the targeted production goals [9]. The quarry site is subdivided into the following different production zones, as shown in Fig. 2.

– Feeding Primary Crusher and Loading: The excavator feeds the blasted rocks (i.e., the rocks that are broken out of the mountain with explosives) to the primary crusher. The primary crusher breaks the blasted rocks into smaller rocks. This is done to facilitate the transportation to the secondary crusher. For the discharging purpose, a conveyor belt is attached to the primary crusher. It is therefore possible to load the haulers directly from the primary crusher. If the primary crusher starts to build a stone pile, the direct loading is disabled. In such a case, the hauler will be loaded with a wheel loader from the stone pile (i.e. indirect loading).
– Transportation: Autonomous haulers and/or articulated haulers are used to transport material in the quarry site. The operation of autonomous haulers is similar to that of Automated Guided Vehicles (AGVs). For the perception of the surrounding environment, two obstacle detection sensors, in particular a LIDAR and a camera, are mounted, whereas GPS is fitted for tracking and navigation purposes. The data produced by these sensors is processed for controlling the mechanical parts, for example, the drive unit for motion and operation, the steering system for manoeuvring, and the braking system for slowing down and stopping the vehicle. The interaction platform and other attachments, which include batteries for power supply, are integrated into the machines.
– Dumping and Feeding Secondary Crusher: The autonomous haulers move along the defined paths and dump the loaded rocks into the feeding spot of the secondary crusher. The secondary crusher further crushes the rocks into smaller granularity or fractions to meet the customer demands.
– Charging: To perform a mission efficiently, the required battery level needs to be determined. This is done before performing a mission. There are designated charging stations to recharge the battery whenever needed. If the energy consumption of the autonomous hauler is reduced, the battery needs to be charged less often. Energy consumption also depends on the distance between different zones.
– Parking: The machines are moved to the parking zone after the termination of the transportation operation and for maintenance purposes. For the assignment of specific places, the number and kind of machines needs to be determined.


Fig. 2. Automated transportation in quarry site

3.2 Simulation-Based Digital Twin

For designing and configuring the quarry site, we have extended and adapted the Volvo CE simulators, fabricated by Oryx (https://www.oryx.se/). They serve as digital twins of various machines used at the quarry site, such as the mobile primary crusher, the excavator, the wheel loader, autonomous haulers, articulated haulers, and the secondary crusher. The Volvo CE mobile platforms used for training the operators of articulated haulers, excavators, and wheel loaders are connected to the quarry site to demonstrate the functionality and behaviour of manually-driven machines. This means that the connected human-driven machines can operate in conjunction with the other machines in the quarry site. For instance, the rock transportation in the quarry site can be carried out with human-driven (articulated) and/or autonomous haulers. The site manager specifies the number and kind of machines to be used in the quarry site and their missions to fulfill the production demands. In the scenarios, the specific spots/zones are marked in the site map, e.g., parking, charging, loading, and dumping, as shown in Fig. 2. The transportation paths that the operating machines use to move between different zones are also defined.

We have modelled and implemented the static, dynamic, time-based and conditional geofences necessary for safe site operations in the above simulation testbed. The site management shows the position and movement of all the operating machines, whereas the screens placed on the machines show the visual region and the machines present in this region. The values regarding timing, location, path points, load capacity and speed limits are provided to the user interface of the site management. For the emergent interactions and geofences, code scripts have been implemented. In the running mode, the information from machines operating in the quarry site is retrieved, stored, and displayed in the site management system.

3.3 Managing Operational Risks in Automated Transportation

We have defined geofences over a) various zones at the site, b) different machines, c) other actors at the site such as humans, and even d) specified paths of movement. These geofences have different geometric characteristics (typical studies use mainly circular geofences represented by a centre point and a radius). We also attach appropriate priority levels to indicate their relative importance/precedence. For example, zones have the highest priority, followed by emergency/maintenance teams, followed by vehicles and humans. In our simulation framework we can also represent geofences with multiple boundaries marked with different colour codes (such as an outer boundary marked in yellow followed by an inner boundary marked in red) to indicate the relative criticality levels when another object reaches these boundaries. Static and common geofence details are available to all actors in the system. Dynamic geofences are enforced in conditions of subsystem failures. Periodic geofences are enforced to stop or control the operation and movement for a certain time period. Conditional geofences are enforced to adapt new behaviour with an acceptable risk level.

Geofence-enabled safety is achieved through central server commands, vehicle-level actions, multiple checkpoints and a monitoring system; vehicle-level actions are typically of two categories, viz., those taken by the vehicle itself during normal operation and those taken in response to failure conditions of itself or others. Examples of server commands sent to vehicles include 'Queue', 'Pause', 'Exit', etc. The essence of all of these is to adjust the vehicle speeds to acceptable levels in relation to the context. There are many challenges and trade-offs which we explore through our simulation test-bed before arriving at reasonable values for the geofences as well as command/action sequences in case of uncertainties. We now exemplify a few typical scenarios of interest.

Normal Flow of Operations – Ensuring Safety at Loading Zone. Let us consider three autonomous haulers. H1 is present inside the loading point, which is modelled as a geofence, H2 is located at the entry to the loading zone, and H3 is approaching the loading zone. A successful and safe loading operation requires the presence of only one hauler (H1) at the loading point. The automated loading is compromised if two or more haulers arrive at the loading point. Entry into the geofenced region requires communication with the server, which is triggered when the hauler H2 touches the sensor at the entry to the loading zone. Since the hauler H1 is already present at the loading point, the hauler H2 requesting permission to enter is given a command to be in a 'queue'. After the completion of the current loading, an 'exit' command is given to the loaded hauler H1, which then starts moving. Next, the waiting hauler in the queue (H2) is given permission to enter; its maximum speed limit is set to 20 km/h. The other hauler H3 may also arrive in the meanwhile and is instructed to be in the queue at the next level; H3 moves to the place of H2. The boundary of the loading area is constant/fixed. It should not be violated, and the hauler needs to maintain a speed of 0 km/h while at the loading point. If the centre of mass of the hauler is not maintained, the stones will fall beside the loading point and the haulers may also get damaged. In this case, the risk is not regarded as acceptable; as a control action, the hauler is given a command to exit and approach again. Besides the GPS position, site cameras are also placed to realize precise point positioning. The geofenced regions in the quarry site can involve many uncertainties and are therefore continuously monitored.

Resolving the Failure Cases

– Communication Failures: The loss of communication with the server is a safety risk. Besides that, messages containing missing or wrong data can also cause mishaps. When the primary crusher is jammed, human operators can be called on site; during that period a direct loading command is sent to the autonomous hauler instead of loading from the wheel loader. The transmission of an incorrect mission or travel path to an autonomous hauler can be caused by an incorrect command or a timing failure, which leads to an incomplete mission, machine damage or human injuries. In such cases, the control action is in place, i.e., the movement of autonomous haulers is still restricted in geofenced areas. The capsule geometry shapes (geofences) around the haulers provide the means for obstacle avoidance. These shapes are drawn in different ranges and different colours based on their criticality level. When an obstacle is detected in the yellow range (indicating move with caution at reduced speeds), slow-down or stop measures are taken; the red range is regarded as the emergency stopping distance. The haulers can maintain the maximum speed limit if no obstacles are detected within the range.
– Subsystem Failures: As another example, consider a subsystem failure. There is a possibility that an autonomous hauler arrives early in the loading zone and does not maintain the new speed limit due to speed sensor and brake failures. In the former case, the focus is shifted to the map to compute the speed, i.e., by detecting the distance covered in a time frame. In the latter case, besides the steering wheel rotation commands, dynamic geofences are enforced depending on the severity risk factor.
– Path Problems: When travelling in a nearby area is blocked, the autonomous haulers in the travelling path, including those in standstill mode, such as at a loading point, are commanded to drive away to reduce the risk to an acceptable level. In case of a path problem, to create a new path compliant with the conditional geofence, the autonomous hauler waits for a human-driven hauler or another machine, such as a wheel loader, to formulate an alternative travel path, and then follows it.
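The loading-zone entry-control protocol sketched above, together with the zero-speed fallback on communication loss, can be summarized in a few lines of Python; the controller below is an illustrative reading of the described behaviour, with hypothetical command names and data structures rather than the actual fleet management implementation.

```python
from collections import deque

class LoadingZoneController:
    """Server-side entry control for a single loading point (illustrative).

    Only one hauler may occupy the loading point; later arrivals queue at
    the zone entry. Command names mirror the scenario above.
    """
    ENTRY_SPEED_LIMIT = 20.0   # km/h while driving to the loading point
    LOADING_SPEED_LIMIT = 0.0  # km/h while positioned at the loading point

    def __init__(self):
        self.occupant = None
        self.queue = deque()

    def request_entry(self, hauler):
        """Triggered when a hauler touches the sensor at the zone entry."""
        if self.occupant is None:
            self.occupant = hauler
            return (hauler, "enter", self.ENTRY_SPEED_LIMIT)
        self.queue.append(hauler)
        return (hauler, "queue", 0.0)       # hold position at the entry

    def loading_complete(self):
        """Exit the loaded hauler and admit the next one in the queue."""
        commands = [(self.occupant, "exit", self.ENTRY_SPEED_LIMIT)]
        self.occupant = None
        if self.queue:
            commands.append(self.request_entry(self.queue.popleft()))
        return commands

def on_connection_lost(hauler_state):
    # Degraded-mode rule: with the server unreachable, locally stored
    # geofences enforce a zero-speed limit at congested-zone entries.
    hauler_state["max_speed_at_zone_entry"] = 0.0

ctrl = LoadingZoneController()
print(ctrl.request_entry("H1"))   # ('H1', 'enter', 20.0)
print(ctrl.request_entry("H2"))   # ('H2', 'queue', 0.0)
print(ctrl.loading_complete())    # H1 exits, H2 admitted
```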


Update of Safety Cases. Geofences, together with the permissible operations/commands and associated speed limits, are easily translated into a set of safety contracts between the involved actors. Appropriate monitoring mechanisms need to be in place to check the associated parameters during the operational phase. Failure cases often lead to partial or full non-conformance to contracts, or enable new contracts to be considered during such situations. The update of safety cases is carried out based on the risk control actions. The required changes in safety cases and associated safety contracts modelled in the OpenCert platform are tracked and then the update command is launched, so that the gaps are resolved and the current system safety is not compromised.

4 Related Work

Baumgart et al. [1] examine the feasibility of System-Theoretic Process Analysis (STPA) for Systems-of-Systems (SoS). For this purpose, a simplified control structure diagram of a quarry site is used. Muram et al. [9] perform the SoS hazard analysis for the quarry site; in particular, the HAZOP and FTA techniques are applied. The research in [4] presents the idea of through-life safety assurance. It is reflected in four activities: identify, monitor, analyse and respond. Jaradat and Punnekkat [7] discuss the monitoring of runtime failures related to hardware components and the analysis of failures by comparison with a predefined threshold. Fault trees were used for deriving safety contracts and defining thresholds. A digital twin, however, is not used. In this paper, besides the hazard analysis during the design and development phase, the digital twin is used for gaining confidence and managing transportation risks at the operational phase.

Zimbelman et al. [16] use mobile geofences in forestry to define safe work areas; moving, circular safety zones around people and heavy equipment have the potential to reduce accidents during logging operations. For the tree-falling hazard zones around manual fallers, traditional proximity alerts and the overlap among multiple circular geofences of varying radii are considered. Dasu et al. [3] give a vision of air-traffic control based on geofences. In general, the partition of the sky is needed to allocate the flying space and prevent entry into certain areas. This can be done via geofences. Stevens and Atkins [13] include the operating permissions in the geofence specification; access to the airspace is enabled by considering property type or vehicle risk. Nait-Sidi-Moh et al. [10] present the integration of geofencing techniques with the TransportML platform for real-time tracking of mobile devices. The application allows defining adequate and safe itineraries for the registered vehicles. If a vehicle deviates from its geo-corridor, alerts are sent to the in-vehicle computer to warn the driver and to the management center to generate a new alternative itinerary. The use of geofences has also been analysed in the defence and security sector; for instance, the potential of using geofences as tools to prevent terrorist attacks using hazardous material transportation [12]. Guo et al. [6] developed a model for dynamic geofences. Through a lane-level precise positioning service, geofences can serve as an efficient and reliable active safety service for vehicle accident prevention. The published studies have not considered geofences for automated transportation and production sites. In this paper, the queue, pause and exit restrictions are presented, which provide an efficient means for risk control in various areas of a production site, such as loading, unloading, charging and parking.

5 Conclusion and Future Work

To support dynamic risk management, which is perceived as an essential characteristic of production sites with enhanced automation, digitalization and connectivity, the central theme of this paper focuses on three particular aspects: (i) hazard analysis and risk assessment during the design and development phase; (ii) virtual depiction of the real site using simulator-based digital twins; and (iii) risk management and update of safety cases at the operational phase. The hazard analysis during the design and development phase is a fundamental element and absolutely necessary for safety-critical systems. Based on the hazard analysis, which is performed with the HAZOP and FTA techniques, the mitigation mechanisms, such as geofences, are established; they are translated into the safety requirements. The Volvo CE simulator-based digital twins are leveraged, in which the mitigation mechanisms and safety requirements are implemented. During the design and development phase, not all hazards and causal factors may be identified. The intention with the simulator-based digital twins is to perform verification and validation to gain confidence in the production site. They served as a resource to discover additional hazards. Finally, the dynamic risk management and safety assurance during the operational phase is carried out. The data from the digital twin is gathered to identify and monitor deviations from specified behaviours, evaluate and select the optimal control action, and resolve problems. Note that the geofences are used as an active countermeasure against mishap risks in various site areas. The applicability has been demonstrated for the Volvo CE electric quarry site.

The research presented in the paper is primarily based on the exploratory studies we conducted using a simulation test-bed which features realistic models of the machines and processes of the Volvo quarry site. We have so far obtained results which satisfy the coarse-level specifications on accuracy and precision requirements. However, we need to conduct further detailed evaluations for stabilisation of the geofencing function, as well as implement it as per the mandatory domain-specific safety requirements for potential inclusion in real Volvo machines. It is noteworthy that the geofencing function is generally applicable to a broad range of scenarios and domains. In the future, we plan to consider additional scenarios and applications based on Industry 4.0. To the best of our knowledge, geofence-enabled safety mechanisms are not considered in current domain-specific standards. We also aim to highlight their potential, in due course, by having discussions with the relevant standardization bodies at the national level.


Acknowledgment. This work is supported by the SUCCESS (Safety assUrance of Cooperating Construction Equipment in Semi-automated Sites) project via the AAIP (Assuring Autonomy International Programme), the FiC (Future factories in the Cloud) project funded by SSF (Swedish Foundation for Strategic Research), and Volvo Group Arena.

References

1. Baumgart, S., Fröberg, J., Punnekkat, S.: Can STPA be used for a system-of-systems? Experiences from an automated quarry site. In: 2018 IEEE International Symposium on Systems Engineering (ISSE), Rome, Italy, pp. 1–8 (October 2018). https://doi.org/10.1109/SysEng.2018.8544433
2. Benveniste, A., et al.: Contracts for System Design. Research report RR-8147, INRIA (November 2012). https://hal.inria.fr/hal-00757488
3. Dasu, T., Kanza, Y., Srivastava, D.: Geofences in the sky: herding drones with blockchains and 5G. In: 26th ACM International Conference on Advances in Geographic Information Systems, pp. 73–76 (November 2018). https://doi.org/10.1145/3274895.3274914
4. Denney, E., Pai, G.J., Habli, I.: Dynamic safety cases for through-life safety assurance. In: 37th IEEE/ACM International Conference on Software Engineering (ICSE), Florence, Italy, pp. 587–590 (May 2015). https://doi.org/10.1109/ICSE.2015.199
5. Ericson, C.A.: Hazard Analysis Techniques for System Safety. Wiley, Hoboken (2005)
6. Guo, C., Guo, W., Cao, G., Dong, H.: A lane-level LBS system for vehicle network with high-precision BDS/GPS positioning. Comput. Intell. Neurosci. 2015, 531321:1–531321:13 (2015)
7. Jaradat, O., Punnekkat, S.: Using safety contracts to verify design assumptions during runtime. In: 23rd International Conference on Reliable Software Technologies, Ada-Europe 2018, Lisbon, Portugal, pp. 3–18 (June 2018). https://doi.org/10.1007/978-3-319-92432-8_1
8. McDermid, J., Jia, Y., Habli, I.: Towards a framework for safety assurance of autonomous systems. In: Workshop on Artificial Intelligence Safety Co-located with the 28th International Joint Conference on Artificial Intelligence, AISafety@IJCAI, Macao, China (August 2019)
9. Muram, F.U., Javed, M.A., Punnekkat, S.: System of systems hazard analysis using HAZOP and FTA for advanced quarry production. In: 4th International Conference on System Reliability and Safety (ICSRS), Rome, Italy, pp. 394–401 (November 2019). https://doi.org/10.1109/ICSRS48664.2019.8987613
10. Nait-Sidi-Moh, A., Ait-Cheik-Bihi, W., Bakhouya, M., Gaber, J., Wack, M.: On the use of location-based services and geofencing concepts for safety and road transport efficiency. In: Matera, M., Rossi, G. (eds.) MobiWIS 2013. CCIS, vol. 183, pp. 135–144. Springer, Cham (2013). https://doi.org/10.1007/978-3-319-03737-0_14
11. Object Management Group: Structured Assurance Case Metamodel (SACM), Version 2.1 (April 2020). https://www.omg.org/spec/SACM/2.1/PDF. Accessed 9 Aug 2020
12. Reclus, F., Drouard, K.: Geofencing for fleet & freight management. In: 9th International Conference on Intelligent Transport Systems Telecommunications (ITST), pp. 353–356 (2009). https://doi.org/10.1109/ITST.2009.5399328


13. Stevens, M., Atkins, E.: Geofencing in immediate reaches airspace for unmanned aircraft system traffic management. In: 2018 AIAA Information Systems-AIAA Infotech @ Aerospace, Kissimmee, Florida (January 2018). https://doi.org/10.2514/6.2018-2140
14. The Assurance Case Working Group: Goal Structuring Notation Community Standard Version 2, January 2018 (2018). http://www.goalstructuringnotation.info/
15. Volvo Construction Equipment: Emission-free quarry. https://www.volvoce.com/global/en/news-and-events/press-releases/2018/testing-begins-at-worlds-first-emission-free-quarry/
16. Zimbelman, E.G., Keefe, R.F., Strand, E.K., Kolden, C.A., Wempe, A.M.: Hazards in motion: development of mobile geofences for use in logging safety. Sensors 17(4), 822 (2017)

Safety Cases for Adaptive Systems of Systems: State of the Art and Current Challenges

Elham Mirzaei1(B), Carsten Thomas1, and Mirko Conrad2

1 HTW Berlin – University of Applied Sciences, Energy and Information, Wilhelminenhof-Street 75a, 12459 Berlin, Germany {elhammirzaei,carstenthomas}@htw-berlin.de
2 Samoconsult GmbH, Französische Street 13-14, 10117 Berlin, Germany [email protected]

Abstract. Adaptive Systems of Systems (SoS) are able to react to internal and external changes, adapting their member systems and reconfiguring the relations between these. Ensuring continued safety for adaptive SoS is challenging, because either the multitude of relevant configurations must be assessed at design time, or assessment must be done dynamically at run time. The concepts of Modular Safety Cases (MSC) and Dynamic Safety Cases (DSC) might form part of a potential solution for these challenges. MSC provide the basis for coping with complexity in SoS and support structural adaptation through their modularity. Yet, they are constructed at design time and do not match well with the dynamics and uncertainty of reconfiguration in adaptive SoS. DSC are adapted and re-evaluated at run time. A combination of both approaches could be the foundation for run-time safety assurance for adaptive SoS. In this paper, we analyse the state of the art for MSC and DSC and briefly explain existing amendments to the original approaches. Further, we identify current challenges for a full support of safe reconfiguration in adaptive SoS and define potential future research topics.

Keywords: Modular safety case · Dynamic safety case · Adaptive system of systems

1 Introduction

Adaptive Systems of Systems (SoS) are getting more popular due to the rapid changes in industry and their application domains. This is particularly due to their capability for reconfiguration at run time, facilitating more autonomy and reacting more flexibly to changes either in the SoS or in its context. Examples of such adaptive SoS are vehicle platoons in automotive or railway applications, and networks of collaborating machines and transport robots in factories of the future. Adaptive SoS consist of autonomous systems which can decide to be part of the SoS and to build a dynamic connectivity, as well as to benefit from cooperation within the SoS, in order to achieve greater fulfillment of their own goals, along with the higher SoS goals [11]. Each individual system may be seen as a Cyber-Physical System, and their integration as a Cyber-Physical System of Systems (CPSoS) [1]. In such systems, a reconfiguration is the process of changing an already developed and operatively used system in order to adapt it to new requirements, to extend its functionality, to eliminate errors, or to improve quality characteristics [9]. Reconfiguration is divided into two methods, which we could define as programmed and ad-hoc reconfiguration. Programmed reconfiguration is predefined at design time. In contrast, ad-hoc reconfiguration creates higher flexibility, because the system generates possible configurations at run time. Yet, from an assurance point of view, this method is far more challenging [19]. There are three levels of flexibility regarding the selection of new configurations in adaptive SoS [10]:

1. Predefined Selection: Once a dynamic change is initiated, the system chooses a configuration based on a predefined selection made at design time.
2. Constrained Selection from a predefined set: Suitable configurations are defined at design time in relation to given situations or system states. Once a dynamic change is initiated, the system will select the most appropriate configuration from the set of configurations that matches the current situation.
3. Unconstrained Selection: Once a dynamic change is initiated, the system may choose freely from a multitude of possible configurations, or even generate new configurations at run time.

From the safety perspective, the development of adaptive SoS with programmed reconfiguration is well supported by design-time assurance methods. However, the complexity of covering all potential configurations of the SoS at design time poses a major challenge to these methods, and unconstrained selection cannot be handled purely with design-time assurance methods. In particular, the openness of adaptive SoS and the dynamic nature of the relations between their members make it impossible to foresee and assess all relevant configurations, unless one strongly restricts the selection flexibility. Alternatively, safety assurance at run time appears as a suitable means to master this challenge, because it could help to restrict the assurance effort to assessing only those configurations which are relevant in a certain reconfiguration situation.

Safety Cases (SC) are one important assurance approach to be considered in this context. Originally, the safety case methodology was developed to support safety assurance during design time. In order to be truly applicable to adaptive systems of systems – which emerge and change structurally at run-time, and whose constituents change at run-time, too – safety cases must be modular and adaptive as well. Different approaches have been developed throughout the years to address modularity and dynamicity for safety cases. The approaches described in the literature support several aspects of what is required for safe reconfiguration in adaptive systems of systems, but none of them seems to cover all required elements. This motivated us to explore the current approaches and to identify the state of the art for modular and dynamic safety cases in the light of adaptive SoS.

2 General Concepts of Safety Cases

We can distinguish between two strategies for safety assurance [7]:

1. Process-based: Use safety standards and regulations as an input and follow a set of structured tasks and task conditions (a safety plan) during the system development.
2. Product-based: Use “safety goals” and “safety concepts” as input to define a set of product requirements that must essentially be considered in the system development to assure its safety.

Most often, compliance with detailed prescriptive regulations alone is not sufficient to ensure safety [6]. Consequently, both assurance strategies must be applied in parallel, and their proper application must be assessed and documented. Safety cases are the established means for this. The UK Defence Standard 00-56 [12] defines a safety case as “a structured argument, supported by a body of evidence, that provides a compelling, comprehensible and valid case that a system is safe for a given application in a given environment”. The key elements of safety cases based on this definition are:

• Claims, defining properties of the system
• Evidences, used as the basis of the safety argument, being facts, assumptions or sub-claims
• Arguments, linking evidences to claims
• Context, being the environment in which all these safety analyses and arguments are valid.

Traditionally, most people used textual notations to define safety cases. As structuring the arguments for complex systems became challenging, graphical notations were developed to support safety case development. In general, there are different notations supporting the structuring of arguments to associate the evidences with the claims. Two of the most popular ones are:

• Goal Structuring Notation (GSN) [20]
• Claim-Arguments-Evidence (CAE) [17]

GSN supports capturing the underlying rationale of arguments well. This helps to scope areas affected by a particular change and thus helps developers to propagate the change mechanically through the goal structure. However, GSN does not tell whether the suspect elements of the argument in question are still valid. Hence, using GSN does not directly help to maintain the argument after a change, but it helps to more easily determine the questions to be asked to do so [5,14]. There is no generally agreed “correct” way of structuring safety cases. Some generic structuring approaches are:

• System architecture: Traversing the system architecture and analysing the failures, risks and hazards within each component. The level of abstraction can be different for each use case and depends on the combinatorial allocation and connection of the components.


• System function: Partitioning the system based on the different functions it performs. This includes contributions from all components to perform a specific function.
• System safety goals: Breaking down the top-level safety goal of the entire system into sub-goals considering the goal dependencies (i.e., goal-1 satisfaction requiring goal-2 and goal-3 satisfaction).

The basic idea in all these structuring approaches is very simple: in many cases, in order to make a claim about a property of an object, we need to investigate whether the object has this property by evaluating its components. To do this, we need to clarify what the property is, what the rules are regarding how to view the object as being composed of components, and how the properties of these components can be combined (e.g., how reliability properties of components are combined when the degree of independence is not known) [18].

In general, understanding, developing, evaluating, and maintaining safety cases is a non-trivial task due to the volume and diversity of information that a typical system safety case must aggregate when best engineering practice is followed [4]. On the other hand, there is some criticism against using safety cases. Leveson, for instance, states that the problem is that it is always possible to find or produce evidence that something is safe. Unlike proving a theorem using mathematics (where the system is essentially “complete” and “closed”, i.e., it is based on definitions, theorems and axioms and nothing else), a safety analysis is performed on an engineered and often social system where there is no complete mathematical theory to base arguments on and guarantee completeness [13]. Despite such critique, the safety case methodology is increasingly accepted and used, with the core elements of the critique partially addressed by improvements of the methodology.
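To give a flavour of the key elements listed above, the following Python sketch encodes a small safety case fragment as a tree of claims with context, arguments and evidence, and performs a naive support check; the encoding and the check are our own simplification for illustration, not part of GSN, CAE or any safety case standard.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Evidence:
    description: str

@dataclass
class Claim:
    """A claim, the argument supporting it, and its context (illustrative)."""
    statement: str
    context: str = ""
    argument: str = ""                      # rationale linking evidence to claim
    evidence: List[Evidence] = field(default_factory=list)
    subclaims: List["Claim"] = field(default_factory=list)

    def is_supported(self) -> bool:
        # A claim counts as supported if it carries direct evidence, or if it
        # decomposes into sub-claims that are all supported themselves.
        if self.evidence:
            return True
        return bool(self.subclaims) and all(c.is_supported() for c in self.subclaims)

# A small goal structure, decomposed along system functions (names invented).
top = Claim("The system is acceptably safe in its operating environment",
            context="Given application and environment",
            argument="Argument over the safety of each system function",
            subclaims=[
                Claim("Function A performs safely",
                      evidence=[Evidence("Function A test results")]),
                Claim("Function B performs safely",
                      evidence=[Evidence("Function B analysis report")]),
            ])
print(top.is_supported())  # True: every leaf claim carries evidence
```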

3 Modular Safety Cases

One of the big challenges in SC approaches is the considerable effort required for constructing the SCs, as well as the high extra costs induced by changes in the system during its life cycle. A small change in a single component may require re-certifying the system and redoing the SC for the entire system. This is decisive in particular in safety-critical systems, where a considerable number of changes occur. An established approach to overcome this situation is the Modular Safety Case (MSC) approach. In this approach, one establishes safety cases by combining elements which can safely be composed, removed and replaced while enabling independent reasoning concerning safety. The major benefit of this approach is the module-based nature of the safety case, which brings more flexibility in terms of interchangeability of the system components. This means that for each change occurring in the system, we only need to update or replace the affected SC module instead of redoing the entire system's SC.

The safety certification for modified systems is a major contributor to the cost of change. A modular approach reduces the cost of re-certification of changed systems, offsetting any ‘setup’ costs and leading to an overall through-life cost saving, when compared to traditional safety cases. It is worth noting that a safety case is a living document that grows as the system grows. A safety case should be maintained as needed whenever some aspect of the system, its operation, its operating context, or its operational history changes [14].

The first step in MSC development is the design of the architecture of modules. In general, there is no generic and automated method for modularization. There are two approaches:

1. Approach A: Follow the system structure (architected with many other criteria), and define a MSC for each system component. This approach is probably not optimal in terms of effort, because not necessarily all the system components will change during the system's life. On the other hand, aligning the module borders with the system architecture design eases the change impact analysis and prepares well for future change.
2. Approach B: Initially follow the system structure, but optimize the MSC structure by merging components that are not expected to change, or that are mutually dependent. This approach benefits from the system architecture design for tracing the changes while optimizing the MSC architecture to remove unnecessary modularization.

MSC reduce the system complexity and support interchangeability of system components. However, one of the significant challenges is the alignment of the claims of SC modules in the module hierarchy in order to certify the overall system safety. As we iteratively divide the system into more and more subsystems, this challenge becomes more problematic. Consistency and traceability are two key features which we must assure in the MSC construction to simplify the system safety composition. Therefore, it is necessary to design the module architecture optimally. One may think that designing a safety argument structure similar to the system design structure can provide the matching. Theoretically, a one-to-one mapping may facilitate tracking down the components of a system design to the safety argument, but it is impractical due to four key factors: (1) modularity of evidence, (2) modularity of the system, (3) process demarcation (e.g., ISO 26262 items), and (4) organisational structure (e.g., who is working on what). These factors have a significant influence when deciding upon the safety argument structure [14]. Fenn et al. [8] list key criteria which make the application of MSC beneficial and should be considered when architecting the MSC structure: a distilled set of change scenarios, re-use, modularity, use of COTS and vendor cooperation, system size and system complexity.

The Industrial Avionics Working Group (IAWG) defined the Modular Software Safety Case (MSSC) process [14]. They introduce five main steps for developing MSC. In addition, there is corresponding analysis and extension of the method done by other researchers referring to the MSSC approach:

Step 1. Identify change scenarios in product life-cycle: Fenn et al. [8] suggest capturing the change scenarios based on parameters such as likelihood of change, size of change, frequency, complexity, relationship to safety, and any required grouping of changes.

Step 2. Optimize software design and safety case architecture: Fenn et al. describe concepts in constructing the MSC architecture supporting this step:

• High cohesion: where the responsibilities of the SC module are well-focused on assuring, for example, the argument relating to the subject design module.
• Low coupling: where the reliance of the SC module upon other SC modules is low.
• Well-defined interfaces: where any collaborations between SC modules only occur via well-defined module interfaces.
• Information hiding: to ensure the impact of change can be determined, we should only expose the minimum necessary information at the public interface of the SC module and keep all unused information private to the SC module.

Step 3. Construct safety case modules: Form a hazard mitigation argument and direct the derived safety requirements to the software blocks' safety case modules.

Step 4. Integrate safety case modules: Integrate the safety case modules so that claims requiring support in one safety case module can be linked to claims providing that support in others. IAWG introduced two properties to support this step:

• Dependency-Guarantee Relationships (DGRs): Capture the important guaranteed properties of a software component and define the properties on which that component is dependent in order to uphold its guarantee.
• Dependency-Guarantee Contracts (DGCs): Capture the relationship (dependencies) between two software elements.

Step 5. Assess/Improve change impact: When a system change occurs, we must assess the impact on the design modules and associated safety case modules. There are two approaches introduced by Kelly et al. for this purpose:

• Away goals: One extension to GSN is the “Away Goal”, which references a goal (claim) defined within another module. The drawback of this approach is that it cannot support the main purpose of modularization, i.e., avoiding changes beyond the level of necessary module changes.
• Safety contracts: Kelly proposes that where we can make a successful match between two or more modules, we should record a contract of the agreed relationship between the modules. The advantage of this method over away goals is that on a change in one module we only need to update the contract with its dependent module.

The approaches that we discussed in the previous paragraphs excellently support the construction of MSC at design time. We can use such modular safety cases in the scope of adaptive SoS when the flexibility of reconfiguration is limited to predefined or constrained (i.e., programmed) selection (see Sect. 1). For implementing more flexible reconfiguration approaches, e.g., to support open adaptive SoS, these concepts are not sufficient. In this case, additional mechanisms are required that support construction and evaluation of safety cases at run-time.
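To make the contract idea concrete, here is a minimal Python sketch of how a dependency-guarantee relationship and contract might be represented and checked; the module names, properties and the exact-match rule are invented for illustration and are not taken from the IAWG process definition.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DGR:
    """Dependency-Guarantee Relationship of one safety case module."""
    module: str
    guarantees: List[str]
    dependencies: List[str] = field(default_factory=list)

@dataclass
class DGC:
    """Dependency-Guarantee Contract between two modules."""
    consumer: DGR
    provider: DGR

    def holds(self) -> bool:
        # The contract holds when every dependency of the consumer is
        # matched by a guarantee of the provider.
        return all(dep in self.provider.guarantees
                   for dep in self.consumer.dependencies)

# Invented example: a braking argument depends on a speed-sensing guarantee.
braking = DGR("BrakingArgument",
              guarantees=["vehicle stops within 5 m at 20 km/h"],
              dependencies=["speed reported within 0.5 km/h accuracy"])
sensing = DGR("SpeedSensingArgument",
              guarantees=["speed reported within 0.5 km/h accuracy"])
contract = DGC(consumer=braking, provider=sensing)
print(contract.holds())  # True

# After a change in the sensing module, only the contract is re-evaluated:
sensing.guarantees = ["speed reported within 2.0 km/h accuracy"]
print(contract.holds())  # False: renegotiate the contract, not the consumer
```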

4 Dynamic Safety Cases

One of the key challenges in adaptive SoS is the capability of safe reconfiguration at run time. To ensure the continued safety of the SoS, the safety case assumptions often must be re-analyzed at run time. The problem of safety assurance at design time is the lack of certainty regarding the system context and evolution during operation. In particular, there are uncertainties that may arise during the life cycle of the system and lead to changes in the system's safety assurance case. Any mismatch between the design hypothesis or risk analysis and the operational requirements may lead to deviations from the assumptions in the safety case. Hence, there is a need for a new class of safety assurance techniques that exploit run-time related data (operational data) to continuously assess and evolve the safety reasoning and, ultimately, provide through-life safety assurance [15]. Accordingly, reassessment of safety cases using data collected at run-time is essential to move the safety management system towards dynamic safety management. Indeed, gaps between the documented safety reasoning and the actual safety of the system might lead to “a culture of ‘paper safety’ at the expense of real safety”. Despite significant improvements in operational safety monitoring, there is insufficient clarity on evolving the safety reasoning based on monitored data [5].

Comparing the process of MSC construction at design time to the run-time analysis, one of the key challenges is step 5 (Assess/Improve change impact), where the source of change must be identified at run-time and propagated to the impacted arguments within the corresponding modules. One key concept in this respect is the dynamic safety case (DSC) method introduced by Denney et al. [5], which is an engineering solution for through-life safety assurance, where the data produced by a safety management system can be utilized to create a continuously evolving assurance argument. Denney argues that a framework for dynamic safety cases must support three fundamental principles:

1. Proactively compute the confidence in, and update the reasoning about, the safety of ongoing operations. The ability to compute confidence from operational data is necessary to evolve safety reasoning and enable continuous safety assurance.
2. Provide an increased level of formality in the safety infrastructure to support automated analysis, e.g., determining dependencies between the system and corresponding argument fragments, and automatically updating the argument based on operational data.


3. Provide a mixed-automation framework. The idea is to generalize from the notion of a complete standalone argument to a partially developed, but well-formed, argument with open tasks assigned to various stakeholders.
As described, a well-developed safety management system is necessary to support DSC. The method suggests a life cycle comprising four continuous activities to ensure continuous feedback between the safety cases and the monitored data:
1. Identify: the sources of uncertainty in the safety case can weaken the confidence in safety. As the system and its safety arguments change, so will the assurance deficits.
2. Monitor: we collect run-time data related to system and environment variables, events, and the assurance deficits in the safety arguments. To enable monitoring, we periodically interrogate both the argument and its links to external data.
3. Analyze: to understand the impact on safety reasoning, we analyze the operational data to examine whether the thresholds defined for assurance deficits are met, and to update the confidence in the associated claims.
4. Respond: enabling a proactive response to operational events that affect safety assurance is at the heart of DSCs. Deciding on the appropriate response depends on a combination of factors, including the impact of new data on confidence, the response options already planned, the level of automation provided, and the urgency with which certain stakeholders have to be alerted.
Jaradat et al. [14] also introduced a concept in this respect through an extension of the MSSC approach. As part of their research results, they propose extensions to the MSSC process for identifying the potential consequences of a system change (i.e., impact analysis), thus facilitating the maintenance of a safety case. They propose annotating each reference to a development artefact (e.g., an architecture specification) in a goal or context element with an artefact version number. They also propose annotating each solution element with:
1. An evidence version number.
2. An input manifest identifying the inputs (including versions) from which the evidence was produced.
3. The life cycle phase during which the evidence was obtained (e.g., software architecture design).
4. A reference to the clause in the applicable safety standard (if any) requiring the evidence (and setting out safety integrity level requirements).
With these data, a number of automated checks can be performed to identify the items of evidence impacted by a change; a small sketch of such a check is given at the end of this section. Feth et al. [16] correctly state another issue with regard to autonomy in autonomous systems, which are a good example of open adaptive SoS: “Without proper verification and validation (V&V), sufficient evidence to argue safety is not attainable.”


They presented a conceptual framework by means of a meta-model and define a component for safety monitoring – a Safety Supervisor (SSV) – as an instantiation of the meta-model. They also proposed a Safety Supervisor Definition and Evaluation Framework (SSV DEF), instantiated for the automotive domain, to assist in conducting what-if analyses in the design of a concrete supervisor component; the conceptual framework itself, however, is domain-independent. One key idea in this concept is to consider the elements of the meta-model not as static, but rather as elements that can adapt during run-time. This turns the individual models into models at run-time and allows the supervisor to adapt to changes in the environment and to learn from experience. Such an open and adaptive SSV can be embedded into their existing conceptual framework for safety assurance of open adaptive systems. The DSC approach takes advantage of data collected at run-time, but still utilizes this information during design iterations (i.e., at design time throughout the life cycle of the system). In other words, we plan the potential responses for the system at design time and trigger them in accordance with the data collected at run time. Nevertheless, the approach marks an important step towards applying modular safety cases at run time – something that is essential for the safe reconfiguration of adaptive SoS.
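The evidence annotations proposed by Jaradat et al. lend themselves to a direct data-structure rendering. The following Python sketch (our illustration; all field and artefact names are invented) records, for each item of evidence, the input manifest from which it was produced, and flags the items invalidated by a version change:

from dataclasses import dataclass

@dataclass
class Evidence:
    element: str
    version: int
    input_manifest: dict   # artefact name -> artefact version used to produce the evidence
    phase: str
    standard_clause: str

def impacted(evidence_items, current_artefacts):
    """Evidence whose recorded input versions differ from the current artefact versions."""
    return [e for e in evidence_items
            if any(current_artefacts.get(name) != ver
                   for name, ver in e.input_manifest.items())]

evs = [Evidence("Sn1", 3, {"arch_spec": 2}, "architecture design", "clause 7.4")]
print(impacted(evs, {"arch_spec": 3}))  # arch_spec was bumped, so Sn1 must be re-examined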

5 Challenges Related to Adaptive Systems of Systems

The main challenge in flexibly supporting dynamic reconfiguration of adaptive SoS is to enable unconstrained selection at run time, in other words, to provide safety assurance for an unconstrained choice of the appropriate change to make. One approach to address this challenge could be to identify safety-related system variables at design time, monitor them at run time, and analyse their variation for prospective configurations during the configuration selection process. However, this is not yet sufficient to cover completely unconstrained selection, since it focuses on known variables of known systems. In a truly unconstrained selection, we will see not only parametric changes to existing variables, but also structural changes. For this, we need a dynamic approach that can also compose and assess safety cases at run time. There are many methods supporting dynamic safety cases (DSC) and many approaches for modular safety cases (MSC). However, there is a lack of work combining the two approaches to address the above-mentioned challenges while benefiting from both methods’ assets. A combination of both approaches in the sense of “Dynamic Modular Safety Cases” (DMSC) could be a good solution. Stepping towards DMSC, we must look at the following key properties in more detail:
Dynamic Monitoring: In current open, complex adaptive systems, uncertainty is an inherent part of the system, which makes it harder to identify precisely which module properties are exposed to changes during operation.


This necessitates the integration of a concept for run-time observation of safety-related properties, capable of providing relevant and requested status data as input for the dynamic safety management system.
Run-Time Analysis: Identifying a change and propagating it through the affected modules while measuring its impact remains the key challenge. In other words, reviewing and assessing the arguments at run time, as well as embedding the relevant evidence, is challenging, and this issue remains open for most researchers in this area. As Schneider [3] declares, a gradual shift of safety intelligence to run time will be indispensable to ensure the safety of future SoS. To which extent this shift is possible and how exactly it will be realized is still subject of research.
Automatic Generation/Updating of Modular Safety Cases: Automating the safety case analysis during operation (either fully or partially, with human review) gives the open adaptive SoS more leeway and degrees of freedom in responding to events and making decisions at run time. This means that a DMSC approach must be able to construct (i.e., structurally synthesize) and analyze new modular safety cases at run time. This leads to another challenge: the machine readability and automatic regeneration of MSCs. A clear interface must be defined for analyzing relevant data produced at design time or resulting from run-time monitoring, identifying the hazardous events, integrating the mitigation by modifying the MSC, and generating and reasoning about the new MSC as part of the reconfiguration process. Schneider has proposed a framework that enables conditional safety certification for open adaptive systems [2]. This approach, called modular conditional safety certificates (ConSerts), builds on a series of formalized guarantee-demand relationships that can be composed and evaluated at run time. The evaluation result can be interpreted as a run-time safety certificate that supports the autonomous decision of whether the integrated system is currently safe to run or not. For the operationalization of ConSerts, he establishes adequate support for dynamic integration and adaptation as well as appropriate modularization concepts and mapping functions. ConSerts are predefined modular conditional safety certificates characterizing safety properties of the SoS member systems with a certain degree of variability, where the variability is connected to properties of the system or its environment. By being combined and evaluated at run time, and by reflecting situational variability also in the certificates, the ConSerts approach successfully transfers certain safety analysis work from design time to run time. Yet, the ConSert itself is “a post-certification artefact” [2] that is initially created at design time. It can represent only a limited, pre-engineered amount of flexibility, but it still maps well to dynamic reconfiguration based on constrained selection, whilst unconstrained selection is not yet supported. These methods are significant steps towards the development of DMSC to overcome the challenges listed earlier in this paper. However, further research is required to operationalize these concepts and to demonstrate to what extent safety analysis can be supported at run time.
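As an illustration of the ConSerts idea, the following Python sketch (an invented toy example, far simpler than actual ConSerts) expresses a certificate as guarantees that are conditional on runtime demands, and evaluates which guarantees currently hold for a member system:

# Each entry pairs a guarantee with the set of runtime demands it is conditional on.
conserts = {
    "member_A": [
        ("full_service", {"comms_encrypted", "sensor_health_ok"}),
        ("degraded_service", {"sensor_health_ok"}),
    ],
}

def certifiable_guarantees(system, runtime_facts):
    """Return the guarantees currently certifiable for a member system."""
    return [g for g, demands in conserts[system] if demands <= runtime_facts]

print(certifiable_guarantees("member_A", {"sensor_health_ok"}))  # -> ['degraded_service']

The composition of such evaluations across all member systems would then play the role of the run-time safety certificate discussed above.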

6 Summary and Conclusion

In the current context of fast technology evolution, adaptive SoS play a key role in coping with challenges such as rapid changes in the application domain. Reconfiguration is a vital feature of such adaptive SoS and shall be possible at run-time, whilst maintaining their safety properties during and after this process. The complexity and flexibility of these SoS necessitate the development of new approaches to analyse and maintain safety. The basic safety case concept has evolved over the years and has led to amendments and extensions such as the modular safety case (MSC) approach and the dynamic safety case (DSC) approach. MSC, in particular, was introduced to cope with system complexity by breaking down safety arguments into modules, in order to reduce the cost and impact of changes during the system life-cycle, whilst DSC were developed to bridge the gap between assumptions taken at design time and properties of the realized system becoming apparent at run time. In this paper, we explored the current state of the art of MSC and DSC and discussed the application of these approaches for dynamic reconfiguration of SoS. We examined their application in two different life-cycle phases, design time and run time, to identify the unexploited features and properties of these approaches in both phases. The limitations of the MSC and DSC approaches for dynamic reconfiguration of adaptive SoS necessitate combining the two methods into one approach, called Dynamic Modular Safety Cases (DMSC), to cope with the challenge of unconstrained selection during reconfiguration at run time while benefiting from both methods’ assets. In this paper, we captured the state of the art in this field and the challenges towards developing this method.

References
1. Ferrer, B.R., et al.: Towards the adoption of cyber-physical systems of systems paradigm in smart manufacturing environments. In: 2018 IEEE 16th International Conference on Industrial Informatics (INDIN), Porto (2018)
2. Schneider, D.: Conditional safety certification for open adaptive systems. Ph.D. thesis, Technical University of Kaiserslautern. PhD Theses in Experimental Software Engineering, vol. 48. Fraunhofer Verlag, Stuttgart (2014). http://publica.fraunhofer.de/dokumente/N-283653.html
3. Schneider, D., Trapp, M.: B-space: dynamic management and assurance of open systems of systems. J. Internet Serv. Appl. 9(1), 1–16 (2018). https://doi.org/10.1186/s13174-018-0084-5
4. Denney, E., Pai, G., Whiteside, I.: Formal foundations for hierarchical safety cases. In: Proceedings of HASE 2015. IEEE, Piscataway, NJ (2015)
5. Denney, E., Pai, G., Habli, I.: Dynamic safety cases for through-life safety assurance. In: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Florence, Italy (2015)


6. Cullen, H.L.: The Public Inquiry into the Piper Alpha Disaster, vol. 1. HMSO, London (1990)
7. Habli, I., Kelly, T.: Process and product certification arguments: getting the balance right. SIGBED Rev. 3, 1–8 (2006)
8. Fenn, J., Hawkins, R., Williams, P.J., Kelly, T., Banner, M., Oakshott, Y.: The who, where, how, why and when of modular and incremental certification, vol. 532, pp. 135–140. Institution of Engineering and Technology, London (2007)
9. Matevska, J.: Rekonfiguration komponentenbasierter Softwaresysteme zur Laufzeit. Vieweg+Teubner Verlag/Springer Fachmedien, Wiesbaden (2010)
10. Bradbury, J.S., Cordy, J.R., Wermelinger, M.: A Survey of Self-Management in Dynamic Software Architecture Specifications. ACM, New York, NY (2004). http://dl.acm.org/citation.cfm?id=1075405
11. Boardman, J., Sauser, B.: System of Systems - the meaning of of. In: 2006 IEEE/SMC International Conference on System of Systems Engineering, Los Angeles, California, USA, 24–26 April 2006. IEEE (2006). https://doi.org/10.1109/SYSOSE.2006.1652284
12. Ministry of Defence: Defence Standard 00-56: Safety Management Requirements for Defence Systems (2007)
13. Leveson, N.G.: Engineering a Safer World: Systems Thinking Applied to Safety. MIT Press, Cambridge (2011)
14. Jaradat, O., Bate, I., Punnekkat, S.: Facilitating the maintenance of safety cases. In: Kumar, U., Ahmadi, A., Verma, A.K., Varde, P. (eds.) Current Trends in Reliability, Availability, Maintainability and Safety. LNME, pp. 349–371. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-23597-4_25
15. Jaradat, O., Punnekkat, S.: Using safety contracts to verify design assumptions during runtime. In: Casimiro, A., Ferreira, P.M. (eds.) Ada-Europe 2018. LNCS, vol. 10873, pp. 3–18. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92432-8_1
16. Feth, P., Schneider, D., Adler, R.: A conceptual safety supervisor definition and evaluation framework for autonomous systems. In: Tonetta, S., Schoitsch, E., Bitsch, F. (eds.) SAFECOMP 2017. LNCS, vol. 10488, pp. 135–148. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66266-4_9
17. Bishop, P., Bloomfield, R.: A methodology for safety case development. In: Redmill, F., Anderson, T. (eds.) Industrial Perspectives of Safety-critical Systems. Springer, London (1998). https://doi.org/10.1007/978-1-4471-1534-2_14
18. Bloomfield, R., Netkachova, K.: Building blocks for assurance cases. In: 2014 IEEE International Symposium on Software Reliability Engineering Workshops, Naples, Italy (2014). https://doi.org/10.1109/ISSREW.2014.72
19. Batista, T., Joolia, A., Coulson, G.: Managing dynamic reconfiguration in component-based systems. In: Morrison, R., Oquendo, F. (eds.) EWSA 2005. LNCS, vol. 3527, pp. 1–17. Springer, Heidelberg (2005). https://doi.org/10.1007/11494713_1
20. Kelly, T.P., Weaver, R.: The Goal Structuring Notation – a safety argument notation. In: Proceedings of the Dependable Systems and Networks 2004 Workshop on Assurance Cases (2004)

Workshop on Dependable SOlutions for Intelligent Electricity Distribution GRIds (DSOGRI)

Workshop on Dependable Solutions for Intelligent Electricity Distribution Grid (DSOGRI)

Workshop Description

Electrical distribution grids are required to deliver an increasingly efficient and reliable supply of electricity to end users, while also supporting a higher penetration of renewable energy resources. This increased penetration of renewable energy resources, as well as the use of increasingly connected intelligent meters, inverters, and electric vehicles, represents a challenge for reliability. For this purpose, Distribution System Operators (DSOs) need to implement intelligent solutions capable of guaranteeing a high level of resilience to their systems by offering new functionalities such as the prompt identification, localization, and diagnosis of faults. Such solutions are based on communication technologies, Information and Communications Technology (ICT) infrastructures, and data collection and management, which create the basis for obtaining intelligent, reliable and efficient distribution grids. While these functionalities offer a more effective and successful management of electricity distribution, they also introduce a level of interdependence between the ICT infrastructure and the electricity grid, as well as potential cyber-security vulnerabilities, that must be managed. The purpose of this workshop is to investigate issues related to the ICT-based management of failures, including cyber-security aspects, and the ability to quantify the quality of the data collected from the sub-systems deployed in the field, in order to make appropriate diagnoses and detections. Moreover, the workshop provides a forum for researchers and engineers in academia and industry to discuss and analyze current solutions and approaches, research results, experiences, and products in the field of intelligent electricity grids. Its ultimate goal is to present advances in the state of the art in this domain and to spread their adoption in several scenarios involving the main DSOs of the power domain. The areas of research interest include, but are not limited to:
– Dependable ICT solutions for intelligent electricity distribution grids
– ICT assisted grid fault management (detection, localization, diagnosis)
– ICT faults and their impact on grid operation
– Quantification of data quality and of the impact of data inaccuracies on applications
– Cascading effects of ICT or grid faults
– Security threats and vulnerabilities
– Smart grid cyber security challenges
– Fault, attack and anomaly detection

The 2nd edition of the DSOGRI workshop was co-located with the 16th European Dependable Computing Conference (EDCC 2020). We would like to thank the Organizing Committee of EDCC 2020 for giving us the opportunity to organize the


workshop and, in particular, the Workshops Chair, Simona Bernardi, for her precious support. We would also like to thank the Workshop Program Committee for their important work and support in the revision phase of the papers; each paper received four rigorous reviews. Last but not least, we thank all the authors of the submitted papers, hoping that DSOGRI 2020 will act as a stimulus for them to continue their research activities in this field.

Drafting a Cybersecurity Framework Profile for Smart Grids in EU: A Goal-Based Methodology

Tanja Pavleska1(B), Helder Aranha2, Massimiliano Masi3, and Giovanni Paolo Sellitto4

1 Jozef Stefan Institute, Ljubljana, Slovenia
[email protected]
2 Lisbon, Portugal
3 Tiani “Spirit” GmbH, Vienna, Austria
[email protected]
4 Rome, Italy
H. Aranha, G. P. Sellitto – Independent Scholar

Abstract. As for any other critical infrastructure, the design and implementation of a Smart Grid shall satisfy the demand for a strong security posture, while complying with regulatory requirements and maintaining a high level of interoperability among heterogeneous components. In this paper, we provide a goal-based methodology to ensure the fulfillment of the relevant security goals at design time. The methodology enables system architects to devise an adequate set of countermeasures in view of the selected security goals. In order to obtain a cybersecurity profile suitable for the protection of Smart Grids, and in particular of Virtual Power Plants in Europe, we build upon best practices and accepted standards, such as the security posture defined by the NIST Cybersecurity Framework and the ISO standards that are widely adopted by the EU Critical Infrastructure. In addition, we provide some informative references to demonstrate how these frameworks and standards can be integrated into the proposed methodology.

Keywords: Cybersecurity framework · Goal-based · Profiles · Smart Grid security · RMIAS

1 Introduction

Critical infrastructures demand high security postures: cyber-physical attacks targeted at the infrastructures may endanger human safety, disrupt critical production processes, and even affect a country’s economy. Security posture represents the security status of enterprise networks, information, and systems, based on the information security resources (e.g., people, hardware, software, policies) and capabilities in place to manage the defense of the enterprise and to react


as the situation changes [1]. The advent of automation and digitization results in critical infrastructures being increasingly interlinked, dramatically increasing their attack surfaces [2]. Government and regulatory agencies, such as the European Union Agency for Cybersecurity (ENISA) and the Department of Homeland Security (DHS) in the USA, provide security guidelines to support the implementation of high security standards for critical infrastructures. These guidelines are based on the application of international standards (such as ISO 27001 for IT security, or ISA 62443 for Operational Technology security) profiled for the specific context of critical infrastructures. The European Directive on security of network and information systems (NIS) [3] and the US Presidential Policy Directive 21 [4] lay down the legal foundations for every infrastructure owner to adhere to in the respective policy domains. However, defining the proper security approach for a specific critical infrastructure instance is a complex and cumbersome task, since it depends on a very specific set of requirements and risk assessments. The National Institute of Standards and Technology (NIST) has drafted a cybersecurity framework (NIST CSF) [5] aimed at harmonising the security posture of critical infrastructures by defining a set of cybersecurity activities regulated by the international security standards, together with a methodology to complement the risk management process, helping organizations in the implementation of a cybersecurity plan. The framework is broadly used and can be applied across many critical sectors. In Italy, it is adopted to define the mandatory requirements for certain critical infrastructures to operate in the country [6]. The Smart Grid is a complex critical infrastructure employing innovative products and services together with intelligent monitoring, control, and communication to distribute electric power. It is geographically distributed across different regions or countries, making the implementation of countermeasures such as physical security, typically from the ISO 27001 family, a complex task. This task is even more difficult in the case of Virtual Power Plants (VPPs), where a swarm of small and medium-scale Distributed Energy Resources (DERs) consuming and/or producing electricity are connected through a central control system called the Virtual Power Plant Operator (VPPOP) [7]. Such an environment is a complex system characterized by a high level of heterogeneity of the connected assets and of their owners, and by a decentralized governance model, even in the presence of a central VPPOP. Although DER operators vary from local households to professional operators, each has to maintain high security levels and adhere to the regulatory requirements. It has already been demonstrated that it is possible to facilitate this task by implementing a methodology for eliciting security countermeasures that can be employed by people with no security skills [8]. In the World Smart Grid Forum of 2013, security was declared an urgent priority [9]. Moreover, the NIST framework, and in particular the NIST CSF, was recognized and proclaimed as a means for the protection of critical infrastructures, denoting the employment of the NIST CSF as a guiding principle for Smart Grid projects worldwide. In this paper, we rely on the same goal-based methodology to draft a cybersecurity framework profile tailored to protect VPPs in the EU Smart Grid. Such a profile can highlight both the relevant


countermeasures and the adequate solutions that VPP operators can adopt to meet the required level of security when they connect to the Smart Grid. Given the highly distributed nature and the diverse set of devices that are part of the Smart Grid, interoperability comes as a complementary requirement to be addressed together with cybersecurity. Recognizing this, the US Department of Homeland Security and NIST have cooperated to devise a security profile of the framework adapted to smart grids [10]. The profile references the same standards as the NIST CSF. However, those standards are not specific to the Smart Grid, and they do not account for the protocol-specific security issues posed by the VPPs. These requirements are addressed by other initiatives unrelated to the NIST framework, such as ISO 61850/60870-5-104 and OpenADR [11]. Creating a CSF profile that accounts for all Smart Grid scenarios is thus extremely difficult, as it must encompass all aspects related to security, from the power plant situated in a local household (e.g., a photovoltaic panel) to the more complex Intelligent Energy Devices (IEDs). To facilitate this task, we propose a methodology which, given a set of pre-selected high-level security goals, allows the drafting of such profiles. This methodology complements the existing threat- and risk-based approaches to security and profiling (like NIST and ENISA) with the goal-based Reference Model for Information Assurance and Security (RMIAS) [12], to enable the attainment of a strong security posture at design time. Our contribution is hence twofold: first, we introduce our methodology and provide an algorithmic approach to integrate it with the existing models; then, we employ this method to support the architectural development process for secure Smart Grid solutions and pave the way towards the definition of a profile that is readily applicable to the EU Smart Grids. The paper is organised as follows: In Sect. 2 we place our work among the related approaches and point out the significance of our contribution in relation to the state of the art. In Sect. 3, we provide the theoretical background required to understand the cybersecurity framework and its contextualization to the Smart Grid. Section 4 introduces our methodology, which in Sect. 5 is used to draft a profile based on ISO 61850, the standard used for Smart Grids in Europe, which is not part of the NIST CSF. In Sect. 6 we touch upon future work and conclude.

2 Related Work

The NIS Directive aims at ensuring a high level of network and information security across Europe [3]. In response to the directive requirements, ENISA, national governments, and National Regulatory Authorities (NRAs) engaged in joint work in order to achieve a harmonized implementation. Three non-binding technical documents were provided as guidance to the NRAs across EU member states [13–15]. Thus, the presented work is also an effort to bridge the existing technical solutions with the regulatory policies and standardization requirements. In that sense, the authors of [16] provide a holistic account of “security requirements profiles” in an organization by assembling a set of “modular security safeguards”. However, they are concerned only with the technical aspects and mainly serve the solution developers.


The Integrating the Energy Systems [17] and VHPReady [18] initiatives created specifications aimed at achieving interoperability among the entities in this distributed scenario. The ISO standards 61850 and 60870-5-104 have been selected to report measurements and to send functional schedule control messages between the VPP and the DERs. The EU mandated CEN, CENELEC and ETSI [19] to produce a framework for the standards-based establishment and sustainability of Smart Grids. The central result of this tripartite Smart Grid Work Group is a model of architectural viewpoints encompassing a broad range of Smart Grid aspects: from the field devices, to the functionalities required by software components, to the definition of business requirements. In addition to this Smart Grid Architecture Model (SGAM) [20,21], the Work Group offers a methodology for the creation of SG-specific solution models [22], enabling the evaluation of quality and security aspects [23]. It is worth noting that the definition of the standards does not include security: both in OpenADR and in ISO 61850, security is treated in separate documents [24]. Basic security requirements such as channel encryption, role-based access control, and key management are handled by referring to ISO 62351 [23]. NIST released a profile for improving the Smart Grid security infrastructure [5]. The profile is made considering the high penetration of DERs, following four common security-related business objectives: safety, power system reliability, resilience, and grid modernization. However, being a profile of the NIST CSF core, it is essentially a threat- and risk-based approach. Similar in objectives to the NIST CSF are the ENISA technical guidelines on security measures. Both the ENISA guidelines and the NIST CSF require a running system with a history of behaviour in order to derive evaluations and recommendations for improving cybersecurity postures. However, the implementation of Smart Grids after the EU mandate M/490 requires design interventions and reasoning within the architectural model itself, which significantly limits the applicability of the NIST CSF for this purpose. There is criticism of security design frameworks deemed to be too focused on the technical aspects and falling short in detecting and addressing potential design conflicts [25]; an example is a system that should implement both anonymity and auditability. By joining the goal-based approach included in our methodology with the threat-view offered by NIST and ENISA, not only is the creation of profiles at design time enabled, but the issue of contradictory requirements in technology management is also addressed.

3 Contextual Considerations

Several standards exist for the design and implementation of a Smart Grid [26]. The Smart Grid is composed of actuators and monitoring devices to realise the modernization of energy transmission, distribution and consumption. Such devices require the secure exchange of messages either over the public internet or over IoT networks (e.g., LoRaWAN, Sigfox, or 5G). Interoperability is a crucial aspect of the Smart Grid scenario. The numerous and diverse hardware and software


versions installed, from the small household’s photovoltaic panels to the power plants, require a strong architectural approach that joins all the required viewpoints and desired capabilities into a single operating solution that is reliable and secure. This is particularly true for the orchestration of the power supplied by the distributed energy resources, where the USA and Europe follow different approaches: while in the USA the standards produced by the OpenADR alliance are mainly used, in Europe ISO 61850 and 60870-5-104 are the prominent ones [17,20]. In the VPP, it is required to have reliable energy measurements from the DERs, to be used as time series from which the operator can simulate and predict the amount of power that the energy market will require, in a way that is efficient yet profitable. In turn, the VPP sends control messages, named functional schedules (FSCH), to the DERs to initiate energy production. To support the security capability of such architectures, both NIST and ENISA defined guidelines for improving their cybersecurity. The CSF, created under the mandate of the Cybersecurity Enhancement Act of 2014, is a technology-neutral framework that guides critical infrastructure operators and owners in their cybersecurity activities, by considering cybersecurity risks as part of the risk evaluation process. The ENISA guidelines, on the other hand, condense an extensive list of national and international EU electronic communications standards into a set of security objectives divided by domain [13]. The CSF is divided into three parts: i) the Core, a set of cybersecurity activities, outcomes, and informative references to best practices and standards that are common across critical infrastructures; ii) the Tiers, a methodology for an organisation to view risks and the process used to manage the risk; and iii) the Profiles, i.e., the outcomes based on business needs that an organisation has selected from the Core. The Core is aimed at fulfilling five functions: Identify, Protect, Detect, Respond, and Recover. To do that, it further identifies 23 Categories, divided into 108 Subcategories, across these functions. The ENISA guidelines outline 25 security objectives, each analyzed through various security measures and supported by evidence testifying that an objective was met. The security measures are grouped in 3 sophistication levels, whereas the security objectives are divided in 7 domains of application. Both NIST and ENISA follow a threat/risk-based approach, requiring an implemented system with a history of behaviours that allows devising the set of necessary countermeasures (e.g., the NIST gap analysis). In contrast, our methodology allows security reasoning on abstract architectural models, hence not requiring (but also not ruling out) an operational system in place. We employ it for the VPP use case, where the need to secure the DER-VPP communication is established. To do this, RMIAS, the NIST CSF and ISO 62351 (as will emerge during the application of the methodology) are put together into play to provide a cybersecurity framework profile for the EU Smart Grid. The detailed methodological approach of how this is realized, as well as the structural definition of the methodology itself, are provided in Sect. 5.
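To make the relationship between these structures concrete, the following Python sketch (our illustration; only the nesting is the point, and the entries shown are a small excerpt) encodes the CSF Core hierarchy and an ENISA-style objective as plain data:

# CSF Core: Function -> Category -> Subcategories (excerpt, not the full 23/108).
csf_core = {
    "Protect": {
        "Identity Management and Access Control": ["PR.AC-4", "PR.AC-5"],
        "Data Security": ["PR.DS-2"],
    },
    # ...remaining functions: Identify, Detect, Respond, Recover
}

# An ENISA-style security objective: it belongs to a domain, and its
# measures are grouped into 3 sophistication levels (level contents invented).
enisa_objective = {
    "id": "SO1",
    "domain": "D1",
    "levels": {1: ["basic measure"], 2: ["industry practice"], 3: ["state of the art"]},
    "evidence": ["policy document", "audit record"],
}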

4 Methodology

In this section, we propose a methodology that complements the threat-based approaches outlined previously and is adjusted to the Smart Grid context. We then address the architectural considerations of the methodology and establish a procedure through which the enterprise architect can perform security reasoning at system design time, in close collaboration with security and business experts.

4.1 Security Considerations

The NIST CSF requires a live system and the history of its behavior to derive a specific threat model by which a risk assessment is made (as specified by the Implementation Tiers [5, Sect. 2.2]). Instead, the proposed methodology guides the profiling, i.e., the selection of relevant countermeasures, without being limited to the NIST Core or the ENISA guidelines only. The result of its application is a CSF profile created upon the goals set at design time. It is important to note that, for any architectural change, re-applying the methodology creates another profile, or target, which in turn enables keeping track of the evolution of the security posture (going from current to target). At the basis of the methodology stands the Reference Model for Information Assurance and Security (RMIAS), which provides a general methodological cycle covering the full lifecycle of the Information System, from inception to operation, monitoring and retirement. It integrates the identification of the assets to protect, their categorization into a security taxonomy, the prioritization of the security goals to be achieved in relation to the assets, the selection of countermeasures in view of the security goals, and the monitoring of the effectiveness of the applied measures over time. Being a goal-based approach, RMIAS has been successfully combined with threat-based approaches, as described in [27]. Figure 1 depicts the application of RMIAS to a Smart Grid system, which is represented in the core diagram with all its assets: Network, Hardware, People, Information, etc. RMIAS is composed of the following security aspects (i.e., dimensions) to provide a goal-based view: Security Development Life Cycle (SDLC, represented in green), Information Classification (which corresponds to the RMIAS taxonomy), Security Goals (in orange), and Countermeasures (in blue). As such:
– The SDLC illustrates how security is built up along the Information System life cycle;
– Information Taxonomy characterizes the nature of the information being protected;
– Security Goals contain a broadly applicable list of eight security goals: Confidentiality; Integrity; Availability; Accountability; Authentication & Trust Establishment; Non-repudiation; Privacy; and Auditability;
– Countermeasures categorize the countermeasures available for asset protection.


Fig. 1. Relation of the methodology to the overall Smart Grid system

It is assumed that the organization already has an SDLC established; RMIAS only builds on it. We harmonise the sets of countermeasures provided by NIST and ENISA by exploiting the inner relationships between the NIST subcategories and the ENISA guidelines’ Security Objectives (see Fig. 3, where the ENISA guidelines are mapped to the CSF according to the NIST profiling tool shown therein). To address the threats and vulnerabilities of the system, the goal-based model is complemented by a threat-view, which is represented by the purple blocks in Fig. 1.

4.2 Architectural Considerations

The methodology expresses security as an architectural concern to be addressed when designing Smart Grids. For efficiency reasons, it assumes joint work among the business representatives (experts on what to protect), the security experts (who know how to protect), and the solution architect (the Smart Grid technical expert), orchestrated by the latter. As represented in Fig. 2, the SGAM interoperability layers (themselves architecture viewpoints [23]) are the starting point of a series of methodological steps intended to produce a subset of coherent, synergistic countermeasures that fulfill the security goals considered for the assets to protect. Starting from the bottom up (i.e., from the field devices up to the business concerns), the information assets are identified and evaluated in Step 1. This is done for all stages in the assets’ lifecycle, as captured by Step 2. An asset can be in different categories, depending on the specific SDLC stage, in Step 3 (e.g., it can become public after an initial classification). The information assets, their category, and the Business Objectives from the NIST CSF are an input to the Risk Analysis in Step 4, and the asset categories are then prioritized according to risk in Step 5. Security Goals that mitigate said risks are then selected for the corresponding categories in Step 6.


Fig. 2. Instantiating the methodology with SGAM architecture model and NIST Cybersecurity Framework for Smart Grids

In Step 7, the countermeasures are chosen, either from the available guidelines (e.g., the NIST CSF Core or the ENISA guidelines) or from external requirements. Finally, in Step 8, the countermeasures are integrated into a CSF profile. Algorithm 1 outlines the exact procedure described above: starting with a superset of countermeasures from which we want to create a profile, we loop over the SGAM layers and the stages of the SDLC. In line 4, the taxonomy (category) of the assets is defined as an RMIAS tuple <form, state, location, sensitivity>. Then, in line 5, we perform a risk analysis following the guidance of the Implementation Tiers of the NIST CSF, for each information object defined in the taxonomy, after which the countermeasures are selected from the countermeasure set. This allows for a rigorous and repeatable process that the architect can reuse until the definition of the profile is considered satisfactory. Notably, iterating over all SGAM layers is important: security concerns may emerge at any level of scale [28]. When choosing the set of adequate countermeasures, some adjustments should be performed between the NIST CSF and the ENISA guidelines in order to enable the seamless employment of both frameworks. This adjustment is depicted in Fig. 3. The upper left corner shows the NIST profiling tool, based on which the mapping between the ENISA objectives (presented centrally in the figure) and the NIST categories is performed. The upper right corner shows a visual guide of how the mapping should be read when switching between the ENISA objectives and the NIST profiling categories.


Algorithm 1: The goal-based methodology
Result: A CSF profile
1 Profiling the countermeasure set (as, e.g., in Fig. 2);
2 foreach SGAM Layer do
3   foreach stage of the System Development Life Cycle do
4     — Define a taxonomy of the assets, as <form, state, location, sensitivity>;
5     — Perform risk analysis using a CSF Implementation Tier to determine the security goals relevant for the taxonomy objects;
6     — Select the security countermeasures for the obtained goals from the countermeasure set.
7   end
8 end
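For concreteness, a minimal Python rendering of Algorithm 1 is sketched below. The layer and stage names follow SGAM and the SDLC used in Sect. 5; the asset source, the risk analysis, and the countermeasure set are stubbed as function parameters, since in practice they come from the architecture model, a CSF Implementation Tier, and the NIST/ENISA guidelines, respectively:

SGAM_LAYERS = ["component", "communication", "information", "function", "business"]
SDLC_STAGES = ["requirements", "design", "implementation", "management", "retirement"]

def build_profile(assets_of, analyse_risk, countermeasure_set):
    """assets_of(layer, stage) yields assets; each asset carries an RMIAS taxonomy tuple."""
    profile = []
    for layer in SGAM_LAYERS:
        for stage in SDLC_STAGES:
            for asset in assets_of(layer, stage):
                taxonomy = asset["taxonomy"]    # <form, state, location, sensitivity>
                goals = analyse_risk(taxonomy)  # e.g. {"CONFIDENTIALITY", ...}
                profile.extend(c for c in countermeasure_set if c["goal"] in goals)
    return profile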

Thus, the seven domains from the ENISA guidelines (D1–D7), together with their 25 objectives, are mapped as follows:
– Business objectives: no domain is mapped to this category
– Cybersecurity requirements: D1 (SO1–SO4), D3 (SO9–SO12)
– Technical environment: D5 (SO16–SO18), D7 (SO21–SO25)
– Operating methodologies: D2 (SO5–SO8), D4 (SO13–SO15), D6 (SO19–SO20)
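Written out as a machine-readable lookup table (a sketch; the grouping simply mirrors the list above), the mapping reads:

# NIST profiling category -> ENISA domains and objective ranges mapped to it.
enisa_to_csf = {
    "Business objectives":        [],  # no ENISA domain maps here
    "Cybersecurity requirements": ["D1:SO1-SO4", "D3:SO9-SO12"],
    "Technical environment":      ["D5:SO16-SO18", "D7:SO21-SO25"],
    "Operating methodologies":    ["D2:SO5-SO8", "D4:SO13-SO15", "D6:SO19-SO20"],
}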

From this, it becomes evident that the 25 ENISA objectives can be fully mapped to the NIST CSF, but not vice versa, as the Business objectives are not accounted for by the ENISA guidelines. This implies that, during the contextualisation of its security objectives, some security measures may be rendered inadequate. Our methodology helps alleviate this issue by enabling the adjustment of the security objectives to a viable outcome right from the solution design. As a result, it:
– Drastically diminishes the amount of analysis necessary to select the applicable NIST subcategories for the [asset category–security goal] tuples resulting from the risk assessment when selecting the appropriate countermeasures (notably, each NIST objective prescribes dozens of subcategories, each of which may be applicable to several business goals); and
– Helps identify missing or conflicting security requirements and countermeasures obtained from the threat-based approaches. This is also highlighted in the following section.
To provide a more general account of the methodology, the example in the next section uses the NIST CSF for Smart Grids profile to show the applicability of the proposed methodology.

5 Application Example

To show the applicability of the proposed methodology and to provide proof of its viability, this section provides an example of its application.


Fig. 3. Adjusting (profiling) ENISA to the NIST CSF to be integrated into the methodology

The methodology is applied to the NIST Cybersecurity Framework for Smart Grids, using an SGAM-based model of the VPPOP scenario. The example consists of a preparatory stage and a main procedure, and follows the methodology presented in the previous section. First, the following preparatory steps are performed for the pre-selection of subcategories from [10] when designing cost-effective countermeasures for risk mitigation:
– An SDLC is selected, consisting of the following stages: 1. security requirements engineering; 2. security design; 3. security countermeasures implementation; 4. security management and monitoring; and 5. secure retirement of an information system.
– A set of Business Objectives from the NIST CSF for Smart Grids is created, containing the following objectives: Maintain Safety, Maintain Power System Reliability, Maintain Power System Resilience, and Support Grid Modernization.
– A system architecture based on SGAM is available (thus including also the referenced standards).
We then start applying the procedure shown in Fig. 2. For each SGAM layer, we define a taxonomy entry for each information asset [29] (Steps 1, 2, and 3).


In the VPPOP scenario, control messages (the “FSCH”) may have the following entry:
<form: electronic, state: transmission, location: restricted, sensitivity: top secret>
This definition comes from the fact that Smart Grid control messages represent the most sensitive assets to be protected, since they may endanger safety if compromised. Since the location is “restricted” and the sensitivity “top secret”, two natural goals emerge after using, e.g., a Tier 3 Repeatable risk analysis: CONFIDENTIALITY and ACCOUNTABILITY, with high priority (Steps 4, 5 and 6). In order to implement the security countermeasures for these two additional goals, the architect needs to choose specific security countermeasures. In [22,23], the message implementing the FSCH follows ISO 61850. This results in the introduction of the ISO 62351 countermeasures in Step 7, as required by the SGAM Security group [23]. The introduction of this latter standard also enhances the available informative references for the basic NIST CSF core. For example, ISO 62351 introduces authorization through Role-Based Access Control, which could be used as a reference for PR.AC-4 “Access permissions and authorizations are managed, incorporating the principles of least privilege and separation of duties”, while TLS covers PR.DS-2 “Data-in-transit is protected”. The need for network segregation is already defined by PR.AC-5 “Network integrity is protected” and referenced to ISA 62443. However, although PR.PT-1 “Audit logs are determined, documented, implemented, and reviewed in accordance with policy” could potentially fulfill the ACCOUNTABILITY requirement, the informative references do not define any syntax or semantics for the logs. In order to be able to perform forensic analysis or continuous monitoring, it is of paramount importance for this requirement to have harmonised audit trail entries among a multitude of DERs, from different vendors and with different software versions. This countermeasure is lacking from ISO 62351, OpenADR, and the cybersecurity framework alike. After all the cycles over the layers and the assets’ SDLC, we obtain a tailored CSF profile for an architectural model of a Smart Grid which operates in the EU, produced with a rigorous and repeatable process and using all the relevant EU standards as informative references.
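Continuing the Python sketch of Algorithm 1, the FSCH classification above can be encoded directly, with the goal derivation written as a simplified stand-in for the Tier 3 risk analysis (the rules below are our illustration, not the actual analysis):

fsch = {"form": "electronic", "state": "transmission",
        "location": "restricted", "sensitivity": "top secret"}

def derive_goals(entry):
    """Toy risk-analysis stub: map taxonomy attributes to high-priority goals."""
    goals = set()
    if entry["location"] == "restricted":
        goals.add("CONFIDENTIALITY")
    if entry["sensitivity"] == "top secret":
        goals.add("ACCOUNTABILITY")
    return goals

print(derive_goals(fsch))  # -> {'CONFIDENTIALITY', 'ACCOUNTABILITY'}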

6 Conclusions

The design and implementation of Smart Grid systems and subsystems must adhere to high standards and regulatory requirements, while satisfying the demand for a strong security posture and interoperability. This paper proposed a methodology that joins a goal-based approach with a threat-view on cybersecurity for Smart Grids. The methodology enables the devising of an adequate set of countermeasures in view of a pre-selected number of security goals.


The work relies on standardized solutions to facilitate the task of creating cybersecurity profiles compatible with existing practices. It assists the architects of Smart Grid solutions in fulfilling the requirement to account for the desired security goals at design time. Since the methodology relies on standardized solutions and combines several formal models to attain its goal, as part of our future work we will focus on its formalization and on piloting it in a real-world setting. Moreover, to provide full assistance to the architects, we will provide the tools necessary for model and quality-attribute checking. Although critical infrastructures have similar business objectives, the Smart Grid has the peculiarity that the DERs may span different technological and legal domains, and can even reside in different countries, exposing a complex attack surface. A similar domain (also listed among the NIS directive’s critical infrastructures) is that of Cooperative Intelligent Transport Systems. Similarly to the Smart Grid, those systems communicate with stations that are geographically distributed. The information shared can be the result of a mixed IT/OT/IoT computation, again resulting in a complex attack surface. In such settings, the implementation of concepts like defense-in-depth requires rigorous and repeatable approaches. For this reason, we aim at enhancing the proposed methodology to cover those complex aspects as well.

References
1. NIST Computer Security Resource Center: Definition of Security Posture (2020). https://csrc.nist.gov/glossary/term/security_posture. Accessed 10 July 2020
2. Europol: Attacks On Critical Infrastructures (2020). https://www.europol.europa.eu/iocta/2015/attacks-on-ci.html. Accessed 10 July 2020
3. The European Parliament and the Council of European Union: Directive (EU) 2016/1148 of the European Parliament and of the Council of 6 July 2016 concerning measures for a high common level of security of network and information systems across the Union (2016)
4. The White House: Presidential Policy Directive - Critical Infrastructure Security and Resilience (2013). https://obamawhitehouse.archives.gov/the-press-office/2013/02/12/presidential-policy-directive-critical-infrastructure-security-and-resil. Accessed 10 July 2020
5. NIST: Framework for Improving Critical Infrastructure Cybersecurity (2018)
6. Presidency of the Council of Ministers: The Italian Cybersecurity Action Plan (2017). https://www.sicurezzanazionale.gov.it/sisr.nsf/wp-content/uploads/2019/05/Italian-cybersecurity-action-plan-2017.pdf. Accessed 10 July 2020
7. Koraki, D., Strunz, K.: Wind and solar power integration in electricity markets and distribution networks through service-centric virtual power plants. IEEE Trans. Power Syst. 33(1), 473–485 (2018)
8. Aranha, H., Masi, M., Pavleska, T., Sellitto, G.P.: Enabling security-by-design in smart grids: an architecture-based approach. In: 15th European Dependable Computing Conference, EDCC 2019, Naples, Italy, 17–20 September 2019, pp. 177–179. IEEE (2019)
9. IEC International Electrotechnical Commission: SGCC State Grid Corporation of China (CN), VDE Association for Electrical, Electronic & Information Technologies (DE). World Smart Grid Forum. Technical report, IEC (2013)


10. Marron, J., Gopstein, A., Bartol, N., Feldman, V.: NIST Technical Note 2051: Cybersecurity Framework Smart Grid Profile (2019)
11. The OpenADR Alliance: OpenADR (2020). https://www.openadr.org. Accessed 10 July 2020
12. Cherdantseva, Y., Hilton, J., Rana, O.F., Ivins, W.: A multifaceted evaluation of the reference model of information assurance & security. Comput. Secur. 63, 45–66 (2016)
13. ENISA: Technical Guideline on Security Measures. Technical guidance on the security measures in Article 13a (2014)
14. ENISA: Guideline on Threats and Assets. Technical guidance on threats and assets in Article 13a (2015)
15. ENISA: Technical Guideline on Incident Reporting. Technical guidance on incident reporting in Article 13a (2014)
16. Zuccato, A.: Holistic security management framework applied in electronic commerce. Comput. Secur. 26(3), 256–265 (2007)
17. Gottschalk, M., Franzl, G., Frohner, M., Pasteka, R., Uslar, M.: From integration profiles to interoperability testing for smart energy systems at connectathon energy. Energies 11, 3375 (2018)
18. VHPReady: The Communication Standard for Smart Grids (2020). https://www.vhpready.com/en/home/. Accessed 10 July 2020
19. The European Commission: Mandate M/490 (2013). https://ec.europa.eu/growth/tools-databases/mandates/index.cfm?fuseaction=search.detail&id=475. Accessed 10 July 2020
20. Gottschalk, M., Uslar, M., Delfs, C.: The smart grid architecture model – SGAM. In: The Use Case and Smart Grid Architecture Model Approach. SpringerBriefs in Energy, pp. 41–61. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-49229-2_3
21. CEN-CENELEC-ETSI: CEN-CENELEC-ETSI Smart Grid Coordination Group Smart Grid Reference Architecture (2012)
22. Neureiter, C., Eibl, G., Engel, D., Schlegel, S., Uslar, M.: A concept for engineering smart grid security requirements based on SGAM models. Comput. Sci. 31(1–2), 65–71 (2016)
23. CEN-CENELEC-ETSI: CEN-CENELEC-ETSI Smart Grid Coordination Group Smart Grid Information Security (2012)
24. Elgargouri, A., Virrankoski, R., Elmusrati, M.: IEC 61850 based smart grid security. In: Proceedings of the 2015 IEEE International Conference on Industrial Technology (2015)
25. Mercuri, R.: Uncommon criteria. Commun. ACM 45, 172 (2002)
26. Knapp, E.D., Langill, J.T.: Industrial Network Security: Securing Critical Infrastructure Networks for Smart Grid, SCADA, and Other Industrial Control Systems, 2nd edn. Syngress Publishing, Rockland (2014)
27. Pavleska, T., Aranha, H., Masi, M., Grandry, E., Sellitto, G.P.: Cybersecurity evaluation of enterprise architectures: the e-SENS case. In: Gordijn, J., Guédria, W., Proper, H.A. (eds.) PoEM 2019. LNBIP, vol. 369, pp. 226–241. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35151-9_15
28. Gridwise Architectural Council: Smart Grid Interoperability Maturity Model (2011)
29. Aranha, H., Masi, M., Pavleska, T., Sellitto, G.P.: Securing mobile e-health environments by design: a holistic architectural approach. In: 2019 International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob 2019, Barcelona, Spain, 21–23 October 2019, pp. 1–6. IEEE (2019)

Workshop on Software Engineering for Resilient Systems (SERENE)

International Workshop on Software Engineering for Resilient Systems (SERENE)

Workshop Description

SERENE 2020 is the 12th International Workshop on Software Engineering for Resilient Systems, held as a satellite event of the European Dependable Computing Conference (EDCC). Resilient systems withstand, recover from, and adapt to disruptive changes with acceptable degradation in the services they provide. Resilience is particularly relevant for modern software and software-controlled systems, many of which are required to continually adapt their architecture and parameters in response to evolving requirements, customer feedback, new business needs, platform upgrades, etc. Despite frequent changes and disruptions, the software is expected to function correctly and reliably. This is particularly important for software systems that provide services critical to society, e.g., in transportation, healthcare, energy production and e-government. Since modern software should be developed to cope efficiently with changes, unforeseen failures and malicious cyber-attacks, design for resilience is an increasingly important area of software engineering. SERENE has a long tradition of bringing together leading researchers and practitioners to advance the state of the art and to identify open challenges in the software engineering of resilient systems. This year, eight papers were submitted. Each submission was reviewed by three members of the Program Committee, and four papers were accepted for presentation. The format of the workshop included a keynote, followed by technical presentations. We would like to thank the SERENE Steering Committee and the SERENE 2020 Program Committee, who made the workshop possible. We would also like to thank EDCC for hosting our workshop, the EDCC workshop chair Simona Bernardi for her help and support, and the editors of CCIS Springer who accepted the papers for publication. The logistics of our job as Program Chairs were facilitated by the EasyChair system.

An Eclipse-Based Editor for SAN Templates

Leonardo Montecchi1, Paolo Lollini2,3(B), Federico Moncini3, and Kenneth Keefe4

1 Universidade Estadual de Campinas, Campinas, SP, Brazil
[email protected]
2 Consorzio Interuniversitario Nazionale per l’Informatica (CINI), Firenze, Italy
3 University of Firenze, Firenze, Italy
[email protected], [email protected]
4 University of Illinois at Urbana-Champaign, Urbana, IL, USA
[email protected]

This work has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No. 823788, and from the São Paulo Research Foundation (FAPESP) with grant #2019/02144-6.

Abstract. Mathematical models are an effective tool for studying the properties of complex systems. Constructing such models is a challenging task that often uses repeated patterns or templates. The Template Models Description Language (TMDL) has been developed to clearly define model templates that are used to generate model instances from the template specification. This paper describes the tool support that is being developed for applying the TMDL approach with Stochastic Activity Network (SAN) models. In particular, this paper details a graphical editor for SAN templates, which assists users in creating template-level models based on SANs. From these specifications, it will be possible to generate, by model transformation, the corresponding instance-level models, which can be studied by simulation or analytical tools.

Keywords: Stochastic Activity Networks · Templates · Sirius · Metamodel · Graphical editor

1 Introduction

Model-based evaluation [1] has been extensively used to estimate performance and reliability metrics of computer systems. Constructing and maintaining models for large-scale, evolving systems is a challenging task. In our recent work [2], we defined an approach for reusing the specification of performability models, in particular, Stochastic Petri Net (SPN) models [3]. The approach is based on the concept of model templates, which use well-defined interfaces to interact with connected segments of the model. The interfaces and composition rules are specified using our novel domain-specific language, the Template Models Description Language (TMDL). A model template is essentially a parameterized, abstracted version of a model in a specific formalism. From a template, concrete instances can be automatically derived by specifying values for its parameters.

In [2] we defined the overall idea of the framework, formalized its definition, and introduced the TMDL language. The framework was designed to be independent of a specific formalism, and it assumes the existence of 1) a template-level formalism, 2) an instance-level formalism, and 3) a concretize function to generate an instance-level model from a template-level model. Then, in [4] we completed the formalization by defining Stochastic Activity Network Templates (SAN-T), a template-level formalism based on Stochastic Activity Networks (SANs) [5]. In the same document we also defined the associated concretize function, thus enabling the application of the TMDL approach using SANs as the instance-level formalism. However, to be of practical use, appropriate tool support must be developed.

In this paper, we introduce an Eclipse-based graphical editor for SAN-T models, the template-level models in the TMDL framework. Using SAN-T models, instance-level SAN models can be generated efficiently and accurately. The generated SAN models can be studied by modeling and simulation tools, such as Möbius [6].

The rest of the paper is organized as follows. In Sect. 2 we recall in more detail the TMDL framework and the SAN-T formalism. In Sect. 3 we discuss the architecture of the overall tool, and detail the two main components of the editor. In Sect. 4 we show a simple example of application, and in Sect. 5 we conclude the paper with an overview of the planned future work.

(This work has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 823788, and from the São Paulo Research Foundation (FAPESP) with grant #2019/02144-6.)

2 Background

Before focusing on the new tool architecture, an overview of the TMDL framework and the SAN-T formalism is provided.

2.1 The TMDL Framework

Figure 1 provides an overview of the three major steps of the TMDL framework. In Step #1, a library of reusable model templates is created by an expert. In Step #2, the different system configurations that should be analyzed are defined in terms of "scenarios." Scenarios are composed of model variants, that is, a selection of model templates with their parameter value assignments. In Step #3, the model instances are automatically created and assembled, thus generating the complete system model for each scenario.

Fig. 1. Workflow of the TMDL framework for the automated composition of template performability models. Figure adapted from [2].

What makes the model templates reusable is that they have well-defined interfaces and parameters. Interfaces specify how they can be connected to other templates, while parameters make it possible to derive different concrete models from the same template. A model template has a specification and an implementation. The specification of a template is provided with TMDL. The implementation of an atomic template is given using a template-level formalism, that is, a modeling formalism that defines partially specified models in the concrete formalism of choice. By "partially specified" we mean that some aspects of the structure and behavior of the model are controlled by parameters, e.g., the number of cases of an activity. Conversely, the instance-level formalism is the modeling formalism actually used for the analysis (e.g., SANs). The models generated in Step #3 conform to the instance-level formalism. In [4] we provided the definition of a template-level formalism based on SANs, which we call Stochastic Activity Network Templates (SAN-T).

2.2 Stochastic Activity Network Templates

Stochastic Activity Networks (SANs) are formal models that represent the stochastic behavior of a complex system [5]. A SAN is defined as a tuple:

SAN = (P, A, I, O, γ, τ, ι, o, μ₀, C, F, G),

where P is a finite set of places; A is a finite set of activities; I is a finite set of input gates; and O is a finite set of output gates. The function γ : A → ℕ⁺ specifies the number of cases for each activity, that is, the number of possible choices upon execution of that activity. τ specifies the type of each activity; ι maps input gates to activities, and o maps output gates to cases of activities. Places can hold tokens; the number of tokens in each place gives the state of the network, called its marking. The behavior of a SAN is determined by input gates and output gates. An input gate contains a predicate on the marking of the connected places (input predicate), and an input function that alters the marking. An input gate holds in a certain marking if its input predicate holds. An output gate contains only the output function, which alters the marking of the connected places. Input arcs and output arcs are special cases of input and output gates that add/remove one token to/from the connected place. When an activity is enabled it can fire (instantaneous activities have priority); when an activity fires, one of its cases is selected. The new marking is obtained by computing the functions of all the input and output gates connected to the activity. The stochastic behavior is given by three functions that are associated to each activity a: function Cₐ ∈ C specifies the probability distribution of its cases; Fₐ ∈ F specifies the probability distribution of the firing delay; and Gₐ ∈ G describes its reactivation markings [5].

In [4] we introduced the new SAN-T formalism, as a template-level formalism [2] based on SANs. The idea is to leave some elements of the SAN model unspecified, and to make them depend on parameter values. This is different from what is done, for example, in the Möbius tool [6], where global variables can be used to set initial marking values and distribution parameters. In SAN-T models, parameters can also affect some aspects of the model structure, like the number of cases of an activity or the number of places in the model. Formally, a Stochastic Activity Network Template (SAN-T) is also a tuple:

SAN-T = (Δ, P̃, Ã, Ĩ, Õ, γ̃, τ̃, ι̃, õ, μ̃₀, C̃, F̃, G̃),    (1)

where Δ is a set of parameters, and elements marked with a tilde accent are modified versions of the SAN elements, reformulated to take parameters into account. The main differences are summarized in the following. The set Δ is the set of parameters of the template, which may have a type. We denote with TERM_t the set of all the possible terms of type t, that is, all the possible combinations of parameters and operators that are of type t. For example, TERM_Int is the set of all terms of integer type. P̃ is a finite set of place templates. A place template is a pair (τ, k), where τ is the name of the place, and k ∈ TERM_Set{Int} is its multiplicity. When values are assigned to parameters and the instance-level model is derived, the place template is expanded to a set of SAN places. The concept of marking has also been extended. The idea is to anticipate that the place template will be mapped to a set of places, and thus allow the marking for each of them to be specified using an index. Given a set of place templates S̃ ⊆ P̃, a marking template of S̃ is a mapping μ̃ : S̃ × ℕ → ℕ. For example, μ̃(p̃, 2) = 10 means that the place generated from p̃ having index 2 contains 10 tokens. All the other elements of a SAN have been adapted to depend on parameters. In particular, cases of activity templates also depend on parameters; the function γ̃ : Ã → TERM_Int specifies the number of cases for each activity template. An output gate template is connected directly to an activity template, as opposed to normal output gates, which are connected to activity cases. When a regular SAN is generated from the template, the output gate template will be expanded to multiple concrete output gates.
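To make the notation concrete, here is a small worked instance of a place template and a marking template (our own illustration, not taken from [4]):

```latex
% A place template named Req with parametric multiplicity S:
% for S = {1,2,3}, the concretize step expands it into three SAN places.
\[
\tilde{p} = (\mathit{Req},\ S), \quad S = \{1,2,3\}
\;\Longrightarrow\;
\mathit{Req}_1,\ \mathit{Req}_2,\ \mathit{Req}_3 .
\]
% A marking template then assigns tokens by index, e.g.
\[
\tilde{\mu}(\tilde{p}, 2) = 10
\quad\text{i.e., place } \mathit{Req}_2 \text{ initially holds 10 tokens.}
\]
```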

3 Tool Architecture

3.1 Overview

An overview of the tool that will implement the TMDL framework is presented in Fig. 2, showing its main components and their dependencies. Colors indicate the role of each component with respect to the workflow of Fig. 1. The whole tool is


Fig. 2. Main components of the TMDL Framework for SANs, and their dependencies. An arrow from A to B means that component A uses component B. Dependencies between Eclipse components are not shown. (Color figure online)

based on the ecosystem provided by the Eclipse Modeling Framework (EMF, https://www.eclipse.org/modeling/emf/), and it is composed of a formalism-independent layer and a formalism-specific layer. The project is available as a public GitHub repository [7].

The formalism-independent layer implements the part of the framework that is not tied to a specific instance-level formalism. This is basically the TMDL metamodel and the associated composition algorithm introduced in [2]. The formalism-specific layer includes the components that support a specific instance-level formalism, in this case SANs. The core of this layer are the metamodels of the instance-level and template-level formalisms. As shown in Fig. 2, they depend on the TMDL metamodel; this dependence is limited to the root elements (i.e., the SAN and SANT metaclasses), which need to extend specific classes to connect to the framework.

The focus of this paper is the components in the dashed region (in yellow), which includes the two metamodels and the graphical editor for SAN-T models. The other components are based on these two metamodels: the concretize function that transforms SAN-T models into SAN models will be realized as a model-to-model transformation; instead, a model-to-text transformation will be developed to generate the concrete input for the Möbius analysis tool, which uses an XML-based format.

3.2 SAN and SAN-T Metamodels

To the best of our knowledge there was no EMF-based metamodel that supports all the concepts of the SAN formalism, and we thus developed one as part of this project. We based it on the formal definition of SANs [5], but also on their practical implementation provided by Möbius [6]. In fact, Möbius includes some variations with respect to the original definition; for example, it supports extended places,


Fig. 3. SAN-T metamodel—Core.

Fig. 4. SAN-T metamodel—Places.

which may hold values of other datatypes, instead of natural numbers only. The complete SAN metamodel contains 57 metaclasses and 4 packages: Core, Types, Expressions, and Distributions. The SAN-T metamodel is organized in 4 packages: Core, Places, Cases, and Gates, and it also reuses some elements from the SAN metamodel. Figure 3 depicts the Core package, which defines the main elements of a SAN-T model. Figure 4 depicts the Places package, which defines the elements used in place templates to specify their multiplicity and initial marking. A PlaceTemplate contains a MarkingTemplate and a Multiplicity. The metamodel contains classes to specify the multiplicity in different ways: as a constant value (MultiplicityValue), as a parameter (MultiplicityParametric), as an array (MultiplicityArray), and as a range of values (MultiplicityRangeOperator). The complete SAN-T metamodel contains 22 metaclasses, in addition to those reused from the SAN metamodel. A simplified sketch of these concepts is given below.
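The following is a simplified, plain-Java sketch of the Places-package concepts just described (our own illustration; the actual metamodel is EMF-generated code with richer types and containment relations):

```java
// Simplified sketch of the Places package, assuming plain Java classes
// instead of the actual generated EMF code.
interface Multiplicity {}

// Multiplicity given as a constant value, e.g. exactly 3 places.
class MultiplicityValue implements Multiplicity {
    int value;
}

// Multiplicity given by a template parameter, e.g. driven by |S|.
class MultiplicityParametric implements Multiplicity {
    String parameterName;
}

class MarkingTemplate {
    // index of the generated place -> number of tokens in that place
    java.util.Map<Integer, Integer> tokensByIndex = new java.util.HashMap<>();
}

class PlaceTemplate {
    String name;               // e.g. "Req"
    Multiplicity multiplicity; // how many concrete places to generate
    MarkingTemplate initialMarking;
}
```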

3.3 Graphical Editor

Based on the metamodels introduced above, we used the Sirius tool to create a graphical editor for SAN-T models. Sirius is an Eclipse project that makes it possible to easily create customized graphical editors. It adopts a declarative approach, in which the developer specifies which elements of a certain metamodel should be represented and how, and the framework takes care of the actual realization

An Eclipse-Based Editor for SAN Templates

165

of the feature and of its integration within Eclipse. Thanks to these facilities, our editor supports most of the features expected from a graphical editor: a tool palette, a property page, synchronization between the model and its graphical representation, copy and paste, rearrangement of connections, etc. (Fig. 5).

As a positive side effect of the fact that the SAN-T metamodel reuses elements of the SAN metamodel, our editor can also be used to specify regular (i.e., instance-level) SAN models. However, these models will be stored as XMI files, and thus they cannot be directly used as input to Möbius at this time.

Fig. 5. Editing properties of a place using our editor.

4 Application Example

In this section we provide a simple example of the application of our editor. We use the editor to specify the same example that was introduced in [4]; more specifically, in Fig. 3(c) of that document. The context is that of a mobile network, in which different services and different classes of users are available. Different users have similar behavior, but may request different network services. The variability is both in the number of services they have access to, and in which ones. Each service is identified by a number. Figure 6 shows the User SAN-T model, specified with our editor. It includes three normal places (Idle, Failed, and Dropped), and a place template (Req). It also has two normal instantaneous activities (Fail and Drop), and one timed activity template (Request). The part of the model that remains unspecified (i.e., the "template" part) is the one determining which services are requested by the user, and with which probability. This aspect of the model is determined by parameter S and by parameter P. It should be noted that the connections involving template elements are highlighted with a different color in the editor (green), to facilitate their identification.


Fig. 6. Example of a SAN-T model, specified with our editor. Request, OGRequest, and Req are template-level entities, and thus highlighted in green. (Color figure online)

The semantics of the model is the following. The user is initially in the idle state (place Idle contains one token). After a certain amount of time, given by the distribution associated with activity Request, the user requests a network service. The identifiers of the services he or she may request are given by the value(s) of parameter S, which determines how many places named ReqX will appear in the generated instance-level SAN model (MultiplicityParametric element). The number of cases of the activity Request and their probabilities are given by the P parameter (CaseSpecificationProbabilityArray). The output gate template OGRequest determines that if case k is selected, then a token should be added to place Req_k. When a token is added to place Failed or Dropped, the corresponding activity fires, and the user goes back to the idle state.
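To illustrate what the concretize transformation will produce for this template, the small sketch below enumerates the expansion for the assumed parameter values S = {1, 2, 3} and P = {0.5, 0.3, 0.2} (our own illustration; the actual transformation is not yet implemented):

```java
import java.util.List;

public class ConcretizeExample {
    public static void main(String[] args) {
        List<Integer> s = List.of(1, 2, 3);   // parameter S: requestable service ids
        double[] p = {0.5, 0.3, 0.2};         // parameter P: case probabilities

        // Expansion of the place template "Req": one SAN place per service id.
        for (int id : s) {
            System.out.println("place Req" + id);   // Req1, Req2, Req3
        }
        // The activity template "Request" gets |S| cases; by OGRequest,
        // selecting case k adds a token to place Req_k.
        for (int k = 0; k < s.size(); k++) {
            System.out.printf("case %d -> add token to Req%d (probability %.1f)%n",
                              k + 1, s.get(k), p[k]);
        }
    }
}
```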

5 Concluding Remarks

In this paper we have introduced a graphical editor for SAN-T models, developed on top of the modeling facilities provided by Eclipse. The editor is part of a bigger project that aims to provide concrete tool support for applying the TMDL approach with SANs. As future work, we are working on completing the tool support for an end-to-end application of the TMDL approach, implementing the components highlighted in green in Fig. 2. It should be noted that most of them will rely on the SAN and SAN-T metamodels that we developed for realizing this editor. In particular, as the next immediate steps we plan to implement the model-to-model transformation from SAN-Ts to SANs, and the model-to-text transformation to generate the input for the Möbius framework. These two components will finally enable the evaluation of performability metrics based on SAN-T models specified with our editor.


References
1. Nicol, D.M., Sanders, W.H., Trivedi, K.S.: Model-based evaluation: from dependability to security. IEEE Trans. Dependable Secure Comput. 1(1), 48–65 (2004)
2. Montecchi, L., Lollini, P., Bondavalli, A.: A template-based methodology for the specification and automated composition of performability models. IEEE Trans. Reliab. 69, 293–309 (2020)
3. Ciardo, G., German, R., Lindemann, C.: A characterization of the stochastic process underlying a stochastic Petri net. IEEE Trans. Softw. Eng. 20(7), 506–515 (1994)
4. Montecchi, L., Lollini, P., Bondavalli, A.: A formal definition of stochastic activity networks templates. arXiv:2006.09291 (June 2020). https://arxiv.org/abs/2006.09291
5. Sanders, W.H., Meyer, J.F.: Stochastic activity networks: formal definitions and concepts. In: Brinksma, E., Hermanns, H., Katoen, J.-P. (eds.) EEF School 2000. LNCS, vol. 2090, pp. 315–343. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44667-2_9
6. Clark, G., et al.: The Möbius modeling tool. In: Proceedings of the 9th International Workshop on Petri Nets and Performance Models, September 2001, pp. 241–250 (2001)
7. TMDL Framework. https://github.com/montex/TMDL-Framework. Accessed 7 Aug 2020

Interplaying Cassandra NoSQL Consistency and Performance: A Benchmarking Approach

Anatoliy Gorbenko, Alexander Romanovsky, and Olga Tarasyuk

Leeds Beckett University, Leeds, UK ([email protected]); National Aerospace University, Kharkiv, Ukraine; Newcastle University, Newcastle-upon-Tyne, UK; Odessa Technological University STEP, Odessa, Ukraine

Abstract. This experience report analyses performance of the Cassandra NoSQL database and studies the fundamental trade-off between data consistency and delays in distributed data storages. The primary focus is on investigating the interplay between the Cassandra performance (response time) and its consistency settings. The paper reports the results of the read and write performance benchmarking for a replicated Cassandra cluster, deployed in the Amazon EC2 Cloud. We present quantitative results showing how different consistency settings affect the Cassandra performance under different workloads. One of our main findings is that it is possible to minimize Cassandra delays and still guarantee the strong data consistency by optimal coordination of consistency settings for both read and write requests. Our experiments show that (i) strong consistency costs up to 25% of performance and (ii) the best setting for strong consistency depends on the ratio of read and write operations. Finally, we generalize our experience by proposing a benchmarking-based methodology for run-time optimization of consistency settings to achieve the maximum Cassandra performance and still guarantee the strong data consistency under mixed workloads.

Keywords: NoSQL · Cassandra database · CAP theorem · Trade-off · Consistency · Performance · Latency · Benchmarking

1 Introduction

NoSQL (Non SQL or Not Only SQL) databases have become the standard data platform and a major industrial technology for dealing with enormous data growth. They are now widely used in different market niches, including social networks and other large-scale Internet applications, critical infrastructures, business-critical systems, IoT and industrial applications. NoSQL databases designed to provide horizontal scalability are often offered as a service by Cloud providers. The concept of NoSQL databases has been proposed to effectively store and provide fast access to the Big Data sets whose volume, velocity and variability are difficult to deal with by using the traditional Relational Database Management Systems. Most NoSQL stores sacrifice the ACID


(atomicity, consistency, isolation and durability) guarantees in favour of the BASE (basically available, soft state, eventually consistent) properties [1], which is the price to pay for distributed data handling and horizontal scalability. The paper discusses the trade-offs between consistency, availability and latency, which are in the very nature of NoSQL databases. Although these relations have been identified by the CAP theorem in qualitative terms [2, 3], it is still necessary to quantify how different consistency settings affect system latency. Understanding this trade-off is key for the effective usage of NoSQL solutions. While there are many NoSQL databases on the market, various industry trends suggest that Apache Cassandra is one of the top three in use today, together with MongoDB and HBase [4]. There have been a number of studies, e.g. [5–9], evaluating and comparing the performance of different NoSQL databases. Most of them use general competitive benchmarks of usual-and-customary application workloads (e.g. the Yahoo! Cloud Serving Benchmark, YCSB). Reported results show that, depending on the use case scenario, deployment conditions, current workload and database settings, any NoSQL database can outperform the others. Other recent related works, such as [10–12], have investigated measurement-based performance prediction of NoSQL data stores. However, the studies mentioned above do not investigate the interdependency between consistency and performance that is in the very nature of such distributed database systems, and do not study how consistency settings affect database latency.

In this paper we put a special focus on quantitative evaluation of the fundamental Big Data trade-off between data consistency and performance, using the Cassandra database as a typical example of distributed data storages. Apache Cassandra offers a set of unique features (e.g. tuneable consistency, extremely fast writes, ability to work across geographically distributed data centres, etc.) and provides high availability with no single point of failure, which makes it one of the most flexible and popular NoSQL solutions. Moreover, we would like to equip the developers of distributed systems that use Cassandra as the distributed data storage with practical guidance that allows them to predict the Cassandra latency for a required consistency level and to coordinate the consistency settings of read and write requests in an optimal manner. In the paper we propose a benchmarking approach to optimising the Cassandra performance while guaranteeing the strong data consistency under mixed workloads. Because the real workload mix can evolve and change over time, impacting the desirable Cassandra settings, we propose to monitor the current workload mix and choose the optimal consistency setting at run time to get the highest throughput, by making use of benchmarking results collected during system load testing.

The rest of the paper is organized as follows. In the next section, we briefly discuss the fundamental CAP trade-off for distributed systems and replicated data storages, and analyse the Cassandra tuneable consistency feature. In Sects. 3 and 4, we describe our methodology and the experimental setup, and present the results of the Cassandra performance benchmarking for different consistency levels. Sections 5 and 6 discuss the optimal consistency settings and propose a methodology for a run-time optimization of consistency settings for read and write requests under different workloads.
Finally, some practical lessons learnt from our work are summarized in Sect. 7.


2 Trade-Offs Between Consistency, Availability and Latency

2.1 CAP Theorem

The CAP conjecture [2], which first appeared in 1998–1999, defines a trade-off between system availability, consistency and partition tolerance, stating that only two of the three properties can be preserved in distributed replicated systems at the same time. Gilbert and Lynch [3] view the CAP theorem as a particular case of a more general trade-off between consistency, availability and latency in unreliable distributed systems which, nevertheless, assumes that updates are eventually propagated. System partitioning, availability, consistency and latency (response time) are tightly connected [13]. Moreover, we believe that these properties need to be viewed as more continuous than binary. A replicated fault-tolerant system becomes partitioned when one of its parts does not respond due to arbitrary message loss, delay or replica failure, resulting in a timeout. System availability can be interpreted as the probability that each client request eventually receives a response. Failure to receive responses from some of the replicas within the specified timeout causes partitioning of the replicated system. Thus, partitioning can be considered as a bound on the replica latency/response time [14]. A slow network connection, a slow-responding replica or the wrong timeout settings [15] can lead to an erroneous decision that the system has become partitioned. When the system detects a partition, it has to decide whether to return a possibly inconsistent response to a client or to send an exception message in reply, which undermines system availability. Consistency is also a continuum, ranging from weak consistency at one extreme to strong consistency at the other, with varying points of eventual consistency in between.

The designers of distributed fault-tolerant systems cannot prevent partitions, which happen due to network failures, message losses, hacker attacks or component crashes, and hence have to choose between availability and consistency. One of these two properties has to be sacrificed. The architects of modern distributed database management systems and large-scale web applications such as Facebook, Twitter, etc. often decide to relax consistency requirements by introducing asynchronous data updates in order to achieve higher system availability and allow a quick response. Yet the most promising approach is to balance these properties [16, 17].

2.2 Cassandra's Tuneable Consistency

The Cassandra NoSQL database extends the concepts of strong [18] and eventual [19] consistency by offering tuneable [20] consistency. Consistency in Cassandra can be configured to trade-off availability and latency versus data consistency. The consistency level among replicated nodes can be controlled on a per-operation basis. Thus, for any given read or write operation, a client can specify how consistent the requested data must be. The read consistency level specifies how many replica nodes must respond to a read request before returning data to the client application. In turn, the write consistency level determines the number of replicas on which the write must succeed before returning an acknowledgment to the client. All Cassandra read and write requests support the following basic consistency settings:


– ONE: data must be written to the commit log and memtable of at least one replica node before acknowledging the write operation to a client; when reading data, Cassandra queries and returns a response from a single replica (the nearest replica with the least network latency);
– TWO: data must be written to at least two replica nodes before being acknowledged; read operations will return the most recent record from two of the closest replicas (the most recent data is determined by comparing the timestamps of the records returned by those two replicas);
– THREE: similar to TWO, but for three replicas;
– QUORUM: a quorum of nodes needs to acknowledge the write or to return a response for a read request; a quorum is calculated by rounding down to a whole number the following estimate: replication_factor/2 + 1;
– ALL: data must be written to all replica nodes in a cluster before being acknowledged; read requests return the most recent record after all replicas have responded. The read operation will fail even if a single replica does not respond.

If Cassandra runs across multiple data centres, a few additional consistency levels become available: EACH_QUORUM, LOCAL_QUORUM, LOCAL_ONE.

The sum of nodes written and read being greater than the replication factor always ensures strong data consistency, where a read never misses a preceding write. Thus, if data consistency is of top priority, one can ensure that a read always reflects the most recent update by using the following:

(nodes_written + nodes_read) > replication_factor,    (1)

otherwise, eventual consistency occurs. For example, strong consistency is ensured if, either:
– the QUORUM consistency level is set for both write and read requests;
– the ONE consistency level is set for writes and ALL for reads;
– the ALL consistency level is set for writes and ONE for reads.

The weaker the consistency level, the faster Cassandra should perform read and write requests. By balancing between nodes_written and nodes_read in (1), Cassandra users can give priority to read or write performance while still guaranteeing strong data consistency.
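For illustration, the snippet below shows how a per-request consistency level is set with the DataStax Java driver (a sketch assuming driver 4.x; the keyspace, table and session configuration are hypothetical):

```java
import com.datastax.oss.driver.api.core.ConsistencyLevel;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class TuneableConsistencyExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Write with ONE: acknowledged by a single replica (fast, weak).
            SimpleStatement write = SimpleStatement
                .newInstance("UPDATE demo.users SET name = 'a' WHERE id = 1")
                .setConsistencyLevel(ConsistencyLevel.ONE);
            session.execute(write);

            // Read with ALL: every replica must respond. Together with the ONE
            // write, this satisfies condition (1) for replication factor 3
            // (1 written + 3 read > 3), so the read sees the preceding write.
            SimpleStatement read = SimpleStatement
                .newInstance("SELECT name FROM demo.users WHERE id = 1")
                .setConsistencyLevel(ConsistencyLevel.ALL);
            session.execute(read);
        }
    }
}
```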

3 Cassandra Performance Benchmarking

3.1 Methodology and Experimental Setup

In this section we describe our performance benchmarking methodology and report the experimental results showing how consistency settings affect the latency of read and write requests for the Cassandra NoSQL database.

Cassandra Deployment Setup. As a testbed we have deployed a 3-replicated Cassandra 2.1 cluster in the Amazon EC2 cloud (Fig. 1). A replication factor equal to 3


is the most typical setup for many modern distributed computing systems and Internet services, including Amazon S3, Amazon EMR, Facebook Haystack, DynamoDB, etc. The cluster was deployed in the AWS US-West-2 (Oregon) region on c3.xlarge instances (vCPUs – 4, RAM – 7.5 GB, SSD – 2 × 40 GB, OS – Ubuntu Server 16.04 LTS).

Benchmark. Our work uses the YCSB (Yahoo! Cloud Serving Benchmark) framework, which is considered to be a de-facto standard benchmark to evaluate the performance of various NoSQL databases like Cassandra, MongoDB, Redis, HBase and others [5]. YCSB is an open-source Java project. The YCSB framework includes six out-of-the-box workloads [5], each testing different common use case scenarios with a certain mix of reads and writes (50/50, 95/5, read-only, read-latest, read-modify-write, etc.). In our experiments we used the read-only Workload C, and Workload A, parametrized to execute write-only operations. All the other Cassandra and YCSB parameters (e.g. request distribution, testbed database, etc.) were set to their default values. The testbed YCSB database is a table of records. Each record is identified by a primary key and includes F string fields. The values written to these fields are random ASCII strings of length L. By default, F is equal to 10 and L is equal to 100, which constructs records of 1000 bytes. The final size of the testbed database reached 70 GB by the end of our experiments. The YCSB Client is a Java program that generates the data to be loaded into the database, and runs the workloads. It was deployed on a separate VM in the same Amazon region to reduce the influence of unstable Internet delays.

Benchmarking Scenario. Some examples of general methodologies for benchmarking Cassandra and other NoSQL databases with YCSB can be found in [4, 21]. However, unlike these and other works (e.g. [5–9]) studying and comparing the maximum database throughput, we put the focus on analysing the dynamic aspects of the Cassandra performance under different consistency settings. In particular, we analyse how the database latency and throughput depend on the current workload (i.e. the number of concurrent requests/threads). To achieve this we run a series of YCSB read and write performance tests on Apache Cassandra with the number of threads varying from 10 to 1000. The operation count within each thread is set to 1000. The same scenario is run for read and write workloads on the 3-replicated Cassandra cluster with the three different consistency settings: ONE, QUORUM and ALL.

Fig. 1. Experimental setup: Cassandra cluster (three c3.xlarge replicas in the AWS US-West-2 (Oregon) region, with the YCSB client on a separate VM).

3.2 Raw Data Cleansing

YCSB supports different measurement types, including 'histogram', 'timeseries' and 'raw'. In our experiments we set it to 'raw', in which all measurements are output as raw data points in the following CSV format: operation (READ|WRITE), timestamp of the measurement (ms), latency (us). This allows us to plot the response delay graphs; see the examples in Figs. 2 and 3 (each graph superimposes three curves corresponding to the different consistency settings: ONE, ALL, QUORUM).

The 'raw' measurement type used in our experiments requires further manual analysis of the benchmarked data. However, it also provides great flexibility for posterior analysis and allows us to get important insights rarely discussed by other researchers. In particular, we noticed that the cold start phenomenon can have a significant effect on the results of the Cassandra performance analysis. This phenomenon exhibits itself through an initial period of low performance observed at the beginning of each read and write test (see Figs. 2 and 3). In general, its duration depends on the database size, the available RAM, the intensity of read and write requests and their distribution, and other factors. In all our experiments this period lasted approximately 800–1000 ms. This phenomenon is explained by the fact that Cassandra uses three layers of data store: memtable (stored in RAM and periodically flushed to disk), commit log and SSTable (both are stored on disk). If the requested row is not in a memtable, a read needs to look up all the SSTable files on disk to load the data into a memtable. In addition, Cassandra also supports integrated caching and distributes cache data around the cluster. Thus, during the cold start period Cassandra reads data from SSTables into memtables and warms up the cache. This period was taken out of consideration in our further statistical analysis; otherwise, the average performance estimates would be significantly biased. For instance, in our experiments the delays measured during the cold start were on average 5–8 times longer than the ones measured during the rest of the time.
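A minimal sketch of this cleansing step — dropping all measurements that fall into the cold-start window — assuming the raw CSV layout described above and a hypothetical file name (any header lines are presumed already stripped):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class ColdStartFilter {
    public static void main(String[] args) throws IOException {
        // Each row: operation (READ|WRITE), timestamp (ms), latency (us).
        List<String[]> rows = Files.readAllLines(Path.of("raw-measurements.csv"))
            .stream()
            .map(line -> line.split(","))
            .filter(r -> r.length == 3)
            .collect(Collectors.toList());

        long start = rows.stream()
            .mapToLong(r -> Long.parseLong(r[1].trim()))
            .min().orElseThrow();

        // Discard the initial low-performance window (~1000 ms in our runs).
        List<String[]> cleansed = rows.stream()
            .filter(r -> Long.parseLong(r[1].trim()) - start >= 1000)
            .collect(Collectors.toList());

        double avgLatencyUs = cleansed.stream()
            .mapToLong(r -> Long.parseLong(r[2].trim()))
            .average().orElse(Double.NaN);
        System.out.printf("kept %d of %d samples, avg latency %.0f us%n",
                          cleansed.size(), rows.size(), avgLatencyUs);
    }
}
```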

Fig. 2. A fragment of the READ delay graph, 500 threads (delay in us vs. response number; curves for ONE, QUORUM and ALL).


Fig. 3. A fragment of the WRITE delay graph, 500 threads (delay in us vs. response number; curves for ONE, QUORUM and ALL).

4 Data Analysis. Interdependency Between Performance and Consistency

4.1 Read/Write Latency and Throughput Statistics

Results of the Cassandra performance benchmarking are summarised in Tables 1, 2, and 3 and depicted in Figs. 4 and 5. Figure 4 shows that the average delay for both read and write requests increases almost linearly as the number of threads increases. The average values have been computed over a thousand requests sent within each thread. A coefficient of variation (CV) is the ratio between the delay standard deviation and its average value. It is used as the measure of uncertainty. This uncertainty is caused by noise (coming from the underlying platforms and technologies) and natural uncertainty and variability which is intrinsic to a cloud environment and the Internet [22–24]. The CV value varies between 34 and 50% on average (see Tables 1 and 2), depending on the workload (the higher the workload, the higher the latency variation) and on the consistency settings (the stronger the consistency level, the lower the latency variation).

Table 1. Cassandra READ latency statistics.

Threads   ONE, us   QUORUM, us (vs. ONE)   ALL, us (vs. ONE)   CV ONE   CV QUORUM   CV ALL
10        8120      8150 (+1%)             8311 (+2%)          17%      17%         15%
50        12207     13077 (+7%)            14732 (+21%)        29%      27%         21%
100       15768     18139 (+15%)           21428 (+36%)        32%      30%         24%
200       21853     26350 (+21%)           29218 (+34%)        56%      43%         38%
300       31038     35996 (+16%)           41326 (+33%)        63%      48%         45%
400       38928     48054 (+23%)           52921 (+36%)        58%      49%         44%
500       49931     59799 (+20%)           65569 (+31%)        54%      46%         40%
600       56433     72083 (+28%)           77215 (+37%)        50%      42%         39%
700       69527     79919 (+15%)           84427 (+21%)        49%      39%         37%
800       74766     87445 (+17%)           93092 (+25%)        47%      39%         35%
900       89479     98086 (+10%)           107238 (+20%)       48%      39%         36%
1000      96854     106762 (+10%)          117367 (+21%)       45%      41%         38%
Average             (+15%)                 (+26%)              46%      38%         34%


Table 2. Cassandra WRITE latency statistics.

Threads   ONE, us   QUORUM, us (vs. ONE)   ALL, us (vs. ONE)   CV ONE   CV QUORUM   CV ALL
10        8941      8988 (+1%)             9116 (+2%)          20%      18%         16%
50        11198     11317 (+1%)            12591 (+12%)        28%      28%         26%
100       14550     14747 (+1%)            16235 (+12%)        37%      32%         28%
200       19825     20464 (+3%)            22064 (+11%)        51%      42%         32%
300       24119     26078 (+8%)            27458 (+14%)        68%      58%         40%
400       29944     33338 (+11%)           35319 (+18%)        61%      55%         49%
500       36831     38784 (+5%)            41364 (+12%)        60%      53%         46%
600       41412     44240 (+7%)            46963 (+13%)        58%      47%         45%
700       48256     53276 (+10%)           55192 (+14%)        56%      51%         46%
800       55629     61712 (+11%)           64856 (+17%)        54%      51%         47%
900       61410     67329 (+10%)           69615 (+13%)        53%      49%         44%
1000      65254     72726 (+11%)           78333 (+20%)        51%      49%         46%
Average             (+7%)                  (+13%)              50%      44%         39%

Table 3. Cassandra READ/WRITE throughput statistics (ops/s; percentages show how much higher the ONE throughput is).

          Average READ throughput                     Average WRITE throughput
Threads   ONE     QUORUM (% of ONE)   ALL (% of ONE)   ONE     QUORUM (% of ONE)   ALL (% of ONE)
10        1136    1023 (+11%)         959 (+19%)       1066    1053 (+1%)          921 (+4%)
50        3525    3295 (+7%)          3181 (+11%)      4043    3861 (+5%)          3720 (+9%)
100       6323    5489 (+15%)         5180 (+22%)      6384    6228 (+3%)          5937 (+8%)
200       7959    7022 (+13%)         6271 (+27%)      8803    8361 (+5%)          8074 (+9%)
300       9025    7815 (+15%)         7011 (+29%)      10679   10334 (+3%)         9871 (+8%)
400       9561    7998 (+20%)         7313 (+31%)      12041   11294 (+7%)         10380 (+16%)
500       9723    8173 (+19%)         7438 (+31%)      12686   12200 (+4%)         11530 (+10%)
600       10221   8496 (+20%)         7586 (+35%)      13444   12548 (+7%)         12054 (+12%)
700       9899    8567 (+16%)         7956 (+24%)      13533   12714 (+6%)         12287 (+10%)
800       10487   9041 (+16%)         8322 (+26%)      13369   12572 (+6%)         11913 (+12%)
900       10248   8906 (+15%)         8281 (+24%)      13552   13051 (+4%)         12568 (+8%)
1000      10508   9072 (+16%)         8398 (+25%)      13428   12985 (+3%)         12141 (+11%)
Average           (+15%)              (+25%)                   (+5%)               (+10%)

176

A. Gorbenko et al.

results confirm the claim that Cassandra has very high write speed especially under the heavy workload. Indeed, write operations are almost 25% faster on average than read requests independently of consistency settings. However, reads are slightly overperforming writes when a number of concurrent requests is below 10. Table 3 shows that Cassandra writes executed under the ONE consistency level reach the maximum throughput of 13552 requests per second. For the QUORUM and ALL consistency settings it fluctuates around 13000 and 12500 requests per second. The maximum throughput of read operations is lower by 21%, 33% and 38% correspondingly. A combination of average delay and average throughput allows us to analyse how the average read and write delays depend on the current workload. When the workload reaches the maximum Cassandra throughput, delays increase in exponential progression (see Fig. 5). Figures 4 and 5 clearly show performance benefits offered by weaker consistency settings in case of the heavy workload. It is also shown that the system is saturated with around 800 threads and delays become highly volatile when Cassandra operates close to its maximal throughput (see Fig. 5). 4.2

Theoretical Regressions of the Cassandra Latency

Table 4. Goodness-of-Fit for read/write regressions. Polynomial regression Order = 2

Order = 3

Order = 4

Linear regression

Benchmarking results reported in Read ALL 0.9984 0.9979 0.9982 0.9982 Tables 1 and 2 are a discrete set of statistics QUORUM 0.9981 0.9988 0.9990 0.9976 0.9980 0.9952 ONE 0.9978 0.9979 measured values. They do not allow 0.9988 0.9970 Write ALL 0.9985 0.9985 system developers to precisely esti- statistics QUORUM 0.9981 0.9983 0.9991 0.9972 mate the database latency throughout 0.9994 0.9978 ONE 0.9982 0.9986 a range of possible workloads. A regression function estimated from the experimental data using least squares (see Fig. 4) will effectively solve this problem. Table 4 reports the R-squared values (often referred to as the goodness-of-fit) estimating extrapolation accuracy of different regression functions. It shows that the polynomial regression of the forth order (2–7) fits our experimental statistics with the high accuracy. 4 3 2 yRead ALL ð xÞ ¼ 9E  08x  0:0002x þ 0:1005x þ 94:648x þ 8870:3

ð2Þ

4 3 2 yRead QUORUM ð xÞ ¼ 9E  08x  0:0002x þ 0:1605x þ 64:185x þ 8761:6

ð3Þ

4 3 2 yRead ONE ð xÞ ¼ 2E  08x þ 2E  05x þ 0:0159x þ 70:201x þ 8002

ð4Þ

4 3 2 yWrite ALL ð xÞ ¼ 6E  08x þ 0:0001x  0:0802x þ 79:259x þ 8608:1

ð5Þ

4 3 2 yWrite QUORUM ð xÞ ¼ 1E  07x þ 0:0002x  0:1073x þ 78:402x þ 7897:9

ð6Þ

4 3 2 yWrite ONE ð xÞ ¼ 1E  07x þ 0:0002x  0:0958x þ 70:308x þ 8141:8

ð7Þ

Interplaying Cassandra NoSQL Consistency and Performance 120000 110000 100000 90000 80000

70000

(a) READ

Delay, us

y = 9E-08x4 - 0.0002x3 + 0.1005x 2 + 94.648x + 8870.3 R² = 0.9984 y = 9E-08x4 - 0.0002x3 + 0.1605x 2 + 64.185x + 8761.6 R² = 0.999 y = -2E-08x 4 + 2E-05x3 + 0.0159x 2 + 70.201x + 8002 R² = 0.9979

60000 50000

ALL QUORUM ONE Poly. (ALL) Poly. (QUORUM) Poly. (ONE)

40000 30000 20000

10000

Number of threads

0

0

120000

100

200

300

500

600

700

800

900

1000

(b) WRITE

Delay, us

110000

400

y = -6E-08x 4 + 0.0001x3 - 0.0802x2 + 79.259x + 8608.1 R² = 0.9988 y = -1E-07x 4 + 0.0002x3 - 0.1073x2 + 78.402x + 7897.9 R² = 0.9991 y = -1E-07x 4 + 0.0002x3 - 0.0958x2 + 70.308x + 8141.8 R² = 0.9994

100000 90000 80000

ALL QUORUM ONE Poly. (ALL) Poly. (QUORUM) Poly. (ONE)

70000 60000 50000 40000 30000 20000 10000

Number of threads

0 0

100

200

300

400

500

600

700

800

900

1000

Fig. 4. Average Cassandra delay depending on the current workload: (a) reads; (b) writes.

177

where yRead ALL , Read Read Write yQUORUM , yONE , yALL , Write yWrite QUORUM , yONE – the Cassandra read/update response time [microsecond] for different consistency settings; x – the number of threads (e.g. concurrent requests). Obviously, regression functions (2–7) and their coefficients are unique for our experimental setup and will not exactly match other installations. System developers performing predictive modelling and forecasting system performance will need to find equations which should theoretically fit their own benchmarking results using a variety of tools and APIs. If needed, more sophisticated regression techniques, like multivariate adaptive regression splines, support vector regression or arti-

ficial neural networks can be also applied.

5 Finding the Optimal Settings Guarantying the Strong Data Consistency As it was discussed in Sect. 2.2 Cassandra can guarantee the strong data consistency model if a sum of replicas written and read is higher than the replication factor. This means that for a three-replicated system (which is a default standard for many largescale distributed systems including Facebook, Twitter, etc.) there are 6 possible read/write consistency settings guaranteeing the strong data consistency: 1) ‘Read ONE – Write ALL’ (1R-3W); 2) ‘Read QUORUM – Write QUORUM’ (2R-2W); 3) ‘Read ALL – Write ONE’ (3R-1W); 4) ‘Read QUORUM – Write ALL’ (2R-3W); 5) ‘Read ALL– Write QUORUM’ (3R-2W); 6) ‘Read ALL – Write ALL’ (3R-3W).

178

A. Gorbenko et al.

Besides, the two settings: ‘Read ONE – Write QUORUM’ (1R-2W) and ‘Read QUORUM – Write ONE’ (2R-1W) provide the 66.6% consistency confidence. Finally, the ‘Read ONE – Write ONE’ (1R-1W) setting can guarantee only the 33.3% consistency confidence. If a system developer 120000 (a) READ would like to ensure that a 110000 Delay, us read operation always 100000 reflects the most recent 90000 update he/she can opt for 80000 one of the first six settings. 70000 However, our experiments 60000 ALL QUORUM clearly show that the fewer 50000 ONE replicas are invoked the 40000 30000 faster Cassandra performs 20000 read/write operations. Thus, 10000 in practice one should Number of requests pes second 0 choose between the three 0 2000 4000 6000 8000 10000 12000 14000 following settings: 1R-3W, 2R-2W and 3R-1W. 120000 (b) WRITE Delay, us As all three settings 110000 guarantee the strong consis- 100000 90000 tency a system developer 80000 could be interested in ALL 70000 choosing one providing the QUORUM 60000 minimum response delay on ONE 50000 average. In turn, the 40000 response delay and the Cas30000 sandra throughput depend 20000 on the current workload and 10000 the ratio between read/write Number of requests pes second 0 requests. 0 2000 4000 6000 8000 10000 12000 14000 Using regression functions (2–7) we can predict Fig. 5. Average Cassandra delay vs. average throughput: the average Cassandra (a) reads; (b) writes. latency under the mixed read-write workload: Write y1R3W ð xÞ ¼ PRead  yRead ONE ð xÞ þ PWrite  yALL ð xÞ

ð8Þ

Write y2R2W ð xÞ ¼ PRead  yRead QUORUM ð xÞ þ PWrite  yQUORUM ð xÞ

ð9Þ

Write y3R1W ð xÞ ¼ PRead  yRead ALL ð xÞ þ PWrite  yONE ð xÞ

where PRead , PWrite – probabilities of read/update requests, PRead + PWrite ¼ 1.

ð10Þ

Interplaying Cassandra NoSQL Consistency and Performance

179

Table 5 provides some estimates of the Cassandra latency for different settings guaranteeing the strong consistency under a mixed read/write workload using (2–7) and (8–10). Table 5. Estimated Cassandra delay under different workloads depending on read/write ratio. Threads Read/Write = 10/90%

10 50 100 200 300 400 500 600 700 800 900 1000

Read/Write = 30/70%

Read/Write = 50/50%

Read/Write = 90/10%

1R3W

2R2W

3R1W

1R3W

2R2W

3R1W

1R3W

2R2W

3R1W

1R3W

2R-2W

9324 12304 15794 22281 28532 34928 41709 48976 56686 64655 72561 79937

8746 11651 15027 21286 27402 33804 40714 48149 55914 63610 70628 76154

8935 11679 14863 20741 26428 32322 38628 45371 52385 59319 65636 70612

9187 12137 15661 22397 29051 35926 43204 50952 59115 67523 75888 83803

8896 11806 15372 22433 29622 37104 44930 53035 61239 69247 76650 82922

9133 12157 15820 22917 29938 37075 44425 51989 59672 67285 74541 81061

9049 11970 15529 22512 29570 36924 44700 52928 61544 70392 79216 87670

9045 11960 15717 23580 31842 40405 49146 57921 66564 74885 82672 89691

9331 12636 16777 25093 33447 41828 50222 58607 66959 75250 83447 91510

8774 11637 15264 22743 30609 38921 47690 56879 66403 76128 85872 95403

9345 9728 12269 13593 16407 18691 25874 29446 36283 40466 47006 51335 57578 61815 67694 71843 77214 81534 86160 91182 94716 101258 103228 112407

3R-1W

* delays are measured in [us]; ** the minimum values are underlined.

100 90 80 70 60 50 45 40 35 30 25 20 15 10 5 100 100 200 300 400 500 600 700 800 900 1000

9000

The Cassandra database is known for extremely fast writes/updates. Thus, one might decide to use 1R-3W as the best setting, among others (e.g. 2R-2W, 3R-1W), which guaranty the strong consistency. However, as follows from Table 5, the 1R-3W setting does not provide the lowest response time in all possible scenarios. For instance, if the probability of reads is less than 0.5 (50%), the 2R-2W consistency setting provides the lowest delay for workloads less than 50 threads. However, with the increase of the percentage of read requests (higher 105000 95000 than 50%) the 1R-3W set85000 ting becomes optimal inde75000 pendently on the current 65000 55000 workload. Increasing the 45000 number of requests per 35000 second and the percentage 25000 of read requests make the 15000 5000 2R-2W and especially the 3R-1W setups very inefficient demonstrating the exponential grow of the Cassandra latency.

Fig. 6. Workload domains with the optimal consistency settings.

180

A. Gorbenko et al.

The 3R-1W setup still provide the lowest delay in heavy ‘write mostly’ workloads when the percentage of read requests is less than 25%. The 3D surface plot shown in Fig. 6 demonstrates the domains in the input workload where the particular consistency setting provides the best performance. It is clearly shown that opting for 2R2W and 3R-1W in certain scenarios would allow us to improve the Cassandra performance up to 14% on average (see Fig. 7). This information can be extremely useful for system developers allowing them to dynamically change consistency set- Fig. 7. Cassandra‘s average speedup (latency decrease as tings of read and write compared to 1R-3W) due to optimum consistency settings. requests in an optimal way still guaranteeing the strong data consistency.

6 Experimental-Based Methodology for Optimal Coordination of Consistency Settings One might note that data reported and regression functions used in the previous sections are unique for our experimental setup and might not exactly match other installations. This is generally true. It is obvious that the Cassandra latency depends on many factors including the size and structure of the column family, used hardware, number of nodes and their geographical distribution, etc. In this section we generalize the experimental data reported in the paper by proposing a methodology to be used by system engineers for predicting the Cassandra latency and coordinating consistency settings at run time for read and write requests in an optimal manner. 6.1

Methodology

The methodology employs a benchmarking approach to quantify the Cassandra latency and throughput. The benchmarking here aims at estimating the system performance in order to find how efficient the system can serve the certain mixed workload when using different consistency settings. The methodology consists of the following steps (steps

Interplaying Cassandra NoSQL Consistency and Performance

181

1–5 can be performed once as a part of system load testing; step 6 should be performed at run-time during system operation): 1. Deploying and running a Cassandra database in a real production environment. 2. Modifying the YCSB workloads to execute application-specific read and write queries. This helps evaluate the Cassandra performance in the realistic application scenarios. 3. Benchmarking the Cassandra database under different workloads (threads per second) with different consistency settings following the benchmarking scenario described in Sects. 3.1–3.2. 4. Finding regression functions that accurately interpolate the average read/write latency measured experimentally depending on the workload for different consistency settings (see Sect. 4.2). 5. Identify the optimal consistency settings by using functions (8)–(10) to provide the minimum Cassandra latency depending on the workload and the ratio of read and write requests (e.g. workload mix) as described in Sect. 5. As a result, system developers will be able to identify the workload domains with the optimal consistency settings (e.g. see Fig. 6). 6. Monitoring the current workload and the read/write ratio during system operation and setting the optimal consistency taking into account the workload domains identified at the previous step. The proposed methodology enables a run-time optimization of consistency settings to achieve the maximum Cassandra performance and still guarantee the strong consistency. 6.2

Methodology Verification

To verify the proposed methodology we benchmarked the Cassandra performance under mixed read/write workloads (Read/Write = 10/90%, 30/70%, 50/50% and 90/10%) using the same methodology described in Sects. 3.1–3.2. Obtained experimental results were compared with the estimated data reported in Table 5. Table 6 shows a deviation between experimentally measured and estimated (see Table 5) delays. Cells with consistency settings that provided the minimum latency among {1R-3W, 2R-2W and 1R-3W} are underlined in the table. It is shown that accuracy between experimental and estimated data are considerably high. A deviation never exceeds 17% (the worst case: Read/Write = 90/10%; threads = 10; 3R-1W) and is reducing with the increase of a number of threads. Finally, by matching cells with the underlined values in Tables 5 and 6 we could see that the proposed methodology suggests optimal consistency settings in 92%.

182

A. Gorbenko et al.

Table 6. A deviation between estimated and experimentally measured Cassandra delays under different workloads depending on read/write ratio (columns: 1R-3W, 2R-2W and 3R-1W for Read/Write = 10/90%, 30/70%, 50/50% and 90/10%; rows: 10 to 1000 threads; deviations for the measured minimum delays are underlined).

7 Conclusion and Lessons Learnt

Our work experimentally investigates the interplay between different consistency settings and the performance of the Cassandra NoSQL database. This is an important part of the fundamental trade-off between Consistency, Availability and Partition tolerance, which is in the very nature of globally-distributed systems and large-scale replicated data storages. The reported results show that the consistency settings used can significantly affect the Cassandra response time and throughput, which has to be accounted for during system design and operation. The strong data consistency settings can increase database latency by 26% and degrade its throughput by 25% on average for read requests, and by 13% and 10% for write requests, correspondingly.

The Cassandra database offers developers a unique opportunity to tune the consistency setting for each read or write request. Besides, it is possible to guarantee strong data consistency by coordinating the consistency settings for read and write requests so that the sum of nodes written and read is greater than the replication factor. Developers of real Big-Data applications where Cassandra is used as the NoSQL storage are advised to benchmark the performance of different consistency settings under different workloads and for different ratios between read and write requests. This will allow them to identify the domains in the space of the input workload where a certain consistency setting provides the minimum latency.

One of our major findings is the fact that the optimal consistency settings maximizing the Cassandra performance significantly depend on the current workload and the ratio of read and write requests. They confirm our claim that no single consistency setting always guarantees the minimum latency. There can be no "cleanly" defined workload mixes which approximate the operational system workloads well enough to make the best off-line decisions. The real workload mix can evolve and change over time, impacting


the desirable Cassandra settings. The proposed methodology aims at choosing the optimal consistency setting dynamically at run-time, by monitoring the current workload mix and making use of the benchmarking results collected offline during system load testing.
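A minimal sketch of such a run-time selection step follows: given the currently observed read/write ratio and client thread count, look up the nearest offline-benchmarked workload point and return its lowest-latency consistency setting. The lookup-table structure, names and values are illustrative assumptions.

```python
# Sketch of run-time consistency tuning: pick the consistency setting with
# the lowest benchmarked latency for the workload point nearest to the one
# currently observed. Table contents and names are illustrative assumptions.

BENCH = {  # (read_pct, threads) -> {setting: mean_latency_ms}, from offline load tests
    (10, 100): {"1R-3W": 9.3, "2R-2W": 9.8, "3R-1W": 10.9},
    (50, 100): {"1R-3W": 10.0, "2R-2W": 9.9, "3R-1W": 10.2},
    (90, 100): {"1R-3W": 11.0, "2R-2W": 10.3, "3R-1W": 9.9},
}

def pick_consistency(read_pct: float, threads: int) -> str:
    """Return the benchmarked-optimal setting for the nearest workload point."""
    nearest = min(BENCH, key=lambda p: (p[0] - read_pct) ** 2 + (p[1] - threads) ** 2)
    return min(BENCH[nearest], key=BENCH[nearest].get)

# A monitoring loop would periodically re-evaluate the current workload, e.g.:
print(pick_consistency(read_pct=85, threads=120))  # -> "3R-1W"
```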


Application of Extreme Value Analysis for Characterizing the Execution Time of Resilience Supporting Mechanisms in Kubernetes

Szilárd Bozóki1, Jenő Szalontai1, Dániel Pethő1, Imre Kocsis1, András Pataricza1, Péter Suskovics2, and Benedek Kovács2

1 Department of Measurement and Information Systems, Budapest University of Technology and Economics, Magyar tudósok krt. 2, Budapest 1117, Hungary
{bozoki,szalontai.jeno,petho.daniel,ikocsis,pataric}@mit.bme.hu
2 Ericsson Hungary, Budapest, Hungary
{peter.suskovics,benedek.kovacs}@ericsson.com

Abstract. Containerization, and container-based application orchestration and management – primarily using Kubernetes – are rapidly gaining popularity. Resilience in such environments is an increasingly critical aspect, especially in terms of fault recovery, as containerization-based microservices are becoming the de facto standard for soft real-time and cyber-physical workloads in edge computing. The Worst Case Execution Time (WCET) of platform-supported recovery mechanisms is crucial for designing the resilience of applications, influencing, e.g., dimensioning and the design and parameterization of recovery policies. However, due to the complexity of the underlying phenomena, establishing such WCET characteristics is generally feasible only empirically, carrying the risk of under- or overapproximating recovery time outliers, which, in turn, are crucial for assurance design. Measurement-Based Probabilistic Timing Analysis (MBPTA) aims at estimating the WCET based on measurements. A technique in the MBPTA "toolbox", Extreme Value Analysis (EVA), is a statistical paradigm dealing with approximating the properties of extremely deviant values. This paper demonstrates that container restarts, a key platform mechanism in Kubernetes, exhibit rare extreme execution time values. We also demonstrate that characterizing these rare values with EVA can lead to approximations at least as good as or better than classic distribution fitting – and, importantly for practice, without distribution assumptions.

The results reported on in this paper partially rely on previous results of the EFOP-3.6.2-16-2017-00013 national project at the Budapest University of Technology and Economics and a joint research project with Ericsson.

Keywords: Kubernetes · Resilience · Worst Case Execution Time · Measurement-Based Probabilistic Timing Analysis · Extreme Value Analysis

1 Introduction

Since the advent of centralized computing, data-center related technologies have been evolving rapidly. Cloud computing was a cornerstone of the commercialization of such technologies, which had its foundation in hardware resource sharing via virtualization technologies. Recently, a finer-grained architectural virtualization appeared with containerization, which is essentially a form of operating system sharing. The vanguard of this evolutionary process is the widespread adoption of containerization, especially Docker containers and Kubernetes-based technologies. With the maturing of Kubernetes, its solutions are gaining popularity among applications with stricter requirements, such as soft real-time constraints. This gain in popularity, combined with the next era of 5G and edge computing, requires a better understanding of system resiliency aspects, especially the timing of the available error recovery mechanisms.

With any kind of cloud computing technology, two distinct viewpoints have to be addressed: the cloud user view and the cloud provider view. An exemplary scenario could be a Virtual Network Function application with a Service Level Agreement (SLA) requiring 1-s restoration in case of failure [9]. What can the cloud user do to meet the SLA requirements? How should the cloud user dimension the infrastructure requirements? Meanwhile, what can the cloud provider do to support such applications? How should the infrastructure be dimensioned? These questions are two sides of the same coin; answering them requires a better understanding of the timing aspects of the repair and restoration of container-based services in a Kubernetes environment.

Generally speaking, worst-case execution time (WCET) estimates for recovery mechanisms have always been crucial for resilient system design. Traditionally, WCET estimates were based on analytical methods. However, with the increasing complexity of systems, analytical methods became impractical. This analytical impracticality is precisely the case for cloud computing technologies, with all the different services and solutions that make up the whole ecosystem. In such analytically impractical cases, Measurement-Based Probabilistic Timing Analysis (MBPTA) can be used as an alternative because it does not require a detailed understanding of the underlying system implementation [5,17]. In the field of Probabilistic Timing Analysis, Extreme Value Analysis (EVA) is the dominant statistical modeling paradigm, as EVA is a mathematically established paradigm geared at modeling extremely deviant values, i.e. the values which dominate the WCET.

This paper investigates the characteristics and characterization of the timeliness aspects of some of the fault tolerance mechanisms available in Kubernetes. Our contribution focuses on the practical understanding of Docker container restart times through measurements.


This paper is organized as follows. Section 2 describes the context of the investigation. Section 3 describes the experiments. Section 4 presents the theory. Section 5 describes classic distribution fitting results. Section 6 describes the EVA results. Section 7 describes caveats to the validity of the results. Section 8 presents viable research fields spawning from our results. Finally, Sect. 9 presents our conclusions.

2 Kubernetes and Service Quality Assurance

Integrating modern kernel capabilities, containerization technologies such as Docker essentially enable logically isolated operating system (and thus resource) sharing between applications on machines and clusters of machines. Conceptually, containers are similar to Virtual Machines (VMs), but generally far more efficient in terms of resource overhead. Consequently, containerization is increasingly a foundational component in the architecture of contemporary microservice-based applications: a mesh of communicating "micro"-services can be practically implemented by encapsulating each microservice implementation in a lightweight, standalone container image. Containerization-based microservice architectures enable increased agility in many operational management tasks, from scaling through network traffic management to software upgrades (compared to "monolithic" virtual machines).

Kubernetes is an open-source framework for the deployment, scaling, and management of containerized applications in cluster settings (regularly over "clusters" of VMs). Its varied and continuously growing capability set includes, e.g., traffic routing based on service topology, load balancing, automatic horizontal and vertical scaling of containers, batch execution of jobs, and automated rollout and rollback of applications. In Kubernetes, the smallest deployable units of computing are called pods – with some simplification, groups of containers that have to be co-deployed to a single host machine. Kubernetes can also be the basis of a broadly Infrastructure-as-a-Service (IaaS)-like Container-as-a-Service (CaaS) cloud computing service model (Fig. 1), where the tenant specifies the constituent containers of her application and configures/requests additional platform services (e.g., container replication, scaling and deployment policies, or integration with a continuous delivery pipeline) [12].

2.1 Application Service Quality Assurance

Kubernetes as a platform provides a number of operational management mechanisms that can be utilized for service quality assurance purposes. We use the broad term "service quality assurance" because these tools can all be part of an application resilience, dependability, performance, or performability – and situationally, even security – strategy. Such platform mechanisms include the following (the first two are illustrated by the pod specification sketched after this list).

– Health checks and automatic restarts of pods and containers.
– Maintaining a given level of replication for pods in the face of host machine faults.
– Maintaining pod co-deployment constraints (in Kubernetes terminology, affinity and anti-affinity rules for "scheduling").
– Load-balancing network traffic across replicas.
– Performance isolation of containers (e.g., through limiting CPU usage).
– Rule-based horizontal scaling of replica sets.

Fig. 1. A schematic view of Kubernetes-based CaaS.

These platform capabilities implement standard – and vital – patterns in cloud computing resilience [1], and fault-tolerant computing in general [2,10]. Middleware-like solutions extend these capabilities; e.g., Istio as a service mesh manager includes "circuit breakers" (network isolation of a failed host after a number of retries) and network fault injection for resilience testing. Naturally, the deployment of application-level mechanisms – e.g., HAProxy for reliable and highly customizable load balancing – also remains an option.
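As an illustration of the first two mechanisms, a minimal pod specification enabling health checks and automatic restarts might look as follows. Field names follow the Kubernetes v1 Pod API, while the pod name, image and probe parameters are illustrative assumptions.

```python
# Sketch: a pod spec enabling health checks and automatic container restarts.
# Field names follow the Kubernetes v1 Pod API; concrete values are examples.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "probed-service"},
    "spec": {
        "restartPolicy": "Always",               # restart containers on failure
        "containers": [{
            "name": "web",
            "image": "example/web-service:1.0",  # illustrative image
            "livenessProbe": {                   # health check driving restarts
                "httpGet": {"path": "/healthz", "port": 8080},
                "periodSeconds": 5,
                "failureThreshold": 3,
            },
        }],
    },
}

# With the official Python client the pod could be created roughly as:
#   from kubernetes import client, config
#   config.load_kube_config()
#   client.CoreV1Api().create_namespaced_pod("default", pod_manifest)
```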

2.2 Performance of Service Quality Assurance Mechanisms

Cloud computing quality modeling has long recognized that whenever we aim at assuring application dependability (resilience, performance, ...) using platform services such as the ones enumerated above, we expose ourselves to their possible imperfections. For instance, assume a restoration time upper threshold for containers: the maximum time from container crash/livelock/... to restored container-hosted service, through detection and container restart. The platform exceeding our threshold assumption will then potentially have a service quality level impact. This is a significant risk, as containerization is increasingly the platform of choice for – for now at least, soft – real-time workloads, especially in the telecommunications and mobile edge computing domains. The problem is amplified by the


fact that such events in general tend to be rare. As such, their quantitative characterization poses its own challenges.

2.3 Target of Investigation: Container Restart Time

As an initial step, in this paper we focus on the empirical investigation and modeling problems specifically of the container restart time upon (process-exit-style) container failure, a component of the service restoration time. This time includes failure detection on the part of Kubernetes, "cleanup" of the failed container, issuing a start command to Docker, and the actual start of the container instance. Restart time, however, includes neither the application-specific initialization activities of the container-hosted application, nor the "normalization" (restoration) of the service to be provided.

The main rationale for focusing on the container restart time is twofold. On the one hand, from the application assurance design point of view, the container restart activity is under the sole control of the platform (and platform provider). Thus, if a provider wants to associate SLAs with the provided Kubernetes resilience mechanisms, this is the broadest platform primitive they shall use; and for tenants, this is the part of the overall error handling activity which is completely out of their control (as opposed to, e.g., the "speed" and predictability of application initialization). On the other hand, container restarts cannot be expected to be rare events in a CaaS setting. Quite the contrary: similarly to classic IaaS, applications should expect host and container failures to be a probable occurrence, and implement service quality assurance with scale, platform mechanisms and application-level defenses. (Note that it is not rare for most of the containers in a service ensemble to run externally developed, off-the-shelf software.) The simplest and most common platform resilience mechanism is, arguably, in-place container restart; thus, it is worthwhile to look at it in an application-independent way.

3 Experiments

For our measurement setup, we used a dedicated Kubernetes cluster of lower mid-range servers (with Docker for containerization). Test orchestration and fault injection were deployed to a single physical machine, and the test target pod to another one (enforced by using Kubernetes pod affinity rules). Orchestration and fault injection were implemented using custom scripts, with the docker kill command used for actual fault injection. We used the Docker event log for tracking the lifecycle of the fault-injected containers and to compute the restart times (via the Docker event timestamps). We performed three fault injection campaigns on three different physical machines. In each campaign, a pod with a single BusyBox container was set up and killed, the restart time was measured, then we started a new pod and injected the fault again, and so on. Each campaign comprised 10000 restarts. We will refer to the three campaigns as C1, C2 and C3.
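A simplified version of the restart time computation might look like the following sketch, which assumes the docker events JSON stream (time-stamped die and start actions) for a single monitored container. The container name and filter are illustrative, and the real campaigns used custom orchestration scripts around this idea.

```python
# Sketch: derive container restart times from the Docker event log.
# Assumes `docker events` emits one JSON object per line with a "status"
# field ("die"/"start") and a nanosecond timestamp "timeNano"; the container
# name used in the filter is an illustrative assumption.
import json
import subprocess

proc = subprocess.Popen(
    ["docker", "events", "--filter", "container=busybox-target",
     "--format", "{{json .}}"],
    stdout=subprocess.PIPE, text=True)

restart_times_ms = []
died_at = None
for line in proc.stdout:
    event = json.loads(line)
    if event.get("status") == "die":
        died_at = event["timeNano"]
    elif event.get("status") == "start" and died_at is not None:
        restart_times_ms.append((event["timeNano"] - died_at) / 1e6)
        died_at = None
```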


Clearly, while the dominant ("average") behavior for all three nodes is roughly the same (Fig. 2a), there seem to be very significant outliers, which, in addition, are different for the three machines. Figures 2b, 2c and 2d depict the histograms of the observed container restart times for the three campaigns. The red line on each histogram is a Normal distribution probability density function fitted on the data set using maximum likelihood estimation.

Fig. 2. Restart time observations for the three campaigns: (a) observation indices and restart times; (b) restart time histogram: C1; (c) restart time histogram: C2; (d) restart time histogram: C3. (Color figure online)

By focusing on C2 only, the three measured outliers become starkly visible (Fig. 2a). If we look at the observation indices of the three outliers, we can rule out a transient burst of container restart time increase, because they are firmly apart. Owing to this, the three large values can be considered systematic large values that should be included in the WCET estimation (Table 1).

Table 1. Restart time descriptive statistics for the three campaigns

Dataset  Obs. #  Min (ms)  Mean (ms)  Q3 (ms)  Max (ms)
C1       10000   1897      2823       3028     4895
C2       10000   1733      2765       2953     7515
C3       10000   1783      2460       2600     3521

4 Measurement-Based Probabilistic Timing Analysis

Timing analysis based WCET estimation has two major paradigms: Static Timing Analysis (STA) and Measurement-Based Timing Analysis (MBTA) [5,17]. STA requires total, fine-grained knowledge of the system internals and uses theoretical considerations for WCET estimation. For contemporary systems, as a rule, sheer complexity makes such timing analyses infeasible [8]. This complexity issue can be addressed by taking an empirical, evidence-based approach, assuming the representativeness of the measurement data can be assured. MBTA is a method built on observed timing data [3,7]. The output of MBPTA is a probabilistic model: a fitted pWCET curve which can provide the probability of exceeding a given execution time.

Timeliness guarantees, especially in HA services, need to take into account rare fault events which occasionally result in extremely long repair times and transients due to a number of different root causes – while the majority of faults manifest only in short failures. This way, any MBPTA-like analysis has to properly address such outliers (extreme values). Extreme Value Analysis (EVA) is a mathematical discipline explicitly targeting such problems and is dominant in MBPTA [4]. That said, other approaches have also been explored, such as copulas and Markov models [5]. Moreover, there are different tools implementing MBPTA [13,15,16]. However, EVA fits best with our technical metrology background, as it directly reflects the engineering thinking of separating "normal" cases from "extreme" ones.

4.1 Extreme Value Analysis

Extreme Value Analysis is a branch of statistics focusing on modeling extremely deviant values, i.e. those values which are usually labeled as outliers and filtered out in a traditional model fitting context [6,11,14]. The classic use case of EVA is in hydrology, where it is used to answer questions like the following.

– How tall an embankment is needed to survive the floods of the next time period (for example, 100 years) with a given probability?


– What is the probability of a given embankment surviving the floods of the next time period (for example, 100 years)?
– What is the expected maximal level of a flood in the next time period (for example, 100 years), based on historical data?

The core idea behind EVA is that only the values deemed extreme should be used to model the extremes. Thus, the key to EVA is the selection of such extreme values for distribution fitting. Very broadly speaking (for the specifics, see the cited references), EVA selects an appropriate threshold in the observations and estimates the statistical distribution over that value; this way, it can characterize "long/heavy tails" much better than standard distribution fitting on the whole data set could. The two fundamental methods of EVA are the Annual Maxima Series (AMS, also known as the block maxima method), which bins the dataset into equal-length blocks, and Peak Over Threshold (POT), which selects the largest values (Fig. 3).

Fig. 3. EVA methods compared based on how the representative large (extreme) values are selected relative to the whole dataset. The selected values are depicted in green. (Color figure online)

AMS first bins the data according to a predefined aggregation interval and subsequently fits the distribution to the maxima. The AMS method takes into account the ordering of the data and underweights, for instance, burst-like series of extreme values. As such phenomena are not uncommon in IT infrastructures, emanating from transient external disturbances, in the next chapter we will apply POT. The POT approach first selects the extreme values by taking all the values above a threshold and then fits a Generalised Pareto Distribution (GPD) to the selected values. Under-threshold observations are not taken into account. The threshold is an input parameter that has to be carefully selected. The output of the POT method is the distribution of the above-threshold values (Fig. 4).
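In code, the POT step reduces to selecting the exceedances and fitting a GPD to them. A minimal sketch using SciPy follows; the input file name and the threshold value are illustrative assumptions (the actual threshold selection is discussed in Sect. 6).

```python
# Sketch: Peak Over Threshold fitting with a Generalised Pareto Distribution.
# `data` is the array of observed restart times; the threshold is illustrative.
import numpy as np
from scipy.stats import genpareto

data = np.loadtxt("restart_times_ms.txt")    # assumed input file
threshold = 3600.0                           # chosen via diagnostic plots

excesses = data[data > threshold] - threshold
shape, loc, scale = genpareto.fit(excesses, floc=0)  # MLE, location fixed at 0

def tail_prob(x: float) -> float:
    """Approximate P(X > x) for x above the threshold."""
    rate = (data > threshold).mean()         # empirical P(X > threshold)
    return rate * genpareto.sf(x - threshold, shape, loc=0, scale=scale)

print(tail_prob(7515.0))  # e.g. probability of exceeding the largest C2 value
```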


Fig. 4. EVA methods compared based on how the representative large (extreme) values are selected algorithmically.

From a theoretical point of view, if a given set of criteria holds true, then Extreme Value Theory states that the rate (speed) at which the tails of distributions diminish falls into one of three classes:

– The fastest is the Weibull class: these distributions have thin tails with a finite right endpoint. Example class members are the uniform, beta and reverse Burr distributions.
– The reference point is the Gumbel class, which has an exponential tail. Example class members are the normal, gamma, log-normal and exponential distributions.
– The slowest is the Fréchet class: these distributions have heavy (fat) tails, where the survival function decreases as a power function. Example class members are the Pareto, Student, Cauchy and Burr distributions.

4.2 Service Quality Assurance Mechanism Characterization

In the context of repair-associated service outages and transient disturbances associated with imperfect fault tolerance, we can certainly use EVA in a straightforward way: as a tool for long-tail distribution approximation. That being said, it is conceptually useful to translate the "classic" usage patterns of EVA to our domain of investigation, in order to demonstrate its applicability directly from the point of view of service quality assurance. For demonstration purposes, let us introduce the following simple notation.

– Let t_d denote the general notion of "disturbance time", encompassing time to container restart, or time to service recovery, and the duration of service transients in a general sense. (Note that using multivariate versions of EVA, we could incorporate the modeling of maximum disturbance severity, too; this is left for future work here.)


– Let t_a denote an "aggregation interval": the minimum unit of time that will be the basis of some of our queries. Theoretical reasons as well as the practice of quality assurance management (see, e.g., SLAs) dictate that t_d ≪ t_a. Example: the "worst disturbance" in an hour interval.
– Let Π denote a (future) time horizon under investigation, with Π = N · t_a, N ∈ ℕ+.
– As usual, p denotes probability.

Classic use cases of EVA can be reinterpreted in our technical setting the following way.

Probability-Based Maximum Disturbance Time Estimates. Given a probability, what is the expected maximum disturbance time for a time horizon? Such queries can inform, for instance, risk-based design of assurance mechanisms: if I am willing to accept, e.g., "container restart time runoffs" 0.5% of the time, what is the maximum expected disturbance time I have to count with in the remainder of the cases?

f_pr : (p, Π) → max_Π(t_d)    (1)

Disturbance Time Threshold Non-violation Probability Estimates. For a given time horizon, what is the probability of all disturbance times remaining under a threshold value τ? This essentially asks for the probability of a (usually conservative) disturbance time estimate, used during service quality assurance strategy design, "being wrong".

f_thr : (τ, Π) → P(max_Π(t_d) < τ)    (2)

Maximum Expected Disturbance Time. What is the maximum expected disturbance time for the next time period? Such queries can be especially useful when the service quality assurance strategy does not tolerate mechanisms that work "fast enough most of the time"; instead, we need estimates that are statistically robust, if potentially pessimistic.

f_max : Π → max_Π(t_d)    (3)

4.3 N-Interval Return Level

The N-interval return level is the t_d that is reached on average once every N intervals ("the size of the thousand-year flood"). This measure is directly meaningful in our setting, too, in a "fault rate"-like interpretation.
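Assuming a POT fit as above and a Poisson approximation for the number of threshold exceedances per aggregation interval, the query f_thr and the N-interval return level can be approximated as in the following sketch; all numeric inputs are placeholders.

```python
# Sketch: answering the EVA queries from a fitted GPD tail model.
# lam = expected number of threshold exceedances per aggregation interval t_a;
# shape/scale come from the GPD fit; all values are illustrative.
import math
from scipy.stats import genpareto

threshold, shape, scale = 3600.0, 0.1, 250.0
lam = 0.02  # exceedances per interval t_a (estimated from the data)

def prob_no_violation(tau: float, n_intervals: int) -> float:
    """f_thr: P(max disturbance over the horizon stays below tau)."""
    p_exceed_tau = genpareto.sf(tau - threshold, shape, loc=0, scale=scale)
    return math.exp(-n_intervals * lam * p_exceed_tau)  # Poisson approximation

def return_level(n_intervals: int) -> float:
    """t_d exceeded on average once every n_intervals (N-interval return level)."""
    if shape == 0:
        return threshold + scale * math.log(n_intervals * lam)
    return threshold + (scale / shape) * ((n_intervals * lam) ** shape - 1)

print(prob_no_violation(tau=5000.0, n_intervals=1000))
print(return_level(n_intervals=1000))
```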

5 Simple Distribution Fitting

In this section, we first present a “classic” approach to characterizing the container restart times.


A naïve approach to MBPTA would be selecting and fitting a well-known probability distribution function (PDF) to the whole data set, for which well-known techniques and tests, as well as hints for the right distribution to use, exist. One such statistical measure is the Kurtosis value of a (one-dimensional) empirical observation set. Kurtosis is a measure of "extremity", expressing the "long/heavy-tailedness" of a distribution. Higher Kurtosis means more deviations and longer, heavier tails. The Kurtosis of any univariate normal distribution is three; thus, a Kurtosis higher than three implies that the distribution produces more frequent and more extreme values than the normal distribution. However, a low (3 or less) Kurtosis does not automatically mean that the normal distribution appropriately approximates the extremal characteristics of an observed phenomenon: extreme values can occur with low Kurtosis. (Note that EVA, which we will apply in the next section, is capable of covering cases where the distribution is actually normal, too, through the use of the Gumbel distribution family.) In the other direction, a high Kurtosis value does indicate the presence of a "long tail". Table 2 depicts the restart time Kurtosis values for the three campaigns.

In order to evaluate whether a normal distribution is appropriate to characterize the extreme values, we performed normal distribution fitting (see earlier in Sect. 3) and computed the probability of a random sample from the distribution being as large as or larger than the largest value seen during the campaign (denoted p_max_C = P(x ≥ max_C(t_d_obs))). Additionally, we also computed the expected number of values with x ≥ max_C(t_d_obs), assuming as many samplings as there are observations in the campaign, drawing from the fitted normal distribution (the E(#) column).

In our case, looking at the histograms and the Kurtosis values in Table 2, the normal distribution seems to be a good candidate for C1 and C3. Meanwhile, the Kurtosis value around 10 for C2 indicates more severe extremity than the normal distribution. For C1 and C2, the E(#) metric shows that the normal distribution underestimates the probability of the large values, compared to the empirical probabilities, by orders of magnitude. For both cases, the expected number of large values is practically near zero – while the observations do include one such value! In C3, the expected number of large values was within a range that is comparable to the measurement results.

The p_max_C and E(#) columns will serve as a basis for comparing these and the EVA results in the next section. In both cases, for demonstration purposes, point estimates are used for p_max_C and E(#) to make them easily comparable. Confidence intervals could have been used to further fine-tune the comparison by incorporating modeling uncertainty; however, this is beyond the scope of this paper, as it is not part of the core message.
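The computation behind Table 2 can be sketched as follows; the input file name is an illustrative assumption, and the Pearson definition of Kurtosis (normal = 3) is used to match the discussion above.

```python
# Sketch: the "classic" normal-fit analysis - Pearson kurtosis, the tail
# probability p_max_C of matching/exceeding the observed maximum, and the
# expected count E(#) of such values over as many draws as observations.
import numpy as np
from scipy.stats import kurtosis, norm

data = np.loadtxt("restart_times_ms.txt")    # assumed input file

k = kurtosis(data, fisher=False)             # Pearson definition: normal == 3
mu, sigma = norm.fit(data)                   # maximum likelihood fit
p_max = norm.sf(data.max(), loc=mu, scale=sigma)
expected_count = len(data) * p_max

print(f"kurtosis={k:.2f}  p_max_C={p_max:.2e}  E(#)={expected_count:.6f}")
```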

Table 2. MBPTA of observations using normal distribution fitting

Dataset  Kurtosis   p_max_C    E(#)
C1       3.200677   7.37E-10   0.000007
C2       10.29974   4.20E-56   0.000000
C3       3.183879   3.98E-05   0.398494

6 Applying Extreme Value Analysis

We applied POT EVA (see earlier) to all three campaigns. For POT threshold selection, we used the graphical tools of mean residual life and parameter stability plots, along with goodness-of-fit statistical tests (a sketch of the mean-residual-life diagnostic is given after Table 3). For fitting the Generalised Pareto Distribution (the distribution family underlying the POT method), we used the standard maximum likelihood estimator (MLE). Table 3 shows the selected thresholds, the number of data points above the selected thresholds, the p-value of the goodness-of-fit tests, and p_max_C and E(#) computed for the distribution approximation created by EVA.

For C1 and C2, EVA underestimates the probability of the large values, compared to the empirical probabilities, by less than an order of magnitude. For both cases, the expected number of large values is practically a non-zero value considering 10000 data points. In C3, the expected number of large values was almost the same as with the normal distribution, thus within a range that is comparable to the measurement results. For C1 and C2, although EVA underestimates the probability of the large values compared to the empirical probabilities, the inaccuracy is within an order of magnitude; moreover, EVA does not break down to practically zero probabilities. In addition, it has to be emphasized that for the EVA approximation, we did not make any assumption on the type of the whole distribution or of the "long tail", which is a very important consideration for practical applications.

Table 3. MBPTA of observations using Peak Over Threshold EVA

Dataset  THR   # above THR  p-value  EVA p_max_C  EVA E(#)
C1       3600  200          0.29     1.04E-05     0.10
C2       3510  107          0.14     4.35E-06     0.04
C3       3050  300          0.43     4.02E-05     0.40
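A minimal sketch of the mean-residual-life diagnostic mentioned above: for each candidate threshold u, the mean excess of the observations above u is computed and plotted; the GPD threshold is conventionally chosen where the curve becomes approximately linear. The input file name and the candidate range are illustrative assumptions.

```python
# Sketch: mean-residual-life diagnostic for POT threshold selection.
# For each candidate threshold u, compute the mean excess of the data above u;
# a suitable threshold lies where the resulting curve is roughly linear.
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("restart_times_ms.txt")    # assumed input file
candidates = np.linspace(np.quantile(data, 0.5), np.quantile(data, 0.99), 100)
mean_excess = [np.mean(data[data > u] - u) for u in candidates]

plt.plot(candidates, mean_excess)
plt.xlabel("threshold u (ms)")
plt.ylabel("mean excess above u (ms)")
plt.show()
```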

7 Threats to Validity

Our initial analysis focused on specific machines; further work is needed to evaluate EVA for reaching cluster-wide (that is, robustly “reusable”) estimates.


Additionally, initial measurements indicate – unsurprisingly – that overall resource utilization on the hosts does influence container restart times, to the point where the restart mechanism almost "breaks down" when the host resources (most importantly, CPU) are saturated. In our measurements, such "parasitic" performance interferences were not present. That being said, in an environment where the host-homogeneity and time stability of restart times are truly important, it can be expected that the resources necessary for performing restarts are protected by a series of mechanisms (e.g., via dedicated CPU cores). Further measurements will investigate the effectiveness of such measures. Last but not least, we performed our measurements for a single container image type. Initial measurements (not reported here) indicate that the image type (e.g., CPU-heavy applications, network-centric applications, databases, ...) does have some effect on container restart times. However, the mere presence of a "long tail" does not seem to depend on the image type – i.e., we have to expect it.

8 Towards Characterizing Other Resilience Mechanisms

The potential need for EVA-based long-tail repair/protective mechanism runtime estimates in the service quality assurance design of containerized applications is not constrained to container restarts. Our ongoing experiments target the overall service restoration time for a simple, instrumented containerized web service. In addition to measuring container restarts – which seem to have largely the same effect as described for our BusyBox experiments – we are also investigating other resilience mechanisms: failover setups. One experiment uses the Kubernetes built-in kube-proxy, and two others use HAProxy. In the latter case, we are measuring an active-active as well as an active-passive replication scheme. For active-active schemes – whether load balanced or not – the expectation is completely disturbance-free operation, while in the active-passive case slight, but capped, response delay increases are expected for at most a well-determined transient interval. While the expectation seems to be met for the active-active cases (although our experiment numbers are still low), for HAProxy active-passive replication the measurements suggest the potential presence of a long tail in the service disturbance time (see Fig. 5). Owing to the long-tail behaviour encountered even in our initial measurements, the long-tail characterisation of resiliency fundamentals seems to be a viable research field, especially for applications with SLA requirements.


Fig. 5. Service recovery time histogram for HAProxy active-passive replication.

9 Conclusions

In a typical technical system, extreme values are rare deviances from normal operation. This way, specific statistical methods are needed to evaluate them, especially in an empirical setting. "All-in-one" approaches that try to encompass the normal and the extreme values uniformly typically simply suppress the latter. As expected, EVA proved orders of magnitude more accurate than the normal distribution in the more extreme cases. As a consequence, based on our measurement results and the practical application of MBPTA using a naïve non-EVA and an EVA approach, we found that using EVA for critical applications is a safer bet in platform resilience mechanism WCET estimation, because EVA is more robust to the extreme outlier values. Additionally, POT does not assume anything about the distribution of the under-threshold values, which makes it more practical for real-world phenomena. Based on our results, with the caveats previously mentioned, it can be decided whether relying on Docker container restart alone is a fast enough error recovery mechanism or not.

As future work, we plan to use the fitted models for fault-tolerant system design analysis. Fundamentally, the timing characteristics can be used for mean time to repair (MTTR) estimation, which can serve as a basis for availability analysis. Moreover, considering the cost of alternative resiliency mechanisms and the cost of SLA violations, the fitted models could be used for risk-based optimization of the expected value of the total cost.


References

1. Agarwal, H., Sharma, A.: A comprehensive survey of Fault Tolerance techniques in Cloud Computing. In: 2015 International Conference on Computing and Network Communications (CoCoNet), pp. 408–413 (2015)
2. Avizienis, A., Laprie, J., Randell, B., Landwehr, C.: Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Dependable Secure Comput. 1(1), 11–33 (2004)
3. Bernat, G., Colin, A., Petters, S.M.: WCET analysis of probabilistic hard real-time systems. In: 2002 23rd IEEE Real-Time Systems Symposium, RTSS 2002, pp. 279–288 (2002)
4. Castillo, E., Hadi, A., Balakrishnan, N., Sarabia, J.: Extreme Value and Related Models with Applications in Engineering and Science. Wiley, Hoboken (2004)
5. Cazorla, F.J., Kosmidis, L., Mezzetti, E., Hernandez, C., Abella, J., Vardanega, T.: Probabilistic worst-case timing analysis: taxonomy and comprehensive survey. ACM Comput. Surv. 52(1), 14:1–14:35 (2019)
6. Cizek, P., Härdle, W.K., Weron, R. (eds.): Statistical Tools for Finance and Insurance, 2nd edn. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-18062-0
7. Cucu-Grosjean, L., et al.: Measurement-based probabilistic timing analysis for multi-path programs. In: 2012 24th Euromicro Conference on Real-Time Systems, pp. 91–101 (2012)
8. Cullmann, C., et al.: Predictability considerations in the design of multi-core embedded systems. In: Proceedings of Embedded Real Time Software and Systems, pp. 36–42 (2010)
9. ETSI: Network Functions Virtualisation: An Introduction, Benefits, Enablers, Challenges & Call for Action, Issue 1 (2012). https://portal.etsi.org/NFV/NFV_White_Paper.pdf. Accessed 10 July 2020
10. Hanmer, R.: Patterns for Fault Tolerant Software. Wiley, Hoboken (2013)
11. McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts, Techniques and Tools – Revised Edition. Princeton University Press (2015)
12. Mell, P., Grance, T.: The NIST Definition of Cloud Computing. Technical report, NIST Special Publication (SP) 800-145, National Institute of Standards and Technology (2011). https://csrc.nist.gov/publications/detail/sp/800-145/final
13. PROARTIS: Probabilistically analysable real-time systems. https://www.proartis-project.eu/. Accessed 10 July 2020
14. Rakonczai, P.: On Modeling and Prediction of Multivariate Extremes. Ph.D. thesis, Mathematical Statistics, Centre for Mathematical Sciences, Lund University (2009)
15. Rapita Systems: RapiTime product. https://www.rapitasystems.com/products/rapitime. Accessed 10 July 2020
16. Reghenzani, F., Massari, G., Fornaciari, W.: chronovise: measurement-based probabilistic timing analysis framework. J. Open Source Softw. 3(28), 711 (2018)
17. Wilhelm, R., et al.: The worst-case execution-time problem – overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst. 7(3), 1–53 (2008)

Concepts and Risk Analysis for a Cooperative and Automated Highway Platooning System

Carl Bergenhem1, Mario Majdandzic2, and Stig Ursing2

1 Qamcom Research and Technology AB, Gothenburg, Sweden
[email protected]
2 Semcon AB, Gothenburg, Sweden
{mario.majdandzic,stig.ursing}@semcon.com

Abstract. With the advent of automated cooperative systems, new challenges are raised with respect to fulfilling safety. This paper presents a novel approach to analysing and assessing the risks of such systems. Moreover, a distributed conceptual architecture is proposed and discussed. This architecture features separated layers of concern for strategic, tactical and operational functions. We present how the tactical layer can be allocated in either centralised or distributed versions of the architecture. An item definition, i.e. a description of the concept according to the ISO 26262 framework, for the platooning system is presented, together with a description of the operational design domain. A preliminary architecture for a functional safety concept of the item is presented. Finally, we present a risk analysis of the item according to a novel approach using a quantitative risk norm and incident classification, instead of traditional hazard analysis and risk assessment.

Keywords: Risk analysis · Platooning · Cooperative automated highway vehicle system · Quantitative risk norm · Incident classification

1 Introduction

A cooperative system refers to a collection of multiple dynamic entities which collaborate to reach a common goal and shared benefits. An example of such a cooperative system is a public highway vehicle platoon, where a group of vehicles travel very closely together, safely at high speed, to a common destination. Vehicle platooning presents the opportunity for significant improvements within several areas such as road safety, energy efficiency, driver comfort and cost. In this paper we present an example of an automated cooperative system for public highway vehicle platooning. We present the item definition (definition of the concept, boundaries etc. of a function at vehicle level, according to ISO 26262 [7]), a risk analysis, the Operational Design Domain (ODD), i.e. the operating conditions under which a given driving automation system or feature thereof is specifically designed to function, and a general usage scenario of this system.


We propose a generic functional concept for a cooperative ADS (Automated Driving System) decision hierarchy. The functional concept separates concerns into three levels of functionality related to decision and control: strategic, tactical and operational. We then give two different basic implementations of the concept: centralised and distributed. The two alternatives differ with respect to where tactical functions are allocated and where the data comes from. We present a preliminary functional architecture with distributed tactical functions. The highway platooning system is analysed based on the decentralised concept. Since we assume that the system provides highly automated driving, i.e. without driver involvement (see [13]), we argue that traditional Hazard Analysis and Risk Assessment (HARA) should be replaced with a novel approach called the Quantitative Risk Norm (QRN), as proposed in [15]. We demonstrate this novel method to analyse the cooperative ADS (the platooning system) and present some highlights.

The rest of the paper is organised as follows. In Sect. 2, we give an overview of related work. This is divided into key concepts and terminology for ADS (e.g. ODD and DDT) and into safety analysis of platooning etc. In Sect. 3, we present a centralised and a distributed concept for a platooning ADS. In Sect. 4, we give an item definition and a description of the ODD for the highway platooning system based on the distributed concept. In Sect. 5, we describe the task of monitoring ODD exit. In Sect. 6, we perform the risk analysis of the item, give some highlights and present a preliminary functional architecture for the system. Finally, we draw conclusions and close the paper.

2 Related Work

2.1 Key Concepts for ADS

This subsection is based on SAE J3016 [13]. This standard provides a taxonomy for motor vehicle automation, ranging over five "SAE automation" levels from no automation to full automation. The following paragraphs describe key definitions in the standard. The ODD describes the operating conditions that a given driving automation system is designed to operate within, i.e. it can be used to restrict where the system is valid. It includes factors such as type of highway, speed limitations, environmental conditions (weather, day/night, etc.), absence or presence of certain traffic, and other domain constraints. Gyllenhammar et al. [6] discuss the contents of the ODD. The ODD may be used to confine the content of the risk analysis and, in later stages of development, to confine the scope of the safety case as well as the verification. They contribute a refined definition and a proposed structure of contents and categories of information that describe the ODD. A template is proposed; this is used to define the ODD of the platooning item that is presented in this paper. The DDT (Dynamic Driving Task) encompasses all of the real-time operational and tactical functions required to operate a vehicle within the ODD. This includes lateral and longitudinal vehicle motion control, and monitoring


the driving environment. Strategic functions such as trip scheduling and selection of destinations and way-points are not included in the DDT. If the system experiences a failure or approaches an ODD exit, then either a human driver shall take over (e.g. continue the DDT) or the system or driver shall achieve an MRC (minimal risk condition). Note that a human driver may have other operating limits, i.e. a different ODD, than the ADS, and hence may not be able to take over. An example of an MRC is the platoon at full stop in its current trajectory, with all vehicles disengaged from the platoon. This can be deemed safe if it can be argued that it does not occur too often. OEDR (Object and Event Detection and Response) is a subtask of the DDT that is responsible for monitoring the driving environment (event and object detection, recognition, classification and response planning accordingly) and performing the planned response to such objects and events. DDT fallback is the expected role of either the user (human) or the system to perform the DDT or achieve a minimal risk condition after the occurrence of a DDT performance-relevant system failure or when approaching ODD exit. The amount of automation a system is capable of performing is categorized into levels based on whether the system:

– performs any or all real-time operational and tactical functions required to operate a vehicle (the DDT)
– performs the OEDR monitoring subtask of the DDT
– performs the DDT fallback
– is limited by an ODD

A driving automation system (DAS) refers to any SAE level 1–5 [13] system or feature capable of performing part or all of the DDT on a sustained basis within given ODD limitations, if any. An ADS is classified as SAE levels 3–5.

2.2 Other Work Related to Platooning

According to ISO 26262, an item is implemented as a system or array of systems to realise a function. As discussed by Nilsson et al. [11], the function may be cooperative and distributed across several vehicles, and there may be a shared system, such as communications infrastructure, a common interaction protocol or a central traffic management system. When applying ISO 26262 to a cooperative system, a change to Part 3, clause 5.4.1 b, which refers to "the functional behaviour at the vehicle level", would be needed. In this paper, we treat the entire fleet of cooperating vehicles as a single item. Pelliccione et al. [12] investigate the design of vehicles, and their system architecture, as constituents of future connected intelligent transportation systems. The assumed context is that of a System of Systems (SoS). A functional reference architecture for cars as constituents of an SoS is presented. In this paper we further develop the notion of the different functional purposes and the hierarchical arrangement of components in an SoS for platooning. In [1], an overview of safety in vehicle platooning is given. The paper surveys the literature regarding safety for platooning, including what analysis methods


have been used, what hazards and failures have been identified, and what solution elements have been proposed to improve safety. Dajsuren and Loupias [4] investigate safety analysis for a cooperative adaptive cruise control system. They show that the inclusion of the cooperative architecture perspective affects the safety goals of cooperative adaptive cruise control, because ASIL determination is influenced by vehicle-to-vehicle communication faults. ENSEMBLE (see: www.platooningensemble.eu) is a European project with the goal of enabling the adoption of multi-brand truck platooning in Europe to improve fuel economy, traffic safety and throughput. It is the ambition of the project to realise pre-standards for interoperability between trucks and platoons. Contributions include an item definition [5] and a HARA [9] for a truck platoon. The project results elaborate on use cases and automation levels [14], and conclude that SAE automation levels 1–3 cannot be readily applied to platooning. Instead, three new levels are proposed: A, B and C. At level C (the highest degree of platooning automation), the lead vehicle is under (assisted) manual control and the following vehicles have automated longitudinal and lateral control. In the example given in this paper, all of the platoon vehicles are highly automated, so assigning SAE level 4 seems appropriate.

A novel risk analysis method for ADS is proposed in [15]. It is proposed to replace the traditional HARA method with a quantitative risk norm (QRN) approach. The approach defines a quantified risk norm that is differentiated into classes based on different consequences of failure. Each class is allocated a budget limit for the frequency with which the consequences may occur. Incident types are then defined and assigned to the consequence classes; these incident types are used as safety goals to fulfil in the implementation. The risk norm depends on what the user and society deem acceptable. The main benefits are the ability to show completeness of safety goals, and to make sure the safety strategy is not limited by safety goals that are not formulated in a way suitable for an ADS feature. Performing a HARA for a system implies enumeration of operational situations, or relevant scenarios that can occur when using the feature. With the traditional HARA applied to an ADS, this is both intractable and unnecessary: it is intractable since the number of potentially relevant operational situations is vast, making an argument for completeness a very difficult task; and it is unnecessary since much of this complexity can be confined to the solution domain by means of using tactical decisions and an appropriately defined ODD to reduce risks.

3 Two Concepts for Platooning ADS

In this section we first introduce a conceptual architecture for cooperative ADS (see Fig. 1), and then give two refinements of the architecture, differentiated by how they are deployed: either centralised or distributed. The conceptual architecture for the cooperative ADS is an adaptation toward cooperative systems of the architecture presented in [3], which in turn was inspired by [2]. The architecture consists of a hierarchical layout with three interacting, but separated, layers of concern (orange, blue and green): strategic, tactical and operational. Together they incorporate all of the real-time functions required to operate an ADS-dedicated vehicle (ADS-DV) within its given ODD limitations. The conceptual architecture is refined into two alternative item concepts, see Fig. 2 and Fig. 3. Tactical functions are either centrally allocated or distributed among participants. In general, either of the two concepts could be found in various cooperative applications such as coordinated snow removal, grass-cutting, farming and vehicle platooning. In both concepts, operational functions are performed in the vehicle systems. Strategic functions are always centralised, and include where to go, what to do, and who can participate, i.e. a dispatch order. In the centralised concept, a single subsystem is responsible for tactical functions for a vehicle platoon, as illustrated in Fig. 2. The centralised tactical layer is deployed in a fleet management system. This implies that the vehicles themselves cannot perform tactical functions. In the distributed concept, tactical functions are distributed among the individual members, as illustrated in Fig. 3. This implies that individual vehicles make tactical decisions, taking into account how these will affect other vehicles and actors.

Fig. 1. A conceptual architecture for cooperative ADS (Color figure online)


A more detailed description of the different layers' responsibilities, and how they interact with each other, follows. The top strategic function layer (orange) includes strategic decision and control; it ensures that the information regarding the mission objectives and constraints for all platoon vehicles is cohesive, e.g. information regarding formation, inter-vehicle distance, speed, highway topology, travel itinerary etc. The strategic function layer is implemented in a centralised configuration, which collectively manages strategic functions for the platoon. Since this layer is not tied to a specific platoon vehicle, it may be implemented in any of the platoon vehicles or in a traffic management system. All information regarding the selected route is sent to the middle tactical function layer (blue), which in turn provides the upper layer with the current vehicle states. The middle tactical function layer (blue) includes tactical decision making; this layer determines and coordinates the necessary vehicle manoeuvres for moving the vehicle platoon safely based on its mission, as well as event-based control decisions during unexpected events in the environment and vehicle states. All information regarding manoeuvres of the vehicle(s) is sent to the lower operational function layer (green), which in turn provides the middle layer with the current vehicle states. The tactical function layer can be implemented in two different control configurations, centralised or distributed, as discussed previously in this section. The lower operational function layer (green) involves, e.g., real-time decision making and control over actuators. This ensures that the tactical decisions for each vehicle in the platoon are actuated, e.g. steering, braking and acceleration, to automatically perform and maintain the strategic mission or to avoid a sudden obstacle or other incident in the vehicle's path. The operational layer receives its actuation commands from the tactical function layer (blue) and in turn provides the middle layer with vehicle states and information regarding the perception of the external environment.
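One way to read this layering is as a pair of interfaces: commands flow downwards and vehicle/environment states flow back up. The following minimal sketch illustrates that reading for the distributed variant (one tactical layer per vehicle); all class and method names are our own illustrative assumptions, not part of the proposed architecture.

```python
# Sketch of the three-layer decision hierarchy: commands flow downwards,
# vehicle/environment state flows back up. All names are illustrative.
from dataclasses import dataclass

@dataclass
class VehicleState:
    position_m: float
    speed_mps: float
    healthy: bool

class OperationalLayer:
    """Real-time actuation (steering, braking, acceleration) per vehicle."""
    def actuate(self, speed_setpoint_mps: float) -> VehicleState:
        # Drive the actuators towards the setpoint and read back sensors.
        return VehicleState(position_m=0.0, speed_mps=speed_setpoint_mps, healthy=True)

class TacticalLayer:
    """Manoeuvre coordination; one instance per vehicle in the distributed concept."""
    def __init__(self, operational: OperationalLayer):
        self.operational = operational
    def step(self, mission_speed_mps: float) -> VehicleState:
        # Adjust manoeuvres for gaps, events and neighbouring vehicles here.
        return self.operational.actuate(mission_speed_mps)

class StrategicLayer:
    """Mission objectives/constraints for the whole platoon (centralised)."""
    def __init__(self, tactical_layers: list):
        self.tactical_layers = tactical_layers
    def run_mission(self, speed_mps: float) -> list:
        # Issue cohesive mission parameters; collect reported vehicle states.
        return [t.step(speed_mps) for t in self.tactical_layers]

platoon = StrategicLayer([TacticalLayer(OperationalLayer()) for _ in range(3)])
states = platoon.run_mission(speed_mps=22.0)   # states reported back up
```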

4 Item Definition and Operating Design Domain

In this section we present an item definition according to ISO 26262 for a public highway vehicle platooning system. The presentation is limited to the distributed architecture, as discussed in the previous section. The item definition and ODD are used as the context of the risk analysis, but also in later design stages for verification and validation. The item definition includes the ODD and a functional description of the platooning system, including vehicles and infrastructure, see Fig. 4. Table 1 is an example ODD description for the highway platooning item, based on the template proposed in [6]. The automation level of the proposed public highway vehicle platooning system is SAE level 4, i.e. "High Driving Automation". The ADS fully performs the sustained and ODD-specific execution of the DDT, including both the lateral and longitudinal vehicle motion control subtasks (hence the ADS is above level 1). The individual vehicles have sensing capability to monitor the external environment, i.e. they fully perform the OEDR (hence the ADS is above level 2).


Fig. 2. Centralised item elements and boundary, implementing a public highway vehicle platoon function with centralised tactical functions (Color figure online)

Fig. 3. Distributed item elements and boundary, implementing a public highway vehicle platoon function with distributed tactical functions (Color figure online)

The vehicles also have internal sensing and the capability to monitor the internal subsystems (the internal OEDR task), i.e. sensing of vehicle-internal matters sufficient to conclude that a vehicle's subsystems have no failures. For example, a vehicle can compare the requested trajectory with the actual trajectory, based on internal sensors such as an IMU, to detect any deviation. Any failure will be signalled to the ADS. If an obstacle appears, or in case of a vehicle failure, the ADS shall decide (hence the ADS is above level 3) whether to initiate DDT fallback in order to reach an MRC. The ODD has a limited scope (highway) and hence the ADS cannot be level 5. The human in the lead vehicle (or following vehicles) of the platoon has no responsibility, neither for the DDT (including OEDR) nor for DDT fallback; hence the human (in the driver's seat) is a passenger while the platooning function is engaged. A small example of an ODD for the highway platooning system is summarised in Table 1. No sources are cited for the individual ODD entries, but we assume that such sources exist, e.g. for traffic intensity and highway dimensions. The ODD shall define the limits of the ADS. While performing the DDT, the ADS shall handle dynamic changes during run-time, such as changes of highway friction, vehicle capability or weather conditions. Figure 3 shows the boundary of the item that implements a highway vehicle platooning system with distributed tactical functions, its elements and its interaction with external elements. The vehicles communicate with each other for tactical functions and with the infrastructure for strategic functions.

Fig. 4. Item definition for a cooperative function

5  Monitoring ODD Exit

In the previous sections we described an ADS item that functions according to SAE J3016 level 4, i.e. the system is capable of completely performing the DDT and DDT fallback. Generally, an ADS may only operate within its ODD and cannot be allowed to enter conditions that are outside the ODD, i.e. an ODD exit. While inside the ODD, the ADS always "knows what to do", i.e. it can fully perform the DDT. Hence, an ODD exit cannot occur provided that the ADS can, through tactical decisions, perform its DDT so as to adapt the risk (e.g. by changing speed or manoeuvring to reach an MRC) according to the environment and situation. Note that reaching an MRC can be done at any time while the ADS is within the ODD. The ODD specification needs to define triggering conditions within the ODD that signal that an ODD exit is approaching. When triggered, operation can be adapted, e.g. by reducing speed or by performing an action to reach an MRC. The tactical subtasks of performing the DDT, including DDT fallback, are allocated differently depending on the chosen concept: in the centralised concept, the tactical functions are allocated to the central element, while in the distributed concept they are allocated to each member.

Table 1. ODD for item: public highway vehicle platooning. Based on [6]

Dynamic elements — The item assumes that the composition of other vehicles (cars, trucks, other platoons, buses, emergency vehicles) is typical for a highway, e.g. supported by measurement, with e.g. a low occurrence of VRUs. Large animals (that cause damage to vehicles on collision) shall be anticipated, e.g. supported by institutional reports.

Scenery — The item is available from start to destination of trips on designated highways between large cities. The designated highways have been investigated to permit the system, e.g. supported by map data and measurements. At the start and destination there shall be a particular area for loading, checking and initiating the platooning. The highway, including the shoulder, shall be clear of vegetation. There shall be barriers on both sides of the highway. The function is available under dry, above-zero-degree conditions, supported by current weather data and forecasting. High friction is assumed. Lane markings shall be clearly visible. All sections of the highway shall have open sky (e.g. no underpasses).

Connectivity — The item assumes availability of: GPS, V2V and V2I communication, remote traffic management systems, and weather forecast data, as well as real-time information during operation on highway conditions, including friction and highway works.

Actions & events, other actors — Any action (e.g. supported by accident databases) involving actors (dynamic elements) on the highway (supported by e.g. institutional reports) shall be anticipated, supported by measurements from driving. The following list is not exhaustive, but gives some examples: vehicle changing lanes; vehicle cut-in; vehicle join/leave of the platoon; vehicle turning on the hazard lights; large animal crossing the highway; motorcycle driving between lanes in a queue.

Goals & values, permanent — All traffic rules, laws and policies, e.g. maximum speed limits, must be obeyed. In unobstructed conditions (e.g. no traffic) the minimum speed is 50 km/h.

Goals & values, transient — During ADS operation of the DDT, operator input is not allowed. Change of the mission objective is allowed. There may be other transient goals & values. These choices are supported by user/usage mapping and profiling.

Functional range — The platoon must have the minimum performance described in the internal platoon feature specification.

Fallback-ready user — The item is a level 4 ADS; hence a fallback-ready user is not assumed or required to perform DDT fallback. Vehicles (lead and following) may carry potential drivers (passengers) that could take over the DDT of an individual vehicle if permitted by the platoon ADS. These choices are supported by user/usage mapping and requirements.
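Such an ODD description can also be rendered in machine-readable form, so that run-time monitors (Sect. 5) can refer to it. The fragment below is a sketch under our own assumptions: the schema and field names are illustrative, not part of the template in [6]; the values paraphrase Table 1.

```python
# Illustrative machine-readable rendering of (part of) Table 1's ODD.
# Schema and field names are assumptions of this sketch, not from [6].
ODD_HIGHWAY_PLATOONING = {
    "scenery": {
        "road_type": "designated highway between large cities",
        "barriers_both_sides": True,
        "open_sky": True,  # no underpasses
        "lane_markings": "clearly visible",
        "weather": {"dry": True, "min_temperature_c": 0.0},
        "friction": "high",
    },
    "dynamic_elements": {
        "vehicle_mix": ["cars", "trucks", "other platoons", "buses",
                        "emergency vehicles"],
        "vru_occurrence": "low",
        "large_animals_anticipated": True,
    },
    "connectivity": ["GPS", "V2V", "V2I", "traffic management systems",
                     "weather forecast data", "real-time road condition"],
    "goals_permanent": {"obey_traffic_rules": True, "min_speed_kmh": 50},
    "automation": {"sae_level": 4, "fallback_ready_user_required": False},
}
```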

Gyllenhammar et al. [6] give four strategies for ensuring that the ODD is not exited. These are summarised, with examples, below:


– Inherent in the ADS feature definition. E.g. a restriction of the maximum speed as part of the definition.
– Checking the mission when accepting a strategic task. E.g. at the start of a platooning mission dispatch, a weather report states that there is bad weather (i.e. conditions outside the ODD) halfway along the route. In this case it is a (strategic) business decision whether or not the mission should be embarked upon at all. The (business) risk is that the platoon will need to reach an MRC halfway, e.g. that customers will be displeased that goods are delayed in transit. The triggering condition can be based on a "bad weather" sensor or on continual external weather reports. Note that a mission will only be embarked upon if there exists a safe MRC at every point along the route.
– Statically defined spatial and temporal triggering conditions. E.g. geographically defined limits of the feature.
– Run-time measurable triggering conditions related to operating conditions (a sketch follows this list). E.g. the snow intensity during usage has intensified gradually and has now reached the triggering condition at the border of the ODD. Before the ODD exit, the ADS prompts the passenger to take over as driver and perform the DDT, or the system performs DDT fallback to reach an MRC; see the example in Fig. 7 of SAE J3016 [13].
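The fourth strategy lends itself to a simple run-time monitor. The sketch below illustrates one possible shape of such a monitor, under our own assumptions: the condition fields, thresholds, units and actions are invented for illustration and are not taken from the paper or from any real platooning stack. The key point it encodes is that triggering conditions lie inside the ODD, so that adaptation (or reaching an MRC) happens before the ODD border is crossed.

```python
# Hypothetical run-time ODD-exit monitor (strategy four). Field names,
# thresholds, units and actions are invented for this sketch.
from dataclasses import dataclass
from enum import Enum, auto

class Action(Enum):
    CONTINUE = auto()   # well inside the ODD
    ADAPT = auto()      # triggering condition reached, e.g. reduce speed
    REACH_MRC = auto()  # ODD exit imminent: perform DDT fallback

@dataclass
class OperatingConditions:
    snow_intensity: float  # assumed unit: mm/h
    road_friction: float   # estimated friction coefficient
    v2v_link_ok: bool      # V2V communication available

# Triggering conditions lie inside the ODD so the ADS can act before
# the actual ODD border (the *_LIMIT values) is crossed.
SNOW_TRIGGER, SNOW_LIMIT = 2.0, 5.0
FRICTION_TRIGGER, FRICTION_LIMIT = 0.5, 0.3

def monitor_odd(c: OperatingConditions) -> Action:
    # Hard ODD limits: fall back to a minimal risk condition.
    if (c.snow_intensity >= SNOW_LIMIT or c.road_friction <= FRICTION_LIMIT
            or not c.v2v_link_ok):
        return Action.REACH_MRC
    # Triggering conditions approaching the ODD border: adapt operation.
    if c.snow_intensity >= SNOW_TRIGGER or c.road_friction <= FRICTION_TRIGGER:
        return Action.ADAPT
    return Action.CONTINUE

# Gradually intensifying snowfall reaches the triggering condition first.
print(monitor_odd(OperatingConditions(2.5, 0.7, True)))  # Action.ADAPT
```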

6  Risk Analysis Using the QRN Approach

The highway platoon item is analysed to find safety goals using the QRN approach instead of a HARA. A HARA centres around enumerating operational situations and then arguing the completeness of this enumeration; this is intractable for an ADS due to the complexity of the function. The QRN approach is described in [15], which gives an in-depth background and argumentation for this choice; the method is summarised in Sect. 2 (related work). The QRN approach essentially defines a budget of acceptable frequencies of incidents or accidents in specified consequence classes, i.e. an incident classification. All incidents in a class share the budget for that class. The consequence classes can be e.g. quality-related or safety-related. Completing the method generates safety goals that are used in an ISO 26262 process. We choose the incident classes in the following way: three classes for safety, νS1 to νS3, and three classes for quality-related incidents, νQ1 to νQ3. The safety classes imply risk of harm to humans, and the quality classes imply risk of harm to the brand (reputation or finances). Each norm class will contain contributions from different types of incidents leading to the same consequence. fνj,Ik denotes the frequency of incident Ik within consequence class νj, and fmax,νj denotes the total frequency budget for consequence class νj; for each class νj, the sum of fνj,Ik over all incidents k must thus not exceed fmax,νj. An example is given in Table 2. We make an incident classification for the highway platoon, see Fig. 6, and further develop selected leaves, see Table 3. From the incidents, safety goals can be formulated, e.g.:
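As a minimal illustration of the budget semantics just described, the following sketch checks that the incident frequencies allocated to each consequence class stay within that class's budget. The class budgets follow Table 2; the per-incident allocations are invented for illustration and are assumptions of this sketch only.

```python
# Minimal check of the QRN budget semantics: the frequencies allocated
# to incidents in a consequence class must sum to at most that class's
# budget. Per-incident allocations are invented for illustration.
FMAX = {"Q1": 1e-5, "Q2": 1e-7, "Q3": 1e-7,
        "S1": 1e-6, "S2": 1e-7, "S3": 1e-8}  # events per hour (Table 2)

ALLOCATION = {  # f_(nu_j, I_k): assumed frequency of incident I_k in class nu_j
    "S1": {"I2": 3e-7, "I3": 1e-7, "I5": 4e-8, "I6": 3e-8},
    "S3": {"I3": 4e-9, "I6": 4e-9},
}

for cls, incidents in ALLOCATION.items():
    total = sum(incidents.values())
    print(f"{cls}: {total:.1e} of {FMAX[cls]:.0e} events/h allocated "
          f"({total / FMAX[cls]:.0%})")
    assert total <= FMAX[cls], f"budget for class {cls} exceeded"
```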


Fig. 5. A simplified preliminary distributed functional architecture to implement a highway platooning ADS; based on [11]

– SG-I1: Avoid near-collision Platoon–VRU/Other with d < 1 m, within fI1
– SG-I5: Avoid collision Platoon–Non-platoon vehicle with Δv < 7 km/h, within fI5

We propose a preliminary functional architecture, see Fig. 5. It is based on the cooperative architecture described by Nilsson et al. in [11], but is adapted to include the notion of the three hierarchical layers of concern for performing highly automated driving. The strategic function layer would be allocated to the central traffic management system, but is omitted here. The "Situation threat and risk assessment" component performs the OEDR task (a subtask of the DDT). Overall, the tactical functions component shall fulfil the strategic mission. Tactical decisions are made cooperatively (distributed) by all participating cooperative vehicles. We note that some agreement protocol, such as [10], would be needed; this added complexity would be avoided in a centralised approach. This trade-off should be investigated in further work. Another functional architecture is proposed in the ENSEMBLE project [8]; it also uses the notion of separating concerns into different decision levels. A next step during development would be to refine the safety goals, derive functional safety requirements, and allocate these requirements to the components of the functional architecture. This would achieve a functional safety concept that fulfils the requirements of the highway platooning item.

Table 2. An example of the total frequency budget for each consequence class (events per hour)

Q1: brand image loss (small) — 1 × 10−5
Q2: brand image loss (large) — 1 × 10−7
Q3: financial loss — 1 × 10−7
S1: light to moderate injuries — 1 × 10−6
S2: severe injuries — 1 × 10−7
S3: life-threatening injuries — 1 × 10−8

Table 3. Selection of incidents and corresponding safety goals (incident, specification, consequence class in the risk norm)

Platoon–VRU/Other:
  [I1] d < 1 m: 90% Q1, 10% Q2
  [I2] Δv < 7 km/h: 60% S1, 40% S2
  [I3] Δv ≥ 7 km/h: 33% S1, 33% S2, 34% S3

Platoon–Non-platoon vehicle:
  [I4] d < 1 m: 90% Q1, 10% Q2
  [I5] Δv < 7 km/h: 40% S1, 20% S2, 40% Q3
  [I6] Δv ≥ 7 km/h: 30% S1, 25% S2, 45% S3
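The consequence splits in Table 3, together with the class budgets in Table 2, bound the frequency that any single safety goal may permit. Under our reading that a share s of incident Ik's occurrences falls into class c, so that s · fIk must not exceed fmax,c even before the class budgets are shared with other incidents, an upper bound on fIk follows directly. The sketch below works this out for I5 and I6; it is an illustration of our interpretation, not a calculation from the paper.

```python
# Upper bound on the safety-goal frequency f_Ik implied by Tables 2 and 3,
# assuming a share s of incident I_k's occurrences falls into class c,
# so s * f_Ik may not exceed f_max,c. Budget sharing between incidents
# would tighten these bounds further.
FMAX = {"Q1": 1e-5, "Q2": 1e-7, "Q3": 1e-7,
        "S1": 1e-6, "S2": 1e-7, "S3": 1e-8}  # events per hour (Table 2)

SPLITS = {  # consequence-class shares per incident (Table 3)
    "I5": {"S1": 0.40, "S2": 0.20, "Q3": 0.40},
    "I6": {"S1": 0.30, "S2": 0.25, "S3": 0.45},
}

for incident, shares in SPLITS.items():
    bound = min(FMAX[c] / s for c, s in shares.items())
    print(f"f_{incident} <= {bound:.1e} events/h")
# f_I5 <= 2.5e-07 events/h (limited by Q3)
# f_I6 <= 2.2e-08 events/h (limited by S3)
```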

Fig. 6. Incident classification for the highway platoon item

7  Conclusions and Future Work

We have presented, discussed and analysed an automated and cooperative system for highway platooning. As background, a conceptual ADS functional architecture was presented, and we showed how the tactical functions can be allocated in either a centralised or a distributed version of the architecture. An item definition, with a description of the operational design domain, was presented for the platooning system based on the distributed concept, together with a preliminary architecture for a functional safety concept of the item. Finally, we presented a risk analysis of the item according to a novel method using a risk norm and an incident classification instead of a traditional HARA. Safety goals are found through the QRN approach; these state a maximum frequency of occurrence, rather than a mainly qualitative integrity target as in ISO 26262. This could require a new development framework to replace or amend ISO 26262 for ADS. The paper lacks a validation section in which, e.g., the architectural choices are compared with respect to their relative cost, resilience, and other merits and drawbacks; we propose this as future work. The paper has also given little attention to the solution with centralised tactical control; this should be described and compared with the solution with distributed tactical control.

Acknowledgment. This work is supported by VINNOVA via the FFI ESPLANADE project (see www.esplanade-project.se, ref. 2016-04268). Thanks to Dr. Rolf Johansson and Dr. Fredrik Warg for valuable discussions, and to Dr. Negin Nejad for the review.

References

1. Axelsson, J.: Safety in vehicle platooning: a systematic literature review. IEEE Trans. Intell. Transp. Syst. 18(5), 1033–1045 (2016)
2. Behere, S., Torngren, M.: A functional architecture for autonomous driving. In: 2015 First International Workshop on Automotive Software Architecture (WASA), pp. 3–10. IEEE (2015)
3. Cassel, A., et al.: On perception safety requirements and multi sensor systems for automated driving systems. SAE Technical Paper (2020)
4. Dajsuren, Y., Loupias, G.: Safety analysis method for cooperative driving systems. In: 2019 IEEE International Conference on Software Architecture (ICSA), pp. 181–190. IEEE (2019)
5. Dhurjati, P., Mengani, L.: D2.10 Iterative process document and item definition (2018)
6. Gyllenhammar, M., et al.: Towards an operational design domain that supports the safety argumentation of an automated driving system. In: Proceedings of the 10th European Congress on Embedded Real Time Systems (ERTS), Toulouse, France (January 2020)
7. ISO: ISO 26262:2018 Road vehicles – Functional safety (2018)
8. Konstantinopoulou, L., Coda, A., et al.: D2.4 Functional specification for white-label truck (2018)
9. Mengani, L., Dhurjati, P.: D2.11 First version hazard analysis and risk assessment and functional safety concept (2019)
10. Morales-Ponce, O., Schiller, E.M., Falcone, P.: Cooperation with disagreement correction in the presence of communication failures. In: 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), pp. 1105–1110. IEEE (2014)
11. Nilsson, J., Bergenhem, C., Jacobson, J., Johansson, R., Vinter, J.: Functional safety for cooperative systems. SAE Technical Paper (2013)
12. Pelliccione, P., et al.: Beyond connected cars: a systems of systems perspective. Sci. Comput. Program. 191, 102414 (2020)
13. SAE: SAE J3016 – Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles (2018)
14. Vissers, J., et al.: D2.2 Platooning use-cases, scenario definition and platooning levels (2018)
15. Warg, F., et al.: The quantitative risk norm – a proposed tailoring of HARA for ADS. In: 2020 Annual IEEE/IFIP International Conference on Dependable Systems and Networks, SSIV Workshop (2020)

Author Index

Afanou, Sitou 20
Ambellouis, Sébastien 20
Aranha, Helder 143
Baasch, Benjamin 78
Banić, Milan 44
Bergenhem, Carl 200
Borghi, Guido 56
Boukour, Fouzia 20
Boulineau, Jean François 5
Bozóki, Szilárd 185
Butakova, Maria 33
Calderara, Simone 56
Chernov, Andrey 33
Conrad, Mirko 127
Cucchiara, Rita 56
Cutrera, Gianluca 68
Fattouh, Anas 113
Fedeli, Eugenio 56
Franke, Marten 44
Franzè, Giuseppe 68
Gasparini, Riccardo 56
Gorbenko, Anatoliy 168
Guda, Alexander 33
Haseeb, Muhammad Abdul 44
Jamshidi, Helia 99
Javed, Muhammad Atif 113
Keefe, Kenneth 159
Kocsis, Imre 185
Kovács, Benedek 185
Kruspe, Anna 78
Lee, Wan-Jui 99
Lollini, Paolo 159
Majdandzic, Mario 200
Marmo, Roberto 90
Marteau, Tony 20
Masi, Massimiliano 143
Mirzaei, Elham 127
Moncini, Federico 159
Montecchi, Leonardo 159
Muram, Faiz Ul 113
Nappi, Roberto 68
Niebling, Julia 78
Pataricza, András 185
Pavleska, Tanja 143
Pethő, Dániel 185
Pini, Stefano 56
Punnekkat, Sasikumar 113
Ristić-Durrant, Danijela 44
Roijers, Diederik M. 99
Romanovsky, Alexander 168
Scaglione, Giuseppe 56
Sellitto, Giovanni Paolo 143
Shevchuk, Petr 33
Simonović, Miloš 44
Sodoyer, David 20
Stamenković, Dušan 44
Striano, Valerio 68
Suskovics, Péter 185
Szalontai, Jenő 185
Tarasyuk, Olga 168
Thomas, Carsten 127
Ursing, Stig 200
Vigliotti, Antonio 68