International Congress and Workshop on Industrial AI and eMaintenance 2023 (Lecture Notes in Mechanical Engineering). ISBN 3031396189, 9783031396182

This proceedings volume brings together the papers presented at the International Congress and Workshop on Industrial AI and eMaintenance 2023.


English · 813 [780] pages · 2024



Table of contents:
Editorial
Contents
A Vision-Based Neural Networks Model for Turbine Trench-Filler Diagnosis
1 Introduction
2 Literature Review
3 Materials and Methods
3.1 Trench-Filler in Turbines
3.2 YOLO
3.3 Labelme
3.4 Roboflow
4 Methodology
4.1 Image Acquisition System
4.2 Trench-Filler Non-conformities
4.3 Image Dataset
4.4 Image Segmentation and Annotation
4.5 Model Training
5 Experimental Results
6 Conclusion
References
Use Cases of Generative AI in Asset Management of Railways
1 Introduction
2 State-of-Art
2.1 History of Generative Algorithms
2.2 Generative AI Models
2.3 Application Areas of Generative AI (GAI)
2.4 GAI in Asset Management of Railways
3 Use Cases of GAI in the Asset Management of Railways
3.1 Context Aware Analytics
3.2 Historical Data and Condition Data
3.3 Improved Data Quality
3.4 Natural Language Processing Asset Management Data
3.5 Fleet to Individual and Individual to Fleet Knowledge Transfer
3.6 No Fault Found
3.7 Configuration Management
3.8 Life Cycle Management in a Cross-Organisational Operation and Maintenance Environment
3.9 Maintenance Policy
4 Conclusions
References
A Neuroergonomics Mirror-Based Platform to Mitigate Cognitive Impairments in Fighter Pilots
1 Introduction
2 Cognitive Profile
3 Training Needs
4 The Perceptual Cycle
5 Practical Adaptation
5.1 Case Report
5.2 Mirror-Based Warning System
6 Platform
7 Conclusions
References
Risk-Based Safety Improvements in Railway Asset Management
1 Introduction
2 Method and Material
3 Results
4 Discussion
References
Performance of Reinforcement Learning in Molecular Dynamics Simulations: A Case Study of Hydrocarbon Dynamics
1 Introduction
2 Chemical Simulation Details
3 RL Learning Details
3.1 Deep Reinforcement Learning
3.2 Recurrent Proximal Policy Optimization
4 RL in Chemistry
5 RL Environment Details
5.1 Reward Functions
6 Results
7 Conclusion
References
Causal Effects of Railway Track Maintenance—An Experimental Case Study of Tamping
1 Introduction
1.1 Tamping and Track Geometry
2 The Experiment
2.1 Experimental Design and Analysis
3 Onboard Measurement System
3.1 Data Pre-processing and Cleaning
4 Results
4.1 StdevLL Trend on Tamped Track Segments
4.2 StdevLL Trend on Untamped Segments
4.3 Welch’s Two-Sample t-Test
4.4 Effect on Lateral Alignment
5 Generalized Experimental Approach
6 Conclusions and Discussion
References
Self-driving Cars in the Arctic Environment
1 Introduction
2 Working of Self-driving Car
3 Arctic Circle
4 Key Challenges of Implementing Self-driving Cars in the Arctic
5 Technologies Used in Self-driving Car
5.1 Normal Weather Condition
5.2 Arctic Condition
6 The Future of Arctic Transportation with Self-driving Cars
7 Conclusion
References
Towards a Railway Infrastructure Digital Twin Framework for African Railway Lifecycle Management
1 Introduction
2 Digital Twin Overview for African Railway Infrastructure
3 South African Railway
4 Railway Infrastructure Information
4.1 Transverse Profile
4.2 Longitudinal Railway Profile
5 Available Information
5.1 Track Geometry Measurements
5.2 Non-destructive Testing
5.3 Instrumented Monitoring
5.4 Remote Sensing Techniques
5.5 Summary
6 Digital Twins for Railway Infrastructure
6.1 Hardware
6.2 Middleware
6.3 Software
7 Conclusions
References
Climate Change Impacts on Mining Value Chain: A Systematic Literature Review
1 Introduction
2 SLR Methodology for CCIMO
3 Answer to RQ1: Impacts
4 Answer to RQ2: Model/Approaches and Strategies
5 Discussion and Conclusion
References
Systematic Dependability Improvements Within Railway Asset Management
1 Introduction
2 Method and Material
3 Results
3.1 Step 1–FMEA
3.2 Step 2–Planning of Field Experiment
3.3 Step 3–Execution of Verification Field-Test
4 Discussion
References
A Conceptual Model for AI-Enabled Digitalization of Construction Site Management Decision Making
1 Introduction
2 State of the Art
2.1 Construction Site Managers
2.2 Decision-Making on Construction Site
2.3 AI and Digitalization as a Decision Support Tool
3 Research Methodology
4 Results and Analysis
4.1 Identified Issues and Challenges in Decision-Making (Results)
4.2 Proposed Conceptual Models (Analysis)
5 Conclusion
References
Risk-Based Dependability Assessment of Digitalised Condition-Based Maintenance in Railway—Part II–Event Tree Analysis (ETA)
1 Introduction
2 Method and Material
3 Results
3.1 Event Tree Analysis (ETA)
3.2 Third Maintenance Significant Item Selection
4 Discussion
References
Making Tracks—Combining Data Sources in Support of Railway Fault Finding
1 Introduction
2 Trials Aims
3 Terminology
4 Trials Conduct
4.1 Preparations
4.2 Data Collection and Correlation
4.3 Data Analysis
5 Discussion
6 Conclusions
7 The Way Ahead
References
Simulation Environment Evaluating AI Algorithms for Search Missions Using Drone Swarms
1 Introduction
1.1 Problem Formulation
2 Background
2.1 Unmanned Aerial Vehicles
2.2 Drone Swarm Communication
2.3 Bayes Search Theory
2.4 Lawn Mower Search
2.5 Local Search Hill Climbing
2.6 A-Star Search Algorithm
3 Related Work
4 Method
5 Results
6 Discussion
7 Conclusion
References
Risk-Based Dependability Assessment of Digitalised Condition-Based Maintenance in Railway Part I–FMECA
1 Introduction
2 Method and Material
3 Results
3.1 Stakeholder Requirements
3.2 First MSI Selection
3.3 FMECA
3.4 Second MSI Selection
4 Discussion
References
Climate Zone Reliability Analysis of Railway Assets
1 Introduction
2 Climate Zones Classification Systems
3 Sweden’s Climate
4 Impact of Climate on Railway Infrastructure
5 Methodology
5.1 Data Gathering and Pre-processing
5.2 Clustering According to Climate Zones
5.3 Reliability Analysis and Discussion
6 Conclusion
References
Remaining Useful Life Estimation for Anti-friction Bearing Prognosis Based on Envelope Spectrum and Variational Autoencoder
1 Introduction
2 Experimental Data [10]
3 Methodology
3.1 Envelope Analysis for Fault Detection
3.2 Spectral Variation During Degradation
3.3 Ground Truth RUL
3.4 VAE for RUL Prediction
4 Experiments Results
4.1 Network Training
4.2 Model Performance on the Training Set
4.3 Model Performance on the Testing Set
5 Conclusion
References
Artificial Intelligence in Predictive Maintenance: A Systematic Literature Review on Review Papers
1 Introduction
2 Method
2.1 Research Questions
2.2 Systematic Searches
2.3 Criteria
3 Quality Assessment Criterion of Secondary Studies
3.1 Documentation of Scoring Criteria
4 Results
4.1 Search Results
4.2 Quality Evaluation of Reviews
5 Discussions
6 Conclusions
References
Using a Drone Swarm/Team for Safety, Security and Protection Against Unauthorized Drones
1 Introduction
2 Background
3 Unmanned, Autonomous Drones in a Security Context
3.1 Scenario
4 Distributed Decision Making and Task Allocation
4.1 Task Allocation and Execution
4.2 Distributed Decision Making
5 Implementation
5.1 Training Framework for Cooperative Tasks
5.2 Training Framework for Competitive Task
6 Simulation and Evaluation
6.1 Architecture
6.2 Evaluation
6.3 Results
7 Conclusions and Future Work
References
Design, Development and Field Trial Evaluation of an Intelligent Asset Management System for Railway Networks
1 Introduction
2 The Intelligent Asset Management Platform
3 The IAMS Platform ML Pipeline
4 Numerical Results
5 Conclusions
References
Analysing Maintenance and Renewal Decision of Sealed Roads at City Council in Australia
1 Introduction
2 Methodology
3 Results
4 Discussion
5 Conclusion
References
Issues and Challenges in Implementing the Metaverse in the Industrial Contexts from a Human-System Interaction Perspective
1 Introduction
2 Human-System-Interaction (HSI)
3 Metaverse
3.1 Emerging Technology
3.2 Architecture of the Metaverse
4 Human-System Interaction in Metaverse
4.1 Interaction with the Virtual World
4.2 Interaction with the Physical World
5 Applications of HSI in the Metaverse in Industrial Aspects
6 Issues and Challenges
6.1 Technical Challenges
6.2 Organizational Challenges
6.3 Ergonomical Challenges
6.4 Economical Challenges
7 Conclusions
References
Development of a Biologically Inspired Condition Management System for Equipment
1 Introduction
2 Proposed Framework
2.1 Structural Attributes
2.2 Environmental Attributes
2.3 Operational Attributes
2.4 Primary Component (“Old Brain”)
2.5 Secondary Component (“New Brain”)
2.6 Arbitrator Component
3 Conclusions
References
Application of Autoencoder for Control Valve Predictive Analytics
1 Introduction
2 Related Work
3 Workflow
3.1 Applying Autoencoder to Valve Predictive Analytics
3.2 Determination of Anomaly Threshold
4 Results
5 Discussion
6 Conclusion
7 Future Work
References
LCC Based Requirement Specification for Railway Track System
1 Introduction
2 Method
3 System Description
4 Results and Discussions
5 Conclusions
References
Pre-processing of Track Geometry Measurements: A Comparative Case Study
1 Introduction
2 Methodology
2.1 Modified Correlation Optimised Warping
2.2 Recursive Segment-Wise Peak Alignment
2.3 Evaluation Methods
3 Case Study
4 Results and Discussion
5 Conclusion
References
Wind Turbine Blade Surface Defect Detection Based on YOLO Algorithm
1 Introduction
2 Methodology
3 Experiment
3.1 Dataset
3.2 Experimental Configuration
3.3 Evaluation Metrics
4 Result
5 Conclusion
References
Cooperative Search and Rescue with Drone Swarm
1 Introduction
2 Theoretical Background
2.1 Potential Fields
2.2 Search Strategies
3 Search and Rescue Scenario
4 Method and Material
5 Results and Discussion
6 Conclusion
References
Domain Knowledge Regularised Fault Detection
1 Introduction
1.1 Unsupervised Fault Detection in Condition Monitoring
1.2 Including Domain Knowledge in Unsupervised Data-Driven Methods
1.3 Prior Work
1.4 Overview
2 Domain Knowledge Regularisation for an Auto-encoder
2.1 Problem Definition
2.2 Auto-encoder Definition
2.3 Proposed Regularisation Scheme
2.4 Evaluation Metrics
3 Evaluation
3.1 Datasets and Data Preparation
3.2 Model Definition and Training
3.3 Model Evaluation
4 Adding Domain Knowledge
4.1 Envelope Spectrum Frequency Data
4.2 Engineering Feature Data
5 Results
6 Conclusions and Future Work
References
HFedRF: Horizontal Federated Random Forest
1 Introduction
2 Related Work
2.1 Federated Learning (FL)
2.2 Random Forest (RF) with FL
2.3 Tree-Merging Algorithm
3 Proposed Work
3.1 Example to Challenge the Lossless Property of the Tree Merging Algorithm
3.2 Horizontal Federated Random Forest (HFedRF)
3.3 Model Building
3.4 Proposed Scheme for HFL with Random Forest
4 Experimental Results
4.1 Dataset Used
4.2 IIDs and Non IIDs Partition
4.3 Building Random Forest and Aggregation
4.4 Result Analysis
5 Conclusion and Future Work
References
Rail Surface Defect Detection and Severity Analysis Using CNNs on Camera and Axle Box Acceleration Data
1 Introduction
2 Data Collection and Description
2.1 ABA Preprocessing
2.2 Image Preprocessing
3 CNN for Image and ABA Analysis
3.1 ABA Classifier
3.2 Image Classifier
4 Results and Discussion
4.1 ABA Classifier
4.2 Image Classifier
5 Conclusion
References
A Testbed for Smart Maintenance Technologies
1 Introduction
2 Methodology
2.1 Case Study
2.2 Data Collection
2.3 Data Analysis
2.4 Testbed Development
3 Smart Maintenance Technologies
4 Empirical Findings
5 A Testbed for Smart Maintenance Technologies
6 Discussions and Conclusions
7 Further Research
References
Game Theory and Cyber Kill Chain: A Strategic Approach to Cybersecurity
1 Introduction
2 Research Methodology
3 Game Theory for Cyber Kill Chain
4 Strategic Game Model
5 Case Study
6 Conclusion and Future Research
References
On the Need for Human Centric Maintenance Technologies
1 Introduction
2 Smart Maintenance
3 Human Errors
3.1 Operations
3.2 Autonomous Maintenance
3.3 Professional Maintenance
3.4 Human Factors Analysis
4 Humans in Industry 4.0
5 Root Cause Failure Analysis
6 Discussion
References
A Systematic Study of the Effect of Signal Alignment in Information Extraction from Railway Infrastructure Recording Vehicle Data
1 Introduction
2 Required Background
2.1 Track Geometry Data
2.2 Spatial Accuracy of Data
2.3 Aligning Data Sets Between Different Measurement Campaigns
2.4 Features Characterising the Evolution of a Double Slack in the Track Geometry
2.5 Stretching of Time Series Data
3 Conclusion
References
Wheel Damage Prediction Using Wayside Detector Data for a Cross-Border Operating Fleet with Irregular Detector Passage Patterns
1 Introduction
2 Problem Outline
3 Measurement Characteristics
3.1 Data Normalisation
4 Case Study
5 Discussion and Conclusions
References
Predictive Maintenance and Operations in Railway Systems
1 Introduction
2 Literature Review
3 Project Plan and Methods
3.1 Predictive Maintenance in Railway V-T Systems
3.2 Infrastructure-Oriented PMO
3.3 Operator-Oriented PMO
4 Conclusions
References
Experimental Setup for Non-stationary Condition Monitoring of Independent Cart Systems
1 Introduction
2 System Description and Experimental Setup
2.1 Extended Transport System (XTS)
2.2 Experimental Setup
3 Data Acquisition
4 Raw Data and Preliminary Results
5 Conclusion
6 Future Work
References
Hazardous Object Detection in Bulk Material Transport Using Video Stream Processing
1 Introduction
2 Methodology
2.1 Frame Pre-processing
2.2 Initial Classification Step
2.3 Second Classification Step
2.4 Testing and Validation
3 Results and Discussion
3.1 First Classification Step
3.2 Second Classification Step
4 Conclusions and Recommendations
References
Rotor and Bearing Fault Classification of Rotating Machinery Using Extracted Features from Experimental Vibration Data and Machine Learning Approach
1 Introduction
2 Machine Learning (ML) Model Approach [25]
3 Laboratory Rig
4 Analysis of Measured Data
5 Data Preparation and Application of the ML Model
6 Results
7 Concluding Remarks
References
Are We There Yet?—Looking at the Progress of Digitalization in Maintenance Based on Interview Studies Within the Swedish Maintenance Ecosystem
1 Introduction
2 Method
3 Results
3.1 Enabling Technologies in Maintenance
3.2 Digital Challenges in Maintenance
3.3 Facilitating the Digital Transformation
3.4 Where Are We Headed?
4 Conclusions
References
Integrated Enterprise Risk Management and Industrial Artificial Intelligence in Railway
1 Introduction
2 Theoretical Frame of Reference
2.1 Conceptual Framework of Reliability-Centred Maintenance (RCM)
2.2 Concept of Triple-Loop Learning
3 Method and Material
4 Levels of IAI in Railway Risk Management
5 Risk-Related Methodologies and Tools for IAI Applications
6 Discussion and Conclusions
References
Digital Twin: Definitions, Classification, and Maturity
1 Introduction
2 Definitions from Standards
2.1 Digital Twin Components
3 Classification of Digital Twin
3.1 Application Area
3.2 Hierarchical Level
3.3 Interaction Devices
3.4 Data Sources
3.5 Integration
3.6 Level of Autonomy
3.7 DT Creation Time
3.8 Life Cycle Phase
3.9 Cognitive Type
3.10 Collaboration Type
4 Digital Twin Maturity
4.1 Atkins
4.2 Cognitive DT Maturity
4.3 DT Maturity in Analytics
4.4 DT Maturity in Dimensions
5 Conclusion
References
The Importance of Using Domain Knowledge When Designing and Implementing Data-Driven Decision Models for Maintenance: Insights from Industrial Cases
1 Introduction
2 Theoretical Reasoning
3 Industrial Cases
3.1 Understanding the Problem
3.2 Understanding the System
3.3 Understanding the Physical Parameters
3.4 Understanding Variability
4 Discussion and Conclusions
References
Point Cloud Data Augmentation for Linear Assets
1 Introduction
2 Related Work
2.1 Image Data Augmentation
2.2 Point Cloud Data Augmentation
3 Data Augmentation for Linear Assets
4 Conclusion
References
Selection of Track Solution in Railway Tunnel: Aspect of Greenhouse Gas Emission
1 Introduction
2 Method
2.1 Study Approach
2.2 Data Collection
2.3 Computation of CO2 Equivalent Emission
3 Use Case
3.1 Description of the Line
3.2 Characteristics of the Tunnel
3.3 Track Solutions
3.4 Service Life of the System
3.5 System Boundary
4 Results and Discussion
5 Conclusions
References
Intelligence Based Condition Monitoring Model
1 Introduction
2 Literature Review
3 Methodology
3.1 Machinery Fault Database
3.2 Data Pre-processing
3.3 Model Training
3.4 Model Stacking
4 Results
5 Conclusion
References
Enhancing the Effectiveness of Neural Networks in Predicting Railway Track Degradation
1 Introduction
2 Review of the Literature
3 Neural Network Methods
3.1 FF-ANN
3.2 RNN
4 Hyperparameter Tuning
4.1 Bayesian Optimization
5 Results and Discussion
5.1 Data Preparation
5.2 Application and Evaluation of ANNs
6 Conclusion
References
Process Reliability Analysis Applied for Continual Improvement of Large-Scale Alumina Refineries
1 Introduction
1.1 What Is Alumina Refinery
1.2 Identification and Reduction of Performance Gaps
1.3 Isograph’s Process Reliability Module
2 Methodology
3 Results and Discussion
4 Conclusions
References
Cyber-Physical Asset Management of Air Vehicle System
1 Introduction
2 AM Needs and Challenges
3 IVHM and Health Awareness
4 Data-Driven Models and Digital Twins for AM
5 AM Information Exchange
6 System Modelling for AM
7 An AM Solution for Aviation
8 Discussion and Results
9 Conclusions
References
A Case Study on Ontology Development for AI Based Decision Systems in Industry
1 Introduction
2 Overview and Related Work
3 Methodology
4 Results and Evaluations
5 Discussion and Conclusion
References
System Innovation Challenges for Climate Adaptation
1 Introduction
1.1 The Swedish Railway
1.2 Theoretical Framework
2 Research Methodology
3 Results
3.1 Business Model and Procurement Process
3.2 Policy and Regulations
3.3 Technology, Product, and Process and Infrastructure
3.4 Behaviour, Culture, and Values
4 Discussion and Conclusions
References
Data-Driven Predictive Maintenance: A Paper Making Case
1 Introduction
2 Related Work
3 Domain Knowledge
4 Methodology
4.1 Data Acquisition
4.2 Health Indicator Construction
4.3 Health Stage Division
5 Results
6 Conclusions and Future Works
References
Dependability Management Framework and System Model for Railway Improvements
1 Introduction
2 Method and Material
3 Results
3.1 Railway Dependability Mgt. Framework
3.2 Rail System Model
4 Discussion and Conclusions
References
Technology and the Future of Maintenance
1 Introduction
2 Maintenance Management
3 Industry 4.0: An Overview
4 Industry 4.0 to Industry 5.0: An Overview
5 The Impact of I4.0 and I5.0 on Maintenance Performance
6 Conclusion
7 Discussion
7.1 Terminology
7.2 Scalability
7.3 The Unknown
References
Generic Smart Rotor Fault Diagnosis Model with Normalised Vibration Parameters
1 Introduction
2 Experimental Rig and Data
3 Early Model Developed [10–12]
3.1 2-Steps Application of Smart Fault Detection Model
4 Optimised Parameters
5 Results
5.1 Step-1: Fault Detection
5.2 Step-2: Fault Diagnosis
6 Concluding Remarks
References
Risk Assessment of Climate Change Impacts on Railway Infrastructure Asset
1 Introduction
2 Research Methodology
2.1 Step 1—Collecting and Pre-processing of Infrastructure and Weather Data
2.2 Step 2: Asset Climate Failure Mapping
2.3 Step 3: Risk Assessment
2.4 Step 4: Remedy Actions
3 Framework Validation
3.1 Risk of High/Low Temperature to Railway
4 Conclusion
References
Quality Assurance in Flow Through Oil and Gas Pipelines
1 Introduction
2 Methodology
3 Results
4 Conclusion
References

Lecture Notes in Mechanical Engineering

Uday Kumar · Ramin Karim · Diego Galar · Ravdeep Kour (Editors)

International Congress and Workshop on Industrial AI and eMaintenance 2023

Lecture Notes in Mechanical Engineering

Series Editors:
Fakher Chaari, National School of Engineers, University of Sfax, Sfax, Tunisia
Francesco Gherardini, Dipartimento di Ingegneria “Enzo Ferrari”, Università di Modena e Reggio Emilia, Modena, Italy
Vitalii Ivanov, Department of Manufacturing Engineering, Machines and Tools, Sumy State University, Sumy, Ukraine
Mohamed Haddar, National School of Engineers of Sfax (ENIS), Sfax, Tunisia

Editorial Board:
Francisco Cavas-Martínez, Departamento de Estructuras, Construcción y Expresión Gráfica, Universidad Politécnica de Cartagena, Cartagena, Murcia, Spain
Francesca di Mare, Institute of Energy Technology, Ruhr-Universität Bochum, Bochum, Nordrhein-Westfalen, Germany
Young W. Kwon, Department of Manufacturing Engineering and Aerospace Engineering, Graduate School of Engineering and Applied Science, Monterey, CA, USA
Justyna Trojanowska, Poznan University of Technology, Poznan, Poland
Jinyang Xu, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China

Lecture Notes in Mechanical Engineering (LNME) publishes the latest developments in Mechanical Engineering—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNME. Volumes published in LNME embrace all aspects, subfields and new challenges of mechanical engineering.

To submit a proposal or request further information, please contact the Springer Editor of your location:
Europe, USA, Africa: Leontina Di Cecco at [email protected]
China: Ella Zhang at [email protected]
India: Priya Vyas at [email protected]
Rest of Asia, Australia, New Zealand: Swati Meherishi at [email protected]

Topics in the series include:
- Engineering Design
- Machinery and Machine Elements
- Mechanical Structures and Stress Analysis
- Automotive Engineering
- Engine Technology
- Aerospace Technology and Astronautics
- Nanotechnology and Microengineering
- Control, Robotics, Mechatronics
- MEMS
- Theoretical and Applied Mechanics
- Dynamical Systems, Control
- Fluid Mechanics
- Engineering Thermodynamics, Heat and Mass Transfer
- Manufacturing
- Precision Engineering, Instrumentation, Measurement
- Materials Engineering
- Tribology and Surface Technology

Indexed by SCOPUS, EI Compendex, and INSPEC. All books published in the series are evaluated by Web of Science for the Conference Proceedings Citation Index (CPCI). To submit a proposal for a monograph, please check our Springer Tracts in Mechanical Engineering at https://link.springer.com/bookseries/11693.

Uday Kumar · Ramin Karim · Diego Galar · Ravdeep Kour (Editors)

International Congress and Workshop on Industrial AI and eMaintenance 2023

Editors Uday Kumar Division of Operation and Maintenance Engineering Luleå University of Technology Luleå, Norrbottens Län, Sweden

Ramin Karim Division of Operation and Maintenance Engineering Luleå University of Technology Luleå, Norrbottens Län, Sweden

Diego Galar Division of Operation and Maintenance Engineering Luleå University of Technology Luleå, Norrbottens Län, Sweden

Ravdeep Kour Division of Operation and Maintenance Engineering Luleå University of Technology Luleå, Norrbottens Län, Sweden

ISSN 2195-4356 · ISSN 2195-4364 (electronic)
Lecture Notes in Mechanical Engineering
ISBN 978-3-031-39618-2 · ISBN 978-3-031-39619-9 (eBook)
https://doi.org/10.1007/978-3-031-39619-9

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024. This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland. Paper in this product is recyclable.

Editorial

The Power of IAI and eMaintenance

Industrial Artificial Intelligence (IAI) is revolutionizing the industrial landscape by bringing automation and optimization to various processes. IAI is an umbrella term encompassing a range of technologies adapted to industrial contexts, including business, operations, maintenance, and asset management. It is commonly used in the context of industrial processes and applications, leveraging machine learning algorithms, predictive analytics, and other AI tools to automate and optimize various industrial processes.

One particular area where IAI shines is in asset management through eMaintenance. By combining data-driven and model-driven methodologies in a hybrid analytic approach, eMaintenance enables fact-based decision-making. The integration of IAI in eMaintenance has the potential to significantly enhance overall performance efficiency and effectiveness for organisations.

The International Workshop and Congress on Industrial AI and eMaintenance has been at the forefront of exploring the possibilities and challenges of implementing AI in industries. Since 2010, six successful events have taken place, and now, after the global pandemic, the seventh event is being conducted onsite. This event aims to shed light on the opportunities and challenges associated with AI implementation, with a particular focus on the operation and maintenance of industrial assets and transport infrastructure. The event has received excellent support from both industry and academia in terms of the number of technical papers and participants. By fostering a collaborative environment, this biannual platform facilitates the sharing of best practices, success stories, and innovative approaches.

As IAI and eMaintenance solutions encompass a fusion of diverse technologies and methodologies, the congress addresses challenges and issues from a wide range of disciplines related to eMaintenance. The purpose and theme of the congress is to provide a timely review of research efforts on the topic, covering both fundamental and applied research that contributes towards understanding the strategic role of Industrial AI and eMaintenance in asset management and the performance of operation and maintenance of complex systems.

The presentations and papers included in these proceedings cover all the areas related and relevant to the main themes of the congress, as listed below:

- Industrial AI
- eMaintenance
- Intelligent Asset Management
- Industry 4.0
- Digitalisation
- Digital Twin
- Industrial AI & Machine Learning
- Cloud Computing
- Diagnostics & Prognostics
- Condition Monitoring
- Condition-based Maintenance
- Prognostics & Health Management
- RAMS
- Life Cycle Costing & Life Cycle Profit
- Human-Machine-Interface (HMI) & Human-System-Interface (HSI)
- Industrial Wearables
- Cybersecurity & Blockchain
- Data Quality, Information Quality & Quality-of-Service
- System Resilience
- Autonomous Maintenance Analytics
- Climate Adaptation and Climate Resilient Infrastructure

The scope of IAI2023 encompasses and integrates the themes and topics of three conferences, namely Industrial AI & eMaintenance, Condition Monitoring and Diagnostic Engineering Management (COMADEM), and Advances in Reliability, Maintainability and Supportability (ARMS), on a single platform.

We thank all the authors for their contributions and the reviewers for their support. We would also like to thank all the members of the International Advisory Committee and the Programme and Organising Committees for their active support.

Uday Kumar
Ramin Karim
Diego Galar
Ravdeep Kour

Contents

A Vision-Based Neural Networks Model for Turbine Trench-Filler Diagnosis . . . 1
Cesar Isaza, Fernando Guerrero-Garcia, Karina Anaya, Kouroush Jenab, and Jorge Ortega-Moody

Use Cases of Generative AI in Asset Management of Railways . . . 15
Jaya Kumari and Ramin Karim

A Neuroergonomics Mirror-Based Platform to Mitigate Cognitive Impairments in Fighter Pilots . . . 31
Angelo Compierchio, Phillip Tretten, and Prasanna Illankoon

Risk-Based Safety Improvements in Railway Asset Management . . . 45
Peter Söderholm and Lars Wikberg

Performance of Reinforcement Learning in Molecular Dynamics Simulations: A Case Study of Hydrocarbon Dynamics . . . 61
Richard Bellizzi, Christopher Hixenbaugh, Marvin Tim Hoffman, and Alfa Heryudono

Causal Effects of Railway Track Maintenance—An Experimental Case Study of Tamping . . . 75
Erik Vanhatalo, Bjarne Bergquist, Iman Arasteh-Khouy, and Dan Larsson

Self-driving Cars in the Arctic Environment . . . 89
Aqsa Rahim, Javad Barabady, and Fuqing Yuan

Towards a Railway Infrastructure Digital Twin Framework for African Railway Lifecycle Management . . . 101
Daniel N. Wilke, Daniel Fourie, and Petrus Johannes Gräbe

Climate Change Impacts on Mining Value Chain: A Systematic Literature Review . . . 115
Ali Nouri Qarahasanlou, A. H. S. Garmabaki, Ahmad Kasraei, and Javad Barabady


Systematic Dependability Improvements Within Railway Asset Management . . . 129
Rikard Granström and Peter Söderholm

A Conceptual Model for AI-Enabled Digitalization of Construction Site Management Decision Making . . . 145
Gaurav Sharma, Ramin Karim, Olle Samuelson, and Kajsa Simu

Risk-Based Dependability Assessment of Digitalised Condition-Based Maintenance in Railway—Part II–Event Tree Analysis (ETA) . . . 161
Peter Söderholm and Per Anders Akersten

Making Tracks—Combining Data Sources in Support of Railway Fault Finding . . . 179
R. G. Loe

Simulation Environment Evaluating AI Algorithms for Search Missions Using Drone Swarms . . . 191
Nils Sundelius, Peter Funk, and Richard Sohlberg

Risk-Based Dependability Assessment of Digitalised Condition-Based Maintenance in Railway Part I–FMECA . . . 205
Peter Söderholm

Climate Zone Reliability Analysis of Railway Assets . . . 221
Ahmad Kasraei, A. H. S. Garmabaki, Johan Odelius, Stephen Mayowa Famurewa, and Uday Kumar

Remaining Useful Life Estimation for Anti-friction Bearing Prognosis Based on Envelope Spectrum and Variational Autoencoder . . . 237
Haobin Wen, Long Zhang, Jyoti K. Sinha, and Khalid Almutairi

Artificial Intelligence in Predictive Maintenance: A Systematic Literature Review on Review Papers . . . 251
Md Rakibul Islam, Shahina Begum, and Mobyen Uddin Ahmed

Using a Drone Swarm/Team for Safety, Security and Protection Against Unauthorized Drones . . . 263
Ella Olsson, Peter Funk, and Rickard Sohlberg

Design, Development and Field Trial Evaluation of an Intelligent Asset Management System for Railway Networks . . . 279
Markos Anastasopoulos, Anna Tzanakaki, Alexandros Dalkalitsis, Petros Arvanitis, Panagiotis Tsiakas, Georgios Roumeliotis, and Zacharias Paterakis


Analysing Maintenance and Renewal Decision of Sealed Roads at City Council in Australia . . . 291
Kishan Shrestha and Gopi Chattopadhyay

Issues and Challenges in Implementing the Metaverse in the Industrial Contexts from a Human-System Interaction Perspective . . . 303
Parul Khanna, Ramin Karim, and Jaya Kumari

Development of a Biologically Inspired Condition Management System for Equipment . . . 319
Maneesh Singh, Knut Øvsthus, Anne-Lena Kampen, and Hariom Dhungana

Application of Autoencoder for Control Valve Predictive Analytics . . . 333
Michael Nosa-Omoruyi and Mohd Amaluddin Yusoff

LCC Based Requirement Specification for Railway Track System . . . 343
Stephen Famurewa and Elias Kirilmaz

Pre-processing of Track Geometry Measurements: A Comparative Case Study . . . 355
Mahdi Khosravi, Alireza Ahmadi, and Ahmad Kasraei

Wind Turbine Blade Surface Defect Detection Based on YOLO Algorithm . . . 367
Xinyu Liu, Chao Liu, and Dongxiang Jiang

Cooperative Search and Rescue with Drone Swarm . . . 381
Luiz Giacomossi, Marcos R. O. A. Maximo, Nils Sundelius, Peter Funk, José F. B. Brancalion, and Rickard Sohlberg

Domain Knowledge Regularised Fault Detection . . . 395
Douw Marx and Konstantinos Gryllias

HFedRF: Horizontal Federated Random Forest . . . 409
Priyanka Mehra and Ayush K. Varshney

Rail Surface Defect Detection and Severity Analysis Using CNNs on Camera and Axle Box Acceleration Data . . . 423
Kanwal Jahan, Alexander Lähns, Benjamin Baasch, Judith Heusel, and Michael Roth

A Testbed for Smart Maintenance Technologies . . . 437
San Giliyana, Joakim Karlsson, Marcus Bengtsson, Antti Salonen, Vincent Adoue, and Mikael Hedelind

Game Theory and Cyber Kill Chain: A Strategic Approach to Cybersecurity . . . 451
Ravdeep Kour, Ramin Karim, and Pierre Dersin


On the Need for Human Centric Maintenance Technologies . . . 465
Antti Salonen

A Systematic Study of the Effect of Signal Alignment in Information Extraction from Railway Infrastructure Recording Vehicle Data . . . 477
Daniël Fourie, Daniel N. Wilke, and Petrus Johannes Gräbe

Wheel Damage Prediction Using Wayside Detector Data for a Cross-Border Operating Fleet with Irregular Detector Passage Patterns . . . 491
Johan Öhman, Wolfgang Birk, and Jesper Westerberg

Predictive Maintenance and Operations in Railway Systems . . . 503
Antonio R. Andrade

Experimental Setup for Non-stationary Condition Monitoring of Independent Cart Systems . . . 517
Abdul Jabbar, Gianluca D’Elia, and Marco Cocconcelli

Hazardous Object Detection in Bulk Material Transport Using Video Stream Processing . . . 531
Vanessa Meulenberg, Kamal Moloukbashi Al-Kahwati, Johan Öhman, Wolfgang Birk, and Rune Nilsen

Rotor and Bearing Fault Classification of Rotating Machinery Using Extracted Features from Experimental Vibration Data and Machine Learning Approach . . . 545
Khalid M. Al Mutairi, Jyoti K. Sinha, and Haobin Wen

Are We There Yet?—Looking at the Progress of Digitalization in Maintenance Based on Interview Studies Within the Swedish Maintenance Ecosystem . . . 557
Mirka Kans

Integrated Enterprise Risk Management and Industrial Artificial Intelligence in Railway . . . 569
Peter Söderholm and Alireza Ahmadi

Digital Twin: Definitions, Classification, and Maturity . . . 585
Adithya Thaduri

The Importance of Using Domain Knowledge When Designing and Implementing Data-Driven Decision Models for Maintenance: Insights from Industrial Cases . . . 601
Marcus Bengtsson, Robert Pettersson, San Giliyana, and Antti Salonen


Point Cloud Data Augmentation for Linear Assets . . . 615
Amit Patwardhan, Adithya Thaduri, and Ramin Karim

Selection of Track Solution in Railway Tunnel: Aspect of Greenhouse Gas Emission . . . 627
Andrej Prokopov, Stephen Mayowa Famurewa, Birgitta Aava Olsson, and Matti Rantatalo

Intelligence Based Condition Monitoring Model . . . 639
Kouroush Jenab, Tyler Ward, Cesar Isaza, Jorge Ortega-Moody, and Karina Anaya

Enhancing the Effectiveness of Neural Networks in Predicting Railway Track Degradation . . . 651
Mahdieh Sedghi

Process Reliability Analysis Applied for Continual Improvement of Large-Scale Alumina Refineries . . . 665
R. Welandage Don, G. Chattopadhyay, and J. Kamruzzaman

Cyber-Physical Asset Management of Air Vehicle System . . . 679
Olov Candell, Robert Hällqvist, Ella Olsson, Torbjörn Fransson, Adithya Thaduri, and Ramin Karim

A Case Study on Ontology Development for AI Based Decision Systems in Industry . . . 693
Ricky Stanley D’Cruze, Mobyen Uddin Ahmed, Marcus Bengtsson, Atiq Ur Rehman, Peter Funk, and Rickard Sohlberg

System Innovation Challenges for Climate Adaptation . . . 707
Veronica Jägare, Ulla Juntti, and A. H. S. Garmabaki

Data-Driven Predictive Maintenance: A Paper Making Case . . . 723
Davide Raffaele and Guenter Roehrich

Dependability Management Framework and System Model for Railway Improvements . . . 737
Peter Söderholm

Technology and the Future of Maintenance . . . 751
Derek Dixon and David Baglee

Generic Smart Rotor Fault Diagnosis Model with Normalised Vibration Parameters . . . 763
Natalia Espinoza-Sepulveda and Jyoti Sinha


Risk Assessment of Climate Change Impacts on Railway Infrastructure Asset . . . 773
A. H. S. Garmabaki, Masoud Naseri, Johan Odelius, Ulla Juntti, Stephen Famurewa, Javad Barabady, Matthias Asplund, and Gustav Strandberg

Quality Assurance in Flow Through Oil and Gas Pipelines . . . 789
Muhammad Atif, Rakesh Mishra, Matthew Charlton, and Andrew Limebear

A Vision-Based Neural Networks Model for Turbine Trench-Filler Diagnosis

Cesar Isaza, Fernando Guerrero-Garcia, Karina Anaya, Kouroush Jenab, and Jorge Ortega-Moody

Abstract: Vision-based neural networks as artificial intelligence models have become critical in many manufacturing industries, including automotive, food, and aerospace. Machine vision and deep learning have provided practical, promising, and accessible innovations to address many problems during manufacturing and diagnostics. Recent studies offer beneficial results in implementing industrial artificial intelligence systems that require determining, comparing, and evaluating optimal technological solutions. The aerospace industry has to deal with the problem of diagnosing critical components, which continually requires the expertise of a trained human being. Automatic diagnostics, on the other hand, are becoming a critical technology to deal with this problem. However, these systems require a particular configuration of computer vision algorithms which, in the case of turbine trench-filler components, has yet to be addressed in the literature. Considering the above, we report in this paper a new methodology that uses a pre-trained deep neural network framework available online to automatically diagnose geometric non-conformities in aeronautical components that protect the turbines from vibrations and reduce noise from the engines to the aircraft deck. The method is based on the following stages: a computer vision system with a monochrome camera to acquire images, training of deep neural networks with transfer learning, and a stage to analyze non-conformities automatically. In addition, a benchmark of several pre-trained deep neural networks is presented to address the problem. According to the experimental results, the YOLO deep neural network topology contributes significantly to automatic diagnosis with high precision and accuracy. Finally, we show that integrating deep neural networks with control charts is a promising strategy to optimize online diagnostics in the production lines of the aeronautical industry.

Keywords: Computer vision · Deep neural networks · Automatic diagnosis · Industrial artificial intelligence

C. Isaza (B) · K. Anaya
Networking and Telecommunication Academic Program, Polytechnic University of Queretaro, Queretaro, Mexico
e-mail: [email protected]

F. Guerrero-Garcia
Productive System Master Academic Program, Polytechnic University of Queretaro, Queretaro, Mexico

K. Jenab · J. Ortega-Moody
School of Engineering and Computer Science, Morehead State University, Morehead, KY 40351, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_1

1 Introduction

Manufacturing and maintaining turbines are crucial functions in the aeronautical industry, requiring specialized attention and inspection. Traditionally, skilled professionals perform these tasks manually, carefully examining each component from the production phase to disassembly using their expertise and training [1]. In the aviation sector, predictive maintenance has become critical for improving maintenance schedules, minimizing aircraft downtime, and detecting unforeseen defects. All materials, components, and systems must be tested to meet contemporary turbine standards.

In an aircraft turbine, a vital component is the Trench-filler, a multifaceted mechanical part that provides mechanical, thermal, and sound insulation [2]. A Trench-filler is made of composite materials [3]: resin and carbon fibers are melted together in a high-pressure autoclave to create a component with a diameter ranging from 50 cm to 2.5 m. During the process, a manual diagnosis is made to ensure its quality meets the demanding standards of the aeronautical industry [4]. Because of this complexity, the inspection is performed by a skilled human who is well trained in diagnosing non-conformities. However, this operation is a considerable challenge because the Trench-filler's color is very dark, requiring the inspector to have excellent vision and complex illumination systems [5]. As a result, bottlenecks occur in the overall turbine manufacturing and maintenance process.

Considering the above, the design and implementation of an assisted diagnosis and maintenance system for the turbine Trench-filler is required and corresponds to the primary goal of this work. The method considers a computer vision system using the most advanced and available technology for automatic diagnosis. Deep neural networks have recently opened new possibilities to deal with many complex problems in the aeronautical industry [6]. Similar tasks have been reported in the literature, but the specific kind of non-conformities of the Trench-filler has not been studied yet [7].


The proposed scheme uses a vision system with a monochromatic camera because of the dark color of the Trench-filler surface. The illumination is centered on the wavelength of maximum sensitivity of the CCD sensor. A lens with the focal distance and focus adjusted to detect the non-conformities was selected according to the typical and historical values observed during manufacturing and diagnosis. A well-known deep neural network, implemented in a framework with the convolutional neural network already trained, is used to recognize defects [8]. A transfer learning process is used to adapt the deep neural network to diagnose the Trench-filler.

This paper presents a vision-based neural networks model for turbine Trench-filler diagnosis based on a computer vision system that classifies defects appearing on the component's border. The method can accurately identify three types of non-conformities and significantly reduce inspection time and visual fatigue for human inspectors.

2 Literature Review

Artificial intelligence is a technology that focuses on solving complex problems, such as computer vision and natural language processing, using hardware and software. A common task in the manufacturing industry is recognizing control chart patterns associated with non-conformities. In 2001, Perry et al. [9] proposed control chart pattern recognition using artificial neural networks; the method used a network of four layers comprising single input and output layers and two hidden layers. Similarly, Abbsi proposed using artificial neural networks for control chart pattern recognition, and the backpropagation training showed that the resulting network could identify patterns accurately [10]. In 2012, Shaban et al. introduced a double-network method for identification and parameter estimation [11]. An extensive description of statistical learning using neural network methods to train and improve the performance of different topologies used in computer vision and automatic diagnosis systems was presented in [12]. In 2019, Nimbale et al. [13] introduced a method to monitor the mean and variability of a process using neural networks.

The research discussed in this paper focuses on a method for recognizing non-conformities of a Trench-filler in images, which implies associating pixels with a surface defect. This activity defines the objects that must be extracted from the image so that the convolutional network can process the region of interest, as was done by Badrinarayanan et al. [14], whose encoder extracts features from the image through filters. Additional filters or edge-detection stages are regularly applied to improve the accuracy of the segmentation process.

Another technique for recognizing regions of interest in images is based on the U-Net model. This deep neural network performs object detection through semantic segmentation and instance segmentation. Semantic segmentation labels each pixel, while instance segmentation classifies each element separately, as was done in this work, where each type of porosity in each image was classified.

Convolutional encoder-decoder deep neural networks are also efficient at image recognition. They are generally used as supervised models with labeled data instead of relying on traditional measurements in the X-Y coordinate system of an image. The encoder compresses the input data by encoding the pixels into a vector, while the decoder reconstructs the original input image. A U-Net convolutional neural network model was used by Miao et al. for automatic recognition of highway tunnel defects. This architecture consists of two paths: a contracting path that works as an encoder to capture the context of an image, creating a feature map and gradually reducing its size to use fewer parameters, and an expanding path, the decoder, which recovers the spatial location of the image [15].

Convolutional neural networks (CNNs) identify and classify images using labeled data to extract features. This kind of model uses pre-tagged images for its feature division, some of which come from the ImageNet research dataset. In a CNN architecture, the output neurons have an activation function for classifying the input, and convolutions process each pixel to produce the output. In 2021, Lu et al. [16] extensively reviewed CNNs for classifying plant leaf disease. Exploring further capabilities of deep neural networks, Yu et al. [17] reviewed recurrent neural networks (RNNs). In related detection architectures, objects are classified and located using a bounding box, and semantic segmentation classifies each pixel; each region of interest gets a segmentation mask, and a class label and a bounding box are produced as the final output. An extension of this family is Faster R-CNN, a topology comprising a deep convolutional network that proposes regions and a detector that uses those regions for recognition.

Automatic diagnosis using computer vision with YOLO (You Only Look Once) is a state-of-the-art technology that enables rapid and accurate detection of various anomalies, for example in medical images. By applying YOLO to images, a computer can quickly identify and diagnose non-conformities, potentially improving the speed and accuracy of diagnoses significantly. In 2022, Salman et al. used YOLO to diagnose prostate cancer in images [18]. Similarly, Karasi et al. used YOLO to detect Coronavirus disease (COVID-19) cases from X-ray images [19].
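To make the encoder-decoder behaviour reviewed above concrete, the following is a minimal sketch of a convolutional encoder-decoder in PyTorch; the layer sizes, channel counts, and input resolution are illustrative assumptions and are not taken from any of the cited works.

    # Minimal convolutional encoder-decoder sketch (illustrative sizes only).
    import torch
    import torch.nn as nn

    class ConvAutoencoder(nn.Module):
        def __init__(self):
            super().__init__()
            # Encoder: progressively compress the image into a small feature map.
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # H/2
                nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # H/4
                nn.ReLU(),
            )
            # Decoder: reconstruct the input resolution from the compressed code.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2),
                nn.ReLU(),
                nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2),
                nn.Sigmoid(),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    # One grayscale 64x64 image in, a reconstruction of the same shape out.
    model = ConvAutoencoder()
    out = model(torch.randn(1, 1, 64, 64))
    print(out.shape)  # torch.Size([1, 1, 64, 64])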

3 Materials and Methods

This research project proposes a vision-based neural networks model for turbine Trench-filler diagnosis. The materials and methods used to create this proposal integrate the following technological components: the Trench-filler in turbines, a pre-trained deep neural network (YOLO), a tool to retrain the network with transfer learning, and model implementation in a cloud solution.

3.1 Trench-Filler in Turbines

The Trench-filler is a product of the aerospace industry; its function is to support the turbine and to isolate noise inside the aircraft. It is produced using carbon composites that polymerize when brought to a temperature of more than 150 °C. Figure 1 illustrates a Trench-filler on the rotary table used for manual diagnosis.

Fig. 1 Trench-filler for turbine

3.2 YOLO

The YOLO algorithm is a convolutional neural network for real-time object detection [20]. Unlike other networks, it utilizes a single neural network architecture, enabling it to detect objects faster and more accurately by predicting multiple bounding boxes and their corresponding probabilities in one pass. YOLO depends on several resources, including the COCO dataset, a widely used reference for object detection. This dataset supports object segmentation, context recognition, and classification of 80 different object types, among other functionalities.
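As an illustration of how a detector of this family can be queried from Python, the sketch below loads YOLOv5 through the public ultralytics/yolov5 torch.hub entry point; the weight file name and image path are hypothetical placeholders, not artifacts of this study.

    # Minimal YOLOv5 inference sketch (public torch.hub entry point;
    # "best.pt" and the image file name are hypothetical).
    import torch

    # Load custom weights produced by transfer learning (path is illustrative).
    model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

    # Run detection on one acquired image of the Trench-filler border.
    results = model("trench_filler_border_001.png")

    # Each row of results.xyxy[0]: x1, y1, x2, y2, confidence, class index.
    for *box, conf, cls in results.xyxy[0].tolist():
        print(f"type {int(cls) + 1} non-conformity, confidence {conf:.2f}, box {box}")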

3.3 Labelme

Labelme is an application that facilitates annotating images with various geometric shapes, including polygons, rectangles, circles, and dotted lines [21]. It also allows users to download and convert data into the VOC format, which is helpful for semantic and instance segmentation. To achieve instance segmentation, the software can export data sets into COCO format, which allows for the labeling of specific objects within the image that have been previously identified.
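For reference, a Labelme annotation is a plain JSON file whose polygons can be read with the Python standard library; in the sketch below, the file name and the label strings (e.g. "type1") are assumptions, since the paper does not specify them.

    # Sketch of reading one Labelme annotation file.
    import json

    with open("trench_filler_border_001.json") as f:
        annotation = json.load(f)

    # Labelme stores each annotated region under "shapes", with a class
    # label and a list of polygon points.
    for shape in annotation["shapes"]:
        label = shape["label"]      # e.g. "type1", "type2", "type3"
        points = shape["points"]    # [[x1, y1], [x2, y2], ...]
        print(label, len(points), "polygon vertices")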

3.4 Roboflow

Roboflow simplifies the process of uploading labeled image files and training multiple models, such as YOLOv5, for real-time object detection and classification. Combined with the Labelme framework, it allows efficient image annotation and reduces the preprocessing of datasets. Generating new training data involves evaluating and experimenting with several datasets, which otherwise requires considerable effort to manually label the regions of interest. Considering the above, all images of the dataset used in this project were tagged with this framework, and the transfer learning process was done with the same application to re-train the deep neural network. This tool's primary purpose is not to develop new machine learning models but to adapt trained topologies to other specific purposes, such as detecting non-conformities in a Trench-filler.
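One common way to retrieve such an export programmatically is the roboflow pip package, sketched below; the API key, workspace, project name, and version number are placeholders, and the paper itself describes downloading the dataset via a generated URL (see Sect. 4.5).

    # Sketch of pulling an annotated dataset from Roboflow in YOLOv5 format.
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")  # placeholder credentials
    project = rf.workspace("your-workspace").project("trench-filler-defects")
    dataset = project.version(1).download("yolov5")  # writes images + data.yaml

    print("dataset downloaded to:", dataset.location)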

4 Methodology

To produce a Trench-filler, composite carbon sheets are polymerized through lamination in a mold at temperatures exceeding 150 °C. Any defect must be corrected through reprocessing, ensuring a conforming piece free of porosities and cracks. As described previously, manual inspection is a bottleneck in the diagnosis process, which the methodology introduced in this study proposes to solve. Figure 2 illustrates the general vision-based neural network for Trench-filler diagnosis. There are three main stages: image acquisition, training, and testing.

4.1 Image Acquisition System

The image acquisition system is based on a circular mechanical rotary table synchronized with a DC motor, a monochromatic camera (HT-GE1201M-T1P-C), and a frame-grabber application programmed in Python running on a personal computer with the Anaconda Python distribution and OpenCV libraries. To obtain a complete inspection of the border, a total of 70 images are acquired: the base is rotated 5.14° and an image is grabbed at each step. The illumination system is centered at 512 nm (green light), corresponding to the CCD sensor's maximum sensitivity.
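A minimal sketch of this 70-step acquisition loop is shown below; it assumes the camera is reachable through an OpenCV-compatible interface (an industrial GigE camera may instead require the vendor SDK) and represents the motor control by a hypothetical step_table() helper, since the paper does not detail it.

    # Sketch of the 70-step border acquisition loop.
    import cv2

    STEPS = 70                  # full 360° coverage of the border
    STEP_ANGLE = 360 / STEPS    # = 5.14° per step, as in the paper

    def step_table(angle_deg: float) -> None:
        """Placeholder for the DC-motor command that rotates the table."""
        ...

    camera = cv2.VideoCapture(0)  # index/driver depends on the camera SDK
    for i in range(STEPS):
        ok, frame = camera.read()
        if ok:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # monochrome image
            cv2.imwrite(f"border_{i:03d}.png", gray)
        step_table(STEP_ANGLE)
    camera.release()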


Fig. 2 Proposed methodology for vision-based NN for turbine Trench-filler diagnosis

Fig. 3 Image acquisition system

4.2 Trench-Filler Non-conformities

Carbon fibers and resin are the two primary materials used to manufacture a Trench-filler; its physical properties include high strength and light weight. However, three types of non-conformities may appear in a given component: types 1, 2, and 3. A type 1 defect is on the surface of the Trench-filler where only resin material is missing. Type 2 is a pore on the surface of the resin with a fiber partially exposed. Type 3 is the most severe non-conformity because it involves missing resin and completely exposed carbon fibers (Fig. 3).

Each type of pore indicates a lack of resin on the surface; types 1 and 2 are not critical enough to reject the part as non-conforming. However, type 3 is the most critical because it can affect the final use of the application as a possible static leak. This defect is among the most common causes of rejection after polymerization. Typically, a well-trained expert spends two to six working days on the manual visual inspection and defect correction. Figure 4 illustrates a transversal cut of a part with the three types of non-conformities that may appear in a Trench-filler.

Fig. 4 Trench-filler non-conformities types

4.3 Image Dataset The Trench-filler dataset contains 168 images taken perpendicular to the component's border, as Fig. 3 illustrates. The dataset is divided into 70% for training, 20% for validation, and 10% for testing. Training images with non-conformities were classified into three major categories according to the previously defined defects. The image dataset is prepared for semantic segmentation; once labeled, it is saved in a single folder that is called from the training stage. Each image has the original input, the mask, and the general highlight of the region of interest. Furthermore, the number of images gathered to train the model is sufficient because each type of non-conformity is treated as an independent and identically distributed random variable. In Fig. 5, the first row shows representative images without non-conformities; the second, third, and fourth rows show images classified with non-conformities of types 1, 2, and 3, respectively.
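A 70/20/10 split of the kind described can be sketched as below. This is a generic illustration (a seeded shuffle followed by slicing); the rounding may differ slightly from the paper's exact 116/35/17 counts.

```python
import random

def split_dataset(paths, seed=42):
    """Shuffle reproducibly, then slice into ~70% train, ~20% val, ~10% test."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n_train = int(0.7 * len(paths))
    n_val = int(0.2 * len(paths))
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]
    return train, val, test
```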

4.4 Image Segmentation and Annotation The image segmentation and annotation were done using the Labelme framework. The human experts first made the diagnosis over the Trench-filler manually, then visualized the acquired images to recognize the porosities and cracks on the component's surface. A single color was used to represent each type of porosity non-conformity depending on its size, shape, and depth (type 1 = red, type 2 = green, type 3 = yellow). The aeronautical standards reject any piece with a single type 3 non-conformity; however, a Trench-filler that is not accepted can be reworked manually by expert technicians in composite materials. To improve the performance of the annotation, a UHD 70-inch 4K Smart TV was installed next to the automatic rotary base. The image acquisition system first takes


Fig. 5 Examples of Trench-filler images

a full sequence of images while rotating the Trench-filler. Then, an inspector recognizes all non-conformities manually. Figure 6 illustrates the porosity types that may appear in a Trench-filler: the first, second, and third rows show non-conformity types 1, 2, and 3, respectively. The first and second columns are the original input image and the manual segmentation; the third column is the input image with the manual segmentation highlighted.


Fig. 6 Types of porosity on the Trench-filler surface

4.5 Model Training The presented model was trained on the Google Colab cloud platform, a GPU-backed tool that requires no configuration. The YOLOv5 model was built on the Roboflow image dataset described previously, and pre-trained COCO weights were utilized. The dataset was downloaded to Colab as a zip folder using the URL generated by Roboflow. The annotated dataset was divided into three sets: a training set containing 116 images, a validation set with 35 images, and a testing set with 17 images. Each image in the Roboflow data was labeled with the three types of non-conformities. To create several detection models using the image dataset, we employed the YOLOv5l algorithm. The network was trained with a base learning rate of 0.001, a momentum of 0.937, a weight decay of 0.0005, a batch size of 20, and 600 epochs; training took 14 min. To identify non-conformities, three metrics were employed: precision, recall, and mean average precision (mAP).

Fig. 7 Training performance of YOLOv5 with transfer learning in Roboflow

Precision is the proportion of detections that correspond to actual non-conformities (correctly marked non-conformities divided by the total number of detections), while recall is the proportion of actual non-conformities that are detected (correctly marked non-conformities divided by the total number of non-conformities). Figure 7 displays graphs of the model's progress, showing various performance metrics for the training and validation images. The deep neural network's optimal weights were selected after 600 epochs, which proved sufficient. The precision, recall, and mAP curves evidenced the improved model, and a significant reduction in validation classification loss was observed after epoch 100. The loss curves also illustrated the trained network's growing ability to identify the non-conformity types in input images.
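In conventional notation (a standard formulation, not reproduced from the paper), with TP, FP, and FN the true positives, false positives, and false negatives, C the number of classes, and AP_c the average precision of class c over its precision-recall curve:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{mAP} = \frac{1}{C} \sum_{c=1}^{C} AP_c
```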

5 Experimental Results To verify the efficiency of the proposed algorithm, qualitative and quantitative analyses were performed. The model trained with Roboflow was moved to Google Colab to measure performance on the testing images using Python functions. The qualitative results of the diagnosis of non-conformities in the Trench-filler are illustrated in Fig. 8.


Fig. 8 Qualitative results

It was previously stated that the most critical type of non-conformity is type 3, where carbon fiber is visible without resin. Figure 8 displays several instances of the model's detection performance. Overall, the strategy proposed in this research substantially improves the diagnosis process by enhancing inspectors' efficiency and accuracy. The confusion matrix is a standard tool for evaluating the performance of any machine learning method, and visualizing the performance of the supervised learning algorithm is fundamental to reaching optimal parameter values. Notably, type 1 non-conformities yield a lower prediction result because this defect is less represented in the dataset that was built; consequently, porosity types 2 and 3 can be differentiated with better results. It is worth mentioning that type 3 defects are direct rejections in the final inspection. With this visual support, the inspectors can assess the accuracy and precision of the algorithm. The confusion matrix results showed a detection rate against the background of 0.86 for type 1, 0.80 for type 2, and 0.72 for type 3. Although the results can still be improved, the limited size of the database selected for the study and the class imbalance mean that the method's efficiency for type 3 non-conformities remains a subject of study. In addition to the qualitative and quantitative evaluation of the proposal, an improvement in time efficiency during diagnosis was achieved. A trained inspector's manual diagnosis of a Trench-filler takes approximately 1 h and 30 min, whereas the proposed vision-based neural network diagnosis system, after optimization, achieves a diagnosis in 6 min. Therefore, a significant increase in monthly operating capacity, from 112 to 3500 Trench-filler units, is expected.
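The per-class rates quoted above can be read off a row-normalised confusion matrix; the sketch below shows this with scikit-learn on toy labels (the arrays are illustrative, not the paper's data).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy ground-truth and predicted class labels per detection
# (0 = background, 1/2/3 = non-conformity types).
y_true = np.array([1, 1, 2, 2, 3, 3, 0, 0, 3, 2])
y_pred = np.array([1, 0, 2, 2, 3, 2, 0, 0, 3, 2])

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])
# Row-normalising makes each diagonal entry a per-class detection rate,
# comparable in spirit to the 0.86 / 0.80 / 0.72 values reported above.
per_class_rate = cm.diagonal() / cm.sum(axis=1)
print(per_class_rate)
```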

6 Conclusion In this research, we proposed a vision-based neural networks model for turbine Trench-filler diagnosis. With this model, we enhance and optimize the diagnosis process by accurately detecting and diagnosing non-conformities over the surface of the composite material. Compared with traditional manual inspection, the vision-based neural network strategy demonstrated that computer vision, deep neural networks, and automated diagnosis technologies are fundamental in the aeronautical industry. Experimental results showed that the diagnosis of critical components, which normally requires the expertise of a trained human, can be achieved using industrial artificial intelligence. Remarkably, the average diagnosis rate of the essential type 3 non-conformities was over 85%. As future work, the proposed model can be applied to other turbine aircraft components, and combinations of different deep neural network topologies can be tested and developed under the concept of transfer learning to improve the accuracy of the automatic diagnosis. Acknowledgements The authors would like to sincerely thank KY NSF-EPSCoR, the Council for Science and Technology of Mexico (CONACYT), and the Universidad Politecnica de Queretaro for their financial support, and the DUQUEINE GROUP, which designs and manufactures high-performance composite parts and sub-assemblies, for the availability of the experimental data.

References

1. Stanton I, Munir K, Ikram A, El-Bakry M (2022) Predictive maintenance analytics and implementation for aircraft: challenges and opportunities. Syst Eng
2. Kennet DM (1994) A structural model of aircraft engine maintenance. J Appl Econom 9(4):351–368
3. Zhang C, Ling Y, Zhang X, Liang M, Zou H (2022) Ultra-thin carbon fiber reinforced carbon nanotubes modified epoxy composites with superior mechanical and electrical properties for the aerospace field. Compos Part A Appl Sci Manufact 163:107197


4. Norkhairunnisa M, Chai Hua T, Sapuan SM, Ilyas RA (2022) Evolution of aerospace composite materials. In: Advanced composites in aerospace engineering applications. Springer, pp 367–385
5. Aust J, Pons D (2022) Comparative analysis of human operators and advanced technologies in the visual inspection of aero engine blades. Appl Sci 12(4):2250
6. Meister S, Wermes M, Stüve J, Groves RM (2021) Investigations on explainable artificial intelligence methods for the deep learning classification of fibre layup defect in the automated composite manufacturing. Compos Part B Eng 224:109160
7. Fotouhi S, Pashmforoush F, Bodaghi M, Fotouhi M (2021) Autonomous damage recognition in visual inspection of laminated composite structures using deep learning. Compos Struct 268:113960
8. Jiang P, Ergu D, Liu F, Cai Y, Ma B (2022) A review of yolo algorithm developments. Proc Comput Sci 199:1066–1073
9. Perry MB, Spoerre JK, Velasco T (2001) Control chart pattern recognition using back propagation artificial neural networks. Int J Prod Res 39(15):3399–3418
10. Abbasi B (2009) A neural network applied to estimate process capability of non-normal processes. Expert Syst Appl 36(2):3093–3100
11. Shaban A, Shalaby MA (2012) A double neural network approach for the identification and parameter estimation of control chart patterns. Int J Qual Eng Technol 3(2):124–138
12. Carlyle WM, Montgomery DC, Runger GC (2000) Optimization problems and methods in quality control and improvement. J Qual Technol 32(1):1–17
13. Nimbale SM, Ghute VB (2019) Monitoring process mean and variability using artificial neural networks. Int J Sci Res Math Stat Sci 6:3
14. Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495
15. Miao X, Wang J, Wang Z, Sui Q, Gao Y, Jiang P (2019) Automatic recognition of highway tunnel defects based on an improved u-net model. IEEE Sensors J 19(23):11413–11423
16. Lu J, Tan L, Jiang H (2021) Review on convolutional neural network (CNN) applied to plant leaf disease classification. Agriculture 11(8):707
17. Yong Yu, Si X, Changhua H, Zhang J (2019) A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput 31(7):1235–1270
18. Salman ME, Cakar GC, Azimjonov J, Kosem M, Cedimouglu IH (2022) Automated prostate cancer grading and diagnosis system using deep learning-based yolo object detection algorithm. Expert Syst Appl 201:117148
19. Karacı A (2022) VGGCOV19-net: automatic detection of Covid-19 cases from x-ray images using modified VGG19 CNN architecture and YOLO algorithm. Neural Comput Appl 34(10):8253–8274
20. Diwan T, Anirudh G, Tembhurne JV (2022) Object detection using yolo: challenges, architectural successors, datasets and applications. Multimedia Tools Appl:1–33
21. Aljabri M, AlAmir M, AlGhamdi M, Abdel-Mottaleb M, Collado-Mesa F (2022) Towards a better understanding of annotation tools for medical imaging: a survey. Multimedia Tools Appl 81(18):25877–25911

Use Cases of Generative AI in Asset Management of Railways

Jaya Kumari and Ramin Karim

Abstract Asset management of railways is a data-driven process. Empowering asset management through the utilisation of Artificial Intelligence (AI) and digital technologies for data-driven, fact-based decision-making is highly dependent on the availability and accessibility of data. Additionally, a data-driven approach places demands on the quality of the data and the relevance of the datasets to the contexts of analytics, to ensure the accuracy of the analytics and the precision of the predictions. One of the emerging approaches that can be utilised to augment the data used for analytics and the model learning process is Generative Artificial Intelligence (GAI). GAI can be useful in various contexts of asset management of railways. This paper aims to provide use cases in which GAI can be utilised, e.g. for data augmentation, leading to improved accuracy and precision of decision support. The identified use cases provide a list of potential areas that can be used to develop a roadmap for the implementation of GAI within the asset management of railways.

Keywords Asset management · Railways · Rolling stock · Generative AI · Operation and maintenance

J. Kumari (B) · R. Karim
Luleå University of Technology, Luleå, Sweden
e-mail: [email protected]
R. Karim
e-mail: [email protected]

1 Introduction Asset management of railways is a data-driven process. Asset health analytics is critical for making efficient and effective decisions related to asset management processes. Such analytics is based on data related to the operation and maintenance of assets, and the methodology and technology to develop these analytic services are already highly evolved in the research field. In recent times, there has been an exponential


growth in data-driven approaches, with developments in areas such as data acquisition, data pre-processing, big-data storage, data analytics using AI technologies, and information visualisation. However, when it comes to industrial applications, the use of such technologies still largely remains at the level of lab-scale demonstrators [1], and AI is still observed to be at an early stage of development in most railway-related application areas [2]. The AI and digitalisation technologies related to the development of analytic services are based on the underlying assumption that a vast amount of asset-related data is available. In spite of the availability of advanced data acquisition techniques, an asset-intensive organisation such as railways almost always lacks relevant, good-quality data. One reason is the challenge of acquiring data without interrupting operation; a second is the diverse operational contexts of railway infrastructure and rolling stock; a third is the multi-stakeholder ownership and operation of railways in Sweden. In such a scenario, it becomes critical to make the best use of available data by improving its quality and utilising it to generate relevant data for specific contexts and configurations of assets. When little data is available, it is useful to be able to generate data that is similar to the available data but not exactly the same. This can be done by generating a diverse dataset that follows the underlying pattern of the training data without being an exact replica of it; such diversity in the training data increases the robustness and generalisation capability of data-driven models. Data-driven approaches such as now-casting and forecasting in asset management of railways face some specific challenges, mainly related to (1) the development of context-aware analytics, (2) the availability of historical data and condition data for now-casting and forecasting, (3) issues related to data quality, (4) natural language processing of operation and maintenance logs, (5) knowledge transfer between a fleet of assets and an individual asset, (6) the no-fault-found condition, (7) life-cycle management of assets, and (8) the maintenance policy for assets [3, 4]. These challenges are directly related to the availability of good-quality data. Generative artificial intelligence (GAI) technology is known to create synthetic data that can substitute real data [5]. In the past, multiple generative algorithms, such as the iterative process in Turing machines, simulation techniques such as Markov chain Monte Carlo simulation, and genetic algorithms, have been able to generate data similar to the training data. However, these past attempts had limitations: algorithms such as Markov chain Monte Carlo make strong assumptions about the underlying pattern in the data, approximate based on these assumptions, and are computationally expensive [6], which may lead to models with sub-optimal performance. Recent advancements in generative AI techniques have provided an alternative for generating realistic data [7].

This paper identifies the state-of-art in the application of generative AI technologies. It is observed from the state-of-art that the application of GAI in railways has mainly addressed focussed problems related to specific assets. No study was found that looked at the possibilities of GAI from the broader perspective of challenges related to data-driven approaches in railways. The paper then explores the implementation methods of two of the most popular GAI models, i.e. Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), to develop an understanding of the viability of different GAI models for use cases in asset management of railways. Combining the understanding of these techniques, the state-of-art in the application of generative AI, and the existing data-related challenges in asset management of railways, the paper identifies some use cases. The main contribution of this paper is these identified use cases in the asset management of the railway system, where GAI can be utilised for its capabilities in data augmentation, domain translation, anomaly detection, etc. These use cases are based on real-world challenges in the asset management of railways, identified as part of the AI Factory for Railways project, a consortium of railway organisations that identifies and solves real-world challenges related to operation and maintenance in railways [1]. This paper is a preliminary study on the possibilities of GAI in asset management of railways. The verification of these approaches has not been conducted within the scope of the paper; it will be carried out as future work, as the verification of each approach is expected to be an independent research effort.

2 State-of-Art

2.1 History of Generative Algorithms The emergence of generative algorithms can be traced back to the 1930s, when the class of machines called Turing machines was defined by A. M. Turing [8]. A Turing machine works similarly to a generative algorithm: at any given time, the next state of the machine is defined by the current state of a control element with a fixed number of internal states and by the data being read [9]. Another early example of generative algorithms is Von Neumann's self-reproducing machines based on a fixed set of elementary components [10]. A concept like that of GAI algorithms has also been used in genetic algorithms: an article on genetic algorithms by John H. Holland published in the 1990s [11] introduced a similar idea, in which new solutions are generated and optimised iteratively based on existing knowledge. Markov chains, formulated by Andrey Markov [12], are mathematical models that work like generative algorithms: the system transitions from one state to another based on a transition probability that is determined only by the current state of the system. Monte Carlo simulation techniques, which use samples from available data to estimate statistical parameters, have also been used in generative algorithms.


Articles containing the term 'Generative AI' were comparatively few up to 2017. Most of this early research on GAI was in the field of computer science and consisted of the development of generative algorithms. Publications on GAI have shot up exponentially from 2018 to the current year, 2023, with more varied applications of the concept in engineering and in other applied fields such as the arts, biochemistry, business, and psychology.

2.2 Generative AI Models GAI models are used in research after the modifications and enhancements required for the task at hand. Models such as transformers are more commonly used for general GAI applications, while models such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), diffusion models, and flow models can be used for task-specific GAI applications [13]. Two of the most commonly used GAI models, the GAN and the VAE, are discussed below. As with any machine learning model, the efficiency of GAI models depends on the quality of the training data and on the tuning of the model hyperparameters.

2.2.1 Generative Adversarial Networks (GANs)

Generative adversarial networks consist of one neural network called the 'generator', based on a generative algorithm, and another called the 'discriminator', based on a discriminative algorithm. The working of generative algorithms can be understood in contrast to that of discriminative algorithms. Discriminative algorithms determine the category or class of given data based on its input features [14]: if the features are expressed by 'x' and the labels by 'y', a discriminative algorithm models p(y|x), the probability that the label is y given the features x. Generative algorithms, on the contrary, model p(x|y), the probability of the features x given that the label is y. Therefore, while discriminative algorithms intend to put a separation boundary between different labels/classes, generative algorithms intend to model the probability distribution of each of the classes. The 'generator' and 'discriminator' networks work against each other in an 'adversarial' manner to generate synthetic data that can no longer be discriminated from real data. As shown in Fig. 1, the generator network takes random noise samples and generates synthetic data; the noise acts as a random seed and prevents the GAN from generating the same synthetic data every time. The discriminator network assigns label probabilities to the synthetic data and to the real data used as ground truth. The loss function calculated from the predicted labels provides feedback to both the generator and the discriminator to improve training accuracy. GANs are suitable for tasks such as generating data in unsupervised learning tasks, training on data from a source domain and fine-tuning on a target domain,


Fig. 1 The double feedback loop of a generative adversarial network (GAN)

data augmentation, and anomaly detection. GANs can generate data that is not only similar to the input but also diverse in its characteristics.
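To make the adversarial interplay concrete, the following is a minimal PyTorch sketch of one GAN training step over flat feature vectors. The MLP architectures, dimensions, and learning rates are illustrative assumptions, not taken from any of the cited works.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    batch = real.size(0)
    noise = torch.randn(batch, latent_dim)  # random seed prevents identical outputs
    fake = G(noise)

    # Discriminator: push real data toward label 1, synthetic data toward 0
    d_loss = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: the loss feedback rewards fooling the discriminator
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

Training alternates these two updates until the discriminator can no longer separate generated samples from real ones.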

2.2.2 Variational Autoencoders (VAEs)

Variational autoencoders (VAEs) are a method for unsupervised representation learning that captures the underlying information and structures in the data. The VAE comprises two coupled models: a recognition model called the encoder and a generative model called the decoder. The VAE uses an 'Expectation–Maximisation'-style approximation to estimate its posterior over latent variables. As shown in Fig. 2, the encoder receives the input data and maps it into a lower-dimensional latent space. The output of the encoder is a set of parameters that define the distribution of the input data in the latent space; a sample is drawn from this distribution and passed to the decoder, which reconstructs the input data. The loss function measures the difference between the input data and the generated data. This component, called the reconstruction loss, drives the output to match the input data; the other component, called the regularisation loss, promotes the learning of the latent space. The overall loss function is minimised during the training of the VAE. The encoder and the decoder are multi-layer neural networks. Like GANs, VAEs are suitable for data generation tasks in unsupervised learning. Due to their ability to learn latent representations in the data, VAEs can also be used for generating missing data and for denoising data. The VAE's focus on reconstructing the input data makes it more suitable for data-quality applications, while the GAN's focus on generating new data makes it more suitable for data augmentation. When working with data in the form of images, the inherent loss function of the VAE may lead to blurred images with smoothed outputs,

Fig. 2 The basic workflow of a variational autoencoder


while GANs are known to capture fine details and generate crisp images. In domain translation applications, the reconstruction loss of VAEs may prevent them from capturing the fine details that mark the differences when adapting to a target domain. Both models have limitations for specific tasks, and multiple studies have therefore explored combinations of VAEs and GANs to attain desirable results [15–17]. However, as stated earlier in this section, both GANs and VAEs can be enhanced for a specific task. The selection of a generative AI algorithm should be based on factors such as the task at hand, the type of data used, the explainability of the model, and computational limitations.
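The encoder–latent–decoder flow and the two-part loss can be sketched as follows; again a minimal illustration with assumed dimensions, using a Gaussian latent space and the standard reparameterisation trick.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, data_dim=64, latent_dim=8):
        super().__init__()
        self.enc = nn.Linear(data_dim, 128)
        self.mu = nn.Linear(128, latent_dim)       # latent mean
        self.logvar = nn.Linear(128, latent_dim)   # latent log-variance
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, data_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Sample from the latent distribution (reparameterisation trick)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    recon_loss = F.mse_loss(recon, x, reduction="sum")             # match the input
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # regularise latents
    return recon_loss + kl
```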

2.3 Application Areas of Generative AI (GAI) Since 2018, the share of research articles on the application of GAI in fields such as engineering, medicine, social sciences, and decision sciences has increased; however, a substantial amount of research remains in the field of computer science. Research on GAI in industrial applications is mostly focussed on domain translation, data augmentation, improvement of data quality, anomaly detection, and design. These application areas are discussed below.

2.3.1 Domain Translation

Domain translation refers to the adaptation of content from one domain to another. In terms of data-driven and model-driven approaches, it refers to adapting a model developed for one domain into a model relevant for another domain by fine-tuning it with data from the target domain. GAI algorithms can be used to generate data for a target domain and thereby facilitate domain translation. In a study focussed on structural health monitoring of assets, acceleration data from the detection of damage in pipes under a different working condition was used to generate acceleration data for damaged conditions while the structure is healthy [18]. One of the challenges related to domain translation in machine learning is covariate shift, where the distribution of the input data in training differs from that in testing, leading to reduced model performance. An application of GAI for detecting track defects on railroads addresses such a problem of covariate shift and overfitting due to a lack of training data and uses generative adversarial networks (GANs) to eliminate biases [19].

2.3.2 Data Augmentation

Data augmentation refers to the process of applying various augmentation techniques to available data to generate more data that is similar but has varying characteristics,


in order to increase the robustness and generalisation in the training of data-driven models [20]. Generative AI has been used extensively for data augmentation to supplement scarce datasets in fault diagnosis and anomaly detection models, improving model performance and accuracy in both supervised and unsupervised learning [21–23].

2.3.3 Data Quality

Issues related to data quality refer to discrepancies in data such as missing or incomplete information; incorrect, erroneous, or distorted data; inconsistent data; duplicates; overrepresentation or underrepresentation of events creating bias; and issues related to the accuracy and reliability of data. Some GAI models have the capability to tackle such data quality issues, for example by imputing missing data, reducing noise in the data, and decreasing bias in the data through data augmentation techniques.

2.3.4 Anomaly Detection

An anomaly can be defined as an event that differs from the expected usual behaviour. Anomalies in the data are defined based on the requirements of the application domain; with respect to data-driven algorithms, an anomaly is something that deviates from the general behaviour of the data. GAI algorithms learn the underlying pattern in the data to generate new data, and data that has a low probability of being generated from a given dataset can be classified as anomalous [24]. A study on anomaly detection for time-series data, which faces challenges such as a lack of labels and temporal correlations, claims that a GAI-based anomaly detection approach proves to be better than many previously used baseline methods [25].
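One common realisation of this idea scores each sample by how poorly a trained generative model reconstructs it. The sketch below assumes a model with the interface of the VAE sketched in Sect. 2.2.2 and a threshold chosen on known-normal data; both assumptions are design choices, not details from the cited studies.

```python
import torch

@torch.no_grad()
def anomaly_scores(model, x):
    """Mean squared reconstruction error per sample; high values suggest
    low-probability events under the learned data distribution."""
    recon, _, _ = model(x)
    return ((x - recon) ** 2).mean(dim=1)

@torch.no_grad()
def flag_anomalies(model, x, normal_x, quantile=0.99):
    # Threshold from scores on known-normal data; the quantile is a design choice.
    threshold = torch.quantile(anomaly_scores(model, normal_x), quantile)
    return anomaly_scores(model, x) > threshold
```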

2.3.5 Design

GAI can be used to overcome problems like design fixation caused by prior knowledge and pre-conceived notions, and to enhance creativity in the design process [26]. However, another study claims that, with reference to industrial design, GAI can make significant contributions and save time in the early creative phases, while it still lacks the competence to contribute to the later phases of fine-tuning based on stakeholder interactions and business requirements [27].


2.4 GAI in Asset Management of Railways There has been research on the application of GAI to challenges related to the operation and maintenance of railways. GANs were used to decrease data imbalance in track defect detection [19]. The railway fastener detection problem also suffers from imbalanced data, with very few negative samples; GANs were used to generate negative samples and improve fault diagnosis [28]. GANs were also used to generate fake images of outdoor insulators in high-speed rail for the detection and visualisation of anomalies [29]. The use of GAI models in railways has primarily targeted anomaly detection and data augmentation for focussed problems. This paper identifies use cases of GAI in asset management of railways from the broader perspective of challenges related to data-driven approaches in asset management. These use cases apply to multiple systems and subsystems of the railway system, not only to a problem focussed on a single type of asset.

3 Use Cases of GAI in the Asset Management of Railways When augmenting the decision-making process in industrial applications using data, several challenges are encountered. Some of these challenges are directly related to low data availability, data that comes from another context, or poor data quality. This section proposes the basic methodology for using GAI to address challenges related to data-driven approaches in asset management, and then suggests some specific application areas in asset management of railways where this methodology can be used. The methodology for the implementation of generative AI in asset management of railways is shown in Fig. 3. It is based on the basic functionality of GAI models such as GANs and VAEs. There is a generator and a discriminator network: the generator network takes data from the source as input and generates synthetic data based on the patterns learned from that input, while the discriminator model compares the generated data with real data from the target domain and classifies the quality of the generated data. This quality assessment is fed back to the generator network to improve the quality of the generated synthetic data. This methodology is proposed for the use cases related to data-driven approaches in asset management of railways identified in the next sub-section.


Fig. 3 The basic methodology for mapping of data from a source domain to a target domain using generative AI
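As a purely illustrative reading of Fig. 3, the sketch below reuses the GAN components from the sketch in Sect. 2.2.1: a generator pre-trained on abundant source-domain data is refined against small batches of real target-domain data, with the discriminator's assessment fed back as in the figure. All names and dimensions are assumptions.

```python
import torch

def adapt_to_target(G, D, opt_g, opt_d, bce, target_loader,
                    epochs=10, latent_dim=16):
    """Continue adversarial training so generated data matches the target domain."""
    for _ in range(epochs):
        for real_target in target_loader:
            n = real_target.size(0)
            fake = G(torch.randn(n, latent_dim))
            # Discriminator judges generated data against real target-domain data
            d_loss = bce(D(real_target), torch.ones(n, 1)) + \
                     bce(D(fake.detach()), torch.zeros(n, 1))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()
            # Quality feedback drives the generator toward the target domain
            g_loss = bce(D(fake), torch.ones(n, 1))
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```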

3.1 Context Aware Analytics Context-aware analytics for railways refers to the process of analysing asset-related data in the context of its operational environment to make informed decisions. The context of analytics in industrial applications can be based on factors such as the types of end-users, operational environments, maintenance policies, environmental and sustainability constraints, and the professional training levels of the working personnel. Railway infrastructure and rolling stock operate in different geographies and under different traffic conditions in terms of load and frequency. Such varying contexts are one of the reasons why lab-scale demonstrations do not perform well in real-world industrial applications. Populations of assets from the same manufacturer may vary in their performance based on these contextual conditions, and models trained on data from one context may not represent the unique characteristics of another context. There is not always sufficient data to model the behaviour of the asset in each context. Data augmentation and domain translation techniques enabled by generative AI can be used to generate data for similar assets operating in different contexts, by training a GAI model on data from the source domain and using the small dataset from the target domain to improve the model's performance.

3.2 Historical Data and Condition Data It is typical for assets that are new in operation to have very limited operation and maintenance data, while other similar assets that have been in operation for a long time can provide rich historical data. The difference between condition data and historical data for industrial assets can also be seen as data from different contexts; in this case, the contexts are distributed temporally rather than spatially. However, while changes in the spatial context may be more concrete to model, changes in the temporal context can be slow and gradual, such as degrading infrastructure and surroundings, changing weather conditions, the impact of climate change, and changes in


operating personnel, new maintenance policies, etc., which are complicated to track and model. Generative AI techniques such as VAEs use latent variables that cannot be observed or measured directly but enable the modelling of underlying patterns and trends in the data caused by these subtle temporal changes. Generative algorithms can learn the underlying patterns and dependencies in the historical data and generate new condition data for assets, producing more data with variations and allowing robust training of machine learning models.

3.3 Improved Data Quality Several data quality issues are encountered in the context of the railway system. Railway assets are spread over different geographical locations and are operated in a multi-stakeholder environment. Such an ecosystem prevents the adoption of standardised policies for data collection, data sharing, and quality control. This may lead to various data quality issues, such as missing or incomplete data, erroneous data due to faulty sensors or human error, outdated data, and data with different unit systems and naming conventions. Generative AI techniques may help tackle some of these issues by increasing the completeness of data through imputation of missing values, identifying faulty data through anomaly detection methods, and denoising the data by learning the underlying distribution of clean data.

3.4 Natural Language Processing Asset Management Data The issues related to asset data in terms of language processing are generally not the classical challenges such as handwriting recognition: most recorded natural-language asset data in industrial set-ups such as railways is already in digitised form. The main challenges in dealing with such data are the domain-specific language used in the records, data referring to a specific contextual understanding of the domain, data from different sources in different languages, and missing information due to industrial standards for data and information security. Generative AI techniques can be used to learn the domain-specific language and knowledge in order to generate text data that is relevant for the domain. Currently, most advanced NLP algorithms do not perform well on industrial data due to the above-mentioned challenges. It may also be considered to improve the quality of the text data to make it more understandable to out-of-the-box NLP techniques; this can be done by generating a replica of the industrial data that holds the important information but translates the domain-specific terms into interpretable language.


3.5 Fleet to Individual and Individual to Fleet Knowledge Transfer A fleet-based approach for engineering assets to model phenomena such as degradation has been used in research. This approach is based on identifying assets that share some similarities and using the data from the asset population to compensate for the lack of data for individual assets. The two major limitations of this approach are (1) generalisation, i.e. the model reflects the population behaviour and is far from the individual behaviour, and (2) specialisation, i.e. the model is close to the individual behaviour but does not perform well when used for other individuals of the population. Generalisation is observed when there is a large population dataset but a smaller individual dataset; specialisation is observed when some individuals have substantially larger datasets than others, which then have less impact on the population. Generative AI techniques can be used to create a balanced dataset for the implementation of fleet-based approaches by generating data through learning both the population behaviour and the individual behaviour.

3.6 No Fault Found 'No Fault Found' (NFF) refers to situations where an issue has been reported but no fault has been observed. NFF situations in a dataset make the diagnosis of events difficult. Generative AI techniques may help to generate possible faults that may have led to the reported issue. Some of the possible use cases of generative AI in addressing the NFF condition are listed below.
- The data on previously reported issues with observed faults can be used to generate synthetic faults for simulation and testing purposes.
- The historical troubleshooting data on NFF conditions can be used to train generative AI models on domain-specific expertise, such as the course of action, the localisation of components, and the impact of the actions performed in such situations.

3.7 Configuration Management The railway system is a complex system-of-systems with multiple levels of hierarchy. Figure 4 shows an example of this system-of-systems perspective for rolling stock. A pool of high-value components (HVCs) in railway vehicles, such as wheel sets, engines, and compressors, is shared by the vehicles in a fleet. These HVCs operate in different vehicle configurations through their life cycle. The railway vehicle is in turn part of a fleet configuration that generally shares a common operation and maintenance


Fig. 4 An example to illustrate a configurational hierarchy for a railway vehicle

strategy. The fleet of vehicles operates on certain infrastructure configurations based on its operational strategy. A change in any of these configurations may change the degradation patterns of the railway assets, and the behaviour of an asset cannot be analysed independently. For example, the wear on the wheels of a railway vehicle may be related to the condition of the track on which it is running or to the vehicle in which they are mounted. Configuration management in this scenario deals with tracking the change in asset behaviour based on its operational configuration. Generative AI techniques can help generate data for asset configurations for which data is not available, by training on different types of closely similar data, such as similar operating conditions, similar configuration types, and so on.

3.8 Life Cycle Management in a Cross-Organisational Operation and Maintenance Environment In industrial systems it is typical that different assets are owned, operated, and maintained by multiple parties throughout their lifecycle. In such a scenario, the historical and condition data on these assets is also owned by separate parties. To be able to predict parameters such as time between failures and the remaining useful life of an asset, it is crucial to construct the operation and maintenance timeline of the asset throughout its life cycle. Figure 5 shows an example of the timeline of engines in rolling stock in terms of corrective and preventive maintenance activities. It can be


Fig. 5 An illustration of a timeline of maintenance actions performed on a high value component of a railway vehicle

observed that a number of activities are recorded for some periods, while there are no activities in others. Windows of the asset life cycle for which data is not available may lead to inaccurate predictions. Generative AI techniques can be used in such scenarios to impute the missing windows of the asset lifecycle. Data for an asset at a given life-cycle stage could be created from different types of training data, such as data for the same asset at different stages of the life cycle and data for other similar assets at the same stage of the life cycle.

3.9 Maintenance Policy A maintenance policy is the set of rules and guidelines that state how maintenance activities are planned. The preventive maintenance intervals of assets are generally planned based on their design reliability. However, assets are often observed to fail before the planned and scheduled maintenance, which calls for a reconsideration of the asset's maintenance policy. Changing the maintenance policy at an organisational level involves multiple levels of approvals and paperwork, making it difficult to maintain dynamic maintenance policies based on asset performance. Generative AI techniques can be used to train a model on failure data observed under the current maintenance policy and to generate data for a newly proposed maintenance policy. This data can then be used to observe the failure times, conduct a cost analysis for the new policy, and facilitate informed decision-making about the change in maintenance policy.


4 Conclusions There has been tremendous development in AI algorithms in recent times, and GAI algorithms have taken the world by storm with popular applications such as ChatGPT. These generative algorithms offer multiple possibilities for generating data to facilitate data-driven asset management. However, harnessing these technologies in industrial applications requires an understanding of domain-specific challenges. This paper utilises a previous understanding of such challenges to identify key areas where GAI technologies can address challenges related to data-driven approaches in the asset management of railways. The underlying technology of GAI has been published and presented in research articles many times; this paper identifies its use to empower context-aware analytics, condition monitoring, improvement of data quality, addressing no-fault-found conditions, life-cycle management of assets, configuration management in a system-of-systems, and development of maintenance policy in the railway system, a multi-stakeholder, asset-intensive domain with a wide spatial and temporal distribution of asset data. Future work in this field will include the application of these use cases to specific scenarios in railways, which requires methodological contributions in areas such as data pre-processing, algorithm selection, adaptation of available models to the specific scenario, and analysis of model performance. A major challenge in this research is to quantify the effectiveness and efficiency of the suggested use cases in the asset management of railways. Acknowledgements We would like to convey our appreciation to Sweden's Innovation Agency Vinnova, JVTC (Luleå Railway Research Center), Trafikverket, Alstom, Tågföretagen, Norrtåg, Infranord, Transitio, Bombardier, Sweco, Omicold, and Damill and partners for their financial support to carry out this work within the AIFR (AI Factory for Railways) project.

References

1. Karim R, Galar DP, Kumar U (2021) AI Factory: theories, applications and case studies. CRC Press (Manuscript in preparation)
2. Tang R, De Donato L, Besinović N, Flammini F, Goverde RMP, Lin Z, Liu R, Tang T, Vittorini V, Wang Z (2022) A literature review of artificial intelligence applications in railway systems. Transp Res Part C: Emerg Technol 140:103679
3. Kumari J, Karim R, Thaduri A, Castano M (2021) Augmented asset management in railways—issues and challenges in rolling stock. Proc Inst Mech Eng Part F: J Rail Rapid Transit. https://doi.org/10.1177/09544097211045782
4. Kumari J, Karim R, Thaduri A, Dersin P (2022) A framework for now-casting and forecasting in augmented asset management. Int J Syst Assur Eng Manag 13:2640–2655
5. Li X, Luo J, Younes R (2020) ActivityGAN: generative adversarial networks for data augmentation in sensor-based human activity recognition. In: Adjunct proceedings of the 2020 ACM international joint conference on pervasive and ubiquitous computing and proceedings of the 2020 ACM international symposium on wearable computers, pp 249–254


6. Brooks S (1998) Markov chain Monte Carlo method and its application. J R Stat Soc Ser D (Stat) 47:69–100
7. Jo A (2023) The promise and peril of generative AI. Nature 614:214–216
8. Turing A (1936) Turing machine. Proc London Math Soc 242:230–265
9. Shannon CE (1956) A universal Turing machine with two internal states. Autom Stud 34:157–165
10. Shannon CE (1953) Computers and automata. Proc IRE 41:1234–1241
11. Holland JH (1992) Genetic algorithms. Sci Am 267:66–73
12. Markov AA (2006) An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. Sci Context 19:591–600
13. Liu Y, Yang Z, Yu Z, Liu Z, Liu D, Lin H, Li M, Ma S, Avdeev M, Shi S (2023) Generative artificial intelligence and its applications in materials science: current situation and future perspectives. J Mater
14. Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA (2018) Generative adversarial networks: an overview. IEEE Signal Process Mag 35:53–65
15. Ibrahim BI, Nicolae DC, Khan A, Ali SI, Khattak A (2020) VAE-GAN based zero-shot outlier detection. In: Proceedings of the 2020 4th international symposium on computer science and intelligent control, pp 1–5
16. Niu Z, Yu K, Wu X (2020) LSTM-based VAE-GAN for time-series anomaly detection. Sensors 20:3738
17. Mi L, Shen M, Zhang J (2018) A probe towards understanding GAN and VAE models. arXiv preprint arXiv:1812.05676
18. Luleci F, Catbas FN, Avci O (2023) CycleGAN for undamaged-to-damaged domain translation for structural health monitoring and damage detection. Mech Syst Signal Process 197:110370
19. Balogun I, Attoh-Okine N (2023) Covariate-shift generative adversarial network and railway track image analysis. J Transp Eng Part A: Syst 149:4022158
20. Van Dyk DA, Meng X-L (2001) The art of data augmentation. J Comput Graph Stat 10:1–50
21. Gao X, Deng F, Yue X (2020) Data augmentation in fault diagnosis based on the Wasserstein generative adversarial network with gradient penalty. Neurocomputing 396:487–494
22. Lim SK, Loo Y, Tran N-T, Cheung N-M, Roig G, Elovici Y (2018) Doping: generative data augmentation for unsupervised anomaly detection with GAN. In: 2018 IEEE international conference on data mining. IEEE, pp 1122–1127
23. Antoniou A, Storkey A, Edwards H (2017) Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340
24. Zenati H, Foo CS, Lecouat B, Manek G, Chandrasekhar VR (2018) Efficient GAN-based anomaly detection. arXiv preprint arXiv:1802.06222
25. Geiger A, Liu D, Alnegheimish S, Cuesta-Infante A, Veeramachaneni K (2020) TadGAN: time series anomaly detection using generative adversarial networks. In: 2020 IEEE international conference on Big Data (Big Data). IEEE, pp 33–43
26. Hoggenmueller M, Lupetti ML, Van Der Maden W, Grace K (2023) Creative AI for HRI design explorations. In: Companion of the 2023 ACM/IEEE international conference on human-robot interaction, pp 40–50
27. Fang Y-M (n.d.) The role of generative AI in industrial design: enhancing the design process and learning
28. Yao D, Sun Q, Yang J, Liu H, Zhang J (2020) Railway fastener fault diagnosis based on generative adversarial network and residual network model. Shock Vib 2020:1–15
29. Lu X, Peng Y, Quan W, Zhou N, Zou D, Chen JX (2020) An anomaly detection method for outdoors insulator in high-speed railway traction substation. In: 2020 2nd international conference on advanced computing and communication technology. IEEE, pp 161–165

A Neuroergonomics Mirror-Based Platform to Mitigate Cognitive Impairments in Fighter Pilots

Angelo Compierchio, Phillip Tretten, and Prasanna Illankoon

Abstract The constant need to recruit and retain professional fighter pilots has been driven by the rising trend of age-related cognitive impairments among pilots; pilot shortages and the continuous struggle to retain pilots are likely to worsen in the coming years. This crisis requires immediate action and must be addressed by recruiting more pilots and by offering incentives to retain available pilots. The present investigation aims to address this problem with the introduction of a prevention strategy to ensure the continued service of aging pilots. The continuous interaction between the pilot, the flight information system, and the environment is recognised to affect the pilot's performance and errors, potentially hindering the safety of the aircraft. The research direction undertaken is intended to provide pilots with cognitive stimulation to raise flight safety levels and maintain flying-skill proficiency. For this purpose, a novel neuron-based mirror platform is proposed that enables understanding of another pilot's actions and the purposes behind pre-activated targeted responses from motor cognitive domains. The study extends the application of the mechanisms involved in mirror activation by combining a cognitive architecture for generating neuroergonomics mapping (NMP) programs. This interaction will enable pilots to develop learning experience based on observation and synchronised movements. NMP programs would also be tailored to ground-based training systems to enhance tactical air combat manoeuvring for active-duty fighter pilots.

Keywords Cognitive impairments · Mirror neurons · Neuroergonomics mapping

A. Compierchio (B) · P. Tretten
Luleå University of Technology, Luleå, Sweden
e-mail: [email protected]
P. Tretten
e-mail: [email protected]
P. Illankoon
University of Moratuwa, Moratuwa, Sri Lanka
e-mail: [email protected]


1 Introduction The quantum leap exerted by technology on aircraft performance has been startling, requiring the pilot's cognitive ability to keep pace with that performance while remaining "fit" to fly. Still, undertaking a physical activity such as running, or relying on the traditional "donut and coke" before flying, can impair a pilot's physical and mental capabilities enough to degrade performance and reaction times, exacerbating all forms of error. The importance of human performance in aviation continues to follow and exceed Boyd's claim that he could win a simulated engagement with a rival fighter pilot in 40 s or less [1]. This expectancy concerns the same human components for a pilot who flew the Harrier aircraft in the late sixties and a pilot who flies the F-35 today. Assessing and accounting for human cognitive and information-processing limitations started as a paramount military prerequisite and spread to every interactive technological domain. Furthermore, challenges to cognitive abilities are portrayed by situations where the circumstances facing the pilot in a real flight do not match those experienced during training. This disengagement can trigger a chain of events contributing to the degradation or loss of the pilot's capability to create and maintain the complete picture throughout the entire flight. In military aviation, maintaining visual cues is highly critical to the pilot's ability to transition between internal and external guidance. This concern has been emphasised with the evaluation of pilot performance and the coupling of human and technology by adopting a neuroergonomics view of the pilot's cognitive mechanisms. The aim is to uncover and engage neural mirror markers dedicated to human performance under different mental states that potentially affect cognitive functions. The approach entails providing assistance to specific classes of pilots exposed to higher cognitive health risks. The pilots most exposed are not delimited by age alone [2] but include other categories subject to return-to-duty planning [3] following mild injuries at home or work that may result in cognitive dysfunction and attentional impairment [4]. Furthermore, ensuring that pilots are properly trained remains the main emphasis of both civil and military air forces. This reflects key findings from the 2020 US National Commission on Military Aviation Safety [5], which reported that pilots were undertaking fewer training flights, critically affecting competency levels, as a result of budget constraints and the availability of spare parts. Numerous research programs have investigated pilot performance as well as the effects of pilots' mental and emotional states in the cockpit work environment. Traditional assessment techniques have included eye tracking, functional magnetic resonance imaging (fMRI) [6–8], magnetoencephalography (MEG) [9, 10], spectral electroencephalography (EEG) [11], and functional near-infrared spectroscopy (fNIRS) [12].


2 Cognitive Profile Boyd's claim introduced the Energy-Manoeuvrability (E-M) theory as a measure directly relating the aircraft's energy state and rate capabilities to operational efficiency and manoeuvrability. In principle, fighter pilots were expected to possess, or to be able to gain, higher energy levels more quickly before battle began than rival pilots. This advantage necessitated the formulation of a best-path profile within a steady-state envelope by projecting manoeuvring energy on a chart of altitude (h) against velocity (Mach number) [13]. The approximated profile developed by a Dynamic Profile Generator (DPG) was designed to compute normal-G acceleration as a function of time between each energy profile point (hj, Mi). In correlation with Boyd's E-M theory, the competing mechanisms influencing human performance nowadays have been mapped along orthogonal dimensions [14]; both approaches are shown in Fig. 1. Human influence is conceptually presented across the "comfort zone" border, with degraded mental-state conditions such as effort withdrawal, perseveration, inattentional blindness and deafness, and mind wandering. The generation of NMP programs relies completely on the coupling of human and technology and on the mental flexibility exhibited in quantifying the effectiveness of the training acquired for each level of operational readiness. Maintaining this level of training for the required period is key for air forces, as it could have major implications for the operational planning process. In principle, as the trend toward highly automated aircraft increases, it needs to be properly compensated by an equivalent increase in pilot competency. Aviation training has expanded from augmenting world realism with detailed aircraft features, equipment failures, trajectories, responses, and more [15]. This level of realism does not consider what pilots have learned; in practice, simulation requires a more comprehensive approach linking feedback to learning outcomes [16]. This contention has become more visible with pilots unable to get sufficient

Fig. 1 E-M of an aircraft flight path (left), human performance, arousal and task engagement (right) (adapted from [14])


Fig. 2 Air-Force, Navy and Marine Corps physiology episodes (adapted from [18])

This contention has become more visible with pilots unable to get sufficient flight time [17], resulting in a rise in physiological episodes attributed to the pilot's inability to reach the comfort zone while flying the aircraft. Figure 2 shows the trend of reported physiological episodes for fighters (F-15, F-16, F/A-18, F-35), trainer aircraft (T-6, T-38 and T-45), the A-10 attack jet and the C-130 Hercules between 2013 and 2018 [18]. Physiological symptoms consistent with hypoxia also included spatial disorientation, mental exhaustion and temporal distortion. The focus should therefore be expanded according to the individual pilot's needs, because of physical and physiological differences.

Capturing cognitive activities, sensations, perceptions and brain functions is the main task in staging a direct communication link between the external environment (aircraft) and the brain. Neuroergonomics has transformed the way pilot performance is addressed by deeply examining the naturalistic settings underpinned by brain mechanisms in human–machine interaction. The ability to reproduce brain conditions in a laboratory has been exploited to construct a neuron-based mirror platform in a simulation-based training environment, dedicated to enhancing human performance as well as to identifying potential risk factors contributing to pilot errors. The approach to improving the pilot's attentional reaction and response when facing an imminent threat builds on the theory of mirror neurons [19]. Mirror neurons are activated when a human performs an action, imagines performing it, or observes another human performing the same action. A neuroergonomics training program based on this principle would provide a steep ramp-up to full operational readiness.


3 Training Needs

Modern air forces increasingly demand eye-blink-based Situation Awareness (SA) solutions to effectively improve mission success. This requires a cohesive structure for displaying information to pilots from many separate sources, while harvesting the cognitive competences of the pilot's decision making when managing unexpected safety-related hazards. In the early fifties, pilots were mainly trained in technical skills, such as acquiring knowledge about aircraft systems and flying the aircraft. Since then, the role of simulation-based training in aviation has evolved considerably with the introduction of virtual environments that reflect, to a greater extent, the pilot's performance in the real world. Aviation training has continuously expanded the role of simulation, reflected in improved training methodologies; however, it still does not consider the performance outcomes and what has been learned [15]. This matter is as current today as it was two decades ago, and it has been found to be fundamentally interrelated with aviation accidents. A meta-analysis of over 16,000 accidents between 1990 and 1998 involving different aviation communities (U.S. Navy/Marine Corps, U.S. Air Force, U.S. Army, U.S. commercial air carriers and U.S. general aviation) reported that, across all communities, decision and skill-based errors were the primary cause, followed by violations and perceptual errors [20]. This trend continues today, notably in the recently reported occurrences between 2013 and 2020 affecting 186 aircraft [17]. Although simulation training has continuously evolved to cater for cognitive and behavioural aspects [21], the pilot shortage and the shortening of simulation hours could have a snowball effect across the entire military aviation domain [17]. Similar to Boyd's E-M theory, employing a mirror system (MS) for training would have several benefits:

- a shift from an active to a proactive role in emergency scenarios;
- minimised loss of competency and maintained efficiency without increasing the number of sorties;
- support for less experienced instructors;
- a baseline for certain cognitive functions that eliminates most bias;
- a repository (pre-activation database) mapping visual information onto motor actions for aging pilots;
- ad-hoc training for periods of temporary cognitive dysfunction.

These conditions have opened up new research areas covering observational learning [22]. A schematic overview of a novel mirror-based training approach is presented in Fig. 3. The training forms a template encircled by two main human factors: survival and recovery. The pilot must be able to maintain cognitive and task performance during a flight task by controlling physiological and psychological influences.


Fig. 3 Schematic of cognitive abilities congruent with the pilot’s mission performance

The mainstream of the diagram identifies embodied cognition for task-relevant resources spanning the brain, the body and the environment that can be accessed to perform the task [23]. A systematic mapping of physical and psychological profiles could be produced through an evidence-centered design process. From a training perspective, a personalized Brain–Computer Interface (pBCI) model could also be implemented to exploit brain signals collected directly from sorties, capturing behavioural and cognitive abilities for demanding training segments [24].

4 The Perceptual Cycle

Since the prevalence of accidents is due to pitfalls such as decision making, communication, situation awareness and human–machine coordination, an explanatory model of decision making has been introduced to explain the information potentially available from an MS. The importance of decisional errors is characterized by intrinsic knowledge evolving into local rationality [25]. In this context, when an alarm indicates an operational fault in the cockpit, the pilot holds a schema to gather evidence from past experience. Local rationality is thus preserved by scanning the surrounding world to explore wider knowledge and follow-on actions. Neisser's Perceptual Cycle Model (PCM), shown in Fig. 4, has been applied to the aeronautical critical decision making (ACDM) process, with the main emphasis on world information, the individual's mental model, and their role in decision making [26].


Fig. 4 Neisser's Perceptual Cycle Model with potentially available motor information (adapted from [26])

The model recognizes the role of motor acts on the cognitive mental map and their influence on the interaction itself, directly related to the working environment. The potential availability of mirror-based information is intended to highlight its foreseeable contribution to the pilot's cognitive and perceptual skills.

5 Practical Adaptation

The pilot's main responsibility is flying the aircraft safely, as advocated foremost by the sequence of three golden rules: "Aviate (A), Navigate (N), Communicate (C)", commonly known as ANC. The first rule refers to flying the aircraft using flight instruments and controls. The second rule draws attention to the pilot's current and planned flight position. This action is in turn supported, as appropriate, by the third rule: communication with ground-based air traffic controllers. The ANC rule might seem simple to follow; however, human ability to interact with and cope with complex situations is tested when distractions cause the pilot to break the ANC sequence and lose control. The complex interaction between pilot and cockpit systems has prompted the development of solutions to better control human limitations. In this circumstance, there is an overwhelming need to create an inventory capable of achieving the required understanding. Herein, a neuroergonomics platform is positioned at the forefront of capturing the human-factors consequences of complex human–machine interaction. The following accident case report shows a practical example of a simulated adaptation of a mirror countermeasure [27].


Fig. 5 Aligned WA-HMD (left) versus misaligned BA-HMD with “green glow” (right) (adapted from [27])

5.1 Case Report

In a training operation, a student pilot and an instructor pilot operating two F-35 aircraft were engaging an enemy aircraft. The training formation consisted of a White Air (WA) and a Black Air (BA) aircraft; at a certain time the Mishap (accident) Aircraft (BA) touched down and the pilot lost control of the aircraft. A required correction was affecting the Helmet Mounted Display (HMD), which showed misaligned symbology and projected a "green glow" brightness that further distracted the pilot. The pilot tried unsuccessfully to correct the error; lacking visual cues, he became distracted, while the misalignment created a conflict between Instrument Landing System (ILS) and visual data. This led to cognitive degradation contributing to mental fatigue, added to an already prolonged wakefulness admitted by the pilot. The sum of these symptoms caused an accident (Fig. 5). The main cause of the accident was the HMD misalignment, which distracted the pilot right on the approach timeline. This event, coupled with the high landing speed and lack of control, made it cognitively demanding to recognize the fast approach.

5.2 Mirror-Based Warning System

The actual accident resulted from a combination of system error (BA-HMD misalignment) and pilot error. Two F-35 aircraft took part in the training. The take-off time for both aircraft was set at 20:16, while the accident happened at 21:15. In this situation, a mirror warning system could have been activated in two ways: at the time of take-off, or as a result of an indication of failure. At take-off, the Autonomic Logistics Information System (ALIS) in each aircraft would essentially have become entangled, enabling the mirror platform to function as a warning system. In the other case, the detection of an HMD misalignment would have activated a video projecting the WA-HMD view to the BA pilot, as shown in Fig. 6.


Fig. 6 Aligned WA-HMD (left) versus misaligned BA-HMD with a video projection of WA-HMD (right)

The use of the video would have increased the neurophysiological arousal (see Fig. 1) and heightened motor preparation by assigning more attentional resources to the pilot [28]. The animation would have enhanced the motor response and helped the BA pilot address the F-35 system issues.
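As a rough illustration of the two activation pathways described above, the following sketch encodes the decision logic; the class and field names are hypothetical stand-ins for whatever interface the real ALIS and HMD systems would expose, and the sketch is not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class AircraftState:
    """Hypothetical state snapshot; field names are illustrative only."""
    at_takeoff: bool
    hmd_misaligned: bool

def mirror_warning_action(state: AircraftState) -> str:
    # Pathway 1: link the two aircraft's logistics systems at take-off so
    # the mirror platform can act as a standing warning system.
    if state.at_takeoff:
        return "link WA and BA platforms (standing warning mode)"
    # Pathway 2: on a detected HMD misalignment, project the aligned
    # WA-HMD view to the BA pilot to raise arousal and motor preparation.
    if state.hmd_misaligned:
        return "project WA-HMD video to BA pilot"
    return "no action"

print(mirror_warning_action(AircraftState(at_takeoff=False, hmd_misaligned=True)))
```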

6 Platform

The design of the platform begins by localizing the pilot's MS, with data links into a Cognitive Architecture (CA). The localization is obtained with neuroimaging and transcranial magnetic stimulation (TMS). Research experiments have also observed motor acts with fMRI [29]. More specific actions that induced motor activity were associated with tools, hand manipulation and orientation, such as reaching and grasping. The MS is also affected by mu desynchronization, i.e., attenuation of EEG power in the mu band [30]. This effect is induced by attention processes after presenting visual stimuli (e.g., video) in observation states, as shown in Fig. 7. The MS pre-initiated acts could be developed to generate more realistic and cost-effective training plans. At this stage, the CA concept is introduced for improving the understanding of human cognition, also quoted as [31]: ... the fixed structure that forms the framework for the immediate processes of cognitive performance and learning.

In this context, the focus is on building a CA to capture and improve perception, response, and learning from previous experiences. A CA plays a key role in modelling cognitive processes for understanding, predicting and replicating human performance and learning. This role extends to simulation-based cognitive-readiness analyses from one pilot to another, since mirror neurons are activated when observing another person's actions. Over time, the level of operational readiness for pilots increases and decreases; a list of pre-activated actions could therefore be classified according to the level of operational readiness, since the minimum readiness level must be satisfied at all times. After processing, the captured and stored pre-activation data are suited for real-world mission scenarios. Each generated action sequence is realized in specific NMPs.
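As an illustration of how mu desynchronization could be quantified from EEG, the sketch below estimates the relative drop in mu-band (8–13 Hz) power between a resting baseline and an action-observation segment using Welch's method; the sampling rate and the synthetic signals are assumptions for demonstration, not recordings from this study.

```python
import numpy as np
from scipy.signal import welch

FS = 250  # sampling rate in Hz (illustrative)

def mu_band_power(eeg, fs=FS, band=(8.0, 13.0)):
    """Mean power spectral density of an EEG segment in the mu band."""
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[mask].mean()

def mu_erd_percent(baseline, observation, fs=FS):
    """Event-related desynchronization: relative mu-power drop (%)
    from a resting baseline to an action-observation segment."""
    p_base = mu_band_power(baseline, fs)
    p_obs = mu_band_power(observation, fs)
    return 100.0 * (p_base - p_obs) / p_base

# Synthetic demonstration: a 10 Hz rhythm attenuated during observation
t = np.arange(0, 4, 1 / FS)
rng = np.random.default_rng(0)
baseline = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)
observation = 0.5 * np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)
print(f"mu ERD: {mu_erd_percent(baseline, observation):.1f} %")
```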

Fig. 7 A neuroergonomics platform exhibiting a mirror-based warning system with a cognitive architecture



The effectiveness of an MS would increase further with the integration of performance tracking and prediction algorithms, evolving into a new technology for combat aviation.

7 Conclusions

In this study, we introduced a neuroergonomics perspective to understand the neural mechanisms of pilot performance. This discipline provides interesting solutions for investigating pilot incapacitation, mental workload, and attentional impairment. The possibility of extracting information from the brain makes it possible to identify countermeasures that tackle specific mental conditions experienced by the pilot in emergency situations. A broad set of cognitive abilities was identified for future evaluation with neuroergonomics measurements. We used these elements to develop a novel mirror-based platform for enhancing the learning of pilots with impaired cognitive conditions. The platform could be adapted for pilots experiencing temporary cognitive dysfunction or age-related decline. A pilot needs to undertake frequent training to maintain and retain proficiency; in this context, the dynamic adaptation of the platform may well support and optimize both initial and follow-on training. Potential countermeasures of the MS were investigated in an accident report underpinned by a situational-awareness misperception. The feasibility of applying a mirror-based platform to a pilot's cognitive abilities holds the potential for benchmarking individualized mental-health diagnostics for real-time solutions. In a functional context, an MS-CA integrated platform is set to create focused resolutions to combat safety problems and set the scene for further human-cockpit integration.

Acknowledgements Our thanks to the sponsors of the IAI2023 Congress for their intellectual and financial support.

References

1. Coram R (2002) The fighter pilot who changed the art of air warfare. http://www.aviation-history.com/airmen/boyd.htm. Accessed 12 Jan 2023
2. Van Benthem KC, Herdman CM (2021) A virtual reality cognitive health screening tool for aviation: managing accident risk for older pilots. Int J Ind Ergon. https://doi.org/10.1016/j.ergon.2021.103169
3. Jones C, Harasym J, Miguel-Cruz A, Chisholm S, Smith-MacDonald L, Brémault-Phillips S (2021) Neurocognitive assessment tools for military personnel with mild traumatic brain injury: scoping literature review. JMIR. https://doi.org/10.2196/26360
4. Dehais F, Roy RN, Scannella S (2019) Inattentional deafness to auditory alarms: inter-individual differences, electrophysiological signature and single trial classification. Behav Brain Res. https://doi.org/10.1016/j.bbr.2018.11.045


5. LaGrone S (2020) Military commission on military aviation safety. www.news.usni.org. Accessed 12 Dec 2022
6. Mason MF, Norton MI, Van Horn JD, Wegner DM, Grafton ST, Macrae CN (2007) Wandering minds: the default network and stimulus independent thought. Science 315:393–395. https://doi.org/10.1126/science.1131295
7. Christoff K, Gordon AM, Smallwood J, Smith R, Schooler JW (2009) Experience sampling during fMRI reveals default network and executive system contributions to mind wandering. Proc Natl Acad Sci USA 106:8719–8724. https://doi.org/10.1073/pnas.0900234106
8. Fox KC, Spreng RN, Ellamil M, Andrews-Hanna JR, Christoff K (2015) The wandering brain: meta-analysis of functional neuroimaging studies of mind-wandering and related spontaneous thought processes. https://doi.org/10.1016/j.neuroimage.2015.02.039
9. Molloy K, Griffiths TD, Chait M, Lavie N (2015) Inattentional deafness: visual load leads to time-specific suppression of auditory evoked responses. J Neurosci 35:16046–16054. https://doi.org/10.1523/jneurosci.2931-15.2015
10. Durantin G, Dehais F, Gonthier N, Terzibas C, Callan DE (2017) Neural signature of inattentional deafness. Hum Brain Mapp 38:5440–5455. https://doi.org/10.1002/hbm.23735
11. Yi-Feng L, Ji-Jiang Y, Li-Hui Z, Tao Z, Lue D, Bei W (2015) Research of EEG change feature under +Gz acceleration. Comput Ind 20:144–152
12. Mark JA, Kraft AE, Ziegler MD, Ayaz H (2022) Neuroadaptive training via fNIRS in flight simulators. Front Neuroergon. https://doi.org/10.3389/fnrgo.2022.820523
13. Rutowski ES (1954) Energy approach to the general aircraft performance problem. J Aeronaut Sci 21(3)
14. Dehais F, Lafont A, Roy R, Fairclough S (2020) A neuroergonomics approach to mental workload, engagement and human performance. Front Neurosci. https://doi.org/10.3389/fnins.2020.00268
15. Salas E, Bowers CA, Rhodenizer L (1998) It is not how much you have but how you use it: toward a rational use of simulation to support aviation training. Int J Aviat Psychol 8(3):197–208. https://doi.org/10.4324/9781315243092-3
16. Lateef F (2010) Simulation-based learning: just like the real thing. J Emerg Trauma Shock 3(4):348. https://doi.org/10.4103/0974-2700.70743
17. Copp T (2022) As more aviation accidents pile up, key safety recommendations remain undone. https://www.defenseone.com/threats/2022/06/more-aviation-accidents-pile-key-safety-recommendations-remain-undone/368072/. Accessed 2 Feb 2022
18. National Commission on Military Aviation Safety (2020) Report to the president and the Congress of the United States, December 1, 2020. https://www.militaryaviationsafety.gov/report/NCMAS_Final_Report.pdf
19. Luppino G, Rizzolatti G, Matelli M (1998) The organization of the cortical motor system: new concepts. Electroencephalogr Clin Neurophysiol
20. Shappel S, Wiegmann D (2004) HFACS analysis of military and civilian accidents: a North American comparison. In: Proceedings of ISASI, Australia
21. Jafer S, Durak U (2017) Tackling the complexity of simulation scenario development in aviation. In: Proceedings of the symposium on modelling and simulation of complexity in intelligent, adaptive and autonomous systems
22. Bonini L, Rotunno C, Arcuri E, Gallese V (2022) Mirror neurons 30 years later: implications and applications. Trends Cogn Sci 26(9):767–781. https://doi.org/10.1016/j.tics.2022.06.003
23. Wilson A, Golonka S (2013) Embodied cognition is not what you think it is. Front Psychol. https://doi.org/10.3389/fpsyg.2013.00058
24. Ma Y, Gong A, Nan W, Ding P, Wang F, Fu Y (2023) Personalized brain–computer interface and its applications. J Pers Med 13(1):46. https://doi.org/10.3390/jpm13010046
25. Plant KL, Stanton NA (2014) The process of processing: exploring the validity of Neisser's perceptual cycle model with accounts from critical decision-making in the cockpit. Ergonomics 58:1–15. https://doi.org/10.1080/00140139.2014.991765
26. Neisser U (1976) Cognition and reality. W.H. Freeman and Co., San Francisco. Internet Archive: cognitionreality00neisrich

27. F-35A, T/N 12-005053 (2020). https://www.afjag.af.mil/. Accessed 18 Feb 2023
28. Jahanpour ES, Fabre EF, Dehais F, Causse M (2019) Giving a hand to pilots with animated alarms based on mirror system functioning. https://www.frontiersin.org/events/2nd_International_Neuroergonomics_Conference/5421. Accessed 18 Jan 2023
29. Cattaneo L, Rizzolatti G (2009) The mirror neuron system. Arch Neurol 66(5):557–560. https://doi.org/10.1001/archneurol.2009.41
30. Debnath R, Salo VC, Buzzell GA, Yoo KH, Fox NA (2019) Mu rhythm desynchronization is specific to action execution and observation: evidence from time-frequency and connectivity analysis. Neuroimage
31. Newell A (1990) Unified theories of cognition. Harvard University Press, Cambridge, MA. https://psycnet.apa.org/record/1990-98981-000

Risk-Based Safety Improvements in Railway Asset Management Peter Söderholm and Lars Wikberg

Abstract This paper describes results from a research and development (R&D) project at Trafikverket (Swedish transport administration). The purpose of the project is to establish a risk-based framework for continuous safety improvement in railway infrastructure asset management. The approach is based on a combination of methodologies and tools described in dependability standards, such as Fault Tree Analysis (FTA), Failure Modes, Effects and Criticality Analysis (FMECA) and Event Tree Analysis (ETA). The empirical material is related to railway track with focus on safety critical rail defects potentially leading to rail breaks and derailment, but also their management within Trafikverket. Besides identifying risks, barriers and improvements, the approach complies with regulations and mandatory standards. Examples of these are Common Safety Method for Risk Evaluation and Assessment (CSM-RA, EU 402/2013) and EN 50126—RAMS (Reliability, Availability, Maintainability and Safety) for railway applications. In addition, the approach complies with regulatory requirements related to enterprise risk management and internal control, i.e., effectiveness, productivity, compliance and reporting. The approach also supports asset management in accordance with the ISO 55000-series. Keywords Risk · Safety · Asset management · Railway · Infrastructure · Rail · Research · Development · Improvement · Compliance

P. Söderholm (B) Trafikverket and Quality Technology and Logistics at Luleå University of Technology, Box 809, 971 25 Luleå, Sweden e-mail: [email protected] L. Wikberg Trafikverket, Box 809, 971 25 Luleå, Sweden e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_4



1 Introduction

Infrastructure managers, as well as all others operating the railway system, have full responsibility for the safety of their own part of the system. The establishment of a Safety Management System (SMS) is intended to fulfil this responsibility. The purpose of the SMS is to ensure that an organisation achieves its business objectives in a safe manner and complies with its safety obligations. These safety obligations must always be fulfilled in today's ever-changing and complex railway environment, which requires continuous safety improvement. Hence, the SMS requirements are arranged to completely describe an organisation's SMS following a Plan, Do, Check, Act (PDCA) cycle [1]. The SMS should consider each individual requirement, as well as their integration, to form a coherent SMS that controls relevant risks. However, efficient control of risk can only be achieved by jointly considering three critical dimensions of barriers, i.e., Man, Technology and Organisation (MTO). Barriers related to man, or the human, consist of the front-line people with their skills, training and motivation, e.g., maintenance personnel performing tasks in the railway infrastructure. Barriers related to technology consist of the use of tools and equipment, e.g., measurement wagons for inspection of track quality. Barriers related to organisation consist of procedures and methods defining the relationship of tasks. One example of an organisational barrier is the maintenance programme, with different types of tasks and their intervals based on the criticality of inherent items of the railway infrastructure [1].

To support continuous safety improvement, the SMS should be monitored in accordance with the Common Safety Method for Monitoring (CSM-Mon, EU 1078/2012) [2]. This monitoring checks the correct application and the effectiveness of all the processes and procedures in the management system, including operational, technical and organisational barriers (or risk control measures). Correct application is related to productivity, i.e., doing things the right way in accordance with the SMS. Effectiveness is related to the performance of the SMS, i.e., doing the right things. See [1]. The monitoring of an SMS is supported by the use of Common Safety Targets (CSTs) and Common Safety Indicators (CSIs), as described by ERA-GUI-02-2015 [3]. Indicators related to the technical safety of infrastructure and its implementation support an understanding of safety performance. This performance can be measured by CSIs such as collisions of trains, derailments and level-crossing accidents. In addition, CSIs related to precursors of accidents measure the number of precursors that may result in collisions and derailments at national level. If the monitoring of the SMS identifies occurrences of non-compliance with safety requirements that are considered unacceptable, an action plan shall be established. This action plan shall (EU 1078/2012) [2]:

(a) lead to the enforcement of correctly implemented processes, procedures, operational, technical and organisational barriers (or risk control measures) as specified; or

(b) improve existing processes, procedures, operational, technical and organisational barriers (or risk control measures); or
(c) identify and implement additional barriers (or risk control measures).

Since the changes included in the action plan are intended to affect safety, expert judgement should be used to determine the significance of the change. There are six criteria for judging the significance, one of which is failure consequence. This criterion considers the credible worst-case scenario in the event of failure of the system under assessment, taking into account the existence of safety barriers outside the system under assessment. If the safety improvement is judged to be significant, CSM-RA shall be applied (EU 402/2013) [4].

As described above, barrier analysis is central when working with continuous safety improvement within railways. Hence, the purpose of the study presented in this paper is to establish a risk-based framework for continuous safety improvement of railway infrastructure in asset management. The framework is exemplified by a case study and consists of three parts:

- a proposed combination of methodologies and tools for risk assessment and barrier analysis that can be implemented in the SMS;
- an identification of existing hazards and barriers linked to CSTs and CSIs that should be monitored to comply with CSM-Mon; and
- an action plan for the implementation or improvement of existing barriers, but also the addition of barriers when necessary, that follows the process of CSM-RA when the changes are significant.

2 Method and Material

One way to comply with CSM-RA is to apply the EN 50126 standard [5, 6]. This standard provides examples of useful methodologies and tools for the identification of hazards and analysis of barriers, e.g., FMEA/FMECA, FTA, bowtie analysis, and ETA. These methodologies are further described in standards for risk assessment (e.g., IEC 31010) [6, 7], but in some cases also in specific standards for each methodology, i.e., IEC 60812 (FMEA and FMECA) [9], IEC 62502 (ETA) [10] and IEC 61025 (FTA) [11]. Hence, the standard EN 50126 [5, 6] and its recommended methodologies for hazard identification and barrier analysis, as well as the specific IEC standards, are applied in this study.

Fault Tree Analysis (FTA) and Failure Modes, Effects and Criticality Analysis (FMECA) are well-known and commonly applied risk management methodologies for the analysis, evaluation and control of risks. These methodologies are typically utilised in early phases of the system life cycle, e.g., for system safety purposes. However, they can also be used for continuous dependability improvements in later life cycle stages, e.g., during the operation and maintenance phases. Two other common risk-related methodologies applied for safety and dependability analyses are the bowtie diagram and barrier analysis (see, e.g., EN 50126 and IEC 31010) [5–7].
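As a minimal illustration of how a fault tree of this kind can be quantified, the sketch below combines independent basic-event probabilities through OR- and AND-gates; the probability values are invented for demonstration and are not Trafikverket data.

```python
from functools import reduce

def or_gate(probs):
    """Probability that at least one independent basic event occurs."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)

def and_gate(probs):
    """Probability that all independent basic events occur."""
    return reduce(lambda acc, p: acc * p, probs, 1.0)

# Illustrative per-year probabilities for precursors of derailment
rail_break = or_gate([1e-3, 5e-4, 2e-4])   # e.g. several rail defect classes
track_buckle = 3e-4
infrastructure_failure = or_gate([rail_break, track_buckle])
print(f"P(infrastructure failure) = {infrastructure_failure:.2e}")
```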

As its application area, the study presented in this paper focuses on track as part of the infrastructure, for which the TSI [12] also points to EN 50126 as an appropriate standard when working with railway safety. According to regulations [2], CSIs related to precursors that may result in derailments and collisions at national level should always include broken rails and track buckles, regardless of the detection mode and time. However, indicators related to collisions of trains and level-crossing accidents are excluded in this study, since they are mainly related to the Control Command and Signalling (CCS) system. In addition, the material presented here focuses on precursors related to rail breaks, even though the study also considers track buckles and other track misalignment. Hence, besides derailment (as top event) due to rail breaks (as precursor), the main focus is on rail defects [13] and their treatment [14]. In summary, the study includes indicators related to the technical safety of track infrastructure and its implementation for understanding safety performance with reference to derailments.

The empirical material of the study is related to Trafikverket (the Swedish transport administration). As railway infrastructure manager, Trafikverket has the ultimate responsibility for managing safety-related risks in the railway infrastructure. Hence, one important area is traffic safety, which on an aggregated level is monitored by CSIs. One of Trafikverket's strategic goals is to raise the maturity level of its general safety culture from "Established" to "Proactive". This includes working in a more risk-oriented way, among other things with a pragmatic utilisation of FTA and FMECA. The purpose is to improve the capability to work proactively with railway-related traffic safety risks. This includes the application of an integrated MTO perspective (Man, Technology and Organisation) to identify risks in the railway infrastructure, monitor barriers and risk, and propose additional barriers (risk control measures). Furthermore, experiences from occurred events, incidents and accidents need to be taken into account through active participation in accident investigations and by ensuring the implementation and follow-up of their improvement suggestions. In summary, this safety-related railway application is one vital part of Trafikverket's integrated Enterprise Risk Management (ERM) framework, where risk management, continuity management, event management and related capabilities are integrated with each other.

The applied approach is based on an initial FTA and analysis of existing barriers, which in turn were used as input to a FMECA. The FMECA also included the identification of additional barriers to reduce the Risk Priority Numbers (RPNs) that were at an unacceptably high level. Data and information gathering regarding CSIs for FTA, FMECA and ETA was performed in a workshop format with participating experts from different technical areas (e.g., command-control and signalling, telematics, infrastructure, and energy) within Trafikverket. In addition, data from the asset register system (BIS), the inspection system (Bessy), the fault reporting and corrective action system (0Felia), and the accident reporting system (Synergi) were used.
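A minimal sketch of the RPN bookkeeping used in a FMECA is shown below; the failure modes, the 1–10 scores and the acceptance threshold are illustrative assumptions, not values from the study. Additional barriers typically act by lowering the occurrence or detection score of a failure mode.

```python
# Each entry: (failure mode, severity S, occurrence O, detection D),
# all scored 1-10; RPN = S * O * D.
fmeca_rows = [
    ("transverse rail-head crack", 9, 4, 6),
    ("squat with local depression", 7, 5, 4),
    ("weld defect (aluminothermic)", 8, 3, 5),
]

RPN_LIMIT = 150  # illustrative acceptance threshold

for mode, s, o, d in sorted(fmeca_rows, key=lambda r: -(r[1] * r[2] * r[3])):
    rpn = s * o * d
    flag = "ACTION: add/improve barrier" if rpn > RPN_LIMIT else "acceptable"
    print(f"{mode:32s} RPN = {rpn:4d}  ({flag})")
```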

3 Results

The CSIs and related rail defects can be illustrated and quantified in a fault tree at different degrees of detail, see Figs. 1 and 2. An internal barrier of the track system that reduces the probability of rail breaks through design is the selected type of rail and fastening (see Fig. 3). An external barrier that reduces the probability is preventive maintenance (e.g., inspections and related tasks [14]) intended to manage precursors (e.g., rail breaks or track buckles and other track misalignment) and their causes (e.g., rail defects [13]). As one generic illustration, the risk and barrier model can be related to the applicability and effectiveness of Condition-Based Maintenance (CBM) as a barrier, to illustrate the possible consequences for safety performance and availability (and Life Cycle Cost, LCC) due to the degradation of required functions in relation to operational time (see Fig. 4). A quantified example of CBM related to rail defects, rail breaks and derailments is provided in Fig. 5.

[Figure content: a fault tree of OR-gates linking CSI Level 1 (accident types such as collision, derailment, fire, level crossing and electrical accidents) via Level 2 (operative/human failure, vehicle failure, infrastructure failure, environment) and Level 3 (track geometry defects, turnout failures, track deflection, rail defects) down to Level 4 (technical-area-specific failure causes, e.g., aged rail fastenings and rail fractures/cracks/defects).]

Fig. 1 Part of a qualitative fault tree of CSIs. Based on [2]

[Figure content: classification of rail fractures/cracks/defects following the four-level IRS 70712 coding: Level 1 situation (rail ends, plain rail, welds, damage from external influence), Level 2 location or welding method (rail head, web, foot, full section; flash-butt, aluminothermic and other welding methods), Level 3 pattern nature or origin/cause (transverse, horizontal and longitudinal/vertical cracking, corrosion, wear, rolling-surface and gauge-corner defects, crushing, local batter, wheel burns, squats, permanent deformation, welding and resurfacing defects), and Level 4 additional characteristics. Defects with high derailment risk are marked in red; secondary/additional fractures within 3 m are indicated.]

Fig. 2 Part of a qualitative fault tree with rail defects that may lead to rail break (and derailment). Based on [13]


Fig. 3 Example of impact on the accumulated number of rail breaks per kilometre and accumulated tonnage based on type of rails and fastenings [14]

[Figure annotations: system safety performance plotted against system operational time, degrading from normal via reduced and critical operational conditions to an accident condition (top event). Alert limit: requires that the track geometry condition is analysed and considered in the regularly planned maintenance operations. Intervention limit: the value which, if exceeded, requires corrective maintenance so that the immediate action limit is not reached before the next inspection. Immediate action limit: the value which, if exceeded, requires measures to reduce the risk of derailment to an acceptable level. Indicators related to precursors of accidents precede accident-consequence indicators related to people killed or injured, material damage and disruption to traffic.]

Fig. 4 Illustration of condition monitoring and related maintenance as part of Condition-Based Maintenance (CBM) as barriers related to degraded system safety performance
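The three limits can be turned into a simple classification rule for a measured condition indicator, as in the sketch below; the threshold values and the indicator are illustrative assumptions, not regulatory limits.

```python
# Illustrative thresholds for a track-geometry condition indicator;
# real limit values depend on speed class and the applicable regulation.
ALERT, INTERVENTION, IMMEDIATE = 1.5, 2.3, 3.0  # e.g. mm standard deviation

def maintenance_response(indicator: float) -> str:
    if indicator >= IMMEDIATE:
        return "immediate action: reduce derailment risk (e.g. speed restriction)"
    if indicator >= INTERVENTION:
        return "corrective maintenance before next inspection"
    if indicator >= ALERT:
        return "analyse and include in regularly planned maintenance"
    return "normal operation"

for value in (1.2, 1.8, 2.5, 3.4):
    print(f"{value:.1f} -> {maintenance_response(value)}")
```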

In the event of a derailment, an internal barrier in the track system that reduces severity could be a guiding rail (e.g., at tunnels and bridges). An external barrier that reduces the severity of a derailment can be infrastructure design, through the use of energy-absorbing areas nearby or the removal of hard obstacles. Another example of a severity-reducing barrier by infrastructure design is single track, i.e., no collision with another train on a parallel track is possible at derailment. They can

Fig. 5 Quantified example of detection and related preventive or corrective maintenance of rail defects and rail breaks, but also resulting derailments due to rail breaks and failed barriers at a specific track section in Sweden (NDT = Non Destructive Testing) [14]

also be related to traffic management, e.g., speed limitations, which reduce the consequences of a derailment through reduced kinetic energy. Another barrier related to traffic management is to allow freight traffic only, i.e., no casualties among passengers at derailment. See Figs. 6, 7, and 8.

Based on the barrier and degradation models (Figs. 4 and 8), it is possible to construct an event tree (see Fig. 9). Figure 9 illustrates that the entire barrier chain starts from a normal operating situation or normal operating state. Here, the safety-critical functions are active and protected by barriers (barriers NB1 and NB2), which must ensure that required functions are maintained in normal operation.

Fig. 6 The risk level for different types of traffic related to rail breaks and derailment at a specific track section in Sweden (ALARP = As Low As Reasonably Practicable) [15]

Fig. 7 The effect of additional barriers to control the risk related to rail breaks and derailment at a specific track section in Sweden (NDT = Non-Destructive Testing; ALARP = As Low As Reasonably Practicable) [15]

[Figure annotations: the chain runs from subsystem failure (rail defects), via subsystem fault (rail break), to the top event (derailment) and system consequences, i.e., from normal via critical operational condition to accident and recovery conditions. Normally active barriers (e.g., non-destructive testing, NDT) and escalation barriers (e.g., train driver, track circuit) reduce the frequency/probability of an accident occurring, while consequence barriers (e.g., check rail, derailment detector) reduce the consequences of an accident.]

Fig. 8 Model for risk and barrier analysis. Exemplified with derailment due to rail break [16]

This means that the barriers at this level must reduce the frequency of a functional fault, which leads to a critical operational state. For example, this could be a condition where a rail vehicle has to drive at reduced speed. Here, the activation of sufficient barriers (escalation barriers, EB) should prevent a critical operational condition from developing into an unwanted top event (accident). When these barriers fail, an unwanted top event occurs. When this has occurred, the last link in the barrier chain is activated (consequence barriers, CB). These barriers shall reduce the consequences of the unwanted event, e.g., emergency services or securing the area for other traffic.

From a focused safety perspective, system conditions 1 and 2 of the event tree in Fig. 9 can be combined into one. However, [5, 6] state that any actions to manage safety-related risks should be evaluated with regard to availability and Life Cycle Cost (LCC) before acceptance (i.e., an integrated RAMS perspective). Hence, from an availability perspective, it is valuable to separate system conditions 1 and 2 in Fig. 9. In addition, the occurrence of false alarms and the resulting activation of barriers will also affect availability and LCC.

[Figure annotations: event-tree branches over normally active barriers, escalation barriers and consequence barriers (Yes/No at each level; PM = preventive maintenance, CM = corrective maintenance), leading to five system conditions: (1) normal operational condition - planned availability with PM; (2) reduced operational condition - reduced planned availability with postponed CM; (3) critical operational condition - restricted availability with immediate CM; (4) accident condition (top event) - unavailability with consequence barriers and CM; (5) accident condition (top event) - unavailability without consequence barriers, but CM.]

Fig. 9 Event tree for barrier analysis with a focused safety perspective

Hence, the event tree focused on safety (Fig. 9) should be complemented with an event tree focusing on availability (and LCC) related to false alarms at any barrier level (see Fig. 10). In this event tree, the actual safety performance is never compromised. However, the perceived safety performance can be affected (scenarios 7 and 8), which will also affect availability and LCC negatively (see Fig. 10). Based on the event trees illustrated in Figs. 9 and 10, it is possible to establish sequence logics (or scenarios) leading to each consequence. These sequence logics are useful for quantification of the conditional probabilities of their occurrences. This can in turn be based on data that reflect the probabilities for each of the branches of the event trees (see Table 1).

[Figure annotations: event-tree branches for false activation of normally active, escalation and consequence barriers without any subsystem failure (PM = preventive maintenance, CM = corrective maintenance), leading to five system conditions: (6) normal operational condition - planned availability with PM; (7) reduced operational condition - reduced planned availability with postponed CM; (8) critical operational condition - restricted availability with immediate CM; (9) normal operational condition - planned availability with PM; (10) normal operational condition - planned availability with PM.]

Fig. 10 Event tree with a focus on availability (and Life Cycle Cost, LCC) to complement the safety-focused event tree

When there is a subsystem failure (SF), normally active barriers should stop the propagation of the failure event into a faulty state with unwanted consequences (event tree in Fig. 9). However, when there is no subsystem failure, the barriers should not be triggered (event tree in Fig. 10). If the barriers are triggered without any subsystem failure, one reason can be erroneous fault diagnostics (detection and localisation of failures, but also cause identification) at any barrier level (event tree in Fig. 10). In addition, different fault diagnostics at different barrier levels contribute to No Fault Found (NFF) events. The events of the proposed event trees (Figs. 9 and 10) are related to subsystem failure and the function or activation of barriers. These events are listed in alphabetical order below (Table 1).

Table 1 Tabulation of proposed event trees (Figs. 9 and 10)

No.  Sequence logic (barrier states)                      System consequences
1    SF; NB1 functions                                    PM
2    SF; NB1 fails; NB2 functions                         MiA + PCM
3    SF; NB1 and NB2 fail; EB functions                   MaA + ICM
4    SF; NB1, NB2 and EB fail; CB activated               CrS + SA + SCM
5    SF; NB1, NB2, EB and CB fail                         CaS + SA + SCM
6    No SF; no barrier falsely triggered                  PM
7    No SF; NB level falsely triggered                    MiA + PCM
8    No SF; NB and EB levels falsely triggered            MaA + ICM
9    No SF; false triggering up to CB, CB activated       PM
10   No SF; false triggering up to CB, CB not activated   PM

CB = consequence barrier; EB = escalation barrier; NB1/NB2 = normally active barriers 1 and 2; SF = subsystem failure (in the original table, an overbar denotes failure or non-activation of the corresponding event)
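Given branch probabilities for each barrier, the conditional probability of each sequence is the product of the probabilities along its path. The sketch below quantifies the five safety-focused scenarios of Fig. 9; all probability values are invented for illustration and are not data from the study.

```python
# Illustrative probabilities for the safety-focused tree (Fig. 9):
# a subsystem failure occurs and each barrier in the chain may fail.
p_sf = 1e-2        # subsystem failure (e.g. critical rail defect)
p_nb1_fail = 0.1   # normally active barrier 1 fails (e.g. NDT misses defect)
p_nb2_fail = 0.2   # normally active barrier 2 fails
p_eb_fail = 0.05   # escalation barrier fails -> top event (derailment)
p_cb_fail = 0.5    # consequence barrier fails

scenarios = {
    "1: stopped by NB1": p_sf * (1 - p_nb1_fail),
    "2: stopped by NB2": p_sf * p_nb1_fail * (1 - p_nb2_fail),
    "3: stopped by EB": p_sf * p_nb1_fail * p_nb2_fail * (1 - p_eb_fail),
    "4: top event, CB works": p_sf * p_nb1_fail * p_nb2_fail * p_eb_fail * (1 - p_cb_fail),
    "5: top event, CB fails": p_sf * p_nb1_fail * p_nb2_fail * p_eb_fail * p_cb_fail,
}
for name, p in scenarios.items():
    print(f"{name:24s} P = {p:.2e}")
```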


Fig. 11 Loading plot of reduced number of variables related to rail defects and rail breaks [17]

Principal Component Analysis (PCA) can be used to explore differences regarding barriers and consequences related to rail defects and rail breaks (see Fig. 11). The loading plot reveals three distinct groups which, in combination with a score plot (not shown here), reveal track sections with different characteristics. The red circle contains track sections on critical lines. The green circle contains track sections in major city areas. The blue circle represents track sections at marshalling yards and freight stations. The first principal component (PC1) summarises traffic characteristics, both as a cause of defects and breaks in rail, and even more so of the consequences of rail breaks. Hence, corrective maintenance related to rail breaks resulting in delays is also reflected in PC1. The second principal component (PC2) mainly reflects the type of preventive maintenance task, where there are two distinct groups with opposite impact. Rail defects detected through NDT inspections, which are affected by the amount of traffic, have a positive loading. Rail breaks detected through other types of maintenance inspection, which neither result in any major traffic disturbances nor are affected by the amount of traffic, have a negative loading. Regarding PC2, it should be noted that the intervals of both these preventive maintenance tasks are related to traffic characteristics, i.e., at least the tonnage and the speed of trains. Furthermore, compared to manual inspections, machine inspections are probably more cost-effective for linear assets (e.g., critical lines) than for point assets (geographically limited areas, e.g., freight stations and marshalling yards), due to the differences in maintainability and detectability of failures [17].
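A loading/score decomposition of the kind shown in Fig. 11 can be reproduced with standard tools, as sketched below on synthetic data; the variable set and the data are placeholders for the track-section variables used in the study, which are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for per-track-section variables, e.g. tonnage,
# train speed, NDT-detected defects, rail breaks, delay minutes.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))  # 40 track sections x 5 variables

X_std = StandardScaler().fit_transform(X)   # PCA on standardised variables
pca = PCA(n_components=2).fit(X_std)

scores = pca.transform(X_std)               # score plot coordinates (sections)
loadings = pca.components_.T                # loading plot coordinates (variables)
print("explained variance ratio:", pca.explained_variance_ratio_)
print("loadings (PC1, PC2):\n", loadings)
```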

4 Discussion

The barrier analysis is performed for hidden and evident failures that are classified as having safety consequences [18]. Hence, from a maintenance perspective, the barrier analysis is performed to evaluate dependent and independent maintenance tasks. An independent maintenance task is the last effective barrier preventing a failure development from escalating into an accident (EB). A dependent maintenance task is necessary to ensure the expected reliability in operation and is not considered a barrier controlling safety-related risk (NB1 and NB2).

When analysing existing maintenance programmes, the starting point is the maintenance tasks, with the purpose of identifying the required function they are to protect [18]. The analysis then proceeds with the technical system that delivers the required function. This part of the analysis focuses on the ability of the technical system design to maintain the required function. Design solutions to prevent single faults are redundancies or backup functions. If the design itself does not have preventive barriers, a degraded or lost required function can be fully or partially prevented by maintenance or operational activities.

When evaluating the maintenance barriers, decision logic is used to decide on the relevant maintenance strategy, or type of maintenance activity, in relation to the technical system design [19]. The decision logic basically separates functions whose functionality is secured by technical barriers, operational procedures or maintenance tasks. It is also possible to consider design changes (e.g., Fig. 3) or the introduction of monitoring systems. However, availability and maintenance costs must be taken into account. The decision logic in the barrier and event tree analyses is based on the single-failure principle, and the object of analysis is the system that delivers the overarching safety performance. In contrast, FMEA/FMECA and RCM focus on every single Line Replaceable Unit (LRU) as the analysis object. Hence, maintenance and operational barriers are analysed to check the total chain of barriers related to a safety-critical failure and the barriers against its escalation into an accident with unwanted consequences. The contribution of the barrier model to the analysis is to compile all the barriers in the barrier chain in order to determine the critical importance of each barrier. Maintenance activities can be barriers at all levels of the barrier chain. Hence, it is important to note that all technical consequence-reducing barriers have maintenance as the last barrier when the system does not have a Built-in Test (BiT) that gives an indication in the event of a fault in the system. These barriers are not active during normal (NB1 and NB2) and critical operation (EB), but are activated by a top event (CB).

The results of the safety analyses can be displayed in a risk matrix, which illustrates the total risk exposure of the analysed system. The risk matrix should be based on the RAMS standard EN 50126 [5, 6]. However, since maintenance is a result of the technical system design, the likelihood can be expressed as a composition of barriers rather than as small numbers [16]. This approach has two advantages. The first is improved control of the maintainability design characteristics of the technical system. The second is an improved understanding of maintenance in the group performing the analysis. Barrier types and the composition of barrier chains are important for the fulfilment of the analysis's acceptance criteria [15]. For the barrier analysis, barrier types are defined which are linked to the probability of loss of a function that may lead to an undesirable event or accident [15].

The application of EN 50126 [5, 6] and related methodologies for barrier analysis also points to other applicable criteria for accepting changes to the maintenance programme of railway track. The CSM-RA regulation gives three possible principles for risk acceptance [4]. One of these principles is explicit risk estimation, which can be based on an application of the standard EN 50126. Some TSIs are also related to the RAMS standard (e.g., [12]). There exists a quantified level for acceptance of functional failures in technical systems that have a credible direct potential for a catastrophic consequence. However, it is also possible to accept the risk if it can be demonstrated that the national safety level is maintained [4]. This acceptance is related to the Globalement Au Moins Équivalent (GAME) principle, which states that any new or modified transport system should be globally as safe as, or safer than, the existing accepted reference system (EN 50126-2, [6]).
By using the standard EN 50126 and related methodologies and tools, the GAME demonstration can be done in four ways (EN 50126-2, [6]). When using FTA for modelling risk in railway safety according to EN 50126-2, it is natural to use the hazard at the railway system level as the top event.

In addition, these top events can be retrieved through the use of CSIs, the reason being that a hazard at the railway system level is a hazard affecting the railway in operation as a comprehensive system (EN 50126-2). FTA also provides an integration of CSIs with rail defects at lower levels [13].

The study presented here mainly used a combination of FMEA/FMECA, FTA and ETA to model hazards and their barriers. The FTA is related to the first part of the FMEA, i.e., functions, functional failures (failure modes), related failure mechanisms, the nature of the consequences at fault, and the related criticality. One limitation is that the analysis focuses on safety-critical faults related to track, i.e., by using CSIs as top events in the FTA. The lowest level of the FTA was LRUs, i.e., the system level where tasks in the physical infrastructure are performed. However, this is mainly a dependability criterion, which can act as input to a maintenance programme. According to EN 50126-2, the analysis should proceed until the basic events of the FTA are independent. The second part of the FMECA focuses on failure detection and the inherent provisions that exist to compensate for the failure. To support this analysis, the existing barriers were structured in a dedicated event tree. The completed FMECA combines the FTA and barrier analysis in a spreadsheet.

Performing the FTA prior to conducting the FMECA greatly facilitated completion of the FMECA itself, since the graphical illustration of the FTA supported the risk analyst in performing and completing the FMECA. The graphical illustration clearly visualised the extent of the analysed undesired event, focusing on safety-related events and CSIs. Conducting an initial FMECA identified the failure modes' corresponding RPNs, of which certain RPNs were unacceptably high due to insufficient or missing barriers. Furthermore, the development of an organised event-tree structure of barriers facilitated a holistic, hierarchical view of different types of barriers. Creating such an event-tree structure covering existing barriers was beneficial for completing the remaining part of the FMECA, i.e., quantifying RPNs for the failure modes derived from the FTA. Completion of the FMECA resulted in the necessary risk control actions and improvements regarding failure modes with high RPNs, with the corresponding improvements aiming at reducing RPNs to acceptable levels, as well as continuous monitoring of failure modes, risks and barriers.

Future work at Trafikverket includes determining risk areas for railway safety and establishing risk and barrier monitoring. Additional risk analyses are necessary through the application of FTA, ETA and FMECA, with the aim of broadening and deepening the application to additional technical areas besides track as part of the infrastructure (e.g., command-control and signalling, telematics, and energy) and to additional risk areas (e.g., dependability in addition to safety) within Trafikverket.

Acknowledgements We acknowledge the support received from Trafikverket through the projects ASSET (TRV 2022/29194) and Reality Lab Digital Railway (TRV 2017/67785). We greatly appreciate the management commitment from Björn Dellås and Jonas Larsson, and the specialists' technical knowledge as input to the analyses.

References 1. (EU) 2018/762—Establishing common safety methods on safety management system requirements 2. (EU) No 1078/2012—Common safety method for monitoring to be applied by railway undertakings, infrastructure managers after receiving a safety certificate or safety authorisation by entities in charge 3. ERA-GUI-02-2015—Implementation guidance on CSIs 4. (EU) No 402/2013—Common safety method for risk evaluation and assessment 5. EN 50126-1—Railway applications—the specification and demonstration of reliability, availability, maintainability and safety (RAMS)—part 1: generic RAMS process 6. EN 50126-2—Railway applications—the specification and demonstration of reliability, availability, maintainability and safety (RAMS)—part 2: systems approach to safety 7. IEC 31010:2019—Risk management—risk assessment techniques 8. ERA/GUI/02-2008/SAF—Collection of examples of risk assessments and of some possible tools supporting the CSM Regulation 9. IEC 60812:2018—Failure modes and effects analysis (FMEA and FMECA) 10. IEC 62502:2010—Analysis techniques for dependability—event tree analysis (ETA) 11. IEC 61025:2006—Fault tree analysis (FTA) 12. (EU) No 1299/2014—Technical specifications for interoperability relating to the ‘infrastructure’ subsystem of the rail system in the European Union 13. IRS (2018) IRS 70712—rail defects 14. UIC (2015) Leaflet 725—treatment of rail defects 15. Söderholm P (2015) Rail breaks track section 124. Technical report, Trafikverket, Luleå 16. Eriksson A, Asgeir IA, Lennartsson S (2015) Risk assessment derailment due to rail breaks at track section 124. Technical report, Lloyd’s Register Consulting, Sundbyberg 17. Söderholm P, Bergquist B (2016) Rail breaks: an exploratory case study. In: Current trends in reliability, availability, maintainability and safety: an industry perspective, pp 519–541 18. Söderholm P, Nilsen T (2017) Systematic risk-analysis to support a living maintenance programme for railway infrastructure. J Qual Maint Eng 23(3):326–340 19. IEC 60300-3-11:2009—Dependability management—part 3-11: application guide—reliability centred maintenance

Performance of Reinforcement Learning in Molecular Dynamics Simulations: A Case Study of Hydrocarbon Dynamics Richard Bellizzi, Christopher Hixenbaugh, Marvin Tim Hoffman, and Alfa Heryudono

Abstract Utilizing computational frameworks that involve simulation, modeling, and machine learning has gained popularity in lubricant industries to speed up research development. The frameworks serve as digital twins to aid the design of new lubricants by allowing the study of molecular assembling processes, analyzing various candidate chemicals, and understanding their physical properties under different application conditions to complement laboratory experiments. This work aims to evaluate the performance of the Proximal Policy Optimization (PPO) deep reinforcement learning (RL) agent in describing long-chain folding hydrocarbons, compounds commonly used as a main ingredient in lubricant industries. The hexadecane structure is a suitable benchmark molecule for assessing the RL agent. The policy learned by the RL agent encodes the intramolecular characteristics required to dictate the activity of individual molecules. Once trained on ab initio molecular dynamics trajectories, the RL molecular agents act in a virtual environment. Observing the dynamics of topological shapes and their properties then demonstrates the agents' ability.

Keywords Reinforcement learning · Molecular dynamics · Digital twin

R. Bellizzi Fuchs Lubricants Co., 12 Howland Road, Fairhaven, MA 02719, USA e-mail: [email protected] C. Hixenbaugh (B) Department of CSIS, University of Massachusetts Dartmouth, Dartmouth, MA 02747, USA e-mail: [email protected] M. T. Hoffman Fuchs Lubricants Germany GMBH, Friesenheimerstr. 19, 68169 Mannheim, Germany e-mail: [email protected] A. Heryudono Department of Mathematics, University of Massachusetts Dartmouth, Dartmouth, MA 02747, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_5


1 Introduction

The field of synthetic chemistry modeling is rapidly becoming a necessity in various material science industries. Lubricant formulations, in particular, require the use of various chemicals that are optimized to achieve the desired physical and chemical properties for the end customer application. Molecular Dynamics (MD) has emerged as a powerful tool to analyze these combinations of molecules, allowing for deeper insights into product performance [1]. Improving these products ensures that customers obtain lubrication for mechanical systems or damping in motion control applications. Although the overall system of molecules acts as a fluid in these end applications, the effects at the interface between the metal surfaces can significantly alter the properties of the fluid through the different chemical reactions that can occur. Evaluating the effects of these conditions is often challenging, as obtaining accurate measurements can be costly and time-consuming. However, Machine Learning (ML) has yielded methods that leverage historical experimental data to predict the outcomes of these scenarios. While ML methods have significantly improved product understanding, simulating these scenarios in addition to these approaches pushes the depth of that understanding even further.

Reinforcement Learning (RL) is gaining traction, particularly in control systems, where traditional PID controllers are being replaced with RL agents to steer the system's actions [2]. MD and quantum chemistry (QC) systems are similar to control systems in that they are subject to physical restrictions dictated by some function [1, 3]. For example, benchmark RL control-system environments, such as the Bipedal Walker, require the RL agent to control the torque of each joint motor to enable walking. Similarly, in MD, the positions of each atom are governed by Newtonian physical equations. Evaluating the most likely events robustly requires a statistical distribution over these actions. RL provides a way of learning the distribution and encoding the probabilistic action space for a given molecular structure. The encoded information can then act as a control system for that molecule in future simulations (Fig. 1).

Establishing this experiment required the integration of several tools. A proof-of-concept example using the alkane hexadecane acts as a trial structure to demonstrate the integration process and corresponding results.

Fig. 1 Hexadecane structure C16H34


Given the physical laws surrounding atom motion, selecting an RL agent that performs well in continuous action spaces is essential. Additionally, the selected agent requires an environment to train on, and Gymnasium environments are often used in benchmarking agents' performance [4]. This Python framework provides the capability to build a custom environment where the molecular RL agent can train. Compared to other Gym environments, this environment presents the agent with the task of following or mapping the movement contained within a trajectory, allowing the higher-level constraints of the trajectory to drive the training environment. The bonds, angles, and dihedrals of a molecule provide multiple attributes for learning control. If the RL agent learns a probability space for each atom in the molecule based on rewards built around these features, it can learn a motion policy from the training environment. For example, the agent predicts new coordinates at each step of the environment. The bonds, angles, and dihedrals are evaluated against a distribution with the new coordinates as inputs. These rewards indicate how likely each step is. The action of moving an atom in a way that stretches a bond would have a lower probability, resulting in a larger punishment. The learning task is then to move atoms in a way that expends the least amount of energy, i.e., incurs the least punishment in reward terms.

2 Chemical Simulation Details

Developing a deep understanding of lubricants using molecular dynamics (MD) simulations requires a large number of simulations to cover the wide variety of molecules and their interactions in various scenarios. Due to the reduced accuracy of MD simulations, many repetitions of each scenario are needed to ensure statistically significant results [1]. In contrast, quantum chemical (QC) calculations provide more accurate results but are computationally expensive [3].

The behavior of different molecules in MD simulations is dictated by a molecular potential energy function, which is often represented using models such as Lennard–Jones or machine-learned potentials like TorchANI [1, 5]. The selected force field(s) describe the forces and velocities of the system through time. To capture intra- and inter-molecular forces in the MD simulation, a combination of force fields is often required to govern all the interactions that occur [1]. Machine learning (ML) methods like TorchANI can predict these forces and velocities after training on QC data. The generalization of a large number of QC calculations into regression problems demonstrates how these ML methods enhance MD simulations [5]. While this generalization is powerful, additional information may be required in unique scenarios through additional force fields or control mechanisms. Providing granular control with the flexibility and speed of an ML model is still ideal for these scenarios. Training RL agents on individual structures could be an avenue toward this type of adaptive simulation control.
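To make the ML-potential idea concrete, the following is a minimal sketch of querying a pretrained TorchANI model [5] for energies and forces; the choice of the ANI1x model and the methane geometry are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch (not the authors' code) of querying a pretrained TorchANI
# potential [5]. Forces follow from autograd as the negative energy gradient.
import torch
import torchani

model = torchani.models.ANI1x(periodic_table_index=True)

species = torch.tensor([[6, 1, 1, 1, 1]])               # one CH4 molecule
coordinates = torch.tensor([[[ 0.000,  0.000,  0.000],  # Angstrom
                             [ 0.629,  0.629,  0.629],
                             [-0.629, -0.629,  0.629],
                             [-0.629,  0.629, -0.629],
                             [ 0.629, -0.629, -0.629]]], requires_grad=True)

energy = model((species, coordinates)).energies          # Hartree
forces = -torch.autograd.grad(energy.sum(), coordinates)[0]
print(float(energy), forces.shape)                       # energy and (1, 5, 3)
```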


Configuring simulations to specific structures requires defining unique force fields for each structure. In this way, each structure has information on how to behave in the combined simulation. Before controlling multiple structures, it needs to be determined whether an agent can regulate a single structure. Leveraging QC software, the molecular structure is run in a controlled simulation, yielding a trajectory file for a 250 K equilibrium canonical-ensemble simulation [3]. The generated trajectory file contains all the physical constraints and the observable states defined by the simulation. Testing whether this information maps to a reasonable distribution demonstrates the capacity of an RL agent to learn molecular equilibrium dynamics. The agent can build new distributions for how that structure might adapt in new situations by training on different scenarios. This enables the agent to become an isolated “digital twin” of that structure within the conditions dictated by the trajectory file. By encoding underlying details, the agent can ingest more detailed distributions from QC-level trajectories, allowing for various other scenarios to be represented and evaluated. The aim is to prove that the agent can learn a steady-state scenario and to discretize MD simulations into personalized control of each structure.
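One way to produce such an equilibrium trajectory is sketched below with ASE. This is an assumption-laden stand-in: 'hexadecane.xyz' is a hypothetical input file, and ASE's crude EMT calculator replaces DFTB+ [3] only so that the snippet runs self-contained.

```python
# Sketch: a 250 K canonical-ensemble (NVT) trajectory with ASE.
# EMT is a rough stand-in calculator; the paper's trajectories come from DFTB+.
from ase import units
from ase.calculators.emt import EMT
from ase.io import read
from ase.io.trajectory import Trajectory
from ase.md.langevin import Langevin
from ase.md.velocitydistribution import MaxwellBoltzmannDistribution

atoms = read('hexadecane.xyz')          # hypothetical structure file
atoms.calc = EMT()                      # stand-in for the QC method

MaxwellBoltzmannDistribution(atoms, temperature_K=250)
dyn = Langevin(atoms, timestep=1.0 * units.fs, temperature_K=250, friction=0.02)

traj = Trajectory('hexadecane_250K.traj', 'w', atoms)
dyn.attach(traj.write, interval=10)     # record every 10 steps
dyn.run(5000)                           # 5 ps of dynamics
```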

3 RL Learning Details

3.1 Deep Reinforcement Learning

Reinforcement Learning can be used to solve sequential decision-making problems such as control problems. An example of a control problem is adaptive cruise control for autonomous vehicles [2, 6]. Sequential decision-making problems can be abstracted and formalized as a Markov Decision Process (MDP). MDPs consist of three parts: states, actions, and rewards. At each time step of an MDP, the agent observes a state, s_t ∈ S, selects an action, a_t ∈ A, and receives a numerical reward, r_t ∈ R ⊂ ℝ. Following that sequence, the agent observes the environment's next state, s_{t+1}. Here, S is the set of all non-terminal states, A is the set of all actions available in a state s, R is the set of possible rewards, and ℝ is the set of real numbers. This information can be used to compute the state-transition probabilities, defined by p(s′ | s, a) = Pr{S_t = s′ | S_{t−1} = s, A_{t−1} = a}, where s′ is the next state [7]. The agent's goal is to select actions that affect the environment in a way that maximizes the numerical reward. The reward is a metric that is commonly used to quantify agent performance. The MDP is illustrated in Fig. 2.
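The loop below expresses this interaction in Gymnasium terms, with a random policy standing in for the agent; 'MoleculeEnv-v0' is a placeholder id for the custom environment introduced in Sect. 5.

```python
# The MDP loop of Sect. 3.1 as a Gymnasium interaction loop: observe s_t,
# pick a_t, receive r_t and the next state s_{t+1}.
import gymnasium as gym

env = gym.make("MoleculeEnv-v0")        # placeholder environment id
state, info = env.reset(seed=0)
total_reward = 0.0

terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # a_t (random policy for illustration)
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward              # the agent's goal: maximize this sum
env.close()
```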

3.2 Recurrent Proximal Policy Optimization

Proximal Policy Optimization (PPO) agents, developed by OpenAI, have proven adept at robotic control systems. These agents follow an Actor-Critic style, a proven method for maximizing the reward for a wide array of problems.


Fig. 2 Markov decision process [7]

Fig. 3 Actor-critic architecture

Figure 3 shows how the offline agent learns. This particular algorithm can optimize recurrent neural networks such as LSTMs. The advantage of using PPO and LSTM stems from how the network can determine the best representation of the states based on the sequential set of observations. The PPO algorithm is an example of a policy gradient algorithm that performs clipping on the policy to stabilize the training process. The PPO algorithm is favorable for molecular structure control as it is stable in more complex states. Rather than explicitly designing a state, this approach lets the model generate the states based on the previous observations [8]. Automatically learning states contained in the trajectory through sequentially stored observations is a desirable feature in mapping physical states to various application scenarios.

The algorithm shown in Fig. 4 displays the process for running PPO with an Actor-Critic style [8]. Here, a trajectory means a single round of iterations for the current policy. These trajectories provide updates to the Actor and Critic through experiences gained over numerous repeated attempts. The Actor and Critic are LSTMs with a single recurrent layer and a hidden size of 128 for these experiments. The networks are optimized using the Adam optimizer. As shown in Fig. 3, the Actor optimizes its policy distributions, and the Critic optimizes the value function based on the experiences gained through the previous policy. After these updates, the Actor generates a policy, which is then evaluated in the environment. The advantages generated from these updates allow the Actor to update its weights and develop a new policy for the next iteration. These updates are performed for 10 epochs with batch sizes of 512. The Gymnasium framework offers


Fig. 4 PPO, actor-critic style [8]

a way to vectorize an environment, allowing parallel runs to occur simultaneously [4]. Leveraging this feature, the MoleculeEnv is run across 6 different instances. All the generated data here were computed on a workstation equipped with an Intel 11th Gen i7 8-core CPU with 32 GB of RAM.
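For concreteness, a comparable setup can be assembled with sb3-contrib's off-the-shelf RecurrentPPO, using the hyperparameters quoted above (single LSTM layer of hidden size 128, 10 epochs, batch size 512, 6 vectorized environments). This is a sketch under assumptions: the paper cites a custom PyTorch recurrent-PPO implementation [13] rather than this library, and 'MoleculeEnv-v0' is a placeholder id.

```python
# Hedged sketch: recurrent PPO training with sb3-contrib's RecurrentPPO
# (LSTM actor-critic with clipped policy updates), mirroring the reported
# hyperparameters. Not the authors' implementation.
from sb3_contrib import RecurrentPPO
from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env("MoleculeEnv-v0", n_envs=6)   # 6 parallel instances

model = RecurrentPPO(
    "MlpLstmPolicy",
    vec_env,
    n_epochs=10,          # 10 update epochs per rollout
    batch_size=512,       # minibatch size from the paper
    policy_kwargs=dict(lstm_hidden_size=128, n_lstm_layers=1),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)   # training budget is an assumption
```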

4 RL in Chemistry

Previous work on RL within chemical simulation has revolved around conformer searches, reaction pathways, and various other methodologies [9–12]. These approaches have shown promising results and have demonstrated that RL modeling of MD simulations continues to gain traction. An interesting RL method discussed in [10] demonstrated the capacity of these agents to explore large design spaces to search for and evaluate new conformers. Finding new materials or interaction methods using these approaches becomes a real possibility. These models can manage QC-level accuracy with the goal of optimizing structure geometry, generating conformers on par with an expensive computational equivalent while sustaining detailed information [9]. The work presented in [12] uses RL to investigate reactions by formulating the problem as a game to track down a chemical reaction's transition state. Allowing the agent to interact with the MD trajectories and evaluate the different pathways showed how these approaches successfully pair with MD, providing methods for interpreting and learning from these simulations. Like these previous efforts, this approach follows a similar method, where the learned goal is unique to whatever simulation is input to the environment. MD is notably present in pharmaceutical settings, where bio-molecular structures such as proteins are the main focus [11]. These areas provide avenues for drug discovery and the different chemical pathways to get there while accurately capturing the dynamics within these problem spaces.

Presenting MD in the context of a control problem allows an RL agent to become a feasible solution for manipulating a molecule. The movement of each atom and the movement of the molecule as a whole are two factors that the agent learns. The forces and the velocities are calculated at each step when these positions change during MD simulations. These factors are externally dictated by the overarching temperature


and pressure conditions presented to the structures and regulated by thermostats and barostats [1]. These factors are embedded in the training environment, and the agent implicitly builds its observed probability space around them. Hence, while the trajectories drive the learning environment, extending the learned probabilities to future applications revolves around pursuing a personalized force field in larger simulations. For example, lubricants are composed of several types of molecules in varying combinations, with some molecules comprising only 1% of the total mixture. Capturing the isolated effects these structures experience within the entire system presents an opportunity to study interactions and the changes in the system due to these interactions. Seeing how RL is permeating problems like controlling drones and other automated vehicles, the dynamics of individual molecules provide an alternative avenue for exploring successful control algorithms. A Recurrent Proximal Policy Optimization (PPO) agent has been proven to work on benchmark problems like the bipedal walker, which operates a two-legged system with multiple joints on each leg [4, 13]. Manipulating each joint allows the walker to progress forward until it can no longer maintain that progress. Similarly, this approach enables the agent to chase the trajectory, with each atom acting like a joint of the bipedal walker. Considering these similarities and the successes of previous approaches utilizing RL in chemistry, the experiment builds on the concepts of these earlier works to test whether RL can provide an avenue toward customized MD controllers.

5 RL Environment Details

The creation of the MoleculeEnv utilizes the Gymnasium Python library [4], which offers a framework for generating personalized RL environments. ASE, the Atomic Simulation Environment, is a widely used computational chemistry library known for its compatibility with various QC and MD software. For this experiment, DFTB+ [3], an open-source QC software package that provides accurate results at a reasonable computational cost, is used to produce a trajectory utilizing the 3ob-3-1 parameter set. Although the trajectory file is loaded using a LAMMPS data file format in this study, other trajectory files can be utilized interchangeably. This approach provides flexibility in balancing computational cost and accuracy. Pre-defining molecular features such as bonds, angles, and dihedrals is useful for establishing general rewards, as all molecules contain them. During initialization, the MoleculeEnv loads the LAMMPS data and trajectory files and gathers distribution statistics for each bond, angle, and dihedral from the trajectory file over every step. The reward function uses this trajectory to represent how each bond should behave. The agent's goal is to learn how to map each atom's movements according to how its corresponding bonds, angles, and dihedrals change within the loaded trajectory. These statistics are utilized in the reward function to compute appropriate rewards during training. In this environment, the observations for the agent include the Cartesian coordinates (x, y, z) of each atom, the current energy, and the change in energy from


the previous step. Using these observations, the agent selects new positions for each atom as part of the action space for this environment. Because the coordinates can take on any real values, the action space for this approach is continuous.
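The skeleton below captures these observation and action spaces. It is a sketch under assumptions, not the authors' code: class internals, bounds, and helper names are illustrative, and the reward is deferred to the sketch in Sect. 5.1.

```python
# Skeleton of the described spaces: all atom coordinates plus the current
# energy and its change are observed; the action proposes new coordinates,
# so both spaces are continuous Boxes.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class MoleculeEnv(gym.Env):
    def __init__(self, n_atoms: int, max_steps: int = 1000):
        super().__init__()
        self.n_atoms, self.max_steps = n_atoms, max_steps
        obs_dim = 3 * n_atoms + 2    # (x, y, z) per atom + energy + delta-E
        self.observation_space = spaces.Box(-np.inf, np.inf, (obs_dim,), np.float32)
        self.action_space = spaces.Box(-np.inf, np.inf, (3 * n_atoms,), np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.energy, self.d_energy = 0, 0.0, 0.0
        self.coords = np.zeros((self.n_atoms, 3), dtype=np.float32)
        return self._obs(), {}

    def step(self, action):
        self.t += 1
        self.coords = action.reshape(self.n_atoms, 3)  # agent-proposed positions
        reward = 0.0            # z-score/RMSD reward; see the Sect. 5.1 sketch
        truncated = self.t >= self.max_steps
        return self._obs(), reward, False, truncated, {}

    def _obs(self):
        return np.concatenate(
            [self.coords.ravel(), [self.energy, self.d_energy]]).astype(np.float32)
```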

5.1 Reward Functions

Since the bonds, angles, and dihedrals demonstrate specific behaviors in trajectories, it is reasonable to fit a distribution for each of these and then compute rewards using the precomputed distributions. Since these distributions are fit prior to training, there is some initial setup time, but the training benefits overall since rewards are computed using cheap evaluations. Assuming these attributes fit a normal distribution, the z-score provides an approach similar to the bipedal walker mentioned earlier. In the bipedal walker, each action or application of torque to a joint requires energy, and the reward is computed relative to the amount of torque applied. More torque consumes more energy; therefore, the agent is punished more, whereas if it only uses a small amount, it is punished less. This means that for the rewards related to the joints, the reward is at most 0 if no torque is applied anywhere. These rewards, combined with the reward evaluating how far the walker has progressed, have proven successful in evaluating robotics control problems. Similarly, for this environment, each attribute is compared to its z-score, with more energy consumed when the agent moves in lower-probability spaces than in higher ones. The closer to the mean the action is, the closer to zero its punishment. As the mean and standard deviation are computed at the start of the environment run, each attribute i of the structure is evaluated at each step using the z-score

score_i = (x_i − μ_i) / σ_i.

These individual evaluations for the bonds, angles, and dihedrals provide individual rewards to tune the distribution of each attribute. The trajectory tracking is performed using a Root Mean Squared Deviation (RMSD) metric between the agent's positional arrangement and the trajectory's true positions. All these values are brought together as a total reward for the agent:

reward = bonds + angles + dihedrals + rmsd.

As the model improves its trajectory tracking, the reward balances maintaining this behavior while finding the best way to adjust the atoms. Consolidating the z-scores of every attribute and the deviations from the trajectory evaluates how the agent performs. Each tracked bond, angle, and dihedral has z-score metrics precomputed. The root-mean-squared deviation metric acts as the main constraint within the environment. The environment uses these rewards collectively to leverage a similar approach to the bipedal walker. This means that both types of control problems can yield new types of agents that are transferable across domains. Regarding evaluating molecular structure dynamics, the MoleculeEnv is independent of what structure is desired.
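A minimal sketch of this reward follows; function and array names are illustrative, and the per-term weighting is an assumption (the paper notes the balance between terms is itself an open tuning question).

```python
# Sketch of the Sect. 5.1 reward: per-attribute z-score penalties against
# statistics precomputed from the trajectory, plus an RMSD tracking term.
import numpy as np

def zscore_penalty(values, means, stds):
    # Negative absolute z-score: moves near the trajectory mean cost ~0,
    # low-probability configurations are punished more.
    return -np.abs((values - means) / stds).sum()

def rmsd(positions, reference):
    # Root mean squared deviation between agent and trajectory positions.
    return np.sqrt(np.mean(np.sum((positions - reference) ** 2, axis=1)))

def total_reward(bonds, angles, dihedrals, positions, stats, reference):
    r = zscore_penalty(bonds, *stats["bonds"])
    r += zscore_penalty(angles, *stats["angles"])
    r += zscore_penalty(dihedrals, *stats["dihedrals"])
    r += -rmsd(positions, reference)   # trajectory tracking as main constraint
    return r
```

Here `stats` would hold the `(means, stds)` pair per attribute type, gathered once from the loaded trajectory during environment initialization.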


6 Results

To achieve transferability to new scenarios, the reward function must generalize behavior effectively. Evaluating different reward approaches is necessary for this purpose. The energy of the system and deviation from the trajectory contribute to the learning objective along with explicitly defined statistical rewards for the molecular attributes. These types of rewards are essential for capturing the overall physics of the environment. Even so, when trained under these conditions, the model learned to center all atoms around a single point and orbit around it. Using the trajectory during training captures the physics involved without requiring a highly customized reward function. Additionally, since the trajectory contains these details and can be run with arbitrary QC-level accuracy, explicitly encoding a detailed probability distribution of the regulated simulation offers a way to create a general reward function.

Since the distributions for each molecular attribute are precomputed, there is a direct comparison between the expected distributions and the distributions generated by the agent. Figures 5 and 6 show the distributions generated for this particular 250 K steady-state dynamics simulation [3]. In the trajectory, it is clear that two bond types are present, corresponding to the C–C and C–H bonds, as seen in Fig. 5. The standard deviation appears to reflect each bond's rigidity: tighter distributions relate to stiffer bonds, while broader ranges allow for more movement at a reasonable cost.

The importance of modeling each attribute separately becomes apparent after examining the dihedral distributions (Fig. 6). The majority of the dihedral means revolve around three distinct rotations. That said, there are a few standouts with minimal restrictions on their rotation, as shown in Fig. 6. These dihedrals lean towards a single rotation while requiring much less energy than the others to move to a new rotation. The few high-variation examples in Fig. 6 give direct insights into each movement. Looking at the violin plot is informative as it points to differing behavior between the dihedrals. This level of detail provides access to evaluating and interpreting individual intramolecular attributes.

Fig. 5 Bond length distribution for hexadecane at 250 K


Fig. 6 Dihedral distributions for hexadecane at 250 K

Fig. 7 Agent reward progress after 20 h of training

This already provides an avenue to structure–property relationships in fragmentation simulations, identifying high-probability events and where they occur in the structure. Figure 8 shows dihedral 0 highlighted in yellow; comparing it with the dihedral violin plot in Fig. 6 leads to the conclusion that these terminal dihedrals have more torsional freedom than internal dihedrals during the equilibrium state.

Fig. 8 Dihedral 0 identification


Fig. 9 Bond distribution comparison

Since there is a clear structure for the agent to match, these features can be evaluated in a similar manner as the agent determines new positions. Looking at the agent's distributions evaluated over a matching number of steps gives the key measure of how well the agent is performing. Figure 9 shows what distributions the agent has learned for the bonds so far. It is apparent that the agent is still optimizing each individual bond type, as it has a single distribution representation for all bonds, as shown in Fig. 9, with the distributions starting to shift into place after 180 h of training. The reward is still improving, as shown in Fig. 7, so it can be examined at periodic intervals to evaluate the agent's continuing progress.

The initial convergence and increase in reward observed at around 700 steps are quite intriguing. The subsequent drop-off suggests that the model's patience, which was set to a few hundred steps, may have caused the early termination of the learning process. At this stage, the agent has not yet fully learned all the nuances of the problem. Still, an examination of the angle distributions reveals that it has started tightening them, bringing them closer to the target quantum chemistry (QC) trajectory. The convergence behavior can be attributed to the exploration–exploitation trade-off, with the observed peaks and valleys reflecting this dynamic. A peak signifies that the agent has learned a local minimum, after which it ventures out of this region to optimize other features. Given the high dimensionality of the problem, with approximately 272 features, including bonds, angles, and dihedrals, this problem exhibits more intricate dynamics, necessitating longer training times. The agent continuously refines its policy to balance these features, and avoiding convergence to local minima is of paramount importance, considering the nearly infinite configurations the structure can assume. The observed improvement and subsequent drop-off in performance suggest that longer training durations may reveal additional stepwise learning characteristics as the agent evaluates each new addition to the policy, both individually and in conjunction with other features.


One possible approach to accelerating convergence could be the implementation of offline learning. Instead of randomly taking action, the agent could begin with a pre-trained policy that maps the RMSD trajectory. This approach would allow the agent to focus on adapting the distributions to any new scenarios introduced from the QC side, potentially leading to more efficient learning and improved performance.

7 Conclusion

This work examines the performance of RL agents interacting in a molecular environment, focusing on a single structure. However, further testing is necessary to determine the full potential of this approach. Various configurations were explored through short-term training to test an agent as a molecular controller, with the longest training time around 180 h. Although high-performance systems were not fully utilized in this work, the vectorized environment structure allows for more parallel evaluations at scale, enabling the learning task to be divided. Some optimization is required to take full advantage of modern GPU acceleration on top of this inherent parallelization.

The reward function was modified to optimize the agent's performance. Scaling the attribute rewards appropriately relative to the trajectory-tracking reward influenced the training objective. If the attribute rewards saturate the tracking reward, the agent may prioritize improving these attributes over following the trajectory. The balance between these objectives is challenging, and further exploration is needed to determine the optimal reward split. Long-term training may also provide a solution, where the policy learns to follow the trajectory until that reward is minimal, and the scale then shifts the reward punishments primarily towards the individual attribute distributions.

A theoretical “training regimen” could be implemented to generalize structure behavior across different scenarios. An adversarial approach, where the agent is directly tied to simulation software, would allow for a more direct learning environment. Generating various settings with varying conditions and allowing the agent to walk through them repeatedly presents an opportunity for online training systems. Adjusting the reward function to regulate the training process is always an area for optimization. Additionally, allowing the forces and velocities to be observed provides another avenue for adjustment. Overall, the tracking method used to evaluate the agent's lag behind the trajectory has the most opportunity for improvement, as this feature drives the implicit constraints for the simulation. As a multi-disciplinary project combining chemistry, machine learning, and computational methodologies, there are numerous opportunities for optimization. Consolidating these topics is the first step, with the expectation that each area can be improved, bringing a tunable and adaptive control system for MD simulations another step closer. A force field potential that focuses on its task alone presents an intriguing opportunity to supplement existing MD simulation capabilities, especially for complex environments where lubricants operate.


Acknowledgements RB and MTH gratefully acknowledge the computational and research support provided by Fuchs Lubricants. CH and AH thank UMass Dartmouth CSCDR (Center for Scientific Computing and Data Science Research) for computational support. AH is supported by the UMass Dartmouth MUST ONR Project S31320000056116.

References

1. Thompson AP, Aktulga HM, Berger R, Bolintineanu DS, Brown WM, Crozier PS, in 't Veld PJ, Kohlmeyer A, Moore SG, Nguyen TD, Shan R, Stevens MJ, Tranchida J, Trott C, Plimpton SJ (2022) LAMMPS—a flexible simulation tool for particle-based materials modelling at the atomic, meso, and continuum scales. Comput Phys Commun 271:108171
2. Laumônier J, Desjardins C, Chaib-draa B (2006) Cooperative adaptive cruise control: a reinforcement learning approach. In: The fourth workshop on agents in traffic and transportation, Hakodate, Hokkaido, Japan
3. Hourahine B, Aradi B, Blum V, Bonafé F, Buccheri A, Camacho C, Cevallos C, Deshaye M, Dumitrica T, Dominguez A et al (2020) DFTB+, a software package for efficient approximate density functional theory based atomistic simulations. J Chem Phys 152(12):124101
4. Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) OpenAI Gym. arXiv preprint arXiv:1606.01540
5. Gao X, Ramezanghorbani F, Isayev O, Smith JS, Roitberg AE (2020) TorchANI: a free and open source PyTorch based deep learning implementation of the ANI neural network potentials. J Chem Inf Model 60(7):3408–3415
6. Zhang Y, Sun P, Yin Y, Lin L, Wang X (2018) Human-like autonomous vehicle speed control by deep reinforcement learning with double Q-learning. In: 2018 IEEE intelligent vehicles symposium (IV), pp 1251–1256
7. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press
8. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347
9. Ahuja K, Green WH, Li Y-P (2021) Learning to optimize molecular geometries using reinforcement learning. J Chem Theory Comput 17(2):818–825
10. Kajita S, Kinjo T, Nishi T (2020) Autonomous molecular design by Monte-Carlo tree search and rapid evaluations using molecular dynamics simulations. Commun Phys 3(1):77
11. Shin K, Tran DP, Takemura K, Kitao A, Terayama K, Tsuda K (2019) Enhancing biomolecular sampling with reinforcement learning: a tree search molecular dynamics simulation method. ACS Omega 4(9):13853–13862
12. Zhang J, Lei Y-K, Zhang Z, Han X, Li M, Yang L, Yang YI, Gao YQ (2021) Deep reinforcement learning of transition states. Phys Chem Chem Phys 23(11):6888–6895
13. Goodger N (2020) Proximal policy optimisation in PyTorch with recurrent models. Medium

Causal Effects of Railway Track Maintenance—An Experimental Case Study of Tamping Erik Vanhatalo, Bjarne Bergquist, Iman Arasteh-Khouy, and Dan Larsson

Abstract This paper illustrates a generalisable experimental approach to assess the causal effects of track maintenance actions. In our case, we assess the causal effects of tamping on railway track geometry. The tamping was conducted during the regular autumn tamping campaign in 2022 on track section 118 of the Swedish Iron ore line “Malmbanan”. The experimental setup provided frequent measurements of track geometry closely before and after the tamping action to estimate the potential causal effects of tamping on track geometry. Damill AB provided the track geometry data through an onboard measurement system mounted on a passenger train wagon travelling the track section frequently. The system provides position, speed, and track geometry variables, such as longitudinal level and lateral alignment, through GPS, accelerometers, and gyros. We based the analysis on hypothesis testing (Welch's two-sample t-test) to test the statistical significance of tamping effects on track geometry variables. The analysis included a segmentation of the tamped track section to form experimental units allowing the application of hypothesis testing. The frequent measurements also allow for assessing how effects change over time. The analysis established a statistically significant reduction in the standard deviation of the longitudinal level and the standard deviation of the lateral alignment of the track after tamping. The experimental approach is generalisable for assessing the causal effects of other railway track maintenance actions.

E. Vanhatalo (B) · B. Bergquist Luleå University of Technology, Luleå, Sweden e-mail: [email protected] B. Bergquist e-mail: [email protected] I. Arasteh-Khouy Swedish Transport Administration, Luleå, Sweden e-mail: [email protected] D. Larsson Damill AB, Luleå, Sweden e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_6


Keywords Experimental design and analysis · Tamping · Onboard sensors · Railway track maintenance

1 Introduction

Railway track components such as ballast, sleepers, rail, fasteners, switches and crossings degrade due to traffic and environmental conditions. Railway track maintenance is, therefore, imperative to maintain a safe track with high ride quality. Maintenance practitioners apply maintenance assuming a resulting improvement of the asset's condition (e.g., the track geometry). However, assessing the causal effect of a maintenance action in situ may be difficult. Typically, estimating causal effects and their statistical significance requires a designed experiment where one controls the experimental factor(s) and evaluates the system response(s) [1]. This may be challenging in laboratory tests and even more so in situ. Dynamic systems require that the measurements are frequent enough to estimate the system responses after the factors are changed.

This paper aims to demonstrate an approach to assess the causal effects of railway track maintenance actions in situ and determine their statistical significance. We illustrate the approach by a case study assessing the effect of tamping on the track quality condition measured by track geometry variables. The experimental analysis uses hypothesis testing of track geometry variables measured through onboard-mounted track geometry sensing equipment (Track-logger®). Experiments rely on the experimenter tying the system responses to the experimental actions. Here, the frequent measurements allow timely matching between maintenance actions and measurements of the track condition. The frequent measurements also allow for assessing the stability and deterioration rate of the track after tamping.

Track section 118 (TS-118) is part of the Swedish Iron ore line in northern Sweden. Approximately ten iron ore cargo trains and ten passenger trains travel TS-118 daily. While the numbers of trains are similar, the passenger trains are much shorter and lighter, and we may therefore attribute the majority of the total wear to the freight trains.

1.1 Tamping and Track Geometry

Tamping is a railway track maintenance action used to improve the geometrical track quality, e.g., long-wavelength (1–25 m) faults caused by repeated traffic [2]. Tamping is the most widely used maintenance action to fill ballast–sleeper gaps and homogenize ballast beds, but it is also a difficult action to evaluate in practice [3]. The effectiveness of maintenance to improve track geometry depends heavily on the initial condition of the track [2]. Prior research studying the tamping maintenance


action has shown that train speed and maintenance history affect track deterioration [4] and that tamping may damage the ballast [4]. Tamping improves the track’s geometrical condition by restoring the ballast’s position to let the track regain ground support for the intended alignment. The improvement and the following deterioration rate of the track geometry may depend on, e.g., the track’s maintenance history, geotechnical characteristics, and traffic volume [4]. We have previously worked with prediction and planning models for tamping maintenance, e.g., [5, 6]. However, in previous work, there has been an ongoing discussion about the effect of tamping because known tamping maintenance actions were not always visible in measured track geometry variables. Another consideration is the stability of the newly tamped track bed. Tamped tracks in Sweden must either undergo a track stabilization action or be subjected to a speed restriction until 100,000 net tons have passed the track [7]. Measuring the effect of tamping in situ is also described as challenging [8]. Experimental studies on the effects of tamping in a controlled laboratory environment have been performed and are available in the literature, see e.g., [9, 10]. For example, the side tamping technology substantially increased the residual stresses in the ballast below the sleeper compared to vertical tamping in laboratory testing [9]. In Sweden, the Swedish Transport Administration (STA) requires that track geometry inspection cars monitor the geometrical condition of highly trafficked tracks at least six times per year. Although these measurements may aid in assessing the effects of tamping, the time interval between a passage of an inspection car and when tamping occurs complicates the assessment of effect dynamics. There is a risk of not capturing short-term track geometry changes if track geometry variables are not measured closely before and after the intervention. An indicator of the need for tamping at STA is the standard deviation of the longitudinal level on a specific track length, e.g., 100 or 200 m [8]. Typically, several track geometry variables are evaluated to form a complete assessment of the track condition.

2 The Experiment

STA has the responsibility of planning which parts of a railway track section should undergo tamping. The contractor then makes a detailed plan for executing the tamping. In this case study, we consider the actual tamped track segments during the fall tamping campaign of 2022. From the total length of TS-118, we can extract experimental units (i.e., virtual track segments) that have received the treatment (i.e., tamping). Other segments of TS-118 that have not undergone tamping can provide comparisons as experimental units that did not receive the treatment. Table 1 provides examples of which parts of TS-118 the contractor tamped during the time interval when the track geometry was measured frequently. Figure 1 shows the first 1250 m of tamped track in Table 1, which we will focus on herein.

Table 1 Example of tamping performed on TS-118

Date       | From (km,m) | To (km,m) | Total distance (m)
2022-09-19 | 1198,300    | 1199,550  | 1250
2022-09-20 | 1200,295    | 1201,895  | 1600
2022-09-21 | 1205,390    | 1206,720  | 1330
...        | ...         | ...       | ...
2022-09-27 | 1211,260    | 1211,340  | 80

Fig. 1 The orange line provides the approximate tamped distance (km,m): (1198,300–1199,550) on 2022-09-19. Source Google Earth

2.1 Experimental Design and Analysis

From an experimental design perspective, we can classify the experiment as a one-factor experiment with two levels: “tamping = YES” and “tamping = NO”. Table 2 explains concepts important for the design and analysis. The experimental unit to be studied is a crucial experimental choice. The measurement data contain GPS positioning information previously shown to have a precision of ±3 m for repetitive measurements.


Table 2 Important concepts in the experimental design and analysis

Concept               | Explanation                                                                           | In this paper
Experimental unit     | A defined length of the railway track that may undergo maintenance                   | 100 m track segments
Experimental factor   | The potential maintenance action to which the experimental unit is subjected         | Tamping. Two levels: “YES” and “NO”
Response(s)           | Measured variable(s), y                                                               | Measured track geometry variables
Significance level, α | The risk one is willing to take of being wrong when rejecting the null hypothesis H0 | The p-value determines the risk

Fig. 2 Segmentation of the tamped track length to create experimental units

However, to reduce the potential impact of GPS precision and ensure that we measure track geometry for tamped segments, we skip the first 25 m of the tamped track before we start segmentation. We then split the measurement series into 100 m segments (i.e., experimental units) and use statistics from these segments to further reduce the effects of positioning errors. Figure 2 illustrates the segmentation to create experimental units.
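The segmentation is simple enough to state in a few lines of code. The sketch below is illustrative only: the paper's analysis was done in R [12], and the variable names and synthetic input are assumptions.

```python
# Sketch of the segmentation just described: skip the first 25 m of the
# tamped stretch, cut the 1 m-indexed series into 100 m experimental units,
# and compute the standard deviation per segment.
import numpy as np

def segment_stdev(level_mm, skip_m=25, seg_len_m=100):
    """level_mm: longitudinal level, one observation per metre of track."""
    usable = level_mm[skip_m:]
    n_segments = len(usable) // seg_len_m
    segments = usable[:n_segments * seg_len_m].reshape(n_segments, seg_len_m)
    return segments.std(axis=1, ddof=1)   # stdevLL per 100 m segment

# A 1250 m tamped stretch yields 12 full 100 m segments after the 25 m skip,
# matching the twelve experimental units used in Sect. 4 (input is synthetic).
stdev_ll = segment_stdev(np.random.default_rng(1).normal(0.0, 1.5, 1250))
print(len(stdev_ll))   # -> 12
```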


In hypothesis testing, it is common not to assume a direction of the possible effect of the experimental factor [1]. An alternative null hypothesis would be to disregard the possibility that the track improved between measurements, thus using a single-sided hypothesis. Using a two-sided test requires larger deviations to reject the null hypothesis, and since the natural order is for track conditions to degrade, this choice is a conservative one. In this case, a reasonable null hypothesis (H0) is to assume that the mean values of track geometry quality indicators for the experimental units are equal before and after tamping. In the statistical test, we can reject H0 if the data show the null hypothesis to be unlikely. We can thus formulate the null (H0) and alternative hypothesis (H1) according to:

H0: μ1 = μ2
H1: μ1 ≠ μ2

where μ1 is the expected value of the observations for the response for one level of the experimental factor (e.g., tamping = YES), and μ2 is the expected value of the observations for the response variable for the second level of the experimental factor (e.g., tamping = NO). We used a two-sided alternative hypothesis, i.e., H1 would be true regardless of whether μ1 > μ2 or μ1 < μ2. The test can thus detect both positive and negative effects. See [1] for more details on two-sample t-tests. In our analysis, we used Welch's unequal-variance two-sample t-test, which does not assume equal sample sizes or variances in the sample groups and outperforms the regular t-test should the variation differ between the sample groups [11]. Welch's unequal-variance t-test is more conservative than a standard (Student's) t-test based on a common pooled variance. Welch's two-sample t-test is a useful statistical tool that can provide more accurate and reliable results when the data do not fulfil the assumptions of the traditional two-sample t-test. Before the analysis, data on the response variable(s) need to be collected before and after tamping for several experimental units, and average values and variances for each group are needed, see Table 3.

Table 3 Illustration of data needed for a two-sample t-test in software

Difference in track geometry variable between two measurements for experimental units where tamping = “YES” | Difference in track geometry variable between two measurements for experimental units where tamping = “NO”
y11 … y1n | y21 … y2n
Group 1 mean difference: ȳ1 | Group 2 mean difference: ȳ2
Group 1 variance: s1² | Group 2 variance: s2²

All statistical analysis and data visualizations in this paper were performed in R statistical software [12].
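For reference, Welch's test statistic and the Welch–Satterthwaite approximation of its degrees of freedom follow directly from the group means, variances, and sample sizes in Table 3; these are the standard formulas (see, e.g., [11]):

```latex
t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}
{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
```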

3 Onboard Measurement System

Damill AB's Track-Logger® system recorded the measurements. The system's sensors consist of two bearing-box accelerometers (Fig. 3) mounted on each side of a railway car, a gyro mounted on one of the bearing boxes, and recording equipment. The system uses a high and fixed measurement frequency, resulting in the track being measured at least every 3 cm up to speeds of 160 km/h. The data are aggregated in postprocessing and indexed to measurement observations every 1 m to reduce the


Fig. 3 Track-logger® equipment. Left: Accelerometer mounted on bearing axle box, partly covered in ice. Right: Data recording equipment underneath a Vy passenger train car. Photos courtesy of Damill AB

data storage in the following steps. The onboard data storage stores the observations. An operator collects the sensor data at a depot for offline postprocessing and resets the onboard storage. The postprocessing includes filtering and acceleration signal integration, but also calculations of responses that require a certain measurement length (e.g., twist) to obtain the track geometry. The recording equipment also includes a GPS sensor for positioning purposes. After postprocessing, the measurement data hold the coordinates of each observation and measurements of speed and track geometry variables such as longitudinal level, lateral alignment, and twist on 3 or 6 m bases. In this study, the Track-logger® was mounted on a passenger train operated by the company Vy, which travels the Iron Ore track daily. Damill personnel emptied the data at the Luleå Central train station and then performed postprocessing to produce the measured track geometry variables.

Here we first focus on the standard deviation of the track's longitudinal level measured over a 100 m track segment. Measurement data of the longitudinal level were filtered to the 25 m wavelength according to SS-EN 13848-1:2008. The standard deviation of the longitudinal level, as measured by the regular measurement train, will trigger a need for maintenance if it surpasses a maintenance limit value (TDOK 2013:0347). As produced by the Track-logger®, the longitudinal level is not identical to the regular measurements since they are measured and calculated differently, but we consider them appropriate proxies. Tamping maintenance is expected to reduce the standard deviation of the longitudinal level. Therefore, we considered it a suitable response for studying the tamping effect. The Track-logger® also provides a measure of the lateral alignment of the track, which we also used as a response.


3.1 Data Pre-processing and Cleaning

There is a small risk that the direction of travel affects the measurements, since the travel direction potentially affects both the GPS positioning and the vehicle–track interaction. Therefore, we have only studied the track's left (or westernmost) rail and disregarded measurements obtained from travel in one direction (southeast-bound trains). Level measurements result from integrating acceleration readings. This coupling means that the reliability of the acceleration measurements relies on the train traveling above a certain speed. To avoid unreliable measurements, we have only included measurements where the train speed exceeded 40 km/h.

The first measurement by the Track-logger® was on 2022-09-15, and the last was on 2022-12-14. The tamping maintenance action we study herein occurred on 2022-09-19. The track geometry measurements were performed twice before and several times after the tamping. The data include periods with frequent measurements but also time gaps due to measurement equipment maintenance or the train car not passing TS-118. Out of 20 available measurement series obtained between 2022-09-15 and 2022-12-14, we excluded two passages due to issues with accelerometer readings. The 18 remaining measurements had train speeds well above 40 km/h. Figure 4 provides the measured longitudinal level (raw data) for all the measurement dates. Four additional measurements of the lateral alignment in December were excluded due to unknown data recording issues.
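The cleaning rules above reduce to two filters per passage. The sketch below illustrates them in Python with illustrative column names; the paper's actual processing was done in R [12].

```python
# Sketch of the cleaning rules: keep only northwest-bound passages with
# train speed above 40 km/h. Column names are assumptions for illustration.
import pandas as pd

def clean_passage(df: pd.DataFrame) -> pd.DataFrame:
    """df: one Track-logger passage, indexed at 1 m observations."""
    mask = (df["speed_kmh"] > 40) & (df["direction"] == "NW")
    return df.loc[mask].reset_index(drop=True)
```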

Fig. 4 Raw data for the longitudinal level (mm) for the left rail. Two measurements (15 and 17 Sep.) before tamping in red colour


4 Results

To illustrate the experimental analysis procedure, we focus on the 1250 m tamping action performed on 2022-09-19 (Fig. 1). We will also focus our illustration on analyzing the potential effect of tamping on the standard deviation of the longitudinal level response for the left rail (stdevLL) and on the lateral alignment.

4.1 StdevLL Trend on Tamped Track Segments

We split the tamped track data into twelve 100 m segments and then calculated the standard deviation of the longitudinal level (stdevLL) for each measurement on each segment. Figure 5 shows the stdevLL for the twelve segments and the 18 measurement dates. Figure 5 indicates an initial reduction in the post-tamping stdevLL measurements.

4.2 StdevLL Trend on Untamped Segments

We need measurements from segments that have not undergone tamping (experimental units) for the statistical hypothesis test. We argue that there are different options to extract these. One option would be to study the individual segments before

Fig. 5 The standard deviation for the longitudinal level for each segment (experimental unit) for the left rail. Track segments with km,m markers (1198,300–1199,550). Tamping occurred on 2022-09-19. Measurements compared in Welch's t-test: Oct. 13 (after tamping) and Sep. 17 (before tamping)


Fig. 6 The standard deviation for the longitudinal level for 12 segments that have not undergone tamping. Comparative track segments with km,m markers (1196,750–1198,000). Measurements compared in Welch's t-test: Oct. 13 (after tamping) and Sep. 17 (before tamping)

and after maintenance. We chose to compare similar segments and split them into two groups. The first group would be maintained; the other would not. Here, we wanted the comparative track segments to come from an area geographically close to the tamped segments. We, therefore, chose comparative track segments with km,m markers (1196,750–1198,000), a 1250 m length of the track just before the tamped segments when traveling in the northwest direction. Figure 6 shows the stdevLL of the left rail for the 12 segments that have not undergone tamping and the 18 measurement dates. Comparing Fig. 6 with Fig. 5, and noting that the observations vary from measurement to measurement for individual segments, the general trend is that the stdevLL of the untamped segments remains constant during the studied time interval. Figure 5, conversely, displays a reduction in stdevLL following tamping, followed by a slight increase primarily in December.

4.3 Welch's Two-Sample t-Test

The statistical test requires creating difference vectors between two measurements by the Track-logger®, see Table 3. The time horizon over which potential effects are measured needs to be specified here. The STA engineers suggested calculating the potential effects of tamping after 100,000 gross tons have passed the track, following the standard [7]. Therefore, the basis for the hypothesis test is the difference in the mean of the stdevLL between October 13, 2022 (after tamping) and September 17, 2022 (before tamping).


We also use a standard log transformation (y* = ln(y)) for the standard deviation response values. Applying the log transformation is standard practice since it transforms the shape of the χ² distribution of the standard deviation to a shape closer to the normal distribution [1]. As input to Welch's two-sample t-test, we now have Table 4, which holds the difference vectors, given the two levels of the experimental factor.

Performing the statistical test given the difference vectors in Table 4 results in a significant difference between the two samples. The p-value of the test is 0.00032. The 95% confidence interval for the difference in means for the natural logarithm of the stdevLL is [−0.818, −0.285]. Put differently, the mean of the stdevLL was significantly smaller on October 13 compared to September 17 on the tamped segments.

A slowly increasing trend in the stdevLL during the two months of extra measurements following tamping may be noticeable in Fig. 5. The slight increase in the stdevLL from October to December 2022 can be due to the track adjusting its geometry during traffic after tamping or potentially due to frost heave in the track substructure. However, performing Welch's two-sample t-test based on differences in stdevLL between December 14 and October 13 shows that this difference is not statistically significant (p-value 0.24).

Table 4 Difference vectors comparing the natural logarithm of standard deviations of segments on October 13, 2022 (after tamping) with segments on September 17, 2022 (before tamping)

Segment  | Difference in ln(stdevLL), Oct. 13–Sep. 17, tamping = “YES” | Difference in ln(stdevLL), Oct. 13–Sep. 17, tamping = “NO”
Seg. 1   | −0.490 | −0.053
Seg. 2   | −0.469 |  0.178
Seg. 3   | −0.976 | −0.244
Seg. 4   | −0.240 | −0.394
Seg. 5   | −0.407 |  0.111
Seg. 6   | −0.187 |  0.111
Seg. 7   | −1.092 | −0.075
Seg. 8   |  0.120 |  0.324
Seg. 9   | −0.874 |  0.446
Seg. 10  | −0.487 | −0.385
Seg. 11  | −0.740 |  0.310
Seg. 12  | −0.389 |  0.057
Average  | −0.519 |  0.032
Variance |  0.122 |  0.074
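The reported test can be reproduced directly from the Table 4 difference vectors. The paper ran the analysis in R [12]; the sketch below uses SciPy instead, where `equal_var=False` selects Welch's unequal-variance test.

```python
# Reproducing the reported Welch test from the Table 4 difference vectors.
from scipy import stats

tamped = [-0.490, -0.469, -0.976, -0.240, -0.407, -0.187,
          -1.092,  0.120, -0.874, -0.487, -0.740, -0.389]
untamped = [-0.053, 0.178, -0.244, -0.394, 0.111, 0.111,
            -0.075, 0.324, 0.446, -0.385, 0.310, 0.057]

# equal_var=False gives Welch's unequal-variance two-sample t-test.
res = stats.ttest_ind(tamped, untamped, equal_var=False)
print(res.statistic, res.pvalue)   # t ~ -4.3, p ~ 0.0003 (reported: 0.00032)
```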


Fig. 7 Raw data for the lateral alignment (mm) of the track. Two measurements (15 and 17 Sep.) before tamping in red colour

4.4 Effect on Lateral Alignment

We have also evaluated the effect of tamping on the lateral alignment. Figure 7 provides the measured lateral alignment for all the measurement dates excluding December 2022; four measurements in December were excluded due to unknown data recording issues for the lateral alignment. If we proceed and perform Welch's two-sample t-test according to the procedure outlined in Sects. 4.1, 4.2, and 4.3, we find a statistically significant reduction (p-value = 0.043) in the natural logarithm of the standard deviation of the lateral alignment of the track. The 95% confidence interval for the difference in means for the natural logarithm of the standard deviation of the lateral alignment is [−0.613, −0.0096]. That is, the standard deviation of the lateral alignment on 100 m segments was lower after tamping (October 13) than before tamping (September 17).

5 Generalized Experimental Approach

In Fig. 8, we generalise our applied approach to assess the causal effects of track maintenance actions and summarise important steps the experimenter must consider when planning the experiment. In our case study, we studied the effect of tamping on track geometry for 100 m segments of the track. However, one may study other track maintenance actions using the suggested experimental approach. The experimenter must decide on experimental track maintenance treatments or methods in the planning phase. The experimenter then runs the experiment by applying the treatments to the


Fig. 8 Generalized experimental approach. Choices in our case study are given in dark grey boxes

experimental units. Experimental units can be track components or segments, as in our case study. We also need at least one response variable to capture the potential effect that the treatments may have on the experimental units. The experimenter must outline the statistical analysis of the experiment already when choosing the treatments and the experimental design. Welch's two-sample t-test used in our case study was a reasonable choice given the experimental design with one experimental factor and two levels of the factor (tamping = “YES”; tamping = “NO”).

6 Conclusions and Discussion

This paper illustrates a generalisable experimental approach to assessing causal, in situ effects of railway track maintenance actions. We show how to apply experimental treatments to experimental units and use Welch’s t-test to establish the statistical significance of treatment effects. The experimental design herein is ‘simple’ because it only has one experimental factor tested at two levels: “tamping = YES” and “tamping = NO”. However, our experimental approach can accommodate more sophisticated experimental designs, e.g., factorial designs. The statistical analysis and the experimental design are interconnected, so if the design changes, so must the analysis.

Our case study featured frequent track geometry measurements through newly developed onboard-mounted track geometry measurement equipment on a passenger train car. These frequent measurements allow monitoring of short-term track geometry degradation behaviour after tamping, which we also observed in our case study, although this was not our primary aim here. Future work on aligning track geometry measurements from onboard train measurement systems would be interesting to improve the resolution and enable precise positioning of faults. For example, Khosravi et al. [13] tested alignment methods for condition measurements of linear assets. To reduce the effects of positioning errors in our case study, we studied track segments of 100 m in length. Another avenue for future research is to extend the experimental analysis approach to accommodate multivariate responses and analyses. Typically, a maintenance action may affect several track geometry variables, and multivariate analysis of the response space may be beneficial.

Acknowledgements The authors thank the Swedish Transport Administration for funding the work presented in this paper through the two projects “Fact or Fiction” (TRV 2020/25832) and “ASSET” (TRV 2022/29194). We also thank the train operator Vy Tåg for allowing the Track-logger® system to be mounted on one of their train cars. This allowed frequent measurement of track geometry for track section 118.

References

1. Montgomery DC (2021) Design and analysis of experiments, 10th edn, EMEA edition. Wiley
2. Selig E, Waters J (1994) Track geotechnology and substructure management. Thomas Telford
3. Guo Y, Markine V, Jing G (2021) Review of ballast track tamping: mechanism, challenges and solutions. Constr Build Mater 300:123940. https://doi.org/10.1016/j.conbuildmat.2021.123940
4. Audley M, Andrews JD (2013) The effects of tamping on railway track geometry degradation. J Rail Rapid Transit 227(4):376–391. https://doi.org/10.1177/0954409713480439
5. Sedghi M, Bergquist B, Vanhatalo E, Migdalas A (2022) Data-driven maintenance planning and scheduling based on predicted railway track condition. Qual Reliab Eng Int 38:3689–3709. https://doi.org/10.1002/qre.3166
6. Sedghi M, Kauppila O, Bergquist B, Vanhatalo E, Kulahci M (2021) A taxonomy of railway track maintenance planning and scheduling: a review and research trends. Reliab Eng Syst Saf 215:107827. https://doi.org/10.1016/j.ress.2021.107827
7. TRVINFRA-0014 (2020) Trafikverket
8. Arasteh-Khouy I, Schunnesson H, Juntti U, Nissen A, Kråik P-O (2014) Evaluation of track geometry maintenance for a heavy haul railroad in Sweden: a case study. J Rail Rapid Transit 228(5):496–503. https://doi.org/10.1177/0954409713482239
9. Przybyłowicz M, Sysyn M, Gerber U, Kovalchuk V, Fischer S (2022) Comparison of the effects and efficiency of vertical and side tamping methods for ballasted railway tracks. Constr Build Mater 314:125708. https://doi.org/10.1016/j.conbuildmat.2021.125708
10. Aursudkij B (2007) A laboratory study of railway ballast behaviour under traffic loading and tamping maintenance. PhD thesis, University of Nottingham
11. Ruxton GD (2006) The unequal variance t-test is an underused alternative to Student’s t-test and the Mann–Whitney U test. Behav Ecol 17(4):688–690. https://doi.org/10.1093/beheco/ark016
12. R Core Team (2022) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
13. Khosravi M, Soleimanmeigouni I, Ahmadi A, Nissen A, Xiao X (2022) Modification of correlation optimized warping method for position alignment of condition measurements of linear assets. Measurement 201:111707. https://doi.org/10.1016/j.measurement.2022.111707

Self-driving Cars in the Arctic Environment

Aqsa Rahim, Javad Barabady, and Fuqing Yuan

Abstract In recent years, self-driving car technology has advanced rapidly due to significant investments in research and development by major automakers and technology companies. However, there are still challenges to be addressed, particularly when it comes to operating in harsh weather conditions such as the Arctic environment. The operation of the sensor technologies used in self-driving cars can be significantly affected by such conditions, making it challenging to deploy them in these regions. Therefore, further research and development of specialized solutions and technologies for self-driving cars in Arctic environments is necessary. This paper addresses the following research questions: (RQ1) What are the technologies that enable the autonomous driving of self-driving cars, and (RQ2) how do they work? (RQ3) What are the key challenges that must be addressed to successfully implement self-driving cars in the Arctic region? (RQ4) What are the impacts of widespread adoption of self-driving cars, and how might they shape the future of transportation? The development of specialized solutions and technologies specifically designed for use in the Arctic environment is crucial to overcoming these challenges. Further research and development are necessary to ensure that self-driving cars can be deployed safely and effectively in all weather conditions. As technology continues to evolve, it is likely that we will see even more advancements in self-driving car technology in the coming years.

Keywords Self-driving car · Arctic environment · Challenges · Technology · Future

A. Rahim (B) · J. Barabady · F. Yuan
Department of Technology and Safety, UiT The Arctic University of Norway, Tromsø, Norway
e-mail: [email protected]
J. Barabady
e-mail: [email protected]
F. Yuan
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_7


1 Introduction

In the past, self-driving cars seemed like a pipe dream, a vision of a distant future that we would never get to experience. However, these cars are already on the road and have affected drivers all around the world [1]. In 2009, Google launched a self-driving automobile experiment, and vehicles from different manufacturers have since completed numerous successful test drives on motorways and city streets [2]. Even the ride-hailing firm Uber has begun developing a fleet of autonomous vehicles. Even though the age of self-driving cars has arrived, these vehicles are still trapped in traffic, moving slowly, and blowing their horns [3]. However, some major milestones have already been reached, and others are quickly approaching.

The remainder of this paper is organised as follows. Section 2 describes the working of a self-driving car and details the sensors used for this purpose. Section 3 discusses Arctic conditions and the experience of driving in them. Section 4 discusses the key challenges of implementing self-driving cars in the Arctic. Section 5 discusses the technologies used in self-driving cars, both in normal weather conditions and in the Arctic environment. Section 6 discusses the future of Arctic transportation with self-driving cars, followed by the conclusion in Sect. 7.

2 Working of Self-driving Car

Self-driving cars use a combination of advanced sensors, artificial intelligence (AI), and other technologies to navigate roads and make driving decisions without the need for human intervention (Fig. 1). The interaction of hardware and software is critical for the functioning of self-driving cars [4]. Here are some details on how the hardware and software work together:

Sensors: Self-driving cars are equipped with a variety of sensors, including cameras, lidar, radar, and GPS. These sensors are the car’s “eyes and ears” and provide data about the car’s environment [5]. The hardware components of the sensors include the lenses, laser emitters, and receivers. The software component of the sensors is responsible for processing the sensor data and identifying objects in the environment, such as other vehicles, pedestrians, and obstacles [6].

Fig. 1 Working of self-driving car

Control Systems: The control systems in a self-driving car are responsible for controlling the vehicle’s acceleration, braking, steering, and other functions. The hardware components of the control systems include the brakes, throttle, steering system, and other actuators. The software component of the control systems receives input from the perception and planning modules and uses this information to generate control commands that are sent to the vehicle’s actuators [7].

Computing Hardware: Self-driving cars rely on powerful computing hardware to process the vast amounts of data generated by the sensors and other systems. The computing hardware includes high-performance processors, memory, and storage devices [8]. The software running on this hardware is responsible for analyzing the sensor data, generating driving decisions, and controlling the vehicle.

Communication Systems: Self-driving cars also rely on advanced communication systems to exchange data with other vehicles, infrastructure, and the cloud. The hardware components of the communication systems include antennas, modems, and other wireless devices. The software component of the communication systems is responsible for encoding and decoding data, establishing connections with other devices, and managing the flow of data between the car and other systems [9].

The hardware and software components of self-driving cars are highly interconnected and work together to enable safe and reliable autonomous driving. The hardware provides the physical components that interact with the environment and control the vehicle, while the software processes the sensor data, generates driving decisions, and manages the car’s systems. The effective integration of hardware and software [10] is critical for the success of self-driving cars and requires a highly coordinated development effort from multiple disciplines. Figure 2 gives a detailed workflow of a self-driving car; a minimal code sketch of this sense–plan–act cycle follows below.
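To make the hardware–software interplay concrete, the minimal Python sketch below implements a toy sense–plan–act cycle. All class, field, and function names (Perception, plan, control_step, etc.) are illustrative assumptions and do not correspond to any production autonomous-driving stack.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Perception:
    obstacles: List[Dict]    # fused detections, e.g. {"distance_m": 8.2}
    lane_offset_m: float     # lateral offset from the lane centre

@dataclass
class Command:
    throttle: float  # 0..1
    brake: float     # 0..1
    steering: float  # steering angle in radians

def perceive(frame: Dict) -> Perception:
    """Stand-in for sensor fusion: a real stack would run object
    detection and tracking on camera, lidar and radar data here."""
    return Perception(frame.get("detections", []), frame.get("lane_offset_m", 0.0))

def plan(state: Perception) -> Command:
    """Stand-in for behaviour planning: brake for close obstacles,
    otherwise steer gently back towards the lane centre."""
    if any(o["distance_m"] < 10.0 for o in state.obstacles):
        return Command(throttle=0.0, brake=1.0, steering=0.0)
    return Command(throttle=0.3, brake=0.0, steering=-0.5 * state.lane_offset_m)

def control_step(frame: Dict) -> Command:
    """One perception -> planning -> control cycle, as executed
    repeatedly by the on-board computing hardware."""
    return plan(perceive(frame))

# Example cycle: an obstacle 8 m ahead triggers a full-brake command.
print(control_step({"detections": [{"distance_m": 8.0}], "lane_offset_m": 0.2}))
```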

Fig. 2 Workflow of self-driving car

3 Arctic Circle

The Arctic Circle is a region that encircles the North Pole, encompassing parts of Norway, Sweden, Finland, Russia, Canada, Greenland, and Alaska. It is known for its harsh and unforgiving weather conditions, which can make driving in the region a challenging and dangerous experience [11]. During the winter months, the Arctic Circle experiences very limited daylight, with some areas experiencing complete darkness for several months, during which the sun does not rise above the horizon (see Table 1). The Arctic Circle is known for its long, dark winters, with temperatures often dropping well below freezing. Snow, ice, and freezing rain are common, making the roads slippery and treacherous. Strong winds can also cause drifting snow, reducing visibility and making it difficult to control the vehicle. The Arctic Circle is a sparsely populated region, with vast stretches of wilderness between towns and villages. If a driver experiences car trouble or gets lost, it may be difficult to find help or get back on track.

Table 1 Polar nights in different regions

Place                                                Darkness period      Polar nights (days)
Lapland, Finland                                     2nd Dec–11th Jan     40
Luleå, Sweden                                        11th Dec–1st Jan     21
Tromsø, Norway                                       27th Nov–15th Jan    50
Utqiaġvik, Alaska                                    18th Nov–24th Jan    67
Faroe Islands, Denmark                               15th Dec–15th Jan    30
Nunavut and Yukon, Canada                            20th Nov–15th Jan    45
Murmansk and Franz Josef Land archipelago, Russia    10th Dec–20th Jan    40

The roads in the Arctic Circle are often remote and sparsely maintained [12], with few gas stations or other services available. In some areas, the roads may be little more than dirt tracks or gravel paths, making it difficult to travel quickly or comfortably. Driving in the Arctic Circle may also be subject to a variety of regulations and restrictions, particularly in areas that are protected for environmental or cultural reasons [13]. Drivers should be aware of any regulations or restrictions that may apply, such as restricted access to protected areas or restrictions on the use of certain vehicles. These regulations may be in place to protect the environment or the cultural heritage of the region.

Sensible 4 [14], a self-driving technology company known for testing in challenging environments, is pushing the envelope with a groundbreaking long-term project north of the Arctic Circle (as shown in Fig. 3). Two electric Toyota Proace vehicles [15] outfitted with autonomous driving software will travel a 3.6-km route in the town of Bodø as part of a pilot project in Norway. Sensible 4 creates full-stack autonomous driving software that integrates data from numerous sensors, enabling vehicles, according to the company, to function in any conditions. The Bodø project’s Proaces will travel along public roads between the town’s harbor and hospital, providing a welcome service for locals as well as a rigorous test for the software. The most exciting component of the test, however, is the weather, because Bodø’s subpolar environment poses a unique difficulty [16]. Every year, there is tremendous fluctuation in the weather, including rain, wind, snow, daylight hours, and temperature [17].

Fig. 3 Testing an early version of Sensible 4’s autonomous driving software in Finland [18]

4 Key Challenges of Implementing Self-driving Cars in the Arctic

Self-driving automobiles, which are highly automated technology, offer several potential advantages, including greater road safety and efficiency, cost savings, increased independence, and decreased traffic congestion. Government statistics indicate that 94% of collisions result from the actions or mistakes of the driver; self-driving cars can help reduce driver error. Personal freedom is a benefit of complete automation: for instance, highly automated vehicles can let blind persons support themselves and live the lifestyles they want. The use of highly automated cars may also address several traffic congestion issues. When there are fewer crashes or accidents, there are fewer traffic delays, and highly automated vehicles maintain a safe and constant distance between vehicles, which contributes to fewer stop-and-go traffic jams.

While self-driving cars bring many benefits, there are still many challenges in designing fully autonomous systems for driverless cars, especially in the Arctic environment. The major challenges of implementing self-driving cars in the Arctic environment are weather conditions, road conditions, accident liability, and radar interference [19]. We discuss these in detail below.


Fig. 4 Different temperatures in the Arctic

Weather conditions: One of the biggest technical difficulties in the Bodø project [15] was dealing with the notorious weather conditions of the Arctic (as shown in Fig. 4). Sensible 4 [14] had to deal with torrential downpours, falling leaves, brisk wind, ice, and snow. Snow, fog, mist, and darkness severely affect the sensors, and extremely low temperatures can degrade sensor performance. Harsh winter weather also restricts the utility of imaging sensors [11]. Wheel slippage is amplified by ice and snow on the roads, which lowers the accuracy of wheel-mounted sensors.

Road conditions: The state of the roads can be highly unpredictable and change from place to place. For instance, in the Arctic, road lanes are covered in snow (as shown in Fig. 5), requiring self-driving cars to be more intelligent [12]. The highly active ionosphere can interfere with high-accuracy global navigation satellite system (GNSS) positioning and limit the visibility of GNSS and satellite-based augmentation system (SBAS) satellites [20]. In the Arctic, where GNSS interference monitoring networks are limited, positioning systems are particularly vulnerable in the absence of alternative measures to enhance their resilience. In terms of infrastructure, the unpopulated regions of the European Arctic, such as northern Norway, Finland, and Sweden, have spatial gaps in the broadband cellular network coverage [21] required to download assistance data for GNSS, map updates, and traffic information. The effectiveness of absolute LiDAR- and camera-based positioning techniques is constrained by a lack of detailed, high-quality maps. Additionally, the lack of consistent maintenance of the transportation infrastructure in sparsely inhabited areas limits the potential market opportunity [22].

Fig. 5 Road lanes covered in snow

Autonomous vehicles are designed to operate on public roads and interact with other traffic participants, including pedestrians. While they may share the road with other autonomous vehicles in a highly regulated environment, unexpected situations such as rule-breaking by human drivers or sudden obstacles may occur. It is not practical to wait indefinitely for traffic to clear, which could lead to the development of traffic congestion if multiple autonomous vehicles are waiting for the same issue to resolve [23].

Accident liability: All choices are made by the software, which is the primary component of the vehicle. Newer concepts presented by Google lack a dashboard and a steering wheel, in contrast to earlier designs that included a driver at the wheel. How is a person supposed to take over the car in such a situation if a dangerous collision is about to happen? Similarly, given how autonomous vehicles operate, when a crisis demands the occupants’ attention but they are not paying attention, it may be too late for them to act [24].

Radar interference: Autonomous vehicles rely on sensors such as lasers and radar to navigate, with the lasers often mounted on the roof of the vehicle. Radar detects radio-wave reflections from nearby objects by emitting radio-frequency waves and measuring the time it takes for the waves to reflect back to the vehicle. However, when multiple vehicles equipped with radar are on the road, there may be interference between signals [25]. It may be challenging for a vehicle to differentiate its own reflected signal from the signals of other nearby vehicles. Additionally, while multiple radio frequencies may be available for radar use, they may not be sufficient to accommodate all vehicles on the road.
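To make the time-of-flight principle concrete, the short sketch below computes the range implied by a round-trip delay and notes why overlapping signals are problematic; the numbers are illustrative only.

```python
C = 299_792_458.0  # speed of light in m/s

def radar_range_m(round_trip_s: float) -> float:
    # The wave travels out and back, so range = c * t / 2.
    return C * round_trip_s / 2.0

# An echo arriving 400 ns after emission implies a target ~60 m away.
print(radar_range_m(400e-9))  # ~59.96

# Interference in a nutshell: a pulse from another vehicle arriving with
# the same delay is indistinguishable from an own echo unless each radar
# codes its waveform or uses a separate frequency, which becomes hard
# when many radar-equipped cars share the road.
```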


5 Technologies Used in Self-driving Cars

The sensor technologies used in normal weather conditions may not be suitable in extreme weather conditions such as the Arctic environment. Sensors such as cameras, lidar, radar, and ultrasonic sensors are commonly used in autonomous vehicles to perceive their surroundings and make decisions. While these sensors can work well in normal weather conditions, they may not be reliable in extreme conditions such as heavy rain, snow, fog, or ice. For example, in heavy snow or fog, lidar and camera sensors may struggle to accurately detect objects and obstacles, while radar sensors may face interference from the snow or fog. Additionally, extreme temperatures and harsh weather can cause sensors to malfunction or fail altogether. Table 2 compares the various technologies used in autonomous driving under different weather conditions.

Therefore, autonomous vehicles designed to operate in extreme weather conditions may require specialized sensors that are better suited for those conditions. These sensors may need to penetrate fog or snow and operate in extreme temperatures. Furthermore, the sensors would need to be paired with software algorithms that account for the specific weather conditions, which can affect vehicle speed, traction, and overall performance.

Table 2 Technologies used in autonomous driving in different weather conditions

Sensor technology     Normal weather condition    Rain                   Snow and ice           Fog
Cameras               Good performance            Reduced visibility     Reduced visibility     Reduced visibility
Thermal cameras       Good performance            Good performance       Good performance       Good performance
Lidar                 Good performance            Reduced performance    Reduced performance    Reduced performance
Radar                 Good performance            Reduced performance    Reduced performance    Reduced performance
Ultrasonic sensors    Good performance            Reduced performance    Reduced performance    Reduced performance

5.1 Normal Weather Condition

In normal weather conditions, self-driving cars typically rely on a combination of cameras, radar sensors, lidar sensors, GPS, and advanced software algorithms [26, 27] to navigate the roadways. These sensors and systems work together to detect and interpret a wide range of information, including lane markings, other vehicles on the road, and potential obstacles or hazards.


5.2 Arctic Condition

In Arctic conditions, self-driving cars face additional challenges due to the harsh weather and poor visibility. To address these challenges, manufacturers may use specialized sensors, such as thermal imaging cameras, to help detect and navigate through snow, ice, and other obstacles [28]. Other technologies that may be used in Arctic conditions include advanced traction control systems, adaptive cruise control, and snow tires designed for improved traction in icy conditions [29].
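One way software can act on the sensor limitations summarised in Table 2 is to down-weight degraded sensors during fusion. The sketch below illustrates such condition-dependent weighting; the numeric weights and condition labels are assumptions for demonstration, not values from the paper.

```python
# Illustrative fusion weights per weather condition, loosely following the
# qualitative ratings in Table 2 (assumed numbers, not measured values).
WEIGHTS = {
    "normal":       {"camera": 1.0, "thermal": 1.0, "lidar": 1.0, "radar": 1.0},
    "snow_and_ice": {"camera": 0.3, "thermal": 1.0, "lidar": 0.5, "radar": 0.6},
    "fog":          {"camera": 0.2, "thermal": 1.0, "lidar": 0.4, "radar": 0.6},
}

def fuse_confidences(detections: dict, condition: str) -> float:
    """Weighted average of per-sensor detection confidences,
    down-weighting sensors that the current weather degrades."""
    w = WEIGHTS[condition]
    num = sum(w[s] * c for s, c in detections.items())
    den = sum(w[s] for s in detections)
    return num / den

# In fog, a confident thermal return dominates a weak camera return.
print(fuse_confidences({"camera": 0.4, "thermal": 0.9}, "fog"))  # ~0.82
```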

6 The Future of Arctic Transportation with Self-driving Cars

Businesses that fail to adapt quickly to the potential advances in self-driving car technology may face significant challenges. This is a common occurrence when new innovative technology is introduced. The automotive industry, including automakers, suppliers, dealers, insurers, and parking companies, among others, could potentially lose hundreds of billions or even trillions of dollars due to this technological shift. Moreover, governments could suffer a loss of revenue from license fees, taxes, and tolls, and personal injury attorneys could see less work, as a result of decreased car ownership and usage. If accidents are few, who needs a car with eight airbags, heavier-gauge steel, and a body shop? Who needs a parking spot near their place of employment when their car can take them there, park itself miles away, and then pick them up later? Who needs to pay for a flight when you can depart in the evening, sleep for most of the journey, and arrive in the morning?

Google is actively working to facilitate car-sharing and reduce the number of vehicles on the road. By providing convenient access to shared vehicles, there may be less of a need for individuals to own their own cars. With automated ride-sharing services, users can order a vehicle that will arrive promptly and transport them to their destination. This approach could significantly decrease the number of cars on the road, especially since most people currently commute to work alone.

The $600 billion in new and used automobiles sold annually worldwide would initially rise because of this radical change in technology in the transportation sector. But as sharing becomes more common and the technology gains traction, purchases may drastically decline. Many aspects of how cars are made could change, such as whether front seats are required or optional. Automakers who anticipate change will be more inclined to concentrate on their services than on what and how they produce. Parking lots could be used for other purposes if there were fewer automobiles on the road. If lengthier journeys are accepted, it might also result in greener cities and revived suburbs. Additionally, if there are fewer vehicles on the road, federal, state, and local government entities might be able to reallocate a sizeable amount of the funding spent each year on highways and roads [27, 30].


7 Conclusion

In conclusion, the use of self-driving vehicles in the Arctic has the potential to have a large positive influence on the region, including better safety, improved mobility, and a smaller environmental footprint. The possible effects of these vehicles on the Arctic environment and society must be carefully considered, though. Safety, respect for the environment, local communities, privacy, equity and access, and accountability are important ethical factors. It is conceivable to benefit from this technology while protecting the distinctive and vulnerable ecosystem of the Arctic by adopting a reasonable and balanced approach to the deployment of self-driving automobiles there.

To ensure that the deployment of self-driving cars in the Arctic is carried out in a responsible and sustainable manner, it is crucial for all stakeholders, including the government, industry, and local communities, to collaborate. This could entail creating rules and laws to ensure that self-driving cars are operated in a way that safeguards the environment, animals, and the safety of all road users. Additionally, the possible effects on local economies and employment should be considered, and measures should be implemented to make sure that these communities can truly benefit from the introduction of self-driving cars.

In summary, the use of self-driving vehicles in the Arctic poses a special combination of difficulties and prospects. Utilizing the advantages of self-driving cars while protecting the distinctive and vulnerable Arctic ecology is conceivable if we carefully evaluate the potential effects of these vehicles on society and the environment and cooperate to ensure responsible deployment.

Acknowledgements Our thanks to the sponsors of the IAI2023 Congress for their intellectual and financial support.

References

1. Schoettle B, Sivak M (2015) Potential impact of self-driving vehicles on household vehicle demand and usage. University of Michigan, Transportation Research Institute, Ann Arbor
2. Nees MA (2016) Acceptance of self-driving cars: an examination of idealized versus realistic portrayals with a self-driving car acceptance scale. In: Proceedings of the human factors and ergonomics society annual meeting, September, vol 60, no 1. SAGE Publications, Sage CA, Los Angeles, CA, pp 1449–1453
3. Zhao J, Liang B, Chen Q (2018) The key technology toward the self-driving car. Int J Intell Unmanned Syst 6(1):2–20
4. Munir F, Azam S, Hussain MI, Sheri AM, Jeon M (2018) Autonomous vehicle: the architecture aspect of self-driving car. In: Proceedings of the 2018 international conference on sensors, signal, and image processing, October, pp 1–5
5. Surden H, Williams MA (2016) Technological opacity, predictability, and self-driving cars. Cardozo Law Rev 38:121
6. IEV (2008) International electrotechnical vocabulary (IEV). http://www.electropedia.org/. Accessed 18 Aug 2008
7. Brell T, Biermann H, Philipsen R, Ziefle M (2019) Conditional privacy: users’ perception of data privacy in autonomous driving. In: VEHITS, pp 352–359


8. Bojarski M, Del Testa D, Dworakowski D, Firner B, Flepp B, Goyal P, ... Zieba K (2016) End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316
9. Nielsen TAS, Haustein S (2018) On sceptics and enthusiasts: what are the expectations towards self-driving cars? Transp Policy 66:49–55
10. Chen S, Chen Y, Zhang S, Zheng N (2019) A novel integrated simulation and testing platform for self-driving cars with hardware in the loop. IEEE Trans Intell Veh 4(3):425–436
11. Zang S, Ding M, Smith D, Tyler P, Rakotoarivelo T, Kaafar MA (2019) The impact of adverse weather conditions on autonomous vehicles: how rain, snow, fog, and hail affect the performance of a self-driving car. IEEE Veh Technol Mag 14(2):103–111
12. Anwar AKK, Hanis HH, Amirudin MRM, Rabihah I, Wong SV (2017) Advancement in vehicle safety in Malaysia from planning to implementation. Asian Transp Stud 4(4):704–714
13. Ryghaug M, Haugland BT, Søraa RA, Skjølsvold TM (2022) Testing emergent technologies in the Arctic: how attention to place contributes to visions of autonomous vehicles. Sci Technol Stud 35(4):4–21
14. Launonen P, Salonen AO, Liimatainen H (2021) Icy roads and urban environments. Passenger experiences in autonomous vehicles in Finland. Transp Res F: Traffic Psychol Behav 80:34–48
15. Cieslik W, Antczak W (2023) Research of load impact on energy consumption in an electric delivery vehicle based on real driving conditions: guidance for electrification of light-duty vehicle fleet. Energies 16(2):775
16. Miles V, Esau I, Miles MW (2023) The urban climate of the largest cities of the European Arctic. Urban Clim 48:101423
17. Yoneda K, Suganuma N, Yanase R, Aldibaja M (2019) Automated driving recognition technologies for adverse weather conditions. IATSS Res 43(4):253–262
18. Sensible 4 (2021) Sensible 4 is testing the early version of autonomous driving software DAWN in Finnish Lapland. https://sensible4.fi/company/newsroom/sensible-4-is-testing-the-early-version-of-autonomous-driving-software-dawn-in-finnish-lapland/. Accessed 12 Apr 2023
19. Fathollahi M, Kasturi R (2016) Autonomous driving challenge: to infer the property of a dynamic object based on its motion pattern. In: Computer vision–ECCV 2016 workshops: Amsterdam, The Netherlands, October 8–10 and 15–16, 2016, proceedings, part III 14. Springer International Publishing, pp 40–46
20. Li J, Bao H, Han X, Pan F, Pan W, Zhang F, Wang D (2017) Real-time self-driving car navigation and obstacle avoidance using mobile 3D laser scanner and GNSS. Multimed Tools Appl 76:23017–23039
21. Khan B, Khan F, Veitch B (2019) A cellular automation model for convoy traffic in Arctic waters. Cold Reg Sci Technol 164:102783
22. Li J, Zhao X, Cho MJ, Ju W, Malle BF (2016) From trolley to autonomous vehicle: perceptions of responsibility and moral norms in traffic accidents with self-driving cars. SAE technical paper, 10, 2016-01
23. Chen L, Hu X, Tian W, Wang H, Cao D, Wang FY (2019) Parallel planning teaches self-driving cars to respond quickly to emergencies
24. Puertas-Ramirez D, Serrano-Mamolar A, Martin Gomez D, Boticario JG (2021) Should conditional self-driving cars consider the state of the human inside the vehicle? In: Adjunct proceedings of the 29th ACM conference on user modeling, adaptation and personalization, June, pp 137–141
25. Sheeny M, De Pellegrin E, Mukherjee S, Ahrabian A, Wang S, Wallace A (2021) RADIATE: a radar dataset for automotive perception in bad weather. In: 2021 IEEE international conference on robotics and automation (ICRA), May. IEEE, pp 1–7
26. Li Y, Ibanez-Guzman J (2020) Lidar for autonomous driving: the principles, challenges, and trends for automotive lidar and perception systems. IEEE Signal Process Mag 37(4):50–61
27. Muhammad K, Ullah A, Lloret J, Del Ser J, de Albuquerque VHC (2020) Deep learning for safe autonomous driving: current challenges and future directions. IEEE Trans Intell Transp Syst 22(7):4316–4336
28. Chaudhary S, Wuttisittikulkij L, Saadi M, Sharma A, Al Otaibi S, Nebhen J, ... Chancharoen R (2021) Coherent detection-based photonic radar for autonomous vehicles under diverse weather conditions. PLoS One 16(11):e0259438


29. Iyer NC, Pillai P, Bhagyashree K, Mane V, Shet RM, Nissimagoudar PC, ... Nakul VR (2020) Millimeter-wave AWR1642 RADAR for obstacle detection: autonomous vehicles. In: Innovations in electronics and communication engineering: proceedings of the 8th ICIECE 2019. Springer Singapore, pp 87–94
30. Poczter SL, Jankovic LM (2014) The google car: driving toward a better future? J Bus Case Stud (JBCS) 10(1):7–14

Towards a Railway Infrastructure Digital Twin Framework for African Railway Lifecycle Management

Daniel N. Wilke, Daniel Fourie, and Petrus Johannes Gräbe

Abstract Africa has 82,000 km of railway lines, of which around 68,000 km are operational (est. 2007). South Africa accounts for a quarter of Africa’s operational railway lines, which move 63% of Africa’s railway freight by mass (est. 2007). In South Africa, freight railway lines are managed by Transnet, where freight is the main driver of Transnet’s growth. Transnet’s freight is currently centered around two lines, namely, the 861 km Sishen–Saldanha iron-ore line and the 621 km coal system linking Mpumalanga mines with the Richards Bay port, which together account for 60% of Transnet’s freight (Transnet SOC Ltd © LTPF (2016) Chapter 3). South Africa’s railway network is in rapid decline, with freight transport declining by 25% over the last five years. For the financial year 2021–2022, the transported freight was merely 172.7 mt, a decline of 10 mt from 2007 and a decline of 54 mt from the peak in 2015. The welfare of South Africa’s railway infrastructure is critical for the world, as South Africa holds 75% of the world’s known manganese, used in stainless steel production. Maintaining an aged and declining railway network within a resource-constrained environment requires careful planning and prioritization of maintenance tasks. To support the management of Transnet’s railway infrastructure, we are developing a railway infrastructure performance digital twin. The performance digital twin is being designed for real-time assessment of the state of the railway infrastructure as well as the translation thereof into identifying and prioritizing maintenance tasks. Given available resources, utilization, and transport risks, the digital twin will be able to inform maintenance schedules and maintenance decision-making. Only limited digital twin instances for railway infrastructure have been realized, mainly within a first-world context.

D. N. Wilke (B) · D. Fourie
Mechanical and Aeronautical Engineering, University of Pretoria, Pretoria, South Africa
e-mail: [email protected]
D. Fourie
e-mail: [email protected]
P. J. Gräbe
Civil Engineering, University of Pretoria, Pretoria, South Africa
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_8


Keywords Railway · Digital twin · Life-cycle management · Maintenance

1 Introduction

The state of railway lines in Africa varies widely across the continent, with some countries having well-maintained and modern rail systems, while others have neglected and outdated infrastructure. Many countries in North Africa, such as Egypt, Morocco, and Tunisia, have relatively modern and well-maintained railway systems, with high-speed trains and electrified lines. However, in Sub-Saharan Africa, the situation is more challenging. Many railway lines in this region were built during the colonial era and have not been adequately maintained or upgraded since then. As a result, many lines are in poor condition, with outdated equipment and inadequate maintenance practices.

In recent years, there has been a renewed focus on investing in railway infrastructure in Africa. Several countries, including Ethiopia, Kenya, and Tanzania, have launched ambitious plans to build new railways or upgrade existing lines. These projects aim to improve connectivity within countries and between countries, facilitate trade, and promote economic growth. However, significant challenges remain, including funding constraints, political instability, and inadequate maintenance and management practices. Many railway lines in Africa are also vulnerable to climate change and extreme weather events, such as floods and droughts.

Overall, the state of railway lines in Africa is mixed, with some countries having well-maintained and modern rail systems, while others face significant challenges in maintaining and upgrading their infrastructure. The investment in railway infrastructure in recent years is promising, but sustained efforts are needed to ensure the long-term sustainability and reliability of these systems.

Africa and South Africa face several challenges regarding their railway infrastructure, including:

1. Funding Constraints: One of the significant challenges in Africa and South Africa is the lack of funding for railway infrastructure. Building and maintaining railway lines require substantial investment, which is often difficult to secure due to competing demands for limited resources.
2. Outdated Infrastructure: Many railway lines in Africa and South Africa were built during the colonial era and have not been adequately maintained or upgraded since then. As a result, many lines are in poor condition, with outdated equipment and inadequate maintenance practices.
3. Limited Connectivity: The railway networks in Africa and South Africa are not well connected, which limits their potential to facilitate trade and economic growth. Many regions are not served by railways, which makes it difficult to transport goods and people across the continent.
4. Political Instability: Political instability in some African countries has hindered investment in railway infrastructure. Political instability can lead to disruptions in railway operations and can discourage private investors from investing in railway projects.
5. Lack of Skilled Personnel: Many African countries face a shortage of skilled personnel in the railway industry, including engineers, technicians, and maintenance workers. This shortage makes it difficult to maintain and operate railway lines effectively.
6. Climate Change: Many railway lines in Africa and South Africa are vulnerable to extreme weather events such as floods and droughts, which can cause significant damage to infrastructure and disrupt railway operations.

Overall, addressing these challenges requires sustained efforts and investment from governments, private investors, and international development partners. Improving railway infrastructure can facilitate trade, boost economic growth, and improve access to essential services for communities across Africa and South Africa.

Digital twins can play a significant role in helping Africa to maintain its railway infrastructure by providing a comprehensive view of the system, enabling proactive maintenance, reducing costs and downtime, and enabling knowledge and strategy sharing between governing bodies within and between countries. Digital twins can help maintain railway infrastructure in Africa with the following tasks [1, 2]:

1. Predictive Maintenance: Digital twins can use real-time data from sensors and other sources to predict when maintenance is required. This helps to identify potential issues before they occur, enabling maintenance crews to plan and execute maintenance activities more efficiently.
2. Simulation and Analysis: Digital twins can simulate and analyse the performance of the railway system, enabling engineers and maintenance crews to identify areas that require improvement and plan for future upgrades.
3. Improved Safety: Digital twins can be used to model and analyse safety scenarios, helping to identify potential safety risks and develop strategies to mitigate them.
4. Enhanced Efficiency: Digital twins can improve the efficiency of railway operations by optimizing train schedules, reducing downtime, and minimizing disruptions.
5. Cost Reduction: By reducing downtime and optimizing maintenance activities, digital twins can help to reduce maintenance costs and increase the lifespan of railway infrastructure.

Overall, digital twins can provide valuable insights into the performance of railway infrastructure, enabling maintenance crews to identify potential issues and plan maintenance activities more efficiently. By improving the efficiency and effectiveness of maintenance operations, digital twins can help to ensure the long-term sustainability and reliability of railway infrastructure in Africa.


2 Digital Twin Overview for African Railway Infrastructure

Digital twins are virtual replicas of physical assets or systems that can be used for monitoring, analysing, and optimizing their performance. In the context of railway maintenance, digital twins can be a powerful tool for improving the efficiency and effectiveness of maintenance operations [1, 2]. By creating a digital twin of a railway line, engineers and maintenance crews can simulate and analyse the performance of the system, identify potential issues before they occur, and plan and execute maintenance activities more efficiently.

The digital twin can be created by collecting and integrating data from various sources, such as sensors, historical maintenance records, and engineering models. This data can be used to create a dynamic model that reflects the current state of the railway line and provides insights into its performance. With this information, maintenance crews can plan and execute maintenance activities more effectively, reducing downtime and improving the reliability and safety of the railway system.

Digital twins within an African context can assist in transferring knowledge and strategies between governing bodies within a country or between countries. That is, digital twins can rapidly disseminate strategies and approaches that work across borders. Second, establishing a digital network between African countries may enable additional insights to be uncovered, while assisting poorer countries with limited data by applying transfer learning from countries with well-established sensor networks.

Digital twins can revolutionize railway maintenance by providing a comprehensive view of the system, enabling proactive maintenance, and reducing costs and downtime. As technology advances and more data becomes available, the potential and application of digital twins for railway maintenance are expected to continue to grow.

3 South African Railway

South Africa has a well-developed railway infrastructure that plays a vital role in the country’s economy. The railway system is operated by the state-owned company Transnet Freight Rail (TFR), which is responsible for the management and maintenance of the rail network. The railway infrastructure in South Africa covers approximately 31,000 km, connecting major cities, towns, and ports.

The rail network consists of both passenger and freight lines. The passenger lines are operated by the Passenger Rail Agency of South Africa (PRASA), while the freight lines are operated by TFR. The railway infrastructure in South Africa is divided into five main corridors: the Coal System, the Durban-Gauteng System, the Export Ore System, the Cape Town-Gauteng System, and the North-Eastern System, as shown in Fig. 1.


Fig. 1 Core network systems (adapted from [3])

In addition to the main corridors, there are also several branch lines that connect smaller towns and industrial areas to the main rail network. The railway infrastructure in South Africa is electrified, and most trains are powered by electric locomotives. However, the railway infrastructure in South Africa faces several challenges, including vandalism, theft, and insufficient investment. These challenges have resulted in disruptions to rail services and have had a negative impact on the efficiency of the rail network. Despite these challenges, the railway infrastructure in South Africa remains an essential component of the country’s transport system and economy.

4 Railway Infrastructure Information

Railway infrastructures are complex systems. The complexity is highlighted with a quick overview of the transverse and longitudinal profile information that may affect the railway infrastructure’s interaction with railway vehicles, as well as change over time with climate, condition, and interaction state. The aim is to give an overview of the complexity of the full information problem for digitising railway infrastructure and present information regarding the South African railway infrastructure.


Fig. 2 Transverse railway profile

4.1 Transverse Profile

The transverse view of a railway line, shown in Fig. 2, indicates the bedrock, subgrade, sub-ballast, ballast, sleeper, and rails. At a given point along the rail, these affect the interaction between the railway track and rail vehicles. The condition of these components of a railway profile also changes over time with climate and conditions.

4.2 Longitudinal Railway Profile

The complexity is further enhanced by considering the longitudinal direction of the rail, i.e. the information in the transverse railway profile changes along the length of the rail. Let us consider the 31,000 km of operational track in South Africa. The complexity is highlighted by considering just the number of components involved in the rail–sleeper interaction: capturing the rail–sleeper interaction alone involves around 330 million components, as outlined in Table 1. Table 1 shows an estimate of the number of sleepers, clips, and pads present in the 31,000 km of active railway infrastructure in South Africa. The presence and condition of the sleepers, clips, and pads directly affect the rail-to-railway vehicle interaction. In addition, the condition of the ballast, sub-ballast, subgrade, and bedrock influences the interaction of the sleepers with the surrounding environment.

Although the problem seems intractable, just assessing the binary state of being present or absent requires around 330 MB of data, while storing a half-precision float indicating the state of each component requires 660 MB of storage. From a digital perspective, storing information about each component is tractable. Keeping track of each sleeper’s orientation in physical space merely requires around 4 GB of data to be stored (a quick numerical check of these estimates follows Table 1).


Table 1 Estimated number of sleepers, clips, and pads in the South African network system

Railway system       Railway length (km)   Estimated sleepers (millions)   Estimated clips (millions)   Estimated pads (millions)
Coal                 580                   0.9                             3.6                          1.8
Durban-Gauteng       688                   1.1                             4.4                          2.2
Export Ore           861                   1.3                             5.2                          2.6
Cape Town-Gauteng    1,600                 2.5                             10                           5.0
North-Eastern        928                   1.4                             5.6                          2.8
Other                26,343                39.8                            159.0                        79.6
Total                31,000                47                              188                          94
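A quick back-of-envelope check of these component and storage figures (the byte sizes, and the 64-byte pose encoding in particular, are our assumptions for illustration):

```python
# Back-of-envelope check of the component and storage figures quoted above.
# Byte sizes are illustrative assumptions, not specifications from the paper.
sleepers, clips, pads = 47e6, 188e6, 94e6   # Table 1 totals
components = sleepers + clips + pads
print(f"components: {components / 1e6:.0f} million")            # ~329 million

print(f"1-byte present/absent flag each: {components / 1e6:.0f} MB")          # ~330 MB
print(f"2-byte (half-precision) state each: {2 * components / 1e6:.0f} MB")   # ~660 MB

# Orientation per sleeper: assuming, e.g., a 4x4 single-precision pose
# matrix (64 bytes), the total is of the same order as the ~4 GB quoted.
print(f"4x4 float32 pose per sleeper: {64 * sleepers / 1e9:.1f} GB")          # ~3.0 GB
```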

Hence, the issue is not in the digital capabilities but rather in the digitizing capabilities, i.e., the direct assessment of the presence or absence of each component, the state of each component, or faults with equipment becomes intractable. Track geometry faults include track misalignment, gauge widening or narrowing, and irregularities in the rail surface; they can cause excessive wear on the train wheels and increase the risk of derailments. Ballast defects include voids and contamination, which can lead to uneven support of the track and affect its stability. Rail damage includes cracks, fractures, and corrosion, which can weaken the rail structure and increase the risk of rail failure and derailments. Defective sleepers can cause track misalignment and increase the risk of derailments. As a result, the indirect assessment of the state of the railway is the only viable option with current technology.

5 Available Information

In addition to visual inspections, there are several typical approaches to obtaining information on the state of railway infrastructure. These include track geometry measurements, non-destructive testing, instrumented monitoring, and remote sensing. Data from various sources, such as inspection reports, track geometry measurements, and instrumented monitoring, can be analysed to identify trends and patterns. This can help identify areas that require maintenance or repair and enable more effective asset management.


Fig. 3 Infrastructure measuring car data capturing system

5.1 Track Geometry Measurements

Track geometry measurements involve using specialized equipment, such as an infrastructure measuring car, to measure the gauge, cross-level, warp, twist, alignment, curvature, and profile (surface and top) of the track [4]. This can help identify areas that require maintenance or repair. An infrastructure measuring car records data that enables the estimation of track and catenary geometry, as shown in Fig. 3. A typical infrastructure measuring car has several sensors, including an inertial measurement unit, GPS, accelerometers, microphones, an optical measurement system that measures gauge and rail profiles using lasers, and an instrumented pantograph, whose outputs are processed to estimate track and catenary geometry data. In addition, a rotating laser scanner measures the ballast profile and other clearances. Further processing of the data enables the estimation of typical track faults, including track geometry faults, ballast defects, track damage, and defective sleepers.

5.2 Non-destructive Testing

Non-destructive testing (NDT) is a valuable tool in assessing the state of railway infrastructure. NDT techniques include ultrasonic testing, magnetic particle inspection, and eddy current testing [5]. NDT techniques can help identify defects or flaws that may be invisible to visual inspections, without causing any damage to the railway components being tested.

5.2.1 Ultrasonic Testing

Ultrasonic testing (UT) is a commonly used NDT technique for inspecting rail tracks, wheels, axles, and other components [6]. The technique uses high-frequency sound waves to detect defects such as cracks, corrosion, and delamination. UT can also be used to measure the thickness of rails and wheels, which can help determine if they need to be replaced or repaired.

5.2.2 Eddy Current Testing

Eddy current testing (ECT) is another NDT technique that is used to detect surface and near-surface defects in rails, wheels, and axles [7]. The technique uses electromagnetic induction to create eddy currents in the material being tested, which can help detect cracks, corrosion, and other defects.

5.2.3 Magnetic Particle Testing

Magnetic particle testing (MPT) is a technique used to detect surface and slightly subsurface defects in railway components such as rails, axles, and wheels [8]. The technique involves applying magnetic particles to the surface of the component being tested, which are then attracted to areas of magnetic flux leakage caused by defects such as cracks.

5.3 Instrumented Monitoring

Instrumented monitoring involves installing sensors on various components to continuously monitor their condition. This can help detect potential issues early and enable preventative maintenance. For example, ultrasonic sensors can be installed on railway tracks to detect internal cracks and flaws. Depending on the driving frequency and the defect, a sensor will typically cover 4 km of railway, making such sensors suited for installation on problematic sections of rail.

Fiber optic sensing (FOS) technology is an approach for monitoring the structural health and performance of railway infrastructure [9]. FOS technology uses fiber optic cables to measure strain, temperature, and other parameters along the length of the cable. FOS technology can be used to monitor various components of the railway infrastructure, including tracks, bridges, tunnels, and other structures. For track monitoring, FOS technology can be used to measure the strain and temperature of railway tracks. This can help identify areas of high stress and potential failures. FOS technology can also be used to monitor the stability of track ballast and detect subsidence.

FOS technology offers several advantages over traditional monitoring techniques, including the ability to measure multiple parameters along a single fiber optic cable, high accuracy, and the ability to monitor large areas of railway infrastructure simultaneously. FOS technology can help railway operators detect defects and anomalies early, optimize maintenance and repair efforts, and improve the safety and reliability of railway infrastructure.


5.4 Remote Sensing Techniques

Remote sensing techniques allow for non-intrusive monitoring of railway infrastructure by providing high-resolution data on its condition. This can help identify areas that require maintenance or repair and enable more effective asset management. Remote sensing can provide valuable information on the condition of the railway infrastructure, which can help identify defects and anomalies early and prevent catastrophic failures. Remote sensing techniques can also be used to monitor the effectiveness of maintenance and repair efforts over time. Remote sensing techniques used in railway infrastructure assessment include LiDAR, thermal imaging, synthetic aperture radar, and satellite radar data.

5.4.1 Light Detection and Ranging

LiDAR (Light Detection and Ranging) is a remote sensing technique that uses laser pulses to create high-resolution 3D maps of the railway infrastructure [10]. LiDAR can be used to detect surface features and anomalies such as cracks, deformation, and erosion. LiDAR data can also be used to create digital elevation models (DEMs) and orthophotos, which can be used to assess the stability of railway structures.

5.4.2 Thermal Imaging

Thermal imaging can be used to detect hot spots or temperature variations in railway infrastructure, which can indicate defects such as delamination, cracks, and other anomalies [11]. Thermal imaging can also be used to detect subsurface anomalies such as moisture and leaks in railway tunnels.

5.4.3 Synthetic Aperture Radar and Satellite Radar Data

Synthetic Aperture Radar (SAR) and Satellite Radar Data (SRD) are remote sensing techniques that use radar to create high-resolution images of the railway infrastructure [12]. SAR and SRD can be used to detect changes in the ground surface, which can indicate structural instability or subsidence. SAR and SRD can also be used to detect changes in vegetation cover near railway infrastructure, which can indicate erosion or instability.


5.5 Summary

Track geometry measurements are usually readily available and conducted at regular intervals, making them good candidates for initiating and developing a digital twin model for railway infrastructure. Advanced processing of track geometry data, combined with the dynamic response of the infrastructure measuring car, may initiate a baseline digital twin model that can be refined as additional information becomes available.

6 Digital Twins for Railway Infrastructure

A digital twin requires an ecosystem of hardware, middleware, and software that seamlessly interoperate with each other through network devices, as shown in Fig. 4.

Fig. 4 Outline of a railway digital twin ecosystem


6.1 Hardware

The hardware consists of sensors that can be fixed, moveable, or remote. Since railway infrastructure covers significant distances, the application of moveable and remote sensors is tractable. Fixed sensors, such as FOS, offer potential in limited, targeted deployments.

6.2 Middleware

Middleware refers to software components that facilitate data management, communication, and data exchange between different parts of a digital twin system. Middleware can aggregate data from different sources and transform it into a format that can be used by the digital twin, known as data integration. Alternatively, data interoperability is the seamless exchange of data such that systems operate as a cohesive whole without the need to transform data. Interoperability is important because it enables data to flow freely between different systems and eliminates the need for manual data entry or conversion, which can be time-consuming and error-prone. As a result, interoperability promotes competition and innovation, as it allows different systems and products to work together and be combined in new and creative ways.

Middleware performs basic data processing and modelling on the data before passing it on to the digital twin. Middleware also facilitates secure communication between different software components within the digital twin system, such as between the simulation engine and the visualization component.
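As a minimal illustration of the data-integration role described above, the sketch below normalises readings from two hypothetical sources into a single schema a digital twin could consume; all source and field names are assumptions for demonstration.

```python
# Middleware-style data integration: heterogeneous readings are mapped
# into one common schema before being handed to the digital twin.
def normalise(reading: dict, source: str) -> dict:
    if source == "measuring_car":
        return {"asset_id": reading["segment"], "quantity": "track_geometry",
                "value": reading["stdev_ll_mm"], "unit": "mm"}
    if source == "fos":
        return {"asset_id": reading["cable_section"], "quantity": "strain",
                "value": reading["microstrain"], "unit": "ustrain"}
    raise ValueError(f"unknown source: {source}")

# Two very different payloads arrive in the same downstream format.
print(normalise({"segment": "seg-042", "stdev_ll_mm": 1.8}, "measuring_car"))
print(normalise({"cable_section": "c-07", "microstrain": 112.5}, "fos"))
```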

6.3 Software

Software refers to the digital twin and any transformation of data into useful information, such as condition monitoring and predictive maintenance. The digital twin can be used to simulate various maintenance schedules without disrupting actual operations. This can help maintenance teams identify the most effective maintenance schedule for managing resource constraints. A digital twin can also be used to provide training and education to maintenance teams, allowing them to practice maintenance procedures and schedules and to identify potential areas of concern. Digital twins enable hypothesis testing that might provide new insights. A digital twin community may enable the transfer of learning from one railway governing body to another within the same country or across borders. This may enable established maintenance practice in one country to benefit the maintenance of another country, potentially enabling interoperability of life-cycle management strategies.
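To make the schedule-simulation idea concrete, the following sketch compares hypothetical preventive maintenance (PM) intervals by Monte Carlo simulation; the Weibull life distribution, cost figures, and intervals are all invented for illustration and are not calibrated to any real asset.

```python
import random

def schedule_cost(pm_interval, horizon=3650.0, shape=2.0, scale=500.0,
                  pm_cost=1.0, cm_cost=10.0, seed=0):
    """Total maintenance cost of a fixed PM interval over a horizon (days).

    Asset life is Weibull-distributed with an increasing hazard (shape > 1),
    so preventive renewal is useful; both PM and corrective repair renew the
    asset. All parameters are illustrative, not calibrated to real data.
    """
    rng = random.Random(seed)
    t, cost = 0.0, 0.0
    while t < horizon:
        life = rng.weibullvariate(scale, shape)  # time to next failure
        if life < pm_interval:   # failure occurs before the next planned PM
            cost += cm_cost
            t += life
        else:                    # PM pre-empts the failure
            cost += pm_cost
            t += pm_interval
    return cost

for interval in (90, 180, 365):
    print(f"PM every {interval:>3} days -> total cost {schedule_cost(interval):.1f}")
```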


7 Conclusions

This study provides a high-level overview of the potential of digital twins to improve the life-cycle management of African railway infrastructure. Suitable data to initiate digital twin technology has been identified, which may lower the barrier to entry as the digital twin ecosystem is systematically realised.

References

1. Dirnfeld R, De Donato L, Flammini F, Azari MS, Vittorini V (2022) Railway digital twins and artificial intelligence: challenges and design guidelines. In: Dependable computing—EDCC 2022 workshops. EDCC 2022. Communications in computer and information science, vol 1656. Springer, Cham
2. Dimitrova E, Tomov S (2021) Digital twins: an advanced technology for railways maintenance transformation. In: 13th electrical engineering faculty conference (BulEF), Varna, Bulgaria, pp 1–5
3. Transnet SOC Ltd © LTPF (2016) Chapter 3
4. Eklöf K, Nwichi-Holdsworth A, Eklöf J (2021) Novel algorithm for mutual alignment of railway track geometry measurements. Transp Res Rec 2675(12):995–1004
5. Cafiso S, Capace B, D'Agostino C, Delfino E, Di Graziano A (2016) Application of NDT to railway track inspections. In: 3rd international conference on traffic and transport engineering (ICTTE). Assoc Italiana Ingn Traffico Trasporti Res Ctr, Belgrade, Serbia
6. Xue Z, Xu Y, Hu M, Li S (2023) Systematic review: ultrasonic technology for detecting rail defects. Constr Build Mater 368:130409
7. Rajamäki J, Vippola M, Nurmikolu A, Viitala T (2018) Limitations of eddy current inspection in railway rail evaluation. Proc Inst Mech Eng, Part F: J Rail Rapid Transit 232(1):121–129
8. Zhou L, Liu X-Z, Ni Y-Q (2019) Contemporary inspection and monitoring for high-speed rail system. In: High-speed rail, June. IntechOpen
9. Du C, Dutta S, Kurup P, Yu T, Wang X (2020) A review of railway infrastructure monitoring using fiber optic sensors. Sens Actuators A: Phys 303:111728
10. Taheri Andani M, Mohammed A, Jain A, Ahmadian M (2018) Application of LIDAR technology for rail surface monitoring and quality indexing. Proc Inst Mech Eng, Part F: J Rail Rapid Transit 232(5):1398–1406
11. Stypułkowski K, Gołda P, Lewczuk K, Tomaszewska J (2021) Monitoring system for railway infrastructure elements based on thermal imaging analysis. Sensors 21(11):3819
12. Chang L, Dollevoet R, Hanssen R (2014) Railway infrastructure monitoring using satellite radar data. Int J Railw Technol 3:79–91

Climate Change Impacts on Mining Value Chain: A Systematic Literature Review

Ali Nouri Qarahasanlou, A. H. S. Garmabaki, Ahmad Kasraei, and Javad Barabady

Abstract Mining is becoming increasingly vulnerable to the effects of climate change (CC). The vulnerability stems from the consequences of changing weather patterns, such as extreme weather events that can damage equipment, infrastructure, and mining facilities and interrupt operations. New demands initiated by governments and international agreements, such as carbon pricing systems, renewable energy, and sustainable development, put extra pressure on mining industries to update their policies to reduce greenhouse gas (GHG) emissions and adapt to CC. Most mining and exploration companies focus on reducing mining's impact on CC through climate mitigation rather than adapting to extreme weather events. Therefore, it is important to study and investigate the impacts of CC on the mining sector. This paper aims to study the challenges and strategies for adapting to and mitigating CC impacts on mining using a systematic literature review (SLR). The results show that most of the proposed models and strategies in the mining field are in the conceptual phase, and fewer are practical models.

Keywords Climate change · Adaptation · Mitigation · Mining

A. N. Qarahasanlou Faculty of Technical and Engineering, Imam Khomeini International University, Qazvin, Iran e-mail: [email protected] A. H. S. Garmabaki (B) · A. Kasraei Department of Civil, Environmental and Natural Resources Engineering, Luleå University of Technology, Luleå, Sweden e-mail: [email protected] A. Kasraei e-mail: [email protected] J. Barabady Faculty of Science and Technology, UiT The Arctic University of Norway, Tromsø, Norway e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_9


1 Introduction

The mining and metal sector contributes 4–7% of greenhouse gas (GHG) emissions globally, a share that can be exacerbated by increased mineral production (projected to increase by more than 450% by 2050) [1]. On the other hand, mining operations are vulnerable to the effects of CC. CC refers to climate variability that persists for an extended period, generally decades or longer [2]. For example, temperature variations, extreme precipitation, harsh weather, and storms pose various vulnerabilities to mining infrastructure and activities. The interaction between extreme climatic events and mining infrastructure can create vulnerabilities that may lead to risks and uncertainty that must be addressed long-term [3]. Therefore, there is a need to increase awareness, which helps stakeholders, practitioners, and regulators better understand the negative impact of mining activity (e.g., carbon footprint) and the impacts of climate change on the mining sector, in order to plan proactive measures to reduce the uncertainty. In addition, CC has significant implications for mining and for regional and national economies. Figure 1 shows that the CC response/ET is one of the main priorities for the mining sector in 2022 and 2023. However, there is a lack of knowledge about how CC may affect important industrial processes, raising concerns about mining firms' capacity to maintain profitable operations. Thus, the aim of this paper is to study the challenges and strategies (and models) for CC impacts on the mining sector using a Systematic Literature Review (SLR). In this regard, the paper focuses on the two main climate-resilient actions, namely climate adaptation and mitigation. Figure 2 illustrates these actions, which are described below.

Fig. 1 The main priority for the mining sector in 2022 and 2023, based on a poll of 156 decision makers (adapted from [4]) (ESG: Environmental, Social, and Governance, ET: Energy transition, SC: supply chains)

Fig. 2 Climate mitigation and adaptation [5]

• Climate adaptation: This entails planning for an uncertain future and taking action to lessen the effects of climatic changes [5]. Climate adaptation changes how we live, work, and behave to lessen our vulnerability to the unavoidable effects of CC caused by past and ongoing GHG emissions [6]. ISO (2019) defined CC adaptation as adjusting to the current and projected climate and its effects [7].
• Climate mitigation: Lowering GHG emissions through improving energy management (such as efficiency and conservation), implementing renewable energy projects, and promoting a low-carbon economy [5].

The economic and environmental effects of CC on the mining industry will impact companies, governments, and local communities. With the right adaptation and mitigation strategies, the mining industry may benefit from sustainable practices that minimize the impact on the environment and local populations and open up new potential for development and innovation. Thus, the review covers how CC impacts in Mining Operations (CCIMO) are perceived and which strategies and approaches have been used for CC adaptation and mitigation in the mining sector. The rest of the paper is organized as follows: the steps of the SLR approach are detailed in Sect. 2; Sects. 3 and 4 present the results of applying the SLR to CCIMO; and Sect. 5 provides the discussion and conclusions.

2 SLR Methodology for CCIMO

Figure 3 shows the methodology for the SLR in this paper. The methodology is divided into three phases: planning, conducting, and reporting the review [7]. In phase 1, the research questions must be defined, and keywords must be set to locate scientific papers in designated databases. The data extraction strategy determines how data items are obtained. Phase 2 involves reviewing the scientific databases to find potential documents and research. Primary studies are chosen based on inclusion and exclusion criteria. Then, the data required to meet the research objectives undergo extraction and

Fig. 3 SLR process phases [7]. Phase 1, planning the review: Stage 1, determine whether a systematic review is necessary; Stage 2, identify the research questions; Stage 3, create a procedure for the review (3.1 find the search keywords; 3.2 list the resources for the search; 3.3 provide the criteria for choosing the research; 3.4 define the method for data extraction). Phase 2, conducting the review: Stage 4, find the research projects; Stage 5, choose primary studies; Stage 6, data extraction; Stage 7, data synthesis. Phase 3, reporting the review: Stage 8, writing the main report

synthesis [7]. During phase 3, the outcomes of the earlier phases are reported. Having established the necessity of the review, the following research questions were defined to meet the aim of the study:

• RQ1: How is CC perceived in the mining value chain, and what are the impacts (or challenges, as risks and opportunities) of CC in the mining industry? The mining value chain includes firms, service providers, utility and infrastructure providers, local and/or state government organizations, and community groups.
• RQ2: What approaches and strategies have the mining industry used to combat CC impacts, and how have climate adaptation and mitigation strategies been addressed? Making the right decisions in the face of existing challenges requires specialized, comprehensive, and multifaceted approaches arranged from the macro to the micro level. Thus, the models and methodologies offered for this purpose should be recognized before planning and decision-making. Then, appropriate strategies and decisions can be adopted based on the output of these approaches.

Based on the addressed research questions, search terms were combined with Boolean operators (AND, OR) as: (“climate change” OR “climate adaptation” OR “climate mitigation”) AND (“mining” OR “mining operation”). Several research databases, such as the Multidisciplinary Digital Publishing Institute (MDPI), SpringerLink, Taylor & Francis, Wiley Online, Web of Science, etc., have been used. In the next stage, the criteria for selecting studies were defined as follows:

• The included document should be prepared in English.
• Included studies must be published research papers.


• Unrelated research should be discarded based on title, keywords, abstract, and conclusion.
• The complete version of duplicated publications was included in the review process.
• The articles of the last two decades were examined, based on the search statistics obtained from the Web of Science (WOS) and Scopus.
• Finally, data extraction and synthesis forms were used to gather data to answer the study questions. The data extraction form lists the following items:
– Research topic;
– Publication date;
– Presented definitions of CC and CC challenges for the mining sector;
– Type and explanation of the concepts that determine the CCIMO;
– Type and description of the strategies used for CCIMO.

The search was then performed across the listed sources using the search string, covering publications from 2000 onwards. Out of all retrieved documents, 42 related studies were selected as primary studies. The reviewed articles address the two main research questions, which are discussed in Sects. 3 and 4.
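A minimal sketch of how such a Boolean query and the year criterion could be applied programmatically to a local bibliography export; the record format and field names are assumptions for illustration.

```python
CLIMATE_TERMS = ("climate change", "climate adaptation", "climate mitigation")
MINING_TERMS = ("mining", "mining operation")

def matches_query(record: dict) -> bool:
    """(climate term OR ...) AND (mining term OR ...), over title and abstract."""
    text = (record.get("title", "") + " " + record.get("abstract", "")).lower()
    return (any(term in text for term in CLIMATE_TERMS)
            and any(term in text for term in MINING_TERMS))

# Hypothetical bibliography export
records = [
    {"title": "Climate change and mining in Canada", "abstract": "", "year": 2011},
    {"title": "Ore grade estimation methods", "abstract": "", "year": 2015},
]
primary = [r for r in records if matches_query(r) and r["year"] >= 2000]
print(len(primary))  # -> 1
```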

3 Answer to RQ1: Impacts

The term “climate change” is used in the mining value chain to describe the long-term alterations in the planet's climate brought on by human activities such as the combustion of fossil fuels, deforestation, and industrial operations. These actions raise GHG emissions, which trap heat in the planet's atmosphere and negatively affect the ecosystem, including temperature increases, altered precipitation patterns, and sea level rise. In 2006, Irarrázabal identified CC's significant impacts in the mining sector based on different scenarios. He showed that companies of all sizes must create comprehensive strategies to reduce GHG emissions. Such strategies should encompass large and established firms as well as smaller and newer organizations that may lack the resources for costly initiatives [6]. The mine production stage (from pre-mine planning through planning and development, production, post-production and closure, and post-mining) is most at risk from climate change [8]. Figure 4 explores the effects of significant weather events and climatic variations on a general mining process. These effects can be direct or indirect, as follows [6, 8]:

• Primary (or direct) impacts: Physical effects such as flooding, erosion, landslides, debris flows, overflowing waste ponds, and threats to human life, property, revenue, and the environment constitute the majority of the direct effects.


Fig. 4 Potential impacts of climate and extreme weather on mining practices [9]. The figure links the mining stages (pre-mine planning; mine planning and development; production; post-production and closure; post-mining) to climatic factors and concerns such as temperature and rainfall, water, transportation, energy availability, infrastructure, mining communities, and rehabilitation planning and implementation

• Secondary (indirect or ‘knock-on’) impacts: For instance, mining access to land and labor can be affected by changes in population and resources. Local labor force shortages can result from a lack of water, electricity, dust, isolation, and tropical diseases. Mining operations, labor, transportation, communication, building infrastructure, and mine decommissioning are all projected to be affected.

These effects might favor or negatively influence future mining operations, equipment, assurance, economic stability, and health and safety conditions inside and around the mine. Thus, the effective elements of CC in the mining sector can be delineated through the following perspectives:

• The processing operations: Because mining is frequently a “heavily water-dependent” sector, rising water scarcity is a major concern [10].
• The site geography (condition of the property): There is a chance that steep slopes in permafrost overburden exposed for a long time will damage the stability of open-pit mine walls [11]. Permafrost thaw is a problem in all northern locations, especially when containment structures have not been built to endure the faster melting expected under CC or to allow for long-term maintenance [12].
• Challenges to environmental management: For instance, risks associated with extreme rainfall and/or tailings dam collapse include polluted water flowing into nearby communities, associated remedial costs, increased environmental responsibility, effects on community health and safety, and a large potential for reputational harm. Flooding and intense precipitation also run the risk of exposing sinkholes and causing or escalating acid rock drainage, all of which might have negative effects on water supplies [13].
• Dryer conditions could decrease water intake capacity and expose tailings to subaerial weathering, underscoring the urgent need for new technologies to combat the effects of CC. This is due to the limited amount of climate modeling data and continued reliance on permafrost in the design of retention facilities [14].
• It is essential that the planning for new mines and the closure and reclamation of existing ones consider the possible repercussions of CC as its effects become increasingly obvious [14].
• Rising sea levels, increased precipitation, altered storm patterns, and temperature changes will, in some places, improve or hinder access to distribution and supply chains (transportation services to ports for export) [15].


• Changes in temperature, precipitation, and wind may all directly affect mines. For example, strong winds can damage electrical lines, high temperatures can lead to heat exhaustion in workers, and low precipitation can restrict water availability. These climatic factors also affect the intensity and frequency of natural hazards such as forest fires, avalanches, flooding, landslides, and drought [15].
• Surface mining and CC are potential threats to environmental systems. For instance, the effects of mining and CC harm delicate ecosystems like wetlands [16].
• The Arctic Climate Impact Assessment (ACIA) highlights the potential impacts of permafrost thawing on transportation, infrastructure, and economic development [11, 15].
• CC significantly impacts the water management infrastructure, waste containment systems, and the hydrological, hydrogeological, and geochemical conditions influencing water flow and contaminant levels at mining sites. These changes can increase the risk of acid rock drainage and metal leaching, posing serious environmental threats [17].
• Lack of guidance and misunderstanding about how to respond among employees, conflicting emergency response guidelines, a lack of contingency plans for worst-case scenarios, poor communication within and across organizations and departments, and limited awareness of sensitivity to climatic stresses are some of the issues that need to be addressed [13].
• The mining industry faces mounting pressure to explore strategies to reduce its carbon footprint, including integrating renewable energy sources, owing to heightened scrutiny of the sector's greenhouse gas emissions and advancements in climate discourse [18].
• The life cycle costs associated with CC consequences: This is an important concern due to the long-lasting environmental consequences following mining activities and over the lifespan of mines. These may persist for hundreds of years, especially for large-scale mines, which disturb the physical balance of the land in the mining area and produce all kinds of waste (or tailings) [19].
• The uncertain and long-term nature of the effects of climate change can create doubt among miners regarding their investment horizons [19].
• Large-scale mining operations, which are often significant sources of pollution, may be subject to more climate change-related regulations. As a result, investors may want to avoid investing in these mines.
• The mining industry risks being negatively affected by CC legislation that may be created without sufficient participation and awareness from all stakeholders involved in the mining chain [15].
• Managerial challenges, including cost management, organizational cultural attitudes to learning and change, and inflexible company policies and government regulations, can hinder adaptation and mitigation efforts [19].
• Effective land management strategies (post-mining land use) that address the impacts of CC are crucial for successful mine reclamation operations [40].
• CC will multiply mining supply chain risks by increasing the complexity of systems, particularly in newly industrialized and developed nations [18].


4 Answer to RQ2: Models/Approaches and Strategies

Mitigation and adaptation are essential and complementary strategies for the mining industry to coexist with the impacts of CC, as they work together to reduce climate-related risks [41]. To effectively implement these strategies, decision-makers require practical approaches and models that can be put into action. In this regard, Table 1 provides the CC impact management methods and models chronologically, highlighting risk management and vulnerability assessment. In summary of these models, it is evident that while mines must employ a two-pronged approach to addressing the issues posed by climate change, only 20% of the evaluated methods integrate adaptation and mitigation strategies. The study of the currently used techniques also demonstrates that most use an analytical mode. Even though these analytical techniques focus on particular CC effects, a holistic strategy has not yet been offered. Adopting appropriate strategies against the effects identified by these models first requires identifying their main components. The key components of CC strategies are highlighted as [15]:

• Include climate-sustainable development in local and operational efforts.
• Develop practical measures for coping with the effects of CC together with the host communities.
• Examine how local resilience can be enhanced through investments in ecosystem services.
• Consult with stakeholders to comprehend their recent worries.
• Launch industry-wide cooperation on regional adaptation plans.

In this regard, climate adaptation and mitigation strategies in the mining sector have been proposed by different researchers, as follows:

Climate Mitigation Strategies:
• Accurate research and identification of CC-prone mining locations [19].
• Mines should be encouraged to use renewable energy through legislation, tax rebates, etc. [1].
• Some mitigation measures could be implemented at the operational level [1]:

– Fuel switching (hybrid diesel, out of diesel)
– Energy efficiency (lighting, motors, pumps, conveyors)
– Renewable energy (procurement, PPAs, on-site)
– Battery storage (energy storage, electric vehicles)
– Artificial intelligence (analytics, machine learning)
– Digitization (data processing, interfaces)
– Low-carbon electricity (renewables, CCS, SMRs)
– Ore processing improvements (bulk processing efficiency)
– Hydrogen fuel cells (electricity, machinery)
– Other (RD&D, grade engineering)


Table 1 Approaches for adaptation and mitigation in the mining sector

Researcher | Main approach, contributions | Category
Auld and MacIver, 2006 | “No regrets” and “adaptation learning” (conceptual) [20] | AD
Riaza et al., 2007 | Hyperspectral imaging (analytical) [24] | AD&MI
Garnaut, 2008 | Vulnerability-based (conceptual) [9] | AD
Pearce et al., 2009 | Two-stage vulnerability-based (conceptual) [13] | AD
Pearce et al., 2009 | General circulation models (conceptual) [13] | MI
Rayne et al., 2009 | Risk of water quality (conceptual–analytical) [28] | MI
Ford et al., 2011 | Questionnaire on threats posed by CC (analytical) [31] | AD
Pearce et al., 2011 | Vulnerability-based (conceptual) [22] | AD
Anawar, 2013 | Acid mine drainage (AMD) (analytical) [36] | AD
Loechel et al., 2013 | Questionnaire on attitudes and actions about CC adaptation (analytical) [34] | AD
Mason et al., 2013 | Risk-based (conceptual) [33] | AD
Baisley et al., 2016 | Risk-based (analytical) [38] | MI
Chavalala, 2016 | Mixed-method (analytical) [21] | AD
Rüttinger and Sharma, 2016 | Risk by iModeler and vulnerability (analytical) [18] | AD
Hotton et al., 2018 | Capillary barrier effects (analytical) [25] | MI
Odell et al., 2018 | Nature–society relationships (conceptual) [23] | AD&MI
Kosmol, 2019 | Vulnerability based on the Notre Dame Global Adaptation Country Index (analytical) [26] | AD
Mavrommatis et al., 2019 | Regional CC risks (analytical) [29] | AD
Nunfam et al., 2019 | Contemporaneous mixed methods based on risk (analytical) [27] | AD
Mavrommatis and Damigos, 2020 | Bottom-up and survey-based (analytical) [30] | AD&MI
Sun et al., 2020 | Integrated climate risk index (CRI) based on return on total assets (analytical) [32] | AD
MAC, 2021 | Adaptation measures (conceptual) [3] | AD
Bresson et al., 2022 | Risks and vulnerabilities (analytical) [35] | AD
Xie and van Zyl, 2022 | Revised mine reclamation design (conceptual) [37] | AD&MI
Ngoma et al., 2023 | Endogenous and exogenous latent variables in partial least squares structural equation modelling (analytical) [39] | AD&MI

AD: Adaptation, MI: Mitigation, AD&MI: Adaptation and mitigation

Climate Adaptation Strategies:
• Information and communication technology (ICT) innovation adoption as part of adaptation measures may lessen the susceptibility and exposure of the mining sector to CC catastrophe risks [42].
• Improved operational safety and resilience under projected future climate conditions through better planning, design, building, and maintenance [19].
• Mining contracts must consider CC concerns (national adaptation plans and climate adaptation guidelines), especially for areas highly vulnerable to the effects of CC [1].
• The mining plan's tailings dam design must adhere to the most recent international safety requirements, and ongoing maintenance and clean-up procedures must be followed [1].
• Applied engineering solutions can play a critical role in achieving climate resilience, which refers to the ability of communities and systems to withstand and recover from the impacts of CC [19].
• At the operational level, several adaptation activities could be carried out [1]:

– Special pumps must be set up at the mine site to remove water
– Mines open early if winters are short and close early if winters are long
– Autonomous operations (drilling, loading, haulage)
– Fugitive emissions reduction (ventilation air methane, CH4 capture and use)
– Electrification (mine processes, transport)
– Tailings management (emissions capture and mineral carbonation)
– Water management (treatment technologies)


Jointly for Adaptation and Mitigation Strategies:
• Governments could mandate that mining corporations account for all direct, indirect, and induced impacts on forests at every step of operations to limit forest conversion [1].
• Governments should stipulate clearly when granting mining firms access to water [1].
• At the project's start, closure plans incorporating climatic risks should be offered and submitted along with the environmental and social impact assessment.
• Use educational initiatives to raise knowledge of, and the capacity to address, urgent threats, particularly in isolated communities, such as increased flooding, bushfires, and mosquito-borne diseases [19].
• Identified impacts cascade down, up, and across the supply chain. For example, a higher frequency of intense rain events in summer may cause more transportation disruptions on the roads, which would reduce mining productivity at that time of year and affect jobs and communities (transport → mining → human resources → community) [19].

5 Discussion and Conclusion

This article synthesises the actions performed on climate mitigation and climate adaptation in the mining sector to address climate change challenges. The various climate mitigation and adaptation approaches and strategies are presented based on this synthesis. The review found that the risks and uncertainties associated with CC must be considered to make informed decisions. Once these risks have been identified, it is essential to examine how each element may behave under the given circumstances. Moreover, CC can have a long-term, gradual impact on mine operations, requiring comprehensive planning to tackle CC challenges. The first research question explored the impacts of CC on mining activities. The challenges for mitigation and adaptation actions include a low level of awareness of climate change impacts in mining, a lack of technical knowledge, high capital costs, and existing rules and regulations, necessitating a multidimensional approach to education and cultural advancement, encompassing political, legal, and practical aspects. The second research question explored the current CC impact assessment methods, divided here into analytical and conceptual approaches. Our survey on climate adaptation and mitigation revealed a critical need for long-term planning in the mining industry. In addition, our analyses show that most mining and exploration companies focus on reducing mining's impact through climate mitigation action rather than adapting to extreme future weather events. Therefore, there is a need to place more emphasis on climate adaptation action to fulfill future demand and support a smooth transition toward sustainable mining development. According to the literature review done in mining engineering, the knowledge base in this area is relatively new, emphasizing bringing attention to current problems


rather than providing practical solutions. This issue is also evident in the development of the proposed models, which lean more toward raising awareness of concerns than toward usable solutions. While strategy and macro-level decisions are crucial for combating CC, operational planning is becoming increasingly vital to support them. As a result, managers and decision-makers must concentrate on creating specific plans and initiatives that can successfully handle the risks posed by CC. Most of today's mines have a short lifespan, and CC mitigation and adaptation actions are not considered when designing and operating mining activities. These mines need to adapt to shifting climatic circumstances from an operational perspective and consider mitigation concerns for the foreseeable future. Investigations showed that one of the most important tools for implementing mitigation and adaptation programs is increasing awareness across the different phases of mining activities and their associated value chain. In this regard, the mining industry still requires extensive research to develop practical and operational approaches to overcoming the challenges.

References

1. Mebratu-Tsegaye T, Toledano P, Dietrich Brauch M, Greenberg M (2021) Five years after the adoption of the Paris Agreement, are climate change considerations reflected in mining contracts?
2. Geneva S (2013) Intergovernmental panel on climate change, 2014. Working group I contribution to the IPCC fifth assessment report. Climate change, vol 8
3. MAC (2021) Guide on climate change adaptation for the mining sector. The Mining Association of Canada. Accessed 17 Mar 2023
4. Campbell R et al (2023) Mining & metals 2023: lifting the fog of uncertainty. White & Case LLP. Accessed 11 Mar 2023
5. Climate resilience strategy—mitigation & adaptation action plans (2018) Official web site of The City of Calgary, Calgary, 18–00577050
6. Irarrázabal R (2006) Mining and climate change: towards a strategy for the industry. J Energy Nat Resour Law 24(3):403–422
7. Kitchenham B, Charters S (2007) Guidelines for performing systematic literature reviews in software engineering. Technical report EBSE-2007-01, Keele University and University of Durham
8. Hodgkinson J et al (2010) Climate adaptation in the Australian mining and exploration industries
9. Garnaut R (2008) The Garnaut climate change review. Cambridge University Press, Cambridge
10. Dontala SP et al (2015) Environmental aspects and impacts its mitigation measures of corporate coal mining. Procedia Earth Planet Sci 11:2–7
11. Instanes A (2006) Impacts of a changing climate on infrastructure: buildings, support systems, and industrial facilities. Presented at the 2006 IEEE EIC climate change conference. IEEE, pp 1–4
12. Prowse TD, Furgal C (2009) Northern Canada in a changing climate: major findings and conclusions. Ambio 38(5):290–292
13. Pearce T et al (2009) Climate change and Canadian mining: opportunities for adaptation
14. Smith M (2013) Assessing climate change risks and opportunities for investors—mining and minerals processing sector. Australian National University, Canberra
15. Nelson J, Schuchard R (2011) Adapting to climate change: a guide for the mining industry. Business for Social Responsibility, vol 10
16. Phillips J (2016) Climate change and surface mining: a review of environment-human interactions & their spatial dynamics. Appl Geogr 74:95–108
17. Nordstrom DK (2009) Acid rock drainage and climate change. J Geochem Explor 100(2–3):97–104
18. Rüttinger L, Sharma V (2016) Climate change and mining: a foreign policy perspective
19. Loechel B et al (2013) Climate adaptation in regional mining value chains: a case study of the Goldfields-Esperance Region, Western Australia. CSIRO
20. Auld H, MacIver D (2006) Changing weather patterns, uncertainty and infrastructure risks: emerging adaptation requirements. Presented at the 2006 IEEE EIC climate change conference. IEEE, pp 1–10
21. Chavalala B (2016) An assessment of South Africa's coal mining sector response to climate change adaptation demands. Thesis, University of South Africa, South Africa. Accessed 20 Mar 2023
22. Pearce TD et al (2011) Climate change and mining in Canada. Mitig Adapt Strat Glob Change 16:347–368
23. Odell SD et al (2018) Mining and climate change: a review and framework for analysis. Extr Ind Soc 5(1):201–214
24. Riaza A et al (2007) Pyrite mine wastes hyperspectral monitoring as a tool to detect climate change. Presented at the 10th international symposium on physical measurements and signatures in remote sensing, ISPMSRS07. WG, pp 12–14
25. Hotton G et al (2018) Assessment of CCBE performance with climate change: case study of the Lorraine mine site. Presented at the proceedings tailings and mine waste
26. Kosmol J (2019) Climate change impacts on mining and raw material supply chains. German Environment Agency. www.umweltbundesamt.de/en
27. Nunfam VF et al (2019) Climate change and occupational heat stress risks and adaptation strategies of mining workers: perspectives of supervisors and other stakeholders in Ghana. Environ Res 169:147–155
28. Rayne S et al (2009) Analytical framework for a risk-based estimation of climate change effects on mine site runoff water quality. Mine Water Environ 28:124–135
29. Mavrommatis E et al (2019) Towards a comprehensive framework for climate change multi-risk assessment in the mining industry. Infrastructures 4(3):38
30. Mavrommatis E, Damigos D (2020) Impacts of climate change on the Greek mining industry: perceptions and attitudes among mining industry practitioners operating in the Cyclades. Euro-Mediterr J Environ Integr 5(2):28
31. Ford JD et al (2011) Canary in a coal mine: perceptions of climate change risks and response options among Canadian mine operations. Clim Change 109:399–415
32. Sun Y et al (2020) The impacts of climate change risks on financial performance of mining industry: evidence from listed companies in China. Resour Policy 69:101828
33. Mason LM et al (2013) Adapting to climate risks and extreme weather: a guide for mining and minerals industry professionals. National Climate Change Adaptation Research Facility
34. Loechel B et al (2013) Climate change adaptation in Australian mining communities: comparing mining company and local government views and activities. Clim Change 119:465–477
35. Bresson É et al (2022) Climate change risks and vulnerabilities during mining exploration, operations, and reclamation: a regional approach for the mining sector in Québec, Canada. CIM J 13(2):77–96
36. Anawar HM (2013) Impact of climate change on acid mine drainage generation and contaminant transport in water ecosystems of semi-arid and arid mining areas. Phys Chem Earth, Parts A/B/C 58:13–21
37. Xie L, van Zyl D (2022) How should mine reclamation design effectively respond to climate change? A mini review opinion. J Geosci Environ Prot 10(12):117–125
38. Baisley A et al (2016) Climate change and mine closure—a practical framework for addressing risk. In: Proceedings IMWA, pp 35–42
39. Ngoma RGTM et al (2023) The impact of the mining equipment, technological trends, and natural resource demand on climate change in Congo. Sustainability 15(2):1691
40. Cottrell L, Stowe C (2021) Climate change and extreme weather: how can mine action programs adapt to our changing environment? J Conv Weapons Destr 25(2):6
41. OECD Publishing (2009) Integrating climate change adaptation into development co-operation: policy guidance. Organisation for Economic Co-operation and Development
42. Aleke BI, Nhamo G (2016) Information and communication technology and climate change adaptation: evidence from selected mining companies in South Africa. Jàmbá: J Disaster Risk Stud 8(3):1–9

Systematic Dependability Improvements Within Railway Asset Management

Rikard Granström and Peter Söderholm

Abstract This paper describes results from a research and development (R&D) project at Trafikverket (the Swedish transport administration). The purpose of the study was to systemize dependability improvements of Trafikverket's Control Command and Signalling (CCS) assets. A case study was conducted on level crossings, which represent a critical part of the CCS system. The results of the study show that the systematic approach contributes to asset management through short-term dependability and productivity improvements, as well as medium-term specifications for system modifications and long-term specifications for the next generation of level crossings. The approach is based on a combination of methodologies and tools described in dependability standards, e.g., Failure Modes, Effects & Criticality Analysis (FMECA). However, the approach also considers aspects from Design of Experiments (DoE) to support field tests aligned with other tasks in the railway infrastructure. Besides contributing to improvements, the approach complies with regulations and mandatory standards. Examples of these are the Common Safety Method for Risk Evaluation and Assessment (CSM-RA, EU 402/2013) and EN 50126–RAMS (Reliability, Availability, Maintainability & Safety) for railway applications. In addition, the approach complies with regulatory requirements related to enterprise risk management and internal control, i.e., effectiveness, productivity, compliance and documentation. The approach also supports asset management in accordance with the ISO 55000 series.

Keywords Risk · Dependability · Asset management · Railway · Infrastructure · Research · FMECA · Level crossing · Development · Improvement · Compliance · Maintenance program · Maintenance concept

R. Granström Trafikverket, Box 809, 971 25 Luleå, Sweden e-mail: [email protected] P. Söderholm (B) Trafikverket and Quality Technology & Logistics Luleå University of Technology, Luleå, Sweden e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_10


1 Introduction

Besides changes in a technical system's reliability and maintainability, one central part of continuous dependability improvement is a living maintenance program, as prescribed by Reliability-Centred Maintenance (RCM) [1]. However, any change to the maintenance program that might affect railway safety must follow CSM-RA. In addition, any safety-related requirement should be evaluated with regard to availability and life cycle cost (LCC) before it can be accepted. Hence, improvements of safety and dependability should be integrated with each other, as described by EN 50126 (RAMS) [2]. There are many safety-critical functions within the railway infrastructure. One of these functions is the possibility to safely cross the railway track at level crossings. Hence, level crossing users and related accidents are one risk category that has to be managed in accordance with regulations such as the Common Safety Targets (CST). Besides safety, level crossings also impact the availability of infrastructure for train passages and road traffic, as well as asset management and LCC. Hence, continuous dependability and productivity improvements related to level crossings can contribute positively to effective railway asset management. Maintenance programmes governing the maintenance of Trafikverket's infrastructure assets have from the outset been constructed based upon expert judgments, from which rules and regulations for maintenance have been derived. Over the years, new rules have been added, sometimes due to incidents or accidents, some of which have had fatal consequences. Hence, some rules are “written in blood”. As experts retire or move to other occupations, the rules remain, while the know-how and logic behind them slowly diminish. This creates a situation where maintenance programmes become static through ossification. Without a systematic description of the logic from which maintenance programs are derived, it is difficult for maintenance engineers to implement changes, alter the maintenance programs, and at the same time assess the consequences of proposed changes. New maintenance technologies and methodologies emerge, but are rarely adopted and implemented to support a more effective and efficient asset management. This paper is based on a study conducted at Banverket (the Swedish railway administration), predecessor to Trafikverket (the Swedish transport administration). The study set out to explore how dependability improvements of level crossing assets could be obtained by applying standardised dependability methodologies. The study is a relevant description of a methodology that can be applied to create the necessary prerequisites for any organisation wanting to improve dependability, as well as contributing to improvements of LCC within safety-critical systems. As an exemplary case, it is also a relevant description of the efforts required to implement changes in maintenance programs and maintenance concepts within railways.


2 Method and Material

The theoretical framework of this work is based on best agreed-upon applications, as described by common dependability and railway-specific standards. Examples of central standards are IEC 60300-3-11 (Reliability-Centred Maintenance, RCM) [1], IEC 60300-3-14 (Maintenance and maintenance support) [3], IEC 60812 (FMEA/FMECA) [4], and EN 50126 (RAMS) [2]. In addition, theories related to Design of Experiments (DoE) are used to plan verification and validation of changes in current maintenance programs. The empirical material used in this work is based on Trafikverket's asset management of level crossings. Besides maintenance programs, data is collected from the inspection system (Bessy), the fault reporting, analysis and corrective action system (0Felia), and the asset register system (BIS). Field observations during maintenance execution were also used to collect data relevant to productivity improvements of the existing maintenance concept. The study was executed in three major steps. The first step was initiated by a proposal to use FMEA to systemize the dependability of the level crossing system. Findings from this initial study resulted in recommendations for operational changes to the existing maintenance programme and maintenance concept, as well as recommendations for tactical changes for functional modifications of the level crossing assets [5]. Some strategic recommendations for the next generation of level crossings were also made. The second step was initiated by Banverket to assess the effectiveness of the proposed operational changes to the maintenance programme and the maintenance concept. Therefore, a decision was made to perform an analysis to estimate the impact of the proposed changes in terms of cost and dependability, and to find a track section at which recommendations for the maintenance programme and maintenance concept could be deployed for demonstration and verification [6]. The third step was a verification test in the field, which focused on the implementation of the recommendations at track section number 524 between Hallsberg and Frövi stations [7]. An extended maintenance programme was executed by the maintenance entrepreneur Infranord. In addition, field observations were made to assess the feasibility of the recommended maintenance execution and to study the time it takes to execute maintenance in track.

3 Results

The results from the study are presented in relation to the three steps of the applied approach to provide useful insights from the performed study.


3.1 Step 1–FMEA

Constructing an FMEA for the level crossing system was not a straightforward exercise. The first obstacle to overcome was to define the actual system of interest, i.e., which inherent items are part of the level crossing system. No existing drawing of the system could be obtained. Therefore, field visits and archival analysis of Banverket's material catalogue were used to assemble the items that are inherent to the level crossing system. Another challenge was to determine the system boundaries, and thereby which inherent items to consider in the analysis. The aim of the study was to improve system dependability. Hence, it was necessary to identify which items are critical for the dependability of the level crossing system. The system of interest should be the constitution of inherent items that provide the dependability of the required function. Hence, the analysis started by identifying the required function of the level crossing system. Reasoning led to the conclusion that the required function of a level crossing system is to provide reliable go/no-go signals to both rail and road traffic. Therefore, the system of interest and the items included in the study were selected on the basis that they are critical for the required function (go/no-go for rail and road traffic). In order to obtain a useful system structure for analysis, items had to be grouped into sub-functions that are critical for obtaining the required system function, see Fig. 1. Inherent items that are critical for obtaining sub-functions were grouped into their respective categories; see Fig. 2 for an example of some items critical for the crossing gate mechanism function. Field visits conducted together with maintenance personnel during the initial phases of the study were valuable for understanding the physics of the degradation of the system and how maintenance is executed. This understanding was especially useful for the later work with the FMEA-sheets. One example of the degradation process can be related to the crossing gate arm. In Sweden, level crossing gate arms are made of wood with a protective coating of paint. The crossing gate mechanism is supposed to work as a seesaw (teeter) in almost perfect balance. However, if the coating is compromised, the wood will absorb

Fig. 1 Sub functions of required system function


Fig. 2 Some items critical for the crossing gate sub function

Fig. 3 Level crossing gate arm

moisture. Hence, the crossing gate will become heavier, which affects the balance of the seesaw. This will in turn lead to excessive degradation and failure of mechanical components within the crossing gate mechanism, see Fig. 3. Another example of a failure process can be seen in Fig. 4. Here, grease is injected into the grease nipples on the outside of the crossing gate mechanism, in accordance with the maintenance programme. However, excessive grease propagates to the inside of the mechanism and drips down onto the roller switch (which indicates the position of the gate arm), causing it to lose its function, usually in an intermittent manner. One of the challenges when working with complex systems, where many different inherent items can experience multiple failure modes, is to get a bird's-eye view of the problem. One solution to this problem was to perform a rudimentary form of Fault Tree Analysis (FTA), illustrated in Fig. 5. By using different colours, different aspects of function, failure mode and maintenance programme can be illustrated. The orange colour illustrates the unwanted top event of a faulty crossing gate mechanism function. The yellow colour illustrates the degraded functional states of sub-components. The blue colour marks the failure modes and the grey colour


Fig. 4 Grease in roller switch

indicates areas where the current maintenance programme or its application can be insufficient.

Fig. 5 Rudimentary documentation of fault tree analysis (FTA)
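As an illustration of the kind of logic such a fault tree encodes, the sketch below evaluates a minimal two-level tree for the crossing gate mechanism; the event names and structure are invented for the example and do not reproduce the actual tree in Fig. 5.

```python
def or_gate(*inputs):
    """Output event occurs if any input event occurs."""
    return any(inputs)

def and_gate(*inputs):
    """Output event occurs only if all input events occur."""
    return all(inputs)

# Hypothetical basic events for one evaluation scenario
gate_arm_waterlogged = True      # compromised coating, arm has absorbed moisture
excess_grease_on_roller = False  # grease dripping onto the roller switch
roller_switch_worn = False

# Intermediate and top events (illustrative structure only)
seesaw_unbalanced = gate_arm_waterlogged
roller_switch_faulty = or_gate(excess_grease_on_roller, roller_switch_worn)
crossing_gate_mechanism_faulty = or_gate(seesaw_unbalanced, roller_switch_faulty)

print(crossing_gate_mechanism_faulty)  # -> True
```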


The FMEA was constructed following the recommendations of IEC 60812 (Analysis techniques for system reliability—Procedure for failure mode and effects analysis, FMEA) [4]. At the outset, the study followed the standard structure of the FMEA sheets, where the physical item was the baseline for the analysis. However, this led to a rather tedious exercise, since a number of items experience the same type of failure modes, causing each failure mode to be described in the same way on multiple occasions for different items. This caused the FMEA-sheet to become somewhat incoherent, and the group working with the analysis lost focus on multiple occasions. Hence, the recommendation for future studies is to focus on functions instead of physical items. In this case, the study should emanate from the required system function (e.g., the level crossing) or an appropriate sub-system function, e.g., the crossing gate mechanism's required function. The next step is to assess the failure modes which can cause loss of the required function, and thereafter the items whose functions are critical for maintaining the required function; see Table 1. This approach is a functional FMEA and is in its structure much more like the representations created in the fault trees. From a maintenance point of view, this is also a more rational structure, since the purpose of maintenance is to maintain or restore required system functions and not the condition of physical items. In addition, the FMEA (Table 1) proved to be very valuable for reconstructing the logic behind the documented maintenance programme.

Table 1 Part 1, 2 and 3 of FMEA-sheet


The first part of the FMEA describes unit, function and failure mode. The second part describes the failure mode's failure cause, local and final effect, as well as the detection method (e.g., inspections) and compensating provisions against failure (e.g., the actual maintenance task to be executed). The third part contains recommendations for alternative detection methods and recommended measures. Parts one and two of the FMEA-sheet describe the present situation, while part three describes improvements that can be achieved compared to the present situation. The beauty of this structure is that it directly indicates which failure mode a recommended measure addresses, while allowing the expected effects of implementing recommended changes to be isolated. Overall, it provides a very useful initial structure for Design of Experiments (DoE). The combination of FTA and FMECA allows the logic behind the maintenance programmes to be recreated. It also provides a baseline for experiments, which can be conducted in an orderly fashion, since the consequences of proposed changes can be isolated from the outset. The FMEA is also valuable for collecting recommendations for future system modifications. The conducted study resulted in a long list of recommendations. Some of the operational recommendations are:

• Better instructions for battery maintenance, remove oxide from terminals, use special grease.
• Develop decision support for when to exchange batteries.
• Instructions for applying grease, and cleaning of excess grease.
• Better instructions for when to paint gate arms.
• Include inspection of crossing gate mechanism heating.
• Make sure that failures of structural items are corrected (doors, seals, mosquito nets, rubber strips).

Some of the tactical and strategic recommendations are:

• Use other material than wood in gate arms.
• Exchange track circuits to axle counters.
• Change signalling lights from bulbs to LED.
• Better instructions for snow removal.
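A minimal sketch of how the three-part FMEA-sheet structure could be represented as a data structure, with fields inferred from the description above; the example row combines observations reported in this section, and the alternative detection method shown is a hypothetical entry.

```python
from dataclasses import dataclass

@dataclass
class FmeaRow:
    # Part 1: present situation - what can fail
    unit: str
    function: str
    failure_mode: str
    # Part 2: present situation - cause, effects and current provisions
    failure_cause: str
    local_effect: str
    final_effect: str
    detection_method: str          # e.g., inspection
    compensating_provision: str    # actual maintenance task to be executed
    # Part 3: possible improvements
    alternative_detection: str
    recommended_measure: str

row = FmeaRow(
    unit="Crossing gate mechanism",
    function="Raise and lower gate arm in balance (seesaw)",
    failure_mode="Seesaw out of balance",
    failure_cause="Compromised coating, gate arm absorbs moisture",
    local_effect="Excessive load on mechanical components",
    final_effect="Faulty go/no-go signal to road traffic",
    detection_method="Visual inspection of protective coating",
    compensating_provision="Repaint gate arm",
    alternative_detection="Weigh gate arm at inspection (hypothetical)",
    recommended_measure="Use other material than wood in gate arms",
)
```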

3.2 Step 2–Planning of Field Experiment

Executing any change of a railway maintenance programme in Sweden, as in most national railway administrations, is a major task. It involves updates of rules and regulations, and preparation of information systems to accommodate the changes, e.g., new inspections and new maintenance tasks. It involves developing new courses and educating technicians on a nationwide scale. Since maintenance is executed by entrepreneurs, the changes also have to be accommodated within new and existing


maintenance contracts throughout the nation. Therefore, it is necessary to validate changes on a small scale before executing nationwide plans. From the changes to the maintenance programme recommended by the FMEA, an analysis was made to estimate the impact of the proposed changes in terms of cost and dependability. The analysis also included the task of finding a track section at which recommendations for the maintenance programme and maintenance concept could be deployed for demonstration and verification. Track section number 524 was identified as a useful candidate for verification testing. This track section had a high number of faults per level crossing. Configuration-wise, the track section represented an average constitution of level crossings in Sweden. Based on fault data, this track section was in a degraded state and in need of improved maintenance. The number of level crossings (20) for the study was judged satisfactory. Expected results from the test were:

1. Improved dependability and reduced cost for preventive and corrective maintenance.
2. Cost–benefit figures for the performed measures, which could serve as decision support for further dependability improvements of level crossing assets.
3. A demonstrator for how changed maintenance programmes and maintenance concepts could be deployed nationwide, also for other systems than level crossings.

The cost–benefit analysis was constituted in three main parts:

1. Initial increase of preventive maintenance effort and related cost to restore the dependability of the level crossing system.
2. Increased cost for deployment of the new recommended maintenance programme.
3. Cost–benefit comparison between the increased preventive maintenance cost (pts. 1 and 2) and the reduced cost for corrective maintenance, based on the assumption that corrective maintenance measures could be reduced by some 50% due to the increase in system dependability.

In theory, if the initial calculations hold, the pay-off time for the initial cost increase from pts. 1 and 2 would be somewhere between 2–3 years, after which the dependability level could be maintained at a lower cost than required by the previous maintenance programme (a small numerical sketch of this payback logic is given after the list below). From this part of the study, recommendations were also made for the practical preparation of the actual experiment, some of which are:

• Detail the maintenance concept for critical items. One example is the crossing gate mechanism motor, which due to excessive application of grease can experience intermittent failures when grease comes into the motor and clogs the brush and commutator.
• A training programme for the entrepreneurs in accordance with the new maintenance programme.
• Develop a routine to keep track of costs within the experiment.
• Open books, to keep track of real costs.


• Continuous assessment meetings with the entrepreneur to keep them focused on the task at hand; there is always a risk that they return to old habits.
• The maintenance programmes should contribute to extended item life lengths. However, the reporting structure for failures and faults does not support an adequate description of items' life lengths. Therefore, it was recommended to conduct interviews with technicians to get their assessment of the matter.
• Conduct interviews with the infrastructure manager and signalling experts to document their experiences, which are useful for further development of maintenance programmes and for deployment on a larger scale.
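As a minimal sketch of the pay-off logic referred to above, the calculation below reproduces the structure of the three-part cost–benefit analysis. The figures are purely illustrative, since the study does not report absolute cost levels; only the assumed 50% reduction of corrective maintenance comes from the text.

```python
# Illustrative figures only; the study does not publish absolute costs.
initial_restoration = 75_000   # pt. 1: one-off effort to restore dependability
extra_pm_per_year = 20_000     # pt. 2: added yearly cost of the new programme
cm_cost_per_year = 100_000     # corrective maintenance cost before the change
cm_reduction = 0.5             # assumption from the study: ~50% fewer CM actions

# pt. 3: yearly net saving once the system is restored
yearly_saving = cm_cost_per_year * cm_reduction - extra_pm_per_year
payoff_years = initial_restoration / yearly_saving
print(f"Pay-off time: {payoff_years:.1f} years")  # 2.5 years with these figures
```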

3.3 Step 3–Execution of Verification Field-Test

Verification testing started with an inspection fit for the purpose of assessing the amount of material that had to be pre-ordered before the physical work could be conducted. Two days were required to complete the inspection, i.e., 10 level crossings per day. The results of the experiment were in accordance with expectations: the number of faults was reduced by about 50%. The physical work was conducted in June 2012, and Table 2 shows the development of signalling-related faults for the years 2008–2021. It is also interesting to observe that the number of no fault found (NFF) events was reduced in comparison to previous statistics. It should be mentioned that the test was only carried out from 2012 until some time in 2014. A technician involved in the test, who is still working on the same track section, has confirmed that maintenance execution from 2015 onwards has in most aspects returned to how maintenance was carried out before the test. This might explain why the number of faults has increased from 2015 onwards.

As for the NFF events, the crossing gate mechanism motor had been a source of intermittent failures in all level crossings. There was a hypothesis that the cause of the intermittent failures was excess grease contaminating items internal to the motor. Before executing the verification test, the project set out to test this hypothesis by examining a couple of the motors to assess whether grease was in fact contaminating items internal to the motor. A couple of motors were therefore disassembled for this purpose, see Fig. 6. The picture in the middle clearly shows a contaminated commutator, while the left picture shows the commutator after the cleaning operation. From a cost perspective, this finding was especially interesting.

Table 2 Signalling-related faults and no fault found (NFF) 2008–2021

Year    2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Faults    70   74   65   54   39   35   28   31   53   44   51   37   46   56
NFF       39   42   33   31   18   16   14   13   27   12   18    9   16   25


Fig. 6 Motor after and before refurbishment; the right picture shows an engine marked for test

The current practice is to replace the motor in the field if it has any major problem. The cost of one new motor in 2012 was 1,550 €. If the motor was refurbished at a factory, the cost was around 1,200 €. A time study conducted before the field test showed that it took about 30 minutes to refurbish a motor under field conditions, with an estimated refurbishment cost of about 50 €. Hence, the cost-saving potential is somewhere around 25–30 times, which could have an enormous impact on the total asset management cost of level crossings if implemented nationwide. A test was therefore set up within the verification test to examine whether refurbishment of motors was, from a reliability point of view, a viable alternative to exchanging them. A number of motors were replaced at the beginning of the verification test, in three categories: new, refurbished in factory, and refurbished in field. In addition, instructions for applying a correct amount of grease to the motors were included in the maintenance concept. Table 3 shows the number of reported faults on track section 524 where a motor was replaced for the test series. It is interesting to observe that none of the motors replaced due to fault was a motor refurbished in the field. Therefore, it can be concluded that refurbishing motors in the field is, from a reliability and cost point of view, a more viable alternative than the other two options. (A simple calculation of the cost ratios is sketched after Table 3.)

Tightening of terminals is also something that should be included in the maintenance programme. On multiple occasions, it was discovered that connectivity was not satisfactory, which is a likely cause of intermittent failures and NFF events.

Table 3 Number of faults requiring exchange of motor

Year   2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Motor     2    3    4    6    2    0    1    0    1    1    0    0    0    0    0
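The level-of-repair comparison above can be expressed as a simple calculation using the motor costs reported in the text; the printout is consistent with the reported factor of 25–30.

```python
# Costs from the study (2012, euros)
cost_new = 1550       # replacement with a new motor
cost_factory = 1200   # factory refurbishment
cost_field = 50       # ~30 min refurbishment under field conditions

for label, cost in [("new", cost_new), ("factory refurbished", cost_factory)]:
    print(f"Field repair is ~{cost / cost_field:.0f}x cheaper than {label}")
# -> roughly 24-31x, in line with the reported 25-30 times
```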


Fig. 7 Protective coating of gate arm

Another interesting field observation was that the paint recommended for protective coating of the gate arms did not work as intended: it did not provide an adequate protective cover. However, when the gate arms were painted with a second layer of the same paint (15 minutes after the first coating), the result was remarkable, almost as if shrink tubing had been placed onto the arm, see Fig. 7. Future maintenance concepts for gate arms should therefore state that two coats of paint are to be applied and that the tip of the arm should always be painted. The tip of the arm is almost always faced upwards; hence, without protective coating at the tip, water will seep into the wood of the arm.

A time study was also conducted to assess the production rate for maintenance of level crossings. During the planning of the experiment, an assumption was made that two persons would require one day to perform maintenance of one level crossing in accordance with the new instructions. During the experiment, it became obvious that two level crossings per day could be managed if the execution was well planned and all required material was prepared and brought to the field. For future maintenance concepts, it is useful if the same personnel get to do the physical work, since they become better at it with each day.

4 Discussion

The work presented in this paper shows that it is possible to fulfil requirements related to enterprise risk management and internal control (i.e., compliance, effectiveness, productivity and documentation) when working with continuous dependability improvement. The applied combination of methodologies focusing on process, function and system (primarily FMECA, FTA and DoE) fulfils requirements related to compliance (primarily railway safety and RAM) and documentation. In addition, effectiveness is improved by increased availability performance at the same time as productivity is improved by cutting costs. In spite of this, there has been no general implementation of the project results, e.g., through changes in Trafikverket's maintenance programmes. This unsatisfactory situation is not unique to this project or to Trafikverket, but is actually becoming more frequently recurring. Hence, there are dedicated research areas studying the pacing problem, i.e., the phenomenon where regulators struggle to keep pace with fast technological development and avoid ossification.

As indicated above, roles responsible for and working with dependability improvements should participate in every part of the maintenance process. This is to ensure that changed tasks in maintenance programmes are supported and executed in the right way in all process phases. It is important that correct information is collected throughout the experiment. Preferably, the same systems should be used for planning and data collection during the experimental study as during normal operation. One reason is that different tasks in the railway infrastructure should be coordinated, e.g., regarding possession times. Another reason is that data should be used to measure any change in dependability and cost. It was possible to get dependability data of good enough quality by combining sources such as the inspection system (Bessy), the fault reporting and corrective action system (0Felia), and the asset register (BIS). Hence, it was also possible to estimate the obtained dependability improvements. However, cost data of sufficient quality was more challenging to obtain, and it was therefore not possible to estimate the actual cost savings in a good way. The major obstacle to receiving good-quality cost data is that Trafikverket only has aggregated or contract-related cost data, whereas the maintenance entrepreneur has more specific and real cost data related to individual maintenance tasks in the railway infrastructure.
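As a minimal sketch of how such sources might be combined, the snippet below joins hypothetical extracts of fault reports and the asset register on an asset identifier and counts faults before and after the June 2012 intervention. All table layouts and column names are assumptions for illustration, not the real schemas of Bessy, 0Felia or BIS.

```python
import pandas as pd

# Hypothetical extracts; the real systems are Bessy (inspections),
# 0Felia (fault reports) and BIS (asset register).
assets = pd.DataFrame({"asset_id": ["LC01", "LC02"],
                       "track_section": [524, 524]})
faults = pd.DataFrame({"asset_id": ["LC01", "LC01", "LC02"],
                       "fault_date": pd.to_datetime(
                           ["2011-03-01", "2013-05-10", "2011-08-20"])})

# Join fault reports to the asset register on the asset identifier
df = faults.merge(assets, on="asset_id")

# Classify each fault relative to the June 2012 intervention
df["period"] = df["fault_date"].apply(
    lambda d: "before" if d < pd.Timestamp("2012-06-01") else "after")

# Fault counts per period as a simple dependability indicator
print(df.groupby("period").size())
```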


However, even when data is available, it may not be relevant for dependability improvements related to the maintenance programmes. Hence, the use of FMECA can support the identification of relevant information to collect in the field to monitor specific functions and their failure modes. In addition, the FMECA can be used to evaluate possible condition monitoring applications to support condition-based maintenance (CBM) of different failure modes.

When working with changes that require field tests in the railway infrastructure, it is crucial to always start with the design of the experiment. This involves identifying what to control, what to measure, how to measure, and how to perform the evaluation. It is not sufficient to expect that these things sort themselves out at the end, or that someone else will take responsibility at the end. They have to be determined already at the experimental design stage, and a pre-test is a good way to test assumptions made in the design. Without proper preparation of the experimental design, it is not uncommon that the collected data turns out to be insufficient for evaluation purposes. The planning of the experiment should consider the overarching purpose; in this case, to receive information to make a correct decision about improvements of the existing maintenance programme. Hence, it is a deductive approach that starts with the decision, then identifies the necessary information as support, and finally what data to collect and analyse. The FMEA and FTA are valuable for obtaining a useful system description for experimental design purposes. Hence, the FMECA and FTA can be used to document the expert knowledge and judgement or statistical correlations that are the foundation of the maintenance programme. The experiments can then contribute to establishing causal relationships by verifying and validating expert judgements and statistical correlation analyses.

After the test was executed, Trafikverket performed its own inspection to control that all included level crossings had been maintained by the entrepreneurs according to the new instructions. This control revealed that some maintenance tasks had been omitted by the entrepreneur. However, after the control inspection by Trafikverket and the resulting communication activities, the entrepreneur executed the remaining work. In summary, trust is good, but control is better (and necessary) to enable a follow-up of the effects of proposed changes. Hence, those responsible for the test have to stay involved in all aspects of it, especially regarding maintenance execution in the field. It is not possible to assume that the operative maintenance personnel in the field automatically understand what tasks should be executed and how, or that they are motivated. Hands-on education in the field is the best way to achieve understanding and acceptance, i.e., to show physically how the tasks are to be carried out. Those designing the test will also learn and gain insight into the application area and its limitations by participating in day-to-day maintenance activities in the field.

The later follow-up of the long-term effects of the proposed changes reveals that it is challenging to follow a planned experiment that requires an extended test time to estimate the achieved effect. Hence, it requires perseverance to conduct a planned experiment that extends over several years, especially when the persons related to decisions and tasks included in the experiment change at both the infrastructure manager and the maintenance entrepreneur. Normally, most persons are excited to test something new, but perseverance is required to conduct a study over a number of years.

An interesting finding of the study is related to the maintenance of level crossing motors. Repairing the motor in the field instead of replacing it gives a potential cost saving of 25–30 times. When considering this on a national level for all relevant level crossings, the savings become significant. Hence, the FMECA and FTA supported a simple level-of-repair analysis that identified a dependability improvement with great cost savings and negligible negative impact on the availability of functions required by rail and road traffic. However, for lines with very high traffic density, replacement of the motor may still be preferred over field repair due to availability requirements. In these cases, a combination can be applied by using a field-restored motor to replace the faulty motor and thereafter repairing the replaced motor in the field. By this approach, only field-level maintenance is involved and other levels of repair are excluded. This should reduce NFF events and associated costs, as well as other life support costs (LSC), e.g., related to logistics.

Acknowledgements We acknowledge the financial and intellectual support received from Trafikverket, especially the R&D project ASSET (TRV 2022/29194). Functionality provided by Reality Lab Digital Railway (TRV 2017/67785) is also highly appreciated.


A Conceptual Model for AI-Enabled Digitalization of Construction Site Management Decision Making

Gaurav Sharma, Ramin Karim, Olle Samuelson, and Kajsa Simu

Abstract Artificial Intelligence (AI) and digitalization are changing the landscape of performing projects in the construction industry. In construction projects, the responsibility for achieving the required scope within a specified timeframe and estimated cost lies with the project managers. To meet these stringent requirements, site managers must make on-site project decisions, relying heavily on their past experience and intuition. Moreover, the complexity of the decision-making process is amplified by the need to manually review various information sources such as Building Information Modelling (BIM), Bills of Quantities (BOQ), construction drawings, etc. The heterogeneity, complexity, availability, accessibility, and volume of the content that needs to be processed by the decision-maker pose risks to the decision-making process that may adversely impact resource consumption, planned time, and cost. Emerging technologies related to AI and digitalization are expected to improve the effectiveness and efficiency of the decision-making process in construction projects. However, the development and implementation of such technologies are highly dependent on the identification and definition of contexts in which AI and digital technologies add value. Hence, the purpose of this research paper is to study and explore the decision-making process of site managers. The paper also identifies gaps and the potential of utilizing AI and digital technologies to assist in decision-making. Further, it proposes a set of conceptual models that combine hardware and AI algorithms to support the site decision-making process. The findings provide insights into the complexities site managers face and offer innovative approaches to mitigate risks and improve decision-making efficiency.

G. Sharma (B) · R. Karim · O. Samuelson · K. Simu
Luleå University of Technology, Luleå, Sweden
e-mail: [email protected]
R. Karim e-mail: [email protected]
O. Samuelson e-mail: [email protected]
K. Simu e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_11


Keywords Construction management · Construction site decision-making · AI-based decision support tools · AI Factory · Asset management

1 Introduction

Digitization has rapidly transformed the construction industry by enhancing efficiency, streamlining processes, and improving project outcomes. Artificial Intelligence (AI) is emerging as a transformative tool in the industry. It has the potential to enable a new era of digitalization of site management processes to optimise operations, reduce rework and failures, ensure safety and security at the construction site, and harmonise workflows. The potential of AI can be enhanced when combined with hardware platforms such as LiDAR, the Metaverse, AR, VR, etc. [1]. Appropriate data analytics techniques are also required to fit the data and the generated information for AI. These technologies can facilitate numerous iterations by tracking and visualising construction progress and running algorithms to detect compliance with the design. Additionally, AI can offer diverse applications in scheduling and procurement, predicting cost overruns by analysing factors like project size, contract type, and managerial competence. It can also aid in risk management by prioritising issues and collaborating with high-risk subcontractors. AI optimises work cycles, improving efficiency and reducing delays. Above all, AI can equip site managers with advanced tools for quick and informed decision-making [2]. These managers play a crucial role in overseeing operations, managing resources, and ensuring timely project completion. They are the signing authority for project-related issues and are under pressure to make quick decisions that can alter the course of the project [3]. However, there are research gaps in leveraging AI solutions empowered by digital technologies effectively and efficiently in the construction industry. Despite some progress, there are unexplored research areas in leveraging AI solutions as decision-support tools in the construction industry; some of these contributions and initiatives are highlighted by Pan and Zhang [4]. Exploring this untapped potential and addressing the above-mentioned challenges can shape the industry's future. Based on the above premises, the aim of this paper is twofold: firstly, to explore the issues and challenges that site managers face in their day-to-day decision-making, and secondly, to suggest conceptual models for developing a decision support toolkit using digitalization and AI-based tools.


2 State of the Art

Globally, individuals and businesses spend over $10 trillion per year on construction-related activities, a figure projected to keep growing by 4.2% until 2023. Part of this enormous spending is on, and enabled by, rapidly moving technological advancements that touch all areas of the ecosystem. The potential applications of machine learning and AI in construction are vast. Requests for information, open issues, and change orders are standard in the industry. Machine Learning (ML) and Deep Learning (DL) technologies are like smart assistants that can search this vast amount of data and provide conclusions to decision-makers [5]. The development of the construction industry is severely limited by the complex challenges it faces, such as cost and time overruns, health and safety, productivity, and unavailability of labour. The construction industry is also one of the least digitalized industries in the world, which has made it difficult to tackle the problems it currently faces [2]. An advanced digital technology, Artificial Intelligence (AI), is currently revolutionising industries such as manufacturing, retail, and telecommunications. Subfields of AI such as machine learning, knowledge-based systems, computer vision, robotics and optimisation have successfully been applied in other industries to achieve increased profitability, efficiency, safety, and security [2]. While acknowledging the benefits of AI applications, numerous challenges exist in the use of AI in the construction industry [2]. For example, life safety is an area where companies prefer the conventional path and hesitate to adopt technologies such as artificial intelligence, as doing so is often equated with leaving an individual's life in the hands of AI [5]. Site decision-making plays a vital role in the management of construction sites, as managers are tasked with overseeing multiple activities to ensure project success. This overview investigates the situation faced by construction site managers, examines the characteristics and statistics related to decision-making, and explores the prospects of digital technologies and AI in the decision-making process. This section draws insights from various research papers, thus providing a basis for the discussions in subsequent sections.

2.1 Construction Site Managers

Amid dynamic construction sites, the site/construction manager is responsible for handling countless challenges, staying determined to finish the project within a given time frame and cost while adhering to quality requirements. Site managers generally experience being stuck between production objectives and day-to-day administrative and decision routines [6]. They experience high levels of anxiety irrespective of their grade, and are stressed because of role insecurity emanating from fear of failure, committing mistakes, work overload, physical working conditions, etc. [3]. Furthermore, construction managers are under constant time pressure, which turns out to be the highest source of stress [3]. Decisions made by site managers play a key role in adhering to the time schedule, as wrong decisions can lead to rework, injury, or erroneous estimates, leading to time overruns. Thus, there is a need for a decision support tool to ease the life of construction site managers.

2.2 Decision-Making on Construction Sites

As discussed above, construction site managers make important decisions to manage various aspects of construction projects. Their goal is to ensure that work packages run smoothly and the project is completed successfully. Construction site managers primarily focus on critical decisions related to technical and engineering aspects, accounting for approximately 65% of their responsibilities. In addition, finance-related decisions also hold significant importance, constituting around 29% of their job [7]. Construction site managers regularly make decisions regarding construction methods and techniques, material selection and specification, and safety measures and risk mitigation. They also decide on financial aspects such as project scheduling, vendor selection, etc. At construction sites, decision-making is a dynamic process that draws upon the collective experience and knowledge of site personnel. These decisions are informed by a multitude of factors, with a significant emphasis on previous experience, accounting for approximately 44% of the decision-making process. Furthermore, domain knowledge plays a pivotal role, contributing approximately 26%; valuable insights from colleagues shape around 12% of decisions; and past records and historical data analysis contribute approximately 9%. Moreover, 70% of these decisions had to be taken on the same day [7]. The time pressure on site managers often fosters hasty decision-making. In addition, site managers must deal with substantial information overload, which they need to work through to make their decisions [6]. The above discussion highlights a gap where emerging digital technologies can assist construction managers with decision support.

2.3 AI and Digitalization as a Decision Support Tool

The foremost digital technology, Artificial Intelligence (AI), has contributed significantly to the improvement of business operations, service processes and industry productivity in recent years [8]. AI is poised to make a big impact on the way things are done in several industries as an innovative approach to improving productivity and solving challenges. The construction industry faces a productivity problem and other challenges, which AI has the potential to solve [2].


As noted above, ML and DL technologies can act as smart assistants that search vast amounts of project data, such as requests for information, open issues, and change orders, and provide conclusions to decision-makers [9]. AI has the capability to harness data and leverage the abilities of other technologies to improve construction processes [2]. Thus, the advent of AI and digitization has opened new avenues for construction management. AI has the potential to improve site decision-making processes through automation, data analysis, and predictive modelling [2]. Additionally, digital technologies such as Building Information Modelling (BIM), Construction 4.0, Augmented Reality (AR), Virtual Reality (VR), LiDAR, etc. have enormous potential to enhance site decision-making capabilities [1]. However, there are still major challenges, such as lack of clear benefits, feasibility, data management, technical issues, fragmentation, and lack of skilled manpower, in realising the true potential of digitalization and AI in the construction site management process [10]. One approach to solving many of these challenges is to integrate the identified technologies in a single platform and create an ecosystem of technologies that can demonstrate clear benefits [11]. Therefore, there is a need for the construction technology ecosystem to move from point solutions towards integrated technology platforms [12]. Digitalization can provide benefits such as increased internal efficiency in construction management processes and improved insight into the linkage between everyday tasks and overall goals. Thus, there is a tangible need to demonstrate how the expected benefits can be achieved by deploying new emerging digital technologies [13].

3 Research Methodology

This section presents the research methodology used in this paper to identify issues and challenges in construction site decision-making and to suggest conceptual models through which AI can act as a decision support tool. This research is based on qualitative research methods; we have conducted case study research and deployed techniques such as semi-structured interviews and expert consultations. The work began with a state-of-the-art study. This step revealed a gap in the usage of AI and digitalization in the decision-making process from the perspective of site managers. To gather specific insights, we designed a semi-structured interview using clear and concise questions and prepared a protocol for on-site discussions to ensure consistency in data collection. We targeted site managers and supervisors from two case study sites in Luleå, Sweden, who possessed valuable first-hand experience. Semi-structured interviews were conducted with them, following the protocol. The results of this stage are described in Sect. 4.1.


We analysed the identified issues and challenges, seeking input from AI and machine learning experts to develop conceptual models for addressing problems in construction decision-making. To ensure practicality, we consulted construction experts at LTU, validating the relevance and viability of the proposed solutions. This collaborative approach improved the potential effectiveness of the models, aligning them with real-world construction needs. The results of this stage are described in Sect. 4.2. The execution of the research is summarized below in seven steps (also described in Fig. 1).

Fig. 1 Methodology for the proposed work


Step 1: State-of-the-art study: this step identifies our research purpose and establishes the need for AI-based tools in the construction decision-making process. In this step, we identify a gap in the decision-making process from the site manager's perspective.

Step 2: Framing the semi-structured interview: to identify specific issues for the next phase of our project, we developed interview questions through an online study and expert consultation (at LTU). A protocol was also prepared to guide on-site discussions.

Step 3: Identifying participants: we identified participants who were site managers and supervisors from our two case study sites in Luleå, Sweden. These sites were executed by Tier 1 and Tier 2 sized construction firms.

Step 4: Conducting semi-structured interviews: we conducted open discussions with site managers and supervisors. Participants were guided through the questions in an open discussion format to encourage maximum information flow.

Step 5: Shortlisting feasible issues and challenges: we analysed the collected responses by reviewing and identifying common themes and patterns. A preliminary screening was done in consultation with the AI experts in our team and construction experts at LTU to shortlist issues to take forward. We chose challenges that are in line with our AI Factory platform at the Division of Operation and Maintenance Engineering at LTU.

Step 6: Developing conceptual models: after discussing the issues with AI and ML experts, conceptual models were developed for the feasible problems. While developing the conceptual models, we performed minor case studies in other domains within the AI Factory at the Division of Operation and Maintenance Engineering (LTU) to understand the deployment of technologies and AI.

Step 7: Validation: after developing potential solution concepts, we discussed them again with site managers and construction experts at LTU to determine their viability and relevance.

4 Results and Analysis

This section is divided into two parts. First, we present the results from the semi-structured interviews, describing the five challenges identified in construction site decision-making. In the second part, the analysis, we present conceptual models developed after analysing the challenges in consultation with construction and AI experts at LTU.

First, when discussing challenges in decision-making, site managers highlighted the verification of executed components against the drawings as a major challenge. Another significant concern raised was the safety of personnel involved in various activities. Furthermore, the accessibility and availability of information from past project experiences and company databases were mentioned as additional issues. Additionally, adhering to schedules in terms of time and cost was reported as a challenge by site managers. These challenges are discussed in detail in Sect. 4.1.

Second, in the proposed conceptual models, we describe various toolkits tailored to specific use cases. These toolkits integrate different technologies to enhance decision-making for site managers. The technologies include LiDAR scans for documenting as-built structures and algorithms that compare the scans with designs and safety codes. We also propose algorithms for identifying and visualizing deficiencies using virtual or augmented reality environments. The AI-based solutions encompass algorithms for processing large datasets and providing natural-language responses, predicting attentiveness to address safety challenges, and forecasting cost and time overruns based on historical data. Additionally, one use case incorporates health and location tracking devices as hardware support. Detailed descriptions of these conceptual models are provided in Sect. 4.2.

4.1 Identified Issues and Challenges in Decision-Making (Results)

In this section, we compile a list of identified issues and challenges that have been deemed feasible to address through the implementation of an AI-based construction decision-making support platform (the AI Factory for construction platform).

4.1.1 Reinforcement Steel

• The problem is that once the steel is tied in a concrete member, it is difficult to verify its adherence to the drawings/BIM model.
• This is especially the case in foundations, where the density of steel is high.
• The most important criteria to verify are the diameter of the reinforcement bars, their number, positioning, and lapping length. The application is suitable for both pre-cast and in-situ concrete.

4.1.2 Temporary Support

• The problem is that once the temporary supports are tied, it is difficult to verify their adherence to codes.
• This is an issue for decision-making, as the site manager is supposed to verify these installations and approve them as 'safe to use'.
• She/he must physically measure the centre-to-centre support distances and other requirements as per codes, and physically sign that they are safe to use.
• It is difficult to measure and verify on site.

4.1.3 Lesson Learned

• The problem is that we usually lose all the experience from previous projects as the teams are dismantled and reorganized.
• A lessons-learned session is always conducted, and the data is captured in a predefined format.
• This data is then stored on project servers, which are often inaccessible to new project teams.
• Even when the files are accessible to everyone, it is time-consuming to navigate through large amounts of data to reach the point of interest.

4.1.4 Accidents and Health Tracking

• The problem is that people often wander into areas of construction sites that are hazardous.
• It is difficult to track the presence of people in real time on large construction sites.
• It is also virtually impossible to predict exhaustion and fall incidents.
• A person whose health metrics are poor on a given day should not be deployed in harsh outdoor climate activities or in hazardous areas where a fall can be fatal.

4.1.5 Optimising Scheduling

• The problem is to make the construction schedule practical enough to mimic real-life situations.
• There is always a quest to develop an optimised schedule.
• Predicting time and cost overruns in the schedule.
• Predicting risks and hedging or adjusting the plan accordingly.

4.2 Proposed Conceptual Models (Analysis)

4.2.1 Reinforcement Steel

• Visualizing the reinforcement model in VR glasses helps workers better understand the reinforcement design before tying the bars; this reduces errors compared to working from a 2D drawing.
• When the bars are in place, a Light Detection and Ranging (LiDAR) scan is made. It captures all physical parameters, such as bar diameters, positioning, and lapping distances, in the form of a point cloud.


Fig. 2 Reinforcement steel conceptual model

• Data cleaning is performed through software and programming tools. A script is developed to keep only relevant data, filtering out irrelevant points that crowd the system.
• Using a trained AI algorithm (on our AI Factory platform), the scanned virtual mesh is compared to the BIM model/drawing (a simplified sketch of this comparison step is given after this list).
• The system gives the site manager decision support for a go-ahead to pour concrete, or points out problematic areas for further manual verification.
• If corrections are needed, a domain-specific algorithm recommends them, visualized with the help of AR glasses to indicate the corrections in the mesh (Fig. 2).
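A minimal sketch of the comparison step is given below, assuming the scanned point cloud and points sampled from the BIM model are already registered in the same coordinate frame. The data, tolerance, and library choices (NumPy/SciPy nearest-neighbour search) are illustrative stand-ins, not the actual AI Factory implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

# Stand-ins: points sampled from the BIM model and a noisy LiDAR scan,
# both assumed registered in the same coordinate frame.
design_pts = np.random.rand(5000, 3)
scan_pts = design_pts + np.random.normal(0, 0.004, design_pts.shape)

# Cleaning step: keep only points inside the region of interest (formwork volume)
roi_min, roi_max = np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0, 1.0])
mask = np.all((scan_pts >= roi_min) & (scan_pts <= roi_max), axis=1)
scan_pts = scan_pts[mask]

# Deviation of each scanned point from the nearest design point
dist, _ = cKDTree(design_pts).query(scan_pts)
tolerance = 0.01  # assumed 10 mm placement tolerance
problem_pts = scan_pts[dist > tolerance]

if len(problem_pts) == 0:
    print("Within tolerance: decision support suggests go-ahead for pouring")
else:
    print(f"{len(problem_pts)} points deviate >10 mm; flag areas for manual check")
```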

4.2.2 Temporary Support

• When the supports are in place, a Light Detection and Ranging (LiDAR) scan captures all physical parameters, such as tube diameters, tube positioning, and centre-to-centre distances of the supports, in the form of a point cloud.
• Data cleaning is performed through software and programming tools. A script is developed to keep only relevant data, filtering out irrelevant points that crowd the system.
• Using a trained AI algorithm (on our AI Factory platform), the scanned virtual mesh is compared against the codal provisions fed into it.
• The results are presented as a dashboard accessible through a computer or handheld device. It gives the site manager a go-ahead, or points out problematic areas for further manual verification.


Fig. 3 Temporary supports conceptual model

• If corrections are needed, algorithms recommend them, visualized with the help of VR/AR glasses as suits the data (Fig. 3). A simplified sketch of the spacing check is given after this list.
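A minimal sketch of the centre-to-centre check is given below; the support positions and the codal maximum spacing are invented for illustration and would in practice be extracted from the cleaned point cloud and the applicable code.

```python
import numpy as np

# Support centre positions along one axis (metres), as might be extracted
# from the cleaned point cloud (illustrative values)
centres = np.array([0.0, 1.45, 2.95, 4.60, 6.05])
max_cc = 1.5  # assumed codal maximum centre-to-centre distance (metres)

spacing = np.diff(np.sort(centres))
violations = np.where(spacing > max_cc)[0]
for i in violations:
    print(f"Span {i}: {spacing[i]:.2f} m exceeds {max_cc} m; manual check needed")
```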

4.2.3 Chat Con—Lesson Learned

• A chatbot tool for finding learnings and experiences from all previous projects in the company could be a solution to this problem.
• It would also eliminate the need to browse through piles of documents to access the learnings of interest.
• This language model, inspired by the popularity of ChatGPT and trained on private data, can use the lessons-learned experiences from all previous projects. It could answer questions about challenges without consulting all the people involved in the previous projects (Fig. 4). A simplified retrieval sketch is given after this list.

Fig. 4 'Chat CON' conceptual model
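A production Chat CON would likely build on a large language model with retrieval over the company's lessons-learned repository. The sketch below is a deliberately simple keyword-similarity stand-in (TF-IDF with cosine similarity) that illustrates the retrieval idea; the lesson texts are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical lessons-learned entries pulled from past project servers
lessons = [
    "Winter concreting: preheat aggregates and cover pours below -10 C.",
    "Rebar supplier delivered late in Q4; order reinforcement 6 weeks ahead.",
    "Tower crane permits took 3 weeks longer than planned; apply early.",
]

vec = TfidfVectorizer()
matrix = vec.fit_transform(lessons)

def ask(question: str) -> str:
    """Return the most relevant stored lesson for a free-text question."""
    scores = cosine_similarity(vec.transform([question]), matrix)[0]
    return lessons[scores.argmax()]

print(ask("How early should reinforcement steel be ordered in winter?"))
```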


Fig. 5 Wearable device for location & health tracking conceptual model

4.2.4 Wearable Device for Location and Health Tracking

• A fitness band or ring type of tracker would be given to every worker on site.
• Temporary virtual boundaries can be created on-site, and the location of workers can be tracked in real time by AI. It can warn the project manager about the presence of people in the area before he/she approves the start of a hazardous activity within the virtual boundary zone (a simplified geofencing sketch is given after this list).
• The AI-based tool can also track workers' health metrics and predict whether there is a fall hazard associated with an individual that day. AI would give the site manager decision support for manpower allocation, thus substantially reducing the probability of accidents on site (Fig. 5).
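A minimal sketch of the virtual-boundary (geofencing) check is given below. The zone coordinates and tracker positions are invented, and a real system would additionally stream health metrics into a trained risk model.

```python
from shapely.geometry import Point, Polygon

# Temporary virtual boundary around a hazardous lifting zone
# (site coordinates in metres; illustrative values)
hazard_zone = Polygon([(0, 0), (30, 0), (30, 20), (0, 20)])

# Last known tracker positions per worker (illustrative values)
positions = {"W-101": (12.5, 8.0), "W-102": (45.0, 5.0)}

inside = [w for w, (x, y) in positions.items()
          if hazard_zone.contains(Point(x, y))]
if inside:
    print(f"Warning: {inside} inside hazard zone; hold approval of the activity")
else:
    print("Zone clear; activity may be approved")
```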

4.2.5 AI for Construction Scheduling

• Projects can utilise artificial neural networks to anticipate cost overruns by considering various factors such as project size, contract type, and project managers' competence. By analysing historical data, predictive models can generate realistic timelines for future projects based on planned start and end dates.


Fig. 6 AI in construction scheduling conceptual model

• AI and machine learning solutions can be leveraged to monitor and prioritise risks at construction sites effectively. This enables project teams to allocate their limited time and resources to the most significant risk factors. AI systems automatically assign priority to issues, while subcontractors are evaluated using a risk score, allowing construction managers to work closely with high-risk teams and mitigate potential problems. Consequently, this approach reduces overall risks and helps adherence to project schedules.
• AI models, utilising historical data and environmental parameters, can estimate workload values, takt times, and other relevant factors to synchronise the work cycles of both machines and human labour. By optimising processes and managing complex interdependencies among different trades, AI can contribute to streamlining operations and enhancing efficiency.
• Given below is a conceptual model, but the detailed use case and its working mechanism need to be explored in future steps of this research (Fig. 6). A simplified prediction sketch is given after this list.
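A minimal sketch of the overrun-prediction idea is given below, trained on synthetic data since no historical dataset is published here. The features mirror those named above (project size, contract type, manager competence), and the gradient-boosting regressor is one plausible stand-in for the artificial neural network mentioned.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 200
# Synthetic historical features: project size (MSEK), contract type (0/1),
# project manager experience (years)
X = np.column_stack([rng.uniform(10, 500, n),
                     rng.integers(0, 2, n),
                     rng.uniform(1, 25, n)])
# Synthetic target: overrun (%) grows with size, shrinks with experience
y = 0.02 * X[:, 0] - 0.3 * X[:, 2] + rng.normal(0, 2, n)

model = GradientBoostingRegressor().fit(X, y)
# Predict the overrun for a hypothetical new project
print(f"Predicted overrun: {model.predict([[120, 1, 5]])[0]:.1f} %")
```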

5 Conclusion

In this research paper, we have explored the decision-making process of site managers in the construction industry and identified the challenges they face. We have also examined the potential of AI and digital technologies to assist in decision-making and proposed a set of conceptual models to support the site decision-making process. The findings of this research provide valuable insights into the complexities that site managers face and offer innovative approaches to mitigate risks by improving decision-making efficiency and efficacy. By leveraging the models, site managers can benefit from automation, data analysis, and predictive modelling to make quick and informed decisions. These tools eventually have the potential to alleviate the stressful situations under which site managers operate. The proposed conceptual models integrate hardware platforms such as LiDAR, the Metaverse, AR, and VR with AI algorithms to enhance decision-making capabilities. These models enable tasks such as verifying executed components, ensuring the safety of personnel, accessing past project experiences, and adhering to schedules in terms of time and cost. Further research and development are required to refine and validate the proposed conceptual models and integrate them into real-world construction practices. To further enhance the research, we intend to conduct a closed-ended survey with a larger sample size, involving domain experts, to establish a taxonomy of challenges. This taxonomy will aid in shortlisting the most relevant conceptual models for further investigation. Subsequently, demonstrators will be developed and deployed on construction sites to validate the selected use cases. Overall, by addressing the identified issues and challenges through the implementation of AI-based decision support tools, the construction industry can potentially achieve improved efficiency, productivity, and project outcomes. This could be realised by harnessing the potential of AI and digitalization to optimise operations, reduce rework and failures, ensure safety and security, and harmonise workflows.

Acknowledgements Firstly, we would like to express our gratitude to NCC AB, HÖ Allbygg AB, and Luleå kommun (the Luleå municipality) for granting us access to their sites and personnel, which greatly facilitated the smooth execution of this research. Furthermore, we extend our deep appreciation to the funding agencies SBUF, Smart Built Environment, and Norrbottens Byggmästareföreningen for their trust in the project concept and their financial support.

References

1. Hossain A, Nadeem A (2019) Towards digitizing the construction industry: state of the art of construction 4.0. In: 10th International structural engineering and construction conference. ISEC, pp 20–25
2. Abioye SO et al (2021) Artificial intelligence in the construction industry: a review of present status, opportunities and future challenges. J Build Eng 44:103299
3. Davidson MJ, Sutherland VJ (1992) Stress and construction site managers: issues for Europe 1992. Empl Relat 14(2):25–38
4. Pan Y, Zhang L (2021) Roles of artificial intelligence in construction engineering and management: a critical review and future trends. Autom Constr 122:103517
5. Young D, Panthi K, Noor O (2021) Challenges involved in adopting BIM on the construction jobsite. EPiC Ser Built Environ 2:302–310
6. Styhre A, Josephson PE (2006) Revisiting site manager work: stuck in the middle? Constr Manag Econ 24(5):521–528
7. Poon SW, Price ADF (1999) Decisions made on construction sites. In: 15th Annual ARCOM conference. Liverpool, UK


8. Bughin J, Hazan E, Ramaswamy S, Chui M, Allas T, Dahlstrom P, Henke N, Trench M (2017) Artificial intelligence: the next digital frontier? McKinsey Global Institute
9. Rao S (2022) 10 examples of artificial intelligence in construction. https://constructible.trimble.com/construction-industry/the-benefits-of-ai-in-construction
10. Hwang BG, Ngo J, Teo JZK (2022) Challenges and strategies for the adoption of smart technologies in the construction industry: the case of Singapore. J Manag Eng 38(1):05021014
11. Woodhead R, Stephenson P, Morrey D (2018) Digital construction: from point solutions to IoT ecosystem. Autom Constr 93:35–46
12. Bartlett K, Blanco JL, Fitzgerald B, Johnson J, Mullin AL, Ribeirinho MJ (2020) Rise of the platform era: the next chapter in construction technology. McKinsey & Company
13. Sezer AA, Thunberg M, Wernicke B (2021) Digitalization index: developing a model for assessing the degree of digitalization of construction projects. J Constr Eng Manag 147(10):04021119

Risk-Based Dependability Assessment of Digitalised Condition-Based Maintenance in Railway—Part II–Event Tree Analysis (ETA)

Peter Söderholm and Per Anders Akersten

Abstract One vital contribution of Reliability-Centred Maintenance (RCM) is the definition of potential failure, which led to the concept of Condition-Based Maintenance (CBM) being accepted as one of the best ways of preventing functional failure. To enable CBM, the condition of an item must be monitored by Condition Monitoring (CM) of some critical functions. The CM results in collected data that represent the system's condition in some way. Diagnostics and prognostics are then concerned with the interpretation of the collected condition data and the conclusions drawn about the item's current and future condition. On the basis of the diagnostic and prognostic information, decisions about appropriate CBM can be made. The purpose of the risk-based dependability assessment described in this paper is to support a decision on whether or not a railway infrastructure item should be covered by a new digitalised inspection solution for CM to enable improved CBM. Hence, the dependability assessment indicates 'which' functions or items should be covered by a digitalised solution for inspection (CM) and 'why' they should be covered. 'How' the coverage should be solved, i.e., by which technical solutions, is not included in this paper. The proposed dependability assessment is based on stakeholder requirements, through a combination of a Failure Modes, Effects & Criticality Analysis (FMECA) and an Event Tree Analysis (ETA) with three major decision points where Maintenance Significant Items (MSIs) for further analysis are identified. This paper focuses on the ETA part and the third MSI selection of the proposed approach.

Keywords Reliability-centred maintenance (RCM) · Event tree analysis (ETA) · Condition monitoring (CM) · Condition-based maintenance (CBM) · Railway · Infrastructure

P. Söderholm (B) Trafikverket and Quality and Logistics at Luleå University of Technology, Luleå, Sweden e-mail: [email protected] P. A. Akersten Linnæus University, Växjö, Sweden e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_12


1 Introduction

One example of requirements related to CM and CBM within railway can be related to an innovation procurement at Trafikverket (the Swedish transport administration). The innovation procurement focuses on new digital solutions for improved asset management of the railway infrastructure and increased punctuality. One area of application is CBM of track that meets Trafikverket's need for solutions that consider a combination of, e.g., regulations and technology. The vision is that the innovations contribute to active management of a sustainable and digitalised asset management practice based on dynamic maintenance programmes and CBM. The main purpose of the innovation procurement is to improve productivity and effectiveness through reduced traffic disturbances caused by the rail infrastructure's condition and its maintenance. This may be achieved by a reduction of the corrective maintenance that affects traffic and an improvement of the preventive maintenance. Regarding CBM, the criteria for applicability are that [1, 2]:

• the condition should be detectable.
• the degradation should be measurable.
• the interval between potential failure (failure event) and functional failure (fault state), i.e., the P–F interval, should be long enough to enable both inspection of the condition and the associated maintenance task to prevent functional failure.
• the P–F interval should be consistent.

The effectiveness criteria for condition-based maintenance are that it should [1, 2]:

• reduce the probability of safety-critical faults (during planned rail traffic) to an acceptable level.
• improve the cost-effectiveness of faults that are not safety-critical, i.e., the cost of preventive maintenance should be lower than the consequential costs related to traffic management, operation and maintenance.

2 Method and Material

The work presented in this paper is performed within the research project "ASSET–Active, Systematic Management for More Effective Asset Management" [3]. The overall aim of the project is to contribute to a sustainable, resilient and competitive transport infrastructure through an increased ability to actively manage a dynamic regulatory framework. The project is carried out in accordance with Trafikverket's main process "Research and develop innovation". Empirical data from Trafikverket is collected through document studies, interviews, observations and databases. The analysis of the qualitative empirical material is mainly based on international dependability standards (e.g., IEC's 60300 standards) in order to relate Trafikverket's needs and potential solutions to a coherent framework based on internationally agreed-upon best practices. Theories related to continuous improvement are also applied to describe the need for and implementation of changes. Central parts of these theories are found in management system standards (e.g., ISO 9000 [4] for quality and ISO 55000 [5] for asset management), which provide an opportunity to support the practical application of the dependability standards. Quantitative data is mainly analysed with pivot diagrams in Excel, but also with the help of Trafikverket's business analytics tool (LUPP, i.e., SAP Business Objects), which is applied within the business unit Maintenance. The analysis is based on the logic of Failure Modes, Effects & Criticality Analysis (FMECA), Reliability-Centred Maintenance (RCM) and Event Tree Analysis (ETA). The applied dependability assessment is based on stakeholder requirements, through a combination of an FMECA and an ETA with three major decision points where Maintenance Significant Items (MSIs) for further analysis are identified. A basic assumption in the work is that inspection remarks with a related action time of three or more months indicate that the maintenance is applicable, since in most cases they do not lead to traffic disturbances. Inspection remarks with an action time shorter than three months, however, indicate a lack of applicability or effectiveness, as they usually lead to traffic disturbances.

3 Results

The results of the study are presented according to the proposed dependability assessment. However, only the ETA part and the third MSI selection are described in this paper. The FMECA and the other two MSI selections are presented in another paper [6].

3.1 Event Tree Analysis (ETA)

ETA is an inductive procedure to model the possible outcomes that could ensue from a given initiating event and the status of the countermeasures provided (IEV 192-11-09) [7]. In this assessment, the ETA is applied to compare two design alternatives for CM that enables CBM. The first design alternative is the existing inspection regime, consisting of manual and automated inspection of track. The second design alternative adds a newly introduced digital inspection that replaces some of the current manual or automated inspections. Three basic design influences on the new digital inspection are criticality, design, and fault rate. Hence, the digital inspection should be related to the general design, the operational requirements, and the reliability of each system. The extent and complexity of the digital inspection must be balanced against the fault rates and costs involved. For linear assets, it is beneficial if the added digital inspection as far as practical utilises operational functions (e.g., trains in regular traffic rather than built-in tests in the infrastructure). For point assets, built-in test equipment may be considered. Independently of this, the new digital inspection should be failsafe, which means that a fault in a digital inspection function should not cause a fault in the normal function. This can be achieved by retaining the ordinary manual and automated inspections as redundancies to the digital inspection in order to ensure safety. This redundancy can be applied during the demonstration of safety and dependability, after which it can be omitted if the digital inspection is proven to be sufficient. The balance between fault detection capability and the risk of false alarms should be set based on the reliability of different inherent items and the consequences of undetected faults and false alarms. The reliability of different inherent items and the consequences of faults may be displayed in the FMECA. However, the consequences of undetected faults and false alarms may be more clearly displayed in the ETA, since different scenarios can be compared to each other. It is also important that the tolerances for each test level, depot level included, are coordinated and optimised to get a high degree of performance verification, a low false alarm rate, and a minimum of track items circulating between the different test levels, all of which are connected to the No Fault Found (NFF) phenomenon.

Two major rail system phases are operation and maintenance. The operational phase can be analysed in light of different transport profiles, e.g., (different types of) goods or passenger traffic (e.g., high-speed rail). In this paper, the maintenance phase of a track system is analysed. The maintenance phase for the track is assumed to be the possession time between train passages. During this phase, the track is brought to a safe state by an appropriate type of protection, maintenance (e.g., inspection and tasks related to inspection remarks) is performed, and the track possession is ended. The time to travel to the specific location along the track is only relevant in the case of acute corrective maintenance, since it will add to the traffic disturbances. In all other cases, the arrival time of the maintenance crew can be managed in relation to the planned possession and should not affect the train traffic. In this paper, maintenance times for fault consequences of different criticality are analysed by comparing the alternatives with and without a new digital inspection that replaces part of the current manual and automated inspections. It is also described how the track's total maintenance times (and costs) at any inherent item level, with and without the introduced new digital inspection, can be compared to each other to decide upon the appropriate design alternative.
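A minimal sketch of the event-tree comparison logic is given below. The initiating frequency and branch probabilities are illustrative assumptions, not values from the assessment, but the structure shows how the two design alternatives split the same initiating event rate into planned preventive maintenance versus traffic-disturbing corrective maintenance.

```python
# Illustrative event-tree comparison of the two inspection design alternatives.
initiating_rate = 0.10  # assumed hidden-fault events per item and year (from FMECA)

def outcome_rates(p_detect_before_traffic):
    """Split the initiating rate into planned PM vs traffic-disturbing CM."""
    planned_pm = initiating_rate * p_detect_before_traffic
    disturbing_cm = initiating_rate * (1 - p_detect_before_traffic)
    return planned_pm, disturbing_cm

for name, p in [("manual/automated inspection only", 0.80),
                ("with added digital inspection", 0.95)]:
    pm, cm = outcome_rates(p)
    print(f"{name}: planned PM {pm:.3f}/yr, traffic-disturbing CM {cm:.3f}/yr")
```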

3.1.1 Input Requirements Deployed from the General Requirements

Input requirements to the ETA can be derived from the Technical Specifications for Interoperability (TSI) deployed to the inherent track items and from national documents (e.g., maintenance programmes). The requirements on the new digital inspection should be accurate and consistent, compatible with the subsystem, and possible to verify and validate. Examples of safety requirements deployed to subsystems are the maximum allowed probability for certain events and the identification of catastrophic and critical faults (e.g., based on Common Safety Targets, CSTs [8], and Common Safety Indicators, CSIs [9]). Further examples of requirements on the digital inspection related to safety, dependability and costs may be measured by availability (e.g., down time) and maintenance requirements, Mean Time Between Failures (MTBF), and Mean Time Between Repairs (MTBR). There are also available criteria and levels for acceptance, e.g., in the CSM-RA (e.g., acceptance models and levels of failure frequencies), the Common Safety Targets (CST), EN 50126 [10, 11] (e.g., risk matrix), and EN 61508-1 [12] (e.g., Safety Integrity Level, SIL). Further requirements connected to fault localisation are availability, maintenance intervals, and allowed maintenance time per operating hour. The majority of these requirements might be explored with the aid of the proposed ETA. However, there are also additional maintainability requirements, such as the interaction with technicians and external test equipment, which are mainly not covered by the proposed ETA. An exception might be the consequences when the manual or digital test and the automated test have different test efficiency and thus might give different test results. Recommended testability requirements can be found in the standard IEC 60706-5 (Maintainability of equipment—Part 5: Testability and diagnostic testing) [13]. Examples of quantitative values for test efficiency are test time, fault detection capability, false alarm rate, mean time to localise a fault, and fault localisation capability. Fault detection coverage (fault detection capability) is the percentage of detected faults with respect to the total number of known faults. Fault isolation coverage (fault localisation capability) is the percentage of faults that are correctly isolated to the recoverable interface, given that a fault has occurred and has been detected. False alarm rate is the ratio of false alarms (alarms that have no identifiable fault associated with them) to the total number of alarms. Another important requirement to be aware of when designing the digital inspection is the probability of the digital inspection failing to detect a fault, i.e., the probability that an alarm is not triggered although a fault or failure is actually present in the system.
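Since these coverage measures are simple ratios, they can be computed directly from inspection records. The following sketch shows one possible formulation, using hypothetical counts; the function names are ours and not taken from any standard.

```python
# Hedged sketch: the test-efficiency measures defined above, computed from
# hypothetical inspection counts. This is not code from IEC 60706-5 itself.

def fault_detection_coverage(detected_faults: int, known_faults: int) -> float:
    """Share of known faults that the inspection detects."""
    return detected_faults / known_faults

def fault_isolation_coverage(correctly_isolated: int, detected_faults: int) -> float:
    """Share of detected faults correctly isolated to the recoverable interface."""
    return correctly_isolated / detected_faults

def false_alarm_rate(false_alarms: int, total_alarms: int) -> float:
    """Share of alarms with no identifiable fault behind them."""
    return false_alarms / total_alarms

# Illustrative numbers only:
print(fault_detection_coverage(92, 100))   # 0.92
print(fault_isolation_coverage(85, 92))    # ~0.92
print(false_alarm_rate(7, 99))             # ~0.07
```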

3.1.2 Input Requirements Derived from the FMECA

Examples of input requirements from the FMECA are failure modes and their frequencies, severity, and criticality, but also the current management and control. Interfacing requirements of the digital inspection can be derived from neighbouring systems, central digital systems (including data recording), and display and control systems. These requirements may be collected and displayed in the FMECA sheet. An important requirement related to digital inspection is the train passage probability, which may be defined as the product of two probabilities: the probability that there are no hidden faults present in the infrastructure that can lead to a cancelled train passage before the transport is started, and the probability that no such fault will occur during the time that the transport is performed. The track system's functions contain faults that, without appropriate inspections, may remain hidden between train passages. These hidden faults must be prevented to allow traffic. The first probability is based on the probability of the faults included in the FMECA, which is also the initiating probability in the ETA. The presented risk-based dependability assessment is primarily concerned with the probability of hidden faults, which may lead to corrective maintenance that disturbs traffic (depending on the criticality). However, the other probability and its consequences can also be analysed in different transport phases if other event trees than those described in this paper are derived. The FMECA is often primarily performed from an operational perspective, which generates safety, dependability, and maintenance requirements on the digital inspection. Catastrophic and critical faults should be further analysed. The frequency of inspection and preventive maintenance (i.e., based on operational time) is decided based on the total probability of each hazard.
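Written out, the train passage probability defined above is simply the product of the two constituent probabilities; the symbols below are our shorthand, not notation from the paper.

```latex
% Our shorthand for the train passage probability described above:
%   P_nh = probability that no hidden fault is present before departure
%   P_no = probability that no such fault occurs during the transport
P_{\text{passage}} = P_{\text{nh}} \cdot P_{\text{no}}
```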

3.1.3 Two Different Event Trees

In the following sections two different event trees are described. The basic logic behind the event trees as applied in this paper was originally presented by Datta and Squires [14]. However, some modifications have been made in relation to the assumptions about the probabilities of different events, which influence the consequences of different scenarios. The performed criticality assessment is also different, since it explores separate consequences, which also are affected by the adjusted probabilities. The modified assumptions also affect the performed minimisation of the expected loss and thereby the criteria for selecting one inspection solution over the other. A presentation of the proposed event trees without any railway application can also be found in Söderholm [15]. The first event tree (Fig. 1 and Table 1) describes different scenarios and consequences of faults in an inherent track item not covered by any added digital inspection, i.e., with the current normal inspection (manual and automated). The second event tree (Fig. 2 and Table 2) describes scenarios and consequences of the same fault on an inherent track item covered by a new digital inspection. The following notation is used:

• X = event, where X is SF (Subsystem Fault), NID (Normal Inspection Detects), or RD (Retest at maintenance Detects); for the event tree with an introduced new digital inspection, the events NI (Normal Inspection) and ID (Introduced Digital inspection detects) are added
• X = the event occurs
• X̄ = the event does not occur
• Pr(X) = probability that event X occurs
• RT = retest time at the maintenance task where the inspection remark is managed
• UT = unscheduled maintenance time (i.e., corrective maintenance that disturbs traffic); UT(Flt) without any introduced new digital inspection (i.e., normal inspection) and UT(Fli) with an introduced digital inspection
• LT = lost time equivalent of cancelled train passages
• NFF = No Fault Found


Fig. 1 Event tree 1: without an introduced additional digital inspection

Table 1 Subsystem fault not covered by an introduced additional digital inspection

No | Sub-system fault (SF) | Normal inspection detects (NID) | Retest at maintenance detects (RD) | Consequences | Sequence logic
1 | SF | NID | RD | RT + UT(Flt) | SF NID RD
2 | SF | NID | R̄D̄ | RT + LT + NFF | SF NID R̄D̄
3 | SF | N̄ĪD̄ | – | LT | SF N̄ĪD̄
4 | S̄F̄ | NID | RD | RT + UT(Flt) | S̄F̄ NID RD
5 | S̄F̄ | NID | R̄D̄ | RT + NFF | S̄F̄ NID R̄D̄
6 | S̄F̄ | N̄ĪD̄ | – | 0 | S̄F̄ N̄ĪD̄

It should be noted that when comparing event tree 1 (Fig. 1 and Table 1) with event tree 2 (Fig. 2 and Table 2), the symbols NI, NID, and RD may represent different numerical values in the different event trees.
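To make the comparison concrete, the probability of each scenario is the product of its branch probabilities, and its expected contribution is that probability times its consequence time. The sketch below illustrates this for event tree 1; every probability and time in it is a hypothetical placeholder, not a value from the paper.

```python
# Hedged sketch: scenario probabilities and expected time loss for event
# tree 1 (normal inspection only). Branch probabilities and consequence
# times are hypothetical placeholders.

p_sf = 0.02          # Pr(SF): subsystem fault occurs
p_nid_sf = 0.90      # Pr(NID | SF): normal inspection detects a real fault
p_nid_no_sf = 0.05   # Pr(NID | no SF): false alarm probability
p_rd_fault = 0.95    # Pr(RD | NID, SF): retest confirms a real fault
p_rd_fa = 0.10       # Pr(RD | NID, no SF): retest wrongly confirms

RT, UT_FLT, LT = 0.5, 2.0, 8.0   # consequence times in hours (illustrative)

scenarios = {
    1: (p_sf * p_nid_sf * p_rd_fault,              RT + UT_FLT),
    2: (p_sf * p_nid_sf * (1 - p_rd_fault),        RT + LT),   # plus NFF
    3: (p_sf * (1 - p_nid_sf),                     LT),
    4: ((1 - p_sf) * p_nid_no_sf * p_rd_fa,        RT + UT_FLT),
    5: ((1 - p_sf) * p_nid_no_sf * (1 - p_rd_fa),  RT),        # plus NFF
    6: ((1 - p_sf) * (1 - p_nid_no_sf),            0.0),
}

# The six scenarios are mutually exclusive and exhaustive:
assert abs(sum(p for p, _ in scenarios.values()) - 1.0) < 1e-9

expected_loss = sum(p * t for p, t in scenarios.values())
print(f"expected time loss per inspection interval: {expected_loss:.3f} h")
```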

3.2 Third Maintenance Significant Item Selection

Based on the suggested ETA it is possible to make a preliminary decision on whether the analysed fault should be covered by a new digital inspection or not. However, the final decision must be based on how the digital solution should be realised. Therefore, the outcome of the ETA acts as a basis for the selection of functions or inherent items that should be included in a further analysis, focusing on how the MSIs (Maintenance Significant Items) should be covered by a new digital inspection. The inclusion of a new digital inspection contributes increased automatic fault detection coverage and an increased fault localisation capability, but also its inherent false alarms and its inherent missed detections.

Fig. 2 Event tree 2: with an introduced digital inspection

3.2.1 Time-Based Criticality Assessment

The rationale for choosing time as a measure of consequence severity is that there are stakeholder requirements related to availability and different maintenance times. The inspection and maintenance times are applied in order to compare the two design alternatives with or without a new digital inspection. The scheduled maintenance times depend on the inherent item's reliability and the severity of the fault consequences. When adding a digital inspection, some of the current manual and automated inspections can be replaced. The scheduled maintenance time for these digital inspections is denoted STi(Fd). The rest of the scheduled inspections continue to be performed as parts of the ordinary inspections. The scheduled maintenance time for these previous inspections is denoted STn(1−Fd). One criterion in the selection between an added digital inspection or not may be the scheduled maintenance time.

Table 2 Subsystem faults covered by an introduced digital inspection

No | Sub-system fault (SF) | Normal inspection (NI) | Normal inspection detects (NID) | Introduced digital inspection detects (ID) | Retest at maintenance detects (RD) | Consequences | Sequence logic
1 | SF | NI | NID | – | RD | RT + UT(Flt) | SF NI NID RD
2 | SF | NI | NID | – | R̄D̄ | RT + LT + NFF | SF NI NID R̄D̄
3 | SF | NI | N̄ĪD̄ | – | – | LT | SF NI N̄ĪD̄
4 | SF | N̄Ī | – | ID | RD | RT + UT(Fli) | SF N̄Ī ID RD
5 | SF | N̄Ī | – | ID | R̄D̄ | RT + LT + NFF | SF N̄Ī ID R̄D̄
6 | SF | N̄Ī | – | ĪD̄ | – | LT | SF N̄Ī ĪD̄
7 | S̄F̄ | NI | NID | – | RD | RT + UT(Flt) | S̄F̄ NI NID RD
8 | S̄F̄ | NI | NID | – | R̄D̄ | RT + NFF | S̄F̄ NI NID R̄D̄
9 | S̄F̄ | NI | N̄ĪD̄ | – | – | 0 | S̄F̄ NI N̄ĪD̄
10 | S̄F̄ | N̄Ī | – | ID | RD | RT + UT(Fli) | S̄F̄ N̄Ī ID RD
11 | S̄F̄ | N̄Ī | – | ID | R̄D̄ | RT + NFF | S̄F̄ N̄Ī ID R̄D̄
12 | S̄F̄ | N̄Ī | – | ĪD̄ | – | 0 | S̄F̄ N̄Ī ĪD̄

According to this criterion, the reduction in scheduled inspection time in track that affects traffic, due to the introduction of digital inspection, should be larger than the increase in track possession time for maintenance of the system for digital inspection, i.e., STi(Fd) < STn(1−Fd).

In addition to the scheduled maintenance times for inspection, unscheduled maintenance times (i.e., corrective maintenance that disturbs traffic) occur due to faults in inherent items. These unscheduled maintenance times are further explored in the ETA. Event tree 1, illustrating the case with the present manual and automated inspections (Fig. 1), and event tree 2, illustrating the case with an added digital inspection (Fig. 2), start with the same fault. The consequences in the event trees are expressed in terms of inspection and unscheduled maintenance times. However, because of the addition of digital inspection there may be additional consequences measured in unscheduled maintenance times, UT(Fli). The unscheduled maintenance time for inherent items not covered by an added digital inspection is denoted UT(Flt). Since the consequences are mainly of the same kind (but with different magnitude) independently of how the faults are covered, i.e., by an added digital inspection or the present inspection, an initial ordinal ranking of the different scenarios' criticality can be based on their consequences, where the differences in criticality between the cases with and without any additional digital inspection are primarily seen in the frequencies. It is also possible to choose between the digital inspection or not by a minimisation of the overall expected loss (required inspection and maintenance times), which is further discussed later in this section.

3.2.2 Highest Criticality

The highest severity is found in the scenarios where the consequence is lost traffic (LT) due to traffic restrictions and related corrective maintenance (which cannot be included in the existing train plan) in response to the inspection remark, with a retest time (RT) and a related No Fault Found (NFF) contribution. These consequences can be found in scenario 2 for the alternative without any added digital inspection (Table 3) and in scenarios 2 and 5 for the alternative with an added digital inspection (Table 4). In order to decrease the criticality of the scenarios with the highest severity, the digital inspection or the normal inspection must be coordinated with the retest at the resulting corrective maintenance task where inspection remarks are managed. The reason for this is that the different test levels should detect the same faults in order to avoid a prolonged retest time (RT), a No Fault Found (NFF) classification, and finally lost traffic (LT) due to an actual fault. In the worst case, the fault that is not detected at the retest as part of the resulting corrective maintenance is safety related and may lead to an accident. Therefore, the type of inspection that can be coordinated with the retest at the subsequent corrective maintenance in the most effective way should be chosen. Since the fault leads to lost traffic and possible safety consequences, it must be detected at both maintenance levels, so there is no alternative where the first-level inspection could be omitted in order to avoid retest times and NFF. It should be noted that the retest time only considers corrective maintenance with possession time in track, and not intermediate or depot levels.

Table 3 Normal inspection (without any added digital inspection) scenario with highest severity

No | Sub-system fault (SF) | Normal inspection detects (NID) | Retest at maintenance detects (RD) | Consequences | Sequence logic
2 | SF | NID | R̄D̄ | RT + LT + NFF | SF NID R̄D̄

Table 4 Digital inspection scenarios with highest severity

No | Sub-system fault (SF) | Normal inspection (NI) | Normal inspection detects (NID) | Digital inspection detects (ID) | Retest at maintenance detects (RD) | Consequences | Sequence logic
2 | SF | NI | NID | – | R̄D̄ | RT + LT + NFF | SF NI NID R̄D̄
5 | SF | N̄Ī | – | ID | R̄D̄ | RT + LT + NFF | SF N̄Ī ID R̄D̄

Table 5 Normal inspection scenario with second highest severity

No | Sub-system fault (SF) | Normal inspection detects (NID) | Retest at maintenance detects (RD) | Consequences | Sequence logic
3 | SF | N̄ĪD̄ | – | LT | SF N̄ĪD̄

Table 6 Digital inspection scenarios with second highest severity

No | Sub-system fault (SF) | Normal inspection (NI) | Normal inspection detects (NID) | Introduced digital inspection detects (ID) | Retest at maintenance detects (RD) | Consequences | Sequence logic
3 | SF | NI | N̄ĪD̄ | – | – | LT | SF NI N̄ĪD̄
6 | SF | N̄Ī | – | ĪD̄ | – | LT | SF N̄Ī ĪD̄

3.2.3 Second Highest Criticality

The second highest severity is found in the scenarios where the consequence is lost traffic (LT). These consequences can be found in scenario 3 for the normal inspection case (Table 5) and in scenarios 3 and 6 for the digital inspection case (Table 6). The lost traffic is caused by the fact that the inspection fails to proactively detect a critical system fault, which requires corrective maintenance that cannot be planned without any effect on traffic. In the worst case, the lost traffic is related to an accident. In order to decrease the criticality of this scenario, the digital inspection shall be selected if its fault detection probability is higher than that of the normal inspection it replaces, i.e., Pr(ID|SF) > Pr(NID|SF).

3.2.4 Third Highest Criticality

The third highest criticality is found in the scenarios where the consequence is a retest time (RT) combined with a No Fault Found (NFF) contribution due to preventive maintenance (which can be planned within the existing train plan) initiated by a false inspection remark. These consequences can be found in scenario 5 for the normal inspection case (Table 7) and in scenarios 8 and 11 for the digital inspection case (Table 8).


Table 7 Normal inspection scenario with third highest severity

No | Sub-system fault (SF) | Normal inspection detects (NID) | Retest at maintenance detects (RD) | Consequences | Sequence logic
5 | S̄F̄ | NID | R̄D̄ | RT + NFF | S̄F̄ NID R̄D̄

Table 8 Digital inspection scenarios with third highest severity

No | Sub-system fault (SF) | Normal inspection (NI) | Normal inspection detects (NID) | Introduced digital inspection detects (ID) | Retest at maintenance detects (RD) | Consequences | Sequence logic
8 | S̄F̄ | NI | NID | – | R̄D̄ | RT + NFF | S̄F̄ NI NID R̄D̄
11 | S̄F̄ | N̄Ī | – | ID | R̄D̄ | RT + NFF | S̄F̄ N̄Ī ID R̄D̄

In this scenario there is actually no fault present in the system. However, the first-level inspection gives a false alarm, and the retest at the following preventive maintenance correctly does not indicate any fault. In order to decrease the criticality of this scenario, the digital inspection should be selected if its false alarm probability for the specific fault is lower than that of the normal inspection it replaces, i.e., Pr(ID|S̄F̄) < Pr(NID|S̄F̄).

3.2.5 Fourth Highest Criticality

The fourth highest criticality is found in the scenarios where the consequence is a retest time (RT) at preventive maintenance in response to an inspection remark for the analysed system, together with an unscheduled maintenance time (UT) that can be planned and performed without any traffic disturbances. These consequences can be found in scenarios 1 and 4 for the normal inspection case (Table 9) and in scenarios 1, 4, 7, and 10 for the digital inspection case (Table 10).

The addition of a digital inspection adds both complexity and faults to the rail system. This may in turn increase both the corrective (un-schedulable within the existing train plan) and preventive (schedulable within the existing train plan) maintenance times.

Table 9 Normal inspection scenarios with fourth highest severity

No | Sub-system fault (SF) | Normal inspection detects (NID) | Retest at maintenance detects (RD) | Consequences | Sequence logic
1 | SF | NID | RD | RT + UT(Flt) | SF NID RD
4 | S̄F̄ | NID | RD | RT + UT(Flt) | S̄F̄ NID RD


Table 10 Digital inspection scenarios with fourth highest severity

No | Sub-system fault (SF) | Normal inspection (NI) | Normal inspection detects (NID) | Introduced digital inspection detects (ID) | Retest at maintenance detects (RD) | Consequences | Sequence logic
1 | SF | NI | NID | – | RD | RT + UT(Flt) | SF NI NID RD
4 | SF | N̄Ī | – | ID | RD | RT + UT(Fli) | SF N̄Ī ID RD
7 | S̄F̄ | NI | NID | – | RD | RT + UT(Flt) | S̄F̄ NI NID RD
10 | S̄F̄ | N̄Ī | – | ID | RD | RT + UT(Fli) | S̄F̄ N̄Ī ID RD

However, the fault localisation coverage may also be improved with a new digital inspection, which decreases the corrective maintenance times. Hence, in order to decrease the severity of these scenarios, a digital inspection should be selected if its fault localisation capability leads to a lower corrective maintenance time in track than if a digital inspection is not selected, i.e., UT(Fli) < UT(Flt). The test times for new digital inspections, UT(Fli), are assumed to be small compared to those of the present preventive manual and automated inspections, UT(Flt), that is, UT(Fli) ≪ UT(Flt).

Table 13 Selection criteria for each scenario consequence

Consequence | Criterion | ETA reference
Retest time (RT), lost traffic (LT) and no fault found (NFF) | Select the type of inspection that can be coordinated with the retest at the following corrective maintenance in the most effective way, so that both test levels detect the same faults | Tables 3 and 4
Lost traffic (LT) | Select the digital inspection if its fault detection probability is higher than for the normal inspection, i.e., Pr(ID|SF) > Pr(NID|SF) | Tables 5 and 6
Retest time (RT) and no fault found (NFF) | Select the digital inspection if its false alarm rate is lower than that of the normal inspection, i.e., Pr(ID|S̄F̄) < Pr(NID|S̄F̄) | Tables 7 and 8
Retest time (RT) and unscheduled maintenance time (UT) in track | Select the digital inspection if its fault localization capability leads to lower unscheduled maintenance times in track than if the normal inspection is selected, i.e., UT(Fli) < UT(Flt) | Tables 9 and 10
None (0) | Select the digital inspection if it gives the smallest amount of false alarms, i.e., Pr(ID|S̄F̄) < Pr(NID|S̄F̄) | Tables 11 and 12
Minimisation of expected loss | Select the digital inspection if its expected loss is smaller: unscheduled maintenance times in track, i.e., E(TDigital inspection) < E(TNormal inspection); scheduled and unscheduled maintenance times, i.e., {E(TDigital inspection) + STi(Fd)} < {E(TNormal inspection) + STn(1−Fd)} | Section 3.2.7
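As a rough illustration of the expected-loss criterion in the last row, one can evaluate both event trees under the same fault probability and compare the resulting time losses. The sketch below reuses the scenario-evaluation idea from Sect. 3.1.3; every number in it is a hypothetical placeholder, not a value from the paper.

```python
# Hedged sketch of the expected-loss criterion: compare total expected
# maintenance time with and without the digital inspection. All inputs
# are hypothetical placeholders, not values from the paper.

def expected_unscheduled_loss(scenarios):
    """Sum of Pr(scenario) * consequence time over an event tree."""
    return sum(p * t for p, t in scenarios)

# (probability, consequence time in hours) per scenario, illustrative only:
normal_tree = [(0.0171, 2.5), (0.0009, 8.5), (0.0020, 8.0),
               (0.0049, 2.5), (0.0441, 0.5), (0.9310, 0.0)]
digital_tree = [(0.0174, 2.0), (0.0006, 8.5), (0.0010, 8.0),
                (0.0020, 0.5), (0.0392, 0.5), (0.9398, 0.0)]

ST_i, ST_n = 0.10, 0.25   # scheduled inspection times (illustrative)

E_normal = expected_unscheduled_loss(normal_tree)
E_digital = expected_unscheduled_loss(digital_tree)

# Criterion from the table above: compare scheduled + unscheduled times.
if E_digital + ST_i < E_normal + ST_n:
    print("select the design alternative with the digital inspection")
else:
    print("keep the present manual and automated inspection")
```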

Acknowledgements This paper is a deliverable within the research and development project (R&D) “ASSET” (TRV 2022/29194) at Trafikverket.

References

1. Nowlan FS, Heap HF (1978) Reliability-centered maintenance. Report AD/A066-579. National Technical Information Service, US Department of Commerce, Springfield, Virginia
2. IEC 60300-3-11 (2009) Dependability management–Part 3-11: Application guide–Reliability centred maintenance. IEC
3. TRV 2022/29194–ASSET–Active, systematic management for more effective asset management. R&D project, Trafikverket, Luleå
4. ISO 9000 (2015) Quality management systems–Fundamentals and vocabulary. ISO
5. ISO 55000 (2014) Asset management–Overview, principles and terminology. ISO
6. Söderholm P (2023) Risk-based dependability assessment of digitalised condition-based maintenance in railway–Part I–FMECA. In: Proceedings of industrial AI conference 2023
7. IEV (2023) International electrotechnical vocabulary (IEV). http://www.electropedia.org/ Accessed 28 February 2023
8. 2013/753/EU: Commission implementing decision of 11 December 2013 amending decision 2012/226/EU on the second set of common safety targets for the rail system
9. ERA-GUI-02-2015–Implementation guidance for CSIs–Annex I of directive 2004/49/EC as amended by directive 2014/88/EU
10. EN 50126-1 (2017) Railway applications–The specification and demonstration of reliability, availability, maintainability and safety (RAMS)–Part 1: Generic RAMS process. IEC
11. EN 50126-2 (2017) Railway applications–The specification and demonstration of reliability, availability, maintainability and safety (RAMS)–Part 2: Systems approach to safety. IEC
12. EN 61508-1 (2011) Functional safety of electrical/electronic/programmable electronic safety-related systems–Part 1: General requirements. IEC
13. IEC 60706-5 (2009) Maintainability of equipment–Part 5: Testability and diagnostic testing. IEC
14. Datta K, Squires D (2004) A methodology to quantify some IVHM requirements during RLV conceptual design. In: Proceedings of RAMS, pp 485–491
15. Söderholm P (2005) Maintenance and continuous improvement of complex systems: linking stakeholder requirements to the use of built-in test systems. Luleå University of Technology, Luleå

Making Tracks—Combining Data Sources in Support of Railway Fault Finding

R. G. Loe

Abstract In this paper we describe recent Swedish trials to combine different data sources to support fault finding on railway vehicles and in railway operations in Sweden. The background to the trials, the trials' aims, the conduct of the trials and the data sources used are described in brief. The results of the trials are described using several case studies. Practical limitations and constraints are discussed, together with the ways in which these might be overcome. Finally, the paper describes the way in which it is hoped to apply such methods to support future railway fleet management and operations.

Keywords Railway · Fleet management · Asset management · Data analysis · Fault analysis

R. G. Loe (B)
AB Transitio, Drottninggatan 92, 111 36 Stockholm, Sweden
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_13

1 Introduction

AB Transitio own and manage rolling stock for regional passenger train services in Sweden. A key aim for the company is to provide comfortable and well-maintained rolling stock which delivers reliable and punctual train services. To assess how data from different sources might be combined and used to improve fault finding and support fleet management, a series of field trials has been conducted. This paper describes three case studies from these trials and discusses the results obtained.

The potential benefits of sharing data from different sources to improve reliability and maintenance have long been recognised [1], as have some of the difficulties [2]. Previous studies have considered how data from different sources might be combined to support infrastructure maintenance, and there is some literature regarding the application of such methods to support rolling stock fleet management [3].

Modern railway rolling stock produce large amounts of data, data which fleet managers could use to reduce disruptions in train services by improving rolling stock availability and reliability [4]. Combining data from rolling stock with data from signalling systems and driver log reports offers an opportunity to refine data analyses and so improve reliability centred maintenance of rolling stock, with the aim of reducing train service disruptions [5]. The starting point for such improvement must necessarily be improved fault finding, since faults are the root cause of train service disruptions. For this reason AB Transitio, Trafikverket and SJ AB agreed to conduct field trials together in order to assess what benefits could be gained by each organisation in their respective roles (AB Transitio as rolling stock owner, Trafikverket as railway network operator and SJ AB as train operator and rolling stock owner). The trials were led by Transitio and form part of a larger initiative known as Tillsammans för Tåg i Tid (TTT, Together for Trains on Time), which aims to improve train service punctuality and reliability.

2 Trials Aims

The overall aim of the field trials was to assess if data from rolling stock could be combined with data from signalling systems and driver logs to identify the causes of train service disruptions. Within this overall aim there were three specific trials aims:

1. To assess how many train service disruptions were directly and indirectly caused by Transitio owned rolling stock;
2. To assess which train services and which line sections, and hence which workings, suffered most train service disruptions; and
3. To systematically assess which faults caused most train service disruptions.

3 Terminology

For the purposes of this paper it is necessary to define and explain certain terms used in railway operations.

Operational data is used as a collective term for data from signalling and railway network traffic management systems. On-board data is the data which is collected and recorded on rolling stock in traffic, used primarily for fault finding.

The working timetable is the name given to the complete timetable for a railway network. Unlike the public timetables used by passengers, the working timetable shows all services on the network: passenger, freight, empty rolling stock, light engines, maintenance trains etc. The working timetable can be tabular but is often presented in the form of a graph.

A path is the term for the route and timings for a given train service on the network. In a graphical working timetable it takes the form of a line across the graph, and in a signalling centre it takes the form of a route set using signals and points along the lines and through any junctions the train passes.

A working is the term given to the collection of train services operated by a locomotive or multiple unit over a period of one or more days. Locomotives or multiple units allocated to a working are said to be booked on a working.

A turn is the name given to the services which a driver is scheduled to operate during the day. Turns often include parts of several different train workings, which means that drivers may drive several different locomotives and/or multiple units during one day. Drivers are said to be booked on a turn.

4 Trials Conduct

4.1 Preparations

Before the trials could begin it was necessary to select both the rolling stock fleet and the routes/traffic area to be used in the trials. It was decided to use Transitio's ER1 class electric multiple units, partly because they have the most modern on-board data recording systems and partly because these units are used in a single traffic area, which makes identifying vehicle workings and the associated paths in the working timetable significantly easier.

Once the rolling stock fleet had been selected it was possible to identify which data sources would be used in the trials. Data source selection was driven by the need to identify which units were used on which workings and hence in which train services, so that data from the on-board data recording systems could be combined with the correct operational data from Trafikverket's systems. In addition to the on-board data recording systems, the data sources used were Trafikverket's planning and signalling systems together with Transitio's own fleet management and maintenance system. In the latter case extra information was available, as the rolling stock concerned was still covered by the manufacturer's guarantees, which require extra information to be recorded in order to allocate responsibility for repairs.

To support analysis of the data an internet-based portal page was created to collect and present data. This was done for IT-security reasons, to allow users from Transitio, Trafikverket and SJ AB to combine and analyse data from different sources without any impact on operational IT-systems.

4.2 Data Collection and Correlation

To collect data from the various systems it was first necessary to select a given time period for the trials. The month selected was September 2022, mostly to avoid weather and seasonal issues, and data from Trafikverket's systems was collected first. On-board data was also identified and collected, but the sheer volume of data from the rolling stock made this somewhat unwieldy.

Data from Trafikverket's operational planning and signalling systems was then fed into the portal page. A first assessment of the data was made, to ensure that the data was of suitable quality for analysis. This first required that the data time tags be checked for correctness and then correlated to a common reference time. After the data from Trafikverket had been time correlated it had then to be correlated with the time tags for the on-board data, before positional data from the on-board GPS data records was correlated with positional data from the signalling systems.

Once this had been done the drivers' reports were collected and correlated with the operational and on-board data. This proved to be significantly more difficult, partly because the reports were written after the drivers had completed their turns and so lacked accurate timings, and partly because the reports did not always contain information on times or places for the events reported.
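The time correlation step can be illustrated with a small data-frame sketch. The column names, file names and tolerance below are hypothetical, chosen only to show the idea of aligning two time-tagged sources against a common reference clock; they are not the trial systems' actual formats.

```python
# Hedged sketch: aligning operational and on-board records on a common
# reference time. Column names, file names and the 30 s tolerance are
# hypothetical illustrations.

import pandas as pd

ops = pd.read_csv("operational_events.csv", parse_dates=["timestamp"])
obd = pd.read_csv("onboard_log.csv", parse_dates=["timestamp"])

# Convert both sources to a common reference time (UTC) before matching.
ops["timestamp"] = ops["timestamp"].dt.tz_localize("Europe/Stockholm").dt.tz_convert("UTC")
obd["timestamp"] = obd["timestamp"].dt.tz_localize("Europe/Stockholm").dt.tz_convert("UTC")

# Match each operational event with the nearest on-board record in time.
merged = pd.merge_asof(
    ops.sort_values("timestamp"),
    obd.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("30s"),
    suffixes=("_ops", "_obd"),
)
print(merged.head())
```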

4.3 Data Analysis

4.3.1 Traffic Analysis

The first aim of the trials was to assess how many train service disruptions were directly and indirectly caused by Transitio owned rolling stock, and the second aim was to assess which train services and which line sections, and hence which workings, suffered most train service disruptions. These assessments were done using a traffic analysis.

Once the data correlation had been completed an initial traffic analysis was conducted. The aim of this traffic analysis was to identify whether there were any particular routes and/or train services which suffered greater delays than others (the second aim of the trials). The idea behind this was that if certain routes or services suffered significant delays then these would also be the services that caused most disruption to other trains (the first aim of the trials). Although such a traffic analysis can be done using analysis tools in things like Excel, it proved easier to visualise the raw data graphically using a combination of bubble charts, area plots and stacked graphs. Visualising the data in this way allowed quicker identification of times, places, routes and train services where delays occurred, although it should be noted that this requires that the analyst has knowledge and experience of both the routes concerned and the working timetable.

The initial traffic analysis showed only one systematic delay which could be attributed to routes, places or particular train services. A more detailed traffic analysis was then conducted using close examination of the working timetable, presented in graphical form. This showed that the identified systematic delay occurred on a single-line route with passing loops and was due to the passenger train service in question crossing a southbound long-distance freight train from the far north of Sweden. Examination of operational data for the freight train showed that this train was often delayed, for a wide variety of reasons. Delays to the southbound freight train meant that either the planned passing loop had to be changed or that the passenger train had to wait longer for the freight train, both of which would delay the passenger train. Use of the graphical working timetable made identification of the likely root cause of the delays easier and quicker, in that it could be seen where trains cross and which trains were involved. Such an analysis could also be done using software-based tools, but a visual inspection of the graph by an experienced analyst is quicker.

Deeper examination of the remaining delays using the graphical working timetable did not provide evidence of systematic causes such as routes, places or particular train services, suggesting that the delays concerned were more likely to be due to rolling stock faults. This led to a second traffic analysis, one which attempted to identify how many direct and indirect delays were caused by Transitio owned rolling stock. Given that services in the traffic area selected were operated only with Transitio rolling stock it was easy to identify the number of direct delays, since these were all the delays to passenger services in the traffic area. In a traffic area where rolling stock belonging to different owners and operators is used it would be necessary to analyse specific workings, and these vary by day and week.

Measuring the number of indirect delays, that is delays to other services such as freight trains, did not prove possible. The reason for this was the complexity of the workings for other services, particularly long-distance services and commuter services. Delays to long-distance passenger and freight services can be cumulative, with several different causes, and without a full traffic analysis for each service it is not possible to attribute delay causes. For commuter services the intensity and tight turnarounds of the various workings mean that small delays accumulate very quickly, leading to changes in booked workings being made dynamically by the train operating company. Such changes help maintain the services but mean that it is very difficult to track causes of indirect delays without following each individual commuter train.

4.3.2 Case Study Identification

Given that no further systematic delays had been identified, the collected data was then inspected and three cases of delays selected for further analysis, with the aim of supporting the third trials aim, that of systematically assessing which faults caused most train service disruptions. The idea was to assess whether it was possible to identify systematic or underlying faults from the data available. Case study selection was based partly on the reported length of delay, and partly on the train service and route where the delay occurred, the aim being to ensure that a wide range of routes and services were covered. Once suitable cases had been selected it was necessary to identify the multiple unit trains involved. This required both the train service number and the date of the delay, since these were needed to search the workings for the date concerned to find the train service and hence the multiple unit train allocated to the service on that day.


When the multiple unit train concerned had been identified the on-board data could be searched using time as a filter to reduce the volume of data to be analysed. It quickly became apparent that the time window had to be fairly wide, as any underlying faults might have been present and recorded in the on-board data for a long time period. The time windows selected covered the period from the point at which the multiple unit was started and activated for service on the date concerned until a point 90 min after the train had restarted after the delay, which in practice meant at or shortly after the train had reached the destination for the delayed service.
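The time-window filtering described here is straightforward to express in code. A possible sketch follows, with hypothetical column names and timestamps standing in for the actual on-board log format.

```python
# Hedged sketch: restrict an on-board log to the analysis window used in
# the trials, from unit activation until 90 minutes after the delayed
# train restarted. Column names and timestamps are hypothetical.

import pandas as pd

log = pd.read_csv("onboard_log.csv", parse_dates=["timestamp"])

activation_time = pd.Timestamp("2022-09-28 05:12:00")
restart_time = pd.Timestamp("2022-09-28 17:40:00")

window = log[(log["timestamp"] >= activation_time) &
             (log["timestamp"] <= restart_time + pd.Timedelta(minutes=90))]
print(f"{len(window)} of {len(log)} records fall inside the analysis window")
```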

4.3.3 Case Study 1

The first case selected was a service from Nyköping Central to Norrköping Central, and then (after a reversal) to Stockholm, at the end of September 2022, where a 28 min delay was reported by the driver on the run south to Norrköping, and a further 20 min delay on the run north to Stockholm. The driver had reported the cause of the first delay as an inability to accelerate after crossing other trains and suggested that there was a fault with the unit as a whole. On the next part of the journey the driver reported that there seemed to be a limitation in the power available from the insulated-gate bipolar transistor (IGBT) traction converters providing the power to the traction motors. Having apparently lost power from three of the four traction converters, the driver stopped at Ålberga and reset the power system. This seemed to cure the problem and the train could continue to Stockholm.

Once the data had been time correlated, the on-board data log was searched to see what system faults had been recorded in the time up to the power system reset. This did not reveal any evidence of traction converter faults, nor any other faults in the power system as a whole. However, following the system reset a temporary fault in the traction converters was recorded, followed by a short limit on the available power. No further faults were recorded for the rest of the day in the multiple unit concerned.

The on-board data did not indicate any faults, yet the driver clearly had difficulties keeping time, and this was solved by resetting the power system. This suggests that the on-board data system either does not cover all system components or that some data is not recorded and so is lost. However, the subsequent drivers of the multiple unit concerned did not report any trouble on that day, and one driver even managed to recover a 6 min delay on a later journey. This suggests that the first driver made a mistake either when starting the unit or when driving the unit, so causing the delays, which in turn suggests that the operating company may need to revise their driver training program.

4.3.4 Case Study 2

The second case involved a multiple unit on a service from Linköping Central to Stockholm. The driver reported a 24 min delay after being forced to stop at Gistad, having received an alarm about a brake system fault. The fault concerned had not previously been seen by the driver and he had to ring for assistance in order to solve the problem, which added to the delay. The fault was resolved by doing a hard reset of the brake system, turning the power off then on again. The fault did not recur during the rest of the day, but the driver did request that it be followed up under guarantee.

Once the collected data had been time correlated, the on-board data log was searched to see what system faults had been recorded in the time up to the brake system reset. This revealed that the brake lock warning system had reported a fault before the train had even left Linköping Central, and that the driver had simply acknowledged the warning message. The warning messages had been repeated several times and each time the driver had acknowledged the message but taken no further action. Finally, a little before the train reached Gistad, the brake lock warning system had given an alarm message and the driver stopped the train. A deeper examination of the on-board data showed that the problem was probably caused by an overflow error in the software controlling the brake lock warning system, which the hard reset of the system fixed.

The case suggests that the drivers require more training in how warning messages should be handled. A simple reset of the system before the train departed Linköping Central would have avoided all delays on the journey. One result from this case study, which is relevant to the original aim of the field trials, is that there is a potential systematic fault in the form of a software error in the brake lock warning system. Correctly written and validated software should not give overflow errors. An examination of on-board data from the rest of the fleet would be needed to confirm the existence of a systematic fault, and investigations are ongoing.

4.3.5 Case Study 3

The final case selected involved a multiple unit working a northbound service from Arboga to Uppsala via Stockholm. The driver reported losing 24 min on the run to Eskilstuna as a result of two emergency brake applications, both unintended and uncommanded by the driver, one of which was from a speed of 200 km/h.

The seriousness of the incidents meant that the driver's report was studied closely before any data analysis began. The multiple unit had been driven by three drivers on the day concerned, and the first driver of the day had apparently had four similar faults on a northbound service. This had been mentioned verbally in a hand-over to the second driver, who in turn had mentioned this to the third and final driver. An immediate search was made for a report from the first driver, but none was found. The third driver's report indicated that the problem with the emergency brake system only occurred on northbound services, suggesting that the underlying problem was confined to one of the driving cars in the multiple unit. This was confirmed by an analysis of operational data based on the workings, which showed no delays or reported faults for those southbound services on which the multiple unit concerned was used during that day. The operational data showed that previous northbound services had recorded delays of 20 min, which were described as being due to Automatic Train Control (ATC) faults. This information was then used to define an initial time period for the search of the on-board data.

The on-board data confirmed that there appeared to have been emergency brake applications at the times reported as ATC faults in the operational data. However, the on-board data also revealed that the same emergency brake code was reported each time the driving unit was activated. The reason for this is that when the driver activates the unit a brake test is performed, and as part of this test an emergency brake application is made. A key lesson for future attempts to automate searches and analyses of on-board data is that allowances must be made for the codes reported during start and activation of rolling stock.

Once the unit activation sequences had been identified and removed from the search results, the remainder of the on-board data was searched for similar emergency brake codes. The search results showed that uncommanded and unintended emergency brake applications had been made 22 times during the time period concerned, nearly 4 times as many as had been reported by the drivers. Some of these applications were very short, only a few seconds, and might not have been noticed by the drivers. A deeper analysis showed that the fault was not transient or due to driver error, and the multiple unit concerned was taken out of service for immediate repair.

The first result from this case study, which is relevant to the original aim of the field trials, was that there was a systematic fault with the brake system of the multiple unit concerned. Given the potentially serious nature of the fault a fleet-wide analysis was undertaken, but this did not show the fault to be systematic across the whole fleet. This case also shows that on-board data can be used to detect systematic faults which otherwise might be regarded as transient faults. Such detection is useful as a support for fault finding and could, provided that the analysis could be done in near real time, be used to ensure that trains with faults in safety critical systems are taken out of service.

What this case also shows is the importance of reporting chains. It is clear from the operational data that Trafikverket's signalling staff knew that the multiple unit had suffered potential ATC system faults, but this does not appear to have been relayed to the train operating company, and as the first driver did not report the fault, both the train operating company and Transitio (the owners of the multiple unit) remained unaware of the problem until the third driver reported the fault.
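The filtering step described above, separating brake-test codes generated during unit activation from genuine emergency brake events, could look roughly like the following; the event codes, column names and activation window are hypothetical.

```python
# Hedged sketch: discard emergency brake codes that fall inside a unit
# activation window (the built-in brake test), then count the remainder.
# Event codes, column names and the 5 min window are hypothetical.

import pandas as pd

log = pd.read_csv("onboard_log.csv", parse_dates=["timestamp"])

EMERGENCY_BRAKE = "EB_APPLIED"     # hypothetical event code
ACTIVATION = "UNIT_ACTIVATED"      # hypothetical event code

activations = log.loc[log["event_code"] == ACTIVATION, "timestamp"]
brakes = log[log["event_code"] == EMERGENCY_BRAKE].copy()

def within_activation(ts, starts, window=pd.Timedelta(minutes=5)):
    """True if a brake event falls inside any activation window."""
    return any(start <= ts <= start + window for start in starts)

brakes["is_brake_test"] = brakes["timestamp"].apply(
    lambda ts: within_activation(ts, activations))

genuine = brakes[~brakes["is_brake_test"]]
print(f"{len(genuine)} uncommanded emergency brake applications found")
```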

5 Discussion

Data combination has long been seen as a way to improve fleet management, both in the railway industry and in the wider transport sector. Overall, the field trials showed that it is possible to combine data from different sources to conduct systematic analyses of faults and service delays.


Data combination requires that the practical problems of taking data electronically from different sources be solved. There is currently no easy way of doing so, and the complex structure of railway operations in Sweden and in many other countries may make this difficult, given the number of IT-systems involved.

Data volumes also need consideration. Modern rolling stock like the ER1 multiple units generate huge amounts of data from their on-board systems. The limited number of case studies meant that this was not an issue during the field trials, but detailed examination of the on-board data took time, a situation only eased by applying tight time limits on the data being studied. Case study 3 showed the potential benefits of being able to analyse data in near real time, in that a unit with a potentially serious defect could have been taken out of traffic as soon as the systematic fault was identified. Given the volumes of data involved this would require some way of identifying which data and hence which sub-systems have the highest priority for analysis. Even then there would probably need to be some limit on how often data was sampled, and this would in itself bring a risk that some faults were missed.

However, the trials also showed that there may be additional benefits to be gained from combining data. The ability to show that there was no underlying technical fault on a multiple unit meant that it was also possible to assess how drivers had driven their trains and handled reported faults, which could then be used to show that there might be deficiencies in the training of drivers. This was not expected and shows that on-board data may have wider utility than simply maintenance, particularly when combined with operational data. Such analysis results bring with them a number of personal integrity issues, since the ability to analyse how individual drivers have driven their trains and handled faults also makes it possible to (mis)use the results of such analyses to victimise individual drivers. Any large scale analyses of combined data would need to be anonymised to ensure that individual drivers could not be identified.

The ability to analyse underlying faults and their causes does require some form of reporting from drivers. For these trials such reports were available, in part as a result of the multiple units concerned being under manufacturer's guarantee. Driver reports give context to the operational and on-board data and allow a qualitative dimension to be taken into account in the analysis. Ideally one would wish to debrief all drivers at the end of their turns, but this is neither practical nor cost-effective. Encouraging drivers to report in some form of IT-system is one way of obtaining such contextual information, but this requires that drivers have the time to write such reports at the end of their turns.

One important lesson from the trials is the need for detailed knowledge of the way in which on-board systems start and of the codes which are written to the on-board data log during the start-up and activation sequences. Case study 3 showed quite clearly that a correct brake test gave the same code in the on-board data log as an uncommanded emergency brake application caused by a brake system fault, and any attempt to automate fault finding and fault analyses using AI-systems must allow for and separate out codes caused by correct sub-system operations. This requires a very detailed knowledge and understanding of system operations down to sub-system level. This brings with it another challenge in the form of commercially confidential information, e.g., vehicle design, vehicle operation and performance, and the way in which the vehicle systems are configured and activated. Such information must be protected, and this places extra information security requirements on any AI-system (or indeed any other IT-system) which is used to automate fault analyses.

One of the interesting aspects in case study 2 was the identification of the underlying fault in the brake lock warning system. The analysis had shown that the brake lock warning system had sent several error messages to the driver followed by an alarm message, but it was only after a very detailed examination of the on-board data log that the underlying cause was identified. That identification required an understanding of what might cause an out-of-range error in the output from a sub-system, and the ability to infer from this that there was an error in the software used in the sub-system. The ability to make such inferences requires both technical knowledge and engineering experience, and any attempt to build an AI-system capable of fault finding at such a level would require significant amounts of data to train the system.

Traffic analysis was in part done visually, using a graphical presentation of the correlated data from different sources. By presenting the data visually it was possible to see which delays might be significant and systematic. From this, the use of the graphical version of the working timetable allowed very quick identification of potential causes, since it is relatively quick and easy to identify ripple effects of delays when the information is presented in this way. This method builds on individual experience of railway operations, which allows the person to make deductions and inferences based on the data presented. Whilst it would be possible to train an AI-system to do this, there might not be much benefit in doing so, since such a system might not be any quicker than an experienced human.

6 Conclusions

The field trials have shown that it is possible to combine data from rolling stock with data from signalling systems and driver logs to identify the causes of train service disruptions. It is also possible to assess how many train service disruptions are caused by Transitio owned rolling stock, and it is likely that this could be applied to rolling stock belonging to other fleet owners. Assessing which train services and which line sections, and hence which workings, suffered most train service disruptions is also possible, but the trials did not attempt to automate this process. Systematic assessment of which faults caused most train service disruptions was also shown to be possible.

The field trials had relatively limited aims, but the analyses conducted showed that combining operational and on-board data can also identify issues in areas such as driver training and fault reporting. The potential benefits of such data combination extend to more than just improving fleet management.

Much of the analysis was done using manual methods. It would probably be possible to train an AI-system to do such analyses, but this requires a detailed understanding of how railway services are operated, a very detailed knowledge of how on-board data from rolling stock is structured, and some way of capturing human knowledge and experience. This suggests that a more rational and effective approach may be to use AI-systems to support systematic fault finding by doing large scale data analyses and pattern recognition, before a human takes the results and uses their knowledge and experience to identify underlying causes.

7 The Way Ahead

The trials showed that combining and analysing data from operational and on-board systems has the potential to identify the causes of train service disruptions and any underlying or systematic faults that cause these delays. Such improvements are intended to be applied in future IT-systems used to support fleet management. However, such analyses require the ability to process large amounts of data, and further work is necessary to develop the AI-systems needed to do this. Any such systems will need to deal with the issues of personal integrity and commercial confidentiality identified earlier. This has already placed additional information security requirements on coming IT-systems to support fleet management, requirements which must be met if the systems are to deliver their full potential.

Further work is also required to identify which data and sub-systems should have the highest priority for analysis if it is found necessary to have near real time data analysis. This is not a self-evident requirement and may vary depending on what role the organisation has in railway operations. Further discussions are necessary within the industry.

Acknowledgements Our thanks to Trafikverket and SJ AB for their support in the conduct of these trials.

References

1. Parkinson HJ, Bamford G (2017) A journey into railway digitisation. Stephenson Conference: Research for Railways, October 2017, pp 333–340
2. Tucker GJ, Hall A (2014) Breaking down the barriers to more cross-industry remote condition monitoring (RCM). IET Conf Publ 2014(CP631):1–6
3. Burchell AK, Green SR (2008) Improving fleet performance by automatic analysis of enhanced "black box" OTMR data. In: Proceedings of the 4th IET international conference on railway condition monitoring
4. Ward CP, Weston PF, Stewart EJC, Li H, Goodall RM, Roberts C, Mei TX, Charles G, Dixon R (2011) Condition monitoring opportunities using vehicle-based sensors. Proc Inst Mech Eng Pt F J Rail Rapid Transit 225(2):202–218
5. Karim R, Dersin P, Galar D, Kumar U, Jarl H (2021) AI factory–a framework for digital asset management. In: Proceedings of the 31st European safety and reliability conference, pp 1160–1167

Simulation Environment Evaluating AI Algorithms for Search Missions Using Drone Swarms

Nils Sundelius, Peter Funk, and Richard Sohlberg

Abstract Search missions for objects are relevant in both industrial and civilian contexts, such as searching for a missing child in a forest or locating equipment in a building or large factory. Sending out a drone swarm to quickly locate a misplaced item in a factory, a missing machine on a building site or a missing child in a forest are very similar tasks. Image-based Machine Learning algorithms are now so powerful that they can be trained to identify objects with high accuracy in real time. The next challenge is to perform the search as efficiently as possible, using as little time and energy as possible. If we have information about the area to search, we can use heuristic and probabilistic methods to perform an efficient search. In this paper, we present a case study where we developed a method and approach to evaluate different search algorithms, enabling the selection of the most suitable, i.e., most efficient, search algorithm for the task at hand. A couple of probabilistic and heuristic search methods were implemented for testing purposes: Bayesian Search together with a Hill Climbing search algorithm, and Bayesian Search together with an A-star search algorithm. A swarm-adapted lawn mower search strategy is also implemented. In our case study, we see that the performance of the search heavily depends on the area to search in and on domain knowledge, e.g., knowledge about how a child is expected to move through a forest area when lost. In our tests, we see that there are significant gains to be made by selecting a search algorithm suitable for the search context at hand.

Keywords Simulation environment · AI · Search missions · Drone Swarm · Search and rescue · Swarm · Drones · Optimization

N. Sundelius (B) · P. Funk · R. Sohlberg Mälardalen University, Universitetsplan 1, 721 23 Västerås, Sweden e-mail: [email protected] P. Funk e-mail: [email protected] R. Sohlberg e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_14


1 Introduction

Performing a search mission with a UAV swarm can be divided into two subtasks: one in which the UAV swarm executes a strategy to find and visit areas of interest, and one in which the drones investigate an area of interest with their onboard sensors to identify objects of interest, e.g., using anomaly detection [1]. The former can be defined as a high-level search strategy and the latter as a low-level search strategy. This paper focuses on high-level search strategies. If there is prior knowledge of where a target might be, a probability map can assist in the search for the target's location. A method for searching this kind of map is the Bayes Search Algorithm [2], which lowers the probability of a searched cell where nothing is found and raises the probability of every other cell with the help of a mathematical formula; additional information can be found in Sect. 2.3. If prior knowledge is insufficient, different search patterns can be utilized. In this paper, a method and approach is developed to test and evaluate different search algorithms to enable the selection of the most suitable one for the mission. Three search algorithms are implemented and investigated for testing purposes. The first is a lawn mower-like search, in which the drone swarm patrols the map just as if it were mowing a lawn. The second is a hill climbing approach and the third an A-star approach, both using the Bayes Search Algorithm.

1.1 Problem Formulation

When performing a search mission, time is often of the essence. In a search and rescue mission, for example, the time it takes for a person to be found can play a decisive role in the well-being of the person or persons concerned. The same is true in a safety and protection context, e.g., in a patrol mission to find unauthorized drones near critical infrastructure to aid in threat engagement missions [3]. Here, time is equally of the essence but for other reasons, e.g., preventing damage to critical infrastructure. These are some of the reasons why it is important to have a search algorithm that finds what it is looking for as quickly as possible. Once a target is found, the necessary measures can be taken. In this paper, a simulation environment and an approach are proposed to evaluate different search algorithms using drone swarms.

1.1.1 The Search Area

The search area is divided into a rectangular grid consisting of cells of equal size. Each cell is assigned a probability that the target is located there, based on prior knowledge of the specific target and its environment, such as known previous locations, improbable locations and movement patterns. Some examples of 10 × 10 grid maps can be seen in Fig. 1.


Fig. 1 Examples of 10 × 10 probability maps. Green indicates a low probability of the target being there, red indicates a high probability, and the rest of the colours are the probabilities in between

The colours of the cells indicate the level of probability, i.e., whether it is a high, medium or low probability cell: red indicates a high probability that the target may be there, yellow a medium probability and green a low probability. The remaining colours are gradients between red, yellow and green, representing the intermediate probabilities.

1.1.2 The Model of the Movement Capability of the Drone Agents

The drone agents are seen as particles that can navigate the search area, which is defined as a grid map. They can move on the grid map in eight directions, namely towards the eight connected cells that surround a drone agent, i.e., north, northeast, east, southeast, south, southwest, west and northwest, as depicted in Fig. 2. No other physics are simulated.

Fig. 2 The figure shows the movement capability of a drone agent; it can move in the directions of the eight connected cells shown in the figure, i.e., north (N), northeast (NE), east (E), southeast (SE), south (S), southwest (SW), west (W) and northwest (NW)



2 Background

The following sections describe the tools and algorithms needed to understand this paper. They begin with an introduction to unmanned aerial vehicles and how they can communicate with each other, and then explain Bayes Search Theory, A-star and local search hill climbing.

2.1 Unmanned Aerial Vehicles

There exist many configurations and types of Unmanned Aerial Vehicles (UAVs) or unmanned drones, e.g., zeppelins, fixed wings, helicopters, tricopters, quadcopters, hexacopters and octocopters. A UAV is commonly either remotely controlled by a human or pre-programmed with a desired behaviour, i.e., it flies autonomously [4]. To fly autonomously, a set of sensors is required to make the UAV aware of its surroundings and its pose (position and orientation) in order to avoid obstacles. Examples of such sensors are Inertial Measurement Units, Global Positioning System receivers, computer vision and lasers. These can be used together with sensor fusion technologies or separately [4].

2.2 Drone Swarm Communication

There exist many communication and interaction strategies between UAVs in a swarm. Common strategies include different variations and usages of Ground Control Systems (GCS) and flying ad hoc networks (FANET) [5]. An example of an architecture where a GCS is used is the infrastructure-based swarm architecture [6]. It uses a GCS to gather telemetry, send commands, receive messages etc. from each drone, and all communication goes through the GCS. The drones can be pre-programmed with a behaviour or fully controlled by the GCS. One of the advantages of this style of architecture is that it allows all the heavy computations to be done on the GCS, which makes it possible to use smaller drones or make the drones use less energy. A drawback is that the drone swarm is dependent on the GCS. A FANET, on the other hand, requires only one of the UAVs to have communication with a base station or a satellite. The rest of the communication occurs within a network between the drones. A benefit of this architecture is that the decision-making is distributed between the drones, which means that they do not need to be orchestrated by a GCS. A drawback is that the UAVs need to carry more computation and communication equipment, which can increase energy consumption.


2.3 Bayes Search Theory

This search method was developed during World War II, when the Allies were trying to find German submarines. It has its origins in Bayes' theorem from the 1750s, named after the statistician Thomas Bayes. John Craven helped pioneer the use of the Bayes Search Algorithm during the 1968 search for USS Scorpion, a Skipjack-class nuclear-powered submarine that had disappeared during a mission [2]. The Bayesian search contains four fundamental steps:

• First step: divide the search area into a grid map and assign a priori probabilities.
• Second step: search the cell with the highest a priori probability.
– If you find the target, the search is finished.
• Third step: if the target is not found, reduce the probability of that cell and increase the probability of the other cells correspondingly. The a priori probabilities are now a posteriori probabilities.
• Fourth step: repeat.

The a priori probabilities at the beginning of the search are based on assumptions made before any data is collected, i.e., on expert knowledge and on how well the sensors can detect the target if it is there. The a posteriori probabilities of the last search are the a priori probabilities of the next search [2]. The name of this search theory gives a clue to which theorem is used to update the cells in the grid map, namely Bayes' theorem, see Eq. 1.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{1}$$

That is, the probability of A given that B has occurred. Equation 1 can be rewritten as Eq. 2.

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{2}$$

In a search mission, A can denote the event that the target is in the searched area (the cell) and B the event that the target is not found there. Inserted into Eq. 2, this gives Eq. 3.

$$P(\text{in the cell} \mid \text{not found}) = \frac{P(\text{in the cell} \cap \text{not found})}{P(\text{not found})} \tag{3}$$


Now, let p denote the probability that the target is in a specific cell and let f be the probability of finding the target if it is there; this value can differ between search areas depending on the search difficulty. Then P(B ∩ A) and P(B) can be defined as in Eqs. 4 and 5, where P(B) is the total probability of not finding the target, i.e., either it is in the cell but missed, or it is not in the cell at all.

$$P(B \cap A) = p(1 - f) \tag{4}$$

$$P(B) = p(1 - f) + (1 - p) \tag{5}$$

With Eqs. 4 and 5 inserted into Eq. 2, the posterior probability p′ of the searched cell can be calculated if the target is not found, see Eq. 6.

$$p' = \frac{p(1 - f)}{p(1 - f) + (1 - p)} = p\,\frac{1 - f}{1 - pf} \tag{6}$$

The posterior probability r′ of each of the other cells is then updated with Eq. 7,

$$r' = r\,\frac{1}{1 - pf} \tag{7}$$

where r is the prior probability of the specific cell, and p and f refer to the searched cell. The factor (1 − f) in the numerator of Eq. 6 is here set to 1 (i.e., f = 0), since f is the probability of detecting the target if it is there, and these cells are not searched at this time.
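To make the update rule concrete, the following minimal sketch applies Eqs. 6 and 7 to a probability map held as a NumPy array. It is an illustration of the theory only, not the simulator code described later in this paper; the function name and interface are our own.

```python
import numpy as np

def bayes_update(prob_map, cell, f):
    """Posterior probability map after an unsuccessful search of `cell`.

    prob_map : 2D array of prior probabilities (summing to 1)
    cell     : (row, col) index of the searched cell
    f        : probability of detecting the target if it is in `cell`
    """
    p = prob_map[cell]
    posterior = prob_map / (1.0 - p * f)             # Eq. 7 for every unsearched cell
    posterior[cell] = p * (1.0 - f) / (1.0 - p * f)  # Eq. 6 for the searched cell
    return posterior                                 # still sums to 1

# Example: uniform 3x3 prior, search the centre cell with detection probability 0.8
prior = np.full((3, 3), 1.0 / 9.0)
post = bayes_update(prior, (1, 1), f=0.8)
```

Note that the two update rules together preserve the total probability mass, since p(1 − f) + (1 − p) = 1 − pf.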

2.4 Lawn Mower Search

The lawn mower search pattern moves through an area one row at a time, back and forth, similar to how one would mow a lawn. The Canadian Coast Guard, in their Auxiliary Search & Rescue Crew Manual [7], call this search pattern either Parallel Track–Papa Sierra (PS) or Creeping Search–Charlie Sierra (CS), depending on the circumstances. This type of search is used when little is known about the most probable position of the target. Figure 3 shows an example of what this search pattern can look like.
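Such a boustrophedon sweep can be generated in a few lines of code. The sketch below is our own illustrative example (the function name is hypothetical), not the swarm-adapted implementation evaluated in this paper.

```python
def lawn_mower_path(rows, cols):
    """Yield grid cells in boustrophedon (lawn mower) order:
    left-to-right on even rows, right-to-left on odd rows."""
    for r in range(rows):
        order = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in order:
            yield (r, c)

# list(lawn_mower_path(2, 3)) -> [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]
```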

2.5 Local Search Hill Climbing

Among the local search algorithms, Hill Climbing (HC) is one of the simplest. The HC algorithm starts from an initial solution and then explores neighbouring solutions to find a better one to move to; the movement can be either downhill or uphill depending on whether the goal is to minimize or maximize the function being optimized. In its simplest form, the HC algorithm can only move to a better solution, and it does this until no better solution can be found.


Fig. 3 The figure shows an example of a drone swarm moving in a lawn mower search pattern. The dashed lines show the flight path

This makes it prone to getting stuck in a local optimum if the problem is not convex; on the other hand, it will always find the global optimum if the problem is convex. In Fig. 4, a flowchart of a simple HC algorithm can be seen, which depicts how the objective function f(x) is optimized by exploring the neighbours of the initial and candidate solution x. Since the basic HC algorithm gets stuck in local optima and therefore does not deliver satisfactory results for most non-convex problems, a number of extensions have been developed to mitigate this problem. Many of these extensions use an intelligent stochastic operator to avoid getting stuck in such local traps. The most common extensions are Simulated Annealing [8], Tabu Search [9, 10], Greedy Randomized Adaptive Search Procedure [11], Variable Neighbourhood Search and Iterated Local Search [12].

Fig. 4 A flowchart of a simple hill climbing algorithm, where x is an initial solution and f(x) is the objective function optimized by exploring the neighbours N(x) of x
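As an illustration of how such a hill climber can be applied to the grid maps used here, the sketch below selects the best of the eight neighbouring cells, in the spirit of the drones' cell selection described in Sect. 4. The names are our own and the code is a minimal sketch, not the simulator's implementation.

```python
# Steepest-ascent hill climbing over the eight connected cells of a grid map.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def hill_climb_step(prob_map, pos):
    """Return the neighbouring cell with the highest probability,
    or `pos` itself if no neighbour improves on it (a local optimum)."""
    rows, cols = len(prob_map), len(prob_map[0])
    best, best_val = pos, prob_map[pos[0]][pos[1]]
    for dr, dc in NEIGHBOURS:
        r, c = pos[0] + dr, pos[1] + dc
        if 0 <= r < rows and 0 <= c < cols and prob_map[r][c] > best_val:
            best, best_val = (r, c), prob_map[r][c]
    return best
```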


2.6 A-Star Search Algorithm

A-star is a successor to Dijkstra's algorithm for finding the shortest path. It adds heuristics to the Dijkstra algorithm as a means of achieving better search performance. At each iteration, the A-star algorithm uses Eq. 8 to determine which path to extend,

$$f(v) = g(v) + h(v) \tag{8}$$

where g(v) is the cost of the path from the start node to node v, h(v) is a heuristic function that returns the estimated cost of going from node v to the target, and f(v) is the total cost. To get the best performance, h(v) should be calculated to be as close to the real cost as possible, or underestimate it, but never overestimate it. Some common heuristic functions are the Manhattan, Euclidean and Chebyshev distances [13]. In Fig. 5, a flow chart of the A-star algorithm can be seen.
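A compact sketch of A-star on an 8-connected grid, using the Euclidean distance as h(v) and a step cost of 1 or √2 as in Eq. 8, is shown below. It is a generic textbook-style illustration under our own naming, not the path planner used in the simulator.

```python
import heapq
import itertools
import math

def a_star(grid, start, goal):
    """A-star on an 8-connected grid per Eq. 8, f(v) = g(v) + h(v).

    grid  : 2D list of booleans, True for passable cells
    start : (row, col) start node; goal : (row, col) target node
    The Euclidean distance h(v) never overestimates the true cost here,
    so it is admissible."""
    tie = itertools.count()                          # tie-breaker for the heap
    h = lambda v: math.dist(v, goal)
    frontier = [(h(start), next(tie), 0.0, start, None)]
    g_best, parent = {start: 0.0}, {}
    while frontier:
        _, _, g, v, prev = heapq.heappop(frontier)
        if v in parent:                              # already expanded
            continue
        parent[v] = prev
        if v == goal:                                # walk parents back to start
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == dc == 0:
                    continue
                n = (v[0] + dr, v[1] + dc)
                if (0 <= n[0] < len(grid) and 0 <= n[1] < len(grid[0])
                        and grid[n[0]][n[1]]):
                    ng = g + math.hypot(dr, dc)      # step cost 1 or sqrt(2)
                    if ng < g_best.get(n, float("inf")):
                        g_best[n] = ng
                        heapq.heappush(frontier, (ng + h(n), next(tie), ng, n, v))
    return None                                      # goal unreachable
```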

3 Related Work

Giacomossi et al. [14] developed a 2D simulator for UAV swarms performing a target search scenario. The dynamic model of the drone is a combination of a steering force model for guidance and potential fields for collision avoidance. One of the drawbacks of this model is that the drones can get stuck in local minima. This problem is handled by the drones being able to recognize that they are stuck; when this happens, they try to escape the minimum by moving towards the nearest drone or, as a last resort, towards a random cell. The search area is translated into a grid map that includes obstacles, which is shared and updated by all the drones in the swarm. Once a cell is visited it is marked as visited, and this information is shared by all the drones so that the cell is not visited again. The search strategy is that each drone visits one unvisited cell among the eight connected cells that surround it, following a line search pattern; if all eight connected cells are already visited, it picks a random unvisited cell on the map to go to. Three different scenarios were evaluated: the first where the target's location is known and no blockage heuristics are used (the drones go directly to the target); the second where there is no knowledge of the target's location and no blockage heuristics are used; and the third, which is like the second but with blockage heuristics. These scenarios were run with 5, 10 and 15 drones and 10, 15 and 30 obstacles. There was little difference in search time between the second and third scenario, but with the blockage prevention heuristics, the number of drones getting stuck was significantly reduced. Based on the work of Giacomossi et al. [14], Bernardo et al. [15] created a simulator where they used A-star with three different heuristics in combination with several search patterns. The heuristics used were the Euclidean distance, the Manhattan distance and a combination of the two.


Fig. 5 The figure shows the A-star algorithm in a flow chart


The search patterns used were Grouped Columns (GC), Inner Square Sparsed (ISS), Columns Sparsed (CS) and Random Seek (RS). These were then compared to the steering force path planner used by Giacomossi et al. [14] with the same search patterns. They found that the search is more effective when the drones are more spread out during their search, and that fewer drones get stuck in obstacles when A-star is used than when steering force and potential fields are used as the path planner. Waharte et al. [16] performed search mission simulations using two drones and a single stationary target on a 10 × 10 grid map with assigned probabilities of where the target might be. These probabilities are updated at each observation using a Bayesian process. The two drones each have their own local probability map, which they share and recompute when they come within communication range of each other, and they apply the steepest gradient ascent method to their local map to select the next cell to visit and search. A cooperative search strategy, using the proposed communication method, was tested against a non-cooperative search strategy, i.e., one with no communication between the drones. They found that the belief about where the target might be grew stronger more quickly when the drones cooperated than when they did not, showing that a cooperative search strategy can be of great benefit.

4 Method

A simulator was developed to test algorithms in a simple and fast way. It was developed both as a Python version and as a C++ version (for faster execution). 2D visualisation is available, but the simulator can also run without it. The drones communicate through a shared map using a whiteboard model of communication [17] combined with a GCS. This means that each drone can read and write, in a mutually exclusive manner, on a shared map of, e.g., their findings, but each drone still has an onboard map as well. A simplified flowchart can be seen in Fig. 6. The number of drones can be set dynamically, but the testing is done with three drones, since that is the minimum number of drones needed to demonstrate swarm capability.

Fig. 6 Flow chart of the simulator



The search area consists of a grid map, where each cell is assigned a probability (0 to 1) that the target is there, based on knowledge of the area and the expected behaviour of the target. The movement model of the target is that it is stationary, emulating a lost object or a lost person suffering from dementia; such persons often walk until an obstacle blocks their way and then remain at that place [18]. Dynamic target models are not tested yet. To search for the target on the grid map, three search strategies are investigated. The first one, the baseline search algorithm, is a lawn mower-like search in which three drones patrol the map just as one would when mowing a lawn. They start in the corner of the map and traverse it up and down or back and forth, without searching in each other's cells or searching the same cell more than once; see Fig. 3 for an illustration. In the second one, three drones search the area using local hill climbing, selecting the highest probability cell among the eight nearest cells. In each visited cell where the drones do not find the target, the probability of that cell is greatly reduced, and the probabilities of all other cells on the map are increased according to Bayes Search Theory, see Eq. 7. The third algorithm is similar to the second one, but the drones pick their destination by considering the probabilities of all cells on the whole map together with the Euclidean distance to each cell; they pick the closest highest-probability cell and use A-star as a path planner to get there. The assumption is that it takes much longer to search a cell than to fly over it without searching. It is assumed that it takes 20 time units to search a cell, 1 time unit to move to a cell in the directions N, E, S and W, and approximately √2 time units in the directions NE, SE, SW and NW. A cell is considered a low probability cell when the probability p < 0.35, a medium probability cell if 0.35 ≤ p < 0.70 and a high probability cell if p ≥ 0.70. In the probability maps in Fig. 1, these probability intervals are illustrated with different colours, where green indicates a low probability of the target being there, red indicates a high probability, and the rest of the colours are the probabilities in between. The maps are generated randomly with a weighted distribution of high, medium and low probabilities, where approximately 1/6 of the cells are assigned a high probability, 1/3 a medium probability and 1/2 a low probability. The assumption is that the expert knowledge that is the foundation of a good probability map contains good data on a few places where the target may be, and that the search area expands if the target is not there. In the preliminary tests, the target is placed first in low probability cells, then in medium probability cells and finally in high probability cells, using 100 randomly generated maps for each case. This is done to see how the algorithms perform in different scenarios, e.g., when the expert knowledge behind the maps succeeds or fails to deliver trustworthy information. In Table 1, the algorithms are named with an ID, and in Table 2, the results from the different searches are found.
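The sketch below illustrates how such weighted random maps and the time-cost model could be realized. It is our own minimal example: the band bounds (0.05 and 0.95) and all names are assumptions for illustration, not the simulator's actual code.

```python
import math
import random

# Probability bands from the paper: low p < 0.35, medium 0.35 <= p < 0.70, high p >= 0.70
BANDS = {"low": (0.05, 0.35), "medium": (0.35, 0.70), "high": (0.70, 0.95)}

def random_probability_map(rows=10, cols=10, seed=None):
    """Generate a grid where roughly 1/2 of the cells are low, 1/3 medium
    and 1/6 high probability, drawing each value uniformly within its band."""
    rng = random.Random(seed)
    def cell():
        band = rng.choices(["low", "medium", "high"], weights=[3, 2, 1])[0]
        return rng.uniform(*BANDS[band])
    return [[cell() for _ in range(cols)] for _ in range(rows)]

SEARCH_COST = 20  # time units to search one cell

def move_cost(dr, dc):
    """1 time unit for N/E/S/W moves, approximately sqrt(2) for diagonals."""
    return math.hypot(dr, dc)
```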


Table 1 The names and IDs of the algorithms used; the ID is shown in parentheses

Algorithm
Lawn mower (LM)
Hill climbing with Bayes search theory (HC)
A-star with Bayes search theory (AS)

Table 2 Average search time (in time units) when the target is located in a low, medium or high probability cell, using 100 random maps for each case

Map: Random maps

Algorithm   Low      Medium   High
LM          359.90   347.50   320.00
HC          517.14   330.16   237.01
AS          627.11   263.17    72.60

5 Results

The results are found in Table 2. They show that, with the help of expert-made maps of where the target might be, AS and HC outperform LM if the correct assumptions are made, but when the drones are led astray by bad information through poorly made probability maps, LM is the fastest.

6 Discussion

High-level search mission planning using swarms can clearly lead to substantial improvements in the time to find the target or object of interest, which can play an important role in the success criteria of a search mission. The benefit of using a swarm is that large areas can be searched in a short amount of time. Furthermore, using a shared and dynamically updated probability map allows the swarm to examine the search area more efficiently and make more intelligent decisions about where to search next. In the case where the target is in a low probability cell, i.e., when wrong assumptions have been made in constructing the probability map, the LM algorithm performs better in terms of time than HC and AS. One reason for this is that HC and AS are greedy in the sense that they always pick the cells with the highest probability first. This means that it takes some time before they start to search the lower probability cells, or before a low probability cell has been updated enough to become a high probability cell and get picked.


This also confirms that LM is a suitable option if there is great uncertainty about where the object of interest may be [7]. If, on the other hand, good assumptions are made, HC and AS outperform LM. This indicates that it is important to pick the right search strategy with regard to the situation.

7 Conclusion

A method and approach to evaluate different search algorithms, using UAV swarms in a simulation environment, were developed. This paper demonstrates the potential of using probabilistic methods together with heuristics to aid in the planning of search missions. Bayesian Search Theory together with Hill Climbing and A-star was shown to outperform the lawn mower search pattern in all cases except when the target is in a low probability cell. This highlights that when a probability map is used, accurate assumptions are needed; otherwise, the search is not as effective. Future work includes testing with a moving target, adding no-fly zones, testing other algorithms, testing more maps, improving the graphics, and testing the algorithms with real drones.

Acknowledgements The authors would like to thank Vinnova, Sweden's Innovation Agency, for supporting and funding this research.

References

1. Dantas A, Diniz L, Almeida M, Olsson E, Funk P, Sohlberg R, Ramos A (2022) Intelligent system for detection and identification of ground anomalies for rescue. In: Latifi S (ed) ITNG 2022 19th international conference on information technology—new generations. Springer International Publishing, Cham, pp 277–282
2. Polson N (2018) AIQ: hur artificiell intelligens fungerar. Bokförlaget Daidalos
3. Olsson E, Funk P, Sohlberg R (2023) Using drone swarms for safety and protection against unauthorized drones. In: International congress and workshop on industrial AI 2023. Forthcoming paper
4. Sherif N, Sundelius N (2022) Room mapping for tuning of high fidelity sound systems. Master's thesis, Mälardalen University, School of Innovation, Design and Engineering
5. Campion M, Ranganathan P, Faruque S (2018) UAV swarm communication and control architectures: a review. J Unman Vehicle Syst 7(2):93–106
6. Bekmezci I, Sahingoz OK, Temel Ş (2013) Flying ad-hoc networks (FANETs): a survey. Ad Hoc Netw 11(3):1254–1270
7. Canadian Coast Guard Auxiliary (2011) Search & rescue crew manual. https://ccgagcac.ca/xde/getfile.php?fid=761. Accessed 22 Feb 2023
8. Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680
9. Glover F (1986) Future paths for integer programming and links to artificial intelligence. Comput Oper Res 13:533–549
10. Glover F, Laguna M (1998) Tabu search. Springer


11. Feo TA, Resende MGC (1995) Greedy randomized adaptive search procedures. J Global Optim 6(2):109–133
12. Mladenovic N (2004) A tutorial on variable neighborhood search. Les Cahiers du GERAD
13. Duchoň F, Babinec A, Kajan M, Beňo P, Florek M, Fičo T, Jurišica L (2014) Path planning with modified A star algorithm for a mobile robot. Proc Eng 96:59–69. Modelling of mechanical and mechatronic systems
14. Giacomossi L, Souza F, Cortes RG, Cortez HMM, Ferreira C, Marcondes CAC, Loubach DS, Sbruzzi EF, Verri FAN, Marques JC, Pereira LOA, Maximo MROA, Curtis VV (2021) Autonomous and collective intelligence for UAV swarm in target search scenario. In: 2021 LARS, 2021 SBR, and 2021 WRE, pp 72–77
15. Bernardo GT, Vogás LM, Rodrigues SD, Lopes TG, Marcondes CA, Loubach DS, Sbruzzi EF, Verri FA, Marques JC, Pereira LA, Maximo MR (2022) A-star based algorithm applied to target search and rescue by a UAV swarm. In: 2022 LARS, 2022 SBR, and 2022 WRE, pp 49–54
16. Waharte S, Trigoni N, Julier S (2009) Coordinated search with a swarm of UAVs. In: 2009 6th IEEE annual communications society conference on sensor, mesh and ad hoc communications and networks workshops, pp 1–3
17. Das S, Santoro N (2019) Moving and computing models: agents. Springer International Publishing
18. Hanna D, Ferworn A (2020) A UAV-based algorithm to assist ground SAR teams in finding lost persons living with dementia. In: 2020 IEEE/ION position, location and navigation symposium (PLANS), pp 27–35

Risk-Based Dependability Assessment of Digitalised Condition-Based Maintenance in Railway Part I–FMECA

Peter Söderholm

Abstract One vital contribution of Reliability-Centred Maintenance (RCM) is the definition of potential failure, which led to the concept of Condition-Based Maintenance (CBM) being accepted as one of the best ways of preventing functional failure. To enable CBM, the condition of an item must be monitored by Condition Monitoring (CM) of some critical functions. The CM results in collected data that represent the system's condition. Diagnostics and prognostics are then concerned with the interpretation of the collected condition data and the conclusions drawn about the item's current and future condition. On the basis of the diagnostic and prognostic information, decisions about appropriate CBM can be made. The purpose of the risk-based dependability assessment described in this paper is to support a decision on whether or not a railway infrastructure item should be covered by an additional digitalised inspection solution for CM to enable improved CBM. Hence, the dependability assessment indicates 'which' functions or items should be covered by a digitalised solution for inspection (CM) and 'why' they should be covered. 'How' the coverage should be achieved, i.e., by which technical solutions, is not included in this paper. The proposed dependability assessment is based on stakeholder requirements, through a combination of a Failure Modes, Effects & Criticality Analysis (FMECA) and an Event Tree Analysis (ETA), with three major decision points where Maintenance Significant Items (MSIs) for further analysis are identified. This paper focuses on the FMECA part and the two related MSI selections of the proposed approach. The approach has been used on an aggregated level to formulate goals of an innovation procurement. Later on, the methodology is expected to support verification and validation of proposed CBM solutions on a more detailed level.

Keywords RCM · FME(C)A · CM · CBM · Condition · Railway · Infrastructure

P. Söderholm (B) Trafikverket and Quality Technology and Logistics, Luleå University of Technology, Luleå, Sweden e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_15


1 Introduction

Requirements related to CM and CBM within railway can be exemplified by an innovation procurement at Trafikverket (the Swedish transport administration) [1, 2]. This procurement focuses on new digital solutions for improved asset management of railway infrastructure and increased punctuality. One area of application is CBM of track that meets Trafikverket's need for solutions that consider a combination of, e.g., regulations and technology. The vision is that the innovations contribute to active management of a sustainable and digitalised asset management practice based on dynamic maintenance programs and CBM. The main purpose of the innovation procurement is to improve productivity and effectiveness through reduced traffic disturbances caused by the rail infrastructure's condition and its maintenance. This may be achieved by a reduction of corrective maintenance that affects traffic and an improvement of the preventive maintenance. For track, examples of goals to achieve this are to reduce the number of remarks from manual and automated inspections that generate corrective maintenance, e.g., due to direct traffic restrictions (immediate corrective maintenance) or a planning horizon too short to enable maintenance within the existing train plan (postponed corrective maintenance). Another goal is to reduce the number of inspection remarks where the planning horizon is sufficient to include the maintenance task in the train plan, but the resulting maintenance task requires more than three hours of possession time (e.g., rail replacement). In addition, one goal is to reduce the number of corrective maintenance tasks that are related to potential traffic disturbances (related to insufficient preventive maintenance and events that can be prevented). It is also desirable to reduce the risk of traffic disturbances for maintenance tasks in the infrastructure related to specific inherent items (e.g., inspections for CM and resulting CBM tasks based on inspection remarks to prevent faults through coordinated planning). The above-mentioned goals can be related to improvements in the selection of applicable and effective maintenance tasks based on relationships between the infrastructure's degradation and the effect of maintenance (e.g., the difference between temporary and more thorough maintenance to manage track geometry faults due to geotechnical causes). There are also goals that can be related to productivity improvements of maintenance activities such as planning, execution and assessment, in line with existing regulation on operational, tactical and strategic levels. In addition, goals are related to effectiveness and dependability improvements of regulations or the reliability and maintainability of rail infrastructure and inherent items [1, 2].

2 Method and Material

The work presented in this paper is performed within the two research projects "Pre-study innovation procurement" (TRV 2020/39092) [1] and "ASSET—Active, Systematic Management for More Effective Asset Management" (TRV 2022/29194) [3].


The overall aim of the latter project is to contribute to a sustainable, resilient and competitive transport infrastructure through an increased ability to actively manage a dynamic regulatory framework, e.g., by use of FME(C)A and the RCM concept of a living maintenance program. The work follows Trafikverket's main process "Research and develop innovation". Empirical data is collected through document studies, interviews, observations and databases. The qualitative data analysis is based on international dependability standards (e.g., IEC's 60300 standards) to relate Trafikverket's needs and potential solutions to a framework based on internationally agreed best practices. Theories related to continuous improvement are also applied to describe the need for and implementation of changes. Central parts of these theories are found in management system standards (e.g., ISO 9000 [4] and ISO 55000 [5]), which support the application of the dependability standards. Quantitative data is analysed with Excel and Trafikverket's business analytics tool (i.e., SAP Business Objects). The analysis logic follows FMECA, RCM and ETA. A basic assumption is that inspection remarks with an action time of three or more months indicate applicable maintenance. The reason is that, in most cases, they do not lead to traffic disturbances. However, inspection remarks with an action time shorter than three months indicate a lack of applicability or compliance, as they usually lead to traffic disturbances.

3 Results

The results of the study are presented according to the proposed dependability assessment methodology. However, only two of the MSI selections and the FMECA part are described in this paper.

3.1 Stakeholder Requirements

Within the EU's interoperability directive for railway, there are some general requirements related to reliability and availability, safety, technical interoperability, accessibility, health and environment. These requirements are further deployed in Technical Specifications for Interoperability (TSI) [6]. Regarding safety, there are Common Safety Targets (CST) that can be related to the requirements (see, e.g., ERA-GUI-02-2015) [7]. All these requirements should be managed according to process standards, such as EN 50126 (RAMS for railway applications) [8, 9], or regulations such as the Common Safety Methods (CSM) [10]. This enables a design, and later on a verification and validation, of the requirements. The standards and regulations also propose suitable methodologies to support the work, e.g., FMECA [11], FTA [12], RCM [13] and ETA [14]. See Fig. 1.


Fig. 1 Dependability management framework related to railway infrastructure availability. Dark red boxes are related to the generic rail system model in Fig. 2. Inspired by [15]

3.2 First MSI Selection

A significant item is an item whose fault has safety or major economic consequences [16]. The first selection of Maintenance Significant Items (MSIs) is intended to decide which functions and items should be included in the FMECA. The conditions for classifying a function or an item as a MSI that requires further analysis are [16]: Hidden–the fault can remain undetected at any time during normal traffic; Safety–the fault affects safety; Environment–the fault could have environmental impact; Operation–the fault could lead to cancelled trains or restricted traffic performance; Economic–the fault could have significant economic impact. According to the interoperability directive [17], the EU's rail system is divided into subsystems. These correspond to structurally defined subsystems such as vehicles, infrastructure, energy, and trackside and onboard control-command and signalling (CCS). Another division is into functional subsystems such as operation and traffic management, maintenance, and telematics for passengers and goods (Fig. 2). In the middle of the rail system model (Fig. 2), there are critical interfaces between vehicles and infrastructure that contain MSIs according to all criteria given by [16]. One example is track, as part of the mechanical interface of the infrastructure to the wheels of vehicles. The function of the track is mainly to provide a trafficable track with a geometry that is able to receive and distribute the track forces it is exposed to by passing vehicles.


Fig. 2 A rail system model based on a combination of [16, 18, 19]. Inspired by [15]

There are failures in the track superstructure that are hidden during normal operation and are detected by inspection activities, e.g., failures related to track geometry parameters such as gauge and twist, which are safety-related since they can cause derailment. Hence, inspections are necessary as a proactive barrier. These inspections may result in corrective maintenance activities that disturb planned traffic, since they are safety-related. There are also less critical inspection remarks that drive costs by resulting in maintenance actions that may be defined as preventive, since they can be planned within the existing train plan without disturbing the traffic, e.g., remarks on track geometry parameters other than gauge and twist. There may also be other types of hidden failures in the track superstructure that are not detectable during normal operation but are manifested when a number of factors interact. One example is an insufficient ability to receive and distribute pressure forces in the longitudinal direction of the track, which may result in so-called sun curves (track buckling) and potentially derailment. Hence, track is one possible first selection of a MSI to analyse further.

3.3 FMECA

FMECA is a quantitative or qualitative methodology of analysis that combines a failure modes and effects analysis with a consideration of the probability of the failure mode occurrence and the severity of the effects (IEV 192-11-06) [19]. The work order suggested for the proposed FMECA corresponds to the following steps, where aspects of RCM [16] also are included:

• What are the system's functions and associated desired standards in its current operational environment (functions)?
• In what ways can the system fail to fulfil its functions (functional fault)?
• What causes each functional fault (fault mode)?
• What happens when each fault occurs (fault effects)?


• In what way does each fault matter (fault consequences)?
• What should be done to predict or prevent each fault (proactive actions and action intervals)?
• What should be done if a suitable proactive action cannot be found (standard actions)?

The system to be analysed and its relationships with other systems in its environment have to be mapped. A relationship may be physical, mechanical, thermal, electrical, or any other possible interrelationship. In Fig. 2, a number of interfaces can be identified, e.g., in the central part of the rail system model. The technical systems have interfaces to each other, but also to the organisational capabilities necessary for operation and traffic management, maintenance (including renewal) and modification (including investment and upgrading). Another critical interface is between the rail system's delivered performance and the general requirements (e.g., regarding availability and safety) found in regulation (EU) 2016/797 [17], as well as other stakeholders' requirements, needs and expectations. A required function is considered necessary to fulfil a given requirement (IEV 192-04-04) [19]. For example, the railway infrastructure is intended to deliver a performance expressed as a required function of successful train passages according to the present train plan. This is enabled by a technical performance, expressed as required functions related to load, speed and gauge. In addition, this means that the required functions at all inherent levels of the track system have to fulfil a desired availability performance. Hence, there are design solutions to improve the rail system's reliability (e.g., redundancy through multiple sleepers or fastenings) or maintainability (e.g., the possibility to easily replace broken rails). In addition, availability is affected by the organisational maintenance support performance, e.g., logistic and administrative capabilities. Furthermore, the maintenance support must have resources to comply with regulations such as maintenance programs, i.e., preventive maintenance tasks and their intervals for different parts of the rail infrastructure. Parts of the preventive maintenance program contain CBM tasks. This CBM consists of inspections for CM and related maintenance tasks to manage resulting inspection remarks. Resources for inspections (CM) are part of the maintenance support and can be manual or automated. Besides the required function of geometry, the track should also be able to receive and distribute the track forces transferred by passing vehicles, i.e. (quantitative values in TSIs):

• Pressure forces in the longitudinal direction of the track due to braking and accelerating vehicles or temperature changes.
• Static and dynamic forces from axle and wheel loads, and impacts from wheel flats.
• Lateral forces such as frictional forces, centrifugal forces and sinusoidal motion from carriages.
• Opposing forces in the lateral and longitudinal directions.

As described earlier, the track is a critical system from a traffic management, operation and maintenance perspective. It is also part of the infrastructure's critical interface with rolling stock, i.e., between rail and wheel. This interface to rolling stock represents the track's required functions at an overall infrastructure level, i.e., track geometry.


At lower levels within the track system, one fault is "not able to provide a track geometry that enables safe and comfortable train passage". This fault can be due to the track's inability to receive and distribute pressure forces in the longitudinal direction of the track. Such faults can in turn depend on single or multiple faults at lower system levels, which may be identified in the FMECA. Examples of lower-level faults are (quantitative values in TSIs):

• The ballast has too little lateral displacement resistance to hold the sleepers in place.
• The fastenings provide insufficient resistance to hold the rails in place.
• The track settles due to insufficient resistance to applied forces.

Possible failure modes related to insufficient track geometry are (quantitative values in TSIs):

• Un-compacted ballast, lack of ballast or poor-quality ballast (which leads to the ballast having too little lateral displacement resistance to hold the sleepers in place).
• Poor or missing fastenings (which leads to the fastenings failing to hold the track in place).
• Mud seeping out of the ballast (which leads to settlement in the track).

As described above, some common causes of poor track geometry are material fatigue and aging. Another common cause is inadequate maintenance, i.e., maintenance actions that are not effective, as they result in recurrent track geometry faults, often due to geotechnical reasons (substructure built long ago, e.g., vegetation beds at mires).

3.3.1 Fault Effect, Fault Consequence

One level of analysis is the fault effect on the analysed system, e.g., the inherent system level where maintenance is performed in track (i.e., Line Replaceable Unit, LRU). This describes the way in which the failure or fault is manifested, e.g., as insufficient track geometry. The information here may be connected to fault detection, and thereby be valuable for fault localisation. This information is one important input to the following ETA, see [20]. The second level of analysis is the fault effect on neighbouring systems. Three types of proximity that may be considered are functional, informational and physical. When the proximity is due to function or information, both failures and faults should be considered. One example is that track geometry faults could affect the catenary system, since a small displacement and the related increase in forces at track level might be magnified by the lateral distance between the two systems, which become interlinked by passing trains. The final level of analysis is the fault effect on the railway system. The effect at the railway system level may be either direct, due to a failure or fault of the analysed system, or indirect, through a failure or fault of an affected neighbouring system. In the worst case, the track geometry fault is detected by a derailment. If the track geometry fault is detected by inspection and requires maintenance that cannot be performed within the existing train plan, there are traffic disturbances due to stopped or restricted traffic and corrective maintenance actions.


However, if the inspection results in a remark that requires maintenance that can be performed within the existing train plan, the work can be planned and will only lead to preventive maintenance costs and no traffic disturbances.

3.3.2 Current Management or Control

Existing means of detection, such as manual and automated inspections, should be identified. In the case of track geometry, the main preventive maintenance actions are manual maintenance and safety inspections of the track's inherent items, together with automated inspections by measurement wagons focusing on the track geometry parameters. Besides fault detection, fault localisation is part of the diagnostics. Manual inspection of track focuses mainly on the functions of inherent items such as ballast, sleepers, fasteners and rails. The automated inspections by measurement wagons focus mainly on the overall track geometry function. Since track is a linear asset (cf. point assets such as switches and crossings or level crossings), the localisation of faults is done both geographically (GPS or kilometre and metre) and within the inherent levels of the railway infrastructure. The actions necessary to restore the system to a desirable state may be corrective maintenance tasks. These are performed in response to faults and failures in the track system detected by inspections or passing trains. Examples of maintenance tasks related to the track system are the replacement of faulty or missing inherent items (e.g., fasteners, sleepers, ballast or rail) or tamping to align the track geometry. Common corrective maintenance actions related to bad track geometry are adjustment, ballast consolidation, track geometry inspection, tamping and levelling (see Fig. 3). Even in cases where there is no known cause, the same corrective maintenance tasks are used. In cases where the track geometry fault is train-disturbing, the share of track alignment is higher than when the fault is not disturbing traffic. One reason can be the wish to reduce the duration of traffic disturbances by a relatively fast and effective corrective maintenance action. However, in a longer time perspective, the actions may in some cases be seen as temporary. One example is places where there are geotechnical causes of track geometry faults and the related corrective maintenance actions are, e.g., adjustment, ballast consolidation, track geometry inspection, tamping and levelling. Under these circumstances, the maintenance should in many cases be considered as temporary actions that likely are not sufficient to correct the root cause of the track geometry fault. See Fig. 4, where there seem to be reoccurring track geometry faults at some places where the corrective maintenance actions exemplified above might have limited long-term effects.


Fig. 3 Common corrective maintenance actions related to bad track geometry and whether they disturb traffic or not

Fig. 4 Examples of reoccurring track geometry faults where maintenance actions have limited long-term effects

3.3.3 Failure Mode Frequency

The frequency of failure modes can be classified according to different scales, e.g., according to the examples in EN 50126 (RAMS) [8, 9]. There are also examples of different requirements on the acceptable frequency of different failures, e.g., in CSM-RA [10] and CSM CST [7]. One important quantity to clarify is the time parameter, which may be, for instance, calendar time, operating time or number of cycles; a multidimensional time concept may be necessary. Regarding the safety inspection interval for railway in Sweden, the combination of speed and load is used to identify the number of necessary inspections per year. Additional parameters can be introduced by statistical analysis of inspection remarks and faults to determine appropriate inspection intervals.


Most of the acute and weekly inspection remarks from automated inspections (machine measurements) that result in corrective maintenance actions are related to twist and gauge faults. The reason is probably that these are safety-critical due to the risk of derailment. Other track geometry faults are at a lower level and result in monthly remarks that normally can be managed by planned preventive maintenance actions without affecting the traffic. The most common reasons for acute and weekly inspection remarks from the manual safety inspection are missing fastenings, which are managed correctively. The next level of frequency for these inspection remarks relates to rails and track gauge expansion linked to fastenings, which also is managed correctively. From the manual maintenance inspections, the most common remarks are related to missing ballast and fastenings, which normally can be managed by preventive maintenance. Here, it should be noted that the manual maintenance inspections normally have the longest time horizon, since they should focus on Life Cycle Cost (LCC). Thereafter, the manual safety inspections should have an intermediate time horizon, since they focus on inherent items that may affect the stability of the track and thereby the track geometry. The automated safety inspections performed by measurement wagons can be used to monitor changes in the level and pace of track geometry degradation. Hence, the time horizon of the automated inspections depends on the predictive capability of track specialists. However, the practice of using inspection remarks from any inspection type for planning purposes will ultimately affect the balance between corrective and preventive maintenance actions. The track fault that generates the most corrective maintenance actions, but also the most disturbed trains and the most delays, is poor track geometry. The most common reported cause of track geometry faults is insufficient geotechnical conditions (old substructure that is expensive to correct), followed by no identified cause and, on a third level, fatigue or aging. The inherent items that disturb traffic most extensively are, in descending order, track, rail, ballast, sleeper and fastening. The top three causes of bad track geometry (for all three groups: all faults, train-disturbing faults and non-train-disturbing faults) are, in descending order, geotechnical conditions, unknown causes and fatigue or degradation. Hence, there seems to be scope for an additional digital inspection that might support the identification of causes that today are unknown. Figure 5 displays the number and relative frequency of different faults in the track system during the years 2015–2019 in Sweden. The frequency can also be displayed in other diagrams, e.g., Pareto charts. As noted, poor track geometry generates the most corrective maintenance, disturbed trains and delays (see Fig. 6). Fault causes that are not possible to define, or that are not actual faults, are largely due to external causes, e.g., vehicles with wheel flats.

3.3.4 Severity

The severity of different failures can be classified as exemplified by EN 50126 [8, 9]. On an aggregated level, the severity of faults can be classified according to their consequences and related maintenance actions, e.g., maintenance or safety inspections.


Fig. 5 Number and relative frequency of different faults in the track system during 2015–2019 in Sweden

Fig. 6 Causes of track geometry faults, Sweden 2015–2019

The maintenance inspections cover faults that mainly affect dependability requirements. The safety inspections cover faults that affect aspects such as safety, environment or work environment. In Sweden, the severity of inspection remarks from railway infrastructure inspections can be classified according to the time available to manage them by related maintenance actions. Acute inspection remarks require cancellations or restrictions of train traffic until corrective maintenance is performed. Weekly inspection remarks affect train traffic, since the necessary corrective maintenance tasks must be performed within two weeks and thereby will affect the train plan. Monthly inspection remarks will in most cases result in preventive


maintenance tasks that can be planned into the train plan without any traffic disturbances. However, there are exceptions (e.g., rail replacement) where, even though the planning horizon is sufficient, the required maintenance takes longer than the available possession time of three hours. Inspection remarks with a planning horizon longer than three months (e.g., a year) result in preventive maintenance that can be included in the train plan without any traffic disturbances. Maintenance actions can also be initiated by other events than inspections, e.g., by observations from train personnel, weather events or accidents, which result in corrective or preventive maintenance.
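As an illustration, the classification described above can be expressed as a simple lookup keyed on the available action time. The sketch below is our own, with illustrative thresholds; it is not Trafikverket's actual classification logic.

```python
def remark_class(action_time_months):
    """Classify an inspection remark by available action time, following the
    classes described above (thresholds are illustrative, not regulatory)."""
    if action_time_months <= 0:
        return "acute"   # traffic cancelled/restricted until corrected
    if action_time_months < 0.5:
        return "week"    # must be fixed within two weeks, affects the train plan
    if action_time_months < 3:
        return "month"   # normally plannable preventive maintenance
    return "year"        # plannable within the train plan, no disturbance
```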

3.3.5 Criticality

A criticality (risk) matrix can be applied to decide whether the ETA should be performed for the analysed function and its failure modes [20]. The risk is a combination of the severity and probability of the fault. It is important that the identification of possible causes is made in detail only when the risk is high, i.e., when the fault effect has a high severity, the fault frequency is very high, or both. The risk estimation may be based on CSM-RA [10] or EN 50126 (RAMS) [8, 9] and any risk matrix suggested therein. In Sweden, the frequency of different failure modes is related to a combination of the highest allowed speed and the accumulated yearly load of train traffic. Based on this combination, five different inspection classes (I1–I5) are outlined to select an appropriate number of inspections per year to proactively detect and manage failures by planned preventive maintenance actions. Inspection class I1 represents the lowest number of inspections per year and I5 the highest. As a complement to this minimum number of yearly inspections, statistical analysis of field data can be used to further increase the number of inspections per year. Regarding automated inspection of track by measurement wagons, the inspection intervals for different inherent items are, e.g., geometric position of rail (1–6 times per year) and ballast profile (¼–1 times per year). Figure 7 illustrates different criticalities for automated inspection of track in Sweden for the years 2015–2019. One criticality is the relative frequency of three different severity levels of inspection remarks (ordinal severity scale: Acute, Week, Month) for five different inspection classes (ordinal scale I1–I5). Acute and Week remarks indicate that the applied CBM is not applicable, while Monthly remarks indicate applicability. It is seen that for automated track inspection, the proportion of Acute and Week remarks tends to decrease as the inspection interval decreases (i.e., with increasing inspection class). However, the inspection interval must be reduced in all inspection classes to avoid Acute and Week remarks, and the size of the required reduction decreases with inspection class. From this criticality assessment, it seems that an additional digital inspection solution with shorter intervals than the present automated inspection could be valuable. If one looks at the consequences for rail traffic, it is possible to illustrate the criticality by a matrix showing the number of delay minutes, the number of disturbed trains and the number of faults.


Fig. 7 Illustration of criticality by the relative frequency of three different severity levels of inspection remarks [1]

Looking at the consequences for rail traffic, it is possible to illustrate the criticality by a matrix showing the number of delay minutes, the number of disturbed trains and the number of faults. Figure 8 illustrates this criticality through a comparison between the two linear asset types, track and catenary systems. In Fig. 8, the track's inherent items joint and joint connection are excluded, as the study is limited to seamless track. The y-axis shows severity as the number of disturbed trains at a punctuality level of three minutes (RT + 3). Severity as the number of additional delays (RT + 3) is indicated on the x-axis. The frequency of faults in different inherent items (e.g., rails and sleepers) is illustrated by the size of the geometrical shapes. In summary, track has more train-disturbing faults than catenary, but these result in shorter interruptions and fewer delays, even though the number of disturbed trains is higher. A similar criticality analysis of track at the next level of consequence (not shown here) reveals that sleepers, followed by fastenings, are the next most critical items with regard to traffic disturbances.

3.4 Second MSI Selection

Based on the criticality assessment (in the FMECA), a decision may be taken about which functions or items need further analysis [20]. This second MSI selection acts as an input to an extended analysis, which can focus on safety or dependability, but preferably both. In either case, maintenance is important to achieve the requirements of safety and dependability. To select between different design solutions that fulfil aggregated requirements, cost is commonly applied as a criterion. Examples of well-known methodologies are safety analysis (CSM-RA, FMECA and FTA) and reliability analysis (ETA, FMEA and RCM). Regarding CBM, criteria for applicability and effectiveness can be found in RCM [13, 16].


Fig. 8 Example of criticality matrix. Inspired by [21]

Based on the analysis, it seems that track is still a good selection of MSI, as well as its inherent items and related functions, since the applicability criteria are fulfilled. It also seems that there is scope to improve the effectiveness of the track's current CM and CBM.

4 Discussion

For track, the current automated inspections with a measuring train focus on the track's required function related to moving vehicles (e.g., gauge and twist). Track geometry is a safety-critical function whose failure may lead to derailment. The manual inspections of track focus on required functions of inherent items (e.g., fastenings and sleepers). A digitalisation of these manual inspections can contribute to both productivity and effectiveness improvements of CBM. Track is a linear object and not a point object (cf. level crossings). Hence, inspection solutions implemented in the support system are normally more desirable than solutions built into the infrastructure. For track, there are faults whose cause is difficult to identify, as well as maintenance tasks that are not effective as they result in recurring track geometry faults, often due to geotechnical reasons. Two other common causes of poor track condition are material fatigue or ageing and inadequate maintenance. To better identify causes of faults and the effects of maintenance tasks, digitalised CBM can be valuable through the establishment of causal relationships between cause and effect as well as cost and benefit. These relationships can be based on analysis of additional condition data, a deeper analysis of existing reliability data, or a combination of both. However, a challenge is to obtain relevant cost data related to individual maintenance tasks, as they are registered in different systems, on an aggregated level, and vary between


different contracts related to procured maintenance. The FMECA sheet works as a tool for documentation, communication and cooperation, e.g., during procurement. The purpose of the analysis is decisive for which fields to cover. The individual FMECA is inductive, i.e., it starts at LRU level and ends at the top system level. However, the work with the FMECA starts early in the life cycle (or procurement process) at the system level and can be deployed down to an appropriate subsystem level, for example the infrastructure level, where maintenance is performed in the field to verify and validate possible technical solutions for CM and CBM.

Acknowledgements This paper is a deliverable within the research and development (R&D) project "ASSET" (TRV 2022/29194) at Trafikverket.

References

1. (TRV 2020/39092) Förstudie automatiserad mätning av järnvägsinfrastruktur genom innovationsupphandling
2. (TRV 2020/39092) [KOM-405140] Innovationsupphandling – nya systemlösningar för automatiserad mätning av järnvägsanläggningen
3. (TRV 2022/29194) ASSET – Aktiv, Systematisk Styrning för Effektivare Tillgångsförvaltning
4. ISO 9000 (2015) Quality management systems – fundamentals and vocabulary. ISO
5. ISO 55000 (2014) Asset management – overview, principles and terminology. ISO
6. 2013/753/EU: on the second set of common safety targets for the rail system (notified under document C(2013) 8780)
7. ERA-GUI-02-2015 Implementation guidance for CSIs – Annex I of directive 2004/49/EC
8. EN 50126-1 (2017) Railway applications – part 1: generic RAMS process. IEC
9. EN 50126-2 (2017) Railway applications – part 2: systems approach to safety. IEC
10. Directive (EU) 2016/798 on railway safety
11. EN IEC 60812 (2018) Failure modes and effects analysis (FMEA and FMECA). IEC
12. IEC 61025 (2006) Fault tree analysis (FTA). IEC
13. IEC 60300-3-11 (2009) Reliability centred maintenance. IEC
14. EN 62502 (2011) Event tree analysis (ETA). IEC
15. Söderholm P (2022) Dependability and system railway models. PM (TRV 2022/29194). Trafikverket, Luleå
16. Nowlan FS, Heap HF (1978) Reliability-centered maintenance. National Technical Information Service, US Department of Commerce, Springfield, Virginia
17. Directive (EU) 2016/797 on the interoperability of the rail system within the European Union
18. SS 4410505 (2000) Dependability – terminology. SEK
19. IEV (2023) International electrotechnical vocabulary (IEV). http://www.electropedia.org/. Accessed 28 Feb 2023
20. Söderholm P, Akersten PA (2023) Risk-based dependability assessment of digitalised condition-based maintenance in railway – Part II: ETA. In: IAI 2023, Luleå
21. Söderholm P, Bergquist B (2016) Business analytics of railway robustness. In: 20th QMOD conference, pp 4–7

Climate Zone Reliability Analysis of Railway Assets Ahmad Kasraei, A. H. S. Garmabaki, Johan Odelius, Stephen Mayowa Famurewa, and Uday Kumar

Abstract Railway infrastructures deteriorate under the influence of various physical, mechanical and environmental factors, including climate change. Climate change impacts during past years have led to various critical damage to railway infrastructure assets. Switches and crossings (S&C) are sensitive components of the railway network, which are affected by climate change and severe events such as abnormal temperatures, snow and ice, and flooding. S&C failures can lead to severe consequences, which often negatively influence the reliability and safety of the railway network. Reliable railway infrastructure asset clustering is essential for tactical and strategic decision-making to operate and maintain railway networks under future climate scenarios. This study utilized machine learning (ML) models to cluster S&C failures by leveraging historical maintenance data (number of failures, failure mode, time to repair, etc.), asset registry data (type, location, criticality of the asset, etc.), inspection data, and weather data. Four different clusters have been identified considering the climatic pattern. The proposed model has been validated using S&Cs from the Swedish railway network. The clustering approach leads to uncertainty reduction in model building, which has the potential to support robust and reliable decision-making in railway operation and maintenance management. Furthermore, this categorization helps infrastructure managers to implement climate adaptation actions leading to more resilient transport infrastructure. A. Kasraei (B) · A. H. S. Garmabaki · J. Odelius · U. Kumar Luleå University of Technology, 97187 Luleå, Sweden e-mail: [email protected] A. H. S. Garmabaki e-mail: [email protected] J. Odelius e-mail: [email protected] U. Kumar e-mail: [email protected] S. M. Famurewa Trafikverket, 97187 Luleå, Sweden e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_16


Keywords Railway asset · Clustering algorithm · Climate zones · Reliability analysis · Climate adaptation

1 Introduction

Railway infrastructure plays a crucial role in facilitating the safe and efficient transportation of people and goods to fulfil the future sustainable development goals (SDG). The railway infrastructure network is susceptible to a range of climate change impacts, including severe weather events, heatwaves, flooding and rising sea levels, ultimately leading to reduced availability, safety and punctuality, as well as increased operation and maintenance costs. A significant portion of the railway network, including structures such as bridges, tunnels and stations, was constructed without adequate consideration of future climate change impacts, such as rising temperatures, increased precipitation and more frequent extreme weather events. As a result, it is imperative to consider the potential impacts of climate change on railway infrastructure assets to mitigate the risks and ensure the longevity and resilience of this critical transportation system. Therefore, proactive planning and investment in new technologies and infrastructure, such as improved drainage systems, weather-resistant materials, and more resilient tracks and rolling stock, as well as the development of new maintenance and inspection protocols, are necessary to ensure the safe and reliable operation of railways in the face of climate change. One of the key challenges associated with this strategy pertains to the impact of changing weather patterns and climates across different geographic regions. To mitigate this challenge, one possible solution is to divide the targeted area into classes or groups that share significant similarities. Doing so may make it easier to manage the effects of weather patterns and climatic consequences on railway assets. The remainder of this paper is organized as follows: Sect. 2 presents different climate zone classifications; Sects. 3 and 4 discuss Sweden's climate and its impact on railway infrastructure, respectively; Sect. 5 presents the research methodology and the case study; finally, Sect. 6 presents the main findings and future research directions.

2 Climate Zones Classification Systems

Several classification systems for climate zones exist, including the following.

Köppen-Geiger system The Köppen-Geiger system is a widely used climate classification method that identifies different climates based on temperature and precipitation patterns. It groups climates into five main categories, which are further divided into subtypes. The categories are tropical (A), dry (B), temperate (C), continental (D), and polar (E).


The system assigns a combination of letters describing each climate zone, with the first letter indicating the major climate group and the second indicating the subtype. For example, a humid subtropical climate is classified as Cfa, where C represents temperate climates, f represents the fully humid subtype (no dry season), and a represents hot summers [1].

Thornthwaite climate classification system This system classifies climates based on the water balance and potential evapotranspiration. It is often used in hydrology and water resource management to evaluate water availability and demand in a given region [2, 3].

Trewartha climate classification system This system is based on the concept of seasonality and takes into account temperature and precipitation patterns throughout the year [4].

Bergeron classification system The Bergeron classification system is based on the relationship between temperature and precipitation. This system divides climates into four groups: polar, boreal, temperate, and tropical. Each group is further divided into subgroups based on temperature and precipitation characteristics [5].

Spatial Synoptic Classification (SSC) The Spatial Synoptic Classification (SSC) system was developed by Mark S. Yarnal in the 1990s and is based on the synoptic (large-scale weather) patterns that produce local weather conditions. This system uses a combination of temperature, humidity, wind, and cloud cover to classify climates into six groups: dry tropical, moist tropical, dry mid-latitude, moist mid-latitude, dry polar, and moist polar [6].

Holdridge life zones The Holdridge Life Zones system is another climate classification system based on the relationship between temperature, precipitation, and potential evapotranspiration. The Holdridge system divides the world into life zones based on three climatic variables: mean annual temperature, mean annual precipitation, and potential evapotranspiration. These variables are combined to create an index of biotemperature and a measure of aridity, which are then used to classify areas into one of 30 different life zones [7].
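To make the Köppen-Geiger letter-code convention concrete, the following minimal sketch decodes a code such as "Cfa" or "Dfb". The lookup tables are deliberately simplified and cover only a few subtype letters, not the full classification scheme.

```python
# Simplified Köppen-Geiger decoder; only a few subtype letters are
# included for illustration - the full scheme has many more.
MAIN_GROUP = {"A": "tropical", "B": "dry", "C": "temperate",
              "D": "continental", "E": "polar"}
PRECIPITATION = {"f": "fully humid (no dry season)", "s": "dry summer",
                 "w": "dry winter", "m": "monsoonal"}
TEMPERATURE = {"a": "hot summer", "b": "warm summer", "c": "cold summer"}

def decode_koppen(code: str) -> str:
    """Expand a Köppen-Geiger code into its component descriptions."""
    parts = [MAIN_GROUP[code[0]]]
    if len(code) > 1:
        parts.append(PRECIPITATION.get(code[1], code[1]))
    if len(code) > 2:
        parts.append(TEMPERATURE.get(code[2], code[2]))
    return ", ".join(parts)

print(decode_koppen("Cfa"))  # temperate, fully humid (no dry season), hot summer
print(decode_koppen("Dfb"))  # continental, fully humid (no dry season), warm summer
```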

3 Sweden's Climate

Sweden is classified as a cold climate zone (Dfb) according to the Köppen-Geiger climate map (see Fig. 1), and its average temperature has risen by almost 2 °C compared with the temperature at the end of the nineteenth century. In contrast, the global mean temperature has increased by approximately 1 °C.


Fig. 1 World Map of Köppen-Geiger climate classification updated with mean monthly CRU TS 2.1 temperature and VASClimO v1.1 precipitation data for the period 1951 to 2000 on a regular 0.5° latitude/longitude grid [8]

In Fig. 2, the charts display the average temperature as bars, where the red bars represent temperatures higher than the average for the normal period 1961–1990, while the blue bars represent temperatures lower than that average. Additionally, the chart includes a black line showing a running mean calculated over approximately ten years.

4 Impact of Climate on Railway Infrastructure

Weather conditions such as temperature and precipitation can significantly impact railway infrastructure, causing severe damage. For example, high temperatures can cause the rails to expand or buckle, leading to structural damage. Similarly, heavy rainfall or snowfall can cause slope failures, track misalignment and bridge scour. Additionally, extreme weather conditions can damage the catenary line and signalling equipment, making it difficult for trains to operate smoothly [9–11]. All these factors highlight the importance of regular maintenance and monitoring of railway infrastructure to ensure safe and efficient transportation. Extreme weather conditions, such as heavy rainfall, snowfall, freezing temperatures and high winds, can lead to delays and even failures in railway infrastructure in northern Europe. Research has indicated that adverse climate conditions account for 5 to 10% of total failures and 60% of delays in the railway system in this region [12, 13].


Fig. 2 Average temperature for different seasons from 1860 to 2020


To address this issue, rail operators and infrastructure managers may need to invest in measures such as improved drainage, better insulation, more resilient track materials, and enhanced maintenance protocols. Additionally, contingency plans and procedures may need to be developed to respond effectively to weather-related disruptions. However, the specific impact of adverse climate conditions on rail infrastructure may vary depending on various factors, including geography, topography, and types of infrastructure and equipment used in different regions. Therefore, further research and analysis may be required to fully understand the relationship between climate conditions and rail performance.

5 Methodology

Given the potential impact of adverse climate conditions on railway assets, it is crucial to consider the diverse climatic zones in different regions and their effects on the railway infrastructure. In this context, this paper seeks to categorize the areas of Sweden into different sections, taking into account climatic parameters such as temperature and the type of railway assets, including stations. To achieve this goal, unsupervised machine learning techniques such as K-Means are utilized to cluster and group different areas based on their similar climatic conditions and railway assets. The use of machine learning techniques in this study can provide valuable insights into the relationships between climatic zones and railway infrastructure in Sweden. By analyzing the data on temperature and railway assets, the study can identify the areas that are most vulnerable to adverse weather conditions and require more attention and investment in terms of adaptation and resilience measures. Furthermore, the use of unsupervised machine learning techniques such as K-Means allows for the identification of patterns and trends that may not be easily identifiable through traditional statistical methods. This approach can also help to overcome the limitations of human judgment and bias, providing a more objective and data-driven perspective on the categorization of different climatic zones and railway assets. This study focuses on railway S&Cs, which are distributed over the network. In addition, based on knowledge created in our previous projects and discussions with experts, temperature has been selected as the main climatic factor with high impact on the railway infrastructure network. This study includes the following subsections:

Railway asset specifications: This section involves gathering information on the railway network map of Sweden and selecting over forty railway stations located in different parts of the country. The GPS specifications of these assets are then determined and recorded for further analysis.

Meteorological data gathering and pre-processing: This section outlines the process of selecting weather stations based on the identified railway stations. The desired meteorological data are then gathered from open sources, and temperature records from the past 20 years are collected and analyzed using data from VviS and SMHI.


Clustering algorithms: This section explains the use of the K-means clustering technique to analyze the gathered data. The clustering technique groups the data based on their similarities, allowing for easier interpretation and identification of patterns in the data.

Reliability analysis: This section involves performing reliability analysis on the clustered data. After grouping the data into different clusters, the reliability of each cluster is assessed.

The framework of this paper is depicted in Fig. 3. The diagram illustrates the different components and their relationships in the proposed framework.

Fig. 3 Framework of the study (Step 1: data gathering and pre-processing — meteorological data, asset selection, satellite data, and failure/maintenance history; Step 2: clustering according to climate zones — optimal number of clusters via the elbow approach and determination of cluster members; Step 3: trend analysis — extraction of the failure history for each cluster, with an HPP model (Weibull, normal, lognormal, etc.) if the failure data show no trend and an NHPP (power law process) otherwise; Step 4: reliability analysis — reliability function and expected number of failures)


5.1 Data Gathering and Pre-processing

The present study involves collecting and analyzing diverse data sets pertaining to the railway network in Sweden. Specifically, forty railway stations from various parts of the network were selected, and their locations were identified for further analysis. In addition, meteorological data from weather stations close to the selected railway stations have been collected. These data include temperature readings, with records gathered over a period of more than 20 years. Including these diverse data sets allows for a comprehensive analysis of the railway network and its associated environmental factors. By examining the temperature records over an extended period, the study can identify any trends or patterns that may be present in the data.

5.2 Clustering According to Climate Zones

K-Means is an unsupervised machine learning algorithm commonly used to cluster and group data points. The algorithm identifies the centroid of each cluster and then iteratively reassigns data points to the nearest centroid until the clusters stabilize. The "K" in K-Means refers to the number of clusters the algorithm aims to identify. The approach consists of the following steps:

– Select the number of clusters (K).
– Initialize the centroids of each cluster randomly.
– Assign each data point to the nearest centroid based on the Euclidean distance between the data point and the centroid.
– Recalculate the centroid of each cluster as the mean of all the data points assigned to that cluster.
– Repeat the assignment and update steps until the centroids no longer change or a maximum number of iterations is reached.
– The algorithm outputs the final clusters, each containing the data points assigned to the same centroid.

At the beginning of the K-Means method, the number of clusters needs to be determined. The Elbow method is commonly used for this purpose. As shown in Fig. 4, the optimal number of clusters for the given dataset was determined to be four. Using this parameter (K = 4), the K-Means technique was implemented using the Spyder software (a Python environment). In the next step of the analysis, the dataset was clustered based on several features. Specifically, time series of temperature for each railway station were considered, and various parameters were extracted from these time series: the mean temperature, standard deviation (std), skewness and kurtosis, as well as the geographic coordinates of the station (i.e., latitude and longitude) and its height above sea level.
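A minimal sketch of this clustering workflow is given below, assuming scikit-learn; the file name and per-station feature column names are hypothetical illustrations, as the paper's implementation details are not published.

```python
# Sketch of the clustering step: file and column names are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.read_csv("station_features.csv")  # one row per railway station
cols = ["mean_temp", "std_temp", "skewness", "kurtosis",
        "latitude", "longitude", "height_asl"]
X = StandardScaler().fit_transform(df[cols])  # put features on a common scale

# Elbow method: plot the within-cluster sum of squares (inertia) against K
# and look for the "bend"; here K = 4, as found in the paper (Fig. 4).
inertia = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
           for k in range(1, 11)]

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
df["cluster"] = kmeans.labels_  # cluster membership per station
```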


Fig. 4 Using the elbow method to determine the optimal number of clusters for the K-Means algorithm

This approach allowed for a more comprehensive analysis of the temperature data, taking into account not only the temporal patterns but also the spatial characteristics of the data. By considering a range of features, the resulting clusters were able to capture the underlying structure of the data and identify groups of similar observations. Furthermore, the inclusion of geographic coordinates and height above sea level in the clustering process may reveal patterns that are related to topography and microclimate, which are known to affect temperature variations. This information could be useful for understanding the spatial distribution of temperature patterns and potentially identifying areas that are more susceptible to extreme temperature events. Overall, by leveraging multiple features in the clustering process, a more nuanced understanding of the temperature data can be obtained, leading to insights that may not be apparent from a single-feature analysis. Figure 5 displays the outcome of the grouping process, which reveals the presence of four distinct clusters encompassing a total of 40 railway stations located throughout Sweden. The clusters shown in the figure are divided by green lines, and each cluster's members are indicated by four different colours (green, red, blue, and yellow).

5.3 Reliability Analysis and Discussion

Trend analysis: A statistical trend test is utilized to assess the presence of any patterns or trends in the cumulative failure times of a particular system over time. To evaluate trends in cumulative failure time, various statistical tests such as the Laplace trend test, the Military Handbook test, and the Anderson–Darling test can be employed.


Fig. 5 Selected railway stations and their cluster assignments across Sweden


These tests analyze the data for a monotonic trend, which refers to a consistent increase or decrease in the cumulative failure time over time. If the test indicates a significant increase in the cumulative failure time over time, this may suggest that the system is becoming less reliable and may require maintenance or redesign. Figure 6 depicts the trend in failure occurrence, which was analysed based on the clustering outcome given in Sect. 5.2. It can be observed that over time there is a gradual increase in the curve, representing a reduction in the reliability performance of the assets. In trend tests, the null hypothesis is a statement of trend-free behaviour, i.e., no significant pattern in the data being analyzed. The results of the statistical analysis confirm that the null hypothesis is rejected in all statistical tests for all cases (except the pooled Laplace test for cluster 3), and the nonhomogeneous Poisson process (NHPP) is therefore utilized for the reliability analysis in the next step [14].

Nonhomogeneous Poisson process (NHPP): The NHPP model is associated with an intensity function (Eq. 1) that signifies the rate of failures:

λ(t) = (β/θ)(t/θ)^(β−1)    (1)

The value of the shape parameter (β) determines whether the system is improving, deteriorating, or remaining stable over time; a value of β > 1 indicates an increasing failure rate, meaning the system is deteriorating. Under the NHPP, the expected number of failures can be estimated through the integration of λ(t), and the reliability function can be approximated as given in Eqs. 2 and 3 below.

Fig. 6 Graphical presentation of the trend test results


m(t) = ∫₀ᵗ λ(s) ds    (2)

R(t) = exp(−m(t))    (3)

The reliability function, denoted by R(t), and the intensity function at time s, denoted by λ(s), are related through the cumulative intensity function m(t) = ∫₀ᵗ λ(s) ds. Based on the preceding analysis, the estimated shape and scale parameters for the failure times are presented in Table 1. Using Eqs. 1–3 and the results presented in Table 1, the reliability curves of the four clusters are shown in Fig. 7. Notably, the data set under consideration comprises four distinct clusters, labelled Cluster 1 to Cluster 4, and encompasses 2,002 individual assets. These assets experienced a total of 24,738 failures over the course of the 18-year observation period. Table 2 reveals that Cluster 4, consisting of assets located in the southern region of Sweden, accounted for over 56% of all assets and nearly 60% of all failures. The data presented in Table 1 highlight the remarkable similarity between Cluster 4 and the integrated scenario including all assets, as evidenced by the fact that Cluster 4 contributed almost 60% of all observed failures. Furthermore, Table 2 (last column) shows that the number of failures per asset is approximately the same for all clusters and for the integrated scenario. In addition, the ratio of assets per cluster and the ratio of the number of failures per cluster are approximately the same (see columns three and five).

Table 1 Parameter estimates for the different clusters

Parameter   Cluster 1   Cluster 2   Cluster 3   Cluster 4   Whole assets
Shape       1.20        1.34        1.08        1.28        1.24
Scale       14,298      20,310      11,548      15,890      15,573
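For the power-law NHPP, the integral in Eq. 2 has the closed form m(t) = (t/θ)^β, so the quantities in Table 1 can be evaluated directly. The following sketch computes the reliability and expected number of failures per cluster (plotting omitted); at t = 40,000 h it reproduces the values discussed below, e.g., about 2.5 failures for Cluster 2 and 3.25 for the whole asset set.

```python
# Power-law NHPP: lambda(t) = (beta/theta) * (t/theta)**(beta - 1),
# m(t) = (t/theta)**beta, R(t) = exp(-m(t)).  Parameters from Table 1.
import numpy as np

params = {  # cluster: (shape beta, scale theta in hours)
    "Cluster 1": (1.20, 14_298),
    "Cluster 2": (1.34, 20_310),
    "Cluster 3": (1.08, 11_548),
    "Cluster 4": (1.28, 15_890),
    "Whole assets": (1.24, 15_573),
}

def expected_failures(t, beta, theta):
    """m(t): closed-form integral of the intensity from 0 to t."""
    return (t / theta) ** beta

def reliability(t, beta, theta):
    """R(t) = exp(-m(t))."""
    return np.exp(-expected_failures(t, beta, theta))

t = 40_000.0  # operational time in hours, as in Fig. 8
for name, (beta, theta) in params.items():
    print(f"{name}: m(t) = {expected_failures(t, beta, theta):.2f}, "
          f"R(t) = {reliability(t, beta, theta):.3g}")
```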

Fig. 7 Reliability curves for the selected assets over time (reliability R(t) versus operational time in hours, 0–50,000 h, for Cluster 1–Cluster 4 and the whole asset set)


Hence, a weighted combination of the clusters' shape parameters provides a reasonably accurate approximation of this parameter for the reliability analysis of the whole asset set. This will help the infrastructure manager with tactical and strategic decision-making utilizing the clusters' combined pattern, considering the ratio-weighted reliability, quality management, and maintenance planning. In Fig. 7, comparing the reliability behaviour of the whole asset set (black curve) with that of the individual clusters, it is evident that the clustering technique can lead to a reduction of uncertainty in modelling, as demonstrated by the varying reliability behaviours of the different clusters. This finding is of great significance in the field of machine learning for climate adaptation measures and risk assessment, where accurate and reliable models are crucial for effective decision-making. By clustering S&Cs based on their meteorological features, the analysis provides insight into the underlying patterns and behaviours of these assets, which can assist in the development of more robust and accurate models. Specifically, the reliability analysis conducted on the different clusters reveals that different groups of assets exhibit different reliability behaviours, with some being more reliable than others over time. Figure 8 illustrates the differences between the integrated scenario and the clustering approach that divides the assets into four groups based on their meteorological features. The results demonstrate that the expected number of failures at t = 40,000 h varies between the four clusters and the whole asset set, with cluster 2 having the lowest expected number of failures (2.50) and cluster 3 the highest (4.00). These differences highlight the importance of considering the specific characteristics and environmental conditions of assets when developing machine learning models for risk assessment and climate adaptation measures. Furthermore, the similarity between the expected number of failures for the whole asset set (3.25) and cluster 4 (3.28) suggests that this cluster is representative of the overall behaviour of the assets. This finding is particularly relevant for decision-makers in the transportation industry.

Table 2 Distribution of assets and failures in the different clusters

#Cluster       #Assets   % of assets   #Failures   % of failures   #Failures per #assets
1              124       6             1340        5               10.80
2              387       19            4113        17              10.63
3              373       19            4620        19              12.38
4              1118      56            14,665      59              13.12
Whole assets   2002      100           24,738      100             12.36

Fig. 8 Expected number of failures for the selected assets over time (expected number of failures versus operational time in hours, 0–50,000 h, for Cluster 1–Cluster 4 and the whole asset set)

6 Conclusion

Implementing machine learning in climate adaptation measures and risk assessment is a necessary step due to the vast number of assets located in different geographical areas, the diverse characteristics of these assets, and the varying meteorological conditions they are exposed to. This study focuses on 2,002 S&Cs installed at different railway stations in Sweden. Temperature, one of the essential meteorological measures, associated with the coordinates and altitude of the railway stations, was selected, and the stations were clustered into different climatic zones using the K-means technique, resulting in four clusters. Thereafter, reliability analysis was conducted for these four clusters and for the integrated scenario. The reliability analysis showed that the different clusters follow different behaviours and that assets located in cluster 2 have the highest reliability, while those in cluster 3 have the lowest reliability over time. Our analysis showed the similarity between the reliability parameters of cluster 4 and the integrated scenario. The clustering approach used in this study helps infrastructure managers to implement climate adaptation measures by integrating climate parameters with asset health performance (reliability analysis), which leads to a better understanding of the behaviour of assets under different meteorological conditions. In the future, we plan to extend the proposed methodology to consider other effective climatic parameters such as humidity, snow depth and wind speed, along with other operational features, e.g., asset age, track type, and speed.

Acknowledgements Authors gratefully acknowledge the funding provided by Sweden's innovation agency, Vinnova, to the projects "Adapting Urban Rail Infrastructure to Climate Change (AdaptUrbanRail, www.ltu.se/adapturbanrail)" (Grant no. 2021-02456) and "Robust infrastructure – Adapting railway maintenance to climate change (CliMaint, www.ltu.se/CliMaint)" (Grant no. 2019-03181), and the Kempe foundation (Grant no. JCK-2215). The authors gratefully acknowledge the in-kind support and collaboration of Trafikverket, SMHI, SWECO AB, WSP AB, InfraNord, BnearIT, and Luleå Railway Research Center (JVTC).


References

1. Peel MC, Finlayson BL, McMahon TA (2007) Updated world map of the Köppen-Geiger climate classification. Hydrol Earth Syst Sci 11(5):1633–1644
2. Thornthwaite CW (1948) An approach toward a rational classification of climate. Geogr Rev 38(1):55–94
3. Feddema JJ (2005) A revised Thornthwaite-type global climate classification. Phys Geogr 26(6):442–466
4. Trewartha GT (1968) An introduction to climate. McGraw-Hill
5. Chiu LS (2020) Climate: classification. In: Atmosphere and climate. CRC Press, pp 169–178
6. Cakmak S, Hebbern C, Vanos J, Crouse DL, Burnett R (2016) Ozone exposure and cardiovascular-related mortality in the Canadian Census Health and Environment Cohort (CANCHEC) by spatial synoptic classification zone. Environ Pollut 214:589–599
7. Holdridge LR (1947) Determination of world plant formations from simple climatic data. Science 105(2727):367–368
8. Kottek M, Grieser J, Beck C, Rudolf B, Rubel F (2006) World map of the Köppen-Geiger climate classification updated. Meteorol Z 15(3):259–263
9. Palin EJ, Stipanovic Oslakovic I, Gavin K, Quinn A (2021) Implications of climate change for railway infrastructure. Wiley Interdiscip Rev Clim Change 12(5):e728
10. Stenström C, Famurewa SM, Parida A, Galar D (2012) Impact of cold climate on failures in railway infrastructure. In: International conference on maintenance performance measurement & management, 12–13 Sept 2012
11. Garmabaki A et al (2022) Climate change impact assessment on railway maintenance. https://www.rpsonline.com.sg/proceedings/esrel2022/html/toc.html
12. Garmabaki A, Thaduri A, Famurewa S, Kumar U (2021) Adapting railway maintenance to climate change. Sustainability 13(24):13856
13. Thaduri A, Famurewa S, Garmabaki A, Kumar U (2021) Adapting railway maintenance to climate change. Sustainability
14. Garmabaki A, Ahmadi A, Block J, Pham H, Kumar U (2016) A reliability decision framework for multiple repairable units. Reliab Eng Syst Saf 150:78–88

Remaining Useful Life Estimation for Anti-friction Bearing Prognosis Based on Envelope Spectrum and Variational Autoencoder Haobin Wen, Long Zhang, Jyoti K. Sinha, and Khalid Almutairi

Abstract Anti-friction bearings (AFB) are crucial structural components conveying rotating motion in a variety of mechanical systems. To avoid unscheduled breakdowns and fatal failures, remaining useful life (RUL) prediction is of great practical significance for prognostics and health management in industrial practice, e.g., for optimizing maintenance plans for component replacements. Recently, advancements in artificial intelligence (AI) have provided effective data-driven models for bearing prognostics using machine learning. In this paper, using variational auto-encoder (VAE) networks as the regression backbone, the bearing RUL is estimated from envelope spectra of measured vibration data. First, the envelope spectra are utilized for bearing fault detection and as the network input features. After a fault is detected, the VAE is used to learn the probabilistic mapping from the spectral input to the estimated RUL value, given its good probabilistic and generative properties over the classical auto-encoder (AE) in content generation and variational inference. The application of the method to run-to-failure vibration measurements from an experimental rig available online has shown its efficacy in bearing RUL estimation. Keywords Bearing · Remaining useful life · Variational auto-encoder · Machine learning · Prognostics health management

H. Wen (B) · J. K. Sinha · K. Almutairi Dynamics Laboratory, The Department of Mechanical Aerospace and Civil Engineering, The University of Manchester, Manchester M13 9PL, UK e-mail: [email protected] J. K. Sinha e-mail: [email protected] K. Almutairi e-mail: [email protected] L. Zhang The Department of Electrical and Electronic Engineering, The University of Manchester, Manchester M13 9PL, UK e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_17


1 Introduction

In the era of artificial intelligence, the advancement of information technology has revolutionized almost every aspect of modern industry, from manufacturing to maintenance. As fundamental structural components of rotating machines, anti-friction bearings are among the susceptible parts that often work under harsh conditions. Bearing faults are one of the root causes of other structural defects that lead to systematic failures or breakdowns. Therefore, early prediction of the remaining useful life (RUL) is a significant part of bearing prognostics and health management (PHM), which helps schedule proactive maintenance and avoid significant financial losses. Current frontier research on RUL prediction is focused on developing robust data-driven models, including traditional machine learning models, such as support vector regression [1] and random forests [2], and deep learning models, such as the multi-layer perceptron (MLP) [3], convolutional neural networks (CNN) [4], recurrent neural networks (RNN) [5], etc. Deep autoencoders (AE), a type of encoder-decoder-based neural network architecture for unsupervised feature learning, have been shown to be effective in dimensionality reduction, synthetic data generation and anomaly detection [6]. Recently, the variational autoencoder (VAE) [7] has been proposed as an improvement over the classical AE that encodes continuous latent distributions as the latent space rather than a deterministic feature vector, which improves generalization capability for missing and noisy data. In [8], VAEs have been used for out-of-distribution detection in cyber-physical systems. Zhao et al. proposed a VAE for regression to predict age from brain image input alone [9]. In this work, a deep VAE network is utilized to learn the latent relationship between the envelope spectrum and the bearing remaining useful life. First, envelope analysis is carried out on the bearing vibration data throughout the whole bearing lifetime for fault detection. Once a bearing fault is detected, the envelope spectrum is input to the VAE networks for the RUL regression task. The deep learning-based RUL prediction approach is applied to the envelope spectra dataset constructed from the experimental bearing vibration signals available in [10]. The results of the case study show the effectiveness of the deep VAE networks in RUL prediction once the bearing fault is detected.

2 Experimental Data [10]

In this study, the measured vibration acceleration data available online, the XJTU-SY dataset [10], based on accelerated bearing run-to-failure experiments, are used. An overview of the bearing test bench is shown in Fig. 1 [10]. The test bearing is supported at the end of the rig, with horizontal hydraulic loading applied to the bearing housing.


Fig. 1 Bearing test bench [10]

The measured vibration data are available for three working conditions: (1) 35 Hz shaft speed with 12 kN loading, (2) 37.5 Hz shaft speed with 11 kN loading, and (3) 40 Hz shaft speed with 10 kN loading. Under each condition, five sets of vibration data are available for bearings running from the healthy state to failure. These experiments used 10 times the vibration amplitude in the normal state as the failure threshold. Each measurement consists of 1.28 s of data sampled at 25.6 kHz. The full description of the data can be found in [10].

3 Methodology

3.1 Envelope Analysis for Fault Detection

To perform meaningful RUL prediction, it is assumed that the bearing RUL remains constant in the healthy state, and bearing defects need to be detected before the implementation of RUL prediction. First, high-pass filtering and envelope analysis are performed on the measured vibration acceleration responses [10] on the bearing housing to detect faults during the full life cycle of a typical bearing. For example, in the case of 'Bearing 1_1' [10], the initial outer-race fault can be found at the 68th minute by observing the envelope spectrum of the filtered acceleration signals. The raw vibration acceleration signals are shown in Fig. 2, and the corresponding envelope spectrum is presented in Fig. 3, which contains clear fault frequency components of the outer-race fault.
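A typical implementation of this high-pass filtering and envelope spectrum step is sketched below, assuming SciPy; the cut-off frequency and the data file name are illustrative assumptions, not the paper's exact settings.

```python
# Envelope spectrum sketch; filter cut-off and file name are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 25_600  # sampling frequency of the XJTU-SY data, Hz
x = np.load("bearing_1_1_minute_068.npy")  # hypothetical file name

# High-pass filter to isolate the resonance band excited by fault impacts.
sos = butter(4, 2_000, btype="highpass", fs=fs, output="sos")  # 2 kHz assumed
x_hp = sosfiltfilt(sos, x)

# Envelope via the Hilbert transform, then its amplitude spectrum.
envelope = np.abs(hilbert(x_hp))
envelope -= envelope.mean()  # drop the DC component
spectrum = np.abs(np.fft.rfft(envelope)) / len(envelope)
freqs = np.fft.rfftfreq(len(envelope), d=1 / fs)

# Fault detection then amounts to looking for peaks at the bearing defect
# frequencies (e.g., the outer-race fault frequency and its harmonics).
```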


Fig. 2 The raw vibration signals along the machine operation time of 'Bearing 1_1'

Fig. 3 The envelope spectrum of the horizontal vibration at the 68th minute of the ‘Bearing 1_1’

For other bearing groups, the envelope analysis is also carried out to obtain the envelope spectra for fault detection. Once the early fault is detected, the deep learning model is introduced for RUL prediction based on the spectral input.

3.2 Spectral Variation During Degradation

To observe the changes in the envelope spectrum as the bearing degrades over the period of rig operation, the envelope spectrum of every measurement is computed and shown in a contour plot along the machine operation time. Figure 4 shows a typical contour plot of the envelope spectra along the machine operation time for 'Bearing 1_1', from the healthy condition (the 1st minute) to the end of life.


Fig. 4 The contour plot of the envelope spectra versus machine operation time for ‘Bearing 1_1’

After the 68th minute, the outer-race fault is detected, and the harmonic pattern of the bearing fault frequencies is clearly observed as the bearing condition degrades during the operation time.

3.3 Ground Truth RUL

For meaningful RUL prediction, it is assumed that the remaining life of the bearing is a constant value in the healthy state, and only the vibration data after faults occur are considered for generating the RUL labels. Based on inspection of the envelope analysis, a linear degradation model is used [5] to generate the ground truth RUL labels from the detection of bearing faults onwards. For example, the ground truth RUL labels for 'Bearing 1_1' after fault detection via envelope analysis are shown in Fig. 5.
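The linear labelling rule can be written down directly. The sketch below assumes one measurement per minute; the end-of-life minute used in the example is illustrative except where the paper states it.

```python
# Linear ground-truth RUL labels (times in minutes, one sample per minute).
import numpy as np

def linear_rul_labels(t_fault: int, t_end: int) -> np.ndarray:
    """RUL for measurements at minutes t_fault..t_end: decreases linearly
    from (t_end - t_fault) at fault detection to 0 at the end of life."""
    return np.arange(t_end - t_fault, -1, -1, dtype=float)

# 'Bearing 1_1': fault detected at the 68th minute; t_end is illustrative.
labels = linear_rul_labels(t_fault=68, t_end=123)
```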

3.4 VAE for RUL Prediction

Since the envelope spectrum is a tool for identifying different modes of bearing faults, it could also contain important information about the degradation severity and provide a hint of the remaining lifetime. In order to learn the latent mapping between the envelope spectrum and the RUL, the deep learning architecture of the variational autoencoder (VAE) is introduced for RUL prediction.


Fig. 5 The ground truth RUL for ‘Bearing 1_1’

3.4.1 Variational Autoencoder

The VAE is the combination of the classical autoencoder (AE) networks and variational inference, which is advantageous in unsupervised feature learning, synthetic data generation, and missing data imputation. Both AE and VAE are encoder-decoder-based neural networks for dimensionality reduction based on the minimization of the reconstruction loss. The improvement of VAE over AE mainly comes from the following design choices:

• Learning data distributions rather than a deterministic value as the latent space, which enables a generative process to create new synthetic data.
• Regularizing the latent distributions to be standard Gaussian to enforce meaningful and disentangled latent features.

A graphical illustration of the model differences is shown in Fig. 6.

3.4.2 VAE for RUL Prediction

Given an m-point envelope spectrum of the vibration signal collected at the kth minute, x ∈ ℝ^m, the VAE networks are expected to adaptively learn the latent mapping f(x): x ∈ ℝ^m → L ∈ ℝ for estimating the remaining life L at time step k. Here, the envelope spectrum is segmented from 0 to 1 kHz, consisting of 640 spectral lines, i.e., m = 640. Besides reconstructing the synthetic spectrum, the VAE is crafted for a regression task predicting the bearing RUL value from the spectral input.


Fig. 6 Difference between AE and VAE

Let X denote the envelope spectra dataset for training, X = {x^(1), x^(2), …, x^(n)}, and L the set of ground truth RUL labels for training, L = {L^(1), L^(2), …, L^(n)}, where n is the total number of training samples. The VAE model takes in both X and L for learning the generative process of spectral reconstruction and the inference process of RUL estimation. Assuming each envelope spectrum x is related to a latent distribution z and the latent distribution is dependent on L, the VAE model is established based on the minimization of a total loss comprising the following terms:

• The mean-squared error between the input spectrum x and the VAE reconstruction x′.
• The KL divergence between the probabilistic encoder network q(z|x) and the multivariate Gaussian [9].
• The KL divergence between the RUL regressor network q(L|x) and a univariate Gaussian, in which the ground truth RUL, L, is a priori knowledge for each envelope spectrum during training.

Table 1 summarizes the supervised VAE network model for bearing RUL prediction.


Table 1 Network architecture

Layer                 Output shape   Specifications
Encoder network:
  Encoder input       (640 × 1)      640-bin envelope spectrum, x
  Ground truth input  (1 × 1)        RUL label, L
  Dropout layer       (640 × 1)      25%
  Dense layer 1       (256 × 1)      256 units, ReLU activation
  Dense layer 2       (32 × 1)       32 units, ReLU activation
  Latent mean         (16 × 1)       Mean of latent distribution
  Latent variance     (16 × 1)       Variance of latent distribution
Regressor network:
  RUL mean            (1 × 1)        Mean of the estimated RUL, L̃
  RUL variance        (1 × 1)        Variance of the estimated RUL
Decoder network:
  Latent input layer  (16 × 1)       Latent distribution input
  Dense layer 3       (32 × 1)       32 units, ReLU activation
  Dense layer 4       (256 × 1)      256 units, ReLU activation
  Dense layer 5       (640 × 1)      640 units, ReLU activation; spectral reconstruction, x′

3.4.3 Dataset Construction

To construct the bearing dataset for training the RUL prediction networks, the envelope spectra of the vibration signals are first computed and then labelled with a ground truth RUL value. Only the signals after detection of the bearing faults are used to construct the dataset. Here, the bearing data under the 3rd condition [10] are used for training and testing, the details of which are listed in Table 2. Overall, a total of n = 826 samples of envelope spectra are used for training the latent spectrum-to-RUL mapping. The networks are implemented in Python with the Keras deep learning library. The adaptive moment (ADAM) optimizer is used, with 30% of the training data randomly selected for validation. The batch size is set at 128, and the model is fitted for 1000 epochs.

Table 2 Training and testing data

Training and validation                               Testing
Bearing 3_2, Bearing 3_3, Bearing 3_4, Bearing 3_5    Bearing 3_1
(826 × 640)                                           (152 × 640)

3.4.4 Evaluation Metrics

To quantify the performance of the RUL prediction model, the root-mean-squared error (RMSE) is used for measuring the deviation of the estimated RUL, L̃ᵢ, from the ground-truth RUL, Lᵢ. The RMSE is defined as follows:

RMSE = √( (1/N) Σᵢ₌₁ᴺ (L̃ᵢ − Lᵢ)² )    (1)

where N is the total number of test samples.
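In code, this metric reduces to a one-liner:

```python
import numpy as np

def rmse(rul_pred: np.ndarray, rul_true: np.ndarray) -> float:
    """Root-mean-squared error between predicted and ground-truth RUL."""
    return float(np.sqrt(np.mean((rul_pred - rul_true) ** 2)))
```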

4 Experimental Results

4.1 Network Training

The network performance can be evaluated by monitoring the total loss on the training set and the validation set. As shown in Fig. 7, the VAE model is well fitted, with both the training loss and the validation loss decreasing to stably low values. Due to the use of the KL divergence in the loss function, the overall loss can reach negative values.

Fig. 7 Total loss during training


Fig. 8 VAE model training for RUL prediction

4.2 Model Performance on the Training Set

To show the performance of the VAE-based RUL prediction model on the training set, Fig. 8 presents the RUL prediction results in chronological order. Considering only the signals after the detection of bearing faults via envelope analysis, the linear ground truth RUL label (marked in red) is generated for each bearing dataset in a similar fashion as described in Sect. 3.3. Based on the VAE prediction results on the training data, including 'Bearing 3_2', 'Bearing 3_3', 'Bearing 3_4', and 'Bearing 3_5', it can be seen that the estimated RUL (in black) is consistent with the ground truth RUL, and the VAE model is well fitted on the training data for remaining useful life estimation.

4.3 Model Performance on the Testing Set

For testing, the trained network is applied to the vibration data of 'Bearing 3_1'. As shown in Fig. 9, the raw vibration signals of the test bearing during the full life cycle are steadily low in amplitude for most of the machine operation time. After performing the envelope analysis, the outer-race fault is identified at the 2386th minute based on the observation of the fault characteristic frequencies in the envelope spectrum, as in Fig. 10. Then, the trained VAE model is applied to the remaining envelope spectra of 'Bearing 3_1' for RUL estimation. As shown in Fig. 11, the predicted RUL (in blue) has a decreasing trend fluctuating around the ground truth RUL (marked in red). The overall RMSE of the prediction against the ground truth is 32.8.


Fig. 9 The raw vibration signals over the full machine operation time of 'Bearing 3_1'

Fig. 10 The envelope spectrum of the vibration signal at the 2386th minute of the ‘Bearing 3_1’

Note that the initial estimation of the RUL is more accurate than that at the later stage, which could be caused by the variation of the spectral components in the vibration signals given the ongoing deterioration of the bearing defects. It is worth mentioning that early-stage RUL prediction is much more meaningful for practical applications, for planning timely maintenance and avoiding significant losses. Therefore, the above results show the effectiveness of the deep VAE networks in bearing RUL prediction and AI-assisted prognostics health management.


Fig. 11 RUL prediction for ‘Bearing 3_1’ by the VAE model

5 Conclusion

This paper presents an intelligent RUL prediction model for bearing prognostics based on envelope analysis and deep VAE networks. Based on the observation of the envelope spectra across the whole bearing lifetime, a latent nonlinear relation is assumed between the bearing envelope spectrum and its remaining useful life, which is uncovered by training the deep VAE network for regression. The deep learning approach for RUL prediction is tested on the vibration data of an experimental rig available online. The results show that the VAE regression model is effective in estimating the RUL at the early stage of a bearing defect, which reveals its potential for intelligent predictive maintenance in modern industrial practice.

Acknowledgements The first author would like to thank the China Scholarship Council and the University of Manchester for their joint support for the study.

References

1. Khelif R, Chebel-Morello B, Malinowski S, Laajili E, Fnaiech F, Zerhouni N (2017) Direct remaining useful life estimation based on support vector regression. IEEE Trans Ind Electron
2. Alfarizi MG, Tajiani B, Vatn J, Yin S (2022) Optimized random forest model for remaining useful life prediction of experimental bearings. IEEE Trans Ind Inform
3. Fink O, Wang Q, Svensén M, Dersin P, Lee WJ, Ducoffe M (2020) Potential, challenges and future directions for deep learning in prognostics and health management applications. Eng Appl Artif Intell 92:103678
4. Navathe SB, Wu W, Shekhar S, Du X, Sean Wang X, Xiong H (2016) Deep convolutional neural network based regression approach for estimation of remaining useful life. Lect Notes Comput Sci 9642:214–228


5. Heimes FO (2008) Recurrent neural networks for remaining useful life estimation. In: 2008 international conference on prognostics and health management
6. Sakurada M, Yairi T (2014) Anomaly detection using autoencoders with nonlinear dimensionality reduction. https://doi.org/10.1145/2689746.2689747
7. An J, Cho S (2015) Variational autoencoder based anomaly detection using reconstruction probability. Spec Lect IE
8. Cai F, Ozdagli AI, Koutsoukos X (2022) Variational autoencoder for classification and regression for out-of-distribution detection in learning-enabled cyber-physical systems. Appl Artif Intell 36(1)
9. Zhao Q, Adeli E, Honnorat N, Leng T, Pohl KM (2019) Variational autoencoder for regression: application to brain aging analysis. Lect Notes Comput Sci 11765:823–831
10. Wang B, Lei Y, Li N, Li N (2018) A hybrid prognostics approach for estimating remaining useful life of rolling element bearings. IEEE Trans Reliab 69(1):401–412

Artificial Intelligence in Predictive Maintenance: A Systematic Literature Review on Review Papers Md Rakibul Islam, Shahina Begum, and Mobyen Uddin Ahmed

Abstract The fourth industrial revolution, colloquially referred to as "Industry 4.0", has garnered substantial global attention in recent years. There, artificial intelligence (AI) driven industrial intelligence has been increasingly deployed in predictive maintenance (PdM), emerging as a vital enabler of smart manufacturing and Industry 4.0. Since the number of articles focusing on AI in PdM has been high in recent years, a review of the available literature reviews in this domain would be useful for future researchers who would like to advance research in this area, as well as for practitioners who would like to apply PdM in their application domains. Therefore, this study identifies the AI revolution in PdM and examines the next stages available in the literature reviews in this area through a quality assessment of secondary studies. A well-known structured review approach (systematic literature review, or SLR) was employed to perform this tertiary study. In addition, the Scale for the Assessment of Narrative Review Articles (SANRA) approach for evaluating the quality of review papers has been employed to support a few of the research questions. This tertiary study scrutinizes four crucial aspects of secondary articles: (1) their specific research domains, (2) the annual trends in quantity, variety and quality, (3) the footsteps of top researchers, and (4) the research constraints that review articles face, for the time frame 2015 to 2022. The results show that the majority of the application areas are in the manufacturing industry, and the study traces the revolution of AI in PdM. Our findings indicate that the review by Cheng et al. (2022) has emerged as the predominant source of information in this field; newcomers and industrial practitioners can benefit greatly from following its insights. A final outcome is that there is a lack of progress in SLR formulation and in adding explainable or interpretable AI methodologies to secondary studies. M. R. Islam (B) · S. Begum · M. U. Ahmed Mälardalen University, Västerås, Sweden e-mail: [email protected] S. Begum e-mail: [email protected] M. U. Ahmed e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_18


Keywords Predictive maintenance · Artificial Intelligence · Systematic literature review

1 Introduction

The utilization of machinery and other equipment in the industrial sector, such as in construction, transportation, and power generation, mandates routine maintenance and corrective measures. While such maintenance is necessary, it can be costly and may comprise a significant portion of total operating expenses, ranging from 15 to 60% depending on the industry under consideration. Elevated downtime associated with maintenance activities can have a detrimental effect on an industry's operational capacity and financial stability. This, in turn, has a far-reaching impact on the entire manufacturing pipeline and may impinge upon overall corporate performance. To mitigate the risk of unexpected downtime and uphold optimal production facility functionality, the development and implementation of a robust maintenance program are essential. The use of data-driven predictive maintenance (PdM) techniques is one of the most potent methods of ensuring sound maintenance scheduling practices. PdM can be defined as follows: "Predictive maintenance is a philosophy or attitude that, simply stated, uses the actual operating condition of plant equipment and systems to optimize total plant operation [13]". However, PdM has a far wider scope: it is the way through which manufacturing and production facilities may increase their overall effectiveness, as well as their productivity and product quality.

As a result of the fourth industrial revolution, nearly every production plant is being transformed into a smart manufacturing facility. The availability and use of data throughout the whole industrial system is what makes these intelligent production systems knowledgeable and efficient, and applying AI to PdM is one of the hallmarks of smart manufacturing. AI-driven PdM has emerged as a prominent research area, with a history and evolution that can be traced through secondary literature sources. A thorough understanding of the current state of the field is essential to anticipating the future of PdM, which promises to revolutionize production cost reduction. An effective means of achieving this understanding is to conduct a systematic literature review (SLR) to identify the most influential and informative studies in this field, thereby providing guidance for future research endeavors.

The rest of the paper is structured as follows: Sect. 2 discusses the method, Sect. 3 presents the quality assessment criteria for secondary studies, Sect. 4 shows the results, Sect. 5 includes a discussion, and Sect. 6 concludes.


2 Method

Systematic literature reviews, often known as SLRs, are the most organized and standardized approach to determining the current state of any field of study by reviewing previously conducted research in that topic [12]. An SLR is a secondary study, and justification is necessary right from its start. The aim of an SLR is to survey previous studies with a similar scope, review their methods critically, and, if possible, combine them into a statistical analysis, known as a meta-analysis [15]. Although it requires more work than a typical review, it is more favorable for scholars seeking a broad variety of evidence-based, evolutionary information. The procedures required to carry out an SLR are outlined in additional detail below for the purpose of our research in the field of AI-driven PdM.

2.1 Research Questions

The first step in both designing an SLR approach and putting it into action is to formulate research questions. The following research questions are addressed in this article:

• RQ1: What particular areas of PdM have been examined in reviewed articles since 2015? In accordance with RQ1, our search covers January 1, 2015 to September 30, 2022. Industry 4.0's optimal maintenance schedules are now being determined by data-driven industrial AI solutions, which are a top-tier research trend. Answering this RQ will help determine which industrial segments were or were not included in secondary studies of AI-driven and/or data-driven PdM.
• RQ2: What are the yearly tendencies in the quantity, variety, and quality of secondary research? This will provide a clear picture of the study paths pursued in the past, the present, and the future.
• RQ3: Who are the top researchers that conducted the most successful secondary research in PdM according to quality? This RQ leads us to analyze the relevant literature in order to follow in the footsteps of top researchers who have been successful in predictive maintenance research.
• RQ4: What are the existing contributions and industrial research constraints of secondary studies for AI-driven PdM? This is the most crucial and significant conclusion, which will direct the design of our future research route for applying AI-based PdM in industrial settings. Consequently, answering this RQ will have the most influence in this study.


2.2 Systematic Searches

Identifying search strategies is crucial to conducting an SLR. The systematic literature search starts with the selection of authoritative research databases (IEEE, Scopus, ScienceDirect). Due to their application sectors, several academic research article databases, such as PubMed, were excluded from this study. Keyword selection was essential for a sound SLR. The keywords were:

("Predictive Maintenance") AND ("Review" OR "Survey" OR "Systematic Literature Review") AND ("Artificial Intelligence" OR "Deep Learning" OR "Machine Learning")

The systematic search was conducted using the aforementioned keywords, namely in the titles and abstracts of research publications in several academic databases. After the search, we determined which publications would be included in our SLR and which would be excluded.
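For illustration only (this snippet is ours, not part of the original study), the boolean query above can be assembled from its three keyword groups as follows; the field restrictions (titles, abstracts) must be expressed in each database's own syntax.

```python
# Compose the boolean search string from the three keyword groups used in
# this SLR. Field restrictions (title/abstract) are database-specific and
# therefore omitted here.
groups = [
    ['"Predictive Maintenance"'],
    ['"Review"', '"Survey"', '"Systematic Literature Review"'],
    ['"Artificial Intelligence"', '"Deep Learning"', '"Machine Learning"'],
]
query = " AND ".join("(" + " OR ".join(g) + ")" for g in groups)
print(query)
# -> ("Predictive Maintenance") AND ("Review" OR "Survey" OR ...) AND (...)
```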

2.3 Criteria

Articles published in peer-reviewed journals and conferences between January 2015 and October 2022 that met the following criteria were included:
(1) Articles that serve as reviews in the areas of computer science and engineering.
(2) Both formally and informally published SLRs and international reviews.

Articles meeting the following criteria were excluded:
(1) Non-relevant review articles about PdM and Artificial Intelligence.
(2) Articles that are not in English.
(3) Review articles offering only shallow analysis.

3 Quality Assessment Criterion of Secondary Studies

SANRA [1] is a mechanism for evaluating the quality of secondary research. In accordance with the SANRA technique, a review's quality can be determined by answering the following questions:

• QA1: Was the importance of the article to the readers justified?
• QA2: Were the specific goals or questions put up in an appropriate way?
• QA3: Were the strategies for finding relevant literature provided effectively?
• QA4: Were the essential assertions backed by appropriate citations?
• QA5: Were valid arguments used to support the scientific evidence?
• QA6: Were the data presented in the optimal manner?


Table 1 Literature review studies

Authors | Review period | Databases | No. of reviewed articles | Application's area | Review type
Zhang et al. [21] | 2015–2019 | Not mentioned | Not mentioned | Manufacturing industry | NR
Carvalho et al. [2] | 2009–2018 | IEEEXplore, ScienceDirect | 18 | Manufacturing industry | SLR
Dalzochio et al. [5] | 2015–2020 | IEEEXplore, Springer, ACM Digital Library, ScienceDirect | 38 | Manufacturing industry | SLR
Zonta et al. [22] | 2008–2020 | ACM Digital Library, IEEEXplore, Scopus, Web of Science, ScienceDirect | 118 | Manufacturing industry | SLR
Xie et al. [20] | 1999–2019 | ScienceDirect, Scopus, IEEEXplore | 218 | Railway industry | SLR
Davari et al. [6] | 2010–2021 | Not mentioned | Not mentioned | Railway industry | NR
Lima et al. [10] | 2011–2020 | IEEEXplore, ScienceDirect, Springer, ACM Digital Library | 32 | Manufacturing industry | SLR
Mahmoud et al. [11] | 2012–2020 | Scopus, ScienceDirect, IEEEXplore, Web of Science | 65 | Power industry | SLR
Schwendemann et al. [17] | Not mentioned | Not mentioned | Not mentioned | Manufacturing industry | NR
Es-sakali et al. [8] | Not mentioned | Not mentioned | Not mentioned | Building engineering | NR
Drakaki et al. [7] | 2015–2021 | IEEEXplore, Elsevier, Wiley, Springer, Taylor & Francis | 109 | Manufacturing industry | NR
Jain et al. [9] | 2009–2022 | IEEEXplore, ScienceDirect | Not mentioned | Automobile industry | SLR
Cheng et al. [4] | Not mentioned | Scopus, Web of Science, IEEEXplore | 37 | Manufacturing industry | SLR
Nor et al. [14] | 2018–2022 | Scopus, IEEEXplore, Elsevier | 56 | Nuclear energy industry | NR
Toumi et al. [18] | 2016–2021 | IEEEXplore, Elsevier, Springer, ASME, Autres | 91 | Manufacturing industry | SLR


3.1 Documentation of Scoring Criteria

QA1: The importance of the secondary article must be justified for its readership. The score reflects how well the manuscript outlines the PdM problem for industry and highlights unanswered questions or gaps in the evidence—fully (2), partially (1), not at all (0).

QA2: A high-quality article will pose relevant questions about one or more specific goals or issues that need to be investigated. The grade reflects whether this was done entirely and clearly (2), partially or ambiguously (1), or not at all (0).

QA3: A competent narrative review will identify the sources of the information in the literature. The highest score (2) does not depend on the number of databases used for the literature search but rather on a detailed explanation of the search strategy employed throughout the whole investigation; selecting the search keywords and establishing the inclusion and exclusion criteria are the more crucial factors in attaining a high ranking. When the search is only described superficially, the article receives the second score (1), while failing to mention any of this results in a score of zero (0).

QA4: In any secondary study, all important claims must be backed by appropriate citations. If all important assertions in a study are well supported, the study earns a full grade (2). If only some of the important claims are adequately justified, the study obtains a score of one (1); otherwise, it receives a score of zero (0).

QA5: The scientific viewpoint has to be validated by sufficient evidence, and the evidence has to be of adequate strength to warrant it. Sufficient proof earns two points (2), superficial proof one point (1), and insufficient proof no points (0).

QA6: This question focuses on how the secondary article presents its data, i.e., whether the statistical presentation actually supports the argument. To count, the insight must be relevant and correctly directed. If the research results are presented in a logical and relevant way, the study gets the best possible grade (2); a partially systematic presentation scores one (1); and presentation of irrelevant data scores zero (0) (Tables 1 and 2).
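As a worked illustration (ours, not part of the SANRA instrument), the six item scores of a review can simply be summed to a total out of 12; the values below are those assigned to Cheng et al. [4] in Table 3.

```python
# Sum the six SANRA item scores (each 0-2) into a total out of 12.
SANRA_ITEMS = ("QA1", "QA2", "QA3", "QA4", "QA5", "QA6")

def sanra_total(scores: dict) -> int:
    assert all(scores[item] in (0, 1, 2) for item in SANRA_ITEMS)
    return sum(scores[item] for item in SANRA_ITEMS)

cheng_2022 = {"QA1": 2, "QA2": 2, "QA3": 2, "QA4": 1, "QA5": 1, "QA6": 2}
print(sanra_total(cheng_2022))  # -> 10, the highest total in Table 3
```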

4 Results

This section provides a concise summary of the findings of the whole study.

4.1 Search Results

The search was conducted over the period January 2015 to October 2022. According to our search criteria, we found 42 secondary articles; 18 of them focused on the application of AI in the manufacturing industry's PdM, while the rest concerned the railway, power, automobile, building, and nuclear energy industries.

Table 2 Contributions of secondary studies

Contributions | Authors
ML and DL approaches along with potential future research directions | Davari et al. [6], Drakaki et al. [7], Lima et al. [10], Mahmoud et al. [11], Xie et al. [20], Zhang et al. [21], Zonta et al. [22]
ML and DL approaches that lack performance analysis | Carvalho et al. [3], Lima et al. [10], Schwendemann et al. [16]
Only ML approaches | Carvalho et al. [2], Dalzochio et al. [5], Jain et al. [9]
ML, DL and Explainable AI (XAI) approaches | Cheng et al. [4]
ML, Augmented Reality (AR) approaches | Nor et al. [14]
ML, DL approaches including knowledge-based, model-based approaches | Es-sakali et al. [8]
PdM related equipment, data set, data description | Davari et al. [6], Nor et al. [14]

Table 3 Quality assessment

Authors | QA1 | QA2 | QA3 | QA4 | QA5 | QA6
Zhang et al. [21] | 2 | 1 | 0 | 1 | 2 | 2
Carvalho et al. [2] | 2 | 1 | 2 | 1 | 1 | 1
Dalzochio et al. [5] | 2 | 1 | 2 | 1 | 1 | 1
Zonta et al. [22] | 2 | 2 | 1 | 1 | 1 | 2
Xie et al. [20] | 2 | 1 | 1 | 1 | 2 | 1
Davari et al. [6] | 2 | 1 | 1 | 1 | 1 | 0
Lima et al. [10] | 2 | 2 | 1 | 2 | 1 | 1
Mahmoud et al. [11] | 1 | 0 | 1 | 2 | 1 | 1
Schwendemann et al. [17] | 2 | 0 | 0 | 2 | 2 | 2
Es-sakali et al. [8] | 2 | 0 | 0 | 1 | 1 | 1
Drakaki et al. [7] | 2 | 0 | 1 | 2 | 2 | 1
Jain et al. [9] | 2 | 2 | 1 | 1 | 1 | 1
Cheng et al. [4] | 2 | 2 | 2 | 1 | 1 | 2
Nor et al. [14] | 2 | 0 | 0 | 1 | 2 | 1
Toumi et al. [19] | 2 | 0 | 1 | 1 | 1 | 1

Of these 42 articles, 67% were published in peer-reviewed journals and the rest in conference proceedings. According to the inclusion and exclusion criteria identified earlier, only 15 of the 42 articles discovered in the search fulfilled the standards for inclusion in this SLR. Among them, 9 articles were SLRs and 6 were narrative reviews (NR).

Table 4 Limitations of secondary studies

Limitations | Authors
The standard SLR methodology is not included | Es-sakali et al. [8], Jain et al. [9], Mahmoud et al. [11], Xie et al. [20]
Systematic literature searches did not meet aims properly | Drakaki et al. [7], Es-sakali et al. [8], Mahmoud et al. [11], Nor et al. [14]
No comparison of the validation or performance measures of ML or DL approaches | Carvalho et al. [2], Davari et al. [6], Jain et al. [9], Lima et al. [10], Schwendemann et al. [17], Zonta et al. [22]
Restricted exclusively to PdM of cyber-physical systems | Dalzochio et al. [5]
XAI, digital twin in terms of PdM are not included | All except Cheng et al. [4]

4.2 Quality Evaluation of Reviews

It is essential to conduct a quality assessment of the selected review articles using SANRA, as outlined in Sect. 3. The score for each article is presented in Table 3 in order to determine which article provides the most insightful and best-structured viewpoint on predictive maintenance in the field of applied AI. This was done in order to find the best article and to answer the corresponding research question formulated in our SLR study.

5 Discussions

In this section we discuss the findings that pertain to the questions posed in our SLR study.

RQ1: What particular application areas of predictive maintenance have been examined in reviewed articles since 2015? The application areas of PdM addressed by secondary studies since 2015 are: the manufacturing industry, the automotive/automobile industry, power engineering, building engineering, the healthcare industry, the nuclear energy industry, the telecommunication industry, and the railway industry. The manufacturing industry showed the highest count, with 18 articles, and the automobile industry ranked second with 6 articles. Further trends were observed in power engineering, building engineering, healthcare, nuclear energy, telecommunications, and railway engineering, in that order.

RQ2: What are the yearly tendencies in the quantity of secondary research? We came across a total of 42 secondary studies related to our investigation. Up to the year 2022, 14 secondary papers had been published, indicating that this is a relatively young field of research.


Prior to 2019, the number of publications was almost zero, even among primary studies. The vast majority of secondary research has been published in industry-specific journals that are also concerned with computer-human interactions; the journal Computers in Industry is one example, while ACM Computing Surveys is the leading computer science publication for review papers. Notably, we did not discover any secondary papers on data-driven PdM of the comprehensive kind we aim for, which motivates us to broaden our research towards our future goal of conducting a comprehensive systematic literature review; until then, this study provides a glimpse of that objective.

RQ3: Who conducted the most successful secondary research in predictive maintenance according to quality? Applying the SANRA technique for the quality assessment of review articles, we found that the SLR of the team of Cheng et al. [4] scored the highest value (10 out of 12). They followed the better SLR methodology in their study: the research questions were well formulated and accurately justified, and they used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) together with a well-structured meta-analysis of research articles.

RQ4: What are the existing contributions and industrial research constraints of secondary studies for AI-driven PdM? SLR is a rigorous technique for secondary articles, governed by clear and precise rules; certain guidelines must be followed in order to assess the state of the art in a given field of study. As a result of the quality evaluation, we discovered shortcomings, which are listed in Table 4, while Tables 1 and 2 summarise the overall contributions and constraints for determining the state of the art in PdM secondary research employing AI. Finally, we find that standard SLR methodology and search tactics, as well as the XAI and digital twin concepts, are significantly underutilised.

6 Conclusions

This tertiary study had two main objectives: the first was to classify secondary research in the field of PdM using AI methodologies according to quality and application area; the second was to identify the research constraints that determine the future scope of survey work for a new Ph.D. student or industrial practitioner in this area. Both objectives were accomplished by answering all the research questions. Following the search, inclusion, and exclusion criteria, 15 articles were selected for assessment, and the findings are as follows. The majority of application areas concern the manufacturing industry, which validates the reasons presented in support of them. In the year 2022, the Scopus database and journal publications established themselves as the leaders in terms of the number of papers published. In addition, the study traces the AI revolution in PdM. The team led by X. Cheng at Universiti Kebangsaan Malaysia (UKM) is at the forefront of this discipline, making it wise for newcomers and industry professionals to emulate their success.


At the same time, our assessment indicates that not a great deal of progress has been made in the construction of SLRs and in the inclusion of explainable or interpretable AI approaches in secondary studies. Therefore, the following terms will serve as our chosen keywords for the next secondary research: "Predictive Maintenance", "Condition Based Maintenance", "Explainable Predictive Maintenance", "Prognostic Health Management", "Prognostic Maintenance", "Explainable Artificial Intelligence", "Explainable AI", "Interpretable Artificial Intelligence". Following the completion of this tertiary study, our next research will focus on developing a standard SLR equipped with Explainable AI in PdM.

References

1. Baethge C, Goldbeck-Wood S, Mertens S (2019) SANRA—a scale for the quality assessment of narrative review articles. Res Integrity Peer Rev 4(1):5
2. Carvalho TP, Soares FA, Vita R, Francisco RDP, Basto JP, Alcalá SG (2019a) A systematic literature review of machine learning methods applied to predictive maintenance. Comput Ind Eng 137:106024
3. Carvalho TP, Soares FA, Vita R, Francisco RDP, Basto JP, Alcalá SG (2019b) A systematic literature review of machine learning methods applied to predictive maintenance. Comput Ind Eng 137:106024
4. Cheng X, Chaw J, Goh K, Ting T, Sahrani S, Ahmad M, Abdul Kadir R, Ang M (2022) Systematic literature review on visual analytics of predictive maintenance in the manufacturing industry. Sensors 22(17)
5. Dalzochio J, Kunst R, Pignaton E, Binotto A, Sanyal S, Favilla J, Barbosa J (2020) Machine learning and reasoning for predictive maintenance in Industry 4.0: current status and challenges. Comput Ind 123:103298
6. Davari N, Veloso B, Costa G, Pereira P, Ribeiro R, Gama J (2021) A survey on data-driven predictive maintenance for the railway industry. Sensors 21(17)
7. Drakaki M, Karnavas YL, Tziafettas IA, Linardos V, Tzionas P (2022) Machine learning and deep learning based methods toward industry 4.0 predictive maintenance in induction motors: state of the art survey. J Ind Eng Manage 15(1):31–57
8. Es-sakali N, Cherkaoui M, Mghazli M, Naimi Z (2022) Review of predictive maintenance algorithms applied to HVAC systems. Energy Rep 8:1003–1012
9. Jain M, Vasdev D, Pal K, Sharma V (2022) Systematic literature review on predictive maintenance of vehicles and diagnosis of vehicle's health using machine learning techniques. Comput Intell 38(6):1990–2008
10. Lima A, Aranha V, Carvalho C, Nascimento E (2021) Smart predictive maintenance for high-performance computing systems: a literature review. J Supercomput 77(11):13494–13513
11. Mahmoud M, Md Nasir N, Gurunathan M, Raj P, Mostafa S (2021) The current state of the art in research on predictive maintenance in smart grid distribution network: fault's types, causes, and prediction methods—a systematic review. Energies 14(16)
12. Malek J, Desai TN (2020) A systematic literature review to map literature focus of sustainable manufacturing. J Clean Prod 256:120345
13. Mobley RK (2002) An introduction to predictive maintenance. Elsevier
14. Nor A, Kassim M, Minhat M, Ya'acob N (2022) A review on predictive maintenance technique for nuclear reactor cooling system using machine learning and augmented reality. Int J Electr Comput Eng 12(6):6602–6613
15. Pati D, Lorusso LN (2018) How to write a systematic review of the literature. HERD Health Environ Res Des J 11(1):15–30


16. Schwendemann S, Amjad Z, Sikora A (2021) Bearing fault diagnosis with intermediate domain based layered maximum mean discrepancy: a new transfer learning approach. Eng Appl Artif Intell 105:104415
17. Schwendemann S, Amjad Z, Sikora A (2021) A survey of machine-learning techniques for condition monitoring and predictive maintenance of bearings in grinding machines. Comput Ind 125:103380
18. Toumi H, Meddaoui A, Hain M (2022a) The influence of predictive maintenance in industry 4.0: a systematic literature review
19. Toumi H, Meddaoui A, Hain M (2022b) The influence of predictive maintenance in industry 4.0: a systematic literature review. In: 2022 2nd international conference on innovative research in applied science, engineering and technology (IRASET), pp 1–13
20. Xie J, Huang J, Zeng C, Jiang S-H, Podlich N (2020) Systematic literature review on data-driven models for predictive maintenance of railway track: implications in geotechnical engineering. Geosciences (Switzerland) 10(11):1–24
21. Zhang W, Yang D, Wang H (2019) Data-driven methods for predictive maintenance of industrial equipment: a survey. IEEE Syst J 13(3):2213–2227
22. Zonta T, da Costa C, da Rosa Righi R, de Lima M, da Trindade E, Li G (2020) Predictive maintenance in the Industry 4.0: a systematic literature review. Comput Ind Eng 150

Using a Drone Swarm/Team for Safety, Security and Protection Against Unauthorized Drones Ella Olsson, Peter Funk, and Rickard Sohlberg

Abstract There is an increased need for protection against unauthorized entry of drones, as there has been a growing number of reports of UAVs entering restricted areas. In this paper we explore an approach of using a swarm/team of drones that are able to cooperate, to autonomously engage and disable one or more unauthorized drones entering a restricted area. In our approach, we have investigated technologies for distributed decision-making and task allocation in real-time, in a dynamic simulated environment, and developed descriptive models for how such technologies may be exploited in a mission designed for a drone swarm. This includes the definition of discrete tasks, how they interact, and how they are composed to form such a mission, as well as the realization and execution of these tasks using machine learning models combined with behaviour trees. To evaluate our approach, we use a simulated environment for mission execution where relevant KPIs related to the design of the mission have been used to measure how efficient our approach is in deterring or incapacitating unauthorized drones. The evaluation has been performed using Monte-Carlo simulations on a batch of randomized scenarios; measures of effectiveness have been computed for each scenario instance and later compiled into a final assessment for the main scenario as well as each ingoing task. The results show a mission success in 93% of the simulated scenarios. Of these, 58% of the scenarios resulted in the threat being neutralized and in 35% of the scenarios the threat was driven away from the critical area. We believe that the application of such measurements helps validate the applicability of this capability in a real-world scenario, and in order to assess the relevance of these parameters, future validations in real-world operational scenarios are warranted.

E. Olsson (B) Saab AB, Nettovägen 6, 175 41 Järfälla, Sweden e-mail: [email protected]

P. Funk · R. Sohlberg Mälardalen University, Universitetsplan 1, 721 23 Västerås, Sweden e-mail: [email protected] R. Sohlberg e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_19


Keywords Drones · Drone swarm · Protection · Safety · Security

1 Introduction

There is an increased need for protection against unauthorized entry of drones, with growing numbers of reports of UAVs entering restricted areas, e.g., airports, industrial building sites, mining areas, nuclear power plants and military restricted areas (e.g. [1–3]). There are also plenty of safety and security scenarios where unauthorized entry has led to large disturbances (e.g. [4, 5]). The rapid development of artificial intelligence puts us in the middle of a paradigm shift in modern defence solutions. Technologies like drones and drone swarms already reduce, and may in the future eliminate, the need for humans on the battlefield, and they enable powerful surveillance solutions. In this paper we investigate the application of drones and drone swarms and their capabilities in defence and safety/security solutions. We believe this to be a game changer for industry, organisations, and society. Development is rapid and there are large investments in the field; nothing indicates a slowdown, and the potential is large in all fields and aspects of safety/security and defence. DARPA, for example, is making an impressive investment in the field [6].

This paper describes an approach of using a swarm/team of drones for protection against unauthorized entry of drones, to provide safety/security around a critical infrastructure object and its surrounding environment. The approach uses a top-down perspective to define a generic safety/security scenario, including its constituent entities, to model a high-level operational concept of the scenario of interest, capturing basic scenario entities and their relations. The scenario also describes the cooperative operations conducted by the constituent entities in the course of achieving the ingoing tasks of this scenario. The ingoing tasks are further developed using machine learning techniques, where two reinforcement learning algorithms have been used to develop models/policies that control the drones during task execution. These models enable the drones to act, cooperate, and interact in a swarm/team to deal with an emerging threat, and they have been integrated in a behaviour tree that functions as a mission commander, exercising command and control of the drone swarm/team in response to the upcoming threat. Finally, a set of key performance indicators (KPIs) has been selected, based on the NATO Air Force Task List [7], in order to obtain a numerical assessment of the proposed methods.


2 Background

Reinforcement learning (RL) is a subfield of machine learning. In recent years, reinforcement learning algorithms have proven effective for solving problems at super-human levels. One example is DeepMind's AlphaStar, which uses a combination of reinforcement learning and league training to develop a StarCraft II agent able to defeat 99.8% of its human opponents [8]. Autonomous drones have the potential to revolutionize several industries, including logistics, agriculture, public safety, surveillance, and security applications. Drones may be used for protection purposes in both civil and military applications, such as scaring away animals from a specific area (e.g., birds around airports), and may be equipped with specific tools for security or defence situations, e.g., nets (used today to capture criminals on the run). RL-based approaches for drones and drone swarms have been used in security applications such as surveillance, target tracking, and swarm formation control. One example is an RL-based drone intrusion detection system able to detect and track unauthorized access attempts [9]. In the area of surveillance and target tracking, drones have been used to monitor and track targets in real-time, providing situational awareness [10–12], where drone-based systems have been trained to locate, identify, and track a moving target in real-time, resulting in improved tracking capabilities.

3 Unmanned, Autonomous Drones in a Security Context

3.1 Scenario

The safety/security scenario is modelled around a critical infrastructure object and its surrounding perimeter. In this instance, the critical infrastructure object is set to be an aircraft hangar, as part of a forward operating airbase (FOB). The scenario is set around the following main entities:

1. A security drone swarm/team—this entity consists of three similar drones with the ability to autonomously act, cooperate, and interact in a swarm/team configuration in order to deal with an emerging threat.
2. A patrol drone—a single drone equipped with high-performance camera and processing equipment. This drone continuously patrols the fenced-off area in search of trespassing objects.
3. An intruder—represented by a single drone whose task is to trespass into the fenced-off area from above.

Figure 1 depicts the scenario, including the critical infrastructure, the scenario entities, and the interactions between the entities, where the dashed-line arrows represent communication between entities and the full-line arrows represent interactions.


Fig. 1 The safety/security scenario, including the critical infrastructure and the scenario entities

4 Distributed Decision Making and Task Allocation

4.1 Task Allocation and Execution

The safety/security scenario activities are controlled by a high-level mission which encompasses a set of tasks that are allocated to, and executed by, the mission entities. These tasks are:

1. A threat engagement task, allocated to one threat engagement drone in the drone swarm/team comprising three threat engagement drones.
2. A group/regroup task, allocated to and cooperatively executed by the drones in the drone swarm/team.
3. An area protection task, allocated to and cooperatively executed by the drones in the swarm/team.
4. A patrol task, allocated to a patrol drone.

We use a behaviour tree [13] to control execution of the allocated tasks. The behaviour tree controls the execution of all tasks within the mission. Execution of the behaviour tree starts at the root node and propagates to a parallel execution of the two child sequence nodes, where the left one controls the execution of the patrol task, further detailed in [10, 11], whereas the right sequence node controls the execution of the tasks allocated to the security drone swarm/team, which are further detailed in this and the following sections.


Fig. 2 The behaviour tree controls the execution of all tasks within the scenario

As depicted in Fig. 2, the right branch of the behaviour tree controls the task execution of the drones. A group/regroup task is executed until an intruder is detected (combined with the parallel execution of the patrol task as described in [10, 11]). If an intruder is detected, the tree ticks the area defence task, and if this task fails to deter the intruder from the area, the threat engagement task is executed.
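A minimal sketch (ours, not the authors' implementation) of this control structure encodes the right branch of Fig. 2 as a fallback over the three swarm tasks; the node names, blackboard keys, and tick protocol are illustrative.

```python
SUCCESS, FAILURE, RUNNING = range(3)

class Fallback:
    """Behaviour tree node: ticks children in order until one does not fail."""
    def __init__(self, *children):
        self.children = children
    def tick(self, bb):
        for child in self.children:
            status = child.tick(bb)
            if status != FAILURE:
                return status
        return FAILURE

class Task:
    def __init__(self, fn):
        self.fn = fn
    def tick(self, bb):
        return self.fn(bb)

def group_regroup(bb):   # runs until an intruder is detected
    return FAILURE if bb["intruder_detected"] else RUNNING

def area_defence(bb):    # fails once the intruder breaches the perimeter
    return FAILURE if bb["perimeter_breached"] else RUNNING

def threat_engagement(bb):
    return RUNNING       # engage until the episode ends

swarm_branch = Fallback(Task(group_regroup), Task(area_defence), Task(threat_engagement))
# Example tick: intruder detected, perimeter still intact -> area defence runs.
status = swarm_branch.tick({"intruder_detected": True, "perimeter_breached": False})
```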

4.2 Distributed Decision Making

The group/regroup task and the area protection task involve distributed decision making and cooperation, where the three drones in the drone swarm/team act in concert with each other in a swarm/team setting in order to perform their tasks. Further, these tasks, and the threat engagement task, are all realized using a set of neural networks, where each network is trained on one specific task using a reinforcement learning algorithm. The applied algorithms are further detailed in Sect. 5. The distributed decision making is based on the MADDPG algorithm [14], which allows agents to learn from their own actions as well as the actions of other agents in the environment. Each agent is treated as an "actor" which gets advice from a "critic" that helps the actor decide which actions to reinforce during training. Agents do not need to access the central critic at test/execution time; instead, they act based on their observations in combination with their predictions of other agents' behaviours.


Fig. 3 The cooperative behaviour is based on distributed decision making where drone agents act based on their observations in combination with their predictions of other agents’ behaviours

The centralized critic is learned independently for each agent. This approach can be used to customize the reward structure for specific agent(s). It also enables scenarios with adversarial agents with opposing rewards. Figure 3 depicts the concept of autonomy and distributed decision making in the drone swarm/team.
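A compact sketch (ours, for illustration) of the two interfaces this implies: each actor maps only its own observation to an action, while the per-agent centralized critic scores the joint observations and actions during training. Layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized policy: own observation -> own action."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized value: joint observations and actions -> Q-value.
    Used during training only; one critic is learned per agent."""
    def __init__(self, n_agents, obs_dim, act_dim, hidden=64):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))
    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))
```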

5 Implementation

5.1 Training Framework for Cooperative Tasks

The Multi-Agent Particle Environment (MPE) [15], combined with a PyTorch implementation of OpenAI Baselines [16] called Stable Baselines3 [17], has been used for training and evaluation of the MARL algorithm MADDPG [14]. The MPE contains a set of simple OpenAI Gym environments using only a simple kinetic agent model. The selected environments are adapted from the simple spread and simple tag scenarios [14]. These environment adaptations were used to train high-level multi-agent control networks for the multi-agent task models. Figure 4 depicts the relation between the RL algorithm, the environment, and the agent policies. The policies emit high-level steering commands to the agents in the representative scenarios; in turn, observations and rewards are returned to the training algorithm, which uses this information to calculate a reward for the critic and infer new steering commands from the policy.


Fig. 4 The training framework for training of the high-level multi-agent control networks

The reward is recalculated at each time step during training and simulation. During training, each agent learns to output high-level control actions that minimize the negative reward based on the reward function detailed in the sections below. The cooperative scenarios comprise three good agents, whereas the competitive scenario also includes one adversarial agent. Both scenarios contain three randomized landmarks, which act as go-to positions in the first scenario and obstacles in the second. When training starts, each scenario is instantiated with three good agents and one adversarial agent (in the cooperative/competitive scenario), along with a set of obstacles/go-to positions, all at randomized positions. An episode is defined as running the simulated training environment for a certain number of time steps (usually 25–50), with the reward function estimating rewards at each time step.

5.1.1 Group/Regroup Cooperative Task

In the group/regroup task, the agent takes tactical information about the locations and velocities of the other agents and the points of destination as input and predicts a high-level control action, in concert with the other agents in the swarm/team. For each action an agent outputs, it receives a reward from the environment based on a reward function that evaluates the combined distance between each agent and its nearest point of destination. Further, the reward function calculates the combined distance between each point of destination and its nearest agent. The sum of these calculations is then used as the reward provided to each agent's critic. The steps below show the reward calculation, and a minimal sketch of it follows the list.

1. Agent spread: the sum of the least distance to a destination point from any agent.
2. Destination point spread: the sum of the least distance to an agent from any destination point.
3. The combined reward is defined as the sum of (1) and (2).
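The sketch below (ours) computes these two distance sums with NumPy; the sign convention (negating the sum so that smaller distances mean higher reward) is an assumption consistent with the negative reward minimized during training.

```python
import numpy as np

def group_regroup_reward(agent_pos, dest_pos):
    """agent_pos: (n_agents, 2); dest_pos: (n_dests, 2) arrays of positions."""
    # Pairwise Euclidean distances between every agent and every destination.
    d = np.linalg.norm(agent_pos[:, None, :] - dest_pos[None, :, :], axis=-1)
    agent_spread = d.min(axis=1).sum()   # (1) nearest destination per agent
    dest_spread = d.min(axis=0).sum()    # (2) nearest agent per destination
    return -(agent_spread + dest_spread) # smaller spreads -> higher reward
```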

5.1.2 Area Defence Cooperative/Competitive Task

The training algorithm uses the MADDPG algorithm to train the RL agents. This task uses a simple tag scenario [14]. In the area defence task, good agents take tactical information about the locations and velocities of the other agents as well as the threat (adversarial agent) as input and predict a high-level control action, in concert with the other agents. For each action a good agent outputs, it receives a reward from the environment based on a reward function that evaluates the combined distance between each good agent and the adversarial agent. Conversely, the adversarial agent receives a penalty based on its distance to any of the good agents, and a further penalty if it leaves the area, which is set to a pre-defined, normalized boundary. This results in two sets of behaviours and objectives: (1) the good agents will cooperatively chase, try to surround, and catch the adversary; (2) the adversary will try to stay in the area while avoiding being caught by the good agents. In addition, the scenario includes a set of obstacles that agents are penalized for flying into, and all agents are likewise penalized if they collide with other agents.
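A minimal sketch (ours) of these opposing shaping terms follows; the penalty magnitudes, the boundary value, and the collision radius are illustrative assumptions, as the paper specifies only the qualitative form of the reward.

```python
import numpy as np

BOUND = 1.0      # normalized area boundary (assumed)
COLLIDE_R = 0.1  # illustrative collision radius

def area_defence_rewards(good_pos, adv_pos, obstacles):
    """good_pos: (n_good, 2); adv_pos: (2,); obstacles: (n_obs, 2)."""
    d = np.linalg.norm(good_pos - adv_pos, axis=-1)
    good_rewards = -d            # chasers: closer to the adversary is better
    adv_reward = float(d.min())  # evader: distance to the nearest chaser
    if np.any(np.abs(adv_pos) > BOUND):
        adv_reward -= 10.0       # penalty for leaving the area
    # Obstacle-collision penalty applied to every agent.
    all_pos = np.vstack([good_pos, adv_pos[None, :]])
    d_obs = np.linalg.norm(all_pos[:, None, :] - obstacles[None, :, :], axis=-1)
    collisions = (d_obs < COLLIDE_R).any(axis=1)
    good_rewards = good_rewards - 5.0 * collisions[:-1]
    adv_reward -= 5.0 * collisions[-1]
    return good_rewards, adv_reward
```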

5.2 Training Framework for Competitive Task

5.2.1 Threat Engagement Competitive Task

Proximal Policy Optimization (PPO) is the algorithm selected to train the threat engagement task. PPO is a policy gradient [18] method for reinforcement learning that alternates between sampling data from the policy and performing several epochs of optimization on the sampled data. The algorithm is suited for continuous control tasks and has outperformed other online policy gradient methods on tasks such as simulated robotic locomotion and Atari games, while being much simpler to implement, more general, and having better sample complexity [18]. The threat engagement task is trained using a scenario where two agents, one "good" and one adversary, are pitted against each other. The scenario uses OpenAI's multi-agent particle environment, a simple multi-agent world with a continuous observation and action space along with some basic simulated physics [15]. It is a specialized OpenAI Gym scenario with a dedicated threat engagement reward function and entity definitions. For each scenario episode, each competitive agent starts at a random position. The ambition is to train the agents to engage each other by means of reinforcement learning, i.e., each agent must score as high a reward as possible when pitted against the other agent. The threat engagement scenario is defined as a game where the agents fight each other until one of the conditions below is true:


• An agent is defeated (after 5 successful aims).
• The scenario times out (3000 steps).
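A sketch (ours) of the resulting game loop; the environment's step/info interface is an assumption modelled on the Gym convention, not the authors' code.

```python
MAX_STEPS = 3000     # scenario timeout (from the paper)
AIMS_TO_DEFEAT = 5   # successful aims needed to defeat an opponent

def run_engagement_episode(env, good_policy, adv_policy):
    obs = env.reset()
    aims = {"good": 0, "adv": 0}
    for _ in range(MAX_STEPS):
        actions = {name: policy(obs[name])
                   for name, policy in (("good", good_policy), ("adv", adv_policy))}
        obs, rewards, info = env.step(actions)       # hypothetical return signature
        for name in aims:
            aims[name] += info[name]["aim_success"]  # 1 if opponent was in the cone
        if aims["good"] >= AIMS_TO_DEFEAT:
            return "adversary defeated"
        if aims["adv"] >= AIMS_TO_DEFEAT:
            return "good agent defeated"
    return "timeout"
```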

5.2.2 Training Algorithm

The training process optimizes two separate agents that learn to fight each other from scratch. Both agents start from "zero" and slowly develop the skills needed to engage each other. The architecture is built upon OpenAI Gym [19], the Multi-Agent Particle Environment [15], and the PPO training algorithm [20] from a PyTorch implementation of OpenAI Stable Baselines [17].
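For illustration, a single training call with the Stable Baselines3 PPO implementation might look as follows; make_engagement_env is a placeholder for the adapted particle scenario described above, and the hyperparameters are assumptions rather than the values used in the paper.

```python
from stable_baselines3 import PPO

env = make_engagement_env()  # placeholder for the adapted MPE scenario
model = PPO("MlpPolicy", env, n_steps=2048, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("threat_engagement_policy")
```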

5.2.3 Training of Entities

The selection of rewards is crucial for preventing local minima and encouraging a balance between engagement, evasion, and the will to stay alive and not crash into the opponent, while maintaining the will to properly defeat the opponent and staying within range of each other. The reward function is defined as stated below:

1. A high, fixed penalty for collision between agents; both agents are penalized.
2. A shaped penalty for fleeing from the opponent, i.e., a penalty that increases proportionally with the distance to the opponent.
3. A shaped "aim" reward that increases with the agent's ability to "aim" at its opponent by pointing its "nose" at the opponent, in terms of capturing the opponent within a pre-defined engagement zone/cone of the agent.
4. A shaped "aimed-at" penalty that increases with the opponent's ability to "aim" at the agent by pointing its "nose" at the agent, in terms of capturing the agent within a pre-defined engagement zone/cone of the opponent.

6 Simulation and Evaluation

6.1 Architecture

The drone receives a task from the behaviour tree (see Sect. 4.1), and the type of task (group/regroup, area defence, threat engagement) determines which neural network to deploy. Each deployed neural network requires a specific set of input variables and produces an output in terms of a high-level steering command. The input vector consists of kinematics in terms of the drone's own position and velocities and the relative positions and velocities of the other drones and the intruder. The steering command is a high-level command in terms of delta coordinates in the x, y, and z positions, which in turn is sent to a PID controller that exerts low-level control of the drone in terms of motor RPM values for each motor.


Fig. 5 The behaviour architecture for each drone. Its main behaviour component is composed of a set of task specific behaviour networks

The general behaviour component architecture is depicted in Fig. 5.
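The sketch below (ours) mirrors Fig. 5: the current task selects a task-specific network, whose delta-coordinate output is handed to a PID controller. All names (the networks mapping, build_observation, pid.update) are illustrative placeholders, not the authors' API.

```python
networks = {
    "group_regroup": group_regroup_net,         # placeholders for the trained
    "area_defence": area_defence_net,           # task-specific policy networks
    "threat_engagement": threat_engagement_net,
}

def control_step(task, own_state, others_state, pid):
    obs = build_observation(own_state, others_state)  # kinematics input vector
    delta_xyz = networks[task](obs)                   # high-level steering command
    target = own_state.position + delta_xyz
    return pid.update(target, own_state)              # low-level motor RPM values
```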

6.2 Evaluation

The evaluation of the safety/security scenario has been done in the PyBullet drones framework [21] with simulated Bitcraze Crazyflie quadcopters [22]. It is performed using Monte-Carlo simulations on a batch of randomized scenarios; measures of effectiveness have been computed for each scenario instance and later compiled into a final assessment for the main scenario as well as each ingoing task (see Fig. 6 and Sect. 6.3).

6.3 Results Measures of effectiveness for the scenario are derived from the AFT 7 to PERFORM COMMAND AND CONTROL [7], adapted to the specific tasks within the safety/ security scenario. This section describes general information about these measurements of effectiveness. The measures are used to provide a standard for the evaluation of the tasks by means of expressing the degree of success the task execution results


Fig. 6 A set of scenarios has been simulated and measures of effectiveness have been compiled for the main scenario as well as each ingoing task

in under a specified set of conditions. [7] states that “A measure provides the basis for describing varying levels of task performance and are directly related to a task”. The following measure has been selected: . M1 Percent Of desired operational effects achieved. In this scenario, it is related to effectiveness of the complete infrastructure protection mission. . M2 Percent Of desired tactical effects achieved. In this scenario, it is related to effectiveness of each of the ingoing tasks of the infrastructure protection mission. Figure 7 depicts the M1 Percent Of desired operational effects achieved during Monte-Carlo simulations on a batch of randomized scenarios. The result shows a mission success in 93% of the simulations. Of these 93%, 58% of the simulation resulted in the threat being neutralized (Threat Down) and in 35% the threat was driven away from the critical area. In 7% of the simulations, the threat engagement drone was neutralized (threat engagement drone down), which resulted in a mission failure. The three diagrams in Fig. 8 depicts the M2 Percent Of desired tactical effects achieved. The tactical effects are related to the effectiveness of each of the ingoing tasks of the mission. The figures depict, from left to right: 1. For the regroup task, the M2 result is 100% success. The success is timedependent, that is, the group/regroup must be done within the time span of the safety/security scenario. 2. For the Area Defence Task, the M2 measurement success in 77% of the simulations and fails in 23%. The definition of mission success/failure for the regroup task is defined below:


Fig. 7 The figure depicts the M1 Percent of desired operational effects achieved in the safety/security scenario

a. The objective of this task is to keep the threat away from a critical perimeter. If the threat is kept outside of this perimeter until the scenario time span has run out, the area defence task is considered a success. b. If the threat is neutralized during the area defence task, it is considered a success c. If the area defence drones fails to keep the threat is outside of the perimeter anytime during the scenario time span, the mission is considered a failure 3. For the threat engagement task, the M2 measurement equals success in 94% of the simulations and failure in 6%. The definition of mission success/failure for the threat engagement task is defined as below: a. The objective of this task is to engage the threat and either neutralize it or drive it away from the critical infrastructure are perimeter, and keep it outside of this perimeter the whole time-span of the safety scenario. Both of these conditions will result in a success. b. Failure of this task will happen either if the threat engagement drone is neutralized, or if the treat remains within the critical perimeter of the area when the safety scenario time span ends.


Fig. 8 a, b, c The three figures depict the M2 Percent of desired tactical effects achieved during the scenario

7 Conclusions and Future Work

The aim of this work is to explore valuable aspects and solutions for the future development of autonomous capabilities in the safety/security domain. We introduce and validate a hybrid reinforcement learning/behaviour tree approach to obtain a simulated infrastructure protection capability by means of a swarm/team of drones. The capability is validated in a simulated safety/security scenario by means of real-world performance measurements: the Percent of desired operational effects achieved in the mission and the Percent of desired tactical effects achieved in the mission. We believe that the application of such measurements helps validate the applicability of this capability in a real-world scenario, and in order to assess the relevance of these parameters, future validations in real-world operational scenarios are warranted.


Acknowledgements We would like to thank our funders, Vinnova (Grant 2022-02869), Saab AB and Mälardalen University for this project.

References

1. TT News Agency (2023) Drönare vid kärnkraftverk utreds som nationell särskild händelse [Drones at nuclear power plants investigated as national special event]. Ny Teknik, 17 Jan. https://www.nyteknik.se/dronare-karnkraft-samhalle/dronare-vid-karnkraftverk-utreds-som-nationell-sarskild-handelse/455890. Accessed 22 Feb 2023
2. Bergman T (2022) Flög drönare i närheten av slottet och riksdagen–utländsk man gripen [Drones flew near the castle and the parliament–foreign man arrested]. SVT Nyheter, 18 May. https://www.svt.se/nyheter/lokalt/stockholm/flog-dronare-i-narheten-av-slottet-och-riksdagen-utlandsk-man-gripen. Accessed 22 Feb 2023
3. TT News Agency (2020) Polisen sökte drönare över regeringskansliet [The police searched for a drone over the government offices]. Svenska Dagbladet, 2 July. https://www.svd.se/a/P9KXWb/polisen-soker-dronare-over-regeringskansliet. Accessed 22 Feb 2023
4. Edgren J (2017) Filmen visar det kaos som en drönare skapade på Gatwick [The film shows the chaos caused by a drone at Gatwick]. Ny Teknik, 28 November. https://www.nyteknik.se/dronare/filmen-visar-det-kaos-som-en-dronare-skapade-pa-gatwick/179661. Accessed 23 Feb 2023
5. Sobieski N (2023) Norge stängde luftrum över två flygplatser–drönare siktad [Norway closed airspace over two airports–drone sighted]. SVT Nyheter, 16 October. https://www.svt.se/nyheter/utrikes/norge-stanger-luftrum-dronare-over-flygplatser. Accessed 23 Feb 2023
6. Rudd L (2016) OFFensive Swarm-Enabled Tactics (OFFSET). https://www.darpa.mil/program/offensive-swarm-enabled-tactics. Accessed 23 Feb 2023
7. Air Force Task List (AFTL), Air Force Doctrine Document 1-1, 12 Aug 1998
8. Vinyals O, Babuschkin I, Czarnecki WM et al (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575:350–354. https://doi.org/10.1038/s41586-019-1724-z
9. Masadeh A, Alhafnawi M, Salameh HAB, Musa A, Jararweh Y (2022) Reinforcement learning-based security/safety UAV system for intrusion detection under dynamic and uncertain target movement. IEEE Trans Eng Manage. https://doi.org/10.1109/TEM.2022.3165375
10. Dantas A, Diniz L, Almeida M, Olsson E, Funk P, Sohlberg R, Ramos A (2022) Intelligent system for detection and identification of ground anomalies for rescue. In: ITNG 2022 19th international conference on information technology—new generations. Springer International Publishing, pp 277–282
11. Sundelius O, Funk P, Sohlberg R (2023) Simulation environment evaluating AI algorithms for search missions using drone swarms. In: International congress and workshop on industrial AI 2023. Forthcoming paper
12. Zhao J, Liu H, Sun J, Wu K, Cai Z, Ma Y, Wang Y (2022) Deep reinforcement learning-based end-to-end control for UAV dynamic target tracking. Biomimetics 7:197. https://doi.org/10.3390/biomimetics7040197
13. Iovino M, Scukins E, Styrud J, Ögren P, Smith C (2022) A survey of behavior trees in robotics and AI. Robot Auton Syst 154:104096
14. Lowe R, Wu YI, Tamar A, Harb J, Pieter Abbeel O, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems, vol 30
15. Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Neural Information Processing Systems (NIPS)


16. Dhariwal P, Hesse C, Klimov O, Nichol A, Plappert M, Radford A, Schulman J, Sidor S, Wu Y, Zhokhov P (2017) OpenAI Baselines. GitHub repository. https://github.com/openai/baselines. Accessed September 2023
17. Raffin A, Hill A, Gleave A, Kanervisto A, Ernestus M, Dormann N (2021) Stable-Baselines3: reliable reinforcement learning implementations. J Mach Learn Res 22(268):1–8. http://jmlr.org/papers/v22/20-1364.html. Accessed September 2023
18. Sutton RS, McAllester D, Singh S, Mansour Y (1999) Policy gradient methods for reinforcement learning with function approximation. In: Advances in neural information processing systems, vol 12
19. Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) OpenAI Gym
20. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. Preprint at arXiv:1707.06347
21. Panerati J, Zheng H, Zhou S, Xu J, Prorok A, Schoellig AP (2021) Learning to fly—a gym environment with PyBullet physics for reinforcement learning of multi-agent quadcopter control. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 7512–7519. IEEE
22. Bitcraze Crazyflie 2.1 quadcopter. https://www.bitcraze.io/products/crazyflie-2-1/. Accessed September 2023

Design, Development and Field Trial Evaluation of an Intelligent Asset Management System for Railway Networks Markos Anastasopoulos, Anna Tzanakaki, Alexandros Dalkalitsis, Petros Arvanitis, Panagiotis Tsiakas, Georgios Roumeliotis, and Zacharias Paterakis

Abstract This paper focuses on the development of a platform used to identify optimal maintenance plans for railway tracks. To achieve this, a fully automated solution that can monitor the status of the tracks has been deployed by Hellenic Train. Sensors monitoring a variety of parameters (such as acceleration, vibration and position), together with cameras, have been attached to the rolling stock frame, continuously monitoring the status of the tracks. Based on the collected measurements, machine learning schemes able to detect track defects and estimate the deterioration rate of track quality over time have been developed. The output of these models has been used as input to a set of optimization problems formulated to estimate the candidate time periods during which maintenance activities can be scheduled under various constraints and cost functions.

M. Anastasopoulos (B) · A. Tzanakaki Department of Physics, National and Kapodistrian University of Athens, Athens, Greece e-mail: [email protected] A. Tzanakaki e-mail: [email protected] A. Dalkalitsis · P. Arvanitis · P. Tsiakas · G. Roumeliotis · Z. Paterakis Hellenic Train, Athens, Greece e-mail: [email protected] P. Arvanitis e-mail: [email protected] P. Tsiakas e-mail: [email protected] G. Roumeliotis e-mail: [email protected] Z. Paterakis e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_20


Keywords Intelligent asset management · Maintenance · Artificial intelligence · Optimization · Railway track · Anomaly detection · Track defects

1 Introduction

This study focuses on the development of a technology solution that can be used to identify optimal maintenance plans ensuring cost-effective operation of railway networks. Railway track maintenance is a challenging problem in railway environments due to the large number of parameters, variables, and constraints involved in the decision-making process. In many cases, decisions taken by the infrastructure manager to perform specific maintenance activities have a direct impact on train operators, as they may lead to service disruptions. This is a common situation, especially in railway networks comprising double-track lines without redundancy, such as the Greek railway network shown in Fig. 1, where maintenance has an impact not only on service availability and reliability but also on safety. In this direction, the present paper provides an overview of an Intelligent Asset Management System (IAMS) platform that has been developed to assist infrastructure managers and operators in identifying cost-effective maintenance plans that can not only increase operating speed and capacity but also improve the reliability of the railway system. The platform has been developed in the context of the EU project DAYDREAMS.

Fig. 1 The Hellenic train railway network


The platform, developed in the context of the EU project DAYDREAMS, provides a set of tools that can be used to monitor the status of the railway track using cost-effective sensors, analyse the collected data, and then prescribe a set of optimal maintenance plans that jointly optimize the payoff for the infrastructure managers and the rail service providers. To achieve this, the platform adopts a cloud-based microservices architecture, allowing users to be actively involved in all stages of the data analysis performed during track maintenance. This includes:

• data on-boarding, offering the option to upload offline data or to connect measuring equipment acquiring data from the field;
• visualization of the collected measurements through simple, re-configurable dashboards;
• analysis of past measurements using diagnostic analytics, the outcome of which points to the exact locations where defects may exist;
• prediction of the status of the track over time; and
• recommendation of optimal maintenance plans, solving a set of Multi-Objective Optimization (MOO) problems using data from the previous stages as input.

The necessary intelligence is provided to the IAMS platform through a set of machine learning (ML) models that have been purposely designed to: (a) detect track anomalies, (b) estimate the deterioration of the track over time, and (c) recommend a set of candidate time periods in which maintenance activities can be scheduled. To develop the relevant ML models, we rely on data collected from track monitoring equipment (i.e., accelerometers, vibration sensors and industrial cameras) installed on a train operated by Hellenic Train. The data acquisition system collects fully synchronised measurements from heterogeneous sensors with sampling frequencies exceeding 10 kHz per channel.

It is well known that track irregularities generate a certain level of vibration in moving trains, influencing stability and the safety of travellers [1]. Most of these irregularities are detected through human inspection, while maintenance processes are carried out during regular, fixed reviews of the system. Currently, the measurements that can be used are classified in two main categories [2]:

• direct measurements, aiming at identifying the exact track irregularity, including its size and location, using laser scanning and video capture; and
• indirect measurements, using accelerations at the axle box or the carriage body to reflect the status of the irregularity in the track.

The platform considered in this study relies on the second approach, as it provides a very cost-efficient and effective way of detecting track anomalies. The collection of qualitative data at high sampling rates offers significant benefits in the maintenance process, as it enables not only the accurate characterisation of irregularities that may appear in the tracks but also the estimation of the way these irregularities evolve over time. The estimation of the evolution of track irregularities is performed using a specific class of ML models, Gaussian Process Regression (GPR) [3]. Once the preprocessing phase has been completed and the relevant input parameters have been


estimated, a Multi-Objective Optimisation (MOO) problem is formulated, recommending optimal maintenance strategies for all stakeholders involved. The rest of the paper is organized as follows. Section 2 provides an overview of the main building blocks of the developed platform, Section 3 describes the IAMS platform ML pipeline, and Section 4 presents the numerical results and summarises the conclusions.

2 The Intelligent Asset Management Platform

The operation of the IAMS platform proposed in this study relies on a set of ML models that can be used to estimate and then predict the status of the railway track and prescribe optimal intervention plans. To achieve this, the platform comprises a set of microservices adopting the architecture shown in Fig. 2 and performing the following actions.

• Data Acquisition, responsible for the collection of data from the various sensors. It should be mentioned that Hellenic Train allocated a specific train for experimentation purposes. This train has been equipped with sensors monitoring a variety of parameters (such as acceleration, vibration, position etc.), attached to the rolling stock frame and continuously monitoring the status of the tracks. The overall installation process and the locations where the sensors have been mounted are depicted in Fig. 3.
• Message Broker, facilitating inter-module connectivity. The interaction between the developed modules utilizes asynchronous Application Programming Interfaces (APIs) based on MQTT.
• Storage (data and ML metadata repository), implemented through InfluxDB for the time-series datasets and PostgreSQL for object storage. InfluxDB, through an MQTT exporter, stores all real-time measurements acquired from the field; PostgreSQL is used to store ML metadata.

Fig. 2 Interactions and communication between the different modules supporting the operation of the IAMS platform


Fig. 3 Hellenic Train measurement campaign: a Siemens Desiro used in the experimentation, b installation of a vibration sensor (Monitran 1185C), c Basler line-scan camera monitoring railway tracks with a custom enclosure designed by TRAINOSE, d 3-axis accelerometer installation (PCB 356A02).

• Train module, responsible for the selection and optimization of the hyperparameters of the ML models. The ML schemes are able to detect track defects and estimate the deterioration rate of track quality over time.
• Prescribe module, responsible for the MOO and the recommendation of the optimal maintenance plans. The MOO problems have been formulated to estimate the candidate time periods during which maintenance activities can be scheduled under various constraints and cost functions.

The connectivity between the different modules is provided through Application Programming Interfaces (APIs) adopting the Publish (PUB)-Subscribe (SUB) approach. The two basic PUB and SUB commands have been implemented using a message broker. Each application can publish or subscribe to a specific topic and send or receive the necessary information. The basic components that have been used to implement the various modules include:


• MQTT (MQ Telemetry Transport), a lightweight, publish-subscribe, machine-to-machine network protocol. It is designed for connections with remote locations that have devices with resource constraints or limited network bandwidth. It runs over a transport protocol that provides ordered, lossless, bidirectional connections, typically TCP/IP. It is an open OASIS standard and an ISO recommendation (ISO/IEC 20922).
• Graphical user interfaces, developed using the open-source anvil.works platform (https://anvil.works/), written in Python.
• A set of Python libraries implementing the train and predict modules. The key libraries that have been used include PyTorch, Pandas, Matplotlib, NumPy, SciPy and Scikit-learn. In addition, the PyWavelets (pywt) signal processing library has been used to implement the wavelet analysis.
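To make the PUB/SUB pattern concrete, the following is a minimal sketch of how a sensor-data topic could be published and consumed with the paho-mqtt Python client. The broker address, topic name and payload fields are illustrative assumptions, not the platform's actual configuration.

```python
import json
import paho.mqtt.client as mqtt

BROKER = "broker.example.local"          # hypothetical broker host
TOPIC = "iams/sensors/vibration"         # hypothetical topic name

def on_message(client, userdata, msg):
    # Each subscriber receives every message published on its topic.
    sample = json.loads(msg.payload)
    print(f"{msg.topic}: {sample}")

# paho-mqtt 1.x constructor; version 2.x additionally requires a
# CallbackAPIVersion argument.
subscriber = mqtt.Client()
subscriber.on_message = on_message
subscriber.connect(BROKER, 1883)
subscriber.subscribe(TOPIC)
subscriber.loop_start()                  # handle traffic in a background thread

publisher = mqtt.Client()
publisher.connect(BROKER, 1883)
publisher.publish(TOPIC, json.dumps(
    {"segment": "A12", "speed_kmh": 96.5, "vibration_g": 0.41}
))
```

Because every module only addresses topics rather than other modules, new analytics services can subscribe to the same sensor stream without changing the data acquisition side.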

3 The IAMS Platform ML Pipeline

The IAMS platform relies on a set of interactive dashboards allowing users to be involved in the descriptive, diagnostic, predictive and prescriptive parts of the maintenance process. The ML pipeline comprises the following steps:

Step 1: Data On-boarding—Data can be uploaded either offline or in real time by adding a new data acquisition device. To add a new device, its IP address should be specified along with the sampling rate (in the range 1 Hz–10 kHz) and the number of measurement channels (an integer in the range 1–8 channels); a minimal validation sketch for these parameters is given after Step 6.

Step 2: Descriptive Statistics—In the next step, the user has the option to view plots of the collected time series per track segment. Typical plots that can be generated include acceleration and vibration, speed, simple statistics per track segment, etc. Comparisons between different track segments can also be provided. A typical example is shown in Fig. 4 (left), where platform users can customize their dashboard and view down-sampled versions of the collected metrics. In addition to simple time-series plots, the dashboard provides forms to visualize basic statistics and view graphs of advanced signal processing models. The relevant dashboard is shown in Fig. 4 (right), where users can select the track segment and time window over which the analysis is performed.

Step 3: Diagnostic Statistics—In the next step, users can select an AI model through a drop-down list of models that can be trained to identify track defects. So far, the platform supports the following ML models: Support Vector Machines (SVM), Neural Network (NN) models based on Recurrent Neural Networks, Gaussian Mixture Models (GMM), Principal Component Analysis (PCA), Self-Organizing Maps (SOM) and Logistic Regression. The results of the training process (optimal ML model and relevant performance metrics) are stored in the ML repository.



Fig. 4 (Left) Time-series data visualization dashboards showing acceleration, vibration and speed, (middle) wavelet analysis performed on a specific track segment for anomaly detection, (right) normal versus abnormal speed-vibration profiles
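As an illustration of the wavelet analysis shown in Fig. 4 (middle), the sketch below extracts simple CWT-based features from a synthetic vibration trace using the PyWavelets library; the sampling rate, wavelet choice and scales are assumptions for demonstration only.

```python
import numpy as np
import pywt

fs = 10_000                                   # assumed 10 kHz sampling rate
t = np.arange(0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * 50 * t)           # stand-in for a vibration channel
signal[5000:5050] += 2.0                      # synthetic transient "defect"

scales = np.arange(1, 64)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)

# Per-scale energy is one simple feature vector that can feed the ML models.
energy = (np.abs(coeffs) ** 2).mean(axis=1)
print(energy.shape)                           # one value per scale -> (63,)
```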

Step 4: Diagnostic Model Deployment—The trained ML models are available to the users through the ML repository, which contains the list of all pretrained ML models together with their corresponding accuracy metrics (i.e., training error). Users can select, through a tick box, the pretrained model that they want to deploy in the system. Once selected, the ML models can be tested using either historical measurements (Fig. 5a) or real-time data collected by the monitoring platform. To further verify the accuracy of the results obtained, field surveys have been performed (Fig. 5b, c).

Step 5: Predictive Statistics—In the next step, an ML model is applied, estimating the degradation of the track over time. The model uses as input the outcome of the analysis carried out in the previous step. At this stage, users may select the configuration/parameters of the track degradation model (see Fig. 6) and evaluate its accuracy.

Step 6: Prescriptive Statistics—In the final step, the track degradation model is used to estimate the future condition of the track. Users select a normalized value of the ride quality index for the passengers, and a MOO model is solved, identifying the optimal maintenance plans. Previous scenarios and their cost functions are also illustrated.
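Referring back to Step 1, the following is a minimal sketch of how the on-boarding parameters could be validated before a device is registered. The class and field names are hypothetical; only the allowed ranges (1 Hz–10 kHz, 1–8 channels) come from the text.

```python
from dataclasses import dataclass

@dataclass
class AcquisitionDevice:
    ip_address: str
    sampling_rate_hz: int    # allowed range: 1 Hz - 10 kHz
    n_channels: int          # allowed range: 1 - 8 channels

    def __post_init__(self):
        octets = self.ip_address.split(".")
        if len(octets) != 4 or not all(
            o.isdigit() and 0 <= int(o) <= 255 for o in octets
        ):
            raise ValueError(f"invalid IPv4 address: {self.ip_address!r}")
        if not 1 <= self.sampling_rate_hz <= 10_000:
            raise ValueError("sampling rate must be within 1 Hz - 10 kHz")
        if not 1 <= self.n_channels <= 8:
            raise ValueError("channel count must be within 1 - 8")

device = AcquisitionDevice("192.168.1.40", sampling_rate_hz=10_000, n_channels=4)
```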

Fig. 5 (Left): ML model validation against test data. 1: indicates defect in a specific track point, 0: normal track point, (middle): geolocation of track defects, (right): verification of track anomalies using field surveys


Fig. 6 Track degradation model using GPR
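A minimal sketch of a GPR degradation model in the spirit of Fig. 6 follows, using scikit-learn's Gaussian process implementation with a squared-exponential (RBF) kernel as in [4]. The training data are synthetic stand-ins for field measurements, and the kernel hyperparameters are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
speed = rng.uniform(40, 140, size=(60, 1))            # train speed, km/h
vibration = 0.002 * speed[:, 0] ** 1.5 + rng.normal(0, 0.05, 60)

# Squared-exponential (RBF) kernel plus a white-noise term.
kernel = 1.0 * RBF(length_scale=20.0) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(speed, vibration)

# Predictive mean and uncertainty band over a grid of speeds.
grid = np.linspace(40, 140, 50).reshape(-1, 1)
mean, std = gpr.predict(grid, return_std=True)
print(mean[:3], std[:3])
```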

4 Numerical Results

This section evaluates the tools developed to support the operation of the IAMS platform. The results obtained by the different ML models are available through the "Trained ML Models" repository. The ML models are trained using as input features obtained directly from the various sensing devices installed on board. In addition to raw data (vibration, speed, acceleration, etc.), signal processing techniques have also been applied to enrich the feature list. This is considered one of the key innovations of the IAMS platform, as the ML models have been trained using features calculated through advanced signal processing techniques such as continuous wavelet transform (CWT) analysis (see Fig. 4, middle). At this point it should be noted that the corresponding Neural Network (NN) models are stored in JavaScript Object Notation (JSON) format and are readily available for deployment through the ML repository (Fig. 7). Initially, we evaluate the efficiency of the developed NN models for different parameters and topology settings. Some indicative results, showing the impact of the number of hidden neurons for different neuron (activation) models in the hidden and output layers, are illustrated in Fig. 8.
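For illustration, below is a minimal PyTorch sketch of the kind of shallow classifier compared in Fig. 8, with a sigmoid hidden layer and a softmax output over the two classes normal/defect. The feature count, layer sizes and training data are assumptions, not the platform's actual model.

```python
import torch
import torch.nn as nn

class TrackAnomalyNet(nn.Module):
    def __init__(self, n_features: int = 12, n_hidden: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.Sigmoid(),               # sigmoid hidden layer
            nn.Linear(n_hidden, 2),     # two classes: normal / defect
        )

    def forward(self, x):
        # Raw logits are returned; CrossEntropyLoss applies the softmax.
        return self.net(x)

model = TrackAnomalyNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random data standing in for
# wavelet/vibration features and 0/1 defect labels.
x = torch.randn(32, 12)
y = torch.randint(0, 2, (32,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```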

Fig. 7 ML model repository


Fig. 8 Performance of the NN-based anomaly detection scheme for different number of hidden neurons, a Hidden Layer Type: Sigmoid, Output Layer Type: Sigmoid, b Hidden Layer Type: Sigmoid, Output Layer Type: SoftMax

We observe that the combination of a Sigmoid neuron model for the hidden layers and a Softmax output layer achieves the best results with respect to the ability of the model to detect track anomalies. From the same results we also observe that, as the number of hidden neurons increases, the efficiency of the track anomaly detection ML model also increases, reaching its maximum value of 99.5% for seven or more hidden neurons.

In the next step of the analysis, measurements are gathered and analysed in order to estimate how the specific vibration and acceleration profiles evolve over time. To perform this task, a Gaussian Process Regression (GPR) model has been implemented, estimating the evolution of the vibration/speed and acceleration/speed profiles through time for different track segments. The GPR model uses as input a training set $(x_i, f(x_i))$, $i = 1, 2, \ldots, n$, and determines the response $f(x_{\mathrm{new}})$ for a new input vector $x_{\mathrm{new}}$. A GP is defined by a mean function $m(x) = E[f(x)]$ and a covariance function $k(x, x') = \mathrm{COV}(f(x), f(x'))$, where $k(x, x')$ is defined through a set of kernel functions modelling the fact that measurement points $x_i$ taking similar values will produce similar outputs (response functions $f(x_i)$). Typical outputs of the GPR model for the squared exponential kernel and the ARD exponential kernel, as defined in [4], for the vibration/speed curves are shown in Fig. 6.

Finally, once the relation between vibration/acceleration and speed has been determined, the optimal maintenance plans are obtained by solving a MOO problem that jointly minimizes train delays and maintenance costs. The optimization framework minimizes the maintenance cost $CM_{rt}$ and the delay $D_{rt}$ for line $r$ at time $t$ through the minimization of the following MOO function over the set of all lines $R$ and time periods $T$:

$$\min \left( \sum_{t \in T,\, r \in R} CM_{rt},\ \sum_{t \in T,\, r \in R} D_{rt} \right) \quad (1)$$
Subject to a set of constraints related to safety, ride quality and speed, defined in [5]. To model ride quality, Sperling's ride index has been considered [6, 7].


Fig. 9 Interface of the optimization module showing the Pareto front of the MOO problem

The graphical interface of the MOO module developed for IAMS is shown in Fig. 9. Users can select the normalized value for the ride quality index and then press the "optimize" button to solve the problem. The calculated values of the cost functions are shown in the Pareto front diagrams of Fig. 9, which illustrate the optimal values of Maintenance Cost versus Capacity and Maintenance Cost versus Delay.
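To illustrate how a Pareto front such as the one in Fig. 9 can be extracted, the sketch below filters a set of hypothetical candidate plans, each scored on the two objectives of Eq. (1) summed over lines and periods; the numbers are illustrative, not field results.

```python
import numpy as np

# Each row: (total maintenance cost CM, total delay D) for one candidate plan.
plans = np.array([[10., 9.], [12., 6.], [15., 4.],
                  [9., 12.], [16., 7.], [20., 3.]])

def pareto_front(points: np.ndarray) -> np.ndarray:
    """Keep the non-dominated points (both objectives minimised)."""
    keep = [
        i for i, p in enumerate(points)
        if not any((q <= p).all() and (q < p).any()
                   for j, q in enumerate(points) if j != i)
    ]
    return points[keep]

print(pareto_front(plans))   # plan (16, 7) is dominated and drops out
```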

5 Conclusions

This study focused on the development of an Intelligent Asset Management System (IAMS) to identify optimal maintenance policies for infrastructure managers and railway operators. The cloud-based platform relies on data collected through extensive field measurements to train a set of machine learning models that are able to detect track defects and estimate the deterioration rate of track quality with high accuracy. The output of these models is used as input to a multi-objective optimization module that prescribes optimal intervention strategies, minimizing costs and delays. Additional tests and field measurements are planned to further validate the developed system and improve its accuracy.

Acknowledgements This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 101008913 (DAYDREAMS) and the 5G-VICTORI project under grant agreement ID 857201.


References

1. Peixer MA, Montenegro PA, Carvalho H, Ribeiro D, Bittencourt TN, Calçada R (2021) Running safety evaluation of a train moving over a high-speed railway viaduct under different track conditions. Eng Fail Anal 121:105–133
2. Niu L, Liu J, Zhou Y (2020) Track irregularity assessment in high-speed rail by incorporating carriage-body acceleration with transfer function. Math Probl Eng (1)
3. Peng S, Feng QM (2021) Reinforcement learning with Gaussian processes for condition-based maintenance. Comput Ind Eng 158:107321
4. Kernels for Gaussian Processes. https://scikit-learn.org/stable/modules/gaussian_process.html#kernels-for-gaussian-processes
5. Daydreams D3.2, Artificial Intelligence Modelling. https://daydreams-project.eu/
6. Jiang Y, Chen BK, Thompson C (2019) A comparison study of ride comfort indices between Sperling's method and EN 12299. Int J Rail Transport 7(4):276–296
7. Gaberson HA (2012) Shock severity estimation. Sound Vibr 46:12–19

Analysing Maintenance and Renewal Decision of Sealed Roads at City Council in Australia

Kishan Shrestha and Gopi Chattopadhyay

Abstract Roads are one of the major physical infrastructure assets of Hepburn Shire Council (HSC), as for all other local councils. Every year, HSC allocates and spends a large budget on road maintenance and renewal. Road performance condition has been the major criterion for selecting roads for renewal. However, other criteria are under-considered, and there are gaps in the analysis of the relationships between road age, condition, risk, and cost. In this study, a decision-making framework (tool) has been developed using a multi-criteria technique (MCT) and the Analytic Hierarchy Process (AHP) for single-objective optimisation, i.e., to provide an agreed level of service while optimising maintenance and renewal cost, or to improve condition subject to the annual budget. The decision criteria were adopted as per community and council needs by developing a model for criteria selection. Additionally, this study analysed the adopted HSC maintenance strategies, condition monitoring systems, performance conditions of the roads, and the operational and renewal budget of HSC.

Keywords Maintenance · Renewal · Performance condition · Roads age · Useful life

1 Introduction

Roads are among the most important public infrastructure assets and require effective maintenance to sustain their quality of service. Annually, Hepburn Shire Council allocates and spends about 64% of its total infrastructure budget on roads, for the regular maintenance, upgrade, and renewal of the road network, for the safety and convenience of the community [1].

K. Shrestha (B) · G. Chattopadhyay
Institute of Innovation, Science, and Sustainability, Federation University, Mount Helen, VIC 3350, Australia
e-mail: [email protected]
G. Chattopadhyay
e-mail: [email protected]


As stated in the Council Plan 2017–2021 by Hepburn Shire Council [2], delivering annual asset maintenance and renewal programs will enable the Council to achieve quality community infrastructure and meet the level of service agreed with ratepayers. HSC owns a 1456 km road network of different hierarchies (Link, Collector, Local Access, Maintained Track, Non-Maintained Track, and Reserve), of which 612 km are sealed and 844 km unsealed. As per the 2021 road condition survey, 37% of sealed road pavement and 22% of sealed road surface are in condition 3 (at the middle of useful life). Council conducts proactive and reactive inspections for defect identification, and the road condition survey is carried out at 4–5-year intervals [3]. Without regular maintenance, roads deteriorate sharply, resulting in performance failure and an inability to provide the agreed level of service; the consequence is an increasing number of roads requiring renewal. Based on condition and defects, road maintenance can be classified as preventive or corrective [4]. Road renewal, as condition-based or predictive maintenance (a type of preventive maintenance), extends the life of a road in a cost-effective way. HSC performs its renewal program based on road performance condition as surveyed by a consultant. This study analysed road conditions and the most appropriate road selection criteria for the renewal of sealed roads, in order to optimise the Council's maintenance budget and deliver the quality of service the Council has agreed to provide. The multiple criteria were selected following a community- and council-needs-based approach, with common approval by the members of the asset team during discussion. The weightage of the selected decision criteria is determined using AHP, i.e., pairwise comparison between any two criteria on a scale from 1 to 9. The study also analyses HSC's planned long-term life-cycle budget versus the life-cycle budget requirement forecast by the Nams+ software model developed by IPWEA.

2 Methodology

This study followed these steps: analysing road condition; renewal forecasting and life-cycle costing; modelling road deterioration (straight-line slope); calculating useful and remaining life; comparing the renewal budget forecast with the planned renewal program budget using Nams+ modelling; and preparing a framework for maintenance and renewal decisions for sealed road options.

Road Condition Analysis: Council conducts a road condition survey every 4–5 years and uses a 1-to-5 grading system for road condition evaluation (Table 1). Council operates and maintains about 612 km of sealed roads, of which about one-third (both surface and pavement) are in condition 3 (Figs. 1 and 2).

Table 1 Condition grading system adopted by HSC

Condition | Descriptions | Maintenance action required
1 | Very good | Not required maintenance
2 | Good | Minor maintenance
3 | Fair | Major maintenance
4 | Poor | Rehabilitation or Renewal
5 | Very poor | Replacement or Reconstruction

Fig. 1 Condition distribution by % of sealed road surface length

Fig. 2 Condition distribution by % of sealed road pavement length

Steps for renewal forecast: The following steps are followed to forecast renewal and to perform the renewal analyses (Fig. 3).

Road degradation: Due to the lack of a proper sealed-road degradation model and algorithm, it is assumed that sealed roads deteriorate proportionally with age, represented by a straight line as shown below (Fig. 4).


Fig. 3 Steps for renewal forecast:
• Maintenance Strategies: preventive (planned) maintenance; corrective (reactive) maintenance
• Maintenance Analysis: optimising the maintenance program; reviewing maintenance performance
• Renewal Forecast: life-cycle costing; road degradation; age/useful life/remaining life estimation and condition analysis

Fig. 4 Line of deterioration and useful life estimation

Age, Useful life, and Remaining life estimation: Age is the time elapsed from the date of construction or acquisition. Useful life is the period for which an asset is expected to be available to provide service. The useful life of roads and their components is determined based on experts' recommendations and a comparison matrix between neighbouring councils (Table 2).

Age versus Condition: As an asset ages, its performance condition declines and it moves closer to the end of its useful life; if the asset is not renewed, the probability of failure increases. The life of an asset ends at the point of renewal/replacement or disposal, but the asset regains 100% of its useful life, in as-new condition, when renewed [5].

Table 2 Condition versus remaining life of asset

Condition | Estimation of remaining life (%)
1 | 100
2 | 75
3 | 50
4 | 25
5 | 0

It is estimated that sealed roads will be unable to provide the agreed level of service beyond condition 4, which means the effective useful life is expected to be 75% of the standard life; renewal is therefore planned at condition >3 (i.e., condition 4) of the roads (Fig. 5).

Planned versus forecast life-cycle budget analysis of HSC: The life-cycle cost of a road is calculated as the sum of the acquisition cost and other future costs (operation and maintenance costs) (Tables 3 and 4).

Development of Framework or Tool for Decision of Maintenance and Renewal Road Options

Renewal Plan: The decision-making framework for the renewal of sealed road options is presented in Fig. 6.
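A minimal sketch of the straight-line deterioration assumption and the renewal trigger described above follows; the function names and the example asset are hypothetical, with the 25% condition bands taken from Table 2.

```python
def remaining_life_fraction(age_years: float, useful_life_years: float) -> float:
    """Straight-line deterioration: remaining life falls linearly with age."""
    return max(0.0, 1.0 - age_years / useful_life_years)

def condition_grade(age_years: float, useful_life_years: float) -> int:
    """Map remaining life to the 1-5 grades of Table 2
    (100% -> 1, 75% -> 2, 50% -> 3, 25% -> 4, 0% -> 5)."""
    consumed = 1.0 - remaining_life_fraction(age_years, useful_life_years)
    # Each 25% band of consumed life corresponds to one condition grade.
    return min(5, 1 + int(consumed / 0.25))

def due_for_renewal(age_years: float, useful_life_years: float) -> bool:
    """Renewal is planned once ~75% of the standard life is consumed
    (condition > 3), as assumed in this study."""
    return age_years >= 0.75 * useful_life_years

# Example: a spray-seal surface with a 15-year adopted life, now 12 years old.
print(condition_grade(12, 15), due_for_renewal(12, 15))   # -> 4 True
```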

Fig. 5 Lifecycle costing phases

Table 3 HSC adopted roads life and replacement rates

Description | Life | Rural rate | Town rate
Sealed road pavement | 80 years | $33.42/m² | $36.76/m²
Sealed road surface—spray seal | 15 years | $5.06/m² | $7.04/m²
Sealed road surface—asphalt | 25 years | $25/m² | $25/m²
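As a simple illustration of applying the adopted unit rates, the sketch below estimates the renewal cost of a road segment from its area. The segment dimensions are hypothetical; the lives and rates come from Table 3.

```python
# (adopted life in years, rural $/m2, town $/m2), per Table 3
RATES = {
    "pavement":           (80, 33.42, 36.76),
    "surface_spray_seal": (15, 5.06, 7.04),
    "surface_asphalt":    (25, 25.00, 25.00),
}

def renewal_cost(component: str, length_m: float, width_m: float,
                 locality: str = "rural") -> float:
    life, rural, town = RATES[component]   # 'life' drives renewal timing only
    rate = rural if locality == "rural" else town
    return length_m * width_m * rate

# e.g. resealing 1 km of 6.2 m wide rural road with a spray seal:
print(f"${renewal_cost('surface_spray_seal', 1000, 6.2):,.2f}")   # $31,372.00
```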


Table 4 HSC planned long-term life-cycle budget

Fiscal year | Operations | Maintenance | Renewal—Sealed Rd | Acquisition/New construction | Total
2021–22 | $487,155.24 | $1,948,620.94 | $2,507,151.00 | $64,058.40 | $5,006,985.58
2022–23 | $499,334.24 | $1,997,339.46 | $2,569,829.78 | $65,659.86 | $5,132,160.22
2023–24 | $511,817.47 | $2,047,269.87 | $2,634,075.52 | $67,301.36 | $5,260,464.22
2024–25 | $524,612.91 | $2,098,451.62 | $2,699,927.41 | $68,983.89 | $5,391,975.83
2025–26 | $537,728.23 | $2,150,912.91 | $2,767,425.60 | $70,708.49 | $5,526,775.23
2026–27 | $551,171.44 | $2,204,685.73 | $2,836,611.24 | $72,476.20 | $5,664,944.61
2027–28 | $564,950.73 | $2,259,802.87 | $2,907,526.52 | $74,288.11 | $5,806,568.23
2028–29 | $579,074.50 | $2,316,297.94 | $2,980,214.68 | $76,145.31 | $5,951,732.43
2029–30 | $593,551.36 | $2,374,205.39 | $3,054,720.05 | $78,048.94 | $6,100,525.74
2030–31 | $608,390.14 | $2,433,560.52 | $3,131,088.05 | $80,000.16 | $6,253,038.87
Total | | | | | $56,095,170.96
Average per year budget | | | | | $5,609,517.10

A free AHP calculator, available online (AHP-OS, bpmsg.com), is used to determine the weightage of the selected criteria and their influencing factors. The AHP, a method which uses mathematics and psychology for organising and analysing complex decisions, is an effective tool for generating the weightage of the selected criteria. AHP is widely used for multi-criteria decision making in several sectors, including economics, politics, and engineering. In AHP, a pairwise comparison between any two criteria is conducted using a scale from 1 to 9 (1 = equal importance, 3 = moderate importance, 5 = strong importance, 7 = very strong importance, 9 = extreme importance; 2, 4, 6, 8 = intermediate values between the two adjacent judgements). Condition, Risk Index, and Replacement Cost were selected as decision criteria following discussion with, and approval by, the asset and works team of HSC (Table 5). The weightage of the major decision-making criteria generated by AHP is illustrated in Fig. 7. The criteria and influence factors are scored between 1 and 5, and the total score is calculated based on their relative weightage generated by AHP. Roads are prioritised for renewal in descending order of score, i.e., the highest-scored road has the highest priority for renewal (Fig. 8).
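The following is a minimal sketch of the AHP weight calculation described above, using the principal-eigenvector method, together with the priority scoring of the roads. The pairwise judgement values and the road's criteria scores are illustrative assumptions, not those actually used in the study.

```python
import numpy as np

# Pairwise comparison matrix for (Condition, Risk Index, Replacement Cost)
# on Saaty's 1-9 scale; A[i, j] states how much criterion i outweighs j.
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 3.0],
    [1/5, 1/3, 1.0],
])

# Priority weights: principal eigenvector of A, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
weights = eigvecs[:, k].real
weights /= weights.sum()

# Consistency ratio: CR < 0.10 is the usual acceptance threshold.
n = A.shape[0]
CI = (eigvals.real[k] - n) / (n - 1)
RI = {3: 0.58, 4: 0.90, 5: 1.12}[n]        # Saaty's random index
CR = CI / RI

# Priority score of one candidate road: weighted sum of its 1-5 scores.
road_scores = np.array([4, 3, 2])          # hypothetical criterion scores
priority = float(road_scores @ weights)    # higher -> renew sooner

print(weights.round(3), round(CR, 3), round(priority, 2))
```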

3 Results

Figures 9 and 10 show the outcomes of the renewal modelling. Figure 10 summarises the outcomes from the model, comparing the planned life-cycle budget with the estimated life-cycle expenditure.


Fig. 6 Decision making framework for Renewal

Table 5 Selection of decision criteria and factors

Decision criteria | Factors influencing the criteria | Remarks
Condition | Pavement condition | Agreed to select
Condition | Surface condition | Agreed to select
Risk Index | AADT | Agreed to select
Risk Index | Hierarchy of roads | Agreed to select
Risk Index | Surface type | Not agreed to select
Risk Index | Bus route | Not agreed to select
Risk Index | Access to school | Agreed to select
Risk Index | Access to health services | Agreed to select
Risk Index | Town/Rural | Not agreed to select
Risk Index | Tourism | Not agreed to select
Risk Index | Collision/Crash history | Agreed to select
Risk Index | Customer request | Agreed to select
Cost | Replacement cost | Agreed to select

The planned life-cycle budget in the LTFP (10-year Long-Term Financial Plan) is $56,095,172.00 (an average of $5,609,517.00 per year). From the Nams+ renewal model, the forecast life-cycle expenditure is $37,474,536.00 (an average of $3,747,454.00 per year).


Fig. 7 Weightage calculation of criteria using AHP

Fig. 8 Condition summary

The average per-year planned life-cycle budget in the LTFP (Long-Term Financial Plan) is therefore sufficient to cover the average per-year life-cycle expenditure forecast by the model. Figure 11 shows the framework for deciding on road renewal options, with the weightage of the major decision-making criteria and their influence factors.


Fig. 9 Operation and maintenance summary

Fig. 10 Lifecycle summary

4 Discussion

In this study, it is assumed that roads deteriorate proportionally with age. However, the actual deterioration may differ and is governed by numerous factors such as traffic loads, quality and maintenance strategies, environment, and structural quality. Council needs to collect sufficient road condition data, including the extent of defects such as cracking, rutting, and surface roughness, during condition surveys.


Fig. 11 Framework of decision for renewal of road options with weightage of criteria and factors

Such survey data will help to develop a degradation model to analyse and predict the actual deterioration of the roads. The decision criteria were selected by common approval of the team during discussion, rather than by developing a selection model, because of time constraints. An effective model is required for the selection of "truly optimal roads" for renewal and for decision criteria selection. Detailed consultation with the community, councillors, other internal staff, and all stakeholders is required to identify their expectations from road renewal. Some of the data are assumed for the purposes of this study only and are not suitable for analysis for other purposes.

5 Conclusion

This study has developed a decision-making framework for the renewal of sealed road options at HSC, based on a priority score determined by the weightage assigned to the selected criteria using AHP, following an analysis of the planned versus forecast life-cycle budget using the IPWEA Nams+ renewal model. The major decision criteria and their influence factors were selected following a community- and Council-needs-based approach, and the weightages of the decision criteria were determined using AHP. The sealed roads for renewal are selected on the assumption that roads deteriorate proportionally with age along a straight-line slope, and that roads are due for renewal when their age reaches 75% of their useful life. A proper road degradation model is required to predict the actual deterioration of the roads, which in turn requires sufficient and reliable data.


Acknowledgements The authors would like to acknowledge Hepburn Shire Council for providing the data and for its support with the Nams+ modelling.

References

1. Hepburn Shire Council, Hepburn Shire Council Plan 2017–2021
2. Hepburn Shire Council, Annual Plan 2021/2022
3. Hepburn Shire Council (2017) Road Management Plan
4. Nassar F (2019) Development of maintenance program for main road network in Bahrain. In: 8th International conference on modeling simulation and applied optimization (ICMSAO)
5. IPWEA (2017) Asset management and financial management guidelines (Useful Life of Infrastructure)

Issues and Challenges in Implementing the Metaverse in the Industrial Contexts from a Human-System Interaction Perspective

Parul Khanna, Ramin Karim, and Jaya Kumari

Abstract The concept of the Metaverse is emerging in industry. The Metaverse is expected to be important in industrial asset management and sustainable operation and maintenance. Some of the potential uses of the Metaverse in industry relate to virtual co-creation and design, remote and virtual inspection and maintenance, skills development and training, simulation, safety, and security. Additionally, the Metaverse, integrated with Artificial Intelligence (AI) and digital technologies, will augment human perception, facilitating Human-System Interaction (HSI). Traditional HSI carries limitations regarding usability, immersiveness, and connectivity when it comes to the interaction between the virtual, augmented, and real worlds. An improved HSI in such cyberspace applications may lead to a better understanding of the system and, eventually, reduced faults. However, implementing the Metaverse in industrial contexts is challenging and has not yet been explored thoroughly and systematically. Hence, this paper aims to systematically identify and investigate the various issues and challenges in the implementation of the Metaverse in industrial contexts from an HSI perspective, and further provides a taxonomy of these issues and challenges. The research methodology has been based on literature surveys, active and passive observations, and experiments done in the eMaintenanceLAB at Luleå University of Technology. The findings of this paper can be used to increase the effectiveness and efficiency of implementing the Metaverse in various industrial contexts.

Keywords Human-system interaction · Metaverse · Industrial metaverse · Virtual reality · Augmented reality · Architecture

P. Khanna (B) · R. Karim · J. Kumari
Division of Operation and Maintenance Engineering, Luleå University of Technology, Luleå, Sweden
e-mail: [email protected]
R. Karim
e-mail: [email protected]
J. Kumari
e-mail: [email protected]


1 Introduction

The metaverse is the concept of a digital world facilitated by technologies such as virtual reality (VR) and augmented reality (AR). In the metaverse, avatars, which are virtual representations of users, can interact with each other and have an immersive experience with their digital surroundings [1]. The term metaverse was coined by Neal Stephenson in his science fiction novel "Snow Crash", published in 1992, which described it as a computer-generated, imaginary universe [2]. With the advancement of technology, this imaginary universe is transforming into a digital reality. According to the Scopus database, research related to the metaverse has seen exponential growth in recent years, especially after 2020 (Fig. 1). This analysis is based on the search term 'metaverse' and a few related terms, such as 'augmented reality', 'virtual reality', 'mixed reality', 'extended reality', 'virtual world' and 'metaverse wallet', from 1995 to 2022. Table 1 defines some of these metaverse-related buzzwords.

The metaverse offers an interaction between both worlds (physical and virtual), leading to an increased apprehension of assets in the virtual environment. It contributes to the evolution of the existing state of the internet towards a more immersive, interactive, and interconnected virtual space. It has been an area of major interest in industries such as gaming [9] and education [10], and its usage is now expanding to industries dominated by physical assets, such as transport [1], mining [11], construction [12], and railways [13]. The metaverse in an industrial context refers to the integration of digital technologies, data analytics, connectivity, and automation into industrial processes, systems, and environments.

Fig. 1 Metaverse research trend (Scopus database)


Table 1 Metaverse-related buzzwords

Search term | Definition
Augmented Reality (AR) | AR is the technology which overlays virtual (digital) content onto the real world, making the virtual objects appear in the same space as the physical world [3, 4]
Virtual Reality (VR) | VR is a fully simulated digital environment that enables users to interact and have an immersive experience [5, 6]
Mixed Reality (MR) | MR is the combination of both VR and AR technologies to create a hybrid environment where virtual and real-world objects coexist and interact in real time [7]
Extended Reality (XR) | XR is an umbrella term that encompasses technologies like AR, VR and MR. It highlights the spectrum of humans' immersive experience [8]
Metaverse Wallet | A digital or virtual wallet that enables users to manage digital assets in the metaverse

It encompasses advanced technologies, such as the Industrial Internet of Things (IIoT), artificial intelligence, machine learning, virtual reality, and augmented reality, to create a collaborative and immersive digital ecosystem within the industrial sector [14]. It aims to transform traditional industrial practices by incorporating digital technologies to optimize efficiency and enhance safety and decision-making. It deals with various components, including machines, control systems, sensors, and humans, to enable seamless communication and collaboration. The industrial applications of the metaverse have the potential to play a crucial role in remote monitoring of assets, training and support assistance, predictive maintenance, condition-based maintenance, asset management, performance optimization, and supply chain management [15]. The metaverse also enables the monitoring of key performance indicators and the identification of bottlenecks in industries. Human-System Interaction (HSI) plays a crucial role in the implementation of the metaverse in industries. HSI focuses on designing system interfaces, interactions, and experiences to facilitate effective communication and collaboration between humans and industrial systems. An improved HSI positively impacts system performance, including the aspects of resilience, safety, security, and sustainability. Additionally, well-designed interfaces in the metaverse empower users to navigate complex systems, access information, and perform tasks efficiently, leading to increased productivity, reduced faults, and user satisfaction. Traditional HSI techniques carry some limitations, such as those regarding usability, immersiveness, and connectivity, when applied to the metaverse in the industrial context. This paper explores these limitations and develops a taxonomy of the issues and challenges in the implementation of the metaverse in the industrial context from an HSI perspective. This taxonomy can help in the design and development of metaverse systems in various industries.


Fig. 2 Multidisciplinary fields in HSI

2 Human-System-Interaction (HSI)

Human-System Interaction (HSI) is a multidisciplinary field that combines aspects of computer science, human factors, cognitive psychology, sociology, and engineering. HSI aims to optimise the design and development of systems that interact with humans, focusing on the design, research, and development of interfaces between people and systems such as computers [16]. Figure 2 shows the interaction of humans with industrial systems. The human aspects include cognitive psychology, sociology, and human factors. The industrial aspects include concepts such as computer science and engineering. The industrial progress of human society has been shaped by various systems that enable human interaction across physical, digital, virtual, social, and artificial environments, as well as the interconnected interactions between these systems. The field of HSI has experienced significant growth due to advancements in digitalization [17], which have resulted in improved capabilities, efficiencies, and cost reductions. Digitalization, characterized by its ability to preserve information, has paved the way for the integration of Artificial Intelligence (AI) into HSI. AI has harnessed the vast amounts of data and information generated by digitalization to learn, reason, predict, optimize, and enhance both human-system and system-system interactions across the interconnected environments mentioned earlier [18].

3 Metaverse

3.1 Emerging Technology

As per Gartner's hype cycle for emerging technologies 2022 [19], the metaverse has entered the trend with an outlook of more than 10 years. It is seen as an evolution of the existing state of the internet, with significant potential, growing interest, and the promise of transforming our digital experiences and interactions. The COVID-19 pandemic has proven to be a catalyst in the emergence of this immersive technology [20]. When mobility around the world was restricted due to the pandemic, technologies offering remote services proved to be essential.


Different industries adapted to this new norm. The education industry shifted towards more online classes, and corporate industries became accustomed to online meetings and digital collaboration techniques. While the existing internet technologies made remote connections and collaboration possible, the need for immersive and interactive tools was observed. The metaverse offers one alternative for an immersive and interactive experience in remote connections.

3.2 Architecture of the Metaverse

The architecture of the metaverse refers to the framework and structure of the technology that enables the interaction of the virtual world, digital assets, and users, facilitating the functioning of the metaverse. Understanding the architecture is important for understanding HSI principles in the metaverse, in order to design and develop optimised and efficient system interfaces. Although the metaverse is too new a technology to have a standard architecture, a few generalised architectures are in use. Jon Radoff, CEO at Beamable, famously characterises the metaverse in seven layers: Experience, Discovery, Creator Economy, Spatial Computing, Decentralization, Human Interface, and Infrastructure [21]. Another perspective on a similar architecture, but at a higher abstraction level, is the three-layered architecture consisting of the Infrastructure, the Interaction and the Ecosystem, proposed by Duan et al. [22] (Fig. 3). This three-layered architecture follows a basic but fundamental layering approach which divides the metaverse system into distinct layers. The infrastructure layer refers to the physical environment and the ecosystem refers to the virtual environment, the interaction being the intersection, or a bridge, between these two [22]. We describe a railway industry use case developed by the AI Factory [23] to illustrate the three-layered architecture. The implementation of this platform is shown in Fig. 4. This metaverse implementation is used for the monitoring and analysis of railway assets for their effective and efficient operation and maintenance.

Fig. 3 Metaverse from a macro perspective [22]


Fig. 4 Industrial metaverse in railways context [23]

The first layer, called the 'physical world', deals with physical assets such as railway tracks, wheels, locomotives, etc. In this use case, we have taken the railway track and a wheel as the input assets. The images in the physical-world layer show the railway track and a tear on the wheel, respectively. The 3D models of these assets are fed into the devices used for interaction, such as the VR headsets and AR glasses seen in the 'interaction' layer. The output can be seen in the 'virtual world' layer as simulated digital content in VR. The two images show the virtual space created for collaborative work and the wheel and track inspection for monitoring and maintenance activities. The interaction between the physical world and the virtual world is a core aspect of the design and functionality of the metaverse. It plays the role of a bridge between both worlds; therefore, seamless integration and communication are crucial.

4 Human-System Interaction in Metaverse

Human-System Interaction (HSI) refers to the study, research, design, and development of the interaction between humans and systems. It is a multi-disciplinary field encompassing concepts from computer science, ergonomics, cognitive science, design technology, and related fields of study [24]. In the metaverse, HSI is the base on which the user experience is built. It refers to the interaction between the human and the metaverse ecosystem. The metaverse ecosystem encompasses the real world and the virtual world and consists of assets, technological devices, other users (avatars), virtual objects, AI-driven systems, and services such as virtual assistants or recommendation services. As observed in Sect. 3.2, the interaction layer in the metaverse architecture plays a pivotal role in the development of the metaverse. It is the layer that is in contact with both the physical world and the virtual world. Thus, we categorize interactions in the metaverse into two aspects: one dealing with the virtual world and the other dealing with the physical world.


Fig. 5 Social interaction in metaverse

4.1 Interaction with the Virtual World

4.1.1 Social Interaction

These interactions include engagement and collaboration with other users. It allows users to connect from all over the world thus giving a social platform to connect, collaborate and engage. It adds a sense of human connection to the virtual environment, making the metaverse an interactive space. Figure 5 shows a social interaction between avatars inspecting railway assets in AR space.

4.1.2 Environmental Interaction

These interactions refer to the user’s engagement with the virtual environment around them. It allows users to engage with the various virtual objects which creates a sense of control. It also allows users to discover the virtual world within the metaverse. It enhances the immersive experience and interactivity, which makes the metaverse experience realistic and interactive. Figure 6 shows an avatar engaging with the virtual environment and taking inspection notes.

4.1.3 Creative Interaction

These refer to activities where users focus on creativity. User-generated content in the metaverse is a good example of this: users can generate their own customized avatars for different roles, environments or even experiences [25]. Figure 7 shows customised avatars in a customised spatial environment for railway maintenance.


Fig. 6 Environmental interaction in metaverse

Fig. 7 Creative interaction in metaverse

4.2 Interaction with the Physical World

4.2.1 User Interactivity with Hardware

These interactions refer to the ways in which users engage and interact with the hardware devices in the metaverse that facilitate their participation in the virtual environment. Such devices help with the input, output, and control within the metaverse. Hardware devices like AR, VR headsets, mice, keyboards, controllers, etc. are used for user interaction.

4.2.2 User Interactivity with Software

These interactions refer to the ways users interact and engage with the various software components in the metaverse. AI-driven services like voice recognition and natural language processing enable users to have a more intuitive and hands-free way of engaging with the virtual environment.

4.2.3 Sensory Interaction

These interactions enable users to perceive the metaverse experience through their senses. This involves creating an immersive and realistic sensory experience. There are various applications of sensory interaction, such as the visual experience, the audio experience, haptic feedback using a haptic vest or gloves, and gestural interaction [26].

5 Applications of HSI in the Metaverse in Industrial Aspects

The metaverse has impacted several industrial domains, and research continues in many others. Industries such as gaming, entertainment and education have many applications of the metaverse, and its impact has also been seen in manufacturing, transport, mining, aviation, construction, etc. Some of the areas where HSI in the metaverse has been applied to industrial aspects are discussed below:

• Interaction Design and User Interfaces
This encompasses designing intuitive and user-friendly industrial system interfaces and facilitating effective collaboration and interaction between users and the virtual environment. This includes techniques such as integrating haptic feedback, gesture controls, natural language processing, shared-object interaction, and real-time communication (audio chat, text chat, etc.). HSI principles are used to make this interaction as smooth as possible, for example by optimizing the quality of signals and minimizing latency.

• Immersive Virtual Environment
HSI principles are used to develop an immersive industrial virtual environment in the metaverse. This includes developing high-quality 3D models with as many realistic features as possible to give the user an immersive experience.


• Real-Time Data Integration
In industries where real-time data is of utmost importance, such as those using sensors or IoT devices, HSI principles are used for the smooth integration and flow of real-time data into the virtual environment. This helps in synchronising the data and in decision-making activities.

• Virtual Object Control
When designing an industrial virtual space in the metaverse, certain objects require real-world concepts. For example, for some objects, physics-based concepts might be needed to model their interaction with the environment. HSI techniques help in replicating such behaviour and feedback.

• Prototyping
HSI is used in the metaverse for virtual prototyping. Users can create and modify a virtual prototype to test and optimize the design before proceeding to the physical model. This helps in improving design efficiency and reducing costs in the industrial sector.

• Visualization and Analysis
HSI principles are implemented in the metaverse to visualize and analyse complex data sets. Such applications give industries an edge in gaining deep insights, identifying patterns in the data, and making data-driven decisions.

6 Issues and Challenges

With the exponential growth of technology, the operation of industrial system interfaces is becoming more and more complex, and designing and developing the interaction between the human and the system is becoming more challenging. We have developed a taxonomy of the issues and challenges of metaverse implementation in the industrial context from an HSI perspective, identifying and categorizing these challenges into four categories: technical, ergonomic, organizational, and economic. Figure 8 shows the developed taxonomy; each category is explained below with its associated challenges.

6.1 Technical Challenges

This category encompasses the HSI challenges on the technological side of implementing the metaverse in industry. Some of the technical challenges observed are described below:


Fig. 8 Taxonomy of issues and challenges in the metaverse from the HSI perspective in industrial contexts

6.1.1 Integration of Multiple Technologies

A number of technologies are integrated to provide a better metaverse experience, and integrating them is itself a challenge. Technologies such as artificial intelligence, machine learning, cloud services, and IoT services come together to give an immersive experience, but bring their complexities along with their functionalities and capabilities. Such integration requires interfaces that handle various data formats, workflows, and scalability.

6.1.2 Multi-platform and Cross-Platform Compatibilities

A diversity of devices is used to access the virtual world, such as personal computers, smartphones, tablets, AR glasses, VR headsets, and smart wearables. These devices vary in their operating systems, screen sizes and resolutions, and operational performance. The challenge is to ensure a seamless and consistent user experience across these varying platforms and devices; another is to synchronise data so that users can transition between different platforms.

6.1.3 Big Data

In the metaverse ecosystem, systems generate and deal with data of high velocity, variety, and volume. The challenge for system interfaces is to manage and present these data in a user-friendly manner, especially when dealing with complex data structures, real-time data, and data visualisation.

6.1.4 Security and Privacy

As discussed in Sect. 6.1.3, the metaverse deals with vast amounts of data, which might include personal information, behavioural data, or other sensitive data. The challenge is to develop robust, access-controlled system interfaces to keep data secure from unauthenticated and unauthorized access, data breaches and other cybersecurity threats.

6.2 Organizational Challenges

This category encompasses the HSI challenges that an organisation deals with while implementing the metaverse in industry. Some of the organisational challenges are described below:

6.2.1 System Complexity

Due to the integration of multiple technologies in the metaverse, the system is becoming more and more complex. Designing and developing interfaces for such complex technology requires organisational decisions such as resource allocation, workforce hiring, and technological upgrades. Training workers on complex industrial procedures is also essential, and designing effective training interfaces that simulate real-world behaviour and provide hands-on experience can be challenging.

6.2.2 Integrating with Legacy Systems

Most existing industrial systems rely heavily on legacy systems and infrastructure. Now that the metaverse is being implemented in industries, developing interfaces that facilitate integration with existing industrial systems poses challenges in compatibility, data exchange, and system integration.

6.2.3 Safety and Risk Management

Industrial system interfaces should be fail-safe and support error handling. Organizations should effectively manage threats and ensure the safety and well-being of users. This may include adopting new safety measures and complying with regulations.


6.3 Ergonomic Challenges

This category encompasses the HSI challenges in the industrial metaverse related to designing interfaces, systems, and interactions to optimize user experience, comfort, and efficiency. Some of the ergonomic challenges are described below:

6.3.1 User Comfort

One of the most common ergonomic challenges is to ensure user comfort. Immersive experiences in the metaverse can induce motion sickness or simulator sickness in some users. These are caused by rapid movements or visual discrepancies. Visual fatigue and eye strain are also common issues that users face.

6.3.2 Accessibility and Inclusivity

Industrial systems can be quite complex in the metaverse, thus designing interfaces and interactions that can be accessed by all users regardless of parameters like age, experience, and disabilities can be a challenge. Integration of metaverse systems with assistive technologies like screen readers, alternative input methods, etc. is also a challenge.

6.3.3 User Adaptability and Acceptance

Introducing the metaverse to workers, especially industrial workers, requires their acceptance and adaptation. The most common barriers to accepting and adapting to new technology, in this case virtual environments, are concerns about job security, resistance to change, and unfamiliarity with the technology. Designing user-specific system interfaces is a further challenge.

6.3.4 Cognitive Load and Information Overload

Complex integrated industrial metaverse systems can be information-intensive, processing and interpreting vast amounts of data. Presenting this information effectively within the metaverse is difficult; doing so without overwhelming users with cognitive load or information overload is harder still. System interfaces must therefore be designed to deliver concise information with intuitive navigation.


6.4 Economic Challenges

This category encompasses the HSI challenges in the metaverse related to the financial obstacles that organizations and individuals may face in industry. Some of the economic challenges are described below:

6.4.1 Realistic Experience

In industry, the metaverse is often used for simulation, training, and design. Realistic virtual environments are crucial for immersive experiences and for effective learning, testing, and decision-making. Creating such realism can require high-end digital infrastructure for physics-based interactions and realistic graphics, which can be a financial challenge.

6.4.2 Hardware and Devices

Participating in the metaverse requires specialised hardware such as VR headsets, AR glasses, and haptic gloves or vests. Such devices can be costly, and the availability of compatible hardware can affect the adoption and use of the metaverse within an organisation.

7 Conclusions

In this paper, we have explored the implementation of the metaverse for industrial applications related to asset management, operation, and maintenance, where human-system interaction plays a crucial role in issue detection, diagnosis, and equipment monitoring. We categorised interactions in the metaverse into two aspects: one deals with the virtual world and the other with the physical world. Looking at the interaction layer in the metaverse architecture, we distinguished between interaction with the virtual world and interaction with the physical world to give a distinct understanding of HSI in the metaverse. Furthermore, we developed a taxonomy of the issues and challenges in implementing the metaverse in an industrial context from an HSI perspective. This taxonomy will help gain deeper insight into the requirements for a more interactive and immersive virtual environment, and it will also help assess the technology readiness level in industry for adopting metaverse technology. Future work in this field includes a deeper understanding of these challenges to develop practical solutions and strategies for seamless and optimized human-system interaction in industrial contexts.

Acknowledgements We gratefully acknowledge the European Commission for its support of the Marie Sklodowska Curie program through the H2020 ETN MOIRA project (GA 955681). We also acknowledge the valuable support and resources provided by the eMaintenanceLAB in conducting this research.

References

1. Njoku JN, Nwakanma CI, Amaizu GC, Kim DS (2023) Prospects and challenges of Metaverse application in data-driven intelligent transportation systems. IET Intel Transport Syst 17(1):1–21. https://doi.org/10.1049/ITR2.12252
2. Stephenson N (1992) Snow crash: a novel. Spectra
3. Azuma R, Baillot Y, Behringer R, Feiner S, Julier S, MacIntyre B (2001) Recent advances in augmented reality. IEEE Comput Graph Appl 21(6):34–47. https://doi.org/10.1109/38.963459
4. Azuma RT (1997) A survey of augmented reality. Presence Teleoper Virtual Environ 6(4):355–385. https://doi.org/10.1162/PRES.1997.6.4.355
5. Dincelli E, Yayla A (2022) Immersive virtual reality in the age of the metaverse: a hybrid-narrative review based on the technology affordance perspective. J Strateg Inf Syst 31(2):101717. https://doi.org/10.1016/J.JSIS.2022.101717
6. Zheng JM, Chan KW, Gibson I (1998) Virtual reality. IEEE Potentials 17(2):20–23. https://doi.org/10.1109/45.666641
7. Buhalis D, Karatay N (2022) Mixed reality (MR) for generation Z in cultural heritage tourism towards metaverse. Inf Commun Technol Tourism 2022:16–27. https://doi.org/10.1007/978-3-030-94751-4_2
8. Çöltekin A et al (2020) Extended reality in spatial sciences: a review of research challenges and future directions. ISPRS Int J Geo-Inf 9(7):439. https://doi.org/10.3390/IJGI9070439
9. Nevelsteen KJL (2018) Virtual world, defined from a technological perspective and applied to video games, mixed reality, and the metaverse. Comput Animat Virtual Worlds 29(1):e1752. https://doi.org/10.1002/CAV.1752
10. Hwang GJ, Chien SY (2022) Definition, roles, and potential research issues of the metaverse in education: an artificial intelligence perspective. Comput Educ Artif Intell 3:100082. https://doi.org/10.1016/J.CAEAI.2022.100082
11. Liu K, Chen L, Li L, Ren H, Wang FY (2023) MetaMining: mining in the metaverse. IEEE Trans Syst Man Cybern Syst:1–10. https://doi.org/10.1109/TSMC.2022.3233588
12. Kwok TK, Sustainable engineering paradigm shift in digital architecture, engineering and construction ecology within metaverse. https://www.researchgate.net/publication/359892741. Accessed 01 Jun 2023
13. Global Railway Review. https://edition.pagesuite-professional.co.uk/html5/reader/production/default.aspx?pubname=&edid=dfe528ba-cab6-46cb-914b-d9fbd5cf44b6. Accessed 01 Jun 2023
14. Lee J, Kundu P (2022) Integrated cyber-physical systems and industrial metaverse for remote manufacturing. Manuf Lett 34:12–15. https://doi.org/10.1016/J.MFGLET.2022.08.012
15. The Industrial Metaverse. https://ignite.microsoft.com/en-US/sessions/d12206d9-ee2d-4d99-9dfb-dedd50bf7f0a?source=/home. Accessed 29 May 2023
16. Human System Interaction | GE Research. https://www.ge.com/research/technology-domains/artificial-intelligence/human-system-interaction. Accessed 27 May 2023
17. Ao SI, International Association of Engineers (2009) World congress on engineering and computer science: WCECS 2009, 20–22 Oct 2009. Newswood Ltd., International Association of Engineers, San Francisco, USA
18. De Silva D, Nawaratne R, Ruminski J, Malinowski A, Manic M (2022) Human system interaction in review: advancing the artificial intelligence transformation. In: International conference on human system interaction, HSI, vol 2022. https://doi.org/10.1109/HSI55341.2022.9869473


19. 3 exciting new trends in the Gartner emerging technologies hype cycle. https://www.gartner.com/en/articles/what-s-new-in-the-2022-gartner-hype-cycle-for-emerging-technologies. Accessed 10 May 2023
20. Wang Y, Lee LH, Braud T, Hui P (2022) Re-shaping post-COVID-19 teaching and learning: a blueprint of virtual-physical blended classrooms in the metaverse era. In: Proceedings of the 2022 IEEE 42nd International conference on distributed computing systems workshops, ICDCSW 2022, pp 241–247. https://doi.org/10.1109/ICDCSW56584.2022.00053
21. Radoff J (2023) The metaverse value-chain. Building the Metaverse, Medium. https://medium.com/building-the-metaverse/the-metaverse-value-chain-afcf9e09e3a7. Accessed 09 May 2023
22. Duan H, Li J, Fan S, Lin Z, Wu X, Cai W (2021) Metaverse for social good: a university campus prototype. In: MM 2021 - Proceedings of the 29th ACM international conference on multimedia, pp 153–161. https://doi.org/10.1145/3474085.3479238
23. Karim R, Galar D, Kumar U (2023) AI factory: theories, applications and case studies. CRC Press
24. Wickramasinghe CS, Marino DL, Grandio J, Manic M (2020) Trustworthy AI development guidelines for human system interaction. In: International conference on human system interaction, HSI, vol 2020, pp 130–136. https://doi.org/10.1109/HSI49210.2020.9142644
25. Dwivedi YK et al (2022) Metaverse beyond the hype: multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy. Int J Inf Manage 66:102542. https://doi.org/10.1016/J.IJINFOMGT.2022.102542
26. Kour R, Castaño M, Karim R, Patwardhan A, Kumar M, Granström R (2022) A human-centric model for sustainable asset management in railway: a case study. Sustainability 14(2):936. https://doi.org/10.3390/SU14020936

Development of a Biologically Inspired Condition Management System for Equipment

Maneesh Singh, Knut Øvsthus, Anne-Lena Kampen, and Hariom Dhungana

Abstract Biomimicry is an approach for solving industrial challenges by studying similar cases in nature and emulating bio-organisms' responses. It thus helps to solve modern-day technological problems using solutions that bio-organisms have successfully used over the course of millions of years. In an ongoing research project, investigations are being carried out to explore the use of the biomimicry approach for developing a framework for a human-centric condition management system. This framework is inspired by the knowledge of human cognition. It is expected that the system will be able to utilize various data and integrate it with analytical models and knowledge-based systems to help equipment diagnose itself and recommend optimised operation and maintenance programs. This paper describes the proposed framework for this human-centric condition management system.

Keywords Artificial intelligence · Biomimicry · Cognition · Condition monitoring · Failure · Predictive maintenance · Safety · Security

M. Singh (B) · K. Øvsthus · A.-L. Kampen · H. Dhungana Western Norway University of Applied Sciences, Inndalsveien 28, 5063 Bergen, Norway e-mail: [email protected] K. Øvsthus e-mail: [email protected] A.-L. Kampen e-mail: [email protected] H. Dhungana e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_23


1 Introduction

Biomimicry is an approach for solving industrial challenges by studying similar cases in nature and emulating bio-organisms' responses. Thus, solutions that have been successfully developed and tested by bio-organisms over the course of millions of years are used as inspiration to solve modern-day technological problems. In the last few years, biomimicry has generated considerable research interest. The concepts developed from these studies have helped improve the efficiency of various equipment and structures. Some successful examples of the application of biomimicry include [1]:

- Shinkansen bullet train (Japan): the front end of the train is modelled after the beak of the kingfisher to reduce air drag and noise.
- Eastgate Building (Harare, Zimbabwe): the internal climate control system is modelled after termite mounds to allow a natural draft of air.
- "Painless" needles: injection needles are modelled after the mosquito's proboscis (mouth) to make easily inserted needles.
- Underwater communication: modelled after dolphins' underwater communication method to develop reliable multi-frequency data transmission.

An important aspect of the survival of bio-organisms is their ability to regulate their processes in a changing environment and to protect and maintain themselves against internal and external attacks. Learning the cognitive processes by which bio-organisms carry out these tasks can help in developing a cognitive layer for a condition management system. Such a system can support self-preservation by enabling equipment to diagnose itself and recommend optimised operation and maintenance programs. This paper presents the ideas behind an ongoing project that intends to develop a condition management system (diagnosing and recommending optimised operation and maintenance programs), inspired by the human cognitive system, for assets in the manufacturing, process, and infrastructure industries.

2 Proposed Framework

The human body is continuously subjected to numerous attacks, and to protect and maintain itself it follows a number of routines. It does so by adopting protective and maintenance measures, such as:

1. Identification of threats (who, where, how) using sight, hearing, and smell;
2. Identification of attacks (who, where, how) using touch and taste;
3. Analysis of damage caused, by understanding pain, blood flow, etc.;
4. Identification of failures to protect itself from threats, by understanding blood flow, sickness, etc.;
5. Repair and replacement, by generating antibodies, blood clotting, and repairing tissues;
6. Condition assessment, to check the progress of repair and replacement.


A close study of the human body shows that it has distinct similarities to cyber-physical systems. Both have:

- a physical body (human body versus equipment/structure);
- sensors for collecting data (sense organs versus sensors);
- a data transfer mechanism (nervous system versus digital data transportation and storage);
- a memory and analytics section (brain versus data analytics).

Thus, it may be possible to learn how the human body protects and maintains itself against external attacks and apply those concepts to developing a condition management system for assets (equipment or structures). Such a condition management system can possibly be developed by systematically:

1. Connecting the physical process (replicating the body), monitoring systems (replicating sense organs), data transmission networks (replicating the neural network), and decision support system (replicating the brain).
2. Considering (a) the equipment; (b) operating conditions; (c) vulnerabilities; (d) potential threats, attacks, and damages; (e) condition monitoring; (f) failure profile; and (g) inspection-maintenance planning.
3. Providing decision support (classical and AI based) by considering various aspects such as:
   a. prioritization of reliable sensor data under different operating conditions;
   b. rejection of irrelevant or irrational data;
   c. analysis (processing-reasoning) of data;
   d. rationalizing, interpretation, and elaboration of data;
   e. learning from history for continuous improvement.

Figure 1 shows a schematic representation of the framework, which is inspired by the working of the human brain. A brain consists primarily of two parts: the "old brain" and the "new brain" (neocortex, or "new outer layer"). All animals have the old brain, which is made up of many structurally different components or organs performing specific tasks, for example premeditation and impulsive aggression (amygdala), basic movements (spinal cord), and digestion and breathing (brain stem). In addition to the old brain, mammals have an additional component, often referred to as the "new brain" or neocortex. Unlike the old brain, the neocortex is composed of one large structure that looks similar throughout, but different regions of the neocortex perform different functions related to vision, hearing, touch, speech, taste, thought, etc. [2]. In the proposed framework, inspired by the working of the human brain, there are two distinct components: the Primary Component (corresponding to the "old brain") and the Secondary Component (corresponding to the "new brain" or neocortex).


Fig. 1 Framework for biologically inspired condition management of industrial processes

The study of human cognition is extremely vast and complicated. Over the decades, multiple levels of analysis covering biological, neurological, sociological, and functional aspects of human behaviour have been carried out to understand it. While we are aware that a number of parallel and conflicting theories have been proposed to explain different aspects of the human cognition system, we have often taken simplified versions of these theories and at times adopted a mix-and-match approach in using them. Thus, while appreciating the complexity of the subject, we have taken inspiration from human cognition in developing a framework for a condition management system. This work is still under development and many important features are still missing.

2.1 Structural Attributes

Every physical object, biological or equipment, has a unique structure that is determined by its genes or its construction/manufacturing. The structure of an equipment can be characterized by its physical dimensions, materials of construction, protections (e.g., coatings and linings), insulation, lubrication, etc. These structures also come equipped with sense organs/sensors that help the organism/equipment experience its health condition. For example, in an offshore wind turbine, parameters like generator bearing temperature, hydraulic oil temperature, and gearbox oil temperature can help to diagnose components' health.


2.2 Environmental Attributes

Every physical object, biological or equipment, operates in an environment that not only influences its performance but also subjects it to a number of environmental attacks by degrading mechanisms such as corrosion and erosion. These attacks may damage and significantly reduce the structural integrity of process equipment or structures. For example, operating environment parameters like wind speed, ambient temperature, pressure, and humidity can influence the degradation characteristics of an offshore wind turbine.

2.3 Operational Attributes

During its operational lifetime, degrading mechanisms such as wear, tear, and deformation subject a bio-organism or an equipment to a number of operational attacks. These attacks target the vulnerabilities of the equipment, resulting in damage. For example, in an offshore wind turbine, some of the operating parameters are rotor pitch angle, rotor RPM, and generator RPM.

2.4 Primary Component ("Old Brain")

The old brain is hard-wired to control the basic behaviour and functions performed by an organism. This includes the innate behavioural styles and patterns of eating, sleeping, reflexes, feelings, emotions, desires, etc. These basic skills are coded in its genes. Most of these activities can take place without conscious decisions from the new brain [2]. Just as a non-mammal can function without a neocortex, an important safety feature of the proposed condition management system is the robust and safe functioning of its critical elements even in the absence of inputs from the Secondary Component.

2.4.1 Needs, Goals and Motivations

In an animal, the old brain is the seat of needs, goals, and motivations. According to cognitive theories, the contents of these needs, goals, and motivations exist as knowledge (cognitive) representations in memories. The relevance, and hence utilization and success, of any product or service depends upon one core question: to what extent does it satisfy existing needs? This puts needs at the centre of any activity; the need-fulfilment aspiration drives the motivation system of all activities.


Fig. 2 Maslow’s theory of needs

According to Abraham Maslow’s Theory of Needs, a person has five basic needs that have to be satisfied (Fig. 2). Even though he himself never represented this theory in form of a hierarchical triangle, the theory is best known in that format [3]. After necessary modifications, Maslow’s hierarchy of needs can also be used for explaining the motivation of carrying out condition management of equipment. First in the hierarchy is the physiological need that are related to the survival of the equipment. The next level is the need for safety, protection and security that ensure the proper health of the equipment. The next two layers comprise of the psychological needs. Last layer is that of self-fulfilment needs. We have made the following changes to the application of the Maslow’s Theory of Needs: 1. Basic Needs are physical and not psychological 2. Esteem Needs are not relevant 3. Belongingness Needs are physical, not psychological and refer to the needs of working as a part of a system (network of equipment) 4. Self-fulfilment Needs reflect the purpose of having the equipment 5. The needs are not organized in a hierarchy, rather all the four needs (Physiological, Safety / Security, Belongingness and Self-fulfilment) have to be satisfied to some extent. 6. The optimum degree of satisfaction for individual needs is not static, it depends upon the corporation strategy and environment. Thus, the four needs of survival, well-being and function of the equipment, may be interpreted as a need to perform its required function effectively and efficiently without adversely affecting health, safety and environment. According to Fishbach & Ferguson a goal is a cognitive representation of a desired endpoint that impacts evaluation, emotions and behaviours [4]. It directs an organism’s thoughts, feelings, decisions, and behaviours. An understanding of “cognitive representation” of goals can help in understanding various aspects of the goal settings. Figure 3 shows an example of hierarchical structure of goals. Every living organism has two existential goals—shortterm goal (survival) and long-term goal (passing on genes to next generation). To meet the goal of survival, the organism in turn has two sub-goals—self-generation (growing of self from birth until death) and self-maintenance. Self-maintenance entails self-protection (identifying threats, avoiding attacks, reducing damage) and self-preservation (resorting to repair and replacement).


Fig. 3 Goals of living organisms

Analogous to this, the Primary Component has modules that contain the decision-making goals for operation (corresponding to self-generation in an organism) and maintenance (corresponding to self-preservation in an organism) of process equipment. Simpson and Balsam define motivation as the energizing of behaviour in pursuit of a goal [5]. It is fundamental to our interaction with the environment around us. For example, cues regarding the availability of food, a requirement for the goal of survival, may energize (motivate) an organism to take food-seeking actions. A number of factors, like physiological condition, environmental condition, and experience, influence the degree of motivation. The final decision taken by a person is the outcome of a cost–benefit analysis involving all the factors and processes that can potentially influence the pursuit of the goal. Similarly, in the condition management of equipment, cost–benefit analysis is central to any optimisation of operation or maintenance activities. A process equipment, like a pump or a pipe, is just a physical functional structure whose purpose and identity are defined by its role in the process network to which it is connected. It has no inherent purpose, need, goal, or motivation, but derives them from the ideas associated with it by its manufacturers or users. Thus, it reflects the intents of its usage and is devoid of any inherent goals, needs, or motivations. Development of a human-inspired cognitive condition management system entails a proper understanding of the relationship between humans, equipment, and environment. Any pairwise (humans-equipment, equipment-environment, and humans-environment) study will only provide a partial understanding of the requirements and offer limited solutions. The addition of a cognitive human-machine platform could provide the equipment with limited capabilities to autonomously condition-manage itself.

2.4.2 Safety Versus Security Issues

The main objective of the framework is to reduce issues related to the safety and security of an equipment. To meet this objective, it is essential to understand the difference between the two and then create strategies to handle them.


In the context of this framework, the major features distinguishing safety and security issues are the identities of the subject (the entity that performs an action) and the object (the entity on which the action is performed). Thus [6, 7]:

- In safety issues, the equipment is the subject and the environment (natural environment, system, humans, corporation, etc.) is the object; failure of the equipment can adversely affect the environment.
- In security issues, the environment is the subject and the equipment is the object; an attack by the environment can adversely affect the equipment.

Figure 4 illustrates the differences between safety and security issues in condition management. Every bio-organism or equipment needs to overcome its vulnerabilities in order to survive. An understanding of these vulnerabilities and the associated safety and security issues can help to reduce the risks associated with failure.

Fig. 4 Safety versus security issues


Similar to humans, industrial assets (equipment or structures) need to be aware of both safety and security issues. For its security, an equipment needs to detect, identify, and mitigate threats even before an attack takes place. If the attacker manages to breach security and exploit the system's vulnerability, there is a need to detect, identify, and limit the damage caused by the attack. For its safety, an equipment needs to detect, identify, and mitigate degradation mechanisms caused by its own operation.

2.4.3 Memories for Procedural Decision Making

The old brain carries a cache of "best practices" that it has either inherited or acquired over a long period [2, 8]. Similarly, the Primary Component, which is inspired by the old brain, has a Memories Region containing predetermined rules and reasoning. These rules, while not optimal, are good enough to take the first decision regarding the safety of the operation. In order to generate and store these rules and reasoning, there have to be schemas for an expert system for learning, organising, and suggesting tasks associated with recognised features in the equipment-environment-corporate system. Inefficient schemas will severely restrict the efficiency and efficacy of the expert system.

2.4.4 Awareness and Attention

Real-time data from sensors first enters the Awareness and Attention Region of the Primary Component. Here the data is pre-processed, cleaned, and then compared against baseline values to detect any anomaly (analogous to catching attention). Anomalous data results in activation of the Procedural Decision Making Region (Fig. 1). A minimal illustration of this gating step is sketched below.
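The paper does not specify how the comparison against baseline values is performed; the following Python sketch is only one plausible illustration, in which the baseline statistics and the k-sigma deviation rule are assumptions rather than details taken from the framework.

```python
import numpy as np

def awareness_gate(reading: np.ndarray, baseline_mean: np.ndarray,
                   baseline_std: np.ndarray, k: float = 3.0) -> bool:
    """Compare a cleaned sensor reading against baseline values and report
    whether it should 'catch attention', i.e. activate the Procedural
    Decision Making Region. k is a hypothetical sensitivity factor."""
    deviation = np.abs(reading - baseline_mean)
    return bool(np.any(deviation > k * baseline_std))
```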

2.4.5 Procedural Decision Making

When an animal encounters an incident or accident, it takes procedural decisions to reduce reaction time. These decisions are often characterised as "reflexive actions" and are based on a combination of recognition processes and their associated decisions [9]. Similarly, the Procedural Decision Making Region uses schemas present in the Memories Region to identify failure modes and the type of damage. This knowledge is used to estimate safe working conditions for the damaged equipment.


2.5 Secondary Component ("New Brain")

The "new brain", or neocortex, with its knowledge and understanding of the world, facilitates the old brain in the pursuit of its survival goals. Thus, the neocortex is only an enabler for motivations that arise in the old brain, although it can also set its own goals and motivations based on needs. Additionally, since the neocortex is not connected directly to the sensors and cannot directly control movement, it relies on the old brain for signals from the sensors and for control over actions [2]. Reflecting the working of the neocortex, on detection of an anomaly the Secondary Component carries out detailed analysis. The main purpose of the Secondary Component is to carry out detailed analysis, recommend optimized operation control, and recommend an inspection-maintenance schedule.

2.5.1 Memories for Deliberative Decision Making

At birth, the neocortex does not have any knowledge about the environment in which the host person lives. As time passes, the signals that come from the sensors via the old brain are not stored in the neocortex as a library of facts; rather, this learning takes the form of predictive models in the neocortex. Thus, mammals are not born with models in their neocortex but have the ability to create them by learning. As an animal explores the changing world around it, it makes new models and updates the old ones. This way the brain makes, remembers, and reuses hundreds of thousands of models to give a "feel" of the real world [1]. Just as the neocortex has several models to represent the environment, the Memories Region of the Secondary Component has a number of models to represent the process equipment. These models analyse the data coming from sensors to interpret the real-time condition of the process equipment, predict its behaviour, and control it. Depending on the requirements of the process equipment, different types of models are used, including computational fluid dynamics (CFD) models for predicting the flow of fluids, finite element analysis (FEA) for structural analysis, degradation models for loss of integrity, etc. These models can complement each other so that the condition of the equipment is interpreted not from a single source of data and a single model, but from a number of heterogeneous sensors and complementary models.

2.5.2 Failure Profile Analysis

Failure Profile Analysis examines the details of the different types of failure that an equipment might experience during its operational life. This provides an understanding of the cause, mechanisms, modes, and type of failure (Fig. 5). It can be performed using several methods and tools, such as Failure Mode and Effects Analysis (FMEA), Fault Tree Analysis (FTA), and Root Cause Analysis (RCA) [9]. Each of these approaches has its own capabilities and limitations, and the choice of approach depends on the system's needs and objectives.

Fig. 5 Failure profile of humans and equipment

2.5.3 Risk Assessment

According to ISO 31000, risk assessment is the overall process of risk identification, risk analysis, and risk evaluation [10]. Thus, it is carried out in three steps:

- Risk Identification
- Risk Analysis
- Risk Evaluation

The first step, Risk Identification, is carried out in the Primary Component under Damage Identification. For Risk Analysis, the Likelihood of Failure is evaluated using the necessary data from real-time sensors, inspection reports, etc., together with appropriate models stored in the Memories of the Secondary Component. The Consequences of Failure for safety, economic, and environmental aspects are mostly evaluated using rules, incidental memories, etc. present in the Memories of the Primary Component. Finally, the results of the Risk Analysis and the Risk Acceptance Criteria stored in the Memories of the Secondary Component are used for Risk Evaluation.

2.5.4 Deliberative Decision Making

Deliberative decision-making is a slow and computationally intensive process that requires sufficient data and computational power. Hence, the Deliberative Decision Making Region takes only the critical decisions and gives detailed output regarding failure prognostics and optimized operation control, and recommends an inspection-maintenance schedule. Humans carry out deliberation by means of "episodic future thinking", which takes place in two steps: (a) imagination, to create a representation of a future scenario; and (b) evaluation of that scenario. The consequences of each possible future scenario are then compared and the best option is selected [8]. Similarly, for a particular goal, the Deliberative Decision Making Region considers various possible scenarios and evaluates the corresponding risks and outcomes in terms of safety, cost–benefit, etc. These results are then compared, and the best option is used to optimise and control the process of the damaged equipment and to recommend an inspection-maintenance schedule.

2.6 Arbitrator Component

The old brain and the new brain are not entirely separate organs; they coordinate and work as a team. At times, decisions made by the two may come into conflict and need to be resolved, for example the conflict between the new brain's goal to hold the breath for long and the old brain's need to provide oxygen to the body. More often, the old brain's decision prevails over the new brain's [2]. In the proposed framework, an Arbitrator Component works to resolve decisions based on conflicting goals. This component receives decisions from the Primary Component and the Secondary Component. At times, because of the optimisation among multiple goals, the Secondary Component may generate multiple decisions. If the differences between the decisions exceed some pre-set thresholds, arbitration among the decisions is carried out using an Arbitration Methodology [11]. A minimal sketch of how such threshold-based arbitration could look is given below.
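The paper defers the details to the cited Arbitration Methodology [11]; purely as an illustration, a threshold-based arbiter could look like the following Python sketch, in which the decision representation (a single scalar setpoint), the function name, and the threshold value are hypothetical assumptions.

```python
def arbitrate(primary: float, secondary: list[float], threshold: float = 0.2) -> float:
    """Resolve the safety-weighted Primary decision against the
    optimisation-weighted Secondary decisions (hypothetical sketch)."""
    candidates = [primary] + secondary
    spread = max(candidates) - min(candidates)
    if spread <= threshold:
        # No real conflict: the decisions agree within the pre-set threshold,
        # so a simple average is acceptable.
        return sum(candidates) / len(candidates)
    # Conflict: mirror the biological bias, where the safety-oriented
    # old brain's decision usually prevails over the new brain's.
    return primary
```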

3 Conclusions

The application of biomimicry to solving technical challenges is relatively new. To the best of our knowledge, the use of biomimicry for optimising the inspection and maintenance schedule of equipment has not been explored. This paper describes a framework, inspired by the human cognition system, for developing a condition management system. It is expected that the system will be able to utilize real-time data and integrate it with a knowledge-based system so that it can diagnose faults and recommend optimised operation and maintenance programs for damaged equipment. It is expected that the framework can help to:

- explicitly address the setting of different goals and objectives for operation and maintenance tasks, so as to facilitate decision making under multiple and often conflicting requirements;
- handle different types of data, information, and knowledge (numerical, pseudo-numerical, linguistic, etc.) for decision making in a systematic and structured manner;
- provide access to various mathematical and statistical tools where needed;


- take decisions in a structured way by integrating:
  – procedural decision making, which recommends decisions based on limited data/information but weighted towards the safety and security of the process;
  – deliberative decision making, which recommends decisions based on detailed analysis and weighted towards optimisation of the process;
  – argumentative decision making, which takes into account multiple and at times conflicting goals or recommendations (procedural and deliberative).

References

1. Biomimicry Institute. https://biomimicry.org/. Accessed 10 Feb 2023
2. Hawkins J (2021) A thousand brains: a new theory of intelligence. Basic Books, New York
3. Bridgman T, Cummings S, Ballard J (2019) Who built Maslow's pyramid? A history of the creation of management studies' most famous symbol and its implications for management education. Acad Manage Learn Educ 18(1):81–98
4. Fishbach A, Ferguson MJ (2007) The goal construct in social psychology. In: Social psychology: handbook of basic principles, pp 490–515
5. Simpson EH, Balsam PD (2016) The behavioral neuroscience of motivation: an overview of concepts, measures, and translational applications. Curr Top Behav Neurosci 27:1–12
6. Firesmith DG (2003) Common concepts underlying safety, security, and survivability engineering. Technical Note CMU/SEI-2003-TN-033, Carnegie Mellon University
7. Bartnes M (2006) Safety versus security? In: Proceedings of the 8th International conference on probabilistic safety assessment and management, 14–18 May 2006, New Orleans, Louisiana, USA
8. Redish AD, Schultheiss NW, Carter EC (2016) The computational complexity of valuation and motivational forces in decision-making processes. Curr Top Behav Neurosci 27:313–333
9. International Organization for Standardization (ISO) (2019) IEC 31010:2019, Risk management—risk assessment techniques
10. International Organization for Standardization (ISO) (2009) ISO 31000:2009, Risk management—principles and guidelines
11. Fridman L, Ding L, Jenik B, Reimer B (2018) Arguing machines: human supervision of black box AI systems that make life-critical decisions. Accessed 27 Feb 2023

Application of Autoencoder for Control Valve Predictive Analytics

Michael Nosa-Omoruyi and Mohd Amaluddin Yusoff

Abstract In this paper, we investigated the application of an autoencoder neural network for predictive analytics of control valves, which are crucial components in industrial processes with significant consequences in case of failure. The autoencoder was created using Python and the Keras deep learning framework, comprising encoding and decoding sections. By computing the difference between the input sensor data and its reconstructed output, referred to as the reconstruction error, we were able to identify anomalies. The result, which is based on data from an actual asset, was compared with a random forest regressor to confirm the effectiveness of the approach. We have also proposed a practical approach to generate alerts when the magnitude of the error exceeds a predefined threshold, thereby enabling proactive maintenance and avoiding unplanned shutdowns. We emphasized the diagnostic capability of the autoencoder in identifying anomalous sensors, which is not present in traditional regression approaches, and argued that this capability is even more valuable for complex equipment with many input sensors. The proposed approach can be further improved to provide prognostic capability by forecasting the trend of the reconstruction error.

Keywords Predictive analytics · Autoencoder · Reconstruction error

1 Introduction

In a control valve [1, 2], the valve body changes the size of the flow passage by blocking or allowing the flow of fluid through the valve depending on the position of the valve, which is controlled by the actuator. The actuator receives an electrical,

M. Nosa-Omoruyi (B) Department of Computer Science, University of Port Harcourt, Port Harcourt, Nigeria e-mail: [email protected]
M. A. Yusoff Digital and Innovation Department, Nigeria LNG Limited, Port Harcourt, Nigeria e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_24


pneumatic, or hydraulic signal and converts it into mechanical movement, which is used to open or close the valve. As a mechanical device that can be operated remotely and automatically, the control valve is used to control variables such as flow rate, temperature, and pressure in industrial processes. Given the importance of the valve in industrial processes, extensive research has been conducted to identify valve failures using predictive analytics. The valve is a critical component, and failure can result in decreased production and system reliability [3, 4]. Early detection and diagnosis of failures are therefore critical. Predictive analytics [5–11] can help identify potential failures before they occur, allowing users to schedule maintenance and repairs ahead of time and avoid unexpected failures and downtime. The predictive analytics used in this paper forecasts the valve's behavior and performance using data together with machine learning and deep learning algorithms. These models and algorithms analyze data from the valve's sensors to identify patterns and trends that indicate when the valve is likely to fail or when its performance deviates from normal, as well as to identify the root cause of anomalies. The valve's sensor data includes parameters such as flow rate, pressure, temperature, and vibration. In this paper, we applied an autoencoder for valve predictive analytics. By analyzing sensor data responsible for monitoring the state of the valve system and modifying the architecture of the autoencoder [12–15], we were able to create compact representations of the sensor data, which were used to identify patterns and trends indicating when the performance of the valve deviated from normal (anomaly detection). These compact representations can be thought of as a compressed version of the sensor data that captures the most important information for predictive analytics. The rest of the paper is structured as follows. Section 2 introduces related work, first on the use of deep autoencoders for intelligent surveillance of electric submersible pumps [12] and then on valve predictive analytics using a random forest regressor. Section 3 describes the workflow, the architecture of the autoencoder for valve predictive analytics, and the determination of the anomaly threshold. Section 4 presents illustrative examples of the results, followed by a detailed discussion in Sect. 5. Section 6 contains the conclusion of the applied approach. Finally, Sect. 7 details our future work.

2 Related Work

In this section, we begin by summarizing how a deep autoencoder has been used to intelligently monitor electric submersible pumps (ESPs). The autoencoder network was trained on a two-year historical dump of stable operating data from 97 sensors. As a result, the model was able to learn the ESP's stability patterns. It was applied to sensor data surrounding an ESP to reveal insights into the operating conditions that cause these systems to trip and fail. This solution is currently used to intelligently monitor ESPs in near real time and to send email alerts whenever a sensor deviates from stability [12].


Fig. 1 Obtained result from the random forest regression model

To predict deviations in the valve, a random forest regressor [16, 17] was used to model its behavior. The algorithm, which works by constructing multiple decision trees and averaging their outcomes, was trained on the sensor data. It divided the sensor data into subsets based on the attribute with the highest information gain, a measure of how much randomness or uncertainty is reduced in the data as a result of the split. This reduced the randomness of the sensor data and improved prediction accuracy. As shown in Fig. 1, the random forest regressor was able to detect a deviation in the valve around March 2021 (based on data of an actual asset). A hedged sketch of this baseline is shown below.
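The following scikit-learn sketch shows one way the described regression baseline could be set up; the file name, feature tags, and target variable are hypothetical, since the paper does not disclose them.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical file and column names; the actual tags are not disclosed.
df = pd.read_csv("valve_sensors.csv", parse_dates=["timestamp"]).dropna()
features = ["flow_rate", "pressure", "temperature"]   # example input tags
target = "valve_position"                             # example modelled variable

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(df[features], df[target])

# A growing gap between prediction and measurement marks a deviation,
# such as the one visible around March 2021 in Fig. 1.
df["deviation"] = (model.predict(df[features]) - df[target]).abs()
```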

3 Workflow

The machine learning workflow was written in Python, and the autoencoder network was built with the Keras deep learning library. The data collected was a time series from the sensors of the valve. After cleaning the sensor data by removing missing values, the MinMax scaler was used to scale the features of the sensor data to a specific range. After splitting the data into training and testing sets, the modified network was trained on the training set to learn patterns and relationships in the data that are representative of the entire data set. After checking for overfitting and underfitting, the performance of the network was evaluated by calculating the mean squared error (MSE) of the difference between the input and the output. The parameters of the autoencoder network are detailed in the Results section. The following sketch illustrates the data preparation steps.
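A minimal Python sketch of these preparation steps, assuming a hypothetical CSV export of the valve's sensor tags:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Hypothetical file name; each column is one valve sensor tag.
data = pd.read_csv("valve_sensors.csv", index_col="timestamp").dropna()

scaler = MinMaxScaler()                       # scale every sensor to [0, 1]
scaled = scaler.fit_transform(data.values)

# Keep the time order (no shuffling) so the testing period follows the
# training period, as in Fig. 4.
x_train, x_test = train_test_split(scaled, test_size=0.3, shuffle=False)
```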


Fig. 2 Schematic of an autoencoder

3.1 Applying Autoencoder to Valve Predictive Analytics

Trained unsupervised, the autoencoder learnt effective encodings of the data. The objective of the autoencoder was to find a representation for a collection of sensor data and replicate it in the output. An autoencoder [14] is a feedforward neural network, with the exception that the number of neurons in the input and output layers is the same. A representation of an autoencoder can be seen in Fig. 2. It is divided into two sections: encoding and decoding. The encoding section reduced the dimension of the sensor data to a lower dimension, forcing the network to learn important new properties of the sensor data. The encoder section functions similarly to principal component analysis (PCA) in terms of dimensionality reduction; however, unlike PCA, which is linear, it performs this dimensionality reduction in a non-linear manner. The decoder section reproduced the output from the latent space characteristics. An autoencoder can only reproduce sensor data that resembles its training set. The network's difficulty in recreating particular sensor data has important uses, as opposed to the straightforward reconstruction of the sensor data, which is useless in and of itself. It is this attribute that makes it suitable for anomaly detection. The autoencoder network lost some information while attempting to reproduce the sensor data. This information loss was small for sensor data on which the model was trained; however, it was significantly bigger for sensor data on which the model was not trained. This information loss is described by the autoencoder's reconstruction error, which is calculated by the equation below.

RE = x − x′  (1)

where RE is the reconstruction error, i.e., the difference between the input sensors and the reconstructed output, x is the vector of input sensors, and x′ is the model's reconstruction of the sensor values. The autoencoder's reconstruction error is thus calculated as the difference between the model's input vector and its predicted output vector. As the equation shows, the reconstruction error is a vector with the same shape as the input data. Taking the mean of the squared reconstruction error (MSE), which serves as the anomaly score, is a useful way of condensing this vector to a single number. This value is calculated using the following equation.

MSE = Σ(RE)² / N  (2)

where N is the number of sensors used to train the model and MSE is the mean squared error (the anomaly score).
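A possible Keras implementation of this model is sketched below, reusing x_train and x_test from the preparation sketch above. The layer widths and latent dimension are assumptions; counting the input layer, the stack matches the seven layers reported in Table 1, with the ReLU/sigmoid activations, Adam optimizer, and MSE loss stated in the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_sensors = x_train.shape[1]   # same number of input and output neurons

autoencoder = keras.Sequential([
    keras.Input(shape=(n_sensors,)),
    layers.Dense(16, activation="relu"),            # encoder (assumed widths)
    layers.Dense(8, activation="relu"),
    layers.Dense(4, activation="relu"),             # latent space
    layers.Dense(8, activation="relu"),             # decoder mirrors encoder
    layers.Dense(16, activation="relu"),
    layers.Dense(n_sensors, activation="sigmoid"),  # MinMax-scaled inputs lie in [0, 1]
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_train, x_train, epochs=50, batch_size=64,
                validation_data=(x_test, x_test))

# Eqs. (1)-(2): per-sample anomaly score from the reconstruction error.
recon = autoencoder.predict(x_test)
mse = ((x_test - recon) ** 2).mean(axis=1)
```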

3.2 Determination of Anomaly Threshold

The mean and standard deviation of the reconstruction error were used to calculate the anomaly threshold for the sensor data. The threshold [12, 15, 18–20] was typically set at a multiple of the standard deviation from the mean, and anomalies were data points with a reconstruction error exceeding the threshold, as shown in the Results section. It is possible to determine which tags are responsible for an identified anomaly by examining the severity of the reconstruction error: the greater the error, the more likely it is that the tag is the root cause of the anomalous condition. This is typically accomplished by examining the features or variables with the highest reconstruction error, which can then be used to determine which tags are most likely responsible for the anomaly. This method is based on the idea that the degree of anomaly is directly proportional to the reconstruction error. The sketch below illustrates this thresholding and tag-ranking step.
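Continuing the sketch above, the threshold rule and the per-tag diagnosis could look as follows; the multiplier k is an assumption, as the paper does not state the factor used.

```python
import numpy as np

k = 3.0                                   # multiple of the standard deviation (assumption)
threshold = mse.mean() + k * mse.std()
anomalies = mse > threshold               # boolean mask of anomalous samples

# Rank tags by their share of the reconstruction error on flagged samples;
# the sensor with the largest error is the most likely root cause.
per_sensor_error = ((x_test - recon) ** 2)[anomalies].mean(axis=0)
suspects = np.argsort(per_sensor_error)[::-1]
print("most suspect sensor index:", suspects[0])
```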

4 Results

Table 1 shows the parameters of the autoencoder network, with reference to the Workflow section. As shown in Fig. 3, the anomalous data points had a mean squared error value that differed from the mean by 3.99 × 10−4 standard deviations, as described in Sect. 3.2. Figure 4 depicts how the anomaly score of the valve rises before an incident and falls after the system has stabilized. In comparison to the random forest regressor used in the previous implementation, as illustrated in Sect. 2, the proposed model was able to detect anomalies in the sensor data for the valve around March 2021 (based on data of an actual asset), as shown in Fig. 5.


Table 1 Parameters of the autoencoder network

Parameter            Value
Number of layers     7
Activation function  Rectified linear unit, Sigmoid
Optimizer            Adam
Loss function        Mean squared error

Fig. 3 Determination of the anomaly threshold

Fig. 4 Obtained result of the autoencoder reconstruction error

Fig. 5 The autoencoder reconstruction error obtained was compared to the random forest regression deviation


5 Discussion

From Fig. 4, it is clear that the magnitude of the error in the testing period (blue) is significantly higher than in the training period (red). The peak in Fig. 4 that occurred around March 2021 coincides with the red dotted line in Fig. 1. This indicates the time when the valve had the highest deviation between predicted and actual values, as illustrated in Fig. 5. This result shows the feasibility of using the reconstruction error of an autoencoder to detect anomalies. In a practical situation, the reconstruction error at any given time can be monitored, and when its magnitude is higher than a predefined threshold, an alert or an email to a responsible engineer can be generated automatically for proactive action. As shown in Fig. 3, we can use the means and standard deviations from both the training and testing periods to determine whether any given situation is normal. Specifically, the Z-score given by

Z = (x − μ) / σ  (3)

can be calculated, where x is the error, and μ and σ are the mean and standard deviation respectively. Let us denote the Z-scores for the training and testing durations as Z_train and Z_test respectively. If Z_test is higher than Z_train, x can then be classified as anomalous (see the sketch at the end of this section). The reconstruction error can further be used to forecast the trend of the error, either increasing or decreasing, hence providing prognostic capability for the predictive system. As briefly mentioned in Sect. 3.2, the main advantage of using an autoencoder instead of a traditional regression algorithm is its inherent diagnostic capability to identify anomalous sensors based on the reconstruction error. In a practical situation, it is vital to identify the bad sensors for proactive action to avoid an unplanned shutdown of the plant or other unintended consequences. As shown in Fig. 2, the tag with the highest error can be quickly identified. This capability is absent in a regression approach, which necessitates more comprehensive troubleshooting. We note that this capability is crucial when designing a predictive model for larger equipment with a higher number of input sensors. For example, 5 input sensors are normally sufficient for a valve, but a compressor requires 50 to 200 sensors. Therefore, with this successful illustration of the approach, our future work will include the implementation of the algorithm for more complex equipment.
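A compact reading of this classification rule, with the reconstruction-error arrays from the training and testing periods as inputs; the function name is ours, not the paper's.

```python
import numpy as np

def is_anomalous(x_now: float, err_train: np.ndarray, err_test: np.ndarray) -> bool:
    """Apply Eq. (3) against both periods: classify the current error as
    anomalous when its Z-score over the testing period exceeds the one
    over the training period, as described above."""
    z_train = (x_now - err_train.mean()) / err_train.std()
    z_test = (x_now - err_test.mean()) / err_test.std()
    return z_test > z_train
```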

6 Conclusion

We have demonstrated the effectiveness of an autoencoder neural network in valve predictive analytics. By analyzing time series data from multiple sensors, we were able to detect anomalies using the reconstruction error of the compressed sensor data. The proposed approach shows great promise beyond detecting anomalies: it has the potential to identify root causes and diagnose faults, enabling proactive action and more efficient maintenance activities. With this potential, we believe the proposed approach would provide even more advantages when applied to more complex systems, where troubleshooting beyond anomaly detection involves time-consuming manual processes.

7 Future Work

In our future work, besides implementing the approach for larger equipment, we will develop prognostic capability by formulating a suitable forecasting machine learning model that uses the reconstruction error as an input. Additionally, the valve predictive analytics problem could be directly restated as a regression problem by modifying the autoencoder's architecture and loss function to make it applicable to general regression tasks. The model learns how to reduce the dimensionality of the input data with the autoencoder, as well as how to linearize the latent space during training by aligning data points with the value we want to predict. After training the autoencoder to encode the input data into a lower-dimensional latent space, a linear layer can be added to predict the target value. This allows the model to learn a non-linear mapping from the input data to the target value in a lower-dimensional space [13, 21–23].

References

1. Emerson Automation Solutions (2017) Control valve handbook, 5th edn. Fisher Controls International LLC. Accessed 04 May 2019
2. Patrascioiu C, Panaitescu C, Paraschiv N (2009) Control valves modeling and simulation, pp 63–68
3. Sharif MA, Grosvenor RI (1998) Fault diagnosis in industrial control valves and actuators. In: IMTC/98 conference proceedings of the IEEE instrumentation and measurement technology conference. Where Instrumentation is Going (Cat. No. 98CH36222), vol 2, pp 770–778. https://doi.org/10.1109/IMTC.1998.676830
4. Qureshi M, Miralles L, Payne J, O'Malley R, Namee BM (2020) Valve health identification using sensors and machine learning methods. https://doi.org/10.1007/978-3-030-66770-2_4
5. Bangert P, Sharaf S (2019) Predictive maintenance for rod pumps. Society of Petroleum Engineers. https://doi.org/10.2118/195295-MS
6. Jansen van Rensburg N (2018) Usage of artificial intelligence to reduce operational disruptions of ESPs by implementing predictive maintenance. Society of Petroleum Engineers. https://doi.org/10.2118/192610-MS
7. Marra F, Girard C (2017) Advanced electric submersible pumps—added value for offshore fields. Society of Petroleum Engineers. https://doi.org/10.2118/185159-MS
8. Dunham C (2013) 27th ESP Workshop, summary of presentations. https://www.spegcs.org/media/files/files/cebfcc3a/2013-ESP-Workshop-Summary-of-Presentations.pdf
9. Gupta S, Saputelli L, Nikolaou M (2016) Applying big data analytics to detect, diagnose, and prevent impending failures in electric submersible pumps. Society of Petroleum Engineers. https://doi.org/10.2118/181510-MS


10. Pandya D, Srivastava A, Doherty A, Sundareshwar S, Needham C, Chaudry A, KrishnaIyer S (2018) Increasing production efficiency via compressor failure predictive analytics using machine learning. In: Offshore technology conference. https://doi.org/10.4043/28990-MS
11. Urban A, Boechat N, Haaheim S, Sleight N, Debacker I, Rivera R (2015) MOBO ESP interventions. In: Offshore technology conference. https://doi.org/10.4043/26125-MS
12. Alamu OA, Pandya DA (2020) ESP data analytics: use of deep autoencoders for intelligent surveillance of electric submersible pumps. In: Offshore technology conference, vol 30468-MS
13. Lee S, Kim H, Lee D (2022) Linearization autoencoder: an autoencoder-based regression model with latent space linearization. Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST). https://doi.org/10.1101/2022.06.06.494917
14. Baldi P (2012) Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of ICML workshop on unsupervised and transfer learning, Proceedings of machine learning research, vol 27, pp 37–49
15. Fathi K, van de Venn HW, Honegger M (2021) Predictive maintenance: an autoencoder anomaly-based approach for a 3 DoF delta robot. Sensors 21:6979. https://doi.org/10.3390/s21216979
16. Fawagreh K, Gaber MM, Elyan E (2014) Random forests: from early developments to recent advancements. Syst Sci Control Eng Open Access J 2(1):602–609. https://doi.org/10.1080/21642583.2014.956265
17. Paolanti M, Romeo L, Felicetti A, Mancini A, Frontoni E, Loncarski J (2018) Machine learning approach for predictive maintenance in Industry 4.0. In: 2018 14th IEEE/ASME International conference on mechatronic and embedded systems and applications (MESA), pp 1–6. https://doi.org/10.1109/MESA.2018.8449150
18. Borghesi A, Bartolini A, Lombardi M, Milano M, Benini L (2019) Anomaly detection using autoencoders in high performance computing systems. In: Proceedings of the AAAI conference on artificial intelligence, vol 33(01), pp 9428–9433. https://doi.org/10.1609/aaai.v33i01.33019428
19. Sakurada M, Yairi T (2014) Anomaly detection using autoencoders with nonlinear dimensionality reduction. In: Proceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data analysis (MLSDA'14). Association for Computing Machinery, New York, NY, USA, pp 4–11. https://doi.org/10.1145/2689746.2689747
20. Cacciarelli D, Kulahci M (2022) A novel fault detection and diagnosis approach based on orthogonal autoencoders. Comput Chem Eng 163:107853. https://doi.org/10.1016/j.compchemeng.2022.107853
21. Cunningham JP, Ghahramani Z (2015) Linear dimensionality reduction: survey, insights, and generalizations. J Mach Learn Res 16(1):2859–2900
22. Hosseini B, Hammer B (2020) Interpretable discriminative dimensionality reduction and feature selection on the manifold. In: Joint European conference on machine learning and knowledge discovery in databases, pp 310–326. Springer, Cham
23. Tian TS, James GM (2013) Interpretable dimension reduction for classifying functional data. Comput Stat Data Anal 57(1):282–296

LCC Based Requirement Specification for Railway Track System

Stephen Famurewa and Elias Kirilmaz

Abstract Life cycle cost (LCC) analysis is a key tool for effective infrastructure management. It is an essential decision support methodology for the selection, design, development, construction, and maintenance of railway infrastructure systems. Effective implementation of LCC analysis will assure cost-effective operation of railways from both investment and life-cycle perspectives. A major setback in the successful implementation of LCC by infrastructure managers is the availability of relevant, reliable, and structured data. Another challenge is the prediction of the future behaviour of the railway system when design or operation parameters change. Different cost estimation methods and prediction models have been developed to deal with both challenges. However, there is a need to make prediction models an integral part of the LCC methodology, to account for possible changes in the model variables. This article presents an LCC based approach for requirement specification. It integrates degradation models with an LCC model to study the impact of a change in design speed on key decision criteria such as track possession time, service life of the track system, and LCC. The methodology is applied to an ongoing railway investment project in Sweden to investigate and quantify the impact of a design speed change from 250 to 320 km/h. This is carried out to support the specification of technical requirements for the design of the track system. The results from the studied degradation models show that the correction factor for a change in speed varies between 0.79 and 0.96. Using this correction factor to compensate for the change in design speed, the service life of the ballasted track system is estimated to decrease by an average of 15%, from 30 years to approximately 25 years. Further, the LCC of the route under consideration will increase by an expected value of 30%.

S. Famurewa (B)
Trafikverket, Luleå, Sweden
e-mail: [email protected]
Luleå University of Technology, Luleå, Sweden
E. Kirilmaz
SWECO, Stockholm, Sweden
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_25


1 Introduction

Life cycle cost analysis is a method for evaluating the total cost of ownership of a component, system, or complex system over its entire lifespan. The goal of LCC analysis is to identify the most cost-effective option by comparing the total costs of different alternatives over the life of the system. It is a decision-making tool commonly used in engineering, construction, and procurement to evaluate the cost-effectiveness of different design and technical solutions. It provides a comprehensive and objective way to evaluate and analyse the total cost of a system and supports engineers and other asset managers in making effective decisions from a cradle-to-grave viewpoint. By considering the total cost, which includes the costs of operating, maintaining, and disposing of a system and not only the initial acquisition cost, LCC analysis allows engineers to make more informed decisions about the long-term economic feasibility of different design options. Additionally, this systematic approach to decision making can help to identify trade-offs between different designs, such as between a more expensive, high-performance product and a less expensive, low-performance product.

The application areas of LCC analysis in railway infrastructure management include: ranking or selection of railway projects, requirement specification, and design and construction of railway systems. It is also applied in the selection of technical systems, procurement of new systems or components, selection of maintenance methods, maintenance intervention cycles, decisions between maintenance and renewal, optimal timing of technological shifts, decisions between improvement and modification, as well as system upgrade or replacement.

Life cycle cost analysis has been used in the railway industry to evaluate the total cost of ownership of various rail infrastructure, rolling stock, and other railway-related systems. For example, LCC analysis has been used to evaluate the cost-effectiveness of different track maintenance and renewal options, for example slab or traditional ballasted tracks. Further, it has been used to evaluate the cost-effectiveness of different rolling stock options, such as diesel or electric trains, and to compare the costs of different signaling and train control systems. Additionally, LCC analysis can also be used to evaluate the sustainability of different railway-related systems and components. This is done by considering the environmental, social, and economic impacts of different alternatives over the entire service life. The method can help to identify options that are more sustainable in terms of their long-term impacts.

A vital advancement in the application of LCC in the railway industry is the use of probabilistic and statistical methods such as Monte Carlo simulation, which allow systematic incorporation of uncertainty and risk into the LCC analysis [1]. This enables decision-makers to evaluate the potential impacts and cost implications of variations in design, operational parameters, or market conditions. The contribution of this paper is the application of deterioration models to estimate a correction factor that is included in the presented LCC model. This is aimed at studying the impact of increased design speed on the track system. The result of this study is used to support the design specification at an early lifecycle phase of an ongoing railway project in Sweden.

2 Method

This study entails the development of a data-based decision model to support the selection of track system during the design phase of a new railway line. The method used in the study includes a literature survey, degradation modelling, Monte Carlo simulation, and LCC modelling.

The first step in the study was to collect available cost data for the track systems under consideration. Construction and maintenance costs for a ballasted track system with speeds up to 250 km/h were collected from literature and available databases. Similar costs for slab track with operating speeds up to 320 km/h were obtained from benchmarking. Thus, these operating conditions became the baseline for both systems. Furthermore, a literature study was conducted to examine how a change in operating speed affects the degradation, maintenance need, and cost of a ballasted track system. Several models were studied to capture the behaviour of the ballasted track system under varying operating conditions. Based on relevance, robustness, model formulation, and result accuracy as presented in the literature studied, 10 models were selected for detailed study. At the final stage, one model was excluded due to a very high discrepancy in comparison with the other models and poor performance with variation in speed, which is the parameter of interest. A list of the degradation models with simplified model formulations, remarks, and literature sources is given in Table 1. More information about these models can be found in the references provided. It should be noted that the main aim of using these models is not to predict the exact evolution of the track system but to estimate a correction factor which is used to adjust the maintenance need and cost of the known baseline scenario.

For the data collection, different sources of data were used to obtain the input values for the LCC model. The default values of the degradation model parameters presented in Table 1 are collected from literature and the technical specification document for the construction of the route. Maintenance and cost data are collected from benchmarking and best practices. Line specification and system description are collected from the ongoing investment project within the Swedish Transport Administration, Trafikverket.

A correction factor is a common method used in engineering design and construction to compensate for changes or uncertainties in design parameters and operating conditions. The correction factor makes it possible to use the results of the different degradation models even though there are uncertainties in predicting the exact degradation value. The correction factor is estimated using the ratio of the predicted deterioration of the track system under two different operating speeds, i.e., 250 and 320 km/h, see Eq. 1. It should be noted that the other parameters remain unchanged in the study. The correction factor is then used to adjust the possible change in maintenance need, service life of track components, and costs due to speed change.

Table 1 Degradation models [see references 2–20]

| Model nr | Degradation model [References] | Formulation | Comments |
|---|---|---|---|
| 1 | Sato [2, 3, 4] | N_S = 2.04×10⁻³ · T^0.31 · V^0.98 · M^1.1 · L^0.21 · P^0.26 | V—velocity; T—annual tonnage; M—structural factor; L—CWR factor; P—subgrade factor |
| 2 | ORE [3, 4, 5] | N_ORE = 1 + α + β + γ | α, β, γ—dimensionless speed coefficients that depend on track and vehicle parameters |
| 3 | Bing-Gross [6] | N_BG = 1.25 · (TQI1/TQI2)^1.04 · (V2/V1)^−0.11 · (RA2/RA1)^−0.58 · (BI2/BI1)^−0.18 · (1 + FS)^−0.44 | TQI—track quality indices at times 1 and 2; V—velocity at times 1 and 2; RA—track age at times 1 and 2; BI—ballast index at times 1 and 2; FS—substructure factor |
| 4 | Indian formula [7] | N_IN = 1 + V / (58.14 · k^0.5) | k—track stiffness |
| 5 | South African formula [7] | N_SA = 1 + 4.92 · V / D | D—wheel diameter |
| 6 | Clarke [7] | N_CL = 1 + 19.5 · V / (D · k^0.5) | k—track stiffness; D—wheel diameter |
| 7 | WMATA formula [7] | N_WMATA = (1 + 3.86×10⁻⁵ · V²)^0.67 | V—velocity |
| 8 | British railway formula [7] | N_BR = 1 + (8.784 · (a1 + a2) · V / Ps) · (Dj · Pu / g)^0.5 | g—gravity; Dj—sleeper pressure; Pu, Ps—axle load; (a1 + a2)—total rail joint dip angle |
| 9 | AREA formula [7] | N_AREA = 1 + 5.21 · V / D | D—wheel diameter |

CF_mod = (degradation value at 250 km/h) / (degradation value at 320 km/h)    (1)

CF_mod is the correction factor for a given degradation model. A range of correction factors was obtained using the degradation models in Table 1. A normal distribution was assumed for the factors and then used for a probabilistic estimation of the values of the unknown LCC parameters under the varying operational condition. These parameters include the service life and intervention intervals of track systems. In the design of a track system and the selection of optimal operating parameters, the amount of possession time that will be required for maintenance and renewal is a very vital criterion. It is a good indication of the availability performance that a track system can deliver over a given period. A benchmarking of the existing maintenance plan of ballasted track with speeds up to 250 km/h was carried out to create a realistic maintenance plan for the base scenario. The correction factors were used to adjust the intervention intervals of the different activities for the ballasted track system with speeds up to 320 km/h. A simplified formulation used to estimate the possession time requirement for the two systems is given in Eqs. 2 and 3. Note that this does not include the time required for the design, projection, and initial installation of the systems.

Possession time = Maintenance possession time + Renewal possession time    (2)

Possession time = Σ_{m=1}^{M} (120 / Interval_m) · MPT_m + Σ_{r=1}^{R} (120 / Interval_r) · RPT_r    (3)

where MPT_m and RPT_r are the maintenance possession time for activity m and the renewal possession time for activity r, respectively. For the life cycle cost of the two track designs, the costs of acquisition, operation and maintenance, renewal, and disposal of replaced items are summed up over the required period of 120 years. Only significant, distinctive, and future cost elements are included in the model, since the purpose of the study is to compare the proposed alternatives and to provide input for decision making. A simplified formulation of the LCC model is given in Eq. 4.

LCC = Cost_acquisition + Cost_operation & maintenance + Cost_renewal and disposal    (4)

For calculation without a discount rate, the LCCs of the alternative designs under consideration are estimated using the formulation below.

LCC = C_A + Σ_{m=1}^{M} (120 / Interval_m) · CM_m + Σ_{r=1}^{R} (120 / Interval_r) · CR_r    (5)


LCC with discounting can be estimated using the formulation in Eq. 6.

LCC = C_A + Σ_{m=1}^{M} Σ_{n=1}^{120/Interval_m} CM_m / (1 + i)^(n·Interval_m) + Σ_{r=1}^{R} Σ_{n=1}^{120/Interval_r} CR_r / (1 + i)^(n·Interval_r)    (6)

For the ballasted system with a traffic speed of 320 km/h, the interval used in the LCC formulation is adjusted for the speed change using the formula below:

Interval_m,320 = Interval_m / CF_mod    (7)

where C_A is the cost of acquisition of the different systems, CM_m and CR_r are the cost of maintenance for intervention m and the cost of renewal for activity r, respectively, and Interval_m and Interval_r are the intervention intervals for maintenance and renewal activities m and r, respectively. Interval_m,320 is the maintenance interval for activity m under an operating speed of 320 km/h. The variation in the intervals of track interventions due to the impact of operational parameters was modelled using Monte Carlo simulation, assuming a normal distribution of the correction factor. 10,000 simulations were carried out to obtain a reliable estimate of the decision criteria. Note that a discount rate of zero is used in the LCC calculation due to the long calculation period of 120 years required for technical specifications within the project.
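To make the workflow concrete, the following is a minimal sketch of the probabilistic estimation, not the authors' implementation. The activity intervals, possession times, costs, and the mean and spread of the correction factor are all illustrative assumptions; the paper only reports that the factor lies between 0.79 and 0.96. The sketch evaluates Eqs. 3 and 5 with a zero discount rate, and it shortens the intervals by multiplying with the correction factor, which is our reading of the speed adjustment consistent with the reported ~15% service-life reduction (Eq. 7 divides by CF_mod, depending on the convention used for the ratio).

```python
import numpy as np

rng = np.random.default_rng(42)
HORIZON = 120  # calculation period in years, as in the paper

# Hypothetical activity data (illustrative only): intervention interval in
# years at 250 km/h, possession time per intervention (hours), and cost per
# intervention (arbitrary monetary units).
activities = [
    {"interval": 3.0,  "time": 10.0,  "cost": 1.0},   # e.g. a maintenance activity
    {"interval": 25.0, "time": 100.0, "cost": 20.0},  # e.g. a renewal activity
]
C_A = 100.0  # assumed acquisition cost

# Correction factors (Eq. 1) sampled from an assumed normal distribution
# spanning the reported 0.79-0.96 range.
cf_samples = rng.normal(loc=0.875, scale=0.04, size=10_000)

def criteria(cf):
    # Intervals are shortened by the correction factor at 320 km/h.
    possession = sum(HORIZON / (a["interval"] * cf) * a["time"] for a in activities)  # Eq. 3
    lcc = C_A + sum(HORIZON / (a["interval"] * cf) * a["cost"] for a in activities)   # Eq. 5, i = 0
    return possession, lcc

results = np.array([criteria(cf) for cf in cf_samples])
print("possession time: mean %.0f h" % results[:, 0].mean())
print("LCC: mean %.1f, 5-95%% range %.1f-%.1f"
      % (results[:, 1].mean(), *np.percentile(results[:, 1], [5, 95])))
```

Sampling the factor rather than fixing it is what turns the point estimates of Eqs. 3 and 5 into the distributions summarised by the box plots in the results.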

3 System Description

The main aim of the study was to support the specification of track design requirements for an ongoing investment project in Sweden. An overall description of the route is given in Table 2 below. Only the features that are needed for modelling the impact of the speed change are given in the table. The description of the route presented below reflects its status in June 2022, when this study was carried out. For analysis and comparison, a third option, called alternative 3, with ballasted track and a speed of 250 km/h on all the lines was drawn up.

Table 2 Description of line section

| Line section | Alternative 1 design track system | Alternative 2 design track system | Speed (km/h) | Estimated trains per day | Estimated annual load (MGT) | Track length (km) |
|---|---|---|---|---|---|---|
| Line 1 | Ballasted | Ballasted | 250 | 179 | 16 | 177 |
| Line 2 | Slab | Ballasted | 320 | 160 | 15 | 122 |
| Line 3 | Slab | Ballasted | 320 | 100 | 9 | 83 |
| Line 4 | Ballasted | Ballasted | 250 | 210 | 15 | 82.5 |
| Line 5 | Slab | Ballasted | 320 | 100 | 9 | 207 |
| Line 6 | Slab | Ballasted | 320 | 130 | 11 | 57 |

4 Results and Discussions

Using all the degradation models presented in Table 1, a range of values was obtained for the correction factors. These values were used to adjust the parameters in the LCC models whose values were expected to vary with a change in speed. Figure 1 presents the values obtained for the different models. The variation in the factor is assumed to be due to several causes, including: the assumptions of the models, the parameter values used in the models, the number of parameters in each model, the application area of the model (heavy haul, mixed traffic), and the approximations and simplifications in each model. However, using several models gives a robust and practical range of the possible variation in the LCC parameters. In a nutshell, the values presented in the figures quantify the impact of track force on maintenance and renewal actions as caused by a speed increase on an uneven track. Technically, an increase in track force leads to the following: rail head wear, fatigue of the rail, increased bending stress in the rail foot, increased pressure at the sleeper-ballast and rail-sleeper interfaces, higher bearing stress in the ballast and subgrade, increased force on fasteners, and higher track deformation.

Fig. 1 Correction factors for the adjustment of LCC parameters

Using the distribution of the correction values to compensate for the increase in design speed from 250 to 320 km/h, the service life of the ballasted track system is expected to decrease by an average of 15%. This means a reduction from 30 years to a range between 23 and 28 years is expected. A similar level of decrease is expected in the service life of some other track components and their maintenance intervals. Note that there is no change in the design speed of any line section with slab track; thus a design life of 60 years is used in the analysis. In other words, correction factors were not applied to slab track scenarios.

Using the procedure described in Sect. 2 and in Eq. 3, the amount of possession time that will be required for maintenance and renewal is estimated and presented in Fig. 2. This is an indication of the change in availability performance of the ballasted track system when the speed is increased from 250 to 320 km/h. An increase of about 18% is expected in the required track possession time. In reality, the outcome can be far above the value estimated in this study if an effective maintenance programme is not adopted.

Similarly, the procedure described in Sect. 2 was used to estimate the LCC for the different design alternatives. It should be noted that the intervention intervals in the models are treated as probabilistic variables due to uncertainties in the degradation models used to predict the impacts of speed. As a result, a probabilistic approach was used in the LCC estimations, and the results are presented in Fig. 3. The results show that there is a clear difference between the estimated LCC for alternatives 1 and 2. Alternative 1, which has about 67% of the entire route as slab track, has a lower LCC than the competing alternative 2 (100% ballast) even though the initial investment is relatively higher. Alternative 1 is estimated to be 30% less expensive than alternative 2 from a lifecycle viewpoint. Also, the variation in the estimated LCC for alternative 1 is lower than for the second alternative, based on the variations in the input parameters. The results also show that increasing the speed from 250 to 320 km/h on a ballasted section would lead to an average 10% increase in the LCC. The magnitude of the difference in LCC between alternative 2 and the comparative alternative 3 should be interpreted carefully due to the overlap of the box plots. This is due to the uncertainty of the impact of speed on LCC parameters such as intervention intervals and service life of track components. More data collection or expert opinion would be needed to confirm whether the difference in LCC of these two alternatives is significant. However, there is an indication that an increase in speed would lead to higher maintenance needs, and if not well handled with appropriate preventive maintenance, the cost of ownership could grow at an undesirable rate. It is worth mentioning that this study assumed a relatively good ground formation along the route, and thus no need for extensive ground preparation and reinforcement for slab track in some areas. In a future study, the location-specific geotechnical works required for the construction of both slab and ballasted track systems along the route will be considered in detail.


Fig. 2 Change in track possession time for ballasted track with operational speed of 250 km/h and 320 km/h

5 Conclusions

This study has contributed to the advancement of LCC application in the design phase of railway infrastructure, where data are often inadequate. It addresses the integration of a deterioration model and an LCC model to study the impact of increasing design speed on important selection criteria for the track system. A probabilistic approach has been used to capture the uncertainty in predicting the effect of a speed change on LCC parameters such as maintenance interval and service life of the track system. In summary, an increase in the operational speed of a ballasted section from 250 to 320 km/h will lead to an average 15% decrease in service life and an 18% increase in possession time. From a route perspective, about a 30% increase in LCC is expected with a change of the design from alternative 1 to alternative 2. On the other hand, an average 10% increase in LCC is expected with a change in speed from 250 to 320 km/h on a ballasted track. In conclusion, this study has been used by track engineers to motivate the selection of track system and to specify an optimal design requirement for an ongoing investment project in Sweden.


Fig. 3 LCC for the three alternatives over 120 years

References

1. Dependability management (2017) EN 60300-3-3, Part 3-3: application guide—life cycle costing
2. Sato Y (1995) Japanese studies on deterioration of ballasted track. Vehicle Syst Dyn 24:197–208
3. Lichtberger B (2005) Track compendium, 1st edn. Eurailpress, Tetzlaff-Hestra GmbH & Co., Hamburg, Germany. ISBN 3-7771-0320-9
4. Esveld C (2001) Modern railway track. MRT-Productions, Zaltbommel, The Netherlands
5. Larsson D (2004) A study of the track degradation process related to changes in railway traffic. Licentiate thesis, Luleå University of Technology. ISSN 1402-1757
6. Bing AJ, Gross A (1983) Development of railway track degradation models. Transp Res Rec 939:27–31
7. Doyle NF (1980) Railway track design, a review of current practice. BHP Melbourne Research Laboratories
8. Abadi T (2015) Effect of sleeper and ballast interventions on rail track performance. University of Southampton, Faculty of Engineering and the Environment
9. Edvardsson K, Hedström R (2012) Bankonstruktionens egenskaper och deras påverkansgrad på nedbrytning av spårfunktionen. VTI rapport 864. 2012/0725-28
10. Elkhoury N, Robert DJ, Moridpour S (2018) Degradation prediction of rail tracks: a review of the existing literature. School of Engineering, RMIT University, Melbourne, Australia, Civil and Infrastructure Engineering Discipline
11. Grossoni I, Powrie W, Zervos A, Bezin Y, Le Pen L (2021) Modelling railway ballasted track settlement in vehicle-track interaction analysis. Transp Geotech 26
12. Lyngby N, Hokstad P, Vatn J (2008) RAMS management of railway tracks. In: Handbook of performability engineering. Springer, pp 1123–1145
13. Nguyen K, Goicolea JM, Galbadón F (2011) Dynamic effect of high-speed railway traffic loads on the ballast track settlement. Group of Computational Mechanics, School of Civil Engineering, Technical University of Madrid
14. Sadeghi J, Askarinejad H (2007) Influences of track structure, geometry and traffic parameters on railway deterioration. Int J Eng 20(3)
15. Sadeghi J, Askarinejad H (2008) Development of improved railway track degradation models. Int J Struct Infrastruct Eng 3(4)
16. Sadeghi J, Askarinejad H (2009) Investigation on effect of track structural conditions on railway track geometry deviations. Proc Inst Mech Eng Part F: J Rail Rapid Transit
17. Sadri M, Lu T, Steenbergen M (2019) Railway track degradation: the contribution of a spatially variant support stiffness—local variation. J Sound Vib 455:203–220
18. Salim MW (2004) Deformation and degradation aspects of ballast and constitutive modelling under cyclic loading. PhD thesis, Faculty of Engineering, University of Wollongong
19. Stichel S (2004) Ökade laster med hänsyn till spårnedbrytning. Järnvägsgruppen KTH, Avd för järnvägsteknik
20. Vikesland AF (2019) Track geometry degradation cause identification and trend analysis. NTNU

Pre-processing of Track Geometry Measurements: A Comparative Case Study

Mahdi Khosravi, Alireza Ahmadi, and Ahmad Kasraei

Abstract Degrading linear assets such as railway track deteriorate and lose their functionality over time and usage. A reliable and effective predictive maintenance strategy is necessary to rehabilitate the functionality and reliability of these assets. Data analytics must be performed to extract the information used for the decision-making process and to implement an optimized maintenance strategy. Accordingly, data pre-processing and data quality improvement are essential to remove errors in measurements and develop efficient data analysis methods. Inaccurate measurement positioning is a common error in track geometry measurements which causes track geometry single defects to suffer from an uncontrolled shift called positional error. To reduce the positional errors of track geometry measurements, this paper presents two alignment methods, i.e., modified correlation optimized warping (MCOW) and recursive segment-wise peak alignment (RSPA). MCOW is a profile-based method that aligns all the measurements with the same priority, while RSPA is a feature-based method that only focuses on the alignment of peaks with high amplitudes in the geometry measurements. To evaluate and compare the performance of these methods in aligning track geometry measurements, a case study was conducted. The results show that RSPA can precisely align the single defects, while MCOW is more efficient when every single data-point is considered equally important to align.

Keywords Railway track geometry · Position alignment · Positional error · Modified correlation optimized warping · Recursive segment-wise peak alignment · Linear asset · Condition monitoring

M. Khosravi (B)
Luleå University of Technology, Laboratorievägen 14, Luleå, Sweden
e-mail: [email protected]
A. Ahmadi
Luleå University of Technology, Laboratorievägen 14, Luleå, Sweden
e-mail: [email protected]
A. Kasraei
Luleå University of Technology, Laboratorievägen 14, Luleå, Sweden
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_26


1 Introduction

The railway track geometry deteriorates continually over time and deviates from its intended vertical and horizontal alignment due to various factors such as frequent loads from the trains' wheels and environmental effects. Consequently, enhancing or maintaining the track's quality through rehabilitation and upgrading of existing lines or building new lines is crucial. The track geometry quality is assessed through regular inspections performed on the same track. Evaluating the track geometry degradation rate and predicting the occurrence of geometry defects based on the information gathered from these inspections are essential for the development of predictive models for geometry defects and for maintenance planning.

To predict the occurrence of geometry defects, precise positioning of the track geometry is of utmost importance. Generally, track geometry measurements are impacted by positional errors, which are uncontrolled shifts, stretchings, or compressions of the measurements. Different sampling positions of the measurements in each inspection, slipping or sliding of the wheel on the rail, wheel wear, and rail wear are some factors which cause positional errors in track geometry measurements. Positional errors negatively affect the accuracy of geometry condition evaluation and defect prediction. To minimize positional errors in frequent inspections and make previous data useful for modeling and forecasting, track geometry measurements should be aligned with high precision.

A few studies have tackled the alignment of track geometry data. Pedanckar [1] used Pearson's correlation coefficient, Selig et al. [2] utilized the cross-correlation function (CCF), and Li and Xu [3] used an optimization model to align measurement data with historical data. These studies aimed to align data from different inspection runs with a constant shift. To address the alignment of datasets that were shifted, stretched, or compressed, Xu et al. [4, 5] and Palese et al. [6] applied dynamic time warping (DTW) and dynamic sampling position matching, respectively. These methods match data-points in new inspections with the most similar data-points in previous ones. Wang et al. [7] took advantage of combining the measurements of multiple track geometry parameters to enhance the alignment precision of the datasets in different inspection runs. Khosravi et al. [8] proposed a combined method using the recursive alignment by fast Fourier transform (RAFFT) and correlation optimized warping (COW) methods. This combined method not only provides accurate alignment of datasets, but also maintains their original shape and has a manageable computational time and memory usage. Furthermore, Khosravi et al. [9] proposed a modified COW (MCOW) for the alignment of track geometry measurements. This alignment method overcomes the limitation of COW in aligning datasets that have shifts at the beginning and end of the datasets, and it aligns the datasets with higher speed. All the above-mentioned studies have considered the alignment of track geometry data using profile-based methods that align all the data-points of the datasets with the same priority. However, it is more crucial to align geometry defects (high-amplitude peaks) as the key features in track geometry measurements. Thus, a feature-based alignment method that focuses on aligning peaks only, rather than all data points, may be more


efficient. Accordingly, this paper introduces recursive segment-wise peak alignment (RSPA) as a feature-based method and compares its performance with the MCOW method. A comprehensive case study was conducted on measurement data from a line section in the Main Western Line in Sweden to evaluate the proposed method’s efficiency and applicability. Additionally, an in-depth analysis of the effects of the RSPA and MCOW methods on the shape of track geometry defects and reducing their positional errors was carried out. The rest of the paper is structured as follows: Sect. 2 outlines the basics of MCOW and RSPA methods. The case study is presented in Sect. 3. Section 4 analyzes the feasibility of the methods in aligning track geometry measurements through a case study and shows a comparison of the results obtained using RSPA and MCOW. Finally, in Sect. 5, conclusions are drawn and future research directions are suggested.

2 Methodology

The following subsections present a summary of the theories related to the MCOW and RSPA methods.

2.1 Modified Correlation Optimised Warping

Suppose two datasets as follows:

X = {x_1, x_2, ..., x_n},  Y = {y_1, y_2, ..., y_m}    (1)

where X is the reference dataset with n data-points and Y is another dataset with m data-points, which is called the unaligned dataset herein. MCOW aligns the two datasets by breaking them down into smaller parts of length L, producing N segments. Considering that n and m may not be equal, the final segments of X and Y have lengths of L + Δ_X and L + Δ_Y, respectively, where the additions Δ_X and Δ_Y represent the additional data-points present in the respective segments. Figure 1 shows a schematic depiction of the segmentation in MCOW. As Fig. 1 shows, the beginning and end points of each segment in Y may vary by an amount equal to the slack (s). This creates new segments that are shifted, stretched, or compressed relative to the original segment. It is important to note that L, which transforms the problem into a segment-wise problem, must not be less than (2s + 1). By shifting, stretching, or compressing each segment considering the slack, (2s + 1)² new segments are obtained from each segment in Y, which can be compared with their corresponding segments in X. The similarity between the generated segments for each segment in Y and their

Fig. 1 Schematic depiction of segmentation in MCOW

corresponding segment in X is calculated using a correlation coefficient function. Prior to computing the correlation coefficient for the segments, linear interpolation is applied to equalize the length of the generated segments to that of their corresponding segments in X . Finally, to align Y , the most appropriate combination of the generated warping must be established across all segments. Detailed information on MCOW is available in the literature [9].
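As a rough illustration of the segment-wise search described above — not the authors' implementation — the sketch below enumerates the (2s + 1)² shifted, stretched, or compressed variants of one segment of the unaligned dataset and scores each against the reference segment, interpolating to equal length before computing the correlation. The dynamic-programming step that chains the best boundary choices across all segments is omitted, and all names are illustrative.

```python
import numpy as np

def candidate_scores(x_seg, y, start, end, slack):
    """Enumerate and score the (2s + 1)^2 candidate warps of one segment of
    the unaligned dataset y against the reference segment x_seg. Each
    boundary may move by at most `slack` data-points."""
    n = len(x_seg)
    scores = {}
    for ds in range(-slack, slack + 1):        # move of the segment start
        for de in range(-slack, slack + 1):    # move of the segment end
            s, e = start + ds, end + de
            if s < 0 or e > len(y) or e - s < 2:
                continue                       # candidate falls outside y
            seg = y[s:e]
            # linear interpolation equalizes the candidate's length with the
            # reference segment before the correlation is computed
            warped = np.interp(np.linspace(0, len(seg) - 1, n),
                               np.arange(len(seg)), seg)
            if np.ptp(warped) > 0:             # skip degenerate flat candidates
                scores[(ds, de)] = np.corrcoef(x_seg, warped)[0, 1]
    return scores
```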

2.2 Recursive Segment-Wise Peak Alignment

The RSPA method is employed to match the positions of the peaks in one dataset with those in a reference dataset. In the first step, RSPA identifies the peaks whose heights surpass a specified threshold in both datasets X and Y. Then, the segments in Y are recursively aligned with their corresponding segments in X in a top-down manner. At first, the entire Y is aligned with the entire X, and then the alignment process moves forward by focusing on smaller segments. The alignment continues until either the length of the segments falls below a certain threshold (L_min) or the correlation between the two segments (ρ_s(T, X)) surpasses δ. The value of L_min is established based on the narrowest peak width, and the value of δ is set to 0.5 [10]. The fast Fourier transform cross-correlation method is employed as a similarity measure to find the shift between the segments in Y and their corresponding segments in X. The final aligned dataset is constructed by combining the aligned segments. In order to prevent unnecessary shifting and enhance the alignment accuracy, RSPA takes the slack into consideration as an input. Detailed information on RSPA is available in the literature [10].
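The following is a compact sketch of the recursive idea, assuming midpoint splitting for brevity (the published method segments around the detected peaks and includes further safeguards): each call estimates one rigid shift through FFT cross-correlation, applies it, and recurses until the segment is short or already well correlated.

```python
import numpy as np

def best_shift(ref, seg, slack):
    """Circular FFT cross-correlation; returns the lag within +/- slack that
    maximizes the similarity between ref and seg."""
    n = 2 * len(ref)
    corr = np.fft.irfft(np.fft.rfft(ref, n) * np.conj(np.fft.rfft(seg, n)), n)
    lags = np.r_[np.arange(0, slack + 1), np.arange(-slack, 0)]
    return int(lags[np.argmax(corr[lags])])

def rspa(ref, y, slack, l_min, delta=0.5):
    """Shift y onto ref, then recurse on halves until segments are shorter
    than l_min or already correlated above delta."""
    y = np.roll(y, best_shift(ref, y, slack))
    if len(y) < l_min or np.corrcoef(ref, y)[0, 1] > delta:
        return y
    mid = len(y) // 2
    return np.concatenate([rspa(ref[:mid], y[:mid], slack, l_min, delta),
                           rspa(ref[mid:], y[mid:], slack, l_min, delta)])
```

Because every recursion level only applies a rigid shift to its own segment, peaks are moved onto their reference positions while the waveform inside each segment is left untouched — which is why RSPA preserves peak shapes but can introduce discontinuities at segment borders, as discussed in the results.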


2.3 Evaluation Methods

According to the literature [8], an effective alignment method must meet three primary goals: provide accurate alignment of the datasets, preserve the original shape of the datasets, and align within a reasonable amount of time. For this purpose, the simplicity score, the peak factor, and the time needed for alignment are employed to assess the performance of the alignment methods. The simplicity score gauges the alignment's accuracy by examining the traits of the singular value decomposition, as specified in Eq. (2). The peak factor assesses the methods' proficiency in retaining the original shape of the measurements, as outlined in Eq. (3).

Simplicity = sum( SVD( [T; X'] / sqrt( Σ_{i=1}^{n} (T(i)² + X'(i)²) ) )⁴ )    (2)

Peak Factor = 1 − min( ( (‖X'(i)‖ − ‖X(i)‖) / ‖X(i)‖ )², 1 )    (3)

For further information on the simplicity score and peak factor, including the rationale for their use, readers may refer to the literature [8, 11].
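Both criteria are straightforward to compute; a minimal sketch following the two equations above (with T the reference, X the unaligned, and X' the aligned dataset) could look as follows:

```python
import numpy as np

def simplicity(t, x_aligned):
    """Simplicity score (Eq. 2): singular values of the stacked, unit-norm
    pair raised to the fourth power; 1 means perfectly proportional signals."""
    m = np.vstack([t, x_aligned])
    m = m / np.linalg.norm(m)          # normalize to unit Frobenius norm
    return np.sum(np.linalg.svd(m, compute_uv=False) ** 4)

def peak_factor(x, x_aligned):
    """Peak factor (Eq. 3): penalizes changes in signal energy, i.e. warping
    that deforms the profile; 1 means the norm is fully preserved."""
    rel = (np.linalg.norm(x_aligned) - np.linalg.norm(x)) / np.linalg.norm(x)
    return 1.0 - min(rel ** 2, 1.0)
```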

3 Case Study A case study was conducted to compare the performance of RSPA and MCOW methods in aligning track geometry measurements, using data collected from line Section 414 located between Järna and Katrineholm Central Station in Sweden. The study included 14 inspection runs conducted between 2016 and 2018, and measurements taken by IMV200 and IMV100 at speeds ranging from 60 to 170 km/h with a sampling distance of 25 cm. The longitudinal level data for each inspection run were divided into datasets of length 1 km, with reference datasets selected based on the mean correlation coefficient criterion recommended in [8]. The alignment methods were then applied, with a slack of 60 and a segment length of 125 in MCOW, given the maximum position variation between datasets was around 15 m (60 data points).

4 Results and Discussion

In this paper, the performance of MCOW and RSPA was assessed by considering their ability to align entire datasets and the peaks within those datasets. Accordingly, the simplicity score and peak factor are employed to evaluate the performance of the methods in aligning both the datasets as a whole and the peaks individually. Additionally, the positional errors of single defects in the datasets were computed.


Figure 2 displays the average simplicity scores for MCOW and RSPA, which were calculated for the entire length of the line section. In Fig. 2a, the simplicity scores for the aligned datasets and their corresponding references are shown, while Fig. 2b presents the simplicity scores for peaks in the aligned datasets and their corresponding peaks in the references. Both MCOW and RSPA were able to increase the simplicity score above 0.8 for most of the datasets after alignment, as seen in Fig. 2a. However, the red circles in Fig. 2b indicate that RSPA performs better than MCOW in aligning the peaks of the datasets, with simplicity scores closer to one. It should be noted that the simplicity score is not only affected by the accuracy of the alignment but also by inherent differences in the values of the data-points in the corresponding datasets [8]. Since the track longitudinal level deteriorates over time, the measurements in different inspection runs are not entirely alike, which leads to the maximum simplicity score being less than one most of the time, even in the absence of any positional error. Therefore, both methods exhibit a higher performance in aligning the peaks of the datasets, as shown in Fig. 2b where the simplicity scores are closer to one.

Figure 3 displays the average peak factor for MCOW and RSPA. Figure 3a demonstrates that in some instances, the peak factor results produced by RSPA (indicated by red circles) are very low, indicating that this method does not always yield satisfactory results for the peak factor. However, in Fig. 3b, the red circles demonstrate that the peak factor results obtained by RSPA for only the peaks in the datasets are very close to one, indicating that RSPA is highly effective in maintaining the original shape of the peaks. Additionally, the green circles in Fig. 3a, b indicate that MCOW can maintain the original shape of the aligned datasets to a high degree, with approximately the same peak factor value in all areas. It should be noted that the simplicity and peak factor results obtained by these methods can be influenced by adjustments to their parameters. Choosing a higher slack value may increase the simplicity score obtained by the methods, but it may also result in a decrease in the peak factor score.

The precise identification of single defects is critical in track geometry maintenance planning, making the reduction of positional errors in different inspection runs a top priority. As such, the performance of MCOW and RSPA in reducing the positional errors of single defects that exceed the planning limit (as defined in [12]) was investigated. Figure 4 displays the positional error results for single defects of the longitudinal level after alignment using the MCOW and RSPA methods. The figure demonstrates that, in the majority of cases, there is no positional error between defects, and for more than 90% of single defects, the positional errors dropped below 0.25 m. This is a significant improvement considering the measurement sampling interval of 0.25 m.

The speed at which the methods align the datasets is also a critical evaluation criterion. This criterion was assessed by using the methods to align track geometry measurements of varying lengths with different slack values. The methods were executed on a personal computer equipped with an Intel® Core™ i7 processor (1.9 GHz) and 32 GB of memory, using the MATLAB R2021b platform. Figure 5 displays the results of the evaluation. The figure illustrates that selecting different


Fig. 2 Average simplicity scores for MCOW and RSPA: a entire length of the datasets, b peaks

slack values does not impact the time required for alignment by RSPA. However, a higher slack value can significantly increase the time required for alignment by MCOW. Moreover, it can be observed that when slack is low, both RSPA and MCOW can align the datasets at a high speed, whereas when slack is high, MCOW is much slower than RSPA. To delve deeper into the results of MCOW and RSPA, let’s concentrate on two datasets that contain longitudinal level data collected in two different inspection runs, with a window of 1 km. Each dataset covers approximately 4000 data-points, with a sampling interval of about 25 cm. One of the inspections is used as a reference dataset, while the other is treated as an unaligned dataset. Figure 6a displays these


Fig. 3 Average peak factor for MCOW and RSPA: a entire length of the datasets, b peaks

two datasets, indicating a non-constant shift between them (with a positional error of 62 points between two defects at the beginning of the datasets and 24 points between two defects at the end). Both RSPA and MCOW can precisely align single defects and high-value peaks, as demonstrated in Fig. 6b, c, respectively. However, upon closer inspection, it becomes apparent that aligning the datasets using RSPA causes some parts of the aligned datasets to deform, whereas MCOW aligns the datasets without deformation, as shown in Fig. 6c. Figure 7 displays two sections of the datasets before and after alignment using RSPA and MCOW. To ensure comparability, they are aligned based on points 3440 and 3350 in Fig. 7a, b, respectively. As shown in these figures, MCOW slightly


Fig. 4 Positional error after alignment using a RSPA, b MCOW

stretches or compresses the dataset without warping its shape. However, RSPA causes sporadic warping that stretches or compresses the dataset while keeping it intact at other points, as seen in Fig. 7a. It is worth noting that this warping mostly occurs in data-points with low values (close to zero). Nevertheless, Fig. 7b indicates that the parts that include single defects or high-value peaks are unaffected by warping. This explains why the peak factor results obtained by RSPA (as shown in Fig. 3) are precise for single defects but not satisfactory for the whole datasets. After comparing the discussed methods, it can be concluded that RSPA is a useful approach for aligning measurements when the preservation of the shape and alignment of single defects is important, time is a critical factor, and there is a high maximum position variation between the datasets. On the other hand, MCOW is a suitable option when


Fig. 5 Time needed to align datasets with different lengths using the alignment methods by considering different maximum shift in datasets

Fig. 6 Alignment of datasets: a reference and unaligned datasets, b alignment using RSPA, c alignment using MCOW

all data-points in the datasets are equally important to be aligned. MCOW is capable of precisely aligning datasets while maintaining their shape. Additionally, when the slack is low, MCOW is faster than RSPA.


Fig. 7 Comparing a part of dataset before and after alignment by RSPA and MCOW: a alignment of areas with low values, b alignment of peaks

5 Conclusion

This article compares two different approaches for aligning track geometry measurements: feature-based and profile-based methods. The feature-based method, RSPA, aligns the peaks of the datasets, while the profile-based method, MCOW, aligns all data points equally. The methods were tested on railway track geometry data collected from 14 inspections. RSPA proved to be fast and efficient at aligning peaks, while MCOW was better at preserving the shape of the aligned data. However, RSPA struggled with aligning entire datasets, while MCOW was able to do so effectively. Both methods were able to reduce positional errors to below 0.25 m in more than 90% of cases. These results can aid in selecting the appropriate method for aligning track geometry measurements to improve the analysis and prediction of track geometry degradation.

Acknowledgements The authors would like to thank the Swedish Transport Administration (Trafikverket), the In2Smart II project (grant agreement number 881574 within EU Shift2Rail),


and Luleå Railway Research Center (JVTC) for their technical and financial support during this project. We would also like to express our very great appreciation to Dr. Iman Soleimanmeigouni for his valuable suggestions during the development of this research work.

References

1. Pedanckar NR (2004) Methods for aligning measured data taken from specific rail track sections of a railroad with the correct geographic location of the sections. US Patent 6,804,621
2. Selig ET et al (2008) Analyzing and forecasting railway data using linear data analysis. WIT Trans Built Environ 103:25–34
3. Li H, Xu Y (2010) A method to correct the mileage error in railway track geometry data and its usage. In: Traffic and transportation studies 2010
4. Xu P et al (2014) Dynamic-time-warping-based measurement data alignment model for condition-based railroad track maintenance. IEEE Trans Intell Transp Syst 16(2):799–812
5. Xu P et al (2015) Optimizing the alignment of inspection data from track geometry cars. Comput-Aided Civil Infrastruct Eng 30(1):19–35
6. Palese JW, Zarembski AM, Attoh-Okine NO (2019) Methods for aligning near-continuous railway track inspection data. Proc Inst Mech Eng Part F: J Rail Rapid Transit 234(7):709–721
7. Wang Y et al (2018) Position synchronization for track geometry inspection data via big-data fusion and incremental learning. Transp Res Part C: Emerg Technol 93:544–565
8. Khosravi M et al (2021) Reducing the positional errors of railway track geometry measurements using alignment methods: a comparative case study. Measurement 178:109383
9. Khosravi M et al (2022) Modification of correlation optimized warping method for position alignment of condition measurements of linear assets. Available at SSRN 4036551
10. Veselkov KA et al (2009) Recursive segment-wise peak alignment of biological 1H NMR spectra for improved metabolic biomarker recovery. Anal Chem 81(1):56–66
11. Skov T et al (2006) Automated alignment of chromatographic data. J Chemom 20(11–12):484–497
12. EN 13848-2 (2008) Railway applications—track—track geometry quality—part 2: measuring systems—track recording vehicles. CEN (European Committee for Standardization), Brussels

Wind Turbine Blade Surface Defect Detection Based on YOLO Algorithm

Xinyu Liu, Chao Liu, and Dongxiang Jiang

Abstract For wind turbine operation and maintenance, wind turbine blade surface defect detection is a very important and challenging problem, as blade surface defects seriously affect the efficiency and safety of the turbine. The performance of traditional methods depends heavily on the correlation between the handcrafted features and the features of the defects themselves, but surface defects are diverse, which can make traditional methods fail in practice. In this work, we present an automated framework to identify surface defects of blades using the advanced YOLOv5 algorithm, which can learn and extract blade surface defect features adaptively and accurately identify even very minor faults. The results of different algorithms are collected and compared on a self-built dataset and show that YOLOv5 has the best performance. In addition, YOLOv5 has significant advantages in terms of model size and training speed, and these advantages make the YOLOv5 model well suited for wind turbine blade surface defect recognition.

Keywords Wind turbine blades · YOLOv5 · Surface defects identification

X. Liu · C. Liu (B) · D. Jiang
Department of Energy and Power Engineering, Tsinghua University, Beijing, China
e-mail: [email protected]
X. Liu
e-mail: [email protected]
D. Jiang
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_27

1 Introduction

Wind energy is a widely used renewable energy source to address the fossil energy crisis and ecological degradation. With the rapid development of the wind power industry around the world, a large number of wind turbines have been put into use. As blades are a major component of wind turbines, blade failures can lead to significant financial


losses and unplanned downtime. Blade failure is one of the major failures causing abnormal downtime, and each blade failure can take more than ten days to repair, severely reducing power generation efficiency. It is therefore critical to develop efficient methods for wind turbine blade inspection.

There are various wind turbine blade fault detection methods. When a blade shows obvious damage, detection based on vibration signals is very effective. For early faults, however, the damage barely affects the vibration characteristics of the blade, so this method struggles to provide fault detection and early warning. The identification and classification of faults at an early stage help to avoid catastrophic failures of wind turbines, effectively extend the life of the units, and optimize management and maintenance costs. Other common detection methods include ultrasonic detection, acoustic emission detection, infrared detection techniques, and image detection techniques. Ultrasonic inspection can locate the defect site, but relies on the subjective judgment of the inspector [1, 2]. Acoustic emission cannot detect the static condition of the blade and is unable to accurately reflect the target defect, because the signal attenuates easily during transmission and external stress must be applied [3–5]. Infrared detection is a non-destructive testing method, but the equipment cost is high and it is difficult to use for monitoring in real operating environments [6, 7]. Visual inspection has become the main tool for routine maintenance of wind turbine blades, mainly using image processing, machine learning, and deep learning methods for early detection [8, 9].

Long Wang et al. [10] proposed a data-driven framework for automatic detection of cracks on the surface of wind turbine blades based on UAV images. It uses traditional Haar-like handcrafted feature extraction to characterize the crack region and a cascade classifier for crack detection, obtaining a model with a fast detection rate and high classification accuracy. Yinan Wang et al. [11] proposed an unsupervised learning method based on one-class support vector machines, which can distinguish normal and abnormal components by combining deep features learned from the dataset. Most of these methods target single faults or normal/abnormal detection and cannot provide an all-round identification of the complex defects on the surface of wind turbine blades. Donghua Xu et al. [12] trained three convolutional neural networks for defect classification using a constructed blade defect image dataset and applied the F1 score to evaluate the models. This study compressed the model and reduced the hardware requirements, but failed to locate the defects. Yajie Yu et al. [13] proposed the extraction of defect semantic features and transfer features: the model was pre-trained on ImageNet for transfer learning and the trained parameters were fine-tuned on blade images, obtaining better results, but defect localization was likewise not addressed. Chao Zhang et al. [14] proposed a Mask-MPNet detection method for wind turbine blade faults based on UAV images, which can achieve multiple-fault detection and dynamic monitoring of wind turbine blades in operation. ASM Shihavuddin et al. [15] developed an automatic damage alert system based on deep learning; the study showed that appropriate data augmentation can improve model accuracy.
Dipu Sarkar [16] used YOLOv3 to train and evaluate a deep learning model on low-resolution wind turbine images. His work concluded that the YOLOv3 model outperformed conventional models in terms of accuracy and computation time. Zifeng Qiu [17] proposed an autonomous visual detection system based on convolutional neural networks and the YOLO model; in this work, the YSODA model was developed to detect small-size defects and improve recognition accuracy. Jihong Guo [18] proposed a hierarchical recognition framework for wind turbine blades, which consists of Haar-AdaBoost region detection and a convolutional neural network classifier. His work demonstrated that this framework outperforms models such as SVM and VGG16.

In summary, there is still little research applying computer vision algorithms to wind turbine blade surface defect detection, and the existing methods need improvement; it is therefore necessary to analyze the characteristics of wind turbine blade data and develop more accurate defect detection algorithms. This study uses advanced target detection algorithms to identify and locate defects on the blade surface, providing guidance for blade maintenance work.

2 Methodology

YOLO is a one-stage detector [19–21]. On June 25, 2020, Ultralytics released the first official version of YOLOv5. The YOLOv5 algorithm is an improvement on the YOLOv3 algorithm and is divided into four parts: Input, Backbone, Neck, and Prediction. The Backbone network is used to extract the features of the input image. YOLOv5 adopts the idea of CSPNet and designs a CSP structure into the backbone network; this structure significantly reduces computation, makes the model more lightweight, and also reduces the memory cost. A structure combining FPN and PAN is used in the Neck network: the FPN layer conveys strong semantic features top-down, while PAN conveys strong localization features bottom-up. By combining both paths, features from different backbone layers are fused into the different detection layers. The final detection step is performed in the Head network, which generates anchor frames for the feature maps. Its output vector contains the probabilities computed for the different classes as well as the bounding boxes of the defects. The structure of YOLOv5 is shown in Fig. 1.

YOLOv5 divides the input picture into an S × S grid. If the center of a target falls into a grid cell, that cell makes the prediction for it. Each grid cell predicts multiple bounding boxes together with confidence scores for those boxes; each bounding box carries the confidence and the location information of the target. We therefore end up with a tensor of size S × S × (5 × B + C), where B is the number of bounding boxes each grid cell can predict and C is the number of classes. The YOLO algorithm generates bounding boxes from anchor boxes; usually, three anchor boxes of different sizes are generated around the center of each grid cell. Figure 2 illustrates the process of generating bounding boxes from anchor boxes.
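As a quick check of the output size stated above, with B = 3 anchor boxes per cell and the four defect classes used later in this paper (C = 4), each grid cell predicts 5·B + C = 19 values; the grid size S below is an arbitrary assumption for illustration.

```python
# Each of the B boxes carries (x, y, w, h, confidence); C class scores follow.
S, B, C = 20, 3, 4          # S chosen arbitrarily for illustration
print(S * S * (5 * B + C))  # 7600 values in the S x S x (5B + C) tensor
```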

Fig. 1 The structure of YOLOv5

Fig. 2 The process of generating a bounding box

The following equations give the parameters of the bounding box:

b_x = (2 · σ(t_x) − 0.5) + c_x    (1)

b_y = (2 · σ(t_y) − 0.5) + c_y    (2)

b_w = p_w · (2 · σ(t_w))²    (3)

b_h = p_h · (2 · σ(t_h))²    (4)

σ(x) = 1 / (1 + e^(−x))    (5)

where, in Eqs. (1) and (2), t_x and t_y are the offsets of the prediction target relative to the upper-left corner of the grid cell, c_x and c_y are the coordinates of the upper-left corner of the corresponding grid cell, and b_x and b_y are the center coordinates of the bounding box. In Eqs. (3) and (4), t_w and t_h are the dimensional offsets, p_w and p_h are the anchor dimensions, and b_w and b_h are the width and height of the bounding box. YOLOv5 uses the sigmoid function to perform the central coordinate prediction. Using this regression, the bounding box can be brought to nearly overlap the ground truth box, yielding the prediction box. The end-to-end training process is started by associating the feature maps extracted by the convolutional neural network with the prediction box labels and creating a loss function. The loss of YOLOv5 consists of three main components: classes loss, objectness loss, and location loss. Classes loss uses BCE loss (binary cross-entropy loss) and is calculated for positive samples only. Objectness loss is also calculated with BCE loss; its target is the CIoU of the predicted bounding box with the ground truth box, evaluated for all predicted samples. Location loss uses the CIoU loss [22], and only the location loss of positive samples is calculated. The binary cross-entropy is calculated as follows:
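A small sketch of Eqs. (1)–(5), decoding one raw prediction into a box center and size; the function names and example values are ours, not taken from the YOLOv5 code base.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # Eq. (5)

def decode_box(t, grid_xy, anchor_wh):
    """Decode one raw YOLOv5 prediction t = (tx, ty, tw, th) into a box.
    grid_xy: upper-left corner (cx, cy) of the grid cell; anchor_wh: (pw, ph)."""
    tx, ty, tw, th = t
    bx = (2.0 * sigmoid(tx) - 0.5) + grid_xy[0]   # Eq. (1)
    by = (2.0 * sigmoid(ty) - 0.5) + grid_xy[1]   # Eq. (2)
    bw = anchor_wh[0] * (2.0 * sigmoid(tw)) ** 2  # Eq. (3)
    bh = anchor_wh[1] * (2.0 * sigmoid(th)) ** 2  # Eq. (4)
    return bx, by, bw, bh

# Example: raw offsets of zero place the box center at the cell corner + 0.5
# and leave the box at exactly the anchor size.
print(decode_box(np.zeros(4), grid_xy=(3, 4), anchor_wh=(10.0, 13.0)))
```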


$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \cdot \log(p(y_i)) + (1 - y_i) \cdot \log(1 - p(y_i)) \,\right] \quad (6)$$

where $y$ is the binary label (0 or 1) and $p(y)$ is the predicted probability that the output belongs to label $y$. The overall loss function of YOLOv5 is calculated as follows:

$$\mathrm{Loss} = \lambda_1 L_{cls} + \lambda_2 L_{obj} + \lambda_3 L_{loc} \quad (7)$$

where $L_{cls}$ is the classification loss, $L_{obj}$ the objectness loss, and $L_{loc}$ the location loss; $\lambda_1$, $\lambda_2$ and $\lambda_3$ are balance factors.
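As an illustrative sketch of how Eqs. (1)–(5) decode a predicted box (this is not the Ultralytics implementation; the function and variable names here are our own):

```python
import torch

def sigmoid(x):
    # Eq. (5)
    return 1.0 / (1.0 + torch.exp(-x))

def decode_box(t, cell_xy, anchor_wh):
    """Decode one predicted box from raw offsets per Eqs. (1)-(4).

    t         : tensor [tx, ty, tw, th] predicted by the network
    cell_xy   : (cx, cy) upper-left corner of the responsible grid cell
    anchor_wh : (pw, ph) width and height of the matched anchor box
    """
    tx, ty, tw, th = t
    cx, cy = cell_xy
    pw, ph = anchor_wh
    bx = (2.0 * sigmoid(tx) - 0.5) + cx   # Eq. (1)
    by = (2.0 * sigmoid(ty) - 0.5) + cy   # Eq. (2)
    bw = pw * (2.0 * sigmoid(tw)) ** 2    # Eq. (3)
    bh = ph * (2.0 * sigmoid(th)) ** 2    # Eq. (4)
    return bx, by, bw, bh

# Example: raw offsets for the cell at (3, 5) with a 30x60 anchor
print(decode_box(torch.tensor([0.2, -0.1, 0.3, 0.0]), (3.0, 5.0), (30.0, 60.0)))
```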

3 Experiment

3.1 Dataset

The data come from a wind farm in eastern China [18] and comprise 85 images taken by high-resolution cameras. Some examples are given in Fig. 3. The data include four surface defects: cracks, skin debonding, fibre failure and pitted surface. A description of each defect category is presented in Table 1. These pictures carry expert labels and comments, so we consider them credible. We call the fourth kind of fault "pitted surface" because at this stage a large number of small areas of the surface rubber skin have come off the wind turbine blade. These areas are not easy to mark and identify individually, so we mark them as a whole.

Fig. 3 Wind turbine blade surface defects


Table 1 Fault type and description

Fault type       Description
Crack            Gel coat cracking or de-bonding
Skin debonding   Delamination, fracture and damage to the adhesive layer
Fibre failure    Severe blade damage leading to fibre and laminate delamination
Pitted surface   Large areas of rubber skin peeling off the blade surface, leaving numerous pits and pockmarks that form a rough surface

When this fault occurs, the blade surface exhibits a large number of pits and pockmarks, forming a rough surface. Such a surface easily accumulates water and dust, making the exposed surface more vulnerable to erosion and seriously reducing the life of the wind turbine blade.

This study uses labelImg to mark the defects on the pictures. The resulting label files are saved as XML files in PASCAL VOC format and then converted to txt format using Python for easy reading by the program (Fig. 4). Because of the small number of images in the original dataset, data augmentation was adopted. We used simple affine transformations, including rotation and flipping, as well as a small amount of sharpening, blurring, noise addition and cropping, to make the dataset more useful for network training; in total, 510 images are used for training and testing. The dataset was divided into training and validation sets in the ratio 9:1. Table 2 shows the number of samples.

Fig. 4 Samples of annotated images
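The VOC-to-txt conversion step described above is commonly scripted along the following lines; this is a hedged sketch, assuming a class list matching Table 1 and one XML annotation file per image (the class names and file paths are illustrative):

```python
import xml.etree.ElementTree as ET

CLASSES = ["crack", "skin debonding", "fibre failure", "pitted surface"]  # assumed label map

def voc_to_yolo(xml_path, txt_path):
    """Convert one PASCAL VOC annotation file to a YOLO-format txt file."""
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.findall("object"):
        cls = CLASSES.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO format: class x_center y_center width height, all normalized to [0, 1]
        xc, yc = (xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h
        bw, bh = (xmax - xmin) / w, (ymax - ymin) / h
        lines.append(f"{cls} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))
```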

Table 2 The number of samples

Fault type          Crack   Skin debonding   Fibre failure   Pitted surface
Number of samples   124     952              101             292


Fig. 5 Illustration of the mosaic data augmentation

Meanwhile, the YOLOv5 algorithm comes with some data augmentation functions of its own, including scaling, color space adjustment and mosaic augmentation. Among these, mosaic is a novel and effective data augmentation technique that combines four training images into one composite image at a certain ratio, as shown in Fig. 5. A richer training dataset helps optimize the performance of the detector and avoid overfitting.
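A much-simplified sketch of the mosaic operation is given below; it pastes four images into the quadrants around a random center, assumes each input image is at least as large as the output canvas, and omits the bounding-box label remapping that a full implementation must perform:

```python
import random
import numpy as np

def simple_mosaic(images, size=640):
    """Combine four images into one mosaic around a random center point.

    images : list of four HxWx3 uint8 arrays, each at least size x size
    Note: a full implementation must also shift and clip the bounding-box
    labels of each source image; that step is omitted here.
    """
    canvas = np.full((size, size, 3), 114, dtype=np.uint8)  # gray background
    cx = random.randint(size // 4, 3 * size // 4)  # random mosaic center
    cy = random.randint(size // 4, 3 * size // 4)
    quadrants = [(slice(0, cy), slice(0, cx)),        # top-left
                 (slice(0, cy), slice(cx, size)),     # top-right
                 (slice(cy, size), slice(0, cx)),     # bottom-left
                 (slice(cy, size), slice(cx, size))]  # bottom-right
    for img, (ys, xs) in zip(images, quadrants):
        h, w = ys.stop - ys.start, xs.stop - xs.start
        canvas[ys, xs] = img[:h, :w]  # crop each image to fit its quadrant
    return canvas
```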

3.2 Experimental Configuration

The experiments ran on an Intel® Xeon E5-2600 v4 processor at 2.40 GHz with 256 GB of RAM and an NVIDIA RTX 2080Ti GPU with 11 GB of video memory. The operating system is Ubuntu 18.04, the acceleration environment is CUDA 10.2, the programming language is Python, and the deep learning framework is PyTorch 1.2.

3.3 Evaluation Metrics

The main evaluation metrics used in this study are P (precision), R (recall), and F1 (F-score), which are calculated as follows:

$$P = \frac{TP}{TP + FP} \quad (8)$$

$$R = \frac{TP}{TP + FN} \quad (9)$$

$$F1 = 2 \times \frac{P \times R}{P + R} \quad (10)$$

where TP (true positive) is the number of actual defects detected correctly, FP (false positive) is the number of defect-free regions incorrectly detected as defects, and FN (false negative) is the number of actual defects not detected. Precision indicates the percentage of correct detections among all detections. Recall is a metric that evaluates the sensitivity of the model. The F1 score combines the two into a single measure of model accuracy: a higher F1 score indicates a more accurate model.

4 Result

In the training process, each iteration can be divided into two steps. First, the data from the training set is applied to the model, and the model automatically adjusts its weights based on the loss values. Then, the data from the validation set is applied to the model, after which the loss values are calculated using the just-updated weights. The loss values obtained on the validation set are an important metric for evaluating the performance of the model. In this process, we used transfer learning to speed up convergence; the pre-trained model is obtained by training on the COCO dataset (Fig. 6). After 200 epochs, which took a total of 2.22 h, a 166 MB weight file was obtained.

There are three kinds of losses in the training process: box loss represents the loss of the bounding box, obj loss represents the mean target detection loss, and cls loss represents the mean classification loss. In the first 50 iterations, the training set loss

Fig. 6 The loss for the YOLOv5 in training


decreases rapidly; a steady decrease then follows, and the loss stabilizes after 200 iterations. The loss curves show that YOLOv5 has strong learning ability and fast convergence.

Figure 7 shows the confusion matrix obtained from the training results. The confusion matrix shows that the recognition rate for cracks is not high. Our analysis suggests that crack faults are not prominent in the image features and that the crack sample size is too small, leading to an imbalance of training samples. To balance the dataset and obtain more satisfactory results, we added images of cracked walls to the dataset, reasoning that the local surface of a wind turbine blade is similar to a wall background (white and smooth). We added 90 such images to the training; the resulting confusion matrix is shown in Fig. 8, and the cracked-wall images used are shown in Fig. 9. The number of crack samples is 223 after adding these pictures. It can be clearly seen that the detection of cracks improves after adding the crack pictures. Table 3 shows the comparison of the two training results.

The images were detected using the trained YOLOv5 model with a target confidence threshold of 0.25. Figure 10 shows the results for four typical samples, with rectangles marking the detected faults and confidence coefficients indicated at the top of the rectangles. YOLOv5 shows very good performance on this dataset: almost all faults are accurately identified. Although the recall is lower, the other metrics are higher. All results show that YOLOv5 can effectively learn enough information from the training set and then correctly identify surface defects against the background (Table 4).
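Detection at the 0.25 confidence threshold can be reproduced through the public YOLOv5 hub interface [23]; the following is a minimal sketch, where the weight file and image path are illustrative placeholders:

```python
import torch

# Load the custom-trained weights through the YOLOv5 hub interface [23]
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")
model.conf = 0.25  # target confidence threshold used in this study

results = model("blade_image.jpg")  # illustrative test-image path
results.print()  # prints class, confidence and box for each detected defect
results.save()   # saves the annotated image to runs/detect/
```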

Fig. 7 Confusion matrix


Fig. 8 The confusion matrix obtained from the second training

Fig. 9 Wall cracks

Table 3 The comparison of the two training results

                Train 1   Train 2
P (precision)   0.752     0.859
R (recall)      0.852     0.820
F1              0.799     0.839
mAP             0.846     0.865

Among these models, YOLOv5 performs very well, with a short training time and fast convergence. It is worth mentioning that if YOLOv5s is used, the training time is reduced to 1.85 h and the frame rate can reach 140 fps, while precision reaches 78.6%, recall 70.8% and mAP 0.759, with a weight file of only 14 MB. It also demonstrates


Fig. 10 The results of surface defects detection

Table 4 Comparison of training results of different models

Model         mAP     Time for training (h)   fps
YOLOv5        0.865   2.22                     71
YOLOv3        0.64    3.56                     62
YOLOv4        0.77    13.8                     40
SSD           0.66    1.09                     116
Faster-RCNN   0.69    10                       18

that YOLOv5 can select models of different sizes in different scenarios, and all have good detection capabilities.

5 Conclusion

In this study, the YOLOv5 model was used to accurately detect surface defects in wind turbine blades, and the method was tested on our own dataset. The experimental results show that it is a very effective option for the wind


turbine smart O&M industry. Like other CNN-based models, the accuracy improves as the training set is balanced across the various categories of labels: as the proportion of crack images in the dataset increases, the detection accuracy improves significantly compared with the previous training set. Compared with other models, the YOLOv5 model achieves the best performance in detecting surface defects on wind turbine blades. The comparison also shows that YOLOv5 has significant advantages in terms of training speed and weight file size. These advantages make YOLOv5 particularly suitable for wind turbine blade surface defect detection.

Although surface defects were accurately detected using the YOLOv5 model, there is much room for improvement, because the performance of the YOLOv5 model depends on the size of the training set. In this dataset, the number of certain types of surface defects is very small, especially fibre failure and cracks. In the future, more samples will be collected to better train YOLOv5 and to conduct further research.

References

1. Frankenstein B, Schubert L, Klesse T et al (2009) Rotor blade monitoring of wind turbines. Materialprufung 51(10):673–677
2. Tian YP, Wang SS, Entao Y et al (2013) Study of hybrid non-destructive testing suitable for inspection in wind turbine blades
3. Bo Z, Yanan Z, Changzheng C (2017) Acoustic emission detection of fatigue cracks in wind turbine blades based on blind deconvolution separation. Fatigue Fract Eng Mater Struct 40(6):959–970
4. Junior VJ, Zhou J, Roshanmanesh S et al (2017) Evaluation of damage mechanics of industrial wind turbine gearboxes. Insight 59(8):410–414
5. Tang J, Soua S, Mares C et al (2016) An experimental study of acoustic emission methodology for in service condition monitoring of wind turbine blades. Renew Energy 99:170–179
6. Hwang S, An YK, Sohn H (2019) Continuous-wave line laser thermography for monitoring of rotating wind turbine blades. Struct Health Monit 18(4):1010–1021
7. Yang B, Zhang L, Zhang W et al (2014) Non-destructive testing of wind turbine blades using an infrared thermography: a review. In: International conference on materials for renewable energy & environment. IEEE
8. Patel J, Sharma L, Dhiman HS (2021) Wind turbine blade surface damage detection based on aerial imagery and VGG16-RCNN framework
9. Deng L, Guo Y, Chai B (2021) Defect detection on a wind turbine blade based on digital image processing
10. Wang L, Zhang Z (2017) Automatic detection of wind turbine blade surface cracks based on UAV-taken images. IEEE Trans Industr Electron 1:1–10
11. Wang Y, Yoshihashi R, Kawakami R, You S (2019) Unsupervised anomaly detection with compact deep features for wind turbine blade images taken by a drone. IPSJ Trans Comput Vis Appl 11(1)
12. Xu D, Wen C, Liu J (2019) Wind turbine blade surface inspection based on deep learning and UAV-taken images. J Renew Sustain Energy 11(5)
13. Yu Y, Cao H, Yan X et al (2020) Defect identification of wind turbine blades based on defect semantic features with transfer feature extractor. Neurocomputing 376
14. Zhang C, Wen C, Liu J (2020) Mask-MRNet: a deep neural network for wind turbine blade fault detection. J Renew Sustain Energy 12(5):053302
15. Shihavuddin A, Chen X, Fedorov V et al (2019) Wind turbine surface damage detection by deep learning aided drone inspection analysis. Energies 12(4)
16. Gunturi SK, Sarkar D (2020) Wind turbine blade structural state evaluation by hybrid object detector relying on deep learning models. J Ambient Intell Humanized Comput (1)
17. Qiu Z, Wang S, Zeng Z et al (2019) Automatic visual defects inspection of wind turbine blades via YOLO-based small object detection approach. J Electron Imaging 28(4):43023.1–43023.11
18. Guo J, Liu C, Cao J, Jiang D (2021) Damage identification of wind turbine blades with deep convolutional neural networks. Renew Energy 174
19. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
20. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6517–6525
21. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint, pp 1–5
22. Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2019) Distance-IoU loss: faster and better learning for bounding box regression. In: AAAI conference on artificial intelligence
23. http://github.com/ultralytics/YOLOv5

Cooperative Search and Rescue with Drone Swarm

Luiz Giacomossi, Marcos R. O. A. Maximo, Nils Sundelius, Peter Funk, José F. B. Brancalion, and Rickard Sohlberg

Abstract Unmanned Aerial Vehicle (UAV) swarms, also known as drone swarms, have been a subject of extensive research due to their potential to enhance monitoring, surveillance, and search missions. Coordinating several drones flying simultaneously presents a challenge in increasing their level of automation and intelligence to improve strategic organization. To address this challenge, we propose a solution that uses hill climbing, potential fields, and search strategies in conjunction with a probability map to coordinate a UAV swarm. The UAVs are autonomous and equipped with distributed intelligence to facilitate a cooperative search application. Our results show the effectiveness of the swarm, indicating that this is a promising approach to the problem.

Keywords Drones · UAV · Search and rescue · Swarm · Cooperative

L. Giacomossi (B) · M. R. O. A. Maximo Autonomous Computational Systems Lab (LAB-SCA), Aeronautics Institute of Technology (ITA), São José Dos Campos, Brazil e-mail: [email protected] M. R. O. A. Maximo e-mail: [email protected] N. Sundelius · P. Funk · R. Sohlberg Mälardalen University (MDU), Universitetsplan 1, 721 23 Västerås, Sweden e-mail: [email protected] P. Funk e-mail: [email protected] R. Sohlberg e-mail: [email protected] J. F. B. Brancalion Technological Development Department, EMBRAER S.A, São José Dos Campos, Brazil e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_28


1 Introduction

Swarms are a natural and widespread phenomenon among various groups of animals, which are able to move in space with coordination and cohesion through simple rules. In the context of unmanned aerial vehicles (UAVs), a swarm refers to a group of homogeneous entities working collaboratively towards a common goal [1]. This concept has drawn the interest of researchers due to its potential for efficient and cost-effective completion of complex tasks in harsh environments compared to traditional single-drone applications. This approach aims to achieve the highest level of autonomy, which poses several challenges [2]. One of these challenges is the ability to perform a task coordinated among multiple UAVs without human operator intervention [3].

Using UAVs in search and rescue (SAR) operations can significantly improve the chances of locating and rescuing individuals in distress. UAVs equipped with advanced sensors and cameras can cover large areas quickly, allowing search teams to identify potential targets with greater speed and efficiency. They can also access hard-to-reach or dangerous areas, providing a high-level view of the situation and allowing rescuers to assess it more accurately. In addition, UAVs can reduce the risk to human search teams, allowing them to focus on the most critical areas of a search and minimizing their exposure to danger.

Recent research has shown the potential of UAVs in SAR. Dantas et al. [5] explore low-processing image recognition alternatives, such as Inception and YOLO, to address challenges such as locating small targets in large areas. Sampedro et al. [14] propose a real-time mission planning architecture for a swarm of UAVs with dynamic agent-to-task assignment, while Volchenkov et al. [15] propose path planning methods using fuzzy logic and PSO algorithms.

Mission planning involves developing a strategy to accomplish an objective in a specific mission. It includes identifying required resources, risks and challenges in order to develop a detailed plan with tasks and responsibilities. Online mission planning requires constant updates as environmental conditions change. Planning is one of the biggest challenges in artificial intelligence (AI), and building intelligent systems that can plan effectively often requires a hybrid approach. There is no single algorithm that can perform perfect planning, and the challenge is to develop planning approaches that are as efficient as possible for a specific task. As humans, we often begin by making a coarse-grained plan, even if we do not have a complete understanding of the task and its challenges. At the top level, the coarse-grained plan may have conflicting goals, such as minimizing the risk of problems, finding the target as quickly as possible, and using as few resources as possible. Humans perform online planning by continuously reevaluating and adjusting plans as the situation changes and as more information is collected. We may even have backup plans in case the primary plan does not work as expected.

Our research focuses on using UAVs in a swarm to perform SAR missions with the objective of rescuing missing individuals in distress. The main contribution of this paper is the development of a new approach that enables the utilization of drone swarms in search and rescue operations when a probability map of the region is


provided in advance. To achieve this, we have designed a set of heuristics inspired by the work of Giacomossi et al. [6] and Sundelius et al. [4]. Additionally, we have created simulations that can emulate search missions with drones in both 2D and 3D virtual environments. Through these simulations, we evaluate the effectiveness of our approach.

This paper is organized as follows. Section 2 presents the theoretical background. The scenario of interest is presented in Sect. 3. In Sect. 4 we describe our solution. We then proceed to the experimental results and discussion in Sect. 5. Lastly, Sect. 6 concludes and shares ideas for future work.

2 Theoretical Background

In this section, we present the main techniques found in the literature for our context.

2.1 Potential Fields

Potential fields have been widely used in the field of aerial robotics for navigation and obstacle avoidance. They provide a way for autonomous aerial robots to generate smooth and safe trajectories by assigning attractive and repulsive forces to the environment. The attractive force pulls the robot towards the goal while the repulsive force repels it from obstacles. The resulting path is a compromise between the two forces that satisfies the objective while avoiding collisions with obstacles. One popular variant of the artificial potential field (APF) method uses the bivariate normal function as a potential field [11]. However, potential fields are known to suffer from local minima and can get stuck in trap situations. To overcome this problem, various enhancements to the basic potential field approach have been proposed, e.g., combining potential fields with other techniques such as machine learning [6].

The potential field that governs the movement of drones within the formation can be defined by

$$f(x, y) = e^{-\alpha (x - x_c)^2 - \gamma (y - y_c)^2}, \quad (1)$$

where $\alpha$ and $\gamma$ are constants, $[x_c, y_c]^T$ is the center of the field, and $[x, y]^T$ is the position influenced by the potential field. In order to achieve a symmetrical field, $\alpha$ and $\gamma$ need to be equal; this ensures that the influence is uniform in all directions, which is important for maintaining coordinated motion within the formation. Specifically, the partial derivatives of (1) create a velocity field that dictates the direction and speed of movement for each drone, based on its position relative to the center of the field. The equation is given by


$$\begin{bmatrix} \dfrac{\partial f(x,y)}{\partial x} \\[6pt] \dfrac{\partial f(x,y)}{\partial y} \end{bmatrix} = \begin{bmatrix} -2\alpha f(x,y) & 0 \\ 0 & -2\gamma f(x,y) \end{bmatrix} \begin{bmatrix} x - x_c \\ y - y_c \end{bmatrix}, \quad (2)$$

By modifying α and γ, the potential field can be shaped and scaled to achieve different patterns of drone behavior, allowing for flexibility in the formation’s movement.
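Equations (1) and (2) translate directly into code; the following sketch (with variable names of our choosing) evaluates the field and the velocity it induces at a drone's position:

```python
import numpy as np

def potential(pos, center, alpha, gamma):
    """Bivariate exponential potential field of Eq. (1)."""
    dx, dy = pos[0] - center[0], pos[1] - center[1]
    return np.exp(-alpha * dx**2 - gamma * dy**2)

def velocity(pos, center, alpha, gamma):
    """Velocity induced by the field, i.e. the partial derivatives of Eq. (2)."""
    f = potential(pos, center, alpha, gamma)
    return np.array([-2.0 * alpha * f * (pos[0] - center[0]),
                     -2.0 * gamma * f * (pos[1] - center[1])])

# Symmetric field (alpha == gamma) centered at the formation center
print(velocity(np.array([1.0, 2.0]), np.array([0.0, 0.0]), alpha=0.5, gamma=0.5))
```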

2.2 Search Strategies

There are many approaches to searching a specific area. One is to provide a probability grid map of where a target might be located, with the grid map divided into cells of equal size. One approach is to use the Bayes search algorithm [8] to update the map after each visited cell: the probability is lowered where nothing is found and every other cell is raised to a higher probability using an update equation. To traverse such a discrete map, three techniques are investigated: a baseline lawn-mower-like search that does not take the probabilities into account, a hill climbing approach, and an A-star/hill-climbing mix. Once the search begins, each UAV selects one high-probability cell of the map to ensure the distribution of the swarm members across the map, and as soon as the UAV reaches its destination, it begins the search using one of the implemented search algorithms.

The Lawn Mower Search (LMS) algorithm, also known as row, line or column search, is a search technique used in optimization problems and inspired by the motion of a lawn mower. According to the research by Ousingsawat and Earl [12], the LMS divides the search space into subregions and searches in a zigzag pattern, similar to the motion of a lawn mower, as exemplified in Fig. 1. This approach is particularly useful for problems with very little information, and LMS has been shown to find optimal solutions, making it a popular choice for many optimization problems [12]. However, LMS has limitations, as it can be inefficient and time-consuming: it may require extensive search time in some areas and overlook small but critical areas. Hence, for more complex scenarios, we need more sophisticated search algorithms that can adapt to different situations and utilize the available information in the most effective way.

The Hill Climbing (HC) algorithm [13] is a powerful optimization technique widely used in computer science and AI. It is a local search algorithm that iteratively moves towards a higher-value solution in the search space by evaluating neighbouring solutions and selecting the one with the highest value. HC continues this process until it reaches a peak where no higher-value solution can be found. For convex problems it will always find the optimum solution; however, if the problem is not convex, the algorithm can get stuck in local maxima and minima, resulting in suboptimal solutions. To address this limitation, variants of the HC algorithm have been developed that introduce randomness and memory into the search process, enabling the algorithm to escape local optima and improve the solution. The HC algorithm is useful when there is no information beyond the closest surroundings, as


Fig. 1 Example of a swarm formation executing the lawn mower search pattern. The dashed lines show the paths

it follows the most promising path. However, the problem of getting stuck in local maxima remains a challenge. In our previous work [6], we tackled this by proposing a strategy using finite state machines (FSM) to overcome the problem and to enhance the performance of the search.

Lastly, the Bayes Search (BS) algorithm was developed during World War II, when the Allies were searching for German submarines, and has its origins in Bayes' theorem [8]. The probabilities at the beginning of the search are based on assumptions made before any data is collected, informed by experts and sensor data. The first step is to divide the search area into a grid map and assign each cell an a priori probability. The second step is to search the cell with the highest probability; if the target is there, the search is finished. If the target is not there, the third step is to reduce the probability value of that cell and increase the probabilities of the other cells correspondingly. The fourth step is to repeat until the target is found. The Bayes formula to update the cells for a search mission can be set up as

$$P(A|B) = \frac{P(B \cap A)}{P(B)} = \frac{P(B|A)P(A)}{P(B)}, \quad (3)$$

where the variables A and B are defined by

$$P(\text{Is in the search area} \mid \text{Not found}) = \frac{P(\text{Is in the search area AND Not found})}{P(\text{Not found})}, \quad (4)$$

Now let p denote the probability of the target being in a cell and f the probability of that target being found if it is there. Then P(B) and P(B ∩ A) can be defined by


$$P(B) = p \cdot (1 - f) + (1 - p), \quad (5)$$

$$P(B \cap A) = p \cdot (1 - f). \quad (6)$$

Now the posterior probability $p'$ of a searched cell can be updated with

$$p' = p\,\frac{1 - f}{1 - pf}, \quad (7)$$

and the posterior probability of the other cells can be updated with

$$r' = \frac{r}{1 - pf}, \quad (8)$$

where r is the individual probability of all other cells.
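As a minimal sketch, the update of Eqs. (7) and (8) over a probability grid can be implemented as follows, assuming one searched cell per step and an illustrative detection probability f:

```python
import numpy as np

def bayes_update(grid, cell, f):
    """Update the probability grid after searching `cell` without finding the target.

    grid : 2D array of cell probabilities p (sums to 1)
    cell : (row, col) of the searched cell
    f    : probability of detecting the target if it is in the searched cell
    """
    p = grid[cell]
    denom = 1.0 - p * f                 # P(Not found), from Eq. (5)
    grid = grid / denom                 # all other cells, Eq. (8)
    grid[cell] = p * (1.0 - f) / denom  # searched cell, Eq. (7)
    return grid

grid = np.full((4, 4), 1 / 16)           # uniform prior over a 4x4 map
grid = bayes_update(grid, (1, 2), f=0.9)
print(grid.sum())                         # remains 1.0
```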

3 Search and Rescue Scenario

In order to optimize patrolling in autonomous scenarios, the task can be broken down into two sub-tasks: low-level and high-level patrolling. Low-level patrolling involves using sensors to detect objects of interest within the area. High-level patrolling, meanwhile, focuses on directing the search towards specific subregions of the area that are more likely to contain the object of interest. This can be accomplished by utilizing additional information, such as previous search patterns and knowledge about the types of objects that are typically found in certain areas. By employing a combination of these tactics, an autonomous system can effectively and efficiently search the patrolling area while minimizing the risk of overlooking crucial objects or areas. The task can therefore be broken down into:

• Low-level strategy and tactics for anomaly detection and classification. An efficient patrolling drone is able to identify anomalies and classify them, including a number of properties, e.g., speed and orientation.
• High-level strategy and tactics for searching the area of interest. An essential part of a patrolling task is to perform the task as efficiently as possible with respect to identifying the target, and to minimise the risk of missing an object of interest.

Anomaly detection and image recognition are rapidly advancing fields with potential applications in patrolling. In our previous work [5], we demonstrated this potential by adapting these techniques to patrolling tasks. Building on this foundation, our current work aims to address high-level strategies and tactics for patrolling. We assume a previously provided probability grid map of the search region, as shown in Fig. 2. In this map the region is divided into equally sized cells, where each cell represents the likelihood of finding a person in that sub-region. The probabilities are estimated by a human expert and are updated simultaneously by the drones using


Fig. 2 Example of a probability map. Green, Yellow and Red cells represent low, medium and high probability regions, respectively. No flight zones are illustrated as Blue cells. The star represents the object of interest

search algorithms. To ensure seamless coordination in the swarm, we assume the drones share the same map and are capable of updating the probabilities on the map simultaneously. Furthermore, we assume that the drones are equipped with advanced sensors that can effectively identify the object of interest.

4 Method and Material

Our simulations aim to evaluate collaborative online mission planning algorithms. To compare the efficiency of different algorithms, we limit the scenario to 5 autonomous drones, which is sufficient as a proof of concept. We adapted a 2D simulator, shown in Fig. 3, based on the simulator developed by Giacomossi et al. [6]. The simulation focuses on search missions, where various search algorithms are tested to identify the location of the missing person in the patrolling area and complete the mission. The simulation environment is summarized in the diagram in Fig. 4.

We implemented a communication model for the swarm to ensure that all UAVs have access to information on which cells have been searched, together with a basic strategy for searching in a specific direction once all surrounding cells have been visited. The UAVs communicate through a shared map, using the whiteboard model of communication [7] and a centralized ground control system, which enables each drone to read and update map probabilities simultaneously. Additionally, the simulation is connected to a 2D visualization engine that displays the current state of the simulation. By connecting the simulation to a visualization engine, it is possible to monitor and adjust the simulation in real time. This is useful for testing different scenarios


Fig. 3 2D simulation view of the search mission using Bayesian probabilities and HC search

Fig. 4 High-level overview of the components and interactions within the simulation environment

or optimizing the behavior of the drones within the simulation. Additionally, the potential field technique is implemented for collision avoidance and path planning [6].

At the start of the search process, the drones communicate with one another to select different high-probability cells on the map and begin the search process, as shown in Fig. 5. Once a drone reaches the intended location, it initiates the search using the HC and BS algorithms. Figure 6 shows the search process performed by a drone: the drone visits a cell and, if the object is not identified, examines the probabilities of the 8 neighbouring cells; using the HC algorithm, it then moves to the cell with the highest probability to continue the search. Figure 7 illustrates the Bayes algorithm updating the cell probabilities in a simulation. The probabilities of all map cells are updated at each step using formulas (7) and (8). As seen in frames (a) and (b), the probabilities increase as the drone visits new cells without finding the object.

During each iteration of the simulation, a new probability map is created randomly, with the probability for each cell assigned at random. The location of the individual is also updated to a new high-probability cell in the newly generated map, and new obstacles are randomly positioned in the map, which helps to evaluate the algorithm's performance under different conditions. The random assignment of probabilities to each cell ensures that the simulation results are unbiased towards any particular scenario. For the experiments, we evaluate the average time taken to


Fig. 5 Example of search process using three drones. Green, yellow, and red cells represent low, medium, and high probability regions, respectively. No flight zones are illustrated as blue cells. The yellow circle represents the object of interest. Each line with a different colour represents the path taken by a different drone

Fig. 6 Drone performing the search in the 8-connected cells. Red, yellow and blue cells represent high, medium and low probabilities, respectively; gray cells were already visited

Fig. 7 Drone performs the search using the Bayes algorithm and updates the probabilities associated with each cell as it visits them. These probabilities correspond to the numbers assigned to each cell and are updated at every new cell the drone explores


complete the mission in 270 iterations and compare it to the results obtained in our previous work [6] which employed a strategy based on the lawn mower search.
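The hill-climbing move over the 8-connected neighbourhood described above can be sketched as follows (bookkeeping of visited cells and tie-breaking are simplified, and the fallback when all neighbours are visited is left to the FSM strategy of [6]):

```python
def hill_climb_step(grid, visited, pos):
    """Move to the unvisited 8-connected neighbour with the highest probability."""
    rows, cols = grid.shape
    best, best_pos = -1.0, None
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = pos[0] + dr, pos[1] + dc
            if 0 <= r < rows and 0 <= c < cols and not visited[r, c]:
                if grid[r, c] > best:
                    best, best_pos = grid[r, c], (r, c)
    # None means every neighbour was visited (a local trap); the swarm
    # strategy then redirects the drone towards another high-probability cell.
    return best_pos
```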

5 Results and Discussion

In Table 1 we present the results obtained from the experiments. Notably, Experiment 2 demonstrated a mean time 24.2 s shorter than Experiment 1. This indicates that the new strategy utilizing the HC and BS algorithms resulted in an approximately 72% improvement in time efficiency for identifying the missing person. These results suggest that the HC and BS algorithms significantly reduced the time required to complete the task, in comparison with the approach adopted in Experiment 1. It is important to consider that the drones are simulated in a 2D environment with reduced degrees of freedom, simplified dynamics, and idealized identification and communication capabilities, so these results can deviate from real-world performance.

Furthermore, the results demonstrate the importance of improving the search method when the goal is to optimize time efficiency. Experiment 2 may be the more useful model for achieving that goal, but Experiment 1 remains a safe approach when performing a search with no information about the region, i.e., when no prior map of probabilities is provided. The comparison between the mission with and without the aid of a probability map provides evidence that using such a map can significantly reduce search time.

An interesting observation from the experiments is that when the target is located in a high-probability area with most of its cells in the direct path of the drones, as seen in Fig. 8, the time taken to locate the target is significantly reduced. This is because the drones can fly almost directly to the target, bypassing areas with low probabilities of detection. However, if the target is not in a high-probability area, the algorithm may lead the drones to the wrong areas, resulting in decreased time efficiency. Therefore, while the use of a probability map can improve search time in favourable scenarios, its effectiveness may vary where the map rests on incorrect assumptions. The accuracy of the probability map is thus a critical factor with a significant impact on the search.

By leveraging the power of AI and probability mapping, search and rescue operations can become faster, more efficient, and ultimately more successful. Time savings translate into reduced fuel and energy costs, as well as a reduced number of UAVs needed to meet the success criteria of the mission. Large performance increases can be obtained by choosing the right mission planning algorithms.

Table 1 Results of the experiments

Exp   Strategy                             Mean Time [s]
1     Lawn Mower - Giacomossi et al. [6]   57.6
2     HC + Bayes                           33.4


Fig. 8 Drone performing the search in the 8-connected cells. Red, yellow and blue cells represent high, medium and low probabilities, respectively; gray cells were already visited. The numbers represent the probability of each cell

When performing searches with a map of probabilities and Bayesian updates, it is important to make precise assumptions about the probabilities when preparing the map.

6 Conclusion

This work combines AI techniques with a swarm of UAVs to coordinate efficient search and rescue missions in a cooperative and autonomous manner. Moreover, the study demonstrates the potential of AI algorithms for patrolling, target identification, and patrol mission planning. We apply these concepts in a search and rescue scenario, modelled as a testbed for this kind of application, where we study strategies and tactics for locating an object of interest, i.e., a missing person. The methodologies using search patterns, the hill climbing algorithm, Bayesian search, and potential fields were merged with the development of simulations to provide an application and evaluation of the concept. In this work we reduced by approximately 72% the time taken to search for the missing person using the strategies developed, in comparison with our previous work, which employed a strategy with no prior information on where the missing person might be located.

The findings highlight the importance of continually exploring new approaches and technologies in order to optimize search and rescue missions and increase the chances of saving lives. They also underline the importance of making precise assumptions when preparing the map; otherwise the search may be misdirected and ineffective. For future work, we intend to study the performance of the algorithms in a 3D environment with robust physics and realistic drone dynamics, as seen in Fig. 9. We also intend to apply this technique to real drones. Further experimentation with different kinds of maps to evaluate the algorithms, and tuning of the probability update function, can also be studied.


Fig. 9 Search and rescue scenario adapted and performed in the py-bullet 3D environment. The cells’ colors represent the probability of a subregion of the map. Green refers to low probability cells, Yellow to medium probability and Red cells to high probability cells. The duck represents the missing individual

Acknowledgements Luiz Giacomossi acknowledges Embraer S.A for his scholarship. Marcos Maximo is partially funded by CNPq – National Research Council of Brazil through the grant 307525/2022-8. The authors are also grateful to Embraer and Vinnova, Sweden’s Innovation Agency for supporting and funding this research.

References

1. Brambilla M, Ferrante E, Birattari M et al (2013) Swarm robotics: a review from the swarm engineering perspective. Swarm Intell 7:1–41
2. Chen W, Liu J, Guo H, Kato N (2020) Toward robust and intelligent drone swarm: challenges and future directions. IEEE Network 34(4):278–283
3. Sundelius N, Funk P, Sohlberg R (2023) Simulation environment evaluating AI algorithms for search missions using drone swarms. In: International congress and workshop on industrial AI 2023
4. Huang H, Messina E (2007) Autonomy levels for unmanned systems (ALFUS) framework, Volume II: framework models initial version. Special publication (NIST SP), National Institute of Standards and Technology, Gaithersburg, MD
5. Dantas A, Diniz L, Almeida M, Olsson E, Funk P, Sohlberg R, Ramos A (2022) Intelligent system for detection and identification of ground anomalies for rescue. Springer International Publishing, pp 277–282
6. Giacomossi L, Souza F, Cortes RG, Mirko Montecinos Cortez H, Ferreira C, Marcondes CAC, Loubach DS, Sbruzzi EF, Verri FAN, Marques JC, Pereira L, Maximo ROA, Curtis VV (2021) Autonomous and collective intelligence for UAV swarm in target search scenario. In: 2021 IEEE Latin American robotics symposium (LARS)
7. Das S, Santoro N (2019) Moving and computing models: agents. Springer International Publishing
8. Polson N (2018) AIQ: hur artificiell intelligens fungerar
9. Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598)
10. Glover F, Laguna M (1998) Tabu search. Springer
11. Barnes L (2018) A potential field based formation control methodology for robot swarms. University of South Florida
12. Ousingsawat J, Earl MG (2007) Modified lawn-mower search pattern for areas comprised of weighted regions. In: 2007 American control conference, NY, USA, pp 918–923
13. Skiena S (2020) The algorithm design manual. Texts in computer science. Springer International Publishing
14. Sampedro C, Bavle H, Sanchez-Lopez JL, Fernández RAS, Rodríguez-Ramos A, Molina M, Campoy P (2016) A flexible and dynamic mission planning architecture for UAV swarm coordination. In: 2016 international conference on unmanned aircraft systems, pp 355–363
15. Volchenkov D, San Juan V, Santos M, Andújar JM (2018) Intelligent UAV map generation and discrete path planning for search and rescue operations. Complexity 2018

Domain Knowledge Regularised Fault Detection

Douw Marx and Konstantinos Gryllias

Abstract Unsupervised data-driven methods are attractive options for fault detection in rotating machinery since they do not require any failure data during training. However, in these data-driven approaches, engineering domain knowledge remains unexploited. Although engineering features are often used as inputs to machine learning models, thereby including domain knowledge, few methods exist for directly integrating domain knowledge about the expected machine fault behaviour into unsupervised fault detection methods. This paper presents a generic method for including domain knowledge into unsupervised, auto-encoder-based fault detection methods by regularising the Jacobian of the latent feature representation. This regularisation results in informative latent features that are sensitive to changes that are expected from a machine in a faulty condition. The proposed method is evaluated on a bearing fault detection task, both when using a low dimensional vector of engineering features and when using high dimensional frequency domain data. The analysis is conducted on two bearing fault data sets with different operating conditions, fault modes and signal-to-noise ratios. The proposed regularised auto-encoder yields improved ROC-AUC performance as compared to the unregularised baseline when evaluated on a latent-feature based fault indicator. The proposed method shows potential as a generic method for integrating engineering domain knowledge into fault detection problems.

Keywords Unsupervised learning · Fault detection · Regularisation · Domain knowledge informed machine learning

D. Marx (B) · K. Gryllias Department of Mechanical Engineering, Division LMSD, KU Leuven, Celestijnenlaan 300, Box 2420, 3001 Leuven, Belgium e-mail: [email protected] K. Gryllias e-mail: [email protected] Flanders Make@KU, Leuven, Belgium Leuven.AI–KU Leuven Institute for AI, 3000 Leuven, Belgium © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_29


1 Introduction

1.1 Unsupervised Fault Detection in Condition Monitoring

Condition-based maintenance procedures can ensure reliable and continuous operation of machines by reducing unnecessary maintenance [1] and minimizing machine downtime [2]. The first step in the condition monitoring process is fault detection, after which the faulty component can be isolated, identified and ultimately its remaining useful life can be estimated [3]. This investigation focuses on this first phase of fault detection. Supervised data-driven approaches for fault detection, isolation and classification have proven very effective where failure data is available [4]. However, since the failure data necessary for supervised training is generally not available, this work focuses on enhancing unsupervised data-driven fault detection models that require only healthy data for training. Specifically, we argue that the fault detection performance of unsupervised data-driven fault detection methods can be enhanced by including domain knowledge.

1.2 Including Domain Knowledge in Unsupervised Data-Driven Methods

Deep learning-based anomaly detection methods originally developed for computer vision and text analysis [5] have proven effective for fault detection in rotating machines [6]. However, there are fundamental differences between the fault detection problem in condition monitoring and traditional anomaly detection in computer science. Generally, engineers have an understanding, albeit imperfect, of the expected machine behaviour under faulty conditions. This is not the case for many anomaly detection methods in the computer science literature, where there is often no explainable physical relationship between the normal condition and the anomalous condition. Understandably, current data-driven fault detection methods do not respect engineering considerations and physical relationships (e.g., RMS is more likely to increase than decrease under fault conditions) and would label a sample that differs from the training data as faulty, even if this sample does not have the characteristics expected from a faulty machine.

We argue that approximate models of reality like finite element models, lumped parameter models, phenomenological models and heuristic rules can contribute information about expected fault behaviour to unsupervised data-driven fault detection models. To this end, we demonstrate that informative latent features for fault detection can be learnt by regularising an auto-encoder such that the model latent features are sensitive to the expected fault behaviour.

Figure 1 highlights the benefit of model regularisation based on the expected fault behaviour using a simple auto-encoder with a single input layer, single output


layer and latent dimensionality of 1. The measure of data point anomalousness, after training on healthy (green) data only, is plotted as a color map in the 2D input space for both a regularised (right) and an unregularised (left) model. Including information about the expected deviation of the data towards faulty conditions (see Eq. 7) leads to a better separation in the learnt 1D latent distributions of healthy (green) and faulty (red) data and ultimately an improved fault detection performance as measured by the ROC-AUC metric (Eq. 10).

1.3 Prior Work

Despite recent work on integrating domain knowledge into deep learning methods by adding constraints [7], using simulated bearing data for domain adaptation tasks [8, 9] and designing architectures inspired by signal processing methods [10], there has been little work on integrating domain knowledge into deep learning methods applied to the fault detection problem.

In earlier work [11], the authors developed a fault detection method based on the latent features of an auto-encoder. Regularisation terms, driven by an augmentation of the healthy data towards the expected faulty state, were added to the auto-encoder loss function. The regularisation enforced that fault modes are maximally separated in the latent representation, thereby facilitating the computation of fault-mode-specific health indicators.

This work shows that domain knowledge regularisation of data-driven models is further possible by constraining the derivative of the auto-encoder latent features with respect to the input. This idea is connected with earlier computer science literature like the contractive auto-encoder [12], where an auto-encoder is encouraged to learn a latent representation that is insensitive to small variations in the input space. This is accomplished by adding a regularisation term to the loss function that minimizes the L2 norm of the Jacobian of the encoder activations with respect to the input. In this work, however, we do roughly the opposite, imposing that the Jacobian should be large for input features related to faulty machine behaviour. Furthermore, our constraint is not based purely on maximizing the Jacobian, but instead on maximizing the correlation of the Jacobian with the expected fault behaviour, thereby biasing the model with domain knowledge and making the latent features sensitive to faulty behaviour. Earlier work in computer vision added similar constraints to the Jacobian to enforce explicit translation invariance in image classification [13], but in this work we aim to learn features that are sensitive, not invariant.

1.4 Overview

The rest of the paper is structured as follows. First, the mathematical formulation for the problem definition, the auto-encoder regularisation scheme and the model evaluation is developed in Sect. 2. Next, the evaluation procedure is presented in

Fig. 1 Demonstration: linear auto-encoder with a single hidden layer and latent dimension of 1. The likelihood color map of a sample being healthy (green) is shown over the 2D input space for an unregularised model (left) and for a regularised model that includes information about the expected fault behaviour (right). The regularisation ensures a better separation in the latent feature space, ultimately leading to a higher ROC-AUC score for the regularised variant


Sect. 3, after which the method is applied to both traditional engineering features and frequency domain representations of rolling element bearing vibrations in Sect. 5. Finally, conclusions and future work are presented in Sect. 6.

2 Domain Knowledge Regularisation for an Auto-encoder

2.1 Problem Definition

Suppose a sample $\mathbf{x}_i$ with unknown data distribution $\mathcal{P}$ is measured from a machine. The sample has dimensionality $D$, and the data distribution $\mathcal{P}$ is parameterized by the fault severity $s \in [0, 1]$ and the fault type $m \in \{\text{mode} \mid \text{mode} \in \text{all machine fault modes}\}$:

$$\mathbf{x}_i \sim \mathcal{P}(s, m) \quad \text{where } \mathbf{x}_i \in \mathbb{R}^{D}. \quad (1)$$

The dependence of the measured sample $\mathbf{x}_i$ on the fault severity and the fault mode is omitted for the sake of brevity. Figure 2 shows a graphical interpretation of Eq. 1. The goal of the fault detection problem is to detect whether the machine has deviated from its healthy condition such that $s > 0$. Although machines will not always fail gradually and predictably as depicted in Fig. 2, the assumption in this work is that the expected change in the data distribution, defined as $\mathbf{d}$, is at least approximately known:

$$\mathbf{d} \approx \mathbb{E}\!\left[\frac{\partial \mathbf{x}}{\partial s}\right] = \frac{\partial \bar{\mathbf{x}}}{\partial s} \quad \text{where } \mathbf{d} \in \mathbb{R}^{D} \quad (2)$$

Here, $\frac{\partial \bar{\mathbf{x}}}{\partial s}$ is the expected value of the deviation of the measured distribution from the nominal, healthy distribution at $s = 0$. Specifically, in this work, we use a linear approximation of the expected change in the input data at nominal conditions and prescribe the expected change of the input data towards faulty conditions as the unit vector $\mathbf{d}$. This information is then used to regularize an auto-encoder, making it sensitive to samples which are expected to be associated with fault conditions (Sect. 2.3).

Fig. 2 Formulation: fault detection with domain knowledge

2.2 Auto-encoder Definition

An auto-encoder can learn informative low-dimensional representations of high-dimensional data. A neural network encoder maps the input data to a lower dimension, after which a decoder attempts to reconstruct the input data from the encoded latent representation. Since the bottleneck at the encoder is of lower dimensionality than the input, the auto-encoder is forced to learn a compact representation of the input data. The reconstruction loss function for the baseline auto-encoder used in this work is given by

$$\mathcal{L}_{rec}(\theta, \phi) = \frac{1}{N}\sum_{i=1}^{N} \left\| \mathbf{x}_i - g_\phi(f_\theta(\mathbf{x}_i)) \right\|_2^2, \quad (3)$$

where $f_\theta$ and $g_\phi$ are the encoder and decoder functions, parameterized by $\theta$ and $\phi$ respectively. In this work, we learn improved latent features for anomaly detection as compared to the latent features obtained from strictly data-driven approaches that do not exploit domain knowledge. Specifically, we target the bottleneck latent features $\mathbf{z}_i$, which are obtained by passing the input samples through the encoder function:

$$\mathbf{z}_i = f_\theta(\mathbf{x}_i) \quad \text{where } \mathbf{z}_i \in \mathbb{R}^{L} \quad (4)$$

2.3 Proposed Regularisation Scheme

The proposed regularisation scheme enforces that the gradient of the latent feature representation with respect to the input is large when the input samples deviate from normal conditions in a way that is consistent with the expected fault behaviour. The encoder Jacobian, i.e. the derivative of the latent features with respect to the input, can be computed by automatic differentiation and is defined as

$$\frac{\partial \mathbf{z}_i}{\partial \mathbf{x}_i} = \frac{\partial f_\theta(\mathbf{x}_i)}{\partial \mathbf{x}_i} \in \mathbb{R}^{L \times D}. \quad (5)$$

We design a loss function that maximizes the correlation between the expected fault behaviour and the Jacobian to ensure that the latent representation is maximally sensitive to increasing fault severity. The sensitivity of the latent features with respect to increasing fault severity is defined as

$$\frac{\partial \mathbf{z}_i}{\partial s} = \frac{\partial \mathbf{z}_i}{\partial \mathbf{x}_i}\,\frac{\partial \bar{\mathbf{x}}}{\partial s} \in \mathbb{R}^{L \times 1} \quad (6)$$

The true deviation of the data distribution from the healthy condition is naturally unknown, and the expected, imperfect fault direction $\mathbf{d} \approx \frac{\partial \bar{\mathbf{x}}}{\partial s}$ is therefore used as a proxy. The added regularisation maximizes the L2-norm of the sensitivity of the latent features with increasing fault severity, $\frac{\partial \mathbf{z}_i}{\partial s}$:

$$\mathcal{L}_{reg} = -\frac{1}{N}\sum_{i=1}^{N} \left\| \frac{\partial \mathbf{z}_i}{\partial s} \right\|_2. \quad (7)$$

Finally, the combined loss for the regularised auto-encoder is obtained by adding the regularisation in Eq. 7, scaled by hyperparameter $\lambda$, to the reconstruction loss function in Eq. 3:

$$\mathcal{L}(\theta, \phi) = \mathcal{L}_{rec} + \lambda \mathcal{L}_{reg} \quad (8)$$

This loss function can then be optimized for the model parameters $\theta$ and $\phi$ using a gradient-based optimisation algorithm.
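A minimal PyTorch sketch of the combined loss in Eq. 8 is given below; it assumes an encoder/decoder pair and a prescribed unit fault direction, computes the sensitivity of Eq. 6 as a Jacobian-vector product, and is our own illustration rather than the authors' implementation:

```python
import torch
from torch.autograd.functional import jvp

def regularised_loss(encoder, decoder, x, d, lam=1.0):
    """Reconstruction loss (Eq. 3) plus domain-knowledge regulariser (Eq. 7).

    x : batch of healthy samples, shape (N, D)
    d : expected unit fault direction, shape (D,)
    """
    recon = decoder(encoder(x))
    loss_rec = torch.mean(torch.sum((x - recon) ** 2, dim=1))

    # Sensitivity dz/ds = (dz/dx) d, computed as a Jacobian-vector product (Eq. 6)
    v = d.expand_as(x)
    _, sens = jvp(encoder, (x,), (v,), create_graph=True)

    loss_reg = -torch.mean(torch.norm(sens, dim=1))  # Eq. (7), maximised via the minus sign
    return loss_rec + lam * loss_reg                 # Eq. (8), with lam = 1 as in this work
```

Computing the sensitivity as a Jacobian-vector product avoids materialising the full Jacobian of Eq. 5, which is convenient for the high-dimensional envelope spectrum inputs.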

2.4 Evaluation Metrics

To compare the informativeness of the learnt latent representation for fault detection, we use the Mahalanobis distance as a measure of anomalousness. The Mahalanobis distance of a newly evaluated sample $\mathbf{z}_i$ is defined as

$$D(\mathbf{z}_i) = (\mathbf{z}_i - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{z}_i - \boldsymbol{\mu}), \quad (9)$$

where $\boldsymbol{\mu}$ and $\Sigma$ are the mean and the covariance of the latent feature representation over the healthy training set. With the Mahalanobis distance acting as a measure of the anomalousness of a sample, the area under the receiver operating characteristic curve (ROC-AUC) can be computed. The ROC-AUC is a popular summary metric for measuring anomaly detection performance [14] and is defined as:

$$\mathrm{AUC} = \int_0^1 \mathrm{TPR}(t)\, d\,\mathrm{FPR}(t)$$

with

$$\mathrm{TPR}(t) = \frac{\sum_{i=1}^{N_t} [D(\mathbf{x}_i) \ge t]\, y_i}{\sum_{i=1}^{N_t} y_i}, \qquad \mathrm{FPR}(t) = \frac{\sum_{i=1}^{N_t} [D(\mathbf{x}_i) \ge t]\,(1 - y_i)}{\sum_{i=1}^{N_t} (1 - y_i)}. \quad (10)$$

Here, $t$ is the classification threshold, $N_t$ is the total number of samples in the test set, $y_i$ is the true label of the $i$-th sample (0 for healthy and 1 for faulty), and $[\cdot]$ is 1 if the condition inside the bracket is true, and 0 otherwise.
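Both metrics can be computed with standard tooling; the following is a sketch, assuming healthy training latents and labelled test latents are available as arrays (variable names are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mahalanobis_scores(z_train, z_test):
    """Mahalanobis distance of Eq. 9 between test latents and the healthy set."""
    mu = z_train.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(z_train, rowvar=False))
    diff = z_test - mu
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# y_test holds 0 for healthy and 1 for faulty samples; a higher distance
# should indicate a more anomalous sample, so it can be used directly:
# auc = roc_auc_score(y_test, mahalanobis_scores(z_train, z_test))
```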

3 Evaluation

To compare the regularised auto-encoder to an unregularised baseline, both variants are applied to two different datasets with different operating conditions, fault severities and signal-to-noise ratios (SNR). A diagram explaining the procedure for evaluating the proposed regularisation method is shown in Fig. 3.

Fig. 3 Procedure for evaluating the regularisation method

Table 1 Dataset specifications

Specification                    CWRU dataset                     KUL dataset
Sampling rate                    12.4 kHz                         51.2 kHz
Fault modes                      [inner race, outer race, ball]   [inner race, outer race]
Operating conditions [RPM]       [1797, 1772, 1750, 1730]         [2415, 1805, 1210]
Signal-to-noise ratios           [0.01, 0.1, 1, ∞]                [0.01, 0.1, 1, ∞]
Signal segment length            1732                             6216
Envelope spectrum dimension      432                              1553
Engineering features dimension   14                               14
N healthy train                  96                               76
N healthy validation             14                               11
N healthy test, faulty test      28                               22

3.1 Datasets and Data Preparation

The proposed method is evaluated on the Case Western Reserve University (CWRU) drive-end 12 kHz dataset [15] and a similar fault-seeded bearing dataset measured at KU Leuven, Belgium (see Table 1). The data is split into segments with 50% overlap, ensuring that there are multiple fault events in each signal segment. Additional noisy datasets are created by adding Gaussian noise, with a variance that leads to the signal-to-noise ratios listed in Table 1, to the time domain signal. Next, the envelope spectrum and engineering features are computed from the time series data to evaluate the method on both high-dimensional and low-dimensional data. Finally, the data is split into train, validation and test sets for training and evaluation.

3.2 Model Definition and Training

All models used in this investigation are simple auto-encoders with two encoder and two decoder layers and $\lambda = 1$ for regularised models, thereby assigning equal importance to the reconstruction and regularisation loss terms in Eq. 8. Model and training parameters are listed in Table 2. Both regularised and unregularised models are trained on healthy data only. After training, we verify that the model did not overfit using a healthy validation set, check that the training has converged, and verify that the regularisation loss term has a similar order of magnitude to the reconstruction error term.

Table 2 Model parameters and training

Architecture (envelope spectrum / engineering features):
  Encoder layer size:     100 / 8
  Bottleneck layer size:  5 / 2
  Output activation:      Linear
  Latent activation:      Sigmoid

Training and regularisation:
  Learning rate:  0.001
  Batch size:     16
  Epochs:         100
  λ:              1
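A minimal TensorFlow sketch of the Table 2 auto-encoder (envelope spectrum variant) is given below; the ReLU hidden activations, the Adam optimiser, and the `regulariser` callable standing in for the domain-knowledge term of Eq. 8 are assumptions, not taken from the paper.

```python
import tensorflow as tf

input_dim, hidden, latent, lam = 432, 100, 5, 1.0  # Table 2, envelope spectrum

encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden, activation="relu", input_shape=(input_dim,)),
    tf.keras.layers.Dense(latent, activation="sigmoid"),   # latent: sigmoid
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden, activation="relu", input_shape=(latent,)),
    tf.keras.layers.Dense(input_dim, activation=None),     # output: linear
])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

@tf.function
def train_step(x, regulariser):
    """One gradient step on reconstruction error + lam * regularisation (Eq. 8).
    `regulariser` is a placeholder for the paper's domain-knowledge term."""
    with tf.GradientTape() as tape:
        z = encoder(x)
        loss = tf.reduce_mean(tf.square(x - decoder(z)))
        loss += lam * regulariser(x, z)
    params = encoder.trainable_variables + decoder.trainable_variables
    optimizer.apply_gradients(zip(tape.gradient(loss, params), params))
    return loss
```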

3.3 Model Evaluation

After model training, the model is evaluated on a balanced test set with an equal number of healthy and faulty examples. The ROC-AUC metric is computed for 10 trials with different train, validation and test splits. Finally, the median ROC-AUC over all trials is compared between the regularised and unregularised methods.

4 Adding Domain Knowledge

A key component of the proposed method is the prescription of expected fault behaviour in Eq. 2. The conjecture is that even an approximate prescription of expected fault behaviour, as informed by lumped parameter models, phenomenological models, signal processing algorithms, finite element models and expert knowledge, can guide unsupervised data-driven models to learn informative latent features that are sensitive to faulty behaviour.

4.1 Envelope Spectrum Frequency Data

Defects in rolling element bearings lead to repeating impulses at a characteristic excitation frequency related to the fault location, bearing geometry and shaft speed. These repeating impulses excite resonance bands of the machine that tend to carry the fault information. The envelope spectrum is a popular signal processing indicator that exploits this knowledge by first extracting the signal envelope in a frequency band excited by the bearing impulses, and then computing the frequency spectrum of the signal envelope. Prominent frequency components in the envelope spectrum at the bearing characteristic frequencies are indicative of a bearing fault [16]. Thus, for frequency data in this work, the expected fault behaviour is prescribed as triangular peaks at the characteristic bearing frequencies and their harmonics. A triangular peak $p$, as a function of envelope spectrum frequency $f$, with characteristic frequency $f_c$, amplitude $a$ and bandwidth $w$, is defined as

$$p(f, f_c) = \begin{cases} -\frac{a}{w}\,|f - f_c| + a & \text{if } -w \le f - f_c \le w, \\ 0 & \text{otherwise,} \end{cases} \qquad (11)$$

where $a$ is the amplitude of the peak, $w$ the bandwidth of the triangle and $f_c$ the characteristic frequency. Peaks with exponentially decaying amplitudes are added at each harmonic $n$ of the characteristic frequency. Furthermore, since multiple fault modes are expected in rolling element bearings, $M$ = {inner race fault, outer race fault, ball fault}, the contributions for the different fault modes are summed to obtain a combined expected fault behaviour (although the different fault modes could also be treated separately, as in [11]):

$$\delta_{es}(f) = e^{-\beta f} \sum_{m \in M} \sum_{n=1}^{N} p(f, n f_m). \qquad (12)$$

For this investigation, $N = 20$, $\beta = 10^{-3}$, $a = 1$ and $w = 50$ Hz. Figure 4 shows an example of a normalized test sample compared to the prescribed, expected fault direction. For the sample shown, the agreement between the expected fault direction and the true fault direction is particularly good compared to other samples. However, the expected fault behaviour is still deliberately imperfectly prescribed, without, for instance, accounting for the side bands around the fault frequency that are typical of inner race faults.
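A minimal NumPy sketch of this prescription, following the reconstruction of Eqs. 11 and 12 above (the fault-frequency names in the usage comment are hypothetical):

```python
import numpy as np

def triangular_peak(f, f_c, a=1.0, w=50.0):
    """Eq. 11: triangle of height a centred at f_c, zero outside |f - f_c| > w."""
    return np.where(np.abs(f - f_c) <= w, a - (a / w) * np.abs(f - f_c), 0.0)

def delta_es(f, fault_freqs, n_harmonics=20, beta=1e-3):
    """Eq. 12: decaying harmonic peaks summed over all fault modes in M."""
    out = np.zeros_like(f, dtype=float)
    for f_m in fault_freqs:                  # fault modes m in M
        for n in range(1, n_harmonics + 1):  # harmonics 1..N
            out += triangular_peak(f, n * f_m)
    out *= np.exp(-beta * f)                 # exponentially decaying amplitudes
    return out / np.linalg.norm(out)         # normalise to unit norm (Sect. 4.2)

# f = np.linspace(0, 1000, 432)  # envelope spectrum frequency axis (Table 1)
# expected = delta_es(f, fault_freqs=[BPFI, BPFO, BSF])  # hypothetical names
```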

4.2 Engineering Feature Data

In the part of the investigation where engineering features are used as input, the domain knowledge is naively specified as all engineering features being equally likely to increase, in equal proportion, with increasing fault severity; the expected fault behaviour is a unit vector with equal positive values at each index. The 14 engineering features used in this work include the first 3 harmonics of each fault frequency, RMS, kurtosis, crest factor, entropy and spectral entropy. All expected fault behaviours are normalised to have unit norm.

Fig. 4 True fault direction versus expected fault direction


5 Results

Figure 5 shows the median AUC over 10 trials for each dataset with a given fault mode, fault severity and signal-to-noise ratio. Results are shown separately for the CWR and KUL datasets, for the envelope spectrum frequency data and the engineering features data respectively. A data point on the diagonal indicates that the regularised and unregularised models perform equally well on a given dataset. In this case, however, data points tend to lie above the diagonal, indicating that the regularised model tends to outperform the unregularised model. The regularised version outperforms the unregularised version in 89, 56, 80 and 66% of the cases for the KUL frequency data, KUL engineering features data, CWR frequency data and CWR engineering features data respectively. However, in many cases both algorithms achieve a perfect AUC of 1. Further, in many of the cases where the unregularised model outperforms the regularised variant, the data points are centred around an AUC of 0.5, the expected performance of a randomly guessing model; there, both models perform poorly. Note that although the regularised model generally outperforms the unregularised model even for imperfect domain knowledge, adding incorrect information to a machine learning model will ultimately decrease model performance [17].

Fig. 5 Median AUC for 10 trials for (a) CWR: engineering features, (b) CWR: frequency data, (c) KUL: engineering features, (d) KUL: frequency data. Each point is a dataset with a fault mode (outer race: *, inner race: >, ball: •) and signal-to-noise ratio (SNR ∝ size). A point above the diagonal indicates that the regularised model outperforms the unregularised model


6 Conclusions and Future Work

This work introduces a generic method for integrating domain knowledge into unsupervised fault detection methods. The method is applied to a bearing fault detection problem, both for frequency domain data and for engineered features. The analysis demonstrates that the proposed regularisation scheme using imperfect domain knowledge leads to improved performance, as measured by the median ROC-AUC over different datasets. In future work, the sensitivity of model performance to the accuracy of the domain knowledge and to the regularisation parameter λ, as well as the inclusion of information about unlikely fault behaviour, will be investigated. Furthermore, the proposed method can be applied to time-series data and to bi-frequency maps from signal processing.

Acknowledgements The authors gratefully acknowledge the European Commission for its support of the Marie Sklodowska Curie program through the ETN MOIRA project (GA 955681).

References

1. Lei Y, Li N, Guo L, Li N, Yan T, Lin J (2018) Machinery health prognostics: a systematic review from data acquisition to RUL prediction. Mech Syst Signal Process 104:799–834. https://doi.org/10.1016/j.ymssp.2017.11.016
2. Lee J, Wu F, Zhao W, Ghaffari M, Liao L (2014) Prognostics and health management design for rotary machinery systems—reviews, methodology and applications. Mech Syst Signal Process. https://doi.org/10.1016/j.ymssp.2013.06.004
3. Jardine AK, Lin D, Banjevic D (2006) A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech Syst Signal Process 20(7):1483–1510. https://doi.org/10.1016/j.ymssp.2005.09.012
4. Hoang DT, Kang HJ (2019) A survey on deep learning based bearing fault diagnosis. Neurocomputing 335:327–335. https://doi.org/10.1016/j.neucom.2018.06.078
5. Ruff L, Kauffmann JR, Vandermeulen RA, Montavon G, Samek W, Kloft M, Müller KR (2021) A unifying review of deep and shallow anomaly detection. Proc IEEE 109(5):756–795. https://doi.org/10.1109/JPROC.2021.3052449
6. Liu C, Gryllias K (2020) A semi-supervised support vector data description-based fault detection method for rolling element bearings based on cyclic spectral analysis. Mech Syst Signal Process 140:106682. https://doi.org/10.1016/j.ymssp.2020.106682
7. Van Baelen Q, Karsmakers P (2022) Constraint guided gradient descent: guided training with inequality constraints. arXiv:2206.06202
8. Liu C, Mauricio A, Qi J, Peng D, Gryllias K (2020) Domain adaptation digital twin for rolling element bearing prognostics. Annu Conf PHM Soc 12(1). https://doi.org/10.36001/phmconf.2020.v12i1.1294
9. Wang Q, Taal C, Fink O (2022) Integrating expert knowledge with domain adaptation for unsupervised fault diagnosis. IEEE Trans Instrum Meas 71:1–12. https://doi.org/10.1109/TIM.2021.3127654
10. Wang D, Chen Y, Shen C, Zhong J, Peng Z, Li C (2022) Fully interpretable neural network for locating resonance frequency bands for machine condition monitoring. Mech Syst Signal Process 168:108673. https://doi.org/10.1016/j.ymssp.2021.108673
11. Marx D, Gryllias K (2022) Domain knowledge informed unsupervised fault detection for rolling element bearings. PHM Soc Eur Conf 7(1):338–350. https://doi.org/10.36001/phme.2022.v7i1.3348
12. Rifai S, Vincent P, Muller X, Glorot X, Bengio Y (2011) Contractive auto-encoders: explicit invariance during feature extraction. In: Proceedings of the 28th international conference on machine learning. Omnipress, Madison, WI, USA, pp 833–840
13. Simard P, Victorri B, LeCun Y, Denker J (1991) Tangent prop—a formalism for specifying selected invariances in an adaptive network. In: Moody J, Hanson S, Lippmann R (eds) Advances in neural information processing systems, vol 4. Morgan-Kaufmann
14. Ruff L, Vandermeulen RA, Görnitz N, Binder A, Müller E, Müller KR, Kloft M (2020) Deep semi-supervised anomaly detection. arXiv:1906.02694
15. Case Western Reserve University bearing dataset. https://engineering.case.edu/bearingdatacenter
16. McFadden P, Smith J (1984) Vibration monitoring of rolling element bearings by the high-frequency resonance technique—a review. Tribol Int 17(1):3–10. https://doi.org/10.1016/0301-679X(84)90076-8
17. Ling J, Jones R, Templeton J (2016) Machine learning strategies for systems with invariance properties. J Comput Phys 318:22–35. https://doi.org/10.1016/j.jcp.2016.05.003

HFedRF: Horizontal Federated Random Forest

Priyanka Mehra and Ayush K. Varshney

Abstract Real-world data is typically dispersed among numerous businesses or governmental agencies, making it difficult to integrate it under data privacy laws such as the European Union's General Data Protection Regulation (GDPR). The existence of such data islands and privacy issues are two significant obstacles to the use of machine learning models in applications. In this paper, we address these issues and propose 'HFedRF: Horizontal Federated Random Forest', a privacy-preserving federated model which is approximately lossless. Our proposed algorithm merges d random forests computed on d different devices and returns a global random forest which is used for prediction on the local devices. In our methodology, we compare the IID (independent and identically distributed) and non-IID variants of our algorithm HFedRF with traditional machine learning (ML) methods, i.e., decision trees and random forests. Our results show that our algorithm achieves benchmark-comparable results in the IID as well as non-IID settings of federated learning.

Keywords Federated learning · Random forest · Decision trees · Merge trees · Merge random forest

1 Introduction

Artificial intelligence has advanced significantly in recent years due to the massive volume of data gathered in various domains. In practical applications, this gathered data (big data) is dispersed across several businesses or governmental entities and kept as data islands, which makes sharing data between domains impossible [1]. For companies and organizations, data is one of the most significant assets and cannot be shared freely. Privacy concerns about data are particularly sensitive nowadays, as data breaches occur periodically, and various countries are therefore enacting data privacy legislation. In 2018, the European Union implemented the General Data Protection Regulation (GDPR). The GDPR offers people greater control over their personal data and outlines strong guidelines and complete transparency for how organizations should manage their data. Before collecting any personal data, a firm must obtain the consent of the client and make clear what it intends to do with the information [2]. For example, profiling is a prominent machine learning application, currently utilized by practically all firms to analyse clients and for targeted marketing. The method itself is unbiased and is not prohibited under GDPR. Nevertheless, profiling can cause discrimination toward customers, which GDPR can prohibit.

Considering these challenges and limitations [3], one may ask whether using scattered data is worth the effort. It certainly is: academia, firms, companies, and governments could all gain from the data islands. For example, government entities could collaborate with companies to better understand the daily traffic flow in a city and update road construction plans to increase traffic efficiency during rush hours. Hence, the question is: how can we train combined models among several organizations or domains securely? Due to the challenges of data islands, data privacy, and security, the existing methods cannot resolve the problem of data breaches. Creating novel approaches that link data islands to real-world applications therefore becomes a serious challenge.

Federated Learning (FL) [4] was put forth in 2016 and primarily focuses on creating machine learning models that protect privacy when data is scattered and cannot easily be gathered and kept in one location. One application of this approach is word typing prediction on mobile devices [5]. Since the customer's private information is contained in all the typed words, any direct collection could result in a legal or ethical infraction of rules. Using FL techniques, parts of the modelling process can be carried out on the mobile devices, and only the trained model parameters are sent to the central servers, hence protecting the user's privacy. FL has offered a novel way to look at and address the current challenges in data privacy.

Data heterogeneity [6] is one of FL's main issues. Data owned by different parties in FL may have completely distinct distributions. Consider different mobile devices in a gathering: those devices would be producing data with different patterns, e.g., different languages or different social networks. As a result, we generally cannot assume independent and identically distributed (IID) data in FL. In addition, different FL participants (devices) have different volumes of training data, so some participants may have only a small sample size while others have a large one.

Depending on how the data is partitioned, researchers divide FL into three categories [7]: vertical federated learning (VFL), horizontal federated learning (HFL), and federated transfer learning (FTL). HFL is sample-partitioned FL, in which the datasets on different devices share an overlapping feature space but possess unique sample spaces. VFL is feature-partitioned FL and is the opposite of HFL, i.e., the devices share the same sample space but possess unique feature spaces. FTL can be used when devices share some of the features and some of the samples.


Thus, motivated by FL and its categories, we propose a novel privacy-preserving, horizontally partitioned, tree-based machine learning model called 'HFedRF' (Horizontal Federated Random Forest). The main contributions of this paper are as follows:

- We propose a framework that implements HFL for random forests (HFedRF). In our framework, we extend the tree-merging algorithm [8, 9] from the literature to merge random forests.
- We challenge the lossless claim of the tree-merging algorithm in the literature through a counterexample. We find that the tree-merging algorithm can lose information when the nodes in the trees are merged using the majority-vote approach. Therefore, the tree-merging algorithm is only approximately lossless with respect to utility.
- We compare our proposed HFedRF model with traditional machine learning models, i.e., random forests and decision trees.

The structure of the paper is as follows. In Sect. 2 we review related work on FL and privacy-preserving models, and discuss tree-merging algorithms. In Sect. 3, we challenge the lossless claim of the tree-merging algorithm used in the literature [8, 9], claim it to be approximately lossless, and propose HFedRF (Horizontally partitioned Federated Random Forest), which merges random forests in a federated setting. In Sect. 4, we give experimental results of our HFedRF model in the IID and non-IID settings, comparing it to traditional ML models. In Sect. 5, we give the conclusion and future work.

2 Related Work

2.1 Federated Learning (FL)

FL was initially suggested as a solution for accessing rich data from user devices, as regulations made it challenging to create models using such data. By keeping the data on consumer devices and combining locally derived intermediate outcomes, one can train a shared neural network model. In [10], the authors proposed a recommender system that uses FL in meta-learning. Additionally, FL has been utilized in loss-based AdaBoost [11] and to handle multi-task problems [12]. A vertical federated learning approach was introduced in [13]; each data provider had a unique feature space but shared the sample space, and a logistic regression model was learned jointly to ensure data privacy while maintaining modelling accuracy. The work of [14] establishes a modular benchmarking system for federated contexts. Although numerous research papers have been released, the definition of FL was unclear until the release of [7], which introduced the three categories of FL. The same team examined the tree-boosting technique and adapted it to the vertical federated environment [15]; a lossless framework was proposed, which successfully prevented the disclosure of each private data provider's information. In [16], a novel reinforcement learning strategy was proposed that considers the need for privacy and develops Q-networks for each agent with the help of other agents.

2.2 Random Forest (RF) with FL

Random forest (RF) is an ensemble supervised machine learning technique, extensively used in both classification and regression tasks. RF employs decision trees as base classifiers and produces multiple decision trees to make predictions. In the literature, prediction from an RF is realised in two ways: by the bagging approach, and by random tree selection for the input.

To address the issue of merging decision trees, a merged decision tree classifier [17] was designed for two parties having their own private databases, using the ID3 learning strategy. That work focused on the HFL approach, while other researchers used the VFL approach to solve the issue [18]. Because currently available cryptography-based approaches for privacy-preserving ML are extremely slow, [19] first demonstrated that RF is naturally applicable in a fully distributed architecture, and then developed protocols for RF to enable general and efficient distributed privacy-preserving knowledge discovery. The authors of [20] recently adapted this approach and merged the locally learned models; they introduced ad-hoc processes for model encryption (offline) and decision-tree evaluation (online), which can be viewed as a privacy-preserving scoring algorithm for RFs.

Liu et al. proposed Federated Forest [21], a framework that enables multiple participants to jointly train a random forest model without sharing their data. The approach is based on decision trees and employs a secure aggregation scheme to merge the models trained by each participant. The Federated Forest framework merges decision trees by computing the weighted average of their split functions, using the number of samples at each leaf node as weights. The authors utilize a secure aggregation technique based on secure multi-party computation (MPC) [22] to calculate the weighted average without revealing participant data. Lucas Airam C. de Souza et al. propose a decentralized federated learning framework [23] for training random forest models. The approach is based on a blockchain infrastructure that enables multiple participants to collaborate on training a model without the need for a central coordinator. The authors demonstrate the effectiveness of their framework on several real-world datasets, achieving performance comparable to centralized learning while preserving data privacy and decentralization.

2.3 Tree-Merging Algorithm

In Fan and Li's approach [8] for merging decision trees, d decision trees T1, T2, T3, ..., Td are combined into one super tree T. It was termed a lossless random forest compression: by lossless they mean that when the merged super tree is used instead of the random forest consisting of T1, T2, T3, ..., Td to predict for a test instance, the prediction remains the same. In their paper, they proposed two algorithms to merge decision trees, mergeDecisionTrees and computeBranch. The algorithm mergeDecisionTrees takes as input the roots of the decision trees to be merged and the number of decision trees to be merged (i.e., d) and returns the combined decision tree T. It determines the most common split attribute, creates condition intervals based on the split values of the most common split attribute, and then calls the computeBranch algorithm on each decision tree to generate the partial branch trees. mergeDecisionTrees then finds the child nodes of each condition in the condition interval and produces the merged super tree T. There are three conditional checks in the computeBranch algorithm:

• If the decision tree's node is a leaf.
• If the split attribute of a decision tree node is not the most common attribute.
• If the split attribute of the decision tree node is the most common attribute.

According to their approach, the combination of decision regions is equivalent to the combination of decision trees. Each region of a particular tree is linked to a class. The majority rule is then used to combine the regions of the various trees. Hall et al. [24], Bursteinas and Long [25] (regions referred to as hypercubes), Andrzejak et al. [26] (sets of iso-parallel boxes), and Strecht et al. [27] (decision regions) all use a similar strategy. The authors of [9] suggest a method for handling horizontally partitioned data that protects privacy by combining Mondrian k-anonymity with decision trees in a Federated Learning (FL) environment; they used the tree-merging algorithm by Fan and Li [8]. Each device in their method trains a decision tree classifier, and the root node of each device's tree is shared with the aggregator. By selecting the most frequent split attribute, the aggregator merges the trees and expands the branches according to the split values of the selected split attribute.

3 Proposed Work

In this section, we question the lossless property of the tree-merging algorithm suggested in the existing literature. We challenge this property for the first time and claim it to be approximately lossless. Further, we use the tree-merging algorithm to merge random forests and propose a horizontally partitioned federated version of the random forest, named HFedRF.


3.1 Example to Challenge the Lossless Property of the Tree Merging Algorithm

In the literature, a solution to the issue of merging decision trees is claimed to be a lossless merging approach. The assumption that choosing the most common node causes no loss [8, 9] is questionable. We challenge this assumption and show, using an example, that the tree-merging algorithm is not lossless. Figure 1 explains the working of the tree-merging algorithm. Consider a test case [3.5, 2.0, 5.0] for Fig. 1. According to decision tree (a), the class assigned to the test case is 'orange'; decision tree (b) assigns class 'blue', and decision tree (c) assigns class 'orange'. Taking the majority-voting approach, the class assigned to the test case is thus 'orange'. Now let us find the class assigned to the test case by the merged tree. As can be seen from Fig. 1(d), the assigned class is 'blue', which differs from the class assigned by the majority-voting approach in the random forest. This difference in class prediction challenges the 'lossless' property of the merging algorithm: compared to the prediction from the d decision trees, the tree-merging algorithm incurs a loss in prediction performance, which disproves the lossless assumption. Hence, we can say that the tree-merging algorithm is approximately lossless. We will use this property to merge the RFs from local devices and generate an approximately lossless global RF.

Fig. 1 An example of a random forest with 3 decision trees on a dataset with 3 attributes X1, X2, and X3 and 2 classes (orange and blue), illustrating Fan and Li's [8] tree-merging algorithm. In the 3 decision trees (a), (b), and (c), the most common root attribute is X1; thus X1 becomes the root node of the merged tree. To compute the branches, splitting intervals are created, and the combinations of decision regions in the merged tree are selected according to the splitting intervals from the combinations of decision trees


Fig. 2 The figure illustrates ‘n’ devices (D1, D2, D3, …, Dn ). The devices send local random forests to the server and the server returns a global random forest to the devices

3.2 Horizontal Federated Random Forest (HFedRF)

FL is the process of training a machine learning algorithm, such as deep neural networks (DNNs), on many local datasets confined to local nodes, without explicitly exchanging data samples. The main idea consists of training local models on local data samples and exchanging parameters (e.g., the weights and biases of a deep neural network) at some frequency between these local nodes to build a global model shared by all nodes. In the federated learning environment, not much attention has been given to interpretable machine learning models such as decision trees and random forests. In [9], a k-anonymized federated variant of the decision tree algorithm was presented, which uses the (assumed) lossless tree-merging algorithm from [8] to merge decision tree models. Taking motivation from this approach, we present our algorithm 'HFedRF: Horizontal Federated Random Forest', which merges random forests computed on local devices and returns a global random forest. The RF models computed on local devices are communicated to the central server (see Fig. 2); the central server merges these locally trained models and returns a global random forest to all the decentralized devices to make predictions.

3.3 Model Building

For FL's aggregation step, we merge decision trees and extend this into a merging algorithm for random forests. Algorithm 1 receives the list of local forests and a user-defined maximum number of trees in the global forest as input and returns the global random forest. First, the aggregator makes a single list of all the decision trees from the local random forest models. We recursively merge the trees with the most common root node using the merge_decision_trees() algorithm and remove them from the decision tree list, until the decision tree list is empty or the forest contains the maximum number of decision trees. The decision tree merging algorithm (merge_decision_trees()) can be summed up as follows:

• Find the split attribute of all the decision trees. The split attributes of the decision tree roots T1, T2, T3, ..., Td are denoted by X1, X2, X3, ..., Xd. The root of the merged decision tree is a node whose split attribute is the most common attribute among X1, X2, X3, ..., Xd.
• Make condition trees as follows: remove duplicate split values, sort them, then use the split values to create condition intervals (a small sketch of this step follows at the end of this subsection). Make condition trees with the same number of branches as the number of conditions in the condition intervals. For example, if P1, P2, P3 are the split values of attribute X1 for decision trees T1, T2, T3 respectively, then in the merged tree the split condition intervals will be x ≤ P1, P1 < x ≤ P2, P2 < x ≤ P3, and x > P3 (assuming P1 < P2 < P3).
• Determine the pruned condition trees: for each condition tree, eliminate the inner nodes when there is only a single branch from the child node to the parent node.
• Repeat the previous steps until all the pruned condition trees reach leaf nodes as the stopping/base condition. Use majority voting to determine the label of the leaf node of the merged tree T once all the pruned condition trees are leaf nodes.

The decision tree merging algorithm is used to merge decision trees from the random forests on all local devices. Algorithm 1 gives the stepwise details of our proposed algorithm, 'Horizontal Federated Random Forest (HFedRF)'. As mentioned above, the algorithm takes the local random forest list and a user-defined maximum number of decision trees in the global random forest (num_trees) as input. After extracting the decision trees (steps 2–3) from all the local random forests, it finds the most common root attribute (step 6). All the decision trees having the same root node are merged using the merge_decision_trees algorithm (step 7) as described above. The merged decision tree becomes part of the global random forest (step 8), and the decision trees with that root node are removed from the tree list (step 9). This process continues until tree_list becomes empty or GlobalForest contains the maximum number of decision trees (step 5). Algorithm 1 returns a global random forest, which is communicated to the local devices for further predictions. In simple words, we take all the random forests trained on local devices as input and aggregate decision trees with the same root nodes to compute the global random forest. Since we use the approximately lossless merge_decision_trees algorithm to merge decision trees into the GlobalForest, our proposed algorithm HFedRF is also approximately lossless.
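A small illustrative sketch of the condition-interval construction in the second step (the helper name is ours, not from [8]):

```python
def condition_intervals(split_values):
    """Deduplicate and sort split values, then form the intervals
    x <= P1, P1 < x <= P2, ..., x > Pk used to branch the merged tree."""
    pts = sorted(set(split_values))
    intervals = [(float("-inf"), pts[0])]
    intervals += [(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
    intervals.append((pts[-1], float("inf")))
    return intervals

# condition_intervals([0.5, 0.2, 0.5])  ->  [(-inf, 0.2), (0.2, 0.5), (0.5, inf)]
```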

3.4 Proposed Scheme for HFL with Random Forest

We present an FL architecture that assumes data is horizontally partitioned amongst devices, either IID or non-IID. We illustrate our proposed framework in Fig. 3 and characterize each step in Algorithm 1. Since we are simulating FL, we lack real d devices with individual data partitions. As a result, we use publicly accessible data from the UCI Machine Learning Repository to create IID and non-IID partitions. Each of the d devices trains a random forest classifier on its data partition, giving d random forests. The random forests are merged by the aggregator using the proposed Algorithm 1. To create a merged global random forest, which we call GlobalForest, trees from the local random forests are collected at the server, and the top-down merging in Algorithm 1 is applied. The computed GlobalForest is communicated to all the d devices, where it is used for further classification. From the standpoint of federated learning, the aggregation is regarded as follows: the root node of each device's tree (a node of the decision tree comprises a split attribute and a split value) is shared.

Algorithm 1: Horizontal Federated Random Forest (HFedRF)

HFedRF(local_forest[], num_trees)
1.  tree_list ← []
2.  for i in range(len(local_forest)):
3.      tree_list.append(local_forest[i].trees)
4.  GlobalForest ← []
5.  while (num_trees != 0) and len(tree_list) > 0:
6.      common_root ← most_common_root(tree_list)
7.      merged_tree ← merge_decision_trees(common_root)
8.      GlobalForest.append(merged_tree)
9.      tree_list ← tree_list.remove(common_root)
10.     num_trees ← num_trees − 1
11. return GlobalForest
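A minimal Python sketch of Algorithm 1 (each local forest is assumed to be a list of tree objects exposing a `root_attribute`; `merge_decision_trees` stands in for the merging routine of Sect. 3.3):

```python
from collections import Counter

def hfedrf(local_forests, num_trees, merge_decision_trees):
    """Merge d local random forests into one global forest (Algorithm 1)."""
    tree_list = [t for forest in local_forests for t in forest]  # steps 1-3
    global_forest = []                                           # step 4
    while num_trees != 0 and tree_list:                          # step 5
        # Step 6: most common root split attribute among the remaining trees.
        attr = Counter(t.root_attribute for t in tree_list).most_common(1)[0][0]
        group = [t for t in tree_list if t.root_attribute == attr]
        global_forest.append(merge_decision_trees(group))        # steps 7-8
        tree_list = [t for t in tree_list if t.root_attribute != attr]  # step 9
        num_trees -= 1                                           # step 10
    return global_forest                                         # step 11
```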

The aggregator then chooses the node with the most common split attribute. This procedure is carried out recursively until the merging is complete. Devices must offer the required nodes at each step, and the algorithm selects the most common split attribute and creates its branches based on the split values of that attribute. The aggregation server sends the GlobalForest to all devices so that they can use it to infer from unknown data instances.

Fig. 3 Algorithm flowchart

Table 1 Details of the datasets used

Dataset         | No. of samples | No. of attributes
Adult           | 48842          | 14
Churn modelling | 10000          | 21
Bank            | 45211          | 14
Skin            | 24505          | 4
Magic           | 19020          | 11

4 Experimental Results

4.1 Dataset Used

We conduct our experiments utilizing datasets from the UCI Machine Learning Repository [28]. Table 1 shows the details of the datasets used. We run our experiments on five different datasets, namely adult, skin, churn modelling, bank, and magic. We show the results of HFedRF for binary classification, but it can be extended to multi-class classification as well.

4.2 IID and Non-IID Partitions

Given a dataset D, we use 30% of it as the test set and the rest as training data. Then, to produce the data associated with each device, we consider the creation of IID and non-IID partitions (a minimal sketch follows below).

IID partitions: We generate IID samples for 100 devices for our experiments; the number of devices can be increased or decreased depending on the problem. We randomly draw samples from the entire training set to generate the 100 device datasets, restricting the size of each sample to 10% of the training set.

Non-IID partitions: In the case of non-IID partitions, we have 100 devices holding variable amounts of data. To generate non-IID samples, we randomly choose the size of each sample from a normal distribution with a mean of 0.1 (times the size of the training set) and a standard deviation of 0.05 (times the size of the training set).
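A minimal NumPy sketch of both partitioning schemes (the lower clipping of the non-IID sizes is our assumption to keep sample sizes positive; the paper does not specify this):

```python
import numpy as np

rng = np.random.default_rng(0)

def iid_partitions(n_train, n_devices=100, frac=0.1):
    """Each device gets a random 10% sample of the training set indices."""
    size = int(frac * n_train)
    return [rng.choice(n_train, size=size, replace=False)
            for _ in range(n_devices)]

def non_iid_partitions(n_train, n_devices=100, mean=0.1, std=0.05):
    """Device sizes drawn from N(mean, std) times the training-set size."""
    sizes = np.clip(rng.normal(mean, std, n_devices), 0.01, None) * n_train
    return [rng.choice(n_train, size=int(s), replace=False) for s in sizes]

# parts = iid_partitions(len(X_train))  # X_train: hypothetical training set
```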

4.3 Building Random Forest and Aggregation

On each device, samples are picked at random from the provided dataset, a decision tree is built for each sample, and the predictions of the trees are combined by voting: the outcome that receives the most votes is chosen as the final prediction. We apply the aggregation approach given in Algorithm 1 to merge the random forests (described in Sect. 3.3). The tree aggregation algorithm finds the split attribute that occurs most frequently at a particular level of the decision tree and calculates the number of branches of the merged tree based on the split values of that attribute.

4.4 Result Analysis

Python 3.8.1 is the programming language we utilize, with packages such as SciPy, NumPy, and pandas. The selected UCI datasets (adult, churn modelling, bank, skin, and magic) are partitioned into 70% training data and 30% test data for the experiments. The notations of the ML models used in the tables are as follows: Single DT refers to a decision tree trained on a single device; Single RF refers to a random forest trained on a single device. The depth of the trees is set to 5 for all tree-based methods. Proposed non-IID refers to the non-IID setting of our HFedRF model, where the data distributed among the d devices is non-identically distributed; Proposed IID refers to the IID setting, where the data distributed among the d devices is independent and identically distributed.

We compare our proposed algorithm with its counterparts using three metrics: precision, recall, and F1-score, on the training as well as the test dataset. It is worth mentioning that in our federated environment, neither GlobalForest nor the local forests are trained on the complete training dataset; the local forests are trained on very small samples of the training data. It can be seen from Tables 2, 3 and 4 that our privately computed HFedRF model gives precision, recall, and F1-scores comparable to its counterparts for both training and test data, in the non-IID as well as the IID environment. The results show that the merged random forest of HFedRF gives good results even when trained on very small training samples. In some cases, such as for the churn modelling dataset in Table 2 (highlighted in bold), our proposed model gives better results than its counterparts. Similarly, the adult and churn modelling datasets give better precision than their counterparts, and the magic dataset gives the best precision in the proposed IID setting. From these results, we can say that our model maintains benchmark-comparable utility, is interpretable, and is privacy-preserving.

Table 2 Precision on training and test data for various machine learning algorithms

                 Single DT        Single RF        Proposed non-IID  Proposed IID
Dataset          Train    Test    Train    Test    Train    Test     Train    Test
Adult            0.7676   0.7597  0.7756   0.7676  0.7732   0.7761   0.7717   0.7780
Churn modelling  0.7929   0.7913  0.8157   0.8132  0.8570   0.8484   0.8329   0.8252
Bank             0.7665   0.7492  0.7747   0.7767  0.7329   0.7435   0.7427   0.7331
Skin             0.9661   0.9648  0.9657   0.9659  0.9689   0.9704   0.9656   0.9654
Magic            0.8205   0.8002  0.8465   0.8479  0.8377   0.8361   0.8306   0.8244

Table 3 Recall on training and test data for various machine learning algorithms

                 Single DT        Single RF        Proposed non-IID  Proposed IID
Dataset          Train    Test    Train    Test    Train    Test     Train    Test
Adult            0.6976   0.6976  0.6878   0.6878  0.6581   0.6581   0.6490   0.6490
Churn modelling  0.7273   0.7273  0.7038   0.7038  0.6422   0.6422   0.6139   0.6139
Bank             0.6107   0.6107  0.6847   0.6847  0.6061   0.6061   0.6602   0.6602
Skin             0.9885   0.9885  0.9889   0.9889  0.9809   0.9809   0.9850   0.9850
Magic            0.7987   0.7987  0.7948   0.7948  0.7578   0.7578   0.7896   0.7896

Table 4 F1-score on training and test data for various machine learning algorithms

                 Single DT        Single RF        Proposed non-IID  Proposed IID
Dataset          Train    Test    Train    Test    Train    Test     Train    Test
Adult            0.7197   0.7197  0.7124   0.7124  0.6824   0.6824   0.6722   0.6722
Churn modelling  0.7595   0.7595  0.7379   0.7379  0.6762   0.6762   0.6388   0.6388
Bank             0.6449   0.6449  0.7173   0.7173  0.6407   0.6407   0.6892   0.6892
Skin             0.9768   0.9768  0.9846   0.9846  0.9748   0.9748   0.9749   0.9749
Magic            0.8074   0.8074  0.8111   0.8111  0.7767   0.7767   0.8033   0.8033

5 Conclusion and Future Work

In this paper, we have challenged the lossless claim of merging decision trees made in the literature. Further, we propose a novel algorithm titled 'HFedRF: Horizontal Federated Random Forest', which merges the random forests computed on various local devices in the IID as well as non-IID settings of federated learning. Experimental results on 5 datasets show that the global model from our proposed algorithm achieves benchmark-comparable results, with improvements in some cases. Although federated learning computes the global model privately, it is still susceptible to attacks such as backdoor attacks [29]. Defence against such attacks is a good direction for future work.

Acknowledgments The first author greatly acknowledges the feedback from Prof. Arend Hintze. The second author acknowledges the support from Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by Knut and Alice Wallenberg Foundation.


References

1. O'Leary DE (2013) Artificial intelligence and big data. IEEE Intell Syst 28(2):96–99
2. Team IGP (2020) EU general data protection regulation (GDPR)–an implementation and compliance guide. IT Governance Ltd
3. Hinderhofer A, Greco A, Starostin V, Munteanu V, Pithan L, Gerlach A, Schreiber F (2023) Machine learning for scattering data: strategies, perspectives and applications to surface scattering. J Appl Crystallogr 56(1)
4. Li T, Sahu AK, Talwalkar A, Smith V (2020) Federated learning: challenges, methods, and future directions. IEEE Signal Process Mag 37(3):50–60
5. Hard A, Rao K, Mathews R, Ramaswamy S, Beaufays F, Augenstein S, Eichner H, Kiddon C, Ramage D (2018) Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604
6. Zhang C, Xie Y, Bai H, Yu B, Li W, Gao Y (2021) A survey on federated learning. Knowl-Based Syst 216:106775
7. Yang Q, Liu Y, Chen T, Tong Y (2019) Federated machine learning: concept and applications. ACM Trans Intell Syst Technol (TIST) 10(2):1–19
8. Fan C, Li P (2020) Classification acceleration via merging decision trees. In: Proceedings of the 2020 ACM-IMS on foundations of data science conference, pp 13–22
9. Kwatra S, Torra V (2021) A k-anonymised federated learning framework with decision trees. In: Data privacy management, cryptocurrencies and blockchain technology. Springer, Cham, pp 106–120
10. Chen F, Luo M, Dong Z, Li Z, He X (2018) Federated meta-learning with fast convergence and efficient communication. arXiv preprint arXiv:1802.07876
11. Huang L, Yin Y, Fu Z, Zhang S, Deng H, Liu D (2018) LoAdaBoost: loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care data. arXiv preprint arXiv:1811.12629
12. Smith V, Chiang CK, Sanjabi M, Talwalkar AS (2017) Federated multi-task learning. In: Advances in neural information processing systems, vol 30
13. Hardy S, Henecka W, Ivey-Law H, Nock R, Patrini G, Smith G, Thorne B (2017) Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv preprint arXiv:1711.10677
14. Caldas S, Duddu SMK, Wu P, Li T, Konečný J, McMahan HB, Smith V, Talwalkar A (2018) LEAF: a benchmark for federated settings. arXiv preprint arXiv:1812.01097
15. Cheng K, Fan T, Jin Y, Liu Y, Chen T, Papadopoulos D, Yang Q (2021) SecureBoost: a lossless federated learning framework. IEEE Intell Syst 36(6):87–98
16. Zhuo HH, Feng W, Lin Y, Xu Q, Yang Q (2019) Federated deep reinforcement learning. arXiv preprint arXiv:1901.08277
17. Agrawal R, Srikant R (2000) Privacy-preserving data mining. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, pp 439–450
18. Vaidya J, Clifton C, Kantarcioglu M, Patterson AS (2008) Privacy-preserving decision trees over vertically partitioned data. ACM Trans Knowl Discov Data (TKDD) 2(3):1–27
19. Vaidya J, Shafiq B, Fan W, Mehmood D, Lorenzi D (2013) A random decision tree framework for privacy-preserving data mining. IEEE Trans Dependable Secure Comput 11(5):399–411
20. Giacomelli I, Jha S, Kleiman R, Page D, Yoon K (2019) Privacy-preserving collaborative prediction using random forests. AMIA Summits Transl Sci Proc 2019:248
21. Liu Y, Liu Y, Liu Z, Liang Y, Meng C, Zhang J, Zheng Y (2020) Federated forest. IEEE Trans Big Data 8(3):843–854
22. Bonawitz K, Ivanov V, Kreuter B, Marcedone A, McMahan HB, Patel S, Ramage D, Segal A, Seth K (2017) Practical secure aggregation for privacy-preserving machine learning. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pp 1175–1191
23. de Souza LAC, Rebello GAF, Camilo GF, Guimarães LC, Duarte OCM (2020) DFedForest: decentralized federated forest. In: 2020 IEEE international conference on blockchain (Blockchain). IEEE, pp 90–97
24. Hall LO, Chawla N, Bowyer KW (1998) Combining decision trees learned in parallel. In: Working notes of the KDD-97 workshop on distributed data mining, pp 10–15
25. Bursteinas B, Long J (2001) Merging distributed classifiers. In: 5th World multiconference on systemics, cybernetics and informatics
26. Andrzejak A, Langner F, Zabala S (2013) Interpretable models from distributed data via merging of decision trees. In: 2013 IEEE symposium on computational intelligence and data mining (CIDM). IEEE, pp 1–9
27. Strecht P, Mendes-Moreira J, Soares C (2014) Merging decision trees: a case study in predicting student performance. In: International conference on advanced data mining and applications. Springer, Cham, pp 535–548
28. Asuncion A, Newman D (2007) UCI machine learning repository
29. Bagdasaryan E, Veit A, Hua Y, Estrin D, Shmatikov V (2020) How to backdoor federated learning. In: International conference on artificial intelligence and statistics. PMLR, pp 2938–2948

Rail Surface Defect Detection and Severity Analysis Using CNNs on Camera and Axle Box Acceleration Data

Kanwal Jahan, Alexander Lähns, Benjamin Baasch, Judith Heusel, and Michael Roth

Abstract Rail surface defect detection is a relevant problem in the field of data-driven railway maintenance. Artificial intelligence and neural networks (NN) for axle box acceleration (ABA) or camera data show great potential for defect detection and classification. However, a sufficient amount of labeled training data is required, all the more if the defect severity is to be estimated. A unique dataset of time-synchronized ABA and camera data that contains labeled defect instances is employed. For the image analysis, RetinaNet as a single-stage object detector (with a ResNet-50 backbone and a feature pyramid network) is used to achieve high classification performance for the two most common rail surface defects (squat and corrugation). Additionally, a machine learning-based method on ABA data to estimate defect severity levels (low, medium, heavy) is proposed. During evaluation, both classifiers detect false positives with respect to the original labels. Inspection of these false positives in the image data reveals that defects had been overlooked in the initial labeling. The insights of this work help to reduce the dependency on labeled data by using only a few labeled samples and by exploiting complementary data sources instead of increasing the number of labeled instances.

Keywords Deep learning · Convolutional neural network · Time-synchronized dataset · Supervised learning · Severity analysis · Rail surface defects · Squats · Corrugation


1 Introduction

Knowing the condition of the track infrastructure is indispensable for cost-efficient maintenance planning and for maintaining the safety standards of the railway network. Infrastructure defects can appear for multiple reasons, e.g., violation of track geometry parameter specifications or the presence of short-wavelength rail surface defects such as squats, head checks, or corrugation. The permanent presence of short-wavelength defects is safety-relevant, can cause secondary defects, and reduces passenger comfort through vehicle vibrations and noise [14]. In extreme cases, the effects can be serious destruction of property and the injury or death of crews and passengers [19].

Earlier, the detection of rail defects was usually conducted by manual inspection by experts, which is expensive and time-consuming for advanced high-speed railway systems, and visual inspection can fail to capture or register all defects [19]. Ever since, efforts have been made to include diverse sensors and detection methods. Nowadays, it is common practice to use measurement trains on main rail lines to assess the track geometry parameters and to detect rail irregularities using optical, ultrasound, vibration, and eddy-current sensors. These measurement trains usually operate on the same infrastructure every few months. Employing onboard sensor systems on regular trains capable of delivering similar information as measurement trains would allow for frequent but cost-efficient track inspections, enabling degradation monitoring and early detection of defects.

Mostly, signal processing-based methods have been studied for vibration sensors such as axle box acceleration (ABA) sensors, to provide track geometry parameters [21] and to detect defects such as squats, corrugation [5], and insulated joints [10, 16]. On the other hand, conventional image processing has been deployed for the analysis of collected camera data, replacing physical visual inspection [15, 18]. Such methods require feature engineering and longer processing times per image, and their accuracy is domain-specific. Convolutional neural networks (CNNs), however, perform better on the mentioned criteria and can be retrained on custom datasets. Recently, the use of deep learning-based methods on camera images for rail defect detection has also been on the rise [3, 7, 9]. The application of artificial intelligence (AI) using multiple sensors is not yet widespread in the railway domain but holds great potential, e.g., in the analysis of huge amounts of sensor data for automatic defect detection from vehicle-borne image and vibration data. So far, approaches either use only image data-based detection or rely only on ABA-based classifiers, with labeled images used for validation only [6].

We propose methods for detecting defects on the rails and classifying them into severity levels from ABA data, and for categorizing the defects into their types (corrugation and squats) based on image data. All approaches use CNNs for their respective classification tasks. The results of the proposed network architectures are shown for a time-synchronized dataset comprising both ABA and image data. From a maintenance perspective, the results obtained from the two data sources can be combined to obtain more detailed and reliable results. The novelty of this work is the possibility to complement the knowledge obtained from the different sensor sources, i.e., ABA and camera.

The outline of this paper is as follows. Section 2 introduces the utilized dataset; Sect. 3 explains the network architectures for the ABA data-based classifier (Sect. 3.1) and the image classifier (Sect. 3.2); Sect. 4 shows the results of the proposed methods applied to real data; concluding remarks are given in Sect. 5.

2 Data Collection and Description

In the scope of this paper, data-driven condition monitoring is performed on real sensor data collected from a measurement train equipped with multiple sensors, which runs on an existing railway network in the Netherlands. The measurement train captures rail images using high-definition line scan cameras, together with raw data from other sensors such as three-component ABA and Global Navigation Satellite System (GNSS) receivers [1]. The highlight of this research is that the data collected from the camera and ABA is not only georeferenced but also time-synchronized, allowing us to combine the results. There are six cameras installed on each side of the vehicle, left and right, facing the rails directly; the image data used in this work is only from one left and one right camera.

A part of the collected video data has been labeled manually by experts to be able to validate and assess the performance of the algorithms and classifiers, mainly for the ABA data. As labeled camera images are available, albeit limited, they are used for training and evaluating an independent image-based classifier, with the main focus of this research being the reduction of manual labeling efforts. The labels belong to five different defect types: head check (Fig. 1a), stud (Fig. 1b), corrugation (Fig. 1c), squat (Fig. 1e–g), and others (Fig. 1d). The labels also differentiate these defects into severity levels, i.e., heavy, medium, and low. As an example, the heavy, medium, and low levels of the type squat are shown in Fig. 1e, f, and g, respectively. All these defects, along with their severity levels, are used to train the ABA classifier. On the other hand, the relevant rolling contact fatigue (RCF) defects, i.e., squat and corrugation, are further classified using image-based classification.

Tables 1 and 2 detail the distribution of the available labels, with their severity levels, for the ABA and image classifiers, respectively. The severity level heavy is underrepresented in the labeled data for the types squat and corrugation (see Table 2), which makes a further classification of these types into severity levels impossible. Consequently, the image classifier is trained to distinguish between the types without considering the severity levels.

Fig. 1 Labelled defects in camera data

Table 1 Distribution of all given defects with severity levels

Severity level | No. of labels
Heavy          | 50
Medium         | 131
Low            | 917
Total          | 1098

Table 2 Distribution of classes with severity levels available for image classifier

Severity level | Type        | No. of labels
Heavy          | Squat       | 18
Medium         | Squat       | 58
Low            | Squat       | 633
Heavy          | Corrugation | 0
Medium         | Corrugation | 5
Low            | Corrugation | 23
Total          |             | 737

2.1 ABA Preprocessing

ABA data strongly depend on vehicle speed [2]. To compensate for this, the ABA data is scaled by v0/v, the quotient between a reference speed v0 (here: 60 km/h) and the actual speed of the vehicle v. To perform the classification of the ABA data, the original time series is divided into short time series of 1,000 samples in length with 75% overlap. At an ABA sampling rate of 26,500 Hz, this corresponds to a length of approximately 0.38 seconds and 0.65 meters at a speed of 60 km/h. Labels are extracted from the image bounding boxes and assigned to each of these time series. Data that contain welds and joints are excluded; separating such objects from defects is an important yet difficult task that is not in the scope of this research. The resulting time series instances and labels are listed in Table 3. Figure 2 shows raw ABA data and the corresponding label extracted from the images at a severe squat; it can be seen that the ABA peaks directly after the squat.
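A minimal NumPy sketch (with assumed variable names) of this preprocessing, i.e., speed normalisation by v0/v followed by windowing into 1,000-sample segments with 75% overlap:

```python
import numpy as np

FS = 26500           # ABA sampling rate in Hz
V0 = 60.0            # reference speed in km/h
SEG_LEN = 1000       # samples per segment
STEP = SEG_LEN // 4  # 75% overlap -> hop of 250 samples

def preprocess_aba(aba, speed_kmh):
    """Scale ABA amplitude by v0/v, then cut overlapping segments.
    aba: (n_samples, 6) array (longitudinal/lateral/vertical, left and right);
    speed_kmh: per-sample vehicle speed (hypothetical input format)."""
    scaled = aba * (V0 / speed_kmh)[:, None]
    n_segments = (len(scaled) - SEG_LEN) // STEP + 1
    return np.stack([scaled[i * STEP : i * STEP + SEG_LEN]
                     for i in range(n_segments)])
```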

Table 3 No. of labeled instances used by ABA classifier per class after preprocessing

Severity level | No. of labels
Intact         | 9,794
Heavy          | 228
Medium         | 765
Low            | 12,023
Total          | 22,810

Fig. 2 ABA data (vertical component, solid line) at a heavy squat. The dashed line represents the severity label extracted from the images (axes: distance in m; ABA in m/s²; severity label)

Table 4 No. of labeled instances used by image classifier per class after preprocessing

Classes         | Squat | Corrugation
Training set    | 644   | 952
Validation set  | 81    | 119
Test set        | 119   | 118
Total instances | 805   | 1189

2.2 Image Preprocessing Corrugation, in contrast to squats, is an elongated rail surface defect. Hence, the shape of generated true labels for corrugation is elongated, too. The length of the bounding box varies from 1,300 pixels to 13,000 pixels. To generate enough labels and introduce homogeneity in the labeled data, the width-to-height ratio of all the labels is brought to 1:2 for the image classifier. The generated instances are divided into training, validation, and test dataset with a ratio of 80%, 10%, and 10%, respectively, as shown in Table 4.

3 CNN for Image and ABA Analysis 3.1 ABA Classifier A fully convolutional network (FCN) architecture is used for classification of the ABA time series (Fig. 3). The architecture follows the idea proposed in [20]. 1-D convolutional layers followed by batch normalization (BN) and ReLU layers build the main block of the FCN. Convolutions are carried out in three layers with the number

428

K. Jahan et al. 0 1 Input Convolution BN + ReLU

Convolution BN + ReLU

GAP

Softmax

1 2 Input Convolution BN + ReLU

Convolution BN + ReLU

3 GAP

Softmax

Fig. 3 Architecture of FCN

of filters {16, 32, 16} and filter sizes {8, 5, 3}, respectively. The convolutional layers work as feature extractor. Conventional pooling layers between the convolutional layers are omitted. Instead, a global average pooling (GAP) layer is used to reduce the feature time series produced by the convolutional layers to scalar features (1x1 feature maps). This largely reduces the number of weights compared to flattening the feature time series to a feature vector and subsequently using a fully connected layer for classification. Furthermore, the use of GAP fully preserves the shift-invariance of the convolutional operation. Batch normalization is applied to im- prove the convergence speed and generalization. A softmax logistic regression layer at the end produces the final class labels. Defect detection and severity classification is difficult to perform within one network. Therefore, a step-wise classification is carried out. First, a binary classification is used to classify the data into defective and non-defective. Second, a multinomial classification is applied to the defect class that further classifies the defects by their severity level. For both classification tasks the same FCN architecture is used with the only difference in the softmax layer (Fig. 3). The input to the network is the pre-processed ABA data that consists of six channels in total (longitudinal, lateral and vertical component on the left and ride side of the axle).
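The following is a minimal Keras sketch of such an FCN under our reading of the description above; padding mode, input length, and the helper name are our assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_fcn(n_channels=6, n_classes=2, win=1000):
    """FCN for ABA time-series classification, following the idea of [20]:
    three Conv1D blocks (16/32/16 filters, kernels 8/5/3) with BN + ReLU,
    global average pooling, and a softmax output layer."""
    inp = layers.Input(shape=(win, n_channels))
    x = inp
    for filters, kernel in [(16, 8), (32, 5), (16, 3)]:
        x = layers.Conv1D(filters, kernel, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    x = layers.GlobalAveragePooling1D()(x)        # 1x1 feature maps
    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inp, out)

binary_fcn = build_fcn(n_classes=2)    # defective vs. non-defective
severity_fcn = build_fcn(n_classes=3)  # low / medium / heavy
```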

3.2 Image Classifier

With the recent advances in deep learning approaches for image data analysis, selecting an appropriate model is an extensive task. There are a few application-specific criteria to consider when selecting a suitable network architecture. For example, for a detection system to work adequately, not only confidence in the accuracy but also the speed of prediction is a vital criterion. One challenge specific to this problem is that defects appear on the rails infrequently, which introduces an imbalance between the background (normal rail) and foreground (defect) classes. If this skewed class distribution is not addressed during the training phase, the network may ignore the non-dominating class and still achieve a reduction in the training loss. Aspects like class imbalance, accuracy, and speed of inference have been considered for selecting the deep neural network architecture used here. RetinaNet [12], as shown in Fig. 4, is a one-stage object detector that utilizes the focal loss function to address class imbalance during the training phase. Focal loss [12] applies a modulating term to the cross-entropy loss to focus learning on the hard-negative examples as well. RetinaNet [12] also performs well on the inference speed and accuracy criteria. It consists of four major components: a bottom-up pathway, a top-down pathway with lateral connections, and two task-specific subnetworks running in parallel. A residual network, ResNet-50 [4], is deployed as the bottom-up pathway, or backbone network, to extract the feature maps. A Feature Pyramid Network [11] is used as the top-down pathway to up-sample the spatially coarse feature maps. Finally, the lateral connections bring together the features from the top-down and bottom-up layers. One of the subnetworks is termed the box regression head and the other the classification head. The regression subnetwork learns to move the anchor boxes as close as possible to the ground-truth bounding boxes (labels) by predicting several regression boxes at each location while minimizing the loss function. The classification subnetwork estimates, for each anchor box and object class, the probability that an instance of that class is present. Deep neural networks are known to depend heavily on large amounts of labeled data, which are not available for this use case. This challenge is addressed by deploying transfer learning [17]. In transfer learning, the basic feature maps are learned on a relatively large dataset, and the task-specific but smaller dataset is then used to extract the unique features belonging to the required classification. Here, the COCO dataset [13] serves as the larger dataset: its pre-trained weights are used as initial weights for training the classifier on the defect-specific dataset.

Fig. 4 Architecture of RetinaNet [12]: (a) ResNet backbone, (b) feature pyramid net, (c) class subnet (top), (d) box subnet (bottom)
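To illustrate the focal-loss idea, the following is a minimal sketch, not RetinaNet's reference implementation; the defaults α = 0.25 and γ = 2 follow [12]:

```python
import tensorflow as tf

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0):
    """FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t): cross-entropy whose
    modulating term (1 - p_t)^gamma down-weights easy, well-classified
    examples so training focuses on hard ones, countering class imbalance.
    y_true: one-hot labels; y_pred: predicted class probabilities."""
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
    ce = -y_true * tf.math.log(y_pred)               # per-class cross-entropy
    weight = alpha * tf.pow(1.0 - y_pred, gamma)     # modulating term
    return tf.reduce_sum(weight * ce, axis=-1)
```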

4 Results and Discussion

4.1 ABA Classifier

The two FCNs for ABA classification are trained separately. The first FCN is trained on all instances listed in Table 3. By merging all defect labels (low, medium, heavy) into one class, this is reduced to a binary classification problem. The second FCN is trained on the defects only, to learn to differentiate between defects with low, medium, and heavy severity. During inference, the second FCN is only fed with the predicted defects from the first FCN. The testing results show a good performance of the two-step ABA classification, with a weighted average f1-score of 93%. From Table 5 it can be seen that the classes with the most incidences (intact and low severity) have the best classification results. With only 45 incidences, the defects with heavy severity could be predicted with an f1-score of 84%. However, seven heavy severity instances were classified as intact (Fig. 5). It is worth mentioning that there is a certain overlap between the training and testing data sets, due to the overlap of consecutive time series analysis windows in combination with the random training-testing data split. This may have led to overly optimistic results.

Table 5 Performance parameters for ABA classification

Performance parameters              Intact (%)    Low (%)    Medium (%)    Heavy (%)
Recall (TP/(TP + FN))               95            93         76            84
Precision (TP/(TP + FP))            92            95         82            84
FDR (FP/(FP + TP))                  8             5          18            16
F1-Score (2TP/(2TP + FP + FN))      93            94         79            84

Fig. 5 Confusion matrix of ABA classification (rows: true label; columns: predicted label)

True \ Predicted    intact    high    medium    low
intact              1861      2       4         92
high                7         38      0         0
medium              21        0       116       16
low                 141       5       22        2237

4.2 Image Classifier

The labeled image data, as described in Table 4, is used to train the model proposed in Sect. 3.2, with dropout layers and batch normalization introduced to avoid overfitting. A batch size of 4 is used, as supported by the NVIDIA GeForce GTX 1080 Ti GPU.


The hyperparameters related to the loss function and the learning rate of the Adam optimizer [8] are tuned with the help of the validation dataset. Figure 6 shows the detections produced on the test set by the network trained for squat detection. Red boxes represent the trained network's detections, with the network's confidence for each classification; green boxes show the true labels. In Fig. 6, the left two images depict good detection performance by the network with high confidence, while the rightmost image shows a squat not detected by the network. Interestingly, this network has produced additional detections (false positives) for the defect type squat with high confidence, as shown in Fig. 7, which were not labeled during the labeling process. As shown in Table 6 as well, false negatives and false positives are quite high for the squat detection network. False positives could indicate missed labels during the manual labeling phase. False negatives, however, should be reduced by enhancing the available labeled data and performing iterative training of the network. Figure 8 displays the predictions of the network trained for corrugation detection. The right-side image shows the detection of all divided annotations, as explained in Sect. 2.2, belonging to one corrugation instance. On the left side, only one sub-annotation is detected out of four. For image classification, this counts as a missed detection.

Fig. 6 Classification results for squats

Fig. 7 False positives detected for squats

Table 6 Confusion matrix for squat detection

True labels    Predicted labels
               Squat       Not squat
Squat          TP = 59     FP = 19
Not squat      FN = 24     TN = –

However, as one instance of corrugation is divided into multiple labels, one such detection out of many is enough to identify the presence of corrugation on the rail from the maintenance perspective. With the given amount of labeled training data, the recall, precision, and f1-score of the network for squat detection are quite high, see Table 7. Overall, the false discovery rate, as given in Tables 7 and 5, is an important indicator for squat and severity detection. These detections require further investigation to determine whether they are instances missed during the manual labeling process. As seen in Fig. 7, false positives in the case of squats do appear like actual defects. This illustrates the potential of using neural networks to produce pseudo-labels instead of relying only on manual labor.

Fig. 8 Classification results for corrugation

Table 7 Performance parameters for squat detection

Performance parameters              Values (in %)
Recall (TP/(TP + FN))               71.08
Precision (TP/(TP + FP))            75.64
FDR (FP/(FP + TP))                  24.35
F1-Score (2TP/(2TP + FP + FN))      73.29

Table 8 Confusion matrix for corrugation detection

True labels        Predicted labels
                   Corrugation    Not corrugation
Corrugation        TP = 79        FP = –
Not corrugation    FN = 39        TN = –

Table 9 Performance parameters for corrugation detection

Performance parameters              Values (in %)
Recall (TP/(TP + FN))               66.95
Precision (TP/(TP + FP))            100
FDR (FP/(FP + TP))                  0.0
F1-Score (2TP/(2TP + FP + FN))      80.20

Table 8 shows the confusion matrix of the network trained for detecting corrugation. The number of false negatives may appear high, but detecting one corrugation annotation out of the many belonging to a single corrugation instance is enough in the practical use case. As the number of instances available for training the corrugation network is slightly higher than for squat (see Table 4), the recall, precision, and f1-score of that network are higher as well (see Table 9). As mentioned in Sect. 3.2, special attention is given to the speed of detection for real-time applications, since the collected image data is high resolution (840 × 40,000 pixels per image). The inference time for the networks using a single NVIDIA GeForce GTX 1080 Graphical Processing Unit (GPU) is 17.5 ms per image, which corresponds to a detection speed of 57 frames per second (fps).

5 Conclusion

In this research, CNN-based deep learning classifiers are utilized on two completely different types of sensor data, namely images and ABA. The two data sets complement each other: the images are superior in discriminating different types of rail surface defects, while the dynamic vehicle-track interaction measured via ABA indicates the severity of those defects. High values of recall, precision, and f1-score are achieved by the corresponding networks using a limited amount of labeled data. The networks are therefore promising for generating quality labels that can be checked by an expert. Accepted labels can be used in a second iteration of training the models. Several such training iterations are expected to increase the model performance and prediction accuracy, and to reduce the manual efforts further. In this study, the networks for the different data sets were trained independently. As a next step, a single network fusing both data sets shall be trained and evaluated.


Acknowledgements The authors thank Strukton Rail for their support on this work. This project has received funding from the Shift2Rail Joint Undertaking (JU) under grant agreement No 881574. The JU receives support from the European Union’s Horizon 2020 research and innovation program and the Shift2Rail JU members other than the Union. This work reflects only the author’s view and the JU is not responsible for any use that may be made of the information it contains.

References

1. Ahmad W (2019) Artificial intelligence based condition monitoring of rail infrastructure. PhD thesis, University of Twente, The Netherlands
2. Baasch B, Roth MH, Groos JC (2018) In-service condition monitoring of rail tracks: on an on-board low-cost multi-sensor system for condition based maintenance of railway tracks. Internationales Verkehrswesen 70(1):76–79
3. Faghih-Roohi S, Hajizadeh S, Núñez A, Babuska R, De Schutter B (2016) Deep convolutional neural networks for detection of rail surface defects. In: 2016 international joint conference on neural networks (IJCNN), pp 2584–2589
4. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition, pp 770–778
5. Heusel J, Baasch B, Riedler W, Roth M, Shankar S, Groos JC (2022) Detecting corrugation defects in harbour railway networks using axle-box acceleration data. Insight—Non-destructive Testing and Condition Monitoring 64(7):404–410
6. Hoelzl C, Ancu L, Grossmann H, Ferrari D, Dertimanis V, Chatzi E (2022) Classification of rail irregularities from axle box accelerations using random forests and convolutional neural networks. Data Sci Eng 9:91–97
7. Jang J, Shin M, Lim S, Park J, Kim J, Paik J (2019) Intelligent image-based railway inspection system using deep learning-based object detection and weber contrast-based image comparison. Sensors 19(21):4738
8. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. CoRR, abs/1412.6980
9. Li H, Wang F, Liu J, Song H, Hou Z, Dai P (2022) Ensemble model for rail surface defects detection. PLoS ONE 17(5):e0268518
10. Li S, Núñez A, Li Z, Dollevoet R (2015) Automatic detection of corrugation: preliminary results in the Dutch network using axle box acceleration measurements. In: 2015 joint rail conference, p 7
11. Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 936–944
12. Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2018) Focal loss for dense object detection. Comput Vis Pattern Recognit (CVPR), 318–326
13. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: Computer Vision—ECCV 2014, volume 8693, pp 740–755. Springer International Publishing, Cham. Series Title: Lecture Notes in Computer Science
14. Loidolt M, Marschnig S (2022) Evaluating short-wave effects in railway track using the rail surface signal. Appl Sci 12(5):2529
15. Min Y, Xiao B, Dang J, Yue B, Cheng T (2018) Real time detection system for rail surface defects based on machine vision. EURASIP J Image Video Process 2018(1):3


16. Molodova M, Li Z, Nunez A, Dollevoet R (2014) Automatic detection of squats in railway infrastructure. IEEE Trans Intell Transp Syst 15(5):1980–1990
17. Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C (2018) A survey on deep transfer learning. In: The 27th international conference on artificial neural networks (ICANN 2018)
18. Tastimur C, Karakose M, Akın E, Aydın I (2016) Rail defect detection with real time image processing technique. In: 2016 IEEE 14th international conference on industrial informatics (INDIN), pp 411–415
19. Toliyat H, Abbaszadeh K, Rahimian M, Olson L (2003) Rail defect diagnosis using wavelet packet decomposition. IEEE Trans Ind Appl 39(5):1454–1461
20. Wang Z, Yan W, Oates T (2016) Time series classification from scratch with deep neural networks: a strong baseline. In: 2017 international joint conference on neural networks (IJCNN), pp 1578–1585
21. Weston PF, Ling CS, Roberts C, Goodman CJ, Li P, Goodall RM (2007) Monitoring vertical track irregularity from in-service railway vehicles. Proc Inst Mech Eng, Part F: J Rail Rapid Transit 221(1):75–88

A Testbed for Smart Maintenance Technologies San Giliyana, Joakim Karlsson, Marcus Bengtsson, Antti Salonen, Vincent Adoue, and Mikael Hedelind

Abstract Industry 4.0 comprises nine technologies, including the Industrial Internet of Things (IIoT), Big Data and Analytics, and Augmented Reality (AR). Some of these technologies play an important role in the development of smart maintenance technologies. Previous research presents several technologies for smart maintenance. However, one problem is that the manufacturing industry still finds it challenging to implement smart maintenance technologies in a value-adding way. Open questionnaires and interviews have been used to collect information about the current needs of the manufacturing industry. Both the empirical findings of this paper and previous research show that knowledge is the most common challenge when implementing new technologies. Therefore, in this paper, we develop and present a testbed for how to approach smart maintenance technologies and share technical knowledge with the manufacturing industry. Keywords Smart maintenance technologies · Knowledge · Testbed

1 Introduction

Liu and Xu [1] define Industry 4.0 as involving Information and Communication Technologies (ICT), smart factories, internet and embedded system technologies. Alcácer and Cruz-Machado [2] identify nine Industry 4.0 technologies: (1) Industrial Internet of Things (IIoT), (2) Big Data and Analytics, (3) Augmented Reality (AR), (4) Simulation, (5) Autonomous Robots, (6) Additive Manufacturing (AM), (7) Cyber Security, (8) Cloud Computing, and (9) Horizontal and Vertical System Integration. Matt et al. [3] note that in today's competitive environment, simply producing faster, cheaper, and higher quality goods than the competitors is no longer enough. Therefore, the manufacturing industry needs to implement new and innovative Industry 4.0 technologies to remain competitive in the long term. In addition, Industry 4.0 technologies are transforming production systems, strategies, processes, machinery types, and maintenance [4]. Previous studies [5] show that maintenance is crucial to keep production systems running, and maintenance is an essential activity that extends the equipment's lifetime [6], as highlighted by other studies, such as [7, 8]. The evolution of maintenance practices in industry has progressed from Corrective Maintenance in generation 1.0 to Predetermined Maintenance in generation 2.0, and Condition Based Maintenance (CBM) in generation 3.0 [9]. Industry 4.0 places new demands on maintenance, and some of the nine Industry 4.0 technologies [10], Cyber-Physical Systems (CPS) [11], and Artificial Intelligence (AI) [12, 13] play an important role in the development of smart maintenance technologies, where CBM is expected to play a dominant role [11]. Previous research has identified various smart maintenance technologies. Despite this, the manufacturing industry faces both managerial and technical challenges in implementing these technologies in a value-adding way [10, 14–18]. This paper presents empirical data from fifteen Small and Medium-sized Enterprises (SMEs) and eleven large companies within the manufacturing industry. The study aimed to determine whether smart maintenance technologies have been implemented and, if so, in what context. It also explored the perspectives of the manufacturing industry on the benefits and challenges of these technologies. Based on the empirical findings and previous research [19], the manufacturing industry should prioritize addressing the knowledge-related challenges, which are the most common when implementing new technologies. Therefore, in this paper, we present a smart maintenance technologies testbed, developed at Mälardalen Industrial Technology Center (MITC), a collaborative platform and laboratory in Sweden, to share technical knowledge about smart maintenance technologies with the manufacturing industry and demonstrate possible use cases. This paper generates new knowledge for academia as well as industry.

2 Methodology

This paper is based on a case study of fifteen SMEs and eleven large companies, performed in the Swedish manufacturing industry. The development of the testbed is based on the results of the case study.


2.1 Case Study

Case studies are often employed for exploration, theory building, and elaboration/refinement [20]. Data collection methods such as interviews, questionnaires, and observations can be used in case studies [21]. However, [21] identifies several weaknesses of questionnaires, including the lack of support for respondents when questions are unclear. To address this, [22] suggests using triangulation, which combines different data collection methods so that the weaknesses of one method can be offset by another.

2.2 Data Collection

The corresponding author is an industrial doctoral student affiliated with MITC. Empirical data was collected via an open questionnaire distributed to eleven large companies and fifteen SMEs in the manufacturing industry, all of which were accessible within MITC's environment, see Table 1. The questionnaire was designed using Google Forms to visualize the questions and gather empirical data. The questionnaire consists of (shortened): (1) What types of smart maintenance technologies are implemented, and in what context? (2) Any added value? (3) Any challenges? (4) Any advantages? To address the weaknesses of the questionnaire, interviews and e-mail conversations were conducted with respondents who required additional support.

2.3 Data Analysis

Qualitative data analysis is commonly used to analyse empirical data collected through interviews and open questionnaires [21]. Miles et al. [23] propose a three-step process for qualitative data analysis, beginning with data reduction, where the data is transcribed and coded to make it manageable; an initial analysis is also conducted in this step. The second step involves data display, where matrices and charts are used to visualize the data. In the final step, patterns and explanations are sought, clusters are created, comparisons are made, and changes over time are analysed to draw conclusions [23]. During the first step of the analysis, the handwritten notes from the interviews were typed into a computer. The data from the open questionnaires and interviews were then converted into a table format in Microsoft Excel and analysed, using codes such as: implemented smart maintenance technologies, context, values, advantages and challenges. The data was then presented in a matrix format, organized by the implemented smart maintenance technologies, context, values, advantages and challenges. A more comprehensive understanding was gained by identifying the specific types of smart maintenance technologies that were implemented, as well as the associated context, values, advantages and challenges.


Table 1 Case companies. OQ = Open Questionnaire, I = Interview, @ = E-mail

Case comp    Approx. employees (World-wide)    Type of industry    Methods
A            957 (27,500)       Automotive                                              OQ
B            1500 (30,000)      Transportation                                          OQ
C            1100 (11,000)      Robotics                                                OQ
D            63 (440)           Components manuf                                        OQ, I
E            53 (1200)          Automotive                                              OQ
F            775 (9000)         Nuclear                                                 OQ
G            125 (44,500)       Air and gas sensing                                     OQ, @
H            1426 (4100)        Metal cutting                                           OQ, @
I            825 (14,600)       Transportation                                          OQ
J            13,525 (50,000)    Transportation                                          OQ
K            23,000 (40,000)    Automotive                                              OQ
L            25                 Contract manuf                                          OQ, I
M            50                 Contract manuf                                          OQ
N            20                 Manufacturing of latches and quick release battery connectors    OQ, I
O            53                 Manufacturing of installation-ready components to the energy, automotive and process industries    OQ
P            35                 Contract manuf                                          OQ
Q            37                 Contract manuf                                          OQ, I
R            41                 Manufacturing of impact sockets and accessories         OQ
S            10                 Contract manuf                                          OQ, I
T            80                 Surface treatment                                       OQ
U            43                 Contract manuf                                          OQ
V            30                 Contract manuf                                          OQ
W            65                 Contract manuf                                          OQ
X            100                Manufacturing of car heating and battery charging products    OQ
Y            65                 Manufacturing of components for gas and steam turbines    OQ, I
Z            14                 Contract manuf                                          OQ, I

Finally, thematic analysis [24] was used to organize the challenges into four categories: (1) Knowledge, (2) Time and resources, (3) Cost, and (4) Age of machines; the added values and advantages were grouped together under Benefits.

Fig. 1 Testbed development process: literature review → collection of empirical data → challenges from the manufacturing industry → team building → pre-study → testbed

2.4 Testbed Development

Technical challenges, such as identifying what data to collect, what to measure and monitor, a lack of familiarity with these technologies in the context of maintenance, and making these technologies work so that everyone understands and works with them, were identified in the Knowledge category, see Table 2. Therefore, the focus of the testbed is to share technical knowledge by demonstrating how some of the Industry 4.0 technologies, AI, and CPS can be utilized within maintenance. Figure 1 shows the process for developing the testbed, based on maintenance research. First, we performed a literature review; its result shows that knowledge is the most common challenge when implementing new technologies. Second, we collected empirical data from the manufacturing industry, including SMEs and large companies, and identified several challenges. Third, the challenges were organized into four categories using thematic analysis, see Table 2. Fourth, a cross-functional team was formed to build the testbed, consisting of an industrial doctoral student (who is also the corresponding author of this paper and project leader for the testbed), two maintenance researchers (the academic supervisors of the industrial doctoral student), a software engineer, an industrial supervisor (who provided Industry 4.0 and AI technical knowledge), and an automation expert (who provided CPS technical knowledge). Fifth, we did a pre-study to decide what technologies to use within the testbed; the decision was based on previous research showing that some of the Industry 4.0 technologies, as well as AI and CPS, play an important role in the development of smart maintenance technologies, see Sect. 3. Sixth, we built the testbed based on the output from the previous steps.

3 Smart Maintenance Technologies

Smart maintenance technologies in this paper are related to the nine technologies of Industry 4.0, as well as AI and CPS. Among the technologies of Industry 4.0, some are essential for achieving smart maintenance technologies [10]. IIoT involves connecting physical objects to the internet [25]. Big Data and Analytics are defined by the 5Vs, (1) Volume, (2) Variety, (3) Velocity, (4) Value, and (5) Veracity [26, 27], which support advanced data analysis and real-time decision-making [28]. Cloud Computing can be used for platform and data sharing across an entire company [29], while Simulation is based on mathematical modelling and algorithms for process optimization [29] and can support the design of a production system [10]. AR is based on the real-time combination of 3D virtual objects with the real environment [30]. Autonomous Robots are cooperative and can interact with each other or safely with humans [29]. Horizontal and Vertical System Integration is about the integration of systems within the factory as well as across the supply chain [29]. By connecting machines to the internet using IIoT, maintenance data, such as vibration, pressure, temperature, acoustics, and viscosity [31], can be generated from machine control systems to support maintenance planning through Big Data and Analytics [10]. Cloud Computing can be used to share maintenance data with different departments within a company, such as the maintenance and production departments, as operators are responsible for Autonomous Maintenance and work closest to the machines [32, 33]. Furthermore, a cloud-based Computerized Maintenance Management System (CMMS) can be accessed through different devices, such as desktops, laptops, netbooks, and smartphones [34]. By assigning different system roles depending on employment, operators can view Autonomous Maintenance activities on their smartphones, while maintenance engineers can access modules for maintenance planning, root cause analysis, and spare parts inventory. Maintenance managers can track progress through a dashboard with various diagrams and graphs. Simulation can be used to predict the behavior of a production system and support maintenance planning [35]. AR is also a tool for maintenance development, making it possible to provide step-by-step guidance for diagnostics, inspection, and training [36, 37]. AR can help reduce maintenance time and errors [38]. It can also be used for remote maintenance by communicating with maintenance experts through real-time symbols, sketches, or text, resulting in reduced costs for travel and downtime [38]. AR can also be used for digitalized maintenance instructions for Autonomous Maintenance and Preventive Maintenance [39]. Autonomous Robots can collect maintenance data and perform simple maintenance tasks [10]. Horizontal and Vertical System Integration can be used to integrate the CMMS with other systems. Lee et al. [13] and Kanawaday and Sane [12] highlight that AI has emerged as a critical technology for maintenance development and sensor data analysis. Kour and Gondhi [40] define AI as a technique that mimics human behavior and comprises subfields such as Machine Learning, which involves developing systems that can learn and improve their performance without explicit programming. Supervised Learning is one of the subfields of Machine Learning and is used to forecast events based on labeled data [40]. Within Supervised Learning there are several algorithms, with Regression Learner being the most commonly used to predict an event, such as failure, based on continuous variables [41] (see the sketch below). According to [42], in CPS the cyber world consists of data analysis, apps and services, and decision-making, while the physical world consists of machines, the real environment, material, humans, and execution. According to [11], CPS is about the integration of the physical and cyber worlds in real-time.
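As a hypothetical illustration of such supervised regression on labeled condition data, the variable names and toy values below are our assumptions, not data from the case companies:

```python
# Hypothetical sketch: supervised regression on labeled condition data to
# predict remaining hours to failure from continuous sensor variables.
import numpy as np
from sklearn.linear_model import LinearRegression

# toy training data: [vibration (mm/s), temperature (C), pressure (bar)]
X = np.array([[0.2, 40, 1.1],
              [0.5, 55, 1.3],
              [0.9, 70, 1.8],
              [1.4, 85, 2.2]])
y = np.array([900, 600, 250, 80])        # labeled hours until failure

model = LinearRegression().fit(X, y)
print(model.predict(np.array([[0.7, 60, 1.5]])))  # estimated hours to failure
```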

4 Empirical Findings

Based on the empirical data collected from eleven large manufacturing companies, it was discovered that seven of them had implemented some smart maintenance technologies, three of them (Companies F, G, and H) had not implemented any, and Company E had experimented with such technologies in a smaller pilot project. The respondent from Company A mentioned that they had implemented IIoT for machine connection, while a respondent from Company B mentioned the use of sensors for maintenance data collection. Company C's respondent noted the use of Big Data and Analytics for production data collection, including maintenance data, while Company I had implemented Machine Learning for predictive maintenance. Finally, Company K's respondents mentioned the use of AM to make spare parts for older machines. Table 3 shows the added values and advantages mentioned by the respondents from the large manufacturing companies. Regarding the SMEs, no smart maintenance technologies had been implemented, as mentioned by the respondents at the case companies; therefore, no added values or advantages were mentioned. Table 2 shows the challenges mentioned by the respondents from the SMEs and large companies.

Table 2 Challenges according to SMEs and large companies

Knowledge:
– Know what kind of data to collect
– Know what to measure
– Know what to monitor
– Learning of the system, diffusion into operations
– Not familiar with these technologies in the context of maintenance
– Technical knowledge
– Competence and experience in senior positions
– To make these technologies work and make everyone understand and work after them
– Knowledge in general
– Cyber security

Time and resources:
– The company is small, and due to a small number of employees, there is no time to think about smart maintenance technologies. There must be a full-time person who only works with these technologies
– The time between implementing these technologies and benefits, since SMEs are dependent on fast results
– Industry 4.0, in general, is created for large companies with necessary implementation resources
– Technical resources
– Time it takes to implement these technologies
– Time in general
– The technology is not mature. The machine manufacturers do not have the opportunity to offer these types of technologies for maintenance
– Change management

Cost:
– Getting the costs into the budget
– Start-up costs
– Financial resources

Age of the machines:
– Older machines

Table 3 Benefits according to large companies

Added values:
– Reduction of unplanned stops
– Automated condition monitoring
– Tracking of process data
– Failure prediction
– Spare parts availability
– Competence and deeper process knowledge

Advantages:
– Do the correct maintenance actions in time
– Cost efficient
– Less production disturbances
– From reactive to predictive maintenance
– Increased knowledge on how equipment works and degradation progress
– Supporting maintenance planning
– Decision making based on data and fact
– Less unplanned stops in production

5 A Testbed for Smart Maintenance Technologies

The aim of the testbed is to share technical knowledge about smart maintenance technologies with the manufacturing industry by demonstrating how some of the Industry 4.0 technologies, AI, and CPS can be used within maintenance. Similar testbed applications may be found at, for instance, Luleå University of Technology through the Centre of Intelligent Asset Management (www.ltu.se/centres/ciam) and Chalmers University of Technology through the Stena Industry Innovation Lab (www.sii-lab.se). The development of maintenance-related testbeds and labs has been reported on by, for instance, [43–45]. Testbeds and labs within Industry 4.0 in general have also been reported on by, for instance, [46–48]. The testbed is designed as a cyber-physical production system, with both a physical production system in a lab environment and software packages for modelling, simulation, manufacturing execution and control, follow-up, etc. The foundation of the system is a complete production system from FESTO Didactic called the Cyber Physical (CP) Factory, which consists of two automated production lines, a mobile robot, and a complete software suite, see Fig. 2.

Fig. 2 Cyber physical production system

The testbed is built on top of the cyber-physical production system and incorporates a range of advanced technologies: an IIoT platform, IIoT sensors for collecting maintenance data, AR for remote maintenance and maintenance instructions, a next-generation CMMS, an Autonomous Robot, communication facilitated by the Open Platform Communications Unified Architecture (OPC UA) (www.opcfoundation.org), Modbus (www.modbus.org) and Websockets [49], data visualization, AI-powered data analysis, an Application Programming Interface (API) for seamless system integration, and an IIoT gateway. The IIoT Gateway is connected to the IIoT platform through Websockets. An out-station and a press station are connected to the IIoT platform through the IIoT Gateway and send their data to the IIoT platform via OPC UA. The Autonomous Robot is connected to the IIoT platform through the IIoT Gateway and communicates with the IIoT platform via Modbus and a REST API. The CMMS is connected to the IIoT platform through an API as an integration service. Finally, the IIoT platform communicates with the AR application by sending QR codes. See Fig. 3.

Fig. 3 Architecture for the testbed

The press station has a component, a press, that is a perfect example for the purpose of the testbed. As soon as the press station has produced 70 parts, the IIoT platform automatically generates a work order in the cloud-based CMMS. This work order appears on a specially designed dashboard in the IIoT platform, highlighting the need to lubricate the press station. The IIoT platform simultaneously dispatches the Autonomous Robot to bring lubricant to the press station. The dashboard also displays a QR code for digitalized maintenance instructions, see Fig. 4. When the QR code is scanned using AR glasses, the instructions display step by step with accompanying figures and text. In the out-station, a gripper moves on a linear guide between points A and B, and B and A, to pick up finished products, as shown in Fig. 6. Whenever the travel time between points A and B, or B and A, exceeds normal thresholds, the IIoT platform automatically generates a work order in the cloud-based CMMS. This work order appears on the out-station dashboard in the IIoT platform, as shown in Fig. 5. It indicates the need for cleaning and lubrication of the linear guide, or its replacement if it is broken. The IIoT platform sends the Autonomous Robot to bring the necessary cleaning and lubrication tools (Fig. 7).
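A hypothetical sketch of the press-station trigger described above: read the part counter over OPC UA and, at 70 parts, create a lubrication work order in the cloud CMMS via REST. The node id, URLs, and payload fields are invented for illustration; they are not the testbed's real identifiers.

```python
import requests
from opcua import Client  # python-opcua

GATEWAY_URL = "opc.tcp://iiot-gateway.local:4840"          # assumed address
PRESS_COUNTER = "ns=2;s=PressStation.PartCount"            # assumed node id
CMMS_API = "https://cmms.example.com/api/work-orders"      # assumed endpoint

client = Client(GATEWAY_URL)
client.connect()
try:
    parts = client.get_node(PRESS_COUNTER).get_value()
    if parts >= 70:
        requests.post(CMMS_API, timeout=5, json={
            "asset": "press-station",
            "task": "Lubricate press",
            "trigger": f"{parts} parts produced",
        })
        # here the platform would also dispatch the Autonomous Robot with
        # lubricant (Modbus/REST) and show the QR-code dashboard
finally:
    client.disconnect()
```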

Fig. 4 The dashboard for the press station that shows the number of pressed parts and the QR-code for the AR maintenance instructions

Fig. 5 The dashboard for the out-station that shows the QR-code for the AR maintenance instructions, collected travel time data (red graph) and a blue button that is used to send the Autonomous Robot to the home position when the maintenance job is done

Fig. 6 The press station to the right and the out-station to the left. The red arrow shows that the gripper moves from A to B, and from B to A, to pick up finished products

Fig. 7 A person performing maintenance tasks following AR maintenance instructions. The lubricants and cleaning accessories are on the Autonomous Robot

6 Discussions and Conclusions

As mentioned above, some of the Industry 4.0 technologies, AI, and CPS are essential to the development of smart maintenance technologies. The aim of the testbed is to share technical knowledge about how some of the Industry 4.0 technologies, as well as AI and CPS, can be used in the development of smart maintenance technologies, since challenges related to Knowledge are the most common when implementing new technologies [19]. In addition, the empirical findings show that Knowledge is one of the most challenging categories when implementing smart maintenance technologies. Among these technologies, IIoT is used in the testbed for machine connection, while Cloud Computing allows for a cloud-based CMMS that is easily accessible through various devices, including desktops, laptops, netbooks, and smartphones. Additionally, Big Data and Analytics are implemented for maintenance data collection and analysis through the utilization of AI. The data from the press station and out-station is stored in the IIoT platform and can be easily downloaded and analysed using Machine Learning techniques, such as Regression Learner, to predict potential failures. In addition, the data can be visualized using Microsoft Power BI. Furthermore, AR is used for digitalized maintenance instructions and remote maintenance, and the Autonomous Robot for bringing lubrication and cleaning accessories. The testbed employs various systems that are integrated with each other, which is linked to the technology of Horizontal and Vertical System Integration. The testbed consists of a cyber and a physical world that are integrated with each other, i.e. a CPS. The cyber world consists of advanced software, such as the CMMS, the IIoT platform, AR software, and AI software. The physical world consists of the production system, sensors, AR glasses, the Autonomous Robot, etc. The testbed can be used as an education and training platform for how to use some of the Industry 4.0 technologies, AI, and CPS for the development of smart maintenance technologies.


7 Further Research

The testbed was demonstrated at MITC to SMEs and large companies within the manufacturing industry on February 6th and 7th, 2023. 43 individuals from 25 to 30 companies attended the demonstration. Prior to and after the demonstration, the participants completed a questionnaire to evaluate their technical knowledge development. The results of the demonstration will be presented in a separate research paper.

Acknowledgements We would like to express our gratitude to Maintmaster AB for sponsoring us with a CMMS and IIoT sensors. This research is a part of the Industrial Technology Graduate School at Mälardalen University.

References

1. Liu Y, Xu X (2017) Industry 4.0 and cloud manufacturing: a comparative analysis. J Manuf Sci Eng 139(3):1–8
2. Alcácer V, Cruz-Machado V (2019) Scanning the Industry 4.0: a literature review on technologies for manufacturing systems. Eng Sci Technol Int J 22(3):899–991
3. Matt D, Modrák V, Zsifkovits H (2020) Industry 4.0 for SMEs: challenges, opportunities and requirements. Palgrave Macmillan Cham, Switzerland
4. Frost T, Nöcker J, Demetz J, Schmidt M (2019) The evolution of Maintenance 4.0—what should the companies be focusing on now? In: IncoME-IV
5. Zarreh A, Wan H, Lee Y, Saygin C, Janahi RA (2019) Cybersecurity concerns for total productive maintenance in smart manufacturing systems. Procedia Manuf 38:532–539
6. Abidi MH, Mohammed MK, Alkhalefah H (2022) Predictive maintenance planning for Industry 4.0 using machine learning for sustainable manufacturing. Sustainability 14(6):3387
7. Salonen A (2009) Formulation of maintenance strategies. Mälardalen University Press, Eskilstuna
8. Bengtsson M (2007) On condition based maintenance and its implementation in industrial settings. Mälardalens högskola, Västerås
9. Singh S, Galar D, Baglee D, Björling S-E (2013) Self-maintenance techniques: a smart approach towards self-maintenance system. Int J Syst Assur Eng Manag 5(1):75–83
10. Silvestri L, Forcina A, Introna V, Santolamazza A, Cesarotti V (2020) Maintenance transformation through Industry 4.0 technologies: a systematic literature review. Comput Ind 123:1–16
11. Al-Najjar B, Algabroun H, Jonsson M (2018) Smart maintenance model using cyber physical system. In: ICIEIND
12. Kanawaday A, Sane A (2017) Machine learning for predictive maintenance of industrial machines using IoT sensor data. In: ICSESS
13. Lee WJ, Wu H, Yun H, Kim H, Jun M, Sutherland J (2019) Predictive maintenance of machine tool systems using artificial intelligence techniques applied to machine condition data. Procedia CIRP 80:506–511
14. Campos J, Kans M, Salonen A (2021) A project management methodology to achieve successful digitalization in maintenance organizations. Int J COMADEM 24(1):3–9
15. Lundgren C, Bokrantz J, Skoogh A (2022) Hindering factors in smart maintenance implementation. In: SPS2022


16. Giliyana S, Salonen A, Bengtsson M (2022) Perspectives on smart maintenance technologies—a case study in large manufacturing companies. In: SPS2022
17. Giliyana S, Bengtsson M, Salonen A (2023) Perspectives on smart maintenance technologies—a case study in small and medium-sized enterprises (SMEs) within manufacturing industry. In: WCEAM2022
18. James A, Kumar G, Khan A, Asjad M (2022) Maintenance 4.0: implementation challenges and its analysis. Int J Qual Reliab Manag
19. Masood T, Sonntag P (2020) Industry 4.0: adoption challenges and benefits for SMEs. Comput Ind 121:103261
20. Karlsson C, Åhlström P, Forza C, Voss C, Godsell J, Johnson M, et al (2016) Research methods for operations management. Taylor and Francis, London
21. Säfsten K, Gustavsson M (2020) Research methodology—for engineers and other problem-solvers. Studentlitteratur AB, Lund
22. Yin RK (2018) Case study research and applications: design and methods. SAGE Publications, Los Angeles
23. Miles MB, Huberman AM, Saldana J (2019) Qualitative data analysis: an expanded sourcebook. SAGE Publications, Thousand Oaks
24. Braun V, Clarke V (2006) Using thematic analysis in psychology. Qual Res Psychol 3:77–101
25. Xu L, He W, Li S (2014) Internet of things in industries: a survey. IEEE Trans Industr Inf 10:2233–2243
26. Witkowski K (2017) Internet of things, big data, Industry 4.0—innovative solutions in logistics and supply chains management. Procedia Eng 182:763–769
27. Yin S, Kaynak O (2015) Big data for modern industry: challenges and trends [Point of View]. In: IEEE
28. Subramaniyan M, Skoogh A, Salomonsson H, Bangalore P, Bokrantz J (2018) A data-driven algorithm to predict throughput bottlenecks in a production system based on active periods of the machines. Comput Ind Eng 125:533–544
29. Erboz G (2017) How to define Industry 4.0: main pillars of Industry 4.0. In: Managerial trends in the development of enterprises in globalization era
30. Figueiredo MJG, Cardoso PJS, Gonçalves CDF, Rodrigues JMF (2014) Augmented reality and holograms for the visualization of mechanical engineering parts. In: 18th international conference on information visualisation
31. Amruthnath N, Gupta T (2018) A research study on unsupervised machine learning algorithms for early fault detection in predictive maintenance. In: ICIEA
32. Nakajima S (1988) Introduction to TPM: total productive maintenance. Productivity Press, Cambridge
33. Jantunen E, Campos J, Sharma P, Baglee D (2017) Digitalisation of maintenance. In: ICSRS
34. Chang YS, Choi HC, Sung SY, Mun SJ (2016) A study of cloud based maintenance system architecture for warehouse automation equipment. In: IIAI-AAI
35. Goodall P, Sharpe R, West A (2019) A data-driven simulation to support remanufacturing operations. Comput Ind 105:48–60
36. Roy R, Stark R, Tracht K, Takata S, Mori M (2016) Continuous maintenance and the future—foundations and technological challenges. CIRP Ann 65(2):667–688
37. Chong S, Pan G-T, Chin J, Show PL, Yang TCK, Huang C-M (2018) Integration of 3D printing and Industry 4.0 into engineering teaching. Sustainability 10(11):1–13
38. Masoni R, Ferrise F, Bordegoni M, Gattullo M, Uva AE, Fiorentino M et al (2017) Supporting remote maintenance in Industry 4.0 through augmented reality. Procedia Manuf 11:1296–1302
39. Palmarini R, Erkoyuncu JA, Roy R, Torabmostaedi H (2018) A systematic review of augmented reality applications in maintenance. Robot Comput-Integr Manuf 49:215–228
40. Kour H, Gondhi NK (2020) Machine learning techniques: a survey. In: ICIDCA
41. Ray S (2019) A quick review of machine learning algorithms. In: COMITCon
42. Tao F, Qi Q, Wang L, Nee A (2019) Digital Twins and cyber-physical systems toward smart manufacturing and Industry 4.0: correlation and comparison. Engineering 5:653–661


43. Diaz-Cacho M, Cid RL, Acevedo JM, Domínguez AP (2022) Educational test-bed for Maintenance 4.0. In: EDUCON
44. Kans M, Campos J, Håkansson L (2020) A remote laboratory for Maintenance 4.0 training and education. IFAC-PapersOnLine 53(3):101–106
45. Antonino-Daviu J, Dunai L, Climente-Alarcon V (2017) Design of innovative laboratory sessions for electric motors predictive maintenance teaching. In: IECON
46. Yoshino RUI, Pinto M, Pontes J, Treinta F, Justo J, Santos M (2020) Educational Test Bed 4.0: a teaching tool for Industry 4.0. Eur J Eng Educ 45:1–22
47. de Paula Ferreira W, Palaniappan A, Armellini F, de Santa-Eulalia LA, Mosconi E, Marion G (2021) Linking Industry 4.0, learning factory and simulation: testbeds and proof-of-concept experiments. In: Artificial intelligence in Industry 4.0
48. Damgrave RGJ, Lutters E (2019) Smart industry testbed. Procedia CIRP 84:387–392
49. Gupta B, Vani MP (2018) An overview of web sockets: the future of real-time communication. IRJET 5(12)

Game Theory and Cyber Kill Chain: A Strategic Approach to Cybersecurity Ravdeep Kour, Ramin Karim, and Pierre Dersin

Abstract Digitalisation within industries is associated with many positive opportunities but simultaneously poses considerable cybersecurity-related threats. Cybersecurity is a critical concern for many industries, such as railway, aviation, mining, healthcare, and finance, where critical information, as well as operational security, is at risk of being compromised. Today, researchers are looking into various solutions to tackle cybersecurity risks while retaining the desired function of the system. It is believed that these challenges can be approached by integrating game theory with the Cyber Kill Chain (CKC), which describes the different stages of a cyberattack, to better understand the complex situation of cybersecurity. Thus, the objective of this paper is to propose a game-based approach that uses game theory to address cybersecurity risks within industries by modelling the strategic interaction between attacker and defender in the Cyber Kill Chain (CKC). This approach aims to enhance understanding of the complex challenges, to facilitate the development of effective cybersecurity solutions, and to help in evaluating the effectiveness of different security strategies. The proposed strategic approach uses a non-cooperative game based on mixed strategies. The authors have defined a scenario for simultaneous-move games by estimating values for various elements of the game. By analysing the behaviour of both attacker and defender, the proposed game-based approach can help industries develop more effective and efficient security strategies. Further, the proposed approach provides a better understanding of the complex challenges of cybersecurity in industrial contexts and can be used to develop appropriate strategies to mitigate cybersecurity risks.

Keywords Game theory · Cyber kill chain · Strategic approach · Cybersecurity


1 Introduction

Digitalisation in industries brings both positive opportunities and significant cybersecurity threats, prompting researchers to explore various solutions to cope with them. Game theory is one of the areas worth exploring as a means to address cybersecurity challenges and safeguard critical information and operations. Game theory is based on the concept of a game, defined as a situation where two or more individuals or groups are competing or cooperating with each other. It is also defined as the study of multiperson decision problems [1]. A game has three elements [2]:

• a finite set of players,
• actions, which are the choices that each player can make in the game, and
• utilities or payoffs, which are the outcomes and rewards or losses for each player based on the actions they choose.

In 1947, John von Neumann and Oskar Morgenstern [3] published a book in which they defined an expected utility function over lotteries or gambles. In their definition, a lottery or gamble is simply a probability distribution over a known, finite set of outcomes [3]; thus, we know the probability of the occurrence of each outcome. Using these probabilities rests on two assumptions: every probability must be non-negative, i.e. a number between 0 and 1, and the probabilities of all the actions of a player must sum to 1. Cybersecurity shares many common properties with game theory. Like a game, a cybersecurity game has players, e.g., a "defender" and an "attacker", who compete to achieve their own individual goals and maximise their own payoffs. This kind of game is a non-cooperative game, where there is no communication between the competing players [1, 4]. As in game theory, in cybersecurity a player's (e.g., the defender's) payoff depends both on his/her own decisions and strategies and on the other player's (e.g., the attacker's) strategies and behaviour. Based on these similarities, the mathematical tools of game theory can be applied in cybersecurity to quantify it. This helps to understand more clearly the extent of the impact of a particular cyberattack when choosing a specific security measure. Thus, the objective is to leverage the similarities between game theory and cybersecurity to apply game-theoretic models and strategies, enabling a quantitative understanding of cybersecurity dynamics. This will facilitate the development of effective defensive measures against attackers within the various stages of the Cyber Kill Chain (CKC) model [5]. Consider a cybersecurity game with two players, named "Attacker" and "Defender", with actions {attack, not-attack} and {defend, not-defend}, respectively. In this game, each of the outcomes has a probability attached to it, so we can define a simple attack with a set of outcomes A = {a1, a2, …, an}, each of which occurs with some known probability pi. The utility function u is said to have the expected utility property if, for an attacker A with outcomes {a1, a2, …, an} occurring with probabilities p1, p2, …, pn respectively, we have:

u(a) = p1·u(a1) + p2·u(a2) + … + pn·u(an)    (1)

where u(ai) is the attacker's utility from outcome ai. This utility function is used as the mixed-strategy utility function in cybersecurity games. In mixed-strategy games, players randomize over the set of available actions with some probability distribution instead of choosing a fixed action (a pure strategy) [1]. For example, an attacker who chooses among different cyberattack strategies with equal probability in each round makes it difficult for the defender to predict his/her actions. This makes the game more complex, as players must consider not only their own actions but also the probabilities of their opponents' actions. In the proposed cybersecurity game, the attacker can choose any attack strategy within each step of the CKC model, like "Reconnaissance and Weaponise", "Delivery and Exploitation", and "Installation", with equal probability in each round, making it difficult for the defender to predict the attacker's actions. This paper considers only the first five stages of the CKC model, but the approach can be extended to all seven stages. The objective is to propose a game-based approach to understand the interactions between attacker and defender in the Cyber Kill Chain (CKC) that can facilitate the development of effective cybersecurity solutions.
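To make Eq. (1) concrete, here is a minimal sketch with illustrative outcome probabilities and utilities; both sets of values are our assumptions:

```python
# Expected utility of an attack per Eq. (1): the probability-weighted
# sum of outcome utilities. Values are illustrative, not estimated data.
probabilities = [0.5, 0.3, 0.2]     # p1..pn: non-negative, sum to 1
utilities = [10.0, 4.0, -6.0]       # u(a1)..u(an)

assert abs(sum(probabilities) - 1.0) < 1e-9
expected_utility = sum(p * u for p, u in zip(probabilities, utilities))
print(expected_utility)             # 0.5*10 + 0.3*4 + 0.2*(-6) = 5.0
```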

2 Research Methodology

Popular databases, for example Google Scholar, Scopus, and Web of Science, were searched to gauge the extent of cybersecurity research using game theory. We searched for the string TI = (cybersecurity AND "Game") in the title of the paper in the Web of Science database and retrieved only 56 papers, of which far fewer are applied in industrial contexts. Based on the retrieved literature, we used the Power BI tool [6] to visualise the data. Figure 1 shows statistics related to the application of game theory in cybersecurity research. Of the 56 papers, 57% are conference papers and 43% are journal papers. In the past, several works have focused on cybersecurity education, training, and learning applying game theory [7–11]. In recent research, authors are using game theory to share cyber threat information in the cloud [12], to address the problem of cybersecurity investment strategies in the smart grid [13], and to provide network security [14, 15]. Very few researchers have applied game theory concepts in industrial contexts, and none focuses on using the Cyber Kill Chain model. A notable recent work proposes a cybersecurity awareness campaign for the transport industry as a platform for cybersecurity awareness initiatives [16].


Fig. 1 Number of publications per year, along with a word cloud of the keywords used in those publications

3 Game Theory for Cyber Kill Chain

The Cyber Kill Chain (CKC) is a framework used by cybersecurity professionals to describe the different stages of a cyberattack, from the initial reconnaissance to the final exfiltration of data [5]. The framework is used to identify vulnerabilities in an organization's defences and to design effective countermeasures. Game theory can be applied to the CKC framework to analyse the interaction between attackers and defenders at each stage and to design better defensive strategies.

The first stage of the CKC is reconnaissance, where the attacker gathers information about the target organization. At this stage, game theory can be used to model the attacker's decision-making process and identify the most likely sources of information. For example, an attacker may use social engineering techniques to trick employees into revealing sensitive information, or scan the organization's website and social media accounts to gather information. The probability of gaining access at this stage of the CKC can be modelled based on the organisation's network monitoring scans, employee surveys, previous cyberattacks, etc.

The second and third stages of the CKC are weaponization and delivery, where the attacker creates a malicious payload and delivers it to the target system. At this stage, game theory can be applied to model the attacker's decision-making process and identify the most effective delivery methods. For example, the attacker may use a phishing email to deliver the payload or exploit a vulnerability in the target system.

The fourth stage of the CKC is exploitation, where the attacker gains access to the target system. At this stage, game theory can be used to model the attacker's decision-making process and identify the most effective methods of exploitation. For example, the attacker may use a zero-day exploit to bypass the system's defences or brute-force the login credentials.

The fifth stage of the CKC is installation, where the attacker installs a backdoor or other malware on the target system. At this stage, game theory can be used to model the attacker's decision-making process and identify the most effective methods of installation. For example, the attacker may use a Trojan horse program to hide the malware or a remote access tool to install it.

The sixth stage of the CKC is command and control, where the attacker establishes a communication channel with the target system. At this stage, game theory can be used to model the attacker's behaviour and identify the most effective methods of communication. For example, the attacker may use DNS tunnelling to bypass security controls and establish a hidden communication channel.

The final stage of the CKC is exfiltration, where the attacker accomplishes his/her objective by compromising data from the target system. At this stage, game theory can be used to model the attacker's behaviour and identify the most effective methods of exfiltration. For example, the attacker may use a file transfer protocol to transfer the data to a remote server.

In summary, game theory can be used to analyse the different stages of the CKC and identify the most effective strategies for both attacker and defender. By modelling the decision-making process of attackers, defenders can better understand their motivations and anticipate their actions. This can help defenders design more effective countermeasures and reduce the impact of cyberattacks.

There are various types of games. In cooperative games, players can communicate and make binding agreements with each other before the game is played; the players work together to achieve a common goal and share the rewards or losses of the game. In non-cooperative games, there is no communication between the competing players; the players play against each other to achieve their own individual goals and maximize their own payoffs. Researchers in [17] have argued that there is no pair of deterministic strategies that works for both attacker and defender; therefore, a Mixed Strategy Nash Equilibrium (MSNE) [18] can be used for the cybersecurity game model. According to Oxford, a key property of a Nash equilibrium is that "no player has any incentive to deviate unilaterally from it, so that it gives the players no cause to regret their strategy choices when the other players' choices are revealed" [19]. According to [18], the MSNE of the security game is defined as "the probability distribution over the set of pure strategies for any player such that all actions must be between 0 and 1 and the probabilities of all the actions for any player must sum up to 1". At the MSNE [18], the opponents become indifferent about the choice of their strategies, and the expected utilities from playing the attacker and defender strategies become equal for both players, i.e., the attacker's expected utility for playing strategies a0, a1, and a2 becomes EU(pa0) = EU(pa1) = EU(pa2), and the defender's expected utility for playing strategies b0, b1, and b2 becomes EU(pb0) = EU(pb1) = EU(pb2). A limitation of the Nash equilibrium is that a player needs to know his/her opponent's strategy. In cybersecurity games using mixed strategies, it is always difficult for the defender to know the attacker's strategy unless the attacker leaves signatures that allow the defender to react in time with a defensive measure.
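The claim attributed to [17] — that no pair of deterministic strategies works for both players — can be checked mechanically by scanning a payoff bimatrix for mutual best responses. Below is a minimal Python sketch, run against the stage payoffs later given in Table 3; for that matrix no cell is a mutual best response, so no pure-strategy equilibrium exists and mixed strategies are indeed required.

```python
import numpy as np

def pure_nash_equilibria(A, B):
    """Cells (i, j) where row i is a best response to column j and vice versa."""
    equilibria = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            if A[i, j] == A[:, j].max() and B[i, j] == B[i, :].max():
                equilibria.append((i, j))
    return equilibria

# Attacker (A) and defender (B) payoffs from Table 3 (Sect. 5)
A = np.array([[-5, 5, 5], [10, -10, 10], [15, 15, -15]])
B = np.array([[54, 40, 38], [34, 50, 28], [24, 20, 48]])
print(pure_nash_equilibria(A, B))  # -> [] : no pure-strategy Nash equilibrium
```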
In addition, simultaneous-move games, also called static games or normal-form games, are represented with the help of matrices (see Table 1). In these games, each player chooses an action without knowledge of the actions chosen by the other players. For example, during a DDoS attack, the attacker might launch a flood of traffic to overwhelm the defender's system while the defender is simultaneously trying to block the incoming traffic. On the other hand, sequential-move games, also called dynamic games or extensive-form games, are represented with the help of a tree. In these games, players observe what opponents have done in the past and there is a specific order of play. For example, in a phishing attack, the attacker might send a chain of fake emails to the defender, hoping to trick him/her into revealing sensitive information. The defender might then respond by analysing the email and blocking the sender. The defender's course of action/strategy "respond" best suits this kind of game. In our strategic model, we assume a non-cooperative game with simultaneous moves using mixed strategies [4].

Table 1 Cybersecurity game elements mapped to game theory

Cybersecurity game element | Game theory element
Attacker | Player 1
Defender | Player 2
Mixed strategies | Mixed strategies
Choosing actions within the Reconnaissance and weaponise, Delivery and exploitation, or Installation stages of the CKC | Player 1 action
Monitoring system, detecting system, or responding to attack | Player 2 action
Cybersecurity payoff matrix | Payoff matrix
Expected cybersecurity payoff | Expected payoff

4 Strategic Game Model

In this section, a strategic game model is introduced, considering a scenario where the defender and attacker choose mixed strategies. Table 1 shows the elements of the cybersecurity game. In this scenario, the defender chooses defence strategies/courses of action such as "Monitoring system", "Detecting system", or "Responding to attack", and the attacker can be in any stage such as "Reconnaissance and weaponise", "Delivery and exploitation", or "Installation". Assume attack probabilities (p1, p2, p3) and defence probabilities (q1, q2, q3). Considering the cybersecurity payoff matrices in Fig. 2a–c, we can calculate the expected reward/utility in this mixed-strategy game. The payoff matrix is a tool used in game theory to represent the potential outcomes of different strategies for each player [1]. In the context of the CKC, we can construct a payoff matrix that includes the possible actions of the attacker and defender at each stage of the cyberattack (Fig. 2a–c). By assigning payoffs to each combination of actions, we can determine the optimal strategies for each player. The optimal defence strategies depend on the attacker's actions and the defender's goals. In some cases, the defender may want to prevent the attacker from reaching a particular stage of the CKC, while in other cases the defender may want to delay or disrupt the attacker's progress. By analysing the payoff matrix and identifying the Nash equilibrium, the defender can determine the best countermeasures to deploy at each stage of the CKC. The potential Nash equilibrium entails the defender actively monitoring and responding to attacks while the attacker employs stealthy tactics during reconnaissance, resulting in a stable state where neither party has an incentive to deviate. Ongoing adaptation is essential to maintain this equilibrium in the ever-evolving cybersecurity landscape. Exploring how to attain this equilibrium will be considered in future work.

Fig. 2 Attacker and defender actions in a "Reconnaissance and weaponise" stage, b "Delivery and exploitation" stage, c "Installation" stage of the CKC model
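One way to identify the mixed equilibrium the text alludes to is to solve the indifference conditions directly: at an MSNE, each player's mix makes the opponent indifferent across all of his/her actions. The numpy sketch below assumes a fully mixed (interior) equilibrium of a 3 × 3 bimatrix game and illustrates it with the stage payoffs later given in Table 3; the computed probabilities must land in [0, 1] for the solution to be a valid equilibrium.

```python
import numpy as np

A = np.array([[-5., 5., 5.], [10., -10., 10.], [15., 15., -15.]])  # attacker
B = np.array([[54., 40., 38.], [34., 50., 28.], [24., 20., 48.]])  # defender

def indifference_mix(M):
    """Mix q of the column player making the row player indifferent:
    solve M @ q = v * 1 together with sum(q) = 1."""
    n = M.shape[0]
    lhs = np.block([[M, -np.ones((n, 1))],
                    [np.ones((1, n)), np.zeros((1, 1))]])
    rhs = np.append(np.zeros(n), 1.0)
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n], sol[n]  # (mix, row player's equilibrium payoff)

q, v_attacker = indifference_mix(A)    # defender mix equalising attacker payoffs
p, v_defender = indifference_mix(B.T)  # attacker mix equalising defender payoffs
print(np.round(p, 3), np.round(q, 3))  # ~[0.327 0.364 0.309]  ~[0.227 0.364 0.409]
print(round(v_attacker, 2), round(v_defender, 2))  # ~2.73  ~37.45
```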


In the above-mentioned scenario, as we are dealing with probabilities, we need to consider them when calculating the expected utility. The expected payoff for each player i in any simultaneous-move game is given as: "Sum over all possible outcomes k (reward of getting an outcome k × joint probability of that outcome k being played by all players)" [20]. The total expected payoff for the "Attacker" is the sum over all the outcomes provided in Table 2, Column 2, and the total expected payoff for the "Defender" is the sum over all the outcomes provided in Table 2, Column 3.

When choosing a strategy, there are several factors to consider. An attacker, for example, considers the value of the asset if compromised, the resources required to execute an attack, the specialized skills required to plan and execute an attack, the importance of keeping custom-built exploits secret, and the risk of being caught (fines, imprisonment, etc.) [21]. For the defender, these factors can be the value of the asset, consumer trust, legal and regulatory compliance, the resources required for implementation and maintenance, and usability (the ease with which legitimate users can perform their work) [21]. Thus, the concepts of game theory can be used in cybersecurity to model the strategic interaction between attackers and defenders in the CKC model. This model consists of seven stages and was developed for the identification and prevention of cyber intrusion activity [5]. Each stage of the CKC represents a decision point for the attacker, and the defender can choose different security controls to disrupt the attack. By modeling such an interaction between the attacker and defender as a game, we can determine the optimal defense strategies that maximize the defender's payoff and minimize the attacker's payoff.

Table 2 Expected cybersecurity payoff for "Attacker" and "Defender" based on Fig. 2a

Expected payoff (outcomes 1–9) | Attacker | Defender
First outcome | (p1)·(q1)·(a11) | (p1)·(q1)·(b11)
Second outcome | (p1)·(q2)·(a12) | (p1)·(q2)·(b12)
Third outcome | (p1)·(q3)·(a13) | (p1)·(q3)·(b13)
Fourth outcome | (p2)·(q1)·(a21) | (p2)·(q1)·(b21)
Fifth outcome | (p2)·(q2)·(a22) | (p2)·(q2)·(b22)
Sixth outcome | (p2)·(q3)·(a23) | (p2)·(q3)·(b23)
Seventh outcome | (p3)·(q1)·(a31) | (p3)·(q1)·(b31)
Eighth outcome | (p3)·(q2)·(a32) | (p3)·(q2)·(b32)
Ninth outcome | (p3)·(q3)·(a33) | (p3)·(q3)·(b33)


5 Case Study

This section presents a case study to validate the applicability of the proposed strategic game model. We consider a scenario where the defender chooses strategies such as "Monitor system", "Detecting system", and "Respond to attack", and the attacker chooses various attack strategies within the "Reconnaissance and Weaponise" stage of the CKC, as shown in Table 3. To calculate the expected reward/utility in the mixed-strategy game, we need to construct a payoff matrix and assign estimated values to each combination of attacker and defender actions. This research has adapted a payoff matrix defined in the literature, where researchers used a non-cooperative game-based model for cybersecurity to obtain optimal strategies that maintain a secure system state [22]. We have used the cybersecurity payoff matrix and strategies shown in Table 3 to simulate the attacker's and defender's expected payoffs within the Reconnaissance and Weaponise stage of the CKC model.

Table 3 Cybersecurity payoff matrix for a cybersecurity game using the CKC model (attacker in the Reconnaissance and Weaponise stage; each cell gives attacker payoff, defender payoff)

Attacker strategy | Monitor system (q1) | Detecting system (q2) | Respond to attack (q3)
Social engineering techniques to reveal sensitive information (p1) | −5, 54 | 5, 40 | 5, 38
Scan the organization's website and social media accounts (p2) | 10, 34 | −10, 50 | 10, 28
Developing malicious payloads to gain unauthorized access to the target system (p3) | 15, 24 | 15, 20 | −15, 48

Assuming attack probabilities (p1, p2, p3) and defence probabilities (q1, q2, q3), we can calculate the expected utility for the attacker and the defender using the expressions provided in Table 2, Columns 2 and 3, respectively.

Attacker's expected utility at the Reconnaissance and Weaponise stage:

Social engineering techniques to reveal sensitive information:
−5 p1q1 + 5 p1q2 + 5 p1q3    (2)

Scan the organization's website and social media accounts:
10 p2q1 − 10 p2q2 + 10 p2q3    (3)

Developing malicious payloads to gain unauthorized access to the target system:
15 p3q1 + 15 p3q2 − 15 p3q3    (4)

Defender's expected utility:

Monitor System:
54 p1q1 + 34 p2q1 + 24 p3q1    (5)

Detecting System:
40 p1q2 + 50 p2q2 + 20 p3q2    (6)

Respond to Attack:
38 p1q3 + 28 p2q3 + 48 p3q3    (7)

Thus, the expected payoff for each player (e.g., the attacker) is calculated as:

Expected payoff for player i (attacker) = p1·q1·Payoff[i][0,0] + p1·q2·Payoff[i][0,1] + p1·q3·Payoff[i][0,2] + p2·q1·Payoff[i][1,0] + p2·q2·Payoff[i][1,1] + p2·q3·Payoff[i][1,2] + p3·q1·Payoff[i][2,0] + p3·q2·Payoff[i][2,1] + p3·q3·Payoff[i][2,2]    (8)

where p1, p2, p3 are the probabilities that the attacker chooses the strategies Social engineering techniques, Scan the organization's website, and Develop malicious payloads, respectively, at the Reconnaissance and Weaponise stage; q1, q2, q3 are the probabilities that the defender chooses the strategies Monitor System, Detecting System, and Respond to Attack, respectively; and Payoff[i][j, k] is the payoff for player i (the attacker) when the attacker chooses strategy j and the defender chooses strategy k. The subscripts [0, 0], [0, 1], etc. denote the rows and columns of the payoff matrix.
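Equation (8) is simply the bilinear form pᵀ · Payoff · q. Below is a minimal numpy sketch using the Table 3 payoffs; the uniform mixes in the example are illustrative, not values from the paper.

```python
import numpy as np

A = np.array([[-5, 5, 5], [10, -10, 10], [15, 15, -15]], dtype=float)  # attacker
B = np.array([[54, 40, 38], [34, 50, 28], [24, 20, 48]], dtype=float)  # defender

def expected_payoff(p, q, payoff):
    """Eq. (8): sum over all nine outcomes of p_j * q_k * Payoff[j, k]."""
    return float(np.asarray(p) @ payoff @ np.asarray(q))

p = [1 / 3, 1 / 3, 1 / 3]  # illustrative attacker mix
q = [1 / 3, 1 / 3, 1 / 3]  # illustrative defender mix
print(expected_payoff(p, q, A))  # attacker: 30/9 ≈ 3.33
print(expected_payoff(p, q, B))  # defender: 336/9 ≈ 37.33
```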


Table 4 Simulated results of attacker's and defender's expected utilities within different stages of the CKC, based on randomly generated attack and defence probabilities

A's strategies | A's prob. | D's strategies | D's prob. | A's expected payoff | D's expected payoff
D&E | [0.2 0.7 0.1] | MS | [0.4 0.3 0.3] | 3.5 | 42.6
I | [0.1 0.1 0.8] | DS | [0.4 0.4 0.2] | 3 | 30.8
D&E | [0.5 0.4 0.1] | R2A | [0.0 0.3 0.7] | 7.5 | 24.8
R&W | [0.8 0.0 0.2] | MS | [0.3 0.6 0.1] | 6 | 50.8
R&W | [0.7 0.4 0.3] | MS | [0.1 0.6 0.3] | 10 | 49.2

A = attacker; D = defender; R&W = Reconnaissance and Weaponize; D&E = Delivery and Exploitation; I = Installation; MS = Monitor System; DS = Detecting System; R2A = Respond to attack

If we assume randomly generated probabilities for the attack strategies (p1, p2, p3) and the defence strategies (q1, q2, q3), the payoffs for each player can be represented as probability distributions. These probabilities are random in nature; in our case this fits well with bots that simply select random targets to launch cyberattacks. In addition, the game randomly selects a defender strategy and an attacker strategy based on their respective probabilities. In one run of this game, with attack probabilities [0.1 0.6 0.3] and defence probabilities [0.2 0.0 0.8], attacker strategy "Reconnaissance and Weaponize" and defender strategy "Respond to Attack", the expected payoff for the attacker playing his/her strategy against the defender's is 11, and the expected payoff for the defender playing his/her strategy against the attacker's is 28.8. Table 4 shows a few of the simulated results of the attacker's and defender's expected utilities within different stages of the CKC, based on randomly generated attack and defence probabilities. More such results can be simulated in the same way, which will help to model a game-based strategy for understanding cybersecurity dynamics.
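A sketch of the kind of random-strategy simulation that could generate Table 4-style rows is given below. Only the Reconnaissance and Weaponise payoff matrix is reproduced in this paper (Table 3); the matrices for the other CKC stages, and hence the exact numbers in Table 4, are not derivable from it, so the matrix below stands in for whichever stage is being simulated.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Payoff matrices for one CKC stage (Table 3, Reconnaissance and Weaponise);
# a simulation of other stages would substitute their own matrices.
A = np.array([[-5, 5, 5], [10, -10, 10], [15, 15, -15]], dtype=float)
B = np.array([[54, 40, 38], [34, 50, 28], [24, 20, 48]], dtype=float)

for _ in range(5):  # five simulated rows, as in Table 4
    p = rng.dirichlet(np.ones(3))  # random attack probabilities, sum to 1
    q = rng.dirichlet(np.ones(3))  # random defence probabilities, sum to 1
    eu_attacker = p @ A @ q        # attacker's expected payoff, Eq. (8)
    eu_defender = p @ B @ q        # defender's expected payoff
    print(np.round(p, 2), np.round(q, 2),
          round(eu_attacker, 2), round(eu_defender, 2))
```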

6 Conclusion and Future Research

It has been concluded that game theory can be utilised to quantify cybersecurity and to understand the interactions between attacker and defender, and that game theory can be applied within the Cyber Kill Chain model to enhance understanding of the complex challenges and to facilitate the development of effective cybersecurity solutions at the defender's end. In future, simulated results with more realistic data can help to understand which defensive measures are most effective against a specific attack at any stage of the CKC model. This realistic data can be collected through the collaborative actions of various industrial stakeholders, their expertise, white-hat hackers, threat intelligence, and literature surveys, to generate a near real-life cybersecurity


game. This game will enable quantification of cyberattacks and facilitate strategic decision-making by analysing attacker–defender interactions and evaluating the expected payoffs of different defence strategies. In further research, the factors defined by [21] and presented in Sect. 4 can be considered when developing a payoff matrix, and a sensitivity analysis will be conducted for these factors. In future, we will also assume a probability of cyberattack arrival, after which a course of action can be chosen from a matrix defined in [23, 24]. This is ongoing research, and all the future possibilities outlined here will be implemented and published in subsequent work.

Acknowledgements Authors would like to acknowledge Luleå Railway Research Center (JVTC) and AI Factory for financial support and the eMaintenance lab for carrying out this work.

References

1. Gibbons RS (1992) Game theory for applied economists. Princeton University Press
2. Fudenberg D, Tirole J (1991) Game theory. MIT Press
3. Von Neumann J, Morgenstern O (1947) Theory of games and economic behavior, 2nd rev edn. Princeton University Press
4. Nash J Jr (1996) Non-cooperative games. In: Essays on game theory. Edward Elgar Publishing, pp 22–33
5. Lockheed Martin Cyber Kill Chain®. https://www.lockheedmartin.com/en-us/capabilities/cyber/cyber-kill-chain.html. Accessed 9 May 2023
6. Attiah A, Chatterjee M, Zou CC (2018) A game theoretic approach to model cyber attack and defense strategies. In: IEEE international conference on communications (ICC). IEEE, pp 1–7
7. Angafor GN, Yevseyeva I, He Y (2020) Game-based learning: a review of tabletop exercises for cybersecurity incident response training. Secur Priv 3:e126
8. Jin G, Tu M, Kim T, Heffron J, White J (2018) Game based cybersecurity training for high school students. In: Proceedings of the 49th ACM technical symposium on computer science education, pp 68–73
9. Khan MA, Merabet A, Alkaabi S, Sayed HE (2022) Game-based learning platform to enhance cybersecurity education. Educ Inf Technol 1–25
10. Tobarra L, Trapero AP, Pastor R, Robles-Gómez A, Hernandez R, Duque A, Cano J (2020) Game-based learning approach to cybersecurity. In: 2020 IEEE global engineering education conference (EDUCON). IEEE, pp 1125–1132
11. Taylor B (2019) Cybersecurity is not a fad: why cyber is a game changer for computer science education. In: Proceedings of the 50th ACM technical symposium on computer science education, p 975
12. Amini M, Bozorgasl Z (2023) A game theory method to cyber-threat information sharing in cloud computing technology. Int J Comput Sci Eng Res 11
13. Hyder B, Govindarasu M (2020) Optimization of cybersecurity investment strategies in the smart grid using game-theory. In: IEEE power and energy society innovative smart grid technologies conference (ISGT). IEEE, pp 1–5
14. Feng H, Chen D, Lv H, Lv Z (2023) Game theory in network security for digital twins in industry. Digit Commun Netw
15. Florea R, Craus M (2022) A game-theoretic approach for network security using honeypots. Futur Internet 14:362
16. Duvenage P, Jaquire V, von Solms S (2022) South Africa's taxi industry as a cybersecurity-awareness game changer: why and how? In: Information security education—adapting to the fourth industrial revolution: 15th IFIP WG 11.8 world conference, WISE 2022, proceedings. Springer, Copenhagen, Denmark, pp 92–106
17. Saxena S (2019) Game theory 101: decision making in a competitive scenario using normal form games
18. Fudenberg D, Tirole J (1991) Game theory. MIT Press
19. Oxford Nash equilibrium. https://. Accessed 31 May 2023
20. Microsoft Data Visualization | Microsoft Power BI. https://powerbi.microsoft.com/sv-se/. Accessed 31 May 2023
21. Allen C. Game theory applications in cyber security. https://www.csnp.org/post/game-theory-applications-in-cyber-security. Accessed 15 Mar 2023
22. Jahan F, Sun W, Niyaz Q (2020) A non-cooperative game based model for the cybersecurity of autonomous systems. In: IEEE security and privacy workshops (SPW). IEEE, pp 202–207
23. Kour R, Thaduri A, Karim R (2020) Railway defender kill chain to predict and detect cyber-attacks. J Cyber Secur Mobil 47–90
24. Kour R, Karim R, Thaduri A (2020) Cybersecurity for railways—a maturity model. Proc Inst Mech Eng Pt F: J Rail Rapid Transit 234:1129–1148
25. Blanchard BS (2004) Logistics engineering and management, 4th edn. Prentice-Hall, Englewood Cliffs, New Jersey

On the Need for Human Centric Maintenance Technologies

Antti Salonen

Abstract The digitalization of manufacturing industry, known as, e.g., Industry 4.0 or smart production, has opened new opportunities for real-time optimization of production systems. This technological leap has also provided new possibilities for the maintenance of production equipment to become data driven and, in many cases, predictive. This fourth industrial revolution is changing the role of humans on the shop floor. Visions of the dark factory arise, meaning fully automated factories where humans are redundant, both for physical processing and for decision making. Research on smart maintenance shows great advances in predictive diagnostic and prognostic techniques. However, in manufacturing industry, studies have shown that up to 50–60% of equipment breakdowns are due to human errors. Some of these errors are partly addressed through the development of improved information aids, such as instructions through Augmented Reality and training in Virtual Reality. Still, the root causes of human errors in manufacturing industry have not been properly categorized in terms of, e.g., neglect, lack of competence, unclear processes, or poor leadership. In this paper, the potential of data-driven maintenance is discussed from a human-centric perspective. Considering the large share of failures due to human factors and the possibilities for improvement through implementation of smart technologies, this paper argues for exploring the root causes of human errors in discrete item manufacturing systems and addressing the proper human-centric technologies as a means of reducing these failures. Keywords Human centric · Smart maintenance · Human factors

A. Salonen (B) Mälardalen University, 721 23 Högskoleplan Västerås, Sweden e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_34


1 Introduction

Industry 4.0 is characterized by smart and connected production units that allow for flexibility and customization, and for transparency throughout the whole value chain [1]. Concepts and tools commonly associated with Industry 4.0 are, for example, cyber-physical systems (CPS) [2–4], the Internet of Things (IoT) [5, 6], Big Data [7, 8], machine learning [9–11], cloud computing [12, 13] and fog computing [14, 15]. In the maintenance domain, these techniques have turned into concepts such as eMaintenance [16], Smart Maintenance [17], self-maintenance techniques [3, 18], predictive and prescriptive maintenance [19, 20] and even Maintenance 4.0 [15, 21–23]. Even though some of these concepts have been around since before the term Industry 4.0 was first coined [24], their purposes coincide with what is expressed within the fourth industrial revolution. All these concepts rely on good knowledge of measurable component deterioration patterns.

According to [25], half of the production capacity in manufacturing industry is lost, primarily due to disturbances. In three sites within one automotive manufacturing company, Salonen [26] found that about 50–63% of the breakdowns were caused by human errors and, further, that poor cleaning (which is a key activity in autonomous maintenance) was a major part of this. These failures may account for up to 6% of the manufacturing cost [26]. Failures due to human errors are in most cases impossible to detect or predict through technological solutions, at least without violating ethics. A large part of these human errors is due to operator mistakes, rather often associated with operator-driven maintenance activities. Also, as stated by Reason, "Automation and increasingly advanced equipment do not cure human factors problems, they merely relocate them" [27, p. 88].

2 Smart Maintenance

According to [28], some of the Industry 4.0 pillars play an essential role in maintenance development. The IIoT enables physical objects to interconnect through sensors using standard internet protocols [28]. It is also an enabling technology for the integration of the physical and digital worlds, i.e., CPS [18, 29]. Through CPS it is possible to collect data on the current machine status in real time, the basis for Big Data and analytics [30]. This data can be utilized in maintenance planning and to predict component deterioration in equipment [28]. Industrial applications of cloud computing are used for data storage and for effective data sharing between systems across an entire company [31]. Jantunen et al. [32] propose the use of Big Data analytics and cloud computing for improved condition-based maintenance (CBM).

Through a study in Swedish manufacturing industry, [33] concluded that manufacturing industry finds it challenging to implement smart maintenance technologies in a cost-effective manner. One of the main problems for the companies was to determine what to monitor and what type of data to collect. Also, [34] found maintenance organizations struggling with organizational, technical, and managerial challenges when implementing data-driven decision-making in maintenance.

3 Human Errors

Human–machine interaction in discrete manufacturing is comparatively common, and hence there are many opportunities to make mistakes. "Human error may be defined as the failure to perform a specified task (or the performance of a forbidden action) that could lead to disruption of scheduled operations or result in damage to property and equipment." [35, p. 22]. Human errors in operations, as well as in maintenance, are mainly studied in safety-critical domains, e.g., the nuclear power industry and aviation [36].

3.1 Operations

Most studies on human errors in manufacturing focus on work accidents, e.g., [37, 38]. Nonetheless, there are some studies on human errors relating to quality and maintenance as well. Böllhoff et al. [39] present a study of human errors in cellular manufacturing; the result is shown in Table 1. Baxter et al. [40] argue that when systems become increasingly dependable, the opportunity to practice and thereby develop operating skills is reduced; therefore, the operators do not uphold the necessary problem-solving skills. Further, with fast technology, the operators do not know what the technology is actually doing in real time, which makes it hard for them to react directly when deviations occur [40]. Salonen points out that a common source of human errors in operations is setup changes, especially when operators are less experienced [36].

Table 1 Share of human error types [39]

Error type | Share (%)
Omissions | 43
Incorrect selection of program | 23
Unstable fixation of work piece | 16
Erroneous execution direction | 13
Incorrect selection of workpiece | 5


3.2 Autonomous Maintenance

The term Autonomous Maintenance (AM) is associated with Total Productive Maintenance (TPM), as defined by [41]: "At the heart of autonomous maintenance is deterioration prevention, which has been neglected in most factories until recently." [41, pp. 165–166]. The concept of AM strives to let the operators take more responsibility for basic maintenance actions, e.g., cleaning and lubrication, and thus prevent deterioration of the production equipment. In Swedish manufacturing industry, AM programs are very common, but they are still not always fully sufficient. According to [41, p. 99]: "…breakdowns are the results of human factors—the erroneous assumptions and beliefs of engineers, maintenance personnel, and equipment operators". Salonen states that poor autonomous maintenance is one substantial category of human errors in Swedish automotive manufacturing [26]. One cause of this is insufficient training; another is lack of time. Production leaders often underestimate the importance of autonomous maintenance and prioritize maximizing production time. In a study of three production sites within the automotive industry, all with AM programs in place, [26] found that 15–35% of the breakdowns were due to poor cleaning. This finding is in line with [39], who found that the most common form of human error (43%) was omissions, most frequently in the form of neglected cleaning. Another important aspect is that fewer and fewer operators are supposed to do autonomous maintenance on larger and increasingly complex production systems [42]. It is also important to point out that several researchers highlight the importance of considering organizational maturity and the role of humans when implementing technologies associated with smart production, e.g., [43–45]. This most likely applies to AM programs to an even higher extent, considering that the production organization often prioritizes production, being its core activity.

3.3 Professional Maintenance

Most studies of human errors in professional maintenance focus on safety-critical domains, e.g., nuclear power, chemical processing, and aviation maintenance. In power plant maintenance, [46] identified ten causes of human errors, as presented in Table 2. Human error in the maintenance of discrete item manufacturing is less explored. In a study of three manufacturing sites, [26] found that the proportions of breakdowns related to either lacking or poorly performed preventive maintenance were 10, 11, and 40%. However, the reasons for the poor maintenance were not studied further.


Table 2 Causal factors for critical incidents and reported events relating to maintenance errors in power plants, ordered from lowest to highest occurrence [46]

1. Oversights by maintenance personnel (lowest occurrence)
2. Adverse environmental factors
3. Poor work practices
4. Problems in facility design
5. Poor unit and equipment identification
6. Poor training
7. Problems in moving equipment or people
8. Deficiencies in equipment design
9. Problems in tagging and clearing equipment for maintenance
10. Faulty procedures (highest occurrence)

3.4 Human Factors Analysis

From a reliability engineering perspective, various models have been developed to assess the probability of human errors. González-Prida et al. [47] summarize quantitative methods such as the Technique for Human Error-Rate Prediction (THERP), A Technique for Human Event Analysis (ATHEANA), Absolute Probability Judgement (APJ), and the Human Error Assessment and Reduction Technique (HEART). However, all these models fail to identify the underlying sources of human errors. Within aviation, the Human Factors Analysis and Classification System (HFACS) has been developed [48]. In this system, unsafe acts are categorized as either Errors, subdivided into Decision-based, Skill-based, and Perceptual, or Violations, subdivided into Routine and Exceptional. Further, HFACS has a maintenance-related extension, HFACS-ME, which categorizes underlying factors in three orders, with the first-order headings: Management conditions, Maintainer conditions, Working conditions, and Maintainer acts. The second and third orders divide the first-order factors into further detail [49]. Based on a study by [50], Reason presents a list of conditions that he means are "guaranteed" to increase the nominal error probabilities [27]; see Table 3. From Table 3, it becomes obvious that human errors are not root causes themselves, but rather consequences of other factors. Alonso and Broadribb [51] argue that an incident caused by a human error has underlying "system causes" related to practices and processes, which in turn are caused by "root causes" relating to culture, leadership and corporate values.


Table 3 Error producing factors and their effect [27]

Condition | Risk factor
Unfamiliarity with the task | ×17
Time shortage | ×11
Poor signal: noise ratio | ×10
Poor human system interface | ×8
Designer user mismatch | ×8
Irreversibility of errors | ×8
Information overload | ×6
Negative transfer between tasks | ×5
Misperception of risk | ×4
Poor feedback from the system | ×4
Inexperience—not lack of training | ×3
Poor instructions or procedures | ×3
Inadequate checking | ×3
Educational mismatch of person with task | ×2
Disturbed sleep patterns | ×1.6
Hostile environment | ×1.2
Monotony and boredom | ×1.1
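To make the multiplicative logic of Table 3 concrete: in first-generation HRA methods such as HEART [47], a nominal task error probability is scaled by the multipliers of the error-producing conditions judged to be present. The Python sketch below is a deliberately simplified illustration; the nominal probability and the selected conditions are assumptions, and the full HEART method additionally weights each factor by an assessed proportion of effect.

```python
# Simplified HEART-style adjustment (illustrative values only)
NOMINAL_ERROR_PROB = 0.003  # assumed nominal error probability for the task

present_conditions = {       # multipliers taken from Table 3
    "Unfamiliarity with the task": 17,
    "Time shortage": 11,
    "Poor instructions or procedures": 3,
}

p = NOMINAL_ERROR_PROB
for multiplier in present_conditions.values():
    p *= multiplier
p = min(p, 1.0)  # a probability cannot exceed 1

print(f"Adjusted human error probability: {p:.3f}")
# 0.003 * 17 * 11 * 3 = 1.683, capped at 1.0 -> essentially certain failure
```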

4 Humans in Industry 4.0

Some authors envision the full-scale Industry 4.0 as an operator-less factory [52]. In contrast, [53] argue that human operators will continue to have an important role in production, for coordination, supervision, and decision-making. In relation to the statements by [53], other researchers point out the risk of an ever-increasing cognitive load on the operators [54]. Based on the findings of [33], most companies in manufacturing industry still have a long journey ahead before they can reach the dark factory vision. Still, [55] argues that human factors are underrepresented in research on Industry 4.0. Instead of envisioning the operator-less factory, many researchers have proposed how to handle humans in Industry 4.0 settings. Cimini et al. [56] present a human-in-the-loop manufacturing control architecture as a human-centric approach. Based on a structured literature review, [53] identify six research clusters for the anthropocentric perspectives on cyber-physical production systems:

1. Cognitive aid in planning.
2. Physical aid for the execution of work.
3. Sensorial aid for manufacturing execution.
4. Cognitive aid for manufacturing execution.
5. Sensorial aid for maintenance.
6. Cognitive aid for maintenance.


5 Root Cause Failure Analysis

In order to reduce the high number of disturbances in manufacturing industry, [25] propose a six-step process: detection, diagnosis, mitigation/correction, root cause analysis, prevention, and prediction. An interview study with respondents from five different companies led to the conclusion that the root cause analysis stage needs more attention. Based on a literature review, Hussin et al. [57] pointed out the following elements to address in order to succeed with root cause failure analysis (RCFA):

• All participants need a full understanding of the RCFA process, methods and tools.
• Major as well as minor aspects of the problem need attention.
• Good data quality is needed for a complete analysis.
• Use the right RCFA tool, depending on the problem.
• Follow all steps in the analysis process.
• An effective RCFA requires a sufficient amount of resources.
• Effective RCFAs require multi-disciplinary teams.
• The teams need appropriate training.
• The team, as well as management, has to be committed to achieving the objectives of the RCFA.

In order to find root causes of failures, Salonen et al. [58] collected 3 years of raw CMMS data from two production cells containing a total of 11 machines, amounting to 386 work orders. A major problem in the analysis was that only 52% of the work orders included a failure description. Through a workshop with experienced maintenance staff, 202 work orders could be evaluated from the perspective of human errors, with the conclusion that 20% of these were due to human errors [58]. Another study of CMMS analysis is presented by Ahmed et al. [59], in which an attempt was made to utilize Natural Language Processing (NLP), ontology, and Machine Learning (ML) to analyze 1700 work orders collected over 11 years. The main finding from this study was that it is difficult to analyze such unstructured free-text data as the CMMS contained.
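The data-quality problem reported in [58] (only 52% of work orders carried a failure description) is easy to screen for before attempting any deeper RCFA. Below is a minimal pandas sketch; the file name and column name are hypothetical, not the schema of any specific CMMS.

```python
import pandas as pd

# Hypothetical CMMS export; "failure_description" is an assumed column name
orders = pd.read_csv("work_orders.csv")

described = orders["failure_description"].fillna("").str.strip() != ""
print(f"Work orders with a failure description: {described.mean():.0%}")

# Only described work orders can be meaningfully screened for human error,
# e.g., in a workshop with experienced maintenance staff as in [58]
evaluable = orders[described]
```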

6 Discussion

The technologies associated with Industry 4.0 have a large potential for improving operations as well as maintenance, and may facilitate data-driven decision-making at shop-floor level. Studies have shown that a rather large part of equipment breakdowns in manufacturing industry is caused by human errors [26]. The underlying causes of human errors are less studied in manufacturing industry, probably because breakdowns in manufacturing industry seldom lead to severe injuries or casualties.


Since human interaction in discrete manufacturing is comparatively common, there is a need for a thorough analysis of why humans make mistakes in these systems. These mistakes are made by operators during operations, but also during autonomous maintenance (i.e., operators performing basic maintenance on the equipment), and by maintenance staff performing advanced maintenance tasks. However, few companies fully grasp the proportion of human errors or which failures are due to human factors. The study presented by [26] shows different categories of root causes of failures in manufacturing industry. However, the categories presented show the sources of failures rather than the true root causes. Studies from other branches [27, 46] indicate that many of the underlying factors relate to poor processes, poor training, and poor equipment design. Also, the tendency for fewer operators to operate and maintain larger and more complex production systems [42] relates to several of the error-producing factors presented in Table 3.

In order to develop sufficient human-centric maintenance technologies, it is essential to learn the underlying causes of human errors in manufacturing industry. A first step could be to further map these causes, e.g., through HFACS-ME, in order to identify which ones belong to Management conditions, Maintainer conditions, Working conditions, and Maintainer acts. Based on the mapping of human factors, appropriate tools for support in the cognitive and sensorial perspectives of cyber-physical production systems, as described by [53], can be developed, utilizing the technologies associated with Industry 4.0.

In industry today, there seems to be a rather low interest in deeper RCFA, as well as in mapping human factors as a basis for the implementation of cognitive aid for operators and maintenance staff in manufacturing industry. One important reason for this may very well be the low awareness of the cost of human errors, in turn relating to the absence of valid models for the financial aspects of maintenance activities. Within a cyber-physical production system utilizing big data, the possibilities for real-time tracking of equipment status should allow for improved tracking of costs related to poor dependability. A human-centric perspective on these smart maintenance technologies is also an essential aspect of what could become Maintenance 5.0, together with sustainability and resilience.

Acknowledgements This paper is based on research funded by the initiative for Excellence in Production Research, XPRES, one of two government-funded strategic initiatives within manufacturing engineering in Sweden.


References

1. Smit J, Kreutzer S, Moeller C, Carlberg M (2016) Industry 4.0. European Parliament, Brussels
2. Monostori L, Kádár B, Bauernhansl T, Kondoh S, Kumara S, Reinhart G, Sauer O, Schuh G, Sihn W, Ueda K (2016) Cyber-physical systems in manufacturing. CIRP Ann-Manuf Technol 65:621–641
3. Brettel M, Friederichsen N, Keller M, Rosenberg M (2014) How virtualization, decentralization and network building change the manufacturing landscape: an Industry 4.0 perspective. Int J Mech Ind Sci Eng 8(1):37–44
4. Lee J, Kao HA, Yang S (2014) Service innovation and smart analytics for Industry 4.0 and big data environment. Procedia CIRP 16:3–8
5. Tedeschi S, Mehnen J, Tapoglou N, Roy R (2017) Secure IoT devices for the maintenance of machine tools. Procedia CIRP 59:150–155
6. Compare M, Baraldi P, Zio E (2019) Challenges to IoT-enabled predictive maintenance for Industry 4.0. IEEE Internet Things J
7. Baum J, Laroque C, Oeser B, Skoogh A, Subramaniyn M (2018) Applications of big data analytics and related technologies in maintenance—literature-based research. Machines 6
8. Patwardhan A, Verma AK, Kumar U (2016) A survey on predictive maintenance through big data. In: Current trends in reliability, availability, maintainability and safety, pp 437–445
9. Lee J, Davari H, Singh J, Pandhare V (2018) Industrial artificial intelligence for Industry 4.0-based manufacturing systems. Manuf Lett 18:20–23
10. Paolanti M, Romeo L, Felicetti A, Mancini A, Frontoni E, Loncarski J (2018) Machine learning approach for predictive maintenance in Industry 4.0. In: 2018 14th IEEE/ASME international conference on mechatronic and embedded systems and applications (MESA), pp 1–6
11. Bajic B, Cosic I, Lazarevic M, Sremcev N, Rikalovic A (2018) Machine learning techniques for smart manufacturing: applications and challenges in Industry 4.0. In: 9th international scientific and expert conference TEAM
12. Mell P, Grance T (2011) The NIST definition of cloud computing. NIST Special Publication 800-145. http://faculty.winthrop.edu/domanm/csci411/Handouts/NIST.pdf. Accessed Oct 2018
13. Yashpalsinh J, Modi K (2012) Cloud computing—concepts, architecture and challenges. In: International conference on computing, electronics and electrical technologies (ICCEET). IEEE
14. Ashjaei M, Bengtsson M (2017) Enhancing smart maintenance management using fog computing technology. In: The international conference on industrial engineering and engineering management (IEEM), pp 1561–1565
15. Rani S, Kataria A, Chauhan M (2022) Fog computing in Industry 4.0: applications and challenges—a research roadmap. In: Energy conservation solutions for fog-edge computing paradigms, pp 173–190
16. Kumar U, Galar D (2018) Maintenance in the era of Industry 4.0: issues and challenges. In: Quality, IT and business operations, pp 231–250
17. Bumblauskas D, Gemmill D, Igou A, Anzengruber J (2017) Smart maintenance decision support systems (SMDSS) based on corporate big data analytics. Expert Syst Appl 90:303–317
18. Singh S, Galar D, Baglee D, Björling S-E (2014) Self-maintenance techniques: a smart approach towards self-maintenance systems. Int J Syst Assur Eng Manag 5(1):75–83
19. Lee J, Begheri B, Kao H (2015) A cyber-physical systems architecture for Industry 4.0-based manufacturing systems. Manuf Lett 3:18–23
20. Hashemian HM, Bean WC (2011) State-of-the-art predictive maintenance techniques. IEEE Trans Instrum Meas 60(10):3480–3492
21. Cachada A, Barbosa J, Leitão P, Geraldes CA, Deusdado L, Costa J, Teixeira C, Teixeira J, Moreira A, Moreira P, Romero L (2018) Maintenance 4.0: intelligent and predictive maintenance system architecture. In: 2018 IEEE 23rd international conference on emerging technologies and factory automation (ETFA), vol 1, pp 139–146
22. Kans M, Galar D (2017) The impact of Maintenance 4.0 and big data analytics within strategic asset management. In: 6th international conference on maintenance performance measurement and management, Luleå, Sweden, 28 Nov 2016, pp 96–103
23. Algabroun H, Iftikhar MU, Al-Najjar B, Weyns D (2017) Maintenance 4.0 framework using self-adaptive software architecture. In: Proceedings of 2nd international conference on maintenance engineering, IncoME-II. The University of Manchester, UK
24. Kagermann H, Lukas W, Wahlster W (2011) Industrie 4.0: mit dem Internet der Dinge auf dem Weg zur 4. industriellen Revolution. VDI Nachrichten 13. http://www.wolfgangwahlster.de/wordpress/wpcontent/uploads/Industrie_4_0_Mit_dem_Internet_der_Dinge_auf_dem_Weg_zur_vierten_industriellen_Revolution_2.pdf. Accessed Oct 2018
25. Ito A, Ylipää T, Skoogh A, Gullander P (2021) Production disturbances handling: where are we and where are we heading? In: 2nd south american conference on industrial engineering and operations management, IEOM 2021, 5–8 April 2021. IEOM Society, pp 12–23
26. Salonen A (2018) The need for a holistic view on dependable production systems. Procedia Manuf 25:17–22
27. Reason J (1995) Understanding adverse events: human factors. BMJ Qual Saf 4(2):80–89
28. Silvestri L, Forcina A, Introna V, Santolamazza A, Cesarotti V (2020) Maintenance transformation through Industry 4.0 technologies: a systematic literature review. Comput Ind 123:103335
29. Penna R, Amaral M, Espíndola D, Botelho S, Duarte N, Pereira CE, Zuccolotto M, Frazzon EM (2014) Visualization tool for cyber-physical maintenance systems. In: 12th IEEE international conference on industrial informatics (INDIN), pp 566–571
30. Peres RS, Dionisio Rocha A, Leitao P, Barata J (2018) Idarts—towards intelligent data analysis and real-time supervision for Industry 4.0. Comput Ind 101:138–146
31. Liu Y, Xu X (2017) Industry 4.0 and cloud manufacturing: a comparative analysis. J Manuf Sci Eng 139(3)
32. Jantunen E, Campos J, Sharma P, Baglee D (2017) Digitalisation of maintenance. In: 2nd international conference on system reliability and safety (ICSRS), pp 343–347
33. Giliyana S, Salonen A, Bengtsson M (2022) Perspectives on smart maintenance technologies—a case study in large manufacturing companies. In: Proceedings of the 10th Swedish production symposium, pp 255–266
34. Savolainen P, Magnusson J, Gopalakrishnan M, Bekar ET, Skoogh A (2020) Organisational constraints in data-driven maintenance: a case study in the automotive industry. IFAC-PapersOnLine 53(3):95–100
35. Dhillon BS, Liu Y (2006) Human error in maintenance: a review. J Qual Maint Eng 12(1):21–36
36. Salonen A (2019) Human errors in Industry 4.0: opportunities and challenges from a dependability perspective. In: The proceedings of 4th international conference on maintenance engineering, Dubai, UAE, pp 69–78
37. Yeow JA, Ng PK, Tai HT, Chow MM (2020) A review on human error in Malaysia manufacturing industries. Management 5(19):01–13
38. Reyes RM, de la Riva J, Maldonado A, Woocay A (2015) Association between human error and occupational accidents' contributing factors for hand injuries in the automotive manufacturing industry. Procedia Manuf 3:6498–6504
39. Böllhoff J, Metternich J, Frick N, Kruczek M (2016) Evaluation of the human error probability in cellular manufacturing. Procedia CIRP 55:218–223
40. Baxter G, Rooksby J, Wang Y, Khajeh-Hosseini A (2012) The ironies of automation: still going strong at 30? In: Proceedings of the 30th European conference on cognitive ergonomics. ACM, pp 65–71
41. Nakajima S (1989) TPM—development program—implementing total productive maintenance. Productivity Press, Cambridge
42. Salonen A (2023) What is smart maintenance in manufacturing industry? In: 16th WCEAM proceedings. Springer International Publishing, Cham, pp 366–374
43. Baglee D, Jantunen E, Sharma P (2016) Identifying organisational requirements for the implementation of an advanced maintenance strategy in small to medium enterprises (SME). J Maint Eng 16–26
44. Saltzer M (2017) A blueprint for digitalisation of maintenance. In: Proceedings of 2nd international conference on maintenance engineering, INCOME-II, pp 384–391
45. Havle C, Üçler Ç (2018) Enablers for Industry 4.0. In: IEEE international symposium on multidisciplinary studies and innovative technologies
46. Dhillon BS (2014) Human error in power plant maintenance. In: Human reliability, error, and human factors in power generation. Springer, Cham, pp 135–149
47. González-Prida V, Parra C, Crespo A, Kristjanpoller FA, Gunckel PV (2022) Reliability engineering techniques applied to the human failure analysis process. In: Cases on optimizing the asset management process. IGI Global, pp 162–179
48. Shappell SA, Wiegmann DA (2000) The human factors analysis and classification system—HFACS
49. Schmidt JK, Lawson D, Figlock R (2003) Human factors analysis and classification system maintenance extension (HFACS-ME) review of select NTSB maintenance mishaps: an update
50. Williams JC (1988) A data-based method for assessing and reducing human error to improve operational performance. In: Conference record for 1988 IEEE fourth conference on human factors and power plants. IEEE, pp 436–450
51. Alonso IJ, Broadribb M (2018) Human error: a myth eclipsing real causes. Process Saf Prog 37(2):145–149
52. Benešová A, Tupa J (2017) Requirements for education and qualification of people in Industry 4.0. Procedia Manuf 11:2195–2202
53. Rauch E, Linder C, Dallasega P (2020) Anthropocentric perspective of production before and within Industry 4.0. Comput Ind Eng 139:105644
54. Dombrowski U, Wagner T (2014) Mental strain as field of action in the 4th industrial revolution. Procedia CIRP 17:100–105
55. Neumann WP, Winkelhaus S, Grosse EH, Glock CH (2021) Industry 4.0 and the human factor—a systems framework and analysis methodology for successful development. Int J Prod Econ 233:107992
56. Cimini C, Pirola F, Pinto R, Cavalieri S (2020) A human-in-the-loop manufacturing control architecture for the next generation of production systems. J Manuf Syst 54:258–271
57. Hussin H, Ahmed U, Muhammad M (2016) Critical success factors of root cause failure analysis. Indian J Sci Technol 9(48):1–10
58. Salonen A, Bengtsson M, Fridholm V (2020) The possibilities of improving maintenance through CMMS data analysis. In: Proceedings from the Swedish production symposium, SPS2020
59. Ahmed MU, Bengtsson M, Salonen A, Funk P (2021) Analysis of breakdown reports using natural language processing and machine learning. In: International congress and workshop on industrial AI. Springer, Cham, pp 40–52

A Systematic Study of the Effect of Signal Alignment in Information Extraction from Railway Infrastructure Recording Vehicle Data

Daniël Fourie, Daniel N. Wilke, and Petrus Johannes Gräbe

Abstract The maintenance and development of rail infrastructure in Africa is key to the future success of the African continent. Development of a digital twin in the context of South African railway infrastructure will assist with the future need for effective maintenance planning and resource optimisation. As part of the process towards realisation of a railway infrastructure digital twin for South African railway infrastructure, the current study quantified the effect of various signal alignment strategies and errors on the ability of Singular-Spectrum Analysis to extract features that can be used when considering the evolution of track geometry over time. When minimal stretching is present in the data, the pairwise cross-correlation of the flattened matrices representing the two sets of elementary matrices for two subsequent measurement campaigns provides a latent space that looks promising for identifying changes in the track geometry as recorded by a track geometry car. Keywords Track geometry · SSA · Signal alignment

D. Fourie (B) · D. N. Wilke University of Pretoria, Centre for Asset Integrity Management, Pretoria, South Africa e-mail: [email protected]; D. N. Wilke e-mail: [email protected]; P. J. Gräbe Chair in Railway Engineering, University of Pretoria, Pretoria, South Africa e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_35

1 Introduction

The maintenance and development of rail transport in Africa is seen as a prerequisite for growth on the African continent. A well-established rail network is of strategic importance for the African continent, as it will provide the backbone for future economic growth. Aging railway infrastructure in Africa generally increases the maintenance cost in an already resource-constrained environment and requires careful planning and resource optimisation to ensure that the allocation of available funding is optimised. Having a digital twin of the railway track can help tremendously with improving the efficiency and efficacy of maintenance operations as well as maintenance planning [1]. Regarding maintenance, the digital twin can simulate and analyse the performance of the railway track, predict and model maintenance requirements, and alert the maintenance team to areas with rapid degradation that might otherwise have been overlooked. If one considers that a small team is usually required to perform data analysis, maintenance planning, and daily "firefighting" to minimise delays on a network often stretching across several hundred kilometres, a digital twin will definitely be a very valuable tool at the disposal of the maintenance engineer. In addition, when one considers the possibility of the digital twin performing near real-time optimisation, maintenance activities and planning can be updated in a continuous manner.

As part of the future realisation of a railway infrastructure digital twin for South African railway infrastructure, the aim of the current study is to systematically quantify the effect of various signal alignment strategies on the ability of Singular-Spectrum Analysis [2] to extract features and information from these signals. Time series feature extraction aims to extract a set of properties that characterise the time series and that can subsequently be used in machine learning algorithms. Singular-Spectrum Analysis (SSA) is a method for unsupervised time series decomposition and is the method chosen for the current study as it is a proven and robust method.
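As a concrete reference for the decomposition used in this study, the following is a minimal numpy sketch of basic SSA: the series is embedded in a trajectory (Hankel) matrix, decomposed by SVD, and expressed as a sum of rank-one elementary matrices. The pairwise cross-correlation of flattened elementary matrices mentioned in the abstract is also sketched; the window length and the synthetic test series are illustrative choices, not values from the study.

```python
import numpy as np

def ssa_elementary_matrices(x, L):
    """Basic SSA: trajectory matrix -> SVD -> rank-one elementary matrices."""
    x = np.asarray(x, dtype=float)
    K = len(x) - L + 1
    X = np.column_stack([x[i:i + L] for i in range(K)])  # L x K Hankel matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return np.array([s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s))])

def pairwise_correlation(E1, E2):
    """Correlation of flattened elementary matrices from two campaigns."""
    return np.array([[np.corrcoef(a.ravel(), b.ravel())[0, 1] for b in E2]
                     for a in E1])

# Illustrative usage on two synthetic measurement campaigns (L = 40 assumed)
t = np.linspace(0, 20, 800)
campaign_1 = np.sin(t) + 0.1 * np.random.default_rng(1).standard_normal(800)
campaign_2 = np.sin(t) + 0.1 * np.random.default_rng(2).standard_normal(800)
C = pairwise_correlation(ssa_elementary_matrices(campaign_1, 40),
                         ssa_elementary_matrices(campaign_2, 40))
```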

2 Required Background

A railway line is a fixed linear asset that provides a fixed guideway to rolling stock. From a data perspective, linear assets can be quite complex, spanning hundreds or even thousands of kilometres. Spatially positioning an object or defect on a railway line is achieved using either a linear distance along the line from a fixed reference point or spatial data captured using a GPS receiver. Typical geometry measurements of the railway infrastructure are referenced using spatial data in the form of both linear distances and GPS coordinates. These position signals are readily biased due to drift in the recorded data, and this drift can be non-uniform over time. The required level of accuracy in spatially positioning an object or defect within the data set describing a linear asset depends on the application at hand: for instance, a track worker tasked with physically locating an individual object or defect can tolerate some inaccuracy in spatial positioning. Other tasks, such as finding correlations over time between geometry defects or objects within a digital environment, are more sensitive, and the quality of the extracted information can be severely influenced by such inaccuracies, depending on the method employed. The typical spatial positioning of the signals recorded by the infrastructure measurement car is readily biased due to drift in the recorded positioning data, which is typically non-uniform between locations and over time.


Weston [3] mentions that the mutual alignment of track geometry data has to consider any initial mismatch in position as well as stretching or compression of one data set relative to another due to an accumulation of distance error arising from the wheelset tachometer. The most significant challenge in the mutual alignment of data arises when the tachometer introduces errors that are variable over a small distance scale [3]. Such errors need to be accommodated by considering small segments of track geometry data, ensuring that any local drift in the data can be captured and corrected for. Algorithms for the mutual alignment of railway track geometry measurements have been presented in [4, 5]. The alignment strategy in [4] involves incrementally changing the local cross-correlation window size when computing the spatially correlated measurement errors that are used for correcting the signal offset. At the coarsest level of alignment, the initial window size has an upper bound based on the largest amount of drift possible. To detect a local drift with a frequency f, the alignment window needs to be shorter than 1/(2f). In Ref. [5], five alignment methods were compared, namely the cross-correlation function, recursive alignment by fast Fourier transform (RAFFT), dynamic time warping (DTW), correlation optimised warping (COW) and a combined method. The combined method first aligns the start and end of the data set using RAFFT and then uses COW to eliminate any warping of the data sets with respect to one another. Although DTW was able to precisely align the data sets, it significantly changed the shape of the aligned data set, especially in the peaks. A window length of 1 km was chosen in [5], and the success of the alignment was judged by the ability to perform both global and local alignment of a large window of data simultaneously. The combined method performed best at aligning the track geometry data and could decrease the positional errors of single defects to below 0.25 m in 90% of the data sets considered. When considering a very long window length for once-off alignment of data sets between different measurement campaigns, global strategies like cross-correlation will not perform well on data sets where the compression or stretching changes drastically over the window. When a long window of data is considered, it is important to use a strategy that can perform alignment on a local and global scale simultaneously, or to refine the alignment by progressing from a global to a local scale. For the current study, an initial coarse alignment of the windowed data is suggested based on the haversine distances calculated between the GPS coordinates representing the same distance marker in the two data sets, and not based on a predefined upper bound. Local refinement of the data is not considered below the 200 m window chosen for the current study; however, once the initial coarse alignment using GPS data is performed, the starting point of the second data set is updated and a new block of data is selected using the new starting point and the same window size as before.


2.1 Track Geometry Data A railway infrastructure measuring car is typically used for measuring track geometry, catenary and rail parameters, and the recorded as well as post-processed data is crucial for effective maintenance planning and maintenance decision-making [5]. The railway infrastructure measuring car reports several track geometry parameters like profile, horizontal alignment, track twist, track gauge and super-elevation at a sampling interval of 250 mm. The profile data, which will be used in the current study, is extracted from the vertical space curve of the measured data using a 7 m chord. Using a 7 m chord provides an important filtering effect, as it removes components of the space curve data with wavelengths longer than 7 m. Changes in the vertical alignment of the track are caused by differential track settlement, changes in the track support stiffness, and the presence of rail surface defects [6].
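The paper does not detail the measuring car's chord processing; as an illustration of the filtering effect of a 7 m chord, the following minimal Python sketch computes a simple mid-chord offset (versine) of a vertical space curve sampled every 250 mm. The mid-chord-offset formulation itself, and all names below, are assumptions for illustration only.

```python
import numpy as np

def mid_chord_offset(space_curve, sample_dx=0.25, chord=7.0):
    """Mid-chord offset (versine) of a vertical space curve.

    space_curve : 1-D array of vertical positions sampled every sample_dx metres.
    Returns the deviation of each point from the midpoint of a chord of the
    given length centred on it; wavelengths much longer than the chord are
    strongly attenuated, which is the filtering effect described in the text.
    """
    half = int(round(chord / 2.0 / sample_dx))  # 3.5 m -> 14 samples
    z = np.asarray(space_curve, dtype=float)
    profile = np.full_like(z, np.nan)
    profile[half:-half] = z[half:-half] - 0.5 * (z[:-2 * half] + z[2 * half:])
    return profile

# Example: a 100 m wavelength component is almost completely removed,
# while a 5 m wavelength component is largely retained.
x = np.arange(0, 200, 0.25)
curve = 10.0 * np.sin(2 * np.pi * x / 100.0) + 1.0 * np.sin(2 * np.pi * x / 5.0)
profile = mid_chord_offset(curve)
```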

2.2 Spatial Accuracy of Data Two different methods were used to quantify the spatial accuracy of the data. The first method was to evaluate the haversine distance between the GPS coordinates representing the same distance marker in two subsequent measurement campaigns. The haversine distances calculated between the distance-matched GPS coordinates for a 90 km track section are shown in Fig. 1. The data in Fig. 1 shows that the haversine distance calculated between the GPS coordinates of the same kilometre markers in different campaigns is not constant with distance along the track. This introduces stretching and compression of the two data sets relative to one another, as well as data drift, over both short and long distance scales. The second method involved calculating by how much the haversine distance between two subsequent data points in the same data set differs from 250 mm. This can also be interpreted as the accuracy of the GPS measurements when compared to the expected spacing of 250 mm.

Fig. 1 Haversine distance for the same linear distance marker between two different measurement campaigns


Fig. 2 Distance estimated between data points from GPS coordinates

Figure 2 shows the haversine distance between subsequent data points for the first data set used in the calculation presented in Fig. 1.
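Both accuracy checks reduce to evaluating the haversine formula on pairs of GPS fixes. A minimal sketch, assuming latitude/longitude arrays in degrees (array names and the Earth-radius constant are illustrative, not from the paper):

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, metres

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between coordinate pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# Method 1 (Fig. 1): distance between the GPS fixes of the same linear
# distance marker in two measurement campaigns. lat_a, lon_a, lat_b, lon_b
# would be arrays aligned on the distance markers:
# marker_offset = haversine(lat_a, lon_a, lat_b, lon_b)

# Method 2 (Fig. 2): spacing between subsequent fixes in one campaign,
# compared with the expected 250 mm sampling interval:
# spacing = haversine(lat_a[:-1], lon_a[:-1], lat_a[1:], lon_a[1:])
# spacing_error = spacing - 0.25
```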

2.3 Aligning Data Sets Between Different Measurement Campaigns Because of the difference in the haversine distance calculated between data points of two different campaigns representing the same linear distance marker, it is evident that the data sets representing two different measurement campaigns require alignment. The first coarse alignment of the two data sets (each representing 200 m of track) is achieved by shifting the second data set by the median haversine distance of the data block in the appropriate direction. The median is used instead of the average to disregard outliers in the data set. The second data shift is then achieved by finding the maximum cross-correlation between the two data sets whilst also considering the direction in which the shift needs to be effected. The first coarse alignment is important in situations where a maintenance intervention between two measurement campaigns would render a low cross-correlation, as the before and after data sets would be significantly different. Figures 3 and 4 show the alignment of the signals before and after the two alignment steps, respectively. After aligning the signals, unique features can be identified in the data set, with the aim of using such features to classify any degradation of the track geometry parameters over time. SSA, being a proven and robust method for time series decomposition, was chosen as the method for time series feature extraction.
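A sketch of this two-step alignment, assuming two equally sampled 200 m profile blocks and the per-marker haversine offsets (in metres) from Sect. 2.2; function and variable names are illustrative, and the sign conventions would need to be verified against the actual data:

```python
import numpy as np

def align_block(block_a, block_b, marker_offsets_m, sample_dx=0.25, direction=1):
    """Two-step alignment of one 200 m data block (illustrative sketch).

    Step 1 (coarse): shift block_b by the median of the haversine offsets
    between distance-matched GPS fixes; the median is used rather than the
    mean to disregard outliers. The haversine distance is unsigned, so the
    shift direction must be supplied separately.
    Step 2 (fine): refine with the lag that maximises the cross-correlation.
    Returns the total shift, in samples, to apply to block_b.
    """
    coarse = direction * int(round(np.median(marker_offsets_m) / sample_dx))
    b = np.roll(block_b, coarse)  # np.roll wraps around; adequate for a sketch

    a0 = block_a - np.mean(block_a)
    b0 = b - np.mean(b)
    xcorr = np.correlate(a0, b0, mode="full")
    # Lag convention: positive means block_b is shifted forward relative to
    # block_a; this convention should be checked against the data's direction.
    fine = int(np.argmax(xcorr)) - (len(b0) - 1)
    return coarse + fine
```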


Fig. 3 Signal alignment before shifting data

Fig. 4 Signal alignment after final shift

2.4 Features Characterising the Evolution of a Double Slack in the Track Geometry The degradation of a double slack, as evident in the profile data (see Fig. 4) for a given 200 m section of track, was tracked over four measurement campaigns to understand which features in the latent space can be used to uniquely classify the event. Between the fourth and fifth measurement campaigns, the section of track was maintained and brought within the construction standard for profile. SSA was employed to decompose the trajectory matrix, formed by a sequence of lagged vectors of the time series, into its elementary matrices. The visual appearance of the elementary matrices hints at the nature of each component, be it periodicity, trend or noise, and can be used to understand the features of the time series. To further understand how the features contained in the set of elementary matrices change between two track geometry measurement campaigns, the Pearson correlation coefficient was calculated for each [i, j] pair of the two sets of elementary matrices representing two follow-up measurement campaigns and presented as a 20 × 20 confusion matrix. If the two sets of elementary matrices are identical, one would expect correlation coefficients of one on the diagonal of the matrix and off-diagonal terms of zero, since each set of elementary matrices is orthogonal.
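The paper does not list its SSA implementation; the following minimal sketch (the window length and component count are assumptions) decomposes a series into elementary matrices via the SVD of the trajectory matrix and computes the pairwise Pearson correlations plotted in Figs. 5-9:

```python
import numpy as np

def ssa_elementary_matrices(x, window, n_components=20):
    """SSA decomposition of a time series into elementary matrices.

    Builds the trajectory (Hankel) matrix from lagged vectors of x, takes
    its SVD, and returns the rank-one elementary matrices s_i * u_i v_i^T
    whose sum reconstructs the trajectory matrix.
    """
    x = np.asarray(x, dtype=float)
    K = len(x) - window + 1
    traj = np.column_stack([x[i:i + window] for i in range(K)])
    U, s, Vt = np.linalg.svd(traj, full_matrices=False)
    # The relative contribution of the ith elementary matrix (cf. Fig. 8)
    # is proportional to the squared singular value s_i**2 / sum(s**2).
    r = min(n_components, len(s))
    return [s[i] * np.outer(U[:, i], Vt[i]) for i in range(r)]

def pairwise_correlation(mats_a, mats_b):
    """Pearson correlation for each [i, j] pair of flattened elementary
    matrices, i.e. the confusion matrices of Figs. 5-9."""
    n = min(len(mats_a), len(mats_b))
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            C[i, j] = np.corrcoef(mats_a[i].ravel(), mats_b[j].ravel())[0, 1]
    return C
```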


Fig. 5 Confusion matrix (1–2)

If one plots the pairwise cross-correlation for the flattened matrices representing the two sets of elementary matrices computed from the trajectory matrices of two subsequent measurement campaigns, it is possible to understand which elementary matrices capture the change in the profile geometry between the subsequent measurement runs, if any. Figure 5 represents the cross-correlation between the elementary matrices representing the trajectory space of the first two time series. Here the first five elementary matrices of the two data sets are well correlated, showing that the features captured by those elementary matrices remain unchanged. The 5th, 6th and 7th elementary matrices show a lower correlation and/or swopping. Swopping implies that the relative contributions of the elementary matrices to the trajectory matrix change. Generally, it was found that the first 13 elementary matrices represent unique features of the time series, whereas beyond this the elementary matrices represent noise in the original time series and are thus not expected to show a good correlation. Figure 6 represents the cross-correlation between the elementary matrices representing the trajectory space of the first and third time series. The first time series is used as the baseline for all comparisons. Here also the 5th, 6th, 7th and 8th elementary matrix pairs show a low correlation and/or swopping. This is also true when comparing the confusion matrix in Fig. 7, representing the elementary matrices of the trajectory space of the first and fourth time series. Figure 8 shows how the relative contribution of each elementary matrix to the trajectory matrix, computed for the first to the fourth time series, changes over time. Here the first two elementary matrices make the largest contribution to the trajectory matrix and hence the largest contribution to the original time series as well. It can also be seen that the relative contribution of the 4th, 5th, 6th and 7th elementary matrices to the trajectory matrix increases as the profile error representing


Fig. 6 Confusion matrix (1–3)

Fig. 7 Confusion matrix (1–4)

a double slack in the track becomes larger. The evolution of the profile error is contained in the two time series presented in Figs. 3 and 4. Figure 9 represents the cross-correlation between the elementary matrices representing the trajectory space of the first and fifth time series. Between the fourth and fifth measurement campaigns, the track geometry was restored to the A-standard through a maintenance intervention. Here the two series are significantly different, resulting in a scattered confusion matrix when comparing the two sets of elementary matrices representing the trajectory space of each time series.


Fig. 8 Relative contribution of the ith elementary matrix to trajectory matrix

Fig. 9 Confusion matrix (1–5)

Figure 10 visualises the 4th to 7th elementary matrices of the decomposition of the first and fourth time series to assist in interpreting the swopping of the elementary matrices in Fig. 7.

2.5 Stretching of Time Series Data Stretching of one time series relative to another representing the same 200 m section of track is especially problematic when trying to shift the time series to achieve the best cross-correlation. Advanced signal processing techniques are available to improve signal alignment [4, 5]. However, such techniques were not considered for the current analysis, as the main aim of this article is to understand how the stretching

Fig. 10 Elementary matrices 4–7: a Time series 1, b Time series 4


of one signal relative to another impairs the cross-correlation between the elementary matrices representing the trajectory space of each time series. Here the stretch was introduced artificially to a base time series signal, time series 1 as referred to in other parts of the paper. Stretches of 0.2%, 0.4% and 1% were introduced, as typically found when computing the gradients of the haversine distance shown in Fig. 1, which represents the relative change in distance between two data sets based on the GPS coordinates of points having the same linear distance in the data set. When a time series is stretched, the wavelengths of its components also become increasingly stretched towards the end of the signal. It is thus expected that the cross-correlation between the base signal and the stretched signal (or between the elementary matrices representing the trajectory space of each time series) degrades as the stretch gets larger. If the elementary matrices, or the pairwise correlation coefficients between two sets of elementary matrices, are to be used to identify features that uniquely classify changes in the condition of the track, it becomes increasingly important that any stretch that might exist in the original signals does not degrade the information that can be extracted from the latent space representation of the original time series. Figure 11 represents the cross-correlation between the elementary matrices representing the trajectory space of the baseline and resampled time series with a 0.2% stretch introduced. Evident from Fig. 11 is that for a 0.2% stretch, the cross-correlation between the first 13 elementary matrices (representing important features of the time series) remains high. Figure 12 shows that for a stretch of 0.4% the cross-correlation reduces significantly, whilst Fig. 13 shows that a stretch of 1.0% even introduces swopping of elementary matrices in the confusion matrix. Any stretch in the signals might thus hamper the task of extracting meaningful features from the time series signals when looking for features that indicate a change in the track condition.
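The paper does not state how the artificial stretch was implemented; a simple possibility is resampling by linear interpolation, as sketched below (the interpolation scheme and all names are assumptions):

```python
import numpy as np

def stretch_signal(x, stretch_pct):
    """Stretch x by stretch_pct percent and resample it onto the original
    sample grid by linear interpolation. The sample at index i is moved to
    i * (1 + stretch_pct / 100), so displacements accumulate towards the
    end of the signal, as described in the text."""
    x = np.asarray(x, dtype=float)
    grid = np.arange(len(x), dtype=float)
    stretched_positions = grid * (1.0 + stretch_pct / 100.0)
    return np.interp(grid, stretched_positions, x)

# Illustrative use with the SSA helpers sketched in Sect. 2.4 (the window
# length is an assumption, not stated in the paper):
# window = 100
# base_mats = ssa_elementary_matrices(time_series_1, window)
# for pct in (0.2, 0.4, 1.0):
#     stretched = stretch_signal(time_series_1, pct)
#     C = pairwise_correlation(base_mats,
#                              ssa_elementary_matrices(stretched, window))
```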

Fig. 11 Confusion matrix (0.2% stretch)

Fig. 12 Confusion matrix (0.4% stretch)


Fig. 13 Confusion matrix (1% stretch)

3 Conclusion Data sets recorded during different measurement runs of the track geometry vehicle require a pre-processing step to ensure accurate alignment of the signals. The available GPS positioning data allows for an initial alignment of the signals; the initial offset between two signals can be in excess of 40 m. If the track geometry has not drastically improved or degraded between two measurement runs, the maximum cross-correlation between the two data sets provides the final alignment. Going forward, the data shift of each previous data block needs to be considered when shifting the current data block in instances where the cross-correlation is poor. When a large amount of stretching is present in one data set with respect to another, the pairwise cross-correlation computed for the two sets of flattened elementary matrices degrades quickly. If the elementary matrices, or the pairwise correlation coefficients between two sets of elementary matrices, are to be used to identify features that uniquely classify changes in the condition of the track, it is important that any stretch present in the original signals does not degrade the information that can be extracted from the latent space representation of the original time series. When minimal stretching is present in the data, the pairwise cross-correlation of the flattened elementary matrices computed from the trajectory matrices of two subsequent measurement campaigns creates a latent space that looks promising for identifying changes in the profile as recorded by a track geometry car. Going forward, the effect of more advanced signal alignment strategies will also be evaluated when considering the ability of unsupervised statistical, machine, and deep learning strategies to extract information from stretched track geometry signals.


References
1. Wilke DN, Fourie DJ, Gräbe H (2023) Towards a railway digital twin framework for African railway lifecycle management. Paper submitted to the 3rd ACIS international conference on artificial intelligence (IAI-2023)
2. Golyandina N, Nekrutkin V, Zhigljavsky A (2001) Analysis of time series structure: SSA and related techniques. Taylor & Francis
3. Weston P, Roberts R, Yeo G, Stewart E (2015) Perspectives on railway track geometry condition monitoring from in-service railway vehicles. Veh Syst Dyn 53(7):1063–1091
4. Eklöf K, Nwichi-Holdsworth A, Eklöf J (2021) Novel algorithm for mutual alignment of railway track geometry measurements. Transp Res Rec 2675(12):995–1004
5. Khosravi M, Soleimanmeigouni I, Ahmadi A, Nissen A (2021) Reducing the positional errors of railway track geometry using alignment methods: a comparative study. Measurement 178
6. Van der Merwe G, Zaayman LC, Venter PB (2011) IM2000 infrastructure measuring car: the application of recording results. Rail Engineering International Edition 2002 Number 4

Wheel Damage Prediction Using Wayside Detector Data for a Cross-Border Operating Fleet with Irregular Detector Passage Patterns Johan Öhman, Wolfgang Birk, and Jesper Westerberg

Abstract Wheel damages on railway vehicles caused by rolling contact fatigue or blocked wheels can cause severe problems for railway operators and infrastructure owners. Wheel impact load wayside detectors (WILD) are one of the means to assess the condition of a wheel in operation, but varying operating routes, irregular traffic patterns, and especially cross-border operations make this quite challenging. Condition updates occur irregularly, and the detectors themselves are managed by different owners according to different principles. Thus, using the same type of data from not only different wayside locations but also different providers and authorities with varying fidelity and operational practices introduces uncertainties in data quality and consistency. This paper presents an approach for predicting wheel damage severity on a wagon fleet with irregular cross-border operations, achieving similar confidence levels as for regular traffic patterns on a national scale. The different sensor characteristics are explored between countries and within each country. The approach is implemented as a cloud-based solution which integrates wayside detector data from multiple locations provided by two different infrastructure owners in two countries. The solution estimates remaining useful life based on data from both countries and aggregates this into a single indication for the decision maker. The algorithm's performance is showcased for vehicles with cross-border operations. The results indicate that irregularly provided measurement data with data quality and consistency issues are manageable and that adequate decision-making performance can be achieved. Keywords Wheel damages · Railway · Condition-based maintenance · Predictive maintenance · Wayside detectors · Data fusion · Statistical fusion
J. Öhman (B) · W. Birk · J. Westerberg Predge AB, 97236 Luleå, Sweden e-mail: [email protected]
W. Birk e-mail: [email protected]
J. Westerberg e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_36


1 Introduction The efficiency of railway operations depends on a well-working infrastructure and a minimal number of disruptions to operations. Wheel damages are one of the factors that can cause harm to the infrastructure, derailments, stops on the line and acute shunting of wagons in operation. These damages are often caused by rolling contact fatigue or blocked wheels. Wayside detectors measuring the impact forces between rail and wheel aid in determining damage severity during passage in normal operation on a main line, as pointed out by Alemi et al. [1], who also indicate that further research needs to be conducted to consider multiple detector stations. Clearly, if such detectors are distributed in the infrastructure, they can be used to capture the growth in forces for individual wheels and to characterise growth patterns, enabling condition-based and predictive maintenance approaches, as shown in Zeng et al. [2]. High detector coverage on the operating routes would then be required, but irregular traffic patterns and especially cross-border operations are challenging boundary conditions. Integrating and synchronising different data types from multiple sources, and the same type of wayside data from multiple locations, are common tasks today, as shown by Birk et al. [3] and Mohammadi et al. [4]. But using the same type of data from not only different wayside locations but also different providers and authorities with varying fidelity and operational practices introduces data quality and consistency uncertainties, which is also recognised by Mohammadi et al. [4]. Subsequently, the design of automated decision-making algorithms achieving consistent confidence levels becomes a more complex task. In this study, data fusion from multiple wayside detector stations located in two different countries measuring wheel impact forces is used to detect damages and predict wheel damage severities. While this is in line with the proposed research tracks in Alemi et al. [1], there are numerous challenges that need to be considered, which are shortly summarised by Karim et al. [5] and also discussed by Mohammadi et al. [4]. In previous research, there is a consensus that the wheel-to-rail impact forces due to wheel defects depend on passage speed and the severity of the damages, as shown in studies based on actual field tests as well as simulations of theoretical models, Maglio et al. [6] and Pieringer et al. [7]. Therein, and as claimed by Dong and Sankar [8] and by Steenbergen [9], the shape and size of the wheel defects, axle load, train speed, and contact patch stiffness are the factors that influence impact loads the most. Further, the impact of ambient conditions is investigated and reported by Olofsson and Sundvall [10]. Moreover, Kalay et al. [11] investigated impact loads from wheel flats as a function of train speed in the interval 30–100 km/h. The results show a slight increase in peak impact force with increased speed for short (25–40 mm) and long (75–100 mm) wheel flats. The results in Bogdevicius et al. [12] also show that, for all wheel flat lengths, the increase in impact force slows down at speeds over 80 km/h, which is in contradiction to the field tests made by Nielsen and Johansson [13], where a wheel flat of length


100 mm and depth 0.9 mm showed a negative correlation between impact force and speed in the investigated speed interval of 30–100 km/h. A further challenge for multiple detector station fusion is the local character of a damage, and the fact that there can be multiple damages with varying causes present both along the circumference and laterally, meaning that a damage might not show consistently increased wheel impact forces at all detector stations. To mitigate the effect of the above challenges, an analytics scheme is suggested that can compensate for the above-mentioned effects and harmonises the detector data prior to the data fusion, making the subsequent predictive maintenance algorithms resilient to fluctuations in the reported wheel impact forces and able to detect wheel damages and forecast future damage severities. The paper is organised as follows. First, the problem is formally outlined, and the measurement characteristics are discussed in the following section. The analytics approach to harmonise and fuse the data is then presented. Thereafter, the case study is presented, where the approach is applied and compared with the naïve approach of using the data without harmonisation. The paper ends with some conclusions and an outlook on future work.

2 Problem Outline A train operator is operating a cross-border fleet in two countries: Sweden and Norway. During regular operation, the fleet passes several wheel impact load detectors (WILD) [14]. These sensors measure the force from the wheels on the rail during the passage. The force is measured indirectly via the strain in the rail between sleepers, Stewart et al. [15]. The result from a passage is the mean force and the peak force for each wheel. Denote the measurement of a wheel $w_i$ at location $j$ and time $t$ with

$$z_{i,\mathrm{PEAK}}^{j}(t) = f\left(t : w_i, \theta_j\right), \qquad (1)$$

$$z_{i,\mathrm{MEAN}}^{j}(t) = f\left(t : w_i, \theta_j\right), \qquad (2)$$

where $\theta_j$ are the location-dependent parameters. The measurement function $f$ is stochastic, so the same wheel state and parameters will generally not reproduce the same measurement. The time $t$ is the large-scale time, typically in hours. The difference in the sampling parameters depends on the WILD type, its calibration, and the track's location. For instance, there will be a difference if the WILD is placed on a straight track or in conjunction with a turn. In most cases, the WILD detectors are placed on straight track. The speed also affects the detected force level, Nielsen and Johansson [13], Pieringer et al. [7]. However, the detections at most locations are made at velocities between 80 and 90 km/h. Since the differences between sites are minor, the effect is marginal. There are two different governing agencies in the two countries, Trafikverket in Sweden and Bane NOR in Norway.


These agencies are responsible for installing, calibrating and operating the detectors within each country. To an external party, the calibration and maintenance actions are unknown, and we do not know the status of the detectors. Another aspect that can vary between measurement locations is the WILD's measurement technique. Common configurations are based either on strain gauges or on fibre optics [16]. We assume that there are differences between detectors in the two countries and differences between individual detectors within each country. Historically, the processing for each country has been done individually. Joint processing is preferable, since all data influence decision-making in conjunction. The mean value mainly depends on the wagon load, while the peak value depends on the wheel and loading conditions. It is preferable if the measurements are independent of the loading conditions. The dynamic force is calculated as the difference between the peak and mean force,

$$z_{i,\mathrm{DYN}}^{j}(t) = z_{i,\mathrm{PEAK}}^{j}(t) - z_{i,\mathrm{MEAN}}^{j}(t), \qquad (3)$$

making it independent of the loading conditions. For the rest of the paper, we will exclusively use the dynamic force and drop the DYN notation from the variable. Given a set of historic force measurements from different locations, we would like to estimate the current force level and predict the time until maintenance is required. The different characteristics of the different locations will influence the outcome of the prediction. Predge's algorithm estimates the current force level and predicts future trends. Denote this algorithm function by g. Then the processing is described by

$$\left(\hat{z}_i, h\right) = g\left(z_i^{j}(t_0),\, z_i^{j}(t_{-1}),\, z_i^{j}(t_{-2})\right), \qquad (4)$$

where $\hat{z}_i$ is the estimated current force level, and h is the time estimate until a maintenance action is required. Note that i is constant, but j varies in the equation above. Different locations may have different characteristics; these differences are further explored in the next section. The algorithm is proprietary, and we cannot disclose the details. Its performance was evaluated by Birk et al. [3], who concluded that the analytics scheme achieves a detection capability of nearly 90% with a planning horizon of at least 12 h.
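As a small illustration of the measurement model of Eqs. (1)-(3), a detection record and the dynamic force could be represented as follows (all field names are illustrative placeholders, not from the described WILD systems):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One WILD passage record for a single wheel (field names illustrative)."""
    wheel_id: str
    location: str      # detector site, e.g. a site code in Sweden or Norway
    timestamp: float   # large-scale time, e.g. hours
    peak_kn: float     # z_PEAK, Eq. (1)
    mean_kn: float     # z_MEAN, Eq. (2)

def dynamic_force(d: Detection) -> float:
    """Eq. (3): dynamic force, independent of the wagon loading conditions."""
    return d.peak_kn - d.mean_kn
```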

3 Measurement Characteristics If all measurements in both countries are included, we can look at the group statistics of both countries. We limit the analysis to one wagon type, Sggmrs. The data is from the time period 2022-01-01 to 2022-04-30 and includes wagons in both Sweden and Norway. In total, 396 wagons and 263,548 detections are included in the analysis. We start by comparing the group statistics of the two countries. Figure 1 shows the distributions of detections in Sweden and Norway. There is a skewed distribution


in both countries, where lower force detections are most common. This behaviour is expected, since the wheels operate correctly for longer periods than in a damaged state. The mode of the Norwegian data is lower than in Sweden, but in Norway more data is located in the right tail. The low values are, in reality, uninteresting as long as they are below approximately 25 kN; it is the data in the right tails that is of interest for estimating the wheel condition. It is noted that there are some minor differences between countries for values above 25 kN. If the data is further split into measurement locations in each country, as depicted in parts (b) and (c) of Fig. 1, it is clear that the variations within a country are equal to or greater than the variations between countries. In Sweden, two locations deviate from the majority. One of the locations also deviates in the shape of the pdf, having a more symmetric and Gaussian-like profile. There are only five WILD detectors in Norway on the operated routes. Two deviate from the others, as shown in part (c) of Fig. 1. The data shows that differences within each country are already on the same scale as those between countries. We will approach the problem in a twofold manner:

1. Since differences already exist within the countries, joint processing could be done by running all data without any correction.
2. Normalise the data based on the group statistics of each location before performing the estimation using Eq. (4).

The second approach requires the assumption that the difference between the measurement locations depends only on the location's characteristics, not on any external source. Using normalisation, the data is harmonised into a joint representation. From the joint representation, the data can be transformed into the configuration of a specific location.

3.1 Data Normalisation From the characteristics shown in Fig. 1, the detections at a single location can be assumed to follow a log-normal distribution, parametrised by the mean μ and standard deviation σ. The following pdf models the data for each location:

$$f(z) = \frac{1}{\sigma z \sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}\left(\ln(z) - \mu\right)^2\right), \qquad (5)$$

where the distribution is given in a logarithmic scale. Using this parametrisation, the location parameters $\theta_j = \left(\mu_j, \sigma_j\right)$ describe the distribution of each location. These parameters are calculated for each location by minimising the negative log-likelihood function. Given a detection and the location parameters, the value can be transformed into a standard normal N(0, 1) distribution in the logarithmic scale using

$$\widehat{\log z_j} = \frac{\log z_j - \mu}{\sigma}. \qquad (6)$$

Fig. 1 Distributions of detections in Sweden and Norway. Part a shows the histogram with the data divided into two groups. Parts b and c show the data further split into each country's locations


If one location is chosen as the reference, the detections from other locations can be mapped to that location's characteristics using

$$\hat{z}_j = \exp\left(\frac{\log z_j - \mu}{\sigma}\,\sigma_{\mathrm{ref}} + \mu_{\mathrm{ref}}\right), \qquad (7)$$

where $\sigma_{\mathrm{ref}}$ and $\mu_{\mathrm{ref}}$ are the parameters at the reference location.
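A compact sketch of the normalisation pipeline of Eqs. (5)-(7); for a log-normal distribution, minimising the negative log-likelihood reduces to taking the mean and standard deviation of the log-transformed detections (the function names are illustrative, not from the paper):

```python
import numpy as np

def fit_location_params(forces):
    """MLE of the log-normal parameters (mu, sigma) for one location, Eq. (5).

    For a log-normal distribution, the maximum-likelihood estimates are the
    sample mean and standard deviation of the log-transformed data, which is
    equivalent to minimising the negative log-likelihood.
    """
    logs = np.log(np.asarray(forces, dtype=float))
    return logs.mean(), logs.std()

def normalise(force, mu, sigma):
    """Eq. (6): map a detection to a standard normal score in log scale."""
    return (np.log(force) - mu) / sigma

def map_to_reference(force, mu, sigma, mu_ref, sigma_ref):
    """Eq. (7): express a detection in the reference location's characteristics."""
    return np.exp(normalise(force, mu, sigma) * sigma_ref + mu_ref)
```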

4 Case Study To showcase the different processing methods suggested, data for three different wheels are processed in four different ways:

1. Using only the Swedish data.
2. Using only the Norwegian data.
3. Using the data from both countries.
4. Using the normalised data from both countries.

For all four cases, the data is processed using Eq. (4). The result is an estimated force level and potential warnings. Depending on the severity, the warnings are on two different levels, yellow and orange. Figure 2 shows the result for the first three processing methods. Each row is a wheel, and each column is a processing method. The first column uses only Norwegian data, the centre column uses only Swedish data, and the right column uses both. The solid black line is the estimated current force level, representative of the damage severity. There is no apparent damage to the first wheel on the first row, as can be seen from the decreasing force. An incorrect warning is generated in the processing using the Norwegian and the combined data due to the consistent moderate-force measurements. This warning is later removed when the Swedish data is fed to the algorithm. Naturally, there is no incorrect warning when using only Swedish data, since the warning is driven solely by the Norwegian data. For the second wheel on the second row, there are two sections with increasing force levels. For the first increase, there is data from both Norway and Sweden. However, the increase is only present in the Swedish data. So, in part (d), there is no warning in the processing of only Norwegian data, and the estimated force level is low. An orange warning is raised correctly in both the Swedish and the joint processing. After the first increase, the vehicle travelled to Norway, and the damage was corrected. In the joint processing, the estimated force level decreases. Since this period only contains Norwegian data, the Swedish processing does not register this decrease and stays at a high level. About one month after the first increase, there is a second increase. This time, the Norwegian data also shows an increasing force level. However, Norway has fewer measurements, and the estimated force level increases slowly. The Swedish processing never noticed the decrease between the two peaks and continues to report a high force level and an orange warning.


Fig. 2 Example processing of three different vehicles. Each row is a vehicle and each column is a processing method. Parts a–c are one vehicle processed in three different ways, parts d–f are the second vehicle and parts g–i are the third vehicle

The final wheel has a gradual transition starting in Norway with an increase from low force levels. When it enters Sweden around the fifteenth of April, there is an increase in force. The Norwegian processing does not have these measurements and estimates a lower value. The Swedish processing has no prior information on the wheel, and the high measurements cause the estimated force to jump. The combined processing yields a smooth transition from Norwegian to Swedish data in this case. The final part of this case study is data normalisation, as described in Sect. 3.1. Before Eq. (4) estimates the wheel state, the data is normalised using Eq. (7). The processing is evaluated for the same three wheels but only with the joint processing. Figure 3 shows the results of this processing. This figure should be compared to the right column of Fig. 2. The values for the first wheel in part (a) are drastically lower than in the corresponding non-normalised case in Fig. 2c. The value reduction is due to a


high mode and standard deviation of the Norwegian data. With the normalisation, the values are more in line with the Swedish results. On this data, there are no warnings, which is the desired outcome. The changes are minor for parts (b) and (c) compared to the corresponding data in Fig. 2. In part (b), a warning is raised correctly for the first increase, while the second is missed. In part (c), the wheel behaves similarly for the normalised and non-normalised data.

5 Discussion and Conclusions Although most warnings are captured with the Swedish or Norwegian processing alone, joint processing has advantages. If a damage is initiated in one country and the vehicle travels to the other, it will appear as a step-function-like increase, triggering a more severe response. This effect is seen in parts (g) and (h) of Fig. 2. The joint processing in part (i) offers a smoother transition between the data in the two countries. The joint processing seems to have no inherent drawback, since the complete data better reflects the wheel state. This is likely due to the already existing differences in location characteristics within each country, as depicted in Fig. 1. The normalised data in Fig. 3 offers some additional benefits. The noisy data in the first row of Fig. 2 comes from a detection site with high variance in the detections. This high variance is compensated for in Fig. 3a, and no incorrect alarm triggers. It is important to note that all algorithm processing has been made with the same parameter settings. Future work will include finding optimal parameters for each processing method. Unoptimised parameters could be a reason for missing the second increase in Fig. 3b. This paper only showcases the behaviour of three wheels. The wheels chosen are representative of the data in the two countries. However, a larger dataset with associated evaluation metrics is required to fully benchmark the performance gain from the joint and normalised processing. Using normalised data comes with a risk. The data may represent the wheel state better, but the infrastructure owner will still act on high measurements regardless of the group statistics. Therefore, all stakeholders must know what is being done and what risk is involved.

Fig. 3 Processing of normalised data for the same vehicles as in Fig. 2. Part (a) corresponds to part (c) in Fig. 2, part (b) corresponds to part (f) in Fig. 2, and part (c) corresponds to part (i) in Fig. 2


Acknowledgements The authors thank Cargonet for providing the data for the analysis.

References
1. Alemi A, Corman F, Lodewijks G (2017) Condition monitoring approaches for the detection of railway wheel defects. Proc Inst Mech Eng, Part F: J Rail Rapid Transit 231(8):961–981
2. Zeng Y, Song D, Zhang W, Hu J, Zhou B, Xie M (2021) Physics-based data-driven interpretation and prediction of rolling contact fatigue damage on high-speed train wheels. Wear 484:203993
3. Birk W, Dittman T, Karim R, Westerberg J (2019) Experiences from the detection and prediction of wheel damages on railway vehicles in operation. In: Proceedings of the 13th international heavy haul association STS conference, Narvik
4. Mohammadi M, Mosleh A, Vale C, Ribeiro D, Montenegro P, Meixedo A (2023) An unsupervised learning approach for wayside train wheel flat detection. Sensors 23(4):1910
5. Karim R, Birk W, Larsson-Kråik PO (2015) Cloud-based maintenance solutions for condition-based maintenance of wheels in heavy haul operation. In: Proceedings of the 11th international heavy haul association conference, Perth, 21–24 June 2015. International Heavy Haul Association
6. Maglio M, Vernersson T, Nielsen JC, Pieringer A, Söderström P, Regazzi D, Cervello S (2022) Railway wheel tread damage and axle bending stress–instrumented wheelset measurements and numerical simulations. Int J Rail Transp 10(3):275–297
7. Pieringer A, Kropp W, Nielsen JC (2014) The influence of contact modelling on simulated wheel/rail interaction due to wheel flats. Wear 314(1–2):273–281
8. Dong RG, Sankar S (1994) The characteristics of impact loads due to wheel tread defects. RTD Rail Transp ASME 8:23–30
9. Steenbergen MJMM (2007) The role of the contact geometry in wheel-rail impact due to wheel flats. Veh Syst Dyn 45(12):1097–1116
10. Olofsson U, Sundvall K (2004) Influence of leaf, humidity and applied lubrication on friction in the wheel-rail contact: pin-on-disc experiments. Proc Inst Mech Eng, Part F: J Rail Rapid Transit 218(3):235–242
11. Kalay S, Tajaddini A, Reinschmidt A, Guins A (1995) Development of performance-based wheel-removal criteria for North American Railroads. In: Proceedings of the 11th international wheelset congress, pp 227–233
12. Bogdevicius M, Zygiene R, Bureika G, Dailydka S (2016) An analytical mathematical method for calculation of the dynamic wheel–rail impact force caused by wheel flat. Veh Syst Dyn 54(5):689–705
13. Johansson A, Nielsen JO (2000) Out-of-round railway wheels—wheel-rail contact forces and track response derived from field tests and numerical simulations. Proc Inst Mech Eng, Part F: J Rail Rapid Transit 217:135–146
14. Swedish Transport Administration (2013) BVF 592.11—Detektorer. Hantering av larm från stationära detektorer samt åtgärder efter upptäckta skador vid manuell avsyning. (TDOK 2014:0689)
15. Stewart M, Flynn E, Marquis BP (2019) An implementation guide for wayside detector systems. U.S. Department of Transportation
16. Kouroussis G, Caucheteur C, Kinet D, Alexandrou G, Verlinden O, Moeyaert V (2015) Review of trackside monitoring solutions: from strain gages to optical fibre sensors. Sensors (Basel, Switzerland) 15(8):20115–20139

Predictive Maintenance and Operations in Railway Systems Antonio R. Andrade

Abstract The present paper explores the current need for a predictive model for Maintenance and Operations in Railway Systems that tackles the challenges of vertical separation. Railway Vehicles and Track (V-T) systems are responsible for large investment and maintenance costs, which should be optimised using a reliability-based Maintenance and Operation (M&O) decision model. The European railways face vertical separation, adding further complexity to M&O: while Train Operating Companies (TOCs) are maintaining their trains, track maintenance decisions are made by the Infrastructure Manager (IM). However, in this vertically separated system, no clear decision model seems to be in place to optimise the overall life-cycle impacts of M&O decisions across the different railway agents. Therefore, a Collaborative Decision Model (CDM) is missing to align predictive M&O decisions. TOCs and IMs are monitoring the evolution of their own assets and using sensor systems and signal processing techniques to identify and predict specific failures and support their M&O strategies in separate decision models. These M&O strategies, which very often have conflicting objectives, may lead to sub-optimal overall life-cycle impacts. In fact, V-T systems have relevant joint degradation behaviour, which significantly affects wear and damage of wheelsets and rails, as well as the life-cycle costs, reliability, availability and safety of the overall railway system. Thus, misalignments in M&O decisions can be reduced by using cooperative strategies between TOCs and the IM. The current project PMO-RAIL will contribute towards an innovative reformulation of railway M&O problems, aiming to achieve a proof-of-concept that such a CDM framework to support PM&O scheduling decisions can provide better overall life-cycle impacts. Keywords Railway maintenance · Railway systems · Vehicle-track interaction · Reliability

A. R. Andrade (B) IDMEC, Instituto Superior Técnico, University of Lisbon, Av. Rovisco Pais 1, 1049-001 Lisbon, Portugal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_37


1 Introduction 2021 was the European Year of Rail, with a focus on making railways the most sustainable, innovative and safest transport mode in Europe. Railway Vehicles and Track (V-T) systems are responsible for large investment and maintenance costs. These costs should be optimised using a reliability-based Maintenance and Operation (M&O) decision model from a life-cycle perspective. Nowadays, several sensors installed in the V-T are capturing their real-time performance and feeding data-driven and physics-based models to support M&O decisions. Intelligent Maintenance Systems have been contributing towards Shift2Rail/Europe's Rail and Green Deal goals. However, the European railways face a challenge: vertical separation, which adds further complexity to M&O in an already highly interdependent system. Whilst Train Operating Companies (TOCs) are maintaining their trains and running them on the same railway track, track maintenance decisions are made by a separate entity: the Infrastructure Manager (IM). However, in this vertically separated V-T system, no clear decision model seems to be in place to optimise the overall life-cycle impacts of M&O decisions across the different railway agents. Therefore, a Collaborative Decision Model (CDM) is missing to align predictive M&O decisions. Currently, TOCs and IMs are monitoring the evolution of their own assets and using sensor systems and signal processing techniques to identify and predict specific failures and support their M&O strategies in separate decision models. These M&O strategies, which very often have conflicting objectives, may lead to sub-optimal overall life-cycle impacts. In fact, V-T systems have relevant joint degradation behaviour (e.g. wheel-rail contact), which significantly affects wear and damage of wheelsets and rails, as well as the life-cycle costs, reliability, availability and safety of the overall railway system. Thus, misalignments in M&O decisions can be reduced by using cooperative strategies between TOCs and the IM, possibly with regulation and pricing mechanisms, to lead railway agents towards better solutions. Finally, though the perspective of the final user (passenger or freight) is essential in operations, it is very often missing in maintenance or is simplified by assuming unavailability costs or other contract mechanisms. For example, when a railway disruption/delay occurs (possibly due to a failure of a subsystem of a TOC and/or the IM), cooperation (rather than competition) between TOCs and the IM in re-scheduling operations may bring better impacts for users and taxpayers. The current project PMO-RAIL will contribute towards an innovative reformulation of railway M&O problems, aiming to achieve a proof-of-concept that such a CDM framework to support PM&O scheduling decisions can provide better overall life-cycle impacts. A first task will focus on two related PM decision problems where the IM and TOCs can collaborate:

(i) track tamping and rail grinding;
(ii) wheelset turning.


Although these responsibilities are clearly assigned, the first to the IM and the second to each TOC, a collaborative maintenance strategy can align their predictive maintenance strategies. Moreover, as different trains from possibly different TOCs run over the same track, simulating the Vehicle-Track (V-T) Interaction will provide essential information on the energy in the wheel-rail contact and the expected wear and damage of rail and wheel profiles to predict their evolution. PM will also require robust scheduling modelling techniques, especially if the impacts on railway traffic and passenger choices are quantified, such as the disturbances/disruptions due to unexpected failures (and associated corrective maintenance needs). Two additional tasks will develop predictive M&O models assuming the perspectives of different agents:

(i) the IM, i.e. infrastructure-oriented models;
(ii) cooperative and/or competitive TOCs, i.e. operator-oriented models.

The final users (passengers/freight), i.e., passenger-oriented models, are outside of the present exploratory project and are left for further research. These models will be illustrated using data from case studies of the Portuguese railway industry. Three innovations are expected:

(i) a method to derive digital twins of V-T systems and support predictive M&O decisions;
(ii) optimisation models for the PM&O problems, separately for each agent;
(iii) a joint formulation of the optimisation models for the IM and TOCs.

The introduction motivating the research of the PMO-RAIL project is in Sect. 1. Then, a literature review is presented in Sect. 2, and a discussion of the project plan and methods in Sect. 3. Finally, Sect. 4 presents the main conclusions and future steps.

2 Literature Review Predictive maintenance in railway systems still faces many challenges, namely in combining physics-based and data-driven models into hybrid approaches to predict failures and maintenance needs [1]. Vehicle-Track Interaction (VTI) models can iteratively assess, by using Finite-Element (FE) and Multibody System Dynamics (MSD) methods [2], the evolution of track irregularities and rail and wheel profiles, while integrating degradation laws and some associated uncertainties (e.g., track vertical stiffness and/or suspension properties) [3]. In rail grinding [4, 5], the most relevant defects are rolling contact fatigue and head checks, and the influencing factors are rail profile, curve radius and super-elevation. Several sources of information (axle box signals, video images, prior track condition) can be combined to support the IM in planning grinding [6]. Integrating Life-Cycle Cost (LCC) models in the reliability modelling of wear and damage is essential to compare maintenance strategies [7]. Markov Decision Processes (MDP) have been used to optimise maintenance decisions in wheelset


turning, defining the state spaces using kilometres since last turning, damage occurrence and diameter [8, 9]. Extending the state space by adding other variables/quantities to predict different failure modes is still a gap, though solving it might become computationally intractable (due to the curse of dimensionality). Prediction of wheel profile evolution due to wear [10] has also made it possible to predict maintenance needs and optimise wheel and rail profiles, coupling several tasks: VTI models, local Wheel-Rail (W-R) contact and local wear models. MSD models can be used with W-R contact models to compute contact points and forces, and use them in a wear law. W-R wear has been predicted by nonlinear autoregressive Neural Networks (NN) with exogenous inputs, where W-R profiles, loads, speed, yaw angle, and their first and second derivatives were inputs, and the wear of wheel and rail were outputs of the NN [11]. The whole-system LCC [12] has been assessed by simulating a VTI model to quantify the impacts of changes on vehicle and track systems, integrating both perspectives of the W-R interface: TOC (wheel in the vehicle) vs. IM (rail in the track). Wear laws can then update W-R profiles in VTI models [13, 14]. Squats can be automatically detected [15] using axle box signals and wavelet spectrum analysis, enabling their early detection. Wavelet Analysis (WA) of vertical accelerations has made it possible [16] to identify weaker sections and assess vehicle response using even a simple 2-DOF vehicle model. Recently, derailment risks have been assessed using a design of computer experiments [17, 18] based on WA of track irregularities [19]. We have also used different multivariate techniques [20] to assess reliability and availability risks and impacts using discrete simulation of failures [21] and expert judgement techniques [22]. Moreover, digital twins of a sort have been applied to vehicle design [23, 24], providing a new paradigm for fault detection, combining signal processing techniques for real-time monitoring and predictive failure and maintenance [23], while supporting sustainability and life-cycle management [24]. However, current PM approaches [25] still neglect the decision interactions between TOCs and the IM. Integrating maintenance scheduling within railway operations has also been explored [26–29], though maintenance and train services are usually planned separately. Integrating them has been proposed [28], though more research is needed on resource constraints, e.g., adding more flexibility to predictive maintenance tasks (and their uncertainty) [30]. Combining maintenance crew scheduling is also recent [26]. Scheduling IM maintenance within train services has also been proposed [27], by modelling competing trains and maintenance on the same line, though its extension to a general railway network is missing. IM maintenance has been modelled within railway traffic [29]. However, the integration of the decision-making interactions among TOCs and the IM in predictive maintenance scheduling within railway operations still needs further research [29]. Some initial steps in that direction have included modelling collaboration between competing TOCs [31], incorporating combinatorial auctions, but regulation and pricing mechanisms are still missing.
Recently, a game-theoretical approach to capacity allocation [32] explored conflicts between maximising passenger trains' utility and minimising freight trains' delays, finding desired equilibrium schedules while providing policy-relevant insight.


Besides the TOC’s and the IM’s perspectives, the passengers’ perspective in predictive M&O has not been fully addressed, though it is out of the scope of PMORAIL exploratory project. Train rescheduling models [33–36] to optimise traffic reliability and passenger satisfaction have been proposed [33], integrating competition/ cooperation modelling [34], while integrating real-time traffic management [35, 36]. The passenger-centric train timetabling problem was first introduced in [37], using a MILP formulation to maximise TOC profit and maintain a certain passenger satisfaction level. However, maintenance has not been integrated, and a better understanding of passenger behaviour in disruptions/delays is still needed [37–39], by estimating their choices through discrete choice models and inserting them in the MILP through piecewise linear approximations (to avoid a nonlinear formulation). Hopefully, the current project PMO-RAIL will develop a decision framework (using a Game-Engineering approach [40]) for predictive maintenance and operations in railways systems that is currently missing. It will run vehicle-track simulations to derive a digital twin to predict maintenance needs, and to schedule maintenance within normal operations. Optimisation of M&O will then be formulated from the perspective of each agent (IM, TOC). Passengers’ perspective will be integrated later in a comprehensive framework.

3 Project Plan and Methods PMO-RAIL aims to integrate predictive M&O problems to improve the overall life-cycle impacts of the vertically separated railway system by:

– using VTI simulation models to derive a digital twin and estimate predictive maintenance needs (or Remaining Useful Life (RUL)) for each subsystem (Sect. 3.1);
– formulating M&O decisions from the perspective of the Infrastructure Manager (IM) (Sect. 3.2);
– formulating M&O decisions from the perspective of the Train Operating Companies (TOCs) (Sect. 3.3).

In a nutshell, the current state of the art in railway maintenance has not been able to link railway degradation processes in V-T systems with passenger utility in railway operations. At first, such a relation might seem distant, as different layers exist between them, with different agents making several decisions that affect them (e.g., the IM and TOCs). Nevertheless, there is evidence of a lack of market-oriented strategies in railway maintenance, and a passenger-centric maintenance strategy is missing. To achieve that, the PMO-RAIL exploratory project will first explore the IM and TOC perspectives and interactions and leave the passengers' perspective for later. Therefore, to deal with the uncertainty associated with the degradation processes of several components of V-T systems, vehicle dynamic simulations will be run in VAMPIRE (or with MUBODYN) software to estimate the energy in the wheel-rail contact, safety and/or Key Performance Indicators (KPIs). These will be used to


Predictive Maintenance and Operations in Railway Systems

509

techniques to estimate what is the best action to make, given the condition and RUL for a component (or subsystem) of track and vehicle systems. State spaces will be defined based on geometrical and physical indicators (or based on KPI quantities). Moreover, the decision framework on M&O will be integrated using a step-by-step approach (Fig. 1). PMO-RAIL targets a proof-of-concept that a collaborative and/or cooperative decision-making framework, applied to railway maintenance and operations, can improve LCC and RAMS from the perspective of the different railway agents and society. Finally, PMO-RAIL aims to contribute to the development of a decision framework based on VTI models to derive the trade-offs in M&O and overall life-cycle costs between the different railway agents.
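To make the RL formulation concrete, the following is a minimal sketch of the linear-programming solution of a small discounted MDP for wheelset maintenance; the states, transition probabilities, costs and discount factor are invented placeholders, not the project's actual model:

```python
# Minimal sketch: solving a toy wheelset-maintenance MDP via the classical
# linear-programming formulation of discounted dynamic programming.
# All numbers below are illustrative assumptions, not project data.
import numpy as np
from scipy.optimize import linprog

gamma = 0.95                      # discount factor (assumed)
# States: 0 = good profile, 1 = worn, 2 = damaged (hypothetical)
# Actions: 0 = do nothing, 1 = turn (re-profile), 2 = renew wheelset
P = np.array([                    # P[a, s, s'] transition probabilities
    [[0.8, 0.15, 0.05], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],    # do nothing
    [[0.9, 0.1, 0.0], [0.85, 0.15, 0.0], [0.6, 0.4, 0.0]],    # turning
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],      # renewal
])
R = np.array([                    # R[a, s]: rewards = negative costs
    [0.0, -2.0, -20.0],           # degradation / derailment-risk costs
    [-1.5, -1.5, -5.0],           # turning cost (removes material)
    [-10.0, -10.0, -10.0],        # renewal cost
])
nS, nA = 3, 3

# LP: minimise sum_s v(s)  s.t.  v(s) >= R[a,s] + gamma * P[a,s] @ v  for all a
A_ub, b_ub = [], []
for a in range(nA):
    for s in range(nS):
        A_ub.append(-np.eye(nS)[s] + gamma * P[a, s])   # rearranged constraint
        b_ub.append(-R[a, s])
res = linprog(c=np.ones(nS), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * nS, method="highs")
v = res.x                          # optimal state values

# Greedy policy: best action given the current condition state
policy = [int(np.argmax([R[a, s] + gamma * P[a, s] @ v for a in range(nA)]))
          for s in range(nS)]
print("values:", np.round(v, 2), "policy:", policy)
```

In the same spirit, the project's models would replace the placeholder transition matrix with probabilities estimated from the survival models and the digital twin.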

Fig. 1 Example of a step-by-step approach in operator-oriented M&O scheduling


3.1 Predictive Maintenance in Railway V-T Systems

This first task will build a digital twin model to support Predictive Maintenance (PM) decisions in the railway vehicle and track (V-T) systems. It will mainly focus on predicting the maintenance needs associated with the following maintenance actions:
1. track tamping (and deterioration of railway track geometry defects or irregularities);
2. rail grinding (wear and damage);
3. wheelset turning (wear and damage).

Different case studies for these three main maintenance actions will be analysed with maintenance, inspection and operation data supplied by railway industrial partners. First, a Failure Mode, Effects and Criticality Analysis (FMECA) and a Reliability, Availability, Maintainability and Safety (RAMS) analysis will be applied to all railway V-T subsystems, by compiling previous frameworks and findings from Shift2Rail European projects (SMaRTE and LOCATE). By analysing the databases from industrial partners, we will conduct statistical modelling of railway track irregularities using a mixture of Wavelet Analysis, Vector Auto-Regressive models and multivariate statistical techniques, controlling for spatial correlations and cross-correlations between different railway track geometry defects/signals (e.g., vertical and horizontal alignments). In parallel, data analysis of rail corrugation and damage from ultrasonic inspection records will allow us to develop statistical survival models for the occurrence/evolution of rail damage and wear. The same procedure will be applied to wheelset wear and damage using lathe and inspection records. Statistical tests (using several Information Criteria) will be conducted to verify and compare different model specifications. Then, Vehicle-Track Interaction models (combined with degradation laws for the long-term behaviour of specific components) will be defined to assess the impact of track irregularities, rail corrugation and damage, wheel and rail profiles, primary and secondary suspensions and track stiffness on reliability- and safety-related quantities. Afterwards, we will define an Adaptive Design of Computer Experiments, using the statistical models to generate relevant inputs (possibly in the wavelet domain for railway irregularities, with most-penalising and minimum-entropy criteria), and to assess reliability- and safety-related outputs of V-T dynamic simulations (e.g., energy in the wheel-rail contact, Y/Q derailment coefficient and indicators of passenger comfort) using VAMPIRE or MUBODYN software. We will conduct post-processing analysis using Surface Response Methods or other statistical or machine learning techniques, to estimate a multivariate model that predicts the relevant V-T outputs due to the degraded condition of rails, wheelsets and track irregularities. This multivariate model will be the core of the digital twin. It will also be possible to update the digital twin by using sensor and/or inspection information, and to design and run another set of V-T simulations to estimate/calibrate the digital twin in other degraded conditions.
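As an illustration of the digital-twin core, the sketch below fits a quadratic response surface (in the spirit of Surface Response Methods) to stand-in simulation outputs; the toy function replaces real VAMPIRE/MUBODYN runs, and all input variables, ranges and coefficients are assumptions:

```python
# Minimal sketch of a response-surface surrogate fitted to simulated V-T
# outputs. The synthetic function below stands in for vehicle-track dynamic
# simulations; variables, ranges and coefficients are illustrative only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(0.5, 3.0, n),    # stdev of vertical track irregularity [mm]
    rng.uniform(0.0, 2.0, n),    # equivalent wheel wear depth [mm]
    rng.uniform(60, 200, n),     # vehicle speed [km/h]
])
# Stand-in for a simulated safety output (e.g., a Y/Q-like coefficient)
y = (0.15 + 0.05 * X[:, 0] ** 2 + 0.03 * X[:, 1] + 0.001 * X[:, 2]
     + rng.normal(0, 0.01, n))

# Quadratic response surface, as in classical Surface Response Methods
surrogate = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
surrogate.fit(X, y)

# The cheap surrogate can now replace full simulations inside optimisation loops
print(surrogate.predict([[2.0, 1.0, 120.0]]))
```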


3.2 Infrastructure-Oriented PMO

This second task will develop optimisation models to plan/schedule Predictive Maintenance and Operations (PM&O) in railway systems from the perspective of the Infrastructure Manager. These optimisation models will be defined through a Mixed-Integer Linear Programming (MILP) formulation (and solved using commercial solvers). All associated technical and economic constraints (e.g., budget constraints) will be considered in the MILP formulation. These infrastructure-oriented models will be tested (and verified) by applying them to illustrative (small-sized) instances from case studies of the Portuguese railway industry. These decision support models will schedule all infrastructure maintenance tasks, with a special focus on track tamping and rail grinding operations. They will aim to minimise the life-cycle costs/impacts, in particular the unavailability of the track system. Another objective will be to maximise capacity, reliability and safety. Later, other maintenance-related decisions will be included, and their associated decision support models will be formulated and integrated, such as: network design, planning of track renewals or upgrades, capacity expansion and speed restriction management, slot allocation, maintenance crew scheduling and inspection scheduling. All these optimisation models will be developed from a single-agent perspective, aiming to minimise life-cycle costs, safety impacts and delay/disruption impacts, and to maximise the IM's performance and revenues from operators through infrastructure pricing. Pricing mechanisms (set by the regulator) will also be considered towards the definition of a single objective function (to reduce model complexity). Ideally, sensitivity analysis will be conducted to assess the impacts of the inputs on the optimal value and the PM&O solution. Meta-heuristic approaches will also be explored to reduce computational time and find optimal or near-optimal solutions at a reasonable computational cost. An extensive analysis of the performance of such meta-heuristics will be conducted on illustrative examples and simulated instances (PM needs will be generated using the digital twin from Task 1). We will focus on track tamping and rail grinding maintenance, and derive (unconstrained) optimal maintenance strategies based on alert thresholds (e.g., Alert Limit, Intervention Limit and Immediate Action Limit) and Reinforcement Learning techniques (e.g., MDP). These thresholds will allow us to estimate the RULs and to trigger (prescriptive, preventive and corrective) maintenance needs in space and time across the IM network. The infrastructure-oriented model will schedule these activities within the train schedules, along with other infrastructure-oriented strategies that aim to minimise their unavailability impacts. Asset information from IM databases will be crucial to define realistic infrastructure-oriented PM&O models, and the technical and operational constraints needed to apply such models to the Portuguese case studies.
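A minimal sketch of such an infrastructure-oriented scheduling MILP, written with the open-source PuLP modeller rather than a commercial solver, is given below; the segments, deadlines and unavailability weights are invented placeholders, not the project's actual formulation:

```python
# Minimal MILP sketch: schedule tamping possessions so that each segment is
# treated before its RUL deadline, minimising unavailability impact.
# Segment data, deadlines and weights are illustrative assumptions.
import pulp

segments = ["S1", "S2", "S3"]           # track segments flagged by the digital twin
periods = [1, 2, 3, 4]                  # planning periods (e.g., possession nights)
deadline = {"S1": 2, "S2": 4, "S3": 3}  # latest period before the Intervention Limit
unavail = {t: 1.0 + 0.2 * t for t in periods}  # unavailability impact per possession

m = pulp.LpProblem("tamping_schedule", pulp.LpMinimize)
x = pulp.LpVariable.dicts("tamp", [(s, t) for s in segments for t in periods],
                          cat="Binary")

# Objective: minimise the total unavailability impact of possessions
m += pulp.lpSum(unavail[t] * x[s, t] for s in segments for t in periods)

for s in segments:
    # Each segment is tamped exactly once, no later than its RUL deadline
    m += pulp.lpSum(x[s, t] for t in periods) == 1
    m += pulp.lpSum(x[s, t] for t in periods if t > deadline[s]) == 0
for t in periods:
    # At most one possession (one tamping crew) per period
    m += pulp.lpSum(x[s, t] for s in segments) <= 1

m.solve(pulp.PULP_CBC_CMD(msg=False))
print({(s, t): 1 for s in segments for t in periods if x[s, t].value() > 0.5})
```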


3.3 Operator-Oriented PMO

This third task will develop optimization models to plan/schedule Predictive Maintenance and Operations (PM&O) in railway systems from the perspective of the Train Operating Companies (TOCs). These operator-oriented models will be tested (and verified) by applying them to illustrative instances from case studies of the Portuguese railway industry. These decision support models will schedule all vehicle maintenance tasks, with a special focus on wheelset turning. They will aim to minimise the life-cycle costs/impacts, in particular the unavailability of vehicle systems. Another objective will be to maximise profits (revenues from tickets and subsidies from public service obligations, minus the maintenance and operating costs, including infrastructure charges). These optimization models will be defined and solved through MILP formulations (solved using commercial solvers). All relevant technical constraints will be considered in the MILP formulation. Other maintenance-related and operation-related decision problems will be formulated and integrated in a step-by-step modelling approach, namely: depot location, fleet assignment, maintenance crew scheduling, etc. Please see the planned modelling steps in Fig. 1, and note that similar integration plans will be developed for the infrastructure-oriented PM&O model in Task 2. Firstly, all these decision models will be developed from the perspective of a single agent, i.e., from the perspective of a single TOC. Later, they will be extended to the following cases:
– two collaborative/cooperative passenger TOCs;
– two competitive (or non-cooperative) passenger TOCs;
– two collaborative/cooperative TOCs (one passenger and one freight).

These cases will explore collaborative and competitive strategies between (passenger and freight) TOCs. Theoretical examples will be developed and analysed before attempting to develop real-world case studies. The predictive maintenance needs associated with wheelset turning (and other maintenance tasks) will be estimated using the digital twin developed in Task 1, for each train unit in the TOC fleet, incorporating vehicle specifications and their infrastructure paths/slots (using railway traffic management software). Normal train operations can be rescheduled to minimise train delays/disruptions and/or cancellations. The operator-oriented model will schedule the predictive maintenance and expected corrective maintenance activities in vehicle systems and minimise their unavailability impacts.

4 Conclusions

PMO-RAIL is an ambitious project that aims to develop predictive maintenance and operations models in railway systems that incorporate the game-theoretic nature of a vertically separated sector, modelling the goals of different agents such as the IM, the TOCs and users. The present paper explores the objectives and motivations, the background literature and the methods and plan that will be followed to execute it. Although the outputs of the present paper may look intangible, it provides a path through a complex map, highlighting the main constraints and ingredients towards innovation in the topic of railway maintenance. Future directions along this path would require more insights into users' experiences (passenger and freight) and into how the human-centric approaches of Industry 5.0 can be integrated.

Acknowledgements The author would like to thank all industrial partners and colleagues that have contributed towards project PMO-RAIL. This work is supported by the Foundation for Science and Technology (FCT), through IDMEC, under LAETA, project UIDB/50022/2020. Moreover, it is also supported by the Foundation for Science and Technology (FCT), through IDMEC, under LAETA, project PMO-RAIL—Predictive Maintenance and Operations in Railway Systems (2022.01738.PTDC).

References

1. Soleimanmeigouni I, Ahmadi A, Kumar U (2018) Track geometry degradation and maintenance modelling: a review. Proc Inst Mech Eng Part F: J Rail Rapid Transit 232(1):73–102
2. Antunes P, Magalhães H, Ambrosio J, Pombo J, Costa J (2019) A co-simulation approach to the wheel–rail contact with flexible railway track. Multibody Syst Dyn 45(2):245–272
3. Grossoni I, Andrade AR, Bezin Y, Neves S (2019) The role of track stiffness and its spatial variability on long-term track quality deterioration. Proc Inst Mech Eng Part F: J Rail Rapid Transit 233(1):16–32
4. Sancho LC, Braga JA, Andrade AR (2021) Optimizing maintenance decision in rails: a Markov decision process approach. ASCE-ASME J Risk Uncertain Eng Syst Part A: Civ Eng 7(1):04020051
5. Cuervo PA, Santa JF, Toro A (2015) Correlations between wear mechanisms and rail grinding operations in a commercial railroad. Tribol Int 82:265–273
6. Jamshidi A, Hajizadeh S, Su Z, Naeimi M, Núñez A, Dollevoet R, Schutter B, Li Z (2018) A decision support approach for condition-based maintenance of rails based on big data analysis. Transp Res Part C: Emerg Technol 95:185–206
7. Andrade AR, Stow J (2017) Assessing the potential cost savings of introducing the maintenance option of 'Economic Tyre Turning' in Great Britain railway wheelsets. Reliab Eng Syst Saf 168:317–325
8. Braga JA, Andrade AR (2019) Optimizing maintenance decisions in railway wheelsets: a Markov decision process approach. Proc Inst Mech Eng Part O: J Risk Reliab 233(2):285–300
9. Costa MA, Braga JP, Andrade AR (2021) A data-driven maintenance policy for railway wheelset based on survival analysis and Markov decision process. Qual Reliab Eng Int 37(1):176–198
10. Braghin F, Lewis R, Dwyer-Joyce RS, Bruni S (2006) A mathematical model to predict railway wheel profile evolution due to wear. Wear 261(11–12):1253–1264
11. Shebani A, Iwnicki S (2018) Prediction of wheel and rail wear under different contact conditions using artificial neural networks. Wear 406:173–184
12. Bevan A, Molyneux-Berry P, Mills S, Rhodes A, Ling D (2013) Optimisation of wheelset maintenance using whole-system cost modelling. Proc Inst Mech Eng Part F: J Rail Rapid Transit 227(6):594–608
13. Pombo J, Ambrosio J, Pereira M, Lewis R, Dwyer-Joyce R, Ariaudo C, Kuka N (2011) Development of a wear prediction tool for steel railway wheels using three alternative wear functions. Wear 271(1–2):238–245


14. Ignesti M, Innocenti A, Marini L, Meli E, Rindi A (2014) Development of a model for the simultaneous analysis of wheel and rail wear in railway systems. Multibody Syst Dyn 31(2):191–240
15. Molodova M, Li Z, Núñez A, Dollevoet R (2014) Automatic detection of squats in railway infrastructure. IEEE Trans Intell Transp Syst 15(5):1980–1990
16. Cantero D, Basu B (2015) Railway infrastructure damage detection using wavelet transformed acceleration response of traversing vehicle. Struct Control Health Monit 22(1):62–70
17. Costa JN, Ambrósio J, Andrade AR, Frey D (2023) Safety assessment using computer experiments and surrogate modeling: railway vehicle safety and track quality indices. Reliab Eng Syst Saf 229:108856
18. Pagaimo J, Magalhães H, Costa JN, Ambrosio J (2020) Derailment study of railway cargo vehicles using a response surface methodology. Veh Syst Dyn 1–26
19. Costa MA, Costa JN, Andrade AR, Ambrósio J (2022) Combining wavelet analysis of track irregularities and vehicle dynamics simulations to assess derailment risks. Veh Syst Dyn 1–27
20. Braga JA, Andrade AR (2021) Multivariate statistical aggregation and dimensionality reduction techniques to improve monitoring and maintenance in railways: the wheelset component. Reliab Eng Syst Saf 216:107932
21. Leite M, Costa M, Alves T, Infante V, Andrade AR (2022) Reliability and availability assessment of railway locomotive bogies under correlated failures. Eng Fail Anal 106104
22. Leite M, Infante V, Andrade AR (2021) Using expert judgement techniques to assess reliability for long service-life components: an application to railway wheelsets. Proc Inst Mech Eng Part O: J Risk Reliab 1748006X211034650
23. Xu Y, Sun Y, Liu X, Zheng Y (2019) A digital-twin-assisted fault diagnosis using deep transfer learning. IEEE Access 7:19990–19999
24. Kaewunruen S, Lian Q (2019) Digital twin aided sustainability-based lifecycle management for railway turnout systems. J Clean Prod 228:1537–1551
25. Baptista M, Sankararaman S, de Medeiros IP, Nascimento C Jr, Prendinger H, Henriques EM (2018) Forecasting fault events for predictive maintenance using data-driven techniques and ARMA modeling. Comput Ind Eng 115:41–53
26. Pour SM, Marjani Rasmussen K, Drake JH, Burke EK (2019) A constructive framework for the preventive signalling maintenance crew scheduling problem in the Danish railway system. Transp Res Part C: Emerg Technol 70(11):1965–1982
27. D'Ariano A, Meng L, Centulio G, Corman F (2019) Integrated stochastic optimization approaches for tactical scheduling of trains and railway infrastructure maintenance. Comput Ind Eng 127:1315–1335
28. Lidén T, Joborn M (2017) An optimization model for integrated planning of railway traffic and network maintenance. Transp Res Part C: Emerg Technol 74:327–347
29. Luan X, Miao J, Meng L, Corman F, Lodewijks G (2017) Integrated optimization on train scheduling and preventive maintenance time slots planning. Transp Res Part C: Emerg Technol 80:329–359
30. Mira L, Andrade AR, Gomes MC (2020) Maintenance scheduling within rolling stock planning in railway operations under uncertain maintenance durations. J Rail Transp Plan Manag 14:100177
31. Kuo A, Miller-Hooks E (2012) Developing responsive rail services through collaboration. Transp Res Part B: Methodol 46(3):424–439
32. Talebian A, Zou B, Peivandi A (2018) Capacity allocation in vertically integrated rail systems: a bargaining approach. Transp Res Part B: Methodol 107:167–191
33. Xu P, Corman F, Peng Q, Luan X (2017) A train rescheduling model integrating speed management during disruptions of high-speed traffic under a quasi-moving block system. Transp Res Part B: Methodol 104:638–666
34. Luan X, Corman F, Meng L (2017) Non-discriminatory train dispatching in a rail transport market with multiple competing and collaborative train operating companies. Transp Res Part C: Emerg Technol 80:148–174


35. Luan X, Wang Y, De Schutter B, Meng L, Lodewijks G, Corman F (2018) Integration of real-time traffic management and train control for rail networks-Part 1: optimization problems and solution approaches. Transp Res Part B: Methodol 115:41–71
36. Luan X, Wang Y, De Schutter B, Meng L, Lodewijks G, Corman F (2018) Integration of real-time traffic management and train control for rail networks-Part 2: extensions towards energy-efficient train operations. Transp Res Part B: Methodol 115:72–94
37. Robenek T, Maknoon Y, Azadeh SS, Chen J, Bierlaire M (2016) Passenger centric train timetabling problem. Transp Res Part B: Methodol 89:107–126
38. Zhu S, Masud H, Xiong C, Yang Z, Pan Y, Zhang L (2017) Travel behavior reactions to transit service disruptions: study of metro SafeTrack projects in Washington, DC. Transp Res Rec 2649(1):79–88
39. Robenek T, Azadeh SS, Maknoon Y, de Lapparent M, Bierlaire M (2018) Train timetable design under elastic passenger demand. Transp Res Part B: Methodol 111:19–38
40. Adler N, Pels E, Nash C (2010) High-speed rail and air transport competition: game engineering as tool for cost-benefit analysis. Transp Res Part B: Methodol 44(7):812–833

Experimental Setup for Non-stationary Condition Monitoring of Independent Cart Systems Abdul Jabbar, Gianluca D’Elia, and Marco Cocconcelli

Abstract The paper discusses the independent cart technology, which utilizes linear motors to move carts along a predetermined track autonomously. This technology offers control of individual speed profiles for each section along the track, frictionless propulsion mechanism, and the ability to start and stop loads quickly. Nevertheless, the initial cost of these systems is substantial, and regular condition monitoring is required to ensure optimal performance and long-term economic benefits. The paper provides an overview of various condition monitoring and signal processing techniques for analysis, including data-driven modeling with machine learning algorithms. The article presents an experimental setup based on the independent cart system and outlines a strategy for data acquisition that emphasizes specific conditions during each run of the system. The collected data is critical in monitoring the independent cart system’s condition and developing expertise in identifying different types of faults and their precise locations, utilizing hybrid modeling approaches. Keywords Condition monitoring · Independent cart systems · STFT

A. Jabbar (B) · G. D'Elia · M. Cocconcelli
DISMI, University of Modena and Reggio Emilia, Via G. Amendola 2–Pad. Morselli, 42122 Reggio Emilia, Italy
e-mail: [email protected]
G. D'Elia
e-mail: [email protected]
M. Cocconcelli
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_38

1 Introduction

Independent cart technology represents quite an advancement in linear motors, eliminating the limitations of traditional conveyors, which rely on gears, chains, and belts. This technology operates by employing linear motors to move carts autonomously along a predetermined track. Carts designed for this technology feature permanent
magnets, while linear motors are equipped with coils that create a magnetic field, causing the carts to move in harmony with the altering magnetic pattern. By utilizing magnets, the independent cart technology can exercise precise control over motion, allowing for frictionless propulsion, and reducing the number of parts involved in the process. The independent cart technology is highly beneficial as it significantly reduces the number of components required for transportation or movement, resulting in less wear and tear on individual parts. One interesting feature of independent cart technology is its frictionless propulsion mechanism. The absence of the physical contact between the cart and the surface of the linear motor coils results in a highly efficient solution for transportation and task manipulation. The reduction in friction during motion means that the system requires less energy to operate, leading to lower operational costs and energy savings. Furthermore, the independent cart technology provides the ability to start and stop loads quickly, without losing control of the movement, which is a significant advantage over traditional conveyors. This feature means that it is possible to increase the throughput of the system and minimize the risks associated with stopping and starting operations, such as material loss or damage. The independent cart technology allows precise and fluid manipulation and movement, from very slow to extremely high speeds, making it an excellent fit for a wide range of industrial applications. Its capability to adapt to diverse operational requirements is a significant advantage, making it a potentially beneficial solution for optimizing industrial processes and enhancing productivity. One crucial economic consideration for the independent cart system is its initial cost, which is substantial and requires a significant upfront investment for installation. While these systems are feasible for industries as long as they continue to function uninterrupted, it is essential to acknowledge that prolonged downtime can undermine their economic viability. As such, to ensure optimal performance and long-term economic benefits, regular condition monitoring of these systems is vital. This approach allows for the timely detection of potential issues and the implementation of necessary measures to prevent or mitigate any damage, reducing the risk of system downtime and safeguarding the investment made in this technology. Condition monitoring can be broadly classified into two main categories: Permanent monitoring and Intermittent monitoring. Permanent monitoring, which is applied solely to safeguard critical and costly machines, entails continuous observation of the machine’s condition and immediate shutdown upon detection of any anomaly. However, this monitoring technique requires expensive transducers to be incorporated into the machines at the design stage. In contrast, Intermittent monitoring involves offline data processing for detailed analysis of the machine’s condition. This methodology not only provides a low-cost condition monitoring solution but can also be used in conjunction with permanent monitoring. Furthermore, intermittent monitoring can warn of developing conditions, providing ample time to plan for maintenance work [1]. Therefore, intermittent monitoring is a simple and economical choice for condition monitoring of the independent cart systems as it does not require any customization or design modification to incorporate specialized transducers.


To perform condition monitoring of machines in an industrial process, it is essential to have a comprehensive understanding of the different machines' working principles, suitable transducers for data collection, the types of signals emanating from different parts of the machines, and signal processing techniques for analysis. Several analysis techniques, including Performance Analysis, Acoustic Emissions, Vibration Analysis, Lubricant Analysis, and Thermography, can offer insights into the machine's internal condition using various specialized transducers. However, some of these techniques are best suited for specific applications or require specialized transducers. For instance, Performance Analysis can determine the machine's various efficiencies and overall condition using simple process parameter transducers for temperature, pressure, and flow rate. Similarly, Thermography involves measuring small temperature variations and comparing them to standard conditions, while Lubricant Analysis involves examining metallic debris and conducting chemical analysis of lubricants to evaluate the machine's internal condition. Vibration Analysis, on the other hand, is more widely applicable and provides advance warning of impending failures. Machines, even when in good condition, generate vibrations known as mechanical signatures, many of which are directly linked to cyclic machine events such as shaft speed and gear-teeth mesh [1]. Various types of transducers can detect changes in the machine's mechanical signature. Although both Vibration Analysis and Acoustic Emissions can provide warnings of impending failures, Acoustic Emissions are rather complicated to use for machine condition monitoring [1]. Given that there are no lubricants involved in the independent cart systems under consideration in this study, there are no parameters such as pressure and flow rate. Additionally, the contactless motion of the carts produces less friction, resulting in little temperature variation during motion. Therefore, the preferred choice for condition monitoring of independent cart systems is to analyze their mechanical signature using vibration analysis. To analyze the state of the machinery, researchers have employed several signal processing techniques such as Envelope analysis [2], Cepstrum analysis [3, 4], Discrete Random Separation (DRS) [5–7], Time Synchronous Averaging (TSA) [6], Spectral Kurtosis [8, 9], and the Short-Time Fourier Transform (STFT) [10]. These techniques are useful for detecting faults and analyzing vibration patterns. However, data-driven modeling has emerged as a valuable approach that can complement traditional signal processing techniques. This involves using machine learning algorithms to identify patterns and anomalies in data. It requires expertise in pre-processing and machine learning, but the benefits can be significant. Therefore, in addition to classical signal processing techniques, researchers have been exploring the use of machine learning algorithms for bearing fault classification. A range of algorithms has been applied, including Support Vector Machines (SVM) [11, 12], Artificial Neural Networks (ANN), K-Nearest Neighbors (K-NN), Convolutional Neural Networks (CNN), Deep Learning (DL), Transfer Learning, Anomaly Detection, and Clustering algorithms [13–15]. This is not an exhaustive list, but it highlights the range of approaches that are available.
By combining these different techniques, more accurate and reliable models for machine condition monitoring can be developed.
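As a concrete example of one classical technique from this toolbox, the sketch below performs envelope analysis with SciPy's Hilbert transform on a synthetic bearing-like signal; the fault rate, resonance band and sampling rate are illustrative assumptions, not values from this study:

```python
# Minimal envelope-analysis sketch: band-pass around a structural resonance,
# demodulate with the Hilbert transform, then inspect the envelope spectrum.
# The synthetic signal and all parameters are illustrative placeholders.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 20_000                                   # sampling frequency [Hz] (assumed)
t = np.arange(0, 1.0, 1 / fs)
fault_rate, carrier = 87.0, 3_000.0           # hypothetical fault rate and resonance
impacts = (np.sin(2 * np.pi * fault_rate * t) > 0.99).astype(float)
signal = impacts * np.sin(2 * np.pi * carrier * t) + 0.1 * np.random.randn(t.size)

# Band-pass around the resonance excited by the periodic impacts
b, a = butter(4, [2_500, 3_500], btype="bandpass", fs=fs)
band = filtfilt(b, a, signal)

envelope = np.abs(hilbert(band))              # amplitude demodulation
spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
freqs = np.fft.rfftfreq(envelope.size, 1 / fs)
print("dominant envelope line [Hz]:", freqs[spectrum.argmax()])
```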


Our aim is to undertake an extensive analysis of data by employing classical signal processing techniques, a physics-based model of the system, as well as both supervised and unsupervised machine learning techniques. Specifically, we plan to apply various supervised learning techniques, including but not limited to support vector machines, decision trees, ensemble methods, and neural networks. Additionally, we intend to leverage the power of unsupervised learning techniques, including K-means clustering, hierarchical clustering, density-based clustering (DBSCAN), anomaly detection, and Hidden Markov Models. The combination of these techniques will allow us to extract meaningful insights and patterns from the data, enabling us to gain a more comprehensive understanding of the underlying processes and phenomena. Our focus lies in the diagnosis and analysis of bearing faults as well as track faults, an undertaking that presents numerous challenges. Among these challenges are the following: firstly, the guide rollers must not only rotate but also move laterally along the track in accordance with the mover's motion. Secondly, the bearings themselves are of a diminutive size. Thirdly, multiple guide rollers are present within each mover to facilitate continued movement in the presence of one or more faulty bearings. Lastly, as the number of movers increases, so does the complexity of the problem. Hence, determining the precise location or source of a fault becomes cumbersome. For the sake of conciseness, this article does not include the implementation of any machine learning algorithm. The implementation of learning algorithms will be presented in a separate article in the future. As data is a crucial component of condition monitoring, and considering the benefits of vibration analysis, the main focus of this paper is centered around the acquisition of data through the utilization of multi-axial accelerometers. The article details an experimental setup and outlines a data acquisition strategy that prioritizes capturing data during specific conditions of the independent cart system. The acquired data is expected to be pivotal in accurately monitoring the condition of the system and developing expertise in identifying fault types and their precise locations using hybrid modeling approaches. To the best of our knowledge, there is a scarcity of literature available on the subject of condition monitoring of independent cart systems. This research gap has motivated us to undertake this investigation, with the aim of addressing this unexplored area and advancing knowledge in the field. Several companies are actively engaged in developing independent cart systems, such as Rockwell Automation's iTRAK Intelligent Track System [16] and Beckhoff's Extended Transport System (XTS) [17]. This paper takes a closer look at the XTS, specifically focusing on vibration data acquisition for this independent cart system. The paper is organized into several sections. In Sect. 2, a brief overview of Beckhoff's XTS and the experimental setup is presented. Section 3 outlines the data acquisition strategy, while Sect. 4 details the raw data and preliminary results. The paper concludes with Sect. 5.


Fig. 1 Extended Transport System (XTS) [17]

2 System Description and Experimental Setup

2.1 Extended Transport System (XTS)

XTS, or Extended Transport System [17], is a versatile and modular linear transport system designed to provide efficient and reliable movement of materials, products, or components in a wide range of applications. It comprises linear motors, movers, and guide rails, which work together to create a flexible and dynamic platform for moving loads with precision and speed (see Fig. 1). The linear motors generate magnetic fields that propel the movers along the guide rails, eliminating the need for belts, chains, or gears. This not only increases the system's efficiency but also reduces maintenance requirements and noise levels. The movers themselves can be customized to suit specific load requirements and can carry weights ranging from a few grams to several kilograms. One of the key benefits of XTS is its ability to be configured in various geometric path shapes. By combining the motor modules in different ways, we can create complex path patterns that follow the desired trajectory of the load. This allows XTS to be used in a wide range of applications, from simple point-to-point transfers to more complex multi-axis movements. The system's flexibility also makes it easy to integrate with other automation technologies such as robots, conveyors, and assembly systems, enabling the creation of highly efficient and integrated production lines.

2.2 Experimental Setup

In our research, we employed a closed-loop path configuration with a length of 1500 mm or 1.5 m, utilizing four motor modules, two of which are straight and two of which are 180-degree clothoids, as illustrated in Fig. 2. Our study involves the use of three types of carts/movers: a 12 roller mover and two 6 roller movers, with and without spring loading (see Fig. 3). The only difference between the 6 roller movers with and without spring loading is that the middle roller (red roller in Fig. 3b) is attached to a spring. Discussion of the application of such movers is beyond the scope of this article. This system can be designed with the flexibility to be configured into multiple stations, making it possible to implement individual speed profiles for each section. This convenient feature enables the acquisition of data under non-stationary conditions. Despite this capability, our primary focus is on obtaining data under constant and linear speed profiles. This approach will allow us to obtain data that can be used to understand the behavior of the system more accurately under real-world operating conditions.

Fig. 2 XTS experimental setup (triaxial and monoaxial accelerometer mounting positions and X/Y/Z axes indicated)

3 Data Acquisition

Our main objective is to develop expertise in determining the types of faults and their precise locations, using hybrid modeling approaches that combine classical signal processing techniques, physics-based modeling, and data-driven modeling. Data-driven modelling allows several system parameters (such as the current of the motor modules, the position and speed of the carts, and the vibration of the system) to be analysed at once, in order to detect small changes in system behavior that, at early stages of a fault, might not be obvious in classical signal processing. To achieve this, we propose a strategy for data acquisition that emphasizes specific conditions during each run of the system. Firstly, we consider multiple constant speed profiles ranging from 0.5 to 3 m/s, along with a linear speed profile where the speed varies linearly between 0 and 3 m/s. Secondly, we examine a single mover with 12 rollers, a mover with 6 rollers without any spring-loaded rollers, and a mover with 6 rollers, 2 of which are spring-loaded. The fault types include roller outer race faults, ball faults, and track or guide rail faults. A summary of the movers and fault types is presented in Table 2. In addition to the aforementioned conditions, we also plan to evaluate the system under no-load and with-load conditions, to ensure the validity of the results. Lastly, we acquire data at multiple sampling frequencies, including 20 and 51.2 kHz. By testing the system under these diverse conditions, we aim to obtain comprehensive data that can be utilized for further analysis and modeling. Overall, our experimental setup is designed to provide a diverse range of conditions for data acquisition, to facilitate the development of hybrid modeling approaches that can accurately detect faults and determine their source or precise locations on the track. By using this approach, we can create a robust fault detection and diagnosis system that is capable of handling various fault scenarios, ensuring optimal performance and reliability of the system.

Fig. 3 a 12-roller mover, b 6-roller mover without spring loading [17]

As part of our experimental setup, we are currently acquiring a wide range of data from the motor module and mover, including supply, position, velocity, and acceleration information. To supplement this, we are also collecting vibration data using two accelerometers separated by a distance along the length of the system, one of which is mono-axial, while the other is triaxial. This approach allows us to capture data from different angles and provides a more comprehensive understanding of the system's behavior from a vibration-signature perspective.


Fig. 4 Guide rail misalignment fault

In total, we are acquiring 15 channels of current data of the infeed motor module, enabling us to analyze the electrical behavior of the system. Furthermore, the position, velocity, and acceleration information of the mover, coupled with the accelerometer data, provides a holistic view of the system's mechanical behavior. Additionally, the vibration data obtained using the accelerometers provides critical insight into the system's health and performance. A tabular representation of the acquired system variables and the fault types is provided in Tables 1 and 2. To ensure accuracy and completeness, we are acquiring data at two different sampling frequencies: 20 and 51.2 kHz. These frequencies are selected to capture the necessary data with the required resolution. Moreover, data acquisition under similar conditions at different sampling frequencies allows us to compare and analyze the data effectively, providing a better understanding of the system's behavior. It is important to mention that the data acquired at the two different frequencies is obtained by utilizing two separate systems, one of which is Beckhoff's CX2040 PLC and the other National Instruments' data acquisition system. Overall, the data acquisition process used in our experimental setup will provide a comprehensive range of information from various sources, which will allow us to develop hybrid modeling approaches for detecting and diagnosing faults accurately. This multi-dimensional approach to data acquisition is crucial in developing robust and reliable fault detection and diagnosis systems that can operate under diverse conditions. It is worth mentioning that the data acquired during this project will be made available as open access.

Table 1 Variables of interest and data acquisition systems

Data acquisition system               | Variables of interest for data acquisition
PLC-based data acquisition            | 15-channel current data of infeed motor module; mover position; mover velocity; mover acceleration; accelerometer 1 vibration data along X-, Y- and Z-axes; accelerometer 2 vibration data along Z-axis
National Instruments data acquisition | Accelerometer 1 vibration data along X-, Y- and Z-axes; accelerometer 2 vibration data along Z-axis

Table 2 Prospective conditions and fault types for data acquisition

Condition type     | Mover type                            | Fault types
No load conditions | 12 roller mover                       | Outer race fault; ball fault; track fault
No load conditions | 6 roller mover without spring loading | Outer race fault; ball fault; track fault
No load conditions | 6 roller mover with spring loading    | Outer race fault; ball fault; track fault
Loaded conditions  | 12 roller mover                       | Outer race fault; ball fault; track fault
Loaded conditions  | 6 roller mover without spring loading | Outer race fault; ball fault; track fault
Loaded conditions  | 6 roller mover with spring loading    | Outer race fault; ball fault; track fault

4 Raw Data and Preliminary Results

This section showcases the raw data obtained during the experiment conducted using a mover comprising 12 rollers. The data is obtained under no-load conditions, with and without the rail misalignment fault, as illustrated in Figs. 5 and 6. In this experiment, a cart or mover moves along the guide rails at a steady speed of 500 mm/s or 0.5 m/s. The length of the track is 1.5 m, which means that the mover completes one full cycle along the track in exactly 3 s. However, there is a misalignment in the rails located at about 250 mm from the absolute 0 position along the track (see Fig. 4). The misalignment could be due to several reasons, such as poor installation of the guide rail modules with respect to each other and to the motor modules. Alternatively, it could occur during runtime due to loose screws. When the system runs at a sufficiently high speed, it vibrates and can cause some rail screws to loosen slightly. As a result, when the roller glides over the rail near the loose screw, it pushes the rail down, causing contact between the motor coils and the cart's magnets.


Fig. 5 Accelerometers raw data for experiment run with 12 roller mover under healthy and no-load conditions

The primary objective of this type of experiment is to determine the precise location of the misalignment fault that may be present within the system, as it has the potential to degrade system performance. From a wear-and-tear perspective, the presence of the misalignment fault is significant, as it increases the friction that the mover experiences during its movement. This misalignment causes contact between the mover's permanent magnets and the motor coils, leading to an undesirable and severe frictional force. As a result of this frictional force, a burst of high-frequency impulses is generated, which can easily be observed in Fig. 6, indicating the impulsive nature of the signal. In contrast, the STFT of the signal under the no-fault condition (Fig. 7) contains no signature of this impulsiveness, whereas the short-time Fourier transform (STFT) of the corresponding signal obtained under faulty conditions (Fig. 8) indicates the same impulsive behavior. The STFT is calculated using the open-source signal processing package SciPy, and detailed documentation can be found therein. The burst of impulses is a direct result of the misalignment of the guide rails, which causes contact between the mover's permanent magnets and motor coils. Therefore, the misalignment of guide rails can lead to significant degradation in system performance by damaging the motor coils and the movers' permanent magnets, and possibly causing wear of the rollers.

Fig. 6 Accelerometers raw data for experiment run with 12 roller mover under rail-misalignment and no-load conditions

Fig. 7 STFT of accelerometers raw data for experiment run with 12 roller mover under healthy and no-load conditions

Fig. 8 STFT of accelerometers raw data for experiment run with 12 roller mover under rail-misalignment and no-load conditions
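For reference, the kind of STFT computation behind Figs. 7 and 8 can be reproduced with a few lines of SciPy; the file name, sampling rate and window parameters below are illustrative assumptions, not the authors' exact settings:

```python
# Minimal sketch of the STFT computation with scipy.signal.stft.
# The input file is a hypothetical placeholder for one raw vibration channel.
import numpy as np
from scipy.signal import stft

fs = 51_200                                 # one of the two acquisition rates [Hz]
acc = np.load("accelerometer_z.npy")        # hypothetical raw vibration record

# Hann-windowed STFT; nperseg/noverlap trade time vs. frequency resolution
f, t, Zxx = stft(acc, fs=fs, window="hann", nperseg=4096, noverlap=2048)
magnitude_db = 20 * np.log10(np.abs(Zxx) + 1e-12)
# magnitude_db[f_index, t_index] can then be rendered as a spectrogram,
# where impulsive rail faults appear as broadband vertical bursts
```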

5 Conclusion

In conclusion, the paper presents an experimental setup based on independent cart systems. It proposes a strategy to acquire data under numerous operating conditions, including with and without the presence of faults, different mover types, and no-load and loaded scenarios. Additionally, several variables of interest are captured, ranging from motor current values and cart position and velocity to triaxial vibration data. The 15 channels of current data from the infeed motor module can provide insights into the electrical behavior of the system. Furthermore, the position, velocity, and acceleration information of the mover, coupled with accelerometer data, provides a holistic view of the system's mechanical behavior. The resulting dataset is the first of its kind and is suitable for condition monitoring of independent cart systems, not only using classical signal processing techniques but also from a data-driven perspective. The availability of this data as open access will provide an opportunity for researchers with diverse expertise and backgrounds to propose new techniques for condition monitoring of independent cart systems.

6 Future Work

In the future, we plan to extend the proposed data acquisition strategy to a fleet of carts, creating a comprehensive dataset that provides more accurate insights into system behavior under various operating conditions. This scenario of observing multiple movers is more relevant to industrial operating conditions. We also plan to investigate the resulting data using hybrid modelling techniques.

Acknowledgements The authors gratefully acknowledge the European Commission for its support of the Marie Skłodowska-Curie Program through the H2020 ETN MOIRA project (GA 955681). The authors would like to thank Mr. Giovanni Paladini for his support in troubleshooting the issues related to the experimental setup.

References

1. Randall RB (2011) Vibration-based condition monitoring: industrial, aerospace, and automotive applications. Wiley
2. Konstantin-Hansen H, Herlufsen H (2010) Envelope and cepstrum analyses for machinery fault identification. Sound Vib 44:10–12
3. Peeters C, Guillaume P, Helsen J (2016) A comparison of cepstral editing methods as signal preprocessing techniques for vibration-based bearing fault detection. Mech Syst Signal Process 91:354–381
4. Borghesani P, Pennacchi P, Randall RB, Sawalhi N, Ricci R (2013) Application of cepstrum pre-whitening for the diagnosis of bearing faults under variable speed conditions. Mech Syst Signal Process 36:370–384
5. Randall RB, Sawalhi N, Coats M (2011) A comparison of methods for separation of deterministic and random signals. Int J Cond Monit 1(1)
6. Randall RB, Antoni J (2011) Rolling element bearing diagnostics—a tutorial. Mech Syst Signal Process 25(2):485–520
7. Smith WA, Randall RB (2015) Rolling element bearing diagnostics using the Case Western Reserve University data—a benchmark study. Mech Syst Signal Process 64–65:100–131
8. Immovilli F, Cocconcelli M, Bellini A, Rubini R (2009) Detection of generalized-roughness bearing fault by spectral-kurtosis energy of vibration or current signals. IEEE Trans Industr Electron 56(11):4710–4717. https://doi.org/10.1109/TIE.2009.2025288
9. Antoni J (2007) Fast computation of the kurtogram for the detection of transient faults. Mech Syst Signal Process 21(1):108–124. https://doi.org/10.1016/j.ymssp.2005.12.002
10. Cocconcelli M, Zimroz R, Rubini R, Bartelmus W (2012) STFT based approach for ball bearing fault detection in a varying speed motor. In: Fakhfakh T, Bartelmus W, Chaari F, Zimroz R, Haddar M (eds) Condition monitoring of machinery in non-stationary operations. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28768-8_5
11. Sun B, Liu X (2023) Significance support vector machine for high-speed train bearing fault diagnosis. IEEE Sens J 23(5):4638–4646. https://doi.org/10.1109/JSEN.2021.3136675
12. Fan Y, Zhang C, Xue Y, Wang J, Gu F (2020) A bearing fault diagnosis using a support vector machine optimised by the self-regulating particle swarm. Shock Vib 9096852. https://doi.org/10.1155/2020/9096852
13. Schwendemann S, Amjad Z, Sikora A (2021) A survey of machine learning techniques for condition monitoring and predictive maintenance of bearings in grinding machines. Comput Ind 125. https://doi.org/10.1016/j.compind.2020.103380
14. Surucu O, Gadsden SA, Yawney J (2023) Condition monitoring using machine learning: a review of theory, applications and recent advances. Expert Syst Appl 221:119738. https://doi.org/10.1016/j.eswa.2023.119738
15. Bertolini M, Mezzagori D, Neroni M, Zamori F (2021) Machine learning for industrial applications: a comprehensive literature review. Expert Syst Appl 175:114820. https://doi.org/10.1016/j.eswa.2021.114820


16. Rockwell Automation: iTRAK Intelligent Track Systems | Rockwell Automation
17. Beckhoff Automation: XTS | Linear product transport | Beckhoff Worldwide

Hazardous Object Detection in Bulk Material Transport Using Video Stream Processing Vanessa Meulenberg, Kamal Moloukbashi Al-Kahwati, Johan Öhman, Wolfgang Birk, and Rune Nilsen

Abstract Belt conveyor systems are a primary means of bulk material transport in industrial applications due to their high bulk capacity and limited need for human involvement. Abnormal objects on the belt conveyor can be hazardous to the operation of the belt conveyor systems and/or downstream equipment. The dependability of production on a well-operating system in combination with the high degree of automation and limited inspection accessibility, establishes the need for a continuous and fully automated monitoring solution. In this paper, a monitoring solution comprising a camera, object detection and classification model, and decision support is presented and discussed. The detection and classification model is comprised of two steps: a classical brightness and contour detection algorithm using colour channel weighing, and a subsequent processing by a Convolutional Neural Network (CNN). The CNN performs a classification of the detections as True Positives (TP) or False Positives (FP). Further, the object size is estimated providing a measure for the risk imposed by the object. The solution makes use of an off-the-shelf industrial network camera that communicates with an edge computing device close to the installation site. The edge device is further connected to a SaaS solution for predictive maintenance and decision support where results (classified detections) are visualized in a dashboard. There, operators can assess classified detections as TP or FP, which provides a ground truth for subsequent retraining of the solution. Moreover, these actionable insights enable a warning and stopping mechanism that can be implemented when the operators trust the solution. The solution is implemented and tested at LKAB Narvik and operational since 2021. Initially, the solution was trained using artificially introduced objects and manually labelled video frames, followed by a validation phase to assess the performance of the solution. The solution exceeds targeted performance while having a low false positive rate.

V. Meulenberg (B) · K. Moloukbashi Al-Kahwati · J. Öhman · W. Birk Predge AB, Luleå, Sweden e-mail: [email protected] R. Nilsen LKAB, Narvik, Norway © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_39


Keywords Pattern recognition · Image analysis · Convolutional neural networks · Labelling · Autonomous decision making · Object detection

1 Introduction

For the transportation of large quantities of bulk materials in mining industries, belt conveyor systems play a crucial role due to their high capacity and limited need for human involvement. At LKAB Narvik, where the experimental work of this study was carried out, a dense network of conveyor belts is used to load ships with ore arriving by train from the LKAB mines in Kiruna, Northern Sweden. The conveyor systems are critical logistic elements in the production chain and are often exposed to harsh and adverse environmental conditions. Dust, humidity, excessive loading, and freezing temperatures are just a few examples of the conditions in which conveyors must operate reliably [1]. Occasionally, hazardous objects can end up on the conveyor belts, such as silo chute plates, large chunks of ice, concrete blocks, tools and other undesired objects. Such objects can be damaging to the operation of the belt conveyor systems and/or downstream equipment. For example, metal plates with sharp edges can rip the conveyor belt, and concrete blocks can get stuck in enclosed conveyor belts or sieves, causing material build-up and damage. Such damage can occur anywhere in the production line, where there is a high degree of automation and limited possibilities to perform frequent inspections. Moreover, objects on the belts causing unplanned stops and disruptions in operations can lead to delays in material delivery to clients and to increased costs such as contamination fees, increased harbour fees, etc. A monitoring solution to detect and warn of abnormal objects would therefore be beneficial for the safe and reliable operation of a conveyor system, and can be a complementary part of a larger predictive maintenance solution for belt conveyor systems, as discussed by Al-Kahwati et al. [2, 3]. Object detection has a well-established background, and classical object detection methods, such as brightness and contour detection algorithms, are suitable for basic applications. Furthermore, segmenting images into their respective red, green, and blue color channels can aid in detecting objects that exhibit lower intensity, yet are distinctly colored. This technique leverages the color information in images to improve object detection accuracy [4, 5]. However, in diverse and unpredictable operational environments, more advanced detection methods are needed to increase the detection rate while keeping the false detection rate low. The harsh conditions in which conveyor belts operate create unwanted features that are detected by the camera, such as dust, flares, and so on. These features can be referred to as noise and may trigger an object detector, but they may not warrant the stopping of operations. Convolutional neural networks (CNNs) are a widely used deep learning algorithm for image classification [6], and have proven to work reliably in environments with variability, such as the automotive sector and self-driving car applications. The most significant advantage of CNNs is their ability to automatically detect image features


without human supervision [7]. They are therefore a valuable tool for distinguishing between the true positives (TPs) and false positives (FPs) produced by a primitive object detector.

In this paper, the authors present an object detection methodology implemented in a Software-as-a-Service (SaaS) platform, utilizing edge devices for local video stream processing. The proposed method consists of a two-step object detection model. First, a classical brightness and contour detection algorithm that uses colour channel weighting identifies objects of different colours exceeding a predetermined size threshold. Second, a CNN classifies the detections as TP or FP. The solution is implemented on an edge device at the asset site and is connected to a camera installed above the belt conveyor system.

2 Methodology

An Axis P1455-LE IP camera was installed above one of the conveyor belt systems in the LKAB Narvik operation chain and connected to the local area network (LAN). Furthermore, a computer was installed at the asset site and connected to the same LAN to ensure swift communication between the devices. The architecture can be seen in Fig. 1. The frames captured by the camera are stored in a queue until a worker thread is available to process them. When a thread becomes available, the frame is pre-processed to extract the features relevant for the first classification step, including colour and size. Should the first classification step lead to a detection, the frame is further processed in a second step to determine whether the detection is a TP or an FP. In the case of a TP, a warning is issued to the operators and the frame is visualized in a SaaS platform. Here, the operators can decide whether an action such as stopping the conveyor is necessary so that the object can be removed, thus providing a ground truth assessment. The ground truth information is stored for further training of the classification models.
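For concreteness, the processing flow can be sketched as follows. This is a minimal illustration only: the queue size, the number of worker threads, and all function names (preprocess, first_step, second_step, issue_warning) are assumptions, standing in for the routines described in Sects. 2.1–2.3; the actual implementation is not published.

```python
# Minimal sketch of the frame-queue architecture (names and sizes assumed).
import queue
import threading

import cv2  # OpenCV, assumed available on the edge device

def preprocess(frame):    return frame   # crop/mask/filter, see Sect. 2.1
def first_step(frame):    return []      # contour detector, see Sect. 2.2
def second_step(frame):   return False   # CNN TP/FP classifier, see Sect. 2.3
def issue_warning(frame): print("hazardous object detected")

frame_queue = queue.Queue(maxsize=64)    # frames wait here for a free worker

def capture_loop(stream_url):
    """Read frames from the IP camera and enqueue them."""
    cap = cv2.VideoCapture(stream_url)
    while cap.isOpened():
        ok, frame = cap.read()
        if ok and not frame_queue.full():
            frame_queue.put(frame)

def worker_loop():
    """Run the cheap detector first; invoke the CNN only on candidate hits."""
    while True:
        frame = frame_queue.get()
        pre = preprocess(frame)
        if first_step(pre) and second_step(pre):
            issue_warning(frame)         # pushed to the SaaS dashboard
        frame_queue.task_done()

threading.Thread(target=capture_loop, args=("rtsp://camera/stream",), daemon=True).start()
for _ in range(4):                       # worker count is an assumption
    threading.Thread(target=worker_loop, daemon=True).start()
```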

2.1 Frame Pre-processing

To avoid triggering detections on empty conveyor belt frames, an intensity thresholding approach was employed, whereby frames exhibiting a much darker intensity level than those containing ore were eliminated. Prior to the first classification step, frames were pre-processed by cropping them to a size of 935 × 580 pixels, after which a mask was applied to a region of interest (ROI). To further refine the frames and remove unwanted noise and graininess, a non-linear median filtering technique was applied.
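A plausible OpenCV rendering of this step is shown below; the crop size and filter type follow the text, while the intensity threshold and the median kernel size are not stated in the paper and are assumptions.

```python
# Hedged sketch of the frame pre-processing step.
import cv2
import numpy as np

def preprocess(frame: np.ndarray, roi_mask: np.ndarray):
    crop = frame[0:580, 0:935]                         # 935 x 580 px crop
    gray_mean = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY).mean()
    if gray_mean < 40:                                 # empty-belt frames are
        return None                                    # much darker; skip them
    masked = cv2.bitwise_and(crop, crop, mask=roi_mask)
    return cv2.medianBlur(masked, 5)                   # non-linear median filter
```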


Fig. 1 Architecture of the two-step object detector as constructed on the edge device

2.2 Initial Classification Step

The initial step of object classification is designed to be less stringent, with a view to reducing the possibility of missing objects on the belt. Specifically, the approach is tailored to permit a greater number of false positives to pass through. The first classification step works as follows: a key frame was generated by averaging multiple frames during regular conveyor operation. Each frame was subsequently converted to grayscale, and the red, green, and blue (RGB) color channels were extracted. The original, grayscale and RGB channel images can be seen in the example of Fig. 2. Subsequently, the differential frames obtained by subtracting the key frames from the grayscale and RGB frames were analyzed for contours. The border following method proposed by Suzuki and Abe [8] was used for detecting the contours of the differential frames, as shown in Fig. 3. Through consultations with the operators, it was agreed that a minimum size threshold of 20 × 20 cm or 500 cm² would be used to classify detected objects. It is known that the spacing between the left and right idlers is 200 cm; thus, the allowable pixel area threshold with respect to the location along the belt can be defined as follows:


Fig. 2 Original, grayscale, red, green and blue images of a safety helmet

$$A_T = 9.606\,y + 656, \qquad (1)$$

where $A_T$ is the threshold area in px² and $y$ is the vertical location of the center point of the detected object ($y$ is zero at the top of the image). The area of the rectangular bounding box enclosing the detected contours was calculated and compared to the threshold calculated in Eq. (1).
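The first classification step can be summarized in code roughly as follows; the binarization threshold is an assumption, while the key-frame differencing, the Suzuki–Abe border following (as implemented by OpenCV's findContours), and the area threshold of Eq. (1) follow the text.

```python
# Sketch of the first classification step. key_frames holds per-channel
# averages of frames captured during regular conveyor operation.
import cv2
import numpy as np

def area_threshold(y: float) -> float:
    """Allowable bounding-box area in px^2 at vertical position y, Eq. (1)."""
    return 9.606 * y + 656.0

def first_step(frame: np.ndarray, key_frames: dict) -> list:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    b, g, r = cv2.split(frame)
    hits = []
    for name, chan in {"gray": gray, "red": r, "green": g, "blue": b}.items():
        diff = cv2.absdiff(chan, key_frames[name])     # differential frame
        _, binary = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)  # assumed
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)  # Suzuki-Abe [8]
        for c in contours:
            x, y, w, h = cv2.boundingRect(c)
            if w * h > area_threshold(y + h / 2):      # centre of bounding box
                hits.append((name, x, y, w, h))
    return hits
```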


Fig. 3 On the left, the cropped images with the ROI mask applied are shown, and on the right, the corresponding differential frames from the key frame are shown for a a metal sign in a grayscale image and b a safety helmet in the red channel image

2.3 Second Classification Step

The second classification step was used to distinguish whether the detections made during the less stringent first classification step are TPs or FPs. This was done using an image classification CNN. Full-sized images with the ROI mask in place were rescaled to 480 × 480 pixels before being used for training. The model was trained on 739 images manually labeled as unacceptable and 916 images manually labeled as acceptable. The acceptable dataset contained images of normal operation, flares, and small lumps (lumps that did not meet the size threshold calculated in Eq. 1), as can be seen in Fig. 4a. The unacceptable dataset contained images with artificially introduced objects, blocks of ice and concrete lumps (see examples in Fig. 4b). The CNN architecture can be seen in Fig. 5. To compensate for the relatively small dataset, a data augmentation layer was added to the model, in which a random horizontal flip was applied. The images were then rescaled to have pixel values between 0 and 1. A sequential model consisting of three convolution blocks with 32, 64 and 128 filters, respectively, was used, followed by a max pooling layer after each of the convolution layers. To prevent over-fitting, a dropout regularization layer was added to the network. A flatten layer was then added, followed by a fully connected dense layer with 256 units activated by the ReLU activation function.


Fig. 4 Example images of the training dataset classified as a acceptable (normal operation) and b unacceptable (objects, etc. on the belt)

The model was compiled using the Adam optimizer, an effective stochastic gradient descent (SGD) method based on adaptive first- and second-order moments, as defined by Kingma and Ba [9]. The model losses were computed using the sparse categorical cross-entropy loss function. The model was trained for 50 epochs with a validation split of 20% and a batch size of 32.
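A Keras rendering of the described network is given below. The layer widths, augmentation, rescaling, optimizer, loss, epochs, and batch size follow the text; the convolution kernel sizes and the dropout rate are not stated in the paper and are assumptions.

```python
# Hedged Keras sketch of the second-step CNN.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(480, 480, 3)),
    layers.RandomFlip("horizontal"),        # augmentation for the small dataset
    layers.Rescaling(1.0 / 255),            # pixel values scaled to [0, 1]
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Dropout(0.2),                    # regularization; rate assumed
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(2),                        # acceptable / unacceptable logits
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# x: (N, 480, 480, 3) masked frames; y: 0 = acceptable, 1 = unacceptable
# model.fit(x, y, validation_split=0.2, epochs=50, batch_size=32)
```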

2.4 Testing and Validation

The initial assessment of the first classification step involved the introduction of artificial objects onto the belt. Objects of varying color, brightness, and size were placed on the belt and frames were collected. These frames, along with frames from normal operation under different conditions (normal, empty, dusty, with flares and reflections, flooding, etc.), were used to test the first classification step. In total, 1,655 frames were used for testing. The resulting detections in the different color channels and grayscale were analyzed, and the numbers of TPs, FPs, true negatives (TNs) and false negatives (FNs) were recorded for the frames. Example frames can be seen in Fig. 4. The training set for the CNN consisted of the images from the first classification step divided into two classes: acceptable (FPs and TNs from the first classification step) and unacceptable (TPs and FNs from the first classification step), where the unacceptable class represents frames containing unwanted objects and the acceptable class represents normal operation. The CNN validation set comprised 20% of


Fig. 5 Details and layers of the convolutional neural network

all images, and an additional 300 images were withheld for further testing. Model validity was assessed by evaluating various metrics, including accuracy, positive predictive value (PPV) and recall (see Eqs. 2–4). Additionally, the inference time as well as the training and validation accuracy and losses were analysed.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad (2)$$

$$PPV = \frac{TP}{TP + FP}, \qquad (3)$$

$$\text{Recall} = \frac{TP}{TP + FN}. \qquad (4)$$
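For reference, Eqs. (2)–(4) map directly onto a few lines of Python:

```python
# Direct implementation of Eqs. (2)-(4) from the confusion counts.
def evaluation_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # Eq. (2)
        "ppv": tp / (tp + fp),                        # Eq. (3)
        "recall": tp / (tp + fn),                     # Eq. (4)
    }
```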


3 Results and Discussion

3.1 First Classification Step

The results from the first classification step are shown in Table 1. The accuracy of the object detection model was found to be highest for grayscale images, at around 0.72. Notably, the gray and green channels exhibited similar TP and FP values, as can be seen in Fig. 6. Dark objects such as the boxes shown in Fig. 4b were not detected in any channel, likely due to their non-reflective nature, which results in their loss during differential frame creation. PPV scores for the grayscale and green channel frames were found to be significantly higher than those for the red and blue channels, which yielded more FPs due to the greater intensity of red and blue tints in iron ore. The true positive rate, or recall, was highest for the red channel (0.51), followed by the gray, green, and blue channels. The red channel model yielded the highest number of TPs, likely due to the presence of red objects such as the helmet. These results indicate that although the red channel yields more detections, it also results in more FPs. Considering the two-step nature of the object detection model, it is advantageous to utilize the RGB channels in conjunction with grayscale images to ensure detection of all objects and prevent belt damage. Without the second classification step, grayscale images would yield the most desirable results, as false warnings could lead to unnecessary belt stoppage. Finally, the processing time per frame for this model was found to be 34 milliseconds, which is sufficient for detecting objects in real time on the live stream.

3.2 Second Classification Step

The performance of the CNN model has been evaluated and is presented in Fig. 7. The validation accuracy of the model is notably high, indicating that it performs well on unseen data, and it exhibits a near-linear increase with each training step. The training and validation losses closely follow each other, indicating that overfitting has been avoided. The evaluation metrics of the CNN model are summarized in Table 2. The model achieved outstanding results, with an accuracy of 0.97. Remarkably, the model did not

Table 1 Results of the first classification step

Metric            Grayscale     Red     Green   Blue
Accuracy          0.72          0.60    0.71    0.59
PPV               0.90          0.56    0.90    0.55
Recall            0.41          0.51    0.40    0.37
Processing time   34 ms/image


Fig. 6 Confusion matrix for the first classification step of the a gray images, b red channel images, c green channel images and d blue channel images

produce any FPs from the validation set. The recall of the model was 0.99, meaning that nearly all frames containing unwanted objects were correctly flagged, while validation frames with flares, smaller lumps, dust, etc. were classified as acceptable. Frames containing objects such as concrete lumps, metal plates and safety gear were accurately classified as unacceptable by the model. Finally, the CNN model was found to have a fast inference time of 8.1 milliseconds per image, making it highly efficient for practical use.

The findings presented in this study provide a promising initial framework for understanding operations in the conveyor belt environment. However, it is crucial to acknowledge the restricted scope of the dataset used in this research, which may not comprehensively reflect all operational events, particularly unanticipated ones. Such unforeseen circumstances occur frequently in the surroundings of the conveyor belt, highlighting the need for further data collection and retraining of the model.


Fig. 7 Training and validation accuracy and loss per training step

Table 2 Metrics of the CNN

Metric            Value
Accuracy          0.97
PPV               0.95
Recall            0.99
Processing time   8.1 ms/image

4 Conclusions and Recommendations

In this paper, an object detection model was presented for identifying hazardous materials in conveyor transport systems using data collected from a video stream. The model comprised a two-step classification algorithm, with the first step being a primitive brightness and contour detection approach that used different colour


channels and grayscale images, while the second step was a CNN model that classified the detections as either acceptable or unacceptable. Although the primitive approach detected many test objects placed on the belt, it also produced several false positives, such as flares, due to its intensity-based methodology. The use of colour channels assisted in the detection of coloured objects, but dark objects were not detected because of their low intensity. Further development is needed to enhance the detection of dark and less reflective objects. The second classification step, the CNN model, produced excellent results, effectively eliminating false positives and accurately detecting hazardous objects. Since the solution is a two-step model, the first step can be more aggressive, knowing that false positives can be eliminated by the second step. This will enable operators to stop the belt when alerted in the SaaS dashboard, reducing unplanned downtime and costs and preventing damage and contamination.

While the results are promising, future work can consider several recommendations. For example, the CNN model may be used without the first classification step, increasing the likelihood of detecting dark objects, simplifying the process, and reducing the required computing power. As the system continues to operate, more data can be collected, and with operator feedback and ground truth assessment, the CNN can be retrained and refined for further unknown situations that the conveyor belt system may present.

Acknowledgements The authors of this paper would like to thank Tord Arnóy and the operators at LKAB Narvik for their contributions, insights and help with testing.

References

1. Zhang M, Shi H, Zhang Y, Yu Y, Zhou M (2021) Deep learning-based damage detection of mining conveyor belt. Measurement 175:109130. https://www.sciencedirect.com/science/article/pii/S0263224121001561
2. Al-Kahwati K, Birk W, Nilsfors EF, Nilsen R (2022) Experiences of a digital twin based predictive maintenance solution for belt conveyor systems. PHM Soc Eur Conf 7:1–8. https://doi.org/10.36001/phme.2022.v7i1.3355
3. Al-Kahwati K, Saari E, Birk W, Atta K (2021) Condition monitoring of rollers in belt conveyor systems. In: 2021 5th international conference on control and fault-tolerant systems (SysTol), pp 341–347
4. Alvarado-Robles G, Osornio-Ríos RA, Solís-Muñoz FJ, Morales-Hernández LA (2021) An approach for shadow detection in aerial images based on multi-channel statistics. IEEE Access 9:34240–34250
5. Maison TL, Luthfi A (2019) Retinal blood vessel segmentation using gaussian filter. J Phys: Conf Ser 1376(1):012023. https://dx.doi.org/10.1088/1742-6596/1376/1/012023
6. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/10.1038/nature14539
7. Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel MA, Al-Amidie M, Farhan L (2021) Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data 8


8. Suzuki S, Abe K (1985) Topological structural analysis of digitized binary images by border following. Comput Vis Graph Image Process 30(1):32–46. https://www.sciencedirect.com/science/article/pii/0734189X85900167
9. Kingma DP, Ba JL (2015) Adam: a method for stochastic optimization. In: 3rd international conference on learning representations, ICLR 2015—conference track proceedings

Rotor and Bearing Fault Classification of Rotating Machinery Using Extracted Features from Experimental Vibration Data and Machine Learning Approach

Khalid M. Al Mutairi, Jyoti K. Sinha, and Haobin Wen

Abstract Earlier studies have optimised the vibration-based parameters to identify only the rotor defects in rotating machines. An artificial neural network (ANN) model was used earlier to classify the faults. In this research, the earlier optimised parameters are further examined for both rotor and bearing defects, and they are slightly modified to accommodate bearing defects. The paper presents the study using experimental vibration data from a laboratory-scale rig.

Keywords Rotating machines · Vibration-based condition monitoring · Machine learning · Rotor faults · Bearing faults

1 Introduction

Faults in rotating machinery are inescapable because of many factors, such as errors in manufacturing, the provision of tolerances on mating parts and human error during the assembly of the different parts of the system. Furthermore, operating conditions such as the generation of heat and general wear and tear may also result in the occurrence of rotor faults. The most common rotor faults that cause rotor vibration are rotor unbalance, rotor/coupling misalignment and rotor-to-stator rubbing [1–4]. The vibration response of rotating machines is sensitive to any change in structural parameters. Moreover, vibration behaviour due to rotor defects varies depending

K. M. Al Mutairi (B) · J. K. Sinha · H. Wen
School of Mechanical, Aerospace and Civil Engineering (MACE), The University of Manchester, Manchester M13 9PL, UK
e-mail: [email protected]

J. K. Sinha
e-mail: [email protected]

H. Wen
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_40


on the nature of the fault. Hence, it can be said that mechanical failures can be indicated early using vibration signals. Therefore, vibration-based condition monitoring (VCM) has become a well-accepted method for diagnosing faults in rotating machinery. Generally, VCM is performed by installing several vibration sensors, such as accelerometers, at the individual bearing locations of a monitored machine [5].

The VCM of rotating machinery has received intense research interest in the last three decades. Muszynska [6] has highlighted the advantages of VCM as a part of maintenance activities. Edwards et al. [7] have given an overview of the VCM of rotating machines. Sinha [8] has presented a comprehensive literature review of the VCM in rotating machines. Several studies in the literature have investigated the ability to automate the vibration-based fault diagnosis process by using artificial intelligence (AI) techniques, such as Fuzzy Logic methods and Artificial Neural Networks (ANN) [9]. Model-based fault diagnosis techniques seem to be the most promising VCM methods, as they have the ability to identify the type, severity, and location of several machine faults [10–12]. An expert system can be created by applying different diagnostic reasoning strategies [13].

Much interest has been shown in applying AI techniques in machine condition monitoring in the last few years. One prominent approach in the literature is the ANN [14–16]. It can be defined as a non-linear statistical data modelling tool that simulates the neural structure of the human brain (nerves and neurons) and learns from experience (i.e., it is adaptive) [17]. Graupe [18] has provided a general introduction to ANNs. It has been shown that ANNs can identify and classify various faults in rotating machinery, such as rotor unbalance, crack, and shaft bow [19–21]. Furthermore, ANNs have been used to detect bearing faults [22, 23]. Kolar et al. [24] presented a novel deep-learning-based, data-driven fault diagnosis technique for rotating machines, in which the raw three-axis accelerometer signal is fed as a high-definition 1D matrix into the feature-extractor part of a convolutional neural network (CNN), enabling high classification accuracy; the trained CNN then classifies each sample into one of four classes. Espinoza-Sepulveda and Sinha [25] have proposed a rotor fault identification method that relies on rotor vibration measurements from an experimental lab rig and implemented an artificial intelligence (AI)-based machine learning (ML) model.

The earlier study [25] focused on optimising vibration-based parameters for identifying rotor defects only and used the artificial neural network (ANN) model for classification. However, there is a need to examine the effectiveness of these parameters for detecting both rotor and bearing defects. In this paper, the same parameters are explored using measured vibration data from a laboratory-scale rig for both rotor and bearing faults; a couple of the parameters are slightly modified to accommodate bearing defects. The paper presents the ANN model, the optimised vibration-based parameters, and the observations on the detection of both rotor and bearing faults.


2 Machine Learning (ML) Model Approach [25]

The faults in a rotating machine can be diagnosed using the Vibration-based Machine Learning (VML) model created by employing the artificial neural network (ANN) technique. An ANN is a modelling tool that learns from experience (i.e., it is adaptive). ANNs are knowledge-based systems created by a training procedure that establishes a link between symptoms and their corresponding causes [26]. This research uses a multi-layered perceptron (MLP) network structure with four hidden layers between the inputs and the outputs [27]. The MLP is used for pattern recognition, taking the extracted features as inputs for classification. To obtain accurate performance, the network parameters, such as the number of layers, the number of neurons, and the types of functions employed at the various stages, are established and tuned through iterations. The output of these iterations is a feedforward network with four hidden layers. Layers 1–4 contain 1000, 1000, 100 and 10 non-linear neurons, respectively, as illustrated in Fig. 1. The ANN model is kept precisely the same as that used earlier [25]. The first function employed is the transfer function at the hidden neurons, namely the hyperbolic tangent sigmoid [28]. The second selected function is the transfer function at the output neurons (i.e., SoftMax) [29]. The third and fourth functions are the training function (i.e., scaled conjugate gradient back-propagation) and the performance function (i.e., cross-entropy). The input layer feeds the hidden layers, and the result is obtained in the decision layer, which has 5 possible outputs: healthy, misalignment, shaft crack, rub and bearing-related fault.

The selected parameters from the vibration data are grouped into three different data sets. The first data set is responsible for training the ML model and modifying the weights per the learning rule. This set forms 70% of the samples for each machine condition. The validation process is accomplished using 15% of the samples by verifying the trained model with these samples until their classification error reaches

Fig. 1 Typical multi-layer perceptron (MLP) neural network used in the study [25]


Table 1 Samples of different rotating machine conditions at a speed of 15 Hz (900 RPM)

Machine condition                                        No. of runs
Healthy (residual unbalance and residual misalignment)   40
Misalignment                                             40
Shaft crack                                              80
Rub                                                      40
Faulty bearing                                           40

an allowable limit, at which point the training process is stopped. Reaching this stage means that the weights are optimal for the network. Finally, the third group of data (i.e., 15%) is used for testing, leading to the generalisation of the model. The data sets per rotor speed are depicted in Table 1. Equation (1) is employed to compute the performance of the ML model:

$$\text{Performance}\,(\%) = \left( \frac{\text{no. of correct classifications}}{\text{total no. of inputs}} \right) \times 100\% \qquad (1)$$
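A Keras approximation of the described MLP is sketched below. The layer sizes, transfer functions, and cross-entropy performance function follow the text; Keras offers no scaled conjugate gradient optimizer, so Adam is used here as a stand-in, and the input width of 24 corresponds to the feature vector introduced in Sect. 5.

```python
# Hedged sketch of the four-hidden-layer MLP (Adam replaces the scaled
# conjugate gradient training function, which Keras does not provide).
import tensorflow as tf
from tensorflow.keras import layers

mlp = tf.keras.Sequential([
    layers.Input(shape=(24,)),                   # 6 parameters x 4 bearings
    layers.Dense(1000, activation="tanh"),       # hyperbolic tangent sigmoid
    layers.Dense(1000, activation="tanh"),
    layers.Dense(100, activation="tanh"),
    layers.Dense(10, activation="tanh"),
    layers.Dense(5, activation="softmax"),       # healthy, misalignment,
])                                               # crack, rub, faulty bearing
mlp.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",  # cross-entropy
            metrics=["accuracy"])
# Split: 70% training, 15% validation (training stops when the validation
# classification error reaches an allowable limit), 15% testing for the
# generalisation estimate of Eq. (1).
```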

3 Laboratory Rig

The measured vibration data from an earlier experiment are used in the current study [30]. Figure 2 displays a schematic diagram, and Fig. 3 shows a photograph of the experimental rig. The experimental rig consists of two 20 mm diameter stainless steel shafts of two different lengths. The first shaft (SH1) is 1 m long, and the second shaft (SH2) is 0.5 m long. The shafts are attached through a rigid coupling and supported by four grease-lubricated ball bearings.

Fig. 2 Schematic drawing of the experimental rig


Fig. 3 The experimental rig setup

Each bearing is secured on a flexible bearing pedestal, and each bearing pedestal is secured to a steel base bolted onto the base structure. The rotor-bearing-foundation system is connected to a three-phase motor through a flexible coupling to drive the rotor at different speeds. The motor's power is 0.75 kW, and its maximum speed is 3000 rpm. Shaft SH1 carries two balancing discs, while shaft SH2 carries one balancing disc. The balancing discs have a diameter of 125 mm and a thickness of 14 mm.

4 Analysis of Measured Data

The vibration data were measured at a rotor speed of 900 RPM (15 Hz) for the machine conditions: healthy (only residual unbalance and misalignment), misalignment, crack in the shaft, rotor rub and faulty bearing (B2). The measured data consist of vibration acceleration responses of the bearing housings (B1 to B4) at an angle of 45° from the horizontal direction [30]. A sampling frequency of 10,000 Hz was used for the measured vibration data [30]. Table 1 shows the summary of the measured vibration data.

It is well known in industry that residual unbalance and misalignment are always likely in the healthy machine condition. This presence of rotor unbalance and misalignment leads to the appearance of peaks at the 1× (frequency synchronous to


rotor speed) and 2× speed, respectively. The presence of other malfunctions in the rotor and/or bearings will lead to the appearance of more peaks at the higher harmonics of the rotor speed frequency, and possibly other frequency peaks, in the vibration spectra. Figure 4 shows typical vibration velocity amplitude spectra at bearing B2 for the different faults, with peaks at 1× and a few harmonics; therefore, experience and knowledge are required to make the fault diagnosis. The spectrum for the bearing B2 fault in Fig. 4 does not reveal the presence of the bearing fault. Hence, envelope spectrum analysis is also carried out. Typical envelope spectra in the acceleration domain at bearing B2 are shown in Fig. 5 for all faulty machine conditions. For the envelope analysis, the acceleration vibration signals are filtered by a bandpass filter of 2000–5000 Hz to remove the rotor-related harmonic frequencies and other higher frequencies from the measured acceleration signals. The Hilbert transform is then applied to the filtered signals to obtain the envelope data, from which the envelope spectrum is calculated. Figure 5e clearly shows peaks at 5.6 Hz and its second harmonic at 11.2 Hz. The frequency of 5.6 Hz is the bearing cage fault frequency; hence, this confirms that bearing B2 has a cage fault. No bearing characteristic frequency peaks are observed for the other machine conditions, so there are no bearing defects in those cases.
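The envelope analysis can be reproduced with SciPy roughly as follows. The filter order is an assumption, and since the 5000 Hz upper band edge coincides with the Nyquist frequency at the 10 kHz sampling rate, the sketch uses an equivalent 2000 Hz high-pass filter.

```python
# Sketch of the envelope spectrum computation (filter order assumed; a
# high-pass at 2000 Hz is equivalent to the stated 2000-5000 Hz band,
# because 5000 Hz is the Nyquist frequency at fs = 10 kHz).
import numpy as np
from scipy import signal

def envelope_spectrum(acc: np.ndarray, fs: float = 10_000.0):
    sos = signal.butter(4, 2000, btype="highpass", fs=fs, output="sos")
    filtered = signal.sosfiltfilt(sos, acc)        # remove rotor harmonics
    env = np.abs(signal.hilbert(filtered))         # Hilbert envelope
    env -= env.mean()                              # drop the DC component
    spec = np.abs(np.fft.rfft(env)) * 2 / len(env)
    freqs = np.fft.rfftfreq(len(env), d=1 / fs)
    return freqs, spec

# For the faulty bearing B2, the returned spectrum shows peaks at the
# 5.6 Hz cage fault frequency and its second harmonic at 11.2 Hz.
```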

5 Data Preparation and Application of the ML Model

The optimised vibration parameters, in both the time and frequency domains, used by Sepulveda and Sinha [25] are again used in the present study. These parameters are the acceleration root mean square (RMS) value, the kurtosis of the acceleration data (K), the velocity vibration amplitudes at the 1×, 2×, and 3× frequencies from the velocity vibration spectra, and the velocity spectrum energy (SE). The following two modifications are made to the optimised parameters:

(a) The frequency range of the SE is extended in the current study to include the effects of subharmonics and any other frequency components due to rub and bearing faults related to the bearing assembly and housing resonance frequency. The range from 0.3 times to 333 times the rotational speed of the shaft is included in the spectrum energy SE in order to account for the effect of bearing and shaft rub faults.

(b) The kurtosis (FK) of the acceleration vibration data is calculated using the band-pass filtered vibration signals. A band-pass filter between 2000 and 5000 Hz is used so that the vibration signals contain only the bearing-related frequencies.

The databank, Data, is constructed as follows:

$$\text{Data} = \left[ \text{Data}_H\ \ \text{Data}_M\ \ \text{Data}_C\ \ \text{Data}_R\ \ \text{Data}_{BF} \right]^T \qquad (2)$$


Fig. 4 Typical velocity vibration spectra at bearing 2 (B2) for the different machine faulty conditions


Fig. 5 Typical envelope spectra at bearing 2 (B2) for faulty machine conditions


where $\text{Data}_H$ is the databank for the healthy condition, $\text{Data}_M$ is the databank for the faulty misalignment condition, $\text{Data}_C$ is the databank for the faulty cracked shaft condition, $\text{Data}_R$ is the databank for the faulty rotor rub condition, and $\text{Data}_{BF}$ is the databank for the bearing fault condition. Each databank consists of the parameters from each run and is arranged as per Eq. (3):

$$\text{Data}_H = \left[ H_1\ H_2\ \ldots\ H_{40} \right]^T, \quad \text{Data}_M = \left[ M_1\ M_2\ \ldots\ M_{40} \right]^T, \quad \text{Data}_C = \left[ C_1\ C_2\ \ldots\ C_{80} \right]^T,$$

$$\text{Data}_R = \left[ R_1\ R_2\ \ldots\ R_{40} \right]^T, \quad \text{Data}_{BF} = \left[ BF_1\ BF_2\ \ldots\ BF_{40} \right]^T \qquad (3)$$

where the subscripts 1, 2, 3, … represent the machine run numbers (or sample numbers) as per Table 1. The parameters per run (24 elements: 6 parameters per bearing × 4 bearings) for the healthy machine condition are arranged as per Eq. (4):

$$H_i = \left[ RMS_{B1,i}\ \ FK_{B1,i}\ \ 1\times_{B1,i}\ \ 2\times_{B1,i}\ \ 3\times_{B1,i}\ \ SE_{B1,i}\ \ \ldots\ \ RMS_{B2,i} \ldots SE_{B2,i}\ \ RMS_{B3,i} \ldots SE_{B3,i}\ \ RMS_{B4,i} \ldots SE_{B4,i} \right] \qquad (4)$$

where $i$ is the run number (sample number) of the healthy machine condition. The databanks for the other machine fault conditions are arranged similarly. Once the databanks are prepared, the ML model is applied, as discussed in Sect. 2.
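A sketch of the per-bearing feature extraction is given below. The conversion from the acceleration spectrum to a velocity spectrum by frequency-domain integration is an assumption (the paper does not state how the velocity spectra were obtained), as is the filter order.

```python
# Hedged sketch of the six features per bearing signal used in Eq. (4).
import numpy as np
from scipy import signal, stats

def extract_features(acc: np.ndarray, fs: float, speed_hz: float) -> list:
    rms = np.sqrt(np.mean(acc ** 2))                 # acceleration RMS
    # Filtered kurtosis FK on the 2000-5000 Hz band (here: >= 2000 Hz,
    # since 5000 Hz is the Nyquist frequency at fs = 10 kHz)
    sos = signal.butter(4, 2000, btype="highpass", fs=fs, output="sos")
    fk = stats.kurtosis(signal.sosfiltfilt(sos, acc), fisher=False)
    # Velocity spectrum via frequency-domain integration (assumption)
    n = len(acc)
    f = np.fft.rfftfreq(n, d=1 / fs)
    acc_spec = np.abs(np.fft.rfft(acc)) * 2 / n
    vel_spec = np.zeros_like(acc_spec)
    vel_spec[1:] = acc_spec[1:] / (2 * np.pi * f[1:])
    def amp_at(k):                                   # k-th harmonic amplitude
        return vel_spec[np.argmin(np.abs(f - k * speed_hz))]
    # Spectrum energy SE over 0.3x to 333x of the shaft speed
    band = (f >= 0.3 * speed_hz) & (f <= 333 * speed_hz)
    se = np.sum(vel_spec[band] ** 2)
    return [rms, fk, amp_at(1), amp_at(2), amp_at(3), se]

# Concatenating the six features for bearings B1-B4 gives the 24-element
# run vector H_i of Eq. (4).
```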

6 Results

It has been observed that the overall performance in all stages (training, validation, and testing) is 100% for fault detection. It is evident from the result that the time domain parameters (acceleration RMS and filtered kurtosis) and the frequency domain parameters (1×, 2× and 3× amplitudes of the velocity spectra and the spectrum energy) are good indicators of the machine dynamics. Table 2 presents the overall performance of the ML model discussed in Sect. 2. The results clearly show that the VML model is capable of identifying both the rotor and bearing faults with 100% accuracy.


Table 2 Overall performance of the VML model

                  Target class
Output class      Healthy   Misalignment   Shaft crack   Rub   Faulty bearing
Healthy           100       0              0             0     0
Misalignment      0         100            0             0     0
Shaft crack       0         0              100           0     0
Rub               0         0              0             100   0
Faulty bearing    0         0              0             0     100

7 Concluding Remarks

The optimised vibration parameters and the ANN-based VML model developed earlier are useful for detecting faults in both the rotors and bearings of rotating machines. To include bearing faults, a couple of parameters have been slightly modified. The methodology is validated using data from an experimental rotating rig, and the fault classification has been observed to be 100% accurate. These results demonstrate the potential of this method for industrial applications.

Acknowledgements Jyoti K. Sinha acknowledges his Ph.D. student Luwei for the development of the rig and for the experimental data used in this study. Khalid M. Almutairi acknowledges the scholarship sponsored by the Government of the Kingdom of Saudi Arabia to study in the UK.

References

1. Verma A, Sarangi S, Kolekar MH (2014) Experimental investigation of misalignment effects on rotor shaft vibration and on stator current signature. J Fail Anal Prev 14(2):125–138
2. Prabhakar S, Sekhar AS, Mohanty AR (2001) Vibration analysis of a misaligned rotor—coupling—bearing system passing through the critical speed. Proc Inst Mech Eng Part C: J Mech Eng Sci 215(12):1417–1428
3. Patel TH, Darpe AK (2009) Experimental investigations on vibration response of misaligned rotors. Mech Syst Signal Process 23(7):2236–2252
4. Feng ZC, Zhang X-Z (2002) Rubbing phenomena in rotor–stator contact. Chaos Solitons Fractals 14(2):257–267
5. Sinha JK (2020) Industrial approaches in vibration-based condition monitoring. CRC Press
6. Muszynska A (1995) Vibrational diagnostics of rotating machinery malfunctions. Int J Rotating Mach 1(3–4):237–266
7. Edwards S, Lees A, Friswell M (1998) Fault diagnosis of rotating machinery. Shock Vib Dig 30(1):4–13
8. Sinha JK (2002) Health monitoring techniques for rotating machinery. University of Wales Swansea, Swansea, UK
9. Worden K, Staszewski WJ, Hensman JJ (2011) Natural computing for mechanical systems research: a tutorial overview. Mech Syst Signal Process 25(1):4–111


10. Bachschmid N, Pennacchi P (2002) Multiple fault identification method in the frequency domain for rotor systems. Shock Vib 9(4–5)
11. Bachschmid N, Pennacchi P (2003) Accuracy of fault detection in real rotating machinery using model based diagnostic techniques. JSME Int J Ser C 46(3):1026–1034
12. Bachschmid N et al (2000) Accuracy of modelling and identification of malfunctions in rotor systems: experimental results. J Braz Soc Mech Sci 22:423–442
13. Chasalevris AC (2009) Vibration analysis of nonlinear-dynamic rotor-bearing systems and defect detection. PhD thesis, Department of Mechanical Engineering and Aeronautics, University of Patras
14. Rao BKN, Pai PS, Nagabhushana TN (2012) Failure diagnosis and prognosis of rolling element bearings using artificial neural networks: a critical overview. In: 25th international congress on condition monitoring and diagnostic engineering (Comadem 2012)
15. Fast M (2010) Artificial neural networks for gas turbine monitoring. Division of Thermal Power Engineering, Department of Energy Sciences, Faculty of Engineering, Lund University
16. Mayes IW (1994) Use of neural networks for online vibration monitoring. Proc Inst Mech Eng Part A-J Power Energy 208(A4):267–274
17. Haykin SS (1999) Neural networks: a comprehensive foundation. Prentice Hall
18. Graupe D (2007) Principles of artificial neural networks. World Scientific, Singapore
19. McCormick AC, Nandi AK (1997) Real-time classification of rotating shaft loading conditions using artificial neural networks. IEEE Trans Neural Netw 8(3):748–757
20. Tao Y, Qingkai H (2010) Crack fault identification in rotor shaft with artificial neural network. In: Sixth international conference on natural computation (ICNC)
21. Srinivas H, Srinivasan K, Umesh K (2010) Application of artificial neural network and wavelet transform for vibration analysis of combined faults of unbalances and shaft bow. Adv Theor Appl Mech 3(4):159–176
22. Liu TI, Mengel JM (1992) Intelligent monitoring of ball bearing conditions. Mech Syst Signal Process 6(5):419–431
23. Li B, Goddu G, Mo-Yuen C (1998) Detection of common motor bearing faults using frequency-domain vibration signals and a neural network based approach. In: Proceedings of the American control conference
24. Kolar D, Lisjak D, Pająk M, Pavković D (2020) Fault diagnosis of rotary machines using deep convolutional neural network with wide three axis vibration signal input. Sensors 20(14):4017
25. Sepulveda NE, Sinha J (2020) Parameter optimisation in the vibration-based machine learning model for accurate and reliable faults diagnosis in rotating machines. Machines 8(4):66
26. Vyas NS, Satishkumar D (2001) Artificial neural network design for fault identification in a rotor-bearing system. Mech Mach Theory 36(2):157–175
27. Tarassenko L (1998) A guide to neural computing applications. Elsevier, Amsterdam, The Netherlands
28. Vogl TP, Mangis JK, Rigler AK, Zink WT, Alkon DL (1988) Accelerating the convergence of the back-propagation method. Biol Cybern 59(4–5):257–263. https://doi.org/10.1007/bf00332914
29. Bishop CM (2006) Pattern recognition and machine learning. Springer
30. Luwei K (2022) Vibration-based fault identification for rotor and ball bearing in rotating machines. PhD thesis, The University of Manchester, UK

Are We There Yet?—Looking at the Progress of Digitalization in Maintenance Based on Interview Studies Within the Swedish Maintenance Ecosystem

Mirka Kans

Abstract Industry 4.0 promises huge effects on industrial performance once critical equipment is fitted with sensors and interconnected, and big data sets and digital twins are established that allow for advanced data analytics using machine learning, cognitive computing, and information visualization techniques. Maintenance is an area of industrial activity that would greatly benefit from the implementation of Industry 4.0. But how far has the digital transformation progressed? In 2018, an interview study was performed with 14 representatives within the maintenance ecosystem during the Nordic maintenance fair held in Gothenburg. A similar study was performed at the fair held in 2022, in which 22 actors representing system providers, computerized maintenance management suppliers, researchers, and educators participated. The aim of the studies was to get a broad view of maintenance in the digital era, covering topics such as enabling technologies, challenges, and opportunities. This paper reports on the similarities and differences in the results of the two interview studies and draws conclusions on the progress and directions of digitalization in maintenance. The findings suggest that progress is rather slow. Data management and decision-making capabilities form the basis for the digitalization of maintenance. The focus on sensor technology has been somewhat reduced, although the 2018 prediction was that it would increase; instead, the ability to communicate and share information is stressed. Advanced analytical capabilities are foreseen to have a breakthrough in five years' time, as are technologies for data gathering and communication. The challenges are mainly the same, i.e., related to competence, leadership, and strategy. This suggests that, to enable the digital transformation, we should focus on formulating appropriate business cases and initiating pilot studies, supporting the implementation process and involving all people in the change, and securing competence and skills by training, education, and the recruitment of young people to maintenance positions.

M. Kans (B)
Chalmers University of Technology, Vera Sandbergs Allé 8, SE-412 96 Göteborg, Sweden
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_41


Keywords Maintenance digital transformation · Digitalization in maintenance · Industry 4.0 · Interview study · Comparative study · Challenges and opportunities · Swedish maintenance ecosystem

1 Introduction

The digitalization of maintenance started over 40 years ago [1], and the term condition-based maintenance (CBM) has been used for about the same amount of time [2]. CBM is a data-driven and knowledge-based maintenance strategy that is enabled by emerging technologies. Industry 4.0 has been a hot topic for practitioners as well as researchers over the past decade and is seen as an industrial revolution characterized by smart systems and Internet-based solutions that allow for creating effective, integrated, and flexible production and information flows [3–5]. Maintenance 4.0 can be viewed as a subset of Industry 4.0 allowing for efficient and automatic management of maintenance through automatic data collection, analysis, visualization, and decision making [6]. Emerging technologies of Industry 4.0 comprise information and communication technologies as well as electronics and process control technology [7], which provide the organization with the digital capabilities of (1) connecting and storing, (2) understanding and acting, and (3) predicting and self-optimizing [8].

Connecting and storing are the basic capabilities of Industry 4.0, enabled by the Internet of Things (IoT), i.e., objects fitted with sensors and processors that communicate and interact with each other [9, 10]. IoT enables the collection of large amounts of data, referred to as Big Data (BD). Mobile computing and Cloud Computing (CC) serve as the backbone for data collection and storage [6]. The term Cyber-Physical System (CPS) is often used to describe the total system that is connected as a network but shares the physical world, such as hardware, with a variety of communication systems that interact in the physical world [11]. Data-carrying devices such as Radio Frequency IDentification (RFID) make it possible to trace products and components through the manufacturing process [12].

Understanding and acting capabilities are those that give the organization the ability to monitor and control different processes. The enormous amount of data created by digitalized systems for connecting and storing requires processing capabilities such as Machine Learning (ML) algorithms or other AI applications [6, 10, 12]. The purpose of AI is to mimic the brain's ability to plan, draw conclusions, and solve problems [13]. Machine learning is the use of mathematical or statistical methods for pattern recognition in big data sets [14–16] and provides opportunities for improved failure diagnostics and maintenance planning [17]. For optimized decision-making, the information should be presented in a form that is recognized and understood by a human. Visualization reinforces human cognition by presenting large amounts of qualitative or quantitative data [18].

Predicting and self-optimizing capabilities are advanced analytical capabilities that enable the organization to work predictively and prescriptively. This is enabled by AI technologies such as Deep Learning (DL), which is an advanced form of machine


learning using neural network methods. Self-optimizing machines could take the form of automated planning and scheduling, self-adapting and self-maintaining machinery, or intelligent robotics [12, 19]. Emerging technologies could also support the practical execution of maintenance. Augmented reality (AR) and virtual reality (VR) enable users, such as maintenance technicians, to interact with a virtual environment for training purposes or for supporting the execution of complex tasks [18, 19].

The main purpose of this paper is to understand the opportunities as well as the challenges faced in maintenance with respect to emerging Industry 4.0 technologies, and how to support the digital transformation in the best way. To this end, two interview surveys were conducted, in 2018 and 2022, aiming at creating a broad view of maintenance in the digital era from various Swedish actors, such as technology providers, computerized maintenance management suppliers, consultants, researchers, and educators. The paper disposition is as follows: Sect. 2 presents the study setup and Sect. 3 the main results from the two studies, ending with a comparative and forward-looking discussion based on the study results. General conclusions as well as areas of future research are given in Sect. 4.

2 Method

In 2018, an interview study was performed with representatives within the maintenance ecosystem during the major bi-annual Nordic maintenance fair held in Gothenburg. The fair is the largest maintenance-related event in Northern Europe, with over 300 exhibitors and 11,000 visitors, and attracts companies, practitioners as well as researchers [20]. The aim of the study was to get a broad view of maintenance in the digital era, covering topics such as enabling technologies, challenges, and opportunities. The study participants were 14 actors representing system providers, computerized maintenance management suppliers, researchers, and educators. A similar study was performed at the Nordic maintenance fair held in 2022, in which 22 actors representing system providers, computerized maintenance management suppliers, researchers, and educators participated. The study participants are summarized in Table 1. Note: in the table, the total number (41) is higher than the number of participants (36) as some participants represent more than one actor type.

The interviews lasted about 30 min each and were based on four predetermined, open-ended questions:

1. Which technology do you view as the most important today for developing maintenance, and within which area (planning, preparation, execution, follow-up, improvement of maintenance)?
2. Which technology will have made its breakthrough in five years' time, and which area will have developed most?
3. Which are the biggest digital challenges in maintenance?
4. How can the digital development best be facilitated?


Table 1 Study participants

Actor type                          2018   2022   Total
Maintenance consulting services     3      6      9
Technology/IT consulting services   1      4      5
Supplier of IT solutions            4      6      10
Supplier of technology solutions    0      4      4
Supplier of products                2      0      2
Trade organization                  1      0      1
Education                           3      2      5
Research                            2      3      5

In the 2022 interview study, the following question was added:

5. Imagine the scenario that we have achieved full Industry 4.0. What new challenges will we encounter?

The participants were free to give other comments upon the subject as well.

3 Results

The results are presented in the same order as the questions. Section 3.1 comprises the results from questions one and two, Sect. 3.2 the results from question three, and Sect. 3.3 the results from question four. Question five, which was included in the 2022 study, is used as a basis for the comparative discussion in Sect. 3.4.

3.1 Enabling Technologies in Maintenance

3.1.1 Enabling Technologies of Today (2018 and 2022 Respectively)

Digitalization affects all areas of maintenance management, according to the participants in 2018. Planning in particular is enabled by digital solutions, but also the follow-up and improvement of maintenance. Many participants viewed digitalization as the way to reach condition-based and predictive maintenance, but also described a reality where sensor data were hard to utilize efficiently for the planning or improvement of maintenance. Technologies that enable condition-based maintenance (CBM) were the most commonly mentioned enabling technologies in 2018. A wide set of technologies was mentioned, of which the following three were the most frequent:

• Sensor technology
• Machine learning
• Visualization


These technologies support a CBM approach: the ability to collect real-time data from the production and machines is the basis for advanced analysis of the data. Advanced planning engines, using machine learning or other artificial intelligence solutions to analyze the large data sets retrieved from production, will enable the transformation of maintenance planning from calendar-based to predictive and condition-based maintenance, according to the study participants. The ability to analyze large sets of data and present the results in an understandable way for internal as well as external actors is enabled by visualization. In summary, all three types of capabilities were emphasized in the 2018 study.

The participants of the 2022 study emphasized the Computerized Maintenance Management System (CMMS) as the backbone of effective maintenance planning. Having a solid digital basis in the form of a CMMS that is integrated with other systems is seen as the main enabler. The largest impact of digitalization is seen in the areas of follow-up and improvement, though. The three most frequently mentioned technologies in 2022 were:

• CMMS
• Mobile devices
• Internet of Things (IoT)

The capabilities to connect and store information are supported by these technologies. Communication technology and data processing were also mentioned by several participants. While some respondents argued that building a sound digital base is most important, others viewed artificial intelligence (AI) and IoT as important complements to the CMMS, mainly for building understanding and acting capabilities. Some participants mentioned the ability to predict, or to move from breakdown to preventive maintenance. Many saw all Industry 4.0 technologies as potentially important, and one respondent explained: "Perhaps the important thing is not which technology, but to start using one of them."

3.1.2 Enabling Technologies of Tomorrow

In 2018, the study participants depicted a near future where Industry 4.0 has become a reality in maintenance. The maintenance strategy of tomorrow is highly condition-based and predictive. IoT supports the collection and management of sensor data, while ML, deep learning, visualization, and digital twins are used for big data analytics. Two areas were pointed out as being most developed by digitalization: execution and follow-up. Mobile devices and Augmented Reality (AR) support the maintenance technicians in their daily work. New or complex tasks can be monitored and guided from a remote monitoring centre. Planning and follow-up of operations is performed remotely. Everything is connected, creating vulnerability in the system; thus, safety is a key concept.

In 2022, planning and improvement were the areas with the largest future impact of digitalization, according to the participants. This is reflected in the enabling technologies that were most frequently mentioned: technologies for big data


analytics, such as ML and AI, and for communication, such as machine-to-machine communication and IoT. One participant mentioned the connection of people as important as well. Although not in focus as the areas that will have developed most, the participants recognize AR and mobile solutions as important digital support for maintenance technicians in the future. One participant claimed that increased use of AR and VR solutions within five years is not realistic: there are several issues that must be solved for technologies that are to be worn by people, but these issues may be solved in ten years' time.

3.2 Digital Challenges in Maintenance

In the 2018 study, the main challenges for efficient digitalization were related to the organization and people, such as the inability to connect technology investments to business needs, unwillingness to change, and lack of competence. Amongst the issues connected with investment decisions, it is hard to identify business cases that utilize digitalization in an efficient manner, hard to translate digital opportunities into business opportunities, and hard to know where to start the digital transformation. The inability to understand how to make business out of digitalization relates to strategy and leadership. Unwillingness to change and lack of competence are mainly related to people and culture. Low levels of formal as well as real competence affect the possibilities to implement digital solutions. The high age of personnel was also mentioned. Rigid cultures and manual maintenance management were seen as problematic as well. It was noted that many maintenance departments did not have a CMMS implemented, which makes the gap, and the journey towards digitalization, huge. Technology fear and unwillingness to change add to the challenge. Issues with low integration of existing technologies, cybersecurity, and technologies with low user friendliness also hinder the implementation.

Organizational aspects connected with leadership and people were seen as the biggest challenges in 2022. The main challenge in leadership is being able to support the improvement and change work. The unwillingness and fear of change at all levels, from the end user to the management, was seen as a main hindrance by several participants. This is a leadership issue that is often combined with an unwillingness to invest or an inability to understand the importance of digitalization, i.e., a strategic aspect. Without top management support, there will be no room for investments and, thus, no incentives for change. It is also connected with the departmental culture that creates barriers and hinders information sharing between the maintenance department and other departments. The absolute biggest challenge is the lack of all kinds of competence, though. Digitalization requires new kinds of competences for handling technology and analyzing data, but basic maintenance-related competence is largely missing as well. Many respondents mentioned the generation shift and how hard it is to attract young people. Amongst the challenges connected with technology were system integration, cybersecurity, and lack of user friendliness. One participant mentioned poor collaboration with suppliers.


Table 2 summarizes the challenges identified in 2018 and 2022. We can note both similarities and differences between the answers. For the strategic aspects, the main theme is the inability to align business and maintenance strategy with suitable technology. This was seen as a managerial problem as well in 2018, and the managerial problems mainly appear before an implementation has taken place. In 2022, the leadership aspects emphasize challenges in the improvement and change processes. Something that is recurrent in 2018 and 2022 is the unwillingness to change; it is mentioned at the organizational, leadership, and people levels. The culture aspect differs between the studies: while culture is seen as a hindrance to digitalization in 2018, the hindrances connected with culture in 2022 regard internal cooperation and collaboration. The challenges related to people are much the same; there is a lack of competence and a lack of personnel. Unwillingness to change and technology fear are also mentioned both in 2018 and 2022. Security aspects are mentioned both in 2018 and 2022, as are the lack of system integration and low user friendliness.

3.3 Facilitating the Digital Transformation

In the study conducted in 2018, the participants saw companies, suppliers, educators as well as researchers as facilitators. Companies should first address the challenges at hand, for instance by improving the culture and leadership, and start the process somewhere, such as feeding data into existing systems to achieve better decision-making capabilities. Letting go and daring to try is one way to learn. Another way is to recruit new personnel. Young people like to be challenged and to work with problem solving. A key is therefore to promote maintenance as a function where problem-solving skills are needed. The suppliers could become better at describing available solutions and develop solutions that are compatible with existing systems. The suppliers and trade organizations were seen as important actors for aligning digitalization efforts in maintenance with business objectives; they could utilize their expert knowledge to help companies find good business cases and describe the positive impacts that are achieved with effective, digitalized maintenance strategies. Trade organizations and standardization bodies have an important role in developing new standards both within the maintenance area and for the technologies. Education was mentioned by several participants. Educators have an important role in attracting young people to maintenance-related education at all levels. One participant suggested standardizing the competence requirements according to the European qualifications for maintenance personnel. All actors should work together to facilitate the digitalization. To achieve this, sharing data, experiences and knowledge is essential. One example of shared data could be sensor data generated by a school or university that is open for everyone to download and analyze.

In 2022, two main themes are seen: how to prepare the organization for change and how to support the successful implementation of digital technologies. Both are primarily managerial problems addressing leadership, culture, and people. Amongst


Table 2 Challenges identified in 2018 and 2022

Strategy
2018: (1) Unclear what technology to invest in as the development is so fast (2) Hard to find the business cases (3) Inability to connect technology with current business processes (4) Low use of available technologies
2022: (1) Poor willingness to invest (2) Poor connection between technology and utilisation (3) Poor understanding of the importance of maintenance (4) To change from preventive to predictive maintenance strategies

Leadership
2018: (1) Unwillingness to change (2) The value of maintenance is not understood (3) Hard to convince decision makers that a system is useful (the maintenance representative understands, but has no authority to bring it further to the decision makers)
2022: (1) Poor leadership (2) Poor adaptability (3) Not working with continuous improvements (4) Poor support and management for improvements and change in the organisation (5) Top-down management of change (6) Not enough benchmarking, learning from each other (7) Older managers' unwillingness to digitize (8) Not understanding and carrying out the change completely

Culture
2018: (1) Culture and people are interconnected – an openness to technology development is lacking (2) Companies do not dare to start using digitalisation (3) Hard to get companies to use the technology – go from paper and pen to Excel! (4) Technology is not seen as an enabler (5) To overcome technology fear
2022: (1) A departmental culture that creates barriers and missing information (2) Culture of firefighting (3) Unwillingness to share data

People
2018: (1) Low level of competence (2) Lack of competence (such as technology competence) (3) Lack of social skills, cooperation problems (4) New types of jobs – from operations to control (5) Unwillingness to change (6) Getting employees to change their mind set (7) Technology fear (is seen as something that will replace personnel instead of supporting them) (8) Aging personnel
2022: (1) Poor digital knowledge and competence, e.g., understanding of computer/machine communication (2) Lack of competence in all areas (3) Technology fear (4) Being afraid of change, low trust (5) Hard and difficult with new things, to get humans on track – to keep pace with technology (6) Mindset of people, e.g., relying on technology to take care of everything (7) Acceptance and adaptability of technicians for new technology; sometimes, young managers might be unwilling to digitalize as well (8) Lack of personnel (9) Change of generation

Governance
2018: (1) Rigid mind-set regarding data security (2) Poor support systems
2022: (1) Cybersecurity

Technology
2018: (1) Intuitiveness of systems (2) New innovation such as Windows 95 is needed (3) A complete environment is lacking
2022: (1) System integration (2) Lack of user friendliness (3) Poor collaboration with suppliers

Amongst the suggestions for managing change are making the work meaningful for the personnel, trusting people, using ambassadors of digitalization, and describing digitalization in a pedagogic way. As one of the participants stated: "Administrative tasks are mainly forced on us and we should get something back from it." Adding new tasks for the personnel without giving them incentives is a hindrance, not an enabler. Having the right competence is a main facilitator for successful implementation. One participant explained that all personnel should have a basic understanding of, e.g., robotics and be able to use smartphones for easy data retrieval. This is achieved by education and competence development, e.g., through dynamic learning platforms, or by acquiring competence, e.g., by attracting young people and women. The latter has a positive impact on the culture as well. Other means to achieve successful implementation are management involvement, communication and information sharing, cooperation and networking, and applying a systems perspective. Good implementation practices will create trust, one participant mentioned. Other ways to facilitate digitalization are developing cheaper technology solutions and increasing user friendliness. A reliable internet connection was mentioned by one participant, as this is the foundation of the Industry 4.0 concept. The main drivers of change are the companies and the suppliers, according to the study participants. Education was the most frequently mentioned facilitator, though, which implies that educators play an important role as well.

3.4 Where Are We Headed?

At first glance, it might seem like the digital development in maintenance has stalled or even reversed. Digital capabilities of connecting and storing, understanding and acting, and predicting and self-optimizing create the backbone for predictive and prescriptive maintenance strategies. In 2018, the then-current enabling technologies were closely connected with these capabilities supporting CBM and predictive maintenance, and this was also depicted as the near future. In 2022, however, the connecting and storing capabilities were seen as enablers of today, while understanding and acting, and predicting and self-optimizing based on big data analytics, were seen as enablers of tomorrow. This might be interpreted as a change of mindset from a pure technology focus to a focus on utility first, where the


participants' answers of 2022 are closer to reality than the answers of 2018, which might describe the "wants and hopes" of the participants rather than the reality. It might also reflect a more sober view on how to approach the implementation of emerging technologies. Without a solid base, reflected in the connecting and storing capabilities, it is hard to achieve good results from advanced analytics, represented by the understanding and acting, and predicting and self-optimizing capabilities. The true utilization of emerging technologies in Sweden is, most likely, for connecting and storing, and understanding and acting, while only a few organizations have reached predicting and self-optimizing digital capabilities. Looking at the challenges, we see that, although many hindrances stay the same, there is a noticeable difference in challenges connected with leadership and culture aspects. In 2018, the managers struggled with getting improvement projects accepted due to the inability to express the benefits of the project. In 2022, however, the managers struggle with carrying out implementation projects. It seems like companies have started to implement emerging technologies! How to prepare the organization for change and how to support the successful implementation of digital technologies were also the main facilitating factors in the 2022 study. In a future where Industry 4.0 is fully implemented, the main challenges of securing maintenance and digital competence remain, according to the 2022 participants. Recruitment will be important in the future, just as it is today. One participant foresees a shift in maintenance tasks and the challenge of finding new tasks for the maintenance technicians. The most frequent and spontaneous answer was "Industry 5.0". It is obvious that the role of people will increase rather than decrease. Moreover, participants believe that the technology-related challenges will increase. For instance, the need to maintain all the emerging technologies is recognized. Cybersecurity will also be a challenge, and working with continuous improvement was seen as a challenge of the future as well. The benefits of maintenance digitalization seem to be understood in the future; instead, the challenge is to explain the impacts on sustainability. As one participant said: "To make it environmentally friendly, not only optimize machines to reduce energy and costs."

4 Conclusions

While it is positive that companies seem to run digitalization projects, we should be aware of how the implementation is carried out. In order to address the challenges, we need to promote cooperation between departments and give space and time for collaborative learning and knowledge creation during and after the improvement projects. One way is to find joint business cases, i.e., projects that benefit more than one department. The main challenges were seen in the areas of people, culture, and leadership. Successful implementation is, thus, clearly connected with the core organizational capabilities. Change management is the process of understanding why changes have to be made, and how, and it has impact at the individual, organizational, as well as cultural


level [21]. The benefits of digitalization have to be understood and communicated. Technology fear could be addressed by pointing out the positive effects for the organization as well as for the individual worker. Formulating IT governance strategies and digitalization strategies supports the change [22]. The intellectual capital of personnel represents up to 80% of the total resources in the modern organization [23]. Preparing the personnel for the change through competence development and active participation in the implementation already at an early stage are therefore ways to increase the chances of succeeding. For the management, it is important to gain a better understanding of, and suitable methods for, the implementation process [24]. Being first in adopting emerging technologies might be hard. However, it is also recognized that the ones that dare to move fast in a business transformation are the ones that can gain competitive advantages. Developing a clear business case and having the financial means could definitely pay off. In order to do so, the view of maintenance has to change: from a necessary and unwanted cost to a business opportunity [25]. Thus, there exists a huge pedagogic task in describing the benefits of digitalization in maintenance, which involves both internal and external actors. Maintenance managers have to better explain the positive effects that could be gained from improvements and innovations, suppliers must explain quantitative as well as qualitative returns on investments in their technology solutions, and trade organizations and researchers have to develop pedagogic material and case studies that highlight the benefits of digitalization in maintenance. This study is of a preliminary and inductive nature, and the results have to be understood in the context in which they were gained. Therefore, limited possibilities exist to draw general conclusions. The comparative nature gives some interesting indications, though, that could be followed up in a larger longitudinal survey study.

Acknowledgements The author would like to thank all participants in the study for sharing valuable knowledge, insights, and thoughts.

References

1. Kans M, Campos J, Salonen A, Bengtsson M (2017) The thinking industry: an approach for gaining highest advantage of digitalisation within maintenance. J Maint Eng 2:147–158
2. Bengtsson M (2008) Supporting implementation of condition based maintenance: highlighting the interplay between technical constituents and human and organizational factors. Int J Technol Hum Interact 4(1):49–75
3. Weyer S, Schmitt M, Ohmer M, Gorecky D (2015) Towards industry 4.0-standardization as the crucial challenge for highly modular, multi-vendor production systems. IFAC PapersOnLine 48:579–584
4. Sanders A, Elangeswaran C, Wulfsberg J (2016) Industry 4.0 functions as enablers for lean manufacturing. J Indust Eng Manag 9(3):811–833
5. Lasi H, Fettke P, Kemper H-G, Feld T, Hoffmann M (2014) Industry 4.0. Bus Inf Syst Eng 6:239–242
6. Kans M, Galar D, Thaduri A (2016) Maintenance 4.0 in railway transportation industry. In: Proceedings of the 10th world congress on engineering asset management, pp 317–331


7. Lu Y (2017) Industry 4.0: a survey on technologies, applications and open research issues. J Ind Inf Integr 6:1–10
8. Huber R, Oberländer AM, Faisst U (2022) Disentangling capabilities for industry 4.0-an information systems capability perspective. Inf Syst Front. https://doi.org/10.1007/s10796-022-10260-x
9. Hofmann E, Rusch M (2017) Industry 4.0 and the current status as well as future prospects on logistics. Comput Ind 89(1):23–34
10. Marcucci G, Antomarioni S, Ciarapica FE, Bevilacqua M (2022) The impact of operations and IT-related Industry 4.0 key technologies on organizational resilience. Prod Plan Control 33(15):1417–1431
11. Mosterman P, Zander J (2016) Industry 4.0 as a cyber-physical system study. Softw Syst Model 15(1):17–29
12. Alcácer V, Cruz-Machado V (2019) Scanning the industry 4.0: a literature review on technologies for manufacturing systems. Eng Sci Technol Int J 22(3):899–919
13. Tecuci G (2012) Artificial intelligence. WIREs Comput Stat 4:168–180
14. Moret-Bonillo V (2018) Emerging technologies in artificial intelligence: quantum rule-based systems. Prog Artif Intell 7(2):155–166
15. O'Leary DE (2013) Artificial intelligence and big data. IEEE Intell Syst 28(2):96–99
16. Juuso EK (2018) Smart adaptive big data analysis with advanced deep learning. Open Eng 8(1):403–416
17. Pye A (2014) The internet of things connecting the unconnected. Eng Technol 9(11):64–70
18. Penna R, Amaral M, Espíndola D, Botelho S, Duarte N, Pereira CE, Zuccolotto M, Morosini Frazzon E (2014) Visualization tool for cyber-physical maintenance systems. In: Proceedings of 12th IEEE international conference on industrial informatics (INDIN), pp 566–571
19. Rikalovic A, Suzic N, Bajic B, Piuri V (2022) Industry 4.0 implementation challenges and opportunities: a technological perspective. IEEE Syst J 16(2):2797–2810
20. Underhållsmässan (2023). https://en.underhall.se/, visited 2023-03-01
21. May G, Stahl B (2017) The significance of organizational change management for sustainable competitiveness in manufacturing: exploring the firm archetypes. Int J Prod Res 55(15):4450–4465
22. Kans M (2018) IT governance from the operational perspective: a study of IT governance strategies applied within maintenance management. Int J Serv Technol Manage 24:263–288
23. Jacobsen DI, Thorsvik J (2008) Hur moderna organisationer fungerar. Studentlitteratur AB, Lund
24. Campos J, Kans M, Salonen A (2021) A project management methodology to achieve successful digitalization in maintenance organizations. Int J COMADEM 24(1):3–9
25. Kans M, Ingwald A (2016) A framework for business model development for reaching service management 4.0. J Maint Eng 1:398–407

Integrated Enterprise Risk Management and Industrial Artificial Intelligence in Railway

Peter Söderholm and Alireza Ahmadi

Abstract Traditionally, solutions for Industrial Artificial Intelligence (IAI) in railways focus on productivity improvements and single-loop learning. This is mainly achieved by the implementation of IAI in the technical rail system and its operation, traffic management, maintenance, and modification. These productivity improvements are limited to doing things the right way, or better, according to existing regulations. However, to support the implementation of these solutions and keep pace with the fast technological development (e.g., by reducing the pacing problem), IAI should also be used to manage effectiveness improvements and double-loop learning. Hence, IAI should be used in the management of regulations (e.g., based on technical specifications for interoperability, TSI) according to process-related regulations for dependability and safety (e.g., EN 50126/28/29 and Common Safety Methods, CSM). Thereby, IAI can change the traditional evolutionary management of railway regulations, where regulations tend to expand gradually based on experienced risks, incidents, and accidents. In addition, IAI can also support management in deciding what the right things to do are, through triple-loop learning. This might be achieved by using relevant theories in managing risks related to internal control, i.e., effectiveness, productivity, compliance, and reporting. This paper presents an integrated enterprise risk management framework and approach for the future railway, including the use of four different levels of IAI for continuous improvement and organizational learning. The applied approach is deductively based on a literature review in databases for regulations, standards, and scientific publications. The work is inductively supported by empirical examples, mainly from the Reality lab digital railway at Trafikverket (the Swedish transport administration). The result is an integrated enterprise risk management framework that should be applied to support the management of requirements


related to risk in the railway when working with continuous improvement supported by IAI.

Keywords Railway · Industrial artificial intelligence (IAI) · Risk · Safety · Dependability · Compliance · Standard · Regulation · Improvement · Learning

1 Introduction

The railway is a critical infrastructure for a country's progress and is considered one of the central "lifelines" of a nation. An effective enterprise risk management (ERM) system is crucial for railways to identify, assess, and manage the risks that can impact their operations, safety, and financial stability. ERM considers all types of risks, including strategic, technical, operational, financial, and compliance risks. An ERM process consists of several activities, including risk identification, risk assessment, risk mitigation, risk monitoring, and risk reporting. An effective ERM should include a risk-based maintenance (RBM) program that identifies critical systems, functions, and processes that are essential to the organization's operations and prioritizes maintenance activities based on the risks they pose to safety, sustainability, business, and operation. Both ERM and RBM are living processes that require regular audits, reviews, and improvements. It is important to continuously monitor the railway organization's risks and assets and to adjust the ERM and RBM as needed to ensure the safety and stability of the organization.
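To make the link between ERM and RBM concrete, the following is a minimal sketch, assuming a simple probability-times-consequence scoring rule; the paper does not prescribe any particular scoring scheme, and all asset names and numbers here are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    asset: str          # system, function, or process at risk
    probability: float  # assessed likelihood (illustrative scale)
    consequence: float  # assessed severity of impact, e.g., 1-10

    @property
    def score(self) -> float:
        # Illustrative scoring rule: risk = probability x consequence.
        return self.probability * self.consequence

# Risk identification and assessment populate the register ...
register = [
    Risk("switch heating, line A", probability=0.30, consequence=7),
    Risk("signalling relay, node 12", probability=0.05, consequence=9),
    Risk("fencing, yard 3", probability=0.40, consequence=2),
]

# ... and RBM prioritizes maintenance by the risk each item poses.
for risk in sorted(register, key=lambda r: r.score, reverse=True):
    print(f"{risk.asset:28s} risk score = {risk.score:.2f}")
```

In practice, the register would be re-scored as monitoring data arrives, reflecting the "living process" character of ERM and RBM described above.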


However, to manage the rapid technological development and support innovation, a corresponding development of the rail system's organizational capability through effectiveness improvements is needed, i.e., regarding the management of risk-related regulations (see, e.g., [1, 2] and [3]). These types of improvements are necessary for national railway infrastructure administrations that have to manage budget constraints while achieving a safety performance that is economically sustainable for society [4–7]. In addition, since the railway is safety-critical, initiatives for improved safety, availability, and Life Cycle Cost (LCC) have to comply with regulations and follow a risk-based process, see, e.g., (EU) No 402/2013 [5]. Compliance with these requirements helps to mitigate the risks associated with railway operations and ensures the safety of passengers, employees, and the public. In the context of technical risk management, compliance management helps to identify potential risks and ensure that appropriate measures are put in place to mitigate them. For example, compliance with railway safety regulations can help to reduce the risk of accidents to an acceptable level and ensure that appropriate safety measures are in place to protect passengers and employees. This requires developing and implementing policies, procedures, and systems to ensure that the railway organization meets all applicable regulations and standards. This includes maintaining accurate records, conducting regular inspections and audits, and implementing corrective actions when necessary. Furthermore, compliance management is critical for maintaining the reputation and credibility of the railway organization. Failure to comply with regulations can result in legal and financial penalties, damage to the organization's reputation, and loss of public trust. Therefore, compliance management plays a vital role in ensuring the safe, efficient, and sustainable operation of railway systems.

It should be noted that information, perspectives, and views on risk are constantly evolving, provisional, dynamic, and transient, and therefore subject to change. This necessitates that decision-makers be prepared to adapt their plans as new information, knowledge, or findings become available [8]. Hence, a continuous improvement process is also crucial in ERM to identify and respond to new risks and changing environments, assess the effectiveness of existing risk management strategies and adjust them accordingly, stay ahead of emerging risks, optimize resources, stay in compliance, and gain a competitive advantage. In an effective ERM system, organizations need to be proactive in identifying and managing risks, rather than simply reacting to them when they occur, which requires the integration of compliance monitoring, continuous improvement, and innovation (ICCI) in ERM. Through an ICCI approach, organizations can create a culture of risk management that is focused on the proactive identification and management of risks. This can help organizations to identify potential risks early on, take action to mitigate them, and avoid the negative consequences of risk events. Furthermore, this integration can help organizations to create a more sustainable and resilient business model that can adapt to changing conditions and emerging risks over time. The incorporation of such approaches requires the employment of double-loop and triple-loop learning in ERM.

Double-loop learning involves questioning the assumptions and values that underpin an organization's risk management processes. It involves examining the underlying assumptions and principles that shape the organization's risk management practices and determining whether they are still relevant and effective in light of changing circumstances. This type of learning enables organizations to adjust their risk management strategies and make more informed decisions [9]. Triple-loop learning takes double-loop learning a step further by examining the underlying systems and structures that shape an organization's risk management processes. It involves questioning the norms, policies, and procedures that underpin an organization's risk management practices and determining whether they are still relevant and effective. This type of learning enables organizations to make fundamental changes to their risk management processes and improve their overall effectiveness [9]. By engaging in double-loop and triple-loop learning, organizations can continually improve their risk management processes, identify and address emerging risks, and become more resilient in the face of uncertainty and change. This type of learning is critical in today's rapidly evolving business environment, where risks are becoming more complex and difficult to manage.

However, there are several challenges associated with the application of double- and triple-loop learning in ERM. It requires organizations to collect and analyse vast amounts of data related to their risk management processes, which can be a challenging and time-consuming task. In addition, double- and triple-loop learning can be complex and difficult to implement: it requires organizations to critically reflect on their underlying assumptions and beliefs. Furthermore, double- and triple-loop learning requires the integration of multiple data sources and risk


management processes. This can be difficult to achieve without integrating data from multiple sources and providing a holistic view of an organization's risk profile. Moreover, double- and triple-loop learning can be a time-consuming process that may not always be feasible in real-time risk management scenarios. This challenge can be addressed by providing real-time data analytics and decision-support capabilities that enable organizations to respond quickly to emerging risks. AI can help to address the challenges associated with applying double- and triple-loop learning in ERM by automating data analysis, simplifying complexity, enabling the integration of multiple data sources, and providing real-time decision-support capabilities.

Another quantum leap in enterprise risk management is real-time enterprise risk management (RTERM). RTERM enables organizations to quickly identify and respond to potential risks and opportunities as they arise, with a more comprehensive and proactive approach. AI can analyse large volumes of data and support the application of real-time risk management by identifying patterns and trends that humans may not detect and by alerting organizations to potential risks. This can help organizations make better decisions and mitigate risk more effectively. Obviously, the application of IAI creates new possibilities for productivity improvements related to the physical asset management of railway [1], its operation, and maintenance (see, e.g., [10–13]). Hence, this paper describes an integrated ERM framework for railway infrastructure, intended to support compliance, continuous improvement, and innovation related to the technical part of the rail system, its operation, and maintenance, but also to related regulations, plans, and management. The framework provides a basis to identify the potential use of IAI for double- and triple-loop learning, in addition to the usual application for single-loop learning.

The rest of the paper is organized as follows. First, Sect. 2 provides a theoretical frame of reference for theories related to maintenance program development and surveillance analysis, using Reliability-centred maintenance (RCM), as well as double- and triple-loop learning. Then, Sect. 3 describes the research methodology, and Sect. 4 presents the levels of IAI in railway risk management. Risk-related methodologies and tools for IAI applications are described in Sect. 5, and finally, Sect. 6 ends the paper with a short discussion and some conclusions.

2 Theoretical Frame of Reference

The theoretical frame of reference in this paper is mainly based on two theoretical contributions. One contribution is theories related to Reliability-centred maintenance (RCM) [14]. The other contribution is theories related to triple-loop learning presented in [9]. These two theories are in turn integrated with each other into a framework as outlined in Fig. 1.


[Figure: a generic asset management loop of planning, executing, and assessing actions in the infrastructure (single-loop learning and productivity improvements), feeding the management of process-related and technical regulations (double-loop learning and effectiveness improvements) and the management of assets (triple-loop learning and paradigm shifts).]
Fig. 1 Loops of learning in relation to a generic asset management process in railway. Based on a combination of Nowlan and Heap [14], Argyris and Schön [9], and IEC 60300-3-14 [15]

2.1 Conceptual Framework of Reliability-Centred Maintenance (RCM)

The RCM methodology is a risk-based approach for developing a preventive maintenance and inspection program for physical assets, to effectively manage the risk of function losses through applicable and effective maintenance [14]. RCM is a structured approach to maintenance that focuses on identifying and prioritizing maintenance tasks based on the potential consequences of equipment failure. It involves analyzing the functions of equipment, identifying the ways in which equipment can fail, and developing maintenance strategies to prevent or mitigate the consequences of those failures. Figure 2 shows the conceptual framework of RCM in preserving system functions and protecting against potential risks. It shows the chain from the cause, via failure, to the consequence in a typical engineering system, and includes an illustration of the role of system maintenance activities. In an RCM approach, the process of failure begins with a set of basic events, also known as "initiating events", which perturb the system, i.e., cause it to change its operating state or configuration. If the initiating events (i.e., failure modes) cannot be managed at an early stage of their occurrence, they will lead to a number of "undesired events", which are the outset of a possible undesired consequence [16]. Barriers are used to prevent or mitigate the escalation of both basic events and undesired events into unwanted consequences or loss. A barrier is a measure taken to reduce the probability that an unwanted event or situation will occur, or to reduce the impact if it actually does occur. Barriers can be viewed as obstacles that perform the function of containing, removing, preventing, mitigating, controlling, or warning about the release of hazards [16].

[Figure: the chain from a basic event or failure cause (e.g., O-ring ruptured), via the initiating failure and functional failure (e.g., oil leakage), through undesired events, to the ultimate loss (e.g., loss of system or equipment). Maintenance barriers, protective barriers, and consequence-reducing barriers act along the chain, which can end in safety, operational, and economic consequences such as injury or loss of life, air and noise pollution, delay or flight cancellation, altitude restriction, and high repair cost.]
Fig. 2 Conceptual framework of RCM

As shown in Fig. 2, in the leftmost block (system function assurance), maintenance acts as a preventive barrier in order to preserve the main functions of the system. In the middle block (system protection assurance), maintenance acts as a preventive barrier to preserve the function of a protective device, or to assure the availability of a protective function. Other barriers can also be incorporated into the system, e.g., training, audits, emergency procedures, and insurance. RCM recognizes that the reason for performing any kind of maintenance is not to avoid failures per se, but to avoid, or at least to reduce, the consequences of failure [14]. In fact, the probability of the consequences of undesired events, i.e., losses, and their magnitude depend to a great extent on the applicability and effectiveness of the barriers that are in place to avert the release of such consequences. Preventive maintenance acts as a preventive barrier whose aim is to eliminate the consequences of failure or reduce them to a level that is acceptable to the user. The RCM failure management policies include preventive maintenance (lubrication and servicing, functional inspection or check, restoration, and discard), default actions (operational test, age exploration, training, etc.), and redesign. In contrast to earlier methodologies, the RCM methodology is a system-level, top-down, and consequence-driven approach, intended to preserve function rather than to prevent individual failures [17, 18]. The RCM analysis may be carried out as a sequence of activities or steps, including study preparation, system selection and identification, functional failure analysis, critical (significant) item selection, data collection and analysis, Failure Mode, Effects & Criticality Analysis (FMECA), selection of maintenance actions, determination of maintenance intervals, preventive maintenance comparison analysis, treatment of non-critical items, implementation, and in-service data collection and updating [17].
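As a concrete illustration of the FMECA step listed above, the sketch below ranks hypothetical failure modes with the risk priority number commonly used in FMEA practice (RPN = severity × occurrence × detection). This RPN scheme is an illustrative assumption: Nowlan and Heap's RCM logic is consequence-driven, and the sources cited here do not mandate this particular ranking.

```python
# Hypothetical FMECA worksheet rows: (failure mode, severity, occurrence,
# detection), each rated on a 1-10 scale as in common FMEA practice.
failure_modes = [
    ("O-ring rupture -> oil leakage",      9, 3, 4),
    ("Sensor drift -> false oil reading",  5, 6, 7),
    ("Connector corrosion -> signal loss", 7, 4, 3),
]

# RPN = severity x occurrence x detection; a higher RPN suggests higher
# priority for a preventive task, a default action, or redesign.
for mode, s, o, d in sorted(failure_modes,
                            key=lambda m: m[1] * m[2] * m[3], reverse=True):
    print(f"RPN {s * o * d:4d}  {mode}")
```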


Figure 3 shows the interaction of the elements of maintenance risk assessment and compliance monitoring with the RCM development process. Risk assessment and compliance monitoring ensure that the railway organization's maintenance activities are compliant with applicable regulations, standards, and procedures. This involves tracking and verifying that maintenance activities are performed according to established requirements and identifying areas where improvements can be made. The first step is to establish maintenance procedures and standards that comply with applicable regulations and industry standards. This includes defining the scope of maintenance activities, establishing maintenance schedules, and specifying the requirements for maintenance documentation. Once maintenance procedures and standards are established, maintenance activities are implemented according to the developed RCM-based procedures and standards. This includes performing corrective and preventive maintenance tasks, conducting inspections, and performing age exploration as needed. Accordingly, compliance monitoring involves tracking and verifying that maintenance activities are performed according to established requirements. This includes reviewing maintenance documentation, conducting audits, and assessing performance metrics to identify areas where improvements can be made. When non-compliance is identified, corrective actions are taken to address the issue. This may include re-training personnel, revising maintenance procedures and regulations, or implementing new controls to prevent future non-compliance.

The RCM theories described in [14] focus on the problem of defining failure and some of the implications of this when analyzing failure data, for example, focusing on the consequences of item failure at the system level due to loss of required functions instead of on the number of item failures. The theories also describe how to evaluate failure consequences, regarding both single and multiple failures. In addition, the theories describe the failure process itself and why complex systems, unlike simple items, do not necessarily wear out. RCM also provides explicit criteria for selecting failure management strategies based on effectiveness (related to consequences in RCM) and productivity (applicability related to reliability in RCM). These fundamental theories represent a paradigm shift within dependability and are robust since they do not change with time; see [14] for more details. Other paradigm shifts are also indicated in the figure, e.g., corrective and time-based approaches, which dominated before RCM, but also diagnostic, prognostic, and risk-based approaches, which apply new technologies to enable the RCM logic.

2.2 Concept of Triple-Loop Learning

Single-loop learning focuses on doing things the right way. Within risk management, it is about making adjustments to correct a mistake or a problem in relation to existing regulations, e.g., an initial maintenance program as defined by RCM. From a dependability perspective, it may be related to productivity improvements in the planning, execution, and assessment of operation, traffic management, maintenance (including reinvestment), and modification (including upgrading and investment), see, e.g., [9, 14, 15]. At the single-loop level, AI and ML can support maintenance management in optimizing existing maintenance strategies and identifying potential issues through real-time monitoring and analysis of machine performance data. This involves using machine learning algorithms to detect patterns and identify potential issues before they become major problems.
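A minimal sketch of the kind of single-loop pattern detection meant here: a rolling z-score that flags deviations in a machine performance signal before they escalate. The window size, threshold, and synthetic data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rolling_zscore_anomalies(signal, window=50, threshold=3.0):
    """Flag indices where the signal deviates more than `threshold`
    standard deviations from its trailing-window mean."""
    flagged = []
    for i in range(window, len(signal)):
        ref = signal[i - window:i]
        mu, sigma = ref.mean(), ref.std()
        if sigma > 0 and abs(signal[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

# Synthetic machine-performance signal with an injected fault at t = 400.
rng = np.random.default_rng(1)
signal = rng.normal(0.0, 1.0, 500)
signal[400:] += 6.0  # abrupt shift, e.g., a developing failure
print(rolling_zscore_anomalies(signal))  # flags indices from ~400 onwards
```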


Fig. 3 Interaction of RCM, risk management, and compliance

Double-loop learning is about doing the right things. It focuses on identifying and understanding causality and then taking action to fix the problem. Within risk management in railway, this approach traditionally tends to expand the existing regulations (e.g., as part of the safety management system) based on an evolutionary development in response to occurred risks, incidents, and accidents. However, from a dependability perspective, it should be related to effectiveness improvements of the regulations (and the technical rail system), which is the product of a systematic application of dependability processes and supporting methodologies. One example is a living maintenance program (as a product) maintained by a continuous application of the RCM process (see, e.g., [9, 14, 15]). Triple-loop learning goes even deeper, to explore the values and reasons behind the very existence of the rail system, related processes (e.g., in the safety management system), and desired services such as rail transports and traffic information. It is about trying to ascertain an understanding of how to make the decisions that frame operation, traffic management, maintenance, and modification (e.g., based on fundamental theories within RCM). From a combined risk and dependability perspective, it may be related to management issues like core values and risk appetite as input to the use of methodologies such as RCM (see, e.g., [9, 14, 15]). Examples of other theories that support the framework are mainly related to continuous improvement [19, 20] and the pacing problem [21]. The operationalization of these theories into internationally agreed-upon standards (i.e., ISO, IEC, ITU) is also relevant for the application of the proposed integrated enterprise risk management framework (see Fig. 2).


3 Method and Material

The applied research approach is deductively based on a literature review in databases for regulations, standards, and scientific publications. The work is inductively supported by empirical examples, mainly from the Reality Lab digital railway at Trafikverket (the Swedish transport administration). Additional projects that are related to the reality lab and contribute empirical examples are: ASSET – Active, Systematic, and Effective Management of Assets (TRV 2022/29194); Pre-study automated measurement of railway infrastructure through innovation procurement (TRV 2020/39092); and Fact or Fiction? A decision support system for fact-based decisions for railway maintenance (TRV 2020/25832).

4 Levels of IAI in Railway Risk Management

Söderholm and Karim [22] presented an integrated enterprise risk management framework for the evaluation of eMaintenance solutions, e.g., based on IAI. The proposed framework is based on terms, definitions, principles, frameworks, and processes found in ISO 31000. Thereby, other risk-related management system standards can be aligned to achieve an integrated management system, e.g., related to asset management (ISO 55001) [23], dependability management (IEC 60300-1) [24], and information security (ISO 27001) [25]. In addition, this risk-based management system should be integrated with a safety management system (SMS) according to Directive (EU) 2016/798 [26], i.e., common safety methods (CSM) for SMS. This integrated enterprise risk management framework supports both productivity and effectiveness improvements by the logic of the PDCA cycle (Plan, Do, Check, Act). In addition, the establishment, use, review, and audit of this management system highlight additional opportunities to use IAI at different levels of improvement and organizational learning, see Figs. 4 and 5.

The first level of IAI is related to single-loop learning and productivity improvements. Examples of productivity improvements within railway asset management are the replacement of existing manual inspection and the complement of automated inspection. This can be achieved by built-in test, trains in regular traffic, unmanned rail vehicles, unmanned aerial vehicles (UAVs), manned aerial vehicles, and satellites (see Fig. 6). The collection of data by laser and 360° photography can also be used as the foundation for maintenance solutions based on virtual reality (VR) and augmented reality (AR), e.g., Rail View, Sky View, and Maintenance Go (see, e.g., [10]). This gives another example of productivity improvement within railway asset management, i.e., the analysis of inspection data by use of first-level IAI, e.g., for diagnostic, prognostic, and prescriptive purposes. These solutions tend to manage large amounts of measurement data by use of algorithms to deliver useful information products. These types of analyses can also be used for different asset management planning purposes; see, e.g., Table 1 for different asset management plans.
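To make the prognostic use of inspection data concrete, here is a minimal sketch that fits a linear degradation trend to periodic inspection measurements and extrapolates when a maintenance limit would be crossed. The measurement series, the limit, and the linear-trend assumption are all illustrative; real first-level IAI solutions would use far richer models and data.

```python
import numpy as np

# Hypothetical inspection history: time in days vs. a degradation
# measure (e.g., standard deviation of a track geometry parameter, mm).
t = np.array([0.0, 90.0, 180.0, 270.0, 360.0])
y = np.array([1.1, 1.3, 1.6, 1.8, 2.1])
maintenance_limit = 3.0  # illustrative intervention threshold

# Least-squares linear trend: y ~ slope * t + intercept.
slope, intercept = np.polyfit(t, y, deg=1)

if slope > 0:  # only extrapolate if the asset is actually degrading
    t_limit = (maintenance_limit - intercept) / slope
    print(f"Predicted limit crossing around day {t_limit:.0f}, "
          f"i.e., {t_limit - t[-1]:.0f} days after the last inspection")
```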


Fig. 4 Examples of theories, standards, and regulations related to the levels of organizational learning in railway risk management

[Figure: four levels of IAI between generic rail system requirements and sustainable rail service performance. Level 1: planning, execution, and assessment of actions in the national rail infrastructure system throughout its lifecycle (productivity improvements; single-loop learning). Level 2: management of national regulations and plans for interoperable rail systems and components, and of risk-related processes and methodologies implemented in the management system (effectiveness improvements; double-loop learning). Level 3: risk-based control of, and support to, lower-level IAI based on an understanding of regulations, standards, and their fundamental theories (single-, double-, and triple-loop learning). Level 4: risk-based internal and external audit of the lower levels' compliance with internal control and enterprise risk management (single- and double-loop learning).]
Fig. 5 Four levels of IAI and their contribution to different types of improvements and loops of learning

Examples are specific maintenance programs and plans for inherent items installed in the rail infrastructure asset, e.g., a population of a specific type of S&C (e.g., on a national level or on a line). Hence, this first level of IAI is related to the delivery of a product (e.g., a plan, regulation, or contract related to actions in the physical rail infrastructure) according to a specific process and included methodologies as part of the SMS. These plans can in turn be generated with different degrees of IAI support but tend to be documented in spreadsheets. In summary, this first level of IAI adopts single-loop learning and focuses on productivity improvements in the delivery of an information product (e.g., related to planning, execution, and assessment activities).


Fig. 6 Examples of technologies for inspection in railway asset management. Adapted from Söderholm et al. [28]

Table 1 Examples of different asset management plans in railway with an indication of time periods (number of years) and update frequencies: train plan (1-year horizon, updated yearly); operational plan (3-year cycle); technical development plan (4-year cycle); maintenance plan (4-year cycle); action plan (6-year cycle); national transport infrastructure plan (12-year cycle); Railway 2050 (a long-term plan spanning the period to 2050).

At the single-loop level, AI and ML can support maintenance management in optimizing existing maintenance strategies and identifying potential issues through real-time monitoring and analysis of machine performance data, using machine learning algorithms to detect patterns in machine data before issues become major problems.

The improvements mentioned above mainly address changes within existing regulations, e.g., an initial maintenance program. Hence, the solutions are evaluated based on their capability to fulfil existing regulations. However, there are also examples of evaluating the existing regulations themselves to achieve effectiveness improvements, see, e.g., [27] and [28]. This can be done to achieve a more dynamic, or living, maintenance program, or to evaluate the impact of new technologies for maintenance purposes, see, e.g., [2]. This second level of IAI focuses on double-loop learning and supports effectiveness improvements by a more active management of a dynamic regulation, which in turn reduces the pacing problem (see Fig. 5). The second level of IAI would in turn support the implementation of the productivity improvements enabled by the first level of IAI. At this second level of IAI, two different types are of interest. The first type of second-level IAI supports effectiveness improvements by focusing on the management of generic regulations as a product (in accordance with specified processes and methodologies) related to operation, traffic control, maintenance, or modification. An example is the management of generic maintenance programs and plans for inherent items of the


rail system (e.g., a specific type of Switches & Crossings, S&C, typically provided by the original equipment manufacturer, OEM) (see Fig. 5). The national rail administrations are usually responsible for regulations about the design and management of the installed rail infrastructure system and its inherent items. This regulation is normally included in the management system when concerning operation and traffic management. When the regulation is related to maintenance (including reinvestment) and modification (including upgrading and investment) of the physical rail infrastructure, it provides a basis for plans, procurement, and execution related to productivity improvement and single-loop learning. The resulting product is typically documented in text-based documents or spreadsheets, even though database formats are also used (e.g., requirement management tools such as DOORS). These products are accessed through IT solutions, such as webpages, that are internal or external to the national rail administrations.

The second type of second-level IAI supports the management of generic processes and methodologies for managing regulations (and other products, such as technical systems and components) as part of the integrated management system. Two examples of this are the implemented processes of CSM-RA for changes that might affect railway safety, and the applied methodology of RCM in combination with a barrier analysis to achieve an initial, and thereafter a living, maintenance program. Another example is a process for the development of rail systems and components that are safe and dependable by following the EN 50126 standard, which points to supporting methodologies such as FTA and FMECA. These artifacts are part of the integrated management system, typically as a number of documents and templates that are text-based or implemented in spreadsheets. These might be accessed through IT solutions such as a database (e.g., a digital document centre or library) or a digital process map that can be more or less interactive.

The third level of IAI adopts triple-loop learning and supports management decisions based on fundamental theories, e.g., as described by Nowlan and Heap [14]. This level of IAI is internal to the organization and supports the provision of expert knowledge about external regulations and related standards, e.g., regarding common safety methods (CSM) and technical specifications for interoperability (TSI). It is related to functions that control, but also provide support to, those using the first and second levels of IAI to achieve productivity and effectiveness improvements that comply with external requirements. This level of IAI supports expert functions within different risk-related areas as implemented in the organization's SMS and other parts of the overall management system.

It may also be useful to introduce a fourth level of IAI, which supports the audit of the organization and is independent of the three other levels (see Fig. 5). Hence, the fourth level of IAI supports independent reviews of the other IAI levels regarding productivity, effectiveness, compliance, and documentation. The fourth level of IAI can be of two types. The first type is internal to the organization and traditionally focuses on the specific risk area of internal control. The second type of fourth-level IAI supports the external audit of the other levels. This type is related to auditing authorities, such as the national and European railway agencies regarding railway safety. However, there are also other risk-related areas that are audited by different external actors, e.g., internal control, risk and vulnerability, and national


security. Hence, even though the application area of internal audit is normally limited to internal control, the major difference between the first and second types of the fourth level of IAI is the responsible organization (see Fig. 5).

5 Risk-Related Methodologies and Tools for IAI Applications

As described earlier, there are several levels of IAI that support different functions and responsibilities within an integrated ERM system. However, all levels of IAI should use the same methodologies and tools to work with continuous improvement. These improvements have to be managed through a risk-based approach, i.e., to fulfil requirements related to the CSMs for risk evaluation and assessment (RA), monitoring (MON), safety management systems (SMS), and indicators (CSI). Both the CSM regulations and the dependability standards, e.g., EN 50126, EN 50128, and EN 50129 [29–31], provide additional support by referring to appropriate methodologies, e.g., Fault tree analysis (FTA), Failure mode, effects & criticality analysis (FMEA/FMECA), and Reliability-centred maintenance (RCM). These methodologies are in turn described in specific dependability standards. In addition, both the CSM and EN 50126/28/29 [29–31] provide risk matrices, as well as principles and criteria for the acceptance of risk in railway applications. The risk matrices provide levels of probability and consequence and their combination into risk levels. In addition, there are safety integrity levels (SIL) to be used. Due to the theoretical foundation and logic of the dependability standards, they should be highly suitable for implementation in process solutions based on the second type and level of IAI. From this, it will be possible to use the first type of second-level IAI to manage the products (e.g., plans, regulations, and contracts) from these processes. In turn, the combination of both types of second-level IAI solutions, based on regulations and standards for methodologies and tools related to safety and dependability, can be used to evaluate first-level IAI solutions for productivity improvements. In addition, the impact on existing regulations can be evaluated to ensure safety while improving effectiveness.
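As an illustration of how such a matrix combines probability and consequence levels into risk levels, the sketch below encodes a generic 4x4 risk matrix. The category names and level assignments are illustrative only; the normative matrices and acceptance criteria must be taken from the CSM regulations and EN 50126 themselves.

```python
# Illustrative 4x4 risk matrix (NOT the normative CSM/EN 50126 matrix):
# rows index the frequency level, columns the severity level.
FREQUENCY = ["improbable", "remote", "occasional", "frequent"]
SEVERITY = ["insignificant", "marginal", "critical", "catastrophic"]
MATRIX = [
    ["negligible",  "negligible",  "tolerable",   "undesirable"],
    ["negligible",  "tolerable",   "undesirable", "intolerable"],
    ["tolerable",   "undesirable", "intolerable", "intolerable"],
    ["undesirable", "intolerable", "intolerable", "intolerable"],
]

def risk_level(frequency: str, severity: str) -> str:
    """Combine a frequency level and a severity level into a risk level."""
    return MATRIX[FREQUENCY.index(frequency)][SEVERITY.index(severity)]

print(risk_level("occasional", "critical"))  # -> intolerable
```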

6 Discussion and Conclusions

One way to bridge the rapidly growing gap between technical possibilities and organizational capabilities is today's best agreed-upon practice related to risk management (see, e.g., [22, 32]). Within railway, support can be found in specific risk-related standards for railway applications, e.g., EN 50126/28/29 [29–31]. However, there are also application-neutral risk-related standards that represent the best internationally agreed-upon practice related to dependability and risk (e.g., IEC's dependability suite of about 60 standards). Due to the dynamic society, it is necessary that the


compliance with these risk-related regulations and standards is managed in a more active way, to also support productivity, effectiveness, and documentation.

Acknowledgements Our thanks to the sponsors of the "Reality lab digital railway" (funded by TRV 2017/67785 and Vinnova 2017-02139) and some related projects, primarily "Fact or fiction?" (TRV 2020/25832), "ASSET" (TRV 2022/29194), and "Innovation procurement" (TRV 2020/39092).

References

1. Rasmussen J (1997) Risk management in a dynamic society: a modelling problem. Saf Sci 27(2–3):183–213
2. Granström R, Söderholm P, Lundkvist P. Systematic dependability improvements by implementation of new technologies and regulations in railway infrastructure maintenance. eMaintenance, p 43
3. Mayntz R, Hughes T (2019) The development of large technical systems
4. EC (2014) Directive (EU) 2014/88 of the European parliament and of the council as regards common safety indicators and common methods of calculating accident costs. Off J Eur Union
5. EU (2015) No 2015/1136, Commission implementing regulation of 13 July amending Implementing Regulation (EU) No 402/2013 on the common safety method for risk evaluation and assessment. Off J Eur Union, Brussels
6. ERA (2015) ERA-GUI-02-2015 implementation guidance on CSIs. European Railway Agency
7. Duranton S et al (2015) The 2015 European railway performance index: exploring the link between performance and public cost. The Boston Consulting Group, Paris
8. National Academies of Sciences, Engineering, and Medicine (2017) Enhancing the resilience of the nation's electricity system
9. Argyris C, Schön DA (1997) Organizational learning: a theory of action perspective. Reis (77/78):345–348
10. Granström R, Söderholm P, Eriksson S (2022) Applications of Rail view, Sky view and Maintenance go – digitalisation within railway asset management. Int J COMADEM 25(2):23–30
11. Kulahci M, Bergquist B, Söderholm P (2022) Autonomous anomaly detection and handling of spatiotemporal railway data. In: International congress and workshop on industrial AI 2021
12. Söderholm P, Jägare V, Karim R (2022) Reality lab digital railway for sustainable development. Int J COMADEM 25(2):39–47
13. UNIFE and Shift2Rail (2022) The joint undertaking to build the railway system of tomorrow
14. Nowlan FS, Heap HF (1978) Reliability-centered maintenance
15. International Electrotechnical Commission (2004) IEC 60300-3-14, Dependability management – Part 3-14: application guide – maintenance and maintenance support
16. Modarres M (2006) Risk analysis in engineering: techniques, tools, and trends
17. Ahmadi A, Söderholm P, Kumar U (2010) On aircraft scheduled maintenance program development. J Qual Maint Eng 16(3):229–255
18. Ahmadi A, Söderholm P, Kumar U (2007) An overview of trends in aircraft maintenance program development: past, present, and future. In: European safety and reliability conference, 25/06/2007–27/06/2007
19. Juran JM, De Feo JA (2010) Juran's quality handbook: the complete guide to performance excellence
20. Shewhart WA (1931) Economic control of quality of manufactured product
21. Marchant GE (2011) Addressing the pacing problem. In: The growing gap between emerging technologies and legal-ethical oversight: the pacing problem, pp 199–205
22. Söderholm P, Karim R (2010) An enterprise risk management framework for evaluation of eMaintenance. Int J Syst Assur Eng Manag 1:219–228


23. ISO (2014) ISO 55001:2014 Asset management – management systems – requirements
24. IEC (2003) IEC 60300-1, Dependability management – Part 1: dependability management systems
25. IEC (2022) ISO/IEC 27002:2022 Information security, cybersecurity and privacy protection – information security controls
26. EC (2016) Directive (EU) 2016/798 of the European parliament and of the council of 11 May 2016 on railway safety. Off J Eur Union 138
27. Söderholm P, Nilsen T (2017) Systematic risk-analysis to support a living maintenance programme for railway infrastructure. J Qual Maint Eng 23(3):326–340
28. Söderholm P et al (2019) Verklighetslabb digital järnväg: Förmåga för ökad digitalisering och hållbarhet
29. BSI (2007) BS EN 50126-2, Railway applications – the specification and demonstration of reliability, availability, maintainability and safety (RAMS) – Part 2: guide to the application of EN 50126-1 for safety. British Standards Institution (BSI), London, UK
30. CENELEC (2012) EN 50128, Railway applications – communication, signalling and processing systems – software for railway control and protection systems
31. BSI (2018) EN 50129:2018, Railway applications – communication, signalling and processing systems – safety related electronic systems for signalling. BSI Standards Publication
32. Söderholm P, Norrbin P (2013) Risk-based dependability approach to maintenance performance measurement. J Qual Maint Eng 19(3):316–329

Digital Twin: Definitions, Classification, and Maturity

Adithya Thaduri

Abstract In the process of developing digital twins for maintenance, there is a lack of reference definitions, architectures, and models in standards. In particular, the application also differs depending on the needs of the respective organisation. Before designing and implementing a digital twin, it is necessary to define the user specifications and requirements. Hence, the first objective is to provide digital twin terminology for maintenance based on the five-dimension digital twin model: physical, virtual, data, connection, and services. Owing to the distinctive possibilities associated with the digital twin, its designs and implementations are also wide-ranging, and it can be classified along several dimensions. Hence, the second objective is to classify the digital twin according to several factors, such as life cycle stage and completeness. In addition, the capability of the digital representation to mirror the physical asset is also of interest. Hence, the third objective is to assess the maturity level of the DT. This paper provides a guideline for defining the DT with standardisation, classification, and maturity levels in practical industrial applications.

Keywords Digital twin · Maintenance · Definition · Abstraction · Maturity

1 Introduction

Industries face several challenges concerning man, material, and machines in meeting the targets imposed by government authorities, such as sustainability, the circular economy, energy efficiency, and a zero-CO2 footprint. Effective and efficient maintenance can facilitate the achievement of these targets. Hence, industries are exploring various innovative solutions, in both industry and academia, to meet those defined targets.

A. Thaduri (B) Division of Operation and Maintenance Engineering, Luleå University of Technology, Luleå, Sweden e-mail: [email protected] Maintenance Engineering, Scania Industrial Maintenance, Södertälje, Sweden

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 U. Kumar et al. (eds.), International Congress and Workshop on Industrial AI and eMaintenance 2023, Lecture Notes in Mechanical Engineering, https://doi.org/10.1007/978-3-031-39619-9_43

Since the term "digital twin" was coined by NASA and later elaborated by Michael Grieves, several experts in both industry and academia have been curious about, and have explored, the possibilities of applying the DT in industrial applications. From 2017 onwards, the DT has been listed among the strategic technology trends. As a simple definition, a digital twin connects physical space to digital space with a specific function, to optimise operations and to incorporate intelligence in decision making through various enabling technologies. These functions change according to industry requirements, such as design optimisation, resource optimisation, and maintenance optimisation. In the context of maintenance, these functions include, but are not limited to, condition monitoring, diagnostics, and predictive maintenance. Hence, the DT is applicable throughout the life cycle stages of an asset. Since it needs synchronisation between the physical and digital spaces, it requires various technologies, such as the Internet of Things (IoT), cloud storage and computing, 5G/6G, AI, data analytics, security, and augmented/virtual reality, depending upon the needs.

The implementation of a DT in an industrial maintenance context requires predefined needs, specifications, requirements, and expectations. Several researchers and industries have defined the digital twin; however, to conceptualise the DT in an industrial environment, it is necessary to define it in a standard way, so that there is a common understanding within and outside that environment. This common understanding becomes highly significant for industries that involve multiple stakeholders. For example, railways, construction, and mining in several countries involve multiple partners who own and share the assets. It becomes even more complicated for maintenance, which is an interdisciplinary activity. Since the DT not only incorporates emerging technologies but also integrates them, it is essential to adopt standardisation from those technologies and adapt it to suit the requirements. Although efforts have been made by various groups to standardise the DT, no standard can be applied universally, as the application depends on industrial needs. Hence, the first step in implementing a DT is to choose a standard, or to collaborate with similar industries to define a standard for a specific application.

In addition to definition and standardisation, there have been several misconceptions, misinterpretations, and misunderstandings regarding the development and usage of both maintenance and the DT. Implementations vary according to requirements, functionalities, and practical characteristics. Hence, there is a need to identify and classify the right kind of DT for maintenance. The classification dimensions include asset hierarchy, data sources (failure and maintenance), application domain, life cycle phase, level of autonomy, and cognitive capabilities. This classification gives practitioners a clearer and more concrete understanding of the DT to be set up within the relevant environment. However, there may be a gap between existing capabilities and the expectations of a DT. Hence, it is necessary to assess the DT through a maturity model constituted by the above-mentioned classification dimensions.
A maturity model, as an instrument, assists practitioners in identifying and categorising the technological or organisational potential to improve the desired outcomes of the business.


Thus, the main purpose of this paper is to provide guidelines on the requirements to be settled before implementing a DT in an industrial sector. The objectives of this paper are:

a. to identify the definitions of the DT in standards;
b. to identify and classify the DT for an application;
c. to choose the maturity level of the DT.

This paper provides a guideline for a common understanding before or during the implementation of a DT in industrial sectors. The architecture and implementation of the DT are out of the scope of this paper and need further explorative research. The paper is structured as follows. Section 2 lists various standards and identifies the key components of the DT. Section 3 classifies the DT by various characteristics, and Sect. 4 addresses the maturity level of the DT. Section 5 presents the discussion and conclusion of the paper.

2 Definitions from Standards

Several years after the DT was first defined, ISO started the development in 2018 of a series of standards, ISO 23247 [1], which provides a framework for DTs in the manufacturing sector; it was formally approved in 2021. It defines the DT as a "fit for purpose digital representation (data element representing a set of properties) of an observable manufacturing element with synchronization between the element and its digital representation". This series of standards covers general principles, a reference architecture, the digital representation of manufacturing elements, and technical requirements for information exchange. In addition, ISO published ISO 24464 [2], which analyses the visualisation components of a DT and defines it as a "compound model composed of a physical asset, an avatar and an interface". IEEE initiated P2806.1 [3], which contributes to the standardisation of the architecture of digital representations for smart factories, addressing "high-speed protocol conversion, unified data modeling, and data access interfaces for heterogeneous data situations in the digital twin". Another initiative, IEEE 2888.1 [4], defines the "vocabulary, requirements, metrics, data formats, and APIs for acquiring information from sensors and commanding actuators, providing the definition of interfaces between the cyber world and physical world". IEC issued the IEC 62832 [5] series, which also defines the architecture and applications of digital factories, including "the definition of logic objects to include intangible things such as software, concepts, patents, ideas, methods, and anything that could define as an asset of the industry". ITU-T issued the ITU-T Y.3090 [6] standard, which provides detailed requirements, an architecture, and considerations for DTs in specific use cases; it defines the DT as "a virtual representation of real-world entities and processes, synchronized at a specified frequency and fidelity". In addition, an Internet Research Task Force draft document on the DT, published in 2022 [7], defines the DT as a virtual instance of a physical system (twin) that is continually updated with the latter's


Fig. 1 Timeline of standards for digital twin

performance, maintenance, and health status data throughout the physical system's life cycle. In addition, several other initiatives are under development or planned, mainly for DTs in manufacturing, such as ISO 30173 [8], IEEE P3144 [9], and NISTIR 8356 [10]. A timeline of the standards is presented in Fig. 1 [11]. Over the last decade, some notable efforts have been made towards the standardisation of the DT, mainly in manufacturing; however, there is still a need to formalise requirements and standards for the DT for maintenance. In addition, these standards need to be aligned with the existing maintenance standards for a smoother implementation within the specified application domains.

2.1 Digital Twin Components

From the various standards' definitions, it is apparent that the main components of a digital twin are the physical entity, the digital entity, data, connection, and services (Fig. 2). In fact, all of these components need to be integrated for the conceptualisation and implementation of a DT for maintenance in industrial sectors. However, each component is covered by its own standards, some specific to the DT and some generic to emerging technologies, as reported by Wang et al. [12]. A brief description of the components is given below, followed by an illustrative sketch:

• Physical assets: these have two main functions, measurement and actuation. Actuation becomes more applicable for remote maintenance and reconfiguration. Existing relevant standards from each industry can be used for defining the physical assets.

Fig. 2 Components of digital twin


• Digital assets: the digital representation, composed of models that describe the physical assets. Existing standards related to modelling and to data access interfaces for heterogeneous data sources can be utilised.
• Data: both data and model exchange will be performed; hence, a standardised data model across heterogeneous data sources needs to be developed. Appropriate standards for data format, data fusion, data mining, data storage, and data representation should be applied.
• Connection: connection refers to the communication between physical and digital assets and to interoperability. Innumerable IEEE and ISO standards are available, and this area is more mature in other domains.
• Services: the main purpose of the DT is defined here. In terms of maintenance, these services include condition monitoring, anomaly detection, predictive modelling, maintenance optimisation, decision support, risk analysis, and life cycle cost analysis. They also cover maintenance strategies such as Total Productive Maintenance (TPM), Condition Based Maintenance (CBM), Prognostics and Health Management (PHM); Reliability, Availability, Maintainability, and Safety (RAMS); and Predictive Maintenance (PdM). Several use cases can be identified and defined to improve the performance of the asset of choice. Some of the existing maintenance standards can be applied here to define the maintenance services in the DT.
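To make the five-dimension model concrete, the following minimal sketch (a hypothetical illustration in Python, not taken from any cited standard) represents the components as a simple data structure; the class and field names, such as PhysicalAsset and DataLayer, are assumptions chosen for readability:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class PhysicalAsset:
    """Physical entity: exposes measurement and, optionally, actuation."""
    asset_id: str
    sensors: List[str]                  # e.g. ["vibration", "temperature"]
    supports_actuation: bool = False    # relevant for remote maintenance

@dataclass
class DigitalAsset:
    """Virtual entity: models that describe the physical asset."""
    model_type: str                     # e.g. "physics-based", "ML", "hybrid"

@dataclass
class DataLayer:
    """Standardised data model across heterogeneous sources."""
    formats: List[str]                  # e.g. ["numeric", "textual", "visual"]
    storage: str = "cloud"

@dataclass
class Connection:
    """Communication between the physical and digital assets."""
    protocol: str                       # e.g. "OPC UA", "MQTT"
    bidirectional: bool = False         # True only for a full digital twin

@dataclass
class DigitalTwin:
    """Five dimensions: physical, virtual, data, connection, services."""
    physical: PhysicalAsset
    virtual: DigitalAsset
    data: DataLayer
    connection: Connection
    services: Dict[str, Callable] = field(default_factory=dict)  # e.g. condition monitoring
```

In such a structure, each maintenance service (condition monitoring, anomaly detection, etc.) would be registered in the services dimension and fed by the data and connection dimensions.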

3 Classification of Digital Twin

Once a definition from a standard has been selected and the components of the DT have been identified, the next step is to classify the DT according to several dimensions. Only a few researchers have tried to classify these dimensions for a common understanding [13–17].

3.1 Application Area

Since the DT depends largely on industry requirements, it is essential to choose a specific application within an organisation. The application can be oriented towards physical assets, such as manufacturing, transportation, energy, construction, and mining; towards services, such as healthcare, logistics, education, and welfare; or towards digital assets, such as software modules and networking.

3.2 Hierarchical Level

Each system or system of systems can be segregated into an asset hierarchy with indenture levels, as shown in Fig. 3. Once an industry is chosen, the positioning of the


Fig. 3 Hierarchical level of DT

Fig. 4 Interaction devices

assets needs to be selected depending on the requirements within the organisation. For a pilot case, a unit/item/component can be chosen; for a wide-scale implementation, a system can be chosen. A system of systems is more complex, as it involves multiple interdependencies among systems and is difficult to envision at the initial stages; in particular, identifying the root cause of a failure and performing failure diagnostics become complicated.
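As an illustration only (not part of the cited classification), the indenture levels of Fig. 3 can be captured with a simple recursive structure; the level names and the railway example below are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AssetNode:
    """One node in the asset hierarchy of Fig. 3."""
    name: str
    level: str                          # e.g. "system of systems", "system", "unit", "component"
    children: List["AssetNode"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> None:
        """Print the hierarchy, one indenture level per indent step."""
        print("  " * depth + f"{self.level}: {self.name}")
        for child in self.children:
            child.walk(depth + 1)

# A pilot case would twin a single component; a wide-scale
# implementation would twin a system node and everything below it.
railway = AssetNode("railway network", "system of systems", [
    AssetNode("track section A", "system", [
        AssetNode("switch S12", "unit", [
            AssetNode("point machine", "component"),
        ]),
    ]),
])
railway.walk()
```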

3.3 Interaction Devices

Interaction devices act as an intermediate component between the user and the DT (Fig. 4). The interaction levels need to be defined depending on the requirements and their applicability. Complex DTs require adaptive assistance devices for higher cognition.

3.4 Data Sources

The respective data sources are to be identified for the selected use case and services of interest. The different types of maintenance data formats are numeric, textual, visual, audio, and expert judgement (Fig. 5). Data can be collected from various IoT sensors, on-board sensors, etc. The domain expert's knowledge in understanding a failure


Fig. 5 Types of maintenance data formats for DT

in terms of its cause, mechanism, and effect can significantly improve the accuracy of the DT. In addition, the frequency of data collection, the data storage mechanism, the data model, data integration, data fusion, and data mining are required to assess the performance of the asset.

3.5 Integration

The connection can be defined as either automatic or manual and is divided into three subcategories (Fig. 6) [18]. A digital model has manual connections in both directions between the physical and the digital asset. A digital shadow has an automatic connection for measurement from the physical to the digital asset only. A DT has an automatic bi-directional flow of both measurement and actuation.

Fig. 6 Integration level for DT
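The three-way distinction of [18] reduces to the direction and automation of the connection; the following minimal sketch (an illustration under that reading, not a normative implementation) encodes it as a classification rule:

```python
from enum import Enum

class IntegrationLevel(Enum):
    DIGITAL_MODEL = "digital model"    # manual connection in both directions
    DIGITAL_SHADOW = "digital shadow"  # automatic physical-to-digital only
    DIGITAL_TWIN = "digital twin"      # automatic flow in both directions

def classify_integration(auto_measurement: bool, auto_actuation: bool) -> IntegrationLevel:
    """Map the two connection directions onto the three levels of Fig. 6."""
    if auto_measurement and auto_actuation:
        return IntegrationLevel.DIGITAL_TWIN
    if auto_measurement:
        return IntegrationLevel.DIGITAL_SHADOW
    return IntegrationLevel.DIGITAL_MODEL

print(classify_integration(auto_measurement=True, auto_actuation=False))
# IntegrationLevel.DIGITAL_SHADOW
```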


Fig. 7 Levels of autonomy in DT

3.6 Level of Autonomy

Autonomous DTs are defined by their ability to react to changes in the environment and to adapt to external interferences. The five levels of autonomy are illustrated in Fig. 7. The lowest level refers to the manual operation of maintenance, and the highest level refers to remote maintenance without human interaction; the degree of human involvement decreases from level 1 to level 5.
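Since the text names only the two end points of the scale, the sketch below is purely illustrative: the intermediate level names are assumptions, chosen to show how the decreasing human involvement could be encoded:

```python
from enum import IntEnum

class AutonomyLevel(IntEnum):
    """Five autonomy levels (Fig. 7); intermediate names are assumed."""
    MANUAL = 1        # maintenance operated manually
    ASSISTED = 2      # DT suggests, human executes (assumed)
    SUPERVISED = 3    # DT acts, human approves (assumed)
    CONDITIONAL = 4   # DT acts, human intervenes on demand (assumed)
    AUTONOMOUS = 5    # remote maintenance without human interaction

def human_involvement(level: AutonomyLevel) -> float:
    """Illustrative: involvement falls linearly from level 1 to level 5."""
    return (5 - int(level)) / 4   # 1.0 at level 1, 0.0 at level 5
```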

3.7 DT Creation Time

The creation time of a DT (i.e., when it is created) [19] can be categorised as shown in Fig. 8. A digital twin prototype (DTP) consists of the set of data required to create or build a physical asset from simulation; it can be applied in reliability design, reliability growth, design for maintenance, and design in maintenance. A digital twin instance (DTI) is the digital counterpart of a physical asset throughout the life cycle, once the asset has been built; it can be used to monitor, detect, predict,

Fig. 8 DT creation time


Fig. 9 Maintenance services at different life cycle phases

and control the behaviour of the physical asset. A digital twin aggregate (DTA) is a collection of multiple DTIs, through which a fleet of assets can be monitored and controlled.
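As a hypothetical illustration of the DTI/DTA relation (the fleet example and names below are assumptions):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DTI:
    """Digital twin instance: mirrors one built physical asset."""
    asset_id: str
    condition: Dict[str, float] = field(default_factory=dict)  # latest measurements

@dataclass
class DTA:
    """Digital twin aggregate: a collection of DTIs covering a fleet."""
    instances: List[DTI] = field(default_factory=list)

    def fleet_mean(self, signal: str) -> float:
        """Fleet-level view, e.g. the mean bearing temperature across assets."""
        values = [i.condition[signal] for i in self.instances if signal in i.condition]
        return sum(values) / len(values) if values else float("nan")

fleet = DTA([DTI("loco-01", {"bearing_temp": 71.5}),
             DTI("loco-02", {"bearing_temp": 68.2})])
print(fleet.fleet_mean("bearing_temp"))  # 69.85
```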

3.8 Life Cycle Phase

The reliability and maintenance services can be categorised according to the life cycle phase of the defined asset, as shown in Fig. 9. These services must be defined in the use case description, with consensus among the stakeholders; this categorisation establishes the purpose of the DT. The implementation of these services can draw on traditional reliability and maintenance models, AI, ML, DL, hybrid modelling, Modelica analysis, big data modelling, 3D models, computer vision, genetic algorithms, etc., as defined by Liu et al. [16]. The selection of a model depends on the availability, quality, frequency, and feasibility of the data.

3.9 Cognitive Type

Figure 10 shows the roles of the different cognitive types of DT. The basic interrogative DT consists of data acquisition, with timely monitoring for the diagnostics of faults or failures. A predictive DT uses simulation, prediction, and optimisation models, including ML models, to predict the future condition. A prescriptive DT, in addition to the predictive capability, uses prescriptive analytics to suggest an optimal course of action. The highest level, the autonomous DT, includes the actuation of the physical asset with situation awareness, remote maintenance, and self-configuration.
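As a minimal, hypothetical sketch of the step from interrogative to predictive behaviour (assuming a single degradation signal sampled at regular intervals and an assumed alarm threshold):

```python
import numpy as np

def predict_condition(history: np.ndarray, steps_ahead: int) -> float:
    """Predictive DT in miniature: fit a linear trend to a monitored
    degradation signal and extrapolate it a few samples ahead."""
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, deg=1)
    return slope * (len(history) - 1 + steps_ahead) + intercept

wear = np.array([0.10, 0.12, 0.15, 0.19, 0.24])   # interrogative DT: observe
forecast = predict_condition(wear, steps_ahead=3)  # predictive DT: extrapolate
if forecast > 0.30:                                # assumed alarm threshold
    print(f"Predicted wear {forecast:.2f} exceeds the limit: plan maintenance")
```

A prescriptive DT would go one step further and rank candidate maintenance actions against such a forecast, while an autonomous DT would also trigger the actuation itself.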

3.10 Collaboration Type

Most industries operate in a multi-stakeholder environment. As most assets are owned and shared, an appropriate business model must be developed, if one does not


Fig. 10 Cognitive roles of DT

Fig. 11 Collaborative levels of DT

exist, to realise the DT. The degree of collaboration can be defined in levels, as shown in Fig. 11. In particular, this degree depends on the involvement of the partners responsible for the performance of the asset, the access rights, the data-sharing business model, and the information-sharing business model. It is essential that all relevant stakeholders are involved before the development of the DT; otherwise, the accuracy of the DT may suffer because of the non-availability of data from some of the partners. In addition to the above classification, other dimensions can be defined, such as model authenticity, model fidelity, modelled characteristics, and accuracy.

4 Digital Twin Maturity

Maturity levels are used as an instrument to assess performance. In this case, even before implementing a DT, or when assessing an existing DT, the maturity levels need to be considered. Some of the existing maturity level assessments are described below.


Table 1 Atkins' DT maturity model

Level | Principle | Data acquisition | Usage
5 | Autonomous operation and maintenance | Context awareness | Complete autonomous operations and maintenance
4 | 2-way data integration and interaction | Connection of real-time data | Remote and immersive operations
3 | Enrich with real-time data | IoT sensors | Operational efficiency
2 | Connect model to static data | Documents, drawings, asset management | Asset management
1 | Mapping of area | Object-based or building information model | Asset optimisation
0 | Reality capture | Point cloud, drones, etc. | As built

4.1 Atkins

The maturity levels for the realisation of a DT within a constellation of ecosystem, technologies, and integration are provided by Atkins [20], as shown in Table 1. In moving from Level 0 to Level 5, human involvement is reduced, increasing the safety of humans in hazardous environments. Their research indicates that most existing DTs are at levels 0 to 2.

4.2 Cognitive DT Maturity

Depending upon the type of model and the collected data, the DT maturity can be assessed using Table 2 [21].

Table 2 Cognitive DT maturity

Level | Name | Model | Data acquisition
4 | Intelligent DT | Connect model to static data | Documents, drawings, asset management
3 | Adaptive DT | Enrich with real-time data | IoT sensors
2 | DT | Two-way data integration and interaction | Connection of real-time data
1 | Pre-DT | Virtual model on technology assessment | Not applicable/not available


4.3 DT Maturity in Analytics

Medina et al. [22] developed a comprehensive maturity model considering several dimensions from an analytics point of view, as shown in Table 3. The authors suggest that this maturity assessment can help generate additional revenue by offering services complementary to a company's products, or can reinforce designs through enhanced data usage to achieve cost savings for future products.

4.4 DT Maturity in Dimensions

Uhlenkamp et al. [13] also developed a maturity model based on several of the dimensions identified in Sect. 3, with an integrated index, and assessed five use cases using weighting factors for each level within the dimensions.
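A hypothetical sketch of such an integrated index is given below; the dimension names and weights are illustrative assumptions, not the values used in [13]:

```python
from typing import Dict

def maturity_index(levels: Dict[str, int],
                   weights: Dict[str, float],
                   max_level: int = 5) -> float:
    """Weighted, normalised maturity index in [0, 1]: each classification
    dimension contributes its level (0..max_level) scaled by its weight."""
    total_weight = sum(weights.values())
    return sum(weights[d] * levels[d] / max_level for d in levels) / total_weight

# Illustrative scores for one use case over a subset of the Sect. 3 dimensions
levels = {"integration": 2, "autonomy": 1, "cognition": 2, "collaboration": 3}
weights = {"integration": 0.3, "autonomy": 0.2, "cognition": 0.3, "collaboration": 0.2}
print(f"Integrated maturity index: {maturity_index(levels, weights):.2f}")  # 0.40
```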

5 Conclusion

Because of the wide implementation of DTs in manufacturing, other industries are discussing the application of similar concepts within their own domains to solve their problems and challenges. However, it is essential to define, classify, and assess the maturity of a DT before implementation, especially for industries that involve multiple stakeholders and whose assets are exposed to varying environmental, operational, and human conditions. In addition, it is necessary to envision the asset management and maintenance of the assets needed to reach the target goals, with the DT acting as a facilitator. Hence, this paper provides a guideline for industrial sectors to define and identify a DT with the right kind of requirements and expectations.