Model-Based Reliability Systems Engineering 9819902746, 9789819902743


English Pages [345]

Table of contents :
Preface
Contents
1 Development Phase of Reliability Systems Engineering
1.1 Background of Reliability System Engineering
1.1.1 The Development History of RMS Engineering in Foreign Countries
1.1.2 Global Developing Trends of RMS Engineering
1.1.3 Challenges Faced by Hexability in China and Demands for Leap-Forward Development
1.2 The Concept of Reliability System Engineering
1.2.1 System Engineering
1.2.2 Definition of Reliability System Engineering
1.2.3 Philosophical Connotation of Reliability System Engineering
1.3 Formation of the Theoretical and Technological Framework of Reliability System Engineering
1.3.1 Fundamental Theory of Reliability System Engineering
1.3.2 Technology Framework for Reliability System Engineering
1.3.3 Application Mode of Reliability System Engineering
1.4 Modelling Trend of Reliability System Engineering
1.4.1 Emergence and Development of Model-Based System Engineering (MBSE)
1.4.2 Trend of Modelling, Virtualization, and Integration of the Hexability Design
1.4.3 Difficulties in Reliability System Engineering
1.4.4 Technical Requirements for the Unified Model of Reliability System Engineering
2 Fundamentals of Model-Based Reliability System Engineering
2.1 MBRSE Theory and Methods
2.1.1 General Technical Framework for the Integration of RSE Technology
2.1.2 Main Research Areas and Engineering Significance of MBRSE
2.2 The Concept and Connotation of MBRSE
2.2.1 Definition of MBRSE
2.2.2 Elements and Architecture of MBRSE
2.3 Information Sharing Mechanism of MBRSE
2.3.1 The Cognitive Process of Product Life Cycle
2.3.2 Design Ontology Framework for MBRSE
2.3.3 Construction of the Fault Ontology
2.4 Process Control Mechanism of MBRSE
2.4.1 Meta-process for Brand-New Product Design Meta
2.4.2 Meta-process for Inherited Product Design Meta
2.4.3 Meta-process for Product Structural Design
2.4.4 Hexability Design Goal Control Method
2.5 MBRSE Design Evolution Mechanism
2.5.1 Design Evolution Method Set Based on Axiomatic Design Theory
2.5.2 Design Domain Extension for MBRSE
2.5.3 Mapping Principle of the MBRSE Design Domain
3 MBRSE Based Unified Model and Global Evolution Decision Method
3.1 MBRSE Model Evolution Process Integrating Functional Realization and Fault Mitigation
3.2 Modelling Method for Fundamental Product Model
3.3 Evolutionary Decision-Making of the Unified Model
3.3.1 Deterministic Model
3.3.2 Stochastic Model
3.3.3 Fuzzy Model
3.3.4 Hybrid Model
4 System Fault Identification and Control Method Based on Functional Model
4.1 Identification of Component Functional Faults in the Total Domain to Preserve Function
4.2 Identification of Component Physical Faults in the Total Domain Based on the Function-Physics Mapping
4.2.1 Fault Identification Methods of Basic Physical Components
4.2.2 Fault Identification Methods of Robust Physical Components
4.3 Emergent Fault Integrated Identification
4.3.1 Interface Fault
4.3.2 Transfer Fault
4.3.3 Error Propagation Fault
4.4 Fault Closed-Loop Mitigation Control
4.4.1 Component Fault Closed-Loop Mitigation Control
4.4.2 System Fault Closed-Loop Mitigation Control
4.5 Fault Mitigation Decision Method
4.5.1 Fault Mitigation Decision Considering Transmission Chain
4.5.2 Fault Mitigation Decision Considering the Coupling Relationship
5 Physics of Failure Based Fault Identification and Control Methods
5.1 Introduction to Physics of Failure Model
5.1.1 Physical Process of a Fault
5.1.2 Physics of Failure Model
5.1.3 Visualization Model of the Fault
5.2 Load-Response Analysis and Fault Identification
5.2.1 Fundamental Concepts and Principles
5.2.2 Finite Element Methods for the Load-Response Analysis
5.2.3 Simulation Based Fault Identification
5.3 Time Analysis and Fault Identification of the PoF Model
5.3.1 Failure Mode Analysis of the Product
5.3.2 Establishment of Time Varying Reliability Model
5.3.3 Calculation of Penetrability
5.3.4 Determination of the Limit State Function
5.3.5 Model Parameters Determination Based on the Degradation Process
5.4 PoF Based Failure Simulation and Evaluation
5.4.1 Fundamental Process
5.4.2 Failure Prediction
5.4.3 Reliability Evaluation
5.5 PoF Based Optimization Design and Fault Control
5.5.1 Parameter Sensitive Analysis Based on Orthogonal Test and Grey Correlation Model
5.5.2 Reliability-Based Design Optimization
5.5.3 Reliability-Based Multidisciplinary Design Optimization
6 Model-Based Reliability System Engineering R&D Process Model
6.1 Model-Based System Engineering Process and Design Flow
6.1.1 Evolution of the System Engineering Process
6.1.2 System Design Flow
6.2 Concepts of Integrated Design Process of Both Functional Performance and Hexability Under the MBRSE Mode
6.2.1 Integrated Design Process of Both Functional Performance and Hexability
6.2.2 Effects of the MBRSE on the Integrated Design Process
6.2.3 Multiple View Description Methods for the MBRSE Process
6.3 Key Technologies of the MBRSE
6.3.1 MBRSE Process Planning Technologies
6.3.2 Analysis Method for the Operation Conflict in the MBRSE Process
6.3.3 Simulation Based Operation Evaluation of the MBRSE Process
6.3.4 MBRSE Process Review and Validation Methods
7 Integrated Design Platform for Model-Based Reliability System Engineering
7.1 Engineering Requirements of the MBRSE Integrated Design Platform
7.1.1 Overview of Enabling Technology for Complex System Development
7.1.2 Enabling Technological Requirements for RSE Integrated Design
7.2 Fundamental Models of MBRSE Integrated Design Platform
7.2.1 Framework of Integrated Design Platform
7.2.2 Functional Composition of the Integrated Design Platform
7.2.3 Extension of the Product Data Model for Integrated Design
7.2.4 PLM Based Hexability Design Process
7.3 Integration of MBRSE Integrated Design Tools
7.3.1 Integration Requirements for the Integrated Design Tools
7.3.2 Integration Model on the Integrated Design Tools
8 Application Cases of the Use of MBRSE
8.1 Requirement Analysis
8.1.1 Operating Requirements
8.1.2 Requirements Decomposition
8.2 Preliminary Design
8.2.1 Functional Design
8.2.2 Structural Design
8.2.3 Working Principles
8.3 Systematic Failure Determination and Mitigation Based on the Functional Model
8.3.1 Systematic Determination of the Failure
8.3.2 Typical Failure Transmission Chain
8.3.3 Typical Closed-Loop Failure Mitigation Process
8.4 Determination and Control of Component Failures
8.4.1 Description of the Device
8.4.2 Digital Prototype Modeling
8.4.3 Load Response Analysis
8.4.4 Failure Prediction Model
8.4.5 Reliability Evaluation
8.4.6 Optimization Design and Failure Control
9 MBRSE Future Outlook
9.1 Technical Development Trend of MBRSE
9.2 Digital Twin Technology for Reliability
9.2.1 State-of-Art of Digital Twins
9.2.2 Reliability System Engineering Digital Twin
Bibliography

Author / Uploaded
Yi Ren
Bo Sun
Qiang Feng
Cheng Qian
Dezhen Yang
Zili Wang

Yi Ren · Cheng Qian · Dezhen Yang · Qiang Feng · Bo Sun · Zili Wang

Model-Based Reliability Systems Engineering

Yi Ren School of Reliability and Systems Engineering Beihang University Beijing, China

Cheng Qian School of Reliability and Systems Engineering Beihang University Beijing, China

Dezhen Yang School of Reliability and Systems Engineering Beihang University Beijing, China

Qiang Feng School of Reliability and Systems Engineering Beihang University Beijing, China

Bo Sun School of Reliability and Systems Engineering Beihang University Beijing, China

Zili Wang School of Reliability and Systems Engineering Beihang University Beijing, China

ISBN 978-981-99-0274-3
ISBN 978-981-99-0275-0 (eBook)
https://doi.org/10.1007/978-981-99-0275-0

Jointly published with National Defense Industry Press. The print edition is not for sale in China (Mainland). Customers from China (Mainland) please order the print book from: National Defense Industry Press. ISBN of the Co-Publisher’s edition: 978-7-118-12215-2

© National Defense Industry Press 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publishers, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publishers nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publishers remain neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Paper in this product is recyclable.

Preface

Reliability, maintainability and supportability are fundamental to achieving high efficiency and reducing the life cycle cost of high-tech equipment. With the rapid development of equipment products, we have come to appreciate the harm caused by designing functional performance characteristics inconsistently with general characteristics (also called hexability, which comprises reliability, maintainability, supportability, testability, safety and environmental adaptability). Great attention has been paid to this problem at the ideological level, and strict evaluation has been applied at the management level, in order to avoid it. At the technical level, however, we remain constrained by the lack of systematic and effective methods. In practice, product designers find it difficult to understand, master and apply the complicated hexability design methods. As a result, hexability design is disconnected from functional performance design, and its results cannot influence the design of the product. Traditional reliability system engineering (RSE) methods are carried out mainly through management. They impose “soft requirements” on the product design process but cannot create “hard constraints.” For this reason, this book proposes the model-based reliability system engineering (MBRSE) methodology, which takes unified models as its basis and model-based fault prevention, detection and remedy as its core, and synergistically uses different kinds of hexability design methods to carry out the integrated design of both functional performance and hexability. It inherits the management method of RSE and gives the overall design framework from the implementation perspective. In this book, MBRSE is systematically elaborated from the following aspects:

(1) The engineering requirement background and technology development status of MBRSE. This book summarizes the birth and development of hexability technologies in foreign countries, as well as the development of RSE in China over the past 20 years. Faced with the reality that China's industrial foundation and design concepts differ from those of Western countries, Prof. Weimin Yang of Beihang University proposed the concept and connotation of RSE to suit Chinese national conditions. Subsequent researchers gradually developed a theoretical and technical framework and created a professional engineering design theory that takes Chinese characteristics into account. Through research and analysis, the problems faced in implementing traditional RSE methods are summarized in three aspects. First, an integrated design methodology is missing, so it is hard to establish a design process model covering both functional performance and hexability, and hexability work can only rely on qualitative, cumbersome work items that are difficult to carry out synchronously. Second, a universal, unified design theory and method spanning the whole product development process is missing, so it is hard to establish a design method model covering both functional performance and hexability. Third, an advanced integrated design software platform is missing, so it is hard to integrate the hexability design software tools into the digital environment of product design. These problems are the main reason for proposing the MBRSE method.

(2) The basic principles, basic models and technical framework of MBRSE. The main research categories and theoretical significance of MBRSE are first elaborated, and the conceptual models of MBRSE, which include its principle models and technical framework, are then developed. With the fault ontology as the core, the concepts and relationships of product faults in the design process are unified, and the model unification and sharing mechanism of MBRSE is provided. Driven by the forward evolution process of the unified model, meta-processes oriented to completely new design and to inherited design are established, respectively, to provide the process control mechanism of MBRSE. In addition, based on the mapping theory among the requirement domain, functional domain and physical domain, the domain extension and design process mapping principle for model-based hexability design are presented. The MBRSE process model is also provided, including the unified process framework, the unified process planning and reorganization method and process model, and the comprehensive analysis and evaluation method of the process.

(3) MBRSE design methods. The MBRSE design methodology is established with the unified model at its center. First, a model-based method for the systematic identification of multi-mode faults is proposed. Following the hierarchical structure and unified evolution process of the product, component functional faults are identified globally so that function is preserved. Physical faults are then identified globally after the function-physics mapping. Building on the component faults, interface faults, transmission faults and error propagation faults are further analyzed to identify emergent system faults. Then, combined with the hexability design goals, a model-based synthetic control method for design goals is proposed that considers both functional performance and hexability design goals. Its core is to formulate a closed-loop fault mitigation strategy according to the faults identified in the system and to feed back on the evaluation of the hexability indices, ensuring that these indices are implemented and applied in the product design process.


(4) MBRSE design platform and engineering application cases. Based on the theoretical models of MBRSE, the integrated design model, process model and tool integration model are established in order to develop an integrated design platform covering the whole system and the whole process. Then, taking a terrain mobile robot platform as an example, the MBRSE method, platform and software tools are applied from requirement analysis through to the determination of the design plan, for verification purposes.

This book is divided into nine chapters. The development stages of RSE are introduced first, followed by the basic theory, unified models and global evolution methods of MBRSE. Next, model-based fault identification and control, the MBRSE development process model and the MBRSE integrated platform are introduced. Finally, the latest developments in research integrating digital twins with RSE are discussed.

This book is for engineers, technical managers and consultants in the aerospace, automotive, civil and ocean engineering industries and in the power industry who want to use, or are already using, reliability engineering methods. It was first released in Chinese and then translated into English. During this process, Prof. Zili Wang provided professional guidance until the book was finalized. The authors would like to sincerely thank him for his great concern and support. In addition, we would like to express our utmost thanks to Prof. Dariusz Mazurkiewicz of Lublin University of Technology for all his contributions in editing and proofreading this book. Our special thanks also go to the many colleagues, postdoctoral researchers and Ph.D. students who contributed to the book in various ways.

Beijing, China

Yi Ren Cheng Qian Dezhen Yang Qiang Feng Bo Sun Zili Wang

Chapter 1

Development Phase of Reliability Systems Engineering

Abstract This chapter provides a systematic overview of the advent and evolution of hexability technology in China, and introduces the challenges and new development requirements of the new era. Then, starting from the concept of system engineering, it reviews the establishment, fundamental definition and philosophical connotation of Reliability System Engineering (RSE). Next, the theoretical and technical framework of RSE is presented from three aspects: basic theory, basic technology and integrated technology. Finally, the prospective trends in the development of RSE in China are outlined, based on a discussion of the emergence and development of Model-Based System Engineering (MBSE).

Keywords System engineering · Reliability system engineering · Definition and connotation · Theoretical and technical framework · Modeling

1.1 Background of Reliability System Engineering

1.1.1 The Development History of RMS Engineering in Foreign Countries

Reliability, maintainability, supportability, testability, safety and environmental adaptability are characteristics that a product must have in practical use. They are referred to here collectively as hexability or RMS, and sometimes simply as reliability; in this book, the terms RMS and reliability generally mean hexability. From the simple and clunky carriage of ancient times to the sophisticated and complex nuclear power plant of modern times, end users have always hoped that a product will work ‘solidly’ under various conditions, with no faults, few faults, or at least no fatal faults. They also hope that whenever a fault occurs it can be located quickly and accurately, that the failed part is easy to repair or replace, and that maintenance is supported in time by professional experts and reliability tools. These simple expectations are not met automatically; they must be realized through the hexability design of the product.

© National Defense Industry Press 2024 Y. Ren et al., Model-Based Reliability Systems Engineering, https://doi.org/10.1007/978-981-99-0275-0_1


In the era of the traditional handicraft industry, product design relied primarily on experience that designers accumulated through long practice. In the early industrial age, products were relatively simple, hexability problems could be solved on the basis of engineers’ experience alone, and dedicated design activities were therefore unnecessary. Hexability as an engineering design discipline gradually took shape during World War II. Military equipment of that period was developed in a short time and was typically characterized by complicated structure and low technological maturity, which led to a large number of hexability problems in use. For example, Nazi Germany’s V-2 rocket was developed in only two years, yet it was built from as many as 220,000 parts and components. In addition, many new technologies such as liquid rocket engines, inertial navigation, and automatic flight control were introduced for the first time. Its flight altitude reached 100 km and its speed reached Mach 4.8. Because of the substantial increase in system complexity, the massive adoption of new technologies and the unprecedented working conditions, reliability became a key technical problem in the development of the V-2 rocket. To evaluate its reliability, R. Lusser, one of the V-2 developers, first proposed a probability multiplication rule: the rocket is treated as a series model, and the reliability of the system is calculated as the product of the reliabilities of its components. Although the importance of reliability had already been recognized, effective technical and management methods were lacking, so the reliability problems of the V-2 rocket were never well resolved and its actual combat effectiveness was much lower than expected. In addition, the rise of electronic products such as radar during World War II greatly improved the performance of weapons and equipment.
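The probability multiplication rule attributed to Lusser above can be illustrated with a short sketch. It assumes that the components fail independently and that the system works only if every component works. The function name and the numerical values are illustrative assumptions, not data from the V-2 program.

```python
from math import prod


def series_reliability(component_reliabilities):
    """Reliability of a series system: the product of the component reliabilities.

    Assumes independent component failures (an assumption of this sketch).
    """
    for r in component_reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reliability must lie in [0, 1], got {r}")
    return prod(component_reliabilities)


# Hypothetical example: 100 components, each 99% reliable.
system_r = series_reliability([0.99] * 100)
print(round(system_r, 3))  # 0.366
```

Even with highly reliable parts, the product falls quickly as the part count grows, which is why a system with a very large number of components makes reliability a central design problem.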
However, 60% of airborne electronic equipment in the United States could not be used after being shipped to the Far East, and 50% of electronic equipment failed during storage, which severely restricted its combat effectiveness. In response to the hexability issues exposed during the use of various equipment in World War II, modern reliability engineering was born in the United States in the 1950s and gradually expanded to specific characteristics such as maintainability, supportability, testability, safety and environmental adaptability. For more than 70 years, many countries around the world have attached great importance to theoretical research on hexability and to its engineering application. Hexability technology has developed considerably and has produced clear results in application. A technical system has formed, evolving from single technologies to an integrated technology composed of three parts: requirement determination, design and analysis, and validation and evaluation. This technological development has gone through the following five stages:

1.1.1.1 Stage of Solving Typical Issues (1940s–1960s)

In order to address the high failure rate of electronic products in World War II, the United States established the Electronic Tube Research Committee in 1943 to study reliability issues in electron tubes. In 1951, Aeronautical Radio, Inc. (ARINC) formulated the earliest reliability improvement plan, and in 1952 it published the ARINC report, which defined the terminology of reliability and was the first to clarify the random nature of time to failure (TTF). Also in 1952, the US Department of Defense established the Advisory Group on Reliability of Electronic Equipment (AGREE). In 1955, AGREE began to implement a comprehensive reliability development plan covering design, testing, production, delivery, storage and use, and in 1957 it published the research report “Reliability of Military Electronic Equipment” (the AGREE report), which elaborated the procedures and methods of reliability design, testing and management of military electronic equipment from nine aspects and set the development direction of reliability engineering in the United States. The AGREE report became a foundational document of the field: its publication marked reliability as an independent discipline and is regarded as an important milestone in the development of reliability engineering. From an engineering perspective, however, reliability engineering at this stage was not used systematically and in a planned way to support weapons and equipment development; it focused instead on solving specific problems. Second-generation fighters such as the F-4 and F-104, developed by the US military in the 1950s, exhibited very low reliability, low combat readiness, low sortie rates, and high maintenance and support costs in the Vietnam War.
In the 1960s, to address the low reliability of the F-4 and other fighters in the Vietnam War, the US military formulated and issued a series of reliability military standards on the basis of the report “Reliability of Military Electronic Equipment”, such as MIL-STD-785 “Requirements for Reliability Program (for Systems and Equipments)”, and applied them in the development of new third-generation weapons and equipment such as the F-14A and F-15A fighters and the M1 tank. Since then, reliability requirements, reliability program plans, reliability analysis and design, and reliability qualification tests have been carried out as part of equipment development.

1.1.1.2 Stage of Systematic Implementation (1970s–1980s)

In the 1970s and 1980s, both the United States and the Soviet Union developed a large number of complicated new equipment in order to obtain strategic military advantages. Through the implementation of a number of programs, such as the Apollo moon landing, the United States rapidly improved its scientific, technological and military strength and accumulated experience in carrying out reliability engineering comprehensively for large systems. The US Department of Defense established a Joint Technical Coordination Group covering reliability, availability, and maintainability, led directly by the Joint Logistics Commanders of the three armed services, to manage RMS throughout equipment development, comprehensively strengthen the reliability management of weapons and equipment, and improve their actual combat capabilities. In the late 1970s, the United States began to use reliability development and growth tests, environmental stress screening, and comprehensive environmental tests in the development of weapons and military equipment, and issued the relevant standards. In 1980, the US Department of Defense released its first Reliability and Maintainability (R&M) regulation, DoDD 5000.40 “Reliability and Maintainability”, which specified the R&M policy for procurement and the responsibilities of the different organizations in the Department of Defense, and emphasized that R&M work should be carried out from the very beginning of the development of any equipment. In 1986, the US Air Force released the “R&M 2000” Action Plan, which made clear that R&M is an integral part of the combat effectiveness of aviation weapons and equipment. Starting from management, this plan promoted the development and application of R&M technology and institutionalized R&M management. The local wars of the 1990s not only reflected the technological advancement of US military equipment but also highlighted its outstanding hexability. Most of the equipment used in those wars was developed or improved in the 1970s and 1980s, which reflects the effectiveness of the systematic implementation of RMS technologies. Meanwhile, the Star Wars program and the development of stealth fighters during that period also promoted technological progress in reliability engineering, including research on highly accelerated life testing (HALT), highly accelerated stress screening (HASS), software reliability, network reliability, Physics of Failure (PoF) analysis, failure mode and effects analysis (FMEA), testability modelling, virtual maintainability analysis, integrated support simulation analysis, and other technologies, which were gradually applied in the development of new generations of equipment.

1.1.1.3 Stage of Standstill and Retrogression (1990s–Early Twenty-First Century)

After the disintegration of the Soviet Union, the United States became the only superpower in the world and faced reduced threats. In order to reduce defense expenditures, the US military carried out defense procurement reforms in 1994. Then-Secretary of Defense William Perry abolished most of the reliability military standards in order to achieve military-civilian integration of equipment procurement, and tried to ensure the reliability of equipment through a completely market-oriented approach, thereby saving procurement costs. In practice, however, this action caused a continuous decline in the reliability of subsequent weapons and equipment. Between 1996 and 2000, 80% of new US military equipment failed to reach the required level of operational reliability. There were also technical reasons. Since the 1990s, the world has entered the information age represented by computers, software, and networks, and the integration, informatization, and automation of equipment have become more and more important. The failures and reliability requirements of information-based equipment have many new features that need to be addressed in both basic theory and applied technology.

1.1.1.4 Stage of Spiral Rise (Early Twenty-First Century–2015)

After the start of the twenty-first century, nearly half of the US military’s weapons and military supplies acquisition projects failed to meet their requirements during initial test and verification. Studies by the US Department of Defense found that the serious issues were caused by failure to carry out reliability work during equipment development. For instance, reliability work in the design stage was insufficient, the reliability design practice of defense contractors did not conform to best commercial practice, failure mode, effects and criticality analysis (FMECA) and the failure reporting, analysis and corrective action system (FRACAS) did not work effectively, and the reliability tests of components and systems were insufficient. In order to solve the reliability issues in the development of weapons and military equipment, the US Department of Defense cooperated closely with industry and the Government Electronics and Information Technology Association (GEIA), and in August 2008 officially released the reliability standard GEIA-STD-0009 “Reliability Program Standard for Systems Design, Development, and Manufacturing” [1] for use in the development and manufacturing of defense systems and military equipment, once again strengthening reliability work in equipment development. In May 2013, TechAmerica released the associated TA-HB-0009 “Reliability Program Handbook”. Meanwhile, reliability design technology based on Physics of Failure (PoF) attracted great attention and was developed in depth, and it has been applied to the reliability design of avionics in the F-22 fighter and the European A400M military transport aircraft. The Maintenance Free Operating Period (MFOP) was first adopted as the reliability index for the A400M, instead of the traditional mean flight hours between failures (MFHBF).
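The difference between the two indices can be made concrete with a small sketch. MFHBF is an average, whereas an MFOP is a period that the product is expected to complete without unscheduled maintenance, often stated together with a probability of completing it. The sketch below assumes a constant failure rate (exponential model) equal to 1/MFHBF; that assumption, the function name, and the numbers are introduced here for illustration and do not come from the A400M program.

```python
import math


def mfop_survival(mfop_hours: float, mfhbf_hours: float) -> float:
    """Probability of completing an MFOP without failure.

    Assumes a constant failure rate of 1/MFHBF (exponential model),
    which is a simplifying assumption of this sketch.
    """
    if mfop_hours < 0 or mfhbf_hours <= 0:
        raise ValueError("mfop_hours must be >= 0 and mfhbf_hours must be > 0")
    return math.exp(-mfop_hours / mfhbf_hours)


# Hypothetical numbers: a 100-hour MFOP for equipment with an MFHBF of 500 hours.
p = mfop_survival(100.0, 500.0)
print(f"{p:.3f}")  # 0.819
```

Under this assumption, an MFHBF five times longer than the operating period still leaves roughly an 18% chance of an interruption, which shows why a period-based index conveys information that a mean value does not.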

1.1.1.5 Stage of the New Technological Revolution (2015–Present)

In 2013, Germany first proposed Industry 4.0, whose core concept is to use Cyber-Physical Systems (CPS) to digitize and add intelligence to the processing of supply, manufacturing, and sales information in production, and ultimately to achieve rapid, effective, and personalized product supply. In May 2015, China’s State Council officially released “Made in China 2025” to comprehensively implement the strategy of making China a strong manufacturing country. Its key content is to promote the deep integration of informatization and industrialization and to build a Chinese version of Industry 4.0, which poses new challenges to traditional reliability technology. Facing the new targets and new issues of the Industry 4.0 era, network reliability, CPS reliability, autonomous system reliability, system flexibility, and digital twin based reliability technology have developed rapidly in recent years and have become research hotspots. In this new age of technological revolution, the theory, technology, methods and tools of reliability are all facing new opportunities and challenges.


1.1.2 Global Developing Trends of RMS Engineering

1.1.2.1 Trend of Technological Synthesis and Integration

The progression from single technologies to comprehensive and integrated technologies, and the integration of functional characteristics with RMS characteristics, are both important features of technological development in this period. With the rise of digital design, three-dimensional paperless design, product lifecycle management (PLM), multi-disciplinary design optimization and networked collaboration have become the new direction of design technology, and this has also driven RMS towards integration. The ways of synthesis and integration can be summarized as follows:

● integration of RMS design and analysis, such as integrated analysis of reliability, maintainability, and availability, integrated design analysis of reliability, testability, maintainability, and supportability, and integrated design of RMS and function/performance;
● integration of reliability tests, that is, making full use of the information from research and development tests, growth tests, environmental tests, and appraisal tests to evaluate product reliability;
● integration of logistic support and diagnostics, that is, making use of comprehensive diagnosis to integrate design, production, and maintenance testing;
● integration of hardware and software, that is, carrying out a comprehensive analysis of the reliability of hardware and software;
● integration of information about reliability, maintainability and supportability: by establishing an integrated data system for weapons and equipment, the various design, production, maintenance and support information from the ordering party, users, the main system and the transfer system can be utilized and shared comprehensively.

The engineering community has always wanted to integrate the various design disciplines into the engineering process of product development, so that performance and the other characteristics are designed in an integrated way.
The idea of integrated design runs from the integration of engineering disciplines [2, 3] in the United States in the 1970s, through concurrent engineering in the 1990s, to the dependability technology that emerged in Europe in the 1990s [4, 5]. The development of complex equipment is a system engineering process composed of different stages of work. System integration, based on systems thinking and the basic principles of system engineering, is the core that runs through the whole system engineering process [6]. Unlike traditional function/performance design, the engineering disciplines are used to ensure that the developed system is more reasonable and more effective in its actual operating environment. The integration of engineering disciplines refers to the integration of equipment technical requirements, the integration of the equipment development process, the integration of the research team, and the integration of various design methods (tools). Among them, the interactive and coordinated integrated development process is the core that drives the integration of engineering disciplines. The most important part of this integration lies in professional fields such as RMS. Lockheed Martin adopts matrix management and uses knowledge in specific areas such as reliability, maintainability, human engineering, transportability, safety, and electromagnetic compatibility to support product design and to ensure that the system is suitable for its operating environment, thereby realizing integration of the system engineering process [6, 7]. With the development of technology, more and more elements such as people, technology, hardware, software, processes, and enterprises are involved in the product system, which makes system application and support processes more and more complicated and has given birth to a new generation of systems engineering methods: Model-Based Systems Engineering (MBSE). By placing models at the center of system design, this method has become the future development trend of system engineering. Its main result is the system model, which is made up of key elements such as system requirements, structure, behavior, and parameters. Reliability, as an important feature of the system, is also integrated into the design of the system model.
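As a rough illustration of what such a system model might hold, the sketch below groups the four element types named above (requirements, structure, behavior, parameters) in one container, expresses a reliability target as a requirement, and checks requirement traceability. All class, field, and block names are hypothetical; MBSE tools typically use modelling languages such as SysML, not ad hoc classes like these.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Requirement:
    req_id: str
    text: str
    satisfied_by: List[str] = field(default_factory=list)  # names of structure blocks


@dataclass
class SystemModel:
    requirements: List[Requirement] = field(default_factory=list)
    structure: List[str] = field(default_factory=list)             # block names
    behavior: Dict[str, List[str]] = field(default_factory=dict)   # block -> functions
    parameters: Dict[str, float] = field(default_factory=dict)     # named design values

    def untraced_requirements(self) -> List[str]:
        """Return the IDs of requirements not linked to any existing block."""
        blocks = set(self.structure)
        return [r.req_id for r in self.requirements
                if not any(b in blocks for b in r.satisfied_by)]


# Hypothetical model: the reliability target sits in the same model as function.
model = SystemModel(
    requirements=[
        Requirement("R1", "Lift a rated load", satisfied_by=["HydraulicArm"]),
        Requirement("R2", "Mission reliability of at least 0.95"),
    ],
    structure=["HydraulicArm", "Engine"],
    behavior={"HydraulicArm": ["extend", "retract"]},
    parameters={"mission_reliability_target": 0.95},
)
print(model.untraced_requirements())  # ['R2']
```

Keeping the reliability requirement in the same model as the functional requirements lets a simple query expose that it has not yet been allocated to any block.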

1.1.2.2 Trend of Process Modelling

In recent years, the Georgia Institute of Technology (GIT), the National Aeronautics and Space Administration (NASA), Lockheed Martin, the French PRISME Institute, the International Council on Systems Engineering (INCOSE) and other institutions have all focused on coordinating reliability with other specialty characteristics, mainly using the MBSE method. By improving the traceability of requirements, this method strengthens communication and coordination among multiple users and improves knowledge extraction, design accuracy and design integrity, thus facilitating information reuse, strengthening the system engineering process, and reducing development risks. At present, the MBSE method has been applied in a wide range of fields, including aviation, aerospace, vehicles, ships, electronics and civil products. In the course of these applications, several researchers have also summarized general system engineering processes that can effectively realize integrated design. The most representative examples are those developed by GIT in the US and the PRISME Institute in France.

(1) The system engineering process developed by GIT

GIT has established a system engineering process based on MBSE, as shown in Fig. 1.1. Taking an excavator as an example, it built a system model, a system operation scenario model, and a factory manufacturing model for production. The process draws on a collection of knowledge such as meta-models, profiles and model libraries. Through simulation analysis of multi-domain hybrid systems with a target optimization model, cost model, reliability model, mechanism dynamics model and so on, it feeds the analysis results back into the system design, ensures that the system design and the reliability targets are achieved concurrently, and finally arrives at the product design scheme.

Fig. 1.1 System engineering process based on MBSE method (an example by using the excavator) (color picture provided at the end of the book)

(2) Modelling reliability engineering technology framework developed by the PRISME Institute in France

Taking the unified model as the core, R. Cresent and F. Kratz (PRISME, ENSI de Bourges), P. David (Bourges Université de Technologie de Compiègne) et al. established MeDISIS, an overall framework that covers FMEA, reliability and failure scenario analysis, real-time embedded system simulation analysis, and Simulink-based system simulation, as shown in Fig. 1.2, to realize the integration of reliability and traditional disciplines.

Fig. 1.2 System engineering technology framework (MeDISIS) developed by PRISME Institute in France

1.1.2.3 New Requirements by the New Technological Revolution

The rise of smart manufacturing and intelligent design based on the Internet and the Internet of Things has greatly reduced the difficulty of traditional design. Differentiation and high quality have therefore become the new goals of product design under initiatives such as Industry 4.0, the Industrial Internet and the Made in China 2025 strategy. As a result, the design, management and assurance of hexability will play more important roles in the design and application of future products. In addition, the next generation of manufacturing based on cyber-physical systems (CPS) (Fig. 1.3) will cause major changes in the modes of product design, production and service. Traditional hexability design methods will need to be upgraded further in order to adapt to such changes, as follows:

(1) Hexability design requirements for small-batch, flexibly configured products.
(2) Reliability design and validation requirements for intelligent devices.
(3) Design and validation requirements for the hexability of the smart factory/CPS system.
(4) Requirements for integrating hexability design with intelligent product design and PHM design.

1.1.2.4 Industrialization Trend of the Hexability Technology Industry

With the deepening of the social division of labor and the development of producer services, the manufacturing and service industries have become integrated and interdependent, and the boundary between them is increasingly blurred. In the twenty-first century, service-based manufacturing has gradually emerged. The underlying demand comes from the market. Customer consumption culture has changed from demand for products to demand for personalization and experience, while product homogeneity is becoming more and more serious. Manufacturing companies urgently need to provide both products and services in order to overcome product homogeneity and meet customer needs. For complex high-tech products, this trend is becoming increasingly prominent. In order to ensure that the product functions normally, it is necessary to provide corresponding auxiliary services, such as professional installation, commissioning, maintenance and repair, and health management. Clearly, good hexability underpins the profit and competitiveness of service-oriented manufacturing enterprises. Today, hexability is no longer only an engineering discipline or a peripheral activity of design companies; it has gradually grown into a new industry. For example, the global market value of maintenance of civil jets and propeller aircraft reached 56.3 billion US dollars in 2014, and the development of maintenance technology has made GE a service-oriented manufacturing company. As another example, global car ownership exceeded 1 billion in 2011, and global car production reached 89 million in 2014. In the entire automotive industry chain, the automotive service industry accounts for 60%, and maintenance is the core of automotive services, creating hundreds of billions of dollars in value every year and employing millions of people.

Fig. 1.3 Industry 4.0 based on CPS (timeline of the four industrial revolutions, with complexity increasing over time: (1) end of the 18th century, introduction of hydro-powered and steam-powered mechanical manufacturing equipment, first mechanical loom in 1784; (2) beginning of the 20th century, introduction of electrically powered mass production based on division of labor, first production line at the Cincinnati slaughterhouse in 1870; (3) the early 1970s, further strengthening of manufacturing through electronics and information technology, first programmable logic controller, Modicon 084, in 1969; (4) nowadays, based on Cyber-Physical Systems)

1.1.3 Challenges Faced by Hexability in China and Demands for Leap-Forward Development

1.1.3.1 Status Elevation of Hexability in China by the Gulf War

In China, the domestic engineering design field has been weak, and the equipment manufacturing industry was long dominated by imitation. From the 1950s, when a relatively complete industrial system was established, to the early stage of Reform and Opening-up, systematic hexability design was almost nonexistent. Typically, designers lacked awareness of hexability design, the military did not require it, there were no corresponding design standards, implementation was difficult, and acceptance involved no hexability assessment. In the 1980s, most of the self-developed equipment in China had many problems, such as long development cycles, low combat effectiveness and many failures in use. At that time, however, it was not recognized that the main cause of these issues was the lack of hexability analysis. In the Gulf War, which took place in early 1991 and lasted 43 days, 45 of the first 48 US F-15C air superiority fighters from the 1st Tactical Wing arrived in Saudi Arabia within 53 h of receiving the order. This shows extremely high combat readiness and rapid deployment capability. During the war, the F-15C was mainly responsible for providing air protection for the troops and equipment deployed in Saudi Arabia, and served as the main aircraft in the contest for air supremacy. The 120 F-15Cs deployed in southwest Asia flew a total of 5906 sorties, with an average flight duration of 5.19 h per sortie and a mission-capable rate of up to 93.7%. Of the 39 Iraqi fighter jets shot down by US forces in air combat, 34 were shot down by F-15Cs. In contrast, only one F-15C was lost, demonstrating the outstanding combat readiness and strong combat capability of the US military. This high effectiveness of US military equipment was a wake-up call for China. Since then, people have realized that equipment that “can fight and win wars” needs not only excellent combat performance but also excellent hexability.
The outstanding reliability, maintainability, and supportability of US military equipment did not arise naturally but came from great attention to, and investment in, hexability design. In terms of standards and specifications, a relatively complete hexability specification system was initially formed with reference to US military standards and other international standards and in light of China’s national conditions. In terms of supporting means, some RMS design analysis and testing tools and equipment were imported, which achieved remarkable results and met the urgent needs of equipment development. In terms of key technology breakthroughs, key technologies were tackled, such as computer-aided RMS design and analysis technology, combined (temperature, humidity, low pressure, vibration) reliability test systems, electronic equipment component screening systems, integrated stress reliability test technology for mechanical and electrical products, reliability test profile design technology, embedded software reliability simulation test technology, and small-sample reliability evaluation technology, and a large number of combined three-factor (temperature, humidity, vibration) reliability test systems, fault analysis equipment and environmental test equipment were introduced. These achievements and technical methods have been widely promoted and applied in high-tech equipment, providing key technical guarantees for the successful development and stable operation of this advanced equipment.

1.1.3.2 Challenge Faced by the Hexability Technology

Because of its industrial foundation and design culture, China has always faced the challenge of implementing hexability design systematically and comprehensively. The problems can be summarized as follows: an imperfect organizational model, irregular work processes, inconsistent technical status, difficulty in accumulating design experience, insufficient synergy, a lack of means for system integration, a weak information foundation, and poor control of the overall status. These issues have not only technical causes but also management causes and deeper causes in design culture. The development of China’s hexability technology cannot simply copy the experience of the United States. It must develop its own theoretical and technical system based on China’s industrial foundation, management model, and cultural background, and take a path with Chinese characteristics. From a theoretical perspective, the effective way to ensure that RMS and performance design requirements are realized is to follow the idea of reliability system engineering: plan performance and RMS design activities as a whole, coordinate and standardize performance and RMS engineering methods, and control performance and RMS work processes in step. Unlike performance design requirements, however, RMS requirements cannot be used directly as design parameters in equipment development. RMS characteristics can be measured accurately only after products have been used in large quantities and over a long time. Therefore, it has always been a difficult problem in equipment development to carry out RMS design in a way that designers can easily understand and implement, so that RMS requirements are realized step by step. After more than half a century of development in RMS engineering technology, a variety of methods and technical means have emerged and have been tested in engineering practice.
These different RMS engineering methods do not exist in isolation but in extensive connections not only in between themselves but also between them and product function/performance design activities. These connections determine various further types of RMS design activities and their relationships with function/ performance design activities, which cannot be independent to each other but must be integrated and coordinated in accordance with certain rules. This book refers to this collaborative design process as function/performance and RMS integrated

1.1 Background of Reliability System Engineering

13

design. Owing to the particular nature of RMS engineering technology, throughout product development the realization of RMS characteristics is usually controlled through management regulations and documents, management and design reviews, and other qualitative means, rather than being achieved “naturally” in product design. There is a great risk of reworking the product development because of RMS issues. It is therefore urgent to develop a method that integrates RMS design into function/performance design to achieve “precise” control. This book holds that, to solve these problems systematically, improvements must be made in both technology and management, in the following three aspects:

(1) Achieve sharing of function/performance and RMS design information. There should be a unified source of information for performance and RMS design. Shared information sources should be unified and traceable. Changes in product technical status should be reflected promptly in the RMS design analysis, and the results of the RMS design analysis should be updated promptly as the technical status changes and should feed back into product design. At present, in the field of performance design, an accurate and unified model can be established and the sharing of design information can basically be realized. However, RMS design lacks a standardized and unified model, and RMS design and analysis are still at an initial stage: self-enclosed, one-way, and lacking control.

(2) Achieve an organic connection between function/performance and RMS design methods. RMS design and performance design address the same object and are therefore naturally connected. Many RMS design analysis methods are carried out on the basis of performance models. Most RMS design methods can be incorporated into the performance design process, and the conclusions of RMS design can directly drive the iteration of performance design; that is, an organic connection should be established between related engineering methods. However, because analysis purposes and modelling angles differ greatly, the connection between RMS and performance design analysis is often implicit and vague, and interoperability is very difficult to achieve.

(3) Achieve precise control of the function/performance and RMS design process. RMS design is an integral part of product development, and the process of implementing RMS design requirements should be integrated with the process of implementing performance requirements. However, technical interoperability between RMS engineering methods and performance engineering methods is lacking, and on the management side there is no unified control mechanism for the process of realizing RMS characteristics. Therefore, the RMS design process cannot be organically integrated into the performance design process, and the two run in parallel. The consequence is that RMS design becomes merely a step that has to be completed in the design process and can hardly have a substantial impact on the product design, which easily produces the so-called “two skins” phenomenon, in which the two kinds of design proceed separately.

The traditional engineering process relies mainly on management to coordinate and control performance design and RMS design. It requires
highly experienced designers and managers, and the process is difficult to control accurately and is prone to repetitive work, long work cycles, and high costs. At the same time, it is difficult to establish a unified digital integration platform that includes RMS design, so the various tools and means cannot be scientifically planned and effectively integrated to support the efficient and coordinated development of integrated performance and RMS design. Under these conditions, integrated performance and RMS design is out of the question.

1.1.3.3 Requirements for Leap-Forward Development of the Hexability Technology

There are two main obstacles to achieving the above improvements. One is that the way functional performance design is expressed differs greatly from the ways the various types of RMS design are expressed, which makes it difficult to share and transfer information automatically between performance and RMS design; the other is that performance and RMS design methods are difficult to connect and coordinate smoothly, so it is difficult to establish a scientific and reasonable unified process to control the entire course of design activities. For this reason, it is necessary to establish a unified hexability model that bridges performance design and the various types of RMS design, unifying performance and RMS technology and management processes. In addition, to achieve precision, automation, and intelligence, hexability work should be transformed from a document-driven workflow to a model-based workflow. This also requires a complete hexability design model system that is, on the one hand, consistently connected with the various models and, on the other hand, seamlessly connected with the product design process. However, the unified hexability model has not been systematically studied anywhere in the world, and model-based hexability design is still in the development stage. This provides an unprecedented opportunity to bring China to the forefront of the world in the field of reliability engineering. It also meets the needs of the rapid development of equipment research in China.
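The information-sharing requirement discussed in this section (one traceable source of design information, with RMS analyses refreshed whenever the technical status changes) can be illustrated with a small sketch. It is only an illustration under stated assumptions: the class names `UnifiedModel` and `RmsAnalysis`, the version counter, and the example parameter are hypothetical and do not come from the book or from any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class RmsAnalysis:
    """One RMS analysis result (for example an FMEA) and the design version it used."""
    name: str
    based_on_version: int


@dataclass
class UnifiedModel:
    """Hypothetical single source of design data shared by performance and RMS work."""
    version: int = 1
    parameters: dict = field(default_factory=dict)
    analyses: list = field(default_factory=list)

    def change_design(self, key, value):
        # Every change of technical status bumps the shared version number.
        self.parameters[key] = value
        self.version += 1

    def register_analysis(self, name):
        # Each analysis records the design version it was derived from (traceability).
        analysis = RmsAnalysis(name, self.version)
        self.analyses.append(analysis)
        return analysis

    def stale_analyses(self):
        # Analyses built on an older version must be redone before they inform design.
        return [a.name for a in self.analyses if a.based_on_version < self.version]


model = UnifiedModel()
model.change_design("pump_flow_rate_l_per_min", 40)   # version becomes 2
model.register_analysis("FMEA")                        # based on version 2
model.change_design("pump_flow_rate_l_per_min", 55)   # version becomes 3
model.register_analysis("maintainability prediction")  # based on version 3
print(model.stale_analyses())
```

In this sketch the FMEA is reported as stale after the second design change, which is the kind of automatic, model-based feedback that a document-driven workflow cannot easily provide.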

1.2 The Concept of Reliability System Engineering

1.2.1 System Engineering

1.2.1.1 Summary of Complex Engineering Systems

As Karl Marx said, “when numerous workers work together side by side, whether in one and the same process, or in different but connected processes, they are said to cooperate, or to work in cooperation”, “All combined labor on a large scale requires, more or less, a directing authority, in order to secure the harmonious working of the
individual activities, and to perform the general functions that have their origin in the action of the combined organism, as distinguished from the action of its separate organs”, “A single violin player is his own conductor; an orchestra requires a separate one”. With the continuous development of science and technology, modern engineering has become increasingly complex, and large engineering systems such as aviation, aerospace, and nuclear engineering have emerged. An engineering system here means a system that transforms demands into engineering products; it includes the value-oriented practice of integrating science, technology, and related elements to achieve specific goals in an organized manner. These emerging projects are large in scale, multilevel, and complex in structure. They contain a large number of interacting components in terms of technical methods, personnel organization, and project management, and they are characterized by complex internal correlation, uncertainty, and dynamics. As a result, the overall behavior of the system is strongly non-linear, and such a system is therefore called a complex engineering system. The research and development of these large-scale, complex systems faces many challenges, arising both from the system itself and from the engineering process. In essence, these challenges are about resolving the various complexity issues that exist in an engineering system. These complexity issues can be divided into three categories:

(1) Object complexity of the engineering system refers to the inherent complexity of the engineering product itself, such as the diversity of value elements, the huge number of components, the intensity of interaction coupling, and the complexity of the expected use environment.

(2) Subject complexity of the engineering system refers to the artificial complexity brought by the participants of the project, including the complexity of cognition and the complexity of behavior. The complexity of cognition is the source of the uncertainty of the engineering system, and the complexity of behavior may lead to various intentional, non-standard, naive, and even wrong engineering behaviors.

(3) Environmental complexity of the engineering system is reflected in the impact that the increasingly complex environments of the engineering system have on the system itself. The environment here is the sum of the resources that the engineering system may obtain and the constraints on it. Management factors affect the value, scientific, and technical elements of the engineering system. The environment can usually be divided into the scientific and technological environment, the cultural environment, the social environment, and the natural environment.

Prof. Xuesen Qian summarized the basic issues faced by complex engineering systems as: “How to gradually turn the relatively general initial development requirements into the specific tasks of thousands of participants in the development task” and
“How to finally integrate these tasks into a practical system that is technically reasonable, economically cost-effective, has a short development cycle, and can coordinate calculations, and makes this system an effective component of the larger system to which it belongs”.

1.2.1.2 Engineering Methodology for Complex Systems

As Bertalanffy said, “We are forced to use the concept of ‘whole’ or ‘system’ in all areas of knowledge to deal with complexity”. Modern engineering systems have become so complex that their complexity can be handled effectively only by consciously applying system concepts and principles. As system-based ideas and methods developed in the natural sciences, social sciences, engineering technology, and other fields, systems science, which takes systems and their mechanisms as its objects and studies the types, properties, and rules of systems, gradually formed and began to mature. According to Prof. Xuesen Qian, systems science can be divided into three levels: basic science, technical science, and engineering technology. Basic science studies the basic attributes and general rules of systems and is the basic theory of all systems research. At present, the basic science level is still being established and perfected. The technical science level includes informatics, cybernetics, operations research, affair theory, and other theories, which provide direct guidance for engineering technology. The engineering technology level is the knowledge that directly transforms the objective world, and its most typical representative is systems engineering. In the development of complex systems, basic and technical scientific research mainly plays a guiding role, while solving specific engineering issues requires the support of engineering technology. According to their different roles in the development of complex systems, the engineering technology level can be refined into three levels: the concept and methodology level, the engineering method level, and the enabling technology and supporting environment level. These three levels interact with each other and together provide support for complex systems.
The influence among them may run in either direction. For example, design concepts or methodology may produce new engineering methods, which in turn promote the development of the corresponding enabling technologies and supporting environments; conversely, the development of enabling technologies and supporting environments may change engineering methods or even produce new ones. At present, the most representative viewpoints in the engineering methodology of complex systems are the system engineering concept, the concurrent engineering concept, and the integration concept. Among them, systems engineering researchers were the first to treat the engineering object as a system. In the 1940s, the Bell Telephone Company of the United States first proposed the term “system engineering”. Meanwhile, operations research gradually matured during World War II and was applied to operation and management after the war, laying the foundation
for systems engineering. In 1957, the first book on “system engineering” was published, and in the early 1960s systems engineering gradually matured and officially became an independent discipline. The ideas and methods of systems engineering come from different industries, and its core is the scientific ideas and technologies for organizing and managing engineering activities in accordance with the principles and methods of system science. Concurrent engineering, as a systematic idea, was first proposed by the Defense Advanced Research Projects Agency (DARPA) in 1986. Later, in 1988, the Institute for Defense Analyses (IDA) of the United States released the famous R-338 report, which clearly put forward the idea of concurrent engineering and gave its most influential definition: “Concurrent engineering is a systematic working mode to provide parallel and integrated design for a product and its related processes (including manufacturing process and support process) [2]. This working mode strives to enable developers to consider all the elements of the product life cycle from the beginning, including quality, cost, schedule, and user needs”. The core idea of concurrent engineering is to organize product-centric, interdepartmental integrated product teams (IPT) for product development and to rationalize the product development process by improving and reorganizing it. In 1990, Prof. Xuesen Qian for the first time named the method for dealing with open complex giant systems the integration method. The integration methodology explicitly advocates combining qualitative and quantitative research, combining scientific theory with empirical knowledge, and combining multiple disciplines to conduct integrated research based on the system idea.
To unify macro and micro research on complex giant systems, the method must be supported by a large-scale computer system, which is required to provide not only functions such as information management and decision support but also integration functions. These three concepts were put forward by different advocates in different periods, so they focus on different aspects. As Gardiner pointed out, “Concurrent engineering and systems engineering focus on different aspects of the same object, and the two methods should be integrated to solve the issues”. Compared with the former two, the integration method focuses on complex large-scale systems and can, in a sense, be regarded as inheriting and developing both. All three methodologies follow the basic idea of system science, emphasizing the combination of reductive analysis and comprehensive thinking and ensuring that understanding of the whole rests on a detailed understanding of its parts, so as to overcome the problems of modern engineering founded on reductionism alone. In addition, although the three concepts have similar or overlapping parts, they are still evolving independently and have not yet formed a completely unified methodology.

1.2.1.3 Practical Applications of Engineering Methodology for Complex Systems

Complex system engineering concepts represented by systems engineering, concurrent engineering, and integration, together with the related methods and technologies, have been successfully applied in large-scale engineering systems with significant results. System engineering was first successfully applied to the “Apollo” moon landing program, a large-scale R&D project. During its implementation, hundreds of prime contractors and tens of thousands of companies and enterprises participated in the development work. The entire project involved more than 15 million parts and components, cost more than 20 billion US dollars, lasted 11 years, and finally achieved success. The idea of concurrent engineering and its theoretical methods were first applied in companies such as Boeing and Lockheed Martin. For example, Boeing adopted the new concept of “parallel product definition” and new project management methods in developing the new 767-X aircraft, thereby achieving a successful flight test within three years. The idea of “integration” proposed by Prof. Xuesen Qian was first successfully applied to the quantitative study of several complex weapon systems in China. In recent years, concepts such as concurrent engineering, systems engineering, and integration have been continuously applied to major engineering systems all over the world, and this application has in turn promoted the development, enrichment, and perfection of the relevant theories. Examples include China’s manned spaceflight project and the Joint Strike Fighter (JSF) project jointly developed by the United States, the United Kingdom, and other countries. In these projects, the boundaries between the various engineering methodological concepts are becoming more and more blurred, and their application is often a comprehensive manifestation of multiple concepts.
The ideas and methods of system engineering were used to organize and manage the overall process of the entire project and to overcome a series of difficulties and obstacles caused by the complexity and uncertainty of large-scale engineering systems; in addition, following the idea of concurrent engineering, the various elements of the product life cycle were considered in the early stage of product design to reduce cost. At the same time, the projects also embodied the idea of integration. The engineering methodology, engineering methods, and enabling technologies of complex systems are driven by the requirements of engineering system projects. Along with the successes and failures of engineering practice, new ideas, methods, and technologies are continuously emerging. With the increasing complexity of modern engineering systems, the solution of engineering problems will inevitably move towards the dialectical unity of “reduction theory” and “system theory”, that is, to solve the complexity of engineering systems through the viewpoint of “system theory”.

1.2.2 Definition of Reliability System Engineering

To solve the complexity problem of reliability engineering and the ‘indigestion problem’ of imported technologies, Zili Wang used the system engineering method to put forward the ‘Comprehensive Quality View’ in three dimensions (the whole system, the whole life, and all characteristics) together with the concept of reforming quality technology. Quality characteristics are divided into general quality characteristics and special quality characteristics, the former being defined in contrast to the latter. Special quality characteristics refer to size, weight, accuracy, etc., while general quality characteristics refer to reliability, maintainability, supportability, testability, safety, environmental adaptability, and electromagnetic compatibility. In this book, the general quality characteristics are referred to as hexability. One of the important challenges faced by complex engineering systems is how to ensure the long-term stable operation of the system, i.e. to maintain a high level of quality and reliability. This is a complex task. As shown in Fig. 1.4, the traditional engineering method relies on tests much more than on design. These tests are mainly based on trial and error, using test-analysis-and-fix (TAAF) to iteratively identify the hexability issues of the system, improve the system design, and finally improve the general quality of the system. The drawbacks of this approach are as follows:

(1) It usually takes a long time to expose the problems and demands a great deal of test labor, equipment, test resources, samples, etc.

(2) Many of the problems exposed during the test stage have to be traced back to the design stage, which lengthens the development cycle of the equipment.

(3) The test profile cannot fully simulate the real environment, and many potential problems cannot be fully exposed by testing alone, becoming hidden dangers in future practical applications.
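The TAAF loop and its drawbacks can be illustrated with a toy model. This is a sketch under stated assumptions, not a reliability growth model from the book: the function name `taaf`, the defect counts, and the fixed fraction of defects found per test cycle are all hypothetical and chosen only to make the mechanism visible.

```python
def taaf(defects_in_profile, defects_outside_profile, find_fraction, cycles):
    """Run test-analysis-and-fix cycles.

    Returns (defects still latent inside the test profile,
             defects the test profile can never expose,
             number of defects found in each cycle).
    """
    remaining = defects_in_profile
    history = []
    for _ in range(cycles):
        found = int(remaining * find_fraction)  # defects exposed by this test round
        remaining -= found                      # analysed, traced to design, and fixed
        history.append(found)
    # Defects outside the test profile are untouched, whatever the number of cycles.
    return remaining, defects_outside_profile, history


# Hypothetical numbers: 40 defects the test can expose, 10 it cannot.
left, hidden, per_cycle = taaf(40, 10, find_fraction=0.5, cycles=4)
print(per_cycle)   # defects found per cycle shrink, so each cycle pays off less
print(left, hidden)
```

With these assumed inputs, each cycle finds fewer defects than the one before (drawbacks 1 and 2), and the defects outside the test profile remain no matter how many cycles are run (drawback 3).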

Fig. 1.4 The process of establishment of hexability of traditional equipment (figure labels: design of universal quality characteristics; verification of universal quality characteristics)

Academician Prof. Sili Liang, an aerospace reliability engineering expert, proposed in the 1960s that “quality and reliability are designed, produced, and managed, rather than verified, tested, and statistically analyzed. To improve reliability, we must solve every engineering technical problem in the entire development process; it is necessary to establish the relevant principles of the total quality theory for very small samples with Chinese features”. Prof. Xuesen Qian also put forward the same statement that “reliability is designed, produced and managed”. Scientists of the older generation unanimously understood and practiced reliability engineering from a system-based perspective. China’s industrial foundation, management model, and design culture are quite different from those of western countries. During his research on reliability theories and engineering practice, Professor Weimin Yang, the founder of national defense reliability engineering and education, was keenly aware that it was not feasible to copy foreign experience completely and that China had to establish its own reliability engineering theory. After years of intensive research and engineering practice, Professor Weimin Yang first proposed the concept of Reliability System Engineering (RSE) in a paper presented at the first ICRMS Annual Conference in 1994, Reliability System Engineering: Theory and Practice [8]. He then systematically explained the concept and connotation of RSE in his book ‘General Theory of Maintainability and Supportability’, published in 1995. Reliability system engineering is an engineering technology that analyses the whole life process of a product and fights against faults.
Starting from the integrity of the product and its dialectical relationship with the external environment, it uses methods such as experimental research, field investigation, and fault or maintenance activity analysis to study the relationship between product life, reliability, and the external environment, and to study the rules of fault occurrence, development, prevention, and maintenance until the elimination of products, together with the series of technical and management activities that improve reliability, extend lifetime, and improve efficiency. The overall goal is to improve the product’s combat readiness and mission success and to reduce maintenance manpower and support costs. The essence of RSE is to study the rules of fault occurrence and development, repair, and the prevention or elimination of future faults. Reliability system engineering does not mean the engineering of a “reliability system”; rather, it means using system-based ideas to solve reliability engineering problems. Reliability here is a generalized concept that includes hexability. In other words, reliability system engineering is an engineering technology that studies the rules of “preventing and curing diseases” for products. This is similar to the medical system engineering that studies human life processes and fights against diseases, as shown in Fig. 1.5. The inherent characteristics of human beings can be expressed by C (Capability). But if someone wants to be a useful person, he must not only have the ability but also be healthy and able to work when required (Availability, A), and while working he must be able to accomplish the task (Dependability, D). Likewise, a product should be able to start working normally whenever it is required to work (Availability, A); it should be able to work and perform its specified functions at any time throughout the mission profile (Dependability, D); and it should be able to complete the specified tasks (Capability, C). Integrating A, D, and C gives the effectiveness of the product.

Fig. 1.5 Medical system engineering versus reliability system engineering (figure labels: the efficiency of human beings and the effectiveness of the product, each decomposed into availability A, dependability D, and capability C; on the human side, healthiness with health level and life level, capacity to heal with healing and diagnostic capacity levels, and working capacity with physical and brain power, linked to disease prevention, disease occurrence and development, diagnosis and treatment, and health care; on the product side, reliability level and life level, maintainability level and testability level, and performance level, linked to maintenance, fault occurrence and development, diagnosis and repair, and maintenance and support; both sides act within the external environment)
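The statement that A, D, and C are integrated into product effectiveness can be made concrete with a small numerical sketch. The book does not give a formula at this point; the sketch assumes the widely used WSEIAC formulation E = A · D · C, in which A is a row vector of state probabilities at mission start, D is a matrix of state transition probabilities during the mission, and C is a vector of the capability delivered in each end state. The two-state system and all numbers below are hypothetical.

```python
def effectiveness(a, d, c):
    """Return E = A . D . C for state vectors a, c and transition matrix d."""
    n = len(a)
    # Probability of ending the mission in each state: row vector A times matrix D.
    end_state = [sum(a[i] * d[i][j] for i in range(n)) for j in range(n)]
    # Weight each end state by the capability delivered in that state.
    return sum(end_state[j] * c[j] for j in range(n))


# Hypothetical two-state product: state 0 = working, state 1 = failed.
A = [0.9, 0.1]            # availability: 90 % chance of being ready at mission start
D = [[0.95, 0.05],        # dependability: a working product stays working with 0.95
     [0.00, 1.00]]        # a failed product is assumed not to be repaired in mission
C = [0.8, 0.0]            # capability: a working product completes the task with 0.8

E = effectiveness(A, D, C)
print(round(E, 3))
```

Under these assumed values the effectiveness is roughly 0.68, noticeably lower than any single factor, which shows why capability alone does not describe how useful a product is.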

1.2.3 Philosophical Connotation of Reliability System Engineering

The salient feature of Reliability Systems Engineering is its practical philosophy, a theoretical innovation made by Professor Weimin Yang on the basis of a profound understanding of the issues and rules of hexability engineering, combined with the essence of traditional Chinese culture and modern system science. This philosophical idea is based on engineering science, physics, human theory, and affair theory, and is realized through a complete set of decision-making methodology, modelling methodology, simulation technology, optimization technology, and information technology. Its connotation can be summarized as follows.

1.2.3.1 Overall View

The overall view is a quintessence of traditional Chinese culture, and the game of Go is a typical example. Robertson and Munro proved in 1978 that Go is a PSPACE-hard problem. At present, it is estimated that the number of calculations required to win a Go game is more than 10^600, which has
exceeded the number of atoms in the universe (10^75). Therefore, a Go problem cannot be solved by exhaustive search; it must be combined with human thinking and intuition. On March 15, 2016, the artificial intelligence program “AlphaGo” developed by Google defeated the Korean Go grandmaster Se-Dol Lee with a total score of 4:1. AlphaGo used deep learning technology to learn and apply the strategies of Go masters. Therefore, AlphaGo does not show that computers have defeated humans; it shows the level of development that humans have achieved in computing technology. The strategy of Go is different from simple exhaustive calculation. The Go master Mr. Qingyuan Wu once said, “The goal of the Go game is not limited to the struggle for sides and corners, but to maintain the balance of the whole status by looking at it from a high level”, and “Every move in the Go game must consider the overall balance”. One of the ten key points of Go, “the move must have certain conditions and timing”, means that when playing Go you must have an overall concept, always put the overall situation first, and make local moves in response to the overall situation. With a clear understanding of the relationship between the local and the overall, the position chosen by the player must match the surrounding situation. The “corresponding conditions and timing” include gaining support from the surroundings, using the overall power to attack the opponent, and coordinating with the overall situation to expand territory. Integrating these strategies greatly reduces the amount of calculation required, allowing a single person to control the ultra-complex “solution set” of Go through “calculation”. Reliability system engineering uses this overall view to deal with the issues of hexability design. Modern complex equipment is made up of an astonishing number of parts.
For example, a Boeing 747 has up to 6 million parts. If the faults of each part and its assemblies were analyzed according to the traditional reliability engineering method, the number of possible issues and the set of solutions would be astronomical, on a scale not typical of classical engineering. As shown in Fig. 1.6, reliability system engineering is like a Go position. In product development, the key points to analyze are the complex relationships among product, function, performance, use, fault, testing, maintenance, and support. It is then necessary to start from the overall situation of user requirements, grasp the main design contradictions, plan ahead, anticipate possible design constraints and design contradictions, and provide design solutions in advance, so as to avoid issues that would delay development and increase development cost late in the development period.
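The scale of the part-by-part approach can be checked with simple counting. The sketch below is only an illustration: the figure of 6 million parts comes from the text, while restricting the count to pairwise and three-way combinations of parts is an assumption made here to show how quickly the numbers grow.

```python
import math

parts = 6_000_000                    # part count quoted in the text for a Boeing 747

# Number of distinct pairs and triples of parts that could, in principle, interact.
pairs = math.comb(parts, 2)
triples = math.comb(parts, 3)

print(f"pairs:   {pairs:.2e}")
print(f"triples: {triples:.2e}")
```

Even before considering multiple fault modes per part, pairwise combinations alone are on the order of 10^13, which is why an overall view that concentrates on the main design contradictions is needed instead of exhaustive enumeration.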

1.2.3.2 Harmony

“Yin and Yang are the principles of heaven and earth, the principles of all things, and the source of change”. In ancient Chinese civilization, Yin and Yang are the basic factors behind the rules of nature that drive their development and change. This way of thinking also interprets the connotation of reliability system engineering well. Products and faults are two contradictory aspects. According to the second law of thermodynamics, products tend to degrade into faults. However, the characteristics of the products and human intervention can delay or reverse this trend and reduce the risk of impact from humans and the environment. The special quality characteristics of products can be divided into function and performance. The general quality characteristics related to faults mainly comprise hexability. As shown in Fig. 1.7, these eight elements constitute “the Eight Diagrams” of the product design characteristics. The essence of the product design process is to deal with the associations and conflicts between design features and to achieve predetermined design goals. By integrating this kind of harmonious thinking into the concepts and methods of modern science, hexability can be comprehensively considered and optimized according to quantified goals (combat readiness rate, mission success rate, life cycle cost, etc.).

Fig. 1.6 Overview of the engineering of reliability systems (figure labels: development cycle, support, usage, performance, test, maintenance, fault, product, function, safety, product hierarchy)

Fig. 1.7 Harmony between special quality characteristics (function/performance) and general quality characteristics (hexability) (figure labels: performance, support, maintenance, fault, product, environment, function)


Harmony and unity of functional performance and hexability should be considered in the initial stage of product development and be carried out throughout the product development process. The harmony of reliability system engineering is also reflected in the harmony and unity of technology and management, development process and development methods, products, work, and human beings.

1.3 Formation of the Theoretical and Technological Framework of Reliability System Engineering

RSE is a reliability, maintainability and supportability design theory oriented to national conditions and independent development. It is derived from engineering practice, responds to engineering requirements, and develops continuously in an upward spiral. In 2005, Prof. Rui Kang and Prof. Zili Wang further elaborated the connotation and technical framework of reliability system engineering [9] at three levels: basic theory, basic technology, and integrated technology. The basic theory of reliability system engineering refers mainly to understanding the rules of faults, including the rules of fault occurrence and fault behavior. The basic technology of reliability system engineering comprises fault prevention technology, fault control technology, and fault remedy technology, all of which are developed from the basic theory. The integrated technology of reliability system engineering refers to the application technology formed from the basic theory and basic technology that can be applied to product requirements analysis, design and analysis, test and evaluation, production assurance, and finally application and support. Together, these technologies form the application capability of reliability system engineering. The theoretical and technical framework of reliability system engineering is shown in Fig. 1.8.

In 2007, Reliability System Engineering was officially defined in China as: “Using system engineering theories and methods, and taking faults as the core, it studies the rules of fault occurrence during the whole life cycle of complex systems and the comprehensive engineering technology of fault prevention, fault control, and fault remedy. It starts from the integrity of the system and its dialectical relationship with the external environment; is based on mathematics, physics, science, statistics, operations research, science of affairs, cybernetics, system theory, and information theory; makes comprehensive use of mechanics, materials science, electronics, management, computer science and other professional technologies; and, combining technology with management, qualitative with quantitative methods, and simulation with testing, researches the modes, mechanisms, and rules of system faults, as well as the theories, technologies, and methods of preventing, controlling, and remedying system faults. It also comprises reliability engineering, maintainability engineering, test engineering, maintenance support engineering, human factors engineering, etc. It takes high efficiency and low cost (that is, the best efficiency-cost ratio) as the target of

Fig. 1.8 The theoretical and technical framework of reliability system engineering. Basic theory (recognize the law of fault): causes and mechanisms of fault, with internal causes (materials, structure, technology) and external causes (pattern of utilization, environmental conditions, human factor); regularity of fault behavior (randomness, certainty, ambiguity/roughness). Basic technology (comprehensive prevention and control of faults): fault prevention technology, fault control technology, fault fixing technology. Integrated technology (ability to form technology): comprehensive demonstration technology, design and analysis technology, test and evaluation technology, production assurance technology, apply and safeguard technology.

trade-off and emphasizes the integrity and optimality of the system, so that the system has fewer faults, long life, easy maintenance, good support and low cost, and realizes high availability and high reliability of the system”.

Reliability system engineering is regarded as an enabling technology that aims to eliminate system obstacles in terms of coordination, compatibility, applicability, stability, continuity, repeatability, usability, etc., and is the “multiplier” for relevant performance and functional technologies. As a new engineering technology, reliability system engineering has been widely used in the development, production, and utilization of both military and civilian equipment and products, and has helped achieve huge benefits.

In 2015, the first International Conference of Reliability System Engineering (ICRSE) was held in Beijing. Prof. Zili Wang first claimed that the core of reliability system engineering is the integrated design of the six characteristics (reliability, maintainability, supportability, testability, safety, and environmental adaptability), and coined the new word “Hexability” for the full set of characteristics of reliability system engineering.


1.3.1 Fundamental Theory of Reliability System Engineering

Reliability system engineering is carried out with faults as the core, and its foundation is understanding the rules of faults and grasping the characteristics of faults. As shown in Fig. 1.9, the fundamental theory of reliability system engineering includes unit fault rules and system fault rules, and can identify faults systematically based on these rules. The two basic behavior states of the product are normal operation and fault. When the product is designed, produced, and put into use, it can be said that we have basically mastered how the product operates and realizes its functions, that is, the rule of normal operation. But it is often difficult to say clearly how the product will fail and lose its function, that is, the rule of faults. Only by revealing the causes and mechanisms of product faults and understanding the behavioral rules of faults can these rules be used to prevent, control, and remedy faults. If the fault of the product is compared with the human disease, the fault mode is equivalent to the disease, and the fault cause and mechanism are equivalent to the pathology. It is difficult to prescribe the right medicine if you only know the symptoms but not the pathology. In the same way, it is impossible to prevent, diagnose, predict, treat, and remedy the fault without an understanding of the cause and mechanism of the fault and its behavior.

If the researched product is analyzed as a whole, it can be regarded as a unit. The unit fault is the result of the interaction of internal and external factors (see Fig. 1.9). The internal factors mainly include the materials used in the unit, the structure and processing technology of the unit etc. External factors mainly include the use mode of the product, the environmental conditions experienced during the use of the unit,

Fig. 1.9 Fundamental theory of reliability system engineering (see the end of the book for the color picture)


and human factors. Investigating the mechanism and behavior of faults caused by the coupling of internal and external factors is the primary issue in the basic theory of reliability system engineering. Because product faults are studied from different perspectives, they are also characterized in different ways. So far, domestic and foreign researchers have studied and understood the rules of fault behavior mainly from three aspects. The first view holds that faults are random and can be described by probability models, the so-called reliability mathematics, which gave birth to reliability mathematics-based engineering technology. The second holds that faults are deterministic and can be described by physical models, namely reliability physics, which gave birth to reliability engineering technology based on reliability physics. The third, advanced by some researchers in recent years, holds that faults are vague and rough and can be described by fuzzy models and rough sets, respectively.

System functions are realized through the organic combination of units, and system faults are likewise caused by unit faults. How to establish a connection between unit faults and system faults is the core of the study of system fault rules. The evolution of unit faults into system faults can be attributed to the propagation of unit faults in time, space and functional logic, as well as to the accumulation, coupling, or logical combination of multi-unit faults. This requires comprehensive modelling and analysis from multiple perspectives, such as physical rules and mathematical logic.

The fundamental theory of reliability system engineering requires a comprehensive consideration of the deterministic physical rules of unit faults, the statistical rules of randomness, ambiguity, and roughness, and the system rules of unit fault evolution, propagation, and combination, in order to understand in depth and reveal the fault rules of both units and systems.
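The link between unit faults and system faults described above can be illustrated with the simplest probabilistic case. The following is a minimal sketch, not the authors' method: it assumes independent units with constant failure rates (an exponential model) and combines them with series and parallel logic. The function names, failure rates, and the pump/controller example are hypothetical.

```python
import math


def unit_reliability(failure_rate: float, t: float) -> float:
    """Survival probability of one unit at time t, assuming a constant failure rate."""
    return math.exp(-failure_rate * t)


def series(reliabilities):
    """Series logic: the system works only if every unit works."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel(reliabilities):
    """Parallel (redundant) logic: the system fails only if every unit fails."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= (1.0 - r)
    return 1.0 - all_fail


# Hypothetical system: two redundant pumps feeding a single controller.
t = 1000.0                               # mission time in hours (assumed)
pump = unit_reliability(1e-4, t)         # assumed failure rate per hour
controller = unit_reliability(2e-5, t)   # assumed failure rate per hour
system = series([parallel([pump, pump]), controller])
print(f"pump={pump:.4f} controller={controller:.4f} system={system:.4f}")
```

Under these assumptions the redundant pump pair is more reliable than a single pump, while the series controller caps the reliability of the whole system. Deterministic (physics-of-failure) or fuzzy descriptions of unit faults would need different unit models, but the same combination logic could still apply.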

1.3.2 Technology Framework for Reliability System Engineering

On the basis of understanding the rules of faults and applying them, a number of related technologies can be developed, such as fault prevention design technology, fault diagnosis and prediction technology, and fault remedy technology, which together constitute the technical framework of reliability system engineering.

(1) Fault prevention design technology. Fault prevention design technology is an important part of the fundamental technology of reliability system engineering. It is mainly used in the design and production stages of products, where design improvements are made to keep faults from occurring within a specified time and under specified conditions. Currently available methods such as design margin, derating analysis, environmental resistance design, statistical process control, and process reliability improvement design are all technologies to prevent faults. Due to the complexity of the fault cause and its mechanism, it is difficult to


completely eliminate the fault, but the risk of fault can be reduced through this technology.

(2) Fault diagnosis and prediction technology. For faults that cannot be eliminated by design, fault diagnosis and prediction technology is introduced during the product design stage: sensors are added to collect characteristic signals of the product, so that the status of the product can be perceived, the occurrence and progression of faults predicted, and the occurrence and location of faults diagnosed. Fault perception and prediction can be based on the deterministic, random, or ambiguous rules of the failure mechanism. However, no matter which method is used, accurately predicting the fault trend of each product under the combined effect of internal and external factors is the most challenging task in reliability system engineering technology.

(3) Fault remedy technology. Fault remedy means that, after a fault is correctly diagnosed or predicted, the existing or potential faults of the product can be remedied, or its functions restored, in a timely and effective manner. To remedy existing and potential faults, a fault-tolerant design should first be carried out so that the product can quickly switch to a backup mode when a fault occurs. Maintainability and supportability designs should then be carried out to fix product faults. These designs need to consider the technologies and procedures, as well as the procurement and supply of spare parts, tools, equipment, and labor, required to remedy product faults.

As shown in Fig. 1.10, by taking fault prevention, detection and remedy as the core, Beihang University has built a fault prevention and control technology system adequate for multi-level objects, all-stage processes, and multidisciplinary methods.

Fig. 1.10 Fault prevention and control technology system and fault prevention and control technology type spectrum (labels in the figure: Failure P&C; P, Prevention: design, analysis, test, evaluate; D, Detection: diagnosis, prognostic; R, Remedy: maintenance, support)
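The idea of predicting a fault trend from sensed characteristic signals can be sketched with a very simple deterministic model. The example below is only an illustration under strong assumptions: the monitored signal (for instance a vibration amplitude) is assumed to drift linearly toward a known failure threshold, and the remaining useful life is obtained by extrapolating a least-squares line. The function names, data, and threshold are hypothetical, and real prognostic methods are considerably more elaborate.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def remaining_useful_life(times, values, threshold):
    """Extrapolate the fitted trend to the failure threshold.

    Returns the time left after the last sample, or None when the signal
    shows no upward trend and therefore never reaches the threshold.
    """
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    time_at_threshold = (threshold - intercept) / slope
    return max(0.0, time_at_threshold - times[-1])


# Hypothetical degradation record: operating hours and a monitored amplitude.
hours = [0.0, 100.0, 200.0, 300.0]
amplitude = [1.0, 1.4, 2.1, 2.5]
print(remaining_useful_life(hours, amplitude, threshold=5.0))
```

A random or fuzzy treatment of the same signal would replace the straight line with a probabilistic or membership-based model, which is where the difficulty noted in item (2) arises.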


1.3.3 Application Mode of Reliability System Engineering

In the process of equipment development, the Institute of Reliability Engineering, Beihang University, proposed a reliability system engineering application model that includes eight elements: organization type, professional group, index requirements, workflow, specification guideline, data information, process monitoring, and evaluation of implementation capacity. Only by combining these elements with advanced RMS technology can the reliability system engineering activities for equipment models be implemented.

1.3.3.1 Determination of the Type of Organization

There are three common organization types in RMS: quality management, project management, and matrix management, whose advantages and disadvantages are listed in Table 1.1.

1.3.3.2 Establishment of the Professional Group

Most RMS engineering activities are carried out by a certain number of qualified technical and management engineers. This requires the institute to establish professional RMS technical and management positions and to staff them with engineers who meet the job requirements, to ensure the implementation of the various tasks. These positions should cover the various professional scopes of RMS, and the responsibilities of each professional technical or management position should be specified in system documents, to achieve a clear division of labor and responsibilities. Meanwhile, the management system should evaluate the RMS engineers clearly, in a qualitative or quantitative way, through an evaluation index system, and offer the necessary rewards and punishments based on the evaluation results. Since the RMS profession is still emerging in the engineering community in China, staff training is particularly important. Only by setting up a targeted, hierarchical training system, supervising the training institutions and their trainers as necessary, and ensuring the effectiveness and authority of the training can it be ensured that RMS technology is implemented in the development of equipment.

1.3.3.3 Determination of the Index Requirement

The RMS requirements for current equipment can be summarized into three categories, i.e. qualitative requirements, quantitative requirements, and work item requirements, which can be specifically represented by long life, high reliability, fast diagnosis, easy maintenance, good support and safety. For different equipment


Table 1.1 Comparison among the 3 types of organizations

| Organization type | Manner of working | Advantages | Shortcomings |
|---|---|---|---|
| Quality management | The quality management department sets up one or more RMS professional groups to carry out the RMS technology and management of all projects | (1) Unification of technology and management (2) Concentrate resources and reduce repetitive work | (1) The responsibilities of technology and management are ambiguous, which may make it difficult for RMS technical work to be integrated into the product development process (2) When the number of projects increases or the complexity increases, the RMS technical workload doubles, and the professional team may be difficult to perform |
| Project management | Each project has its own RMS technical and management resources, and its RMS engineering activities are carried out independently | It can quickly respond to customer needs and can significantly improve work efficiency in the short term | Due to the lack of technical and management resources, it will lead to the competition of RMS resources within the equipment development unit |
| Matrix management | It is a combination of the above two forms, with a dedicated agency for RMS engineering activities, and its in-house RMS professionals are responsible for different projects | RMS has become an independent engineering major and can share the experience of different projects, which is conducive to the great improvement of internal professional capabilities | The management and coordination of this mode are more difficult and require professional and technical means as support |

types and product levels, the indices of the above six aspects take different forms and are related to each other, resulting in a three-dimensional indicator system. In the demonstration and development of any type of equipment, this indicator system must be clarified to guide the orderly development of the various RMS engineering activities.

1.3.3.4 Arrangement of the Workflow

In a correct workflow, the work responsibilities of all parties involved in the model development are clearly divided according to the internal logical relationship of the RMS profession and the management level of equipment development. The workflow


manager clarifies the responsible body (that is, who in which department), the timing (when), the work items (what to do, what tools to use), and the inputs and outputs of the project (basic data, process data, results data) in the reliability system engineering activities. The workflow is regarded as the core of, and the key to, successful integrated RMS work.

1.3.3.5 Formulation of the Specification Guideline

For each RMS engineering activity, corresponding specifications and guidelines should be formulated according to the characteristics of the model, to guide and constrain the engineers who carry out the activities. Currently, many RMS technologies have no relevant standards, or the existing standards do not cover the actual requirements of the equipment. Technical and management guidelines therefore need to be matched to the equipment requirements to form its RMS specification system.

1.3.3.6 Data Collection

Data is the basis for the effective operation of the RMS integration platform. To integrate RMS, it is necessary to establish a mechanism for the collection, analysis, processing, and feedback of RMS data, and to manage it centrally, so that valuable external data and information are imported into the RMS integration platform.

1.3.3.7 Process Monitoring

Many organizations are responsible for equipment RMS engineering activities, and the management chain is long. It is therefore necessary to establish a continuous process monitoring system that designates the organizations responsible for monitoring and refines the process monitoring nodes. The monitoring focuses on whether the RMS engineering activities are carried out according to the workflow, whether the technology and methods adopted are reasonable, whether the interface relationships between the projects are correct, and whether the results of the RMS engineering activities meet the requirements. Qualitative inspection methods are adopted for the completion of the various RMS work items and for qualitative index requirements, and quantitative tracking methods are adopted for the various RMS quantitative index requirements.

1.3.3.8 Evaluation of the Implementation Capacity

Reliability system engineering capability reflects the comprehensive ability of an institute to use mature RMS technology to achieve the RMS level of its equipment. It depends not only on the existing RMS technology level, but also on the institute's understanding, application, and implementation of RMS technology. Reliability system engineering capability is a comprehensive reflection of the RMS management level of the R&D department, and is closely related to the seven elements mentioned above and to the degree of integrated application of RMS technology. Companies can identify their weaknesses in RMS project management through external evaluation or internal self-evaluation and so determine where to direct their efforts. Currently, the reliability system engineering capability assessment method has been incorporated into the enterprise standards of the Chinese aircraft manufacturing industry.

1.4 Modelling Trend of Reliability System Engineering

1.4.1 Emergence and Development of Model-Based System Engineering (MBSE)

With the development of technology, more and more elements such as personnel, technology, hardware, software, processes, and enterprises participate in the product system, and the application and support process has become more and more complicated, giving birth to a new generation of system engineering methods, Model-Based Systems Engineering (MBSE). The International Council on Systems Engineering (INCOSE) first proposed MBSE [10] in “Systems Engineering Vision 2020” and officially defined MBSE as “a formal method of applying modelling methods to support requirement definition, design, and analysis, review, and validation of a system, starting from the conceptual stage and running through the entire product development process”. Its main purpose is to overcome the limitations of document-based systems engineering (DBSE) methods, such as the difficulties in requirements tracing, status consistency, and information reuse. As shown in Fig. 1.11, this is a transformation from document-centric methods to model-centric methods.

By improving the traceability of requirements, the MBSE method strengthens communication and coordination among multiple users, improves knowledge extraction as well as design accuracy and integrity, and thereby facilitates information reuse, strengthens the system engineering process, and reduces development risks. The method designs the system architecture according to the functions defined by its requirements. In the initial stage of design, the designer can analyze and optimize the overall design plan of the product based on the system model and set the performance indices of each subsystem. In the subsystem development stage, the subsystem model is further refined. On the one

Fig. 1.11 DBSE versus MBSE (past: document-centric; future: model-centric)

hand, one can check whether the performance of the subsystem meets the performance index requirements defined in the system design stage. On the other hand, the refined subsystem model can replace the subsystem functional model in the whole system model, so that the subsystems can be optimized within the overall system design.

With the rapid development of computer technology, the model-centric system design method shows its advantages in consistency and collaboration. The model-based approach has been the development trend of system engineering over the last 10 years, and model-driven architecture (MDA) design has greatly promoted the popularization and application of MBSE. The system model is the main result of MBSE and is made up of the key elements of the system, including requirements, structure, behavior, and parameters. Designers can use suitable modelling languages such as SysML, OPDs/OPL, etc. to build system models. In recent years, modelling tools such as Artisan Studio, OPCAT, and CORE have been introduced, all of which provide a unified model library that makes it convenient for designers to model with unified model elements [11]. These model libraries provide all system-related information, including user needs, decision analysis, environmental impact analysis, socioeconomic impact analysis, etc.

Modelling with the MBSE method is conducive to system integration in the development process. This holds especially for the cross-regional and cross-departmental design mode of modern large-scale complex products, which involves a great deal of system integration work and requires designers to have a unified, applicable and accurate communication mechanism [12, 13]. Since the designs from different departments sit under a unified architecture, the integration of different subsystem models becomes easier, which makes virtual prototype verification possible. MBSE can also be used to evaluate design quality, development progress, risk, etc. Design quality can be evaluated by analyzing the satisfaction of requirements and monitoring key design parameters such as reliability. Development progress can be evaluated by analyzing the number of use cases,


the percentage of components and parts, and the completeness of interfaces and attribute descriptions. Risk can be evaluated through the COSYSMO model.

So far, there is no uniform standard for the MBSE method; such a standard is expected to be created over the next 10 years. However, after many years of research, three relevant specifications have been published internationally and have gained widespread attention:

(1) Terminology and system engineering conceptual design model proposed by Olive et al.: It provides a series of definitions and graphical models required for system engineering conceptual design. Its purpose is to standardize terminology and unify definitions, and it can be used as a support for the MBSE method.

(2) Information model of system design proposed by Baker et al.: It helps designers understand the MBSE method from the perspective of information and its relationships. The information covered by this specification has 4 main parts: model, requirements, components, and design plans. The designer determines the components of the system through the requirements. The requirements can be further decomposed into sub-requirements, the components can also be further decomposed, and the design plan needs to meet the requirements.

(3) The mathematical model of system engineering and MBSE proposed by Wymore: It gives the mathematical model framework of the large-scale complex system development process, which is the famous Wymorian theory. Wymore believes that everyone has their own understanding of “systems”, so he is committed to applying set theory and system models to establish widely applicable mathematical models that define “systems”.

The MBSE method is not tied to a particular product development model; it can be applied with the V-model, the waterfall model and the spiral model alike. This also enhances the generalizability of the MBSE method.
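The information model in item (2), with decomposable requirements, decomposable components, and a design that must meet the requirements, can be made concrete with a small data structure. The following is a hedged sketch rather than the specification itself: the class names, the `satisfies` link, and the excavator-style example data are all hypothetical, chosen only to show how requirement traceability might be checked mechanically.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Requirement:
    name: str
    children: List["Requirement"] = field(default_factory=list)

    def leaves(self) -> List["Requirement"]:
        """Return the lowest-level sub-requirements under this requirement."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


@dataclass
class Component:
    name: str
    satisfies: List[str] = field(default_factory=list)  # names of requirements met
    children: List["Component"] = field(default_factory=list)

    def covered(self) -> Set[str]:
        """Collect every requirement name met by this component or its parts."""
        names = set(self.satisfies)
        for child in self.children:
            names |= child.covered()
        return names


def uncovered_requirements(root_req: Requirement, root_comp: Component) -> List[str]:
    """List leaf requirements that no component in the design claims to satisfy."""
    covered = root_comp.covered()
    return [r.name for r in root_req.leaves() if r.name not in covered]


# Hypothetical design plan with one traceability gap.
requirements = Requirement("dig trench", [
    Requirement("lift bucket load"),
    Requirement("report hydraulic faults"),
])
design = Component("excavator", children=[
    Component("hydraulic arm", satisfies=["lift bucket load"]),
    Component("cab"),
])
print(uncovered_requirements(requirements, design))
```

Keeping the links inside the model, instead of in separate documents, is what lets a check like this run automatically whenever the requirements or the component breakdown change.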
In recent years, there have been more and more applications of the MBSE method in typical fields such as aviation, aerospace, automobile, shipbuilding, manufacturing, and management. Under the guidance of Dr. Lefei Li from the Department of Industrial Engineering of Tsinghua University, Dr. Xinguo Zhang from Aviation Industry Corporation of China, Ltd. (AVIC) translated the book “System Engineering Manual” published by INCOSE and promoted it among the original equipment manufacturers (OEMs) of AVIC. At present, some domestic OEMs and research institutions such as the Civil Aircraft Research Center and Chengdu Aircraft Industry (Group) Co., Ltd. have gradually begun to promote the application of the MBSE method. However, many technical and management issues remain to be solved before MBSE can be applied in depth and comprehensively in engineering. According to INCOSE’s 2025 vision, the MBSE method will continue to play an important role in system engineering. This is an irreversible trend in the development of global systems engineering technology. The MBSE method will be naturally integrated into software, hardware, society, economy, environmental engineering, and other fields to create a certain system environment and work environment [10].


1.4.2 Trend of Modelling, Virtualization, and Integration of the Hexability Design

The development of information technology has provided technical support for the implementation of concurrent engineering. Since the 1990s, digitalization has become a supporting technology platform for product development in developed countries. It has a profound impact on product design, test, production, and management. Digitization and integration technology run through the product life cycle, implement product information integration and process integration, promote the coordination, flexibility, agility and intelligence of product development, production and management, and greatly shorten the product development and production cycle. For example, the US F-35 fighter jet adopts digital technology and has established a seamlessly linked and closely coordinated digital collaborative environment in which 50 companies from 30 countries around the world participate in research and development. On this basis, the design and manufacture of an aircraft with 3 variants for 4 military branches was quickly realized, the development cycle was reduced by 50%, and the manufacturing cost was cut by 50%. Typical digital development platforms are PDM (Product Data Management) products such as 3DEXPERIENCE developed by Dassault Systèmes, Teamcenter developed by UGS, and Windchill developed by PTC, which represent the highest level of current PDM technology. PDM products have also been widely used in fields such as aviation, aerospace, shipbuilding, and the weapons industry in China.

From the 1990s to the present, the technical complexity of US military information products has further increased, the product application and support process has become more complex, and the timeliness and economics of development have become new issues. The US Army has implemented concurrent engineering and adopted advanced CAD/CAM/CAE technologies to further improve the efficiency and effectiveness of the RMS work of products. “Integration” has become the defining characteristic of this new era of RMS technology development for US military products, encompassing model-based design, simulation-based methods, integration-based tools, etc.

1.4.2.1 Model-Based Design

Model-based design means that, in the product design process, the different technical states of a product are expressed through various product models and through the associations and evolution between those models. A product model is a set of attributes that comprehensively reflects product characteristics at a specific moment in the product design process. The set of product models includes at least the requirement model, function model, physical model, fault model, maintainability model, testability model, comprehensive support model, etc. Details of the model-based design include:


(1) Model-based design process. Researchers at Georgia Institute of Technology (GIT) established an MBSE-based system analysis process. Taking an excavator as an example, they established a system model, a system operation scenario model, and a factory manufacturing model for production. The modelling process includes the collection of knowledge in the form of meta-models, profiles, model libraries, etc., and the simulation analysis of multidomain hybrid systems through a target optimization model, cost model, reliability model, mechanism dynamics model, etc. These analyses feed back into the system design, keep the system design and the reliability targets in step, and finally yield the product design plan. The US Defense Advanced Research Projects Agency (DARPA) is currently implementing a number of projects relevant to AVM (Adaptive Vehicle Make), including the unified modelling C2M2L model library, the model-based system design analysis and validation technology CYPHY/META, and the web-based collaborative design platform VehicleFORGE.

(2) Model-based fault prediction and health management (PHM) and comprehensive support technology. The fault diagnosis software RODON, developed by Sörman Information Company and based on model reasoning technology, uses a conflict-based fault search mechanism to infer and simulate system faults, and finally completes the system fault diagnosis and generates fault diagnosis knowledge, including fault diagnosis decision trees, for fault isolation and health monitoring. The US F-35 joint strike fighter uses a large number of PHM technologies, which are integrated with the F-35 Autonomic Logistics Global Sustainment (ALGS). By using the health report code (HRC) to uniquely locate specific aircraft faults or incidents, a comprehensive solution is established that runs from built-in test through fault diagnosis, fault prediction, and health management to maintenance support, achieving global autonomous and continuous support, as shown in Fig. 1.12.

1.4.2.2 Simulation-Based Method

Simulation-based methods make intensive use of modelling and simulation in an integrated environment. In the demonstration of indices and the trade-off of schemes for reliability, maintainability, and supportability (RMS), simulation widens the scope and improves the accuracy of the issues that can be considered. In the design analysis stage, faults are introduced into the product-specific characteristic model, and the reliability, maintainability, and supportability of the product are simulated and analyzed. This allows RMS work to be carried out early in product development and improves the capability and accuracy of design and analysis. Using simulation in RMS virtual tests for verification and evaluation can greatly reduce the number of physical tests, improve the verification and evaluation capability of RMS tests, and reduce both the product development cycle and the life cycle cost. For example, in order to achieve the goal of “better, faster, cheaper” for weapon systems, the US Department of Defense proposed the idea of SBA (Simulation-Based Acquisition) in 1997, proposed

1.4 Modelling Trend of Reliability System Engineering

Fig. 1.12 Illustration of the PHM-based Autonomic Logistics Global Sustainment of the F-35


the SBA architecture and implementation process in 1998, and took RMS integrated simulation as an important part of SBA. This approach can effectively reduce the time, resources, and risks of the entire procurement process, reduce the life cycle cost, increase the military value and supportability of multi-equipment systems, and support integrated product and process development. Taking models as the core, PRISME and Bourge University carried out a project covering FMEA, reliability and failure scenario analysis, real-time embedded system simulation analysis, and the MeDISIS framework for Simulink-based system simulation. This project has been successfully applied in the European Defense Group.

In China, simulation methods for RMS are mainly applied in mechanical reliability analysis, reliability and durability simulation tests, and immersive virtual reality maintainability simulation. In reliability and durability simulation tests, fault simulations are mainly applied to LRUs, SRUs, or board-level products to expose defects in functional or structural design as early as possible. Virtual reality maintainability simulations are mainly used in machine maintainability analysis and verification, to expose maintainability design defects as early as possible.
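
The idea of introducing faults into a product model and simulating the resulting reliability can be illustrated with a small Monte Carlo experiment. The following is only a minimal sketch under stated assumptions, not a method taken from the projects cited above: the units are assumed to fail independently with constant failure rates, the system is assumed to be a pure series structure, and the function names and the numerical rates are hypothetical.

```python
import math
import random


def simulate_mission_reliability(failure_rates, mission_time, runs=20000, seed=0):
    """Estimate mission reliability of a series system by fault injection.

    Assumption: each unit has an exponentially distributed time to failure
    and units fail independently of one another.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(runs):
        # Inject one random time-to-failure per unit; the mission succeeds
        # only if no unit fails before the mission ends.
        if all(rng.expovariate(rate) > mission_time for rate in failure_rates):
            successes += 1
    return successes / runs


def analytic_series_reliability(failure_rates, mission_time):
    """Closed-form reference for the same assumptions: exp(-sum(rates) * t)."""
    return math.exp(-sum(failure_rates) * mission_time)


if __name__ == "__main__":
    rates = [1e-3, 2e-3]  # hypothetical failure rates per hour
    print("simulated:", simulate_mission_reliability(rates, 100.0))
    print("analytic: ", analytic_series_reliability(rates, 100.0))
```

For this simple structure the closed-form value is available as a check. Simulation becomes worthwhile when the product model includes repair, degradation, or load dependence, for which no closed form exists.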

1.4.2.3 Integration-Based Tool

Integration-based tools are RMS design and analysis tools that are integrated with one another and with CAD/CAE technologies. Their use improves the quality and efficiency of a product’s RMS design and analysis, shortens the development cycle, and raises the RMS level. Integrating RMS management tools with RMS information collection and processing tools greatly improves the efficiency of RMS management and the timeliness, accuracy, completeness, and connectivity of RMS information collection, providing a technical basis for fundamentally solving the problems of RMS information collection, sharing, and utilization. In this way, the RMS level of the product is ultimately improved. For example, NASA, UVA, and JPL in the US jointly developed the intelligent synthetic environment (ISE) software to realize data interaction among experts, design teams, manufacturers, suppliers, and the other research and development teams participating in a project. It achieved integrated design by using technologies such as high-performance computing, high-capacity communication networks, virtual product development, knowledge engineering, artificial intelligence, human–computer interaction, and product information management. With the development of artificial intelligence, many countries are now developing intelligent tools that assist design and achieve intelligent design through technologies such as knowledge engineering and deep learning. This will free people from repetitive work so that they can devote more effort to creative thinking.


1.4.3 Difficulties in Reliability System Engineering

The tasks involved in reliability system engineering are very complicated. The hexability top-level standards involve hundreds of work items, as shown in Fig. 1.13, and it is a huge challenge for product designers and reliability engineers to become familiar with and master them. First, it is difficult for ordinary designers to choose the most effective work items. Second, completing such a large number of work items takes a great deal of manpower and time. Third, the complex relationships among the work items make it very difficult to coordinate the hexability work items with functional performance design. Fourth, it is easy to get lost in specific work items and, lacking a “holistic view”, to “see only the trees, but not the forest”.

Fig. 1.13 Hexability work items looming like dark clouds overhead (the figure shows a cloud of work items such as FMEA, FTA, RCMA, LORA, O&MTA, sneak circuit analysis, reliability modelling, allocation and prediction, environmental stress screening, reliability growth tests, and maintainability and testability analyses, covering reliability, maintainability, safety, testability, and supportability from equipment, components, and original parts through systems to complex man-machine systems)

As shown in Fig. 1.14, in the equipment development process the role of hexability design is often underestimated in the early stages. In the later stages, hexability design problems can then only be corrected through continuous trial-and-error tests, by strengthening the test work. This usually leads to a significant delay in discovering hexability problems, which increases the cost of design improvement and greatly prolongs the design cycle. From the perspective of technological change, strengthening hexability design work in the early design stage must start with solving the problem of hexability design coordination.

Fig. 1.14 Much more attention is paid to tests than to design (labels: design for hexability; verification for hexability)

1.4.4 Technical Requirements for the Unified Model of Reliability System Engineering

In the development of complex systems, hexability design should move away from the traditional engineering mode centered on hexability work items and document output. Model-based system engineering provides a new solution for hexability design. In turn, hexability design should follow the modelling development trend, be incorporated into the MBSE process, and be integrated with the product model. In this book, the technology that comprehensively considers the functional performance characteristics, the hexability characteristics, and their collaborative implementation process is called Reliability System Engineering Unified Modelling (RSEUM) [14]. As shown in Fig. 1.15, the unified modelling technology of reliability system engineering reflects the current development trend of product life cycle modelling technology. It not only follows the development direction of the performance design field, but also reflects the development requirements of hexability design for unified analysis, modelling, and design process control.

Fig. 1.15 The development of unified modelling technology for reliability system engineering (three tracks converge on unified modelling: RMS design modelling, from reliability models based on statistics and simplified logic, through reliability and performance integration modelling, to fault-centric RMS comprehensive modelling; performance design modelling, from independent discipline models, through integrated models linked by standard protocols, to unified modelling across disciplines; and full life cycle modelling, from design and analysis models, through design and analysis process models, to full life-cycle process models)

Regarding performance modelling technology, traditional functional design handles design complexity by dividing the design problem among multiple disciplines, each of which models, analyzes, and designs relatively independently. However, because the discipline models are independent, it is difficult to carry out design collaboration and optimization from a global perspective. The design goal of the system is difficult to achieve in one pass and often requires multiple design iterations, resulting in extended design cycles and increased design costs. With the development of design methodology and multidisciplinary design optimization technology, integrated modelling technology has emerged in the field of functional design, aiming at interoperability and information sharing among disciplines, and the disciplines have achieved indirect interaction through standard interface protocols. With the continuous development of enabling technologies such as computer networks and the gradual improvement of unified standards, unified modelling technology has gradually moved from theory to practice.

The hexability modelling of complex systems faces both old problems and new challenges. To describe the logical relationships between component faults and system faults, reliability models based on statistics and logic simplification were produced [15]. To account more completely for uncertainty factors (i.e. product internal parameters, faults, environmental conditions, etc.), quantitative relationships were established between reliability and performance design parameters, and integrated reliability and performance models were developed. For instance, physics-of-failure (PoF) based reliability modelling technology was developed at the unit level of products, whereas reliability and performance integration modelling technology was developed at the system level [15–19]. As the disciplines have developed, independent reliability, maintainability, and supportability models can no longer meet the requirements of integrated design, analysis, and evaluation, and unified modelling technology that takes faults as the core has gradually been developed.
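
The reliability models "based on statistics and logic simplification" mentioned above are typified by the reliability block diagram, in which component reliabilities are combined through series and parallel logic. The sketch below is an illustration only and is not taken from the cited references [15]: it assumes statistically independent components with constant failure rates, and the example system (two redundant pumps in series with a controller) and its failure rates are hypothetical.

```python
import math


def exp_reliability(failure_rate, t):
    """Component reliability R(t) = exp(-lambda * t), constant failure rate."""
    return math.exp(-failure_rate * t)


def series(reliabilities):
    """Series logic: the system works only if every block works."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel(reliabilities):
    """Parallel logic: the system fails only if every block fails."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= (1.0 - r)
    return 1.0 - all_fail


def system_reliability(t):
    """Hypothetical system: two redundant pumps in series with a controller."""
    pump = exp_reliability(1e-4, t)        # assumed pump failure rate per hour
    controller = exp_reliability(5e-5, t)  # assumed controller failure rate per hour
    return series([parallel([pump, pump]), controller])


if __name__ == "__main__":
    for hours in (0, 1000, 5000):
        print(hours, round(system_reliability(hours), 4))
```

A model of this kind captures only fault logic. It contains no performance parameters, which is the limitation that motivates the integrated reliability and performance models described in the text.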
As for full life cycle product models, traditional product models focus on describing the functional and technical characteristics of the product. They essentially describe the final state or a stage state of the design, but cannot reflect the dynamic design process or the design intent. The development of design technology has drawn attention to the process itself and has gradually led to design process modelling technology. Taking into account the various stages of


equipment development, the inheritance relationships among design features, and the continuous links between them, it is necessary to establish models of the various elements of the product life cycle process. The process model describes not only how product characteristics change over the life cycle, but also the relationships among activities, organizations, and resources in the life cycle process.

From a technical point of view, the goals of unified modelling of functional performance and hexability are: (1) to comprehensively consider the functional design requirements and hexability of the system, and (2) to use unified or mutually understandable expressions to reflect the various technical characteristics, environmental conditions, and application conditions of the system. The hexability design and functional design of the system can then be carried out with consistent data sharing, collaborative optimization methods, and a comprehensive balance of characteristics, which helps to coordinate product performance and hexability requirements.

From a management point of view, the goal is to decompose the complex process of realizing system functions and hexability into activities, processes, organizations, resources, etc., and to express that process with a unified modelling method, in order to achieve unified, coordinated, and precise control over both the performance development process and the hexability design.

From an implementation point of view, the goal is to use a unified model to expand and enhance the existing digital environment, to establish an integrated platform for the collaborative design of performance and hexability, and to use that platform as a carrier for a unified performance and hexability design process supported by information technology.
Reliability system engineering is part of product development system engineering. The development of modelling technology helps to consider hexability design requirements comprehensively in the product development process, to build the hexability model quickly, and finally to achieve multidisciplinary optimization. The product design process runs from analyzing user needs to producing physical products that meet the requirements. Its essence is an active process of continuously solving and checking multiple related complex problems. In this process, as the system is repeatedly decomposed and integrated, the product takes different forms at the various development stages. In other words, the product design process can be summarized as the evolution of the product through different forms. Representing these forms precisely at each stage requires corresponding product models. As the design process unfolds, the information contained in these product models is constantly enriched and improved. Because modern products are mostly complex systems involving multiple disciplines, including machinery, control, pneumatics, strength, electronics, hydraulics, software, and reliability, the requirements that the disciplines place on product models also differ, and traditional modelling techniques often only start


from a single discipline, lack a systematic view, cannot effectively reflect the characteristics of multidisciplinary coupling, and struggle to guarantee consistency among different models. Therefore, it is necessary to establish a unified model that can reflect both the product evolution process and the multidisciplinary design requirements. With the unified model, a unified design process can be formed to solve the problems of unified description and unified design of the product’s hexability and special characteristics. Furthermore, an organic relationship between the design methods for special characteristics and those for hexability needs to be established, so that design information for special characteristics and hexability can be shared in a unified way. Meanwhile, based on the unified design process and taking function continuity as the goal, unified design, analysis, and control methods centered on faults are given for determining the reliability and special characteristics, and closed-loop control models for faults are established to connect the logic of systematic fault discovery with the design requirements for reliability. This process addresses the problem of complete fault identification for complex, multi-level products, improves the system of fault prevention and control methods in reliability system engineering, and achieves precise control of the design process for special characteristics and hexability. It also ensures that products designed through the unified design process meet the design requirements for both special characteristics and hexability.

Chapter 2

Fundamentals of Model-Based Reliability System Engineering

Abstract To meet the hard requirement of an organic integration between hexability and functional performance design, this chapter first discusses the general technical framework, research scope, and engineering significance of Model-based Reliability System Engineering (MBRSE). It then presents the conceptual model, elements, and architecture of MBRSE, centered on the construction of integrated design problems and the model-based identification of faults and loads. Finally, it presents the ontology-based information sharing mechanism of MBRSE, the process control mechanism of MBRSE based on meta-processes and cybernetics, and the design evolution mechanism of MBRSE based on axiomatic design theory.

Keywords Integration · Model-based reliability system engineering · Information sharing · Process control · Design evolution

2.1 MBRSE Theory and Methods

2.1.1 General Technical Framework for the Integration of RSE Technology

Hexability design is supposed to be a part of product design. However, because traditional design methods cannot design hexability characteristics quantitatively and directly, actual projects tend to combine multiple qualitative and quantitative approaches to achieve the product design step by step. Traditional hexability engineering activities are also expected to have “direct impacts” on functional performance design. This inconsistency in hexability engineering practice has always been a crucial problem and remains unsolved. At present, hexability design is mainly promoted through management methods in order to “indirectly affect” the functional performance design of the product, and the techniques needed to remove the root cause of the inconsistency are lacking.

© National Defense Industry Press 2024 Y. Ren et al., Model-Based Reliability Systems Engineering, https://doi.org/10.1007/978-981-99-0275-0_2


With the continuous development of engineering design theory, and the continuous progress of design methods driven by computer software, hardware, and network technologies, the connotation of modern design keeps changing and expanding, mainly in the following aspects:

2.1.1.1 Forward Design Process Driven by Requirements

Design is no longer centered on the functional structure of the product but is oriented to the needs of users. The design process is driven by demand and follows a forward sequence that includes requirements demonstration, design plan establishment, industrial specification design, production process control, use process evaluation, etc. The design process particularly emphasizes the importance of establishing requirements for the various characteristics early in product development. A full-process forward design requires precise descriptions of the product technical status and fine-grained management and control, both of which are major challenges for current technologies and management.

2.1.1.2 Multidiscipline Collaborative Design and Optimization

The design of modern complex products requires advanced knowledge from different disciplines, which often influence and restrict each other. To optimize the system design and reduce design iterations, multidiscipline collaborative design and optimization that overcomes barriers of time and space should be performed through multidiscipline information sharing, model sharing, and method interoperability.

2.1.1.3 Full-process Digital Design

With the advancement of information technology, computer-aided design methods represented by CAD/CAE software are under continuous development, with ever-increasing levels of automation and intelligence. In the field of structural design, the degree of digitization keeps rising, and data exchange standards are gradually being recognized and used. Data and methods are increasingly linked, and “silos of automation” have been significantly reduced, enabling collaborative solutions to the conflicts arising from multidisciplinary coupling. In recent years, digital technology has been extended to requirements and functional design, and a digitalized whole-process design has gradually been achieved with the development of MBSE theory and technology.

In view of these changes in the connotation of modern design, on the one hand, hexability design must adapt to the changes and extension of the connotation of performance design, take the road of whole-process, parallel, collaborative, and integrated design, and be closely integrated with the forward design


process. On the other hand, the guidance of modern advanced design theories and methods, together with the promotion of all-digital design methods, is conducive to further developing and enriching the connotation of hexability design, and promotes the development of hexability design modelling and digital technology. In this way, hexability design becomes truly “organically” integrated into product design and acts as a “hard” constraint on the design.

2.1.2 Main Research Areas and Engineering Significance of MBRSE

Model-based reliability systems engineering (MBRSE) is carried out on the basis of a unified model. The unified model is a generalized model comprising two categories of models: process models and method models. These two categories form the basis for effectively supporting performance and hexability design. They describe, from different perspectives, the various elements required for performance and hexability design, and they are related to each other, although each model has its own purpose. The models established in this book are mainly basic, principled models. Once the core problems of these two categories of models and their related engineering problems are solved, the basic models can be effectively extended by combining them with applications. The major focus of MBRSE research is how to realize the interoperability of the related models. Integration of hexability is not a simple linear assembly of its six elements. Rather, the elements should be combined in the manner of the Luban lock shown in Fig. 2.1: following a specific assembly sequence, the elements are tightly bonded to each other to form an organic whole.

Fig. 2.1 An organic integration of the hexability model

To establish the basic structure of the integrated unified model of functional performance and hexability over the whole development process of a complex system, and to lay a good foundation for the in-depth development and engineering application of integrated design technology, the following steps should be implemented:


• Analyze the engineering requirements of the performance and hexability design, clarify the goal of establishing the unified model, carry out modelling technology research on integrated design processes and methods, and establish a unified model framework.
• Break through the key technologies (such as the integration mechanism of the integrated design, multiview modelling for the unified process of the performance and hexability design, and failure mode mitigation decision-making oriented to the requirements of hexability), and build an integrated design model with the unified process and methods.
• Provide the construction methods of the model-based and application-oriented integrated design platform and software tools for the integrated design, to verify the feasibility of the unified models.

For convenience of research, in this book the design and analysis of supportability is limited to the scope of maintenance and support, and the design and analysis of safety is limited to the scope of fail-safe design.

2.2 The Concept and Connotation of MBRSE

2.2.1 Definition of MBRSE

If product design engineering is regarded as a system, then this system should be controllable and should evolve toward the design goal, from the abstract to the concrete and from the simple to the complex. According to the principles of systematics, every system should be studied as a directed process. Therefore, by controlling the structure and behavior of an integrated design process, it is possible to control the direction of that process. However, the implementation of the process is subject to a number of constraints, such as technology, time, and cost. If fluctuations exceed the expected tolerance, the process will be terminated and the product design will fail. The design process runs from the analysis of the user’s target needs to the fabrication of a physical product that meets all the needs. Its essence is an active process of continuously constructing, solving, and verifying multiple related complex problems. The main cause of these fluctuations is the complexity of the design problems. Hexability design is highlighted in product design because traditional design, although it does consider hexability requirements, focuses on solving isolated problems and lacks systematic, integrated, top-down solutions. Systematic consideration of the hexability requirements greatly increases the complexity of the design problems. Hexability is an inherent property of a product and is closely coupled with its functional performance. It is therefore determined by the design characteristics of the product and the characteristics of its environment. Moreover, hexability design activities are not only an extension of the functional performance design activities, but also in turn


affect and constrain the functional performance design. In the development of hexability engineering, it has also been found that the earlier and deeper the involvement, and the more comprehensively the problems are considered, the greater the improvement in the hexability level of the product. For example, Harold proposed carrying out reliability work throughout the product life cycle, with particular attention to the design for reliability (DFR) mode in the upstream part of the life cycle. There are many similar methods, but they largely remain at the theoretical stage, have not yet formed a systematic body of methods, and mainly offer designers a design concept. The integrated design of the functional performance and hexability of complex products has dynamic, nonlinear, and uncertain characteristics, which are difficult to quantify and difficult to relate directly to the functional structure of the product. It is therefore necessary to address design issues by combining qualitative and quantitative methods, integrating multiple approaches, and drawing on multiple people and multiple perspectives. In the hexability design process, the necessary data should come from sources such as design, test (including simulation), field, and historical databases in the distributed digital R&D environment. These data make it possible to build a model-based process, grounded in the product’s functions, performance, and physical characteristics, that is closely coordinated with traditional functional performance design. The development of modelling technology provides a new solution for the integrated design of functional performance and hexability. Based on models, the relationship between functional performance and hexability can be accurately described. 
The design evolution between different product levels and different design stages can be effectively realized, and the complexity, repetition, and uncertainty of the design can be reduced. MBRSE continuously refines the various professional characteristic models and external load models of the product in order to establish the product use process model, the fault behavior process model, and the maintenance model. As the product design evolves, MBRSE uses these models to progressively identify the laws governing fault occurrence, development, prevention, and control, as well as product use and support, to analyze the weak links of the hexability design by simulation, and to verify the realization of the hexability requirements by simulation. In this way, the hexability design is improved through coordination and integration with the functional performance design, so that the functional performance and hexability design requirements are achieved simultaneously.

The conceptual model of MBRSE is shown in Fig. 2.2. An integrated design problem is constructed from the user demand vector {RC}. In this problem, the design is decomposed into functional performance design and fault mitigation and control design. A set of engineering methods is then applied to analyze and solve the design problem. During the solution process, the two types of design should cooperate with each other to reduce design iterations. Fault mitigation and control design is based on the cognition of faults and their control rules. This cognition gradually deepens as the design deepens and the design scheme is refined, from qualitative to quantitative and from logic to physics. At the same time, the process of fault mitigation and control is also a process of re-understanding the faults and their control rules. The cognition of faults and their control rules is based on the cognition of the use process and environment (load), and the cognition of the load also deepens gradually as the design progresses. After each problem is solved, a systematic synthesis and evaluation is carried out to verify the solution process and to evaluate how far the integrated design problem has been solved. This process may be iterated many times in product design until the integrated design problems are satisfactorily solved.

Fig. 2.2 MBRSE conceptual model (elements shown: user demand {RC} and demand model; construction of comprehensive design problems; comprehensive design for function, performance, and RMS requirements, with functional and physical models; failure mitigation (design for reliability) and control (design for maintainability and supportability), with functional failure, physical failure, failure behavior, failure testing, and failure repair and maintenance models; operation and mission model, key factors in the life cycle, and life cycle load model; engineering method set [D]; system synthesis and evaluation)

2.2.2 Elements and Architecture of MBRSE

2.2.2.1 Fundamental Essentials of MBRSE

As a system, MBRSE has the process and its control methods as core elements, and a deep understanding of the evolution rule of the models is necessary. The driving force of the model evolution comes from the various engineering methods and the synergies among them. These engineering methods should push the design model to evolve in the expected direction at the least cost, while still allowing certain local iterations. To reduce the evolution cost and improve the accuracy and efficiency with which the engineering methods are applied, auxiliary tools are needed; the tools in turn place new requirements on the methods so that the tools can be applied more effectively. Similarly to other systems,

Fig. 2.3 Relationship among the models, methods, tools, and environments in MBRSE

MBRSE is also generated in a certain environment, in which the operation of models and the application of methods and tools are all completed. It is therefore necessary to design and control the environment so that it has a positive effect on the evolution of the models in MBRSE. The MBRSE system should thus contain at least four types of elements: models, methods, tools, and environments. The relationship among them is shown in Fig. 2.3.

(1) MBRSE design models

For convenience, the state of the design system is expressed by the state of the product. Although the MBRSE process is continuous in time, the state of the product in the MBRSE process is discrete, so MBRSE can be regarded as a generalized discrete dynamical system. The system process can be viewed as a sequence of tasks, and combinations of tasks, that are performed to evolve the product model to a specific state. The model defines what the task is (WHAT), but does not define the specific method for performing the design and analysis task. This book focuses on planning the modelling process. To make the definition of the modelling process scientific and reasonable, it is necessary to establish the relationships between the different types of sub-models and to determine the input and output interfaces between them. The established model should be planned so as to reduce feedback, eliminate coupling, and reduce the number of iterations in the design process. The model itself is divided into different levels, such as the MBRSE model at the equipment level, the MBRSE model at the system level, the MBRSE model at the device level, etc. The MBRSE models at different levels are related to one another.

(2) MBRSE design methods

The MBRSE method includes specific technologies for achieving specific functional performance and hexability goals. It defines the operation method and process of each task in the model evolution process. 
At any level, the process tasks are performed by using methods, each of which needs to be performed in certain steps. In other words, a method itself is a process. And for a process at a certain level, its higher levels become other methods. This book focuses on the system and integration of the methods, by classifying tasks in the process of model evolution, mapping them


to specific methods, and analyzing the essence of methods, to achieve data sharing and interoperability between methods.

(3) MBRSE design tools and environments

MBRSE design tools are used to assist in the implementation of specific methods. They can make tasks more efficient, provided that they are applied correctly and that users have the appropriate skills and training. These tools are generally computer software, such as the computer-aided engineering (CAE) software commonly used for functional performance analysis, and hexability design analysis software. Their use enhances the control capabilities of the MBRSE process and the processing capabilities of the methods. Traditional hexability design tools are often used as stand-alone tools or as shared applications within a specific field. They cannot automatically obtain the product design data they need, nor can they transmit data to other tools in a standard way. The MBRSE design environment organically links system methods, tools, resources, and manpower to promote the positive evolution of the reliability systems engineering process. The environment here includes personnel organization, the digital integration platform, technical specifications, corporate culture, etc. This book focuses on the physical environment, i.e. the digital integration platform.

2.2.2.2 MBRSE Unified Model System

The MBRSE unified model is not a simple accumulation of the traditional single models, but rather a set of technical models that can describe both the special characteristics and the hexability technical characteristics at the same time and support the application of various design analysis methods. Its specific definitions are as follows.

Definition 2.1 The Product Model (PM) refers to a set of attributes that can comprehensively reflect the characteristics of the product at time $t$ in the product design process, denoted as $P_{T=t} = \{C_t, E_t, A_t, R_t, \ldots \mid T = t\}$. Each specialty model is an image of the product model in the specialty domain.

Definition 2.2 The Unified Model (UM) refers to a collection of models that can comprehensively reflect the product evolution process and professional characteristics at different times in the product design process. It has the following characteristics:

(1) It is not a single model, but an integration of multiple models organically linked to each other.

(2) It is not a static model but a dynamic model that evolves along with the design process.

(3) It is an integrated model of process and specialty, which includes not only the description of the system engineering process, but also the description of the


general characteristics and special characteristics of the product, to support the multi-professional collaborative design of the whole process.

(4) It is a demand-oriented model, as the unified model is directly oriented to the requirements of the product design. If the product has reliability requirements, reliability-related design characteristics must be considered in a comprehensive and systematic way.

Completing the integrated design of functional performance and hexability requires answering four questions: “why they can be integrated” (integration mechanism), “how to achieve an integration” (integration method), “how to conduct an integration” (integration process), and “how to establish the integrated support method” (digital integration platform). The unified model of integrated design should also be implemented in different layers around these requirements. Based on the conceptual model and basic elements of the integrated design, a unified model system framework for the integrated design is shown in Fig. 2.4, including three levels and four parts.

(1) MBRSE integration mechanism

The integration mechanism model studies the basic principle by which the performance design and the hexability design can be carried out together. First, according to the product use process, the basic elements involved in the integrated design of functional performance and hexability are identified, and the design of product functional performance is extended to cover the product use support design, the identification of the environment/use load, and the design for prevention and control of failures. 
The understanding of the occurrence, development and control rules of faults is unified into the fault ontology model and the integrated design meta-process control model, to form a multi-professional cognition on the integrated design, and then establish a data sharing mechanism and integrated design process control mechanism, to guide the integrated design process.

(2) MBRSE integrated process model

Based on the ideas of both the complex product development system engineering process and concurrent engineering, the control mechanism of the integrated design process is applied to establish the parallel process framework of functional performance and hexability integrated design. According to this framework, the integrated design process planning and reorganization method based on the design structure matrix (DSM) is used to plan the overall process, stage process and partial process of the integrated design, and the process model is constructed by using the multi-view process modelling method. For the constructed process model, both process operation conflict analysis and process operation capability evaluation are required.

(3) MBRSE integrated method model

The matrix of integrated design methods is applied to classify and sort the integrated design methods in each design domain, analyze the domain mapping relationship of the methods, and form an integrated design method system. Based on the fault

Fig. 2.4 The unified model system of MBRSE (three layers: integration mechanism layer, basic model layer, application platform layer; four parts: functional performance and RMS integration mechanism, integrated process model, integration method model, MBRSE integration platform framework)

ontology model, the functional model and the physical model of the product are expanded to form a failure model based on the product model. Based on the quantitative relationship between the functional performance design and the fulfillment of the hexability requirements, fault reduction is implemented for the hexability design. This process is monitored through the dynamic integrated monitoring model of the system integration process.

(4) MBRSE integration platform

The functional performance and hexability integration platform carries the technical and management enabling tools required for the whole process of the hexability integrated design, achieving process integration and data integration between hexability and functional performance. Carrying out hexability engineering activities on the integrated platform ensures the integrated design of hexability characteristics and functional performance, and fulfills the hexability design goals while achieving the functional performance design goals.
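The DSM-based process planning mentioned under the integrated process model can be illustrated with a small sketch. The task names and dependencies below are hypothetical, and the technique shown (grouping mutually dependent tasks into coupled blocks with Tarjan's strongly connected components algorithm, then ordering the blocks so that prerequisites come first) is one common way to partition a DSM. It is not necessarily the specific planning and reorganization method used in this book.

```python
def partition_dsm(tasks, depends_on):
    """Group tasks into coupled blocks and order the blocks for execution.

    depends_on[a] is the set of tasks whose output task a needs.
    Tasks that depend on each other in a cycle end up in one block.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    blocks = []
    counter = 0

    def visit(task):
        nonlocal counter
        index[task] = low[task] = counter
        counter += 1
        stack.append(task)
        on_stack.add(task)
        for dep in sorted(depends_on.get(task, ())):
            if dep not in index:
                visit(dep)
                low[task] = min(low[task], low[dep])
            elif dep in on_stack:
                low[task] = min(low[task], index[dep])
        if low[task] == index[task]:
            # task is the root of a strongly connected component
            block = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                block.append(member)
                if member == task:
                    break
            blocks.append(sorted(block))

    for task in tasks:
        if task not in index:
            visit(task)
    # Edges point from a task to its prerequisites, so Tarjan's algorithm
    # emits prerequisite blocks before the blocks that depend on them.
    return blocks


# Hypothetical design tasks and information dependencies
tasks = ["functional_design", "fmea", "reliability_prediction",
         "design_improvement", "testability_analysis"]
depends_on = {
    "fmea": {"functional_design", "design_improvement"},
    "reliability_prediction": {"fmea"},
    "design_improvement": {"reliability_prediction"},
    "testability_analysis": {"fmea"},
}
plan = partition_dsm(tasks, depends_on)
for step, block in enumerate(plan, start=1):
    kind = "coupled block (iterate)" if len(block) > 1 else "single task"
    print(step, kind, block)
```

In this example the three tasks that feed each other form a single coupled block, which is where design iteration is concentrated; the remaining tasks can be sequenced without feedback.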


2.3 Information Sharing Mechanism of MBRSE

2.3.1 The Cognitive Process of Product Life Cycle

Over the product life cycle, different design stages reflect different understandings of, and perspectives on, normal use conditions, possible faults, and the prevention and remedy of these faults. To this end, this book establishes a model of the product development process that includes three layers and two stages of cognitive processes, as shown in Fig. 2.5. In the plan stage, starting from the top-level requirements (task mission) of the product, the requirements of “how to work” are given, including the quantitative indices of these requirements. The realization of these requirements is reflected in the cognition of the guarantees of product use and maintenance, represented by the views of their uses. A precise description of the maintenance guarantee requires cognition of the occurrence and development of product faults, and a description of the product's functional fault view. Product development starts from the functional performance design, which first describes and analyzes use and faults from the functional point of view. As product development progresses, the cognition of the product's characteristics and environmental conditions becomes more detailed and accurate. The influence of the overall external load and stress can then be considered, and the use and failures of the product can be recognized and described from the perspective of the physical structure that realizes the functional performance of the product. This stage needs a deeper understanding of the occurrence conditions and characteristics of the faults. The usage modes, failure modes and loads obtained from the analysis directly affect the development of the design and support plans of the product. As the design becomes more and more detailed, the understanding of requirements and problems becomes more “precise” as well. 
On the one hand, it has a more accurate identification of the normal use of the product and its maintenance guarantee conditions after faults, as well as its load and stress conditions during

Fig. 2.5 The cognitive process of the product’s use and faults during its development (elements shown: top-level requirements; requirements for use and maintenance features; function implementation and maintenance requirements; use guarantee and maintenance guarantee conditions; product’s characteristics and environment; load and stress condition; perspectives of use guarantee, maintenance guarantee, functional failure, and physical failure)

operation. On the other hand, information on the physical structure of the product is also more detailed. By using a physics-based method, it allows us to go deep into the internal structure of the product to analyze and recognize its failure mechanism, use and maintenance process. This cognitive process effectively links the use, faults and maintenance guarantee of the product in its design process, and the cognition deepens with the design process. This cognition continues to be more and more detailed, from the demand level to the functional logic level, and finally to the physical level to design a product that will meet both the functional performance and hexability requirements. The realization of this cognitive process requires the integrated implementation of multiple engineering disciplines, involving the gradual and synergistic use of a series of engineering methods.

2.3.2 Design Ontology Framework for MBRSE

In the product life cycle, the various types of hexability data and knowledge are mainly recorded in documents, and sometimes also in hexability databases for some specific products. Although such information can support the hexability work to a certain extent, it has the following shortcomings. First, it may cause the loss of various “use” information. This is because the traditional method simply records the resultant information without describing the cognitive process, and therefore cannot fully and accurately express the design knowledge or the design plan. Second, the hexability information and process cannot be accurately and uniformly expressed. Different people may use different ways and terminologies to describe hexability information and process, which creates obstacles to sharing them. Hexability engineering involves a large number of terms and concepts, which have complex relationships with the concepts of product functional performance design. Therefore, by using ontology technology, this book provides a unified knowledge model of the related concepts and their relationships, to achieve the unified expression of functional performance and hexability data and knowledge in the whole design process, and to lay the foundation for a unified model for multiple engineering disciplines. Gruber (1993) from Stanford University first proposed the definition of ontology, and Borst (1997) made some revisions on the basis of Gruber’s definition. This book merges the two definitions, i.e. an ontology is a set of explicit and formalized specifications of a system of concepts agreed by most people. 
The goal of ontology is to capture knowledge in related fields, provide a common understanding of knowledge in this field, identify commonly recognized vocabulary in this field, and give clear definitions of these vocabulary (terms) and their interrelationships at different levels. Ontology is an abstraction of the existence of domain entities. It emphasizes the association between entities, and expresses and reflects these associations through a variety of knowledge representation elements (i.e. ontology modelling primitives).


By using these elementary modelling rules, objects in correspondence to the ontology can be described rigorously and accurately. Although there are various summaries on ontology elements, all of them mainly emphasize the concepts in the ontology and their interrelationships. These concepts are broad and can refer to anything, such as job descriptions, functions, behaviors, strategies, and reasoning processes. The interrelationships represent interactions in between these concepts in a domain. Semantically, an instance represents an object, a concept represents a collection of objects, and a relationship corresponds to a collection of object tuples. The definition of a concept generally adopts a framework structure, including the name of the concept, a set of relationships with other concepts, and a description of that concept in natural language. The ontology modelling process includes two important relationships which are perspective (is-a) and composition (part-of). The “is-a” relationship is used to describe the internal structure of the target concept, whereas the “part-of” relationship describes the components of the object. In other words, “is-a” provides an appropriate conceptual classification structure, while “part-of” organizes appropriate concepts. This book uses four basic relations to express integrated design ontology, namely “part (P:)”, “class (K:)”, “instance”, and “attribute (A:)”. “Part” expresses the concept relationship between parts and the whole; “Class” expresses the inheritance relationship between concepts; “Instance” expresses the relationship between instances of a concept and that concept; “Attribute” expresses that a concept is an attribute of another concept.
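The four basic relations can be made concrete with a minimal sketch of a concept store. The `Ontology` class, its method names, and the instance `pump_01` are hypothetical and introduced only for illustration; the concept names and links follow the description of products, functions, states and faults in this chapter rather than any particular ontology language or tool.

```python
from dataclasses import dataclass, field

# The four basic relations used for the integrated design ontology
RELATIONS = {"part", "class", "instance", "attribute"}


@dataclass
class Ontology:
    """A tiny triple store: (subject, relation, object)."""
    triples: set = field(default_factory=set)

    def add(self, subject, relation, obj):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.triples.add((subject, relation, obj))

    def related(self, relation, obj):
        """All subjects linked to obj by the given relation."""
        return sorted(s for s, r, o in self.triples
                      if r == relation and o == obj)

    def ancestors(self, concept):
        """All concepts that `concept` inherits from via 'class' (K:) links."""
        found, frontier = [], [concept]
        while frontier:
            current = frontier.pop()
            for s, r, o in self.triples:
                if s == current and r == "class" and o not in found:
                    found.append(o)
                    frontier.append(o)
        return sorted(found)


onto = Ontology()
onto.add("product meta", "part", "product structure")    # P: a portion
onto.add("functions", "attribute", "product meta")       # A: an attribute
onto.add("states", "attribute", "product meta")          # A: an attribute
onto.add("working state", "class", "states")             # K: a kind
onto.add("fault state", "class", "states")               # K: a kind
onto.add("pump_01", "instance", "product meta")          # hypothetical instance

print(onto.related("part", "product structure"))
print(onto.related("class", "states"))
print(onto.ancestors("fault state"))
```

Keeping the relation vocabulary closed (any other relation name is rejected) is what allows different specialties to exchange concepts without reinterpreting each other's links.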

2.3.2.1 Definitions

The product meta and its structure are the objects of an integrated design, and also the carriers of a variety of engineering methods and data information. The product meta and its structure are defined first, as follows.

Definition 2.3 A product meta is the smallest design unit included in a specific product configuration $\Gamma$, whose internal physical composition is not considered. A product meta is represented by $c_i$, and a product is the collection of all product meta and their relationships. Note that the design meta studied in this book do not include software products.

$$C^{(\Gamma)} = \left\{ \left( c_1^{(\Gamma)}, c_2^{(\Gamma)}, \cdots, c_n^{(\Gamma)} \right) \,\middle|\, \forall c_i^{(\Gamma)} \left( \nexists\, c_j^{(\Gamma)} \subset c_i^{(\Gamma)} \right) \right\} \tag{2.1}$$

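Definition 2.4 Structure of the product. Let $C$ be the set of product meta in a particular product configuration $\Gamma$, and let

$$B_C^{(\Gamma)} = \left\{ \left( X^{(\Gamma)}, Y^{(\Gamma)} \right) \,\middle|\, X^{(\Gamma)}, Y^{(\Gamma)} \in C \wedge B\left( X^{(\Gamma)}, Y^{(\Gamma)} \right) \right\} \tag{2.2}$$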

where $B(X^{(\Gamma)}, Y^{(\Gamma)})$ means that the product meta $X^{(\Gamma)}$ and $Y^{(\Gamma)}$ are coupled with each other. Let $\sigma(C^{(\Gamma)}, B_C^{(\Gamma)})$ be a graph; if and only if $\sigma$ is a connected graph, $\sigma$ is defined


as a structure, namely, a structure is a collection of interrelated product meta. This dependence should be determined based on the functional performance of the design elements. It can be seen that in a product configuration, there are multiple structures. If a complex product organizes product meta hierarchically, a so-called product tree can be created, in which the leaf nodes are the product meta and the intermediate nodes are the product structures or clusters of multiple product structures. As the design process progresses, the original design elements can be further decomposed into multiple design elements in the new product configuration. The product design consists of multiple product function design themes. The product function design theme $Z$ is the image of the product meta in the current perspective, which includes all functions that the current design can reflect in the design domain. The function $F$ is a complete set of external states of the product meta, including both the expected normal functional states and unexpected and illegal functional states.

Definition 2.5 Meta-functions and states of the product. The product design plan $Z$ is expressed by the external functional pattern as follows.

$$Z = \langle T, \tilde{F} \rangle \tag{2.3}$$

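where $\tilde{F} = \langle F_1, \cdots, F_n \rangle : T \to V_1 \otimes \cdots \otimes V_n$; $F_i$ is called the $i$th possible state function of $Z$, and $\tilde{F}$ is regarded as all of the state functions of $Z$; $T$ is a set of times, and $V_1 \otimes \cdots \otimes V_n$ is the Cartesian product of the state variables $V_i$.

$$S(Z) = \left\{ \langle z_1, \cdots, z_n \rangle \in V_1 \otimes \cdots \otimes V_n \,\middle|\, z_i = F_i(T) \right\} \tag{2.4}$$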

The above formula is called the possible state space of $Z$. For the product meta design, only the state variables and their combinations that follow the physical rules are considered, so $\breve{S}(Z) = \{ S(Z) \mid \langle z_1, \cdots, z_n \rangle \}$ (where $\langle z_1, \cdots, z_n \rangle$ follows certain physical rules) is a legal state space. As only the legal state space is considered in this book, $S(Z)$ denotes the legal state space for convenience of expression.

Definition 2.6 Functional failure modes. If the functional mode of the functional design body is $Z_m = \langle D, \tilde{F} \rangle$, and $f_a(z) \in F_a(z)$ is the decision rule of faults, then the fault space of the product meta is defined as follows

$$S_{Fa}(Z) = \left\{ \langle z_1, \cdots, z_n \rangle \in V_1 \otimes \cdots \otimes V_n \,\middle|\, \tilde{F} \right\} \tag{2.5}$$

where $\tilde{F} = \{ \tilde{F} \text{ satisfies each } l(Z) \in L(Z) \}$; then $Z_{fm} = \langle D, \tilde{F} \rangle$ represents the functional failure mode.

Definition 2.7 Fault event. A sequence pair

$$F_t = \langle s', s \rangle \tag{2.6}$$


is called a fault event, where $s \in S_{Fa}(Z)$ and $s' \notin S_{Fa}(Z)$; it represents the transition of the product from the normal state $s'$ to the fault state $s$. If the product can return from the fault state to the normal state once the conditions that triggered the failure are removed, the event is called a reversible fault event; otherwise it is an irreversible fault event.

Definition 2.8 Product structure function. If the product structure consists of $i$ product meta and, treating the product structure as a whole system, the realization of its various functions is completely determined by the state functions of the product meta, then there is a function $\Phi(B_C)$ that represents the relationship between the product meta functions and the system function. In particular, when a product meta has a fault function mode, the function $\Phi$ can be used to analyze the failure mode of the product structure function.

Definition 2.9 A support resource meta is the smallest unit of support resources that is necessary to support a specific use or maintenance task of a product. The support resource meta contains the required support characteristics, without considering its internal physical composition. A support resource meta is represented by $sc_i$, and $SC$ is the collection of all support resource meta.

$$SC = \left\{ \left( sc_1, sc_2, \cdots, sc_n \right) \,\middle|\, \forall sc_i \left( \nexists\, sc_j \subset sc_i \right) \right\} \tag{2.7}$$

The support resource is a collection of all support meta-resources. It can be divided into use support resource and maintenance support resource, which are object attributes of the use task and maintenance task, respectively. The use task includes the expected functionality of a specific product, whereas the maintenance task includes fault events of specific products. Both will establish an indirect link between the support resource and the specific product.
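Definitions 2.4 and 2.8 can be illustrated with a short sketch. The product meta names and coupling pairs are hypothetical, and the series logic used for the structure function is only one possible form of Φ, chosen for simplicity: it assumes the structure works only if every product meta works.

```python
def is_structure(metas, couplings):
    """Definition 2.4 check: the graph (metas, couplings) must be connected.

    couplings is an iterable of (X, Y) pairs of coupled product meta.
    """
    metas = set(metas)
    if not metas:
        return False
    adjacent = {m: set() for m in metas}
    for x, y in couplings:
        if x in metas and y in metas:
            adjacent[x].add(y)
            adjacent[y].add(x)
    start = next(iter(metas))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacent[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == metas


def series_structure_function(meta_working):
    """A hypothetical structure function: the structure works only if
    every product meta is in a working state (series logic)."""
    return all(meta_working.values())


# Hypothetical product meta and coupling relation
couplings = [("pump", "valve"), ("valve", "controller")]

print(is_structure(["pump", "valve", "controller"], couplings))
print(is_structure(["pump", "valve", "controller", "sensor"], couplings))
print(series_structure_function({"pump": True, "valve": False,
                                 "controller": True}))
```

The second call returns False because the hypothetical `sensor` is not coupled to any other product meta, so the four units together do not form a single structure.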

2.3.2.2 Top-level Ontological Framework of the Integrated Design

According to the relationships among the concepts of products, uses, functions, faults and their extended concepts, the established top-level ontology framework of the functional performance and hexability integrated design is shown in Fig. 2.6. The product structure consists of product meta, each of which has a generalized function, and various states. The fault is regarded as a state of the product meta. The functions and states of the product structure are not a simple sum of product meta. New functions and states may emerge, and have to be expressed independently.

2.3.3 Construction of the Fault Ontology

In the integrated functional performance and hexability design ontology, products and faults are placed in core positions, as they are the link that connects the functional performance design specialty with reliability, maintainability, supportability,

Fig. 2.6 Ontology framework of the integrated functional performance and hexability design (concepts shown: product structure, product meta, functions (expected, unexpected), states (working state, fault state), fault events, use task, maintenance task, support resource (use, maintenance), support resource meta; relation legend: P: a portion, K: a kind, A: an attribute)

testability, and safety. Since the ontology description of products has been discussed in many studies, this book focuses on the fault ontology, which is divided into 4 layers. The product design ontology is the foundation. On this basis, the ontology of the basic characteristics of the fault is established, together with the ontology of fault propagation in the system and the ontology of fault control after occurrence. The understanding of faults has a different emphasis in each product design domain and is difficult to express with a single unified fault ontology. This book therefore separates the local fault ontology from the global fault ontology, to describe the common features of faults and the characteristics of each product stage, respectively. The connection between the fault concepts in different design domains is realized through ontology mapping.

2.3.3.1 Global Fault Ontology

The global fault ontology is a general description of the fault and its related concepts, which is independent of the state of a product and its cognition. The important concepts in the global fault ontology are defined in a formal way as follows. The fault is defined on the basis of the product state, and different product states may have different functions. The product state that can realize the expected function is called the working state, and the state that realizes the unexpected function

Fig. 2.7 Global fault ontology (concepts shown: product structure, product meta, functions (expected, unexpected), states (working state, fault state), fault events, fault process, fault condition (external, internal), fault cause, fault mode, fault effect, fault symptom, cognitive and non-cognitive fault states; relation legend: P: a portion, K: a kind, A: an attribute)

is called the fault state. The relevant faults include the faults of the product meta itself and the faults of the product structure caused by the fault propagation of the product meta. The structure of the global fault ontology is shown in Fig. 2.7, in which the dashed line represents the semantic connection between different ontologies.
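The distinction between working states and fault states, together with Definitions 2.5 to 2.7, can be sketched for a toy product meta. The two state variables, the physical rule and the fault decision rule below are all hypothetical. The sketch also reads a fault event as a transition from a working state into the fault space, which is an interpretation of Definition 2.7 rather than a statement of the authors' formalism.

```python
from itertools import product

# Hypothetical state variables V1 and V2 of one product meta
PRESSURE = ("low", "nominal", "high")
FLOW = ("none", "nominal")


def possible_states():
    """Possible state space: the Cartesian product V1 x V2."""
    return set(product(PRESSURE, FLOW))


def is_legal(state):
    """Hypothetical physical rule: nominal flow cannot occur at low pressure."""
    pressure, flow = state
    return not (pressure == "low" and flow == "nominal")


def is_fault(state):
    """Hypothetical fault decision rule: off-nominal pressure or no flow."""
    pressure, flow = state
    return pressure != "nominal" or flow == "none"


legal_space = {s for s in possible_states() if is_legal(s)}
fault_space = {s for s in legal_space if is_fault(s)}
working_space = legal_space - fault_space


def fault_events(trajectory):
    """Pairs (s_prev, s) where the product moves from a working state
    into the fault space."""
    return [(prev, cur) for prev, cur in zip(trajectory, trajectory[1:])
            if prev in working_space and cur in fault_space]


trajectory = [("nominal", "nominal"), ("nominal", "nominal"),
              ("high", "nominal"), ("high", "none")]
events = fault_events(trajectory)
print(len(legal_space), len(fault_space), len(working_space))
print(events)
```

Only the first entry into the fault space is reported as a fault event; the later change from one fault state to another is not, because its starting state is already a fault state.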

2.3.3.2 Local Fault Ontology

The local fault ontology is expressed according to the cognition of fault issues at a specific design stage. In different design domains, the fault ontology has its own characteristics. This book mainly focuses on describing the fault ontology in the functional domain and the physical domain.

(1) In the functional domain of product design, the product meta has a high level of abstraction, and therefore there are many possible combinations of function types and product states. Under this circumstance, the states and functions of the core product meta should be analyzed. Moreover, the cognition of products and their load conditions in the functional domain is usually conducted at the macro level, and the failure mode is mainly recognized at the functional level, so the faults exhibit significant multiplicity and ambiguity. Expert experience and historical information should be fully utilized to narrow the scope of fault analysis and improve the accuracy of the knowledge of fault issues. The fault ontology established for the functional domain is shown in Fig. 2.8. Its main characteristics are that the product design meta is mapped to the functional design meta, and that the influence and transmission of faults take place through the functional input and output interfaces.

Fig. 2.8 Functional domain fault ontology and its mapping (the global fault ontology OFg is mapped to the functional domain fault ontology OFf, which adds: functional design meta, functional input interfaces, functional output interfaces, functional output parameter, functional fault criteria, function effect, functional fault; relation legend: P: a portion, K: a kind, A: an attribute)

(2) In the physical domain of product design, the function of the product must be realized using physical hardware design meta. Therefore, the description of the fault issue is converted from the “invisible” function to the “visible” structure, the identification of the fault problem is extended from the logical level to the hardware structure level, and the judgment of the fault problem is extended from functional conformity to parameter compliance. As the design deepens, the functional design of the product is basically determined, and the process and tooling design are subsequently carried out. A more in-depth understanding of fault problems can be obtained by analyzing the mechanisms at work in the structure, materials, and processes of the product meta under the various load conditions. At this stage, faults are mainly cognized in terms of the fault mechanism, fault site and fault parameters. The fault ontology established in the physical domain is shown in Fig. 2.9.

2.4 Process Control Mechanism of MBRSE

The design process of complex systems follows a bottom-up rule of iterative search for convergence, a rule that has not been superseded in current engineering practice. Based on the control theory of system science, this book uses the process control mechanism of the integrated design to “precisely” control the entire process, to reduce the randomness in the search and solution process, and thus to reduce the number of iterations of the “design-feedback-redesign” cycle. Considering the inheritance of the design meta, it is not necessary to carry out a complete design process for each design meta. This book therefore divides the meta-process into a meta-process for brand-new product design and a meta-process for inherited product design.

Fig. 2.9 The physical domain fault ontology and its mapping (global fault ontology OFg and functional domain fault ontology OFf mapped to the physical domain fault ontology OFh; relation legend: P: a portion; K: a kind; A: an attribute)

2.4.1 Meta-process for Brand-New Product Design Meta

The new product meta has no prototype product to inherit, and its design meta-process starts from the requirement model. In the process of product design, on the one hand, the design parameter x needs to be dynamically adjusted according to the design changes of the system; on the other hand, uncertain factors may change and propagate. Therefore, the product meta continually passes through a cycle of “steady state-instability-steady state”. The overall trend is that the product meta model is continuously refined, transforming from the initial functional model to the physical model, and then from the initial simple physical model to the detailed and precise physical model. As the methods and models based on the product meta-model evolve, the evaluation of the hexability design characteristics of the product also becomes more accurate, continuously providing direction for the product design to approach the design goals, as shown in Fig. 2.10.

Meanwhile, in the process of model evolution, it is necessary to establish the relevant design analysis models according to each kind of requirement, such as establishing reliability models of design elements based on reliability requirements (either statistical models or physics-of-failure models), so as to build models for different views of the same product. The goal of model transformation is to continuously approach manufacturable products that meet the various design requirements. The principle and basis of the model transformation are the engineering analysis models of these views.

With continuous refinement, the complexity of the model also increases, and it must then be reduced through decomposition techniques. The decomposition is carried out in two dimensions. One is the decomposition of work content, which decomposes the problems of complex systems into problems of their constituent units, and decomposes complex comprehensive problems into problems of single disciplines. The other is the decomposition of workload, which decomposes work that is too difficult for one person into work completed by multiple persons in parallel.

Fig. 2.10 Meta-process for the new product meta design

2.4.2 Meta-process for Inherited Product Design Meta

In general product design, most design meta are completed by inheriting existing design meta, and the re-design process is conducted through the similarity inheritance principle. The engineering process starts from the demand model; searches the design meta knowledge base for design meta that may match the design requirements and loads; inherits design information according to the similarity with the source design meta; and generates an instance of the target design meta model. If no inheritable design meta-template can be found, the design process for a new product design meta-template is started; otherwise, the design process for the inherited product design meta-template is executed.

In the design process for the inherited product meta, the product meta model library is established by accumulating, summarizing and extracting historical experience. At the same time, the design method models related to the product meta are also accumulated into a library, which is connected to the product meta, as shown in Fig. 2.11. In the refinement process of the design meta-model, it is necessary to continuously search the product meta-model by re-using the product meta-model and the associated method model. Taking advantage of model reuse can reduce the number of model iterations in the process, and therefore shorten the development process.

Fig. 2.11 Meta-process for inherited product design meta (symbols: Rs0: initial requirements model; Rsi: changing design requirements; Rsp: finalized design requirements; Ls0: initial use condition/load model; Lsi: constantly accurate load models; ci0: initial product meta; cii: constantly refining product meta; cip: product meta that meets the final requirements)
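The retrieve-then-inherit decision described in this section can be sketched in a few lines. This is only an illustration under stated assumptions: the `DesignMeta` record, the Jaccard similarity over requirement and load tags, and the 0.5 threshold are hypothetical choices made for the example and are not taken from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignMeta:
    """Hypothetical knowledge-base record: a design meta tagged by requirements and loads."""
    name: str
    requirements: frozenset
    loads: frozenset

    def tags(self):
        # Prefix the tags so that a requirement and a load with the same label stay distinct.
        return {("req", r) for r in self.requirements} | {("load", l) for l in self.loads}


def similarity(a, b):
    """Jaccard similarity of the requirement/load tags (an assumed metric)."""
    union = a.tags() | b.tags()
    return len(a.tags() & b.tags()) / len(union) if union else 0.0


def select_meta_process(target, knowledge_base, threshold=0.5):
    """Return ("inherited", source) if a similar enough design meta exists,
    otherwise ("brand-new", None)."""
    best = max(knowledge_base, key=lambda m: similarity(target, m), default=None)
    if best is None or similarity(target, best) < threshold:
        return ("brand-new", None)
    return ("inherited", best)


knowledge_base = [
    DesignMeta("pump_v1", frozenset({"flow", "pressure"}), frozenset({"vibration"})),
    DesignMeta("valve_v2", frozenset({"sealing"}), frozenset({"temperature"})),
]
target = DesignMeta("pump_v2", frozenset({"flow", "pressure"}),
                    frozenset({"vibration", "temperature"}))
decision, source = select_meta_process(target, knowledge_base)
```

Here `pump_v2` shares three of its four tags with `pump_v1`, so the inherited meta-process is selected; with an empty knowledge base the function falls back to the brand-new meta-process.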

2.4.3 Meta-process for Product Structural Design

A system is composed of multiple structures, each of which is in turn composed of multiple product meta. However, the system process is not a simple superposition of the meta-processes; it requires an integration mechanism to carry the process from quantitative change to qualitative change. In the system integration process, new process steps need to be added, the intercorrelation between the different product meta must be recognized and handled, and so must the new failure modes that emerge in the process. The intercorrelation between faults can be summarized as the coupling of functional faults and the coupling of physical faults, as shown in Fig. 2.12.

The coupling of functional faults reveals the influence of fault outputs on the related functional units. For example, a failure mode $f_{S_i}^{A}$ of product meta A will generate a fault output vector $V_{oS_i}^{A}$, which affects product meta B and C through the functional interfaces $I_{A,B}^{0}$ and $I_{A,C}^{0}$, respectively. The coupling of physical faults is defined as the influence on other product meta of the materials (M), information (I) and energy (E) of hazard materials produced by the faulty product meta and transmitted through “radiation/conduction” in physical space. For example, a failure mode $f_{S_i}^{A}$ of product meta A will generate M, I and E of a hazard material, which will be coupled to product meta B and C through the spatial paths $I_{A,B}^{-1}$ and $I_{A,C}^{-1}$, respectively.
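As a rough illustration of the two coupling types, the sketch below records functional interfaces and spatial paths as labelled edges and lists the product meta that a fault in one meta can reach. The coupling map, the meta names and the choice to follow couplings transitively are assumptions made for the example; the text itself only describes the direct influence of A on B and C.

```python
from collections import deque

# Hypothetical coupling map: source meta -> list of (target meta, coupling kind).
# "functional" stands for a functional interface, "physical" for a spatial path.
COUPLINGS = {
    "A": [("B", "functional"), ("C", "functional"), ("C", "physical")],
    "B": [("D", "functional")],
    "C": [],
    "D": [],
}


def affected_metas(source, couplings, kinds=("functional", "physical")):
    """Return every product meta reachable from a faulty source meta
    through couplings of the given kinds (breadth-first search)."""
    seen, queue = set(), deque([source])
    while queue:
        node = queue.popleft()
        for target, kind in couplings.get(node, []):
            if kind in kinds and target != source and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


all_affected = affected_metas("A", COUPLINGS)
physically_affected = affected_metas("A", COUPLINGS, kinds=("physical",))
```

Restricting `kinds` makes it possible to look at the functional and the physical coupling separately before combining them.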


Fig. 2.12 Coupling relationship of faults

Due to the coupling of faults, the effect of product meta fault mitigation on the system faults is nonlinear. It is assumed that the mitigation approaches for system failures are all reasonable and effective, and that, on this basis, the mitigation approach for each failure mode is known. The adoption of new technologies or system integration may result in new failure modes, and may also affect the occurrence probability of other failure modes, in the following scenarios:

(1) Elimination of a failure mode causes new failure modes. The failure mode $M_1$ is eliminated, but a new failure mode set $\{f_1^{eN}, f_2^{eN}, \ldots, f_{t_e}^{eN}\}$ is introduced, and the corresponding failure mode occurrence probabilities are recorded as $\{\beta_1^{eN}, \beta_2^{eN}, \ldots, \beta_{t_e}^{eN}\}$.
(2) Elimination of a failure mode causes the elimination of other failure modes. The failure mode $M_1$ is eliminated, and the failure mode set $\{f_1^{ee}, f_2^{ee}, \ldots, f_E^{ee}\}$ is also eliminated.
(3) Elimination of a failure mode reduces the occurrence probabilities of others. The failure mode $M_1$ is eliminated, and the occurrence probabilities of the failure mode set $\{f_1^{ed}, f_2^{ed}, \ldots, f_{D_e}^{ed}\}$ are also reduced, from the original $\{\beta_1^{ed}, \beta_2^{ed}, \ldots, \beta_{D_e}^{ed}\}$ to $\{\bar{\beta}_1^{ed}, \bar{\beta}_2^{ed}, \ldots, \bar{\beta}_{D_e}^{ed}\}$.
(4) Reduction of the occurrence probability of a failure mode introduces new failure modes. The occurrence probability of the failure mode $M_1$ is reduced, but a new failure mode set $\{f_1^{dN}, f_2^{dN}, \ldots, f_{t_d}^{dN}\}$ is introduced.
(5) Reduction of the occurrence probability of a failure mode reduces the occurrence probabilities of other failure modes. The occurrence probability of the failure mode $M_1$ is reduced, and the occurrence probabilities of the failure mode set $\{f_1^{dd}, f_2^{dd}, \ldots, f_{D_d}^{dd}\}$ are also reduced, from the original $\{\beta_1^{dd}, \beta_2^{dd}, \ldots, \beta_{D_d}^{dd}\}$ to $\{\bar{\beta}_1^{dd}, \bar{\beta}_2^{dd}, \ldots, \bar{\beta}_{D_d}^{dd}\}$.
(6) System integration introduces new failure modes. The interface failure modes and the new failure modes with high severity levels that emerge in the system integration process are mainly considered. They are denoted as $\{f_1^{IN}, f_2^{IN}, \ldots, f_{t_I}^{IN}\}$, and the corresponding failure mode occurrence probabilities are denoted as $\{\beta_1^{IN}, \beta_2^{IN}, \ldots, \beta_{t_I}^{IN}\}$.

Based on the above influence relationships of fault mitigation, the integrated process of a product at the system level is shown in Fig. 2.13. Firstly, the interface relationships among the i product meta are sorted out, and the interface failure modes are analyzed to obtain the interface failure mode models. Then, through an analysis of the load conditions over the life cycle of the system, together with the structure of the system-level product, the layout of the product meta, and the internal loads in the system, the local loads of each product meta can be obtained. According to these local loads, the possible new failure modes of the product meta, and the functional impacts of the new failure modes on the product meta and the system, are further analyzed. Next, by comprehensively analyzing the functional failure modes and physical failure modes of each product meta in the system, both the functional failure coupled model and the physical failure coupled model can be established. Furthermore, combined with the interface failure mode models, the set of new failure modes emerging from system integration can be obtained. This failure mode set, together with the system failure modes induced by product meta faults, composes the complete set of system failure modes. The mitigation analysis of the system failure modes should provide a reasonable mitigation order according to the relationships among the failure modes, to avoid the repeated occurrence of failure modes or the introduction of new failure modes.
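The bookkeeping behind scenarios (1) to (5) can be made concrete with a small sketch. It keeps the failure modes of a product meta as a mapping from mode name to occurrence probability and applies one mitigation action together with its side effects. The function name, its parameters and the numerical values are hypothetical and serve only to illustrate the scenarios.

```python
def apply_mitigation(modes, target, *, new_probability=None,
                     introduced=None, also_eliminated=(), reduced=None):
    """Return a new {failure mode: occurrence probability} mapping after mitigating `target`.

    new_probability=None eliminates the target mode (scenarios 1 to 3);
    a number reduces its occurrence probability instead (scenarios 4 and 5).
    introduced      : {mode: probability} of newly introduced modes (scenarios 1 and 4)
    also_eliminated : other modes removed as a side effect (scenario 2)
    reduced         : {mode: lower probability} for other modes (scenarios 3 and 5)
    """
    result = dict(modes)  # leave the caller's mapping untouched
    if new_probability is None:
        result.pop(target, None)
    else:
        result[target] = new_probability
    for mode in also_eliminated:
        result.pop(mode, None)
    for mode, probability in (reduced or {}).items():
        if mode in result:
            result[mode] = min(result[mode], probability)
    result.update(introduced or {})
    return result


modes = {"M1": 0.02, "f_ee_1": 0.005, "f_ed_1": 0.01}
after = apply_mitigation(
    modes, "M1",
    introduced={"f_eN_1": 0.001},   # scenario (1)
    also_eliminated=["f_ee_1"],     # scenario (2)
    reduced={"f_ed_1": 0.004},      # scenario (3)
)
```

Because each action can add, remove or rescale other modes, the result depends on the order in which actions are applied, which is one way to see why the text asks for a reasonable mitigation order.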

Fig. 2.13 Integration of product meta-fault rules into product structure

2.4.4 Hexability Design Goal Control Method

In order to reduce the development period and cost, it is necessary to carry out the hexability design proactively in the early design stage, so as to satisfy both the hexability requirements and the special characteristic design requirements simultaneously. The process of achieving the product design goal has the same characteristics and rules as other control systems. It is composed of four basic elements: the manipulation mechanism, the controlled target, the sensitive element and sensing channel, and the feedback channel. Through information transformation and feedback, the system can use the controller to make the controlled target run according to a predetermined program. If the hexability design process of the product is regarded as a system, the hexability requirements are the goal of the system, and a controller can then be constructed to describe this process.

2.4.4.1 Hexability Design Target Close-Loop Control Method Driven by Evaluation

The hexability characteristics of a product-level unit are generally proposed from the perspective of fulfilling the hexability characteristics of the system. These product characteristics are often given as probabilities, which are difficult to use directly in design. Therefore, traditional hexability design cannot be carried out according to quantitative requirements, but is carried out by applying design criteria and design standards. Its core is to find potential failure modes through continuous analysis, to take improvement measures to mitigate or reduce them, and meanwhile to pay close attention to maintenance support resources, safety assurance methods, and test design. After the hexability design work has been implemented, the achievement of the quantitative hexability requirements can be verified through index evaluation.

In order to achieve the expected hexability requirements, it is necessary to carry out hexability design analysis and evaluation repeatedly and iteratively. According to the given hexability design requirements, the initial product design plan is developed through the analysis and improvement of its weak links. The hexability level of the product is evaluated through the feedback of “hexability evaluation (analysis/simulation/test)” to obtain the evaluation result z. The evaluation result z is then compared with the given value x to obtain the deviation e. If the deviation e is positive (i.e. the evaluation result is greater than the given required value), the design plan of the product remains unchanged. Otherwise, a (re)improvement, which can be regarded as the “execution unit” of the control, is carried out on the weak links of the product. The re-improvement process gives the manipulation variable q, which refers to the parameters used to improve the product design, in order to change the design plan of the “controlled unit” and make the product deliver a new “output value” y (i.e. a new quantified level of hexability). Since the hexability design of the product is subject to the epistemic uncertainty of the designers, as well as to the interference of a number of aleatory uncertainties such as loads, material properties and geometric parameters, it is necessary to measure and feed back whether the output meets the “given value”. This is repeated iteratively until a “steady-state” design solution is formed, that is, a technical state of the product, as shown in Fig. 2.14.

It can be seen from the above control method that, in implementing the hexability requirements, the manipulation variable output by the execution unit is usually not derived accurately from the deviation, but is based more on experience. In order to bring the product hexability index to the required level, multiple design iterations are necessary, and it is difficult to control the number of design iterations effectively.

Fig. 2.14 Evaluation driven target close-loop control method of hexability design
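The evaluate, compare and improve cycle can be written as a short loop. The sketch below is a toy illustration, not the book's procedure: the design is reduced to a single reliability figure, `evaluate` and `improve` are hypothetical stand-ins for the evaluation (analysis/simulation/test) and the experience-based re-improvement, and a deviation of exactly zero is treated as acceptable, which is an assumption.

```python
def closed_loop_design(x, design, evaluate, improve, max_iterations=20):
    """Iterate until the evaluation result z reaches the given value x.

    Returns (design, z, number of improvement rounds that were needed).
    """
    for rounds in range(max_iterations):
        z = evaluate(design)           # feedback: hexability evaluation
        e = z - x                      # deviation from the given value
        if e >= 0:                     # requirement met: keep the design plan
            return design, z, rounds
        design = improve(design, e)    # execution unit: improve the weak links
    raise RuntimeError("requirement not met within the iteration budget")


def evaluate(design):
    # Toy evaluation: the design record already carries its reliability level.
    return design["reliability"]


def improve(design, deviation):
    # Toy improvement: each round halves the remaining unreliability.
    # The deviation is not used, mirroring an experience-based improvement.
    return {"reliability": 1.0 - (1.0 - design["reliability"]) * 0.5}


final_design, z, rounds = closed_loop_design(0.97, {"reliability": 0.90}, evaluate, improve)
```

In this toy run the reliability moves from 0.90 to about 0.95 and then to about 0.975, so two improvement rounds are needed. The iteration budget makes explicit that nothing in the loop itself bounds the number of rounds.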

2.4.4.2 Hexability Design Initiative Target Control Method of Taking Fault Mitigation as the Core

Open-loop Control Process of Fault Mitigation

Reliability engineering is a discipline that fights against faults. A simple idea for improving the level of hexability is to remove or suppress the faults in the product as far as possible. Based on this idea, this book links the control of faults with the achievement of the quantitative requirements, and uses initiative design methods to control the process of product fault mitigation, so as to achieve the hexability design index with as few design iterations as possible.


Fig. 2.15 Closed-loop fault mitigation process

Definition 2.10 Fault mitigation is a closed process that, according to the cause of a fault and the severity of its consequences, takes the corresponding design improvement measures, using compensation measures, predictive diagnosis and other methods, to completely eliminate the fault or to reduce its possibility of occurrence and its severity. The process begins with the analysis of the faults introduced in early product design and continues until the fault is effectively mitigated and the mitigation is verified, as shown in Fig. 2.15.

As can be seen from the above definition, initiative fault mitigation is used as the thread running through the design process to gradually improve the reliability, maintainability, supportability, testability and safety design, and finally to achieve reliability growth. This control process is completely initiative, taking the fault as the link between the general quality characteristics of the product, such as reliability, maintainability, supportability, testability, and safety. With fault mitigation as the core, the various tasks in the development stage can be organically linked to form a unified technical logic.

Fault mitigation is usually implemented in multiple stages. The mitigation process of a fault is accompanied by its cognitive process. For a product in different design domains, its faults are recognized gradually, with different methods and fault mitigation processes. However, the control methods and workflows for fault mitigation have common characteristics. The fault mitigation process begins with the analysis of faults introduced in the early phase of product design, so it is necessary to identify the possible failure modes and to make design improvements that mitigate these faults as far as possible. The design methods implemented can be validated to verify the mitigation of the failure modes. However, from the perspective of achieving quantitative requirements, it is impossible to determine effectively which failure modes need to be reduced, and therefore impossible to control the achievement of the quantitative requirements effectively. Therefore, the fault mitigation process is an open-loop control process, as shown in Fig. 2.16.

Fig. 2.16 Open-loop control process for fault mitigation

Mixed Control Model for Unit-Level Products

The meta-process of integrated design is the process of carrying out the integrated design of both functional performance and hexability for the product units. Assuming that the system consists of a set of design elements, the overall design requirements of the system can be decomposed onto the individual design elements. For each product unit $C_i$, it is assumed that all functional performance characteristics and hexability requirements x are clear, that all failure modes are cognizable, and that for each failure mode a corresponding structure or mechanism can be found to mitigate it. The integrated design of the product unit is then an open-closed-loop control process that combines initiative control of the failure modes with deviation-based adjustment of the design plan, as shown in Fig. 2.17.

In this process, the qualitative and quantitative hexability requirements x are the “given value” of the system, i.e. the expected output of the product unit, and the qualitative and quantitative hexability levels of the product are the “output value” y, i.e. the controlled variable of the system. The controlled unit is the product unit in the integrated design, and its initiative control process is as follows. First, the “sensor unit” dynamically monitors the design of the product unit and identifies all possible failure modes $\{f_{i1}, f_{i2}, \ldots, f_{in}\}$. A mitigation decision is then made according to the given value x and the various constraints (such as economic constraints and safety constraints). This yields the set of failure modes that need to be reduced and will be reduced, $\{f_{it_1}, f_{it_2}, \ldots, f_{it_m}\} \subseteq \{f_{i1}, f_{i2}, \ldots, f_{in}\}$, with $\{f_{it_1}, f_{it_2}, \ldots, f_{it_m}\} \neq \emptyset$. Next, specific mitigation methods for these failure modes are given. Finally, the failure mode mitigation methods must be implemented in the design of the product unit. It should be noted that, in order to achieve initiative control of the integrated design, it is necessary to establish a quantitative relationship model between the mitigation of the failure modes and the required value x. The passive control process of the product unit in the integrated design follows the evaluation-driven close-loop control method for the hexability design target proposed in Sect. 2.4.4.1.

The process described above controls the system behavior according to the input-output of the product units, achieving the self-organization of fault mitigation through initiative control and the self-stabilization of fault mitigation through passive control. The basic principles of this control process are applicable not only to product units, but also to product structures and up to systems, although the corresponding decision-making process will be more complicated.

Fig. 2.17 Mixed control model for the achievement of the hexability goal of the product unit
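The mitigation decision in the initiative branch needs a quantitative relationship between the mitigated failure modes and the given value x. The sketch below uses one deliberately simple assumption for that relationship: the failure modes are independent, so the unit reliability is the product of (1 - β) over all modes, and each mitigation removes a fixed fraction of a mode's occurrence probability at a fixed cost. The greedy ranking and all numerical values are likewise illustrative assumptions, not the book's model.

```python
from math import prod


def unit_reliability(betas):
    """Assumed relationship model: independent failure modes, R = prod(1 - beta_k)."""
    return prod(1.0 - beta for beta in betas.values())


def mitigation_decision(betas, x, reduction, cost, budget):
    """Greedily select failure modes to mitigate until R >= x or the budget is spent.

    betas     : {mode: occurrence probability}
    reduction : {mode: fraction of the probability removed by its mitigation}
    cost      : {mode: positive cost of the mitigation} (economic constraint)
    Returns (selected modes, predicted reliability after mitigation).
    """
    current = dict(betas)
    selected, spent = [], 0.0
    # Rank the modes by the probability removed per unit cost.
    ranked = sorted(betas, key=lambda m: betas[m] * reduction[m] / cost[m], reverse=True)
    for mode in ranked:
        if unit_reliability(current) >= x:
            break
        if spent + cost[mode] > budget:
            continue
        current[mode] = betas[mode] * (1.0 - reduction[mode])
        selected.append(mode)
        spent += cost[mode]
    return selected, unit_reliability(current)


betas = {"f_i1": 0.05, "f_i2": 0.02, "f_i3": 0.001}
reduction = {"f_i1": 0.9, "f_i2": 0.9, "f_i3": 0.9}
cost = {"f_i1": 1.0, "f_i2": 1.0, "f_i3": 1.0}
selected, predicted = mitigation_decision(betas, x=0.99, reduction=reduction,
                                          cost=cost, budget=2.0)
```

With these numbers the first two modes are selected and the third is left alone, which corresponds to choosing a non-empty subset of the identified failure modes. The predicted value would still have to be confirmed by the passive, evaluation-driven loop.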

2.5 MBRSE Design Evolution Mechanism

According to the process control mechanism of the MBRSE design, the MBRSE design realizes the continuous evolution (or refinement) of the product meta. Its basic process alternates between “what do we want to achieve” and “how do we meet the requirements”. The driving force of this process is the corresponding set of design evolution methods. In traditional hexability engineering activities, the hexability goals are achieved through the achievement of a series of local work goals, each of which is addressed by one or more work items. The overall goals are ultimately achieved by applying specific hexability techniques or management methods to integrate the local results. As listed in Table 2.1, a number of Chinese military standards (GJB), including GJB450A, GJB368B, GJB3872, GJB1371, GJB2547A, GJB900A and GJB4239, specify the hexability work items at each stage, 157 items in total. In addition, the US standard GEIA-STD-0009 provides 49 hexability-related work items.

Table 2.1 Hexability work items in Chinese military standards and American standards

No.        Standard                                                                                          No. of work items
1          GJB450A-2004 general requirement for materiel reliability program                                 32
2          GJB368B-2009 general requirement for the materiel maintainability program                         22
3          GJB3872-1999 general requirement for materiel integrated logistics support                        14
           GJB1371-1992 analysis for materiel supportability program                                         20
4          GJB2547A-2012 general requirement for the materiel testability program                            21
5          GJB900A-2012 general requirement for materiel safety program                                      28
6          GJB4239-2001 general requirement for materiel environmental engineering                           20
In total                                                                                                     157
7          GEIA-STD-0009 reliability program standard for systems design, development and manufacturing      49

In the design process of MBRSE, fundamental theories are needed to support the development of work items, so as to form a scientific and clear mainline and achieve the best cost-effectiveness ratio. In this book, based on the MBRSE process control mechanism and the principles of axiomatic design theory, a method system framework for integrated design is established to provide methods for the selection and integrated application of work items.

2.5.1 Design Evolution Method Set Based on Axiomatic Design Theory

The evolution process of the product meta can be described by the process objective and the engineering method vector. Given a product meta $C_i$ at the design level, the requirement set {RC} is determined by a specific design goal, and {RC} is achieved through multiple design implementation methods, forming an evolved product meta $C_j$. In the evolutionary process, a design implementation method matrix [D] is used to derive the design parameter set for {RC}, forming a {DP} vector. That is, the essence of the evolution process is to achieve the mapping of the requirement set {RC} to the parameter set {DP}:

$$C_i\{RC\} \xrightarrow{[D]} C_j\{DP\} \tag{2.8}$$

where [D] is the design implementation method matrix, i.e. the technical methods used to achieve RC. For a design with n RCs and m DPs, the design implementation method matrix is:

$$[D] = \begin{bmatrix} D_{11} & \cdots & D_{1m} \\ \vdots & \ddots & \vdots \\ D_{n1} & \cdots & D_{nm} \end{bmatrix} \tag{2.9}$$

Rewriting Eq. (2.9) in differential form as {dRC} = [D]{dDP}, each design implementation method element satisfies:

$$D_{ij} = \frac{\partial RC_i}{\partial DP_j}, \qquad RC_i = \sum_{j=1}^{m} D_{ij}\, DP_j \tag{2.10}$$
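A small numerical check can make the mapping and its differential form concrete. The sketch below multiplies an assumed 2 × 3 design implementation method matrix by an assumed design parameter vector and then recovers one matrix element as a finite-difference estimate of the partial derivative of RC_i with respect to DP_j. All values and function names are illustrative and not taken from the book; the summation runs over the m design parameters, matching the n × m matrix.

```python
def map_requirements(D, DP):
    """RC_i = sum over j of D[i][j] * DP[j], for an n x m matrix D and m design parameters."""
    if any(len(row) != len(DP) for row in D):
        raise ValueError("each row of D needs one entry per design parameter")
    return [sum(d_ij * dp_j for d_ij, dp_j in zip(row, DP)) for row in D]


def sensitivity(mapping, DP, i, j, h=1e-6):
    """Central-difference estimate of D_ij = dRC_i / dDP_j around the point DP."""
    up = list(DP)
    down = list(DP)
    up[j] += h
    down[j] -= h
    return (mapping(up)[i] - mapping(down)[i]) / (2.0 * h)


# Illustrative values: n = 2 requirements, m = 3 design parameters.
D = [[2.0, 0.0, 1.0],
     [0.0, 3.0, 0.5]]
DP = [1.0, 2.0, 4.0]

RC = map_requirements(D, DP)                                     # [6.0, 8.0]
d_12 = sensitivity(lambda dp: map_requirements(D, dp), DP, 1, 2)
```

For a linear mapping the estimate reproduces the matrix element (here D[1][2] = 0.5) up to rounding; for a nonlinear design relationship the same function would give the local value of the element at the current design point.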

For the evolved product meta $C_j$, it is necessary to design an evaluation method that evaluates how well the requirement set {RC} of the product meta $C_i$ is satisfied, according to the measurable parameters of $C_j$. In the evaluation process, the measurable parameter vector {TP} is first constructed according to the evaluation requirements of {RC}, and then the design evaluation method matrix [A] is developed to achieve the reverse mapping of the measurable parameters to the requirement set. That is:

$$C_j\{DP\} \xrightarrow{[A]} C_j\{TP\} \tag{2.11}$$

where [A] is the design evaluation method matrix used to evaluate the RC. For a design with n RCs and m TPs, the design evaluation method matrix has the following form:

$$[A] = \begin{bmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & \ddots & \vdots \\ A_{n1} & \cdots & A_{nm} \end{bmatrix} \tag{2.12}$$

Rewriting Eq. (2.12) in differential form as {dRC} = [A]{dTP}, each design evaluation method element satisfies:

$$A_{ij} = \frac{\partial RC_i}{\partial TP_j}, \qquad RC_i = \sum_{j=1}^{m} A_{ij}\, TP_j \tag{2.13}$$


Fig. 2.18 Relationship between product meta evolution and design methodology

After a design implementation method [D] and a design evaluation method [A] have been applied, it can be determined whether the design requirements are satisfied, and whether to proceed to the next design step or to revise the design plan in a further iteration. This process is shown in Fig. 2.18. Note that, for a complex system containing multiple product meta, achieving the RC sets of all the product meta does not mean that the system RC is achieved. In this case, a comprehensive evaluation of the system RC needs to be conducted using a system integration method.

2.5.2 Design Domain Extension for MBRSE

Different product types and different development stages have different design method matrices and evaluation method matrices. According to the principle of axiomatic design, as shown in Fig. 2.19, the integrated design of performance and hexability can be divided into three domains: the user domain, the functional domain and the physical domain. In different design domains, the states of the product meta are quite different. The user domain corresponds to the requirement product meta, the functional domain corresponds to the functional product meta, and the physical domain corresponds to the physical product meta. The product meta in these three domains are mainly classified based on the reductionism method, and the complex top-level product meta $X_0$ is decomposed into several sub-product meta that can be processed by engineering methods. Within a domain, different types of product meta have different evolution methods, and the corresponding method is applied according to the evolution requirements.

Fig. 2.19 Domain division of the meta-product

The requirement product meta $C_A$ is the product meta defined by requirement vectors. These requirement vectors are obtained by understanding the user’s requirements, which are processed and transformed into specific design requirements that designers can understand and design to. They include functional requirements and hexability requirements. The methods commonly used in this process include Quality Function Deployment (QFD) and the Analytic Hierarchy Process (AHP). The product meta in this domain must be designable. In other words, it must be possible to carry out the design using existing design methods; otherwise, the product meta needs to be re-defined. It should be noted that the solution space of the requirement product meta is not unique, but needs to be obtained according to previous design knowledge and the designer’s experience.

The functional product meta carries part of the requirements or sub-requirements, and the system requirements can be restored through the integration of multiple product meta. The realization of the functional product meta is an important step in the product design process. It can be classified based on historical experience and the designer’s experience, while ensuring that the product designer can control the achievement of independent requirements. The methods in the functional domain are mainly used to optimize the structure of the functional product meta, for example by analyzing its failure modes, estimating its fault probability, and establishing reliability models of the functional product.

The physical product meta is the carrier of product function realization. It maps the realization of a function to a specific physical process and realizes the product function through the interaction of one or more product meta. The evolution of the product meta in the physical domain should be based on clear physical principles. Most of these principles have been summarized into specific design criteria or specifications, which designers can apply directly. Nevertheless, some designs still need to be analyzed and tested on the basis of physical principles to generate the design plans.

The systematic nature and effectiveness of hexability design have always been an important issue in the design field. In this book, axiomatic design theory is introduced into the product design process to carry out hexability design, and a design domain extension method for integrated design is provided. This method gives a scientific basis for systematically planning the hexability methods and their application process on products, so that the hexability requirements can be practically designed into the product. This book focuses only on hexability design, and the performance design of the product meta will not be discussed.

2.5 MBRSE Design Evolution Mechanism


To consider the hexability design issues systematically, the user domain should be clarified first. The user’s requirements for the hexability of a product are often implicitly contained in its functional performance requirements, or are proposed with reference to the statistical data of historical products. This practice does not help to integrate hexability design with performance design. Therefore, it is necessary to make the hexability requirements explicit in the user domain and to establish the relationship between the hexability requirements and the functional performance requirements. As shown in Fig. 2.20, the hexability design requirements are explicitly mapped and expanded from the functional performance requirements. Second, the functional domain should be expanded, as shown in Fig. 2.21, into the function achievement domain and the function maintenance domain. The requirement items in the function achievement domain are the functional performance requirements of traditional methods, whereas the function maintenance domain contains the requirements for maintaining the various functions in the function achievement domain. By further expanding the function maintenance domain, the guarantee requirements can be obtained; these are, however, not a particular focus of this book. According to the definition of fault ontology, a fault can exist as an abnormal function

Fig. 2.20 The user domain with consideration of hexability design requirements

Fig. 2.21 The extension of the functional domain for integrated design


of the meta of the product, and a dual function is developed between the abnormal function and the specific function. For example, for the function $FP_2$, there is a function guarantee requirement $FPR_2$, which has an associated dual-function set $\{FPR_{21}, FPR_{22}, \cdots, FPR_{2n}\}$. Each dual function represents a failure mode, so that the function guarantee requirements are converted into failure mode mitigation requirements.

Example 2.1 A signal processor is mainly used for the acquisition and processing of aircraft engine bearing signals. Its main functional requirements are to convert 28 VDC into a 36 V 400 Hz single-phase AC output, and to calculate the conversion angle of the shaft angle. The functional requirements can therefore be decomposed into these two sub-function requirements. The decomposition structure is shown in Fig. 2.22. By applying a similar analysis, we can obtain the secondary decomposition structure, as shown in Fig. 2.23. In order to correctly convert and output 36 V 400 Hz single-phase alternating current, and to accurately calculate the conversion angle of the shaft angle, the functional requirements are expanded as follows.

Fig. 2.22 Decomposition structure of the first-level function of the signal processor

Fig. 2.23 Decomposition structure of the second-level function of the signal processor


(1) Keep the input and output current stable.
(2) Keep the shaft angle conversion signal stable.

The above functional requirements are further decomposed as follows.

(1) To keep the input and output current stable and free of failure, it is first necessary to ensure that current transmission and conversion are not affected by the ambient temperature. Second, it is necessary to shield against electromagnetic and other current interference, to withstand power supply peaks, and to protect the circuit when power fails. Therefore, the following functional requirements should be implemented:
➀ Prevent circuit performance from being affected by the environment, including temperature, dust, salt spray, etc.
➁ Avoid electromagnetic interference from the power signals.
➂ Avoid interference from the input current signal.
➃ Filter the output power supply to ensure the required output voltage.
➄ Avoid undervoltage, overvoltage, and surge.
➅ Avoid the risk of power failure.
(2) To maintain the stability of the shaft angle conversion signal and avoid faults related to signal conversion, it is necessary on the one hand to avoid errors due to the angle offset, and on the other hand to consider the possible influence of external interference signals. Therefore, the following functional requirements should be implemented:
➀ Filter the angle offset signals.
➁ Shield external interference signals.
➂ Prevent circuit performance from being affected by ambient temperature.

To sum up, the functional domain expansion decomposition structure of the signal processor can be obtained, as shown in Fig. 2.24. From the extended functional domain to the physical domain, functional failure modes can be naturally mapped to specific hardware fault parameters corresponding to specific physical failure modes. This mapping to the physical domain is shown in Fig. 2.25.
Therefore, the set of hexability design methods should progressively identify and deal with the failure modes in each design domain. This includes identifying the conditions under which failure modes occur, identifying and evaluating all possible failure modes, analyzing the failure mechanisms, and analyzing possible design plans, so as to eventually mitigate each exposed failure mode or reduce its occurrence probability. The hexability evaluation method set should be able to configure the test parameter set according to the extended design domain, judge the mitigation and control of failure modes by analyzing the test parameter set, and determine whether the requirements in the user domain have been achieved.
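The functional domain extension of Example 2.1 can be sketched as a small data structure. This is only an illustrative sketch: the class `FunctionRequirement` and the function `extend_functional_domain` are hypothetical names introduced here, and each mitigation string simply restates one of the requirements listed in the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FunctionRequirement:
    """Hypothetical container: one function with its maintenance-side extension."""
    name: str                 # requirement in the function achievement domain
    guarantee: str            # function guarantee requirement
    mitigations: List[str] = field(default_factory=list)  # one per dual function


def extend_functional_domain(functions: List[FunctionRequirement]) -> Dict[str, List[str]]:
    """Split the requirements into the two sub-domains of the extended functional domain."""
    return {
        "achievement": [f.name for f in functions],
        "maintenance": [m for f in functions for m in f.mitigations],
    }


signal_processor = [
    FunctionRequirement(
        name="Convert 28 VDC to 36 V 400 Hz single-phase AC output",
        guarantee="Keep the input and output current stable",
        mitigations=[
            "Prevent circuit performance from being affected by the environment",
            "Avoid electromagnetic interference from the power signals",
            "Avoid interference from the input current signal",
            "Filter the output power supply",
            "Avoid undervoltage, overvoltage, and surge",
            "Avoid the risk of power failure",
        ],
    ),
    FunctionRequirement(
        name="Calculate the conversion angle of the shaft angle",
        guarantee="Keep the shaft angle conversion signal stable",
        mitigations=[
            "Filter the angle offset signals",
            "Shield external interference signals",
            "Prevent circuit performance from being affected by ambient temperature",
        ],
    ),
]

domains = extend_functional_domain(signal_processor)
for domain, requirements in domains.items():
    print(domain, len(requirements))
```

Keeping the mitigation requirements attached to the function they protect preserves the traceability from each failure mode mitigation requirement back to its functional requirement.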


Fig. 2.24 Extension of the functional domain of the signal processor

Fig. 2.25 Mapping from the failure mode mitigation to design parameters

2.5.3 Mapping Principle of the MBRSE Design Domain

The implementation process of the integrated design method is essentially the evolution process of the product meta model, and this evolution process is controlled according to the control principle. In addition to the evolution of product meta within a given design domain, it is also necessary to consider the mapping and transformation of product meta between two adjacent design domains. The evolution process applies the “zigzag” mapping of axiomatic design. This process is carried out from top to bottom between two adjacent design domains, and it is an iterative process of “decomposition” and “integration”, as shown in Fig. 2.26. In the user domain, the hexability method is used to achieve the analysis, transformation, decomposition, and integration of hexability requirements. In the functional domain, the hexability method is used to realize the product functional structure for the hexability requirements, including the design, analysis, and evaluation of this


Fig. 2.26 Product meta-matrix mapping principle between different design domains

functional structure, and map the top-level requirements to the achievable specific functions. In the physical domain, the hexability method is used to map the functional structure that meets the hexability requirements to the specific design of the product structure.

Chapter 3

MBRSE Based Unified Model and Global Evolution Decision Method

Abstract According to the design evolution mechanism of MBRSE, this chapter first develops the MBRSE model evolution process by taking faults as the center and integrating functional realization and fault mitigation. Then, by combining this method with engineering design theory, it provides a unified modeling method for a series of fundamental product models, such as the requirement model, functional model, and physical model. Finally, a comprehensive decision-making method based on the evolutionary unified model is proposed for solving the integrated design problem between the functional performance and hexability of the product in its R&D process.

Keywords Unified model · Fault mitigation · Product modeling · Comprehensive decision-making · Evolutionary model

3.1 MBRSE Model Evolution Process Integrating Functional Realization and Fault Mitigation

Model-based system engineering (MBSE) is the process of establishing an engineering system model. Based on an MBSE model, MBRSE integrates fault, maintenance, test, support, etc., to further expand the MBSE model into a unified model. This unified model transforms reliability system engineering, which was based on natural language and relatively independent, into MBRSE, which is integrated into the development process with models as its main body. The evolution process of MBRSE from the initial unified requirement model to the unified system verification model is shown in Fig. 3.1. First, a unified requirement model including reliability and special quality characteristic requirements is established by analyzing user requirements. Then, a unified functional model of reliability and special quality characteristics is established through decomposition and mapping techniques. For further detailed design, the most basic unified “unit” physical model for achieving reliability and special characteristics is determined through decomposition and modelling techniques. Next, through the integration, verification

© National Defense Industry Press 2024 Y. Ren et al., Model-Based Reliability Systems Engineering, https://doi.org/10.1007/978-981-99-0275-0_3


Fig. 3.1 Conceptual model of the unified model evolution

and optimization of the subsystems and systems, a unified subsystem model and a unified system model that meet user needs are gradually obtained. The “unit” here refers to the relatively smallest part used to construct the system. During the design of a unit, its internal structure and relationships are not considered; only its external characteristics are considered. The “system” refers to a whole with certain functions, composed of “units” that restrict each other [20]. Reliability systems engineering (RSE) takes faults as the core, studies the law of fault occurrence over the whole life of complex systems, and establishes a fault prevention and control technology system that includes fault prevention, fault control, fault repair, and evaluation and verification [21]. This system should be integrated into the system engineering process, with continuous design decomposition and integration carried out repeatedly and iteratively. Traditional RSE design analysis methods are mainly based on table analysis and manual work; they are difficult to apply, complex in process, and difficult to manage and control. The unified model provides a new basic model for reliability systems engineering. Product design based on the unified model will change the empirical and fragmented mode of traditional reliability engineering and establish a new mode in which reliability and special characteristics are designed in a unified way. The process is shown in Fig. 3.2. This process converts the engineering elements of reliability systems into faults and models, which are treated as organic components of the unified model. As the unified model evolves, the fault model also evolves continuously.
Through closed-loop fault mitigation control and judgment optimization between the internal reliability characteristics and the special characteristics, the reliability design requirements of the product are gradually achieved.


Fig. 3.2 A new unified design mode of reliability and special characteristics

(1) In the requirements analysis stage, while the functional requirements are determined, the reliability design requirements must be obtained simultaneously through mapping. Driven by market demand, the designer carries out product planning to determine the main customer attributes (MCA) and the assistant customer attributes (ACA) of the product, and develops the initial model of the product, i.e. the unified requirement model. The MCA are basic needs that must be met, including main functional requirements such as power and load, and main reliability requirements such as time between faults, life, and safety. The ACA are nonessential requirements that make the product easier to use, easier to repair, etc. Such a model is usually a “black box” that can meet the requirements of various users, and it generally uses a requirement list to sketch the outline of the product. In this case, the reliability elements are represented by a reliability requirement list, which is an organic component of the overall requirement list. This book assumes that the list of reliability requirements already exists and does not expand on it.
(2) In functional design, it is necessary to abstract the unified requirement model and recognize the essence of the problem, convert the problem into functional design, establish the corresponding functional structure, seek the working mechanism of the product, and eventually develop the functional model of the product. In the meantime, the mapping relationship between the requirement model and functional model is established to evaluate


the technical and economic feasibilities of each plan, and finally determine the principle solution of the product. For the same requirement model, the solution of the functional design is not unique; the best design solution needs to be determined according to accumulated design experience and professional knowledge. Then, for each of the different product design plans, a comprehensive and unified product model, the so-called unified functional model, can be developed by taking the corresponding functional model as the main line, integrating the disciplines related to functions, performance, and reliability, and applying physical principle analysis. The designer can determine the proper design plan by comprehensively weighing the unified functional models of the different design plans. It should be noted that a fault can be regarded as an abnormal function of the product during this process. Dual functions are formed between the abnormal function and the specific function, and each dual function exhibits at least one kind of fault. All faults and their relationships constitute the functional fault model, which is part of the unified functional model.
(3) When designing the minimum physical unit, the main design plan and the alternative design plan shall each be selected based on their corresponding unified functional models. First, for the main function, the main functionalities and their design parameters, including size, location, material, space constraints, etc., need to be obtained through the evolution of professional models in disciplines such as control, pneumatics, strength, electronics, hydraulics, and software. Second, for the necessary auxiliary functions, the design parameter requirements of the design plan also need to be given, but it is first necessary to determine whether a functionality needs a special design or whether an existing design can meet the requirements.
Next, the achievement of the reliability requirements in the physical design must be considered. Reliability is the ability to maintain functions. Its physical essence and design parameters can be analyzed from the perspective of avoiding functional faults. There are two possible ways to achieve the reliability requirements of the physical entities: improving the design parameter requirements of existing functionalities, and adding functionalities. Finally, the unified functional model is used to establish the interface relationships between the functionalities and their input and output requirements, such as the material flow, signal flow, and energy flow, to develop a unified unit physical model. In addition, it is also necessary to establish the mapping relationship between the functional model and the physical model in order to judge the plan solution. If the design plan cannot meet the requirements, the functionality is defective in design. Designers need to combine the functional faults with the mapping relationship between functional models and physical models to find the faults in physical units comprehensively and systematically, and to optimize the physical design plans with the mitigation of key and important physical faults as the core. The physics-of-failure model, which is composed of all the physical faults obtained by the analysis, is an organic component of the unit physical model.


(4) The physical units need to be integrated together to achieve the functions given by the design. Such integration of units is not only dependent on their own configurations, but also related to their interfaces. The virtual physical entities composed by physical units for achieving specific functions are called subsystems or systems, where the corresponding models are called unified subsystem model and unified system model, respectively. In the integration of the system, it is necessary to verify the physical model and functional model of the system step by step through unit tests and simulation analysis from the bottom to the top to ensure the achievement of user requirements. In the verification process, it is necessary to comprehensively consider the factors that are not considered during the modelling and analysis of the physical units, including the relationship between physical units at different levels, and the fixed interfaces, energy and signal transfer interfaces, and environmental loads in between the physical units. These factors can cause the system to fail. The system fault model can be regarded as a subset of the unified system model.

3.2 Modelling Method for Fundamental Product Model

The requirement model is mainly determined according to the task of use (or market demand) of the product. In the requirements analysis stage, the requirements to be clarified include geometry, motion, force, energy, materials (physical and chemical properties of the input and output products, auxiliary materials, specified materials, etc.), signals, reliability, safety, manufacturing, assembly, transportation, maintenance, costs, etc., as well as the necessity of each requirement (requirements that must be met versus desired requirements), responsible persons, identifiers, versions, etc. These are mainly represented by lists or clauses. The functional model is mainly an abstraction and expansion of the requirement model and is related to the requirements of the functional units. It is developed by combining the effect mechanisms between the functional units. Meanwhile, it is necessary to maintain the many-to-many mapping relationship between the requirement models and functional models. The elements that need to be described in the functional model include the main functional unit, auxiliary functional unit, input (energy, material, signal), output (energy, material, signal), uncertain interference, undesired output, latent function fault, fault associations, roles structure (composition, decomposition), logic, interface, system boundary, index, etc. The physical model is mainly obtained according to the physical principle of the function realization, and has a many-to-many mapping relationship with the functional model.
The elements that need to be described in the physical model include the main physical unit, auxiliary physical unit, identification, installation location, input (energy, material, signal), output (energy, material, signal), uncertainty interference, unexpected output, fault (physical faults, system faults), fault associations, assembly relationships, logic, system boundary, physical interface, etc.


Based on a comprehensive analysis of the above characteristics, this book provides the main elements that should be included in the meta-model of the above types of models:
(1) The main unit (including the requirement unit that must be met, the main functional unit, and the main physical unit) and its input, output, uncertainty interference, undesired output, key indices or main functions, as shown in Fig. 3.3.
(2) Auxiliary units (including the desired requirement unit, auxiliary functional unit, and auxiliary physical unit) and their input, output, uncertainty interference, undesired outputs, key indices, or main functions, as shown in Fig. 3.4.

Fig. 3.3 Visualization of the main unit and its related information

Fig. 3.4 Visualization of the auxiliary units and related information


Fig. 3.5 Sequence model

(3) Logic (including sequence, parallel, repetition, selection, multiple output, iteration, loop, etc.).
➀ Sequence: The sequence model is an arrow from left to right, as shown in Fig. 3.5.
➁ Parallel (A): The parallel model is shown in Fig. 3.6.
➂ Repetition (RP): The function is executed repeatedly, which is regarded as a special case of the parallel model. The repetition model is shown in Fig. 3.7.

Fig. 3.6 Parallel model

Fig. 3.7 Repetition model


Fig. 3.8 Selection model

➃ Selection (OR): The selection model is shown in Fig. 3.8.
➄ Multiple outputs: If a function has multiple outputs, it is necessary to define the corresponding logic or rules to ensure only one output at a time. Therefore, the selection model (OR) must be included in the multiple outputs model. The multiple outputs model is shown in Fig. 3.9.
➅ Iteration (IT): The iteration model indicates that the functions and behaviors in a specified set are executed multiple times, with a given number or frequency, as shown in Fig. 3.10.
➆ Loop (LP): The loop model indicates that the functions between loop nodes are executed repeatedly until specified output conditions are met. It includes at least one selection (OR) node and one loop output node. The loop model is shown in Fig. 3.11.

Fig. 3.9 Multiple outputs model


Fig. 3.10 Iteration model

Fig. 3.11 Loop model

(4) System model: There is no essential difference between the system model and the unit model in terms of description. The system model is shown in Fig. 3.12.
(5) Structure and boundary of the system: Visualization of the structure and boundary of the system is shown in Fig. 3.13.
(6) Assembly relationship: The assembly relationship is mainly described using the CAD model.
(7) Interaction: The interaction model can be represented by the bond graph in the system dynamics modelling method, which will not be discussed in this book.

Fig. 3.12 System model


Fig. 3.13 Visualization of the system structure and boundary
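The meta-model elements listed above can be sketched as plain data types. The following is a minimal sketch under stated assumptions, not the book's notation: the names `Unit`, `LogicNode`, `Logic`, and `loop_is_well_formed` are hypothetical, the label `SEQ` is invented because the text gives no abbreviation for the sequence model, and only the loop rule (at least one selection node and one loop output node) is checked.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Union


class Logic(Enum):
    SEQUENCE = "SEQ"   # assumed label; the text gives no abbreviation
    PARALLEL = "A"
    REPETITION = "RP"
    SELECTION = "OR"
    ITERATION = "IT"
    LOOP = "LP"


@dataclass
class Unit:
    """A main or auxiliary unit with the attributes named in the meta-model."""
    name: str
    is_main: bool = True
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    interference: List[str] = field(default_factory=list)
    undesired_outputs: List[str] = field(default_factory=list)


@dataclass
class LogicNode:
    """A logic construct that groups units or nested logic constructs."""
    logic: Logic
    children: List[Union["Unit", "LogicNode"]] = field(default_factory=list)
    has_loop_output: bool = False  # only meaningful for LOOP nodes


def loop_is_well_formed(node: LogicNode) -> bool:
    """A loop needs at least one selection (OR) node and one loop output node."""
    if node.logic is not Logic.LOOP:
        return True
    has_selection = any(
        isinstance(child, LogicNode) and child.logic is Logic.SELECTION
        for child in node.children
    )
    return has_selection and node.has_loop_output


converter = Unit(
    name="Convert 28 VDC to 36 V 400 Hz single-phase AC",
    inputs=["28 VDC"],
    outputs=["36 V 400 Hz single-phase AC"],
    interference=["electromagnetic interference"],
)
check = LogicNode(Logic.SELECTION, [converter])
good_loop = LogicNode(Logic.LOOP, [check], has_loop_output=True)
bad_loop = LogicNode(Logic.LOOP, [converter])

print(loop_is_well_formed(good_loop), loop_is_well_formed(bad_loop))
```

Because the system model is described in the same way as the unit model, a system could be represented in this sketch as a `LogicNode` whose children are units or further nodes.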

Fig. 3.14 Functional model of the power supply module (Partial model)

Example 3.1 For the signal processor shown in Example 2.1 [22, 23], the functional model of the power supply module can be developed according to the method of constructing the functional principle model mentioned above, as shown in Fig. 3.14.

3.3 Evolutionary Decision-Making of the Unified Model

The comprehensive judgment between the overall functional performance and hexability of the product (herein referred to as the comprehensive judgment) is an optimization over multiple alternative solutions under the requirements of a specific product function and hexability, and it is a group decision-making process involving many people. The overall framework is shown in Fig. 3.15, which mainly includes the following four basic steps: ➀ Formalize the analysis and evaluation results of


Fig. 3.15 Comprehensive judgment framework

multiple design plans to generate alternative plans. ➁ Select the appropriate model for operation according to the type of data. ➂ Provide the sequence of the plans. The work in ➀–➂ is jointly completed by decision analysts and decision makers. ➃ The decision makers draw a conclusion of the comprehensive judgment by referring to the sequence generated by the previous step, such as choosing a certain plan, or re-revising and improving the plan and making a decision again.

3.3.1 Deterministic Model

Deterministic decision-making is the basis for all types of judgment analysis and is relatively simple. There are many deterministic decision-making methods, such as the Analytic Hierarchy Process (AHP), the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS), the cobweb diagram, and the method of taking the best from a bad set. Taking into account the characteristics of product development and the need for a complete judgment between functional performance and hexability, the more suitable methods are AHP and TOPSIS.

3.3.1.1 Analytic Hierarchy Process (AHP)

The Analytic Hierarchy Process (AHP) was proposed in the early 1970s by Saaty, an American operations researcher. Essentially, AHP is a formalization of human understanding of the hierarchical structure of complex problems. It has attracted a great deal of attention for its practicality, simplicity, and systematic nature, and has been rapidly applied to multi-attribute decision-making problems in various fields. The basic structure of its decision analysis is shown in Fig. 3.16. The key point of using AHP to obtain a comprehensive judgment is that it enables decision makers to use the attribute hierarchy to structure complex multi-attribute decision-making problems. Therefore, AHP is more robust for solving large and complex hierarchical problems. When applying AHP to analyze the comprehensive judgment between the functions and hexability, the problem should be organized and hierarchical, for

Fig. 3.16 The fundamental structure of AHP’s multi-attribute decision analysis


constructing a hierarchical structural model. Under this structural model, the complex judgment problem between functions and hexability is decomposed into several structural elements arranged in hierarchical layers. The elements in a given layer act as criteria that dominate certain elements in the lower layer, and at the same time they are dominated by the elements of the upper layer. These layers can be roughly divided into three categories.
(1) The top layer. There is only one element in this layer, which is the predetermined goal or desired result of the analysis problem, so it is also called the goal layer.
(2) The middle layer. This layer includes the intermediate links involved in achieving the goal. It can be composed of several sub-layers, including the criteria and sub-criteria that need to be considered, so it is also called the criteria layer.
(3) The bottom layer. This layer includes the various methods, decision-making plans, etc. for achieving the goal, so it is also called the alternative layer or the plan layer.
The dominance relationship between the above-mentioned layers is not necessarily complete. In other words, some elements may dominate only some of the elements in the next layer rather than all of them. The structure established by this top-down dominance relationship is called a hierarchical structure. A typical hierarchical structure is shown in Fig. 3.17. The number of layers in the hierarchical structure is related to the complexity of the problem and the level of detail of the requirements, and is generally not constrained. A reasonable and effective hierarchical structure is extremely important for solving problems, so the hierarchical structure must be established based on a comprehensive and in-depth understanding of the problems faced by researchers and decision makers.
If there is uncertainty about the division of the hierarchy or about the dominance relationships between its elements, it is best to reanalyze the problem and clarify the relationships between the elements to ensure a reasonable hierarchy. In AHP, the qualitative judgment and quantitative analysis of decision makers can be combined by constructing a hierarchical structure and performing a ratio analysis. Since the whole process matches the way humans think when making decisions, the effectiveness and mobility of decision-making are greatly improved. However, in practical application, attention should be paid to issues such as scaling, algorithm selection, and testing. For instance, some complex decision-making problems involve so many interactive dependencies and feedbacks between different layers that the hierarchical structure cannot be constructed, and AHP then cannot work effectively. This is a limitation of the current AHP technique. Based on current research, the main problems of AHP are as follows.
(1) The assignment of each scale value in the judgment matrix is very arbitrary. Moreover, this assignment method is feasible for single-person decision-making, but may result in conflicts in multi-person decision-making.
(2) The assignment method of the judgment matrix needs to be considered, because of the reciprocal relationship of the weights at symmetric positions in the matrix.


Fig. 3.17 Typical hierarchical structure

(3) This “reciprocal” assignment of the positive reciprocal matrices will cause the phenomenon of “opinion amplification” in the subsequent calculations of standard weights and relative weights.

Decision-making with AHP can be roughly divided into the following four steps.

Step 1: Analyze the relationship between the elements in the product and establish the product’s hierarchical structure.
Let $H$ be a finite partially ordered set with a unique highest element $c$. For $x \in H$, let $x^-$ denote the set of elements directly dominated by $x$ and $x^+$ the set of elements that directly dominate $x$. If $H$ satisfies the following conditions,
(1) There is a partition $\{L_k\}$ $(k = 1, 2, \cdots, m)$ of $H$, where $L_1 = \{c\}$, and each block $L_k$ is called a layer.
(2) For each $x \in L_k$ $(1 \le k \le m - 1)$, $x^-$ is not empty and $x^- \subseteq L_{k+1}$.
(3) For each $x \in L_k$ $(2 \le k \le m)$, $x^+$ is not empty and $x^+ \subseteq L_{k-1}$.
then $H$ is called a hierarchy.
The hierarchical structure should have the following properties,
(1) Any element in $H$ must belong to one layer and only one layer, and the intersection of the element sets from different layers is an empty set.
(2) There is no dominance or subordination between any two elements in the same layer.
(3) Any element in $L_k$ $(2 \le k \le m)$ must be dominated by at least one element in $L_{k-1}$, and can only be dominated by the elements in $L_{k-1}$. Meanwhile, each element in $L_k$ $(1 \le k \le m - 1)$ dominates at least one element in $L_{k+1}$, and can only dominate the elements in $L_{k+1}$.


(4) There is no dominance relationship between any two elements in two nonadjacent layers.
The hierarchical structure and the tree structure are related but different. The tree structure is an incomplete hierarchical structure; namely, an element in the upper layer does not dominate all the elements in its neighboring lower layer. The hierarchical structure is not necessarily a tree structure. The connection lines in a tree structure do not cross, i.e. the sets of lower-layer elements dominated by two different elements are disjoint. However, the connection lines in a hierarchical structure generally intersect. Taking the aircraft selection problem as an example, its hierarchical structure is shown in Fig. 3.18.

Step 2: Calculate the importance value of each element in the same layer in terms of a criterion in the upper layer, compare the importance values of any two elements to construct a paired comparison matrix, and perform ranking and consistency tests.
After the hierarchical structure is established, the affiliation relationship of elements between the upper and lower layers is determined. Assume that the top-layer element $x_0$ is used as the criterion and that the elements it dominates in the next layer ($L_1$) are $x_1, x_2, \ldots, x_n$. The weights $\omega_1, \omega_2, \ldots, \omega_n$ of their relative importance with respect to the criterion $x_0$ can be obtained by paired comparison. Let $a_{ij}$ denote the importance of $x_i$ relative to $x_j$; these values form the judgment matrix $A = (a_{ij})_{n \times n}$. Obviously, the judgment matrix has the following properties. For all $i, j \in N$,

$$a_{ij} > 0, \quad a_{ji} = \frac{1}{a_{ij}}, \quad a_{ii} = 1$$

Here, the judgment matrix $A$ is called a positive reciprocal matrix. If, in addition, for all $i, j, k \in N$,

$$a_{ij} \times a_{jk} = a_{ik}$$

then $A$ is called the complete consistency matrix.

Fig. 3.18 Schematic diagram of an aircraft hierarchy


Step 3: Calculate the relative weights of the elements compared to the criterion from the judgment matrix.

(1) Normalize the judgment matrix $A = (a_{ij})_{n \times n}$ by column, and denote the normalized entries by $\bar{a}_{ij}$.

(2) Calculate the summation by row, as given by:

$$\bar{\omega}_i = \sum_{j=1}^{n} \bar{a}_{ij} \quad (i \in N) \tag{3.1}$$

(3) After renormalization, the weight coefficient is obtained by:

$$\omega_i = \bar{\omega}_i \Big/ \sum_{i=1}^{n} \bar{\omega}_i \quad (i \in N) \tag{3.2}$$

(4) Calculate the maximum characteristic root:

$$\lambda_{\max} = \sum_{i=1}^{n} \frac{(A\omega)_i}{n\,\omega_i} \tag{3.3}$$
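The calculation in Step 3 can be sketched in a few lines of Python. This is only an illustrative sketch under the assumption that the judgment matrix is given as a square list of lists of positive numbers; the function name `ahp_weights` is hypothetical and does not come from the book.

```python
def ahp_weights(a):
    """Approximate AHP weights and the maximum characteristic root.

    `a` is an n x n positive judgment matrix given as a list of lists
    (an assumption made for this sketch).
    """
    n = len(a)
    # (1) Normalize the judgment matrix by column.
    col_sums = [sum(a[i][j] for i in range(n)) for j in range(n)]
    normalized = [[a[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    # (2) Sum each row of the normalized matrix, Eq. (3.1).
    row_sums = [sum(row) for row in normalized]
    # (3) Renormalize the row sums to obtain the weights, Eq. (3.2).
    total = sum(row_sums)
    weights = [r / total for r in row_sums]
    # (4) Maximum characteristic root, Eq. (3.3).
    a_w = [sum(a[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(a_w[i] / (n * weights[i]) for i in range(n))
    return weights, lambda_max


# A completely consistent matrix built from known weights (a_ij = w_i / w_j)
# should reproduce those weights, with lambda_max equal to n.
known = [0.5, 0.3, 0.2]
matrix = [[wi / wj for wj in known] for wi in known]
weights, lambda_max = ahp_weights(matrix)
```

For a completely consistent matrix the recovered weights equal the generating weights and the maximum characteristic root equals the matrix order, which makes this a convenient sanity check before applying the procedure to real expert judgments.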

Step 4: Calculate the synthetic weights of the elements in each layer with respect to the product target, and perform a total hierarchical sequencing and consistency check.

The above steps only yield the weight vector of a set of elements with respect to one element in the upper layer. However, the final decision requires the relative weights of the elements in each layer in terms of the general criterion, so that a decision can be made over all alternatives. This requires a top-down synthesis of the single-layer element weights to obtain the synthetic weights of the elements in the lowest layer in terms of the highest layer.

Assume that the synthetic weight vector of the $n_{k-1}$ elements on the $(k-1)$th layer in terms of the general criterion has been obtained as $\omega^{(k-1)} = \left(\omega_1^{(k-1)}, \omega_2^{(k-1)}, \ldots, \omega_{n_{k-1}}^{(k-1)}\right)^T$. Let the single weight vector of the $n_k$ elements in the $k$th layer, with the $j$th element on the $(k-1)$th layer as the criterion, be $P^{j(k)} = \left(P_1^{j(k)}, P_2^{j(k)}, \ldots, P_{n_k}^{j(k)}\right)^T$, in which the weights of the non-dominated elements are taken as zero. Let $P^{(k)} = \left(P^{1(k)}, P^{2(k)}, \ldots, P^{n_{k-1}(k)}\right)_{n_k \times n_{k-1}}$ represent the weights of the $n_k$ elements in the $k$th layer in terms of each element on the $(k-1)$th layer. Then the synthetic weight vector $\omega^{(k)}$ of the $k$th layer elements in terms of the top-level general criterion is given by

$$\omega^{(k)} = \left(\omega_1^{(k)}, \omega_2^{(k)}, \cdots, \omega_{n_k}^{(k)}\right)^T = P^{(k)} \omega^{(k-1)} \tag{3.4}$$

or

$$\omega_i^{(k)} = \sum_{j=1}^{n_{k-1}} P_{ij}^{(k)} \omega_j^{(k-1)} \quad (i = 1, 2, \cdots, n_k) \tag{3.5}$$

where $P_{ij}^{(k)} = P_i^{j(k)}$. Then, a recursive formula can be obtained as

$$\omega^{(k)} = P^{(k)} P^{(k-1)} \cdots \omega^{(2)} \tag{3.6}$$

where $P^{(k-1)}$ is an $n_{k-1} \times n_{k-2}$ matrix composed of the weights of the elements in the $(k-1)$th layer in terms of the upper-layer elements, and $\omega^{(2)}$ is the single weight vector of the second-layer elements in terms of the general criterion. Then, the consistency check is performed layer by layer from top to bottom.
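As a small illustration of the recursion in Eq. (3.6), the sketch below propagates the second-layer weights down through successive weight matrices. The function name `synthesize_weights` and the numerical values are hypothetical and are chosen only to show the mechanics.

```python
def synthesize_weights(p_matrices, w2):
    """Top-down synthesis of layer weights, following Eqs. (3.4)-(3.6).

    `w2` is the weight vector of the second layer with respect to the
    general criterion. `p_matrices` lists P(3), P(4), ..., each given as
    n_k rows by n_(k-1) columns, with zeros for non-dominated elements.
    """
    w = list(w2)
    for p in p_matrices:
        # Eq. (3.5): each new weight is a row of P(k) times the upper weights.
        w = [sum(p_ij * w_j for p_ij, w_j in zip(row, w)) for row in p]
    return w


# Hypothetical example: two criteria and three alternatives.
criteria_weights = [0.6, 0.4]
p3 = [
    [0.5, 0.2],  # alternative 1 under criterion 1 and criterion 2
    [0.3, 0.3],  # alternative 2
    [0.2, 0.5],  # alternative 3
]
synthetic = synthesize_weights([p3], criteria_weights)
```

Because every column of each weight matrix sums to one in this example, the synthetic weights also sum to one, so they can be compared directly across alternatives.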

3.3.1.2 Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS)

The Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) is a multi-attribute decision-making method that takes a geometrical point of view: evaluating m solutions under n attributes is treated like placing m points in an n-dimensional space. It draws on the idea of the ideal solution and the negative ideal solution (called the positive ideal plan and the negative ideal plan in this book) in multi-objective decision-making problems. TOPSIS was proposed by Yoon and Hwang in 1981. It is based on the principle that the selected plan should have the smallest gap with the positive ideal plan and the largest gap with the negative ideal plan. It first normalizes the decision matrix of the multi-attribute decision-making problem, and then calculates the weighted distances between each plan and the positive and negative ideal points. The plan that is closest to the positive ideal point and farthest from the negative ideal point is the optimal plan.

The positive ideal plan is the most desirable plan, in which each attribute takes the best value found among all candidate plans. The negative ideal plan is the least desirable plan, in which each attribute takes the worst value found among all candidate plans. The plans are ranked by comparing their distances from the positive and negative ideal plans, so the optimal solution is the one closest to the positive ideal plan and farthest from the negative ideal plan. However, in decision analysis we often encounter the situation where a certain plan is closest to the positive ideal plan but is not the farthest from the negative ideal plan. As shown in Fig. 3.19, A1 is closest to the positive ideal plan A+, but A2 is farthest from the negative ideal plan A−. Therefore, an additional function is needed to merge these two indicators. Such a function is called the relative closeness function of the plan.

The closeness function is based on the idea of relative proportion. The ratio between the distance from a plan to the positive or negative ideal plan and the sum of these two distances is calculated to compare the pros and cons of that plan.


Fig. 3.19 An example of TOPSIS

TOPSIS is of great significance for the comprehensive judgment decision between functions and hexability. First of all, it has a complete geometric meaning and can accurately describe a plan with multi-dimensional function and hexability attributes in geometric space, which is in line with how people think about comprehensive evaluation. In addition, it has good extensibility: through a rigorous mathematical process it can be applied to decision-making problems with different information states, such as deterministic, random, and fuzzy, and therefore has relatively broad potential application. However, the complete decision-making process using TOPSIS involves relatively complicated calculations and mathematical logic that is harder to understand, which constrains its application to a certain extent. Moreover, when dealing with random decision-making problems, it can easily lose information and decision-making resources, which also constrains its application. Nevertheless, with the continuous development of computing technology and decision theory, the supporting role of TOPSIS in the comprehensive judgment between functions and hexability will become more and more evident.

The deterministic TOPSIS decision-making model for functions and hexability is described as follows. Let $X = \{x_1, x_2, \ldots, x_m\}$ represent the set of plans that participate in decision making based on function and hexability attributes, and let $A = \{a_1, a_2, \ldots, a_n\}$ represent the set of evaluation indicators that measure the function and hexability attributes of each decision-making plan. The evaluation value of the decision-making plan $x_i$ in terms of the evaluation index $a_j$ is represented by $r_{ij}$. Then, for a decision problem with $m$ decision-making plans and $n$ evaluation indices, the decision matrix can be expressed as follows.

$$R = \begin{array}{c|cccc} & a_1 & a_2 & \cdots & a_n \\ \hline x_1 & r_{11} & r_{12} & \cdots & r_{1n} \\ x_2 & r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ x_m & r_{m1} & r_{m2} & \cdots & r_{mn} \end{array}$$


Let $\omega = (\omega_1, \omega_2, \ldots, \omega_n)^T$ represent the weight set of the evaluation indices, which reflects the designer’s preference. The specific steps to use TOPSIS for decision making are as follows:

Step 1: Normalize the decision matrix $R = [r_{ij}]$ $(i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n)$, which is composed of the function and hexability attribute indices of each plan. The measurement parameters and attribute indices of the functions and hexability of the design plan are all deterministic indices, which can generally be divided into cost attribute indices and benefit attribute indices. The normalization method is given as follows:

(1) Benefit attribute indices, namely positive indices (the bigger the better).

$$b_{ij} = \frac{r_{ij}}{\sqrt{\sum_{i=1}^{m} (r_{ij})^2}} \quad (i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n) \tag{3.7}$$

(2) Cost attribute indices, namely reverse indices (the smaller the better).

$$b'_{ij} = \frac{1/r_{ij}}{\sqrt{\sum_{i=1}^{m} (1/r_{ij})^2}} \quad (i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n) \tag{3.8}$$

The normalized decision matrix is then obtained as follows:

$$B = [b_{ij}] \quad (i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n) \tag{3.9}$$

Step 2: Combine the normalized decision matrix $B = [b_{ij}]_{m \times n}$ and the weight set $\omega = (\omega_1, \omega_2, \ldots, \omega_n)^T$ of the function and hexability attribute indices of the plan to obtain the weighted decision matrix, which is

$$U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & & \vdots \\ u_{m1} & u_{m2} & \cdots & u_{mn} \end{bmatrix} = \begin{bmatrix} \omega_1 b_{11} & \omega_2 b_{12} & \cdots & \omega_n b_{1n} \\ \omega_1 b_{21} & \omega_2 b_{22} & \cdots & \omega_n b_{2n} \\ \vdots & \vdots & & \vdots \\ \omega_1 b_{m1} & \omega_2 b_{m2} & \cdots & \omega_n b_{mn} \end{bmatrix} \tag{3.10}$$

Step 3: Determine the positive ideal plan $X^+$ and the negative ideal plan $X^-$ for the comprehensive judgment between functions and hexability. Let

$$M_j^+ = \max_i \left[u_{ij}\right] \quad (j = 1, 2, \cdots, n) \tag{3.11}$$

$$M_j^- = \min_i \left[u_{ij}\right] \quad (j = 1, 2, \cdots, n) \tag{3.12}$$


Then the positive ideal plan is

$$X^+ = \left\{M_1^+, M_2^+, \cdots, M_n^+\right\} \tag{3.13}$$

and the negative ideal plan is

$$X^- = \left\{M_1^-, M_2^-, \cdots, M_n^-\right\} \tag{3.14}$$

Step 4: Calculate the distance from each decision-making plan $x_i$ to the positive ideal plan $X^+$ and the negative ideal plan $X^-$. Since Eqs. (3.15) and (3.17) take the square root of a sum of squared differences, the distance used here is the Euclidean distance.

The distance from each decision-making plan $x_i$ to the positive ideal plan $X^+$ is calculated by:

$$d_i^+ = d\left(x_i, X^+\right) = \sqrt{\sum_{j=1}^{n} \left(d_{ij}^+\right)^2} \tag{3.15}$$

$$d_{ij}^+ = d\left(u_{ij}, M_j^+\right) = \left(u_{ij} - M_j^+\right) \quad (i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n) \tag{3.16}$$

The distance from each decision-making plan $x_i$ to the negative ideal plan $X^-$ is calculated by:

$$d_i^- = d\left(x_i, X^-\right) = \sqrt{\sum_{j=1}^{n} \left(d_{ij}^-\right)^2} \tag{3.17}$$

$$d_{ij}^- = d\left(u_{ij}, M_j^-\right) = \left(u_{ij} - M_j^-\right) \quad (i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n) \tag{3.18}$$

Step 5: Calculate the relative closeness $d_i$ of each decision-making plan $x_i$, i.e. its distance to the negative ideal plan $X^-$ relative to the sum of its distances to both ideal plans:

$$d_i = \frac{d_i^-}{d_i^- + d_i^+} \quad (i = 1, 2, \cdots, m) \tag{3.19}$$

Step 6: Sort the decision-making plans in descending order of $d_i$. The decision-making plan with the largest $d_i$ can be considered the best plan.
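The six steps above can be sketched as a short Python function. This is a minimal sketch, assuming strictly positive evaluation values and at least two distinct plans; the function name `topsis`, the flag list `is_benefit`, and the sample values are hypothetical and are not taken from the book.

```python
import math


def topsis(r, weights, is_benefit):
    """Deterministic TOPSIS following Eqs. (3.7)-(3.19).

    r          : m x n decision matrix (list of lists, positive values)
    weights    : n attribute weights
    is_benefit : n booleans, True for benefit indices, False for cost indices
    """
    m, n = len(r), len(r[0])
    # Steps 1 and 2: vector normalization (cost indices use 1/r) and weighting.
    u = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [r[i][j] if is_benefit[j] else 1.0 / r[i][j] for i in range(m)]
        norm = math.sqrt(sum(v * v for v in col))
        for i in range(m):
            u[i][j] = weights[j] * col[i] / norm
    # Step 3: ideal plans. After the 1/r transform, larger is always better.
    best = [max(u[i][j] for i in range(m)) for j in range(n)]
    worst = [min(u[i][j] for i in range(m)) for j in range(n)]
    # Step 4: distances to the positive and negative ideal plans.
    d_plus = [math.sqrt(sum((u[i][j] - best[j]) ** 2 for j in range(n)))
              for i in range(m)]
    d_minus = [math.sqrt(sum((u[i][j] - worst[j]) ** 2 for j in range(n)))
               for i in range(m)]
    # Step 5: relative closeness, Eq. (3.19).
    closeness = [dm / (dm + dp) for dp, dm in zip(d_plus, d_minus)]
    # Step 6: plan indices sorted from the largest closeness to the smallest.
    ranking = sorted(range(m), key=lambda i: closeness[i], reverse=True)
    return closeness, ranking


# Hypothetical example: attribute 1 is a benefit index, attribute 2 a cost index.
plans = [[0.90, 100.0], [0.80, 120.0], [0.70, 150.0]]
closeness, ranking = topsis(plans, [0.6, 0.4], [True, False])
```

In this example the first plan is best on both attributes, so it coincides with the positive ideal plan and its relative closeness is 1, while the third plan coincides with the negative ideal plan and its relative closeness is 0.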

3.3.2 Stochastic Model

There is a large amount of random data in the evaluation results of the hexability data, and therefore a random judgment model can be given based on TOPSIS.


First, the alternative plans are established, and the random distance functions to the positive and negative ideal plans, as well as the random relative distance function between the alternative plans and the ideal plans, are obtained. Second, the relative advantage probability matrix of the alternative plans is calculated from the random relative distance functions. Finally, the alternative plans are prioritized, together with the decision risks, according to the probability matrix.

A multi-attribute decision-making problem that mixes deterministic values and confidence intervals requires comprehensive consideration. The reliability attribute indices are expressed by confidence interval estimates under a certain confidence level α. The performance attribute indices are expressed by estimated deterministic values.

The design plan space V refers to an m-dimensional space established by taking the performance attributes and the reliability attributes of the design plan as coordinate axes in the product development process. In this space, the position of each decision-making plan is determined by the attributes of that plan, so each plan can be described as a point in the plan space.

Let $U = \{u_1, u_2, \ldots, u_{m-1}, u_m^R\}$ be the set of attributes composed of the performance attribute indices $(u_i)$ and the reliability attribute index $(u_m^R)$, let $\omega = (\omega_1, \omega_2, \ldots, \omega_m)$ be the weight set of the attribute indices in the plan space, and let $X = \{x_1, x_2, \ldots, x_n\}$ be the set of alternative plans. The evaluation value of plan $x_i$ on the attribute index $u_j$ is represented by $r_{ij}$. Then the decision matrix $D$ can be expressed as follows:

$$D = \begin{array}{c|ccccc} & u_1 & u_2 & \cdots & u_{m-1} & u_m^R \\ \hline x_1 & r_{11} & r_{12} & \cdots & r_{1(m-1)} & \left[r_{1m}^L, r_{1m}^U\right] \\ x_2 & r_{21} & r_{22} & \cdots & r_{2(m-1)} & \left[r_{2m}^L, r_{2m}^U\right] \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ x_n & r_{n1} & r_{n2} & \cdots & r_{n(m-1)} & \left[r_{nm}^L, r_{nm}^U\right] \end{array} \tag{3.20}$$

where $\left[r_{im}^L, r_{im}^U\right]$ is the reliability confidence interval of the plan $x_i$ at the given confidence level.

The positive/negative ideal decision-making plan $x_0^+$ $(x_0^-)$ refers to the virtual plan composed of the relatively optimal (inferior) values of the attribute indices among the alternative plans in the decision-making plan set during the comprehensive decision-making process. Each of its attribute indices takes the best (worst) value over all decision-making plans.

In this book, the positive/negative ideal decision-making plan $x_0^+$ $(x_0^-)$ is determined by applying the upper/lower limit principle to the hexability attributes. The positive/negative ideal decision-making plan $x_0^+$ $(x_0^-)$ can then be described in the plan space V as a point, which makes further decision analysis more intuitive.


The specific steps of the modelling process are as follows:

Step 1: Normalize the indices of the original decision-making matrix $D$, convert them into values in the interval [0, 1], and obtain the normalized decision-making matrix $B$:

$$B = [b_{ij}] \quad (i = 1, 2, \cdots, n;\ j = 1, 2, \cdots, m) \tag{3.21}$$

(1) Normalization of benefit attribute indices:

$$b_{ij} = \frac{r_{ij}}{\sqrt{\sum_{i=1}^{n} (r_{ij})^2}} \quad (i = 1, 2, \cdots, n;\ j = 1, 2, \cdots, m-1) \tag{3.22}$$

(2) Normalization of cost attribute indices:

$$b_{ij} = \frac{1/r_{ij}}{\sqrt{\sum_{i=1}^{n} (1/r_{ij})^2}} \quad (i = 1, 2, \cdots, n;\ j = 1, 2, \cdots, m-1) \tag{3.23}$$

Step 2: Determine the weighted decision-making matrix $B'$:

$$B' = [\omega_j b_{ij}] \quad (i = 1, 2, \cdots, n;\ j = 1, 2, \cdots, m) \tag{3.24}$$

Step 3: Determine the positive and negative ideal decision-making plans $x_0^+$ and $x_0^-$ with consideration of both performance and reliability. Because reliability is obtained from confidence interval estimates, 1 and 0 can be used to describe its best and worst ideal values, respectively, in the plan space V. The positive and negative ideal decision-making plans $x_0^+$ and $x_0^-$ can then be expressed as:

$$x_0^+ = \left\{\left[\max_{1 \le i \le n} \omega_j b_{ij},\ j = 1, 2, \cdots, m-1\right], 1\right\} \tag{3.25}$$

$$x_0^- = \left\{\left[\min_{1 \le i \le n} \omega_j b_{ij},\ j = 1, 2, \cdots, m-1\right], 0\right\} \tag{3.26}$$

Step 4: Calculate the distance functions $L_i^+$ and $L_i^-$ from each alternative plan $x_i$ to the positive and negative ideal decision-making plans $x_0^+$ and $x_0^-$. For each alternative plan $x_i$, use the random attribute $y_i^R$ to represent the reliability index. Then:

$$L_i^+ = \sum_{j=1}^{m-1} \left(\omega_j b_{ij} - \max_{1 \le i \le n} \omega_j b_{ij}\right)^2 + \left(\omega_m y_i^R - 1\right)^2 \tag{3.27}$$

$$L_i^- = \sum_{j=1}^{m-1} \left(\omega_j b_{ij} - \min_{1 \le i \le n} \omega_j b_{ij}\right)^2 + \left(\omega_m y_i^R\right)^2 \tag{3.28}$$

For the convenience of calculation, the distances $L_i^+$ and $L_i^-$ are rewritten as:

$$L_i^+ = \left(\omega_m y_i^R - 1\right)^2 + C_i^+ \tag{3.29}$$

$$L_i^- = \left(\omega_m y_i^R\right)^2 + C_i^- \tag{3.30}$$

where $C_i^+$ and $C_i^-$ are constants:

$$C_i^+ = \sum_{j=1}^{m-1} \left(\omega_j b_{ij} - \max_{1 \le i \le n} \omega_j b_{ij}\right)^2 \tag{3.31}$$

$$C_i^- = \sum_{j=1}^{m-1} \left(\omega_j b_{ij} - \min_{1 \le i \le n} \omega_j b_{ij}\right)^2 \tag{3.32}$$

Step 5: Calculate the relative distance function $z_i$ from plan $x_i$ to the negative ideal decision-making plan $x_0^-$ in the plan space V, by:

$$z_i = \frac{L_i^-}{L_i^+ + L_i^-} = \frac{\left(\omega_m y_i^R\right)^2 + C_i^-}{\left(\omega_m y_i^R\right)^2 + C_i^- + \left(\omega_m y_i^R - 1\right)^2 + C_i^+} \tag{3.33}$$

Because $y_i^R \in [0, 1]$, by taking the reciprocal of $z_i$, we get

$$\theta_i = \frac{1}{z_i} = 1 + \frac{\left(\omega_m y_i^R - 1\right)^2 + C_i^+}{\left(\omega_m y_i^R\right)^2 + C_i^-} \tag{3.34}$$

Step 6: Use numerical integration to calculate the possibility $p_{ij}$ based on the paired comparison of $\theta_i$, and establish the magnitude possibility matrix $P$.

For any $y_i^R$ that follows a normal distribution and whose confidence interval estimate under a given confidence level α is $\left[r_{im}^L, r_{im}^U\right]$, we get $y_i^R \sim N\left(\mu_i, (\sigma_i)^2\right)$, where

$$\mu_i = \frac{r_{im}^L + r_{im}^U}{2}, \quad \sigma_i = \frac{r_{im}^U - r_{im}^L}{2 z_{\alpha/2}}.$$

Treating $y_i^R$ and $y_j^R$ $(i \ne j)$ as independent, the joint probability density function of any such pair can be expressed by:

$$p\left(y_i^R, y_j^R\right) = p_i\left(y_i^R\right) p_j\left(y_j^R\right) \tag{3.35}$$

Because

$$\theta_{ij} = \theta_i - \theta_j = \frac{\left(\omega_m y_i^R - 1\right)^2 + C_i^+}{\left(\omega_m y_i^R\right)^2 + C_i^-} - \frac{\left(\omega_m y_j^R - 1\right)^2 + C_j^+}{\left(\omega_m y_j^R\right)^2 + C_j^-} \tag{3.36}$$

the magnitude possibility of the comparison between $\theta_i$ and $\theta_j$ is calculated as:

$$p_{ij}\left(\theta_{ij} \le 0\right) = \iint_A p\left(y_i^R, y_j^R\right) \mathrm{d}y_i^R\, \mathrm{d}y_j^R \tag{3.37}$$

where $A = \left\{\left(y_i^R, y_j^R\right) : \theta_{ij}\left(y_i^R, y_j^R\right) \le 0\right\}$.

Here $p_{ij}$ represents the possibility of $\theta_i < \theta_j$, i.e. the possibility of $z_i > z_j$. Therefore the magnitude possibility of $z_i \le z_j$ is $1 - p_{ij}$. The possibility matrix $P = [p_{ij}]$ $(i \ne j)$ for the paired comparison of $z_i$ can then be established.

Step 7: Sort the plans according to the probability matrix $P$, in which $p_{ij}$ represents the possibility that plan $x_i$ is better than plan $x_j$ and thus quantifies the decision-making risk of ranking $x_i$ ahead of $x_j$.
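Step 6 can be illustrated with a simple numerical integration. The sketch below is only one possible implementation under stated assumptions: it uses a midpoint rule on a grid that covers each normal distribution within five standard deviations, renormalizes for the truncated tails, and requires a strictly positive $C^-$ so that the denominator of Eq. (3.34) never vanishes. The names `interval_to_normal`, `theta`, and `advantage_probability`, the plan tuples, and the numerical values are hypothetical.

```python
import math


def interval_to_normal(lower, upper, z):
    """Mean and standard deviation from a confidence interval, where z is
    the standard normal quantile z_(alpha/2) chosen by the analyst."""
    return (lower + upper) / 2.0, (upper - lower) / (2.0 * z)


def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def theta(y, w_m, c_plus, c_minus):
    """Reciprocal relative distance, Eq. (3.34)."""
    return 1.0 + ((w_m * y - 1.0) ** 2 + c_plus) / ((w_m * y) ** 2 + c_minus)


def advantage_probability(plan_i, plan_j, w_m, grid=200, span=5.0):
    """Approximate p_ij = P(theta_i <= theta_j), Eq. (3.37).

    Each plan is a tuple (mu, sigma, c_plus, c_minus). The double integral is
    replaced by a midpoint rule on a grid covering mu +/- span * sigma.
    """
    def axis(plan):
        mu, sigma, c_plus, c_minus = plan
        h = 2.0 * span * sigma / grid
        points = [mu - span * sigma + (k + 0.5) * h for k in range(grid)]
        # Pair each grid point's theta value with its probability mass.
        return [(theta(y, w_m, c_plus, c_minus), normal_pdf(y, mu, sigma) * h)
                for y in points]

    axis_i, axis_j = axis(plan_i), axis(plan_j)
    # Total mass on the grid, used to renormalize the truncated tails.
    mass = sum(p for _, p in axis_i) * sum(p for _, p in axis_j)
    hit = sum(pi * pj for ti, pi in axis_i for tj, pj in axis_j if ti <= tj)
    return hit / mass


# Hypothetical example: same performance constants, different reliability.
mu_i, sigma_i = interval_to_normal(0.80, 0.90, 1.96)
mu_j, sigma_j = interval_to_normal(0.75, 0.85, 1.96)
p_ij = advantage_probability((mu_i, sigma_i, 0.01, 0.02),
                             (mu_j, sigma_j, 0.01, 0.02), w_m=0.3)
```

When both plans share the same constants, $\theta$ decreases monotonically with the reliability value, so $p_{ij}$ reduces to the probability that plan $x_i$ has the higher reliability. This gives a closed-form check for the numerical result.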

3.3.3 Fuzzy Model

3.3.3.1 Grey Correlation Analysis Based Fuzzy Model

The technical idea of the fuzzy decision-making model for functions and hexability based on grey correlation analysis is as follows. The information on the hexability attributes is sometimes uncertain or difficult to describe with a strict mathematical model, and grey theory can be used to comprehensively consider the attributes of the functions and hexability for multi-attribute decision-making. First, a comparison table between the semantic information of each fuzzy attribute and the triangular fuzzy numbers is established, so that triangular fuzzy numbers can be used to describe the uncertain attributes. After that, virtual reference plans are established using the fuzzy operation rules, and the correlation coefficient between each alternative decision-making plan and the positive (negative) virtual reference plan composed of the best (worst) attribute indices is obtained. The grey correlation degree between each decision-making plan and the positive (negative) virtual reference plan is obtained from the correlation coefficients, and is then used to calculate the relative correlation degree between each decision-making plan and the positive virtual reference plan. These relative correlation degrees are then sorted by magnitude to make the decision.


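The detailed steps are given below.

Step 1: Normalize the indices of the original decision-making matrix and convert them into values in the interval [0, 1]. The normalization formulas are:

(1) Benefit attribute indices, i.e. positive indices (the bigger the better):

$$b_{ij} = \frac{r_{ij}}{\sqrt{\sum_{i=1}^{m} (r_{ij})^2}} \quad (i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n-3)$$

$$\begin{cases} b_{ij}^L = \dfrac{r_{ij}^L}{\sqrt{\sum_{i=1}^{m} \left(r_{ij}^U\right)^2}} \\[2ex] b_{ij}^M = \dfrac{r_{ij}^M}{\sqrt{\sum_{i=1}^{m} \left(r_{ij}^M\right)^2}} \\[2ex] b_{ij}^U = \dfrac{r_{ij}^U}{\sqrt{\sum_{i=1}^{m} \left(r_{ij}^L\right)^2}} \end{cases} \quad (i = 1, 2, \cdots, m;\ j = n-2, n-1, n) \tag{3.38}$$

(2) Cost attribute indices, i.e. reverse indices (the smaller the better):

$$b'_{ij} = \frac{1/r_{ij}}{\sqrt{\sum_{i=1}^{m} (1/r_{ij})^2}} \quad (i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n-3)$$

$$\begin{cases} b_{ij}^L = \dfrac{1/r_{ij}^U}{\sqrt{\sum_{i=1}^{m} \left(1/r_{ij}^L\right)^2}} \\[2ex] b_{ij}^M = \dfrac{1/r_{ij}^M}{\sqrt{\sum_{i=1}^{m} \left(1/r_{ij}^M\right)^2}} \\[2ex] b_{ij}^U = \dfrac{1/r_{ij}^L}{\sqrt{\sum_{i=1}^{m} \left(1/r_{ij}^U\right)^2}} \end{cases} \quad (i = 1, 2, \cdots, m;\ j = n-2, n-1, n) \tag{3.39}$$

Here the first $n-3$ attributes are deterministic values, and the last three attributes are triangular fuzzy numbers with lower, middle, and upper values denoted by the superscripts L, M, and U.

The normalized decision matrix is then

$$B = [b_{ij}] \quad (i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n) \tag{3.40}$$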

Step 2: Determine the positive and negative virtual reference plans x0+ and x0− with consideration of the functions and hexability.


$$x_0^+ = \left\{\left[\max_{1 \le i \le m} b_{ij},\ j = 1, 2, \cdots, n-3\right], \left[\max_{1 \le i \le m} b_{ij}^U,\ j = n-2, n-1, n\right]\right\} \tag{3.41}$$

$$x_0^- = \left\{\left[\min_{1 \le i \le m} b_{ij},\ j = 1, 2, \cdots, n-3\right], \left[\min_{1 \le i \le m} b_{ij}^U,\ j = n-2, n-1, n\right]\right\} \tag{3.42}$$

Step 3: Calculate the correlation coefficient between each alternative decision-making plan and the virtual reference plans.

(1) The correlation coefficient between each alternative decision-making plan and the positive virtual reference plan.

Let the positive virtual reference plan

$$x_0^+ = \left\{b_{+1}, b_{+2}, \ldots, b_{+(n-3)}, \left[b_{+(n-2)}^L, b_{+(n-2)}^M, b_{+(n-2)}^U\right], \left[b_{+(n-1)}^L, b_{+(n-1)}^M, b_{+(n-1)}^U\right], \left[b_{+n}^L, b_{+n}^M, b_{+n}^U\right]\right\}$$

be the reference to be compared against, and let

$$x_i = \left\{b_{i1}, b_{i2}, \ldots, b_{i(n-3)}, \left[b_{i(n-2)}^L, b_{i(n-2)}^M, b_{i(n-2)}^U\right], \left[b_{i(n-1)}^L, b_{i(n-1)}^M, b_{i(n-1)}^U\right], \left[b_{in}^L, b_{in}^M, b_{in}^U\right]\right\} \quad (i = 1, 2, \cdots, m)$$

be the set of alternative decision-making plans to be compared. Define

$$\Delta_{ij}^+ = \left|b_{+j} - b_{ij}\right| \quad (i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n-3) \tag{3.43}$$

$$\Delta_{ij}^+ = \left|\left[b_{+j}^L, b_{+j}^M, b_{+j}^U\right] - \left[b_{ij}^L, b_{ij}^M, b_{ij}^U\right]\right| \quad (i = 1, 2, \cdots, m;\ j = n-2, n-1, n) \tag{3.44}$$

Then the correlation coefficient can be calculated as:

$$\gamma_{ij}^+ = \frac{\min\limits_{1 \le i \le m} \min\limits_{1 \le j \le n} \Delta_{ij}^+ + \zeta \max\limits_{1 \le i \le m} \max\limits_{1 \le j \le n} \Delta_{ij}^+}{\Delta_{ij}^+ + \zeta \max\limits_{1 \le i \le m} \max\limits_{1 \le j \le n} \Delta_{ij}^+} \quad (i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n) \tag{3.45}$$

where ζ is the resolution coefficient, ζ ∈ [0, 1], and ζ is usually taken as 0.5.


(2) The correlation coefficient between each alternative decision-making plan and the negative virtual reference plan.

Let the negative virtual reference plan

$$x_0^- = \left\{b_{-1}, b_{-2}, \ldots, b_{-(n-3)}, \left[b_{-(n-2)}^L, b_{-(n-2)}^M, b_{-(n-2)}^U\right], \left[b_{-(n-1)}^L, b_{-(n-1)}^M, b_{-(n-1)}^U\right], \left[b_{-n}^L, b_{-n}^M, b_{-n}^U\right]\right\}$$

be the reference to be compared against, and let

$$x_i = \left\{b_{i1}, b_{i2}, \ldots, b_{i(n-3)}, \left[b_{i(n-2)}^L, b_{i(n-2)}^M, b_{i(n-2)}^U\right], \left[b_{i(n-1)}^L, b_{i(n-1)}^M, b_{i(n-1)}^U\right], \left[b_{in}^L, b_{in}^M, b_{in}^U\right]\right\} \quad (i = 1, 2, \cdots, m)$$

be the set of alternative decision-making plans to be compared. Define

$$\Delta_{ij}^- = \left|b_{-j} - b_{ij}\right| \quad (i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n-3) \tag{3.46}$$

$$\Delta_{ij}^- = \left|\left[b_{-j}^L, b_{-j}^M, b_{-j}^U\right] - \left[b_{ij}^L, b_{ij}^M, b_{ij}^U\right]\right| \quad (i = 1, 2, \cdots, m;\ j = n-2, n-1, n) \tag{3.47}$$

Then the correlation coefficient can be calculated as:

$$\gamma_{ij}^- = \frac{\min\limits_{1 \le i \le m} \min\limits_{1 \le j \le n} \Delta_{ij}^- + \zeta \max\limits_{1 \le i \le m} \max\limits_{1 \le j \le n} \Delta_{ij}^-}{\Delta_{ij}^- + \zeta \max\limits_{1 \le i \le m} \max\limits_{1 \le j \le n} \Delta_{ij}^-} \quad (i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n) \tag{3.48}$$

where ζ is the resolution coefficient, ζ ∈ [0, 1], and ζ is usually taken as 0.5.

Furthermore, the correlation coefficient matrices between each decision-making plan and the positive and negative virtual reference plans can be obtained as:

$$M^+ = \left[\gamma_{ij}^+\right], \quad M^- = \left[\gamma_{ij}^-\right] \quad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n) \tag{3.49}$$

Step 4: Calculate the grey correlation degree between each alternative decision-making plan and the virtual reference plans:

$$\delta^+ = \omega M^{+T}, \quad \delta^- = \omega M^{-T} \quad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n) \tag{3.50}$$

where $\omega = [\omega_j]$ $(j = 1, 2, \ldots, n)$, $\omega_j \in [0, 1]$, $\sum_{j=1}^{n} \omega_j = 1$, and ω is the attribute weight set obtained based on the designer’s preference and the expert’s opinion.


Step 5: Calculate the relative correlation degree between each alternative decision-making plan and the virtual reference plans from the grey correlation degrees $\delta_i^+$ and $\delta_i^-$ of Eq. (3.50):

$$\theta_i = \delta_i^+ \big/ \left(\delta_i^+ + \delta_i^-\right) \quad (i = 1, 2, \ldots, m) \tag{3.51}$$

Step 6: Rank the alternative decision-making plans according to the relative correlation degree $\theta_i$, to provide support for comprehensive decision-making.

The above model considers the fuzziness of the hexability attributes, handles the fuzzy decision-making problems that may occur in the integrated design process for functions and hexability, and provides a quantitative ranking of the plans, which makes the decision more convincing and credible.
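The grey correlation steps can be sketched for the deterministic attributes as follows. This is a simplified sketch: it assumes that every attribute is a crisp, already normalized value, so the triangular fuzzy differences of Eqs. (3.44) and (3.47) are not covered, and it assumes that the plans are not all identical (otherwise the largest difference is zero). The function name `grey_relative_degree` and the sample matrix are hypothetical.

```python
def grey_relative_degree(b, weights, zeta=0.5):
    """Relative grey correlation degree of each plan (crisp attributes only).

    b       : m x n normalized decision matrix, larger values are better
    weights : n attribute weights that sum to 1
    zeta    : resolution coefficient, usually 0.5
    """
    m, n = len(b), len(b[0])
    # Virtual reference plans built from the best and worst attribute values.
    positive = [max(b[i][j] for i in range(m)) for j in range(n)]
    negative = [min(b[i][j] for i in range(m)) for j in range(n)]

    def degrees(reference):
        # Absolute differences to the reference plan, as in Eq. (3.43)/(3.46).
        delta = [[abs(reference[j] - b[i][j]) for j in range(n)] for i in range(m)]
        d_min = min(min(row) for row in delta)
        d_max = max(max(row) for row in delta)
        # Correlation coefficients, Eq. (3.45)/(3.48), then the weighted sum
        # over attributes, Eq. (3.50).
        return [sum(weights[j] * (d_min + zeta * d_max) / (delta[i][j] + zeta * d_max)
                    for j in range(n))
                for i in range(m)]

    d_plus, d_minus = degrees(positive), degrees(negative)
    # Relative correlation degree, Eq. (3.51).
    return [dp / (dp + dm) for dp, dm in zip(d_plus, d_minus)]


# Hypothetical normalized matrix with three plans and two attributes.
relative = grey_relative_degree([[1.0, 1.0], [0.5, 0.5], [0.0, 0.0]], [0.5, 0.5])
```

In this example the first plan coincides with the positive reference plan and obtains the largest relative correlation degree, while the middle plan, which is equally far from both references, obtains 0.5.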

3.3.3.2 Fuzzy Comprehensive Evaluation Decision-Making Model

The fuzzy comprehensive evaluation decision-making model is established based on TOPSIS, with the fuzzy positive and negative ideal plans as references, and uses measures such as the Hamming distance to quantify the difference between a decision-making plan and the fuzzy ideal plans. The principle of decision-making is to obtain the smallest distance from the fuzzy positive ideal solution and the largest distance from the fuzzy negative ideal solution.

The decision-making process of the fuzzy comprehensive evaluation model is given as follows:

Step 1: Normalize the decision matrix. For deterministic attributes:

$$u_{ij} = \begin{cases} r_{ij} \big/ \max_i\left(r_{ij}\right) & \text{(benefit attribute indices)} \\[1ex] \dfrac{1/r_{ij}}{1/\min_i\left(r_{ij}\right)} & \text{(cost attribute indices)} \end{cases} \tag{3.52}$$

For attributes described by triangular fuzzy numbers:

$$u_{ij} = \begin{cases} \left(\dfrac{r_{ij}^L}{\max_i\left(r_{ij}^L\right)}, \dfrac{r_{ij}^M}{\max_i\left(r_{ij}^M\right)}, \dfrac{r_{ij}^U}{\max_i\left(r_{ij}^U\right)}\right) & \text{(benefit attribute indices)} \\[2.5ex] \left(\dfrac{1/r_{ij}^L}{1/\min_i\left(r_{ij}^L\right)}, \dfrac{1/r_{ij}^M}{1/\min_i\left(r_{ij}^M\right)}, \dfrac{1/r_{ij}^U}{1/\min_i\left(r_{ij}^U\right)}\right) & \text{(cost attribute indices)} \end{cases} \tag{3.53}$$

Step 2: Construct a weighted normalization matrix:

$$B = \left[\omega_j \times u_{ij}\right] = \left[b_{ij}\right] \quad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n) \tag{3.54}$$

Step 3: Determine the positive and negative virtual reference plans $x_0^+$ and $x_0^-$ that consider functions and hexability:

$$x_0^+ = \left\{\left[\max_{1 \le i \le m} b_{ij},\ j = 1, 2, \cdots, n-3\right], \left[\max_{1 \le i \le m} b_{ij}^U,\ j = n-2, n-1, n\right]\right\}$$

$$x_0^- = \left\{\left[\min_{1 \le i \le m} b_{ij},\ j = 1, 2, \cdots, n-3\right], \left[\min_{1 \le i \le m} b_{ij}^L,\ j = n-2, n-1, n\right]\right\}$$


Step 4: Calculate the distances $S_i^+$ and $S_i^-$ of each decision-making plan to the positive and negative ideal decision-making plans:

$$S_i^+ = \sum_{j=1}^{n} z_{ij}^+ \quad (i = 1, 2, \cdots, m), \qquad S_i^- = \sum_{j=1}^{n} z_{ij}^- \quad (i = 1, 2, \cdots, m) \tag{3.55}$$

where, for the deterministic attributes,

$$z_{ij}^+ = \left|b_{ij} - b_{+j}\right|, \qquad z_{ij}^- = \left|b_{ij} - b_{-j}\right| \quad (i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, n-3) \tag{3.56}$$

and, for the fuzzy attributes,

$$z_{ij}^+ = 1 - \sup_x \left\{b_{ij}(x) \wedge b_{+j}(x)\right\}, \qquad z_{ij}^- = 1 - \sup_x \left\{b_{ij}(x) \wedge b_{-j}(x)\right\} \quad (i = 1, 2, \cdots, m;\ j = n-2, n-1, n) \tag{3.57}$$

Step 5: Calculate the relative closeness of each plan to the ideal plans:

$$C_i = S_i^- \big/ \left(S_i^+ + S_i^-\right) \quad (i = 1, 2, \cdots, m) \tag{3.58}$$

Thus, the decision-making plans can be sorted according to the magnitude of $C_i$.
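The fuzzy distance of Eq. (3.57) can be evaluated in closed form when the membership functions are triangular. The sketch below assumes that each fuzzy attribute is a triangular fuzzy number written as a tuple (l, m, u) with l ≤ m ≤ u, in which case the supremum of the pointwise minimum is the height at which the two triangles intersect. The names `sup_min`, `fuzzy_distance`, and `relative_closeness`, and the sample values, are hypothetical.

```python
def sup_min(a, b):
    """Height of the intersection of two triangular fuzzy numbers (l, m, u),
    i.e. the supremum over x of min(a(x), b(x))."""
    if a[1] > b[1]:
        a, b = b, a  # make `a` the number with the smaller modal value
    (_, m1, u1), (l2, m2, _) = a, b
    if m1 == m2:
        return 1.0  # the peaks coincide
    if u1 <= l2:
        return 0.0  # the supports do not overlap
    # Crossing of the right leg of `a` with the left leg of `b`.
    return (u1 - l2) / ((u1 - m1) + (m2 - l2))


def fuzzy_distance(a, b):
    """Distance between two fuzzy attribute values, Eq. (3.57)."""
    return 1.0 - sup_min(a, b)


def relative_closeness(crisp, fuzzy, crisp_pos, crisp_neg, fuzzy_pos, fuzzy_neg):
    """Relative closeness C_i of one plan, Eqs. (3.55)-(3.58)."""
    s_plus = (sum(abs(v - p) for v, p in zip(crisp, crisp_pos))
              + sum(fuzzy_distance(f, p) for f, p in zip(fuzzy, fuzzy_pos)))
    s_minus = (sum(abs(v - q) for v, q in zip(crisp, crisp_neg))
               + sum(fuzzy_distance(f, q) for f, q in zip(fuzzy, fuzzy_neg)))
    return s_minus / (s_plus + s_minus)


# Hypothetical plan with one crisp attribute and one fuzzy attribute.
c = relative_closeness(
    crisp=[0.3], fuzzy=[(0.0, 1.0, 2.0)],
    crisp_pos=[0.5], crisp_neg=[0.1],
    fuzzy_pos=[(0.0, 1.0, 2.0)], fuzzy_neg=[(3.0, 4.0, 5.0)],
)
```

The closed form avoids searching over x numerically, but it holds only for triangular membership functions; other membership shapes would need a numerical search for the supremum.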

3.3.4 Hybrid Model

If the alternative decision-making plans contain multiple types of data, such as deterministic, fuzzy, and random data, it is necessary to adopt the idea of hybrid simulation for a comprehensive judgment, as shown in Fig. 3.20. Various methods can be selected for building the deterministic decision-making model; see Sect. 3.3.1 for details. For mixed deterministic, stochastic, and fuzzy data, Monte Carlo simulation can be used to transform each single decision into a decision on deterministic data. Through multiple simulations, the ranking of the decision-making plans can be given together with its credibility.

Fig. 3.20 Flow chart of hybrid simulation

(1) Define the number of Monte Carlo simulations. Increasing the number of simulations improves the accuracy of the simulation results, but a large number of simulations increases the computational cost. The number of simulations N can be determined by experts with comprehensive consideration of the design requirements of the product attribute indices.

(2) Determine the attribute data type in a single simulation to realize the deterministic transformation of the attributes. Suppose there are m plans and n attributes, including i deterministic attributes, j-i random attributes, and n-j fuzzy attributes (0 < i < j < n).
➀ Directly read the data of the i deterministic attributes.
➁ For the j-i random attributes, sample the data according to their probability densities, using for instance the acceptance-rejection sampling method. First, determine the maximum value M of the probability density, and generate random values η1, η2 uniformly distributed on [0, 1] to judge whether Mη2 ≤ f[(b − a)η1 + a], where f(*) is the probability density, and a and b are the lower and upper bounds of the random variable. If the inequality holds, accept (b − a)η1 + a as the sampling value; otherwise, continue sampling.
➂ For the n-j fuzzy attributes, sample the data according to the degree of membership. Take a set of fuzzy values {a1, a2, …, an} as the discrete random sampling space, and directly sample the fuzzy values according to their membership degrees to obtain the sampling value.

By combining the data obtained from steps ➀, ➁ and ➂, the deterministic initial decision-making matrix X can be obtained.

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}$$

(3) Normalize the data. A linear scale transformation is used to normalize the initial decision-making matrix X. Suppose a certain attribute of a decision-making plan is $x_{ij}$, and the maximum value of this attribute over all decision-making plans is $x_j^{\max}$. If a larger value of this attribute is better, the normalized attribute is $r_{ij} = x_{ij} / x_j^{\max}$; if a smaller value is better, the normalized attribute is $r_{ij} = 1 - \left(x_{ij} / x_j^{\max}\right)$.

(4) Use the deterministic decision-making method to evaluate the individual decision-making plans, sort them, and record the decision-making results. Taking the attribute weights into account, the normalized attributes are weighted and summed to form an index P. The plans are then sorted by their P values, and the decision-making plan with the largest P value is identified.

(5) Check the number of simulations. Repeat steps (2) to (4), and after the completion of step (4), determine whether the number of simulations has reached N. If the number of simulations is less than N, return to step (2); otherwise, summarize the results of the multiple simulations and go to step (6).

(6) Count the rank of each decision-making plan and calculate the corresponding possibility. After N Monte Carlo simulations, count the number of times $N_k$ that a decision-making plan is placed in the kth position in the simulation results, and calculate the possibility degree $N_k / N$, denoted as $N_k'$.

(7) Sort and compare all decision-making plans. According to the statistical results of (6), the possibilities of the decision-making plans at position k are compared, and the decision-making plan with the highest possibility takes this position. If the possibilities of two decision-making plans are equal, continue by comparing their possibilities at the next position, and the decision-making plan with the higher possibility takes the position. With this comparison method, the final order of all decision-making plans can be obtained.

Note that the establishment of the hybrid model requires simulations. Therefore, the hybrid model cannot be developed manually but must be built on a simulation platform (e.g. China’s homegrown software or commercial software such as Matlab), for which the basic simulation logic is shown in Fig. 3.21.


Fig. 3.21 Simulation logic for hybrid judgment

Chapter 4

System Fault Identification and Control Method Based on Functional Model

Abstract In this chapter, the system fault identification and control method is developed based on the functional model for the integrated design of both the functional performance and the hexability of system-level products. First, the identification method for component functional faults in the total domain is established for function preservation. On this basis, and combined with the mapping relationship between the functional and physical models, the identification method for component physical faults in the total domain is established. Then, to address the problem of fault emergence during system integration, the identification methods for interface faults, transmission faults, and error propagation faults are presented. Finally, combined with the closed-loop fault mitigation control process, the closed-loop fault mitigation and control method and the fault mitigation decision method are developed for faults ranging from component faults to system faults.

Keywords Component fault · System emergent fault · Fault identification · Closed-loop fault mitigation · Fault mitigation decision

The fault model is essentially a view of the product model. At different stages of development, product models take different forms. Accordingly, the fault model also appears in different states, including the component functional fault model, the component physical fault model, and the system fault model. In this chapter, the connotation of the fault model is first defined as follows.

Definition 4.1 The Unified Model-based Fault Model (UMBFM) refers to a set of attributes that comprehensively reflect all the failure characteristics of the product at time t during the product design process. The attributes include faults, fault triggering conditions, fault effects, fault relationships, etc., and can be written as

$$F_{T=t} = \left\{ \left\{ (f_t, Con_t, Eff_t, \ldots)_1, (f_t, Con_t, Eff_t, \ldots)_2, \ldots \right\}, Cor_t \;\middle|\; T = t \right\}$$

where $f_t$ represents the fault, $Con_t$ represents the fault triggering condition, $Eff_t$ represents the fault effect, $(f_t, Con_t, Eff_t, \ldots)_i$ represents the set of attributes of fault i at time t, and $Cor_t$ represents the relationships among faults at time t. It is easy to see that the UMBFM is a subset of the unified model.

© National Defense Industry Press 2024 Y. Ren et al., Model-Based Reliability Systems Engineering, https://doi.org/10.1007/978-981-99-0275-0_4


Before carrying out an RMS design, the component functional faults, component physical faults and system faults should be systematically identified to ensure that the RMS requirements are achieved. Systematic identification methods for these three kinds of faults are given below.

4.1 Identification of Component Functional Faults in the Total Domain to Preserve Function

According to the definition in GJB451A, a fault is an event or state in which a product or a part of the product cannot or will not be able to perform its intended function [24]. Accordingly, the component functional fault can be defined as follows.

Definition 4.2 The component functional fault refers to the event or state that makes part or all of a single function unable to achieve the requirements in the functional domain.

In the product design process, in order to achieve and maintain all functional requirements, all possible faults in the functional domain must in theory be identified and mitigated. This requirement for identifying all possible faults is called functional fault identification in the total domain, which is defined below.

Definition 4.3 Component functional fault identification in the total domain refers to the process, within product design, that starts from all the items that continuously maintain the function achievement requirements in the functional domain and identifies as many associated component functional faults as possible.

It can be seen from Definitions 4.2 and 4.3 that if a functional fault occurs, the function can achieve only part or none of its requirements, and the ability to maintain that function has disappeared or declined. Therefore, a functional fault can be determined from the perspective of the disappearance or reduction of the ability to achieve and maintain functional requirements. The corresponding process is shown in Fig. 4.1. The process can be summarized mathematically as follows: assuming that product level i contains $n_i$ function achievement requirements $FRR_{ij}$ (j = 1, 2, …, $n_i$), each function achievement requirement has multiple states {normal, function loss, discontinuity, incompleteness, offset, …}.
When a function achievement requirement is in a state such as discontinuity, incompleteness or offset, the ability to maintain it is declining, indicating that a fault will occur. Similarly, when a function achievement requirement is in the state of “function loss”, the ability to maintain it has disappeared, which also indicates that a fault will occur. Therefore, based on the clues of the functional fault, we can get the failure modes that might be contained in each function achievement requirement. Furthermore, by considering influence factors such as use time, use environment and stress conditions, the possible causes of each functional fault can be analyzed. Combined with the functional model of the product, the analysis can then continue to the impacts of the fault on the function achievement requirement itself and on other related functions, the possible design improvement measures, and the severity categories.

Fig. 4.1 Functional fault identification process

Example 4.1 Using the above method, the faults of the signal processor given in Example 2.1 are analyzed, as listed in Table 4.1. Taking the secondary function “supply power” as an example, the functional fault analysis results show that the possible causes of no power and power degradation include power down, current undervoltage/overvoltage/surge, and input current signal interference. To prevent these causes from occurring, the following measures are required.

(1) Increase the protection against power down to avoid a short circuit, open circuit, or performance degradation caused by the power down.
(2) Suppress current undervoltage/overvoltage/surge to avoid loss or decline of the anti-spike function.
(3) Increase current signal filtering.
(4) Suppress the electromagnetic interference to the power signal.

Therefore, the corresponding function maintenance requirements can be obtained as follows.

(1) Avoidance of dangerous power down.
(2) Suppression of current undervoltage/overvoltage/surge.
(3) Avoidance of interference with the current input signal.
(4) Avoidance of electromagnetic interference to the power signal.
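As a rough illustration of identification "in the total domain", the sketch below crosses every function achievement requirement with every fault clue, so that no combination is skipped when the worksheet is filled in. The clue list is taken from the rows of Table 4.1; the function name `candidate_functional_faults` and the dictionary layout are hypothetical, and the fault, cause and impact fields are deliberately left for the analyst to complete.

```python
from itertools import product

# Fault clues as they appear in Table 4.1 ("normal" is not a fault state).
FAULT_CLUES = [
    "Loss of prescribed function",
    "Incompleteness of the function",
    "Function discontinuity",
    "Performance deviation",
    "Performance time deviation",
    "Undesired function",
]


def candidate_functional_faults(requirements):
    """Return one blank worksheet row per (requirement, fault clue) pair."""
    return [
        {"seq": n, "function": req, "fault_clue": clue, "fault": None}
        for n, (req, clue) in enumerate(product(requirements, FAULT_CLUES), start=1)
    ]


worksheet = candidate_functional_faults(["Supply power"])
for row in worksheet:
    print(row["seq"], row["function"], "|", row["fault_clue"])
```

Rows for which the analyst finds no credible fault stay empty, which corresponds to the “–” entries in Table 4.1.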

Table 4.1 Functional fault information

| Sequence number | Function | Fault clues | Fault | Fault cause | Impact on oneself | Impact on other associated functions | Possible design improvement measures | Severity category |
|---|---|---|---|---|---|---|---|---|
| 1 | Supply power | Loss of prescribed function | No-output | (1) Power down (2) Current undervoltage/overvoltage/surge | The power supply has no output and fails function | Cause the current transmission connector to burn out | (1) Enhance power failure protection to avoid open circuit or performance degradation caused by power failure (2) Enhanced current undervoltage/overvoltage/surge suppression to avoid degradation of anti-spike function | I |
| 2 | Supply power | Incompleteness of the function | The output voltage is unstable or has parasitic ripple | Input current signal interference | Power supply function performance is degraded | No | (1) Increase current signal filtering (2) Suppresses the power signal from electromagnetic interference | II |
| 3 | Supply power | Function discontinuity | – | – | – | – | – | – |
| 4 | Supply power | Performance deviation | Output higher/lower than specified voltage | (1) Power down (2) Current undervoltage/overvoltage/surge | Power supply function performance is degraded | Cause the current transmission connector to burn out | (1) Enhance power failure protection to avoid open circuit or performance degradation caused by power failure (2) Enhanced current undervoltage/overvoltage/surge suppression to avoid degradation of anti-spike function | II |
| 5 | Supply power | Performance time deviation | – | – | – | – | – | – |
| 6 | Supply power | Undesired function | – | – | – | – | – | – |

4.2 Identification of Component Physical Faults in the Total Domain Based on the Function-Physics Mapping

A physical fault concerns a physical component in the physical domain and is the root cause of the functional fault.

Definition 4.4 Component physical fault refers to the event or state that makes a single physical component in the physical domain (which is expected to achieve the function achievement requirements in the functional domain) wholly or partially unable to implement the functions assigned by the design.

According to the above definition, in the unified design process the component physical faults can be identified through mapping from the component functional faults. There are two types of physical components in the physical domain, namely basic physical components and robust physical components.

Definition 4.5 The identification of component physical faults in the total domain refers to identifying, in the product design process, as many potential physical faults associated with the physical components as possible from the mapping relationship between the physical domain and the functional domain.

Definition 4.6 Basic physical component refers to the physical component that is implemented by applying the axiomatic design method to hierarchically map the “function achievement requirements”.

Definition 4.7 Robust physical component refers to the physical component that is implemented by applying the axiomatic design method to hierarchically map the “function maintenance requirements”.

4.2.1 Fault Identification Methods of Basic Physical Components

Based on the mapping relationship between the functional domain and the physical domain, a component functional fault in the functional domain is mapped to the fault parameters of the corresponding physical component. Then, the impact information of the component physical fault can be obtained from the mapping relationship between functional entities and physical components, by using the process shown in Fig. 4.2. The root causes of the physical failure of a component come from the physical and chemical effects of the physical component itself, as well as from the influence of internal and external loads. For the fault impact information obtained from the function-physics mapping, the following conditions can be further combined to identify the physical faults that cause the relevant fault impacts:


Fig. 4.2 Fault identification process of a basic physical component


(1) The working principle of the physical component, including the component structure and the materials, information and energy involved in its physical and chemical action processes.
(2) The characteristics of the devices, raw materials and mechanical parts that make up the physical component, such as the tensile strength, flexural strength, compressive strength, seismic strength, expansion coefficient, density and dielectric constant of the metal materials, and the temperature resistance and antimagnetic properties of electronic materials. For components/raw materials/parts, physics-of-failure (PoF) models can be constructed under the above conditions for fault analysis and identification.
(3) The internal and external loads applied to the physical component, including vibration, temperature, humidity, electromagnetic fields, pressure, mold, salt spray, sand and dust, and residual particles.

It must be pointed out that this method is very likely to produce repeated physical faults. The faults obtained therefore need to be merged to get the component physical faults, expressed as follows:

$$F_{DP_{ik}} = \left\{ \coprod_{j=1}^{n_{ik}} F_{DP_{ik}}\big|_{j} \right\} \quad (4.1)$$

in which $F_{DP_{ik}}$ represents the physical fault set of the component $DP_{ik}$, $F_{DP_{ik}}|_{j}$ represents the physical fault set identified by the analysis of the function realization requirement $FRR_{ikj}$, and $n_{ik}$ represents the number of function realization requirements associated with the physical component $DP_{ik}$.

Taking the mapping relationship between the functional domain and the physical domain as an example, the analysis process shown in Fig. 4.2 is graphically illustrated in Fig. 4.3. The influences of the physical fault on the component itself, and the fault severity category, can be determined by path ➀; the influences of the physical fault on the associated equipment can be determined by path ➁; and the influences of the physical fault on the upper-level equipment can be determined by path ➂. After an associated influence has been selected, the physical fault of the component can be determined from the internal working principle of the component, the characteristics of the components/materials/parts, and the internal and external working conditions.
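The merging step of Eq. (4.1) can be sketched in Python as follows. This is only an illustration under stated assumptions: two faults are treated as repeats when their names match after trimming and lower-casing, whereas a real analysis would also compare causes and effects. The function name `merge_physical_faults` and the requirement labels `FRR_1` and `FRR_2` are hypothetical.

```python
def merge_physical_faults(fault_sets_by_requirement):
    """Merge the per-requirement fault sets of one component, dropping repeats.

    fault_sets_by_requirement maps a requirement label to the list of
    physical faults identified by analysing that requirement.
    """
    merged = []
    seen = set()
    for faults in fault_sets_by_requirement.values():
        for fault in faults:
            key = fault.strip().lower()  # assumed criterion for "the same fault"
            if key not in seen:
                seen.add(key)
                merged.append(fault.strip())
    return merged


# Hypothetical fault sets for one capacitor, identified from two requirements.
identified = {
    "FRR_1": ["Short circuit", "Open circuit"],
    "FRR_2": ["Open circuit", "Parameter drift"],
}
component_faults = merge_physical_faults(identified)
print(component_faults)
```

The order of first appearance is preserved so that the merged list can still be traced back to the requirement that first revealed each fault.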

Fig. 4.3 Illustration of the fault identification process of the basic physical components

4.2.2 Fault Identification Methods of Robust Physical Components

Robust physical components are obtained mainly through the mapping of function preservation requirements. Since a function preservation requirement itself has no corresponding fault, the faults of its associated robust physical component need to be obtained from the fault set of the corresponding function realization requirement, by using the identification process shown in Fig. 4.4. Similarly, repeated component physical faults may also be obtained by this method. These faults therefore need to be merged, expressed as follows:

$$F_{DP'_{ik}} = \left\{ \coprod_{p=1}^{n'_{ik}} F_{DP'_{ik}}\big|_{p} \right\} \quad (4.2)$$

in which $F_{DP'_{ik}}$ represents the physical fault set of the component $DP'_{ik}$, $F_{DP'_{ik}}|_{p}$ represents the physical fault set identified by the analysis of the function realization requirement $FPR_{ikp}$, and $n'_{ik}$ represents the number of function realization requirements associated with the physical component $DP'_{ik}$.

Taking the mapping relationship between the functional domain and the physical domain as an example, the analysis process shown in Fig. 4.4 is graphically illustrated in Fig. 4.5. The influences of the physical fault on the component itself, and the fault severity category, can be determined by path ➀; the influences of the physical fault on the associated equipment can be determined by path ➁; and the influences of the physical fault on the upper-level equipment can be determined by path ➂.


Fig. 4.4 Fault identification process for robust physical components


Fig. 4.5 Illustration of the fault identification process for robust physical components

After an associated influence has been selected, the fault of the robust physical component can be determined from the internal working principle of the physical component, the characteristics of the components/materials/parts, and the internal and external working conditions. This process largely avoids the omission of physical component faults and enables the designers to analyze the potential physical faults as comprehensively as possible. However, the comprehensiveness and completeness of the fault analysis still depend on the designer fully understanding the physical action process of the physical components.

Example 4.2 Using the analysis method described above, the fault analysis results for each physical component of the signal processor in Example 2.1 can be obtained; part of them (owing to the limited space in this book) are shown in Table 4.2. Taking the input filter circuit as an example, the fault analysis shows that the possible causes of short circuit, open circuit and parameter drift of the capacitor C1 are breakdown, oxidation, corrosion cracking, pollution and dielectric aging. In particular, the capacitor is very likely to burn out under the high-temperature conditions encountered during flight of the aircraft. To avoid these causes in practical use, solid dielectric capacitors, rather than electrolyte capacitors, should be used so that the component works properly in a high-temperature environment. The working medium of the tantalum capacitors is a very thin Ta2O5 layer formed on the surface of the tantalum metal, with superior anti-oxidation and anti-corrosion characteristics. The detailed design parameters of the capacitor are listed in Table 4.3. The design parameters of other physical components can be obtained using similar methods. According to the Chinese military standard GJB299C-2006 [25], the working failure rate of the solid tantalum capacitor is 0.0365 × 10⁻⁶/h, which is capable of meeting the design requirements.
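To give a feel for what a working failure rate of 0.0365 × 10⁻⁶/h means, the sketch below converts it into a reliability value. It assumes a constant failure rate (exponential life model), which the text does not state explicitly, and the 1000 h mission time is a hypothetical figure chosen only for illustration.

```python
import math

FAILURE_RATE_PER_HOUR = 0.0365e-6  # working failure rate quoted in the text


def reliability(t_hours, failure_rate=FAILURE_RATE_PER_HOUR):
    """R(t) = exp(-lambda * t), valid only under a constant failure rate."""
    return math.exp(-failure_rate * t_hours)


def mean_time_to_failure(failure_rate=FAILURE_RATE_PER_HOUR):
    """For a constant failure rate the mean time to failure is 1 / lambda."""
    return 1.0 / failure_rate


mission_hours = 1000  # hypothetical mission time
print(f"MTTF      = {mean_time_to_failure():.3e} h")
print(f"R({mission_hours} h) = {reliability(mission_hours):.7f}")
```

Whether this value meets the design requirement depends on the reliability allocated to the capacitor, which is not given in this section.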

4.3 Emergent Fault Integrated Identification

A physical component can complete its design functions only within the entire system in which it is located. The overall function of the system is not a simple superposition of the component functions but their integrated whole. However, when constructing the Physics-of-Failure (PoF) model of a component, the designer mainly considers the design problems of the physical component itself and ignores the new faults that may emerge when different physical components are integrated at the system level. The system integration process needs to identify and analyze these emergent faults to verify the completeness of the design with respect to system functions and reliability. The emergent faults or unexpected functions that may occur during system integration are as follows:

(1) Faults that may occur during the installation and fixing of a physical component in the system.
(2) Faults that may occur during the transfer of functional parameters among different physical components, with the physical components as carriers.
(3) Faults that may be caused by the amplification effect in system integration, although no individual component is faulty.
(4) Faults that result in unexpected and undesired functions beyond the functional domain, which may arise in a system composed of different physical components.

For these four types of faults, the following definitions are provided:

Definition 4.8 Interface fault (I-F) refers to the fault, including transmission-interface (T-I) inconsistency, fault-interface (F-I) mismatch, and logic-interface (L-I) non-conformity, existing in the subsystems or systems composed of different physical components. These faults can be identified through consistency and coordination analysis of the interfaces.

Definition 4.9 Transmission fault (T-F) refers to a fault of a physical component that is transmitted to a higher-level system through the system interface or the action principle, or that causes additional faults in the neighbouring physical components.

Table 4.2 Part of the physical component faults and effect analysis table of the signal processor

| Sequence number | Virtual entity | Physical unit | Associated functional requirements | Fault | Potential cause of the fault | Impact on oneself | Impact on associated equipment | Impact on upper equipment | Final impact | Severity category |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Input filter circuit | Capacitor C1 | Avoid input current signal interference | Short circuit | Breakdown | The power supply does not have output | No | The power module has no output | Loss of a signal processing function | I |
| 2 | Input filter circuit | Capacitor C1 | Avoid input current signal interference | Open circuit | Oxidation, corrosion, and fracture | (1) The output voltage is unstable or has parasitic ripples (2) The output is below the specified voltage | No | The output quality of the power module is degraded | Signal processing function is degraded | III |
| 3 | Input filter circuit | Capacitor C1 | Avoid input current signal interference | Parameters drift | Pollution, corrosion, and medium aging | (1) The output voltage is unstable or has parasitic ripples | No | The output quality of the power module is degraded | Signal processing function is degraded | IV |
| 4 | Output filter circuit | Capacitor C3 | Filter the output voltage signal | Short circuit | Breakdown | The +15 V power supply has no output | No | Loss of power module function | Loss of a signal processing function | I |
| 5 | Output filter circuit | Capacitor C3 | Filter the output voltage signal | Open circuit | Oxidation, corrosion, and fracture | The +15 V power supply is unstable or has parasitic ripples | No | The functional performance of the power module has degraded | Signal processing function is degraded | III |
| 6 | Output filter circuit | Capacitor C3 | Filter the output voltage signal | Parameters drift | Pollution, corrosion, and medium aging | The +15 V power supply is unstable or has parasitic ripples | No | The functional performance of the power module has degraded | Signal processing function is degraded | IV |
| 7 | Output filter circuit | Capacitor C4 | Filter the output voltage signal | Short circuit | Breakdown | The +15 V power supply has no output | No | Loss of power module function | Loss of a signal processing function | I |
| 8 | Output filter circuit | Capacitor C4 | Filter the output voltage signal | Open circuit | Oxidation, corrosion, and fracture | The +15 V power supply is unstable or has parasitic ripples | No | The functional performance of the power module has degraded | Signal processing function is degraded | III |
| 9 | Output filter circuit | Capacitor C4 | Filter the output voltage signal | Parameters drift | Pollution, corrosion, and medium aging | The +15 V power supply is unstable or has parasitic ripples | No | The functional performance of the power module has degraded | Signal processing function is degraded | IV |
| … | … | … | … | … | … | … | … | … | … | … |

Table 4.3 Reliability design parameters of the capacitor C1

| Sequence number | Parameter name | Parameter value |
|---|---|---|
| 1 | Surface mount | Solid tantalum capacitors for chips |
| 2 | Rated voltage | 12 V |
| 3 | Withstand voltage | 20 V |
| 4 | Allowable deviation | ± 20% |
| 5 | Dielectric loss | 8 |
| 6 | Series resistance | R ≥ 3.0 Ω |
| 7 | Capacitance | C > 500 μF |
| 8 | Quality grade | A2. According to the quality certification standards, the products that have been certified by the China Electronic Components Quality Certification Committee are qualified |

It is easy to see that transmission faults are transmitted layer by layer up to the system level. The propagation logic chain formed by these transmission faults is called a fault transmission line (F-TL), abbreviated as a fault line (L) and represented by the following equation:

$$L = \left\{ \left( \overrightarrow{F_{\varepsilon i} F_{\xi j}},\; P_{\varepsilon i \xi j} \right) \;\middle|\; \varepsilon \in (1, 2, \ldots, n),\ \xi \in (1, 2, \ldots, n),\ i \in (1, 2, \ldots, n_{\varepsilon}),\ j \in (1, 2, \ldots, n_{\xi}) \right\} \quad (4.3)$$

in which $F_{\varepsilon i}$ represents the fault i of the product ε, $F_{\xi j}$ represents the fault j of the product ξ, and $P_{\varepsilon i \xi j}$ represents the probability that $F_{\xi j}$ occurs as a result of $F_{\varepsilon i}$. The fault line characterizes the relationship not only between component faults and system faults, but also between the faults of different components. The construction of the fault line should also consider the impacts of I-F. Transmission faults can be identified by analyzing the transmission relationships between the physical faults of different components, or by mapping the corresponding functional faults in the system.

Definition 4.10 Error propagation fault (E-F) refers to an excessive output error of the system caused by the combined action of the output errors of one or more physical components, under the condition that no fault has occurred in either those physical components or their interfaces. Such faults can be analyzed and identified using error propagation theory.

Definition 4.11 Potential functional fault (S-F) refers to an unexpected fault, outside the functional domain, that appears in the physical components after they are composed into a physical system.


The typical representative of a potential functional fault is the potential pathway fault, which can be found through potential pathway analysis, for which a set of relatively mature theoretical methods exists. Other types of potential functional faults can be analyzed in reverse, working back through the operation process of the physical components to understand their potentially harmful functions.

4.3.1 Interface Fault

In the product design process, the designer should consider not only the physical realization of the functions of the product, but also the various relationships between the interfaces. According to field fault data from an aircraft [26], interface faults account for 15–25% of all faults of the aircraft. At present, the fault analysis carried out by industrial enterprises during product design, whether for a new product or a modified product, is limited to the main functions of the product system and its accessories. Very little analysis is conducted on the influences of mutual cross-linking, interference (such as structural interference and electromagnetic interference), and environmental conditions. On the basis of the interface types in the physical model, the corresponding analysis methods are given below.

4.3.1.1 T-I

Based on T-I analysis, the coordination of transmission parameters is checked sequentially for all the physical components included in the system, as listed in Table 4.4. If the transmission parameters between two physical components are consistent or coordinated, fill in “Y”; otherwise fill in “N”, or “X” if there is no interface. The transmission parameters of the two associated physical components should be considered together in the analysis.

Table 4.4 Table of T-I consistency and coordination analysis (Class T interface matrix; each component has O and I entries)

| Start \ End | E1 (O, I) | E2 (O, I) | … | En (O, I) |
|---|---|---|---|---|
| E1 (O, I) | | Y/N/X | Y/N/X | Y/N/X |
| E2 (O, I) | Y/N/X | | Y/N/X | Y/N/X |
| … | Y/N/X | Y/N/X | | Y/N/X |
| En (O, I) | Y/N/X | Y/N/X | Y/N/X | |
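A minimal Python sketch of how the T-I matrix of Table 4.4 might be filled in is given below. It is an illustration only: "consistent or coordinated" is simplified to exact equality between the parameters a start component outputs and those the end component expects, and the function name `ti_matrix`, the component names and the parameter values are all hypothetical.

```python
def ti_matrix(components, links):
    """Build the start/end matrix of Y, N and X marks.

    links maps (start, end) to a pair (output_spec, input_spec) describing
    what the start component delivers and what the end component expects.
    """
    matrix = {}
    for start in components:
        for end in components:
            if start == end:
                continue  # a component is not checked against itself
            if (start, end) not in links:
                matrix[(start, end)] = "X"  # no interface between the pair
                continue
            output_spec, input_spec = links[(start, end)]
            # Simplifying assumption: coordinated means identical parameters.
            matrix[(start, end)] = "Y" if output_spec == input_spec else "N"
    return matrix


components = ["E1", "E2", "E3"]
links = {
    ("E1", "E2"): ({"signal": "voltage", "level_V": 15}, {"signal": "voltage", "level_V": 15}),
    ("E2", "E3"): ({"signal": "voltage", "level_V": 15}, {"signal": "voltage", "level_V": 12}),
}
matrix = ti_matrix(components, links)
to_analyse = sorted(pair for pair, mark in matrix.items() if mark == "N")
print(to_analyse)
```

The pairs marked “N” are the ones that go forward to the detailed interface fault analysis.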


4.3.1.2 F-I

The consistency and coordination checks of F-I are performed on all physical components included in the system, as listed in Table 4.5, in which only the items in the upper right corner or the lower left corner need to be checked. If the F-I between two physical components is consistent or coordinated, fill in “Y”; otherwise fill in “N”, or “X” if there is no interface. The analysis should be conducted together with the 3D models of the associated physical components.

Table 4.5 Table of F-I consistency and coordination analysis of the physical components

| | E1 | E2 | … | En |
|---|---|---|---|---|
| E1 | | Y/N/X | Y/N/X | Y/N/X |
| … | | | Y/N/X | Y/N/X |
| En−1 | | | | Y/N/X |

4.3.1.3 L-I

In the analysis of the L-I problem, it is necessary to consider both the design strength of the entity and the possible stress under working conditions, and to check the two for consistency and coordination, as listed in Table 4.6. When reviewing the results, the stress analysis of the associated physical components and of the system should be carefully checked.

Table 4.6 Table of L-I consistency and coordination analysis of the physical components

| Physical unit | Vibration (possible range vs. tolerance range) | High-low temperature (possible range vs. tolerance range) | …… | Electromagnetic (possible range vs. tolerance range) |
|---|---|---|---|---|
| E1 | Y/N/X | Y/N/X | … | Y/N/X |
| E2 | Y/N/X | Y/N/X | … | Y/N/X |
| …… | Y/N/X | Y/N/X | … | Y/N/X |
| En | Y/N/X | Y/N/X | … | Y/N/X |

For the interface incoordination or inconsistency problems found in the above analysis (those whose answer is “N”), further in-depth analysis of the possible fault characterizations and fault causes of the product needs to be conducted, and the corresponding improvement method needs to be determined. This interface fault analysis can be assisted by models, as shown in Table 4.7.

Table 4.7 Table of the interface faults analysis

| Interface fault | Associative entity | Cause of fault | Impact on associated entities | Systematic impact | Final impact | Severity category | Design compensation |
|---|---|---|---|---|---|---|---|
| … | … | … | … | … | … | … | … |
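The comparison behind each cell of Table 4.6 can be sketched as a simple range check. The sketch assumes that a stress is coordinated when the possible range under working conditions lies entirely inside the tolerance range of the design, and that “X” marks a stress that does not apply to the unit; both readings are assumptions. The function name `li_check` and all numerical ranges are hypothetical.

```python
def li_check(possible, tolerance):
    """Return "Y", "N" or "X" for one physical unit and one stress type.

    possible and tolerance are (low, high) tuples, or None if the stress
    does not apply to the unit.
    """
    if possible is None or tolerance is None:
        return "X"
    (p_low, p_high), (t_low, t_high) = possible, tolerance
    return "Y" if t_low <= p_low and p_high <= t_high else "N"


# Hypothetical temperature ranges in degrees Celsius.
stress_table = {
    "E1": {"possible": (-40, 85), "tolerance": (-55, 125)},
    "E2": {"possible": (-40, 150), "tolerance": (-55, 125)},
    "E3": {"possible": None, "tolerance": None},
}
temperature_marks = {
    unit: li_check(row["possible"], row["tolerance"])
    for unit, row in stress_table.items()
}
print(temperature_marks)
```

Units marked “N” would then be entered in the interface fault analysis of Table 4.7.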

Fig. 4.6 Transmission relationship between physical faults at different levels

4.3.2 Transfer Fault

4.3.2.1 Process of Transmission of Faults

Transmission of Faults Among Physical Components at Different Levels

In addition to obtaining physical faults directly by mapping from functional faults, the subsystems/systems in the physical domain decomposition model can also obtain fault information from the analysis of the fault effects of their components. The transmission relation is shown in Fig. 4.6.

(1) Effect of a physical component on the product at the upper level → origin of the fault modes in the subsystem/system.
(2) Fault mode of a physical component → root cause of the fault in the subsystem/system.
(3) Final effect of a physical component → final effect of the fault in the subsystem/system.
(4) Severity level of a physical component → severity level of the fault in the subsystem/system.
(5) Fault frequency ratio of a physical component → reference value for calculating the fault frequency ratio of the subsystem/system.

This process actually gives the many-to-one mapping relationship from the physical component fault set to the subsystem/system fault set, which is expressed in Eq. (4.4).


Fig. 4.7 Transmission relation between the faults of physical components at the same level

$$F_i \xrightarrow{\;T\;} F_P \quad (4.4)$$

in which $F_i$ represents the fault set of the physical component i, $F_P$ represents the fault set of the subsystem/system P, and T represents the transfer function. The fault frequency ratio of the subsystem/system can be calculated from the fault frequency ratios of the physical components. Assume that the subsystem/system P contains n physical components and K faults, where fault k is transmitted from the faults $f_{ji}$ of one or several physical components $P_j$ (j = 1, 2, …, n). The reference value $\alpha_k$ of the fault frequency ratio of fault k can then be calculated using Eq. (4.5):

$$\alpha_k = T(\lambda_j, \alpha_{ji}) = \frac{\sum_{j=1}^{n} \sum_{i=1}^{K_j} \left( I_{kji} \times \lambda_j \times \alpha_{ji} \right)}{\sum_{k=1}^{K} \sum_{j=1}^{n} \sum_{i=1}^{K_j} \left( I_{kji} \times \lambda_j \times \alpha_{ji} \right)} \quad (4.5)$$

in which $\lambda_j$ represents the fault rate of the physical component $P_j$ (j = 1, 2, …, n); $\alpha_{ji}$ represents the frequency ratio of fault i (i = 1, 2, …, $K_j$) of the physical component $P_j$; and $I_{kji}$ is an indicator function: $I_{kji} = 1$ means that fault i of the physical component $P_j$ is transmitted to fault k, and $I_{kji} = 0$ means that fault k is irrelevant to fault i of the physical component $P_j$.
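Eq. (4.5) can be evaluated with a few lines of Python. The sketch below is a direct transcription under one representational choice: instead of storing the indicator $I_{kji}$ as a three-dimensional array, each component fault carries the set of system faults k for which $I_{kji} = 1$. The function name `system_fault_ratios` and the numerical values are hypothetical.

```python
def system_fault_ratios(failure_rates, mode_ratios, transmits_to, n_system_faults):
    """Compute alpha_k for every system fault k according to Eq. (4.5).

    failure_rates[j]   : fault rate lambda_j of component j
    mode_ratios[j][i]  : frequency ratio alpha_ji of fault i of component j
    transmits_to[j][i] : set of system fault indices k with I_kji = 1
    """
    weights = [0.0] * n_system_faults
    for j, lam in enumerate(failure_rates):
        for i, alpha in enumerate(mode_ratios[j]):
            for k in transmits_to[j][i]:
                weights[k] += lam * alpha  # numerator term I_kji * lambda_j * alpha_ji
    total = sum(weights)  # denominator: the same sum taken over all k
    if total == 0:
        raise ValueError("no component fault is transmitted to any system fault")
    return [w / total for w in weights]


# Hypothetical example: two components, two system faults.
failure_rates = [2e-6, 1e-6]              # lambda_1, lambda_2 (per hour)
mode_ratios = [[0.6, 0.4], [1.0]]         # alpha_ji
transmits_to = [[{0}, {1}], [{0}]]        # which system fault each component fault feeds
alphas = system_fault_ratios(failure_rates, mode_ratios, transmits_to, 2)
print(alphas)
```

Because the denominator sums the numerator over all k, the resulting ratios always add up to one.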

Transmission of Faults Among Physical Components at the Same Level

For physical components at the same level, there is also a fault transmission relation between them through the “T” interface, the “F” interface, and the “L” interface, as shown in Fig. 4.7. If the fault $F_{DP_{ij}f_1}$ of the physical component $DP_{ij}$ causes the fault $F_{DP_{pq}f_2}$ of the physical component $DP_{pq}$ to occur, then the fault $F_{DP_{pq}f_2}$ should be listed as an item in the “impacts on associated equipment” of the fault $F_{DP_{ij}f_1}$; conversely, the fault $F_{DP_{ij}f_1}$ should be regarded as a fault cause of the fault $F_{DP_{pq}f_2}$.

4.3.2.2 Design of the Transfer Model

The core of the fault transmission line (F-TL) is the fault and its transmission relationship. Since a component can have more than one fault, it is necessary to specify clearly which fault triggers the F-TL and which fault of which component it affects. The concept of object-oriented design and analysis, originally derived from computer programming technology, has now penetrated into a variety of fields. It constructs systems as “real world” objects in a way that is close to actual domain terms. When a designer analyzes a fault, they can also treat it as an object class rather than a property of the product. In the design process, the designer first pays attention to the fault, its impacts, and its probability of occurrence, which are evaluated to determine the critical faults. The designer then develops design improvements or uses remedial measures to mitigate the fault until the reliability requirements are satisfied.

Therefore, the component and its faults can be represented by the graphic elements shown in Fig. 4.8, in which a product object is first created by the user, and then a product fault object is created under the product object. In Fig. 4.8, $S_{i1}$ represents the severity of the $i1$th fault of component $i$; $P_{i1}$ represents the probability of occurrence of the $i1$th fault of component $i$; $S_i$ represents the most serious consequence of the faults of component $i$; $P_i$ represents the probability of failure of component $i$.

(1) The detailed information of the fault in a fault chain node shown in Fig. 4.8 should be foldable and hidden, and displayed via a fault transmission chain between components.
(2) The left and right sides of the fault chain node shown in Fig. 4.8 are displayed as connection ports, where the left input port indicates the faults of other components that cause fault $i1$ of component $i$ to occur (i.e. the fault cause), and the right output port indicates the faults of other components that are triggered by fault $i1$ of component $i$.
Then, it can be known that:

$$S_i = \max_{j=1,\cdots,n} \left\{ S_{ij} \right\} \tag{4.6}$$

$$P_i = \sum_{j=1}^{n} P_{ij} = \sum_{j=1}^{n} \left( \lambda_i t_i \beta_{ij} \right) \tag{4.7}$$

Fig. 4.8 Graphical illustration of a fault chain node


in which: $n$ represents the number of faults of component $i$; $\lambda_i$ represents the fault rate of component $i$; $t_i$ represents the working time of component $i$; $\beta_{ij}$ represents the frequency ratio of fault $j$ of component $i$.

Example 4.3 Fig. 4.9 illustrates the graphical representation of the fault chain node of the “left-axis angle converter” of the signal processor in Example 2.1.

From the fault analysis process given above, it can be seen that the fault transmission relation mainly comes from two aspects: one is the influence of the faults at the same level; the other is the influence of the faults at different levels.
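The object-oriented view of a fault chain node and Eqs. (4.6) and (4.7) can be sketched in Python as follows. The class names, field names, and sample values are assumptions for illustration. The sketch also assumes that severity is encoded so that a larger number means a more serious consequence, which is what taking the maximum in Eq. (4.6) requires.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fault:
    name: str
    severity: int  # S_ij; assumption: larger value = more serious consequence
    beta: float    # beta_ij, frequency ratio of this fault


@dataclass
class Component:
    name: str
    fault_rate: float    # lambda_i
    working_time: float  # t_i
    faults: List[Fault] = field(default_factory=list)

    def fault_probability(self, fault: Fault) -> float:
        """P_ij = lambda_i * t_i * beta_ij."""
        return self.fault_rate * self.working_time * fault.beta

    def severity(self) -> int:
        """S_i, Eq. (4.6): most serious consequence over all faults."""
        return max(f.severity for f in self.faults)

    def probability(self) -> float:
        """P_i, Eq. (4.7): sum of the fault probabilities P_ij."""
        return sum(self.fault_probability(f) for f in self.faults)


# Hypothetical component with two faults
converter = Component(
    name="angle converter",
    fault_rate=1.0e-4,
    working_time=100.0,
    faults=[Fault("no output", 4, 0.3), Fault("output drift", 2, 0.7)],
)
```

Keeping the faults as objects under the product object mirrors the node layout described for Fig. 4.8: the component-level values $S_i$ and $P_i$ are derived from the fault objects rather than stored separately.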

Influences Between the Faults at the Same Level

Suppose $DP_{ij}$ represents the $j$th component at the $i$th level, and includes in total $K_{ij}$ faults $f_{ijk}$. There are $M_{ij}$ entities associated with $DP_{ij}$, denoted as $DP_{i_c j_c}$ ($c = 1, 2, \ldots, M_{ij}$), whose fault $f_{i_c j_c k_c}$ may be caused by the occurrence of fault $f_{ijk}$ of component $DP_{ij}$. Then the corresponding transmission chain can be expressed as in Fig. 4.10. The transmission relation between the faults is represented by a broken line with a rightward arrow, and the components at different levels are separated by a horizontal line, with the level attributes indicated on the right side.

Fig. 4.9 Graphical representation of the left axis angle converter

Fig. 4.10 Transmission chain of faults at the same level


Fig. 4.11 Transmission chain of faults at different levels

In practical engineering, the influence of a subsystem/system on the faults of its own components is usually not considered. If the fault transmitted at the same level continues to cause faults in other components, it should be shown simultaneously in the fault transmission chain, as shown by f icj2k2 in Fig. 4.10.

Influences of the Faults Between Components at Different Levels

Suppose that $DP_{ij}$ is the $j$th component at the $i$th level, and includes in total $K_{ij}$ faults $f_{ijk}$. Its subsystem/system is $DP_{(i-1)j_f}$, whose fault $f_{(i-1)j_f k_f}$ is transmitted from fault $f_{ijk}$. Then the corresponding transmission chain can be expressed as in Fig. 4.11. Likewise, the transmission relationship between the faults is represented by a broken line with a rightward arrow, and a horizontal line is used to separate the components at different levels.

Example 4.4 Using the above results of the fault analysis and the fault transmission relation model provided in this section, the transmission relation diagram of component faults in the signal processing module of the signal processor can be obtained, as shown in Fig. 4.12.

4.3.2.3 Fault Logic Model

In the aforementioned fault analysis process, we assumed that the occurrence of a component fault will inevitably lead to an associated fault of the subsystem/system. However, in practical engineering, the subsystem/system fault is usually caused by the joint effects of multiple component faults, environmental disturbances, and human factors. Therefore, the logical relationship between the faults at different hierarchical levels of a product should be analyzed in depth, with the consideration of environmental disturbances and human factors, to express the fault logic of the product more clearly.

Fig. 4.12 Transmission relation diagram of component faults in the signal processing module

Before introducing the fault logic analysis method, the following events are first defined:

(1) Machine fault.
(2) Environmental disturbance.
(3) Human misoperation.
(4) Combination: a combination of the effects of the above three factors.

In the functional model or physical model of a product, each function requirement or physical component has fault attributes. During the fault logic analysis, the designer can select a higher-level function requirement or subsystem/system and any related fault, and combine the fault transmission relation obtained by the aforementioned analysis to generate a simplified fault logic model, by using the following rules:

(1) The transmission logic between the faults at the upper and lower levels is defined as “OR”.
(2) The cause that leads to the occurrence of a fault is defined as an event:
➀ If a fault cause (which can be regarded as the fault at the next level) does not contain the effects of environmental disturbances and human factors, then it can be considered as a basic event, which is defined as “machine fault”.
➁ If a fault cause (which can be regarded as the fault at the next level) does contain the effects of environmental disturbances and human factors, then it can be considered as an intermediate event, which is defined as “combination”.
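The generation rules above can be sketched as a small recursive builder. This is only an illustrative sketch: the `Event` class, the `build_fault_logic` function, and the fault names are hypothetical, and the sketch assumes that a cause which itself has lower-level causes is expanded under its own “OR” gate.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Event:
    name: str
    kind: str                   # "top event", "machine fault" or "combination"
    gate: Optional[str] = None  # "OR" when the event has lower-level causes
    children: List["Event"] = field(default_factory=list)


def build_fault_logic(
    top: str,
    causes: Dict[str, List[str]],
    external: Set[str],
) -> Event:
    """Build a simplified fault logic model.

    causes:   {fault: [lower-level faults that are transmitted to it]}
    external: faults whose cause includes environmental or human effects
    """
    def expand(name: str, kind: str) -> Event:
        lower = causes.get(name, [])
        # Rule (1): upper/lower-level transmission logic is "OR"
        node = Event(name, kind, "OR" if lower else None)
        for cause in lower:
            # Rule (2): classify each cause as a basic or intermediate event
            child_kind = "combination" if cause in external else "machine fault"
            node.children.append(expand(cause, child_kind))
        return node

    return expand(top, "top event")


# Hypothetical transmission relation
causes = {
    "Fault 2": ["Fault 2.1.1", "Fault 2.2.4", "Fault 1.3"],
    "Fault 2.2.4": ["Fault 2.2.2.1"],
}
tree = build_fault_logic("Fault 2", causes, external={"Fault 1.3"})
```

A “combination” event is left as an undeveloped node here; in a fuller model it would be decomposed further into machine fault, environmental disturbance, and human misoperation events.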


As shown in the left part of Fig. 4.13, it is assumed that “Product 2” at a given level contains 4 faults, that the occurrence of “Fault 2” is caused by the faults of “Product 2.1”, “Product 2.2” and “Product 1” at a lower level, and that “Fault 2.2.4” of “Product 2.2” is triggered by the fault of “Product 2.2.2” at a lower level. Then, based on the above rules, the fault logic model can be generated as shown in the right part of Fig. 4.13.

Example 4.5 For the transmission relation shown in Fig. 4.12, the fault logic model shown in Fig. 4.14 can be obtained by using the above-mentioned generation process.

Based on the fault logic model, designers can continue to optimize the fault transmission behaviour by analyzing the gate logic and events.

Fig. 4.13 Generation of a fault logic model

Fig. 4.14 The fault logic model generated by the abnormal events in the working process of the signal processor


(1) Gate logic: Currently, the commonly used logic gates are listed in Table 4.8.
(2) Events: Continue to add intermediate events and basic events, especially the basic events of “environmental disturbance” and “human misoperation”.

4.3.3 Error Propagation Fault

The ever-increasing complexity and interaction of modern products have resulted in tighter and tighter inter-relations among system components. In practical products, the output of a component is usually used as the input of the components that follow it, and the output of each component is usually not a single value but fluctuates within a certain range (the so-called error). When a component has an error source caused by an internal defect, an output error will occur. In particular, when there is a nonlinear relationship between the output and the input (which is quite common in complex products), the output error will often be amplified. If multiple components are connected in series in the system, the error will be amplified further as the number of series-connected components increases, eventually leading to a fault of the product or system, as shown in Fig. 4.15. Such a fault cannot be found through manual analysis in the design process, but needs to be analyzed by means of simulation in combination with the product working principles. In the simulation process, we can take the input and output parameters as random variables and judge whether the product fails by extracting the characteristic information of the parameters for comparison.

Fig. 4.15 Illustration of the error propagation process

According to its definition, an error propagation fault refers to a fault in which the system output exceeds the threshold range, caused by the joint effect of multiple input errors. Suppose that $X_i$ ($i = 1, 2, \ldots, n$) represents the input parameters; $Err_{X_i}$ ($i = 1, 2, \ldots, n$) represents the error of $X_i$; $Y_j$ ($j = 1, 2, \ldots, p$) represents the output parameters after the action of the product, system, or component on the input parameters $X_i$ ($i = 1, 2, \ldots, n$); $Err_{Y_j}$ ($j = 1, 2, \ldots, p$) represents the error of $Y_j$. The relationship between the output and input parameters of the product is expressed by:

$$Y_j = f_j(X_1, X_2, \ldots, X_n) \quad (j = 1, 2, \ldots, p) \tag{4.8}$$

in which $f_j(X_1, X_2, \ldots, X_n)$ represents the functional relationship from the input parameters $X_i$ ($i = 1, 2, \ldots, n$) to the output parameter $Y_j$. The mean of the output parameters is calculated as follows:

$$\mu_{Y_j} = f_j\left( \mu_{X_1}, \mu_{X_2}, \ldots, \mu_{X_n} \right) \quad (j = 1, 2, \ldots, p) \tag{4.9}$$

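Let

$$\nabla_X = \left[ \frac{\partial}{\partial X_1}, \frac{\partial}{\partial X_2}, \ldots, \frac{\partial}{\partial X_n} \right]^T \tag{4.10}$$

$$f(X) = \left[ f_1(X), f_2(X), \ldots, f_p(X) \right]^T \tag{4.11}$$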

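Then the gradient operator [28] (i.e. the Jacobian matrix) of $f(X)$ can be calculated by:

$$
\begin{aligned}
F_X &= \left[ \nabla_X \cdot f(X)^T \right]^T = f(X) \cdot \nabla_X^T \\
&= \begin{bmatrix} f_1(X) \\ f_2(X) \\ \vdots \\ f_p(X) \end{bmatrix}
\begin{bmatrix} \dfrac{\partial}{\partial X_1} & \dfrac{\partial}{\partial X_2} & \cdots & \dfrac{\partial}{\partial X_n} \end{bmatrix} \\
&= \begin{bmatrix}
\dfrac{\partial f_1}{\partial X_1} & \dfrac{\partial f_1}{\partial X_2} & \cdots & \dfrac{\partial f_1}{\partial X_n} \\
\dfrac{\partial f_2}{\partial X_1} & \dfrac{\partial f_2}{\partial X_2} & \cdots & \dfrac{\partial f_2}{\partial X_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial f_p}{\partial X_1} & \dfrac{\partial f_p}{\partial X_2} & \cdots & \dfrac{\partial f_p}{\partial X_n}
\end{bmatrix}
\end{aligned}
\tag{4.12}
$$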

The covariance matrix of the output parameters can be calculated by:

$$
C_Y = F_X C_X F_X^T \Big|_{\mu_{X_1}, \mu_{X_2}, \cdots, \mu_{X_n}} =
\begin{bmatrix}
\sigma_{Y_1}^2 & \sigma_{Y_1 Y_2} & \cdots & \sigma_{Y_1 Y_p} \\
\sigma_{Y_2 Y_1} & \sigma_{Y_2}^2 & \cdots & \sigma_{Y_2 Y_p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{Y_p Y_1} & \sigma_{Y_p Y_2} & \cdots & \sigma_{Y_p}^2
\end{bmatrix}
\tag{4.13}
$$

in which:

$$
C_X =
\begin{bmatrix}
\sigma_{X_1}^2 & \sigma_{X_1 X_2} & \cdots & \sigma_{X_1 X_n} \\
\sigma_{X_2 X_1} & \sigma_{X_2}^2 & \cdots & \sigma_{X_2 X_n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{X_n X_1} & \sigma_{X_n X_2} & \cdots & \sigma_{X_n}^2
\end{bmatrix}
$$

is the covariance matrix of the input parameters;

$$\sigma_{Y_l Y_k} = \sum_i \frac{\partial f_l}{\partial X_i} \frac{\partial f_k}{\partial X_i} \sigma_{X_i}^2 + \sum_i \sum_{j \neq i} \frac{\partial f_l}{\partial X_i} \frac{\partial f_k}{\partial X_j} \sigma_{X_i X_j}$$

is the covariance between output parameters $Y_l$ and $Y_k$; and

$$\sigma_{Y_k}^2 = \sum_i \left( \frac{\partial f_k}{\partial X_i} \right)^2 \sigma_{X_i}^2 + \sum_i \sum_{j \neq i} \frac{\partial f_k}{\partial X_i} \frac{\partial f_k}{\partial X_j} \sigma_{X_i X_j}$$

is the variance of output parameter $Y_k$.

When $\left[ \mu_{Y_k} - \sigma_{Y_k}, \mu_{Y_k} + \sigma_{Y_k} \right]$ is out of the threshold range of output parameter $Y_k$, a fault is considered to have occurred in the product.

Example 4.6 The working principle of a universal active filter is shown in Fig. 4.16. Its output eigenfrequency $\omega_n$ is a function of the capacitances $C_1$, $C_2$ and the resistances $R_1$, $R_2$, $R_{F1}$, $R_{F2}$, with the equation $\omega_n = \dfrac{R_2}{R_1 \times R_{F1} \times R_{F2} \times C_1 \times C_2}$. Suppose that the mean and tolerance of the capacitances $C_1$, $C_2$ are 1000 pF and ±10% respectively, the mean and tolerance of the resistances $R_1$, $R_2$ are 50 kΩ and ±10% respectively, and the mean and tolerance of the resistances $R_{F1}$, $R_{F2}$ are 2.65 MΩ and ±10% respectively. Then, through the above process, the standard deviation of the eigenfrequency $\omega_n$ can be calculated to be approximately 304, and the error will reach ±21%. If no measures are taken during the design process, the output accuracy of the universal active filter will not meet the requirements.

Fig. 4.16 Working principle diagram of a universal active filter
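The propagation procedure of Eqs. (4.9) to (4.13) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the Jacobian $F_X$ is estimated by central finite differences rather than derived analytically, the function names and the two-output `demo_model` are hypothetical, and the fault criterion is interpreted as "any part of the interval $[\mu - \sigma, \mu + \sigma]$ lies outside the threshold range". It does not reproduce the numbers of Example 4.6.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def jacobian(f: Callable[[Sequence[float]], Vector],
             mu: Sequence[float], rel_step: float = 1e-6) -> Matrix:
    """Central-difference estimate of F_X evaluated at the input means."""
    p, n = len(f(mu)), len(mu)
    jac = [[0.0] * n for _ in range(p)]
    for i in range(n):
        h = rel_step * max(abs(mu[i]), 1.0)
        up, down = list(mu), list(mu)
        up[i] += h
        down[i] -= h
        f_up, f_down = f(up), f(down)
        for l in range(p):
            jac[l][i] = (f_up[l] - f_down[l]) / (2.0 * h)
    return jac


def propagate(f: Callable[[Sequence[float]], Vector],
              mu: Sequence[float], cov_x: Matrix) -> Tuple[Vector, Matrix]:
    """Return (mu_Y, C_Y) with mu_Y = f(mu_X) and C_Y = F_X C_X F_X^T."""
    jac = jacobian(f, mu)
    p, n = len(jac), len(mu)
    jc = [[sum(jac[l][i] * cov_x[i][j] for i in range(n)) for j in range(n)]
          for l in range(p)]
    cov_y = [[sum(jc[l][j] * jac[k][j] for j in range(n)) for k in range(p)]
             for l in range(p)]
    return list(f(mu)), cov_y


def exceeds_threshold(mean: float, variance: float,
                      lower: float, upper: float) -> bool:
    """True if [mean - sigma, mean + sigma] leaves [lower, upper]."""
    sigma = variance ** 0.5
    return mean - sigma < lower or mean + sigma > upper


def demo_model(x: Sequence[float]) -> Vector:
    # Hypothetical two-output model: one linear and one nonlinear output
    return [x[0] + 2.0 * x[1], x[0] * x[1]]


mean_y, cov_y = propagate(demo_model, [1.0, 2.0], [[0.01, 0.0], [0.0, 0.04]])
```

For a real product model, the finite-difference Jacobian could be replaced by analytical partial derivatives where they are available; the covariance product itself is unchanged.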

4.4 Fault Closed-Loop Mitigation Control

4.4.1 Component Fault Closed-Loop Mitigation Control

Definition 4.12 Component fault refers to the situation in which a component fails to achieve an expected function because of the component itself rather than the influence of other related components in the system. It includes component functional fault and component physical fault.

In this book, in order to make component fault mitigation comprehensive, valid, and standardized, a logic decision-based component fault mitigation control model is established in combination with the fault closed-loop mitigation process for realizing product reliability requirements, as shown in Fig. 4.17.

Question 1: If a fault occurs during field use, will it affect safety?
If the answer is “Yes”, further risk analysis and evaluation are required, and go to Question 2; if not, go to Question 3. In the early stage of product design, the risk matrix can be used for risk analysis and evaluation.

Question 2: Are the risk level and occurrence possibility of the fault within an acceptable range?
If the answer is “Yes”, go to Question 3; if not, improvements must be made. At this time, the designer must clarify the current implementation of the improvement measures for the fault and go to Question 5.

Question 3: If the fault occurs during field use, will it affect the completion of the mission?
If the answer is “Yes”, go to Question 4 to check whether the probability (level) of the fault is within the acceptable range; if not, go to Question 6.

Question 4: Is the probability (level) of the fault within an acceptable range?
If the answer is “Yes”, go to Question 6 to check whether the fault needs to be mitigated; if not, it means that the fault is not allowed to exist and the design must be improved, so go to Question 5.

Question 5: For the circumstance that the design must be improved, have the improvement measures been implemented?
If the answer is “Yes”, then go to Question 9 to check whether the fault has been eliminated, or whether both the probability of occurrence and consequences of the fault have been reduced to an acceptable range; if not, it means that the design improvement measures for the fault have not yet been determined, i.e. the design is currently in the “under improvement” state. Once the design improvement measures are determined, go to Question 9.


Fig. 4.17 Logic decision-based closed-loop mitigation process model for component faults

Question 6: Does the current fault need to be mitigated?
If the answer is “Yes”, go to Question 7; if not, the fault is marked as “no mitigation required”, which also means that the fault is very likely to occur in the future. Therefore, it is necessary to go to Question 10 to check whether the fault can be tested, and go to Question 12 to check whether there are corresponding use remedy measures, especially for a fault that affects the task but has an acceptable probability of occurrence.

Question 7: Are there any corresponding design improvement measures for this fault?
If the answer is “Yes”, go to Question 8, and the designer must determine whether the improvement measures have been implemented; if not, it means that the fault is unable to be mitigated, and the designer must explain why the design cannot be improved from the technical and economic perspectives. For a fault that cannot be mitigated but is very likely to occur during future use, the designer must go to Question 10 to check whether it can be tested, and go to Question 12 to check whether there are corresponding compensation measures.

Question 8: Have the design improvements for the fault been implemented?
If the answer is “Yes”, then go to Question 9 to check whether the fault has been eliminated, or whether the probability of occurrence of the fault and its consequences have been reduced to an acceptable range; if not, it means that the design improvement measures for the fault have not yet been determined, i.e. the design is currently in the “under improvement” state. Once the design improvement measures are determined, go to Question 9.

Question 9: Has the fault been eliminated?
If the answer is “Yes”, the designer must further provide the implementation department and implementation results of the design improvement measures, and specify in detail which tests or inspection methods are used to prove that the fault is indeed eliminated and will not occur during use; if not, it means that the probability of occurrence of the fault and its consequences have been reduced to an acceptable range. It is necessary to provide the implementation department and implementation results of the measures, together with the reduced occurrence probability and consequence severity of the fault.
Meanwhile, it is also necessary to specify in detail which tests or inspection methods are used to prove that the probability of occurrence and consequences of the fault are indeed “reduced”.

Question 10: If the fault occurs during use, can it be tested?
If the answer is “Yes”, it is necessary to determine whether the design plan for the test is implemented, and go to Question 11; if not, it is necessary to go to Question 12 to check whether there are corresponding remedy measures during the usage process.

Question 11: Is the design plan implemented for testing the fault?
If the answer is “Yes”, go to Question 12; if not, the designer should carry out test modelling and analysis to determine the test method, implement it in the product design plan, and then go to Question 12.

Question 12: Does the fault require use remedy measures?
Use remedy measures must be applied to a fault that influences safety and the task and cannot be completely eliminated. For a fault that does not affect safety or the task, the designer can make their own decision based on the actual situation. If a remedy measure is required, the designer should go to Question 13.

Question 13: Are the use remedy measures implemented?
If the answer is “Yes”, according to the design plan, the designer needs to give detailed information, including preventive maintenance information such as work type, maintenance level, timing of tasks, interval of tasks, and description of tasks; corrective maintenance information such as work type, maintenance level, and description of tasks; and support resource allocation information. If not, it is necessary to carry out a series of support analyses such as reliability-centered maintenance analysis (RCMA), corrective maintenance work analysis, operation and maintenance task analysis (O&MTA), and level of repair analysis (LORA) to determine the operation and maintenance support plan.

The logical decision process for different types of fault events is consistent. The difference lies in the determination of the design improvement measures and the use remedy measures. For environmental disturbances, design improvements can be made in terms of resistance to environmental disturbances, but no use remedy is available; for human misoperation, it is necessary to improve the design through marking, error prevention, etc., while the use remedy can be achieved through in-depth training; for component faults, it is necessary to improve the design by virtue of the characteristics of the product itself, or to prevent faults through methods such as preventive maintenance and fault prediction.

From the analysis results in Fig. 4.17, it can be seen that after the fault mitigation guided by logical decision, each fault has a mitigation status, which is one of eliminated, reduced, under improvement, unable to be mitigated, and no mitigation required. For a fault whose mitigation status is “eliminated”, there is no need to conduct further maintainability, testability, and supportability analysis.
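The part of the decision logic that determines the mitigation status (Questions 1 to 9) can be sketched as a small function. The `Answers` class and its field names are hypothetical, and the sketch deliberately leaves out Questions 10 to 13, which concern testability and use remedy measures rather than the status itself.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    affects_safety: bool                   # Question 1
    risk_acceptable: bool = True           # Question 2
    affects_mission: bool = False          # Question 3
    probability_acceptable: bool = True    # Question 4
    needs_mitigation: bool = False         # Question 6
    improvement_exists: bool = False       # Question 7
    improvement_implemented: bool = False  # Questions 5 and 8
    eliminated: bool = False               # Question 9


def mitigation_status(a: Answers) -> str:
    """Walk Questions 1 to 9 and return the mitigation status of a fault."""
    # Q1/Q2 and Q3/Q4 decide whether design improvement is mandatory
    must_improve = (a.affects_safety and not a.risk_acceptable) or (
        a.affects_mission and not a.probability_acceptable
    )
    if must_improve:
        if not a.improvement_implemented:      # Question 5
            return "under improvement"
    else:
        if not a.needs_mitigation:             # Question 6
            return "no mitigation required"
        if not a.improvement_exists:           # Question 7
            return "unable to be mitigated"
        if not a.improvement_implemented:      # Question 8
            return "under improvement"
    # Question 9: eliminated, or reduced to an acceptable range
    return "eliminated" if a.eliminated else "reduced"
```

For example, a safety-related fault with unacceptable risk whose improvement has been implemented but which has not been fully removed would be reported as “reduced”.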

4.4.2 System Fault Closed-Loop Mitigation Control

Based on the analysis in Sect. 4.3, system faults come from two main sources. The first is the set of faults obtained by logical transmission among the component faults, which are categorized in the transmission fault set (T-FS). The second includes the interface faults and error propagation faults introduced in system integration, which are categorized in the emergent fault set (E-FS). For E-FS, the mitigation control can be carried out directly by using the logic decision process provided in Sect. 4.4.1; for T-FS, unless the design plan is completely replaced, the mitigation cannot be directly implemented. Therefore, in this case the mitigation control needs to be implemented from the root cause (i.e. from the component faults), and the mitigation status of system faults is determined in combination with the fault logic. When a component fault is mitigated, the mitigation status of the system fault changes accordingly; the decision-making process is shown in Fig. 4.18.


Fig. 4.18 System fault mitigation state decision logic process

(1) When the mitigation statuses of all component faults that cause the system fault are displayed as only “eliminated”, the system fault mitigation status is displayed as “eliminated”.
(2) Otherwise, when the mitigation statuses of all component faults that cause the system fault are displayed as only “eliminated” and “no mitigation required”, the system fault mitigation status is displayed as “no mitigation required”.


(3) Otherwise, when at least one of the mitigation statuses of all component faults that cause the system fault is displayed as “under improvement”, the system fault mitigation status is displayed as “under improvement”.
(4) Otherwise, when at least one of the mitigation statuses of all component faults that cause the system fault is displayed as “reduced”, the system fault mitigation status is displayed as “reduced”.
(5) Otherwise, when at least one of the mitigation statuses of all component faults that cause the system fault is displayed as “unable to be mitigated” and there is an “eliminated” component fault mitigation status, the system fault mitigation status is displayed as “reduced”.
(6) Otherwise, when at least one of the mitigation statuses of all component faults that cause the system fault is displayed as “unable to be mitigated” and there is no “eliminated” or “unprocessed” component fault mitigation status, the system fault mitigation status is displayed as “unable to be mitigated”.
(7) Otherwise, when the mitigation statuses of all component faults that cause the system fault are displayed as only “unprocessed” or “unable to be reduced”, the system fault mitigation status is displayed as “unprocessed”.
(8) Otherwise, the system fault mitigation status is displayed as “under improvement”.
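The eight rules above form an ordered cascade, which can be sketched as a single function. The function name and the string constants are illustrative, and the sketch assumes that “unable to be reduced” in rule (7) denotes the same status as “unable to be mitigated”.

```python
from typing import Iterable

ELIMINATED = "eliminated"
NOT_REQUIRED = "no mitigation required"
UNDER_IMPROVEMENT = "under improvement"
REDUCED = "reduced"
UNABLE = "unable to be mitigated"  # assumed equal to "unable to be reduced"
UNPROCESSED = "unprocessed"


def system_fault_status(component_statuses: Iterable[str]) -> str:
    """Apply rules (1) to (8) in order to the causing component faults."""
    s = set(component_statuses)
    if not s:
        raise ValueError("a system fault needs at least one causing fault")
    if s == {ELIMINATED}:                                         # rule (1)
        return ELIMINATED
    if s <= {ELIMINATED, NOT_REQUIRED}:                           # rule (2)
        return NOT_REQUIRED
    if UNDER_IMPROVEMENT in s:                                    # rule (3)
        return UNDER_IMPROVEMENT
    if REDUCED in s:                                              # rule (4)
        return REDUCED
    if UNABLE in s and ELIMINATED in s:                           # rule (5)
        return REDUCED
    if UNABLE in s and ELIMINATED not in s and UNPROCESSED not in s:  # rule (6)
        return UNABLE
    if s <= {UNPROCESSED, UNABLE}:                                # rule (7)
        return UNPROCESSED
    return UNDER_IMPROVEMENT                                      # rule (8)
```

Because each rule begins with “Otherwise”, the order of the checks matters; the sketch relies on early returns to preserve that order.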

4.5 Fault Mitigation Decision Method

For a system, the number of potential faults obtained after analysis is large, and the designer cannot mitigate them one by one for technical or economic reasons. A decision-making method is therefore needed to assist the designer in determining which critical faults to improve, so that reliability goals can be achieved more cost-effectively and efficiently. If the correlation between faults is not considered, the methods commonly used in current engineering can be directly applied, such as severity category, criticality, and risk priority, which will not be expanded on here. This book focuses mainly on deciding which faults to mitigate while considering the fault correlation obtained from the above analysis.

4.5.1 Fault Mitigation Decision Considering Transmission Chain

For the transmission faults that occur in system integration, this book uses a fault point importance evaluation method based on network centrality theory to quickly determine the key faults or key fault sets. The higher the importance of a fault point, the greater the impact of its mitigation, and such fault points can be considered as key points in improving product design.


From the point of view of practical engineering, product designers focus on the following kinds of faults:

(1) The faults that eventually lead to the occurrence of Class I and Class II faults, i.e., the faults with the final severity category.
(2) The faults that cause multiple coherent faults, depending on the number of coherent faults caused by them.
(3) The faults that can be caused by multiple faults, depending on the number of coherent faults that cause the specific fault.
(4) The faults with a high probability of occurrence in the fault chain, depending on the probability of occurrence of the consequential faults resulting from the occurrence of the antecedent fault.

If each fault in the fault transmission chain is regarded as a network node, and the transmission relationship between two faults is regarded as an edge, then all faults of a product and their transmission relationships constitute a complex network topology. Combined with the node importance evaluation criteria in the network model, it can be known that:

(1) The fault that causes many other faults to occur at the same time: OutDegree.
(2) The fault caused by many other faults: the number of paths that pass through that fault point.
(3) The fault with a high probability of occurrence in the fault transmission chain: the transition probability.

Combined with the above descriptions, two types of indices for evaluating the importance of fault nodes in the fault chain can be given: the centrality of the fault node and the probability of occurrence of the fault chain.

4.5.1.1 Centrality of the Fault Node

In the fault transmission network, it is impossible to have a fault node with degree 0, so in the subsequent discussion the node degree is assumed to be no less than 1 by default. In network theory, k-shell refers to the remaining network model after gradually removing nodes with degree 1, 2, …, k from the original network model. Furthermore, the k-core is formed by combining the k-shells with $k_S \geq k$. A simple three-layer shell network is shown in Fig. 4.19.

As can be seen in Sect. 4.3.1, the fault transmission network is a directed graph. First, the sphere shown in Fig. 4.20 is used to represent the fault. This fault sphere retains attributes such as fault name, severity category, occurrence probability, and fault impact, and a directed fault transmission network can then be obtained by removing the product-level attributes. However, for product design, OutDegree and InDegree contribute equally to the evaluation of node importance. Therefore, when evaluating the importance of nodes, we can temporarily ignore the direction of influence and convert the fault transmission network into an undirected graph. Figure 4.21 shows the undirected fault transmission network of a signal processor.


Fig. 4.19 An illustration of a three-layer shell network [29]

Fig. 4.20 Fault sphere

Fig. 4.21 The undirected fault transmission network of a signal processor

On the basis of the above definitions, relevant definitions of the fault transmission network evaluation can be provided as follows.

Definition 4.13 Fault correlation degree. In the fault transmission network, assuming that there are $k_{i1}$ faults that may lead to fault $i$, and that fault $i$ may cause $k_{i2}$ faults to occur in other products, then the correlation degree of fault $i$ is denoted by $k_i$, and $k_i = k_{i1} + k_{i2}$.

Definition 4.14 K-shell of faults. In the fault transmission network, after gradually eliminating the faults with degree 1, 2, …, k, the new fault transmission network obtained is the k-shell of faults. Figure 4.22 shows the 1-shell, 2-shell and 3-shell obtained after processing the signal processor fault transmission relation network shown in Fig. 4.21.
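Definitions 4.13 and 4.14 can be sketched in Python as follows. The function names and the sample edges are assumptions for illustration. The sketch interprets "gradually eliminating" as repeated pruning: at each degree level, faults are removed until no remaining fault has a degree at or below that level, with the network treated as undirected as described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (cause fault, effect fault)


def correlation_degrees(edges: Iterable[Edge]) -> Dict[str, int]:
    """k_i = k_i1 + k_i2 (Definition 4.13) for every fault in the network."""
    degree: Dict[str, int] = defaultdict(int)
    for cause, effect in edges:
        degree[cause] += 1   # counts towards k_i2 of the cause
        degree[effect] += 1  # counts towards k_i1 of the effect
    return dict(degree)


def k_shell(edges: Iterable[Edge], k: int) -> Set[str]:
    """Faults that remain after removing degrees 1, 2, ..., k (Definition 4.14)."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:  # direction is ignored for the importance evaluation
        adjacency[u].add(v)
        adjacency[v].add(u)
    for level in range(1, k + 1):
        while True:
            low = [n for n in adjacency if len(adjacency[n]) <= level]
            if not low:
                break
            for node in low:
                for neighbour in adjacency.pop(node):
                    if neighbour in adjacency:
                        adjacency[neighbour].discard(node)
    return set(adjacency)


# Hypothetical network: a three-fault loop with one extra initiating fault
edges = [("d", "a"), ("a", "b"), ("b", "c"), ("c", "a")]
degrees = correlation_degrees(edges)
```

In this small example, removing the degree-1 fault leaves the loop of three faults, and removing degree-2 faults then empties the network.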

Fig. 4.22 The k-shell of the signal processor fault transmission network (recoverable panel labels: 1-shell, 2-shell)

As can be seen in Fig. 4.22, once the faults in the 3-shell, “input signal filter 2 short circuit” and “angle offset adjustment circuit short circuit”, are mitigated, all faults in the 3-shell can be eliminated. In practical engineering, the designer can start by mitigating the faults with the highest severity category in a larger k-shell, to make the fault mitigation more targeted.

4.5.1.2 Occurrence Probability of the Fault Chain

To calculate the probability of occurrence of a fault chain caused by a specific fault, the following two factors need to be considered:

(1) The probability of occurrence of each fault in the fault chain, which can generally be calculated during the fault analysis process. For example, the probability of occurrence of a fault $F_{\varepsilon i}$ of the component $\varepsilon$ can be calculated by $\lambda_\varepsilon t_\varepsilon \beta_{\varepsilon i}$.
(2) The transmission probability between the faults. If fault $F_{\varepsilon i}$ causes another fault $F_{\xi j}$ to occur, $F_{\varepsilon i}$ will be considered as a cause of $F_{\xi j}$. By denoting the proportion of fault $F_{\varepsilon i}$ in all causes of fault $F_{\xi j}$ as $w_{\varepsilon i \xi j}$, $P_{\varepsilon i \xi j} = w_{\varepsilon i \xi j} \lambda_\xi t_\xi \beta_{\xi j}$ can be regarded as the probability that fault $F_{\varepsilon i}$ causes fault $F_{\xi j}$ to occur.

In the same fault chain, if the same fault causes multiple faults on the same level, an “OR gate” relationship should be considered when calculating the probability of occurrence of the fault chain. In this case, the probabilities of the multiple transmission paths caused by the same fault should be summed up. In the evaluation process, the probability of occurrence of the fault chain caused by a certain fault is mainly investigated. For instance, the probability of occurrence of a fault chain $L$ caused by fault $F_q$ can be obtained as:

$$P_L = C P_q \sum_{j=1}^{J} \left( \prod_{\Theta \in L_j} P_{\varepsilon i \xi j} \right) \tag{4.14}$$

in which: $C$ is a constant; $P_q$ is the probability of occurrence of fault $F_q$; $L_j$ is the $j$th transmission path (there are $J$ paths in total) caused by fault $F_q$; $\Theta$ is the edge between the fault nodes in $L_j$.

For the signal processor, the data of each component are statistically obtained through field tests, as listed in Table 4.9 (the data in this table are not the real data but have been proportionally scaled, which does not affect the comparison of the final results). By taking $w_{\varepsilon i \xi j}$ as the reciprocal of the InDegree of the fault $F_{\xi j}$, the probability of occurrence of each fault chain can be calculated, as listed in Table 4.10.

From Table 4.10, the open circuit of the connector has the highest fault chain occurrence probability, indicating that an open circuit of the connector is the fault most likely to cause abnormal operation of the signal processor. According to the traditional analysis method, the severity category of the open circuit connector fault belongs to Class III, so it is likely to be ignored. The statistical analysis of actual faults of that signal processor in field applications confirms that the connector fault causes the largest proportion of functional faults of the signal processor.
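Equation (4.14) can be sketched in Python as follows, using $w_{\varepsilon i \xi j} = 1/\text{InDegree}(F_{\xi j})$ as in the text. Several points are assumptions made for illustration: the function name, the sample network and probabilities, $C = 1$ by default, an acyclic transmission network, and the reading of a transmission path as running from $F_q$ to a fault with no further consequence.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (cause fault, effect fault)


def chain_probability(q: str, edges: Iterable[Edge],
                      fault_prob: Dict[str, float], c: float = 1.0) -> float:
    """P_L of Eq. (4.14) for the fault chain started by fault q.

    fault_prob holds lambda * t * beta for every fault; the network is
    assumed to be acyclic.
    """
    successors: Dict[str, List[str]] = defaultdict(list)
    in_degree: Dict[str, int] = defaultdict(int)
    for cause, effect in edges:
        successors[cause].append(effect)
        in_degree[effect] += 1

    def path_sum(node: str) -> float:
        # Sum over all paths leaving `node` of the product of edge probabilities
        following = successors.get(node, [])
        if not following:
            return 1.0  # end of a path: empty product
        total = 0.0
        for nxt in following:
            edge_prob = fault_prob[nxt] / in_degree[nxt]  # w * lambda * t * beta
            total += edge_prob * path_sum(nxt)
        return total

    return c * fault_prob[q] * path_sum(q)


# Hypothetical network: q causes a and b, which both cause c
edges = [("q", "a"), ("q", "b"), ("a", "c"), ("b", "c")]
fault_prob = {"q": 0.1, "a": 0.2, "b": 0.4, "c": 0.5}
p_chain = chain_probability("q", edges, fault_prob)
```

Summing over the successors at each node implements the “OR gate” treatment of multiple paths caused by the same fault; with this convention a fault that causes no other fault yields $C P_q$.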

4.5.2 Fault Mitigation Decision Considering the Coupling Relationship

Through fault chain evaluation, key faults can be detected and then mitigated. However, a system is composed of multiple structures, which are in turn composed of multiple functional or physical components. Therefore, the system process is not a simple superposition of meta-processes, but requires an integration mechanism to achieve the transition from quantitative change to qualitative change. In the process of system integration, new processes need to be added to identify the relationships between the components and, through these relationships, to identify and deal with newly emerging fault modes. Because the faults are coupled, the effect of component fault mitigation on system faults is nonlinear. Assuming that the fault mitigation measures are reasonable and effective, all faults can be mitigated. However, new faults may be introduced by new technologies or by system integration, and they may affect the probability of occurrence of other faults. Therefore, before determining the effect of a fault mitigation on reliability, it is necessary to determine the set of faults related to that mitigation, i.e. the fault mitigation correlation set.

4.5.2.1 Correlation Relationship of Fault Mitigation

Definition 4.15 Fault correlation set. Suppose that $f = \{f_1, f_2, \ldots, f_n\}$ is the initial fault set of a product. For any fault $f_i$ ($i \in \{1, \ldots, n\}$), an improvement measure is implemented so that $f_i \notin f'$ ($f'$ represents the new fault set of that product after implementing the improvement measure), or so that the probability of occurrence (or severity category) of the fault is significantly reduced. Suppose that $\exists \{f_{i_1}, f_{i_2}, \ldots, f_{i_m}\} \subset f$ and $\exists \{f_{j_1}, f_{j_2}, \ldots, f_{j_b}\} \subset f'$ such that the following conditions are satisfied:

(1) $i_t \neq i$ ($t = 1, \ldots, m$).
(2) $\forall t \in \{1, \ldots, m\}$, $f_{i_t} \notin f'$, or the probability of occurrence or severity category of the corresponding fault is significantly reduced.
(3) $f_{j_h} \notin f$ ($h = 1, \ldots, b$).

Then the fault set $\{f_i, f_{i_1}, f_{i_2}, \ldots, f_{i_m}, f_{j_1}, f_{j_2}, \ldots, f_{j_b}\}$ is called the fault correlation set corresponding to $f_i$, denoted by $FCS_{f_i}$, or abbreviated as FCS when there is no ambiguity.

In practical engineering, it can be assumed that all fault mitigation measures are reasonable and effective. Based on this assumption, the coupling effects between the faults can be categorized into the following types:

(1) Elimination of a failure mode causes new failure modes (Type I). The failure mode $f_i$ is eliminated, but a new failure mode set $\{f_1^{eN}, f_2^{eN}, \ldots, f_{t_e}^{eN}\}$ is introduced, with the corresponding occurrence probabilities recorded as $\{\bar\beta_1^{eN}, \bar\beta_2^{eN}, \ldots, \bar\beta_{t_e}^{eN}\}$. Denote the Type I $FCS_{f_i}$ as $FCS_{I} = \{f_i, f_1^{eN}, f_2^{eN}, \ldots, f_{t_e}^{eN}\}$.

(2) Elimination of a failure mode causes the elimination of other failure modes (Type II). The failure mode $f_i$ is eliminated, and the failure mode set $\{f_1^{ee}, f_2^{ee}, \ldots, f_E^{ee}\}$ is also eliminated. Denote the Type II $FCS_{f_i}$ as $FCS_{II} = \{f_i, f_1^{ee}, f_2^{ee}, \ldots, f_E^{ee}\}$. Note that the faults eliminated at the same time need not belong to the same product; they may belong to other products that interface with it (similarly hereinafter).

(3) Elimination of a failure mode reduces the occurrence probabilities of others (Type III). The failure mode $f_i$ is eliminated, and the occurrence probabilities of the failure mode set $\{f_1^{ed}, f_2^{ed}, \ldots, f_{D_e}^{ed}\}$ are also reduced, from the original $\{\beta_1^{ed}, \beta_2^{ed}, \ldots, \beta_{D_e}^{ed}\}$ to $\{\bar\beta_1^{ed}, \bar\beta_2^{ed}, \ldots, \bar\beta_{D_e}^{ed}\}$. Denote the Type III $FCS_{f_i}$ as $FCS_{III} = \{f_i, f_1^{ed}, f_2^{ed}, \ldots, f_{D_e}^{ed}\}$.

(4) Reduction of the probability of occurrence of a failure mode introduces new failure modes (Type IV). The probability of occurrence of failure mode $f_i$ is reduced, but a new set of failure modes $\{f_1^{dN}, f_2^{dN}, \ldots, f_{t_d}^{dN}\}$ is introduced, with the corresponding occurrence probabilities $\{\bar\beta_1^{dN}, \bar\beta_2^{dN}, \ldots, \bar\beta_{t_d}^{dN}\}$. Denote the Type IV $FCS_{f_i}$ as $FCS_{IV} = \{f_i, f_1^{dN}, f_2^{dN}, \ldots, f_{t_d}^{dN}\}$.

(5) Reduction of the probability of occurrence of a failure mode reduces the probability of occurrence of other failure modes (Type V). The probability of occurrence of failure mode $f_i$ is reduced, and the occurrence probabilities of the failure mode set $\{f_1^{dd}, f_2^{dd}, \ldots, f_{D_d}^{dd}\}$ are also reduced, from the original $\{\beta_1^{dd}, \beta_2^{dd}, \ldots, \beta_{D_d}^{dd}\}$ to $\{\bar\beta_1^{dd}, \bar\beta_2^{dd}, \ldots, \bar\beta_{D_d}^{dd}\}$. Denote the Type V $FCS_{f_i}$ as $FCS_{V} = \{f_i, f_1^{dd}, f_2^{dd}, \ldots, f_{D_d}^{dd}\}$.

(6) System integration introduces new failure modes (Type VI). The interface failure modes and the new failure modes with high severity levels that emerge in the system integration process are mainly considered. They are denoted as $\{f_1^{IN}, f_2^{IN}, \ldots, f_{t_I}^{IN}\}$, and the corresponding occurrence probabilities are denoted as $\{\bar\beta_1^{IN}, \bar\beta_2^{IN}, \ldots, \bar\beta_{t_I}^{IN}\}$. Denote the Type VI $FCS_{f_i}$ as $FCS_{VI} = \{f_1^{IN}, f_2^{IN}, \ldots, f_{t_I}^{IN}\}$.

In practical engineering, because the coupling effects are neither unique nor isolated, the fault correlation set is not a single set but a union, $FCS = FCS_{I} \cup FCS_{II} \cup FCS_{III} \cup FCS_{IV}$ or $FCS = FCS_{IV} \cup FCS_{V} \cup FCS_{VI}$. When product levels are not distinguished, Type VI can be converted into Type I or Type II, since they are both new faults caused by fault mitigation.

4.5.2.2 Determination of Fault Correlation Set

In engineering applications, the FCS is difficult to determine; it has to be analyzed in combination with the product function principles and the fault propagation [30]. In this book, a layer-by-layer detection method is developed, which finds the set of faults related to a given fault by following the direction of reasoning used in the deductive method [31]. The method includes two steps. The first step is to construct a visual diagram of the fault mitigation logic tree; the second step is to calculate the FCS.

Construct a Visual Fault Mitigation Logic Tree

The fault mitigation logic tree diagram contains four layers, as follows:

The first layer (mitigation object layer). Designate the specific mitigation object, i.e. the specific fault to be mitigated.

The second layer (mitigation method layer). According to the definition of fault mitigation, mitigation can be divided into two categories: (1) the fault is eliminated; (2) the probability of the fault is reduced. When conducting deductive reasoning, the first layer of reasoning therefore exhibits an “OR” relationship between “the fault is eliminated” and “the probability of occurrence of the fault is reduced”.

The third layer (mitigation measures layer). According to the specific fault mitigation method and the causes of the fault, the measures and methods are developed to form the second layer of deductive reasoning. That is, the second layer of reasoning exhibits the logical relationship between the various improvement measures or methods, such as “AND”, “OR” or “CONDITIONAL”. This layer may contain multiple sublayers, depending on practical conditions.

The fourth layer (association mode layer). For specific improvement measures or methods, all possible related faults determined by these measures are further analyzed together with the functional principles and interface relationships of the product. In this book, a specific “pattern association gate” is proposed for the logic tree diagram. This gate uses the same algorithm as the “AND gate”, but also gives the order of elimination of the related faults, i.e. they are eliminated simultaneously, from left to right, or in decreasing order of the reduction in their occurrence probabilities.

Based on the above process, a logic tree diagram of different fault mitigations can be constructed, as shown in Fig. 4.23 as an example.

Table 4.8 Typical logic gates [27]

| Combinatory logic | Sequential logic |
|---|---|
| AND gate | PAND gate |
| OR gate | Cold standby gate (CSP) |
| Exclusive-OR gate | Sequence gate (SEQ) |
| rin gate | Functional trigger gate (FDEP) |

Calculate FCS

According to the logical relationship in the logic tree diagram, use the descending or ascending method to determine the FCS with correlation relationships.

[Case analysis] The short-circuit fault of the input filter circuit in the signal processor is mainly caused by electrolyte aging, raw material defects, and silver ion migration.

Table 4.9 Data obtained from the signal processor

| Product | $\lambda_i$ ($10^{-6}$/h) | $t_i$ | Fault | $\beta_{ij}$ |
|---|---|---|---|---|
| Signal processor | 0.0251 | 1 | The processing is not working properly | – |
| Signal processing module | 0.0141 | 1 | The signal processing function fails | 0.3 |
| | | | The performance of the signal processing function deteriorates | 0.4 |
| | | | It does not work correctly | 0.3 |
| Left axis angle converter | 0.0053 | 1 | The analog voltage has no output | 0.79 |
| | | | The analog voltage output is disordered | 0.04 |
| | | | The analog power output is out of tolerance | 0.05 |
| | | | There was an error with the discrete output | 0.08 |
| | | | The discrete quantity has no output | 0.04 |
| Left axis angle converter | 0.0053 | 1 | The analog voltage has no output | 0.79 |
| | | | The analog voltage output is disordered | 0.04 |
| | | | The analog voltage power output is out of tolerance | 0.05 |
| | | | There was an error with the discrete output | 0.08 |
| | | | The discrete quantity has no output | 0.04 |
| Right axis angle converter | 0.0053 | 1 | | |
| Connector | 0.0009 | 1 | Poor contact | 0.22 |
| | | | Open circuit | 0.44 |
| | | | Short-circuit | 0.34 |
| Input signal filter circuit | 0.0013 | 1 | Short-circuit | 0.15 |
| | | | Open circuit | 0.46 |
| | | | Parameter drift | 0.39 |
| Angle offset adjustment circuit | 0.0013 | 1 | Short-circuit | 0.15 |
| | | | Open circuit | 0.46 |
| | | | Parameter drift | 0.39 |

Table 4.10 Probabilities of occurrence of the fault chain (in part)

| Component | Fault | Probability of occurrence of the fault chain |
|---|---|---|
| Connector | Poor contact | 0.007007 |
| | Open circuit | 0.014015 |
| | Short-circuit | 0.010830 |
| Input signal filter circuit | Short-circuit | 0.000013 |
| | Open circuit | 0.000006 |
| | Parameter drift | 0.000005 |
| Angle offset adjustment circuit | Short-circuit | 0.000024 |
| | Open circuit | 0.000010 |
| | Parameter drift | 0.000009 |

Table 4.11 Commonly used reliability contractual parameters for complex products

Parameter name (range of application: system and/or unit, with √ indicating applicable): fault rate; mean time between failures $T_{BF}$; mean time between maintenance $T_{BM}$; mean time to failure $T_{TF}$; mean time between fatal failures $T_{BCF}$; mission reliability $R$.

Fig. 4.23 An example of determining the fault correlation set based on the deductive method

Fig. 4.24 Short-circuit fault mitigation correlation model of input filter circuit

Using the method given above, the logic model for determining the faults coupled with the short-circuit fault mitigation of the input filter circuit can be obtained, as shown in Fig. 4.24. It can be seen from Fig. 4.24 that the fault correlation sets coupled with the short-circuit fault mitigation of the input filter circuit are as follows:

(1) {short-circuit of the input filter circuit, no output of the analog voltage of the shaft angle converter, open-circuit of the EMI module, open-circuit of the energy storage protection circuit}.
(2) {short-circuit of the input filter circuit, parameter drift, open circuit}.
(3) {short-circuit of the input filter circuit, no output of the analog voltage of the shaft angle converter, parameter drift}.

4.5.2.3 Model of Influence of Reliability Index

Considering the characteristics of the Type II and Type III coupling effects, these two coupling effects can be combined by using indicator functions to construct the influence model of fault mitigation on the reliability indices. For complex products, the commonly used contractual reliability parameters are listed in Table 4.11. According to the types of coupling effects between the faults, the influence models of fault mitigation on reliability indices can be divided into two categories: the influence of component fault mitigation on the component's own reliability indices, and the influence of component integration on the system reliability indices.

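Reliability Index Influence Models for Components

Assume that the lifetime of component $p$ follows an exponential distribution and that the probability of occurrence of component fault $f_i$ is $\beta_i$. The probability of occurrence of $f_i$ after mitigation is then calculated as $\bar\beta_i I_i$, in which

$$I_i = \begin{cases} 1, & \text{the occurrence probability of } f_i \text{ is reduced} \\ 0, & f_i \text{ is eliminated} \end{cases}$$

Then the increment of the fault rate $\lambda_p$ of component $p$ caused by the mitigation of fault $f_i$ is calculated by:

$$\begin{aligned}
\Delta\lambda_p &= \bar\lambda_p - \lambda_p \\
&= I_i \Bigl( \sum_{j=1}^{t_d} \bar\beta_j^{dN} - \sum_{j=1}^{D_d} \bigl( \beta_j^{dd} - \bar\beta_j^{dd} \bigr) \Bigr) \\
&\quad + (1 - I_i) \Bigl( \sum_{j=1}^{t_e} \bar\beta_j^{eN} - \sum_{j=1}^{E} \beta_j^{ee} - \sum_{j=1}^{D_e} \bigl( \beta_j^{ed} - \bar\beta_j^{ed} \bigr) \Bigr) \\
&\quad - \bigl( \beta_i - \bar\beta_i I_i \bigr)
\end{aligned} \tag{4.15}$$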

If the FCS includes faults of other components, the fault rate increment $\Delta\lambda_p$ in Eq. (4.15) is separated and calculated for each component.

(1) If component $p$ can be repaired, the increment of its mean time between failures $T_{BFp}$ is calculated by:

$$\Delta T_{BFp} = -\frac{\Delta\lambda_p}{\lambda_p (\lambda_p + \Delta\lambda_p)} \tag{4.16}$$

(2) If component $p$ cannot be repaired, the increment of its mean time to failure $T_{TFp}$ is calculated by:

$$\Delta T_{TFp} = -\frac{\Delta\lambda_p}{\lambda_p (\lambda_p + \Delta\lambda_p)} \tag{4.17}$$
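The component-level model in Eqs. (4.15) to (4.17) can be sketched as follows. This is an illustrative reading of the equations, not code from the book: the function names, argument names and sample values are assumptions, and the indicator $I_i$ is derived from a boolean flag.

```python
def delta_lambda_component(beta_i, beta_i_new, eliminated,
                           new_faults=(), eliminated_faults=(), reduced_faults=()):
    """Fault rate increment of Eq. (4.15) for the mitigation of one fault f_i.

    beta_i            : occurrence probability of f_i before mitigation
    beta_i_new        : occurrence probability of f_i after mitigation
                        (ignored when the fault is eliminated)
    eliminated        : True if f_i is eliminated (I_i = 0), False if reduced (I_i = 1)
    new_faults        : probabilities of newly introduced faults (Type I or IV)
    eliminated_faults : original probabilities of co-eliminated faults (Type II);
                        only used when f_i itself is eliminated
    reduced_faults    : (before, after) pairs of co-reduced faults (Type III or V)
    """
    indicator = 0 if eliminated else 1
    coupled = sum(new_faults) - sum(before - after for before, after in reduced_faults)
    if eliminated:
        coupled -= sum(eliminated_faults)
    return coupled - (beta_i - beta_i_new * indicator)


def delta_mean_time(lam, d_lam):
    """Eqs. (4.16)/(4.17): change of 1/lambda when lambda changes by d_lam."""
    return -d_lam / (lam * (lam + d_lam))


# Hypothetical component: f_i is eliminated, one new fault appears,
# one fault is co-eliminated and one fault becomes less likely.
d_lam = delta_lambda_component(
    beta_i=0.3, beta_i_new=0.0, eliminated=True,
    new_faults=[0.05], eliminated_faults=[0.1], reduced_faults=[(0.2, 0.15)],
)
print(f"delta lambda = {d_lam:.3f}")
print(f"delta mean time = {delta_mean_time(lam=1.0, d_lam=d_lam):.3f}")
```

A negative increment of the fault rate corresponds to a positive increment of the mean time between failures (or mean time to failure), which is the sign convention of Eqs. (4.16) and (4.17).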

Reliability Index Influence Models for the System

Assume that the system is composed of $n$ components and, for the moment, that the new faults introduced by system integration are not themselves mitigated. The influence of component fault mitigation on the system reliability indices is analyzed first. According to reliability theory, the increment of the system fault rate $\lambda_s$ can be obtained as:

$$\Delta\lambda_s = \sum_{p=1}^{n} \Delta\lambda_p + \sum_{j=1}^{t_I} \bar\beta_j^{IN} \tag{4.18}$$

For a repairable system, based on Eq. (4.18), the increment of the mean time between failures $T_{BFs}$ can be expressed as:

$$\Delta T_{BFs} = -\frac{\Delta\lambda_s}{\lambda_s (\lambda_s + \Delta\lambda_s)} \tag{4.19}$$

In the product design process, only the faults resulting from inherent causes are considered. Therefore, using the statistical parameters provided by Boeing Co., the increment of the mean time between maintenance $T_{BMs}$ of aircraft is expressed as:

$$\Delta T_{BMs} = K \times \bigl[ (T_{BFs} + \Delta T_{BFs})^{\theta} - (T_{BFs})^{\theta} \bigr] \tag{4.20}$$

in which: $K$ is the environmental coefficient; $\theta$ is the complexity coefficient. These two parameters vary from product to product and are generally determined from historical statistical data. When $T_{BM}$ is considered as a contract parameter, that is, when only the faults resulting from inherent causes are considered, $K$ and $\theta$ in Eq. (4.20) can be taken as 0.59 and 0.7 respectively. In the same way as for the $T_{BFs}$ of the system, the increment of the mean time between critical failures $T_{BCFs}$ can be obtained as:

$$\Delta T_{BCFs} = \frac{\lambda_s \times \Delta T_{BFs} + \Delta\lambda_s \times T_{BFs} + \Delta\lambda_s \times \Delta T_{BFs}}{\sum_{i=1}^{k} \beta_{CFi}} \tag{4.21}$$

in which: $\dfrac{\lambda_s + \Delta\lambda_s}{\sum_{i=1}^{k} \beta_{CFi}}$ is the catastrophic fault factor [32], $\beta_{CFi}$ is the probability of occurrence of the catastrophic faults (in Categories I and II) after mitigation, and $k$ is the total number of catastrophic faults (in Categories I and II). Then, under a specified mission profile (assuming that its mission time is $T$), the increment of the mission reliability $R_s$ for a complex product whose lifetime follows an exponential distribution can be expressed as

$$\Delta R_s = e^{-T/(T_{BCFs} + \Delta T_{BCFs})} - e^{-T/T_{BCFs}} \tag{4.22}$$
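The system-level relations in Eqs. (4.18), (4.19), (4.20) and (4.22) translate directly into code. The sketch below is only an illustration under assumed inputs: the function names and all numbers are hypothetical, the default values of K and θ are the ones quoted in the text, and the $T_{BCFs}$ increment of Eq. (4.21) is passed in as a given value rather than computed.

```python
import math


def delta_lambda_system(component_deltas, integration_faults=()):
    """Eq. (4.18): sum of component increments plus new integration faults."""
    return sum(component_deltas) + sum(integration_faults)


def delta_mtbf(lam, d_lam):
    """Eq. (4.19): increment of the mean time between failures."""
    return -d_lam / (lam * (lam + d_lam))


def delta_mtbm(mtbf, d_mtbf, k=0.59, theta=0.7):
    """Eq. (4.20): increment of the mean time between maintenance."""
    return k * ((mtbf + d_mtbf) ** theta - mtbf ** theta)


def delta_mission_reliability(mission_time, mtbcf, d_mtbcf):
    """Eq. (4.22): increment of mission reliability for an exponential lifetime."""
    return math.exp(-mission_time / (mtbcf + d_mtbcf)) - math.exp(-mission_time / mtbcf)


# Hypothetical system with two mitigated components and one new interface fault.
lam_s = 0.01                                   # system fault rate before mitigation
d_lam_s = delta_lambda_system([-0.001, -0.002], integration_faults=[0.0005])
d_mtbf = delta_mtbf(lam_s, d_lam_s)
print(f"delta lambda_s = {d_lam_s:.4f}")
print(f"delta T_BFs    = {d_mtbf:.2f}")
print(f"delta T_BMs    = {delta_mtbm(1.0 / lam_s, d_mtbf):.2f}")
print(f"delta R_s      = {delta_mission_reliability(10.0, 1000.0, 100.0):.5f}")
```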

Returning to the assumption made at the beginning of this subsection, for the new faults introduced after system integration, the influence model of their mitigation on the system reliability indices can be constructed by following the approach used for the unit-level reliability indices. With the above influence models of fault mitigation on reliability indices, the influences of different FCS mitigations on the reliability indices can be obtained, and the fault mitigation plan can be determined according to the principle of providing the greatest improvement in the reliability indices. In theory, all relevant reliability indices should be considered. However, the influence models show that the other reliability indices can be derived from the fault rate, so the indices are interdependent. According to the attribute dependency reduction rule [33], the mitigation plan can therefore be determined directly from the fault rate, as expressed below:

$$\{f_{M1}, f_1^{eN}, f_1^{ee}, f_2^{ee}\} > \{f_{M2}, f_1^{ed}, f_2^{ed}\} > \cdots \tag{4.23}$$

in which $>$ represents the order relation, i.e. $\{f_{M1}, f_1^{eN}, f_1^{ee}, f_2^{ee}\}$ needs to be mitigated prior to $\{f_{M2}, f_1^{ed}, f_2^{ed}\}$. Therefore, as soon as the mitigation order is determined, the corresponding mitigation plan has also been selected from the candidate fault mitigation plans.

Example 4.7 For the example given in Fig. 4.24, the increment of the fault rate after the mitigation of each fault correlation set is calculated according to the above equations:

(1) {short-circuit of the input filter circuit, no output of the analog voltage of the shaft angle converter, open-circuit of the EMI module, open-circuit of the energy storage protection circuit}: −0.00000039.
(2) {short-circuit of the input filter circuit, parameter drift, open circuit}: −0.00000117.
(3) {short-circuit of the input filter circuit, no output of the analog voltage of the shaft angle converter, parameter drift}: −0.00000067.

Set (2) gives the largest reduction of the fault rate. Therefore, among these three mitigation plans, measures such as “increase the sealing layer and use high quality electrolyte” should be taken first, to mitigate {short-circuit of the input filter circuit, parameter drift, open circuit}. In addition, the influence of fault mitigation on the reliability, maintainability, supportability, and testability indices can be considered comprehensively by using methods such as rough sets, fuzzy sets, AHP, and TOPSIS.
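The ordering rule of Eq. (4.23) amounts to sorting the candidate fault correlation sets by their fault rate increment, most negative first. A minimal sketch using the three increments of Example 4.7 is given below; the labels "FCS 1" to "FCS 3" and the function name are shorthand introduced here for illustration.

```python
def rank_mitigation_plans(fault_rate_increments):
    """Order candidate plans so that the largest fault rate reduction comes first."""
    return sorted(fault_rate_increments, key=fault_rate_increments.get)


# Fault rate increments of the three fault correlation sets in Example 4.7.
increments = {
    "FCS 1": -0.00000039,
    "FCS 2": -0.00000117,
    "FCS 3": -0.00000067,
}

for rank, name in enumerate(rank_mitigation_plans(increments), start=1):
    print(rank, name, increments[name])
```

With these values the order is FCS 2, FCS 3, FCS 1, which agrees with the conclusion of the example.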

Chapter 5

Physics of Failure Based Fault Identification and Control Methods

Abstract In this chapter, the process of fault occurrence, common Physics of Failure (PoF) models, and visualization models are first introduced. Then, based on the fundamental concepts and principles of load-response analysis, the finite element method and simulation-based fault identification technology are introduced. Next, time-varying reliability models are developed from the analysis of the time effect of the PoF models, and methods for determining the parameters of these models are developed based on the degradation process. On this basis, the fundamental process and related methods of failure prediction and evaluation based on PoF models are given. Finally, the PoF-based design optimization and fault control methods are introduced.

Keywords Physics of failure · Load-response analysis · Failure simulation · Degradation based model · Design optimization · Fault control

© National Defense Industry Press 2024 Y. Ren et al., Model-Based Reliability Systems Engineering, https://doi.org/10.1007/978-981-99-0275-0_5

5.1 Introduction to Physics of Failure Model

5.1.1 Physical Process of a Fault

There are many causes of failure in a component-level product, such as: material and process defects introduced by improper quality control; structural design defects introduced by improper product design; damage caused by improper stress or environmental control during aging, screening, and assembly; damage caused by improper operational and environmental loads in use; and the various kinds of damage caused by human factors. In short, the failure of component-level products results from both external and internal factors. The external factors mainly include the use mode, environmental conditions, and human factors, while the internal factors primarily originate from a series of physical and chemical changes in the materials and structure of the product. Failure mechanisms, i.e. the physical and chemical changes that make the product fail, are the inherent nature of fault occurrence, whereas the external environmental loads, operational loads, and human factors are the external conditions leading to product faults. As time passes, the performance parameters of the component-level product drift beyond their specified thresholds (in extreme cases causing the loss of function directly) and finally form the faults of that component-level product [34]. Therefore, the occurrence of faults in a component-level product is ultimately caused by the physical damage mechanisms that exist in the microstructure or basic electronic circuits of the product under environmental and operational loads. The occurrence process of a product fault is illustrated in Fig. 5.1.

Fig. 5.1 Illustration of the physical process of a fault (diagram labels: environmental loads, operational loads, human factors; component-level product with sub-component structures 1 to 3; internal factors: material, structure, process; time effect; terminal condition threshold; expected outputs; error outputs)

In the ideal use of component-level products, the expected outputs are obtained from the expected input information. In actual situations, however, interference information from the external environment enters the component-level product together with the expected inputs and results in output errors. Under the action of internal and external factors, the deviation of the inputs is accelerated until it reaches a threshold, beyond which the product fault occurs. As a result, in the fault identification and control of component-level products, it is necessary first to analyze the external causes of product faults and to carry out load-effect analysis and fault identification, in order to determine the key factors that play important roles in the product failure, such as the environmental and operational loads. Then, by using PoF models, the time effect analysis and fault identification are carried out, and the key weak links of the product are determined through an integrated fault analysis of the time effect and the internal causes. Once the critical load factors have been determined, the reliability simulation analysis and evaluation of the key weak links can be carried out with the relevant PoF models, to determine the influence of each key weak link on the faults of the component-level product.
Furthermore, by applying the reliability design optimization (or multidisciplinary reliability design optimization) method to the fault optimization and control of the key weak links, potential faults can be eliminated and fault propagation paths can be blocked during the entire product life cycle.

5.1.2 Physics of Failure Model

The Physics of Failure (PoF) model can be defined as a quantitative mathematical model based on an understanding of the physical mechanism and the root cause of a product fault. Figure 5.2 shows the general input parameters of the PoF model, in which faults are no longer regarded as random events. Instead, the life or reliability of each product (or component) for different fault modes and fault sites is related to its material properties, geometric parameters, environmental conditions, and working (operational) conditions. The PoF model can quantitatively analyze the reliability of the product under the specific working and environmental conditions of a specific application. If there are uncertainties in the input parameters of the PoF model, the output parameters, i.e. the reliability, life, or time to failure (TTF), are uncertain as well. Among the numerous PoF models, those developed for components, parts, and raw materials are the basis for the quantitative fault analysis of component-level products. Several examples of such traditional PoF models are briefly introduced below.

5.1.2.1 Classical Models

The classical PoF models are the fundamental models that are often used in research on fault mechanism analysis or in reliability tests. They describe a specific failure mechanism with appropriate abstraction or simplification. These classical models mainly include the Arrhenius model (which is related to temperature stress) [34], the Eyring model (which is related to both temperature and other stresses) [35], and the inverse power law model (which is related to electrical stress) [36]. In addition, the cumulative damage model and the competitive failure model are incorporated when considering the joint effect of the multiple failure mechanisms that exist in the entire product.

Fig. 5.2 Input parameters of the PoF model (inputs: operational conditions, material properties, geometric parameters, environmental conditions; outputs: reliability/life/time to failure (TTF))

5.1.2.2 Arrhenius Model

Generally speaking, when harmful reactions in materials and components accumulate to a certain limit, failure follows. Such a phenomenon can be described by a reaction rate model. Here, harmful reactions include not only chemical reactions in the narrow sense, but also physical changes that proceed at a certain rate, such as evaporation, condensation, deformation, and crack propagation, as well as the diffusion and transport of heat, electricity, and mass. The reaction rate model is generally able to deal with all of these reactions.

Between the normal state and the degraded state there is an energy barrier, and enough energy must be provided by environmental loads to cross it. The height of this barrier is the activation energy, and the probability of crossing it follows the Boltzmann distribution. The empirical relationship between the reaction rate $K$ and the environmental temperature was found by Arrhenius in the nineteenth century and is therefore called the Arrhenius equation.

The reaction rate model was proposed to describe the relationship between stress and time, on the basis of experimental data from chemical reactions. The characteristics of a product deteriorate until failure because of undesirable chemical or physical changes (reactions), over time, of the atoms or molecules in its materials. As these changes alter the characteristics of the product, damage accumulates to the critical degree at which a fault is initiated. That is to say, the faster the reaction rate, the shorter the life of the product. From his experience with chemical experiments, Arrhenius concluded that the reaction rate decreases exponentially with the activation energy and with the reciprocal of the temperature, as given by:

$$\frac{\partial M}{\partial t} = K(T) = A e^{-E/(kT)} \tag{5.1}$$

in which $M$ represents a certain characteristic or degradation feature of the product, $\partial M / \partial t = K(T)$ represents the reaction rate (so that, at a constant temperature, $M$ changes linearly with time $t$), $T$ is the Kelvin temperature, $A$ is the pre-factor, $k$ is the Boltzmann constant, and $E$ is the activation energy (unit: eV) corresponding to a certain failure mechanism (or chemical reaction). It is noted that the activation energy $E$ remains constant for products of the same kind.
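A common use of Eq. (5.1) is to compare the reaction rates at two temperatures, since the pre-factor $A$ cancels in the ratio. The sketch below illustrates this; the function names, the activation energy of 0.7 eV and the two temperatures are assumptions chosen only for the example.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_rate(temp_k, activation_ev, prefactor=1.0):
    """Eq. (5.1): K(T) = A * exp(-E / (k * T)), with T in kelvin and E in eV."""
    return prefactor * math.exp(-activation_ev / (BOLTZMANN_EV * temp_k))


def acceleration_factor(temp_use_k, temp_stress_k, activation_ev):
    """Ratio K(T_stress) / K(T_use); the pre-factor A cancels out."""
    return (arrhenius_rate(temp_stress_k, activation_ev)
            / arrhenius_rate(temp_use_k, activation_ev))


# Hypothetical mechanism with E = 0.7 eV, used at 25 degC and tested at 125 degC.
af = acceleration_factor(temp_use_k=298.15, temp_stress_k=398.15, activation_ev=0.7)
print(f"acceleration factor = {af:.1f}")
```

Because a faster reaction rate means a shorter life, this ratio is also the factor by which the test at the higher temperature shortens the time to reach the same amount of degradation, provided the failure mechanism and its activation energy do not change.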

5.1.2.3 Eyring Model

The Arrhenius model considers the influence of temperature stress alone on the physical and chemical properties of products and materials. In engineering practice, however, products are often affected simultaneously by multiple stresses, such as voltage, mechanical stress, and other environmental stresses. According to the principles of quantum mechanics, the relationship between the chemical reaction rate, the temperature, and the other stresses is derived as follows:

$$K(T, S) = \frac{dM}{dt} = A \frac{kT}{h} e^{-E/(kT)} e^{S(C + D/(kT))} = K_0 f_1 f_2 \tag{5.2}$$

in which $T$ is the Kelvin temperature, $S$ is the non-thermal stress, $dM/dt$ represents the chemical reaction rate, $K_0 = A \frac{kT}{h} e^{-E/(kT)}$ represents the Eyring reaction rate under temperature stress only, $h$ is the Planck constant, $E$ is the activation energy, $k$ is the Boltzmann constant, $f_1 = e^{CS}$ is the correction factor for the energy distribution that accounts for the non-temperature stress, $f_2 = e^{DS/(kT)}$ is the correction factor for the activation energy that accounts for the non-temperature stress, and $A$, $C$ and $D$ are undetermined constants.

Equation (5.2) is called the Eyring model. It is regarded as a multi-stress model for accelerated tests under multiple stresses such as temperature and electric field, and it is more widely applicable in practice than the Arrhenius model. For example, under electrical stress such as voltage and current, effects such as ion migration and electromigration inside an electronic device will cause device faults. Under working conditions with constant temperature and high humidity, electronic device faults are caused by the joint effects of corrosion and peeling due to the electrochemical reaction in the metal electrode system.
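Equation (5.2) can be evaluated numerically once the constants are known. The following sketch shows the structure $K_0 f_1 f_2$ explicitly; the function name and the values chosen for $A$, $E$, $C$, $D$ and the stress $S$ are purely illustrative assumptions, since in practice these constants are fitted to test data.

```python
import math

BOLTZMANN_EV = 8.617333262e-5   # Boltzmann constant in eV/K
PLANCK_EV_S = 4.135667696e-15   # Planck constant in eV*s


def eyring_rate(temp_k, stress, a, activation_ev, c, d):
    """Eq. (5.2): K(T, S) = K0 * f1 * f2."""
    kt = BOLTZMANN_EV * temp_k
    k0 = a * (kt / PLANCK_EV_S) * math.exp(-activation_ev / kt)  # temperature only
    f1 = math.exp(c * stress)        # correction of the energy distribution
    f2 = math.exp(d * stress / kt)   # correction of the activation energy
    return k0 * f1 * f2


# Hypothetical constants: compare the rate with and without a non-thermal stress.
base = eyring_rate(temp_k=350.0, stress=0.0, a=1.0, activation_ev=0.7, c=0.1, d=0.01)
stressed = eyring_rate(temp_k=350.0, stress=5.0, a=1.0, activation_ev=0.7, c=0.1, d=0.01)
print(f"rate ratio due to the non-thermal stress = {stressed / base:.2f}")
```

Setting the stress to zero makes both correction factors equal to one, so the function then returns the temperature-only rate $K_0$.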

5.1.2.4 Cumulative Damage Model

The cumulative damage model is used to describe the degradation process of products and materials under different stress levels, by assuming that the degradation mechanism or failure mechanism remains unchanged even if the stress level changes. The widely used linear cumulative damage model (also known as Miner’s rule) was proposed by M. A. Miner in 1945 to explain the cyclic fatigue of mechanical materials [37]. For a product material under stress, the internal defects are generally divided into two types: reversible defects, which disappear after the stress is removed, and irreversible defects, which persist. The damage process of an irreversible defect can differ depending on the applied stress, such as the mechanical stress on the metal electrode material of a component.

According to Miner’s linear cumulative damage theory, a certain amount of damage occurs when the product operates at stress level $S_i$, and the amount of damage is related to the time $\Delta t_i$ spent at $S_i$ and to the total time (i.e. life) $t_i$ under $S_i$. The damage ratio (DR) under a certain stress level can then be approximately determined by the ratio between the actual working time under that stress level and the expected Time to Failure (TTF) under it, so that:

$$DR = \sum_{i=1}^{n} \frac{\Delta t_i}{t_i} \tag{5.3}$$

in which $t_i$ represents the TTF (in hours) of the product under the $i$th stress level (usually corresponding to a certain failure mechanism), $\Delta t_i$ represents the actual working time (in hours) of the product under that stress level, and $DR$ represents the cumulative damage ratio of the product under the $n$ different stress levels during its operation. The product is regarded as failed when $DR \geq 1$.
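Miner's rule in Eq. (5.3) is a simple sum of time fractions. A minimal sketch is shown below; the function names and the two-level load profile are hypothetical.

```python
def damage_ratio(exposures):
    """Eq. (5.3): sum of (working time at level i) / (TTF at level i).

    exposures is an iterable of (hours_at_level, ttf_at_level) pairs.
    """
    return sum(hours / ttf for hours, ttf in exposures)


def has_failed(exposures):
    """The product is regarded as failed when the damage ratio reaches 1."""
    return damage_ratio(exposures) >= 1.0


# Hypothetical load profile: 100 h at a level with a 1000 h TTF,
# then 200 h at a harsher level with a 500 h TTF.
profile = [(100.0, 1000.0), (200.0, 500.0)]
print(f"DR = {damage_ratio(profile):.2f}, failed = {has_failed(profile)}")
```

In this profile half of the life has been consumed (DR = 0.5), so the remaining life at any single stress level would be half of the TTF at that level under the same linear assumption.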

5.1.2.5 Competitive Failure Model

When a PoF model is used to predict or evaluate product reliability, faults driven by the joint action of multiple mechanisms are usually handled with the competitive failure model (also called the weakest link model) [38]. In this model, only the most important components are considered, and the life of the product is determined by the earliest failure among those components (assuming that maintenance of the system is not considered). The TTF corresponding to each failure mechanism is treated as an independent random variable, regardless of whether the product is a device, integrated circuit, component or subsystem. The failure rate is generally not assumed to be constant. If $T_1, T_2, \ldots, T_n$ are the random TTFs corresponding to n potential failure mechanisms of a product, then the TTF of the product is

$$T_s = \min(T_1, T_2, \ldots, T_n) \qquad (5.4)$$

For a given set of environmental load conditions, the reliability of the product is a function of time. The reliability at time t can be expressed by:

$$R_s(t) = P(T_s \ge t) \qquad (5.5)$$

Then

$$R_s(t) = P[(T_1 \ge t) \cap (T_2 \ge t) \cap \ldots \cap (T_n \ge t)] \qquad (5.6)$$

If the n TTFs corresponding to the n failure mechanisms of the product are independent, then

$$R_s(t) = P(T_1 \ge t)\,P(T_2 \ge t) \ldots P(T_n \ge t) \qquad (5.7)$$

Then

$$R_s(t) = R_1(t)\,R_2(t) \ldots R_n(t) \qquad (5.8)$$

and, since the failure rate is $\lambda(t) = -\mathrm{d}\ln R(t)/\mathrm{d}t$,

$$\lambda_s(t) = \lambda_1(t) + \lambda_2(t) + \cdots + \lambda_n(t) \qquad (5.9)$$

in which $T_i$ (i = 1,…,n) is the TTF of the product under the i-th failure mechanism, which can be calculated by the corresponding PoF model, $T_s$ is the TTF of the product, $R_s(t)$ is the reliability of the product at time t, $R_i(t)$ (i = 1,…,n) is the reliability of the product with respect to the i-th failure mechanism, $\lambda_s(t)$ is the failure rate of the product at time t, and $\lambda_i(t)$ (i = 1,…,n) is the failure rate of the product with respect to the i-th failure mechanism.
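Equations (5.8) and (5.9) can be illustrated numerically. The sketch below assumes, purely for illustration, that each mechanism's TTF follows a Weibull distribution; the Weibull choice and the parameter values are assumptions and are not taken from the text.

```python
import math

# Hypothetical (eta, beta) Weibull parameters, one pair per failure mechanism.
MECHANISMS = [(5000.0, 2.0), (8000.0, 1.5), (12000.0, 3.0)]


def weibull_reliability(t, eta, beta):
    """R_i(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def weibull_hazard(t, eta, beta):
    """lambda_i(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def system_reliability(t):
    """Eq. (5.8): product of the per-mechanism reliabilities."""
    r = 1.0
    for eta, beta in MECHANISMS:
        r *= weibull_reliability(t, eta, beta)
    return r


def system_hazard(t):
    """Eq. (5.9): sum of the per-mechanism failure rates."""
    return sum(weibull_hazard(t, eta, beta) for eta, beta in MECHANISMS)


t = 1000.0
print(f"R_s({t:.0f} h) = {system_reliability(t):.4f}")
print(f"lambda_s({t:.0f} h) = {system_hazard(t):.3e} per hour")
```

Because the system reliability is a product, it can never exceed the reliability of the weakest mechanism, which is why the model is also called the weakest link model.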

5.1.2.6 Typical Models

With the rapid development of microelectronics technology, new materials, new processes, and new devices continue to emerge, and the corresponding failure mechanisms of the electronic products have also been established and accumulated, under electrical stresses (such as electromigration, time-dependent dielectric breakdown (TDDB), conductive filament formation (CFF), etc.), mechanical stresses (such as fatigue, corrosion, etc.), and thermal stresses (such as stress driven diffusion void (SDDV), etc.). In this sub-section, models of electromigration, TDDB, and corrosion are briefly introduced.

5.1.2.7 Electromigration Model

When a strong current flows through a metal interconnect, the metal ions move under the action of the current and other factors, forming voids or cracks in the interconnect. This phenomenon is called electromigration [39]. When a high-density current flows through a metal film or metal interconnect, conduction electrons with high momentum exchange momentum with the metal atoms (positive ions), creating an “electron wind” that drives the metal atoms to migrate in the direction of the electron flow (as shown in Fig. 5.3a). Electromigration causes metal atoms to accumulate at the anode and form hillocks or whiskers, which can lead to short circuits between electrodes. At the cathode, the clustering of metal vacancies forms a cavity, which can lead to an open circuit. In essence this is a mass transport process in the metallization system (as shown in Fig. 5.3b). Figure 5.4 shows SEM photos of electromigration in different parts.

Fig. 5.3 Electromigration and its formation process (labels: cavity, hillock). a Electromigration phenomenon; b formation mechanism of electromigration

Fig. 5.4 SEM photos of electromigration. a metal interconnect; b BGA solder

Electromigration is caused by the diffusion of metal ions. There are three kinds of metal ion diffusion: surface diffusion, lattice diffusion, and grain boundary diffusion. Diffusion in metal interconnects is in general dominated by grain boundary diffusion, whereas diffusion in copper (Cu) interconnects is dominated by surface diffusion. The driving forces of diffusion mainly include: the combined force generated by the external electric field and the momentum exchange between electrons and metal ions, the diffusive force generated by the non-equilibrium ion concentration, the mechanical stress generated by a longitudinal pressure gradient, and the thermal stress generated by a temperature gradient. These stresses can cause discontinuities in the ion current density, which result in electromigration.

In addition to the aforementioned external stresses, electromigration is also affected by geometric factors. Under high current densities, a mechanical stress gradient develops along the metal interconnect. Within a certain range of small current densities, the electromigration lifetime decreases as the interconnect length increases; beyond a threshold length, however, a further increase of the interconnect length no longer affects the electromigration lifetime. When the interconnect width is comparable to or smaller than the grain size, grain boundary diffusion decreases and gives way to lattice diffusion and surface diffusion. In addition, the existence of corners, steps, and contact holes increases the local stress gradient and accelerates electromigration.

The third type of factor that affects electromigration is the metallic material of the interconnect itself. Generally, alloying can effectively inhibit electromigration. For example, doping with a small amount of copper can greatly improve the lifetime of aluminum interconnects, because copper atoms absorbed along the grain boundaries reduce the diffusion area; doping with a small amount of silicon can also improve reliability.

The electromigration model relates the electromigration of electronic components to the current density, geometrical size, material properties and temperature distribution of the metal interconnect. The current flowing through the metal interconnect can be either direct or alternating; however, the electromigration model for alternating current is developed from the PoF models derived for direct current. The electromigration life can be predicted by:

$$\mathrm{MTTF} = \frac{W d\, T^{m}}{C j^{n}} \exp\left(\frac{E_a}{kT}\right) \qquad (5.10)$$

in which W and d are the shape parameters of the metal interconnect (the cross-sectional area of the interconnect is usually calculated as W × d), T is the absolute temperature (in kelvin), j is the average current density, m and n are the fault intensity factors (for instance, n = m = 1 at low current densities and n = m = 3 at high current densities), C is a factor related to the geometrical size and temperature of the metal interconnect, $E_a$ is the activation energy, and k is the Boltzmann constant.
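A minimal sketch of Eq. (5.10) is given below. The function name, the default exponents and all numerical inputs are hypothetical; in practice C, m, n and the activation energy must be fitted to test data for the specific metallization.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def em_mttf(w, d, temp_k, j, ea_ev, c, m=1.0, n=1.0):
    """Electromigration MTTF per Eq. (5.10).

    w, d    : shape parameters of the interconnect cross-section
    temp_k  : absolute temperature in kelvin
    j       : average current density
    ea_ev   : activation energy in eV
    c       : geometry- and temperature-related factor
    m, n    : fault intensity exponents
    """
    arrhenius = math.exp(ea_ev / (BOLTZMANN_EV * temp_k))
    return (w * d * temp_k ** m) / (c * j ** n) * arrhenius


# Hypothetical inputs, used only to compare two operating temperatures.
common = dict(w=1e-6, d=0.5e-6, j=1e10, ea_ev=0.7, c=1.0)
ratio = em_mttf(temp_k=350.0, **common) / em_mttf(temp_k=400.0, **common)
print(f"Lifetime at 350 K is about {ratio:.1f} times the lifetime at 400 K")
```

Comparing lifetimes as a ratio is convenient because the unknown factor C and the geometry terms cancel, leaving only the temperature and current density dependence.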

5.1.2.8 TDDB Model

Generally, the dielectric breakdown of a MOS (metal–oxide–semiconductor) device refers to the instantaneous breakdown that occurs when the electric field applied to the dielectric material reaches or exceeds its critical threshold. In MOS devices and their ICs, the thin SiO2 layer under the gate is commonly called the gate oxide (dielectric). The quality of the gate oxide layer strongly depends on its leakage properties. When the leakage increases to a certain extent, dielectric breakdown occurs and eventually leads to a device fault. There are two types of gate oxide breakdown: instantaneous breakdown and time-dependent dielectric breakdown (TDDB). TDDB refers to breakdown that occurs not instantaneously but after a certain period of time, when the applied electric field is lower than the intrinsic dielectric strength of the gate oxide. During this period, defects (traps) are generated and gradually accumulate in the oxide layer. It is generally believed that the breakdown of the gate oxide layer is caused by the combined action of thermal stress and electrical stress. Therefore, it is related to the external electric field, the activation energy of the dielectric material, and temperature [40]. In addition, the breakdown time is also related to the gate capacitance area and gate voltage. There are two kinds of TDDB models, as shown below.

(1) Thermal-chemical model (E model)

$$\mathrm{TTF}_E = A \exp(-\gamma E) \exp\left(\frac{E_{a1}}{kT}\right) \qquad (5.11)$$

(2) Anode hole injection model (1/E model)

$$\mathrm{TTF}_{1/E} = \tau \exp\left(\frac{G}{E}\right) \exp\left(\frac{E_{a2}}{kT}\right) \qquad (5.12)$$

in which A and τ are pre-factors, γ is the electric field acceleration factor, G is a constant, E is the electric field strength applied to the gate oxide layer, $E_{a1}$ and $E_{a2}$ are thermal activation energies, k is the Boltzmann constant, and T is the absolute temperature (in kelvin). In practical applications, the E model and the 1/E model are usually used together as a unified model:

$$\frac{1}{\mathrm{TTF}} = \frac{1}{\mathrm{TTF}_E} + \frac{1}{\mathrm{TTF}_{1/E}} \qquad (5.13)$$

5.1.2.9 Corrosion Model

Corrosion refers to the chemical or electrochemical degradation of materials and exhibits time-dependent wear-out failure; examples are shown in Fig. 5.5. From a macro point of view, it can lead to brittle fracture or the propagation of fatigue cracks; from a micro point of view, it changes the electrical and thermal properties of the material. The corrosion rate is related to factors such as material properties and crystal structure, ionic contaminants, and geometric dimensions [41].

(1) Common corrosion failures are divided into:

➀ Uniform corrosion. The corrosion reaction proceeds over the entire exposed surface, making the material thinner and thinner until it is completely corroded away.

➁ Galvanic corrosion. It arises from contact between dissimilar metals, and its severity depends strongly on the difference in the electrochemical properties of these materials.

➂ Stress corrosion. It occurs under the joint action of corrosion and mechanical stress. In parts under tension, anodic dissolution tends to occur and reduces the anode area, which further promotes stress concentration in that area and thereby initiates a vicious cycle.


Fig. 5.5 Pictures of common corrosion failures

(2) Corrosion failures can also be divided into the following types, according to the types of products and surface films:

➀ Creep corrosion. When a base metal and a noble metal, such as copper and gold, are joined, no oxide film is created on the bonding surface; the copper therefore gradually “climbs” onto the gold side and causes corrosion.

➁ Pinhole corrosion. Water vapor condensed on the coating surface forms an electrolyte film composed of water and impurity ions, which induces galvanic corrosion. It often starts from a pinhole and gradually grows into pinhole corrosion or even crevice corrosion.

➂ Dry corrosion. It occurs when the metal is exposed to an oxidizing environment (such as oxygen and sulfur).

The PoF based corrosion model is given by:

$$\mathrm{MTTF} = A\,(RH)^{-n} \exp\left(\frac{E_a}{kT}\right) \qquad (5.14)$$

in which A is a constant that depends on the corrosion area, RH is the relative humidity, n is an empirical constant that is usually taken as 3, $E_a$ is the activation energy, k is the Boltzmann constant, and T is the environmental temperature (in kelvin).
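A minimal sketch of Eq. (5.14) follows. It assumes that the relative humidity is given in percent and that the activation energy is in eV; the function name and the numerical values are hypothetical.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def corrosion_mttf(a, rh_percent, temp_k, ea_ev, n=3.0):
    """Corrosion MTTF per Eq. (5.14).

    a          : corrosion-area dependent constant
    rh_percent : relative humidity in percent (assumed unit)
    temp_k     : environmental temperature in kelvin
    ea_ev      : activation energy in eV
    n          : empirical humidity exponent, usually 3
    """
    if not 0 < rh_percent <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    return a * rh_percent ** (-n) * math.exp(ea_ev / (BOLTZMANN_EV * temp_k))


# Hypothetical comparison: the same part at 40 % RH and at 80 % RH, both at 300 K.
dry = corrosion_mttf(a=1.0, rh_percent=40.0, temp_k=300.0, ea_ev=0.8)
humid = corrosion_mttf(a=1.0, rh_percent=80.0, temp_k=300.0, ea_ev=0.8)
print(f"Doubling RH shortens the predicted life by a factor of {dry / humid:.1f}")
```

With n = 3, doubling the relative humidity at a fixed temperature reduces the predicted life by a factor of 2³ = 8, independent of A and of the activation energy.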

5.1.3 Visualization Model of the Fault

Product designers are the ones who carry out fault reduction design. Because designers mainly use 3D digital design software such as UG and CATIA, the failure mode, failure mechanism and other failure characteristics should all be reflected in the digital model to achieve a unified design of both hexability and special characteristics. However, PoF models are built on mathematical equations and uncertainties, and are therefore difficult to display visually in 3D models. In view of this, this book provides a set of visual representation methods for the 3D expression of fault characteristics. With these methods, designers can directly use the product’s three-dimensional digital model to make monitoring decisions for faults and to prioritize remedies.

5.1.3.1 3D Model Simplification

The actual product has a very complicated 3D structure, which is not conducive to the display of fault characteristics. Therefore, the 3D model of the product needs to be simplified: a product with an irregular structure is simplified into a regular 3D model according to the following principles:

(1) For a physical component at the lowest level of the 3D model, the envelope principle is applied to construct a rectangular frame that encloses the component; its length, width, and height are recorded as $L_\varepsilon$, $W_\varepsilon$, and $H_\varepsilon$ respectively.

Example 5.1 For the cone shown in Fig. 5.6, the smallest rectangular frame is constructed by using the envelope principle, as shown in Fig. 5.7.

(2) For a physical component that is not at the lowest level of the 3D model, the envelope principle is applied to construct a rectangular frame that encloses the physical components at its lower level, as shown in Fig. 5.8.

The rectangular frame obtained through the 3D model simplification inherits all attributes of the physical component, including the basic information, connection relations, and fault information.

Fig. 5.6 Example of a cone

Fig. 5.7 The smallest rectangular frame that encloses the cone

Fig. 5.8 Non-lowest-level physical unit 3D wireframe
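The envelope principle can be sketched as a bounding-box computation. The sketch below assumes an axis-aligned frame (the text does not state whether the frame may be rotated), and the function names and the sampled cone points are hypothetical.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (minimum corner, maximum corner)


def envelope(points: Iterable[Point]) -> Box:
    """Smallest axis-aligned rectangular frame enclosing the given points."""
    pts: List[Point] = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    mins = tuple(min(p[k] for p in pts) for k in range(3))
    maxs = tuple(max(p[k] for p in pts) for k in range(3))
    return mins, maxs


def envelope_of_boxes(boxes: Iterable[Box]) -> Box:
    """Frame of a higher-level unit: encloses the frames of its lower-level units."""
    corners = [corner for box in boxes for corner in box]
    return envelope(corners)


def dimensions(box: Box) -> Tuple[float, float, float]:
    """Length, width and height (L, W, H) of a frame."""
    (x0, y0, z0), (x1, y1, z1) = box
    return x1 - x0, y1 - y0, z1 - z0


# Hypothetical cone: base circle of radius 1 in the plane z = 0, apex at z = 2.
cone_points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0),
               (0.0, -1.0, 0.0), (0.0, 0.0, 2.0)]
cone_box = envelope(cone_points)
print("L, W, H of the cone frame:", dimensions(cone_box))
```

Applying `envelope_of_boxes` to the frames of all lower-level components gives the frame of the unit one level up, which mirrors principle (2).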

5.1.3.2 Visualization of the Fault in a Component-Level Product

5.1.3.3 Extraction of Visualization Features

Fault attributes include the fault cause, fault effect, severity level, fault occurrence probability (level), design improvement measures, use compensation measures, fault criticality, etc. The set of all these attributes is called the information space of the fault, and for each fault it has multiple dimensions. The purpose of fault visualization is to help designers track faults explicitly and then identify the critical faults. Therefore, when building the visualization data set, the key attributes that characterize the fault in the information space should be selected in order to reduce the data dimension and interference. When determining the critical faults, the severity level and the occurrence probability are the two characteristics that designers care about most. This book therefore mainly discusses the visual design of these two characteristics. Assuming that product ε contains $n_\varepsilon$ faults, each of which has m attributes, the information space formed by all its fault attributes can be expressed as

$$O_\varepsilon = \begin{bmatrix} O_{\varepsilon 1} \\ O_{\varepsilon 2} \\ \vdots \\ O_{\varepsilon n_\varepsilon} \end{bmatrix} = \begin{bmatrix} C_{\varepsilon 11} & C_{\varepsilon 12} & \cdots & C_{\varepsilon 1m} \\ C_{\varepsilon 21} & C_{\varepsilon 22} & \cdots & C_{\varepsilon 2m} \\ \vdots & \vdots & \ddots & \vdots \\ C_{\varepsilon n_\varepsilon 1} & C_{\varepsilon n_\varepsilon 2} & \cdots & C_{\varepsilon n_\varepsilon m} \end{bmatrix} \qquad (5.15)$$

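By using a disjunctive function $E_v$, the fault characteristic collection can be obtained:

$$O_\varepsilon = \begin{bmatrix} O_{\varepsilon 1} \\ O_{\varepsilon 2} \\ \vdots \\ O_{\varepsilon n_\varepsilon} \end{bmatrix} = \begin{bmatrix} C_{\varepsilon 11} & C_{\varepsilon 12} & \cdots & C_{\varepsilon 1m} \\ C_{\varepsilon 21} & C_{\varepsilon 22} & \cdots & C_{\varepsilon 2m} \\ \vdots & \vdots & \ddots & \vdots \\ C_{\varepsilon n_\varepsilon 1} & C_{\varepsilon n_\varepsilon 2} & \cdots & C_{\varepsilon n_\varepsilon m} \end{bmatrix} \xrightarrow{E_v} T_\varepsilon = \begin{bmatrix} T_{\varepsilon 1} \\ T_{\varepsilon 2} \\ \vdots \\ T_{\varepsilon n_\varepsilon} \end{bmatrix} = \begin{bmatrix} T^{S}_{\varepsilon 1} & T^{P}_{\varepsilon 1} \\ T^{S}_{\varepsilon 2} & T^{P}_{\varepsilon 2} \\ \vdots & \vdots \\ T^{S}_{\varepsilon n_\varepsilon} & T^{P}_{\varepsilon n_\varepsilon} \end{bmatrix} \qquad (5.16)$$

in which $T_\varepsilon$ is the fault characteristic set of product ε, $T^{S}_{\varepsilon i}$ is the severity level of fault i in product ε, and $T^{P}_{\varepsilon i}$ is the occurrence probability (level) of fault i in product ε. It follows that $T_{\varepsilon i} = [T^{S}_{\varepsilon i}, T^{P}_{\varepsilon i}]$.

5.1.3.4 Design of the Graphic Element for Visualization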

In the fault visualization model of this book, a fault is represented by a sphere. The color and size of the sphere indicate the severity level and the occurrence probability of the fault, respectively.

(1) Severity levels

The severity level characterizes how severe the consequence of the fault is. In daily life, severity and danger are usually signaled by red, yellow and other warning colors. Following this custom, this book uses different colors to distinguish severity levels. Figure 5.9 shows the colors corresponding to the 4 severity levels. Red indicates the highest severity, i.e. level I (catastrophic); yellow indicates high severity, i.e. level II (fatal failure); blue indicates moderate severity, i.e. level III (critical failure); green indicates the lowest severity, i.e. level IV (slight or no impact). The color of the sphere for each severity level is specified in RGB coordinates, namely

$$T^{S}_{\varepsilon i} \xrightarrow{\mathrm{RGB}} V^{S}_{\varepsilon i} = RGB(R_{\varepsilon i}, G_{\varepsilon i}, B_{\varepsilon i}) \qquad (5.17)$$

in which $R_{\varepsilon i}$, $G_{\varepsilon i}$ and $B_{\varepsilon i}$ are the red, green and blue values of fault i in product ε.

Fig. 5.9 Visualization of severity levels (scale from high to low severity level)

Fig. 5.10 Visualization of occurrence probability levels (scale from high to low occurrence probability level)

For the commonly used 4 severity levels (I, II, III and IV), numbered 4, 3, 2, 1 respectively (i.e. $S_{\varepsilon i}$ = 4, 3, 2 and 1), the color values can be expressed as:

$$V^{S}_{\varepsilon i} = RGB(R_{\varepsilon i}, G_{\varepsilon i}, B_{\varepsilon i}) = \begin{cases} (0, 255, 0) & (S_{\varepsilon i} = 1) \\ (0, 0, 255) & (S_{\varepsilon i} = 2) \\ (255, 255, 0) & (S_{\varepsilon i} = 3) \\ (255, 0, 0) & (S_{\varepsilon i} = 4) \end{cases} \qquad (5.18)$$

(2) Occurrence probability (level) of the fault

The occurrence probability of the fault is expressed by the radius of the sphere: the larger the radius, the greater the occurrence probability, as shown in Fig. 5.10. In engineering practice, the occurrence probability is characterized either quantitatively, as a probability value, or qualitatively, as a grade. To unify the two, the occurrence probability is divided into 5 levels, ordered A, B, C, D, E (or 5, 4, 3, 2, 1), and quantitative probabilities are classified by:

$$D_i = f_1(P_i) = \begin{cases} 5 & (I_4 < P_i < 1) \\ 4 & (I_3 < P_i \le I_4) \\ 3 & (I_2 < P_i \le I_3) \\ 2 & (I_1 < P_i \le I_2) \\ 1 & (0 < P_i \le I_1) \end{cases} \qquad (5.19)$$

in which $I_1 < I_2 < I_3 < I_4$ are the probability thresholds that separate the levels.
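Equations (5.18) and (5.19) amount to two lookups, sketched below. The threshold values I1 to I4 are hypothetical (the text does not state them), and the function names are illustrative.

```python
from bisect import bisect_left

# Eq. (5.18): severity number (1 = level IV ... 4 = level I) to RGB color.
SEVERITY_RGB = {
    1: (0, 255, 0),    # green
    2: (0, 0, 255),    # blue
    3: (255, 255, 0),  # yellow
    4: (255, 0, 0),    # red
}

# Hypothetical thresholds (I1, I2, I3, I4), in ascending order.
THRESHOLDS = (1e-6, 1e-5, 1e-4, 1e-3)


def probability_level(p, thresholds=THRESHOLDS):
    """Eq. (5.19): map a probability in (0, 1) to a level from 1 (E) to 5 (A).

    bisect_left counts the thresholds strictly below p, so a probability
    equal to a threshold falls into the lower level, as in Eq. (5.19).
    """
    if not 0 < p < 1:
        raise ValueError("probability must lie in (0, 1)")
    return bisect_left(thresholds, p) + 1


def severity_color(severity_number):
    """Eq. (5.18): RGB triple for a severity number 1..4."""
    return SEVERITY_RGB[severity_number]


level = probability_level(4.05e-5)
print("level:", level, "letter:", "EDCBA"[level - 1], "color:", severity_color(4))
```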

In order to convey both the number of faults and a clear visual effect of the spheres in the rectangular frame, the relationship between the radius of a sphere and its enclosing frame should first be determined, with the following basic rules:

(1) No fault sphere may extend beyond its enclosing rectangular frame (i.e. the simplified structure of the product).

(2) The fault spheres within a rectangular frame should be randomly distributed.

(3) Locations with many faults should be clearly distinguishable visually from locations with few faults.


In order to obtain the radius $V^{P}_{\varepsilon i}$ of a fault sphere in the rectangular frame of product ε, it is first assumed that all fault spheres share the same radius $V^{P}_{\varepsilon}$, which is calculated as follows:

$$V^{P}_{\varepsilon} = \mathrm{Min}\left(\frac{L_\varepsilon}{2}, \frac{W_\varepsilon}{2}, \frac{H_\varepsilon}{2}, \sqrt[3]{\frac{3\,Cu_\varepsilon}{8\pi N_\varepsilon}}\right) \qquad (5.20)$$

in which $Cu_\varepsilon = L_\varepsilon \times H_\varepsilon \times W_\varepsilon$ is the volume of rectangular frame ε and $N_\varepsilon$ is the number of faults in rectangular frame ε. Next, combining this with the occurrence probability levels defined above, the radii of the fault spheres at different levels are obtained:

$$V^{P}_{\varepsilon i} = f_2(D_i) = \begin{cases} 1 \times V^{P}_{\varepsilon} & (D_i = 5) \\ 0.8 \times V^{P}_{\varepsilon} & (D_i = 4) \\ 0.6 \times V^{P}_{\varepsilon} & (D_i = 3) \\ 0.4 \times V^{P}_{\varepsilon} & (D_i = 2) \\ 0.2 \times V^{P}_{\varepsilon} & (D_i = 1) \end{cases} \qquad (5.21)$$

Based on the above equations, the mapping from the occurrence probability of fault i to the radius of its fault sphere is:

$$T^{P}_{\varepsilon i} \xrightarrow{f_2(f_1,\ \min)} V^{P}_{\varepsilon i} \qquad (5.22)$$
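A minimal sketch of Eqs. (5.20) and (5.21) is given below; the function names and the frame dimensions in the usage lines are hypothetical. Note that the cube-root term in Eq. (5.20) is equivalent to requiring that $N_\varepsilon$ spheres of the base radius together occupy half of the frame volume, since $N_\varepsilon \cdot \tfrac{4}{3}\pi r^3 = Cu_\varepsilon / 2$ gives $r^3 = 3Cu_\varepsilon / (8\pi N_\varepsilon)$.

```python
import math

# Eq. (5.21): scale factor applied to the base radius for each probability level.
LEVEL_SCALE = {5: 1.0, 4: 0.8, 3: 0.6, 2: 0.4, 1: 0.2}


def base_radius(length, width, height, n_faults):
    """Eq. (5.20): common base radius of the fault spheres in one frame."""
    if n_faults < 1:
        raise ValueError("the frame must contain at least one fault")
    volume = length * width * height
    packed = (3.0 * volume / (8.0 * math.pi * n_faults)) ** (1.0 / 3.0)
    return min(length / 2.0, width / 2.0, height / 2.0, packed)


def fault_radius(level, base):
    """Eq. (5.21): radius of a fault sphere with occurrence probability level 1..5."""
    return LEVEL_SCALE[level] * base


# Hypothetical frame of 20 x 10 x 8 containing four faults.
base = base_radius(20.0, 10.0, 8.0, 4)
print("base radius:", round(base, 3))
print("radii by level:", {lvl: round(fault_radius(lvl, base), 3) for lvl in LEVEL_SCALE})
```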

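(3) Fault site

Taking one corner of rectangular frame ε as the origin, the edges along its length, width, and height are taken as the X-axis, Y-axis, and Z-axis respectively, so that frame ε lies in the first octant of this coordinate system. Then for any fault i ($1 \le i \le N_\varepsilon$) in rectangular frame ε, the center coordinates $U_{\varepsilon i}$ of its fault sphere are calculated as follows:

$$U_{\varepsilon i} = (X_{\varepsilon i}, Y_{\varepsilon i}, Z_{\varepsilon i}) = f_3\left(V^{P}_{\varepsilon}, L_\varepsilon, W_\varepsilon, H_\varepsilon\right) = \mathrm{rand}\begin{pmatrix} [V^{P}_{\varepsilon},\ L_\varepsilon - V^{P}_{\varepsilon}] \\ [V^{P}_{\varepsilon},\ W_\varepsilon - V^{P}_{\varepsilon}] \\ [V^{P}_{\varepsilon},\ H_\varepsilon - V^{P}_{\varepsilon}] \end{pmatrix}^{T}$$

$$\text{s.t.}\quad d\left(U_{\varepsilon i}, U_{\varepsilon j}\right) \ge V^{P}_{\varepsilon i} + V^{P}_{\varepsilon j} \quad (\forall\, 1 \le i, j \le N_\varepsilon;\ i \ne j) \qquad (5.23)$$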
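One way to satisfy the constraint in Eq. (5.23) is rejection sampling: draw a random center and redraw it whenever the new sphere would overlap one already placed. The text does not prescribe a sampling method, so the sketch below is only one possible reading; the function name, retry limit and input values are hypothetical.

```python
import math
import random


def place_spheres(radii, length, width, height, base, rng=None, max_tries=10000):
    """Random sphere centers per Eq. (5.23), found by rejection sampling.

    radii : radius of each fault sphere
    base  : common base radius, used as the margin from the frame faces
    Returns one (x, y, z) center per sphere, in the order of `radii`.
    """
    rng = rng or random.Random()
    centers = []
    for r in radii:
        for _ in range(max_tries):
            candidate = (
                rng.uniform(base, length - base),
                rng.uniform(base, width - base),
                rng.uniform(base, height - base),
            )
            # Accept only if the new sphere does not overlap any placed sphere.
            if all(math.dist(candidate, c) >= r + r_c
                   for c, r_c in zip(centers, radii)):
                centers.append(candidate)
                break
        else:
            raise RuntimeError("could not place a sphere without overlap")
    return centers


# Hypothetical frame of 40 x 30 x 20 with three faults and a base radius of 5.
sphere_radii = [2.0, 2.0, 1.0]
centers = place_spheres(sphere_radii, 40.0, 30.0, 20.0, base=5.0,
                        rng=random.Random(0))
for c in centers:
    print(tuple(round(v, 2) for v in c))
```

Rejection sampling is simple and works well when the frame is sparsely populated; it becomes slow when the spheres nearly fill the available space, which is why the sketch gives up after a fixed number of attempts.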

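In conclusion, the visualization mapping process for any fault sphere can be obtained as:

$$O_\varepsilon = \begin{bmatrix} O_{\varepsilon 1} \\ O_{\varepsilon 2} \\ \vdots \\ O_{\varepsilon n_\varepsilon} \end{bmatrix} \xrightarrow{E_v} T_\varepsilon = \begin{bmatrix} T_{\varepsilon 1} \\ T_{\varepsilon 2} \\ \vdots \\ T_{\varepsilon n_\varepsilon} \end{bmatrix} \xrightarrow{f_2(f_1,\ \min),\ f_3,\ \mathrm{RGB}} V_\varepsilon = \begin{bmatrix} V_{\varepsilon 1} \\ V_{\varepsilon 2} \\ \vdots \\ V_{\varepsilon n_\varepsilon} \end{bmatrix} = \begin{bmatrix} \left((X_{\varepsilon 1}, Y_{\varepsilon 1}, Z_{\varepsilon 1}),\ V^{S}_{\varepsilon 1},\ V^{P}_{\varepsilon 1}\right) \\ \left((X_{\varepsilon 2}, Y_{\varepsilon 2}, Z_{\varepsilon 2}),\ V^{S}_{\varepsilon 2},\ V^{P}_{\varepsilon 2}\right) \\ \vdots \\ \left((X_{\varepsilon n_\varepsilon}, Y_{\varepsilon n_\varepsilon}, Z_{\varepsilon n_\varepsilon}),\ V^{S}_{\varepsilon n_\varepsilon},\ V^{P}_{\varepsilon n_\varepsilon}\right) \end{bmatrix} \qquad (5.24)$$

5.1.3.5 Case Study on Visualization of the Faults in a Component

The 3D model of a signal processor is shown in Fig. 5.11. Combining the fault information given in the previous sub-sections, the visualization data of the signal processor are calculated and listed in Table 5.1. These data can then be plotted in the 3D model of the signal processor, as shown in Fig. 5.12.

Fig. 5.11 3D model of a signal processor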

5.1.3.6 Visualization of the Fault in a System-Level Product

When tracing faults in a product model, the designer should first determine which system may have a critical fault with serious consequences, and then trace the root cause of that critical fault down to component-level faults or interface faults. Therefore, the visualization model of a system-level product should give a comprehensive characterization of all fault consequences and occurrence probabilities.


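Table 5.1 A visual data table for the signal processor (X, Y, Z are the coordinates of the sphere center)

| Encode | L | W | H | V_ε^P | i | V_εi^S | V_εi^P | X | Y | Z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0-1-1 | 38 | 22 | 30 | 7.93 | 1 | (255,255,0) | 3.17 | 16.56 | 3.30 | 3.37 |
| | | | | | 2 | (255,255,0) | 3.17 | 7.30 | 13.26 | 21.21 |
| | | | | | 3 | (0,0,255) | 3.17 | 30.73 | 11.26 | 26.25 |
| 0-1-2 | 40 | 21 | 18 | 6.70 | 1 | (255,255,0) | 2.68 | 16.18 | 15.75 | 12.85 |
| | | | | | 2 | (255,255,0) | 2.68 | 32.60 | 16.07 | 14.34 |
| | | | | | 3 | (0,0,255) | 1.34 | 33.10 | 17.53 | 1.55 |
| 0-1-3 | 15 | 17 | 10 | 3.70 | 1 | (255,0,0) | 0.74 | 11.75 | 14.80 | 1.82 |
| | | | | | 2 | (0,255,255) | 1.48 | 12.48 | 10.36 | 2.17 |
| | | | | | 3 | (0,0,255) | 0.74 | 4.51 | 9.23 | 8.90 |
| 0-1-4 | 30 | 23 | 25 | 7.00 | 1 | (0,0,255) | 1.40 | 27.64 | 4.58 | 22.95 |
| | | | | | 2 | (0,0,255) | 2.80 | 26.15 | 11.25 | 18.33 |
| 0-1-5 | 20 | 20 | 17 | 5.13 | 1 | (255,0,0) | 1.03 | 3.57 | 8.60 | 14.71 |
| | | | | | 2 | (0,255,255) | 1.03 | 15.24 | 18.25 | 10.83 |
| | | | | | 3 | (0,0,255) | 2.05 | 2.62 | 15.55 | 14.10 |
| 0-1-6 | 30 | 15 | 5 | 2.50 | 1 | (0,0,255) | 0.50 | 20.18 | 11.11 | 3.47 |
| | 30 | 15 | 5 | 2.50 | 2 | (0,0,255) | 0.50 | 11.87 | 9.68 | 1.18 |
| 0-1-7 | 33 | 24 | 38 | 8.43 | 1 | (255,255,0) | 5.06 | 21.22 | 5.50 | 12.78 |
| | | | | | 2 | (255,255,0) | 3.37 | 4.58 | 5.05 | 29.11 |
| | | | | | 3 | (255,255,0) | 1.69 | 22.27 | 8.23 | 34.59 |
| 0-2-1 | 64 | 35 | 32 | 11.26 | 1 | (255,255,0) | 2.25 | 31.22 | 9.49 | 12.51 |
| | | | | | 2 | (255,255,0) | 4.50 | 52.84 | 26.65 | 6.70 |
| | | | | | 3 | (0,0,255) | 2.25 | 15.53 | 24.97 | 14.30 |
| | | | | | 4 | (255,255,0) | 2.25 | 53.74 | 5.11 | 23.98 |
| | | | | | 5 | (255,255,0) | 4.50 | 42.26 | 22.11 | 11.97 |
| 0-2-2 | 15 | 17 | 10 | 3.70 | 1 | (255,255,0) | 1.48 | 1.90 | 7.64 | 4.17 |
| | | | | | 2 | (0,0,255) | 2.22 | 10.30 | 12.21 | 3.26 |
| | | | | | 3 | (0,0,255) | 1.48 | 7.38 | 7.74 | 6.03 |
| 0-2-3 | 21 | 29 | 14 | 5.54 | 1 | (0,255,255) | 2.21 | 13.97 | 20.76 | 4.86 |
| | | | | | 2 | (0,255,255) | 3.32 | 5.03 | 14.46 | 10.38 |
| | | | | | 3 | (0,255,255) | 1.11 | 7.50 | 16.78 | 3.74 |
| 0-2-4 | 33 | 24 | 38 | 8.43 | 1 | (255,255,0) | 3.37 | 23.10 | 7.77 | 19.19 |
| | | | | | 2 | (255,255,0) | 3.37 | 21.73 | 18.75 | 33.36 |
| | | | | | 3 | (255,255,0) | 3.37 | 17.14 | 5.76 | 8.04 |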

Fig. 5.12 The visualization model of a signal processor (product tree shown in the figure: axis angle signal processor; power module: 36 V 400 Hz DC/AC converter, ±15 V and +5 V DC/AC converter, input filter circuit, EMI module, tank circuit, surge suppressor, connector; signal processing module: axis angle converter, input filter circuit, angle offset adjustment circuit, connector)

5.1.3.7 Extraction of Visualization Features

According to the aforementioned analysis, system faults include transmission faults, interface faults, error propagation faults and sneak path faults.

(1) Severity levels

For faults such as the interface fault, error propagation fault and sneak path fault, the severity levels are given as $\{S^{E}_{si} \mid i = 1, 2, \ldots, n^{E}_{s}\}$ during the fault analysis. For a transmission fault, however, the severity level is determined from its fault causes (obtained from the fault analysis or the fault logic model) by the following equation:

$$S^{I}_{si} = \max_{k=1,2,\cdots,n^{I}_{si}} \{S_{sik}\} \qquad (5.25)$$

in which $S_{sik}$ is the severity level obtained from the k-th ($k = 1, 2, \ldots, n^{I}_{si}$) fault cause of transmission fault $F^{I}_{si}$.

(2) Occurrence probability

Using the fault logic equation, the occurrence probability of each transmission fault of a system-level product can be calculated as $\{P^{I}_{si} \mid i = 1, 2, \ldots, n^{I}_{s}\}$, whereas the occurrence probabilities of the interface faults and error propagation faults have been given as $\{P^{E}_{sj} \mid j = 1, 2, \ldots, n^{E}_{s}\}$ during the development of the fault logic model. By using Eq. (5.19), the occurrence probabilities can be converted into the occurrence probability levels $\{D^{I}_{si} \mid i = 1, 2, \ldots, n^{I}_{s}\}$ and $\{D^{E}_{sj} \mid j = 1, 2, \ldots, n^{E}_{s}\}$.

5.1.3.8 Characterization of Visualization Features

By using the risk matrix method, each fault of the system-level product can be located in the risk matrix, as shown in Fig. 5.13. Further, by jointly considering its severity level and occurrence probability, the hazard of a fault can be divided into $B_s$ levels, recorded as $\{H^{s}_{b} \mid b = 1, 2, \ldots, B_s\}$, where $H^{s}_{B_s} > \cdots > H^{s}_{2} > H^{s}_{1}$. The hazard level $H_s$ of a system-level product is then taken as the highest hazard level among its faults:

$$H_s = \max_{b=1,2,\cdots,B_s} \{H^{s}_{b}\} \qquad (5.26)$$

Fig. 5.13 Illustration of the risk matrix (axes: severity level and occurrence probability level A to E; hazard regions H4, H3, H2, H1)

Likewise, exploiting the sensitivity of human vision to color, different hazard levels can be distinguished with different colors. The visualization parameter $V_s$ is then calculated as follows:

$$H_s \xrightarrow{\mathrm{RGB}} V_s = RGB(R_s, G_s, B_s) \qquad (5.27)$$

in which $R_s$, $G_s$, $B_s$ are the red, green, and blue chromaticity coordinates, respectively, when the hazard level of the system-level product is $H_s$. For example, for a risk matrix divided into 4 levels, the chromaticity coordinates of each level can be defined by:

$$V_s = RGB(R_s, G_s, B_s) = \begin{cases} (0, 255, 0) & (H_s = H1) \\ (0, 0, 255) & (H_s = H2) \\ (255, 255, 0) & (H_s = H3) \\ (255, 0, 0) & (H_s = H4) \end{cases} \qquad (5.28)$$

5.1.3.9 Case Study of the Fault Visualization

For the fault data of the signal processor in Example 2.1, visualization of the occurrence probability of each fault and its severity level can be obtained through the above equations, and listed in Table 5.2. According to Eq. (5.26), MaxT 0–1 = H4, MaxT 0–2 = H3, which are indicated by the red and yellow color respectively as shown in Fig. 5.14.
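The case-study result can be reproduced with a short sketch of Eqs. (5.26) and (5.28). The hazard levels below are copied from the H column of Table 5.2 for the modules coded 0–1 and 0–2; the function and variable names are illustrative.

```python
# Eq. (5.28): hazard level to RGB color.
HAZARD_RGB = {
    "H1": (0, 255, 0),    # green
    "H2": (0, 0, 255),    # blue
    "H3": (255, 255, 0),  # yellow
    "H4": (255, 0, 0),    # red
}

# Hazard levels of the individual faults, in the order listed in Table 5.2.
FAULT_HAZARDS = {
    "0-1": ["H4", "H1", "H4", "H4", "H4", "H1", "H1", "H1", "H2", "H1", "H2"],
    "0-2": ["H3", "H1", "H3", "H1", "H2", "H2", "H2"],
}


def product_hazard(fault_hazards):
    """Eq. (5.26): highest hazard level among the faults of a product."""
    return max(fault_hazards, key=lambda h: int(h[1:]))


for code, hazards in FAULT_HAZARDS.items():
    level = product_hazard(hazards)
    print(code, level, HAZARD_RGB[level])
```

The module coded 0–1 evaluates to H4 (red) and the module coded 0–2 to H3 (yellow), in agreement with the colors stated for Fig. 5.14.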

Table 5.2 Fault datasheet of the signal processor in Example 2.1

| Encode | Fault encode | Fault | Type | P | E | D | H |
|---|---|---|---|---|---|---|---|
| 0–1 | 1 | The 36 V 400 Hz AC power supply does not output | IFM | 4.05 × 10^-5 | I | C | H4 |
| | 2 | The output quality of 36 V 400 Hz AC power supply decreases | IFM | 9.18 × 10^-6 | IV | D | H1 |
| | 3 | The +15 V power supply does not output | IFM | 3.65 × 10^-5 | I | C | H4 |
| | 4 | The −15 V power supply does not output | IFM | 3.65 × 10^-5 | I | C | H4 |
| | 5 | The +5 V power supply does not output | IFM | 3.65 × 10^-5 | I | C | H4 |
| | 6 | The output quality of +15 V power supply decreases | IFM | 4.48 × 10^-6 | IV | D | H1 |
| | 7 | The output quality of −15 V power supply decreases | IFM | 4.48 × 10^-6 | IV | D | H1 |
| | 8 | The output quality of +5 V power supply decreases | IFM | 4.48 × 10^-6 | IV | D | H1 |
| | 9 | The output has no power down protection function | IFM | 5.82 × 10^-8 | III | E | H2 |
| | 10 | The output power off protection function decreases | IFM | 1.28 × 10^-6 | IV | D | H1 |
| | 11 | The power wire is broken | EFM | 5.82 × 10^-8 | I | E | H2 |
| 0–2 | 1 | The left signal processing function fails | IFM | 1.57 × 10^-5 | II | C | H3 |
| | 2 | The left signal processing function performance is reduced | IFM | 1.65 × 10^-5 | IV | C | H1 |
| | 3 | The right signal processing function fails | IFM | 1.57 × 10^-5 | II | C | H3 |
| | 4 | The right signal processing function performance is reduced | IFM | 1.65 × 10^-5 | IV | C | H1 |
| | 5 | The left signal voltage is offset by zero | IFM | 1.53 × 10^-5 | III | C | H2 |
| | 6 | The right signal voltage is offset by zero | IFM | 1.53 × 10^-5 | III | C | H2 |
| | 7 | The signal cable is broken | EFM | 5.79 × 10^-8 | I | E | H2 |
| 0 | 1 | The power supply function is not normal | IFM | 1.74 × 10^-4 | I | B | H4 |
| | 2 | The signal processing function is not working properly | IFM | 9.51 × 10^-5 | II | C | H3 |
| | 3 | The appearance and installation of the processor cannot meet the requirements | EFM | 6.01 × 10^-7 | IV | E | H1 |

Fig. 5.14 Visualization model of the signal processor in Example 2.1 (same product tree as in Fig. 5.12; color labels in the figure: red, yellow)

5.1.3.10 Multi-level Visualization of the Fault

From a design perspective, the visualized fault model of the system-level product should be firstly displayed, and then the root cause of the fault is traced layer by layer. In this section, a set of multi-level methods for tracing and filtering the physical faults are developed based on the fault hierarchical network.

5.1.3.11 Construction of the Hierarchical Network

In order to visualize the hierarchical relationships among the faults, the product model can be transformed into a hierarchical network through the following steps:

(1) Use a small circle to represent a fault node. Then number the small circles, which are arranged in a hierarchical structure, layer by layer from top to bottom.

(2) Mark each small circle with the color of the visualization model of the fault node.

(3) Connect the nodes in the hierarchical network according to the fault transfer relationship.

Example 5.2 Applying the above three steps to the fault models of the signal processor in Example 2.1 yields the hierarchical network shown in Fig. 5.15.

5.1.3.12 Focus and Dynamic Filter of the Hierarchical Network

In general, designers focus first on the nodes with high severity. When a certain fault node is selected as the focus, the fault nodes at the same layer and at the next layer that are irrelevant to the selected node must be filtered out (when a fault node at a higher layer is filtered out, its related fault nodes at lower layers are filtered out as well), so that only the fault nodes that can cause the selected fault node are kept.

Example 5.3 Judging by color, fault 1 of the signal processor is selected as the focus first; after dynamic filtering, the hierarchical network shown in Fig. 5.16 is obtained.
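The focus-and-filter step can be sketched as a graph traversal that keeps the selected node and everything that can cause it. The dictionary representation, the function name and the small network below are hypothetical and do not reproduce Fig. 5.15.

```python
def focus(causes, selected):
    """Return the selected fault node plus all lower-level nodes that can cause it.

    causes: dict mapping a fault node to the list of lower-level fault nodes
            connected to it by a fault transfer relationship.
    Every node not in the returned set is filtered out of the view.
    """
    kept = {selected}
    stack = [selected]
    while stack:
        node = stack.pop()
        for child in causes.get(node, []):
            if child not in kept:
                kept.add(child)
                stack.append(child)
    return kept


# Hypothetical three-layer network: "module/fault number" style labels.
network = {
    "0/1": ["0-1/1", "0-1/3"],
    "0/2": ["0-2/1"],
    "0-1/1": ["0-1-1/1"],
    "0-1/3": ["0-1-2/1"],
    "0-2/1": ["0-2-1/1"],
}
print(sorted(focus(network, "0/1")))
```

Because the traversal only follows cause links downward from the selected node, a filtered higher-layer node automatically takes its lower-layer causes with it, as required by the filtering rule.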

5.1.3.13 Establishment of 3D Fault Visualization Model

The focusing and filtering process on the hierarchical network described above can be used to build the 3D fault visualization model of a product. As shown in Fig. 5.17, the faults marked in red inside the green line box on the right are those to be mitigated, such as the short circuit of the input filter circuit and the short circuit of the output filter circuit.

5.2 Load-Response Analysis and Fault Identification

5.2.1 Fundamental Concepts and Principles

Once a product is manufactured, it is subjected to various loads throughout its life cycle, including screening, storage, transportation, use, and maintenance. As a result, its physical, chemical, mechanical and electrical properties change continuously, which eventually leads to product faults. These loads mainly include environmental loads (such as temperature, humidity, pressure, electric charge, vibration, and shock) and operational loads (such as voltage and current). The response to these loads is reflected by stress, i.e. the internal reaction generated by the product to resist the applied loads.


5 Physics of Failure Based Fault Identification and Control Methods

Fig. 5.15 Hierarchical network of the signal processor in Example 2.1


Fig. 5.16 Hierarchical networks of the signal processor in Example 2.1 before (a) and after (b) focusing and filtering

Fig. 5.17 3D fault visualization models of the signal processor in Example 2.1 before (a) and after (b) focusing and filtering


The American standard ANSI/GEIA-STD-0009-2008, “Reliability Program Standard for Systems Design, Development, and Manufacturing” [1], states that a progressive understanding of system-level operational and environmental loads should be developed in order to identify failure modes and mechanisms. For a product with a hierarchical structure, the analysis can be divided into system-level and component-level analysis, where component-level analysis means that the internal components of the analyzed item need not be considered. This gives rise to two relative concepts in load analysis: the global load-response and the local load-response. For example, in the analysis of a computer case, the global load refers mainly to the environmental and operational loads, and the local load refers to the load applied to the devices on the main board. However, if the analysis focuses on the devices on the main board, the load applied to the main board becomes the global load.

5.2.1.1

Global Load-Response

The global load-response usually refers to the environmental and operational loads in the life cycle of the system-level product. These loads generally come from outside the product, including the environment and peripheral equipment, and may also come from the activities of users or maintenance personnel.

5.2.1.2

Local Load-Response

Local load-response usually refers to the load applied to component-level products (such as sub-systems and components) during their life cycles. It is the local response to, or distribution of, the global load on each component-level product, and is obtained by decomposing the global load on the basis of a structural analysis of the system. Accurate determination of local load-stresses helps in designing reliable devices and in specifying accurate requirements for commercial off-the-shelf (COTS) items, non-developmental items (NDI) and customer furnished equipment (CFE). Generally, a location with local stress concentration or excessive load can be considered the weakest link (the fault-prone point) that affects the reliability of the product. Such locations can be determined from load-response analysis using finite element methods. Because the product is actually affected by a combination of multiple stresses, it is sometimes necessary to establish a reliability simulation model based on the various load effects and to carry out further detailed analysis to determine the weakest link in the reliability of the product. The basic idea of weakest link identification based on load-response analysis is shown in Fig. 5.18.

Fig. 5.18 Procedure for the weakest link identification based on load-response analysis. The flowchart has three stages: (1) object and load analysis (specify analysis objects; specify object boundaries and define their environment; specify the load type and size); (2) stress and stress concentration related weak link analysis (modeling based on load and object physical structure; determine the stress distribution on the physical structure of the object; determine the distribution location of stress concentration and possible weak links); (3) fault mechanism and reliability weak link analysis (find the single point of weakness based on the fault mechanism model; multi-point fault fusion extrapolating product fault distribution characteristics)

5.2.2 Finite Element Methods for the Load-Response Analysis

In mechanics, stress is usually defined as the internal force per unit area. In actual load-response analysis, however, the structural complexity of a product makes it difficult to calculate the internal stress directly. Numerical methods are therefore used to analyze and evaluate the stresses (such as mechanical stress and thermal stress) under a variety of external loads. Among these numerical methods, the finite element method is currently the most mature and effective.

5.2.2.1

Fundamental Principles of the Finite Element Methods

In engineering practice, the load-response analysis can be transformed into a set of differential equations with corresponding boundary conditions, that is, into the problem of solving differential equations. The finite element method [42] is the most commonly used numerical method for solving such a problem. It is a numerical analysis technique that integrates elasticity theory, computational mathematics and computer software. The basic ideas of the finite element method can be summarized in two aspects: discretization and piecewise interpolation.

5.2.2.2

Discretization

Discretization artificially divides a continuous solution domain into a number of elements. The connection points between neighboring elements are called nodes, and the interaction between elements can only be transmitted through the nodes. By discretization, a continuum structure is replaced by an assembly of a finite number of elements, as shown in Fig. 5.19.

Fig. 5.19 Illustration of discretization of a continuum structure (a continuum is discretized into plane or space elements connected at nodes, and the elements form an assembly)

The purpose of traditional discretization is to convert the original differential equations and boundary conditions, which involve continuous variables with infinitely many degrees of freedom, into algebraic equations containing only a finite number of nodal variables, so that they can easily be solved by computer. The discretization idea of the finite element method, however, is not limited to differential equations but extends to the physical model of the continuum itself. Even when the differential equations of the physical model cannot be derived, the discretization process can still be carried out. Moreover, the elements in the finite element method need not be regular, and their shapes and sizes need not be the same. The finite element method therefore offers higher adaptability and discretization accuracy in dealing with complex geometric shapes and boundary conditions, including local features such as stress concentration.

5.2.2.3

Piecewise Interpolation

The idea of piecewise interpolation is to select a trial function (also called an interpolation function) within each element and carry out the integral calculation element by element. Since each element has a simple shape, its boundary conditions can easily be satisfied, and a low-order polynomial is usually sufficient to obtain adequate accuracy. Over the entire solution domain of the continuum model, as long as the trial functions meet certain conditions, the finite element solution converges to the exact solution as the element size is reduced. The advantages and practicability of the finite element method can be summarized as follows:

1. It can be used for structures with complex shapes;
2. It can handle complex boundary conditions;
3. It can meet a given accuracy requirement;
4. It can handle a variety of materials.

To sum up, the finite element method is widely used for load-response analysis in most engineering projects. It can be used in linear static analysis, dynamic analysis, and the analysis of special problems such as nonlinearity, thermal stress, fluid, electromagnetic, contact, creep, fracture, machining simulation and collision simulation. In addition, the finite element method lays a good foundation for product fault identification and analysis, and is therefore a useful method for reliability analysis.
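The two ideas of discretization and piecewise interpolation can be illustrated on a deliberately simple case. The sketch below is an assumption-laden toy rather than a general FE code: it solves a one-dimensional bar, fixed at one end and loaded by a uniform axial load q, with two-node linear elements. The function name `bar_fe` and the unit parameter values are chosen here for illustration only.

```python
import numpy as np

def bar_fe(n_elem, length=1.0, EA=1.0, q=1.0):
    """Linear-element FE solution of -EA u'' = q on [0, L], u(0) = 0, free end at x = L."""
    n_nodes = n_elem + 1
    h = length / n_elem                                   # element size
    K = np.zeros((n_nodes, n_nodes))                      # global stiffness matrix
    f = np.zeros(n_nodes)                                 # global load vector
    ke = EA / h * np.array([[1.0, -1.0], [-1.0, 1.0]])    # element stiffness matrix
    fe = q * h / 2.0 * np.ones(2)                         # consistent element load vector
    for e in range(n_elem):                               # assembly through shared nodes
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += ke
        f[idx] += fe
    u = np.zeros(n_nodes)
    u[1:] = np.linalg.solve(K[1:, 1:], f[1:])             # impose u(0) = 0 and solve
    axial_force = EA * np.diff(u) / h                     # constant within each element
    return u, axial_force

exact_root_force = 1.0   # q * L for the default parameters
for n in (2, 4, 8, 16):
    u, force = bar_fe(n)
    # tip displacement and error of the axial force in the element at the fixed end
    print(n, u[-1], abs(force[0] - exact_root_force))
```

For this problem the error of the axial force in the element next to the fixed end is q·h/2, so it halves each time the mesh is refined, which is a small-scale view of the convergence with decreasing element size described above.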

5.2.2.4

Basic Process of the Finite Element Analysis

The main process of the finite element method is first to build a finite element (FE) model and then to perform finite element analysis (FEA) with that model [43]. The FE model is the key to applying the finite element method, as it directly affects the accuracy, time and size of the calculation, and even whether the calculation can be completed. Although the finite element equations used in the various load-stress analyses (such as static, dynamic and thermal analysis) differ, the analysis process is similar. From an application point of view, the finite element analysis process shown in Fig. 5.20 can be divided into three stages: pre-processing, solving and post-processing.

(1) Pre-processing: The purpose of pre-processing is to establish an FE model. Its task is to convert the actual problem or design scheme into an FE model that provides all input data for the numerical calculation. This model quantitatively reflects the characteristics of the analyzed object, including geometry, materials, loads and constraints. The core of FE modelling is discretization, and around it a lot of related work needs to be completed, including structural simplification, geometric modelling, determination of the element type and quantity, definition of the element properties, meshing, element quality inspection, sequential optimization of the element IDs, and definition of boundary conditions.

(2) Solving: The task of solving is to complete the numerical calculation with the FE model and export the required results. Its main work includes establishing the element and global matrices, applying boundary conditions and solving the characteristic equations. Because of the large computational cost, this work is mainly done by computer programs, without manual intervention except for the necessary settings of solvers, solving parameters and calculation conditions.

Fig. 5.20 The general process of finite element analysis (actual design scheme → pre-processing → finite element model → calculation solution → computed result → post-processing → evaluation, optimization and modification)

Fig. 5.21 The general steps of FE modelling (actual structural design scheme → problem definition → geometric modeling → selection of element type → mesh generation → model inspection → definition of boundary conditions → calculation → result comparison with test; an excessive deviation leads to model modification, an acceptable deviation ends the process)

(3) Post-processing: The task of post-processing is to process the FE simulation results as needed in order to analyze and evaluate the performance of the analyzed object (or the reasonableness of its design), and then to propose improvement or optimization plans.

Of the three stages of finite element analysis, the modelling (pre-processing) stage is the most critical. It is carried out mainly through human–computer interaction, as shown in Fig. 5.21. The individual steps in Fig. 5.21 are covered in other books and are not described in detail here.

5.2.2.5

Common Cases of Load-Response Analysis

Several types of load-response analysis that are closely related to reliability analysis, namely static analysis, dynamic analysis and thermal stress analysis, are introduced in this sub-section, with particular attention to how each of them supports reliability analysis.

5.2.2.6

Static Analysis

Static analysis [44] is the simplest, most fundamental and most common application of the finite element method. It is mainly used to calculate the response of a structure under a fixed static load, such as displacement, stress, strain and force, without the influence of inertia and damping. Static analysis can be linear or non-linear. The non-linear cases include large deformation, plasticity, creep, stress stiffening, contact elements and hyperelastic elements. Figure 5.22 displays the stress distribution of a bearing support under static load.

Structures are important constituents of all kinds of equipment. Some structures work under relatively harsh conditions, such as long-term operation at full capacity or under vibration or shock. Finding a correct and reliable design (and simulation) method for the structure of a piece of equipment is one of the main ways to improve its working performance, reliability and lifetime. In reliability analysis, the results of static analysis (stress, strain, etc.) can be used directly as the basic data for further in-depth analysis, such as stress-strength analysis in mechanical reliability, structural durability analysis, and failure mechanism analysis in product packaging.

Fig. 5.22 Static analysis results of a bearing support structure

5.2.2.7

Dynamic Analysis

In engineering, many products are subject to time-varying (dynamic) loads. For instance, cars are subject to road loads, radars to wind loads, ocean platforms to wave loads, and rotating machines to eccentric centrifugal loads. In such circumstances, dynamic analysis is needed to understand not only the dynamic characteristics but also the reliability of those products under dynamic loads (such as the reliability of airborne equipment and structures under aerodynamic loads).

Dynamic analysis includes eigenvalue analysis and response analysis. The inherent dynamic characteristics of a structure are described quantitatively by a set of modal parameters such as eigenfrequency, mode shape, modal stiffness and modal damping ratio. They are determined by the structure itself (its mass and stiffness distribution) and are independent of external loads, yet they determine the response of the structure under dynamic loads. Eigenvalue analysis calculates these modal parameters for two main purposes: to avoid resonance and harmful modes in the structure, and to provide the necessary basis for the subsequent response analysis.

Response analysis calculates the response characteristics of the structure under a given dynamic load, including displacement, velocity and acceleration responses, dynamic stress and dynamic strain. These responses are often presented as time-history curves, the mode shapes are usually displayed as deformation plots or animations, and the other modal parameters are listed in tables. In reliability analysis, typical applications include components under vibration and shock loads, aircraft structures under aerodynamic loads, and electronic packages under drop shocks. For instance, Fig. 5.23 shows the dynamic response analysis results of an electronic package structure under shock loads.

Fig. 5.23 Dynamic analysis results of an electronic package under shock loads: (a) profile of the equivalent von Mises stress distribution, with the maximum equivalent stress point marked; (b) dynamic response process of the equivalent von Mises stress (equivalent stress/MPa versus time/s)
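As a small illustration of eigenvalue analysis, the sketch below computes the natural frequencies and mode shapes of an undamped two-degree-of-freedom spring-mass chain from its mass and stiffness matrices. The function name `modal_analysis`, the restriction to a diagonal (lumped) mass matrix, and the numerical values are assumptions made for this example; a real FE model would have many more degrees of freedom and would normally be handled by the FE package itself.

```python
import numpy as np

def modal_analysis(M, K):
    """Natural frequencies (Hz) and mode shapes of K @ phi = w**2 * M @ phi.

    Assumes M is diagonal (lumped masses) so that M^(-1/2) is trivial to form.
    """
    m_inv_sqrt = 1.0 / np.sqrt(np.diag(M))
    A = K * np.outer(m_inv_sqrt, m_inv_sqrt)   # M^(-1/2) K M^(-1/2), symmetric
    w2, y = np.linalg.eigh(A)                  # eigenvalues in ascending order
    modes = y * m_inv_sqrt[:, None]            # back-transform: phi = M^(-1/2) y
    return np.sqrt(w2) / (2.0 * np.pi), modes

# Illustrative values: two 1 kg masses, two 1000 N/m springs, one end grounded
M = np.diag([1.0, 1.0])
K = np.array([[2000.0, -1000.0],
              [-1000.0, 1000.0]])
freqs, modes = modal_analysis(M, K)
print(freqs)   # natural frequencies in Hz, lowest first
```

The result depends only on M and K, not on any applied load, which is the point made in the text; comparing `freqs` with the dominant excitation frequencies is then the basic resonance check.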

5.2.2.8

Thermal Stress Analysis

The purpose of thermal analysis is to determine the temperature distribution of the product and its components, and to verify and optimize the thermal design [45]. Thermal analysis calculates the temperature distribution within the structure or a local area under given thermal boundary conditions (i.e. the thermal environment), and then determines the thermal deformation and thermal stress caused by temperature gradients.

The temperature field of the product can be obtained in two ways: numerical calculation and thermal measurement. Numerical calculation of the thermal field, also known as thermal simulation, obtains the temperature distribution in the product by mathematical calculation. It is mainly suitable for the product design process (such as the preliminary design stage), when no physical product is available for measurement. Thermal measurement, by contrast, determines the surface temperature and temperature field of the physical product, with more accurate results. The numerical calculation of the thermal field must consider three modes of heat exchange: conduction, convection and radiation. Thermal analysis requires establishing and solving mathematical models of the product temperature and flow fields. Because of the computational complexity, the solution is obtained with computer programs, which can be either general finite element simulation software or dedicated thermal analysis software.

Heat conduction is the problem of greatest concern in reliability analysis. In heat conduction analysis, the temperature of each node (i.e. the temperature distribution) is calculated first, and the thermal deformation and thermal stress of the structure are then calculated from it. A change of temperature in the structure results in thermal deformation. When the structure is free to expand, thermal deformation causes no internal stress. But if the structure is not uniformly heated, or its thermal deformation is restricted by its surroundings, the deformation is constrained by both internal parts and external constraints, which generates internal stress within the structure. The stress caused by such a temperature difference is called thermal stress. Correspondingly, the temperature difference that generates the thermal stress can be regarded as a kind of load, called the temperature load.

This analysis is a typical thermal–mechanical coupling problem. Displaying the temperature, thermal deformation, thermal stress distribution and heat flow of the structure helps to evaluate the pros and cons of the thermal design and to develop corresponding improvement or control measures. Thermal design and analysis focus mainly on the temperature distribution within a product or its structure in order to develop improvement and optimization measures. In addition, the stress field corresponding to this temperature field can be calculated and evaluated. The resulting thermal stress plays an important role in the reliability of an electronic package and its interconnect structure: the reliability issues of many packages can ultimately be attributed to fracture or fatigue failures caused by thermal stress (from high and low temperature cycles, temperature shocks, etc.). That is to say, the results of thermal analysis, such as those shown in Fig. 5.24, can also support failure mechanism analysis and the development of failure mechanism models.

Fig. 5.24 Results of thermal stress analysis of the solder joints in an electronic package: (a) FE model of the solder joints; (b) profile of the equivalent von Mises stress distribution; (c) stress–strain cycle diagram of the solder joint
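The difference between free and restricted thermal deformation can be made concrete with the textbook case of a uniform bar clamped at both ends and heated uniformly. The sketch below assumes linear elasticity and a constant expansion coefficient; the function name and the material values (roughly those of an aluminium alloy) are illustrative assumptions, not data from the analyses shown in the figures.

```python
def constrained_bar_thermal_stress(E, alpha, delta_T):
    """Axial stress in a uniform bar clamped at both ends and heated by delta_T.

    The free thermal strain alpha * delta_T is fully suppressed by the clamps,
    so sigma = -E * alpha * delta_T (a negative value means compression).
    """
    return -E * alpha * delta_T

# Assumed values: E in Pa, alpha in 1/K, delta_T in K
sigma = constrained_bar_thermal_stress(E=70e9, alpha=23e-6, delta_T=50.0)
print(f"{sigma / 1e6:.1f} MPa")
```

If the same bar were free to expand, the strain alpha * delta_T would appear as elongation and the stress would be zero; in this idealized case the temperature difference acts as a load only because the deformation is restricted.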

Fig. 5.25 The basic process of the simulation based fault identification (main steps: design information collection; digital model construction with CAD, CFD and FEA digital prototypes; load-response analysis covering thermal stress analysis and vibration stress analysis; fault identification of stress concentration points and potential failure points. Inputs include the product design information and the mission profile, and thermal/vibration tests feed back into model modification)

5.2.3 Simulation Based Fault Identification

5.2.3.1

Fundamental Ideas and Goals

Simulation based fault identification is implemented on the basis of Physics of Failure (PoF) methods. It uses FE software to establish the FE model of the product, including geometric parameters, material properties and boundary conditions, and to calculate the displacement, acceleration and stress of each node or element. Then, by combining these results with PoF models, the reliability of the product (such as the mean time to first failure) is evaluated, the possible failure points are detected, and improvement measures are developed. By combining reliability design with performance design, and reliability simulation with modal tests, random vibration tests and thermal tests, simulation based fault identification can effectively solve difficult engineering problems. It is mainly used for electromechanical and electronic products [34].

5.2.3.2

Basic Process

Simulation based fault identification includes four steps: design data acquisition, development of the numerical model, stress analysis, and potential failure identification, as shown in Fig. 5.25.

5.2.3.3

Design Data Acquisition

Before a simulation analysis is conducted, the design information of the product should be collected comprehensively by the reliability simulation analysts and the product designers. The reliability simulation analysts are responsible for providing information collection tables, whereas the product designers should fill in those tables correctly and provide the relevant design documents at the same time.

5.2.3.4

Development of the Numerical Model

The numerical models of the product, including the CAD model, the FEA model and the computational fluid dynamics (CFD) model, are developed from the design data collected in the previous step. Of these numerical models, the CAD model should be established first [46]. The FEA and CFD models can then both be established on the basis of the CAD model.

5.2.3.5

Stress Analysis

Using the FEA (and CFD) models, the vibration stress analysis, thermal stress analysis or static/dynamic analysis is carried out.

(1) Vibration stress analysis: The input information for the vibration stress analysis includes the FEA model, the working status and the vibration environmental conditions during the product life cycle. The working status and vibration environmental conditions can be obtained from the environmental profiles in the life cycle of the product, and are used to determine the vibration conditions required as inputs of the analysis. The main outputs are the vibration analysis report and the vibration analysis results required for subsequent failure prediction.

(2) Thermal stress analysis: The input information for the thermal stress analysis includes the CFD model, the working status and the thermal environmental conditions during the product life cycle. The working status and thermal environmental conditions can be obtained from the environmental profiles in the life cycle of the product, and are used to determine the thermal conditions required as inputs of the analysis. The main outputs are the thermal analysis report and the thermal analysis results required for subsequent failure prediction.

5.2.3.6

Potential Failure Identification

The potential failure identification process is implemented in the following steps:

(1) Find the concentration points of stress (and strain, temperature, etc.) in the product’s FEA (or CAD) model.
(2) Based on the stress concentration results obtained from the stress analysis, combined with the local working state of the corresponding locations and the PoF models, identify the possible failures such as fatigue, fracture and crack propagation.
(3) Feed the identified potential failures back to the design plan of the product, to guide design improvement or reduce the occurrence probability of the failures to an acceptable range.
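The three steps above can be sketched with a very small example. The code below is an illustration under strong assumptions: element stress amplitudes are taken as already exported from the FE results, fatigue is the only failure mechanism considered, and the Basquin stress-life relation is used as a stand-in PoF model. The function names, element labels and material constants are hypothetical and are not taken from the source.

```python
def basquin_cycles_to_failure(stress_amplitude, sigma_f, b):
    """Basquin relation sigma_a = sigma_f * (2N)**b solved for N (b is negative)."""
    return 0.5 * (stress_amplitude / sigma_f) ** (1.0 / b)

def identify_potential_failures(element_stress, sigma_f, b, required_cycles):
    """Rank elements by stress amplitude and flag those whose predicted life is too short."""
    findings = []
    for elem_id, sa in sorted(element_stress.items(), key=lambda kv: -kv[1]):
        n_f = basquin_cycles_to_failure(sa, sigma_f, b)   # PoF life estimate
        if n_f < required_cycles:                         # candidate failure point
            findings.append((elem_id, sa, n_f))
    return findings

# Hypothetical stress amplitudes (MPa) exported from an FE result file
element_stress = {"E03": 250.0, "E17": 500.0, "E42": 100.0}
findings = identify_potential_failures(
    element_stress, sigma_f=1000.0, b=-0.1, required_cycles=1e6
)
for elem_id, sa, n_f in findings:
    print(elem_id, sa, round(n_f))
```

The list of flagged elements is what would be fed back to the designers in step (3); other mechanisms such as fracture or crack propagation would need their own PoF models in place of the Basquin relation.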

5.3 Time Analysis and Fault Identification of the PoF Model

Product reliability involves not only randomness but also time-varying characteristics, because the structure, material properties and environmental loads of a product can all be time-dependent. During the service life of the product, its reliability changes over time. Engineers have therefore established product reliability models that take the time effect into account, converting the time-varying problem into a classical structural reliability problem through the limit state function [47]. The main process is as follows:

(1) Failure mode analysis of the product.
(2) Establishment of the time-varying reliability model.
(3) Calculation of penetrability.
(4) Determination of the limit state function.
(5) Model parameters determination based on the degradation process.

5.3.1 Failure Mode Analysis of the Product

During the life cycle of a product, all of its fault modes can be obtained through FMECA for various purposes [48]. In the early stage of product development, all potential failure modes of the product need to be understood first [49]. Generally speaking, these potential failure modes can be obtained through methods such as statistics, testing, analysis and prediction, based on the following main principles:

(1) For existing products, the potential failure modes can be obtained from the historical failure modes, by analyzing and comparing the similarities and differences of the environmental conditions.
(2) For newly developed products, the potential failure modes can be obtained by analyzing and investigating their functional principles and structural characteristics, or from the failure modes of products with similar functions and structures.
(3) For imported commercial off-the-shelf (COTS) products, the potential failure modes can be obtained from their official specification sheets (or manuals), or from the failure modes of products with similar functions and structures.
(4) For commonly used components and parts, the potential failure modes can be obtained from certain Chinese and international standards and manuals.
(5) For products to which methods (1)–(4) do not apply, the potential failure modes can be obtained by referring to tables of typical failure modes compiled by experienced experts.

5.3.2 Establishment of Time Varying Reliability Model

Both the strength and the stress of a product vary with time as stochastic processes [49]. In traditional structural design, a design reference period is therefore specified in the design specification, and within this period the stochastic process model of the stress is transformed into a random variable model. However, the degradation of the structural strength over time is not considered in this calculation of structural reliability, which means that the resulting reliability model is static. For products in service, a static model leads to large errors in the calculated reliability. The time effect must therefore be considered in the reliability model by including the time-varying characteristics of the product’s strength and stress.

Many factors cause the random degradation of product reliability over time. These factors can be represented by the variables X = ($x_1$, $x_2$, …, $x_n$), in which $x_i$ refers to a random process such as the work/environmental loads, material properties, geometric dimensions or boundary conditions, and the time variation is represented by the time t. To describe the random and time-varying properties of a product, the time-varying limit state function G(t, X) can be established. G(t, X) is regarded as a random process, and:

• G(t, X) > 0: the product is safe;
• G(t, X) < 0: the product has failed;
• G(t, X) = 0: the boundary between the two states above, called the limit state.

The limit state is the critical state beyond which the product cannot meet its specified performance and function requirements. It marks the boundary between reliability and unreliability, and is therefore the basis for reliability analysis and design [50]. The failure probability of the product in the time interval [0, T] is the probability that the time-varying limit state function reaches G(t, X) ≤ 0 at some time within that interval. This leads to the definitions related to the first up-crossing. Within [0, T], the first occurrence of G(t, X) ≤ 0 is called the first penetration event, and the occurrence time and probability of the first penetration event are called the failure time and the first penetration probability, respectively. Figure 5.26 illustrates an example of G(t, X).
Let G(t, X) = $G_0$ − g(t), in which $G_0$ and g(t) are the initial value and the degradation process of the limit state function, respectively. In Fig. 5.26, g(t) intersects the $G_0$ line 8 times in total within the period [0, T], with 4 crossings of positive slope and 4 of negative slope. The product fails when the first penetration occurs, that is

$$v^{+}(t)\,\mathrm{d}t = P\{G(t, X) > 0 \cap G(t + \Delta t, X) \le 0\} \qquad (5.29)$$

Fig. 5.26 The first penetration process under the action of a random process (g(t) plotted against t over [0, T], with the level $G_0$ and the first penetration marked)

In product reliability analysis, the penetration probability $v^{+}(t)$ describes the probability that the product operates normally at time t but fails at time t + ∆t. Therefore, $v^{+}(t)$ can be expressed as:

$$v^{+}(t) = \lim_{\Delta t \to 0^{+}} \frac{P(A \cap B)}{\Delta t}, \qquad \begin{cases} A = \{G(t, X) > 0\} \\ B = \{G(t + \Delta t, X) \le 0\} \end{cases} \qquad (5.30)$$

The penetration events can be assumed to follow a Poisson distribution, so the cumulative failure probability is calculated approximately as:

$$P_{f,c}(0, T) \approx 1 - \exp\left[-\int_{0}^{T} v^{+}(t)\,\mathrm{d}t\right] \qquad (5.31)$$

In summary, for calculation of the product failure probability, the key lies in the calculation of the penetrability [51].
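Once the penetration rate is available as a function of time, Eq. (5.31) reduces to a one-dimensional integral. The sketch below evaluates it with the trapezoidal rule; the function name, the linearly increasing rate and the 20-year horizon are hypothetical values chosen only to show the calculation.

```python
import numpy as np

def cumulative_failure_probability(rate, T, n=2001):
    """Eq. (5.31): Pf(0, T) ~= 1 - exp(-integral_0^T v+(t) dt), by the trapezoidal rule."""
    t = np.linspace(0.0, T, n)
    v = rate(t)
    integral = np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t))
    return 1.0 - np.exp(-integral)

def rate(t):
    """Hypothetical penetration rate (per year) that grows as the product degrades."""
    return 1e-4 + 2e-5 * t

pf = cumulative_failure_probability(rate, T=20.0)
print(pf)
```

With this assumed rate the integral is 0.006, so the cumulative failure probability is slightly below 0.6 %; in a real analysis the rate would come from the penetrability calculation rather than from a closed-form expression.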

5.3.3 Calculation of Penetrability

In the previous sub-section, the time-varying reliability problem was transformed into the calculation of the penetration probability in the time interval [0, T]. This section focuses on developing the time-varying reliability model from the penetration probability model. In Eq. (5.30), ∆t is usually very small, so the limit can be approximated by a finite difference, namely

$$v^{+}(t) = \frac{P\{G(t, X) > 0 \cap G(t + \Delta t, X) \le 0\}}{\Delta t} \qquad (5.32)$$

Then, by using the first-order second moment method, the limit state functions at times t and t + ∆t can be linearized by Eq. (5.33), as shown in Fig. 5.27 (taking n = 2),

$$\begin{cases} G(t, X) = \alpha^{T}(t) \cdot x + \beta(t) \\ G(t + \Delta t, X) = \alpha^{T}(t + \Delta t) \cdot x + \beta(t + \Delta t) \end{cases} \qquad (5.33)$$

where x = [x_1, x_2, …, x_n] is the normalized vector of the stochastic process, α(t) and α(t + Δt) are the unit normal vectors of the tangent surfaces of the limit state hypersurface, and β(t) and β(t + Δt) are the corresponding reliability indices. In Fig. 5.27, β(t) and β(t + Δt) are the shortest distances from the origin to the limit state hypersurface in U space. The shaded region in Fig. 5.27 represents the probability of the event in Eq. (5.30), namely v+(t)Δt [52].

Fig. 5.27 The linearization process of the limit state function (axes x1 and x2; A: limit state hypersurface at time t; B: limit state hypersurface at time t + Δt; the failure domain and Prob(A ∩ B) are marked)
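The finite-difference form of Eq. (5.32) with the linearized limit states of Eq. (5.33) can be illustrated with a plain Monte Carlo estimate. This is only a sketch under stated assumptions: x is sampled as independent standard normal variables, and the function name, sample size and numerical values are hypothetical. It is a stand-in for, not a reproduction of, the analytical evaluation referred to in [52].

```python
import math
import random

def outcrossing_rate_mc(alpha_t, beta_t, alpha_dt, beta_dt, dt,
                        n_samples=200_000, seed=0):
    """Estimate v+(t) = P{G(t) > 0 and G(t + dt) <= 0} / dt by sampling.

    G(t, X) = alpha^T x + beta as in Eq. (5.33), with x ~ N(0, I) (assumed).
    """
    rng = random.Random(seed)
    n = len(alpha_t)
    hits = 0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g_now = sum(a * xi for a, xi in zip(alpha_t, x)) + beta_t
        g_next = sum(a * xi for a, xi in zip(alpha_dt, x)) + beta_dt
        if g_now > 0.0 and g_next <= 0.0:  # event A and event B of Eq. (5.30)
            hits += 1
    return hits / n_samples / dt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# One-dimensional check case: the reliability index drops from 2.0 to 1.9.
rate = outcrossing_rate_mc([1.0], 2.0, [1.0], 1.9, dt=0.1)
exact = (phi(-1.9) - phi(-2.0)) / 0.1
```

In this one-dimensional case the event A ∩ B reduces to −2.0 < x ≤ −1.9, so the estimate can be compared with the closed-form value `exact`.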

5.3.4 Determination of the Limit State Function

When analyzing the time-varying reliability of a product, the time-varying trends of both its strength and its stress must be studied together. During the service period, factors such as long-term operational loads, environmental effects, corrosion and material degradation degrade the structural strength and cross-sectional dimensions of the product.

The strength R of the product is its ability to withstand external effects, i.e. to resist damage, deformation, etc. It depends on the product's material properties, geometry, size and external effects. Strength data are obtained through mechanical tests and are randomly distributed. In addition, the strength degrades under external loads, which makes R(t) a random process. The stress S(t) of the product is its internal load, including its own weight, the loads transmitted from other parts, the internal stress generated to resist external loads, etc. Thus, the limit state function of the product can be written as

$$G(t, X) = R(t) - S(t) \tag{5.34}$$

According to the FORM method [53], the reliability index is

$$\beta(t) = \frac{\mu[G(t, X)]}{\sigma[G(t, X)]} \tag{5.35}$$

where μ[G(t, X)] and σ [G(t, X)] are the mean and standard deviation of the random process G(t, X), respectively.
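A small worked sketch of Eqs. (5.34) and (5.35) follows. It assumes, beyond what the text states, that R(t) and S(t) are independent and normally distributed at each time, so that σ[G] is the root sum of squares and the failure probability is Φ(−β). The linear strength decay and all numbers are hypothetical.

```python
import math

def reliability_index(mu_r, sigma_r, mu_s, sigma_s):
    """beta = mu[G] / sigma[G] for G = R - S with independent R and S (assumed)."""
    return (mu_r - mu_s) / math.sqrt(sigma_r ** 2 + sigma_s ** 2)

def failure_probability(beta):
    """P_f = Phi(-beta); exact only when G is normally distributed."""
    return 0.5 * (1.0 + math.erf(-beta / math.sqrt(2.0)))

def beta_at(t, mu_r0=500.0, decay=5.0, sigma_r=40.0, mu_s=300.0, sigma_s=30.0):
    """Hypothetical case: mean strength decays linearly, stress is stationary."""
    return reliability_index(mu_r0 - decay * t, sigma_r, mu_s, sigma_s)

beta_start = beta_at(0.0)    # (500 - 300) / 50
beta_later = beta_at(10.0)   # (450 - 300) / 50
```

With these numbers the reliability index falls from 4.0 to 3.0 over ten time units, which raises the instantaneous failure probability accordingly.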

5.3.5 Model Parameters Determination Based on the Degradation Process

The development of the time-varying reliability model relies on determining the mean and standard deviation of the limit state function G(t, X). Both in turn depend on the degradation process of the product's strength and section modulus.

5.3.5.1 Determination of the Degradation Process Model

Degradation failure refers to the gradual reduction of the performance of a product over time during storage or operation. The degradation generally evolves as a monotonic, non-negative process and eventually leaves the product unable to perform its specified function. The strength and geometric dimensions of the product degrade under the effect of the natural environment. Degradation data can be obtained from natural accelerated environmental tests over a certain test period, and then treated by statistical analysis to remove abnormal data, smooth the data and identify data features. However, because sample counts and test periods are limited, the obtained data sometimes cannot reflect the characteristics of the product's parameters. It is therefore necessary to select an appropriate degradation model to describe the degradation process of those parameters, following the determination process shown in Fig. 5.28 [54].

5.3.5.2 Estimation of Parameters of the Degradation Model

In the following part of this section, by taking the Gamma process based degradation model as an example, the process of estimation of the degradation model parameters is illustrated based on the results from strength degradation tests.

Fig. 5.28 The determination process of the degradation model (flowchart inputs: failure mechanism and degradation data; decision nodes: whether the time uncertainty of the degradation is small, whether the degradation process is continuous, whether minor damage accumulates through continuous use, whether the increments are stable and independent, and whether damage accumulates through stress impacts; candidate models: degradation path ("degenerative orbital") model, Wiener process based degradation model, stationary or nonstationary Gamma process based degradation model, and a cumulative damage model that conforms to the Poisson process)

Suppose a total of n samples are tested and measured periodically at the times t_1 < t_2 < ⋯ < t_q, so that each sample is measured q times. The strength degradation data can then be written as:

$$\{r_{ij};\ i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, q\} \tag{5.36}$$

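The log likelihood function of the strength measurements is

$$l(m(t), \eta) = \sum_{i=1}^{n}\left( \sum_{j=1}^{q} \left(m(t_j) - 1\right)\ln r_{ij} - m(t_q)\ln\eta - \sum_{j=1}^{q} \ln\Gamma\left(m(t_j)\right) - \frac{r_{iq}}{\eta} \right) \tag{5.37}$$

By substituting m(t) = at^b into Eq. (5.37), the parameters of the degradation model can be estimated by the maximum likelihood method from:

$$\begin{cases}
\dfrac{\partial l}{\partial a} = \dfrac{\partial l}{\partial m}\cdot\dfrac{\partial m}{\partial a} = \displaystyle\sum_{i=1}^{n}\sum_{j=1}^{q}\left(\ln r_{ij} - \psi\left(a t_j^{b}\right) - \ln\eta\right) t_j^{b} = 0 \\[2ex]
\dfrac{\partial l}{\partial b} = \dfrac{\partial l}{\partial m}\cdot\dfrac{\partial m}{\partial b} = a\displaystyle\sum_{i=1}^{n}\sum_{j=1}^{q}\left(\ln r_{ij} - \psi\left(a t_j^{b}\right) - \ln\eta\right) t_j^{b}\ln t_j = 0 \\[2ex]
\dfrac{\partial l}{\partial \eta} = \displaystyle\sum_{i=1}^{n}\left(\frac{r_{iq}}{\eta^{2}} - \frac{a t_q^{b}}{\eta}\right) = 0
\end{cases} \tag{5.38}$$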

where ψ(•) represents the digamma function (i.e. the derivative of the logarithmic Gamma function).


Then, the maximum likelihood estimates â, b̂ and η̂ can be obtained by solving the above three equations simultaneously [55].
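The estimation procedure can be sketched numerically as follows. This is a hedged illustration, not the book's algorithm: it assumes the r_ij are cumulative degradation amounts, uses the standard increment-based Gamma process likelihood with shape function m(t) = a·t^b and scale η, profiles out η in closed form (the root of ∂l/∂η = 0), and replaces the simultaneous solution of the score equations by a coarse grid search over a and b. All names, grids and simulated data are hypothetical.

```python
import math
import random

def log_likelihood(data, times, a, b, eta):
    """Increment-based Gamma-process log-likelihood with m(t) = a * t**b."""
    m = [0.0] + [a * t ** b for t in times]
    total = 0.0
    for path in data:
        prev = 0.0
        for j, r in enumerate(path, start=1):
            dm, dr = m[j] - m[j - 1], r - prev   # shape and size of increment j
            total += ((dm - 1.0) * math.log(dr) - dm * math.log(eta)
                      - math.lgamma(dm) - dr / eta)
            prev = r
    return total

def fit(data, times, a_grid, b_grid):
    """Grid search over (a, b); eta is profiled out analytically."""
    mean_final = sum(path[-1] for path in data) / len(data)
    best = None
    for a in a_grid:
        for b in b_grid:
            eta = mean_final / (a * times[-1] ** b)   # root of dl/d(eta) = 0
            ll = log_likelihood(data, times, a, b, eta)
            if best is None or ll > best[0]:
                best = (ll, a, b, eta)
    return best

# Simulated degradation paths from assumed true parameters (illustration only).
rng = random.Random(1)
times = [1.0, 2.0, 3.0, 4.0, 5.0]
true_a, true_b, true_eta = 2.0, 1.0, 0.5
data = []
for _ in range(30):
    level, prev_m, path = 0.0, 0.0, []
    for t in times:
        m_t = true_a * t ** true_b
        level += rng.gammavariate(m_t - prev_m, true_eta)  # shape, scale
        path.append(level)
        prev_m = m_t
    data.append(path)

a_grid = [0.5 * k for k in range(1, 9)]
b_grid = [0.1 * k for k in range(5, 16)]
ll_hat, a_hat, b_hat, eta_hat = fit(data, times, a_grid, b_grid)
```

In practice the grid search would be replaced by a numerical root finder or optimizer; the grid is used here only to keep the sketch dependency-free.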

5.4 PoF Based Failure Simulation and Evaluation

The PoF based failure simulation and evaluation (abbreviated as reliability simulation) is usually conducted with finite element technologies. For a given product, the profiles of displacements, accelerations and stresses can be calculated from its geometric characteristics, material characteristics and boundary conditions. By combining these profiles with the relevant PoF models, the potential failures of the product can be evaluated and their occurrence times (for instance, the mean time to failure) can be estimated [56]. The weakest reliability link in the product can also be detected and then improved by design [57].

The PoF based failure simulation can support both the reliability design and the performance design of the product. In engineering practice, a combination of failure simulation, load-response analysis and time effect analysis can be used to tackle physical failure problems that are difficult to detect and handle. However, because of the limits of existing modelling abilities and algorithms, the PoF based method is difficult to apply to system-level products. It is currently used mainly for component-level products in electromechanical and electronic systems.

5.4.1 Fundamental Process

The PoF based failure simulation consists of two main steps, failure prediction and reliability evaluation, as shown in Fig. 5.29. The inputs, main procedure and outputs of each step are introduced below [58].

5.4.2 Failure Prediction

The inputs for failure prediction are the results of the load-response and time effect analyses. These results are derived from the detailed design and process parameters of the product, including the type, location, size, focus, pins and power consumption of the components, as well as the number of layers, thickness and plated-through-hole (PTH) information of the PCBs. The main outputs include:

(1) The weak links of the product.
(2) The failure information matrix of the product.
(3) Monte Carlo simulation results on failure time predictions in terms of different failure mechanisms [59].

Fig. 5.29 Basic process of the PoF based failure simulation (inputs: load-response analysis results and time effect analysis results; main steps: failure prediction, comprising analysis of failure physics and Monte Carlo randomization, followed by reliability evaluation, comprising individual failure distribution fitting and multiple failures distribution fusion; outputs: weak links in product design and suggestions for improvement, product failure distribution characteristics, and mean time before failure of the product)

Among these outputs, the failure information matrix contains the failure location, the failure mechanism and a variety of other factors that affect the time to failure (TTF). Together with the Monte Carlo simulation results, it is further used in the reliability evaluation of the product. The PoF based failure prediction method fully considers the environmental and operational loads of an electronic component to determine the corresponding failure modes and failure mechanisms. The TTF of the electronic component is then obtained and used in the reliability evaluation to determine the design weaknesses of the product.

5.4.3 Reliability Evaluation

Using the failure information matrix and the Monte Carlo failure time predictions as inputs, the reliability of the product can finally be evaluated. The main outputs are the time distributions of individual and multiple failures and the mean TTF of the product.

(1) Distribution in time of individual failures
For a large sample of failure time data for each failure mode, statistical distributions are fitted to obtain the probability density function of these data.

(2) Distribution in time of multiple failures
When multiple failures are present, a competitive failure model fuses the failure time distributions calculated for the individual failure mechanisms into a single failure time distribution. The mean TTF of the product is then obtained from the fused distribution.
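The fusion of multiple failure time distributions can be illustrated with a simple competing-failure sketch, in which the product fails as soon as any mechanism fails. The mechanism names, the Weibull parameters and the sample size below are hypothetical placeholders for the Monte Carlo outputs of the failure prediction step.

```python
import random
import statistics

def fuse_competing_failures(ttf_by_mechanism):
    """Sample-wise minimum over mechanisms: the earliest failure governs."""
    return [min(sample) for sample in zip(*ttf_by_mechanism.values())]

rng = random.Random(42)
# Hypothetical Monte Carlo TTF samples (hours) for two failure mechanisms.
# random.weibullvariate(alpha, beta) takes the scale first, then the shape.
ttf = {
    "mechanism_A": [rng.weibullvariate(12000.0, 2.5) for _ in range(5000)],
    "mechanism_B": [rng.weibullvariate(20000.0, 1.8) for _ in range(5000)],
}

system_ttf = fuse_competing_failures(ttf)
mttf = statistics.fmean(system_ttf)   # mean TTF of the fused distribution
```

Because each fused sample is the minimum over the mechanisms, the product-level mean TTF can never exceed the mean TTF of any single mechanism.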

5.5 PoF Based Optimization Design and Fault Control

Design optimization (DO) refers to the design of complex systems and subsystems that takes their internal interaction mechanisms and multiple uncertainties into account, so that an optimized design solution is reached from an overall system perspective. Since failures inevitably exist in a system, preventing and controlling possible failure modes during the system design phase can greatly reduce R&D cost and time [60]. Once the system-level PoF model is established, it is used in the system design to prevent and control the failures in the system.

5.5.1 Parameter Sensitive Analysis Based on Orthogonal Test and Grey Correlation Model

Before carrying out PoF based optimization design and failure control on a system, it is necessary to analyze the sensitivity of both the system performance and the reliability to the system parameters. Such sensitivity analysis is usually performed with orthogonal experiments combined with the grey correlation model, so that both deterministic and uncertain parameters can be handled.

Orthogonal experimentation is a method that arranges experiments with an orthogonal table and analyzes the experimental results with statistical techniques such as the range analysis method [61]. Suppose that X, Y, … are the influencing factors in the experiment, t is the number of influencing factors, X_i is the value of factor X at the ith level, and M_ij is the value of influence factor j at the ith level. Then, from the experimental results N_k (k = 1, 2, …, n) of the n experiments conducted under M_ij, the quantity K_ij is calculated as

$$K_{ij} = \frac{1}{n}\sum_{k=1}^{n} N_k - \bar{N} \tag{5.39}$$

where K_ij is the mean value of the experimental results under influence factor j at the ith level (taken relative to N̄), n is the number of experiments under influence factor j at the ith level, N_k is the result of the kth experiment, and N̄ is the mean value of all experimental results. Using the range analysis method, the sensitivity of influence factor j is evaluated by the range value R_j, which is calculated as follows:

$$R_j = \max\{K_{1j}, K_{2j}, \ldots\} - \min\{K_{1j}, K_{2j}, \ldots\} \tag{5.40}$$
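Eqs. (5.39) and (5.40) can be sketched on a small orthogonal array. The L4(2^3) design below is a standard two-level array for three factors; the experimental results are hypothetical.

```python
def range_analysis(design, results):
    """Return the range value R_j of every factor.

    design[k][j] is the level of factor j in run k; results[k] is N_k.
    """
    overall_mean = sum(results) / len(results)
    ranges = []
    for j in range(len(design[0])):
        by_level = {}
        for run, value in zip(design, results):
            by_level.setdefault(run[j], []).append(value)
        # K_ij: mean result at level i of factor j minus the overall mean, Eq. (5.39).
        k_values = [sum(v) / len(v) - overall_mean for v in by_level.values()]
        ranges.append(max(k_values) - min(k_values))   # R_j, Eq. (5.40)
    return ranges

# L4(2^3) orthogonal array with hypothetical experimental results.
design = [(1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 1)]
results = [10.0, 12.0, 20.0, 22.0]
ranges = range_analysis(design, results)
```

Here the first factor has the largest range (10.0), the second a small one (2.0) and the third none (0.0), so the first factor is the most sensitive.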

From Eq. (5.40), the larger the range value R_j, the greater the effect of influence factor j on the experimental results. In other words, the sensitivity is positively related to the range value R_j.

Grey correlation decision-making is one of the most commonly used methods in grey correlation theory [62]. Its basic idea is to find, from a background analysis of the problem, the effect evaluation vector that corresponds to the ideally optimized decision-making plan. The correlations between the evaluation vectors of the candidate plans and that of the optimized plan are then calculated to rank the candidate plans.

Before applying the grey correlation model, the system parameters to be evaluated need to be described with fuzzy numbers. In fuzzy set theory, fuzzy numbers are used to describe subjective and uncertain information quantitatively. Among the many kinds of fuzzy numbers, the triangular fuzzy number is one of the most general. It can be expressed as A = (a, b, c), with the membership function:

$$\mu_A(x) = \begin{cases} 0, & x \le a \\ (x - a)/(b - a), & a < x \le b \\ (c - x)/(c - b), & b < x \le c \\ 0, & x > c \end{cases} \tag{5.41}$$
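A minimal sketch of the triangular membership function follows. It assumes a < b < c and uses the standard falling edge (c − x)/(c − b), so that the membership grade is continuous and equal to 1 at x = b; the example fuzzy number (2, 5, 9) is hypothetical.

```python
def triangular_membership(x, a, b, c):
    """Membership grade of x in the triangular fuzzy number (a, b, c), a < b < c."""
    if x <= a or x > c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge on (a, b]
    return (c - x) / (c - b)       # falling edge on (b, c]

# Hypothetical fuzzy rating "about 5" on a 0 to 10 scale.
grades = [triangular_membership(x, 2.0, 5.0, 9.0) for x in (2.0, 3.5, 5.0, 7.0, 9.0)]
```

The grades rise linearly from 0 at a to 1 at b and fall linearly back to 0 at c.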

The triangular fuzzy number can be determined by the Delphi method from experts' knowledge and experience. Suppose there are n experts, the ability weight of the ith expert is β_i, and that expert's fuzzy evaluation of a certain variable (related to a failure mode) is the triangular fuzzy number x_i = (a_i, b_i, c_i). The aggregated triangular fuzzy number for this variable is then calculated as follows:

$$a = \sum_{i=1}^{n} \beta_i a_i, \qquad b = \sum_{i=1}^{n} \beta_i b_i, \qquad c = \sum_{i=1}^{n} \beta_i c_i \tag{5.42}$$

where $\sum_{i=1}^{n} \beta_i = 1$ and β_i ∈ (0, 1).

In a fuzzy environment, grey correlation analysis rests on the defuzzification of the fuzzy numbers, and many defuzzification algorithms have been studied in depth. A typical one, proposed by Xiao and Li [63], is as follows:

$$A(x) = \frac{1}{2(1 + N)}\,a + \frac{N + 2NM + M}{2(1 + N)(1 + M)}\,b + \frac{1}{2(1 + M)}\,c \tag{5.43}$$

where M and N are model coefficients determined by the degrees of deviation between b and c and between b and a, respectively: b is taken to be M times as probable as c and N times as probable as a.

The procedure for grey correlation analysis of the influencing factors is as follows.

(1) Establish the comparison matrix. Suppose that a product or system has n influencing factors, denoted x_1, x_2, …, x_j, …, x_n, where x_j is the jth influencing factor, and that each influencing factor has 3 variables. The data column of the jth influencing factor can then be expressed as x_j = {x_j(1), x_j(2), x_j(3)}, where x_j(t) (t = 1, 2, 3) is the expert evaluation of each of the three variables, defuzzified by an equation such as Eq. (5.43). The comparison matrix A of the n influencing factors is then obtained as follows:

$$A = \{x_j(t)\} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1(1) & x_1(2) & x_1(3) \\ x_2(1) & x_2(2) & x_2(3) \\ \vdots & \vdots & \vdots \\ x_n(1) & x_n(2) & x_n(3) \end{bmatrix} \tag{5.44}$$

(2) Establish the reference matrix, since the sensitivity ranking is usually obtained with respect to a certain reference. From the perspective of the failures in a product or system, the reference matrix should be built from the best (or worst) value of each variable of the influencing factors, as shown below:

$$A_0 = \{x_0(t)\} = \begin{bmatrix} \mathrm{VH} & \mathrm{VH} & \mathrm{VH} \\ \vdots & \vdots & \vdots \\ \mathrm{VH} & \mathrm{VH} & \mathrm{VH} \end{bmatrix} = \begin{bmatrix} 10 & 10 & 10 \\ \vdots & \vdots & \vdots \\ 10 & 10 & 10 \end{bmatrix} \tag{5.45}$$

(3) Calculate the grey correlation coefficients. According to grey correlation theory, the grey correlation coefficients between the variables of the influencing factors and their reference are calculated by Eq. (5.46):

$$\xi\left(x_0(t), x_j(t)\right) = \frac{\min\limits_{j}\min\limits_{t}\left|x_0(t) - x_j(t)\right| + \zeta \max\limits_{j}\max\limits_{t}\left|x_0(t) - x_j(t)\right|}{\left|x_0(t) - x_j(t)\right| + \zeta \max\limits_{j}\max\limits_{t}\left|x_0(t) - x_j(t)\right|} \tag{5.46}$$

where ζ is the distinguishing coefficient, and ζ ∈ (0, 1).

(4) Calculate the grey correlation degree. When measuring the sensitivity of an influencing factor, each of its three variables may exert a different degree of influence. Therefore, with λ_t as the weight of each variable, the correlation between the jth influencing factor and the reference benchmark is calculated by the following formula:

$$\gamma\left(x_0, x_j\right) = \sum_{t=1}^{3} \lambda_t\, \xi\left(x_0(t), x_j(t)\right) \tag{5.47}$$

where $\sum_{t=1}^{3} \lambda_t = 1$, and λ_t is determined by the experts in advance according to the practical situation.
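Steps (1) to (4) can be sketched end to end as below. The defuzzification follows the weighted form of Eq. (5.43), whose three weights sum to 1; the fuzzy scores, the variable weights λ_t and the choice ζ = 0.5 are hypothetical, and the sketch assumes that at least one score differs from the reference (otherwise the denominator vanishes).

```python
def defuzzify(a, b, c, M=1.0, N=1.0):
    """Eq. (5.43): crisp value of the triangular fuzzy number (a, b, c)."""
    w_a = 1.0 / (2.0 * (1.0 + N))
    w_b = (N + 2.0 * N * M + M) / (2.0 * (1.0 + N) * (1.0 + M))
    w_c = 1.0 / (2.0 * (1.0 + M))
    return w_a * a + w_b * b + w_c * c

def grey_relational_degrees(comparison, reference, weights, zeta=0.5):
    """Grey correlation degree of every row of `comparison` to `reference`."""
    deltas = [[abs(r - x) for r, x in zip(reference, row)] for row in comparison]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)   # assumed to be non-zero
    degrees = []
    for row in deltas:
        # Eq. (5.46): one coefficient per variable t.
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        # Eq. (5.47): weighted sum of the coefficients.
        degrees.append(sum(w * xi for w, xi in zip(weights, coeffs)))
    return degrees

# Hypothetical expert scores: three influencing factors, three variables each.
fuzzy_scores = [
    [(7, 8, 9), (3, 4, 5), (5, 6, 7)],
    [(8, 9, 10), (8, 9, 10), (8, 9, 10)],
    [(1, 2, 3), (1, 2, 3), (1, 2, 3)],
]
comparison = [[defuzzify(*f) for f in row] for row in fuzzy_scores]   # step (1)
reference = [10.0, 10.0, 10.0]                                        # step (2)
degrees = grey_relational_degrees(comparison, reference, [0.5, 0.3, 0.2])
```

The factor whose scores lie closest to the reference receives the highest degree and therefore ranks first.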

5.5.2 Reliability-Based Design Optimization

In the process of system design, reliability requirements are usually treated as design constraints that must be met. For complex products, reliability-based design optimization (RBDO) is needed both to achieve optimized performance and to improve the stability and reliability of the design plan [64]. Optimization results that satisfy the reliability requirements are obtained by comprehensively considering the influence of uncertainty on the constraints during the optimization process. The PoF based RBDO method has two main tasks: establishing the PoF based reliability model and performing synchronous design optimization.

5.5.2.1 Establishment of Reliability Models

Traditional reliability models describe the relationship between the system reliability and the component failure rates, component redundancies, etc., but not the relationship between the system reliability and the key design parameters (such as structural parameters, material characteristics, component rated values, etc.). These key design parameters are also called design dependency parameters (DDPs). Provided that the PoF model of the system is known, a PoF based reliability model can be established to describe the relationship between the key design parameters and the reliability of the system, and thus to support the quantitative design and optimization of system reliability:

$$R_s = f(\mathrm{DDPs}) \tag{5.48}$$

where R_s is the reliability of the system, and DDPs are the key design parameters closely related to the reliability. The establishment of the PoF based reliability model mainly includes the following steps:

(1) Through sensitivity analysis, screen out the key design parameters that affect the system reliability from the DDPs included in the PoF model. The details of the sensitivity analysis method are provided in Sect. 5.5.1.
(2) Generate the sample sets of the selected key parameters through the Design of Experiment (DOE) method.
(3) Import these sample sets into the PoF model to calculate the reliability indices through simulations.
(4) Choose an appropriate approximate model, which is converted into a reliability model through regression on the simulation results obtained in Step (3).
(5) Verify the accuracy of the established reliability model.

The most important of the above 5 steps is determining the approximate model. Constructing such an approximate model generally requires three sub-steps, as illustrated in Fig. 5.30. First, generate samples of the design parameters with a certain experimental design method. Then, analyze these samples with mathematical models (or simulation software) to obtain the input/output datasets. Finally, fit these input/output data with a certain fitting method to construct the approximate model. The mainstream approximate models include the Response Surface Model (RSM), the Artificial Neural Network (ANN), the Kriging model, etc. [65]. A comprehensive comparison among these three kinds of models is shown in Table 5.3.

Fig. 5.30 The procedure to generate an approximation model (experiment design, numerical simulation, approximate model)
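The three sub-steps (experiment design, simulation, fitting) can be sketched for a single design parameter with a quadratic response surface fitted by least squares. The function `pof_simulation` is a hypothetical stand-in for the PoF model; the sample levels and coefficients are illustrative only.

```python
def solve_linear_system(A, y):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(y)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_quadratic_rsm(samples, responses):
    """Least-squares fit of R(x) ~= c0 + c1*x + c2*x**2 via the normal equations."""
    basis = [[1.0, x, x * x] for x in samples]
    AtA = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    Aty = [sum(row[i] * y for row, y in zip(basis, responses)) for i in range(3)]
    return solve_linear_system(AtA, Aty)

def pof_simulation(x):
    """Hypothetical stand-in for the PoF reliability simulation of Step (3)."""
    return 0.99 - 0.02 * (x - 2.0) ** 2

doe_samples = [0.0, 1.0, 2.0, 3.0, 4.0]                  # Step (2): DOE levels
responses = [pof_simulation(x) for x in doe_samples]     # Step (3): simulation
c0, c1, c2 = fit_quadratic_rsm(doe_samples, responses)   # Step (4): regression

def predict(x):
    return c0 + c1 * x + c2 * x * x

# Step (5): check the surrogate at a point that was not used for fitting.
validation_error = abs(predict(2.5) - pof_simulation(2.5))
```

Because the stand-in simulation is itself quadratic, the surrogate reproduces it almost exactly; with a real PoF model the validation error indicates whether a richer approximate model is needed.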

5.5.2.2 Synchronous Design Optimization

In the modern design process, optimization methods are often used to improve the key performance of the system with reasonable DDPs. Based on the PoF based reliability model, the optimization objectives, constraints and variables are clarified first. A synchronous optimization of both reliability and performance is then carried out with the standard optimization algorithms provided in commercial optimization software.

Table 5.3 A comprehensive comparison among the RSM model, ANN model and Kriging model

Model: Feature/applicability

RSM model
• Mature technology, can be systematically verified, and widely used in engineering applications
• Works in situations with random error
• Can only deal with small-scale problems (fewer than 10 variables)

ANN model
• Can deal with highly non-linear or large-scale problems (from 1 to 10,000 variables)
• Works for the modelling of deterministic problems
• High computational cost (usually requires more than 10,000 training samples)

Kriging model
• Flexible, but also very complex
• Works for the modelling of deterministic problems
• Can deal with medium-size problems (fewer than 50 variables)

1) Optimization models

This section introduces three commonly used integrated optimization models. For products that involve multiple disciplines with complicated coupling relationships, the model should be built with the Multi-disciplinary Design Optimization (MDO) method, as discussed in Sect. 5.5.3.

(1) Design optimization model with the reliability as the goal

The design optimization model is built on the reliability model, with the maximum reliability as the optimization goal and the key design parameters as the optimization variables. It also treats the performance requirements, the resource constraints (space, weight, cost, etc.) and the ranges of the key design parameters as constraints. Such a model is described as:

$$\begin{aligned} &\text{find } X \\ &\max\ f(X) \\ &\text{s.t. } g_j(X) \ge g_j^{*} \quad (j = 1, 2, \cdots, m) \\ &\phantom{\text{s.t. }} h_k(X) \ge h_k^{*} \quad (k = 1, 2, \cdots, l) \\ &\phantom{\text{s.t. }} x_i^{l} \le x_i \le x_i^{u} \quad (i = 1, 2, \cdots, n) \end{aligned} \tag{5.49}$$

where X = (x_1, x_2, …, x_n)^T is the n-dimensional vector of key design parameters, f(X) is the system reliability function of X, g_j(X) is the jth performance function of X, g_j^* is the design requirement for the jth performance, m is the number of performance constraints, h_k(X) is the kth resource constraint function of X, h_k^* is the design requirement for the kth resource constraint, l is the number of resource constraints, and [x_i^l, x_i^u] is the design space of the ith key design variable. In the optimization process, g_j(X) and h_k(X) can be calculated directly by CAE simulations. If the calculation is very complicated, the efficiency of the optimization can be improved by using the response surface method.

(2) Design optimization model with performance as the goal

The design optimization takes the maximum performance as the optimization goal and the key design parameters as the optimization variables. It also treats the performance requirements, the resource constraints (space, weight, cost, etc.) and the ranges of the key design parameters as constraints, together with the reliability requirement f(X) ≥ R*. Such a model is described as:

$$\begin{aligned} &\text{find } X \\ &\max\ G\left(g_j(X)\right) \quad (j = 1, 2, \cdots, m) \\ &\text{s.t. } f(X) \ge R^{*} \\ &\phantom{\text{s.t. }} g_j(X) \ge g_j^{*} \quad (j = 1, 2, \cdots, m) \\ &\phantom{\text{s.t. }} h_k(X) \ge h_k^{*} \quad (k = 1, 2, \cdots, l) \\ &\phantom{\text{s.t. }} x_i^{l} \le x_i \le x_i^{u} \quad (i = 1, 2, \cdots, n) \end{aligned} \tag{5.50}$$

Since multiple performance properties need to be optimized, solving Eq. (5.50) is a multi-objective optimization problem. The optimization objective function G(g_j(X)) is a cost function of the multiple performance properties, usually described by a simple weighting function. R* is the required value of the reliability index.

(3) Design optimization model with both reliability and performance as the goal

The design optimization takes the best tradeoff between reliability and performance as the optimization goal and the key design parameters as the optimization variables. It also treats the reliability requirement, the performance requirements, the resource constraints (space, weight, cost, etc.) and the ranges of the key design parameters as constraints. Such a model is described as:

$$\begin{aligned} &\text{find } X \\ &\max\ G\left(f(X), g_j(X)\right) \quad (j = 1, 2, \cdots, m) \\ &\text{s.t. } f(X) \ge R^{*} \\ &\phantom{\text{s.t. }} g_j(X) \ge g_j^{*} \quad (j = 1, 2, \cdots, m) \\ &\phantom{\text{s.t. }} h_k(X) \ge h_k^{*} \quad (k = 1, 2, \cdots, l) \\ &\phantom{\text{s.t. }} x_i^{l} \le x_i \le x_i^{u} \quad (i = 1, 2, \cdots, n) \end{aligned} \tag{5.51}$$

2) Optimization algorithm

The optimization in the integrated design is usually a complex nonlinear problem, since the objective function and constraints are generally nonlinear. It can be solved with both classical nonlinear programming methods and modern intelligent optimization algorithms. The classical methods are computationally efficient but are prone to being trapped in a local optimum. The modern algorithms mainly include the genetic algorithm, simulated annealing algorithm, ant colony algorithm and tabu search algorithm [66], as listed in Table 5.4.

Table 5.4 Main modern intelligent optimization algorithms

Algorithms: Main features

Genetic algorithm
• Borrows the concepts of chromosomes and genes from biology to simulate the genetic and evolutionary mechanisms of organisms in nature
• Uses the fitness function to determine the direction and range of further search, without needing information such as the derivative of the objective function
• Searches at multiple points, with inherent parallelism

Simulated annealing algorithm
• Simulates the crystallization process of solid materials in statistical physics
• In the annealing process, a better solution is always accepted; a worse solution is accepted with a certain probability in order to escape from a local optimum

Ant colony algorithm
• Simulates the behavior of ants searching for food
• Has a strong ability to find better solutions and is not easily trapped in local optima
• The algorithm itself is very complicated and generally requires a long search time

Tabu search algorithm
• Uses the taboo technique to prohibit the repetition of previous moves, which avoids the main shortcoming of neighborhood search, namely falling into a local optimum
• Depends strongly on the initial solution
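As an illustration of the acceptance rule described for simulated annealing, the sketch below maximizes a toy reliability function under a resource constraint and box bounds, in the spirit of Eq. (5.49). The objective, the budget constraint, the cooling schedule and all numerical settings are hypothetical; infeasible candidates are simply rejected, which is one of several possible constraint-handling choices.

```python
import math
import random

def simulated_annealing(objective, feasible, x0, bounds, iters=5000,
                        t0=1.0, cooling=0.999, step=0.5, seed=0):
    """Maximize `objective` over the box `bounds`, rejecting infeasible points."""
    rng = random.Random(seed)
    current, f_cur = list(x0), objective(x0)
    best, f_best = list(current), f_cur
    temp = t0
    for _ in range(iters):
        # Random neighbor, clipped to the design space [x_l, x_u].
        cand = [min(hi, max(lo, xi + rng.uniform(-step, step)))
                for xi, (lo, hi) in zip(current, bounds)]
        if feasible(cand):
            f_cand = objective(cand)
            # Accept improvements; accept worse moves with probability exp(delta / T).
            if f_cand >= f_cur or rng.random() < math.exp((f_cand - f_cur) / temp):
                current, f_cur = cand, f_cand
                if f_cur > f_best:
                    best, f_best = list(current), f_cur
        temp *= cooling   # geometric cooling schedule
    return best, f_best

def reliability(x):
    """Hypothetical reliability model f(X) of two design parameters."""
    return 1.0 - math.exp(-0.2 * (x[0] + 2.0 * x[1]))

def budget_ok(x):
    """Hypothetical resource constraint: total allocation limited to 6 units."""
    return x[0] + x[1] <= 6.0

x_start = [0.0, 0.0]
best_x, best_f = simulated_annealing(reliability, budget_ok, x_start,
                                     bounds=[(0.0, 5.0), (0.0, 5.0)])
```

For this toy problem the constrained optimum lies at x = (1, 5); the search only approaches it, and the quality of the result depends on the assumed step size and cooling schedule.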

5.5.3 Reliability-Based Multidisciplinary Design Optimization

Reliability-based multidisciplinary design optimization (RBMDO) organically combines reliability analysis with multidisciplinary design optimization (MDO), so that the design of a complex product is both optimal and compliant with the reliability requirements [67]. A design problem may involve multiple engineering disciplines, each of which produces analysis results from its own theories or simulation tools, and the results from different disciplines are coupled in complex ways. RBMDO fully considers these coupling relationships to ensure that the designed product has sufficient reliability against the fluctuations caused by uncertainties.

RBMDO needs to consider mixed random, fuzzy and interval uncertainties of the system parameters. In RBMDO under mixed uncertainties, serious coupling problems arise when two or more types of uncertainty exist at the same time. These include the couplings between deterministic optimization and reliability analysis, between deterministic optimization and multidisciplinary analysis, between reliability analysis and multidisciplinary analysis, etc. They lead to two, three or even more levels of nested iteration in the design optimization process, and thus to low computational efficiency. They therefore need to be simplified with a collaborative optimization strategy.


Collaborative optimization (CO) decomposes the original design optimization problem into two levels: system-level optimization and subsystem-level optimization. As the iterative optimization proceeds, linear approximations of the subsystem-level responses are continuously added to the system-level optimization as substitutes for the consistency constraints. These cumulative linear approximations become the constraints of the system-level optimization. It is therefore necessary to carry out collaborative optimization combined with linear approximation (CLA-CO) based on the linear approximation filter (LAF) strategy [68]. In CLA-CO, the system-level equality constraints are replaced by the cumulative linear approximations of the subsystem objective functions, which gives CLA-CO a higher computational efficiency. The general form of CLA-CO is:

• System-level optimization:

$$\begin{aligned} &\min\ f(x_s, x_1, x_2, \ldots, x_n) \\ &\text{s.t. } \bigcup_{i=1}^{n}\left[ L_i^{(1)}(x_s, x_i) \le 0, \ldots, L_i^{(k)}(x_s, x_i) \le 0 \right] \end{aligned} \tag{5.52}$$

where f is the optimization objective function, x_s is the design parameter in vector form, x_i is the system-level copy of the ith subsystem local variable, n and k are the numbers of subsystems and iterations respectively, and L_i^{(k)} is the linear approximation of the ith subsystem at the kth iteration.

• Subsystem-level optimization:

$$\begin{aligned} &\min\ J_i = \left\|x_{si} - x_s\right\|_2^2 + \left\|x_i - \hat{x}_i\right\|_2^2 \\ &\text{s.t. } c_i(x_{si}, x_i) \le 0 \end{aligned} \tag{5.53}$$

where ci (x si , x i ) ≤ 0 is the vector form of the constraint function of the ith subsystem. The calculation steps are shown in Fig. 5.31. Step 0: Initialization Set the number of iterative cycles k = 0, and the initial values of the design parameters, including xs , xˆ1 , xˆ2 , . . . , xˆn . Step 1: Subsystem level optimization The design parameters xs# and xˆs# obtained from the system-level optimization are assigned to each subsystem. In the first iteration, the initial values of the design parameters are assigned to each subsystem as system-level target values. Then these target values are further used in the solution of the subsystem-level optimization problem. In this step, the optimizations of different subsystems are performed in parallel. Step 2: Linear approximation

216

5 Physics of Failure Based Fault Identification and Control Methods

step 0

initialization

current design point

step 1

system target assignment

subsystem optimization

step 2

linear approximation

step 3

acception

step 5

construct a system-level optimization

step 4 linear approximate filtering

update design points

system-level optimization

no converge yes end of optimization

Fig. 5.31 The CLA-CO calculation process based on LAF strategy

A linear approximation of the corresponding subsystem-level response is obtained based on the design parameters of the subsystem obtained in Step 1. Step 3: Determination whether the linear approximation is acceptable The linear judgment algorithms are used to judge the linear approximation obtained in Step 2. If the linear approximation is accepted, it will be added to the system-level product, otherwise it will be sent to the LAF structure. Step 4: Linear approximation filter For the circumstance that the linear approximation is not accepted, LAF structure is used to construct linear approximation with minimum equality constraints. Then these developed linear approximations will replace the original cumulative linear approximations as the new constraints in system-level optimization.


Step 5: System-level optimization
The system-level optimization problem can be solved quickly by using the linear approximation constraints created in the previous steps. The CLA-CO optimization process ends when the system-level optimization meets the convergence condition in Eq. (5.54):

$$\left| \frac{f^{(k)} - f^{(k-1)}}{f^{(k)}} \right| \le \varepsilon \tag{5.54}$$

where $\varepsilon$ is a small positive real number given in advance.
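The stopping rule of Eq. (5.54) can be illustrated with a minimal Python sketch. It assumes that f(k) denotes the system-level objective value at iteration k. The function name `has_converged`, the default tolerance, the fallback for f(k) = 0, and the sample history are hypothetical choices made only for illustration.

```python
def has_converged(f_curr: float, f_prev: float, eps: float = 1e-4) -> bool:
    """Relative-change test: |f(k) - f(k-1)| / |f(k)| <= eps."""
    if f_curr == 0.0:
        # Assumption (not from the text): use an absolute test when f(k) is zero.
        return abs(f_curr - f_prev) <= eps
    return abs(f_curr - f_prev) / abs(f_curr) <= eps


# Hypothetical objective values from successive system-level optimizations.
history = [10.0, 8.0, 7.6, 7.5999]
flags = [has_converged(history[k], history[k - 1]) for k in range(1, len(history))]
print(flags)
```

In this sketch the loop would stop at the first iteration whose flag is true.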

Chapter 6

Model-Based Reliability System Engineering R&D Process Model

Abstract This chapter first introduces the model-based system engineering process and design flow by discussing typical models and the evolution of the system engineering process. It then analyzes the basic principles of the integrated design process of both functional performance and hexability, investigates the influence of MBRSE on this integrated design process, and provides multiple view description methods (behavior, function, information, organization, etc.) for the MBRSE process. Finally, it systematically introduces the key technologies of the MBRSE process, including process planning, operation conflict analysis, simulation-based operation evaluation, and process review and validation.

Keywords System engineering process · Integrated design process · Process modeling · Process planning · Process analysis

6.1 Model-Based System Engineering Process and Design Flow

6.1.1 Evolution of the System Engineering Process

Model-based system engineering (MBSE) is the latest stage in the development of system engineering. It therefore remains system engineering in essence, and its basic idea of stepwise top-down decomposition followed by bottom-up integration is unchanged. The core of MBSE is to use formal, graphical, and associative modeling languages to transform the system engineering process, realizing the leap from document-centric to model-centric system engineering and thereby improving the rigor, traceability, and repeatability of the entire development process.

© National Defense Industry Press 2024 Y. Ren et al., Model-Based Reliability Systems Engineering, https://doi.org/10.1007/978-981-99-0275-0_6


Since its birth, system engineering has been an organization and management technology concerned with the process of engineering a system. In the 1960s, Hall proposed the ultra-fine structure model of system engineering. In 1970, Winston Royce proposed the famous waterfall model. Afterwards, Kevin Forsberg and Harold Mooz proposed the V model in 1978, and the double V model at the first conference of INCOSE in 1991. As shown in Fig. 6.1, each of these models reflects one aspect of the system engineering process from a different perspective. They do not contradict each other, but were developed iteratively and continuously [69]. The Hall ultra-fine structure model describes the system development process along the time dimension and the logic dimension. It focuses on the connections between successive development stages and gives the technological logic applicable to each stage. Under the same technological logic, the objects and content of the work differ from stage to stage, and the design results obtained from the previous stage create inputs for

Fig. 6.1 System development models: (a) Hall ultra-fine structure model; (b) waterfall model; (c) V model

the next stage. The waterfall model emphasizes forward advancement and reverse iteration between the stages of the product life cycle and provides the original concept of concurrent engineering. The V model is one of the best-known models; it emphasizes the process of decomposition and integration in system engineering and the mapping relationships between requirements and verification. All of the above models have proved effective in actual engineering projects [70]. At the end of the twentieth century, a more mature system engineering process model was developed for the product life cycle, as shown in Fig. 6.2. This model focuses mainly on the baselines in the management processes, including the requirement baseline, functional baseline, allocation baseline, production baseline, and product update baseline. Among these, the requirement baseline was the last to be added, following the development of the concepts and methods of requirement engineering. This model follows the same principle as the V model. As system engineering enters the model-based era, the MBSE system design process is organized around four models: the requirement model, functional model, logical model, and physical model (i.e., RFLP). In the V model of Fig. 6.3, the left half shows clear model evolution characteristics, whereas the verification methods on the right half include virtual verification. This makes the system engineering process more scientific, accurate, and rigorous. Under the MBSE mode, the key action of system design is to construct the appropriate models at the right time. The correlations among these models, developed at different levels, then make it possible to trace user needs and to support the whole system design process. Compared with the traditional system design process, this new process emphasizes virtual verification by using various

Fig. 6.2 System engineering process model

Fig. 6.3 MBSE development process model based on V model (left branch: requirements definition, functional decomposition, logical definition, and physical design and implementation, yielding the requirement model R, functional model F, logical model L, and physical model P; right branch: single test, comprehensive integration, and comprehensive validation, with virtual and physical verification)

models. Meanwhile, the baseline assessment at each stage changes from a document-dominated mode to a model-dominated mode [71].

6.1.2 System Design Flow

A process usually refers to the route by which a complete business behavior is accomplished in two or more steps. To realize the system design, the V model must be refined on the basis of the system development process model to create an operable product design process. That is, activities should be carried out by specific people with the support of specific resources. Many researchers hold that a process is difficult to describe from a single view, and that it should instead be modeled through multiview integration modeling technology. The process can then be described by a simple equation: P = W5H, that is, Process = WHO does WHAT, WHEN, and WHERE, WHY and HOW. A development process is described through five main views: the function view, behavior (relationship) view, organization view, information view, and resource (constraint) view. The development of modern equipment is a complex system engineering effort, and most research tends to build multiview system models to describe it from different perspectives. So far, however, no modeling method or tool can comprehensively describe all of its views [72].

Fig. 6.4 Conceptual model of the multiview process model (process view: product design process and information, linked to the other views by information extraction and mapping; function view: business processes, enterprise activities; organization view: organizational structure and power of the teams; information view: object view, object relationship, integration rules; resource view: resource, implementation capability)

Figure 6.4 illustrates the conceptual model of an integrated multiview process model, in which the five main views mentioned above are defined [73]. Different views describe different aspects of the process. To simplify the description, one may show only the flowchart formed by the activities and the logic between them. In Fig. 6.4, the process (behavior) view is the core; it describes the state changes over the entire development process of the product and the related activities of the process. The organization view describes the organizational structure and authority of the teams (departments), roles, and people involved in the entire product design process. The function view mainly describes the work (activities) in each step of the entire product design process, for instance what processing is done, together with other information related to these activities. The information view describes the information structure and the relationships between different pieces of information in the entire R&D process of the product. For instance, the relevant information about the product tree can be described in this view. The resource view describes the relationships and management of the generation, use, and release of the resources, such as human resources, equipment, tools, and materials, involved in the entire product design process. It also includes the start and end times and the progress of the activities in the process [74]. A model of the design process of an electronic product is illustrated in Fig. 6.5, in which only the behavior view is provided. As can be seen from Fig. 6.5, the behavior view covers the design tasks of the electronic product and the logical relationships among them, such as sequence, feedback, and judgment.

Fig. 6.5 Illustration of the behavior view of the electronic product design process (flowchart: start of design; circuit board function design; EDA modeling and simulation; selection of elements; layout; wiring; comprehensive evaluation of circuit board performance; end of the design; decision nodes check whether the design requirements are met and whether an improved element selection or layout can meet them)

6.2 Concepts of Integrated Design Process of Both Functional Performance and Hexability Under the MBRSE Mode

6.2.1 Integrated Design Process of Both Functional Performance and Hexability

The hexability of a product is an important design characteristic and needs to be organically integrated into the product design process. Although many research and development institutes have established an overall development process with concurrent, iterative solutions across multiple stages based on the “system engineering” concept, hexability receives noticeably less consideration in that process, or none at all. To integrate hexability organically into the product design process, the existing development process must be reviewed and either completely reorganized or incrementally improved to establish the integrated design process of both functional performance and

Fig. 6.6 Reorganization model of the product design process

hexability. Such a process must be improved and reorganized under the guidance of the concurrent engineering concept [75]. Construction of an integrated design process that combines performance and hexability can follow a general process reorganization model, as shown in Fig. 6.6. It divides the reorganization of the product design process into eight parts, with multiple iterations in actual implementation. Each iteration usually starts from the requirement analysis of the process reorganization. In principle, construction of the integrated design process starts from the determination and implementation of the hexability index requirements, takes the implementation of the hexability design criteria as the basic design requirement, and takes closed-loop fault mitigation and control as the driving force. By incorporating the hexability work items into the existing R&D process for functional performance, the integration of functional performance and hexability can then be implemented effectively, as shown in Fig. 6.7. The hexability design criteria are created from the basic theories and methods of hexability by summarizing, refining, and systematizing the experience of design, manufacturing, and operation of existing or similar products. Incorporating hexability design criteria into the design process of a product helps to improve its general quality level. The common design principles for hexability include simplified design, redundancy design, component screening, derating, accessibility, error prevention, BIT, corrosion-resistant design, etc. These principles

Fig. 6.7 The basic idea of constructing an integrated design process of both functional performance and hexability

should be integrated directly into the product design process rather than treated as separate work items. However, the development and compliance inspection of the hexability design principles can be integrated into the product design process as specific work items [76]. The hexability index requirements play an important role in the product design process. The ultimate goal of all hexability work is to ensure that the product meets these index requirements. Work items are selected according to the hexability indices that have been specified. For instance, if a reliability index is explicitly specified, its relevant work items are added; likewise, if a maintainability index is explicitly specified, the work items related to that index must also be added [77]. Researchers have found a direct quantitative relationship between the fault modes that are mitigated and controlled in a closed loop and the hexability indices. Therefore, decisions on which fault modes need to be mitigated can be made on the basis of the hexability requirements, the fault risks, and the technical and economic constraints. Then, with the fault mode as the core, the different tasks in the development process can be organically linked to form a unified technical logic [78]. In each stage of the product design process, the technical logic of the integrated design of functional performance and hexability is similar: it is composed of work items selected from three categories, namely determination of design requirements, realization of design requirements, and verification of design requirements. The stages differ only in the key aspects and corresponding monitoring points of the hexability work items. The technical logic of the integrated design at each stage of the product design process is shown in Fig. 6.8.


Fig. 6.8 Technical logic of the integrated product design process

Based on the above concepts, a specific integrated product design process can be constructed. By taking the typical hexability methods or work items in the Chinese national military standards as the process nodes, the process for the typical development stage (i.e., the engineering development stage) can be formed, as shown in Fig. 6.9. The processes of the other development stages can be derived by tailoring this typical process. For instance, the requirement argument stage pays more attention to the determination of hexability indices; the plan stage, to the realization of functional principles; the design stage, to the implementation of the product; and the mass production stage, to the verification of hexability requirements.


Fig. 6.9 The integrated product design process model of the engineering development stage

6.2.2 Effects of the MBRSE on the Integrated Design Process

MBRSE is a further extension of the MBSE concepts to hexability. The most important change from MBSE to MBRSE is the use of unified models to convert the implementation of hexability requirements into the evolution of the corresponding models. With closed-loop fault mitigation and control as the core, hexability work under MBRSE becomes more precise and target-oriented. Therefore, the number of traditional hexability work items will be greatly reduced, and the remaining work items will focus more on model construction, analysis, and evolution [79]. The concurrent design process of the MBRSE with model evolution is shown in Fig. 6.10, in which the four parts are tightly connected, with overlaps and interactions. The MBRSE model can be regarded as a further improvement and reorganization, based on the MBSE mode, of the existing integrated design process of both functional performance and hexability. The main task is to determine the typical MBRSE work items and the logical relationships among them. These work items and their logic can then be used to improve the current integrated design process by following the process shown in Fig. 6.6 [80]. In the MBRSE, the hexability requirements proposed by the users can be managed together with the functional performance requirements. In this case, the model-based hexability work items involve requirement decomposition and allocation, universal modeling, disciplinary modeling, multidisciplinary modeling, integrated design, etc., as shown in Fig. 6.11.


Fig. 6.10 Framework of the concurrent design process in MBRSE

Fig. 6.11 Typical MBRSE work items and their logical relationships

The purpose of hexability requirement decomposition and allocation is to decompose and allocate the user’s initial requirements step by step from the top level to the lower levels, providing the basis for hexability design. Universal modeling is a fundamental modeling technology that provides basic information for hexability design analysis and modeling. Universal models are of several types. For instance, the product model for hexability design establishes and maintains the product models that can be associated with hexability and supports the requirement models, functional models, and


physical models so that their evolutionary relationships and technical status remain consistent. The task and load models establish the task profiles of the product from its use process. These task profiles are then combined with the product’s material properties, ambient temperature, environmental humidity, general vibration spectrum, etc., to extract the load profiles of the product by numerical simulation. The fault models comprise the functional fault model, the PoF model, and the system fault model, which are interrelated throughout their evolution. Discipline modeling refers mainly to the simulation and analysis of each hexability discipline (for example, reliability or safety) using the logical/physical models of the product. The models for each characteristic should be unified into a standardized model as far as possible. The functional performance and disciplinary models of the product are then combined to develop the multidisciplinary model, which is used to comprehensively analyze and evaluate the product’s hexability and determine whether it meets the user’s requirements. In addition, the multidisciplinary models can be used in product design and optimization to achieve both low cost and high efficiency over the full life cycle, provided that the hexability requirements are met. The main thread of the integrated design is model-based closed-loop fault mitigation and control. On this basis, traditional hexability design can be unified into systematic fault identification, fault risk analysis, fault elimination design, maintainability design, testability design, safety design, supportability design, etc., to provide auxiliary support for design improvement.
The model-based design decision-making and optimization mainly provides multiview decision support for the implementation of hexability indices and supports the optimized selection among multiple improvement plans. In addition, to meet analysis needs from multiple views, specific work items can be derived from other work items by using the universal models. For example, a fault tree model can be derived from an FMEA on the basis of the universal modeling technology.

6.2.3 Multiple View Description Methods for the MBRSE Process

In the MBRSE, it is difficult to describe the integrated process of multiple disciplines from a single view. Therefore, several view description methods for the MBRSE process are presented in this section.

6.2.3.1 Behavior View

The behavior view can generally be described by ordinary flowcharts. When a design process contains much concurrent or iterative information, the design structure matrix (DSM) can be used to describe the hierarchical features of the process, because a DSM is easy to partition and combine. For processes at the same level, a set of extended DSMs can be integrated to describe the behavior view. For example, the concurrent iteration characteristics (behavior) of the MBRSE can be described by three types of extended numerical DSM, namely the pre-release DSM, the iterative probability DSM, and the iterative impact DSM, as shown in Fig. 6.12, whose elements take values in the range [0, 1]. The pre-release DSM is mainly used to describe the concurrent characteristics of the product design process. If design task j starts when design task i has proceeded to k/10 of its entire work, the element in the pre-release DSM can be

Fig. 6.12 DSM based behavior view to describe the integrated design process between performance and hexability


calculated as $a_{ij} = k/10$ $(0 \le k \le 10)$. The iterative probability DSM and the iterative impact DSM are used to describe iteration and feedback in the design process. The former describes the possibility of changing the activity output, and the latter describes the influence of the iterative feedback on the design process. If the process goes to design task j with probability l when design task i is completed, the element in the iterative probability DSM is $a_{ij} = l$ $(0 \le l \le 1)$. Likewise, if the process goes from design task i to design task j and the impact on design task j is m, the element in the iterative impact DSM is $a_{ij} = m$ $(0 \le m \le 1)$.
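The three extended DSMs can be sketched as plain nested lists. The sketch below is only an illustration under stated assumptions: the task names, the numeric entries, and the helper names `empty_dsm` and `set_entry` are hypothetical and do not come from Fig. 6.12.

```python
def empty_dsm(n):
    """Return an n-by-n DSM filled with zeros (no relationship)."""
    return [[0.0] * n for _ in range(n)]


def set_entry(dsm, i, j, value):
    """Store a_ij after checking that it lies in [0, 1], as the text requires."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("DSM entries must lie in [0, 1]")
    dsm[i][j] = value


# Hypothetical design tasks, indexed 0..2.
tasks = ["functional design", "fault mode analysis", "reliability simulation"]
n = len(tasks)

pre_release = empty_dsm(n)   # concurrency between tasks
iter_prob = empty_dsm(n)     # probability of iteration
iter_impact = empty_dsm(n)   # impact of iteration

# Task 1 starts when task 0 has completed k/10 of its work, with k = 6.
set_entry(pre_release, 0, 1, 6 / 10)
# When task 2 is completed, the process goes to task 0 with probability l = 0.3.
set_entry(iter_prob, 2, 0, 0.3)
# The impact of that feedback on task 0 is m = 0.5.
set_entry(iter_impact, 2, 0, 0.5)
```

Keeping the three matrices separate mirrors the text, where each matrix answers a different question about the same pair of tasks.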

6.2.3.2 Function View and Information View

DSM can be used to describe the behavior characteristics of each design task in the MBRSE process, as well as the sequence and iterative feedback relationships among activities. For the integrated design process and the data related to its design tasks, such as the description, attributes, participants, and resources of each activity, the function view of the MBRSE process can be described by combining a “hierarchical structure” with an “activity function table”, as shown in Fig. 6.13. The function view of the complex MBRSE process can be modeled “horizontally in stages and vertically in hierarchies, supplemented by tables”. First, following the staged hierarchical overall process model mentioned above, the staged hierarchical structure is constructed according to the professional settings of the product’s R&D departments and the structure of the product. As the design process advances (for example, from the outline design stage to the detailed design stage), the professions are divided into increasingly detailed design tasks, until the work items at the lowest level are created and carried out by specific designers. Next, the design tasks (i.e., the leaf nodes of the hierarchical structure) are numbered. The numbering rule can be formulated as “No. of the development stage + No. of the profession + No. of the hierarchical level + No. of the design task”, etc., for standardized recording and description. Finally, each numbered design task is described in detail in an activity function table. This table uses the activity number as its unique ID and includes elements such as the name of the activity, description of the activity, participants of the activity, predecessor activities, and subsequent activities.
The determination of all these data depends on the specific conditions of the product and its R&D department. It should be noted that the function view and the behavior view are not isolated but tightly integrated, and in practical modeling the two views are built complementarily. In this way, function descriptions can be created for all design tasks in the entire integrated design process of both functional performance and hexability. The staged, hierarchical activity function tables are also easy to adopt in further implementation and operation management


Fig. 6.13 Function view to describe the integrated design process between both functional performance and hexability by using the activity function table

of the process, and easy to integrate into a typical digital design environment (such as a Product Data Management (PDM) system).
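The numbering rule and the activity function table described in this subsection can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the two-digit, hyphen-separated ID format, the class and function names, and the sample activities are all hypothetical; the text fixes only which fields the ID and the table contain.

```python
from dataclasses import dataclass, field


def activity_id(stage: int, profession: int, level: int, task: int) -> str:
    """Build an ID from stage, profession, hierarchical level, and task numbers.

    Assumed encoding: two digits per field, joined by hyphens.
    """
    return f"{stage:02d}-{profession:02d}-{level:02d}-{task:02d}"


@dataclass
class Activity:
    """One row of an activity function table."""
    activity_id: str
    name: str
    description: str
    participants: list = field(default_factory=list)
    predecessors: list = field(default_factory=list)  # IDs of predecessor activities
    successors: list = field(default_factory=list)    # IDs of subsequent activities


table = {}


def register(activity: Activity) -> None:
    """Add an activity to the table, enforcing that the ID is unique."""
    if activity.activity_id in table:
        raise ValueError(f"duplicate activity ID: {activity.activity_id}")
    table[activity.activity_id] = activity


# Hypothetical entries for illustration only.
fmea_id = activity_id(stage=2, profession=1, level=3, task=7)
fta_id = activity_id(stage=2, profession=1, level=3, task=8)
register(Activity(fmea_id, "FMEA", "Identify fault modes of the circuit board",
                  participants=["reliability engineer"], successors=[fta_id]))
register(Activity(fta_id, "FTA", "Build the fault tree from the FMEA results",
                  participants=["safety engineer"], predecessors=[fmea_id]))
```

Because predecessor and successor fields store activity IDs, the same table can be read back into a behavior view such as a DSM.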

6.2.3.3 Organization View

The organization view in the integrated design describes the roles and authority of the teams (departments) and individuals involved in MBRSE. It is usually determined by combining the behavior view and the function view, which are interrelated and tightly integrated in the integrated design process. Generally speaking, the organization view of the integrated design process can be mapped from the function view of the integrated design process and the organization view of the R&D department, taking the actual integrated design situation into account. In addition, “cluster analysis” can be performed on team members


to determine who should work together, and the organization structure can then be planned and analyzed in a way similar to that used for the design tasks.

6.2.3.4 View of Resources

The resource view describes the generation, use, and release of resources, such as products, tools, materials, costs, time, and labour, in the integrated design process of both functional performance and hexability. Among them, the time resource mainly refers to the start and end times and the progress of the process activities, and can be described formally with a Gantt chart. The human resource is a special kind of resource that refers to the same persons as the organization view described above. A data table of the resource consumption of each activity can be established and referenced by the other views through that activity. In addition, constraints on the different resources are important factors that affect the accessibility of the design process.

6.3 Key Technologies of the MBRSE

6.3.1 MBRSE Process Planning Technologies

Under the MBRSE mode, to integrate a variety of new hexability work items into the product design process, it is recommended to use DSMs together with the segmentation and tearing algorithms to sequence the related tasks and complete the reorganization of the MBRSE design process [81].

6.3.1.1 Determination of Information Dependencies Among Different Design Tasks

Once the specific design work items have been determined, the information dependencies among the design tasks are determined first [82]. The strength of an information dependency can be judged not only from its sensitivity and predictability, but also from the number of pieces of information, the frequency of information exchange, etc. The information dependence relationship among design tasks can then be described by multiple indices according to actual needs. Suppose there are n indices $P_1(i, j), P_2(i, j), \ldots, P_n(i, j)$; they can be transformed into a single integrated index $[P_1(i, j) \cdot P_2(i, j) \cdots P_n(i, j)]^{1/n}$ by the Multiplicative Utility Function method. A 9-point-scale fuzzy number between 0 and 1 is then assigned to each integrated index.
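The integrated index above is the geometric mean of the individual indices, which the following sketch computes. The mapping onto the levels 0.1, 0.2, ..., 0.9 in `to_nine_point` is an assumption made for illustration, since the text does not state which nine values the scale uses; both function names and the sample values are hypothetical.

```python
import math


def integrated_index(indices):
    """Return [P1 * P2 * ... * Pn]^(1/n) for indices that lie in [0, 1]."""
    if not indices:
        raise ValueError("at least one index is required")
    if any(not 0.0 <= p <= 1.0 for p in indices):
        raise ValueError("each index must lie in [0, 1]")
    return math.prod(indices) ** (1.0 / len(indices))


def to_nine_point(value):
    """Snap a value in [0, 1] to the nearest of 0.1, ..., 0.9.

    Assumed scale; zero is kept as 0.0 to mean "no dependency".
    """
    if value == 0.0:
        return 0.0
    level = min(9, max(1, round(value * 10)))
    return level / 10


# Hypothetical sensitivity and predictability scores for one task pair (i, j).
combined = integrated_index([0.9, 0.4])
fuzzy_entry = to_nine_point(combined)
```

Because the indices are multiplied, a single index of zero forces the integrated index to zero, which is a property of the multiplicative form rather than of this sketch.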

6.3.1.2 Identification of Coupled Sets

It can be shown that the design tasks and their interrelations in the DSM correspond to the nodes and arcs of a directed graph, respectively. Therefore, the coupled sets in the DSM can be found with strongly connected component algorithms from graph theory. First, the DSM of order n is transformed into an adjacency matrix $A$ (an element is 1 if information is transferred between the corresponding tasks and 0 otherwise). The powers of the adjacency matrix are then computed and combined with the unit matrix to obtain the reachability matrix $P$ (namely $P = I \cup A \cup A^2 \cup A^3 \cup \cdots \cup A^n$, where $I$ is the unit matrix). Next, the element-wise product $Q = P \cap P^T$ (with $q_{ij} = p_{ij} p_{ji}$) of $P$ and its transpose $P^T$ is calculated. If all elements in the row (column) of task i are 0 except $q_{ii} = 1$, the task is considered independent. Otherwise, the tasks whose elements equal 1 in the same row (column) form a strongly connected component, i.e., they belong to the same coupled set. Finally, the strongly connected components are collected to obtain the coupled sets and the remaining uncoupled tasks.
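The procedure above can be sketched directly with Boolean matrix operations. This is a minimal sketch, not an optimized implementation: for large DSMs a dedicated strongly connected component algorithm would be a more efficient substitute. The function names and the five-task example DSM are hypothetical.

```python
def bool_matmul(X, Y):
    """Boolean matrix product: entry (i, j) is 1 if some k links i to k and k to j."""
    n = len(X)
    return [[int(any(X[i][k] and Y[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]


def reachability(A):
    """Compute P = I ∪ A ∪ A^2 ∪ ... ∪ A^n for a 0/1 adjacency matrix A."""
    n = len(A)
    P = [[int(i == j) for j in range(n)] for i in range(n)]  # unit matrix I
    power = [row[:] for row in P]                            # A^0
    for _ in range(n):
        power = bool_matmul(power, A)                        # A^1, A^2, ..., A^n
        P = [[P[i][j] | power[i][j] for j in range(n)] for i in range(n)]
    return P


def coupled_sets(dsm):
    """Split tasks into coupled sets and independent tasks using q_ij = p_ij * p_ji."""
    n = len(dsm)
    A = [[int(dsm[i][j] != 0) for j in range(n)] for i in range(n)]
    P = reachability(A)
    Q = [[P[i][j] * P[j][i] for j in range(n)] for i in range(n)]
    seen, coupled, independent = set(), [], []
    for i in range(n):
        if i in seen:
            continue
        group = [j for j in range(n) if Q[i][j]]  # tasks mutually reachable with i
        seen.update(group)
        if len(group) == 1:
            independent.append(i)
        else:
            coupled.append(group)
    return coupled, independent


# Hypothetical fuzzy DSM: tasks 0, 1, 2 form a loop; tasks 3 and 4 only receive.
dsm = [
    [0.0, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
coupled, independent = coupled_sets(dsm)
```

For this example, tasks 0, 1, and 2 end up in one coupled set, while tasks 3 and 4 are reported as independent because neither can reach back into the loop.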

6.3.1.3 Operation of Coupled Sets

(1) Segmentation. The coupled sets identified from the integrated design process of both functional performance and hexability are often large in scale and difficult to tear effectively. In this case, the large-scale coupled sets can be segmented according to the tightness of their information connections. Weak information links should be selected for segmentation to avoid unnecessary information loss. As segmentation is essentially a fuzzy clustering problem, the coupled tasks T_1, T_2, …, T_n can be taken as a universe of discourse U. In the fuzzy DSM determined from the information dependences, a_ij ≠ a_ji often holds. In these cases, according to the utility theory, the function x_ij = (a_ij + a_ji)/2 can be used to characterize the information dependence between two different design tasks. The similarity coefficient between two tasks is then calculated by the angle cosine method shown in Eq. (6.1), which is often used in fuzzy clustering:

$$R_{ij} = \frac{\sum_{k=1}^{n} x_{ki}\,x_{kj}}{\left(\sum_{k=1}^{n} x_{ki}^{2}\,\sum_{k=1}^{n} x_{kj}^{2}\right)^{1/2}} \tag{6.1}$$

in which the matrix R = (R_ij) is the fuzzy relation matrix on U. When R does not satisfy the fuzzy equivalence relation, it is necessary to calculate the transitive closure t(R) = R^K, where R^l = R^K for all natural numbers l greater than K. That is, t(R) satisfies the fuzzy equivalence relation. Dynamic clustering can then be performed by selecting a reasonable clustering level λ and calculating the cut set [t(R)]_λ, in which the element r_ij^{*λ} can be calculated according to Eq. (6.2):


$$r_{ij}^{*\lambda} = \begin{cases} 1, & r_{ij} \ge \lambda \\ 0, & r_{ij} < \lambda \end{cases} \tag{6.2}$$

in which r_ij is the element in t(R). The clustering results are obtained by considering three major factors: the degree of information loss, independence, and engineering experience. The degree of information loss is expressed by the elements in the upper triangular matrix, i.e., the feedback information in iterations of the coupled task sets. Independence can be characterized by the number of divisions segmented from the coupled sets; the better the independence, the higher the degree of concurrency in the entire process. A utility function f : IFL/ID → u can be built to represent the degree of information loss and independence at the same time, in which IFL represents the degree of information loss and ID represents the independence. The smaller the utility u, the better the segmentation plan for the coupled sets. If more than one solution is obtained, qualitative analysis and decision-making can be further carried out on the basis of the utility u and engineering experience. In addition, the coupled task sets obtained after segmentation can be further processed by operations such as aggregation and tearing.

(2) Aggregation. The purpose of decoupling the process is to remove unintentional iterations. Because the hexability design has strong intentional iterative characteristics, certain tasks can be aggregated under specific conditions. That is to say, these tasks are treated as a “whole task”, and the relationships between these tasks and the other tasks are transformed into the relationships between the “whole task” and the other tasks. The development of these aggregated “tasks” can then be handed over to specific teams, which reduces the coupling degree from the global perspective.

(3) Tearing. The tearing algorithm can be used to split the coupled sets that have not yet been processed in the above steps. One can calculate both the input intensity of information S_i and the output intensity of information S_o of a task, and use their quotient F_i to characterize the intensity of information dependence. The task with the least intensity of information dependence is placed at the front, i.e. it is assumed to be “torn” away from the matrix first. By repeating this operation, the tasks remaining in the matrix are continuously reduced until all tasks are planned.
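Steps (1) and (3) above can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the book's implementation: the function names are hypothetical, the diagonal of the symmetrized matrix is set to 1 (a task is taken to depend fully on itself), the transitive closure uses the usual max-min composition, and for tearing the entry a[i, j] is read as the strength of the information that task i receives from task j, with F_i taken as input intensity divided by output intensity.

```python
import numpy as np

def similarity_matrix(a):
    """Angle-cosine similarity R of Eq. (6.1) for a fuzzy DSM a."""
    a = np.asarray(a, dtype=float)
    x = (a + a.T) / 2.0                       # x_ij = (a_ij + a_ji) / 2
    np.fill_diagonal(x, 1.0)                  # assumption: x_ii = 1
    norms = np.sqrt((x ** 2).sum(axis=0))
    return (x.T @ x) / np.outer(norms, norms)

def transitive_closure(r):
    """Max-min transitive closure t(R), obtained by repeated composition."""
    r = np.asarray(r, dtype=float)
    while True:
        # (R o R)_ij = max over k of min(r_ik, r_kj)
        composed = np.minimum(r[:, :, None], r[None, :, :]).max(axis=1)
        nxt = np.maximum(r, composed)
        if np.allclose(nxt, r):
            return r
        r = nxt

def lambda_cut_clusters(t, lam):
    """Group tasks whose closure entries reach the level lam (Eq. 6.2)."""
    cut = np.asarray(t) >= lam
    clusters, seen = [], set()
    for i in range(len(cut)):
        if i not in seen:
            members = [int(j) for j in np.flatnonzero(cut[i])]
            seen.update(members)
            clusters.append(members)
    return clusters

def tearing_order(a):
    """Order tasks by repeatedly tearing off the one with the smallest F = S_in / S_out."""
    a = np.asarray(a, dtype=float)
    remaining, order = list(range(len(a))), []
    while remaining:
        def quotient(i):
            others = np.array([j for j in remaining if j != i], dtype=int)
            s_in, s_out = a[i, others].sum(), a[others, i].sum()
            if s_out == 0:
                return 0.0 if s_in == 0 else float("inf")
            return s_in / s_out
        first = min(remaining, key=quotient)
        order.append(first)
        remaining.remove(first)
    return order

# Hypothetical fuzzy DSM of a coupled set with two tightly linked pairs of tasks.
fuzzy_dsm = [[1.0, 0.9, 0.1, 0.0],
             [0.7, 1.0, 0.0, 0.1],
             [0.1, 0.0, 1.0, 0.8],
             [0.0, 0.1, 0.9, 1.0]]
closure = transitive_closure(similarity_matrix(fuzzy_dsm))
print(lambda_cut_clusters(closure, 0.5))
```

Lowering λ merges clusters and raising it splits them, so the clustering level can be scanned to compare candidate segmentation plans against the information-loss and independence criteria described above.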

6.3.1.4 Task Order and Classification

In this step, we can treat each coupled set as a “task”, and combine it with non-coupled tasks to form a new reduced matrix.


(1) The tasks whose rows (columns) are empty except for the main diagonal are moved forward (backward).
(2) Step (1) is repeated until all remaining rows and columns contain nonempty elements apart from the main diagonal; then go to step (3).
(3) “Delete” the rows in which all elements except for the main diagonal are empty, together with the columns belonging to these rows, and record the corresponding tasks as the first-level tasks. Do the same on the reduced matrix, and record the newly deleted tasks as the second-level tasks. By repeating this action, the level (i.e. hierarchical) classification and the topological ordering of all tasks are finally obtained.
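The level classification of step (3) can be sketched as follows. The sketch assumes that the reduced matrix is free of loops (each coupled set has already been collapsed into one "task") and that a nonzero entry dsm[i][j] means that task i needs information from task j; both the function name and this orientation are assumptions for illustration.

```python
def task_levels(dsm):
    """Split the tasks of a loop-free DSM into levels by repeatedly deleting empty rows."""
    remaining = set(range(len(dsm)))
    levels = []
    while remaining:
        # Tasks whose rows are empty (apart from the diagonal) among the remaining tasks.
        ready = sorted(i for i in remaining
                       if not any(dsm[i][j] for j in remaining if j != i))
        if not ready:
            raise ValueError("remaining tasks are still coupled")
        levels.append(ready)
        remaining -= set(ready)          # "delete" their rows and columns
    return levels

# Hypothetical reduced matrix: task 0 needs nothing, tasks 1 and 2 need task 0,
# and task 3 needs tasks 1 and 2.
reduced = [[0, 0, 0, 0],
           [1, 0, 0, 0],
           [1, 0, 0, 0],
           [0, 1, 1, 0]]
print(task_levels(reduced))
```

Tasks within one level have no mutual dependence and can be executed concurrently, while concatenating the levels gives a topological ordering of all tasks.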

6.3.2 Analysis Method for the Operation Conflict in the MBRSE Process

During the MBRSE process, design is carried out by multidisciplinary teams distributed in different locations. There are many interdependencies between the multidisciplinary teams and their members, and their activities interact through mutual influences and restrictions. If these relationships are not handled well, processing conflicts arise easily. In the product design process, many factors can cause conflicts, including different design decisions, design incompatibility, wrong product data, different evaluation standards, different terminologies, etc. Moreover, during the implementation of the integrated design process of both functional performance and hexability of a product, the reorganized process model may contain inherent errors, or the various resources and conditions may change. These problems make the design process model unable to reflect the actual integrated design process effectively, and result in operational conflicts. Furthermore, competition among multiple product development tasks for the same limited resources (such as manpower and equipment) will also cause conflicts, as shown in Fig. 6.14. Under the assumption that the resources and conditions do not change, two main types of conflict affect the running feasibility of the process:

(1) Conflicts in the schedule of personnel design tasks, i.e. multiple tasks cannot be assigned at the same time with limited manpower and time.
(2) Conflicts in the arrangement of resources such as equipment and tools, i.e. multiple tasks cannot be assigned at the same time with limited equipment, tools, etc.

Before running the integrated design process model of both functional performance and hexability, it is necessary to analyze and check the conflicts in terms of the above two aspects.
Fig. 6.14 Conflicts during the running of the process

For conflicts in the schedule of personnel tasks, it is generally possible to accumulate the personal work time needed to complete the assigned design tasks and compare it with the designer’s capacity and the relevant time rules. If the personal work time exceeds the designer’s tolerance or a natural time limit (such as more than 24 working hours in a work day), there must be a conflict. The corresponding analysis algorithm is given as follows:

(1) Suppose that an organization has n designers A_i (i = 1, …, n). For each designer A_i, enumerate all the design tasks S_ij (j = 1, …, m) undertaken by that designer.
(2) Determine the work time T_ij needed to complete the design task S_ij according to the roles and the manpower arrangement for that task.
(3) Determine the time conflict criterion T_max based on work habits, rules and regulations, historical experience, etc.
(4) Sum all work hours ∑_{j=1}^{m} T_ij of the designer A_i and compare the sum with the conflict criterion from (3) to determine whether there is a running conflict; for instance, there is no conflict if ∑_{j=1}^{m} T_ij ≤ T_max.

As for checking conflicts in the arrangement of equipment, tools, etc., similar steps can be followed by enumerating all tasks and comparing them with the corresponding capacities.
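The four-step check above amounts to summing the work time per designer and comparing it with T_max. A minimal sketch follows; the data layout (a dictionary of designers mapping task names to hours) and the function name are assumptions for illustration.

```python
def find_time_conflicts(assignments, t_max):
    """Return {designer: total hours} for every designer whose total exceeds t_max.

    assignments: {designer A_i: {task S_ij: work time T_ij}}  (hypothetical layout)
    """
    conflicts = {}
    for designer, tasks in assignments.items():
        total = sum(tasks.values())          # sum of T_ij over j
        if total > t_max:                    # conflict when the sum exceeds T_max
            conflicts[designer] = total
    return conflicts

# Example with a hypothetical weekly limit of 40 hours.
plan = {"A1": {"S11": 20, "S12": 25},
        "A2": {"S21": 30}}
print(find_time_conflicts(plan, t_max=40))
```

The same function can be reused for equipment or tools by replacing designers with resources and T_max with the available capacity of each resource.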


6.3.3 Simulation Based Operation Evaluation of the MBRSE Process

In order to obtain an optimized and executable MBRSE integrated design process model, it is also necessary to analyze and evaluate the operability and operational capability of the process under limited resources. However, the integrated design process contains a large number of concurrent and iterative segments. The resulting complexity makes it difficult to describe and analyze the process completely and effectively with analytical algorithms [83]. A more reasonable approach is therefore to use simulation technology to model and analyze the integrated design process, as illustrated in Fig. 6.15. The core of simulation-based process analysis is to describe the function view and the behavioral view of the segments and state changes in the process. The main elements that need to be considered in the simulation of the integrated design process are described below.

6.3.3.1 Uncertainty Factors in the Process Simulation

In the process of integrated design for both functional performance and hexability, there are usually two types of uncertain factors [84]. The purpose of the process simulation is to simulate the process with these uncertain factors taken into account.

(1) Uncertainty in the operation time of design tasks. Since the integrated design of both functional performance and hexability is an intellectual activity involving many factors such as customer requirements, designer capabilities, design resources, etc., it is difficult to determine the exact operation time of a design task. The operation time of a design task lies in an interval between the shortest (optimal) time and the longest (worst) time. The probabilities of completing the design task at the different times in this interval constitute the time probability model.

(2) Uncertainty in decision-making on the output branches of design tasks. The completion of a design task may trigger multiple subsequent design tasks to be operated simultaneously. In addition, the completion of a design task may lead to the selection of exactly one of multiple subsequent design tasks, or to an iterative return to an upstream design task. For this kind of problem, a design task operation queue is generally established, and the design tasks that may potentially be operated are added to the queue. Certain operation rules are then developed to determine the priority order of these design tasks and to constitute the design task branches, each of which has its own operation probability.

Fig. 6.15 Simulation based analysis on the operation capability of the integrated design process
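Both kinds of uncertainty can be illustrated with a small Monte Carlo sketch that uses only the Python standard library. The choices made here are assumptions, not the book's simulation model: task times follow a triangular distribution between the shortest and longest time, the process is a simple chain, and the only branch is a rework loop that returns to an upstream task with a given probability.

```python
import random

def simulate_once(tasks, rng):
    """Run the task chain once and return the total elapsed time.

    Each task is a dict with 'time' = (shortest, most_likely, longest) and an
    optional 'rework' = (probability, index of the upstream task to return to).
    """
    total, i = 0.0, 0
    while i < len(tasks):
        low, mode, high = tasks[i]["time"]
        total += rng.triangular(low, high, mode)      # uncertain operation time
        p_rework, target = tasks[i].get("rework", (0.0, i))
        i = target if rng.random() < p_rework else i + 1   # uncertain branch
    return total

def simulate(tasks, runs=1000, seed=0):
    """Repeat the simulation and summarize the distribution of total time."""
    rng = random.Random(seed)
    durations = sorted(simulate_once(tasks, rng) for _ in range(runs))
    return {"min": durations[0],
            "mean": sum(durations) / runs,
            "p90": durations[int(0.9 * runs)],
            "max": durations[-1]}

# Hypothetical three-task chain; the second task sends the design back to the
# first task with probability 0.3 (an iteration).
chain = [{"time": (2, 3, 5)},
         {"time": (1, 2, 4), "rework": (0.3, 0)},
         {"time": (3, 4, 6)}]
print(simulate(chain, runs=2000, seed=1))
```

A fuller model would add resource contention and parallel branches through a task queue with priority rules, as described in item (2) above; a discrete-event simulation library would be a natural choice for that.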

6.3.3.2 Evaluation System of the Process Simulation Analysis

Self-evaluation is the main purpose of conducting simulation analysis of the integrated design process of both functional performance and hexability. To achieve this goal, a quantitative (index) evaluation system is established to evaluate the integrated design process, based on the demands and requirements of the design process modeling and on the analysis capabilities that the simulation system can provide. Generally, enterprises expect to reduce production costs, shorten product development cycles, improve product and service quality, and improve work quality and employee satisfaction through the redesign of their business processes. Design enterprises, such as equipment development enterprises, pay more attention to the coordination of work among different designers in order to reduce equipment design costs, shorten the design cycle of new equipment, and improve the quality characteristics of equipment. Therefore, when the quantitative evaluation system is established, indices related to work progress, cost, efficiency, resource utilization, and the waiting queue of design tasks are usually used to characterize performance (capacity) and hexability in the integrated design process. Based on the goals and requirements of the improvement and reorganization of the integrated design process, these indices can be used as the targets of the comprehensive analysis of both functional performance and hexability, and can also be used as the constraints for its operational reachability (i.e. with no operational conflicts).

(1) Work progress is an important index for characterizing the pros and cons of the integrated design process in terms of both functional performance and hexability. The overall work progress reflects the response speed of the equipment development enterprise to market demand, whereas the more detailed work progress of a certain design task (including, for instance, the operation time and waiting time of the design task) reflects its operation efficiency.
(2) Cost is another important index for characterizing the integrated design process of both functional performance and hexability. Depending on the closeness of the relationship between the cost and the equipment or its organizational department, cost can be divided into direct cost and indirect cost.
(3) Efficiency reflects the comprehensive ability of the integrated design process of both functional performance and hexability to handle different design tasks.
(4) The resource utilization rate reflects the utilization efficiency of the resources provided by the R&D department (including manpower, materials, equipment and facilities, etc.). Low utilization indicates insufficient use of the resource, while high utilization indicates that the resource is prone to becoming a bottleneck (that is, the resource becomes a constraint in the integrated design process).
(5) The waiting queue of design tasks reflects the ability of the process to handle design tasks.
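As a rough illustration of how some of these indices might be derived from a simulation run, the sketch below computes the overall duration, the average waiting time, and the utilization rate of each resource from a list of task records. The record fields ('resource', 'ready', 'start', 'end') and the function name are hypothetical; an actual simulation system would provide its own log format and index definitions.

```python
def process_indices(records):
    """Compute simple evaluation indices from simulated task records.

    records: list of dicts with keys 'resource', 'ready', 'start', 'end'
    (times in hours; this layout is an assumption made for illustration).
    """
    begin = min(r["ready"] for r in records)
    finish = max(r["end"] for r in records)
    duration = finish - begin                              # overall work progress
    avg_wait = sum(r["start"] - r["ready"] for r in records) / len(records)
    busy = {}
    for r in records:
        busy[r["resource"]] = busy.get(r["resource"], 0.0) + (r["end"] - r["start"])
    utilization = {name: hours / duration for name, hours in busy.items()}
    return {"duration": duration, "average_wait": avg_wait, "utilization": utilization}

log = [{"resource": "analyst", "ready": 0, "start": 0, "end": 4},
       {"resource": "analyst", "ready": 2, "start": 4, "end": 8},
       {"resource": "test rig", "ready": 4, "start": 4, "end": 6}]
print(process_indices(log))
```

In this example the analyst is busy for the whole duration, which is the kind of high utilization that item (4) flags as a potential bottleneck.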

6.3.4 MBRSE Process Review and Validation Methods

The results of the above comprehensive quantitative analysis of the feasibility and operational capabilities of the MBRSE process model should be further combined with the qualitative knowledge drawn from the experience of field experts. In this way, qualitative review and quantitative analysis are used jointly to verify and validate the integrated design process model, and to ensure that it can be implemented and applied in the typical equipment development process.

6.3.4.1 Hierarchical Structure of the Index System for Verifying the Integrated Design Process

The integrated design of both functional performance and hexability emphasizes how to coordinate designers in multiple fields (such as hexability, performance, etc.) to carry out the design work in an orderly way. Meanwhile, it should also focus on how to reduce the cost of equipment design and shorten the design cycle of new equipment, etc. [85] Consequently, the following hierarchical index system is designed for process verification and validation, as shown in Fig. 6.16.

Fig. 6.16 The index system for process verification


In Fig. 6.16, “process capability”, as the first-level index, is the overall goal (i.e. overall index) of all kinds of process models. For different implementation departments and different implementation goals of the process models, the target domains (i.e. second-level indices) for the process verification differ as well. For the integrated design of both functional performance and hexability, indices related to coordination, flexibility, efficiency, work progress, and cost should be considered. Furthermore, the second-level indices are decomposed into third-level indices by considering the actual operability of the process verification. These third-level indices include organizational coordination, resource allocation balance, degree of feedback from iterations, etc. It should be noted that the index system proposed in this book is not fixed. In the practical implementation of the process verification, it should be adjusted and supplemented in time according to the actual situation of the process implementation departments and the implementation goals. This ensures that the verification results of the process better reflect the feasibility and capability of its actual operation.

6.3.4.2 Integration of the Indices for the Process Verification

For different implementation departments, implementation goals, and organizational goals, the weights of the indices mentioned above for verifying the design process model are also different. In the actual verification process, the indices under the lowest level (i.e. leaf node indices) are first verified, and then based on different weights of the indices, a comprehensive process verification result can be obtained by using specific comprehensive methods for leadership decision-making. By using the analytic hierarchy process method, one can determine whether the established design process model is feasible and has sufficient operational capability to meet the organization’s target requirements.
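One common way to obtain such weights in the analytic hierarchy process is to take the principal eigenvector of a pairwise comparison matrix, and then to combine the leaf-node scores as a weighted sum. The sketch below follows that textbook form with NumPy; the comparison matrix, the function names, and the use of a plain weighted sum as the "comprehensive method" are assumptions for illustration.

```python
import numpy as np

def ahp_weights(pairwise):
    """Return (weights, consistency index) from a pairwise comparison matrix."""
    m = np.asarray(pairwise, dtype=float)
    n = len(m)
    eigvals, eigvecs = np.linalg.eig(m)
    k = int(np.argmax(eigvals.real))              # principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                               # normalize weights to sum to 1
    ci = (eigvals[k].real - n) / (n - 1)          # consistency index CI
    return w, ci

def comprehensive_score(leaf_scores, weights):
    """Weighted sum of leaf-node scores (ten-point system)."""
    return float(np.dot(leaf_scores, weights))

# Hypothetical comparison of three indices; entry [i][j] states how much more
# important index i is than index j.
comparison = [[1.0, 5 / 3, 5 / 2],
              [3 / 5, 1.0, 3 / 2],
              [2 / 5, 2 / 3, 1.0]]
weights, ci = ahp_weights(comparison)
print(weights, ci, comprehensive_score([8, 6, 9], weights))
```

A consistency index close to zero indicates that the pairwise judgements are coherent; in practice it is compared with a reference value before the weights are accepted.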

6.3.4.3 Quantification of the Indices for the Verification of the Process

The indices for the process verification are shown in Table 6.1.

6.3.4.4 Determination of the Comprehensive Verification Score of the Process from Experts Using the Delphi Method

The process verification needs to be done by experts from multiple fields, and is therefore a “group decision-making” problem. The Delphi method can thus be used to determine the comprehensive verification score. This method should be carried out under anonymous conditions, and feedback needs to be given to the experts [86]. The following points should be noted when implementing the Delphi method:


Table 6.1 The indices for the process verification

| Indices | Indices description | Score (ten-point system) |
| --- | --- | --- |
| Organizational coordination | The different kinds of persons involved in the operation of the process need to be assigned the corresponding roles in the department-based organizational structure to complete the personnel organization under a specific process. Better organizational coordination can reduce personnel conflicts in the process | 10 points for the best coordination, and 1 point for the worst |
| Resource allocation balance | The different kinds of resources (including documents, equipment, tools, programs, etc.) need to be used in the process. A balanced arrangement of the different resources in the process helps to improve the efficiency of the process | 10 points for the best balance and 1 point for the worst |
| Flexibility of choice | There are many alternative, effective paths in the process, and each path can be correct. In different situations, the path of operation is optional | 10 points for the largest flexibility, and 1 point for the smallest |
| Adaptive flexibility | The process can be flexibly changed according to the specific situation and therefore is adaptable | 10 points for the largest flexibility, and 1 point for the smallest |
| Iteration feedback degree | In the integrated design process, it is required to minimize the degree of feedback from iteration. The fewer iteration feedbacks, the higher the operating efficiency of the process | 10 points for the least iteration feedback and 1 point for the most |
| Simplicity/non-redundancy | The integrated design process requires non-redundancy, and needs to minimize unnecessary redundancy to improve the work progress, cost, resource consumption, etc. | 10 points for the least redundancy, and 1 point for the most |
| Average waiting time | During the operation of the process, some tasks may be left pending because their predecessor tasks are not completed in time, which causes extra occupation (i.e. waiting time) of labor, resources, etc. Generally, the shorter the waiting time of the tasks, the higher the overall progress and efficiency of the process | 10 points for the shortest waiting time and 1 point for the longest |
| Work progress | This is a general evaluation index for different design processes. The work progress of the process should be as fast as possible | 10 points for the fastest progress and 1 point for the slowest |
| Cost | This is a general evaluation index for different design processes. The cost of the process should be as low as possible | 10 points for the least cost and 1 point for the most |


(1) To ensure an adequate sample size, more than 10 seasoned experts with rich engineering experience should be selected to verify the integrated design process.
(2) The verification criteria used by the experts must be consistent, e.g. all experts refer to the same “process verification index system” mentioned above. It should be noted that the index system is not necessarily constant, but changes with the actual situation of the process implementation department and its organizational target.
(3) The verification should be carried out in a “back-to-back” way, in which the experts are not allowed to discuss with each other.
(4) The verification scores obtained from the experts should be processed by statistical methods, and the results should be fed back to each expert.

Using the Delphi method, the key points for implementing the verification of the integrated design process are as follows:

(1) Develop a process verification index set U = {u_1, u_2, …, u_n}, and design the integrated design process verification table for the experts.
(2) Organize m experts to conduct the process verification and obtain the first round of verification scores.
(3) Perform statistical analysis on the verification scores provided by the experts, and evaluate the consistency of these scores by using the statistical parameters (i.e. expectation E and variance δ^2) shown below:

$$E = \frac{1}{m}\sum_{i=1}^{m} a_i, \qquad \delta^{2} = \frac{1}{m-1}\sum_{i=1}^{m}\left(a_i - E\right)^{2} \tag{6.3}$$

in which m is the number of experts and a_i is the verification score of a certain index from the ith expert.

$$Q = \frac{S_r}{S_{\max}} = \frac{(m \times n) \times \sum a_i^{2} - \left(\sum a_i\right)^{2}}{(m \times n) \times (m \times n - 1)} \Bigg/ \frac{m^{2} \times \left(n^{3} - n\right)}{12} \tag{6.4}$$

Next, the χ² test shown in Eq. (6.5) is used to evaluate the consistency of the verification scores:

$$\chi^{2} = \frac{12 \times \left[n \times \sum a_i^{2} - \left(\sum a_i\right)^{2}\right]}{m \times n^{2} \times (n - 1)} \quad (\text{with } n - 1 \text{ degrees of freedom}) \tag{6.5}$$

If the consistency requirement is not satisfied, the verification scores will be returned to the experts to carry out the next round of verification. This verification process will be repeated iteratively until the verification scores of all experts meet the consistency requirement.
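The statistical treatment of one Delphi round can be sketched as follows. The mean and variance follow Eq. (6.3) directly. For the consistency check, the sketch deliberately substitutes the standard rank-based Kendall coefficient of concordance W and its χ² approximation m(n − 1)W, which may differ in detail from Eqs. (6.4) and (6.5); the function names and the example scores are hypothetical.

```python
import numpy as np

def score_statistics(scores):
    """Mean E and sample variance (Eq. 6.3) of each index; rows are experts."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(axis=0), scores.var(axis=0, ddof=1)

def average_ranks(row):
    """Ascending ranks of one expert's scores; tied scores share the average rank."""
    row = np.asarray(row, dtype=float)
    order = np.argsort(row, kind="stable")
    ranks = np.empty(len(row))
    ranks[order] = np.arange(1, len(row) + 1)
    for value in np.unique(row):
        tied = row == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def kendall_w(scores):
    """Standard Kendall W (no tie correction) and chi-square with n - 1 degrees of freedom."""
    scores = np.asarray(scores, dtype=float)
    m, n = scores.shape                      # m experts, n indices
    rank_sums = np.vstack([average_ranks(r) for r in scores]).sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    w = 12.0 * s / (m ** 2 * (n ** 3 - n))
    return w, m * (n - 1) * w

# Hypothetical first-round scores of three experts for three indices.
round_one = [[9, 7, 5],
             [8, 6, 4],
             [10, 8, 6]]
print(score_statistics(round_one))
print(kendall_w(round_one))
```

The χ² value would be compared with the critical value for n − 1 degrees of freedom at the chosen significance level; if the agreement is insufficient, the statistics are fed back to the experts and another round is run, as described above.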

Chapter 7

Integrated Design Platform for Model-Based Reliability System Engineering

Abstract Based on the theories and methods given in the previous six chapters, this chapter presents the enabling technological requirements of the MBRSE integrated design platform for engineering development. Then, by taking the fault ontology as the core, the “processes” and “methods” related to the hexability activities in the system engineering process are closely integrated into the MBRSE integrated platform, and the integrated design platform framework, its functional composition, the extension of the product data model for the integrated design of both functional performance and hexability, and the construction methods of the PLM-based hexability design process are described. Finally, by taking the product and the fault as the core, the chapter provides the integration requirements and models of both MBRSE and functional performance design tools.

Keywords Integrated design platform · Enabling technological requirement · Integrated platform framework · Extension of product data model · Integration of integrated design tools

7.1 Engineering Requirements of the MBRSE Integrated Design Platform

7.1.1 Overview of Enabling Technology for Complex System Development

Enabling technologies (i.e. auxiliary support technologies) have been emerging and developing to support the engineering development process of complex systems. With the ever-increasing complexity of products, the growing number of participants, and the shortening of development cycles, enabling technology is playing an increasingly important role in the development of modern engineering products. It can even be said that modern complex projects will not succeed without the support of enabling technologies. According to the roles they play in the development of complex systems, enabling technologies can be divided into multiple types, including product digital modeling and analysis, product data sharing and management, product life cycle process management, technology and management decision support, etc., as shown in Fig. 7.1.

Fig. 7.1 Enabling technologies used for the complex system development (figure labels: complex engineering system requirement; enabling technologies: product digital modeling and analysis, product data sharing and management, product life cycle process management, technology and management decision support)

© National Defense Industry Press 2024 Y. Ren et al., Model-Based Reliability Systems Engineering, https://doi.org/10.1007/978-981-99-0275-0_7

7.1.1.1 Digital Modeling and Analysis of Products

Product digital modeling and analysis refers to the activities of engineering product information (and data) expression and engineering analysis by using computers. Digital modeling not only provides convenience for quickly generating the design drawings of the product, but also enables product designers, users, and manufacturers to perform numerical operations in the early stages of the product design process. In the virtual digital environment, simulations on design optimization, performance testing, manufacturing process, and use process can be performed to detect and solve design problems in advance, and therefore reduce design iterations, save development costs, and accelerate the development progress of the product.

7.1.1.2 Product Data Sharing and Management

Because CAD/CAE/CAPP/CAM tools are independent of each other in data structure, storage format, etc., the problem of “information islands” arises and impedes the completeness and consistency of the design data of the product. Data sharing and management technology has been developed to solve this problem. The ultimate goal of data sharing and management is to integrate data across multiple disciplines and multiple methods, and to support the effective sharing of data throughout the full life cycle of the product.

7.1.1.3 Management of Product Lifecycle Processes

Product life cycle process management refers to managing the development process in accordance with the engineering methodology by means of process modeling, analysis, and reorganization. The engineering process of a complex system is very complicated, and therefore it is necessary to operate and monitor the process automatically by using computers, and to assist the different participants in the engineering system in carrying out their work in an orderly manner. The purpose of managing the product life cycle process is to achieve an integrated development process based on enterprise data integration.

7.1.1.4 Support of Technology and Management Decision Making

Decision making is one of the key activities in an engineering system and runs through the whole process of product design. Decision support refers to the activity of using computers to help designers make choices based on data, models and knowledge, through human–computer interaction (HCI). A typical enabling tool is a decision support system (DSS), which provides an environment for analyzing problems, building models, simulating decision processes and solutions, and invoking various information resources and analysis tools to help decision-makers improve the level and quality of their decisions.

7.1.2 Enabling Technological Requirements for RSE Integrated Design

Traditional hexability enabling technology is mainly used to assist specific methods or work items. A large number of commercial software tools exist, such as the China made ARMS and CARMES, the American made Relex and Isograph, and the Israel made ALD and CARE. These software tools provide powerful support for carrying out the hexability work efficiently. However, the hexability work is not a simple sum of the different work items, but requires effective coordination among them. The key to achieving practical results in the hexability work is to control the hexability design process of the product and to combine it with the performance design in an integrated and coordinated manner. Obviously, traditional hexability enabling tools cannot meet these engineering needs.

The product hexability enabling technology should not exist and be applied in isolation, but should rely on the development of the product performance design enabling technology. The fundamental models, integrated environment, and software tools that enable hexability design inherit from and extend the product performance design enabling tools, and their use needs to be closely integrated into the product design process. Traditional performance design methods have gradually matured after a long period of development. Their core is the product data management system, which has developed into the product life cycle management (PLM) system in recent years. The main task of product life cycle management is to achieve integrated management of data and processes. However, because the data requirements and management processes differ among manufacturers and products, it is necessary to customize the basic integrated environment to form an integrated environment oriented to specific design requirements. The core of such a customization is the data model and the process model.

In theory, it is necessary and feasible to develop hexability based on PLM. However, the integration of hexability based on PLM is still at an early stage, and no mature solutions can be found. Due to the lack of unified models for hexability, it is difficult to involve hexability in the design process of the product and to integrate the hexability design tools effectively into the performance design environment to construct an integrated design environment. The integrated mechanism, unified process model, and integrated method model based on integrated design provide a theoretical basis for the integration of hexability design tools and the digital design environment. This promotes a leap in the development of hexability enabling technology, and provides the capability to organically integrate the hexability work process and the different hexability design tools into the product design environment. In this way, the hexability design is fully coordinated with the performance design, and the hexability and management work of the whole system and the whole process is comprehensively supported.

7.2 Fundamental Models of MBRSE Integrated Design Platform 7.2.1 Framework of Integrated Design Platform The fault ontology model lays the theoretical foundation for data interoperability. In order to realize the interoperability of hexability design and performance design, by taking the fault ontology as the core, the “process” and “methods” related to hexability engineering activities in the system engineering process are closely integrated into a unified “environment” (i.e. integrated platform) in this book. Such a platform uses PLM as the basic to realize the expansion to hexability. The materialized support for methods is given by the unified management of data and integrated software tools, and the materialized support for processes is given by the management of the integrated design workflow and the mitigation and control of failure modes. The resulting integrated design framework is shown in Fig. 7.2. Therefore, the integrated design platform for both functional performance and hexability does not exist independently, but is an extension of the performance design integration environment based on the requirement of integrated design. The integrated design “process control” is the core of the entire integrated design platform framework. It defines how the process realization is carried out by identifying and mitigating the failure modes in the product design process. By constructing the mapping relationship between the fault ontology and the whole process, the reliability design and redesign activities at each stage of the product design process are integrated with

Fig. 7.2 Framework of MBRSE integrated design platform (elements shown: process, methods, the integrated design unified model, and the PLM unified environment with its RMS-oriented data model and process model extensions and its integrated tool interface)

the system engineering process. Relevant methods cover the identification and analysis of failure modes, stress analysis, and reliability design throughout the process, for example failure mode analysis, Event Tree Analysis (ETA), Finite Element Analysis (FEA), Physics of Failure (POF), and Reliability-based Multidisciplinary Design Optimization (RBMDO). These methods define the technologies required to perform the tasks in the process, and the “interoperability” between these technologies is guaranteed through the definition of the unified model. The software tools generally include CAE software for temperature, shock, and vibration analysis, as well as software for system reliability design analysis, decision analysis, and multidisciplinary optimization. These processes, methods, and tools need to be integrated in a unified environment to realize the reliability system engineering process. The PLM system is an ideal platform for this purpose because of its data integration and process integration functions. Process control relies on the workflow planning of the PLM system, data sharing is realized on the basis of the fault ontology through the object-oriented customization of the PLM system, and the methods are realized by software tools.


7 Integrated Design Platform for Model-Based Reliability System …

7.2.2 Functional Composition of the Integrated Design Platform

The general functional structure of the integrated MBRSE design platform is shown in Fig. 7.3.

Fig. 7.3 Functional structure of MBRDP

(1) Design process management module, which has the following functions:
• Maintain the organizational structure and operation authority of hexability designers, to establish clear rights and responsibilities.
• Provide a “Three directors” management mode and security classification authority management, to meet the informatization construction requirements of the enterprise.
• Achieve unified management of the project, unified configuration of the hexability requirements and specification guidelines, unified management of the product technical status, and unified configuration of the basic templates and dictionary contents required for hexability analysis.
• Support the control of the process and status of hexability design and analysis.

(2) Product modeling module for hexability, which has the following functions:
• Provide a unified product modeling environment, to construct a unified product function model and physical model (including the geometrical model), and the evolutionary interrelationships between these models.
• Track design change points and the resulting design change interfaces.

(3) Physics of failure modeling and risk evaluation module, which has the following functions:
• Provide a unified fault modeling environment for product models with different technical states, to systematically identify functional faults, physical faults, and system faults, establish fault impact and transmission models, and establish evolutionary relationships between the different types of fault models.
• Comprehensively evaluate the risk posed by the faults to identify the weak links in the design.

(4) End product design model management and control module, which has the following functions:
• Provide a unified management environment for the design requirements and design models of end products.
• Through the control module of the end product design model, merge the results determined from the design parameters and issue the hexability indices assigned by the system to the end product manufacturers.
• After the manufacturers complete the product design in stages, return the end product design model to the general design institute through an interactive unified product model path with the supporting institutes.

(5) Hexability design analysis tool kits, which focus mainly on the design and analysis of reliability, maintainability, testability, and supportability based on the product model, load profile, and failure model built in three key modules: the product modeling module for hexability design, the environmental modeling module, and the fault modeling and risk evaluation module.

(6) Design decision and optimization module, which provides a multi-view dynamic visual monitoring environment with the following functions:
• Display on multiple screens the cross-linking relationships and dynamic evolution process among the different levels of the system, product models with different technical statuses, and different models.


• Analyze the design influence propagation process, to detect design weaknesses, determine the best design plan, and thereby support design decision and optimization.

(7) Fundamental database of the product data and models, which has the following functions:
• Manage the general design models of the product. These include general requirement models, functional models, physical models, fault models, maintenance models, and test models, and they form the basis for developing a unified product modeling system.
• Perform data management, data analysis, and data mining on the data in the database. These functions ensure the validity of the data on the one hand, and on the other hand discover knowledge from the data so that it can be reused in the product design process.

(8) Interface with the digital environment, which connects the data flow between MBRDP and the digital design environments of enterprises or R&D departments, to build a seamlessly integrated design environment. These digital design environments include the portal system, the Product Data Management (PDM) system, the requirement management system, the CAD modeling environment, CAE modeling software, Finite Element Analysis (FEA) software, etc.

Cross-linking relationships among the above modules are shown in Fig. 7.4.

7.2.3 Extension of the Product Data Model for Integrated Design

7.2.3.1 Product Design View Under the Consideration of Hexability

The traditional PLM system has relatively comprehensive data models, which consider the product models as the core. Through a product model centric principle, the fault ontology discussed in Chap. 2 can be extended by involving the hexability related ontologies. The concept of product in the ontology can be an abstract concept. Different product structure views (abbreviated as product views) can be developed by the organization of product elements from different design perspectives, in different development stages, and using different design analysis methods. The traditional product design process mainly includes two types of core views, which are the functional view and physical view, respectively. The digital environment of product configuration is also established based on these two types of view. The functional view is mainly used in the conceptual design stage and the preliminary design stage. It focuses on the expression of the product functions. The hierarchical structure of the product elements is developed level by level, to form the functional

Fig. 7.4 Data interaction within the platform (data flows among the design process management, product modeling, mission and load analysis, physics of failure modeling and risk evaluation, reliability simulation analysis, testability, maintainability, supportability, end product design model management and control, design decision and optimization, and failure/defect closed-loop mitigation and control modules, and the fundamental database of the product data and models)

levels of the product. The physical view is mainly used in the preliminary design stage and the detailed design stage. It is developed from the functional view and reflects the physical structure that realizes the functions. The integrated design of both functional performance and hexability can still be carried out around the evolution from the functional view to the physical view, but the existing views must be expanded to take the influence of hexability into account. The relationship among these views is shown in Fig. 7.5. On the one hand, design for hexability is carried out on the basis of the functional view, by referring to qualitative hexability criteria or knowledge and considering the impacts of failure, maintenance, and support. There is no independent view for these design activities; instead, these impacts should be reflected in the transition from the functional view to the physical view. On the other hand, based on the different perspectives of hexability, domain views for hexability analysis and their mapping mechanisms need to be established to support hexability analysis and evaluation. Hexability research is directed against faults. It focuses on understanding the mechanisms and laws of fault occurrence and uses these laws to prevent or control

Fig. 7.5 Relationships of the multiple views in the integrated design of both functional performance and hexability (design for safety, reliability, maintainability, and supportability acts between the functional view and the physical view; separate product views support safety, reliability, maintainability, and supportability analysis)

faults [9]. The perspectives of the hexability work items are therefore all directly or indirectly related to faults. For example, reliability focuses on the impact of different faults on the completion of the specified functions of a system. Maintainability focuses on the ability to prevent and repair faults (including fault detection). Supportability, however, is only partly related to faults. The product supportability discussed in this book is therefore limited to its maintenance support part, focusing on the characteristics of system faults and on the ability of resource planning to meet the requirements of regular combat readiness and wartime use. As described in Sect. 2.5, up to 157 hexability work items are specified in the Chinese National Military Standards. To confirm the views required by these work items, they are organized in a work item-view matrix, following the ideas given in Fig. 7.5. An example of the hexability work item-view matrix (with only traditional hexability work items) is shown in Fig. 7.6. In Fig. 7.6, the characteristic domain is P = {reliability (P1), maintainability (P2), supportability (P3), safety (P4)}, the method domain is W = {failure mode impact analysis (w11), reliability prediction (w12), structural/thermal finite element analysis (w13), testability prediction (w21), maintainability prediction (w22), virtual maintenance validation (w23), reliability centered maintenance analysis (w31), level of repair analysis (w32), use and maintenance task analysis (w33), event tree analysis (w41), functional risk analysis (w42), area hazard analysis (w43)}, and the merged view domain is V = {functional view (v1), physical view (v2), fault logic view (v3), inspection/maintenance element view (v4), area view (v5), support element view (v6)}.


Fig. 7.6 Hexability work item-view matrix

From the above analysis, it can be seen that v = v1 ∪ v2 ∪ v3 is, in general, strongly correlated with the work items wij. The product (function/structure) and the fault are therefore regarded as the intersection of the hexability domain views, and they constitute the connection between functional performance data and hexability data.
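As a rough illustration of how a work item-view matrix can be screened for widely shared views, the following Python sketch counts how many work items rely on each view. The matrix entries are invented for illustration and are not the contents of Fig. 7.6; only the identifiers w11 to w43 and v1 to v6 follow the text, and the function names and the 50 % threshold are assumptions.

```python
# Hypothetical work item-view matrix: the views each work item relies on.
# The entries are illustrative assumptions, not the values of Fig. 7.6.
WORK_ITEM_VIEWS = {
    "w11": {"v1", "v2", "v3"},        # failure mode impact analysis
    "w12": {"v1", "v3"},              # reliability prediction
    "w22": {"v2", "v4"},              # maintainability prediction
    "w32": {"v1", "v2", "v3", "v6"},  # level of repair analysis
    "w43": {"v2", "v5"},              # area hazard analysis
}


def view_usage(matrix):
    """Count, for every view, the number of work items that use it."""
    counts = {}
    for views in matrix.values():
        for v in views:
            counts[v] = counts.get(v, 0) + 1
    return counts


def shared_views(matrix, min_share=0.5):
    """Return the views used by at least min_share of all work items."""
    counts = view_usage(matrix)
    threshold = min_share * len(matrix)
    return sorted(v for v, n in counts.items() if n >= threshold)


def items_covered(matrix, views):
    """Return the work items that use at least one of the given views."""
    wanted = set(views)
    return sorted(w for w, vs in matrix.items() if vs & wanted)


core = shared_views(WORK_ITEM_VIEWS)
covered = items_covered(WORK_ITEM_VIEWS, core)
```

With these assumed entries, the functional, physical, and fault logic views emerge as the shared core and every listed work item touches at least one of them, which mirrors the argument that product and fault connect the domain views.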

7.2.3.2 Multiview Ontology-based Data Model of Integrated Design

The knowledge models in hexability are all product/fault centered, but each has a different emphasis. In the integrated design of a product, reliability focuses on the failure mechanisms, fault evolution processes, fault propagation, and fault consequences of the product. Maintainability focuses on fault detection and site characteristics, as well as on the characteristics that affect the replacement and repair of failed parts. Supportability focuses on the maintenance methods used to prevent and repair faults, as well as on the distribution of the associated support resources. The study of hexability therefore depends heavily on knowledge of the product models and faults. For this reason, a hierarchical ontology is adopted, consisting mainly of a reference ontology and application ontologies, to describe the multiview model, as shown in Fig. 7.7. In the hierarchical ontology, the reference ontology is not designed for any specialized field. Its purpose is to enable knowledge reuse across multiple applications, and it describes the most basic, top-level, and abstract concepts of the product data [87, 88]. The application ontology, by contrast, refers to detailed concepts for specific application domains and

Fig. 7.7 The relationship between the reference ontology and the application ontology

is usually established by experts in those domains. Generally speaking, part or all of the reference ontology can be used in multiple application ontologies. First, the top-level concepts of product design (such as product, structure, function, and fault) are examined and selected to build the reference ontology of the multiview model. Then, the detailed concepts of hexability are selected to establish the application ontology in each field, and the data/knowledge models of hexability are built on these application ontologies. Following this process, the multiview model shown in Fig. 7.8 can be built. In the multiview model, the reference ontology provides the connection through which the different application ontologies are mapped to one another, and it therefore becomes the core for sharing data and knowledge among the different fields of integrated design. These ontologies are constructed by using the basic ontology relations, including the perspective relation (is-a) and the composition relation (part of) [87], and by adding unary relations (such as realization, correspondence, and use) and binary relations (such as unable to complete the specified function and fault detection). In the reference ontology view, physics and function are two different perspectives of the product, and both are regarded as subconcepts of the product. There is a realization relationship between structure and function, and a fault occurs when the specified function of a physical structure cannot be completed. The composition relation between physical structures constitutes the physical view of the product, and the composition relation between functions constitutes the functional view of the product. In the reliability view, the product is observed from the perspective of fault logic units and their relationships. The subconcept fault logic unit is therefore established in the reliability view of the product, and there is a composition relationship among the fault logic units. In the maintenance and support view, the product is observed from the two perspectives of detection/maintenance unit and area. Two subconcepts of the product are therefore established, namely the detection/maintenance unit and the area, and the binary relation between the product and the fault is established through detection/maintenance operations. It should be noted that Fig. 7.8 only establishes the basic framework of the multiview model. The concepts in both the reference ontology and application ontology

Fig. 7.8 Ontology based multi-view model of integrated design (views shown: reference ontology view (RF_View), reliability view (R_View), and maintenance and support view (MS_View))

can be expanded according to certain requirements. For example, the concepts to describe the fault can be expanded from the perspectives of failure mechanism, fault propagation and fault consequences, including fault site, fault time, fault impact, etc.
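To make the layering of reference ontology and application ontology more concrete, the following Python sketch models a few reference concepts (function, structure, fault) and one reliability-view concept (fault logic unit) that reuses them through a mapping. The class and attribute names are assumptions chosen for illustration, not the platform's actual schema; the example values are taken from the stabilizer case shown in Fig. 7.12.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str


@dataclass
class Fault:
    name: str
    function: Function  # the specified function that cannot be completed
    # Expandable attributes, e.g. fault site, fault time, fault impact.
    attributes: dict = field(default_factory=dict)


@dataclass
class Structure:
    name: str
    realizes: list = field(default_factory=list)  # realization relation
    parts: list = field(default_factory=list)     # composition relation (part of)
    faults: list = field(default_factory=list)


@dataclass
class FaultLogicUnit:
    """Reliability-view concept mapped back to reference-ontology structures."""
    name: str
    mapped_structures: list = field(default_factory=list)

    def faults(self):
        # Faults are reused from the reference ontology rather than copied.
        return [f for s in self.mapped_structures for f in s.faults]


support = Function("support the elevator")
stabilizer_support = Structure("stabilizer support", realizes=[support])
deformation = Fault(
    "excessive deformation of the rear beam of the stabilizer",
    function=support,
    attributes={"severity category": "II"},
)
stabilizer_support.faults.append(deformation)
unit = FaultLogicUnit("unit-1", mapped_structures=[stabilizer_support])
```

Because the fault logic unit only holds references to structures, any attribute added to a fault in the reference ontology becomes visible in the reliability view without duplication, which is the sharing behaviour the multiview model aims for.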

7.2.3.3 Mapping Mechanism in Between the Multiple Views

To use the framework shown in Fig. 7.8 to achieve mapping between multiple views, it is necessary, on the one hand, to determine the mapping mechanism between function and structure and, on the other hand, to establish the mapping mechanisms of the product and the fault from the reference ontology to the other views. By instantiating the concepts of structure and function and then expanding the realization relationship between them, the mapping mechanism between function and structure can be established, as shown in Fig. 7.9, in which F stands for function and S stands for structure. There are four types of structure-to-function realization relationships in total: direct realization, “AND” combination realization, “OR” combination realization, and one-to-many realization.
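The four realization relationships can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names S1 to S5 and F1 to F4 are placeholders, one-to-many realization is interpreted here as one structure realizing several functions, and the helper functions are hypothetical rather than part of any platform API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Realization:
    function: str
    structures: tuple     # names of the structures involved
    mode: str = "DIRECT"  # "DIRECT", "AND" or "OR"


def function_available(rel, working):
    """Return True if the function is still realized by the working structures."""
    states = [s in working for s in rel.structures]
    if rel.mode == "OR":
        return any(states)
    # DIRECT and AND both need every listed structure to work.
    return all(states)


def lost_functions(realizations, failed):
    """List the functions that cannot be completed when the given structures fail."""
    all_structures = {s for r in realizations for s in r.structures}
    working = all_structures - set(failed)
    return sorted(
        r.function for r in realizations if not function_available(r, working)
    )


# Hypothetical example covering the four relationship types.
REALIZATIONS = [
    Realization("F1", ("S1",)),              # direct realization
    Realization("F2", ("S2", "S3"), "AND"),  # "AND" combination realization
    Realization("F3", ("S4", "S5"), "OR"),   # "OR" combination realization
    Realization("F4", ("S1",)),              # one-to-many: S1 also realizes F4
]
```

For example, `lost_functions(REALIZATIONS, {"S1"})` reports both functions realized by S1, whereas the failure of S4 alone loses nothing because S5 still realizes F3.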

Fig. 7.9 Mapping mechanism in between the structure and function

On the above basis, the mapping mechanisms of the product and fault in the multiple views are given as follows. Products can be expressed as a hierarchical structure in both the reference ontology and the hexability views. The main difference lies in the nodes and the relationships between them. The mapping from the product to the other views is mainly performed in two ways: unique mapping or multiple mapping. Multiple mapping denotes the correspondence from multiple structure/function nodes to one node in the hexability view, such as a fault logic unit, an area, or a detection/maintenance unit. Both mapping operations can be described by a transformation matrix [89]. An example mapping from the reference ontology to the reliability view is given as follows. Suppose that the set of product nodes (structures or functions) in the reference ontology is A = {a1, a2, …, an}, that the set of fault logic units in the reliability view is B = {b1, b2, …, bm} (m ≤ n), and that T is the n × m transformation matrix written below, in which the value of each element is between 0 and 1. Then B = A × T. If the calculation gives, for example, b1 = a1 + a2, this means that a1 and a2 are jointly assigned to the fault logic unit b1.

$$
T = \begin{pmatrix}
T_{11} & T_{12} & \cdots & T_{1m} \\
T_{21} & T_{22} & \cdots & T_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
T_{n1} & T_{n2} & \cdots & T_{nm}
\end{pmatrix}
$$
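The product B = A × T can be read symbolically: column j of T collects the product nodes that are assigned to the fault logic unit bj. The following Python sketch shows this under the assumption that the entries of T are restricted to 0 and 1; the function name `map_nodes` and the example values are hypothetical.

```python
def map_nodes(nodes, T):
    """Apply B = A x T symbolically for a 0/1 matrix T.

    Column j of T collects the nodes assigned to unit b_j.
    """
    n, m = len(T), len(T[0])
    if len(nodes) != n:
        raise ValueError("T needs one row per product node")
    return [[nodes[i] for i in range(n) if T[i][j] == 1] for j in range(m)]


A = ["a1", "a2", "a3"]
T = [
    [1, 0],  # a1 -> b1
    [1, 0],  # a2 -> b1
    [0, 1],  # a3 -> b2
]
B = map_nodes(A, T)
```

Here the first unit receives a1 and a2 (the case b1 = a1 + a2, a multiple mapping), while the second unit receives a3 alone (a unique mapping).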

The faults in the multiple views are distinguished by the different observation perspectives, and the mapping to the other views can again be carried out with a transformation matrix. Suppose that the attribute set of the fault in the reference ontology is P = {p1, p2, …, pk}, and that the attribute set of the fault in another view is Q = {q1, q2, …, ql} (l ≤ k). First, a k-order square matrix M is used to transform P into an intermediate matrix P′ = P × M. In the matrix M, the value of each element is between 0 and 1, and there is at most one element with the value 1 in any row or column. Then the elements with the value of 0 in all rows or columns of P′ are removed to obtain Q.
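The two-step attribute mapping can be sketched in Python as below. The sketch assumes that M contains only 0 and 1, so that P′ = P × M selects and reorders attributes, and that the zero positions of P′ are then dropped; the function name `map_attributes` is hypothetical, and the attribute names are the examples mentioned earlier (fault site, fault time, fault impact).

```python
def map_attributes(P, M):
    """Compute P' = P x M for a 0/1 matrix M, then drop empty positions to get Q."""
    k = len(P)
    if len(M) != k or any(len(row) != k for row in M):
        raise ValueError("M must be a k-order square matrix")
    # Check the constraint: at most one 1 in any row or column.
    for line in list(M) + [list(col) for col in zip(*M)]:
        if sum(line) > 1:
            raise ValueError("at most one 1 is allowed in any row or column")
    # Position j of P' holds p_i when M[i][j] == 1, otherwise None (a zero entry).
    p_prime = [
        next((P[i] for i in range(k) if M[i][j] == 1), None) for j in range(k)
    ]
    return [q for q in p_prime if q is not None]


P = ["fault site", "fault time", "fault impact"]
M = [
    [1, 0, 0],  # fault site   -> position 1 of P'
    [0, 0, 0],  # fault time   -> not used in the target view
    [0, 1, 0],  # fault impact -> position 2 of P'
]
Q = map_attributes(P, M)
```

In this example the target view keeps two of the three attributes, so l = 2 ≤ k = 3.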

7.2.3.4 Application of the Integrated Design Multi-view Model in Integrated Platforms

The realization of the integrated design multi-view model under PLM is discussed below, using TeamCenter as an example. Following the customization methodology of TeamCenter, the integrated design multi-view model is created by extending the classes and relations in PLM, as shown in Fig. 7.10. The layers that need to be extended include the data layer, the object management framework layer (object model and object services), and the interface layer of PLM.

(1) Define the class structure. Referring to the ontology framework, the metaphase object definition language (MODel) developed in TeamCenter is used to define the class structure corresponding to functions and faults, including classes, attributes, relations, and methods.
(2) Define the interface. According to the class structure defined in the previous step, MODel is used to define the interface, which includes menus, options, dialog boxes, attribute lists, etc., and the dialog window editor (DWE) is used to edit the interface graphically.
(3) Compile the methods. The C language is used to invoke API functions to implement the methods (Message) defined in the class structure.
(4) Compile the object dictionary. The MODel compilation command is used to update the compiled class structure to the object dictionary.

Fig. 7.10 Implementation of framework of the multi-view model based on PLM

(5) Update the database. The Updatadb mapping command provided in TeamCenter is used to automatically update the newly added objects to the Oracle database.

By establishing each ontology class of the multi-view model framework shown in Fig. 7.8 under PLM through the above steps, the object-oriented multi-view data model can be initially established. Here, the ontology mainly acts as a metamodel of domain knowledge that drives the design tools in multiple domains to use or share knowledge, as shown in Fig. 7.11. Figure 7.11 uses EXPRESS-G to describe the ontology classes and their relationships. Due to limited space, only the most typical classes are listed. Figure 7.11a describes the class structure and main attributes of the reference ontology. Figure 7.11b describes the main reliability concepts, in which the product (i.e., Part) and the fault are reused. The mapping relationship between the fault logic unit and Part describes the relationship between the reference ontology and the application ontology. At the same time, the fault concept is expanded, from the perspective of knowledge sharing in reliability, into concepts such as fault evolution and fault expansion. Similarly, Fig. 7.11c describes the main concepts of maintainability and supportability. The results achieved using TeamCenter are illustrated in Fig. 7.12. The above-mentioned models and methods have been applied in a prototype PLM-based platform for the integrated design of both functional performance and hexability. This platform achieves information and process integration between 39 hexability tools and CAD tools. At present, multiple tools of reliability prediction, functional description

Fig. 7.11 The class structure of ontology framework of the multi-view model: (a) reference ontology, (b) reliability view, (c) maintainability and supportability view

Fig. 7.12 Illustration of the multiview data model using TeamCenter. The example record shown relates the product unit “stabilizer support” to the failure “excessive deformation of the rear beam of the stabilizer”:
Relationship ID: 61432-01
Frequency ratio: 0.02
Function: support the elevator
Local impact: the rear beam deformation of stabilizer exceeds the allowable value
Higher level impact: the elevator rotation is stuck
Final impact: damage to the aircraft
Cause of failure: insufficient stiffness
Test method: wound detector examination
Basic maintenance measures: functional inspection
Minimum device list: True
Severity category: II

reliability modeling, maintainability prediction, level of repair analysis, etc. have been integrated into the TeamCenter platform to implement the sharing and exchange of knowledge of both functional performance and hexability, and to verify the abovementioned models and methods.

7.2.4 PLM Based Hexability Design Process

7.2.4.1 Implementation Plan of the Integrated Design Process of Both Functional Performance and Hexability

The implementation and management of the integrated design process of functional performance and hexability must be based on an integrated design process management system. Figure 7.13 shows the implementation levels in detail. Following the integrated design idea, the modeling methods and the metamodel of the process design are selected to define (build) the design process model in the design layer. Then, in the operation layer, the design process is executed to complete the information exchange, and the executed design process is tracked and analyzed in the control layer.

Fig. 7.13 Implementation level of the integrated design workflow management system of both functional performance and hexability
• Strategy and organization layer: performance and RMS integrated design collaboration objectives; performance and RMS integrated organization integration requirements
• Design layer: design methodology and process design metamodel; design flow model establishment
• Operation layer: performance and RMS process execution; information exchange
• Control layer: performance and RMS workflow tracking; workflow analysis
• Technology layer: distributed environment interactions; heterogeneous system connection

The integrated design process of functional performance and hexability is a collaborative design process that involves a large number of persons, information items, and resources, so its design environment must be distributed. Building on existing process integration technology, a process integration plan based on the life cycle management (LCM) module of the PLM system is adopted. Traditional PLM systems include TeamCenter developed by UGS, Windchill developed by PTC, and ENOVIA LCA developed by Dassault. The LCM module in these PLM systems is relatively mature and has considerable process modeling and management capabilities. It basically conforms to the reference model proposed by the Workflow Management Coalition (WfMC). Through its distributed task list and distributed invocation of applications, it can meet the needs of distributed workflow users and application interfaces. If the whole process involves too many persons and resources, a distributed workflow machine should be considered to reduce the burden on the system and improve the operating efficiency. Because the PDM system is itself distributed, the integration plan of the distributed workflow machine is built on the LCM module, as shown in Fig. 7.14. The distribution of the PDM system is reflected in the fact that its servers can be divided into global servers and local servers, each of which can link to services and a database, so that distributed databases and distributed services are realized. The implementation of the distributed workflow machine needs to use the process definition function of the LCM service in the global server to complete the process


[Figure 7.14: the global servers host an LCM service (workflow machine, process definition, process monitor), a database, and workflow control data, and are used by the process definition staff and the process monitor staff. Local server 1 and local server 2 each host an LCM service with a workflow machine, a task list, a database, and workflow control data, and serve users through clients, including application clients. Workflow definition information and control data are exchanged between the global servers and the local servers.]

Fig. 7.14 Process integration plan of the distributed workflow machine based on the LCM module

definition. The LCM modules on the local servers then provide the workflow machine function that drives the execution of the workflow. The process monitoring staff can monitor the operation of the entire process on the global server. Each local server communicates directly with its clients through the workflow machine function of the LCM module, interacting with users through the task list or directly invoking applications on the clients. Because all servers use the same LCM module, interaction between the local servers and the global server is relatively easy. The implementation process of the distributed workflow machine based on the LCM module is shown in Fig. 7.15. First, the server and the clients are configured according to the specified requirements. Second, the multi-workflow machine is configured. The LCM service module already encapsulates the process definition, instantiation, execution, and monitoring functions, but it does not automatically invoke the execution function of the workflow machine in the LCM module of each local server. It is therefore necessary to use the re-development function provided by the PDM system to implement process definition on the global server and process automation on the local servers. Afterward, process definition, model instantiation, and process execution can be carried out with the process management functions provided by PDM.
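The division of labor described above (definition and monitoring on the global server, execution on the local servers) can be illustrated with a minimal Python sketch. All class and step names here are hypothetical and do not correspond to any PDM or LCM API; the sketch also runs the steps sequentially in a single process, whereas a real deployment would distribute them over servers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ProcessDefinition:
    """Process defined once on the global server (illustrative structure)."""
    name: str
    steps: List[Tuple[str, str]]  # (step name, local server that runs it)


@dataclass
class LocalEngine:
    """Stand-in for the workflow machine in the LCM module of a local server."""
    server: str
    task_list: List[str] = field(default_factory=list)

    def execute(self, step: str) -> str:
        self.task_list.append(step)           # step appears in the local task list
        return f"{self.server}: {step} done"  # status reported to the monitor


@dataclass
class GlobalServer:
    """Dispatches defined steps to local engines and collects their status."""
    engines: Dict[str, LocalEngine]
    monitor_log: List[str] = field(default_factory=list)

    def run(self, process: ProcessDefinition) -> List[str]:
        for step, server in process.steps:
            self.monitor_log.append(self.engines[server].execute(step))
        return self.monitor_log


engines = {name: LocalEngine(name) for name in ("local server 1", "local server 2")}
process = ProcessDefinition(
    "reliability prediction",
    [("collect design data", "local server 1"), ("run prediction", "local server 2")],
)
log = GlobalServer(engines).run(process)
print(log)
```

The global object never touches a task list directly; it only sees the status strings returned by the local engines, which mirrors the monitoring role of the global server in Fig. 7.14.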


[Figure 7.15 flowchart: configure the server and client → configure the multi-workflow machine → process definition by global server → model instantiation → process execution by local server → process monitoring by global server.]

Fig. 7.15 Implementation process of the distributed workflow machine based on LCM module

The implementation plan described above makes full use of the support of the existing PDM system and integrates closely with the information managed there. Its capability for process management depends on the capability of the PDM system. Therefore, as the capability of the PDM system continues to improve, this implementation will become more and more mature.

7.2.4.2 Instantiation Modeling of the Integrated Design Process of Both Functional Performance and Hexability

Through configuration and re-development, the authority management of PDM software can define who uses which tools to operate on the data of a specific design task. PDM software can therefore fully meet the needs of the integrated design flow. Establishing an integrated design flow management system on PDM requires the following management functions:

(1) User management. It manages the information of all design participants, mainly by defining the persons involved in the design work and their roles, which correspond to the activities in the meta-model.

(2) Message management. It manages the empowerment behavior when a participant becomes an executor. Its main purpose is to establish a relationship object between the participant and the task, namely the executor, through which the authority of the participant over the specific executable program is managed by writing messages. After the activity is associated with the executor (i.e. its relationship object), the reference relationships among the “activity”, the “role”, and the “application that needs to be activated” can be established. The messages are compiled through re-development, such as customization.

(3) Condition management. It governs the rules of data flow and corresponds to the conversion condition in the meta-model.


(4) Workflow management. Its main purpose is to create workflow tasks, define the nodes of each workflow step, create relationships among the workflow steps, associate the workflow tasks with the steps, and finally complete the definition of the entire workflow.

The process of establishing an integrated design flow with the PDM-based process management module is shown in Fig. 7.16. In the first stage, the executors and task modules of the integrated design flow are created. The LCM module of PDM has four types of users: users, user groups, roles, and dynamic participants. Among them, the dynamic participant is a kind of user unique to the workflow management module. There are 11 types of task modules in LCM. The task modules used in the first stage include job assignment tasks, distribution notifications, approval tasks, automatic

[Figure 7.16, three stages. Stage 1, define task and state executor (relationship object): the tasks (job assignment task, distribution notification, approval task, automatic processing task, condition task, workflow) are paired with executors (task undertaker, recipient, approval staff, observer); the properties are edited and the job tool is specified; executors reference roles and permissions through users, user groups, roles, and dynamic participants, message groups (permission), and conditions. Stage 2, create parallel design flow and auxiliary module: parallel workflow, workflow history, parallel process, task flow, working calendar, decomposition process, workflow; associating transition conditions (implying the access logic of the data flow). Stage 3, edit design flow: define workflow step nodes; associate task and step; update steps (name, status, data warehouse) to assist the access logic of the data flow through the transformation of the data warehouse; create process logic by wiring (control flow); verify and save.]

Fig. 7.16 Illustration of the integrated design process of PDM based LCM


processing tasks, condition tasks, and workflows. The corresponding design work is managed through these task modules. The job assignment task module is used to specify the tools for a job and is therefore the main task module during the design work. The distribution notification module is used to send notifications to relevant people in the life cycle. The approval task module is used to vote when the state of a design object is changed. The automatic processing task module is used to execute a certain task automatically by sending specific messages, and the corresponding object is sent back to the task being automatically executed. The condition task module is used to check the attribute values of one or more objects through the condition object and then set the branch of the workflow according to the returned Boolean value (i.e. TRUE or FALSE). The workflow reflects the series of state changes of the design object during the design process. It therefore consists of two parts: the attributes of the workflow itself and the execution steps in the workflow. Generally speaking, a design object starts in the processing state and then undergoes a series of state changes such as approval, official release, change, and expiration; the workflow is built from a variety of pre-defined tasks that carry out these changes. After the task modules are defined, the users must be associated with the tasks by editing executor objects. This step involves references to messages, message groups (rule objects), and conditions. There are four types of executor objects in total: the undertakers corresponding to the assigned tasks, the recipients corresponding to the distributed notifications, the approvers corresponding to the approval tasks, and the observers corresponding to other tasks.
These four types of executor objects can all be associated with users, user groups, roles, and dynamic participants. By referencing “messages” in the executor objects, the user permissions in the life cycle are made explicit. Next, in the second stage, the task modules related to the integrated design process are created, mainly parallel workflows and auxiliary modules. There are two ways to refine the description of the workflow: process decomposition and task flow management. In the process decomposition module, different attributes of the same object enter different branches of the process, and the transfer of data is allowed. In the task flow management module, by contrast, a single object enters different task branches, and the transfer of data is not allowed. Afterward, in the third stage, a graphical representation of the integrated design flow is created. The third stage is an instantiation stage for the integrated design process of both functional performance and hexability. First, steps are created in the workflow module; then the established tasks are associated with these steps, and the relationships among the steps (e.g. success, failure) are created. The association should follow the process logic of the integrated design of performance and hexability, which is equivalent to constructing a control flow. Finally, the direction of the data flow is controlled by writing conditions and editing the databases in different stages of the workflow. This should be carried out under the specific data constraints given by the integrated design. In this


way, the integrated design flow is completely created, with the workflow module used iteratively to meet the hierarchical needs. After being created, the integrated design flow needs to be verified and saved. It is then debugged with the pilot run function provided by TeamCenter® (TC) and can be officially run if no error is found. The process management modules provided by TC (such as the suspend, freeze, restart, and monitoring modules) can then be used to monitor and manage the instantiation model of the integrated design process. The whole process implements the concept of integrated design of both functional performance and hexability on the basis of the workflow management system, which is equivalent to constructing an integrated design flow management system. It can be completed with the LCM module in PDM, assisted by re-development programming for the compilation of messages and conditions.
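The three stages can be made concrete with a small sketch: steps carry a task type and an executor (stage 1), are wired by success and failure relationships (the control flow of stage 3), and a condition task returns a Boolean that selects the branch. This is only an illustration under stated assumptions; the `Step` class, the step names, and the MTBF figures are invented for the example and are not part of TeamCenter or of the platform described here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One workflow step with its task type and executor (illustrative)."""
    name: str
    task_type: str   # e.g. "job assignment", "approval", "condition"
    executor: str    # undertaker, recipient, approver, or observer
    on_success: Optional[str] = None
    on_failure: Optional[str] = None
    condition: Optional[Callable[[dict], bool]] = None


def run_workflow(steps: Dict[str, Step], start: str, design_object: dict) -> List[str]:
    """Follow the control flow from `start`; condition steps pick the branch."""
    trace: List[str] = []
    current: Optional[str] = start
    while current is not None:
        step = steps[current]
        trace.append(step.name)
        # Steps without a condition always follow the success relationship.
        ok = step.condition(design_object) if step.condition else True
        current = step.on_success if ok else step.on_failure
    return trace


steps = {
    "predict": Step("predict", "job assignment", "undertaker", on_success="check"),
    "check": Step(
        "check", "condition", "observer",
        on_success="approve", on_failure="redesign",
        condition=lambda obj: obj["mtbf_h"] >= obj["required_mtbf_h"],
    ),
    "approve": Step("approve", "approval", "approver"),
    "redesign": Step("redesign", "job assignment", "undertaker"),
}

trace_ok = run_workflow(steps, "predict", {"mtbf_h": 1200, "required_mtbf_h": 1000})
trace_fail = run_workflow(steps, "predict", {"mtbf_h": 800, "required_mtbf_h": 1000})
print(trace_ok, trace_fail)
```

Keeping the condition as data on the step, rather than hard-coding it in the runner, corresponds to the text's separation between the wired control flow and the conditions that steer the data flow.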

7.2.4.3 Example of Instantiation and Management of the Integrated Design Process of Both Functional Performance and Hexability

Taking a TC-based integrated design platform as an example, the running of an integrated design process is illustrated below.

(1) Personnel organization and authority configuration

As shown in Figs. 7.17, 7.18 and 7.19, the organizational structure of personnel is created in the integrated platform and the corresponding authorities are assigned. A total of 13 user groups (Fig. 7.17) and more than ten roles (Fig. 7.18) have been created. Following the method shown in Fig. 7.19, the roles are associated with the user groups, as given in Table 7.1. For the different user groups and roles, more than 100 permission rules are created to ensure the security and authority of the users’ data. To make the authority configuration more flexible, message access rules are first written for the user groups and roles, and users are then associated with these user groups and roles to obtain their authority. In the current example, to ensure that reliability designers can obtain and update data, a total of 133 message access rules were created, including 17 creation permissions, 43 update permissions, 55 check-in and check-out permissions, 4 delete permissions, and 14 other permissions. If the hexability user group is to be allowed to check out the task stage objects from the RMSVault e-library, a new message access rule needs to be created, as shown in Fig. 7.20. In addition, message access rules must be created to allow the hexability user group to check in and check out other objects in the RMSVault e-library.

(2) Process configuration

The process is important because the integrated hexability platform uses it to keep the development work on track throughout. However, the process

[Figure 7.17: Object Explorer window listing the 13 user groups: military representation group, chief engineer group, structure group, pneumatic group, propulsion group, guidance group, flight control group, combat controller group, component group, test group, mass production group, RMS Admin group, RMS group.]

Fig. 7.17 The user group in a demonstration and validation project

is very complicated in practice. To ensure the accuracy of the process, hierarchical analysis and decomposition analysis must be carried out on the work process of hexability integrated design before the process model is established in the platform. In the current example, the established process model contains a total of 39 branch processes, more than 300 job nodes, about 40 review nodes, and more than 50 parallel processes. The prototype design process of the flight control system is shown in Fig. 7.21 as an example. In the integrated platform, the objects in a branch are analyzed first to determine whether that branch runs as a parallel process or as a decomposed process. After that, the job nodes (such as the job flow process and evaluation process) and process nodes (such as the task flow and workflow) are created. As an example, the creation of the reliability prediction job process is shown in Fig. 7.22, in which the relevant task stages can be configured and the relevant hexability tools can be selected. To implement the process shown in Fig. 7.21, a total of 12 task nodes, 3 parallel process nodes, 6 task flows, 6 workflows, and 1 evaluation node are created on the integrated platform, as listed in Tables 7.2, 7.3, 7.4 and 7.5. Finally, the integrated design flow is completed on the integrated platform, as shown in Fig. 7.23.
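The nesting of parallel processes and task flows can be sketched as a small tree. The names below are taken from Tables 7.3 and 7.4; the dictionary representation and the `job_nodes` helper are assumptions made for illustration, and the parallel process referenced by TF_servo design and analysis is written with the name used in Table 7.4.

```python
# Parallel process -> branches that run side by side (names from Table 7.4).
PARALLEL = {
    "P_rocket subsystem design": [
        "TF_flight control component design and analysis",
        "TF_servo design and analysis",
    ],
    "P_flight control reliability analysis": [
        "TF_flight control software analysis",
        "TF_flight control reliability analysis",
    ],
    "P_servo subcomponent design analysis": [
        "TF_servo circuit analysis",
        "TF_actuator design and analysis",
    ],
}

# Task flow -> members executed in sequence (names from Table 7.3).
TASK_FLOW = {
    "TF_flight control software analysis": [
        "Flight control software FMEA",
        "Flight control software test",
        "Flight control software reliability assessment",
    ],
    "TF_flight control reliability analysis": [
        "Flight control reliability prediction",
        "Viewing reliability report",
    ],
    "TF_servo circuit analysis": [
        "Test coverage analysis",
        "Control circuit tolerance analysis",
    ],
    "TF_actuator design and analysis": [
        "Actuator sub-component design",
        "Actuator reliability simulation analysis",
    ],
    "TF_flight control component design and analysis": [
        "Flight control component design",
        "P_flight control reliability analysis",
    ],
    "TF_servo design and analysis": [
        "Servo design",
        "P_servo subcomponent design analysis",
    ],
}


def job_nodes(name):
    """Return the leaf job nodes under a process node, depth first."""
    children = PARALLEL.get(name) or TASK_FLOW.get(name)
    if children is None:  # not a process node, so it is a job node
        return [name]
    leaves = []
    for child in children:
        leaves.extend(job_nodes(child))
    return leaves


jobs = job_nodes("P_rocket subsystem design")
print(len(jobs), jobs[0])
```

Walking the tree from P_rocket subsystem design reaches only the jobs that Table 7.3 assigns to a task flow, so a job that is listed in Table 7.2 but in no task flow does not appear in the result.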


[Figure 7.18: Object Explorer window listing the roles: military representation, structure engineer, design chief engineer, process chief engineer, quality chief engineer, pneumatic designer, propulsion designer, guidance designer, flight control designer, combat controller designer, reliability allocation staff, reliability modeling staff, reliability prediction staff, FTA staff, FMEA staff, administrator - test data management tool, general user - test data management tool, component center administrator.]

Fig. 7.18 Roles in the demonstration and validation project

7.3 Integration of MBRSE Integrated Design Tools

7.3.1 Integration Requirements for the Integrated Design Tools

The product meta sets configured with specific technical states are the main objects of the integrated design tools. As shown in Fig. 7.24, the meta sets of source products (including design requirements and design parameters) are first input to the integrated design tool. Then, according to the design requirements of the product meta model, the tool selects the corresponding auxiliary design methods, completes the design analysis work with the required supporting data, and finally exports the meta sets of target products (including the new and updated design parameters). In this way, the integrated design tool can be regarded as the “processing factory” of the product meta model. The granularity of the integrated design tool should correspond to the hexability work items specified in, for instance, Chinese national military standards. The planning, programming, and assessment of hexability work are usually carried out in the form of work items. Therefore, it is convenient to promote and control the development of work items by taking them as basic units to establish the

[Figure 7.19: Object Explorer window with the role assignment dialog. Existing assignments are listed as user/group name, role name, and project name, for example “guidance group, guidance designer, PL-15”, “mass production group, quality chief engineer, PL-15”, and “RMS003, FMEA staff, PL-15”. A role name and a user/group name are selected from option lists and confirmed with OK.]

Fig. 7.19 Roles allocation

Table 7.1 Assignment of roles

Role | User/user group | Item
Design chief engineer | Chief engineer group | PL-XX
Military representative | Military representation group | PL-XX
Reliability designer | Hexability group | PL-XX
Combat controller designer | Combat controller group | PL-XX
Guidance designer | Guidance group | PL-XX
Propulsion designer | Propulsion group | PL-XX
Pneumatic designer | Pneumatic group | PL-XX
Quality chief engineer | Mass production group | PL-XX
Component center manager | Component group | PL-XX
General users of the experimental data management tool | Test group | PL-XX
… | … | …


[Figure 7.20: “Create message access rule” dialog with the fields condition name, class name, inherit?, message name/message group name, and participant.]

Fig. 7.20 Example of message access rules

[Figure 7.21 flowchart elements: start; flight control component design; flight control system reliability prediction; view reliability reports; flight control software FMEA; flight control software test; flight control software reliability evaluation; servo design; servo control circuit design; test coverage analysis; tolerance analysis of control circuits; actuator subcomponent design; mechanical reliability simulation analysis and evaluation; countersign by the chief engineer and the military representative; decision “meet requirements?” with Y and N branches, where Y leads to the transformation phase.]

Fig. 7.21 Prototype design process of the flight control system

integrated design tool. According to the hexability methods and models discussed in Chap. 4, the integrated design tool should not be limited to covering the work items in the relevant standards but should also reflect the integrity of the integrated design process. The input, output, and supporting data of the integrated design tool should follow certain standards. The input product meta-data should be obtained through the interface of the unified platform, with a clear technical status and clear design requirements, and in a single data format that different integrated design tools can recognize. The output data should meet the format requirements and be submitted to the unified


[Figure 7.22: dialog for creating a job flow named “Reliability prediction of flight control component A”, with the fields project name, version, description, valid from (2010/04/20) until (2010/04/20), process security (No security), work calendar name, task stage (PL- combat rocket target test - autonomous flight phase), RMS tool (reliability prediction), and SupersededByRevVersa (False).]

Fig. 7.22 Illustration of creating a reliability prediction job process

platform interface to share the results. The supporting data should also be stored in a unified format that meets the requirements of the design tools, so that it can be managed centrally and uniformly.
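The “processing factory” view of an integrated design tool can be sketched as a function from a source product meta set to a target product meta set. In the following Python sketch, `ProductMeta`, `METHODS`, and `run_tool` are hypothetical names, the failure rates are made-up sample values, and the only method registered is the standard series-system prediction, in which the system failure rate is the sum of the part failure rates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ProductMeta:
    """Product meta set: design requirements plus design parameters."""
    name: str
    requirements: Dict[str, float]
    parameters: Dict[str, float] = field(default_factory=dict)


def series_prediction(meta: ProductMeta, support: dict) -> Dict[str, float]:
    """Series system: the system failure rate is the sum of part failure rates."""
    return {"failure_rate_per_1e6_h": sum(support["part_failure_rates_per_1e6_h"])}


# One entry per hexability work item; the tool picks the method by work item.
METHODS: Dict[str, Callable[[ProductMeta, dict], Dict[str, float]]] = {
    "reliability prediction": series_prediction,
}


def run_tool(source: ProductMeta, work_item: str, support: dict) -> ProductMeta:
    """Select a method, run it with the supporting data, return the target meta set."""
    method = METHODS[work_item]
    updated = {**source.parameters, **method(source, support)}
    return ProductMeta(source.name, source.requirements, updated)


source = ProductMeta("flight control component", {"max_failure_rate_per_1e6_h": 12})
support = {"part_failure_rates_per_1e6_h": [2, 3, 5]}  # made-up supporting data
target = run_tool(source, "reliability prediction", support)
meets = (
    target.parameters["failure_rate_per_1e6_h"]
    <= target.requirements["max_failure_rate_per_1e6_h"]
)
print(target.parameters, meets)
```

The source meta set is left unchanged and a new target meta set is returned, which matches the distinction in Fig. 7.24 between the meta sets of source products and the meta sets of target products.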

7.3.2 Integration Model on the Integrated Design Tools

7.3.2.1 Basic Principles of Integration of the Design Tools

By using integrated design tools, data sharing can be achieved on the basis of a common data model. As shown in Fig. 7.25, different design tools invoke the same unified integration component to integrate with the data models in PDM. The functions that need to be implemented in these integrated design tools include login to the PDM server, permission verification, work task retrieval, product retrieval, reading and updating of product views, retrieval of parameter attributes (and relevant documents), submission of design analysis results (and relevant documents), and tool version updating. From the integration principle, it can be seen that the key to integrating the design tools is the exchange between the design tools and the hexability data models in PDM through the interfaces provided by the different tools, as shown in Fig. 7.26. In this way, on the one hand, it is possible to obtain the product structure, attributes,


Table 7.2 Operation process

Name | Description | Hexability tool involved
Flight control component design | The flight control department designs the flight control component |
Reliability prediction of the flight control component | The flight control department makes reliability predictions of the flight control component | Reliability prediction
Viewing reliability reports | View the reliability data of the flight control component |
FMEA of flight control software | FMEA of the embedded software in the flight control | FMEA
Test for flight control software | Test for flight control embedded software |
Reliability assessment for flight control software | Reliability assessment for flight control embedded software | Software reliability assessment
Servo design | The servo department makes the design plan for the servo |
Control circuit design | Design the control circuit in the servo |
Actuator sub-component design | Design the actuator component in the servo |
Test coverage analysis | Analyze the test coverage of the control circuit | Test coverage analysis software
Control circuit tolerance analysis | Analyze the control circuit tolerance in the servo | Circuit tolerance analysis software
Actuator reliability simulation analysis | Actuator reliability simulation analysis using software | Mechanical reliability analysis kit

Table 7.3 Task flow description

Name | Task flow
TF_flight control software analysis | Flight control software FMEA, flight control software test, flight control software reliability assessment
TF_flight control reliability analysis | Flight control reliability prediction and viewing reliability report
TF_servo circuit analysis | Test coverage analysis, control circuit tolerance analysis
TF_actuator design and analysis | Actuator sub-component design, actuator reliability simulation analysis
TF_flight control component design and analysis | Flight control component design, P_flight control reliability analysis (in parallel)
TF_servo design and analysis | Servo design, P_servo component design and analysis (in parallel)


Table 7.4 Task flow description in parallel

Name | Task flow
P_flight control reliability analysis | TF_flight control software analysis, TF_flight control reliability analysis
P_servo subcomponent design analysis | TF_servo circuit analysis, TF_actuator design and analysis
P_rocket subsystem design | TF_flight control component design and analysis, TF_servo design and analysis

Table 7.5 Workflow description

Name
Preliminary design
LC_flight control design and analysis
LC_servo design and analysis
LC_flight control software analysis
LC_servo control circuit
LC_servo actuator

[Figure 7.23: editing windows of the integrated platform for the preliminary stage workflow, the parallel flow P_missile component design analysis (parallel branches TF_servo design and analysis and TF_flight control design and analysis), the task flows TF_flight control design analysis and TF_servo design analysis with their parallel processes, the missile countersigning evaluation process, and the phase change notice distribution list.]

Fig. 7.23 Illustration of the integrated platform process


[Figure 7.24: the meta sets of source products [C]S (design requirement and design parameter) enter the tool, which performs method selection among method 1, method 2, …, method n using supporting data and outputs the meta sets of target products [C]O (new and updated design parameters).]

Fig. 7.24 The relationship between the integrated design tool and product meta model

[Figure 7.25: tool 1 to tool i connect through the integrated component to the RMS data model and the RMS process model in PDM.]

Fig. 7.25 Integration principle of design tools

and documents required by the design tool, and on the other hand, it is possible to put the result data and documents in the appropriate position in PDM. To develop a PDM-based integration interface for the hexability design tools, it is necessary to understand the data requirements of each tool, adopt a service-oriented architecture (SOA), and use XML technology to encapsulate the PDM API as a Web service. The encapsulated Web service is then called during the integration of the hexability design tools. This architecture enables a loosely coupled, cross-platform, cross-regional integration over a WAN. The integration interface contains the main functions required for the integration of design tools. Figure 7.27 illustrates the integration process among different design tools by invoking the interfaces in these tools.
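The exchange in Fig. 7.26 (retrieve the product, then load its RMS properties, documents, and child objects) can be sketched as a recursive lookup. The dictionary `FAKE_PDM` below merely stands in for the PDM database, and `ProductNode` and `load_product` are hypothetical names; a real tool would call the encapsulated Web service instead. The sample values reuse the stabilizer example of Fig. 7.12.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Stand-in for the PDM database; a real tool would query the wrapped Web service.
FAKE_PDM = {
    "stabilizer": {
        "rms": {"severity_category": "II"},
        "docs": ["fmea_report.xml"],
        "children": ["stabilizer support"],
    },
    "stabilizer support": {
        "rms": {"frequency_ratio": 0.02},
        "docs": [],
        "children": [],
    },
}


@dataclass
class ProductNode:
    """Product object with its RMS properties, documents, and child products."""
    name: str
    rms_properties: dict
    documents: List[str]
    children: List["ProductNode"] = field(default_factory=list)


def load_product(pdm: dict, name: str) -> Optional[ProductNode]:
    """Retrieve a product by name and load its subtree; None if nothing is found."""
    record = pdm.get(name)  # "retrieve the product in PDM"
    if record is None:      # the "nothing" branch of the flowchart
        return None
    node = ProductNode(name, dict(record["rms"]), list(record["docs"]))
    for child_name in record["children"]:  # load the parent-child relationship
        child = load_product(pdm, child_name)
        if child is not None:
            node.children.append(child)
    return node


tree = load_product(FAKE_PDM, "stabilizer")
missing = load_product(FAKE_PDM, "unknown product")
print(tree.children[0].name, missing)
```

Copying the property dictionary and document list into the node keeps the tool's working data separate from the stored record, so results are written back to PDM only through an explicit submission step.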

[Figure 7.26 flowchart: enter the product name → create a query statement → retrieve the product in PDM. If nothing is found, the flow ends. If the product exists, get the product object and then: get the RMS property and load the property; get the RMS document object, get the RMS file object, and load the file; get the child object ID, load the parent-child relationship, and retrieve the child object.]

Fig. 7.26 Exchange of the design tools and hexability data models in PDM

[Figure 7.27 elements. Model backstage execution and management: tool information, product hierarchy, and different levels of data information; extension of the attribute model, the product model, and the structural model; establishment of the relational model; extension of the product information and construction of object files; makefile file generator, compiler (nmake), browser (MODel), object dictionary, library file, service, production of the target document. PDM integration: PDM invokes the tool; establish the tool class model; create a Dptool.dat file to import into the database; modification of tools with constant feedback. Encapsulation and integration of APIs: encapsulate the corresponding API function family according to the tool integration requirements, for invocation by the various application tools (tool 1, tool 2); unified management of the RMTSS engineering database and the RMTSS basic database.]

Fig. 7.27 PDM-based integration process of different hexability design tools


[Figure 7.28 layers, top to bottom: RMS integration tool; interface service layer (performance and RMS product data management application interface; performance and RMS product data management business logic interface); business entity layer (integrated design data model access, integrated design flow control access, integrated design technology state management, design change access control, integrated data warehouse, integrated database, engineering database, basic database); performance and RMS product data management business logic interfaces in the bottom layer (PDM platform related), each with an integrated data model and an integrated process model, for TeamCenter, Enovia LCA, and Windchill; PDM service layer (PDM application services).]

Fig. 7.28 Interface architecture of the hexability design tools for integration with PDM

7.3.2.2 Interface Among the Design Tools for Integration

General Architecture of Integration Interface Among the Design Tools

The integration interface has a three-layer architecture consisting of the “PDM Service Layer”, the “Business Entity Layer”, and the “Interface Service Layer”, as shown in Fig. 7.28. Both the Business Entity Layer and the Interface Service Layer have an application interface (called the “performance and hexability product data management application interface”) and a business logic interface (called the “performance and hexability product data management business logic interface”). The business logic interface converts all relevant application methods in PDM into corresponding interface services, so that the data management business entities of the lower levels can conveniently be used to implement the functions required by the hexability design tools. The application interface, in contrast, is mainly used to invoke the upper-level modules in the hexability tools. For example, the reliability block diagram model and its related data can be transferred directly to FMECA for criticality analysis (CA) through the interface. Among the interfaces in the three layers, only the performance and hexability product data management business logic interfaces in the bottom layer are connected with the PDM platform. Consequently, only these bottom-layer interfaces need to be changed in a cross-platform migration, and the interfaces in the other two layers remain unchanged. This reduces the workload of cross-platform migration and provides a better solution to the multi-environment problem of the performance and hexability integrated design platform.
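The migration argument can be illustrated with a short sketch in which only one class knows the PDM platform. All class and method names below are hypothetical, and the adapters return placeholder data rather than calling TeamCenter or Windchill; the point is only that replacing the bottom-layer adapter leaves the two upper layers untouched.

```python
from abc import ABC, abstractmethod


class PdmAdapter(ABC):
    """Bottom-layer business logic interface: the only PDM-specific code."""

    @abstractmethod
    def fetch_product(self, product_id: str) -> dict:
        """Return the raw product record from the PDM platform."""


class TeamCenterAdapter(PdmAdapter):
    def fetch_product(self, product_id: str) -> dict:
        # Placeholder: a real adapter would call the TeamCenter services here.
        return {"id": product_id, "source": "TeamCenter"}


class WindchillAdapter(PdmAdapter):
    def fetch_product(self, product_id: str) -> dict:
        # Placeholder: a real adapter would call the Windchill services here.
        return {"id": product_id, "source": "Windchill"}


class ProductEntityService:
    """Business entity layer: platform-neutral data management entities."""

    def __init__(self, adapter: PdmAdapter) -> None:
        self._adapter = adapter

    def product(self, product_id: str) -> dict:
        raw = self._adapter.fetch_product(product_id)
        return {"product_id": raw["id"], "origin": raw["source"]}


class ToolInterface:
    """Interface service layer: what the hexability design tools call."""

    def __init__(self, entities: ProductEntityService) -> None:
        self._entities = entities

    def product_summary(self, product_id: str) -> str:
        product = self._entities.product(product_id)
        return f"{product['product_id']} (from {product['origin']})"


# Migrating platforms means swapping the adapter; the upper layers are unchanged.
on_tc = ToolInterface(ProductEntityService(TeamCenterAdapter()))
on_wc = ToolInterface(ProductEntityService(WindchillAdapter()))
print(on_tc.product_summary("P-001"), on_wc.product_summary("P-001"))
```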


Fig. 7.29 A typical interface invoking process

Based on the interface architecture shown in Fig. 7.28, a typical interface invoking process (Fig. 7.29) is as follows:
(1) Invoke the PDM login function in the application layer to establish a connection with the PDM. The user can log into the PDM platform through the Web or Windows application client.
(2) Get the job task and its related product tree as an XML document.
(3) Use the application client to invoke the related functions to obtain the XML document, which includes information on the product tree and the relevant hexability parameters.
(4) Interpret the XML document as a hexability model entity.
(5) Use and maintain the related data with the hexability data model in the application client.

Therefore, the interface of the hexability design tool is mainly provided as entities of the hexability model. Figure 7.30 describes the different kinds of data model entities and their relationships in the interface of the hexability design tool. The interface takes the product tree as its core and exposes the product tree and its related mission profiles, mission stages, hexability parameters and fault modes for external use. The functions shown in gray in Fig. 7.30 are not implemented yet and will be added gradually as needs arise.

Based on the service-oriented integration architecture for the hexability design tools, PDM-based cross-regional and cross-platform integration can be implemented. Because the Web service relies on the HTTP protocol and XML technology, WAN-based integration can also be implemented. In this way, existing applications can be integrated at a lower cost through the interface with only a few changes. In addition, the three-layer architecture maintains good expansibility, maintainability, and portability.
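Step (4) of the invoking process, interpreting the XML document as model entities, can be sketched as follows. This is only an illustration: the source does not give the XML schema, so the element and attribute names (`productTree`, `product`, `parameter`, `faultMode`) and the `ProductNode` class are assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical document; the real schema is not given in the text.
SAMPLE_XML = """
<productTree task="T-001">
  <product id="P-1" name="control board">
    <parameter name="MTBF" value="1200" unit="h"/>
    <faultMode name="open circuit"/>
    <product id="P-1-1" name="regulator">
      <faultMode name="output drift"/>
    </product>
  </product>
</productTree>
"""


@dataclass
class ProductNode:
    """One product tree node with its hexability parameters and fault modes."""
    id: str
    name: str
    parameters: dict = field(default_factory=dict)
    fault_modes: list = field(default_factory=list)
    children: list = field(default_factory=list)


def parse_product(elem: ET.Element) -> ProductNode:
    """Recursively convert a <product> element into a ProductNode."""
    node = ProductNode(elem.get("id"), elem.get("name"))
    for p in elem.findall("parameter"):  # direct children only
        node.parameters[p.get("name")] = (float(p.get("value")), p.get("unit"))
    node.fault_modes = [f.get("name") for f in elem.findall("faultMode")]
    node.children = [parse_product(c) for c in elem.findall("product")]
    return node


def load_product_tree(xml_text: str) -> list:
    """Parse the whole document and return the top-level product nodes."""
    root = ET.fromstring(xml_text)
    return [parse_product(e) for e in root.findall("product")]
```

Because unknown elements are simply ignored by `findall`, a client written this way keeps working when the document is extended with new information.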
Expansibility comes from the XML-based interaction between interfaces, because the information contained in the XML document can be extended. Maintainability shows in system upgrades, where only the relevant information on the Web server needs to be updated and maintained. If the user needs to change the PDM platform, only part of the information on the server needs to be updated, and the switch to another PDM platform can be implemented by rewriting the API functions for the PDM platform.

[Figure: entities centered on the product tree, including product tree-related attribute information, mission profile information, mission stage information, quantitative parameters of reliability, maintainability and testability, fault mode information, fault effects, fault event, fault report information, test, environmental test, maintenance task, basic maintenance operation and support resource, communicating with the PDM data model.]

Fig. 7.30 Hexability data model entities and their relationships

Figure 7.31 illustrates the deployment of the TC-based integration of the hexability design analysis tools in three layers: the “platform layer”, the “service layer” and the “application layer”. The Web server in the service layer and the PDM server in the platform layer are connected through Mux. The Web application server and the Windows applications in the application layer exchange data with the PDM by invoking Web services.


[Figure: the PDM server in the platform layer communicates through Mux with the Web server in the service layer, which runs the PDM Adapter service to provide Web services to clients. In the application layer, browsers access the Web application server that runs the Web-based RMS design and analysis tools, and Windows clients run the Windows-based design and analysis tools; both reach the service layer over HTTP connections by invoking Web services.]

Fig. 7.31 Deployment process of TC-based integration of the hexability design tools

Case Studies of the Integration Interface in the Hexability Design Tool

The integration interface is written in C# with Visual Studio 2005, as the ISynthesis solution shown in Fig. 7.32. The ISynthesis solution includes three layers, the “Adapter Service Layer”, the “Entity Layer” and the “UI Layer”, which correspond to the “PDM Service Layer”, the “Business Entity Layer” and the “Interface Service Layer” in Fig. 7.28, respectively.
(1) Adapter service layer: including the PDM adapter service, which contains a Web service to process related operations such as data interaction with PDM (as shown in Fig. 7.33).
(2) Entity layer: including the ISynthesis service, which describes the product data model through the entity classes and the relationships among them (as shown in Figs. 7.34 and 7.35).

[Figure: Solution Explorer and Class View of the solution (8 projects).]

Fig. 7.32 Illustration of the ISynthesis solution

Fig. 7.33 PDM adapter service

Fig. 7.34 Entity classes and their related interfaces

(3) UI layer: including the WinUISynthesis module (as shown in Fig. 7.36) for logging in from Windows applications and the WebUISynthesis module (as shown in Fig. 7.37) for logging in from Web applications.
➀ WinUISynthesis module
➁ WebUISynthesis module
➂ Case studies of integration of the hexability design tools.

Fig. 7.35 Relationships among the entity classes

Fig. 7.36 Illustration of the WinUISynthesis module

Fig. 7.37 Illustration of the WebUISynthesis module

[Figure labels: Teamcenter; reliability prediction integration tool; product model (circuit board); EDA thermal design/analysis; modeling; prediction model; product MTBF; report generation; prediction report.]

Fig. 7.38 Integration of the reliability prediction tool and its operation process

In this section, the integration process of the reliability prediction tool shown in Fig. 7.38 is provided as an example. Through the integration interface, the reliability prediction tool obtains the PCB model from the TC product model, including the components and relevant information such as their types and parameters. Using this information, the reliability prediction tool predicts the reliability of the PCB model and outputs the prediction results (such as MTBF). The integration interface is then used again to save the reliability prediction results and the prediction report (in Word format) in the TC model for other analysis tools or further evaluation.
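The text does not say which prediction method the tool applies. As a hedged illustration only, the sketch below uses a simple parts-count style calculation (constant failure rates, series system), which is one common way to obtain an MTBF from a component list; the `Component` class and all failure-rate values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A PCB component as it might be read from the product model."""
    ref: str
    kind: str
    failure_rate_per_1e6_h: float  # hypothetical value, failures per 10^6 hours
    quantity: int = 1


def predict_mtbf_hours(components: list) -> float:
    """Series-system MTBF, assuming constant (exponential) failure rates."""
    total_rate = sum(c.failure_rate_per_1e6_h * c.quantity for c in components)
    if total_rate <= 0:
        raise ValueError("total failure rate must be positive")
    return 1e6 / total_rate


# Hypothetical board: the rates add up to about 1 failure per 10^6 hours.
pcb = [
    Component("R", "resistor", 0.01, quantity=20),
    Component("C", "capacitor", 0.02, quantity=10),
    Component("U1", "microcontroller", 0.6),
]
mtbf = predict_mtbf_hours(pcb)
```

In the workflow described above, a value such as `mtbf` is what would be written back to the product model together with the prediction report.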

Chapter 8

Application Cases of the Use of MBRSE

Abstract By taking a terrain mobile robot platform as an example, this chapter presents the forward reliability design process using the MBRSE method, starting from requirement analysis. First, the functional, performance and hexability requirements of the terrain mobile robot platform are determined and decomposed. Then, a preliminary design is carried out from the perspectives of function and structure to establish the working principles of the platform. Next, based on the functional models, a systematic fault identification is carried out, followed by the determination of the failure transmission chain and the closed-loop mitigation of the key failures. Finally, by taking a strain tester as an example, this chapter shows the component failure identification and control process by means of Physics-of-Failure simulations.

Keywords Terrain mobile robot platform · Requirement analysis · Preliminary design · System failure identification and mitigation · Component failure identification and control

© National Defense Industry Press 2024
Y. Ren et al., Model-Based Reliability Systems Engineering, https://doi.org/10.1007/978-981-99-0275-0_8

8.1 Requirement Analysis

8.1.1 Operating Requirements

The terrain mobile robot platform (hereafter the “platform”) is the mobile carrier of a dual-use robot. By loading different functional modules, it performs a variety of tasks such as reconnaissance, transportation, and other special operations. It is used in many military and civilian fields such as national defense, antiterrorism, disaster relief, battlefield operations, and special industries [90]. Therefore, it must have the following abilities:
(1) strong mobility, with high driving speed and the ability to climb, cross obstacles, and pass through water [91];
(2) strong environmental adaptability, so that it can run freely on grass, sand, mountains, highways and indoor terrain [92];

(3) the ability to withstand high vibration and shock loads, and to adapt to temperatures from −25 to 50 °C and a harsh electromagnetic environment;
(4) support for a remote-control mode and the ability to implement local path planning [93].

The key technical indices of the platform are listed in Table 8.1, where D indicates Demand (must achieve) and W indicates Wish (expected to achieve).

Table 8.1 List of key technical indices

| Serial number | Category | Index name | Index requirement | D/W |
| 1 | Environment | Natural environment | Rain, snow, ice, salt spray | D |
| 2 | Environment | Temperature | −25 to 50 °C | D |
| 3 | Environment | Humidity | 1–100 | D |
| 4 | Environment | Wading depth | ≥ 0.5 m | W |
| 5 | Environment | Ground environment | Sand, road, grass, dinas | D |
| 6 | Performance | Maximum driving speed | 5 km/h | D |
| 7 | Performance | Maximum ramp angle | 30° | D |
| 8 | Performance | Maximum stair climbing height | 200 mm/step, angle 30° | W |
| 9 | Performance | Maximum vertical-wall-trespassing height | 200 mm | D |
| 10 | Performance | Maximum trench-crossing width | 300 mm | D |
| 11 | Performance | Mileage range/time range | ≥ 20 km or 2 h | D |
| 12 | Performance | Charging time | ≤ 8 h | D |
| 13 | Structure | Weight | ≤ 50 kg | D |
| 14 | Function | Load | ≥ 50 kg | D |
| 15 | Function | Communication distance (open conditions) | ≥ 1.5 km | D |
| 16 | Function | Positioning distance (under normal operating conditions) | ≥ 3 m | D |
| 17 | Function | Control and perception | Direction, speed, attitude | D |
| 18 | Reliability | Mean time between failures (MTBF), expressed as mileage | ≥ 40 km | D |
| 19 | Supportability | Battery replacement time | ≤ 5 min | D |
| | Supportability | Universal rate of support equipment (tools) | ≥ 50% | D |
| 20 | Maintainability | Mean time to repair (MTTR) | ≤ 20 min | D |
| | Maintainability | Failure isolation capability | Ambiguity 3 (LRU) | D |
| 21 | Safety | Safe | Explosion-proof, nontoxic, no sharpness, no high voltage leakage | D |
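The D/W distinction in Table 8.1 lends itself to an automated check of a candidate design. The following sketch encodes a few of the quantitative indices from the table and separates unmet demands from missed wishes; the `Index` class, the `review` function and the candidate design values are hypothetical and serve only as an illustration.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Index:
    """One quantitative index taken from Table 8.1."""
    name: str
    compare: Callable[[float, float], bool]  # how the value relates to the limit
    limit: float
    unit: str
    priority: str  # "D" (demand) or "W" (wish)


INDICES = [
    Index("Wading depth", operator.ge, 0.5, "m", "W"),
    Index("Charging time", operator.le, 8.0, "h", "D"),
    Index("Weight", operator.le, 50.0, "kg", "D"),
    Index("Load", operator.ge, 50.0, "kg", "D"),
    Index("Communication distance", operator.ge, 1.5, "km", "D"),
    Index("MTBF mileage", operator.ge, 40.0, "km", "D"),
    Index("MTTR", operator.le, 20.0, "min", "D"),
]


def review(design: dict) -> tuple:
    """Return (unmet demands, missed wishes); a missing value counts as unmet."""
    unmet_demands, missed_wishes = [], []
    for idx in INDICES:
        value = design.get(idx.name)
        ok = value is not None and idx.compare(value, idx.limit)
        if not ok:
            target = unmet_demands if idx.priority == "D" else missed_wishes
            target.append(idx.name)
    return unmet_demands, missed_wishes


# Hypothetical candidate design, not data from the book.
candidate = {
    "Wading depth": 0.3, "Charging time": 6.0, "Weight": 52.0, "Load": 55.0,
    "Communication distance": 2.0, "MTBF mileage": 45.0, "MTTR": 15.0,
}
unmet, missed = review(candidate)
```

For this candidate, the overweight chassis is reported as an unmet demand, while the shallow wading depth is only a missed wish and would not block the design.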


The general design requirements of the platform are as follows.

8.1.1.1 Functional Requirements

(1) The platform is required to be remotely controlled and to have local autonomous capability.
(2) The movement of the platform is controlled by driving the wheel motor, and the attitude of the platform is controlled by driving the push rod.
(3) The platform is required to acquire environmental sensing information such as distance and images, and internal state information such as direction, position and speed.
(4) The platform is required to be equipped with a remote data communication channel and to be capable of simultaneously transmitting commands, data and video images.
(5) The platform is required to be equipped with a “plug-and-play” load interface for mechanical loads, power supplies, and electrical drivers, and to be capable of remotely managing load modules through the control center.

8.1.1.2 Technical Requirements

(1) The platform is required to have a compact, simple and reliable structure that is as small and light as possible, and to be equipped with a crawler-type movement system to adapt to multiple terrains.
(2) The platform is required to move at a certain speed, perform tasks under different environmental and terrain conditions, climb and cross obstacles, steer in all directions, and have superior passability, attitude stability, and accuracy in high-speed movement.
(3) The drive system of the platform must have high transmission efficiency, a compact structure, light weight and high dynamic response, and must integrate the drive control circuit.
(4) The platform is required to sense the external environment in order to detect targets. It should be able to observe its surroundings in all directions by integrating multisource information.
(5) The main control system of the platform is required to detect the working status of each functional module and provide real-time processing. It is also required to have a standard universal interface, to be small and light with low power consumption, to be upgradable and expandable, and to meet the temperature, mechanical and electromagnetic compatibility requirements.
(6) The platform is required to have a navigation function that provides real-time absolute and relative position, speed, and attitude in the task area with a certain accuracy.
(7) The platform is required to have an emergency power supply, and to manage battery charging, discharging and power supply so as to avoid accidents.

8.1.1.3 General Requirements of Quality Characteristics

(1) Reliability requirements
High reliability of the platform must be ensured by fully formulating and implementing reliability design criteria. The platform is required to have a high MTBF and mission reliability (RM).
➀ MTBF: not less than 40 km.
➁ Redundancy is required for important subsystems and components.
(2) Safety requirements
Safety design shall be carried out to ensure that the platform does not cause accidents and dangers such as human casualties, equipment damage, and property loss during daily use or as a result of equipment failure and human operation errors.
(3) Maintainability requirements
It should be ensured that when the platform fails, the failure can be located and measured, and is easy to repair.
➀ The mean time to repair (MTTR) should not be greater than 20 min.
➁ The hatch cover for maintenance must be connected by a quick-release mechanism, and the hatch cover must have a reliable anchor or support after opening.
➂ A combination structure should be adopted as much as possible to achieve modularity and standardization.
➃ The components are required to be interchangeable, and the key components should have error prevention and identification marks.
(4) Supportability requirements
It should be ensured that the platform has a high self-supporting capability, consumes as few support resources as possible, and allows support-related work to be carried out easily.
➀ A portable support toolbox is required.
➁ Support tools must be universal.
(5) Testability requirements
➀ The output of key failures must be monitored and an alarm function provided.
➁ The output display of the monitor screen should meet ergonomic requirements.
➂ The failure notification information is required to be accurate, and the ambiguity group size for failure isolation must not exceed 3.
(6) Environmental adaptability requirements
➀ The working temperature range is required to be −25 to 50 °C.
➁ The storage temperature range is required to be −25 to 50 °C.
➂ The platform must be waterproof and able to adapt to environmental conditions such as outdoor vibration, shock, and dust storms.
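To give a feel for how the MTBF index in the reliability requirements relates to mission reliability (RM), the worked calculation below assumes an exponential failure model over mileage. That model is an assumption made here for illustration and is not stated in the text; the 20 km mission length is taken from the mileage range index in Table 8.1.

```latex
R_M(d) = \exp\!\left(-\frac{d}{\mathrm{MTBF}}\right),
\qquad
R_M(20\ \mathrm{km}) = \exp\!\left(-\frac{20\ \mathrm{km}}{40\ \mathrm{km}}\right) = e^{-0.5} \approx 0.61
```

Under this assumption, a platform that only just meets the 40 km MTBF demand would complete a full 20 km mission without failure about 61% of the time, which is one way to see why redundancy is also required for important subsystems.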

8.1.2 Requirements Decomposition

Because the overall requirements of the platform cannot be designed against directly, they must be further decomposed into sub-requirements that can be designed and realized. In total, six 1st-level sub-requirements are decomposed from the overall requirements of the platform. Figure 8.1 shows the 1st-level sub-requirements and the sub-requirements of each of them. This requirement decomposition helps to carry out the requirement-to-function mapping effectively, which in turn supports the function analysis and design of the platform.

[Figure: (a) the mobile platform decomposed into movement, communication, image, power, load interface and general quality characteristic requirements; (b) to (f) the further decomposition of each 1st-level requirement.]

Fig. 8.1 Overall division of mobile platforms. a Requirements decomposition of the mobile platforms; b movement requirement; c communication requirement; d image requirement; e power requirement and load interface requirement; f general quality characteristic requirement

8.2 Preliminary Design

8.2.1 Functional Design

According to its requirements, the main functions of the platform are summarized as follows:
(1) The platform has good mobility and a certain ability to overcome obstacles, and its sensors collect external environmental information (such as distance and images) and internal environmental information (such as direction, position and speed).
(2) The platform has a remote data communication channel, which can simultaneously transmit instructions, data, videos and images, and receive the remote-control signals from the control center.
(3) The platform provides reliable mechanical, power, and data communication electrical interfaces for the load module.

The functional principle of the platform is shown in Fig. 8.2. Further analysis of the functional principle in Fig. 8.2 gives the functional processes of the key components of the platform, namely the signal processing module (including power supply management), the communication module (including image transmission), and the motion module (including obstacle crossing).

[Figure: function blocks F-01 power supply, F-02 signal processing, F-03 communication, F-04 image capture, F-05 obstacle crossing and F-06 movement, annotated with the corresponding indices of Table 8.1 and linked by electric energy, kinetic energy, control signal, picture signal and monitor signal flows.]

Fig. 8.2 Functional principles of the platform

(1) The signal processing module (including power supply management) is mainly used to convert the external electrical energy into the energy supplied directly to the platform and all sub-system functional modules. In addition, it processes information from the control center. The processed information is then sent to the other modules of the platform for real-time monitoring of the working status of each functional module. The functional process of the signal processing module is shown in Fig. 8.3.
(2) The communication module (including image transmission) is mainly used to transmit information wirelessly among the platform, the control center and the manipulators. The platform transmits the acquired video information to the control center, and based on this video information the control center returns control instructions to the platform. Similarly, the platform transmits the video information to the manipulator, and based on this video information the manipulator returns control instructions to the platform. The functional process of the communication module is shown in Fig. 8.4.
(3) The motion module (including obstacle crossing) is mainly used to provide support functions. It ensures that the different subsystems of the platform are elastically connected to transmit load, alleviate impact, attenuate vibration, and adjust the position of the vehicle body while the platform is moving. It also converts electric energy into mechanical energy to control the speed of the platform. In this way, the platform is able to accelerate and brake, and also has capabilities such as all-round steering, climbing, and obstacle crossing. The functional process of the motion module is shown in Fig. 8.5.

Fig. 8.3 Functional process of the signal processing module (including power supply management)

Fig. 8.4 Functional process of the communication module (including image transmission)

Fig. 8.5 Functional process of the motion module (including obstacle crossing)

8.2.2 Structural Design

According to the functional process analysis of each module, the physical structure of the platform can be initially designed, including sub-systems such as the control box, remote-control box, power box, power supply, vehicle body and suspension system. The mapping of the functions to the physical structure of the platform is shown in Fig. 8.6. Furthermore, according to the analysis above, the platform design plan can be obtained, as shown in Fig. 8.7, in which the different subsystems are displayed.

[Figure: items of the functional domain (for example image capture, monitoring signal acquisition and navigation) mapped to components of the physical domain (for example camera, sensor and GPS module) within the remote control box, control box, power box and platform.]

Fig. 8.6 The mapping of the function to the physical structure of the platform

[Figure labels: communication module; image module; motion module; power module; control module; remote-control module; reliability and supportability module; vehicle body and suspension; marching mechanism.]

Fig. 8.7 Modules division of the platform

The designed functions of each module are given as follows (the sub-modules marked with * are those designed for reliability):
(1) Communication module: build a bridge to connect the remote-control box and the platform.
➀ Platform control communication sub-module: monitor the platform and provide status feedback.
➁ Image transmission sub-module: ensure real-time and stable transmission of the images collected from the camera.
➂ * Backup transmission sub-module: provide backup control in case the control signal is disturbed or the main communication module breaks down.
(2) Image module: collect image and sound information around the platform.
➀ Camera and lens sub-module: collect remote monitoring videos obtained from the platform.
➁ Pan-tilt-zoom (PTZ) sub-module: ensure image stability during bumpy movement.
➂ Pickup sub-module: collect sounds around the platform.
(3) Motion module: consists of the traveling mechanism, vehicle body and suspension system, and generates power to drive the platform movement.
➀ Motor sub-module: focus on matching power and speed.
➁ Motor drive sub-module: focus on the design of the drive mode provided by the motor.
➂ Motor detection sub-module: monitor the motor speed in real time.
(4) Power module: store electric energy and provide power supply.
➀ Power battery sub-module: drive the motor and other actuators in the platform.
➁ Separated battery sub-module: provide power supply for the other modules of the platform and, if necessary, for the motion module.
➂ * Power supply management sub-module: consists of functions such as battery output detection, voltage detection, and battery switch management.
(5) Control module: consists of sensors, circuit boards, and wires, and is used for signal processing and platform operation.
➀ Signal transmission sub-module: connect the processor signals to the other modules of the platform.
➁ Sensor sub-module: adopt appropriate sensors (such as temperature, water immersion and current detection sensors) to achieve the target function.
➂ Control sub-module: use algorithms to effectively control the different devices in the platform.
➃ Data processing sub-module: effectively process the signals from different sensors.
➄ * Anti-interference sub-module: add a shielding layer to prevent interference from different kinds of circuits.
➅ * Software reliability sub-module: perform software evaluation, such as redundancy and self-checking.
(6) Remote control module: control and monitor the platform.
➀ Console sub-module: control the movement of the platform through the joystick and keyboard, and process the control signal collected from the camera on the PTZ.
➁ Data monitor sub-module: collect and display data from the platform in real time.
➂ Multi-platform sub-module: provide the same console function on a PC client.
(7) Reliability module:
➀ * Waterproof sub-module: focus on the waterproof design for different materials, motion structures and electrical interfaces to maximize the wading depth of the platform.
➁ * Self-protection sub-module: use sensors to protect the platform from being fatally damaged.
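The module breakdown above can be captured in a small data structure so that the sub-modules designed for reliability (those marked with *) can be listed automatically. The module and sub-module names come from the list above; the dictionary layout and the `reliability_submodules` helper are a hypothetical sketch, not part of the book's tooling.

```python
# Each entry: module name -> list of (sub-module name, designed for reliability?)
MODULES = {
    "Communication": [
        ("Platform control communication", False),
        ("Image transmission", False),
        ("Backup transmission", True),
    ],
    "Image": [
        ("Camera and lens", False),
        ("Pan-tilt-zoom (PTZ)", False),
        ("Pickup", False),
    ],
    "Motion": [
        ("Motor", False),
        ("Motor drive", False),
        ("Motor detection", False),
    ],
    "Power": [
        ("Power battery", False),
        ("Separated battery", False),
        ("Power supply management", True),
    ],
    "Control": [
        ("Signal transmission", False),
        ("Sensor", False),
        ("Control", False),
        ("Data processing", False),
        ("Anti-interference", True),
        ("Software reliability", True),
    ],
    "Remote control": [
        ("Console", False),
        ("Data monitor", False),
        ("Multi-platform", False),
    ],
    "Reliability": [
        ("Waterproof", True),
        ("Self-protection", True),
    ],
}


def reliability_submodules(modules: dict) -> list:
    """Return (module, sub-module) pairs flagged as designed for reliability."""
    return [
        (module, sub)
        for module, subs in modules.items()
        for sub, flagged in subs
        if flagged
    ]
```

Calling `reliability_submodules(MODULES)` yields the six starred sub-modules together with their parent modules, which is a convenient starting list for the failure identification work that follows in the chapter.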

8.2.3 Working Principles

Based on the functional design and structural design, the overall working principle of the platform is shown in Fig. 8.8. The inputs of the platform are human manipulation signals, GPS signals, and electric energy, whereas the outputs are the movement friction of the crawler and other relevant signals. The interference signals are mainly temperature, electromagnetic, humidity, rain, lightning, wind resistance, and road condition signals.

[Figure: block diagram of the platform. Labeled blocks: remote-control box P-001, charger P-002, lithium-ion battery stack P-003, control box P-004, power box P-005, driving wheel P-006, caterpillar band P-007, suspension P-008, push rod, camera P-010, car body P-011. Annotated requirements: camera resolution ≥ 720P; communication distance (open space) ≥ 1.5 km; charging time ≤ 8 h; mileage range/time range ≥ 20 km/2 h; maximum driving speed 5 km/h; maximum ramp angle 30°; maximum stair climbing height 200 mm/step at an angle of 30°; maximum vertical-wall crossing height 200 mm; positioning distance (under normal operating conditions) ≥ 3 m; weight ≤ 50 kg; load ≥ 50 kg. Flows between blocks: control, image, position, speed, detection, and vibration signals; electric, kinetic, and rotational energy; frictional force; dissipation of energy.]
Fig. 8.8 The general working principle of the platform

After completing the overall functional design of the platform, the internal design of the secondary modules is further developed. Figure 8.9 shows the functional principle of the control box (including power supply management) and its internal energy and signal transmission relationships. This sub-system consists of components such as the signal receiver, GPS module, core PCB, isolation regulator, main fuse, fuse box, backup power supply, signal transmitter, push rod drive circuit, and optocoupler module. The main function of the control box is to receive the different kinds of control signals and feedback signals, analyze them, send control signals to the downstream actuators (for instance, the power box and push rod), and supply power to the camera, image transmission, and power box. Likewise, the working principles of the other sub-systems can be obtained by a similar analysis, which are discussed in this book. Finally, the three-dimensional design model of the platform is obtained from the design results of the functional structure, as shown in Figs. 8.10, 8.11 and 8.12.

8.3 Systematic Failure Determination and Mitigation Based on the Functional Model

8.3.1 Systematic Determination of the Failure

In this section, the control box and its key components are analyzed using the systematic failure identification method. Table 8.2 gives the functional failure modes of the key components of the control box, based on the analysis of functional failure clues. Once a functional failure mode of a key component has been determined, its causes, consequences, and possible improvement measures are analyzed further to optimize the design plan, as shown in Table 8.3. The severity category S is defined according to the task requirements of the platform, as given in Table 8.4.
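The clue-based identification behind Table 8.2 can be pictured as a checklist in which every component function is examined against the same six failure categories. The following is a minimal Python sketch of that idea, not the authors' tool: the names `FailureCategory` and `failure_mode_checklist` are hypothetical, and only the category names and the main-switch entry are taken from Table 8.2.

```python
from enum import Enum


class FailureCategory(Enum):
    """The six functional failure categories listed in Table 8.2."""
    LOSS_OF_FUNCTION = "Loss of function"
    DISCONTINUOUS_FUNCTION = "Discontinuous function"
    INCOMPLETE_FUNCTION = "Incomplete function"
    PERFORMANCE_DEVIATION = "Performance deviation"
    FUNCTIONAL_TIME_DEVIATION = "Functional time deviation"
    UNDESIRED_FUNCTION = "Undesired function"


def failure_mode_checklist(identified):
    """Return one (category, modes) row per category.

    An empty list means that no failure mode exists for that category.
    """
    return [(cat, identified.get(cat, [])) for cat in FailureCategory]


# Entry for the main switch (P-004-001), as given in Table 8.2.
main_switch = {
    FailureCategory.LOSS_OF_FUNCTION: ["Unable to power on", "Unable to power off"],
}

for category, modes in failure_mode_checklist(main_switch):
    print(f"{category.value}: {', '.join(modes) if modes else 'none'}")
```

Iterating over the enumeration rather than over the dictionary keys is the point of the sketch: a category with no identified mode still appears in the output, so it is explicitly ruled out rather than silently skipped.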

[Figure: block diagram of the control box. Labeled blocks: main switch P-004-001 (24 V); main fuse P-004-002 (24 V, overload protection); fuses P-004-003/1 to P-004-003/3 (24 V, overload protection); back-up power supply P-004-004 (voltage converter, 12.6 V); back-up power supply P-004-005 (voltage converter, 12 V); isolated voltage regulator P-004-006 (24 V, isolation and voltage stabilization); signal receiver P-004-007 (receives and processes the various control signals and feedback signals); GPS module P-004-008 (receives and transmits the platform position signal); core PCB P-004-009 (transmits and distributes the various types of control signals, including speed, direction, temperature, and immersion); interface module P-004-010 (transmits the rotation signal of the main drive motor); push rod drive circuits 1 to 4, P-004-011/1 to P-004-011/4 (transmit speed and direction signals); signal transmitter P-004-015 (5 V; transmits feedback signals such as motion speed, temperature, and water). External flows: control signals from the remote control box, GPS signal, detection signal to the remote control box, push rod power, power box control signal, and power supply for the camera and image transmission.]
Fig. 8.9 The working principle of the control box (including power supply management)

Fig. 8.10 Illustration of mechanical structure of the platform (labeled parts: tension pulley, driving wheel, bogie wheel, suspension system)

Fig. 8.11 Illustration of the vehicle body of the platform (labeled parts: body side panel mounting bracket, push rod, right caterpillar band, coupling, driving assembly, power box, left caterpillar band, bearing with a seat, driving wheel, guide pulley, suspension system)

Fig. 8.12 Illustration of the suspension system of the platform (labeled parts: tension pulley group 1, tension pulley group 2, suspension retention mechanism, shock absorber module, shock absorber connection bearing, elastic steel plate, bogie wheel)

Table 8.2 Functional failure modes of the key components in the control box

| No. | Component | Function | Failure category | Functional failure mode |
| 1 | Main switch (P-004-001) | Turn the power on or off | Loss of function | Unable to power on, unable to power off |
| | | | Discontinuous function | - |
| | | | Incomplete function | - |
| | | | Performance deviation | - |
| | | | Functional time deviation | - |
| | | | Undesired function | - |
| 2 | Fuse (P-004-003) | Overload protection | Loss of function | Failure to be fused in time in case of overload |
| | | | Discontinuous function | - |
| | | | Incomplete function | - |
| | | | Performance deviation | - |
| | | | Functional time deviation | - |
| | | | Undesired function | Be fused when not overloaded |
| 3 | Back-up power supply (P-004-004) | Convert 24 V to 12.6 V | Loss of function | The output voltage remains 24 V (the voltage has not been converted) |
| | | | Discontinuous function | Unstable output voltage or parasitic ripple |
| | | | Incomplete function | - |
| | | | Performance deviation | The output voltage is either higher than 24 V or lower than 12.6 V |
| | | | Functional time deviation | - |
| | | | Undesired function | - |
| 4 | Signal receiver (P-004-007) | Receive and process various control signals and feedback signals, including the feedback image signal, manipulation signal, and lithium battery pack voltage/temperature/immersion signal | Loss of function | No output signal |
| | | | Discontinuous function | Discontinuous output signal |
| | | | Incomplete function | Partial loss of signal |
| | | | Performance deviation | Output signal distortion |
| | | | Functional time deviation | Delay of signal reception |
| | | | Undesired function | The output signal contains noise |
| 5 | GPS module (P-004-008) | Receive and transmit position signal of platform | Loss of function | Unable to receive GPS signal |
| | | | Discontinuous function | Position signal intermittent outputs |
| | | | Incomplete function | Incomplete position coordinates |
| | | | Performance deviation | Inaccurate position navigation, the offset is larger than 10 m |
| | | | Functional time deviation | Position signal lags |
| | | | Undesired function | - |
| 6 | Core PCB (P-004-009) | Transfer and distribute signal | Loss of function | Unable to transmit signal |
| | | | Discontinuous function | Discontinuous output transfer signal |
| | | | Incomplete function | Incomplete output transfer signal |
| | | | Performance deviation | Distorted output transfer signal |
| | | | Functional time deviation | The output transfer signal contains noise |
| | | | Undesired function | - |
| 7 | Interface module (P-004-010) | Transmit the signal driving the rotation of the main motor | Loss of function | Unable to transmit signal |
| | | | Discontinuous function | - |
| | | | Incomplete function | - |
| | | | Performance deviation | Distorted output signal |
| | | | Functional time deviation | - |
| | | | Undesired function | The output signal contains noise |

Note: "-" indicates that the failure mode does not exist

8.3.2 Typical Failure Transmission Chain

Based on the results of the above failure analysis, the propagation of a failure and its degree of influence can be analyzed further using the failure chain. For example, a failure in the voltage converter control circuit of the backup power supply means that the voltage cannot be converted as intended, and the consequence might spread widely. The operator may then be unable to control the platform remotely and therefore fail to complete the final task. The detailed failure transmission process is shown in Fig. 8.13.

The failure transmission chain from the backup power supply (P-004-004) shows that a failure of the voltage converter control circuit or a breakdown of the DC–DC step-down charging module causes the 24 V input voltage to be output directly without conversion. The backup power supply (P-004-005) may then burn out with no voltage output, or its output voltage may be higher than 12 V. In the former case, the control box might not supply power to the camera and image transmission modules. The image signals then cannot be collected and fed back, and the operator cannot remotely control the platform, eventually failing to complete the final task. In the latter case, the isolation regulator (P-004-006) might fail to output the 5 V voltage, and the signal transmitter (P-004-015) can neither output the speed, temperature, and water immersion signals to the remote-control box nor feed the measured speed, temperature, and water immersion signals back to the control box. Both consequences prevent the remote-control box from outputting control signals, eventually causing the platform task to fail.

Therefore, the reliability of the backup power supply (P-004-004) seriously affects the reliability of the platform and needs to be improved in the design of the platform. Similarly, the transmission chains of other failures can be established to identify key components (such as fuses and interface modules) and their key failures. The relevant details are not discussed in this book.
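The transmission chain described above can be treated as a directed graph in which each edge reads "this failure causes that failure". The sketch below is one possible Python representation, not the book's model: the node labels paraphrase the text, and the names `PROPAGATION` and `downstream_effects` are hypothetical.

```python
from collections import deque

ROOT = "P-004-004: 24 V output without conversion"

# Edges paraphrase the failure chain described in the text (illustrative labels).
PROPAGATION = {
    ROOT: [
        "P-004-005: burned out, no voltage output",
        "P-004-005: output voltage above 12 V",
    ],
    "P-004-005: burned out, no voltage output": ["P-010: no image signal"],
    "P-004-005: output voltage above 12 V": ["P-004-006: no 5 V output"],
    "P-004-006: no 5 V output": ["P-004-015: no feedback signals"],
    "P-010: no image signal": ["P-001: no control signal output"],
    "P-004-015: no feedback signals": ["P-001: no control signal output"],
    "P-001: no control signal output": ["Platform: task fails"],
}


def downstream_effects(graph, start):
    """Breadth-first traversal listing every failure reachable from `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for effect in graph.get(node, []):
            if effect not in seen:
                seen.add(effect)
                order.append(effect)
                queue.append(effect)
    return order


for effect in downstream_effects(PROPAGATION, ROOT):
    print(effect)
```

Because both branches converge on the remote control box, a single upstream failure reaches the platform-level effect by two routes, which is why the text singles out the backup power supply as a key component.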

8.3.3 Typical Closed-Loop Failure Mitigation Process

For the key components and key failures that affect the success of the platform's task, design improvements must be made to raise the reliability of these components. According to the previous analysis, the main failure modes of the backup power supply (P-004-004) are: the output voltage remains 24 V without conversion, the output voltage is unstable or has parasitic ripples, the output voltage is higher than 12.6 V, and the output voltage is lower than 12.6 V. The main causes of these failure modes include: failure of the voltage converter control circuit, breakdown of the DC–DC step-down charging module, filtering failure, reduction of the circuit effective

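Table 8.3 Functional failure analysis of key components of the control box

| No. | Product | Function | Failure mode | Failure cause | Mission stage | Local effect | Higher level effect | Final effect | S | Improvement measure |
| 1 | Main switch (P-004-001) | Turn the power on or off | Unable to power on | The main fuse gets fused | Startup | Unable to power on | The control box cannot be started | The platform cannot be started | II | (1) Select the correct fuse, such as an air switch or a fuse with self-recovery function (2) Install the fused wire according to certain regulations |
| 2 | | | Unable to power on | Disconnected power line or electrical terminal | Startup | Unable to power on | The control box cannot be started | The platform cannot be started | II | Improve the harness design of the power line and electrical terminal, to avoid poor contact after a period of use or storage |
| 3 | | | Unable to power on | Stuck or contactor oxidation | Startup | Unable to power on | The control box cannot be started | The platform cannot be started | II | Select the remote switch or standby switch |
| 4 | | | Unable to power on | Low capacity of the lithium-ion battery | Startup | Unable to power on | The control box cannot be started | The platform cannot be started | II | Add an alarm to detect low capacity of the battery |
| 5 | | | Unable to power off | Stuck | Recovery | The power supply cannot be turned off | The control box cannot be turned off | The platform cannot be turned off | III | Select the remote switch or standby switch |
| 6 | | | Unable to power off | Switch cannot be disconnected | Recovery | The power supply cannot be turned off | The control box cannot be turned off | The platform cannot be turned off | III | Select the remote switch or standby switch |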

Table 8.3 (continued)

| No. | Product | Function | Failure mode | Failure cause | Mission stage | Local effect | Higher level effect | Final effect | S | Improvement measure |
| 7 | Fuse (P-004-003) | Overload protection | Failure to be fused when overloaded | Too large fusing power or poor fusing quality | Full cycle | The overload protection does not work | The control box is burned out due to overload | The platform is destroyed | II | (1) Select the correct fuses, such as an air switch or fuses with self-recovery function, according to the current safety threshold of the on-board equipment (2) Install the fused wires correctly according to regulations |
| | | | Fused when not overloaded | Instantaneous pulse; short circuit; poor contact | Full cycle | Power transmission disconnected | Unable to start the control box because of no power | The platform cannot work | II | (1) Select the correct fuses, such as an air switch or fuses with self-recovery function, according to the current safety threshold of the on-board equipment (2) Install the fused wires correctly according to regulations |
| 8 | Backup power supply (P-004-004) | Convert 24 V to 12.6 V | No voltage conversion, and the output voltage remains 24 V | The DC–DC converter breaks down due to the failure of the converter circuit | Full cycle | The conversion function does not work, probably resulting in the burnout of the backup power supply P-004-005 | The control box cannot supply power to the camera and image transmission device, so image signals cannot be collected or fed back | The platform cannot be remotely controlled, causing the task to easily fail | II | Use a backup power supply with 24 V overvoltage protection to prevent the propagation of failure |
| | | | Unstable output voltage or with parasitic ripples | Unable to filter; decrease of effective capacity | Full cycle | The conversion function does not work, probably resulting in an unstable output voltage of the backup power supply P-004-005 | The output voltage of the control box is unstable, so the camera and image transmission device cannot stably collect image signals | It is difficult to remotely control the platform, so the task might fail in severe scenarios | III | Replace electrolytic capacitors |
| | | | The output voltage is higher than 24 V | The reference voltage of the DC–DC converter increases or the charging IC is damaged | Full cycle | The conversion function does not work, probably resulting in the breakdown of the backup power supply P-004-005 when the output voltage is too high | The control box cannot power the camera, image transmission device, and so on, so image signals cannot be collected or fed back | The platform cannot be remotely controlled, causing the task to easily fail | II | Use a backup power supply with 24 V overvoltage protection to prevent propagation of failure |
| | | | The output voltage is lower than 12.6 V | The reference voltage of the DC–DC converter decreases or the charging IC is damaged | Full cycle | The conversion function does not work, so the backup power supply P-004-005 fails to output 12 V | The output voltage of the control box is less than 12.6 V, so the camera and image transmission device cannot steadily collect image signals | It is difficult to remotely control the platform, so the task might fail in severe scenarios | III | Use a backup power supply with undervoltage protection to prevent failure propagation |
| 9 | Signal receiver (P-004-007) | Receive and process different control and feedback signals, including the feedback image signal, operation signal, and lithium battery pack voltage/temperature/soaking signal | No output signal | The microprocessor crashes and cannot send valid data | Full cycle | The signal receiver has no output signal | The control box has no control signal output and the control function is invalid | The platform cannot work and may crash when wading or when the battery temperature is too high | I | Select a microprocessor with protection circuit and self-diagnosis function |
| | | | No output signal | (1) The microprocessor signal cannot reach the signal transmission module due to cable failures (2) The microprocessor signal is interfered with during transmission, so the wireless module cannot recognize its data format | Full cycle | The signal receiver has no output signal | The control box has no control signal output and the control function is invalid | The platform cannot work and may crash when wading or when the battery temperature is too high | I | Adopt a shielded cable to connect the microprocessor and communication module, or adopt other shielding treatment to reduce interference |
| | | | Discontinuous output signal | (1) Discontinuous signal received from the microprocessor (2) Communication module fails | Full cycle | Unstable output signal | Incomplete control instructions from the control box | The platform needs to be derated | III | (1) Improve the quality of the cable connecting the microprocessor PCB to the wireless module (2) Improve the communication module |
| | | | Loss of a partial signal | Transmission channel is interfered with | Full cycle | Incomplete output signal | Incomplete control instructions from the control box | The platform needs to be derated | III | Adopt the multi-channel transmission plan, in which the channels can be switched in case of interference |
| | | | Distorted output signal | Transmission channel is interfered with | Full cycle | Output signal error or packet loss | The control instructions from the control box may be wrong | The platform cannot complete the scheduled task or may even crash | I | Adopt the multi-channel transmission plan, in which the channels can be switched in case of interference |
| | | | Signal reception delays | Excessive data transfer rate, causing too much cached data | Full cycle | Output signal delays | Control instructions from the control box may be delayed | The scheduled task is delayed | III | Increase the wireless transmission rate or reduce the transmission distance |
| | | | Noisy output signal | Transmission channel is interfered with | Full cycle | The output signal is interfered with | Instructions from the control box may be biased | The platform may not be able to complete the scheduled task | II | Adopt the multi-channel transmission plan, in which the channels can be switched in case of interference |
| 10 | GPS module (P-004-008) | Receive and transmit the position signal of the platform | Unable to receive GPS signal | (1) Platform is in the signal shielding area (2) GPS antenna disconnected | Full cycle | GPS positioning function fails | The control box cannot locate the platform position, and therefore cannot control the platform | The platform may not be able to complete the scheduled task | II | (1) Adopt an IMU to provide inertial navigation to compensate for the loss of the GPS signal and keep positioning available (2) Improve the connection between the GPS antenna cable and the terminals |
| | | | Intermittent output position signal | Unstable GPS antenna connection or poor satellite signal quality | Full cycle | GPS positioning function is derated | Intermittent signals can cause inaccurate control instructions | The platform needs to be derated to complete the scheduled task | III | Improve the tightening of the GPS antenna connection cable and connection terminals |
| | | | Incomplete position coordinates | Packet loss during signal transmission | Full cycle | GPS positioning function fails | The control box cannot locate the platform position, and therefore cannot control the platform | The platform may not be able to complete the scheduled task | II | Control the transmission rate; adopt a shielded cable to connect the microprocessor and communication module, or adopt other shielding treatment to reduce interference |
| | | | Inaccurate position coordinates with an offset larger than 10 m | (1) Few effective satellites to receive GPS signals (2) The surrounding obstacles cause a large error in the analysis of GPS signals (3) GPS signals are interfered with, or pseudo GPS signals are received | Full cycle | GPS positioning function is derated | Intermittent signals can cause inaccurate control instructions | The platform needs to be derated to complete the scheduled task | III | (1) Adopt a more efficient GPS receiver antenna (2) Adopt the differential positioning technique to achieve high-precision positions (3) No effective improvement measures |
| | | | Position signal delays | The satellite data parsing speed of the GPS receiving module or the effective data output rate is too slow | Full cycle | GPS positioning function is derated | Intermittent signals can cause inaccurate control instructions | The platform needs to be derated to complete the scheduled task | III | Increase the data resolution and output rate of the GPS receiving module |
| 11 | Core PCB (P-004-009) | Transfer and distribute signals | Unable to transmit signal | The signal cannot be transmitted normally due to damage of the PCB or terminal | Full cycle | The signal cannot be transferred and distributed | The control box cannot control the pushing direction and speed of the push rod | The platform may not be able to complete the scheduled task due to the failure of the power box | II | (1) Tighten the connection terminal (2) Enhance the anti-vibration ability of PCBs |
| | | | Discontinuous output of the transfer signal | The signal cannot be transmitted normally due to the loose connection of the terminal | Full cycle | Intermittent output signal, so the transfer and distribution function is derated | The control box can only intermittently control the pushing direction and speed of the push rod | The function of the power box is derated, so the climbing and obstacle crossing tasks are likely to fail | II | Tighten the connection terminal |
| | | | Incomplete output of the transfer signal | The signal cannot be transmitted completely due to the loose connection of the terminal | Full cycle | Incomplete output signal, so the transfer and distribution function is derated | The control box can only locally control the pushing direction and speed of the push rod | The function of the power box is derated, so the climbing and obstacle crossing tasks are likely to fail | II | Tighten the connection terminal |
| | | | Distorted output of the transfer signal | Electromagnetic interference on the connecting wire or PCB | Full cycle | Wrong output signal | The control box controls the pushing direction and speed of the push rod with the wrong instructions | The platform may not be able to complete the scheduled task | II | Adopt the shielding wire to optimize the anti-electromagnetic interference ability of the PCB |
| | | | Noisy transfer signal | Electromagnetic interference on the connecting wire or PCB | Full cycle | Output signal with interference | Instructions from the control box may be biased | The platform needs to be derated to complete the scheduled task | III | Adopt the shielding wire to optimize the anti-electromagnetic interference ability of the PCB |
| 12 | Interface module (P-004-010) | Transmit the signal to drive the main motor to rotate | Unable to transmit signal | The signal from the microprocessor cannot reach the motor driver due to cable failures | Full cycle | Wrong speed/direction output signal | No instruction is issued from the control box | The platform does not work because the motor does not rotate | II | Adopt the shielding wire |
| | | | Distorted output signal | Electromagnetic interference on the connecting wire or PCB | Full cycle | Wrong speed/direction output signal | The instructions in the control box may be wrong | The platform may not be able to complete the scheduled task | II | Adopt the shielding wire to optimize the anti-electromagnetic interference ability of the PCB |
| | | | Noisy signal | Electromagnetic interference on the connecting wire or PCB | Full cycle | Speed/direction output signal with interference | Instructions from the control box may be biased | The platform needs to be derated to complete the scheduled task | III | Adopt the shielding wire to optimize the anti-electromagnetic interference ability of the PCB |

Table 8.4 Definition of the severity category of the platform

| No. | Severity category | Description |
| 1 | I (catastrophic) | This leads to severe damage to the platform |
| 2 | II (fatal) | This leads to major economic loss, mission failure, serious damage to the platform or serious environmental damage |
| 3 | III (medium) | This leads to moderate economic loss, task delay or degradation, and moderate environmental damage |
| 4 | IV (light) | This is not likely to lead to economic loss, platform damage, or environmental damage, but leads to unplanned maintenance |

[Figure: failure transmission tree, with each failure mode annotated with severity category II or III. Nodes and their failure modes: platform (P): unable to complete the scheduled task; completes the mission in downgrade. Camera (P-010): the image signal has no output. Control box (P-004): unable to power the device. Remote control box (P-001): the detection signal has no output; the control signal has no output; the detection signal output is unstable; the control signal output is unstable. Isolation voltage regulator (P-004-006): no output of 5 V voltage; the 5 V voltage output is unstable or has parasitic ripples. Back-up power supply (P-004-004): unconverted, the output voltage is 24 V; the output voltage is unstable or has parasitic ripples; the output voltage is higher than 12.6 V; the output voltage is less than 12.6 V. Back-up power supply (P-004-005): unconverted, output voltage equals input voltage; the output voltage is unstable or has parasitic ripples; the output voltage is higher than 12 V; the output voltage is less than 12 V. Signal transmitter (P-004-015): the signals directed to the remote control box (speed, temperature, water, etc.) have no output; the output of these signals is unstable.]
Fig. 8.13 The failure influence transmission chain from the backup power supply

capacity, an increase or decrease of the reference voltage of the DC–DC step-down charging circuit, charging IC damage, etc. Taking all of these causes into account, two improvement measures are implemented in the design of the backup power supply: (1) Adopt a backup power supply equipped with 24 V overvoltage or undervoltage protection to prevent failure transmission. (2) Use electrolytic capacitors to perform filtering. During design, to ensure that the improvement measures are implemented effectively, the failure mitigation process must be controlled using the closed-loop failure mitigation process control model. Figure 8.14 shows this process for the backup power supply (P-004-004).

In addition to the backup power supply (P-004-004), design improvement measures for the other key components in the control box, such as the fuse (P-004-003) and interface module (P-004-010), are developed from the corresponding failure analysis results. The fuse (P-004-003) is replaced by an air switch selected according to the current safety threshold of the onboard equipment, and the interface module (P-004-010) is replaced by an optocoupler module. The working principle model of the improved control box is shown in Fig. 8.15.
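Selecting the key failures that deserve design improvement amounts to filtering the Table 8.3 records by the severity category S of Table 8.4. The following Python sketch illustrates this under stated assumptions: the names `Severity`, `FailureRecord`, and `key_failures` are hypothetical, the cut-off at category II is chosen only for illustration, and the four sample records are taken from Table 8.3.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Severity categories of Table 8.4; a lower value is more severe."""
    I = 1    # catastrophic
    II = 2   # fatal
    III = 3  # medium
    IV = 4   # light


@dataclass(frozen=True)
class FailureRecord:
    product: str
    failure_mode: str
    severity: Severity


def key_failures(records, threshold=Severity.II):
    """Return the records at least as severe as `threshold`, most severe first.

    The default threshold is an assumption made for this sketch.
    """
    selected = [r for r in records if r.severity <= threshold]
    return sorted(selected, key=lambda r: r.severity)


# Sample rows taken from Table 8.3.
records = [
    FailureRecord("P-004-004", "No voltage conversion, output remains 24 V", Severity.II),
    FailureRecord("P-004-004", "Unstable output voltage or parasitic ripples", Severity.III),
    FailureRecord("P-004-007", "No output signal", Severity.I),
    FailureRecord("P-004-008", "Position signal delays", Severity.III),
]

for record in key_failures(records):
    print(record.severity.name, record.product, record.failure_mode)
```

Using an `IntEnum` keeps the Roman-numeral names from the table while still allowing ordinary comparisons and sorting.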

[Figure: flowchart from Start to End. The starting point is the four failure modes of the backup power supply: (1) the output voltage remains 24 V without conversion; (2) the output voltage is unstable or has parasitic ripples; (3) the output voltage is higher than 12.6 V; (4) the output voltage is lower than 12.6 V. The decision nodes are:
1. Does it affect safety?
2. Conduct a risk assessment: is the risk index acceptable?
3. Does it affect the completion of the task?
4. Is the probability (grade) of the failure occurrence within the acceptable range?
5. The power supply must be improved: have improvement measures been implemented?
6. Is it necessary to mitigate the failure?
7. Can the power supply be improved?
8. Have improvement measures been implemented?
9. Has the failure been eliminated?
10. Is the power supply testable?
11. Has the test plan been implemented?
12. Is it necessary to use compensation measures?
13. Have the compensation measures been implemented?
The status outcomes are "Do not mitigate", "Unable to mitigate" (not technically feasible or not economically possible, with a description of the detailed reason), "Improving" (further improvement measures are needed), "Mitigated" (the consequences of the failure have been reduced to an acceptable range), and "Eliminated". For the "Mitigated" and "Eliminated" outcomes the implementing departments, implementation results, and test and process inspection instructions are recorded, and for "Mitigated" also the probability and consequences after mitigation. Implemented compensation measures require the preventive maintenance work type, maintenance level, work timing, work interval, and work description; the corrective maintenance work type, maintenance level, and work description; and the support resources allocation information. Related analyses named in the chart are testability modeling and analysis, and supportability analysis such as RCMA, corrective maintenance work analysis, O&MTA, and LORA. Entries recorded for this case: implementing department: Electrical Department; implementation results: the structure of the power supply has been improved; test and process inspection instructions: not yet carried out; probability and consequences after mitigation: IV; test method: BIT; test plan: add the BIT module in the output port.]
Fig. 8.14 Closed-loop mitigation process of the backup power supply (P-004-004) failure

[Block diagram of the control box. Blocks recovered from the figure: main switch (24 V); air switch (24 V, overload protection); back-up power supply; voltage converters (12.6 V and 12 V outputs); isolated voltage regulator P-004-006 (isolation and voltage stabilization); core PCB P-004-009 (transmits and distributes the various types of control signals, including speed, direction, temperature and immersion); push rod drive circuits 1 to 4, P-004-011/1 to P-004-011/4 (transmit speed and direction signals); signal receiver (control signal from the remote control box); signal transmitter (detection signal to the remote control box; feedback signals such as motion speed, temperature and immersion); GPS module; optocoupler module; power supply for camera and image transmission. Other item codes shown: P-004-001, P-004-002, P-004-004, P-004-005, P-004-007, P-004-008, P-004-010, P-004-015.]

Fig. 8.15 The working principle (after improvement) of the control box

8.4 Determination and Control of Component Failures

8.4.1 Description of the Device

The strain tester is a test device that uses strain sensors to measure the surface strain of a component. Because it is easy to use, adapts well to its environment and can measure strains under complex conditions, it is widely used in the inspection tests of highways and bridges, and in strain tests of large-scale engineering structures [94]. The strain tester consists of several functional parts, including the power supply, measuring bridge, signal amplification, low-pass filter, A/D conversion and signal output [95]. It amplifies and conditions the weak electrical signals of the strain measurements to provide a usable voltage signal for the external signal analysis terminal.

In this case, the dual-channel static strain tester shown in Fig. 8.16a is selected as the research object. The strain measurement circuit board shown in Fig. 8.16b is used inside the tester. It is fixed by bolts at its four corners and provides a total of 6 dual-channel resistance strain measurement signals under moderate temperature and humidity conditions. The highest and lowest temperatures around the strain measurement circuit board are 50 °C and 10 °C respectively, and the relative humidity is between 20% and 80%.

The dual-channel strain tester mainly includes 4 modules, as listed in Table 8.5. Among these modules, the strain measurement circuit board can be further divided by function into 7 submodules: signal input, first-level signal amplification, second-level signal amplification, power supply voltage stabilization, low-pass filter, common mode suppression, and voltage output.


Fig. 8.16 Dual channel strain tester (a) and its strain measurement circuit board (b)

Table 8.5 Main modules in the dual-channel strain tester

Module                                 Count
Housing (including switch)             1
Base shell (including interface)       1
Strain measurement circuit board       1
Interface circuit board (no device)    1

8.4.2 Digital Prototype Modeling

To build the digital prototype model, the relevant design information (including the type, package, weight, size, etc.) of the case shell, the PCB and all 147 electronic components must be collected. Then, with reference to the design documents, a CAD model of the dual-channel strain tester is established with appropriate simplification, as shown in Fig. 8.17.

Fig. 8.17 CAD model of the dual-channel strain tester: (a) external view of the case shell; (b) internal perspective view


8.4.3 Load Response Analysis

The CAD model established in the previous subsection is converted to a CFD model according to the thermal design information of the dual-channel strain tester. Then, by means of CFD calculation, the temperature distributions over the strain measurement circuit board and all of its electronic components are obtained, as shown in Fig. 8.18. When the dual-channel strain tester is placed in an environment with an ambient temperature of 50 °C, the highest surface temperature of the case shell is 73.3 °C and the highest temperature on the strain measurement circuit board is 117 °C. This maximum occurs on the electronic component LM340, which has the highest power consumption.

Next, the CAD model is converted to an FEA model according to the antivibration design information of the dual-channel strain tester, as shown in Fig. 8.19. After the relevant material parameters of the components, circuit boards, case shell, etc. are set, the first six eigenfrequencies of the dual-channel strain tester are obtained from the simulation, as listed in Table 8.6. The corresponding vibration modes are shown in Fig. 8.20. These simulation results basically meet the vibration resistance design requirements of the dual-channel strain tester.

Fig. 8.18 CFD model (a) and thermal analysis results (b) of the dual-channel strain tester

Fig. 8.19 FEA model of the dual-channel strain tester

Table 8.6 The first six eigenfrequencies and local modal positions of the dual-channel strain tester

Order        Eigenfrequency (Hz)    Local modal position
1st order    59.1                   Strain measurement circuit board
2nd order    160.9                  Strain measurement circuit board
3rd order    225.7                  Strain measurement circuit board
4th order    244.2                  Strain measurement circuit board
5th order    335.8                  Strain measurement circuit board
6th order    413.5                  Strain measurement circuit board

Fig. 8.20 Simulation results of the dual-channel strain tester (the first six orders): (a) 1st order mode; (b) 2nd order mode; (c) 3rd order mode; (d) 4th order mode; (e) 5th order mode; (f) 6th order mode

Fig. 8.21 Failure prediction model of the strain measurement circuit board

8.4.4 Failure Prediction Model

Based on the relevant design information (such as component information), the failure prediction (analysis) model of the strain measurement circuit board is established, as shown in Fig. 8.21. Using physics of failure (PoF) analysis software, the potential failures of the strain measurement circuit board can be predicted. The failure prediction results are shown in Fig. 8.22, in which the potential failures are located on the integrated circuits LM79L05A1 and LM79L05A2. The details of the failure prediction results are shown in Fig. 8.23, and the information matrix of the main failures is listed in Table 8.7.

8.4.5 Reliability Evaluation

Next, the predicted failure time and the relevant simulation data of each potential failure (one for each failure mechanism) are exported using MATLAB scripts. From these data, the overall reliability evaluation of the strain measurement circuit board is obtained, as shown in Table 8.8.


Fig. 8.22 Predicted potential failures on the strain measurement circuit board

Fig. 8.23 Failure prediction results of the strain measurement circuit board

Table 8.7 The information matrix of the main failures of the strain measurement circuit board

                                                                    Predicted time to failure
Failure site    Failure mode             Major failure mechanism    Mean       Minimum    Maximum
LM79L05A1       Solder joint cracking    Thermal fatigue            1391.92    750.45     1968.20
LM79L05A2       Solder joint cracking    Thermal fatigue            1557.24    1068.67    2055.02
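The step from per-site predictions to a board-level result can be illustrated with a small Monte Carlo sketch. It is not the MATLAB procedure used in the source, which is not described. It assumes that the board fails when its earliest failure site fails, and that each site's time to failure follows a triangular distribution whose minimum, maximum and mean match Table 8.7. Both assumptions, and all function names, are illustrative only.

```python
import random

# Predicted time to failure per failure site, from Table 8.7: (mean, minimum, maximum).
# The table states no unit, so none is assumed here.
SITES = {
    "LM79L05A1": (1391.92, 750.45, 1968.20),
    "LM79L05A2": (1557.24, 1068.67, 2055.02),
}


def triangular_mode(mean, low, high):
    """Mode of the triangular distribution whose mean equals the tabulated mean."""
    mode = 3.0 * mean - low - high
    if not low <= mode <= high:
        raise ValueError("tabulated mean is not reachable with a triangular shape")
    return mode


def sample_first_failure(rng):
    """One Monte Carlo draw: the board fails when its earliest site fails."""
    draws = []
    for mean, low, high in SITES.values():
        mode = triangular_mode(mean, low, high)
        draws.append(rng.triangular(low, high, mode))
    return min(draws)


def simulate(n=20000, seed=1):
    """Return n sorted samples of the board's time to first failure."""
    rng = random.Random(seed)
    return sorted(sample_first_failure(rng) for _ in range(n))


samples = simulate()
mean_first_failure = sum(samples) / len(samples)
print(f"mean time to first failure: {mean_first_failure:.1f}")
print(f"5th percentile: {samples[int(0.05 * len(samples))]:.1f}")
```

Samples of this kind are what a distribution fit, such as the Weibull fit reported in Table 8.8, would take as input. The triangular shape is only a stand-in for the sample data that the PoF software actually produces.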


Table 8.8 Reliability evaluation of the strain measurement circuit board

                                                            Distribution parameters
Module                              Distribution type       Shape parameter    Scale parameter    Location parameter    Mean time to first failure (h)
Strain measurement circuit board    Weibull distribution    4.4817             10,908.9           11,874.6              31,657.5
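To show how the fitted parameters can be used, the sketch below evaluates the reliability function of a three-parameter Weibull distribution with the values from Table 8.8. It assumes the standard parameterization R(t) = exp(-((t - γ)/η)^β) with shape β, scale η and location γ. The source does not state which parameterization its tool uses, and the function names are hypothetical. The tabulated mean time to first failure comes from the source's simulation data and is not recomputed here.

```python
import math

# Parameters from Table 8.8 (three-parameter Weibull, time in hours).
SHAPE = 4.4817       # beta
SCALE = 10908.9      # eta
LOCATION = 11874.6   # gamma, the failure-free period under this parameterization


def reliability(t, shape=SHAPE, scale=SCALE, location=LOCATION):
    """R(t) = exp(-((t - gamma) / eta) ** beta) for t > gamma, otherwise 1."""
    if t <= location:
        return 1.0
    return math.exp(-(((t - location) / scale) ** shape))


def time_at_reliability(r, shape=SHAPE, scale=SCALE, location=LOCATION):
    """Inverse of R(t): the time at which reliability has dropped to r (0 < r < 1)."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return location + scale * (-math.log(r)) ** (1.0 / shape)


for hours in (10000, 15000, 20000, 25000):
    print(f"R({hours} h) = {reliability(hours):.4f}")
print(f"time at R = 0.90: {time_at_reliability(0.90):.0f} h")
```

Under this parameterization the board is treated as failure-free up to the location parameter, after which reliability decays according to the shape and scale parameters.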

According to the reliability analysis presented above, the reliability weaknesses of the strain measurement circuit board come from two sources. First, the thermal analysis shows that the electronic component LM340 has a local high temperature close to the upper limit of its rated operating temperature. Second, the failure prediction shows that two electronic components (LM79L05A1 and LM79L05A2) are likely to fail through solder joint cracking caused by thermal fatigue. Therefore, design improvement measures should be developed to improve the inherent reliability of the strain measurement circuit board and to prevent these potential reliability problems from occurring in practical application.

8.4.6 Optimization Design and Failure Control

As shown in Fig. 8.18, LM340 generates more heat than the other electronic components, which produces a local high temperature that is likely to cause thermal fatigue failure in the solder joint. Furthermore, the SMD components LM79L05A1 and LM79L05A2 are located close to the high temperature area, and are therefore also prone to solder joint cracking due to thermal fatigue. Design optimization and failure control measures should be developed to overcome these problems. Three possible improvement measures are as follows:

(1) Choose electronic components with a better heat dissipation capacity, such as components in metal packages.
(2) Improve the layout of the electronic components.
(3) Introduce additional forced cooling measures, such as forced air or liquid cooling.

For example, to achieve better heat dissipation and improve reliability in this case, a forced air-cooling setup can be fitted to the strain measurement circuit board. If cost or weight requirements rule this out, the layout of the electronic components can be modified instead to avoid the formation of a high temperature area. In addition, the electronic components that suffer from high temperatures can be placed close to the cold plate to increase the heat transfer efficiency and reduce the likelihood of thermal fatigue failure caused by high temperature.
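The effect of these measures on solder joint life can be illustrated with a Coffin-Manson type relation, a commonly used simplified model of thermal fatigue. This is an assumption made for illustration: the source does not state which model its PoF software applies. Here N_f is the number of thermal cycles to failure, ΔT is the temperature swing seen by the solder joint, and C and n are material-dependent constants.

```latex
N_f = C\,(\Delta T)^{-n},
\qquad
\frac{N_{f,2}}{N_{f,1}} = \left(\frac{\Delta T_1}{\Delta T_2}\right)^{n}
```

With an assumed exponent n = 2, for instance, reducing the temperature swing by 20% (ΔT_2 = 0.8 ΔT_1) gives N_{f,2}/N_{f,1} = 1/0.64 ≈ 1.56, that is, roughly a 56% longer fatigue life under this simplified model.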

Chapter 9

MBRSE Future Outlook

Abstract In this chapter, the technical development trend of Model-Based Reliability System Engineering (MBRSE) is outlined at three levels: multi-equipment system-level, system-level and product-level MBRSE technologies. In addition, the research status of digital twin technology for reliability is discussed. Finally, the connotation and development trend of the Reliability System Engineering (RSE) digital twin are introduced.

Keywords Model-based reliability system engineering · Development trend · Digital twin · Reliability system engineering

© National Defense Industry Press 2024 Y. Ren et al., Model-Based Reliability Systems Engineering, https://doi.org/10.1007/978-981-99-0275-0_9

9.1 Technical Development Trend of MBRSE

An analysis of the development status of international MBRSE-related technologies shows that research interests are moving toward both the micro and the macro scale. At the micro scale, more and more studies address failure mechanisms. At the macro scale, the reliability, maintainability and testability of multi-equipment systems and even larger systems have become research hotspots. In terms of technical methods, new technologies such as artificial intelligence and big data are used more and more widely in reliability, maintainability and supportability, while the test technologies for reliability and maintainability are gradually merging toward integration.

From the perspective of product scale, the research object is gradually expanding from equipment to multi-equipment systems. From the perspective of methodology, the fundamental theory is gradually extending from probability statistics to failure mechanisms at the micro scale, which enables the integrated design of reliability, maintainability and testability at the mechanism level. From the perspective of technical methods, the integrated design methods of RMS are developing from document-centric approaches and data sharing toward global modeling, digitization and intelligence. In recent years in particular, the concept of the RMS digital twin has been proposed to realize personalized real-time monitoring of equipment characteristics. From the perspective of tools and measures, with the development of equipment digital engineering technology, modeling, analysis and simulation capabilities are becoming increasingly powerful, accurate and efficient [81].

In general, the development trends in these key directions can be summarized as follows:

(1) MBRSE technology for the multi-equipment system. Looking toward technological development over the next decade, the US Department of Defense has established the engineered resilient systems (ERS) project as a scientific and technological priority. The goal of this project is to design and manufacture flexible multi-equipment systems that are able to repel, resist and absorb damage. Resilience provides a new parameter for comprehensively measuring the RMS characteristics of multi-equipment systems. It is distinct from, but related to, parameters such as combat applicability and combat effectiveness. In terms of technical means, the US military proposes to establish multi-equipment system digital twins in the future. These digital twins are built from the digital twins of single equipment together with those of different types of support facilities. A digital twin cluster can then be established for the operation and maintenance of the multi-equipment system. The digital twin cluster can grasp the overall status of the multi-equipment system in real time, to support high-efficiency use and intensive support.

(2) System-level MBRSE technology. For new equipment with light weight, high load, extreme environment and long-term support requirements, the traditional verification methods based on statistical distributions and physical tests are no longer applicable. The US Air Force Research Laboratory and NASA jointly proposed the digital thread/digital twin plan, which has been tested in the development of F-15 fighter jets and MEMS devices, respectively. In 2013, the System Engineering Office of the US Department of Defense issued a series of description principles for the digital system model (DSM). It also stated clearly that enterprises need to deliver the corresponding DSM or digital twin models together with their products, so that system faults and readiness during use and maintenance can be clearly understood and the service and maintenance requirements determined. Although a digital twin design platform has not yet been launched on the market, one will inevitably be developed and put to use. In addition, with continuous breakthroughs in equipment design technology based on big data and artificial intelligence, the global digital and intelligent design of RMS will become a trend. In the future, the automatic accumulation of hexability data and design experience will be implemented on the basis of big data and intelligent rules, and will replace manual and repetitive hexability work. However, innovative hexability design will still be led by designers [96].

(3) MBRSE technology at the equipment level. This technology emphasizes the deep integration of reliability, maintainability and testability at the physical level. Currently, the design models of reliability, maintainability and testability are obtained separately, from independent modelling, analysis and evaluation processes in each field. The related analyses follow a waterfall mode and lack physics-based data. As a result, the design and development cycle may be lengthened and the likelihood of rework increased. In the future, multi-discipline simulations covering multiple physical domains (i.e. force, heat, electricity, structure, signal processing, control, etc.) will be performed through efficient workflow management, multi-discipline system models and tools, and public and joint databases. Logical analysis models, physical models, three-dimensional drawings, digital data, metadata and a single source of truth can then be developed for the coupled analysis of reliability, maintainability and testability. Next, through model-based integrated design and analysis, the system engineering process can be understood continuously and in depth, and combined with the single source of truth to achieve multidisciplinary optimization. In recent years, the development of the digital twin and related technologies has provided a new direction and key enabling technologies for the integrated design of both the functional performance and the hexability of equipment-level products. For example, by combining the physical system with its equivalent virtual system, NASA has studied digital twin based fault prediction and mitigation methods for complex equipment, and has carried out the corresponding verification work on certain equipment-level products. The development of digital twin technology will provide enhanced field data support for the integrated design of functional performance and hexability, and will open new ways toward the refinement, enhancement and intelligent development of integration technology for equipment-level products [79].

9.2 Digital Twin Technology for Reliability

9.2.1 State-of-Art of Digital Twins

In the mid to late twentieth century, technologies such as computers, simulation software, the Internet and wireless networks made it possible to visualize physical entities in parallel virtual spaces. These technologies also made it possible to organize resources and equipment efficiently to achieve remote collaboration. Nowadays, with the development of the new generation of information technology (such as cloud computing, the Internet of Things, big data and artificial intelligence), the role of virtual space has become more and more important, and the interaction between physical space and virtual space has become unprecedentedly active. The seamless integration of these two spaces has therefore become an inevitable trend that creates new development potential, improves the state of the design, manufacturing and service industries, and promotes technological progress.

The digital twin was first proposed in 2003 by Professor Grieves in the University of Michigan Executive Course on Product Lifecycle Management (PLM) [97]. In that course, the concept of “a virtual, digital expression equivalent to a physical product” was put forward, defined as a digital representation of one or a group of specific products that can abstractly express the real product and can be tested under real or simulated conditions. This concept stems from the desire to express the information and data of the product more clearly and to collect all the information together for higher-level analysis. Although it was not called a “digital twin” at the time (it was called the “mirrored space model” from 2003 to 2005, and the “information mirror image model” from 2006 to 2010), the conceptual model has all the constituent elements of a digital twin, including the physical space, the virtual space and the connection or interface between them. It can therefore be considered the rudimentary form of the digital twin. Later, in his book “Virtually perfect: Driving innovative and lean products through product lifecycle management”, published in 2011 [98], Professor Michael Grieves used the term “digital twin”, proposed by his co-author John Vickers, to describe this model. The term is still in use today.

Afterward, the US Department of Defense introduced the digital twin into the maintenance of spacecraft, and defined it as a simulation process that integrates multiple physical parameters, multiple scales and multiple probabilities. Starting from the physical model of the aircraft, the digital twin constructs a complete virtual reality, using historical data and real-time data updated from sensors to describe and reflect the full life cycle of the aircraft. A new type of spacecraft and the corresponding digital model (i.e. digital twin) are planned to be delivered in 2025. In particular, the digital twin exhibits super-realistic features in the following two respects: ➀ it has all the geometric data, even including processing errors; ➁ it has all the material data, even including microstructural information of the materials.

In 2012, the US Air Force Research Laboratory put forward the concept of the “airframe digital twin” [99]. The airframe digital twin is an integrated model composed of many sub-models, as shown in Fig. 9.1. It is regarded as a hyper-realistic model of the airframe that is being manufactured and maintained, and can be used to simulate and determine whether the airframe meets the mission conditions.

Fig. 9.1 Conceptual diagram of the airframe digital twin

In order to meet the requirements of future aircraft (such as light weight, high load, and longer service time in more extreme environments), NASA and the US Air Force jointly proposed a digital twin paradigm for the R&D of future vehicles. For use in aircraft, flight systems and launch vehicles, they defined the digital twin as an integrated multi-physics, multiscale, probabilistic simulation model, which uses the best currently available physical models, real-time updated sensor data and historical data to reflect the status of the corresponding physical entity. At the same time, NASA published the “Modelling, simulation, information technology and processing roadmap”, in which the digital twin was officially introduced to the public. The proposal of the digital twin can be considered a phase summary of the previous research results of the US Air Force and NASA, and its definition strongly underlines the integrated, multi-physics, multi-scale and probabilistic features of the digital twin.

In 2014, a digital twin white paper was released, which made the three-dimensional architecture of the digital twin, i.e. “physical space, digital space and interconnection”, well known and widely accepted. Moreover, the digital twin has been introduced into industrial fields other than aerospace, such as automobiles, oil and gas, and healthcare. Lockheed Martin listed digital twin technology as the first of the top six future technologies in the defence industry. The IT research and consulting company Gartner also listed it among its top ten strategic technologies for two consecutive years (2017–2018). The timeline of the concept and technology development of the digital twin is shown in Fig. 9.2.

Fig. 9.2 Time frame for the development of the concept and technology of the digital twin

Nowadays, the digital twin has been successfully applied in several domains. For example, by combining the physical system with an equivalent virtual system, Grieves et al. developed a digital twin based method for the fault prediction and mitigation of complex systems, which has been used and verified by NASA. In addition, PTC has been working on establishing a real-time, digital twin based connection between the virtual world and the real world, to provide customers with efficient after-sales service and support. Siemens uses the digital twin to help manufacturing companies build a production system model of the entire manufacturing process in the information space, through a comprehensive digitization of the entire physical space from product design to manufacturing execution.

A large number of studies and discussions have shown that the digital twin is an effective tool for implementing the fusion of the physical and information worlds. Such a fusion is capable of overcoming the bottlenecks that exist in Industry 4.0, Made in China 2025, the Internet of Things and CPS-based manufacturing. Digital twin technology has the following characteristics: ➀ it integrates different kinds of physical data to provide a comprehensive mapping of the physical entity; ➁ it exists and evolves throughout the full life cycle of the physical entity, continuously accumulating relevant knowledge and data; ➂ it can not only describe physical phenomena but also optimize the physical entities on the basis of a series of models.

Tuegel et al. [99] proposed a conceptual model to describe the use of the digital twin in aircraft life prediction and structural integrity assurance. By comparing the traditional life prediction method with the digital twin life prediction method (as shown in Fig. 9.3), they pointed out that digital twin technology can provide better management of the aircraft during its service cycle. Engineers can obtain more real-time information about the status of the aircraft, and thus make more timely and efficient maintenance and support decisions. Seshadri et al. [100] used the digital twin to carry out structural health management on damaged aircraft structures, and proposed a damage characterization method that uses multi-sensor systems to estimate the wave propagation response. The method was shown to predict the location, size and direction of the damage effectively. Drawing on the concept of the digital twin, Li et al. [101] used the Bayes method to establish a multifunctional probabilistic model for fault prediction and diagnosis, and demonstrated the effectiveness of the proposed method through an example of predicting the propagation of fatigue cracks on an aircraft wing.

[Flowchart blocks recovered from Fig. 9.3: select the task trajectory; flight loads database; design essentials; structural finite element model; internal loads database; aircraft monitoring database; spectrum evolution; carry out the mission; fatigue life prediction model; development of pressure transfer function; flight loads and environment digital twin; structural digital twin; stress, temperature and vibration prediction; damaged driver; damaged condition; update/improve the digital twin; damage and life prediction; structural system reliability assessment; carry out the mission; healthy space monitoring.]

Fig. 9.3 Flowcharts of the traditional life prediction method (a) versus the digital twin life prediction method (b) [99]

[Diagram blocks recovered from Fig. 9.4: Physical Entity (PE), Virtual Entity (VE), Services (Ss) and Digital Twin Data (DD), linked by the connections CN_PV, CN_PD, CN_VD and CN_SD, with the label “iterative optimization”.]

Fig. 9.4 The five-dimensional conceptual model of the digital twin [102]

The five-dimensional conceptual model of the digital twin shown in Fig. 9.4 was first proposed in China, and is expressed as follows [102]:

$$M_{DT} = (PE, VE, Ss, DD, CN)$$

where PE represents the physical entity, VE represents the virtual entity, Ss represents the services, DD represents the twin data, and CN represents the connections between the different dimensions.

Another research team in China has proposed a digital twin-driven prognostic and health management (DT-PHM) method for complex equipment, together with the technical framework and workflow of the method (as shown in Fig. 9.5) [102]. They used a gear box as an example to show that the DT method can improve the accuracy of fault diagnosis. They also pointed out that digital twin technology plays an important role only for high-value, major equipment. Establishing a digital twin requires sufficient data, and therefore brings high operational cost and complexity. The main challenges in the application of the digital twin are building a high-fidelity data model, processing a large amount of digital twin data, and balancing the costs and benefits of using the digital twin.
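As a rough illustration of how the five dimensions relate to one another, the tuple M_DT = (PE, VE, Ss, DD, CN) can be sketched as a plain data structure. The class, field and function names below are hypothetical, the pairing of each connection label with two dimensions is read off the labels in Fig. 9.4, and the gear box model is a made-up placeholder rather than anything from [102].

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class DigitalTwin:
    """M_DT = (PE, VE, Ss, DD, CN) as a plain container (illustrative only)."""

    physical_entity: Dict[str, Any]                               # PE: measured state
    virtual_entity: Callable[[Dict[str, Any]], Dict[str, Any]]    # VE: model of PE
    services: Dict[str, Callable[..., Any]] = field(default_factory=dict)  # Ss
    twin_data: Dict[str, Any] = field(default_factory=dict)                # DD
    # CN: connection labels as they appear in Fig. 9.4 (pairing is an assumption).
    connections: Dict[str, tuple] = field(default_factory=lambda: {
        "CN_PV": ("PE", "VE"),
        "CN_PD": ("PE", "DD"),
        "CN_VD": ("VE", "DD"),
        "CN_SD": ("Ss", "DD"),
    })

    def synchronize(self):
        """Run the PE state through VE and store both in DD."""
        predicted = self.virtual_entity(self.physical_entity)
        self.twin_data["observed"] = dict(self.physical_entity)
        self.twin_data["predicted"] = predicted
        return predicted


def gearbox_model(state):
    """Hypothetical virtual entity: expected temperature as a function of speed."""
    return {"temperature_c": 40.0 + 0.01 * state["speed_rpm"]}


twin = DigitalTwin(
    physical_entity={"speed_rpm": 1500, "temperature_c": 57.0},
    virtual_entity=gearbox_model,
)
# A service reads only from the twin data, which is what CN_SD stands for here.
twin.services["residual"] = lambda dd: (
    dd["observed"]["temperature_c"] - dd["predicted"]["temperature_c"]
)
twin.synchronize()
residual = twin.services["residual"](twin.twin_data)
print(f"temperature residual: {residual:.1f} C")
```

Keeping the twin data (DD) as the only place where services read from is one possible design choice. It mirrors the central position of DD in Fig. 9.4, but the source does not prescribe an implementation.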

[Workflow blocks recovered from Fig. 9.5 (workflow of digital twin driving PHM): step 1, digital twin modeling and calibration; step 2, model simulation and interaction; step 3, consistency judgment; step 4, degradation detection; step 5, analysis of the cause of inconsistency (model defects, interference); step 6, identify the cause of failure and predict the failure; step 7, maintenance strategy decision. PHM services (service call and service execution): degradation detection service, consistency judgment service, failure cause and prediction service, calibration service. Other labels: the equipment in remote areas; observation; Pe and Pa; consistency; inconsistency; degradation; no degradation; data.]

Fig. 9.5 Technical framework and workflow of the DT-PHM method [102]
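The decision logic behind steps 3 to 7 of this workflow can be sketched as follows. This is a loose reading of the step labels in Fig. 9.5, not the algorithm of [102]. The residual measure, the thresholds, the function names and the returned labels are all assumptions made for illustration.

```python
from statistics import mean


def consistent(observed, simulated, tolerance):
    """Step 3: twin and equipment agree if the mean absolute residual is small."""
    residual = mean(abs(o - s) for o, s in zip(observed, simulated))
    return residual <= tolerance


def degraded(observed, limit):
    """Step 4: flag degradation when the latest observation exceeds a limit."""
    return observed[-1] > limit


def phm_cycle(observed, simulated, tolerance, limit):
    """One pass through steps 3 to 7, returned as a text label."""
    if consistent(observed, simulated, tolerance):
        if degraded(observed, limit):
            # Steps 6 and 7: diagnose, predict, then decide on maintenance.
            return "identify cause, predict failure, plan maintenance"
        return "no degradation, keep monitoring"
    # Step 5: the mismatch may stem from a model defect or from interference.
    # This sketch cannot tell them apart and only reports the inconsistency.
    return "inconsistent: analyse cause, recalibrate twin if model defect"


# Hypothetical vibration amplitudes: measured values against the twin's simulation.
print(phm_cycle([0.8, 0.9, 1.3], [0.82, 0.88, 1.28], tolerance=0.05, limit=1.2))
print(phm_cycle([0.8, 0.9, 1.0], [0.82, 0.88, 1.0], tolerance=0.05, limit=1.2))
print(phm_cycle([0.8, 0.9, 1.3], [0.5, 0.5, 0.5], tolerance=0.05, limit=1.2))
```

In a real DT-PHM system each of these checks would be provided by one of the PHM services listed in the figure, and the thresholds would come from calibration (step 1) rather than being fixed constants.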

9.2.2 Reliability System Engineering Digital Twin

9.2.2.1 Concepts and Principles

Researchers in different fields have proposed different definitions of the digital twin. For example, as mentioned above, the digital twin makes full use of data from physical models, sensors, operating history, etc., and integrates multidisciplinary, multiphysical, multiscale and probabilistic simulation processes to complete the mapping from the physical space to the digital space and thereby reflect the full life cycle of the physical equipment. Another definition states that the “digital twin is a real-time integration of common characteristics of the digital model and unique characteristics of the physical entity in full life cycle.” At present, there is no unified definition of the digital twin, but there is agreement on its essential connotation and technical characteristics.


A digital twin is a virtual representation of a physical system. It combines data and models, both of which are continuously updated throughout the full life cycle of the system. The digital twin model is one of the core directions in digital twin research. Digital twin models can be divided into general models and specific models. A general model does not focus on a particular product; instead, it describes the model elements as a set of general objects and investigates the relationships among these objects. It can therefore provide a general method for managing and communicating model elements across different environments. In contrast, a specific model focuses on modelling a particular product using digital twin technology. At present, specific models remain the research hotspot. Researchers study and build digital twins in different professional fields, such as structures, electronic products, flight controllers, and engines. These digital twins can be summarized as “performance digital twins”.

Currently, the reliability analysis and evaluation of equipment is mainly based on the concept of a “statistical average”, and the main parameters and indices of the equipment also exhibit distinct statistical characteristics. In the future, the reliability design and processes of a product will show more “individual characteristics”, and the reliability system engineering digital twin (RSE digital twin) has exactly this feature. Based on the technical characteristics of the digital twin and on equipment reliability theory, the RSE digital twin can be preliminarily defined as follows: the RSE digital twin is a model that provides simultaneous integration and evolution between the solid and digital models in the physical-digital dual space and the corresponding hexability characteristics throughout the life cycle. Such a model can monitor the health status of equipment in real time and predict its hexability characteristics.

As shown in Fig. 9.6, the RSE digital twin interacts closely with models from other spaces. It is therefore necessary to study the data characteristics and model categories used for the RSE digital twin. The data used in the RSE digital twin are typically multi-source and heterogeneous. They consist of continuous data, discrete data, and statistical data, which are collected from physical entities, physical products, and digital models. The RSE digital twin mainly focuses on the faults of equipment, including fault characteristics, fault diagnosis, fault prediction, status monitoring, and fault prevention and repair. It also interacts with the other hexability characteristics.

The characteristics of a product or piece of equipment comprise two aspects: hexability characteristics and performance characteristics, which are characterized by the hexability indices and the functional performance indices, respectively. The equipment digital twin is accordingly divided into two parts: the performance digital twin and the RSE digital twin. The former synchronously characterizes the performance characteristics of the equipment, whereas the latter synchronously monitors the health status and characterizes the reliability characteristics of the equipment.
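The contrast above between a “statistical average” and “individual characteristics” can be illustrated with a small sketch. It is not a method given in this book: it assumes a constant failure rate and a gamma belief about that rate, so that fleet statistics act as a starting point and one unit's own operating record refines it. The class name `GammaRate` and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaRate:
    """Gamma belief about a constant failure rate, in failures per hour."""

    shape: float  # behaves like a count of failures already seen
    rate: float   # behaves like operating hours already accumulated

    def mean(self) -> float:
        """Expected failure rate under the current belief."""
        return self.shape / self.rate

    def update(self, failures: int, hours: float) -> "GammaRate":
        """Conjugate update with one unit's own operating record."""
        return GammaRate(self.shape + failures, self.rate + hours)

    def reliability(self, mission_hours: float) -> float:
        """Mission survival probability averaged over the belief.

        For a Gamma(shape, rate) belief, E[exp(-lambda * t)] has the
        closed form (rate / (rate + t)) ** shape.
        """
        return (self.rate / (self.rate + mission_hours)) ** self.shape


# Hypothetical fleet statistics: 2 failures in 10,000 fleet hours.
fleet = GammaRate(shape=2.0, rate=10_000.0)
# Hypothetical individual unit: 5,000 failure-free hours of its own.
unit = fleet.update(failures=0, hours=5_000.0)

print(f"fleet R(1000 h) = {fleet.reliability(1_000.0):.4f}")
print(f"unit  R(1000 h) = {unit.reliability(1_000.0):.4f}")
```

The fleet-level estimate is the same for every unit, while the updated estimate depends on what this particular unit has experienced, which is the sense of “individual characteristics” used here.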

[Figure omitted. Labels: physical space, field space, test space, digital space, data interaction; physical entity, physical twin; deduction; R digital twin 1, R digital twin 2; digital twins; RSE digital twin; environmental stress model, reliability model, maintainability model, fault model, safety model, testability model, supportability model; degradation; time; model-based reliability system engineering theory; real-time health management of equipment; real-time reliability prediction of equipment.]

Fig. 9.6 Interactions between the RSE digital twin and other spaces

In this book, the boundary between the RSE digital twin and the performance digital twin is preliminarily defined, and the relationship and differences between the two are discussed. As shown in Fig. 9.7, the RSE digital twin and the performance digital twin share part of the information and data in the digital space. They are regarded as two aspects of the digital twin, but with different input and output preferences. The performance digital twin mainly focuses on the real-time mapping between the performance parameters and load data of the physical entity, and on the simulations based on these data. It synchronously maps the internal performance status of the physical entity to the digital space and outputs the specific performance characteristics of that equipment. The RSE digital twin pays attention not only to the real-time simulated performance parameters of the equipment, but also to the equipment status parameters and the operation and maintenance data. It is used primarily to monitor the health status of the equipment synchronously and then to predict the reliability and remaining useful life of the equipment in real time.


Fig. 9.7 Conceptual difference between the RSE digital twin and performance digital twin
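The RSE digital twin's task of predicting remaining useful life from monitored status parameters can be sketched in a few lines. This is only an illustration under strong assumptions that the text does not make: a single status parameter (here a hypothetical wear indicator) is assumed to grow roughly linearly with operating time, and failure is assumed to occur when it reaches a fixed threshold. The function names and sample values are invented for the example.

```python
from typing import Optional, Sequence, Tuple


def fit_line(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of values ~ intercept + slope * time."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0.0:
        raise ValueError("time stamps must not all be equal")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return v_mean - slope * t_mean, slope


def remaining_useful_life(
    times: Sequence[float], values: Sequence[float], threshold: float
) -> Optional[float]:
    """Hours from the last sample until the fitted trend hits the threshold.

    Returns None when the fitted trend is flat or improving, because a
    linear extrapolation then never reaches the threshold.
    """
    intercept, slope = fit_line(times, values)
    if slope <= 0.0:
        return None
    time_at_threshold = (threshold - intercept) / slope
    return max(0.0, time_at_threshold - times[-1])


# Hypothetical monitoring record: wear indicator sampled every 100 hours.
times_h = [0.0, 100.0, 200.0, 300.0]
wear = [0.10, 0.20, 0.30, 0.40]
rul = remaining_useful_life(times_h, wear, threshold=1.0)
print(f"estimated RUL: {rul:.0f} h")
```

Each new sample changes the fit, so the estimate follows the individual unit rather than a population average. A real RSE digital twin would presumably replace the straight line with a degradation or physics-of-failure model suited to the equipment.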

9.2.2.2 Future Development Trend

Whichever of the statistics-based method, the PoF-based method, or fault prediction and health management technology is used, the complex and changeable reliability nature of the equipment is characterized only through relatively simplified assumptions. The predictions of the model therefore always fall short of the actual situation. In the future, the RSE digital twin should be able to make full use of the multidimensional data collected from the equipment, including product model data, statistical data on fault events, real-time operational status data, and historical environment and load data. These data will be classified and comprehensively processed to evaluate the real-time health status of the product and to predict its time-varying reliability. Moreover, the RSE digital twin is not static: it can be updated and evolved along with the product state, and it can draw on the different environmental conditions, mission profiles, and load histories experienced by other similar equipment to provide more accurate reliability predictions. However, only the fundamental concepts of the RSE digital twin have been established so far, and the following questions remain unsolved: ➀ how to build the RSE digital twin of the equipment in its design stage; ➁ how to ensure that the RSE digital twin covers the full characteristics of the equipment; ➂ how to make the RSE digital twin characterize the reliability characteristics of an entire multi-equipment system.


Unfortunately, no mature research has yet solved these problems, mainly because current research on reliability system engineering and on digital twins is carried out independently. In the future, a complete technical framework that integrates the characteristics of reliability system engineering and the digital twin needs to be developed.
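As one illustration of how the historical load data mentioned in this section could feed an individual health estimate, the sketch below applies linear cumulative damage (Miner's rule) to the load history of a single unit. The text does not prescribe this model; it is chosen here only because it is a simple, widely known physics-of-failure calculation. The Basquin-type S-N curve, its parameters `COEFF` and `EXPONENT`, and both load histories are hypothetical.

```python
from typing import Iterable, Tuple


def cycles_to_failure(stress_mpa: float, coeff: float, exponent: float) -> float:
    """Basquin-type S-N curve: N = coeff * stress ** (-exponent)."""
    if stress_mpa <= 0.0:
        raise ValueError("stress amplitude must be positive")
    return coeff * stress_mpa ** (-exponent)


def miner_damage(
    history: Iterable[Tuple[float, float]], coeff: float, exponent: float
) -> float:
    """Miner's rule: sum n_i / N_i over (stress amplitude, cycles) blocks.

    A value of 1.0 is the conventional failure criterion.
    """
    return sum(
        cycles / cycles_to_failure(stress, coeff, exponent)
        for stress, cycles in history
    )


# Hypothetical S-N parameters and load histories for two similar units.
COEFF, EXPONENT = 1.0e12, 3.0
unit_a = [(100.0, 200_000.0), (200.0, 25_000.0)]  # also saw a heavier load block
unit_b = [(100.0, 200_000.0)]

damage_a = miner_damage(unit_a, COEFF, EXPONENT)
damage_b = miner_damage(unit_b, COEFF, EXPONENT)
print(f"unit A damage fraction: {damage_a:.2f}")
print(f"unit B damage fraction: {damage_b:.2f}")
```

Two units of the same design end up with different consumed life because their load histories differ, which is the kind of unit-specific information a population-level reliability figure cannot express.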

© National Defense Industry Press 2024 Y. Ren et al., Model-Based Reliability Systems Engineering, https://doi.org/10.1007/978-981-99-0275-0