Fault Diagnosis and Prognosis Techniques for Complex Engineering Systems 0128224738, 9780128224731

Fault Diagnosis and Prognosis Techniques for Complex Engineering Systems gives a systematic description of the many face


English Pages 419 [421] Year 2021


Table of contents :
Front cover
Half title
Full title
Copyright
Contents
Contributors
Preface
Chapter 1 - Quality-related fault detection and diagnosis: a technical review and summary
1.1 Introduction
1.2 Basic methodology
1.3 Recent research
1.3.1 The KDD algorithm
1.3.2 The KLS-based approach
1.3.3 Reconstruction partial derivative contribution plot
1.3.4 Kernel sample equivalent replacement
1.4 Simulation
1.4.1 Introduction of the Tennessee-Eastman process
1.4.2 Fault detection results
1.4.3 Nonlinear fault detection using KSER
1.4.4 Fault diagnosis without smearing effect
Appendix A Description of the variables and faults
References
Chapter 2 - Canonical correlation analysis-based fault diagnosis method for dynamic processes
2.1 Introduction
2.2 Preliminaries
2.2.1 Basics of conventional CCA
2.2.2 Obtaining the positions and images in CCA
2.2.3 Details of the SVD-based technique
2.2.4 The CCA-based fault diagnosis method
2.2.5 Main steps of the CCA-based fault diagnosis method
2.3 CCA-based fault diagnosis method for dynamic processes
2.3.1 DCCA-based fault detection
2.3.2 The GRU-aided CCA fault detection method
2.4 Experimental results and analysis
2.4.1 The CSTR process
2.4.2 The TDCS process
2.5 Conclusion
Acknowledgments
References
Chapter 3 - H∞ fault estimation for linear discrete time-varying systems with random uncertainties
3.1 Introduction
3.2 Robust H∞ fault detection for LDTV systems with multiplicative noise
3.3 Robust H∞ fault detection for LDTV systems with measurement packet loss
3.4 Fixed-lag H∞ fault estimator design for LDTV systems under an unreliable communication link
3.5 Conclusion
Acknowledgments
References
Chapter 4 - Fault diagnosis and failure prognosis of electrical drives
4.1 Introduction
4.1.1 Operation under field orientation control
4.1.2 Operation under direct torque control
4.2 What can fail and how
4.2.1 Electric power converters
4.2.2 Electrical machines
4.2.3 Capacitors
4.2.4 Batteries
4.3 Diagnosis methodology and tools
4.3.1 Signal selection
4.3.2 Signal features
4.3.3 Classification
4.4 Faults, their manifestation, and diagnosis
4.4.1 Winding faults in AC machines
4.4.2 Bearing faults
4.4.3 Insulation
4.4.4 Power electronics
4.4.5 Induction motor drives
4.4.6 PMAC drives
4.4.7 Switched reluctance machines
4.5 Failure prognosis, fault mitigation, and reliability
4.5.1 From diagnosis to prognosis
4.5.2 Prognosis tools
4.5.3 Applications and new developments
4.5.4 Decisions based on prognosis and mitigation
References
Chapter 5 - Intelligent fault diagnosis for dynamic systems via extended state observer and soft computing
5.1 Introduction
5.2 Extended state observer
5.2.1 ESO design
5.2.2 Estimation error convergence
5.3 Case study: three-tank dynamic system
5.4 Fault detection by means of ESO
5.4.1 Fault detection scheme
5.4.2 Fault detection without exact knowledge of the plant model
5.5 Fault isolation and fault identification
5.5.1 Generation of reference values
5.5.2 Fault isolation by means of fuzzy inference and ESO
5.5.3 Fault identification via neural networks
5.6 Simultaneous faults of different types
5.6.1 Isolation of process faults
5.6.2 Isolation of sensor faults
5.6.3 Isolation of actuator faults
5.7 Isolation of simultaneous process faults and actuator faults
5.7.1 Characteristics of process faults and actuator faults
5.7.2 Utilizing an outflow sensor to isolate actuator faults
5.8 Conclusion and future work
References
Chapter 6 - Fault diagnosis and failure prognosis in hydraulic systems
6.1 Application status of sensor detection technology
6.1.1 Relevant standards of hydraulic machinery sensor detection technology
6.1.2 Instrumentation for the hydraulic turbine prototype
6.1.3 On-site detection for hydraulic turbines
6.2 Cavitation research
6.2.1 Establishment of cavitation theory
6.2.2 Numerical simulation of the cavitation mechanism
6.2.3 Cavitation model establishment and optimization
6.2.4 Engineering application of numerical simulation for cavitation flow in a hydraulic turbine
6.3 Intelligent evaluation and diagnosis technology
6.3.1 Current theoretical research hotspot
6.3.2 Commercial intelligent evaluation and fault diagnosis systems
6.3.3 Application status and deficiency of current diagnosis systems
6.4 Prognostics research
6.4.1 Prediction based on the classical linear time series model
6.4.2 Time series prediction based on intelligent technology
6.4.3 Fuzzy time series prediction based on fuzzy set theory
6.4.4 Combination forecast
References
Chapter 7 - Fault detection and fault identification in marine current turbines
7.1 The HT-based detection method
7.1.1 Problem description
7.1.2 The HT-based detection method
7.1.3 Simulation results and analysis
7.2 The wavelet threshold denoising-based detection method
7.2.1 Problem description
7.2.2 The wavelet threshold denoising-based detection method
7.2.3 Simulation results and analysis
7.2.4 Experimental results and analysis
7.3 The identification method of blade attachment based on the sparse autoencoder and softmax regression
7.3.1 Problem description
7.3.2 The recognition method based on the sparse autoencoder and softmax regression
7.3.3 Experimental results and analysis
7.4 The identification method of blade attachment based on depthwise separable CNN
7.4.1 Problem description
7.4.2 The recognition method based on depthwise separable CNN
7.4.3 Experimental analysis
7.5 Conclusion and future works
References
Chapter 8 - Quadrotor actuator fault diagnosis and accommodation based on nonlinear adaptive state observer
8.1 Introduction
8.2 Mathematical model of a quadrotor
8.2.1 The nonlinear quadrotor model
8.2.2 The actuator fault model
8.3 NASO-based FTC
8.3.1 The fault detection module
8.3.2 The fault diagnosis module
8.3.3 The fault accommodation module
8.4 Validation
8.4.1 Numerical simulation results
8.4.2 Flight test
8.5 Conclusion
References
Chapter 9 - Defect detection and classification in welding using deep learning and digital radiography
9.1 Introduction
9.1.1 Welding process
9.1.2 Digital radiography
9.2 Literature review
9.3 Database preparation
9.4 Experimental study
9.4.1 Deep learning architecture
9.4.2 Training
9.4.3 Network HP optimization
9.5 Experimental implementation
9.6 Conclusion
References
Chapter 10 - Real-time fault diagnosis using deep fusion of features extracted by PeLSTM and CNN
10.1 Introduction
10.2 Basic theory
10.2.1 Convolutional neural network
10.2.2 Long short-term memory
10.3 Deep fusion of feature extracted by PeLSTM and CNN
10.3.1 2D screenshot image construction
10.3.2 The feature fusion algorithm based on CNN and PeLSTM
10.4 Experimental testing
10.4.1 Rolling bearing test and analysis
10.4.2 Gearbox test and analysis
10.5 Conclusion and future work
Acknowledgment
References
Index
Back cover

Edited by Hamid Reza Karimi

Fault Diagnosis and Prognosis Techniques for Complex Engineering Systems gives a systematic and almost self-contained description of the many facets of envisaging, designing, implementing, or experimentally exploring emerging trends in fault diagnosis and failure prognosis in mechanical, electrical, hydraulic, and marine systems. The book is devoted to the development of mathematical methodologies for fault diagnosis and isolation, fault-tolerant control, and failure prognosis problems of engineering systems. It presents new techniques in reliability modeling, reliability analysis, reliability design, fault and failure detection, signal processing, and fault-tolerant control of engineering systems. It focuses specifically on the development of mathematical methodologies for the diagnosis and prognosis of faults or failures, providing a unified platform for understanding and applying advanced diagnosis and prognosis methodologies to improve reliability, in both theory and practice, in applications such as vehicles, manufacturing systems, circuits, flight systems, and marine systems.

This book will be a valuable resource for different groups of readers: mechanical engineers working on vehicle systems, electrical engineers working on rotary machinery systems, control engineers working on fault detection systems, mathematicians and physicists working on complex dynamics, and postgraduate students majoring in mechatronics, control engineering, mechanical engineering, and applied mathematics. It may also be of significant interest to researchers in the mechatronics engineering community, both academic and industrial.

Key Features

• Presents recent advances in the theory, technological aspects, and applications of advanced diagnosis and prognosis methodologies in engineering.
• Provides a series of the latest results in areas including, but not limited to, fault detection, isolation, fault-tolerant control, and failure prognosis of components.
• Gives numerical and simulation results in each chapter to reflect engineering practice and to demonstrate the focus of the developed analysis and synthesis approaches.


About the Editor


Dr. Hamid Reza Karimi is a Professor of Applied Mechanics with the Department of Mechanical Engineering, Politecnico di Milano, Milan, Italy. His current research interests include control systems and mechatronics with applications to automotive systems, robotics, vibration systems, and wind energy. Prof. Karimi is currently the Editor-in-Chief, Technical Editor, or Associate Editor for several international journals. He was named a Web of Science Highly Cited Researcher in Engineering for 2016-2020 and received the 2020 IEEE Transactions on Circuits and Systems Guillemin-Cauer Best Paper Award.

Technology and Engineering
ISBN 978-0-12-822473-1


Fault Diagnosis and Prognosis Techniques for Complex Engineering Systems


Edited by

Hamid Reza Karimi Department of Mechanical Engineering, Politecnico di Milano, Italy

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2021 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

ISBN: 978-0-12-822473-1

For Information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mara Conner
Acquisitions Editor: Sonnini R. Yura
Editorial Project Manager: Megan Healy
Production Project Manager: Kamesh Ramajogi
Cover Designer: Greg Harris
Typeset by Aptara, New Delhi, India

Contents

Contributors
Preface

1 Quality-related fault detection and diagnosis: a technical review and summary
Guang Wang and Hamid Reza Karimi
1.1 Introduction
1.2 Basic methodology
1.3 Recent research
1.4 Simulation
Appendix A: Description of the variables and faults
References

2 Canonical correlation analysis–based fault diagnosis method for dynamic processes
Zhiwen Chen and Ketian Liang
2.1 Introduction
2.2 Preliminaries
2.3 CCA-based fault diagnosis method for dynamic processes
2.4 Experimental results and analysis
2.5 Conclusion
Acknowledgments
References

3 H∞ fault estimation for linear discrete time-varying systems with random uncertainties
Yueyang Li
3.1 Introduction
3.2 Robust H∞ fault detection for LDTV systems with multiplicative noise
3.3 Robust H∞ fault detection for LDTV systems with measurement packet loss
3.4 Fixed-lag H∞ fault estimator design for LDTV systems under an unreliable communication link
3.5 Conclusion
Acknowledgments
References

4 Fault diagnosis and failure prognosis of electrical drives
Elias G. Strangas
4.1 Introduction
4.2 What can fail and how
4.3 Diagnosis methodology and tools
4.4 Faults, their manifestation, and diagnosis
4.5 Failure prognosis, fault mitigation, and reliability
References

5 Intelligent fault diagnosis for dynamic systems via extended state observer and soft computing
Paul P. Lin
5.1 Introduction
5.2 Extended state observer
5.3 Case study: three-tank dynamic system
5.4 Fault detection by means of ESO
5.5 Fault isolation and fault identification
5.6 Simultaneous faults of different types
5.7 Isolation of simultaneous process faults and actuator faults
5.8 Conclusion and future work
References

6 Fault diagnosis and failure prognosis in hydraulic systems
Jie Liu, Yanhe Xu, Kaibo Zhou and Ming-Feng Ge
6.1 Application status of sensor detection technology
6.2 Cavitation research
6.3 Intelligent evaluation and diagnosis technology
6.4 Prognostics research
References

7 Fault detection and fault identification in marine current turbines
Tianzhen Wang, Zhichao Li and Yilai Zheng
7.1 The HT-based detection method
7.2 The wavelet threshold denoising–based detection method
7.3 The identification method of blade attachment based on the sparse autoencoder and softmax regression
7.4 The identification method of blade attachment based on depthwise separable CNN
7.5 Conclusion and future works
References

8 Quadrotor actuator fault diagnosis and accommodation based on nonlinear adaptive state observer
Sicheng Zhou, Kexin Guo, Xiang Yu, Lei Guo and Youmin Zhang
8.1 Introduction
8.2 Mathematical model of a quadrotor
8.3 NASO-based FTC
8.4 Validation
8.5 Conclusion
References

9 Defect detection and classification in welding using deep learning and digital radiography
M-Mahdi Naddaf-Sh, Sadra Naddaf-Sh, Hassan Zargaradeh, Sayyed M. Zahiri, Maxim Dalton, Gabriel Elpers and Amir R. Kashani
9.1 Introduction
9.2 Literature review
9.3 Database preparation
9.4 Experimental study
9.5 Experimental implementation
9.6 Conclusion
References

10 Real-time fault diagnosis using deep fusion of features extracted by PeLSTM and CNN
Funa Zhou, Zhiqiang Zhang and Danmin Chen
10.1 Introduction
10.2 Basic theory
10.3 Deep fusion of feature extracted by PeLSTM and CNN
10.4 Experimental testing
10.5 Conclusion and future work
Acknowledgment
References

Index

Contributors

Danmin Chen, School of Software, Henan University, China
Zhiwen Chen, School of Automation, Central South University, China
Maxim Dalton, Artificial Intelligence Lab, Stanley Oil & Gas, Stanley Black & Decker, United States
Gabriel Elpers, Artificial Intelligence Lab, Stanley Oil & Gas, Stanley Black & Decker, United States
Ming-Feng Ge, School of Mechanical Engineering and Electronic Information, China University of Geosciences, China
Kexin Guo, School of Automation Science and Electrical Engineering, Beihang University, China
Lei Guo, School of Automation Science and Electrical Engineering, Beihang University; Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing, China
Hamid Reza Karimi, Department of Mechanical Engineering, Politecnico di Milano, Italy
Amir R. Kashani, Artificial Intelligence Lab, Stanley Oil & Gas, Stanley Black & Decker, United States
Yueyang Li, School of Electrical Engineering, University of Jinan, China
Zhichao Li, School of Logistic Engineering, Shanghai Maritime University, China
Ketian Liang, School of Automation, Central South University, China
Paul P. Lin, Fellow of the American Society of Mechanical Engineers (ASME); Professor Emeritus, Mechanical Engineering Department, Cleveland State University, United States; Visiting Scholar, Kaohsiung University of Science and Technology, Taiwan
M-Mahdi Naddaf-Sh, Electrical Engineering Department, Lamar University, United States
Sadra Naddaf-Sh, Electrical Engineering Department, Lamar University, United States
Elias G. Strangas, Michigan State University, United States
Guang Wang, North China Electric Power University – Baoding Campus, China
Tianzhen Wang, School of Logistic Engineering, Shanghai Maritime University, China
Xiang Yu, School of Automation Science and Electrical Engineering, Beihang University; Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing, China
Sayyed M. Zahiri, Artificial Intelligence Lab, Stanley Oil & Gas, Stanley Black & Decker, United States
Hassan Zargaradeh, Electrical Engineering Department, Lamar University, United States
Youmin Zhang, Department of Mechanical, Industrial, and Aerospace Engineering, Concordia University, Montreal, Quebec, Canada
Zhiqiang Zhang, School of Computer and Information Engineering, Henan University, China
Yilai Zheng, School of Logistic Engineering, Shanghai Maritime University, China
Funa Zhou, School of Logistic Engineering, Shanghai Maritime University, China
Sicheng Zhou, School of Automation Science and Electrical Engineering, Beihang University, China

Preface

With the rapid growth of health monitoring technology in fields such as the process industry, energy systems, vehicles, and other advanced technologies, the problems of fault diagnosis and failure prognosis are receiving much attention in both academia and industry. This attention is mainly motivated by the need to enhance reliability and resilience against diverse and complex failure modes, from both theoretical and practical perspectives. To meet reliability requirements, reliability design and resilient control are critical for the development of engineering systems. With the advances in reliability-centered maintenance and condition-based maintenance techniques, it is timely to exploit them for reliability design, fault diagnosis, and failure prognosis in order to extend the remaining useful life of system components.

The core of this book is new techniques in reliability modeling, reliability analysis, reliability design, fault and failure detection, signal processing, and fault-tolerant control of engineering systems, including mechanical, electrical, hydraulic, and marine systems. The book is intended as a reference for graduate and postgraduate students and for researchers in all engineering disciplines, including mechanical engineering, electrical engineering, and applied mathematics, who wish to explore state-of-the-art techniques for solving problems of integrated fault diagnosis and failure prognosis of complex systems with collective safety and robustness aspects. It should therefore be useful as a guide for system engineering practitioners and system-theoretic researchers alike, today and in the future.
The book chapters are organized as separate contributions and are listed in the order of the table of contents as follows:

Chapter 1, “Quality-Related Fault Detection and Diagnosis: A Technical Review and Summary,” conducts a technical review and summary of the classical achievements in quality-related fault detection and diagnosis, including their principles, implementation algorithms, technical advantages, and defects.

Chapter 2, “Canonical Correlation Analysis–Based Fault Diagnosis Method for Dynamic Processes,” focuses on the application of the canonical correlation analysis (CCA) technique to dynamic process fault diagnosis. Specifically, two variants of the CCA-based method, the dynamical CCA method and the gated recurrent units–aided CCA method, are presented to deal with the fault diagnosis of dynamic processes.

Chapter 3, “H∞ Fault Estimation for Linear Discrete Time-Varying Systems With Random Uncertainties,” presents fault estimation problems for linear discrete time-varying systems with random uncertainties such as multiplicative noise and packet loss.

Chapter 4, “Fault Diagnosis and Failure Prognosis of Electrical Drives,” addresses faults in the power electronics, the DC link capacitor, batteries, and electrical machines. Specifically, the faults can be an open or short circuit of switches, and they can be identified from current and voltage measurements. For example, in electrical machines, the faults can be in the windings, with either incipient or precipitous degradation, and can be identified and their severity determined using either model- or signal-based techniques. Moreover, the detection of mechanical faults in bearings through vibration measurements is discussed, whereas eccentricity can be detected through changes in flux and inductances.

Chapter 5, “Intelligent Fault Diagnosis for Dynamic Systems via Extended State Observer and Soft Computing,” addresses the common model-based fault diagnosis difficulties encountered in industrial applications. Specifically, this chapter uses an extended state observer to detect faults without exact knowledge of the plant model and a fuzzy inference system to help with fault isolation and fault identification.

Chapter 6, “Fault Diagnosis and Failure Prognosis in Hydraulic Systems,” reviews the state of the art in diagnostics and prognostics pertaining to hydraulic machinery systems. Attention is given to detailing the application status of sensor detection technology, cavitation research, intelligent evaluation and diagnosis technology, and prognostics research, among others, used by researchers in the main areas of diagnostics and prognostics.

Chapter 7, “Fault Detection and Fault Identification in Marine Current Turbines,” develops a Hilbert transform–based detection method to detect imbalance faults in a marine current turbine’s rotor and blade.

Chapter 8, “Quadrotor Actuator Fault Diagnosis and Accommodation Based on Nonlinear Adaptive State Observer,” proposes a nonlinear adaptive state observer–based fault-tolerant tracking control system for a quadrotor unmanned aerial vehicle.

Chapter 9, “Defect Detection and Classification in Welding Using Deep Learning and Digital Radiography,” presents two realistic welding quality datasets, SBD-1 and SBD-2, for training deep learning models. The datasets are based on radiography images collected from various projects and annotated by nondestructive testing experts. An optimized convolutional neural network is then designed to find defects in the weldment and heat-affected zones and is subsequently trained and evaluated on the prepared datasets.

Chapter 10, “Real-Time Fault Diagnosis Using Deep Fusion of Features Extracted by PeLSTM and CNN,” focuses on extracting useful features potentially involved in vibration signals using intelligent techniques for safety analysis and health monitoring of rotary machines.

Finally, I would like to express appreciation to all contributors for their excellent contributions to this book.

Hamid Reza Karimi
Milan, November 20, 2020

Chapter 1

Quality-related fault detection and diagnosis: a technical review and summary

Guang Wang (a) and Hamid Reza Karimi (b)
(a) North China Electric Power University – Baoding Campus, China
(b) Department of Mechanical Engineering, Politecnico di Milano, Italy

DOI: 10.1016/B978-0-12-822473-1.00010-0
Copyright © 2021 Elsevier Inc. All rights reserved.

1.1 Introduction

Industrial production plays a crucial role in modern society and influences nearly every aspect of it. Industrial processes are rapidly becoming more automated and integrated, and many production processes involve a large number of variables and indices as well as complex structures. Fault detection and diagnosis theories are therefore important: they raise an alarm when a fault occurs and support the analysis needed to locate the faulty variables [1–3]. Advances in sensor, data transmission, and storage technology provide great opportunities for research on data-based fault detection and diagnosis. One of the most popular approaches is multivariate statistical process monitoring [4]. Many algorithms have been proposed to construct and decompose the feature space of the monitored system; representative methods include principal component analysis (PCA), canonical variable analysis, and partial least squares (PLS).

Among the fault detection and diagnosis tasks for industrial systems, key performance index (KPI)-related, or quality-related, fault detection and diagnosis has attracted increasing attention in recent research [5–7]. On the one hand, faults that affect the KPI influence product quality and other product indicators, or threaten the safety and stability of production; they call for prompt attention and countermeasures. On the other hand, many faults occur in variables that have no direct relationship to the product or to the production process as a whole, and these can be handled during routine maintenance [1, 8].

A crucial part of solving the quality-related fault detection problem is the algorithm used to obtain the relationship between process variables and quality variables. Multiple linear regression, PLS, canonical variable analysis, and many other methods have been proposed for this purpose. PCA is effective for raising fault alarms, but it does not consider the correlation with the quality variables when decomposing the feature space, so it cannot distinguish whether a fault is related to the quality variables [9]. It nevertheless remains an important classical dimensionality-reduction technique based on extracting principal components [10], and it is widely used within other quality-related algorithms. The PLS algorithm is a classical method for obtaining projection directions that reflect the changes in the process variables X that are related to the quality variables Y. It maximizes the covariance between the process variables and the quality variables to obtain the scores and then decomposes the feature space [9, 11]. However, PLS is not a perfect solution because it is not a complete orthogonal decomposition: the projection space does not cover all directions related to the quality variables, and quality-related information remains in the residual space, so the alarm results can be inaccurate [12].

Zhou et al. [12] proposed total PLS (T-PLS), which further decomposes the two original subspaces orthogonally into four subspaces to separate their quality-related and quality-unrelated parts. However, it decomposes the residuals obtained from the PLS algorithm without considering the quality-related information remaining in the subspaces, and monitoring four subspaces makes the decision logic more complex. Qin and Zheng [13] modified the PLS algorithm and proposed concurrent PLS (CPLS), which decomposes the subspaces according to their contribution or relevance to the prediction of the quality variables.
On the basis of the PLS prediction, CPLS decomposes the principal components to distinguish the part that contributes to the predicted quality variables from the part related only to the inputs themselves. At the same time, the quality variables are also decomposed to find the remaining unpredicted part. A multiblock CPLS algorithm has also been proposed to monitor and diagnose decentralized processes [14]. Ding et al. [15] proposed a modified PLS (M-PLS) algorithm that applies singular value decomposition (SVD) to decompose the feature space into a principal component subspace containing all of the information used to predict the quality variables and a residual subspace entirely unrelated to that prediction. This method improves the effectiveness of KPI prediction, although the residual subspace may contain factors that cannot predict the quality variables but can still affect them. Based on M-PLS, Peng et al. [16] proposed the efficient PLS (E-PLS) algorithm, which also uses SVD to decompose the feature space according to its contribution to the quality variable prediction and then further decomposes the residual subspace by PCA to separate the quality-related part.

Apart from the PLS-based methods, Peng et al. [17] proposed a principal component regression (PCR) approach, which belongs to the family of multiple linear regression methods. It applies the PCA algorithm to extract the principal components of the process variables and constructs the linear regression coefficient matrix for the quality variables. An orthogonal decomposition is then applied to the regression coefficient matrix, yielding two subspaces. The quality-related part remaining in the residual component of the first PCA step is also considered in the further decomposition of the residuals of the coefficient matrix. Wang et al. [18] proposed a total PCR (T-PCR) algorithm, which extracts the principal components of the predicted quality variables and projects the process variables again to obtain the subspace more highly related to the quality variables and the corresponding residual subspace.

The canonical correlation analysis (CCA) algorithm has also been applied to fault detection. Chen et al. [19] proposed a CCA-based fault detection method that obtains the principal components by maximizing the correlation between process variables and quality variables. Chen et al. [20] further improved the CCA method to solve the fault detection problem in a detailed industrial fault condition. Zhu et al. [21] proposed a concurrent CCA (CCCA) model with regularization to address the shortcoming that CCA does not take the variance of the data into account; the model decomposes the feature space into five subspaces. Zhu and Qin [22] then proposed a supervised diagnosis scheme that applies the CCCA algorithm to raise fault alarms.

The preceding algorithms are quite effective for linear processes. However, most complex industrial processes have strong nonlinear characteristics and cannot be decomposed directly by linear methods. A nonlinear mapping can be applied to map the process variables into a high-dimensional feature space so that the linear decomposition can be carried out there [1]. This approach is feasible in principle, but the extremely high dimension makes the computation hard to carry out.
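The regression-based methods reviewed above share one idea: use the regression coefficient matrix to split the process data orthogonally into a part that carries the prediction of the quality variables and a part that does not. The Python sketch below illustrates that idea only. It is not the M-PLS, E-PLS, or PCR algorithm of any cited work; the function name `quality_related_split`, the plain least-squares estimate of the coefficient matrix, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def quality_related_split(X, Y, tol=1e-10):
    """Split centred X into quality-related and quality-unrelated parts.

    Assumption: a plain least-squares coefficient matrix M (Y ~ X @ M)
    stands in for the PLS- or PCR-based estimates used in the literature.
    """
    M, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Thin SVD of M: the leading left singular vectors span col(M).
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    rank = int(np.sum(s > tol * s.max()))
    U = U[:, :rank]
    P = U @ U.T            # orthogonal projector onto col(M)
    X_hat = X @ P          # part of X that carries the prediction of Y
    X_tilde = X - X_hat    # part of X orthogonal to col(M)
    return M, X_hat, X_tilde


# Synthetic, centred data (hypothetical): 5 process variables, 1 quality variable.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X -= X.mean(axis=0)
Y = X @ np.array([[1.0], [-2.0], [0.0], [0.5], [0.0]])
Y += 0.01 * rng.standard_normal(Y.shape)
Y -= Y.mean(axis=0)

M, X_hat, X_tilde = quality_related_split(X, Y)
print(np.abs(X_tilde @ M).max())  # numerically zero: X_tilde does not predict Y
```

In a monitoring scheme of this kind, one statistic would typically be built on `X_hat` and another on `X_tilde`, so that an alarm on the first indicates a quality-related fault. Like the methods it imitates, this split is purely linear, so it does not by itself address nonlinear processes or the cost of working in a high-dimensional feature space.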
The kernel function method was introduced to solve this problem: a kernel matrix replaces the mapping matrix in the calculation, and the Gaussian kernel function is widely used in modeling fault detection problems [23]. Cho et al. [24] proposed a kernel PCA (KPCA) method and successfully applied this nonlinear extension of PCA to fault identification experiments. Rosipal and Trejo [25] proposed the kernel PLS (KPLS) method, which conducts the linear decomposition through a mapping matrix. The nonlinear extension of T-PLS, T-KPLS, has also been proposed and provides satisfactory detection and diagnosis results for industrial systems [5, 26]. Recently, other theories have been proposed to decompose the sample space. For example, kernel direct decomposition (KDD) performs SVD directly on the regression coefficient matrix of the quality variables [27]. Kernel least squares (KLS) decomposes the linear regression matrix containing the full correlation between the mapping matrix and the quality variables [6]. Two orthogonal subspaces are formed by projecting the mapping matrix onto the decomposition results of the regression matrix: one subspace is related to the quality, whereas the other is quality unrelated. Wang et al. [18] also combined the T-PCR algorithm with the kernel method, yielding T-KPCR for nonlinear problems.


Fault diagnosis and prognosis techniques for complex engineering systems

Most of the modeling algorithms and methods introduced so far assume that the process data follow a Gaussian distribution. In practice, however, this assumption is usually not fulfilled [28]. Much research has addressed the modeling of non-Gaussian processes. Independent component analysis (ICA) is one of the widely used algorithms in this field; Kano et al. [29] first applied it to the process monitoring task. Lee et al. [30] proposed an ICA-based fault detection method that measures the non-Gaussianity of the process by negentropy and estimates the independent components and the mixing matrix. In their work, a new statistic $I^2$ is designed for monitoring, with the corresponding confidence limits determined by kernel density estimation, which is a widely applied method for estimating the control region of normal process data [31]. Another method to deal with this problem is the Gaussian mixture model (GMM), which decomposes the process data into different Gaussian components corresponding to different operating modes [32]. This method has attracted much attention recently. Choi et al. [33] combined the GMM algorithm with PCA for fault detection and further conducted fault isolation by combining GMM with discriminant analysis. Yu [32] combined the GMM method with a Bayesian inference strategy to conduct fault isolation. Jiang et al. [34] further modified the GMM with Bayesian inference to decompose the process data, and then performed PCA on each operating mode to select the optimal principal components. In addition, Liu et al. [35] used the support vector data description (SVDD) algorithm to define the control region of the normal data by a minimal spherical volume.
Ge and Song [36] proposed the one-class support vector machine method to separate the data with a hyperplane. Much research also deals with the dynamic characteristics of the process [37, 38]. Ku et al. [39] proposed the dynamic PCA (DPCA) method, which performs PCA on the process variables with time lags. Li and Qin [40] improved DPCA into an indirect DPCA to realize the consistent estimation of process variables and quality variables. Dong and Qin [41] proposed a dynamic inner PCA algorithm that extracts the dynamic components with the best prediction results from the history data and leaves little dynamic relationship information in the corresponding residual components, so that the dynamic and static relationships can be processed separately. In addition, Dong and Qin [42] proposed the dynamic-inner canonical correlation analysis (DiCCA) algorithm, which selects the dynamic components by the predictability of each latent variable such that the extracted components are sufficient to describe the dynamic relationship. The DPCA algorithm has also been combined with a dynamic ICA (DICA) algorithm to deal with process variables obeying Gaussian and non-Gaussian distributions, respectively [43]. As to the PLS-based algorithms, Helland et al. [44] proposed the recursive PLS regression (RPLS) method to update the prediction model with new calibration objects. This recursive approach is also applied in recent research.


Hu et al. [7] proposed the recursive C-PLS (RCPLS) algorithm, taking the normal testing samples to update the monitoring model. Qin [45] modified the PLS algorithm and proposed the block RPLS algorithm to adapt for the large number of updating data and the new changes of the PLS model. Dong and Qin [46] proposed a dynamic inner PLS (DiPLS) method, which conducts the PLS method with dynamic models for both process variables and quality variables to construct the dynamic relationships between the input and output. In accordance with the methods introduced previously, when a fault occurs, it can be detected and alarmed in the corresponding statistics. Then it comes to the fault diagnosis, aiming to seek the faulty variables. There are methods proposed to analyze the contribution of each variable done to the fault occurring so that the abnormal variables can be selected [47]. The contribution plot is a widely used method especially for the linear process. It analyzes the contribution values of each process variable to the fault and selects the variables with high contribution values as the faulty variables, according to the corresponding threshold [48]. The effectiveness of this method has been proved in much research. To improve the accuracy of the diagnosis results, the reconstruction-based contribution (RBC) plot method is proposed [49]. RBC calculates the reconstructed faulty amplitude index on each variable direction to seek the variables with an abnormally large index, or compares the contribution value of each original variable and the reconstructed non-fault contribution value of that. This method achieves great improvement of the diagnosis accuracy. Further, some expansion algorithms based on RBC have been proposed. Li et al. [50, 51] proposed a multidirectional RBC method to find the minimum variables that can satisfy the reconstruction condition. 
In addition, Yoon and MacGregor [52] proposed the angle-based contribution (ABC), which analyzes the angle between the observed component vector and the previously known fault vectors to judge whether a fault has occurred; its diagnosis effectiveness is similar to that of RBC. Liu [53] proposed an improved contribution plot based on the reduction of combined index (RCI) to avoid the influence of the smearing effect. Later, Liu et al. [54] proposed a faulty variable selection method based on Bayesian decision theory. For nonlinear processes, however, the traditional contribution plot cannot be used directly. Because the correspondence between the process variables and the kernel matrix is lost, the contribution values cannot be calculated accurately from the kernel matrix-based statistics. To diagnose faulty variables in nonlinear processes, Zhang et al. [55] proposed the partial derivative method, also regarded as a kernel gradient method, which takes the partial derivative along every variable direction and obtains the contribution values from the gradient in each direction. This method solves the faulty variable diagnosis problem for nonlinear processes, and its effectiveness is demonstrated in that research. However, its performance is also influenced by the smearing effect. Recently, to recover this correspondence, Wang


et al. [56] proposed a kernel sample equivalent replacement (KSER) theory that performs a first-order Taylor series expansion on the Gaussian kernel function and obtains the kernel matrix directly from the variance-covariance matrix of the process variable matrix. This theory solves the problem that the Gaussian kernel does not correspond to the input process variables, so that detection and diagnosis can be conducted with the process variables X for the nonlinear system. In this section, several theories relevant to process monitoring have been introduced concisely. Many studies have proposed valuable theories for quality-related detection and diagnosis that account for the characteristics of the process system and its data. In the following section, some classical theories are explained in detail. Then some of the latest research results and progress are described in detail.

1.2 Basic methodology

In this section, some classical and basic fault detection theories are explained in detail. The KPLS, T-KPLS, and C-PLS methods mentioned earlier are summarized, and their principles are presented concisely.

To initialize, the process samples are input in the form $x_i = [x_{i,1}, \cdots, x_{i,m}]^T \in \mathbb{R}^{m \times 1}$, where $x_i$ represents a process variable sample with $m$ variables. The quality variable sample is $y_i = [y_{i,1}, \cdots, y_{i,l}]^T \in \mathbb{R}^{l \times 1}$, where $l$ is the number of quality variables. $N$ samples are collected in $X = [x_1, \cdots, x_N]^T \in \mathbb{R}^{N \times m}$ and $Y = [y_1, \cdots, y_N]^T \in \mathbb{R}^{N \times l}$. The centralized $\bar{X}$ can be obtained as $\bar{X} = \left(I_N - \frac{1}{N} 1_N 1_N^T\right) X$, which has zero mean (the variables are additionally assumed to be scaled to unit standard deviation), where $I_N$ is a unit matrix of order $N$ and $1_N$ is a column vector of order $N$ whose elements are all 1.

To model nonlinear processes, the nonlinear mapping method is widely applied. It maps the original variable samples to the high-dimensional feature space $F$ of dimension $M$:

$$x_i \in \mathbb{R}^{m \times 1} \rightarrow \phi(x_i) \in \mathbb{R}^{M \times 1}. \tag{1.1}$$

Variables with nonlinear relationships become linearly decomposable in the high-dimensional space $F$, and the corresponding linear decomposition can be realized in that space. The mapping matrix $\Phi \in \mathbb{R}^{N \times M}$ can be formed by $\phi(x_i)$. It can be centralized as

$$\bar{\Phi} = \left(I_N - \frac{1}{N} 1_N 1_N^T\right) \Phi, \tag{1.2}$$

where $I_N \in \mathbb{R}^{N \times N}$ is the unit matrix and $1_N \in \mathbb{R}^{N \times 1}$ is the all-ones vector. However, the dimension of $\Phi$ can be extremely high, which makes direct calculation infeasible. To solve this problem, the


kernel function method is applied in much of the research. The kernel matrix can be formed as $K = \Phi \Phi^T$, which consists of elements $k_{i,j}$ defined as

$$k_{i,j} = \langle \phi(x_i), \phi(x_j) \rangle = f_{ker}(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{c}\right), \tag{1.3}$$

where the Gaussian kernel function is applied to form the kernel matrix, following many studies, because it always satisfies the Mercer theorem and gives good fault detection performance. Unless otherwise stated, the kernel function applied in this section refers to the Gaussian kernel. The normalized kernel matrix can be calculated as

$$\bar{K} = \bar{\Phi} \bar{\Phi}^T = \left(I_{N \times N} - \frac{1}{N} 1_N 1_N^T\right) K \left(I_{N \times N} - \frac{1}{N} 1_N 1_N^T\right). \tag{1.4}$$

Among the fault detection methods for complex industrial processes, KPLS is a classic algorithm that decomposes the feature space according to the quality variables and is based on the least squares principle. It is a nonlinear extension of the PLS algorithm. KPLS obtains the score matrix $T$ and the load matrices $P$ and $Q$ by Algorithm 1 so that it can map $\bar{\Phi}$ to the quality-related directions.

Algorithm 1: The principle of KPLS
1. Initialize $i = 1$ and $u_i$ as the first column of $Y_i$.
2. $w_i = \bar{\Phi}_i^T u_i$.
3. $t_i = \bar{\Phi}_i w_i$, $t_i = t_i / \|t_i\|$.
4. $q_i = Y_i^T t_i$.
5. $u_i = Y_i q_i$, $u_i = u_i / \|u_i\|$.
6. Repeat steps 2 through 5 until $t_i$ converges.
7. Update $\bar{\Phi}_i$ and $Y_i$: $\bar{\Phi}_{i+1} = (I_N - t_i t_i^T) \bar{\Phi}_i$, $Y_{i+1} = (I_N - t_i t_i^T) Y_i$.
8. Collect $t_i$, $q_i$, and $u_i$ into the matrices $T$, $Q$, and $U$, respectively. Set $i = i + 1$. If $i \le A$, repeat steps 1 through 8.

$T$ is the score matrix, and the load matrix of the process matrix can be expressed as

$$P = \bar{\Phi}^T T. \tag{1.5}$$

Then $\bar{\Phi}$ can be decomposed into two subspaces: the principal component subspace related to the quality variables and the residual subspace. The quality variables can be predicted by $T$ and its load matrix $Q$ as

$$\bar{\Phi} = T P^T + \tilde{\Phi}, \tag{1.6}$$

$$Y = T Q^T + E, \tag{1.7}$$

where $\hat{\Phi} = T P^T$ is the principal component of $\bar{\Phi}$, which is monitored by Hotelling's $T^2$ statistic. In addition, $\tilde{\Phi}$ is the residual subspace, which is monitored by the SPE statistic. $\hat{Y} = T Q^T$ is the predicted $Y$, and $E$ is the residual part of $Y$.

However, the KPLS algorithm does not actually realize an orthogonal decomposition. It extracts the principal components without considering the extent of their influence on the quality variables. Thus, quality-related components remain in the residual subspace, and the principal component subspace also contains a quality-unrelated part. It is a kind of oblique decomposition.

One of the solutions is the T-KPLS algorithm, which is developed from the T-PLS theorem. T-KPLS further decomposes the two subspaces $\hat{\Phi}$ and $\tilde{\Phi}$ of KPLS into four by the PCA theorem:

$$\bar{\Phi} = \hat{\Phi}_y + \hat{\Phi}_o + \hat{\Phi}_r + \tilde{\Phi}_r = T_y P_y^T + T_o P_o^T + T_r P_r^T + \tilde{\Phi}_r, \tag{1.8}$$

$$Y = T_y Q_y^T + E_y. \tag{1.9}$$

As Eq. (1.8) and Eq. (1.9) show, the principal component $\hat{\Phi}$ is further decomposed into the quality-related subspace $\hat{\Phi}_y$ and the quality-unrelated subspace $\hat{\Phi}_o$, whereas $\hat{\Phi}_r$ is extracted from $\tilde{\Phi}$ as the principal subspace to monitor the components with large variation in $\tilde{\Phi}$. $\tilde{\Phi}_r$ consists of the residual components with small variation.

Algorithm 2: The principle of T-KPLS
1. Perform PCA on $\hat{Y} = T Q^T$, so that $Y = T_y Q_y^T + E_y$. The number of principal components is $A_y = \mathrm{rank}(Q)$.
2. $P_y^T = (T_y^T T_y)^{-1} T_y^T \hat{\Phi}$, and it has $\hat{\Phi}_y = T_y P_y^T$.
3. $\hat{\Phi}_o = \hat{\Phi} - \hat{\Phi}_y$. Perform PCA on $\hat{\Phi}_o$, extracting $A_o = A - A_y$ components; $\hat{\Phi}_o = T_o P_o^T$.
4. Perform PCA on $\tilde{\Phi}$, whose number of components $A_r$ is determined by the PCA algorithm. $\hat{\Phi}_r = T_r P_r^T$ is obtained.
5. $\tilde{\Phi}_r = \tilde{\Phi} - \hat{\Phi}_r$.

According to Algorithm 2, $\hat{\Phi}$ and $\tilde{\Phi}$ are decomposed into $\hat{\Phi}_y$, $\hat{\Phi}_o$, $\hat{\Phi}_r$, and $\tilde{\Phi}_r$. $\hat{\Phi}_y$, $\hat{\Phi}_o$, and $\hat{\Phi}_r$ are monitored by $T^2$ statistics to observe the variation in these subspaces. At the same time, the SPE statistic is designed for the residual subspace $\tilde{\Phi}_r$. Although the T-KPLS method realizes a further decomposition of the subspaces, it only focuses on the $\hat{Y}$ predicted from the process variables and decomposes the process variables into four subspaces, whereas the unpredicted part of $Y$ is not analyzed.

Another improvement of PLS is the C-PLS method proposed by Qin and Zheng [13] on the basis of PLS to obtain the principal component and the residual components of the unpredicted part of $Y$, and it also extracts

the components of $\bar{X}$ that have no contribution to the prediction of $Y$ and the residual components $\tilde{X}$ with some potential relation to the quality variable $Y$. The corresponding nonlinear method is proposed by Zhang et al. [57]. The decomposition is realized as

$$\bar{\Phi} = \hat{\Phi} + \hat{\Phi}_x + \tilde{\Phi}, \tag{1.10}$$

$$Y = \hat{Y} + \hat{Y}_y + \tilde{Y}. \tag{1.11}$$

The detailed algorithm is shown in Algorithm 3.

Algorithm 3: The principle of C-KPLS
1. Perform KPLS and obtain matrices $T$, $U$, and $Q$.
2. Conduct SVD on $\hat{Y} = T Q^T$ so that $\hat{Y} = T D V = T Q^T$ and $P = P Q^T V D^{-1}$.
3. $\tilde{Y}_c = Y - \hat{Y}$. Conduct PCA on $\tilde{Y}_c$ with $A_y$ principal components, so $\tilde{Y}_c = T_y P_y^T + \tilde{Y}$. It has $t_y = P_y^T \tilde{y}_c$ and $e_y = (I - P_y P_y^T) \tilde{y}_c$.
4. $\tilde{\Phi}_c = \bar{\Phi} - T P^T$. Conduct PCA on $\tilde{\Phi}_c$. It holds that $\tilde{\Phi}_c = T_x P_x^T + \tilde{\Phi}$, whose number of principal components is $A_x$. In addition, $e_x = (I - P_x P_x^T) \tilde{\phi}_c$.

Through Algorithm 3, Eq. (1.10) and Eq. (1.11) can be rewritten as

$$\bar{\Phi} = \hat{\Phi}_c + \hat{\Phi}_x + \tilde{\Phi} = T P^T + T_x P_x^T + \tilde{\Phi}, \tag{1.12}$$

$$Y = \hat{Y} + \hat{Y}_y + \tilde{Y} = T Q^T + T_y P_y^T + \tilde{Y}. \tag{1.13}$$

Here, $T_x$ describes the variation in process variables unrelated to $Y$ and is monitored by the $T_x^2$ statistic for quality-unrelated faults. $\tilde{\Phi}$ is the residual part of the process variables that contains the variation potentially related to $Y$; the $SPE_x$ statistic is applied to $\tilde{\Phi}$ to monitor faults potentially related to the quality variables. There also remains variation in $Y$ unpredicted by the process variables, which is taken into account by $T_y$ and $\tilde{Y}$. The $T_y^2$ and $SPE_y$ statistics are designed for the principal components of the unpredicted quality variables $T_y$ and the residual part of the unpredicted quality variables $\tilde{Y}$, respectively.
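The kernel construction of Eqs. (1.3) and (1.4) and the score iteration of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names `gaussian_kernel`, `center_kernel`, and `kpls_scores`, the tolerance, and the iteration cap are hypothetical. The iteration is written in kernel form, using $t_i = \bar{\Phi}_i \bar{\Phi}_i^T u_i = \bar{K}_i u_i$, so that $\bar{\Phi}$ never has to be formed explicitly; the deflation of $\bar{\Phi}_i$ in step 7 then becomes a two-sided deflation of $\bar{K}_i$.

```python
import numpy as np

def gaussian_kernel(X, c):
    """K[i, j] = exp(-||x_i - x_j||^2 / c), as in Eq. (1.3)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / c)   # clip tiny negative round-off

def center_kernel(K):
    """K_bar = (I - 11^T/N) K (I - 11^T/N), as in Eq. (1.4)."""
    N = K.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N
    return J @ K @ J

def kpls_scores(K_bar, Y, A, tol=1e-10, max_iter=500):
    """Kernel form of Algorithm 1 (assumed sketch).

    Returns the score matrix T (N x A) and the loading matrix Q (l x A).
    """
    K_i = K_bar.copy()
    Y_i = np.asarray(Y, dtype=float).copy()
    N = K_bar.shape[0]
    T, Q = [], []
    for _ in range(A):
        u = Y_i[:, 0].copy()                 # step 1: first column of Y_i
        t_old = np.zeros(N)
        for _ in range(max_iter):
            t = K_i @ u                      # steps 2-3: t = Phi_bar Phi_bar^T u
            t /= np.linalg.norm(t)
            q = Y_i.T @ t                    # step 4
            u = Y_i @ q                      # step 5
            u /= np.linalg.norm(u)
            if np.linalg.norm(t - t_old) < tol:
                break                        # step 6: t has converged
            t_old = t
        D = np.eye(N) - np.outer(t, t)       # step 7: deflation
        K_i = D @ K_i @ D
        Y_i = D @ Y_i
        T.append(t)
        Q.append(q)
    return np.column_stack(T), np.column_stack(Q)
```

Because each score vector is normalized and every later score is computed from a deflated kernel, the columns of the returned `T` are orthonormal, which is a convenient sanity check on an implementation.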

1.3 Recent research

1.3.1 The KDD algorithm

The KDD algorithm is a simple and effective method to solve the problem of nonlinear quality-related fault detection. The main idea of this algorithm is to decompose the feature matrix into two orthogonal parts directly, according to the full correlation between the feature matrix and the output. This method does not need to construct any regression model, so it is much simpler than other conventional nonlinear methods. In addition, the detection performance is more stable. In this part, the principle of the KDD algorithm is explained in detail.

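The KDD algorithm processes the cross-covariance matrix of $(\bar{\Phi}, Y)$, which can be estimated as follows:

$$\Upsilon = \frac{Y^T \bar{\Phi}}{N - 1}. \tag{1.14}$$

Obviously, $\Upsilon$ contains the full correlation between $\bar{\Phi}$ and $Y$. Performing SVD on $\Upsilon$ gives

$$\Upsilon = V_1 \Sigma_1 U_1^T = V_1 \begin{bmatrix} \Sigma_\lambda & 0 \end{bmatrix} \begin{bmatrix} U_{11}^T \\ U_{12}^T \end{bmatrix} = V_1 \Sigma_\lambda U_{11}^T, \tag{1.15}$$

where $U_{11} \in \mathbb{R}^{M \times s}$, $U_{12} \in \mathbb{R}^{M \times (M - s)}$, and $\Sigma_\lambda = \mathrm{diag}(\lambda_1, \cdots, \lambda_s) \in \mathbb{R}^{s \times s}$. Here $s$ is determined by the number of eigenvalues that cover most of the characteristics in this cross-covariance matrix: if $\lambda_n / \lambda_{n+1} \ge 10$, $s$ is equal to $n$. According to the principle of SVD, it can be obtained that

$$V_1^T V_1 = I_s, \quad U_{11}^T U_{11} = I_s, \quad U_{11}^T U_{12} = 0, \tag{1.16}$$

$$U_{11} U_{11}^T + U_{12} U_{12}^T = I_M. \tag{1.17}$$

In addition, $\bar{\Phi}$ can be projected by $U_{11} U_{11}^T$ and $U_{12} U_{12}^T$ onto two orthogonal parts, as shown in the following formula:

$$\hat{\Phi} = \bar{\Phi} U_{11} U_{11}^T, \quad \tilde{\Phi} = \bar{\Phi} U_{12} U_{12}^T. \tag{1.18}$$

Therefore, the remaining task is to calculate $U_{11} U_{11}^T$ and $U_{12} U_{12}^T$. From Eq. (1.14), we have

$$\Upsilon^T \Upsilon = \frac{\bar{\Phi}^T Y Y^T \bar{\Phi}}{(N - 1)^2} = \frac{\bar{\Phi}^T \Pi \bar{\Phi}}{(N - 1)^2}, \tag{1.19}$$

where

$$\Pi = Y Y^T = \begin{bmatrix} \pi_{11} & \cdots & \pi_{1N} \\ \vdots & \ddots & \vdots \\ \pi_{N1} & \cdots & \pi_{NN} \end{bmatrix}.$$

Then $\Upsilon^T \Upsilon$ is decomposed by SVD as

$$\Upsilon^T \Upsilon = V_2 \Sigma_2 U_2^T = \begin{bmatrix} V_{21} & V_{22} \end{bmatrix} \begin{bmatrix} \Sigma_\lambda^2 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_{21}^T \\ V_{22}^T \end{bmatrix} = V_{21} \Sigma_\lambda^2 V_{21}^T, \tag{1.20}$$

where $\Sigma_\lambda^2 = \mathrm{diag}(\lambda_1^2, \cdots, \lambda_s^2) \in \mathbb{R}^{s \times s}$. It is clear that

$$\mathrm{col}(V_{21}) = \mathrm{col}(U_{11}). \tag{1.21}$$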


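Consequently, it holds that

$$U_{11} U_{11}^T = V_{21} V_{21}^T, \quad U_{11}^T \Theta U_{11} = V_{21}^T \Theta V_{21}, \tag{1.22}$$

where $\Theta$ is an arbitrary real symmetric matrix with proper rows and columns. Here, $V_{21}$ can be expressed as

$$V_{21} = [v_1, \cdots, v_s]. \tag{1.23}$$

At the same time, it has

$$V_{21}^T V_{21} = I_s. \tag{1.24}$$

Combined with Eq. (1.20), we have

$$\lambda_i v_i = \Upsilon^T \Upsilon v_i, \tag{1.25}$$

where $v_i \in V_{21}$, $\lambda_i \in \{\lambda_1^2, \cdots, \lambda_s^2\}$, $i = 1, \cdots, s$. According to Eq. (1.19), it holds that

$$\Upsilon^T \Upsilon = \frac{\bar{\Phi}^T \Pi \bar{\Phi}}{(N - 1)^2} = \frac{1}{(N - 1)^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \pi_{ji} \bar{\phi}(x_j) \bar{\phi}^T(x_i), \tag{1.26}$$

so Eq. (1.25) can be rewritten as the following expression:

$$\lambda v = \Upsilon^T \Upsilon v = \frac{1}{(N - 1)^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \pi_{ji} \bar{\phi}(x_j) \beta_i, \tag{1.27}$$

where $\beta_i = \bar{\phi}^T(x_i) v \in \mathbb{R}$ is a scalar. Thus,

$$v = \frac{1}{(N - 1)^2} \sum_{i=1}^{N} \frac{\beta_i}{\lambda} \sum_{j=1}^{N} \pi_{ji} \bar{\phi}(x_j) = \frac{1}{(N - 1)^2} \sum_{i=1}^{N} u_i \sum_{j=1}^{N} \pi_{ji} \bar{\phi}(x_j) = \frac{1}{(N - 1)^2} \bar{\Phi}^T \Pi u, \tag{1.28}$$

where $u_i = \beta_i / \lambda \in \mathbb{R}$ is a scalar and $u = [u_1, \cdots, u_N]^T \in \mathbb{R}^N$ is a column vector. Then we have

$$\lambda \bar{\phi}^T(x_m) v = \frac{1}{(N - 1)^2} \lambda \bar{\phi}^T(x_m) \bar{\Phi}^T \Pi u = \frac{1}{(N - 1)^2} \lambda \bar{k}_m^T \Pi u, \tag{1.29}$$

$$\bar{\phi}^T(x_m) \Upsilon^T \Upsilon v = \frac{1}{(N - 1)^2} \bar{\phi}^T(x_m) \bar{\Phi}^T \Pi \bar{\Phi} v = \frac{1}{(N - 1)^4} \bar{k}_m^T \Pi \bar{K} \Pi u, \tag{1.30}$$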


where $m = 1, \cdots, N$. In accordance with Eqs. (1.27), (1.29), and (1.30), it is clear that

$$\lambda \bar{K} \Pi u = \frac{1}{(N - 1)^2} \bar{K} \Pi \bar{K} \Pi u. \tag{1.31}$$

Thus,

$$\lambda u = \frac{\bar{K} \Pi}{(N - 1)^2} u. \tag{1.32}$$

After solving the eigenvalue-eigenvector problem of Eq. (1.32), the $s$ largest eigenvalues ($\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_s$) are selected with their corresponding eigenvectors $u_1, u_2, \cdots, u_s$ to calculate $V_{21}$. According to Eq. (1.23) and Eq. (1.28), $V_{21}$ is obtained as follows:

$$V_{21} = [v_1, \cdots, v_s] = \frac{1}{(N - 1)^2} \bar{\Phi}^T \Pi \hat{U}, \tag{1.33}$$

where $\hat{U} = [u_1, \cdots, u_s]$. Here, $V_{21}$ can be calculated. On the basis of Eq. (1.17) and Eq. (1.22), it can be noted that $U_{11} U_{11}^T$ and $U_{12} U_{12}^T$ can be computed indirectly from $V_{21}$.

For online testing, the online sample $x_{new} \in \mathbb{R}^{m \times 1}$ is first mapped into the feature space $F$ to get $\phi(x_{new}) \in \mathbb{R}^{M \times 1}$. The zero-mean centralized version of $\phi(x_{new})$ is $\bar{\phi}(x_{new}) = \phi(x_{new}) - \bar{\phi}$, where $\bar{\phi}$ is the mean of the mapped training samples. The online kernel sample can be calculated as $\bar{k}_{new}^T = \bar{\phi}^T(x_{new}) \bar{\Phi}^T$. According to Eq. (1.18), the online space can be decomposed into two subspaces:

$$\hat{\phi}^T(x_{new}) = \bar{\phi}^T(x_{new}) U_{11} U_{11}^T, \tag{1.34}$$

$$\tilde{\phi}^T(x_{new}) = \bar{\phi}^T(x_{new}) U_{12} U_{12}^T. \tag{1.35}$$

Then the $T^2$ statistic in the subspace corresponding to $\hat{\Phi}$ is defined as

$$T_{kdd}^2 = \bar{\phi}^T(x_{new}) U_{11} \left( \frac{U_{11}^T \bar{\Phi}^T \bar{\Phi} U_{11}}{N - 1} \right)^{-1} U_{11}^T \bar{\phi}(x_{new}). \tag{1.36}$$

According to Eq. (1.22) and Eq. (1.33), $T_{kdd}^2$ can be calculated as

$$T_{kdd}^2 = \bar{\phi}^T(x_{new}) V_{21} \left( \frac{V_{21}^T \bar{\Phi}^T \bar{\Phi} V_{21}}{N - 1} \right)^{-1} V_{21}^T \bar{\phi}(x_{new}) = \bar{k}_{new}^T \Pi \hat{U} \left( \frac{\hat{U}^T \Pi^T \bar{K} \bar{K}^T \Pi \hat{U}}{N - 1} \right)^{-1} \hat{U}^T \Pi^T \bar{k}_{new}, \tag{1.37}$$

whose corresponding threshold is

$$J_{th, T_{kdd}^2} = \frac{s (N^2 - 1)}{N (N - s)} F_\alpha(s, N - s). \tag{1.38}$$

Similarly, the SPE statistic corresponding to $\tilde{\Phi}$ is calculated as follows:

$$SPE_{kdd} = \tilde{\phi}^T(x_{new}) \tilde{\phi}(x_{new}) = \bar{\phi}^T(x_{new}) \bar{\phi}(x_{new}) - 2 \bar{\phi}^T(x_{new}) U_{11} U_{11}^T \bar{\phi}(x_{new}) + \bar{\phi}^T(x_{new}) U_{11} U_{11}^T U_{11} U_{11}^T \bar{\phi}(x_{new}) = C_0 - C_1 + C_2, \tag{1.39}$$

where

$$C_0 = \bar{\phi}^T(x_{new}) \bar{\phi}(x_{new}) = \left(\phi(x_{new}) - \bar{\phi}\right)^T \left(\phi(x_{new}) - \bar{\phi}\right) = \phi^T(x_{new}) \phi(x_{new}) - \frac{2}{N} k_{new}^T 1_N + \frac{1}{N^2} 1_N^T K 1_N, \tag{1.40}$$

$$C_1 = 2 \bar{\phi}^T(x_{new}) U_{11} U_{11}^T \bar{\phi}(x_{new}) = \frac{2}{(N - 1)^4} \bar{k}_{new}^T \Pi \hat{U} \hat{U}^T \Pi^T \bar{k}_{new}, \tag{1.41}$$

$$C_2 = \bar{\phi}^T(x_{new}) U_{11} U_{11}^T U_{11} U_{11}^T \bar{\phi}(x_{new}) = \frac{1}{(N - 1)^8} \bar{k}_{new}^T \Pi \hat{U} \hat{U}^T \Pi^T \bar{K} \Pi \hat{U} \hat{U}^T \Pi^T \bar{k}_{new}. \tag{1.42}$$

The threshold of SPE is

$$J_{th, SPE_{kdd}} = g \chi_\alpha^2(h), \tag{1.43}$$

where

$$g = \frac{\overline{SPE_{kdd}^2} - \overline{SPE}_{kdd}^2}{2 \overline{SPE}_{kdd}}, \quad h = \frac{2 \overline{SPE}_{kdd}^2}{\overline{SPE_{kdd}^2} - \overline{SPE}_{kdd}^2}, \tag{1.44}$$

$$\overline{SPE}_{kdd} = \frac{1}{N} \sum_{i=1}^{N} \left( \bar{C}_0 - \bar{C}_1 + \bar{C}_2 \right), \tag{1.45}$$

$$\overline{SPE_{kdd}^2} = \frac{1}{N} \sum_{i=1}^{N} \left( \bar{C}_0 - \bar{C}_1 + \bar{C}_2 \right)^2. \tag{1.46}$$

$\bar{C}_n$ ($n = 0, 1, 2$) has the same form as $C_n$, except that the testing samples $\bar{k}_{new}$ and $k_{new}$ are replaced by the training samples $\bar{k}_i$ and $k_i$, respectively.

Based on the statistics, a fault is assumed to exist when a statistic exceeds its threshold. If the $T_{kdd}^2$ statistic exceeds $J_{th, T_{kdd}^2}$, at least one quality-related fault has occurred. If only the $SPE_{kdd}$ statistic exceeds $J_{th, SPE_{kdd}}$ and $T_{kdd}^2$ raises no alarm, at least one quality-unrelated fault has occurred. The detailed algorithm is presented in Algorithm 4.
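The moment-matching parameters $g$ and $h$ of Eqs. (1.44) through (1.46) depend only on the SPE values of the training samples, so they are easy to illustrate. The following is a small sketch, assuming the training SPE values have already been computed; the function name `spe_moment_matching` is hypothetical. The threshold itself would then be $g$ times the $\alpha$-quantile of a chi-square distribution with $h$ degrees of freedom, which a statistics library would supply (not shown here).

```python
import numpy as np

def spe_moment_matching(spe_train):
    """Return (g, h) from training SPE values, following Eqs. (1.44)-(1.46)."""
    spe = np.asarray(spe_train, dtype=float)
    mean = spe.mean()                  # sample mean of SPE, Eq. (1.45)
    mean_sq = np.mean(spe**2)          # sample mean of SPE^2, Eq. (1.46)
    var = mean_sq - mean**2            # sample variance of SPE
    g = var / (2.0 * mean)             # scale of the chi-square approximation
    h = 2.0 * mean**2 / var            # degrees of freedom
    return g, h
```

By construction $g h$ equals the sample mean of the SPE and $2 g^2 h$ equals its sample variance, so the scaled chi-square distribution matches the first two moments of the training SPE.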


Algorithm 4: The KDD-based fault detection method

The training process:
1) Perform the nonlinear mapping on the training samples, and use the kernel function to process the mapping matrix. Normalize the training kernel matrix $K$ and obtain $\bar{K}$.
2) Obtain $\Pi$ according to Eq. (1.19), and solve the eigenvalue-eigenvector problem of Eq. (1.32).
3) Form the matrix $\hat{U}$ and calculate $V_{21}$ according to Eq. (1.33).
4) Set the confidence limit $\alpha$, and calculate the thresholds with Eq. (1.38) and Eq. (1.43).

The testing process:
1) Input the testing samples, and build the kernel matrix. Normalize the testing samples to get $\bar{K}_{new}$.
2) Calculate the statistics $T_{kdd,n}^2$ and $SPE_{kdd,n}$ of each process sample, according to Eq. (1.37) and Eq. (1.39), respectively.
3) Compare the statistics with their corresponding thresholds for each process sample, and make the judgment by the following logic:
   if $T_{kdd,n}^2 \le J_{th, T_{kdd}^2}$ and $SPE_{kdd,n} \le J_{th, SPE_{kdd}}$ ⇒ fault free;
   if $T_{kdd,n}^2 > J_{th, T_{kdd}^2}$ ⇒ quality-related fault occurs;
   if $SPE_{kdd,n} > J_{th, SPE_{kdd}}$ ⇒ quality-unrelated fault occurs.
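The core of the training and testing steps can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names `kdd_train` and `kdd_t2` are hypothetical, the inputs are assumed to be an already centered kernel matrix and centered quality data, and the thresholds and the SPE statistic are omitted. The first function solves the eigenproblem of Eq. (1.32); the second evaluates the kernel form of the $T^2$ statistic in Eq. (1.37), in which the scalar factors $1/(N-1)^2$ of Eq. (1.33) cancel.

```python
import numpy as np

def kdd_train(K_bar, Y, s):
    """Solve lambda * u = K_bar @ Pi @ u / (N-1)^2, Eq. (1.32).

    Returns the s leading eigenvalues, the matching eigenvectors (columns
    of U_hat), and Pi = Y Y^T.
    """
    N = K_bar.shape[0]
    Pi = Y @ Y.T
    # K_bar @ Pi is not symmetric, so a general eigen-solver is used; its
    # nonzero eigenvalues are real because both factors are positive
    # semidefinite.
    eigvals, eigvecs = np.linalg.eig(K_bar @ Pi / (N - 1) ** 2)
    order = np.argsort(-eigvals.real)[:s]
    return eigvals.real[order], eigvecs.real[:, order], Pi

def kdd_t2(k_bar_new, K_bar, Pi, U_hat):
    """Kernel form of the T^2 statistic, Eq. (1.37), for one test sample.

    k_bar_new is the centered kernel vector Phi_bar @ phi_bar(x_new).
    """
    N = K_bar.shape[0]
    B = K_bar @ Pi @ U_hat               # proportional to Phi_bar @ V21
    z = U_hat.T @ Pi @ k_bar_new         # proportional to V21^T phi_bar(x_new)
    return float(z @ np.linalg.solve(B.T @ B / (N - 1), z))
```

With an explicit, finite-dimensional feature map one can check such a sketch directly: the columns of $\bar{\Phi}^T \Pi \hat{U} / (N-1)^2$ should be eigenvectors of $\Upsilon^T \Upsilon$, and the kernel form of $T^2$ should agree with the feature-space form of Eq. (1.36).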

1.3.2 The KLS-based approach

This section introduces a KLS-based nonlinear method. Similar to the KPLS-based approaches, this method deals with the nonlinear relationships among process variables by mapping the original variables into a high-dimensional feature space. Then a KLS model is built to extract the full correlation between the process matrix and the quality matrix, which ensures the accuracy of the designed method. KPLS theory is not applied in this process, and it is unnecessary to set the number of latent variables, which simplifies engineering implementation. Afterward, the feature matrix is decomposed into two orthogonal parts by SVD according to its full correlation with the quality matrix. Finally, test statistics are designed in the subspaces corresponding to the two decomposed parts for the purpose of quality-related fault detection. Detailed descriptions of the KLS method are presented in the following.

First, the original training samples $X$ are mapped into the high-dimensional feature space $F$ to obtain $\Phi$. The proposed method aims to extract the full correlation between $\bar{\Phi}$ and $Y$ by least squares regression, and then utilizes the full correlation to decompose $\bar{\Phi}$ and $Y$ into the following forms:

$$\bar{\Phi} = \hat{\Phi} + \tilde{\Phi}, \tag{1.47}$$

$$Y = \hat{Y} + E_y, \tag{1.48}$$

where $\hat{Y}$ is the full part of $Y$ that is correlated with $\bar{\Phi}$, whereas $E_y$ is the noise or disturbance that is completely uncorrelated with $\bar{\Phi}$. $\hat{\Phi}$ and $\tilde{\Phi}$ are orthogonal, and $\hat{\Phi}$ is fully responsible for predicting $\hat{Y}$, whereas $\tilde{\Phi}$ has no relationship with $\hat{Y}$. As such, we realize the decomposition of $\bar{\Phi}$ according to $Y$ without losing any correlation between them.

$\hat{Y}$ can be expressed as

$$\hat{Y} = \bar{\Phi} M_{kls}, \tag{1.49}$$

where $M_{kls}$ is the regression coefficient matrix. It can be noted that $E_y$ is completely unrelated to $\bar{\Phi}$, so it has

$$\mathrm{cov}\left(e_y, \bar{\phi}(x)\right) = \varepsilon\left\{e_y \bar{\phi}^T(x)\right\} = 0, \tag{1.50}$$

where $e_y^T$ and $\bar{\phi}^T(x)$ are the row vectors of $E_y$ and $\bar{\Phi}$, respectively. In this model, $N \ge m > l \ge 1$. According to Eq. (1.48) and Eq. (1.49), it holds that

$$\frac{1}{N} Y^T \bar{\Phi} = M_{kls}^T \frac{\bar{\Phi}^T \bar{\Phi}}{N} + \frac{1}{N} E_y^T \bar{\Phi} \approx M_{kls}^T \frac{\bar{\Phi}^T \bar{\Phi}}{N}, \tag{1.51}$$

based on which $M_{kls}$ can be obtained with the pseudo-inverse function $(\cdot)^\dagger$:

$$M_{kls} = \left(\bar{\Phi}^T \bar{\Phi}\right)^\dagger \bar{\Phi}^T Y. \tag{1.52}$$

On the basis of the definition of the pseudo-inverse, $(\bar{\Phi}^T \bar{\Phi})^\dagger$ can be obtained as a matrix $Z$ of the same size as $\bar{\Phi}^T \bar{\Phi}$ that satisfies $(\bar{\Phi}^T \bar{\Phi}) Z (\bar{\Phi}^T \bar{\Phi}) = \bar{\Phi}^T \bar{\Phi}$ and $Z (\bar{\Phi}^T \bar{\Phi}) Z = Z$.

To realize the pseudo-inverse calculation, the following process should be conducted. First, SVD is performed on $\bar{\Phi}^T \bar{\Phi}$:

$$\bar{\Phi}^T \bar{\Phi} = \begin{bmatrix} W_1 & W_2 \end{bmatrix} \begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} W_1^T \\ W_2^T \end{bmatrix} = W_1 \Lambda W_1^T, \tag{1.53}$$

where $\Lambda$ is a diagonal matrix with the $A$ nonzero singular values, whereas $W_1$ contains the corresponding $A$ singular vectors. The remaining singular vectors are collected in $W_2$. In accordance with the principle of SVD, $W_1$ satisfies

$$W_1^T W_1 = I_A. \tag{1.54}$$

Here, define

$$O = \bar{\Phi}^T \bar{\Phi} \tag{1.55}$$

and

$$Z = W_1 \Lambda^{-1} W_1^T. \tag{1.56}$$

Considering Eq. (1.53) and Eq. (1.54), it holds that

$$O Z O = \left(W_1 \Lambda W_1^T\right) \left(W_1 \Lambda^{-1} W_1^T\right) \left(W_1 \Lambda W_1^T\right) = W_1 \Lambda W_1^T = O \tag{1.57}$$


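and

$$Z O Z = \left(W_1 \Lambda^{-1} W_1^T\right) \left(W_1 \Lambda W_1^T\right) \left(W_1 \Lambda^{-1} W_1^T\right) = W_1 \Lambda^{-1} W_1^T = Z. \tag{1.58}$$

It is obvious that $Z$ is the pseudo-inverse of $O$. Therefore, it has

$$\left(\bar{\Phi}^T \bar{\Phi}\right)^\dagger = W_1 \Lambda^{-1} W_1^T. \tag{1.59}$$

Then, according to the SVD of $\bar{\Phi}^T \bar{\Phi}$, as shown in Eq. (1.53), the following theorem can be acquired.

Theorem 1. Define $w \in W_1$ and $\lambda \in \mathrm{diag}\{\Lambda\}$. It holds that

$$\lambda w = \bar{\Phi}^T \bar{\Phi} w. \tag{1.60}$$

Proof. Based on Eq. (1.53) and Eq. (1.54), we have

$$\bar{\Phi}^T \bar{\Phi} W_1 = W_1 \Lambda W_1^T W_1 = W_1 \Lambda, \tag{1.61}$$

which can be rewritten as

$$\left[\bar{\Phi}^T \bar{\Phi} w_1, \cdots, \bar{\Phi}^T \bar{\Phi} w_A\right] = [w_1, \cdots, w_A] \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_A \end{bmatrix} = [\lambda_1 w_1, \cdots, \lambda_A w_A]. \tag{1.62}$$

Therefore, Eq. (1.60) holds.

On the basis of this theorem, it has

$$\lambda w = \bar{\Phi}^T \bar{\Phi} w = \sum_{i}^{N} \bar{\phi}(x_i) \bar{\phi}^T(x_i) w = \sum_{i}^{N} \bar{\phi}(x_i) \beta_i, \tag{1.63}$$

where $\beta_i = \bar{\phi}^T(x_i) w$ is a scalar. It can be further transformed as

$$w = \sum_{i}^{N} \frac{\beta_i}{\lambda} \bar{\phi}(x_i) = \sum_{i}^{N} u_i \bar{\phi}(x_i) = \bar{\Phi}^T u, \tag{1.64}$$

where $u_i = \beta_i / \lambda$ is also a scalar, and $u = [u_1, \cdots, u_N]^T \in \mathbb{R}^{N \times 1}$.

As with other nonlinear algorithms, the Gaussian kernel method is introduced here to realize the calculation. According to Eq. (1.64), for $n = 1, \cdots, N$, multiply both sides of Eq. (1.60) by $\bar{\phi}^T(x_n)$. Thus, the left side turns to

$$\lambda \bar{\phi}^T(x_n) w = \lambda \bar{\phi}^T(x_n) \bar{\Phi}^T u = \lambda \bar{k}_n^T u, \tag{1.65}$$

whereas the right side changes to

$$\bar{\phi}^T(x_n) \bar{\Phi}^T \bar{\Phi} w = \bar{\phi}^T(x_n) \bar{\Phi}^T \bar{\Phi} \bar{\Phi}^T u = \bar{k}_n^T \bar{K} u. \tag{1.66}$$

Thus, it has

$$\lambda \bar{k}_n^T u = \bar{k}_n^T \bar{K} u. \tag{1.67}$$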


For all of the $N$ samples, the equation can be expressed as

$$\lambda \bar{K} u = \bar{K} \bar{K} u. \tag{1.68}$$

Finally, it has

$$\lambda u = \bar{K} u. \tag{1.69}$$

So far, $\Lambda$ and $U$ can be calculated by solving the eigenvalue-eigenvector problem of Eq. (1.69). The largest $A$ eigenvalues are selected as the main elements of the diagonal matrix $\Lambda$, and the corresponding $A$ eigenvectors form the matrix $U$. With $U$, $W_1$ can be obtained:

$$W_1 = \bar{\Phi}^T U^T. \tag{1.70}$$

Therefore, the pseudo-inverse calculation of Eq. (1.59) can be transformed as

$$\left(\bar{\Phi}^T \bar{\Phi}\right)^\dagger = \bar{\Phi}^T \Omega \bar{\Phi}, \tag{1.71}$$

where

$$\Omega = U^T \Lambda^{-1} U. \tag{1.72}$$

In addition, Eq. (1.52) is rewritten as

$$M_{kls} = \left(\bar{\Phi}^T \bar{\Phi}\right)^\dagger \bar{\Phi}^T Y = \bar{\Phi}^T \Omega \bar{\Phi} \bar{\Phi}^T Y = \bar{\Phi}^T \Omega \bar{K} Y. \tag{1.73}$$
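The kernel route to the pseudo-inverse in Eqs. (1.69) through (1.73) can be illustrated numerically. The sketch below is an assumption-laden illustration, not the authors' implementation: the function names `kls_omega` and `kls_fit_predict`, the name `Omega` for the $N \times N$ weighting matrix of Eq. (1.72), and the relative tolerance used to discard numerically zero eigenvalues are all hypothetical. Eigenvectors are stored as columns here, and each is rescaled to norm $\lambda^{-1/2}$, which is the scaling implied by $u_i = \beta_i / \lambda$ when $w$ has unit norm.

```python
import numpy as np

def kls_omega(K_bar, rel_tol=1e-10):
    """Weighting matrix Omega such that pinv(Phi_bar^T Phi_bar) = Phi_bar^T Omega Phi_bar."""
    lam, U = np.linalg.eigh(K_bar)            # Eq. (1.69): lambda * u = K_bar @ u
    keep = lam > rel_tol * lam.max()          # keep the A nonzero eigenvalues
    lam, U = lam[keep], U[:, keep]
    U = U / np.sqrt(lam)                      # rescale so that Phi_bar^T u has unit norm
    return (U / lam) @ U.T                    # U diag(1/lambda) U^T, cf. Eq. (1.72)

def kls_fit_predict(K_bar, Y):
    """Fitted quality values Y_hat = Phi_bar @ M_kls = K_bar @ Omega @ K_bar @ Y."""
    Omega = kls_omega(K_bar)
    return K_bar @ Omega @ K_bar @ Y
```

Only kernel quantities appear in both functions, so the feature matrix never has to be formed; for a small explicit feature map the result can be compared against `np.linalg.pinv` applied to $\bar{\Phi}^T \bar{\Phi}$.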

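At this point, $Y$ has been decomposed by $M_{kls}$ completely. In the following process, $\bar{\Phi}$ is projected onto the space $\mathrm{span}\{M_{kls}\}$ and the corresponding orthogonal complement space $\mathrm{span}\{M_{kls}\}^\perp$. It has

$$\hat{\Phi} \equiv \mathrm{span}\{M_{kls}\}, \tag{1.74}$$

$$\tilde{\Phi} \equiv \mathrm{span}\{M_{kls}\}^\perp, \tag{1.75}$$

where

$$\tilde{\Phi} M_{kls} = 0. \tag{1.76}$$

Thus,

$$\hat{Y} = \bar{\Phi} M_{kls} = \left(\hat{\Phi} + \tilde{\Phi}\right) M_{kls} = \hat{\Phi} M_{kls}. \tag{1.77}$$

The preceding projection can be conducted by the following steps:

(1) Perform SVD on $M_{kls} M_{kls}^T$:

$$M_{kls} M_{kls}^T = \begin{bmatrix} \Gamma_{W_1} & \Gamma_{W_2} \end{bmatrix} \begin{bmatrix} \Lambda_W & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \Gamma_{W_1}^T \\ \Gamma_{W_2}^T \end{bmatrix} = \Gamma_{W_1} \Lambda_W \Gamma_{W_1}^T, \tag{1.78}$$

where $\Gamma_{W_1} = [\Gamma_{W,1}, \cdots, \Gamma_{W,A_W}]$ and $\Gamma_{W_2} = [\Gamma_{W,A_W+1}, \cdots, \Gamma_{W,N}]$. $A_W$ is the number of nonzero singular values.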

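(2) Project $\bar{\Phi}$ by $\Gamma_{W_1} \Gamma_{W_1}^T$ and $\Gamma_{W_2} \Gamma_{W_2}^T$, respectively, where $\Gamma_{W_1}$ and $\Gamma_{W_2}$ are the singular vector blocks of Eq. (1.78). It has

$$\hat{\Phi} = \bar{\Phi} \Gamma_{W_1} \Gamma_{W_1}^T, \tag{1.79}$$

$$\tilde{\Phi} = \bar{\Phi} \Gamma_{W_2} \Gamma_{W_2}^T. \tag{1.80}$$

Based on the properties of SVD, it holds that

$$\Gamma_{W_1} \Gamma_{W_1}^T + \Gamma_{W_2} \Gamma_{W_2}^T = I \tag{1.81}$$

and

$$\Gamma_{W_1}^T \Gamma_{W_2} = \Gamma_{W_2}^T \Gamma_{W_1} = 0. \tag{1.82}$$

As a result,

$$\hat{\Phi} + \tilde{\Phi} = \bar{\Phi} \left(\Gamma_{W_1} \Gamma_{W_1}^T + \Gamma_{W_2} \Gamma_{W_2}^T\right) = \bar{\Phi}, \tag{1.83}$$

$$\hat{\Phi} \tilde{\Phi}^T = \bar{\Phi} \Gamma_{W_1} \Gamma_{W_1}^T \Gamma_{W_2} \Gamma_{W_2}^T \bar{\Phi}^T = 0. \tag{1.84}$$

(3) $\hat{Y}$ can be acquired such that

$$\hat{Y} = \bar{\Phi} M_{kls} = \bar{\Phi} \left(\Gamma_{W_1} \Gamma_{W_1}^T + \Gamma_{W_2} \Gamma_{W_2}^T\right) M_{kls} = \hat{\Phi} M_{kls}. \tag{1.85}$$

According to the preceding three steps, the orthogonal decomposition is realized. What remains to be solved is the calculation of $\Gamma_{W_1}$ and $\Gamma_{W_2}$. This solution is provided as follows.

According to Eq. (1.73), $M_{kls} M_{kls}^T$ can be expressed as

$$M_{kls} M_{kls}^T = \bar{\Phi}^T \Omega \bar{K} Y Y^T \bar{K}^T \Omega^T \bar{\Phi} = \bar{\Phi}^T \Pi \bar{\Phi}, \tag{1.86}$$

where $\Omega$ is the matrix defined in Eq. (1.72) and

$$\Pi = \Omega \bar{K} Y Y^T \bar{K}^T \Omega^T = \begin{bmatrix} \pi_{11} & \cdots & \pi_{1N} \\ \vdots & \ddots & \vdots \\ \pi_{N1} & \cdots & \pi_{NN} \end{bmatrix} \in \mathbb{R}^{N \times N}. \tag{1.87}$$

Thus, Eq. (1.86) can be expanded as

$$M_{kls} M_{kls}^T = \sum_{i=1}^{N} \sum_{j=1}^{N} \pi_{j,i} \bar{\phi}(x_j) \bar{\phi}^T(x_i). \tag{1.88}$$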

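Let $\Gamma_W$ be a column of the singular vector block $\Gamma_{W_1}$ of Eq. (1.78), and let $\lambda_W$ be the corresponding singular value. According to Theorem 1, it holds that

$$\lambda_W \Gamma_W = M_{kls} M_{kls}^T \Gamma_W = \sum_{i=1}^{N} \sum_{j=1}^{N} \pi_{j,i} \bar{\phi}(x_j) \bar{\phi}^T(x_i) \Gamma_W = \sum_{i=1}^{N} \sum_{j=1}^{N} \pi_{j,i} \bar{\phi}(x_j) \beta_{W,i}. \tag{1.89}$$

Here, $\beta_{W,i} = \bar{\phi}^T(x_i) \Gamma_W$ is a scalar. Therefore, $\Gamma_W$ can be rewritten as

$$\Gamma_W = \sum_{i=1}^{N} \sum_{j=1}^{N} \pi_{j,i} \bar{\phi}(x_j) \frac{\beta_{W,i}}{\lambda_W} = \sum_{i=1}^{N} u_{W,i} \sum_{j=1}^{N} \pi_{j,i} \bar{\phi}(x_j) = \bar{\Phi}^T \Pi \Gamma_u, \tag{1.90}$$

where $u_{W,i} = \beta_{W,i} / \lambda_W$ is also a scalar, and $\Gamma_u = [u_{W,1}, \cdots, u_{W,N}]^T \in \mathbb{R}^{N \times 1}$. Similar to the principle of Eqs. (1.65) through (1.69), it holds that

$$\lambda_W \bar{\phi}^T(x_n) \Gamma_W = \lambda_W \bar{\phi}^T(x_n) \bar{\Phi}^T \Pi \Gamma_u = \lambda_W \bar{k}_n^T \Pi \Gamma_u, \tag{1.91}$$

$$\bar{\phi}^T(x_n) M_{kls} M_{kls}^T \Gamma_W = \bar{\phi}^T(x_n) \bar{\Phi}^T \Pi \bar{\Phi} \bar{\Phi}^T \Pi \Gamma_u = \bar{k}_n^T \Pi \bar{K} \Pi \Gamma_u, \tag{1.92}$$

where $n = 1, \cdots, N$. For all $N$ samples, this yields

$$\lambda_W \bar{K} \Pi \Gamma_u = \bar{K} \Pi \bar{K} \Pi \Gamma_u. \tag{1.93}$$

The following eigenvalue-eigenvector problem is obtained:

$$\lambda_W \Gamma_u = \bar{K} \Pi \Gamma_u. \tag{1.94}$$

After solving the preceding eigenvalue-eigenvector problem, the largest $A_W$ eigenvalues are selected as the main elements of the diagonal matrix $\Lambda_W$, and the corresponding $A_W$ eigenvectors form the matrix $\Gamma_{u_1}$. The rest of the eigenvectors are collected in $\Gamma_{u_2}$. Based on Eq. (1.78) and Eq. (1.90), it is obtained that

$$\Gamma_{W_1} = \bar{\Phi}^T \Pi \Gamma_{u_1}, \tag{1.95}$$

$$\Gamma_{W_2} = \bar{\Phi}^T \Pi \Gamma_{u_2}. \tag{1.96}$$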

The preceding process describes the detailed derivation of the KLS algorithm, which decomposes the feature space into two orthogonal parts. For the online testing tasks, the following process can be applied. The online samples are first mapped into the high-dimensional feature space, as $x_{new} \in \mathbb{R}^m \rightarrow \phi(x_{new})$. $\Phi_{new}$ is formed and centralized by $\bar\Phi_{new} = \Phi_{new} - \frac{1}{N}1_N1_N^T\Phi_{new}$. The kernel matrix $K_{new} = \Phi_{new}\Phi_{new}^T$ is introduced to realize the calculation, and it can be centralized as $\bar K_{new} = \bar\Phi_{new}\bar\Phi_{new}^T = \left(I_{N\times N} - \frac{1}{N}1_N1_N^T\right)K_{new}\left(I_{N\times N} - \frac{1}{N}1_N1_N^T\right)$. $\bar\phi(x_{new})$ can be decomposed into two orthogonal subspaces as

$$\hat\phi^T(x_{new}) = \bar\phi^T(x_{new})\,\Gamma_{W_1}\Gamma_{W_1}^T, \tag{1.97}$$

$$\tilde\phi^T(x_{new}) = \bar\phi^T(x_{new})\,\Gamma_{W_2}\Gamma_{W_2}^T. \tag{1.98}$$

It can be noticed that $\hat\Phi_{new}$ and $\hat Y$ are mutually correlated. Therefore, monitoring $\hat\phi^T(x_{new})$ provides the fault information related to $Y$. On the contrary, monitoring $\tilde\phi^T(x_{new})$ provides the fault information unrelated to $Y$. Since $A = \mathrm{rank}\left(\Gamma_{W_1}\Gamma_{W_1}^T\bar\Phi^T\right) \le \mathrm{rank}\left(\Gamma_{W_1}^T\bar\Phi^T\right) \le A$, it follows that $\mathrm{rank}\left(\Gamma_{W_1}^T\bar\Phi^T\right) = A$, so $\Gamma_{W_1}^T\bar\phi(x_{new})$ is a suitable candidate for the $T^2$ statistic. The $T^2$ statistic in the subspace corresponding to $\hat\Phi$ is calculated as follows:

$$T^2_{kls} = \bar\phi^T(x_{new})\,\Gamma_{W_1}\left(\frac{\Gamma_{W_1}^T\bar\Phi^T\bar\Phi\,\Gamma_{W_1}}{N-1}\right)^{-1}\Gamma_{W_1}^T\bar\phi(x_{new}) = \bar k_{new}\Pi\,\Gamma_{u_1}\left(\frac{\Gamma_{u_1}^T\Pi^T\bar K\bar K\Pi\,\Gamma_{u_1}}{N-1}\right)^{-1}\Gamma_{u_1}^T\Pi^T\bar k_{new}^T, \tag{1.99}$$

whose threshold can be calculated with training samples by the following formula:

$$J_{T^2_{kls},th} = \frac{A\left(N^2-1\right)}{N(N-A)}F_\alpha(A, N-A). \tag{1.100}$$

The confidence level $\alpha$ is set in advance.

At the same time, $\Gamma_{W_2}^T\bar\phi(x_{new})$ can also be a feasible candidate for a $T^2$ statistic. However, $\tilde\Phi_{new}$ usually represents the residual portion of $\bar\Phi_{new}$, and the variance of $\tilde\Phi_{new}$ is usually very small. To avoid numerical problems in the matrix inversion, it is more suitable to use the SPE statistic instead of the $T^2$ statistic in this subspace. The SPE statistic is calculated as follows:

$$SPE_{kls} = \bar\phi^T(x_{new})\,\Gamma_{W_2}\Gamma_{W_2}^T\bar\phi(x_{new}) = \bar k_{new}\Pi\,\Gamma_{u_2}\Gamma_{u_2}^T\Pi^T\bar k_{new}^T, \tag{1.101}$$

whose threshold can be calculated by

$$J_{SPE_{kls},th} = g\chi^2_\alpha(h), \quad g = \frac{\overline{SPE^2_{kls}} - \overline{SPE_{kls}}^{\,2}}{2\,\overline{SPE_{kls}}}, \quad h = \frac{2\,\overline{SPE_{kls}}^{\,2}}{\overline{SPE^2_{kls}} - \overline{SPE_{kls}}^{\,2}}, \tag{1.102}$$

where the overline denotes the average over the training samples.

Algorithm 5 shows the specific KLS-based fault detection process.


Algorithm 5: The KLS-based fault detection method

The training process:
1) Perform the nonlinear mapping to the training samples, and use the kernel function to process the mapping matrix. Normalize the training samples to zero mean and unit variance, form $K$, and obtain $\bar K$.
2) Solve the eigenvalue-eigenvector problem of Eq. (1.69).
3) Calculate $\Theta$ by Eq. (1.72) and $\Pi$ by Eq. (1.87).
4) Solve the eigenvalue-eigenvector problem of Eq. (1.94). Form $\Gamma_{u_1}$ and $\Gamma_{u_2}$.
5) Set the confidence limit $\alpha$, and calculate the thresholds with Eq. (1.100) and Eq. (1.102).

The testing process:
1) Map the testing samples to the high-dimensional space, and build the kernel matrix. Normalize the testing samples to get $\bar K_{new}$.
2) Calculate the statistics $T^2_{kls,n}$ and $SPE_{kls,n}$ of each process sample, according to Eq. (1.99) and Eq. (1.101), respectively.
3) Compare the statistics with their corresponding thresholds for each process sample, and make the judgment by the following logic:
   if $T^2_{kls,n} \le J_{T^2_{kls},th}$ and $SPE_{kls,n} \le J_{SPE_{kls},th}$ ⇒ fault free;
   if $T^2_{kls,n} > J_{T^2_{kls},th}$ ⇒ quality-related fault occurs;
   if $SPE_{kls,n} > J_{SPE_{kls},th}$ ⇒ quality-unrelated fault occurs.
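The judgment logic of the testing process and the moment-matching parameters $g$ and $h$ of Eq. (1.102) can be sketched as follows. This is only an illustrative sketch: the function names `spe_threshold_params` and `classify` are hypothetical, and the $\chi^2$ and $F$ quantiles themselves are assumed to come from a statistics library of the reader's choice.

```python
import numpy as np

def spe_threshold_params(spe_train):
    """Return (g, h) of Eq. (1.102) from the SPE values of the training samples."""
    spe_train = np.asarray(spe_train, dtype=float)
    mean = spe_train.mean()
    var = (spe_train ** 2).mean() - mean ** 2   # mean(SPE^2) - mean(SPE)^2
    return var / (2.0 * mean), 2.0 * mean ** 2 / var

def classify(t2, spe, t2_th, spe_th):
    """Apply the three-way judgment of Algorithm 5 to one sample."""
    labels = []
    if t2 > t2_th:
        labels.append("quality-related fault")
    if spe > spe_th:
        labels.append("quality-unrelated fault")
    return labels or ["fault free"]

g, h = spe_threshold_params([1.0, 2.0, 3.0])
print(g, h)
print(classify(t2=3.0, spe=1.0, t2_th=2.0, spe_th=2.0))
```

A sample may raise both alarms at once, which is why `classify` returns a list rather than a single label.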

1.3.3 Reconstruction partial derivative contribution plot

Although the contribution plot is the default method in the fault diagnosis field, its diagnostic effectiveness is still seriously degraded by the fault smearing effect. In particular, a traditional contribution plot cannot be used directly for nonlinear fault diagnosis because the relationship between the original variables and the statistical index is cut off. The partial derivative contribution plot method, which uses the kernel gradient as the contribution value, has been proposed to improve the diagnosis results. In this section, the partial derivative contribution plot method is extended by data reconstruction to identify the faulty variables more accurately and suppress the smearing effect.

Here, a column vector is defined as $v \in \mathbb{R}^{m\times 1}$, where the elements $v_i = 1$ $(i = 1, 2, \cdots, m)$. Then the partial derivative contribution for the $l$th variable in the kernel matrix $K$ can be obtained as follows:

$$\frac{\partial K}{\partial v_l} = \begin{bmatrix}\frac{\partial K_{1,1}}{\partial v_l} & \cdots & \frac{\partial K_{1,N}}{\partial v_l}\\ \vdots & \ddots & \vdots\\ \frac{\partial K_{N,1}}{\partial v_l} & \cdots & \frac{\partial K_{N,N}}{\partial v_l}\end{bmatrix}. \tag{1.103}$$


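An element-wise operation is defined as

$$x_i \odot v = \begin{bmatrix}x_{i,1}v_1 & x_{i,2}v_2 & \cdots & x_{i,m}v_m\end{bmatrix}. \tag{1.104}$$

Then the elements in matrix $\frac{\partial K}{\partial v_l}$ can be calculated by the following formula:

$$\begin{aligned}
\frac{\partial K_{i,j}}{\partial v_l} &= \frac{\partial k(x_i, x_j)}{\partial v_l} = \frac{\partial k(x_i \odot v, x_j \odot v)}{\partial v_l}\\
&= \exp\left(-\frac{\|x_i \odot v - x_j \odot v\|^2}{c}\right)\left(-\frac{1}{c}\frac{\partial\|x_i \odot v - x_j \odot v\|^2}{\partial v_l}\right)\\
&= -\frac{1}{c}k(x_i, x_j)\frac{\partial\|x_i \odot v - x_j \odot v\|^2}{\partial v_l}\\
&= -\frac{1}{c}k(x_i, x_j)\frac{\partial\left((x_{i,1}v_1 - x_{j,1}v_1)^2 + \cdots + (x_{i,m}v_m - x_{j,m}v_m)^2\right)}{\partial v_l}\\
&= -\frac{1}{c}k(x_i, x_j)\frac{\partial(x_{i,l}v_l - x_{j,l}v_l)^2}{\partial v_l}\\
&= -\frac{2}{c}k(x_i, x_j)(x_{i,l} - x_{j,l})^2.
\end{aligned} \tag{1.105}$$

The centralized $\frac{\partial K}{\partial v_l}$ can be obtained as follows:

$$\frac{\partial\bar K}{\partial v_l} = \frac{\partial K}{\partial v_l} - \frac{1}{N}1_N1_N^T\frac{\partial K}{\partial v_l} - \frac{1}{N}\frac{\partial K}{\partial v_l}1_N1_N^T + \frac{1}{N^2}1_N1_N^T\frac{\partial K}{\partial v_l}1_N1_N^T. \tag{1.106}$$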
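The closed form of Eq. (1.105) and the centering of Eq. (1.106) can be sketched in NumPy as below. The function names are hypothetical, the variable index `l` is zero-based here, and the data are random placeholders; the sketch compares the analytic gradient with a central finite difference in $v_l$ around $v = 1$.

```python
import numpy as np

def gaussian_kernel(X, c, v=None):
    """Gaussian kernel matrix of the scaled samples x_i (element-wise) v."""
    v = np.ones(X.shape[1]) if v is None else v
    Z = X * v
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / c)

def kernel_gradient(X, c, l):
    """Analytic dK/dv_l evaluated at v = 1, following Eq. (1.105)."""
    K = gaussian_kernel(X, c)
    diff2 = (X[:, l][:, None] - X[:, l][None, :]) ** 2
    return -2.0 / c * K * diff2

def center(G):
    """Double centering of an N x N matrix, as in Eq. (1.106)."""
    N = G.shape[0]
    E = np.ones((N, N)) / N
    return G - E @ G - G @ E + E @ G @ E

rng = np.random.default_rng(0)
X, c, l, eps = rng.standard_normal((5, 3)), 10.0, 1, 1e-6
v_plus, v_minus = np.ones(3), np.ones(3)
v_plus[l] += eps
v_minus[l] -= eps
numeric = (gaussian_kernel(X, c, v_plus) - gaussian_kernel(X, c, v_minus)) / (2 * eps)
print(np.allclose(numeric, kernel_gradient(X, c, l), atol=1e-6))
```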

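The combined index based on the KLS model is used in the diagnosis process:

$$\varphi_{kls} = \frac{T^2_{kls}}{J_{T^2_{kls},th}} + \frac{SPE_{kls}}{J_{SPE_{kls},th}}.$$

The contribution of the $l$th variable to this combined index in the $i$th sample can be computed as

$$cont_{i,l} = \left|\left.\frac{\partial\bar K}{\partial v_l}\right|_i\varphi_{kls}\left(\bar K|_i\right)^T + \bar K|_i\,\varphi_{kls}\left(\left.\frac{\partial\bar K}{\partial v_l}\right|_i\right)^T\right|. \tag{1.107}$$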

The partial derivative contribution values of the training samples conform to the $\chi^2$ distribution. As a result, the threshold of the $l$th variable can be calculated from the training samples by the following formula:

$$C_{UCL,l} = 9\,m(cont_l), \tag{1.108}$$

where $m(cont_l)$ is the average of the contribution values. When the contribution of a variable exceeds its threshold, the variable can be considered a fault variable.

The partial derivative contribution value can, however, also be affected by the smearing effect. Reconstructing the normal values of the testing samples is one way to suppress it, so the reconstruction-based method is applied. The first step of this method is to reconstruct the variables, one single variable at a time. For each reconstructed variable, a detection process is conducted on data consisting of this reconstructed variable and the other original variables. This reconstruction detection process is repeated for every variable to obtain the detection statistics. The variables are then sorted in descending order of their statistics, which gives the sequence $[\eta_1, \ldots, \eta_m]$, where $\eta_1$ is the variable with the maximum statistic value and $\eta_m$ is the variable with the minimum statistic value.

In the diagnosis process, the detection statistic is calculated $q$ times. Each time the statistic is calculated, one more variable is replaced by the reconstructed variable data, in the order given by the sequence $[\eta_1, \ldots, \eta_m]$. If the statistic drops below the corresponding threshold when the first $q$ variables in the sequence have been replaced, these $q$ variables are considered the faulty variables for this fault. When the fault amplitude is small, partial reconstruction may occur: replacing only some of the actual faulty variables already makes the statistic satisfy the termination condition of data reconstruction, so the remaining fault variables are missed. To address this problem, the original termination condition $J_{0.99}$, the threshold with a confidence level of 0.99, is changed into $J_{0.99} - 3s_t$, where $s_t$ is the variance of the statistics.

Using the reconstruction matrix proposed in the work of Yu [32], the predictive value of the fault variables can be obtained as follows:

$$\xi x_{new} = 0, \tag{1.109}$$

where $\xi \in \mathbb{R}^{n_f\times m}$ and $n_f$ is the number of fault variables. In the row vector $\xi_l$, the element corresponding to the $l$th fault variable is 1, and the rest are 0. If the sample $x_{new}$ is transformed to $x_{new} = \Xi x_{new} + (I - \Xi)x_{new}$, then Eq. (1.109) can be expressed as follows:

$$\xi\,\Xi x_{new} = -\xi(I - \Xi)x_{new}, \tag{1.110}$$

where $\Xi$ is a diagonal matrix whose main elements corresponding to the faulty variables are 1, and the others are 0. It holds that $\Xi x_{new} = \xi^Tx_{n_f}$, where $x_{n_f}$ is a column vector comprised of the faulty variables. According to Eq. (1.110), it has that

$$x^*_{n_f} = -\left(\xi\xi^T\right)^{-1}\xi(I - \Xi)x_{new}. \tag{1.111}$$

Next, the corresponding variables in $x_{new}$ are replaced by the resulting $x^*_{n_f}$:

$$\hat x^* = \xi^Tx^*_{n_f} + (I - \Xi)x_{new}, \tag{1.112}$$

where $\hat x^*$ is the fault-free predictive value of the testing sample.

During the diagnosis process, one of the variables is replaced by the fault-free predictive data, which is the reconstructed variable. This new sample contains this reconstructed $l$th variable as follows:

$$x^l_{new} = x_{new} \odot \delta_l + \hat x^* \odot \left(1_m^T - \delta_l\right), \tag{1.113}$$

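where $\delta_l \in \mathbb{R}^{1\times m}$ is a row vector whose $l$th element is 1 and the others are 0. The new kernel vector $k^l_{new}$ can be calculated and centralized as follows:

$$\begin{aligned}
k^l_{new,j} &= k\left(x^l_{new}, x_j\right) = \exp\left(-\frac{\|x^l_{new} - x_j\|^2}{c}\right),\\
\bar k^l_{new} &= k^l_{new} - \frac{1}{N}1_N^TK - \frac{1}{N}k^l_{new}1_N1_N^T + \frac{1}{N^2}1_N^TK1_N1_N^T.
\end{aligned} \tag{1.114}$$

Based on the preceding processing, the contribution of the $l$th variable can be obtained as follows:

$$\frac{\partial k^l_{new}}{\partial v_l} = \begin{bmatrix}\frac{\partial k^l_{new,1}}{\partial v_l} & \cdots & \frac{\partial k^l_{new,N}}{\partial v_l}\end{bmatrix}. \tag{1.115}$$

Then the elements of $\frac{\partial k^l_{new}}{\partial v_l}$ can be calculated as

$$\frac{\partial k^l_{new,j}}{\partial v_l} = \frac{\partial k\left(x^l_{new}, x_j\right)}{\partial v_l} = \frac{\partial k\left(x^l_{new} \odot v, x_j \odot v\right)}{\partial v_l} = -\frac{2}{c}k\left(x^l_{new}, x_j\right)\left(x^l_{new,l} - x_{j,l}\right)^2. \tag{1.116}$$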

For the new testing sample, $K$ is fixed, so it does not influence the contribution. The centralized kernel vector has

$$\frac{\partial\bar k^l_{new}}{\partial v_l} = \frac{\partial k^l_{new}}{\partial v_l} - \frac{1}{N}\frac{\partial k^l_{new}}{\partial v_l}1_N1_N^T. \tag{1.117}$$

On the basis of Eq. (1.107), the contribution of the $l$th variable to the combined index $\varphi_{kls}$ can be calculated as

$$cont_{new,l} = \left|\frac{\partial\bar k^l_{new}}{\partial v_l}\varphi_{kls}\left(\bar k^l_{new}\right)^T + \bar k^l_{new}\,\varphi_{kls}\left(\frac{\partial\bar k^l_{new}}{\partial v_l}\right)^T\right|. \tag{1.118}$$

According to the thresholds of each variable determined by Eq. (1.108), the relative contribution can be acquired by

$$C_{rl} = \frac{cont_{new,l}}{C_{UCL,l}}. \tag{1.119}$$

If the relative contribution of a variable exceeds 1, indicating that the contribution value is larger than the threshold, this variable is selected as a fault variable.
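The thresholding of Eqs. (1.108) and (1.119) amounts to a few array operations. The sketch below assumes the contribution values have already been computed; the helper names `control_limits` and `faulty_variables` and the toy numbers are hypothetical.

```python
import numpy as np

def control_limits(cont_train):
    """C_UCL,l = 9 * mean contribution of variable l over the training samples."""
    return 9.0 * np.asarray(cont_train, dtype=float).mean(axis=0)

def faulty_variables(cont_new, c_ucl):
    """Return the relative contributions and the zero-based indices exceeding 1."""
    rel = np.asarray(cont_new, dtype=float) / c_ucl
    return rel, np.flatnonzero(rel > 1.0)

# Toy data: rows are training samples, columns are variables.
c_ucl = control_limits([[1.0, 2.0], [3.0, 4.0]])
rel, idx = faulty_variables([36.0, 9.0], c_ucl)
print(c_ucl, rel, idx)
```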

1.3.4 Kernel sample equivalent replacement

Kernel-based methods are the mainstream methods for data modeling and fault detection of nonlinear systems. However, they also suffer from some problems when different kernel functions are applied in practice. The most widely used kernel function is the Gaussian kernel function, on which all of the algorithms introduced earlier are based. The Gaussian kernel is constructed on the basis of the Euclidean norm, which makes the relationship between the original variables and the kernel vectors unclear. The partial derivative contribution was proposed to solve this problem. Here, the KSER algorithm is introduced as another way to solve this issue. The key idea of this algorithm is to construct the relationship between the original variable $X$ and the Gaussian kernel matrix $K$. The detailed principle is as follows. The core of this algorithm is to conduct the first-order Taylor expansion of the Gaussian kernel function, which gives

$$k_{i,j} = k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{c}\right) = 1 - \frac{\|x_i - x_j\|^2}{c} + o\left(\|x\|^2\right). \tag{1.120}$$

Then the Gaussian kernel matrix can be expressed as

$$K = 1_N1_N^T - \frac{1}{c}S, \tag{1.121}$$

where $S = [s_{i,j}]$ $(i, j = 1, \cdots, N)$ and $s_{i,j} = \|x_i - x_j\|^2 = x_i^Tx_i + x_j^Tx_j - 2x_i^Tx_j$. The kernel matrix $K$ is centralized as

$$\bar K = \left(I_N - \frac{1}{N}1_N1_N^T\right)K\left(I_N - \frac{1}{N}1_N1_N^T\right) = \left(I_N - \frac{1}{N}1_N1_N^T\right)\left(1_N1_N^T - \frac{1}{c}S\right)\left(I_N - \frac{1}{N}1_N1_N^T\right). \tag{1.122}$$

It holds that

$$\left(I_N - \frac{1}{N}1_N1_N^T\right)1_N = 1_N - \frac{1}{N}1_NN = 0, \qquad 1_N^T\left(I_N - \frac{1}{N}1_N1_N^T\right) = 1_N^T - \frac{1}{N}N1_N^T = 0. \tag{1.123}$$

As mentioned previously, $H = I_N - \frac{1}{N}1_N1_N^T$, so Eq. (1.122) can be simplified as

$$\bar K = H\left(1_N1_N^T - \frac{1}{c}S\right)H. \tag{1.124}$$


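According to Eq. (1.123), $\bar K$ can be rewritten as

$$\bar K = -\frac{1}{c}HSH = -\frac{1}{c}H\left(\mathrm{diag}\left(XX^T\right)1_N^T + 1_N\,\mathrm{diag}\left(XX^T\right)^T - 2XX^T\right)H = \frac{2}{c}\bar X\bar X^T. \tag{1.125}$$

For the testing process, the kernel matrix, similar to Eq. (1.121), can be expressed as follows:

$$K_{new} = 1_{N_t}1_N^T - \frac{1}{c}S_{new}, \tag{1.126}$$

where $S_{new} = \left[s_{new,i} = \|x_{new} - x_i\|^2\right]$ collects the squared distances between the $N_t$ testing samples and the training samples, and can be represented as follows:

$$S_{new} = \mathrm{diag}\left(X_{new}X_{new}^T\right)1_N^T + 1_{N_t}\,\mathrm{diag}\left(XX^T\right)^T - 2X_{new}X^T. \tag{1.127}$$

On the basis of Eq. (1.121), Eq. (1.124), Eq. (1.126), and Eq. (1.127), $K_{new}$ can be transformed into

$$\begin{aligned}
\bar K_{new} &= \left(K_{new} - \frac{1}{N}1_{N_t}1_N^TK\right)H^T\\
&= \frac{1}{c}\left(\frac{1}{N}1_{N_t}1_N^TS - S_{new}\right)H^T\\
&= \frac{1}{c}\left(2X_{new}X^T - \frac{2}{N}1_{N_t}1_N^TXX^T\right)H^T\\
&= \frac{2}{c}\left(X_{new} - \frac{1}{N}1_{N_t}1_N^TX\right)\left(X - \frac{1}{N}1_N1_N^TX\right)^T\\
&= \frac{2}{c}\bar X_{new}\bar X^T.
\end{aligned} \tag{1.128}$$

So far, the Gaussian kernel can be rewritten by $X$ as

$$\bar K = \frac{2}{c}\bar X\bar X^T, \qquad \bar K_{new} = \frac{2}{c}\bar X_{new}\bar X^T, \qquad \bar K_{new}^T = \frac{2}{c}\bar X\bar X_{new}^T. \tag{1.129}$$

Taking the KPLS model as an example, the predicted quality variable can be calculated as

$$\hat Y_{kpls} = \bar\Phi M_k = \bar\Phi\bar\Phi^TU\left(T^T\bar KU\right)^{-1}T^TY = \bar KU\left(T^T\bar KU\right)^{-1}T^TY = \frac{2}{c}\bar X\bar X^TU\left(T^T\bar KU\right)^{-1}T^TY. \tag{1.130}$$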
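The equivalent replacement $\bar K = \frac{2}{c}\bar X\bar X^T$ of Eq. (1.125) can be verified numerically. The sketch below is an illustration on random data with assumed sizes `N` and `m`; it confirms that the centered first-order kernel matches $\frac{2}{c}\bar X\bar X^T$ up to rounding error, and that the gap to the exact centered Gaussian kernel is small when $c$ is large relative to the squared distances.

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, c = 8, 3, 1.0e4                      # hypothetical sizes; c as in Section 1.4.2
X = rng.standard_normal((N, m))

ones = np.ones((N, 1))
H = np.eye(N) - ones @ ones.T / N          # centering matrix H
d = np.sum(X * X, axis=1, keepdims=True)   # diag(X X^T) as a column vector
S = d @ ones.T + ones @ d.T - 2.0 * X @ X.T  # squared pairwise distances

K_bar_taylor = H @ (ones @ ones.T - S / c) @ H   # Eq. (1.124)
X_bar = H @ X                                    # centered samples
K_bar_kser = 2.0 / c * X_bar @ X_bar.T           # Eq. (1.125)

K_bar_exact = H @ np.exp(-S / c) @ H             # centered exact Gaussian kernel
gap = np.max(np.abs(K_bar_exact - K_bar_kser))   # first-order truncation error

print(np.allclose(K_bar_taylor, K_bar_kser, atol=1e-12))
print(gap)
```

The truncation error grows with $\|x_i - x_j\|^2/c$, so the replacement is only as good as the first-order expansion for the chosen kernel width.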

The predicted value of the testing sample quality variable is

$$\hat Y_{kpls,new} = \bar\phi^T(x_{new})M_k = \bar\phi^T(x_{new})\bar\Phi^TU\left(T^T\bar KU\right)^{-1}T^TY = \bar k_{new}^TU\left(T^T\bar KU\right)^{-1}T^TY = \frac{2}{c}\bar x_{new}^T\bar X^TU\left(T^T\bar KU\right)^{-1}T^TY. \tag{1.131}$$

Taking quality-related subspace fault detection as an example, the statistical index is

$$t_{new}^T = \bar\phi^T(x_{new})\bar\Phi^TU\left(T^T\bar KU\right)^{-1} = \bar k_{new}^TU\left(T^T\bar KU\right)^{-1} = \frac{2}{c}\bar x_{new}^T\bar X^TU\left(T^T\bar KU\right)^{-1}, \tag{1.132}$$

$$\begin{aligned}
T^2_{kpls,n} &= t_{new}^T\left(\frac{T^TT}{N-1}\right)^{-1}t_{new}\\
&= \frac{4}{c^2}\bar x_{new}^T\bar X^TU\left(T^T\bar KU\right)^{-1}\left(\frac{T^TT}{N-1}\right)^{-1}\left(U^T\bar KT\right)^{-1}U^T\bar X\bar x_{new}\\
&= \frac{4}{c^2}\bar x_{new}^T\Sigma_{kser}^{-1}\bar x_{new}.
\end{aligned} \tag{1.133}$$

The relationship between the Gaussian kernel function and the input variables is thus made explicit, so the input variables participate in the calculation directly. The diagnosis or isolation of the faulty variable can therefore be more effective and efficient, and the computational cost of nonlinear data modeling and fault diagnosis is greatly reduced after the equivalent replacement.

1.4 Simulation

1.4.1 Introduction of the Tennessee-Eastman process

As a widely used benchmark in the multivariate statistical process monitoring field, the Tennessee-Eastman process is introduced in this section and used to verify the effectiveness of several fault detection and diagnosis algorithms [58, 59, 60]. This model, created by the Eastman Chemical Company, is extracted from an actual industrial chemical process. The detailed production process is shown in Fig. 1.1. It contains five major units: a reactor, condenser, compressor, separator, and stripper. There are eight ingredients related to the production in this process: A, B, C, D, E, F, G, and H. The main reactions are

$$\begin{aligned}
A(g) + C(g) + D(g) &\rightarrow G(liq),\\
A(g) + C(g) + E(g) &\rightarrow H(liq),\\
A(g) + E(g) &\rightarrow F(liq),\\
3D(g) &\rightarrow 2F(liq),
\end{aligned} \tag{1.134}$$

where G and H are the two liquid products of this industrial process. The components A, C, D, and E are the gas reactants that are fed into the reactor with inert catalyst B, where the reaction takes place. At the same time, the liquid by-product F is also produced in the reactor. The mixture produced and the remaining reactants are processed by the condenser, and then the gas components and liquid components are separated by the separator. The gas components are recycled after being compressed by the compressor, whereas the liquid components enter the stripper for further purification to obtain the product components G and H. The gas components separated by the stripper are also returned to the reactor.

FIGURE 1.1 The production process of Tennessee-Eastman process.

As shown in Fig. 1.1, there are 52 variables in total that can be sorted into two blocks. The XMV block consists of 11 manipulated variables, whereas the XMEAS block contains 22 process variables and 19 analysis variables, as shown in Table A.1.1, Table A.1.2, and Table A.1.3. As in most of the previous research, 22 process variables, XMEAS(1--22), and 11 manipulated variables, XMV(1--11), are selected as the inputs x1-x33, so there are 33 input variables in the simulation dataset. XMEAS(35), the purge gas analysis variable of the product component G, is considered as the output variable y. For training, 480 normal samples are used to construct the feature space and the mapping directions. For testing, 960 samples are used, with the fault occurring at the 161st sample. The TEP model contains 20 kinds of faults, which are described in detail in Table A.1.4.

1.4.2 Fault detection results

In this section, the simulation results of several fault detection methods on the TEP model are provided in detail, particularly the KPLS method, the TKPLS method, the KLS method, and the KDD method. The confidence level is set to α = 0.99, and the parameter in the Gaussian kernel function is c = 10^4. To illustrate the efficiency of the detection methods, two classical faults, IDV(2) and IDV(14), quality related and quality unrelated, respectively, are selected to show the curves of the statistics so that the alarm results can be seen clearly. At the same time, the fault detection rates (FDRs) and false alarm rates (FARs) of all of the faults are provided and analyzed to compare the efficiency of the different methods. The calculation of FDR and FAR uses the data obtained after the fault occurrence and is conducted through the following formula:

$$\text{Alarm rate} = \frac{\text{Number of alarm samples}}{\text{Total number of faulty samples}} \times 100\%. \tag{1.135}$$
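Eq. (1.135) can be sketched as a short function. The name `alarm_rate` and the synthetic statistic are hypothetical; the default `fault_start=160` is the zero-based index of the 161st sample, where the fault is introduced in the testing data described above.

```python
import numpy as np

def alarm_rate(stat, threshold, fault_start=160):
    """Percentage of post-fault samples whose statistic exceeds the threshold."""
    faulty = np.asarray(stat, dtype=float)[fault_start:]
    return 100.0 * np.mean(faulty > threshold)

# Synthetic statistic over 960 testing samples: 400 of the 800 faulty samples alarm.
stat = np.zeros(960)
stat[160:560] = 2.0
print(alarm_rate(stat, threshold=1.0))
```

Whether the result is read as an FDR or an FAR depends on the statistic: an alarm of a quality-related statistic under a quality-unrelated fault counts as a false alarm.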

First, the occurrence of the quality-related faults is discussed. Fig. 1.2(A-D) shows the detection results of IDV(2). This fault causes alarms in both the quality-related and the quality-unrelated subspaces. The statistics of the four methods clearly raise alarms after the 161st sample in Fig. 1.2(A-D). According to the FDRs in Table 1.1, the alarm rates of the four quality-related methods are close to 100%. As to the other quality-related faults, Table 1.1 shows that the KPLS and TKPLS methods have similar detection efficiency, as most of the faults are detected with high alarm rates. Their FDRs for IDV(5), IDV(10), and IDV(20) are consistently lower than for most other faults. At the same time, the alarm rates of the KLS and KDD methods are slightly lower than those of the previous two classic methods for most of the conditions. When IDV(1), IDV(10), IDV(12), IDV(13), and IDV(20) occur, the FDRs of the T² statistics of the KLS and KDD methods are clearly lower than those of KPLS and TKPLS. This phenomenon partly results from the control system of the process, which adjusts the abnormal variables back to the normal level or into a new balance condition, so the statistics may return below the thresholds. In particular, the KLS method shows significantly low detection rates for IDV(6) and IDV(18). In addition, the FDRs of the KPCA method are also shown in Table 1.1; they are low, which illustrates that KPCA is not reliable for the quality-related fault detection task. According to these results, the KPLS, TKPLS, KLS, and KDD methods can provide efficient quality-related fault detection with satisfactory alarms to make the detection judgment.

FIGURE 1.2 Fault detection results with IDV(2) occurring. (A) KPLS (B) TKPLS (C) KLS (D) KDD.

TABLE 1.1 FDRs of the quality-related faults (%)

Fault    KPCA                    KPLS                    TKPLS                   KLS                     KDD
         T²kpca      SPEkpca     T²kpls      SPEkpls     T²ky&SPEkr  T²ko&T²kr   T²kls       SPEkls      T²kdd       SPEkdd
IDV(1)   0.38a       0.40        99.63a      99.88       99.75a      99.5        34.88a      99.13       38.38a      99.63
IDV(2)   1.88a       1.88        98.50a      98.63       98.63a      98.50       86.88a      98.00       89.88a      98.50
IDV(5)   76.33a      75.60       30.50a      25.38       39.38a      27.30       23.50a      26.88       23.63a      37.50
IDV(6)   0a          0           99.50a      99.88       100.00a     99.50       17.75a      19.13       98.25a      100.00
IDV(8)   0.23a       0.23        95.88a      97.38       98.13a      95.13       79.00a      97.00       97.63a      99.13
IDV(10)  60.75a      33.80       67.25a      49.50       61.00a      61.88       26.75a      47.75       27.00a      60.88
IDV(12)  0.10a       0.13        98.00a      98.88       98.25a      97.38       76.63a      98.13       77.13a      99.63
IDV(13)  0.50a       0.55        94.88a      95.25       95.50a      95.13       79.13a      93.88       80.25a      90.88
IDV(18)  0.99a       10.33       89.13a      89.63       91.63a      88.75       24.38a      22.38       88.63a      91.13
IDV(20)  69.33a      49.00       44.38a      52.88       67.25a      39.50       28.38a      38.63       28.63a      63.63

a FDR.

Then quality-unrelated faults are taken into consideration, as Fig. 1.3(A-D) and Table 1.2 show. Fig. 1.3(A-D) shows the curves of the statistics for IDV(14) detected by the KPLS, TKPLS, KLS, and KDD methods. IDV(14) is a classical quality-unrelated fault, yet the KPLS method raises obvious alarms in both the T² and SPE statistics in Fig. 1.3(A). This means a large FAR in the T² statistic, which should remain normal all the time. As to the TKPLS method, the T²ko and T²kr statistics represent the quality-unrelated subspaces. In Fig. 1.3(B), these two statistics have obvious alarm signals, whereas the SPEkr statistic also contains abnormal samples. Although the quality-related T²ky statistic has few alarm signals, the SPEkr statistic represents the variation in the residual subspace, whose relationship to the quality variable and alarm conditions is hard to analyze. In Fig. 1.3(C) and (D), the detection results of the KLS and KDD methods show similar characteristics, with low FARs in the T² statistics and clearly high FDRs in the SPE statistics. The accuracy of the KLS and KDD methods is also confirmed by the FDRs and FARs shown in Table 1.2. The quality-related T² statistic of KDD achieves low alarm rates for the quality-unrelated faults compared to the other methods, whereas the quality-unrelated SPE statistic has high FDRs. The KLS method gives detection results similar to those of the KDD method, although the FDRs of its SPE statistic are lower. Based on the alarm results, the KLS and KDD methods can identify the quality-unrelated faults. In contrast, for KPCA, KPLS, and TKPLS, Table 1.2 shows that the alarm rates of the T² statistics are high for several faults, such as IDV(4), IDV(7), IDV(11), IDV(14), and IDV(17), which can mislead the fault detection results.

FIGURE 1.3 Fault detection results with IDV(14) occurring. (A) KPLS (B) TKPLS (C) KLS (D) KDD.

TABLE 1.2 FDRs and FARs of the quality-unrelated faults (%)

Fault    KPCA                    KPLS                    TKPLS                   KLS                     KDD
         T²kpca      SPEkpca     T²kpls      SPEkpls     T²ky&SPEkr  T²ko&T²kr   T²kls       SPEkls      T²kdd       SPEkdd
IDV(3)   97.13b      97.88a      12.63b      3.25a       11.38b      8.88a       7.25b       8.88a       7.63b       17.38a
IDV(4)   99.00b      98.88a      56.38b      95.38a      94.38b      50.50a      5.63b       23.75a      5.75b       94.25a
IDV(7)   59.13b      59.88a      95.88b      99.88a      100.00b     88.25a      32.63b      68.13a      33.25b      100.00a
IDV(9)   97.50b      97.00a      10.88b      1.00a       10.75b      8.00a       5.63b       7.00a       5.88b       13.38a
IDV(11)  57.50b      59.88a      61.63b      60.13a      66.00b      59.50a      9.50b       41.75a      9.88b       73.75a
IDV(14)  0b          0.13a       85.13b      99.88a      100.00b     89.13a      3.38b       95.75a      4.38b       100.00a
IDV(15)  92.33b      93.75a      17.38b      3.50a       27.13b      10.8a       10.50b      10.63a      10.75b      22.00a
IDV(16)  70.33b      68.63a      49.75b      18.00a      27.00b      41.25a      20.00b      37.50a      20.25b      53.5a
IDV(17)  0.40b       0.64a       76.00b      90.50a      89.38b      76.75a      13.25b      77.88a      29.38b      92.25a
IDV(19)  88.13b      86.33a      6.63b       14.50a      11.00b      4.75a       1.00b       4.00a       1.00b       15.75a

a FDR. b FAR. Boldface, highlighting the low FARs of KLS and KDD.

In summary, the detection performance of the KPLS, TKPLS, KLS, and KDD methods was compared in this section. It can be concluded that these four methods can realize quality-related fault detection. When quality-related faults occur, they provide high FDRs to make the judgment. At the same time, the KPLS and TKPLS methods obtain large FARs in the quality-related subspaces, which badly affects the accuracy of the detection results. The KLS and KDD methods are more reliable by comparison, as they not only obtain accurate alarms for quality-related faults but also achieve low FARs for quality-unrelated faults.

1.4.3 Nonlinear fault detection using KSER

As the preceding section mentioned, many fault diagnosis methods apply the Gaussian kernel function to solve the nonlinear process detection problem and obtain satisfactory results. However, they also suffer from the unclear relationship between the input and the kernel matrix when analyzing the contribution of each variable to the fault. Among several solutions, KSER was proposed recently to replace the kernel function in the fault detection and diagnosis process. This section describes the simulation results of the KSER method with the TEP model and compares the detection results of KSER-KPLS with typical kernel-based nonlinear methods, to verify its equivalence to the original KPLS method and the advantage of the KSER algorithm. In particular, the detection results of KSER-KPLS are compared with those of KPLS, TKPLS, and KPCA. The confidence level and the parameter in the Gaussian kernel function are set to the same values as in the preceding simulation experiments. Here, the fault IDV(7) is applied to the simulation.

As can be noted in the upper part of Fig. 1.4, both KSER-KPLS and the original KPLS method achieve satisfactory results in the prediction of the output Y. The KSER method is even more accurate than KPLS, as the deviation between KSER-KPLS (green curve) and the actual output (red curve) is smaller than that between KPLS (blue curve) and the actual output (red curve). This is confirmed by the lower part of the figure, where the deviation of KSER-KPLS is closer to 0.

FIGURE 1.4 The prediction effect before and after the application of KSER. Upper panel: predicted value of Y versus samples (actual output, KSER, kernel function). Lower panel: deviation of the Y prediction versus samples (KSER, kernel function).

Fig. 1.5(A) and (B) are the fault diagnosis results of the KSER-KPLS algorithm with the nonlinear diagnosis method and the linear method RCI, respectively. By comparing Fig. 1.5(A) and (B), it can be found that with the KSER algorithm, the linear method can also realize a satisfactory diagnosis effect for the nonlinear process, and the variable x26 is diagnosed as the fault variable, which is consistent with the fault diagnosis result of the nonlinear method.

In addition, the application of the KSER algorithm can also improve the calculation efficiency of the fault detection process. Table 1.3 shows the time spent by several conventional methods and the KPLS-KSER method in the online and offline stages of fault detection, respectively. The general kernel-based nonlinear method increases the dimension of the data matrix, so the calculation time increases significantly. However, the KSER method can significantly reduce the time required for fault detection because the offline part is unnecessary, and the online process takes only about 0.025 s.

TABLE 1.3 The computational efficiency results for the TE process

Methods            Offline        Online
KPCA/KPLS/TKPLS    24 hrs         > 3 hrs
KPLS-KSER          Unnecessary    0.0245408 s

In conclusion, the use of the KSER method can make up for the deficiency of the linear method in dealing with nonlinear problems, and achieves almost the same effect as the nonlinear method. At the same time, it shortens the time cost of fault detection. According to the simulation results, the efficiency of KSER-KPLS is much better than that of the methods without the replacement.

FIGURE 1.5 Fault diagnosis results of KSER-based algorithm with different diagnosis methods. (A) Nonlinear fault diagnosis method (B) Linear fault diagnosis method.

TABLE 1.4 The CDRs of fault diagnosis (%)

Fault ID   Reconstructed partial derivative method   Partial derivative method
IDV(1)     10.13                                     0
IDV(2)     5.75                                      0.88
IDV(3)     84.50                                     25.75
IDV(4)     73.62                                     18.38
IDV(5)     65.25                                     20.50
IDV(6)     2.13                                      0
IDV(7)     50.12                                     1.87
IDV(8)     4.75                                      0
IDV(9)     86.50                                     28.38
IDV(10)    49.63                                     14.37
IDV(11)    59.50                                     10.88
IDV(12)    6.88                                      0.13
IDV(13)    13.13                                     1.63
IDV(14)    93.42                                     1.75
IDV(15)    79.87                                     29.88

1.4.4 Fault diagnosis without smearing effect

The smearing effect is a problem that can lead to false diagnosis results, as the abnormal contribution values may smear over the nonfaulty variables. According to the reconstructed partial derivative contribution plot introduced in Section 1.3.3, the simulation results of this method are provided in this section to prove the effectiveness of avoiding the smearing effect. In the simulation, IDV(14) is selected, which causes the reactor cooling water valve to stick and leads to the change of the water temperature of the reaction tower (XMEAS(21), variable x21 in the simulation), the temperature of the reaction tower (XMEAS(9), variable x9 in the simulation), and affect purge gas analysis D (XMV(10), variable x32 in the simulation) eventually. Fig. 1.6 shows the variation of these three faulty variables of this fault. The simulation performs the KLS-based fault detection method to obtain the fault alarm results. When the partial differential contribution plot is performed, the diagnosis result is shown in Fig. 1.7(A) that other variables also have abnormal samples in red color, except for the three actual faulty variables: x9 , x21 , and x32 . Fig. 1.7(B) exhibits the diagnosis result of the reconstruction partial derivative contribution plot method. It is obvious that the red bars are concentrated on x9 , x21 , and

FIGURE 1.6 The changes in the faulty variables of IDV(14).

Quality-Related Fault Detection and Diagnosis: Chapter | 1


[Figure 1.7, panels (A) and (B): contribution plot of the combined index. Axes: samples (0–900) and input variables (1–33); color scale: 0–1.]

FIGURE 1.7 Fault diagnosis results of the partial derivative contribution plot methods. (A) Partial derivative contribution plot (B) Reconstructed partial derivative contribution plot.


x32 and rarely appear in other variables, which is consistent with Fig. 1.6. The correct diagnosis rate (CDR) index is then used to quantify the diagnosis performance. It is calculated as

$$\mathrm{CDR} = \frac{N_c}{N_f}, \tag{1.136}$$

where $N_c$ is the total number of correctly diagnosed faulty samples and $N_f$ is the number of all diagnosed faulty samples. The larger the CDR, the better the fault diagnosis performance. Table 1.4 lists the CDRs of the partial derivative contribution plot and the reconstructed method. The CDR data of the partial derivative contribution plot are from Li et al. [61]. The CDRs obtained from the reconstructed partial derivative contribution plot (first column) are much larger than those from the original partial derivative method (second column), which illustrates that the reconstruction can eliminate the smearing effect and improve the accuracy of the diagnosis.
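The CDR of Eq. (1.136) is a simple ratio, and a minimal Python sketch may make the percentage convention of Table 1.4 concrete. The function name `correct_diagnosis_rate` and the sample counts below are hypothetical and are not taken from the chapter.

```python
def correct_diagnosis_rate(n_correct: int, n_diagnosed: int) -> float:
    """Return CDR = N_c / N_f of Eq. (1.136), expressed as a percentage."""
    if n_diagnosed <= 0:
        raise ValueError("n_diagnosed (N_f) must be positive")
    if not 0 <= n_correct <= n_diagnosed:
        raise ValueError("n_correct (N_c) must lie between 0 and n_diagnosed")
    return 100.0 * n_correct / n_diagnosed


# Hypothetical counts for illustration only (not results from the chapter).
cdr = correct_diagnosis_rate(n_correct=692, n_diagnosed=800)
print(f"CDR = {cdr:.2f}%")
```

Reporting the ratio as a percentage matches the unit stated in the caption of Table 1.4.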

Appendix A Description of the variables and faults

TABLE A.1.1 Labels and descriptions of the manipulated variables of TEP

Variable no.    Description
XMV(1)          D feed flow
XMV(2)          E feed flow
XMV(3)          A feed flow
XMV(4)          A and C feed flow
XMV(5)          Compressor recycle valve
XMV(6)          Purge valve
XMV(7)          Separator pot liquid flow
XMV(8)          Stripper liquid product flow
XMV(9)          Stripper steam valve
XMV(10)         Reactor cooling water flow
XMV(11)         Condenser cooling water flow


TABLE A.1.2 Labels and descriptions of the process variables of TEP

Variable no.    Description
XMEAS(1)        A feed
XMEAS(2)        D feed
XMEAS(3)        E feed
XMEAS(4)        A and C feed
XMEAS(5)        Recycle flow
XMEAS(6)        Reactor feed rate
XMEAS(7)        Reactor pressure
XMEAS(8)        Reactor level
XMEAS(9)        Reactor temperature
XMEAS(10)       Purge rate
XMEAS(11)       Product separator temperature
XMEAS(12)       Product separator level
XMEAS(13)       Product separator pressure
XMEAS(14)       Product separator underflow
XMEAS(15)       Stripper level
XMEAS(16)       Stripper pressure
XMEAS(17)       Stripper underflow
XMEAS(18)       Stripper temperature
XMEAS(19)       Stripper steam flow
XMEAS(20)       Compressor work
XMEAS(21)       Reactor cooling water outlet temperature
XMEAS(22)       Separator cooling water outlet temperature


TABLE A.1.3 Labels and descriptions of the analysis variables of TEP

Variable no.    Component    Description
XMEAS(23)       A            Reactor feed analysis
XMEAS(24)       B            Reactor feed analysis
XMEAS(25)       C            Reactor feed analysis
XMEAS(26)       D            Reactor feed analysis
XMEAS(27)       E            Reactor feed analysis
XMEAS(28)       F            Reactor feed analysis
XMEAS(29)       A            Purge gas analysis
XMEAS(30)       B            Purge gas analysis
XMEAS(31)       C            Purge gas analysis
XMEAS(32)       D            Purge gas analysis
XMEAS(33)       E            Purge gas analysis
XMEAS(34)       F            Purge gas analysis
XMEAS(35)       G            Purge gas analysis
XMEAS(36)       H            Purge gas analysis
XMEAS(37)       D            Product analysis
XMEAS(38)       E            Product analysis
XMEAS(39)       F            Product analysis
XMEAS(40)       G            Product analysis
XMEAS(41)       H            Product analysis


TABLE A.1.4 Description of the faults

Fault ID    Description                                          Type
IDV(1)      A/C feed ratio, B composition constant (stream 4)    Step
IDV(2)      B composition (stream 4)                             Step
IDV(3)      D feed temperature (stream 2)                        Step
IDV(4)      Reactor cooling water inlet temperature              Step
IDV(5)      Condenser cooling water inlet temperature            Step
IDV(6)      A feed loss                                          Step
IDV(7)      C header pressure loss-reduced                       Step
IDV(8)      A, B, C feed composition (stream 4)                  Random variation
IDV(9)      D feed temperature (stream 2)                        Random variation
IDV(10)     C feed temperature (stream 4)                        Random variation
IDV(11)     Reactor cooling water inlet temperature              Random variation
IDV(12)     Condenser cooling                                    Random variation
IDV(13)     Reaction kinetics                                    Slow drift
IDV(14)     Reactor cooling water valve                          Sticking
IDV(15)     Condenser cooling                                    Sticking
IDV(16)     Unknown                                              Unknown
IDV(17)     Unknown                                              Unknown
IDV(18)     Unknown                                              Unknown
IDV(19)     Unknown                                              Unknown
IDV(20)     Unknown                                              Unknown


References

[1] S.J. Qin, Survey on data-driven industrial process monitoring and diagnosis, Annual Reviews in Control 36 (2) (2012) 220–234.
[2] Z. Ge, Z. Song, F. Gao, Review of recent research on data-based process monitoring, Industrial & Engineering Chemistry Research 52 (10) (2013) 3543–3562.
[3] Z. Ge, Z. Song, S.X. Ding, B. Huang, Data mining and analytics in the process industry: The role of machine learning, IEEE Access 5 (2017) 20590–20616.
[4] Z. Ge, Review on data-driven modeling and monitoring for plant-wide industrial processes, Chemometrics & Intelligent Laboratory Systems 171 (2017) 16–25.
[5] K. Peng, K. Zhang, G. Li, Quality-related process monitoring based on total kernel PLS model and its industrial application, Mathematical Problems in Engineering 2013 (2013) 707953.
[6] G. Wang, J. Jiao, A kernel least squares based approach for nonlinear quality-related fault detection, IEEE Transactions on Industrial Electronics 64 (4) (2016) 3195–3204.
[7] C. Hu, Z. Xu, X. Kong, J. Luo, Recursive-CPLS-based quality-relevant and process-relevant fault monitoring with application to the Tennessee Eastman process, IEEE Access 7 (2019) 128746–128757.
[8] K. Zhang, H. Hao, Z. Chen, S.X. Ding, K. Peng, A comparison and evaluation of key performance indicator-based multivariate statistics process monitoring approaches, Journal of Process Control 33 (2015) 112–126.
[9] S. Yin, S.X. Ding, A. Haghani, H. Hao, P. Zhang, A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process, Journal of Process Control 22 (9) (2012) 1567–1581.
[10] P. Nomikos, J.F. MacGregor, Multivariate SPC charts for monitoring batch processes, Technometrics 37 (1) (1995) 41–59.
[11] S. Joe Qin, Statistical process monitoring: Basics and beyond, Journal of Chemometrics 17 (8-9) (2003) 480–502.
[12] D. Zhou, G. Li, S.J. Qin, Total projection to latent structures for process monitoring, AIChE Journal 56 (1) (2010) 168–178.
[13] S.J. Qin, Y. Zheng, Quality-relevant and process-relevant fault monitoring with concurrent projection to latent structures, AIChE Journal 59 (2) (2013) 496–504.
[14] Q. Liu, S.J. Qin, T. Chai, Multiblock concurrent PLS for decentralized monitoring of continuous annealing processes, IEEE Transactions on Industrial Electronics 61 (11) (2014) 6429–6437.
[15] S.X. Ding, S. Yin, K. Peng, H. Hao, B. Shen, A novel scheme for key performance indicator prediction and diagnosis with application to an industrial hot strip mill, IEEE Transactions on Industrial Informatics 9 (4) (2013) 2239–2247.
[16] K. Peng, K. Zhang, B. You, J. Dong, Quality-relevant fault monitoring based on efficient projection to latent structures with application to hot strip mill process, IET Control Theory & Applications 9 (7) (2015) 1135–1145.
[17] K. Peng, K. Zhang, J. Dong, B. You, Quality-relevant fault detection and diagnosis for hot strip mill process with multi-specification and multi-batch measurements, Journal of the Franklin Institute 352 (3) (2015) 987–1006.
[18] G. Wang, H. Luo, K. Peng, Quality-related fault detection using linear and nonlinear principal component regression, Journal of the Franklin Institute 353 (10) (2016) 2159–2177.
[19] Z. Chen, S.X. Ding, K. Zhang, Z. Li, Z. Hu, Canonical correlation analysis-based fault detection methods with application to alumina evaporation process, Control Engineering Practice 46 (2016) 51–58.


[20] Z. Chen, K. Zhang, S.X. Ding, Y.A. Shardt, Z. Hu, Improved canonical correlation analysis-based fault detection methods for industrial processes, Journal of Process Control 41 (2016) 26–34.
[21] Q. Zhu, Q. Liu, S.J. Qin, Concurrent quality and process monitoring with canonical correlation analysis, Journal of Process Control 60 (2017) 95–103.
[22] Q. Zhu, S.J. Qin, Supervised diagnosis of quality and process faults with canonical correlation analysis, Industrial & Engineering Chemistry Research 58 (26) (2019) 11213–11223.
[23] B. Schölkopf, A. Smola, K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation 10 (5) (1998) 1299–1319.
[24] J.-H. Cho, J.-M. Lee, S.W. Choi, D. Lee, I.-B. Lee, Fault identification for process monitoring using kernel principal component analysis, Chemical Engineering Science 60 (1) (2005) 279–288.
[25] R. Rosipal, L.J. Trejo, Kernel partial least squares regression in reproducing kernel Hilbert space, Journal of Machine Learning Research 2 (2001) 97–123.
[26] K. Peng, K. Zhang, G. Li, D. Zhou, Contribution rate plot for nonlinear quality-related fault diagnosis with application to the hot strip mill process, Control Engineering Practice 21 (4) (2013) 360–369.
[27] G. Wang, J. Jiao, S. Yin, A kernel direct decomposition-based monitoring approach for nonlinear quality-related fault detection, IEEE Transactions on Industrial Informatics 13 (4) (2016) 1565–1574.
[28] G. Li, S.J. Qin, Comparative study on monitoring schemes for non-Gaussian distributed processes, Journal of Process Control 67 (2018) 69–82.
[29] M. Kano, S. Tanaka, S. Hasebe, I. Hashimoto, H. Ohno, Monitoring independent components for fault detection, AIChE Journal 49 (4) (2003) 1–8.
[30] J.-M. Lee, C. Yoo, I.-B. Lee, Statistical process monitoring with independent component analysis, Journal of Process Control 14 (5) (2004) 467–485.
[31] R. Gonzalez, B. Huang, E. Lau, Process monitoring using kernel density estimation and Bayesian networking with an industrial case study, ISA Transactions 58 (2015) 330–347.
[32] J. Yu, A new fault diagnosis method of multimode processes using Bayesian inference based Gaussian mixture contribution decomposition, Engineering Applications of Artificial Intelligence 26 (1) (2013) 456–466.
[33] S.W. Choi, J.H. Park, I.-B. Lee, Process monitoring using a Gaussian mixture model via principal component analysis and discriminant analysis, Computers & Chemical Engineering 28 (8) (2004) 1377–1387.
[34] Q. Jiang, B. Huang, X. Yan, GMM and optimal principal components-based Bayesian method for multimode fault diagnosis, Computers & Chemical Engineering 84 (2016) 338–349.
[35] X. Liu, L. Xie, U. Kruger, T. Littler, S. Wang, Statistical-based monitoring of multivariate non-Gaussian systems, AIChE Journal 54 (9) (2008) 2379–2391.
[36] Z. Ge, Z. Song, A distribution-free method for process monitoring, Expert Systems with Applications 38 (8) (2011) 9821–9829.
[37] S. Ding, Data-driven design of monitoring and diagnosis systems for dynamic processes: A review of subspace technique based schemes and some recent results, Journal of Process Control 24 (2) (2014) 431–449.
[38] Y. Dong, S.J. Qin, Dynamic latent variable analytics for process operations and control, Computers & Chemical Engineering 114 (2018) 69–80.
[39] W. Ku, R.H. Storer, C. Georgakis, Disturbance detection and isolation by dynamic principal component analysis, Chemometrics & Intelligent Laboratory Systems 30 (1) (1995) 179–196.


[40] W. Li, S. Qin, Consistent dynamic PCA based on errors-in-variables subspace identification, Journal of Process Control 11 (6) (2001) 661–678.
[41] Y. Dong, S.J. Qin, A novel dynamic PCA algorithm for dynamic data modeling and process monitoring, Journal of Process Control 67 (2018) 1–11.
[42] Y. Dong, S.J. Qin, Dynamic-inner canonical correlation and causality analysis for high dimensional time series data, IFAC-PapersOnLine 51 (18) (2018) 476–481.
[43] J. Huang, X. Yan, Dynamic process fault detection and diagnosis based on dynamic principal component analysis, dynamic independent component analysis and Bayesian inference, Chemometrics & Intelligent Laboratory Systems 148 (2015) 115–127.
[44] K. Helland, H.E. Berntsen, O.S. Borgen, H. Martens, Recursive algorithm for partial least squares regression, Chemometrics & Intelligent Laboratory Systems 14 (1) (1992) 129–137.
[45] S.J. Qin, Recursive PLS algorithms for adaptive data modeling, Computers & Chemical Engineering 22 (4) (1998) 503–514.
[46] Y. Dong, S.J. Qin, Dynamic-inner partial least squares for dynamic data modeling, IFAC-PapersOnLine 48 (8) (2015) 117–122.
[47] C.F. Alcala, S.J. Qin, Analysis and generalization of fault diagnosis methods for process monitoring, Journal of Process Control 21 (3) (2011) 322–330.
[48] S.W. Choi, I.-B. Lee, Multiblock PLS-based localized process diagnosis, Journal of Process Control 15 (3) (2005) 295–306.
[49] C.F. Alcala, S.J. Qin, Reconstruction-based contribution for process monitoring, Automatica 45 (7) (2009) 1593–1600.
[50] G. Li, S.J. Qin, T. Chai, Multi-directional reconstruction based contributions for root-cause diagnosis of dynamic processes, in: Proceedings of the 2014 American Control Conference, pp. 3500–3505.
[51] G. Li, T. Yuan, S.J. Qin, T. Chai, Dynamic time warping based causality analysis for root-cause diagnosis of nonstationary fault processes, IFAC-PapersOnLine 48 (8) (2015) 1288–1293.
[52] S. Yoon, J.F. MacGregor, Fault diagnosis with multivariate statistical models Part I: Using steady state fault signatures, Journal of Process Control 11 (4) (2001) 387–400.
[53] J. Liu, Fault diagnosis using contribution plots without smearing effect on non-faulty variables, Journal of Process Control 22 (9) (2012) 1609–1623.
[54] J. Liu, D.S.H. Wong, D.-S. Chen, Bayesian filtering of the smearing effect: Fault isolation in chemical process monitoring, Journal of Process Control 24 (3) (2014) 1–21.
[55] Y. Zhang, L. Zhang, R. Lu, Fault identification of nonlinear processes, Industrial & Engineering Chemistry Research 52 (34) (2013) 12072–12081.
[56] G. Wang, J. Jiao, S. Yin, Efficient nonlinear fault diagnosis based on kernel sample equivalent replacement, IEEE Transactions on Industrial Informatics 15 (5) (2018) 2682–2690.
[57] Y. Zhang, R. Sun, Y. Fan, Fault diagnosis of nonlinear process based on KCPLS reconstruction, Chemometrics & Intelligent Laboratory Systems 140 (2015) 49–60.
[58] J.J. Downs, E.F. Vogel, A plant-wide industrial process control problem, Computers & Chemical Engineering 17 (3) (1993) 245–255.
[59] L.H. Chiang, E.L. Russell, R.D. Braatz, Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis, Chemometrics & Intelligent Laboratory Systems 50 (2) (2000) 243–252.
[60] D. Zhou, G. Li, S.J. Qin, Total projection to latent structures for process monitoring, AIChE Journal 56 (1) (2009) 168–178.
[61] G. Li, T. Yuan, S.J. Qin, T. Chai, Dynamic time warping based causality analysis for root-cause diagnosis of nonstationary fault processes, IFAC-PapersOnLine 48 (8) (2015) 1288–1293.

Chapter 2

Canonical correlation analysis–based fault diagnosis method for dynamic processes

Zhiwen Chen and Ketian Liang
School of Automation, Central South University, China

Fault Diagnosis and Prognosis Techniques for Complex Engineering Systems. DOI: 10.1016/B978-0-12-822473-1.00004-5
Copyright © 2021 Elsevier Inc. All rights reserved.

2.1 Introduction

With the rapid development of decentralized control systems and other enabling technologies, such as sensors and computational power, a huge amount of data can easily be obtained in modern industrial processes for control, monitoring, and predictive maintenance tasks, among others [1, 2]. Due to physical interactions and the role of feedback control, dependence is ubiquitous among the collected variables. In statistics, dependence is any statistical relationship between two random variables. In the broadest sense, correlation is any statistical dependence, although it commonly refers to the degree to which a pair of variables are linearly related [3]. For the fault diagnosis task, correlations are useful because they can indicate a predictive relationship that can be exploited in practice.

Over the past decades, multivariate analysis (MVA), which considers the correlation among random variables, has been widely explored in the process monitoring and fault diagnosis community [4]. Alongside principal component analysis (PCA), partial least squares, and independent component analysis, canonical correlation analysis (CCA) is a typical MVA technique that has been widely used for process monitoring and fault diagnosis [5]. The main difference between CCA and the other MVA methods is that CCA is closely related to mutual information [6]. CCA was first introduced by Harold Hotelling in 1936 to study relations between two sets of variables [7], although the original motivation was not industrial data analysis. Gelfand and Yaglom [8] used CCA, with a proper definition of the amount of information, to calculate the amount of information about a random function contained in another random function. Later, Akaike [9] used CCA to analyze the structure of the information interface between the future and the past of a discrete-time stochastic process. The CCA technique used in Akaike’s work is the same as conventional CCA; however, the study of system dynamics marks a point of departure from conventional MVA toward dynamic time series analysis. With the aid of the Akaike information criterion for determining the state order, Larimore [10] then explored CCA for system identification, filtering, and control of general linear systems represented by state space models. Unlike the CCA method in multivariate statistical analysis, Larimore’s method, called canonical variate analysis (CVA), is more general: it involves solving a closely related reduced-rank prediction problem whose solution is given by a CCA of the covariance of past and future.

Based on the work of Larimore, the CVA method was explored for process monitoring and fault diagnosis [11–14]. For such a task, the basic idea is to first identify a state space model of the system of interest and then look for abnormal behavior that either lies within the state space or departs significantly from it. For example, Chiang et al. [5] used the CVA-based fault detection method to monitor a chemical process. In CVA-based process monitoring methods, a state space model must be identified first, and the abnormalities are strictly divided into two types: those that affect the state space and those that depart from it. On the one hand, abnormalities can hardly be divided so strictly, since faults are usually unknown and their occurrence may change the process dynamics. On the other hand, the process monitoring task may not require a state space model to be identified at all. Motivated by this discussion, Chen et al. [15] proposed two CCA-based process monitoring methods to detect faults in static and dynamic processes.
In the so-called static process, the random variables are assumed to be independent and identically distributed, whereas in the dynamic process, the random variables are autocorrelated; that is, the future variables are correlated with the past and present variables. Instead of identifying a state space model, a residual generator is constructed, with the required parameters obtained via the CCA algorithm.

Besides being dynamic, most processes are nonlinear in nature. Therefore, the conventional process monitoring methods are inappropriate for monitoring such nonlinear dynamic processes. Several extensions of the CCA method that handle nonlinear characteristics have been reported in the literature. For example, Odiowei and Cao [13] proposed a method based on CVA and kernel density estimation (KDE) to handle the non-Gaussian issue induced by a nonlinear process. Motivated by the limitation of KDE in determining the threshold, a randomized algorithm was studied to determine an appropriate threshold and was combined with the CCA method to handle the non-Gaussian problem [16]. Other advanced CCA-based fault diagnosis methods can be found in [17–20].

Benefiting from its nonlinear representation learning ability, the deep neural network (DNN) has been widely explored [21, 22]. Among these methods, CCA combined with DNN has attracted a great deal of attention. In the pioneering work of Galen Andrew, a deep CCA method was proposed to learn complex nonlinear transformations of two views of data such that the resulting representations are highly linearly correlated [23]. This deep CCA method uses two deep feed-forward neural networks. Then a deep canonically correlated autoencoder method was proposed as an extension of dynamical CCA (DCCA) that combines the reconstruction objective of multimodal autoencoders with the correlation objective [24]. To improve temporal relationship learning performance, deep canonically correlated long short-term memory (LSTM) methods were proposed in the work of Mallinar and Rosset [25]. Motivated by the success of the DNN-based CCA method in the multiview learning community, a regularized deep correlated representation method that combines two deep belief networks and CCA was recently proposed for monitoring nonlinear processes [26]. DNN-based CCA methods are thus an active research topic and an alternative for dynamic process fault diagnosis.

Therefore, in this work, we introduce two types of CCA-based methods for dynamic processes: one is a DCCA for dynamic processes in steady state, and the other is a DNN-aided CCA method for nonlinear dynamic processes. The performance of the CCA-based fault detection methods is demonstrated and compared with that of the conventional CCA method. Two applications are presented: one using continuously stirred tank reactor (CSTR) data and the other using the traction drive and control system (TDCS) of a high-speed train.

The rest of this chapter is organized as follows. In Section 2.2, the background of CCA-based fault diagnosis is introduced. Two CCA-based fault diagnosis methods for dynamic processes are presented in Section 2.3. In Section 2.4, the introduced methods are validated on two industrial benchmark cases and compared with the conventional CCA method. We provide our conclusion in Section 2.5.

2.2 Preliminaries

2.2.1 Basics of conventional CCA

CCA is one of the best-known MVA techniques and is widely used in various disciplines, for example in dimensionality reduction, semantic representation learning, system identification, filtering, and fault detection [27–29]. CCA is a method for finding linear relationships between two random vectors, which may be multidimensional. Mathematically, CCA finds basis vectors for two sets of variables such that the correlations between the projections of the variables onto these basis vectors are mutually maximized. Assume that $u \in \mathbb{R}^l$ and $y \in \mathbb{R}^m$ are two random vectors with dimensions $l$ and $m$, respectively. By collecting $N$ observations of each random vector, we obtain two data matrices $U \in \mathbb{R}^{N \times l}$ and $Y \in \mathbb{R}^{N \times m}$. The observations are assumed to be jointly sampled from a multivariate normal distribution. Without loss of generality, we assume that the variables are zero mean. In other words,

$$\begin{bmatrix} u \\ y \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \Sigma_u & \Sigma_{uy} \\ \Sigma_{yu} & \Sigma_y \end{bmatrix} \right), \quad \Sigma_{yu} = \Sigma_{uy}^T. \tag{2.1}$$

Let the variable vectors of the $N$ observations be the column vectors $u_i \in \mathbb{R}^N$ for $i = 1, 2, \ldots, l$ and $y_i \in \mathbb{R}^N$ for $i = 1, 2, \ldots, m$, respectively. $\langle u, y \rangle$ or $u^T y$ denotes the inner product between two vectors. In CCA, the purpose is to find the linear relations between the variables of $U$ and $Y$. We consider the following linear transformations:

$$U w_u = z_u \quad \text{and} \quad Y w_y = z_y, \tag{2.2}$$

where $w_u \in \mathbb{R}^l$, $z_u \in \mathbb{R}^N$, $w_y \in \mathbb{R}^m$, and $z_y \in \mathbb{R}^N$. Following the work of Uurtio et al. [28], the matrices $U$ and $Y$ represent linear transformations of the positions $w_u$ and $w_y$ onto the images $z_u$ and $z_y$ in the space $\mathbb{R}^N$. According to Golub and Zha [30], the CCA solution of Eq. (2.2) requires that the images $z_u$ and $z_y$ of the positions $w_u$ and $w_y$ be unit-norm vectors and that the enclosing angle $\theta \in [0, \pi/2]$ between $z_u$ and $z_y$ be minimized. The cosine of this angle, also referred to as the canonical correlation, is given by

$$\cos(z_u, z_y) = \frac{\langle z_u, z_y \rangle}{\|z_u\| \, \|z_y\|}, \tag{2.3}$$

and, due to the unit-norm constraint, $\cos(z_u, z_y) = \langle z_u, z_y \rangle$. Therefore, the basic idea of CCA is to find two positions $w_u \in \mathbb{R}^l$ and $w_y \in \mathbb{R}^m$ whose images $z_u \in \mathbb{R}^N$ and $z_y \in \mathbb{R}^N$ under the linear transformations $U \in \mathbb{R}^{N \times l}$ and $Y \in \mathbb{R}^{N \times m}$ lie on the $N$-dimensional unit ball and enclose the smallest possible angle, that is, have the largest possible cosine. The smallest angle, $\theta_1$, determines the first canonical correlation, which equals $\cos\theta_1$ [31]. We have that

$$\cos\theta_1 = \max_{z_u, z_y \in \mathbb{R}^N} \langle z_u, z_y \rangle, \quad \|z_u\|_2 = 1, \quad \|z_y\|_2 = 1. \tag{2.4}$$

Let the maximum be attained by $z_u^1$ and $z_y^1$. The second smallest enclosing angle $\theta_2$ is attained by the pair of images $z_u^2$ and $z_y^2$, which are found in the orthogonal complements of $z_u^1$ and $z_y^1$. The procedure continues until no more pairs can be found. Thus, the angles $\theta_\kappa \in [0, \pi/2]$ for $\kappa = 1, 2, \ldots, \min(l, m)$ can be obtained recursively as

$$\cos\theta_\kappa = \max_{z_u^\kappa, z_y^\kappa \in \mathbb{R}^N} \langle z_u^\kappa, z_y^\kappa \rangle, \quad \|z_u^\kappa\|_2 = 1, \quad \|z_y^\kappa\|_2 = 1, \quad \langle z_u^\kappa, z_u^j \rangle = 0, \quad \langle z_y^\kappa, z_y^j \rangle = 0, \tag{2.5}$$

$\forall j \neq \kappa$: $j, \kappa = 1, 2, \ldots, \min(l, m)$. The dimensionality of CCA equals the number of canonical correlations $\kappa$. In summary, the principle behind CCA is to find one position in each of the two data spaces such that their images lie on a unit ball and the angle between them is minimized; consequently, the canonical correlation is maximized. The number of relevant positions can be determined by analyzing the values of the canonical correlations or by applying statistical significance tests [32].
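As a quick sanity check on Eq. (2.4), consider the scalar case $l = m = 1$ (an illustrative special case, not taken from the chapter). Then $U$ and $Y$ are single nonzero, zero-mean columns in $\mathbb{R}^N$, the only unit-norm images are $z_u = \pm U / \|U\|_2$ and $z_y = \pm Y / \|Y\|_2$, and the maximization reduces to choosing the signs:

$$\cos\theta_1 = \max_{\pm} \left\langle \pm\frac{U}{\|U\|_2}, \pm\frac{Y}{\|Y\|_2} \right\rangle = \frac{|U^T Y|}{\|U\|_2 \, \|Y\|_2}.$$

For zero-mean data this is the absolute value of the sample Pearson correlation between the two variables, so in this special case the first canonical correlation coincides with the familiar correlation coefficient.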

2.2.2 Obtaining the positions and images in CCA

The position vectors $w_u$ and $w_y$ can be obtained with techniques from functional analysis, among which eigenvalue-based methods are widely used, depending on the requirements. There are three well-accepted eigenvalue-based methods: the method originally proposed by Hotelling [7], a generalized eigenvalue problem [29], and singular value decomposition (SVD), as presented in the work of Healy [33]. Among them, the SVD-based solution is computationally more tractable for very large datasets. For computational efficiency, the SVD-based technique is used in this chapter and is briefly introduced below. As proposed by Hotelling, both the positions $w_u$ and $w_y$ and the images $z_u$ and $z_y$ are obtained by solving a standard eigenvalue problem. The sample covariance matrices $\Sigma_u$, $\Sigma_y$, and $\Sigma_{uy}$ in Eq. (2.1) can be obtained as

$$\Sigma_u \approx \frac{1}{N-1} U^T U, \quad \Sigma_y \approx \frac{1}{N-1} Y^T Y, \quad \Sigma_{uy} \approx \frac{1}{N-1} U^T Y. \tag{2.6}$$

The joint covariance matrix is the one given in Eq. (2.1):

$$\begin{bmatrix} \Sigma_u & \Sigma_{uy} \\ \Sigma_{yu} & \Sigma_y \end{bmatrix}. \tag{2.7}$$

The first and greatest canonical correlation, which corresponds to the smallest angle, is between the first pair of images $z_u = U w_u$ and $z_y = Y w_y$. Since the correlation between the images $z_u$ and $z_y$ is scale invariant, we can constrain $w_u$ and $w_y$ such that $z_u$ and $z_y$ have unit variance. In other words,

$$z_u^T z_u = w_u^T U^T U w_u = w_u^T \Sigma_u w_u = 1, \tag{2.8}$$

$$z_y^T z_y = w_y^T Y^T Y w_y = w_y^T \Sigma_y w_y = 1. \tag{2.9}$$

Because the variables of $U$ and $Y$ are assumed to be zero mean, the covariance between the two images is given by

$$z_u^T z_y = w_u^T U^T Y w_y = w_u^T \Sigma_{uy} w_y. \tag{2.10}$$

Substituting Eqs. (2.8), (2.9), and (2.10) into the algebraic problem in Eq. (2.4), we have

$$\cos\theta = \max_{z_u, z_y \in \mathbb{R}^N} \langle z_u, z_y \rangle = \max_{w_u \in \mathbb{R}^l,\, w_y \in \mathbb{R}^m} w_u^T \Sigma_{uy} w_y, \quad \|z_u\|_2 = \sqrt{w_u^T \Sigma_u w_u} = 1, \quad \|z_y\|_2 = \sqrt{w_y^T \Sigma_y w_y} = 1. \tag{2.11}$$

The technique of applying SVD to solve the CCA problem was first presented in the work of Healy [33], and the detailed solution was later described by Ewerbring and Luk [34]. It proceeds as follows. First, the covariance matrices $\Sigma_u$ and $\Sigma_y$ are transformed into identity form. Because they are symmetric positive definite, their square roots can be found using a Cholesky or eigenvalue decomposition:

$$\Sigma_u = \Sigma_u^{1/2} \Sigma_u^{1/2} \quad \text{and} \quad \Sigma_y = \Sigma_y^{1/2} \Sigma_y^{1/2}. \tag{2.12}$$

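Multiplying the joint covariance matrix in Eq. (2.7) on both sides by the inverses of the square root factors, we have

$$\begin{bmatrix} \Sigma_u^{-1/2} & 0 \\ 0 & \Sigma_y^{-1/2} \end{bmatrix} \begin{bmatrix} \Sigma_u & \Sigma_{uy} \\ \Sigma_{yu} & \Sigma_y \end{bmatrix} \begin{bmatrix} \Sigma_u^{-1/2} & 0 \\ 0 & \Sigma_y^{-1/2} \end{bmatrix} = \begin{bmatrix} I_l & \Sigma_u^{-1/2} \Sigma_{uy} \Sigma_y^{-1/2} \\ \Sigma_y^{-1/2} \Sigma_{yu} \Sigma_u^{-1/2} & I_m \end{bmatrix}. \tag{2.13}$$

Applying an SVD to the off-diagonal block gives

$$\Sigma_u^{-1/2} \Sigma_{uy} \Sigma_y^{-1/2} = Q S V^T, \tag{2.14}$$

where the columns of the matrices $Q$ and $V$ are the orthonormal left and right singular vectors, respectively. The canonical correlations are the singular values, that is, the diagonal entries of $S$. The position vectors $w_u$ and $w_y$ can be obtained as

$$w_u = \Sigma_u^{-1/2} Q \quad \text{and} \quad w_y = \Sigma_y^{-1/2} V. \tag{2.15}$$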

It should be noted that there are also alternative ways to solve the CCA problem. In general, the main motivation for improving on the eigenvalue-based techniques is computational complexity. The time complexities of the standard and generalized eigenvalue methods scale with the cube of the input matrix dimension, that is, $O(N^3)$ for a matrix of size $N \times N$. The matrix $\Sigma_u^{-1/2} \Sigma_{uy} \Sigma_y^{-1/2}$ in the SVD-based solution is rectangular, and the time complexity of its SVD is $O(MN^2)$ for a matrix of size $M \times N$.
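The SVD-based procedure of Eqs. (2.6) and (2.12)–(2.15) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `inv_sqrt` and `cca_svd` and the synthetic data are hypothetical, the inverse square roots are taken via an eigenvalue decomposition, and `np.linalg.svd` returns the transposed right factor, so the right singular vectors are the columns of `Vt.T`.

```python
import numpy as np


def inv_sqrt(sigma):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    eigval, eigvec = np.linalg.eigh(sigma)
    return eigvec @ np.diag(eigval ** -0.5) @ eigvec.T


def cca_svd(U, Y):
    """SVD-based CCA for column-centered data U (N x l) and Y (N x m)."""
    N = U.shape[0]
    sigma_u = U.T @ U / (N - 1)      # sample covariances, Eq. (2.6)
    sigma_y = Y.T @ Y / (N - 1)
    sigma_uy = U.T @ Y / (N - 1)
    su, sy = inv_sqrt(sigma_u), inv_sqrt(sigma_y)
    # NumPy factors the matrix as Q @ diag(delta) @ Vt.
    Q, delta, Vt = np.linalg.svd(su @ sigma_uy @ sy)
    # Position vectors of Eq. (2.15) and the canonical correlations.
    return su @ Q, sy @ Vt.T, delta


# Synthetic data (an assumption for illustration): the first columns of U
# and Y share a common signal; the remaining columns are pure noise.
rng = np.random.default_rng(0)
N = 2000
shared = rng.standard_normal((N, 1))
U = np.hstack([shared + 0.5 * rng.standard_normal((N, 1)),
               rng.standard_normal((N, 2))])
Y = np.hstack([shared + 0.5 * rng.standard_normal((N, 1)),
               rng.standard_normal((N, 1))])
U = U - U.mean(axis=0)
Y = Y - Y.mean(axis=0)

w_u, w_y, delta = cca_svd(U, Y)
print("canonical correlations:", delta)
```

With this construction only the first canonical correlation should be clearly nonzero, which is one way to read off the number of relevant positions mentioned in Section 2.2.1.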

2.2.3 Details of the SVD-based technique

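After computing the SVD of the matrix $\Sigma_u^{-1/2} \Sigma_{uy} \Sigma_y^{-1/2}$, we obtain three matrices: $Q$, $S$, and $V$. They can be expressed as

$$Q = (q_1, \ldots, q_l), \quad V = (v_1, \ldots, v_m), \quad S = \begin{bmatrix} S_\kappa & 0 \\ 0 & 0 \end{bmatrix}, \quad Q Q^T = I_l, \quad V V^T = I_m,$$

where $\kappa = \operatorname{rank}(\Sigma_{uy})$ denotes the number of nonzero singular values, $S_\kappa = \operatorname{diag}(\delta_1, \delta_2, \ldots, \delta_\kappa)$, and $1 \geq \delta_1 \geq \delta_2 \geq \cdots \geq \delta_\kappa > 0$. Note that $w_u = \Sigma_u^{-1/2} Q \in \mathbb{R}^{l \times l}$, $w_u = (w_u^1, \ldots, w_u^l)$, $w_y = \Sigma_y^{-1/2} V \in \mathbb{R}^{m \times m}$, and $w_y = (w_y^1, \ldots, w_y^m)$. It is known that

$$w_u^T \Sigma_u w_u = I_l, \quad w_y^T \Sigma_y w_y = I_m, \tag{2.16}$$

$$w_u^T \Sigma_{uy} w_y = S = \begin{bmatrix} \operatorname{diag}(\delta_1, \ldots, \delta_\kappa) & 0 \\ 0 & 0 \end{bmatrix}. \tag{2.17}$$

Definition 1. Let the random vectors $u \in \mathbb{R}^l$ and $y \in \mathbb{R}^m$ satisfy Eq. (2.1), and let $w_u$ and $w_y$ be defined as in Eq. (2.15). Then

$$w_u^i = \Sigma_u^{-1/2} q_i, \quad w_y^i = \Sigma_y^{-1/2} v_i, \quad i = 1, \ldots, \kappa \tag{2.18}$$

are called canonical correlation vectors,

$$z_u^i = U w_u^i, \quad z_y^i = Y w_y^i, \quad i = 1, \ldots, \kappa \tag{2.19}$$

are called canonical correlation variables, and $\delta_1, \delta_2, \ldots, \delta_\kappa$ are called canonical correlation coefficients. Moreover, the canonical correlation vectors satisfy

$$\bar{w}_u = (w_u^1, \ldots, w_u^\kappa), \quad \bar{w}_y = (w_y^1, \ldots, w_y^\kappa), \quad \bar{w}_u^T \Sigma_u \bar{w}_u = I_\kappa, \quad \bar{w}_y^T \Sigma_y \bar{w}_y = I_\kappa, \tag{2.20}$$

$$\bar{w}_u^T \Sigma_{uy} \bar{w}_y = \operatorname{diag}(\delta_1, \ldots, \delta_\kappa) := S_\kappa.$$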

2.2.4 The CCA-based fault diagnosis method

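Here, we introduce the basics of the CCA-based fault diagnosis method based on our previous work; most of the details can be found in the work of Chen et al. [35]. For fault diagnosis, we first define two random vectors:

$$\bar{r}_1 = \bar{w}_u^T u - S_\kappa \bar{w}_y^T y, \qquad \bar{r}_2 = \bar{w}_y^T y - S_\kappa \bar{w}_u^T u. \tag{2.21}$$

Let $\varepsilon(\cdot)$ be the expectation operator; it turns out that

$$\varepsilon\left(\bar{r}_1 \bar{r}_1^T\right) = \bar{w}_u^T \Sigma_u \bar{w}_u + S_\kappa \bar{w}_y^T \Sigma_y \bar{w}_y S_\kappa - \bar{w}_u^T \Sigma_{uy} \bar{w}_y S_\kappa - S_\kappa \bar{w}_y^T \Sigma_{yu} \bar{w}_u = I_\kappa - S_\kappa S_\kappa = \mathrm{diag}\left(1 - \delta_1^2, \cdots, 1 - \delta_\kappa^2\right), \tag{2.22}$$

$$\varepsilon\left(\bar{r}_2 \bar{r}_2^T\right) = \bar{w}_y^T \Sigma_y \bar{w}_y + S_\kappa \bar{w}_u^T \Sigma_u \bar{w}_u S_\kappa - \bar{w}_y^T \Sigma_{yu} \bar{w}_u S_\kappa - S_\kappa \bar{w}_u^T \Sigma_{uy} \bar{w}_y = I_\kappa - S_\kappa S_\kappa = \mathrm{diag}\left(1 - \delta_1^2, \cdots, 1 - \delta_\kappa^2\right). \tag{2.23}$$

In general, when we use all columns of the canonical weight matrix, the random vectors can be defined as

$$r_1 = w_u^T u - S w_y^T y, \qquad r_2 = w_y^T y - S^T w_u^T u. \tag{2.24}$$

The covariance matrices of both vectors satisfy

$$\varepsilon\left(r_1 r_1^T\right) = w_u^T \Sigma_u w_u + S w_y^T \Sigma_y w_y S^T - w_u^T \Sigma_{uy} w_y S^T - S w_y^T \Sigma_{yu} w_u = I_l - SS^T = \mathrm{diag}\Big(1 - \delta_1^2, \cdots, 1 - \delta_\kappa^2, \underbrace{1, \cdots, 1}_{l-\kappa}\Big), \tag{2.25}$$

$$\varepsilon\left(r_2 r_2^T\right) = w_y^T \Sigma_y w_y + S^T w_u^T \Sigma_u w_u S - w_y^T \Sigma_{yu} w_u S - S^T w_u^T \Sigma_{uy} w_y = I_m - S^T S = \mathrm{diag}\Big(1 - \delta_1^2, \cdots, 1 - \delta_\kappa^2, \underbrace{1, \cdots, 1}_{m-\kappa}\Big). \tag{2.26}$$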

A comparison with

$$\varepsilon\left(w_u^T U U^T w_u\right) = w_u^T \Sigma_u w_u = I_l, \qquad \varepsilon\left(w_y^T Y Y^T w_y\right) = w_y^T \Sigma_y w_y = I_m,$$

which are the normalized covariance matrices of $u$ and $y$, respectively, makes it clear that the covariance matrix of the measurement under consideration becomes smaller when the correlated measurements are taken into account. In fact, $r_1$ and $r_2$ can be rewritten as

$$r_1 = w_u^T\left(u - \Sigma_{uy} w_y w_y^T y\right) = Q^T \Sigma_u^{-1/2}\left(u - \Sigma_{uy}\Sigma_y^{-1} y\right), \tag{2.27}$$

$$r_2 = w_y^T\left(y - \Sigma_{yu} w_u w_u^T u\right) = V^T \Sigma_y^{-1/2}\left(y - \Sigma_{yu}\Sigma_u^{-1} u\right). \tag{2.28}$$

Note that

$$\hat{u} = \Sigma_{uy}\Sigma_y^{-1} y, \qquad \hat{y} = \Sigma_{yu}\Sigma_u^{-1} u$$

are least squares estimates of $u$ and $y$, and thus the estimation errors $u - \hat{u}$ and $y - \hat{y}$ have the minimum variances. This motivates us to use the signals $r_1$ and $r_2$ for fault detection and estimation.
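The step from the definition of $r_1$ in Eq. (2.24) to the least squares form in Eq. (2.27) relies only on $VV^T = I_m$ and on $S = w_u^T \Sigma_{uy} w_y$ from Eq. (2.17). A short worked derivation under those definitions (the case of $r_2$ is symmetric):

```latex
\begin{aligned}
w_y w_y^T &= \Sigma_y^{-1/2} V V^T \Sigma_y^{-1/2} = \Sigma_y^{-1}, \\
r_1 &= w_u^T u - S\, w_y^T y
     = w_u^T u - w_u^T \Sigma_{uy}\, w_y w_y^T\, y \\
    &= w_u^T \left(u - \Sigma_{uy} \Sigma_y^{-1} y\right)
     = Q^T \Sigma_u^{-1/2} \left(u - \hat{u}\right).
\end{aligned}
```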


Let

$$u = f_u + \omega_u, \qquad \omega_u \sim N(0, \Sigma_u), \tag{2.29}$$

$$y = f_y + \omega_y, \qquad \omega_y \sim N(0, \Sigma_y), \tag{2.30}$$

be the process models for (sub)systems $y$, $u$, where $f_u$ and $f_y$ represent fault vectors in the process measurements $u$ and $y$, respectively. We assume that $f_u$ and $f_y$ are not present in the process simultaneously. Suppose that $\omega_u$ and $\omega_y$ are correlated with

$$\varepsilon\left(\omega_u \omega_y^T\right) = \Sigma_{uy}.$$

Then, after determining $w_u$, $w_y$, and $S$ according to Eq. (2.18), the hypothesis testing technique can be used for the fault detection decision [36, 37]. Fault detection can be formulated as a binary hypothesis testing problem because its main objective is to make a yes/no decision about the presence or absence of a fault. The solution to this problem must compromise between two incorrect decisions: a false alarm (i.e., rejection of the null hypothesis, $H_0$, when it is true) and a missed detection (i.e., failure to accept the alternative hypothesis, $H_1$, when it is true). We therefore need a tractable statistical test to support the decision. The quality of a test can be characterized by two measures: the probability of a false alarm (referred to as the false alarm rate [FAR]) and the power function, which is the probability of deciding $H_1$ when $H_1$ is true (referred to as the fault detection rate [FDR]). A good fault detection scheme keeps the FAR as small as possible and the FDR as large as possible for each fault [38]. Hence, after obtaining the residual signals $r_1$ and $r_2$, we first develop a test statistic and then compare it with a corresponding threshold to make a fault detection decision. The dedicated fault detection solution is as follows:

• Develop the test statistics:

$$J_u = r_1^T \Sigma_{r_1}^{-1} r_1, \tag{2.31}$$

where $\Sigma_{r_1} = I_l - SS^T = \mathrm{diag}\Big(1 - \delta_1^2, \cdots, 1 - \delta_\kappa^2, \underbrace{1, \cdots, 1}_{l-\kappa}\Big)$, and

$$J_y = r_2^T \Sigma_{r_2}^{-1} r_2, \tag{2.32}$$

where $\Sigma_{r_2} = I_m - S^T S = \mathrm{diag}\Big(1 - \delta_1^2, \cdots, 1 - \delta_\kappa^2, \underbrace{1, \cdots, 1}_{m-\kappa}\Big)$.


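• Determine the thresholds: for a given acceptable FAR $\alpha$,

$$J_{th,u} = \chi_\alpha^2(l), \qquad \mathrm{prob}\left(J_u > \chi_\alpha^2(l) \mid \text{fault-free}\right) = \alpha, \tag{2.33}$$

$$J_{th,y} = \chi_\alpha^2(m), \qquad \mathrm{prob}\left(J_y > \chi_\alpha^2(m) \mid \text{fault-free}\right) = \alpha, \tag{2.34}$$

where $\chi^2(l)$ stands for the chi-square distribution with $l$ degrees of freedom, and $\mathrm{prob}\left(J_u > \chi_\alpha^2(l) \mid \text{fault-free}\right) = \alpha$ means that, in the fault-free case, the probability of $J_u > \chi_\alpha^2(l)$ equals the significance level $\alpha$.

• Make the detection logic:

$$\begin{cases} J_u > J_{th,u} \Rightarrow \text{presence of fault, otherwise absence of fault}, \\ J_y > J_{th,y} \Rightarrow \text{presence of fault, otherwise absence of fault}. \end{cases} \tag{2.35}$$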
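The three bullets above can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `detection_statistics` and `chi2_quantile` are hypothetical helper names, the chi-square critical value is obtained from the Wilson-Hilferty approximation (an approximation chosen here so that only NumPy and the standard library are needed), and the small covariance model in the example is invented for illustration.

```python
import numpy as np
from statistics import NormalDist

def chi2_quantile(dof, alpha):
    """Approximate upper-alpha critical value of chi-square(dof) (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * np.sqrt(c)) ** 3

def detection_statistics(u, y, w_u, w_y, delta):
    """Return (J_u, J_y) of Eqs. (2.31)-(2.32) for one sample pair (u, y)."""
    l, m = w_u.shape[0], w_y.shape[0]
    S = np.zeros((l, m))
    k = len(delta)
    S[:k, :k] = np.diag(delta)
    r1 = w_u.T @ u - S @ (w_y.T @ y)        # residual r_1, Eq. (2.24)
    r2 = w_y.T @ y - S.T @ (w_u.T @ u)      # residual r_2, Eq. (2.24)
    J_u = r1 @ np.linalg.solve(np.eye(l) - S @ S.T, r1)
    J_y = r2 @ np.linalg.solve(np.eye(m) - S.T @ S, r2)
    return J_u, J_y

# Hypothetical model: Sigma_u = I_2, Sigma_y = 1, Sigma_uy = [[0.8], [0]],
# for which w_u = I_2, w_y = 1 and the single canonical correlation is 0.8.
w_u, w_y, delta = np.eye(2), np.eye(1), np.array([0.8])
alpha = 0.05
J_th_u, J_th_y = chi2_quantile(2, alpha), chi2_quantile(1, alpha)

u_ok, y_ok = np.array([0.1, -0.2]), np.array([0.05])
u_bad = u_ok + np.array([3.0, 0.0])          # additive fault f_u on the first input

J_u_ok, J_y_ok = detection_statistics(u_ok, y_ok, w_u, w_y, delta)
J_u_bad, J_y_bad = detection_statistics(u_bad, y_ok, w_u, w_y, delta)
print("fault-free alarm:", J_u_ok > J_th_u or J_y_ok > J_th_y)
print("faulty alarm:    ", J_u_bad > J_th_u or J_y_bad > J_th_y)
```

An exact chi-square quantile routine from a statistics library could replace `chi2_quantile` without changing the rest of the sketch.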

It should be noted that the preceding fault detection solutions only allow a successful fault detection but do not guarantee a perfect fault isolation. This fact can be clearly seen from the following relations:

$$r_1 = w_u^T u - S w_y^T y = w_u^T f_u - S w_y^T f_y + w_u^T \omega_u - S w_y^T \omega_y, \tag{2.36}$$

$$r_2 = w_y^T y - S^T w_u^T u = w_y^T f_y - S^T w_u^T f_u + w_y^T \omega_y - S^T w_u^T \omega_u, \tag{2.37}$$

which means that $r_1$ and $r_2$ will be influenced by both $f_u$ and $f_y$. However, it holds that

$$\varepsilon(J_u) = f_u^T \Omega_{u,1} f_u + l, \qquad \varepsilon(J_y) = f_u^T \Omega_{u,2} f_u + m \qquad \text{for } f_u \neq 0,\ f_y = 0,$$

$$\varepsilon(J_y) = f_y^T \Omega_{y,1} f_y + m, \qquad \varepsilon(J_u) = f_y^T \Omega_{y,2} f_y + l \qquad \text{for } f_y \neq 0,\ f_u = 0,$$

with

$$\Omega_{u,1} = \Sigma_u^{-1/2} Q\, \mathrm{diag}\left(1 - \delta_1^2, \cdots, 1 - \delta_\kappa^2, 1, \cdots, 1\right)^{-1} Q^T \Sigma_u^{-1/2},$$

$$\Omega_{u,2} = \Sigma_u^{-1/2} Q\, \mathrm{diag}\left(\frac{\delta_1^2}{1 - \delta_1^2}, \cdots, \frac{\delta_\kappa^2}{1 - \delta_\kappa^2}, 0, \cdots, 0\right) Q^T \Sigma_u^{-1/2},$$

$$\Omega_{y,1} = \Sigma_y^{-1/2} V\, \mathrm{diag}\left(1 - \delta_1^2, \cdots, 1 - \delta_\kappa^2, 1, \cdots, 1\right)^{-1} V^T \Sigma_y^{-1/2},$$

$$\Omega_{y,2} = \Sigma_y^{-1/2} V\, \mathrm{diag}\left(\frac{\delta_1^2}{1 - \delta_1^2}, \cdots, \frac{\delta_\kappa^2}{1 - \delta_\kappa^2}, 0, \cdots, 0\right) V^T \Sigma_y^{-1/2},$$

where $l$ and $m$ are the mean values of chi-square-distributed variables with $l$ and $m$ degrees of freedom, respectively. It turns out that on the assumption $\delta_1 < 1$,

$$\Omega_{u,1} > \Omega_{u,2},\ \Omega_{y,1} > \Omega_{y,2} \ \Rightarrow\ \varepsilon(J_u) > \varepsilon(J_y) - m + l \quad \text{for } f_u \neq 0,\ f_y = 0, \tag{2.38}$$

$$\varepsilon(J_u) < \varepsilon(J_y) - m + l \quad \text{for } f_y \neq 0,\ f_u = 0. \tag{2.39}$$

Inequalities (2.38) and (2.39) can be applied as a decision logic for fault isolation. Note that if single values of $J_u$ and $J_y$ are used instead of their mean values, false isolation decisions can occur. The rate of false isolation decisions depends on $f_u$ and $f_y$, which are in general unknown. To reduce false isolation decisions, we can collect data and estimate $\varepsilon(J_u)$ and $\varepsilon(J_y)$. Following the discussion in the work of Chen et al. [39], it is evident that the preceding solutions, $\{J_u, J_{th,u}\}$ and $\{J_y, J_{th,y}\}$, are the optimal solutions for detecting faults $f_u$ and $f_y$, thanks to the fact that $u - \hat{u}$ and $y - \hat{y}$ have the minimum variances. However, attention should be paid to the assumption that $f_u$ and $f_y$ are not present in the process measurements simultaneously. If this is not the case, then the overall model

$$\begin{bmatrix} u \\ y \end{bmatrix} = \begin{bmatrix} f_u \\ f_y \end{bmatrix} + \begin{bmatrix} \omega_u \\ \omega_y \end{bmatrix}, \qquad \begin{bmatrix} \omega_u \\ \omega_y \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \Sigma_u & \Sigma_{uy} \\ \Sigma_{yu} & \Sigma_y \end{bmatrix}\right) \tag{2.40}$$

should be used for the detection purpose.
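The isolation logic can be illustrated with a small sketch that estimates the mean values from a window of collected statistics and compares $\varepsilon(J_u) - l$ with $\varepsilon(J_y) - m$, which is the rearranged form of the two inequalities. The function name `isolate_fault`, the returned labels, the tie handling, and the sample values are assumptions introduced here for illustration only.

```python
import numpy as np

def isolate_fault(J_u_samples, J_y_samples, l, m):
    """Decide which measurement block is faulty from windows of J_u and J_y.

    Uses sample means as estimates of the expectations and compares the
    excess over the fault-free means l and m. A tie is assigned to f_y
    (arbitrary choice made for this sketch).
    """
    excess_u = np.mean(J_u_samples) - l
    excess_y = np.mean(J_y_samples) - m
    return "f_u" if excess_u > excess_y else "f_y"

# Hypothetical windows of statistics recorded after an alarm, with l = 2, m = 1.
print(isolate_fault([30.0, 28.0, 33.0], [12.0, 10.0, 11.0], l=2, m=1))
print(isolate_fault([6.0, 7.5, 5.5], [19.0, 22.0, 21.0], l=2, m=1))
```

Averaging over a longer window reduces the chance of a false isolation decision, at the cost of a slower decision.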

2.2.5 Main steps of the CCA-based fault diagnosis method

From the preceding discussion, it can be seen that the CCA-based fault diagnosis method consists of two main tasks: the generation of residual signals based on a process model and the evaluation of these residuals using a statistical test. The flowchart of the method is sketched in Fig. 2.1, which partitions the CCA-based fault diagnosis method into four steps:

(1) Data collection. This step captures and stores measurements from the sensors installed on the equipment in the process of interest [40]. It is the first step of CCA-based fault diagnosis and provides the basic information for the following steps. Usually, a data collection system consists of sensors, data transmission devices, and data storage devices. Various sensors capture different types of measurement data, which represent the status of the process and should also reflect the fault information; a fault that leaves no trace in the measurements cannot be detected. In practice, the commonly used sensors include current sensors, voltage sensors, temperature transducers, accelerometers, and flow sensors, among others. Through a data transmission device, the collected data are transmitted to a PC or portable device and stored for further analysis.

(2) Data preprocessing. There are two stages in this step. In the first stage, due to the ubiquitous noise in the process, errors in measurement devices, and loss and disturbances in data transmission and data storage, the collected data usually contain missing values, outliers, and other abnormal data, which degrade the quality of the following steps. In the literature, numerous techniques have been proposed for the removal of outliers and the imputation of missing entries [41]. In the second stage, to handle the case that the data fed into the following steps are too large and redundant, or to find more informative variables than the original measurements, feature variables are extracted from the original measurements to be used in the following steps [42]. For example, the PCA method is a second-order method, which only considers the mean and variance-covariance of the measurements. To provide higher-order representations for non-Gaussian data, higher-order statistics (a kind of feature) can first be extracted and then analyzed with PCA [43]. Since the removal of outliers and the handling of missing data are standard, in the sequel we will not explicitly mention this stage. Whether features are extracted in the second stage depends on the application. In this work, we use the original measurements.

FIGURE 2.1 Flowchart of the CCA-based fault diagnosis method.

The remaining two steps, CCA modeling and hypothesis testing, are the key steps in the CCA-based fault diagnosis method, which we explain in detail in Sections 2.3 and 2.4. The preceding introduction to the conventional CCA method applies to processes in which the variables are time independent. However, in practice, autocorrelation is present in the data due to the dynamics of the process of interest. In other words, when the data contain dynamic information, applying CCA to the data will not reveal the exact relations between the measurement vectors but rather a linear static approximation. In that case, the statistical basis of the method is violated because the data break the assumption of time independence, which is the same issue that motivated the DPCA and DPLS methods for detecting faults in dynamic processes [44–46].
Therefore, to deal with the data, which will be autocorrelated and possibly cross correlated, two variants of the CCA method—the DCCA method and the gated recurrent units (GRU)-aided CCA method—for dealing with the fault diagnosis of the dynamic process are presented in Sections 2.3.1 and 2.3.2. From the viewpoint of the four steps, the DCCA method makes a change in the CCA modeling step, and the GRU-aided CCA method can be viewed as a variant of CCA changed in the preprocessing and CCA modeling steps. In the preprocessing step, the DNN GRU is used as a feature extraction tool, but the GRU’s training involves the conventional CCA optimization.

2.3 CCA-based fault diagnosis method for dynamic processes

2.3.1 DCCA-based fault detection

Suppose that the dynamic processes under consideration are linear time invariant with Gaussian distributed process noise and measurement noise. A standard model form of a dynamic process is the state space representation given by

$$x(k+1) = Ax(k) + Bu(k) + w(k), \tag{2.41}$$

$$y(k) = Cx(k) + Du(k) + v(k), \tag{2.42}$$

where $x \in \mathbb{R}^n$ is the state vector, $u \in \mathbb{R}^l$ and $y \in \mathbb{R}^m$ are the input and output vectors, and $w \in \mathbb{R}^n$ and $v \in \mathbb{R}^m$ are the process noise and measurement noise, respectively. Matrices $A$, $B$, $C$, and $D$ are unknown constant matrices with appropriate dimensions. In this study, we further assume that the process is in the steady state, that is, $\lim_{k\to\infty} \varepsilon(x) = \text{constant}$ and $\lim_{k\to\infty} \Sigma_x = \text{constant}$. As a result, the cross covariance of the input and output vectors is constant. In the following, we present DCCA, as an extension of the CCA-based method, to detect faults in such dynamic systems in the steady state.

2.3.1.1 Modeling of input and output datasets

Based on the stochastic system model (Eqs. 2.41 and 2.42), the dependence of the future outputs $y_f$ on the past inputs and outputs $z_p$ and the future inputs $u_f$ is investigated in this section. To this end, we first define the data structures and sets. Suppose that $s$ and $s_f$ are the time lags. Let the lagged variables and corresponding data matrices be defined as

$$z_p(k) = \begin{bmatrix} y(k-s) \\ \vdots \\ y(k-1) \\ u(k-s) \\ \vdots \\ u(k-1) \end{bmatrix}, \qquad y_f(k) = \begin{bmatrix} y(k) \\ \vdots \\ y(k+s_f) \end{bmatrix}, \qquad u_f(k) = \begin{bmatrix} u(k) \\ \vdots \\ u(k+s_f) \end{bmatrix},$$

$$\begin{aligned} Z_p &= \left[z_p(1), \cdots, z_p(N)\right] \in \mathbb{R}^{s(m+l) \times N}, \\ Y_f &= \left[y_f(1), \cdots, y_f(N)\right] \in \mathbb{R}^{(s_f+1)m \times N}, \\ U_f &= \left[u_f(1), \cdots, u_f(N)\right] \in \mathbb{R}^{(s_f+1)l \times N}. \end{aligned} \tag{2.43}$$
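The construction of the lagged data matrices in Eq. (2.43) can be sketched as follows. The function name `build_lagged` and the toy signals are assumptions for illustration; time indices are zero-based here, so the first usable column corresponds to $k = s$.

```python
import numpy as np

def build_lagged(u, y, s, s_f):
    """Build Z_p, Y_f, U_f of Eq. (2.43) from u (T x l) and y (T x m).

    Column j corresponds to time k = s + j, which is the first time index
    that has s past samples and s_f future samples available.
    """
    T = u.shape[0]
    ks = range(s, T - s_f)
    Z_p = np.column_stack(
        [np.concatenate([y[k - s:k].ravel(), u[k - s:k].ravel()]) for k in ks]
    )
    Y_f = np.column_stack([y[k:k + s_f + 1].ravel() for k in ks])
    U_f = np.column_stack([u[k:k + s_f + 1].ravel() for k in ks])
    return Z_p, Y_f, U_f

# Toy signals (hypothetical): T = 10 samples, l = 2 inputs, m = 1 output.
u = np.column_stack([np.arange(10.0), -np.arange(10.0)])
y = 10.0 * np.arange(10.0).reshape(-1, 1)
Z_p, Y_f, U_f = build_lagged(u, y, s=3, s_f=1)
print(Z_p.shape, Y_f.shape, U_f.shape)
```

With $s = 3$ and $s_f = 1$, each column of `Z_p` stacks three past outputs followed by three past inputs, matching the ordering of $z_p(k)$.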

It is shown in the work of Lehmann [47] that the representation of Eqs. (2.41) and (2.42) can be rewritten as

$$x(k+1) = A_K x(k) + B_K u(k) + K y(k), \tag{2.44}$$

$$y(k) = Cx(k) + Du(k) + e(k), \tag{2.45}$$

where $A_K = A - KC$, $B_K = B - KD$, and $K$ is the Kalman filter gain matrix, which ensures that all eigenvalues of $A_K$ lie inside the unit circle so that the system is stable. $e(k)$ is the innovation sequence. It is straightforward from Eq. (2.44) that the following equation holds:

$$x(k) = A_K^s x(k-s) + \sum_{i=1}^{s} A_K^{i-1} \begin{bmatrix} K & B_K \end{bmatrix} \begin{bmatrix} y(k-i) \\ u(k-i) \end{bmatrix}. \tag{2.46}$$

Recall that $A_K$ is stable, so a large $s$ leads to $A_K^s \approx 0$, and then

$$x(k) \approx P^T z_p(k), \tag{2.47}$$

where $P^T = \begin{bmatrix} P_y & P_u \end{bmatrix}$, $P_y = \begin{bmatrix} A_K^{s-1}K & \cdots & A_K K & K \end{bmatrix}$, and $P_u = \begin{bmatrix} A_K^{s-1}B_K & \cdots & A_K B_K & B_K \end{bmatrix}$. The "past" process measurements $z_p(k)$ include the process input and output data in the time period $[k-s, k-1]$, as shown in Eq. (2.43). Moreover, from Eqs. (2.44) and (2.45), the following equation also holds:

$$y_f(k) = \Gamma_{K,s_f} x(k) + H_{K,u,s_f} u_f(k) + H_{K,y,s_f} y_f(k) + e_f(k), \tag{2.48}$$

where

$$\Gamma_{K,s_f} = \begin{bmatrix} C \\ CA_K \\ \vdots \\ CA_K^{s_f} \end{bmatrix}, \qquad H_{K,u,s_f} = \begin{bmatrix} D & 0 & \cdots & 0 \\ CB_K & D & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ CA_K^{s_f-1}B_K & \cdots & CB_K & D \end{bmatrix},$$

$$H_{K,y,s_f} = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ CK & 0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ CA_K^{s_f-1}K & \cdots & CK & 0 \end{bmatrix}, \qquad e_f(k) = \begin{bmatrix} e(k) \\ e(k+1) \\ \vdots \\ e(k+s_f) \end{bmatrix}.$$

Based on Eq. (2.47), we obtain

$$\left(I - H_{K,y,s_f}\right) y_f(k) \approx \Gamma_{K,s_f} P^T z_p(k) + H_{K,u,s_f} u_f(k) + e_f(k) = \begin{bmatrix} \Gamma_{K,s_f} P^T & H_{K,u,s_f} \end{bmatrix} \begin{bmatrix} z_p(k) \\ u_f(k) \end{bmatrix} + e_f(k). \tag{2.49}$$

Eq. (2.49) is further rewritten as

$$L^T y_f(k) = M^T \begin{bmatrix} z_p(k) \\ u_f(k) \end{bmatrix} + e_f(k), \tag{2.50}$$

where $L = \left(I - H_{K,y,s_f}\right)^T$ and $M = \begin{bmatrix} \Gamma_{K,s_f} P^T & H_{K,u,s_f} \end{bmatrix}^T$.

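2.3.1.2 The DCCA-based fault detection method

This section addresses fault detection in dynamic processes by applying the CCA technique for residual generation. The process input and output data are constructed in a time interval and denoted as $Y_f$ and $\begin{bmatrix} Z_p \\ U_f \end{bmatrix}$. Let $Y_f$ and $\begin{bmatrix} Z_p \\ U_f \end{bmatrix}$ be mean centered; then

$$\begin{bmatrix} \Sigma_z & \Sigma_{z,y_f} \\ \Sigma_{y_f,z} & \Sigma_{y_f} \end{bmatrix} \approx \frac{1}{N-1} \begin{bmatrix} \begin{bmatrix} Z_p \\ U_f \end{bmatrix} \begin{bmatrix} Z_p \\ U_f \end{bmatrix}^T & \begin{bmatrix} Z_p \\ U_f \end{bmatrix} Y_f^T \\ Y_f \begin{bmatrix} Z_p \\ U_f \end{bmatrix}^T & Y_f Y_f^T \end{bmatrix}.$$

By using CCA, the weighting matrices $w_{u_f}$ and $w_{y_f}$ can be obtained from

$$w_{u_f} = \Sigma_z^{-1/2} \Phi(:, 1:n), \qquad w_{y_f} = \Sigma_{y_f}^{-1/2} \Psi(:, 1:n), \qquad \Sigma_z^{-1/2} \Sigma_{z,y_f} \Sigma_{y_f}^{-1/2} = \Phi \Lambda \Psi^T, \qquad \Lambda = \begin{bmatrix} \Lambda_n & 0 \\ 0 & 0 \end{bmatrix}, \tag{2.51}$$

where $\Phi$ and $\Psi$ contain the left and right singular vectors and $\Lambda_n = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. The cumulative percentage value or the Akaike information criterion method [15] can be used to determine $n$, which is called the order of the system. Noting that

$$w_{u_f}^T \begin{bmatrix} Z_p \\ U_f \end{bmatrix} \begin{bmatrix} Z_p \\ U_f \end{bmatrix}^T w_{u_f} = I_n, \qquad w_{y_f}^T Y_f Y_f^T w_{y_f} = I_n,$$

the following equation can be obtained from Eq. (2.51):

$$w_{u_f}^T \begin{bmatrix} Z_p \\ U_f \end{bmatrix} Y_f^T w_{y_f} = \Lambda_n. \tag{2.52}$$

It is reasonable to define a residual vector according to Eq. (2.52):

$$r(k) = w_{y_f}^T y_f(k) - \Lambda_n w_{u_f}^T \begin{bmatrix} z_p(k) \\ u_f(k) \end{bmatrix}. \tag{2.53}$$

Furthermore, the covariance matrix of $r(k)$ can be estimated as

$$\begin{aligned} &\left(w_{y_f}^T Y_f - \Lambda_n w_{u_f}^T \begin{bmatrix} Z_p \\ U_f \end{bmatrix}\right) \left(w_{y_f}^T Y_f - \Lambda_n w_{u_f}^T \begin{bmatrix} Z_p \\ U_f \end{bmatrix}\right)^T \\ &\quad = w_{y_f}^T Y_f Y_f^T w_{y_f} + \Lambda_n w_{u_f}^T \begin{bmatrix} Z_p \\ U_f \end{bmatrix} \begin{bmatrix} Z_p \\ U_f \end{bmatrix}^T w_{u_f} \Lambda_n - 2 \Lambda_n w_{u_f}^T \begin{bmatrix} Z_p \\ U_f \end{bmatrix} Y_f^T w_{y_f} \\ &\quad = I - \Lambda_n^2. \end{aligned} \tag{2.54}$$

The residual follows a multivariate normal distribution with zero mean and the covariance matrix given by Eq. (2.54). It is thus reasonable to apply the following test statistic for the fault detection decision:

$$T_r^2(k) = (N-1)\, r^T(k) \left(I - \Lambda_n^2\right)^{-1} r(k). \tag{2.55}$$

The corresponding threshold $J_{th,T}$ can be determined by

$$J_{th,T} = \frac{n\left(N^2 - n\right)}{N(N-n)} F_{1-\alpha}(n, N-n), \tag{2.56}$$

where $F_{1-\alpha}(n, N-n)$ stands for the critical value of the F-distribution with $n$ and $N-n$ degrees of freedom at the given significance level $\alpha$.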
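The residual generator and test statistic of Eqs. (2.51) through (2.55) can be sketched numerically. In this sketch the weights are normalized with the scatter matrices (without the $1/(N-1)$ factor), matching the identities stated before Eq. (2.52); the function names `dcca_fit` and `dcca_t2`, the random data, and the choice $n = 2$ are assumptions for illustration, and the F-distribution threshold of Eq. (2.56) is left out because it needs a statistics library.

```python
import numpy as np

def inv_sqrt(a):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(a)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def dcca_fit(Z, Yf, n):
    """Z stacks [Z_p; U_f] (p x N), Yf is (q x N); both must be mean centered."""
    iz, iy = inv_sqrt(Z @ Z.T), inv_sqrt(Yf @ Yf.T)
    left, lam, right_t = np.linalg.svd(iz @ Z @ Yf.T @ iy)
    return iz @ left[:, :n], iy @ right_t.T[:, :n], lam[:n]

def dcca_t2(z, yf, w_u, w_y, lam, N):
    """T^2 statistic of Eq. (2.55) for one sample, using the residual of Eq. (2.53)."""
    r = w_y.T @ yf - lam * (w_u.T @ z)
    return (N - 1) * np.sum(r ** 2 / (1.0 - lam ** 2))

# Hypothetical training data: p = 4 stacked past/future inputs, q = 3 future outputs.
rng = np.random.default_rng(1)
N, n = 400, 2
Z = rng.standard_normal((4, N))
Yf = rng.standard_normal((3, 4)) @ Z + 0.5 * rng.standard_normal((3, N))
Z -= Z.mean(axis=1, keepdims=True)
Yf -= Yf.mean(axis=1, keepdims=True)

w_u, w_y, lam = dcca_fit(Z, Yf, n)
R = w_y.T @ Yf - lam[:, None] * (w_u.T @ Z)          # residuals for all samples
t2 = np.array([dcca_t2(Z[:, k], Yf[:, k], w_u, w_y, lam, N) for k in range(N)])
print(np.allclose(R @ R.T, np.diag(1.0 - lam ** 2)))  # Eq. (2.54) on training data
print(t2.mean())
```

On the training data the residual scatter equals $I - \Lambda_n^2$ up to rounding, so the average of $T_r^2$ is $(N-1)n/N$, which is close to the number of retained directions $n$.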

FIGURE 2.2 Illustrations of the GRU model [48].

2.3.2 The GRU-aided CCA fault detection method

In this section, the proposed method is described in detail. Section 2.3.2.1 introduces the basic structure of the GRU with only one layer. In Section 2.3.2.2, the GRU is combined with the CCA technique to form a new method for fault diagnosis of dynamic processes.

2.3.2.1 The GRU

The recurrent neural network (RNN) is a commonly used DNN for time series data analysis. The GRU is a gating mechanism in the RNN that can overcome the vanishing gradient problem [48]. It uses a gate control mechanism similar to that of LSTM [49] but has only two gates: the update gate and the reset gate. Earlier memory can thus be taken into account when identifying and predicting subsequent data. Compared with LSTM, the GRU has fewer parameters, and hence its computational cost is lower. Its structure is shown in Fig. 2.2. The update gate vector $z_t$ controls the extent to which the state information of the previous moment is brought into the current state. The reset gate vector $r_t$ controls the degree to which the state information of the previous moment is ignored. For the input vector $u_t$ at time $t$, the operating functions in the GRU hidden elements are given as follows:

$$z_t = \delta(W_z u_t + V_z H_{t-1} + b_z), \tag{2.57}$$

$$r_t = \delta(W_r u_t + V_r H_{t-1} + b_r), \tag{2.58}$$

$$h_t = \tanh\left(W_c u_t + V_c (r_t \circ H_{t-1})\right), \tag{2.59}$$

$$H_t = (1 - z_t) \circ H_{t-1} + z_t \circ h_t, \tag{2.60}$$

where $W_z$ and $W_r$ are the weight parameters of the $z_t$ and $r_t$ gates, respectively, and $W_c$ denotes the weight parameter of the candidate state. $H_{t-1}$ is the output state at time $t-1$, "$\circ$" represents the Hadamard product, and $h_t$ and $H_t$ are the candidate state and output state at time $t$. $\delta(\cdot)$ and $\tanh(\cdot)$ denote activation functions; $\delta(\cdot)$ activates the update gate and the reset gate. $V_z$, $V_r$, $V_c$, $b_z$, and $b_r$ are parameter matrices and vectors, which are learned through the model training process. For an intuitive illustration, it can be seen from Eqs. (2.57) through (2.60) that when $r_t$ is set to 1 and $z_t$ is set to 1, the GRU degenerates into a simple RNN model.

FIGURE 2.3 Structure of GRU-aided CCA.
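A single GRU update following Eqs. (2.57) through (2.60) can be written in a few lines. This is a minimal sketch, not a training-ready implementation: the logistic sigmoid is assumed for the gate activation $\delta(\cdot)$, and the function names, dictionary keys, dimensions, and random weights are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(u_t, H_prev, p):
    """One GRU update; p is a dict holding the weights named as in the equations."""
    z = sigmoid(p["Wz"] @ u_t + p["Vz"] @ H_prev + p["bz"])   # update gate, Eq. (2.57)
    r = sigmoid(p["Wr"] @ u_t + p["Vr"] @ H_prev + p["br"])   # reset gate, Eq. (2.58)
    h = np.tanh(p["Wc"] @ u_t + p["Vc"] @ (r * H_prev))       # candidate, Eq. (2.59)
    return (1.0 - z) * H_prev + z * h                          # output, Eq. (2.60)

# Hypothetical sizes: 3 input variables, 4 hidden units, sequence of 5 steps.
rng = np.random.default_rng(2)
n_in, n_hid = 3, 4
params = {
    "Wz": rng.standard_normal((n_hid, n_in)), "Vz": rng.standard_normal((n_hid, n_hid)),
    "Wr": rng.standard_normal((n_hid, n_in)), "Vr": rng.standard_normal((n_hid, n_hid)),
    "Wc": rng.standard_normal((n_hid, n_in)), "Vc": rng.standard_normal((n_hid, n_hid)),
    "bz": np.zeros(n_hid), "br": np.zeros(n_hid),
}
H = np.zeros(n_hid)
for u_t in rng.standard_normal((5, n_in)):
    H = gru_step(u_t, H, params)
print(H)

# With both gates saturated near 1, Eq. (2.60) as written reduces to a plain
# tanh recurrence, i.e., a simple RNN step.
saturated = dict(params, bz=np.full(n_hid, 50.0), br=np.full(n_hid, 50.0),
                 Wz=np.zeros((n_hid, n_in)), Vz=np.zeros((n_hid, n_hid)),
                 Wr=np.zeros((n_hid, n_in)), Vr=np.zeros((n_hid, n_hid)))
u_test = rng.standard_normal(n_in)
H_rnn = np.tanh(params["Wc"] @ u_test + params["Vc"] @ H)
H_gru = gru_step(u_test, H, saturated)
print(np.allclose(H_gru, H_rnn))
```

Since each output state is a convex combination of the previous state and a tanh value, the state stays within $[-1, 1]$ when it starts from zero.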

2.3.2.2 The GRU-aided CCA method

Note that the GRU has been used as a fault classifier for fault diagnosis [50, 51]. The structure of the GRU-aided CCA method is illustrated in Fig. 2.3; the major components of the proposed method are two deep GRUs and a CCA optimization in the top layer. Before the training, the measured data, including the input vectors $u_t$ and $y_t$, should be prepared. It is well accepted that the GRU can capture process dynamics; therefore, the input data fed to the GRU need not be augmented with $s$ lagged samples as in conventional CVA. The data are then arranged into batches of $N$ samples each. When the sampling interval is short enough in practice, the fault will still be detected in time even if the value of $N$ is large. Joint representation learning consists of two GRUs and a CCA, the aim of which is to learn the parameters of the GRUs so that the correlation between the transformed input data $h(u_t)$ and the transformed output data $g(y_t)$ is as large as possible, where $h(\cdot)$ and $g(\cdot)$ denote the mapping functions of the two GRUs. By collecting a batch of samples with length $N$, the batched input and output data are $U_N \in \mathbb{R}^{l \times N}$ and $Y_N \in \mathbb{R}^{m \times N}$, which are first scaled into the interval between 0 and 1 to ease neural network training. For the established process input and output time series dataset, two GRUs with the same structure are constructed to extract the hierarchical representations from the input and output, respectively. The outputs of the two GRUs, $U = h(U_N)$ and $Y = g(Y_N)$, have the same fixed dimension. GRU-aided CCA trains $h$ and $g$ based on the following objective, which maximizes the canonical correlation at the output layer between the two data sets:

$$\begin{aligned} \max_{\theta_h, \theta_g, J, L}\ & \frac{1}{N} \mathrm{trace}\left(J^T U Y^T L\right) \\ \text{s.t. } & J^T \left(\frac{UU^T}{N} + r_u I\right) J = I, \\ & L^T \left(\frac{YY^T}{N} + r_y I\right) L = I, \\ & r_u > 0,\ r_y > 0, \end{aligned} \tag{2.61}$$

where $\theta_h$ and $\theta_g$ are the collections of parameters in the two GRUs, and $r_u$ and $r_y$ are small real values used as regularization parameters to avoid possible numerical problems when calculating the covariance matrices of $U$ and $Y$. Then the conventional CCA algorithm can be used to solve the objective (Eq. 2.61) and train the two GRUs.
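For fixed network outputs $U$ and $Y$, the inner CCA problem in the objective has a closed-form solution through an SVD, which is what the following sketch evaluates. It is an illustration under assumptions rather than the authors' training code: `cca_layer` is a hypothetical name, the outputs are taken to be mean centered, and the random matrices stand in for the outputs of the two GRUs. In a deep learning framework, the negative of this value would typically serve as the loss to be minimized.

```python
import numpy as np

def inv_sqrt(a):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(a)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cca_layer(U, Y, r_u=1e-4, r_y=1e-4):
    """Solve the CCA part of the objective for mean-centered outputs U, Y (d x N).

    Returns the objective value, the mappings J and L, and the singular values.
    """
    d, N = U.shape
    su = inv_sqrt(U @ U.T / N + r_u * np.eye(d))   # regularized covariance of U
    sy = inv_sqrt(Y @ Y.T / N + r_y * np.eye(d))   # regularized covariance of Y
    left, lam, right_t = np.linalg.svd(su @ (U @ Y.T / N) @ sy)
    J, L = su @ left, sy @ right_t.T
    objective = np.trace(J.T @ U @ Y.T @ L) / N
    return objective, J, L, lam

# Stand-ins for the two GRU outputs (hypothetical): d = 3 features, N = 200 samples.
rng = np.random.default_rng(3)
d, N, r_u, r_y = 3, 200, 1e-4, 1e-4
U = rng.standard_normal((d, N))
Y = 0.9 * U + 0.3 * rng.standard_normal((d, N))
U -= U.mean(axis=1, keepdims=True)
Y -= Y.mean(axis=1, keepdims=True)

objective, J, L, lam = cca_layer(U, Y, r_u, r_y)
print(objective, lam.sum())
```

The objective value equals the sum of the singular values, and the returned $J$ and $L$ satisfy both equality constraints by construction.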
When the training of the two GRUs ends, the parameters $\hat{\theta}_h$ and $\hat{\theta}_g$ are retained, and the linear mappings $J$ and $L$ as well as the diagonal matrix $\Lambda$ are obtained. As shown in Fig. 2.3, the procedure includes residual generator building, test statistic construction, and monitoring by detection logic. For implementation, the complete method includes two stages: offline modeling and online monitoring.

(1) Offline modeling: Deep GRU training and construction of the test statistic. Fault-free datasets are used for GRU-aided CCA training in this chapter. To begin with, it is necessary to construct the training set in the form of time series based on the original time series of $N$ samples. At this point, it is particularly important to select the length of each time series in the training set, which not only affects the number of samples in the whole training set but


also directly determines the training effect. For the time series training sets constructed from the system input and output data, we build two GRU networks with the same structure for joint learning. The whole neural network consists of multiple GRU layers, dropout layers, and fully connected layers. Then CCA is performed with the output data from the two GRUs. The choice of the number of nodes in the output layers directly affects the accuracy and complexity of the subsequent analysis and calculation. In addition, the activation function of the hidden layers in the GRUs is chosen as "Sigmoid," whereas the activation function of the output layer is "linear" and the training optimizer is "Adam."

Construction of the test statistic. Each of the GRUs outputs hierarchical representation vectors $u_t^h$ and $y_t^g$ with the same fixed dimension. With the mappings $J$ and $L$ calculated, the residual vectors can be obtained as

$$r_{g1} = J^T u_t^h - \Lambda L^T y_t^g, \tag{2.62}$$

$$r_{g2} = L^T y_t^g - \Lambda^T J^T u_t^h. \tag{2.63}$$

Then the $T^2$ test statistics can be constructed as

$$T_{g1}^2 = r_{g1}^T \Sigma_{g1}^{-1} r_{g1}, \tag{2.64}$$

$$T_{g2}^2 = r_{g2}^T \Sigma_{g2}^{-1} r_{g2}, \tag{2.65}$$

where $\Sigma_{g1}$ and $\Sigma_{g2}$ are the covariance matrices of the residual vectors $r_{g1}$ and $r_{g2}$, respectively. They can be obtained as

$$\Sigma_{g1} = I - \Lambda \Lambda^T, \tag{2.66}$$

$$\Sigma_{g2} = I - \Lambda^T \Lambda. \tag{2.67}$$

Threshold determination. Since the process is nonlinear, the Gaussian assumption does not hold. Therefore, KDE can be used to calculate the threshold of the statistics [13]:

$$P(x < b) = \int_{-\infty}^{b} \frac{1}{Md} \sum_{k=1}^{M} K\left(\frac{x - x_k}{d}\right) dx, \tag{2.68}$$

where $x_k$, $k = 1, 2, \ldots, M$, are the samples of the test statistic $T^2$, $K(\cdot)$ is a kernel function, and $d$ is its bandwidth. Here, a radial basis (Gaussian) kernel function is used:

$$K(\beta) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\beta^2}{2}\right). \tag{2.69}$$

Given a confidence level $\alpha$, a threshold $J_{th,g}$ can be calculated by

$$P\left(T^2 < J_{th,g}\right) = \alpha. \tag{2.70}$$

Hence, using Eq. (2.70), the corresponding thresholds of the statistics $T_{g1}^2$ and $T_{g2}^2$ can be obtained and denoted as $J_{th,g1}$ and $J_{th,g2}$, respectively.

(2) Online monitoring: When a new sample is measured, it is first scaled in the same way as the normal training sets and denoted as $u_{new}$ and $y_{new}$. The $T^2$ statistics are calculated on the basis of the established GRU-CCA model. The following decision logic is used to decide whether a fault has occurred:

$$\begin{cases} T_{g1}^2 > J_{th,g1} \text{ or } T_{g2}^2 > J_{th,g2} \Rightarrow \text{presence of fault}, \\ T_{g1}^2 \le J_{th,g1} \text{ and } T_{g2}^2 \le J_{th,g2} \Rightarrow \text{absence of fault}. \end{cases} \tag{2.71}$$

FIGURE 2.4 Schematic of closed-loop CSTR.
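The KDE-based threshold and the online decision logic can be sketched with the standard library alone. With the Gaussian kernel, the integral of the kernel density estimate reduces to an average of normal distribution functions, so the threshold can be found by bisection. The function names, the bandwidth value, the target level, and the sample values below are assumptions for illustration; in practice the bandwidth would be chosen by a selection rule, which the text does not specify.

```python
import math

def kde_cdf(b, samples, d):
    """P(x < b) of the kernel density estimate with a Gaussian kernel and bandwidth d."""
    return sum(
        0.5 * (1.0 + math.erf((b - xk) / (d * math.sqrt(2.0)))) for xk in samples
    ) / len(samples)

def kde_threshold(samples, d, level, tol=1e-8):
    """Find J_th such that the estimated P(T^2 < J_th) equals `level` (bisection)."""
    lo, hi = min(samples) - 10.0 * d, max(samples) + 10.0 * d
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, d) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fault_present(T1, T2, J1, J2):
    """Decision logic: alarm if either statistic exceeds its threshold."""
    return T1 > J1 or T2 > J2

# Hypothetical fault-free statistics from the offline stage.
t2_train = [1.0, 2.0, 3.0, 4.0, 5.0]
J_th = kde_threshold(t2_train, d=0.5, level=0.99)
print(J_th)
print(fault_present(2.5, 3.0, J_th, J_th))
print(fault_present(2.5, J_th + 1.0, J_th, J_th))
```

The same routine is applied separately to the training samples of $T_{g1}^2$ and $T_{g2}^2$ to obtain the two thresholds.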

2.4 Experimental results and analysis

In this section, two industrial benchmark experiment cases are used to assess the performance of the proposed methods.

2.4.1 The CSTR process

2.4.1.1 Introduction to CSTR

The controlled CSTR is a second-order nonlinear dynamic simulation system in the chemical industry and is widely used for the validation of fault diagnosis methods. In this work, a Simulink model of a CSTR under closed-loop control is used, where the process is represented by three ordinary differential equations that describe the mass and energy balances around the system. The CSTR carries out a hypothetical first-order exothermic reaction, where the tank temperature ($T$) is maintained using a cooling jacket. Process conditions are perturbed around the nominal operating point by random disturbances on three input variables. Fig. 2.4 shows the schematic of the closed-loop CSTR, in which the measurement locations and the control strategy are illustrated; the reactor temperature, $T$, is maintained by manipulating the coolant flow rate, $Q_c$. In this simulation, the controller ($K_c = 1.0$ and $\tau_I = 0.2$) is set to saturate below 10 L/min and above 200 L/min. Detailed descriptions of the CSTR can be found in the work of Pilario and Cao [52] and are not repeated here. According to Fig. 2.4, the inputs are the feed concentration, $C_i$, the feed temperature, $T_i$, the coolant inlet temperature, $T_{ci}$, and $Q_c$; the outputs are the coolant outlet temperature, $T_c$, the concentration of material in the reactor, $C$, and $T$. Therefore, a dataset comprising seven variables is manipulated to create incipient faults for evaluating the efficiency of the fault diagnosis method. Four typical incipient faults were simulated, including two multiplicative faults and two additive faults, as shown in Table 2.1. Fault-free and faulty data for every kind of fault are generated from the simulation for 20 hours of operation with a sampling interval of 1 minute, so the time series of every variable has 1200 points. In a faulty dataset, the fault is introduced after 200 minutes of normal operation, so every faulty dataset includes 200 points of normal data followed by 1000 points of faulty data. In addition, because the input variables are perturbed randomly at fixed time intervals, the measured datasets are temporally correlated and dynamic. At the same time, they do not fit a Gaussian distribution due to the nonlinearity of the process, making the data suitable for the evaluation of the presented methods.

TABLE 2.1 Fault types in the CSTR dataset

Case    Description
F1      Catalyst deactivation
F2      Heat transfer fouling by exponential decay
F3      Feed temperature with disturbance ramp changes
F4      Coolant feed temperature with disturbance ramp changes

2.4.1.2 DCCA and GRU-aided CCA training

In the training process of CCA and DCCA, no iterative optimization is needed: SVD is applied to solve the optimization objective, and the fault detection threshold is then determined. However, the training process of the GRU is relatively complex because neural networks contain several hyperparameters and a large number of trainable parameters. In the training phase, the main task of GRU-aided CCA is the hyperparameter adjustment of the two GRUs, among which the length of the input sequence and the number of hidden layers and neurons play the most important role. In this experiment, the final parameters and structure of GRU-aided CCA are as follows. First, only fault-free data are used for training, and the seven variables are divided into two groups and fed into the two GRU models, respectively. The length of the input sequence is 60. Both GRU models have the same structure, which is Input→GRU (64 cells)→Dropout (0.2)→GRU (64 cells)→Dropout (0.2)→fully connected layer (32 nodes)→fully connected layer (20 nodes)→Output. According to the preceding structure, the input sequences are mapped to a 20-dimensional vector after passing through the GRU. Then the loss function value is calculated by Eq. (2.61), and the Adam gradient descent algorithm is used to optimize the parameters of the two neural networks.

FIGURE 2.5 CCA fault detection result of CSTR. (A) Results of F1. (B) Results of F2. Each panel marks the fault injection time and the detection threshold.

2.4.1.3 Results and analysis

The fault detection result of CCA is shown in Fig. 2.5. In the case of F1, the statistics of CCA fluctuate sharply, and a large number of wrong decisions occur at the initial stage of the failure. This is because the system variables oscillate violently when F1 occurs, whereas CCA only considers the correlation of the variables at the current moment, so the statistics also change with the oscillation of the variables. CCA performs well in the case of the other faults, indicating that CCA is not completely useless for nonlinear systems and can still correctly detect faults in some cases.

FIGURE 2.5 (continued) (C) Results of F3. (D) Results of F4.

The fault detection result of DCCA is shown in Fig. 2.6. For F1, the test statistics obtained by DCCA are clearly more stable than those of CCA, with a lower detection delay and without a high misjudgment rate in the early stage of the failure. For the other fault types, DCCA also achieves good detection results. GRU-aided CCA was trained according to Section 2.4.1.2, and the value of the loss function during the training process is shown in Fig. 2.7. The loss curve indicates that the neural network is able to perform a nonlinear mapping of the two sets of input sequences such that the correlation after mapping is close to 1. After training with fault-free data, GRU-aided CCA was tested with the faulty data of cases F1, F2, F3, and F4, and the detection results are shown in Fig. 2.8. It can be seen from Fig. 2.8 that GRU-aided CCA can effectively detect multiple types of faults, and the $T^2$ test statistics under normal and fault scenarios are clearly distinguished. The specific evaluations are shown in Table 2.2, in which the three fault detection performance indicators are FAR, FDR, and fault detection delay (FDD), whose formulas can be found in the work of Chen et al. [39].

FIGURE 2.6 DCCA fault detection result of CSTR. (A) Results of F1. (B) Results of F2. Each panel marks the fault injection time and the detection threshold.
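Empirical FAR and FDR values for one test run can be computed as follows, using the definitions given in Section 2.2.4: the fraction of fault-free samples that raise an alarm and the fraction of faulty samples that raise an alarm. This is a hedged sketch: the function name `far_fdr` and the example statistics are hypothetical, and FDD is left out because its formula is given in [39] rather than in this chapter.

```python
def far_fdr(stats, threshold, fault_start):
    """Empirical FAR and FDR for one run.

    stats:       sequence of test statistic values, one per sample
    threshold:   detection threshold
    fault_start: index of the first faulty sample (200 in the CSTR datasets)
    """
    alarms = [s > threshold for s in stats]
    normal, faulty = alarms[:fault_start], alarms[fault_start:]
    far = sum(normal) / len(normal)   # alarms raised although no fault is present
    fdr = sum(faulty) / len(faulty)   # alarms raised while the fault is present
    return far, fdr

# Hypothetical run: 4 fault-free samples followed by 4 faulty samples.
stats = [1.0, 1.2, 3.0, 0.8, 5.0, 4.5, 1.5, 6.0]
far, fdr = far_fdr(stats, threshold=2.0, fault_start=4)
print(far, fdr)
```

For the CSTR datasets described above, `fault_start` would be 200 and each run would contain 1200 statistic values.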

2.4.2 The TDCS process

2.4.2.1 Introduction to TDCS

To verify the feasibility of GRU-aided CCA in more complex nonlinear dynamic systems, another experimental platform, the high-speed train TDCS, is used for testing. The TDCS platform is jointly developed by Central South University and CRRC Zhuzhou Institute Company, Limited, based on the hardware-in-the-loop (HIL) platform, including a dSPACE real-time simulator, signal conditioner, traction control unit (TCU), and host PC, as shown in Fig. 2.9 [53, 54].

FIGURE 2.6 Continued. (C) Results of F3. (D) Results of F4.

FIGURE 2.7 Loss function value of GRU-aided CCA during training.

FIGURE 2.8 GRU-aided CCA fault detection result of CSTR. (A) Results of F1. (B) Results of F2. Each panel marks the fault injection time and the detection threshold.

The platform can simulate a variety of equipment faults; in this case, three motor faults are selected for the experiment: rotor broken bar fault (RBB), interturn short circuit fault (ITSC), and air gap eccentricity fault (AGE). Only the motor stator three-phase current data were used in the experiment, and the severity of each of the three motor faults was 10%. The experimental data consisted of normal and fault data under fixed working conditions. Because of the complexity of the motor fault characteristics, feature extraction is required before training. In this experiment, the variance and kurtosis of the stator three-phase currents are extracted, so there are six variables in total in the input sequences.
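The feature extraction step can be sketched as follows. This is a minimal illustration, assuming non-overlapping windows and the non-excess (Pearson) definition of kurtosis; the window length, the function name `window_features`, and the synthetic currents are hypothetical choices, since the text does not specify them.

```python
import numpy as np

def window_features(currents, win):
    """Variance and kurtosis of each phase current per window.

    currents : array of shape (n_samples, 3), stator phase currents
    win      : window length in samples (assumed, non-overlapping)
    Returns an array of shape (n_windows, 6): three variances followed
    by three kurtosis values.
    """
    currents = np.asarray(currents, dtype=float)
    n_win = currents.shape[0] // win
    x = currents[: n_win * win].reshape(n_win, win, currents.shape[1])
    centered = x - x.mean(axis=1, keepdims=True)
    var = (centered ** 2).mean(axis=1)
    m4 = (centered ** 4).mean(axis=1)
    kurt = m4 / var ** 2  # Pearson kurtosis (3 for a Gaussian signal)
    return np.hstack([var, kurt])

# Synthetic balanced three-phase currents: 50 Hz sampled at 1 kHz.
t = np.arange(1000) / 1000.0
shifts = np.array([0.0, -2.0 * np.pi / 3.0, 2.0 * np.pi / 3.0])
currents = np.sin(2.0 * np.pi * 50.0 * t[:, None] + shifts[None, :])
feats = window_features(currents, win=100)
print(feats.shape)
```

For a pure sinusoid of unit amplitude, each window that covers whole periods has variance 0.5 and kurtosis 1.5, so departures from these values in the faulty data are what the detector would see.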

2.4.2.2 DCCA and GRU-aided CCA training

As in the CSTR experiment, GRU-aided CCA requires hyperparameter adjustment and iterative optimization in the training phase. The optimized parameters and structure selected in this case are Input→GRU (64 cells)→Dropout (0.2)→GRU (64 cells)→Dropout (0.2)→fully connected layer (64 nodes)→fully connected layer (30 nodes)→Output. Because the raw data contain a lot of noise, the length of the input sequences is set to 256.

FIGURE 2.8 Continued. (C) Results of F3. (D) Results of F4.

2.4.2.3 Results and analysis

The fault detection result of CCA is shown in Fig. 2.10. Under normal conditions, the T² test statistics fluctuate greatly, but in the RBB and ITSC fault scenarios the statistics change significantly and the fluctuation becomes smaller, so good detection results can be considered to have been achieved. However, in the case of the AGE fault, the T² test statistics are unstable, which may lead to missed detections. The fault detection results of DCCA are shown in Fig. 2.11. The statistical difference between normal and fault conditions is greater than for CCA, but there are also large fluctuations, and the statistics become more unstable under the AGE fault.

TABLE 2.2 Evaluations of the fault detection result of CSTR

        CCA                  DCCA                 GRU-aided CCA
Fault   FAR    FDR    FDD    FAR    FDR    FDD    FAR    FDR    FDD
F1      5.5%   88.7%  16     4.5%   97.7%  21     0.5%   84.2%  147
F2      5.0%   95.1%  13     5.0%   97.3%  22     5.5%   99.7%  3
F3      5.0%   98.2%  5      5.0%   98.2%  16     5.0%   100%   0
F4      4.5%   98.5%  9      4.0%   98.0%  6      4.0%   100%   0

FIGURE 2.9 TDCS experimental setup.

The value of the loss function during training is shown in Fig. 2.12. Under the influence of environmental noise, the neural network converges more slowly and the loss function fluctuates with larger amplitude, but it eventually converges to around –1. After training, the detection results for the three faults are shown in Fig. 2.13. As can be seen from this figure, in the normal state the T² test statistics are seriously affected by noise, but when a motor fault occurs the statistics change significantly and the fluctuation caused by noise decreases, so the distinction between normal and fault states is very obvious. Table 2.3 shows the evaluation of the preceding three methods.
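A loss that converges to about –1 is what one would expect if the training objective is the negative correlation between the two mapped outputs. The sketch below illustrates this under that assumption, for scalar mapped outputs; the chapter's actual loss is defined in its earlier sections and may differ, and the function name `neg_correlation_loss` is hypothetical.

```python
import numpy as np

def neg_correlation_loss(a, b, eps=1e-8):
    """Negative sample correlation between two 1-D sequences.

    Assumed form of the loss: -corr(a, b). It lies in [-1, 1] and reaches
    -1 when the two mapped outputs are perfectly positively correlated.
    eps guards against division by zero for constant inputs.
    """
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    corr = np.dot(a, b) / (np.sqrt(np.dot(a, a) * np.dot(b, b)) + eps)
    return -corr

a = np.linspace(0.0, 1.0, 50)
loss_aligned = neg_correlation_loss(a, 2.0 * a + 1.0)  # perfectly correlated
loss_opposed = neg_correlation_loss(a, -a)             # perfectly anti-correlated
print(loss_aligned, loss_opposed)
```

With noisy inputs the sample correlation fluctuates from batch to batch, which is consistent with the larger loss fluctuations described for the TDCS data.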

FIGURE 2.10 CCA fault detection result of TDCS. (A) Results of F1. (B) Results of F2. (C) Results of F3. Each panel marks the fault injection time and the detection threshold.

FIGURE 2.11 DCCA fault detection result of TDCS. (A) Results of F1. (B) Results of F2. (C) Results of F3. Each panel marks the fault injection time and the detection threshold.

FIGURE 2.12 Loss function value of GRU-aided CCA during training.

According to the experimental results, for the nonlinear dynamic system, CCA can still correctly detect faults and achieve good detection performance under some circumstances, but the statistical difference between normal and fault conditions is small, and the statistic is easily disturbed by noise. The statistics obtained by DCCA under normal and fault conditions differ more, but the influence of noise is still not eliminated. GRU-aided CCA correctly detected faults in the presence of a large amount of noise and obtained low FAR and high FDR under the three faults, but training was somewhat difficult and the FDD was relatively large. Therefore, in practical applications, the CCA fault detection method can be selected by considering the system complexity, the noise level, and the FDD requirements.

TABLE 2.3 Evaluations of GRU-aided CCA

        CCA                  DCCA                 GRU-aided CCA
Fault   FAR    FDR    FDD    FAR    FDR    FDD    FAR    FDR    FDD
RBB     5.0%   100%   0      4.8%   99.9%  1      4.6%   98.4%  79
ITSC    4.9%   100%   0      2.7%   100%   0      4.9%   99.5%  23
AGE     4.9%   85.5%  1      4.6%   87.3%  1      4.9%   99.9%  1

FIGURE 2.13 GRU-aided CCA fault detection result of TDCS. (A) Results of F1. (B) Results of F2. (C) Results of F3. Each panel marks the fault injection time and the detection threshold.

2.5 Conclusion

This work investigates the application of the CCA technique to fault diagnosis of dynamic processes. We introduce three CCA methods, namely conventional CCA, DCCA, and GRU-aided CCA, and the main steps of fault diagnosis with these methods. Conventional CCA aims to find basis vectors for two sets of variables such that the correlations between the projections of the variables onto these basis vectors are mutually maximized. By means of hypothesis testing, a test statistic is constructed with normal data and the corresponding threshold is determined. The system state can then be judged by the value of the test statistic. On this basis, DCCA takes into account the dynamics and autocorrelation of the system and adds past moments of the system into the calculation of CCA. In GRU-aided CCA, the nonlinear dynamic property is considered, and the system variables are nonlinearly mapped by the neural network so that the mapped variables meet the requirements of conventional CCA and achieve a better fault diagnosis effect. The presented methods are then used to detect faults in the CSTR and TDCS processes. Experimental results show that for nonlinear dynamic systems, conventional CCA can achieve good detection for some faults, but it is greatly affected by noise and variable fluctuation. The overall diagnostic effect of DCCA is close to that of CCA, and the fluctuation of its statistics is smaller, but it is also affected by noise. GRU-aided CCA performs well on all fault data and has a higher tolerance for noise, but it has disadvantages such as difficulty in training and a large delay in fault detection. These two benchmark experiments further verify the effectiveness of CCA in complex engineering systems.
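The idea of adding past moments of the system to the CCA calculation can be illustrated by stacking lagged samples into one augmented vector before CCA is applied. The following is a minimal sketch, assuming a fixed number of lags; the function name `stack_lags` and the lag count are illustrative choices, not the chapter's implementation.

```python
import numpy as np

def stack_lags(x, lags):
    """Build augmented samples [x(t), x(t-1), ..., x(t-lags)].

    x    : array of shape (n_samples, n_vars)
    lags : number of past samples appended to each current sample (assumed)
    Returns an array of shape (n_samples - lags, n_vars * (lags + 1)).
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0] - lags
    # Block i holds the samples delayed by i steps.
    return np.hstack([x[lags - i : lags - i + n] for i in range(lags + 1)])

x = np.arange(5).reshape(5, 1)   # toy signal 0, 1, 2, 3, 4
augmented = stack_lags(x, lags=2)
print(augmented)
```

Applying conventional CCA to such augmented vectors lets the canonical correlations capture dependence across time as well as across variables, at the cost of a larger covariance matrix to estimate.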

Acknowledgments

Financial sponsorship from the project of the National Natural Science Foundation of China (#61803390, #61773407, #61790571, #61621062) is gratefully acknowledged. This work was also partly sponsored by Hunan Provincial Key Laboratory (#2017TP1002), the postdoctoral foundation (#2019T120713), and the Project of State Key Laboratory of High Performance Complex Manufacturing, Central South University (#ZZYJKT2020-14).

References

[1] S.X. Ding, Data-Driven Design of Fault Diagnosis and Fault-Tolerant Control Systems, Springer-Verlag, London, UK, 2014.
[2] S.J. Qin, L.H. Chiang, Advances and opportunities in machine learning for process data analytics, Computers & Chemical Engineering 126 (2019) 465–473.
[3] W. Härdle, L. Simar, Canonical correlation analysis, Applied Multivariate Statistical Analysis, Springer, Berlin, Germany, 2003.
[4] T.W. Anderson, An Introduction to Multivariate Statistical Analysis (2nd ed.), John Wiley & Sons, Hoboken, New Jersey, USA, 1984.
[5] L.H. Chiang, E. Russell, R. Braatz, Fault Detection and Diagnosis in Industrial Systems, Advanced Textbooks in Control and Signal Processing, Springer-Verlag, London, UK, 2001.
[6] M. Borga, Canonical Correlation: A Tutorial, 2001.
[7] H. Hotelling, Relation between two sets of variates, Biometrika 28 (1936) 321–377.
[8] I.M. Gelfand, A.M. Yaglom, Calculation of amount of information about a random function contained in another such function, American Mathematical Society Translations: Series 2, 12:199–246. English translation of original in Uspekhi Matematicheskikh Nauk, 1975, pp. 3–52.
[9] H. Akaike, Markovian representation of stochastic processes by canonical variables, SIAM Journal on Control 13 (1) (1975) 162–173.
[10] W.E. Larimore, Canonical variate analysis in identification, filtering, and adaptive control, Proceedings of the 29th IEEE Conference on Decision and Control (1990).
[11] Y. Wang, D.E. Seborg, W.E. Larimore, Process monitoring using canonical variate analysis and principal component analysis, IFAC Proceedings Volumes 30 (9) (1997) 577–582.
[12] B.C. Juricek, D.E. Seborg, W.E. Larimore, Fault detection using canonical variate analysis, Industrial & Engineering Chemistry Research 43 (2004) 458–474.
[13] P.P. Odiowei, Y. Cao, Nonlinear dynamic process monitoring using canonical variate analysis and kernel density estimations, IEEE Transactions on Industrial Informatics 6 (1) (2010) 36–45.
[14] B.B. Jiang, X. Zhu, D.X. Huang, R.D. Braatz, Canonical variate analysis-based monitoring of process correlation structure using causal feature representation, Journal of Process Control 32 (2015) 109–116.
[15] Z.W. Chen, S.X. Ding, K. Zhang, Z.B. Li, Z.K. Hu, Canonical correlation analysis-based fault detection methods with application to alumina evaporation process, Control Engineering Practice 46 (2016) 51–58.
[16] Z.W. Chen, S.X. Ding, T. Peng, C.H. Yang, W.H. Gui, Fault detection for non-Gaussian processes using generalized canonical correlation analysis and randomized algorithms, IEEE Transactions on Industrial Electronics 65 (2) (2018) 6321–6330.
[17] Z.W. Chen, C. Liu, S.X. Ding, T. Peng, C.H. Yang, W.H. Gui, Y. Shardt, A just-in-time-learning aided canonical correlation analysis method for multimode process monitoring and fault detection, IEEE Transactions on Industrial Electronics (2020).
[18] Y.Q. Liu, B. Liu, X.J. Zhao, M. Xie, A mixture of variational canonical correlation analysis for nonlinear and quality-relevant process monitoring, IEEE Transactions on Industrial Electronics 65 (8) (2017) 6478–6486.
[19] X.C. Li, Y.J. Yang, I. Bennett, D. Mba, Condition monitoring of rotating machines under time-varying conditions based on adaptive canonical variate analysis, Mechanical Systems & Signal Processing 131 (2019) 348–363.
[20] Q.C. Jiang, X.F. Yan, Locally weighted canonical correlation analysis for nonlinear process monitoring, Industrial & Engineering Chemistry Research 57 (41) (2018) 13783–13792.
[21] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444.
[22] J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61 (2015) 85–117.
[23] G. Andrew, R. Arora, J. Bilmes, K. Livescu, Deep canonical correlation analysis, Proceedings of the 30th International Conference on Machine Learning (2013) 1247–1255.
[24] W.R. Wang, R. Arora, K. Livescu, J. Bilmes, On deep multi-view representation learning, Proceedings of the 32nd International Conference on Machine Learning (2015) 1083–1092.
[25] N. Mallinar, C. Rosset, Deep canonically correlated LSTMs, [Online], 2018. Available at https://arxiv.org/abs/1801.05407.
[26] Q.C. Jiang, X.F. Yan, Learning deep correlated representations for nonlinear process monitoring, IEEE Transactions on Industrial Informatics 15 (12) (2019) 6200–6209.
[27] K. Zhang, K. Peng, R. Chu, J. Dong, Implementing multivariate statistics-based process monitoring: A comparison of basic data modeling approaches, Neurocomputing 290 (2018) 172–184.
[28] V. Uurtio, J. Monteiro, J. Kandola, J. Shawe-Taylor, D. Fernandez-Reyes, J. Rousu, A tutorial on canonical correlation methods, ACM Computing Surveys 50 (2017) 14–38.
[29] D.R. Hardoon, S. Szedmak, J. Shawe-Taylor, Canonical correlation analysis: An overview with application to learning methods, Neural Computation 16 (12) (2004) 2639–2664.
[30] G.H. Golub, H. Zha, The canonical correlations of matrix pairs and their numerical computation, in: Linear Algebra for Signal Processing, Springer, 1995.
[31] A. Bjorck, G.H. Golub, Numerical methods for computing angles between linear subspaces, Mathematics of Computation 123 (1973) 579–594.
[32] M.S. Bartlett, The statistical significance of canonical correlations, Biometrika 32 (1) (1941) 29–37.
[33] N.J.R. Healy, A rotation method for computing canonical correlations, Mathematics of Computation 11 (58) (1957) 83–86.
[34] L.M. Ewerbring, F.T. Luk, Canonical correlations and generalized SVD: Applications and new algorithms, in: Proceedings of the 32nd Annual Technical Symposium, International Society for Optics and Photonics, 1989, 206–222.
[35] Z.W. Chen, S.X. Ding, K. Zhang, C.H. Yang, T. Peng, Generalized CCA with applications for fault detection and estimation, in: Proceedings of the IEEE 7th Data Driven Control and Learning Systems Conference (DDCLS), 2018, 545–550.
[36] M. Basseville, I. Nikiforov, Detection of Abrupt Changes, PTR Prentice Hall, 1993.
[37] E.L. Lehmann, Testing Statistical Hypotheses (2nd ed.), Springer-Verlag, 1986.
[38] Z.W. Chen, K. Zhang, Y.A.W. Shardt, S.X. Ding, X. Yang, C.H. Yang, T. Peng, Comparison of two basic statistics for fault detection and process monitoring, IFAC-PapersOnLine 50 (1) (2017) 14776–14781.
[39] Z.W. Chen, C.H. Yang, T. Peng, H. Dan, C.G. Li, W.H. Gui, A cumulative canonical correlation analysis-based sensor precision degradation detection method, IEEE Transactions on Industrial Electronics 66 (8) (2018) 6321–6330.
[40] Y.G. Lei, N.P. Li, L. Guo, N. Li, T. Yan, J. Lin, Machinery health prognostics: A systematic review from data acquisition to RUL prediction, Mechanical Systems & Signal Processing 104 (2018) 799–834.
[41] J.L. Zhu, Z.Q. Ge, Z.H. Song, F.R. Gao, Review and big data perspectives on robust data mining approaches for industrial process modeling with outliers and missing data, Annual Reviews in Control 46 (2018) 107–133.
[42] Z.W. Chen, R.J. Guo, Z. Lin, T. Peng, X. Peng, A data-driven health monitoring method using multi-objective optimization and stacked autoencoder based health indicator, IEEE Transactions on Industrial Informatics (2020), doi:10.1109/TII.2020.2999323.
[43] J. Wang, Q. He, Multivariate statistical process monitoring based on statistics pattern analysis, Industrial & Engineering Chemistry Research 49 (2010) 7858–7869.
[44] S. Yin, S.X. Ding, A. Haghani, H.Y. Hao, P. Zhang, A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process, Journal of Process Control 22 (2012) 1567–1581.
[45] Z.Q. Ge, Z.H. Song, F.R. Gao, Review of recent research on data-based process monitoring, Industrial & Engineering Chemistry Research 52 (10) (2013) 3543–3562.
[46] W.F. Ku, R.H. Storer, C. Georgakis, Disturbance detection and isolation by dynamic principal component analysis, Chemometrics & Intelligent Laboratory Systems 30 (1) (1995) 179–196.
[47] S.J. Qin, An overview of subspace identification, Computers & Chemical Engineering 30 (2006) 1502–1513.
[48] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, Proceedings of Empirical Methods in Natural Language Processing (EMNLP) (2014).
[49] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780.
[50] Z.Z. Wang, Y.J. Dong, W. Liu, Z. Ma, A novel fault diagnosis approach for chillers based on 1-D convolutional neural network and gated recurrent unit, Sensors 20 (9) (2020) 2458.
[51] Y. Tao, X. Wang, R. Sánchez, S. Yang, Y. Bai, Spur gear fault diagnosis using a multilayer gated recurrent unit approach with vibration signal, IEEE Access 7 (2019) 56880–56889.
[52] K.E.S. Pilario, Y. Cao, Canonical variate dissimilarity analysis for process incipient fault detection, IEEE Transactions on Industrial Informatics 14 (12) (2018) 5308–5315.
[53] C.H. Yang, C. Yang, T. Peng, X.Y. Yang, W.H. Gui, A fault-injection strategy for traction drive control systems, IEEE Transactions on Industrial Electronics 64 (7) (2017) 5719–5727.
[54] X.Y. Yang, C.H. Yang, T. Peng, Z.W. Chen, B. Liu, W.H. Gui, Hardware-in-the-loop fault injection for traction control system, IEEE Journal of Emerging & Selected Topics in Power Electronics 6 (2) (2018) 696–706.

Chapter 3

H∞ fault estimation for linear discrete time-varying systems with random uncertainties

Yueyang Li
School of Electrical Engineering, University of Jinan, China

Fault Diagnosis and Prognosis Techniques for Complex Engineering Systems. DOI: 10.1016/B978-0-12-822473-1.00001-X Copyright © 2021 Elsevier Inc. All rights reserved.

3.1 Introduction

Research on observer-based robust fault detection and isolation (FDI) for linear time-invariant (LTI) systems has received much attention over the past three decades (see [1–3] and the references therein). Basically, the fault detection (FD) problem concerns designing a fault detection filter (FDF) that generates a residual signal such that the sensitivity of the residual to faults is intensified while the robustness to disturbances is enhanced. In the development of observer-based FDI for LTI systems with various characteristics such as time delay, model inaccuracy, time-dependent switching mode, and uncertain observations, H∞ optimization and H∞ filtering are the two primary approaches, and they are widely used for LTI systems with l2-norm bounded unknown inputs and faults [4–6]. Recently, some contributions have been devoted to linear time-varying (LTV) systems, since most practical industrial processes can be represented or well approximated by time-varying dynamics (see [7–10] and related works). In the work of Zhong et al. [8], a finite-horizon H−/H∞ and H∞/H∞ FDI formulation was proposed for linear discrete time-varying (LDTV) systems, and an optimal solution was derived by solving a Riccati equation. In another work of Zhong et al. [9], a Krein space–based approach to H∞ filtering-based FDI for LDTV systems was proposed. In the work of Shen et al. [7], H∞ filtering-based fault estimation methods were proposed for LDTV systems by virtue of the Krein space–based reorganized innovation analysis and projection theory, building on the work of Zhang et al. [10].

On another research front, multiplicative noise is widely used to represent model uncertainty in state-space representations and plays a significant role in many practical engineering fields, such as aerospace, machinery, chemical reaction, and communication. Naturally, problems of stability analysis, control, and filtering for systems with multiplicative noise have been widely investigated [11–15]. For example, a statistical testing scheme is proposed in the work of Ding et al. [16], and H∞ filtering-based FDI is implemented in the work of Ma et al. [17].

With the rapid progress of networked control systems and distributed sensor/actuator systems, packet dropout caused by sensor gain reductions may occur when information is transmitted over unreliable links. Packet dropout refers to the incomplete-measurement phenomenon in which a Bernoulli-distributed random variable acts as a multiplicative factor, which is a special case of multiplicative noise. The random uncertainty introduced by packet dropouts evidently deteriorates the performance of the FDF [18]. Many contributions are dedicated to the FD problem for systems with incomplete measurements by employing the LMI-formulated H∞ fault estimation approach over an infinite horizon (refer to [19–21] and references therein). For example, He et al. [22] discuss the problem of fault detection for LTI systems with both random delay and packet loss. Ruan et al. [23] design an FDF for LTI systems with multistep packet loss. Wang et al. [24] and He et al. [25] solve the problem of fault detection for a class of LTI systems with packet loss and Markov jump characteristics by using the equivalent space method and the observer method, respectively.

In this chapter, we address H∞ fault estimation problems for LDTV systems with random uncertainties (i.e., multiplicative noise and packet loss). The rest of the chapter is organized as follows. Section 3.2 deals with robust FDI for LDTV systems subject to multiplicative noise and l2-norm bounded unknown inputs. The FDF design for LDTV systems with packet loss is studied in Section 3.3. The problem of H∞ fixed-lag fault estimator design for LDTV systems subject to intermittent observations is dealt with in Section 3.4. Finally, Section 3.5 presents some conclusions.

Notations. Throughout this chapter, vectors in the Krein space are represented by boldface letters, and vectors in the Euclidean space are denoted by normal letters. For a matrix $X$, $X^T$ and $X^{-1}$ stand for the transpose and inverse of $X$, respectively. $X > 0$ ($X < 0$) denotes that $X$ is positive (negative) definite. $\mathbb{R}^n$ means the set of $n$-dimensional real vectors. $I$ and $0$ denote the identity matrix and zero matrix with appropriate dimensions, respectively. $E\{\vartheta(k)\}$ means the mathematical expectation of $\vartheta(k)$. $\vartheta(k) \in l_2[0, N]$ means $\sum_{k=0}^{N} \vartheta^T(k)\vartheta(k) < \infty$, where $N$ is a positive integer. The symbol $\mathcal{L}\{\{\vartheta(i)\}_{i=j}^{k}\}$ represents the linear space spanned by the sequence $\vartheta(k)$ taking values in the time interval $[j, k]$. $\mathrm{Prob}\{\Upsilon\}$ denotes the occurrence probability of the event "$\Upsilon$." $\delta_{ij}$ represents the Kronecker delta function, which is equal to unity for $i = j$ and zero for $i \neq j$. $\mathrm{diag}\{S_1, S_2, \ldots, S_n\}$ means a block diagonal matrix with diagonal blocks $S_1, S_2, \ldots, S_n$.

3.2 Robust H∞ fault detection for LDTV systems with multiplicative noise

3.2.1 Problem formulation

Consider the following LDTV system:

\[
\begin{cases}
x(k+1) = (A(k) + A_v(k)v(k))x(k) + (B_f(k) + B_{fv}(k)v(k))f(k) \\
\qquad\qquad\quad + (B_d(k) + B_{dv}(k)v(k))d(k) \\
y(k) = (C(k) + C_v(k)v(k))x(k) + (D_f(k) + D_{fv}(k)v(k))f(k) \\
\qquad\quad\; + (D_d(k) + D_{dv}(k)v(k))d(k) \\
x(0) = x_0
\end{cases}
\tag{3.2.1}
\]

where $x(k) \in \mathbb{R}^{n}$, $y(k) \in \mathbb{R}^{n_y}$, $d(k) \in \mathbb{R}^{n_d}$, and $f(k) \in \mathbb{R}^{n_f}$ denote the state, measurement output, unknown input, and fault to be detected, respectively; $d(k) \in l_2[0, N]$, $f(k) \in l_2[0, N]$. $A(k)$, $B_f(k)$, $B_d(k)$, $C(k)$, $D_f(k)$, $D_d(k)$, $A_v(k)$, $B_{fv}(k)$, $B_{dv}(k)$, $C_v(k)$, $D_{fv}(k)$, and $D_{dv}(k)$ are known matrices with appropriate dimensions. We first introduce the definition of exponential stability in mean square for system (3.2.1).

Definition 3.2.1 [26]. System (3.2.1) with $f(k) = 0$ and $d(k) = 0$ is said to be exponentially stable in mean square if there exist $c \geq 0$ and $q \in (0, 1)$ such that

\[
E\{\|x(k)\|^2\} \leq c q^k \|x(0)\|^2 .
\]

Throughout this chapter, it is assumed that $C(k)$ is full row rank for all $k$; $\{v(k)\}$ is a scalar zero-mean white noise sequence with $E\{v(i)v(j)\} = \varepsilon\delta_{ij}$, where $\varepsilon$ is a known positive scalar and $\delta_{ij}$ denotes the Kronecker delta function; and system (3.2.1) is exponentially stable in mean square in the finite horizon $[0, N]$. For the purpose of fault detection, the following observer-based FDF is considered for system (3.2.1):

\[
\begin{cases}
\hat{x}(k+1) = A(k)\hat{x}(k) + L(k)(y(k) - C(k)\hat{x}(k)) \\
r(k) = W(k)(y(k) - C(k)\hat{x}(k)) \\
\hat{x}(0) = \hat{x}_0
\end{cases}
\tag{3.2.2}
\]

where $\hat{x}(k)$ is an estimate of $x(k)$, $r(k) \in \mathbb{R}^{r}$ is the generated residual, $\hat{x}_0$ is a guess of the initial state, and the observer gain matrix $L(k)$ and post-filter $W(k)$ are parameters to be determined. Define

\[
e(k) = x(k) - \hat{x}(k), \quad \eta(k) = \begin{bmatrix} x^T(k) & e^T(k) \end{bmatrix}^T, \quad
r_e(k) = r(k) - f(k), \quad w(k) = \begin{bmatrix} f^T(k) & d^T(k) \end{bmatrix}^T .
\]

It follows from (3.2.1) and (3.2.2) that

\[
\begin{cases}
\eta(k+1) = (A_\eta(k) + A_{\eta v}(k)v(k))\eta(k) + (B_\eta(k) + B_{\eta v}(k)v(k))w(k) \\
r_e(k) = (C_\eta(k) + C_{\eta v}(k)v(k))\eta(k) + (D_\eta(k) + D_{\eta v}(k)v(k))w(k)
\end{cases}
\tag{3.2.3}
\]

where

\[
A_\eta(k) = \begin{bmatrix} A(k) & 0 \\ 0 & A(k) - L(k)C(k) \end{bmatrix}, \quad
A_{\eta v}(k) = \begin{bmatrix} A_v(k) & 0 \\ A_v(k) - L(k)C_v(k) & 0 \end{bmatrix}
\]
\[
B_\eta(k) = \begin{bmatrix} B_f(k) & B_d(k) \\ B_f(k) - L(k)D_f(k) & B_d(k) - L(k)D_d(k) \end{bmatrix}
\]
\[
B_{\eta v}(k) = \begin{bmatrix} B_{fv}(k) & B_{dv}(k) \\ B_{fv}(k) - L(k)D_{fv}(k) & B_{dv}(k) - L(k)D_{dv}(k) \end{bmatrix}
\]
\[
C_\eta(k) = \begin{bmatrix} 0 & W(k)C(k) \end{bmatrix}, \quad
C_{\eta v}(k) = \begin{bmatrix} W(k)C_v(k) & 0 \end{bmatrix}
\]
\[
D_\eta(k) = \begin{bmatrix} W(k)D_f(k) - I & W(k)D_d(k) \end{bmatrix}, \quad
D_{\eta v}(k) = \begin{bmatrix} W(k)D_{fv}(k) & W(k)D_{dv}(k) \end{bmatrix} .
\]

Now the problem of H∞-FDF design can be formulated as finding $L(k)$ and $W(k)$ such that system (3.2.3) is exponentially stable in mean square and satisfies

\[
\sup_{\|w(k)\|_2 \neq 0} \frac{\|r_e(k)\|_{2,E}^2}{\eta^T(0) S \eta(0) + \|w(k)\|_2^2} < \gamma^2
\tag{3.2.4}
\]

where γ is a positive scalar and S is a given positive definite initial state weighting matrix.
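The block structure of the augmented system can be checked numerically. The sketch below simulates the plant (3.2.1) together with the filter (3.2.2) and, in parallel, the augmented model (3.2.3) driven by the same noise, and compares the two. It is an illustration only: the matrices are hypothetical and held constant for brevity (the chapter allows them to vary with k), the gains L and W are arbitrary rather than designed, and ε = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, ny, nf, nd, N = 2, 2, 1, 1, 20

# Hypothetical constant matrices (assumed values, not from the chapter).
A = np.array([[0.6, 0.1], [0.0, 0.5]])
Av = 0.1 * np.eye(n)
Bf = np.array([[1.0], [0.5]])
Bfv = 0.1 * Bf
Bd = np.array([[0.2], [0.1]])
Bdv = 0.1 * Bd
C = np.eye(ny)
Cv = 0.05 * np.eye(ny)
Df = np.array([[0.3], [0.0]])
Dfv = 0.1 * Df
Dd = np.array([[0.1], [0.1]])
Ddv = 0.1 * Dd
L = 0.3 * np.eye(n)           # observer gain, chosen arbitrarily
W = np.array([[1.0, 0.0]])    # post-filter; residual has the fault's dimension

# Augmented matrices of (3.2.3) for eta = [x; e] and w = [f; d].
Znn = np.zeros((n, n))
A_eta = np.block([[A, Znn], [Znn, A - L @ C]])
A_etav = np.block([[Av, Znn], [Av - L @ Cv, Znn]])
B_eta = np.block([[Bf, Bd], [Bf - L @ Df, Bd - L @ Dd]])
B_etav = np.block([[Bfv, Bdv], [Bfv - L @ Dfv, Bdv - L @ Ddv]])
C_eta = np.hstack([np.zeros((nf, n)), W @ C])
C_etav = np.hstack([W @ Cv, np.zeros((nf, n))])
D_eta = np.hstack([W @ Df - np.eye(nf), W @ Dd])
D_etav = np.hstack([W @ Dfv, W @ Ddv])

x, xh = np.array([1.0, -1.0]), np.zeros(n)
eta = np.concatenate([x, x - xh])
max_err = 0.0
for k in range(N):
    v = rng.normal()                          # zero mean, E{v^2} = 1
    f = np.array([1.0 if k >= 10 else 0.0])   # step fault from k = 10
    d = rng.normal(size=nd)
    w = np.concatenate([f, d])
    # Plant output and filter residual, (3.2.1) and (3.2.2).
    y = (C + Cv * v) @ x + (Df + Dfv * v) @ f + (Dd + Ddv * v) @ d
    r = W @ (y - C @ xh)
    # Residual error predicted by the augmented model (3.2.3).
    re_aug = (C_eta + C_etav * v) @ eta + (D_eta + D_etav * v) @ w
    max_err = max(max_err, np.abs((r - f) - re_aug).max())
    # One-step updates of plant, filter, and augmented state.
    x_next = (A + Av * v) @ x + (Bf + Bfv * v) @ f + (Bd + Bdv * v) @ d
    xh = A @ xh + L @ (y - C @ xh)
    x = x_next
    eta = (A_eta + A_etav * v) @ eta + (B_eta + B_etav * v) @ w
    max_err = max(max_err, np.abs(np.concatenate([x, x - xh]) - eta).max())

print(max_err)
```

The two descriptions should agree up to rounding error, which confirms that the lower block row of each augmented matrix carries the estimation error dynamics, including the term $A_v(k) - L(k)C_v(k)$ through which the state itself drives the error when multiplicative noise is present.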

3.2.2 H∞ performance analysis

To derive the main results, the following lemma, which gives a condition for exponential stability in mean square of system (3.2.3), is first stated.

Lemma 3.2.1. The stochastic parameter system (3.2.3) is exponentially stable in mean square if there exists a symmetric positive definite matrix $P(\cdot)$ such that the following inequality holds:

\[
A_\eta^T(k)P(k+1)A_\eta(k) + \varepsilon A_{\eta v}^T(k)P(k+1)A_{\eta v}(k) - P(k) < 0 .
\tag{3.2.5}
\]

Proof. Let $\mathcal{F}_k$ be the minimal σ-algebra generated by $\{v(k), 0 \leq k \leq N\}$. Suppose (3.2.5) holds. Since $P(\cdot) > 0$, there exist $\kappa_1(\cdot) > 0$ and $\kappa_2(\cdot) > 0$ such that $\kappa_1(k)I \leq P(k) \leq \kappa_2(k)I$ and $\kappa_1(k+1)I \leq P(k+1) \leq \kappa_2(k+1)I$, and then $\kappa_1(k)E\{\eta^T(k)\eta(k)\} \leq E\{\eta^T(k)P(k)\eta(k)\} \leq \kappa_2(k)E\{\eta^T(k)\eta(k)\}$. In addition, we have

\[
\begin{aligned}
E\{\eta^T(k+1)P(k+1)\eta(k+1) \mid \mathcal{F}_k\}
&= E\{\eta^T(k)(A_\eta^T(k)P(k+1)A_\eta(k) + \varepsilon A_{\eta v}^T(k)P(k+1)A_{\eta v}(k) - P(k))\eta(k)\} \\
&\quad + E\{\eta^T(k)P(k)\eta(k)\} .
\end{aligned}
\]

If (3.2.5) holds, then there exists $0 < \kappa_3(k) < \kappa_2(k)$ such that

\[
\begin{aligned}
E\{\eta^T(k+1)P(k+1)\eta(k+1)\}
&\leq -\kappa_3(k)E\{\eta^T(k)\eta(k)\} + E\{\eta^T(k)P(k)\eta(k)\} \\
&< -\frac{\kappa_3(k)}{\kappa_2(k)}E\{\eta^T(k)P(k)\eta(k)\} + E\{\eta^T(k)P(k)\eta(k)\} \\
&= \left(1 - \frac{\kappa_3(k)}{\kappa_2(k)}\right)E\{\eta^T(k)P(k)\eta(k)\} .
\end{aligned}
\]

Thus, we can find that

\[
\begin{aligned}
\kappa_1(k+1)E\{\eta^T(k+1)\eta(k+1)\}
&\leq E\{\eta^T(k+1)P(k+1)\eta(k+1)\} \\
&\leq \left(1 - \frac{\kappa_3(k)}{\kappa_2(k)}\right)E\{\eta^T(k)P(k)\eta(k)\} \\
&\leq \left(1 - \frac{\kappa_3(k)}{\kappa_2(k)}\right)\left(1 - \frac{\kappa_3(k-1)}{\kappa_2(k-1)}\right)E\{\eta^T(k-1)P(k-1)\eta(k-1)\} \\
&\leq \cdots \leq \left(1 - \frac{\kappa_3(k)}{\kappa_2(k)}\right)\cdots\left(1 - \frac{\kappa_3(0)}{\kappa_2(0)}\right)E\{\eta^T(0)P(0)\eta(0)\} .
\end{aligned}
\]

Let $q_1 = \max\left\{\left(1 - \frac{\kappa_3(k)}{\kappa_2(k)}\right), \cdots, \left(1 - \frac{\kappa_3(0)}{\kappa_2(0)}\right)\right\}$, then

\[
E\{\eta^T(k+1)P(k+1)\eta(k+1)\} \leq q_1^{k+1}E\{\eta^T(0)P(0)\eta(0)\} = q_1^{k+1}\eta^T(0)P(0)\eta(0) .
\]

Furthermore, we have

\[
\kappa_1(k+1)E\{\eta^T(k+1)\eta(k+1)\} \leq \kappa_2(0)q_1^{k+1}\|\eta(0)\|^2 ,
\]

which leads to $E\{\|\eta(k)\|^2\} \leq c q^k \|\eta(0)\|^2$ with

\[
c = \frac{\kappa_2(0)}{\kappa_1(k)} > 0, \quad
q = \max\left\{1 - \frac{\kappa_3(k-1)}{\kappa_2(k-1)}, \cdots, 1 - \frac{\kappa_3(0)}{\kappa_2(0)}\right\} \in (0, 1) .
\]

This completes the proof.
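Condition (3.2.5) can be checked numerically for a given candidate $P(\cdot)$. The sketch below does this for small stand-in matrices and a constant candidate $P(k) = I$, and then estimates $E\{\|\eta(k)\|^2\}$ by Monte Carlo simulation of the unforced system. All numerical values and the function name `lemma_lhs` are assumptions chosen for illustration; they are not taken from the chapter.

```python
import numpy as np

def lemma_lhs(A, Av, P_next, P, eps):
    """Left-hand side of (3.2.5): A^T P(k+1) A + eps * Av^T P(k+1) Av - P(k)."""
    return A.T @ P_next @ A + eps * Av.T @ P_next @ Av - P

# Stand-ins for A_eta(k) and A_eta_v(k), held constant (assumed values).
A = np.array([[0.5, 0.1], [0.0, 0.4]])
Av = np.array([[0.2, 0.0], [0.1, 0.2]])
eps = 1.0            # E{v(i) v(j)} = eps * delta_ij
P = np.eye(2)        # constant candidate P(k) = I

lhs = lemma_lhs(A, Av, P, P, eps)
max_eig = np.linalg.eigvalsh((lhs + lhs.T) / 2.0).max()  # negative => (3.2.5) holds

# Monte Carlo estimate of E{||eta(k)||^2} for eta(k+1) = (A + Av v(k)) eta(k).
rng = np.random.default_rng(1)
runs, steps = 2000, 30
eta = np.tile(np.array([1.0, 1.0]), (runs, 1))
for _ in range(steps):
    v = rng.normal(scale=np.sqrt(eps), size=(runs, 1, 1))
    eta = np.einsum("rij,rj->ri", A + v * Av, eta)
ms_final = (eta ** 2).sum(axis=1).mean()

print(max_eig, ms_final)
```

In this example the largest eigenvalue of the left-hand side is negative, and the estimated mean square norm decays to a negligible value, in line with the geometric bound of Definition 3.2.1.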



Based on Lemma 3.2.1, the following two theorems, which play important roles in deriving the main results, are obtained in terms of Riccati equations.

Theorem 3.2.1. If there exist a positive scalar ζ and a solution $P(k) > 0$ such that

\[
\begin{cases}
P(k) = A_\eta^T(k)P(k+1)A_\eta(k) + \varepsilon C_{\eta v}^T(k)C_{\eta v}(k) + \varepsilon A_{\eta v}^T(k)P(k+1)A_{\eta v}(k) \\
\qquad\quad + C_\eta^T(k)C_\eta(k) + E^T(k)\Theta^{-1}(k)E(k) + \zeta I \\
P(N+1) = S_{N+1}
\end{cases}
\tag{3.2.6}
\]

where

\[
E(k) = \left(A_\eta^T(k)P(k+1)B_\eta(k) + \varepsilon C_{\eta v}^T(k)D_{\eta v}(k) + \varepsilon A_{\eta v}^T(k)P(k+1)B_{\eta v}(k) + C_\eta^T(k)D_\eta(k)\right)^T
\]
\[
\Theta(k) = \gamma^2 I - B_\eta^T(k)P(k+1)B_\eta(k) - D_\eta^T(k)D_\eta(k) - \varepsilon B_{\eta v}^T(k)P(k+1)B_{\eta v}(k) - \varepsilon D_{\eta v}^T(k)D_{\eta v}(k) > 0,
\]

and $S_{N+1} > 0$ is a terminal state weighting matrix, then system (3.2.3) with $\eta(0) = 0$ is exponentially stable in mean square and, for given $\gamma > 0$, the following H∞ performance is satisfied:

\[
\sup_{\|w(k)\|_2 \neq 0} \frac{\|r_e(k)\|_{2,E}^2 + E\{\eta^T(N+1)S_{N+1}\eta(N+1)\}}{\|w(k)\|_2^2} < \gamma^2 .
\tag{3.2.7}
\]

Proof. Define $V(\eta(k), k) = \eta^T(k)P(k)\eta(k)$, $P(k) > 0$, then

\[
\begin{aligned}
\Delta V(k) &= E\{V(k+1) \mid \mathcal{F}_k\} - V(k) \\
&= \eta^T(k)A_\eta^T(k)P(k+1)A_\eta(k)\eta(k) + \eta^T(k)A_\eta^T(k)P(k+1)B_\eta(k)w(k) \\
&\quad + \varepsilon\eta^T(k)A_{\eta v}^T(k)P(k+1)A_{\eta v}(k)\eta(k) + \varepsilon\eta^T(k)A_{\eta v}^T(k)P(k+1)B_{\eta v}(k)w(k) \\
&\quad + w^T(k)B_\eta^T(k)P(k+1)A_\eta(k)\eta(k) + w^T(k)B_\eta^T(k)P(k+1)B_\eta(k)w(k) \\
&\quad + \varepsilon w^T(k)B_{\eta v}^T(k)P(k+1)A_{\eta v}(k)\eta(k) + \varepsilon w^T(k)B_{\eta v}^T(k)P(k+1)B_{\eta v}(k)w(k) \\
&\quad - \eta^T(k)P(k)\eta(k),
\end{aligned}
\]

which leads to the following identity with the aid of (3.2.3):

\[
\begin{aligned}
E\{\Delta V\} &= E\{\Delta V\} - E\{\gamma^2 w^T(k)w(k)\} + E\{\gamma^2 w^T(k)w(k)\} - E\{r_e^T(k)r_e(k)\} + E\{r_e^T(k)r_e(k)\} \\
&= \eta^T(k)\Xi_{11}(k)\eta(k) + w^T(k)\Xi_{21}(k)\eta(k) + \eta^T(k)\Xi_{12}(k)w(k) - w^T(k)(-\Xi_{22}(k))w(k) \\
&\quad + E\{\gamma^2 w^T(k)w(k)\} - E\{r_e^T(k)r_e(k)\},
\end{aligned}
\tag{3.2.8}
\]

where

\[
\begin{aligned}
\Xi_{11}(k) &= A_\eta^T(k)P(k+1)A_\eta(k) - P(k) + C_\eta^T(k)C_\eta(k) + \varepsilon C_{\eta v}^T(k)C_{\eta v}(k) + \varepsilon A_{\eta v}^T(k)P(k+1)A_{\eta v}(k) \\
\Xi_{12}(k) &= A_\eta^T(k)P(k+1)B_\eta(k) + C_\eta^T(k)D_\eta(k) + \varepsilon C_{\eta v}^T(k)D_{\eta v}(k) + \varepsilon A_{\eta v}^T(k)P(k+1)B_{\eta v}(k) \\
\Xi_{22}(k) &= B_\eta^T(k)P(k+1)B_\eta(k) - \gamma^2 I + D_\eta^T(k)D_\eta(k) + \varepsilon D_{\eta v}^T(k)D_{\eta v}(k) + \varepsilon B_{\eta v}^T(k)P(k+1)B_{\eta v}(k) \\
\Xi_{21}(k) &= \Xi_{12}^T(k) .
\end{aligned}
\]

Based on (3.2.8), summing both sides of $E\{\Delta V\}$ from zero to $N$ and completing the squares, we have

\[
\begin{aligned}
\sum_{k=0}^{N} E\{\Delta V\} &= E\{\eta^T(N+1)P(N+1)\eta(N+1)\} - \eta^T(0)P(0)\eta(0) \\
&= \sum_{k=0}^{N} E\{\eta^T(k)R(P(k))\eta(k)\} + E\sum_{k=0}^{N}\{\gamma^2 w^T(k)w(k) - r_e^T(k)r_e(k)\} \\
&\quad - E\sum_{k=0}^{N}\{(w(k) - \mu^*(k))^T\Theta(k)(w(k) - \mu^*(k))\}
\end{aligned}
\tag{3.2.9}
\]

where

\[
\Theta(k) = -\Xi_{22}(k), \quad \mu^*(k) = \Theta^{-1}(k)\Xi_{21}(k)\eta(k), \quad
R(P(k)) = \Xi_{11}(k) + \Xi_{12}(k)\Theta^{-1}(k)\Xi_{21}(k) .
\]

(i) Stability analysis. Let $w(k) = 0$; then from Lemma 3.2.1, we know that system (3.2.3) is exponentially stable in mean square if (3.2.5) holds. It is clear that when (3.2.6) holds with $\Theta(k) > 0$, (3.2.5) is satisfied, which leads to exponential stability in mean square.

(ii) H∞ performance analysis. When $w(k) \neq 0$, define

\[
J_N = E\left\{\sum_{k=0}^{N} r_e^T(k)r_e(k) - \gamma^2\sum_{k=0}^{N} w^T(k)w(k)\right\} + E\{\eta^T(N+1)S_{N+1}\eta(N+1)\},
\]

then from (3.2.9), under the zero initial condition, we have

\[
\begin{aligned}
J_N &= -E\sum_{k=0}^{N}\{(w(k) - \mu^*(k))^T\Theta(k)(w(k) - \mu^*(k))\} + E\sum_{k=0}^{N}\{\eta^T(k)R(P(k))\eta(k)\} \\
&\quad + E\{\eta^T(N+1)S_{N+1}\eta(N+1)\} - E\{\eta^T(N+1)P(N+1)\eta(N+1)\} \\
&\leq -E\sum_{k=0}^{N}\{(w(k) - \mu^*(k))^T\Theta(k)(w(k) - \mu^*(k))\} + E\sum_{k=0}^{N}\{\eta^T(k)(R(P(k)) + \zeta I)\eta(k)\} \\
&\quad + E\{\eta^T(N+1)S_{N+1}\eta(N+1)\} - E\{\eta^T(N+1)P(N+1)\eta(N+1)\} .
\end{aligned}
\tag{3.2.10}
\]

From (3.2.10), we can conclude that if the equation $R(P(k)) + \zeta I = 0$ holds (i.e., if there exists $P(k) > 0$ such that (3.2.6) holds with the constraint that $\Theta(k) > 0$), then system (3.2.3) is exponentially stable in mean square and $J_N < 0$ (i.e., H∞ performance (3.2.7) holds). This completes the proof.

Remark 3.2.1. Theorem 3.2.1 establishes a relationship between the backward Riccati equation (3.2.6) and the H∞ performance (3.2.7). Next, a solution for $L(k)$ and $W(k)$ in terms of a forward Riccati equation is derived in Theorem 3.2.2 by applying an adjoint operator [27,28].

Theorem 3.2.2. If there exist a positive scalar ζ and a matrix $Q(k) > 0$ satisfying the following forward Riccati equation,

\[
\begin{cases}
Q(k+1) = A_\eta(k)Q(k)A_\eta^T(k) + \varepsilon B_{\eta v}(k)B_{\eta v}^T(k) + \varepsilon A_{\eta v}(k)Q(k)A_{\eta v}^T(k) \\
\qquad\qquad\quad + B_\eta(k)B_\eta^T(k) + M(k)\Psi^{-1}(k)M^T(k) + \zeta I \\
Q(0) = S^{-1}
\end{cases}
\tag{3.2.11}
\]

where

\[
M(k) = \left(C_\eta(k)Q(k)A_\eta^T(k) + \varepsilon D_{\eta v}(k)B_{\eta v}^T(k) + \varepsilon C_{\eta v}(k)Q(k)A_{\eta v}^T(k) + D_\eta(k)B_\eta^T(k)\right)^T
\]
\[
\Psi(k) = \gamma^2 I - C_\eta(k)Q(k)C_\eta^T(k) - D_\eta(k)D_\eta^T(k) - \varepsilon C_{\eta v}(k)Q(k)C_{\eta v}^T(k) - \varepsilon D_{\eta v}(k)D_{\eta v}^T(k) > 0,
\]

then system (3.2.3) is exponentially stable in mean square and, for given $\gamma > 0$, H∞ performance (3.2.4) is satisfied.

Proof. Define $G$ to be the linear operator that maps $(\eta(0), w(k))$ to $r_e(k)$, based on the following definition of the inner product:

\[
\langle(\eta_1(0), w_1(k)), (\eta_2(0), w_2(k))\rangle = E\{\eta_1^T(0)S\eta_2(0)\} + \langle w_1(k), w_2(k)\rangle
\]
\[
\langle\omega_1(k), \omega_2(k)\rangle = E\left\{\sum_{k=0}^{N}\omega_1^T(k)\omega_2(k)\right\}
\]

Let $G^{\sim}$ be the adjoint operator of $G$, and denote $G^{\sim} r_e(k) = [\eta_a^T(0) \;\; w_a^T(k)]^T$. The inner product of $G^{\sim}$ and $G$ has the following property [27]:

$$\langle G(\eta(0), w(k)), r_e(k) \rangle = \langle (\eta(0), w(k)), G^{\sim} r_e(k) \rangle. \tag{3.2.12}$$

Applying (3.2.12) for $w(k)$ and $r_e(k)$ in $l_2[0, N]$, we have

$$\langle G(\eta(0), w(k)), r_e(k) \rangle = \langle (\eta(0), w(k)), G^{\sim} r_e(k) \rangle = E\{\eta^T(0) S \eta_a(0)\} + E\left\{\sum_{k=0}^{N} w^T(k) w_a(k)\right\}.$$

In other words,

$$\begin{aligned}
& E\Bigg\{\sum_{k=0}^{N} r_e^T(k) \Bigg[(C_\eta(k) + v(k) C_{\eta v}(k)) \Phi(k, 0) \eta(0) \\
&\qquad + (C_\eta(k) + v(k) C_{\eta v}(k)) \sum_{j=0}^{k-1} \Phi(k, j+1) (B_\eta(j) + v(j) B_{\eta v}(j)) w(j) \\
&\qquad + (D_\eta(k) + v(k) D_{\eta v}(k)) w(k)\Bigg]\Bigg\} \\
&= E\{\eta^T(0) S \eta_a(0)\} + E\left\{\sum_{k=0}^{N} w^T(k) w_a(k)\right\},
\end{aligned}$$

which implies that

$$\eta_a(0) = S^{-1} \sum_{j=0}^{N-1} \Phi^T(j, 0) (C_\eta(j) + v(j) C_{\eta v}(j))^T r_e(j)$$

$$w_a(k) = (B_\eta(k) + v(k) B_{\eta v}(k))^T \sum_{j=k+1}^{N} \Phi^T(j, k+1) (C_\eta(k) + v(k) C_{\eta v}(k))^T r_e(j) + (D_\eta(k) + v(k) D_{\eta v}(k))^T r_e(k).$$

Let

$$\lambda_a(k) = \sum_{j=k+1}^{N} \Phi^T(j, k+1) (C_\eta(k) + v(k) C_{\eta v}(k))^T r_e(j),$$

and thus the state-space realization of $G^{\sim}$ can be obtained as

$$\begin{cases}
\lambda_a(k-1) = (A_\eta^T(k) + v(k) A_{\eta v}^T(k)) \lambda_a(k) + (C_\eta^T(k) + v(k) C_{\eta v}^T(k)) r_e(k) \\
w_a(k) = (B_\eta^T(k) + v(k) B_{\eta v}^T(k)) \lambda_a(k) + (D_\eta^T(k) + v(k) D_{\eta v}^T(k)) r_e(k)
\end{cases} \tag{3.2.13}$$

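Denote $\bar k = N - k$; then (3.2.13) can be rewritten as follows:

$$\begin{cases}
\bar\lambda_a(\bar k + 1) = (\bar A_\eta^T(\bar k) + v(\bar k) \bar A_{\eta v}^T(\bar k)) \bar\lambda_a(\bar k) + (\bar C_\eta^T(\bar k) + v(\bar k) \bar C_{\eta v}^T(\bar k)) \bar r_e(\bar k) \\
\bar w_a(\bar k) = (\bar B_\eta^T(\bar k) + v(\bar k) \bar B_{\eta v}^T(\bar k)) \bar\lambda_a(\bar k) + (\bar D_\eta^T(\bar k) + v(\bar k) \bar D_{\eta v}^T(\bar k)) \bar r_e(\bar k)
\end{cases} \tag{3.2.14}$$

where

$$\begin{aligned}
\bar A_\eta(\bar k) &= A_\eta(N - \bar k), & \bar A_{\eta v}(\bar k) &= A_{\eta v}(N - \bar k) \\
\bar B_\eta(\bar k) &= B_\eta(N - \bar k), & \bar B_{\eta v}(\bar k) &= B_{\eta v}(N - \bar k) \\
\bar C_\eta(\bar k) &= C_\eta(N - \bar k), & \bar C_{\eta v}(\bar k) &= C_{\eta v}(N - \bar k) \\
\bar D_\eta(\bar k) &= D_\eta(N - \bar k), & \bar D_{\eta v}(\bar k) &= D_{\eta v}(N - \bar k) \\
\bar\lambda_a(\bar k) &= \lambda_a(N - \bar k), & \bar r_e(\bar k) &= r_e(N - \bar k) \\
\bar w_a(\bar k) &= w_a(N - \bar k), & \bar\lambda_a(0) &= 0.
\end{aligned}$$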

By applying Theorem 3.2.1, if there exist a positive scalar $\zeta$ and a solution $\bar P(\bar k) > 0$ to the following equation,

$$\begin{cases}
\bar P(\bar k) = \bar A_\eta(\bar k) \bar P(\bar k + 1) \bar A_\eta^T(\bar k) + \varepsilon \bar B_{\eta v}(\bar k) \bar B_{\eta v}^T(\bar k) + \varepsilon \bar A_{\eta v}(\bar k) \bar P(\bar k + 1) \bar A_{\eta v}^T(\bar k) \\
\qquad\qquad + \bar B_\eta(\bar k) \bar B_\eta^T(\bar k) + F(\bar k) \bar\Theta^{-1}(\bar k) F^T(\bar k) + \zeta I \\
\bar P(N + 1) = S^{-1}
\end{cases} \tag{3.2.15}$$

where

$$F(\bar k) = \left(\bar C_\eta(\bar k) \bar P(\bar k + 1) \bar A_\eta^T(\bar k) + \varepsilon \bar D_{\eta v}(\bar k) \bar B_{\eta v}^T(\bar k) + \varepsilon \bar C_{\eta v}(\bar k) \bar P(\bar k + 1) \bar A_{\eta v}^T(\bar k) + \bar D_\eta(\bar k) \bar B_\eta^T(\bar k)\right)^T$$

$$\bar\Theta(\bar k) = \gamma^2 I - \bar C_\eta(\bar k) \bar P(\bar k + 1) \bar C_\eta^T(\bar k) - \bar D_\eta(\bar k) \bar D_\eta^T(\bar k) - \varepsilon \bar C_{\eta v}(\bar k) \bar P(\bar k + 1) \bar C_{\eta v}^T(\bar k) - \varepsilon \bar D_{\eta v}(\bar k) \bar D_{\eta v}^T(\bar k) > 0,$$

then system (3.2.14) is exponentially stable in mean square and satisfies the following H∞ performance:

$$\sup_{\|\bar r_e(\bar k)\|_{2,E} \neq 0} \frac{\|\bar w_a(\bar k)\|_{2,E}^2 + E\{\bar\lambda_a^T(N+1) S^{-1} \bar\lambda_a(N+1)\}}{\|\bar r_e(\bar k)\|_{2,E}^2} < \gamma^2. \tag{3.2.16}$$

Notice that the H∞ performance (3.2.16) for system (3.2.14) and the H∞ performance (3.2.4) for system (3.2.3) are induced norms of $G^{\sim}$ and $G$, respectively. Thus, from Theorem 3.9-2 in the work of Kreyszig [29], the H∞ performances (3.2.16) and (3.2.4) are equivalent. Let $Q(k) = \bar P(N + 1 - k)$; then (3.2.15) reduces to the forward equation (3.2.11), which completes the proof.

Remark 3.2.2. If system (3.2.1) is LTI, then for $k \to \infty$, Theorem 3.2.2 is the Riccati equation version of Lemma 3.2.1 proposed in the work of Ma et al. [17] for the time-invariant robust FDF design when multiplicative noise exists. If system (3.2.1) is LTV with $A_{\eta v}(k) = 0$, $B_{\eta v}(k) = 0$, $D_\eta(k) = 0$, and $D_{\eta v}(k) = 0$, and exponential stability in mean square over the finite horizon is not considered, Theorem 3.2.2 is identical to Lemma 3.2.1 given in the work of Zhong et al. [30] for the deterministic H∞ fault estimation problem.
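The forward recursion (3.2.11) can be iterated numerically once the closed-loop matrices are available. The following is a minimal sketch of a single step, not the authors' implementation; the function name `riccati_step` and its argument names are hypothetical, and the matrices are assumed to be supplied as NumPy arrays of compatible dimensions.

```python
import numpy as np

def riccati_step(Q, A, Av, B, Bv, C, Cv, D, Dv, eps, gamma, zeta):
    """One step of the forward recursion (3.2.11).

    A, B, C, D stand for A_eta(k), B_eta(k), C_eta(k), D_eta(k) and
    Av, Bv, Cv, Dv for their multiplicative-noise counterparts
    (hypothetical argument names). Returns (Q(k+1), Theta(k)).
    """
    n_r = C.shape[0]
    # Theta(k) as defined below (3.2.11)
    Theta = (gamma**2 * np.eye(n_r) - C @ Q @ C.T - D @ D.T
             - eps * (Cv @ Q @ Cv.T) - eps * (Dv @ Dv.T))
    # The theorem requires Theta(k) > 0
    if np.min(np.linalg.eigvalsh((Theta + Theta.T) / 2)) <= 0:
        raise ValueError("Theta(k) is not positive definite for this gamma")
    # M(k) as defined below (3.2.11)
    M = (C @ Q @ A.T + eps * (Dv @ Bv.T) + eps * (Cv @ Q @ Av.T) + D @ B.T).T
    Q_next = (A @ Q @ A.T + eps * (Bv @ Bv.T) + eps * (Av @ Q @ Av.T)
              + B @ B.T + M @ np.linalg.solve(Theta, M.T)
              + zeta * np.eye(A.shape[0]))
    return Q_next, Theta
```

Starting from $Q(0) = S^{-1}$, calling the function for $k = 0, 1, \ldots, N$ propagates $Q(k)$; the exception signals that the chosen $\gamma$ is too small at that step.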

3.2.3 Design of parameter matrices

Based on Theorem 3.2.2, we are now ready to give a solution to the H∞ FDF design problem. First, the determination of the parameter matrices is converted into a quadratic optimization problem, and an analytical solution is derived by solving this problem. Let

$$Q(k) = \begin{bmatrix} Q_{11}(k) & Q_{12}(k) \\ Q_{21}(k) & Q_{22}(k) \end{bmatrix}$$

$$\begin{aligned}
\Theta(k) ={}& -W(k) C(k) Q_{22}(k) C^T(k) W^T(k) - W(k) D_d(k) D_d^T(k) W^T(k) \\
& - \varepsilon W(k) D_{fv}(k) D_{fv}^T(k) W^T(k) - \varepsilon W(k) D_{dv}(k) D_{dv}^T(k) W^T(k) \\
& - \varepsilon W(k) C_v(k) Q_{11}(k) C_v^T(k) W^T(k) + \gamma^2 I \\
& - (W(k) D_f(k) - I)(W(k) D_f(k) - I)^T.
\end{aligned} \tag{3.2.17}$$

From (3.2.11), it follows that $Q_{11}(k+1)$ is independent of $L(k)$, whereas for $Q_{22}(k+1)$ we have

$$\begin{aligned}
Q_{22}(k+1) ={}& \varepsilon (A_v(k) - L(k) C_v(k)) Q_{11}(k) (A_v(k) - L(k) C_v(k))^T \\
& + (A(k) - L(k) C(k)) Q_{22}(k) (A(k) - L(k) C(k))^T \\
& + (B_d(k) - L(k) D_d(k))(B_d(k) - L(k) D_d(k))^T \\
& + (B_f(k) - L(k) D_f(k))(B_f(k) - L(k) D_f(k))^T \\
& + \varepsilon (B_{fv}(k) - L(k) D_{fv}(k))(B_{fv}(k) - L(k) D_{fv}(k))^T \\
& + \varepsilon (B_{dv}(k) - L(k) D_{dv}(k))(B_{dv}(k) - L(k) D_{dv}(k))^T \\
& + \Psi(k) \Theta^{-1}(k) \Psi^T(k) + \zeta I,
\end{aligned} \tag{3.2.18}$$

where

$$\begin{aligned}
\Psi(k) ={}& (A(k) - L(k) C(k)) Q_{22}(k) C^T(k) W^T(k) + (B_d(k) - L(k) D_d(k)) D_d^T(k) W^T(k) \\
& + \varepsilon (B_{fv}(k) - L(k) D_{fv}(k)) D_{fv}^T(k) W^T(k) + \varepsilon (A_v(k) - L(k) C_v(k)) Q_{21}(k) C_v^T(k) W^T(k) \\
& + (B_f(k) - L(k) D_f(k))(W(k) D_f(k) - I)^T + \varepsilon (B_{dv}(k) - L(k) D_{dv}(k)) D_{dv}^T(k) W^T(k).
\end{aligned}$$


Let $L_f(k)$ denote a feasible solution of $L(k)$ and $W_f(k)$ a feasible solution of $W(k)$. Motivated by Yu et al. [31], $L_f(k)$ and $W_f(k)$ are chosen to make $\gamma$ as small as possible while guaranteeing $\Theta(k) > 0$. Thus, $L_f(k)$ and $W_f(k)$ are required to satisfy the following inequality:

$$\varsigma^T(k) \Theta(W_f(k), L_f(k-1), \gamma) \varsigma(k) \ge \varsigma^T(k) \Theta(W(k), L(k-1), \gamma) \varsigma(k), \tag{3.2.19}$$

where $\varsigma(k)$ is any nonzero vector of appropriate dimension. Based on this idea, for obtaining $L_f(k)$, it follows from

$$\frac{\partial\, \varsigma^T(k) Q_{22}(k+1) \varsigma(k)}{\partial (L^T(k) \varsigma(k))} = 0$$

that

$$0 = (H(k) + K(k) \Theta^{-1}(k) K^T(k)) L^T(k) - K(k) \Theta^{-1}(k) G^T(k) - \Lambda^T(k) \tag{3.2.20}$$

where

$$\begin{aligned}
K(k) ={}& D_d(k) D_d^T(k) W^T(k) + \varepsilon D_{dv}(k) D_{dv}^T(k) W^T(k) + \varepsilon D_{fv}(k) D_{fv}^T(k) W^T(k) \\
& + D_f(k)(W(k) D_f(k) - I)^T + C(k) Q_{22}(k) C^T(k) W^T(k) + \varepsilon C_v(k) Q_{21}(k) C_v^T(k) W^T(k) \\
H(k) ={}& D_d(k) D_d^T(k) + \varepsilon D_{dv}(k) D_{dv}^T(k) + D_f(k) D_f^T(k) + \varepsilon C_v(k) Q_{11}(k) C_v^T(k) \\
& + C(k) Q_{22}(k) C^T(k) + \varepsilon D_{fv}(k) D_{fv}^T(k) \\
G(k) ={}& \varepsilon A_v(k) Q_{21}(k) C_v^T(k) W^T(k) + B_d(k) D_d^T(k) W^T(k) + B_f(k)(W(k) D_f(k) - I)^T \\
& + A(k) Q_{22}(k) C^T(k) W^T(k) + \varepsilon B_{dv}(k) D_{dv}^T(k) W^T(k) + \varepsilon B_{fv}(k) D_{fv}^T(k) W^T(k) \\
\Lambda^T(k) ={}& C(k) Q_{22}(k) A^T(k) + \varepsilon D_{dv}(k) B_{dv}^T(k) + \varepsilon C_v(k) Q_{11}(k) A_v^T(k) \\
& + D_f(k) B_f^T(k) + D_d(k) B_d^T(k) + \varepsilon D_{fv}(k) B_{fv}^T(k).
\end{aligned}$$

Simultaneously, for deriving $W_f(k)$, it follows from

$$\frac{\partial\, \varsigma^T(k) \Theta(k) \varsigma(k)}{\partial (W^T(k) \varsigma(k))} = 0$$

that

$$0 = -W(k) H(k) + D_f^T(k).$$

Furthermore, under the assumption that $C(k)$ has full row rank for all $k$, we have

$$\frac{\partial^2 \varsigma^T(k) Q_{22}(k+1) \varsigma(k)}{\partial (L^T(k) \varsigma(k))^2} = H(k) + K(k) \Theta^{-1}(k) K^T(k) > 0$$

and

$$\frac{\partial^2 \varsigma^T(k) \Theta(k) \varsigma(k)}{\partial (W^T(k) \varsigma(k))^2} = -H(k) < 0,$$

and thus $L_f(k)$ and $W_f(k)$ can be chosen as

$$L_f(k) = \Xi^T(k) (H(k) + K(k) \Theta^{-1}(k) K^T(k))^{-1} \tag{3.2.21}$$

$$W_f(k) = D_f^T(k) H^{-1}(k) \tag{3.2.22}$$

where $\Xi(k) = \Lambda^T(k) + K(k) \Theta^{-1}(k) G^T(k)$, which implies that (3.2.19) holds, so that a $\gamma$ satisfying $\Theta(k) > 0$ can be made as small as possible. Substituting $W_f(k)$ back into (3.2.17), $\Theta(k)$ can be calculated as follows:

$$\Theta(k) = (\gamma^2 - 1) I + D_f^T(k) H^{-1}(k) D_f(k) \tag{3.2.23}$$

Theorem 3.2.3. Let

$$Q(k) = \begin{bmatrix} Q_{11}(k) & Q_{12}(k) \\ Q_{21}(k) & Q_{22}(k) \end{bmatrix}.$$

Given $\gamma$, if there exist a positive scalar $\zeta$ and a solution $Q(k) > 0$ to the Riccati equation (3.2.11) such that the resulting $\Theta(k)$ in (3.2.23) is positive definite, then system (3.2.3) is exponentially stable in mean square and the H∞ performance (3.2.4) is satisfied. The design parameter matrices $L(k)$ and $W(k)$ are given in (3.2.21) and (3.2.22), respectively.

Remark 3.2.3. The design of the post-filter $W(k)$ brings more freedom, so that a smaller attenuation level $\gamma$ can be achieved. Comparing (3.2.23) with (3.2.17), it can be seen that when $W(k) = I$, $\Theta(k)$ tends to become negative as $\gamma$ becomes small. Thus, in contrast with designing $L(k)$ only, the H∞ performance (3.2.4) can be improved by using $W_f(k)$.
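The effect described in Remark 3.2.3 can be checked numerically. The sketch below evaluates $\Theta(k)$ from (3.2.17) for $W = I$ and for $W_f = D_f^T H^{-1}$ from (3.2.22), using the measurement-side values of the numerical example in Section 3.2.4. Two things are assumptions made only for illustration: $\varepsilon = E\{v^2(k)\} = 1$ (unit-variance $v(k)$), and $Q_{11} = Q_{22} = I$ (the diagonal blocks of $Q(0) = I$).

```python
import numpy as np

eps, gamma = 1.0, 0.8                 # eps = 1 is an assumption (unit-variance v(k))
C = np.array([[-0.1, 0.3]])
Cv = np.array([[0.1, 0.0]])
Df, Dfv = np.array([[0.75]]), np.array([[0.1]])
Dd, Ddv = np.array([[0.3]]), np.array([[0.1]])
Q11, Q22 = np.eye(2), np.eye(2)       # assumption: blocks of Q(0) = I

# H(k) as defined below (3.2.20)
H = (Dd @ Dd.T + eps * Ddv @ Ddv.T + Df @ Df.T + eps * Cv @ Q11 @ Cv.T
     + C @ Q22 @ C.T + eps * Dfv @ Dfv.T)

def theta(W):
    """Theta(k) evaluated term by term from (3.2.17)."""
    I = np.eye(W.shape[0])
    return (-W @ C @ Q22 @ C.T @ W.T - W @ Dd @ Dd.T @ W.T
            - eps * W @ Dfv @ Dfv.T @ W.T - eps * W @ Ddv @ Ddv.T @ W.T
            - eps * W @ Cv @ Q11 @ Cv.T @ W.T + gamma**2 * I
            - (W @ Df - I) @ (W @ Df - I).T)

W_f = Df.T @ np.linalg.inv(H)                                    # (3.2.22)
theta_closed = (gamma**2 - 1) * np.eye(1) + Df.T @ np.linalg.inv(H) @ Df  # (3.2.23)

theta_identity = theta(np.eye(1))     # post-filter fixed to W = I
theta_opt = theta(W_f)                # post-filter chosen as W_f
```

Under these assumptions `theta_opt` coincides with the closed form (3.2.23) and is no smaller than `theta_identity`, which is the extra margin for reducing $\gamma$ that the remark refers to.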

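3.2.4 Numerical examples

To illustrate the results of this section, consider the following stochastic LDTV system with

$$A(k) = \begin{bmatrix} -0.1 e^{-k/100} & 0.9k \\ -0.85 & -0.1 \end{bmatrix}, \quad A_v(k) = \begin{bmatrix} 0 & 0 \\ 0 & 0.5 \end{bmatrix}$$

$$B_f(k) = \begin{bmatrix} 0.6 \sin(k) \\ 0.4 \end{bmatrix}, \quad B_{fv}(k) = \begin{bmatrix} 0.5 \\ 0 \end{bmatrix}, \quad B_d(k) = \begin{bmatrix} 0.6 \\ 0.2 \end{bmatrix}, \quad B_{dv}(k) = \begin{bmatrix} 0.3 \\ 0.1 \end{bmatrix}$$

$$C(k) = \begin{bmatrix} -0.1 & 0.3 \end{bmatrix}, \quad C_v(k) = \begin{bmatrix} 0.1 & 0 \end{bmatrix}$$

$$D_f(k) = 0.75, \quad D_{fv}(k) = 0.1, \quad D_d(k) = 0.3, \quad D_{dv}(k) = 0.1.$$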

FIGURE 3.1 Unknown input d(k).

FIGURE 3.2 Stepwise fault f(k) and the residual r(k).

Here $\{v(k)\}$ is a zero-mean scalar white noise sequence with unit variance, $x(0) = [0.2 \;\; 0]^T$, $\hat x(0) = [0 \;\; 0]^T$, $Q(0) = I$, $\zeta = 0.01$, and $\gamma = 0.8$. The unknown input $d(k)$ is simulated as shown in Fig. 3.1. Applying Theorem 3.2.3, a stepwise fault and the corresponding residual are shown in Fig. 3.2, and a sine wave fault and its residual are shown in Fig. 3.3. The simulation results show that the generated residual tracks the fault well in the presence of multiplicative noise.

FIGURE 3.3 Sine wave fault f(k) and residual r(k).

3.3 Robust H∞ fault detection for LDTV systems with measurement packet loss

3.3.1 Problem formulation

Consider the following LDTV system:

$$\begin{cases}
x(k+1) = A(k) x(k) + B_f(k) f(k) + B_d(k) d(k) \\
y(k) = C(k) x(k) + D_f(k) f(k) + D_d(k) d(k)
\end{cases} \tag{3.3.1}$$

where $x(k) \in R^n$, $y(k) \in R^{n_y}$, $d(k) \in R^{n_d}$, and $f(k) \in R^{n_f}$ are the state, measurement output, unknown input, and fault to be detected, respectively. Without loss of generality, $d(k)$ and $f(k)$ are assumed to be $l_2$-norm-bounded signals and $C(k)$ is assumed to have full row rank; $A(k)$, $B_f(k)$, $B_d(k)$, $C(k)$, $D_f(k)$, and $D_d(k)$ are known time-varying matrices of appropriate dimensions. When data packets are lost during transmission of the measurement output, the measurement signal actually received, $\psi(k) \in R^q$, is assumed to be

$$\psi(k) = \theta(k) y(k) + (1 - \theta(k)) \psi(k-1), \tag{3.3.2}$$

where $\theta(k)$ is an independent identically distributed Bernoulli random variable satisfying

$$\begin{cases}
\Pr\{\theta(k) = 1\} = E\{\theta(k)\} = \rho \\
\Pr\{\theta(k) = 0\} = 1 - E\{\theta(k)\} = 1 - \rho
\end{cases} \tag{3.3.3}$$

and $\rho \in (0, 1]$ is a known scalar. Define $\alpha(k) = \theta(k) - \rho$. From (3.3.3), $\alpha(k)$ has the following statistical characteristics:

$$\begin{cases}
E\{\alpha(k)\} = 0 \\
E\{\alpha^2(k)\} = \rho - \rho^2 := \varepsilon
\end{cases} \tag{3.3.4}$$

Introducing the augmented vector $\xi(k) = [x^T(k) \;\; \psi^T(k-1)]^T$ and combining (3.3.1) and (3.3.2), we have

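$$\begin{cases}
\xi(k+1) = (A_1(k) + \alpha(k) A_\alpha(k)) \xi(k) + (B_{f1}(k) + \alpha(k) B_{f\alpha}(k)) f(k) + (B_{d1}(k) + \alpha(k) B_{d\alpha}(k)) d(k) \\
\psi(k) = (C_1(k) + \alpha(k) C_\alpha(k)) \xi(k) + (D_{f1}(k) + \alpha(k) D_{f\alpha}(k)) f(k) + (D_{d1}(k) + \alpha(k) D_{d\alpha}(k)) d(k)
\end{cases} \tag{3.3.5}$$

where

$$A_1(k) = \begin{bmatrix} A(k) & 0 \\ \rho C(k) & (1 - \rho) I_q \end{bmatrix}, \quad B_{f1}(k) = \begin{bmatrix} B_f(k) \\ \rho D_f(k) \end{bmatrix}, \quad B_{d1}(k) = \begin{bmatrix} B_d(k) \\ \rho D_d(k) \end{bmatrix}$$

$$C_1(k) = \begin{bmatrix} \rho C(k) & (1 - \rho) I_q \end{bmatrix}, \quad D_{f1}(k) = \rho D_f(k), \quad D_{d1}(k) = \rho D_d(k)$$

$$A_\alpha(k) = \begin{bmatrix} 0 & 0 \\ C(k) & -I \end{bmatrix}, \quad B_{f\alpha}(k) = \begin{bmatrix} 0 \\ D_f(k) \end{bmatrix}, \quad B_{d\alpha}(k) = \begin{bmatrix} 0 \\ D_d(k) \end{bmatrix}$$

$$C_\alpha(k) = \begin{bmatrix} C(k) & -I \end{bmatrix}, \quad D_{f\alpha}(k) = D_f(k), \quad D_{d\alpha}(k) = D_d(k).$$

Residual generation is the crucial step in the design of the FDI system. Therefore, the following observer-based FDF is considered as the residual generator:

$$\begin{cases}
\hat\xi(k+1) = A_1(k) \hat\xi(k) + L(k)(\psi(k) - C_1(k) \hat\xi(k)) \\
r(k) = V(k)(\psi(k) - C_1(k) \hat\xi(k)) \\
\hat\xi(0) = \hat\xi_0
\end{cases} \tag{3.3.6}$$

where $\hat\xi(k)$ is the estimate of $\xi(k)$, $\hat\xi_0$ is the initial value of the designed filter, $r(k) \in R^r$ is the residual, and the observer gain matrix $L(k)$ and post-filter $V(k)$ are the parameters to be designed. Let

$$e(k) = \xi(k) - \hat\xi(k), \quad \eta(k) = [\xi^T(k) \;\; e^T(k)]^T, \quad r_e(k) = r(k) - f(k), \quad w(k) = [f^T(k) \;\; d^T(k)]^T.$$

From Eqs. (3.3.5) and (3.3.6), we have

$$\begin{cases}
\eta(k+1) = (A_\eta(k) + \alpha(k) A_{\eta\alpha}(k)) \eta(k) + (B_\eta(k) + \alpha(k) B_{\eta\alpha}(k)) w(k) \\
r_e(k) = (C_\eta(k) + \alpha(k) C_{\eta\alpha}(k)) \eta(k) + (D_\eta(k) + \alpha(k) D_{\eta\alpha}(k)) w(k)
\end{cases} \tag{3.3.7}$$

where

$$A_\eta(k) = \begin{bmatrix} A_1(k) & 0 \\ 0 & A_1(k) - L(k) C_1(k) \end{bmatrix}, \quad A_{\eta\alpha}(k) = \begin{bmatrix} A_\alpha(k) & 0 \\ A_\alpha(k) - L(k) C_\alpha(k) & 0 \end{bmatrix}$$

$$B_\eta(k) = \begin{bmatrix} B_{f1}(k) & B_{d1}(k) \\ B_{f1}(k) - L(k) D_{f1}(k) & B_{d1}(k) - L(k) D_{d1}(k) \end{bmatrix}$$

$$B_{\eta\alpha}(k) = \begin{bmatrix} B_{f\alpha}(k) & B_{d\alpha}(k) \\ B_{f\alpha}(k) - L(k) D_{f\alpha}(k) & B_{d\alpha}(k) - L(k) D_{d\alpha}(k) \end{bmatrix}$$

$$C_\eta(k) = \begin{bmatrix} 0 & V(k) C_1(k) \end{bmatrix}, \quad C_{\eta\alpha}(k) = \begin{bmatrix} V(k) C_\alpha(k) & 0 \end{bmatrix}$$

$$D_\eta(k) = \begin{bmatrix} V(k) D_{f1}(k) - I & V(k) D_{d1}(k) \end{bmatrix}, \quad D_{\eta\alpha}(k) = \begin{bmatrix} V(k) D_{f\alpha}(k) & V(k) D_{d\alpha}(k) \end{bmatrix}$$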
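The augmented model (3.3.5) can be sanity-checked against the direct packet-loss recursion (3.3.1) and (3.3.2). The sketch below simulates both side by side and records the largest discrepancy. It is only an illustration: the constant matrix `A` is a frozen-time stand-in chosen here for convenience, and the fault and disturbance signals are arbitrary test inputs, not those of the chapter's example.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n, q = 0.85, 2, 1

# Illustrative constant matrices (assumption: frozen-time stand-ins)
A = np.array([[-0.1, 0.9], [-0.85, -0.1]])
Bf = np.array([[0.6], [0.4]])
Bd = np.array([[0.6], [0.2]])
C = np.array([[-0.1, 0.3]])
Df = np.array([[0.8]])
Dd = np.array([[0.3]])
Iq = np.eye(q)

# Augmented matrices of (3.3.5)
A1 = np.block([[A, np.zeros((n, q))], [rho * C, (1 - rho) * Iq]])
Aa = np.block([[np.zeros((n, n)), np.zeros((n, q))], [C, -Iq]])
Bf1, Bfa = np.vstack([Bf, rho * Df]), np.vstack([np.zeros((n, 1)), Df])
Bd1, Bda = np.vstack([Bd, rho * Dd]), np.vstack([np.zeros((n, 1)), Dd])
C1, Ca = np.hstack([rho * C, (1 - rho) * Iq]), np.hstack([C, -Iq])
Df1, Dfa, Dd1, Dda = rho * Df, Df, rho * Dd, Dd

x = np.array([[0.2], [0.0]])
psi_prev = np.zeros((q, 1))
max_err = 0.0
for k in range(200):
    f = np.array([[1.0 if k >= 50 else 0.0]])      # arbitrary step fault
    d = 0.1 * rng.standard_normal((1, 1))          # arbitrary disturbance
    theta = 1.0 if rng.random() < rho else 0.0     # Bernoulli arrival, (3.3.3)
    alpha = theta - rho
    # Direct model (3.3.1)-(3.3.2)
    y = C @ x + Df @ f + Dd @ d
    psi = theta * y + (1 - theta) * psi_prev
    x_next = A @ x + Bf @ f + Bd @ d
    # Augmented model (3.3.5) with xi(k) = [x(k); psi(k-1)]
    xi = np.vstack([x, psi_prev])
    xi_next = (A1 + alpha * Aa) @ xi + (Bf1 + alpha * Bfa) @ f + (Bd1 + alpha * Bda) @ d
    psi_aug = (C1 + alpha * Ca) @ xi + (Df1 + alpha * Dfa) @ f + (Dd1 + alpha * Dda) @ d
    err = max(np.abs(psi_aug - psi).max(),
              np.abs(xi_next - np.vstack([x_next, psi])).max())
    max_err = max(max_err, float(err))
    x, psi_prev = x_next, psi
```

The two descriptions agree to rounding error because $\theta(k) = \rho + \alpha(k)$ splits every $\theta$-dependent term into a deterministic part (the matrices with subscript 1) and a zero-mean part (the matrices with subscript $\alpha$).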

Note that system (3.3.7) is a time-varying system containing the random variable $\alpha(k)$. The following definition, first given in the work of Morozan [26], is used.

Definition 3.3.1. System (3.3.7) is said to be mean square exponentially stable if, under zero input (i.e., $w(k) = 0$), there exist $c \ge 0$ and $q \in (0, 1)$ such that

$$E\{\|\eta(k)\|^2\} \le c q^k \|\eta(0)\|^2.$$

The FDF design problem to be solved in this section can be summarized as follows.

Problem 3.3.1. Given $\gamma > 0$, design the parameter matrices $L(k)$ and $V(k)$ such that system (3.3.7) is mean square exponentially stable and satisfies the following performance index,

$$\sup_{\|w(k)\|_{2,N} \neq 0} \frac{\|r_e(k)\|_{2,E}^2}{\eta^T(0) S \eta(0) + \|w(k)\|_2^2} < \gamma^2 \tag{3.3.8}$$

where $S > 0$ is the weighting matrix for the initial state.

Remark 3.3.1. Systems with packet loss described by Bernoulli random variables can be roughly divided into two categories. The first is multistep measurement packet loss in the form of (3.3.2); when the parameter matrices of system (3.3.1) are constant, the FDF design problem above reduces to the problem studied in [23]. The second is single-step measurement packet loss, as considered in the work of Gao et al. [32] and Zhao et al. [33]. This section mainly studies the FDF design of an LDTV system with multistep measurement packet loss. The proposed algorithm can also be applied to an LDTV system with single-step measurement packet loss.

3.3.2 Main results

Note that system (3.3.7) has the same form as the LDTV system (3.2.3) with multiplicative noise in the previous section and that, according to (3.3.4), the statistical characteristics of the random variable $\alpha(k)$ are similar to those of the multiplicative noise $v(k)$. Therefore, Theorem 3.2.2 in Section 3.2.2 can be extended to system (3.3.7) to obtain sufficient conditions for the existence of the FDF, which are summarized in the following theorem.

Theorem 3.3.1. For system (3.3.7) and given $\gamma > 0$, if there exist a constant $\beta > 0$ and a positive definite matrix $Q(k)$ satisfying the following Riccati equation, then system (3.3.7) is mean square exponentially stable and meets the H∞ performance index (3.3.8):

$$\begin{cases}
Q(k+1) = A_\eta(k) Q(k) A_\eta^T(k) + \varepsilon B_{\eta\alpha}(k) B_{\eta\alpha}^T(k) + \varepsilon A_{\eta\alpha}(k) Q(k) A_{\eta\alpha}^T(k) \\
\qquad\qquad\quad + B_\eta(k) B_\eta^T(k) + M(k) \Theta^{-1}(k) M^T(k) + \beta I \\
Q(0) = S^{-1}
\end{cases} \tag{3.3.9}$$

where

$$M(k) = \left(C_\eta(k) Q(k) A_\eta^T(k) + \varepsilon D_{\eta\alpha}(k) B_{\eta\alpha}^T(k) + \varepsilon C_{\eta\alpha}(k) Q(k) A_{\eta\alpha}^T(k) + D_\eta(k) B_\eta^T(k)\right)^T$$

$$\Theta(k) = \gamma^2 I - C_\eta(k) Q(k) C_\eta^T(k) - D_\eta(k) D_\eta^T(k) - \varepsilon C_{\eta\alpha}(k) Q(k) C_{\eta\alpha}^T(k) - \varepsilon D_{\eta\alpha}(k) D_{\eta\alpha}^T(k) > 0.$$

Based on the sufficient conditions for the existence of the H∞-FDF given by Theorem 3.3.1, the calculation of the parameter matrices $L(k)$, $V(k)$ is transformed into a quadratic optimization problem. The main idea is to obtain a set of feasible solutions $L(k)$, $V(k)$ by solving the Riccati equation (3.3.9) such that $\Theta(k)$ is positive definite for the smallest possible $\gamma$, which optimizes the performance index (3.3.8) to a certain extent [34]. Define

$$Q(k) = \begin{bmatrix} Q_{11}(k) & Q_{12}(k) \\ Q_{21}(k) & Q_{22}(k) \end{bmatrix}$$

and we have

$$\begin{aligned}
\Theta(k) ={}& -V(k) C_1(k) Q_{22}(k) C_1^T(k) V^T(k) - \varepsilon V(k) C_\alpha(k) Q_{11}(k) C_\alpha^T(k) V^T(k) \\
& - \varepsilon V(k) D_{f\alpha}(k) D_{f\alpha}^T(k) V^T(k) - \varepsilon V(k) D_{d\alpha}(k) D_{d\alpha}^T(k) V^T(k) \\
& + \gamma^2 I - (V(k) D_{f1}(k) - I)(V(k) D_{f1}(k) - I)^T - V(k) D_{d1}(k) D_{d1}^T(k) V^T(k).
\end{aligned} \tag{3.3.10}$$

Notice that $\Theta(k)$ is a quadratic function of $V(k)$. A feasible solution $V_f(k)$ of $V(k)$ can then be defined as follows: at time instant $k$, given $Q(k)$, the following relation holds for any nonzero column vector $\varsigma(k)$ of appropriate dimension:

$$\varsigma^T(k) \Theta(V_f(k), k) \varsigma(k) \ge \varsigma^T(k) \Theta(V(k), k) \varsigma(k) > 0.$$

Setting

$$\frac{\partial\, \varsigma^T(k) \Theta(V(k), k) \varsigma(k)}{\partial (V^T(k) \varsigma(k))} = 0,$$

we have

$$\begin{aligned}
& -\varsigma^T(k) V(k) \big(C_1(k) Q_{22}(k) C_1^T(k) + \varepsilon C_\alpha(k) Q_{11}(k) C_\alpha^T(k) + \varepsilon D_{f\alpha}(k) D_{f\alpha}^T(k) \\
&\qquad + \varepsilon D_{d\alpha}(k) D_{d\alpha}^T(k) + D_{f1}(k) D_{f1}^T(k) + D_{d1}(k) D_{d1}^T(k)\big) + \varsigma^T(k) D_{f1}^T(k) = 0.
\end{aligned} \tag{3.3.11}$$

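Furthermore, since $C(k)$ has full row rank, we get

$$\begin{aligned}
\frac{\partial^2 \varsigma^T(k) \Theta(V(k), k) \varsigma(k)}{\partial (V^T(k) \varsigma(k))^2} = -\big(& C_1(k) Q_{22}(k) C_1^T(k) + \varepsilon C_\alpha(k) Q_{11}(k) C_\alpha^T(k) + \varepsilon D_{f\alpha}(k) D_{f\alpha}^T(k) \\
& + \varepsilon D_{d\alpha}(k) D_{d\alpha}^T(k) + D_{f1}(k) D_{f1}^T(k) + D_{d1}(k) D_{d1}^T(k)\big) < 0,
\end{aligned}$$

then

$$\begin{aligned}
V_f(k) = D_{f1}^T(k) \big(& C_1(k) Q_{22}(k) C_1^T(k) + \varepsilon C_\alpha(k) Q_{11}(k) C_\alpha^T(k) + \varepsilon D_{f\alpha}(k) D_{f\alpha}^T(k) \\
& + \varepsilon D_{d\alpha}(k) D_{d\alpha}^T(k) + D_{f1}(k) D_{f1}^T(k) + D_{d1}(k) D_{d1}^T(k)\big)^{-1}.
\end{aligned} \tag{3.3.12}$$

Combining (3.3.12) and (3.3.10), we get

$$\begin{aligned}
\Theta(k) = (\gamma^2 - 1) I + D_{f1}^T(k) \big(& C_1(k) Q_{22}(k) C_1^T(k) + \varepsilon C_\alpha(k) Q_{11}(k) C_\alpha^T(k) + \varepsilon D_{f\alpha}(k) D_{f\alpha}^T(k) \\
& + \varepsilon D_{d\alpha}(k) D_{d\alpha}^T(k) + D_{f1}(k) D_{f1}^T(k) + D_{d1}(k) D_{d1}^T(k)\big)^{-1} D_{f1}(k).
\end{aligned} \tag{3.3.13}$$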
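The claim that $V_f(k)$ in (3.3.12) maximizes the quadratic form $\Theta(k)$ in (3.3.10) can be illustrated numerically. The sketch below uses $\rho$, $\gamma$, $C$, $D_f$, and $D_d$ from the example in Section 3.3.3 and compares $\Theta(V_f)$ with $\Theta(V)$ for randomly perturbed post-filters. The choice $Q_{11} = Q_{22} = I$ is an assumption made only for this illustration, and `theta_of` is a hypothetical helper name.

```python
import numpy as np

rho, gamma = 0.85, 1.1
eps = rho - rho**2                       # E{alpha^2(k)} from (3.3.4)
C = np.array([[-0.1, 0.3]])
Df, Dd = np.array([[0.8]]), np.array([[0.3]])
C1 = np.hstack([rho * C, (1 - rho) * np.eye(1)])
Ca = np.hstack([C, -np.eye(1)])
Df1, Dd1, Dfa, Dda = rho * Df, rho * Dd, Df, Dd
Q11, Q22 = np.eye(3), np.eye(3)          # assumption: blocks of Q(k) = I

# The matrix that is inverted in (3.3.12)
S = (C1 @ Q22 @ C1.T + eps * Ca @ Q11 @ Ca.T + eps * Dfa @ Dfa.T
     + eps * Dda @ Dda.T + Df1 @ Df1.T + Dd1 @ Dd1.T)

def theta_of(V):
    """Theta(k) as a function of V(k), term by term from (3.3.10)."""
    I = np.eye(V.shape[0])
    return (-V @ C1 @ Q22 @ C1.T @ V.T - eps * V @ Ca @ Q11 @ Ca.T @ V.T
            - eps * V @ Dfa @ Dfa.T @ V.T - eps * V @ Dda @ Dda.T @ V.T
            + gamma**2 * I - (V @ Df1 - I) @ (V @ Df1 - I).T
            - V @ Dd1 @ Dd1.T @ V.T)

V_f = Df1.T @ np.linalg.inv(S)                                        # (3.3.12)
theta_f = (gamma**2 - 1) * np.eye(1) + Df1.T @ np.linalg.inv(S) @ Df1  # (3.3.13)

# Compare against randomly perturbed post-filters
rng = np.random.default_rng(0)
gaps = [theta_of(V_f)[0, 0]
        - theta_of(V_f + 0.5 * rng.standard_normal((1, 1)))[0, 0]
        for _ in range(100)]
```

Every entry of `gaps` is nonnegative up to rounding, and `theta_of(V_f)` matches the closed form `theta_f`, consistent with the inverse appearing in (3.3.13).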

Referring to (3.3.7) and (3.3.9), the matrix $Q_{11}(k+1)$ is independent of $L(k)$, and the following relation holds for $Q_{22}(k+1)$:

$$\begin{aligned}
Q_{22}(k+1) ={}& (A_1(k) - L(k) C_1(k)) Q_{22}(k) (A_1(k) - L(k) C_1(k))^T \\
& + \varepsilon (A_\alpha(k) - L(k) C_\alpha(k)) Q_{11}(k) (A_\alpha(k) - L(k) C_\alpha(k))^T \\
& + (B_{f1}(k) - L(k) D_{f1}(k))(B_{f1}(k) - L(k) D_{f1}(k))^T \\
& + (B_{d1}(k) - L(k) D_{d1}(k))(B_{d1}(k) - L(k) D_{d1}(k))^T \\
& + \varepsilon (B_{f\alpha}(k) - L(k) D_{f\alpha}(k))(B_{f\alpha}(k) - L(k) D_{f\alpha}(k))^T \\
& + \varepsilon (B_{d\alpha}(k) - L(k) D_{d\alpha}(k))(B_{d\alpha}(k) - L(k) D_{d\alpha}(k))^T \\
& + \Psi(k) \Theta^{-1}(k) \Psi^T(k) + \beta I,
\end{aligned}$$

where

$$\begin{aligned}
\Psi(k) ={}& (A_1(k) - L(k) C_1(k)) Q_{22}(k) C_1^T(k) V^T(k) + \varepsilon (A_\alpha(k) - L(k) C_\alpha(k)) Q_{11}(k) C_\alpha^T(k) V^T(k) \\
& + (B_{f1}(k) - L(k) D_{f1}(k))(V(k) D_{f1}(k) - I)^T + (B_{d1}(k) - L(k) D_{d1}(k)) D_{d1}^T(k) V^T(k) \\
& + \varepsilon (B_{f\alpha}(k) - L(k) D_{f\alpha}(k)) D_{f\alpha}^T(k) V^T(k) + \varepsilon (B_{d\alpha}(k) - L(k) D_{d\alpha}(k)) D_{d\alpha}^T(k) V^T(k).
\end{aligned}$$

Similar to the preceding analysis for finding $V(k)$, a feasible solution $L_f(k)$ of $L(k)$ can be defined as follows: at time instant $k$, given $V(k)$ and $Q(k)$, the following relation holds for any nonzero column vector $\varsigma(k)$ of appropriate dimension:

$$\varsigma^T(k) Q_{22}(k+1) \varsigma(k)\big|_{L(k) = L_f(k)} \le \varsigma^T(k) Q_{22}(k+1) \varsigma(k)\big|_{L(k)},$$

so that the following inequality holds:

$$\varsigma^T(k+1) \Theta(L_f(k+1), k+1) \varsigma(k+1) \ge \varsigma^T(k+1) \Theta(L(k+1), k+1) \varsigma(k+1) > 0.$$

Based on

$$\frac{\partial\, \varsigma^T(k) Q_{22}(k+1) \varsigma(k)}{\partial (L^T(k) \varsigma(k))} = 0,$$

we get

$$0 = (H(k) + G(k) \Theta^{-1}(k) G^T(k)) L^T(k) - G(k) \Theta^{-1}(k) N^T(k) - \Lambda^T(k),$$

where

$$\begin{aligned}
H(k) ={}& C_1(k) Q_{22}(k) C_1^T(k) + \varepsilon C_\alpha(k) Q_{11}(k) C_\alpha^T(k) + D_{f1}(k) D_{f1}^T(k) \\
& + D_{d1}(k) D_{d1}^T(k) + \varepsilon D_{f\alpha}(k) D_{f\alpha}^T(k) + \varepsilon D_{d\alpha}(k) D_{d\alpha}^T(k) \\
G(k) ={}& C_1(k) Q_{22}(k) C_1^T(k) V^T(k) + \varepsilon C_\alpha(k) Q_{11}(k) C_\alpha^T(k) V^T(k) + D_{d1}(k) D_{d1}^T(k) V^T(k) \\
& + \varepsilon D_{f\alpha}(k) D_{f\alpha}^T(k) V^T(k) + \varepsilon D_{d\alpha}(k) D_{d\alpha}^T(k) V^T(k) + D_{f1}(k)(V(k) D_{f1}(k) - I)^T \\
N(k) ={}& A_1(k) Q_{22}(k) C_1^T(k) V^T(k) + \varepsilon A_\alpha(k) Q_{11}(k) C_\alpha^T(k) V^T(k) + B_{d1}(k) D_{d1}^T(k) V^T(k) \\
& + B_{f1}(k)(V(k) D_{f1}(k) - I)^T + \varepsilon B_{f\alpha}(k) D_{f\alpha}^T(k) V^T(k) + \varepsilon B_{d\alpha}(k) D_{d\alpha}^T(k) V^T(k) \\
\Lambda^T(k) ={}& C_1(k) Q_{22}(k) A_1^T(k) + \varepsilon C_\alpha(k) Q_{11}(k) A_\alpha^T(k) + D_{d1}(k) B_{d1}^T(k) \\
& + \varepsilon D_{f\alpha}(k) B_{f\alpha}^T(k) + \varepsilon D_{d\alpha}(k) B_{d\alpha}^T(k) + D_{f1}(k) B_{f1}^T(k).
\end{aligned}$$

Furthermore, since $C(k)$ has full row rank, we obtain

$$\frac{\partial^2 \varsigma^T(k) Q_{22}(k+1) \varsigma(k)}{\partial (L^T(k) \varsigma(k))^2} = H(k) + G(k) \Theta^{-1}(k) G^T(k) > 0,$$

then

$$L_f(k) = \Xi^T(k) (H(k) + G(k) \Theta^{-1}(k) G^T(k))^{-1} \tag{3.3.14}$$

where $\Xi(k) = \Lambda^T(k) + G(k) \Theta^{-1}(k) N^T(k)$, and $\Theta(k)$ is obtained from (3.3.13). Based on the preceding analysis, the following theorem is obtained.

FIGURE 3.4 The rate of change θ(k).

Theorem 3.3.2. For system (3.3.7) and given $\gamma > 0$, if there exist a constant $\beta > 0$ and a symmetric matrix

$$Q(k) = \begin{bmatrix} Q_{11}(k) & Q_{12}(k) \\ Q_{21}(k) & Q_{22}(k) \end{bmatrix}$$

such that Eq. (3.3.9) holds and $\Theta(k) > 0$ in (3.3.13), then system (3.3.7) is mean square exponentially stable and meets the H∞ performance index (3.3.8), where the parameter matrices $L(k)$ and $V(k)$ are given by (3.3.14) and (3.3.12), respectively.

Remark 3.3.2. Comparing the FDF parameter matrices obtained in this section for the LDTV system with multistep measurement packet loss with those obtained in Section 3.2 for the LDTV system affected by multiplicative noise, it can be seen that for a system of the specific form (3.2.3) or (3.3.7), a feasible solution for the parameter matrices can be obtained by adopting the matrix optimization idea given in this section to solve the Riccati equation, together with the method of minimizing the upper bound of the variance in Section 3.2.

3.3.3 Numerical examples

To illustrate the effectiveness of the FDF designed by Theorem 3.3.2, the parameter matrices of system (3.3.1) are selected as follows:

$$A(k) = \begin{bmatrix} -0.1 e^{-k/100} & 0.9k \\ -0.85 & -0.1 \end{bmatrix}, \quad B_f(k) = \begin{bmatrix} 0.6 \sin(k) \\ 0.4 \end{bmatrix}, \quad B_d(k) = \begin{bmatrix} 0.6 \\ 0.2 \end{bmatrix}$$

$$C(k) = \begin{bmatrix} -0.1 & 0.3 \end{bmatrix}, \quad D_f(k) = 0.8, \quad D_d(k) = 0.3.$$

Set $\rho = 0.85$ (by (3.3.3), the probability that a measurement packet arrives), $Q(0) = 0.1I$, $x(0) = [0.2 \;\; 0]^T$, $\hat x(0) = [0 \;\; 0]^T$, $Q(0) = I$, $\beta = 0.01$, and $\gamma = 1.1$. The rate of change $\theta(k)$ and the unknown input signal are shown in Fig. 3.4 and Fig. 3.5, respectively. Fig. 3.6 and Fig. 3.7 show the square wave fault and the sine wave fault, respectively, together with the corresponding residual signals. From Fig. 3.6 and Fig. 3.7, it can be seen that the robust H∞-FDF designed based on Theorem 3.3.2 produces an effective residual signal when a fault occurs.

FIGURE 3.5 The unknown input signal d(k).

FIGURE 3.6 Square wave fault f(k) and the residual r(k).

FIGURE 3.7 Sine wave fault f(k) and the residual r(k).

3.4 Fixed-lag H∞ fault estimator design for LDTV systems under an unreliable communication link

3.4.1 Problem formulation and preliminaries

Consider the following LDTV system:

$$\begin{cases}
x(k+1) = A(k) x(k) + B_f(k) f(k) + D(k) d(k) \\
y(k) = \theta(k) C(k) x(k) + v(k) \\
x(0) = x_0
\end{cases} \tag{3.4.1}$$

where $x(k) \in R^n$, $y(k) \in R^q$, $d(k) \in R^{n_d}$, $v(k) \in R^{n_v}$, and $f(k) \in R^{n_f}$ denote the state, sensor measurement, process noise, observation noise, and fault, respectively. $f(k)$, $d(k)$, and $v(k)$ belong to $l_2[0, N]$. $A(k)$, $B_f(k)$, $C(k)$, and $D(k)$ are known time-varying matrices with appropriate dimensions. $\theta(k)$ is a Bernoulli distributed binary stochastic variable describing the measurement packet dropouts, which satisfies

$$\begin{cases}
\mathrm{Prob}\{\theta(k) = 1\} = E\{\theta(k)\} = \rho \\
\mathrm{Prob}\{\theta(k) = 0\} = 1 - E\{\theta(k)\} = 1 - \rho
\end{cases} \tag{3.4.2}$$

with $\rho$ a known constant. The value of $\rho$ can be obtained by empirical observation, experimentation, and statistical analysis [35]. The main purpose of this section is as follows. Given a prescribed disturbance attenuation level $\gamma$, by collecting the observations $y(0), \ldots, y(k)$, find $\check f(k-l|k)$ as a suitable estimate of the fault signal $f(k)$ such that the following $l$-step delayed H∞ performance index is fulfilled, with $l$ a positive integer:

$$\sup_{(x_0, f_k, d_k, v_k) \neq 0} \frac{E\left\{\sum_{k=l}^{N} (\check f(k-l|k) - f(k-l))^T (\check f(k-l|k) - f(k-l))\right\}}{x_0^T P_0^{-1} x_0 + \sum_{k=0}^{N} f^T(k) f(k) + \sum_{k=0}^{N-1} d^T(k) d(k) + \sum_{k=0}^{N} v^T(k) v(k)} < \gamma^2 \tag{3.4.3}$$

where $f_k = [f^T(0) \cdots f^T(k)]^T$, $d_k = [d^T(0) \cdots d^T(k)]^T$, and $v_k = [v^T(0) \cdots v^T(k)]^T$. Since the denominator on the left side of (3.4.3) is positive, (3.4.3) can be rewritten as

$$J_0 = x_0^T P_0^{-1} x_0 + \sum_{k=0}^{N} f^T(k) f(k) + \sum_{k=0}^{N-1} d^T(k) d(k) + \sum_{k=0}^{N} v^T(k) v(k) - E\left\{\gamma^{-2} \sum_{k=l}^{N} v_s^T(k) v_s(k)\right\} > 0, \tag{3.4.4}$$

where $v_s(k) = \check f(k-l|k) - f(k-l)$. Consequently, according to Stoorvogel et al. [36], the H∞ fixed-lag fault estimation problem can be restated as follows. Given a constant $\gamma > 0$, design an estimator of the form

$$\check f = \mathcal{F}(y) = \bar{\mathcal{F}}(f, d, v),$$

where $\mathcal{F}$ denotes a stable operator that generates a bounded operator $\bar{\mathcal{F}}$ mapping from $f$, $d$, $v$ to $\check f$ such that the indefinite cost function (3.4.4) has a positive minimum with respect to $f$, $d$, and $v$.

Remark 3.4.1. In the existing results (e.g., [19–21, 37–40]), Bernoulli distributed random variables are introduced to describe packet dropping or finite-step measurement time delay. It is noteworthy that the designed estimators depend only on the probability (i.e., $\rho$) rather than on $\theta(k)$. This indicates that the desired fault estimator does not require the timestamp of the data packet.

Remark 3.4.2. Notice that when $y(k)$ is affected by a so-called sensor fault of the form

$$y(k) = \theta(k) C(k) x(k) + D_f(k) f(k) + v(k),$$

the existing BRL-based H∞ fault estimation algorithm in the work of Li et al. [39] is applicable as a filter. In the case that $D_f(k) = 0$, the estimator has to be designed as a smoother with the proposed performance index (3.4.4). In this scenario, the methodology of Li et al. [39] may induce a computational burden through state augmentation, and the gain matrices of the estimator are difficult to derive because of coupled product terms. In what follows, a Krein space–based fault estimator design scheme is presented to overcome these drawbacks.

In this section, inspired by the work of Zhao et al. [41] and Lu et al. [42], an equivalent Krein space stochastic system and a corresponding H∞ performance index are first introduced. Then, by exploiting the reorganized innovation analysis and the projection theory in Krein space, the H∞ fault estimator is derived.

3.4.2 Krein space model design

Before we proceed, we propose the following lemma to construct an auxiliary stochastic system in Krein space.

Lemma 3.4.1. Given a scalar $\gamma > 0$ and an integer $l > 0$, the H∞ performance (3.4.4) is fulfilled if and only if there exists a fault estimator $\check f(k-l|k)$ such that the following inequality holds:

$$J = x_0^T P_0^{-1} x_0 + \sum_{k=0}^{N} f^T(k) f(k) + \sum_{k=0}^{N-1} d^T(k) d(k) + \sum_{k=0}^{N} v_0^T(k) v_0(k) + \sum_{k=0}^{N} v_z^T(k) v_z(k) - \gamma^{-2} \sum_{k=l}^{N} v_s^T(k) v_s(k) > 0 \tag{3.4.5}$$

subject to the following dynamic constraints

$$\begin{cases}
x(k+1) = A(k) x(k) + B_f(k) f(k) + D(k) d(k) \\
y_0(k) = \rho C(k) x(k) + v_0(k) \\
y_z(k) = \sqrt{\rho(1 - \rho)}\, C(k) x(k) + v_z(k) \\
\check f(k-l|k) = f(k-l) + v_s(k) \\
x(0) = x_0
\end{cases} \tag{3.4.6}$$

where $y_0(k)$ and $y_z(k)$ are fictitious observations with corresponding observation noises $v_0(k)$ and $v_z(k)$, respectively. The instantaneous value of $y_0$ at each time instant $k$ is equal to $y(k)$, and $y_z(k) \equiv 0$.

Proof. Necessity: From (3.4.1), the state transition matrix $\Phi$ is defined as

$$\Phi(k, j) = \begin{cases} A(k-1) \cdots A(j), & k > j \\ I, & k = j \end{cases}$$

and hence we have

$$x(k) = \Phi(k, 0) x_0 + \sum_{i=0}^{k-1} \Phi(k, i+1) B_f(i) f(i) + \sum_{i=0}^{k-1} \Phi(k, i+1) D(i) d(i).$$

(3.4.7) Define yk = [yT (0) · · · yT (k)]T , vs,k = [vsT (0) · · · vsT (k)]T fˇk = [ fˇT (0|l) · · · fˇT (k − l|k)]T , then, in view of (3.4.7), we have  yN = (k)Gx x0 + (k)G f fN + (k)Gd dN + vN fˇN = fN−l + vs,N

(3.4.8)

where (k) = diag{θ (1), . . . , θ (k)}, G f (k, i) = C(k)(k, i + 1)B f (i) Gd (k, i) = C(k)(k, i + 1)D(i) ⎡ ⎤ ⎡ C(0)(0, 0) 0 ⎢ C(1)(1, 0) ⎥ ⎢ G f (1, 0) ⎢ ⎢ ⎥ Gx = ⎢ .. ⎥, G f = ⎢ .. ⎣ ⎦ ⎣ . . C(N)(N, 0)

G f (N, 0)

... 0 .. .

... ... .. .

⎤ 0 0⎥ ⎥ ⎥ ⎦

G f (N, 1)

...

0




0 ⎢ Gd (1, 0) ⎢ Gd = ⎢ .. ⎣ .

... 0 .. .

... ... .. .

⎤ 0 0⎥ ⎥ ⎥ ⎦

Gd (N, 0) Gd (N, 1) . . . 0 Thus, by substituting (3.4.8) into (3.4.4) and taking (3.4.2) into consideration, we have  N N−1   f T (k) f (k) + d T (k)d(k) J0 = E x0T P0−1 x0 + k=0

k=0

− (yN − (k)Gx x0 − (k)G f fN − (k)Gd dN )T × (yN − (k)Gx x0 − (k)G f fN − (k)Gd dN ) − γ −2

N 



( fˇ(k − l|k) − f (k − l))T ( fˇ(k − l|k) − f (k − l))

k=l

= x0T P0−1 x0 +

N  k=0

f T (k) f (k) +

N−1 

d T (k)d(k)

k=0

¯ x x0 − G ¯ f fN − G ¯ d dN )T (y0,N − G ¯ x x0 − G ¯ f fN − G ¯ d dN ) + (y0,N − G ˜ x x0 − G ˜ f fN − G ˜ d dN )T (yz,N − G ˜ x x0 − G ˜ f fN − G ˜ d dN ) +(yz,N − G − γ −2

N 

( fˇ(k − l|k) − f (k − l))T ( fˇ(k − l|k) − f (k − l)),

(3.4.9)

k=l

where y0,k = [yT0 (0) · · · yT0 (k)]T , yz,k = [yTz (0) · · · yTz (k)]T y0 (i) = y(i), yz (i) = 0 (i = 0, . . . , k) $ ¯ = ρI,  ˜ = ρ(1 − ρ)I.  Therefore, if the H∞ performance index (3.4.4) is satisfied, then following the same line with the correlation between (3.4.1) and (3.4.4), we have J > 0 subject to the dynamics (3.4.6) over x0 , fk , and dk . Sufficiency. For (3.4.6), since the value of y0 (k) is equivalent to y(k) and yz (k) ≡ 0, in light of (3.4.9), it is easy to find out that for a given constant γ > 0 and an integer l > 0, J0 = J, which indicates that if J > 0 holds, then the H∞ performance (3.4.4) is satisfied. Combing the sufficiency and necessity part, the proof is complete.  In virtue of Lemma 3.4.1, the auxiliary performance index J in (3.4.5) can be converted into the following compact form: ⎤T ⎡ ⎤−1 ⎡ ⎤ ⎡ I 0 0 0 x0 x0 ⎢ ⎥ ⎢ dN ⎥ ⎢0 I 0 0 ⎥ ⎥ ⎢ ⎥ ⎢ dN ⎥ (3.4.10) J=⎢ ⎣ fN ⎦ ⎣0 0 I 0 ⎦ ⎣ fN ⎦ va,N va,N 0 0 0 Qa,N


where



⎧ v0 (k) ⎪ ⎪ , 0≤k 0, the H∞ performance (3.4.5) has a minimum over x0 , f , d if and only if Qa (k) and Qw (k) have the same inertia, where Qw (k) = w(k), w(k) is the covariance matrix of innovation sequence w(k) given by w(k) = ya (k) − yˆ a (k),

(3.4.17)

where $\hat y_a(k)$ is the projection of $y_a(k)$ onto $\mathcal{L}\{\{y_a(j)\}_{j=0}^{k-1}\}$. Furthermore, the minimum value of $J$ is

$$\begin{aligned}
J_{\min} ={}& \sum_{k=l}^{N} \begin{bmatrix} y_f(k) - C_1(k) \hat x(k) \\ \check f(k-l|k) - \hat f(k-l|k-1) \end{bmatrix}^T Q_w^{-1}(k) \begin{bmatrix} y_f(k) - C_1(k) \hat x(k) \\ \check f(k-l|k) - \hat f(k-l|k-1) \end{bmatrix} \\
& + \sum_{k=0}^{l-1} [y_f(k) - C_1(k) \hat x(k)]^T Q_w^{-1}(k) [y_f(k) - C_1(k) \hat x(k)]
\end{aligned} \tag{3.4.18}$$

where $\hat x(k)$ and $\hat f(k-l|k-1)$ are the Krein space projections of $x(k)$ and $f(k-l)$, respectively, onto $\mathcal{L}\{\{y_a(j)\}_{j=0}^{k-1}\}$.

Remark 3.4.3. According to Lemma 3.4.1 and Lemma 3.4.2, the purpose of establishing the dynamic model (3.4.6) associated with (3.4.5) is to derive a positive minimum of the cost function (3.4.4) by applying the projection theory in Krein space. Notice that although the measurement $\{y(k)\}_{k=0}^{N}$ is a stochastic sequence, the instantaneous values of $y(k)$ and $\check f(k-l|k)$ at each instant are available to the estimator. Thus, the equivalent cost function (3.4.5) and its corresponding dynamic constraint are constructed in a conditional expectation sense by collecting $\{y(k)\}_{k=0}^{N}$ (cf. Equation (3.4.9) in the proof of Lemma 3.4.1).
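The role of the two fictitious observations in (3.4.6) can be seen from a one-step identity, which the following sketch checks numerically. Treating the received value $y$ and the state $x$ as fixed and taking the expectation over $\theta \in \{0, 1\}$ only, the expected squared noise $E\{\|y - \theta C x\|^2\}$ equals $\|y - \rho C x\|^2 + \|0 - \sqrt{\rho(1-\rho)}\, C x\|^2$, that is, the sum of the $v_0$ and $v_z$ terms with $y_0 = y$ and $y_z = 0$. The numerical values and the helper name `sq` are arbitrary choices for illustration.

```python
import numpy as np

rho = 0.85
rng = np.random.default_rng(1)
C = np.array([[-0.1, 0.3]])
x = rng.standard_normal((2, 1))      # arbitrary state, held fixed
y = rng.standard_normal((1, 1))      # arbitrary received value, held fixed

def sq(v):
    """Squared Euclidean norm of a column vector."""
    return (v.T @ v).item()

# Exact expectation over theta: theta = 1 w.p. rho, theta = 0 w.p. 1 - rho
lhs = rho * sq(y - C @ x) + (1 - rho) * sq(y)

# Fictitious observations of (3.4.6): y0 = y and yz = 0
v0 = y - rho * C @ x
vz = np.zeros((1, 1)) - np.sqrt(rho * (1 - rho)) * C @ x
rhs = sq(v0) + sq(vz)
```

The two sides agree because $\theta^2 = \theta$ for a binary variable, so $E\{\theta^2\} = \rho = \rho^2 + \rho(1 - \rho)$.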

3.4.3 Kalman filtering in Krein space

From the preceding analysis, the key step is to find suitable $\hat x(k)$ and $\hat f(k-l|k-1)$. To this end, let

$$y_1(k) = y_f(k), \qquad y_2(k) = \begin{bmatrix} y_f(k) \\ \check f(k|k+l) \end{bmatrix},$$

then

$$\begin{aligned}
y_1(k-l+i) &= C_1(k-l+i) x(k-l+i) + \tilde v_1(k-l+i), & i &= 1, \ldots, l \\
y_2(i) &= C_2(i) x(i) + H f(i) + \tilde v_2(i), & i &= 0, \ldots, k-l,
\end{aligned}$$

where $\tilde v_1(k) = v_1(k)$ and $\tilde v_2(k) = [v_1^T(k) \;\; v_s^T(k+l)]^T$ are zero-mean white noises with the following covariance matrices, respectively:

$$Q_{\tilde v_1}(k) = \mathrm{diag}\{I, I\}, \qquad Q_{\tilde v_2}(k) = \mathrm{diag}\{I, I, -\gamma^2 I\}.$$

It is easy to check that $\{y_2(0), \ldots, y_2(k-l); y_1(k-l+1), \ldots, y_1(k)\}$ span the same linear space as $\mathcal{L}\{\{y_a(j)\}_{j=0}^{k}\}$. To proceed, the following definition is introduced.

Definition 3.4.1 [42]. For $t > k-l$, the estimator $\hat\eta(t, 1)$ is the optimal estimate of $\eta(t)$ given the observations $\mathcal{L}\{\{y_2(t)\}_{t=0}^{k-l-1}; \{y_1(t)\}_{t=k-l}^{k-1}\}$. For $0 < t \le k-l$, the estimator $\hat\eta(t, 2)$ is the optimal estimate of $\eta(t)$ given the observations $\mathcal{L}\{\{y_2(t)\}_{t=0}^{k-1}\}$.

In accordance with (3.4.17), the innovation sequence is defined as follows:

$$w_1(k-l+i) = C_1(k-l+i) e_1(k-l+i) + \tilde v_1(k-l+i), \quad i = 0, \ldots, l \tag{3.4.19}$$

$$w_2(i) = C_2(i) e_2(i) + H f(i) + \tilde v_2(i), \quad i = 0, \ldots, k-l, \tag{3.4.20}$$

where

$$\begin{aligned}
e_1(k-l+i) &= x(k-l+i) - \hat x(k-l+i, 1), & i &= 0, \ldots, l \\
e_2(i) &= x(i) - \hat x(i, 2), & i &= 0, \ldots, k-l
\end{aligned}$$

with the corresponding covariance matrices given as

$$\begin{aligned}
P_1(k-l+i) &= \langle e_1(k-l+i), e_1(k-l+i) \rangle, & i &= 0, \ldots, l \\
P_2(i) &= \langle e_2(i), e_2(i) \rangle, & i &= 0, \ldots, k-l.
\end{aligned}$$

In light of Lemma 2.2.1 in the work of Xie and Zhang [43], the innovation sequences $\{\{w_2(t)\}_{t=0}^{k-l-1}; \{w_1(t)\}_{t=k-l}^{k-1}\}$ are uncorrelated white noises and span the same linear space as $\mathcal{L}\{\{y_a(j)\}_{j=0}^{k}\}$. For deriving $\hat x(k-l, 2)$ ($k = l+1, l+2, \ldots$), applying the Krein space–based projection formula in the work of Hassibi et al. [44] and taking (3.4.15) and (3.4.16) into account, we have

$$\left\{\begin{aligned}
\hat x(k-l, 2) &= A(k-l-1) \hat x(k-l-1, 2) + \langle x(k-l), w_2(k-l-1) \rangle \langle w_2(k-l-1), w_2(k-l-1) \rangle^{-1} w_2(k-l-1) \\
&= A(k-l-1) \hat x(k-l-1, 2) + K_2(k-l-1) w_2(k-l-1) \\
\hat x(0) &= 0
\end{aligned}\right. \tag{3.4.21}$$


where

$$ K_2(k-l-1) = \left(A(k-l-1)P_2(k-l-1)C_2^T(k-l-1) + B_f(k-l-1)H\right)Q_2^{-1}(k-l-1) $$

with

$$ Q_2(k-l-1) = C_2(k-l-1)P_2(k-l-1)C_2^T(k-l-1) + HH^T + Q_{\tilde{v}_2}(k-l-1). $$

In addition, following the definition of $P_2(i)$ and (3.4.21), $P_2(i)$ ($i = 0, 1, \ldots, k-l-1$) is the solution to the following standard Riccati equation:

$$ \begin{cases} P_2(i+1) = A(i)P_2(i)A^T(i) + B_f(i)B_f^T(i) + D(i)D^T(i) - K_2(i)Q_2^{-1}(i)K_2^T(i) \\ P_2(0) = P_0 \end{cases} \tag{3.4.22} $$

For calculating $\hat{x}(k-l+i, 1)$ ($i = 1, \ldots, l$) with the initial condition $\hat{x}(k-l, 1) = \hat{x}(k-l, 2)$, we apply the projection formula once again such that

$$ \begin{aligned} \hat{x}(k-l+i+1, 1) &= A(k-l+i)\hat{x}(k-l+i, 1) + A(k-l+i)\langle x(k-l+i), w_1(k-l+i) \rangle \\ &\quad \times \langle w_1(k-l+i), w_1(k-l+i) \rangle^{-1} w_1(k-l+i) \\ &= A(k-l+i)\hat{x}(k-l+i, 1) + K_1(k-l+i)w_1(k-l+i), \end{aligned} \tag{3.4.23} $$

where

$$ K_1(k-l+i) = A(k-l+i)P_1(k-l+i)C_1^T(k-l+i)Q_1^{-1}(k-l+i) $$

with

$$ Q_1(k-l+i) = C_1(k-l+i)P_1(k-l+i)C_1^T(k-l+i) + Q_{\tilde{v}_1}(k-l+i), $$

and $P_1(k-l+i)$ is computed recursively in the following form:

$$ \begin{cases} P_1(k-l+i+1) = A(k-l+i)P_1(k-l+i)A^T(k-l+i) + B_f(k-l+i)B_f^T(k-l+i) \\ \qquad\qquad\qquad\quad + D(k-l+i)D^T(k-l+i) - K_1(k-l+i)Q_1^{-1}(k-l+i)K_1^T(k-l+i) \\ P_1(k-l) = P_2(k-l) \end{cases} \tag{3.4.24} $$

Similarly, the projection formula is reutilized to compute $\hat{f}(k-l|k-1)$, that is,

$$ \begin{aligned} \hat{f}(k-l|k-1) &= \sum_{i=0}^{l-1} \langle f(k-l), w_1(k-l+i) \rangle Q_1^{-1}(k-l+i)w_1(k-l+i) \\ &= \sum_{i=0}^{l-1} \Pi^{k-l}_{k-l+i} C_1^T(k-l+i)Q_1^{-1}(k-l+i)w_1(k-l+i), \end{aligned} \tag{3.4.25} $$

where $\Pi^{k-l}_{k-l+i}$, $i = 1, \ldots, l-1$, is obtained recursively in terms of

$$ \begin{cases} \Pi^{k-l}_{k-l+i} = \Pi^{k-l}_{k-l+i-1}\left[A(k-l+i-1) - K_1(k-l+i-1)C_1(k-l+i-1)\right]^T, \quad i = 1, \ldots, l-1 \\ \Pi^{k-l}_{k-l+1} = B_f^T(k-l). \end{cases} \tag{3.4.26} $$


Finally, to calculate $Q_w(k)$, which is associated with $J_{\min}$ and $\check{f}(k-l|k)$, define $\tilde{f}(k-l) = f(k-l) - \hat{f}(k-l|k-1)$. Then, from (3.4.25), we know that

$$ \langle \tilde{f}(k-l), \tilde{f}(k-l) \rangle = I - \sum_{i=0}^{l-1} \Pi^{k-l}_{k-l+i} C_1^T(k-l+i)Q_1^{-1}(k-l+i)\left(\Pi^{k-l}_{k-l+i} C_1^T(k-l+i)\right)^T. \tag{3.4.27} $$

Combining (3.4.19), (3.4.20), and (3.4.27), we have

$$ Q_w(k) = \begin{cases} C_1(k)P_1(k)C_1^T(k) + I, & 0 < k < l \\[4pt] \begin{bmatrix} C_1(k)P_1(k)C_1^T(k) + I & C_1(k)\left(\Pi^{k-l}_{k}\right)^T \\ \Pi^{k-l}_{k} C_1^T(k) & -\gamma^2 I + \langle \tilde{f}(k-l), \tilde{f}(k-l) \rangle \end{bmatrix}, & k \ge l \end{cases} \tag{3.4.28} $$

where $P_1(k)$ and $\Pi^{k-l}_{k-l+i}$ are the same as in (3.4.24) and (3.4.26).

3.4.4 H∞ fault estimator design

From the analysis and lemmas presented previously, we are now in the position to give our main results for designing the fault estimator, which are summarized in the following theorem.

Theorem 3.4.1. For (3.4.6), given a scalar $\gamma > 0$ and an integer $l > 0$, the H∞ fixed-lag fault estimator that satisfies (3.4.5) exists if and only if

$$ \Theta_1(k) = C_1(k)P_1(k)C_1^T(k) + I > 0 \tag{3.4.29} $$

and

$$ \begin{aligned} \Theta_3(k) = {} & -\gamma^2 I + I - \sum_{i=0}^{l-1} \Pi^{k-l}_{k-l+i} C_1^T(k-l+i)Q_1^{-1}(k-l+i)\left(\Pi^{k-l}_{k-l+i} C_1^T(k-l+i)\right)^T \\ & - \Pi^{k-l}_{k} C_1^T(k)\Theta_1^{-1}(k)\left(\Pi^{k-l}_{k} C_1^T(k)\right)^T < 0. \end{aligned} \tag{3.4.30} $$

In this case, a feasible fault estimator is given by

$$ \check{f}(k-l|k) = \sum_{i=0}^{l} \Pi^{k-l}_{k-l+i} C_1^T(k-l+i)Q_1^{-1}(k-l+i)\left[y_f(k-l+i) - C_1(k-l+i)\hat{x}(k-l+i, 1)\right], \tag{3.4.31} $$

where $\hat{x}(k-l+i, 1)$, $Q_1(k-l+i)$, and $\Pi^{k-l}_{k-l+i}$ are calculated by (3.4.21), (3.4.22), (3.4.23), (3.4.24), and (3.4.26).


Proof. For $k \ge l$, applying the block triangular factorization technique to $Q_w(k)$ in (3.4.28), we have

$$ Q_w(k) = \begin{bmatrix} I & 0 \\ \Theta_2(k)\Theta_1^{-1}(k) & I \end{bmatrix} \begin{bmatrix} \Theta_1(k) & 0 \\ 0 & \Theta_3(k) \end{bmatrix} \begin{bmatrix} I & 0 \\ \Theta_2(k)\Theta_1^{-1}(k) & I \end{bmatrix}^T, \tag{3.4.32} $$

where $\Theta_2(k) = \Pi^{k-l}_{k} C_1^T(k)$. Thus, from Lemma 3.4.2, we know that $Q_a(k)$ and $Q_w(k)$ have the same inertia if and only if $\Theta_1(k) > 0$ as well as $\Theta_3(k) < 0$. Furthermore, based on (3.4.18) and (3.4.32), $J$ has a minimum $J_{\min}$ if (3.4.29) and (3.4.30) are satisfied, where

$$ \begin{aligned} J_{\min} = {} & \sum_{k=l}^{N} \begin{bmatrix} y_f(k) - C_1(k)\hat{x}(k) \\ \check{f}(k-l|k) - \hat{f}(k-l|k-1) \end{bmatrix}^T \begin{bmatrix} I & 0 \\ -\Theta_2(k)\Theta_1^{-1}(k) & I \end{bmatrix}^T \begin{bmatrix} \Theta_1(k) & 0 \\ 0 & \Theta_3(k) \end{bmatrix}^{-1} \\ & \times \begin{bmatrix} I & 0 \\ -\Theta_2(k)\Theta_1^{-1}(k) & I \end{bmatrix} \begin{bmatrix} y_f(k) - C_1(k)\hat{x}(k) \\ \check{f}(k-l|k) - \hat{f}(k-l|k-1) \end{bmatrix} \\ & + \sum_{k=0}^{l-1} \left[y_f(k) - C_1(k)\hat{x}(k)\right]^T \Theta_1^{-1}(k)\left[y_f(k) - C_1(k)\hat{x}(k)\right]. \end{aligned} \tag{3.4.33} $$

Since $\Theta_3(k) < 0$, to guarantee $J_{\min} > 0$, combining (3.4.25) with (3.4.33), we know that a possible choice of $\check{f}(k-l|k)$ is

$$ \begin{aligned} \check{f}(k-l|k) &= \hat{f}(k-l|k-1) + \Theta_2(k)\Theta_1^{-1}(k)\left(y_f(k) - C_1(k)\hat{x}(k)\right) \\ &= \sum_{i=0}^{l} \Pi^{k-l}_{k-l+i} C_1^T(k-l+i)Q_1^{-1}(k-l+i)\left[y_f(k-l+i) - C_1(k-l+i)\hat{x}(k-l+i, 1)\right], \end{aligned} $$

which indicates (3.4.31). This completes the proof.

Remark 3.4.4. It can be seen from Theorem 3.4.1 that the superiority of the proposed algorithm lies in three aspects: (i) In contrast to the results elsewhere [19–21, 37, 38], the proposed algorithm can be applied to systems with time-varying $\rho(k)$. (ii) Compared with the result of Li and Zhong [39], the parameter matrices of the addressed estimator are given in terms of standard Riccati equations with the same dimension $n$ as system (3.4.6), so that no coupled Lyapunov equation with higher dimension is needed. (iii) The fault can be estimated with an arbitrary fixed lag $l$.
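The factorization step in the proof can be checked by direct multiplication. In the sketch below, $\Theta_1$, $\Theta_2$, and $\Theta_3$ stand for the matrices defined in (3.4.29), in the line following (3.4.32), and in (3.4.30), respectively; $Q_{22}$ is a shorthand introduced here (it does not appear in the original) for the lower-right block of $Q_w(k)$ in (3.4.28). The time argument $k$ is dropped, and $\Theta_1$ is assumed symmetric and invertible.

$$
\begin{bmatrix} I & 0 \\ \Theta_2\Theta_1^{-1} & I \end{bmatrix}
\begin{bmatrix} \Theta_1 & 0 \\ 0 & \Theta_3 \end{bmatrix}
\begin{bmatrix} I & \Theta_1^{-1}\Theta_2^T \\ 0 & I \end{bmatrix}
=
\begin{bmatrix} \Theta_1 & \Theta_2^T \\ \Theta_2 & \Theta_3 + \Theta_2\Theta_1^{-1}\Theta_2^T \end{bmatrix}
$$

Matching the lower-right block with $Q_{22}$ gives $\Theta_3 = Q_{22} - \Theta_2\Theta_1^{-1}\Theta_2^T$, the Schur complement of $\Theta_1$, which has the form of (3.4.30). Because the outer factors are invertible, Sylvester's law of inertia implies that $Q_w$ has the same inertia as $\mathrm{diag}\{\Theta_1, \Theta_3\}$, so the sign conditions on $\Theta_1$ and $\Theta_3$ fix the inertia of $Q_w$.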

3.4.5 Numerical examples

To illustrate the effectiveness and the applicability of the proposed method, we shall implement our algorithm on a time-varying model. The following system matrices are adopted that are borrowed from Ma et al. [45] and Moayedi et al. [46]:

$$ A(k) = (1 + 0.2\sin(0.02k\pi)) \times \begin{bmatrix} 0.8 & 0 \\ 0.9 & 0.2 \end{bmatrix}, \quad B_f(k) = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}^T, \quad C(k) = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad D(k) = \begin{bmatrix} 0.3 & 0.25 \end{bmatrix}^T. $$

The process noise $d(k)$ is uniformly randomly chosen from the interval $[-0.5, 0.5]$, and the measurement noise $v(k)$ is assumed as $v(k) = 0.5\sin(0.2k)$. The fault signal $f(k)$ is assumed to be time varying in the following sinusoidal form

$$ f(k) = \begin{cases} \sin(0.5k), & k \in [30, 80] \\ 0, & \text{otherwise} \end{cases} $$

and the expectation of $\theta(k)$ is assumed as $\rho = 0.8$, where Fig. 3.8 displays the switching mode of $\theta(k)$. Setting $l = 10$, $\gamma = 1.52$, $x_0 = [0.2 \;\; 0]^T$, and $P_0 = 0.1I$, we design the fault estimator by applying Theorem 3.4.1. Fig. 3.9 displays the fault signal and its estimation simultaneously. Fig. 3.10 shows the value of $f(k-l) - \check{f}(k-l|k)$, which is the error between the fault and its estimation. It can be seen from the results that our algorithm can track the fault signal regardless of whether random packet dropouts occur.

FIGURE 3.8 The change mode of θ(k). (Axes: change of θ(k) versus k.)

FIGURE 3.9 Fault and its estimation. (Curves: fault signal and fault estimation, versus k.)

FIGURE 3.10 Fault estimation error. (Axes: estimation error versus k.)
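As a rough illustration of the example setup, the following Python sketch generates the time-varying matrix A(k), the fault signal, and a Bernoulli packet-arrival sequence θ(k) with mean ρ. It assumes the model form x(k+1) = A(k)x(k) + B_f f(k) + D d(k), y(k) = θ(k)C x(k) + v(k), which is inferred from the symbols used in this section rather than restated here. The function names, the random seed, and the use of x0 as the true initial state are hypothetical choices. The sketch only produces data; it does not implement the estimator of Theorem 3.4.1.

```python
import math
import random

def a_matrix(k):
    """A(k) = (1 + 0.2 sin(0.02 k pi)) * [[0.8, 0], [0.9, 0.2]]."""
    s = 1.0 + 0.2 * math.sin(0.02 * k * math.pi)
    return [[0.8 * s, 0.0], [0.9 * s, 0.2 * s]]

B_F = [0.5, 0.5]   # fault distribution vector B_f
C = [1.0, 1.0]     # measurement row vector C
D = [0.3, 0.25]    # disturbance distribution vector D

def fault(k):
    """f(k) = sin(0.5 k) for k in [30, 80], zero otherwise."""
    return math.sin(0.5 * k) if 30 <= k <= 80 else 0.0

def simulate(n_steps=91, rho=0.8, seed=0):
    """Simulate the assumed model; return lists (y, f, theta)."""
    rng = random.Random(seed)
    x = [0.2, 0.0]  # assumption: true initial state set to x0 from the text
    ys, fs, thetas = [], [], []
    for k in range(n_steps):
        a = a_matrix(k)
        d = rng.uniform(-0.5, 0.5)               # process noise d(k)
        v = 0.5 * math.sin(0.2 * k)              # measurement noise v(k)
        theta = 1 if rng.random() < rho else 0   # 1 = packet received
        f = fault(k)
        ys.append(theta * (C[0] * x[0] + C[1] * x[1]) + v)
        fs.append(f)
        thetas.append(theta)
        # State update with the fault and the disturbance as inputs.
        x = [a[0][0] * x[0] + a[0][1] * x[1] + B_F[0] * f + D[0] * d,
             a[1][0] * x[0] + a[1][1] * x[1] + B_F[1] * f + D[1] * d]
    return ys, fs, thetas
```

Calling `simulate()` returns 91 samples (k = 0, ..., 90), matching the horizontal range of the figures; a different `seed` gives a different dropout pattern with the same mean arrival rate.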

3.5 Conclusion

In this chapter, we have dealt with the fault estimation problem for LDTV systems with random uncertainties. The work is divided into the following three aspects. First, we have handled the problem of robust fault detection for LDTV systems subject to multiplicative noise and l2-norm bounded unknown disturbances. The design of the FDF is converted into the framework of H∞ filtering. A sufficient condition for the existence of the FDF is derived in terms of a Riccati equation, and by solving the Riccati equation, an analytical solution of the parameter matrices is obtained. Second, the FDF design problem for LDTV systems with multistep data packet dropouts is studied. The observer-based robust H∞-FDF is used as a residual generator to transform the FDF design problem into an H∞ filtering problem for a class of stochastic time-varying systems. By applying the results derived in Section 3.2, sufficient conditions for the existence of the H∞-FDF based on a Riccati equation are given. The solution of the parameter matrix is transformed into a quadratic optimization problem, and by solving the Riccati equation, the analytic solution of the FDF parameter matrix is obtained. Third, the problem of H∞ fixed-lag fault estimator design for LDTV systems subject to intermittent observations has been dealt with. Special efforts have been made to handle the multiplicative uncertainty introduced by the random measurement packet dropouts. Through defining a couple of equivalent dynamic systems and the H∞ performance index, the fault estimator has been derived by using the projection formula in Krein space based on the reorganized innovation approach. The parameter matrices of the estimator have been calculated by solving two standard Riccati equations. Finally, the achieved results are illustrated by numerical examples.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under grants 61973135, 91948201, and 61773242, and in part by the Shandong Provincial Key Research and Development Program (Major Scientific and Technological Innovation Project) under grant 2019JZZY10441.

References

[1] S.X. Ding, Fault identification schemes, in: Model-Based Fault Diagnosis Techniques: Design Schemes, Algorithms and Tools, 2nd ed., Advances in Industrial Control, Springer, London, UK, 2013, pp. 441–470.


[2] J.P. Cai, C.Y. Wen, H.Y. Su, Z.T. Liu, Robust adaptive failure compensation of hysteretic actuators for a class of uncertain nonlinear systems, IEEE Transactions on Automatic Control 58 (9) (2013) 2388–2394.
[3] J.L. Liu, Y. Dong, Event-based fault detection for networked systems with communication delay and nonlinear perturbation, Journal of the Franklin Institute 350 (9) (2013) 2791–2807.
[4] S.X. Ding, T. Jeinsch, A unified approach to the optimization of fault detection systems, International Journal of Adaptive Control & Signal Processing 14 (7) (2015) 725–745.
[5] Y. Zhang, H.J. Fang, Z.X. Liu, Fault detection for nonlinear networked control systems with Markov data transmission pattern, Circuits Systems & Signal Processing 31 (4) (2012) 1343–1358.
[6] Y. Zhang, Z.X. Liu, H.J. Fang, H.B. Chen, H∞ fault detection for nonlinear networked systems with multiple channels data transmission pattern, Information Sciences: An International Journal 221 (1) (2013) 534–543.
[7] B. Shen, S.X. Ding, Z.D. Wang, Finite-horizon H∞ fault estimation for linear discrete time-varying systems with delayed measurements, IEEE Transactions on Circuits & Systems II: Analog & Digital Signal Processing 60 (12) (2013) 902–906.
[8] M.Y. Zhong, S.X. Ding, E.L. Ding, Automatica 46 (8) (2010) 1395–1400.
[9] M.Y. Zhong, D.H. Zhou, S.X. Ding, On designing H∞ fault detection filter for linear discrete time-varying systems, IEEE Transactions on Automatic Control 55 (7) (2010) 1689–1695.
[10] H.S. Zhang, G. Feng, C.Y. Han, Linear estimation for random delay systems, Systems & Control Letters 60 (7) (2011) 450–459.
[11] A. Bouhtouri, D. Hinrichsen, A. Pritchard, H∞-type control for discrete-time stochastic systems, International Journal of Robust and Nonlinear Control (13) (1999) 923–948.
[12] A.M. Rami, X.Y. Zhou, Linear matrix inequalities, Riccati equations, and indefinite stochastic linear quadratic controls, International Journal of Robust and Nonlinear Control 45 (6) (2000) 1131–1143.
[13] V. Dragan, T. Morozan, A. Stoica, H2 optimal control for linear stochastic systems, Automatica 40 (7) (2004) 1103–1113.
[14] E. Gershon, U. Shaked, I. Yaesh, H∞ Control and Estimation of State-Multiplicative Linear Systems, Springer, London, 2005.
[15] W.H. Zhang, Y.L. Huang, H.S. Zhang, Stochastic H2/H∞ control for discrete-time systems with state and disturbance dependent noise, Automatica 43 (3) (2007) 513–521.
[16] S.X. Ding, P. Zhang, E.L. Ding, Fault detection system design for a class of stochastically uncertain systems, IFAC Proceedings Volumes 39 (13) (2006) 705–710.
[17] C.F. Ma, M.Y. Zhong, M. Sader, T. Jeinsch, Robust fault detection for linear systems with multiplicative noise, IFAC Proceedings Volumes 39 (13) (2006) 1228–1233.
[18] M. Tabbara, D. Nesic, A.R. Teel, Stability of wireless and wireline networked control systems, IEEE Transactions on Automatic Control 52 (9) (2007) 1615–1630.
[19] J. Yu, M. Liu, W. Yang, P. Shi, Robust fault detection for Markovian jump systems with unreliable communication links, International Journal of Systems Science 44 (11) (2013) 2015–2026.
[20] H.L. Dong, Z.D. Wang, H.J. Gao, On design of quantized fault detection filters with randomly occurring nonlinearities and mixed time-delays, Signal Processing 92 (4) (2011) 1117–1125.
[21] S.M. Alavi, M. Saif, Fault detection in nonlinear stable systems over lossy networks, IEEE Transactions on Control Systems Technology 21 (6) (2013) 2129–2142.
[22] X. He, Z.D. Wang, D.H. Zhou, Networked fault detection with random communication delays and packet losses, International Journal of Systems Science 39 (11) (2008) 1045–1054.

[23] Y.B. Ruan, W. Wang, F.W. Yang, Fault detection filter for networked systems with missing measurements, Control Theory Applications 26 (3) (2009) 291–295.
[24] Y.Q. Wang, Y. Hao, S.X. Ding, G.Z. Wang, D.H. Zhou, Residual generation and evaluation of networked control systems subject to random packet dropout, Automatica 45 (10) (2009) 2427–2434.
[25] X. He, Z.D. Wang, D.H. Zhou, Robust fault detection for networked systems with communication delay and data missing, Automatica 45 (11) (2009) 2634–2639.
[26] T. Morozan, Stabilization of some stochastic discrete-time control systems, Stochastic Analysis & Applications 1 (1) (1983) 89–116.
[27] E. Gershon, U. Shaked, I. Yaesh, H∞ control and filtering of discrete-time stochastic systems with multiplicative noise, Automatica 37 (3) (2001) 409–417.
[28] M. Green, D.N. Limebeer, Robust Linear Control, Prentice Hall, Englewood Cliffs, NJ, 1995.
[29] E. Kreyszig, Introductory Functional Analysis with Applications, Wiley Classics Library, 1978, pp. 196–197.
[30] M.Y. Zhong, S. Liu, H.H. Zhao, Krein space-based H∞ fault estimation for linear discrete time-varying systems, Acta Automatica Sinica 34 (12) (2008) 1529–1533.
[31] X.G. Yu, C.S. Hsu, Reduced order H∞ filter design for discrete time-variant systems, International Journal of Robust & Nonlinear Control (8) (1997) 797–809.
[32] H. Gao, T. Chen, L. Wang, Robust fault detection with missing measurement, International Journal of Robust and Nonlinear Control 81 (5) (2008) 804–819.
[33] Y. Zhao, J. Lam, H.J. Gao, Fault detection for fuzzy systems with intermittent measurements, IEEE Transactions on Fuzzy Systems 17 (2) (2009) 398–410.
[34] J.P. Hespanha, P. Naghshtabrizi, Y.G. Xu, A survey of recent results in networked control systems, Proceedings of the IEEE 95 (2007) 138–162.
[35] J. Nilsson, Real-Time Control Systems With Delays, Department of Automatic Control, Lund Institute of Technology, Lund, Sweden, 1998.
[36] A.A. Stoorvogel, H.H. Niemann, A. Saberi, P. Sannuti, Optimal fault signal estimation, International Journal of Robust & Nonlinear Control 12 (8) (2002) 697–727.
[37] X.B. Wan, H.J. Fang, F. Yang, Fault detection for a class of networked nonlinear systems subject to imperfect measurements, International Journal of Control, Automation, and Systems 10 (2) (2013) 265–274.
[38] D. Zhang, Q.G. Wang, L. Yu, H.Y. Song, Fuzzy-model-based fault detection for a class of nonlinear systems with networked measurements, IEEE Transactions on Instrumentation & Measurement 62 (12) (2013) 3148–3159.
[39] Y.Y. Li, M.Y. Zhong, On designing robust H∞ fault detection filter for linear discrete time-varying systems with multiple packet dropouts, Acta Automatica Sinica 36 (12) (2010) 1788–1796.
[40] Y.Y. Li, S. Liu, Z.H. Wang, Fault detection for linear discrete time-varying systems with measurement packet dropping, Mathematical Problems in Engineering 2013 (pt.5) (2013) 697345.1–697345.9.
[41] H. Zhao, C. Zhang, H∞ fixed-lag smoothing for linear discrete time-varying systems with uncertain observations, Applied Mathematics & Computation 224 (1) (2013) 387–397.
[42] X. Lu, H. Zhang, J. Yan, H∞ deconvolution fixed-lag smoothing, International Journal of Control, Automation & Systems 8 (4) (2010) 896–902.
[43] L.H. Xie, H.S. Zhang, Control and Estimation of Systems with Input/Output Delays, Springer, Berlin, Germany, 2007.
[44] B. Hassibi, A.H. Sayed, T. Kailath, Linear estimation in Krein spaces, Part I: Theory, IEEE Transactions on Automatic Control 41 (1) (1996) 18–33.


[45] J. Ma, S.L. Sun, Optimal linear estimators for systems with random sensor delays, multiple packet dropouts and uncertain observations, IEEE Transactions on Signal Processing 59 (4) (2011) 5181–5192.
[46] M. Moayedi, Y.K. Foo, Y.C. Soh, Adaptive Kalman filtering in networked systems with random sensor delays, multiple packet dropouts and missing measurements, IEEE Transactions on Signal Processing 58 (3) (2010) 1577–1588.

Chapter 4

Fault diagnosis and failure prognosis of electrical drives

Elias G. Strangas
Michigan State University, United States

4.1 Introduction

The interest in continuous, safe, and reliable operation of electrical drives has increased dramatically in recent years. Until a few decades ago, an AC electrical machine was typically a synchronous generator or a three-phase uncontrolled motor connected directly to the power grid, and a DC machine would have its speed controlled through a rectifier. Although these applications remain, they have been augmented by many others.

Many developments have led to this explosion of applications. Primarily, the ubiquitous use of power electronics, microcontrollers, DSPs, and so forth has allowed control of speed and torque, among other quantities, as well as the utilization of new motor types, such as permanent magnet AC (PMAC) and switched and synchronous reluctance machines, with increased versatility and higher torque and power density. These new power electronics-controlled machines have been finding uses in manufacturing, robotics, vehicle traction and operation, airplane propulsion, and wind and wave generators, but also in medical, commercial, and consumer electronics that are becoming vital in health and other necessary devices.

The need for higher reliability and the associated fault diagnosis and failure prognosis resulted from the fact that these applications replaced older methods of control, such as pneumatic or other mechanical links, which were considered reliable, and from the desired and expected extension of life of well-established systems like generators.

The tools for fault diagnosis and prognosis of failure of electrical machines and drives are not fundamentally different from those used in other complex engineering systems. What differs here are the faults themselves, the variables and parameters that are monitored, and the interaction between components. What is also important is the utilization of the results of diagnosis and prognosis.
Interruption of service and a request for maintenance are not the only options; others include the employment of mitigating partial redundancies in hardware and control without interruption of service.

Fault Diagnosis and Prognosis Techniques for Complex Engineering Systems. DOI: 10.1016/B978-0-12-822473-1.00008-2 Copyright © 2021 Elsevier Inc. All rights reserved.

In the past, regularly scheduled maintenance was the standard, based on a schedule or simple evaluation of health. Condition-based maintenance processes are a precursor to fault diagnosis and prognosis, and to the resulting decisions and actions. Originally based on sensing of vibrations, overvoltages and overcurrents, and higher temperatures, condition-based maintenance evolved to signature analysis of currents or voltages, initially using the Fourier transform of signals and thresholding, and then to the present-day feature extraction, signal processing, fault classification, and decision making.

Fault diagnosis alone is, to some degree, not useful or adequate to lead to an informed decision. Prognosis, whether implicit or explicit, is what will allow the operator, human or automated, to take the drive out of operation, or to use redundancies or other mitigating measures. All of these (detection, identification, diagnosis, prognosis, decision making, and action planning) have been designed to work together under a unified control system that provides health monitoring and reliability improvement.

A fault in this context means a malfunction that allows the continuous operation of the drive, but that may lead to an eventual disruption or at least a deterioration of operation. A failure is an unplanned interruption of service. Management of the condition may include early emergency interruption of service, scheduling of maintenance, mitigation by use of redundancies, or continuous operation at lower performance. For the drive to be able to utilize any of these management systems, a fault has to be detected and identified, its severity diagnosed, and its progression monitored and predicted.

Most electrical machines, motors, or generators are a type of induction or synchronous machine. The first are typically squirrel cage or wound rotor field machines, with a number of double-fed ones.
The second, synchronous machines, are typically generators with a wound field, but increasingly they are permanent magnet AC machines, with sizes from a few hundred kilowatts to megawatts. The power electronic supply consists of an inverter, usually operating at high switching frequency, and a controller, with adequate sensors and high-frequency sampling of variables.

To increase the reliability and availability of a drive, including diagnosis and prognosis, a number of steps are required:
• Design of the drive and design evaluation through risk analysis and simulation,
• Testing before assembly of components and materials to ascertain their quality and establish statistical characteristics, and
• Testing before commissioning to avoid early failures and to develop baseline operating parameters.

The qualities of diagnosis and prognosis of a drive that will allow avoidance of catastrophic faults are as follows:

• Speed of detection, specificity, sensitivity, and confidence, and the ability to determine severity, and
• Continuously updated prognosis and confidence, together with an update, after mitigation, of the remaining useful life (RUL) and reliability of the modified system.

FIGURE 4.1 A two-level inverter. (A) Schematic of a two-level inverter (DC supply; outputs a, b, c). (B) Possible voltages obtained from a three-phase two-level inverter.

The main components of an electrical drive are an electrical machine, consisting of steel stator and rotor cores, windings, possibly magnets, bearings, and frame; a power electronics supply, consisting of power electronics switches, capacitors, inductors, and resistors; sensors for current, voltage, temperature, vibrations, and magnetic field; and a controller. Fig. 4.1(A) shows the schematic of the simplest three-phase, two-level voltage source inverter, and Fig. 4.1(B) shows the space vector of the voltages that it can produce through all possible safe combinations of open and closed switches. Each of the three outputs can be connected to either the positive or negative DC rail, and the space vector of the output voltages is a complex variable defined as

$$ v_{\alpha\beta} = v_a + v_b e^{j\gamma} + v_c e^{-j\gamma} = v_\alpha + jv_\beta, \qquad \gamma = 120^\circ. \tag{4.1} $$
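The definition in Eq. (4.1) can be evaluated for every switch combination to see which space vectors a two-level inverter can produce. The Python sketch below is only an illustration: it assumes that each phase voltage is measured from a fictitious DC midpoint, so that a phase tied to the positive rail sits at +v_dc/2 and one tied to the negative rail at -v_dc/2. The function names and the 0/1 state encoding are hypothetical.

```python
import cmath
import math

GAMMA = 2.0 * math.pi / 3.0  # 120 degrees, as in Eq. (4.1)

def space_vector(va, vb, vc):
    """Space vector per Eq. (4.1): va + vb*e^{j*gamma} + vc*e^{-j*gamma}."""
    return va + vb * cmath.exp(1j * GAMMA) + vc * cmath.exp(-1j * GAMMA)

def inverter_vectors(v_dc=1.0):
    """Map each switch state (sa, sb, sc) to its output space vector.

    A state bit of 1 ties the phase to the positive rail and 0 to the
    negative rail; voltages are referred to an assumed DC midpoint.
    """
    vectors = {}
    for sa in (0, 1):
        for sb in (0, 1):
            for sc in (0, 1):
                phase = [v_dc * (s - 0.5) for s in (sa, sb, sc)]
                vectors[(sa, sb, sc)] = space_vector(*phase)
    return vectors
```

With this unscaled definition, the states (0, 0, 0) and (1, 1, 1) map to the zero vector, and the six remaining states map to vectors of magnitude v_dc spaced 60° apart, which is the hexagon of Fig. 4.1(B).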

There are six nonzero such vectors and two zero ones, which are obtained when each phase is connected to the positive or negative DC rail. Desired voltages that are not one of these eight possibilities are created through a pulse width modulation technique, in which each output phase alternates between the two DC rails during each power cycle. A commanded voltage $v_\alpha + jv_\beta$ lying in a sector of Fig. 4.1(B) is synthesized by appropriately selecting the duty cycles $d_1$, $d_2$, and $d_3$, which average the times of the three neighboring voltage vectors $V_r$, $V_l$, and $V_0$. A pulse width modulation (PWM) inverter using fast electronic switches, such as insulated gate bipolar transistors (IGBTs) and MOSFETs, can produce the desired voltage almost instantaneously, within a microsecond. The switching frequency, which defines the frequency of high harmonics, is typically between 5 and 25 kHz.

Two control schemes are used primarily today: most applications in synchronous and induction machines use some version of field-oriented or vector control, which is based on commanding phase currents. From these current commands, voltage commands are derived through either high gain controllers alone or in combination with feed-forward control. Both the control and the health evaluation of the drive use sensors. Some of the sensors are integral parts of a drive system, such as current and speed sensors in an AC motor drive; others, such as interferometers and magnetic field sensors, are additions that may increase the cost but also decrease reliability due to additional wiring, space, and so forth.
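The averaging of neighboring vectors described for PWM above can be written as a small linear solve. The sketch below assumes that V_r and V_l are the two active vectors bounding the sector that contains the command, that V_0 is the zero vector, and that the duty cycles satisfy v_cmd = d1*V_r + d2*V_l with d3 = 1 - d1 - d2. The function name and tolerances are hypothetical, and this is not presented as the modulation routine of any particular drive.

```python
import cmath
import math

def duty_cycles(v_cmd, v_r, v_l):
    """Return (d1, d2, d3) such that v_cmd = d1*v_r + d2*v_l + d3*0."""
    # Solve the 2x2 real system formed by the real and imaginary parts.
    det = v_r.real * v_l.imag - v_l.real * v_r.imag
    if abs(det) < 1e-12:
        raise ValueError("v_r and v_l must not be collinear")
    d1 = (v_cmd.real * v_l.imag - v_l.real * v_cmd.imag) / det
    d2 = (v_r.real * v_cmd.imag - v_cmd.real * v_r.imag) / det
    d3 = 1.0 - d1 - d2  # remaining time is spent on the zero vector
    if min(d1, d2, d3) < -1e-9:
        raise ValueError("command lies outside the chosen sector triangle")
    return d1, d2, d3

# Example: unit-magnitude active vectors at 0 and 60 degrees, and a
# command of magnitude 0.5 on the bisector of that sector.
v_r = cmath.rect(1.0, 0.0)
v_l = cmath.rect(1.0, math.pi / 3.0)
v_cmd = cmath.rect(0.5, math.pi / 6.0)
d1, d2, d3 = duty_cycles(v_cmd, v_r, v_l)
```

Because the command in the example lies on the sector bisector, the two active duty cycles come out equal, and the zero vector fills the rest of the cycle.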

4.1.1 Operation under field orientation control

Vector control is a technique that allows the decoupling of the control of flux in an AC machine from the control of the torque. This is accomplished through two transformations of the terminal voltages and currents. The first is the Clarke transformation (Eq. 4.1) discussed already, which transforms the phase voltages or other three-phase quantities to two components. The second is the Park transformation, which transforms these stationary quantities, with subscripts α and β, to a rotating frame of reference:

$$ v_d + jv_q = (v_\alpha + jv_\beta)e^{-j\theta} \tag{4.2} $$

The angle θ in Eq. 4.2, in the case of orientation to the rotor magnetic field, is that of the instantaneous position of the rotor field established by the rotor and stator currents, and in the case of a PMAC machine by the magnets of the rotor. In the case of a PMAC machine, this field position is easily measured, as it coincides with the rotor position, and a simple sensing of this position suffices. In the case of an asynchronous (or induction) machine, this field moves with respect to the rotor, and its location has to be estimated accurately for high-performance torque control. Equally simply, the flux due to the rotor magnets, $\lambda_d$, is usually known a priori through testing of the machine during commissioning or manufacturing, whereas for an induction machine, $\lambda_d$ also has to be estimated, along with its position. The advantage of the decoupling through field orientation becomes apparent in the torque equation for induction machines using field-oriented variables:

$$ T = \frac{3}{2}\lambda_d i_q, \tag{4.3} $$

where $\lambda_d$ is the flux linkage of the stator in the d axis and $i_q$ is the component of the stator current perpendicular to it (i.e., in the q axis). In an induction machine, the rotor field is established and controlled through the $i_d$ component. For PMAC machines, at high speeds the field also has to be modified through the component of the stator current aligned with the rotor flux, $i_d$. For salient geometry PMAC machines, this current is used to provide an additional component of the torque. In all of these cases, if the machine is supplied by a voltage source inverter, the commanded current is established through the use of proportional-integral (PI) controllers with relatively high bandwidth. Fig. 4.2 shows a typical scheme for the control of a permanent magnet synchronous machine (PMSM).

FIGURE 4.2 Schematic of a field-oriented PWM drive. (Blocks: PI controllers, inverse Park's transformation, space vector PWM, inverter, source, PMSM, shaft angle θ, Park's transformation.)
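The two quantities used in this subsection, the rotation into the field-oriented frame and the torque expression of Eq. (4.3), can be illustrated with a few lines of Python. This is a minimal sketch under assumptions: the rotation is taken as multiplication by e^{-jθ}, the common convention for moving from the stationary to the rotating frame, and the function names and numerical values are hypothetical.

```python
import cmath
import math

def park(v_alpha_beta, theta):
    """Rotate a stationary-frame space vector into the frame at angle theta."""
    return v_alpha_beta * cmath.exp(-1j * theta)

def torque(lambda_d, i_q):
    """Torque per Eq. (4.3): T = (3/2) * lambda_d * i_q."""
    return 1.5 * lambda_d * i_q

# Example: a current vector of magnitude 10 placed 90 degrees ahead of
# the field angle theta is pure q-axis current in the rotating frame.
theta = 0.7
i_alpha_beta = cmath.rect(10.0, theta + math.pi / 2.0)
i_dq = park(i_alpha_beta, theta)
i_d, i_q = i_dq.real, i_dq.imag
t_example = torque(0.1, i_q)
```

In the example the d-axis component vanishes and the whole current contributes to torque, which is the decoupling that field orientation is meant to provide.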

4.1.2 Operation under direct torque control

Direct torque control (DTC) replaces the decoupling in field-oriented control with bang-bang control, which naturally fits the inherently discrete nature of switch-mode power inverters. DTC does not involve space vector modulation but utilizes a switching table that consists of different voltage vectors. In addition, the rotor position sensing that is essential for field-oriented control (FOC) is not necessary for DTC to operate properly, even if a speed control loop is included in the DTC scheme. Instead, the desired stator flux λ and torque T are established through comparators and voltage commands. Fig. 4.3 shows a typical controller for direct torque control.

FIGURE 4.3 Schematic of a DTC PMAC drive (including a torque and flux observer). From Niu et al. [1]

Almost all diagnosis and prognosis is based on the collection of operating data and the design of a model, which may be deterministic, stochastic, or entirely analytical. We should differentiate between data required to create a data-based model of the drive components and the data used to arrive at a decision for one device. The preceding is directly related to the time available to decide appropriate action and the accuracy of the decision.

FIGURE 4.4 Fault classification and decision flowchart.
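The comparator-plus-switching-table idea behind DTC, described earlier in this subsection, can be sketched as follows. The table used here is the common textbook arrangement for six 60° sectors with two-level flux and torque comparators; it is an assumption made for illustration and is not taken from this chapter or from Niu et al. [1]. Active vectors are numbered V1 to V6 at 0°, 60°, ..., 300°, and all function names are hypothetical.

```python
import math

def sector_of(flux_angle):
    """Return the sector 1..6 of the stator flux; sector 1 is centered on 0 rad."""
    shifted = (flux_angle + math.pi / 6.0) % (2.0 * math.pi)
    return int(shifted // (math.pi / 3.0)) % 6 + 1

def hysteresis(error, band, previous):
    """Bang-bang comparator: +1 to increase, -1 to decrease, else hold."""
    if error > band:
        return 1
    if error < -band:
        return -1
    return previous

def select_vector(sector, flux_cmd, torque_cmd):
    """Pick the active vector index 1..6 from the assumed switching table.

    Increasing torque uses a vector ahead of the flux, decreasing torque
    one behind it; one step raises the flux magnitude, two steps lower it.
    """
    if torque_cmd > 0:
        step = 1 if flux_cmd > 0 else 2
    else:
        step = -1 if flux_cmd > 0 else -2
    return (sector - 1 + step) % 6 + 1
```

For instance, with the flux in sector 1 and both comparators asking for an increase, the table returns V2; a full implementation would also decide when to apply a zero vector, which this sketch leaves out.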

4.2 What can fail and how

Not all components of a drive are prone to the same rate of failure, and of course they develop different types of faults. Although many faults involve two different components, such as an inverter and a machine, we discuss the modes of failure of each separately.

4.2.1 Electric power converters

Inverters and DC converters consist of power electronics switches and gate drivers, as well as inductors and capacitors. All of these are subject to faults and failures, and are considered to be among the weaker components of a drive.


4.2.1.1 IGBT catastrophic failure mechanisms Generally, catastrophic failure mechanisms of IGBTs are mostly caused by overstressed working conditions and are understood by studying the physics of the devices (a detailed review was published by Wu et al. [2]):

• Open-circuit failure may not be directly catastrophic, but it disrupts the drive operation and can cause secondary faults. One type is bond wire lift-off, which can happen after a short circuit. It results from uneven thermal expansion between silicon and aluminum, together with high temperature gradients. This mismatch produces a crack around the bonding interface. Once one bond wire lifts off, the current in the remaining wires increases, along with the thermal gradient and stress, and the crack expands. Another type of open fault is caused by gate driver failure. There are many possible causes of gate driver failure; a typical one is damage to, or disconnection of, the wires connecting the drive board and the IGBT. The driver failure may result in intermittent IGBT misfiring and degraded output voltage. In addition, abnormal operating conditions may also damage the gate driver, which is more sensitive to temperature rise than the IGBT itself.

• Short-circuit failure can lead to destruction of the IGBT, and this can precipitate the failure of the remaining IGBTs and the motor, as it results in high current through the circuit.

High voltage breakdown is the most common failure. It occurs when the IGBT turns off the collector current very quickly; the falling current through the small stray series inductance, not compensated for by the bus bar design or snubber capacitors, causes a high turn-off voltage spike. The electric field can then reach a critical value. It does so in a few IGBT cells first, which leads to high leakage current and high local temperature. High collector-emitter and gate-emitter voltages can also lead to a short circuit during turn-on.

Latch-up is a condition in which the collector current is no longer controlled by the gate voltage. Latch-up happens when the parasitic NPN transistor of the IGBT is turned on and works together with the main PNP transistor as a thyristor, which turns on but not off. Static latch-up happens at high collector currents, whereas dynamic latch-up happens during switching transients, usually during turn-off.

Second breakdown is a local thermal breakdown of transistors due to high current stress, which can also happen to IGBTs during the on-state and turn-off. As the current increases, the charge density in the collector-base junction increases and the breakdown voltage decreases. This results in a further increase in the current density. When the area of the high current density region shrinks below the minimum area of a stable current filament, the temperature increases rapidly and causes a short circuit.

As for energy shocks, during a short circuit in the on-state, failure may happen due to high power dissipation, an energy shock that results in a fast-rising temperature. The IGBT may not fail immediately, even when the junction temperature exceeds the rated temperature; it may survive until further repetitive short circuits occur.
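The turn-off voltage spike described under high voltage breakdown can be illustrated with a minimal sketch. It assumes a linear current fall, so that the spike is approximated as the DC-link voltage plus the stray inductance times di/dt. The function name and every numerical value (DC-link voltage, stray inductance, switched current, fall time, device rating) are hypothetical and are not taken from the text.

```python
def turn_off_overvoltage(v_dc, l_stray, i_off, t_fall):
    """Approximate peak collector-emitter voltage at turn-off.

    Assumes the collector current falls linearly from i_off to zero
    in t_fall seconds, so di/dt = i_off / t_fall (an idealization).
    """
    di_dt = i_off / t_fall
    return v_dc + l_stray * di_dt


# Hypothetical operating point: 600 V DC link, 100 nH stray inductance,
# 300 A switched off in 200 ns, device rated for 1200 V.
v_peak = turn_off_overvoltage(v_dc=600.0, l_stray=100e-9, i_off=300.0, t_fall=200e-9)
v_rated = 1200.0
print(f"peak voltage: {v_peak:.0f} V, margin to rating: {v_rated - v_peak:.0f} V")
```

In this idealized picture, halving the fall time or doubling the stray inductance doubles the inductive part of the spike, which is why bus bar layout and snubbers matter for this failure mode.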


4.2.2

Electrical machines

Most of the electrical machines in use at present are squirrel cage induction and PMAC motors. They come in many variations, with distinct operational and fault characteristics and weaknesses. Switched reluctance machines are less common and have been introduced in niche applications; their faults and failures are also discussed briefly here. Many other types (permanent magnet axial flux, variable flux, etc.) are far less common and have some similarities, at least in failure modes, to those discussed in more detail. We first discuss faults that may occur in more than one machine type, and then the fault modes that are particular to specific machine types.

4.2.2.1 Bearing faults These are among the most common and severe faults in electrical drives. Similar to faults in purely mechanical systems, they can be due to environmental conditions, high temperature, contamination, or vibrations, but there are additional fault causes in electrical machines. These faults in turn manifest themselves through increasing localized temperature, vibrations, and occasionally high-frequency stator currents. The following are some of the causes of bearing faults according to their origins [3]:

1. Mechanical origin. This is the mechanical bearing load applied by radial and axial forces and by vibrations, as well as by contamination. Radial and axial forces are the most frequent and most analyzed causes of bearing faults. Such loads cause increased wear and are primarily due to eccentricity and the associated radial forces, or to axial forces due to rotor displacement. An important consequence of bearing deterioration for electrical machines is that the magnetic pull of the rotor, especially for PMAC machines, becomes asymmetric [4], and the rotor becomes further eccentric in the stator bore, increasing static and/or dynamic eccentricity, placing more load on the bearing and causing further bearing degradation.

2. Electric origin (electric bearing load as given by bearing currents). A review is presented in the work of Plazenet et al. [5]. Damage to bearings caused by voltages that build up between the bearing races and then discharge as electric currents has been recognized for a long time. Some of these voltages are present under operation from a balanced, symmetric, sinusoidal supply (traditional operation); additional ones are induced under operation from an inverter. The former include the following:

• Alternating voltages induced in the shaft. Unbalanced magnetic fields caused by design and manufacturing details (such as axial holes in the stator and/or rotor laminations, joints between stator segments, rotor eccentricities, and a bowed rotor) can create a magnetic flux encircling the shaft. Thus, alternating voltages are induced in the shaft and may cause a circulating current in the loop "stator frame, drive-end bearing, rotor shaft, non-drive-end bearing" if the bearing voltage increases above the threshold needed to break the insulating lubricant film of the bearing. As a result, design rules have been established to limit this issue. Axial rotor flux can be generated through the shaft by residual magnetism (linked to magnetic particles and improper demagnetization), local saturation, asymmetries in the rotor field winding, and rotor eccentricities. This homopolar flux circulates from the shaft in the loop "stator frame, drive-end bearing, rotor shaft, non-drive-end bearing."

• High-frequency bearing currents. High-frequency bearing currents are generated through capacitive and inductive coupling inside the machine. Under inverter operation, the high-frequency components of the common mode voltage excite the parasitic capacitances and inductances of the motor, producing the so-called inverter-induced bearing currents. These phenomena are in the range of 100 kHz to several megahertz. One can identify "circulating" and "noncirculating" currents for the purpose of developing models and possible design methods to limit them:

(1) Small capacitive high-frequency bearing currents (≈ 5–200 mA), of the "noncirculating" type, that appear at low speed.

(2) EDM bearing currents, of the "noncirculating" type. The bearing voltage mirrors the common mode voltage through a capacitive voltage divider. The bearing voltage increases and charges the lubricant film until its breakdown field strength is surpassed, causing a breakdown with an EDM current pulse (≈ 0.5–3 A) that oscillates at frequencies in the megahertz range.

(3) High-frequency bearing currents of the "circulating" type. The parasitic capacitances between the stator winding and the frame are excited by the high dV/dt at the motor terminals, which creates a high-frequency ground current. The latter produces a circular flux around the motor shaft, inducing bearing voltages. When the lubricating film breaks down, a high-frequency current (≈ 0.5–20 A) circulates in the loop "stator frame, drive-end bearing, rotor shaft, non-drive-end bearing" with a frequency of several hundred kilohertz. This type of bearing current is due to inductive coupling, and it mirrors the common mode current.

(4) Bearing currents of the "circulating" type due to rotor ground currents. These occur if the rotor-to-ground impedance is lower than the stator-to-ground impedance. In this case, a portion of the ground current crosses the bearings toward the shaft. These currents can reach high levels (≈ 1–35 A) and prematurely damage the bearings.

Small motors up to 20 kW are more sensitive to EDM bearing currents, whereas larger motors are more likely to be subjected to circulating bearing currents.


3. Chemical and environmental causes. These have significant effects, but very little systematic study is available. Temperature, as both a cause and an effect, plays an important role. Contamination and environmental conditions in general can severely shorten bearing life. Liquids (e.g., water) directly degrade the lubricant and surfaces through oxidation, and particles disrupt the lubricant films and separate the rolling body surfaces. Unlike other causes of degradation, they cannot be easily quantified and included in a model used for detection.

These causes lead to the generation of local flaking. Studies on how a fault is initiated and develops have provided a better understanding of the fault progression and helped develop diagnosis tools. The initiation step requires the stress to exceed a threshold value, a fatigue-initiation stress criterion. Beyond stresses due to operation, stresses on bearings occur because of loading without rotation (e.g., true brinelling) or vibrations unrelated to operation (false brinelling). Healing, the smoothing of the sharp edges of a crack or damage by the rolling contact, initially reduces vibrations until the damage spreads.

As for the electric bearing load, the cause-and-effect chains of this type of load are well known. Discharge bearing currents can lead to localized pits that may translate into a gray trace, frosting, or fluting. Depending on the energy released, the discharge may lead to melting or vaporization of the bearing raceway surface. The small craters caused by melting are flattened by the rolling bearing balls, resulting only in a frosted raceway, and have been found to have no direct effect on the lifetime of the bearing; they can be ameliorated by frequent greasing. The large craters resulting from vaporization, however, affect the lubricating grease, lead to corrugated patterns, and shorten the lifetime of the bearing. Similar damage patterns also result from differential-mode currents. The fault develops further through the localized currents and through the breakdown of the grease that results when current flows; these factors iteratively affect each other and are related to low vibration frequencies.
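The capacitive voltage divider mentioned for EDM bearing currents can be sketched with a lumped-capacitance model that is commonly used in the bearing current literature but is not given in the text above, so it should be read as an assumption: the bearing voltage ratio is taken as C_wr / (C_wr + C_rf + 2 C_b), with C_wr the winding-to-rotor, C_rf the rotor-to-frame, and C_b the per-bearing capacitance. All capacitance values, the common mode voltage, and the film breakdown threshold below are hypothetical.

```python
def bearing_voltage(v_cm, c_wr, c_rf, c_b):
    """Bearing voltage from the common mode voltage via a capacitive divider.

    Assumed lumped model: two identical bearings (2 * c_b) in parallel with
    the rotor-to-frame capacitance c_rf, fed through the winding-to-rotor
    capacitance c_wr. Valid only while the lubricant film is insulating.
    """
    ratio = c_wr / (c_wr + c_rf + 2.0 * c_b)
    return ratio * v_cm


# Hypothetical values: 300 V common mode step, capacitances in farads.
v_b = bearing_voltage(v_cm=300.0, c_wr=100e-12, c_rf=1e-9, c_b=200e-12)
v_breakdown = 15.0  # hypothetical film breakdown threshold in volts
print(f"bearing voltage: {v_b:.1f} V, EDM discharge possible: {v_b > v_breakdown}")
```

The sketch only indicates when a discharge becomes possible; it does not model the discharge pulse itself or the circulating current types.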

4.2.2.2 Winding faults A large portion of faults in electrical machines are in the windings and are short circuits: between turns of the same phase, between two phases, or between turns and the grounded steel paths. These shorts lead to localized overcurrents that create local high temperature spots, damage the machine, and quickly result in catastrophic failure.

Armature winding faults. The causes of short circuit faults (turn-to-turn, phase-to-phase, phase-to-ground) are multiple [6]. Insulation degrades for a number of reasons. High temperature leads to oxidation and other forms of chemical degradation, and with it the mechanical weakening of the insulating material; vibrations of the whole machine, or torque pulsations related to the internal torque production, mechanically stress the insulation, as it is mechanically weaker than the iron core and copper windings; overvoltages due to the supply or to fast switching transients in the electronic controller are yet another cause of insulation breakdown, albeit through a different physical mechanism, partial discharges (PDs) [7]. In low- and medium-voltage electrical machines, the turn-to-turn insulation is usually a thin layer of enamel, which often has an organic chemical composition (polyamide, polyimide, polyester-imide, etc.). Hence, PDs can be incepted in air spaces within the insulation (i.e., voids), breaking polymeric bonds and resulting in conductor short circuits [8]. As degradation progresses, the makeup of the insulating material changes and pockets or voids form in it; discharges are initiated in these voids, a precursor of arcing and breakdown. Detection of these discharges is therefore useful, although difficult to conduct online. The high values of the rate of voltage rise, dv/dt, resulting from high-frequency PWM both increase the PD occurrence and make it more difficult to distinguish the high-frequency content of the supply signal from that of the discharge [9].

Many parameters other than PDs have been used to detect insulation degradation. As insulation ages, its resistance decreases, and hence a measurement of that resistance can indicate the presence of a fault and help estimate its severity. The capacitance between turns and the grounded wall or other turns also depends on the insulation quality and is, either directly or indirectly, a measure of this fault [10].

Open circuit windings are important to detect in that they may disrupt the control algorithm, causing torque pulsations and overcurrents. They are caused most of the time by a fault in a power electronic switch, which stays open, and to a lesser extent by the breakage of a connection at the terminals or in internal connections of the windings due to corrosion, mechanical stress, or initial weakness. In either case, they have to be detected, identified, and compensated for through the use of a modified control algorithm.

A special case is the open circuit of rotor bars or of the end ring in the squirrel cage of induction machines. This occurs due to increased temperature, uneven temperature distribution and expansion between the copper (or aluminum) bars and the steel core, and mechanical stresses, and, in the case of a well-designed machine operating from a variable speed drive, during heavy loads. The asymmetries resulting from these breakages cause high-frequency stator currents and voltages, decreased developed torque, and increased torque pulsations. This fault propagates, as the other rotor bars become increasingly loaded due to the failure of one of them [11].
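As an illustration of how a rotor bar breakage can show up in the stator current, the following minimal sketch computes the sideband frequencies (1 ± 2ks)f that are widely reported in motor current signature analysis. This relation is not stated in the text above and is added here as an assumption; the function name, supply frequency, and slip are hypothetical.

```python
def broken_bar_sidebands(f_supply, slip, k_max=2):
    """Return (lower, upper) sideband pairs around the supply frequency.

    Uses the commonly cited relation f = (1 +/- 2*k*slip) * f_supply
    for k = 1..k_max. Amplitudes are not modeled, only frequencies.
    """
    pairs = []
    for k in range(1, k_max + 1):
        lower = (1.0 - 2.0 * k * slip) * f_supply
        upper = (1.0 + 2.0 * k * slip) * f_supply
        pairs.append((lower, upper))
    return pairs


# Hypothetical machine: 50 Hz supply, 2% slip.
for lower, upper in broken_bar_sidebands(f_supply=50.0, slip=0.02):
    print(f"sidebands near {lower:.1f} Hz and {upper:.1f} Hz")
```

Because the sidebands sit close to the fundamental at low slip, resolving them needs a long enough steady-state record, which is one reason such frequency-domain indicators are harder to use in variable speed drives.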

4.2.2.3 Demagnetization of permanent magnets Permanent magnet machines are increasingly being adopted. Unlike in induction machines, the magnetizing field can be very high, more than 2 T, due to the high energy density of modern magnets, resulting in high torque density and efficiency. Their design has evolved into many variants and dramatically improved their already attractive characteristics. For this discussion, we will consider two basic permanent magnet materials: rare earth and ferrites.

There are a number of concerns in the use of permanent magnets. Rare earth magnets, namely neodymium iron boron (NdFeB) and SmCo alloys, typically Sm2Co17, have very high energy density but are sensitive to demagnetization due to high temperature and demagnetizing fields. NdFeB materials have high energy content and magnetization characteristics preferable to those of SmCo, but the latter can operate at higher temperatures. Demagnetization happens at high temperatures and under a demagnetizing field, which is often due to intended field weakening at high speeds. Fig. 4.5 shows the demagnetization of permanent magnets due to high temperature and/or a demagnetizing field. An important concern is that certain materials, such as dysprosium, which is needed in very small quantities for high temperature operation, are not easily available, and hence their use results in high costs. Efforts to alleviate these problems have led to the use of lower cost, lower strength ferrite magnets that are not sensitive to high temperatures but are sensitive to demagnetization under a strong negative field. The problems associated with the use of magnets are as follows:

FIGURE 4.5 Demagnetization characteristics of NdFeB magnetic material. BH1: the original magnetization curve at low temperature; BH2: the same curve shifted at higher temperature; L1: a load curve without demagnetizing current; L2: a load curve with demagnetizing current; P1, P2: operating points of the "cold" magnet without and with demagnetizing current, respectively; P2': operating point below the knee; R2: the new recoil line after operating at P2'; P3: the operating point P1 shifted onto the new recoil line.


• Avoidance of demagnetization through material selection, design, and operation, and
• Detection of demagnetization and continued operation after partial demagnetization.
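The operating points described in the caption of Fig. 4.5 can be illustrated with a minimal sketch that intersects a linear recoil line with a load line and checks the result against the knee. The linear magnet model B = B_r + μ0 μ_r H (valid only above the knee), the load line B = -μ0 P_c (H - H_a), the sign convention for the applied field H_a, and all numerical values are assumptions made for illustration and are not taken from the text.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space, H/m


def operating_point(b_r, mu_r, pc, h_a=0.0):
    """Intersect the recoil line with the load line.

    Recoil line (assumed linear):  B = b_r + MU0 * mu_r * H
    Load line (assumed):           B = -MU0 * pc * (H - h_a)
    h_a <= 0 models the field of a demagnetizing current, which shifts
    the load line to the left. Returns (H in A/m, B in T).
    """
    h = (MU0 * pc * h_a - b_r) / (MU0 * (mu_r + pc))
    b = b_r + MU0 * mu_r * h
    return h, b


def is_below_knee(h, h_knee):
    """True if the operating field is more negative than the knee field."""
    return h < h_knee


# Hypothetical hot NdFeB-like magnet: B_r = 1.2 T, mu_r = 1.05, knee at -500 kA/m.
h1, b1 = operating_point(b_r=1.2, mu_r=1.05, pc=5.0)               # no demagnetizing current
h2, b2 = operating_point(b_r=1.2, mu_r=1.05, pc=5.0, h_a=-600e3)   # with demagnetizing current
print(f"P1: H = {h1 / 1e3:.0f} kA/m, B = {b1:.2f} T, below knee: {is_below_knee(h1, -500e3)}")
print(f"P2: H = {h2 / 1e3:.0f} kA/m, B = {b2:.2f} T, below knee: {is_below_knee(h2, -500e3)}")
```

When the check reports an operating point below the knee, the linear model no longer applies and the magnet would return along a lower recoil line, which is the partial demagnetization that the second bullet refers to.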

4.2.3

Capacitors

A very common and critical component in electrical drives is the DC link capacitor, used to filter the ripple of the voltage provided to the inverter. The inverter supplies the load with AC current whose frequency content depends on the switching frequency, and hence the current it draws from the source is not pure DC. This causes the voltage of the DC link to vary, and a capacitor is used to smooth it.

Three types of capacitors are generally available for DC-link applications: aluminum electrolytic capacitors (Al-Caps), metallized polypropylene film capacitors (MPPF-Caps), and high-capacitance multilayer ceramic capacitors (MLC-Caps). Al-Caps can achieve the highest energy density and lowest cost per joule but have the disadvantages of relatively high equivalent series resistance (ESR), low ripple current ratings, and a wear-out issue due to evaporation of the electrolyte. MLC-Caps have smaller size, wider frequency range, and higher operating temperatures, up to 200 °C. However, they suffer from higher cost and mechanical sensitivity. MPPF-Caps provide a balanced performance for high-voltage applications (e.g., above 500 V) in terms of cost, ESR, capacitance, ripple current, and reliability. Nevertheless, they have large volume and a moderate upper operating temperature.

DC-link applications can have a high or low ripple current. The ripple current capability of the three types of capacitors is approximately proportional to their capacitance values. C1 is defined as the minimum capacitance value required to fulfill the voltage ripple specification. For low ripple current applications, capacitors with a total capacitance no less than C1 are selected in both the Al-Caps solution and the MPPF-Caps solution. For high ripple current applications, Al-Caps with a capacitance of C1 cannot sustain the high ripple current stress because of their low ripple current rating per unit capacitance (A/μF). Therefore, the Al-Caps solution requires a capacitance larger than C1, whereas the MPPF-Caps solution requires only C1. In terms of ripple current (i.e., $/A), the cost of MPPF-Caps is about one-third that of Al-Caps. This implies the possibility of achieving a lower cost, higher power density DC-link design with MPPF-Caps in high ripple current applications, such as electric vehicles.

Electrolyte vaporization is the major wear-out mechanism of small-size Al-Caps due to their relatively high ESR and limited heat dissipation surface. For large-size Al-Caps, the wear-out lifetime is primarily determined by the increase of leakage current. An important reliability feature of MPPF-Caps is their self-healing capability. Initial dielectric breakdowns (e.g., due to overvoltage) at local weak points of an MPPF-Cap are cleared, and the capacitor regains its full ability except for a negligible capacitance reduction. As the number of these isolated weak points increases, the capacitance of the capacitor is gradually reduced until it reaches end of life. The metallized layers in MPPF-Caps are less than 100 nm thick and are susceptible to corrosion due to absorption of moisture. Severe corrosion occurs at the outer layers, resulting in the separation of the metal film and the reduction of capacitance.

Unlike the dielectric materials of Al-Caps and MPPF-Caps, the dielectric materials of MLC-Caps are expected to last for thousands of years at use-level conditions without significant degradation. However, an MLC-Cap could degrade much more quickly due to the "amplifying" effect of the large number of dielectric layers. A modern MLC-Cap could wear out faster as the number of layers increases. The failure of MLC-Caps may have severe consequences for power converters due to the short circuit failure mode. The dominant failure causes of MLC-Caps are insulation degradation and flex cracking. Insulation degradation results in increased leakage currents. Under high-voltage and high-temperature conditions, failure occurs either with an abrupt burst of current leading to an immediate breakdown or with a more gradual increase of leakage current [12].

Failure modes, failure mechanisms, and critical stressors. DC-link capacitors can fail due to intrinsic and extrinsic factors, such as design defects, material wear-out, operating temperature, voltage, current, moisture, and mechanical stress. Generally, failures can be divided into catastrophic failure due to single-event overstress and wear-out failure due to the long-term degradation of capacitors. Based on prior research results, Table 4.1 gives a systematic summary of the failure modes, failure mechanisms, and corresponding critical stressors of the three types of capacitors.
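The capacitor sizing argument above (C1 set by the voltage ripple, ripple current capability roughly proportional to capacitance) can be sketched as follows. The function name and the A/μF ratings and application figures are hypothetical; the sketch only encodes the proportionality stated in the text and ignores temperature, frequency, and lifetime derating.

```python
def required_capacitance_uF(c1_uF, i_ripple_A, amps_per_uF):
    """Capacitance needed to satisfy both the ripple voltage and ripple current limits.

    c1_uF:        minimum capacitance for the voltage ripple specification
    i_ripple_A:   RMS ripple current the DC link must carry
    amps_per_uF:  assumed ripple current capability per microfarad
    """
    c_for_current = i_ripple_A / amps_per_uF
    return max(c1_uF, c_for_current)


# Hypothetical high ripple current application: C1 = 500 uF, 50 A RMS ripple.
al_caps = required_capacitance_uF(c1_uF=500.0, i_ripple_A=50.0, amps_per_uF=0.02)
mppf_caps = required_capacitance_uF(c1_uF=500.0, i_ripple_A=50.0, amps_per_uF=0.2)
print(f"Al-Caps solution:   {al_caps:.0f} uF")
print(f"MPPF-Caps solution: {mppf_caps:.0f} uF")
```

With these illustrative ratings the Al-Caps bank is sized by ripple current and ends up well above C1, while the MPPF-Caps bank stays at C1, which mirrors the comparison made in the text.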

4.2.4

Batteries

Although not always a component of a drive, batteries are quite commonly used, especially in transportation applications. Their health is directly related to the health of the drive, and often their health monitoring, fault diagnosis, and failure prognosis are integrated in the drive [13, 14]. Lithium-ion (Li-ion) batteries are the dominant general type used today, and this section discusses only this type. Li-ion batteries consist of two electrodes, the anode and the cathode, which are separated by an electrolyte; lithium ions move from the cathode to the anode during charging and back during discharging. Unlike nonrechargeable batteries, which contain metallic lithium, Li-ion batteries use a lithium compound as the electrode material.

4.2.4.1 Overview of the battery • Characteristics of Li-ion batteries. The important characteristics of Li-ion batteries include their size (physical and energy density), longevity (capacity


TABLE 4.1 Overview of failure modes, critical failure mechanisms, and critical stressors of capacitors (Wang and Blaabjerg [12]).

Cap. type | Failure modes | Critical failure mechanisms | Critical stressors
Al-Caps | Open circuit | Self-healing dielectric breakdown | VC, Ta, iC
Al-Caps | Open circuit | Disconnection of terminals | Vibration
Al-Caps | Short circuit | Dielectric breakdown of oxide layer | VC, Ta, iC
Al-Caps | Wear out: electrical parameter drift (C, ESR, tanδ, ILC, Rp) | Electrolyte vaporization | Ta, iC
Al-Caps | Wear out: electrical parameter drift (C, ESR, tanδ, ILC, Rp) | Electrochemical reaction (e.g., degradation of oxide layer, anode foil capacitance drop) | VC
MPPF-Caps | Open circuit (typical) | Self-healing dielectric breakdown | VC, Ta, dVC/dt
MPPF-Caps | Open circuit (typical) | Connection instability by heat contraction of a dielectric film | Ta, iC
MPPF-Caps | Open circuit (typical) | Reduction in electrode area caused by oxidation of evaporated metal due to moisture absorption | Humidity
MPPF-Caps | Short circuit (with resistance) | Dielectric film breakdown | VC, dVC/dt
MPPF-Caps | Short circuit (with resistance) | Self-healing due to overcurrent | Ta, iC
MPPF-Caps | Short circuit (with resistance) | Moisture absorption by film | Humidity
MPPF-Caps | Wear out: electrical parameter drift (C, ESR, tanδ, ILC, Rp) | Dielectric loss | VC, Ta, iC, humidity
MLC-Caps | Short circuit (typical) | Dielectric breakdown | VC, Ta, iC
MLC-Caps | Short circuit (typical) | Cracking; damage to capacitor body | Vibration
MLC-Caps | Wear out: electrical parameter drift (C, ESR, tanδ, ILC, Rp) | Oxide vacancy migration; dielectric puncture; insulation degradation; micro-crack within ceramic | VC, Ta, iC, vibration

VC: capacitor voltage stress; iC: capacitor ripple current stress; ILC: leakage current; Ta: ambient temperature.

and life cycles), charge and discharge characteristics, cost, performance over a wide temperature range, self-discharge profile and leakage, gassing, and toxicity impact. On the positive side, they have high specific energy (230 Wh/kg) and power density (12 kW/kg), good energy density, excellent cycle life and long life, and good charging and discharging efficiency. On the negative side are the cost, the electronic protection system that is mandatory during charging and discharging, and the emissions during manufacturing and disposal. The Li-ion battery has good charging and discharging electrical characteristics, as shown in Fig. 4.6. While charging, the charging capacity increases gradually with the charge voltage while the current is held constant. When the voltage reaches its maximum, the current decreases exponentially. During discharge, in contrast, the cell maintains an almost constant voltage and current to the load, with a small decrease in the voltage and a small increase in the current, until the cell capacity reaches the minimum acceptable level.

• Li-ion components. The Li-ion battery is composed of four primary components: the cathode, anode, electrolyte, and separator. The cathode is a lithium-metal-oxide powder. The lithium ions enter the cathode when the battery discharges and leave when the battery charges. The lithium ions leave the anode when the battery discharges and enter the anode when the

FIGURE 4.6 Typical characteristics of the Li-ion battery charging (A) and discharging (B). From Hannan et al. [13]. Axis labels: charging capacity (%), charge voltage (V), discharge capacity (Wh).

battery charges. The cathode and anode in Li-ion batteries are made of lithium metal oxide and lithiated graphite, layered on aluminium and copper current collectors, respectively. The electrolyte is composed of lithium salts and organic solvents; it transports lithium ions, rather than electrons, between the cathode and anode.

• Li-ion battery formations. Li-ion batteries can be constructed and packed either in metal cans of cylindrical or prismatic shape, or in laminate films (stacked cells), commonly known as Li-ion polymer batteries. In the cylindrical form, the layers are rolled and packed in metal cans with electrolyte. In the stacked form, the three layers are enclosed in a laminate film of aluminized plastic whose edges are heat-sealed. In general, the main source of the active lithium ions in a battery is the positive electrode material, that is, the cathode. Hence, to achieve high capacity, a large amount of lithium is included in this material. Additionally, cathode materials must exchange lithium reversibly with only slight structural modification, and they are chosen for high lithium-ion diffusivity in the electrolyte, good conductivity, and high efficiency. These types of cathode materials include lithium cobalt oxide (LiCoO2), lithium manganese oxide (LiMn2O4), lithium iron phosphate (LiFePO4), lithium nickel manganese cobalt oxide (LiNiMnCoO2), lithium nickel cobalt aluminium oxide (LiNiCoAlO2), and lithium titanate (Li4Ti5O12):

• Lithium cobalt oxide (LiCoO2). This has high specific energy, is expensive because of its cobalt content, and has a short life and restricted load capacity. It requires protection against overheating and excessive stress during fast charging, and the charge and discharge rates need to be limited to a safe level.
• Lithium manganese oxide (LiMn2O4). Due to its structure, this has high thermal stability and safety but a limited life span. LiMn2O4 has higher specific energy than cobalt. This type of battery provides approximately 50% more energy than nickel-based batteries. Good design enhances the longevity and high current handling of this battery.
• Lithium iron phosphate (LiFePO4). This material is stable in the overcharged condition and can tolerate high temperatures without breaking down; the cathode material in this battery is more dependable and safer than other cathode materials. Phosphates have a cell operating temperature range of −30 °C to +60 °C and a cell packing temperature range of −50 °C to +60 °C, which mitigates thermal runaway and prevents burning out. The LiFePO4 battery has low resistance, long life span, high-load handling capability, improved safety and thermal stability, no toxic effects, and lower cost.
• Lithium nickel manganese cobalt oxide (LiNiMnCoO2). The cathode blend of nickel-manganese-cobalt (NMC) can provide either high specific energy or high power density. For the silicon-based anode, capacity and cycle life trade off against each other. The cathode mix of 33% nickel, 33% manganese, and 34% cobalt brings lower raw material costs because of the decreased cobalt content. Presently, this battery is in great demand for EV applications due to its high specific energy and minimum self-heating rate.
• Lithium nickel cobalt aluminium oxide (LiNiCoAlO2). The lithium nickel cobalt aluminium oxide battery (NCA) has a small share of the world market. Automobile industries are now emphasizing NCA battery production because of its strong profile, including high specific energy and power densities and long life span, considering cost and safety [57].
• Lithium titanate (Li4Ti5O12). Lithium-titanate anodes have been commonly used for batteries since the 1980s. They have a spinel structure and a long life span compared to that of a typical Li-ion battery. Moreover, Li4Ti5O12 batteries can be operated safely and perform very well at cold temperatures. Because their specific energy is not high, unlike that of other Li-ion chemistries, development and research are focused on enhancing the specific energy and reducing the price.
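The charging behavior described for Fig. 4.6 (constant current until the maximum voltage is reached, then a decaying current at constant voltage) can be reproduced qualitatively with a minimal sketch. The cell model is an assumption made only for illustration: a linear open-circuit voltage in state of charge plus a series resistance. All parameter values and the function name are hypothetical.

```python
def simulate_cc_cv(capacity_ah=2.0, i_cc=1.0, v_max=4.2, r_int=0.05,
                   i_cut=0.05, dt_s=10.0):
    """Simulate constant-current / constant-voltage charging of a toy cell.

    Returns a list of (time_s, current_A, terminal_voltage_V, soc) samples.
    The open-circuit voltage is a hypothetical linear function of the
    state of charge; real cells have a nonlinear characteristic.
    """
    soc, t = 0.0, 0.0
    log = []
    while soc < 1.0:
        ocv = 3.0 + 1.2 * soc                      # hypothetical OCV curve
        # CC phase: i_cc. CV phase: the current that keeps the terminal at v_max.
        current = min(i_cc, (v_max - ocv) / r_int)
        if current < i_cut:                        # charge termination
            break
        voltage = ocv + current * r_int
        log.append((t, current, voltage, soc))
        soc += current * dt_s / (capacity_ah * 3600.0)
        t += dt_s
    return log


samples = simulate_cc_cv()
cv_start = next(s for s in samples if s[1] < 1.0)  # first sample of the CV phase
print(f"CC phase lasts about {cv_start[0] / 60:.0f} min, "
      f"charge ends after about {samples[-1][0] / 60:.0f} min "
      f"at {samples[-1][3] * 100:.1f}% state of charge")
```

In this toy model the constant-voltage current decays exponentially because it is proportional to the remaining gap between the maximum voltage and the open-circuit voltage, which matches the qualitative description in the text.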


4.3 Diagnosis methodology and tools In general, effective fault diagnosis requires accurate measurements and tracking of signals in a motor drive. In this section, we discuss the types, characteristics, and selection of these signals. Fig. 4.4 shows the basic methodology for fault diagnosis and classification.

4.3.1

Signal selection

One can seldom perform a fault diagnosis of a drive without some prior knowledge of the drive operation under healthy conditions, and an expectation of how all expected faults manifest themselves under different operating conditions and different fault severities. This knowledge has to be developed from models (analytical, statistical, or based on artificial intelligence (AI)) that satisfy some basic criteria:

• Occupy minimal storage space,
• Are used in real or quasi-real time,
• Identify and discriminate between fault types and fault severities.

Knowledge of the behavior of the drive under a fault, based on the physical understanding of the fault and an analytical model, is the preferred technique, but it is possible only in a few scenarios. An obvious example is the monitoring of a stator current, which will indicate an open circuit; more complex but also possible is the monitoring of the harmonics of the currents or voltages of an induction machine drive during a rotor bar breakage. A number of faults, either in the power electronics or in the machine, may cause vibrations in the housing of the drive and a qualitative change in the stray magnetic field. In some cases raw data from these indicators are enough to identify the fault and its severity, but in most cases further analysis is needed.

The signals that are used can be the raw currents, voltages, or fluxes, or the same quantities transformed to a rotating frame of reference, typically a field-oriented one, at least in a PMAC machine. The field orientation, although useful in an induction machine under healthy, normal operation, becomes inaccurate during some faults, as its operation and accuracy depend on signals and parameters that are disrupted. In an AC drive system, the stator currents and/or voltages are continuously available in the controller, either from measurements or from internal calculations. However, they may not be of the bandwidth required, and rectifying this may introduce additional costs. Other signals that are used are rotor position or speed, internal or stray magnetic field, vibrations, and temperature. Still other signals may be specific to power electronics, such as the voltages between gate, collector, and emitter, and the gate current.

The signals collected are processed first by filtering to remove unwanted frequency components and noise, and then by more complex operations. For example, in the case of a current-controlled, voltage source inverter, harmonics can be extracted, real and reactive power calculated, or the voltage command outputs of the current controller to the inverter can be extracted and from those the terminal voltages estimated.

In many faults in electrical machines, the variable most closely connected to the fault is the magnetic flux, which is disrupted by the fault itself. It is therefore advantageous to attempt to measure the flux itself, a cumbersome and expensive proposition, or one of two related variables: the voltage induced by the magnetizing flux in the main power windings or in additional sensing windings, or the stray flux leaking from the stator frame. Among these faults are eccentricity, shorts, and demagnetization. They lead to changes in both the leakage flux and the back EMF induced in the stator windings. These changes are not only in the amplitude but also in the saturation level of the steel, and hence in the relationship between stator magnetizing current and back EMF (e.g., see [15, 16]).
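The transformation of phase currents to a rotating frame of reference mentioned above can be sketched with the standard amplitude-invariant Clarke and Park transforms. The function names, the current amplitude, and the crude "open phase" test case (phase a forced to zero with the other two phases left unchanged) are assumptions chosen only to show that an asymmetry turns constant d-q quantities into oscillating ones.

```python
import math


def clarke(ia, ib, ic):
    """Amplitude-invariant Clarke transform: abc -> alpha-beta."""
    i_alpha = (2.0 / 3.0) * (ia - 0.5 * ib - 0.5 * ic)
    i_beta = (ib - ic) / math.sqrt(3.0)
    return i_alpha, i_beta


def park(i_alpha, i_beta, theta):
    """Park transform: alpha-beta -> d-q for electrical angle theta (rad)."""
    i_d = i_alpha * math.cos(theta) + i_beta * math.sin(theta)
    i_q = -i_alpha * math.sin(theta) + i_beta * math.cos(theta)
    return i_d, i_q


def d_axis_ripple(amplitude, open_phase_a=False, points=360):
    """Peak-to-peak d-axis current over one electrical period."""
    values = []
    for k in range(points):
        theta = 2.0 * math.pi * k / points
        ia = 0.0 if open_phase_a else amplitude * math.cos(theta)
        ib = amplitude * math.cos(theta - 2.0 * math.pi / 3.0)
        ic = amplitude * math.cos(theta + 2.0 * math.pi / 3.0)
        i_d, _ = park(*clarke(ia, ib, ic), theta)
        values.append(i_d)
    return max(values) - min(values)


print(f"healthy d-axis ripple:    {d_axis_ripple(10.0):.3f} A")
print(f"open-phase d-axis ripple: {d_axis_ripple(10.0, open_phase_a=True):.3f} A")
```

In the balanced case the d-axis current is constant, so any ripple at twice the electrical frequency is a simple candidate indicator; as the text notes, the frame angle itself may become unreliable under some faults, which this sketch does not model.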

4.3.2 Signal features

Features of the signals used for the diagnosis are selected in one of three possible domains: time domain, frequency domain, and time-frequency domain. Of course, combinations of these and hybrid versions abound. Here we will limit ourselves to the basic concepts and applications. In the time domain, the statistical parameters most often used to detect damage are defined for a signal of N sample points x_i, i = 1, ..., N (see Table 4.2). Use of time domain signals, when effective, offers a significant reduction of effort compared to more complex techniques [17]. The use of harmonics of signals (voltages, currents, vibrations, stray field, and so forth) computed using the fast Fourier transform requires operation in steady state, a condition not very common in electrical drives. Among other frequency analysis tools, bispectrum analysis [18], a higher-order (third-order) spectrum, has unique advantages compared with the power spectrum, such as identification of nonlinear systems, retention of phase information, and elimination of Gaussian noise. It is usually used to detect quadratic phase coupling in nonlinear signals. In stationary conditions, the frequency content of the signals is obtained through the Fourier transform. The time-frequency characteristics are obtained from tools such as wavelet analysis and the general Cohen-class transformations. These signal features are extracted from a relatively large collection of prior observations of healthy and faulty operation of similar drives. Signal processing techniques are used to detect the health condition based on statistical or AI models. Most prevalent is time-frequency domain analysis, using wavelets and Cohen-class transformations. To monitor dynamic systems, one approach that retains the key temporal information contained in the signatures is to use wavelet transforms (WTs). A WT is a powerful tool for time-frequency analysis in a dynamic system. Wavelet analysis is suitable for nonstationary signals. The discrete WT (DWT) offers greater flexibility than the short-time Fourier transform. Different basis functions, or mother wavelets, are used in wavelet analysis, whereas the basis function for Fourier analysis is always the sinusoid. Unlike sinusoids, wavelets have finite energy concentrated around a point. One can choose or design a wavelet to achieve the best results for a specific application. Time-frequency tiling for the DWT is shown in Fig. 4.7(C). Unlike Fourier methods (Fig. 4.7(B)), tiling for the DWT is variable, allowing for both good time resolution of high-frequency components and good frequency resolution of low-frequency components in the same analysis. However, it requires a specific frequency band of interest to be predefined before the analysis is performed. When the frequency region of interest changes dynamically, the application of the WT becomes challenging. Its computational complexity has also made it challenging to implement on a commercial DSP until now, and field-programmable gate arrays have seldom been utilized. A drawback of using wavelets is that once selected, the same transformation must be applied to all possible fault scenarios. The Cohen distributions make up a generalized class of time-frequency distributions that includes several types popular in electric machine diagnostics, such as the Wigner-Ville, Choi-Williams, and Zhao-Atlas-Marks distributions.

FIGURE 4.7 Time-frequency tiling for (A) the Fourier transform, (B) the short-time Fourier transform, and (C) the wavelet transform.
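The variable tiling of the DWT can be illustrated with the simplest mother wavelet. The following is a minimal sketch, assuming the Haar wavelet and a synthetic test signal (both chosen here for illustration, not taken from the cited works): each level halves the time resolution and splits the remaining band, and a short transient shows up in the finest detail band at the position where it occurs.

```python
import math

def haar_dwt_level(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt(signal, levels):
    """Multi-level decomposition; details are ordered from finest to coarsest."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_dwt_level(approx)
        details.append(d)
    return approx, details

# Synthetic signal (an assumption for illustration): slow sinusoid plus a
# one-sample transient at sample 40.
n = 64
x = [math.sin(2 * math.pi * 2 * k / n) for k in range(n)]
x[40] += 1.0

approx, details = haar_dwt(x, 3)
finest = details[0]
# Index of the largest finest-scale coefficient; each coefficient covers 2 samples.
k_peak = max(range(len(finest)), key=lambda k: abs(finest[k]))
energy_in = sum(v * v for v in x)
energy_out = sum(v * v for v in approx) + sum(v * v for d in details for v in d)
```

Because the Haar basis is orthonormal, the signal energy is preserved by the decomposition, so the coefficient energies per band can be used directly as features.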

4.3.3 Classification

4.3.3.1 Nearest neighbor rule
In k-nearest neighbors (kNN) classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, a set of previously classified points, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor. To categorize a sample point in d-dimensional space, it is assumed that observations that are close to each other (in some appropriate metric) will have the same classification. This categorization problem can be approached in two different ways: first by assuming that some statistical distribution is given for the data, and second by assuming no knowledge of a distribution except for what can be concluded from the samples. In calculating the minimum distance, some appropriate measure needs to be used. Any dissimilarity measure (4.4) would be applicable; however, the most commonly used dissimilarity measures are the Minkowski p metrics (4.5), where d is the dimensionality of the vectors $X_m$ and $X_n$.

$$d(X_m, X_n) = g\left[\sum_{i=1}^{d} f_i(X_{im}, X_{in})\right] \qquad (4.4)$$

$$d(X_m, X_n) = \left[\sum_{i=1}^{d} |X_{im} - X_{in}|^{p}\right]^{1/p} \quad (p \ge 1) \qquad (4.5)$$

The three most often used Minkowski metrics are the taxi-cab distance (4.6) where p = 1, the Euclidean metric (4.7) for which p = 2, and the maximum coordinate distance (4.8) where p = ∞.

$$d(X_m, X_n) = \sum_{i=1}^{d} |X_{im} - X_{in}| \qquad (4.6)$$

$$d(X_m, X_n) = \left[\sum_{i=1}^{d} (X_{im} - X_{in})^{2}\right]^{1/2} \qquad (4.7)$$

$$d(X_m, X_n) = \max_{1 \le i \le d} \{|X_{im} - X_{in}|\} \qquad (4.8)$$
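The nearest neighbor rule with a Minkowski metric can be sketched in a few lines. This is a minimal illustration, assuming a small hand-made training set; the feature values and the labels "healthy" and "faulty" are hypothetical and serve only to exercise the code.

```python
import math
from collections import Counter

def minkowski(xm, xn, p=2.0):
    """Minkowski p metric (4.5); p = math.inf gives the maximum coordinate distance (4.8)."""
    if p == math.inf:
        return max(abs(a - b) for a, b in zip(xm, xn))
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(xm, xn)) ** (1.0 / p)

def knn_classify(sample, training, k=3, p=2.0):
    """Assign `sample` to the class most common among its k nearest neighbors.

    `training` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: minkowski(sample, item[0], p))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature training set (for illustration only).
training = [
    ((0.10, 0.20), "healthy"), ((0.20, 0.10), "healthy"), ((0.15, 0.25), "healthy"),
    ((0.90, 0.80), "faulty"), ((0.80, 0.95), "faulty"), ((1.00, 0.90), "faulty"),
]
label_a = knn_classify((0.2, 0.2), training, k=3)
label_b = knn_classify((0.85, 0.9), training, k=3, p=1.0)
```

An odd k avoids ties in a two-class problem; for more classes, a tie-breaking rule has to be chosen explicitly.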

4.3.3.2 Linear discriminant analysis
A second approach to categorizing points in a d-dimensional space relies on the use of discriminant functions. In the implementation of discriminant functions, no prior knowledge of a probability distribution among the sample points is assumed. The space is divided into K regions, each having its own weighting coefficients. In this work, linear discriminant functions (4.9) are used,

$$D_k(x) = x_1 \alpha_{1k} + x_2 \alpha_{2k} + \cdots + x_N \alpha_{Nk} + \alpha_{N+1,k}, \qquad k = 1, 2, \ldots, K, \qquad (4.9)$$

where x is the N-dimensional sample vector and α are the normalized weighting coefficients for the k-th class. Linear discriminant functions were chosen for the algorithm since they are the most computationally efficient form. A sample vector belongs to a particular class if its discriminant function is greater for that class than for any other class, that is, $x_i$ belongs to class $C_j$ if

$$D_j(x) > D_k(x) \quad \text{for every } k \neq j.$$

The weighting coefficients are adjusted from their initial guess through a training procedure using sample vectors for which the proper classification is known. The algorithm for this procedure makes adjustments to the weighting coefficients until each sample vector is correctly classified. When a sample vector is correctly classified, no adjustment to the weighting coefficients is made. When a sample vector is incorrectly classified, that is, $D_j(x) \le D_l(x)$, where

$$D_l(x) = \max_{l \neq j} \left[D_1(x), \ldots, D_K(x)\right],$$

adjustments are made to $\alpha_j$ (4.10) and $\alpha_l$ (4.11) only:

$$\alpha_j(i+1) = \alpha_j(i) + a x_i \qquad (4.10)$$

$$\alpha_l(i+1) = \alpha_l(i) - a x_i, \qquad (4.11)$$

where a is a gain constant. Fig. 4.8 shows a typical neural network with four inputs and outputs and one hidden layer.

FIGURE 4.8 A typical neural network: x is the input layer, y is the output layer, and there is one hidden layer.
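The training rule (4.10) and (4.11) can be sketched as follows. This is an illustration under stated assumptions: the sample vector is augmented with a trailing 1 so that the constant term of (4.9) is trained together with the other coefficients, the weights start at zero, and the three clusters of training points are hypothetical.

```python
def discriminants(alpha, x):
    """D_k(x) of (4.9) for every class k; x is augmented with a trailing 1."""
    xa = list(x) + [1.0]
    return [sum(a * v for a, v in zip(alpha_k, xa)) for alpha_k in alpha]

def classify(alpha, x):
    """Index of the class whose discriminant function is largest."""
    d = discriminants(alpha, x)
    return max(range(len(d)), key=lambda k: d[k])

def train(samples, labels, n_classes, gain=0.1, max_epochs=1000):
    """Adjust the weights with (4.10)/(4.11) until every sample is classified correctly."""
    n = len(samples[0])
    alpha = [[0.0] * (n + 1) for _ in range(n_classes)]
    for _ in range(max_epochs):
        changed = False
        for x, j in zip(samples, labels):
            d = discriminants(alpha, x)
            # Strongest competing class l != j.
            rival = max((k for k in range(n_classes) if k != j), key=lambda k: d[k])
            if d[j] <= d[rival]:  # misclassified (ties count as errors)
                xa = list(x) + [1.0]
                alpha[j] = [a + gain * v for a, v in zip(alpha[j], xa)]          # (4.10)
                alpha[rival] = [a - gain * v for a, v in zip(alpha[rival], xa)]  # (4.11)
                changed = True
        if not changed:
            break
    return alpha

# Hypothetical, linearly separable training data with three classes.
samples = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
           (3.0, 0.0), (3.2, 0.2), (2.9, -0.1),
           (0.0, 3.0), (0.1, 3.2), (-0.2, 2.9)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
alpha = train(samples, labels, n_classes=3)
predicted = [classify(alpha, x) for x in samples]
```

The procedure terminates only if the classes are linearly separable; for overlapping classes the epoch limit acts as a stop criterion and some samples remain misclassified.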

4.3.3.3 Support vector machines
For classification problems, the support vector machine (SVM) attempts to find the optimal hyperplane that separates the data points according to their classes such that the separation between the classes is maximum [19]. Consider a two-class training set consisting of data points in which $x_i$ is the i-th real-valued input vector and $y_i$ is the corresponding class of $x_i$. The optimal hyperplane, which successfully separates the points according to their classes, can be given by the equation $w^T x_i + b = 0$. In this equation, w and b denote a weight vector and a bias term, respectively, and the goal of the SVM is to find the values of w and b such that the separation between the classes is maximum. This problem is solvable in linear cases, but nonlinear versions have been developed to address real-life problems. They utilize a transformation of the training data into a higher-dimensional (extended) space. Data that are not separable in the original space may become separable in the expanded space. The transformation is accomplished by using kernel functions (linear, polynomial, or sigmoid).
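The effect of moving to an expanded space can be shown on a toy problem. The sketch below is not an SVM solver: the three kernel functions are written in their common textbook forms (the parameter names `degree`, `c`, and `gamma` are choices made here), and the separating hyperplane in the expanded space is picked by hand for four hypothetical one-dimensional points that cannot be separated by a single threshold.

```python
import math

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, c=1.0):
    return (linear_kernel(x, z) + c) ** degree

def sigmoid_kernel(x, z, gamma=1.0, c=0.0):
    return math.tanh(gamma * linear_kernel(x, z) + c)

def decision(w, b, x):
    """Class (+1 or -1) according to the side of the hyperplane w^T x + b = 0."""
    return 1 if linear_kernel(w, x) + b > 0 else -1

def feature_map(x):
    """Explicit map of a scalar x to the expanded space (x, x^2)."""
    return (x, x * x)

# Hypothetical data: the outer points belong to class +1, the inner ones to -1.
points = [-2.0, -1.0, 1.0, 2.0]
labels = [1, -1, -1, 1]

# Hand-picked hyperplane in the expanded space: x^2 - 2.5 = 0.
w, b = (0.0, 1.0), -2.5
predicted = [decision(w, b, feature_map(x)) for x in points]

# The degree-2 polynomial kernel equals an inner product in an expanded space:
# (x z + c)^2 = <(x^2, sqrt(2c) x, c), (z^2, sqrt(2c) z, c)> for scalar x, z.
x0, z0, c0 = 1.5, -0.5, 1.0
phi = lambda v: (v * v, math.sqrt(2 * c0) * v, c0)
k_direct = polynomial_kernel((x0,), (z0,), degree=2, c=c0)
k_mapped = linear_kernel(phi(x0), phi(z0))
```

In an actual SVM, w and b (or their kernel-space equivalents) result from the margin maximization, and the kernel is evaluated without ever forming the expanded vectors.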

4.3.3.4 AI-based tools
AI-based tools extract inherent characteristics of data to distinguish faulty from healthy conditions. These tools can perform classification tasks on a mixture of electrical and mechanical signals. Neural networks and fuzzy and neuro-fuzzy system-based techniques are mainly used for motor fault diagnosis. The flexibility of neural networks makes them an immensely popular choice in machine fault diagnosis. Artificial neural networks (ANNs) provide an approach that makes use of small processing elements called neurons that are interconnected in a manner similar to the brain. The amount of influence that one neuron exerts on another is determined by the weight corresponding to the interconnection between them and is a tunable parameter of the network. Numerous variations of neural networks have been proposed, as well as many techniques to update the weights. We discuss the simplest possible version here. The input to the network x and the vector of all weights w determine its output.

Forward propagation. In the input layer, the output of neuron i equals the value of $x_i$. For neuron j of the hidden layer, the input value $\mathrm{net}_j$ is the weighted sum of the outputs of the prior layer, and its output $a_j$ is obtained by applying the sigmoid function f. Each output $y_k$ is the weighted sum of the hidden-layer outputs.

$$\mathrm{net}_j = \sum_i w_{ij} x_i \qquad (4.12)$$

$$a_j = f(\mathrm{net}_j), \quad f \text{ is the sigmoid function} \qquad (4.13)$$

$$y_k = \sum_j w_{jk} a_j \qquad (4.14)$$

Back propagation. For the targeted output $d_k$, the error function is defined as

$$E = \frac{1}{2} \sum_k (d_k - y_k)^2 \qquad (4.15)$$

and it is used to correct the weights:

$$\Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}}. \qquad (4.16)$$
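The forward and backward passes of (4.12) to (4.16) can be written out for a network with one hidden layer. This is a minimal sketch, assuming no bias terms, a linear output layer, fixed illustrative initial weights, and a single training pair; the update of the input-to-hidden weights is obtained by applying the chain rule to (4.15), which goes one step beyond the formula given for $w_{jk}$.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, w_in, w_out):
    """w_in[i][j]: input i -> hidden j, w_out[j][k]: hidden j -> output k."""
    n_hidden, n_out = len(w_in[0]), len(w_out[0])
    a = [sigmoid(sum(w_in[i][j] * x[i] for i in range(len(x))))   # (4.12), (4.13)
         for j in range(n_hidden)]
    y = [sum(w_out[j][k] * a[j] for j in range(n_hidden))          # (4.14)
         for k in range(n_out)]
    return a, y

def error(d, y):
    return 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y))       # (4.15)

def backprop_step(x, d, w_in, w_out, eta=0.1):
    """One steepest-descent update of all weights, in place."""
    a, y = forward(x, w_in, w_out)
    delta_out = [dk - yk for dk, yk in zip(d, y)]                  # -dE/dy_k
    # Chain rule through the sigmoid, using the weights before the update.
    delta_hid = [a[j] * (1.0 - a[j]) *
                 sum(delta_out[k] * w_out[j][k] for k in range(len(y)))
                 for j in range(len(a))]
    for j in range(len(a)):
        for k in range(len(y)):
            w_out[j][k] += eta * delta_out[k] * a[j]               # (4.16)
    for i in range(len(x)):
        for j in range(len(a)):
            w_in[i][j] += eta * delta_hid[j] * x[i]

# Illustrative values (assumptions): two inputs, two hidden neurons, one output.
x, d = [1.0, 0.5], [0.3]
w_in = [[0.1, -0.2], [0.3, 0.4]]
w_out = [[0.2], [-0.1]]
e_initial = error(d, forward(x, w_in, w_out)[1])
for _ in range(200):
    backprop_step(x, d, w_in, w_out)
e_final = error(d, forward(x, w_in, w_out)[1])
```

In practice the updates are accumulated over many training vectors, and the gain η trades convergence speed against stability.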


4.4 Faults, their manifestation, and diagnosis
We first discuss the commonalities between various drive types and the diagnosis tools used there. We then proceed to discuss some of the specific methods that have been used in the more common types of drives. The diagnosis process is similar to the diagnosis of most faults in nonelectrical systems and starts with the measurement of variables of interest. What is different in electrical drives is that the signals used, as well as the source of the signals, are primarily electrical or magnetic. In all cases, the first step is measurement of available signals. What happens next depends on the technique used. An electrical drive is a complex system consisting minimally of a power electronics converter, a motor, a controller, and sensors. Each of these components consists of parts that, as discussed earlier, can fail in more than one way and have multiple manifestations of the fault, each depending on its own particular characteristics as well as on the conditions under which it is operating. Faults in general can be detected and identified, their severity determined, and their progress monitored and predicted based on signals from sensors that are either in place as part of the drive controller or installed for this purpose. It is preferable that the entire fault detection, identification, and management process be conducted while the drive is operating normally, although some testing could be done during off times. It is also important that extensive testing be done at drive commissioning, both to detect inherent existing weaknesses and faults and to establish a baseline of healthy operation. The controllers in electrical drives have speed and position, current, and possibly voltage sensors as integral parts. It is then reasonable to expect that these sensors are the first to be considered for use in diagnosis. In addition, sensors that can measure vibrations, temperature, and the stray magnetic field may be used.
If an analytical model is established for a healthy drive and for all anticipated fault types and severities, then the output of this model can be compared to that of the drive. When signal- and data-based models are used, features again are extracted in the time, frequency, and/or time-frequency domain. These are processed and compared to those resulting from extensive training tests to determine the fault type and severity. Faults can be detected based on different techniques. A more or less accurate model of the drive operation already exists in the controller and can be used to detect deviations from the expected operation. Using an analytical model of the machine, one can estimate the magnetic flux, torque, and terminal voltage waveforms of the healthy state and compare them to the ones commanded or measured. The deviation can be a source of information about the health condition. Some of the signals can be used in their raw form; to take a very simplistic example, a zero current is an indication of an open circuit. The closer the characteristics of the measured variable are to those of a predetermined fault, the higher its significance and usefulness. A change in the flux, whether measured or estimated through the induced voltage, is closely related to a demagnetization, eccentricity, or short circuit. However, bearing degradation may result in low-amplitude, high-frequency torque pulsations, which are only distantly related to torque and speed pulsations and current harmonics, and these may also be caused by other faults. Detection and identification of faults and estimation of their severity are therefore often not possible based on the raw signals.

4.4.1 Winding faults in AC machines

An open circuit can be detected from the current measurement in the corresponding phase. The open circuit may be located in the winding itself or caused by the malfunction of one of the power electronic switches feeding that winding. It is more difficult to detect when only one switch is damaged. A short circuit in the windings, whether between turns, between phases, or between a winding and ground, is a more complex fault to detect. A number of variables affect its characteristics and severity: the number of shorted turns, the speed, and the load current. The design of the machine also makes a big difference in the effects. Induction machines are usually constructed with distributed windings, where the current in one phase and in a shorted turn affects the induced voltages and currents in the others. In PMAC machines, windings can be concentrated or distributed; in concentrated windings, coupling between phases is minimized, thus effectively isolating the damaged phase magnetically and thermally from the others. In induction machines, the magnetic field created by the rotor currents is controlled by the stator current through a time constant of typically a few hundred milliseconds. In PMAC machines, this is not the case, as the magnetic field is not controlled, except through field weakening, which is usually encountered at higher speeds. This means that the magnets, due to inertia, keep on rotating and continue to feed a short circuit until the speed is decreased or a demagnetizing stator current is applied. Beyond the signals available in the controller, either directly or by calculation, additional signals include those measured in the stray field or by embedded sensors. For monitoring the health of bearings, the most common and reliable variable is vibration, although vibrations can also be caused by the load or even by short circuits. In addition, measurements of the stator currents have been attempted, with results strongly dependent on the machine and application.
For the detection of the health of power electronics, in addition to the load voltages and currents, the gate voltage and the collector-emitter voltage, VCE, are monitored. A higher level of processing, which uses these signals together with analytical models, stays close to the source of the fault; examples are the flux estimated through the back EMF, real and reactive power, currents transformed from the natural to a synchronously rotating frame of reference, and harmonics related to the slip of an induction machine, among others. A third level, which makes little use of an analytical model, is based on the signal characteristics. These are often in the time domain: mean, variance, skewness, and kurtosis as shown in Table 4.2.

TABLE 4.2 Time domain features.
Feature | Definition
Peak value | $\frac{1}{2}\left[\max(x_i) - \min(x_i)\right]$
RMS value | $\sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$
Standard deviation | $\sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2}$
Kurtosis | $\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^4 \,/\, (\text{RMS value})^4$
Skewness | $\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^3 \,/\, \left(\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2\right)^{3/2}$
Crest factor | Peak value / RMS value
Impulse factor | Peak value / $\left(\frac{1}{N}\sum_{i=1}^{N} \lvert x_i \rvert\right)$
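The features of Table 4.2 are inexpensive to compute, which is one reason for their popularity. The sketch below is an illustration under stated assumptions: the kurtosis is normalized by the fourth power of the RMS value, the skewness by the 3/2 power of the variance, and the test signal is one period of a unit sinusoid chosen because its feature values are known in closed form.

```python
import math

def time_domain_features(x):
    """Time domain features of a sampled signal x (a sequence of floats)."""
    n = len(x)
    mean = sum(x) / n
    peak = 0.5 * (max(x) - min(x))
    rms = math.sqrt(sum(v * v for v in x) / n)
    variance = sum((v - mean) ** 2 for v in x) / n
    mean_abs = sum(abs(v) for v in x) / n
    return {
        "peak": peak,
        "rms": rms,
        "std": math.sqrt(variance),
        # Assumed normalization: fourth central moment over RMS^4.
        "kurtosis": (sum((v - mean) ** 4 for v in x) / n) / rms ** 4,
        "skewness": (sum((v - mean) ** 3 for v in x) / n) / variance ** 1.5,
        "crest_factor": peak / rms,
        "impulse_factor": peak / mean_abs,
    }

# One full period of a unit-amplitude sinusoid (illustrative test signal).
n = 1000
signal = [math.sin(2 * math.pi * k / n) for k in range(n)]
features = time_domain_features(signal)
```

For the sinusoid, the crest factor is √2, the kurtosis is 1.5, and the skewness is zero; impulsive content, such as that produced by a localized bearing defect, raises the kurtosis, crest factor, and impulse factor well above these values.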

4.4.2 Bearing faults

As discussed in Section 4.2.2.1, faults may be caused by a variety of operating conditions that are not always easy to pinpoint and to associate with the bearing health condition. This observation renders the variables available in the controller of lesser value. The general idea is that, at least at the early stage of a fault, when there are distinct craters, mechanical resistance appears whenever a rolling element tries to pass the defect. This in turn produces a train of impulses, which can be detected. The frequencies at which the anomaly appears depend on the location of the fault:

Outer raceway: $$f_o = \frac{N_b}{2} f_r \left(1 - \frac{D_b}{D_c} \cos\beta\right)$$

Inner raceway: $$f_i = \frac{N_b}{2} f_r \left(1 + \frac{D_b}{D_c} \cos\beta\right)$$

Rolling element: $$f_b = \frac{D_c}{D_b} f_r \left(1 - \frac{D_b^2}{D_c^2} \cos^2\beta\right)$$

where $D_b$ and $D_c$ are dimensions of the bearing (the rolling element diameter and the pitch diameter, respectively), β is the contact angle, $N_b$ is the number of rolling elements, and $f_r$ is the mechanical frequency of rotation [20]. The detection can then be based on the resulting harmonics of the torque-producing stator current. More widely accepted is the direct measurement of vibrations, with the signature extracted using time-frequency analysis and various classification methods, such as neural networks combined with the Fourier transform [21]. In this case, the algorithm followed is shown in Fig. 4.9. This algorithm segments the input signal in time. The time-segmented vibration signals are used to obtain the spectral contents. The signal is then filtered and further enhanced.

FIGURE 4.9 Flowchart of the bearing fault diagnosis using Fourier transform and ANN [21].

FIGURE 4.10 Exemplary motor vibration spectral image [21].

FIGURE 4.11 Enhanced image of an exemplary motor vibration signal. From Amar et al. [21].

TABLE 4.3 Confusion matrix indicating hit and false alarm probability pairs [21]. Each cell gives the number of samples and its share of all samples; the last column and the last row give the percentages of correct and incorrect classifications.
Output class | Target class 1 | Target class 2 | Target class 3 | Target class 4 | Correct / incorrect
1 | 320 (25%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 100% / 0.0%
2 | 0 (0.0%) | 280 (21.9%) | 0 (0.0%) | 0 (0.0%) | 100% / 0.0%
3 | 0 (0.0%) | 0 (0.0%) | 320 (25%) | 0 (0.0%) | 100% / 0.0%
4 | 0 (0.0%) | 40 (3.1%) | 0 (0.0%) | 320 (25%) | 88.9% / 11.1%
Correct / incorrect | 100% / 0.0% | 87.5% / 12.5% | 100% / 0.0% | 100% / 0.0% | 96.9% / 3.1%

In Fig. 4.10 the original spectrum of the signal is shown, while in Fig. 4.11 the enhanced version resulting from filtering is shown, with coherent patterns enhanced and incoherent or noise spectra suppressed. Supervised learning of the neural networks was used, and the steepest descent method was used to adjust the weights and biases. For the four classes considered, from the confusion matrix (see Table 4.3), the hit and false alarm probability pairs are (1, 0), (0.87, 0), (1, 0), and (1, 0.04), respectively. The overall hit and false alarm probabilities of the classifier are 0.96 and 0.01, respectively. The high hit probability (0.96) and low false alarm probability (0.01) indicate that vibration spectrum imaging (VSI) with an ANN is capable of distinguishing the different classes accurately, showing a very

high correct detection percentage with very low false alarms even under a poor signal-to-noise ratio (Table 4.3). A new multispeed fault diagnostic approach was proposed by Hao et al. [22]. They used self-adaptive WT components generated from bearing vibration signals. The proposed approach was capable of discriminating the signatures of four conditions of a rolling bearing: a normal bearing and three types of defective bearings, with defects on the outer race, the inner race, and the roller, respectively. Particle swarm optimization and Broyden-Fletcher-Goldfarb-Shanno-based quasi-Newton minimization algorithms are applied to seek the optimal parameters of the impulse modeling-based continuous WT model. This model was introduced for decomposing the vibration signals obtained from roller bearings with the WT. After that, three-dimensional statistical parameters are applied to extract the fault characteristics. A nearest neighbor classifier using the Mahalanobis distance is adopted to map samples into the corresponding categories. The method provides very accurate results, as shown in Table 4.4.
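The characteristic defect frequencies given at the beginning of this section are straightforward to evaluate once the bearing geometry is known. The sketch below assumes the usual kinematic expressions, with a factor $N_b/2$ in the raceway frequencies, $D_b$ as the rolling element diameter, $D_c$ as the pitch diameter, and β as the contact angle; the numerical values are illustrative and do not refer to any bearing discussed in the text.

```python
import math

def bearing_fault_frequencies(n_b, d_b, d_c, f_r, beta=0.0):
    """Characteristic defect frequencies (same unit as f_r).

    n_b: number of rolling elements, d_b: rolling element diameter,
    d_c: pitch diameter (same length unit as d_b), f_r: rotational
    frequency, beta: contact angle in radians.
    """
    ratio = d_b / d_c
    f_outer = 0.5 * n_b * f_r * (1.0 - ratio * math.cos(beta))
    f_inner = 0.5 * n_b * f_r * (1.0 + ratio * math.cos(beta))
    f_ball = (d_c / d_b) * f_r * (1.0 - (ratio * math.cos(beta)) ** 2)
    return f_outer, f_inner, f_ball

# Illustrative geometry (assumed): 9 balls of 8 mm on a 40 mm pitch circle,
# shaft turning at 30 Hz (1800 rpm), zero contact angle.
f_o, f_i, f_b = bearing_fault_frequencies(n_b=9, d_b=8.0, d_c=40.0, f_r=30.0)
```

A useful consistency check is that the outer and inner raceway frequencies always add up to $N_b f_r$; for the values above they are 108 Hz and 162 Hz.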

4.4.3 Insulation

As we shall see later, incipient short circuits in machine windings are difficult to detect and the insulation breakdown can develop precipitously. In only a few cases can a short circuit fault be detected in time to be managed using the power frequency signals and related harmonics. It is therefore preferable to monitor the health of the insulation.

TABLE 4.4 Multispeed resulting trust rate of fault detection and identification on roller bearing [22].
Number of speeds | Speeds | Number of training data sets | Number of testing data sets | Number of misclassified data sets | Trust rate | Testing accuracy (%)
1 | 1000 | 400 | 400 | 0 | 98.94 | 100
2 | 1000, 1500 | 800 | 800 | 0 | 98.38 | 100
3 | 1000, 1500, 2000 | 1200 | 1200 | 0 | 94.45 | 100
4 | 1000, 1500, 2000, 2500 | 1600 | 1600 | 0 | 91.77 | 100
5 | 1000, 1500, 2000, 2500, 3000 | 2000 | 2000 | 0 | 91.10 | 100

FIGURE 4.12 Turn-to-turn voltages for IGBT (red) and SiC (blue) inverters between the first and second turns with a PWM-like signal (Ubus = 700 V). From Acheen et al. [24].

The two phenomena, PDs and general insulation breakdown, have been studied extensively. For the first, an IEEE guide has been published [23], although it is applicable to relatively high voltage machines. Detection is complex, as these discharges are caused by the switching of the power electronics, and the frequencies in the current resulting from PDs are close to the harmonics of the switching frequency. Fig. 4.12 shows a typical case of different turn-to-turn stress in the same windings using IGBT and SiC inverters. Fig. 4.13 shows the effect of PDs on the current in the affected phase. When performing a measurement using a nonintrusive sensor, it is usual to connect a high-pass analog filter to remove noise coming from the inverter drive switches or power amplifier. A typical cut-off frequency ranging from 200 to 500 MHz has been used in the following experiments. The nonintrusive sensor used to detect PDs takes advantage of a capacitive effect between the sensor fitting and the cable core. It is then necessary to use more complex signal analysis and processing. There has been better success in detecting PDs and insulation health through offline measurement of noise, light, or radiated electromagnetic signals [25]. For the second, general insulation breakdown, high-frequency signals and measurements are required [26]. It is interesting to notice that the sharp rise times of the power electronics switches, which can be the cause of the insulation deterioration, can also produce the current signals that can be used to detect the health of the insulation (e.g., [27, 28]). There, an insulation state indicator (ISI) is introduced for the assessment of the insulation condition in one phase. It is based on quantifying the change in the machine's high-frequency behavior by comparing the amplitude spectrum recorded for a healthy machine condition (reference) with that recorded during a later condition assessment. The root-mean-square deviation (RMSD) was chosen as the comparative value and serves as the ISI for the respective phase:

$$\mathrm{ISI}_{p,k} = \mathrm{RMSD}_{p,k}(x_1, x_2) = \sqrt{\frac{\sum_{g=n_{low}}^{n_{high}} \left| Y_{ref,p}(g) - Y_{con,p,k}(g) \right|^2}{n_{high} - n_{low}}} \qquad (4.17)$$

Here, $Y_{ref}$ and $Y_{con}$ are the Fourier transformed signals for a healthy machine condition (reference) and a later condition assessment, respectively. The index p defines the investigated phase (U, V, W). The variables $n_{high}$ and $n_{low}$ define the compared frequency range and depend on the sampling rate and the investigated window length. The performance and applicability of the method have been proven for different machine ratings and winding systems, such as random wound windings and preformed coils. One must notice that the required sampling rate is significant: 2.6 MS/s for an induction machine of 1.4 MW. Jensen et al. [27] proposed a method to lower the necessary sampling rate.

FIGURE 4.13 Phase-to-phase voltage (yellow), and PDs on phase 1, 2, and 3, respectively (blue/purple/green). From Acheen et al. [24].
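Once the two amplitude spectra are available, the indicator of (4.17) reduces to a few lines. The sketch below assumes that the spectra are given as sequences indexed by the frequency bin g and that the sum includes both end bins; the short spectra used here are made-up numbers for illustration only.

```python
import math

def insulation_state_indicator(y_ref, y_con, n_low, n_high):
    """RMSD between reference and current amplitude spectra over bins n_low..n_high."""
    if not 0 <= n_low < n_high < min(len(y_ref), len(y_con)):
        raise ValueError("invalid frequency bin range")
    total = sum(abs(y_ref[g] - y_con[g]) ** 2 for g in range(n_low, n_high + 1))
    return math.sqrt(total / (n_high - n_low))

# Made-up spectra (illustration only): the later measurement deviates in bins 1..3.
y_ref = [1.0, 1.0, 1.0, 1.0, 1.0]
y_later = [1.0, 2.0, 2.0, 2.0, 1.0]
isi_unchanged = insulation_state_indicator(y_ref, y_ref, 1, 3)
isi_changed = insulation_state_indicator(y_ref, y_later, 1, 3)
```

An indicator of zero means that the high-frequency behavior is unchanged with respect to the reference; a threshold on the indicator, to be determined for the machine at hand, then marks a change in the insulation state.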

FIGURE 4.14 Circuit implementation of the monitoring system, panels (A) and (B).

4.4.4 Power electronics

We have discussed some of the more common failure mechanisms of power electronics, particularly IGBTs. Oh et al. [29] presented a comprehensive review of condition monitoring and prognostics of IGBTs. A sensitive signal for IGBT failure prognosis, used as a precursor of the failure of the wire bonding, is VCE, but its measurement may not be reliable. Using more than one online measurement of damage indicators, such as the on-state voltage of the semiconductor and the voltage drop in the bond wires, a more accurate diagnosis can result (e.g., [30, 31]). In the work of Gonzalez-Hernando [31], a monitoring system for both IGBTs and SiC MOSFETs is proposed and shown in Fig. 4.14. The measurements are synchronized with the PWM signals generated in the digital controller. For VDSmax, the measurement circuit is based on the online VCE measurement. The circuit complexity limits the measurement speed of the system, thus defining the minimum settling time and the maximum switching frequency. The diagnosis of the inverter circuits, especially when redundancies are used, is complex. The limited availability of sensors and measured signals leads most of the time to the use of simpler time domain methods. Incipient IGBT faults may cause a spurious increase of the junction resistance associated with junction degradation [32]. A normalized feature called the mean current vector (MCV) is arranged according to the different types of spurious resistance faults. The normalized features thus obtained are ranked, and an optimized set of effective features is computed using the SVM-based recursive feature elimination (SVM-RFE) algorithm. The method uses rather limited computations, a welcome development compared to most of the techniques available.


FIGURE 4.15 Normalized Iα and Iβ current from a healthy inverter (A) and a faulty inverter (B). Concordia pattern on the α − β-plane for a healthy inverter (C) and a faulty inverter (D). MCV on the α − β-plane for a healthy inverter (E) and a faulty inverter (F).

When the normalized Iα current is plotted against the normalized Iβ current on the two-dimensional α-β reference frame, the normalized Concordia pattern is formed. This pattern is a perfect circle with unit radius for a healthy system and a distorted shape for a faulty system, as shown in Fig. 4.15. The normalized Concordia pattern is then reduced to a single point, which is called the MCV. The MCV for a healthy system is at the center of the circle, whereas for a faulty system it lies at a particular position on the α-β plane, as shown in Fig. 4.15(E) and (F), respectively. Methods based on time domain analysis of the output current of the inverter, such as the normalized current vector, slope methods, current profile shape, reference current error, and the derivative of the absolute current Park's vector, have been in use for locating faults in the inverter portion of the drive. The whole array of statistical modes of the current can be used as features, and a variety of classifiers (ANNs, nearest neighbor, and SVM) have been used. The problems of creating a database to use for classification and of selecting the most appropriate features limit the applicability of such methods.

FIGURE 4.16 Cross section of a squirrel cage induction machine.
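The Concordia pattern and its reduction to a mean current vector can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the cited work: the amplitude-invariant Clarke transformation is used, the pattern is normalized by the largest magnitude of the current space vector over the period, the MCV is taken as the average of the normalized samples, and the fault is imitated crudely by suppressing the positive half-wave of one phase current.

```python
import math

def clarke(ia, ib, ic):
    """Amplitude-invariant Clarke transformation of three phase currents."""
    i_alpha = (2.0 / 3.0) * (ia - 0.5 * ib - 0.5 * ic)
    i_beta = (ib - ic) / math.sqrt(3.0)
    return i_alpha, i_beta

def mean_current_vector(phase_currents):
    """Mean of the normalized Concordia pattern over the supplied samples.

    phase_currents: list of (ia, ib, ic) samples covering whole periods.
    """
    pattern = [clarke(ia, ib, ic) for ia, ib, ic in phase_currents]
    scale = max(math.hypot(a, b) for a, b in pattern)  # assumed normalization
    n = len(pattern)
    return (sum(a for a, _ in pattern) / (n * scale),
            sum(b for _, b in pattern) / (n * scale))

def three_phase(theta):
    return (math.cos(theta),
            math.cos(theta - 2.0 * math.pi / 3.0),
            math.cos(theta + 2.0 * math.pi / 3.0))

n = 360
healthy = [three_phase(2.0 * math.pi * k / n) for k in range(n)]
# Crude imitation of a fault: phase "a" cannot carry positive current.
faulty = [(min(ia, 0.0), ib, ic) for ia, ib, ic in healthy]

mcv_healthy = mean_current_vector(healthy)
mcv_faulty = mean_current_vector(faulty)
```

For the balanced currents the pattern is the unit circle and the MCV sits at the origin, whereas the distorted pattern moves the MCV along the negative α axis; the direction of the displacement is what allows the faulty device to be located.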

4.4.5 Induction motor drives

Induction motor drives often suffer from open and short circuit faults, with the majority of these faults in the power electronics rather than in the motor. The single open-circuited phase winding is the most common form of winding fault, arising from a fault either in the winding itself or in the converter leg to which it is connected. The greatest impact is seen in the two phases that are physically located adjacent to the faulted phase in the stator winding, which exhibit an increase in input current in an attempt by the controller to provide the air-gap MMF that is missing in their locality. Fig. 4.16 shows a cross section of a typical cage rotor induction machine, with the stator windings and rotor bars shown. In cage rotor induction machines, an additional problem may arise, in which a rotor bar is damaged to the point that its resistance becomes high or infinite; similarly, a section of the end ring may become damaged. Such a fault produces stator current components at the frequencies

$$f_{SH} = (1 \pm 2s) f_s, \qquad (4.18)$$

where $f_s$ is the supply frequency and s is the slip. This open circuit fault causes increased stator currents, decreased torque, and increased torque pulsations. The stator current has frequency components related to the slip frequency, and these can be detected both in the case of operation off the grid and during transients under controlled operation. Motor current and motor voltage signature analysis have been shown to work well in most cases of broken rotor bars, although they are more accurate under extreme operating conditions, such as heavy loading, corresponding to high slip. Generally, the frequencies produced by interturn shorts are

$$f_{SHc} = f_s \left[ m(1 - s)/p \pm k \right], \qquad (4.19)$$

with m = 0, 1, 2 and k = 0, 1, 3, 5. These frequency components are already present in the current of a healthy drive, due to imbalances and so forth. Furthermore, under the short-circuit condition, a significant rise in the rotor-slot harmonic components occurs. Since many induction motor drives use a current-controlled voltage source inverter, incipient short circuits are difficult to detect using motor current signature analysis, as the current stays close to the commanded current. However, the phase voltages are seldom directly measured. This leaves two options: either use an estimate of the stator voltages obtained from the commanded ones, or use additional variables calculated from these voltages and currents. They include the instantaneous active and reactive power to extract signatures [16, 33], the negative sequence component of the currents or voltages, and the corresponding reactances, as well as the Park transformation of these variables. In these methods, the fault signature is a function of slip, which requires a lookup table with large data memory and complex interpolation for compensation. As most techniques cannot estimate fault severity, AI-based techniques have been proposed. Monitoring the stray field of the machine is a more recently proposed tool, used either alone or in conjunction with other signals [34, 35]. The average current methods detect the fault from the mean value of the currents over one fundamental period. They use either the phase currents or the current space vector in the stator frame, obtained by applying Clarke's transformation. The magnitude of the average current space vector over one period is compared to a threshold value. The angle of this space vector indicates the faulty transistor. Since this method is dependent on the load, the current is normalized. In the slope method, the slope of the trajectory of the current space vector in the stator frame is observed.
It is assumed that the slope is nearly constant for a quarter of a period during the fault. The slope depends on which inverter leg experiences the fault. To determine the faulty transistor, Schmitt triggers are used to detect the polarity of the phase currents during a fault [36]. Having more than three phases improves the ability to continue operation after a fault, as does a winding in which each phase has multiple coils. Parallel paths ameliorate the effects of open circuit faults by allowing currents to flow in some of the coils in the faulted phase, thereby providing a measure of fault tolerance. Methods have been proposed based on neural networks, using every available variable as input: currents, speed, and voltages in the time or frequency domain.
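The average current method described in the text can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the implementation of the cited works: the amplitude-invariant Clarke transformation, the normalization by the mean magnitude of the space vector, the function names, the 0.1 threshold, and the crude synthetic fault (phase "a" current unable to go positive) are all chosen for the example.

```python
import numpy as np

def average_current_vector(ia, ib, ic):
    """Normalized mean current space vector over one fundamental period.

    ia, ib, ic: equally spaced phase-current samples covering exactly one period.
    Dividing by the mean vector magnitude is an assumed way to remove load dependence.
    """
    ia, ib, ic = (np.asarray(x, dtype=float) for x in (ia, ib, ic))
    # Amplitude-invariant Clarke transformation to the stator (alpha-beta) frame
    i_alpha = (2.0 / 3.0) * (ia - 0.5 * ib - 0.5 * ic)
    i_beta = (ib - ic) / np.sqrt(3.0)
    vec = i_alpha + 1j * i_beta
    scale = np.abs(vec).mean()
    return vec.mean() / scale if scale > 0 else 0j

def detect_switch_fault(ia, ib, ic, threshold=0.1):
    """Return (fault_flag, angle_in_degrees) of the normalized average vector."""
    v = average_current_vector(ia, ib, ic)
    return abs(v) > threshold, float(np.degrees(np.angle(v)) % 360.0)

# Synthetic balanced currents over one period (hypothetical test data)
theta = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
ia = np.cos(theta)
ib = np.cos(theta - 2.0 * np.pi / 3.0)
ic = np.cos(theta + 2.0 * np.pi / 3.0)

healthy = detect_switch_fault(ia, ib, ic)
# Crude fault model: phase "a" can no longer carry positive current
faulty = detect_switch_fault(np.minimum(ia, 0.0), ib, ic)
```

In this synthetic case the average vector of the healthy drive is essentially zero, while the faulted case produces a vector pointing along the negative alpha axis. Mapping the angle to a specific transistor depends on the current-direction and phase-labeling conventions of the drive.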


FIGURE 4.17 (A) Phase-to-ground fault locus in phase “a.” (B) Phase-to-ground fault locus in different phases. (C) Phase-to-ground fault in different fault severities in phase “a.” From Eftekhari et al. [37].

A fuzzy decision system has been proposed, which has been shown to be accurate, although it requires extensive data for training. In addition, most of the proposed methods deal with detection and not with the location of the fault, unless the neutral point is accessible, intrusive sensors are installed inside the machine, or extra sensors (e.g., voltage or flux sensors) are used. Meanwhile, turn-to-turn faults can occur simultaneously in two phases, and this may lead to more catastrophic consequences than a fault in only one stator winding; however, there has not been much research effort in this area [37]. Fig. 4.17 shows the stator current locus for the case of phase-to-ground faults in different phases. This work is based on finding the best fit of an ellipse describing the currents in a three-dimensional current locus and comparing it with the circular pattern of the healthy motor. The short-circuit fault has to be recognized reliably and quickly, within a few milliseconds, since, especially in IM distributed windings, the fault can propagate rapidly between phases that have turns in the same slots, resulting in a catastrophic fault that cannot be mitigated or compensated for.

4.4.6

PMAC drives

This general category, including interior and surface permanent magnet machines, with rare earth or ferrite magnets, concentrated or distributed windings, or axial or radial flux, is the fastest growing segment of electrical drives at

FIGURE 4.18 Typical cross sections of radial flux PMAC machines. (A) Inset magnets, concentrated windings. (B) Interior permanent magnets, distributed windings. (C) Interior permanent magnets, concentrated windings.

present. Fig. 4.18 shows the cross sections of some typical Permanent Magnet AC machines. Unlike induction machines, PMSMs require power electronics converters to operate and form a drive, and the faults of these drives can be either internal to the machine or the converter. Some of these faults have common characteristics, making it difficult to separate and discriminate, and one fault can precipitate the other. They have to be isolated, since any mitigation or other action will depend on the fault type. All of these faults have been the subject of a plethora of detailed individual studies, with a comprehensive review published by Choi et al. [38]. PMAC drives operate under one of two control principles: either field orientation (discussed in Section 4.1.1) that requires the continuous updating of the position of the rotor and online transformation of currents and voltages, measured, estimated, or commanded, to a frame of reference related to this rotor position, or a DTC method (discussed in Section 4.1.2). Eccentricity fault in these machines is often caused or initiated by manufacturing flaws, which can be detected before commissioning, at the end of the production line, by mechanical radial loads causing bearing wear, and by partial demagnetization. They result in imbalance of radial forces and magnetic pull, which will worsen the wear and increase the fault severity. They lead to vibrations, rotor-stator interference, and rubbing. Detecting eccentricity is based on one of two principles. The first is detecting sideband components of the stator current [39] affected by the machine design (slots and saturation) and analyzed using WT or simply spectrum analysis. The second is based on changes in the flux [40], which can be estimated through the back EMF and identified through the changes in the flux saturation. Changes in the stray flux, measured outside the machine, are a promising indicator. 
Similar are the tools for detecting partial demagnetization of magnets. An open circuit in a PMAC drive can be in the windings, the terminals, or the switches. The first two are relatively straightforward to detect, given that typically there are current sensors in the drive controller. More complex is the

FIGURE 4.19 Typical connections to mitigate inverter fault (panels A and B).

case of a fault in the inverter supplying the machine, as it is possible that only one switch may be open, as discussed by Eickhoff et al. [41]. Again, it is the currents that can be used for the detection of the fault, along with the rotor position. In this case, it is important to detect which switch is malfunctioning, the number of shorted turns, and the resistance of the open circuit. The indicators that have been proposed include the negative sequence component of the stator current and voltage and the real and reactive power estimation in the controller. In most cases of open circuit fault, mitigation is possible, especially if the drive has been designed with that in mind, for example with multiple phases or with access to the neutral and an additional inverter leg that can be connected to it. Two such options are shown in Fig. 4.19. To accomplish this, it will be necessary to provide access to the neutral and utilize additional power electronics components [38]. A short circuit in PMAC machine windings can lead very quickly to a subsequent catastrophic fault. This is because even if the fault is detected rapidly and the machine is disconnected from the supply, its rotational speed will not drop as quickly, and thus the magnetic field will continue feeding the short circuit. The short circuit can be detected through the measurement of the harmonics of the current and commanded voltages, and the estimation of real and reactive power fed to the motor. Then the current in the short-circuited portion of the winding can possibly be limited through a high-inductance design and through the injection of a demagnetizing current. A disadvantage of this scheme is that very fast sampling may be needed.
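The negative sequence component named in the text as a fault indicator can be obtained from the phase phasors with the standard symmetrical-component (Fortescue) transformation. The sketch below is only illustrative: the function name and the example phasors are assumptions, and the extraction of fundamental phasors from sampled currents is left out.

```python
import cmath

# 120 degree rotation operator used by the symmetrical-component transformation
A = cmath.exp(2j * cmath.pi / 3)

def sequence_components(Ia, Ib, Ic):
    """Return (zero, positive, negative) sequence phasors of three phase phasors."""
    zero = (Ia + Ib + Ic) / 3
    positive = (Ia + A * Ib + A**2 * Ic) / 3
    negative = (Ia + A**2 * Ib + A * Ic) / 3
    return zero, positive, negative

# Balanced positive-sequence set: Ib lags Ia by 120 degrees, Ic leads by 120 degrees
balanced = sequence_components(1.0, A**2, A)
# Hypothetical asymmetry: phase "a" magnitude raised by 20%
unbalanced = sequence_components(1.2, A**2, A)
```

For the balanced set the negative sequence phasor vanishes, while the asymmetric set yields a nonzero value that could be compared against a threshold chosen for the drive.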
Since the short-circuit current increases rapidly and detection may be completed too late for the system to react, a reasonable approach is instead, or in parallel, to monitor the health of the insulation rather than the operating variables of the drive, and to predict its failure through the calculation of the RUL. This is a rapidly developing and promising area. Separating the various faults has been a daunting task. One of the many proposals was by Haddad et al. [42, 43], with results shown in Fig. 4.20. Mitigating a short circuit is not always possible. To limit damage, in the case of concentrated windings with


FIGURE 4.20 Experimental and simulation results of the effects of three fault types on PMAC machines. From Haddad et al. [43].

electrical, mechanical, and thermal isolation, the only possible solution is to inject demagnetizing current in the stator (e.g., see [44]).

4.4.7

Switched reluctance machines

The operation of these machines is based entirely on developing torque using the reluctance of the magnetic circuit, which varies with the rotor position. A simplified cross section of such a machine is shown in Fig. 4.21(A), and one of the many possible switch configurations is shown in Fig. 4.21(B). The machine operates by consecutively energizing the coils in windings around the stator teeth A, B, and C. It is clear that these coils are not in physical proximity and hence have little thermal or magnetic coupling. It is generally accepted that SRM drives are fault tolerant by their nature but not completely fault free [45]. An exhaustive methodology to identify possible faults, namely open and short circuits of diodes and switches in the inverter, has been presented by Gopalakrishnan et al. [46]. Certain faults can be handled only by disabling the complete drive. In most cases, disabling a phase allows the continued operation of the drive.

FIGURE 4.21 Cross section of a 6/8 switched reluctance machine, and diagram of a one-phase inverter leg. (A) Conceptual cross section of a switched reluctance machine. (B) Components of a switched reluctance drive.

Phase-to-phase shorts are generally of little interest here because of the physical separation of the phases.

4.5 Failure prognosis, fault mitigation, and reliability

4.5.1

From diagnosis to prognosis

The obvious goal of failure prognosis is to estimate the RUL of a drive, component, or part of a drive. This knowledge is used to schedule maintenance, employ redundancies, or, in general, avoid unexpected failures and manage the health of the drive. To do so in a profitable manner, the estimation of RUL has to be accurate, and hence a degree of certainty has to be associated with this prediction, related to the width of the prediction interval. The prediction interval is defined as an estimate of the time interval in which a future observation (in our case a failure) will fall, with a certain probability, given what has already been observed. The RUL estimate, accounting for this interval, has to be longer than the time it takes to schedule and perform maintenance or mitigate an incipient fault. It should be pointed out that decisions based on diagnosis alone implicitly include failure prognosis, although they do not provide an estimate of the time to failure. Two extreme examples illustrate the concept, its usefulness, and its limitations. Bearings almost always degrade at a slow rate, but a bearing fault can very seldom be mitigated. Extensive research has shown that the bearing RUL can be estimated accurately and well in advance of a failure. This allows maintenance to be scheduled with a low chance of catastrophic failure. The opposite extreme is the short-circuit fault in the windings of a PMAC machine. In a limited number of cases, the fault can be mitigated by a quick modification of the control algorithm, but in most cases, the short-circuit current is fed by the rotating magnets and increases faster than any action can be taken. If it is the short circuit that is being monitored and predicted, there is not always enough time to manage

FIGURE 4.22 The RUL decision interval and the threshold determine the point of action.

the fault, and hence in that case, the RUL estimate is essentially useless. But if it is insulation health that is being determined, there may be adequate time to plan maintenance. Prognosis of drive failure can increase the reliability of the system and decrease cost of operation and chances of unexpected failure. As the state of the health approaches failure, a well-designed prediction system will provide a RUL estimate that approaches zero, and the estimate of the confidence in it increases. Action is needed when the threshold for decision (mitigation, shutdown, etc.) falls within this interval, and the decision should be made well ahead of that point so that the time to act is adequate. If the decision threshold is well defined in relation to the prediction interval, early decision for action, followed by the necessary time to act, will be effective. If instead this threshold is set too low, then the decision will be too late, leaving insufficient time for action. If the threshold is set too high, the decision will cause early interruption of service, as shown in Fig. 4.22.
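The interplay between the prediction interval and the time needed to act can be sketched as a simple decision rule. This is an illustrative assumption rather than a rule given in the chapter: the class and function names, the use of the lower bound of the prediction interval, and the optional safety margin are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RulEstimate:
    expected: float  # expected remaining useful life, in hours
    lower: float     # lower bound of the prediction interval, in hours
    upper: float     # upper bound of the prediction interval, in hours

def decide(estimate: RulEstimate, time_to_act: float, margin: float = 0.0) -> str:
    """Return 'continue', 'act', or 'stop' for a given RUL estimate.

    Assumed rule: act as soon as the pessimistic end of the prediction
    interval no longer exceeds the time needed to schedule and carry out
    maintenance or mitigation (plus an optional margin).
    """
    if estimate.lower <= 0.0:
        return "stop"      # failure may already be imminent
    if estimate.lower <= time_to_act + margin:
        return "act"       # decide now so that the time to act is adequate
    return "continue"      # interval still lies well beyond the action time

early = decide(RulEstimate(expected=500.0, lower=300.0, upper=700.0), time_to_act=100.0)
due = decide(RulEstimate(expected=150.0, lower=90.0, upper=210.0), time_to_act=100.0)
late = decide(RulEstimate(expected=5.0, lower=0.0, upper=20.0), time_to_act=100.0)
```

A larger margin triggers the decision earlier, at the cost of a possible early interruption of service; a margin that is too small leaves insufficient time for action.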

4.5.2

Prognosis tools

To predict the RUL with high precision, it is necessary to have the diagnostic tools in place first, from which the state of health of the device can be estimated. Beyond this, methods have been developed to identify trends of the features used for diagnosis, based on Bayesian statistics and AI. To identify trends in the degradation of a component or subsystem, it is necessary to have stored histories of similar components, and at least part of the history of the one that is being


monitored. The establishment of the relationship between physical degradation and its manifestation gives validity to a prognostic technique. Although physics-based methods offer a direct connection between operation and degradation, they often become too complicated and require more observed variables.

4.5.2.1 Kalman filter

The Kalman filter is the optimal linear estimator for linear system models (e.g., see [47]). The extended Kalman filter (EKF) [48] and the unscented Kalman filter linearize the model but are not optimal [49]. A Kalman filter is a model-based state estimator that estimates the state utilizing a linear model, given inaccurate inputs and inaccurate measurements. The continuous state-space model is first discretized with a timestep $t_s$, using the backward Euler method, resulting in a discrete state-space model. An EKF can be used to calculate the RUL from the measured data. It uses an estimate of the expected trajectory of the state variables to predict future values of those variables when the input measurements are noisy. The EKF predicts the next values of the state variables, receives a measurement from the system, and then updates the prediction of the next value along with the parameters of the expected trend that the variables will follow. During this process, a certain level of white noise is expected in the system model and in the input measurements. The system model used for the EKF, in linearized form, is shown in Eqs. (4.20) through (4.25). Here, $x$ represents the state variables, $\hat{x}$ is their predicted value, $F$ is the state transition matrix, $e$ is the process noise, $u$ is the measurement noise, $H$ is the output matrix, and $z$ is the measured output. The uncertainty matrices, $M$ and $P$, are used to update the Kalman gain, $K$, and the predicted value of the state variables. Matrix $R$ represents the measurement noise covariance, and matrix $Q$ represents the process noise covariance.

$$\hat{x}_k = F_{k-1} x_{k-1} + e_{k-1} \qquad (4.20)$$

$$z_k = H_k \hat{x}_k + u_k \qquad (4.21)$$

$$M_k = F_{k-1} P_{k-1} F_{k-1}^{T} + Q_{k-1} \qquad (4.22)$$

$$K_k = M_k H_k^{T} \left( H_k M_k H_k^{T} + R_k \right)^{-1} \qquad (4.23)$$

$$x_k = \hat{x}_k + K_k \left( z_k - H_k \hat{x}_k \right) \qquad (4.24)$$

$$P_k = \left( I - K_k H_k \right) M_k \qquad (4.25)$$
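The recursion of Eqs. (4.20) through (4.25) can be sketched in a few lines. The example below is a linear Kalman filter, not the EKF of the cited works, and everything around the recursion is an assumption made for illustration: the two-state model (a health indicator and its constant rate of change), the values of Q and R, the synthetic measurements, and the failure level used to extrapolate an RUL.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle following Eqs. (4.20)-(4.25)."""
    x_pred = F @ x                                  # Eq. (4.20), zero-mean noise omitted
    M = F @ P @ F.T + Q                             # Eq. (4.22)
    K = M @ H.T @ np.linalg.inv(H @ M @ H.T + R)    # Eq. (4.23)
    x_new = x_pred + K @ (z - H @ x_pred)           # Eq. (4.24)
    P_new = (np.eye(len(x)) - K @ H) @ M            # Eq. (4.25)
    return x_new, P_new

def track_and_predict_rul(measurements, failure_level, dt=1.0):
    """Track [indicator, rate] and extrapolate the time to reach failure_level."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # assumed constant-rate degradation model
    H = np.array([[1.0, 0.0]])              # only the indicator is measured
    Q = 1e-8 * np.eye(2)                    # assumed process noise covariance
    R = np.array([[0.0025]])                # assumed measurement noise covariance
    x, P = np.zeros(2), 10.0 * np.eye(2)    # vague initial state
    for z in measurements:
        x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
    level, rate = x
    rul = (failure_level - level) / rate if rate > 0 else float("inf")
    return x, rul

# Synthetic degradation indicator rising by 0.1 per step, with measurement noise
rng = np.random.default_rng(0)
k = np.arange(1, 201)
z = 0.1 * k + rng.normal(0.0, 0.05, size=k.size)
state, rul = track_and_predict_rul(z, failure_level=30.0)
```

With these synthetic data the indicator is near 20 after 200 steps and rises by about 0.1 per step, so the extrapolated RUL is roughly 100 steps. As the text notes for Fig. 4.23, the quality of such an extrapolation depends on how long the filter has been tracking.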

4.5.2.2 Particle filters

Particle filters are an attractive alternative, discussed elsewhere (e.g., [50–52]). The Monte Carlo method is a Bayesian model-based estimation of internal states


in dynamical systems when partial observations are made. The solution of the filtering problem is computed by recursive estimation. The filtering problem is to estimate the first two moments of the state vector that is governed by a dynamic state-space model with noisy observations. A discrete-time controlled process can be expressed in state-space form by a stochastic difference equation of the form

$$x_k = f_k(x_{k-1}, w_{k-1}) \qquad (4.26)$$

and a measurement equation for $y_k$ given by

$$y_k = h_k(x_k, v_k). \qquad (4.27)$$

Eq. (4.26) is called the state transition (dynamic) equation, whereas Eq. (4.27) is called the correction, update, or output equation. At time $t_k$, $x_k$ is the state vector, $w_k$ is the dynamic noise, $y_k$ is the real observation vector, and $v_k$ is the observation noise vector. The function $f_k$ gives the relationship between the previous state and the current state, and the function $h_k$ links the current state to the output. In Bayesian form, instead of the future state vector, the probability density function (pdf) of the future state vector is estimated. Following the pattern of state transition and measurement equations, the prior pdf is calculated using the state transition equation and the posterior pdf using the measurement equation. Eq. (4.26) gives the predictive conditional transition density, $p(x_k \mid x_{k-1}, y_{1:k-1})$, of the current state, given the previous states and previous observations. The observation or measurement equation, Eq. (4.27), gives the likelihood function of the current measurement given the current state, $p(y_k \mid x_k)$. If $p(x_{k-1} \mid y_{1:k-1})$ is defined as the previous posterior density, then the prior pdf $p(x_k \mid y_{1:k-1})$ is defined using Bayes' rule as

$$p(x_k \mid y_{1:k-1}) = \int p(x_k \mid x_{k-1}) \, p(x_{k-1} \mid y_{1:k-1}) \, dx_{k-1}. \qquad (4.28)$$

The correction step generates the posterior probability density function from

$$p(x_k \mid y_{1:k}) = c \, p(y_k \mid x_k) \, p(x_k \mid y_{1:k-1}), \qquad (4.29)$$

where $c$ is a normalizing constant.

The filtering problem is to recursively estimate the first two moments of $x_k$ given $y_k$. For a general distribution, $p(x)$, this consists of recursive estimation of the expected value of any function of $x$, say $\langle g(x) \rangle_{p(x)}$, using Eqs. (4.28) and (4.29):

$$\langle g(x) \rangle_{p(x)} = \int g(x) \, p(x) \, dx. \qquad (4.30)$$
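A bootstrap particle filter is one common way to approximate the recursion of Eqs. (4.28) through (4.30) with samples. The sketch below is illustrative only: the scalar random-walk-with-drift model, the Gaussian noise levels, the number of particles, and the function names are assumptions, and multinomial resampling at every step is chosen for brevity rather than efficiency.

```python
import numpy as np

def particle_filter_step(particles, y, transition, likelihood, rng):
    """Propagate, weight, and resample a set of particles for one observation y."""
    predicted = transition(particles, rng)         # samples of the prior, Eq. (4.28)
    weights = likelihood(y, predicted) + 1e-300    # p(y_k | x_k), guarded against all zeros
    weights /= weights.sum()                       # plays the role of c in Eq. (4.29)
    idx = rng.choice(predicted.size, size=predicted.size, p=weights)
    return predicted[idx]                          # equally weighted posterior samples

def expectation(particles, g=lambda x: x):
    """Monte Carlo estimate of Eq. (4.30) from equally weighted particles."""
    return float(np.mean(g(particles)))

def transition(x, rng):
    # Assumed state equation: drift of 0.1 per step plus Gaussian process noise
    return x + 0.1 + rng.normal(0.0, 0.05, size=x.shape)

def likelihood(y, x):
    # Assumed measurement equation: y = x + Gaussian noise with std 0.2
    return np.exp(-0.5 * ((y - x) / 0.2) ** 2)

rng = np.random.default_rng(1)
particles = rng.normal(0.0, 1.0, size=2000)
truth = 0.0
for _ in range(50):
    truth += 0.1
    y = truth + rng.normal(0.0, 0.2)
    particles = particle_filter_step(particles, y, transition, likelihood, rng)

mean = expectation(particles)                          # first moment
second_moment = expectation(particles, lambda x: x**2)  # second moment
```

The two calls to `expectation` give the first two moments that the filtering problem asks for; their difference `second_moment - mean**2` is the estimated variance of the state.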

4.5.2.3 Hidden Markov model

The hidden Markov model (HMM) is a stochastic technique for modeling signals that evolve through a finite number of states. The states are assumed hidden and responsible for producing observations. An HMM assumes that the system is


Markovian (i.e., the behavior depends only on the current state). The objective is to characterize the states given the observations. $S_k$ is the hidden state at time $k$ and $O_k$ is the observation at time $k$, assuming that there are $C$ possible states. The main objective is to determine hidden parameters (states) from the observable parameters. The problem to be solved in our case is as follows: given the observation sequence $y = \{y_1, y_2, \ldots, y_k\}$ and the set of model parameters $\lambda = \{\pi, A, B\}$, how do we choose the corresponding state sequence $x = \{x_1, x_2, \ldots, x_k\}$ that is optimal for generating the observation sequence? The optimality measure can be the maximum likelihood. The model developed has three elements:

$\pi$: $C \times 1$ initial state distribution vector, where the $i$th element is the probability of being in state $i$ at time $k = 0$, $p(S_0 = i)$.

$A$: $C \times C$ state-transition matrix, where the $(i, j)$th element is the probability of being in state $j$ at time $k + 1$, given that it is in state $i$ at time $k$, $p(S_{k+1} = j \mid S_k = i)$.

$B$: state-dependent observation density. Its $j$th element is the probability of observing $O_k$ at time $k$ given that the system is in state $j$, $b_j(O_k) = p(O_k \mid S_k = j)$.

The model parameters are collectively denoted by $\lambda = \{\pi, A, B\}$. To implement the HMM-based prognosis algorithm, the model parameters need to be trained. The state transition probabilities ($A$) and state-dependent observation densities ($B$) are generally obtained from the historical data collected from a large number of observations, and the initial state probability distributions ($\pi$) depend on the implementation area and the nature of operation of the system being studied.
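The chapter does not name the algorithm used to choose the state sequence; the Viterbi algorithm is one standard choice for finding the single most likely sequence, and it is used here only as an illustration. The sketch assumes discrete observation symbols, so that B becomes a C × M matrix, and the two-state "healthy/degraded" parameters are hypothetical.

```python
import numpy as np

def viterbi(pi, A, B, observations):
    """Most likely hidden state sequence for discrete observations.

    pi: (C,) initial state distribution
    A:  (C, C) transition matrix, A[i, j] = p(S_{k+1} = j | S_k = i)
    B:  (C, M) observation matrix, B[j, o] = p(O_k = o | S_k = j)
    """
    pi, A, B = (np.asarray(m, dtype=float) for m in (pi, A, B))
    T, C = len(observations), len(pi)
    with np.errstate(divide="ignore"):      # log(0) = -inf is acceptable here
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = np.empty((T, C))                # best log-probability ending in each state
    psi = np.zeros((T, C), dtype=int)       # back-pointers to the best predecessor
    delta[0] = log_pi + log_B[:, observations[0]]
    for k in range(1, T):
        scores = delta[k - 1][:, None] + log_A      # scores[i, j]: come from i, go to j
        psi[k] = scores.argmax(axis=0)
        delta[k] = scores.max(axis=0) + log_B[:, observations[k]]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for k in range(T - 2, -1, -1):          # backtrack along the stored pointers
        path[k] = psi[k + 1, path[k + 1]]
    return path.tolist()

# Hypothetical two-state model: state 0 = healthy, state 1 = degraded
pi = [0.9, 0.1]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.9, 0.1], [0.1, 0.9]]    # symbol 0 = "normal" reading, symbol 1 = "abnormal"
states = viterbi(pi, A, B, [0, 0, 1, 1])
```

For this small example the decoded sequence switches from the healthy to the degraded state when the observed symbol changes. Continuous observation densities would replace the lookup `log_B[:, o]` with an evaluated log-density per state.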

4.5.3

Applications and new developments

The prognosis techniques that have been described, as well as similar ones, have been applied to a level that end users have found acceptable and useful and have incorporated in industrial systems. The main applications include the following.

Bearings. This problem has been addressed in multiple publications; an experimental platform, PRONOSTIA [53], provided data for a large number of research efforts and a competition, which advanced the state of the art. The tools used span the whole gamut of feature extraction, in both the time and time-frequency domains, and both data and AI methods for diagnosis and prognosis [3]. In the work of Kim et al. [54], intelligent fault diagnosis and health state estimation of discrete failure degradation was performed using a range of classification algorithms, such as ANNs, SVM, classification and regression trees and random forests, and linear regression. Among the available classifiers, SVM showed outstanding performance in the classification process compared with the other classifiers. The health state probability estimations were conducted using the classification ability of SVM, with subsequent machine prognostics based on the probability distributions of each health state. The method

FIGURE 4.23 RUL estimation with different EKF tracking start times. From Singleton et al. [57].

shows the usefulness of identifying degradation states, in that case five, to be used as an estimation tool for machine remnant life prediction in real-life industrial applications. Using more than one set of features allows an in-depth understanding of the degradation of bearings [55]. There the authors used primarily two features (skewness of the 236–256 Hz band and entropy of the 160–200 Hz band) of six selected features to detect change points in signals with high volatility, such as bearing vibration data. The more features used, the more robust the change-point detection algorithm becomes to noise and the more distinct change points it can identify. The same authors used time-frequency domain as well as frequency-domain features and Kalman filtering [56]. In the last case, Fig. 4.23 makes clear that the duration of the estimation process plays a role and that RUL is more accurately estimated close to failure. Soualhi et al. [58] combined a data-driven approach (time-domain features of the vibration signal as health indicators) with an experience-based approach (artificial ant clustering) for classification. The imminence of the next degradation state in bearings is given by HMMs, and the estimation of the remaining time before the next degradation state is given by the multistep time series prediction and the adaptive neuro-fuzzy inference system. Fig. 4.24 gives the prediction of the RUL based on this method.
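Band-limited features of the kind mentioned in the text (skewness and entropy of a frequency band) could be computed along the following lines. This is not the procedure of the cited work, whose definitions are not given here; the FFT-mask band selection, the Shannon entropy of the normalized band power, the function name, and the synthetic signals are all assumptions made for illustration.

```python
import numpy as np

def band_features(signal, fs, band):
    """Return (skewness, spectral entropy) of a signal restricted to a frequency band.

    signal: 1-D array of samples; fs: sampling rate in Hz; band: (f_low, f_high) in Hz.
    """
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])

    # Shannon entropy of the normalized power distribution inside the band
    power = np.abs(spectrum[in_band]) ** 2
    total = power.sum()
    p = power / total if total > 0 else power
    p = p[p > 0]
    entropy = float(-(p * np.log(p)).sum())

    # Skewness of the band-limited time signal (FFT bins outside the band set to zero)
    filtered = np.fft.irfft(np.where(in_band, spectrum, 0.0), n=x.size)
    centered = filtered - filtered.mean()
    std = centered.std()
    skewness = float((centered ** 3).mean() / std ** 3) if std > 0 else 0.0
    return skewness, entropy

# Synthetic signals: a pure 180 Hz tone and white noise, 1 s at 1 kHz
fs = 1000.0
t = np.arange(1000) / fs
tone = np.sin(2.0 * np.pi * 180.0 * t)
noise = np.random.default_rng(2).normal(size=t.size)

_, entropy_tone = band_features(tone, fs, (160.0, 200.0))
_, entropy_noise = band_features(noise, fs, (160.0, 200.0))
```

A single dominant spectral line gives an entropy near zero, while broadband content spreads the power over the band and raises the entropy, which is why such a feature can respond to changes in the character of a vibration signal.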


FIGURE 4.24 Prediction of the imminence and the remaining time of a bearing: the medium-good state (A), the medium-bad state (B), and the bad state (C). From Soualhi et al. [58].

FIGURE 4.25 IGBT RUL prediction results of V  (VCE(on) , T j ) using particle filters under different initial prediction periods. From Rao et al. [59].

Power electronic switches. Extensive efforts discussed earlier have led to advanced and affordable techniques to identify failure precursors and utilize modifications, either of the control system or of built-in redundancies. Rao et al. [59] fused information on the junction temperature and collector-emitter voltage to establish a more accurate precursor, using a particle filter algorithm (Fig. 4.25). An extensive review of topologies and their uses is given by Zhang et al. [60]. These topologies, some shown in , require only partial redundancies and modification of the control algorithm. Alternatives [61, 62] include keeping the same controller but modifying the PWM scheme to decrease the switching losses and temperatures.

Winding insulation. Methods to determine the development of incipient faults and predict RUL well ahead of failure have been published by both industry and academia (see [26]). Nussbaumer et al. [28] utilized the response

FIGURE 4.26 Insulation RUL prediction from switching transients. From Jensen et al. [27].

FIGURE 4.27 Changes in output space vector with one switch open. (A) Normalized capacitance change for different temperatures indicating effect of temperature on aging. (B) Output space vector diagram after one switch open. From Tsyokhla et al. [63].

to high-frequency testing and the switching impulses provided by the inverter. EKF has been used for prognosis, and innovative schemes have been proposed to avoid high-frequency sensors [27]. There the authors utilized the peaks of the current transients at the PWM edges resulting from the use of fast switches (Fig. 4.26). Tsyokhla et al. [63] used the capacitance data as a prognostic tool. In Fig. 4.27, the change of normalized capacitance is shown for different temperatures and the level of filtering used. In Fig. 4.28, the RUL estimate and its threshold confidence are plotted versus the operating time.

FIGURE 4.28 Insulation failure prediction and confidence along with Ceq versus time. From Tsyokhla et al. [63].

Batteries. Concern about battery health, which is vital to the health of electric transportation, has been increasing. Fault mitigation methods have been proposed, using analytical models, Monte Carlo, and particle filters (e.g., see [64]).

4.5.4

Decisions based on prognosis and mitigation

Prognosis tools can give adequate warning of an impending failure but not the ability to recover or mitigate the fault. The resulting decision may include a complete shutdown at a convenient moment before the anticipated failure for bearings, gears, couplings, and decaying insulation. If the estimation of RUL leaves limited time for such action, redundancies such as (1) the use of a different motor or inverter or (2) operation with reduced phases or with a neutral inverter leg should be utilized. Such failures can be at the switches, windings, and so forth, and the control algorithm can be changed, for instance, to inject negative d-axis current to offset the effects of a short circuit fed by the rotating permanent magnet. Health monitoring and the possible detection of a fault and its severity are steps toward the decision to take some action. This action may be

FIGURE 4.29 Changes in output space vector with one switch open. (A) Output space vector diagram before fault. (B) Output space vector diagram after one switch open. From Ginart et al. [67].

to continue operation considering the drive to be healthy, to plan maintenance without disruption of present operation, to take mitigating action including deployment of redundancies, to continue operation recognizing the possibility of a failure, or to perform an emergency stop.

Both in cases of prognosis of an expected fault and of outright fault identification, some of the appropriate actions may allow the continuation of operation. These are based primarily on the use of redundancies, modifications of the control algorithm, or both. Several techniques have been proposed to handle phase loss in an induction machine, as well as the case where an open fault is internal to the inverter. A detailed survey was published by Mirafzal [65]. The case where the switch remains open but the antiparallel diode remains intact is discussed by Eickhoff et al. [66]. A disadvantage of these mitigation schemes is that they cannot produce the complete range of voltages [67], thus limiting the operational ability of the drive (Fig. 4.29). A few considerations are needed: the inverter fault has to be detected very quickly, on the order of microseconds, and the drive has to have a more complex topology, including a fourth inverter leg and possibly a split DC link capacitor, thyristors, and fuses. A case of a shorted switch can also be dealt with by injecting demagnetizing current before the fault expands, thus limiting the extent of the damage but also decreasing produced torque and introducing torque pulsations.

Batteries can be managed by prognosis and topology changes for stress mitigation. A converter equipped with combinations of battery cells and capacitors forms a unit with increased power density and lowered electromagnetic interference. Electrolytic capacitors, as well as supercapacitors, placed in addition to batteries are used to reduce stress and losses that medium-frequency current pulsations cause in battery packs (see [68]). In a recent article, Lee et al.
[69] discuss emerging challenges in this area and the possible directions of research. For failure prognosis to become more


widely applied, open questions, such as how to decrease the amount of data used to train the algorithm and how to improve the confidence while giving adequate time for reaction, have to be further addressed. Methods being investigated include the use of translational models, such as adapting past results from similar systems without extensive new tests, and hybrid methods combining statistical methods with neural networks. Another important line of research is in applying advanced algorithms to fault classification and prognostics of electric drives. As indicated earlier, prognosis is a natural and necessary step after most diagnosis events. Since it is based on and requires methods to identify and utilize trends, it results in a further level of complexity. Advanced data-based methods are being developed, tested, and proposed, and they offer accurate predictions, albeit often based on a long history of observations. Fusion of sensed data and hybrid physical/data-based systems also offer a promise of reducing the testing effort and data storage [70].

References

[1] F. Niu, B. Wang, A.S. Babel, K. Li, E.G. Strangas, Comparative evaluation of direct torque control strategies for permanent magnet synchronous machines, IEEE Transactions on Power Electronics 31 (2) (2016) 1408–1424.
[2] R. Wu, F. Blaabjerg, H. Wang, M. Liserre, F. Iannuzzo, Catastrophic failure and fault-tolerant design of IGBT power electronic converters: An overview, Proceedings of the 39th Annual Conference of the IEEE Industrial Electronics Society (IECON), 2013, pp. 507–513.
[3] A. Muetze, E.G. Strangas, The useful life of inverter-based drive bearings: Methods and research directions from localized maintenance to prognosis, IEEE Industry Applications Magazine 22 (4) (2016) 63–73.
[4] A. Aggarwal, E.G. Strangas, J. Agapiou, Analysis of unbalanced magnetic pull in PMSM due to static eccentricity, Proceedings of the 2019 IEEE Energy Conversion Congress and Exposition (ECCE), 2019, pp. 4507–4514.
[5] T. Plazenet, T. Boileau, C. Caironi, B. Nahid-Mobarakeh, A comprehensive study on shaft voltages and bearing currents in rotating machines, IEEE Transactions on Industry Applications 54 (4) (2018) 3749–3759.
[6] G.C. Stone, I. Culbert, E.A. Boulter, H. Dhirani, Electrical Insulation for Rotating Machines: Design, Evaluation, Aging, Testing, and Repair, IEEE Press Series on Power Engineering, 2nd ed., Wiley–IEEE Press, Hoboken, NJ, 2014.
[7] S. Ul Haq, M.K.W. Stranges, B. Wood, A proposed method for establishing partial discharge acceptance limits on API 541 and 546 sacrificial test coils, IEEE Transactions on Industry Applications 53 (1) (2017) 718–722.
[8] V. Madonna, P. Giangrande, W. Zhao, G. Buticchi, H. Zhang, C. Gerada, M. Galea, Reliability vs. performances of electrical machines: Partial discharges issue, Proceedings of the 2019 IEEE Workshop on Electrical Machines Design, Control, and Diagnosis (WEMDCD), 1, 2019, pp. 77–82.
[9] T.J. Å. Hammarström, Partial discharge characteristics within motor insulation exposed to multi-level PWM waveforms, IEEE Transactions on Dielectrics & Electrical Insulation 25 (2) (2018) 559–567.


[10] P. Maussion, A. Picot, M. Chabert, D. Malec, Lifespan and aging modeling methods for insulation systems in electrical machines: A survey, Proceedings of the 2015 IEEE Workshop on Electrical Machines Design, Control, and Diagnosis (WEMDCD), 2015, pp. 279–288.
[11] V. Climente-Alarcon, J.A. Antonino-Daviu, E.G. Strangas, M. Riera-Guasp, Rotor-bar breakage mechanism and prognosis in an induction motor, IEEE Transactions on Industrial Electronics 62 (3) (2015) 1814–1825.
[12] H. Wang, F. Blaabjerg, Reliability of capacitors for DC-link applications in power electronic converters: An overview, IEEE Transactions on Industry Applications 50 (5) (2014) 3569–3578.
[13] M.A. Hannan, M.M. Hoque, A. Hussain, Y. Yusof, P.J. Ker, State-of-the-art and energy management system of lithium-ion batteries in electric vehicle applications: Issues and recommendations, IEEE Access 6 (2018) 19362–19378.
[14] K. Smith, Y. Shi, S. Santhanagopalan, Degradation mechanisms and lifetime prediction for lithium-ion batteries: A control perspective, Proceedings of the 2015 American Control Conference, 2015.
[15] O. Vitek, M. Janda, V. Hajek, P. Bauer, Detection of eccentricity and bearings fault using stray flux monitoring, Proceedings of the 8th IEEE Symposium on Diagnostics for Electrical Machines, Power Electronics, and Drives, 2011, pp. 456–461.
[16] K.N. Gyftakis, A.J. Marques Cardoso, Reliable detection of stator interturn faults of very low severity level in induction motors, IEEE Transactions on Industrial Electronics 68 (4) (2021) 3475–3484.
[17] A. Bellini, Quad demodulation: A time domain diagnostic method for induction machines, Proceedings of the 2007 IEEE Industry Applications Annual Meeting, 2007, pp. 2249–2253.
[18] Y. Jiang, C. Tang, X. Zhang, W. Jiao, G. Li, T. Huang, A novel rolling bearing defect detection method based on bispectrum analysis and cloud model-improved EEMD, IEEE Access 8 (2020) 24323–24333.
[19] B.-K. Yeo, Y. Lu, Expeditious diagnosis of linear array failure using support vector machine with low-degree polynomial kernel, IET Microwaves, Antennas & Propagation 6 (13) (2012) 1473–1480.
[20] M. Blodt, P. Granjon, B. Raison, G. Rostaing, Models for bearing damage detection in induction motors using stator current monitoring, IEEE Transactions on Industrial Electronics 55 (4) (2008) 1813–1822.
[21] M. Amar, I. Gondal, C. Wilson, Vibration spectrum imaging: A novel bearing fault classification approach, IEEE Transactions on Industrial Electronics 62 (1) (2012) 494–502.
[22] Z. Huo, Y. Zhang, P. Francq, L. Shu, J. Huang, Incipient fault diagnosis of roller bearing using optimized wavelet transform based multi-speed vibration signatures, IEEE Access 5 (2017) 19442–19456.
[23] IEEE, P1434/D14, Jul 2014, IEEE Approved Draft Guide for the Measurement of Partial Discharges in AC Electric Machinery, IEEE, Los Alamitos, CA.
[24] R. Acheen, C. Abadie, T. Billard, T. Lebey, S. Duchesne, Study of partial discharge detection in motors fed by SiC MOSFET and Si IGBT inverters, Proceedings of the 2019 IEEE Electrical Insulation Conference (EIC), 2019, pp. 497–500.
[25] A. Bhure, E.G. Strangas, J. Agapiou, R.M. Lesperance, Partial discharge detection in medium voltage stators using an antenna, Proceedings of the IEEE 11th International Symposium on Diagnostics for Electrical Machines, Power Electronics, and Drives (SDEMPED), 2017, pp. 480–485.
[26] K. Younsi, P. Neti, M. Shah, J.Y. Zhou, J. Krahn, K. Weeber, C.D. Whitefield, On-line

capacitance and dissipation factor monitoring of AC stator insulation, IEEE Transactions on Dielectrics & Electrical Insulation 17 (5) (2010) 1441–1452. [27] W.R. Jensen, E.G. Strangas, S.N. Foster, A method for online stator insulation prognosis for inverter-driven machines, IEEE Transactions on Industry Applications 54 (6) (2018) 5897–5906. [28] P. Nussbaumer, M.A. Vogelsberger, T.M. Wolbank, Induction machine insulation health state monitoring based on online switching transient exploitation, IEEE Transactions on Industrial Electronics 62 (3) (2015) 1835–1845. [29] H. Oh, B. Han, P. McCluskey, C. Han, B.D. Youn, Physics-of-failure, condition monitoring, and prognostics of insulated gate bipolar transistor modules: A review, IEEE Transactions on Power Electronics 30 (5) (2015) 2413–2426. [30] U. Choi, F. Blaabjerg, S. Jorgensen, S. Munk-Nielsen, B. Rannestad, Reliability improvement of power converters by means of condition monitoring of IGBT modules, IEEE Transactions on Power Electronics 32 (10) (2017) 7990–7997. [31] F. Gonzalez-Hernando, J. San-Sebastian, A. Garcia-Bediaga, M. Arias, F. Iannuzzo, F. Blaabjerg, Wear-out condition monitoring of IGBT and MOSFET power modules in inverter operation, IEEE Transactions on Industry Applications 55 (6) (2019) 6184–6192. [32] I. Bandyopadhyay, P. Purkait, C. Koley, Performance of a classifier based on time-domain features for incipient fault detection in inverter drives, IEEE Transactions on Industrial Informatics 15 (1) (2019) 3–14. [33] M. Drif, A.J.M. Cardoso, Stator fault diagnostics in squirrel cage three-phase induction motor drives using the instantaneous active and reactive power signature analyses, IEEE Transactions on Industrial Informatics 10 (2) (2014) 1348–1360. [34] C. Yang, T. Kang, D. Hyun, S.B. Lee, J.A. Antonino-Daviu, J. Pons-Llinares, Reliable detection of induction motor rotor faults under the rotor axial air duct influence, IEEE Transactions on Industry Applications 50 (4) (2014) 2493–2502. [35] F.E. Prahesti, D.A. Asfani, I.M. Yulistya Negara, B.Y. Dewantara, Three-phase induction motor short circuit stator detection using an external flux sensor, Proceedings of the 2020 International Seminar on Intelligent Technology and Its Applications (ISITIA), 2020, pp. 375–380. [36] H.T. Eickhoff, R. Seebacher, A. Muetze, E.G. Strangas, Enhanced and fast detection of open-switch faults in inverters for electric drives, IEEE Transactions on Industry Applications 53 (6) (2017) 5415–5425. [37] M. Eftekhari, M. Moallem, S. Sadri, M. Hsieh, Online detection of induction motor’s stator winding short-circuit faults, IEEE Systems Journal 8 (4) (2014) 1272–1282. [38] S. Choi, M.S. Haque, M.T.B. Tarek, V. Mulpuri, Y. Duan, S. Das, V. Garg, Fault diagnosis techniques for permanent magnet AC machine and drives—A review of current state of the art, IEEE Transactions on Transportation Electrification 4 (2) (2018) 444–463. [39] B.M. Ebrahimi, J. Faiz, M.J. Roshtkhari, Static-, dynamic-, and mixed-eccentricity fault diagnoses in permanent-magnet synchronous motors, IEEE Transactions on Industrial Electronics 56 (11) (2009) 4727–4739. [40] A. Aggarwal, E.G. Strangas, Review of detection methods of static eccentricity for interior permanent magnet synchronous machine, Energies 21 (12) (2019) 4105. [41] H.T. Eickhoff, R. Seebacher, A. Muetze, E.G. Strangas, Enhanced and fast detection of open-switch faults in inverters for electric drives, IEEE Transactions on Industry Applications 53 (6) (2017) 5415–5425. [42] R.Z. Haddad, E.G. Strangas, On the accuracy of fault detection and separation in permanent magnet synchronous machines using MCSA/MVSA and LDA, IEEE Transactions on Energy Conversion 31 (3) (2016) 924–934. [43] R.Z. Haddad, C.A. Lopez, S.N. Foster, E.G. Strangas, A voltage-based approach for fault detection and separation in permanent magnet synchronous machines, IEEE Transactions on Industry Applications 53 (6) (2017) 5305–5314. [44] J.G. Cintron-Rivera, S.N. Foster, E.G. Strangas, Mitigation of turn-to-turn faults in fault tolerant permanent magnet synchronous motors, IEEE Transactions on Energy Conversion 30 (2) (2015) 465–475. [45] C. Gan, J. Wu, S. Yang, Y. Hu, W. Cao, Wavelet packet decomposition-based fault diagnosis scheme for SRM drives with a single current sensor, IEEE Transactions on Energy Conversion 31 (1) (2016) 303–313. [46] S. Gopalakrishnan, A.M. Omekanda, B. Lequesne, Classification and remediation of electrical faults in the switched reluctance drive, IEEE Transactions on Industry Applications 42 (2) (2006) 479–486. [47] K. Reif, R. Unbehauen, The extended Kalman filter as an exponential observer for nonlinear systems, IEEE Transactions on Signal Processing 47 (8) (1999) 2324–2328. [48] R. Dhaouadi, N. Mohan, L. Norum, Design and implementation of an extended Kalman filter for the state estimation of a permanent magnet synchronous motor, IEEE Transactions on Power Electronics 6 (3) (1991) 491–497. [49] M. Mosallaei, K. Salahshoor, Comparison of centralized multi-sensor measurement and state fusion methods with an adaptive unscented Kalman filter for process fault diagnosis, Proceedings of the 4th International Conference on Information and Automation for Sustainability, 2008, pp. 514–524. [50] N. Patil, D. Das, M. Pecht, A prognostic approach for non-punch through and field stop IGBTs, Microelectronics Reliability 52 (3) (2012) 482–488. [51] M.S. Haque, S. Choi, J. Baek, Auxiliary particle filtering-based estimation of remaining useful life of IGBT, IEEE Transactions on Industrial Electronics 65 (3) (2018) 2693–2703. [52] M.S. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Transactions on Signal Processing 50 (2) (2002) 174–188. [53] P. Nectoux, R. Gouriveau, K. Medjaher, E. Ramasso, B. Chebel-Morello, N. Zerhouni, C. Varnier, PRONOSTIA: An experimental platform for bearings accelerated degradation tests, Proceedings of the 2012 IEEE International Conference on Prognostics, 2012. [54] H.-E. Kim, A.C. Tan, J. Mathew, B.-K. Choi, Bearing fault prognosis based on health state probability estimation, Expert Systems with Applications 39 (2012) 5200–5213. [55] R.K. Singleton, E.G. Strangas, S. Aviyente, Discovering the hidden health states in bearing vibration signals for fault prognosis, Proceedings of the 40th Annual Conference of the IEEE Industrial Electronics Society (IECON), 2014, pp. 3438–3444. [56] R.K. Singleton, E.G. Strangas, S. Aviyente, Extended Kalman filtering for remaining-useful-life estimation of bearings, IEEE Transactions on Industrial Electronics 62 (3) (2015) 1781–1790. [57] R.K. Singleton, E.G. Strangas, S. Aviyente, Extended Kalman filtering for remaining-useful-life estimation of bearings, IEEE Transactions on Industrial Electronics 62 (3) (2015) 1781–1790. [58] A. Soualhi, H. Razik, G. Clerc, D.D. Doan, Prognosis of bearing failures using hidden Markov models and the adaptive neuro-fuzzy inference system, IEEE Transactions on Industrial Electronics 61 (6) (2014) 2864–2874.

[59] Z. Rao, M. Huang, X. Zha, IGBT remaining useful life prediction based on particle filter with fusing precursor, IEEE Access 8 (2020) 154281–154289. [60] W. Zhang, D. Xu, P.N. Enjeti, H. Li, J.T. Hawke, H.S. Krishnamoorthy, Survey on faulttolerant techniques for power electronic converters, IEEE Transactions on Power Electronics 29 (12) (2014) 6319–6331. [61] Y. Song, B. Wang, Evaluation methodology and control strategies for improving reliability of HEV power electronic system, IEEE Transactions on Vehicular Technology 63 (8) (2014) 3661–3676. [62] E. Ugur, S. Dusmez, B. Akin, An investigation on diagnosis-based power switch lifetime extension strategies for three-phase inverters, IEEE Transactions on Industry Applications 55 (2) (2019) 2064–2075. [63] I. Tsyokhla, A. Griffo, J. Wang, Online condition monitoring for diagnosis and prognosis of insulation degradation of inverter-fed machines, IEEE Transactions on Industrial Electronics 66 (10) (2019) 8126–8135. [64] R. Xiong, Y. Zhang, J. Wang, H. He, S. Peng, M. Pecht, Lithium-ion battery health prognosis based on a real battery management system used in electric vehicles, IEEE Transactions on Vehicular Technology 68 (5) (2019) 4110–4121. [65] B. Mirafzal, Survey of fault-tolerance techniques for three-phase voltage source inverters, IEEE Transactions on Industrial Electronics 61 (10) (2014) 5192–5202. [66] H.T. Eickhoff, R. Seebacher, A. Muetze, E.G. Strangas, Post-fault operation strategy for single switch open-circuit faults in electric drives, IEEE Transactions on Industry Applications 54 (3) (2018) 2381–2391. [67] A.E. Ginart, P.W. Kalgren, M.J. Roemer, D.W. Brown, M. Abbas, Transistor diagnostic strategies and extended operation under one-transistor trigger suppression in inverter power drives, IEEE Transactions on Power Electronics 25 (2) (2010) 499–506. [68] A. Kersten, O. Theliander, E.A. Grunditz, T. Thiringer, M. 
Bongiorno, Battery loss and stress mitigation in a cascaded H-Bridge multilevel inverter for vehicle traction applications by filter capacitors, IEEE Transactions on Transportation Electrification 5 (3) (2019) 659–671. [69] S. Lee, G. Stone, J. Antonino-Daviu, K. Gyftakis, E. Strangas, P. Maussion, C. Platero, Recent challenges in condition monitoring of industrial electric machines, IEEE Industrial Electronics Magazine (2020). Early access, June 8 [70] A. Chehade, Z. Shi, Sensor fusion via statistical hypothesis testing for prognosis and degradation analysis, IEEE Transactions on Automation Science & Engineering 16 (4) (2019) 1774– 1787.

Chapter 5

Intelligent fault diagnosis for dynamic systems via extended state observer and soft computing

Paul P. Lin#
Fellow of the American Society of Mechanical Engineers (ASME); Professor Emeritus, Mechanical Engineering Department, Cleveland State University, United States; Visiting Scholar, Kaohsiung University of Science and Technology, Taiwan

Overview. There have been many studies on observer-based fault detection and isolation (FDI), such as those using an unknown input observer or a generalized observer. Most of them require a nominal mathematical model of the system. Unlike sensor faults, actuator faults and process faults greatly affect the system dynamics. The main function of an observer, also known as an estimator, is to extract information about otherwise immeasurable variables for a vast number of applications, including feedback control and system health monitoring or fault diagnosis. Over the past few decades, two classes of observer design have emerged. One relies on mathematical plant models to produce state estimates; the other uses available plant knowledge to estimate not only the state but also the part of the physical process that is not described in the plant model (i.e., disturbances). The first class requires an accurate mathematical model of the plant, which is often unavailable in practice. In contrast, the second class provides practical state and disturbance estimation when significant nonlinearity and uncertainty are present in a dynamic system.

This chapter presents a new process fault diagnosis technique that does not require exact knowledge of the plant model, based on the extended state observer (ESO) and soft computing. The ESO’s augmented or extended state is used to compute the system dynamics in real time and thereby provides a foundation for real-time process fault detection. Based on the input and output data, the ESO identifies the unmodeled or incorrectly modeled dynamics combined with unknown external disturbances in real time and provides vital information for detecting faults with only partial information of the plant, which cannot be easily accomplished with existing methods. Another advantage of the ESO is its simplicity: only a single parameter needs to be tuned. Without knowledge of the exact plant model, fuzzy inference was developed to isolate faults. A strongly coupled three-tank nonlinear dynamic system was chosen as a case study. In a typical dynamic system, a process fault such as pipe blockage is likely to be incipient, which requires the degree of fault to be determined at all times. Neural networks were trained to identify faults and to determine the degree of fault instantly. The simulation results indicate that the proposed FDI technique effectively detected and isolated faults and accurately determined the degree of fault. Soft computing (i.e., fuzzy logic and neural networks) makes fault diagnosis intelligent and fast because it provides intuitive logic to the system and real-time input-output mapping.

For a typical MIMO (multiple-input, multiple-output) nonlinear dynamic system, FDI usually aims at process faults with an assumption that actuator faults and sensor faults do not occur at the same time, which is not always the case. Diagnosing simultaneous faults of different types turns out to be quite complex, which may explain why there have been very few studies on this topic. This study investigates the coupling relationship among process faults, actuator faults, and sensor faults, and shows how a combination of different types of faults could lead to a missed detection or false FDI. Finally, a method to isolate actuator faults from process faults is presented.

# Contributors: Zhiqiang Gao, Cleveland State University, United States; Qing Zheng, formerly worked at Gannon University, United States; Jimmy Zhu, Facebook Inc., United States.

Fault Diagnosis and Prognosis Techniques for Complex Engineering Systems. DOI: 10.1016/B978-0-12-822473-1.00009-4. Copyright © 2021 Elsevier Inc. All rights reserved.

5.1 Introduction

The term fault diagnosis generally refers to FDI. Fault diagnosis for nonlinear dynamic systems using model-free or model-based approaches has received much attention lately [1–3]. The model-free approach relies on rich data collection to train neural networks in conjunction with a fuzzy inference system (FIS). Such an approach can be impractical because collecting rich experimental data may be difficult, if not impossible. The model-based approach uses a linear or linearized model of the supervised system to generate a series of fault-indicating signals. In particular, observer-based FDI methodologies have been developed along with observer theory, and some of them have been successfully applied to industrial processes [4–6].

To deal with the nonlinearity and uncertainty of a dynamic system, nonlinear fault diagnosis has recently become an active research topic. There have been many observer-based residual generation methods for fault diagnosis in nonlinear dynamic systems. Frank [7] first proposed a nonlinear identity observer approach for fault diagnosis, followed by a survey on diagnostic observers [8] and a survey on robust residual generation and evaluation methods used in observer-based fault detection [9]. Later, Isermann [10] presented the status and applications of model-based fault detection and diagnosis. Observer-based fault diagnosis was applied to robot manipulators using a mathematical technique called algebra of functions to design the nonlinear diagnostic observer [11]. Adaptive observers [12] and nonlinear robust-based observer schemes [13, 14] both use an algorithm that adjusts the gain matrix of the observer to track the fault parameters of the system online, and both have been applied successfully to practical processes. Additionally, a new concept of practical optimality using disturbance estimation for health monitoring has been proposed [15]. However, the common drawback of these observer-based fault diagnosis methods is their dependency on detailed knowledge of the process as represented by its mathematical model.

This study begins with a discussion on diagnosing process faults that affect the plant of a nonlinear dynamic system. The sensors and actuators are assumed healthy when process faults occur. More specifically, the presented fault diagnosis technique aims at a nonlinear dynamic system with an uncertain system model and unmodeled or incorrectly modeled dynamics combined with unknown external disturbances. The complexity of fault isolation due to simultaneous faults of different types is discussed later.

Based on the parameterized ESO, a new FDI technique is proposed in this chapter, which is organized as follows. Section 5.2 describes the design of the improved ESO and its estimation error convergence. Section 5.3 presents a case study on a MIMO nonlinear dynamic system. Section 5.4 describes fault detection by means of the ESO, whereas Section 5.5 describes fault isolation, fault identification, and degree-of-fault determination. Section 5.6 discusses simultaneous faults of different types, followed by isolation of process faults and actuator faults in Section 5.7. Finally, our conclusion and future work are presented in Section 5.8.

5.2 Extended state observer

To extend FDI to processes beyond the scope of existing methods, consider a nonlinear dynamic system that can be described by

\[ y^{(n)} = f\left(t, y, \dot{y}, \cdots, y^{(n-1)}, d\right) + bu \tag{5.1} \]

where $y^{(n)}$ denotes the n-th time derivative of $y$; $f$, short for $f(t, y, \dot{y}, \cdots, y^{(n-1)}, d)$, is a lumped nonlinear time-varying function of the plant dynamics and the unknown external disturbance $d$; $u$ is the system’s input; and $b$ is a constant. In all physical systems, $f$ and $b$ are both bounded. From the fault diagnosis point of view, $f$ can be thought of as the lumped unmodeled or incorrectly modeled dynamics combined with the unknown external disturbances. Instead of separating unmodeled dynamics from the disturbance, the term $f$ in its totality is estimated as an extended state of the system, together with the states of the system. Normally, an observer only provides the state estimation; however, with what is known as the ESO [16–19], the term $f$ is treated as another state and estimated in real time. Such additional information proves to be crucial for FDI purposes, as will be shown in this chapter.

The ESO technique first developed by Han [16, 17], however, is rather complex, and its implementation requires the tuning of several parameters, which can be difficult and time consuming. Later, Gao [18] improved the ESO technique and made it more practical by using a particular parameterization method that reduces the number of tuning parameters to 1. Such a parameterized ESO has been successfully applied in many applications, particularly in the context of active disturbance rejection control [19]. In this section, the design of the improved ESO is described, followed by the proof of the observer’s estimation error convergence.

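5.2.1 ESO design

The main idea of the ESO is to use an augmented state space model of Eq. (5.1) that includes $f$ as an additional state. Thus, for a first-order plant ($n = 1$), Eq. (5.1) can be represented in state space form as

\[ \begin{cases} \dot{x}_1 = x_2 + bu = f + bu \\ \dot{x}_2 = \dot{f} = \eta\left(x, u, d, \dot{d}\right) \end{cases} \tag{5.2} \]

where both $f$ and $\eta$ are assumed unknown. Alternatively, in the case of a single output (i.e., $y = x_1$), Eq. (5.2) can be written in matrix form as

\[ \begin{cases} \dot{x} = Ax + Bu + E\eta \\ y = Cx \end{cases} \tag{5.3} \]

where

\[ A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}; \quad B = \begin{bmatrix} b \\ 0 \end{bmatrix}; \quad C = \begin{bmatrix} 1 & 0 \end{bmatrix}; \quad E = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]

The ESO can be expressed in matrix form as

\[ \begin{cases} \dot{z} = Az + Bu + L(y - \hat{y}) \\ \hat{y} = Cz \end{cases} \tag{5.4} \]

or

\[ \begin{cases} \dot{z}_1 = z_2 + l_1(x_1 - z_1) + bu \\ \dot{z}_2 = l_2(x_1 - z_1) \end{cases} \tag{5.5} \]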

where $L = [l_1 \; l_2]^T$ is the observer gain vector, which can be obtained using any known method, such as the pole placement technique. When the gains are properly selected, the ESO provides an estimate of the state in Eq. (5.3) (i.e., $z_i$ estimates $x_i$, where $i = 1, 2$), and $\hat{y}$ is the estimate of the system output $y$. More specifically, $z_1$ tracks the system output, whereas $z_2$ tracks $f$, which includes the system internal dynamics and external disturbance. The choice of the observer gain vector $L$, originally consisting of a set of nonlinear gains [16, 17], was simplified with linear gains so that it can be parameterized by solving the characteristic equation of the observer [18]. For instance, if the gains are chosen as $L = [2\omega_o \; \omega_o^2]^T$, then the characteristic polynomial of Eq. (5.4) becomes

\[ \lambda_o(s) = (s + \omega_o)^2 \tag{5.6} \]

where $\omega_o$ is the observer bandwidth, which needs to be tuned in practice to ensure that the ESO operates effectively, and $s$ is the complex argument (Laplace variable). In comparison with the original ESO, this is regarded as the improved ESO since the observer bandwidth is the only parameter that needs to be tuned. The analysis of the ESO was briefly given in the work of Gao [18]; a more elaborate account is given in the work of Zheng et al. [19]. For practitioners, however, it is perhaps just as interesting to see the various applications of the ESO and their success in providing a practical solution for dealing with uncertainties [18, 20]. The estimation error of the ESO is described in the next section.
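The single-parameter design can be illustrated with a short simulation. The following is a minimal sketch, not the chapter's implementation: it integrates the plant of Eq. (5.2) and the observer of Eq. (5.5) with forward Euler, using the gains $L = [2\omega_o \; \omega_o^2]^T$. The lumped dynamics `f`, the constant input, the initial conditions, and the step size are all assumptions chosen only for illustration; the observer never sees `f`, only the output and the input.

```python
import math

def simulate_eso(omega_o, b=1.0, dt=1e-3, t_end=10.0):
    """Simulate plant (5.2) and ESO (5.5) with forward Euler.

    Returns the largest |f - z2| observed over the second half of the run.
    """
    l1, l2 = 2.0 * omega_o, omega_o ** 2   # bandwidth parameterization
    x1 = 0.5                               # plant output y = x1 (assumed start)
    z1, z2 = 0.0, 0.0                      # observer starts with no knowledge
    u = 1.0                                # constant input (assumed)
    worst = 0.0
    steps = int(round(t_end / dt))
    for k in range(steps):
        t = k * dt
        # "Unknown" lumped dynamics plus disturbance (hypothetical example).
        f = -x1 ** 3 + 0.5 * math.sin(t)
        e = x1 - z1                        # output estimation error
        dz1 = z2 + l1 * e + b * u          # Eq. (5.5), first row
        dz2 = l2 * e                       # Eq. (5.5), second row
        dx1 = f + b * u                    # Eq. (5.2), first row
        z1, z2, x1 = z1 + dt * dz1, z2 + dt * dz2, x1 + dt * dx1
        if k >= steps // 2:
            f_next = -x1 ** 3 + 0.5 * math.sin(t + dt)
            worst = max(worst, abs(f_next - z2))
    return worst

if __name__ == "__main__":
    for w in (5.0, 20.0):
        print(f"omega_o = {w:5.1f}  max |f - z2| = {simulate_eso(w):.4f}")
```

In this sketch the only quantity that changes between runs is `omega_o`; a larger bandwidth should give a smaller residual between `z2` and the true `f`, at the cost of a stiffer observer.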

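5.2.2 Estimation error convergence

In this section, we prove mathematically that, with the plant dynamics largely unknown, the ESO can estimate the unknown dynamics and disturbances with an upper-bounded estimation error. Let

\[ \tilde{\xi}_i(t) = x_i(t) - z_i(t), \quad i = 1, 2 \tag{5.7} \]

From Eq. (5.2) and Eq. (5.4), the observer estimation error for states $x_1$ and $x_2$ can be described as

\[ \begin{cases} \dot{\tilde{\xi}}_1 = \tilde{\xi}_2 - l_1\tilde{\xi}_1 \\ \dot{\tilde{\xi}}_2 = \eta - l_2\tilde{\xi}_1 \end{cases} \tag{5.8} \]

Now let us scale down the observer estimation error $\tilde{\xi}_i(t)$ by $\omega_o^{i-1}$; that is, let

\[ \varepsilon_i(t) = \frac{\tilde{\xi}_i(t)}{\omega_o^{i-1}}, \quad i = 1, 2 \]

Then Eq. (5.8) can be written as

\[ \dot{\varepsilon} = \omega_o A_\varepsilon \varepsilon + B_\varepsilon \frac{\eta\left(x, u, d, \dot{d}\right)}{\omega_o} \tag{5.9} \]

where

\[ A_\varepsilon = \begin{bmatrix} -2 & 1 \\ -1 & 0 \end{bmatrix}, \quad B_\varepsilon = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]

Here, $A_\varepsilon$ is Hurwitz for $L = [l_1 \; l_2]^T = [2\omega_o \; \omega_o^2]^T$.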
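The step from Eq. (5.8) to Eq. (5.9) can be checked by direct substitution. The lines below are a worked derivation under the gains $l_1 = 2\omega_o$ and $l_2 = \omega_o^2$ stated above, using $\tilde{\xi}_1 = \varepsilon_1$ and $\tilde{\xi}_2 = \omega_o\varepsilon_2$; the last line confirms that both eigenvalues of $A_\varepsilon$ equal $-1$, which is why it is Hurwitz.

```latex
\begin{aligned}
\dot{\varepsilon}_1 &= \dot{\tilde{\xi}}_1 = \tilde{\xi}_2 - 2\omega_o\tilde{\xi}_1
  = \omega_o\varepsilon_2 - 2\omega_o\varepsilon_1
  = \omega_o\left(-2\varepsilon_1 + \varepsilon_2\right), \\
\dot{\varepsilon}_2 &= \frac{\dot{\tilde{\xi}}_2}{\omega_o}
  = \frac{\eta - \omega_o^2\tilde{\xi}_1}{\omega_o}
  = \omega_o\left(-\varepsilon_1\right) + \frac{\eta}{\omega_o}, \\
\det\left(\lambda I - A_\varepsilon\right) &= \lambda(\lambda + 2) + 1 = (\lambda + 1)^2 .
\end{aligned}
```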


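Theorem 1. Assuming $\eta\left(x, u, d, \dot{d}\right)$ is bounded, there exist a constant $\sigma_i > 0$ and a finite time $T_1 > 0$ such that $|\tilde{\xi}_i(t)| \le \sigma_i$, $i = 1, 2$, $\forall t \ge T_1 > 0$, and $\omega_o > 0$. Note that

\[ \sigma_i = O\left(\frac{1}{\omega_o^{k}}\right) \tag{5.10} \]

where $O(\cdot)$ denotes the order of the reciprocal of the bandwidth raised to a positive integer power $k$.

The boundedness of $\eta\left(x, u, d, \dot{d}\right)$ (i.e., $\dot{f}$) means that the rate of change of the combined effect of internal dynamics and external disturbances is finite, which leads to an assumption that the combined effect and the control input are continuous. Here, $\eta$ is essentially the derivative of acceleration. In a typical motion system, $\eta$ being bounded means that the force applied to the body does not change infinitely within a very short period of time. In other words, the jerk (i.e., the time derivative of acceleration) is finite. This is a reasonable assumption for a typical motion.

Proof. Solving Eq. (5.9) gives

\[ \varepsilon(t) = e^{\omega_o A_\varepsilon t}\varepsilon(0) + \int_0^t e^{\omega_o A_\varepsilon (t-\tau)} B_\varepsilon \frac{\eta\left(x(\tau), u, d, \dot{d}\right)}{\omega_o}\, d\tau \tag{5.11} \]

Let

\[ p(t) = \int_0^t e^{\omega_o A_\varepsilon (t-\tau)} B_\varepsilon \frac{\eta\left(x(\tau), u, d, \dot{d}\right)}{\omega_o}\, d\tau \tag{5.12} \]

Since $\eta\left(x(\tau), u, d, \dot{d}\right)$ is bounded, that is, $\left|\eta\left(x(\tau), u, d, \dot{d}\right)\right| \le \delta$, where $\delta$ is a positive constant, for $i = 1, 2$ it follows that

\[ |p_i(t)| \le \frac{\delta}{\omega_o^2}\left[\left|\left(A_\varepsilon^{-1}B_\varepsilon\right)_i\right| + \left|\left(A_\varepsilon^{-1}e^{\omega_o A_\varepsilon t}B_\varepsilon\right)_i\right|\right] \tag{5.13} \]

With

\[ A_\varepsilon^{-1} = \begin{bmatrix} 0 & -1 \\ 1 & -2 \end{bmatrix}, \quad B_\varepsilon = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \]

the following can be written:

\[ \left|\left(A_\varepsilon^{-1}B_\varepsilon\right)_i\right| = \begin{cases} 1, & i = 1 \\ 2, & i = 2 \end{cases} \tag{5.14} \]

Since $A_\varepsilon$ is Hurwitz, there exists a finite time $T_1 > 0$ such that

\[ \left|\left[e^{\omega_o A_\varepsilon t}\right]_{ij}\right| \le \frac{1}{\omega_o^2} \tag{5.15} \]

for all $t \ge T_1$, $i, j = 1, 2$. Hence,

\[ \left|\left[e^{\omega_o A_\varepsilon t}B_\varepsilon\right]_i\right| \le \frac{1}{\omega_o^2} \tag{5.16} \]

for all $t \ge T_1$, $i = 1, 2$. Note that $T_1$ depends on $\omega_o A_\varepsilon$. Combining

\[ A_\varepsilon^{-1} = \begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & -2 \end{bmatrix} \]

and Eq. (5.16), which means

\[ \left|\left[e^{\omega_o A_\varepsilon t}B_\varepsilon\right]_1\right| = |d_1| \le \frac{1}{\omega_o^2}, \quad \left|\left[e^{\omega_o A_\varepsilon t}B_\varepsilon\right]_2\right| = |d_2| \le \frac{1}{\omega_o^2} \]

gives the expression

\[ \left|\left(A_\varepsilon^{-1}e^{\omega_o A_\varepsilon t}B_\varepsilon\right)_i\right| = |s_{i1}d_1 + s_{i2}d_2| \le |s_{i1}d_1| + |s_{i2}d_2| \le \begin{cases} \dfrac{1}{\omega_o^2}, & i = 1 \\[2mm] \dfrac{3}{\omega_o^2}, & i = 2 \end{cases} \tag{5.17} \]

for all $t \ge T_1$. Eq. (5.13) can be expressed in terms of Eq. (5.14) and Eq. (5.17) as follows:

\[ |p_i(t)| \le \frac{2\delta}{\omega_o^2} + \frac{3\delta}{\omega_o^4} \tag{5.18} \]

for all $t \ge T_1$, $i = 1, 2$. Let $\varepsilon_{sum}(0) = |\varepsilon_1(0)| + |\varepsilon_2(0)|$; it follows that

\[ \left|\left[e^{\omega_o A_\varepsilon t}\varepsilon(0)\right]_i\right| \le \frac{\varepsilon_{sum}(0)}{\omega_o^2} \tag{5.19} \]

for all $t \ge T_1$, $i = 1, 2$. Eq. (5.11) yields the following constraint:

\[ |\varepsilon_i(t)| \le \left|\left[e^{\omega_o A_\varepsilon t}\varepsilon(0)\right]_i\right| + |p_i(t)| \tag{5.20} \]

Substituting $\varepsilon_i(t) = \tilde{\xi}_i(t)/\omega_o^{i-1}$, Eq. (5.18), and Eq. (5.19) into Eq. (5.20) leads to the conclusion that the absolute estimation error is, indeed, upper bounded:

\[ \left|\tilde{\xi}_i(t)\right| \le \frac{\left|\tilde{\xi}_1(0)\right|}{\omega_o^{3-i}} + \frac{\left|\tilde{\xi}_2(0)\right|}{\omega_o^{4-i}} + \frac{2\delta}{\omega_o^{3-i}} + \frac{3\delta}{\omega_o^{5-i}} = \sigma_i \tag{5.21} \]

for all $t \ge T_1$, $i = 1, 2$.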

Theorem 1 establishes that, in the absence of the plant model, the estimation error of the ESO described in Eq. (5.4) is bounded and that its upper bound decreases monotonically as the observer bandwidth increases. As long as the bandwidth is sufficiently large, the ESO can be used to estimate the state and the extended state f, which includes system internal dynamics and external disturbance. The ESO’s ability to estimate and track the system’s output state, y, and the extended state, f, provides a foundation for the proposed FDI schemes.
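The constants used in the proof are easy to check numerically. The sketch below, written in plain Python with hypothetical helper names, recomputes $A_\varepsilon^{-1}$ and $A_\varepsilon^{-1}B_\varepsilon$ from Eq. (5.14) and evaluates the bound $\sigma_i$ of Eq. (5.21) for a few bandwidths. The exponent of the $\tilde{\xi}_2(0)$ term is taken as $4 - i$, which is what follows from $\varepsilon_{sum}(0) = |\tilde{\xi}_1(0)| + |\tilde{\xi}_2(0)|/\omega_o$; the initial errors and the bound $\delta$ are illustrative assumptions.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

A_EPS = [[-2.0, 1.0], [-1.0, 0.0]]   # A_eps from Eq. (5.9)
B_EPS = [0.0, 1.0]                   # B_eps from Eq. (5.9)

def sigma(i, omega_o, xi1_0, xi2_0, delta):
    """Upper bound sigma_i of Eq. (5.21) for i in {1, 2}."""
    return (abs(xi1_0) / omega_o ** (3 - i)
            + abs(xi2_0) / omega_o ** (4 - i)
            + 2.0 * delta / omega_o ** (3 - i)
            + 3.0 * delta / omega_o ** (5 - i))

if __name__ == "__main__":
    print("A_eps^-1       =", inv2(A_EPS))
    print("A_eps^-1 B_eps =", matvec(inv2(A_EPS), B_EPS))
    for w in (2.0, 5.0, 10.0, 20.0):
        s1 = sigma(1, w, xi1_0=0.1, xi2_0=0.01, delta=0.05)
        s2 = sigma(2, w, xi1_0=0.1, xi2_0=0.01, delta=0.05)
        print(f"omega_o = {w:5.1f}  sigma_1 = {s1:.5f}  sigma_2 = {s2:.5f}")
```

Every term of $\sigma_i$ carries a positive power of $\omega_o$ in its denominator, so the printed bounds shrink as the bandwidth grows, which is the monotonic behavior the theorem describes.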

FIGURE 5.1 Schematic diagram of the three-tank system. [Figure labels: Pump 1, Pump 2; tanks 1, 3, 2 with water levels h1, h3, h2; blockages s13, s32, s20.]

Since the extended state f, which includes system internal dynamics and external disturbances, is estimated by the ESO in real time and canceled in the control law in real time, the ESO achieves high disturbance rejection performance and strong robustness performance.

5.3 Case study: three-tank dynamic system

To illustrate how the presented ESO can be used to track a nonlinear dynamic system, a three-tank nonlinear dynamic system [3], shown in Fig. 5.1, was chosen for a case study. The system consists of three tanks ($T_1$, $T_2$, and $T_3$) that are connected by three pipes. The system has two controlled inputs (pump flow rates), three measurable outputs ($h_1$, $h_2$, and $h_3$; water levels), and three possible faults (pipe blockages). It is, indeed, a strongly coupled MIMO system. Using Torricelli’s law, the following three dynamic system equations can be obtained:

\[ \begin{cases} A_T \dfrac{dh_1}{dt} = -s_{13}a_1\,\mathrm{sign}(h_1 - h_3)\sqrt{2g|h_1 - h_3|} + Q_1 \\[2mm] A_T \dfrac{dh_2}{dt} = s_{32}a_3\,\mathrm{sign}(h_3 - h_2)\sqrt{2g|h_3 - h_2|} - s_{20}a_2\sqrt{2gh_2} + Q_2 \\[2mm] A_T \dfrac{dh_3}{dt} = s_{13}a_1\,\mathrm{sign}(h_1 - h_3)\sqrt{2g|h_1 - h_3|} - s_{32}a_3\,\mathrm{sign}(h_3 - h_2)\sqrt{2g|h_3 - h_2|} \end{cases} \tag{5.22} \]

where $A_T$ is the circular cross-sectional area of each tank (assumed the same for all); $a_1$, $a_2$, and $a_3$ denote the circular cross-sectional area of each pipe; $s_{13}$, $s_{32}$, and $s_{20}$ denote pipe blockage; $Q_1$ and $Q_2$ denote the pumps’ flow rates; and $h_1$, $h_2$, and $h_3$ denote the water level of tanks $T_1$, $T_2$, and $T_3$, respectively. The blockage is expressed as a degree of fault between 0 and 1, where 0 and 1 correspond to complete blockage and no blockage, respectively. Eq. (5.22) can be rewritten as

\[ \begin{cases} \dot{h}_1 = f_1 + \dfrac{1}{A_T}Q_1 \\[2mm] \dot{h}_2 = f_2 + \dfrac{1}{A_T}Q_2 \\[2mm] \dot{h}_3 = f_3 \end{cases}, \tag{5.23} \]

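where

\[ \begin{aligned} f_1 &= -\frac{1}{A_T}\left[s_{13}a_1\,\mathrm{sign}(h_1 - h_3)\sqrt{2g|h_1 - h_3|}\right] \\ f_2 &= \frac{1}{A_T}\left[s_{32}a_3\,\mathrm{sign}(h_3 - h_2)\sqrt{2g|h_3 - h_2|} - s_{20}a_2\sqrt{2gh_2}\right] \\ f_3 &= \frac{1}{A_T}\left[s_{13}a_1\,\mathrm{sign}(h_1 - h_3)\sqrt{2g|h_1 - h_3|} - s_{32}a_3\,\mathrm{sign}(h_3 - h_2)\sqrt{2g|h_3 - h_2|}\right]. \end{aligned} \]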

Let $y(t)$ and $u(t)$ be the system’s output and input vectors, respectively:

\[ y(t) = [h_1 \; h_2 \; h_3]^T; \quad u(t) = [Q_1 \; Q_2 \; 0]^T, \tag{5.24} \]

where $h_1$, $h_2$, and $h_3$ denote the water level of tanks $T_1$, $T_2$, and $T_3$, respectively, and $Q_1$ and $Q_2$ denote the flow rate of pumps 1 and 2, respectively. Essentially, the water levels are the system output variables and the flow rates are the system input variables. Combining Eq. (5.23) and Eq. (5.24) gives

\[ \dot{y}(t) = f + b_o u(t) \tag{5.25} \]

where

\[ b_o = \frac{1}{A_T}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad f = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix}. \]
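The model of Eqs. (5.22) to (5.25) translates almost line by line into code. The following is a minimal sketch rather than the authors' simulator: the function names are hypothetical, and the numerical values of the tank area, the pipe area (taken equal for all three pipes), and the test levels are assumptions chosen only so that the example runs.

```python
import math

G = 9.81          # gravitational acceleration (m/s^2)
A_T = 0.0154      # tank cross-sectional area (m^2), assumed value
A_PIPE = 5.0e-5   # pipe cross-sectional area a1 = a2 = a3 (m^2), assumed value

def pipe_flow(s, dh):
    """Signed Torricelli flow s * a * sign(dh) * sqrt(2 g |dh|)."""
    return s * A_PIPE * math.copysign(math.sqrt(2.0 * G * abs(dh)), dh)

def generalized_dynamics(h, s13=1.0, s32=1.0, s20=1.0):
    """Return [f1, f2, f3] of Eq. (5.23) for levels h = [h1, h2, h3].

    A blockage value of 1 means no blockage; 0 means complete blockage.
    """
    h1, h2, h3 = h
    q13 = pipe_flow(s13, h1 - h3)   # tank 1 -> tank 3
    q32 = pipe_flow(s32, h3 - h2)   # tank 3 -> tank 2
    q20 = pipe_flow(s20, h2)        # tank 2 -> outlet (h2 >= 0 assumed)
    return [-q13 / A_T, (q32 - q20) / A_T, (q13 - q32) / A_T]

def level_rates(h, q, s13=1.0, s32=1.0, s20=1.0):
    """Return dh/dt = f + b_o u of Eq. (5.25) with u = [Q1, Q2, 0]."""
    f = generalized_dynamics(h, s13, s32, s20)
    u = [q[0], q[1], 0.0]
    return [fi + ui / A_T for fi, ui in zip(f, u)]

if __name__ == "__main__":
    levels = [0.40, 0.20, 0.30]
    print("f     =", generalized_dynamics(levels))
    print("dh/dt =", level_rates(levels, q=[1.0e-4, 1.0e-4]))
```

Keeping the pump terms out of `generalized_dynamics` mirrors the split in Eq. (5.25): the inputs enter only through $b_o u$, so `f` depends on the levels and the blockage factors alone.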

Here, $f_1$, $f_2$, and $f_3$ are called the generalized system dynamics of tanks $T_1$, $T_2$, and $T_3$, respectively, and $u(t)$ is the system input vector. Note that the constant $b_o$ is determined by the system; in this case it is simply the reciprocal of the tank’s area. Eq. (5.25) can be represented in state space form as

\[ \begin{cases} \dot{x}_1 = x_2 + b_o u \\ \dot{x}_2 = v \\ y = x_1 \end{cases} \tag{5.26} \]

where $u(t) = [Q_1 \; Q_2 \; 0]^T$ is the system input, $y = x_1 = [h_1 \; h_2 \; h_3]^T$ is the system output, $x_2 = f = [f_1 \; f_2 \; f_3]^T$ is an augmented state, and $v$ is the time derivative of $f$. Rewriting Eq. (5.26) in matrix form gives

\[ \begin{cases} \dot{x} = Ax + Bu + Dv \\ y = Cx \end{cases}, \tag{5.27} \]

where

\[ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}_{6\times 1}, \quad A = \begin{bmatrix} 0 & I \\ 0 & 0 \end{bmatrix}_{6\times 6}, \quad B = \begin{bmatrix} b_o \\ 0 \end{bmatrix}_{6\times 3}, \quad D = \begin{bmatrix} 0 \\ I \end{bmatrix}_{6\times 3}, \quad C = \begin{bmatrix} I & 0 \end{bmatrix}_{3\times 6} \]

and $I$ is a $3 \times 3$ identity matrix. Note that the expression for $C$ in Eq. (5.27) is for three outputs, whereas that for $C$ in Eq. (5.3) is for a single output. Employing the ESO design (Eqs. 5.2–5.6), denoting $y$ as the measured or actual output and $\hat{y} = [\hat{h}_1 \; \hat{h}_2 \; \hat{h}_3]^T$ as the estimated output, and incorporating the difference between the two outputs, the ESO of Eq. (5.26) can be written as

\[ \begin{cases} \dot{z}_1 = z_2 + l_1(x_1 - z_1) + b_o u \\ \dot{z}_2 = l_2(x_1 - z_1) \end{cases}. \tag{5.28} \]

The state space observer can be constructed as

\[ \begin{cases} \dot{z} = Az + Bu + L(x_1 - z_1) \\ \hat{y} = Cz \end{cases}, \tag{5.29} \]

where $z = [z_1 \; z_2]^T$ (i.e., $z_1 = [z_{11} \; z_{12} \; z_{13}]^T$; $z_2 = [z_{21} \; z_{22} \; z_{23}]^T$). Eq. (5.22) shows that the three-tank system consists of three simultaneous first-order differential equations. Thus, the observer gain matrix, $L$, can be expressed as

\[ L = \begin{bmatrix} 2\omega_o & 0 & 0 \\ 0 & 2\omega_o & 0 \\ 0 & 0 & 2\omega_o \\ \omega_o^2 & 0 & 0 \\ 0 & \omega_o^2 & 0 \\ 0 & 0 & \omega_o^2 \end{bmatrix}. \tag{5.30} \]

With a chosen bandwidth $\omega_o$, the $z$ vector can be used to estimate the system outputs and the system dynamics in real time. As proved in Section 5.2, the ESO’s estimation error is upper bounded, and the bound decreases monotonically with the bandwidth. With a sufficiently large bandwidth and as time proceeds, $z_1$ quickly approaches $y$ (i.e., $h_1$, $h_2$, and $h_3$) and $z_2$ approaches $f$ (i.e., $f_1$, $f_2$, and $f_3$). In other words, $z_1$ tracks the system’s outputs, and $z_2$ tracks the unmodeled system dynamics combined with external disturbance. More specifically, as stated in Eq. (5.29), $z_1 = [z_{11} \; z_{12} \; z_{13}]^T$ estimates the state variables $x_1$ (i.e., the water levels $h_1$, $h_2$, and $h_3$), and $z_2 = [z_{21} \; z_{22} \; z_{23}]^T$ estimates the extended state $f$ (i.e., $f_1$, $f_2$, and $f_3$):

\[ \begin{cases} z_{11} = \hat{h}_1 \to h_1; \quad z_{12} = \hat{h}_2 \to h_2; \quad z_{13} = \hat{h}_3 \to h_3 \\ z_{21} \to f_1; \quad z_{22} \to f_2; \quad z_{23} \to f_3 \end{cases}. \tag{5.31} \]
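One way to see that the gain matrix of Eq. (5.30) places every observer pole at $-\omega_o$ is to form $M = A - LC$ and check that $(M + \omega_o I)^2 = 0$, which forces all eigenvalues of $M$ to equal $-\omega_o$. The sketch below does this with plain nested lists; the helper names are hypothetical, and $C$ is taken as $[I \; 0]$ so that $y = x_1$.

```python
def zeros(rows, cols):
    """Nested-list zero matrix."""
    return [[0.0] * cols for _ in range(rows)]

def matmul(a, b):
    """Nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def build_observer_matrix(omega_o, n=3):
    """Return M = A - L C for the 2n-state ESO with L from Eq. (5.30)."""
    A = zeros(2 * n, 2 * n)
    L = zeros(2 * n, n)
    C = zeros(n, 2 * n)
    for i in range(n):
        A[i][n + i] = 1.0            # upper-right identity block of A
        C[i][i] = 1.0                # C = [I 0], so that y = x1
        L[i][i] = 2.0 * omega_o      # upper block of L: 2*omega_o*I
        L[n + i][i] = omega_o ** 2   # lower block of L: omega_o^2*I
    LC = matmul(L, C)
    return [[A[i][j] - LC[i][j] for j in range(2 * n)] for i in range(2 * n)]

def shifted_square(m, omega_o):
    """Return (M + omega_o I)^2."""
    size = len(m)
    shifted = [[m[i][j] + (omega_o if i == j else 0.0) for j in range(size)]
               for i in range(size)]
    return matmul(shifted, shifted)

if __name__ == "__main__":
    for w in (1.0, 5.0):
        sq = shifted_square(build_observer_matrix(w), w)
        worst = max(abs(v) for row in sq for v in row)
        print(f"omega_o = {w}: max |entry of (M + omega_o I)^2| = {worst}")
```

Because $L$ is block diagonal, the six-state observer is really three independent copies of the two-state ESO of Eq. (5.5), one per tank, each with the characteristic polynomial of Eq. (5.6).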

FIGURE 5.2 System dynamics tracking with ωo = 1 and 5% noise. [Plot: system dynamics (m/s) versus time (s), 0 to 50 s, showing f1, f2, f3 and the ESO estimates z21, z22, z23.]

FIGURE 5.3 System dynamics tracking with ωo = 5 and 5% noise. [Plot: system dynamics (m/s) versus time (s), 0 to 50 s, showing f1, f2, f3 and the ESO estimates z21, z22, z23.]

The value of the bandwidth ωo affects the ESO's tracking speed and the sensitivity of the state estimates to measurement noise. Figs. 5.2 and 5.3 show simulation results for two values of ωo in the presence of measurement noise (with a sampling time of 0.01 s). The results demonstrate the effectiveness of the ESO in tracking the outputs and the dynamics of the system. The smaller ωo is, the more slowly the ESO tracks the system. As ωo increases, the ESO tracks the system more quickly, but it also becomes more sensitive to the measurement noise. Choosing an appropriate ωo is therefore a trade-off between tracking speed and sensitivity to noise.
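The bandwidth trade-off described above can be reproduced on a much smaller problem. The following is a minimal sketch, not the three-tank simulation itself: it assumes a single first-order channel y' = f + b·u with hypothetical constants (k, b, u), forward-Euler integration, and Gaussian measurement noise. The function names `simulate_eso` and `rms_error` are illustrative.

```python
import math
import random


def simulate_eso(omega_o, noise_std=0.0, seed=0, t_end=50.0, dt=0.01):
    """Simulate one channel y' = f + b*u together with its ESO (forward Euler).

    Returns (f_true, z2), two lists sampled at every step, where z2 is the
    ESO estimate of the lumped dynamics f.
    """
    k, b, u = 0.5, 1.0, 1.0                 # hypothetical plant constants
    l1, l2 = 2.0 * omega_o, omega_o ** 2    # bandwidth-parameterized gains
    rng = random.Random(seed)
    y = z1 = z2 = 0.0
    f_true, z2_hist = [], []
    for _ in range(int(round(t_end / dt))):
        f = -k * y                          # dynamics unknown to the observer
        y_meas = y + rng.gauss(0.0, noise_std)
        err = y_meas - z1
        z1 += dt * (z2 + l1 * err + b * u)  # observer update, as in Eq. (5.28)
        z2 += dt * (l2 * err)
        y += dt * (f + b * u)               # plant update
        f_true.append(-k * y)
        z2_hist.append(z2)
    return f_true, z2_hist


def rms_error(f_true, z2, start=0):
    """Root-mean-square difference between f and its estimate from index start."""
    diffs = [(a - b) ** 2 for a, b in zip(f_true[start:], z2[start:])]
    return math.sqrt(sum(diffs) / len(diffs))
```

Calling `simulate_eso(1.0)` and `simulate_eso(5.0)` without noise shows the faster convergence of the higher bandwidth, while repeating both calls with `noise_std=0.01` and comparing `rms_error(..., start=3000)` shows the larger noise content of the high-bandwidth estimate.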


5.4 Fault detection by means of ESO

This section presents how faults can be detected by means of the ESOs based on real-time estimation of the system dynamics.

5.4.1 Fault detection scheme

As mentioned earlier, the faults to be detected are neither sensor faults nor actuator faults. Rather, they are process faults possibly caused by structural deterioration. The process faults, in this case, are the pipe blockage faults s13, s32, and s20 shown in Fig. 5.1.

Traditionally, a fault is considered detected when an output deviates from its expected value by more than a preset tolerance. This approach, however, has drawbacks in both open-loop and closed-loop control. When the ESO is used for closed-loop control, observing the system's output does not provide useful information about the health of the system because the controller adjusts the inputs in an effort to stabilize the system. Thus, a health problem does not surface until the system finally collapses. Using the ESO for open-loop control also encounters a problem before the system reaches its steady state: an abrupt change in the system output does not necessarily mean the system is becoming faulty. Thus, relying solely on monitoring the system output could trigger a false alarm or miss possible faults.

It is worthwhile to note that the ESO's unique feature is its ability to estimate the general system dynamics (i.e., the unmodeled system dynamics and unknown external disturbance) in real time, which provides crucial information for the presented fault detection technique. Our study found that the system outputs and the general system dynamics both exhibit abrupt changes as soon as a fault occurs. However, the rate of change of the general system dynamics is more pronounced. Furthermore, the system outputs potentially reflect both the process faults (i.e., the pipe blockage faults) and the actuator faults (i.e., the actuating faults in the pumps), whereas the general system dynamics solely reflect the process faults. Considering that the goal of this study was to diagnose the process faults, our proposed fault detection scheme is based on the general system dynamics, f. More specifically, a fault is considered detected when the rate of change of the general system dynamics, Δf/f, exceeds a predetermined threshold value.
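The detection rule can be written down in a few lines. This is a minimal sketch, assuming the relative change is taken between consecutive samples of one estimated extended state; the function name `first_fault_index`, the `eps` guard against division by a near-zero value, and the sample data are illustrative assumptions rather than part of the original scheme.

```python
def first_fault_index(f_est, threshold, eps=1e-9):
    """Return the first sample index where |delta f / f| exceeds threshold.

    f_est is a sequence of estimates of one element of f (e.g., samples of z21).
    Returns None when no sample-to-sample relative change exceeds the threshold.
    """
    for i in range(1, len(f_est)):
        prev = f_est[i - 1]
        # eps keeps the ratio finite when the previous estimate is close to zero
        rel_change = abs(f_est[i] - prev) / max(abs(prev), eps)
        if rel_change > threshold:
            return i
    return None


# Hypothetical estimate with a jump at the fourth sample
z2_samples = [1.00, 1.01, 1.02, 1.50, 1.51]
print(first_fault_index(z2_samples, threshold=0.2))  # -> 3
```

A practical version would likely smooth the estimate first, because a relative change is very sensitive to noise when f is close to zero.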

5.4.2 Fault detection without exact knowledge of the plant model

As mentioned earlier, the ESO estimates the states z21, z22, and z23, which track the system dynamics f1, f2, and f3. The only plant information needed for fault detection is an estimate of bo. Our study found that the value of bo is, indeed, not critical to fault detection. Fig. 5.4 shows the simulation result of successfully detecting two sequential faults using the exact bo value of 127. Fig. 5.5 further indicates that the same faults can be detected even with a bo value of 635, which is five times the exact one.

FIGURE 5.4 Detection of multiple faults (s13 = 0.8 at t = 10 s and s32 = 0.6 at t = 20 s) with bo = 127 (the exact value). [Plot: abrupt change of system dynamics (m/s) versus time (s), 0 to 50 s, showing z21, z22, z23 and the changes Δz21, Δz22, Δz23.]

FIGURE 5.5 Detection of multiple faults (s13 = 0.8 at t = 10 s and s32 = 0.6 at t = 20 s) with bo = 635 (rough estimated value). [Plot: abrupt change of system dynamics (m/s) versus time (s), 0 to 50 s, showing z21, z22, z23.]

The simulation assumes that the first blockage fault s13 = 0.8 (i.e., 80% blocked) in the pipe connecting tanks 1 and 3 occurs at t = 10 s, followed by the second blockage fault s32 = 0.6 (i.e., 60% blocked) in the pipe connecting tanks 3 and 2 at t = 20 s. The first fault affects the dynamics of tanks 1 and 3 (f1 and f3), which is reflected in abrupt changes in the estimated states z21 and z23. The second fault affects the dynamics of tanks 3 and 2 (f3 and f2), which is reflected in abrupt changes in the estimated states z23 and z22. Note that the ESO's estimated z21, z22, and z23 closely track the system dynamics f1, f2, and f3, respectively. The bo value is associated with the physical system; as noted for Eq. (5.25), in this case it is the reciprocal of the tank's cross-sectional area. Figs. 5.4 and 5.5 clearly demonstrate that the accuracy of bo is not critical to fault detection, which suggests that knowledge of the exact system model is not required. It should be noted that although faults can be detected without exact knowledge of the plant model, some knowledge about the model, such as the order of the system, is needed.

The changes of these three extended states are worth observing. For instance, as shown in Fig. 5.4, when the first fault had just occurred, z23 was negative, z22 was close to 0, and z21 was positive. However, when the second fault was added 10 s later, z23 became positive and z22 became negative, while z21 remained positive but became smaller. The changing signs of the states and the levels of the state values (i.e., low, medium, and high) provide useful information for fault isolation.
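One way to see why a rough bo can suffice is a toy experiment. The sketch below is not the three-tank model: it assumes a single first-order channel with a constant input, hypothetical constants (k, b, u), and a step change in the dynamics standing in for a fault. Under those assumptions a wrong input gain only shifts the estimate z2 by a constant, so the abrupt change caused by the fault looks the same. The function name `z2_deviation` and the argument `b_hat` are illustrative.

```python
def z2_deviation(b_hat, omega_o=5.0, t_end=20.0, t_fault=10.0, dt=0.001):
    """Return the change of z2 relative to its pre-fault value after a step fault.

    b_hat is the input gain assumed by the ESO; the true gain b is fixed below.
    """
    k, b, u = 0.5, 1.0, 1.0                  # hypothetical plant, constant input
    l1, l2 = 2.0 * omega_o, omega_o ** 2
    y = b * u / k                            # start at the plant's steady state
    z1, z2 = y, 0.0
    n_fault = int(round(t_fault / dt))
    hist = []
    for i in range(int(round(t_end / dt))):
        d = -0.3 if i >= n_fault else 0.0    # step change in dynamics (the "fault")
        f = -k * y + d
        err = y - z1
        z1 += dt * (z2 + l1 * err + b_hat * u)   # observer uses b_hat, not b
        z2 += dt * (l2 * err)
        y += dt * (f + b * u)
        hist.append(z2)
    pre_fault = hist[n_fault - 1]
    return [v - pre_fault for v in hist[n_fault:]]


exact = z2_deviation(b_hat=1.0)   # observer knows the true gain
rough = z2_deviation(b_hat=5.0)   # gain overestimated by a factor of five
print(min(exact), min(rough))     # the two peak deviations nearly coincide
```

The argument relies on the input being constant; with a time-varying input, the gain error would add a time-varying offset to z2, so this sketch should not be read as a general proof.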

5.5 Fault isolation and fault identification

The fault isolation to be presented here is based on the assumption that the exact system model is unknown. However, to verify the effectiveness of the presented technique, reference system outputs need to be generated first.

5.5.1 Generation of reference values

The outputs, in the case of the three-tank system, can be obtained by using sensors such as piezo-resistive pressure sensors with a resolution of 0.1 mm to measure the water levels. With sufficient input-output correspondence, a back-propagation neural network can be trained. The trained network can then be used to predict the outputs with reasonably good accuracy. Alternatively, the system outputs can be estimated in real time using the ESO based on the assumption that the exact plant model is known. With this alternative approach, the first step for identifying faults is to associate all faults with the system dynamics. First of all, the pipe dynamics (the dynamics between two outputs) contained in Eq. (5.22) can be extracted as follows:

$$
\begin{cases}
P_{13} = a_1\,\mathrm{sign}(h_1 - h_3)\sqrt{2g|h_1 - h_3|} \approx a_1\,\mathrm{sign}(z_{11} - z_{13})\sqrt{2g|z_{11} - z_{13}|} \\
P_{32} = a_3\,\mathrm{sign}(h_3 - h_2)\sqrt{2g|h_3 - h_2|} \approx a_3\,\mathrm{sign}(z_{13} - z_{12})\sqrt{2g|z_{13} - z_{12}|} \\
P_{20} = a_2\sqrt{2g h_2} \approx a_2\sqrt{2g z_{12}}
\end{cases}
\tag{5.32}
$$

where z11, z12, and z13 are the ESO's estimates of the system outputs, i.e., the water level in each tank, as shown in Eq. (5.29). Substituting Eq. (5.32) into Eq. (5.23) gives the expressions for the general system dynamics f as follows:

$$
\begin{cases}
f_1 = -\dfrac{1}{A_T}\, s_{13} P_{13} \\
f_2 = \dfrac{1}{A_T}\,(s_{32} P_{32} - s_{20} P_{20}) \\
f_3 = \dfrac{1}{A_T}\,(s_{13} P_{13} - s_{32} P_{32})
\end{cases}
\tag{5.33}
$$

where AT is the circular cross-sectional area of each tank (assumed the same for all). Note that bo is the reciprocal of AT. Furthermore, if the exact plant model were known, the degree of each fault for the three-tank system could be easily determined by

$$
\begin{cases}
\hat{s}_{13} = -\dfrac{z_{21} A_T}{P_{13}} \\
\hat{s}_{20} = -\dfrac{(z_{21} + z_{22} + z_{23}) A_T}{P_{20}} \\
\hat{s}_{32} = -\dfrac{(z_{21} + z_{23}) A_T}{P_{32}}
\end{cases}
\tag{5.34}
$$

In the case of an uncertain plant model, not only does fault isolation become more difficult, but degree-of-fault determination also becomes a major task. These issues are addressed in the following two sections.
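Eqs. (5.32) to (5.34) can be checked against each other numerically. The sketch below assumes the known-model case: it computes the pipe terms from level estimates, builds f from chosen coefficients with Eq. (5.33), and recovers the coefficients with Eq. (5.34). The function names and the numerical values are illustrative assumptions, and the code does not guard against a zero pipe term (equal levels), where Eq. (5.34) is undefined.

```python
import math


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pipe_terms(z1, a, g=9.81):
    """Eq. (5.32): pipe terms (P13, P32, P20) from level estimates z1 = (z11, z12, z13)."""
    z11, z12, z13 = z1
    a1, a2, a3 = a
    p13 = a1 * sign(z11 - z13) * math.sqrt(2 * g * abs(z11 - z13))
    p32 = a3 * sign(z13 - z12) * math.sqrt(2 * g * abs(z13 - z12))
    p20 = a2 * math.sqrt(2 * g * z12)
    return p13, p32, p20


def system_dynamics(s, p, area):
    """Eq. (5.33): (f1, f2, f3) from coefficients s = (s13, s32, s20)."""
    s13, s32, s20 = s
    p13, p32, p20 = p
    f1 = -s13 * p13 / area
    f2 = (s32 * p32 - s20 * p20) / area
    f3 = (s13 * p13 - s32 * p32) / area
    return f1, f2, f3


def degree_of_fault(z2, p, area):
    """Eq. (5.34): estimates (s13, s32, s20) from z2 = (z21, z22, z23)."""
    z21, z22, z23 = z2
    p13, p32, p20 = p
    s13 = -z21 * area / p13
    s20 = -(z21 + z22 + z23) * area / p20
    s32 = -(z21 + z23) * area / p32
    return s13, s32, s20


# Round trip with hypothetical values: levels in m, areas in m^2
levels = (6.0, 3.0, 4.0)
p = pipe_terms(levels, a=(1.0, 1.0, 1.0))
f = system_dynamics((0.8, 0.6, 1.0), p, area=3.0)
print(degree_of_fault(f, p, area=3.0))  # recovers approximately (0.8, 0.6, 1.0)
```

In the round trip, f plays the role of a perfectly converged z2; with real ESO estimates the recovered coefficients would carry the estimation error and noise of z1 and z2.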

5.5.2 Fault isolation by means of fuzzy inference and ESO

In addition to the system outputs, the system dynamics, f, used for fault detection can also be used for fault isolation. Referring to Fig. 5.4, when the first fault occurs at t = 10 s, if z21 (the ESO's estimated f1) is positive, z22 (the ESO's estimated f2) is negative, and z23 (the ESO's estimated f3) is negative, then a blockage fault between tanks 1 and 3 (i.e., s13) likely has occurred. When the second fault occurs at t = 20 s, if z21 is positive, z22 is negative, and z23 is positive, then a blockage fault between tanks 3 and 2 (i.e., s32) likely has occurred. These observations suggest that some intuitive logic, better known as fuzzy logic, can be employed to classify the faults. A fuzzy inference system (FIS) consists of input membership functions, output membership functions, and if-then fuzzy logic rules. Among them, constructing proper input membership functions is critical and can be the most difficult step if there is no prior knowledge of how the input data are distributed. The best way to determine the data distribution is through the use of histograms. The FIS's input variables are z21, z22, and z23, which are normalized to the range [–1, 1]. The output variables are the degrees of fault for s13, s32, and s20, which are normalized to the range [0, 1], where "0" represents no fault and "1" represents complete fault. The input membership functions for z21, z22, and z23 are the same: LNG (large negative), SNG (small negative), and POS (positive). The output membership functions for faults s13, s32, and s20 are also the same: normal and faulty. The crisp input variables are first fuzzified and then processed by the fuzzy logic rules. Afterward, the results are defuzzified into the range between 0 and 1, which indicates a fault occurrence confidence between 0% and 100%.
The six if-then fuzzy rules for a single fault are as follows:

Rule 1: If (z21 is POS) and (z22 is SNG) and (z23 is LNG), then (s13 is faulty) and (s32 is normal) and (s20 is normal).
Rule 2: If (z21 is POS) and (z22 is LNG) and (z23 is LNG), then (s13 is faulty) and (s32 is normal) and (s20 is normal).
Rule 3: If (z21 is POS) and (z22 is LNG) and (z23 is SNG), then (s13 is faulty) and (s32 is normal) and (s20 is normal).
Rule 4: If (z21 is POS) and (z22 is LNG) and (z23 is POS), then (s32 is faulty) and (s13 is normal) and (s20 is normal).
Rule 5: If (z21 is POS) and (z22 is SNG) and (z23 is POS), then (s32 is faulty) and (s13 is normal) and (s20 is normal).
Rule 6: If (z21 is POS) and (z22 is POS) and (z23 is POS), then (s20 is faulty) and (s13 is normal) and (s32 is normal).

The FIS essentially gives the confidence in a fault occurrence. A component is considered faulty when the confidence is greater than or equal to 80%.

TABLE 5.1 Result of fault isolation and identification

| Pump 1 flow rate, Q1 (L/min) | Pump 2 flow rate, Q2 (L/min) | Assumed degree of fault | Fault occurrence confidence | NN predicted degree of fault | Degree-of-fault error |
|---|---|---|---|---|---|
| 6.5 | 8.25 | s13 = 0.17 | s13 with 96% | 0.1700 | 0% |
| 7 | 9 | s13 = 0.38 | s13 with 96% | 0.3802 | 0.05% |
| 10 | 6.75 | s13 = 0.63 | s13 with 96% | 0.6306 | 0.10% |
| 6.5 | 8.25 | s32 = 0.12 | s32 with 96% | 0.1200 | 0% |
| 7 | 9 | s32 = 0.44 | s32 with 96% | 0.4403 | 0.07% |
| 10 | 6.75 | s32 = 0.57 | s32 with 96% | 0.5704 | 0.07% |
| 6.5 | 8.25 | s20 = 0.23 | s20 with 96% | 0.2301 | 0.04% |
| 7 | 9 | s20 = 0.45 | s20 with 96% | 0.4503 | 0.06% |
| 10 | 6.75 | s20 = 0.70 | s20 with 96% | 0.7009 | 0.13% |

NN: Neural network.
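The six rules of this section and the 80% decision threshold can be sketched in code. This is a simplified illustration, not the chapter's FIS: the membership-function shapes and breakpoints below are assumptions (the text only names LNG, SNG, and POS), rule strength is taken as the minimum of the antecedent memberships, and the output membership functions and defuzzification step are omitted, so the confidence here is simply the strongest rule for each fault.

```python
def ramp_up(x, a, b):
    """0 for x <= a, 1 for x >= b, linear in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def ramp_down(x, a, b):
    """1 for x <= a, 0 for x >= b, linear in between."""
    return 1.0 - ramp_up(x, a, b)


# Hypothetical input membership functions on the normalized range [-1, 1]
MEMBERSHIP = {
    "LNG": lambda x: ramp_down(x, -0.7, -0.3),
    "SNG": lambda x: min(ramp_up(x, -0.7, -0.3), ramp_down(x, -0.3, 0.1)),
    "POS": lambda x: ramp_up(x, 0.0, 0.2),
}

# (labels for z21, z22, z23) -> faulty component, following Rules 1 to 6
RULES = [
    (("POS", "SNG", "LNG"), "s13"),
    (("POS", "LNG", "LNG"), "s13"),
    (("POS", "LNG", "SNG"), "s13"),
    (("POS", "LNG", "POS"), "s32"),
    (("POS", "SNG", "POS"), "s32"),
    (("POS", "POS", "POS"), "s20"),
]


def fault_confidence(z21, z22, z23):
    """Return a confidence in [0, 1] for each fault (max over rule strengths)."""
    conf = {"s13": 0.0, "s32": 0.0, "s20": 0.0}
    for labels, fault in RULES:
        strength = min(MEMBERSHIP[lab](val)
                       for lab, val in zip(labels, (z21, z22, z23)))
        conf[fault] = max(conf[fault], strength)
    return conf


def isolate(z21, z22, z23, threshold=0.8):
    """Return the faults whose confidence reaches the decision threshold."""
    conf = fault_confidence(z21, z22, z23)
    return [name for name, c in conf.items() if c >= threshold]
```

For example, `isolate(0.5, -0.3, -0.9)` fires Rule 1 at full strength with these assumed membership functions and returns `['s13']`. Membership functions fitted to histograms of real data, as the text recommends, would replace the hand-picked breakpoints.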

5.5.3 Fault identification via neural networks

With the given three-tank system, incipient faults are likely to occur, which requires monitoring and determining the degree of fault at all times. However, the degree of fault, in theory, cannot be determined unless the exact plant model is known. The only alternative is to use experimental data. In the absence of experimental data, simulation data were generated using Eq. (5.2). With these data, a back-propagation neural network for each fault was trained on randomly selected inputs and their corresponding outputs via the Matlab Neural Network Toolbox. The input variables of each neural network are the pump flow rates, whereas the output variable is the degree of fault between 0 and 1. As soon as a fault is isolated, the respective neural network is fired to instantly predict the degree of fault.

Table 5.1 shows examples of single faults in which the FIS was able to isolate all faults with 96% confidence, which was the maximum output value by design. The error of each predicted degree of fault was extremely small, at most 0.13% in the cases listed. In this simulation, the system input variables are the pump rates: Q1 = 6 liter/min and Q2 = 4 liter/min. To demonstrate the ESO's effectiveness in filtering noise, 5% white noise was added to each input variable. More studies on model-free fault diagnosis can be found in other works [21–23].

5.6 Simultaneous faults of different types

In 2007, Lin and Singh [21] developed an intelligent model-free diagnosis technique for multiple faults in a nonlinear dynamic system; however, the technique was limited to process faults of the same type. Later, Zhang et al. [22] investigated the isolation of process faults and sensor faults for a class of nonlinear uncertain systems but did not address the more difficult issue of isolating actuator faults from process faults. In fact, our literature search found no studies on simultaneous actuator faults and process faults. This study investigates the complexity due to the simultaneous occurrence of process faults, actuator faults, and sensor faults, and proposes a method to isolate actuator faults from process faults. To better explain how a system behaves with simultaneous faults of different types, a strongly coupled MIMO three-tank dynamic system is used in this study.

5.6.1 Isolation of process faults

In the given three-tank dynamic system, there are three possible types of faults: process faults, sensor faults, and actuator faults. The three process faults in this case refer to the pipe blockage for each tank, which was investigated and presented earlier in this chapter. The sensor faults refer to faults in sensing the water level of each tank, whereas the actuator faults refer to failure in pumping the right amount of water to tanks 1 and 2. Isolation of simultaneous process faults, in general, is not a difficult task. However, it can become complex for a strongly coupled dynamic system such as the three-tank system. In that case, hard computing alone cannot accomplish the isolation task, and the isolation can be made more reliable with the aid of soft computing.

5.6.2 Isolation of sensor faults

The occurrence of a sensor fault typically causes a bias in the measurements of the affected sensor. The sensor faults investigated here were introduced via an instantaneous numerical offset at a specific time after the system reached its steady state. The unique behavior of the sensor fault that distinguishes it from the actuator and process faults can be explained mathematically via the matrix algebra of the ESO described in Eq. (5.5), which is repeated here:

$$
\begin{cases}
\dot{z}_1 = z_2 + l_1 (x_1 - z_1) + bu \\
\dot{z}_2 = l_2 (x_1 - z_1)
\end{cases}
$$

For multiple-output systems with n variables, the variables x1, z1, and z2 are column vectors with n elements. Correspondingly, the observer gains l1 and l2 are n × n diagonal matrices in which each nonzero element represents the gain corresponding to a specific element of (x1 – z1). In the event of a sensor fault, the immediate change in one of the measured variables causes one of the elements of (x1 – z1) to become nonzero (as the ESO values previously matched the measurements, all elements in (x1 – z1) were zero). After multiplication by the diagonal gain matrix, the resulting vector also contains only one nonzero element, with the same index.

Sensor faults demonstrate distinct patterns in both the observable state variables and the extended state variables, which makes them relatively easy to detect and separate from process and actuator faults. Specifically, if the i-th sensor associated with the i-th observable state variable has a fault, the associated i-th extended state variable will have a spike corresponding to the step change in the observed state variable. The changes in the other observable and extended state variables are negligible.

Using the same three-tank system, the following data were used to perform the simulation:

Initial heights ho = [6 3 4] m
Bandwidth ωo = 10
Tank area At = 3 m²
Simulation period = 90 s
Acceleration of gravity g = 9.81 m/s²
Pipe areas a1 = a2 = a3 = 1 m²
Flux coefficients or blockage s13 = s32 = s20 = 1 unless specified otherwise
Pump inputs Q1 = Q2 = 5 m³/s unless specified otherwise

Figs. 5.6 and 5.7 exhibit abrupt changes corresponding to the occurrence of three consecutive sensor faults.

FIGURE 5.6 Water level spikes due to consecutive sensor faults at 25, 50, and 75 s. [Plot titled "Height over time": tank height (m) versus time (s), 0 to 90 s, showing H1, H2, H3 and the ESO estimates Z11 (~H1), Z12 (~H2), Z13 (~H3).]

FIGURE 5.7 Z2 value spikes responding to the sensor faults at 25, 50, and 75 s. [Plot titled "Z2 value over time": Z21, Z22, Z23 versus time (s), 0 to 90 s.]

Fig. 5.7 shows how the Z2 values that track the extended state variables responded to the three sensor faults occurring sequentially at three different times. Each time a fault occurred, only the corresponding observable state and extended state variable exhibited changes (a step and a spike), which clearly demonstrates the isolated nature of sensor faults. Despite multiple faults occurring (each in a different tank), each one is isolated to its respective variables. Thus, when a sensor fault occurs alone, a remark can be made as follows.

Remark 1. If the i-th sensor experiences a fault, only the i-th elements of the observable state variables and the extended state variables are affected, even when multiple sensor faults occur.
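The channel-wise isolation stated in Remark 1 follows directly from the diagonal gains and can be illustrated with a small simulation. The sketch below is not the simulation behind Figs. 5.6 and 5.7: it assumes the true levels stay constant at a steady state, uses hypothetical per-channel values of b·u, and applies a step bias to one measurement. The function name `simulate_sensor_fault` is illustrative.

```python
def simulate_sensor_fault(faulty=1, bias=1.0, omega_o=10.0,
                          t_end=10.0, t_fault=5.0, dt=0.001):
    """Apply a step bias to one level measurement and run three decoupled ESO channels.

    Returns (peak, z1): peak[i] is the largest excursion of z2[i] from its
    steady-state value, and z1 holds the final level estimates.
    """
    levels = [6.0, 3.0, 4.0]        # true levels, held constant (steady state assumed)
    bu = [0.5, 0.5, 0.0]            # hypothetical b*u for each channel
    l1, l2 = 2.0 * omega_o, omega_o ** 2
    z1 = list(levels)
    z2 = [-v for v in bu]           # steady state of the ESO: 0 = z2 + b*u
    peak = [0.0, 0.0, 0.0]
    for step in range(int(round(t_end / dt))):
        t = step * dt
        for i in range(3):
            offset = bias if (i == faulty and t >= t_fault) else 0.0
            err = (levels[i] + offset) - z1[i]   # only the faulty channel sees an error
            z1[i] += dt * (z2[i] + l1 * err + bu[i])
            z2[i] += dt * (l2 * err)
            peak[i] = max(peak[i], abs(z2[i] + bu[i]))
    return peak, z1


peak, z1 = simulate_sensor_fault(faulty=1)
print(peak)  # only the second entry is nonzero
print(z1)    # the second estimate follows the biased measurement
```

Because each channel has its own scalar gains, the untouched channels never see a nonzero error, which is the mechanism described in the text. In a coupled plant the true levels would also be unaffected by a pure sensor bias, so the same reasoning would carry over.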

5.6.3 Isolation of actuator faults

It should be noted here that the given three-tank system is a MIMO system with two inputs and three outputs. The two inputs are the pump rates, and the three outputs are the water levels of the tanks. Sensors are available only to measure the output variables (i.e., the water levels). When an actuator fault alone occurs, the values of the associated extended state variable calculated by the ESO do not converge to the expected value. This is because the ESO uses the system input u to calculate the extended state variables. When an actuator fault occurs, the actual input deviates from the designed theoretical value in u, which causes the extended state variable to change at the equilibrium. Likewise, the occurrence of a process fault alone can cause changes in the state variables, which lead to changes in the extended state variables. This further complicates the isolation of simultaneous process faults and actuator faults.

5.7 Isolation of simultaneous process faults and actuator faults

A process or actuator fault usually affects the values of more than one observable variable and extended state variable as calculated by the ESO. Isolating process faults from actuator faults involves complex coupling effects for a typical MIMO system. The isolation of process faults and actuator faults is closely examined in the following.

5.7.1 Characteristics of process faults and actuator faults

To distinguish actuator faults from process faults, the way they affect the convergence of the extended state variables calculated by the ESO must be examined. Process faults alter the dynamics of the system, and thus the final steady state values of the state variables (in this case the tank heights) are different. Actuator faults cause unexpected discrepancies in the system input and thus also result in different steady state variable values. However, due to the calculation method of the ESO, the steady state values of the extended state variables z2 are also affected. At steady state, all time derivatives become zero and the ESO's estimated values match the measured values, which causes Eq. (5.5) to reduce to the following:

$$
0 = z_2 + l_1 (0) + bu \quad (\text{i.e., } z_2 = -bu)
$$

In the case of an actuator fault, the values in vector u will be affected by the fault and deviate from the designed or expected theoretical values. However, the ESO will not know this and will still use the theoretical values. This will cause the elements of z2 to converge to incorrect values, which do not match the steady state values of the corresponding functions f1, f2, …, fn. Without the ability to compare z2 to f (in most situations, f, which represents the plant dynamics, must be assumed unknown), this discrepancy cannot be observed, and thus the only distinguishing characteristic of actuator faults is unobservable.

Recall that a general nonlinear system that can be modeled by the ESO is expressed as

$$
y^{(n)} = f + bu,
$$

where $y^{(n)}$ denotes the n-th time derivative of y, f is essentially the extended state representing the lumped nonlinear time-varying function of the plant dynamics and external disturbances, u is the system input, and b is a constant that is related to the physical model. The occurrence of a process fault causes a change in f, whereas an actuator fault causes a change in u. However, either type of fault results in a net change to the same variable, $y^{(n)}$, which updates the system state. This makes it possible for both types of faults to produce quite similar behavior in the system state and extended state variables in the case of a single fault, or to cause the appearance of an undisturbed system in the case of simultaneous actuator faults and process faults. Mathematically, this fault ambiguity can be represented by Δ:

$$
y^{(n)} = f + bu + \Delta
\tag{5.35}
$$

For actuator faults or process faults, Δ represents a change in either the input, b(u + Δu), or the system dynamics, f + Δf. The fact that Δ could be either bΔu or Δf allows either type of single fault to produce similar states if bΔu = Δf. Furthermore, simultaneous process faults and actuator faults can result in a seemingly undisturbed state if bΔu = –Δf. The computer simulation supports this mathematical finding. Figs. 5.8 and 5.9 show how false isolation could be concluded when an actuator fault and process faults occur at the same time (t = 25 s). Thus, the following remark can be made.

Remark 2. For a typical MIMO nonlinear system, actuator faults, in general, cannot be isolated from process faults unless one or more additional sensor measurements are made.
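The ambiguity expressed by Eq. (5.35) can be reproduced on a toy channel. The sketch below is an illustration under stated assumptions rather than the three-tank case of Figs. 5.8 and 5.9: a single first-order plant with hypothetical constants `K`, `B`, and `U_NOM`, an ESO that only knows the nominal input, and step faults applied at t = 25 s. The function name `run_case` is illustrative.

```python
K, B, U_NOM = 0.5, 1.0, 1.0   # hypothetical plant constants and nominal input


def run_case(delta_u, delta_f, omega_o=10.0, t_end=50.0, t_fault=25.0, dt=0.005):
    """Return the history of z2 for an actuator fault delta_u and a process fault delta_f."""
    l1, l2 = 2.0 * omega_o, omega_o ** 2
    y = B * U_NOM / K             # plant starts at steady state
    z1, z2 = y, -B * U_NOM        # ESO starts at its steady state (z2 = -b*u)
    n_fault = int(round(t_fault / dt))
    hist = []
    for i in range(int(round(t_end / dt))):
        faulted = i >= n_fault
        u_actual = U_NOM + (delta_u if faulted else 0.0)
        f = -K * y + (delta_f if faulted else 0.0)
        err = y - z1
        z1 += dt * (z2 + l1 * err + B * U_NOM)   # the ESO only knows the nominal input
        z2 += dt * (l2 * err)
        y += dt * (f + B * u_actual)
        hist.append(z2)
    return hist


def max_gap(a, b):
    """Largest absolute difference between two equally long histories."""
    return max(abs(x - y) for x, y in zip(a, b))


nominal = run_case(0.0, 0.0)
actuator_only = run_case(0.2, 0.0)
process_only = run_case(0.0, B * 0.2)      # delta_f = b * delta_u
combined = run_case(0.2, -B * 0.2)         # delta_f = -b * delta_u

print(max_gap(actuator_only, process_only))  # close to zero: indistinguishable
print(max_gap(combined, nominal))            # close to zero: faults mask each other
print(max_gap(actuator_only, nominal))       # clearly nonzero: a single fault is visible
```

Since the observer sees only the measured output and the nominal input, any two fault combinations that produce the same net change in the derivative of y are indistinguishable to it, which is the point of Remark 2.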

FIGURE 5.8 Water levels changed when an actuator fault and a process fault are introduced at 25 s. [Plot titled "Height over time": tank height (m) versus time (s), 0 to 90 s, showing H1, H2, H3 and the ESO estimates Z11 (~H1), Z12 (~H2), Z13 (~H3).]

FIGURE 5.9 Z2 values that tracked ESO show virtually no changes due to combined faults. [Plot titled "Z2 value over time": Z21, Z22, Z23 versus time (s), 0 to 90 s.]

5.7.2 Utilizing an outflow sensor to isolate actuator faults

As discussed earlier, at least one sensor measurement must be added to resolve the ambiguity between process faults and actuator faults. Taking the presented three-tank system as an example, one can add an outflow sensor at the very end of the pipe (right outside the right-hand side of tank 2) to measure the system's net outflow. At steady state, conservation of mass dictates that the outflow of the three-tank system must equal the sum of the inputs, Q1 + Q2 (i.e., the pump flow rates). The theoretical steady state value of this quantity depends only on system specifications, and the actual outflow must always converge to the theoretical value except in the event of an actuator fault. As no nonactuator fault can affect Q1 or Q2, measuring the net outflow provides a means to isolate actuator faults. This concept works only if the added outflow sensor itself is not faulty. In fact, there is a simple way to determine whether the outflow sensor itself is faulty. After the system reaches the steady state, if the net outflow does not equal Q1 + Q2 but there is no noticeable disturbance in the observable state and the extended state variables, then it is likely that the outflow sensor is faulty. However, if the net outflow does not equal Q1 + Q2 and there is a noticeable disturbance in those variables, then there is a fault in actuator 1 or 2. The only means to isolate one actuator fault from the other is to directly measure each actuator's output.
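The steady-state decision logic for the outflow sensor can be summarized as a small function. This is a minimal sketch of the reasoning in this section; the function name `classify_outflow`, the relative tolerance `rel_tol`, and the returned labels are illustrative assumptions, and the check is only meaningful once the system has reached steady state.

```python
def classify_outflow(outflow, q1, q2, states_disturbed, rel_tol=0.02):
    """Classify the steady-state outflow measurement.

    outflow: measured net outflow at steady state
    q1, q2: designed pump flow rates (same units as outflow)
    states_disturbed: True when the observable or extended state variables
        show a noticeable disturbance
    """
    expected = q1 + q2                      # conservation of mass at steady state
    mismatch = abs(outflow - expected) > rel_tol * expected
    if not mismatch:
        return "no actuator fault indicated"
    if states_disturbed:
        return "actuator fault in pump 1 or pump 2"
    return "outflow sensor fault suspected"


print(classify_outflow(10.0, 5.0, 5.0, states_disturbed=False))
print(classify_outflow(8.5, 5.0, 5.0, states_disturbed=True))
print(classify_outflow(8.5, 5.0, 5.0, states_disturbed=False))
```

As the text notes, this logic cannot tell which pump is faulty; that would require a direct measurement of each actuator's output.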

5.8 Conclusion and future work

This study mathematically proved that the ESO's estimation error is upper bounded and that its upper bound decreases monotonically with the observer bandwidth. This important proof allows the improved ESO to be applied as an effective means for FDI. The main advantage of the presented FDI technique is its robustness against uncertainty in the plant dynamics as well as disturbances. The parameterized ESO, which requires tuning of only a single parameter (the observer bandwidth), is easy to implement in fault diagnosis. The bandwidth, which affects the tracking speed and the sensitivity to measurement noise, can be easily tuned to meet the individual need for diagnosis.

From the model-based FDI point of view, the issue of how much knowledge about a nonlinear dynamic system is needed has been of great interest to researchers for years. This study concludes that ESO-based fault detection requires little knowledge about the plant model, not much beyond the order of the system. The ESO-based fuzzy inference proved to be an effective means for fault isolation. The fuzzy inference is particularly good at handling uncertainty in the plant model. Furthermore, this study went beyond traditional FDI by adding the capability of determining the degree of fault via neural networks. Such capability is particularly important for the diagnosis of incipient faults.

The detection and isolation of process faults by means of ESO and fuzzy inference have been presented. This study was conducted via computer simulation in which the equations for the three-tank system were used to calculate the theoretical values. To simulate unmodeled dynamics, 5% to 10% of external disturbance was introduced. The ESO was found capable of filtering system noise and correctly detecting process faults even when the system was not correctly modeled (when using bo = 635 as opposed to the exact value of 127). However, in reality, process faults could be accompanied by sensor faults and/or actuator faults. This study investigated the complexity of fault isolation for simultaneous process faults, actuator faults, and sensor faults. Among them, sensor faults can be easily detected and isolated. However, for a strongly coupled MIMO nonlinear system, the combination of process faults and actuator faults exhibits complex coupling effects because these two types of faults affect the values of both the observable state and the extended state variables. For the presented three-tank system, a remedy was proposed to isolate actuator faults from process faults. Future work will include developing a general methodology to isolate actuator faults from process faults for a strongly coupled dynamic system.

References

[1] P. M. Frank, S. X. Ding, T. Marcu. Model-based fault diagnosis in technical processes. Transactions of the Institute of Measurement and Control 2000;22(1):57–101.
[2] V. Venkatasubramanian, R. Rengaswamy, K. Yin, S. N. Kavuri. A review of process fault detection and diagnosis: Part I. Quantitative model-based methods. Computers & Chemical Engineering 2003;27(3):293–311.
[3] P. P. Lin, X. Li. Fault diagnosis, prognosis and self-reconfiguration for nonlinear dynamic system using soft computing techniques. In Proceedings of the 2006 IEEE Conference on Systems, Man, and Cybernetics.
[4] R. Tarantino, F. Szigeti, E. Colina-Morles. Generalized Luenberger observer-based fault-detection filter design: An industrial application. Control Engineering Practice 2000;8:665–671.
[5] S. K. Dash, R. Rengaswamy, V. Venkatasubramanian. Fault diagnosis in a nonlinear CSTR using observers. In Proceedings of the 2001 Annual AIChE Meeting. Paper 282i.
[6] O. A. Z. Sotomayor, D. Odloak. Observer-based fault diagnosis in chemical plants. Chemical Engineering Journal 2005;112:93–108.
[7] P. M. Frank. Advanced fault detection and isolation schemes using nonlinear and robust observers. In Proceedings of the 10th IFAC World Congress. 1987.
[8] P. M. Frank. Online fault-detection in uncertain nonlinear-systems using diagnostic observers—A survey. International Journal of Systems Science 1994;25(12):2129–2135.
[9] P. M. Frank, X. Ding. Survey of robust residual generation and evaluation methods in observer-based fault detection systems. Journal of Process Control 1997;7(6):403–424.
[10] R. Isermann. Model-based fault-detection and diagnosis—Status and applications. Annual Reviews in Control 2005;29:71–85.
[11] V. F. Filareretov, M. K. Vukobratovic. Observer-based fault diagnosis in manipulation robots. Mechatronics 1999;9:929–939.
[12] A. Xu, Q. Zhang. Nonlinear system fault diagnosis based on adaptive estimation. Automatica 2004;40:1181–1193.
[13] M. Fang, Y. Tian, L. Guo. Fault diagnosis of nonlinear system based on generalized observer. Applied Mathematics & Computation 2007;185:1131–1137.
[14] H. B. Wang, J. L. Wang. Robust fault detection observer design: Iterative LMI approaches. ASME Journal of Dynamic Systems, Measurement & Control 2007;129:77–82.
[15] A. Radke. On Disturbance Estimation and Its Application on Health Monitoring. Ph.D. Dissertation, Cleveland State University, 2006.
[16] J. Han. A class of extended state observers for uncertain systems. Control & Decision 1995;10(1):85–88 (in Chinese).
[17] J. Han. Nonlinear design methods for control systems. In Proceedings of the 14th IFAC World Congress. 1999.
[18] Z. Gao. Scaling and parameterization based controller tuning. In Proceedings of the 2003 American Control Conference. 4989–4996.
[19] Q. Zheng, L. Q. Gao, Z. Gao. On estimation of plant dynamics and disturbance from input-output data in real time. In Proceedings of the 2007 IEEE Multi-Conference on Systems and Control. 1167–1172.
[20] Z. Gao. Active disturbance rejection control: A paradigm shift in feedback control system design. In Proceedings of the 2006 American Control Conference. 4989–4996.
[21] P. P. Lin, H. Singh. Intelligent model-free diagnosis for multiple faults in a nonlinear dynamic system. In Proceedings of the 2007 IEEE/ASME Conference on Advanced Intelligent Mechatronics (AIM).
[22] X. Zhang, M. M. Polycarpou, T. Parisini. Isolation of process and sensor faults for a class of nonlinear uncertain system. In Proceedings of the 2008 American Control Conference.
[23] P. Zhang. A model-free approach to fault diagnosis of continuous-time systems based on time domain data. International Journal of Automation & Computing 2007;4(2):189–194.
[24] M. Kowal, J. Korbicz. Fault detection under fuzzy model uncertainty. International Journal of Automation & Computing 2007;4(2):117–124.

Chapter 6

Fault diagnosis and failure prognosis in hydraulic systems

Jie Liu a, Yanhe Xu a, Kaibo Zhou b and Ming-Feng Ge c

a School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan, China.
b School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China.
c School of Mechanical Engineering and Electronic Information, China University of Geosciences, Wuhan, China

Fault Diagnosis and Prognosis Techniques for Complex Engineering Systems. DOI: 10.1016/B978-0-12-822473-1.00011-2 Copyright © 2021 Elsevier Inc. All rights reserved.

6.1 Application status of sensor detection technology

Research on hydraulic machinery is based on the energy conversion relationship between water and machinery. Its main research objects are the hydraulic, dynamic, and structural characteristics of hydraulic machinery, and its main task is to ensure the efficient, safe, and stable operation of various hydraulic machines. The main research topics are the energy characteristics, cavitation characteristics, and operational stability of hydraulic machinery [1]. Sensor detection technology is an important means for researchers to obtain information on the operating status of hydraulic units. This chapter introduces the relevant standards for hydraulic machinery sensor detection, two typical application scenarios (model test rigs and field prototypes), and the development and application status of sensor detection technology.

6.1.1 Relevant standards of hydraulic machinery sensor detection technology

A sensor detection system can obtain online status information of equipment in real time, record the operation data of the equipment comprehensively, reduce the downtime of the unit, and find fault symptoms in advance. This has always been a research hotspot in the industry. The extensive application of sensor detection technology can improve the accuracy of test experiments, reduce errors, extend the preliminary test period of equipment, and provide a decision-making basis for the power plant to carry out condition-based maintenance and optimized maintenance. Industrial organizations such as the China Electrical Equipment Industry Association, the China Water Turbine Standardization Technical Committee, and the International Electrotechnical Commission (IEC) pay close attention to hydraulic machinery sensor detection systems and have successively issued a series of related standards, the most important of which are listed in Table 6.1.

TABLE 6.1 Relevant standards of hydraulic machinery sensor detection technology.

ID | Name
GB/T15613 | Model acceptance test of hydraulic turbine, energy storage pump, and pump turbine
GB/T10969 | Technical conditions for flow components of hydraulic turbines
GB/T15613 | Regulations on site acceptance test for hydraulic performance of hydraulic turbines, energy storage pumps, and pump turbines
SL142-2008 | Test rules for muddy water acceptance of turbine models
IEC60041 | Field acceptance tests to determine the hydraulic performance of hydraulic turbines, storage pumps, and pump turbines
IEC60193 | Hydraulic turbines, storage pumps, and pump-turbines: model acceptance tests
IEC60308 | Hydraulic turbines: testing of control systems
IEC994 | Guide for field measurements of vibrations and pulsations in hydraulic machines

6.1.2 Instrumentation for the hydraulic turbine prototype

Hydraulic machinery is a discipline based mainly on experimental science. High-precision hydraulic machinery test equipment and sensor testing technology are necessary conditions for the design and verification of new hydraulic machinery products, and are also important means of studying and solving abnormal failures in actual field operation [2].

(1) Brief introduction to the hydraulic machinery model test bench. The high-precision hydraulic machinery model test bench is an indispensable tool for advancing the technology of the hydraulic machinery discipline. The main universal hydraulic machinery model test benches worldwide include those of VOITH, ALSTOM, EPFL, Rainpower, ANDRITZ, the Chinese Institute of Water Resources and Hydropower Research (IWHR), Harbin Institute of Electrical Machinery (HEC), and Dongfang Electric Co., Ltd. (DEC). At present, the highest test head of the hydraulic machinery model test bench at an internationally advanced level can reach 150 m, the maximum test flow can reach 2.2 m³/s, and the comprehensive efficiency error of the efficiency test


TABLE 6.2 Comparison of parameters of the test bench for an advanced-level hydraulic machinery model.

Test bench | VOITH UHD2-2 | ALSTOM T3 | EPFL PF2 | IWHR-TP1 | HEC-H | DF-100
Highest test head/m | 240(T)/250(P) | 100(T)/150(P) | 120 | 150 | 150 | 100
Test flow/(m³/s) | 1.5 | 0.9 | 1.4 | 2.2 | 2.0 | 1.5
Model runner diameter/mm | 300–500 | 300–500 | 300–500 | 250–500 | 300–500 | 350–500
Power/kW | 600 | 360 | 300 | 540 | 500 |
Rotate speed/rpm | 2000 | 1000–2400 | 2500 | 2600 | 300–2500 | 3000
Water supply pump | 2 | | | 24SA-10 | 24SA-10B | 2
Motor power/kW | 1600 | | | 724*2 | 600*2 | 700
Motor rotate speed/rpm | 1490 | | | 1200 | |
Integrated efficiency error/% | ±0.2 | ±0.25 | | | |

0 ⎩ 0, yci ≤ 0

(3) A classifier based on softmax regression. After multiple feature extractions, diagnosis features are obtained by a linear transformation, and the detailed formula of softmax is shown in the following. Suppose $\theta$ is a parameter matrix and $x$ is the input of softmax. The input of the normalized exponential function, $z$, is obtained by matrix multiplication between $\theta$ and $x$. The exponential function is applied to each element of $z^{(i)}$, and the probability of each category is then calculated by dividing the corresponding exponential by the sum of all of these exponentials. The details are shown in Eq. (7.50) and Eq. (7.51), where $y^{(i)}$ is the label of $z^{(i)}$ and $p(y^{(i)} = k \mid z^{(i)})$ is the probability that $x$ is classified into category $k$.

\[
z = \theta x \tag{7.50}
\]

\[
h\left(z^{(i)}\right) =
\begin{bmatrix}
p\left(y^{(i)} = 1 \mid z^{(i)}\right) \\
p\left(y^{(i)} = 2 \mid z^{(i)}\right) \\
\vdots \\
p\left(y^{(i)} = k \mid z^{(i)}\right)
\end{bmatrix}
= \frac{1}{\sum_{j=1}^{k} e^{z_j^{(i)}}}
\begin{bmatrix}
e^{z_1^{(i)}} \\
e^{z_2^{(i)}} \\
\vdots \\
e^{z_k^{(i)}}
\end{bmatrix}
\tag{7.51}
\]
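The softmax step described by Eqs. (7.50) and (7.51) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name, the 3-by-2 parameter matrix, and the input vector are hypothetical values chosen for the example, and the subtraction of the maximum score is a common numerical-stability device that does not change the result and is not stated in the text.

```python
import numpy as np

def softmax_probabilities(theta, x):
    """Return the category probabilities h(z) for z = theta @ x."""
    z = theta @ x                  # Eq. (7.50): one linear score per category
    z = z - np.max(z)              # stability shift (assumption); probabilities are unchanged
    exp_z = np.exp(z)              # exponential of each element of z
    return exp_z / np.sum(exp_z)   # Eq. (7.51): divide by the sum of all exponentials

# Hypothetical example: 3 categories, 2 input features.
theta = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
x = np.array([0.5, -0.5])

probs = softmax_probabilities(theta, x)
predicted = int(np.argmax(probs))  # index of the most probable category
```

The predicted category is simply the index of the largest probability, which is also the index of the largest score in z.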


FIGURE 7.21 Feature extraction unit.

(4) Architecture with high diagnostic speed and high accuracy. To achieve high diagnostic speed and accuracy, we use a network based on MobileNet (a depthwise separable convolutional network) and experimentally choose the most suitable image size. The basic feature extraction unit of the network based on MobileNetV1 is shown in Fig. 7.21; it consists of a depthwise separable convolution, batch normalization, and ReLU6. The whole network consists of 14 feature extraction units and a softmax classifier. The feature map is subsampled five times, as detailed in Table 7.8. Convolution and depthwise separable convolution with a stride of 2 reduce the size of the feature map the first four times, and a global average pooling operation reduces it the final time.
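To see why a depthwise separable convolution is cheaper than a standard convolution, the sketch below counts kernel weights for one layer and defines the ReLU6 activation used in the feature extraction unit. The function names are hypothetical, and the counts assume weights only (no biases and no batch-normalization parameters). The example channel sizes, 32 inputs and 64 outputs with a 3 × 3 kernel, follow the "3 × 3 × 32 + 32 × 64" notation used for kernel shapes in Table 7.8.

```python
import numpy as np

def standard_conv_weights(k, c_in, c_out):
    """Weights of a standard k x k convolution: one k x k x c_in filter per output channel."""
    return k * k * c_in * c_out

def depthwise_separable_weights(k, c_in, c_out):
    """Weights of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

def relu6(x):
    """ReLU6 activation: clip values to the range [0, 6]."""
    return np.clip(x, 0.0, 6.0)

k, c_in, c_out = 3, 32, 64
std = standard_conv_weights(k, c_in, c_out)        # 3*3*32*64
dsc = depthwise_separable_weights(k, c_in, c_out)  # 3*3*32 + 32*64
ratio = dsc / std                                  # equals 1/c_out + 1/k**2
```

For this layer the separable form needs roughly one eighth of the weights of the standard convolution, which is the source of the speed advantage discussed in the text.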

7.4.3 Experimental analysis

In this experiment, data are sampled from an environment illuminated by an artificial light source. Each category is sampled with 2400 images, for a total of 19,200 images. Examples of images taken under the artificial light source are shown in Fig. 7.22. The data consist of a training set and a test set, which include 6400 and 12,800 images, respectively. Because electrical signals cannot diagnose symmetrical faults, symmetrical faults (20%–40%, 20%–40%, 40%–60%) are set in this part. Table 7.9 shows the detailed percentage of attachment degree and the corresponding classification labels. Since the input size affects the diagnostic speed and accuracy, we find the most suitable input size for each CNN through multiple experiments. The results are shown in Table 7.10, where size: 256 denotes that the image sizes md and nd are both 256. The best input size is 128 for ResNet50 and 96 for MobileNet and InceptionResNetV2. The best performance numbers of each model are marked in bold. Because the adopted network is based on MobileNet, the residual unit is not used in the network. The network based on MobileNetV1 is still affected by the


TABLE 7.8 Detail of the architecture based on MobileNetV1.

Type/Stride | Kernel shape
Conv/s2 | 3 × 3 × 3 × 32
DSCONV/s1 | 3 × 3 × 32 + 32 × 64
DSCONV/s2 | 3 × 3 × 64 + 64 × 128
DSCONV/s1 | 3 × 3 × 128 + 128 × 128
DSCONV/s2 | 3 × 3 × 128 + 128 × 256
DSCONV/s1 | 3 × 3 × 256 + 256 × 256
DSCONV/s2 | 3 × 3 × 256 + 256 × 512
DSCONV/s1 | 3 × 3 × 512 + 512 × 512
DSCONV/s1 | 3 × 3 × 512 + 512 × 512
DSCONV/s1 | 3 × 3 × 512 + 512 × 512
DSCONV/s1 | 3 × 3 × 512 + 512 × 512
DSCONV/s1 | 3 × 3 × 512 + 512 × 512
DSCONV/s1 | 3 × 3 × 512 + 512 × 512
DSCONV/s2 | 3 × 3 × 512 + 512 × 1024
DSCONV/s2 | 3 × 3 × 1024 + 1024 × 1024
Global Avg Pool/s1 | Pooling operation
FC/s1 | 1024 × number of categories
Softmax/s1 | Classifier

FIGURE 7.22 Data sampled from the environment illuminated by the artificial light source.


TABLE 7.9 Diagnostic category label in the work environment.

Percentage of the area occupied by attachment | Softmax classifier label
0%, 0%, 0% | 0
0%–20%, 20%–40%, 0% | 1
0%–20%, 20%–40%, 40%–60% | 2
0%–20%, 40%–60%, 0% | 3
0%–20%, 0%, 0% | 4
20%–40%, 20%–40%, 40%–60% | 5
20%–40%, 0%, 0% | 6
40%–60%, 0%, 0% | 7

TABLE 7.10 Experimental results with different CNNs (accuracy by input size).

CNN | Size: 256 | Size: 224 | Size: 192 | Size: 160 | Size: 128 | Size: 96 | Size: 64
ResNet50 | 83.84% | 76.55% | 80.38% | 81.57% | 89.04% | 85.82% | 57.74%
MobileNet | 86.94% | 81.43% | 85.14% | 88.93% | 91.19% | 93.97% | 92.21%
InceptionResNetV2 | 83.86% | 86.54% | 84.51% | 85.88% | 84.37% | 93.14% |

TABLE 7.11 Experimental results with the number of layers in MobileNetV1.

No. of layers | Accuracy
13 | 93.02%
14 | 93.13%
15 | 93.97%
16 | 94.09%
17 | 94.25%
18 | 93.83%

vanishing gradient, which results in limited network depth. Therefore, we study the effect of the number of convolutional layers on the performance by multiple experiments. The accuracy of 15-layer MobileNet [54] is used as the benchmark, and 17-layer MobileNet presents the best accuracy; the average accuracy results are shown in Table 7.11.


TABLE 7.12 Performance indicator results with different CNNs.

CNN | No. of floating-point operations | No. of parameters | Accuracy
ResNet50 | 47,053,088 | 23,606,153 | 89.04%
CNN based on MobileNetV1 (17-layer MobileNet) | 7,505,631 | 3,778,760 | 94.25%
InceptionResNetV2 | 108,549,709 | 54,350,569 | 93.14%

Performance indicators include accuracy, the number of floating-point operations, and the number of parameters; the details are shown in Table 7.12. Compared with the other networks, the network based on MobileNetV1 has the highest accuracy while requiring the fewest floating-point operations and parameters. To extract distinct features without sacrificing efficiency, we use data compression and normalization to process the raw images taken under harsh working conditions. To overcome the difficulty of extracting features from blurry and dim images, an effective feature extraction method is chosen that consists of three parts: (1) set labels that include imbalance and symmetrical attachment faults; (2) use nearest-neighbor interpolation and normalization to preprocess images; and (3) extract image features through a CNN based on depthwise separable convolution. The identification method of blade attachment based on the depthwise separable CNN has four advantages: (1) it has high diagnostic accuracy and speed, (2) it is suitable for underwater working conditions without a natural light source, (3) it diagnoses both imbalance and symmetrical attachment faults effectively, and (4) it is robust in recognizing blurred images.
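The preprocessing in step (2) can be illustrated with a short NumPy sketch. It is only a sketch under stated assumptions: the function names are hypothetical, the index mapping is one common form of nearest-neighbor interpolation, and scaling 8-bit pixel values to the range [0, 1] is assumed as the normalization, since the text does not specify which normalization is used.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Resize an H x W x C image by nearest-neighbor interpolation."""
    in_h, in_w = image.shape[:2]
    # Map every output row/column back to a source row/column index.
    rows = (np.arange(out_h) * in_h / out_h).astype(int)
    cols = (np.arange(out_w) * in_w / out_w).astype(int)
    return image[rows[:, None], cols]

def normalize(image):
    """Scale 8-bit pixel values to [0, 1] (assumed normalization)."""
    return image.astype(np.float32) / 255.0

# Hypothetical 4 x 4 RGB image compressed to 2 x 2 before being fed to the network.
raw = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
small = resize_nearest(raw, 2, 2)
net_input = normalize(small)
```

Nearest-neighbor interpolation only copies existing pixels, so it is cheap and keeps the compression step from becoming a bottleneck in the diagnostic pipeline.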

7.5 Conclusion and future works

An MCT's rotor and blades are often affected by attachment, which leads to an imbalance fault. Thus, attachment degree diagnosis is an important topic in MCT research. To detect imbalance faults more easily and reduce their influence, an HT-based detection method is introduced in this chapter. The simulation results show that this method can detect imbalance faults from the voltage signal of a direct-drive MCT. To reduce the interference signals generated by turbulence and waves at different water flow velocities, a wavelet threshold denoising–based detection method is introduced to detect the imbalance fault for MCTs. This method detects the imbalance fault automatically and remains stable at different water flow velocities. The experimental results at different water flow velocities with Q statistics show satisfactory imbalance fault detection, with false alarm and false-negative rates less than 1% and 5%, respectively. Only an imbalance fault can be detected based on the electrical signal; if the attachment is evenly distributed on the blades, it is difficult to find the fault.


The identification method of blade attachment based on the sparse autoencoder and softmax regression is introduced to monitor whether the blade is attached by benthos and then to determine the corresponding degree of attachment. The experimental results show that this method can classify the different degrees of biological attachment. To classify the percentage of area occupied by attachment, the identification method based on a depthwise separable CNN is introduced in this chapter. This method offers high diagnostic accuracy and speed, is suitable for an underwater environment with strong currents and complex spatiotemporal variability, and is effective for uniform and symmetrical attachment fault diagnosis; it is also robust in recognizing blurred pictures taken under high-speed rotation. In the future, image features and electrical features should be combined to diagnose uniform faults and symmetrical attachment faults.

References

[1] Z. Ren, Y. Wang, H. Li, X. Liu, Y. Wen, W. Li, A coordinated planning method for micrositing of tidal current turbines and collector system optimization in tidal current farms, IEEE Transactions on Power Systems 34 (1) (2018) 292–302.
[2] Y. Dai, Z. Ren, K. Wang, W. Li, Z. Li, W. Yan, Optimal sizing and arrangement of tidal current farm, IEEE Transactions on Sustainable Energy 9 (1) (2017) 168–177.
[3] O.A.L. Brutto, M.R. Barakat, S.S. Guillou, J. Thiébot, H. Gualous, Influence of the wake effect on electrical dynamics of commercial tidal farms: Application to the Alderney Race (France), IEEE Transactions on Sustainable Energy 9 (1) (2017) 321–332.
[4] M.R. Barakat, B. Tala-Ighil, H. Chaoui, H. Gualous, Y. Slamani, D. Hissel, Energetic macroscopic representation of a marine current turbine system with loss minimization control, IEEE Transactions on Sustainable Energy 9 (1) (2017) 106–117.
[5] S.B. Chabane, M. Alamir, M. Fiacchini, R. Riah, T. Kovaltchouk, S. Bacha, Electricity grid connection of a tidal farm: An active power control framework constrained to grid code requirements, IEEE Transactions on Sustainable Energy 9 (4) (2018) 1948–1956.
[6] Z. Li, N. Maki, T. Ida, M. Miki, M. Izumi, Comparative study of 1-MW PM and HTS synchronous generators for marine current turbine, IEEE Transactions on Applied Superconductivity 28 (4) (2018) 1–5.
[7] G.L. Wick, W.R. Schmitt, R. Clarke, Harvesting Ocean Energy (1981).
[8] P.A. Lynn, Electricity from Wave and Tide: An Introduction to Marine Energy, John Wiley & Sons, West Sussex, UK, 2013.
[9] H.T. Pham, J.M. Bourgeot, M. Benbouzid, Fault-tolerant finite control set-model predictive control for marine current turbine applications, IET Renewable Power Generation 12 (4) (2017) 415–421.
[10] Z. Ren, H. Li, W. Li, X. Zhao, Y. Sun, T. Li, F. Jiang, Reliability evaluation of tidal current farm integrated generation systems considering wake effects, IEEE Access 6 (2018) 52616–52624.
[11] Z. Zhou, M. Benbouzid, J.F. Charpentier, F. Scuiller, T. Tang, Developments in large marine current turbine technologies – A review, Renewable and Sustainable Energy Reviews 71 (2017) 852–858.


[12] M. Zhang, T. Tang, T. Wang, Multi-domain reference method for fault detection of marine current turbine, in: Proceedings of IECON 2017 – The 43rd Annual Conference of the IEEE Industrial Electronics Society, IEEE, Los Alamitos, CA, 2017, pp. 8087–8092.
[13] R. Rosli, E. Dimla, A review of tidal current energy resource assessment: Current status and trend, in: Proceedings of the 5th International Conference on Renewable Energy: Generation and Applications (ICREGA), IEEE, Los Alamitos, CA, 2018, pp. 34–40.
[14] X. Yang, N. Liu, P. Zhang, Z. Guo, C. Ma, P. Hu, X. Zhang, The current state of marine renewable energy policy in China, Marine Policy 100 (2019) 334–341.
[15] A. Uihlein, D. Magagna, Wave and tidal current energy – A review of the current state of research beyond technology, Renewable & Sustainable Energy Reviews 58 (2016) 1070–1081.
[16] A. Mérigaud, J.V. Ringwood, Condition-based maintenance methods for marine renewable energy, Renewable & Sustainable Energy Reviews 66 (2016) 53–78.
[17] T. Flanagan, J. Maguire, C.M. Ó Brádaigh, P. Mayorga, A. Doyle, Smart affordable composite blades for tidal energy, Proceedings of the 11th European Wave and Tidal Energy Conference (EWTEC) (2015) 6–11.
[18] M. Mueller, R. Wallace, Enabling science and technology for marine renewable energy, Energy Policy 36 (12) (2008) 4376–4382.
[19] J.M. Walker, K.A. Flack, E.E. Lust, M.P. Schultz, L. Luznik, Experimental and numerical studies of blade roughness and fouling on marine current turbine performance, Renewable Energy 66 (2014) 257–267.
[20] B. Polagye, B. Van Cleve, A. Copping, K. Kirkendall, Environmental effects of tidal energy development, Proceedings of the Tidal Energy Workshop (2011).
[21] T. Wang, J. Qi, H. Xu, Y. Wang, L. Liu, D. Gao, Fault diagnosis method based on FFT-RPCA-SVM for cascaded-multilevel inverter, ISA Transactions 60 (2016) 156–163.
[22] M. Zhang, T. Wang, T. Tang, M. Benbouzid, D. Diallo, Imbalance fault detection of marine current turbine under condition of wave and turbulence, in: Proceedings of IECON 2016 – The 42nd Annual Conference of the IEEE Industrial Electronics Society, IEEE, Los Alamitos, CA, 2016, pp. 6353–6358.
[23] A.N. Einrí, G.M. Jónsdóttir, F. Milano, Modeling and control of marine current turbines and energy storage systems, IFAC-PapersOnLine 52 (4) (2019) 425–430.
[24] G. Keenan, C. Sparling, H. Williams, F. Fortune, SeaGen Environmental Monitoring Programme: Final Report, Marine Current Turbines, Northern Ireland, UK, 2011.
[25] W. Li, H. Zhou, H. Liu, Y. Lin, Q. Xu, Review on the blade design technologies of tidal current turbine, Renewable & Sustainable Energy Reviews 63 (2016) 414–422.
[26] H. Titah-Benbouzid, M.E.H. Benbouzid, Biofouling issue on marine renewable energy converters: A state of the art review on impacts and prevention, International Journal on Energy Conversion 5 (3) (2017) 67–78.
[27] X. Sheng, S. Wan, L. Cheng, Y. Li, Blade aerodynamic asymmetry fault analysis and diagnosis of wind turbines with doubly fed induction generator, Journal of Mechanical Science & Technology 31 (10) (2017) 5011–5020.
[28] X. Gong, W. Qiao, Bearing fault diagnosis for direct-drive wind turbines via current-demodulated signals, IEEE Transactions on Industrial Electronics 60 (8) (2013) 3419–3428.
[29] H. Talhaoui, A. Menacer, A. Kessal, A. Tarek, Experimental diagnosis of broken rotor bars fault in induction machine based on Hilbert and discrete wavelet transforms, International Journal of Advanced Manufacturing Technology 95 (1–4) (2018) 1399–1408.
[30] M. Voltz, R. Webster, A comparison of kriging, cubic splines and classification for predicting soil properties from sample information, Journal of Soil Science 41 (3) (1990) 473–490.


[31] R. Yan, R.X. Gao, X. Chen, Wavelets for fault diagnosis of rotary machines: A review with applications, Signal Processing 96 (2014) 1–15.
[32] H. Chen, N. Aït-Ahmed, M. Machmoum, M.E.H. Zaïm, Modeling and vector control of marine current energy conversion system based on doubly salient permanent magnet generator, IEEE Transactions on Sustainable Energy 7 (1) (2015) 409–418.
[33] H.T. Pham, J.M. Bourgeot, M.E.H. Benbouzid, Comparative investigations of sensor fault-tolerant control strategies performance for marine current turbine applications, IEEE Journal of Oceanic Engineering 43 (4) (2017) 1024–1036.
[34] Z. Li, T. Wang, Y. Wang, Y. Amirat, M. Benbouzid, D. Diallo, A wavelet threshold denoising-based imbalance fault detection method for marine current turbines, IEEE Access 8 (2020) 29815–29825.
[35] Z. Liu, Z. He, W. Guo, Z. Tang, A hybrid fault diagnosis method based on second generation wavelet de-noising and local mean decomposition for rotating machinery, ISA Transactions 61 (2016) 211–220.
[36] A.K. Bhandari, D. Kumar, A. Kumar, G.K. Singh, Optimal sub-band adaptive thresholding based edge preserved satellite image denoising using adaptive differential evolution algorithm, Neurocomputing 174 (2016) 698–721.
[37] X. Gong, W. Qiao, Imbalance fault detection of direct-drive wind turbines using generator current signals, IEEE Transactions on Energy Conversion 27 (2) (2012) 468–476.
[38] H.T. Chiang, Y.Y. Hsieh, S.W. Fu, K.H. Hung, Y. Tsao, S.Y. Chien, Noise reduction in ECG signals using fully convolutional denoising autoencoders, IEEE Access 7 (2019) 60806–60813.
[39] M.Z. Sheriff, M. Mansouri, M.N. Karim, H. Nounou, M. Nounou, Fault detection using multiscale PCA-based moving window GLRT, Journal of Process Control 54 (2017) 47–64.
[40] M. Mansouri, M.Z. Sheriff, R. Baklouti, M. Nounou, H. Nounou, A.B. Hamida, N. Karim, Statistical fault detection of chemical process – comparative studies, Journal of Chemical Engineering & Process Technology 7 (1) (2016) 282–291.
[41] M. Zhang, T. Wang, T. Tang, M. Benbouzid, D. Diallo, An imbalance fault detection method based on data normalization and EMD for marine current turbines, ISA Transactions 68 (2017) 302–312.
[42] G. Hou, Z. Pan, B. Huang, G. Wang, X. Luan, Hue preserving-based approach for underwater colour image enhancement, IET Image Processing 12 (2) (2017) 292–298.
[43] A. Ng, Sparse autoencoder, CS294A Lecture Notes 72 (2011) 1–19.
[44] B. Xin, T. Wang, T. Tang, A deep learning and softmax regression fault diagnosis method for multi-level converter, in: Proceedings of the IEEE 11th International Symposium on Diagnostics for Electrical Machines, Power Electronics, and Drives (SDEMPED), IEEE, Los Alamitos, CA, 2017, pp. 292–297.
[45] V.D. Krsman, A.T. Sarić, Bad area detection and whitening transformation-based identification in three-phase distribution state estimation, IET Generation, Transmission & Distribution 11 (9) (2017) 2351–2361.
[46] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.
[47] H. Chen, T. Tang, N. Aït-Ahmed, M.E.H. Benbouzid, M. Machmoum, M.E.H. Zaïm, Attraction, challenge and current status of marine current energy, IEEE Access 6 (2018) 12665–12685.
[48] Y. Zheng, T. Wang, B. Xin, T. Xie, Y. Wang, A sparse autoencoder and softmax regression based diagnosis method for the attachment on the blades of marine current turbine, Sensors 19 (4) (2019) 826.


[49] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258.
[50] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.C. Chen, MobileNetV2: Inverted residuals and linear bottlenecks, in: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520.
[51] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
[52] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9.
[53] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv:1502.03167, 2015.
[54] A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv:1704.04861, 2017.
[55] A. Howard, M. Sandler, G. Chu, L.C. Chen, B. Chen, M. Tan, W. Wang, et al., Searching for MobileNetV3, in: Proceedings of the 2019 IEEE International Conference on Computer Vision, pp. 1314–1324.

Chapter 8

Quadrotor actuator fault diagnosis and accommodation based on nonlinear adaptive state observer

Sicheng Zhou a, Kexin Guo a, Xiang Yu a,b, Lei Guo a,b and Youmin Zhang c

a School of Automation Science and Electrical Engineering, Beihang University, China.
b Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing, China.
c Department of Mechanical, Industrial, and Aerospace Engineering, Concordia University, Montreal, Quebec, Canada

Financial Support: This research was supported by the National Natural Science Foundation of China (No. 61833013, 61973012 and 61903019), the Program for Changjiang Scholars and Innovative Research Team (no. IRT 16R03), Zhejiang Lab Fund (2019NB0AB08), China Postdoctoral Science Foundation (no. 2019M660404), Zhejiang Provincial Natural Science Foundation (no. LQ20F030006), and NSERC.

Fault Diagnosis and Prognosis Techniques for Complex Engineering Systems. DOI: 10.1016/B978-0-12-822473-1.00002-1 Copyright © 2021 Elsevier Inc. All rights reserved.

8.1 Introduction

In recent years, unmanned aerial vehicles (UAVs), especially quadrotor UAVs, have been widely used because of their huge potential in military and civilian applications [1–3], such as traffic monitoring, recognition and surveillance, and search and rescue operations in hostile environments [4, 5]. With the rapid development of UAVs, the importance of fault-tolerant control (FTC) is increasing. Generally, faults of quadrotor UAVs can be classified into actuator faults, sensor faults, and component faults [6]. The occurrence of actuator faults can deteriorate the tracking performance and stability of the closed-loop system [7].

FTC design methods can be essentially classified into passive and active approaches [8, 9]. With respect to passive FTC, an adaptive FTC strategy with consideration of input saturation is presented against actuator faults and external disturbances [10]. In the work of Yu and Jiang [11], a robust nonlinear controller is developed to handle the disturbances and faults by combining the sliding mode control and backstepping control techniques. In the work of Avram et al. [12], an adaptive FTC is designed to guarantee asymptotic convergence of the altitude and attitude tracking errors in the presence of multiple actuator faults and modeling uncertainties.

In active FTC, a fault detection and diagnosis (FDD) module is necessary to actively compensate for actuator faults. In the past few years, researchers have paid tremendous attention to FDD design, resulting in different kinds of methods, such as sliding mode observer methods [13], the high-gain observer approach [14, 15], Kalman filter–based estimations [16], and the adaptive observer-based method [17]. Additionally, a moving horizon estimator and an unscented Kalman filter are compared to examine the FDD performance of a quadrotor UAV [18]. In the work of Aguilar-Sierra et al. [19], a polynomial observer is proposed to diagnose actuator faults in a small-scale quadrotor UAV. In the work of Yu and Jiang [20], a hybrid FTC system that combines the merits of passive and active FTC systems is proposed to accommodate partial actuator failures. It is also noteworthy that neural network adaptive techniques have been exploited at the active FTC design stage in recent years [21, 22].

In this chapter, we focus on the design of active FTC subject to time-varying actuator faults of a quadrotor UAV. The developed FTC system includes the fault detection module, fault diagnosis module, and accommodation unit. Before a fault occurs, the fault detection module monitors the state residual of the quadrotor UAV. Once a fault is detected, the fault diagnosis module is activated to identify the fault amplitude and estimate unknown fault parameters. Based on the fault estimation, the accommodation unit adjusts the control signal to guarantee the tracking performance of the quadrotor UAV. Our major contributions are briefly stated as follows:

(1) An adaptive fault detection threshold is proposed to determine the fault occurrence in the presence of model uncertainties and external disturbances. The fault is detected if the state residual generated by a nonlinear state observer exceeds the threshold.

(2) Four nonlinear adaptive state observers (NASO), one for each of the four rotors, are developed in the fault diagnosis module. In comparison with the sliding mode observer and the high-gain observer, NASO has better tracking performance for time-varying faults and can also locate the faulty rotor accurately.

(3) An accommodation unit is proposed to adjust the control signal without changing the original control architecture. With the proposed approach, complicated FTC designs can be avoided and the parameters of the baseline controller are not affected.

This chapter is organized as follows. Section 8.2 presents the dynamic model of a quadrotor UAV and the time-varying actuator fault model. The FTC design algorithm is described in Section 8.3, including the fault detection module, fault diagnosis module, and accommodation unit. Moreover, the results of numerical simulation and flight tests are presented in Section 8.4 to validate the effectiveness and applicability of the proposed FTC scheme. Finally, Section 8.5 concludes the chapter.

FIGURE 8.1 Structure of the quadrotor and frames.

8.2 Mathematical model of a quadrotor

This section presents the nonlinear quadrotor model and the actuator fault model, which provide a basis for the proposed active FTC design.

8.2.1 The nonlinear quadrotor model

Generally, the dynamic model includes translation and rotation equations. As shown in Fig. 8.1, the origin of the Body Frame (BF) is assumed to be at the center of gravity of the quadrotor, where the x-axis ($x_B$) points toward the head, the y-axis ($y_B$) points left, and the z-axis ($z_B$) points upward, and the origin of the Inertial Frame (IF) is assumed to be at the take-off point of the quadrotor. The transformation of vectors from the BF to the IF can be expressed as

\[
R_{EB} =
\begin{bmatrix}
\cos\psi\cos\theta & -\sin\psi\cos\phi + \cos\psi\sin\theta\sin\phi & \sin\psi\sin\phi + \cos\psi\sin\theta\cos\phi \\
\sin\psi\cos\theta & \cos\psi\cos\phi + \sin\psi\sin\theta\sin\phi & -\cos\psi\sin\phi + \sin\psi\sin\theta\cos\phi \\
-\sin\theta & \cos\theta\sin\phi & \cos\phi\cos\theta
\end{bmatrix}
\tag{8.1}
\]

where $\Theta_b = [\phi\ \ \theta\ \ \psi]^T$ represents the body-axis roll, pitch, and yaw angles.
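As a quick sanity check on Eq. (8.1), the sketch below builds the rotation matrix in NumPy for the attitude angles [φ θ ψ]T and confirms numerically that it is a proper rotation. The function name and the sample angles are hypothetical; only the matrix entries are taken from the text.

```python
import numpy as np

def rotation_body_to_inertial(phi, theta, psi):
    """Rotation matrix R_EB of Eq. (8.1) mapping body-frame vectors to the inertial frame."""
    c, s = np.cos, np.sin
    return np.array([
        [c(psi) * c(theta),
         -s(psi) * c(phi) + c(psi) * s(theta) * s(phi),
         s(psi) * s(phi) + c(psi) * s(theta) * c(phi)],
        [s(psi) * c(theta),
         c(psi) * c(phi) + s(psi) * s(theta) * s(phi),
         -c(psi) * s(phi) + s(psi) * s(theta) * c(phi)],
        [-s(theta),
         c(theta) * s(phi),
         c(phi) * c(theta)],
    ])

# Hypothetical attitude in radians.
R = rotation_body_to_inertial(0.1, -0.2, 0.3)
is_orthogonal = np.allclose(R @ R.T, np.eye(3))   # R^T is the inverse of R
has_unit_det = np.isclose(np.linalg.det(R), 1.0)  # no reflection or scaling

# With zero attitude the body z-axis (thrust direction) coincides with the inertial z-axis.
thrust_axis_level = rotation_body_to_inertial(0.0, 0.0, 0.0) @ np.array([0.0, 0.0, 1.0])
```

Because the matrix is orthogonal, the inverse transformation from the IF to the BF is simply its transpose.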


TABLE 8.1 Parameters for quadrotor dynamics

Parameter | Meaning
$p_E = [x\ \ y\ \ z]^T$ | Position of the quadrotor in the IF
$v_E = [u\ \ v\ \ w]^T$ | Linear velocity of the quadrotor in the IF
$\omega = [p\ \ q\ \ r]^T$ | Angular rates of the quadrotor in the BF
$[\xi_v\ \ \xi_\omega]^T$ | Model uncertainties in the translational and rotational dynamics
$[J_x\ \ J_y\ \ J_z]^T$ | Moment of inertia in the BF
$[M_x\ \ M_y\ \ M_z]^T$ | The rolling torque, the pitching torque, and the yawing torque
$F_m$ | The thrust
$m$ | Mass of the quadrotor
$g$ | The gravitational acceleration

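The nonlinear quadrotor dynamics considered in this chapter can be described as

\[
\dot{p}_E = v_E, \tag{8.2}
\]
\[
\dot{v}_E = \frac{1}{m} R_{EB}(\Theta_b) \begin{bmatrix} 0 \\ 0 \\ F_m \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ -g \end{bmatrix} + \xi_v, \tag{8.3}
\]
\[
\dot{\Theta}_b = R_0(\Theta_b)\,\omega, \tag{8.4}
\]
\[
\dot{\omega} = \begin{bmatrix} \dfrac{J_y - J_z}{J_x}\, qr \\[2mm] \dfrac{J_z - J_x}{J_y}\, pr \\[2mm] \dfrac{J_x - J_y}{J_z}\, pq \end{bmatrix} + \begin{bmatrix} \dfrac{M_x}{J_x} \\[2mm] \dfrac{M_y}{J_y} \\[2mm] \dfrac{M_z}{J_z} \end{bmatrix} + \xi_\omega, \tag{8.5}
\]

where the matrix $R_0(\Theta_b)$ is given by

\[
R_0(\Theta_b) = \begin{bmatrix} 1 & \tan\theta\sin\phi & \tan\theta\cos\phi \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi/\cos\theta & \cos\phi/\cos\theta \end{bmatrix}, \tag{8.6}
\]

and the parameters are listed in Table 8.1.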
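As a rough illustration of Eqs. (8.1) through (8.6), the following Python sketch evaluates the two transformation matrices and the right-hand side of the dynamics. The function names, the ordering of the state vector, and the numerical values in the usage lines are assumptions made for this example and are not taken from the chapter; NumPy is assumed to be available.

```python
import numpy as np

def rotation_body_to_inertial(phi, theta, psi):
    """R_EB of Eq. (8.1) for roll phi, pitch theta, yaw psi (radians)."""
    cf, sf = np.cos(phi), np.sin(phi)
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(psi), np.sin(psi)
    return np.array([
        [cp * ct, -sp * cf + cp * st * sf,  sp * sf + cp * st * cf],
        [sp * ct,  cp * cf + sp * st * sf, -cp * sf + sp * st * cf],
        [-st,      ct * sf,                 cf * ct],
    ])

def euler_rate_matrix(phi, theta):
    """R_0 of Eq. (8.6); undefined where cos(theta) = 0."""
    cf, sf = np.cos(phi), np.sin(phi)
    ct, tt = np.cos(theta), np.tan(theta)
    return np.array([
        [1.0, tt * sf, tt * cf],
        [0.0, cf, -sf],
        [0.0, sf / ct, cf / ct],
    ])

def quadrotor_derivative(state, u, m, J, g=9.81, xi_v=None, xi_w=None):
    """Right-hand side of Eqs. (8.2)-(8.5).

    state = [p_E (3), v_E (3), Theta_b (3), omega (3)]  (assumed ordering)
    u     = [F_m, M_x, M_y, M_z],  J = (J_x, J_y, J_z)
    xi_v, xi_w are the uncertainty terms; both default to zero.
    """
    xi_v = np.zeros(3) if xi_v is None else np.asarray(xi_v, dtype=float)
    xi_w = np.zeros(3) if xi_w is None else np.asarray(xi_w, dtype=float)
    state = np.asarray(state, dtype=float)
    v_E = state[3:6]
    phi, theta, psi = state[6:9]
    p, q, r = state[9:12]
    Jx, Jy, Jz = J
    F_m, M_x, M_y, M_z = u

    R_EB = rotation_body_to_inertial(phi, theta, psi)
    v_dot = R_EB @ np.array([0.0, 0.0, F_m]) / m + np.array([0.0, 0.0, -g]) + xi_v
    theta_dot = euler_rate_matrix(phi, theta) @ np.array([p, q, r])
    omega_dot = np.array([
        (Jy - Jz) / Jx * q * r + M_x / Jx,
        (Jz - Jx) / Jy * p * r + M_y / Jy,
        (Jx - Jy) / Jz * p * q + M_z / Jz,
    ]) + xi_w
    return np.concatenate([v_E, v_dot, theta_dot, omega_dot])

# Usage with hypothetical parameters: at level hover, thrust equal to the
# weight and zero torques should give a zero state derivative.
m, J = 1.2, (0.03, 0.03, 0.04)
hover = quadrotor_derivative(np.zeros(12), [m * 9.81, 0.0, 0.0, 0.0], m, J)
```

The hover check is a convenient sanity test for the sign conventions: with the z-axis pointing upward, the thrust term and the gravity term in Eq. (8.3) cancel when $F_m = mg$.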

8.2.2 The actuator fault model

In this chapter, the quadrotor UAV inputs $u = [F_m \;\; M_x \;\; M_y \;\; M_z]^T$ can be simplified and described as

\[
\begin{bmatrix} F_m \\ M_x \\ M_y \\ M_z \end{bmatrix} = R_u f_s =
\begin{bmatrix}
1 & 1 & 1 & 1 \\
-d_\phi & -d_\phi & d_\phi & d_\phi \\
d_\theta & -d_\theta & d_\theta & -d_\theta \\
c_{\tau f} & -c_{\tau f} & -c_{\tau f} & c_{\tau f}
\end{bmatrix}
\begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{bmatrix},
\tag{8.7}
\]

where $f_s$ ($s = 1, \ldots, 4$) is the force produced by each rotor acting on the quadrotor body, $d_\phi$ is half of the roll motor-to-motor distance, $d_\theta$ is half of the pitch motor-to-motor distance, and $c_{\tau f}$ is a fixed constant relating the thrust force $f_s$ to its corresponding torque.

Reduced rotor speed caused by a phase break is an important cause of quadrotor UAV mission failure. The main causes of a phase break include excessive temperature, excessive load, and aging of the coil. For example, because of the aging of the coil insulation layer, some coils are short circuited during flight, which eventually induces a phase break and slows down the rotor. In fact, this speed change is often time varying. According to propeller dynamics, the actuator faults caused by slower rotor speed can be modeled as a time-varying partial loss of effectiveness (LOE) in the rotors and thereby represented as

\[
f_s^* = \Lambda_u f_s, \tag{8.8}
\]

where $f_s = [f_1 \;\; f_2 \;\; f_3 \;\; f_4]^T$ denotes the commanded thrust forces of the four rotors, and $f_s^* = [f_1^* \;\; f_2^* \;\; f_3^* \;\; f_4^*]^T$ stands for the actual thrust forces generated by the rotors. Here $\Lambda_u = \mathrm{diag}(\alpha_s)$ ($s = 1, \ldots, 4$), and $\alpha_s \in (0, 1]$ is an unknown parameter representing the occurrence of a partial LOE fault in the sth rotor. The case $\alpha_s = 1$ represents a healthy rotor, whereas $\alpha_s < 1$ represents a faulty rotor with LOE. By using Eq. (8.8), the actual system inputs can be expressed as

\[
u^* = R_u f_s^* = R_u \Lambda_u f_s. \tag{8.9}
\]
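The mapping of Eq. (8.7) and the loss-of-effectiveness model of Eqs. (8.8) and (8.9) can be sketched in a few lines of Python. The geometry values, the commanded forces, and the function names below are hypothetical and serve only to show how a fault in one rotor changes the total thrust and introduces unintended torques.

```python
import numpy as np

def allocation_matrix(d_phi, d_theta, c_tf):
    """R_u of Eq. (8.7): maps rotor forces to [F_m, M_x, M_y, M_z]."""
    return np.array([
        [1.0, 1.0, 1.0, 1.0],
        [-d_phi, -d_phi, d_phi, d_phi],
        [d_theta, -d_theta, d_theta, -d_theta],
        [c_tf, -c_tf, -c_tf, c_tf],
    ])

def actual_inputs(R_u, f_cmd, alpha):
    """u* = R_u diag(alpha) f_cmd, Eq. (8.9); alpha_s = 1 is a healthy rotor."""
    alpha = np.asarray(alpha, dtype=float)
    if np.any(alpha <= 0.0) or np.any(alpha > 1.0):
        raise ValueError("each alpha_s must lie in (0, 1]")
    return R_u @ (alpha * np.asarray(f_cmd, dtype=float))

# Hypothetical geometry and commanded forces (not from the chapter).
R_u = allocation_matrix(d_phi=0.2, d_theta=0.2, c_tf=0.05)
f_cmd = np.full(4, 3.0)

u_healthy = actual_inputs(R_u, f_cmd, [1.0, 1.0, 1.0, 1.0])
u_faulty = actual_inputs(R_u, f_cmd, [0.85, 1.0, 1.0, 1.0])  # rotor 1 keeps 85%
```

With equal commanded forces the healthy vehicle produces pure thrust and no torque, whereas the faulty case loses thrust and acquires nonzero roll, pitch, and yaw torques. This is the disturbance that the diagnosis and accommodation modules must identify and cancel.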

Assumption 1. The unstructured modeling uncertainties and environmental disturbances $\xi_v$ and $\xi_\omega$ in Eq. (8.3) and Eq. (8.5), respectively, are unknown but assumed to be bounded by some known functions. For $t \ge 0$, $|\xi_v| \le \bar{\xi}_v$ and $|\xi_\omega| \le \bar{\xi}_\omega$, and the bounding functions $\bar{\xi}_v$ and $\bar{\xi}_\omega$ are known, continuous, and bounded.

Assumption 1 represents the class of modeling uncertainty considered. To generate an adaptive threshold that distinguishes the influence of a fault from that of the modeling uncertainty during the FDD process, the bound on the unstructured modeling uncertainty should be known a priori.

8.3 NASO-based FTC

The proposed FTC scheme is shown in Fig. 8.2. The FDD module includes two main components: a nonlinear fault detection module to determine fault occurrence, and a set of nonlinear adaptive fault diagnosis estimators to identify the faulty rotor and estimate the unknown fault parameters. Once the fault detection estimator detects an actuator fault, four estimators are activated to locate the faulty rotor. After fault diagnosis, the estimated fault parameters are employed to adjust the controller output signals.

FIGURE 8.2 Structure of the FTC.

8.3.1 The fault detection module

In this section, a nonlinear state observer is designed to generate the state residual between the estimated and the actual states. According to Assumption 1, an adaptive threshold is proposed to improve the robustness of the scheme. Thus, the fault detection module can determine the fault occurrence accurately by monitoring the state residual.

First, consider the state vector $\zeta = [v_z \;\; p \;\; q \;\; r]^T$, where $v_z$ represents the quadrotor velocity in the vertical direction of the IF. Neglecting the effect of the fault and substituting Eq. (8.7) and Eq. (8.8) into Eq. (8.3) and Eq. (8.5), one obtains

\[
\dot{\zeta} = f(\bar{x}, t) + B R_u f_s + \xi(\bar{x}, t), \tag{8.10}
\]

where $f_s$ is the vector of commanded rotor forces, $B = \mathrm{diag}\left(\frac{\cos\phi\cos\theta}{m}, \frac{1}{J_x}, \frac{1}{J_y}, \frac{1}{J_z}\right)$, and the known nonlinearity $f(\bar{x}, t)$ is defined as

\[
f(\bar{x}, t) = \begin{bmatrix} -g \\[1mm] \dfrac{J_y - J_z}{J_x}\, qr \\[2mm] \dfrac{J_z - J_x}{J_y}\, pr \\[2mm] \dfrac{J_x - J_y}{J_z}\, pq \end{bmatrix}. \tag{8.11}
\]

Subsequently, by using Eq. (8.10), the nonlinear state observer is designed as

\[
\begin{bmatrix} \dot{\hat{v}}_z \\ \dot{\hat{p}} \\ \dot{\hat{q}} \\ \dot{\hat{r}} \end{bmatrix} = -\Gamma \begin{bmatrix} \hat{v}_z - v_z \\ \hat{p} - p \\ \hat{q} - q \\ \hat{r} - r \end{bmatrix} + f(\bar{x}, t) + B R_u f_s, \tag{8.12}
\]

where $[\hat{v}_z \;\; \hat{p} \;\; \hat{q} \;\; \hat{r}]^T$ represents the estimated vertical velocity in the IF and the estimated angular rates in the BF, and $\Gamma = \mathrm{diag}(\gamma_i)$ with $\gamma_i > 0$ for $i = 1, \ldots, 4$. Based on Eq. (8.10) and Eq. (8.12), the state estimation error dynamics can be described as

\[
\dot{\varepsilon}(t) = -\Gamma \varepsilon(t) + \xi(\bar{x}, t), \tag{8.13}
\]

where $\varepsilon(t) = [v_z - \hat{v}_z \;\; p - \hat{p} \;\; q - \hat{q} \;\; r - \hat{r}]^T$ denotes the residual of the nonlinear state estimator. According to Assumption 1, $\xi(\bar{x}, t)$ in Eq. (8.10) is bounded. Meanwhile, the matrix $-\Gamma$ is stable. Thus, the error dynamics given by Eq. (8.13) are stable. By using Eq. (8.13), the residual of the nonlinear state estimator satisfies

\[
|\varepsilon_i(t)| \le e^{-\gamma_i (t - t_0)} |\varepsilon_i(t_0)| + \int_{t_0}^{t} e^{-\gamma_i (t - \tau)} |\xi_i(\bar{x}, \tau)| \, d\tau, \tag{8.14}
\]

where $\varepsilon_i(t)$ represents the ith component of $\varepsilon(t)$. According to Assumption 1 and Eq. (8.14), an adaptive threshold $\bar{\varepsilon}_i(t)$ can be defined as

\[
\bar{\varepsilon}_i(t) = e^{-\gamma_i (t - t_0)} |\varepsilon_i(t_0)| + \int_{t_0}^{t} e^{-\gamma_i (t - \tau)} \bar{\xi}_i(\bar{x}, \tau) \, d\tau, \tag{8.15}
\]

where the first term depends only on the state residual, and the second term is determined by the upper bounds from Assumption 1. Thus, the proposed threshold is adjusted adaptively according to the earlier state residual and Assumption 1. If any one of the residuals $\varepsilon_i(t)$ exceeds its adaptive threshold $\bar{\varepsilon}_i(t)$, the fault detection module concludes that a fault has occurred, and the fault diagnosis estimators are activated.
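The threshold of Eq. (8.15) does not need to be evaluated as an explicit integral. Differentiating it shows that it obeys the same first-order dynamics as Eq. (8.13) driven by the bound, so it can be propagated recursively. The Python sketch below does this for a single residual channel under the assumption that the bound is held constant over each sampling interval. The gain, the disturbance, the fault magnitude, and the function names are hypothetical choices for the illustration.

```python
import numpy as np

def adaptive_threshold(eps0_abs, xi_bar, gamma, dt):
    """Recursive form of Eq. (8.15) for one channel.

    xi_bar holds samples of the uncertainty bound, each assumed constant
    over one step of length dt; eps0_abs is |eps_i(t0)|.
    """
    a = np.exp(-gamma * dt)
    thr = np.empty(len(xi_bar) + 1)
    thr[0] = eps0_abs
    for k, bound in enumerate(xi_bar):
        thr[k + 1] = a * thr[k] + (1.0 - a) / gamma * bound
    return thr

def first_alarm(residual, threshold):
    """Index of the first sample whose |residual| exceeds the threshold."""
    idx = np.flatnonzero(np.abs(residual) > threshold)
    return int(idx[0]) if idx.size else None

# Hypothetical scenario: gamma = 5, disturbance bounded by 0.2,
# additive fault effect of size 1.0 entering the residual at t = 5 s.
gamma, dt, n = 5.0, 0.01, 1000
t = dt * np.arange(n)
xi = 0.2 * np.sin(3.0 * t)                  # stays within the bound
fault = np.where(t >= 5.0, 1.0, 0.0)

a = np.exp(-gamma * dt)
res = np.zeros(n + 1)                       # residual, Eq. (8.13) plus fault
for k in range(n):
    res[k + 1] = a * res[k] + (1.0 - a) / gamma * (xi[k] + fault[k])

thr = adaptive_threshold(abs(res[0]), np.full(n, 0.2), gamma, dt)
alarm = first_alarm(res, thr)
alarm_time = None if alarm is None else alarm * dt
```

Before the fault, the residual stays below the threshold by construction, because it is driven by a disturbance that never exceeds the bound. After the fault, the residual leaves this envelope within a few samples, which is the event the detection module monitors.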

8.3.2 The fault diagnosis module

In this section, a NASO is adopted to estimate the unknown fault parameters of each rotor. It is worth mentioning that no independent reconfigurable controller is designed; only the control signal of the original baseline controller is adjusted. Hence, in this section, the baseline controller is assumed to stabilize the system. After the fault is detected, the equation of state can be written as

\[
\dot{\zeta} = f(\bar{x}, t) + B R_u (I - \Lambda_u) f_s + \xi(\bar{x}, t). \tag{8.16}
\]

With respect to the ith rotor, the equation of state can be written as

\[
\dot{\zeta}_i = f(\bar{x}, t) + B R_u f_s - \alpha_i B R_u \Pi_i f_s + \xi(\bar{x}, t), \tag{8.17}
\]

where $\alpha_i$ represents the fault parameter of the ith rotor and $\Pi_i$ denotes the transformation matrix of the ith rotor. For example, if $i = 1$, $\Pi_1 = \mathrm{diag}(1, 0, 0, 0)$.


As shown in Fig. 8.2, once the fault is detected, a set of four nonlinear adaptive estimators is activated to identify the faulty rotor and estimate the unknown fault parameters. Based on Eq. (8.17), each estimator is designed for the corresponding actuator failure.

Theorem 1. Consider the faulty system described by Eq. (8.16). Define $z_1^i$ and $z_2^i$ as the states of the designed NASO. If the observer of the ith rotor is formed as

\[
\begin{aligned}
\dot{z}_1^i &= -H e^i + f(\bar{x}, t) + B R_u f_s - z_2^i B R_u \Pi_i f_s, \\
\dot{z}_2^i &= \left( B R_u \Pi_i f_s \right)^T e^i,
\end{aligned}
\tag{8.18}
\]

where $H = \mathrm{diag}(\eta_s^i)$ ($s = 1, \ldots, 4$) is a positive coefficient matrix of the ith rotor and $e^i = z_1^i - \zeta$, then $\zeta_i$ and $\alpha_i$ are estimated within finite time through $z_1^i$ and $z_2^i$, respectively.

Proof. Consider the Lyapunov function

\[
V_1 = \frac{1}{2} \left( e^i \right)^T e^i + \frac{1}{2} \left( z_2^i \right)^2, \tag{8.19}
\]

together with Eq. (8.17). The derivative of $V_1$ can be obtained as

\[
\begin{aligned}
\dot{V}_1 &= \left( e^i \right)^T \dot{e}^i + z_2^i \dot{z}_2^i \\
&= \left( e^i \right)^T \left( -H e^i + f(\bar{x}, t) + B R_u f_s - z_2^i B R_u \Pi_i f_s - \dot{\zeta} \right) + z_2^i \left( B R_u \Pi_i f_s \right)^T e^i \\
&= -\left( e^i \right)^T H e^i + \left( e^i \right)^T \left( f(\bar{x}, t) + B R_u f_s - \dot{\zeta} \right) + \left( e^i \right)^T \left( -z_2^i B R_u \Pi_i f_s \right) + z_2^i \left( B R_u \Pi_i f_s \right)^T e^i \\
&= -\left( e^i \right)^T H e^i + \left( e^i \right)^T \left( f(\bar{x}, t) + B R_u f_s - \dot{\zeta} \right),
\end{aligned}
\tag{8.20}
\]

where $H$ is a positive coefficient matrix, ensuring that the first term of the equation is less than 0. The baseline controller is stable, guaranteeing that the second term of the equation is less than 0. Hence, the stability of the NASO can be guaranteed. Moreover, it is proven that $z_1^i$ and $z_2^i$ approach $\zeta_i$ and $\alpha_i$ within finite time, respectively.

Remark 1. The fault diagnosis scheme presented in this chapter is applicable not only to constant faults but also to time-varying faults. In comparison with the work of Avram et al. [12], this study provides a deeper analysis of quadrotor UAV actuator faults, and the proposed NASO has better tracking performance in the case of time-varying faults.
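To make the structure of the observer in Eq. (8.18) more concrete, the Python sketch below integrates it with a forward-Euler step against a toy plant that follows Eq. (8.17). Several simplifications are assumed and none of them come from the chapter: the uncertainty term is set to zero, the attitude is taken as level so that $B$ is constant, the commanded forces are constant and open loop, the fault parameter follows the convention of Eq. (8.17) (0 for a healthy rotor, 0.15 for a 15% loss), and all numerical values and names are hypothetical.

```python
import numpy as np

# Hypothetical vehicle parameters (not from the chapter).
m, Jx, Jy, Jz, g = 1.0, 0.03, 0.03, 0.04, 9.81
d_phi, d_theta, c_tf = 0.2, 0.2, 0.05
R_u = np.array([
    [1.0, 1.0, 1.0, 1.0],
    [-d_phi, -d_phi, d_phi, d_phi],
    [d_theta, -d_theta, d_theta, -d_theta],
    [c_tf, -c_tf, -c_tf, c_tf],
])
B = np.diag([1.0 / m, 1.0 / Jx, 1.0 / Jy, 1.0 / Jz])  # level attitude assumed

def f_known(zeta):
    """Known nonlinearity of Eq. (8.11) for zeta = [v_z, p, q, r]."""
    _, p, q, r = zeta
    return np.array([
        -g,
        (Jy - Jz) / Jx * q * r,
        (Jz - Jx) / Jy * p * r,
        (Jx - Jy) / Jz * p * q,
    ])

def naso_step(z1, z2, zeta, f_s, Pi_i, H, dt):
    """One forward-Euler step of the observer in Eq. (8.18)."""
    e = z1 - zeta
    regressor = B @ R_u @ Pi_i @ f_s
    z1_dot = -H @ e + f_known(zeta) + B @ R_u @ f_s - z2 * regressor
    z2_dot = regressor @ e
    return z1 + dt * z1_dot, z2 + dt * z2_dot

def simulate(alpha_true=0.15, rotor=0, T=1.5, dt=1e-3, eta=20.0):
    """Run the observer for the given rotor against the toy plant."""
    Pi_i = np.zeros((4, 4))
    Pi_i[rotor, rotor] = 1.0
    H = eta * np.eye(4)
    f_s = np.full(4, m * g / 4.0)           # constant hover command
    zeta = np.zeros(4)
    z1, z2 = zeta.copy(), 0.0
    for _ in range(int(round(T / dt))):
        # Toy plant: Eq. (8.17) with the uncertainty term omitted.
        zeta_dot = f_known(zeta) + B @ R_u @ f_s - alpha_true * (B @ R_u @ Pi_i @ f_s)
        z1, z2 = naso_step(z1, z2, zeta, f_s, Pi_i, H, dt)
        zeta = zeta + dt * zeta_dot
    return z2, z1 - zeta

alpha_hat, e_final = simulate()
```

In this idealized setting the estimate `alpha_hat` settles close to the injected value and the state error decays, because the regressor built from the rotor force is nonzero throughout the run. With disturbances, a closed-loop controller, or a time-varying fault, the behavior would depend on the gain `eta` and on the excitation provided by the commanded forces.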

8.3.3 The fault accommodation module

As depicted in Fig. 8.3, the trajectory control of the quadrotor is implemented by using a dual-loop architecture. More specifically, the outer loop controls the X and Y positions by generating the desired roll and pitch angles. The altitude and attitude controller generates the required rotor speed for the quadrotor UAV to track the required attitude and altitude. To sum up, based on the nominal quadrotor model given by Eqs. (8.1) through (8.4) under healthy conditions, the PID baseline controller is designed for the inner and outer loops, providing satisfactory tracking performance. The PID controller calculates the required rotor speed, which is used by the motor servo control system to generate the force and moment acting on the quadrotor to track a set of reference trajectories.

FIGURE 8.3 Structure of the baseline controller.

FIGURE 8.4 Simulation results of NASO affected by constant fault.

FIGURE 8.5 Adjusted commanded motor forces affected by constant fault.

FIGURE 8.6 Simulation results of NASO affected by time-varying fault.

FIGURE 8.7 Simulation results of NASO affected by time-varying fault.

FIGURE 8.8 Flight test environment.

FIGURE 8.9 Layout of the experimental test environment (motion cameras, camera hub with wired connection, PC, router, UAV, ground station PC).

FIGURE 8.10 Position tracking before the fault occurred.

As shown in Fig. 8.2, after the faulty actuator is isolated by the fault diagnosis component, the matching adaptive estimator can provide an estimate of the unknown fault amplitude. Consequently, the estimated value can be used by the fault accommodation module to adjust the baseline control signals. Without loss of generality, it is assumed that the partial LOE occurs in the first actuator and that the fault is detected at time $t_d$. Hence, for $t \ge t_d$, the output signal of the baseline controller is modified as

\[
\tilde{f}_s = \left( I_4 - z_2^i \Pi_i \right)^{-1} f_s, \tag{8.21}
\]

where $f_s$ is the commanded thrust force generated by the baseline controller, $\tilde{f}_s$ is the adjusted commanded thrust force sent to the actuator control system, and $z_2^i$ is the fault parameter estimate provided by the NASO corresponding to the ith rotor.

Remark 2. According to Eq. (8.21), it is worth noting that the proposed fault accommodation scheme cannot deal with the situation where the actuator fails completely, because the inversion is then impossible. In addition, the inaccuracy of the estimator has not been taken into consideration at the current stage. The significance of this method lies in its ability to handle time-varying faults. Based on the estimated parameters, adjustment of actuator allocation and mission replanning under saturation is part of our future work.
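Eq. (8.21) amounts to scaling up the command of the faulty rotor by the reciprocal of its remaining effectiveness. The Python sketch below illustrates this with hypothetical commanded forces and a hypothetical function name; the estimate is interpreted as the fraction of thrust lost, which is the convention under which the inverse in Eq. (8.21) fails for a complete failure.

```python
import numpy as np

def accommodate(f_cmd, loss_estimate, rotor):
    """Adjusted command of Eq. (8.21): (I_4 - z2 * Pi_i)^(-1) f_cmd.

    loss_estimate plays the role of z2 (fraction of thrust lost) and
    rotor is the zero-based index of the rotor flagged as faulty.
    """
    if not 0.0 <= loss_estimate < 1.0:
        raise ValueError("estimate must lie in [0, 1); a complete failure cannot be inverted")
    Pi_i = np.zeros((4, 4))
    Pi_i[rotor, rotor] = 1.0
    return np.linalg.solve(np.eye(4) - loss_estimate * Pi_i,
                           np.asarray(f_cmd, dtype=float))

# Hypothetical commanded forces; rotor 1 is estimated to have lost 15%.
f_cmd = np.array([3.0, 3.0, 3.0, 3.0])
f_adj = accommodate(f_cmd, 0.15, rotor=0)
increase_pct = 100.0 * (f_adj[0] / f_cmd[0] - 1.0)   # 1/0.85 - 1, about 17.65%
delivered = (1.0 - 0.15) * f_adj[0]                  # thrust actually produced
```

For a 15% loss the adjusted command for the faulty rotor is larger by a factor of 1/0.85, an increase of about 17.65%, so the thrust actually delivered equals the original command. The commands of the healthy rotors are left unchanged. The sketch ignores actuator saturation and estimation error, both of which Remark 2 lists as open issues.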

8.4 Validation

To validate the effectiveness of the developed active FTC scheme, both a numerical simulation and a real-world flight test are conducted.

FIGURE 8.11 Position tracking after the fault occurred.

8.4.1 Numerical simulation results

(1) Constant fault. The estimated fault parameters and the corresponding adjusted control signals under the constant fault are shown in Fig. 8.4 and Fig. 8.5, respectively. As can be observed from Fig. 8.4, the estimated fault parameter eventually converges to the actual one. As illustrated in Fig. 8.5, after a 15% constant LOE fault occurs on motor 1, the adjusted control signal increases by 17.65%. Hence, from Figs. 8.4 and 8.5, the constant LOE fault is successfully compensated by the proposed FTC scheme.

(2) Time-varying fault. The estimated fault parameter and the adjusted control signal under the time-varying fault are shown in Fig. 8.6 and Fig. 8.7, respectively. As is visible in Fig. 8.6, the fault is detected 0.66 s after its occurrence, and the estimate always remains within a reasonable range close to the real value. Fig. 8.7 illustrates that the adjusted control signals become larger as the faults increase. Thus, from Figs. 8.6 and 8.7, the actuators governed by the proposed FTC can satisfactorily handle the time-varying faults.

8.4.2 Flight test

The developed algorithms are implemented on the quadrotor UAV platform, and the experimental environment is illustrated in Fig. 8.8.

FIGURE 8.12 Results of the NASO in the flight test.

FIGURE 8.13 Roll and pitch angle tracking performance.

Fig. 8.9 shows the layout of the experimental environment. The experiments are conducted in an indoor environment without GPS. Hence, a network of eight motion cameras, which can locate the marker balls on the quadrotor UAV, is used for position capture. Furthermore, a ground station is adopted for command setting and real-time status monitoring. The connection between the quadrotor UAV and the station is achieved by a router.

Fig. 8.10 and Fig. 8.11 present the tracking performance of the quadrotor before and after a 40% constant fault occurred, respectively. The figures show that the UAV maintains good trajectory tracking performance before and after the fault occurrence. The mean absolute tracking error is 0.0372 m and the maximum shifting distance is 0.253 m during the flight test. Fig. 8.12 shows the estimated fault parameters in the flight test. As can be seen from this figure, the fault is detected 0.36 s after it takes place. It is worth mentioning that the fault detection module misdiagnoses motors 2 and 3 between 25 s and 27 s. This is caused by the ground effect of the quadrotor UAV: the ground effect alters the airflow around the UAV, which prevents the rotors from producing the desired thrust. Fig. 8.13 shows the satisfactory performance of roll and pitch angle tracking. The residual increases sharply at 43.87 s because of the fault. It is also interesting to find that the residual is quickly reduced with the aid of the fault diagnosis and accommodation modules.

8.5 Conclusion

A NASO-based active FTC system is presented for a quadrotor UAV. The benefits of the proposed FTC scheme include the following: (1) the adaptive fault detection threshold enhances the robustness of the FDD module to external disturbances, (2) the NASO is able to guarantee the tracking performance for time-varying faults, and (3) the proposed fault accommodation scheme can compensate for the fault without changing the baseline controller. These improvements offer the potential to enhance the safety of quadrotor UAVs. The numerical simulation and flight test demonstrate that the proposed FTC scheme can effectively deal with actuator LOE faults.

References

[1] G. Vachtsevanos, L. Tang, G. Drozeski, L. Gutierrez. From mission planning to flight control of unmanned aerial vehicles: Strategies and implementation tools. Annual Reviews in Control 2005;29(1):101–115.
[2] Y. M. Zhang, A. Chamseddine, C. A. Rabbath, B. W. Gordon, C.-Y. Su, S. Rakheja, C. Fulford, J. Apkarian, P. Gosselin. Development of advanced FDD and FTC techniques with application to an unmanned quadrotor helicopter testbed. Journal of the Franklin Institute 2013;350(9):2396–2422.
[3] X. Yu, Y. M. Zhang. Sense and avoid technologies with applications to unmanned aircraft systems: Review and prospects. Progress in Aerospace Sciences 2015;74:152–166.
[4] S. Gupte, P. I. T. Mohandas, J. M. Conrad. A survey of quadrotor unmanned aerial vehicles. In 2012 Proceedings of IEEE Southeastcon. 1–6.
[5] A. Jaimes, S. Kota, J. Gomez. An approach to surveillance an area using swarm of fixed wing and quad-rotor unmanned aerial vehicles UAV(s). In Proceedings of the 2008 IEEE International Conference on System of Systems Engineering. 1–6.
[6] Y. M. Zhang, J. Jiang. Bibliographical review on reconfigurable fault-tolerant control systems. Annual Reviews in Control 2008;32(2):229–252.
[7] Z. T. Dydek, A. M. Annaswamy, E. Lavretsky. Adaptive control of quadrotor UAVs: A design trade study with flight evaluations. IEEE Transactions on Control Systems Technology 2012;21(4):1400–1406.
[8] X. Yu, J. Jiang. A survey of fault-tolerant controllers based on safety-related issues. Annual Reviews in Control 2015;39:46–57.
[9] J. Jiang, X. Yu. Fault-tolerant control systems: A comparative study between active and passive approaches. Annual Reviews in Control 2012;36(1):60–72.
[10] S. Li, Y. Wang, J. Tan. Adaptive and robust control of quadrotor aircrafts with input saturation. Nonlinear Dynamics 2017;89(1):255–265.
[11] F. Chen, R. Jiang, K. Zhang, B. Jiang, G. Tao. Robust backstepping sliding-mode control and observer-based fault estimation for a quadrotor UAV. IEEE Transactions on Industrial Electronics 2016;63(8):5044–5056.
[12] R. C. Avram, X. Zhang, J. Muse. Nonlinear adaptive fault-tolerant quadrotor altitude and attitude tracking with multiple actuator faults. IEEE Transactions on Control Systems Technology 2017;26(2):701–707.
[13] D. Lee, H. J. Kim, S. Sastry. Feedback linearization vs. adaptive sliding mode control for a quadrotor helicopter. International Journal of Control Automation & Systems 2009;7(3):419–428.
[14] H. K. Khalil. Adaptive output feedback control of nonlinear systems represented by input–output models. IEEE Transactions on Automatic Control 1996;41(2):177–188.
[15] M. S. Mahmoud, H. K. Khalil. Robustness of high-gain observer-based nonlinear controllers to unmodeled actuators and sensors. Automatica 2002;38:361–369.
[16] P. Freeman, R. Pandita, N. Srivastava, G. J. Balas. Model-based and data-driven fault detection performance for a small UAV. IEEE/ASME Transactions on Mechatronics 2013;18(4):1300–1309.
[17] F. Chen, W. Lei, G. Tao, B. Jiang. Actuator fault estimation and reconfiguration control for the quad-rotor helicopter. International Journal of Advanced Robotic Systems 2016;13(13):1–12.
[18] H. A. Izadi, Y. Zhang, B. W. Gordon. Fault tolerant model predictive control of quad-rotor helicopters with actuator fault estimation. IFAC Proceedings Volumes 2011;44(1):6343–6348.
[19] H. Aguilar-Sierra, G. Flores, S. Salazar, R. Lozano. Fault estimation for a quad-rotor MAV using a polynomial observer. Journal of Intelligent & Robotic Systems 2014;73(1-4):455–468.
[20] X. Yu, J. Jiang. Hybrid fault-tolerant flight control system design against partial actuator failures. IEEE Transactions on Control Systems Technology 2012;20(4):871–886.
[21] A. Abbaspour, K. K. Yen, P. Forouzannezhad, A. Sargolzaei. A neural adaptive approach for active fault-tolerant control design in UAV. IEEE Transactions on Systems, Man & Cybernetics: Systems 2018. doi:10.1109/TSMC.2018.2850701.
[22] Y. Song, L. He, D. Zhang, J. Qian, J. Fu. Neuroadaptive fault-tolerant control of quadrotor UAVs: A more affordable solution. IEEE Transactions on Neural Networks & Learning Systems 2018;30(7):1975–1983.

Chapter 9

Defect detection and classification in welding using deep learning and digital radiography

M-Mahdi Naddaf-Sh a, Sadra Naddaf-Sh a, Hassan Zargarzadeh a, Sayyed M. Zahiri b, Maxim Dalton b, Gabriel Elpers b and Amir R. Kashani b,1

a Electrical Engineering Department, Lamar University, United States.
b Artificial Intelligence Lab, Stanley Oil and Gas, Stanley Black and Decker, United States

9.1 Introduction

One of the foremost concerns of the modern-day industrial and infrastructure world is safety [74]. Whether it be in the context of towering skyscrapers, bridges, or pipelines, the consequences of overlooked or undetectable mistakes can be catastrophic [38]. To ensure safety, you not only have to devise and implement reliable systems, equipment, and methods, but you must also have an accurate means of recognizing the presence of unexpected and potentially fatal flaws. This is because the practice of automating machinery for complex systems can be incredibly difficult and error prone [4]. Perhaps even more important, no matter how precise and consistent equipment otherwise may be, if human beings are allowed to contribute directly to the process, there is always room for mistakes. As technology advances and gives rise to new ways to minimize risk for human operators and maximize efficiency or capacity, it also becomes increasingly difficult to identify the fault modes of supremely precise instrumentation [3]. The reliability of large-scale infrastructure depends heavily on the reliability of welding massive metal structures together to function as an individual unit [23].

1 Primarily with help from Jason Miller, William Aston, Manny Glover, and Shengnan Wang from Stanley Oil and Gas.

Fault Diagnosis and Prognosis Techniques for Complex Engineering Systems. DOI: 10.1016/B978-0-12-822473-1.00007-0. Copyright © 2021 Elsevier Inc. All rights reserved.


Naively, the process is simple: apply sufficient heat to melt two pieces of metal that are fused together after they cool and harden [39]. In general, a separate piece of metal acts as a filler material that is deposited and cooled between the two pieces in question. In practice, the process requires careful adjustment of finely tuned parameters (automatic or manually enforced) that govern an energy source powerful enough to first liquefy the metal and then deposit the filler material without compromising the integrity of either the base metal or the finished product [7]. From the standpoint of a welding operator, the slightest accidental or misjudged movement can be the difference between a successful weld and a very costly mistake. The quality of welded joints and their inspection are critical in many industries (e.g., marine, chemical, and aeronautical industries) [77]. Digitizing and automating product quality monitoring is one of the main pillars of Industry 4.0. To achieve this goal, various robotic platforms have been developed to increase the consistency, quality, and automation of the process, including robotic platforms that perform digital radiography.

It is important to note that not all construction projects involve the same fundamental structures or conditions. Of unique interest is the process of welding in pipeline construction [69]. Rather than being concerned with welding two large metal beams at varying angles, as would be necessary in the construction of a building, welding in pipelines is principally focused on a particular type of weld: girth welding. Girth welding involves welding the circumference of two cylindrical pipe joints together to extend the pipeline into a single contiguous unit [16]. In practice, pipelines are constructed in one of two primary environments: on land and underwater. In both cases, multiple teams work in tandem to ensure that the pipeline is welded, properly inspected, and potentially coated with special material to protect the pipeline from corrosion [82]. When pipelines are constructed on land, the pipeline is often buried underground after each joint is welded. Pipelines that are constructed underwater or offshore are produced on large vessels from massive segmented spools and laid underwater [48]. In the latter case, it becomes especially important to protect against the effects of saltwater or electrochemical corrosion with an extra layer of coating. Ultimately, welding in both environments requires very accurate testing to ensure a long life of reliable operation, because the cost of repairs skyrockets by the time leaks or major damage occur [57].

Thanks to advances in robotics engineering, there are actually two primary means to perform welds in the modern day. Although there are still many occasions for manual welding, automatic welding has proven to be more cost efficient, more consistent, and faster in many cases, especially in pipeline construction [6]. In this case, the ability to perform automated welds inside the pipe produces a higher-quality joint that is fused both internally and externally and is often necessary because it is impossible to fit a person within small-diameter pipelines.


In addition, at the very least, it is safer to eliminate the need for a person to travel within a pipe, let alone operate welding equipment inside of it [80].

9.1.1 Welding Process

In general, manual welding is noticeably less precise and more prone to error. Hand movements are random and often unpredictable. By contrast, automated welding is designed to be performed on the basis of pre-configured parameters of operation. Complex factors due to how the weld is deposited around the pipe and effects due to gravity require careful and reproducible patterns that are nearly impossible for a person to re-create every time and almost guaranteed for high-quality equipment [51]. No two welding technicians will perform a weld under the same conditions in the same way, and neither will a technician reproduce exactly what was done yesterday [41]. As much of an improvement as automated welding is over manual welding, just in the space of automated welding, there is already a surprisingly large set of potential flaws that are classifiable and well documented [54].

It is clear that a set of methodologies ought to be established for testing the quality or properties of a weld without destroying the end product. Otherwise, a major purpose for testing in the first place would be defeated. In fact, there is a field dedicated precisely to this endeavor, namely the field of non-destructive testing (NDT) [11]. NDT techniques are designed to probe material with some sort of stimulus and interpret the response, either manually or automatically through software [58]. The idea is that anomalies in the welding process are impossible to detect with the naked eye, especially when they are located internally, away from the visible surface. Using techniques and research from applied physics, we have a plethora of ways at our disposal to observe how a material behaves or how something is affected indirectly through the material in question [33]. Most importantly, we can do so without damaging something very costly to repair later down the line [21]. NDT not only provides a means to test material properties but is also intentionally non-destructive. This means that we can actually save on the cost of more expensive repairs, as we have the means to preemptively repair a given weld before it becomes too costly, and we can verify the quality of welds without any harm should they be in flawless condition [67]. Furthermore, testing in general is necessary because flaws that are known to be problematic in practice would otherwise be all but impossible to detect. This is because they are embedded deep in the material, hidden from sight [66].

Certain techniques also afford a level of precision that allows for measured judgment calls based on acceptability standards [70]. Although it is clear that a certain weld may contain a defect, it is also possible through NDT techniques such as automated ultrasonic technology (AUT) and real-time radiography (RTR) to actually measure the geometric properties of the defect [20]. In this way, one can assess whether a full weld reattempt or repair is really worthwhile, or absolutely necessary [1]. Is the defect substantially long or deep? Is it likely to form the basis of a future crack that will eventually and completely undermine the integrity of the weld? These questions are now possible to answer because the entire material and the conditions and characteristics that gave rise to the defects are fully intact and explorable. Alternative methods might either weaken the material, destroy the possibility of assessing defect characteristics, or rule out testing a specific weld altogether [64].

In welding, NDT ranges across a wide spectrum of techniques built on potentially very different physical phenomena [32]. For instance, the same fundamental ultrasound technology that enables us to observe an unborn fetus in its mother's womb can be applied to weld inspection [37]. In NDT, this is called AUT. In pipeline girth (circumferential) welding, AUT can be used to identify defects based on irregularities in the patterns of waves that reflect from surfaces in the pipe [61]. AUT technology itself has many forms and variations that adjust characteristics such as the patterns of the wave pulses and the orientation or number of transducers [2]. Another perhaps even more familiar medical technique that has a similar counterpart in weld NDT is that of X-ray scanning. Here, high-energy radiation is directed onto material in front of a screen that shows visual contrast between regions where the material is dense and very little of the radiation is able to pass through and regions where the material is relatively less dense and the screen absorbs the radiation. This technology corresponds to a technique called radiographic testing (RT) [40]. In RTR, the same fundamental RT technology is employed, but the radiation absorption pattern is captured electronically rather than on film [46]. Many other techniques exist, such as electromagnetic or eddy current testing (ET), magnetic particle testing (MT), and acoustic emission testing (AE), and the list goes on [9].

9.1.2 Digital Radiography

Compared to other NDT techniques, RT/RTR remains a popular mainstay of inspection in practice [60]. Other techniques might be too new to have a real presence and trustworthiness among both practitioners and project stakeholders, although this is changing quickly as other techniques are gaining widespread acceptance in the inspection community [13]. In addition, they might be too costly to maintain or support in the field [22]. In some cases, they might even be applicable only in limited circumstances or too uninformative to form the basis of support for or against critical judgment calls [28]. Similar to most other NDT techniques, the discipline of RTR is divided into two primary responsibilities: using equipment to scan the weld and properly interpreting the scan. Although there is always technological advancement in the efficient and high-quality

Defect detection and classification in welding using deep learning Chapter | 9

FIGURE 9.1 Digital X-ray detector and source on a robotic platform (Stanley Oil & Gas).

capture of radiographic weld scans, the most important task remains the latter, as this is what determines the perceived weld quality [35]. In practice, RT is a very well established technique. Entire codes and specifications for the practice and techniques of inspection have been developed in an attempt to standardize reliable interpretation and operation [73]. One of the most well defined standards relates to the interpretation of radiographic weld scans. In these images, experienced operators are trained to visually detect a whole host of defects that are caused by different factors related to the material and welding process. To the untrained eye, defects are difficult to classify, and may even be imperceptible, unless the viewer is told exactly what to look for. Through years of experience, individual operators learn how to identify, classify, and prioritize these defects in accordance with one of the major inspection codes that provide visual examples, descriptions, and guidelines [18]. Even today, almost the entire process, with the exception of image capture itself, is managed and performed by people. Fig. 9.1 shows an example of an X-ray image capture device. People are susceptible to visual fatigue, and their skillset is acquired over long periods of time and takes years to transfer to the next generation of qualified technicians [36]. A fundamental question regarding RTR inspection is to what extent, if any, the interpretation of radiographic scans could be delegated to a machine. The idea of allowing a machine to assist a human operator in interpreting responses from NDT inspection techniques for the purpose of identifying problematic flaws is known as assisted defect recognition (ADR) [83].
The key is that human beings remain fully and directly in control of the final assessment, while using whatever tools they have at their disposal to aid them in a task that is so difficult to reproduce consistently or perform perfectly. Human operators, depending on their role and level of skill, are expected to visually parse up to hundreds of weld scan images daily. This is a lot to ask


Fault diagnosis and prognosis techniques for complex engineering systems

from the eyes and the brain and is bound to introduce the possibility of errors that, however understandable, are unacceptable from a quality standpoint [52]. As the process of radiographic scanning is nearly fully automated in practice, people, although specially trained, are expected to keep pace and operate with consistently high accuracy [15]. This is at least one reason why people should be assisted in the interpretation and analysis of radiographic weld scans. Given that a single individual cannot be in thousands of places in the world at once, we must employ the talents of many individuals to process weld images from multiple projects. Moreover, each individual has a unique project background, level of experience, predisposition, preference, and opinion [65]. Despite the presence of NDT inspection codes and standards, it is unrealistic to expect all of these individuals to agree upon the same internal standard formed over time and through experience [78]. Even if this were not the case, it is guaranteed that individuals trained on the basis of separate codes, which are not entirely consistent with one another, will differ from one another in the interpretation of certain key conditions [49]. It takes substantial time for someone to internalize an external standard and develop the ability to apply it as second nature professionally. If operators are trained under the API 1104 inspection standard for pipeline welding, they do not automatically acquire the knowledge embodied in the ASME standard [50]. Even more pertinent, they must on occasion contradict a previous standard to adhere to a new one. People are not often capable of maintaining multiple potentially incompatible standards mentally. The fundamental approach is a data-driven machine learning model.
The advantage of this is that the experience of technicians on former projects across a wide variety of conditions is now directly available, more than any one individual could provide, even with sufficient time. On some level, there is no real need to acquire the understanding that an experienced inspection operator possesses to develop a system that is capable of learning from that operator [81]. There are varying ways in which, and degrees to which, a machine might assist a human by focusing on tasks at which machines have proven successful. Specially designed software might simply identify the presence of anomalies or, more usefully, classify them into distinct categories. Operators would be able to see the most likely classification candidates. Going one step further, this classification could be based on specific classes that are defined in common inspection codes. As an operator is advised through ADR, the operator may also be quickly directed to key references or guidelines in a given code or to explicit examples encountered in previous projects. Where different codes deviate from one another, such a tool could be trained on each competing standard, much as a person is, but across far more standards than any one individual could master in a lifetime. In essence, ADR offers the opportunity to enlist the help of all professionals everywhere in every decision, even when they are completely unavailable [53].


As the inspection community gains increased faith in the accuracy and efficacy of such machine learning models and associated software tooling, the possibilities for safe and valuable technological aids in the process of NDT weld inspection and analysis grow accordingly. The focus of this research is at the intersection of most of the key concepts discussed previously. In particular, the goal was the development of a practically valuable and useful model for defect recognition in radiographic image scans on pipeline girth welds to assist inspection technicians and operators in visual analysis. To accomplish this, deep learning techniques were employed for automatically classifying defects built from a library of a large number of labeled annotations. The value of a data-driven model is rooted in the ability to learn from professional experts directly and efficiently [29]. Additionally, a deep learning model—provided enough training data—is capable of extracting highly complex features that intuitively correspond to the same features for which professional inspection technicians develop a reliable intuition [71]. This intuition, by its very nature, is unfortunately at the same time difficult to explain to another person. Even without years of professional experience, it becomes possible to translate the collective experience of the inspection expert community into a language that can be understood. NDT using radiography (RT) is one of the oldest techniques to evaluate welding quality, which is used predominantly in inspection of welded joints. Utilization of RT techniques is inevitable to certify the safety and reliability of manual welds in these structures. However, the process of assessment of X-ray images is both time consuming and at times can be subjective for the expert operators due to various reasons [76]. 
Novel image processing and pattern recognition techniques have been applied to improve image quality, which helps in weld defect diagnosis and increases the accuracy of defect detection. Nonetheless, due to the complexity of the task, many efforts have been made to develop an automated intelligent system to recognize defective welded joints.

9.2 Literature Review

Previous research on weld defect detection was mostly performed using techniques such as segmentation, texture feature extraction, and classification, one of which, by Mery et al. [55], reached an accuracy of 90.91%. Artificial neural networks (ANNs) are used predominantly, with enhancements to the classifier description across various studies. Kumar et al. [44] used gray-level co-occurrence matrix (GLCM) texture features as ANN input and reached an accuracy of 86.1% for defect classification. In another work, Kumar et al. [43] added geometrical features and improved the accuracy to 87.34%. Zapata et al. [85] utilized two neural classifiers: an adaptive network-based fuzzy inference system (ANFIS) and an ANN. In their


work, the ANN achieved an accuracy of 78.9%, and using the 12 chosen geometrical features extracted from X-ray images as input, ANFIS reached an accuracy of 82.6%. To integrate both texture and geometrical features while selecting only useful features, and thereby limiting computational complexity, Valavanis et al. [75] employed the sequential backward selection (SBS) technique to design the ANN classifier, which resulted in 85.4% accuracy. Wang and Liao [79] designed a weld flaw detection system in which they extracted 12 different features from radiography images, such as size, intensity, direction, location, and shape, as network inputs. They applied two well-known classifiers for detection of weld flaws: fuzzy k-nearest neighbor (fuzzy KNN) and multi-layer perceptron (MLP) neural networks, which attained accuracies of 91.57% and 92.39%, respectively. However, the traditional feature extraction methods used in these systems are not able to extract discriminant features. The solution to improve pattern recognition in weld flaw detection is to take advantage of deep neural networks that can extract higher-level and more abstract features. Hou et al. [30] employed deep convolutional neural networks (DCNNs) on three resampled datasets based on the GDXray [56] weld dataset, and the highest achieved accuracy was 97.2% on the dataset using the synthetic minority over-sampling technique (SMOTE). For the remaining datasets, random over-sampling (ROS) and random under-sampling (RUS) were applied, and the best-case performances were 96.3% and 79.9%, respectively. Integration of computer vision methods using DCNNs shows exceptional promise in several applications, such as object detection, crack detection, and NDT, but requires many images for the training process [87–89]. Although utilizing CNNs improves the defect detection accuracy, there are known drawbacks in the conventional approaches for studying the defects.
For instance, Lin et al. [47] studied the application of a DCNN for detection of casting defects. The authors reported 96% accuracy using an eight-layer CNN for detecting defects in X-ray images of a cast object. They improved the detection rate by up to 8% with their CNN-based method compared with an ANN-based method. The proposed method not only makes it possible to increase quality but also reduces the scrap rate during the casting process. As reviewed earlier, methods based on deep learning for NDT improve detection accuracy within X-ray images, whether using different CNN architectures or pre-trained models [31]. Zhang et al. [86] deployed a CNN classification model to detect weld defects. The goal was to detect defects in aluminum alloy in robotic arc welding using an 11-layer CNN. The authors reported 99.38% accuracy for a single type of welded material. For the database, the authors used CCD camera images instead of radiography images. In industry, using CCD camera images for various types of weld materials is not common, because it is not possible to detect in-depth defects such as lack of fusion. As another example, in the work of Zhang et al. [84], the imaging sensor is improved by utilizing a UVV band visual sensor system, but this method is


FIGURE 9.2 Two samples of images in the GDXray database (top) and the SBD database (bottom). The bottom image is cropped to be able to compare with the top image. Defects are more visible in the GDXray database.

not capable of penetrating the material deeply enough to detect defects such as an internal cavity. Zhang et al. [34] studied generative adversarial nets (GANs) in weld defect detection. They categorized defects into three groups: cracks, porosity, and burn through. In their definition, a series of single dots is considered porosity, a linear shadow that is vertical or parallel to the edge of the weld is defined as a crack, and burn through is defined as shadows of anomalous shape on X-ray images. This definition is not applicable to real-world X-ray images, and many defect classes will be neglected because the definition is so narrowly bounded. Although the proposed method reached above 94% accuracy, results of the trained CNN on other X-ray weld datasets are not reported, so the generalization of the network cannot be checked. Most reported works used either the GDXray database, which is public [56], or the WDXI dataset [25], which is not publicly available. As depicted in Fig. 9.2 (top image), defects in images of the GDXray database are handpicked and easily visible due to their extreme nature. In addition, weld images in the GDXray database are limited to only 88 samples. Although GDXray has become a standard benchmark for testing the performance of different algorithms [30], defects in real-world X-ray images are not as obvious as those in the GDXray dataset. As shown in Fig. 9.2 (bottom image), it is more challenging to detect defects in real-world X-ray images that are not handpicked and do not contain oversized defects. Moreover, in previous studies, whether based on traditional image processing algorithms or on deep learning, differences in wall thickness of welded objects, and scattering or waving in the weld root due to the geometry of pipes and variation in the welding pattern of a manual welder, are not present and are neglected as features.
In such cases, due to the change of thickness, intensity of the image varies significantly in each segment of the weld, as is shown in Fig. 9.3.


FIGURE 9.3 An image sample from the SBD database in which two surfaces with different thicknesses are welded.

In this chapter, the main goal is set to locate and identify discontinuities and defects (detection) in a realistic X-ray image dataset (the SBD dataset) and to determine the defect type (DT). Due to the aforementioned differences with the GDXray database, previous methods either failed to detect the defects present in the SBD dataset or the overall accuracy of the methods was dramatically lowered. To address this problem, a DCNN is designed and trained to be able to detect defects in the SBD database.

9.3 Database Preparation

To develop and evaluate the proposed network architecture, 5000 full-sized 15,360 × 1024 (width × height) pixel images of a welded pipeline containing weld defects were gathered. All full-sized images were cropped into 224 × 224 pixel patches, each patch including the weld’s center. A total of 100,000 patches were categorized as non-defective or defective by an expert with API and ASME welding certificates. In the first phase, we selected a smaller, balanced set of non-defective and defective patches. This database was named SBD-1. In addition, to evaluate the network’s performance against different types of defects, another database (SBD-2) was developed, containing more than 20,000 discontinuity patches covering 11 different types of major defects recognized by ASME and API. The DTs and their quantities are shown in Table 9.1. Fig. 9.4 depicts 10 samples of non-defective patches, and Fig. 9.5 shows 11 patches along with their DTs.
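The cropping step can be illustrated with a minimal Python sketch. It is not the authors' tooling: the function name `patch_boxes`, the assumption that the weld-center row is already known for every column, and the non-overlapping horizontal step are all hypothetical choices made for illustration, so the number of patches per image differs from the chapter's.

```python
PATCH = 224  # patch side length in pixels, as stated in the text


def patch_boxes(width, height, weld_center_rows, step=PATCH):
    """Return (top, left, bottom, right) crop boxes that follow the weld.

    weld_center_rows maps a column index to the row of the weld center
    (hypothetical input; the text does not say how the weld is located).
    """
    half = PATCH // 2
    boxes = []
    for left in range(0, width - PATCH + 1, step):
        # Center the box vertically on the weld at the box midpoint.
        center = weld_center_rows[left + half]
        # Clamp so the box stays inside the image.
        top = min(max(center - half, 0), height - PATCH)
        boxes.append((top, left, top + PATCH, left + PATCH))
    return boxes


# Synthetic 15,360 x 1024 scan whose weld runs along row 512.
centers = [512] * 15360
boxes = patch_boxes(15360, 1024, centers)
print(len(boxes), boxes[0], boxes[-1])
```

Returning crop boxes rather than pixel data keeps the sketch independent of any particular image library; the boxes can be passed to whatever array or image type holds the scan.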

9.4 Experimental Study

AlexNet was proposed in 2012 by Krizhevsky et al. [42] and won the ImageNet [14] ILSVRC-2012 contest. This work popularized rectified linear units (ReLUs) as the activation function. The network was trained on 1.2 million high-resolution images with the goal of classifying them into 1000 categories. The eight-layer architecture of the network, which is an extension of LeNet-5 [45], consists of five convolutional layers followed by three fully connected (FC) layers, and the network has 60 million parameters. As input, it takes 224 × 224 RGB images, and the final FC layer feeds a 1000-way Softmax. After


TABLE 9.1 Defect types

Defect Type                                      Total Number of Patches
Elongated Slag Inclusions (ESIs)                 5884
Hollow Bead (HB)                                 5204
Isolated Slag Inclusions (ISIs)                  3776
Gas Porosity (GP)                                2528
Inadequate Penetration (IP)                      1216
External Undercut (EU)                           1692
Porosity (P)                                     785
Scattered Porosity (SP)                          632
Inadequate Penetration Due to High-Low (IPD)     938
Internal Undercut (IU)                           902
Internal Concavity (IC)                          814

FIGURE 9.4 Ten samples of non-defect images.

that, using deeper CNNs has become prevalent among researchers. As evidence that deepening the network improves model accuracy, VGG-16 and VGG-19 were proposed by Simonyan and Zisserman [72] on the same ImageNet dataset. VGG-16 has approximately 138 million parameters. In comparison to AlexNet, VGG-16 has 8 more convolutional layers and VGG-19 has 11 more, which improves the overall accuracy of the network. In 2015, a deeper model with up to 152 layers, known as ResNet [27], was presented by Microsoft Research. In this work, the authors leveraged skip connections and batch normalization to address the problem of saturation and to avoid compromising model generalization in deep networks. Several CNN architectures, including VGG-16, VGG-19, AlexNet, and ResNet, were applied to SBD-1 and SBD-2. The maximum accuracy obtained using these well-known architectures was 86% for SBD-1 and 75% for SBD-2. Because the

FIGURE 9.5 Classes of welding defects based on the ASME standard.

object of interest in this problem (i.e., weld defects) is not a common object, methods based on transfer learning, such as in the work of Ferguson et al. [17], would not be an optimal solution. To address this problem, a network architecture is proposed in the following section. To obtain optimized values for the number of layers, filter size, and other hyperparameters (HPs), a Bayesian optimization algorithm (BOA) was used [63]. In the following sections, design steps for the SBD-2 database are presented. The steps are the same for the SBD-1 database, with the exception of the number of classes.

9.4.1

Deep Learning Architecture

Fig. 9.6 illustrates the overall CNN architecture, including the input layer and multiple convolutional layers, followed by batch normalization, ReLU, and max-pooling layers, an FC layer, a Softmax layer, and an output layer for the classification task. The first layer is the input layer, which receives 224 × 224 patches for classification. In SBD-1, this is a classification between two classes, and in SBD-2, it is a multi-class classification problem. The defect features in each data batch are extracted using multiple convolutional layers. Each such layer consists of sets of neurons whose weights and biases are updated relative to the defect features. In a convolutional layer, each neuron takes as input a small region of the previous layer, whose extent is set by the filter (kernel). The size of the filter, $s_f$, can be tuned from 1 × 1 pixels up to the size of the input image. The filter moves along the input and builds a convolved feature map. To increase the number of feature maps, multiple filters are used, and each filter has different weights and biases so that it can extract different features of the image. The stride (amount of horizontal


FIGURE 9.6 Network architecture used for weld defect detection. Five convolutional layers are used in this architecture. Conv: convolutional layer; BN: batch normalization layer; MP: max-pooling layer; FC: fully connected layer.


and vertical movement of the filter on the input per convolution) is set to 3 pixels. After the convolutional layer, a batch normalization layer is used to reduce the sensitivity of the CNN to initial HP values and to decrease training time. Following the batch normalization layer, a ReLU activation layer applies a zero threshold to all negative outputs of the batch normalization layer, which means that each input $b$ from the previous layer is mapped to $\max(0, b)$. The max-pooling layer downsamples its input by dividing it into rectangular pooling regions and computing the maximum of each region of the gathered feature matrices. After the feature extractor, the FC layer maps the feature matrix of the last layer to a $1 \times c$ vector, where $c = 11$ is the number of DTs in SBD-2. To represent the probability distribution over multiple classes at the output of the classifier, a generalization of the binary logistic regression classifier (the Softmax function) is applied after the FC layer [8, 21]. Consider the input of the Softmax function to be a sample defect patch (DP) that belongs to one of the 11 DTs, $\mathrm{DP} \in \mathrm{DT}_j$, where $j \in \{1, \ldots, 11\}$. The prior probability [24] is defined as $P(\mathrm{DT}_j)$, the probability that $\mathrm{DP} \in \mathrm{DT}_j$, and the conditional probability as $P(\mathrm{DP}, \bar{\varphi} \mid \mathrm{DT}_j)$, where $\bar{\varphi} = [\bar{\omega}, \bar{b}]$ is the parameter vector that consists of the weights $\bar{\omega}$ and biases $\bar{b}$. The Softmax function is described as follows:

$$S_j(\mathrm{DP}, \bar{\varphi}) = P(\mathrm{DT}_j \mid \mathrm{DP}, \bar{\varphi}) = \frac{P(\mathrm{DP}, \bar{\varphi} \mid \mathrm{DT}_j)\, P(\mathrm{DT}_j)}{\sum_{m=1}^{n} P(\mathrm{DP}, \bar{\varphi} \mid \mathrm{DT}_m)\, P(\mathrm{DT}_m)} = \frac{\exp\big(r_j(\mathrm{DP}, \bar{\varphi})\big)}{\sum_{m=1}^{n} \exp\big(r_m(\mathrm{DP}, \bar{\varphi})\big)}, \tag{9.1}$$

where $r_j(\mathrm{DP}, \bar{\varphi}) = \ln\big(P(\mathrm{DP}, \bar{\varphi} \mid \mathrm{DT}_j)\, P(\mathrm{DT}_j)\big)$, $n = 11$ is the number of DTs, and $S_j$ is a probability distribution given by the Softmax function output, with $0 \le S_j \le 1$ and $\sum_{j=1}^{n} S_j(\mathrm{DP}, \bar{\varphi}) = 1$. Following the Softmax function, the classification output layer (cross-entropy function) assigns each input to one of the $n$ mutually exclusive DTs using the following loss function:

$$l(\bar{\varphi}) = -\sum_{i=1}^{p} \sum_{j=1}^{n} d_{ij} \ln S_j(\mathrm{DP}_i, \bar{\varphi}), \tag{9.2}$$

where $p$ is the number of samples and $d_{ij}$ is the entry of a matrix that gives the probability with which the $i$th sample of DP belongs to the $j$th DT.
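As a small numerical illustration of Eqs. (9.1) and (9.2), the following Python sketch computes Softmax probabilities from raw class scores and the corresponding cross-entropy loss. The scores, the one-hot targets, and the use of three classes instead of 11 are hypothetical values chosen for brevity; subtracting the maximum score is a standard numerical-stability step that is not part of the chapter.

```python
import math


def softmax(scores):
    """Map raw class scores r_j to probabilities S_j (Eq. 9.1)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(r - peak) for r in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(targets, probabilities):
    """Sum of -d_ij * ln(S_j) over samples i and classes j (Eq. 9.2)."""
    loss = 0.0
    for d_row, s_row in zip(targets, probabilities):
        for d, s in zip(d_row, s_row):
            if d > 0.0:  # skip zero-weight terms; avoids ln(0)
                loss -= d * math.log(s)
    return loss


# Two hypothetical patches scored against three classes for brevity
# (the chapter uses n = 11 defect types).
scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
one_hot = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
probs = [softmax(row) for row in scores]
print([round(sum(p), 6) for p in probs], cross_entropy(one_hot, probs))
```

Each row of `probs` sums to 1, and the loss shrinks toward 0 as the probability assigned to the true class approaches 1.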

9.4.2

Training

Stochastic gradient descent with momentum (SGDM) was used to train the CNN for classification. This method updates the CNN’s weights and biases to minimize the loss function, which measures the difference between true-classified and false-classified DPs. SGDM uses a subset of the training data (a mini-batch). The gradient derived from the data within the mini-batch is used to update the weights and biases. Each update of the weights and biases is defined as one iteration. The gradient descent update law is described as

$$\bar{\varphi}_{k+1} = \bar{\varphi}_k - \lambda \nabla l(\bar{\varphi}_k) + \eta \big(\bar{\varphi}_k - \bar{\varphi}_{k-1}\big), \tag{9.3}$$

where the subscript $k$ represents the iteration number, $0 < \lambda < 1$ is the initial learning rate, $\bar{\varphi}$ is a vector that contains the weights and biases, $l(\bar{\varphi})$ is the loss function, and $0 \le \eta \le 1$ is the momentum, which defines the level of contribution from the previous step. For $\lambda$ values close to 0, the learning process is slow, and values close to 1 lead to either divergence or suboptimal weights. Moreover, to prevent over-fitting of the CNN during the training process, L2 regularization [8, 62] is utilized as follows:

$$l(\bar{\varphi}_{k+1}) = l(\bar{\varphi}_k) + \frac{\tau}{2}\, \bar{\omega}_k^{T} \bar{\omega}_k, \tag{9.4}$$

where $\tau$ is the regularization factor and $\bar{\omega}_k$ is the weight vector at iteration $k$. To address over-fitting and feature memorization, and to improve the generalization of the Softmax classifier during the training process, a modified data augmentation procedure is used during each iteration [21], in which the DPs are translated randomly in the horizontal and vertical directions by a maximum of ±10 pixels.
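The update law of Eq. (9.3), with the L2 penalty of Eq. (9.4) folded into the gradient, can be sketched in plain Python. This is an illustrative toy, not the training code used in the chapter: the quadratic loss, the learning rate of 0.1, and the momentum of 0.9 are assumptions chosen so that convergence is easy to see, and the penalty is applied to every parameter rather than to the weights only.

```python
def sgdm_step(phi, phi_prev, grad, lr, momentum, l2=0.0):
    """One update of Eq. (9.3), with the L2 term of Eq. (9.4) folded in.

    The gradient of (l2 / 2) * w^T w is l2 * w, so it is added to the
    loss gradient. For simplicity every parameter is regularized here;
    the text regularizes the weights only.
    """
    return [
        p - lr * (g + l2 * p) + momentum * (p - q)
        for p, q, g in zip(phi, phi_prev, grad)
    ]


def quadratic_grad(phi, target):
    """Gradient of the toy loss 0.5 * sum((phi - target)^2)."""
    return [p - t for p, t in zip(phi, target)]


target = [1.0, -2.0]           # minimizer of the toy loss
phi_prev = phi = [0.0, 0.0]    # identical start values: zero momentum
for _ in range(300):
    grad = quadratic_grad(phi, target)
    # The right-hand side is evaluated first, so phi_prev receives the
    # parameters from before this update.
    phi, phi_prev = sgdm_step(phi, phi_prev, grad, lr=0.1, momentum=0.9), phi
print(phi)
```

With `l2=0.0` the iterates approach `target`; a positive `l2` pulls the solution toward zero, which is the shrinkage effect that the regularization factor controls.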

9.4.3

Network HP Optimization

The HPs of the proposed CNN architecture and SGDM are the filter size $s_f$, the number of filters $N_f$, the number of CNN layers $N_D$, and $\eta$, $\tau$, and $\lambda$. The search range for the HPs was defined as $N_D \in \{1, 2, \ldots, 20\}$, $s_f \in \{1, 2, \ldots, 15\}$, $N_f \in \{1, 2, \ldots, 100\}$, $0 \le \eta \le 1$, $0 \le \tau \le 1$, and $0 < \lambda < 1$. The possible values for $N_D$, $s_f$, and $N_f$ are integers, and those for $\eta$, $\tau$, and $\lambda$ are logarithmically spaced values between 0 and 1. The classification error is the number of DPs misclassified by the classifier (Softmax). The objective of optimization is to find optimal values for the HPs such that the classification error is minimized. Thus, the objective function can be considered a function with the HPs as the input and the classification error as the output. Modeling this objective function is algebraically complicated and computationally intensive. The BOA is capable of optimizing the HPs to minimize the classification error while treating the objective function as a black box [5]. To perform the BOA, a validation set was defined that consists of 15% randomly selected DPs from the training set. The inputs of the objective function are the training set and the validation set. As shown in Fig. 9.7, the objective function trains the CNN and returns the classification error on the validation set. By modeling the calculated error using a Gaussian process (GP), as described in the work of Gelbart et al. [19], over multiple iterations $z \in \{1, 2, \ldots, 83\}$, the BOA finds the optimal values

FIGURE 9.7 Block diagram of optimizing hyperparameters using BOA.

for HPs that minimize the classification error. The kernel function used for the GP is the automatic relevance determination (ARD) Matérn 5/2 kernel described in the work of Rasmussen [68]. In addition, the acquisition function $q_z(\mathrm{HP})$ used with the GP is the expected improvement function $E(\cdot)$ [59], as follows:

$$q_z(\mathrm{HP}) = \arg\max_{\mathrm{HP}} E\Big[\max\big(0, f_{z+1}(\mathrm{HP}) - f_z^{\max}(\mathrm{HP})\big)\Big], \tag{9.5}$$

where $f_z^{\max}(\mathrm{HP})$ is the current maximum observed value of the objective function. The next estimate for maximizing the objective function is obtained by using the acquisition function. The GP posterior is updated in each iteration using Eq. (9.6):

$$P(f_z \mid u) = \frac{P(u \mid f_z)\, P(f_z)}{P(u)}, \tag{9.6}$$

where $u = \{(\mathrm{HP}_z, f_z),\ z = 1 : 100\}$. The extrema of $f_z(\mathrm{HP})$ were obtained numerically at sampled values of the function. A closed-form expression of the objective function is not required within the BOA mathematical structure [10]. The objective function and acquisition function for two of the SGDM HPs (i.e., $\eta$ and $\lambda$) during the optimization process are shown in Fig. 9.8. As depicted in Fig. 9.8(A), the observed points $f_z(\mathrm{HP})$ are marked by blue dots, the model mean obtained from the observations is depicted as the red surface, and the next evaluation point is marked with a black dot. Moreover, Fig. 9.8(B) illustrates the acquisition function. The objective function reaches a minimum at the 53rd iteration; this point is marked with a black star. Fig. 9.8(B) shows the maximum feasible value that is generated upon minimizing the classification error. Fig. 9.9 visualizes how much the false-positive rate decreased after HP optimization. The largest improvement is obtained for SP and ISI, whereas class IU did not show any improvement. The total number of iterations was set to 80. Each iteration calculates the classification error among 700 randomly selected DPs from 11 defect batches.
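To make the role of the acquisition function concrete, the sketch below evaluates expected improvement in closed form for a Gaussian posterior and picks the next candidate to try. It is an assumption-laden illustration rather than the chapter's implementation: it is written for minimizing a validation error (the mirror image of the maximization form in Eq. (9.5)), and the candidate names and posterior means and standard deviations are invented values standing in for GP predictions.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_error):
    """Closed-form E[max(0, best_error - f)] for f ~ N(mean, std^2).

    mean, std  : GP posterior prediction at a candidate HP setting
    best_error : lowest validation error observed so far
    """
    if std <= 0.0:
        return max(0.0, best_error - mean)
    z = (best_error - mean) / std
    return (best_error - mean) * normal_cdf(z) + std * normal_pdf(z)


# Hypothetical posterior predictions (mean, std) at three candidates.
candidates = {
    "A": (0.12, 0.01),  # slightly worse than the best, low uncertainty
    "B": (0.09, 0.02),  # predicted better than the best
    "C": (0.15, 0.06),  # predicted worse, but more uncertain
}
best_error = 0.10
scores = {name: expected_improvement(m, s, best_error)
          for name, (m, s) in candidates.items()}
next_point = max(scores, key=scores.get)
print(scores, next_point)
```

The score rewards both a low predicted error and a high uncertainty, which is how Bayesian optimization balances exploiting good regions against exploring poorly sampled ones.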


FIGURE 9.8 (A) The observation function model. (B) Acquisition function for two parameters of the SGDM. The starred point is the calculated optimal values at the 53rd iteration for η and λ.


FIGURE 9.9 Comparison between error among 11 defect categories before and after optimization. [Bar chart; categories: SP, IU, ISI, IPD, IP, IC, P, HB, GP, EU, ESI; series: % error before optimization and % error after optimization; axis range: 0% to 45%.]

The BOA was evaluated statistically using the Wald method [26] by representing the images in the test set as independent events with a known probability of success. The number of misclassified images was represented with a binomial distribution. By applying the trained CNN with optimized HPs to the test set and computing the number of correctly classified DPs, the test error $E_t$ is defined as follows:

$$E_t = 1 - \frac{1}{b} \sum_{i=1}^{b} D_{s_i}, \tag{9.7}$$

where $D_{s_i}$ equals 1 if the $i$th DP is classified correctly and 0 otherwise, so that the sum $D_s$ is the number of correctly classified DPs, and $b$ is the total number of DPs in the test set. Note that the test set is not exposed to the optimization process; the standard error $E_s$ of the test error is used to evaluate the performance of the trained CNN on it. This approach helps to increase the optimization speed. The standard error is represented as follows:

$$E_s = \sqrt{\frac{E_t (1 - E_t)}{b}}. \tag{9.8}$$

Moreover, as the target of this research, to obtain a ±8 error margin, a confidence interval of 92% is defined to calculate the generalization error $E_G$, defined as

$$E_G = E_t \pm 0.92 E_s. \tag{9.9}$$
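The computation in Eqs. (9.7) to (9.9) can be traced with a few lines of Python. This is a minimal sketch under stated assumptions: the counts (912 correct out of 1000) are invented, the standard error uses the test-set size as the sample count (the usual Wald form), and the multiplier is passed in as a parameter, here set to the 0.92 that appears in Eq. (9.9).

```python
import math


def wald_interval(num_correct, num_total, z):
    """Test error, its standard error, and a Wald interval.

    z is the multiplier applied to the standard error; it is passed in
    explicitly so the caller chooses the confidence level.
    """
    error = 1.0 - num_correct / num_total
    std_error = math.sqrt(error * (1.0 - error) / num_total)
    return error, std_error, (error - z * std_error, error + z * std_error)


# Hypothetical counts: 912 of 1000 test patches classified correctly.
error, std_error, (low, high) = wald_interval(912, 1000, z=0.92)
print(error, std_error, low, high)
```

Because the standard error falls with the square root of the test-set size, quadrupling the number of test patches halves the width of the interval.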

The final HP values for the CNN were $s_{f1} = 7$, $s_{f2} = 7$, $s_{f3} = 9$, $N_{f1} = 235$, $N_{f2} = 105$, and $N_{f3} = 98$. In addition, the optimized values for SGDM were $\lambda = 0.0016737$, $\eta = 0.040151$, and $\tau = 0.003134$. Applying the optimal HP values to the CNN and SGDM yields a CNN with 20 layers and 91.20% accuracy in eight epochs. Moreover, the minimized value of the loss function was 0.021 in eight epochs. The generalization error $E_G$ interval for the test set was [0.5170.0115]. The same steps were followed for SBD-1, and the resulting HP values for the CNN were $s_{f1} = 3$, $s_{f2} = 5$, $s_{f3} = 9$, $N_{f1} = 107$, $N_{f2} = 87$, and $N_{f3} = 159$. The optimized values for SGDM were $\lambda = 0.002128$, $\eta = 0.025287$, and $\tau = 0.003876$. Applying the optimal HP values to the CNN and SGDM yields a CNN with 20 layers and 95.86% accuracy in eight epochs. Moreover, the minimized value of the loss function was 0.037 in eight epochs.

9.5 Experimental Implementation

Large-scale cloud computing platforms are an increasingly valuable resource for managing both training and inference pipelines based on sophisticated models. Through services that are specifically designed to offload high-capacity data storage and intensive computations to scalable, purpose-built remote servers, it becomes possible to develop and serve complex models such as the deep learning CNN described above. In the course of the model’s development, multiple Amazon Web Services (AWS) offerings were employed to handle all stages of production, including the means to easily supply the model with the input of dedicated experts. For such advanced models, it is no longer practical to train efficiently on generic local machines with the amount of data necessary for the model to learn sophisticated features and maximize the accuracy of its predictions. In particular, the storage service known as S3 allowed for systematic, massive-scale hosting of large files. In practice, the electronically captured radiographic DICOM (Digital Imaging and Communications in Medicine) images assessed and analyzed by inspection operators are quite unwieldy to manage on local


machines and are essentially impossible to collect at one time. As technicians capture data with equipment designed for RTR, the images are directed to a database of files ready for processing, either for use as training or testing data for the model or for additional annotation and detailed labeling. Because these images are collected from confidential projects and display information that third parties should not see, a mandatory redaction step is applied; it uses optical character recognition models so that only the confidential information is redacted, as accurately as possible. Once redacted, the images are ready for labeling and annotation or for further processing. With a microservices architecture and services that enable scalable computation, such as AWS Lambda, each processing step in the development of the model can be run independently. A separate series of proprietary software tools was responsible for extracting images from large data storage and preprocessing them for direct use as training data for the model. Because these images are long, rectangular, unwrapped scans of the weld and pipe circumference, it was necessary to extract the region of the image where the actual weld lies between the walls of the pipe and to produce slices of a specified size that follow the weld precisely. These slices are then related to labeled defect annotations in one of two ways. First, experts in RTR inspection are asked to annotate and classify suspected defects from the slices directly or from the original image, even outside the context of the project where the weld was originally captured. Second, metadata in the image that records the annotations produced by proprietary inspection and annotation software in the project itself is parsed and cross-referenced with the exact location of the slice relative to the original image. Finally, after the appropriate labels are applied to the corresponding image slices, each centered on the weld and each containing the visual pattern of the defect class given by the label, the labeled training data are fed to the model.
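The cross-referencing of annotations with slice locations can be illustrated with a minimal sketch. Everything here is hypothetical: the `Annotation` fields, the fixed slice width, the one-dimensional (along-the-weld) coordinates, and the "largest overlap wins" rule are assumptions made for illustration, not the proprietary tooling described above.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical defect annotation in original-image pixel coordinates."""
    x_start: int  # left edge of the annotated defect
    x_end: int    # right edge (exclusive)
    label: str    # defect class name


def make_slices(image_width, slice_width):
    """Return (start, end) pixel ranges of fixed-width slices along the weld."""
    return [(s, min(s + slice_width, image_width))
            for s in range(0, image_width, slice_width)]


def label_slices(slices, annotations, background="no_defect"):
    """Give each slice the label of the annotation it overlaps most.

    Slices that overlap no annotation receive the background label.
    """
    labels = []
    for s_start, s_end in slices:
        best_label, best_overlap = background, 0
        for a in annotations:
            overlap = min(s_end, a.x_end) - max(s_start, a.x_start)
            if overlap > best_overlap:
                best_label, best_overlap = a.label, overlap
        labels.append(best_label)
    return labels


if __name__ == "__main__":
    slices = make_slices(image_width=1000, slice_width=250)
    annotations = [Annotation(260, 300, "porosity"), Annotation(740, 800, "crack")]
    print(list(zip(slices, label_slices(slices, annotations))))
```

A real pipeline would also have to handle two-dimensional bounding boxes and slices that contain more than one defect class; the sketch ignores both.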

9.6 Conclusion

In this work, two realistic welding quality datasets for training deep learning models were created based on radiography images collected from various projects and NDT expert-annotated datasets SBD-1 and SBD-2. An optimized CNN was designed to find defects in the weldment and heat-affected zones and was subsequently trained and evaluated on the prepared datasets. An accuracy of 96% was achieved. In addition, the robustness of the proposed method was tested across 11 different types of welding discontinuity. Moreover, the results were compared with the non-optimized CNN architecture. With the optimized CNN, the classification accuracy improved by 10%. In addition, this approach was found to be applicable to real-world datasets. This method


improved not only the defect detection speed but also the accuracy of defect detection in high-volume projects.

References [1] A. Aljaroudi, F. Khan, A. Akinturk, M. Haddara, P. Thodi. Risk assessment of offshore crude oil pipeline failure. Journal of Loss Prevention in the Process Industries 37 (2015) 101–109. [2] W.M. Alobaidi, E.A. Aluam, H.M. Al-Rizzo, E. Sandgren. Applications of ultrasonic techniques in oil and gas pipeline industries: A review. American Journal of Operations Research 5 (4) (2015) 274. [3] M. Arntz, T. Gregory, U. Zierahn. Revisiting the risk of automation. Economics Letters 159 (2017) 157–160. [4] T. Backström, M. Döös. A comparative study of occupational accidents in industries with advanced manufacturing technology. International Journal of Human Factors in Manufacturing 5 (3) (1995) 267–282. [5] R. Baptista, M. Poloczek. Bayesian optimization of combinatorial structures. arXiv:1806.08838, 2018. [6] R. Beeson, Miller Electric Manufacturing Co., Appleton, WI (US). Pipeline welding goes mechanized. Welding Journal (Miami) 78 (11) (1999) 47–50. [7] K. Benyounis, A. Olabi, M. Hashmi. Effect of laser welding parameters on the heat input and weld-bead profile. Journal of Materials Processing Technology 164 (2005) 978–985. [8] Don E. Bray, Don McBride, Nondestructive testing techniques, NASA Scientific and Technical Information/Recon Technical Report A93 (1992) 17573. [9] E. Brochu, V.M. Cora, N. De Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv:1012.2599, 2010. [10] L. Cartz. Nondestructive Testing. ASM International, Novelty, OH, 1995. [11] J. De Raad, F. Dijkstra. Mechanized ultrasonic testing on girth welds during pipeline construction. Materials Evaluation 55 (8) (1997) 890–895. [12] J. Deng, W. Dong, R. Socher, L.-J. Li. K. Li, L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009. IEEE, Los Alamitos, CA. 248–255. [13] M.C. Domke, J.H. 
Messinger, S. Soorianarayan, T.E. Lambdin, S. L. Sbihli. Systems and methods for analyzing data in a non-destructive testing system. US Patent 9,217,999, 2015. [14] D. Fairchild, M. Macia, N. Bangaru, J. Koo. Girth welding development for x120 linepipe. International Journal of Offshore and Polar Engineering 14 (1) (2004) ISOPE-04-14-018. [15] M.K. Ferguson, A. Ronay, Y.-T.T. Lee, K.H. Law. Detection and segmentation of manufacturing defects with convolutional neural networks and transfer learning. Smart and Sustainable Manufacturing Systems 2 (2018) 10. [16] D.S. Forsyth, H.T. Yolken, G.A. Matzkanin. A brief introduction to nondestructive testing. AMMTIAC Quarterly 1 (2) (2006) 7–10. [17] M.A. Gelbart, J. Snoek, R.P. Adams. Bayesian optimization with unknown constraints. arXiv:1403.5607, 2014. [18] O. Gericke. Determination of the geometry of hidden defects by ultrasonic pulse analysis testing. Journal of the Acoustical Society of America 35 (3) (1963) 364–368. [19] I. Goodfellow, Y. Bengio, A. Courville. Deep Learning. MIT Press, Cambridge, MA, 2016.

348

Fault diagnosis and prognosis techniques for complex engineering systems

[20] R. Gordon, R. Holdren, M. Johnson, M. Lozev. Reducing pipeline construction costs: New technologies. Welding in the World 47 (5-6) (2003) 7–14. [21] L. Gourd. Principles of Welding Technology. Edward Arnold, London, UK, 1986. [22] A. Graves, S. Fernández, F. Gomez, J. Schmidhuber. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning, 2006, 369–376. [23] W. Guo, H. Qu, L. Liang. WDXI: The dataset of X-ray image for weld defects. In Proceedings of the 14th Inter national Conference on Natural Computation, Fuzzy Systems, and Knowledge Discovery (ICNC-FSKD), 2018. IEEE, Los Alamitos, CA. 1051–1055. [24] F.E. Harrell Jr. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Springer Nature, Switzerland AG, 2015. [25] K. He, X. Zhang, S. Ren, J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, 770– 778. [26] Y. He. M. Pan, F. Luo, G. Tian. Pulsed eddy current imaging and frequency spectrum analysis for hidden defect nondestructive testing and evaluation. NDT & E International 44 (4) (2011) 344–352. [27] M.A. Henn, H. Zhou, B.M. Barnes. Data-driven approaches to optical patterned defect detection. OSA Continuum 2 (9) (2019) 2683–2693. [28] W. Hou, Y. Wei, Y. Jin, C. Zhu. Deep features based on a DCNN model for classifying imbalanced weld flaw types. Measurement 131 (2019) 482–489. [29] W. Hou, D. Zhang, Y. Wei, J. Guo, X. Zhang. Review on computer aided weld defect detection from radiography images. Applied Sciences 10 (5) (2020) 1878. [30] D.C. Howard. Non-destructive testing of pipeline. US Patent 4,098,126, 1978. [31] O. Hunaidi, M. Bracken, A. Wang. Non-destructive testing of pipes. US Patent 7,328,618, 2008. [32] H. Zhang, Z. Chen, C. Zhang, J. Xi, X. Le. 
Weld defect detection based on deep learning method. In Proceedings of the IEEE 15th International Conference on Automation Science and Engineering (CASE), 2019. IEEE, Los Alamitos, CA. 1574–1579. [33] J. Jarmulak, E.J.H. Kerckhoffs, P.P. van’t Veen. Case-based reasoning for interpretation of data from non-destructive testing. Engineering Applications of Artificial Intelligence 14 (4) (2001) 401–417. [34] J.F. Jarvis. Visual inspection automation. In Proceedings of the IEEE Computer Society’s 3rd International Computer Software and Applications Conference, 1979. IEEE, Los Alamitos, CA. 251–255. [35] J.A. Jensen. Medical ultrasound imaging. Progress in Biophysics and Molecular Biology 93 (1-3) (2007) 153–165. [36] B.Y. Jeong. Occupational deaths and injuries in the construction industry. Applied Ergonomics 29 (5) (1998) 355–360. [37] M.I. Khan. Welding Science and Technology. New Age International, Delhi, India, 2007. [38] S. Knight, S.G. Drake. X-ray inspection apparatus for pipeline girth weld inspection. US Patent 8,923,478, 2014. [39] K. Kobayashi, S. Ishigame, H. Kato. Skill Training System of Manual Arc Welding. Entertainment Computing: Technologies and Application. Springer US, Boston, MA, 2003, pp. 389–396. [40] A. Krizhevsky, I. Sutskever, G.E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097– 1105.

Defect detection and classification in welding using deep learning Chapter | 9

349

[41] J. Kumar, R. Anand, S. Srivastava. Flaws classification using ANN for radiographic weld images. In Proceedings of the International Conference on Signal Processing and Integrated Networks (SPIN), 2014. IEEE, Los Alamitos, CA. 145–150. [42] J. Kumar, R. Anand, S. Srivastava. Multi-class welding flaws classification using texture feature for radiographic images. In Proceedings of the International Conference on Advances in Electrical Engineering (ICAEE), 2014. IEEE, Los Alamitos, CA. 1–4. [43] Y. Lecun, L. Bottou, Y. Bengio, P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11) (1998) 2278–2324. [44] G. Light. Demonstration of Realtime Radiography on Pipeline Girth Welds. Technical Report. Southwest Research Institute, San Antonio, TX. [45] J. Lin. Y. Yao, L. Ma, Y. Wang. Detection of a casting defect tracked by deep convolution neural network. International Journal of Advanced Manufacturing Technology 97 (1-4) (2018) 573–581. [46] T.R. Lin, B. Guo, S. Song. A.J. Chacko. Ghalambor. Offshore Pipelines. Gulf Professional Publishing. Burlington, MA, 2005. [47] R. Lumb. Non-destructive testing of high-pressure gas pipelines. Non-Destructive Testing 2 (4) (1969) 259–268. [48] B. Ma, J. Shuai, J. Wang, K. Han. Analysis on the latest assessment criteria of ASME B31G-2009 for the remaining strength of corroded pipelines. Journal of Failure Analysis and Prevention 11 (6) (2011) 666–671. [49] T. Matsutani, F. Miyasaka, T. Oji, Y. Hirati. Mathematical modelling of gta girth welding of pipes. Welding International 11 (8) (1997) 615–620. [50] E. Megaw. Factors affecting visual inspection accuracy. Applied Ergonomics 10 (1) (1979) 27–32. [51] J. Meier, I. Tsalicoglou, R. Mennicke. The future of NDT with wireless sensors, AI and IoT. In Proceedings of the 15th Asia Pacific Conference for Non-Destructive Testing, 2017. [52] P.F. Mendez, T.W. Eagar. Penetration and defect formation in high-current arc welding. 
Welding Journal 82 (10) (2003) 296. [53] D. Mery, M.A. Berti. Automatic detection of welding defects using texture features. Insight: Non-Destructive Testing and Condition Monitoring 45(10) (2003) 676–681. [54] D. Mery, V. Riffo, U. Zscherpel, G. Mondragón, I. Lillo, I. Zuccar, H. Lobel, M. Carrasco. GDXray: The database of X-ray images for nondestructive testing. Journal of Nondestructive Evaluation 34 (4) (2015) 42. [55] C. Mgonja. The consequences of cracks formed on the oil and gas pipelines weld joints. International Journal of Engineering Trends and Technology 54 (2017) 223–232. [56] Introduction to Nondestructive Testing: A Training Guide, John Wiley & Sons, Hoboken, New Jersey, 2005. [57] J. Mockus. Bayesian Approach to Global Optimization: Theory and Applications, 37, Springer, Netherlands, 2012. [58] L. Morgan. Testing defects in automated ultrasonic testing and radiographic testing. Insight: Non-Destructive Testing and Condition Monitoring 60 (11) (2018) 606–612. [59] L. Morgan, P. Nolan, A. Kirkham, R. Wilkerson. The use of automated ultrasonic testing (AUT) in pipeline construction. Insight: Non-Destructive Testing and Condition Monitoring 45 (11) (2003) 746–763. [60] M. Naddaf-Sh, S. Hosseini, J. Zhang, N.A. Brake, H. Zargarzadeh. Real-time road crack mapping using an optimized convolutional neural network. Complexity 2019 (2019) 2470735. [61] J. Nestleroth. Pipeline in-line inspection challenges to NDT. Insight: Non-Destructive Testing and Condition Monitoring 48 (9) (2006) 524.

350

Fault diagnosis and prognosis techniques for complex engineering systems

[62] L. Norros. Human and organisational factors in the reliability of non-destructive testing (NDT). In RATU2: The Finnish Research Programme on the Structural Integrity of Nuclear Power Plants. VTT Technical Research Centre of Finland, Espoo, Finland, 271. [63] H.G. Pisarski, C.M. Wignall. Fracture toughness estimation for pipeline girth welds. In Proceedings of the International Pipeline Conference, Vol. 36207 (2002) 1607–1611. [64] J. Quirk. Achieving greater efficiency in NDT inspections. Sensor Review 19 (4) (1999) 268– 272. [65] C.E. Rasmussen. Gaussian processes in machine learning. Summer School on Machine Learning, Springer, Berlin, Heidelberg, 2003, pp. 63–71. [66] W.G. Roe. Welding system. US Patent 3,278,721, 1966. [67] A. Seto, T. Masuda, S. Machida, C. Miki. Very low cycle fatigue properties of butt welded joints containing weld defects: Study of acceptable size of defects in girth welds of gas pipelines. Welding International 14 (1) (2000) 26–34. [68] F. Shaheen, B. Verma, M. Asafuddoula. Impact of automatic feature extraction in deep learning architecture. In Proceedings of the International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2016. IEEE, Los Alamitos, CA. 1–8. [69] K. Simonyan, A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014. [70] S.T. Snyder. Alternative acceptance criteria for pipeline girth welds. Inspection Trends 2006 (2006) 20–22. [71] M.Törner, A. Pousette. Safety in construction—A comprehensive description of the characteristics of high safety standards in construction work, from the combined perspective of supervisors and experienced workers. Journal of Safety Research 40 (6) (2009) 399–409. [72] I. Valavanis, D. Kosmopoulos. Multiclass defect detection and classification in weld radiographic images using geometric and texture features. Expert Systems with Applications 37 (12) (2010) 7606–7614. [73] P. Valentin. 
Weld Classification Based on Grey Level Co-occurrence and Local Binary Patterns. Master’s Thesis, Aalborg University, Aalborg, Denmark, 2017. [74] R. Vilar, J. Zapata, R. Ruiz. An automatic system of classification of weld defects in radiographic images. NDT & E International 42 (5) (2009) 467–476. [75] M. Wall. Human factors guidance to improve reliability of non-destructive testing in the offshore oil and gas industry. In Proceedings of the 7th European-American Workshop on Reliability of NDE. [76] G. Wang, T.W. Liao. Automatic identification of different types of welding defects in radiographic images. NDT & E International 35 (8) (2002) 519–528. [77] R. Wang, R.-J. Guo. Developments of automatic girth welding technology in pipelines. Dianhanji/Electric Welding Machine 41 (9) (2011) 53–55. [78] D. R. Williams Jr., L. R. Gutzwiller, M. U. Hazen, B. S. Anderson, A. McIntyre, T. Abeles. Classifying data with deep learning neural records incrementally refined through expert input. US Patent 9,324,022, 2016. [79] D. Yapp, S. Blackman. Recent developments in high productivity pipeline welding. Journal of the Brazilian Society of Mechanical Sciences and Engineering 26 (1) (2004) 89–97. [80] S. Yella, M. Dougherty, N. Gupta. Artificial intelligence techniques for the automatic interpretation of data from non-destructive testing. Insight: Non-Destructive Testing and Condition Monitoring 48 (1) (2006) 10–20. [81] Y. Zhang, D. You, X. Gao, N. Zhang, P.P. Gao. Welding defects detection based on deep learning with multiple optical sensors during disk laser welding of thick plates. Journal of Manufacturing Systems 51 (2019) 87–94.

Defect detection and classification in welding using deep learning Chapter | 9

351

[82] J. Zapata, R. Vilar, R. Ruiz. Performance evaluation of an automatic inspection system of weld defects in radiographic images based on neuro-classifiers. Expert Systems with Applications 38 (11) (2011) 8812–8824. [83] S. Naddaf-Sh, M-M. Naddaf-Sh, A.R. Kashani, H. Zargarzadeh, An Efficient and Scalable Deep Learning Approach for Road Damage Detection, 2020 IEEE International Conference on Big Data (Big Data) (2020) 5602–5608, doi:10.1109/BigData50022.2020.9377751. [84] M-M. Naddaf-Sh, H. Myler, H. Zargarzadeh , Design and Implementation of an Assistive Real-Time Red Lionfish Detection System for AUV/ROVs, Complexity (2018) doi:https://doi.org/10.1155/2018/5298294. [85] M. M. Dargahi, A. Khaloo, D. Lattanzi, Color-space analytics for damage detection in 3D point clouds, Structure and Infrastructure Engineering (2021) 1–14 doi:https://doi.org/ 10.1080/15732479.2021.1875488.

Chapter 10

Real-time fault diagnosis using deep fusion of features extracted by PeLSTM and CNN

Funa Zhou a, Zhiqiang Zhang b and Danmin Chen c

a School of Logistic Engineering, Shanghai Maritime University, China.
b School of Computer and Information Engineering, Henan University, China.
c School of Software, Henan University, China

Fault Diagnosis and Prognosis Techniques for Complex Engineering Systems. DOI: 10.1016/B978-0-12-822473-1.00003-3 Copyright © 2021 Elsevier Inc. All rights reserved.

10.1 Introduction

Fault diagnosis is one of the critical means of securing the safe and efficient operation of large-scale automation systems, and it has therefore received much attention from experts in both academic and engineering fields [1–5,13,33,36]. Data-driven fault diagnosis has become a promising tool in engineering applications because it requires neither an accurate physical model nor correct expert knowledge [8,19,26,29]. Deep learning is an efficient tool for data feature representation and can be applied to data-driven fault diagnosis. There are currently four kinds of fault diagnosis methods that use deep learning: deep neural network (DNN)-based methods, deep belief network (DBN)-based methods, convolutional neural network (CNN)-based methods, and long short-term memory (LSTM) neural network-based methods [1,2,12,15,17,27,40]. Unlike DNN and DBN, CNN can extract local features successively by using convolution and pooling layers, whereas LSTM does well in sequence feature extraction thanks to its forget gate. Jiang et al. [14] reshaped the bearing vibration signal into 2-D matrix data and fed it into a CNN for fault diagnosis. Zhang et al. [38] likewise first converted the bearing vibration signal into a 2-D matrix and then fed it into a CNN for fault diagnosis. However, an accurate fault diagnosis result can be achieved only by reshaping the vibration signal into large 2-D matrix data, which seriously increases the computational complexity of the diagnosis algorithm and does not take real-time performance into account [38]. Eren et al. [6] used the original time series data as the input of a 1-D CNN to achieve real-time fault diagnosis without manually extracting features in advance. Peng et al. [22] proposed a new


multiscale CNN for feature extraction from a strongly coupled vibration signal with a low signal-to-noise ratio. Han et al. [9] used the wavelet transform spectrum of the vibration signal as the input of a CNN for fault diagnosis of rolling bearings. Kou et al. [16] developed a CNN-based fault diagnosis method for bearings operating under different working conditions without manual intervention. Hsueh et al. [11] first converted the original current signal into a 2-D grayscale image and then applied a deep CNN model to automatically extract robust features from the grayscale image to diagnose faults of an induction motor. However, the preceding methods cannot extract the autocorrelation feature involved in 1-D sequence data, which may yield an inaccurate fault diagnosis result. Wen et al. [30] converted 1-D sequence data into 2-D images and then used a CNN to extract features. However, the training samples must be reshaped into 2-D images in advance, which cannot secure real-time fault diagnosis [30]. Jiang et al. [14] used a CNN directly to extract features from the original bearing data, but this approach cannot achieve real-time fault diagnosis either. It is worth noting that the preceding methods rely on CNN as the sole feature extraction tool, which is incapable of comprehensive feature extraction. Li et al. [18] integrated DNN and CNN to design a deep learning fault diagnosis algorithm with improved diagnosis accuracy: the features extracted by the CNN and the DNN are fused to obtain a more accurate feature representation and fault diagnosis. However, this method can only be applied to offline fault classification rather than online real-time fault diagnosis.

LSTM is an important branch of the recurrent neural network (RNN). It is an efficient tool for extracting long-term dependency trend features involved in 1-D sequence data. Yang et al. [32] designed a fault diagnosis method for rotary machinery that uses LSTM to extract long-term time dependencies from all available data sampled by multiple sensors in order to detect and classify faults. Wang et al. [28] designed an LSTM network to extract features from gear fault data for gear fault diagnosis. Using LSTM's forgetting mechanism, Yu et al. [35] designed a layered LSTM algorithm to overcome the shortcomings of feature extraction in shallow networks. Thus, a fault diagnosis algorithm for rolling bearings is constructed [27]. Using LSTM to extract long-term autocorrelation features from original nonlinear vibration data, Fu et al. [7] designed a fault diagnosis algorithm for a high-speed train bogie. Luo and Hu [20] proposed a rolling bearing fault diagnosis method based on LSTM to simplify the fault diagnosis process. The well-trained network is used to discriminate multiple categories of original data. Thus, it can effectively improve the accuracy of fault diagnosis [29]. Xiao et al. [31] designed a fault diagnosis algorithm for a three-phase asynchronous motor that uses LSTM to extract features from the motor acceleration signal. Thus, the relationship between the original vibration signal and the health state can be established [30]. Yin et al. [34] proposed a fault diagnosis method for a wind turbine gearbox by designing an optimized LSTM neural network with cosine loss. The loss is


converted from Euclidean space to angular space through cosine loss, thereby eliminating effects and improving the accuracy of fault diagnosis for the gearbox [34]. Polat [23] used a deep LSTM to diagnose faults by extracting features from the vibration signals of CNC machine tools. Yu et al. [35] proposed a layered LSTM algorithm that overcomes the shortcomings of the shallow structure without any preprocessing operations or manual feature extraction. Thus, an end-to-end fault diagnosis framework for rolling bearings is proposed. Zhao et al. [40] developed a fault diagnosis method based on LSTM that directly classifies the original process data without specific feature extraction or classifier design. It can also adaptively learn the dynamic information involved in the original data [40]. Bruin et al. [4] constructed an LSTM for signals from multiple track circuits in a geographical area; faults in railway track circuits are diagnosed by exploring their spatial and temporal dependencies, which are learned directly from the data. The preceding works use LSTM as a feature extraction tool for the raw data sequence. However, local features in the data cannot be well extracted when LSTM is the sole feature extraction tool. Integrating LSTM and CNN can yield a more accurate feature representation.

Although LSTM has achieved great success in fault diagnosis by extracting features from 1-D sequence data, its inherent structure makes it fail on long sequence data [21]. The reason is that information contained at the front of the sequence cannot be transferred to the end of the sequence when traditional LSTM is adopted. This memory bottleneck limits the feature extraction capability of LSTM [37]. Zhang et al. [39] proposed a parallel LSTM (PLSTM) that processes all observations in the sequence at the same time to overcome the memory bottleneck problem. However, since feature extraction is performed on the complete sequence at the same time, insufficient data utilization and information loss will be encountered.

For the purpose of safety monitoring, data collected by various types of sensors for health monitoring of rotary machines are stored in a database, for example the 1-D vibration signals collected by various accelerometers. In addition, the waveforms of some critical monitored variables are usually displayed in real time. According to the refresh rate of the display, a screenshot of the display at the monitoring center can illustrate the waveform of the vibration signal. The screenshot image contains some trend information for the vibration signal, which can be accurately extracted by a CNN. It can be concluded from the preceding analysis that it is important to design an efficient feature extraction network for 1-D sequence data. Developing a deep fusion mechanism that incorporates the 2-D screenshot image and the original 1-D sequence may help extract the more comprehensive features required for a more accurate fault diagnosis result.


The main contributions of this chapter are as follows:

(1) We aim to develop an efficient feature extraction technique for health monitoring of rolling bearings utilizing a 1-D vibration signal. The proposed PLSTM with Peephole (PeLSTM) can prevent the transfer of useless information. It not only solves the memory bottleneck problem of traditional LSTM for long sequences but also makes full use of all information that may be helpful for feature extraction.

(2) A fusion network with a new training mechanism is designed to fuse the features extracted by PeLSTM and CNN, respectively, to further explore potential features related to both autocorrelation and local cross-correlation information. By designing a new loss function and a global optimization mechanism for the training process, the fusion network can incorporate a 2-D screenshot image into comprehensive feature extraction. It can provide a more accurate fault diagnosis result, since the 2-D screenshot image is another form of expression of the 1-D vibration sequence that involves additional trend and locality information.

(3) A real-time screenshot image is fed into the input of the CNN to secure real-time online fault diagnosis, which is the primary requirement in the engineering field of health monitoring.

The rest of this chapter is organized as follows. Section 10.2 introduces some basic theory of CNN and LSTM. Section 10.3 presents a real-time fault diagnosis method based on the deep feature fusion of PeLSTM and CNN. Section 10.4 presents our experimental verification. Section 10.5 provides our conclusion and plans for future work.

10.2 Basic theory

This section briefly introduces some basics of CNN and LSTM.

10.2.1 Convolutional neural network

CNN is a special feed-forward neural network for 2-D image feature extraction, especially for large images. Fig. 10.1 shows the schematic diagram of CNN. Compared with traditional DNN, there are no specific hidden layers in the structure of CNN; instead, multiple layers of convolution and pooling operations are connected in sequence. CNN can extract local features from the input image at reduced computational complexity because the convolution operation shares weights and the pooling operation downsamples. The resulting feature map is flattened into a 1-D vector, which is fed to the fully connected layers and the subsequent classifier. The parameters of the network are updated by minimizing the loss function through the backpropagation algorithm.
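The convolution, pooling, and flattening pipeline described above can be sketched in a few lines of NumPy. This is a minimal single-channel illustration under stated assumptions, not the network used in this chapter: the function names, the ReLU activation, the "valid" padding, and the 2 × 2 pooling window are all choices made here for the example.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1).

    As in most deep learning libraries, this is cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(fmap, size=2):
    """Downsample by taking the maximum of each non-overlapping size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    trimmed = fmap[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


def cnn_features(image, kernels):
    """Apply convolution, ReLU, and max pooling per kernel, then flatten to 1-D."""
    x = image
    for k in kernels:
        x = max_pool(np.maximum(conv2d_valid(x, k), 0.0))
    return x.ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((28, 28))
    kernels = [rng.normal(size=(3, 3)), rng.normal(size=(3, 3))]
    print(cnn_features(image, kernels).shape)
```

The flattened vector returned by `cnn_features` plays the role of the 1-D vector that the text feeds to the fully connected layers; training the kernels by backpropagation is omitted.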

FIGURE 10.1 Schematic diagram of CNN. (Figure labels: input 2-D image; first and second layers, each with convolution and max pooling, showing the convolution kernel, pooling kernel, and destination pixel; reshape; fully connected layer; FeatureCNN; output.)

FIGURE 10.2 Schematic diagram of an LSTM cell. (Figure labels: forget gate f(t), input gate i(t), output gate o(t); cell-state update C(t) = i(t) ⊙ C̃(t) + C(t−1) ⊙ f(t); cell output v(t) = C(t) ⊙ o(t).)

10.2.2 Long short-term memory

LSTM is a variant of RNN. It can sequentially extract autocorrelated features from 1-D sequence data [10]. Each LSTM cell is composed of three gates defined by activation functions, which control the transfer of information from the current input, from the previous state, and to the output of the cell, respectively. The schematic diagram is shown in Fig. 10.2. The forget gate determines which previous information is transferred through the cell. The input gate determines which input information is transferred through the cell. The output gate determines which part of the cell state is emitted as the cell output. The parameters of a well-trained LSTM include the weights and bias of each gate.
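The gate mechanism can be made concrete with a short NumPy sketch of one LSTM time step. It follows the common textbook formulation (sigmoid gates, tanh candidate and output squashing), which may differ in detail from the cell drawn in Fig. 10.2; the stacked weight layout, the random initialization, and the function names are assumptions for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4 * hidden, hidden + input) and stacks the weights of the
    forget gate, input gate, output gate, and candidate state, in that order.
    """
    z = W @ np.concatenate([h_prev, x_t]) + b
    n = h_prev.size
    f = sigmoid(z[0:n])            # forget gate: how much old cell state to keep
    i = sigmoid(z[n:2 * n])        # input gate: how much new information to admit
    o = sigmoid(z[2 * n:3 * n])    # output gate: how much cell state to expose
    g = np.tanh(z[3 * n:4 * n])    # candidate cell state
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t


def lstm_features(sequence, hidden_size, seed=0):
    """Run an untrained LSTM over a (T, input_size) sequence; return the last hidden state."""
    rng = np.random.default_rng(seed)
    input_size = sequence.shape[1]
    W = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size + input_size))
    b = np.zeros(4 * hidden_size)
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, W, b)
    return h


if __name__ == "__main__":
    signal = np.sin(np.linspace(0.0, 6.28, 50)).reshape(-1, 1)  # toy 1-D sequence
    print(lstm_features(signal, hidden_size=4))
```

Because the cell state is passed from step to step, the final hidden state depends on the whole sequence, which is what allows LSTM to capture autocorrelation in 1-D data.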

10.3 Deep fusion of features extracted by PeLSTM and CNN

Accurate real-time diagnosis is the primary performance requirement for fault diagnosis algorithms. Making full use of all available multimodal data to accurately extract the features they contain is one of the critical means of obtaining more accurate fault diagnosis results. To this end, a global optimization mechanism that incorporates the fault diagnosis error, the output error of the CNN, and the output error of PeLSTM is developed to train the deep fusion network, so that more comprehensive features of the 1-D sequence data and the 2-D screenshot image can be obtained. The design of an improved LSTM, called PeLSTM, is described in detail for accurate feature extraction from 1-D sequence data, which further secures the efficiency of deep fusion of the 1-D sequence data and the 2-D screenshot image.

FIGURE 10.3 1-D sequence data stored in the database (a column of scalar samples ..., X1D(t−1), X1D(t), X1D(t+1), ..., X1D(N)).

10.3.1 2-D screenshot image construction

During the operation of rotating machinery, the vibration signal collected for monitoring is a 1-D sequence signal. Existing CNN-based fault diagnosis methods must either reshape the 1-D sequence data into 2-D matrix data row by row or transform it into a time-frequency image via the fast Fourier transform. Both options rely on a sequence collected over a long time window, so they can only be used for offline fault classification rather than online fault diagnosis. Moreover, although many kinds of monitoring sensors collect 1-D sequence data, only a few 1-D signals of critical monitored variables can be shown on the display of the monitoring center. Since the 2-D screenshot image is captured in real time, feeding it into CNN may achieve real-time fault diagnosis. The 2-D screenshot image also carries real-time dynamic trend information of the 1-D signal that a single 1-D sample does not. Fig. 10.3 shows the 1-D sequence stored in the database of the monitoring center; each sample can be expressed as X1D(t) ∈ R^(1 × 1). To apply CNN to the 1-D sequence signal, the common approach is to reshape the 1-D sequence into 2-D matrix data by stacking it row by row, as shown in Eq. (10.1):

\[
X_{2D,Matrix}(t) =
\begin{bmatrix}
X_{1D}(t+1) & X_{1D}(t+2) & \cdots & X_{1D}(t+l) \\
X_{1D}(t+l+1) & X_{1D}(t+l+2) & \cdots & X_{1D}(t+2l) \\
\vdots & \vdots & \ddots & \vdots \\
X_{1D}(t+(l-1)l+1) & X_{1D}(t+(l-1)l+2) & \cdots & X_{1D}(t+l \cdot l)
\end{bmatrix}
\in \mathbb{R}^{l \times l}, \tag{10.1}
\]

FIGURE 10.4 2-D matrix data by different reshaping means, panels (a), (b), and (c).

FIGURE 10.5 Samples fed into 1-D CNN: X1DCNN(t − 1), X1DCNN(t), X1DCNN(t + 1).

where t is the start time of the sample and l is the number of sample points collected in each row. Such a sample is not valid for real-time fault diagnosis at time t: Eq. (10.1) shows that each sample fed into CNN is stacked from l × l consecutive points of the 1-D sequence signal, a rather large window, so it is not a real-time observation of the vibration sensor. The 2-D matrices obtained by different reshaping means carry different information, as shown in Fig. 10.4. Fig. 10.4(a) through (c) correspond to reshaping in row, reshaping in column, and reshaping at random, respectively. The figure indicates that, for a given window size, the 2-D matrix data differ with the reshaping means, so reshaping is not a good choice for extracting local features of 1-D sequence data with CNN. In addition, the dynamic sequential feature may be completely lost. As shown in Eq. (10.2) and Fig. 10.5, the sample fed into 1-D CNN is not a real-time observation at time t either:

\[
X_{1DCNN}(t) = [X_{1D}(tl+1), X_{1D}(tl+2), \cdots, X_{1D}((t+1)l)] \in \mathbb{R}^{1 \times l}. \tag{10.2}
\]

It can be seen from Eqs. (10.1) and (10.2) that 1-D CNN does better than traditional CNN, since the sample fed into 1-D CNN ranges over a relatively short period of time. However, the existing method for the fusion of CNN and LSTM trains the two networks separately. Since 1-D CNN and LSTM are connected in series, inaccurate feature extraction by 1-D CNN deteriorates the subsequent feature extraction by LSTM. Therefore, it is not a preferred choice. Fig. 10.6 shows the 2-D screenshot image stored in the database of the monitoring center. The figure indicates that each sample X2D(t) ∈ R^(l × l) is directly related to the real-time observation at time t. Feeding the 2-D screenshot image X2D(t) into CNN can therefore achieve real-time feature extraction. Furthermore, more accurate features can be extracted, since the dynamic trend information at time t is also shown in the screenshot X2D(t).

FIGURE 10.6 2-D screenshot image in the monitoring center. Each screenshot X2D(t) is paired with the 1-D sample X1D(t).
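The sample constructions in Eqs. (10.1) and (10.2) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the 1-D sequence is assumed to be held in a Python list `x` with `x[k - 1]` storing X1D(k), so that the 1-based indices of the text map onto 0-based list positions.

```python
def reshape_row_by_row(x, t, l):
    """Stack X1D(t+1), ..., X1D(t+l*l) into an l-by-l matrix, row by row (Eq. 10.1)."""
    if t + l * l > len(x):
        raise ValueError("the window needs l*l points after time t")
    # Row r, column c (both 0-based) holds X1D(t + r*l + c + 1), i.e. x[t + r*l + c].
    return [[x[t + r * l + c] for c in range(l)] for r in range(l)]


def window_1dcnn(x, t, l):
    """Return the 1-D CNN sample [X1D(t*l+1), ..., X1D((t+1)*l)] (Eq. 10.2)."""
    if (t + 1) * l > len(x):
        raise ValueError("the window needs l points ending at index (t+1)*l")
    return x[t * l:(t + 1) * l]


# Toy sequence where X1D(k) = k, so the indices can be read off directly.
x = list(range(1, 21))
matrix = reshape_row_by_row(x, t=2, l=3)
window = window_1dcnn(x, t=2, l=3)
```

In this toy case the matrix sample starting at t = 2 already needs the point X1D(11), which makes concrete why a sample stacked over l × l points cannot be regarded as an observation at time t.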

10.3.2 The feature fusion algorithm based on CNN and PeLSTM

CNN and LSTM extract different kinds of features from data described in different forms. Both can be applied to fault diagnosis, but each encounters difficulties in application, and overcoming them is a challenging problem. CNN pays more attention to local neighborhood feature extraction by using the pooling operation, whereas the autocorrelated dynamic feature is not considered. Owing to its inherent structure, LSTM does well in autocorrelated feature extraction but pays little attention to local feature representation. Fusing the features extracted by CNN and LSTM yields more accurate features, which is the basic requirement of accurate fault diagnosis. Fig. 10.7(a) shows a schematic diagram of existing fault diagnosis based on 1-D CNN and LSTM. In this scheme, LSTM feature extraction depends strongly on the output of 1-D CNN, so inaccurate features extracted by 1-D CNN deteriorate the feature extraction of the following LSTM. Training 1-D CNN and LSTM separately cannot secure a comprehensive feature of the different modes of the signal, and not making full use of the 2-D screenshot image may also result in an inaccurate feature. To overcome these shortcomings, an improved PLSTM, called PeLSTM, is designed, and a deep fusion mechanism for the features extracted by PeLSTM and CNN is developed for more comprehensive feature extraction from the multimodal data stored in the monitoring center. Fig. 10.7(b) shows a schematic diagram of existing fault diagnosis based on CNN and LSTM. It cannot produce a real-time fault diagnosis result, since the sample fed into CNN is a 2-D matrix reshaped from 1-D sequence data. Using LSTM to extract features from 1-D sequence data cannot avoid the memory bottleneck problem. Moreover, training CNN and LSTM separately cannot guarantee comprehensive features from multimodal data. To solve the preceding problems, a global optimization mechanism is developed to train all four related networks simultaneously. Fig. 10.7(c) shows the schematic of the deep fusion method developed in this work. The detailed algorithm is as follows.

FIGURE 10.7 Block diagram comparison of different fusion mechanisms. (a) Existing fault diagnosis method combining 1DCNN and LSTM (original 1-D sequence, 1-D CNN model, LSTM model, Softmax classifier, fault diagnosis). (b) Existing fault diagnosis method for combining CNN and LSTM. (c) Fault diagnosis using deep feature fusion of PeLSTM and CNN.

Step 1: Designing a parallel PeLSTM. Since the 1-D sequence signal is one of the most commonly used kinds of data for mechanical system monitoring, extracting a satisfactory dynamic sequential feature from 1-D sequence data is critical for accurate fault diagnosis. However, there is a memory bottleneck problem when LSTM is used to process long sequences: the useful information at the front of the sequence cannot be transferred to the back end of the sequence. PLSTM can partially solve this problem by processing the whole sequence at the same time. However, the state of the memory unit of traditional PLSTM may be affected by useless information. To overcome this problem, PeLSTM is designed, in which peepholes prevent the memory unit from transferring useless information. Taking the LSTM cell in Fig. 10.8 as an example, a peephole, denoted with a red line, is designed to prevent useless information from being transferred. During the update of C(t), the information flow through the forget gate f(t) without the peephole is given by Eq. (10.3):

\[
f(t) = \sigma(u_f H(t-1) + w_f X_{1D}(t) + b_f), \tag{10.3}
\]

where the forget gate is defined by the sigmoid function σ, uf and wf are the weight parameters, bf is the bias, and H(t − 1) is the output at the previous sampling time. It can be seen that C(t − 1) does not pass through the forget gate, so some useless information in C(t − 1) occupies memory and may be transferred to C(t). To overcome this shortcoming, a peephole connection is designed to filter C(t − 1). After the peephole is added, the information flow through the forget gate fpe(t) can be described as follows:

\[
H_{pe}(t-1) = w_{pe} C(t-1), \tag{10.4}
\]
\[
f_{pe}(t) = \sigma(H_{pe}(t-1) + w_f X_{1D}(t) + u_f H(t-1) + b_f). \tag{10.5}
\]

FIGURE 10.8 PeLSTM cell structure.

It can be seen from Fig. 10.8 that the updating process from C(t − 1) to C(t) can be described as follows:

\[
C(t) = i(t) * \tilde{C}(t) + C(t-1) * f_{pe}(t). \tag{10.6}
\]
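The effect of the peephole in Eqs. (10.3) through (10.6) can be seen in a small scalar sketch. The function names and the scalar weights are illustrative assumptions; in the chapter the quantities are network parameters learned during training.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(x, h_prev, w_f, u_f, b_f):
    """Forget gate without peephole, Eq. (10.3)."""
    return sigmoid(u_f * h_prev + w_f * x + b_f)


def peephole_forget_gate(x, h_prev, c_prev, w_f, u_f, b_f, w_pe):
    """Forget gate with peephole, Eqs. (10.4) and (10.5)."""
    h_pe = w_pe * c_prev                       # Eq. (10.4): peephole view of C(t-1)
    return sigmoid(h_pe + w_f * x + u_f * h_prev + b_f)


def update_cell(c_prev, i_gate, c_tilde, f_pe):
    """Memory unit update, Eq. (10.6)."""
    return i_gate * c_tilde + c_prev * f_pe


plain = forget_gate(0.0, 0.0, w_f=0.0, u_f=0.0, b_f=0.0)
peep = peephole_forget_gate(0.0, 0.0, c_prev=2.0, w_f=0.0, u_f=0.0, b_f=0.0, w_pe=-1.0)
```

In this toy setting the plain gate cannot react to C(t − 1) at all, whereas the peephole gate closes further when the peephole weight marks the stored state as unwanted. Setting `w_pe` to zero recovers Eq. (10.3).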

Therefore, during the update of the memory unit, once any useless information leaks from C(t − 1), it is introduced into the forget gate by the peephole and then cut off by the forget gate. For 1-D sequence data sampled in a given time window, X1D = (X1D(1), X1D(2), …, X1D(t), …, X1D(m)), the m samples can be fed into m PeLSTM cells simultaneously, with each cell exchanging information with the adjacent front and back samples. This step is repeated until the information transfer within the sequence is completed. Fig. 10.9 shows a schematic of the designed PLSTM with peephole. The figure shows the p-th round of the sequence information exchange. The red line shows the peephole connection, which is placed in front of each forget gate. The p-th information exchange at time t is taken as an example to illustrate the internal information exchange.

FIGURE 10.9 Schematic diagram of the designed PeLSTM.

Each cell of PeLSTM has four forget gates. The first forget gate is expressed by Eqs. (10.7) and (10.8):

\[
H_{1,pe}(t) = w_{1,pe} C_{p-1}(t+1), \tag{10.7}
\]
\[
f_{1,p}(t) = \sigma(H_{1,pe}(t) + w_{f_1,p}\,\xi_p(t) + u_{f_1,p} X(t) + v_{f_1,p}\, g_{p-1} + b_{f_1,p}), \tag{10.8}
\]

where H1,pe(t) is the peephole connection of the first forget gate f1,p(t), which filters the information in Cp−1(t + 1). It not only prevents useless information from being transferred but also makes fuller use of information in feature extraction by reusing it. ξp(t) = [Hp−1(t − 1), Hp−1(t), Hp−1(t + 1)] collects the information of the three adjacent samples in step p − 1, and gp−1 represents the information of the whole sequence in step p − 1. In Eq. (10.8), wf1,p, uf1,p, and vf1,p are weights and bf1,p is the bias. Eqs. (10.9) and (10.10) describe the information transfer of the second forget gate:

\[
H_{2,pe}(t) = w_{2,pe} C_{p-1}(t), \tag{10.9}
\]
\[
f_{2,p}(t) = \sigma(H_{2,pe}(t) + w_{f_2,p}\,\xi_p(t) + u_{f_2,p} X(t) + v_{f_2,p}\, g_{p-1} + b_{f_2,p}), \tag{10.10}
\]

where H2,pe(t) is the peephole connection of the second forget gate f2,p(t), which filters the information in Cp−1(t). It not only prevents the transfer of useless information but also makes fuller use of information during the recurrent process. Similarly, the third and fourth forget gates are designed to prevent useless information from being transferred from Cp−1(t − 1) and Cg,p−1. Their peepholes are given by Eqs. (10.11) through (10.14):

\[
H_{3,pe}(t) = w_{3,pe} C_{p-1}(t-1), \tag{10.11}
\]
\[
f_{3,p}(t) = \sigma(H_{3,pe}(t) + w_{f_3,p}\,\xi_p(t) + u_{f_3,p} X(t) + v_{f_3,p}\, g_{p-1} + b_{f_3,p}), \tag{10.12}
\]
\[
H_{4,pe}(t) = w_{4,pe} C_{g,p-1}, \tag{10.13}
\]
\[
f_{4,p}(t) = \sigma(H_{4,pe}(t) + w_{f_4,p}\,\xi_p(t) + u_{f_4,p} X(t) + v_{f_4,p}\, g_{p-1} + b_{f_4,p}). \tag{10.14}
\]

Each forget gate filters the useless information in its corresponding memory unit. When useless information leaks out of a memory unit, the peephole connection passes it to the corresponding forget gate, which cuts off its transfer, as shown in Eq. (10.15):

\[
Pe_p(t) = f_{1,p}(t) * C_{p-1}(t+1) + f_{2,p}(t) * C_{p-1}(t) + f_{3,p}(t) * C_{p-1}(t-1) + f_{4,p}(t) * C_{g,p-1}. \tag{10.15}
\]

The feature Pep(t), transferred in the network after the useless information has been filtered, is combined with the input gate ip(t) and the output gate op(t) to complete the transfer, as shown in Eq. (10.16):

\[
H_p(t) = o_p(t) * \tanh(Pe_p(t) + i_p(t) * \tilde{C}_p(t)). \tag{10.16}
\]

In PLSTM, gp represents the overall characteristics of the sequence in the p-th cycle, as shown in Fig. 10.9. Peephole connections are also designed along the propagation of gp, denoted by the solid red line in the figure. The peephole connection is placed before the forget gate as follows:

\[
H_{g,pe}(t) = w_{g,pe} C_{g,p-1}(t), \tag{10.17}
\]
\[
f_{g,p}(t) = \sigma(H_{g,pe}(t) + w_{fg}\, g_{p-1} + u_{fg} H_{p-1}(t) + b_{fg}), \tag{10.18}
\]

where Hg,pe(t) is the peephole connection of the forget gate fg,p(t), which filters the recurrent information in Cg,p−1(t). The peephole connection not only prevents useless information from being transferred but also makes the feature extraction more sufficient. Since the information transfer of gp(t) is similar to that of the LSTM cell, it has the same structure of input gate ig,p and output gate og,p. Therefore, the information transfer of gp(t) can be described as follows:

\[
g_p(t) = o_{g,p} * \tanh(f_{g,p} * C_{g,p-1}(t) + i_{g,p} * C_{g,p-1}(t)). \tag{10.19}
\]
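How the four forget gates combine in Eqs. (10.15) and (10.16) can be sketched with scalar values. The function name and argument layout are hypothetical, and the gate values are passed in directly rather than computed from Eqs. (10.7) through (10.14), so this is a sketch of the combination step only.

```python
import math


def pelstm_cell_output(f_gates, c_neighbors, c_global, i_gate, c_tilde, o_gate):
    """Combine four gated memory units and produce the cell output.

    f_gates:     (f1, f2, f3, f4) forget gate values in [0, 1]
    c_neighbors: (C_{p-1}(t+1), C_{p-1}(t), C_{p-1}(t-1))
    c_global:    C_{g,p-1}
    """
    f1, f2, f3, f4 = f_gates
    c_next, c_same, c_prev = c_neighbors
    # Eq. (10.15): each memory unit is scaled by its own forget gate.
    pe = f1 * c_next + f2 * c_same + f3 * c_prev + f4 * c_global
    # Eq. (10.16): add the gated candidate and apply the output gate.
    h = o_gate * math.tanh(pe + i_gate * c_tilde)
    return pe, h


pe, h = pelstm_cell_output(
    f_gates=(0.5, 0.5, 0.5, 0.5),
    c_neighbors=(1.0, 2.0, 3.0),
    c_global=4.0,
    i_gate=0.0,
    c_tilde=0.7,
    o_gate=1.0,
)
```

Driving one of the forget gates to zero removes the corresponding memory unit from Pep(t) entirely, which is the filtering behavior the text attributes to the peephole design.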

With the peephole connection, the information in the corresponding memory unit can be reused, which avoids wasting useful information in the recurrent process and makes fuller use of the information. At the same time, useless information is prevented from occupying weights in the information flow of the memory unit, which effectively overcomes the memory bottleneck problem.

Step 2: Build a PeLSTM model to extract the autocorrelated dynamic feature of the 1-D sequence data. First, a PeLSTM neural network NetPeLSTM with peephole connections is built as follows:

\[
Net_{PeLSTM} = Gen_{PeLSTM}(\theta_1, \theta_2, \cdots, \theta_m;\ n_1, n_2, \cdots, n_m;\ Cycles), \tag{10.20}
\]

where GenPeLSTM is a function for generating a neural network and m is the number of cells. n1, n2, …, nm are the numbers of hidden neurons in the gate structure of each cell. θ1 = {w1,gate, b1,gate}, θ2 = {w2,gate, b2,gate}, …, θm = {wm,gate, bm,gate} are the weights and biases of the gate structures in each cell. Cycles is the number of cycles. The output of the last cycle step, gcycle, is taken as the feature extracted by PeLSTM:

\[
F_{PeLSTM} = G_{PeLSTM}(Net_{PeLSTM}, \theta_{PeLSTM}, X_{1D}), \tag{10.21}
\]

where GPeLSTM is the nonlinear function describing the relation between the input and output of the PeLSTM network, and θPeLSTM = {θ1, θ2, …, θm}.

Step 3: Use CNN to extract the local feature of the 2-D screenshot image. NetCNN is built as shown in Eq. (10.22):

\[
Net_{CNN} = Gen_{CNN}(K, b_{CNN};\ K_c, K_{size}, P_{size};\ K_{step}, P_{step}), \tag{10.22}
\]

where Kc, Ksize, and Kstep are the parameters of the convolution kernel: the number of channels, the size, and the step size of the convolution operation. Psize and Pstep are the parameters of the pooling kernel: the size and the step size of the pooling operation. The constructed CNN is trained with samples of the 2-D screenshot image. Back propagation is used to adjust the parameters of NetCNN to obtain the convolution kernel K and the bias bCNN. Once NetCNN is well trained, the feature FCNN of the 2-D screenshot image can be extracted by Eq. (10.23):

\[
F_{CNN} = G_{CNN}(Net_{CNN}, K, b_{CNN}, X_{2D}), \tag{10.23}
\]

FIGURE 10.10 Schematic diagram of fault diagnosis based on deep feature fusion of PeLSTM and CNN. The 2-D screenshot image is fed to CNN to give FCNN, the 1-D sequence data are fed to the PeLSTM cells over the first and second cycles to give FPeLSTM, and the spliced features pass through the hidden layer of the feature fusion network.

where GCNN is the nonlinear function describing the relation between the input and output of CNN.

Step 4: Design the feature fusion network for CNN and PeLSTM. To fuse the features extracted by PeLSTM and CNN, a fusion network is designed via Eq. (10.24):

\[
Net_{fusion} = Feedforward(\theta_{fusion};\ H_{fusion}, L_{fusion}), \tag{10.24}
\]

where θfusion = {Wfusion, bfusion} are the parameters of the fusion network, Hfusion is the number of hidden neurons, and Lfusion is the number of layers of the fusion network. As shown in Fig. 10.10, the rough features FCNN and FPeLSTM, extracted by CNN and PeLSTM respectively, are fed into Netfusion.

Step 5: Use the global optimization mechanism to tune the parameters of the four networks. To achieve a deep fusion rather than a simple combination of FCNN and FPeLSTM, a global optimization mechanism is developed to further tune the parameters of Netfusion, NetPeLSTM, NetCNN, and the classifier for fault diagnosis, as shown in Fig. 10.11. The fused feature Ffusion is used as the input of the Softmax classifier as follows:

\[
Net_{softmax} = Feedforward(\theta_s;\ H_s), \tag{10.25}
\]

where θs = {Ws, bs} are the parameters of the classifier network for fault diagnosis; Ws and bs are the corresponding weight and bias, respectively; and Hs is the number of hidden layers of Softmax. The global error is generated by comparing the output label labeloutput with the real label labelreal, and the error is back-propagated to PeLSTM, CNN, and the fusion network at the same time. Thus all parameters related to the fault classification network, fusion network, PeLSTM, and CNN can be globally adjusted by minimizing the global loss function defined in Eq. (10.26):

\[
J(\theta) = J_{FD-error} + J_{fusion} + J_{PeLSTM} + J_{CNN}, \tag{10.26}
\]

where JFD−error, Jfusion, JPeLSTM, and JCNN are the loss functions of the corresponding four networks, defined in the form of cross entropy, as shown in Eq. (10.27):

\[
J_{FD-error} = -\frac{1}{K}\left[ label_{real} \ln label_{output} + (1 - label_{real}) \ln(1 - label_{output}) \right]. \tag{10.27}
\]

Jfusion, JPeLSTM, and JCNN are defined similarly to Eq. (10.27). It is worth noting that, during back propagation, Efusion used to define Jfusion can be divided into two parts as follows:

\[
E_{fusion} = E_{PeLSTM} + E_{CNN}. \tag{10.28}
\]

Once the four networks NetCNN, NetPeLSTM, Netfusion, and Netsoftmax are globally optimized, the well-trained network parameters can be described as

\[
Tr_{global} = Train(Net_{PeLSTM}, Net_{CNN}, Net_{fusion}, Net_{softmax};\ J(\theta);\ X_{1D}, X_{2D}), \tag{10.29}
\]

where Trglobal = {θPeLSTM; K, bCNN; θfusion; θSoftmax} are the well-trained parameters of the four networks. The fusion feature Ffusion is generated on the last layer of the fusion network as follows:

\[
F_{fusion} = G_{fusion}(Net_{fusion}, \theta_{fusion}, X_{1D}, X_{2D}). \tag{10.30}
\]

FIGURE 10.11 Global optimization parameter architecture diagram. The 1-D sequence passes through NetPeLSTM to give FPeLSTM and the 2-D screenshot image passes through NetCNN to give FCNN; both enter Netfusion to give Ffusion, which is fed to Netsoftmax. The losses JPeLSTM, JCNN, and JFD−error are each computed against the label and contribute to J(θ).
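The global loss in Eqs. (10.26) and (10.27) can be sketched numerically as follows. The sketch assumes that K in Eq. (10.27) counts the terms being averaged, which the text does not state explicitly; the function names and the clipping constant `eps` are illustrative additions.

```python
import math


def cross_entropy(label_real, label_output, eps=1e-12):
    """Averaged binary cross entropy in the form of Eq. (10.27).

    Assumption: the factor 1/K averages over the K entries passed in.
    """
    total = 0.0
    for y, p in zip(label_real, label_output):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(label_real)


def global_loss(j_fd_error, j_fusion, j_pelstm, j_cnn):
    """Sum of the four network losses, Eq. (10.26)."""
    return j_fd_error + j_fusion + j_pelstm + j_cnn


j = cross_entropy([1.0, 0.0], [0.5, 0.5])
total = global_loss(j, j, j, j)
```

Because the four terms are simply added, a gradient step on J(θ) moves the parameters of all four networks at once, which is the sense in which the text calls the optimization global.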


Step 6: Online fault diagnosis based on the feature fusion network. Once the online samples Xonline,1D(t) and Xonline,2D(t) at time t are collected, the well-trained PeLSTM network NetPeLSTM is used to extract the features of the online 1-D data as follows:

\[
F_{PeLSTM}(t) = G_{PeLSTM}(Net_{PeLSTM}, \theta_{PeLSTM};\ Cycles, X_{online,1D}(t)). \tag{10.31}
\]

The well-trained CNN network NetCNN is used to extract the features of the online 2-D screenshot image as follows:

\[
F_{CNN}(t) = G_{CNN}(Net_{CNN};\ K, b_{CNN};\ X_{online,2D}(t)). \tag{10.32}
\]

The trained fusion network Netfusion is then used to fuse the features of the online 1-D sequence data and 2-D screenshot image as follows:

\[
F_{fusion}(t) = G_{fusion}(Net_{fusion}, \theta_{fusion}, F_{PeLSTM}(t), F_{CNN}(t)). \tag{10.33}
\]

Finally, the fused feature Ffusion(t) obtained by deep feature fusion is used as the input of the fault classifier, and the output of the classifier is obtained via Eqs. (10.34) and (10.35):

\[
h_{\theta,fusion}(t) =
\begin{bmatrix}
p(label(t)=1 \mid F_{fusion}(t); \theta_S) \\
p(label(t)=2 \mid F_{fusion}(t); \theta_S) \\
\vdots \\
p(label(t)=L \mid F_{fusion}(t); \theta_S)
\end{bmatrix}
= \frac{1}{\sum_{l=1}^{L} e^{\theta_{S_l}^{T} F_{fusion}(t)}}
\begin{bmatrix}
e^{\theta_{S_1}^{T} F_{fusion}(t)} \\
e^{\theta_{S_2}^{T} F_{fusion}(t)} \\
\vdots \\
e^{\theta_{S_L}^{T} F_{fusion}(t)}
\end{bmatrix}, \tag{10.34}
\]
\[
label_X(t) = \arg\max_{l=1,2,\cdots,L} \{ h_{\theta_s,fusion}(t) \mid X_{1D}(t), X_{2D}(t); \theta_s \}, \tag{10.35}
\]

where θs is the parameter of the Softmax classifier and labelX(t) is the online diagnostic result at time t. The flowchart of fault diagnosis using deep fusion of the features extracted by PeLSTM and CNN is shown in Fig. 10.12.

FIGURE 10.12 Flowchart of fault diagnosis using the deep fusion of features extracted by PeLSTM and CNN.

Remark 1. The proposed algorithm can improve the accuracy of fault diagnosis results in the following aspects: (1) An improved PLSTM, called PeLSTM, is designed for more accurate feature extraction by placing a peephole connection before each forget gate to prevent useless information from being transferred in the cell. (2) A deep fusion mechanism using a new loss function and a global training scheme is designed to incorporate the features of the 1-D sequence data and the 2-D screenshot image, so that a more comprehensive feature can be extracted. (3) Since the global training mechanism is used for PeLSTM, CNN, the fusion network, and the classifier, the accuracy of fault classification can be improved once the features of the data are accurately extracted.

FIGURE 10.13 Experimental platform for the rolling bearing to obtain a vibration signal. From the Bearing Data Center of Case Western Reserve University [3].
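The classifier output of Eqs. (10.34) and (10.35) in Step 6 can be sketched as a Softmax over the fused feature followed by an arg max. The names `softmax_scores` and `diagnose`, the weight rows in `theta`, and the example feature are hypothetical; subtracting the largest logit is a standard numerical safeguard that leaves the probabilities unchanged.

```python
import math


def softmax_scores(theta, feature):
    """Class probabilities from one weight row per class, as in Eq. (10.34)."""
    logits = [sum(w * f for w, f in zip(row, feature)) for row in theta]
    top = max(logits)                       # shift for numerical stability
    exps = [math.exp(z - top) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def diagnose(theta, feature):
    """Return the 1-based class label with the largest probability, Eq. (10.35)."""
    probs = softmax_scores(theta, feature)
    return 1 + max(range(len(probs)), key=probs.__getitem__)


theta = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # three illustrative classes
fused = [0.2, 1.5]                              # stand-in for F_fusion(t)
probs = softmax_scores(theta, fused)
label = diagnose(theta, fused)
```

Only the ordering of the class scores matters for the diagnostic label, so the arg max of the probabilities coincides with the arg max of the raw scores.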

10.4 Experimental testing

In this section, bearing data and gearbox data are used for experimental research in Sections 10.4.1 and 10.4.2, respectively.

10.4.1 Rolling bearing test and analysis

The rolling bearing data used in this section were downloaded from the Bearing Data Center of Case Western Reserve University [3]. The experimental platform is shown in Fig. 10.13. The fault sizes are 0.007, 0.014, 0.021, and 0.028 in. The motor load varies from 0 to 3 hp. The sampling frequency of the raw vibration signal is 12 kHz.

10.4.1.1 Data preprocessing and experimental design

The moving window technique is used to generate the samples required by the related algorithms. To test the algorithm in different experimental scenarios, three sliding window sizes (100, 400, and 900) are selected, and the sliding step is set to 20. The original 1-D sequence data are stacked into 2-D matrix data of size 10 × 10, 20 × 20, or 30 × 30, and the size of the 2-D screenshot is 28 × 28. Table 10.1 lists the model and the data used in the related algorithms.
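The moving window preprocessing described above can be sketched as follows, using the window size of 100 and the sliding step of 20 given in the text. The function names and the synthetic signal are illustrative assumptions; the actual vibration records are not reproduced here.

```python
def sliding_windows(signal, window, step):
    """Cut a 1-D signal into overlapping windows of fixed length."""
    last_start = len(signal) - window
    return [signal[s:s + window] for s in range(0, last_start + 1, step)]


def to_square(sample, side):
    """Stack one window row by row into a side-by-side matrix, e.g. 100 -> 10 x 10."""
    if side * side != len(sample):
        raise ValueError("window length must equal side * side")
    return [sample[r * side:(r + 1) * side] for r in range(side)]


signal = list(range(1000))          # stand-in for a raw vibration record
windows = sliding_windows(signal, window=100, step=20)
matrix = to_square(windows[0], side=10)
```

A record of 1000 points yields 46 windows here, since consecutive windows overlap by 80 points; the same helper covers the 400 and 900 point windows with sides of 20 and 30.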

TABLE 10.1 Model and data used in the related algorithms.

Experimental serial no. | Fault type | Fault size (in.) | Training sample | Test sample
1 | Normal Bearing/Ball Fault/Inner Ring Fault/Outer Ring Fault | 0/0.007/0.007/0.007 | 8000 | 4000
2 | Normal Bearing/Ball Fault/Inner Ring Fault/Outer Ring Fault | 0/0.014/0.014/0.014 | 8000 | 4000
3 | Normal Bearing/Ball Fault/Inner Ring Fault/Outer Ring Fault | 0/0.021/0.021/0.021 | 8000 | 4000
4 | Normal Bearing and different fault sizes of Ball Fault | 0/0.007/0.014/0.021 | 8000 | 4000
5 | Normal Bearing and different fault sizes of Inner Ring Fault | 0/0.007/0.014/0.021 | 8000 | 4000
6 | Normal Bearing and different fault sizes of Outer Ring Fault | 0/0.007/0.014/0.021 | 8000 | 4000
7 | Multiple Faults of various fault sizes | 0/0.007/0.014/0.021 | 20000 | 10000

The experimental design is shown in Table 10.2. Experiments 1 through 6 cover 10 failure categories. Experiments 1 through 3 use different sequence lengths, and each class has 2000 training samples. A total of 8000 training samples are available for Experiments 4 through 6. Experiments 7 through 9 are set for different fault sizes, and Experiments 10 through 12 are set for different types of faults. Experiments 7 through 12 are thus designed for the case of different fault types and for the case of different fault sizes with the same fault type, respectively. The network parameters of the 12 experiments are listed in Table 10.3. The diagnostic accuracy of the related algorithms is compared later in Table 10.6.

10.4.1.2 Analysis of experimental results

Experiments 1 through 3 are the scenarios with different window sizes, with 2000 training samples per class. In engineering practice, however, training samples are not easy to obtain, so Experiments 4 through 6 use a reduced training set of 800 samples per class (8000 in total). Experiments 7 through 9 are the scenarios for different fault sizes, and Experiments 10 through 12 are for different types of faults. Through the experimental results, the advantages of the proposed PeLSTM and the fusion algorithm can be clearly seen. The accuracy of the proposed fault diagnosis algorithm using 1-D sequence data is compared with other existing methods. The fault diagnosis results are shown in Table 10.4. Comparing column 6 and column 2 in Table 10.4, it can be concluded from row 3 that when the window size of the 1-D data sequence is small, the diagnostic accuracy of PeLSTM is 93.63%, whereas the diagnostic accuracy of the traditional DNN with the same training data is only 82.28%. The diagnostic accuracy is improved by 11.35 percentage points, indicating that the autocorrelation in 1-D sequence data has a great influence on the diagnosis results and that PeLSTM can make full use of this information. For all 12 experiments, the accuracy of PeLSTM is always more than 5% higher than that of DNN.

TABLE 10.2 Experimental design.

Grouping | No. of layers | No. of neurons | Iteration times | Learning rate
1 | 5 | 100/200/100 | 2000 | 0.01
2 | 5 | 100/200/100 | 2000 | 0.01
3 | 5 | 100/200/100 | 2000 | 0.01
4 | 5 | 100/300/100 | 3000 | 0.005
5 | 5 | 100/300/100 | 3000 | 0.005
6 | 5 | 100/300/100 | 3000 | 0.005
7 | 5 | 200/500/200 | 5000 | 0.01

TABLE 10.3 Network parameters.

Experiment no. | Diagnostic accuracy using only 1-D data | Diagnostic accuracy using only 2-D data | Diagnostic accuracy using feature fusion by simply splicing | Diagnostic accuracy using extracted common features | Diagnostic accuracy extracted by fusion of common features and two other features
1 | 80.02% | 82.45% | 90.02% | 97.14% | 98.7%
2 | 81.32% | 83.35% | 91.87% | 97.50% | 98.87%
3 | 83.89% | 86.44% | 94.45% | 98.92% | 99.17%
4 | 73.22% | 75.07% | 87.45% | 93.27% | 95.95%
5 | 74.50% | 76.59% | 89.39% | 94.60% | 97.75%
6 | 75.52% | 75.57% | 87.62% | 94.12% | 97.27%
7 | 70.27% | 73.38% | 84.28% | 90.13% | 94.38%

TABLE 10.4 Fault diagnosis results.

Algorithm no. | Abbreviation | Algorithm description
1 | DNN | DNN using only 1-D sequence data
2 | 1-D CNN | 1DCNN using only 1-D sequence data
3 | LSTM | LSTM using only 1-D sequence data
4 | PLSTM | Parallel LSTM using 1-D sequence data
5 | PeLSTM | PeLSTM using only 1-D sequence data
6 | CNN(2DM) | CNN using 2-D matrix data stacked in a row
7 | CNN(2DS) | CNN for fault diagnosis using a 2-D screenshot image
8 | LSTM-1DCNN | Sequential feature fusion method that uses the output of 1-D CNN as the input of LSTM
9 | LSTM-SFCNN(2DM) | Splicing fusion of LSTM using 1-D sequence data and CNN using 2-D matrix data
10 | LSTM-DF-1DCNN | Deep feature fusion of LSTM using 1-D sequence data and 1-D CNN
11 | PeLSTM-DF-1DCNN | Deep feature fusion of PeLSTM using 1-D sequence data and 1-D CNN
12 | LSTM-FF-CNN(2DM) | Deep fusion of LSTM and CNN using 1-D sequence data and 2-D matrix data stacked in a row
13 | PeLSTM-DFCNN(2DS) | Deep feature fusion of PeLSTM and CNN using a 2-D screenshot image

Column 3 in Table 10.4 shows the fault diagnosis accuracy using 1-D CNN. Comparing column 6 with column 3 indicates that when the number of training samples is reduced, 1DCNN’s ability to extract features is obviously affected, as only 87.63% accuracy can be achieved, whereas the accuracy of PeLSTM in the same experimental scenario can reach 94.93%. For the same experimental scenario, the accuracy of PeLSTM is 9.02% higher than that of DNN and LSTM. Comparing column 6 and column 4 in Table 10.4, it can be concluded that in the scenario of Experiment 8, when fault size is 0.014, although diagnosis accuracy using LSTM can reach 97.65%, there is still some useless information spread in the LSTM network, affecting the diagnosis effect. Using PeLSTM can further improve the diagnosis accuracy by 1.15% to 98.71%. The reason is that the traditional LSTM cannot prevent the propagation of useless information, which occupies network memory and affects accuracy of fault diagnosis. Using features extracted from heterogeneous data can effectively improve accuracy of the fault diagnosis result. Table 10.5 compares the fault diagnosis

TABLE 10.5 Comparison of fault diagnosis accuracy of the proposed deep fusion algorithm with other existing fusion methods.

| Experiment no. | Fault type | Window size | Fault size | Training sample size | Test sample size |
| --- | --- | --- | --- | --- | --- |
| 1 | Inner race, roller, and outer race faults of three sizes; normal condition | 100 | 0.007/0.014/0.021/0 | 20,000 | 20,000 |
| 2 | Inner race, roller, and outer race faults of three sizes; normal condition | 400 | 0.007/0.014/0.021/0 | 20,000 | 20,000 |
| 3 | Inner race, roller, and outer race faults of three sizes; normal condition | 900 | 0.007/0.014/0.021/0 | 20,000 | 20,000 |
| 4 | Inner race, roller, and outer race faults of three sizes; normal condition | 100 | 0.007/0.014/0.021/0 | 8000 | 4000 |
| 5 | Inner race, roller, and outer race faults of three sizes; normal condition | 400 | 0.007/0.014/0.021/0 | 8000 | 4000 |
| 6 | Inner race, roller, and outer race faults of three sizes; normal condition | 900 | 0.007/0.014/0.021/0 | 8000 | 4000 |
| 7 | Inner race, roller, and outer race faults; normal condition | 400 | 0.007/0.014/0.021/0 | 8000 | 8000 |
| 8 | Inner race, roller, and outer race faults; normal condition | 400 | 0.007/0.014/0.021/0 | 8000 | 8000 |
| 9 | Inner race, roller, and outer race faults; normal condition | 400 | 0.007/0.014/0.021/0 | 8000 | 8000 |
| 10 | Roller faults of three sizes; normal condition | 400 | 0.007/0.014/0.021/0 | 8000 | 8000 |
| 11 | Inner faults of three sizes; normal condition | 400 | 0.007/0.014/0.021/0 | 8000 | 8000 |
| 12 | Outer faults of three sizes; normal condition | 400 | 0.007/0.014/0.021/0 | 8000 | 8000 |


accuracy of the proposed deep fusion algorithm with other existing fusion methods. In Table 10.5, column 2 gives the fault diagnosis accuracy of the existing sequential fusion, in which the output of 1-D CNN is fed into LSTM. Column 4 corresponds to deep fusion of 1-D CNN and LSTM. Comparing column 2 and column 4 of Table 10.5, it can be seen from row 2 that under the scenario of Experiment 1, the diagnosis accuracy of deep fusion is 90.05%, whereas the diagnosis accuracy of the traditional fusion method is only 88.96%. This shows that for the same set of feature extraction networks, deep fusion methods are superior to traditional fusion methods. Table 10.6 summarizes the experimental results analyzed earlier. Column 7 of Table 10.6 shows the accuracy of CNN fault diagnosis using 2-D matrix data stacked in a row, and column 8 shows the accuracy of CNN fault diagnosis using a 2-D screenshot image. Comparing column 7 with column 8, it can be concluded that a more accurate real-time fault diagnosis is achieved when a 2-D screenshot image of the 1-D waveform, rather than 2-D matrix data stacked row by row from the 1-D sequence, is fed into CNN for feature extraction. Thus, the 2-D screenshot image is the better data source for CNN, since it carries dynamic trend information. Column 4 of Table 10.6 shows the accuracy of traditional LSTM fault diagnosis using a 1-D sequence. Comparing column 2 with column 4, it can be seen that the diagnosis accuracy of DNN is lower than that of LSTM in all experimental scenarios, in most experiments by more than 5 percentage points. The reason is that the forgetting mechanism is well suited to extracting the autocorrelation of 1-D sequence data, whereas DNN can only extract features from an overall perspective.
Comparing column 4 and column 8 of Table 10.6, the differences in these experimental results indicate that CNN fault diagnosis using a 2-D screenshot image of the 1-D sequence data is less accurate than LSTM using the 1-D sequence data directly. The reason is that the original signal used for diagnosis is a sequence signal, and compared with CNN, LSTM does better in dynamic trend feature extraction. Column 6 and column 4 of Table 10.6 indicate that PeLSTM achieves higher diagnostic accuracy than traditional LSTM. PeLSTM does better in sequence feature extraction because its forget gate can intelligently forget useless information and thereby avoids the memory bottleneck. Column 6 also indicates that even PeLSTM alone cannot achieve the fault diagnosis accuracy required by the engineer, since the accuracy for Experiments 1, 4, and 5 is below 90% in each case. Thus, it is necessary to fuse the features extracted by LSTM and CNN. Column 9 of Table 10.6 shows the fault diagnosis accuracy using existing fusion methods to combine LSTM and 1-D CNN. Comparing column 9 with column 4, it can be found that the fault diagnosis capability improves once the advantages of LSTM and 1-D CNN are combined. But comparing column 9 and column 6 shows that for some specific experiments, such as


TABLE 10.6 Model parameters table.

| Experiment no. | Model | Training parameters |
| --- | --- | --- |
| 1 | DNN | No. of layers: 6; No. of neurons of each layer: (100/400/900)/1200/800/400/100/(10/4); Learning rate: 0.001 |
| 2 | 1-D CNN | Convolutional layer: Ksize: 1∗3, Kc: 16/32, Kstep: 1; Pooling layer: Psize: 1∗2, Pstep: 2; Learning rate: 0.001 |
| 3 | LSTM | Sequence length: 100/400/900; Cell no.: 10/20/30; No. of hidden neurons in the cell: 138; Learning rate: 0.001 |
| 4 | PLSTM | Cycle: 7; No. of hidden neurons in the cell: 138; Learning rate: 0.001 |
| 5 | PeLSTM | Cycle: 7; No. of hidden neurons in the cell: 138; Learning rate: 0.001 |
| 6 | CNN(2DM) (CNN using 2-D matrix data stacked in a row) | Convolutional layer: Ksize: 3, Kc: 16/32, Kstep: 1; Pooling layer: Psize: 2∗2, Pstep: 2; Fully connected layer: No. of neurons: 138; Learning rate: 0.001 |
| 7 | CNN(2DS) (CNN for fault diagnosis using a 2-D screenshot image) | Convolutional layer: Ksize: 3∗3, Kc: 16/32, Kstep: 1; Pooling layer: Psize: 2∗2, Pstep: 2; Fully connected layer: No. of neurons: 138; Learning rate: 0.001 |
| 8 | LSTM-1-D CNN (series feature fusion method where the input of LSTM is the output of 1DCNN) | Convolutional layer: Ksize: 1∗3, Kc: 16/32, Kstep: 1; Pooling layer: Psize: 1∗2, Pstep: 2; LSTM: No. of hidden neurons in the cell: 138; Learning rate: 0.001 |
| 9 | LSTM-SF-CNN(2DM) (splicing fusion of LSTM using 1-D sequence data and CNN using 2-D matrix data) | Convolutional layer: Ksize: 3∗3, Kc: 16/32, Kstep: 1; Pooling layer: Psize: 2∗2, Pstep: 2; LSTM: No. of hidden neurons in the cell: 138 |


Experiment 3 and Experiment 11, because the features are simply combined rather than fused by the deep fusion mechanism, the fault diagnosis accuracy is even lower than that of the method using only PeLSTM. Column 10 of Table 10.6 shows the fault diagnosis accuracy for deep feature fusion of LSTM using 1-D sequence data and CNN using 2-D matrix data. Columns 8 and 10 show that when the deep feature fusion mechanism is adopted, the fault diagnosis accuracy is satisfactory for most experiments, with Experiment 3 as an exception. The reason is that the method in column 10 still uses the original 1-D sequence as its only data source, since the 2-D matrix is stacked row by row from the 1-D sequence data. This indicates that fault diagnosis using a data source with a single modality cannot achieve satisfactory accuracy even when PeLSTM and the deep fusion mechanism are used for advanced feature extraction. Column 14 of Table 10.6 shows that the fault diagnosis accuracy based on deep feature fusion of PeLSTM using 1-D sequence data and CNN using a 2-D screenshot image is superior to that of the other 12 methods in Table 10.6. The diagnosis accuracy for all 12 experiments is higher than 90%, which is satisfactory for the engineer. The 12 experiments in this section are designed to show the influence of the window size of the sequence, the number of training samples, and the fault size. Taking column 14 of Table 10.6 as an example, we analyze these specific influences on fault diagnosis accuracy. Rows 2 through 4 indicate that for a given number of training samples, a larger window size results in more accurate fault diagnosis, since a relatively long sequence carries more information. Comparing rows 5 through 7 with rows 2 through 4, it can be concluded that once the sequence length and the number of training samples are given, a fault with a large fault size is much easier to detect.
The experimental results shown in rows 11 through 13 indicate that, in addition to high diagnosis accuracy across different fault types, the algorithm designed in this chapter is better able to distinguish a single fault type at different fault sizes, which is quite helpful for the prognosis and maintenance of mechanical equipment. In other words, whether the training sample size is small or the sample sequence length is changed, the method in column 14 of Table 10.6 achieves high diagnostic accuracy, which leads to the following conclusions: (1) The 2-D screenshot image involves more useful fault features when CNN is used to extract local features. Thus, it is helpful for real-time and accurate fault diagnosis. (2) The designed PeLSTM is a good feature extraction tool for 1-D sequence data. (3) Deep feature fusion using a global optimization mechanism does well in the fusion of heterogeneous data, such as 1-D sequence data and a 2-D screenshot image. Thus, it is helpful for accurate fault diagnosis. To improve the readability of the experiments listed in Table 10.6, Fig. 10.14 shows the fault diagnosis classification chart taking Experiment 6 as an example.
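Conclusion (2) rests on the idea that gates which can inspect the cell state are able to discard useless information. As a point of reference only, the following is a minimal sketch of one time step of a standard scalar peephole LSTM cell. It is not the chapter's PeLSTM implementation; the function name `peephole_lstm_step`, the parameter dictionary keys, and all numeric values are hypothetical and chosen purely for illustration.

```python
import math


def sigmoid(z):
    """Logistic function used by the gates."""
    return 1.0 / (1.0 + math.exp(-z))


def peephole_lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar peephole LSTM cell (textbook formulation).

    p is a dict of hypothetical scalar weights; keys starting with 'p'
    (pf, pi, po) are the peephole weights that let gates see the cell state.
    """
    # Forget and input gates peek at the previous cell state c_prev.
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["pf"] * c_prev + p["bf"])
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["pi"] * c_prev + p["bi"])
    # Candidate cell content.
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])
    # New cell state: keep a fraction f of the old state, add gated candidate.
    c = f * c_prev + i * g
    # Output gate peeks at the updated cell state c.
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["po"] * c + p["bo"])
    h = o * math.tanh(c)
    return h, c, f


# Illustrative (assumed) parameters: all weights 0.5, all biases 0.
params = {k: 0.5 for k in
          ("wf", "uf", "pf", "wi", "ui", "pi", "wg", "ug", "wo", "uo", "po")}
params.update({"bf": 0.0, "bi": 0.0, "bg": 0.0, "bo": 0.0})

# A strongly negative forget bias drives f toward 0, so a large stale
# cell state (c_prev = 5.0) is almost entirely discarded.
forgetful = dict(params, bf=-20.0)
h_keep, c_keep, f_keep = peephole_lstm_step(1.0, 0.0, 5.0, params)
h_drop, c_drop, f_drop = peephole_lstm_step(1.0, 0.0, 5.0, forgetful)
```

With the assumed values, the second call yields a forget gate close to zero, so the new cell state is determined almost only by the current input, which is the behavior the text describes as forgetting useless information.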

FIGURE 10.14 Fault diagnosis classification chart for Experiment 6 with 10 types of faults, with 800 training samples for each type and a window size of 900. [Panels (a) through (m). Vertical axis: class label (Normal; outer race, inner race, and roller faults of sizes 0.021, 0.014, and 0.007). Horizontal axis: "Simple time", 0 to 4000.]

The classification results are represented by red stars in the figure. The blue circles represent the true fault categories of the samples. A red star that coincides with a blue circle indicates a correct classification. Parts (a) through (m) of Fig. 10.14 correspond to row 7 in Table 10.6. Fig. 10.14(a) is the result of traditional DNN fault diagnosis. Fig. 10.14(b) is the result of 1-D CNN fault diagnosis. Fig. 10.14(c) is the result of LSTM fault diagnosis. Fig. 10.14(d) is the result of PLSTM fault diagnosis, and Fig. 10.14(e) is the diagnosis result of the PeLSTM designed in this chapter. Comparing Fig. 10.14(e) with Fig. 10.14(a) through (d), it can be concluded that Fig. 10.14(e) shows more coincidences of stars and circles, indicating that PeLSTM misclassifies fewer samples and has a clear advantage. Fig. 10.14(f) uses CNN as a feature extraction tool and 2-D matrix data as training data, and Fig. 10.14(g) is the fault diagnosis result using a 2-D screenshot image as the input of CNN. It can be seen from Fig. 10.14(f) that the non-coinciding red stars are dense, which indicates that the misclassification rate is relatively high. Although there are more non-coinciding red stars in Fig. 10.14(g) than in Fig. 10.14(c) and (e), they are still significantly fewer than in Fig. 10.14(f). Comparing Fig. 10.14(f) with Fig. 10.14(g), the fault diagnosis accuracy using 2-D screenshots as CNN input is significantly higher than that using the 2-D matrix as CNN input, indicating that the 2-D screenshot image involves more useful information than the 2-D matrix data. Fig. 10.14(l) is a fault diagnosis result based on deep feature fusion of CNN(2DM) and LSTM. Comparing Fig. 10.14(l) with Fig. 10.14(c) and (f) indicates that after merging the features extracted by CNN and LSTM, the diagnosis result is improved.
However, it uses CNN as a diagnostic tool and 2-D matrix data stacked from 1-D sequence data as training samples, so this diagnosis algorithm does not run in real time. The existing fusion algorithm using 1-D CNN and LSTM is represented in Fig. 10.14(h).


This method uses the output of 1-D CNN as the input of LSTM, so the fused feature extracted by LSTM depends on the output accuracy of 1-D CNN. Comparison with Fig. 10.14(c) shows that Fig. 10.14(h) has a better diagnostic result because it merges the local feature extracted using 1-D CNN and the autocorrelated feature extracted using LSTM. Fig. 10.14(m) shows a fault diagnosis result based on deep feature fusion of PeLSTM using 1-D sequence data and CNN using a 2-D screenshot image. It can be seen from Fig. 10.14(m) that the misclassification rate is quite small. The reason is that the networks participating in deep feature fusion can fully extract features from heterogeneous data. The fault diagnosis result of the proposed method is the best among these 13 algorithms, since the 1-D dynamic trend feature and the 2-D local feature are well extracted and then fused via a fusion network trained by a mechanism of global optimization. Moreover, the 2-D screenshot image, rather than 2-D matrix data stacked from the 1-D sequence, is adopted to achieve the real-time diagnosis required by engineers. Fig. 10.15 shows the comparison bar chart.

Remark 2. This chapter uses common fault diagnosis algorithms such as DNN, CNN, and LSTM for comparison with existing results. The proposed algorithm is also compared with existing feature fusion algorithms such as LSTM-1DCNN and LSTM-DF-1DCNN. After multiple sets of experiments under various experimental settings, it can be concluded that (1) a 2-D screenshot image involves more useful fault features when CNN is used to extract local features; (2) the designed PeLSTM is a good feature extraction tool for 1-D sequence data; and (3) deep fusion using the global optimization mechanism does well in the fusion of heterogeneous data, such as 1-D sequence data and the 2-D screenshot image.
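The classification charts discussed above are read by counting how often a predicted label (red star) coincides with the true label (blue circle). A minimal sketch of that bookkeeping follows; the function name `accuracy_report` and the five example labels are hypothetical and are not data from the chapter's experiments.

```python
from collections import Counter


def accuracy_report(true_labels, predicted_labels):
    """Return (overall accuracy, per-class accuracy) from paired labels.

    A sample counts as correct when its predicted label coincides with
    its true label, which is what an overlapping star and circle show.
    """
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    overall = sum(correct.values()) / len(true_labels)
    per_class = {label: correct[label] / n for label, n in totals.items()}
    return overall, per_class


# Hypothetical toy labels, for illustration only.
true_y = ["Normal", "Normal", "0.007 Roller", "0.007 Roller", "0.021 Inner race"]
pred_y = ["Normal", "Normal", "0.007 Roller", "0.014 Roller", "0.021 Inner race"]
overall_acc, class_acc = accuracy_report(true_y, pred_y)
```

The per-class values make it visible which health state drives the misclassification rate, which a single overall accuracy figure hides.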

10.4.2 Gearbox test and analysis

The proposed algorithm can also be applied to diagnose gearbox faults. Fault and normal data were collected from the QPZZ-II rotating machinery vibration experimental platform [24]. QPZZ-II can simulate the following faults: pitting, broken tooth, wear, and the combined fault of pitting and wear. In this experiment, the data are sampled at a speed of 880 r/min and a current of 0.05 A. The vibration signal is sampled at the output shaft motor side. The health state of the gearbox can be divided into six categories: (1) Normal, (2) Pitting, (3) Broken tooth, (4) Wear, (5) Pitting and Wear, and (6) Broken teeth and Wear.

10.4.2.1 Data preprocessing

The original data are a 1-D vibration signal, which can be a very long sequence, so the moving window technique is used for data preprocessing. The preprocessing is the same as that used for the bearing data in Section 10.4.1. In other words, the window size of each sample is 100/400/900. The test samples

FIGURE 10.15 Comparison of different fault diagnosis methods for rolling bearing. [Bar chart of diagnosis accuracy, 60% to 100%, for Experiments 1 through 12. Methods: DNN, 1DCNN, LSTM, PLSTM, PeLSTM, CNN (2DM), CNN (2DS), LSTM-1DCNN, LSTM-SF-CNN (2DM), LSTM-DF-1DCNN, PeLSTM-1DCNN, LSTM-DF-CNN (2DM), PeLSTM-DF-CNN (2DS).]

TABLE 10.7 Comparison of bearing fault diagnosis accuracy using 1-D sequence data.

| Experiment no. | DNN | 1-D CNN | LSTM | PLSTM | PeLSTM |
| --- | --- | --- | --- | --- | --- |
| 1 | 79.68% | 82.36% | 85.51% | 86.03% | 87.18% |
| 2 | 82.28% | 86.53% | 91.16% | 92.23% | 93.63% |
| 3 | 86.41% | 89.32% | 94.38% | 95.14% | 96.58% |
| 4 | 68.39% | 70.81% | 73.35% | 74.02% | 75.92% |
| 5 | 76.56% | 79.98% | 85.68% | 86.17% | 87.65% |
| 6 | 85.91% | 87.63% | 92.80% | 93.21% | 94.93% |
| 7 | 90.39% | 92.41% | 96.56% | 97.32% | 98.45% |
| 8 | 91.21% | 93.24% | 96.65% | 97.13% | 98.71% |
| 9 | 91.96% | 93.56% | 97.80% | 98.04% | 99.04% |
| 10 | 83.73% | 85.85% | 88.45% | 89.37% | 90.48% |
| 11 | 86.74% | 90.73% | 95.13% | 96.46% | 97.80% |
| 12 | 91.68% | 93.06% | 96.84% | 97.17% | 98.06% |

are reshaped into 2-D matrix data of size 10∗10/20∗20/30∗30, and the screenshot image size is 28∗28. Comparing the corresponding 12 experiments illustrates the superiority of the proposed algorithm in different cases.
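The preprocessing described above (cut a long 1-D vibration signal into fixed-size windows, then stack each window row by row into a square matrix) can be sketched as follows. This is a minimal illustration under assumptions: the function names `sliding_windows` and `to_square_matrix` are hypothetical, and the step between consecutive windows is not stated in this section, so it is left as a free parameter.

```python
import math


def sliding_windows(signal, window_size, step):
    """Cut a 1-D sequence into windows of length window_size.

    step is the offset between window starts (an assumed parameter;
    step == window_size gives non-overlapping windows).
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    last_start = len(signal) - window_size
    return [signal[s:s + window_size] for s in range(0, last_start + 1, step)]


def to_square_matrix(window):
    """Stack a window row by row into a square 2-D matrix.

    Works for the window sizes used here: 100 -> 10x10, 400 -> 20x20,
    900 -> 30x30. Raises if the length is not a perfect square.
    """
    side = math.isqrt(len(window))
    if side * side != len(window):
        raise ValueError("window length must be a perfect square")
    return [window[r * side:(r + 1) * side] for r in range(side)]


# Stand-in signal: integers 0..999 instead of real vibration samples.
signal = list(range(1000))
windows = sliding_windows(signal, window_size=100, step=100)
overlapping = sliding_windows(signal, window_size=100, step=50)
matrix = to_square_matrix(windows[0])
```

Because the matrix is only a rearrangement of the same samples, it adds no information beyond the 1-D window, which is consistent with the chapter's observation that the stacked 2-D matrix remains a single-modality data source.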

10.4.2.2 Experiment result analysis

As in the experimental design of Section 10.4.1, 12 experiments are designed in this section to verify our method for cases of different fault types and of the same fault type with different fault sizes, respectively. The specific experimental design is shown later in Table 10.9, and the experimental results are shown later in Table 10.10. Comparing column 6 and column 2 in Table 10.7, it can be seen from row 4 that when the window size of the 1-D data sequence is short, the diagnostic accuracy of PeLSTM is 89.61%, whereas the diagnostic accuracy of the traditional DNN with the same training data is only 79.31%. The diagnostic accuracy is improved by 10.3 percentage points, indicating that the autocorrelation involved in 1-D sequence data has a great influence on the diagnosis results, and PeLSTM can make full use of this information. For all 12 experiments, the accuracy of PeLSTM is more than 5 percentage points higher than that of DNN. Column 3 in Table 10.7 shows the fault diagnosis accuracy using 1-D CNN. Comparing column 6 with column 3, it can be concluded that when the number of training samples is reduced, the ability of 1-D CNN to extract features is clearly affected, as only 81.70% accuracy can be achieved, whereas the accuracy of


TABLE 10.8 Comparison of fault diagnosis accuracy of the proposed deep fusion algorithm with other existing fusion methods.

| Experiment no. | LSTM-1DCNN | LSTM-SF-CNN (2DM) | LSTM-DF-1DCNN | PeLSTM-DF-1DCNN | LSTM-DF-CNN (2DM) | PeLSTM-DF-CNN (2DS) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 87.96% | 88.62% | 90.05% | 90.41% | 91.84% | 95.49% |
| 2 | 91.74% | 92.31% | 93.21% | 94.02% | 95.94% | 97.36% |
| 3 | 92.24% | 93.42% | 94.12% | 95.21% | 96.56% | 98.74% |
| 4 | 79.30% | 80.04% | 81.47% | 82.59% | 83.77% | 90.95% |
| 5 | 87.22% | 87.93% | 88.54% | 89.01% | 90.38% | 94.28% |
| 6 | 93.65% | 94.37% | 95.03% | 96.25% | 96.93% | 98.68% |
| 7 | 96.14% | 96.96% | 97.81% | 98.11% | 98.33% | 99.45% |
| 8 | 95.17% | 96.27% | 97.77% | 98.30% | 98.51% | 99.56% |
| 9 | 95.18% | 96.67% | 97.31% | 98.04% | 98.60% | 99.97% |
| 10 | 88.51% | 89.87% | 90.99% | 91.21% | 91.86% | 94.21% |
| 11 | 95.26% | 96.42% | 97.07% | 97.92% | 98.98% | 99.99% |
| 12 | 96.86% | 97.03% | 97.48% | 98.20% | 98.38% | 99.94% |

PeLSTM in the same experimental scenario can reach 85.45%. In that scenario, PeLSTM improves on DNN and LSTM by 11.75 and 2.09 percentage points, respectively. Comparing column 6 and column 4 in Table 10.7 shows that in the scenario of Experiment 8, LSTM can reach 87.83% accuracy, but some useless information is still transferred through the LSTM network, which degrades the diagnosis. Using PeLSTM to prevent useless information from being transferred can further achieve an improvement of 1.91%. The reason is that traditional LSTM cannot prevent the transfer of useless information, which occupies the network memory and reduces the accuracy of fault diagnosis. Table 10.8 compares the fault diagnosis accuracy of the proposed deep fusion algorithm with other existing fusion methods. In Table 10.8, column 2 gives the fault diagnosis accuracy of sequential fusion, in which 1-D CNN and LSTM are connected in sequence. Column 4 corresponds to deep fusion of 1-D CNN and LSTM. Comparing column 2 with column 4 of Table 10.8, it can be seen from row 2 that under the scenario of Experiment 1, the diagnosis accuracy of deep fusion is 89.31%, whereas the diagnosis accuracy of the traditional fusion method is only 87.94%, an increase of 1.37 percentage points. This shows that for the same set of feature extraction networks, deep fusion methods are superior to traditional fusion methods.
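The improvements quoted above are differences between accuracies, that is, percentage points. The short check below recomputes two of them from the gearbox values quoted in the text (Experiment 6: DNN 73.70%, LSTM 83.36%, PeLSTM 85.45%; Experiment 1: sequential fusion 87.94%, deep fusion 89.31%). The helper name `point_gain` is hypothetical; `Decimal` is used only so that the subtraction of two-decimal percentages is exact.

```python
from decimal import Decimal


def point_gain(new_accuracy, baseline_accuracy):
    """Difference of two accuracies in percentage points.

    Inputs are strings such as "85.45" so Decimal parses them exactly.
    """
    return Decimal(new_accuracy) - Decimal(baseline_accuracy)


# Experiment 6, gearbox, 1-D sequence methods (values quoted in the text).
gain_over_dnn = point_gain("85.45", "73.70")
gain_over_lstm = point_gain("85.45", "83.36")

# Experiment 1, gearbox, deep fusion versus sequential fusion.
gain_deep_fusion = point_gain("89.31", "87.94")
```

The same helper can be applied to any other pair of table entries to check a quoted improvement against the tabulated accuracies.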

| Experiment no. | DNN | 1DCNN | LSTM | PLSTM | PeLSTM | CNN (2DM) | CNN (2DS) | LSTM-1DCNN | LSTM-SF-CNN (2DM) | LSTM-DF-1DCNN | PeLSTM-DF-1DCNN | LSTM-DF-CNN (2DM) | PeLSTM-DF-CNN (2DS) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 79.68% | 82.36% | 85.51% | 86.03% | 87.18% | 69.63% | 77.74% | 87.96% | 88.62% | 90.05% | 90.41% | 91.84% | 95.49% |
| 2 | 82.28% | 86.53% | 91.16% | 92.23% | 93.63% | 84.83% | 88.63% | 91.74% | 92.31% | 93.21% | 94.02% | 95.94% | 97.36% |
| 3 | 86.41% | 89.32% | 94.38% | 95.14% | 96.58% | 92.25% | 95.16% | 92.24% | 93.42% | 94.12% | 95.21% | 96.56% | 98.74% |
| 4 | 68.39% | 70.81% | 73.35% | 74.02% | 75.92% | 65.75% | 69.90% | 79.30% | 80.04% | 81.47% | 82.59% | 83.77% | 90.95% |
| 5 | 76.56% | 79.98% | 85.68% | 86.17% | 87.65% | 82.83% | 86.43% | 87.22% | 87.93% | 88.54% | 89.01% | 90.38% | 94.28% |
| 6 | 85.91% | 87.63% | 92.80% | 93.21% | 94.93% | 84.75% | 92.55% | 93.65% | 94.37% | 95.03% | 96.25% | 96.93% | 98.68% |
| 7 | 90.39% | 92.41% | 96.56% | 97.32% | 98.45% | 96.88% | 98.05% | 96.14% | 96.96% | 97.81% | 98.11% | 98.33% | 99.45% |
| 8 | 91.21% | 93.24% | 96.65% | 97.13% | 98.71% | 97.25% | 97.46% | 95.17% | 96.27% | 97.77% | 98.30% | 98.51% | 99.56% |
| 9 | 91.96% | 93.56% | 97.80% | 98.04% | 99.04% | 98.71% | 98.91% | 95.18% | 96.67% | 97.31% | 98.04% | 98.60% | 99.97% |
| 10 | 83.73% | 85.85% | 88.45% | 89.37% | 90.48% | 83.70% | 90.96% | 88.51% | 89.87% | 90.99% | 91.21% | 91.86% | 94.21% |
| 11 | 86.74% | 90.73% | 95.13% | 96.46% | 97.80% | 96.65% | 97.40% | 95.26% | 96.42% | 97.07% | 97.92% | 98.98% | 99.99% |
| 12 | 91.68% | 93.06% | 96.84% | 97.17% | 98.06% | 97.06% | 98.11% | 96.86% | 97.03% | 97.48% | 98.20% | 98.38% | 99.94% |

TABLE 10.9 Experimental design.


TABLE 10.10 Experimental results.

| Experiment no. | DNN | 1-D CNN | LSTM | PLSTM | PeLSTM |
| --- | --- | --- | --- | --- | --- |
| 1 | 72.52% | 78.64% | 82.75% | 83.65% | 84.39% |
| 2 | 75.72% | 83.72% | 86.86% | 87.33% | 88.19% |
| 3 | 79.31% | 86.27% | 87.42% | 88.05% | 89.61% |
| 4 | 63.53% | 67.61% | 71.16% | 72.57% | 73.28% |
| 5 | 67.10% | 69.46% | 72.83% | 73.37% | 74.92% |
| 6 | 73.70% | 81.70% | 83.36% | 84.24% | 85.45% |
| 7 | 76.74% | 82.93% | 84.67% | 85.43% | 86.13% |
| 8 | 81.27% | 85.02% | 87.83% | 88.59% | 89.02% |
| 9 | 85.63% | 88.49% | 90.08% | 91.29% | 92.96% |
| 10 | 70.24% | 73.56% | 75.17% | 76.65% | 77.74% |
| 11 | 73.25% | 76.47% | 79.04% | 80.41% | 81.53% |
| 12 | 78.67% | 82.84% | 86.08% | 87.10% | 89.07% |

Compared with LSTM, the improved PeLSTM does better in feature extraction from 1-D sequence data. Thus, it can be confirmed that after deep fusion, the diagnosis accuracy shown in column 5 is better than that of column 4. Comparing column 7 with column 5 in Table 10.8, it can be concluded that incorporating a stacked 2-D image matrix in the deep fusion process achieves more accurate diagnosis, since CNN can extract additional local neighboring features involved in the 2-D image matrix. Comparing column 7 with column 6 of Table 10.8 shows that incorporating a 2-D screenshot image rather than a 2-D image matrix achieves real-time, accurate fault diagnosis. This shows that the trend information involved in a 2-D screenshot image can be effectively mined, and that the autocorrelation in 1-D sequence data is also effectively extracted. The deep fusion method is more effective in accurate feature extraction when the gearbox monitoring system is equipped only with sensors that provide 1-D signals. It can be seen from column 5 of Table 10.9 that the PeLSTM algorithm proposed in this chapter can extract sufficient features from 1-D sequences, and from column 11 that the proposed fusion algorithm is the best solution for fault diagnosis with heterogeneous data. Table 10.10 summarizes the experimental results analyzed earlier. The explanation of Table 10.10 is similar to that of Table 10.6. Comparing Table 10.10 with Table 10.6 shows that the fault diagnosis accuracy for the gearbox is lower than that for the rolling bearing. The reasons are that (1) fewer training samples are used to train the related networks, which inevitably decreases the diagnosis accuracy; (2) a wear fault is usually viewed as a kind of minor fault that is more difficult to detect; and (3) compared with single fault diagnosis, developing an accurate diagnosis method for combined faults is still a


TABLE 10.11 Comparison of fault diagnosis results based on feature fusion of CNN and LSTM.

| Experiment no. | LSTM-1DCNN | LSTM-SF-CNN (2DM) | LSTM-DF-1DCNN | PeLSTM-DF-1DCNN | LSTM-DF-CNN (2DM) | PeLSTM-DF-CNN (2DS) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 87.94% | 88.02% | 89.31% | 90.02% | 91.08% | 93.00% |
| 2 | 89.27% | 90.25% | 91.02% | 92.51% | 93.67% | 95.83% |
| 3 | 91.59% | 92.31% | 93.62% | 94.24% | 96.58% | 98.78% |
| 4 | 75.33% | 76.43% | 77.52% | 78.41% | 81.33% | 82.00% |
| 5 | 78.97% | 79.75% | 80.90% | 82.64% | 84.11% | 86.75% |
| 6 | 88.50% | 89.63% | 91.06% | 92.08% | 94.50% | 96.33% |
| 7 | 87.62% | 88.06% | 89.52% | 91.03% | 92.96% | 94.71% |
| 8 | 89.07% | 90.48% | 91.07% | 92.65% | 94.46% | 96.63% |
| 9 | 92.65% | 93.45% | 94.38% | 96.71% | 98.06% | 99.83% |
| 10 | 80.32% | 81.39% | 82.64% | 84.05% | 87.38% | 89.75% |
| 11 | 85.51% | 86.27% | 87.53% | 89.57% | 91.96% | 93.83% |
| 12 | 89.02% | 90.16% | 91.01% | 92.23% | 94.17% | 96.25% |

challenging problem. Table 10.10 indicates that the proposed method can also do well in minor fault diagnosis and combined fault diagnosis. Table 10.11 compares the fault diagnosis accuracy of the fusion algorithm proposed in this chapter with that of the existing fusion algorithms, and Table 10.12 gives the experimental design for the gearbox. To improve the readability of the experiments listed in Table 10.10, a diagnostic classification chart for Experiment 6 is shown in Fig. 10.16. The explanation of Fig. 10.16 is similar to that of Fig. 10.14. Fig. 10.16 indicates that the misclassification rate in Fig. 10.16(m) is the smallest. The reason is that the deep fusion network, trained by a mechanism of global optimization, does well in comprehensive feature extraction from 1-D sequence data and the 2-D screenshot image. Moreover, a 2-D screenshot image, rather than 2-D matrix data stacked from the 1-D sequence, is adopted to achieve the real-time diagnosis required by engineers. Take Fig. 10.16(m) as an example for analyzing the detection ability for different kinds of fault. It can be seen from Fig. 10.16(m) that for the wear fault, as well as the broken and wear fault, the diagnosis accuracy is relatively lower than that of the other health states, because wear is a minor fault that is difficult to detect. When this minor fault is combined with a broken tooth fault or a pitting fault, the difficulty of detecting the combined fault increases significantly. Comparing Fig. 10.16(m) with the other parts of Fig. 10.16 shows that even though complex faults are difficult to detect, the method proposed in this chapter has significant superiority. Fig. 10.17 shows the comparison bar chart. Experimental results for gearbox fault diagnosis show that the validity of the proposed method can be tested in many cases as long as the monitoring data can

FIGURE 10.16 Gearbox diagnosis classification chart for Experiment 6 with six health states, with 800 training samples for each state and a window size of 900. [Panels (a) through (m). Vertical axis: class label (Normal, Broken and Wear, Pitting and Wear, Wear, Broken, Pitting). Horizontal axis: "Simple time", 0 to 3600.]

FIGURE 10.17 Comparison of different fault diagnosis methods for the gearbox. [Bar chart of diagnosis accuracy, 60% to 100%, for Experiments 1 through 12. Methods: DNN, 1DCNN, LSTM, PLSTM, PeLSTM, CNN (2DM), CNN (2DS), LSTM-1DCNN, LSTM-SF-CNN (2DM), LSTM-DF-1DCNN, PeLSTM-1DCNN, LSTM-DF-CNN (2DM), PeLSTM-DF-CNN (2DS).]

be collected, even when the number of training samples is limited and the faults are complex to detect.

10.5 Conclusion and future work

Deep learning can be applied to fault diagnosis of rotating machinery. Its effectiveness depends on the training sample size and on the means used to extract the potential features involved in the available multimodal data. Since 1-D vibration data collected from an accelerometer are the most commonly available monitoring signals, it is important to develop an efficient tool to extract the features involved in 1-D sequence data.


TABLE 10.12 Experimental design of gearbox fault diagnosis.

Experiment no. | Window size | Fault type | No. of training samples | No. of test samples
1 | 100 | Normal; pitting; wear; pitting and wear; broken teeth; broken teeth and wear | 12,000 | 3600
2 | 400 | Normal; pitting; wear; pitting and wear; broken teeth; broken teeth and wear | 12,000 | 3600
3 | 900 | Normal; pitting; wear; pitting and wear; broken teeth; broken teeth and wear | 12,000 | 3600
4 | 100 | Normal; pitting; wear; pitting and wear; broken teeth; broken teeth and wear | 4800 | 3600
5 | 400 | Normal; pitting; wear; pitting and wear; broken teeth; broken teeth and wear | 4800 | 3600
6 | 900 | Normal; pitting; wear; pitting and wear; broken teeth; broken teeth and wear | 4800 | 3600
7 | 100 | Normal; pitting; wear; pitting and wear; broken teeth | 8000 | 2400
8 | 400 | Normal; pitting; wear; pitting and wear; broken teeth | 8000 | 2400
9 | 900 | Normal; pitting; wear; pitting and wear; broken teeth | 8000 | 2400
10 | 100 | Normal; pitting; wear; pitting and wear; broken teeth | 3200 | 2400
11 | 400 | Normal; pitting; wear; pitting and wear; broken teeth | 3200 | 2400
12 | 900 | Normal; pitting; wear; pitting and wear; broken teeth | 3200 | 2400
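As a rough illustration of how the window sizes in Table 10.12 turn a long vibration record into samples, the sketch below cuts a 1-D signal into windows and reshapes each window into a square 2-D matrix. It relies on the observation that 100, 400, and 900 are perfect squares (10 x 10, 20 x 20, 30 x 30). The function names, the non-overlapping stride, the row-major reshape, and the synthetic sine signal are all assumptions made for illustration; they are not taken from the chapter's preprocessing.

```python
import numpy as np


def segment_signal(signal, window_size):
    """Split a 1-D signal into non-overlapping windows (hypothetical helper).

    Trailing points that do not fill a whole window are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    n_windows = signal.size // window_size
    return signal[: n_windows * window_size].reshape(n_windows, window_size)


def windows_to_matrices(windows):
    """Reshape each window into a square matrix, row by row (assumed layout)."""
    n_windows, window_size = windows.shape
    side = int(round(np.sqrt(window_size)))
    if side * side != window_size:
        raise ValueError("window size must be a perfect square")
    return windows.reshape(n_windows, side, side)


# Synthetic stand-in for an accelerometer record (not real gearbox data).
signal = np.sin(np.linspace(0.0, 50.0, 4000))
for window_size in (100, 400, 900):
    windows = segment_signal(signal, window_size)
    matrices = windows_to_matrices(windows)
    print(window_size, windows.shape, matrices.shape)
```

A larger window gives each sample more context but yields fewer samples from the same record, which is one reason the window size is treated as an experimental factor.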

TABLE 10.13 Comparison of bearing fault diagnosis accuracy using 1D sequence data.

Experiment no. | DNN | 1DCNN | LSTM | PLSTM | PeLSTM | CNN (2DM) | CNN (2DS) | 1DCNN-LSTM | LSTM-SF-CNN (2DM) | LSTM-DF-1DCNN | PeLSTM-DF-1DCNN | LSTM-DF-CNN (2DM) | PeLSTM-DF-CNN (2DS)
1 | 72.52% | 78.64% | 82.75% | 83.65% | 84.39% | 79.53% | 80.72% | 87.94% | 88.02% | 89.31% | 90.02% | 91.08% | 93.00%
2 | 75.72% | 83.72% | 86.86% | 87.33% | 88.19% | 86.36% | 87.31% | 89.27% | 90.25% | 91.02% | 92.51% | 93.67% | 95.83%
3 | 79.31% | 86.27% | 87.42% | 88.05% | 89.61% | 86.58% | 88.81% | 91.59% | 92.31% | 93.62% | 94.24% | 96.58% | 98.78%
4 | 63.53% | 67.61% | 71.16% | 72.57% | 73.28% | 64.08% | 68.78% | 75.33% | 76.43% | 77.52% | 78.41% | 81.33% | 82.00%
5 | 67.10% | 69.46% | 72.83% | 73.37% | 74.92% | 71.86% | 73.53% | 78.97% | 79.75% | 80.90% | 82.64% | 84.11% | 86.75%
6 | 73.70% | 81.70% | 83.36% | 84.24% | 85.45% | 82.92% | 83.39% | 88.50% | 89.63% | 91.06% | 92.08% | 94.50% | 96.33%
7 | 76.74% | 82.93% | 84.67% | 85.43% | 86.13% | 83.75% | 84.88% | 87.62% | 88.06% | 89.52% | 91.03% | 92.96% | 94.71%
8 | 81.27% | 85.02% | 87.83% | 88.59% | 89.01% | 87.29% | 88.54% | 89.07% | 90.48% | 91.07% | 92.65% | 94.46% | 96.63%
9 | 85.63% | 88.49% | 90.08% | 91.29% | 92.96% | 89.21% | 90.87% | 92.65% | 93.45% | 94.38% | 96.71% | 98.06% | 99.83%
10 | 70.24% | 73.56% | 75.17% | 76.65% | 77.74% | 74.50% | 76.96% | 80.32% | 81.39% | 82.64% | 84.05% | 87.38% | 89.75%
11 | 73.25% | 76.47% | 79.04% | 80.41% | 81.53% | 79.37% | 81.04% | 85.51% | 86.27% | 87.53% | 89.57% | 91.96% | 93.83%
12 | 78.67% | 82.84% | 86.08% | 87.10% | 89.07% | 86.96% | 88.18% | 89.02% | 90.16% | 91.01% | 92.23% | 94.17% | 96.25%

Traditional LSTM may encounter memory bottlenecks that make it a less than ideal feature extraction tool. This chapter designs a new parallel LSTM with peepholes (PeLSTM) to overcome the memory bottleneck, as peepholes are designed to prevent useless information from being transferred. However, using PeLSTM as the only feature extraction tool inevitably leads to inaccurate diagnosis results because it does not perform well in local feature extraction. Existing ways of fusing LSTM and CNN cannot achieve a real-time and more accurate diagnosis. Thus, an additional feature fusion network with a global training mechanism is designed. So far, the efficiency of the proposed algorithm has been verified only by experimental analysis. Designing an interpretable deep fusion network may guide us toward a better fusion mechanism. In addition, error control is an important factor in engineering [25], and designing a deep fusion network with error constraints is also part of our future work.
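To make the peephole idea mentioned in the conclusion more concrete, the following is a minimal NumPy sketch of a single step of the widely used textbook peephole LSTM formulation, in which the gates also see the cell state through element-wise peephole weights. It is not the chapter's parallel PeLSTM: the parameter names (`Wi`, `Ui`, `pi`, and so on), the random initialization, and the synthetic input are assumptions made purely for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    """Small random parameters for a peephole LSTM cell (illustrative only)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("i", "f", "o", "c"):
        params["W" + gate] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        params["U" + gate] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        params["b" + gate] = np.zeros(n_hidden)
    for gate in ("i", "f", "o"):
        # Peephole weights are element-wise (diagonal) connections to the cell state.
        params["p" + gate] = rng.normal(0.0, 0.1, n_hidden)
    return params


def peephole_lstm_step(x, h_prev, c_prev, p):
    """One time step of a standard peephole LSTM cell."""
    # Input and forget gates peek at the previous cell state.
    i = sigmoid(p["Wi"] @ x + p["Ui"] @ h_prev + p["pi"] * c_prev + p["bi"])
    f = sigmoid(p["Wf"] @ x + p["Uf"] @ h_prev + p["pf"] * c_prev + p["bf"])
    g = np.tanh(p["Wc"] @ x + p["Uc"] @ h_prev + p["bc"])
    c = f * c_prev + i * g
    # The output gate peeks at the updated cell state.
    o = sigmoid(p["Wo"] @ x + p["Uo"] @ h_prev + p["po"] * c + p["bo"])
    h = o * np.tanh(c)
    return h, c


def run_sequence(xs, params, n_hidden):
    """Feed a sequence of input vectors through the cell, starting from zeros."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x in xs:
        h, c = peephole_lstm_step(x, h, c, params)
    return h, c


# Synthetic 1-D sequence standing in for a vibration window (not real data).
params = init_params(n_in=1, n_hidden=4)
xs = np.sin(np.linspace(0.0, 6.0, 100)).reshape(-1, 1)
h_last, c_last = run_sequence(xs, params, n_hidden=4)
print(h_last.shape, c_last.shape)
```

In a diagnosis pipeline, the final hidden state `h_last` would typically be passed to a classifier or fused with features from another branch; how that fusion is done here is left open because it depends on the chapter's specific architecture.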

Acknowledgment

This research was partially supported by the NSFC project (grant no. U1604158) and the Shanghai S&T Commission (grant no. 19040501700).

References

[1] W. Abed, S. Sharma, R. Sutton, Neural network fault diagnosis of a trolling motor based on feature reduction techniques for an unmanned surface vehicle, Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems & Control Engineering 229 (8) (2015) 738–750.
[2] Z. An, S. Li, J. Wang, Y. Xin, K. Xu, Generalization of deep neural network for bearing fault diagnosis under different working conditions using multiple kernel method, Neurocomputing 352 (2019) 42–53.
[3] Bearing Data Center. Home page. [Online]. 2021. Available at http://csegroups.case.edu/bearingdatacenter/home.
[4] T.D. Bruin, K. Verbert, R. Babuska, Railway track circuit fault diagnosis using recurrent neural networks, IEEE Transactions on Neural Networks 28 (3) (2017) 523–533.
[5] I. Djelloul, Z. Sari, I. Sidibe, Fault diagnosis based on the quality effect of learning algorithm for manufacturing systems, Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems & Control Engineering 233 (7) (2019) 801–814.
[6] L. Eren, T. Ince, S. Kiranyaz, A generic intelligent bearing fault diagnosis system using compact adaptive 1D CNN classifier, Signal Processing Systems 91 (2) (2019) 179–189.
[7] Y. Fu, D. Huang, N. Qin, K. Lang, Y. Yang, High-speed railway bogie fault diagnosis using LSTM neural network, Proceedings of the 37th Chinese Control Conference (CCC) (2018) 5848–5852.
[8] R. Guo, K. Guo, J. Dong, Fault diagnosis for the landing phase of the aircraft based on an adaptive kernel principal component analysis algorithm, Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems & Control Engineering 229 (10) (2015) 917–926.
[9] J. Han, D. Choi, S. Hong, H. Kim, Motor fault diagnosis using CNN based deep learning algorithm considering motor rotating speed, Proceedings of the IEEE 6th International Conference on Industrial Engineering and Applications (ICIEA) (2019) 440–445.
[10] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780.
[11] Y. Hsueh, V.R. Ittangihal, W. Wu, H. Chang, C. Kuo, Fault diagnosis system for induction motors by CNN using empirical wavelet transform, Symmetry 11 (10) (2019) 1212.
[12] R. Huang, Y. Liao, S. Zhang, W. Li, Deep decoupling convolutional neural network for intelligent compound fault diagnosis, IEEE Access 7 (2018) 1848–1858.
[13] H. Jafari, J. Poshtan, H. Sadeghi, Application of fuzzy data fusion theory in fault diagnosis of rotating machinery, Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems & Control Engineering 232 (8) (2018) 1015–1024.
[14] H. Jiang, F. Wang, H. Shao, H. Zhang, Rolling bearing fault identification using multilayer deep learning convolutional neural network, Journal of Vibroengineering 19 (1) (2017) 138–149.
[15] S.T. Kandukuri, J.S.L. Senanayaka, V.K. Huynh, H.R. Karimi, K.G. Robbersmyr, Current signature based fault diagnosis of field-oriented and direct torque–controlled induction motor drives, Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems & Control Engineering 231 (10) (2017) 849–866.
[16] L. Kou, Y. Qin, X. Zhao, X. Chen, A multi-dimension end-to-end CNN model for rotating devices fault diagnosis on high-speed train bogie, IEEE Transactions on Vehicular Technology 69 (3) (2020) 2513–2524.
[17] J. Lei, C. Liu, D. Jiang, Fault diagnosis of wind turbine based on long short-term memory networks, Renewable Energy 133 (2019) 422–432.
[18] H. Li, J. Huang, S. Ji, Bearing fault diagnosis with a feature fusion method based on an ensemble convolutional neural network and deep neural network, Sensors 19 (9) (2019) 2034.
[19] Y. Li, H.R. Karimi, Q. Zhang, D. Zhao, Y. Li, Fault detection for linear discrete time-varying systems subject to random sensor delay: A Riccati equation approach, IEEE Transactions on Circuits & Systems I: Regular Papers 65 (5) (2017) 1707–1716.
[20] P. Luo, Y. Hu, Research on rolling bearing fault identification method based on LSTM neural network, Materials Science & Engineering 542 (1) (2019) 012048.
[21] H. Pan, X. He, S. Tang, F. Meng, An improved bearing fault diagnosis method using one-dimensional CNN and LSTM, Journal of Mechanical Engineering 64 (7) (2018) 443–452.
[22] D. Peng, H. Wang, Z. Liu, W. Zhang, M.J. Zuo, J. Chen, Multibranch and multiscale CNN for fault diagnosis of wheelset bearings under strong noise and variable load condition, IEEE Transactions on Industrial Informatics 16 (7) (2020) 4949–4960.
[23] K. Polat, The fault diagnosis based on deep long short-term memory model from the vibration signals in the computer numerical control machines, Journal of the Institute of Electronics & Computer 2 (1) (2020) 72–92.
[24] Pudn.com. QPZZ gearbox data. [Online]. 2021. Available at http://www.pudn.com/Download/item/id/3205015.html.
[25] K. Sun, J. Qiu, H.R. Karimi, H. Gao, A novel finite-time control for nonstrict feedback saturated nonlinear systems with tracking error constraint, IEEE Transactions on Systems, Man & Cybernetics, early access, December 27, 2019.
[26] K. Tidriri, N. Chatti, S. Verron, T. Tiplica, Model-based fault detection and diagnosis of complex chemical processes: A case study of the Tennessee Eastman process, Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems & Control Engineering 232 (6) (2018) 742–760.
[27] H. Wang, S. Li, L. Song, L. Cui, A novel convolutional neural network based fault recognition method via image fusion of multi-vibration-signals, Computers in Industry 105 (2019) 182–190.
[28] W. Wang, X. Qiu, C. Chen, B. Lin, H. Zhang, Application research on long short-term memory network in fault diagnosis, Proceedings of the 2018 International Conference on Machine Learning and Cybernetics, 360–365.
[29] Y. Wang, M. Liu, Z. Bao, S. Zhang, Stacked sparse autoencoder with PCA and SVM for data-based line trip fault diagnosis in power systems, Neural Computing & Applications 31 (10) (2019) 6719–6731.
[30] L. Wen, X. Li, L. Gao, Y. Zhang, A new convolutional neural network-based data-driven fault diagnosis method, IEEE Transactions on Industrial Electronics 65 (7) (2017) 5990–5998.
[31] D. Xiao, Y. Huang, X. Zhang, H. Shi, C. Liu, Y. Li, Fault diagnosis of asynchronous motors based on LSTM neural network, Proceedings of the 2018 IEEE Prognostics & System Health Management Conference, IEEE, Los Alamitos, CA, 540–545.
[32] R. Yang, M. Huang, Q. Lu, M. Zhong, Rotating machinery fault diagnosis using long-short-term memory recurrent neural network, IFAC-PapersOnLine 51 (24) (2018) 228–232.
[33] R. Yang, H. Li, C. He, Z. Zhang, Rolling element bearing weak fault diagnosis based on optimal wavelet scale cyclic frequency extraction, Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems & Control Engineering 232 (7) (2018) 895–908.
[34] A. Yin, Y. Yan, Z. Zhang, C. Li, R.-V. Sanchez, Fault diagnosis of wind turbine gearbox based on the optimized LSTM neural network with cosine loss, Sensors 20 (8) (2020) 2339.
[35] L. Yu, J. Qu, F. Gao, Y. Tian, A novel hierarchical algorithm for bearing fault diagnosis based on stacked LSTM, Shock & Vibration 2019 (2019) 1–10.
[36] J. Zarei, M.A. Tajeddini, H.R. Karimi, Vibration analysis for bearing fault detection and classification using an intelligent filter, Mechatronics 24 (2) (2014) 151–157.
[37] B. Zhang, S. Zhang, W. Li, Bearing performance degradation assessment using long short-term memory recurrent network, Computers in Industry 106 (2019) 14–29.
[38] W. Zhang, G. Peng, C. Li, Bearings fault diagnosis based on convolutional neural networks with 2-D representation of vibration signals as input, MATEC Web of Conferences, EDP Sciences 95 (2017) 13001.
[39] Y. Zhang, Q. Liu, L. Song, Sentence-state LSTM for text representation, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (2018) 317–327.
[40] H. Zhao, S. Sun, B. Jin, Sequential fault diagnosis based on LSTM neural network, IEEE Access 6 (2018) 12929–12939.



EDITED BY HAMID REZA KARIMI

Fault Diagnosis and Prognosis Techniques for Complex Engineering Systems gives a systematic and almost self-contained description of the many facets of envisaging, designing, implementing, or experimentally exploring emerging trends in fault diagnosis and failure prognosis in mechanical, electrical, hydraulic, and marine systems. The book is devoted to the development of mathematical methodologies for fault diagnosis and isolation, fault-tolerant control, and failure prognosis problems of engineering systems. It presents new techniques in reliability modeling, reliability analysis, reliability design, fault and failure detection, signal processing, and fault-tolerant control of engineering systems. It focuses specifically on the development of mathematical methodologies for the diagnosis and prognosis of faults or failures, providing a unified platform for understanding and applying advanced diagnosis and prognosis methodologies to improve reliability, in both theory and practice, in areas such as vehicles, manufacturing systems, circuits, flight systems, and marine systems.

This book will be a valuable resource for different groups of readers: mechanical engineers working on vehicle systems, electrical engineers working on rotary machinery systems, control engineers working on fault detection systems, mathematicians and physicists working on complex dynamics, and postgraduate students majoring in mechatronics, control engineering, mechanical engineering, and applied mathematics. It may also be of significant interest to researchers in the mechatronics engineering community, in both academia and industry.

Key Features
• Presents recent advances in the theory, technological aspects, and applications of advanced diagnosis and prognosis methodologies in engineering applications.
• Provides a series of the latest results in areas including, but not limited to, fault detection, isolation, fault-tolerant control, and failure prognosis of components.
• Gives numerical and simulation results in each chapter that reflect engineering practice while demonstrating the focus of the developed analysis and synthesis approaches.


About the Editor

Dr. Hamid Reza Karimi is a Professor of Applied Mechanics with the Department of Mechanical Engineering, Politecnico di Milano, Milan, Italy. His current research interests include control systems and mechatronics with applications to automotive systems, robotics, vibration systems, and wind energy. Prof. Karimi is currently the Editor-in-Chief, Technical Editor, or Associate Editor for several international journals. He was named a Web of Science Highly Cited Researcher in Engineering for 2016-2020 and also received the 2020 IEEE Transactions on Circuits and Systems Guillemin-Cauer Best Paper Award.

Technology and Engineering
ISBN 978-0-12-822473-1
