Bayesian Real-Time System Identification: From Centralized to Distributed Approach 9819905923, 9789819905928


English Pages 285 [286] Year 2023


Table of contents :
Preface
Contents
Nomenclature
1 Introduction
1.1 Dynamical Systems
1.1.1 Time-Invariant Systems
1.1.2 Time-Varying Systems
1.2 System Identification
1.2.1 Problems in System Identification
1.2.2 Real-Time System Identification
1.2.3 Bayesian System Identification
1.3 Uncertainty
1.4 Organization of the Book
References
2 System Identification Using Kalman Filter and Extended Kalman Filter
2.1 Introduction
2.2 Standard Kalman Filter
2.2.1 Derivation of the Discrete-Time Kalman Filter
2.3 Applications to State Estimation
2.3.1 Vehicle Tracking Problem
2.3.2 Sixty-Story Building
2.4 Extended Kalman Filter
2.4.1 Derivation of the Extended Kalman Filter
2.4.2 Extended Kalman Filter with Fading Memory
2.5 Application to State Estimation and Model Parameter Identification
2.5.1 Single-Degree-of-Freedom System
2.5.2 Three-Pier Bridge
2.5.3 Bouc-Wen Hysteresis System
2.6 Application to a Field Inspired Test Case: The Canton Tower
2.6.1 Background Information
2.6.2 Identification of Structural States and Model Parameters
2.7 Extended Readings
2.8 Concluding Remarks
References
3 Real-Time Updating of Noise Parameters for System Identification
3.1 Introduction
3.2 Real-Time Updating of Dynamical Systems and Noise Parameters
3.2.1 Updating of States and Model Parameters
3.2.2 Updating of Noise Parameters
3.3 Efficient Numerical Optimization Scheme
3.3.1 Training Phase
3.3.2 Working Phase
3.3.3 Uncertainty Estimation of the Updated Noise Parameters
3.4 Applications
3.4.1 Bouc-Wen Hysteresis System
3.4.2 Three-Pier Bridge
3.5 Concluding Remarks
References
4 Outlier Detection for Real-Time System Identification
4.1 Introduction
4.2 Outlier Detection Using Probability of Outlier
4.2.1 Normalized Residual of Measurement
4.2.2 Probability of Outlier
4.3 Computational Efficiency Enhancement Techniques
4.3.1 Moving Time Window
4.3.2 Efficient Screening Criteria
4.4 Outlier Detection for Time-Varying Dynamical Systems
4.4.1 Training Stage
4.4.2 Working Stage
4.5 Applications
4.5.1 Outlier Generation
4.5.2 Single-Degree-of-Freedom Oscillator
4.5.3 Fourteen-Bay Truss
4.6 Concluding Remarks
References
5 Bayesian Model Class Selection and Self-Calibratable Model Classes for Real-Time System Identification
5.1 Introduction
5.2 Bayesian Real-Time Model Class Selection
5.3 Real-Time System Identification Using Predefined Model Classes
5.3.1 Parametric Identification with a Specified Model Class
5.3.2 Parametric Identification Using Multiple Model Classes
5.3.3 Parametric Identification Using the Most Plausible Model Class
5.3.4 Predefined Model Classes
5.4 Self-Calibratable Model Classes
5.4.1 Parameterization and Model Classes
5.4.2 Self-Calibrating Strategy
5.4.3 Procedure of the Real-Time System Identification with Self-Calibratable Model Classes
5.5 Hierarchical Interhealable Model Classes
5.5.1 Hierarchical Model Classes
5.5.2 Interhealing Mechanism
5.5.3 Triggering Conditions
5.5.4 Procedure of the Real-Time System Identification Using Hierarchical Interhealable Model Classes
5.6 Applications to Bayesian Real-Time Model Class Selection for System Identification
5.6.1 Identification of High-Rise Building with Predefined Model Classes
5.6.2 Identification of Bouc-Wen Nonlinear Hysteresis System with Self-Calibratable Model Classes
5.6.3 Identification of Three-Dimensional Truss Dome with Hierarchical Interhealable Model Classes
5.7 Concluding Remarks
References
6 Online Distributed Identification for Wireless Sensor Networks
6.1 Introduction
6.2 Typical Architectures of Wireless Sensor Network
6.2.1 Centralized Networks
6.2.2 Decentralized Networks
6.2.3 Distributed Networks
6.3 Problem Formulations
6.4 Compression and Extraction Technique at the Sensor Nodes
6.4.1 Compression and Extraction of the Updated State Vector
6.4.2 Compression and Extraction of the Covariance Matrix
6.5 Bayesian Fusion at the Central Station
6.5.1 The Product of Univariate Gaussian PDFs
6.5.2 The Product of Multivariate Gaussian PDFs
6.5.3 Fusion of the Compressed Local Information
6.6 Illustrative Examples
6.6.1 Example 1: Forty-Story Building
6.6.2 Example 2: Bridge with Two Piers
6.7 Concluding Remarks
References
7 Online Distributed Identification Handling Asynchronous Data and Multiple Outlier-Corrupted Data
7.1 Introduction
7.2 Online Distributed Identification Framework
7.2.1 Local Identification at the Sensor Nodes
7.2.2 Bayesian Fusion at the Central Station
7.3 Online Distributed Identification Using Asynchronous Data
7.4 Application to Model Updating of a Sixteen-Bay Truss
7.5 Hierarchical Outlier Detection
7.5.1 Local Outlier Detection at the Sensor Nodes
7.5.2 Global Outlier Detection at the Central Station
7.6 Application to Model Updating of a Forty-Story Building
7.7 Concluding Remarks
References

Ke Huang Ka-Veng Yuen

Bayesian Real-Time System Identification From Centralized to Distributed Approach


Ke Huang
School of Civil Engineering, Changsha University of Science and Technology, Changsha, Hunan, China
State Key Laboratory of Internet of Things for Smart City and Department of Civil and Environmental Engineering, University of Macau, Taipa, China

Ka-Veng Yuen
State Key Laboratory of Internet of Things for Smart City and Department of Civil and Environmental Engineering, University of Macau, Taipa, China

ISBN 978-981-99-0592-8 ISBN 978-981-99-0593-5 (eBook) https://doi.org/10.1007/978-981-99-0593-5 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

System identification aims at estimating the physical parameters associated with a mathematical model from measured data. It is applicable to virtually all areas of science and engineering. Any discipline concerned with mathematical modeling of its underlying system is a likely candidate for system identification, including statistical physics, information sciences, chemical engineering and electrical engineering. Real-time system identification addresses a challenging issue in dynamical systems: tracking time-varying systems. It is desirable for promptly acquiring information from the system for monitoring and control purposes. This book aims to provide new insights into some challenging identification problems for time-varying dynamical systems. It offers two different perspectives on data processing for system identification, namely centralized and distributed, where the distributed perspective departs from conventional centralized approaches. Centralized identification requires transmitting all measured data to a single processing unit. A number of methods for parameter estimation have been developed within this scope under different working conditions. Real-time model class selection is also considered and is incorporated with parameter estimation for system identification purposes. Traditionally, there are two levels of system identification problems, i.e., estimation of the uncertain parameters in the mathematical model and selection of an appropriate model class representing the underlying dynamical system. However, model class selection does not guarantee satisfactory modeling results since all the model class candidates may be unsuitable. A new third level of system identification using self-calibratable model classes is introduced, in which the structure of the model classes can be adaptively modified.
On the other hand, distributed identification takes advantage of wireless sensing, data acquisition and computational technology so that the computational workload can be outsourced and distributed to the sensor nodes. The local results obtained at the sensor nodes are then fused at the central station to provide a global interpretation of the underlying system. Methods are also developed within the distributed identification framework for data of different natures, including asynchronous data and multiple outlier-corrupted data. This book presents applications of Bayesian real-time system identification in practical engineering scenarios. Although the application context mainly focuses on civil engineering infrastructure, the presented theories and algorithms are widely applicable to general dynamical systems (such as mechanical systems and aerospace structures). This book provides sufficient background to follow Bayesian methods for solving real-time system identification problems in civil and other engineering disciplines.

Changsha, China
Taipa, China

Ke Huang Ka-Veng Yuen

Acknowledgements

We would like to express our sincere thanks for the financial support from the Science and Technology Development Fund of the Macao SAR Government, the Research Committee of the University of Macau, the Guangdong-Hong Kong-Macau Joint Laboratory Program (2020B1212030009) and the National Natural Science Foundation of China. We are also grateful to Mr. Rajasekar Ganesan and Mr. Wayne Hu for their excellent advice throughout the entire book preparation process.


Nomenclature

$\mathcal{C}$: model class
$C$: damping matrix
$f$: excitation vector
$G$: Kalman gain
$h$: observation function
$H$: (linearized) observation matrix
$i$: time index at the sensor node
$I$: time index at the central station
$J$: objective function
$K$: stiffness matrix
$M$: mass matrix
$n$: measurement noise vector
$N_d$: number of degrees of freedom (DOFs)
$N_m$: number of modes
$N_o$: number of observed DOFs
$N_s$: number of sensor nodes
$P$: plausibility
$p$: probability density function
$r$: restoring force vector
$T$: force distributing matrix
$y$: augmented state vector
$y_{i|i-1}$: one-step-ahead predicted state vector
$y_{i|i}$: filtered state vector
$z$: observation vector
$z_{i|i-1}$: one-step-ahead predicted observation
$\phi_m$: mode shape vector of the mth mode
$\Phi$: modal matrix, $\Phi = [\phi_1, \phi_2, \ldots, \phi_{N_m}]$
$\lambda$: fading vector
$\theta$: model parameter vector for identification
$\theta_k$: stiffness parameter vector for identification
$\Sigma_f$: covariance matrix of process noise
$\Sigma_n$: covariance matrix of measurement noise
$\Sigma_{y,i|i-1}$: covariance matrix of the one-step-ahead predicted state vector
$\Sigma_{y,i|i}$: covariance matrix of the filtered state vector
$\Sigma_{z,i|i-1}$: covariance matrix of the one-step-ahead predicted observation
$\mathbb{R}$: set of real numbers

Chapter 1

Introduction

Abstract This chapter introduces the basic concepts of system identification. First, dynamical systems and their basic properties are introduced. System identification is the problem of building mathematical models that describe the behavior of a dynamical system based on observations from the system. Traditionally, there are two levels of system identification problems, i.e., estimation of the uncertain parameters in the mathematical model and selection of an appropriate model class representing the underlying dynamical system. In addition, a special feature of this book is that it introduces a new third level of system identification using self-calibratable model classes. Real-time system identification is desirable for promptly acquiring information from the system for monitoring and control purposes. Bayes' theorem and system identification using Bayesian methods are briefly introduced. Finally, an overview of this book is given, with an outline of each chapter for the convenience of readers. This book introduces some recent developments in Bayesian real-time system identification. It contains two different perspectives on data processing for system identification, namely centralized and distributed.

Keywords Bayesian inference · Dynamical system · Real-time estimation · System identification · Time-varying system

1.1 Dynamical Systems

A dynamical system can be represented by a mathematical model describing the time-dependent behavior of a point in an ambient space (Katok and Hasselblatt 1995). At any given time instant, a dynamical system has a state that corresponds to a point in its state space. In other words, the states of a system are the variables that provide a complete representation of the internal condition of the system at any given time instant. The mathematical model links the actual physical system to a mathematically tractable description. Representative examples include the oscillation of a single pendulum, the location of a moving vehicle, the daily weather of a city and population growth. Given a dynamical system, the output values depend not only on the instantaneous and past values of the inputs but also on the behavior of the system.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. K. Huang and K.-V. Yuen, Bayesian Real-Time System Identification, https://doi.org/10.1007/978-981-99-0593-5_1

Fig. 1.1 An SDOF mass-spring-damper model (mass m, spring stiffness k, damping coefficient c, external force f(t), displacement x(t), with an accelerometer mounted on the mass)

One of the most commonly used examples of a dynamical system is the equation of motion of a mass-spring-damper model, which consists of discrete mass nodes interconnected via springs and dampers. The mass-spring-damper model is well known in the study of mechanical vibration for its simplicity and its ability to mimic the vibration behavior of real physical systems. Figure 1.1 shows a simple single-degree-of-freedom (SDOF) mass-spring-damper model. The mass node moves in response to the external force f applied to it. The equation of motion of this SDOF mass-spring-damper model follows directly from summing the forces acting on the mass node:

$$m\ddot{x}(t) + c\dot{x}(t) + kx(t) = f(t) \tag{1.1}$$

where x denotes the displacement of the mass node; m, c and k denote the mass of the node, the damping coefficient and the stiffness of the spring, respectively; f is the external force applied to the mass node. Equation (1.1) describes the motion of the mass-spring-damper model as a function of time. If an accelerometer is deployed on the mass node to measure its acceleration response, the input and output of this dynamical system are the external force f and the acceleration $\ddot{x}$ (subject to some measurement error), respectively.
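To make the input-output relationship concrete, the sketch below integrates Eq. (1.1) numerically and returns the acceleration that an ideal accelerometer on the mass node would record. It is an illustration under stated assumptions, not the book's method: the classical fourth-order Runge-Kutta scheme with a fixed step `dt` is our choice of integrator, the function name `sdof_response` and all numerical values are hypothetical, and the returned accelerations are noise-free, so any measurement error would have to be added separately.

```python
import math


def sdof_response(m, c, k, force, x0, v0, dt, n_steps):
    """Integrate m*x'' + c*x' + k*x = f(t), Eq. (1.1), with classical RK4.

    force: callable returning the external force f(t) at time t.
    Returns lists of time, displacement, velocity and (noise-free) acceleration.
    """
    def deriv(t, x, v):
        # First-order form of Eq. (1.1): x' = v, v' = (f(t) - c*v - k*x) / m
        return v, (force(t) - c * v - k * x) / m

    t, x, v = 0.0, x0, v0
    times, disp, vel = [t], [x], [v]
    acc = [deriv(t, x, v)[1]]
    for _ in range(n_steps):
        k1x, k1v = deriv(t, x, v)
        k2x, k2v = deriv(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v)
        k3x, k3v = deriv(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v)
        k4x, k4v = deriv(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
        times.append(t)
        disp.append(x)
        vel.append(v)
        # The accelerometer output (before measurement error) is x''(t) itself.
        acc.append(deriv(t, x, v)[1])
    return times, disp, vel, acc


if __name__ == "__main__":
    k = (2 * math.pi) ** 2          # 1 Hz natural frequency with m = 1
    c = 2 * 0.05 * 2 * math.pi      # 5% damping ratio
    times, disp, vel, acc = sdof_response(1.0, c, k, lambda t: 0.01 * k,
                                          0.0, 0.0, 0.01, 6000)
    # After 60 s the damped response settles near the static deflection F/k = 0.01.
    print(disp[-1])
```

A fixed-step explicit scheme keeps the sketch short; for stiff multi-degree-of-freedom models an implicit integrator (e.g. Newmark-type) would usually be preferred.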

1.1.1 Time-Invariant Systems

Time-invariant systems are systems whose behavior and characteristics do not change over time (Oppenheim and Willsky 1997). For example, the SDOF mass-spring-damper system in Fig. 1.1 is time-invariant if the mass, damping and stiffness values m, c and k remain constant over time. The time-invariant property can be stated simply in terms of signals: a system is time-invariant if a time shift in the input signal leads to an identical time shift in the output signal (Cherniakov 2003). In particular, consider a system with time-dependent output y(t) and input f(t). A time-invariant system produces the output y(t + δ) when the input f(t + δ) is applied to it, where δ is an arbitrary time shift. From a mathematical perspective, a system is regarded as time-invariant if the relationship between the input f(t) and the output y(t) does not depend explicitly on time t:

$$y(t) = g(f(t), t) = g(f(t)) \tag{1.2}$$


where g(·) is the function describing the relationship between the input and the output. Equation (1.2) implies that the explicit variation of the system with time is identically zero, so that

$$\frac{dy(t)}{dt} = \frac{\partial g(f(t),t)}{\partial t} + \frac{\partial g(f(t),t)}{\partial f(t)}\frac{df(t)}{dt} = \frac{\partial g(f(t),t)}{\partial f(t)}\frac{df(t)}{dt} \tag{1.3}$$

where $\partial g(f(t),t)/\partial t \equiv 0$. There are various applications of time-invariant systems, including electrical circuit analysis and design (Fontana et al. 2019), signal processing and control theory (Chen et al. 2006), digital filter design (Shmaliy 2010), image processing (Messner and Szu 1985) and mechanical engineering (Araújo 2019). In addition to time invariance, the classical theory of dynamical system identification often assumes that the system of interest is linear and classically damped. Linear time-invariant system theory has served as a foundation of system identification for decades, and it arises in a wide variety of applications (Phillips et al. 2003).

1.1.2 Time-Varying Systems

In practical applications, the model parameters of the system of interest are often not constant over time. For example, if the stiffness k of the spring in the SDOF system in Fig. 1.1 changes over time, e.g., stiffness degradation due to a long service life, the system is no longer time-invariant but time-varying. Time-varying systems are systems whose behavior and characteristics change over time. For a time-varying system, a shift of the input signal does not simply shift the output signal in time; instead, the time-frequency content of the output can change completely. In other words, time-varying systems respond differently to the same input applied at different time instants. This is the major difference between time-invariant and time-varying systems, and Fig. 1.2 depicts it schematically. Compared with time-invariant systems, it is much more challenging to identify time-varying dynamical systems since the number of uncertain/unknown parameters in time-varying systems is typically substantially larger than that in the corresponding time-invariant systems.
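The shift property that separates the two classes of systems can be checked numerically. The sketch below is a discrete-time illustration under stated assumptions: the hypothetical first-order recursion y[n] = a(n) y[n-1] + f[n] stands in for a dynamical system (it is not a model from the book), with a constant coefficient playing the role of a time-invariant system and a slowly drifting coefficient mimicking, for example, gradual stiffness degradation. The helper names `delay`, `first_order_system` and `passes_shift_test` are likewise illustrative.

```python
def delay(signal, d):
    """Delay a discrete-time signal by d >= 0 samples, padding the start with zeros."""
    return [0.0] * d + list(signal[:len(signal) - d])


def first_order_system(f, a):
    """Hypothetical system y[n] = a(n) * y[n-1] + f[n] with y[-1] = 0.

    A constant coefficient a(n) gives a time-invariant system; a coefficient
    that drifts with n (e.g. a degrading parameter) gives a time-varying one.
    """
    y, prev = [], 0.0
    for n, fn in enumerate(f):
        prev = a(n) * prev + fn
        y.append(prev)
    return y


def passes_shift_test(system, f, d, tol=1e-12):
    """Compare the response to a delayed input with the delayed response.

    A failure proves the system is time-varying; a pass for one input and one
    shift is only evidence of (not proof of) time invariance.
    """
    response_to_shifted_input = system(delay(f, d))
    shifted_response = delay(system(f), d)
    return all(abs(p - q) <= tol
               for p, q in zip(response_to_shifted_input, shifted_response))


def invariant(f):
    return first_order_system(f, lambda n: 0.9)


def varying(f):
    return first_order_system(f, lambda n: 0.9 - 0.005 * n)


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 19
    print(passes_shift_test(invariant, impulse, 5))  # True
    print(passes_shift_test(varying, impulse, 5))    # False
```

For the drifting system, the impulse applied at sample 5 meets a smaller coefficient than the impulse applied at sample 0, so the two responses differ in shape, not merely in timing.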

Fig. 1.2 Difference between time-invariant system and time-varying system (schematic: the same inputs applied to a time-invariant system and to a time-varying system, with the resulting outputs)

1.2 System Identification

System identification is applicable to virtually all areas of science and engineering. Any discipline concerned with mathematical modeling of its underlying system is a likely candidate for system identification. This includes electrical engineering (Kristinsson and Dumont 1992; Andrieu et al. 2004), mechanical engineering (Sirca and Adeli 2012; Noël and Kerschen 2017), astronomical engineering (Chiuso et al. 2009; Shore et al. 2010), computer science (Yang and Sakai 2007; Tepljakov et al. 2011), chemical engineering (Zheng and Hoo 2004; Vasquez et al. 2008), engineering geology (Kijewski and Kareem 2003; Wang et al. 2016), aerospace engineering (Cowan et al. 2001; Valasek and Chen 2003), robotics (Bemporad et al. 2005; Kozlowski 2012), economics (Heckman 2000), ecology (Wu et al. 2006), biology (Banga 2008), finance (Los 2006), sociology (Golding 1982) and many others (Zhang and Yin 2003).

System identification considers the problem of building mathematical models of dynamical systems using measurements of the input and output signals (or sometimes output-only measurements) of the system (Ljung 1987, 1998). A model refers to a mathematical relationship between inputs and outputs. In general, models of dynamical systems are described by difference or differential equations, transfer functions and/or state-space equations. The typical process of system identification consists of:

• Observing the input-output or output-only signals from the underlying dynamical system in the time or frequency domain;
• Selecting a proper mathematical model to represent the dynamical system;
• Applying an estimation method to obtain estimates of the uncertain parameters in the candidate mathematical model;
• Using the estimated model to evaluate whether it adequately meets the application requirements.
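The four steps can be traced in a few lines of code. This is a minimal sketch under loudly stated assumptions, not a procedure taken from the book: the "observations" are synthetic, the selected model class is the hypothetical one-parameter linear map y = θu, ordinary least squares is used as the estimation method, and the root-mean-square residual serves as the validation check. All function names are illustrative.

```python
import math
import random


def simulate_measurements(theta_true, inputs, noise_std, seed=0):
    """Step 1 (synthetic stand-in for real observations): y = theta_true * u + noise."""
    rng = random.Random(seed)
    return [theta_true * u + rng.gauss(0.0, noise_std) for u in inputs]


def model(theta, u):
    """Step 2: the selected model class, a one-parameter linear map y = theta * u."""
    return theta * u


def estimate_theta(inputs, outputs):
    """Step 3: least-squares estimate minimizing sum((y - theta * u)^2)."""
    return (sum(u * y for u, y in zip(inputs, outputs))
            / sum(u * u for u in inputs))


def rms_residual(theta, inputs, outputs):
    """Step 4: validation metric, the root-mean-square prediction error."""
    residuals = [y - model(theta, u) for u, y in zip(inputs, outputs)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


if __name__ == "__main__":
    inputs = [0.1 * i for i in range(1, 51)]
    outputs = simulate_measurements(2.0, inputs, noise_std=0.05)
    theta_hat = estimate_theta(inputs, outputs)
    # If the model class is adequate, the RMS residual should be close to the noise level.
    print(theta_hat, rms_residual(theta_hat, inputs, outputs))
```

An RMS residual far above the known noise level would suggest that the chosen model class, rather than the estimation step, is inadequate, which is the situation that motivates model class selection.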


Traditionally, two levels of system identification problems are considered, although they are strongly connected (Yuen 2010). The first level is parametric identification for a prescribed mathematical model with uncertain parameters. The second level is the selection of a class of parametric models, hereafter called a model class, based on observed input and output signals. However, model class selection does not guarantee a satisfactory modeling result, since all the model class candidates may be unsuitable. In other words, a poor model class will still be chosen if there is no suitable model class among the candidates. In Chap. 5, we will introduce a new level of system identification using self-calibratable model classes, whose structure can be adaptively modified. As a result, even if one starts from poor model class candidates, it is still possible to end up with a satisfactory modeling result. This is one of the key features of this book.

1.2.1 Problems in System Identification

1.2.1.1 Parametric Identification

The first level of system identification is to identify the physical parameters governing the underlying dynamical system, e.g., the stiffnesses of structural elements or the damping coefficients of a dynamical system. These estimated parameters can then be used as indicators for analyzing the corresponding dynamical system. For example, the estimated stiffnesses of structural elements can indicate the integrity of the monitored structure, and an abrupt change in an estimated stiffness indicates possible damage to the corresponding structural member(s). However, abrupt changes may simply be due to statistical uncertainty (Yuen 2010). Therefore, in addition to estimating the uncertain parameters governing the dynamical system, it is highly desirable to quantify the uncertainty of the estimation results so that abrupt changes in the parameters can be properly interpreted as possible variations of the underlying dynamical system. Moreover, the distribution of the estimated parameters indicates the credibility of the identification results.

Example. Parametric estimation using classical least squares estimation. Consider a simple example drawn from physics. A helical spring was attached at one end to a fixed point and pulled horizontally at the other end by an external force $F$, as shown in Fig. 1.3. It was assumed that the spring had reached equilibrium, i.e., its length no longer changed. Define $x$ as the extension (positive) or compression (negative) of the spring. Hooke’s law states that the extension or compression of the spring is proportional to the external force $F$ (Rychlewski 1984):


Fig. 1.3 Extension of a helical spring pulled by an external force (sketch labels: extension x, stiffness k, force F)

$$F = kx \tag{1.4}$$

where $k$ is the parameter characterizing the stiffness of the spring, namely the stiffness parameter or spring constant. It was assumed that $k$ was unknown and that $N$ experimental observations were available to estimate it. Specifically, the input-output dataset $\{(x_n, F_n)\}$, $n = 1, 2, \ldots, N$, consisted of $N$ independent pairs of measurements of the spring extension and the corresponding magnitude of the external force. The prediction error $\varepsilon_n$ was modeled as a zero-mean discrete Gaussian white noise process:

$$F_n = k x_n + \varepsilon_n, \quad n = 1, 2, \ldots, N \tag{1.5}$$

The goal is to determine the value of the stiffness parameter in Eq. (1.4) that best fits the data. Classical least squares estimation was used to identify the unknown parameter $k$ (Sorenson 1970). The fit of the model to a data point was evaluated by its residual, defined as the difference between the observation and the model output:

$$r_n = F_n - k x_n, \quad n = 1, 2, \ldots, N \tag{1.6}$$


The optimal parameter value can be identified by minimizing the sum of the squared residuals (Bai 1994):

$$\hat{k} = \arg\min_{k} \sum_{n=1}^{N} r_n^2 = \arg\min_{k} \sum_{n=1}^{N} (F_n - k x_n)^2 \tag{1.7}$$

The analytical solution of Eq. (1.7) is readily obtained by setting the derivative of the objective with respect to $k$ to zero:

$$\hat{k} = \frac{\sum_{n=1}^{N} F_n x_n}{\sum_{n=1}^{N} x_n^2} \tag{1.8}$$

This simple example gives the gist of parametric identification: the uncertain stiffness parameter in Eq. (1.4) can be estimated by least squares estimation based on Hooke’s law.
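As a quick numerical check of Eq. (1.8), the following Python sketch (standard library only) generates hypothetical spring data according to Eq. (1.5) and evaluates the closed-form estimate. The spring constant of 250 N/m, the range of extensions, the noise level and the random seed are illustrative assumptions, not values from the text, and the function names are hypothetical.

```python
import random


def estimate_stiffness(xs, fs):
    """Closed-form least-squares estimate of k in F = k x, as in Eq. (1.8)."""
    num = sum(f * x for x, f in zip(xs, fs))
    den = sum(x * x for x in xs)
    if den == 0.0:
        raise ValueError("at least one nonzero extension x_n is required")
    return num / den


def sum_squared_residuals(k, xs, fs):
    """Objective of Eq. (1.7): sum of squared residuals r_n = F_n - k x_n."""
    return sum((f - k * x) ** 2 for x, f in zip(xs, fs))


# Hypothetical synthetic experiment (all values are illustrative assumptions).
rng = random.Random(42)
k_true = 250.0                                        # spring constant, N/m
xs = [rng.uniform(-0.05, 0.05) for _ in range(50)]    # extensions (+) or compressions (-), m
fs = [k_true * x + rng.gauss(0.0, 0.5) for x in xs]   # measured forces with Gaussian error, N
k_hat = estimate_stiffness(xs, fs)
print(f"estimated k = {k_hat:.2f} N/m")
```

Because Eq. (1.7) is a quadratic function of $k$, the closed-form value is its unique minimizer whenever at least one $x_n$ is nonzero, which is why the sketch rejects an all-zero set of extensions.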

1.2.1.2 Determination of a Mathematical Model Class

Parametric identification presupposes an appropriate parametric model. Therefore, a crucial problem in system identification is to determine a proper mathematical model class to represent the underlying dynamical system for future prediction. A model class consists of a set of predictive input-output models with adjustable parameters for a system (Beck 2010). The objective of model class selection is to improve the predictive capability of the models for the design of dynamical systems (Muto and Beck 2008). The conventional approach in system identification is to select the optimal model within a specified class of models, e.g., a class of shear building models or a class of hysteretic models (Beck and Yuen 2004). Given a set of model class candidates, a more complicated model class with extra adjustable uncertain parameters always fits the measurements at least as well as a simpler, nested model class with fewer parameters. In other words, if the selection criterion is based on minimizing the prediction errors between the output data and the corresponding predictions of the optimal model in each class, the optimal model class will be the most complicated one (Beck and Yuen 2004). However, considering data-fitting capability alone is insufficient for model class selection, because it often leads to unnecessarily complicated model classes with poor predictive capability.

Example. Polynomial curve fitting. Consider the following input-output relationship:

$$y_n = -0.1 x_n^3 - 0.3 x_n^2 - 0.2 x_n + 6 + \varepsilon_n, \quad n = 1, 2, \ldots, N \tag{1.9}$$

A training dataset contained $N = 20$ data points, in which $x_n$, $n = 1, 2, \ldots, N$, were generated uniformly from the interval $[-1, 1]$ and $\varepsilon_n$, $n = 1, 2, \ldots, N$, were generated by a zero-mean discrete Gaussian process with standard deviation $\sigma_\varepsilon = 0.1$. The relationship in Eq. (1.9) was used for data generation only. The training dataset is shown in Fig. 1.4.

Fig. 1.4 Training dataset

Polynomials of different orders were fitted to the dataset in Fig. 1.4 in the least-squares sense:

$$f(x) = a_d x^d + a_{d-1} x^{d-1} + \cdots + a_2 x^2 + a_1 x + a_0, \quad d = 2, 5, 8, 10, 15, 19 \tag{1.10}$$

where $a_0, a_1, \ldots, a_d$ are the coefficients of the polynomial, which can be determined by least squares estimation (Sorenson 1970). Figure 1.5 shows the fitted polynomial functions obtained with the training dataset. The red dashed lines represent the true input-output relationship; the dots represent the training data; the solid lines represent the fitted polynomial functions; and the grey dashed lines represent the bounds of the 99.7% credible intervals. The polynomial functions with orders $d = 2$ and $d = 5$ were smooth curves. Their fitting errors were larger than those of the remaining functions since they have fewer adjustable coefficients than the higher-order polynomials. In addition, the bounds of their credible intervals were wider than those of the remaining functions. The curves of the polynomial functions with orders $d = 8$, $d = 10$ and $d = 15$ fluctuated more than those of the two aforementioned functions, but their fitting errors were smaller. Finally, the 19th-order polynomial fluctuated most severely because it has 20 adjustable coefficients. These coefficients are uniquely determined by the 20 training data points, so the fitted curve passes through every data point. As a result, this polynomial function had the smallest possible fitting error, namely zero.

Fig. 1.5 Fitting results of the polynomial functions with different orders

To evaluate the predictive capability of these polynomial functions, a testing dataset containing the same number of data points as the training dataset was used, in which $x_n$ and $\varepsilon_n$, $n = 1, 2, \ldots, 20$, were generated in the same fashion as the training dataset. The testing dataset was not involved in the fitting process; it was used to examine how well the models generalize to new data. Figure 1.6 shows the predictive results of the polynomial functions with orders $d = 2, 5, 8, 10, 15, 19$. The red dashed lines represent the true input-output relationship; the dots represent the testing data; the solid lines represent the predictions from the corresponding polynomial functions; and the grey dashed lines represent the bounds of the 99.7% credible intervals. The curves of the polynomial functions with orders $d = 2$ and $d = 5$ remained smooth and the corresponding predictions were reliable. However, the curves of the polynomial functions with orders $d = 8$ and $d = 10$ fluctuated more on the testing dataset than on the training dataset. Finally, it is not surprising that the polynomial functions with orders $d = 15$ and $d = 19$ performed very poorly, with extremely large prediction errors. These two polynomial functions were unnecessarily complicated: their coefficients relied heavily on the details of the training data (including the noise) and were severely influenced by the errors in the dataset.

Fig. 1.6 Predictive results of the polynomial functions with different orders

This example confirms that a more complicated model class with more adjustable uncertain parameters fits the observations better than a simpler model class with fewer parameters; in other words, the best-fitting model class is the most complicated one. However, data-fitting capability alone is insufficient for model class selection because it often leads to unnecessarily complicated model classes with poor predictive capability. If the model selection criterion is based on best fit, the resulting model may provide catastrophic predictions, because the selected model depends too much on the details of the data, in which measurement noise and modeling error play a substantial role. Therefore, model class selection must penalize model classes with complicated parameterizations. This issue was first pointed out by Harold Jeffreys, who did pioneering work on the application of Bayesian methods (Jeffreys 1961). He pointed out the need for a quantitative expression of the well-known Ockham’s razor (Sivia 1996). Ockham’s razor, also known as the principle of parsimony, is a problem-solving principle stating that entities should not be multiplied beyond necessity (Schaffer 2015).
It is generally interpreted to mean that, among competing theories or explanations, the simpler one, for example a model class with fewer parameters, is to be preferred. Although model class selection provides a basis for selecting the most suitable model class, it does not guarantee that a good model class is selected, because all the model class candidates may be unsatisfactory. To resolve this problem, we will introduce the third level of system identification, namely system identification using self-calibratable model classes. The structures of these model classes can be modified, so even when one starts from poor model classes, it is still possible to obtain a satisfactory one. This will be introduced in Chap. 5.
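The overfitting behavior in the polynomial example can be reproduced with a short script. The sketch below is a minimal illustration in Python (standard library only), not the code used to produce Figs. 1.4 to 1.6; the random seed, and therefore the exact error values, are arbitrary assumptions, and all function names are hypothetical. It fits each order in a Legendre polynomial basis instead of the monomial basis of Eq. (1.10): both bases span the same space of polynomials of degree at most $d$, so the least-squares fit is the same in exact arithmetic, but the Legendre basis is better conditioned on $[-1, 1]$. The least-squares problem is solved by a modified Gram-Schmidt QR factorization.

```python
import math
import random


def legendre_basis(x, d):
    """Legendre polynomials P_0..P_d evaluated at x (Bonnet's recurrence)."""
    p = [1.0, x]
    for n in range(1, d):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: d + 1]


def least_squares(rows, y):
    """Solve min ||A c - y||_2 by modified Gram-Schmidt QR; A is given row-wise."""
    m, n = len(rows), len(rows[0])
    q = [[rows[i][j] for i in range(m)] for j in range(n)]  # columns of A
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j):
            r[k][j] = sum(a * b for a, b in zip(q[k], q[j]))
            q[j] = [b - r[k][j] * a for a, b in zip(q[k], q[j])]
        r[j][j] = math.sqrt(sum(b * b for b in q[j]))
        q[j] = [b / r[j][j] for b in q[j]]
    qty = [sum(a * b for a, b in zip(q[j], y)) for j in range(n)]  # Q^T y
    c = [0.0] * n
    for j in reversed(range(n)):  # back substitution for R c = Q^T y
        c[j] = (qty[j] - sum(r[j][k] * c[k] for k in range(j + 1, n))) / r[j][j]
    return c


def true_relation(x):
    """Noise-free part of Eq. (1.9)."""
    return -0.1 * x**3 - 0.3 * x**2 - 0.2 * x + 6.0


def make_dataset(n, sigma, rng):
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_relation(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def rmse(coef, xs, ys):
    d = len(coef) - 1
    sq = [(sum(c * b for c, b in zip(coef, legendre_basis(x, d))) - y) ** 2
          for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(sq))


rng = random.Random(0)
x_train, y_train = make_dataset(20, 0.1, rng)
x_test, y_test = make_dataset(20, 0.1, rng)   # independent data, never used for fitting

errors = {}
for d in (2, 5, 8, 10, 15, 19):
    coef = least_squares([legendre_basis(x, d) for x in x_train], y_train)
    errors[d] = (rmse(coef, x_train, y_train), rmse(coef, x_test, y_test))
    print(f"d = {d:2d}: train RMSE = {errors[d][0]:.4f}, test RMSE = {errors[d][1]:.4f}")
```

The training error cannot increase with $d$, because each lower-order class is nested in the higher-order ones; it is the testing error that exposes overfitting. For $d = 15$ and $d = 19$ the design matrix is nearly square and ill-conditioned, so the printed values for those orders are also affected by round-off.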

1.2.2 Real-Time System Identification

Before introducing real-time system identification, let us consider real-time systems. A real-time system was first described by James Martin (Martin 1965) as a system that controls an environment by receiving data, processing them, and returning the results sufficiently quickly to affect the environment at that time. The term real-time describes various operations in computing or other processes that guarantee a response within a specified short time. In other words, there is no significant delay in a real-time system.


Real-time systems are online systems in nature. Strictly speaking, real-time systems and online systems differ slightly in the time latency between data acquisition and result reporting. In particular, there is no time lag in real-time systems, which receive data and respond immediately, whereas online systems allow an acceptably small time lag between data acquisition and result reporting, due to connection, loading and populating, etc. For example, when we move a computer mouse, the pointer is expected to react instantly and precisely follow our actions. This is a real-time system. When we add a photo to a web page, uploading it takes some time, from a second to a few minutes. This is an online system. Pioneering works on system identification relied on an offline computing strategy (Ljung 1987). In general, offline system identification methods require extensive input-output or output-only data to obtain the identification results. Figure 1.7 is a typical schematic diagram of offline system identification. In the offline estimation scheme, data are accumulated over a period of time and the accumulated data are then used to run the system identification algorithm. Offline system identification methods have some appealing features. In general, under the same conditions, offline system identification is more accurate than online identification because offline methods can incorporate all available data to obtain the estimation results. Therefore, offline methods are more robust to small variations in systems. Furthermore, since there is typically no strict limit on the computational time for offline calculation, there is more flexibility in the development of offline system identification algorithms. On the other hand, the disadvantages of offline system identification are obvious.
Offline methods usually require extensive data and thereby induce a heavy computational burden. In addition, offline models do not adapt to potential changes in the underlying dynamical system and cannot provide real-time or timely updated parameter estimates, because the offline identified model does not change until a new batch of data is used to obtain an updated model. For example, when offline identification methods are used to estimate the fundamental frequency of a structure, they often assume a single value of the fundamental frequency over the entire measurement period. As a special case of a real-time system, real-time system identification refers to an incremental identification process in which data are acquired sequentially and used to update the estimation results at each time step. In other words, real-time system identification implies that identification is performed as soon as a new data point becomes available and that the corresponding results are obtained within a sufficiently short time.

Fig. 1.7 Schematic diagram of offline system identification (input/output data $f_1, f_2, \ldots, f_N$ and $z_1, z_2, \ldots, z_N$ are passed to a computer, which produces the identification results)

A real-time computing strategy is ideal for time-varying dynamical systems that receive data as a continuous flow and must adapt to rapidly changing conditions. In addition, real-time system identification is data efficient, meaning that a data point is no longer required once it has been used for identification. From the identification perspective, there is no need to store the dataset. This is especially important for systems with continuous data acquisition, such as global positioning systems and structural health monitoring systems. Nevertheless, there are usually other reasons for storing the data, even though they are not used later in the identification process. The major difference between online and offline identification methods is the chronological order of data acquisition and data processing. In online algorithms, the system is identified whenever a new data point becomes available during the operation of the system, as shown in Fig. 1.8. In the offline estimation strategy, all the input/output data must first be collected, and the system identification algorithm is then applied to the entire dataset, as shown in Fig. 1.7. Recursive least squares estimation is a conventional real-time system identification method (Söderström et al. 1978) and has extensive applications in adaptive control systems and process monitoring for diagnostic purposes. It is based on the classical least squares estimation theory proposed by Gauss (Gauss 1963). Recursive least squares estimation is capable of obtaining real-time updated results when measurements are obtained sequentially. However, least squares approaches are intuitive but heuristic.
Although probability was not involved in the development of least squares approaches, least squares is by nature compatible with the Gaussian distribution; in other words, Gaussian distributions were assumed implicitly. In the subsequent decades, more sophisticated approaches were developed along the idea of minimizing the squared discrepancies between the observed data and the values predicted by the model. Early methods required the input signals to be known (Chen et al. 1992; Yang and Lin 2005; Yang et al. 2006; Houtzager et al. 2011), whereas current methods can handle unknown excitation (Yang and Huang 2007; Askari et al. 2016a, 2019; Ji et al. 2019). Traditional methods rely on determining the uncertain parameters in a prescribed model class (Lin et al. 1990; Puttige and Anavatti 2008), but there are now algorithms for simultaneous model class selection and parametric identification (Le Borgne et al. 2007; Bayrak et al. 2018). In addition, observations from sensing systems are typically assumed to be regular, noisy and synchronous (Lin et al. 1990; Hann et al. 2009). However, measurements corrupted by outliers and asynchronism can also be handled in real-time system identification (Bar et al. 2003; Basu and Meckesheimer 2007; Angelov et al. 2011; Li et al. 2016; Ma et al. 2021; Cintuglu and Ishchenko 2021; Cavraro et al. 2022).

Fig. 1.8 Schematic diagram of online system identification (at each time step $t-1$, $t$, $t+1$, the newly available input/output data $f_t$, $z_t$ yield the identification results at that time)

In the twentieth century, the well-known Kalman filter (KF) was proposed and developed by Kalman (1960) and Kalman and Bucy (1961). The KF provides a linear, unbiased, minimum error variance recursive algorithm to optimally estimate the unknown state of a dynamical system from noisy data recorded in discrete time. It has been widely applied in industrial and technological areas such as tracking systems (Kang et al. 2016), satellite navigation (Xiong et al. 2009), missile guidance (Tang and Borrie 1984) and flight control (Napolitano et al. 1998). Moreover, with the development of high-speed computers, the KF and its variants enable the handling of complicated real-time applications (Valasek and Chen 2003; Askari et al. 2016b; Lei et al. 2019). On the other hand, the H-infinity filter (Jiang et al. 2016; Xiong et al. 2017) and the particle filter (Chatzi and Smyth 2009; Park and Lee 2011; Cadini et al. 2019) were proposed primarily to tackle, respectively, uncertain modeling errors and noise, and highly nonlinear systems in real-time system identification. Recently, interest has arisen in performing real-time system identification using machine learning algorithms (Sangkatsanee et al. 2011; Ahmad and Chen 2020; Verma et al. 2020; Wang et al. 2021). However, most, if not all, existing methods for real-time machine learning rely on an offline training process to implement the real-time estimation process (Mahrishi et al. 2020; Easwaran et al. 2022).
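To illustrate the recursive idea discussed above, the following sketch revisits the spring example of Eq. (1.5) and processes one data pair at a time without storing past data. It is a minimal scalar Kalman filter under an assumed random-walk model for the stiffness, $k_t = k_{t-1} + w_t$ with $w_t \sim N(0, q)$, and the observation model $F_t = x_t k_t + v_t$ with $v_t \sim N(0, r)$; this parameter model is an illustrative assumption, not the formulation derived later in the book. With $q = 0$ the recursion reduces to recursive least squares, whose final estimate matches the batch result of Eq. (1.8) up to the effect of the vague initial prior. The abrupt 20% stiffness reduction, the noise levels, the value of $q$ and the function name are hypothetical.

```python
import random


def track_stiffness(xs, fs, r, q, k0=0.0, p0=1e6):
    """Scalar Kalman filter for an assumed random-walk stiffness model.

    State:       k_t = k_{t-1} + w_t,   w_t ~ N(0, q)
    Observation: F_t = x_t * k_t + v_t, v_t ~ N(0, r)
    With q = 0 this is recursive least squares with prior N(k0, p0).
    Returns the list of (estimate, variance) pairs after each update.
    """
    k, p = k0, p0
    history = []
    for x, f in zip(xs, fs):
        p = p + q                  # predict: the random walk inflates the variance
        s = x * p * x + r          # innovation variance
        g = p * x / s              # Kalman gain
        k = k + g * (f - x * k)    # correct the estimate with the innovation
        p = p * r / s              # equals (1 - g*x)*p, written to avoid cancellation
        history.append((k, p))     # only the current (k, p) is needed for the next step
    return history


rng = random.Random(7)
n_steps = 200
# Hypothetical scenario: stiffness drops from 10 to 8 halfway through the record.
k_path = [10.0 if t < n_steps // 2 else 8.0 for t in range(n_steps)]
xs = [rng.uniform(0.5, 1.5) for _ in range(n_steps)]
fs = [k * x + rng.gauss(0.0, 0.2) for k, x in zip(k_path, xs)]

static = track_stiffness(xs, fs, r=0.04, q=0.0)     # recursive least squares
adaptive = track_stiffness(xs, fs, r=0.04, q=1e-3)  # allows the stiffness to drift
print(f"final estimate with q = 0:    {static[-1][0]:.3f}")
print(f"final estimate with q = 1e-3: {adaptive[-1][0]:.3f}")
```

The key design choice is the process noise variance $q$: a positive value keeps the estimation variance from collapsing, so new data continue to carry weight and the filter can follow a time-varying parameter, at the price of a noisier estimate when the system is in fact time-invariant. With $q = 0$ the estimate converges to an average over the whole record and misses the change.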

1.2.3 Bayesian System Identification

Bayes’ theorem, named after Thomas Bayes, describes the probability of an event based on prior knowledge of conditions that might be related to the event (Joyce 2003). From a mathematical perspective, Bayes’ theorem is a classical statement about conditional probabilities. It is used to transform the available information into the information required by the problem to be resolved. Let $D$ and $\boldsymbol{\theta}$ denote a set of observations and an uncertain parameter vector of a system, respectively. Based on Bayes’ theorem, the updated (posterior) probability density function (PDF) of the unknown parameters in $\boldsymbol{\theta}$ is given by:

$$p(\boldsymbol{\theta} \mid D, C) = \frac{p(D \mid \boldsymbol{\theta}, C)\, p(\boldsymbol{\theta} \mid C)}{p(D \mid C)} \tag{1.11}$$

where $C$ is the class of probabilistic and physical models used to describe the problem concerned, and $p(D \mid C)$ acts as the normalizing constant such that the integral of the right-hand side over the parameter space $\Theta$ equals unity:

$$\int_{\Theta} p(\boldsymbol{\theta} \mid D, C)\, d\boldsymbol{\theta} = \frac{1}{p(D \mid C)} \int_{\Theta} p(D \mid \boldsymbol{\theta}, C)\, p(\boldsymbol{\theta} \mid C)\, d\boldsymbol{\theta} = 1 \tag{1.12}$$
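Equations (1.11) and (1.12) can be checked numerically for a one-parameter case. The Python sketch below assumes the spring model of Eq. (1.5) with $\boldsymbol{\theta} = k$, a known prediction-error standard deviation, and a uniform (constant) prior on a hypothetical interval $[0, 20]$. It evaluates likelihood times prior on a grid, computes the normalizing constant $p(D \mid C)$ with the trapezoidal rule, and compares the resulting posterior mean and standard deviation with the closed-form values for this linear-Gaussian model, namely $\hat{k}$ of Eq. (1.8) and $\sigma_\varepsilon / \sqrt{\sum_n x_n^2}$, which hold when the flat prior covers essentially all the posterior mass. The data, seed, grid size and function names are illustrative assumptions.

```python
import math
import random


def log_likelihood(k, xs, fs, sigma):
    """log p(D | k, C) for F_n = k x_n + eps_n with i.i.d. eps_n ~ N(0, sigma^2)."""
    r2 = sum((f - k * x) ** 2 for x, f in zip(xs, fs))
    return -0.5 * r2 / sigma**2 - len(xs) * math.log(math.sqrt(2.0 * math.pi) * sigma)


def trapezoid(values, h):
    """Trapezoidal rule on a uniform grid with spacing h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def grid_posterior(xs, fs, sigma, k_lo, k_hi, n_grid=4001):
    """Posterior of k on a grid via Eq. (1.11), with a uniform prior on [k_lo, k_hi]."""
    ks = [k_lo + (k_hi - k_lo) * i / (n_grid - 1) for i in range(n_grid)]
    h = ks[1] - ks[0]
    log_prior = -math.log(k_hi - k_lo)                 # constant prior density p(k | C)
    log_joint = [log_likelihood(k, xs, fs, sigma) + log_prior for k in ks]
    shift = max(log_joint)                             # factor out the peak to avoid underflow
    joint = [math.exp(v - shift) for v in log_joint]
    z = trapezoid(joint, h)                            # p(D | C) / exp(shift), cf. Eq. (1.12)
    post = [v / z for v in joint]                      # normalized posterior density
    return ks, post, h, shift + math.log(z)            # last item: log p(D | C)


rng = random.Random(1)
sigma = 0.5
xs = [rng.uniform(0.1, 1.0) for _ in range(30)]
fs = [10.0 * x + rng.gauss(0.0, sigma) for x in xs]   # hypothetical true k = 10

ks, post, h, log_evidence = grid_posterior(xs, fs, sigma, 0.0, 20.0)
mean = trapezoid([k * p for k, p in zip(ks, post)], h)
std = math.sqrt(trapezoid([(k - mean) ** 2 * p for k, p in zip(ks, post)], h))

sxx = sum(x * x for x in xs)
k_ls = sum(f * x for x, f in zip(xs, fs)) / sxx       # Eq. (1.8)
std_exact = sigma / math.sqrt(sxx)
print(f"grid:        mean = {mean:.6f}, std = {std:.6f}")
print(f"closed form: mean = {k_ls:.6f}, std = {std_exact:.6f}")
```

Working with logarithms and subtracting the largest value before exponentiating is a standard precaution: with many data points the likelihood values themselves can underflow double precision, while their ratios, which are all that Eq. (1.11) needs, remain well defined.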

The prior distribution $p(\boldsymbol{\theta} \mid C)$ represents the prior information about the parameters and is based on previous knowledge or the user’s judgement. When there is no prior information about the unknown parameters, the prior distribution can be taken as a constant. The likelihood function $p(D \mid \boldsymbol{\theta}, C)$ represents the contribution of the measured data to the posterior PDF. It reflects how likely the measurements are to be observed from the model with a particular set of parameters. The likelihood function can be constructed given the class of probabilistic and physical models of the problem, and it plays the vital role in Bayesian updating. If a large amount of measured data is available, the likelihood function will be the dominant factor in Bayesian inference (Yuen 2010). A conventional way to perform system identification is to prescribe a parametric model to characterize the dynamical system and then use the observations from the underlying system to estimate the values of the uncertain parameters in that model. There are many approaches to determine the uncertain parameters in the prescribed model class. In the Bayesian approach, the information about the uncertain parameters is encapsulated in the posterior PDF, which is a joint PDF conditioned on the model assumptions and the observed data. The parameters can then be estimated by maximizing the posterior PDF given by Bayes’ theorem, which yields the maximum a posteriori estimate. Alternatively, the uncertain parameters can be determined by non-Bayesian approaches. In particular, they can be estimated by maximizing the likelihood function, which yields the maximum likelihood estimate; this is equivalent to the maximum a posteriori estimate under a uniform prior over the parameter space.
Furthermore, least squares estimation is a feasible approach to obtain the uncertain parameters, and it is equivalent to maximum likelihood estimation when a Gaussian model is used for the combined prediction and measurement errors.

Although system identification is useful in many disciplines, it involves several conceptual and computational difficulties. First, there is no true value of the uncertain parameters, since any prescribed model is essentially only an approximation of the actual behavior of the system (Beck 2010). This raises the question of the basis for choosing a single model, corresponding to the estimated parameters, from a set of models, and for making future predictions based on the prescribed model class. Second, the results of parametric estimation are often not unique, particularly when the number of uncertain parameters is large. Although these models fit the data equally well, they provide different future predictions. One feasible way to resolve this issue is to preassign some parameters in the model class so that the remaining ones can be uniquely determined. However, this often leads to severely biased estimation results because of the subjectivity of the preassignment. This non-uniqueness is referred to as the problem of unidentifiability, and it was investigated in depth by Beck and Katafygiotis (1998) and Katafygiotis and Beck (1998) for the modeling of structural dynamical systems from a probabilistic perspective. On the other hand, a maximum a posteriori estimate is typically unique because of the regularization provided by the prior probability distribution. However, when the identification problem is unidentifiable, more than one model attains the maximum posterior value, and the predictions of these models should not be ignored. Finally, since no single model can be expected to give perfect estimation results, it is very important to quantify the uncertainty of the estimation errors. This point will be further elaborated in Sect. 1.3.

Bayesian inference is one of the most significant applications of Bayes’ theorem. It evaluates how the degree of belief, typically expressed as a probability, should rationally change to account for the availability of related evidence (Jeffreys 1973). Bayesian inference has found applications in an extensive range of disciplines, including philosophy (Körding and Wolpert 2006), science (Trotta 2008; Sprenger and Hartmann 2019), engineering (Yuen 2010) and medicine (Goodman 1999). Bayesian inference provides a rigorous solution for system identification using probabilistic logic (Beck 2010). In recent decades, Bayesian system identification has attracted substantial attention and has been applied to diverse areas of science and engineering (Peterka 1981; Ninness and Henriksen 2010; Green 2015; Green et al. 2015; Huang et al. 2017). This book focuses on Bayesian system identification methods for time-varying dynamical systems. Some recently developed real-time Bayesian methodologies are introduced to resolve the aforementioned critical issues encountered in the identification of time-varying dynamical systems. In particular, two different perspectives of data processing, namely centralized and distributed identification, are provided. This book aims to provide novel insights for dealing with some challenging identification problems for time-varying dynamical systems.
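The relations among the maximum a posteriori (MAP), maximum likelihood and least squares estimates, and the statement that the likelihood dominates when many data are available, can be illustrated with a conjugate Gaussian prior on the spring stiffness of Eq. (1.5). The sketch below is an assumption-based illustration, not a method from the book: it takes the prediction error as Gaussian with known standard deviation and the prior as $k \sim N(\mu_0, s_0^2)$, for which the posterior is Gaussian and its mode (the MAP estimate) has a closed form. A very wide prior ($s_0 = 10^6$) reproduces the least-squares estimate, while an informative but deliberately biased prior ($\mu_0 = 5$, $s_0 = 1$, against a hypothetical true value of 10) pulls the estimate noticeably away from it only when data are scarce. Data, seed and function names are hypothetical.

```python
import random


def gaussian_posterior(xs, fs, sigma, mu0, s0):
    """Posterior mean (= MAP estimate) and std of k for F_n = k x_n + eps_n,
    eps_n ~ N(0, sigma^2) i.i.d., with Gaussian prior k ~ N(mu0, s0^2)."""
    precision = 1.0 / s0**2 + sum(x * x for x in xs) / sigma**2
    mean = (mu0 / s0**2 + sum(f * x for x, f in zip(xs, fs)) / sigma**2) / precision
    return mean, precision ** -0.5


def least_squares_k(xs, fs):
    """Eq. (1.8); also the maximum likelihood estimate under Gaussian errors."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(x * x for x in xs)


rng = random.Random(3)
sigma = 0.5
xs = [rng.uniform(0.5, 1.5) for _ in range(500)]
fs = [10.0 * x + rng.gauss(0.0, sigma) for x in xs]   # hypothetical true k = 10

results = {}
for n in (5, 50, 500):                                # growing amounts of data
    ls = least_squares_k(xs[:n], fs[:n])
    vague, _ = gaussian_posterior(xs[:n], fs[:n], sigma, mu0=5.0, s0=1e6)
    informative, _ = gaussian_posterior(xs[:n], fs[:n], sigma, mu0=5.0, s0=1.0)
    results[n] = (ls, vague, informative)
    print(f"N = {n:3d}: LS = {ls:.4f}, MAP (vague prior) = {vague:.4f}, "
          f"MAP (prior N(5, 1)) = {informative:.4f}")
```

In this model the gap between the MAP and least-squares estimates equals $(\mu_0 - \hat{k})/s_0^2$ divided by the posterior precision, so it shrinks as $\sum_n x_n^2$ grows, which is the sense in which the likelihood dominates with large amounts of data.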

1.3 Uncertainty

In system identification, information about the input signals is often unavailable. In addition, the vibration responses of typical dynamical systems, e.g., structures, vehicles and aircraft, are usually small in magnitude, so the measurement noise is not negligible. In general, uncertainty can be quantified by probability distributions that depend on the state of information about the likelihood of what the single true value of the uncertain quantity is (Begg et al. 2014). As a result, uncertainty quantification for system identification problems can benefit immensely from Bayesian inference (Yuen 2010). Regardless of the method used for system identification, it is impossible in practice to achieve identification results with perfect precision from a given set of raw observations, and the finite amount of measurements is one of the core reasons. In general, there are two sources of identification uncertainty. The first source, namely aleatoric uncertainty, stems from the inherent characteristics of the system. This type of uncertainty, such as modeling error, is inherently unavoidable. In the real world, complex physical systems contain many types of unmodeled mechanisms. One common approach is to treat them as random variables or random processes so that statistical moments or probability distributions can be used to characterize them (Yuen 2010). For example, the input excitation is commonly modeled as a broad-band stochastic process such as stationary Gaussian white noise. In addition to aleatoric uncertainty, there is epistemic uncertainty, which arises from imperfect or unknown information (Wheeler et al. 2020). This second type of uncertainty stems from exogenous factors disturbing the system, such as a limited number of observations and measurement error. In practical applications, the amount of information in the measured data is finite in terms of measured DOFs, sampling frequency and time duration. The input excitation cannot be observed in some circumstances, and it can then be regarded as part of the uncertain properties to be determined. Moreover, measurement error is unavoidable because of the imperfect nature of data acquisition systems, including sensors, data transmission media and digitizing hardware. In general, aleatoric uncertainty cannot be eliminated and its influence remains throughout the identification process. However, in some situations it can be reduced by advanced modeling techniques. The conventional way of handling aleatoric uncertainty is to keep it at a low level. Epistemic uncertainty shares the feature that it cannot be eliminated for a given set of data. The effective approach is to reduce the adverse impact of the uncertain factors, for example by increasing the amount of observed data and using high-precision equipment. By analyzing the identification results, it is feasible to verify, reveal and reduce potential modeling errors in the underlying dynamical system.
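The two remedies for epistemic uncertainty named above can be made concrete with the spring example. Under the assumptions of Eq. (1.5) with a flat prior and a known noise standard deviation $\sigma_\varepsilon$, the posterior standard deviation of $k$ is $\sigma_\varepsilon / \sqrt{\sum_n x_n^2}$. The short sketch below, with hypothetical design values and a hypothetical function name, shows that quadrupling the amount of data, or halving the measurement noise through better equipment, each halves the posterior standard deviation of $k$, while the scatter of each individual measurement, set by $\sigma_\varepsilon$, is unchanged by collecting more data.

```python
import math


def posterior_std_k(xs, sigma):
    """Posterior std of k in F_n = k x_n + eps_n under a flat prior and known sigma."""
    return sigma / math.sqrt(sum(x * x for x in xs))


design = [0.2, 0.4, 0.6, 0.8, 1.0]                     # hypothetical extensions, m
base = posterior_std_k(design, sigma=0.5)
more_data = posterior_std_k(design * 4, sigma=0.5)     # same test repeated: 4x the data
better_sensor = posterior_std_k(design, sigma=0.25)    # measurement noise halved
print(f"baseline {base:.4f}, 4x data {more_data:.4f}, better sensor {better_sensor:.4f}")
```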

1.4 Organization of the Book

There are seven chapters in this book, and each chapter is written in a self-contained manner, including the essential modeling, algorithms and illustrations. Connections among chapters are given in the abstract of each chapter. This chapter gives a general introduction to system identification and the fundamental ideas of Bayesian inference. State-of-the-art applications of system identification in different disciplines are presented, and the sources of identification uncertainty are elaborated. Chapter 2 introduces the well-known standard KF and the extended Kalman filter (EKF). Detailed derivations of the discrete-time KF and EKF are presented from the Bayesian perspective. Using Bayes’ theorem, the conditional PDF for the prediction can be obtained recursively and in analytical form. The standard KF recursively tracks the estimated states of the system and the covariance matrix of the estimation results, so it provides not only the state estimates but also the associated uncertainty in real time. Applications to a vehicle tracking system and a state estimation problem for a 60-story building are presented to demonstrate the KF algorithm. On the other hand, the nonlinear state-space model for general nonlinear dynamical systems is introduced for use in the EKF. By expanding the nonlinear equations in Taylor series, a locally linearized state-space model is obtained, and the EKF algorithm is formulated in the same manner as the standard KF algorithm. The EKF with fading memory is also introduced to enhance the tracking capability of the EKF for time-varying systems. Applications to simultaneous estimation of states and model parameters are presented. The EKF is the fundamental algorithm in this book; in the subsequent chapters, several recently developed methods based on the EKF are introduced to resolve critical issues in real-time system identification. A brief introduction to Bayesian filtering for the general situation is given as extended reading. Chapter 3 presents an online updating approach for noise parameter estimation in the EKF. One of the challenging issues in applying the standard KF and the EKF is prescribing the covariance matrices of the process noise and measurement noise, since inappropriate noise covariance matrices lead to unreliable estimates and erroneous uncertainty quantification of the system states and model parameters. More seriously, improper noise covariance matrices can cause the estimation to diverge. A novel method introduced in this chapter updates the noise covariance matrices in the standard KF and the EKF online. The parameterization of the noise covariance matrices in the EKF is introduced. Then, a Bayesian probabilistic algorithm for online estimation of the noise parameters is presented and the posterior PDF of the noise parameters is formulated. To solve the optimization problem efficiently, an online estimation scheme integrating a heuristic stochastic local search method and a gradient method is presented.
Applications on states and parameters estimation of a Bouc-Wen model of hysteresis and structural damage detection of a three-pier bridge are used for demonstration. Chapter 4 introduces a novel real-time methodology for outlier detection and removal in the measurements from time-varying dynamical systems. The probability of outlier of a data point is defined and derived and the method utilizes this definition of outlier to assess the outlierness of each raw data point. The probability of outlier integrates the normalized residual, the measurement noise level and the size of the dataset and it provides a systematic and objective criterion to effectively detect the abnormal data points in the observations. Instead of using other adhoc judgement on outliers, this method provides an intuitive threshold 0.5 for outlier detection. Computationally efficient techniques are introduced to alleviate the computational burden encountered in the identification process using long-term monitoring data. The outlier detection algorithm can be embedded into the EKF. Therefore, it can not only remove the outliers in the measurements but also identify the time-varying system simultaneously. By excluding the outliers in the measurements, the algorithm ensures the stability and reliability of the estimation. A SDOF dynamical system and a three-pier bridge finite-element model are utilized to demonstrate the effectiveness of the method. In Chaps. 2, 3 and 4, the problem of parametric identification was considered for a prescribed model class with uncertain parameters. However, selection of an

1.4 Organization of the Book

19

appropriate model class for parametric identification is crucially important. Conventional methods resolve the critical issue on the selection of a suitable model class in an offline manner. However, the behavior of the underlying dynamical system can be time-varying. Chapter 5 addresses the critical problem of system identification for real-time selection of a suitable class of models and provides a scheme to adaptively reconfigure the model classes for real-time system identification. The Bayes’ theorem is utilized to formulate the plausibilities of some given model classes. Then, model class selection will be conducted according to the plausibilities in the realtime manner. The Bayesian model class selection approach is embedded into the EKF so this resultant algorithm provides simultaneous model class selection and parametric identification in the real-time manner. Although the algorithm presented in this chapter is based on the EKF, the real-time model class selection component can be easily embedded into other filtering tools. The aforementioned real-time model class selection method relies on the assumption that there is at least one suitable prescribed model class candidate. However, this is not always the case in practice. In order to resolve this problem, we will introduce the new third level of system identification, namely system identification using selfcalibratable model classes. This strategy allows to adaptively reconfigure the model classes in the real-time manner. The time-varying model parameters can be traced via these self-calibratable model classes. Moreover, although Bayesian model class selection allows one to choose among some prescribed model classes, it is often encountered that the number of possible model classes for large dynamical systems is huge. A novel real-time method with hierarchical interhealable model classes is presented. 
The model classes are established in a hierarchical manner so that the method requires only a small number of model classes and it is able to explore a large solution space. The modeling errors, including the estimation error in the states and model parameters and the deficiencies of the parametric models for the hierarchical model classes, can be corrected adaptively according to the measurements and the results from the optimal model class. In the previous chapters, real-time system identification problems have been considered under the centralized identification framework. Centralized identification requires transmitting all measured data to a single central processing unit, where global identification results are obtained. In Chaps. 6 and 7, novel real-time system identification methods are established based on the distributed identification framework. Chapter 6 presents the online dual-rate distributed identification approach for wireless sensor networks (WSNs). Distributed identification allows an individual unit to obtain the local estimation using part of the data, and the obtained local estimation can then be used as a basis for global estimation. Filtering method using only raw observations collected at each sensor is introduced. The preliminary local identification results are then compressed before transmitting to the central station for fusion. At the central station, Bayesian fusion is developed to integrate the compressed local identification results transmitted from the sensor nodes in order to obtain reliable global estimation. As a result, the large identification uncertainty in the local identification results can be substantially reduced. In addition to data compression,

20

1 Introduction

a dual-rate strategy for sampling and transmission/fusion is used to alleviate the data transmission burden so that online model updating can be realized efficiently for WSNs. Applications for states and model parameters estimation of a 40-story building and a three-pier bridge are utilized for demonstration. The online dual-rate distributed identification framework is utilized in Chap. 7 to resolve two critical issues, i.e., asynchronous measurements and multiple outliercorrupted measurements. Due to unavoidable imperfection of data acquisition systems, the measurements among different channels are generally asynchronous. In the distributed identification framework, since each sensor node uses only the data of its own, the local model identification results are not affected by asynchronism among different sensor nodes. Then, these local model estimation results will be compressed and sent to the central station for fusion. Note that this method requires neither a model nor quantification of the asynchronism. On the other hand, since the sensing systems for civil engineering structures are usually exposed to severe service environment, the measurements are inevitable to be corrupted with outliers. A hierarchical outlier detection approach is introduced. This method utilizes the method introduced in Chap. 4 to detect the local outliers according to the probability of outlier of the raw observations at the sensor nodes. However, after removing the outliers in the raw observations, there may still exist anomalies in the local estimation results due to segmental outliers, e.g., due to sensor bias. The method is able to detect the global outliers according to the probability of outlier of the local estimation results. The proposed methods can resolve the challenging problems of asynchronous measurements and multiple outlier-corrupted measurements effectively and achieve reliable identification results for time-varying dynamical systems in an online manner. 
Two applications are the online model updating of the time-varying dynamical systems.

References
Ahmad T, Chen H (2020) A review on machine learning forecasting growth trends and their real-time applications in different energy systems. Sustain Cities Soc 54:102010
Andrieu C, Doucet A, Singh SS, Tadic VB (2004) Particle methods for change detection, system identification, and control. IEEE IJCNN 92(3):423–438
Angelov P, Sadeghi-Tehran P, Ramezani R (2011) A real-time approach to autonomous novelty detection and object tracking in video stream. Int J Intell Syst 26(3):189–205
Araújo JM (2019) Partial eigenvalue assignment in linear time-invariant systems using state-derivative feedback and a left eigenvectors parametrization. Proc Inst Mech Eng I-J Syst 233(8):1085–1089
Askari M, Li J, Samali B (2016a) A compact self-adaptive recursive least square approach for real-time structural identification with unknown inputs. Adv Struct Eng 19(7):1118–1129
Askari M, Li J, Samali B (2016b) Application of Kalman filtering methods to online real-time structural identification: A comparison study. Int J Struct Stab Dyn 16(06):1550016
Askari M, Yu Y, Zhang C, Samali B, Gu X (2019) Real-time tracking of structural stiffness reduction with unknown inputs, using self-adaptive recursive least-square and curvature-change techniques. Int J Struct Stab Dyn 19(10):1950123

Bai J (1994) Least squares estimation of a shift in linear processes. J Time Ser Anal 15(5):453–472
Banga JR (2008) Optimization in computational systems biology. BMC Syst Biol 2(1):1–7
Bar T, Ståhlberg A, Muszta A, Kubista M (2003) Kinetic outlier detection (KOD) in real-time PCR. Nucl Acids Res 31(17):e105–e105
Basu S, Meckesheimer M (2007) Automatic outlier detection for time series: an application to sensor data. Knowl Inf Syst 11(2):137–154
Bayrak ES, Wang T, Tulsyan A, Coufal M, Undey C (2018) Product attribute forecast: adaptive model selection using real-time machine learning. IFAC Papers Online 51(18):121–125
Beck JL (2010) Bayesian system identification based on probability logic. Struct Control Health 17(7):825–847
Beck JL, Katafygiotis LS (1998) Updating models and their uncertainties. I: Bayesian statistical framework. J Eng Mech 124:455–461
Beck JL, Yuen KV (2004) Model selection using response measurements: Bayesian probabilistic approach. J Eng Mech 130(2):192–203
Begg SH, Welsh M, Bratvold RB (2014) Uncertainty vs. variability: what’s the difference and why is it important? In: SPE hydrocarbon economics and evaluation symposium, pp 273–293
Bemporad A, Garulli A, Paoletti S, Vicino A (2005) A bounded-error approach to piecewise affine system identification. IEEE Trans Automat Contr 50(10):1567–1580
Cadini F, Sbarufatti C, Corbetta M, Cancelliere F, Giglio M (2019) Particle filtering-based adaptive training of neural networks for real-time structural damage diagnosis and prognosis. Struct Control Health 26(12):e2451
Cavraro G, Comden J, Dall’Anese E, Bernstein A (2022) Real-time distribution system state estimation with asynchronous measurements. IEEE Trans Smart Grid 13(5):3813–3822
Chatzi EN, Smyth AW (2009) The unscented Kalman filter and particle filter methods for nonlinear structural system identification with non-collocated heterogeneous sensing. Struct Control Health 16(1):99–123
Chen CW, Huang JK, Phan M, Juang JN (1992) Integrated system identification and state estimation for control of flexible space structures. J Guid Control Dyn 15(1):88–95
Chen Y, Ahn HS, Xue D (2006) Robust controllability of interval fractional order linear time invariant systems. Signal Process 86(10):2794–2802
Cherniakov M (2003) An introduction to parametric digital filters and oscillators. John Wiley & Sons
Chiuso A, Muradore R, Marchetti E (2009) Dynamic calibration of adaptive optics systems: a system identification approach. IEEE Trans Contr Syst Technol 18(3):705–713
Cintuglu MH, Ishchenko D (2021) Real-Time asynchronous information processing in distributed power systems control. IEEE Trans Smart Grid 13(1):773–782
Cowan TJ, Arena AS Jr, Gupta KK (2001) Accelerating computational fluid dynamics based aeroelastic predictions using system identification. J Aircraft 38(1):81–87
Easwaran B, Hiran KK, Krishnan S, Doshi R (2022) Real-time applications of machine learning in cyber-physical systems. IGI Global
Fontana G, Grasso F, Luchetta A, Manetti S, Piccirilli MC, Reatti A (2019) A symbolic program for parameter identifiability analysis in systems modeled via equivalent linear time-invariant electrical circuits, with application to electromagnetic harvesters. Int J Numer Model El 32(4):e2251
Gauss CF (1963) Theory of the motion of the heavenly bodies moving about the sun in conic section. Dover, New York
Golding R (1982) Freud, psychoanalysis, and sociology: some observations on the sociological analysis of the individual. Brit J Sociol 33(4):545–562
Goodman SN (1999) Toward evidence-based medical statistics. 2: The Bayes factor. Ann Int Med 130(12):1005–1013
Green PL (2015) Bayesian system identification of a nonlinear dynamical system using a novel variant of simulated annealing. Mech Syst Signal Process 52:133–146

Green PL, Cross EJ, Worden K (2015) Bayesian system identification of dynamical systems using highly informative training data. Mech Syst Signal Process 56:109–122
Hann CE, Singh-Levett I, Deam BL, Mander JB, Chase JG (2009) Real-time system identification of a nonlinear four-story steel frame structure—application to structural health monitoring. IEEE Sens J 9(11):1339–1346
Heckman JJ (2000) Causal parameters and policy analysis in economics: a twentieth century retrospective. Q J Econ 115(1):45–97
Houtzager I, Van Wingerden JW, Verhaegen M (2011) Recursive predictor-based subspace identification with application to the real-time closed-loop tracking of flutter. IEEE Trans Contr Syst Technol 20(4):934–949
Huang Y, Beck JL, Li H (2017) Bayesian system identification based on hierarchical sparse Bayesian learning and Gibbs sampling with application to structural damage assessment. Comput Meth Appl Mech 318:382–411
Jeffreys H (1961) Theory of probability, 3rd edn. Oxford Clarendon Press, Oxford, UK
Jeffreys H (1973) Scientific inference. Cambridge University Press
Ji J, Yang M, Jiang L, He J, Teng Z, Liu Y, Song H (2019) Output-only parameters identification of earthquake-excited building structures with least squares and input modification process. Appl Sci 9(4):696
Jiang C, Zhang SB, Zhang QZ (2016) A new adaptive H-infinity filtering algorithm for the GPS/INS integrated navigation. Sensors 16(12):21–27
Joyce J (2003) Bayes’ theorem. Zalta, Edward N
Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng 82:35–45
Kalman RE, Bucy RS (1961) New results in linear filtering and prediction theory. J Basic Eng 83:95–108
Kang CH, Park CG, Song JW (2016) An adaptive complementary Kalman filter using fuzzy logic for a hybrid head tracker system. IEEE Trans Instr Meas 65(9):2163–2173
Katafygiotis LS, Beck JL (1998) Updating models and their uncertainties. II: Model identifiability. J Eng Mech 124:463–467
Katok A, Hasselblatt B (1995) Introduction to the modern theory of dynamical systems. Cambridge University Press
Kijewski T, Kareem A (2003) Wavelet transforms for system identification in civil engineering. Comput Aided Civ Inf 18(5):339–355
Körding KP, Wolpert DM (2006) Bayesian decision theory in sensorimotor control. Trends Cogn Sci 10(7):319–326
Kozlowski KR (2012) Modelling and identification in robotics. Springer Science & Business Media
Kristinsson K, Dumont GA (1992) System identification and control using genetic algorithms. IEEE Trans Syst Man Cybernet 22(5):1033–1046
Le Borgne YA, Santini S, Bontempi G (2007) Adaptive model selection for time series prediction in wireless sensor networks. Signal Process 87(12):3010–3020
Lei Y, Xia D, Erazo K, Nagarajaiah S (2019) A novel unscented Kalman filter for recursive state-input-system identification of nonlinear systems. Mech Syst Signal Process 127:120–135
Li L, Ding SX, Qiu J, Yang Y (2016) Real-time fault detection approach for nonlinear systems and its asynchronous T-S fuzzy observer-based implementation. IEEE Trans Cybernet 47(2):283–294
Lin CC, Soong TT, Natke HG (1990) Real-time system identification of degrading structures. J Eng Mech 116(10):2258–2274
Ljung L (1987) System identification: theory for the user. Prentice-Hall Inc., Englewood Cliffs, NJ
Ljung L (1998) System identification. Signal analysis & prediction. Birkhäuser, Boston, MA, pp 163–173
Los CA (2006) System identification in noisy data environments: An application to six Asian stock markets. J Bank Financ 30(7):1997–2024
Ma X, Wen C, Wen T (2021) An asynchronous and real-time update paradigm of federated learning for fault diagnosis. IEEE Trans Ind Inf 17(12):8531–8540

Mahrishi M, Hiran KK, Meena G, Sharma P (2020) Machine learning and deep learning in real-time applications. IGI Global
Martin J (1965) Programming real-time computer systems. Prentice-Hall Inc., NJ
Messner RA, Szu HH (1985) An image processing architecture for real time generation of scale and rotation invariant patterns. Comput Graph Image Process 31(1):50–66
Muto M, Beck JL (2008) Bayesian updating and model class selection for hysteretic structural models using stochastic simulation. J Vib Control 14(1–2):7–34
Napolitano MR, Windon DA, Casanova JL, Innocenti M, Silvestri G (1998) Kalman filters and neural-network schemes for sensor validation in flight control systems. IEEE Trans Contr Syst Technol 6(5):596–611
Ninness B, Henriksen S (2010) Bayesian system identification via Markov chain Monte Carlo techniques. Automatica 46(1):40–51
Noël JP, Kerschen G (2017) Nonlinear system identification in structural dynamics: 10 more years of progress. Mech Syst Signal Process 83:2–35
Oppenheim AV, Willsky AS (1997) Signals & systems (second edition). Prentice Hall
Park CB, Lee SW (2011) Real-time 3D pointing gesture recognition for mobile robots with cascade HMM and particle filter. Image Vision Comput 29(1):51–63
Peterka V (1981) Bayesian system identification. Automatica 17(1):41–53
Phillips CL, Parr JM, Riskin EA, Prabhakar T (2003) Signals, systems, and transforms. Prentice Hall, Upper Saddle River
Puttige VR, Anavatti SG (2008) Real-time system identification of unmanned aerial vehicles: a multi-network approach. J Comput 3(7):31–38
Rychlewski J (1984) On Hooke’s law. J Appl Math Mech 48(3):303–314
Sangkatsanee P, Wattanapongsakorn N, Charnsripinyo C (2011) Practical real-time intrusion detection using machine learning approaches. Comput Commun 34(18):2227–2235
Schaffer J (2015) What not to multiply without necessity. Australas J Philos 93(4):644–664
Shmaliy YS (2010) Linear optimal FIR estimation of discrete time-invariant state-space models. IEEE Trans Signal Process 58(6):3086–3096
Shore P, Cunningham C, DeBra D, Evans C, Hough J, Gilmozzi R, Kunzmann H, Morantza P, Tonnellier X (2010) Precision engineering for astronomy and gravity science. CIRP Ann 59(2):694–716
Sirca GF Jr, Adeli H (2012) System identification in structural engineering. Sci Iran 19(6):1355–1364
Sivia DS (1996) Data analysis: a Bayesian tutorial. Oxford Science Publications, Oxford, UK
Söderström T, Ljung L, Gustavsson I (1978) A theoretical analysis of recursive identification methods. Automatica 14(3):231–244
Sorenson HW (1970) Least-squares estimation: from Gauss to Kalman. IEEE Spectr 7(7):63–68
Sprenger J, Hartmann S (2019) Bayesian philosophy of science. Oxford University Press
Tang YM, Borrie JA (1984) Missile guidance based on Kalman filter estimation of target maneuver. IEEE Trans Aerosp Electron Syst 6:736–741
Tepljakov A, Petlenkov E, Belikov J (2011) FOMCOM: a MATLAB toolbox for fractional-order system identification and control. Int J Microelectron Comput Sci 2(2):51–62
Trotta R (2008) Bayes in the sky: Bayesian inference and model selection in cosmology. Contemp Phys 49(2):71–104
Valasek J, Chen W (2003) Observer/Kalman filter identification for online system identification of aircraft. J Guid Control Dyn 26(2):347–353
Vasquez JR, Perez RR, Moriano JS, Gonzalez JP (2008) System identification of steam pressure in a fire-tube boiler. Comput Chem Eng 32(12):2839–2848
Verma C, Stoffová V, Illés Z, Tanwar S, Kumar N (2020) Machine learning-based student’s native place identification for real-time. IEEE Access 8:130840–130854
Wang Y, Alangari M, Hihath J, Das AK, Anantram MP (2021) A machine learning approach for accurate and real-time DNA sequence identification. BMC Genom 22(1):1–10

Wang Y, Cao Z, Li D (2016) Bayesian perspective on geotechnical variability and site characterization. Eng Geol 203:117–125
Wheeler DM, Meenken E, Espig M, Sharifi M, Shah M, Finlay-Smits SC (2020) Uncertainty-what is it? Nutr Manage Farmed Landsc 33
Wu MCK, David SV, Gallant JL (2006) Complete functional characterization of sensory neurons by system identification. Annu Rev Neurosci 29:477–505
Xiong K, Liu LD, Zhang HY (2009) Modified unscented Kalman filtering and its application in autonomous satellite navigation. Aerosp Sci Technol 13(4–5):238–246
Xiong R, Yu Q, Lin C (2017) A novel method to obtain the open circuit voltage for the state of charge of lithium ion batteries in electric vehicles by using H infinity filter. Appl Energ 207:346–353
Yang JM, Sakai H (2007) A new adaptive filter algorithm for system identification using independent component analysis. IEICE Trans Fund Electron 90(8):1549–1554
Yang JN, Huang H (2007) Sequential non-linear least-square estimation for damage identification of structures with unknown inputs and unknown outputs. Int J Nonlin Mech 42(5):789–801
Yang JN, Huang H, Lin S (2006) Sequential non-linear least-square estimation for damage identification of structures. Int J Nonlin Mech 41(1):124–140
Yang JN, Lin S (2005) Identification of parametric variations of structures based on least squares estimation and adaptive tracking technique. J Eng Mech 131(3):290–298
Yuen KV (2010) Bayesian methods for structural dynamics and civil engineering. John Wiley & Sons
Zhang JF, Yin GG (2003) System identification using binary sensors. IEEE Trans Automat Contr 48(11):1892–1907
Zheng D, Hoo KA (2004) System identification and model-based control for distributed parameter systems. Comput Chem Eng 28(8):1361–1375

Chapter 2

System Identification Using Kalman Filter and Extended Kalman Filter

Abstract This chapter presents the standard Kalman filter (KF) and the extended Kalman filter (EKF). The detailed derivations of the discrete-time KF and EKF are presented from a Bayesian perspective. To formulate the KF algorithm, the state-space model of a linear dynamical system is introduced. By using Bayes’ theorem, the conditional probability density function for prediction can be obtained recursively, and closed-form solutions are available. Applications to a vehicle tracking system and a structural state estimation problem demonstrate the state tracking capability of the KF algorithm. Afterwards, the nonlinear state-space model for general nonlinear dynamical systems is introduced as the basis of the EKF. By a Taylor series expansion of the nonlinear equations, a locally linearized state-space model is obtained, and the EKF algorithm is formulated in the same manner as the standard KF. The EKF with fading memory is introduced to enhance the tracking capability for time-varying systems. Applications to simultaneous estimation of states and model parameters are presented. The KF algorithm and its variants provide an effective and efficient means of recursive estimation. More importantly, the EKF is the fundamental algorithm of this book.

Keywords Bayesian inference · Extended Kalman filter · Fading memory · Kalman filter · Real-time estimation · Structural health monitoring

2.1 Introduction

The Kalman filter (KF) was developed by Rudolf Emil Kalman in 1960 (Kalman 1960) to formulate and solve the Wiener problem. The KF algorithm was subsequently implemented in the Apollo Project in 1961 to resolve the navigation problem of spacecraft (Grewal and Andrews 2010). The KF uses a series of observations recorded over time and produces estimates of unknown state variables while accounting for the measurement noise. For linear systems with Gaussian noise, the KF is regarded as the optimal state estimator. This is because it recursively provides the state estimates that minimize the expected squared error between the estimated and the actual values. As a result, the KF algorithm and its variants have been used in various technological applications, including guidance, navigation and aerospace engineering, vehicle control, audio signal processing, remote surveillance, telecommunications, weather forecasting and many other fields. Due to limited space, only some representative applications of the KF and its variants are briefly introduced in the following.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 K. Huang and K.-V. Yuen, Bayesian Real-Time System Identification, https://doi.org/10.1007/978-981-99-0593-5_2

• Missile guidance
Missile guidance refers to the process that guides a missile to its intended target. Target accuracy is the most critical factor for the effectiveness of missile guidance, so the guidance system aims to enhance missile accuracy by improving its probability of guidance (Siouris 2004). To minimize the distance between the missile and the target, the KF and the extended Kalman filter (EKF) are typically used to estimate the positions and velocities of the attacking missile and the target from radar observations (Nesline and Zarchan 1981; Pan et al. 2010).

• Global positioning system navigation
The global positioning system (GPS) is a satellite-based radionavigation system. It provides geolocation and time information to a GPS receiver nearly everywhere on the earth by using measured differences in the arrival times of signals from several GPS satellites. A GPS receiver generally uses the KF and EKF to calculate the current position and velocity of the target (Qi and Moore 2002; Wang et al. 2006). In addition, ephemeris information, which describes the trajectories of naturally occurring astronomical objects and artificial satellites, is typically calculated by using the KF (Xu et al. 2012).

• Guidance, navigation, and control of vehicles
Guidance, navigation and control deal with the design of systems that control the movement of vehicles, particularly dynamically positioned ships, aircraft and spacecraft. Guidance refers to determining the desired trajectory from the current position of a vehicle to a designated target. Navigation refers to determining the location, velocity and attitude at a given time instant. Control refers to applying forces by commanding accelerators, steering and thrusters, etc., to implement the guidance commands while maintaining the stability of the vehicle. The KF and EKF use data observed by various instruments to estimate the position, velocity and angular orientation of the vehicles (Zarchan and Musoff 2000; Kendoul 2012). These instruments include rate gyroscopes, accelerometers, compasses, GPS, and airspeed and barometric pressure sensors.

• Target tracking
Target tracking refers to the process of using measurements from various types of sensors (e.g., radars, acoustic arrays and optical sensors) to determine the position and velocity of a remote target. The simplest tracking system is a single-target tracking system in a clutter-free environment, where only one target exists in the area of interest. When tracking is performed continuously in time, the estimated trajectory of the target is typically obtained by using the KF and EKF (Musicki et al. 2007). Beyond single-target tracking, multi-target tracking is used to track multiple targets simultaneously in the same geographical area. Challenging problems, such as data association and multi-assignment, are usually encountered in multi-target tracking. Once data association is resolved, Kalman filtering techniques can be used to determine the location of each target (Kreucher et al. 2005).

• Medical imaging
Medical imaging refers to techniques for imaging the interior of a body for clinical analysis and medical intervention. Electroencephalography and magnetoencephalography are widely employed in brain imaging. They reconstruct the position and waveform of neuronal activity inside the brain from noisy sensing data, such as scalp potentials. Minimum norm estimation and its variants are typically used to determine the field, with a certain prior structure, from the observations. Combining a dynamical prior with the observations leads to a spatio-temporal estimation problem, which can be solved by using the KF and its variants (Galka et al. 2004; Lamus et al. 2012).

• Structural health monitoring
Structural health monitoring (SHM) refers to the process of evaluating the integrity of civil infrastructure by using periodically sampled response observations. The KF and EKF are common tools for identifying the structural states (i.e., displacement and velocity) and the unknown model parameters that characterize structural systems (Hoshiya and Saito 1984; Koh and See 1994). It is worth noting that the concept and theory of SHM can be employed in many engineering disciplines, such as mechanical engineering, aeronautics and astronautics, electrical and electronic engineering and biomedical engineering. In other words, the KF and its variants can also be used for health monitoring of mechanical, electrical and aerospace systems.

The standard KF and EKF, the fundamental algorithms of this book, are introduced in this chapter in discrete time, since this is the form commonly used in practice. In Sect. 2.2, the essential recursive equations of the standard KF are derived in detail by using Bayes’ theorem, and two applications to state estimation are presented in Sect. 2.3. In Sect. 2.4, the essential recursive equations of the EKF are elaborated, and the EKF with fading memory is introduced to enhance the tracking capability of the filter for time-varying systems. Three representative applications to simultaneous estimation of states and model parameters are presented in Sect. 2.5. Section 2.6 then presents a practical application of the EKF using the SHM data from the Canton Tower. Extended readings on recursive Bayesian filtering are briefly introduced in Sect. 2.7. Finally, concluding remarks are given in Sect. 2.8.

2.2 Standard Kalman Filter

2.2.1 Derivation of the Discrete-Time Kalman Filter

Consider a linear dynamical system with $N_d$ degrees of freedom (DOFs) and the equation of motion:

$$\mathbf{M}\ddot{\mathbf{x}}(t) + \mathbf{C}\dot{\mathbf{x}}(t) + \mathbf{K}\mathbf{x}(t) = \mathbf{T}\mathbf{f}(t) \tag{2.1}$$

where $\mathbf{x}(t)$ denotes the generalized coordinate vector of the system at time $t$; $\mathbf{M}$, $\mathbf{C}$ and $\mathbf{K}$ are the mass, damping and stiffness matrices of the system, respectively; $\mathbf{f}$ is the excitation applied to the system; and $\mathbf{T}$ is the influence matrix associated with the excitation $\mathbf{f}$. The state vector at time $t$, $\mathbf{u}(t)$, is defined to include the displacement vector and the velocity vector:

$$\mathbf{u}(t) \equiv \left[\mathbf{x}(t)^T,\ \dot{\mathbf{x}}(t)^T\right]^T \tag{2.2}$$

Equation (2.1) can be converted to the following state-space representation:

$$\dot{\mathbf{u}}(t) = \mathbf{A}_u\mathbf{u}(t) + \mathbf{B}_u\mathbf{f}(t) \tag{2.3}$$

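where the matrices $\mathbf{A}_u$ and $\mathbf{B}_u$ are given by:

$$\mathbf{A}_u = \begin{bmatrix} \mathbf{0}_{N_d\times N_d} & \mathbf{I}_{N_d} \\ -\mathbf{M}^{-1}\mathbf{K} & -\mathbf{M}^{-1}\mathbf{C} \end{bmatrix} \tag{2.4}$$

$$\mathbf{B}_u = \begin{bmatrix} \mathbf{0}_{N_d\times N_f} \\ \mathbf{M}^{-1}\mathbf{T} \end{bmatrix} \tag{2.5}$$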
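Equations (2.4) and (2.5) can be illustrated with a short Python sketch, assuming NumPy is available. The helper name `continuous_state_space` and the SDOF numbers (unit mass, natural frequency $\omega = 2\pi$ rad/s, damping ratio $\zeta = 0.02$) are illustrative choices, not taken from the book. For an SDOF oscillator, the eigenvalues of $\mathbf{A}_u$ should equal the damped poles $-\zeta\omega \pm i\omega\sqrt{1-\zeta^2}$, which gives a quick check on the block layout.

```python
import numpy as np

def continuous_state_space(M, C, K, T):
    """Assemble A_u (Eq. 2.4) and B_u (Eq. 2.5) from M, C, K and the influence matrix T."""
    M, C, K, T = [np.atleast_2d(np.asarray(X, dtype=float)) for X in (M, C, K, T)]
    n_d = M.shape[0]                     # number of DOFs, N_d
    n_f = T.shape[1]                     # number of excitations, N_f
    M_inv_K = np.linalg.solve(M, K)      # M^{-1} K without forming M^{-1} explicitly
    M_inv_C = np.linalg.solve(M, C)      # M^{-1} C
    M_inv_T = np.linalg.solve(M, T)      # M^{-1} T
    A_u = np.block([
        [np.zeros((n_d, n_d)), np.eye(n_d)],
        [-M_inv_K,             -M_inv_C],
    ])
    B_u = np.vstack([np.zeros((n_d, n_f)), M_inv_T])
    return A_u, B_u

# Hypothetical SDOF oscillator: unit mass, natural frequency omega, damping ratio zeta
omega, zeta = 2.0 * np.pi, 0.02
A_u, B_u = continuous_state_space([[1.0]], [[2.0 * zeta * omega]], [[omega**2]], [[1.0]])
poles = np.linalg.eigvals(A_u)           # expected: -zeta*omega +/- i*omega*sqrt(1 - zeta^2)
print(A_u.shape, B_u.shape, poles)
```

Using `np.linalg.solve` instead of forming $\mathbf{M}^{-1}$ explicitly is a common numerical choice. Both approaches give the same matrices up to round-off.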

Hereafter, $\mathbf{0}_{N_d\times N_d}$ and $\mathbf{I}_{N_d}$ represent the $N_d\times N_d$ zero matrix and the $N_d\times N_d$ identity matrix, respectively; $N_f$ denotes the number of excitations applied to the system, i.e., $\mathbf{f}(t)\in\mathbb{R}^{N_f}$. The excitation is assumed to be constant within each sampling interval:

$$\mathbf{f}(i\Delta t + \xi) = \mathbf{f}(i\Delta t), \quad \forall \xi \in [0, \Delta t),\ i = 0, 1, 2, \ldots \tag{2.6}$$

Under this assumption, Eq. (2.3) can be discretized into a difference equation:

$$\mathbf{u}_{i+1} = \mathbf{A}\mathbf{u}_i + \mathbf{B}\mathbf{f}_i \tag{2.7}$$

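where $\mathbf{u}_i \equiv \mathbf{u}(i\Delta t)$ and $\mathbf{f}_i \equiv \mathbf{f}(i\Delta t)$, and the state transition matrix $\mathbf{A}$ is given by:

$$\mathbf{A} = \exp(\mathbf{A}_u\Delta t) \tag{2.8}$$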

where $\Delta t$ is the sampling time step and the matrix exponential is defined by the power series $\exp(\mathbf{X}) = \sum_{n=0}^{\infty}\mathbf{X}^n/n!$. It is worth noting that the matrix exponential can be calculated by using the function ‘expm’ in MATLAB® (Al-Mohy and Higham 2010). The input-to-state matrix $\mathbf{B}$ is given by:

$$\mathbf{B} = \mathbf{A}_u^{-1}\left(\mathbf{A} - \mathbf{I}_{2N_d}\right)\mathbf{B}_u \tag{2.9}$$
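A minimal Python sketch of Eqs. (2.4)–(2.9) is given below, assuming NumPy and SciPy are available; `scipy.linalg.expm` plays the role of MATLAB's ‘expm’, and the function names `continuous_state_space` and `discretize_zoh` are illustrative. The formula for $\mathbf{B}$ requires $\mathbf{A}_u$ to be invertible, which holds when $\mathbf{K}$ is nonsingular.

```python
import numpy as np
from scipy.linalg import expm


def continuous_state_space(M, C, K, T):
    """Return (A_u, B_u) of Eqs. (2.4)-(2.5) for M x'' + C x' + K x = T f."""
    nd, nf = M.shape[0], T.shape[1]
    Minv = np.linalg.inv(M)
    A_u = np.block([
        [np.zeros((nd, nd)), np.eye(nd)],
        [-Minv @ K, -Minv @ C],
    ])
    B_u = np.vstack([np.zeros((nd, nf)), Minv @ T])
    return A_u, B_u


def discretize_zoh(A_u, B_u, dt):
    """Eqs. (2.8)-(2.9): A = expm(A_u dt), B = A_u^{-1} (A - I) B_u.

    Assumes A_u is invertible and the excitation is held constant over
    each step, as in Eq. (2.6).
    """
    A = expm(A_u * dt)
    # Solve A_u X = (A - I) B_u instead of forming A_u^{-1} explicitly.
    B = np.linalg.solve(A_u, (A - np.eye(A.shape[0])) @ B_u)
    return A, B


# Example: undamped unit oscillator x'' + x = f sampled at dt = 0.1 s.
M = np.array([[1.0]])
C = np.array([[0.0]])
K = np.array([[1.0]])
T = np.array([[1.0]])
A_u, B_u = continuous_state_space(M, C, K, T)
A, B = discretize_zoh(A_u, B_u, dt=0.1)
```

For the undamped unit oscillator the result can be checked by hand: $\mathbf{A}$ is a rotation matrix with entries $\cos\Delta t$ and $\pm\sin\Delta t$, and $\mathbf{B} = [1-\cos\Delta t, \sin\Delta t]^T$. Solving a linear system rather than forming $\mathbf{A}_u^{-1}$ explicitly is generally the more accurate numerical choice.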

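The excitation $\mathbf{f}$ is modeled as zero-mean discrete Gaussian white noise with covariance matrix $\boldsymbol{\Sigma}_f$:

$$E\left[\mathbf{f}_i\right] = \mathbf{0}_{N_f\times 1} \tag{2.10}$$

$$E\left[\mathbf{f}_i\mathbf{f}_j^T\right] = \boldsymbol{\Sigma}_f\,\delta_{ij}, \quad i, j = 0, 1, 2, \ldots \tag{2.11}$$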

where $\mathbf{0}_{N_f\times 1}$ is the $N_f\times 1$ zero column vector; $E[\cdot]$ denotes the mathematical expectation; and $\delta_{ij}$ is the discrete-time unit impulse function (also known as the Kronecker delta) given by:

$$\delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \tag{2.12}$$

It is assumed that discrete-time response measurements are observed at $N_0$ DOFs of the dynamical system. Because of measurement noise, the measured responses differ from the actual responses, so the noise-corrupted measurements can be expressed as follows:

$$\mathbf{z}_{i+1} = \mathbf{H}\mathbf{u}_{i+1} + \mathbf{n}_{i+1} \tag{2.13}$$

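where $\mathbf{H}\in\mathbb{R}^{N_0\times 2N_d}$ is the observation matrix; $\mathbf{n}_{i+1}$ is the measurement noise at the $(i+1)$th time step, and $\mathbf{n}$ is modeled as a Gaussian independent and identically distributed (i.i.d.) process with zero mean and covariance matrix $\boldsymbol{\Sigma}_n$:

$$E\left[\mathbf{n}_i\right] = \mathbf{0}_{N_0\times 1} \tag{2.14}$$

$$E\left[\mathbf{n}_i\mathbf{n}_j^T\right] = \boldsymbol{\Sigma}_n\,\delta_{ij}, \quad i, j = 0, 1, 2, \ldots \tag{2.15}$$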

In addition, the measurement noise $\mathbf{n}$ is assumed to be statistically independent of the excitation $\mathbf{f}$:

$$E\left[\mathbf{f}_i\mathbf{n}_j^T\right] = \mathbf{0}_{N_f\times N_0}, \quad i, j = 0, 1, 2, \ldots \tag{2.16}$$

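First, the a priori state estimate (also referred to as the one-step-ahead predicted state vector) is obtained by computing the expected value of $\mathbf{u}_{i+1}$ in Eq. (2.7) conditioned on the measurement dataset up to the $i$th time step, $\mathcal{D}_i = \{\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_i\}$. Since $\mathbf{f}_i$ has zero mean and is independent of $\mathcal{D}_i$, $E[\mathbf{f}_i|\mathcal{D}_i] = \mathbf{0}$, and:

$$\begin{aligned}
\mathbf{u}_{i+1|i} &\equiv E\left[\mathbf{u}_{i+1}\mid\mathcal{D}_i\right] \\
&= E\left[\mathbf{A}\mathbf{u}_i + \mathbf{B}\mathbf{f}_i\mid\mathcal{D}_i\right] \\
&= \mathbf{A}E\left[\mathbf{u}_i\mid\mathcal{D}_i\right] + \mathbf{B}E\left[\mathbf{f}_i\mid\mathcal{D}_i\right] \\
&= \mathbf{A}\mathbf{u}_{i|i}
\end{aligned} \tag{2.17}$$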

where $\mathbf{u}_{i|i}$ is defined as follows:

$$\mathbf{u}_{i|i} \equiv E\left[\mathbf{u}_i\mid\mathcal{D}_i\right] \tag{2.18}$$

Then, the prediction error between the actual state and the a priori state estimate at the $(i+1)$th time step can be expressed as follows:

$$\boldsymbol{\varepsilon}_{i+1} = \mathbf{u}_{i+1} - \mathbf{u}_{i+1|i} \tag{2.19}$$

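Thus, the covariance matrix of the prediction error is readily obtained as follows:

$$\begin{aligned}
\boldsymbol{\Sigma}_{u,i+1|i} &\equiv E\left[\boldsymbol{\varepsilon}_{i+1}\boldsymbol{\varepsilon}_{i+1}^T\mid\mathcal{D}_i\right] \\
&= E\left[\left(\mathbf{u}_{i+1}-\mathbf{u}_{i+1|i}\right)\left(\mathbf{u}_{i+1}-\mathbf{u}_{i+1|i}\right)^T\mid\mathcal{D}_i\right] \\
&= E\left[\left(\mathbf{A}\mathbf{u}_i+\mathbf{B}\mathbf{f}_i-\mathbf{A}\mathbf{u}_{i|i}\right)\left(\mathbf{A}\mathbf{u}_i+\mathbf{B}\mathbf{f}_i-\mathbf{A}\mathbf{u}_{i|i}\right)^T\mid\mathcal{D}_i\right] \\
&= E\left[\left[\mathbf{A}\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)+\mathbf{B}\mathbf{f}_i\right]\left[\mathbf{A}\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)+\mathbf{B}\mathbf{f}_i\right]^T\mid\mathcal{D}_i\right] \\
&= E\left[\mathbf{A}\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)^T\mathbf{A}^T + \mathbf{B}\mathbf{f}_i\mathbf{f}_i^T\mathbf{B}^T + \mathbf{A}\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)\mathbf{f}_i^T\mathbf{B}^T + \mathbf{B}\mathbf{f}_i\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)^T\mathbf{A}^T\mid\mathcal{D}_i\right] \\
&= \mathbf{A}E\left[\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)^T\mid\mathcal{D}_i\right]\mathbf{A}^T + \mathbf{B}E\left[\mathbf{f}_i\mathbf{f}_i^T\mid\mathcal{D}_i\right]\mathbf{B}^T \\
&\quad + \mathbf{A}E\left[\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)\mathbf{f}_i^T\mid\mathcal{D}_i\right]\mathbf{B}^T + \mathbf{B}E\left[\mathbf{f}_i\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)^T\mid\mathcal{D}_i\right]\mathbf{A}^T
\end{aligned} \tag{2.20}$$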

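Since $\left(\mathbf{u}_i - \mathbf{u}_{i|i}\right)$ is uncorrelated with the zero-mean excitation $\mathbf{f}_i$, the two cross terms in Eq. (2.20) vanish, and Eq. (2.20) simplifies to:

$$\begin{aligned}
\boldsymbol{\Sigma}_{u,i+1|i} &= \mathbf{A}E\left[\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)^T\mid\mathcal{D}_i\right]\mathbf{A}^T + \mathbf{B}E\left[\mathbf{f}_i\mathbf{f}_i^T\mid\mathcal{D}_i\right]\mathbf{B}^T \\
&= \mathbf{A}\boldsymbol{\Sigma}_{u,i|i}\mathbf{A}^T + \mathbf{B}\boldsymbol{\Sigma}_f\mathbf{B}^T
\end{aligned} \tag{2.21}$$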

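where the estimation error $\boldsymbol{\varepsilon}_i$ at the $i$th time step and its covariance matrix $\boldsymbol{\Sigma}_{u,i|i}$ are defined as follows:

$$\boldsymbol{\varepsilon}_i = \mathbf{u}_i - \mathbf{u}_{i|i} \tag{2.22}$$

$$\boldsymbol{\Sigma}_{u,i|i} \equiv E\left[\boldsymbol{\varepsilon}_i\boldsymbol{\varepsilon}_i^T\mid\mathcal{D}_i\right] = E\left[\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)\left(\mathbf{u}_i-\mathbf{u}_{i|i}\right)^T\mid\mathcal{D}_i\right] \tag{2.23}$$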
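The covariance propagation of Eq. (2.21) can be checked numerically. The sketch below uses hypothetical matrices and NumPy's random generator: it samples $\mathbf{u}_i$ and an independent $\mathbf{f}_i$, pushes them through Eq. (2.7), and compares the sample mean and covariance with Eqs. (2.17) and (2.21).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-state system; these numbers are hypothetical.
A = np.array([[0.99, 0.10], [-0.10, 0.98]])
B = np.array([[0.0], [0.1]])
u_ii = np.array([1.0, -0.5])                    # u_{i|i}
Sigma_ii = np.array([[0.5, 0.1], [0.1, 0.3]])   # Sigma_{u,i|i}
Sigma_f = np.array([[2.0]])                     # Sigma_f

# Analytical prediction, Eqs. (2.17) and (2.21).
u_pred = A @ u_ii
Sigma_pred = A @ Sigma_ii @ A.T + B @ Sigma_f @ B.T

# Monte Carlo: sample u_i ~ N(u_{i|i}, Sigma_{u,i|i}) and an independent
# f_i ~ N(0, Sigma_f), then propagate the samples through Eq. (2.7).
n_samples = 200_000
u_i = rng.multivariate_normal(u_ii, Sigma_ii, size=n_samples)
f_i = rng.multivariate_normal(np.zeros(1), Sigma_f, size=n_samples)
u_next = u_i @ A.T + f_i @ B.T

mc_mean = u_next.mean(axis=0)
mc_cov = np.cov(u_next, rowvar=False)
```

The sample statistics match the analytical ones to within Monte Carlo error; if $\mathbf{f}_i$ were instead drawn correlated with $\mathbf{u}_i$, the cross terms of Eq. (2.20) would no longer vanish and the agreement would break down.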



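According to Bayes' theorem, the posterior probability density function (PDF) of $\mathbf{u}_{i+1}$ given the measurement dataset $\mathcal{D}_{i+1}$ up to the $(i+1)$th time step is:

$$p(\mathbf{u}_{i+1}|\mathcal{D}_{i+1}) = p(\mathbf{u}_{i+1}|\mathbf{z}_{i+1},\mathcal{D}_i) = \frac{p(\mathbf{u}_{i+1}|\mathcal{D}_i)\,p(\mathbf{z}_{i+1}|\mathbf{u}_{i+1},\mathcal{D}_i)}{p(\mathbf{z}_{i+1}|\mathcal{D}_i)} \tag{2.24}$$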

where $p(\mathbf{z}_{i+1}|\mathcal{D}_i)$ is a normalizing constant that makes the posterior PDF integrate to unity over the entire state space, and $p(\mathbf{u}_{i+1}|\mathcal{D}_i)$ is the prior PDF of $\mathbf{u}_{i+1}$. Since the excitation is Gaussian and the dynamics are linear, the prior PDF of $\mathbf{u}_{i+1}$ given $\mathcal{D}_i$ is also Gaussian, with mean $\mathbf{u}_{i+1|i}$ and covariance matrix $\boldsymbol{\Sigma}_{u,i+1|i}$:

$$p(\mathbf{u}_{i+1}|\mathcal{D}_i) = (2\pi)^{-N_d}\left|\boldsymbol{\Sigma}_{u,i+1|i}\right|^{-\frac{1}{2}}\exp\left[-\frac{1}{2}\left(\mathbf{u}_{i+1}-\mathbf{u}_{i+1|i}\right)^T\boldsymbol{\Sigma}_{u,i+1|i}^{-1}\left(\mathbf{u}_{i+1}-\mathbf{u}_{i+1|i}\right)\right] \tag{2.25}$$

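The likelihood function $p(\mathbf{z}_{i+1}|\mathbf{u}_{i+1},\mathcal{D}_i)$ reflects the contribution of the measurement $\mathbf{z}_{i+1}$ to the posterior PDF at the $(i+1)$th time step. Because the measurement noise is i.i.d., the likelihood does not depend on $\mathcal{D}_i$ once $\mathbf{u}_{i+1}$ is given, and it is Gaussian with mean $\mathbf{H}\mathbf{u}_{i+1}$ and covariance matrix $\boldsymbol{\Sigma}_n$:

$$p(\mathbf{z}_{i+1}|\mathbf{u}_{i+1},\mathcal{D}_i) = p(\mathbf{z}_{i+1}|\mathbf{u}_{i+1}) = (2\pi)^{-\frac{N_0}{2}}\left|\boldsymbol{\Sigma}_n\right|^{-\frac{1}{2}}\exp\left[-\frac{1}{2}\left(\mathbf{z}_{i+1}-\mathbf{H}\mathbf{u}_{i+1}\right)^T\boldsymbol{\Sigma}_n^{-1}\left(\mathbf{z}_{i+1}-\mathbf{H}\mathbf{u}_{i+1}\right)\right] \tag{2.26}$$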

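By substituting Eqs. (2.25) and (2.26) into Eq. (2.24), the posterior PDF of $\mathbf{u}_{i+1}$ given the measurement dataset $\mathcal{D}_{i+1}$ is obtained as:

$$\begin{aligned}
p(\mathbf{u}_{i+1}|\mathcal{D}_{i+1}) &= \frac{p(\mathbf{u}_{i+1}|\mathcal{D}_i)\,p(\mathbf{z}_{i+1}|\mathbf{u}_{i+1},\mathcal{D}_i)}{p(\mathbf{z}_{i+1}|\mathcal{D}_i)} \\
&= \frac{(2\pi)^{-N_d-\frac{N_0}{2}}\left|\boldsymbol{\Sigma}_{u,i+1|i}\right|^{-\frac{1}{2}}\left|\boldsymbol{\Sigma}_n\right|^{-\frac{1}{2}}}{p(\mathbf{z}_{i+1}|\mathcal{D}_i)}\exp\left[-\frac{1}{2}\left(\mathbf{u}_{i+1}-\mathbf{u}_{i+1|i}\right)^T\boldsymbol{\Sigma}_{u,i+1|i}^{-1}\left(\mathbf{u}_{i+1}-\mathbf{u}_{i+1|i}\right) - \frac{1}{2}\left(\mathbf{z}_{i+1}-\mathbf{H}\mathbf{u}_{i+1}\right)^T\boldsymbol{\Sigma}_n^{-1}\left(\mathbf{z}_{i+1}-\mathbf{H}\mathbf{u}_{i+1}\right)\right] \\
&= \frac{(2\pi)^{-N_d-\frac{N_0}{2}}\left|\boldsymbol{\Sigma}_{u,i+1|i}\right|^{-\frac{1}{2}}\left|\boldsymbol{\Sigma}_n\right|^{-\frac{1}{2}}}{p(\mathbf{z}_{i+1}|\mathcal{D}_i)}\exp\left(-\frac{1}{2}\mathbf{u}_{i+1|i}^T\boldsymbol{\Sigma}_{u,i+1|i}^{-1}\mathbf{u}_{i+1|i} - \frac{1}{2}\mathbf{z}_{i+1}^T\boldsymbol{\Sigma}_n^{-1}\mathbf{z}_{i+1}\right) \\
&\quad\times\exp\left[-\frac{1}{2}\mathbf{u}_{i+1}^T\left(\boldsymbol{\Sigma}_{u,i+1|i}^{-1}+\mathbf{H}^T\boldsymbol{\Sigma}_n^{-1}\mathbf{H}\right)\mathbf{u}_{i+1} + \mathbf{u}_{i+1}^T\left(\boldsymbol{\Sigma}_{u,i+1|i}^{-1}\mathbf{u}_{i+1|i}+\mathbf{H}^T\boldsymbol{\Sigma}_n^{-1}\mathbf{z}_{i+1}\right)\right]
\end{aligned} \tag{2.27}$$

The exponent is quadratic in $\mathbf{u}_{i+1}$, so this posterior PDF is Gaussian and its mode coincides with its mean. As a result, the a posteriori state estimate of $\mathbf{u}_{i+1}$ can be obtained by maximizing the posterior PDF $p(\mathbf{u}_{i+1}|\mathcal{D}_{i+1})$ in Eq. (2.27) or, equivalently, by minimizing the objective function:

$$\mathbf{u}_{i+1|i+1} = \arg\min_{\mathbf{u}_{i+1}} J(\mathbf{u}_{i+1}) \tag{2.28}$$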

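where $\mathbf{u}_{i+1|i+1} \equiv E\left[\mathbf{u}_{i+1}\mid\mathcal{D}_{i+1}\right]$, and the objective function $J(\mathbf{u}_{i+1})$ is defined as the negative logarithm of the posterior PDF $p(\mathbf{u}_{i+1}|\mathcal{D}_{i+1})$:

$$\begin{aligned}
J(\mathbf{u}_{i+1}) &\equiv -\ln p(\mathbf{u}_{i+1}|\mathcal{D}_{i+1}) \\
&= \kappa_0 + \frac{1}{2}\mathbf{u}_{i+1}^T\left(\boldsymbol{\Sigma}_{u,i+1|i}^{-1}+\mathbf{H}^T\boldsymbol{\Sigma}_n^{-1}\mathbf{H}\right)\mathbf{u}_{i+1} - \mathbf{u}_{i+1}^T\left(\boldsymbol{\Sigma}_{u,i+1|i}^{-1}\mathbf{u}_{i+1|i}+\mathbf{H}^T\boldsymbol{\Sigma}_n^{-1}\mathbf{z}_{i+1}\right)
\end{aligned} \tag{2.29}$$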

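where the constant $\kappa_0$ is given by:

$$\kappa_0 = \left(N_d + \frac{N_0}{2}\right)\ln(2\pi) + \frac{1}{2}\ln\left|\boldsymbol{\Sigma}_{u,i+1|i}\right| + \frac{1}{2}\ln\left|\boldsymbol{\Sigma}_n\right| + \ln p(\mathbf{z}_{i+1}|\mathcal{D}_i) + \frac{1}{2}\mathbf{u}_{i+1|i}^T\boldsymbol{\Sigma}_{u,i+1|i}^{-1}\mathbf{u}_{i+1|i} + \frac{1}{2}\mathbf{z}_{i+1}^T\boldsymbol{\Sigma}_n^{-1}\mathbf{z}_{i+1} \tag{2.30}$$

Note that $\kappa_0$ does not depend on $\mathbf{u}_{i+1}$. Setting the gradient of $J$ with respect to $\mathbf{u}_{i+1}$ to zero yields the closed-form solution of the optimization problem in Eq. (2.28):

$$\mathbf{u}_{i+1|i+1} = \left(\boldsymbol{\Sigma}_{u,i+1|i}^{-1}+\mathbf{H}^T\boldsymbol{\Sigma}_n^{-1}\mathbf{H}\right)^{-1}\left(\boldsymbol{\Sigma}_{u,i+1|i}^{-1}\mathbf{u}_{i+1|i}+\mathbf{H}^T\boldsymbol{\Sigma}_n^{-1}\mathbf{z}_{i+1}\right) \tag{2.31}$$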

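To avoid computing the inverse of the $2N_d\times 2N_d$ matrix $\left(\boldsymbol{\Sigma}_{u,i+1|i}^{-1}+\mathbf{H}^T\boldsymbol{\Sigma}_n^{-1}\mathbf{H}\right)$, the Woodbury matrix identity (Henderson and Searle 1981) can be utilized, where $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ and $\mathbf{D}$ here denote generic matrices of compatible dimensions:

$$\left(\mathbf{A}+\mathbf{B}\mathbf{C}\mathbf{D}\right)^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}\left(\mathbf{C}^{-1}+\mathbf{D}\mathbf{A}^{-1}\mathbf{B}\right)^{-1}\mathbf{D}\mathbf{A}^{-1} \tag{2.32}$$

with $\mathbf{A} = \boldsymbol{\Sigma}_{u,i+1|i}^{-1}$, $\mathbf{B} = \mathbf{H}^T$, $\mathbf{C} = \boldsymbol{\Sigma}_n^{-1}$ and $\mathbf{D} = \mathbf{H}$. Only an $N_0\times N_0$ matrix then needs to be inverted, and Eq. (2.31) can be rewritten as follows: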

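$$\mathbf{u}_{i+1|i+1} = \mathbf{u}_{i+1|i} + \mathbf{G}_{i+1}\left(\mathbf{z}_{i+1} - \mathbf{H}\mathbf{u}_{i+1|i}\right) \tag{2.33}$$

where $\mathbf{G}_{i+1}$ is the Kalman gain matrix (Kalman 1960) given by:

$$\mathbf{G}_{i+1} = \boldsymbol{\Sigma}_{u,i+1|i}\mathbf{H}^T\left(\mathbf{H}\boldsymbol{\Sigma}_{u,i+1|i}\mathbf{H}^T + \boldsymbol{\Sigma}_n\right)^{-1} \tag{2.34}$$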
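As an illustration of why the Woodbury identity is useful, the following sketch (random, hypothetical matrices; NumPy assumed) computes the updated state both from the information form of Eq. (2.31) and from the gain form of Eqs. (2.33)–(2.34). The two agree to rounding error, but the gain form only solves a linear system of size $N_0$ rather than $2N_d$.

```python
import numpy as np

rng = np.random.default_rng(1)


def random_spd(n, rng):
    """Random symmetric positive-definite matrix, for illustration only."""
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)


n_state, n_obs = 4, 2
P = random_spd(n_state, rng)             # plays the role of Sigma_{u,i+1|i}
R = random_spd(n_obs, rng)               # plays the role of Sigma_n
H = rng.standard_normal((n_obs, n_state))
u_pred = rng.standard_normal(n_state)    # u_{i+1|i}
z = rng.standard_normal(n_obs)           # z_{i+1}

# Information form, Eq. (2.31): inverts P and solves an n_state x n_state system.
P_inv, R_inv = np.linalg.inv(P), np.linalg.inv(R)
u_info = np.linalg.solve(P_inv + H.T @ R_inv @ H, P_inv @ u_pred + H.T @ R_inv @ z)

# Gain form, Eqs. (2.33)-(2.34): only the n_obs x n_obs matrix S is factorized.
S = H @ P @ H.T + R
G = np.linalg.solve(S, H @ P).T          # equals P H^T S^{-1} since P and S are symmetric
u_gain = u_pred + G @ (z - H @ u_pred)
```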

Moreover, Eq. (2.27) shows that the posterior PDF of $\mathbf{u}_{i+1}$ is a multivariate Gaussian distribution with mean $\mathbf{u}_{i+1|i+1}$ given by Eq. (2.33) and covariance matrix given as follows:



$$\begin{aligned}
\boldsymbol{\Sigma}_{u,i+1|i+1} &= \left(\boldsymbol{\Sigma}_{u,i+1|i}^{-1}+\mathbf{H}^T\boldsymbol{\Sigma}_n^{-1}\mathbf{H}\right)^{-1} \\
&= \boldsymbol{\Sigma}_{u,i+1|i} - \boldsymbol{\Sigma}_{u,i+1|i}\mathbf{H}^T\left(\boldsymbol{\Sigma}_n+\mathbf{H}\boldsymbol{\Sigma}_{u,i+1|i}\mathbf{H}^T\right)^{-1}\mathbf{H}\boldsymbol{\Sigma}_{u,i+1|i} \\
&= \left(\mathbf{I}_{2N_d}-\mathbf{G}_{i+1}\mathbf{H}\right)\boldsymbol{\Sigma}_{u,i+1|i}
\end{aligned} \tag{2.35}$$

Equations (2.17), (2.21) and (2.33)–(2.35) constitute the essential recursive formulae of the standard KF. The derivation starts from a mathematical description of the dynamical system. Under the Gaussian assumption, the conditional PDF of the state is fully described by its mean vector and covariance matrix. When a new measurement becomes available, the posterior PDF of the state is obtained by using Bayes' theorem, and the estimated state vector and its covariance matrix follow analytically from the solution of an optimization problem.

The KF estimate of the state is a weighted average of the state predicted by the model and the information carried by the new observation, with the weights determined by the covariance matrices of the predicted state and of the measurement noise. Furthermore, the KF updates the state vector and its covariance matrix recursively over time, using the dynamical model of the system and the incoming measurements; that is, both are updated at each time step when a new observation arrives. As a result, the KF is well suited to real-time applications.

In summary, the procedure of the standard KF algorithm is shown in Fig. 2.1. The algorithm starts with arbitrary but admissible initial values of the state vector $\mathbf{u}_{0|0}$ and covariance matrix $\boldsymbol{\Sigma}_{u,0|0}$. Then, the one-step-ahead predicted state vector and its covariance matrix are obtained by using Eqs. (2.17) and (2.21), respectively. After calculating the Kalman gain matrix using Eq. (2.34), the updated state vector and its covariance matrix are obtained by using Eqs. (2.33) and (2.35), respectively.
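The recursion summarized above can be written compactly as follows. This is a minimal sketch assuming NumPy and time-invariant $\mathbf{A}$, $\mathbf{B}$, $\mathbf{H}$, $\boldsymbol{\Sigma}_f$ and $\boldsymbol{\Sigma}_n$; the function names `kf_step` and `run_kf` are hypothetical, and the final symmetrization is a common numerical safeguard rather than part of Eq. (2.35).

```python
import numpy as np


def kf_step(u, P, z, A, B, Sigma_f, H, Sigma_n):
    """One standard KF cycle: Eqs. (2.17), (2.21), (2.34), (2.33), (2.35).

    u, P are u_{i|i} and Sigma_{u,i|i}; returns u_{i+1|i+1} and Sigma_{u,i+1|i+1}.
    """
    # Prediction, Eqs. (2.17) and (2.21).
    u_pred = A @ u
    P_pred = A @ P @ A.T + B @ Sigma_f @ B.T
    # Kalman gain, Eq. (2.34), computed by solving with S instead of inverting it.
    S = H @ P_pred @ H.T + Sigma_n
    G = np.linalg.solve(S, H @ P_pred).T
    # Update, Eqs. (2.33) and (2.35).
    u_new = u_pred + G @ (z - H @ u_pred)
    P_new = (np.eye(u.size) - G @ H) @ P_pred
    return u_new, 0.5 * (P_new + P_new.T)   # symmetrize against round-off


def run_kf(zs, u0, P0, A, B, Sigma_f, H, Sigma_n):
    """Filter measurements z_1, z_2, ... starting from (u_{0|0}, Sigma_{u,0|0})."""
    u, P = u0, P0
    history = []
    for z in zs:
        u, P = kf_step(u, P, z, A, B, Sigma_f, H, Sigma_n)
        history.append((u.copy(), P.copy()))
    return history
```

For a scalar check, a prior variance of 4, a noise variance of 1 and no process noise give a gain of $4/(4+1) = 0.8$, so a measurement $z = 2$ moves the estimate from 0 to 1.6, and the posterior variance is $4\times 1/(4+1) = 0.8$.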

Fig. 2.1 Flow chart of the standard KF algorithm:
1. Set initial values $\mathbf{u}_{0|0}$ and $\boldsymbol{\Sigma}_{u,0|0}$.
2. Compute the one-step-ahead predicted state vector by using Eq. (2.17).
3. Compute the covariance matrix by using Eq. (2.21).
4. Obtain the Kalman gain matrix by using Eq. (2.34).
5. Compute the updated state vector by using Eq. (2.33).
6. Compute the covariance matrix by using Eq. (2.35).

2.3 Applications to State Estimation

2.3.1 Vehicle Tracking Problem

The first example uses the standard KF to solve a simple vehicle tracking problem. Consider a car moving in a plane, as shown in Fig. 2.2. The vehicle can apply acceleration inputs to perform turning maneuvers, and it has an on-board location sensor that reports its x and y coordinates.

Fig. 2.2 A car moving in a plane

The state vector of the vehicle tracking system is defined as follows:

$$\mathbf{u}_i \equiv \left[x_i, \dot{x}_i, \ddot{x}_i, y_i, \dot{y}_i, \ddot{y}_i\right]^T \tag{2.36}$$

where $x_i$, $\dot{x}_i$ and $\ddot{x}_i$ are the position, velocity and acceleration of the vehicle in the x-direction at the $i$th time step, respectively, and $y_i$, $\dot{y}_i$ and $\ddot{y}_i$ are the position, velocity and acceleration in the y-direction at the $i$th time step, respectively. Then, based on kinematics, the state-space representation of the vehicle tracking system can be expressed as follows:

$$\mathbf{u}_{i+1} = \mathbf{A}\mathbf{u}_i + \mathbf{B}\mathbf{w}_i \tag{2.37}$$

The state transition matrix $\mathbf{A}$ is given by:

$$\mathbf{A} = \begin{bmatrix} 1 & \Delta t & 0.5\Delta t^2 & 0 & 0 & 0 \\ 0 & 1 & \Delta t & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & \Delta t & 0.5\Delta t^2 \\ 0 & 0 & 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \tag{2.38}$$

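where $\Delta t$ is the sampling time step; $\mathbf{B} = \mathbf{I}_6$ is the influence matrix of the process noise; and $\mathbf{w}_i$ is the process noise at the $i$th time step, modeled as a zero-mean Gaussian i.i.d. process with covariance matrix $\boldsymbol{\Sigma}_w$ given as follows:

$$\boldsymbol{\Sigma}_w = \begin{bmatrix} \frac{\Delta t^4}{4} & \frac{\Delta t^3}{2} & \frac{\Delta t^2}{2} & 0 & 0 & 0 \\ \frac{\Delta t^3}{2} & \Delta t^2 & \Delta t & 0 & 0 & 0 \\ \frac{\Delta t^2}{2} & \Delta t & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & \frac{\Delta t^4}{4} & \frac{\Delta t^3}{2} & \frac{\Delta t^2}{2} \\ 0 & 0 & 0 & \frac{\Delta t^3}{2} & \Delta t^2 & \Delta t \\ 0 & 0 & 0 & \frac{\Delta t^2}{2} & \Delta t & 1 \end{bmatrix}\sigma_w^2 \tag{2.39}$$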

where $\sigma_w$ is the standard deviation of the process noise. The process noises in the x and y directions are assumed to be uncorrelated, whereas within each direction the process noise is correlated among the position, velocity and acceleration. The KF can then be used to update the state variables in real time.
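The matrices in Eqs. (2.38) and (2.39) can be assembled per axis and then placed in block-diagonal form, which also makes the assumed independence of the x and y directions explicit. The sketch below assumes NumPy; the function name `vehicle_model` is illustrative. Each 3×3 block of $\boldsymbol{\Sigma}_w$ equals $\sigma_w^2\mathbf{g}\mathbf{g}^T$ with $\mathbf{g} = [\Delta t^2/2, \Delta t, 1]^T$, so within each direction the process noise is fully correlated among position, velocity and acceleration.

```python
import numpy as np


def vehicle_model(dt, sigma_w):
    """Transition matrix A (Eq. 2.38) and process-noise covariance Sigma_w (Eq. 2.39).

    State ordering follows Eq. (2.36): [x, x_dot, x_ddot, y, y_dot, y_ddot].
    """
    # Constant-acceleration kinematics for one axis.
    F = np.array([[1.0, dt, 0.5 * dt**2],
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]])
    # Per-axis noise block: sigma_w^2 * g g^T with g = [dt^2/2, dt, 1].
    g = np.array([0.5 * dt**2, dt, 1.0])
    Q = sigma_w**2 * np.outer(g, g)
    Z = np.zeros((3, 3))
    A = np.block([[F, Z], [Z, F]])
    Sigma_w = np.block([[Q, Z], [Z, Q]])   # x and y noises uncorrelated
    return A, Sigma_w


A, Sigma_w = vehicle_model(dt=1.0, sigma_w=0.02)
```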



It is assumed that the x and y coordinates of the vehicle were measured by the on-board location sensor with sampling time step $\Delta t$. The measurement equation is given by:

$$\mathbf{z}_i = \mathbf{H}\mathbf{u}_i + \mathbf{n}_i \tag{2.40}$$

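where $\mathbf{z}_i\in\mathbb{R}^2$ is the measurement vector at the $i$th time step; $\mathbf{n}_i$ is the measurement noise at the $i$th time step, modeled as a zero-mean Gaussian i.i.d. process with covariance matrix $\boldsymbol{\Sigma}_n$; and $\mathbf{H}$ is the observation matrix given as follows:

$$\mathbf{H} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix} \tag{2.41}$$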

The vehicle moved in a straight line in the x-direction with a constant velocity of 10 m/s. After moving 600 m in the x-direction, the vehicle turned right and traveled along a semicircular path of radius 100 m with a constant angular velocity of 0.016 rad/s. Then, the vehicle turned left and traveled along another semicircular path with the same radius and the same constant angular velocity. The sampling time step was $\Delta t = 1$ s and the entire tracking duration was 462 s. The positions of the vehicle in the x and y directions were observed, and the rms of the measurement noise was taken as 2% of the rms of the corresponding actual positions. The standard deviation of the acceleration was taken as $\sigma_w = 0.02$ m²/s⁴, and the covariance matrix of the measurement noise was taken as $\boldsymbol{\Sigma}_n = 3\mathbf{I}_2$ m². The actual trajectory and the measurements of the vehicle are shown in Fig. 2.3.

The standard KF was then performed according to Eqs. (2.37) and (2.40) to estimate the positions, velocities and accelerations of the vehicle. Figure 2.4 compares the actual and estimated positions of the vehicle. The standard KF tracked the vehicle positions quite well. Small deviations of the estimated values from the actual positions could be observed when the vehicle began the turning maneuver, because each estimate was obtained from the data points at the current and previous time steps only. Nevertheless, the estimated values approached the actual positions as more data points were acquired.

Figure 2.5 shows the estimated velocities of the vehicle in the x and y directions. The dotted lines represent the estimated values, the solid lines represent the actual values, and the dashed lines represent the bounds of the 99.7% credible intervals; the same line styles are used in later figures. When the vehicle moved along the straight line in the x-direction, the actual velocities in the x and y directions were 10 m/s and 0 m/s, respectively. Although the turning angular velocity was constant, the projections of the velocity onto the x and y directions varied with the position of the vehicle. In Fig. 2.5, the estimated velocities fluctuated severely in the early stage of filtering because of the inaccurate initial values and the lack of measurements. After more data points were acquired from the on-board location sensor, the estimated velocities in the x and y directions agreed well with the corresponding actual values during both the uniform rectilinear motion and the turning maneuver. A small time delay was observed when the x-directional velocity changed suddenly, again because each estimate was based on the data points of the previous and current time steps. Nevertheless, the delay was slight, and the estimated x-directional velocity recovered promptly after some data points were acquired during the turning maneuver.

Figure 2.6 shows the estimated accelerations of the vehicle in the x and y directions. When the vehicle moved along a straight line in the x-direction with constant velocity, the actual accelerations in the x and y directions were both zero. Since the vehicle turned with a constant angular velocity, its angular acceleration was zero, and its centripetal acceleration ($\omega^2 r = 0.016^2\times 100 \approx 0.026$ m/s²) was small, so the actual accelerations in the x and y directions remained close to zero. Again, the estimated accelerations fluctuated strongly in the early stage of identification, since only a limited amount of measured data was available. After more data points were acquired, the actual and estimated accelerations in the x and y directions showed good agreement in Fig. 2.6. Moreover, a small valley appeared in the estimated x-directional acceleration, corresponding to the sudden drop of the x-directional velocity. In addition, the credible intervals provided reasonable uncertainty levels for the estimates. It is therefore concluded that the standard KF can provide accurate estimates, with reasonable uncertainty levels, for the vehicle tracking system.

Fig. 2.3 Actual movement trajectory and the measurements of the vehicle

Fig. 2.4 Comparison between the actual and estimated values of the vehicle position

Fig. 2.5 Estimation results of the velocities of the vehicle

Fig. 2.6 Estimation results of the accelerations of the vehicle
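The following sketch reproduces only the straight-line leg of this example with synthetic data, to show the filter loop of Eqs. (2.33)–(2.35) applied to the model of Eqs. (2.37)–(2.41). It assumes NumPy; the measurement noise (covariance $3\mathbf{I}_2$ m², matching the filter's $\boldsymbol{\Sigma}_n$), the initial values and the random seed are hypothetical choices, the quoted 0.02 is used as $\sigma_w$, and the output is not the result shown in Figs. 2.3–2.6.

```python
import numpy as np


def vehicle_matrices(dt, sigma_w):
    """A (Eq. 2.38), Sigma_w (Eq. 2.39) and H (Eq. 2.41) for the tracking model."""
    F = np.array([[1.0, dt, 0.5 * dt**2], [0.0, 1.0, dt], [0.0, 0.0, 1.0]])
    g = np.array([0.5 * dt**2, dt, 1.0])
    Q = sigma_w**2 * np.outer(g, g)
    Z = np.zeros((3, 3))
    A = np.block([[F, Z], [Z, F]])
    Sigma_w = np.block([[Q, Z], [Z, Q]])
    H = np.zeros((2, 6))
    H[0, 0] = H[1, 3] = 1.0          # observe the x and y positions only
    return A, Sigma_w, H


dt, n_steps = 1.0, 60                # 600 m straight leg at 10 m/s
A, Sigma_w, H = vehicle_matrices(dt, sigma_w=0.02)
Sigma_n = 3.0 * np.eye(2)            # measurement-noise covariance used by the filter

# Hypothetical synthetic data: straight-line motion plus Gaussian noise with
# covariance Sigma_n (not the 2%-rms noise of the book's example).
rng = np.random.default_rng(0)
t = dt * np.arange(1, n_steps + 1)
true_xy = np.column_stack([10.0 * t, np.zeros_like(t)])
zs = true_xy + rng.multivariate_normal(np.zeros(2), Sigma_n, size=n_steps)

u = np.zeros(6)                      # hypothetical u_{0|0}: vehicle assumed at rest
P = 100.0 * np.eye(6)                # hypothetical, deliberately vague Sigma_{u,0|0}
for z in zs:
    u_pred = A @ u                                  # Eq. (2.17)
    P_pred = A @ P @ A.T + Sigma_w                  # Eq. (2.21) with B = I_6
    S = H @ P_pred @ H.T + Sigma_n
    G = np.linalg.solve(S, H @ P_pred).T            # Eq. (2.34)
    u = u_pred + G @ (z - H @ u_pred)               # Eq. (2.33)
    P = (np.eye(6) - G @ H) @ P_pred                # Eq. (2.35)

vx_est, vx_std = u[1], np.sqrt(P[1, 1])
```

Even though only positions are measured, the filter recovers the x-velocity close to 10 m/s, because the kinematic coupling in $\mathbf{A}$ links successive positions to the velocity state.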

2.3.2 Sixty-Story Building

The second example considers a 60-story shear building with uniformly distributed floor mass and interstory stiffness; the stiffness-to-mass ratio was taken as 1632 s⁻². As a result, the fundamental frequency of the building was 0.1669 Hz. The Rayleigh damping model was used, with damping matrix $\mathbf{C} = \alpha\mathbf{M} + \beta\mathbf{K}$, where $\alpha = 0.0157$ s⁻¹ and $\beta = 0.0048$ s, so that the damping ratios of the first two modes were 1.0%. The building was subjected to ground excitation modeled as zero-mean Gaussian white noise with spectral intensity $S_0 = 1.6\times 10^{-5}$ m²/s³. The influence matrix $\mathbf{T}$ in Eq. (2.1) for the ground excitation was given by:

$$\mathbf{T} = \mathbf{M}\left[-1, -1, \ldots, -1\right]^T \tag{2.42}$$
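The modal properties quoted above can be checked with a few lines of NumPy. This sketch assumes unit floor mass (only the stiffness-to-mass ratio matters) and the usual fixed-base shear-building stiffness matrix; for Rayleigh damping, the modal damping ratio of mode $r$ is $\zeta_r = \alpha/(2\omega_r) + \beta\omega_r/2$.

```python
import numpy as np

n_floors = 60
k_over_m = 1632.0             # interstory stiffness / floor mass, s^-2 (from the text)
alpha, beta = 0.0157, 0.0048  # Rayleigh coefficients, s^-1 and s

# Unit floor mass is assumed, since only the ratio k/m affects the frequencies.
M = np.eye(n_floors)
K = k_over_m * (2.0 * np.eye(n_floors)
                - np.eye(n_floors, k=1) - np.eye(n_floors, k=-1))
K[-1, -1] = k_over_m          # the roof is restrained by a single story spring
C = alpha * M + beta * K      # Rayleigh damping matrix
T = M @ (-np.ones((n_floors, 1)))   # ground-excitation influence matrix, Eq. (2.42)

# With M = I, the eigenvalues of K are the squared natural frequencies (rad/s)^2.
omega = np.sqrt(np.linalg.eigvalsh(K))
f1 = omega[0] / (2.0 * np.pi)
zeta = alpha / (2.0 * omega) + beta * omega / 2.0   # modal damping of C = alpha M + beta K
```

With these values, the first natural frequency comes out at about 0.1669 Hz and the first two damping ratios at about 1.0%, consistent with the text.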



The entire monitoring duration was 200 s and the sampling frequency was 200 Hz. Acceleration responses of the 1st, 10th, 20th, 30th, 40th, 50th and 60th floors were observed, and the rms of the measurement noise was taken as 5% of the rms of the corresponding noise-free responses. Figure 2.7 shows the acceleration measurements of the 1st, 30th and 60th floors. The standard KF was then implemented to update the structural displacement and velocity responses recursively. The initial values of the estimated state vector and its covariance matrix were given as follows:

$$\mathbf{u}_{0|0} = \mathbf{0}_{2N_d\times 1} \tag{2.43}$$

$$\boldsymbol{\Sigma}_{u,0|0} = \mathbf{I}_{2N_d} \tag{2.44}$$

Figure 2.8 compares the actual and estimated displacement responses of the 10th, 30th and 60th floors, and Fig. 2.9 compares the actual and estimated velocity responses of the 10th, 30th and 60th floors. The dotted lines represent the estimated values and the solid lines represent the actual values. The two sets of curves are virtually on top of each other, showing that the standard KF provides accurate displacement and velocity estimates. Figures 2.10 and 2.11 show the time histories of the log standard deviations of the estimated displacement and velocity responses, respectively, of the 1st, 30th and 60th floors. At the beginning of the filtering process, the standard deviations of the estimates were large because of the large prior uncertainty. As more data points were acquired, the standard deviations decreased significantly and converged rapidly to small, steady values. In other words, confidence in the estimates increases as more measurements are acquired, and the influence of the initial values soon becomes negligible. This example demonstrates that the standard KF can provide accurate state estimates for linear dynamical systems.

Fig. 2.7 Measurements at the 1st, 30th and 60th floors

Fig. 2.8 Actual and estimated displacement responses of the 1st, 30th and 60th floors

Fig. 2.9 Actual and estimated velocity responses of the 1st, 30th and 60th floors

Fig. 2.10 Log standard deviation of the estimated displacement responses of the 1st, 30th and 60th floors

Fig. 2.11 Log standard deviation of the estimated velocity responses of the 1st, 30th and 60th floors

2.4 Extended Kalman Filter

In the previous sections, the linear KF was introduced for the identification of linear dynamical systems. In practical applications, however, the dynamical system and the observation model are often nonlinear, and the standard KF is then not applicable for state estimation. It is therefore necessary to explore filtering techniques applicable to nonlinear systems. Furthermore, when unknown system parameters are to be identified, the problem becomes nonlinear even for a linear dynamical system, because the parameters multiply the states (e.g., the term $kx$ when the stiffness $k$ is unknown); this will be elaborated later. Popular nonlinear filters include the EKF, the unscented Kalman filter and the particle filter. In this section, we introduce the most widely used of these, the EKF. The core idea of the EKF is to combine a Gaussian approximation with a linear system that approximates the underlying nonlinear dynamical system locally, so that the standard KF algorithm can be employed for state estimation. In the EKF, this approximation is realized by a Taylor series expansion of the nonlinearities in the dynamical system and the observation model. In this book, we consider only the first-order EKF, which is what "EKF" usually refers to. The EKF is suitable for slightly nonlinear dynamical systems; for highly nonlinear dynamical systems, higher-order EKFs can be considered to reduce the linearization error.

2.4.1 Derivation of the Extended Kalman Filter

Consider a general, possibly nonlinear, dynamical system with $N_d$ DOFs and equation of motion:

$$\mathbf{M}\ddot{\mathbf{x}}(t) + \mathbf{R}\left(\mathbf{x}(t), \dot{\mathbf{x}}(t); \boldsymbol{\theta}(t)\right) = \mathbf{T}\mathbf{f}(t) \tag{2.45}$$

where $\mathbf{x}(t)$ denotes the generalized coordinate vector of the system at time $t$; $\mathbf{M}$ is the mass matrix; the function $\mathbf{R}(\cdot,\cdot;\cdot)$ represents the general linear or nonlinear restoring force governed by the possibly time-varying model parameters $\boldsymbol{\theta}(t)\in\mathbb{R}^{N_\theta}$; $\mathbf{f}$ is the excitation applied to the system; and $\mathbf{T}$ is the influence matrix associated with $\mathbf{f}$. Define the augmented state vector as follows:

$$\mathbf{y}(t) \equiv \left[\mathbf{x}(t)^T, \dot{\mathbf{x}}(t)^T, \boldsymbol{\theta}(t)^T\right]^T \tag{2.46}$$

Then, the dynamical system in Eq. (2.45) can be rewritten as:

$$\dot{\mathbf{y}}(t) = \mathbf{g}\left(\mathbf{y}(t), \mathbf{f}(t); \boldsymbol{\theta}(t)\right) \tag{2.47}$$

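where $\mathbf{g}(\cdot,\cdot;\cdot)$ is the nonlinear state-space function that characterizes the dynamical system in Eq. (2.45). Equation (2.47) can be expanded in a Taylor series around the state $\mathbf{y}(t) = \mathbf{y}^*$ as follows:

$$\dot{\mathbf{y}}(t) = \mathbf{g}\big|_{\mathbf{y}=\mathbf{y}^*} + \left.\frac{\partial\mathbf{g}}{\partial\mathbf{y}}\right|_{\mathbf{y}=\mathbf{y}^*}\left(\mathbf{y}-\mathbf{y}^*\right) + \left.\frac{\partial\mathbf{g}}{\partial\mathbf{f}}\right|_{\mathbf{y}=\mathbf{y}^*}\mathbf{f} \tag{2.48}$$

where higher-order terms are neglected under the assumption that $\mathbf{y}$ is close to $\mathbf{y}^*$. Then, Eq. (2.48) can be rewritten as:

$$\dot{\mathbf{y}}(t) = \mathbf{A}_y(t)\mathbf{y}(t) + \mathbf{B}_y\mathbf{f}(t) + \boldsymbol{\delta}_y(t) \tag{2.49}$$

where the matrix $\mathbf{A}_y$ is given by: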



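$$\mathbf{A}_y(t) = \left.\frac{\partial\mathbf{g}}{\partial\mathbf{y}}\right|_{\mathbf{y}=\mathbf{y}^*} = \begin{bmatrix}\mathbf{0}_{N_d\times N_d} & \mathbf{I}_{N_d} & \mathbf{0}_{N_d\times N_\theta} \\ -\mathbf{M}^{-1}\frac{\partial\mathbf{R}}{\partial\mathbf{x}} & -\mathbf{M}^{-1}\frac{\partial\mathbf{R}}{\partial\dot{\mathbf{x}}} & -\mathbf{M}^{-1}\frac{\partial\mathbf{R}}{\partial\boldsymbol{\theta}} \\ \mathbf{0}_{N_\theta\times N_d} & \mathbf{0}_{N_\theta\times N_d} & -\eta\mathbf{I}_{N_\theta}\end{bmatrix} \tag{2.50}$$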

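where $\eta$ is a small positive number that avoids singularity when the inverse of $\mathbf{A}_y$ is calculated (with $\dot{\boldsymbol{\theta}} = \mathbf{0}$ exactly, the last block row of $\mathbf{A}_y$ would be zero). The matrix $\mathbf{B}_y$ is given by:

$$\mathbf{B}_y = \left.\frac{\partial\mathbf{g}}{\partial\mathbf{f}}\right|_{\mathbf{y}=\mathbf{y}^*} = \begin{bmatrix}\mathbf{0}_{N_d\times N_f} \\ \mathbf{M}^{-1}\mathbf{T} \\ \mathbf{0}_{N_\theta\times N_f}\end{bmatrix} \tag{2.51}$$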

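and $\boldsymbol{\delta}_y(t)$ is the remainder term due to the local linear approximation, given by:

$$\boldsymbol{\delta}_y(t) = \mathbf{g}\big|_{\mathbf{y}=\mathbf{y}^*} - \left.\frac{\partial\mathbf{g}}{\partial\mathbf{y}}\right|_{\mathbf{y}=\mathbf{y}^*}\mathbf{y}^* \tag{2.52}$$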

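By assuming that the excitation is constant within any time interval, i.e., $\mathbf{f}(i\Delta t + \xi) = \mathbf{f}(i\Delta t)$, $\forall\xi\in[0, \Delta t)$, $i = 0, 1, 2, \ldots$, Eq. (2.49) can be discretized into the following difference equation:

$$\mathbf{y}_{i+1} = \mathbf{A}_i\mathbf{y}_i + \mathbf{B}_i\mathbf{f}_i + \boldsymbol{\delta}_i \tag{2.53}$$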

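where $\mathbf{y}_i \equiv \mathbf{y}(i\Delta t)$, $\mathbf{f}_i \equiv \mathbf{f}(i\Delta t)$, and $\Delta t$ is the sampling time step; $\mathbf{A}_i$, $\mathbf{B}_i$ and $\boldsymbol{\delta}_i$ are given by:

$$\mathbf{A}_i = \exp\left(\mathbf{A}_y\Delta t\right) \tag{2.54}$$

$$\mathbf{B}_i = \mathbf{A}_y^{-1}\left(\mathbf{A}_i - \mathbf{I}_{2N_d+N_\theta}\right)\mathbf{B}_y \tag{2.55}$$

$$\boldsymbol{\delta}_i = \mathbf{A}_y^{-1}\left(\mathbf{A}_i - \mathbf{I}_{2N_d+N_\theta}\right)\boldsymbol{\delta}_y(i\Delta t) \tag{2.56}$$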
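Equations (2.54)–(2.56) translate directly into code. The sketch below assumes NumPy and SciPy and uses a hypothetical three-state linearization (displacement, velocity and one parameter); the function name `discretize_linearized` is illustrative.

```python
import numpy as np
from scipy.linalg import expm


def discretize_linearized(A_y, B_y, delta_y, dt):
    """A_i, B_i, delta_i of Eqs. (2.54)-(2.56) for the linearized model (2.49).

    Requires A_y to be invertible; the -eta*I block of Eq. (2.50) keeps the
    parameter rows from making A_y singular.
    """
    n = A_y.shape[0]
    A_i = expm(A_y * dt)                            # Eq. (2.54)
    Phi = np.linalg.solve(A_y, A_i - np.eye(n))     # A_y^{-1} (A_i - I)
    return A_i, Phi @ B_y, Phi @ delta_y            # Eqs. (2.55) and (2.56)


# Hypothetical 3-state linearization: [displacement, velocity, one parameter].
eta = 1e-6
A_y = np.array([[0.0, 1.0, 0.0],
                [-4.0, -0.2, -0.5],
                [0.0, 0.0, -eta]])
B_y = np.array([[0.0], [1.0], [0.0]])
delta_y = np.array([0.0, 0.3, 0.0])
dt = 0.005
A_i, B_i, delta_i = discretize_linearized(A_y, B_y, delta_y, dt)
```

An alternative that avoids $\mathbf{A}_y^{-1}$ altogether is to exponentiate $\Delta t$ times the square block matrix whose top rows are $[\mathbf{A}_y, \mathbf{B}_y, \boldsymbol{\delta}_y]$ and whose remaining rows are zero; the upper-right blocks of the result equal $\mathbf{B}_i$ and $\boldsymbol{\delta}_i$ (Van Loan's block-exponential approach). The sketch follows the text's formulas instead.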

Discrete-time response measurements $\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_{i+1}$ are acquired at $N_0$ DOFs:

$$\mathbf{z}_{i+1} = \mathbf{h}\left(\mathbf{y}_{i+1}\right) + \mathbf{n}_{i+1} \tag{2.57}$$

where $\mathbf{h}(\cdot)$ defines the observed quantities; $\mathbf{n}_{i+1}$ represents the measurement noise at the $(i+1)$th time step, modeled as a Gaussian i.i.d. process with zero mean and covariance matrix $\boldsymbol{\Sigma}_n\in\mathbb{R}^{N_0\times N_0}$. In the standard KF, the measurement equation is the linear function given by Eq. (2.13). In the EKF, however, the measurement equation in Eq. (2.57) is often a nonlinear function of the augmented state vector $\mathbf{y}_{i+1}$. For example, when acceleration responses are observed and the structural parameters are included in the augmented state vector, terms such as $kx$ and $c\dot{x}$ are products of state components and are therefore nonlinear. In general, the observation equation is nonlinear, and the corresponding observation matrix can be written as follows:

$$\mathbf{H}_{i+1} = \left.\frac{\partial\mathbf{h}}{\partial\mathbf{y}}\right|_{\mathbf{y}=\mathbf{y}_{i+1}} \tag{2.58}$$

On the other hand, if displacement or velocity responses are observed, the observation equation becomes the linear function:

$$\mathbf{z}_{i+1} = \mathbf{H}\mathbf{y}_{i+1} + \mathbf{n}_{i+1} \tag{2.59}$$

where the observation matrix $\mathbf{H}$ is constant. Since the state-space representation of the dynamical model in Eq. (2.53) and the observation equation in Eq. (2.57) are now available, the EKF algorithm can be obtained by following the same procedure as in the derivation of the standard KF. First, given the measurement dataset $\mathcal{D}_i = \{\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_i\}$, the one-step-ahead predicted state vector $\mathbf{y}_{i+1|i}$ is obtained by taking the expected value of Eq. (2.53):

$$\mathbf{y}_{i+1|i} \equiv E\left[\mathbf{y}_{i+1}\mid\mathcal{D}_i\right] = \mathbf{A}_i\mathbf{y}_{i|i} + \boldsymbol{\delta}_i \tag{2.60}$$

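The associated covariance matrix $\boldsymbol{\Sigma}_{i+1|i}$ is obtained as follows:

$$\boldsymbol{\Sigma}_{i+1|i} \equiv E\left[\left(\mathbf{y}_{i+1}-\mathbf{y}_{i+1|i}\right)\left(\mathbf{y}_{i+1}-\mathbf{y}_{i+1|i}\right)^T\mid\mathcal{D}_i\right] = \mathbf{A}_i\boldsymbol{\Sigma}_{i|i}\mathbf{A}_i^T + \mathbf{B}_i\boldsymbol{\Sigma}_f\mathbf{B}_i^T \tag{2.61}$$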

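When a new data point $\mathbf{z}_{i+1}$ is available, the updated state vector $\mathbf{y}_{i+1|i+1}$ and its associated covariance matrix $\boldsymbol{\Sigma}_{i+1|i+1}$ can be obtained by maximizing the conditional PDF $p(\mathbf{y}_{i+1}|\mathcal{D}_{i+1})$:

$$\mathbf{y}_{i+1|i+1} = \mathbf{y}_{i+1|i} + \mathbf{G}_{i+1}\left(\mathbf{z}_{i+1} - \mathbf{h}\left(\mathbf{y}_{i+1|i}\right)\right) \tag{2.62}$$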

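$$\boldsymbol{\Sigma}_{i+1|i+1} = \left(\mathbf{I}_{2N_d+N_\theta} - \mathbf{G}_{i+1}\mathbf{H}\right)\boldsymbol{\Sigma}_{i+1|i} \tag{2.63}$$

where $\mathbf{H}$ is the observation matrix given by:

$$\mathbf{H} = \left.\frac{\partial\mathbf{h}}{\partial\mathbf{y}}\right|_{\mathbf{y}=\mathbf{y}_{i+1|i}} \tag{2.64}$$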

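and $\mathbf{G}_{i+1}$ is the Kalman gain matrix given as follows:

$$\mathbf{G}_{i+1} = \boldsymbol{\Sigma}_{i+1|i}\mathbf{H}^T\left(\mathbf{H}\boldsymbol{\Sigma}_{i+1|i}\mathbf{H}^T + \boldsymbol{\Sigma}_n\right)^{-1} \tag{2.65}$$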

Equations (2.60)–(2.63) and (2.65) comprise the essential formulae of the EKF algorithm (Hoshiya and Saito 1984). The states and the model parameters of the system can be calculated recursively. In addition, because the covariance matrix of the updated state vector is given by Eq. (2.63), the associated uncertainties are also obtained. Therefore, the EKF not only provides recursive estimates but also quantifies their uncertainties online. However, the uncertainty quantification relies on the prescribed values of $\boldsymbol{\Sigma}_f$ and $\boldsymbol{\Sigma}_n$; this issue will be elaborated and resolved in Chap. 3. Compared with other nonlinear filters, such as the unscented Kalman filter (Chatzi and Smyth 2009) and the particle filter (Andrieu et al. 2004), the EKF requires significantly less computation. On the other hand, the EKF is not suitable for highly nonlinear systems, because it relies on a local linear approximation and a Gaussian approximation.
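One EKF cycle can be sketched as below, assuming NumPy and that $\mathbf{A}_i$, $\mathbf{B}_i$ and $\boldsymbol{\delta}_i$ have already been computed from Eqs. (2.54)–(2.56). The observation function and its Jacobian are passed in as callables; the names `ekf_step`, `h_obs` and `h_obs_jac`, as well as the usage values, are hypothetical.

```python
import numpy as np


def ekf_step(y, P, z, A_i, B_i, delta_i, Sigma_f, Sigma_n, h, h_jac):
    """One EKF cycle following Eqs. (2.60)-(2.65).

    h is the observation function of Eq. (2.57); h_jac(y) returns its
    Jacobian, i.e. H of Eq. (2.64), evaluated at the predicted state.
    """
    y_pred = A_i @ y + delta_i                          # Eq. (2.60)
    P_pred = A_i @ P @ A_i.T + B_i @ Sigma_f @ B_i.T    # Eq. (2.61)
    H = h_jac(y_pred)                                   # Eq. (2.64)
    S = H @ P_pred @ H.T + Sigma_n
    G = np.linalg.solve(S, H @ P_pred).T                # Eq. (2.65)
    y_new = y_pred + G @ (z - h(y_pred))                # Eq. (2.62)
    P_new = (np.eye(y.size) - G @ H) @ P_pred           # Eq. (2.63)
    return y_new, P_new


def h_obs(v):
    """Hypothetical nonlinear observation: a parameter (3rd state) times a displacement."""
    return np.array([v[0] * v[2]])


def h_obs_jac(v):
    """Jacobian of h_obs with respect to the augmented state."""
    return np.array([[v[2], 0.0, v[0]]])


# Hypothetical usage values.
A_i = np.array([[1.0, 0.01, 0.0], [-0.01, 1.0, 0.0], [0.0, 0.0, 1.0]])
B_i = np.array([[0.0], [0.01], [0.0]])
y0, P0 = np.array([0.1, 0.0, 2.0]), np.diag([0.01, 0.01, 1.0])
Sigma_f, Sigma_n = np.array([[1.0]]), np.array([[1e-4]])
y1, P1 = ekf_step(y0, P0, np.array([0.25]), A_i, B_i, np.zeros(3),
                  Sigma_f, Sigma_n, h_obs, h_obs_jac)
```

With a linear $\mathbf{h}$ and $\boldsymbol{\delta}_i = \mathbf{0}$, the same function reproduces the standard KF update, which is a convenient sanity check.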

2.4.2 Extended Kalman Filter with Fading Memory

An implementation issue arises when the EKF is used for the long-term identification of time-varying systems. After the EKF has processed a substantial amount of data, each new data point contributes very little to the update of the system parameters. As a result, the filter tracks parameter changes poorly and with a serious time delay. To resolve this problem, fading memory is employed so that the contribution of past data is gradually down-weighted, which enhances the capability to track changes in the underlying system. The EKF with fading memory is introduced in the following. To construct it, the covariance matrix of the one-step-ahead predicted state vector in Eq. (2.61) is assumed to have the following form (Sorenson and Sacks 1971):

$$\boldsymbol{\Sigma}_{i+1|i} = \lambda\left(\mathbf{A}_i\boldsymbol{\Sigma}_{i|i}\mathbf{A}_i^T + \mathbf{B}_i\boldsymbol{\Sigma}_f\mathbf{B}_i^T\right) \tag{2.66}$$

where $\lambda \geq 1$ is the fading factor; $\boldsymbol{\Sigma}_{i|i}$ and $\boldsymbol{\Sigma}_{i+1|i}$ are the covariance matrix of the updated state vector at the previous time step and the covariance matrix of the one-step-ahead predicted state vector at the current time step in the fading-memory filter, respectively; $\mathbf{A}_i$ and $\mathbf{B}_i$ are given by Eqs. (2.54) and (2.55), respectively; and $\boldsymbol{\Sigma}_f$ is the covariance matrix of the process noise. Equation (2.66) implies that the contribution of past data is discounted by enlarging the covariance matrix of the one-step-ahead predicted state vector by the factor $\lambda$ at each time step. On the other hand, given the measurement dataset $\mathcal{D}_i = \{\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_i\}$, the one-step-ahead predicted state vector in the fading-memory filter is obtained in the same fashion as in the EKF:

$$\mathbf{y}_{i+1|i} = \mathbf{A}_i\mathbf{y}_{i|i} + \boldsymbol{\delta}_i \tag{2.67}$$

where $\mathbf{y}_{i|i}$ is the updated state vector at the previous time step in the fading-memory filter and $\boldsymbol{\delta}_i$ is given by Eq. (2.56). When a new data point $\mathbf{z}_{i+1}$ is available, the updated state estimate $\mathbf{y}_{i+1|i+1}$ in the fading-memory filter is obtained by the KF update (Kalman 1960):

$$\mathbf{y}_{i+1|i+1} = \mathbf{y}_{i+1|i} + \mathbf{G}_{i+1}\left(\mathbf{z}_{i+1} - \mathbf{h}\left(\mathbf{y}_{i+1|i}\right)\right) \tag{2.68}$$

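where the Kalman gain matrix $\mathbf{G}_{i+1}$ is given by:

$$\mathbf{G}_{i+1} = \boldsymbol{\Sigma}_{i+1|i}\mathbf{H}^T\left(\mathbf{H}\boldsymbol{\Sigma}_{i+1|i}\mathbf{H}^T + \boldsymbol{\Sigma}_n\right)^{-1} \tag{2.69}$$

where $\mathbf{H}$ is the observation matrix given by Eq. (2.64). In addition, the covariance matrix of the updated state vector is obtained as follows:

$$\boldsymbol{\Sigma}_{i+1|i+1} = \left(\mathbf{I}_{2N_d+N_\theta} - \mathbf{G}_{i+1}\mathbf{H}\right)\boldsymbol{\Sigma}_{i+1|i} \tag{2.70}$$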

Equations (2.66)–(2.70) comprise the essential formulae of the EKF with fading memory. The one-step-ahead predicted state vector $\mathbf{y}_{i+1|i}$ and the updated state estimate $\mathbf{y}_{i+1|i+1}$ in Eqs. (2.67) and (2.68) are the same as those in Eqs. (2.60) and (2.62), respectively. Moreover, comparing Eqs. (2.66), (2.69) and (2.70) with Eqs. (2.61), (2.65) and (2.63) shows that only the covariance matrix of the one-step-ahead predicted state vector in Eq. (2.66) differs from that in Eq. (2.61). Hence, the fading-memory EKF is identical to the EKF except for the covariance matrix of the one-step-ahead predicted state. The fading factor $\lambda$ in Eq. (2.66) increases the uncertainty of the state estimate, which dilutes the contribution of past data. If $\lambda = 1$, the EKF with fading memory reduces to the EKF. In most applications, the fading factor is taken to be slightly larger than 1. The choice of $\lambda$ involves the following tradeoff. A larger value of $\lambda$ implies larger state estimation uncertainty in the filter, which results in more fluctuating estimates but a smaller identification delay. A smaller value of $\lambda$ implies smaller state estimation uncertainty, which results in more stable estimates but a larger identification delay. In this book, the fading factor takes the following form (Sorenson and Sacks 1971):

$$\lambda = 2^{2/N_\lambda} \tag{2.71}$$

where $N_\lambda$ is a prescribed number of time steps that denotes the half-life of the contribution of a data point; in other words, the contribution of a data point decreases by half after $N_\lambda$ time steps. The procedure of the EKF with fading memory is shown in Fig. 2.12. In the following chapters, the EKF with fading memory is employed as the fundamental algorithm to resolve some critical issues encountered in real-time system identification. Whenever "EKF" is mentioned in the subsequent content, it refers to the EKF algorithm with a given fading factor.

Fig. 2.12 Flow chart of the EKF with fading memory:
1. Set initial values $\mathbf{y}_{0|0}$ and $\boldsymbol{\Sigma}_{0|0}$, and a fading factor $\lambda$.
2. Compute the one-step-ahead predicted state vector by using Eq. (2.67).
3. Compute the covariance matrix by using Eq. (2.66).
4. Obtain the Kalman gain matrix by using Eq. (2.69).
5. Compute the updated state vector by using Eq. (2.68).
6. Compute the covariance matrix by using Eq. (2.70).
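The effect of the fading factor can be illustrated with a deliberately simple case: a constant scalar parameter observed directly in unit-variance noise. The sketch below assumes NumPy, uses a linear observation for brevity, implements Eqs. (2.66)–(2.70), and evaluates $\lambda$ from Eq. (2.71); the function names `fading_factor`, `fading_ekf_step` and `gain_after` are hypothetical.

```python
import numpy as np


def fading_factor(n_half):
    """Fading factor of Eq. (2.71): lambda = 2**(2 / N_lambda)."""
    return 2.0 ** (2.0 / n_half)


def fading_ekf_step(y, P, z, A_i, B_i, delta_i, Sigma_f, Sigma_n, H, lam):
    """One cycle of the EKF with fading memory, Eqs. (2.66)-(2.70).

    A linear observation z = H y + n (Eq. 2.59) is assumed for brevity;
    lam = 1 recovers the ordinary EKF.
    """
    y_pred = A_i @ y + delta_i                                      # Eq. (2.67)
    P_pred = lam * (A_i @ P @ A_i.T + B_i @ Sigma_f @ B_i.T)        # Eq. (2.66)
    G = np.linalg.solve(H @ P_pred @ H.T + Sigma_n, H @ P_pred).T   # Eq. (2.69)
    y_new = y_pred + G @ (z - H @ y_pred)                           # Eq. (2.68)
    P_new = (np.eye(y.size) - G @ H) @ P_pred                       # Eq. (2.70)
    return y_new, P_new, G


def gain_after(n_steps, lam):
    """Kalman gain after n_steps updates of a constant scalar parameter
    observed in unit-variance noise (illustrative setting only)."""
    one, zero = np.eye(1), np.zeros((1, 1))
    y, P = np.zeros(1), np.eye(1)
    G = None
    for _ in range(n_steps):
        y, P, G = fading_ekf_step(y, P, np.zeros(1), one, zero, np.zeros(1),
                                  zero, one, one, lam)
    return G[0, 0]
```

With $\lambda = 1$ the gain decays as $1/(k+1)$ after $k$ updates, so late data barely move the estimate; with $\lambda > 1$ it levels off at $1 - 1/\lambda$ (about 0.014 for $N_\lambda = 100$), which is what allows changes in the parameter to be tracked.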


2 System Identification Using Kalman Filter and Extended Kalman Filter

2.5 Application to State Estimation and Model Parameter Identification

2.5.1 Single-Degree-of-Freedom System

Consider a single-degree-of-freedom (SDOF) system and its equation of motion given as follows:

$$\ddot{x}(t) + 2\zeta\omega_n \dot{x}(t) + \omega_n^2 x(t) = f(t) \tag{2.72}$$

where $\omega_n = 0.7071$ rad/s and $\zeta = 0.01$ are the natural frequency and damping ratio of the oscillator, respectively; $f$ is the input signal, modeled as zero-mean Gaussian white noise with spectral intensity $S_0 = 1.0 \times 10^{-4}$ m²/s³. To update the natural frequency and damping ratio of the oscillator recursively with the EKF, the augmented state vector $\mathbf{y}(t)$, comprising the displacement, velocity, natural frequency and damping ratio, is defined as follows:

$$\mathbf{y}(t) \equiv [x(t), \dot{x}(t), \omega_n, \zeta]^T \tag{2.73}$$

The state-space representation of Eq. (2.72) can be written as follows:

$$\dot{\mathbf{y}}(t) = \mathbf{g}(\mathbf{y}(t), f(t); \omega_n, \zeta) \tag{2.74}$$
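The Taylor expansion that follows needs the Jacobian $\partial \mathbf{g}/\partial \mathbf{y}$. A quick way to check a hand-derived Jacobian is to compare it with central differences of $\mathbf{g}$. This sketch assumes that the parameter rates are exactly zero, whereas Eq. (2.79) places $-\eta$ on those diagonal entries. The linearization point is hypothetical.

```python
def g_sdof(y, f):
    """Right-hand side g(y, f) of the augmented SDOF model, cf. Eqs. (2.72)-(2.74).

    y = [x, x_dot, omega_n, zeta]. The parameters are treated as constants,
    so their rates are set to zero (an assumption of this sketch).
    """
    x, v, wn, zeta = y
    return [v, f - 2.0 * zeta * wn * v - wn ** 2 * x, 0.0, 0.0]


def jacobian_fd(fun, y, f, h=1e-6):
    """Central-difference approximation of dg/dy (rows: outputs, columns: states)."""
    n = len(y)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        y_plus, y_minus = list(y), list(y)
        y_plus[j] += h
        y_minus[j] -= h
        g_plus, g_minus = fun(y_plus, f), fun(y_minus, f)
        for i in range(n):
            J[i][j] = (g_plus[i] - g_minus[i]) / (2.0 * h)
    return J


y_star = [0.02, -0.01, 0.7071, 0.01]  # hypothetical linearization point
J = jacobian_fd(g_sdof, y_star, f=0.0)
x, v, wn, zeta = y_star
row2_analytic = [-wn ** 2, -2 * zeta * wn, -2 * wn * x - 2 * zeta * v, -2 * wn * v]
print([round(val, 6) for val in J[1]])
print([round(val, 6) for val in row2_analytic])
```

The second row reproduces the second row of Eq. (2.79). Only rows 3 and 4 differ, because the text regularizes them with $-\eta$ to keep $\mathbf{A}_y$ invertible.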

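Then, Eq. (2.74) can be expanded using a Taylor series around the state $\mathbf{y}(t) = \mathbf{y}^*$:

$$\dot{\mathbf{y}}(t) = \mathbf{g}\big|_{\mathbf{y}=\mathbf{y}^*} + \frac{\partial \mathbf{g}}{\partial \mathbf{y}}\bigg|_{\mathbf{y}=\mathbf{y}^*}(\mathbf{y}-\mathbf{y}^*) + \frac{\partial \mathbf{g}}{\partial f}\bigg|_{\mathbf{y}=\mathbf{y}^*} f \tag{2.75}$$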

where the higher-order terms are neglected under the assumption that $\mathbf{y}$ is close to $\mathbf{y}^*$. Equation (2.75) can be discretized as follows:

$$\mathbf{y}_{i+1} = \mathbf{A}_i \mathbf{y}_i + \mathbf{B}_i f_i + \boldsymbol{\delta}_i \tag{2.76}$$

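where $\mathbf{y}_i \equiv \mathbf{y}(i\Delta t)$, $f_i \equiv f(i\Delta t)$, and $\Delta t$ is the sampling time step; $\mathbf{A}_i$ and $\mathbf{B}_i$ are given by:

$$\mathbf{A}_i = \exp(\mathbf{A}_y \Delta t) \tag{2.77}$$

$$\mathbf{B}_i = \mathbf{A}_y^{-1}(\mathbf{A}_i - \mathbf{I}_4)\mathbf{B}_y \tag{2.78}$$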

where the matrix $\mathbf{A}_y$ is given by:

$$\mathbf{A}_y = \frac{\partial \mathbf{g}}{\partial \mathbf{y}}\bigg|_{\mathbf{y}=\mathbf{y}_i} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ -\omega_n^2 & -2\zeta\omega_n & -2\omega_n x(t) - 2\zeta\dot{x}(t) & -2\omega_n\dot{x}(t) \\ 0 & 0 & -\eta & 0 \\ 0 & 0 & 0 & -\eta \end{bmatrix} \tag{2.79}$$

where $\eta$ is a small positive number that avoids singularity when calculating the inverse of $\mathbf{A}_y$. The matrix $\mathbf{B}_y$ is given by:

$$\mathbf{B}_y = \frac{\partial \mathbf{g}}{\partial f}\bigg|_{\mathbf{y}=\mathbf{y}_i} = [0, 1, 0, 0]^T \tag{2.80}$$

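$\boldsymbol{\delta}_i$ is the remainder term due to the local linear approximation:

$$\boldsymbol{\delta}_i = \mathbf{A}_y^{-1}(\mathbf{A}_i - \mathbf{I}_4)\left(\mathbf{g}\big|_{\mathbf{y}=\mathbf{y}_i} - \mathbf{A}_y\mathbf{y}_i\right) \tag{2.81}$$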
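Equations (2.77) and (2.78) can be evaluated without forming $\mathbf{A}_y^{-1}$ explicitly. This uses the identity $\mathbf{A}_y^{-1}(e^{\mathbf{A}_y\Delta t} - \mathbf{I}) = \sum_{k\ge 0}\mathbf{A}_y^k \Delta t^{k+1}/(k+1)!$, which holds whenever $\mathbf{A}_y$ is invertible and stays finite as η → 0. The pure-Python sketch below uses a truncated Taylor series, which is adequate only because $\mathbf{A}_y\Delta t$ is small here. This is our choice for illustration, not the book's numerical scheme; production code would use a library matrix exponential. The state estimate and the value of η are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def add(A, B, s=1.0):
    """Return A + s * B."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def expm_and_phi1(A, dt, terms=20):
    """Return (exp(A dt), Phi1) with Phi1 = sum_{k>=0} A^k dt^(k+1) / (k+1)!.

    When A is invertible, Phi1 = A^{-1} (exp(A dt) - I), so B_i = Phi1 B_y
    reproduces Eq. (2.78) without an explicit inverse. The truncated Taylor
    series is only adequate when the entries of A dt are small.
    """
    n = len(A)
    E, S, term = identity(n), identity(n), identity(n)  # term = (A dt)^k / k!
    for k in range(1, terms):
        term = [[val * dt / k for val in row] for row in matmul(term, A)]
        E = add(E, term)
        S = add(S, term, 1.0 / (k + 1))
    return E, [[val * dt for val in row] for row in S]


def jacobian_Ay(y, eta=1e-6):
    """A_y of Eq. (2.79); eta is a hypothetical small positive value."""
    x, v, wn, zeta = y
    return [[0.0, 1.0, 0.0, 0.0],
            [-wn ** 2, -2 * zeta * wn, -2 * wn * x - 2 * zeta * v, -2 * wn * v],
            [0.0, 0.0, -eta, 0.0],
            [0.0, 0.0, 0.0, -eta]]


dt = 0.005
y_i = [0.02, -0.01, 0.7071, 0.01]      # hypothetical current estimate
Ay = jacobian_Ay(y_i)
Ai, Phi1 = expm_and_phi1(Ay, dt)       # Eq. (2.77)
By = [[0.0], [1.0], [0.0], [0.0]]      # Eq. (2.80)
Bi = matmul(Phi1, By)                  # Eq. (2.78)
print([round(row[0], 10) for row in Bi])
```

For small Δt, $\mathbf{B}_i \approx [\Delta t^2/2, \Delta t, 0, 0]^T$, as expected for a force input acting on the velocity equation.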

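The displacement and velocity responses of the oscillator were measured for 200 s with sampling time interval $\Delta t = 0.005$ s. The observation equation was given by:

$$\mathbf{z}_i = \mathbf{H}\mathbf{y}_i + \mathbf{n}_i \tag{2.82}$$

where $\mathbf{H}$ is the observation matrix given by:

$$\mathbf{H} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \tag{2.83}$$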

and $\mathbf{n}_i$ is the measurement noise, modeled as a zero-mean Gaussian i.i.d. process with covariance matrix $\boldsymbol{\Sigma}_n$. The rms of the measurement noise was taken as 5% of the rms of the corresponding noise-free response quantities. The EKF can then be performed based on Eqs. (2.76) and (2.82). The initial values of the updated state vector and its associated covariance matrix were taken as:

$$\mathbf{y}_{0|0} = [0, 0, 1, 1]^T \tag{2.84}$$

$$\boldsymbol{\Sigma}_{0|0} = \mathbf{I}_4 \tag{2.85}$$

Note that the initial condition in Eq. (2.84) was purposely assigned with a large error. In addition, the fading factor λ was taken as $2^{1/1000}$; in other words, the data half-life was 2000 time steps, or 10 s. Figures 2.13 and 2.14 show the time histories of the estimated natural frequency and damping ratio, respectively. The dotted lines represent the estimated values, the solid lines represent the actual values, and the dashed lines represent the bounds of the 99.7% credible intervals. The same line styles are used in later figures. At the beginning of the filtering, the estimated natural frequency and damping ratio fluctuated strongly owing to the inaccurate initial values and the large prior uncertainty. As more data points were acquired, the estimated values approached the corresponding actual values, and the 99.7% credible intervals provided reasonable estimation uncertainties for the identification results. Figures 2.15 and 2.16 compare the actual and estimated displacement and velocity responses, respectively. At the early stage of identification, the estimated displacement and velocity responses deviated noticeably from the corresponding actual values. After more data points were acquired, the actual and estimated responses of the oscillator were virtually on top of each other. It is therefore concluded that the EKF can provide accurate estimates of both the states and the model parameters.
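Because the EKF propagates a Gaussian approximation (a mean and a covariance), a 99.7% credible interval is conventionally drawn as the mean ± 3 standard deviations. The construction of the bounds is assumed here rather than spelled out in the text. The posterior values in the sketch are hypothetical.

```python
import math


def credible_bounds(mean, variance, k=3.0):
    """Bounds mean -/+ k * std of a Gaussian posterior (k = 3: 99.7% interval)."""
    std = math.sqrt(variance)
    return mean - k * std, mean + k * std


def gaussian_coverage(k):
    """Probability that a Gaussian variable lies within k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


lo, hi = credible_bounds(0.70, 1.0e-4)  # hypothetical posterior of omega_n
print(f"credible interval: [{lo:.3f}, {hi:.3f}], coverage = {gaussian_coverage(3.0):.4f}")
```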

Fig. 2.13 Estimation result of the natural frequency


Fig. 2.14 Estimation result of the damping ratio

2.5.2 Three-Pier Bridge

In this illustrative example, a bridge with three piers is considered (shown in Fig. 2.17). The bridge has pin supports at its two ends (i.e., the 1st and 17th nodes) and at the bottom of the three piers (i.e., the 19th, 21st and 23rd nodes). It has a span of 256 m, and each of its three piers is 16 m long. The deck is divided into 16 components (each 16 m long), and each pier is divided into 2 components (each 8 m long). The deck has a uniform box cross-section with area 1.56 m² and weak-axis moment of inertia 2.02 m⁴. The three piers have the same circular cross-section with area 1.57 m². The mass density is 3780 kg/m³ and the modulus of elasticity is 2 GPa. The resulting first five natural frequencies are 0.47, 0.53, 0.70, 0.89 and 1.33 Hz. The damping matrix is given by C = αM + βK, where α = 0.063 s⁻¹ and β = 0.006 s, so that the damping ratios of the first two modes are about 2%. The components of the bridge were separated into seven groups, and the stiffness matrix was given by $\mathbf{K} = \sum_{n=1}^{7} \theta_k^{(n)} \mathbf{K}^{(n)}$. Specifically, one stiffness parameter was assigned to every four components of the deck, and one stiffness parameter was assigned to each pier. The bridge was subjected to horizontal and vertical ground excitations:

$$\mathbf{f}(t) = [f_h(t), f_v(t)]^T \tag{2.86}$$


Fig. 2.15 Comparison between the actual and estimated displacement responses

where $f_h$ and $f_v$ are the horizontal and vertical excitations, modeled as zero-mean Gaussian white noise with spectral intensities $S_h = 5.0 \times 10^{-5}$ m²/s³ and $S_v = 3.2 \times 10^{-5}$ m²/s³, respectively. The entire monitoring period was 200 s and the sampling frequency was 250 Hz. The horizontal accelerations of the 5th, 13th, 20th and 22nd nodes and the vertical accelerations of the 9th, 18th, 20th and 22nd nodes were observed, so the number of observed DOFs was $N_0 = 8$. The rms of the measurement noise was taken as 5% of the rms of the corresponding noise-free acceleration responses. The bridge was undamaged during the first 100 s. Sudden damage then occurred in the deck segment linking the 5th and 9th nodes at t = 100 s and in the middle pier at t = 150 s. The fading factor λ was taken as $2^{1/1000}$, which implies a data half-life of 2000 time steps, or 8 s. The EKF can then be implemented to update the structural states and the uncertain model parameters. Figures 2.18 and 2.19 show the estimation results of the stiffness parameters and damping coefficients, respectively. The estimates fluctuated severely at the beginning of the monitoring period because the initial values were far from the actual values and the number of measurements was quite limited. As more measured data points became available, the estimated values approached the actual values. Moreover, the credible intervals provided appropriate uncertainty quantification.
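The stated 2% damping of the first two modes can be checked from the Rayleigh coefficients. For $\mathbf{C} = \alpha\mathbf{M} + \beta\mathbf{K}$, the modal damping ratio is $\zeta_n = \alpha/(2\omega_n) + \beta\omega_n/2$. The sketch evaluates this at the five natural frequencies quoted for the bridge.

```python
import math


def rayleigh_damping_ratio(freq_hz, alpha, beta):
    """Modal damping ratio for C = alpha*M + beta*K:
    zeta = alpha / (2*omega) + beta*omega / 2, with omega in rad/s."""
    omega = 2.0 * math.pi * freq_hz
    return alpha / (2.0 * omega) + beta * omega / 2.0


for f in (0.47, 0.53, 0.70, 0.89, 1.33):  # first five natural frequencies (Hz)
    print(f"{f:.2f} Hz: zeta = {rayleigh_damping_ratio(f, 0.063, 0.006):.4f}")
```

The first two modes come out at roughly 1.95%, consistent with the nominal 2%. Because of the β term, the damping of higher modes grows with frequency.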


Fig. 2.16 Comparison between the actual and estimated velocity responses

Fig. 2.17 Bridge model (nodes 1–23; representative DOFs marked a, b and c)

Figure 2.20 compares the actual and estimated responses of some representative DOFs (a, b and c in Fig. 2.17) in the first 5 s, and Fig. 2.21 shows the same comparison for the remaining time period. The DOF marked a is the rotational DOF of the 4th node, the DOF marked b is the horizontal DOF of the 8th node, and the DOF marked c is the vertical DOF of the 11th node. The 45-degree line in each subplot of Figs. 2.20 and 2.21 is the reference of perfect match. The data points corresponding to the first 5 s scattered widely about the 45-degree line. Again, this was because the initial values were far from the actual values and the number of measurements was limited at the early stage of identification. The remaining points


Fig. 2.18 Estimation result of the stiffness parameters


Fig. 2.19 Estimation result of the damping coefficients

were distributed along the line of perfect match once more measurements became available. This demonstrates that the structural responses estimated by the EKF achieved high accuracy.

2.5.3 Bouc-Wen Hysteresis System

This application is concerned with a Bouc-Wen hysteresis system with $N_d = 10$ DOFs, shown in Fig. 2.22. The Bouc-Wen model is one of the most commonly used models for describing nonlinear hysteretic systems in structural engineering (Vaiana et al. 2018). The governing equations of this nonlinear system are:

$$\mathbf{M}\ddot{\mathbf{x}}(t) + \mathbf{C}[\boldsymbol{\theta}_c(t)]\dot{\mathbf{x}}(t) + \mathbf{K}[\boldsymbol{\theta}_k(t)]\mathbf{r}(t) = \mathbf{T}\mathbf{f}(t) \tag{2.87}$$

$$\dot{r}_n(t) = \dot{x}_n - \mu|\dot{x}_n||r_n|^{\eta-1}r_n - \kappa\dot{x}_n|r_n|^{\eta}, \quad n = 1, 2, \ldots, 10 \tag{2.88}$$


Fig. 2.20 Comparison between the actual and estimated responses of the representative DOFs in the first 5 s


Fig. 2.21 Comparison between the actual and estimated responses of the representative DOFs in the remaining time period


Fig. 2.22 Bouc-Wen hysteresis system (ten Bouc-Wen elements with dampers c1–c10)

where $\ddot{\mathbf{x}}(t)$, $\dot{\mathbf{x}}(t)$ and $\mathbf{r}(t)$ are the acceleration, velocity and restoring force vectors, respectively; $\mathbf{M}$, $\mathbf{C}$ and $\mathbf{K}$ are the mass, damping and stiffness matrices of the system, respectively; the stiffness and damping matrices are parameterized with possibly time-varying parameters $\boldsymbol{\theta}_k(t)$ and $\boldsymbol{\theta}_c(t)$, respectively; $\mathbf{f}$ is the excitation applied to the system and $\mathbf{T}$ is the associated influence matrix. The stiffness matrix $\mathbf{K}$ is parameterized as follows:

$$\mathbf{K} = \sum_{n=1}^{6} \theta_k^{(n)} \mathbf{K}^{(n)} \tag{2.89}$$

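where $\mathbf{K}^{(n)}$ is the $n$th substructural stiffness matrix given by:

$$\mathbf{K}^{(1)} = \begin{bmatrix} k & \mathbf{0}_{1\times 9} \\ \mathbf{0}_{9\times 1} & \mathbf{0}_{9\times 9} \end{bmatrix} \tag{2.90}$$

$$\mathbf{K}^{(n)} = \begin{bmatrix} \mathbf{0}_{(n-2)\times(n-2)} & \mathbf{0}_{(n-2)\times 2} & \mathbf{0}_{(n-2)\times(10-n)} \\ \mathbf{0}_{2\times(n-2)} & k\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} & \mathbf{0}_{2\times(10-n)} \\ \mathbf{0}_{(10-n)\times(n-2)} & \mathbf{0}_{(10-n)\times 2} & \mathbf{0}_{(10-n)\times(10-n)} \end{bmatrix}, \quad n = 2, \ldots, 5 \tag{2.91}$$

$$\mathbf{K}^{(6)} = \begin{bmatrix} \mathbf{0}_{4\times 4} & \mathbf{0}_{4\times 6} \\ \mathbf{0}_{6\times 4} & k\begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 & 0 \\ 0 & 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & 0 & -1 & 2 & -1 \\ 0 & 0 & 0 & 0 & -1 & 1 \end{bmatrix} \end{bmatrix} \tag{2.92}$$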

The stiffness parameter vector $\boldsymbol{\theta}_k = [\theta_k^{(1)}, \theta_k^{(2)}, \theta_k^{(3)}, \theta_k^{(4)}, \theta_k^{(5)}, \theta_k^{(6)}]^T$ characterizes the integrity of the system. The actual stiffness of each spring was taken as k = 2000 N/m. The Rayleigh damping model was used, so the damping matrix was given by C = αM + βK, with damping coefficients α = 0.100 s⁻¹ and β = 7.526 × 10⁻⁴ s. The


characteristic parameters of the Bouc-Wen system were taken as μ = 1000 s²/m², κ = 1500 s²/m² and η = 2. The hysteresis system was subjected to zero-mean Gaussian white noise with spectral intensity $S_0 = 5.0 \times 10^{-6}$ m²/s³. The entire monitoring period was 300 s and the sampling time interval was Δt = 0.002 s. The measurements consisted of the velocity responses of the 1st, 3rd, 5th, 8th and 10th DOFs. The measurement noise was modeled as a zero-mean Gaussian i.i.d. process, and its rms was taken as 5% of the rms of the corresponding noise-free response quantities. In addition, the system was undamaged in the first 150 s. Then, sudden damage with 10% and 5% stiffness reduction occurred in the 1st and 2nd nonlinear springs at t = 100 s and t = 200 s, respectively. The fading factor λ was taken as $2^{1/1000}$, which implies a data half-life of 2000 time steps, or 4 s. The augmented state vector was defined as follows:

$$\mathbf{y}(t) \equiv [\mathbf{x}(t)^T, \dot{\mathbf{x}}(t)^T, \mathbf{r}(t)^T, \boldsymbol{\theta}(t)^T]^T \tag{2.93}$$

where the uncertain model parameter vector $\boldsymbol{\theta}(t)$ was given by:

$$\boldsymbol{\theta}(t) = [\boldsymbol{\theta}_k^T, \alpha, \beta, \mu, \kappa, \eta]^T \tag{2.94}$$
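The substructural parameterization in Eqs. (2.89)–(2.92) can be assembled in a few lines. This sketch treats each $\mathbf{K}^{(n)}$ as the unscaled substructure matrix, so the θ factors enter once, through the sum in Eq. (2.89). With all θ equal to 1 it reproduces the usual tridiagonal chain stiffness. The function names are illustrative.

```python
def substructure_matrices(k, n_dof=10):
    """Substructural stiffness matrices K^(1), ..., K^(6) of Eqs. (2.90)-(2.92)."""
    def zeros():
        return [[0.0] * n_dof for _ in range(n_dof)]

    def add_spring(K, i, j):
        """Add a spring of stiffness k between 0-based DOFs i and j."""
        K[i][i] += k
        K[j][j] += k
        K[i][j] -= k
        K[j][i] -= k

    K1 = zeros()
    K1[0][0] = k                  # spring 1: ground to DOF 1, Eq. (2.90)
    mats = [K1]
    for n in range(2, 6):         # springs 2..5, Eq. (2.91)
        Kn = zeros()
        add_spring(Kn, n - 2, n - 1)
        mats.append(Kn)
    K6 = zeros()                  # springs 6..10 grouped together, Eq. (2.92)
    for s in range(5, 10):
        add_spring(K6, s - 1, s)
    mats.append(K6)
    return mats


def assemble(theta, mats):
    """K = sum_n theta_n K^(n), Eq. (2.89)."""
    n_dof = len(mats[0])
    K = [[0.0] * n_dof for _ in range(n_dof)]
    for th, Kn in zip(theta, mats):
        for i in range(n_dof):
            for j in range(n_dof):
                K[i][j] += th * Kn[i][j]
    return K


mats = substructure_matrices(2000.0)                     # k = 2000 N/m
K_intact = assemble([1.0] * 6, mats)
K_damaged = assemble([0.9, 0.95, 1, 1, 1, 1], mats)      # 10% and 5% reductions
print(K_intact[0][:3], K_damaged[0][:3])
```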

The EKF can then be performed to update the responses and the uncertain model parameter vector of the hysteresis system. The estimation results of the stiffness parameters are shown in Fig. 2.23. At the early stage of identification, the estimated stiffness parameters fluctuated violently because the initial values were far from the actual values. As the filter acquired more measured data points, the estimated values approached the actual values and remained within the 99.7% credible intervals. The abrupt changes of the stiffness parameters were tracked promptly. There was a small time delay in tracking the abrupt changes of $\theta_k^{(1)}$ and $\theta_k^{(2)}$, because the estimates were obtained from the data at the previous and current time steps. Nevertheless, this delay was acceptably small for the sudden changes of the stiffness parameters. Figure 2.24 shows the estimation results of the damping coefficients and the characteristic parameters of the hysteresis system. Again, the estimated values fluctuated severely at the beginning of the identification period. As more data points accumulated, the estimated values agreed well with the corresponding actual values, and the associated estimation uncertainties were well represented by the 99.7% credible intervals. This indicates that the EKF achieved accurate estimates of the model parameters of the Bouc-Wen hysteresis system. Figures 2.25 and 2.26 show the actual versus the estimated restoring forces of the 1st, 2nd, 5th and 10th nonlinear springs for the first 10 s and for the remaining time period, respectively. The 45-degree line in each subplot of Figs. 2.25 and 2.26 is the reference of perfect match. In the first 10 s, the estimated restoring forces in each subplot of Fig. 2.25


Fig. 2.23 Estimation results of the stiffness parameters

deviated significantly from the 45-degree line, because the initial values of the state vector were randomly selected from reasonable ranges. Moreover, the initial prior uncertainty was assigned to be large enough to cover a broad range of reasonable values of the state vector. After t = 10 s, the estimated restoring forces in each subplot of Fig. 2.26 agreed well with the actual restoring forces, since the filter had acquired more measured data points. This demonstrates that the EKF can provide accurate estimates of both the responses and the model parameters.
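To see the hysteresis that the filter has to track, a single element obeying Eq. (2.88) can be driven with a prescribed sinusoidal displacement, using the text's values μ = 1000, κ = 1500 and η = 2. The amplitude, frequency and forward-Euler integration are assumptions of this sketch. When $\dot{x}$ and $r$ share a sign, $\dot{r} = \dot{x}(1 - (\mu+\kappa)r^2)$, so $|r|$ saturates at $(\mu+\kappa)^{-1/2} = 0.02$. The loop integral of $r\,dx$ is positive, which reflects the energy dissipated by the hysteresis.

```python
import math


def bouc_wen_rate(v, r, mu=1000.0, kappa=1500.0, eta=2.0):
    """dr/dt of Eq. (2.88) for a single element with velocity v and state r."""
    return v - mu * abs(v) * abs(r) ** (eta - 1.0) * r - kappa * v * abs(r) ** eta


def simulate(amplitude=0.05, omega=1.0, dt=1e-3, n_cycles=3):
    """Drive one element with x(t) = amplitude * sin(omega t); forward Euler in r."""
    n_steps = int(round(n_cycles * 2.0 * math.pi / omega / dt))
    xs, rs = [0.0], [0.0]
    for k in range(n_steps):
        t = k * dt
        v = amplitude * omega * math.cos(omega * t)      # prescribed velocity
        rs.append(rs[-1] + dt * bouc_wen_rate(v, rs[-1]))
        xs.append(amplitude * math.sin(omega * (t + dt)))
    return xs, rs


xs, rs = simulate()
steps_per_cycle = int(round(2.0 * math.pi / 1e-3))
last = range(len(xs) - 1 - steps_per_cycle, len(xs) - 1)
# Loop integral of r dx over the last cycle (trapezoidal rule); positive = dissipation
loop_area = sum(0.5 * (rs[k] + rs[k + 1]) * (xs[k + 1] - xs[k]) for k in last)
print(f"max |r| = {max(abs(r) for r in rs):.5f}, loop area = {loop_area:.3e}")
```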


Fig. 2.24 Estimation results of the damping coefficients and the characteristic parameters of the Bouc-Wen hysteresis system

2.6 Application to a Field Inspired Test Case: The Canton Tower

2.6.1 Background Information

In this section, a structural health monitoring (SHM) benchmark study based on the full-scale measurements of the Canton Tower is presented using the EKF. The Canton Tower, located in Guangzhou, China, is a supertall tube-in-tube structure with a total height of 610 m. It consists of a 454 m high main tower and a 156 m high antenna mast (Ni et al.


Fig. 2.25 Estimation results of the restoring forces of the 1st, 2nd, 5th and 10th nonlinear springs in the first 10 s

2009, 2012; Chen et al. 2011). In addition, 37 floors connect the inner tube and the outer tube. The reduced-order finite element model of the Canton Tower proposed in Chen et al. (2011) was utilized for SHM purposes. The schematic diagram of this reduced-order finite element model is shown in Fig. 2.27. The entire structure has 37 segments, each modeled as a linear elastic beam element, so the tower can be modeled as a 3D cantilever beam with 37 nodes. The axial deformation of each beam element in the reduced-order model is neglected, so each node in Fig. 2.27 has five DOFs: two horizontal translational DOFs and three rotational DOFs. Therefore, the reduced-order finite element model of the Canton Tower has 185 DOFs in total. Twenty uni-axial accelerometers were deployed at eight locations on the tower to record the structural acceleration responses. A detailed description of the modular design of the SHM system devised for the Canton Tower can be found in Ni et al. (2009). In this section, we investigate the capability of the EKF in state and model


Fig. 2.26 Estimation results of the restoring forces of the 1st, 2nd, 5th and 10th nonlinear springs in the remaining time period

parameter estimation for a realistic test case of higher dimensionality. For this purpose, synthetic acceleration measurements under ground excitation were used to implement online updating of the Canton Tower model. The mass matrix was assumed to be fixed, and the stiffness matrix was parameterized with two stiffness parameters $\theta_k^{(1)}$ and $\theta_k^{(2)}$: $\theta_k^{(1)}$ refers to the translational effect in the element stiffness matrix, and $\theta_k^{(2)}$ refers to the bending and rotational effect. Therefore, the parameterization of a typical element stiffness matrix can be expressed as follows:


Fig. 2.27 The reduced-order finite element model of the Canton Tower (nodes 1–37; coordinate axes x, y, z; labels Vx, Vy, Mx, My, Mz)

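$$
\mathbf{K}^e(\boldsymbol{\theta}_k) =
\begin{bmatrix}
\theta_k^{(1)}K^e_{1,1} & \theta_k^{(1)}K^e_{1,2} & \theta_k^{(2)}K^e_{1,3} & \theta_k^{(2)}K^e_{1,4} & \theta_k^{(2)}K^e_{1,5} & \theta_k^{(1)}K^e_{1,6} & \theta_k^{(1)}K^e_{1,7} & \theta_k^{(2)}K^e_{1,8} & \theta_k^{(2)}K^e_{1,9} & \theta_k^{(2)}K^e_{1,10} \\
 & \theta_k^{(1)}K^e_{2,2} & \theta_k^{(2)}K^e_{2,3} & \theta_k^{(2)}K^e_{2,4} & \theta_k^{(2)}K^e_{2,5} & \theta_k^{(1)}K^e_{2,6} & \theta_k^{(1)}K^e_{2,7} & \theta_k^{(2)}K^e_{2,8} & \theta_k^{(2)}K^e_{2,9} & \theta_k^{(2)}K^e_{2,10} \\
 & & \theta_k^{(2)}K^e_{3,3} & \theta_k^{(2)}K^e_{3,4} & \theta_k^{(2)}K^e_{3,5} & \theta_k^{(2)}K^e_{3,6} & \theta_k^{(2)}K^e_{3,7} & \theta_k^{(2)}K^e_{3,8} & \theta_k^{(2)}K^e_{3,9} & \theta_k^{(2)}K^e_{3,10} \\
 & & & \theta_k^{(2)}K^e_{4,4} & \theta_k^{(2)}K^e_{4,5} & \theta_k^{(2)}K^e_{4,6} & \theta_k^{(2)}K^e_{4,7} & \theta_k^{(2)}K^e_{4,8} & \theta_k^{(2)}K^e_{4,9} & \theta_k^{(2)}K^e_{4,10} \\
 & & & & \theta_k^{(2)}K^e_{5,5} & \theta_k^{(2)}K^e_{5,6} & \theta_k^{(2)}K^e_{5,7} & \theta_k^{(2)}K^e_{5,8} & \theta_k^{(2)}K^e_{5,9} & \theta_k^{(2)}K^e_{5,10} \\
 & & & & & \theta_k^{(1)}K^e_{6,6} & \theta_k^{(1)}K^e_{6,7} & \theta_k^{(2)}K^e_{6,8} & \theta_k^{(2)}K^e_{6,9} & \theta_k^{(2)}K^e_{6,10} \\
 & & & & & & \theta_k^{(1)}K^e_{7,7} & \theta_k^{(2)}K^e_{7,8} & \theta_k^{(2)}K^e_{7,9} & \theta_k^{(2)}K^e_{7,10} \\
 & & \text{sym.} & & & & & \theta_k^{(2)}K^e_{8,8} & \theta_k^{(2)}K^e_{8,9} & \theta_k^{(2)}K^e_{8,10} \\
 & & & & & & & & \theta_k^{(2)}K^e_{9,9} & \theta_k^{(2)}K^e_{9,10} \\
 & & & & & & & & & \theta_k^{(2)}K^e_{10,10}
\end{bmatrix}
\tag{2.95}
$$

where $K^e_{m,n}$ is the $(m, n)$th element of the nominal element stiffness matrix. The monitoring period was 50 s with a sampling time step of 0.002 s. The ground excitation was modeled as zero-mean Gaussian white noise with spectral density $5.0 \times 10^{-3}$ m²/s³. The rms of the measurement noise was taken as 5% of the rms of the corresponding noise-free acceleration responses. The fading factor λ was taken as $2^{1/1000}$, which implies a data half-life of 2000 time steps, or 4 s.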
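The pattern in Eq. (2.95) is simple: $\theta_k^{(1)}$ scales exactly the entries that couple DOFs 1, 2, 6 and 7 of the element with one another, and $\theta_k^{(2)}$ scales every other entry. Reading DOFs 1, 2, 6 and 7 as the translational DOFs of the two end nodes is our inference from that pattern. The sketch below applies the same mask to a hypothetical nominal matrix whose entries are all 1, so the pattern becomes visible.

```python
TRANSLATIONAL = {0, 1, 5, 6}  # 0-based indices of DOFs 1, 2, 6, 7 (inferred from Eq. 2.95)


def parameterized_element_stiffness(K_nominal, theta1, theta2):
    """Scale a nominal 10x10 element stiffness matrix following Eq. (2.95).

    theta1 multiplies entries that couple two translational DOFs;
    theta2 multiplies all other (bending/rotational) entries.
    """
    n = len(K_nominal)
    return [[(theta1 if i in TRANSLATIONAL and j in TRANSLATIONAL else theta2) * K_nominal[i][j]
             for j in range(n)] for i in range(n)]


# Hypothetical nominal matrix with all entries equal to 1, to expose the pattern
K_e = [[1.0] * 10 for _ in range(10)]
K_theta = parameterized_element_stiffness(K_e, theta1=0.9, theta2=1.1)
for row in K_theta:
    print(" ".join("1" if val == 0.9 else "2" for val in row))  # which theta applies
```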


2.6.2 Identification of Structural States and Model Parameters

The augmented state vector is defined as follows:

$$\mathbf{y}_i \equiv [\mathbf{x}_i^T, \dot{\mathbf{x}}_i^T, \boldsymbol{\theta}_i^T]^T \tag{2.96}$$

where $\mathbf{x}_i \in \mathbb{R}^{185}$, $\dot{\mathbf{x}}_i \in \mathbb{R}^{185}$ and $\boldsymbol{\theta}_i \in \mathbb{R}^{2}$ are the displacement, velocity and stiffness parameter vectors at the $i$th time step, respectively. The EKF can then be performed to update the displacement, velocity and stiffness parameters recursively. Figure 2.28 shows the identification results of the stiffness parameters. The dotted lines represent the estimated values, the solid lines the nominal values, and the dashed lines the bounds of the 99.7% credible intervals. At the early stage of identification, the estimated stiffness parameters fluctuated severely owing to the inaccurate initial values and large prior uncertainty. As the EKF acquired more data points from the accelerometers, reasonable agreement was observed between the nominal and estimated values. In addition, the 99.7% credible intervals offered reasonable uncertainty levels for the estimated stiffness parameters.
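The size of the augmented state in Eq. (2.96) follows from simple bookkeeping. The tower has 37 nodes with five DOFs each, giving 185 DOFs, and the state adds velocities and two stiffness parameters. Every EKF step propagates a dense covariance of that size, and an operation such as $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T$ costs on the order of $n_y^3$ floating-point operations.

```python
def augmented_state_size(n_dof, n_params):
    """Length of y_i = [x_i^T, xdot_i^T, theta_i^T]^T in Eq. (2.96)."""
    return 2 * n_dof + n_params


n_nodes, dofs_per_node = 37, 5
n_dof = n_nodes * dofs_per_node         # 185 DOFs in the reduced-order model
n_y = augmented_state_size(n_dof, 2)    # two stiffness parameters
n_cov = n_y * n_y                       # entries of the state covariance matrix
print(n_dof, n_y, n_cov)                # 185 372 138384
```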

Fig. 2.28 Identification results of the stiffness parameters


Fig. 2.29 Comparison of the nominal and estimated displacement responses

Figures 2.29 and 2.30 present the nominal displacement and velocity responses, respectively, versus the corresponding estimated time histories. The dotted lines represent the estimated values and the solid lines represent the nominal values. The estimated values coincided well with the nominal values, so the EKF achieved accurate estimates of the displacement and velocity responses. It is concluded that the EKF is capable of tracking the optimal values of the states and model parameters online.

2.7 Extended Readings

In Sects. 2.2 and 2.4, the standard KF and the EKF were introduced using Bayes’ theorem. The standard KF is a special case of Bayesian filtering. In particular, the standard KF is the closed-form solution of the Bayesian filtering equations for filtering models in which the dynamical model and the observation model are linear and Gaussian. In this section, we briefly introduce the Bayesian filtering equations in a more general form (Jazwinski 1970).


Fig. 2.30 Comparison of the nominal and estimated velocity responses

Consider a general discrete-time linear/nonlinear dynamical system described by the following equation:

$$\mathbf{y}_{i+1} = \mathbf{G}(\mathbf{y}_i, \mathbf{f}_i; \boldsymbol{\theta}_i) \tag{2.97}$$

where $i$ is the time index; $\mathbf{y}_i$ is the state vector of the dynamical system; $\boldsymbol{\theta}_i$ is the model parameter vector characterizing the dynamical system; $\mathbf{f}_i$ is the input vector of the system; and $\mathbf{G}(\cdot)$ defines the governing equation of the underlying dynamical system. It is assumed that the discrete-time observation equation of the dynamical system in Eq. (2.97) can be expressed as follows:

$$\mathbf{z}_i = \mathbf{h}(\mathbf{y}_i) + \mathbf{n}_i \tag{2.98}$$

where $\mathbf{z}_i$ is the observation vector at the $i$th time step; $\mathbf{n}_i$ is the measurement noise vector, modeled as a zero-mean Gaussian i.i.d. process; and $\mathbf{h}(\cdot)$ defines the observed quantities. In addition, the measurement noise is assumed to be statistically independent of the input.


In Bayesian filtering, the actual state vector is assumed to be an unobserved Markov process. This implies that the actual state vector $\mathbf{y}_i$ at the $i$th time step, given the previous state vector $\mathbf{y}_{i-1}$ at the $(i-1)$th time step, is independent of all earlier states:

$$p(\mathbf{y}_i \mid \mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_{i-1}) = p(\mathbf{y}_i \mid \mathbf{y}_{i-1}) \tag{2.99}$$

In addition, the measurements are the observed states of a hidden Markov model. Therefore, the measurement $\mathbf{z}_i$ at the $i$th time step depends only on the current state $\mathbf{y}_i$ and is conditionally independent of all other states given the current state:

$$p(\mathbf{z}_i \mid \mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_{i-1}, \mathbf{y}_i) = p(\mathbf{z}_i \mid \mathbf{y}_i) \tag{2.100}$$

Based on Eqs. (2.99) and (2.100), the joint PDF of all states $\{\mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_{i-1}, \mathbf{y}_i\}$ of the hidden Markov model and the measurements can be obtained as follows:

$$p(\mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_{i-1}, \mathbf{y}_i, \mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_{i-1}, \mathbf{z}_i) = p(\mathbf{y}_0)\prod_{n=1}^{i} p(\mathbf{y}_n \mid \mathbf{y}_{n-1})\, p(\mathbf{z}_n \mid \mathbf{y}_n) \tag{2.101}$$

However, real-time implementation requires the PDF $p(\mathbf{y}_i \mid \mathcal{D}_i)$ of the current state conditioned on the measurements up to the current time step. This PDF can be obtained by marginalizing out the previous states and dividing by the PDF of the measurement dataset (Masreliez and Martin 1977). First, the conditional PDF of the one-step-ahead predicted state vector is obtained by integrating, over all possible $\mathbf{y}_{i-1}$, the product of two PDFs: the state transition PDF from the $(i-1)$th to the $i$th time step and the PDF of the previous state.

$$\begin{aligned} p(\mathbf{y}_i \mid \mathcal{D}_{i-1}) &= \int p(\mathbf{y}_i, \mathbf{y}_{i-1} \mid \mathcal{D}_{i-1})\, d\mathbf{y}_{i-1} \\ &= \int p(\mathbf{y}_i \mid \mathbf{y}_{i-1}, \mathcal{D}_{i-1})\, p(\mathbf{y}_{i-1} \mid \mathcal{D}_{i-1})\, d\mathbf{y}_{i-1} \\ &= \int p(\mathbf{y}_i \mid \mathbf{y}_{i-1})\, p(\mathbf{y}_{i-1} \mid \mathcal{D}_{i-1})\, d\mathbf{y}_{i-1} \end{aligned} \tag{2.102}$$

where the PDF $p(\mathbf{y}_i \mid \mathbf{y}_{i-1})$ can be obtained easily from the dynamical system equation in Eq. (2.97). The PDF $p(\mathbf{y}_i \mid \mathcal{D}_{i-1})$ is the prior PDF of $\mathbf{y}_i$. Given the measurement dataset $\mathcal{D}_i = \{\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_{i-1}, \mathbf{z}_i\}$ up to the $i$th time step, the conditional PDF of the updated state vector can be obtained using Bayes’ theorem:

$$p(\mathbf{y}_i \mid \mathcal{D}_i) = \frac{p(\mathbf{z}_i \mid \mathbf{y}_i)\, p(\mathbf{y}_i \mid \mathcal{D}_{i-1})}{p(\mathbf{z}_i \mid \mathcal{D}_{i-1})} \tag{2.103}$$


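where the denominator $p(\mathbf{z}_i \mid \mathcal{D}_{i-1})$ is a normalizing constant, which can be obtained in a similar fashion to Eq. (2.102):

$$\begin{aligned} p(\mathbf{z}_i \mid \mathcal{D}_{i-1}) &= \int p(\mathbf{z}_i, \mathbf{y}_i \mid \mathcal{D}_{i-1})\, d\mathbf{y}_i \\ &= \int p(\mathbf{z}_i \mid \mathbf{y}_i, \mathcal{D}_{i-1})\, p(\mathbf{y}_i \mid \mathcal{D}_{i-1})\, d\mathbf{y}_i \\ &= \int p(\mathbf{z}_i \mid \mathbf{y}_i)\, p(\mathbf{y}_i \mid \mathcal{D}_{i-1})\, d\mathbf{y}_i \end{aligned} \tag{2.104}$$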

The PDF $p(\mathbf{z}_i \mid \mathbf{y}_i)$ can be obtained from the measurement equation in Eq. (2.98) and the PDF of the measurement noise $\mathbf{n}_i$. As a result, the PDF $p(\mathbf{y}_i \mid \mathcal{D}_i)$ of the updated state vector can be readily obtained based on Eqs. (2.97), (2.98), (2.102), and (2.103). The PDF $p(\mathbf{y}_i \mid \mathcal{D}_i)$ is the posterior PDF of $\mathbf{y}_i$. Bayesian filtering can be implemented online by using Eq. (2.102) to calculate the prior PDF and Eq. (2.103) to calculate the posterior PDF recursively. Notably, analytical solutions of Eqs. (2.102) and (2.103) are available only in a few cases. Specifically, when the system equation in Eq. (2.97) and the measurement equation in Eq. (2.98) are linear and the noise terms are independent and Gaussian, the analytical solution of Eqs. (2.102) and (2.103) is given by the standard KF. The Bayesian filtering derivation thus shows that the standard Kalman filter is the optimal filter when the system and measurement models are linear with Gaussian noise terms.
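Equations (2.102)–(2.104) can be made concrete by evaluating the integrals numerically on a fixed grid. This is a point-mass approximation that we substitute for the exact integrals; it is not a method used in this book. The sketch uses a hypothetical scalar linear-Gaussian model with hypothetical measurements. For this case the text states that the standard KF is the exact solution, so the grid filter's posterior mean and variance should match the closed-form Kalman recursion.

```python
import math


def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def grid_bayes_filter(zs, a, q, r, m0, p0, lo=-8.0, hi=8.0, n=401):
    """Grid approximation of Eqs. (2.102)-(2.104) for the scalar model
    y_i = a*y_{i-1} + w_i, z_i = y_i + n_i with Gaussian w_i and n_i."""
    dx = (hi - lo) / (n - 1)
    grid = [lo + k * dx for k in range(n)]
    post = [gauss_pdf(g, m0, p0) for g in grid]                 # p(y_0)
    # Transition density p(y_i | y_{i-1}) on the grid, computed once
    trans = [[gauss_pdf(gj, a * gk, q) for gk in grid] for gj in grid]
    for z in zs:
        # Prediction, Eq. (2.102): integrate over the previous state
        prior = [sum(tj[k] * post[k] for k in range(n)) * dx for tj in trans]
        # Update, Eq. (2.103): likelihood times prior ...
        unnorm = [gauss_pdf(z, g, r) * p for g, p in zip(grid, prior)]
        # ... divided by the normalizing constant of Eq. (2.104)
        evidence = sum(unnorm) * dx
        post = [u / evidence for u in unnorm]
    mean = sum(g * p for g, p in zip(grid, post)) * dx
    var = sum((g - mean) ** 2 * p for g, p in zip(grid, post)) * dx
    return mean, var


def kalman_filter(zs, a, q, r, m0, p0):
    """Closed-form solution of the same recursion (linear-Gaussian case)."""
    m, p = m0, p0
    for z in zs:
        m, p = a * m, a * a * p + q                  # prediction
        k = p / (p + r)                              # Kalman gain (h = 1)
        m, p = m + k * (z - m), (1.0 - k) * p        # update
    return m, p


zs = [0.5, 1.2, 0.8, 1.5, 1.0]                       # hypothetical measurements
print(grid_bayes_filter(zs, a=0.9, q=0.5, r=0.3, m0=0.0, p0=1.0))
print(kalman_filter(zs, a=0.9, q=0.5, r=0.3, m0=0.0, p0=1.0))
```

The grid approach extends to nonlinear models, but its cost grows exponentially with the state dimension. This is why closed-form or approximate recursions such as the KF and EKF are used in practice.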

2.8 Concluding Remarks

This chapter introduced the standard Kalman filter (KF) and the extended Kalman filter (EKF). The KF and its variants have been widely applied in numerous technological fields and have become popular tools. The detailed derivations of the standard KF and the EKF were presented from a Bayesian perspective. The standard KF recursively tracks the estimated states of the system and the covariance of the estimation errors, so it provides not only the state estimates but also the associated uncertainty in real time. Applications to state estimation using the standard KF and to joint state-parameter estimation using the EKF were presented. The results show that the standard KF and the EKF have powerful tracking capabilities for the states of a dynamical system and achieve reasonable uncertainty quantification for the estimated states. However, applying the standard KF and its variants raises some critical challenges. For example, the statistical information of the process noise and the measurement noise must be specified. In addition, the KF is well known to be sensitive to asynchronism among different measurement channels. In


the following chapters, we introduce some recently developed methodologies for resolving these critical issues encountered in real-time system identification using the EKF.

References

Al-Mohy AH, Higham NJ (2010) A new scaling and squaring algorithm for the matrix exponential. SIAM J Matrix Anal Appl 31(3):970–989
Andrieu C, Doucet A, Singh SS, Tadic VB (2004) Particle methods for change detection, system identification, and control. Proc IEEE 92(3):423–438
Chatzi EN, Smyth AW (2009) The unscented Kalman filter and particle filter methods for nonlinear structural system identification with non-collocated heterogeneous sensing. Struct Control Health Monit 16(1):99–123
Chen WH, Lu ZR, Lin W, Chen SH, Ni YQ, Xia Y, Liao WY (2011) Theoretical and experimental modal analysis of the Guangzhou New TV Tower. Eng Struct 33(12):3628–3646
Galka A, Yamashita O, Ozaki T, Biscay R, Valdés-Sosa P (2004) A solution to the dynamical inverse problem of EEG generation using spatiotemporal Kalman filtering. Neuroimage 23(2):435–453
Grewal MS, Andrews AP (2010) Applications of Kalman filtering in aerospace 1960 to the present. IEEE Control Syst Mag 30(3):69–78
Henderson HV, Searle SR (1981) On deriving the inverse of a sum of matrices. SIAM Rev 23(1):53–60
Hoshiya M, Saito E (1984) Structural identification by extended Kalman filter. J Eng Mech 110(12):1757–1770
Jazwinski AH (1970) Stochastic processes and filtering theory. Academic Press
Kalman RE (1960) A new approach to linear filtering and prediction problems. J Fluids Eng 82(1):35–45
Kendoul F (2012) Survey of advances in guidance, navigation, and control of unmanned rotorcraft systems. J Field Robot 29(2):315–378
Koh CG, See LM (1994) Identification and uncertainty estimation of structural parameters. J Eng Mech 120(6):1219–1236
Kreucher C, Kastella K, Hero AO (2005) Multitarget tracking using the joint multitarget probability density. IEEE Trans Aerosp Electron Syst 41(4):1396–1414
Lamus C, Hämäläinen MS, Temereanca S, Brown EN, Purdon PL (2012) A spatiotemporal dynamic distributed solution to the MEG inverse problem. Neuroimage 63(2):894–909
Masreliez C, Martin R (1977) Robust Bayesian estimation for the linear model and robustifying the Kalman filter. IEEE Trans Autom Control 22(3):361–371
Musicki D, La Scala BF, Evans RJ (2007) Integrated track splitting filter-efficient multi-scan single target tracking in clutter. IEEE Trans Aerosp Electron Syst 43(4):1409–1425
Nesline FW, Zarchan P (1981) A new look at classical vs modern homing missile guidance. J Guid Control 4(1):78–85
Ni YQ, Xia Y, Liao WY, Ko JM (2009) Technology innovation in developing the structural health monitoring system for Guangzhou New TV Tower. Struct Control Health Monit 16(1):73–98
Ni YQ, Xia Y, Lin W, Chen WH, Ko JM (2012) SHM benchmark for high-rise structures: a reduced-order finite element model and field measurement data. Smart Struct Syst 10(4–5):411–426
Pan S, Su H, Chu J, Wang H (2010) Applying a novel extended Kalman filter to missile-target interception with APN guidance law: a benchmark case study. Control Eng Pract 18(2):159–167
Qi H, Moore JB (2002) Direct Kalman filtering approach for GPS/INS integration. IEEE Trans Aerosp Electron Syst 38(2):687–693
Siouris GM (2004) Missile guidance and control systems. Springer Science & Business Media
Sorenson HW, Sacks JE (1971) Recursive fading memory filtering. Inf Sci 3(2):101–119


Vaiana N, Sessa S, Marmo F, Rosati L (2018) A class of uniaxial phenomenological models for simulating hysteretic phenomena in rate-independent mechanical systems and materials. Nonlinear Dyn 93(3):1647–1669
Wang W, Liu ZY, Xie RR (2006) Quadratic extended Kalman filter approach for GPS/INS integration. Aerosp Sci Technol 10(8):709–713
Xu H, Wang J, Zhan X (2012) Autonomous broadcast ephemeris improvement for GNSS using inter-satellite ranging measurements. Adv Space Res 49(6):1034–1044
Zarchan P, Musoff H (2000) Fundamentals of Kalman filtering: a practical approach. American Institute of Aeronautics and Astronautics Incorporated

Chapter 3

Real-Time Updating of Noise Parameters for System Identification

Abstract This chapter presents an algorithm for real-time updating of the noise covariance matrices in the extended Kalman filter. It is motivated by practical applications, in which the noise statistics required by the Kalman filter or its variants are usually not known a priori. To address this issue, a Bayesian probabilistic algorithm is developed to estimate the noise parameters that parameterize the noise covariance matrices in the extended Kalman filter. A computationally efficient algorithm is then introduced to solve the optimization problem formulated through Bayesian inference. The proposed method not only estimates the optimal noise parameters but also quantifies the associated estimation uncertainty in real time. It does not impose any stationarity condition on the process noise or the measurement noise. By removing this stationarity constraint, the method extends the applicability of real-time system identification to the nonstationary circumstances generally encountered in practice. Examples using stationary and nonstationary responses of linear and nonlinear time-varying dynamical systems are presented to illustrate the practical aspects of real-time system identification.

Keywords Bayesian inference · Extended Kalman filter · Measurement noise · Noise covariance matrices · Nonstationary response · Process noise

3.1 Introduction

The Kalman filter (KF) and extended Kalman filter (EKF) introduced in Chap. 2 are attractive choices for state tracking, system identification and control design for dynamical systems. However, when a KF or an EKF is implemented on a real system, it may perform poorly or even fail. One of the primary causes of such failures is misspecification of the covariance matrices of the process noise and the measurement noise. Kalman (1960) pointed out the problem of unknown noise statistics and the need for a method to address it, since the performance of the KF and EKF depends on the covariance matrices assigned to the process noise and measurement noise. In practice, these covariance matrices are usually assumed to be known (Hoshiya and Saito 1984; Koh and See 1994; Lin and Zhang 1994). However, in most practical situations this assumption is difficult to fulfill, so the covariance matrices of the process noise and the measurement noise are often tuned manually by ad hoc trial and error (Azam et al. 2015).

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
K. Huang and K.-V. Yuen, Bayesian Real-Time System Identification, https://doi.org/10.1007/978-981-99-0593-5_3

On the other hand, the EKF also provides the covariance matrix of the filtered state vector. Since the model parameters of the underlying dynamical system are included in the augmented state vector for the implementation of the EKF, the submatrix of this covariance matrix corresponding to the model parameter entries represents the posterior uncertainty of the model parameters. In other words, the EKF in principle allows for uncertainty quantification of the updated model parameters. However, we will demonstrate that this can be realized only when the noise covariance matrices are accurately assigned or estimated. This justifies the importance of the noise parameter estimation algorithm presented in this chapter.

Fig. 3.1 Chain-like system (masses m1 to m5, springs k1 to k5, dampers c1 to c5, excitation f(t))

Example. Ad hoc selection of the noise covariance matrices in the EKF. Consider the 5-degree-of-freedom chain-like system depicted in Fig. 3.1. The masses and spring constants are taken to be $m_1 = m_2 = m_3 = m_4 = m_5 = 1.0$ kg and $k_1 = k_2 = k_3 = k_4 = k_5 = 5000$ N/m. The system is subjected to a stationary excitation $f$ applied on the 5th degree of freedom (DOF), modeled as Gaussian with zero mean and covariance matrix $\Sigma_f$. The governing equation of the system is:

$$M\ddot{x}(t) + C\dot{x}(t) + Kx(t) = T f(t) \tag{3.1}$$

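where $x = [x_1, x_2, x_3, x_4, x_5]^T$ is the nodal displacement vector; $M = m_1 I_5$ is the mass matrix, with $I_5$ being the 5 × 5 identity matrix; and $K$ is the stiffness matrix, parameterized by the stiffness parameter $\theta_k$:

$$K = \theta_k k_1 \begin{bmatrix} 2 & -1 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & -1 & 2 & -1 \\ 0 & 0 & 0 & -1 & 1 \end{bmatrix} \tag{3.2}$$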

The resulting fundamental frequency of the system is 3.20 Hz. The damping matrix $C$ is:

$$C = \alpha M + \beta K \tag{3.3}$$

where α = 0.2998 s⁻¹ and β = 2.536 × 10⁻⁴ s. The excitation distribution matrix $T$ is:

$$T = [0, 0, 0, 0, 1]^T \tag{3.4}$$
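The quoted fundamental frequency and Rayleigh coefficients can be cross-checked numerically. The following Python/NumPy sketch is our own consistency check, not part of the original example. It assumes a fixed-free chain (the last diagonal entry of the stiffness pattern equal to 1, consistent with the five springs of Fig. 3.1), a stiffness-to-mass ratio $k_1/m_1 = 5000$ s⁻², $\theta_k = 1$, and a damping ratio of 1% in the first two modes. These assumptions are ours, chosen because they reproduce the quoted values.

```python
import numpy as np

# Assumptions (ours): fixed-free chain, so the last diagonal entry of the
# stiffness pattern is 1; k1 / m1 = 5000 s^-2; theta_k = 1; Rayleigh damping
# with a 1% damping ratio in modes 1 and 2.
m1, k1, theta_k = 1.0, 5000.0, 1.0   # kg, N/m, dimensionless
n_dof = 5
zeta = 0.01

# Tridiagonal stiffness pattern of Eq. (3.2), fixed at DOF 1's side, free at DOF 5
pattern = 2.0 * np.eye(n_dof) - np.eye(n_dof, k=1) - np.eye(n_dof, k=-1)
pattern[-1, -1] = 1.0
K = theta_k * k1 * pattern

# With M = m1 * I5 the generalized eigenproblem K v = w^2 M v reduces to K / m1
omega = np.sqrt(np.linalg.eigvalsh(K / m1))  # natural frequencies in rad/s, ascending
f1 = omega[0] / (2.0 * np.pi)                # fundamental frequency in Hz

# Rayleigh coefficients giving damping ratio zeta in modes 1 and 2, from
# zeta_j = alpha / (2 w_j) + beta * w_j / 2
w1, w2 = omega[0], omega[1]
alpha = 2.0 * zeta * w1 * w2 / (w1 + w2)
beta = 2.0 * zeta / (w1 + w2)

print(f"f1 = {f1:.2f} Hz, alpha = {alpha:.4f} 1/s, beta = {beta:.3e} s")
```

Under these assumptions the script reports f1 ≈ 3.20 Hz, α ≈ 0.2998 s⁻¹ and β ≈ 2.536 × 10⁻⁴ s, matching the values quoted above.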

Define the model parameter vector $\theta(t)$ and the augmented state vector $y(t)$ as follows:

$$\theta(t) \equiv [\theta_k, \alpha, \beta]^T \tag{3.5}$$

$$y(t) \equiv \left[x(t)^T, \dot{x}(t)^T, \theta(t)^T\right]^T \in \mathbb{R}^{13} \tag{3.6}$$

Then, Eq. (3.1) can be converted to the following state-space representation:

$$\dot{y}(t) = g\big(y(t), f(t); \theta(t)\big) \tag{3.7}$$

where $g(\cdot, \cdot; \cdot)$ represents the nonlinear state-space function. The excitation $f$ is a stationary zero-mean Gaussian white noise with spectral intensity $S_0 = 5.0 \times 10^{-5}$ m²/s³, so the covariance matrix of the input is a scalar, $\Sigma_f = \sigma_f^2$. For the sampling time step $\Delta t = 0.0025$ s used below, the discrete-time variance is $\sigma_f^2 = 2\pi S_0/\Delta t \approx 0.126$, which is the actual value listed for Case 1 in Table 3.1. It is assumed that discrete-time acceleration responses of all the DOFs are observed, so the observation equation is given by:

$$z_i = h(y_i) + n_i, \quad i = 1, 2, \ldots \tag{3.8}$$

where $z_i \in \mathbb{R}^5$ is the vector of noise-corrupted acceleration measurements at the ith time step; $y_i = y(i\Delta t)$, where $\Delta t$ is the sampling time step; $h(\cdot)$ defines the observation quantities; and $n_i$ represents the measurement noise at the ith time step, modeled as a Gaussian independent and identically distributed (i.i.d.) process with zero mean and covariance matrix $\Sigma_n = \sigma_n^2 I_5$. In particular, the rms of the measurement noise is taken to be 5% of the rms of the noise-free acceleration of the 5th node. The accelerations are measured with a sampling time step of 0.0025 s for 100 s. Using the EKF introduced in Chap. 2, the augmented state vector can be estimated recursively based on Eqs. (3.7) and (3.8); the derivation is elaborated in detail in Chap. 2. Note that the process noise covariance matrix in Eq. (2.66) and the measurement noise covariance matrix in Eq. (2.69) need to be prescribed. To investigate the performance of the EKF with different noise covariance matrices, four representative sets of noise covariance matrices are compared, as listed in Table 3.1. The second and third columns refer to the process noise covariance matrix and the measurement noise covariance matrix, respectively. Case 1 uses the actual noise covariance matrices. Case 2 fixes $\Sigma_n$ at its actual value and uses a much smaller value of $\Sigma_f$. Case 3 fixes $\Sigma_f$ at its actual value and uses a ten times larger value of $\Sigma_n$. Case 4 fixes $\Sigma_n$ at its actual value and uses a $\Sigma_f$ that is 1000 times larger than the actual value.

Table 3.1 Selected noise covariance matrices for the EKF

| Case | $\Sigma_f$ | $\Sigma_n$ |
|---|---|---|
| 1 | 0.126 | $2.62 \times 10^{-4} I_5$ |
| 2 | 0.001 | $2.62 \times 10^{-4} I_5$ |
| 3 | 0.126 | $2.62 \times 10^{-3} I_5$ |
| 4 | 125.664 | $2.62 \times 10^{-4} I_5$ |

The estimation results of the stiffness parameter and damping coefficients obtained using the EKF are shown in Fig. 3.2. The dotted lines represent the estimated values, the solid lines represent the actual values and the dashed lines represent the bounds of the 99.7% credible intervals. First, since the actual noise covariance matrices are utilized in Case 1, correct inference is achieved for both the estimated model parameters and their credible intervals. In Case 2, the estimated model parameters fluctuate severely and are biased throughout the entire time history. In Case 3, where the measurement noise covariance matrix $\Sigma_n$ differs from its actual value, the obtained credible intervals show substantial discrepancies compared with Case 1. In Case 4, this set of noise covariance matrices leads to divergent estimates even though the measurement noise covariance matrix is exact.

Fig. 3.2 Estimation results of the stiffness parameter and damping coefficients using the EKF


It is concluded that ad hoc assignment of the noise covariance matrices in the EKF may lead to biased estimates, unreliable uncertainty quantification and even divergence. It is therefore highly desirable to develop an updating approach for the noise covariance matrices in the KF and EKF. The parameters in the noise covariance matrices are often treated as tuning parameters in state and parameter estimation/tracking. Previous investigations on the estimation of unknown noise covariance matrices focused on offline estimation for the linear KF (Mehra 1970; Mohamed and Schwarz 1999; Odelson et al. 2006b; Yuen et al. 2007). Some of these investigators noted that their methods have potential for online application, and these works have served as the starting point for online estimation of the noise covariance matrices. Methods for online estimation of the noise covariance matrices can be classified into four categories, i.e., covariance matching (Myers and Tapley 1976; Odelson et al. 2006a), correlation techniques (Bélanger 1974), maximum likelihood methods (Dempster et al. 1977; Bavdekar et al. 2011) and Bayesian methods (Shumway and Stoffer 2000; Yuen et al. 2013; Yuen and Kuok 2016). Most existing works focused on the estimation of the process and measurement noise for linear dynamical systems, and very few address real-time noise identification for nonlinear dynamical systems, which is important for structural health monitoring. In this chapter, a real-time updating approach for the covariance matrices of the process noise and measurement noise in the EKF using output-only measurements is developed. The proposed method runs simultaneously with the system identification of time-varying dynamical systems.
The algorithm is established based on Bayesian inference, so it provides not only the most probable values of the noise parameters characterizing the noise covariance matrices but also the associated estimation uncertainty. It avoids the possible divergence and instability caused by ad hoc selection of the noise covariance matrices in the implementation of the KF and EKF. Moreover, reliable real-time uncertainty quantification can be achieved owing to the reliable estimation of the noise parameters. Uncertainty quantification is important for many applications, such as damage detection and reliability analysis. In the next section, the simultaneous real-time updating scheme for the model parameters and noise parameters is presented, the parameterization of the process and measurement noise covariance matrices is introduced, and the Bayesian probabilistic framework for real-time updating of the noise parameters in the EKF is formulated. A computationally efficient algorithm for solving the optimization problem formulated in Sect. 3.2 is elaborated in Sect. 3.3. In Sect. 3.4, illustrative examples are presented for the system identification of a five-DOF Bouc-Wen hysteretic system and a three-pier bridge. Finally, concluding remarks are given in Sect. 3.5.


3.2 Real-Time Updating of Dynamical Systems and Noise Parameters

3.2.1 Updating of States and Model Parameters

Consider the state-space equation of a general linear/nonlinear dynamical system with $N_d$ DOFs:

$$\dot{x}(t) = G\big(x(t), f(t); \theta(t)\big) \tag{3.9}$$

where $x(t) \in \mathbb{R}^{N_x}$ is the state vector of the system; $\theta(t) \in \mathbb{R}^{N_\theta}$ is the possibly time-varying model parameter vector; and $G(\cdot, \cdot; \cdot)$ is the governing equation of the underlying dynamical system. $f$ is the excitation applied to the system, modeled as an $N_f$-variate zero-mean Gaussian stochastic process. Define the augmented state vector $y(t) \equiv \left[x(t)^T, \theta(t)^T\right]^T \in \mathbb{R}^{N_x + N_\theta}$, composed of the system state vector and the unknown model parameter vector. Then, Eq. (3.9) can be expressed as the following augmented state-space equation:

$$\dot{y}(t) = g\big(y(t), f(t); \theta(t)\big) \tag{3.10}$$

Equation (3.10) can be expanded in a Taylor series around the state $y(t) = y^*$ as follows:

$$\dot{y}(t) = g\big|_{y=y^*} + \left.\frac{\partial g}{\partial y}\right|_{y=y^*}\left(y - y^*\right) + \left.\frac{\partial g}{\partial f}\right|_{y=y^*} f \tag{3.11}$$

where higher-order terms are neglected under the assumption that $y$ is close to $y^*$. Then, Eq. (3.11) can be discretized into the following equation:

$$y_{i+1} = A_i y_i + B_i f_i + \delta_i \tag{3.12}$$

where $y_i \equiv y(i\Delta t)$, $f_i \equiv f(i\Delta t)$ and $\Delta t$ is the sampling time step; $A_i$ and $B_i$ are the transition and input-to-state matrices, respectively; and $\delta_i$ is the remainder term due to the local linear approximation. The detailed derivation of Eq. (3.12) and of $A_i$, $B_i$ and $\delta_i$ can be found in Chap. 2. The uncertain excitation $f_i$ at the ith time step is Gaussian with zero mean and covariance matrix $\Sigma_{f,i}$. Discrete-time response measurements $z_1, z_2, \ldots, z_{i+1}$ are observed at $N_0$ DOFs:

$$z_{i+1} = h(y_{i+1}) + n_{i+1} \tag{3.13}$$

where $h(\cdot)$ defines the observation quantities and $n_{i+1}$ represents the measurement noise at the (i + 1)th time step, modeled as a zero-mean Gaussian process, independent across time steps, with covariance matrix $\Sigma_{n,i+1} \in \mathbb{R}^{N_0 \times N_0}$.


Given the measurement dataset $D_i = \{z_1, z_2, \ldots, z_i\}$, the one-step-ahead predicted state vector $y_{i+1|i}$ and its covariance matrix $\Sigma_{i+1|i}$ can be obtained as:

$$y_{i+1|i} = A_i y_{i|i} + \delta_i \tag{3.14}$$

$$\Sigma_{i+1|i} = \lambda\left(A_i \Sigma_{i|i} A_i^T + B_i \Sigma_{f,i} B_i^T\right) \tag{3.15}$$

where λ is the fading factor introduced in Chap. 2. When a new measurement $z_{i+1}$ is available, the updated state vector and its covariance matrix can be obtained by the Kalman filter (Kalman 1960); the detailed derivation can be found in Chap. 2. The Kalman gain matrix is:

$$G_{i+1} = \Sigma_{i+1|i} H_{i+1}^T \left(H_{i+1} \Sigma_{i+1|i} H_{i+1}^T + \Sigma_{n,i+1}\right)^{-1} \tag{3.16}$$

where $H_{i+1}$ is the observation matrix at the (i + 1)th time step. Then, the updated state vector and its associated covariance matrix are readily obtained:

$$y_{i+1|i+1} = y_{i+1|i} + G_{i+1}\left(z_{i+1} - h(y_{i+1|i})\right) \tag{3.17}$$

$$\Sigma_{i+1|i+1} = \left(I_{2N_d+N_\theta} - G_{i+1} H_{i+1}\right)\Sigma_{i+1|i} \tag{3.18}$$
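The prediction and update equations above can be collected into a single routine. The following Python/NumPy sketch is a minimal illustration, assuming that the linearized quantities $A_i$, $B_i$, $\delta_i$ and the Jacobian $H_{i+1}$ have already been computed as described in Chap. 2; the function name `ekf_step` and its argument order are our own. The gain is obtained with `numpy.linalg.solve` rather than an explicit matrix inverse.

```python
import numpy as np

def ekf_step(y_filt, P_filt, A, B, delta, Sigma_f, Sigma_n, z_new, h, H, lam=1.0):
    """One prediction/update cycle following Eqs. (3.14)-(3.18).

    y_filt, P_filt   : filtered state y_{i|i} and its covariance Sigma_{i|i}
    A, B, delta      : linearized transition matrix, input matrix and remainder term
    Sigma_f, Sigma_n : process-noise and measurement-noise covariance matrices
    z_new            : new measurement z_{i+1}
    h, H             : observation function h(.) and its Jacobian H_{i+1}
    lam              : fading factor (lam >= 1; lam = 1 gives the standard filter)
    """
    # Prediction, Eqs. (3.14)-(3.15)
    y_pred = A @ y_filt + delta
    P_pred = lam * (A @ P_filt @ A.T + B @ Sigma_f @ B.T)
    # Kalman gain, Eq. (3.16): solve G S = P_pred H^T without forming S^{-1}
    S = H @ P_pred @ H.T + Sigma_n
    G = np.linalg.solve(S.T, (P_pred @ H.T).T).T
    # Update, Eqs. (3.17)-(3.18)
    y_upd = y_pred + G @ (z_new - h(y_pred))
    P_upd = (np.eye(len(y_filt)) - G @ H) @ P_pred
    return y_upd, P_upd

# Hypothetical scalar check: random walk observed directly, prior N(0, 1),
# unit process and measurement noise, measurement z = 2.
H1 = np.array([[1.0]])
y_chk, P_chk = ekf_step(np.zeros(1), np.eye(1), np.eye(1), np.eye(1), np.zeros(1),
                        np.eye(1), np.eye(1), np.array([2.0]), lambda y: H1 @ y, H1)
print(y_chk, P_chk)  # expected: y = 4/3, P = 2/3
```

In the scalar check the predicted variance is 2, the innovation variance is 3 and the gain is 2/3, which gives the hand-computable results in the final comment.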

It is clear that the accuracy of the updated state vector and its associated covariance matrix depends on the covariance matrix of the process noise $\Sigma_{f,i}$ in Eq. (3.15) and the covariance matrix of the measurement noise $\Sigma_{n,i+1}$ in Eq. (3.16). Moreover, the posterior uncertainty of the model parameters is represented by the submatrix of the covariance matrix of the updated state vector that corresponds to the model parameters included in the augmented state vector. Thus, the EKF allows for uncertainty quantification of the updated model parameters. However, the resulting posterior covariance matrix in Eq. (3.18) depends entirely on the noise covariance matrices, which in practice are usually prescribed subjectively by the user. As a result, the uncertainty quantification results will be arbitrary whenever the noise covariance matrices are assigned arbitrarily. For example, if both noise covariance matrices (together with the initial state covariance matrix) are scaled up by a factor of ρ, the Kalman gain in Eq. (3.16) is unchanged, whereas the covariance matrices of the predicted and filtered state vectors are also scaled up by the same factor ρ according to Eqs. (3.15) and (3.18). Therefore, an improper choice of these matrices leads to erroneous uncertainty quantification results.
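The scaling argument can be checked numerically. The sketch below (Python/NumPy; the function name and the two-state test system are hypothetical) iterates only the covariance equations with constant A, B and H, once with the original matrices and once with the initial covariance, $\Sigma_f$ and $\Sigma_n$ all multiplied by ρ = 10. Holding the matrices constant is a simplification of ours; in the EKF they change with time.

```python
import numpy as np

def filtered_covariance(P0, A, B, H, Sigma_f, Sigma_n, n_steps, lam=1.0):
    """Covariance part of the filter, Eqs. (3.15), (3.16) and (3.18).

    A, B and H are held constant here for simplicity (in the EKF they vary
    with time). Returns the last Kalman gain and filtered covariance.
    """
    P, G = P0, None
    for _ in range(n_steps):
        P_pred = lam * (A @ P @ A.T + B @ Sigma_f @ B.T)
        S = H @ P_pred @ H.T + Sigma_n
        G = P_pred @ H.T @ np.linalg.inv(S)
        P = (np.eye(P.shape[0]) - G @ H) @ P_pred
    return G, P

# Hypothetical lightly damped 2-state system with one input and one sensor
A = np.array([[1.0, 0.01], [-0.5, 0.99]])
B = np.array([[0.0], [0.01]])
H = np.array([[1.0, 0.0]])
P0, Sigma_f, Sigma_n = np.eye(2), np.array([[1.0]]), np.array([[0.1]])

rho = 10.0  # common scale factor applied to P0, Sigma_f and Sigma_n
G1, P1 = filtered_covariance(P0, A, B, H, Sigma_f, Sigma_n, n_steps=200)
G2, P2 = filtered_covariance(rho * P0, A, B, H, rho * Sigma_f, rho * Sigma_n, n_steps=200)

print(np.allclose(G1, G2), np.allclose(P2, rho * P1))  # expected: True True
```

Because the gain, and therefore the state estimate, is insensitive to the common scale ρ, a consistently mis-scaled pair of noise covariance matrices can still track the states while reporting credible intervals that are too wide or too narrow by the factor √ρ.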


3.2.2 Updating of Noise Parameters

3.2.2.1 Parameterization of Noise Covariance Matrices

The covariance matrices of the process noise and measurement noise are parameterized as follows:

$$\Sigma_{f,i} = \Sigma_{f,i}(\psi_{f,i}) \tag{3.19}$$

$$\Sigma_{n,i+1} = \Sigma_{n,i+1}(\psi_{n,i+1}) \tag{3.20}$$

where $\psi_{f,i}$ and $\psi_{n,i+1}$ are the noise parameters of the covariance matrices $\Sigma_{f,i}$ and $\Sigma_{n,i+1}$, respectively. The noise parameter vector $\psi_{i+1}$ is then defined to group the noise parameters $\psi_{f,i}$ and $\psi_{n,i+1}$:

$$\psi_{i+1} \equiv \left[\psi_{f,i}^T, \psi_{n,i+1}^T\right]^T \in \mathbb{R}^{N_\psi} \tag{3.21}$$

where $N_\psi$ is the number of parameters in the noise parameter vector $\psi_{i+1}$. Note that this parameterization does not require any stationarity condition on the process noise or the measurement noise.

3.2.2.2 Posterior PDF of Noise Parameters

Using Bayes' theorem, the posterior probability density function (PDF) of the noise parameter vector given the measurement dataset $D_{i+1}$ is:

$$p(\psi_{i+1} \mid D_{i+1}) = p(\psi_{i+1} \mid z_{i+1}, D_i) = \kappa_0\, p(\psi_{i+1} \mid D_i)\, p(z_{i+1} \mid \psi_{i+1}, D_i) \tag{3.22}$$

where $\kappa_0$ is a normalizing constant such that the integral of the posterior PDF over the entire domain of $\psi_{i+1}$ equals unity; $p(\psi_{i+1} \mid D_i)$ is the prior PDF of the noise parameter vector; and $p(z_{i+1} \mid \psi_{i+1}, D_i)$ is the likelihood function. By assuming that the noise covariance matrices are slowly time-varying, the prior PDF $p(\psi_{i+1} \mid D_i)$ can be approximated by a Gaussian distribution whose mean is the estimate of the previous time step, $\psi_{i|i}$, and whose covariance matrix is the covariance matrix of the previous time step, $\Sigma_{\psi,i|i}$, inflated by the fading factor λ. As a result, this prior PDF can be expressed as:

$$p(\psi_{i+1} \mid D_i) = (2\pi)^{-\frac{N_\psi}{2}} \left|\lambda \Sigma_{\psi,i|i}\right|^{-\frac{1}{2}} \exp\left[-\frac{1}{2\lambda}\left(\psi_{i+1} - \psi_{i|i}\right)^T \Sigma_{\psi,i|i}^{-1}\left(\psi_{i+1} - \psi_{i|i}\right)\right] \tag{3.23}$$

where λ (≥ 1) is the fading factor used to dilute the contribution of past measurements and weigh recent measurements more heavily. The likelihood function $p(z_{i+1} \mid \psi_{i+1}, D_i)$ reflects the contribution of the measurement $z_{i+1}$ in establishing the posterior PDF at the (i + 1)th time step and is given by:

$$p(z_{i+1} \mid \psi_{i+1}, D_i) = (2\pi)^{-\frac{N_0}{2}} \left|\Sigma_{z,i+1|i}\right|^{-\frac{1}{2}} \exp\left[-\frac{1}{2}\left(z_{i+1} - z_{i+1|i}\right)^T \Sigma_{z,i+1|i}^{-1}\left(z_{i+1} - z_{i+1|i}\right)\right] \tag{3.24}$$

where $z_{i+1|i}$ is the one-step-ahead predicted observation, obtained by taking the expectation of Eq. (3.13):

$$z_{i+1|i} \equiv E\left[z_{i+1} \mid D_i\right] = H y_{i+1|i} = H A_i y_{i|i} + H \delta_i \tag{3.25}$$

In addition, $\Sigma_{z,i+1|i}$ is the covariance matrix of the one-step-ahead predicted observation, which can be obtained using Eqs. (3.15) and (3.25):

$$\begin{aligned} \Sigma_{z,i+1|i} &\equiv E\left[\left(z_{i+1} - z_{i+1|i}\right)\left(z_{i+1} - z_{i+1|i}\right)^T \,\middle|\, D_i\right] \\ &= H \Sigma_{y,i+1|i} H^T + \Sigma_{n,i+1} \\ &= \lambda H A_i \Sigma_{y,i|i} A_i^T H^T + \lambda H B_i \Sigma_{f,i} B_i^T H^T + \Sigma_{n,i+1} \end{aligned} \tag{3.26}$$

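Note that the noise parameters enter the right-hand side of Eq. (3.24) implicitly through $\Sigma_{z,i+1|i}$ (via $\Sigma_{f,i}$ and $\Sigma_{n,i+1}$ in Eq. (3.26)). The posterior PDF $p(\psi_{i+1} \mid D_{i+1})$ is then readily obtained by substituting Eqs. (3.23) and (3.24) into Eq. (3.22):

$$p(\psi_{i+1} \mid D_{i+1}) = \kappa_1 \left|\Sigma_{z,i+1|i}\right|^{-\frac{1}{2}} \exp\left[-\frac{1}{2\lambda}\left(\psi_{i+1} - \psi_{i|i}\right)^T \Sigma_{\psi,i|i}^{-1}\left(\psi_{i+1} - \psi_{i|i}\right) - \frac{1}{2}\left(z_{i+1} - z_{i+1|i}\right)^T \Sigma_{z,i+1|i}^{-1}\left(z_{i+1} - z_{i+1|i}\right)\right] \tag{3.27}$$

where the constant $\kappa_1 = \kappa_0 (2\pi)^{-\frac{N_\psi + N_0}{2}} \left|\lambda \Sigma_{\psi,i|i}\right|^{-\frac{1}{2}}$ does not depend on the noise parameters in $\psi_{i+1}$. As a result, the objective function can be defined as the negative logarithm of the posterior PDF, excluding the constant term:

$$\begin{aligned} J(\psi_{i+1}) &\equiv -\ln p(\psi_{i+1} \mid D_{i+1}) + \ln \kappa_1 \\ &= \frac{1}{2}\left[\ln\left|\Sigma_{z,i+1|i}\right| + \frac{1}{\lambda}\left(\psi_{i+1} - \psi_{i|i}\right)^T \Sigma_{\psi,i|i}^{-1}\left(\psi_{i+1} - \psi_{i|i}\right) + \left(z_{i+1} - z_{i+1|i}\right)^T \Sigma_{z,i+1|i}^{-1}\left(z_{i+1} - z_{i+1|i}\right)\right] \end{aligned} \tag{3.28}$$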
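For a concrete view of Eqs. (3.26) and (3.28), the sketch below evaluates the objective function for one candidate noise parameter vector. It is a minimal Python/NumPy illustration, assuming (as in the examples of Sect. 3.4) that $\psi = [\sigma_f^2, \sigma_n^2]^T$ with $\Sigma_f = \sigma_f^2 I$ and $\Sigma_n = \sigma_n^2 I$; the function name and argument layout are hypothetical. The constant $\ln \kappa_1$ is omitted because it does not depend on ψ, and the log-determinant is computed with `numpy.linalg.slogdet` for numerical stability.

```python
import numpy as np

def objective_J(psi, psi_prev, Sigma_psi_prev, lam, innov, C_state, HB):
    """Objective function of Eq. (3.28), excluding the constant ln(kappa_1).

    Assumed parameterization (illustrative): psi = [sigma_f^2, sigma_n^2],
    Sigma_f = sigma_f^2 * I and Sigma_n = sigma_n^2 * I.
    psi, psi_prev  : candidate psi_{i+1} and previous estimate psi_{i|i}
    Sigma_psi_prev : covariance matrix Sigma_{psi,i|i}
    innov          : innovation z_{i+1} - z_{i+1|i}
    C_state        : H A_i Sigma_{y,i|i} A_i^T H^T, which does not depend on psi
    HB             : the product H B_i
    """
    sigma_f2, sigma_n2 = psi
    n_obs = len(innov)
    # Eq. (3.26): covariance of the one-step-ahead predicted observation
    Sigma_z = lam * C_state + lam * sigma_f2 * (HB @ HB.T) + sigma_n2 * np.eye(n_obs)
    sign, logdet = np.linalg.slogdet(Sigma_z)
    if sign <= 0:
        return np.inf  # non-positive determinant: not a valid covariance matrix
    d_psi = psi - psi_prev
    prior_term = d_psi @ np.linalg.solve(Sigma_psi_prev, d_psi) / lam
    data_term = innov @ np.linalg.solve(Sigma_z, innov)
    return 0.5 * (logdet + prior_term + data_term)

# Hypothetical single-channel check: Sigma_z = 2 and innovation 2, so
# J = 0.5 * (ln 2 + 2^2 / 2)
J_chk = objective_J(np.array([1.0, 2.0]), np.array([1.0, 2.0]), np.eye(2), 1.0,
                    np.array([2.0]), np.zeros((1, 1)), np.zeros((1, 1)))
```

Returning `np.inf` for an invalid $\Sigma_z$ is our own safeguard; it lets a candidate-based search simply discard such points.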


The updated noise parameter vector $\psi_{i+1|i+1}$ is obtained by maximizing the posterior PDF $p(\psi_{i+1} \mid D_{i+1})$ in Eq. (3.27), which is equivalent to minimizing the objective function in Eq. (3.28):

$$\psi_{i+1|i+1} = \arg\min_{\psi_{i+1}} J(\psi_{i+1}) \tag{3.29}$$

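Since $\psi_{i+1}$ enters $J$ nonlinearly through $\Sigma_{z,i+1|i}$ (in both the log-determinant and the inverse), there is in general no closed-form solution to the optimization problem in Eq. (3.29). A computationally efficient algorithm is therefore desirable and is presented next.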

3.3 Efficient Numerical Optimization Scheme

In this section, an efficient numerical optimization scheme is proposed to solve the optimization problem in Eq. (3.29). The scheme consists of two phases, namely the training phase and the working phase.

3.3.1 Training Phase

In the early stage of the filter propagation, the initial condition and the prior distribution dominate the posterior PDF in Eq. (3.27), since only a limited amount of measured data is available. Moreover, because of the inaccuracy of the initial condition and the lack of information from the measurements, the posterior PDF of the noise parameter vector has large uncertainty, so direct application of a Newton-type gradient method in this phase would lead to erroneous and possibly divergent estimates. To handle this challenging numerical environment, a heuristic stochastic local search method is used in the training phase. First, a candidate pool $\Theta_{i+1}$ consisting of $N_\tau$ candidate parameter vectors is constructed for the (i + 1)th time step:

$$\Theta_{i+1} \equiv \left\{\psi_{i+1} = \psi_{i|i}\right\} \cup \left\{\psi_{i+1} : \psi_{i+1}^{(l)} = \left(1 + \sigma_\tau \xi_m^{(l)}\right)\psi_{i|i}^{(l)},\ m = 1, \ldots, N_\tau - 1;\ l = 1, \ldots, N_\psi\right\} \tag{3.30}$$

where $\psi_{i+1}^{(l)}$ and $\psi_{i|i}^{(l)}$ are the lth components of the noise parameter vector $\psi_{i+1}$ and the updated noise parameter vector $\psi_{i|i}$, respectively. The variable $\sigma_\tau = 2^{-N}$ controls the pace and precision of the estimation in the training phase: a larger value corresponds to a larger step size, in the statistical sense, with lower estimation precision. $\xi_m^{(l)}$ is a standard Gaussian random variable truncated to be larger than $-1/\sigma_\tau$, which guarantees $1 + \sigma_\tau \xi_m^{(l)} > 0$, so that positive noise parameters (such as variances) remain positive. The candidate pool $\Theta_{i+1}$ therefore includes the updated noise parameter vector of the previous time step and $N_\tau - 1$ generated candidate parameter vectors in its neighborhood. The objective function $J(\psi_{i+1})$ in Eq. (3.28) is then evaluated for all $N_\tau$ candidates. The updated noise parameter vector at the (i + 1)th time step is the candidate that provides the minimum objective function value within the candidate pool $\Theta_{i+1}$:

$$\psi_{i+1|i+1} = \arg\min_{\psi_{i+1} \in \Theta_{i+1}} J(\psi_{i+1}) \tag{3.31}$$

The training phase is terminated when the following two criteria are satisfied. First, it has been running for more than ten fundamental periods of the underlying dynamical system. Second, the updated noise parameter vector has remained identical for ten consecutive time steps. The termination criteria can be expressed as follows:

$$i \ge \mathrm{INT}\left(10\,T_{\mathrm{period}}/\Delta t\right), \qquad \psi_{i+j|i+j} = \psi_{i|i}, \quad j = 1, 2, \ldots, 10 \tag{3.32}$$

where INT rounds toward zero to the nearest integer and $T_{\mathrm{period}}$ is the fundamental period of the underlying dynamical system. Only a rough estimate of $T_{\mathrm{period}}$ is required. The termination criteria ensure that sufficient information has been extracted from the data for the estimation of the states, the model parameters and the noise parameters. The training phase thus provides a robust strategy to obtain a preliminary solution of the noise parameter vector. At the end of this phase, the preliminary solution is expected to lie within a sufficiently small neighborhood of the true optimal point of the posterior PDF, so the posterior PDF can be well approximated by a Gaussian distribution.
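A single training-phase update and the termination test can be sketched as below. This Python/NumPy version is illustrative only: the function names are hypothetical, the truncated Gaussian variables are drawn by simple rejection sampling (an implementation choice of ours), and the quadratic `J_demo` merely stands in for Eq. (3.28). The settings $\sigma_\tau = 2^{-8}$ and $N_\tau = 20$ match those used in Sect. 3.4.1. Because the previous estimate is always in the pool, the objective value cannot increase from one step to the next.

```python
import numpy as np

def truncated_std_normal(rng, lower, size):
    """Standard normal samples conditioned to exceed `lower` (simple rejection)."""
    out = rng.standard_normal(size)
    bad = out <= lower
    while np.any(bad):
        out[bad] = rng.standard_normal(np.count_nonzero(bad))
        bad = out <= lower
    return out

def training_step(J, psi_prev, sigma_tau, n_tau, rng):
    """One training-phase update, Eqs. (3.30)-(3.31).

    J        : callable returning the objective of Eq. (3.28) for a candidate
    psi_prev : previous estimate psi_{i|i}
    The pool holds psi_prev plus n_tau - 1 random multiplicative perturbations.
    """
    n_psi = len(psi_prev)
    # xi > -1/sigma_tau keeps every factor (1 + sigma_tau * xi) positive
    xi = truncated_std_normal(rng, -1.0 / sigma_tau, (n_tau - 1, n_psi))
    pool = np.vstack([psi_prev, (1.0 + sigma_tau * xi) * psi_prev])
    values = np.array([J(c) for c in pool])
    return pool[np.argmin(values)]

def training_finished(step, dt, T_period, history):
    """Termination test of Eq. (3.32).

    step    : index of the latest update (i + 10 in Eq. (3.32))
    history : list of estimates psi_{j|j}, latest last
    """
    i = step - 10
    if i < int(10.0 * T_period / dt) or len(history) < 11:
        return False
    window = history[-11:]  # psi_{i|i}, psi_{i+1|i+1}, ..., psi_{i+10|i+10}
    return all(np.array_equal(window[0], p) for p in window[1:])

# Hypothetical usage: a quadratic stand-in for J with its minimum at [18.0, 0.5]
def J_demo(p):
    return float(np.sum((p - np.array([18.0, 0.5])) ** 2))

rng = np.random.default_rng(0)
psi = np.array([1.0, 1.0])
history = [psi]
for _ in range(200):
    psi = training_step(J_demo, psi, 2.0 ** -8, 20, rng)
    history.append(psi)
```

With a multiplicative step of about 0.4% per candidate, the search moves slowly but safely, which reflects the trade-off between pace and precision controlled by $\sigma_\tau$.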

3.3.2 Working Phase

In the working phase, the gradient method is utilized to solve the optimization problem in Eq. (3.29). The updated noise parameter vector is obtained by:

$$\psi_{i+1|i+1} = \psi_{i|i} - \Sigma_{\psi,i|i}\, \nabla J(\psi_{i|i}) \tag{3.33}$$

where $\nabla J(\psi_{i|i})$ is the gradient of the objective function evaluated at $\psi_{i+1} = \psi_{i|i}$. It can be computed numerically using the central finite difference method, and its lth component is given by:

$$\left[\nabla J(\psi_{i|i})\right]^{(l)} \equiv \left.\frac{\partial J(\psi_{i+1})}{\partial \psi_{i+1}^{(l)}}\right|_{\psi_{i+1} = \psi_{i|i}} \approx \frac{1}{2\Delta\psi_l}\left[J\left(\psi_{i|i} + \Delta\boldsymbol{\psi}_l\right) - J\left(\psi_{i|i} - \Delta\boldsymbol{\psi}_l\right)\right] \tag{3.34}$$

where $\Delta\boldsymbol{\psi}_l$ is a vector whose elements are all zero except the lth element, which equals a properly selected small step $\Delta\psi_l\ (> 0)$:

$$\Delta\boldsymbol{\psi}_l = [0, \ldots, 0, \Delta\psi_l, 0, \ldots, 0]^T \tag{3.35}$$

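Since the information contained in a single data point is limited, the estimated noise parameter vectors at two consecutive time steps are close to each other. Moreover, $\Sigma_{\psi,i|i}$ approximates the inverse Hessian of the objective function at the previous time step (Eq. (3.36)), so Eq. (3.33) can be viewed as a single Newton-type step. Therefore, Eq. (3.33) provides an accurate solution of the optimization problem without any iteration.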
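The working-phase update of Eqs. (3.33) to (3.35) needs only $2N_\psi$ evaluations of $J$ per time step. A minimal Python/NumPy sketch follows (function names are hypothetical). It also shows that, for a quadratic objective whose inverse Hessian equals $\Sigma_{\psi,i|i}$, a single step reaches the minimizer; this idealized setting is our assumption, used only to illustrate why no iteration is needed when consecutive estimates are close.

```python
import numpy as np

def central_gradient(J, psi, steps):
    """Central-difference gradient of J at psi, Eqs. (3.34)-(3.35)."""
    grad = np.zeros(len(psi))
    for l, d in enumerate(steps):
        e = np.zeros(len(psi))
        e[l] = d  # the vector Delta-psi_l of Eq. (3.35)
        grad[l] = (J(psi + e) - J(psi - e)) / (2.0 * d)
    return grad

def working_step(J, psi_prev, Sigma_psi_prev, steps):
    """Single non-iterative update of Eq. (3.33)."""
    return psi_prev - Sigma_psi_prev @ central_gradient(J, psi_prev, steps)

# Hypothetical check: for a quadratic J whose inverse Hessian equals
# Sigma_psi_prev, one step of Eq. (3.33) lands on the minimizer a.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
a = np.array([1.0, 2.0])

def J_quad(p):
    d = p - a
    return 0.5 * d @ Q @ d

psi_new = working_step(J_quad, np.array([1.5, 1.0]), np.linalg.inv(Q),
                       steps=np.array([1e-4, 1e-4]))
print(psi_new)  # expected: approximately [1. 2.]
```

For a non-quadratic $J$ the step is only approximate, which is why the text relies on consecutive estimates being close.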

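3.3.3 Uncertainty Estimation of the Updated Noise Parameters

Using Bayesian inference, not only can the optimal noise parameter vector $\psi_{i+1|i+1}$ be obtained, but the associated uncertainty can also be quantified in the form of the covariance matrix $\Sigma_{\psi,i+1|i+1}$. This covariance matrix is required for the evaluation of the objective function in Eq. (3.28) at the next time step. For a large number of data points, it can be approximated by the inverse of the Hessian matrix of the objective function evaluated at $\psi_{i+1} = \psi_{i+1|i+1}$ (Yuen 2010):

$$\Sigma_{\psi,i+1|i+1} = \left[\mathcal{H}_J(\psi_{i+1|i+1})\right]^{-1} \tag{3.36}$$

where $\mathcal{H}_J(\psi_{i+1|i+1})$ is the Hessian matrix of the objective function evaluated at $\psi_{i+1} = \psi_{i+1|i+1}$. Its diagonal elements are given by:

$$\begin{aligned} \mathcal{H}_J^{(l,l)}(\psi_{i+1|i+1}) &= \left.\frac{\partial}{\partial \psi_{i+1}^{(l)}}\left[\frac{\partial J(\psi_{i+1})}{\partial \psi_{i+1}^{(l)}}\right]\right|_{\psi_{i+1} = \psi_{i+1|i+1}} \\ &\approx \frac{1}{\Delta\psi_l}\left[\left.\frac{\partial J(\psi_{i+1})}{\partial \psi_{i+1}^{(l)}}\right|_{\psi_{i+1} = \psi_{i+1|i+1} + \frac{\Delta\boldsymbol{\psi}_l}{2}} - \left.\frac{\partial J(\psi_{i+1})}{\partial \psi_{i+1}^{(l)}}\right|_{\psi_{i+1} = \psi_{i+1|i+1} - \frac{\Delta\boldsymbol{\psi}_l}{2}}\right] \\ &\approx \frac{1}{\Delta\psi_l}\left[\frac{J(\psi_{i+1|i+1} + \Delta\boldsymbol{\psi}_l) - J(\psi_{i+1|i+1})}{\Delta\psi_l} - \frac{J(\psi_{i+1|i+1}) - J(\psi_{i+1|i+1} - \Delta\boldsymbol{\psi}_l)}{\Delta\psi_l}\right] \\ &= \frac{J(\psi_{i+1|i+1} + \Delta\boldsymbol{\psi}_l) - 2J(\psi_{i+1|i+1}) + J(\psi_{i+1|i+1} - \Delta\boldsymbol{\psi}_l)}{(\Delta\psi_l)^2} \end{aligned} \tag{3.37}$$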


Moreover, the off-diagonal elements can be computed as follows:

$$\begin{aligned} \mathcal{H}_J^{(l,l')}(\psi_{i+1|i+1}) &= \left.\frac{\partial}{\partial \psi_{i+1}^{(l')}}\left[\frac{\partial J(\psi_{i+1})}{\partial \psi_{i+1}^{(l)}}\right]\right|_{\psi_{i+1} = \psi_{i+1|i+1}} \\ &\approx \frac{1}{2\Delta\psi_{l'}}\left[\left.\frac{\partial J(\psi_{i+1})}{\partial \psi_{i+1}^{(l)}}\right|_{\psi_{i+1} = \psi_{i+1|i+1} + \Delta\boldsymbol{\psi}_{l'}} - \left.\frac{\partial J(\psi_{i+1})}{\partial \psi_{i+1}^{(l)}}\right|_{\psi_{i+1} = \psi_{i+1|i+1} - \Delta\boldsymbol{\psi}_{l'}}\right] \\ &\approx \frac{1}{2\Delta\psi_{l'}}\left[\frac{J(\psi_{i+1|i+1} + \Delta\boldsymbol{\psi}_l + \Delta\boldsymbol{\psi}_{l'}) - J(\psi_{i+1|i+1} - \Delta\boldsymbol{\psi}_l + \Delta\boldsymbol{\psi}_{l'})}{2\Delta\psi_l} - \frac{J(\psi_{i+1|i+1} + \Delta\boldsymbol{\psi}_l - \Delta\boldsymbol{\psi}_{l'}) - J(\psi_{i+1|i+1} - \Delta\boldsymbol{\psi}_l - \Delta\boldsymbol{\psi}_{l'})}{2\Delta\psi_l}\right] \\ &= \frac{1}{4\Delta\psi_l \Delta\psi_{l'}}\left[J(\psi_{i+1|i+1} + \Delta\boldsymbol{\psi}_l + \Delta\boldsymbol{\psi}_{l'}) - J(\psi_{i+1|i+1} - \Delta\boldsymbol{\psi}_l + \Delta\boldsymbol{\psi}_{l'}) - J(\psi_{i+1|i+1} + \Delta\boldsymbol{\psi}_l - \Delta\boldsymbol{\psi}_{l'}) + J(\psi_{i+1|i+1} - \Delta\boldsymbol{\psi}_l - \Delta\boldsymbol{\psi}_{l'})\right] \end{aligned} \tag{3.38}$$

where $\Delta\boldsymbol{\psi}_l$ and $\Delta\boldsymbol{\psi}_{l'}$ are vectors whose elements are all zero except the lth and l'th elements, which equal $\Delta\psi_l$ and $\Delta\psi_{l'}$, respectively. This approximation can be applied in both the training and working phases. However, at the very beginning of the filter propagation, the approximation in Eq. (3.36) may be inaccurate because of the large uncertainty of the posterior PDF, which can manifest as a violation of the positive-definiteness condition for the resulting covariance matrix. This happens because the updated noise parameter vector may be far from the true optimal point of the posterior PDF at the early stage of the estimation. Therefore, during the training phase it is necessary to check whether the covariance matrix $\Sigma_{\psi,i+1|i+1}$ of the updated noise parameter vector is positive definite. If it is not, it is replaced by the covariance matrix $\Sigma_{\psi,i|i}$ of the previous time step. As observations accumulate, the estimated noise parameters approach the true optimal point of the posterior PDF and the posterior uncertainty is reduced. Equation (3.36) then provides an accurate approximation of the covariance matrix, and the positive-definiteness condition is fulfilled automatically.
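The finite-difference Hessian of Eqs. (3.37) and (3.38), together with the positive-definiteness check and fallback described above, can be sketched in Python/NumPy as follows. Function names are hypothetical; positive definiteness is tested here with a Cholesky factorization, which is one common choice rather than necessarily the authors'. For simplicity the sketch applies the fallback at every call, whereas the text requires the check during the training phase.

```python
import numpy as np

def fd_hessian(J, psi, steps):
    """Finite-difference Hessian: Eq. (3.37) on the diagonal, Eq. (3.38) off it."""
    n = len(psi)
    E = np.diag(steps)  # row l is the vector Delta-psi_l
    Hm = np.zeros((n, n))
    J0 = J(psi)
    for l in range(n):
        Hm[l, l] = (J(psi + E[l]) - 2.0 * J0 + J(psi - E[l])) / steps[l] ** 2
        for m in range(l + 1, n):
            Hm[l, m] = Hm[m, l] = (
                J(psi + E[l] + E[m]) - J(psi - E[l] + E[m])
                - J(psi + E[l] - E[m]) + J(psi - E[l] - E[m])
            ) / (4.0 * steps[l] * steps[m])
    return Hm

def update_Sigma_psi(J, psi_new, steps, Sigma_psi_prev):
    """Eq. (3.36) with the fallback described in the text: if the inverse
    Hessian is not positive definite, keep the previous covariance matrix."""
    try:
        Sigma = np.linalg.inv(fd_hessian(J, psi_new, steps))
        Sigma = 0.5 * (Sigma + Sigma.T)  # remove round-off asymmetry
        np.linalg.cholesky(Sigma)        # raises LinAlgError if not positive definite
        return Sigma
    except np.linalg.LinAlgError:
        return Sigma_psi_prev

# Hypothetical checks: a convex quadratic and a saddle-shaped objective
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
steps = np.array([1e-3, 1e-3])
Sigma_convex = update_Sigma_psi(lambda p: 0.5 * p @ Q @ p, np.zeros(2), steps, np.eye(2))
Sigma_saddle = update_Sigma_psi(lambda p: p[0] ** 2 - p[1] ** 2, np.zeros(2), steps,
                                0.5 * np.eye(2))
```

In the convex case the result approximates $Q^{-1}$; in the saddle case the inverse Hessian is indefinite, so the previous covariance matrix is returned unchanged.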


3.4 Applications

3.4.1 Bouc-Wen Hysteresis System

The first example uses a Bouc-Wen system with $N_d = 5$ DOFs, shown in Fig. 3.3. The governing equations of this nonlinear system are:

$$M\ddot{x}(t) + C[\theta_c(t)]\dot{x}(t) + K[\theta_k(t)]r(t) = T f(t) \tag{3.39}$$

$$\dot{r}_n(t) = \dot{x}_n - \mu\left|\dot{x}_n\right|\left|r_n\right|^{\eta-1} r_n - \kappa \dot{x}_n \left|r_n\right|^{\eta}, \quad n = 1, 2, \ldots, 5 \tag{3.40}$$

where $\ddot{x}(t)$, $\dot{x}(t)$ and $r(t)$ are the acceleration, velocity and restoring force vectors, respectively; $M$, $C$ and $K$ are the mass, damping and stiffness matrices of the system, respectively; the stiffness and damping matrices are parameterized with the possibly time-varying parameters $\theta_k(t)$ and $\theta_c(t)$, respectively; $f(t)$ is the excitation applied to the system; and $T$ is the influence matrix associated with the excitation $f$. The stiffness matrix $K$ is parameterized as:

$$K = \sum_{n=1}^{5} \theta_k^{(n)} K^{(n)} = \begin{bmatrix} \theta_k^{(1)}k_1 + \theta_k^{(2)}k_2 & -\theta_k^{(2)}k_2 & 0 & 0 & 0 \\ -\theta_k^{(2)}k_2 & \theta_k^{(2)}k_2 + \theta_k^{(3)}k_3 & -\theta_k^{(3)}k_3 & 0 & 0 \\ 0 & -\theta_k^{(3)}k_3 & \theta_k^{(3)}k_3 + \theta_k^{(4)}k_4 & -\theta_k^{(4)}k_4 & 0 \\ 0 & 0 & -\theta_k^{(4)}k_4 & \theta_k^{(4)}k_4 + \theta_k^{(5)}k_5 & -\theta_k^{(5)}k_5 \\ 0 & 0 & 0 & -\theta_k^{(5)}k_5 & \theta_k^{(5)}k_5 \end{bmatrix} \tag{3.41}$$
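The assembly of Eq. (3.41) and the Bouc-Wen rate of Eq. (3.40) can be sketched as follows. This Python/NumPy sketch is illustrative only: the function names are ours, and the default values μ = 1000 s²/m², κ = 1500 s²/m², η = 2 and $k_n$ = 2000 N/m are those adopted for this example in the paragraph that follows; the damaged case uses the 10% and 5% stiffness reductions described there.

```python
import numpy as np

def stiffness_matrix(theta_k, k):
    """Assemble K = sum_n theta_k^(n) K^(n) of Eq. (3.41).

    Spring n (1-based) links DOF n-1 to DOF n; spring 1 links DOF 1 to the ground.
    """
    kn = np.asarray(theta_k, dtype=float) * np.asarray(k, dtype=float)
    n = len(kn)
    K = np.zeros((n, n))
    for j in range(n):
        K[j, j] += kn[j]
        if j > 0:  # spring between DOF j-1 and DOF j (0-based)
            K[j - 1, j - 1] += kn[j]
            K[j - 1, j] -= kn[j]
            K[j, j - 1] -= kn[j]
    return K

def bouc_wen_rate(x_dot, r, mu=1000.0, kappa=1500.0, eta=2.0):
    """Right-hand side of Eq. (3.40), evaluated component-wise for n = 1, ..., 5."""
    x_dot, r = np.asarray(x_dot, dtype=float), np.asarray(r, dtype=float)
    return (x_dot - mu * np.abs(x_dot) * np.abs(r) ** (eta - 1.0) * r
            - kappa * x_dot * np.abs(r) ** eta)

# Undamaged and damaged stiffness matrices (k_n = 2000 N/m; 10% and 5% reductions)
K_undamaged = stiffness_matrix(np.ones(5), 2000.0 * np.ones(5))
K_damaged = stiffness_matrix([0.90, 0.90, 0.95, 0.95, 0.95], 2000.0 * np.ones(5))
```

Writing $K$ as a sum of substructural contributions is what allows each $\theta_k^{(n)}$ to track the integrity of one spring independently.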

Fig. 3.3 Bouc-Wen hysteresis system (five Bouc-Wen elements with dampers c1 to c5)

where $K^{(n)}$ is the nth substructural stiffness matrix and $\theta_k(t) = [\theta_k^{(1)}, \theta_k^{(2)}, \theta_k^{(3)}, \theta_k^{(4)}, \theta_k^{(5)}]^T$ represents the integrity of the system. The actual stiffnesses were $k_n = 2000$ N/m, n = 1, ..., 5. Rayleigh damping was used, so the damping matrix was $C = \alpha M + \beta K$ with damping coefficients α = 0.38 s⁻¹ and β = 8.02 × 10⁻⁴ s. The characteristic parameters of the Bouc-Wen system were μ = 1000 s²/m², κ = 1500 s²/m² and η = 2. In this example, $\sigma_\tau = 2^{-8}$ and $N_\tau = 20$ were used for the training phase. The measurements included the velocity responses of the 1st, 3rd and 5th DOFs. The entire monitoring period was 300 s and the sampling time interval was Δt = 0.002 s. The system was undamaged during the first 170 s. At t = 170 s, sudden damage occurred in all five nonlinear springs: a 10% stiffness reduction in the first two springs and a 5% stiffness reduction in the other three. The following three scenarios of the process noise and measurement noise are investigated:

Case 1. Stationary base excitation and stationary measurement noise
Case 2. Nonstationary base excitation and stationary measurement noise
Case 3. Nonstationary base excitation and nonstationary measurement noise

3.4.1.1 Case 1. Stationary Base Excitation and Stationary Measurement Noise

In Case 1, the hysteresis system was subjected to a stationary base excitation modeled as zero-mean Gaussian white noise with spectral intensity $S_0 = 6.0 \times 10^{-3}$ m²/s³. The rms of the measurement noise was taken to be 5% of the rms of the noise-free response of the 5th DOF. As a result, the noise parameter vector consisted of two parameters:

$$\psi_{i+1} = \left[\psi_{f,i}, \psi_{n,i+1}\right]^T = \left[\sigma_{f,i}^2, \sigma_{n,i+1}^2\right]^T \tag{3.42}$$

The initial values of the noise parameter vector and its associated covariance matrix were taken as:

$$\psi_{0|0} = [1, 1]^T \tag{3.43}$$

$$\Sigma_{\psi,0|0} = \frac{1}{9} I_2 \tag{3.44}$$

The estimation results of the stiffness parameters are first shown in Fig. 3.4. The dotted lines represent the estimated values; the solid lines represent the actual values and the dashed lines represent the bounds of the 99.7% credible intervals. The same line style will be used to other later figures. The proposed method provided satisfactory estimation results of the stiffness parameters and the estimation results

90

3 Real-Time Updating of Noise Parameters for System Identification

Fig. 3.4 Estimation results of the stiffness parameters of the Bouc-Wen system (Case 1)

were within the 99.7% credible intervals. Moreover, the abrupt change of the stiffness parameters could be well captured. Figure 3.5 shows the estimation results of the damping coefficients and the characteristic parameters of the Bouc-Wen hysteresis system. It can be observed that the estimated values approached the actual values and were within the 99.7% credible intervals. Figure 3.6 shows the time histories of the estimation results of the noise parameters. The training phase was terminated at t = 7.68 s shown by a vertical dashed line, which is larger than ten small-amplitude periods of the hysteresis system 10T period ≈ 5 s. The fundamental period T period could be estimated roughly from a simple Fourier spectrum of the system response. Since both the excitation and measurement noise were stationary in this case, the actual noise parameters remained constant throughout the entire monitoring duration. The estimation results at the early propagation stage were fluctuating severely because the initial noise parameters were far from the actual values. However, the estimation results approached the actual values with reasonable credible intervals after t = 20 s. In addition, after the abrupt drop of the stiffness parameters at t = 170 s, the estimated noise parameter of the excitation ψ f was slightly lower than the actual value. This was caused by the time delay of the estimation of the stiffness parameters. Due to the time delay, the identified stiffness parameters were larger than the actual decreasing values in this period of time. To compensate this error, the estimated process noise variance was

3.4 Applications


Fig. 3.5 Estimation results of the damping coefficients and the characteristic parameters of the Bouc-Wen system (Case 1)

lower than the actual value. Meanwhile, the estimated measurement noise parameter $\psi_n$ was overestimated to maintain the total contribution of the process noise and measurement noise to the one-step-ahead predicted state vector and the filtered state vector. Nevertheless, both estimates recovered promptly once more data points from the damaged system were obtained. Moreover, the results show that the proposed approach provided reasonable quantification of the posterior uncertainty. Except for the short period around the sudden damage of the system, the 99.7% credible intervals covered the actual values of the noise parameters throughout the monitoring duration.
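The remark that $T_{\text{period}}$ can be read off a simple Fourier spectrum can be sketched as follows. This is a minimal illustration, not the authors' procedure: the test signal, its 400 Hz sampling rate and the helper `dominant_period` are hypothetical, and the estimate is only as fine as the frequency resolution (the reciprocal of the record length).

```python
import numpy as np

def dominant_period(response, dt):
    """Rough fundamental-period estimate from the peak of the one-sided amplitude spectrum."""
    x = np.asarray(response, dtype=float)
    x = x - x.mean()                        # remove the mean so the zero-frequency bin does not dominate
    spectrum = np.abs(np.fft.rfft(x))       # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(x.size, d=dt)   # matching frequency axis in Hz
    k = int(np.argmax(spectrum[1:])) + 1    # skip the zero-frequency bin
    return 1.0 / freqs[k]

# Hypothetical response: a 2 Hz oscillation (T = 0.5 s) plus noise, 20 s sampled at 400 Hz
rng = np.random.default_rng(0)
dt = 1.0 / 400.0
t = np.arange(8000) * dt
y = np.sin(2.0 * np.pi * 2.0 * t) + 0.3 * rng.standard_normal(t.size)

T_period = dominant_period(y, dt)
print(f"T_period ~ {T_period:.3f} s, so 10 T_period ~ {10 * T_period:.2f} s")
```

For this made-up signal the estimate is 0.5 s, so ten periods span about 5 s; with measured data, the spectrum of the recorded response would be used instead.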

3.4.1.2 Case 2. Nonstationary Base Excitation and Stationary Measurement Noise

In Case 2, the hysteresis system was subjected to nonstationary base excitation, which was modeled as a modulated zero-mean Gaussian white noise process:

$$f(t) = W_f(t)\,\ddot{g}(t) \qquad (3.45)$$


Fig. 3.6 Estimation results of the noise parameters of the Bouc-Wen system (Case 1)

where $\ddot{g}$ is a stationary zero-mean Gaussian white noise with spectral intensity $S_0 = 6.0 \times 10^{-3}$ m²/s³ and $W_f(t)$ is the amplitude modulating function of the excitation, given by:

$$W_f(t) = \begin{cases} 1, & 0 < t \le t_a \\ 1 + \dfrac{t - t_a}{t_b}\exp\left(1 - \dfrac{t - t_a}{t_b}\right), & t > t_a \end{cases} \qquad (3.46)$$

where $t_a = 150$ s and $t_b = 20$ s, so this modulating function reaches its maximum value at $t = t_a + t_b = 170$ s:

$$\max_t W_f(t) = W_f(t_a + t_b) = 2 \qquad (3.47)$$
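To make Eqs. (3.45) to (3.47) concrete, the sketch below evaluates $W_f(t)$ and draws one modulated excitation record. It assumes a discrete-time approximation in which $\ddot{g}$ is represented by i.i.d. Gaussian samples with a user-supplied standard deviation `sigma_g`; how `sigma_g` relates to $S_0$ depends on the spectral convention and the time step (one common convention is $\sigma_g^2 = 2\pi S_0/\Delta t$), so it is left as an input. The function names and the time grid are illustrative only.

```python
import numpy as np

def w_f(t, t_a=150.0, t_b=20.0):
    """Amplitude modulating function of Eq. (3.46)."""
    t = np.asarray(t, dtype=float)
    x = (t - t_a) / t_b
    # equals 1 up to t_a, then 1 + x*exp(1 - x), which peaks at x = 1 (t = t_a + t_b)
    return np.where(t <= t_a, 1.0, 1.0 + x * np.exp(1.0 - x))

def modulated_white_noise(t, sigma_g, rng):
    """One realization of f(t) = W_f(t) g(t), with g discrete zero-mean Gaussian white noise of std sigma_g."""
    t = np.asarray(t, dtype=float)
    return w_f(t) * sigma_g * rng.standard_normal(t.shape)

t = np.linspace(0.0, 300.0, 300001)          # hypothetical 300 s grid with 1 ms spacing
w = w_f(t)
print(f"peak W_f = {w.max():.6f} at t = {t[np.argmax(w)]:.3f} s")

rng = np.random.default_rng(42)
f = modulated_white_noise(t, sigma_g=1.0, rng=rng)   # sigma_g = 1.0 is an arbitrary placeholder
```

The peak value of 2 at 170 s reproduces Eq. (3.47); the excitation amplitude is therefore doubled at its strongest point relative to the stationary part.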

The rms of the measurement noise was taken to be 5% of the rms of the noise-free response of the 5th DOF. Therefore, the noise parameter vector consisted of two parameters:

$$\boldsymbol{\psi}_{i+1} = \left[\psi_{f,i},\ \psi_{n,i+1}\right]^T = \left[\sigma_{f,i}^2,\ \sigma_{n,i+1}^2\right]^T \qquad (3.48)$$


The initial values of the noise parameter vector and the associated covariance matrix were taken from Eqs. (3.43) and (3.44), respectively. The estimation results of the stiffness parameters are shown in Fig. 3.7, and those of the damping coefficients and the characteristic parameters of the Bouc-Wen hysteresis system are shown in Fig. 3.8. The estimated stiffness parameters, damping coefficients and characteristic parameters of the Bouc-Wen system agreed well with the actual values and were within the 99.7% credible intervals. The abrupt reduction of the stiffness parameters was well captured. An acceptably small time delay in the estimation of the stiffness parameters was observed because the estimates were obtained using the data at the current and previous time steps. Figure 3.9 shows the time histories of the estimation results of the noise parameters. The training phase was terminated at t = 8.91 s, and this time instant is marked with a vertical dashed line in Fig. 3.9. The variation of the process noise parameter was successfully tracked, and the estimated measurement noise parameter agreed well with the actual value. There was a time delay in tracking the peak value of the nonstationary process noise. This lag was expected because the estimation was based on the data at the current and previous time steps, so it took time to fade out the effect of the previous data. Nevertheless, this delay was acceptably small and the overall estimation of $\psi_f$ was very accurate. Moreover, the estimated noise parameter of the excitation $\psi_f$ was slightly lower than the actual values when tracking the sudden change of the stiffness parameters. This was caused by the time delay in the estimation of the stiffness parameters. On the other hand, the estimated measurement noise parameter $\psi_n$ was overestimated during this period to maintain the total contribution of the process noise and measurement noise to the one-step-ahead predicted state vector and the filtered state vector. Nevertheless, the overall estimates of the noise parameters were satisfactory.

Fig. 3.7 Estimation results of the stiffness parameters of the Bouc-Wen system (Case 2)

Fig. 3.8 Estimation results of the damping coefficients and the characteristic parameters of the Bouc-Wen system (Case 2)

Fig. 3.9 Estimation results of the noise parameters of the Bouc-Wen system (Case 2)

3.4.1.3 Case 3. Nonstationary Base Excitation and Nonstationary Measurement Noise

In Case 3, the proposed method is demonstrated under nonstationary process noise and nonstationary measurement noise. The Bouc-Wen hysteresis system was subjected to nonstationary base excitation, which was modeled as a modulated zero-mean Gaussian white noise process using Eqs. (3.45) and (3.46), with the same parameters $S_0$, $t_a$ and $t_b$ as in Case 2. The measurement noise was modeled as:

$$\mathbf{n}(t) = W_n(t)\,\mathbf{n}_0(t) \qquad (3.49)$$

where $\mathbf{n}_0$ is an $N_0$-variate Gaussian i.i.d. process with zero mean and covariance matrix given by:

$$\boldsymbol{\Sigma}_{n_0} = \sigma_{n_0}^2 \mathbf{I}_{N_0} \qquad (3.50)$$

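where $\sigma_{n_0}$ was taken as 5% of the rms of the noise-free response of the 5th DOF. The modulating function $W_n(t)$ was given by:

$$W_n(t) = \begin{cases} 1, & 0 < t \le t_a \\ 1 + \dfrac{t - t_a}{2t_b}\exp\left(1 - \dfrac{t - t_a}{t_b}\right), & t > t_a \end{cases} \qquad (3.51)$$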

where $t_a = 150$ s and $t_b = 20$ s. As a result, the noise parameter vector consisted of two parameters:

$$\boldsymbol{\psi}_{i+1} = \left[\psi_{f,i},\ \psi_{n,i+1}\right]^T = \left[\sigma_{f,i}^2,\ \sigma_{n,i+1}^2\right]^T \qquad (3.52)$$
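The measurement-noise model of Eqs. (3.49) to (3.51) can be sketched in the same way. This sketch assumes the reading of Eq. (3.51) given above (a factor $2t_b$ in the first fraction, so the noise rms peaks at 1.5 times its base level at $t = t_a + t_b$). The number of channels `n_obs`, the reference signal and the helper names are hypothetical placeholders, since $N_0$ for this example is not restated here.

```python
import numpy as np

def w_n(t, t_a=150.0, t_b=20.0):
    """Modulating function of Eq. (3.51) as written above: peaks at t_a + t_b with value 1.5."""
    t = np.asarray(t, dtype=float)
    x = (t - t_a) / t_b
    return np.where(t <= t_a, 1.0, 1.0 + 0.5 * x * np.exp(1.0 - x))

def modulated_measurement_noise(t, clean_ref, n_obs, rng, ratio=0.05):
    """n(t) = W_n(t) n0(t), Eqs. (3.49) and (3.50), with sigma_n0 = ratio * rms(clean_ref)."""
    t = np.asarray(t, dtype=float)
    sigma_n0 = ratio * np.sqrt(np.mean(np.square(clean_ref)))   # 5% of the reference rms
    n0 = sigma_n0 * rng.standard_normal((t.size, n_obs))       # i.i.d. rows, covariance sigma_n0^2 I
    return w_n(t)[:, None] * n0, sigma_n0                       # same scaling applied to every channel

rng = np.random.default_rng(7)
t = np.linspace(0.0, 300.0, 3001)
clean_ref = 2.0 * np.sin(2.0 * np.pi * 0.5 * t)   # hypothetical stand-in for the noise-free 5th-DOF response
noise, sigma_n0 = modulated_measurement_noise(t, clean_ref, n_obs=4, rng=rng)
```

Because the same scalar $W_n(t)$ multiplies every channel, the covariance at time $t$ is simply $W_n(t)^2 \sigma_{n_0}^2 \mathbf{I}_{N_0}$, which is what the single parameter $\psi_n$ has to track.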


The initial values of the noise parameter vector and the associated covariance matrix were taken from Eqs. (3.43) and (3.44), respectively. The estimation results of the stiffness parameters are shown in Fig. 3.10, and those of the damping coefficients and the characteristic parameters of the Bouc-Wen system are shown in Fig. 3.11. The proposed method provided satisfactory estimates of all the model parameters with reasonable credible intervals, indicating its applicability to nonstationary scenarios without prior information. The abrupt reduction of the stiffness parameters was well tracked with a slight time delay. Figure 3.12 shows the time histories of the estimation results of the noise parameters. The training phase was terminated at t = 11.33 s, as marked by the vertical dashed line. The variations of the process noise and the measurement noise were successfully tracked, and the estimated values were within the 99.7% credible intervals. Thus, the proposed method tracked the time-varying noise parameters and provided reasonable credible intervals for the estimates. There was also a time delay in tracking the peak values of the nonstationary process noise and measurement noise. This was expected because the estimation was based on the data at the current and previous time steps, so it took time to fade out the effect of the previous data. Nevertheless, this delay was acceptably small and the overall estimation of $\psi_f$ and $\psi_n$ was very accurate. Again, the estimated process noise parameter $\psi_f$ was slightly lower than the actual values while tracking the sudden change of the stiffness parameters, whereas the estimated measurement noise parameter $\psi_n$ was overestimated during this period. This was caused by the time delay in the estimation of the stiffness parameters.

Fig. 3.10 Estimation results of the stiffness parameters (Case 3)

Fig. 3.11 Estimation results of the damping coefficients and the characteristic parameters of the Bouc-Wen system (Case 3)

Fig. 3.12 Estimation results of the noise parameters (Case 3)

3.4.2 Three-Pier Bridge

A bridge with three piers is considered in this example (Fig. 3.13). The bridge has pin supports at its two ends (i.e., the 1st and 17th nodes) and at the bottoms of the three piers (i.e., the 19th, 21st and 23rd nodes). It has a span of 256 m, and each of its three piers is 16 m long. The deck is divided into 16 components (each 16 m long) and each pier is divided into 2 components (each 8 m long). The deck has a uniform box cross-section with area 1.56 m² and weak-axis moment of inertia 4.02 m⁴. The three piers have the same circular cross-section with area 1.57 m². The mass density is 3780 kg/m³ and the modulus of elasticity is 2 GPa. As a result, the first five natural frequencies are 0.47, 0.53, 0.70, 0.89 and 1.36 Hz. The damping matrix is given by $\mathbf{C} = \alpha\mathbf{M} + \beta\mathbf{K}$, where $\alpha = 0.063$ s⁻¹ and $\beta = 0.006$ s, so the damping ratios of the first two modes are approximately 2%. The components of the bridge were separated into seven groups, and the stiffness matrix was given by $\mathbf{K} = \sum_{n=1}^{7} \theta_k^{(n)} \mathbf{K}^{(n)}$. Specifically, one stiffness parameter was assigned to every four components of the deck, and one stiffness parameter was assigned to each pier.

Fig. 3.13 Bridge model
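The statement that $\alpha = 0.063$ s⁻¹ and $\beta = 0.006$ s give about 2% damping in the first two modes can be checked with the standard Rayleigh-damping relation $\zeta_i = \alpha/(2\omega_i) + \beta\omega_i/2$ with $\omega_i = 2\pi f_i$. The short sketch below (helper names are illustrative) evaluates that relation and also solves the inverse problem of choosing $\alpha$ and $\beta$ for a target ratio in two modes.

```python
import math

def rayleigh_damping_ratio(f_hz, alpha, beta):
    """Modal damping ratio for C = alpha*M + beta*K: zeta = alpha/(2*omega) + beta*omega/2."""
    omega = 2.0 * math.pi * f_hz
    return alpha / (2.0 * omega) + beta * omega / 2.0

def rayleigh_coefficients(f1_hz, f2_hz, zeta):
    """alpha, beta that give the same damping ratio zeta in the modes at f1 and f2."""
    w1, w2 = 2.0 * math.pi * f1_hz, 2.0 * math.pi * f2_hz
    return 2.0 * zeta * w1 * w2 / (w1 + w2), 2.0 * zeta / (w1 + w2)

for f in (0.47, 0.53):
    print(f"{f} Hz: zeta = {rayleigh_damping_ratio(f, 0.063, 0.006):.4f}")

alpha, beta = rayleigh_coefficients(0.47, 0.53, 0.02)
print(f"alpha = {alpha:.4f} 1/s, beta = {beta:.5f} s")
```

Both modes come out at roughly 1.95%, and targeting exactly 2% at 0.47 and 0.53 Hz returns $\alpha \approx 0.0626$ s⁻¹ and $\beta \approx 0.0064$ s, which round to the values quoted above.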


The entire monitoring period was 300 s and the sampling frequency was 400 Hz. In this example, $\sigma_\tau = 2^{-8}$ and $N_\tau = 20$ were taken for the training phase. The measurements included the horizontal and vertical acceleration responses of the 5th, 9th, 13th, 18th, 20th and 22nd nodes, so the number of observed DOFs was $N_0 = 12$. The rms of the measurement noise was taken as 5% of the rms of the noise-free horizontal acceleration of the 9th node. As a result, the covariance matrix of the measurement noise can be expressed as:

$$\boldsymbol{\Sigma}_{n,i+1} = \sigma_{n,i+1}^2 \mathbf{I}_{12} \qquad (3.53)$$

Moreover, the bridge was undamaged during the first 100 s. Then, sudden damage with 5% stiffness reduction occurred in the deck between the left support and the left pier at t = 100 s. Afterwards, the same level of damage occurred in the right pier of the bridge at t = 200 s. We consider two different excitation scenarios:

Case 1. Nonstationary horizontal ground excitation
Case 2. Stationary horizontal ground excitation and nonstationary vertical ground excitation

3.4.2.1 Case 1. Nonstationary Horizontal Ground Excitation

In the first case, the bridge was subjected to nonstationary horizontal ground excitation, which was modeled as a modulated zero-mean Gaussian white noise given as follows:

$$f(t) = f_h(t) = W_{f_h}(t)\,\ddot{g}_h(t) \qquad (3.54)$$

where $\ddot{g}_h$ is the horizontal ground acceleration modeled as zero-mean Gaussian white noise with spectral intensity $S_h = 5.0 \times 10^{-6}$ m²/s³, and $W_{f_h}(t)$ is the modulating function given by:

$$W_{f_h}(t) = \begin{cases} 1, & 0 < t \le t_a \\ 1 + \dfrac{t - t_a}{t_b}\exp\left(1 - \dfrac{t - t_a}{t_b}\right), & t > t_a \end{cases} \qquad (3.55)$$

where $t_a = 120$ s and $t_b = 20$ s. As a result, the covariance matrix of the excitation can be expressed as:

$$\boldsymbol{\Sigma}_{f,i} = \sigma_{f_h,i}^2 \qquad (3.56)$$

Therefore, the noise parameter vector consisted of two parameters:

$$\boldsymbol{\psi}_{i+1} = \left[\psi_{f,i},\ \psi_{n,i+1}\right]^T = \left[\sigma_{f_h,i}^2,\ \sigma_{n,i+1}^2\right]^T \qquad (3.57)$$


The initial noise parameter vector and its associated covariance matrix were taken as:

$$\boldsymbol{\psi}_{0|0} = [1, 1]^T \qquad (3.58)$$

$$\boldsymbol{\Sigma}_{\psi,0|0} = \frac{1}{9}\mathbf{I}_2 \qquad (3.59)$$

The estimation results of the stiffness parameters and damping coefficients are shown in Figs. 3.14 and 3.15, respectively. The dotted lines represent the estimated values, the solid lines represent the actual values, and the dashed lines represent the bounds of the 99.7% credible intervals. The same line styles are used in the later figures. The estimated model parameters fluctuated severely at the beginning of the filtering process because the initial values were far from the actual values. As more data points were acquired, the estimated model parameters stabilized, approached the actual values and remained within the 99.7% credible intervals. In addition, a slight time delay in tracking the abrupt changes of $\theta_k^{(1)}$ and $\theta_k^{(7)}$ can be observed, since the identification results depend on the measurements at the current and previous time steps. Nevertheless, the time delay was acceptably small. Figure 3.16 shows the estimation results of the noise parameters. The vertical dashed line in Fig. 3.16 marks the time instant t = 21.32 s at which the training phase was terminated. As expected, there was a small time delay in estimating the peak values of the nonstationary process noise: the parameters were estimated from the current and previous data, so it took time to fade out the effect of the previous data. Nevertheless, this delay was acceptably small and the proposed method provided a satisfactory estimate of the process noise. In addition, the estimate of the measurement noise parameter was accurate and remained within the 99.7% credible intervals.

3.4.2.2 Case 2. Stationary Horizontal Ground Excitation and Nonstationary Vertical Ground Excitation

In this case, the bridge was subjected to ground excitations in both the horizontal and vertical directions:

$$\mathbf{f}(t) = \left[f_h(t),\ f_v(t)\right]^T \qquad (3.60)$$

where the horizontal ground excitation $f_h$ was modeled as stationary zero-mean Gaussian white noise with spectral intensity $S_0 = 5.0 \times 10^{-6}$ m²/s³. The nonstationary vertical ground excitation $f_v$ was modeled as a modulated zero-mean Gaussian white noise given as:

$$f_v(t) = W_{f_v}(t)\,\ddot{g}_v(t) \qquad (3.61)$$


Fig. 3.14 Estimation results of the stiffness parameters (Case 1)

where $\ddot{g}_v$ is the vertical ground acceleration modeled as zero-mean Gaussian white noise with spectral intensity $S_v = 5.0 \times 10^{-6}$ m²/s³, and $W_{f_v}(t)$ is the modulating function given by:

$$W_{f_v}(t) = \begin{cases} 1, & 0 < t \le t_a \\ 1 + \dfrac{t - t_a}{t_b}\exp\left(1 - \dfrac{t - t_a}{t_b}\right), & t > t_a \end{cases} \qquad (3.62)$$

where $t_a = 150$ s and $t_b = 10$ s. Then the covariance matrix of the excitations can be expressed as:

$$\boldsymbol{\Sigma}_{f,i} = \begin{bmatrix} \sigma_{f_h,i}^2 & 0 \\ 0 & \sigma_{f_v,i}^2 \end{bmatrix} \qquad (3.63)$$

Fig. 3.15 Estimation results of the damping coefficients (Case 1)

Therefore, the noise parameter vector consisted of three parameters:

$$\boldsymbol{\psi}_{i+1} = \left[\sigma_{f_h,i}^2,\ \sigma_{f_v,i}^2,\ \sigma_{n,i+1}^2\right]^T \qquad (3.64)$$

The initial noise parameter vector and the associated covariance matrix were taken as:

$$\boldsymbol{\psi}_{0|0} = [1, 1, 1]^T \qquad (3.65)$$

$$\boldsymbol{\Sigma}_{\psi,0|0} = \frac{1}{9}\mathbf{I}_3 \qquad (3.66)$$
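The three noise parameters of Eq. (3.64) enter the filter only through the covariance matrices of Eqs. (3.63) and (3.53). The sketch below shows that mapping, assuming the parameter ordering of Eq. (3.64); `noise_covariances` is a hypothetical helper rather than the authors' code, and the time indices $i$ and $i+1$ are dropped for brevity.

```python
import numpy as np

def noise_covariances(psi, n_obs=12):
    """Map psi = [sigma_fh^2, sigma_fv^2, sigma_n^2], Eq. (3.64), to the excitation covariance
    of Eq. (3.63) and the measurement-noise covariance of Eq. (3.53)."""
    psi = np.asarray(psi, dtype=float)
    if psi.shape != (3,) or np.any(psi < 0.0):
        raise ValueError("psi must contain three non-negative variances")
    sigma_f = np.diag(psi[:2])           # uncorrelated horizontal and vertical excitations
    sigma_n = psi[2] * np.eye(n_obs)     # i.i.d. noise on the N0 = 12 observed DOFs
    return sigma_f, sigma_n

psi_00 = np.array([1.0, 1.0, 1.0])              # Eq. (3.65)
sigma_psi_00 = np.eye(3) / 9.0                  # Eq. (3.66)
sigma_f, sigma_n = noise_covariances(psi_00)
prior_std = np.sqrt(np.diag(sigma_psi_00))      # 1/3 for each noise parameter
```

With $\boldsymbol{\Sigma}_{\psi,0|0} = \mathbf{I}_3/9$, each initial noise parameter has a standard deviation of 1/3, so its ±3σ band around the initial value 1 spans [0, 2].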


Fig. 3.16 Estimation results of the noise parameters (Case 1)

The estimation results of the stiffness parameters and damping coefficients are shown in Figs. 3.17 and 3.18, respectively. The presented approach provided satisfactory estimates of both the stiffness parameters and the damping coefficients. A time delay in tracking the stiffness parameters was expected because the identification results depended on the data at the current and previous time steps, so a time lag was inevitable. Nevertheless, the delay in tracking the sudden damage of the bridge was acceptably small.


Fig. 3.17 Estimation results of the stiffness parameters (Case 2)

Figure 3.19 shows the estimated noise parameters. Again, the vertical dashed line marks the time instant t = 22.14 s at which the training phase was terminated. The proposed method tracked the process noise in both the horizontal and vertical directions satisfactorily. In addition, the measurement noise was also estimated accurately. There was a small time delay in tracking the variations of the noise parameter of the vertical excitation. This lag was expected because the estimation was based on the data at the current and previous time steps.


Fig. 3.18 Estimation results of the damping coefficients (Case 2)

Fig. 3.19 Estimation results of the noise parameters (Case 2)


3.5 Concluding Remarks

In this chapter, we introduced a real-time updating approach for the noise covariance matrices in the extended Kalman filter. The approach uses Bayesian inference to formulate the posterior probability density function of the noise parameters that govern the noise covariance matrices in the extended Kalman filter. A two-phase numerical approach is presented to solve the optimization problem formulated by the Bayesian inference. The proposed method resolves the possible divergence problem caused by ad hoc selection of the noise covariance matrices in the filter. Moreover, reliable real-time uncertainty quantification can be achieved owing to the reliable estimation of the noise parameters. In addition, since the noise parameters are updated at every time step, the proposed approach is applicable to both stationary and nonstationary responses. The proposed approach performs well in real-time system identification and uncertainty quantification, and it has high potential for a wide range of applications.


Chapter 4

Outlier Detection for Real-Time System Identification

Abstract This chapter introduces an algorithm for detecting anomalous data in the measurements from time-varying systems. The probability of outlier of a data point is defined and derived, and the algorithm uses it to evaluate the outlierness of each data point. The probability of outlier integrates the normalized residual, the measurement noise level and the size of the dataset, and provides a systematic and objective criterion to screen possibly abnormal data points in the observations. Instead of relying on ad hoc judgement to select outliers, the proposed method provides an intuitive threshold of 0.5 for outlier detection. Computationally efficient techniques are introduced to alleviate the heavy burden encountered in identification using long-term monitoring data. The proposed outlier detection algorithm is embedded into the extended Kalman filter, so it can remove the outliers in the measurements and identify the time-varying system simultaneously. By excluding the outliers in the measurements, the proposed algorithm ensures the stability and reliability of the estimation. Examples are presented to illustrate the practical aspects of detecting outliers in the measurements and identifying time-varying systems in real time. The algorithm presented in this chapter is suitable for centralized identification, while the algorithm presented in Chap. 7 is suitable for distributed identification.

Keywords Outlier detection · Probability of outlier · Intuitive threshold · Outliers · Structural health monitoring

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 K. Huang and K.-V. Yuen, Bayesian Real-Time System Identification, https://doi.org/10.1007/978-981-99-0593-5_4

4.1 Introduction

In this chapter, we introduce an efficient outlier removal algorithm for real-time system identification. Such an algorithm is in high demand in practice, where data cleansing is desired before the state of a system is evaluated. The quality of data obtained from physical apparatuses is generally affected by noise, instrument malfunction and errors in data transmission. In addition, human error, environmental disturbances, changes in system behavior and unmodeled mechanisms of the system of interest may also seriously affect the data quality. These factors lead to outliers in the dataset and further degrade the decision-making process. As a result, it is crucially important to ensure the reliability and accuracy of the sensor data for system identification.

Outliers are also known as abnormalities, deviants, or anomalies in the data mining and statistics literature. The classical definition of outliers was given by Hawkins (1980): ‘An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism’. In system identification, outliers can be considered as measurements that deviate significantly from the normal pattern of the observations. Outlier detection is the process of searching for the data points that do not follow the normal pattern of the sensing data. A successful outlier detection algorithm can enhance data reliability and improve the robustness of data analysis. Moreover, the detected outliers can be used for fault detection (Luo et al. 2005) and for reporting events of interest (Krishnamachari and Iyengar 2004). In recent years, outlier detection has attracted much attention and has been widely researched in various disciplines, such as statistics (Rousseeuw and Hubert 2018), physics (Thaprasop et al. 2021), economics (Punzo et al. 2018), and information theory (Dong et al. 2019).

There are two typical types of output for outlier detection algorithms, i.e., outlier scores and binary labels. An outlier score quantifies the level of outlierness of a data point and can be used to rank the data points by their outlier tendency. A binary label classifies a data point as normal or abnormal. Outlier scores can be converted into binary labels by imposing a threshold on the scores. Although a binary label contains less information than an outlier score, it is the final decision needed to identify the abnormal data in a dataset.
According to the concentration of the anomalous data points, outliers can be classified as isolated outliers and segmental outliers. An isolated outlier, also known as a spike (Quiroga et al. 2004), typically appears as an abrupt change and is extremely different from the rest of the data. Isolated outliers often appear independently in time series and can be caused by various sources, such as transient malfunction of sensing devices, human error and/or sudden change of the system model. Segmental outliers last for a certain time period and change the historical pattern of the observations. Thus, segmental outlier detection can be used to detect the occurrence of events of interest, such as air pollution (Ottosen and Kumar 2019), equipment failure (Li et al. 2009) and network intrusion (Zhang et al. 2008).

Note that the determination of an outlier usually requires subjective judgment. For example, a commonly used criterion is that a data point whose absolute deviation is larger than 2.5 times the standard deviation is considered an outlier. Therefore, the choice of the threshold is crucially important for outlier detection, and efforts have been devoted to it (Zhang et al. 2010). Such a threshold is usually user-specified and fixed. A proper threshold is vital for an outlier detection method to achieve satisfactory performance. Specifically, if the threshold is too restrictive, the algorithm will miss many true outliers, resulting in a high level of false negatives. On the other hand, if the threshold is too loose, too many data points will be flagged as outliers, leading to a high level of false positives. This trade-off can be quantified in terms of two commonly used indicators, namely masking (true outliers that go undetected) and swamping (regular points wrongly flagged as outliers). In the following, a widely used outlier detection criterion is introduced and its performance under different conditions is evaluated.

Example. Empirical rule for outlier detection. In this example, we investigate the performance of a simple and well-known empirical rule for outlier detection. First, let us briefly introduce this empirical rule (also known as the 68-95-99.7 rule) and its modification for outlier detection. This empirical rule is often used as a simple test for outliers when the population is assumed to be Gaussian. For a Gaussian dataset, approximately 68%, 95% and 99.7% of the values lie within one, two and three standard deviations of the mean, respectively:

$$\begin{aligned} P(\mu - \sigma \le X \le \mu + \sigma) &\approx 68.27\% \\ P(\mu - 2\sigma \le X \le \mu + 2\sigma) &\approx 95.45\% \\ P(\mu - 3\sigma \le X \le \mu + 3\sigma) &\approx 99.73\% \end{aligned} \qquad (4.1)$$

where $P(\cdot)$ denotes the probability of the event in its argument, $X$ is an observation from a Gaussian random variable, $\mu$ is the mean of the distribution, and $\sigma$ is its standard deviation. Figure 4.1 shows the schematic diagram of Eq. (4.1).
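The percentages in Eq. (4.1) follow from $P(|X - \mu| \le k\sigma) = \operatorname{erf}(k/\sqrt{2})$ for a Gaussian variable, which the Python standard library can evaluate directly. A minimal sketch (`prob_within` is an illustrative name):

```python
import math

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for Gaussian X, i.e. erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

for k in (1.0, 2.0, 2.5, 3.0):
    print(f"within {k} sigma: {100.0 * prob_within(k):.2f}%")
```

For k = 2.5, the threshold discussed below, about 98.76% of clean Gaussian values fall inside the band, so roughly 1.24% of them exceed it in a one-dimensional test even when no outlier is present.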

Fig. 4.1 Probability density function of a Gaussian distribution $X \sim N(\mu, \sigma^2)$


Then, the outlier detection criterion is given as follows:

$$\frac{|\varepsilon|}{\sigma} > \alpha \qquad (4.2)$$

where $|\varepsilon|$ is the absolute value of the residual $\varepsilon$, $\sigma$ is the standard deviation, and $\alpha$ is a threshold value. In the literature, the threshold is often set to $\alpha = 2.5$ as a moderately conservative choice (Leys et al. 2013). The outlier detection can then be performed as follows. For a given sample set, one computes the absolute normalized residuals and compares them with the prescribed threshold. Points falling more than 2.5 standard deviations away from the model output are considered “likely outliers”. Next, we evaluate the performance of the criterion in Eq. (4.2) under three different scenarios.

Case 1. Outlier detection using datasets with different sizes

First, we investigate the relationship between the size of the dataset and the threshold value. Let us compare the outlier detection results for two datasets of different sizes, with $N_1 = 10$ and $N_2 = 1000$ data points. To simplify the problem, both datasets are assumed to be generated from the same bivariate Gaussian distribution with zero mean and covariance matrix $\mathbf{I}_2$:

$$\mathbf{x} \sim N\left([0, 0]^T, \mathbf{I}_2\right) \qquad (4.3)$$

where $\mathbf{I}_2$ is the 2 × 2 identity matrix. Note that neither dataset contains any outlier, but we nevertheless applied the criterion in Eq. (4.2) to these two clean datasets. The same fixed threshold $\alpha = 2.5$ is used in both; that is, data points with absolute normalized residuals larger than 2.5 are considered outliers, and all others are identified as regular data points. Figures 4.2 and 4.3 show the outlier detection results for the two datasets with $N_1 = 10$ and $N_2 = 1000$, respectively. The circles represent the identified regular data points, the dots represent the identified outliers, and the dashed lines represent the bounds at plus and minus $\alpha$. For the dataset with $N_1 = 10$, all the data points lie within the 2.5-standard-deviation box, so no outlier is identified. However, for the dataset with $N_2 = 1000$, 26 data points fall outside the 2.5-standard-deviation box and are therefore mistakenly detected as outliers. This simple example indicates that the threshold should be chosen according to the size of the dataset: the threshold should be larger for more data points, and no universal value fits datasets of different sizes.

Fig. 4.2 Outlier detection results for the dataset with N1 = 10 data points

Fig. 4.3 Outlier detection results for the dataset with N2 = 1000 data points

Case 2. Outlier detection using datasets with different noise levels

In Case 2, we investigate the relationship between the level of measurement noise and the threshold value. A total of 500 (i.e., N = 500) noise-free points were generated from a bivariate Gaussian distribution with zero mean and covariance matrix $\mathbf{I}_2$:

$$\mathbf{x} \sim N\left([0, 0]^T, \mathbf{I}_2\right) \qquad (4.4)$$

Then, assume that these data points are observed through an imperfect instrument, and the observations are given by:

$$\mathbf{z}_i = \mathbf{x}_i + \mathbf{n}_i, \quad i = 1, 2, \ldots, 500 \qquad (4.5)$$

where $\mathbf{z}_i$ is the $i$th measured data point and $\mathbf{n}_i$ is the corresponding measurement noise, modeled as a zero-mean Gaussian independent and identically distributed (i.i.d.) process. Three noise levels are considered in this case: 0.05, 0.25 and 0.5 for the first, second and third datasets, respectively. Again, note that there is no outlier in these three datasets, but we applied the criterion in Eq. (4.2) for outlier detection. The outliers detected with the fixed threshold $\alpha = 2.5$ in the datasets with different noise levels are shown in Figs. 4.4, 4.5 and 4.6. Again, the regular data points are marked as circles, the detected outliers are marked as dots, and the dashed lines represent the bounds at plus and minus 2.5. The numbers of detected outliers in the datasets with 0.05, 0.25 and 0.5 noise levels are 7, 12 and 25, respectively; all of them are regular data points misclassified as outliers. Thus, with a fixed threshold $\alpha = 2.5$, the number of falsely detected outliers grows with the noise level. It is therefore necessary to consider the measurement noise level in outlier detection, and a larger threshold should be chosen for a dataset with a higher measurement noise level.

Case 3. Outlier detection using different thresholds

In Case 3, we investigate the outlier detection results obtained with different thresholds $\alpha$. Again, consider a two-dimensional dataset generated from the bivariate Gaussian distribution in Eq. (4.3), with N = 500 data points. Five threshold values, namely 2, 2.4, 2.5, 2.6 and 3, are considered to detect the outliers in the dataset.
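A minimal sketch of the fixed-threshold criterion of Eq. (4.2) applied to clean bivariate data of this kind. It assumes, as the dashed bounds in the figures suggest, that a two-dimensional point is flagged when either coordinate's normalized residual exceeds $\alpha$ (a box rule), and it compares the simulated counts with the expected number of false flags $N\left[1 - \operatorname{erf}(\alpha/\sqrt{2})^2\right]$ for standard Gaussian data. The helper names and the random seed are illustrative, and the simulated counts will differ from those reported below because the data are a different realization.

```python
import numpy as np
from math import erf, sqrt

def flag_outliers(residuals, sigma, alpha):
    """Eq. (4.2) applied per coordinate: flag a point if any |eps|/sigma exceeds alpha (box rule)."""
    r = np.abs(np.asarray(residuals, dtype=float)) / sigma
    return np.any(r > alpha, axis=1)

def expected_false_flags(n_points, alpha, dim=2):
    """Expected number of clean standard-Gaussian points flagged by the box rule."""
    p_inside_1d = erf(alpha / sqrt(2.0))
    return n_points * (1.0 - p_inside_1d ** dim)

rng = np.random.default_rng(2023)
x = rng.standard_normal((500, 2))          # clean data as in Eq. (4.3); no true outliers
for alpha in (2.0, 2.4, 2.5, 2.6, 3.0):
    n_flag = int(flag_outliers(x, 1.0, alpha).sum())
    print(f"alpha={alpha}: flagged {n_flag}, expected {expected_false_flags(500, alpha):.1f}")
```

The same formula gives about 0.25 expected false flags for N = 10 and about 24.7 for N = 1000 at α = 2.5, in line with the 0 and 26 points reported in Case 1, and about 44.5 and 2.7 for α = 2 and α = 3 with N = 500. Because the expected count grows in proportion to N, a fixed α cannot suit datasets of all sizes.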
Figures 4.7, 4.8, 4.9, 4.10 and 4.11 show the outlier detection results for the five threshold values. The identified regular data points are marked as circles and the detected outliers as dots; the dashed lines represent the bounds ±2, ±2.4, ±2.5, ±2.6 and ±3 in Figs. 4.7, 4.8, 4.9, 4.10 and 4.11, respectively. The numbers of detected outliers in these figures are 45, 18, 15, 12 and 6, respectively. Thus, the choice of threshold strongly affects the outlier detection results even when the thresholds differ only slightly, and the results for α = 2 and α = 3 differ dramatically. The outlier detection result obtained with the criterion in Eq. (4.2) is therefore very sensitive to the choice of threshold. Moreover, this threshold is usually assigned by the user based on prior information or experience, so the decision on which data to exclude as abnormal is unavoidably subjective. Although an optimized threshold can be obtained by optimization methods, e.g., by minimizing false positives and false negatives (Japkowicz et al. 1995), such methods are usually computationally demanding, require a large dataset, and fail to provide prompt decisions for time-varying systems.

In conclusion, the number of detected outliers depends not only on the normalized residuals but also on the size of the dataset and the noise level, and a proper threshold value is crucial for the performance of outlier detection methods. It is therefore inappropriate to detect outliers by considering only the normalized residuals together with a subjectively assigned threshold.

In this chapter, we introduce a novel outlier detection approach for real-time system identification. The proposed outlier removal method is embedded into the extended Kalman filter (EKF), so that outlier detection and system identification are performed simultaneously in real time. The approach integrates the size of the dataset, the noise level and the normalized residual into a probability of outlier, which evaluates the outlierness of each data point. In the next section, the probability of outlier is defined and elaborated. Two computational efficiency enhancement techniques that reduce the memory and computational burden of the method for long-term monitoring are introduced in Sect. 4.3. Section 4.4 summarizes the details of the proposed outlier detection algorithm. In Sect. 4.5, illustrative examples of outlier detection are presented for damage identification of a single-degree-of-freedom oscillator and a 14-bay truss. Finally, concluding remarks are given in Sect. 4.6.

Fig. 4.4 Outlier detection results for the dataset with 0.05 noise level

Fig. 4.5 Outlier detection results for the dataset with 0.25 noise level

Fig. 4.6 Outlier detection results for the dataset with 0.5 noise level

Fig. 4.7 Outlier detection results using the threshold α = 2

Fig. 4.8 Outlier detection results using the threshold α = 2.4

Fig. 4.9 Outlier detection results using the threshold α = 2.5

Fig. 4.10 Outlier detection results using the threshold α = 2.6

Fig. 4.11 Outlier detection results using the threshold α = 3

4.2 Outlier Detection Using Probability of Outlier

4.2.1 Normalized Residual of Measurement

Consider a dynamical system with $N_d$ degrees of freedom (DOFs) and equation of motion:

$$M\ddot{x}(t) + C\left[\theta_c(t)\right]\dot{x}(t) + K\left[\theta_k(t)\right]x(t) = T f(t) \tag{4.6}$$

where $M$, $C$ and $K$ are the mass, damping and stiffness matrices of the system, respectively; the stiffness and damping matrices are parameterized by the possibly time-varying model parameters $\theta(t) \equiv \left[\theta_k(t)^T, \theta_c(t)^T\right]^T \in \mathbb{R}^{N_\theta}$; $f$ is the excitation applied to the system; and $T$ is the influence matrix associated with the excitation $f$. Define the augmented state vector $y(t) \equiv \left[x(t)^T, \dot{x}(t)^T, \theta(t)^T\right]^T \in \mathbb{R}^{2N_d + N_\theta}$, composed of the displacement, velocity and unknown model parameter vectors. Then, Eq. (4.6) can be discretized and expressed as follows:

$$y_{i+1} = A_i y_i + B_i f_i + \delta_i \tag{4.7}$$

where $y_i \equiv y(i\Delta t)$ and $f_i \equiv f(i\Delta t)$, with $\Delta t$ the sampling time step; $A_i$ and $B_i$ are the state transition and input-to-state matrices, respectively; and $\delta_i$ is the remainder term due to the local linearization approximation. The detailed derivation of Eq. (4.7) and of $A_i$, $B_i$ and $\delta_i$ can be found in Chap. 2. Discrete-time response measurements $z_1, z_2, \ldots, z_{i+1}$ are observed at $N_0$ DOFs:

$$z_{i+1} = h\left(y_{i+1}\right) + n_{i+1} \tag{4.8}$$

where $h(\cdot)$ defines the observation quantities, and $n_{i+1}$ represents the measurement noise at the $(i+1)$th time step, modeled as a Gaussian i.i.d. process with zero mean and covariance matrix $\Sigma_n \in \mathbb{R}^{N_0 \times N_0}$. The one-step-ahead predictor of the measurements, $z_{i+1|i}$, is then readily obtained by taking the expected value of Eq. (4.8):

$$z_{i+1|i} = H_{i+1} y_{i+1|i} \tag{4.9}$$

where $H_{i+1}$ is the observation matrix at the $(i+1)$th time step and $y_{i+1|i}$ is the one-step-ahead predicted state vector. In this chapter, we consider the general situation in which every channel of the measurements may be polluted with outliers. Denote by $z^{(c)}_{i+1}$ the $c$th component (channel) of the measurement $z_{i+1}$, and by $z^{(c)}_{i+1|i}$ the $c$th component of the one-step-ahead predictor $z_{i+1|i}$. The normalized residual of $z^{(c)}_{i+1}$ can be defined as (Myers and Tapley 1976):

$$\epsilon^{(c)}_{i+1} = \frac{z^{(c)}_{i+1} - z^{(c)}_{i+1|i}}{\sigma^{(c)}_{i+1}} \tag{4.10}$$

where $\sigma^{(c)}_{i+1}$ is the standard deviation of the one-step-ahead prediction error. For regular measurement noise, the probability model is Gaussian, so the probability of a data point falling outside the interval $\left(-\left|\epsilon^{(c)}_{i+1}\right|, \left|\epsilon^{(c)}_{i+1}\right|\right)$ is

$$q^{(c)}_{i+1} = 2\Phi\left(-\left|\epsilon^{(c)}_{i+1}\right|\right) \tag{4.11}$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard Gaussian random variable. The shaded area in Fig. 4.12 represents the probability $q^{(c)}_{i+1}$.

Fig. 4.12 Schematic diagram of the probability $q^{(c)}_{i+1}$
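Equations (4.10) and (4.11) translate directly into code. The following minimal Python sketch (function names are illustrative, not from the book) computes the normalized residual of one channel and the two-sided Gaussian tail probability, using only the standard library's `statistics.NormalDist` for Φ.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian used for Phi


def normalized_residual(z_obs: float, z_pred: float, sigma: float) -> float:
    """Eq. (4.10): residual of one channel divided by its prediction-error std."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return (z_obs - z_pred) / sigma


def tail_probability(eps: float) -> float:
    """Eq. (4.11): probability that a standard Gaussian falls outside (-|eps|, |eps|)."""
    return 2.0 * STD_NORMAL.cdf(-abs(eps))
```

For instance, an observation exceeding its prediction by 0.06 with σ = 0.02 gives ε = 3 and q ≈ 0.0027, the shaded area of Fig. 4.12 for that residual.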

4.2.2 Probability of Outlier

In this subsection, the probability of a data point being regarded as an outlier is derived. Consider the dataset $R^{(c)}_i$ at the $i$th time step for the $c$th measurement channel, which contains the absolute normalized residuals of the regular data points up to the $i$th time step:

$$R^{(c)}_i = \left\{\left|\epsilon^{(c)}_j\right| : z^{(c)}_j \text{ is a regular data point},\ j = 1, 2, \ldots, i\right\} \tag{4.12}$$

The initial dataset $R^{(c)}_0$, $c = 1, 2, \ldots, N_0$, is an empty set, and the variable $N\left[R^{(c)}_i\right]$ is defined as the number of elements in $R^{(c)}_i$. The variable $\kappa^{(c)}_{i+1}$ is then defined as the number of previous regular data points in $R^{(c)}_i$ whose absolute normalized residual is not smaller than that of the current time step:

$$\kappa^{(c)}_{i+1} = L\left\{R^{(c)}_i(k) : \left|\epsilon^{(c)}_{i+1}\right| \le R^{(c)}_i(k),\ k = 1, 2, \ldots, N\left[R^{(c)}_i\right]\right\} \tag{4.13}$$

where $L\{X\}$ denotes the length (number of elements) of the set $X$, and $R^{(c)}_i(k)$ is the $k$th element of the dataset $R^{(c)}_i$. In other words, $\kappa^{(c)}_{i+1}$ counts the absolute normalized residuals of the previous regular data points in $R^{(c)}_i$ that fall outside the interval $\left(-\left|\epsilon^{(c)}_{i+1}\right|, \left|\epsilon^{(c)}_{i+1}\right|\right)$. The dataset $R^{(c)}_i$ and the variable $\kappa^{(c)}_{i+1}$ are illustrated schematically in Fig. 4.13.

Fig. 4.13 Schematic diagram of the dataset $R^{(c)}_i$ and the variable $\kappa^{(c)}_{i+1}$

To construct the model for the probability of outlier, we consider the probability that the Gaussian noise model does not generate as many data points with large residuals as actually occur. A random variable $\eta$ is introduced to denote the number of data points, among $N\left[R^{(c)}_i\right] + 1$ standard Gaussian samples, that fall outside the interval $\left(-\left|\epsilon^{(c)}_{i+1}\right|, \left|\epsilon^{(c)}_{i+1}\right|\right)$. First, consider the probability that no point ($\eta = 0$) among the $N\left[R^{(c)}_i\right] + 1$ regular data points up to the $(i+1)$th time step falls outside this interval. In other words, all $N\left[R^{(c)}_i\right] + 1$ standard Gaussian samples fall inside the interval, and this probability is given by:

$$P(\eta = 0) = \left(1 - q^{(c)}_{i+1}\right)^{N\left[R^{(c)}_i\right] + 1} \tag{4.14}$$

The probability that exactly one point ($\eta = 1$) among the $N\left[R^{(c)}_i\right] + 1$ regular data points up to the $(i+1)$th time step falls outside the interval $\left(-\left|\epsilon^{(c)}_{i+1}\right|, \left|\epsilon^{(c)}_{i+1}\right|\right)$ is given by:

$$P(\eta = 1) = \left(N\left[R^{(c)}_i\right] + 1\right) q^{(c)}_{i+1} \left(1 - q^{(c)}_{i+1}\right)^{N\left[R^{(c)}_i\right]} \tag{4.15}$$

Similarly, the probability that $\eta$ of the $N\left[R^{(c)}_i\right] + 1$ standard Gaussian samples fall outside the interval $\left(-\left|\epsilon^{(c)}_{i+1}\right|, \left|\epsilon^{(c)}_{i+1}\right|\right)$ is given by:

$$P(\eta) = \binom{N\left[R^{(c)}_i\right] + 1}{\eta} \left(q^{(c)}_{i+1}\right)^{\eta} \left(1 - q^{(c)}_{i+1}\right)^{N\left[R^{(c)}_i\right] + 1 - \eta} \tag{4.16}$$

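where $\binom{a}{b} = \frac{a!}{b!(a-b)!}$ is the binomial coefficient. The probability of outlier can be regarded as the probability of obtaining a data point with a very large residual. It is defined as the probability that no more than $\kappa^{(c)}_{i+1}$ of the $N\left[R^{(c)}_i\right] + 1$ standard Gaussian samples fall outside the interval $\left(-\left|\epsilon^{(c)}_{i+1}\right|, \left|\epsilon^{(c)}_{i+1}\right|\right)$, i.e., that the absolute normalized residuals of no more than $\kappa^{(c)}_{i+1}$ of these samples are larger than $\left|\epsilon^{(c)}_{i+1}\right|$:

$$P_o\left(z^{(c)}_{i+1}\right) \equiv P\left(\eta \le \kappa^{(c)}_{i+1}\right) \tag{4.17}$$

Since the current data point itself has absolute normalized residual $\left|\epsilon^{(c)}_{i+1}\right|$, a total of $\kappa^{(c)}_{i+1} + 1$ of the $N\left[R^{(c)}_i\right] + 1$ data points actually lie outside or on the boundary of this interval; Eq. (4.17) is therefore the probability that the Gaussian noise model generates fewer such large residuals than were observed.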

The random variable $\eta$ follows the binomial distribution with $N\left[R^{(c)}_i\right] + 1$ trials and probability $q^{(c)}_{i+1}$. Therefore, the probability of outlier of $z^{(c)}_{i+1}$, defined as the probability that no more than $\kappa^{(c)}_{i+1}$ of the $N\left[R^{(c)}_i\right] + 1$ standard Gaussian samples fall outside the interval $\left(-\left|\epsilon^{(c)}_{i+1}\right|, \left|\epsilon^{(c)}_{i+1}\right|\right)$, is obtained by summing the probabilities from $\eta = 0$ to $\eta = \kappa^{(c)}_{i+1}$:

$$P_o\left(z^{(c)}_{i+1}\right) = P(\eta = 0) + P(\eta = 1) + \cdots + P\left(\eta = \kappa^{(c)}_{i+1}\right) = \sum_{\eta=0}^{\kappa^{(c)}_{i+1}} \binom{N\left[R^{(c)}_i\right] + 1}{\eta} \left(q^{(c)}_{i+1}\right)^{\eta} \left(1 - q^{(c)}_{i+1}\right)^{N\left[R^{(c)}_i\right] + 1 - \eta} \tag{4.18}$$

As a result, the outlier detection criterion is given by:

$$\begin{cases} P_o\left(z^{(c)}_{i+1}\right) \ge 0.5, & z^{(c)}_{i+1} \text{ is an outlier} \\ P_o\left(z^{(c)}_{i+1}\right) < 0.5, & z^{(c)}_{i+1} \text{ is a regular data point} \end{cases} \tag{4.19}$$
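A direct transcription of Eqs. (4.13), (4.18) and (4.19) is sketched below, assuming the residual set is held in a plain Python list; the function names are hypothetical. Binomial coefficients such as $\binom{10001}{455}$ exceed the range of double precision, so the sum is accumulated in log space with `math.lgamma` instead of multiplying `math.comb` by floating-point powers; this is an implementation choice, not part of the method.

```python
import math
from statistics import NormalDist
from typing import Sequence

STD_NORMAL = NormalDist()


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(eta <= k) for eta ~ Binomial(n, p), accumulated in log space."""
    if p <= 0.0:
        return 1.0                      # eta = 0 with certainty
    if p >= 1.0:
        return 1.0 if k >= n else 0.0   # eta = n with certainty
    log_p, log_1mp = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for eta in range(min(k, n) + 1):
        log_pmf = (log_n_fact - math.lgamma(eta + 1) - math.lgamma(n - eta + 1)
                   + eta * log_p + (n - eta) * log_1mp)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def probability_of_outlier(eps: float, regular_abs_residuals: Sequence[float]) -> float:
    """Eq. (4.18) for one channel at one time step.

    eps: normalized residual of the current data point, Eq. (4.10).
    regular_abs_residuals: the set R_i^(c) of Eq. (4.12).
    """
    abs_eps = abs(eps)
    q = 2.0 * STD_NORMAL.cdf(-abs_eps)                             # Eq. (4.11)
    kappa = sum(1 for r in regular_abs_residuals if abs_eps <= r)  # Eq. (4.13)
    n_trials = len(regular_abs_residuals) + 1                      # N[R_i^(c)] + 1
    return binomial_cdf(kappa, n_trials, q)                        # Eq. (4.18)


def is_outlier(eps: float, regular_abs_residuals: Sequence[float]) -> bool:
    """Criterion of Eq. (4.19)."""
    return probability_of_outlier(eps, regular_abs_residuals) >= 0.5
```

With an empty set, Eq. (4.18) reduces to $1 - q^{(c)}_{i+1}$: a point with |ε| = 3 then has a probability of outlier of about 0.997 and is flagged, whereas |ε| = 0.5 gives about 0.38 and the point is kept.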


It is intuitive to choose 0.5 as the threshold for the probability of outlier. A probability of outlier of 0.5 implies that the data point is equally likely to be a regular data point or an outlier, and a data point with $P_o\left(z^{(c)}_{i+1}\right) \ge 0.5$ is more likely to be an outlier than a regular data point. Although a data point whose probability of outlier is only slightly larger than 0.5 is not necessarily an outlier, it is suggested that such a point be removed from system identification, for the following reason. If a regular data point is mistakenly removed, the loss of information from one measured data point is minor; the only consequence is a slight increase in the posterior uncertainty of the estimation. However, if an outlier is included as a regular data point in system identification, it may lead to misleading results.

Although Eq. (4.18) is sufficient for classifying outliers, the number of elements in the set $R^{(c)}_i$ grows continually as observations accumulate. Determining $\kappa^{(c)}_{i+1}$ therefore imposes a memory burden, especially in long-term monitoring (Mu and Yuen 2015). In addition, two bounds can be developed for fast outlier screening, so that the probability of outlier does not need to be computed for every observed data point. These two computational efficiency enhancement techniques are presented in the next section.

4.3 Computational Efficiency Enhancement Techniques

4.3.1 Moving Time Window

To enhance the efficiency of the proposed outlier detection method, a moving time window containing no more than $N_{win}$ data points is considered. Specifically, instead of the dataset $R^{(c)}_i$ in Eq. (4.12), a reduced dataset $\tilde{R}^{(c)}_i$ is defined that consists of only the most recent $\min\left\{N\left[R^{(c)}_i\right], N_{win}\right\}$ elements of $R^{(c)}_i$:

$$\tilde{R}^{(c)}_i \in \mathbb{R}^{\min\left\{N\left[R^{(c)}_i\right],\, N_{win}\right\}} \tag{4.20}$$

The symbol $N\left[\tilde{R}^{(c)}_i\right]$ denotes the number of elements in the reduced dataset of regular absolute normalized residuals $\tilde{R}^{(c)}_i$. The probability of outlier of a data point $z^{(c)}_{i+1}$ is then approximated by a probability that depends only on the most recent $\min\left\{N\left[R^{(c)}_i\right], N_{win}\right\}$ regular data points:

$$P_o\left(z^{(c)}_{i+1}\right) \approx \sum_{\eta=0}^{\tilde{\kappa}^{(c)}_{i+1}} \binom{N\left[\tilde{R}^{(c)}_i\right] + 1}{\eta} \left(q^{(c)}_{i+1}\right)^{\eta} \left(1 - q^{(c)}_{i+1}\right)^{N\left[\tilde{R}^{(c)}_i\right] + 1 - \eta} \tag{4.21}$$

where $\tilde{\kappa}^{(c)}_{i+1}$ is the number of elements in the reduced dataset $\tilde{R}^{(c)}_i$ that are not smaller than $\left|\epsilon^{(c)}_{i+1}\right|$:

$$\tilde{\kappa}^{(c)}_{i+1} = L\left\{\tilde{R}^{(c)}_i(k) : \left|\epsilon^{(c)}_{i+1}\right| \le \tilde{R}^{(c)}_i(k),\ k = 1, 2, \ldots, N\left[\tilde{R}^{(c)}_i\right]\right\} \tag{4.22}$$

where $\tilde{R}^{(c)}_i(k)$ is the $k$th element of the dataset $\tilde{R}^{(c)}_i$. This moving time window mitigates the memory burden of measurement storage in long-term monitoring, because only the most recent $N_{win}$ data points, instead of the entire history, need to be stored and processed. In addition, because conditions may change during long-term monitoring, it is appropriate to consider only the past data points within a reasonably short time window. It is suggested to choose $N_{win}$ so that the window covers approximately 100 fundamental periods of the underlying dynamical system.
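A deque with a fixed maximum length is a natural container for the reduced set: appending to a full deque discards its earliest element, so memory stays bounded by $N_{win}$. The class below is a hypothetical helper, not code from the book. Under this sketch, the probability of outlier of Eq. (4.21) is the same binomial sum as Eq. (4.18), evaluated with `len(window)` and `window.kappa(eps)` in place of $N\left[R^{(c)}_i\right]$ and $\kappa^{(c)}_{i+1}$.

```python
from collections import deque


class ResidualWindow:
    """Reduced set of Eq. (4.20): the most recent regular |eps| values of one channel."""

    def __init__(self, n_win: int):
        if n_win < 1:
            raise ValueError("n_win must be a positive integer")
        # With maxlen set, appending to a full deque discards the earliest element.
        self._values = deque(maxlen=n_win)

    def __len__(self) -> int:
        """N[R~_i^(c)], the number of stored residuals."""
        return len(self._values)

    def add_regular(self, eps: float) -> None:
        """Store |eps| of a data point that was classified as regular."""
        self._values.append(abs(eps))

    def kappa(self, eps: float) -> int:
        """Eq. (4.22): number of stored values not smaller than |eps|."""
        abs_eps = abs(eps)
        return sum(1 for r in self._values if abs_eps <= r)
```

$\tilde{\kappa}^{(c)}_{i+1}$ is found here by a linear scan for clarity. Because it is needed only for residuals between the screening bounds of Sect. 4.3.2, this cost is incurred at a small fraction of the time steps.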

4.3.2 Efficient Screening Criteria

Computing the probability of outlier for every observed data point in each measurement channel imposes a heavy computational burden. Therefore, efficient screening criteria are introduced so that the probability of outlier in Eq. (4.18) does not need to be computed for every data point. Two bounds of the absolute normalized residual $\left|\epsilon^{(c)}_{i+1}\right|$, namely $\epsilon_u$ and $\epsilon_l$, are developed such that:

$$\begin{cases} \left|\epsilon^{(c)}_{i+1}\right| \ge \epsilon_u, & z^{(c)}_{i+1} \text{ is an outlier} \\ \left|\epsilon^{(c)}_{i+1}\right| \le \epsilon_l, & z^{(c)}_{i+1} \text{ is a regular data point} \end{cases} \tag{4.23}$$

In other words, an observation $z^{(c)}_{i+1}$ is recognized as a regular data point if its absolute normalized residual $\left|\epsilon^{(c)}_{i+1}\right|$ is not larger than $\epsilon_l$, and as an outlier if its absolute normalized residual is larger than or equal to $\epsilon_u$. These two bounds are not tight, but they still enhance the computational efficiency by avoiding the computation of the probability of outlier for a large portion of the data points.


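Next, we determine the values of $\epsilon_l$ and $\epsilon_u$. First, $\epsilon_l$ can be taken as a conservatively small value:

$$\epsilon_l = 2 \tag{4.24}$$

This implies that any data point with a prediction error within 2 standard deviations is immediately classified as a regular data point. To obtain $\epsilon_u$, we use the following inequality based on Eq. (4.21):

$$P_o\left(z^{(c)}_{i+1}\right) \ge \left(1 - q^{(c)}_{i+1}\right)^{N\left[\tilde{R}^{(c)}_i\right] + 1} \ge \left(1 - q^{(c)}_{i+1}\right)^{N_{win} + 1} \tag{4.25}$$

The first inequality holds because Eq. (4.21) is a sum of non-negative terms that includes the $\eta = 0$ term, and the second because $N\left[\tilde{R}^{(c)}_i\right] \le N_{win}$ and $0 \le 1 - q^{(c)}_{i+1} \le 1$. As a result, $P_o\left(z^{(c)}_{i+1}\right) \ge 0.5$ whenever $\left(1 - q^{(c)}_{i+1}\right)^{N_{win} + 1} \ge 0.5$, and the bound $\epsilon_u$ is defined by

$$\left[1 - 2\Phi(-\epsilon_u)\right]^{N_{win} + 1} = 0.5 \tag{4.26}$$

Finally, the value of $\epsilon_u$ is readily obtained:

$$\epsilon_u = -\Phi^{-1}\left(\frac{1}{2} - \frac{1}{2^{\,1 + \frac{1}{N_{win} + 1}}}\right) \tag{4.27}$$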

where $\Phi^{-1}$ is the inverse of the standard Gaussian cumulative distribution function. For example, for a moving time window with $N_{win} = 10000$, the bound is $\epsilon_u = 3.9787$. As a result, the probability of outlier needs to be calculated only for data points whose absolute normalized residual falls in the interval $(\epsilon_l, \epsilon_u)$. These screening rules speed up the outlier detection process by avoiding the computation of the probability of outlier for a large portion of the data points.
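Equations (4.23), (4.24) and (4.27) reduce to a few lines. In this minimal sketch (function names hypothetical), $\Phi(-\epsilon_u) = \frac{1}{2}\left(1 - 2^{-1/(N_{win}+1)}\right)$ is evaluated with `math.expm1` to avoid cancellation when $N_{win}$ is large, and $\Phi^{-1}$ comes from `statistics.NormalDist.inv_cdf`.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS_LOWER = 2.0  # epsilon_l of Eq. (4.24)


def upper_screening_bound(n_win: int) -> float:
    """Eq. (4.27): eps_u satisfying [1 - 2*Phi(-eps_u)]**(n_win + 1) = 0.5."""
    # Phi(-eps_u) = 0.5 * (1 - 2**(-1/(n_win + 1))), written with expm1 for accuracy
    p = -0.5 * math.expm1(-math.log(2.0) / (n_win + 1))
    return -STD_NORMAL.inv_cdf(p)


def screen(eps: float, eps_u: float, eps_l: float = EPS_LOWER) -> str:
    """Eq. (4.23): 'regular', 'outlier', or 'undecided' (Po must be computed)."""
    a = abs(eps)
    if a <= eps_l:
        return "regular"
    if a >= eps_u:
        return "outlier"
    return "undecided"
```

`upper_screening_bound(10000)` returns approximately 3.9787, in agreement with the value quoted above; smaller windows give smaller bounds, so fewer points fall in the "undecided" band.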

4.4 Outlier Detection for Time-Varying Dynamical Systems

4.4.1 Training Stage

Since the initial state vector $y_{0|0}$ of the EKF is assigned arbitrarily, a short training stage is necessary for the outlier detection method to gain sufficient information about the state vector $y$. The training stage is also used to obtain the initial value of the standard deviation of the prediction error, $\sigma^{(c)}_1$, $c = 1, 2, \ldots, N_0$. It is suggested to run the EKF on the raw observations (i.e., without outlier screening) for roughly ten fundamental periods of the underlying dynamical system. This time window is flexible, and a rough estimate of the fundamental frequency can easily be obtained from the Fourier spectrum of the response of the dynamical system. For each measurement channel $c = 1, 2, \ldots, N_0$:

1. Calculate the absolute residuals $\left|z^{(c)}_{i+1} - z^{(c)}_{i+1|i}\right|$ for each time step in the training stage, and sort them in ascending order.
2. Take the initial value $\sigma^{(c)}_1$ as the 68th percentile of the sorted absolute residuals.

The 68th percentile is used because the interval $(-\sigma, \sigma)$ covers a probability of about 0.68 (see Eq. 4.1) for the Gaussian distribution $N(0, \sigma^2)$. This is a robust estimator of the standard deviation in the presence of outliers because, unlike the usual sample standard deviation, it is not affected by extreme values.
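The two training steps above can be sketched as follows. The text does not specify how the percentile is interpolated, so this sketch assumes the nearest-rank definition; the function name is hypothetical.

```python
import math
from typing import Sequence


def initial_sigma(abs_residuals: Sequence[float], percentile: float = 68.0) -> float:
    """Training-stage estimate of sigma_1^(c): 68th percentile of |z - z_pred|.

    Assumption: nearest-rank percentile (no interpolation between samples).
    """
    if not abs_residuals:
        raise ValueError("need at least one training residual")
    ordered = sorted(abs(r) for r in abs_residuals)               # step 1: ascending order
    rank = max(1, math.ceil(percentile * len(ordered) / 100.0))   # nearest rank, 1-based
    return ordered[rank - 1]                                      # step 2
```

Because only the rank matters, replacing the largest training residual by an arbitrarily large value leaves the estimate unchanged, whereas the sample standard deviation would grow without bound.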

4.4.2 Working Stage

When the training stage is completed, the algorithm enters the working stage. The time index is reset to $i = 0$, so the first data point to be examined is $z_{i+1} = z_1$. For each measurement channel $c = 1, 2, \ldots, N_0$:

1. Set the initial reduced set of regular absolute normalized residuals $\tilde{R}^{(c)}_0$ as an empty set; assign $N_{win}$, assign $\epsilon_l$ using Eq. (4.24), and calculate $\epsilon_u$ using Eq. (4.27).
2. Detect outliers according to the probability of outlier:
   (1) Calculate the one-step-ahead predictor $z_{i+1|i}$ using Eq. (4.9) and the normalized residual $\epsilon^{(c)}_{i+1}$ using Eq. (4.10).
   (2) Determine whether the observation $z^{(c)}_{i+1}$ is an outlier:
      (a) If $\left|\epsilon^{(c)}_{i+1}\right| \le \epsilon_l$, $z^{(c)}_{i+1}$ is classified as a regular data point.
      (b) If $\left|\epsilon^{(c)}_{i+1}\right| \ge \epsilon_u$, $z^{(c)}_{i+1}$ is classified as an outlier.
      (c) If $\epsilon_l < \left|\epsilon^{(c)}_{i+1}\right| < \epsilon_u$:
         i. calculate $q^{(c)}_{i+1}$ using Eq. (4.11) and count $\tilde{\kappa}^{(c)}_{i+1}$ according to Eq. (4.22) based on the dataset $\tilde{R}^{(c)}_i$;
         ii. compute the probability of outlier $P_o\left(z^{(c)}_{i+1}\right)$ using Eq. (4.21). If $P_o\left(z^{(c)}_{i+1}\right) < 0.5$, $z^{(c)}_{i+1}$ is classified as a regular data point; otherwise, it is recognized as an outlier and discarded from the identification process.
3. According to the outlier detection result in Step 2:
   (1) Update the reduced set of regular absolute normalized residuals $\tilde{R}^{(c)}_{i+1}$ as follows:

$$\tilde{R}^{(c)}_{i+1} = \begin{cases} \tilde{R}^{(c)}_i, & z^{(c)}_{i+1} \text{ is an outlier} \\ \tilde{R}^{(c)}_i \cup \left\{\left|\epsilon^{(c)}_{i+1}\right|\right\}, & z^{(c)}_{i+1} \text{ is a regular data point and } N\left[\tilde{R}^{(c)}_i\right] < N_{win} \\ \left(\tilde{R}^{(c)}_i \cup \left\{\left|\epsilon^{(c)}_{i+1}\right|\right\}\right) \setminus \left\{\left|\epsilon^{(c)}_{earliest}\right|\right\}, & z^{(c)}_{i+1} \text{ is a regular data point and } N\left[\tilde{R}^{(c)}_i\right] = N_{win} \end{cases} \tag{4.28}$$

   where $\left|\epsilon^{(c)}_{earliest}\right|$ is the earliest element in the set $\tilde{R}^{(c)}_i$.
   (2) Update the standard deviation $\sigma^{(c)}_{i+2}$ as follows:

$$\sigma^{(c)}_{i+2} = \begin{cases} \sigma^{(c)}_{i+1}, & z^{(c)}_{i+1} \text{ is an outlier} \\ \sqrt{\dfrac{1}{N_{win} - 1}\left[\left(N_{win} - 2\right)\left(\sigma^{(c)}_{i+1}\right)^2 + \left(z^{(c)}_{i+1} - z^{(c)}_{i+1|i}\right)^2\right]}, & z^{(c)}_{i+1} \text{ is a regular data point} \end{cases} \tag{4.29}$$

4. Remove the outliers from $z_{i+1}$, and modify the observation matrix $H_{i+1}$ and the noise covariance matrix $\Sigma_n$ accordingly.
5. Update the state vector and its associated covariance matrix:
   (1) If all components of $z_{i+1}$ are classified as outliers, set $y_{i+1|i+1} = y_{i+1|i}$ and $\Sigma_{i+1|i+1} = \Sigma_{i+1|i}$.
   (2) Otherwise, update the state vector $y_{i+1|i+1}$ and its associated covariance matrix $\Sigma_{i+1|i+1}$ by the EKF using the measurements obtained in Step 4.
6. Go back to Step 2 for the next time step.

If no outlier is found in the raw measurements, the identification results of the proposed method are identical to those of the conventional EKF introduced in Chap. 2. If the raw measurements contain outliers, the proposed approach serves as a safeguard against their adverse impact on the system identification results.
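Steps 1 to 3 of the working stage can be combined into a per-channel detector. The sketch below is hypothetical and not the book's implementation: it assumes the EKF prediction $z^{(c)}_{i+1|i}$ is supplied by the caller, uses a `deque` with `maxlen` to realize Eq. (4.28), applies the recursion of Eq. (4.29) with $\sigma^{(c)}_{i+1}$ on the right-hand side, and leaves the EKF measurement update of Steps 4 and 5 out.

```python
import math
from collections import deque
from statistics import NormalDist

STD_NORMAL = NormalDist()


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(eta <= k) for eta ~ Binomial(n, p), accumulated in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_1mp, log_n_fact = math.log(p), math.log1p(-p), math.lgamma(n + 1)
    total = 0.0
    for eta in range(min(k, n) + 1):
        total += math.exp(log_n_fact - math.lgamma(eta + 1) - math.lgamma(n - eta + 1)
                          + eta * log_p + (n - eta) * log_1mp)
    return min(total, 1.0)


class ChannelOutlierDetector:
    """Working-stage Steps 1-3 for a single measurement channel (sketch)."""

    def __init__(self, sigma_init: float, n_win: int, eps_l: float = 2.0):
        if sigma_init <= 0.0 or n_win < 2:
            raise ValueError("need sigma_init > 0 and n_win >= 2")
        self.sigma = sigma_init                 # sigma_1^(c) from the training stage
        self.n_win = n_win
        self.eps_l = eps_l                      # Eq. (4.24)
        p = -0.5 * math.expm1(-math.log(2.0) / (n_win + 1))
        self.eps_u = -STD_NORMAL.inv_cdf(p)     # Eq. (4.27)
        # Step 1: empty reduced set; maxlen drops the earliest element when full
        self.window = deque(maxlen=n_win)

    def probability_of_outlier(self, eps: float) -> float:
        """Eq. (4.21) evaluated on the reduced set."""
        abs_eps = abs(eps)
        q = 2.0 * STD_NORMAL.cdf(-abs_eps)                      # Eq. (4.11)
        kappa = sum(1 for r in self.window if abs_eps <= r)     # Eq. (4.22)
        return binomial_cdf(kappa, len(self.window) + 1, q)

    def step(self, z_obs: float, z_pred: float) -> bool:
        """Classify z_obs given the EKF prediction z_pred; True means outlier."""
        residual = z_obs - z_pred
        eps = residual / self.sigma                             # Eq. (4.10)
        if abs(eps) <= self.eps_l:                              # Step 2(2)(a)
            outlier = False
        elif abs(eps) >= self.eps_u:                            # Step 2(2)(b)
            outlier = True
        else:                                                   # Step 2(2)(c)
            outlier = self.probability_of_outlier(eps) >= 0.5
        if not outlier:
            self.window.append(abs(eps))                        # Eq. (4.28)
            self.sigma = math.sqrt(((self.n_win - 2) * self.sigma ** 2
                                    + residual ** 2) / (self.n_win - 1))  # Eq. (4.29)
        return outlier
```

In a multi-channel setting, one detector per channel would be created after the training stage. The channels for which `step` returns `False` select the rows of $z_{i+1}$, $H_{i+1}$ and $\Sigma_n$ passed to the EKF update (Step 4); when every channel is flagged, the prediction is carried over unchanged (Step 5).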


4.5 Applications

4.5.1 Outlier Generation

To demonstrate the outlier removal capability of the proposed method, outliers are generated from the measurements of the time-varying dynamical systems, as described in this subsection. Two assumptions are made before the specific outlier generation technique is presented. First, all measurement channels of the sensor network are contaminated with outliers. Second, the inserted outliers are distributed randomly in the series, with all data points equally likely to be an outlier. The generation of outliers is controlled by two parameters: the outlier occurrence rate and the magnitude of the outliers. The outlier occurrence rate controls the percentage of outliers in the measurements of each channel, while the magnitude of the outliers controls the degree to which a data point deviates from its actual value. In the following applications, the outlier occurrence rate is taken as γ = 1% in the first application; in the second application, it is taken as a random variable uniformly distributed on the interval [0.5%, 1%] for each channel of the multi-channel measurements. The magnitude of the outliers is drawn from a uniform distribution on the interval $[s_{min}, s_{max}]$, where $s_{min}$ and $s_{max}$ are the minimum and maximum values of the corresponding noise-free response quantity, respectively. Their particular values are given in the subsequent sections. Note that this information is known and used only in the data generation process; both the outlier models and their parameter values are completely unknown throughout the entire identification process.
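The outlier generation described above can be sketched for one channel as follows. Two details are assumptions, since the text does not spell them out: the number of outliers is fixed at round(γN), which reproduces the 1200 outliers of the next example, and each outlier replaces the noisy sample rather than being added to it. The function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def inject_outliers(clean: Sequence[float], noisy: Sequence[float], rate: float,
                    seed: int = 0) -> Tuple[List[float], List[int]]:
    """Insert round(rate * N) outliers into one measurement channel.

    clean: noise-free response, used only to obtain s_min and s_max.
    noisy: measurement with regular noise, into which outliers are inserted.
    Assumption: each outlier replaces the noisy sample at its index.
    Returns the corrupted series and the sorted indices of the outliers.
    """
    if len(clean) != len(noisy):
        raise ValueError("clean and noisy series must have the same length")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    rng = random.Random(seed)
    s_min, s_max = min(clean), max(clean)
    n_out = round(rate * len(noisy))
    # Every sample is equally likely to be chosen; no index is chosen twice.
    idx = sorted(rng.sample(range(len(noisy)), n_out))
    corrupted = list(noisy)
    for k in idx:
        corrupted[k] = rng.uniform(s_min, s_max)   # magnitude ~ U[s_min, s_max]
    return corrupted, idx
```

For the multi-channel case, a per-channel rate could be drawn as `rng.uniform(0.005, 0.01)` before calling the function. Because outlier values are drawn from the range of the noise-free response, many of them lie inside the bulk of the data.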

4.5.2 Single-Degree-of-Freedom Oscillator

In the first application, a single-DOF oscillator is considered. The mass, stiffness and damping coefficients are m = 1 kg, k = 1200 N/m and c = 1.06 N·s/m, respectively. Two model parameters are introduced to parameterize the oscillator: the stiffness parameter $\theta_k$ and the damping parameter $\theta_c$. The entire monitoring period was 600 s, and the sampling time step was Δt = 0.005 s. The external excitation applied to the oscillator was a zero-mean stationary Gaussian white noise with spectral intensity $S_0 = 1.5 \times 10^{-6}$ m²/s³. The stiffness of the oscillator had two sudden decreases, of 5% and 10%, at t = 200 s and t = 400 s, respectively. The acceleration was measured with noise whose root mean square (rms) was 10% of the rms of the noise-free acceleration response of the oscillator. In addition, the measurement was polluted by outliers. Specifically, the outlier occurrence rate was γ = 1%, so the measurement contained 1200 outliers. These outliers were randomly inserted into the measurement, with all data points equally likely to be an outlier. The magnitude of the outliers varied and was drawn from a uniform distribution ranging from the minimum to the maximum noise-free acceleration response of the oscillator. Again, the outlier occurrence rate and the magnitude of the outliers were unknown throughout the identification process. Figure 4.14 shows the outlier-corrupted acceleration measurements, and the time interval [360, 380] s is magnified in Fig. 4.15.

Fig. 4.14 Outlier-corrupted acceleration measurements of the oscillator

Fig. 4.15 Outlier-corrupted acceleration measurements of the oscillator in the time interval [360, 380] s

To compare the performance of the conventional EKF and the proposed method, the estimation results obtained using the conventional EKF with the regular noisy measurements, i.e., the measurements with regular measurement noise but without outliers, are first presented in Fig. 4.16. The upper subplot of Fig. 4.16 shows the estimated stiffness parameter and the lower subplot shows the estimated damping parameter. The dotted lines represent the estimated values, the solid lines represent the actual values, and the dashed lines represent the bounds of the 95% credible interval; the same line styles are used in the subsequent figures. The model parameters estimated from the regular noisy measurements are close to the actual values and lie within the 95% credible intervals.

Then, the estimation results obtained using the conventional EKF with the outlier-corrupted measurements are presented. The upper and lower subplots of Fig. 4.17 show the estimates of the stiffness parameter $\theta_k$ and the damping parameter $\theta_c$, respectively. Because the measurements were contaminated with outliers, the estimates obtained by the conventional EKF fluctuated severely and deviated substantially from the actual values. In other words, the plain EKF failed both to track the parameters accurately and to detect the stiffness reductions. Figures 4.18 and 4.19 show the actual versus estimated values of the displacement and velocity responses of the oscillator, respectively, where the 45-degree lines provide the reference of perfect match. The deviations between the actual and estimated responses were large. Together with the structural parameter identification results, this shows that the conventional EKF is sensitive to anomalous data points in the measurements and cannot provide reliable estimation results in the presence of outliers.

Next, the stiffness and damping parameters estimated by the proposed method from the same set of outlier-corrupted measurements used for Fig. 4.17 are presented. The upper and lower subplots of Fig. 4.20 show the estimates of $\theta_k$ and $\theta_c$ obtained by the proposed method, respectively. The proposed method tracked the model parameters well despite the outliers in the observations, providing stable and accurate identification results. Moreover, the estimates in Fig. 4.20 are very similar to those obtained by the conventional EKF with regular noisy measurements in Fig. 4.16. This confirms that the proposed method is capable of simultaneous outlier removal and time-varying system identification.
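The bookkeeping in this example can be checked from the stated parameters. The short script below (an illustration, not from the book) computes the natural frequency and damping ratio implied by the nominal m, k and c, confirms the 120 000 samples and 1200 outliers, and converts the rules of thumb of Sects. 4.3.1 and 4.4.1 (about 100 fundamental periods for $N_{win}$, about ten periods for training) into sample counts. The window length actually used in this example is not stated in this section, so those figures are only indicative.

```python
import math

# Parameters quoted in Sect. 4.5.2 (nominal, before any stiffness reduction)
m, k, c = 1.0, 1200.0, 1.06        # kg, N/m, N*s/m
duration, dt = 600.0, 0.005        # s, s
gamma = 0.01                       # outlier occurrence rate

omega_n = math.sqrt(k / m)                     # natural circular frequency (rad/s)
f_n = omega_n / (2.0 * math.pi)                # natural frequency (Hz)
zeta = c / (2.0 * math.sqrt(k * m))            # damping ratio
n_samples = round(duration / dt)               # number of time steps
n_outliers = round(gamma * n_samples)          # number of inserted outliers
samples_per_period = 1.0 / (f_n * dt)          # samples per fundamental period

if __name__ == "__main__":
    print(f"f_n = {f_n:.3f} Hz, zeta = {zeta:.4f}")
    print(f"{n_samples} samples, {n_outliers} outliers")
    print(f"about 10 periods = {10 / f_n:.2f} s of training data")
    print(f"about 100 periods = {round(100 * samples_per_period)} samples")
```

The oscillator has a natural frequency of about 5.51 Hz and a damping ratio of about 1.5%, so ten fundamental periods of training correspond to roughly 1.8 s, and a window of 100 periods to roughly 3600 samples.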


Fig. 4.16 Estimation results of θk and θc by using the conventional EKF with regular noisy measurements

Fig. 4.17 Estimation results of θk and θc by using the conventional EKF with outlier-corrupted measurements

In addition, 50 independent simulation runs are used for a quantitative comparison between the conventional EKF and the proposed method with outlier-corrupted measurements. Two indicators, the root mean square error (RMSE) and the mean absolute error (MAE), are used to quantify the estimation error. Since the initial conditions of the 50 simulation runs were set randomly, the estimation results of the first 10 s of each run were excluded from the comparison. Table 4.1 shows the error statistics of the displacement $x$, velocity $\dot{x}$, stiffness parameter $\theta_k$ and damping parameter $\theta_c$ obtained by the conventional EKF using regular noisy measurements (without inserted outliers), the conventional EKF using outlier-corrupted measurements, and the proposed method using outlier-corrupted measurements. The second to fourth columns show the RMSE, and the fifth to seventh columns show the MAE. In the presence of outliers, the proposed method outperformed the conventional EKF by a large margin; for example, the RMSE of the displacement was 5.16E-06 for the proposed method versus 1.78E-04 for the conventional EKF. Moreover, both the RMSE and MAE of the proposed method were close to those of the conventional EKF with regular noisy measurements. These results confirm that the proposed method provides much more accurate estimates than the conventional EKF in the presence of outliers.

Finally, the outlier detection performance of the proposed method over these 50 simulation runs is presented in Table 4.2. Two indicators, masking and swamping, are introduced. Swamping refers to regular data points being classified as outliers, while masking refers to outliers being classified as regular data points. Accordingly, the masking rate is the average number of masked outliers divided by the actual number of outliers, and the swamping rate is the average number of swamped regular data points divided by the actual number of regular data points. Both rates lie within [0, 100%], and low values of both indicate a low level of false alarms and a high level of correct detection. As shown in Table 4.2, both rates are very close to zero (0.02% and 0.31%, respectively), which confirms the good performance of the proposed method for simultaneous real-time outlier detection and identification of time-varying dynamical systems.
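The four indicators used in this comparison can be written compactly. The sketch below computes RMSE and MAE for one estimated time history, and the masking and swamping rates for one run from sets of sample indices; averaging over the 50 runs and discarding the first 10 s are left to the caller. The function names are illustrative.

```python
import math
from typing import Sequence, Set, Tuple


def rmse(estimate: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between an estimated and an actual time history."""
    if len(estimate) != len(actual) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimate, actual)) / len(actual))


def mae(estimate: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between an estimated and an actual time history."""
    if len(estimate) != len(actual) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(e - a) for e, a in zip(estimate, actual)) / len(actual)


def masking_swamping(flagged: Set[int], true_outliers: Set[int],
                     n_points: int) -> Tuple[float, float]:
    """Masking and swamping rates (%) of one run, from sets of sample indices."""
    n_regular = n_points - len(true_outliers)
    if not true_outliers or n_regular <= 0:
        raise ValueError("need at least one outlier and one regular point")
    masked = len(true_outliers - flagged)     # outliers classified as regular
    swamped = len(flagged - true_outliers)    # regular points classified as outliers
    return 100.0 * masked / len(true_outliers), 100.0 * swamped / n_regular
```

For example, if a run contains outliers at indices {1, 2, 3, 4} among 100 samples and the detector flags {1, 2, 5}, the masking rate is 50% and the swamping rate is about 1.04%.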

Fig. 4.18 Actual versus estimated values of the displacement response by using the conventional EKF with outlier-corrupted measurements

Fig. 4.19 Actual versus estimated values of the velocity response by using the conventional EKF with outlier-corrupted measurements

Fig. 4.20 Estimation results of θk and θc by using the proposed method

Table 4.1 Comparison of error statistics (SDOF oscillator)

| Quantity | RMSE (conventional EKF, regular noisy measurements) | RMSE (conventional EKF, outlier-corrupted measurements) | RMSE (proposed method) | MAE (conventional EKF, regular noisy measurements) | MAE (conventional EKF, outlier-corrupted measurements) | MAE (proposed method) |
|---|---|---|---|---|---|---|
| x | 4.12E-06 | 1.78E-04 | 5.16E-06 | 3.25E-06 | 1.28E-04 | 3.60E-06 |
| ẋ | 3.26E-04 | 7.12E-03 | 3.72E-04 | 2.84E-04 | 5.04E-03 | 2.93E-04 |
| θk | 1.70E-02 | 1.46E-01 | 1.90E-02 | 1.00E-02 | 1.43E-01 | 1.20E-02 |
| θc | 1.93E-01 | 1.87E+00 | 2.46E-01 | 1.53E-01 | 1.77E+00 | 1.94E-01 |

Table 4.2 Outlier detection results of the proposed method (SDOF oscillator)

| Masking rate (%) | Swamping rate (%) |
|---|---|
| 0.02 | 0.31 |

4.5.3 Fourteen-Bay Truss

A 14-bay truss is considered in this example. The truss is simply supported at its two ends (i.e., the 1st and 28th nodes). Each horizontal and vertical member is 3 m long, and each member has a uniform circular hollow cross section with area 1.0 × 10⁻⁴ m². The mass density is 7860 kg/m³ and the modulus of elasticity is 2.0 × 10⁹ Pa. The resulting first five natural frequencies are 0.43, 1.45, 2.10, 2.89 and 4.44 Hz. The damping matrix is given by $C = \alpha M + \beta K$, where α = 0.083 s⁻¹ and β = 0.003 s, such that the damping ratios of the first two modes are approximately 2%. The members of the truss are divided into six groups, whose group numbers are marked in parentheses in Fig. 4.21. The stiffness matrix is thereby given as $K = \sum_{n=1}^{6} \theta_k^{(n)} K^{(n)}$, where $K^{(n)}$ is the stiffness contribution of the $n$th group. Therefore, the unknown model parameters consist of $\theta_k$ and $\theta_c = [\alpha, \beta]^T$. The truss was subjected to ground motion with spectral density 1.2 × 10⁻⁴ m²/s³ in the horizontal and vertical directions. The sampling time interval was Δt = 0.0025 s and the entire monitoring period was 300 s. The horizontal acceleration responses of the 13th, 14th, 17th, 22nd and 28th nodes and the vertical acceleration responses of the 2nd, 5th, 6th, 9th, 10th, 19th and 26th nodes were observed using twelve accelerometers. The measurement noise was taken as 10% of the rms of the corresponding noise-free response quantities. In addition, these measurements were contaminated with outliers. The outlier occurrence rate differed between measurement channels and was drawn from a uniform distribution on the interval [0.5%, 1%]. Moreover, the magnitude of the outliers in each measurement channel was drawn from a uniform distribution ranging from the minimum to the maximum noise-free response quantity of that channel. These variations allowed us to assess the capability of the proposed method to detect outliers of different magnitudes and different numbers of outliers. Structural damage was also imposed during the monitoring period.
In particular, the first damage, a 5% stiffness reduction, occurred in the elements of the first group at t = 100 s, and the second damage, also a 5% stiffness reduction, occurred in the elements of the sixth group at t = 200 s. Figure 4.22 shows the outlier-corrupted vertical acceleration measurements of the 6th node, and the time interval [150, 160] s is magnified in Fig. 4.23. Figure 4.24 shows the outlier-corrupted horizontal acceleration measurements of the 17th node, and the time interval [200, 210] s is magnified in Fig. 4.25. It is worth noting that these outliers do not necessarily fall outside the range of the bulk of the data. For comparison purposes, the estimation results obtained using the conventional EKF with regular noisy measurements (i.e., measurements without outliers) are presented first. Figures 4.26 and 4.27 show the estimated stiffness and damping parameters, respectively. Satisfactory agreement between the estimated and actual values could

Fig. 4.21 14-bay truss model


Fig. 4.22 Outlier-corrupted vertical acceleration measurements of the 6th node

Fig. 4.23 Outlier-corrupted vertical acceleration measurements of the 6th node in the time interval [150, 160] s


Fig. 4.24 Outlier-corrupted horizontal acceleration measurements of the 17th node

Fig. 4.25 Outlier-corrupted horizontal acceleration measurements of the 17th node in the time interval [200, 210] s


be observed. The sudden damages could be tracked with a reasonably small time delay. Then, the estimation results of the stiffness and damping parameters obtained using the conventional EKF with the outlier-corrupted measurements are presented in Figs. 4.28 and 4.29, respectively. The estimates deteriorated drastically and fluctuated strongly, which indicates that the conventional EKF is sensitive to anomalous data points. Therefore, removing the outliers from the measurements is necessary to obtain reliable structural parameter estimates. Next, we performed the identification with the proposed method using the same set of outlier-corrupted measurements as in Figs. 4.28 and 4.29. The estimation results of the stiffness and damping parameters are shown in Figs. 4.30 and 4.31, respectively. The estimated values were close to the actual values and the fluctuation was suppressed substantially, even though there were many outliers in all measurement channels. As a result, the two sudden drops of stiffness were successfully tracked. Comparing Figs. 4.26 and 4.30 shows that the results obtained by the proposed method with outlier-corrupted data are comparable with those obtained by the conventional EKF with regular noisy measurements. This verifies the outlier removal and model parameter tracking capability of the proposed method. In addition, a slight time delay can be observed in tracking the abrupt changes of $\theta_k^{(1)}$ and $\theta_k^{(6)}$ in Fig. 4.30. The reason is that the estimates were obtained using the observed data at the current

Fig. 4.26 Estimation results of $\boldsymbol{\theta}_k$ by using the conventional EKF with regular noisy measurements


Fig. 4.27 Estimation results of $\boldsymbol{\theta}_c$ by using the conventional EKF with regular noisy measurements

and previous time steps, so a time lag is to be expected. In summary, the proposed method is capable of removing outliers and tracking the unknown model parameters of time-varying dynamical systems. The actual versus estimated structural responses of some typical DOFs (a, b, c and d, depicted in Fig. 4.21) obtained by the proposed method are presented in Fig. 4.32. The 45-degree line in each subplot is the reference for a perfect match. The proposed method provided accurate estimates of the structural responses. In addition, a quantitative comparison among the conventional EKF using regular noisy measurements (without inserted outliers), the conventional EKF using outlier-corrupted measurements, and the proposed method using outlier-corrupted measurements is presented in Table 4.3. The estimation accuracy was evaluated in terms of the root-mean-square error (RMSE) and the mean absolute error (MAE) of representative estimated quantities over 50 independent simulation runs. The second to fourth columns show the RMSE values, and the fifth to seventh columns show the MAE values. Since the initial conditions of the 50 simulation runs were randomly assigned, the estimation results of the first 10 s of each simulation were excluded from the comparison. The proposed method yielded much smaller estimation errors than the conventional EKF in the presence of outliers. Moreover, the


Fig. 4.28 Estimation results of $\boldsymbol{\theta}_k$ by using the conventional EKF with outlier-corrupted measurements

estimation errors of the proposed method were comparable with those of the conventional EKF using regular noisy measurements. It can therefore be concluded that the proposed method overcomes the weakness of the conventional EKF when the observations are polluted with anomalous data points. Finally, the outlier removal performance of the proposed method is presented quantitatively in Table 4.4. The first column gives the measurement channel number and the second column gives the node at which the accelerometer is located. The third and fourth columns give the masking and swamping rates over 50 independent simulation runs, respectively. The proposed outlier detection method detected virtually all the outliers, with a very low level of swamping in all the measurement channels. In conclusion, the proposed method provided reliable outlier removal and system identification results in real time.
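The contamination scheme of the truss example and the masking and swamping rates of Table 4.4 can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function names are hypothetical, and it assumes the common definitions in which the masking rate is the percentage of true outliers that go undetected and the swamping rate is the percentage of regular data points wrongly flagged as outliers. The proposed probability-of-outlier detector itself is not reproduced; any boolean detection mask can be passed in.

```python
import random


def contaminate(clean, noisy, rng: random.Random, rate_range=(0.005, 0.01)):
    """Replace a random subset of one channel's samples with outliers.

    Assumed scheme, following the truss example: the occurrence rate is drawn
    from U[0.5%, 1%] and each outlier value from U[min(clean), max(clean)].
    Returns the contaminated record and the ground-truth outlier mask.
    """
    rate = rng.uniform(*rate_range)
    lo, hi = min(clean), max(clean)
    corrupted = list(noisy)
    is_outlier = [False] * len(noisy)
    n_out = round(rate * len(noisy))
    for idx in rng.sample(range(len(noisy)), n_out):
        corrupted[idx] = rng.uniform(lo, hi)
        is_outlier[idx] = True
    return corrupted, is_outlier


def masking_swamping(true_mask, detected_mask):
    """Return (masking rate %, swamping rate %) for one channel.

    masking  = undetected outliers / true outliers            (assumed definition)
    swamping = regular points flagged as outliers / regular points
    """
    n_out = sum(true_mask)
    n_reg = len(true_mask) - n_out
    missed = sum(t and not d for t, d in zip(true_mask, detected_mask))
    false_alarms = sum(d and not t for t, d in zip(true_mask, detected_mask))
    masking = 100.0 * missed / n_out if n_out else 0.0
    swamping = 100.0 * false_alarms / n_reg if n_reg else 0.0
    return masking, swamping
```

Passing the ground-truth mask as the detection mask gives (0.0, 0.0); in a study like Table 4.4 the two rates would be averaged over the independent simulation runs.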


Fig. 4.29 Estimation results of $\boldsymbol{\theta}_c$ by using the conventional EKF with outlier-corrupted measurements

4.6 Concluding Remarks

This chapter introduced an outlier detection approach for real-time system identification. The method incorporates the absolute normalized residual, the noise level and the size of the dataset, and it avoids a definite (yes/no) judgment criterion on the outlierness of a data point. Instead, it uses a binomial distribution model to construct the probability of a data point being an outlier. To enhance computational efficiency, two techniques were introduced, namely a moving time window and efficient screening criteria. The proposed approach was tested in real-time damage detection. The results demonstrated that the method handles different numbers and magnitudes of outliers well. Moreover, in contrast to the conventional EKF, which was severely degraded in the presence of outliers, the proposed method provided reliable and stable estimation results by accurately removing the anomalous data points in the measurements.


Fig. 4.30 Estimation results of $\boldsymbol{\theta}_k$ by using the proposed method

Fig. 4.31 Estimation results of $\boldsymbol{\theta}_c$ by using the proposed method


Fig. 4.32 Actual versus estimated structural responses by using the proposed method


Table 4.3 Comparison of error statistics

Quantity | RMSE, EKF (regular) | RMSE, EKF (outliers) | RMSE, proposed | MAE, EKF (regular) | MAE, EKF (outliers) | MAE, proposed
$x_a$ | 2.08E-05 | 4.60E-04 | 2.12E-05 | 1.64E-05 | 3.61E-04 | 1.67E-05
$x_b$ | 3.86E-06 | 7.98E-05 | 3.82E-06 | 2.89E-06 | 6.23E-05 | 2.89E-06
$x_c$ | 3.32E-05 | 7.29E-04 | 3.39E-05 | 2.61E-05 | 5.71E-04 | 2.66E-05
$x_d$ | 4.01E-06 | 8.30E-05 | 3.98E-06 | 2.99E-06 | 6.47E-05 | 3.02E-06
$\dot{x}_a$ | 7.05E-03 | 8.65E-02 | 6.68E-03 | 2.48E-03 | 4.90E-02 | 2.17E-03
$\dot{x}_b$ | 8.54E-03 | 1.25E-01 | 4.94E-03 | 4.85E-03 | 8.98E-02 | 3.84E-03
$\dot{x}_c$ | 1.22E-02 | 1.60E-01 | 5.38E-03 | 5.77E-03 | 1.05E-01 | 4.31E-03
$\dot{x}_d$ | 7.57E-03 | 1.16E-01 | 5.28E-03 | 4.98E-03 | 8.87E-02 | 4.26E-03
$\theta_k^{(1)}$ | 9.10E-03 | 1.39E-01 | 5.04E-03 | 4.86E-03 | 1.01E-01 | 4.02E-03
$\theta_k^{(2)}$ | 6.07E-03 | 9.43E-02 | 5.00E-03 | 3.48E-03 | 6.96E-02 | 3.09E-03
$\theta_k^{(3)}$ | 4.48E-03 | 9.31E-02 | 4.48E-03 | 3.37E-03 | 7.17E-02 | 3.42E-03
$\theta_k^{(4)}$ | 2.02E-05 | 4.02E-04 | 1.91E-05 | 1.58E-05 | 3.18E-04 | 1.51E-05
$\theta_k^{(5)}$ | 2.08E-05 | 4.60E-04 | 2.12E-05 | 1.64E-05 | 3.61E-04 | 1.67E-05
$\theta_k^{(6)}$ | 3.86E-06 | 7.98E-05 | 3.82E-06 | 2.89E-06 | 6.23E-05 | 2.89E-06
$\alpha$ | 3.32E-05 | 7.29E-04 | 3.39E-05 | 2.61E-05 | 5.71E-04 | 2.66E-05
$\beta$ | 4.01E-06 | 8.30E-05 | 3.98E-06 | 2.99E-06 | 6.47E-05 | 3.02E-06

EKF (regular): conventional EKF using regular noisy measurements; EKF (outliers): conventional EKF using outlier-corrupted measurements; proposed: proposed method using outlier-corrupted measurements.
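The error statistics of Table 4.3 can be computed along the lines of the sketch below. It is a minimal Python illustration under stated assumptions: each run supplies a time series of estimates and the corresponding actual values sampled at Δt = 0.0025 s, the first 10 s are discarded as described in the text, and the errors are pooled over all retained samples of all runs. The exact averaging used by the authors is not stated here, so averaging per-run statistics would be an equally plausible alternative; the function name is hypothetical.

```python
import math


def error_stats(runs, dt=0.0025, burn_in=10.0):
    """Pooled RMSE and MAE over independent simulation runs.

    runs: list of (estimated, actual) pairs of equal-length sequences sampled
    every dt seconds. Samples before burn_in seconds are dropped because the
    initial conditions of each run are randomly assigned.
    """
    skip = int(round(burn_in / dt))
    sq_sum = abs_sum = 0.0
    count = 0
    for est, act in runs:
        for e, a in zip(est[skip:], act[skip:]):
            err = e - a
            sq_sum += err * err
            abs_sum += abs(err)
            count += 1
    if count == 0:
        raise ValueError("no samples left after the burn-in period")
    return math.sqrt(sq_sum / count), abs_sum / count
```

For the truss example, each of the 50 runs would contribute its 300 s record of one quantity (for instance $\theta_k^{(1)}$), and the call would be repeated for every row of the table.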

Table 4.4 Outlier detection results of the proposed method

Measurement channel | Location (node) | Masking rate (%) | Swamping rate (%)
1 | 2 | 0.00 | 0.10
2 | 5 | 0.00 | 0.11
3 | 6 | 0.00 | 0.08
4 | 9 | 0.00 | 0.13
5 | 10 | 0.00 | 0.11
6 | 13 | 0.00 | 0.07
7 | 14 | 0.01 | 0.14
8 | 17 | 0.00 | 0.10
9 | 19 | 0.00 | 0.11
10 | 22 | 0.00 | 0.09
11 | 26 | 0.00 | 0.07
12 | 28 | 0.00 | 0.09


References

Dong Y, Hopkins S, Li J (2019) Quantum entropy scoring for fast robust mean estimation and improved outlier detection. Adv Neur In 32
Hawkins DM (1980) Identification of outliers. Chapman and Hall, London
Japkowicz N, Myers C, Gluck M (1995) A novelty detection approach to classification. In: Proceedings of the 14th international conference on artificial intelligence, vol 1, pp 518–523
Krishnamachari B, Iyengar S (2004) Distributed Bayesian algorithms for fault-tolerant event region detection in wireless sensor networks. IEEE Trans Comput 53(3):241–250
Leys C, Ley C, Klein O, Bernard P, Licata L (2013) Detecting outliers: do not use standard deviation around the mean, use absolute deviation around the median. J Exp Soc Psychol 49(4):764–766
Li X, Bowers CP, Schnier T (2009) Classification of energy consumption in buildings with outlier detection. IEEE Trans Ind Electron 57(11):3639–3644
Luo X, Dong M, Huang Y (2005) On distributed fault-tolerant detection in wireless sensor networks. IEEE Trans Comput 55(1):58–70
Mu HQ, Yuen KV (2015) Novel outlier-resistant extended Kalman filter for robust online structural identification. J Eng Mech 141(1):04014100
Myers K, Tapley B (1976) Adaptive sequential estimation with unknown noise statistics. IEEE Trans Automat Control 21(4):520–523
Ottosen TB, Kumar P (2019) Outlier detection and gap filling methodologies for low-cost air quality measurements. Environ Sci-Proc Imp 21(4):701–713
Punzo A, Mazza A, Maruotti A (2018) Fitting insurance and economic data with outliers: a flexible approach based on finite mixtures of contaminated gamma distributions. J Appl Stat 45(14):2563–2584
Quiroga RQ, Nadasdy Z, Ben-Shaul Y (2004) Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering. Neural Comput 16(8):1661–1687
Rousseeuw PJ, Hubert M (2018) Anomaly detection by robust statistics. Wires Data Min Knowl Discov 8(2):e1236
Thaprasop P, Zhou K, Steinheimer J, Herold C (2021) Unsupervised outlier detection in heavy-ion collisions. Phys Scripta 96(6):064003
Zhang J, Zulkernine M, Haque A (2008) Random-forests-based network intrusion detection systems. IEEE Trans Syst Man Cybernet C 38(5):649–659
Zhang Y, Meratnia N, Havinga P (2010) Outlier detection techniques for wireless sensor networks: a survey. IEEE Commun Surv Tutor 12(2):159–170

Chapter 5

Bayesian Model Class Selection and Self-Calibratable Model Classes for Real-Time System Identification

Abstract This chapter introduces Bayesian model class selection for real-time system identification and an adaptive reconfiguring strategy for the model classes. In addition to parametric estimation, another critical problem in system identification is to determine a suitable model class for describing the underlying dynamical system. By using Bayes' theorem to obtain the plausibilities of a set of model classes, model class selection can be performed accordingly. The proposed method provides simultaneous model class selection and parametric identification in real time. However, although Bayesian model class selection allows the most suitable model class to be determined among a set of prescribed candidates, it does not guarantee that a good model class will be selected: it is possible that all the prescribed candidates are inadequate. Thus, a new, third level of system identification is presented to resolve this problem by using self-calibratable model classes. This self-calibrating strategy can correct the deficiencies of the model classes and achieve reliable real-time identification results for time-varying dynamical systems. Furthermore, a large number of prescribed model class candidates hampers the performance of real-time system identification. To resolve this problem, a hierarchical strategy is proposed; it requires only a small number of model classes, yet a large solution space can be explored. Although the algorithms presented in this chapter are based on the EKF, the real-time model class selection component and the adaptive reconfiguring strategy can be easily embedded into other filtering tools.

Keywords Plausibility · Evidence · Bayesian inference · Real-time identification · Self-calibration · Hierarchical model classes

5.1 Introduction

In Chaps. 2, 3 and 4, the problem of real-time parametric identification was considered for a prescribed model class with uncertain parameters. The determination of an appropriate model class for real-time system identification was not addressed in those chapters. Model class selection concerns

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 K. Huang and K.-V. Yuen, Bayesian Real-Time System Identification, https://doi.org/10.1007/978-981-99-0593-5_5


the problem of selecting a suitable class of models for parametric identification. Given a set of model class candidates, it is intuitive that a more complicated model class with extra adjustable uncertain parameters always fits the data at least as well as a simpler one with fewer adjustable parameters. As a result, if the selection criterion is based on minimizing the prediction errors between the output data and the corresponding predictions of the optimal model in each model class, the optimal model class will always be the most complicated one. For instance, in modal identification, a twenty-mode model would always be preferred over a ten-mode model because the former fits the data better, even though the improvement in data fit might be negligible (Beck and Yuen 2004). Therefore, it is insufficient to consider only the data-fitting capability for model class selection, because doing so often leads to over-fitted model classes. When an over-fitted model class is used for future prediction, it is particularly prone to giving poor results. The reason is that the selected model depends too heavily on the details of the data, so that measurement noise and modeling error play a substantial role in the data fitting. As a result, in model class selection it is necessary to penalize model classes with complicated parameterizations. This issue was first pointed out by Harold Jeffreys, who did pioneering work on the application of Bayesian methods (Jeffreys 1961). He pointed out that it is necessary to construct a quantitative expression of the well-known Ockham's razor (Sivia 1996). Ockham's razor, also known as the principle of parsimony, is a problem-solving principle stating that entities should not be multiplied beyond necessity (Schaffer 2015). It is generally understood to mean that, among competing theories or explanations, the simpler one is to be preferred.
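The claim that a model class with more adjustable parameters never fits worse can be checked numerically. The sketch below is a hypothetical illustration, not an example from the book: it fits nested "K-mode" models (the first K harmonic sine/cosine pairs) by least squares to a noisy signal that truly contains two harmonics. Because the discrete harmonics are orthogonal on a uniform grid, each least-squares coefficient is a simple projection, and the residual sum of squares (RSS) can only decrease as modes are added, so a criterion based on data fit alone always favors the largest model.

```python
import math
import random


def residual_sums(y, max_modes):
    """RSS of least-squares fits using the first K harmonic sine/cosine
    pairs, for K = 0, 1, ..., max_modes.

    Assumes uniform sampling over one full record and max_modes < len(y) / 2,
    so that all sine/cosine columns are mutually orthogonal and each
    least-squares coefficient is an independent projection.
    """
    n = len(y)
    rss = [sum(v * v for v in y)]  # K = 0: zero model, residual is the data
    for k in range(1, max_modes + 1):
        w = 2.0 * math.pi * k / n
        a = 2.0 / n * sum(v * math.cos(w * t) for t, v in enumerate(y))
        b = 2.0 / n * sum(v * math.sin(w * t) for t, v in enumerate(y))
        # The k-th pair removes (n/2)(a^2 + b^2) >= 0 from the residual energy,
        # so the RSS can never increase when a mode is added.
        rss.append(rss[-1] - 0.5 * n * (a * a + b * b))
    return rss


# Hypothetical data: two harmonics plus Gaussian noise (standard deviation 0.1).
rng = random.Random(42)
N = 400
y = [
    math.sin(2 * math.pi * t / N)
    + 0.5 * math.cos(4 * math.pi * t / N)
    + rng.gauss(0.0, 0.1)
    for t in range(N)
]
rss = residual_sums(y, 20)
for K in (0, 1, 2, 10, 20):
    print(f"{K:2d}-mode model: RSS = {rss[K]:.4f}")
```

Beyond K = 2 the RSS keeps shrinking only because the extra modes absorb noise, which is exactly the over-fitting described above.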
Most, if not all, model selection methods are constructed on the basis of an information criterion, in which the data are used to assign each model class candidate a score. A complete ranking of all the candidates, from the best to the worst, can then be obtained. Well-known model class selection methods include the Akaike information criterion (AIC) (Akaike 1974), the Bayesian information criterion (BIC) (Schwarz 1978), the deviance information criterion (DIC) (Spiegelhalter et al. 2002) and the minimum description length (MDL) (Rissanen 1978). The AIC addresses the tradeoff between the goodness of fit and the simplicity of the model class, so it considers the risks of both underfitting and overfitting. The objective function of the AIC is given as follows (Akaike 1974):

$$J_{\mathrm{AIC}}\left(C^{(m)} \mid \mathcal{D}\right) = -2\ln p\left(\mathcal{D} \mid \boldsymbol{\theta}^{(m)}, C^{(m)}\right) + 2N^{(m)}, \quad m = 1, 2, \ldots, N_m \tag{5.1}$$

where $C^{(m)}$ denotes the $m$th model class among a set of $N_m$ model class candidates $C^{(1)}, C^{(2)}, \ldots, C^{(N_m)}$; $\mathcal{D}$ is the input–output or output-only data; $\boldsymbol{\theta}^{(m)}$ is the uncertain parameter vector, which depends on the model class $C^{(m)}$; $\ln p(\mathcal{D} \mid \boldsymbol{\theta}^{(m)}, C^{(m)})$ is the log-likelihood function of the model class $C^{(m)}$, evaluated at the maximum likelihood estimate of $\boldsymbol{\theta}^{(m)}$; and $N^{(m)}$ is the number of adjustable parameters in model class $C^{(m)}$. The best model class is the one that minimizes the objective function in Eq. (5.1). The second term on the right-hand side of Eq. (5.1) serves as a penalty that punishes model classes for being too complicated, in the sense of containing too many adjustable parameters. Note that when the


number of data points is large, the first term on the right-hand side of Eq. (5.1) dominates. Later, Schwarz (1978) and Akaike (1978) developed the BIC, which is given as follows:

$$J_{\mathrm{BIC}}\left(C^{(m)} \mid \mathcal{D}\right) = -2\ln p\left(\mathcal{D} \mid \boldsymbol{\theta}^{(m)}, C^{(m)}\right) + N^{(m)} \ln N, \quad m = 1, 2, \ldots, N_m \tag{5.2}$$

where $N$ is the sample size of the dataset $\mathcal{D}$. The model class that minimizes the objective function in Eq. (5.2) is preferred. Although the penalty term in the BIC increases with the number of data points $N$, the data-fit term typically grows faster (roughly in proportion to $N$), so a more complicated model class may be selected by the BIC when more data are acquired. This is because overfitting is less likely with more data. The DIC is a hierarchical modeling generalization of the AIC (Spiegelhalter et al. 2002). Analogously to the AIC, the DIC criterion is defined as follows:

$$J_{\mathrm{DIC}}\left(C^{(m)} \mid \mathcal{D}\right) = -2\ln p\left(\mathcal{D} \mid \bar{\boldsymbol{\theta}}^{(m)}, C^{(m)}\right) + 2N_e, \quad m = 1, 2, \ldots, N_m \tag{5.3}$$

where $\bar{\boldsymbol{\theta}}^{(m)} \equiv \mathbb{E}\left[\boldsymbol{\theta}^{(m)} \mid \mathcal{D}\right]$ and $\mathbb{E}[\cdot]$ denotes mathematical expectation; $N_e$ is the effective number of adjustable parameters in model class $C^{(m)}$, given by:

$$N_e = \mathbb{E}\left[-2\ln p\left(\mathcal{D} \mid \boldsymbol{\theta}^{(m)}, C^{(m)}\right)\right] + 2\ln p\left(\mathcal{D} \mid \bar{\boldsymbol{\theta}}^{(m)}, C^{(m)}\right), \quad m = 1, 2, \ldots, N_m \tag{5.4}$$

where the expectation is taken with respect to the posterior distribution of $\boldsymbol{\theta}^{(m)}$ given $\mathcal{D}$. The factor $N_e$ can be interpreted as the mean deviance minus the deviance evaluated at the posterior mean (Spiegelhalter et al. 2002), and it acts as a penalty term for model complexity. The most suitable model class according to the DIC is the one that minimizes the objective function in Eq. (5.3). The MDL principle was developed by Rissanen (1978) on the basis of algorithmic coding theory in computer science. It regards both data and model classes as codes. The crux of the MDL principle is to encode the dataset with the help of a model class: the more the data are compressed by extracting redundancy from them, the more of the underlying regularities in the data are uncovered (Grünwald et al. 2005). The code length is used to evaluate the generalization capability of a model class candidate, and the model class that achieves the shortest description of the data is regarded as the best one. The objective function of the MDL principle is given by (Rissanen 1978):

$$J_{\mathrm{MDL}}\left(C^{(m)} \mid \mathcal{D}\right) = L\left(\mathcal{D} \mid C^{(m)}\right) + L\left(C^{(m)}\right) \tag{5.5}$$

where $L(C^{(m)})$ is the length, in bits, of the description of the model class candidate $C^{(m)}$, and $L(\mathcal{D} \mid C^{(m)})$ is the length, in bits, of the description of the data encoded with the help of the model class $C^{(m)}$. The optimal model class is the one that minimizes


the objective function in Eq. (5.5). The MDL principle thus still involves a tradeoff between two terms, i.e., a goodness-of-fit term $L(\mathcal{D} \mid C^{(m)})$ and a penalty term $L(C^{(m)})$ that punishes complicated model classes (Grünwald 2007). In some special cases, the resulting selection mechanism in Eq. (5.5) coincides with the BIC in Eq. (5.2) (Claeskens and Hjort 2008). The main idea of conventional model class selection approaches is to introduce ad hoc penalty terms to suppress complicated model classes. However, in some circumstances, such as when samples are insufficient, ad hoc penalty terms cannot provide reliable model class selection results. The Bayesian approach to model class selection has been investigated by showing that the evidence for each model class candidate provided by the data leads to a quantitative expression of the principle of parsimony, or Ockham's razor (Beck and Yuen 2004; Yuen 2010b). Based on Bayes' theorem, the model class selection criterion is rigorously derived without introducing any ad hoc penalty term, and the Bayesian model class selection approach explicitly establishes a tradeoff between the data-fitting capability of a model class and its information-theoretic complexity. Beck and Yuen (2004) developed the Bayesian model class selection approach for the globally identifiable case. Afterwards, this method was extended to general cases in which the model may be unidentifiable (Ching et al. 2005). Bayesian model class selection provides a powerful and rigorous approach for model updating and system identification, and it has been used in many different disciplines (Yuen 2010a; Ley and Steel 2012; Mark et al. 2018). Although Bayesian model class selection has been applied in a wide range of applications, critical issues remain to be resolved. Conventional model class selection methods are conducted offline; in other words, the model class candidates are evaluated after the entire dataset is obtained.
However, the behavior of the underlying dynamical system can be time-varying, e.g., due to damage, and offline model class assessment cannot adapt to substantial changes in time-varying dynamical systems. In addition, there is no guarantee of good results when none of the model class candidates is suitable. In other words, if there is no suitable model class candidate, parameter estimation based on the selected model class will be unreliable. Furthermore, assessing a large number of model class candidates induces a heavy computational burden. For example, consider a model class with 10 uncertain parameters to be determined. Each term associated with these ten parameters can be either included or excluded, so $2^{10} - 1 = 1023$ model class candidates must be evaluated. Practical problems typically involve more uncertain parameters in the underlying systems and require a finer discretization. Therefore, an exhaustive assessment of all possible model class candidates would be required. The situation is even more severe in real-time applications, where all model class candidates are evaluated at each time step. The large number of model class candidates hampers the ability of the methods to track the time-varying behavior of dynamical systems, especially large ones.
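The penalized criteria of Eqs. (5.1) to (5.4) are straightforward to evaluate once the log-likelihood is available. The sketch below is a minimal Python illustration with hypothetical function names. For the DIC it assumes the simplest possible case, a Gaussian model with known noise standard deviation σ and unknown mean μ under a flat prior, so that the posterior of μ is N(ȳ, σ²/n) and can be sampled directly by Monte Carlo; this toy model is chosen only because its answer is known.

```python
import math
import random


def aic(max_log_lik, n_params):
    """Eq. (5.1): -2 ln p(D | theta^(m), C^(m)) + 2 N^(m)."""
    return -2.0 * max_log_lik + 2.0 * n_params


def bic(max_log_lik, n_params, n_data):
    """Eq. (5.2): -2 ln p(D | theta^(m), C^(m)) + N^(m) ln N."""
    return -2.0 * max_log_lik + n_params * math.log(n_data)


def gauss_log_lik(data, mu, sigma):
    """Log-likelihood of i.i.d. data under N(mu, sigma^2)."""
    sq = sum((d - mu) ** 2 for d in data)
    return -0.5 * len(data) * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * sq / sigma ** 2


def dic_gaussian_mean(data, sigma, n_samples=20_000, seed=0):
    """DIC of Eqs. (5.3)-(5.4) for y_i ~ N(mu, sigma^2) with sigma known and a
    flat prior on mu, so the posterior mu | D ~ N(ybar, sigma^2 / n) can be
    sampled exactly. Returns (J_DIC, N_e)."""
    rng = random.Random(seed)
    n = len(data)
    ybar = sum(data) / n
    samples = [rng.gauss(ybar, sigma / math.sqrt(n)) for _ in range(n_samples)]
    mean_deviance = sum(-2.0 * gauss_log_lik(data, m, sigma) for m in samples) / n_samples
    theta_bar = sum(samples) / n_samples          # posterior mean of mu
    deviance_at_mean = -2.0 * gauss_log_lik(data, theta_bar, sigma)
    n_eff = mean_deviance - deviance_at_mean      # Eq. (5.4)
    return deviance_at_mean + 2.0 * n_eff, n_eff  # Eq. (5.3)
```

For this one-parameter model the effective number of parameters $N_e$ should come out close to 1, matching the single adjustable parameter that the AIC would count; for a dynamical model class the log-likelihood would instead come from the identification procedure.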


Damage detection can benefit from model class selection. The damage level of a substructure can be reflected through the reduction of the corresponding stiffness parameter. The stiffness matrix $\mathbf{K}$ of the underlying structure can be represented as a linear combination of $N_k$ substructural stiffness matrices:

$$\mathbf{K}(\boldsymbol{\theta}_k) = \sum_{s=1}^{N_k} \theta_{k,s} \mathbf{K}_s \tag{5.6}$$

where $\mathbf{K}_s$, $s = 1, 2, \ldots, N_k$, is the prescribed stiffness matrix of the $s$th substructure; $\theta_{k,s}$, $s = 1, 2, \ldots, N_k$, is the corresponding stiffness parameter; and the vector $\boldsymbol{\theta}_k$ is defined to include all the stiffness parameters:

$$\boldsymbol{\theta}_k \equiv \left[\theta_{k,1}, \theta_{k,2}, \ldots, \theta_{k,N_k}\right]^T \in \mathbb{R}^{N_k} \tag{5.7}$$

The term $\theta_{k,s}\mathbf{K}_s$, $s = 1, 2, \ldots, N_k$, represents the stiffness contribution of the $s$th substructure to the global stiffness matrix $\mathbf{K}$, and the value of $\theta_{k,s}$ can be identified using observations of the structure. A reduction of $\theta_{k,s}$ indicates damage in the $s$th substructure. For example, for a two-story building, the stiffness matrix can be given as follows:

$$\mathbf{K}(\boldsymbol{\theta}_k) = \theta_{k,1}\mathbf{K}_1 + \theta_{k,2}\mathbf{K}_2 \tag{5.8}$$

where $\mathbf{K}_1$ and $\mathbf{K}_2$ are the stiffness matrices of the first and second stories, respectively, given by:

$$\mathbf{K}_1 = \begin{bmatrix} k_1 & 0 \\ 0 & 0 \end{bmatrix} \tag{5.9}$$

$$\mathbf{K}_2 = \begin{bmatrix} k_2 & -k_2 \\ -k_2 & k_2 \end{bmatrix} \tag{5.10}$$

where $k_1$ and $k_2$ are the stiffnesses of the first and second stories, respectively. Then, a 5% reduction of $\theta_{k,1}$ indicates a 5% stiffness loss in the first story. This model-based parameterization is straightforward and widely used for system identification. However, the success of conventional model-based approaches relies heavily on an appropriate parameterization in the prescribed model class, and an inappropriate parameterization will induce misleading results. On one hand, it is desirable to partition the structure into more substructures, i.e., to use a large $N_k$, since smaller portions of the structure will then have their own designated stiffness parameters. As a result, damage might be detectable at a finer structural level. However, if there are too many substructures and hence too many uncertain parameters, the identification results will fluctuate strongly or the parameters may even be unidentifiable. On the other hand, if $N_k$ is too small, each stiffness parameter will represent a large portion of the structure.


As a result, minor damage will be diluted and may not be detectable. Furthermore, even when the damage causes a noticeable stiffness reduction of the corresponding substructure, the damage location(s) cannot be determined. Moreover, there will be large modeling errors in the description of the substructural stiffness matrices. For instance, consider a building. A model class with one stiffness parameter for each story is sufficient to distinguish stiffness differences down to the story level. However, unless there is a large number of sensors, large estimation uncertainty or even unidentifiability is inevitable. On the other hand, with a model class that bundles a number of stories into one substructure, the identification results will fluctuate less because there are fewer uncertain parameters. However, this parameterization induces additional modeling errors, because the stiffness ratios between different stories in the same substructure are fixed by the prescribed substructural stiffness matrices. For the same reason, damage to an individual story within a substructure cannot be localized: if an individual story is damaged, the stiffness of the entire substructure decreases (in a diluted manner), but it is impossible to determine which story in the substructure is damaged. Moreover, if the substructure is too large, damage in an individual story is insufficient to induce a noticeable stiffness reduction of the corresponding substructure. Therefore, it is crucially important but nontrivial to establish a model class with suitable complexity for identification purposes. This is especially challenging for real-time identification when multiple damages are possible during the monitoring period, because the most suitable model class then varies over time.
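The substructure parameterization of Eqs. (5.6) to (5.10) can be illustrated with the two-story example. The sketch below is a minimal pure-Python illustration; the unit story stiffnesses and unit floor masses are assumptions chosen only for the demonstration, and the function names are hypothetical. It assembles $\mathbf{K}(\boldsymbol{\theta}_k) = \theta_{k,1}\mathbf{K}_1 + \theta_{k,2}\mathbf{K}_2$ and computes the natural frequencies of the corresponding two-DOF shear building, showing that a 5% reduction of $\theta_{k,1}$ lowers both frequencies.

```python
import math


def assemble_stiffness(theta, substructure_mats):
    """Eq. (5.6): K(theta) = sum_s theta_s * K_s for square matrices (nested lists)."""
    n = len(substructure_mats[0])
    K = [[0.0] * n for _ in range(n)]
    for th, Ks in zip(theta, substructure_mats):
        for r in range(n):
            for c in range(n):
                K[r][c] += th * Ks[r][c]
    return K


def two_dof_frequencies(K, m1=1.0, m2=1.0):
    """Natural frequencies (rad/s) of a 2-DOF system with mass matrix
    diag(m1, m2), from det(K - w^2 M) = 0 solved as a quadratic in w^2."""
    a = m1 * m2
    b = -(K[0][0] * m2 + K[1][1] * m1)
    c = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    disc = math.sqrt(b * b - 4.0 * a * c)
    lam = sorted(((-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)))
    return [math.sqrt(v) for v in lam]


k1, k2 = 1.0, 1.0                      # hypothetical story stiffnesses
K1 = [[k1, 0.0], [0.0, 0.0]]           # Eq. (5.9)
K2 = [[k2, -k2], [-k2, k2]]            # Eq. (5.10)

K_intact = assemble_stiffness([1.0, 1.0], [K1, K2])
K_damaged = assemble_stiffness([0.95, 1.0], [K1, K2])  # 5% loss in story 1
print(two_dof_frequencies(K_intact), two_dof_frequencies(K_damaged))
```

With unit values the intact frequencies are $(\sqrt{5} \mp 1)/2$ rad/s. Bundling both stories into one substructure would amount to a single parameter multiplying $\mathbf{K}_1 + \mathbf{K}_2$, which fixes the stiffness ratio between the two stories, as discussed above.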
In this chapter, an efficient real-time identification scheme combining Bayesian model class selection and parametric identification is introduced. In Sect. 5.2, Bayesian real-time model class selection is introduced for the evaluation and selection of model classes. In Sect. 5.3, real-time system identification for simultaneous model class selection and parametric identification is introduced based on prescribed model classes. Then, Sect. 5.4 presents the novel third level of system identification, namely system identification using self-calibratable model classes. In Sect. 5.5, real-time system identification using hierarchical interhealable model classes is introduced. Illustrative examples of damage detection using prescribed model classes, self-calibratable model classes and hierarchical interhealable model classes are presented in Sect. 5.6. Finally, concluding remarks are given in Sect. 5.7.

5.2 Bayesian Real-Time Model Class Selection

Let $\mathcal{D}$ denote the input–output or output-only data from a dynamical system. Given a set of $N_m$ model class candidates $C^{(1)}, C^{(2)}, \ldots, C^{(N_m)}$, each model class represents a particular parameterization of the unknown model parameters of the underlying dynamical system. The goal of model class selection is to use $\mathcal{D}$ to determine the most plausible or suitable class of models describing the system among the given model classes $C^{(1)}, C^{(2)}, \ldots, C^{(N_m)}$ (Yuen 2010a). Furthermore, real-time


model class selection aims to evaluate the model class candidates continuously as the sampled observations are obtained. The dataset $\mathcal{D}_{i+1}$ is obtained by grouping the first $i + 1$ data points:

$$\mathcal{D}_{i+1} = \left\{\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_i, \mathbf{z}_{i+1}\right\} \tag{5.11}$$

where $\mathbf{z}_{i+1}$ is the measurement at the $(i + 1)$th time step. A set of $N_m$ model class candidates at the $(i + 1)$th time step is considered:

$$\mathbf{C}_{i+1} = \left\{C_{i+1}^{(1)}, C_{i+1}^{(2)}, \ldots, C_{i+1}^{(N_m)}\right\} \tag{5.12}$$

where $C_{i+1}^{(m)}$, $m = 1, 2, \ldots, N_m$, is the $m$th model class candidate at the $(i + 1)$th time step. Since probability may be interpreted as a measure of plausibility based on specified information (Cox 1961), the probability of a class of models conditional on the dataset is required. By Bayes' theorem, the plausibility of a class of models conditional on $\mathcal{D}_{i+1}$ can be obtained as (Yuen and Mu 2015):

$$P\left(C_{i+1}^{(m)} \mid \mathcal{D}_{i+1}\right) = \frac{p\left(\mathbf{z}_{i+1} \mid \mathcal{D}_i, C_{i+1}^{(m)}\right) P\left(C_{i+1}^{(m)} \mid \mathcal{D}_i\right)}{p\left(\mathbf{z}_{i+1} \mid \mathcal{D}_i\right)}, \quad m = 1, 2, \ldots, N_m \tag{5.13}$$

where the denominator is the normalizing constant given by the law of total probability:

$$p\left(\mathbf{z}_{i+1} \mid \mathcal{D}_i\right) = \sum_{m=1}^{N_m} P\left(C_{i+1}^{(m)} \mid \mathcal{D}_i\right) p\left(\mathbf{z}_{i+1} \mid \mathcal{D}_i, C_{i+1}^{(m)}\right) \tag{5.14}$$

and $P(C_{i+1}^{(m)} \mid \mathcal{D}_i)$ is the prior plausibility of the $m$th model class based on the dataset up to the $i$th time step, given by:

$$P\left(C_{i+1}^{(m)} \mid \mathcal{D}_i\right) = P\left(C_i^{(m)} \mid \mathcal{D}_i\right), \quad m = 1, 2, \ldots, N_m \tag{5.15}$$

Equation (5.15) indicates that the prior plausibility of a model class is equal to its posterior plausibility at the previous time step. In addition, the prior plausibilities of all model class candidates based on $\mathcal{D}_i$ are normalized as follows:

$$\sum_{m=1}^{N_m} P\left(C_{i+1}^{(m)} \mid \mathcal{D}_i\right) = 1 \tag{5.16}$$

When no observation is available at the first time step, uniform prior plausibilities are used:

$$P\left(C_1^{(m)} \mid \mathcal{D}_0\right) = 1/N_m, \quad m = 1, 2, \ldots, N_m \tag{5.17}$$


In other words, there is no preference for any model class before the first measurement is acquired. The factor $p\big(\mathbf{z}_{i+1} | D_i, C_{i+1}^{(m)}\big)$ in Eq. (5.13) is the conditional evidence of model class $C_{i+1}^{(m)}$ based on the dataset $D_i$. It reflects how likely the data are to be obtained if the model class $C_{i+1}^{(m)}$ is assumed, and it plays an important role in model class selection. The plausibility $P\big(C_{i+1}^{(m)} | D_{i+1}\big)$ represents the relative plausibility of $C_{i+1}^{(m)}$ within the proposed set of $N_m$ model class candidates, conditional on the dataset up to the $(i+1)$th time step. Bayes' theorem is thus applied at the model class level to a set of model class candidates chosen to represent the underlying dynamical system (Beck 2010). In Bayesian real-time model class selection, the model class candidates are ranked according to the plausibilities $P\big(C_{i+1}^{(m)} | D_{i+1}\big)$, $m = 1, 2, \ldots, N_m$. The most plausible class of models representing the system is the one that gives the largest value of this quantity. The Bayesian real-time model class selection approach provides a rational and adaptive evaluation of the model class candidates. Moreover, robust parametric identification results can be achieved according to the plausibilities, as will be demonstrated in the following section. In practice, however, direct calculation of the plausibilities may cause numerical problems. For example, the conditional evidence may be extremely small in magnitude, so evaluating Eq. (5.13) directly can underflow. To resolve this problem, the log-evidence and the log-prior plausibility are used instead:

$$\ln p\big(\mathbf{z}_{i+1} | D_i, C_{i+1}^{(1)}\big),\ \ln p\big(\mathbf{z}_{i+1} | D_i, C_{i+1}^{(2)}\big),\ \ldots,\ \ln p\big(\mathbf{z}_{i+1} | D_i, C_{i+1}^{(N_m)}\big) \tag{5.18}$$

$$\ln P\big(C_{i+1}^{(1)} | D_i\big),\ \ln P\big(C_{i+1}^{(2)} | D_i\big),\ \ldots,\ \ln P\big(C_{i+1}^{(N_m)} | D_i\big) \tag{5.19}$$

Instead of exponentiating the sum of the log-evidence and log-prior plausibility of each model class and then normalizing, the maximum of these sums over all model classes is first subtracted from the sum of each model class, and the exponential is taken afterwards. Because this common shift cancels in the normalization, it does not affect the relative plausibilities among the model classes; moreover, the largest term becomes $\exp(0) = 1$, so the denominator can no longer underflow to zero. As a result, the plausibility of a model class can be obtained as follows:

$$P\big(C_{i+1}^{(m)} | D_{i+1}\big) = \frac{\exp\Big[\ln p\big(\mathbf{z}_{i+1} | D_i, C_{i+1}^{(m)}\big) + \ln P\big(C_{i+1}^{(m)} | D_i\big) - M\Big]}{\sum_{m'=1}^{N_m} \exp\Big[\ln p\big(\mathbf{z}_{i+1} | D_i, C_{i+1}^{(m')}\big) + \ln P\big(C_{i+1}^{(m')} | D_i\big) - M\Big]}, \quad m = 1, 2, \ldots, N_m \tag{5.20}$$

where $M$ is the maximum value of the sum of the log-evidence and log-prior plausibility:

$$M = \max_{m} \Big[\ln p\big(\mathbf{z}_{i+1} | D_i, C_{i+1}^{(m)}\big) + \ln P\big(C_{i+1}^{(m)} | D_i\big)\Big] \tag{5.21}$$
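The log-domain normalization of Eqs. (5.17)-(5.21) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `update_plausibilities` and the log-evidence values are hypothetical, and the log-evidences are assumed to be available already for every model class.

```python
import math


def update_plausibilities(log_evidence, log_prior):
    """Posterior plausibilities from log quantities, following Eqs. (5.20)-(5.21).

    log_evidence[m] -- ln p(z_{i+1} | D_i, C^(m))
    log_prior[m]    -- ln P(C^(m) | D_i)
    """
    # Sum of log-evidence and log-prior plausibility for every model class
    s = [le + lp for le, lp in zip(log_evidence, log_prior)]
    # M of Eq. (5.21); after the shift the largest term becomes exp(0) = 1
    big_m = max(s)
    weights = [math.exp(v - big_m) for v in s]
    total = sum(weights)
    return [w / total for w in weights]


# Uniform prior of Eq. (5.17) for three hypothetical model classes
n_m = 3
log_prior = [math.log(1.0 / n_m)] * n_m
# Hypothetical log-evidences; exponentiating them directly underflows to 0.0
log_ev = [-1200.0, -1201.0, -1210.0]
posterior = update_plausibilities(log_ev, log_prior)
print(posterior)
```

Because `math.exp(-1200.0)` evaluates to `0.0` in double precision, normalizing the raw evidences would give 0/0, whereas the shifted form returns finite plausibilities whose ratios are unchanged.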

5.3 Real-Time System Identification Using Predefined Model Classes

5.3.1 Parametric Identification with a Specified Model Class

Consider a general, possibly nonlinear, dynamical system with $N_d$ degrees of freedom (DOFs) and the equation of motion:

$$\mathbf{M}\ddot{\mathbf{x}}(t) + \mathbf{R}\big(\mathbf{x}(t), \dot{\mathbf{x}}(t); \boldsymbol{\theta}(t), C\big) = \mathbf{T}\mathbf{f}(t) \tag{5.22}$$

where $\mathbf{x}(t)$ denotes the generalized coordinate vector of the system at time $t$; $\mathbf{M}$ is the mass matrix; the function $\mathbf{R}(\cdot,\cdot;\cdot,\cdot)$ represents the general linear or nonlinear restoring force governed by the possibly time-varying model parameters in $\boldsymbol{\theta}(t) \in \mathbb{R}^{N_\theta}$; $C$ is the model class that specifies the model parameterization; $\mathbf{f}$ is the excitation applied to the system; and $\mathbf{T}$ is the influence matrix associated with $\mathbf{f}$. Define the augmented state vector, composed of the system state vector and the unknown model parameter vector, as follows:

$$\mathbf{y}(t) \equiv \big[\mathbf{X}(t)^T, \boldsymbol{\theta}(t)^T\big]^T \in \mathbb{R}^{N_X + N_\theta} \tag{5.23}$$

where $\mathbf{X}(t) \in \mathbb{R}^{N_X}$ is the system state vector at time $t$. Then, the state-space representation of Eq. (5.22) can be expressed as follows:

$$\dot{\mathbf{y}}(t) = \mathbf{g}\big(\mathbf{y}(t), \mathbf{f}(t); \boldsymbol{\theta}(t), C\big) \tag{5.24}$$

where $\mathbf{g}(\cdot,\cdot;\cdot,\cdot)$ represents the nonlinear state-space function. By using a Taylor series expansion, Eq. (5.24) can be discretized as follows:

$$\mathbf{y}_{i+1} = \mathbf{A}_i\mathbf{y}_i + \mathbf{B}_i\mathbf{f}_i + \boldsymbol{\delta}_i \tag{5.25}$$

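where $\mathbf{y}_i \equiv \mathbf{y}(i\Delta t)$ and $\mathbf{f}_i \equiv \mathbf{f}(i\Delta t)$, with $\Delta t$ being the sampling time step; $\mathbf{A}_i$ and $\mathbf{B}_i$ are the transition matrix and the input-to-state matrix, respectively; and $\boldsymbol{\delta}_i$ is the remainder term due to the local linear approximation. The detailed derivation of Eq. (5.25), together with $\mathbf{A}_i$, $\mathbf{B}_i$ and $\boldsymbol{\delta}_i$, can be found in Chap. 2. Discrete-time response measurements $\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_{i+1}$ are observed at $N_0$ DOFs:

$$\mathbf{z}_{i+1} = \mathbf{h}\big(\mathbf{y}_{i+1}\big) + \mathbf{n}_{i+1} \tag{5.26}$$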


where $\mathbf{h}(\cdot)$ defines the observation quantities; $\mathbf{n}_{i+1}$ represents the measurement noise at the $(i+1)$th time step, and $\mathbf{n}$ is modeled as a Gaussian independent and identically distributed (i.i.d.) process with zero mean and covariance matrix $\boldsymbol{\Sigma}_n \in \mathbb{R}^{N_0 \times N_0}$. Given the measurement dataset $D_{i+1} = \{\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_{i+1}\}$, the updated state vector $\mathbf{y}_{i+1|i+1}$ and its associated covariance matrix $\boldsymbol{\Sigma}_{i+1|i+1}$ can be obtained by using the extended Kalman filter (EKF) (Hoshiya and Saito 1985). The detailed derivation and implementation of the EKF can be found in Chap. 2. However, the EKF has mostly been implemented with a prescribed model class $C$, and the identification results rely heavily on this choice. Bayesian real-time model class selection therefore provides a rational and quantitative criterion for selecting the most suitable model class for reliable parametric identification. Consider a set of $N_m$ model class candidates at the $(i+1)$th time step:

$$\mathcal{C}_{i+1} = \big\{C_{i+1}^{(1)}, C_{i+1}^{(2)}, \ldots, C_{i+1}^{(N_m)}\big\} \tag{5.27}$$

where $C_{i+1}^{(m)}$, $m = 1, 2, \ldots, N_m$, indicates a particular model parameterization for describing the dynamical system in Eq. (5.22). By Bayes' theorem, the plausibility of model class $C_{i+1}^{(m)}$ is given by Eq. (5.13), in which the prior plausibility $P\big(C_{i+1}^{(m)} | D_i\big)$ is given by Eq. (5.15). The crux of calculating the posterior plausibility in Eq. (5.13) is therefore to derive the conditional evidence $p\big(\mathbf{z}_{i+1} | D_i, C_{i+1}^{(m)}\big)$.

The conditional evidence $p\big(\mathbf{z}_{i+1} | D_i, C_{i+1}^{(m)}\big)$ reflects the contribution of the current data point in establishing the plausibility of model class $C_{i+1}^{(m)}$. It can be expressed as follows:

$$p\big(\mathbf{z}_{i+1} | D_i, C_{i+1}^{(m)}\big) = (2\pi)^{-N_0/2}\,\big|\boldsymbol{\Sigma}_{z,i+1|i}\big|^{-1/2} \exp\Big[-\frac{1}{2}\big(\mathbf{z}_{i+1} - \mathbf{z}_{i+1|i}\big)^T \boldsymbol{\Sigma}_{z,i+1|i}^{-1}\big(\mathbf{z}_{i+1} - \mathbf{z}_{i+1|i}\big)\Big] \tag{5.28}$$

where $\mathbf{z}_{i+1|i}$ is the one-step-ahead predicted observation, which can be obtained by taking the expectation of Eq. (5.26):

$$\mathbf{z}_{i+1|i} \equiv E\big[\mathbf{z}_{i+1} | D_i\big] = \mathbf{h}\big(\mathbf{y}_{i+1|i}\big) \tag{5.29}$$

where $\mathbf{y}_{i+1|i}$ is the one-step-ahead predicted state vector in the EKF (Hoshiya and Saito 1985). In addition, $\boldsymbol{\Sigma}_{z,i+1|i}$ is the covariance matrix of the one-step-ahead predicted observation, which can be obtained by using Eqs. (5.26) and (5.29):

$$\boldsymbol{\Sigma}_{z,i+1|i} \equiv E\Big[\big(\mathbf{z}_{i+1} - \mathbf{z}_{i+1|i}\big)\big(\mathbf{z}_{i+1} - \mathbf{z}_{i+1|i}\big)^T \,\Big|\, D_i\Big] = \mathbf{H}\boldsymbol{\Sigma}_{i+1|i}\mathbf{H}^T + \boldsymbol{\Sigma}_n \tag{5.30}$$

where $\mathbf{H}$ is the observation matrix given by:

$$\mathbf{H} = \left.\frac{\partial \mathbf{h}}{\partial \mathbf{y}}\right|_{\mathbf{y}_{i+1|i}} \tag{5.31}$$

and $\boldsymbol{\Sigma}_{i+1|i}$ is the covariance matrix of the one-step-ahead predicted state vector in the EKF (Hoshiya and Saito 1985). At each time step, the conditional evidence of each model class is obtained from Eq. (5.28), and the plausibility of each model class is then updated by using Eq. (5.13). The plausibility in Eq. (5.13) quantifies the performance of a model class in terms of both data-fitting capability and robustness at each time step. The most plausible model class $C_{i+1}^{(m^*)}$ is the one with the highest plausibility among the model class candidates:

$$C_{i+1}^{(m^*)} = \arg\max_{C_{i+1}^{(m)},\; m = 1, 2, \ldots, N_m} P\big(C_{i+1}^{(m)} | D_{i+1}\big) \tag{5.32}$$

where $C_{i+1}^{(m^*)}$ is the most plausible model class at the $(i+1)$th time step and $m^*$ is its time-dependent index. Based on the results of model class selection, robust parametric identification results can be obtained by using either multiple model classes or the most plausible model class.
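For one model class, Eqs. (5.28)-(5.31) reduce to a Gaussian log-density of the innovation. The sketch below assumes that the EKF has already produced $\mathbf{y}_{i+1|i}$ and $\boldsymbol{\Sigma}_{i+1|i}$; the function name `log_conditional_evidence`, the linear observation function and all numbers are hypothetical. Using `numpy.linalg.slogdet` and `numpy.linalg.solve` instead of an explicit determinant and inverse is a numerical choice of this sketch, not something prescribed by the text.

```python
import numpy as np


def log_conditional_evidence(z, y_pred, sigma_pred, h, jac_h, sigma_n):
    """ln p(z_{i+1} | D_i, C) for one model class, following Eqs. (5.28)-(5.31).

    y_pred, sigma_pred -- EKF one-step-ahead prediction y_{i+1|i}, Sigma_{i+1|i}
    h, jac_h           -- observation function and its Jacobian (user supplied)
    sigma_n            -- measurement-noise covariance Sigma_n
    """
    z_pred = h(y_pred)                        # Eq. (5.29)
    H = jac_h(y_pred)                         # Eq. (5.31), evaluated at y_{i+1|i}
    S = H @ sigma_pred @ H.T + sigma_n        # Eq. (5.30)
    r = z - z_pred                            # innovation
    sign, logdet = np.linalg.slogdet(S)       # ln|S| without forming |S|
    if sign <= 0:
        raise ValueError("predicted observation covariance is not positive definite")
    quad = float(r @ np.linalg.solve(S, r))   # r^T S^{-1} r without an explicit inverse
    return -0.5 * (z.size * np.log(2.0 * np.pi) + logdet + quad)


# Hypothetical two-state example in which only the first state is measured
def h_obs(y):
    return y[:1]


def jac_h_obs(y):
    return np.array([[1.0, 0.0]])


sigma_pred = np.diag([0.04, 0.01])
sigma_n = np.array([[0.01]])
z = np.array([0.2])
# Two model classes that differ only in their predicted state
ln_ev_a = log_conditional_evidence(z, np.array([0.1, 0.0]), sigma_pred, h_obs, jac_h_obs, sigma_n)
ln_ev_b = log_conditional_evidence(z, np.array([0.5, 0.0]), sigma_pred, h_obs, jac_h_obs, sigma_n)
```

The returned value is the quantity that enters Eq. (5.18); the class whose prediction lies closer to the measurement (`ln_ev_a`) receives the larger log-evidence.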

5.3.2 Parametric Identification Using Multiple Model Classes

For dynamical systems, more than one model class often has substantial plausibility. In that case, multiple model classes can be utilized to achieve robust parametric identification results. The real-time system identification results are obtained as a weighted average of the estimation results from the individual model classes:

$$\mathbf{y}_{i+1}\big(C_{i+1}^{(1)}, C_{i+1}^{(2)}, \ldots, C_{i+1}^{(N_m)}, D_{i+1}\big) = \sum_{m=1}^{N_m} P\big(C_{i+1}^{(m)} | D_{i+1}\big)\, \mathbf{y}_{i+1|i+1}\big(C_{i+1}^{(m)}\big) \tag{5.33}$$

where $\mathbf{y}_{i+1}\big(C_{i+1}^{(1)}, \ldots, C_{i+1}^{(N_m)}, D_{i+1}\big)$ is the identification result of the augmented state vector based on all $N_m$ model classes at the $(i+1)$th time step; $\mathbf{y}_{i+1|i+1}\big(C_{i+1}^{(m)}\big)$, $m = 1, 2, \ldots, N_m$, is the updated state vector from model class $C_{i+1}^{(m)}$ at the $(i+1)$th time step, obtained by using the EKF; and $P\big(C_{i+1}^{(m)} | D_{i+1}\big)$ is the plausibility of model class $C_{i+1}^{(m)}$ at the $(i+1)$th time step. Equation (5.33) indicates that parametric identification using multiple model classes is intrinsically a plausibility-weighted average of the parametric identification results from a given set of model classes. This multi-model strategy for real-time system identification is more robust than using a single prescribed model class because it utilizes the estimates from all model classes with properly assigned weights (Yuen 2010b).
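Eq. (5.33) is a plausibility-weighted sum. One practical caveat, which is an assumption of this sketch rather than a statement from the text: the augmented vectors of different model classes generally have different lengths because the classes carry different numbers of stiffness parameters, so the sketch averages only quantities that all classes share (for example the system states, or stiffness values mapped back to the components). The function name `weighted_identification` and the numbers are hypothetical.

```python
import numpy as np


def weighted_identification(plausibilities, estimates):
    """Plausibility-weighted average of per-class estimates, as in Eq. (5.33).

    Assumes every row of `estimates` describes the same physical quantities,
    e.g. the system states or per-component stiffness values.
    """
    p = np.asarray(plausibilities, dtype=float)
    est = np.asarray(estimates, dtype=float)     # shape (N_m, n_quantities)
    if not np.isclose(p.sum(), 1.0):
        raise ValueError("plausibilities must sum to one")
    return p @ est


# Hypothetical plausibilities and updated estimates of three model classes
p = [0.7, 0.2, 0.1]
estimates = [[1.0, 2.0],
             [1.2, 2.1],
             [0.8, 1.9]]
y_avg = weighted_identification(p, estimates)
```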

5.3.3 Parametric Identification Using the Most Plausible Model Class

The plausibilities of underparameterized and overparameterized model classes are typically close to zero, so their contributions to the identification results in Eq. (5.33) are negligible. Hence, if the plausibility $P\big(C_{i+1}^{(m^*)} | D_{i+1}\big)$ of the most plausible model class $C_{i+1}^{(m^*)}$ is much larger than the plausibilities of the others, the parametric identification results can be approximated by those of the most plausible model class $C_{i+1}^{(m^*)}$:

$$\mathbf{y}_{i+1}\big(C_{i+1}^{(m^*)} | D_{i+1}\big) = \mathbf{y}_{i+1|i+1}\big(C_{i+1}^{(m^*)}\big), \quad m^* = \arg\max_m P\big(C_{i+1}^{(m)} | D_{i+1}\big) \tag{5.34}$$

Similarly, the covariance matrix of the parametric identification results using the most plausible model class $C_{i+1}^{(m^*)}$ is given as follows:

$$\boldsymbol{\Sigma}_{i+1}\big(C_{i+1}^{(m^*)} | D_{i+1}\big) = \boldsymbol{\Sigma}_{i+1|i+1}\big(C_{i+1}^{(m^*)}\big), \quad m^* = \arg\max_m P\big(C_{i+1}^{(m)} | D_{i+1}\big) \tag{5.35}$$

where $\boldsymbol{\Sigma}_{i+1|i+1}\big(C_{i+1}^{(m)}\big)$, $m = 1, 2, \ldots, N_m$, is the covariance matrix of the updated state vector from model class $C_{i+1}^{(m)}$ at the $(i+1)$th time step, obtained by using the EKF. Equations (5.34) and (5.35) imply that, when one model class dominates, it is sufficient to use the most plausible model class for real-time system identification.
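The approximation of Eqs. (5.34)-(5.35) amounts to an arg-max over the plausibilities. The hypothetical sketch below also compares the result with the weighted average of Eq. (5.33) to show that the two nearly coincide when one model class dominates. The name `most_plausible_estimate` and all values are illustrative; resolving ties to the lowest index (the behavior of `np.argmax`) is a convention of this sketch.

```python
import numpy as np


def most_plausible_estimate(plausibilities, states, covariances):
    """Eqs. (5.34)-(5.35): keep the estimate and covariance of the top class."""
    m_star = int(np.argmax(plausibilities))   # ties resolve to the lowest index
    return m_star, states[m_star], covariances[m_star]


# Hypothetical case in which one model class dominates
p = np.array([1e-4, 0.9998, 1e-4])
states = [np.array([0.8, 1.9]), np.array([1.0, 2.0]), np.array([1.2, 2.1])]
covs = [0.3 * np.eye(2), 0.01 * np.eye(2), 0.2 * np.eye(2)]
m_star, y_star, sigma_star = most_plausible_estimate(p, states, covs)

# Compare with the plausibility-weighted average of Eq. (5.33)
y_avg = p @ np.vstack(states)
print(m_star, y_star, y_avg)
```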

5.3.4 Predefined Model Classes

Consider the finest substructure configuration for identification. It has $N_k$ structural components, whose component stiffness matrices are:

$$\mathbf{K}_1^c, \mathbf{K}_2^c, \ldots, \mathbf{K}_{N_k}^c \tag{5.36}$$

where $\mathbf{K}_n^c \in \mathbb{R}^{N_d \times N_d}$, $n = 1, 2, \ldots, N_k$, is the component stiffness matrix of the $n$th finest substructure. For instance, for a ten-story building, each story can be taken as a finest substructure, so that $N_k = 10$. A set of $N_m$ model class candidates is considered, and each model class groups the structural components into substructures differently. Specifically, the $m$th model class has $N_k^{(m)}$ substructures grouping the component stiffness matrices in Eq. (5.36), and its substructural stiffness matrices are:

$$\mathbf{K}_1^{(m)}, \mathbf{K}_2^{(m)}, \ldots, \mathbf{K}_{N_k^{(m)}}^{(m)} \tag{5.37}$$

where $\mathbf{K}_s^{(m)}$, $s = 1, 2, \ldots, N_k^{(m)}$, is the $s$th substructural stiffness matrix:

$$\mathbf{K}_s^{(m)} = \sum_{n \in \mathbf{S}_s^{(m)}} \mathbf{K}_n^c, \quad s = 1, 2, \ldots, N_k^{(m)} \tag{5.38}$$

where $\mathbf{S}_s^{(m)}$, $s = 1, 2, \ldots, N_k^{(m)}$, represents the membership vector for the structural components of the $s$th substructure in the $m$th model class. For the ten-story building example, the simplest model class has only one substructure for all ten stories, i.e., $\mathbf{S}_1^{(m)} = [1, 2, \ldots, 10]^T$ and $N_k^{(m)} = 1$. A slightly more complicated model class can be considered with two substructures: one for the first story and the other for the remaining stories. In this case, $\mathbf{S}_1^{(m)} = 1$, $\mathbf{S}_2^{(m)} = [2, 3, \ldots, 10]^T$ and $N_k^{(m)} = 2$. In general, $\mathbf{S}_1^{(m)}, \mathbf{S}_2^{(m)}, \ldots, \mathbf{S}_{N_k^{(m)}}^{(m)}$ form a mutually exclusive partition of the set $\{1, 2, \ldots, N_k\}$. Then, the stiffness matrix of the $m$th model class can be parameterized by using the substructural stiffness matrices:

$$\mathbf{K}^{(m)}\big(\boldsymbol{\theta}_k^{(m)}\big) = \sum_{s=1}^{N_k^{(m)}} \theta_{k,s}^{(m)} \mathbf{K}_s^{(m)}, \quad m = 1, 2, \ldots, N_m \tag{5.39}$$

where $\mathbf{K}^{(m)}\big(\boldsymbol{\theta}_k^{(m)}\big) \in \mathbb{R}^{N_d \times N_d}$ is the parameterized stiffness matrix of the $m$th model class, and $\theta_{k,s}^{(m)}$ is the stiffness parameter of the $s$th substructure for the $m$th model class. The uncertain stiffness parameter vector for the $m$th model class is defined by grouping the stiffness parameters of all the substructures in the $m$th model class:

$$\boldsymbol{\theta}_k^{(m)} = \big[\theta_{k,1}^{(m)}, \theta_{k,2}^{(m)}, \ldots, \theta_{k,N_k^{(m)}}^{(m)}\big]^T, \quad m = 1, 2, \ldots, N_m \tag{5.40}$$

Note that $\boldsymbol{\theta}_k^{(m)}$, $m = 1, 2, \ldots, N_m$, is time-dependent; this dependence is omitted for notational simplicity.
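The grouping of Eqs. (5.36)-(5.40) can be sketched for the ten-story example. The shear-building form of each story's component stiffness matrix (a unit story stiffness coupling adjacent floors) is an assumption made only for illustration, as are the helper names `story_component`, `substructure_matrices` and `stiffness`.

```python
import numpy as np


def story_component(n, n_d, k=1.0):
    """Hypothetical component stiffness K_n^c of story n (1-based) in a shear building."""
    K = np.zeros((n_d, n_d))
    top = n - 1                      # floor index above story n (0-based)
    K[top, top] += k
    if n > 1:                        # story n also connects to the floor below
        bot = n - 2
        K[bot, bot] += k
        K[top, bot] -= k
        K[bot, top] -= k
    return K


def substructure_matrices(components, membership):
    """Eq. (5.38): K_s^(m) = sum of K_n^c over n in S_s^(m) (1-based indices)."""
    return [sum(components[n - 1] for n in S) for S in membership]


def stiffness(theta, sub_mats):
    """Eq. (5.39): K^(m)(theta) = sum_s theta_s K_s^(m)."""
    return sum(t * K for t, K in zip(theta, sub_mats))


n_k = n_d = 10
components = [story_component(n, n_d) for n in range(1, n_k + 1)]
# Two-substructure model class from the text: story 1, and stories 2-10
membership = [[1], list(range(2, 11))]
sub = substructure_matrices(components, membership)
K_nominal = stiffness([1.0, 1.0], sub)
K_damaged = stiffness([0.8, 1.0], sub)   # 20% stiffness loss in the first story
```

With all parameters equal to one, the parameterized matrix reproduces the sum of all component matrices, and lowering the first parameter changes only the entries associated with story 1.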


Given a set of $N_m$ prescribed model class candidates, the augmented state vector in Eq. (5.23) for the $m$th model class is obtained by grouping the state vector and the corresponding model parameter vector. Real-time parametric identification for the $m$th model class can then be performed by using the EKF (Hoshiya and Saito 1985). After the parametric identification results at the $(i+1)$th time step are obtained, the conditional evidence of the $m$th model class is calculated by using Eq. (5.28). Then, by Bayes' theorem, the plausibilities of the $N_m$ model class candidates are computed based on Eq. (5.13). Moreover, to realize robust parametric identification, the real-time system identification results are obtained from all $N_m$ model classes based on Eq. (5.33), i.e., as the plausibility-weighted average over the $N_m$ model classes. The procedure of real-time system identification using predefined model classes can be summarized as follows:

1. At the initial time step, i.e., $i = 0$, set the initial conditions:
(1) Define the structural component stiffness matrices $\mathbf{K}_n^c$, $n = 1, 2, \ldots, N_k$, in Eq. (5.36).
(2) Define the substructural stiffness matrices $\mathbf{K}_s^{(m)}$, $s = 1, 2, \ldots, N_k^{(m)}$, in Eq. (5.38) for all $N_m$ model classes, $m = 1, 2, \ldots, N_m$.
(3) Set the initial state vector $\mathbf{y}_{0|0}$ and the initial covariance matrix $\boldsymbol{\Sigma}_{0|0}$.
(4) Set the initial prior plausibilities as $P\big(C_1^{(m)} | D_0\big) = 1/N_m$, $m = 1, 2, \ldots, N_m$.

At each time step ($i \geq 0$):
2. Calculate the updated state vector $\mathbf{y}_{i|i}$ and its associated covariance matrix $\boldsymbol{\Sigma}_{i|i}$ for each model class.
3. Calculate the conditional evidence by using Eq. (5.28).
4. Calculate the plausibilities of all $N_m$ model classes by using Eq. (5.13).
5. Obtain the real-time system identification results using multiple model classes based on Eq. (5.33).
6. Go back to Step 2 for the next time step.
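The six-step procedure can be outlined as a loop. This is a hedged sketch, not the authors' code: `FixedMeanModel` is a hypothetical stand-in that returns a constant estimate and a Gaussian log-evidence where a real implementation would run the EKF of Chap. 2, `run_predefined_classes` is an illustrative name, and the plausibilities are propagated in the log domain to avoid underflow.

```python
import math


class FixedMeanModel:
    """Hypothetical stand-in for one model class's EKF.

    It predicts z ~ N(mu, var) at every step and returns (estimate, log-evidence).
    A real implementation would run the EKF of Chap. 2 here instead.
    """

    def __init__(self, mu, var):
        self.mu, self.var = mu, var

    def step(self, z):
        log_ev = -0.5 * (math.log(2.0 * math.pi * self.var) + (z - self.mu) ** 2 / self.var)
        return self.mu, log_ev


def run_predefined_classes(models, measurements):
    """Steps 1-6 of the procedure with N_m predefined model classes."""
    n_m = len(models)
    log_prior = [math.log(1.0 / n_m)] * n_m                  # Step 1(4), Eq. (5.17)
    history = []
    for z in measurements:                                   # Step 6: next time step
        outputs = [mdl.step(z) for mdl in models]            # Steps 2-3
        estimates = [y for y, _ in outputs]
        s = [le + lp for (_, le), lp in zip(outputs, log_prior)]
        big_m = max(s)                                       # Eq. (5.21)
        total = sum(math.exp(v - big_m) for v in s)
        log_post = [v - big_m - math.log(total) for v in s]  # Step 4, log of Eq. (5.20)
        post = [math.exp(v) for v in log_post]
        y_avg = sum(p * y for p, y in zip(post, estimates))  # Step 5, Eq. (5.33)
        history.append((post, y_avg))
        log_prior = log_post                                 # Eq. (5.15): posterior becomes next prior
    return history


models = [FixedMeanModel(0.0, 0.1), FixedMeanModel(1.0, 0.1), FixedMeanModel(2.0, 0.1)]
history = run_predefined_classes(models, [1.0, 0.9, 1.1, 1.0])
final_post, final_estimate = history[-1]
```

Because the hypothetical measurements cluster around 1.0, the second model class quickly dominates and the weighted estimate approaches its prediction.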

5.4 Self-Calibratable Model Classes

Although Bayesian model class selection provides a rational and quantitative basis for assessing the relative performance of model classes, it requires that a good model class exist in the prescribed model class pool. This requirement is harder to satisfy than intuition suggests, because it demands not only a fundamental understanding of the underlying dynamical system but also, more critically, a model class whose complexity suits the given measurement configuration and the target spatial resolution of the identification results. Otherwise, reliable parametric identification results cannot be achieved from poor model class candidates. Therefore, in this section, we present a self-calibrating strategy that adaptively reconfigures the model classes. The strategy starts with a few simple model class candidates. The model classes are selected in real time, and their parameterization can also be recalibrated adaptively (Yuen et al. 2019). Consequently, it is not necessary to prescribe one or a few good model classes in advance.

5.4.1 Parameterization and Model Classes

To correct the discrepancies of the model class candidates, the finest substructure configuration, down to the component level, is considered. The component stiffness matrices and the substructural stiffness matrices are defined by Eqs. (5.36) and (5.37), respectively. The self-calibrating strategy is then performed on the parameterized stiffness matrices, and $N_m + 1$ model class candidates are constructed. In particular, the simplest model class, with only one adjustable parameter, is taken as the baseline model class with index $m = 0$. Its stiffness matrix is parameterized as follows:

$$\mathbf{K}^{(0)}\big(\boldsymbol{\theta}_k^{(0)}\big) = \theta_{k,1}^{(0)} \mathbf{K}_1^{(0)} \tag{5.41}$$

where $\theta_{k,1}^{(0)}$ is the sole stiffness parameter of the baseline model class, and $\mathbf{K}_1^{(0)}$ is the substructural stiffness matrix of the baseline model class:

$$\mathbf{K}_1^{(0)} = \sum_{n=1}^{N_k} \mathbf{K}_n^c \tag{5.42}$$

Equations (5.41) and (5.42) imply that the baseline model class consists of only one substructure. For each of the remaining $N_m$ model classes, the stiffness matrix is constructed from $N_k^{(m)}$ substructural stiffness matrices associated with uncertain stiffness parameters:

$$\mathbf{K}^{(m)}\big(\boldsymbol{\theta}_k^{(m)}\big) = \sum_{s=1}^{N_k^{(m)}} \theta_{k,s}^{(m)} \mathbf{K}_s^{(m)}, \quad m = 1, 2, \ldots, N_m \tag{5.43}$$

where $\mathbf{K}^{(m)}\big(\boldsymbol{\theta}_k^{(m)}\big) \in \mathbb{R}^{N_d \times N_d}$ is the parameterized stiffness matrix of the $m$th model class; $\mathbf{K}_s^{(m)}$ is the $s$th substructural stiffness matrix given by Eq. (5.38); $\theta_{k,s}^{(m)}$ is the stiffness parameter of the $s$th substructure for the $m$th model class; and the uncertain stiffness parameter vector $\boldsymbol{\theta}_k^{(m)}$ of the $m$th model class groups the stiffness parameters of all its substructures, as in Eq. (5.40). Note that $\boldsymbol{\theta}_k^{(m)}$, $m = 0, 1, 2, \ldots, N_m$, is time-dependent; this dependence is omitted for notational simplicity.

By utilizing Eq. (5.28), the conditional evidence of each model class candidate can be calculated, and then the plausibilities of the $N_m + 1$ model classes can be obtained. The most plausible model class $C_{i+1}^{(m^*)}$ at the $(i+1)$th time step is given by:

$$C_{i+1}^{(m^*)} = \arg\max_{C_{i+1}^{(m)},\; m = 0, 1, \ldots, N_m} P\big(C_{i+1}^{(m)} | D_{i+1}\big) \tag{5.44}$$

where $m^*$ is the index of the most plausible model class; it is time-dependent owing to the time-varying behavior of the dynamical system. Equation (5.44) indicates that $C_{i+1}^{(m^*)}$ achieves the maximum plausibility among the $N_m + 1$ model classes. The real-time system identification results are obtained from the most plausible model class based on Eqs. (5.34) and (5.35). If more than one model class achieves the same highest plausibility, the evaluation of the plausibilities continues until one model class outperforms the others.

5.4.2 Self-Calibrating Strategy

After the model classes are constructed, the self-calibrating strategy is implemented as follows. When the baseline model class $C_{i+1}^{(0)}$ dominates the plausibility among the $N_m + 1$ model class candidates, it implies that the baseline model class is sufficient for the underlying dynamical system, and that the more complicated model classes provide only minor and possibly spurious improvement in data fitting. By the principle of parsimony, the baseline model class should then be used to represent the underlying dynamical system. On the other hand, when a non-baseline model class achieves the highest plausibility, it implies that there are substantial differential changes among the substructures of the underlying dynamical system; the baseline model class is insufficient and underfits the measurements. Calibration is then triggered, and all the model classes are reconfigured to correct their deficiencies using the information from the most plausible model class. The plausibilities of the calibrated model classes are then reset to uniform so that all calibrated model classes start with the same plausibility at the forthcoming time step. In addition, to achieve adaptiveness for damage tracking, a lower bound $P_l$ is imposed on the plausibilities. If the plausibility of a model class is lower than $P_l$, it is replaced by $P_l$, and the plausibilities of the remaining model classes are normalized such that the plausibilities of all model classes sum to unity. This minimum plausibility can be taken as a sufficiently small positive value, say 0.0001. The reason for imposing this lower bound is as follows. If the identification method is used for long-term monitoring, the underlying structure is expected to remain healthy for a very long period of time, so the non-baseline model classes will all acquire vanishingly small plausibilities, say $10^{-100}$ or even smaller. When damage then occurs, it will take a long time for the plausibility of the suitable model class to increase. The plausibility lower bound avoids such a long delay in switching the most plausible model class when the behavior of the dynamical system changes.
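The plausibility floor can be sketched as a single pass that replaces sub-threshold values by $P_l$ and rescales the others. The function name `apply_lower_bound` is illustrative, and the default $P_l = 10^{-4}$ is the value suggested in the text. A single pass may leave a class that was just above $P_l$ marginally below it after rescaling; this sketch does not handle that edge case.

```python
def apply_lower_bound(plausibilities, p_l=1e-4):
    """Floor plausibilities at p_l and rescale the rest so they sum to one.

    Single pass, as described in the text; assumes N_m * p_l < 1.
    """
    low = [p < p_l for p in plausibilities]
    n_low = sum(low)
    free_mass = sum(p for p, is_low in zip(plausibilities, low) if not is_low)
    target = 1.0 - n_low * p_l          # mass left for the classes above the bound
    return [p_l if is_low else p * target / free_mass
            for p, is_low in zip(plausibilities, low)]


# Long healthy period: the non-baseline classes have collapsed to 1e-100
post = [1.0 - 2e-100, 1e-100, 1e-100]
bounded = apply_lower_bound(post)
print(bounded)
```

Without the floor, a non-baseline class at $10^{-100}$ would need to accumulate a log-evidence advantage of about $\ln 10^{100} \approx 230$ before overtaking the baseline; with the floor the deficit is only about $\ln(0.9998/10^{-4}) \approx 9.2$.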

5.4.2.1 Triggering Conditions

At the early stage of identification, the model class selection results are likely to fluctuate strongly owing to inaccurate initial conditions and the lack of measurements. Therefore, a short training stage is performed before the self-calibrating strategy is activated. In the training stage, all model classes are evaluated according to their plausibilities without performing self-calibration. The training stage lasts roughly two small-amplitude fundamental periods of the nominal model. After the training stage, calibration of the model classes is activated according to the plausibilities. Calibration is triggered by the following two criteria:

$$m^* \neq 0 \tag{5.45}$$

$$C_{i+j}^{(m^*)} = C_i^{(m^*)}, \quad j = 1, 2, \ldots, N_t \tag{5.46}$$

where $N_t$ is the prescribed number of time steps for which the non-baseline model class must retain the highest plausibility. $N_t$ is again taken to cover roughly two small-amplitude fundamental periods of the nominal model. Equations (5.45) and (5.46) indicate that calibration is triggered when a non-baseline model class dominates the plausibility for $N_t$ consecutive time steps.
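The triggering logic of Eqs. (5.45)-(5.46) needs only a counter. In this hypothetical sketch (`CalibrationTrigger` is an illustrative name), the counter restarts whenever the baseline regains the lead, when a different non-baseline class takes over, and after firing, since the plausibilities are reset by Eq. (5.56). Whether the count includes the starting step is a convention choice; the sketch fires after $N_t$ consecutive steps, following the prose.

```python
class CalibrationTrigger:
    """Checks Eqs. (5.45)-(5.46) once per time step in the working stage.

    Fires when the same non-baseline class (m* != 0) has been the most
    plausible for n_t consecutive time steps, then starts counting afresh.
    """

    def __init__(self, n_t):
        self.n_t = n_t
        self.current = None   # index of the non-baseline class being tracked
        self.count = 0

    def update(self, m_star):
        if m_star == 0:                          # Eq. (5.45) violated: baseline dominates
            self.current, self.count = None, 0
            return False
        if m_star == self.current:               # Eq. (5.46): same class as before
            self.count += 1
        else:                                    # a different non-baseline class took over
            self.current, self.count = m_star, 1
        if self.count >= self.n_t:
            self.current, self.count = None, 0   # plausibilities are reset after calibration
            return True
        return False


trigger = CalibrationTrigger(n_t=3)
sequence = [0, 2, 2, 1, 1, 1, 0]
fired = [trigger.update(m) for m in sequence]
```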

5.4.2.2 Self-Calibrating Mechanism

If the triggering conditions in Eqs. (5.45) and (5.46) are satisfied, calibration is triggered and the following operations are performed to correct the deficiencies of the model class candidates. The component stiffness matrices are first calibrated according to the model updating results from the most plausible model class. When the triggering conditions are satisfied at the $i$th time step, the component stiffness matrices are calibrated as follows:

$$\bar{\mathbf{K}}_1^c, \bar{\mathbf{K}}_2^c, \ldots, \bar{\mathbf{K}}_{N_k}^c \tag{5.47}$$

where $\bar{\mathbf{K}}_n^c \in \mathbb{R}^{N_d \times N_d}$, $n = 1, 2, \ldots, N_k$, is the calibrated component stiffness matrix of the $n$th structural component, given by:

$$\bar{\mathbf{K}}_n^c = \theta_{k,s}^{(m^*)} \mathbf{K}_n^c, \quad n \in \mathbf{S}_s^{(m^*)},\; s = 1, 2, \ldots, N_k^{(m^*)} \tag{5.48}$$

where $\theta_{k,s}^{(m^*)}$ is the updated stiffness parameter of the $s$th substructure in the most plausible model class, and $\mathbf{S}_s^{(m^*)}$, $s = 1, 2, \ldots, N_k^{(m^*)}$, represents the membership vector for the structural components of the $s$th substructure in the most plausible model class. The substructural stiffness matrices are then updated by using the calibrated component stiffness matrices:

$$\bar{\mathbf{K}}_1^{(m)}, \bar{\mathbf{K}}_2^{(m)}, \ldots, \bar{\mathbf{K}}_{N_k^{(m)}}^{(m)} \tag{5.49}$$

where $m = 0, 1, 2, \ldots, N_m$, and $\bar{\mathbf{K}}_s^{(m)}$, $s = 1, 2, \ldots, N_k^{(m)}$, is the $s$th updated substructural stiffness matrix in the $m$th model class, given by:

$$\bar{\mathbf{K}}_s^{(m)} = \sum_{n \in \mathbf{S}_s^{(m)}} \bar{\mathbf{K}}_n^c, \quad s = 1, 2, \ldots, N_k^{(m)} \tag{5.50}$$

where $\mathbf{S}_s^{(m)}$, $s = 1, 2, \ldots, N_k^{(m)}$, represents the membership vector for the structural components of the $s$th substructure in the $m$th model class. The stiffness matrix of the $m$th model class is then parameterized by using the updated substructural stiffness matrices:

$$\bar{\mathbf{K}}^{(m)}\big(\boldsymbol{\theta}_k^{(m)}\big) = \sum_{s=1}^{N_k^{(m)}} \theta_{k,s}^{(m)} \bar{\mathbf{K}}_s^{(m)}, \quad m = 0, 1, 2, \ldots, N_m \tag{5.51}$$

For instance, after the component stiffness matrices are calibrated and the updated substructural stiffness matrices are obtained, the stiffness matrix of the baseline model class is calibrated as follows:

$$\bar{\mathbf{K}}^{(0)}\big(\boldsymbol{\theta}_k^{(0)}\big) = \theta_{k,1}^{(0)} \bar{\mathbf{K}}_1^{(0)} \tag{5.52}$$

where $\bar{\mathbf{K}}_1^{(0)}$ is the updated substructural stiffness matrix of the baseline model class, given by:

$$\bar{\mathbf{K}}_1^{(0)} = \sum_{n=1}^{N_k} \bar{\mathbf{K}}_n^c \tag{5.53}$$

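After the stiffness matrices of all model classes are calibrated by using Eq. (5.51), the augmented state vector and its associated covariance matrix are reset for all model classes as follows:

$$\bar{\mathbf{y}}_{i+1|i+1}^{(m)} = \Big[\mathbf{X}_{i+1|i+1}^{(m^*)T}, \mathbf{1}_{N_\theta \times 1}^T\Big]^T, \quad m = 0, 1, 2, \ldots, N_m \tag{5.54}$$

$$\bar{\boldsymbol{\Sigma}}_{i+1|i+1}^{(m)} = \begin{bmatrix} \boldsymbol{\Sigma}_{X,i+1|i+1}^{(m^*)} & \mathbf{0}_{N_X \times N_\theta} \\ \mathbf{0}_{N_\theta \times N_X} & \frac{1}{9}\mathbf{I}_{N_\theta} \end{bmatrix}, \quad m = 0, 1, 2, \ldots, N_m \tag{5.55}$$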

where $\bar{\mathbf{y}}_{i+1|i+1}^{(m)}$ and $\bar{\boldsymbol{\Sigma}}_{i+1|i+1}^{(m)}$ are the calibrated augmented state vector and its associated covariance matrix, respectively; they serve as the calibrated prior mean and prior covariance matrix of the augmented state vector at the next propagation time step. $\mathbf{X}_{i+1|i+1}^{(m^*)}$ and $\boldsymbol{\Sigma}_{X,i+1|i+1}^{(m^*)}$ are the state vector and its associated covariance matrix of the most plausible model class, respectively. Equation (5.54) indicates that the augmented state vector is reset to the state vector of the most plausible model class together with unity model parameters. Unity parameters are appropriate because the updated stiffness parameters of the most plausible model class have already been absorbed into the calibrated component stiffness matrices in Eq. (5.48): with all parameters equal to one, every model class in Eq. (5.51) reproduces the stiffness matrix identified by the most plausible model class. Moreover, the submatrix in Eq. (5.55) corresponding to the model parameters is reset to $\frac{1}{9}\mathbf{I}_{N_\theta}$. This covers a sufficiently wide range for the parameters, since it gives a standard deviation of 1/3 on the reset unity model parameters. Finally, the plausibilities of all the model classes are reset to uniform:

$$P\big(C_{i+1}^{(m)} | D_{i+1}\big) = \frac{1}{N_m + 1}, \quad m = 0, 1, 2, \ldots, N_m \tag{5.56}$$

Equation (5.56) indicates that, after the stiffness matrices are calibrated, all $N_m + 1$ model classes restart with the same plausibility for the forthcoming calculation. The EKF then continues to propagate to the next time step.
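The calibration step of Eqs. (5.47)-(5.56) can be sketched with NumPy on a hypothetical three-story shear building. The function name `self_calibrate`, the component matrices and the parameter values are illustrative assumptions, and the EKF itself is not included. The usage example also checks that, after calibration, every model class evaluated at unity parameters reproduces the stiffness identified by the most plausible class, which is why Eq. (5.54) resets the parameters to one.

```python
import numpy as np


def self_calibrate(components, memberships, m_star, theta_star, x_state, sigma_x):
    """One calibration step following Eqs. (5.47)-(5.56).

    components  -- list of component stiffness matrices K_n^c, n = 1..N_k
    memberships -- memberships[m] lists the substructures (1-based component
                   indices) of model class m; memberships[0] is the baseline
    theta_star  -- updated stiffness parameters of the most plausible class m_star
    x_state, sigma_x -- state estimate and covariance of class m_star
    """
    # Eq. (5.48): scale each component by its substructure's parameter in class m_star
    calibrated = list(components)
    for theta, group in zip(theta_star, memberships[m_star]):
        for n in group:
            calibrated[n - 1] = theta * components[n - 1]
    # Eq. (5.50): rebuild the substructural matrices of every model class
    sub_mats = [[sum(calibrated[n - 1] for n in group) for group in groups]
                for groups in memberships]
    # Eqs. (5.54)-(5.55): state of class m_star, unity parameters with variance 1/9
    n_x = x_state.size
    y_reset, sigma_reset = [], []
    for groups in memberships:
        n_theta = len(groups)
        y_reset.append(np.concatenate([x_state, np.ones(n_theta)]))
        sig = np.zeros((n_x + n_theta, n_x + n_theta))
        sig[:n_x, :n_x] = sigma_x
        sig[n_x:, n_x:] = np.eye(n_theta) / 9.0
        sigma_reset.append(sig)
    # Eq. (5.56): uniform plausibilities over the N_m + 1 model classes
    plaus = [1.0 / len(memberships)] * len(memberships)
    return calibrated, sub_mats, y_reset, sigma_reset, plaus


# Hypothetical 3-story shear building with unit story stiffness
K1 = np.array([[1.0, 0, 0], [0, 0, 0], [0, 0, 0]])
K2 = np.array([[1.0, -1, 0], [-1, 1, 0], [0, 0, 0]])
K3 = np.array([[0.0, 0, 0], [0, 1, -1], [0, -1, 1]])
memberships = [[[1, 2, 3]],          # baseline, m = 0
               [[1], [2, 3]],        # m = 1
               [[2], [1, 3]],        # m = 2
               [[3], [1, 2]]]        # m = 3
out = self_calibrate([K1, K2, K3], memberships, m_star=1, theta_star=[0.7, 1.0],
                     x_state=np.zeros(6), sigma_x=0.01 * np.eye(6))
calibrated, sub_mats, y_reset, sigma_reset, plaus = out
```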

5.4.3 Procedure of the Real-Time System Identification with Self-Calibratable Model Classes

5.4.3.1 Training Stage

A short training stage is implemented for roughly two small-amplitude fundamental periods of the nominal model. The length of this training stage is flexible, and a rough estimate of the small-amplitude fundamental frequency can easily be obtained from the Fourier spectrum of the response of the dynamical system.

1. At the initial time step, i.e., $i = 0$, set the initial conditions:
(1) Define the structural component stiffness matrices $\mathbf{K}_n^c$, $n = 1, 2, \ldots, N_k$, in Eq. (5.36).
(2) Define the substructural stiffness matrix $\mathbf{K}_1^{(0)}$ for the baseline model class and the substructural stiffness matrices $\mathbf{K}_s^{(m)}$, $s = 1, 2, \ldots, N_k^{(m)}$, $m = 1, 2, \ldots, N_m$, for the remaining $N_m$ model classes.
(3) Set the initial state vector and the initial covariance matrix.
(4) Set the initial prior plausibilities as $P\big(C_1^{(m)} | D_0\big) = 1/(N_m + 1)$, $m = 0, 1, 2, \ldots, N_m$.
(5) Set the plausibility lower bound $P_l$ and the number of consecutive time steps $N_t$.
2. At the $i$th ($i \geq 0$) time step, calculate the conditional evidence by using Eq. (5.28) and the plausibilities by using Eq. (5.13). If the plausibility of a model class is lower than $P_l$, replace it by $P_l$; then normalize the plausibilities of the remaining model classes such that the plausibilities of all model classes sum to unity.
3. Terminate the training stage once it has been carried out for $N_t$ time steps. Otherwise, continue from Step 2 at the next time step.

5.4.3.2 Working Stage

When the training stage is terminated, the identification process enters the working stage.
1. At the $i$th time step, calculate the conditional evidence by using Eq. (5.28) and the plausibilities by using Eq. (5.13). If the plausibility of a model class is lower than $P_l$, replace it by $P_l$; then normalize the plausibilities of the remaining model classes such that the plausibilities of all model classes sum to unity.
2. Determine the most plausible model class; its system identification results are taken as the current identification results.
3. Check the triggering conditions of calibration in Eqs. (5.45) and (5.46).
(1) If the triggering criteria are fulfilled, calibration is triggered with the following procedure:
(a) Calibrate the component stiffness matrices by using the model updating results from the most plausible model class based on Eq. (5.48).
(b) Reconfigure the substructural stiffness matrices in all $N_m + 1$ model classes by using Eq. (5.50).
(c) Calibrate the augmented state vector and its associated covariance matrix by using Eqs. (5.54) and (5.55), respectively.
(d) Reset the plausibilities to a uniform distribution by using Eq. (5.56).
(2) Otherwise, go back to Step 1 for the next time step.


5.5 Hierarchical Interhealable Model Classes

Practical system identification problems usually involve a large number of uncertain parameters, and a finer discretization of the model is desirable. Although Bayesian model class selection allows one to choose among prescribed model classes, the number of possible model classes is often large. In Bayesian real-time model class selection in particular, the plausibilities of all model class candidates are evaluated at each time step, so the computational demand grows drastically with the number of candidates, which hampers the application of the method. Therefore, in this section, we present a hierarchical strategy for Bayesian real-time model class selection. The model classes are established hierarchically so that the strategy requires only a small number of model classes yet is able to explore a large solution space (Yuen and Dong 2020). Consequently, the method can handle a large number of damage scenarios while maintaining a relatively low computational cost.

5.5.1 Hierarchical Model Classes

5.5.1.1 Model Classes and Parameterization in Hierarchical Level I

Hierarchical Level I considers relatively coarse model classes whose resolution extends only to the substructure level. In Hierarchical Level I, the entire structure is divided into $N_I$ substructures. The substructural stiffness matrices are obtained in the same fashion as Eq. (5.38) by using the component stiffness matrices in Eq. (5.36). To determine the optimal model class by Bayesian real-time model class selection, there are $2^{N_I} - 1$ possible model classes with these $N_I$ substructures to be evaluated at each time step. Such a large number of model class candidates drastically deteriorates the real-time capability of system identification. By contrast, with the hierarchical model classes, only $N_I + 1$ model classes are considered in Hierarchical Level I:

$$\mathcal{C}_{i+1}^{I} = \big\{C_{i+1}^{(0)}, C_{i+1}^{(1)}, C_{i+1}^{(2)}, \ldots, C_{i+1}^{(N_I)}\big\} \tag{5.57}$$

where $C_{i+1}^{(m)}$, $m = 0, 1, 2, \ldots, N_I$, is the $m$th model class at the $(i+1)$th time step in Hierarchical Level I, constructed based on a specified parameterization of the stiffness matrix of the underlying structure. As a result, the number of model classes to be evaluated in Hierarchical Level I grows only linearly with the number of substructures. In particular, the baseline model class $C_{i+1}^{(0)}$ includes only one stiffness parameter, and the corresponding parameterization of the stiffness matrix is:

$$\mathbf{K}^{(0)}\big(\boldsymbol{\theta}_k^{(0)}\big) = \theta_{k,1}^{(0)} \sum_{n=1}^{N_c} \mathbf{K}_n^c \tag{5.58}$$

where $\theta_{k,1}^{(0)}$ is the sole stiffness parameter in the baseline model class $C_{i+1}^{(0)}$, and $\mathbf{K}_n^c$, $n = 1, 2, \ldots, N_c$, is the $n$th structural component stiffness matrix. For the remaining model classes in Hierarchical Level I, each model class has two stiffness parameters, and the corresponding parameterization of the stiffness matrix is given by:

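$$\mathbf{K}^{(m)}\big(\boldsymbol{\theta}_k^{(m)}\big) = \theta_{k,1}^{(m)} \mathbf{K}_I^{(m)} + \theta_{k,2}^{(m)} \sum_{\substack{s=1 \\ s \neq m}}^{N_I} \mathbf{K}_I^{(s)}, \quad m = 1, 2, \ldots, N_I \tag{5.59}$$

where $\mathbf{K}_I^{(m)}$, $m = 1, 2, \ldots, N_I$, is the $m$th substructural stiffness matrix of the model classes in Hierarchical Level I, given by:

$$\mathbf{K}_I^{(m)} = \sum_{n \in \mathbf{S}^{(m)}} \mathbf{K}_n^c, \quad m = 1, 2, \ldots, N_I \tag{5.60}$$

where $\mathbf{S}^{(m)}$ is the membership vector of the $m$th substructure, and $\mathbf{S}^{(1)}, \mathbf{S}^{(2)}, \ldots, \mathbf{S}^{(N_I)}$ form a mutually exclusive partition of the set $\{1, 2, \ldots, N_k\}$. The stiffness parameter vector for identification in model class $C_{i+1}^{(m)}$ is:

$$\boldsymbol{\theta}_k^{(m)} = \big[\theta_{k,1}^{(m)}, \theta_{k,2}^{(m)}\big]^T \tag{5.61}$$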

Note that $\boldsymbol{\theta}_k^{(m)}$ is time-dependent; this dependence is omitted for notational simplicity. In Eq. (5.59), the resolution of the model classes in Hierarchical Level I extends to the substructure level: each model class $\mathcal{C}_{i+1}^{(m)}$, $m = 1, 2, \dots, N_I$, in Hierarchical Level I (i.e., every class except the baseline model class $\mathcal{C}_{i+1}^{(0)}$) has a designated stiffness parameter $\theta_{k,1}^{(m)}$ for the $m$th substructure. For example, consider a one-hundred-story building with $N_I = 10$ substructures, each comprising ten stories. In Hierarchical Level I, $N_I + 1 = 11$ model classes are considered. The baseline model class treats the entire structure as its only substructure. Each of the remaining model classes uses two parameters to describe the stiffness matrix of the structure: one stiffness parameter for ten stories and the other for the remaining ninety stories. The ten-story groups designated in different model classes are mutually exclusive. After the model classes in Hierarchical Level I are constructed, their plausibilities are evaluated using Eq. (5.13). When the baseline model class $\mathcal{C}_{i+1}^{(0)}$ has the highest plausibility, the real-time system identification in Hierarchical Level I is sufficient for the underlying dynamical system. However, when a non-baseline model class $\mathcal{C}_{i+1}^{(m^*)}$, $m^* \neq 0$, dominates the plausibility, there are substantial modeling errors to be corrected, and they can be roughly located in the $m^*$th substructure. The algorithm then shifts to Hierarchical Level II.
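The construction in Eqs. (5.59) and (5.60) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a shear-type building with unit story stiffness, so that each component matrix $\mathbf{K}_n^c$ couples floors $n-1$ and $n$ (the same pattern as Eqs. (5.81) and (5.82) later in this chapter). The helper names `story_component`, `substructure_matrices` and `level_one_stiffness` are hypothetical. The partition reproduces the one-hundred-story example with ten ten-story substructures.

```python
import numpy as np

def story_component(n, n_dof, k=1.0):
    """Component stiffness matrix K_n^c of the nth story (1-based) of a shear building.

    Story 1 links floor 1 to the ground; story n > 1 links floors n-1 and n.
    """
    K = np.zeros((n_dof, n_dof))
    i = n - 1
    K[i, i] += k
    if n > 1:
        K[i - 1, i - 1] += k
        K[i - 1, i] -= k
        K[i, i - 1] -= k
    return K

def substructure_matrices(components, partition):
    """Eq. (5.60): K_I^(m) is the sum of the component matrices listed in S^(m)."""
    return [sum(components[n - 1] for n in S) for S in partition]

def level_one_stiffness(K_sub, m, theta):
    """Eq. (5.59) for non-baseline class m (1-based): designated part + lumped remainder."""
    rest = sum(K for s, K in enumerate(K_sub, start=1) if s != m)
    return theta[0] * K_sub[m - 1] + theta[1] * rest

# One-hundred-story example: N_I = 10 substructures of ten stories each.
n_dof = 100
components = [story_component(n, n_dof) for n in range(1, n_dof + 1)]
partition = [list(range(10 * j + 1, 10 * j + 11)) for j in range(10)]
K_sub = substructure_matrices(components, partition)
n_classes = 1 + len(partition)  # baseline plus N_I non-baseline classes
K_1 = level_one_stiffness(K_sub, m=1, theta=(0.9, 1.0))
```

With $\boldsymbol{\theta}_k^{(m)} = [1, 1]^T$ every Level I class reduces to the nominal stiffness matrix, which is a convenient sanity check. The floor at the boundary between two substructures mixes both parameters, as `K_1[9, 9]` shows.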

5.5.1.2 Model Classes and Parameterization in Hierarchical Level II

In Hierarchical Level I, when a non-baseline model class $\mathcal{C}_{i+1}^{(m^*)}$ ($m^* \neq 0$) dominates the plausibility for $N_t$ consecutive time steps, the model classes in Hierarchical Level II are triggered to calibrate the stiffness matrices at a finer level. Let $N_{II}$ denote the number of structural components in the $m^*$th substructure. The model classes in Hierarchical Level II focus on all the structural components in the $m^*$th substructure and are expressed as follows:



$$\mathcal{C}_{i+1}^{II} = \big\{\mathcal{C}_{i+1}^{(0)}, \mathcal{C}_{i+1}^{(m^*,1)}, \mathcal{C}_{i+1}^{(m^*,2)}, \dots, \mathcal{C}_{i+1}^{(m^*,N_{II})}\big\} \tag{5.62}$$

where $\mathcal{C}_{i+1}^{(0)}$ is the baseline model class, whose stiffness matrix is given by Eq. (5.58); $\mathcal{C}_{i+1}^{(m^*,q)}$, $q = 1, 2, \dots, N_{II}$, represents the $q$th model class in Hierarchical Level II, where $m^*$ and $q$ in its superscript denote the $m^*$th substructure selected in Hierarchical Level I and the $q$th component in the selected substructure, respectively. The component stiffness matrices in the $m^*$th substructure can be expressed as follows:

$$\mathbf{K}_{s(1)}^c, \mathbf{K}_{s(2)}^c, \dots, \mathbf{K}_{s(N_{II})}^c \tag{5.63}$$

where $s(q) = \mathbf{S}^{(m^*)}(q)$, $q = 1, 2, \dots, N_{II}$, is the $q$th element of the membership vector $\mathbf{S}^{(m^*)}$ of the $m^*$th substructure. In other words, $\mathbf{K}_{s(q)}^c$, $q = 1, 2, \dots, N_{II}$, is the $q$th component stiffness matrix in the $m^*$th substructure. The stiffness matrices for model classes $\mathcal{C}_{i+1}^{(m^*,q)}$, $q = 1, 2, \dots, N_{II}$, in Hierarchical Level II are then given by:

$$\mathbf{K}^{(m^*,q)}\big(\boldsymbol{\theta}_k^{(m^*,q)}\big) = \theta_{k,1}^{(m^*,q)}\mathbf{K}_{s(q)}^c + \theta_{k,2}^{(m^*,q)}\sum_{\substack{n=1\\ n\neq s(q)}}^{N_c}\mathbf{K}_n^c, \qquad q = 1, 2, \dots, N_{II} \tag{5.64}$$

where $\boldsymbol{\theta}_k^{(m^*,q)}$ is the stiffness parameter vector for identification in model class $\mathcal{C}_{i+1}^{(m^*,q)}$, defined as:

$$\boldsymbol{\theta}_k^{(m^*,q)} = \big[\theta_{k,1}^{(m^*,q)}, \theta_{k,2}^{(m^*,q)}\big]^T \tag{5.65}$$

The resolution of the model classes in Hierarchical Level II thus extends to the structural component level: each model class $\mathcal{C}_{i+1}^{(m^*,q)}$, $q = 1, 2, \dots, N_{II}$, has a designated stiffness parameter $\theta_{k,1}^{(m^*,q)}$ for the $q$th component in the $m^*$th substructure. The model classes in Hierarchical Level II are evaluated according to their plausibilities using Eq. (5.13). When a non-baseline model class $\mathcal{C}_{i+1}^{(m^*,q^*)}$ dominates the plausibility among all $N_{II} + 1$ model class candidates in Hierarchical Level II, there are substantial modeling errors to be corrected, and they can be located more precisely in the $s(q^*)$th structural component. The algorithm then shifts to the interhealing process to calibrate the modeling error in the stiffness matrix of the $s(q^*)$th structural component.
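Eq. (5.13) itself is not reproduced in this section. Assuming it is the standard recursive Bayes update, in which the plausibility of each class is multiplied by its conditional evidence (Eq. (5.28)) and renormalized, the update can be sketched in the log domain so that very small evidence values do not underflow. The function name `update_plausibilities` and the evidence values in the example are hypothetical.

```python
import numpy as np

def update_plausibilities(log_prior, log_evidence):
    """One recursive Bayes step for model class plausibilities, computed in log space.

    log_prior    : ln P(C^(m) | D_i) for each class m
    log_evidence : ln p(y_{i+1} | D_i, C^(m)), the conditional evidence of each class
    returns      : ln P(C^(m) | D_{i+1})
    """
    log_post = np.asarray(log_prior, dtype=float) + np.asarray(log_evidence, dtype=float)
    return log_post - np.logaddexp.reduce(log_post)  # normalise without underflow

# Three classes with equal priors; class 0 explains every new sample slightly better.
log_p = np.log(np.full(3, 1.0 / 3.0))
for _ in range(50):
    log_p = update_plausibilities(log_p, [0.0, -0.1, -0.2])
plausibility = np.exp(log_p)
```

Small per-step differences in evidence accumulate multiplicatively, which is why a single class can quickly reach a plausibility close to unity once it consistently fits the data better.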

5.5.2 Interhealing Mechanism

When model class $\mathcal{C}_{i+1}^{(m^*,q^*)}$ achieves the maximum plausibility among the $N_{II} + 1$ model classes in Hierarchical Level II for $N_t$ consecutive time steps, the interhealing mechanism is triggered to correct the modeling errors in the stiffness matrix of the $s(q^*)$th structural component. Based on the identification results of model class $\mathcal{C}_{i+1}^{(m^*,q^*)}$ in Hierarchical Level II, all the component stiffness matrices are calibrated as follows:

$$\bar{\mathbf{K}}_{s(q^*)}^c = \hat{\theta}_{k,1}^{(m^*,q^*)}\mathbf{K}_{s(q^*)}^c \tag{5.66}$$

$$\bar{\mathbf{K}}_n^c = \hat{\theta}_{k,2}^{(m^*,q^*)}\mathbf{K}_n^c, \qquad n = 1, 2, \dots, N_k,\ n \neq s(q^*) \tag{5.67}$$

where $\bar{\mathbf{K}}_n^c$, $n = 1, 2, \dots, N_k$, is the calibrated component stiffness matrix, and $\hat{\theta}_{k,1}^{(m^*,q^*)}$ and $\hat{\theta}_{k,2}^{(m^*,q^*)}$ are the updated stiffness parameters of model class $\mathcal{C}_{i+1}^{(m^*,q^*)}$. The stiffness matrices in Hierarchical Levels I and II can then be updated based on the calibrated component stiffness matrices. In particular, the stiffness matrix of the baseline model class $\mathcal{C}_{i+1}^{(0)}$ is updated by:

$$\bar{\mathbf{K}}^{(0)}\big(\boldsymbol{\theta}_k^{(0)}\big) = \theta_{k,1}^{(0)}\sum_{n=1}^{N_k}\bar{\mathbf{K}}_n^c \tag{5.68}$$

where $\bar{\mathbf{K}}^{(0)}$ is the updated stiffness matrix of the baseline model class. In Hierarchical Level I, the stiffness matrices of the non-baseline model classes are updated as follows:

$$\bar{\mathbf{K}}^{(m)}\big(\boldsymbol{\theta}_k^{(m)}\big) = \theta_{k,1}^{(m)}\bar{\mathbf{K}}_I^{(m)} + \theta_{k,2}^{(m)}\sum_{\substack{s=1\\ s\neq m}}^{N_I}\bar{\mathbf{K}}_I^{(s)}, \qquad m = 1, 2, \dots, N_I \tag{5.69}$$

where $\bar{\mathbf{K}}_I^{(m)}$, $m = 1, 2, \dots, N_I$, is the $m$th updated substructural stiffness matrix of the model classes in Hierarchical Level I:

$$\bar{\mathbf{K}}_I^{(m)} = \sum_{n\in\mathbf{S}^{(m)}}\bar{\mathbf{K}}_n^c, \qquad m = 1, 2, \dots, N_I \tag{5.70}$$

In Hierarchical Level II, the stiffness matrices of the non-baseline model classes are updated as follows:

$$\bar{\mathbf{K}}^{(m^*,q)}\big(\boldsymbol{\theta}_k^{(m^*,q)}\big) = \theta_{k,1}^{(m^*,q)}\bar{\mathbf{K}}_{s(q)}^c + \theta_{k,2}^{(m^*,q)}\sum_{\substack{n=1\\ n\neq s(q)}}^{N_c}\bar{\mathbf{K}}_n^c, \qquad q = 1, 2, \dots, N_{II} \tag{5.71}$$

After the component stiffness matrices are calibrated and the stiffness matrices of the model classes in Hierarchical Levels I and II are updated, the algorithm returns to Hierarchical Level II with the updated model classes. The prior plausibilities in Hierarchical Level II are then reset to uniform:

$$P\big(\mathcal{C}_{i+1}^{(0)}\big) = P\big(\mathcal{C}_{i+1}^{(m^*,q)}\big) = \frac{1}{N_{II} + 1}, \qquad q = 1, 2, \dots, N_{II} \tag{5.72}$$

Equation (5.72) implies that all $N_{II} + 1$ model classes in Hierarchical Level II restart with the same plausibility for the calculation at the next time step.
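The calibration step of Eqs. (5.66) to (5.68) and the reset in Eq. (5.72) amount to simple rescaling and bookkeeping, as the following minimal sketch shows. The helper names and the small placeholder matrices are hypothetical; `theta_hat` stands for the updated parameters $\hat{\theta}_{k,1}^{(m^*,q^*)}$ and $\hat{\theta}_{k,2}^{(m^*,q^*)}$.

```python
import numpy as np

def interheal(components, s_star, theta_hat):
    """Eqs. (5.66)-(5.67): return the calibrated component stiffness matrices.

    components : list of K_n^c, where list index 0 corresponds to n = 1
    s_star     : 1-based index s(q*) of the component flagged in Level II
    theta_hat  : (theta_k1, theta_k2) updated in the most plausible Level II class
    """
    return [
        (theta_hat[0] if n == s_star else theta_hat[1]) * K
        for n, K in enumerate(components, start=1)
    ]

def baseline_stiffness(calibrated, theta_k1):
    """Eq. (5.68): the baseline class scales the sum of the calibrated components."""
    return theta_k1 * sum(calibrated)

def reset_level_two_plausibilities(n_ii):
    """Eq. (5.72): uniform prior over the baseline and the N_II non-baseline classes."""
    return np.full(n_ii + 1, 1.0 / (n_ii + 1))

# Placeholder example: three 2x2 components, component 2 flagged with a 5% loss.
comps = [k * np.eye(2) for k in (1.0, 2.0, 3.0)]
calibrated = interheal(comps, s_star=2, theta_hat=(0.95, 1.01))
K0_bar = baseline_stiffness(calibrated, theta_k1=1.0)
prior = reset_level_two_plausibilities(n_ii=4)
```

Because the correction is absorbed into the component matrices themselves, every model class rebuilt from them (Eqs. (5.68) to (5.71)) inherits it without any change to its own parameterization.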

5.5.3 Triggering Conditions

The algorithm for real-time system identification using hierarchical interhealable model classes has three stages, i.e., Hierarchical Level I, Hierarchical Level II and the interhealing process. Three types of transitions among these stages are introduced, with the following triggering conditions. The first type of transition is from Hierarchical Level I to Hierarchical Level II, and it is triggered when the following two criteria are satisfied:

$$m^* \neq 0 \tag{5.73}$$

$$\mathcal{C}_{i+j}^{(m^*)} = \mathcal{C}_i^{(m^*)}, \qquad j = 1, 2, \dots, N_t \tag{5.74}$$

In other words, level up, i.e., the shift from Hierarchical Level I to Hierarchical Level II, is triggered when a non-baseline model class $\mathcal{C}_i^{(m^*)}$ achieves the maximum plausibility for $N_t$ consecutive time steps. The second type of transition is from Hierarchical Level II to interhealing, and it is triggered when the following two criteria are satisfied:

$$m^* \neq 0 \tag{5.75}$$

$$\mathcal{C}_{i+j}^{(m^*,q^*)} = \mathcal{C}_i^{(m^*,q^*)}, \qquad j = 1, 2, \dots, N_t \tag{5.76}$$

Equations (5.75) and (5.76) indicate that interhealing is triggered when a non-baseline model class $\mathcal{C}_i^{(m^*,q^*)}$ in Hierarchical Level II achieves the maximum plausibility for $N_t$ consecutive time steps. The third type of transition is from Hierarchical Level II back to Level I, and it is triggered when the following criteria are satisfied:

$$m^* = 0 \tag{5.77}$$

$$\mathcal{C}_{i+j}^{(m^*)} = \mathcal{C}_i^{(m^*)}, \qquad j = 1, 2, \dots, N_t \tag{5.78}$$

In other words, level down, i.e., the shift from Hierarchical Level II back to Level I, is triggered when the baseline model class in Hierarchical Level II is the most plausible and remains so for $N_t$ consecutive time steps.
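The three transitions can be organized as a small state machine driven by the label of the most plausible class at each time step. In this sketch (hypothetical class `StageTracker`), the baseline class is labeled `0`, a Level I class by its index $m$, and a Level II class by the pair $(m^*, q)$. "$N_t$ consecutive time steps" is interpreted as the same class being most plausible at $N_t$ successive steps, which is one reasonable reading of Eqs. (5.74), (5.76) and (5.78).

```python
class StageTracker:
    """Hypothetical tracker for the transitions of Eqs. (5.73)-(5.78)."""

    def __init__(self, n_t):
        self.n_t = n_t            # required number of consecutive time steps N_t
        self.stage = "level_I"    # "level_I" or "level_II"
        self._last = None         # most plausible class at the previous step
        self._count = 0           # consecutive steps that class has stayed on top

    def _restart_count(self):
        self._last, self._count = None, 0

    def step(self, best):
        """best: 0 for the baseline, m for a Level I class, (m, q) for a Level II class."""
        self._count = self._count + 1 if best == self._last else 1
        self._last = best
        if self._count < self.n_t:
            return "stay"
        if best != 0 and self.stage == "level_I":    # Eqs. (5.73)-(5.74)
            self.stage = "level_II"
            self._restart_count()
            return "level_up"
        if best != 0 and self.stage == "level_II":   # Eqs. (5.75)-(5.76)
            self._restart_count()  # plausibilities are reset by Eq. (5.72)
            return "interheal"
        if best == 0 and self.stage == "level_II":   # Eqs. (5.77)-(5.78)
            self.stage = "level_I"
            self._restart_count()
            return "level_down"
        return "stay"  # baseline dominating in Level I: keep going


tracker = StageTracker(n_t=3)
sequence = [0, 0, 0, 0, 2, 2, 2, (2, 1), (2, 1), (2, 1), 0, 0, 0]
actions = [tracker.step(best) for best in sequence]
```

Restarting the counter after every transition mirrors the fact that the plausibilities restart from uniform values, so a fresh run of $N_t$ steps is required before the next decision.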

5.5.4 Procedure of the Real-Time System Identification Using Hierarchical Interhealable Model Classes

5.5.4.1 Training Stage

A short training stage lasting roughly two fundamental periods of the nominal model is implemented. The length of this stage is flexible, and a rough estimate of the fundamental frequency can easily be obtained from the Fourier spectrum of the response of the dynamical system.

1. At the initial time step, i.e., $i = 0$, set the initial conditions:
   (1) Define the structural component stiffness matrices $\mathbf{K}_n^c$, $n = 1, 2, \dots, N_k$.
   (2) Define the stiffness matrix of the baseline model class and the substructural stiffness matrices $\mathbf{K}_I^{(m)}$, $m = 1, 2, \dots, N_I$, for the $N_I$ non-baseline model classes in Hierarchical Level I.
   (3) Set the initial state vector and the initial covariance matrix.
   (4) Set the initial prior plausibilities as $P\big(\mathcal{C}_1^{(m)}|\mathcal{D}_0\big) = 1/(N_I + 1)$, $m = 0, 1, 2, \dots, N_I$.
   (5) Set the lower bound of the plausibility $P_l$ and the number of consecutive time steps $N_t$.
2. At the $i$th ($i \geq 0$) time step, calculate the conditional evidence using Eq. (5.28) and the plausibility using Eq. (5.13). If the plausibility of a model class is lower than $P_l$, replace it by $P_l$. Then, normalize the plausibilities of the remaining model classes so that the plausibilities of all the model classes sum to unity.
3. Terminate the training stage once it has been carried out for $N_t$ time steps. Otherwise, continue from Step 2 at the next time step.

5.5.4.2 Working Stage

When the training stage is terminated, the algorithm enters the working stage.

1. At the $i$th time step, calculate the conditional evidence of the model classes in Hierarchical Level I using Eq. (5.28) and the plausibility using Eq. (5.13). If the plausibility of a model class is lower than $P_l$, replace it by $P_l$. Then, normalize the plausibilities of the remaining model classes so that the plausibilities of all the model classes sum to unity.
2. Determine the most plausible model class; its system identification results are taken as the current identification results.
3. Check the triggering conditions for level up in Eqs. (5.73) and (5.74).
   (1) If the triggering criteria for level up are fulfilled, shift to Hierarchical Level II (Step 4) at the next time step.
   (2) Otherwise, go back to Step 1 for the next time step.
4. Construct the stiffness matrices of the model classes in Hierarchical Level II.
5. Calculate the conditional evidence using Eq. (5.28) and the plausibility using Eq. (5.13).
6. Check the triggering conditions for interhealing in Eqs. (5.75) and (5.76).
   (1) If the triggering criteria for interhealing are fulfilled:
      (a) Calibrate the component stiffness matrices using the model updating results of the most plausible model class, based on Eqs. (5.66) and (5.67).
      (b) Reconfigure the stiffness matrix of the baseline model class and the substructural stiffness matrices of all the non-baseline model classes in Hierarchical Levels I and II.
      (c) Reset the plausibilities of all the model classes in Hierarchical Level II to a uniform distribution using Eq. (5.72).
   (2) Otherwise, go back to Step 5 for the next time step.
7. Check the triggering conditions for level down in Eqs. (5.77) and (5.78).
   (1) If the triggering criteria for level down are fulfilled, go back to Step 1 for the next time step.
   (2) Otherwise, go back to Step 5 for the next time step.
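The plausibility floor used in both stages can be sketched as follows, assuming the input plausibilities already sum to one. Classes below $P_l$ are set to $P_l$, and the remaining classes are rescaled to take up the rest of the unit total. The function name is hypothetical. The rescaling factor is slightly below one, so a class sitting just above $P_l$ could drop marginally below it; this sketch does not iterate to handle that edge case.

```python
import numpy as np

def floor_and_normalize(p, p_low):
    """Raise plausibilities below p_low to p_low and rescale the others to sum to one."""
    p = np.asarray(p, dtype=float)
    low = p < p_low
    out = np.empty_like(p)
    out[low] = p_low
    remaining_mass = 1.0 - p_low * np.count_nonzero(low)  # mass left for the other classes
    out[~low] = p[~low] * remaining_mass / p[~low].sum()
    return out

p_new = floor_and_normalize([0.99995, 0.00004, 0.00001], 1e-4)
```

The floor keeps a currently implausible class from collapsing to numerical zero, so it can regain plausibility quickly when the structure changes.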

5.6 Applications to Bayesian Real-Time Model Class Selection for System Identification

5.6.1 Identification of High-Rise Building with Predefined Model Classes

A fifty-story building with degrading stiffness is considered. The building has uniformly distributed floor mass and interstory stiffness over its height. The stiffness-to-mass ratio is assumed to be 1600 s$^{-2}$, so the fundamental frequency of the building is 0.198 Hz. The Rayleigh damping model is used, i.e., $\mathbf{C} = \alpha\mathbf{M} + \beta\mathbf{K}$, with damping coefficients $\alpha = 0.019$ s$^{-1}$ and $\beta = 0.004$ s, so that the damping ratios are 1% for the first two modes. The building was subjected to ground excitation modeled as zero-mean stationary Gaussian white noise with spectral intensity $S_0 = 6.0\times10^{-5}$ m$^2$/s$^3$. The entire monitoring duration was 200 s and the sampling frequency was 200 Hz. The measurements consisted of the ground excitation and the acceleration responses of the 1st–3rd, 5th, 8th, 10th, 15th, 20th, 30th, 40th, and 50th floors. The measurement noise was modeled as a Gaussian i.i.d. process with an rms of 5% of the corresponding noise-free quantities. Two structural damages occurred during the monitoring period: at $t = 80$ s, the first damage caused a 5% stiffness loss in the lowest story; at $t = 150$ s, the second damage caused an 8% stiffness loss in the third story.
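The quoted fundamental frequency and damping coefficients can be checked numerically. Assuming a uniform shear building fixed at the base (equal floor masses $m$ and story stiffnesses $k$ with $k/m = 1600$ s$^{-2}$), the natural circular frequencies are the square roots of the eigenvalues of $\mathbf{M}^{-1}\mathbf{K}$. The Rayleigh coefficients that give the same damping ratio $\zeta$ at $\omega_1$ and $\omega_2$ are $\alpha = 2\zeta\omega_1\omega_2/(\omega_1+\omega_2)$ and $\beta = 2\zeta/(\omega_1+\omega_2)$.

```python
import numpy as np

n_floors = 50
k_over_m = 1600.0  # stiffness-to-mass ratio k/m, in s^-2

# M^{-1} K of a uniform shear building fixed at the base (tridiagonal; top floor has one spring).
A = k_over_m * (2.0 * np.eye(n_floors) - np.eye(n_floors, k=1) - np.eye(n_floors, k=-1))
A[-1, -1] = k_over_m
omega = np.sqrt(np.linalg.eigvalsh(A))  # natural circular frequencies in ascending order, rad/s
f1 = omega[0] / (2.0 * np.pi)           # fundamental frequency, Hz

# Rayleigh coefficients giving the same damping ratio zeta in modes 1 and 2.
zeta = 0.01
w1, w2 = omega[0], omega[1]
alpha = 2.0 * zeta * w1 * w2 / (w1 + w2)  # mass-proportional coefficient, s^-1
beta = 2.0 * zeta / (w1 + w2)             # stiffness-proportional coefficient, s
```

The computed values (about 0.198 Hz, $\alpha \approx 0.0187$ s$^{-1}$ and $\beta \approx 0.0040$ s) agree with the rounded values quoted above.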

Five model class candidates, $\mathcal{C}_{i+1} = \{\mathcal{C}_{i+1}^{(1)}, \mathcal{C}_{i+1}^{(2)}, \mathcal{C}_{i+1}^{(3)}, \mathcal{C}_{i+1}^{(4)}, \mathcal{C}_{i+1}^{(5)}\}$, are established to represent different possible damage scenarios of the structure. Table 5.1 shows the model class candidates with their different parameterizations. The first column lists the stiffness parameters used to characterize the stiffness matrix of the corresponding model class, and the second to sixth columns give the specific parameterizations of the model class candidates $\mathcal{C}_{i+1}^{(1)}$ to $\mathcal{C}_{i+1}^{(5)}$, respectively. The numbers in Table 5.1 refer to the stories represented by the corresponding parameter of a model class. For example, the fifth column indicates that model class $\mathcal{C}_{i+1}^{(4)}$ uses six stiffness parameters to describe the stiffness matrix of the building: $\theta_{k,1}$, $\theta_{k,2}$ and $\theta_{k,3}$ correspond to the 1st, 2nd and 3rd story, respectively, while $\theta_{k,4}$, $\theta_{k,5}$ and $\theta_{k,6}$ correspond to the 4th–5th, 6th–10th and 11th–50th stories, respectively. Model class $\mathcal{C}_{i+1}^{(1)}$ is therefore the simplest, with only one stiffness parameter characterizing the stiffness matrix of the entire structure, whereas model class $\mathcal{C}_{i+1}^{(5)}$ is the most complex, with seven stiffness parameters governing the stiffness matrix of the entire structure.

Table 5.1 Model class candidates with different parameterizations

| Model class | $\mathcal{C}_{i+1}^{(1)}$ | $\mathcal{C}_{i+1}^{(2)}$ | $\mathcal{C}_{i+1}^{(3)}$ | $\mathcal{C}_{i+1}^{(4)}$ | $\mathcal{C}_{i+1}^{(5)}$ |
|---|---|---|---|---|---|
| $\theta_{k,1}$ | 1–50 | 1 | 1 | 1 | 1 |
| $\theta_{k,2}$ | | 2–5 | 2 | 2 | 2 |
| $\theta_{k,3}$ | | 6–10 | 3–5 | 3 | 3 |
| $\theta_{k,4}$ | | 11–50 | 6–10 | 4–5 | 4 |
| $\theta_{k,5}$ | | | 11–50 | 6–10 | 5 |
| $\theta_{k,6}$ | | | | 11–50 | 6–10 |
| $\theta_{k,7}$ | | | | | 11–50 |

A uniform prior plausibility $P\big(\mathcal{C}_1^{(m)}|\mathcal{D}_0\big) = 1/5$, $m = 1, 2, 3, 4, 5$, was used, indicating no prior preference for any model class candidate before the observations were acquired.

Figures 5.1 and 5.2 show the time histories of the log-plausibility $\ln P\big(\mathcal{C}_{i+1}^{(m)}|\mathcal{D}_{i+1}\big)$ and the plausibility $P\big(\mathcal{C}_{i+1}^{(m)}|\mathcal{D}_{i+1}\big)$ of the five model classes, respectively. The two vertical dashed lines indicate the time instants at which the structural damages occurred. The plausibilities of the model class candidates started from the same initial value and fluctuated severely at the beginning of the identification due to inaccurate initial values and the lack of measurements. Before $t = 80$ s, model class $\mathcal{C}_{i+1}^{(1)}$ outperformed the other model classes since it was sufficient to represent the undamaged structure. At $t = 80$ s, the damage of the first story occurred: the plausibility of model class $\mathcal{C}_{i+1}^{(1)}$ decreased rapidly, and model class $\mathcal{C}_{i+1}^{(2)}$ became the most plausible model class at $t = 82.110$ s. This switch of the most plausible model class can be interpreted as follows.
Since model class $\mathcal{C}_{i+1}^{(1)}$ has no designated stiffness parameter for the first story, it was incapable of describing the damaged structure and therefore underfitted the measurements after the first structural damage occurred. On the other hand, model classes $\mathcal{C}_{i+1}^{(2)}$, $\mathcal{C}_{i+1}^{(3)}$, $\mathcal{C}_{i+1}^{(4)}$ and $\mathcal{C}_{i+1}^{(5)}$ all have a designated stiffness parameter for the first story, and among them $\mathcal{C}_{i+1}^{(2)}$ is the simplest. The penalty for more complex parameterizations prevented model classes $\mathcal{C}_{i+1}^{(3)}$, $\mathcal{C}_{i+1}^{(4)}$ and $\mathcal{C}_{i+1}^{(5)}$ from becoming the most plausible model class. As a result, although $\mathcal{C}_{i+1}^{(2)}$, $\mathcal{C}_{i+1}^{(3)}$, $\mathcal{C}_{i+1}^{(4)}$ and $\mathcal{C}_{i+1}^{(5)}$ had similar maximum likelihood values, the simplest of them, $\mathcal{C}_{i+1}^{(2)}$, was the most suitable after the first structural damage occurred. A small time delay in the switching of the most plausible model class was observed because the identification results depend on the data at the current and previous time steps. Such a time lag is inevitable, but it was acceptably small.

Fig. 5.1 Log-plausibility of the model classes

At $t = 150$ s, the second damage occurred in the third story. The plausibility of model class $\mathcal{C}_{i+1}^{(2)}$ descended quickly, and model class $\mathcal{C}_{i+1}^{(4)}$ became the most plausible model class at $t = 155.135$ s. Because the first and third stories now had different stiffness losses, the model classes without designated stiffness parameters for both of these stories, namely $\mathcal{C}_{i+1}^{(1)}$, $\mathcal{C}_{i+1}^{(2)}$ and $\mathcal{C}_{i+1}^{(3)}$, underfitted the measurements after the second damage occurred. Meanwhile, model classes $\mathcal{C}_{i+1}^{(4)}$ and $\mathcal{C}_{i+1}^{(5)}$ were capable of tracking the stiffness losses of the first and third story. However, model class $\mathcal{C}_{i+1}^{(4)}$ was preferred, with a dominant plausibility almost equal to unity, because it is simpler than model class $\mathcal{C}_{i+1}^{(5)}$.

Fig. 5.2 Plausibility of the model classes

Figures 5.3, 5.4, 5.5, 5.6, and 5.7 show the parametric identification results of model classes $\mathcal{C}_{i+1}^{(1)}$ to $\mathcal{C}_{i+1}^{(5)}$, respectively. The dotted lines represent the estimated values, the solid lines the actual values, and the dashed lines the bounds of the 99.7% credible intervals. Before the first structural damage occurred, the estimated stiffness parameters in all the model classes were approximately equal to unity. After the stiffness of the first story decreased at $t = 80$ s, no significant change was observed in the estimated stiffness parameter of model class $\mathcal{C}_{i+1}^{(1)}$. This is because $\mathcal{C}_{i+1}^{(1)}$ has only one stiffness parameter for the stiffness matrix of the entire structure, so the first structural damage was diluted in the estimate and became undetectable. Model class $\mathcal{C}_{i+1}^{(1)}$ therefore underfitted the measurements from the damaged structure, resulting in the rapid drop of its plausibility after $t = 80$ s. In contrast, the parametric identification results of model classes $\mathcal{C}_{i+1}^{(2)}$, $\mathcal{C}_{i+1}^{(3)}$, $\mathcal{C}_{i+1}^{(4)}$ and $\mathcal{C}_{i+1}^{(5)}$ successfully reflected the stiffness loss of the first story through $\theta_{k,1}$, which is consistent with the rapid increase of their plausibilities shortly after the first structural damage occurred. Afterwards, the second structural damage, an 8% stiffness loss in the third story, occurred at $t = 150$ s. The parametric identification results of model classes $\mathcal{C}_{i+1}^{(1)}$, $\mathcal{C}_{i+1}^{(2)}$ and $\mathcal{C}_{i+1}^{(3)}$ could not accurately capture this stiffness loss because it was diluted by their oversimplified parameterizations. On the other hand, model classes $\mathcal{C}_{i+1}^{(4)}$ and $\mathcal{C}_{i+1}^{(5)}$ were capable of tracking both stiffness losses in the first and third story. In addition, all the model classes provided satisfactory estimates of the damping coefficients. The parametric identification results from model classes $\mathcal{C}_{i+1}^{(1)}$ to $\mathcal{C}_{i+1}^{(5)}$ therefore confirm that $\mathcal{C}_{i+1}^{(1)}$, $\mathcal{C}_{i+1}^{(2)}$ and $\mathcal{C}_{i+1}^{(4)}$ were the most suitable model classes for the structure in the undamaged state, after the 10% stiffness loss of the 1st story, and after the 8% stiffness loss of the third story, respectively. The proposed method could select the most appropriate model class from a set of model class candidates to describe the different states of the structure. Since the frequency content of a time-varying dynamical system changes over time, a single model class is insufficient to represent the system in all of its states. This confirms the advantage of the proposed approach, which performs model class selection and parametric identification simultaneously.

Fig. 5.3 Parametric identification results of model class $\mathcal{C}_{i+1}^{(1)}$

To obtain robust parametric identification results, the real-time multi-model estimation formula in Eq. (5.33) was used. Figure 5.8 shows the robust parametric identification results using all five model classes. The dotted lines represent the estimates obtained by multi-model estimation, the solid lines the actual values, and the dashed lines the bounds of the 99.7% credible intervals. The estimates successfully tracked the stiffness degradations in the first and third story and approached the actual values. In addition, the 99.7% credible intervals provided reasonable uncertainty bounds for the identification results. This confirms that the plausibilities of the model classes provided proper weightings, so that the multi-model estimation gave robust parametric identification results. Moreover, comparing Fig. 5.8 with Figs. 5.3, 5.4, 5.5, 5.6, and 5.7 shows that the estimates based on multiple model classes were superior to those obtained using any single model class in terms of accuracy and uncertainty.

Fig. 5.4 Parametric identification results of model class $\mathcal{C}_{i+1}^{(2)}$

Figures 5.9 and 5.10 show the actual versus the estimated displacement and velocity responses, respectively, of the 1st, 10th, 20th, 30th, 40th and 50th floors using multiple model classes. The 45-degree line in each subplot of Figs. 5.9 and 5.10 is the reference of perfect match. The state estimates in the first 5 s were excluded from the comparison to eliminate the effect of inaccurate initial conditions. The estimated displacement and velocity responses in each subplot agree well with the corresponding actual responses. In the first subplot of Fig. 5.10, roughly 20 points give discrepant estimates of the velocity of the first story; these points correspond to the short periods immediately after the first and second structural damages occurred. Nevertheless, the majority of the points lie along the line of perfect match, indicating that the proposed method provided satisfactory estimates of the structural responses. Therefore, the identification results, including both model parameters and structural states, achieved high accuracy with the proposed approach.

Fig. 5.5 Parametric identification results of model class $\mathcal{C}_{i+1}^{(3)}$
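Eq. (5.33) is not reproduced in this section. A common way to combine estimates from several model classes, and the form assumed in this sketch, is a plausibility-weighted mixture: the combined mean is $\sum_m P_m\hat{\boldsymbol{\mu}}_m$, and the combined covariance adds the spread of the class means to the weighted within-class covariances. The sketch assumes every class reports an estimate of the same physical quantity (for example, the first-story stiffness ratio), which requires first mapping each class's parameters to that common quantity. The function name and the numbers are hypothetical.

```python
import numpy as np

def multi_model_estimate(plaus, means, covs):
    """Plausibility-weighted mixture of per-class estimates of a common quantity.

    plaus : (n_classes,) plausibilities summing to one
    means : (n_classes, d) posterior means from each class
    covs  : (n_classes, d, d) posterior covariances from each class
    """
    P = np.asarray(plaus, dtype=float)
    mu = np.asarray(means, dtype=float)
    S = np.asarray(covs, dtype=float)
    mean = P @ mu
    dev = mu - mean
    # Within-class covariance plus the spread of the class means around the mixture mean.
    cov = np.einsum("m,mij->ij", P, S) + np.einsum("m,mi,mj->ij", P, dev, dev)
    return mean, cov

mean, cov = multi_model_estimate([0.5, 0.5], [[1.0], [3.0]], [[[1.0]], [[1.0]]])
sigma = np.sqrt(np.diag(cov))
lower, upper = mean - 3.0 * sigma, mean + 3.0 * sigma
```

Under a Gaussian approximation, mean $\pm 3\sigma$ gives the 99.7% credible bounds plotted in the figures; the between-class term widens these bounds when the classes disagree.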

5.6.2 Identification of Bouc-Wen Nonlinear Hysteresis System with Self-Calibratable Model Classes

This application concerns a Bouc-Wen nonlinear hysteresis system with $N_d = 5$ DOFs, shown in Fig. 5.11. The Bouc-Wen model is one of the most commonly used hysteretic models for describing nonlinear hysteresis systems in structural engineering. The governing equation of this nonlinear system is:

$$\mathbf{M}\ddot{\mathbf{x}}(t) + \mathbf{C}[\boldsymbol{\theta}_c(t)]\dot{\mathbf{x}}(t) + \mathbf{K}[\boldsymbol{\theta}_k(t)]\mathbf{r}(t) = \mathbf{T}\mathbf{f}(t) \tag{5.79}$$

where $\ddot{\mathbf{x}}(t)$, $\dot{\mathbf{x}}(t)$ and $\mathbf{r}(t)$ are the acceleration, velocity and restoring force vectors, respectively; $\mathbf{M}$, $\mathbf{C}$ and $\mathbf{K}$ are the mass, damping and stiffness matrices of the system, respectively; the stiffness and damping matrices are parameterized with possibly time-varying parameters $\boldsymbol{\theta}_k(t)$ and $\boldsymbol{\theta}_c(t)$, respectively; $\mathbf{f}$ is the excitation applied to the system and $\mathbf{T}$ is the associated influence matrix. The stiffness matrix $\mathbf{K}$ has the following form:

$$\mathbf{K} = \sum_{n=1}^{5}\mathbf{K}_n^c \tag{5.80}$$

where $\mathbf{K}_n^c$, $n = 1, 2, \dots, 5$, is the $n$th component stiffness matrix given by:

$$\mathbf{K}_1^c = k\begin{bmatrix} 1 & \mathbf{0}_{1\times4} \\ \mathbf{0}_{4\times1} & \mathbf{0}_{4\times4}\end{bmatrix} \tag{5.81}$$

$$\mathbf{K}_n^c = k\begin{bmatrix} \mathbf{0}_{(n-2)\times5} \\ \begin{matrix}\mathbf{0}_{2\times(n-2)} & \begin{matrix}1 & -1\\ -1 & 1\end{matrix} & \mathbf{0}_{2\times(5-n)}\end{matrix} \\ \mathbf{0}_{(5-n)\times5}\end{bmatrix}, \qquad n = 2, \dots, 5 \tag{5.82}$$

The damping matrix $\mathbf{C}$ is taken as:

$$\mathbf{C} = \alpha\mathbf{M} + \beta\mathbf{K} \tag{5.83}$$

where $\alpha$ and $\beta$ are the damping coefficients. The restoring force vector $\mathbf{r}(t)$ is given by:

$$\mathbf{r}(t) = [r_1(t), r_2(t), r_3(t), r_4(t), r_5(t)]^T \tag{5.84}$$

$$\dot{r}_n(t) = \dot{x}_n - \mu|\dot{x}_n||r_n|^{\eta-1}r_n - \kappa\dot{x}_n|r_n|^{\eta}, \qquad n = 1, 2, \dots, 5 \tag{5.85}$$

where $\mu$, $\kappa$ and $\eta$ are the characteristic parameters governing the shape and smoothness of the hysteresis loops. For the undamaged structure, the actual stiffness of each nonlinear spring was $k = 2000$ N/m. The damping coefficients were $\alpha = 0.379$ s$^{-1}$ and $\beta = 8.019\times10^{-4}$ s. The characteristic parameters of the Bouc-Wen system were $\mu = 1000$ s$^2$/m$^2$, $\kappa = 1500$ s$^2$/m$^2$ and $\eta = 2$.

Fig. 5.6 Parametric identification results of model class $\mathcal{C}_{i+1}^{(4)}$

Fig. 5.7 Parametric identification results of model class $\mathcal{C}_{i+1}^{(5)}$
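Eq. (5.85) can be evaluated directly. The sketch below integrates it for a single spring driven at a constant positive velocity, using forward Euler with the sampling interval of this example and the characteristic parameters quoted above; the prescribed velocity is an arbitrary illustrative value, not taken from the text. For monotonic loading with $\dot{x}_n > 0$ and $r_n > 0$, Eq. (5.85) reduces to $\dot{r}_n = \dot{x}_n[1 - (\mu+\kappa)r_n^{\eta}]$, so $r_n$ should plateau at $(\mu+\kappa)^{-1/\eta}$.

```python
MU, KAPPA, ETA = 1000.0, 1500.0, 2.0  # characteristic parameters quoted in the text

def bouc_wen_rate(x_dot, r, mu=MU, kappa=KAPPA, eta=ETA):
    """Right-hand side of Eq. (5.85) for a single spring."""
    return x_dot - mu * abs(x_dot) * abs(r) ** (eta - 1.0) * r - kappa * x_dot * abs(r) ** eta

dt = 0.002     # sampling interval of this example, s
x_dot = 0.05   # illustrative constant velocity (not from the text)
r = 0.0
for _ in range(2000):  # 4 s of forward-Euler integration
    r += dt * bouc_wen_rate(x_dot, r)

r_plateau = (MU + KAPPA) ** (-1.0 / ETA)  # expected plateau for monotonic loading
```

Reversing the sign of `x_dot` after the plateau is reached would trace the unloading branch; the $|\dot{x}_n|$ term is what makes the loading and unloading paths differ and produces the hysteresis loop.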


Fig. 5.8 Parametric identification results using multiple model classes

The hysteresis system was subjected to zero-mean Gaussian white noise excitation with spectral intensity $S_0 = 6.0\times10^{-2}$ m$^2$/s$^3$. The entire monitoring period was 350 s and the sampling time interval was $\Delta t = 0.002$ s. The measurements consisted of the velocity responses of the 1st, 3rd and 5th DOFs. The measurement noise was modeled as a zero-mean Gaussian i.i.d. process with an rms of 5% of the corresponding noise-free response quantities. The structure was undamaged in the first 150 s. At $t = 150$ s, sudden damage with a 5% stiffness reduction occurred in the 1st nonlinear spring. Afterwards, at $t = 250$ s, sudden damages with 5% and 2% stiffness reductions occurred in the 2nd and 4th nonlinear springs, respectively. The lower bound of plausibility was set to $P_l = 1\times10^{-4}$, and $N_t = 500$ was used, corresponding to two small-amplitude fundamental periods of the nominal model.

Fig. 5.9 Actual versus estimated displacement responses using multiple model classes

Fig. 5.10 Actual versus estimated velocity responses using multiple model classes

Fig. 5.11 Bouc-Wen nonlinear hysteresis system with 5 DOFs

Six model class candidates with different parameterizations of the stiffness matrix were considered. The baseline model class $\mathcal{C}_{i+1}^{(0)}$ was defined as the simplest model class, using only one stiffness parameter to represent the stiffness matrix of the entire structure:

$$\mathbf{K}^{(0)}\big(\boldsymbol{\theta}_k^{(0)}\big) = \theta_{k,1}^{(0)}\mathbf{K}_1^{(0)} = \theta_{k,1}^{(0)}\sum_{n=1}^{5}\mathbf{K}_n^c \tag{5.86}$$

where $\theta_{k,1}^{(0)}$ is the uncertain stiffness parameter in the baseline model class $\mathcal{C}_{i+1}^{(0)}$. Each of the remaining model classes uses two stiffness parameters to parameterize the stiffness matrix of the entire structure. Specifically, in model class $\mathcal{C}_{i+1}^{(m)}$, $m = 1, 2, \dots, 5$, one stiffness parameter is assigned to the $m$th spring and the other represents all the other springs:

$$\mathbf{K}^{(m)}\big(\boldsymbol{\theta}_k^{(m)}\big) = \theta_{k,1}^{(m)}\mathbf{K}_1^{(m)} + \theta_{k,2}^{(m)}\mathbf{K}_2^{(m)} = \theta_{k,1}^{(m)}\mathbf{K}_m^c + \theta_{k,2}^{(m)}\sum_{\substack{n=1\\ n\neq m}}^{5}\mathbf{K}_n^c, \qquad m = 1, 2, \dots, 5 \tag{5.87}$$

where $\boldsymbol{\theta}_k^{(m)} = \big[\theta_{k,1}^{(m)}, \theta_{k,2}^{(m)}\big]^T$, $m = 1, 2, \dots, 5$, is the uncertain stiffness parameter vector in model class $\mathcal{C}_{i+1}^{(m)}$. In addition, all the model classes include two damping coefficients and three characteristic parameters of the Bouc-Wen model. As a result, the uncertain model parameter vector of the baseline model class $\mathcal{C}_{i+1}^{(0)}$ is:

$$\boldsymbol{\theta}^{(0)} = \big[\theta_{k,1}^{(0)}, \alpha^{(0)}, \beta^{(0)}, \mu^{(0)}, \kappa^{(0)}, \eta^{(0)}\big]^T \tag{5.88}$$

and the uncertain model parameter vector of model class $\mathcal{C}_{i+1}^{(m)}$, $m = 1, 2, \dots, 5$, is:

$$\boldsymbol{\theta}^{(m)} = \big[\theta_{k,1}^{(m)}, \theta_{k,2}^{(m)}, \alpha^{(m)}, \beta^{(m)}, \mu^{(m)}, \kappa^{(m)}, \eta^{(m)}\big]^T, \qquad m = 1, 2, \dots, 5 \tag{5.89}$$

Figure 5.12 shows the plausibilities of all six model classes. In the early stage of identification, the plausibilities of the model classes fluctuated severely due to inaccurate initial values and the lack of measurements. After more observations were acquired, the baseline model class $\mathcal{C}_{i+1}^{(0)}$ achieved the highest plausibility, implying that it was sufficient to represent the undamaged structure. At $t = 150$ s, the first damage occurred in the first nonlinear spring, so the baseline model class was no longer sufficient to represent the damaged structure. Model class $\mathcal{C}_{i+1}^{(1)}$, which has a designated stiffness parameter for the first spring, outperformed all the other model classes in data fitting and subsequently triggered calibrations at $t = 152.164$ s and $t = 153.256$ s. All the component stiffness matrices were then calibrated using the identification results of model class $\mathcal{C}_{i+1}^{(1)}$, and the six model classes were reconfigured, with their plausibilities reset to uniform for the calculation at the next time step. After the calibration, the stiffness loss of the first spring was incorporated in the corresponding component stiffness matrix and in the substructural stiffness matrices of all the model classes. As a result, the calibrated baseline model class achieved the highest plausibility in the time period [153.256, 250] s, since it was sufficient for the damaged structure. At $t = 250$ s, the 2nd and 4th nonlinear springs were damaged simultaneously, a pattern that none of the model classes could match, and the plausibility of the baseline model class decreased rapidly after these sudden damages. Meanwhile, model class $\mathcal{C}_{i+1}^{(2)}$ triggered calibrations at $t = 252.038$ s and $t = 255.162$ s, and model class $\mathcal{C}_{i+1}^{(4)}$ triggered a calibration at $t = 253.398$ s.
A series of calibrations is to be expected here, because none of the model class candidates could represent this damage pattern. After the three calibrations, the stiffness losses of the 2nd and 4th nonlinear springs were incorporated in the corresponding component stiffness matrices and in the substructural stiffness matrices of all the model classes. The baseline model class then remained the most plausible model class until the end of the monitoring period. Thus, although no model class candidate could represent the simultaneous stiffness losses of the 2nd and 4th springs, the self-calibrating mechanism sequentially corrected the modeling errors between the actual and the estimated models. When the structural stiffness changed, the model class whose more complex parameterization could accommodate the change was selected, and this most plausible model class then triggered calibration. After the calibration, the discrepancy between the actual and the estimated stiffness was corrected, and the baseline model class was again sufficient to represent the calibrated structural stiffness matrix. As a result, more reliable future predictions can be anticipated.

Fig. 5.12 Plausibility of the model classes of the Bouc-Wen model


Figure 5.13 shows the identification results of the stiffness parameters using the most plausible model class. The dotted lines represent the estimated values, the solid lines the actual values, and the dashed lines the bounds of the 99.7% credible intervals; the same line styles are used in the subsequent figure. In the early stage of identification, the estimated stiffness parameters fluctuated severely due to inaccurate initial values, the lack of measurements and rapid switching of the most plausible model class. Afterwards, the estimated stiffness parameters agreed well with the actual values, and the credible intervals provided a reasonable quantification of the estimation uncertainty. The credible intervals became much wider immediately after each calibration because the covariance matrix of the stiffness parameter vector was reset, and they narrowed rapidly as the filter propagated. A small time delay in parameter tracking is expected, since the identification results depend on the data at the current and previous time steps. Figure 5.14 shows the identification results of the damping coefficients and the characteristic parameters of the Bouc-Wen model. The estimated values agreed well with the actual values and lay within the 99.7% credible intervals. Figure 5.15 shows the actual values of the restoring forces versus the corresponding estimates obtained from the most plausible model class. The 45-degree line in each subplot is the reference of perfect match. The estimates in the first 5 s were excluded from the comparison to eliminate the effect of inaccurate initial conditions. The majority of the points lie along the line of perfect match, showing that the structural responses estimated by the proposed approach achieved high accuracy.

Fig. 5.13 Identification results of the stiffness parameters

Fig. 5.14 Identification results of the damping coefficients and characteristic parameters of the Bouc-Wen model

Fig. 5.15 Actual versus estimated restoring forces using the most plausible model class

5.6.3 Identification of Three-Dimensional Truss Dome with Hierarchical Interhealable Model Classes

A three-dimensional truss dome (shown in Figs. 5.16 and 5.17) is considered in this example. The truss dome has 19 nodes and 42 bars. It spans 60 m in both the x and y directions and reaches a maximum height of 12 m. The mass density and cross-sectional area of each member are 7850 kg/m³ and 5000 mm², respectively, and the modulus of elasticity is 2.0 × 10¹¹ Pa. With these properties, the first five natural frequencies of the space truss dome are 0.246, 1.403, 4.834, 5.449 and 7.724 Hz.

The Rayleigh damping model is utilized, and the damping matrix is given by $\mathbf{C} = \alpha\mathbf{M} + \beta\mathbf{K}$. The coefficients are α = 0.053 s⁻¹ and β = 0.004 s, so that the damping ratios are 2% for the first two modes.

The dome was subjected to horizontal and vertical ground motions, modeled as zero-mean Gaussian white noise with spectral intensity 0.12 m²/s³. The sampling time interval was $\Delta t$ = 0.002 s, and the entire monitoring period was 300 s. Triaxial acceleration responses of the 7th, 8th and 11th nodes were observed using three accelerometers, whose locations are marked as triangles in Fig. 5.17. The rms of the measurement noise was taken as 5% of the rms of the corresponding noise-free response quantities.

Structural damage was imposed during the monitoring period. The space truss dome was undamaged in the first 100 s. Then, the first damage, a 5% stiffness reduction, occurred in the 31st element in substructure 1 at t = 100 s. The second damage, also a 5% stiffness reduction, occurred in the 22nd and 40th elements in substructure 4 at t = 200 s.

The lower bound of plausibility was taken as $P_l = 1 \times 10^{-4}$. $N_t = 500$ was used, which corresponds to approximately a quarter of the fundamental period of the space truss dome: 500 × 0.002 s = 1 s, while the fundamental period is 1/0.246 ≈ 4.07 s. There were 42 components, with the component stiffness matrices given by:

$$
\mathbf{K}_1^c, \mathbf{K}_2^c, \ldots, \mathbf{K}_{42}^c \tag{5.90}
$$

Fig. 5.16 Space truss dome

Fig. 5.17 Space truss dome-plan view
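The Rayleigh coefficients quoted above can be checked with the standard two-mode formulas. For $\mathbf{C} = \alpha\mathbf{M} + \beta\mathbf{K}$, the damping ratio of mode $j$ is $\zeta_j = \alpha/(2\omega_j) + \beta\omega_j/2$. Prescribing the same $\zeta$ in two modes therefore gives $\alpha = 2\zeta\omega_1\omega_2/(\omega_1+\omega_2)$ and $\beta = 2\zeta/(\omega_1+\omega_2)$. The sketch below assumes the two targeted modes are the first two, at the frequencies listed above. The function names are hypothetical.

```python
import math


def rayleigh_coefficients(f1_hz, f2_hz, zeta):
    """alpha [1/s] and beta [s] giving damping ratio zeta at two frequencies (Hz)."""
    w1, w2 = 2.0 * math.pi * f1_hz, 2.0 * math.pi * f2_hz
    alpha = 2.0 * zeta * w1 * w2 / (w1 + w2)   # mass-proportional coefficient
    beta = 2.0 * zeta / (w1 + w2)              # stiffness-proportional coefficient
    return alpha, beta


def modal_damping_ratio(f_hz, alpha, beta):
    """Damping ratio of a mode at f_hz under C = alpha*M + beta*K."""
    w = 2.0 * math.pi * f_hz
    return alpha / (2.0 * w) + beta * w / 2.0
```

With 0.246 Hz, 1.403 Hz and ζ = 2%, this gives α ≈ 0.0526 s⁻¹ and β ≈ 0.00386 s, which round to the values used in the example. The stiffness-proportional term grows with frequency, so higher modes receive larger damping ratios. For instance, the rounded coefficients give roughly 6% at 4.834 Hz.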

The 42 members of the space truss dome were separated into six substructures, and the substructure numbers are marked in Fig. 5.17. Each substructure consisted of seven components. The number of model classes in Hierarchical Level I was taken as $N_{\mathrm{I}} + 1 = 7$, and these model classes were denoted as:

$$
\mathcal{C}_{i+1}^{\mathrm{I}} = \left\{C_{i+1}^{(0)}, C_{i+1}^{(1)}, C_{i+1}^{(2)}, \ldots, C_{i+1}^{(6)}\right\} \tag{5.91}
$$

where the stiffness matrix of the baseline model class $C_{i+1}^{(0)}$ was given by:

$$
\mathbf{K}^{(0)}\left(\boldsymbol{\theta}_k^{(0)}\right) = \theta_k^{(0)} \sum_{n=1}^{42} \mathbf{K}_n^c \tag{5.92}
$$

where $\theta_k^{(0)}$ is the sole stiffness parameter in the baseline model class $C_{i+1}^{(0)}$. The remaining model classes $C_{i+1}^{(1)}, C_{i+1}^{(2)}, \ldots, C_{i+1}^{(6)}$ in Hierarchical Level I each had two stiffness parameters, and their stiffness matrices were given by:

$$
\mathbf{K}^{(m)}\left(\boldsymbol{\theta}_k^{(m)}\right) = \theta_{k,1}^{(m)} \mathbf{K}_{\mathrm{I}}^{(m)} + \theta_{k,2}^{(m)} \sum_{s=1,\, s \neq m}^{6} \mathbf{K}_{\mathrm{I}}^{(s)}, \quad m = 1, 2, \ldots, 6 \tag{5.93}
$$

where $\mathbf{K}_{\mathrm{I}}^{(m)}$, $m = 1, 2, \ldots, 6$, is the $m$th substructural stiffness matrix given as follows:

$$
\mathbf{K}_{\mathrm{I}}^{(m)} = \sum_{n \in S^{(m)}} \mathbf{K}_n^c, \quad m = 1, 2, \ldots, 6 \tag{5.94}
$$

where $S^{(m)}$, $m = 1, 2, \ldots, 6$, is the membership vector of the $m$th substructure, and $S^{(1)}, S^{(2)}, \ldots, S^{(6)}$ form a mutually exclusive partition of the set $\{1, 2, \ldots, 42\}$.
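To make Eqs. (5.92)–(5.94) concrete, the sketch below assembles the Hierarchical Level I stiffness matrices from a list of component stiffness matrices. It assumes NumPy and uses 1-based component numbers as in the text. The function names are hypothetical. The full partition $S^{(1)}, \ldots, S^{(6)}$ is not listed in this section, so any partition used to exercise the code (for example $S^{(m)} = \{m, m+6, \ldots, m+36\}$) is a hypothetical choice.

```python
import numpy as np


def baseline_stiffness(theta0, K_comp):
    """Eq. (5.92): one stiffness parameter scaling the sum of all component matrices."""
    return theta0 * sum(K_comp)


def substructure_stiffness(K_comp, members):
    """Eq. (5.94): sum of the component matrices listed in S^(m) (1-based numbers)."""
    return sum(K_comp[n - 1] for n in members)


def level1_stiffness(theta1, theta2, m, K_comp, partition):
    """Eq. (5.93): theta_{k,1} K_I^(m) + theta_{k,2} * (sum of the other substructures)."""
    K_sub = [substructure_stiffness(K_comp, S) for S in partition]
    K_others = sum(K for s, K in enumerate(K_sub, start=1) if s != m)
    return theta1 * K_sub[m - 1] + theta2 * K_others
```

With all stiffness parameters equal to one, every Level I model class reproduces the baseline stiffness, because the substructures partition the 42 components. This gives a quick consistency check on the assembly.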


Fig. 5.18 Plausibility of the model classes in Hierarchical Level I

Figure 5.18 shows the plausibilities of the model classes in Hierarchical Level I. The first subplot shows the plausibility of the baseline model class $C_{i+1}^{(0)}$, and the remaining subplots show the plausibilities of model classes $C_{i+1}^{(1)}, C_{i+1}^{(2)}, \ldots, C_{i+1}^{(6)}$. The model classes in Hierarchical Level I started with the same prior plausibility $P\left(C_1^{(m)} \middle| D_0\right) = 1/7$, $m = 0, 1, 2, \ldots, 6$.

In the early stage of identification, the plausibilities of the model classes fluctuated severely due to inaccurate initial values and lack of observations. Afterwards, the baseline model class $C_{i+1}^{(0)}$ dominated the plausibility, which implies that $C_{i+1}^{(0)}$ was sufficient to represent the undamaged structure.

At t = 100 s, the first damage occurred in the 31st structural element, which was located in substructure 1. The baseline model class $C_{i+1}^{(0)}$ was incapable of representing the damaged structure, so its plausibility decreased rapidly. Meanwhile, the plausibility of model class $C_{i+1}^{(1)}$ rose and triggered level up to Hierarchical Level II at t = 102.414 s.

In Hierarchical Level II, finer model classes were considered, with resolution up to the structural component level. The component stiffness matrices in model class $C_{i+1}^{(1)}$ were $\mathbf{K}_1^c$, $\mathbf{K}_7^c$, $\mathbf{K}_{13}^c$, $\mathbf{K}_{19}^c$, $\mathbf{K}_{25}^c$, $\mathbf{K}_{31}^c$ and $\mathbf{K}_{37}^c$. There were therefore 8 model classes in Hierarchical Level II, denoted as follows:



$$
\mathcal{C}_{i+1}^{\mathrm{II}} = \left\{C_{i+1}^{(0)}, C_{i+1}^{(1,1)}, C_{i+1}^{(1,2)}, \ldots, C_{i+1}^{(1,7)}\right\} \tag{5.95}
$$

where the baseline model class $C_{i+1}^{(0)}$ was given as Eq. (5.58). The model classes $C_{i+1}^{(1,q)}$, $q = 1, 2, \ldots, 7$, were induced when model class $C_{i+1}^{(1)}$ in Hierarchical Level I triggered level up. The stiffness matrices for model classes $C_{i+1}^{(1,q)}$, $q = 1, 2, \ldots, 7$, were given by:

$$
\mathbf{K}^{(1,q)}\left(\boldsymbol{\theta}_k^{(1,q)}\right) = \theta_{k,1}^{(1,q)} \mathbf{K}_{s(q)}^c + \theta_{k,2}^{(1,q)} \sum_{n=1,\, n \neq s(q)}^{42} \mathbf{K}_n^c, \quad q = 1, 2, \ldots, 7 \tag{5.96}
$$

where $s(q) = S^{(1)}(q)$, $q = 1, 2, \ldots, 7$, and $S^{(1)}$ was given as follows:

$$
S^{(1)} = [1, 7, 13, 19, 25, 31, 37]^T \tag{5.97}
$$
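Equation (5.96) can be sketched in the same spirit. The Level II class $C_{i+1}^{(1,q)}$ isolates component $s(q) = S^{(1)}(q)$ with its own stiffness parameter and lumps all remaining components under a second parameter. Only a small number of model classes (8 here) is active at any time, yet the search reaches component-level resolution. The NumPy sketch below uses 1-based numbering and hypothetical function names. It writes the component matrices as $\mathbf{K}_n^c$, as in Eq. (5.90).

```python
import numpy as np

# Eq. (5.97): components of substructure 1, as 1-based component numbers
S1 = [1, 7, 13, 19, 25, 31, 37]


def level2_stiffness(theta1, theta2, q, K_comp, S):
    """Eq. (5.96): component s(q) = S(q) gets its own parameter theta_{k,1};
    all other components share theta_{k,2}."""
    s_q = S[q - 1]                                   # s(q), 1-based
    K_target = K_comp[s_q - 1]
    K_rest = sum(K for n, K in enumerate(K_comp, start=1) if n != s_q)
    return theta1 * K_target + theta2 * K_rest
```

For example, `level2_stiffness(theta1, theta2, 6, K_comp, S1)` corresponds to model class $C_{i+1}^{(1,6)}$. That class isolates the 31st component, which is the element damaged at t = 100 s in this example.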

Figure 5.19 shows the plausibilities of the model classes in Hierarchical Level II. The first subplot shows the plausibility of the baseline model class $C_{i+1}^{(0)}$, and the remaining subplots show the plausibilities of model classes $C_{i+1}^{(1,1)}, C_{i+1}^{(1,2)}, \ldots, C_{i+1}^{(1,7)}$. All the model class candidates in Hierarchical Level II started with the same prior plausibility 1/8.

Afterwards, model class $C_{i+1}^{(1,6)}$ achieved the maximum plausibility among the model classes in Hierarchical Level II. It triggered the interhealing process at t = 103.514 s and t = 105.202 s. In the interhealing process, the component stiffness matrix $\mathbf{K}_{31}^c$ was calibrated according to the model identification results of model class $C_{i+1}^{(1,6)}$, and it was utilized for calculation in the next time step. Afterwards, the baseline model class $C_{i+1}^{(0)}$ triggered level down to Hierarchical Level I at t = 106.432 s. This implies that the deficiencies of the model classes were successfully corrected.

Fig. 5.19 Plausibility of the model classes in Hierarchical Level II related to the damage at t = 100 s

At t = 200 s, the second damage occurred in the 22nd and 40th structural elements, which were located in substructure 4. After the interhealing process for the first structural damage, the updated baseline model class $C_{i+1}^{(0)}$ regained its dominating role in representing the structure. This lasted until the plausibility of model class $C_{i+1}^{(4)}$ rose. Model class $C_{i+1}^{(4)}$ triggered level up to Hierarchical Level II at t = 201.890 s.

In Hierarchical Level II, finer model classes were considered, again with resolution up to the component level. The component stiffness matrices associated with model classes $C_{i+1}^{(4,q)}$, $q = 1, 2, \ldots, 7$, were $\mathbf{K}_4^c$, $\mathbf{K}_{10}^c$, $\mathbf{K}_{16}^c$, $\mathbf{K}_{22}^c$, $\mathbf{K}_{28}^c$, $\mathbf{K}_{34}^c$ and $\mathbf{K}_{40}^c$, respectively. There were therefore 8 model classes in Hierarchical Level II, denoted as follows:

$$
\mathcal{C}_{i+1}^{\mathrm{II}} = \left\{C_{i+1}^{(0)}, C_{i+1}^{(4,1)}, C_{i+1}^{(4,2)}, \ldots, C_{i+1}^{(4,7)}\right\} \tag{5.98}
$$

where the baseline model class $C_{i+1}^{(0)}$ was given as Eq. (5.58). The model classes $C_{i+1}^{(4,q)}$, $q = 1, 2, \ldots, 7$, were induced when model class $C_{i+1}^{(4)}$ in Hierarchical Level I triggered level up. The stiffness matrices for model classes $C_{i+1}^{(4,q)}$, $q = 1, 2, \ldots, 7$, were given by:

$$
\mathbf{K}^{(4,q)}\left(\boldsymbol{\theta}_k^{(4,q)}\right) = \theta_{k,1}^{(4,q)} \mathbf{K}_{s(q)}^c + \theta_{k,2}^{(4,q)} \sum_{n=1,\, n \neq s(q)}^{42} \mathbf{K}_n^c, \quad q = 1, 2, \ldots, 7 \tag{5.99}
$$

where $s(q) = S^{(4)}(q)$, $q = 1, 2, \ldots, 7$, and $S^{(4)}$ was given as follows:

$$
S^{(4)} = [4, 10, 16, 22, 28, 34, 40]^T \tag{5.100}
$$
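The plausibility histories in Figs. 5.18–5.20 come from recursively applying Bayes' theorem to the model classes of the active hierarchical level. At each step, each class's plausibility is multiplied by the evidence of the new measurement under that class and then renormalized. The sketch below illustrates one such step in log space. It is an illustrative assumption, not the book's exact algorithm. The per-class log-evidence values are taken as given; for a Kalman-type filter they would typically come from the Gaussian predictive density of the new measurement. The way the lower bound $P_l$ is applied here (clip, then renormalize) is a hypothetical choice, and the function name is hypothetical.

```python
import math


def update_plausibilities(prior, log_evidence, p_lower=1e-4):
    """One real-time Bayes step for the plausibilities of competing model classes.

    prior        : P(C_j | D_{i-1}) for each model class j (sums to one)
    log_evidence : log p(z_i | D_{i-1}, C_j), supplied by each class's filter
    p_lower      : lower bound of plausibility (the clipping rule is an assumption)
    """
    # Bayes' theorem in log space; subtracting the maximum avoids underflow.
    log_post = [math.log(p) + le for p, le in zip(prior, log_evidence)]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    post = [w / total for w in weights]

    # Hypothetical floor: lift tiny plausibilities to p_lower and renormalize,
    # so a temporarily poor class can recover later (values may end up
    # marginally below p_lower after the renormalization).
    post = [max(p, p_lower) for p in post]
    total = sum(post)
    return [p / total for p in post]
```

Starting from the uniform prior 1/7 of Hierarchical Level I and calling the function at every time step yields plausibility histories of the kind plotted in Fig. 5.18. The level-up, interhealing and level-down triggers would then act on these values according to the rules defined earlier in the chapter, which are not reproduced here.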

Figure 5.20 shows the plausibilities of the model classes in Hierarchical Level II. The first subplot shows the plausibility of the baseline model class $C_{i+1}^{(0)}$, and the remaining subplots show the plausibilities of model classes $C_{i+1}^{(4,1)}, C_{i+1}^{(4,2)}, \ldots, C_{i+1}^{(4,7)}$.

Model class $C_{i+1}^{(4,7)}$ first achieved the maximum plausibility among all the model classes in Hierarchical Level II. It triggered the interhealing process at t = 203.636 s, in which the component stiffness matrix $\mathbf{K}_{40}^c$ was calibrated according to the model identification results of model class $C_{i+1}^{(4,7)}$. Then, model class $C_{i+1}^{(4,4)}$ dominated the plausibility among all the model classes in Hierarchical Level II. It triggered the interhealing process at t = 204.804 s, in which the component stiffness matrix $\mathbf{K}_{22}^c$ was calibrated according to the model identification results of model class $C_{i+1}^{(4,4)}$.

Afterwards, the baseline model class $C_{i+1}^{(0)}$ triggered level down to Hierarchical Level I at t = 207.074 s, which implies that the deficiencies of the model classes were successfully corrected. The updated component stiffness matrices were then utilized to formulate the stiffness matrices of the model classes in Hierarchical Level I. The baseline model class remained the most plausible until the end of the monitoring period.

Fig. 5.20 Plausibility of the model classes in Hierarchical Level II related to the damage at t = 200 s

Figure 5.21 shows the estimation results of the stiffness parameters using the most plausible model class. The dotted lines represent the estimated values, the solid lines represent the actual values, and the dashed lines represent the bounds of the 99.7% credible intervals. The same line styles are used in the subsequent figure.

There were 42 stiffness parameters, each reflecting the health condition of one bar element. Due to space limitations, only six representative stiffness parameters are shown, namely $\theta_{k,6}$, $\theta_{k,9}$, $\theta_{k,22}$, $\theta_{k,31}$, $\theta_{k,32}$ and $\theta_{k,40}$. These correspond to the 6th, 9th, 22nd, 31st, 32nd and 40th elements, respectively. The identification results for the remaining 36 stiffness parameters were similar to those of $\theta_{k,6}$, $\theta_{k,9}$ and $\theta_{k,32}$. The estimated stiffness parameters showed good agreement with the actual values. The damage in the 22nd, 31st and 40th elements was successfully detected and accurately tracked with a small time delay.

Fig. 5.21 Identification results of the representative stiffness parameters using the most plausible model class

Figure 5.22 shows the estimation results of the damping coefficients using the most plausible model class. The estimated values approached the actual values, and the 99.7% credible intervals indicated a reasonable uncertainty level.


Fig. 5.22 Identification results of the damping coefficients using the most plausible model class

5.7 Concluding Remarks

This chapter introduced Bayesian real-time model class selection. In order to select the optimal model class for time-varying dynamical systems, Bayesian model class selection is extended from the conventional offline implementation to a novel real-time implementation. Bayes' theorem is used to formulate the plausibilities of a set of given model classes, so model class selection can be performed according to these plausibilities in a real-time manner. The most plausible model class balances data fitting against model complexity. It is therefore less prone to overfitting the measurement noise, and more reliable future predictions can be anticipated. The proposed method provides simultaneous model class selection and parametric identification in a real-time manner.

Bayesian model class selection for real-time system identification is first conducted using some prescribed model classes, and it requires at least one suitable prescribed model class. However, there is no guarantee of a good model class when none of the model class candidates is suitable. To resolve this problem, a novel third level of system identification is proposed, namely system identification using self-calibratable model classes. It can reconfigure the model classes to adaptively correct their deficiencies.

Moreover, an identification approach based on hierarchical interhealable model classes is proposed to tackle the problem of a large number of model class candidates. The model classes are established in a hierarchical manner, so only a limited number of model classes is required, yet a large solution space can be explored. The modeling errors, including errors in the parameters and deficiencies of the parametric models, can be adaptively corrected. The proposed methods establish an innovative framework for real-time system identification, and this framework can easily be applied with other filtering techniques.

References

Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723
Akaike H (1978) A new look at the Bayes procedure. Biometrika 65(1):53–59
Beck JL (2010) Bayesian system identification based on probability logic. Struct Control Health Monit 17(7):825–847
Beck JL, Yuen KV (2004) Model selection using response measurements: Bayesian probabilistic approach. J Eng Mech 130(2):192–203
Ching J, Muto M, Beck JL (2005) Bayesian linear structural model updating using Gibbs sampler with modal data. In: Proceedings of the 9th international conference on structural safety and reliability, Rome, Italy
Claeskens G, Hjort NL (2008) Model selection and model averaging. Cambridge Books
Cox RT (1961) The algebra of probable inference. Johns Hopkins University Press, Baltimore
Grünwald PD (2007) The minimum description length principle. MIT Press
Grünwald PD, Myung IJ, Pitt MA (2005) Advances in minimum description length: theory and applications. MIT Press
Hoshiya M, Saito E (1985) Structural identification by extended Kalman filter. J Eng Mech 110(15):1757–1770
Jeffreys H (1961) Theory of probability, 3rd edn. Oxford Clarendon Press, Oxford, UK
Ley E, Steel MF (2012) Mixtures of g-priors for Bayesian model averaging with economic applications. J Econ 171(2):251–266
Mark C, Metzner C, Lautscham L, Strissel PL, Strick R, Fabry B (2018) Bayesian model selection for complex dynamic systems. Nat Commun 9(1):1–12
Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465–471
Schaffer J (2015) What not to multiply without necessity. Australas J Philos 93(4):644–664
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Sivia DS (1996) Data analysis: a Bayesian tutorial. Oxford Science Publications, Oxford, UK
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A (2002) Bayesian measures of model complexity and fit. J R Stat Soc B 64(4):583–639
Yuen KV (2010a) Bayesian methods for structural dynamics and civil engineering. Wiley
Yuen KV (2010b) Recent developments of Bayesian model class selection and applications in civil engineering. Struct Saf 32(5):338–346
Yuen KV, Dong L (2020) Real-time system identification using hierarchical interhealing model classes. Struct Control Health Monit 27(12):e2628
Yuen KV, Mu HQ (2015) Real-time system identification: an algorithm for simultaneous model class selection and parametric identification. Comput-Aided Civ Inf 30(10):785–801
Yuen KV, Kuok SC, Dong L (2019) Self-calibrating Bayesian real-time system identification. Comput-Aided Civ Inf 34(9):806–821

Chapter 6

Online Distributed Identification for Wireless Sensor Networks

Abstract This chapter introduces an online dual-rate distributed identification framework for wireless sensor networks. Distributed identification allows an individual unit to obtain a local estimate using part of the data, and this local estimate can then be used as a basis for global estimation.

Typical architectures of wireless sensor networks are first introduced, including centralized, decentralized and distributed networks. Then, the online dual-rate distributed identification approach for wireless sensor networks is introduced. A filtering method that uses only the raw observations collected at each sensor is presented. The preliminary local identification results are then compressed before being transmitted to the central station for fusion. At the central station, Bayesian fusion is developed to integrate the compressed local identification results from the sensor nodes and obtain a reliable global estimate. As a result, the large identification uncertainty in the local identification results can be substantially reduced.

In addition to data compression, a dual-rate strategy for sampling and transmission/fusion is used to alleviate the data transmission burden, so that online model updating can be realized efficiently for wireless sensor networks. The computational framework in this chapter is followed in the next chapter, where specific algorithms for handling asynchronous data and multiple outlier-corrupted data are introduced.

Keywords Distributed identification · Online estimation · Bayesian fusion · Dual rate · Wireless sensor network

6.1 Introduction

Chapters 3, 4 and 5 addressed some challenging issues in real-time system identification, including parametric identification, model class selection and system identification based on self-calibratable model classes. All of these were introduced based on the centralized identification framework. Centralized identification requires transmitting all measured data to a single central processing unit, where global identification results are obtained. The architecture of centralized identification is theoretically the simplest, and it has the best performance when all sensors in the network are accurately aligned and synchronized. Therefore, conventional system identification methods assume that observations from the sensing system are centrally acquired and processed.

Wireless sensor networks (WSNs) are regarded as one of the most notable technological innovations of the twenty-first century. They have become widespread as a result of their advantages, such as convenient management, ease of installation and low deployment and maintenance costs. A basic feature of smart wireless sensors is the on-board microprocessor, which is used to process the observed data, make decisions, save data locally and transmit local results. Thus, identification algorithms can be programmed into the microprocessor, and part of the computational tasks can be performed at the sensor nodes. In addition, extraneous local information can be discarded to reduce the amount of data to be transmitted to the central station. As a result, distributed identification using WSNs can be realized as follows: the raw response measurements are first processed by the sensor nodes, and only some extracted features are then transmitted to the central station for fusion and diagnosis (Spencer et al. 2004).

The online dual-rate distributed identification approach introduced in this chapter can realize reliable and efficient updating for time-varying linear and nonlinear systems. This method distributes a substantial portion of the computational workload to the sensor nodes in the network and compresses the local estimation results. A Bayesian fusion algorithm is developed to integrate these local estimation results. The fusion is performed at a rate much lower than the sampling rate in order to reduce the data transmission burden. The method provides not only the most probable values of the model parameters but also their associated uncertainties. In the next section, typical architectures of WSNs are introduced and their pros and cons are discussed.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
K. Huang and K.-V. Yuen, Bayesian Real-Time System Identification, https://doi.org/10.1007/978-981-99-0593-5_6
This gives readers a better understanding of the functionality and attributes of the typical WSN architectures. In Sect. 6.3, the extended Kalman filter (EKF) using measurements from a single sensor node is introduced. In Sect. 6.4, the data compression and extraction approach at the sensor nodes is introduced, including compression and extraction techniques for the updated state vector and its associated covariance matrix. Section 6.5 presents detailed derivations for the product of an arbitrary number of univariate Gaussian probability density functions (PDFs) and the product of an arbitrary number of multivariate Gaussian PDFs. Based on these derivations, a Bayesian fusion algorithm is built to integrate the compressed local estimates transmitted from the sensor nodes to the central station. Section 6.6 presents two structural health monitoring applications, using a forty-story building and a bridge with two piers.
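The Bayesian fusion of Sect. 6.5 rests on products of Gaussian PDFs. The standard identity for the univariate case shows why fusion reduces uncertainty. The product of $N(\mu_j, \sigma_j^2)$ densities is proportional to a Gaussian whose precision is the sum of the individual precisions and whose mean is the precision-weighted average of the $\mu_j$. The sketch below implements only this textbook identity, not the complete fusion algorithm of Sect. 6.5. The function name is hypothetical.

```python
def gaussian_product(means, variances):
    """Mean and variance of the Gaussian proportional to prod_j N(mu_j, var_j).

    Precisions add: 1/var = sum_j 1/var_j, and the mean is the
    precision-weighted average: mu = var * sum_j mu_j / var_j.
    """
    precision = sum(1.0 / v for v in variances)
    var = 1.0 / precision
    mu = var * sum(m / v for m, v in zip(means, variances))
    return mu, var
```

For ten local estimates with equal variance σ², the fused variance is σ²/10. This illustrates how fusion can substantially reduce the large uncertainty of individual local results.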

6.2 Typical Architectures of Wireless Sensor Network

205

6.2 Typical Architectures of Wireless Sensor Network

6.2.1 Centralized Networks

The centralized network architecture shown in Fig. 6.1 is the most intuitive and most commonly used type of network in system identification. It is built around a single central/master server, which aggregates, stores and processes observations from all the local/slave sensor nodes. In other words, the local/slave nodes in the network are only responsible for recording the raw measurement data and forwarding it, unprocessed, to the central/master server. Centralized networks hold the following advantages over decentralized and distributed networks:

(1) Simple and rapid deployment. In principle, there is only one central server in the network, so the configuration of the network is convenient to manage.
(2) Minimal information loss. In centralized networks, all the raw observations are directly transmitted to the central server for computation and decision-making. It is therefore straightforward to maintain data integrity and minimize information loss.
(3) High precision. Centralized networks make it possible to utilize the observations from all local nodes, which contributes to high-precision results.

Although centralized networks possess competitive advantages, they suffer from the following limitations:

(1) High processing cost. A single central unit handles all data processing tasks, leading to high processing cost for calculation and storage.
(2) Low network scalability. Centralized networks are difficult to scale up because the capacity of the central server is limited and the network bandwidth is finite. Factors such as data transmission rate, data storage availability and power consumption severely affect the scalability of the network.

Fig. 6.1 Centralized network (legend: central/master server; local/slave node)

6.2.2 Decentralized Networks

In decentralized networks, schematically shown in Fig. 6.2, the sensor nodes are grouped into clusters and each cluster has a single node assigned as the cluster head. Decentralized networks distribute measurement-processing workloads to multiple clusters. Each cluster head serves as a mini central unit that processes the observations from the nodes in its cluster, and every cluster makes its own decisions. In a given cluster, all nodes except the cluster head can only communicate with the cluster head. The cluster head can communicate with all nodes in its cluster and, subject to connectivity constraints, with other cluster heads.

Decentralized networks essentially follow a peer-to-peer architecture. In other words, all clusters are peers, and no cluster has supremacy over the others. The final decisions of the network are therefore the aggregate of the decisions from the individual clusters. The advantages of decentralized networks are as follows:

(1) Enhanced performance. Every cluster head processes only the observations from its cluster and makes its own decisions, so the data transmission demand and computational requirement can be remarkably reduced. This decreases power consumption and prolongs the lifespan of the sensor network.
(2) High scalability. Decentralized networks are easy to scale, since new local nodes can be added directly to a cluster to increase its computing power.

The limitations of decentralized networks are as follows:

(1) Complex configuration. Local nodes are divided into several clusters, so considerable effort is required to find the optimal cluster configuration, which is crucial for the quality of the final estimates.
(2) Higher maintenance costs. Decentralized networks require multiple cluster heads with advanced communication techniques instead of a single central server, leading to high maintenance costs.
(3) Information loss. Each cluster head utilizes only the measurements from its own cluster, which has a limited number of local nodes. The information in the entire data set therefore cannot be fully utilized, in contrast to centralized networks.

Fig. 6.2 Decentralized network (legend: cluster; cluster head; local node)

6.2.3 Distributed Networks

Distributed networks, schematically shown in Fig. 6.3, maintain a single central node, but some computational workloads are distributed among several clusters. In this sense, distributed networks are similar to decentralized networks. The primary difference between them is the unit responsible for decision-making. In decentralized networks, each cluster head makes its own decision from the data of its cluster, and there is no single central node to make the final decision. In distributed networks, data processing is also shared among multiple clusters, but the decisions are still made at the central node using information from all the cluster heads in the network.

Distributed networks therefore share some important characteristics with decentralized networks, such as enhanced performance and high scalability. On the other hand, distributed networks keep a central node where information from all the clusters can be integrated. This makes them more flexible and efficient in characterizing the global behavior of the system.

Fig. 6.3 Distributed network (legend: central node; cluster; cluster head; local node)

According to the architecture of WSNs, system identification using smart wireless sensors can be categorized into centralized, decentralized and distributed identification. Centralized identification possesses the simplest architecture, and it has the best performance when all smart wireless sensors in the network are accurately aligned and synchronized. As a result, the majority of conventional system identification algorithms have been developed according to the centralized framework, for example, Farrar and James (1997) and Sirca and Adeli (2012).

On the other hand, WSNs open up new computing possibilities in terms of decentralized and distributed system identification. However, in the literature, most, if not all, existing decentralized and distributed identification methods are performed in an offline manner, for example, Gao et al. (2006) and Wang et al. (2009). Compared with offline identification methods, online techniques have wider applicability for tracking time-varying systems, so it is desirable to investigate the potential of online distributed identification.

Therefore, this chapter introduces an efficient online distributed system identification method. It is based on Huang and Yuen (2019), but the presentation has been reorganized and substantially elaborated to make it more convenient for readers who want to apply the method to their own applications. The method utilizes two strategies to realize efficient online distributed estimation, namely data compression and dual-rate computing. Moreover, a Bayesian fusion algorithm is performed at the central station to integrate the local estimation results from the sensor nodes. The method therefore provides not only the fusion results but also their associated uncertainties.

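6.3 Problem Formulations

Consider a dynamical system with $N_d$ degrees of freedom (DOFs) and its equation of motion:

$$
\mathbf{M}\ddot{\mathbf{x}}(t) + \mathbf{C}[\boldsymbol{\theta}_c(t)]\dot{\mathbf{x}}(t) + \mathbf{K}[\boldsymbol{\theta}_k(t)]\mathbf{x}(t) = \mathbf{T}\mathbf{f}(t) \tag{6.1}
$$

where $\mathbf{M}$, $\mathbf{C}$ and $\mathbf{K}$ are the mass, damping and stiffness matrices of the system, respectively. The stiffness and damping matrices are parameterized with the possibly time-varying model parameters $\boldsymbol{\theta}(t) \equiv \left[\boldsymbol{\theta}_k(t)^T, \boldsymbol{\theta}_c(t)^T\right]^T \in \mathbb{R}^{N_\theta}$. $\mathbf{f}(t)$ is the excitation applied to the system, and $\mathbf{T}$ is the influence matrix associated with $\mathbf{f}$. Define an augmented state vector $\mathbf{y}(t) = \left[\mathbf{x}(t)^T, \dot{\mathbf{x}}(t)^T, \boldsymbol{\theta}(t)^T\right]^T \in \mathbb{R}^{2N_d + N_\theta}$, composed of the displacement, velocity and unknown model parameter vectors. Then, the state-space representation of the dynamical system is given as follows:

$$
\dot{\mathbf{y}}(t) = \mathbf{g}\left(\mathbf{y}(t), \mathbf{f}(t); \boldsymbol{\theta}(t)\right) \tag{6.2}
$$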
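As an illustration of Eq. (6.2), the sketch below builds the state-space function $\mathbf{g}$ for the equation of motion (6.1) with the augmented state $\mathbf{y} = [\mathbf{x}^T, \dot{\mathbf{x}}^T, \boldsymbol{\theta}^T]^T$. It assumes NumPy and a hypothetical linear parameterization of $\mathbf{K}$ and $\mathbf{C}$ in terms of given basis matrices (`K_parts`, `C_parts`). It also treats the model parameters as constant between updates ($\dot{\boldsymbol{\theta}} = \mathbf{0}$), which is a common modeling choice rather than something stated here. All names are hypothetical.

```python
import numpy as np


def make_state_function(M, K_parts, C_parts, T):
    """Return g(y, f) for y = [x, xdot, theta_k, theta_c] (Eqs. (6.1)-(6.2)).

    Hypothetical parameterization: K(theta_k) = sum_j theta_k[j] * K_parts[j] and
    C(theta_c) = sum_j theta_c[j] * C_parts[j]; both lists must be non-empty.
    The model parameters are treated as constant, so their time derivative is zero.
    """
    Nd = M.shape[0]
    nk, nc = len(K_parts), len(C_parts)

    def g(y, f):
        x, xdot = y[:Nd], y[Nd:2 * Nd]
        theta_k = y[2 * Nd:2 * Nd + nk]
        theta_c = y[2 * Nd + nk:2 * Nd + nk + nc]
        K = sum(t * Kj for t, Kj in zip(theta_k, K_parts))
        C = sum(t * Cj for t, Cj in zip(theta_c, C_parts))
        # Eq. (6.1) solved for the acceleration: M xddot = T f - C xdot - K x
        xddot = np.linalg.solve(M, T @ f - C @ xdot - K @ x)
        return np.concatenate([xdot, xddot, np.zeros(nk + nc)])

    return g
```

The nonlinear dependence on the state enters through the products of parameters and response quantities, for example $\theta_k x$. This is why an EKF, rather than a linear Kalman filter, is needed once the parameters are part of the state.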

where $\mathbf{g}(\cdot, \cdot; \cdot)$ represents the nonlinear state-space function that characterizes the underlying system. Discrete-time response measurements sampled at the sampling rate $r_s$ are acquired by $N_s$ distinct wireless sensor nodes. Each wireless sensor node has $N_c$ measurement channels at its location, e.g., strain, displacement, velocity and acceleration. The corresponding sampling time step is:

$$
\Delta t_s = 1/r_s \tag{6.3}
$$

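The discrete-time measurement $\mathbf{z}_i^{(s)} \in \mathbb{R}^{N_c}$ for the $s$th sensor node is given by:

$$
\mathbf{z}_i^{(s)} = \mathbf{h}^{(s)}\left(\mathbf{y}_i\right) + \mathbf{n}_i^{(s)}, \quad i = 1, 2, \ldots;\ s = 1, 2, \ldots, N_s \tag{6.4}
$$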

where z i(s) ≡ z (s) (iΔt s ) is the noise-corrupted measurement at the ith time step; h (s) (·) defines the observation quantities for the sth sensor node; ni(s) represents the measurement noise of the sth sensor node at the ith time step and n(s) is modeled as Gaussian independent and identically distributed (i.i.d.) process with zero mean and Nc ×Nc . covariance matrix Σ (s) n ∈R In Chaps. 3, 4 and 5, Bayesian real-time system identification methods were introduced based on{ the }centralized identification framework, which implies that all the measurements z i(s) , i = 1, 2, . . . ; s = 1, 2, . . . , Ns , will be directly transmitted to the central station for processing. Compared with centralized identification methods, distributed identification processes directly the measurements observed at the corresponding sensor node. In other words, each sensor node is assigned to execute online identification algorithm with its own observed data and the local processing rate rs . In order to distinguish the notations in this chapter and Chaps. 3–5, the online identification using EKF with the local measurements is briefly revisited as follows. Given the{measurement dataset } of the sth sensor node up to the (i − 1)th time (s) (s) (s) (s) (s) for the sth sensor step Di−1 = z 1 , z 2 , . . . , z i−1 , the predicted state vector yi|i−1 node can be obtained as follows: ] [ | | (s) (s) (s) (s) (s) = Ai−1 yi|i−1 ≡ E yi(s) |Di−1 yi−1|i−1 + δ i−1 (6.5)


6 Online Distributed Identification for Wireless Sensor Networks

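where $y_{i-1|i-1}^{(s)} \equiv E\left[y_{i-1}^{(s)} \mid D_{i-1}^{(s)}\right]$; $A_{i-1}^{(s)}$ is the state-transition matrix; and $\delta_{i-1}^{(s)}$ is the remainder term due to the local linear approximation. Then, the covariance matrix $\Sigma_{i|i-1}^{(s)}$ of $y_{i|i-1}^{(s)}$ can be obtained as:

$$\Sigma_{i|i-1}^{(s)} \equiv E\left[\left(y_i^{(s)} - y_{i|i-1}^{(s)}\right)\left(y_i^{(s)} - y_{i|i-1}^{(s)}\right)^T \middle| D_{i-1}^{(s)}\right] = \lambda A_{i-1}^{(s)} \Sigma_{i-1|i-1}^{(s)} A_{i-1}^{(s)T} + B_{i-1} \Sigma_f B_{i-1}^T \tag{6.6}$$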

where $\lambda$ is the fading factor introduced in Chap. 2; $B_{i-1}$ is the input-to-state matrix; and $\Sigma_f$ is the covariance matrix of the excitation $f$. When a new measurement $z_i^{(s)}$ becomes available, the updated state vector and its covariance matrix can be obtained by the Kalman filter (Kalman 1960; Kalman and Bucy 1961):

$$y_{i|i}^{(s)} = y_{i|i-1}^{(s)} + G_i^{(s)}\left(z_i^{(s)} - h^{(s)}\left(y_{i|i-1}^{(s)}\right)\right) \tag{6.7}$$

$$\Sigma_{i|i}^{(s)} = \left(I_{2N_d+N_\theta} - G_i^{(s)} H_i^{(s)}\right)\Sigma_{i|i-1}^{(s)} \tag{6.8}$$

where $H_i^{(s)}$ is the observation matrix at the $i$th time step for the $s$th sensor node and $G_i^{(s)}$ is the Kalman gain at the $i$th time step for the $s$th sensor node, given by:

$$G_i^{(s)} = \Sigma_{i|i-1}^{(s)} H_i^{(s)T}\left(H_i^{(s)} \Sigma_{i|i-1}^{(s)} H_i^{(s)T} + \Sigma_n^{(s)}\right)^{-1} \tag{6.9}$$
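The local recursion in Eqs. (6.5)–(6.9) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the caller is assumed to supply the linearized matrices $A_{i-1}^{(s)}$, $B_{i-1}$, $H_i^{(s)}$ and the remainder $\delta_{i-1}^{(s)}$ (how they are obtained by linearizing $g$ and $h^{(s)}$ is not shown), and the update is restricted to a single measurement channel ($N_c = 1$), so the inverse in Eq. (6.9) reduces to a division. All function names are hypothetical.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def predict(y: Vector, P: Matrix, A: Matrix, B: Matrix, Sf: Matrix,
            lam: float, delta: Vector) -> Tuple[Vector, Matrix]:
    """Eqs. (6.5)-(6.6): y_pred = A y + delta, P_pred = lam * A P A^T + B Sf B^T."""
    n = len(y)
    y_pred = [sum(A[r][c] * y[c] for c in range(n)) + delta[r] for r in range(n)]
    apa = matmul(matmul(A, P), transpose(A))
    bsb = matmul(matmul(B, Sf), transpose(B))
    P_pred = [[lam * apa[r][c] + bsb[r][c] for c in range(n)] for r in range(n)]
    return y_pred, P_pred


def update_single_channel(y_pred: Vector, P_pred: Matrix, z: float, H: Vector,
                          h: Callable[[Vector], float], var_n: float) -> Tuple[Vector, Matrix]:
    """Eqs. (6.7)-(6.9) with N_c = 1, so the inverse in (6.9) is a scalar division."""
    n = len(y_pred)
    ph = [sum(P_pred[r][c] * H[c] for c in range(n)) for r in range(n)]  # Sigma_{i|i-1} H^T
    s = sum(H[r] * ph[r] for r in range(n)) + var_n                      # H Sigma H^T + Sigma_n
    G = [v / s for v in ph]                                               # Kalman gain, Eq. (6.9)
    innovation = z - h(y_pred)
    y_upd = [y_pred[r] + G[r] * innovation for r in range(n)]             # Eq. (6.7)
    hp = [sum(H[k] * P_pred[k][c] for k in range(n)) for c in range(n)]   # row vector H Sigma
    P_upd = [[P_pred[r][c] - G[r] * hp[c] for c in range(n)] for r in range(n)]  # Eq. (6.8)
    return y_upd, P_upd


# Scalar toy run (all numbers hypothetical): one state observed directly.
y_pred, P_pred = predict([1.0], [[1.0]], [[1.0]], [[1.0]], [[0.5]], lam=1.0, delta=[0.0])
y_upd, P_upd = update_single_channel(y_pred, P_pred, z=2.0, H=[1.0], h=lambda y: y[0], var_n=0.5)
print(y_upd, P_upd)
```

For several channels the structure is the same, with the scalar `s` replaced by the $N_c \times N_c$ innovation covariance and the division by a linear solve.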

At each sensor node, Eqs. (6.5)–(6.9) can be applied recursively with the measured data to obtain a preliminary local estimate of the state vector and its covariance matrix. These local estimates are expected to exhibit large uncertainty, because each sensor node has access only to the limited information in its own measurements. Thus, fusing the local estimation results is desirable to obtain a reliable global estimate with a reasonable uncertainty level. In addition, direct transmission of the local estimation results, including the updated state vectors $y_{i|i}^{(s)}$ and the associated covariance matrices $\Sigma_{i|i}^{(s)}$ for all the sensor nodes, would impose a heavy data transmission burden on the sensory system, which may easily exceed the bandwidth and power limits of the network. Specifically, since $y_{i|i}^{(s)}$ has $2N_d + N_\theta$ entries and the symmetric matrix $\Sigma_{i|i}^{(s)}$ has $(2N_d + N_\theta)(2N_d + N_\theta + 1)/2$ distinct entries, transmitting both involves $(2N_d + N_\theta)(2N_d + N_\theta + 3)/2$ real numbers for each sensor node at each time step. For example, consider a system with 100 DOFs and 10 unknown model parameters, so that $2N_d + N_\theta = 210$. Then, the amount of data to be transmitted to the central station is $210 \times 213/2 = 22365$ real numbers for each sensor node at each time step. Therefore, it is desirable to compress the data before transmission to the central station for fusion.
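The transmission count above can be reproduced with a one-line formula. The helper name is hypothetical, and the count assumes that only the upper triangle (including the diagonal) of the symmetric covariance matrix is sent.

```python
def reals_per_node_per_step(n_d: int, n_theta: int) -> int:
    """Real numbers in y_{i|i}^{(s)} plus the distinct entries of Sigma_{i|i}^{(s)}."""
    n = 2 * n_d + n_theta           # dimension of the augmented state [x; x_dot; theta]
    state_entries = n               # updated state vector
    cov_entries = n * (n + 1) // 2  # symmetric covariance: upper triangle incl. diagonal
    return state_entries + cov_entries  # equals n (n + 3) / 2


print(reals_per_node_per_step(100, 10))  # 22365, the figure quoted in the text
```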


6.4 Compression and Extraction Technique at the Sensor Nodes

In this section, two efficient strategies are employed to reduce the data transmission demand. First, instead of fusing the local estimation results at the sampling rate of the sensor nodes, a much lower rate $r_c$ is proposed for fusion at the central station. The local sampling rate $r_s$ can be determined based on the sensor network setup, and its selection criteria are the same as those of conventional centralized identification techniques; typically, it is 200 Hz. On the other hand, the choice of $r_c$ at the central station is more flexible and involves the following tradeoff: a larger value of $r_c$ implies a higher demand for transmission and fusion, while a smaller value of $r_c$ results in a longer delay of the identification results. For typical applications of system identification to civil engineering structures, the fusion interval $1/r_c$ is suggested to be on the order of 1 s. Therefore, in this dual-rate strategy, the local identification results are compressed and transmitted to the central station only once every $r_s/r_c$ sampling time steps. Note that the local index $i$ runs much faster than the central index $I$, and $i = I r_s/r_c$ at the same physical time. Figure 6.4 shows a schematic diagram of the time index $i$ at the sensor nodes and the time index $I$ at the central station. The dashed arrows represent the time instants at which the local identification results are compressed and transmitted to the central station. This dual-rate computing strategy can significantly alleviate the data transmission burden and achieve efficient online updating.
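As a small illustration of the dual-rate schedule, the sketch below lists the local indices $i$ at which a node would compress and transmit its results, using $i = I r_s / r_c$. For simplicity it assumes that both rates are given in hertz as integers with $r_s$ divisible by $r_c$; the function names are hypothetical.

```python
def transmission_indices(r_s: int, r_c: int, n_local_steps: int) -> list:
    """Local time indices i (1-based) at which a node compresses and transmits its results."""
    if r_s % r_c != 0:
        raise ValueError("this sketch assumes r_s / r_c is an integer")
    stride = r_s // r_c                      # sampling steps per central time step
    return [i for i in range(1, n_local_steps + 1) if i % stride == 0]


def central_index(i: int, r_s: int, r_c: int) -> int:
    """Central index I that coincides with local index i, from i = I * r_s / r_c."""
    return i * r_c // r_s


# 200 Hz sampling, 1 Hz fusion, 5 s of data (1000 local steps)
sent = transmission_indices(200, 1, 1000)
print(sent)                                      # [200, 400, 600, 800, 1000]
print([central_index(i, 200, 1) for i in sent])  # [1, 2, 3, 4, 5]
```

Only 5 of the 1000 local steps trigger a transmission here, which is the factor-of-$r_s/r_c$ saving the dual-rate strategy aims at.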

Fig. 6.4 Time indices at the sensor nodes and the central station

6.4.1 Compression and Extraction of the Updated State Vector

The second strategy for alleviating the data transmission burden is to compress the local estimation results, as elaborated below. The locally updated state vector $y_{i|i}^{(s)}$ in Eq. (6.7) contains three components, namely $y_{i|i}^{(s)} = \left[x_{i|i}^{(s)T}, \dot{x}_{i|i}^{(s)T}, \theta_{i|i}^{(s)T}\right]^T$, where $x_{i|i}^{(s)}$, $\dot{x}_{i|i}^{(s)}$ and $\theta_{i|i}^{(s)}$ represent the updated displacement, velocity and model parameter vectors, respectively. In practice, the dimension of $y_{i|i}^{(s)}$ is large for models with a large number of DOFs, since the displacement and velocity vectors each have $N_d$ entries. To reduce the amount of data transmission, the locally updated displacement and velocity vectors are projected onto the space spanned by a limited number of eigenvectors of the nominal model. Using the nominal eigenvectors instead of the updated eigenvectors avoids solving an eigenvalue problem at each node and at every time step. Moreover, the nominal eigenvectors can be pre-calculated and pre-stored at the central station and all sensor nodes, so there is no need to transmit them between the central station and the sensor nodes. Furthermore, the updated eigenvectors would generally differ slightly from node to node, so using them as the projection basis would complicate the subsequent computation. Finally, although the updated and nominal eigenvectors are different, the difference can be compensated by the other eigenvectors in the basis. Let $\phi_m \in \mathbb{R}^{N_d}$, $m = 1, 2, \ldots, N_m$, denote the $m$th eigenvector of the nominal model. The eigenvectors are mass normalized, that is, $\phi_m^T M \phi_m = 1$, and M-orthogonal, that is, $\phi_m^T M \phi_{m'} = 0$ if $m \neq m'$. Then, the nominal eigen-matrix $\Phi$ is defined to collect the first $N_m$ nominal eigenvectors:

$$\Phi \equiv \left[\phi_1, \phi_2, \ldots, \phi_{N_m}\right] \in \mathbb{R}^{N_d \times N_m} \tag{6.10}$$

It is noted that $\Phi$ can be pre-calculated and pre-stored at all sensor nodes and the central station. Therefore, it does not need to be transmitted at any point during the monitoring and identification process. The approximation is realized by projection. In particular, the projections of the locally updated displacement and velocity vectors onto the subspace spanned by the nominal eigenvectors are:

$$\begin{cases} x_{i|i}^{(s)} = \sum_{m=1}^{N_m} \phi_m u_{m,i}^{(s)} + \varepsilon_{u,i} = \Phi u_i^{(s)} + \varepsilon_{u,i} \\ \dot{x}_{i|i}^{(s)} = \sum_{m=1}^{N_m} \phi_m v_{m,i}^{(s)} + \varepsilon_{v,i} = \Phi v_i^{(s)} + \varepsilon_{v,i} \end{cases} \tag{6.11}$$

where $u_{m,i}^{(s)}$ and $v_{m,i}^{(s)}$, $m = 1, 2, \ldots, N_m$, are the projected displacement and velocity coordinates, respectively, and the projected displacement and velocity vectors are defined by grouping the corresponding coordinates: $u_i^{(s)} = \left[u_{1,i}^{(s)}, u_{2,i}^{(s)}, \ldots, u_{N_m,i}^{(s)}\right]^T \in \mathbb{R}^{N_m}$ and $v_i^{(s)} = \left[v_{1,i}^{(s)}, v_{2,i}^{(s)}, \ldots, v_{N_m,i}^{(s)}\right]^T \in \mathbb{R}^{N_m}$. The last terms of the two sub-equations in Eq. (6.11), $\varepsilon_{u,i}$ and $\varepsilon_{v,i}$, are the projection errors, and they satisfy:

$$\phi_m^T M \varepsilon_{u,i} = 0, \quad \phi_m^T M \varepsilon_{v,i} = 0, \quad m = 1, 2, \ldots, N_m \tag{6.12}$$

Since the eigenvectors are M-orthonormal (Yuen 2012), i.e., $\Phi^T M \Phi = I_{N_m}$, premultiplying Eq. (6.11) by $\Phi^T M$ and using Eq. (6.12) gives the projected displacement and velocity vectors directly:


$$u_i^{(s)} = \Phi^T M x_{i|i}^{(s)} \tag{6.13}$$

$$v_i^{(s)} = \Phi^T M \dot{x}_{i|i}^{(s)} \tag{6.14}$$

The matrix $\Phi^T M$ can be pre-calculated and pre-stored in all sensor nodes, so it does not need to be transmitted during the monitoring period. Therefore, only the projected displacement vector $u_i^{(s)} \in \mathbb{R}^{N_m}$, the projected velocity vector $v_i^{(s)} \in \mathbb{R}^{N_m}$ and the model parameter vector $\theta_i^{(s)} \in \mathbb{R}^{N_\theta}$ need to be transmitted to the central station. Accordingly, the compressed state vector is defined to include the projected displacement, the projected velocity and the model parameter vector:

$$q_i^{(s)} \equiv \left[u_i^{(s)T}, v_i^{(s)T}, \theta_i^{(s)T}\right]^T \in \mathbb{R}^{2N_m+N_\theta} \tag{6.15}$$
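Eqs. (6.13)–(6.15) amount to a matrix–vector product followed by concatenation. The plain-Python sketch below uses a hand-made 3-DOF example with $M = 2I$ and two mass-normalized, M-orthogonal columns in $\Phi$; these matrices are illustrative assumptions, not the modes of any model in this book, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def project(phi: Matrix, m: Matrix, x: Vector) -> Vector:
    """Eqs. (6.13)-(6.14): coordinates Phi^T M x; phi is N_d x N_m (columns = modes)."""
    n_d, n_m = len(phi), len(phi[0])
    mx = [sum(m[r][c] * x[c] for c in range(n_d)) for r in range(n_d)]
    return [sum(phi[r][k] * mx[r] for r in range(n_d)) for k in range(n_m)]


def reconstruct(phi: Matrix, coords: Vector) -> Vector:
    """Phi u: the part of the response captured by the retained modes."""
    return [sum(row[k] * coords[k] for k in range(len(coords))) for row in phi]


def compress(phi: Matrix, m: Matrix, x: Vector, x_dot: Vector, theta: Vector) -> Vector:
    """Eq. (6.15): q = [u; v; theta], of length 2 N_m + N_theta."""
    return project(phi, m, x) + project(phi, m, x_dot) + list(theta)


# Toy 3-DOF example (hypothetical): M = 2 I and two mass-normalized, M-orthogonal columns.
M = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
Phi = [[0.5, 0.5], [0.5, -0.5], [0.0, 0.0]]
q = compress(Phi, M, x=[1.0, 2.0, 3.0], x_dot=[0.5, 0.5, 0.0], theta=[0.95])
print(q)  # [3.0, -1.0, 1.0, 0.0, 0.95]
```

Here the displacement projection error is $[0, 0, 3]^T$; it is discarded at the node and, as Eq. (6.12) requires, it is M-orthogonal to both retained columns.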

Note that only $q_i^{(s)}$ of each sensor node will be transmitted to the central station. Since the number of significant modes is far less than the number of DOFs for dynamical systems with a large number of DOFs (i.e., $N_m \ll N_d$ […]

$$\cdots > \sqrt{\Phi^{-1}_{\chi^2_{N_\theta}}(0.95)}, \quad s = 1, 2, \ldots, N_s \tag{7.51}$$

This threshold, $\sqrt{\Phi^{-1}_{\chi^2_{N_\theta}}(0.95)}$, is chosen conservatively and is used only to form the suspicious dataset $e_{S,I}$. After obtaining the suspicious dataset $e_{S,I}$, the initial regular dataset $e_{R0,I}$ can be readily obtained by excluding the suspicious data points:

$$e_{R0,I} = e_I \setminus e_{S,I} \tag{7.52}$$

Then, the initial fusion results $\theta_I^{f0}$ and $v_I^{f0}$ can be obtained from Eqs. (7.35) and (7.36), respectively, using only the data points in the initial regular dataset $e_{R0,I}$:

$$\theta_{m,I}^{f0} = \frac{\sum_{e_I^{(s)} \in e_{R0,I}} \theta_{m,i|i}^{(s)} / v_{m,i}^{(s)}}{\sum_{e_I^{(s)} \in e_{R0,I}} 1 / v_{m,i}^{(s)}}, \quad m = 1, 2, \ldots, N_\theta \tag{7.53}$$

$$v_{m,I}^{f0} = \frac{1}{\sum_{e_I^{(s)} \in e_{R0,I}} 1 / v_{m,i}^{(s)}}, \quad m = 1, 2, \ldots, N_\theta \tag{7.54}$$
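Eqs. (7.52)–(7.54) describe component-wise inverse-variance weighting over the regular data points. A minimal sketch follows, assuming that each transmitted data point is a pair of lists $(\theta_{i|i}^{(s)}, v_i^{(s)})$ and that the suspicious nodes have already been identified (the simplified MD screen of Eq. (7.51) is not reproduced). The names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[List[float], List[float]]  # (theta_{i|i}^{(s)}, v_i^{(s)}) from one sensor node


def fuse(points: Sequence[Point]) -> Tuple[List[float], List[float]]:
    """Eqs. (7.53)-(7.54): component-wise inverse-variance weighting of the given points."""
    if not points:
        raise ValueError("at least one regular data point is needed for fusion")
    n_theta = len(points[0][0])
    theta_f, v_f = [], []
    for m in range(n_theta):
        precision = sum(1.0 / v[m] for _, v in points)        # sum of 1 / v_{m,i}^{(s)}
        theta_f.append(sum(th[m] / v[m] for th, v in points) / precision)
        v_f.append(1.0 / precision)
    return theta_f, v_f


def initial_fusion(all_points: Sequence[Point], suspicious: set):
    """Eq. (7.52), then (7.53)-(7.54): drop suspicious node indices and fuse the rest."""
    regular0 = [p for s, p in enumerate(all_points) if s not in suspicious]
    return fuse(regular0)


nodes = [([1.0], [1.0]), ([2.0], [3.0]), ([9.0], [1.0])]  # third node assumed flagged by Eq. (7.51)
theta_f0, v_f0 = initial_fusion(nodes, suspicious={2})
print(theta_f0, v_f0)  # approximately [1.25] and [0.75]
```

The fused variance is always smaller than every contributing local variance, which is how fusion reduces the large uncertainty of the individual local estimates.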

In other words, Eqs. (7.53) and (7.54) are the same as Eqs. (7.35) and (7.36), respectively, except that the suspicious data points are excluded from the fusion. Next, each suspicious data point is considered in turn and its outlier probability is assessed. Denote the suspicious data points in the dataset $e_{S,I}$ as follows:

$$e_{S,I}^{(n)} = \left[\theta_{i|i}^{(s_n)}, v_i^{(s_n)}\right], \quad n = 1, 2, \ldots, N^{(S)} \tag{7.55}$$

where $N^{(S)}$ is the number of data points (local estimation results) in the suspicious dataset $e_{S,I}$, and $\left[\theta_{i|i}^{(s_n)}, v_i^{(s_n)}\right]$ is the $n$th data point in $e_{S,I}$. The residual vector is defined as follows to quantify the difference between the suspicious data point and the initial fusion result:

$$e_I^{(n)} \equiv \theta_{i|i}^{(s_n)} - \theta_I^{f0} \tag{7.56}$$

If the suspicious data point is in fact regular, its residual vector $e_I^{(n)}$ follows an $N_\theta$-variate zero-mean Gaussian distribution with a diagonal covariance matrix whose variances are given by the vector $v_i^{(s_n)} + v_I^{f0}$. Then, the contour of the residuals with the same probability density as $e_I^{(n)}$ is governed by the following equation:

$$\sum_{m=1}^{N_\theta} \frac{\left(\tilde{e}_{m,I}^{(n)}\right)^2}{v_{m,i}^{(s_n)} + v_{m,I}^{f0}} = \sum_{m=1}^{N_\theta} \frac{\left(e_{m,I}^{(n)}\right)^2}{v_{m,i}^{(s_n)} + v_{m,I}^{f0}} \tag{7.57}$$

where $\tilde{e}_I^{(n)}$ represents the points on the contour in the residual space and $\tilde{e}_{m,I}^{(n)}$ is the $m$th component of $\tilde{e}_I^{(n)}$. The space enclosed by this contour is a hyper-ellipsoid. Moreover, residuals inside the hyper-ellipsoid are associated with higher probability density values than those outside. As a result, the probability of a data point falling inside the hyper-ellipsoid can be used to evaluate the outlierness of the suspicious data point $e_{S,I}^{(n)}$: the outlier probability for $e_{S,I}^{(n)}$ is defined as the probability content of this hyper-ellipsoid. Instead of integrating directly to obtain this probability, the outlier probability can be calculated as follows. Note that the right-hand side of Eq. (7.57) is a constant:

$$\Delta_I^{(n)} = \sum_{m=1}^{N_\theta} \frac{\left(e_{m,I}^{(n)}\right)^2}{v_{m,i}^{(s_n)} + v_{m,I}^{f0}} \tag{7.58}$$


7 Online Distributed Identification Handling Asynchronous Data …

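For a regular data point, the constant $\Delta_I^{(n)}$ in Eq. (7.58) is the sum of squares of $N_\theta$ independent standard Gaussian random variables (each residual component divided by its standard deviation), so it follows the Chi-square distribution with $N_\theta$ degrees of freedom:

$$\Delta_I^{(n)} \sim \chi^2_{N_\theta} \tag{7.59}$$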

As a result, the outlier probability for a data point $e_{S,I}^{(n)}$ is given by:

$$P_o^G\left(e_{S,I}^{(n)}\right) \equiv \Phi_{\chi^2_{N_\theta}}\left(\Delta_I^{(n)}\right), \quad n = 1, 2, \ldots, N^{(S)} \tag{7.60}$$

where $\Phi_{\chi^2_{N_\theta}}(\cdot)$ is the cumulative distribution function of the Chi-square distribution with $N_\theta$ degrees of freedom. Accordingly, the detection criteria for global outliers are:

$$\begin{cases} P_o^G\left(e_{S,I}^{(n)}\right) \ge 0.5, & e_{S,I}^{(n)} \text{ is a global outlier} \\ P_o^G\left(e_{S,I}^{(n)}\right) < 0.5, & e_{S,I}^{(n)} \text{ is a regular data point} \end{cases} \tag{7.61}$$

According to Eq. (7.61), the data points with global outlier probability $P_o^G\left(e_{S,I}^{(n)}\right)$ smaller than 0.5 are reclassified as regular data points and included in the regular dataset $e_{R,I}$:

$$e_{R,I} = e_{R0,I} \cup \left\{e_{S,I}^{(n)} : P_o^G\left(e_{S,I}^{(n)}\right) < 0.5,\ n = 1, 2, \ldots, N^{(S)}\right\} \tag{7.62}$$
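The global test in Eqs. (7.56)–(7.62) needs only the Chi-square CDF. To keep the sketch dependency-free, $\Phi_{\chi^2_{N_\theta}}(x)$ is evaluated as the regularized lower incomplete gamma function $P(N_\theta/2, x/2)$, using the standard series for small arguments and a continued fraction otherwise; in practice a library routine such as `scipy.stats.chi2.cdf` would serve. The function names, the data layout and the numbers in the demo are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[List[float], List[float]]  # (theta_{i|i}^{(s_n)}, v_i^{(s_n)})


def chi2_cdf(x: float, k: int) -> float:
    """CDF of the Chi-square distribution with k degrees of freedom, via P(k/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, t = 0.5 * k, 0.5 * x
    log_prefactor = -t + a * math.log(t) - math.lgamma(a)
    if t < a + 1.0:
        # Series: P(a, t) = e^{-t} t^a / Gamma(a) * sum_n t^n / (a (a+1) ... (a+n))
        term = total = 1.0 / a
        n = 1
        while abs(term) > abs(total) * 1e-15 and n < 10000:
            term *= t / (a + n)
            total += term
            n += 1
        return math.exp(log_prefactor) * total
    # Continued fraction for Q(a, t) = 1 - P(a, t), evaluated by the modified Lentz method
    tiny = 1e-300
    b = t + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        step = d * c
        h *= step
        if abs(step - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def global_outlier_probability(theta_n, v_n, theta_f0, v_f0) -> float:
    """Eqs. (7.56)-(7.60): Chi-square CDF of the normalized squared residual Delta."""
    delta = sum((th - tf) ** 2 / (v + vf)
                for th, v, tf, vf in zip(theta_n, v_n, theta_f0, v_f0))
    return chi2_cdf(delta, len(theta_n))


def update_regular_set(regular0: List[Point], suspicious: Sequence[Point],
                       theta_f0, v_f0) -> List[Point]:
    """Eqs. (7.61)-(7.62): re-admit suspicious points whose P_o^G is below 0.5."""
    regular = list(regular0)
    for theta_n, v_n in suspicious:
        if global_outlier_probability(theta_n, v_n, theta_f0, v_f0) < 0.5:
            regular.append((theta_n, v_n))
    return regular


theta_f0, v_f0 = [1.0, 1.0], [0.01, 0.01]   # hypothetical initial fusion result
near = ([1.05, 0.95], [0.04, 0.04])          # Delta = 0.1,  P_o^G about 0.05
far = ([2.0, 0.2], [0.04, 0.04])             # Delta = 32.8, P_o^G about 1.0
regular = update_regular_set([([1.0, 1.0], [0.02, 0.02])], [near, far], theta_f0, v_f0)
print(len(regular))  # 2: the near point is re-admitted, the far one stays a global outlier
```

Because the decision threshold is 0.5, a suspicious point is re-admitted whenever its residual lies inside the median contour of the $\chi^2_{N_\theta}$ distribution.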

Finally, the updated fusion results can be obtained from Eqs. (7.53) and (7.54) with $e_{R0,I}$ replaced by $e_{R,I}$:

$$\theta_{m,I}^{f} = \frac{\sum_{e_I^{(s)} \in e_{R,I}} \theta_{m,i|i}^{(s)} / v_{m,i}^{(s)}}{\sum_{e_I^{(s)} \in e_{R,I}} 1 / v_{m,i}^{(s)}}, \quad m = 1, 2, \ldots, N_\theta \tag{7.63}$$

$$v_{m,I}^{f} = \frac{1}{\sum_{e_I^{(s)} \in e_{R,I}} 1 / v_{m,i}^{(s)}}, \quad m = 1, 2, \ldots, N_\theta \tag{7.64}$$

In other words, all the updated regular data points are used to calculate the fusion results at the $I$th central time step. The complete hierarchical outlier cleansing scheme, comprising local outlier detection at the sensor nodes and global outlier detection at the central station, is summarized as follows:

7.5 Hierarchical Outlier Detection


Training stage
A short training stage is required at each sensor node to provide an initial estimate $\sigma_{c,1}^{(s)}$ of the standard deviation of the prediction error. It is suggested to run the training stage for roughly ten fundamental periods of the dynamical system. At the sensor nodes:
S1. Implement local identification using the EKF.
S2. Calculate the absolute residuals $\left|z_{c,i+1}^{(s)} - z_{c,i+1|i}^{(s)}\right|$ and sort them in ascending order.
S3. Set the initial value $\sigma_{c,1}^{(s)}$ to the 68th-percentile value of the sorted absolute residuals.

Working stage
After the training stage, the time index is restarted with $i = 0$. At the sensor nodes:
S1. Compute the normalized residual $\epsilon_{c,i+1}^{(s)}$ of the $c$th measurement channel using Eq. (7.37).
S2. Local outlier detection:
(a) Calculate the outlier probability $P_o^L\left(z_{c,i+1}^{(s)}\right)$ for $z_{c,i+1}^{(s)}$ using Eq. (7.42). If $P_o^L\left(z_{c,i+1}^{(s)}\right) < 0.5$, $z_{c,i+1}^{(s)}$ is classified as a regular data point; otherwise, it is classified as an outlier and discarded from the identification process.
(b) Update the estimator $\sigma_{c,i+1}^{(s)}$ using Eq. (7.46).
S3. If $i$ is an integer multiple of $r_s/r_c$, transmit the local estimation results $\theta_{i|i}^{(s)}$ and $v_i^{(s)}$ to the central station. Otherwise, repeat the procedure from Step S1 for the next time step.
At the central station:
C1. Calculate the simplified MD $d\left(\theta_{i|i}^{(s)}\right)$, $s = 1, 2, \ldots, N_s$, using Eq. (7.48), and obtain the suspicious dataset $e_{S,I}$ and the initial regular dataset $e_{R0,I}$.
C2. Compute the initial fusion results $\theta_I^{f0}$ and $v_I^{f0}$ using Eqs. (7.53) and (7.54), respectively.
C3. Compute $\Delta_I^{(n)}$, $n = 1, 2, \ldots, N^{(S)}$, according to Eq. (7.58) and $P_o^G\left(e_{S,I}^{(n)}\right)$ according to Eq. (7.60). If $P_o^G\left(e_{S,I}^{(n)}\right) < 0.5$, the data point $e_{S,I}^{(n)}$ is reclassified as a regular data point.
C4. Update the regular dataset using Eq. (7.62) and calculate the updated fusion results using Eqs. (7.63) and (7.64).
C5. Send the fusion results back to the sensor nodes and continue with the next time step.
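Training steps S2–S3 can be sketched as follows. A short justification for the 68th percentile: if the residuals are zero-mean Gaussian with standard deviation $\sigma$, then $P(|e| \le \sigma) \approx 0.683$, so the 68th percentile of $|e|$ is close to $\sigma$, and unlike the sample standard deviation it is barely moved by a few gross outliers in the training data. The nearest-rank percentile rule, the synthetic demo data and the function names are assumptions; the procedure above does not specify an interpolation rule.

```python
import math
import random
from typing import Sequence


def percentile_nearest_rank(sorted_vals: Sequence[float], p: float) -> float:
    """p-th percentile (0 < p <= 100) of already-sorted data by the nearest-rank rule."""
    rank = max(1, math.ceil(p * len(sorted_vals) / 100.0))
    return sorted_vals[rank - 1]


def initial_sigma(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Training steps S2-S3: 68th percentile of the sorted absolute prediction residuals."""
    residuals = sorted(abs(z - zp) for z, zp in zip(measured, predicted))
    return percentile_nearest_rank(residuals, 68.0)


# Demo (synthetic): Gaussian residuals with sigma = 2 plus 1% gross outliers.
rng = random.Random(1)
predicted = [0.0] * 20000
measured = [rng.gauss(0.0, 2.0) for _ in predicted]
for k in range(0, len(measured), 100):
    measured[k] = 100.0
sigma0 = initial_sigma(measured, predicted)
print(round(sigma0, 2))  # roughly 2 despite the outliers
```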


7.6 Application to Model Updating of a Forty-Story Building

A forty-story building, shown in Fig. 7.13, is considered. It has uniformly distributed floor mass and interstory stiffness, and the stiffness-to-mass ratio was taken as 2160 s$^{-2}$. As a result, the fundamental frequency of the building is 0.2869 Hz. The Rayleigh damping model was used, and the damping matrix was given by $C = \alpha M + \beta K$, where $\alpha = 0.0270$ s$^{-1}$ and $\beta = 0.0028$ s, so that the damping ratios of the first two modes were 1.0%. The stiffness matrix was parameterized as $K = \sum_{n=1}^{6} \theta_k^{(n)} K^{(n)}$, where $\theta_k^{(n)}$ and $K^{(n)}$ are the stiffness parameter and stiffness matrix of the $n$th substructure, respectively. Specifically, $\theta_k^{(1)}$ and $\theta_k^{(2)}$ were assigned to the 1st and 2nd stories, respectively; $\theta_k^{(3)}$, $\theta_k^{(4)}$, $\theta_k^{(5)}$ and $\theta_k^{(6)}$ were assigned to the 3rd–5th, 6th–10th, 11th–20th and 21st–40th stories, respectively. The building was subjected to ground excitation modeled as zero-mean Gaussian white noise with spectral intensity $S_0 = 1.6 \times 10^{-4}$ m$^2$/s$^3$. The entire monitoring duration was 500 s. The sampling rate at the sensor nodes was $r_s = 250$ Hz, while the transmission/processing rate at the central station was $r_c = 1$ Hz. The building was undamaged in the first 200 s. Then, 5% stiffness reductions occurred in the first and second stories at the 200th s and the 350th s, respectively.

Fig. 7.13 Forty-story building

Sixteen wireless sensor nodes were placed on the 1st to 6th, 8th, 9th, 10th, 12th, 15th, 16th, 18th, 20th, 30th and 40th floors, and each sensor node measured the displacement and acceleration responses of its floor. Two types of abnormal data, i.e., outliers and biases, were generated, since they are the most commonly encountered anomalies in sensing systems for structural health monitoring (Fu et al. 2019). Specifically, the raw measurements were composed of regular noisy measurements, outliers and biases. The measurement noise level for the regular noisy measurements was taken as 5% of the rms of the corresponding noise-free response quantities. Moreover, the outlier occurrence rate, which controls the amount of outliers in the raw measurements, was taken as 1% for each measurement channel. The outliers were generated from a uniform distribution ranging from the minimum to the maximum values of the corresponding noise-free response quantities. On the other hand, the wireless sensor nodes placed on the 10th, 20th and 30th floors were biased from the beginning of the monitoring period. In particular, the displacement measurement channel of the 10th floor was biased with a bias level of 20% of the rms of the corresponding noise-free response quantity; the acceleration measurement channel of the 20th floor was biased with a bias level of 15%; and both measurement channels of the 30th floor were biased with a bias level of 10%. Note that the outlier occurrence rate, the outlier distribution and the identities of the biased sensors were unknown throughout the entire identification process; this information was used only to simulate the data.
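The fundamental frequency and damping ratios quoted at the beginning of this section can be cross-checked. The sketch assumes a fixed-base shear-building idealization with identical floor masses $m$ and interstory stiffnesses $k$ (our reading of "uniformly distributed"), for which $\omega_j^2 = 4(k/m)\sin^2\left[(2j-1)\pi/(2(2N+1))\right]$, together with the standard Rayleigh result $\zeta_j = \alpha/(2\omega_j) + \beta\omega_j/2$. The function names are hypothetical.

```python
import math


def shear_building_omegas(n_stories: int, k_over_m: float, n_modes: int) -> list:
    """Circular natural frequencies (rad/s) of a uniform fixed-base shear building.

    Closed form for identical floor masses m and interstory stiffnesses k:
    omega_j^2 = 4 (k/m) sin^2((2j - 1) pi / (2 (2N + 1))).
    """
    return [2.0 * math.sqrt(k_over_m)
            * math.sin((2 * j - 1) * math.pi / (2 * (2 * n_stories + 1)))
            for j in range(1, n_modes + 1)]


def rayleigh_damping_ratio(omega: float, alpha: float, beta: float) -> float:
    """Modal damping ratio for C = alpha M + beta K."""
    return alpha / (2.0 * omega) + beta * omega / 2.0


omegas = shear_building_omegas(n_stories=40, k_over_m=2160.0, n_modes=2)
f1 = omegas[0] / (2.0 * math.pi)
zetas = [rayleigh_damping_ratio(w, alpha=0.0270, beta=0.0028) for w in omegas]
print(f"f1 = {f1:.4f} Hz, damping ratios = {[round(z, 4) for z in zetas]}")
```

With $N = 40$ and $k/m = 2160$ s$^{-2}$ this gives $f_1 \approx 0.2869$ Hz and damping ratios of about 1.0% for the first two modes, consistent with the values stated above.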
The estimation results of the stiffness parameters obtained by the conventional EKF with multiple outlier-corrupted measurements are first shown in Fig. 7.14, and the time interval [360, 390] s is magnified in Fig. 7.15. The dotted lines represent the estimated values, the solid lines represent the actual values and the dashed lines represent the bounds of the 99.7% credible intervals. The same line styles are used in the later figures. The estimated stiffness parameters fluctuated severely, and the credible intervals failed to represent the posterior uncertainty. This illustrates that the outliers substantially affect the identification results, so detecting and removing them is crucial.

Fig. 7.14 Estimation results of stiffness parameters by using the conventional EKF

Fig. 7.15 Estimation results of stiffness parameters in the time interval [360, 390] s by using the conventional EKF

Next, the performance of the local outlier detection is examined. Two well-known indicators, namely masking and swamping, are used to assess the performance of local outlier detection; they are elaborated in Chap. 4. Table 7.2 shows the local outlier detection results. The first column indicates the sensor node number. The second and third columns indicate the location of the sensor node and the corresponding measurement channel. The fourth and fifth columns show the average masking and swamping percentages, respectively, over 50 independent simulation runs. Note that the outlier detection results in the first 10 s were excluded to eliminate the effect of the initial conditions. The results show that the local outlier detection approach successfully detected virtually all the local outliers, with a very low level of swamping in all the measurement channels. The masking and swamping values of the biased sensor nodes placed on the 10th and 20th floors were slightly larger than the others, since the biased measurements were associated with large errors that were not detectable at the sensor node level. However, the masking and swamping values of the biased sensor node placed on the 30th floor were not affected by the biased measurements, because the bias level was low.

Table 7.2 Local outlier detection results

Sensor node   Location (floor)   Channel        Masking (%)   Swamping (%)
1             1                  Displacement   0.00          0.14
                                 Acceleration   0.00          0.20
2             2                  Displacement   0.00          0.45
                                 Acceleration   0.00          0.33
3             3                  Displacement   0.00          0.34
                                 Acceleration   0.00          0.24
4             4                  Displacement   0.00          0.31
                                 Acceleration   0.00          0.20
5             5                  Displacement   0.00          0.25
                                 Acceleration   0.00          0.18
6             6                  Displacement   0.00          0.24
                                 Acceleration   0.00          0.32
7             8                  Displacement   0.00          0.26
                                 Acceleration   0.00          0.30
8             9                  Displacement   0.00          0.20
                                 Acceleration   0.00          0.18
9             10                 Displacement   0.01          1.35
                                 Acceleration   0.00          1.20
10            12                 Displacement   0.00          0.17
                                 Acceleration   0.00          0.21
11            15                 Displacement   0.00          0.10
                                 Acceleration   0.00          0.15
12            16                 Displacement   0.00          0.11
                                 Acceleration   0.00          0.23
13            18                 Displacement   0.00          0.10
                                 Acceleration   0.00          0.18
14            20                 Displacement   0.00          1.27
                                 Acceleration   0.08          1.43
15            30                 Displacement   0.00          0.14
                                 Acceleration   0.00          0.17
16            40                 Displacement   0.00          0.13
                                 Acceleration   0.00          0.17

Figure 7.16 shows the local estimation results of the stiffness parameters using the measurements from the 16th sensor node, placed on the 40th floor. At this sensor node there was no sensor bias, but the measurements were contaminated with regular measurement noise and outliers. Even though the local outliers had already been excluded from the identification process, the local estimation results exhibited large posterior uncertainty and fluctuated severely, owing to the limited information available from this single sensor node. Therefore, fusing the local estimation results is indispensable. On the other hand, Fig. 7.17 shows the local estimation results of the stiffness parameters using the measurements from the 15th sensor node, placed on the 30th floor. At this sensor node, there was a 10% sensor bias in both measurement channels, in addition to regular measurement noise and outliers. These local estimation results exhibited much higher fluctuation. However, the sensor bias-induced outliers were not detectable at the sensor node level, because the displacement measurements of the 15th sensor node carried the same large error throughout the entire time histories due to the sensor bias. As a result, it is necessary to perform global outlier detection at the central station.

Fig. 7.16 Estimation results of stiffness parameters at the 16th sensor node

At each central time step, the transmitted local estimation results were first subjected to global outlier detection at the central station, and the detection results are shown in Table 7.3. The first and second columns indicate the sensor node number and the corresponding location. The third column indicates the detection rate of the global outliers: for each sensor node, it is the ratio of the number of central time steps at which its local estimation result was detected as a global outlier to the total number of central time steps, averaged over 50 simulation runs. The average detection rates for the sensor nodes placed on the 10th, 20th and 30th floors were 97.20%, 93.40% and 91.60%, respectively, while those of the other sensor nodes were virtually zero. This illustrates that the proposed approach can successfully detect the biased sensor nodes by identifying the global outliers at the central station. After removal of the global outliers, Bayesian fusion was performed to obtain the final estimation. Figure 7.18 shows the fusion results of the stiffness parameters at the central station. The estimated stiffness parameters agreed well with the actual values, which lay within the 99.7% credible intervals, and the large uncertainty in the local estimation results was significantly reduced. In addition, there was a small time lag in tracking the abrupt changes of $\theta_k^{(1)}$ and $\theta_k^{(2)}$, since the estimation results were based on the measured data at the current and previous time steps. Nevertheless, the time delay was tolerably small.
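The simulated data and the scores reported in Tables 7.2 and 7.3 can be mimicked with the sketch below. It is an illustration built on several assumptions: the exact masking and swamping definitions are those of Chap. 4, and here they are taken as the percentage of true outliers that are missed and the percentage of regular points that are wrongly flagged; the bias is modeled as a constant offset added to every sample of a biased channel (outliers included); and a sine wave stands in for the noise-free response. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def corrupt(clean: List[float], noise_frac: float, outlier_rate: float,
            bias_frac: float, rng: random.Random) -> Tuple[List[float], List[bool]]:
    """Regular noise + uniformly distributed outliers + constant bias, with true labels."""
    rms = math.sqrt(sum(c * c for c in clean) / len(clean))
    lo, hi = min(clean), max(clean)
    bias = bias_frac * rms                     # constant sensor offset (0 for unbiased channels)
    z, is_outlier = [], []
    for c in clean:
        if rng.random() < outlier_rate:        # outlier drawn over the noise-free response range
            z.append(rng.uniform(lo, hi) + bias)
            is_outlier.append(True)
        else:                                  # regular sample: Gaussian noise, std = noise_frac * rms
            z.append(c + rng.gauss(0.0, noise_frac * rms) + bias)
            is_outlier.append(False)
    return z, is_outlier


def masking_swamping(is_outlier: List[bool], flagged: List[bool]) -> Tuple[float, float]:
    """Masking: % of true outliers missed; swamping: % of regular points wrongly flagged."""
    n_out = sum(is_outlier)
    n_reg = len(is_outlier) - n_out
    missed = sum(t and not f for t, f in zip(is_outlier, flagged))
    false_alarms = sum(f and not t for t, f in zip(is_outlier, flagged))
    masking = 100.0 * missed / n_out if n_out else 0.0
    swamping = 100.0 * false_alarms / n_reg if n_reg else 0.0
    return masking, swamping


def detection_rate(global_flags: List[bool]) -> float:
    """% of central time steps at which a node's local result was declared a global outlier."""
    return 100.0 * sum(global_flags) / len(global_flags)


rng = random.Random(0)
clean = [math.sin(0.01 * k) for k in range(10000)]   # stand-in noise-free response history
z, labels = corrupt(clean, noise_frac=0.05, outlier_rate=0.01, bias_frac=0.0, rng=rng)
print(sum(labels), masking_swamping(labels, labels))  # about 100 outliers; a perfect detector scores (0.0, 0.0)
```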

7.7 Concluding Remarks

In this chapter, two methods for handling typical challenging problems in system identification, namely asynchronous measurements and multiple outlier-corrupted measurements, are presented. The proposed methods are built on the online dual-rate distributed identification framework introduced in Chap. 6.


Fig. 7.17 Estimation results of stiffness parameters at the 15th sensor node

The proposed approach for handling asynchronous measurements provides a simple but reliable framework for online distributed system identification using asynchronous data directly. It requires neither a model of the asynchronism nor estimation of the time shifts among different sensor nodes, so the additional computational cost of quantifying the asynchronous time shifts is avoided. On the other hand, the proposed hierarchical outlier detection method for handling multiple outlier-corrupted measurements provides a comprehensive way to detect different types of outliers. It detects the local outliers according to the outlier probability of the raw measurements at the sensor nodes, and the global outliers according to the outlier probability of the local estimation results at the central station. By excluding both types of outliers, reliable system identification results can be achieved for time-varying dynamical systems in an online manner.

Table 7.3 Global outlier detection results

Sensor node   Location (floor)   Detection rate (%)
1             1                  0.40
2             2                  0.00
3             3                  0.00
4             4                  0.00
5             5                  0.00
6             6                  0.20
7             8                  0.60
8             9                  0.00
9             10                 97.20
10            12                 0.20
11            15                 0.20
12            16                 0.20
13            18                 0.00
14            20                 93.40
15            30                 91.60
16            40                 0.00

Fig. 7.18 Estimation results of stiffness parameters at the central station

References

Aggarwal CC (2017) Outlier analysis. Springer, Cham

Bai X, Wang Z, Sheng L, Wang Z (2018) Reliable data fusion of hierarchical wireless sensor networks with asynchronous measurement for greenhouse monitoring. IEEE Trans Control Syst Technol 27(3):1036–1046

Dragos K, Theiler M, Magalhães F, Moutinho C, Smarsly K (2018) On-board data synchronization in wireless structural health monitoring systems based on phase locking. Struct Control Health Monit 25:e2248

Fu Y, Mechitov K, Hoang T, Kim JR, Memon SA, Spencer BF Jr (2021) Efficient and high-precision time synchronization for wireless monitoring of civil infrastructure subjected to sudden events. Struct Control Health Monit 28(1):e2643

Fu Y, Peng C, Gomez F, Narazaki Y, Spencer BF Jr (2019) Sensor fault management techniques for wireless smart sensor networks in structural health monitoring. Struct Control Health Monit 26(7):e2362

Gupta M, Gao J, Aggarwal CC, Han J (2013) Outlier detection for temporal data: a survey. IEEE Trans Knowl Data Eng 26(9):2250–2267

He T, Stankovic JA, Abdelzaher TF, Lu C (2005) A spatiotemporal communication protocol for wireless sensor networks. IEEE Trans Parallel Distrib Syst 16(10):995–1006

Huang K, Yuen KV (2020) Hierarchical outlier detection approach for online distributed structural identification. Struct Control Health Monit 27(11):e2623

Lei Y, Kiremidjian AS, Nair KK, Lynch JP, Law KH (2005) Algorithms for time synchronization of wireless structural monitoring sensors. Earthq Eng Struct D 34:555–573

Li J, Mechitov KA, Kim RE, Spencer BF Jr (2016) Efficient time synchronization for structural health monitoring using wireless smart sensor networks. Struct Control Health Monit 23:470–486

Maes K, Reynders E, Rezayat A, De Roeck G, Lombaert G (2016) Offline synchronization of data acquisition systems using system identification. J Sound Vib 381:264–272

Mu HQ, Yuen KV (2015) Novel outlier-resistant extended Kalman filter for robust online structural identification. J Eng Mech 141(1):04014100

Nagayama T, Spencer BF Jr (2007) Structural health monitoring using smart sensors. University of Illinois at Urbana-Champaign, Newmark Structural Engineering Laboratory

Su W, Akyildiz IF (2005) Time-diffusion synchronization protocol for wireless sensor networks. IEEE/ACM Trans Netw 13(2):384–397

Zhang Y, Meratnia N, Havinga PJM (2010) Outlier detection techniques for wireless sensor networks: a survey. IEEE Commun Surv Tut 12(2):159–170

Zhu YC, Au SK (2017) Spectral characteristics of asynchronous data in operational modal analysis. Struct Control Health Monit 24(11):e1981