148 101 10MB
English Pages 309 [304] Year 2023
Badong Chen · Lujuan Dang Nanning Zheng · Jose C. Principe
Kalman Filtering Under Information Theoretic Criteria
Kalman Filtering Under Information Theoretic Criteria
Badong Chen • Lujuan Dang • Nanning Zheng • Jose C. Principe
Kalman Filtering Under Information Theoretic Criteria
Badong Chen National Key Laboratory of Human-Machine Hybrid Augmented Intelligence National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics Xi’an Jiaotong University Xi’an, China
Lujuan Dang National Key Laboratory of Human-Machine Hybrid Augmented Intelligence National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics Xi’an Jiaotong University Xi’an, China
Nanning Zheng National Key Laboratory of Human-Machine Hybrid Augmented Intelligence National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics Xi’an Jiaotong University Xi’an, China
Jose C. Principe Electrical and Computer Engineering Department University of Florida Gainesville, FL, USA
ISBN 978-3-031-33763-5 ISBN 978-3-031-33764-2 https://doi.org/10.1007/978-3-031-33764-2
(eBook)
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Kalman filtering is a powerful technology for state estimating of a dynamic system, which finds applications in many areas, including navigation, guidance, data integration, pattern recognition, tracking, and control systems. The workhorse of Kalman filtering has been the minimum mean square error (MMSE) criterion for historical reasons, not because it is the best we can do. Minimizing the power of the error variable is tractable in mathematics and has an intuitive understanding in engineering. Moreover, if the system is linear and innovation and noise are Gaussian, the MMSE yields an optimal estimator that can be solved recursively; therefore, most of the textbooks today still only treat this case, which is rather unfortunate. The Gaussian assumption is, however, seldom the case in real-world applications, where noise distributions tend to be skewed, multimodal, or heavytailed. This will create problems for MMSE because the solution is no longer the optimal and state estimation performance may degrade seriously in the case of nonGaussian noise. Researchers have proposed approaches (e.g., particle filtering) to solve the performance degradation in non-Gaussian noises, but the computational complexity is very high and/or the solutions found involved several free parameters that need to be tuned. In recent years, the remarkable advantages of information theoretic criteria (e.g., maximum correntropy criterion (MCC) and minimum error entropy (MEE) criterion) in solving non-Gaussian problems in machine learning and signal processing are more and more recognized by researchers. Under information theoretic criteria, the filters are adapted to minimize (or maximize) information theoretic quantities (such as the error entropy) instead of minimizing the mean square error. Since information theoretic quantities can capture higher-order statistics of the error, they offer potentially significant performance improvement in non-Gaussian noises. In the past few years, the authors have developed several efficient Kalman filters (linear or nonlinear) under information theoretic criteria, which can achieve excellent performance in complicated non-Gaussian noises with reasonable computation, yielding great practical application potential. The purpose of this book is to combine all these perspectives and results in a single textbook that will become an important reference for students and practitioners in relevant application fields. v
vi
Preface
To this end, the style and approach of the book will be geared to senior undergraduates with a basic understanding of linear algebra, signal processing and statistics, graduate students or practitioners with experience in Kalman filtering. Each chapter will start with a brief review of fundamentals and then present the material focused on the most important properties and then present results with a discussion of free parameters and their effect on the results. Proofs will be kept at a minimum and relegated to the end of each chapter. Xi’an, China Xi’an, China Xi’an, China Gainesville, FL, USA
Badong Chen Lujuan Dang Nanning Zheng Jose C. Principe
Contents
1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Kalman Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3 Robust Kalman Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4 Robust Kalman Filtering Under Information Theoretic Criteria . . . . . . 1.4.1 Information Theoretic Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4.2 Application to Kalman Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5 Organization of the Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1 1 2 3 6 6 7 8
2
Kalman Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Linear Kalman Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.1 Bayesian Estimation (BE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.2 Maximum a Posteriori Estimation (MAP) . . . . . . . . . . . . . . . . . . . . . 2.1.3 Linear Minimum Variance Estimation (LMV) . . . . . . . . . . . . . . . . 2.1.4 Minimum Variance Estimation (MV) . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.5 Weighted Least Squares (WLS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1.6 Batch-Mode Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Nonlinear Kalman Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.1 Extended Kalman Filter (EKF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.2 Unscented Kalman Filter (UKF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.3 Cubature Kalman Filter (CKF). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.4 Others . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Robust Kalman Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.1 H-Infinite Filter (HIF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.2 Adaptive Kalman Filter (AKF). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.3 Student’s t-Based Kalman Filter (SKF). . . . . . . . . . . . . . . . . . . . . . . . 2.3.4 Particle Filter (PF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.5 Huber-Based Kalman Filter (HKF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11 12 14 17 19 22 23 24 28 30 31 33 35 38 38 40 43 44 48 50
vii
viii
Contents
3
Information Theoretic Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Maximum Correntropy Criterion (MCC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1.1 Correntropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1.2 Maximum Correntropy Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Minimum Error Entropy (MEE). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1 Renyi’s Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.2 Minimum Error Entropy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Minimum Error Entropy with Fiducial Points (MEEF) . . . . . . . . . . . . . . . 3.3.1 MEEF Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.2 Fixed-Point Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.1 Appendix 3.A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.2 Appendix 3.B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.3 Appendix 3.C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.4 Appendix 3.D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.5 Appendix 3.E . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.6 Appendix 3.F . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.7 Appendix 3.G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.8 Appendix 3.H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
53 54 54 56 65 65 66 73 73 75 77 77 79 79 80 82 83 85 87
4
Kalman Filtering Under Information Theoretic Criteria . . . . . . . . . . . . . . . 4.1 Kalman Filter (KF). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.1 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.2 Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Maximum Correntropy Kalman Filter (MCKF) . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 MCKF Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.2 Computational Complexity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.3 Convergence Issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.4 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Other Approaches for MCC-Based Kalman Filtering . . . . . . . . . . . . . . . . . 4.3.1 Correntropy Filter (C-Filter) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.2 Modified Correntropy Filter (MC-Filter) . . . . . . . . . . . . . . . . . . . . . . 4.3.3 Maximum Correntropy Criterion Kalman Filter (MCC-KF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.4 Measurement-Specific Correntropy Filter (MSCF) . . . . . . . . . . . 4.4 Generalized Maximum Correntropy Kalman Filter (GMCKF) . . . . . . . 4.4.1 GMCKF Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.2 Computational Complexity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.3 Parameter Selection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.4 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 Minimum Error Entropy Kalman Filter (MEE-KF) . . . . . . . . . . . . . . . . . . . 4.5.1 MEE-KF Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5.2 Computational Complexity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5.3 Convergence Issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5.4 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
89 90 90 90 91 91 94 95 97 101 101 102 102 103 104 105 108 109 110 114 114 117 119 121
Contents
ix
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7 Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7.1 Appendix 4.A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7.2 Appendix 4.B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
123 124 124 125
5
Extended Kalman Filtering Under Information Theoretic Criteria . . . 5.1 Extended Kalman Filter (EKF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.1 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.2 Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Maximum Correntropy Extended Kalman Filter (MCEKF) . . . . . . . . . . 5.2.1 Linear Regression MCEKF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.2 Nonlinear Regression MCEKF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.3 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Minimum Error Entropy Extended Kalman Filter (MEE-EKF) . . . . . . 5.3.1 MEE-EKF Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.2 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.5 Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.5.1 Appendix 5.A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.5.2 Appendix 5.B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
127 127 128 128 128 129 132 135 141 141 142 145 145 145 146
6
Unscented Kalman Filter Under Information Theoretic Criteria . . . . . . 6.1 Unscented Kalman Filter (UKF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.1 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.2 Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Maximum Correntropy Unscented Filter (MCUF) . . . . . . . . . . . . . . . . . . . . 6.2.1 MCUF Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.2 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3 Maximum Correntropy Unscented Kalman Filter (MCUKF) . . . . . . . . 6.3.1 MCUKF Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3.2 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4 Unscented Kalman Filter with Generalized Correntropy Loss (GCL-UKF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4.1 GCL-UKF Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4.2 Enhanced GCL-UKF (EnGCL-UKF) Algorithm . . . . . . . . . . . . . 6.4.3 Application to Power System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4.4 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5 Minimum Error Entropy Unscented Kalman Filter (MEE-UKF). . . . . 6.5.1 MEE-UKF Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5.2 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
149 149 150 150 151 151 154 158 158 161
Cubature Kalman Filtering Under Information Theoretic Criteria . . . 7.1 Cubature Kalman Filter (CKF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.1 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.2 Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
191 191 192 192
7
162 163 164 165 167 172 172 176 189
x
Contents
7.2 Maximum Correntropy Cubature Kalman Filter (MCCKF) . . . . . . . . . . 7.3 Maximum Correntropy Square-Root Cubature Kalman Filter (MCSCKF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.3.1 MCSCKF Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.3.2 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4 Cubature Kalman Filter Under Minimum Error Entropy with Fiducial Points (MEEF-CKF). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4.1 MEEF-CKF Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4.2 Determination of the Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4.3 Convergence Issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4.4 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Additional Topics in Kalman Filtering Under Information Theoretic Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.1 Maximum Correntropy Kalman Filter with State Constraints (MCKF-SC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.1.1 Linear Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.1.2 Nonlinear Constraint. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.1.3 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2 Correntropy-Based Divided Difference Filter (CDD) . . . . . . . . . . . . . . . . . 8.2.1 First-Order Divided Difference Filter (DD1) . . . . . . . . . . . . . . . . . . 8.2.2 Second-Order Divided Difference Filter (DD2) . . . . . . . . . . . . . . . 8.2.3 Correntropy-Based DD1 and DD2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2.4 Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.3 Dual Extended Kalman Filter Under Minimum Error Entropy with Fiducial Points (MEEF-DEKF) . . . . . . . . . . . . . . . . . . . . . . . . . 8.3.1 State-Space Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.3.2 Dual Extended Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.3.3 MEEF-DEKF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.3.4 Illustrative Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4 Kernel Kalman Filtering with Conditional Embedding Operator and Maximum Correntropy Criterion (KKF-CEO-MCC) . . 8.4.1 Kernel Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4.2 Statistical Embeddings in Reproducing Kernel Hilbert Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4.3 Kernel Kalman Filtering with Conditional Embedding Operator (KKF-CEO) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4.4 KKF-CEO-MCC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4.5 Simplified Versions of KKF-CEO-MCC . . . . . . . . . . . . . . . . . . . . . . 8.4.6 Computational Complexity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4.7 Illustrative Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.6 Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.6.1 Appendix 8.A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
193 194 196 199 207 209 212 214 216 227 229 229 230 233 235 237 238 239 241 243 247 247 250 251 255 259 259 259 261 262 268 271 272 279 280 280
Contents
xi
8.6.2 8.6.3 8.6.4
Appendix 8.B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281 Appendix 8.C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282 Appendix 8.D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
Acronyms
a, b, c, x, σ, η, λ x, y, r, q .Q, R, P, K .R, F, S, X, Y . .
∂g(x) ∂x ∂g(x) . ∂x
.
diag{a1 , a2 , · · · , an } x(1 : k) .| · | . · .·, · .N(a, B) . .
L(a, b)
.
U(a, b) Hφ (x) .Hα (x) .I −1 .A .T r(A) ˆ .a ˆ .A .a A . T .(·) .sign(·) .exp(·) .log(·) .κ(·) . .
Scalars Vectors Matrices Spaces Gradient (column vector) of scalar function g Jacobian matrix of vector valued function .x → g(x) Diagonal matrix with diagonal values .a1 , a2 , · · · , an Set or sequence containing the vectors .{x(1), x(2), · · · , x(k)} Absolute value of a real number Euclidean norm of a vector Inner product Gaussian distribution with mean vector .a and covariance matrix .B Laplace distribution with location parameter a and scale parameter .b Uniform distribution over interval .[a, b] .φ-entropy of random variable x .α-order Renyi entropy of random variable x Identity matrix Inverse of matrix .A Trace of matrix .A Estimate of vector .a Estimate of matrix .A Estimate of vector .a with fixed-point interaton Estimate of matrix .A with fixed-point interaton Vector or matrix transposition Sign function Exponential function Natural logarithm function Mercer kernel function xiii
xiv
AKF BE C-DDF C-Filter CKF CPF DDF DRS EKF EnGCL-UKF GCL-UKF GHQF GMCKF GPS GSF HIF HKF IAKF INS IP ITC-KFs ITL KF KF-EP KF-PDFT KKF-CEO KKF-CEO-MCC LMV LRMCEKF MAP MC MCC MCC-KF MCCKF MCEKF MC-Filter MCKF MCKF-EP MCKF-PDFT
Acronyms
Adaptive Kalman filter Bayesian estimation Correntropy-based divided difference filter Correntropy filter Cubature Kalman filter Cubature particle filter Divided difference filter Doubly robust smoothing Extended Kalman filter Enhenced unscented Kalman filter with generalized correntropy loss Unscented Kalman filter with generalized correntropy loss Gauss-Hermite quadrature filter Generalized maximum correntropy Kalman filter Global positioning system Gaussian sum filter H-infinite filter Huber-based Kalman filter Innovation adaptive Kalman filter Inertial navigation system Information potential Information theoretic class of Kalman filters Information theoretic learning Kalman filter Kalman filter with estimate projection Kalman filter with probability density function truncation Kernel Kalman filtering with conditional embedding operator Kernel Kalman filtering with conditional embedding operator and maximum correntropy criterion Linear minimum variance estimation Linear regression maximum correntropy extended Kalman filter Maximum a posterior Monte Carlo Maximum correntropy criterion Maximum correntropy criterion Kalman filter Maximum correntropy cubature Kalman filter Maximum correntropy extended Kalman filter Modified correntropy filter Maximum correntropy Kalman filter Maximum correntropy Kalman filter with estimate projection Maximum correntropy Kalman filter with probability density function truncation
Acronyms
MCKF-SC MCSCKF MCUKF MEE MEE-EKF MEEF MEEF-CKF MEEF-DEKF MEE-KF MEE-UKF MMAKF MMSE MSCF MSE MV NRMCEKF PDF PF PMU QIP QS RKHS RSTKF SHAKF SIR STF UKF UT WLS
xv
Maximum correntropy Kalman filter with state constraints Maximum correntropy square-root cubature Kalman filter Maximum correntropy unscented Kalman filter Minimum error entropy Minimum error entropy extended Kalman filter Minimum error entropy with fiducial points Cubature Kalman filter under minimum error entropy with fiducial points Dual extended Kalman filter under minimum error entropy Minimum error entropy Kalman filter Minimum error entropy unscented Kalman filter Multiple model adaptive Kalman filter Minimum mean square error Measurement-specific correntropy filter Mean square error Minimum variance estimation Nonlinear regression maximum correntropy extended Kalman filter Probability density function Particle filter Phasor measurement units Quadratic information potential Quadratic support Reproducing kernel Hilbert spaces Robust student’s t based Kalman filter Sage-Husa adaptive Kalman filter Sequential importance resampling Student’s t based filter Unscented Kalman filter Unscented transform Weighted least squares with fiducial points
Chapter 1
Introduction
1.1 Estimation From industrial equipment to signal processing, machine learning, optimal control, and other research fields, parameter and state estimation has always been a very important problem [1–7]. Estimation theory is a branch of probability theory and mathematical statistics. It is a mathematical method for extracting certain parameters or states of the system based on disturbed observations. Generally, estimation can be divided into two categories: parameter estimation and state estimation. In parameter estimation, the goal is to estimate the constant or slowly changing system’s parameters over time based on the observations. State estimation is based on the state space model, which estimates the continuously changing state of the system over time based on the observation data. Common estimation methods include least squares estimation [3, 4, 7, 8], maximum likelihood estimation [9], maximum a posteriori estimation [10], and Bayesian estimation [9]. The least squares method can provide an optimal recursive solution, which is one of the most basic methods in signal processing and machine learning [3, 7]. The last three methods are developed in probability space, which poses the estimation problem from the probability density function perspective [9, 10]. In adaptive filter (AF) theory, parameter estimation is based on least squares estimation, in which the desired signal can be separated from an additive mixture of signal and noise [5]. Generally, noise cancellation with the traditional low-pass filter, high-pass filter, band-pass filter, and band-stop filter is limited by the signal and noise spectral overlap [11]. To solve this issue, N. Wiener proposed the design of a statistically optimal filter in the frequency domain called now the Wiener filter in his honor [4]. The classic Wiener filter obtains the optimal solution by solving the WienerHopf equation under the minimum mean square error (MMSE) criterion, which uses the characteristics of a pair of signals, the input and measurement. However, it is suitable for dealing with stationary random processes, and it is a simple moving average parameter estimator without an internal state, which hinders its applicability © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Chen et al., Kalman Filtering Under Information Theoretic Criteria, https://doi.org/10.1007/978-3-031-33764-2_1
1
2
1 Introduction
in practical problems. The Kalman filter (KF) is a special case of the optimal recursive Bayesian state estimation theory for time series [3], which is obtained under the assumption of a linear model and Gaussian innovations and errors. The elegance and simplicity of this special case made the Kalman filer a success story and opened the door for a wealth of applications in real-world scenarios [1–7].
1.2 Kalman Filtering State estimation of a dynamical system is of paramount importance in various applications including tracking and navigation. Different from classic adaptive filtering, Kalman proposed an optimal recursive filtering method referred to as KF, which is suitable for both stationary and nonstationary random processes [1, 2]. KF has been widely used in signal processing, communications, radar, biomedicine, economics, and other fields [12–17]. The original KF was derived for a linear state space model with Gaussian assumption. To cope with the nonlinear estimation problems, a variety of nonlinear extensions of the original KF have been proposed in the literature, including extended Kalman filter (EKF) [18], second-order extended Kalman filter (SEKF) [19], unscented Kalman filter (UKF) [20, 21], Gauss-Hermite quadrature filter (GHQF) [22, 23], cubature Kalman filter (CKF) [24], and many others. The EKF is a basic extension of the KF, which approximates the nonlinear system by its first-order linearization at the operating point. However, when the initial estimate of the state is not accurate enough or the state space is modeled incorrectly, the EKF cannot achieve satisfactory filtering performance owing to the linearization error. The SEKF maintains the second-order terms of the Taylor expansions of the state and measurement equations. The calculation of the Jacobian matrix for the nonlinear system is necessary, which is an obstacle in nonlinear state estimation [25]. In comparison, the UKF employs the unscented transformation (UT) to calculate the mean and covariance of the innovation by a deterministic set of sigma points [26]. The UKF can achieve third-order accuracy for Gaussian inputs and at least second-order accuracy for non-Gaussian inputs [27]. Selecting the appropriate UKF scaling parameter is difficult in operating the filter. The GHQF is based on the Gauss-Hermite quadrature (GHQ) rule and can provide a stable and accurate estimate, which performs at a higher accuracy than the UKF [22, 23]. However, since the quadrature points in GHQF increase exponentially with the state dimension, it suffers from the curse of dimensionality. To ease the burden on the curse of dimensionality, the CKF has been introduced, which is based on the spherical-radial cubature rule for numerical integration. As a special case of the UKF, the CKF is more stable than the UKF on account of positive weights. The KF and its nonlinear extensions in general perform well in Gaussian noises [28], but their performance may deteriorate significantly under non-Gaussian noises. Therefore, several advanced Kalman filters (KFs) have being proposed to improve robustness.
1.3 Robust Kalman Filtering
3
1.3 Robust Kalman Filtering The estimation performance of Kalman filtering is likely to get worse when applied to non-Gaussian situations. In actual industrial applications, heavy-tailed noise such as impulsive noise or mixed Gaussian noise is very common. For example, in the electric power system, due to communication channel noise, GPS synchronization process, changes in ambient temperature, and system operating conditions, the system process and measurement noise may have long tails that deviate from the Gaussian distribution [29–38], as shown in Fig. 1.1. In strapdown inertial navigation, due to the influence of large maneuvers, the output error of the gyro will increase with an increase of the scale factor error, but the angular rate is larger, so that the inertial navigation error will increase significantly. Therefore, the process noise of strapdown inertial navigation/GPS integration has a heavytailed distribution [39, 40], which deviates from the Gaussian distribution, as shown in Fig. 1.2. In neurophysiology monitoring of microvolt brain waves (the electroencephalogram (EEG)), any minor change of the patient (head movement, blinking) and instrumentation or electrical interference can produce noise, which are called artifacts. The accurate modeling of cognitive processes and real-time feedback of neurons urgently requires online time-varying analysis that is robust to these abnormal effects [41–45]. Even the characteristics of voltage and current in lithium batteries during operation will also deviate from the statistical characteristics of a Gaussian distribution [46–56]. These non-Gaussian distributed noises seriously degrade the filtering accuracy of traditional KF. Therefore, robust Kalman filters were studied. Examples include the fixed-lag doubly robust smoothing (DRS) [57], sum-of-norm regularization [58], non-smooth quadratic support (QS) penalties [59], H-infinite filter (HIF) [60–62], and so on. The robust Kalman filters can be mainly divided into three categories. The first category focuses on updating the noise covariance matrix in real time. In order to effectively address the process uncertainty, some adaptive Kalman filters (AKFs) have been proposed to deal with process noise statistics estimation, such as Sage-Husa AKF (SHAKF) [63–67], innovation AKF (IAKF) [67–70], and multiple model AKF (MMAKF) [71, 72]. The SHAKF uses the maximum a posteriori criterion to estimate the process noise statistics, but it cannot guarantee convergence to an accurate process noise covariance matrix, which may lead to filtering divergence [64–67]. The IAKF assumes that the innovation error obeys the statistical characteristics of Gaussian white noise and can implement a realtime update of the process noise covariance matrix. Since IAKF is only suitable for slowly changing process noise covariance matrix, it needs a lot of relevant data to achieve accurate estimation of the noise covariance matrix [67–70]. In MMAKF, multiple sets of state space models are used to improve filtering accuracy. Among them, different state space models are based on different statistical characteristics of noise. The MMAKF can realize the improvement of filtering accuracy, but it brings about the increase of computational complexity [71, 72].
4
1 Introduction
Fig. 1.1 Error distributions of the phasor voltage angle and magnitude, the phasor current angle and magnitude, the real and reactive power using field PMU data provided by Pacific Northwest National Laboratory (From [29])
The second category aims to improve filtering robustness from the perspective of Bayesian inference including Gaussian sum filter (GSF) [73], robust student’s tbased Kalman filter (RSTKF) [74, 75], particle filter (PF) [9, 76], cubature particle filter (CPF) [77], and improved cubature Kalman filter (ICKF) [78]. Specifically, the
1.3 Robust Kalman Filtering
5
Fig. 1.2 Measurement velocities under maneuvers provided by the GPS/PHINS-based integration navigation system (From [39])
GSF employs a set of Gaussian mixture distributions to approximate the posterior distribution [73], and the RSTKF utilizes a student’s t distribution to model the heavy-tailed non-Gaussian noise giving posterior estimates with the variational Bayesian method [74, 75]. In PF, the Monte Carlo random sampling method is used to generate a large number of random particles to approximate the posterior distribution [9, 76]. The CPF combines the features of the CKF and PF [77]. The ICKF adopts the novel sigma-point update method to improve the robustness of CKF [78]. The third category is based on solving a multidimensional optimization problem, where some robust optimality criteria are used to develop the robust KFs for performance improvement under heavy-tailed noises. Generally, impulsive noises or heavy-tailed distribution noises are common in many real scenarios of automatic control and target tracking. However, the KF is based on the well-known minimum mean square error (MMSE) criterion, which is sensitive to large outliers and results in deterioration of the robustness in non-Gaussian noise environments. To address this problem, some KFs based on non-MMSE criteria are developed. For example, the Huber-based Kalman filters (HKF) [79–86] are derived by minimizing the piecewise Huber function which is based on .L1 and .L2 norms to achieve stable state estimation. More HKF variants are developed based on linearization or statistical linearization methods, such as Huber-based extended Kalman filter [87], Huberbased divided difference filtering [84], Huber-based unscented Kalman filter [85], and nonlinear regression Huber Kalman filter [86]. The Kalman filters derived from an information theoretic criterion are one of the leading areas of current research, because they perform very well particularly when the error distribution is heavytailed or multimodal.
6
1 Introduction
1.4 Robust Kalman Filtering Under Information Theoretic Criteria This book emphasizes a novel Kalman filter class, compactly called the information theoretic class of Kalman filters (ITC-KFs) [88, 89].
1.4.1 Information Theoretic Criteria Optimization criteria employing information theoretic learning (ITL) [88, 89] have gained increasing attention over the past few years. The information theoretic quantities (e.g., entropy or correntropy) in the optimization costs can be estimated directly from the data instead of the usual second-order statistical measures, such as variance and covariance. The appeal of information theoretic quantities is their ability to capture higher-order statistics and offer potentially significant performance improvement in machine learning and signal processing applications. ITL links information theory, nonparametric estimators, and reproducing kernel Hilbert spaces (RKHS) in a simple and unconventional way [88]. In ITL, the maximum correntropy criterion (MCC) [90] and minimum error entropy (MEE) [91] have received more and more recognition by researchers due to the remarkable advantages in solving non-Gaussian problems.
1.4.1.1
Maximum Correntropy Criterion (MCC)
Correntropy is a similarity measure defined on translation invariant nonlinear kernel evaluations of the input data. Its name derives from a contraction of the words “correlation” and “entropy”. For a linear kernel, correntropy defaults to correlation. The nonlinearity in the Gaussian kernel guarantees that correntropy expresses the sum of all even moments of the probability density function of the input data. If one adds all the kernel evaluations on input samples and takes the logarithm of the sum, one obtains Renyi’s quadratic entropy of the input data. Since correntropy induces a nonhomogeneous metric space, it is also a local similarity measure (hence lesssensitive to outliers), which makes it naturally a robust cost for machine learning and signal processing [92–96]. In supervised learning, such as regression, the problem can be formulated as that of maximizing the correntropy between model output and desired response. This optimization criterion is called in ITL the MCC. Recently, MCC has been successfully used in robust adaptive filtering in impulsive (heavytailed) noise environments [97–99]. The MCC solution cannot be obtained in closed form even for a simple linear regression problem, so one has to solve regression using an iterative update algorithm such as gradient-based methods [90]. The gradient-based methods are simple and widely used. But they depend on a free parameter, the step-size,
1.4 Robust Kalman Filtering Under Information Theoretic Criteria
7
and usually converge to solution slowly. A fixed-point iterative algorithm is an alternative efficient solution to MCC, which involves no step-size and may converge to the solution very fast [88, 100, 101]. A sufficient condition that guarantees the convergence of the fixed-point MCC algorithm was given in [102]. The MCC is a nice choice for dealing with heavy-tailed non-Gaussian noises, but its performance may not be good when facing more complex non-Gaussian noises, such as noises from multimodal distributions.
1.4.1.2
Minimum Error Entropy (MEE)
The MEE is another important learning criterion in ITL, which has been successfully applied in robust regression, classification, system identification, and adaptive filtering [103–111]. The basic principle of MEE is to learn a model to discover structure in data by minimizing the entropy of error between model and data generating system [88]. Entropy takes all higher-order moments into account and hence is a global descriptor of the underlying distribution. In general, the argument of the logarithm in Renyi’s entropy estimator, called information potential (IP), is a popular implementation of MEE (being a monotonic function, the logarithm does not change the location of the extreme of the loss function in the space of the parameters). As a result, at the minimum error entropy, the distribution of the system output matches the distribution of the desired signal. The robustness and superior performance of MEE have been confirmed both theoretically and numerically when subjected to non-Gaussian noises [112, 113]. In general, in order to obtain the minimum error solution, the MEE needs to center the error distribution at zero, which can be implemented with an extra weight that makes the error bias equal to zero [91]. To more naturally adjust the error mean, the MEE with fiducial points (MEEF) was proposed, which can automatically anchor the error mean at zero [114]. Similar to MCC, the fixed-point-based MEE algorithm and MEEF algorithm and their convergence analysis also have been studied [103, 115].
1.4.2 Application to Kalman Filtering In recent years, to solve the performance degradation problem in heavy-tailed (or impulsive) non-Gaussian noises, the authors developed some robust Kalman filters by using information theoretic criteria as the optimality criterion. Particularly, the MCC has been successfully applied in Kalman filtering to improve the robustness against impulsive noises. Typical examples include the maximum correntropy Kalman filter (MCKF) [116], generalized maximum correntropy Kalman filter (GMCKF) [117], maximum correntropy extended Kalman filter (MCEKF) [118], maximum correntropy unscented Kalman filter (MCUKF) [119, 120], unscented Kalman filter with generalized correntropy loss (GCL-UKF) [121], maximum correntropy cubature Kalman filter (MCCKF) [122], and its square root version
8
1 Introduction
(MCSCKF) [123] among others. Since correntropy is a local similarity measure and insensitive to large errors, these MCC-based filters are little influenced by large outliers. It is worth noting that in [124], the MCC has been used in hidden state estimation, but it involves no covariance propagation process and theoretically is not a Kalman filter. Although the MCC-based KF is a good alternative when the system is affected by some heavy-tailed distribution noises, the performance may not be as good in dealing with more complex non-Gaussian disturbances especially for multimodal distributions. Numerous experimental results show that MEE can outperform MCC in these situations although its computational complexity is higher [112, 113]. More recently, authors use the minimum error entropy to optimize the KF, generating minimum error entropy Kalman filter (MEE-KF) [125], minimum error entropy extended Kalman filter (MEE-EKF) [125], minimum error entropy unscented Kalman filter (MEE-UKF) [126], cubature Kalman filter under minimum error entropy with fiducial points (MEEF-CKF) [127], and so on. Similar to the traditional KF, the ITC-KFs not only retain the state mean propagation process but also preserve the covariance matrix propagation process. Thus, the ITC-KFs also have a recursive structure and are suitable for online implementation.
1.5 Organization of the Book The authors have developed several efficient Kalman filters (linear or nonlinear) under information theoretic criteria, which can achieve excellent performance in complicated non-Gaussian noises with non-expensive computation, and have great practical application potential. The purpose of this book is to combine all these perspectives and results in a single textbook that will become an important reference for students and practitioners in relevant application fields. The topics of this book are divided into the following chapters, and Fig. 1.3 provides a tree for the ITC-KFs in this book.
Fig. 1.3 A tree for the ITC-KFs in this book
1.5 Organization of the Book
9
Chapter 1 reviews the basic principle of Kalman filtering, information theoretic criteria, and robust Kalman filtering. Chapter 2 reviews the linear Kalman filters from different views and the nonlinear Kalman filters. In addition, some robust Kalman filters are also reviewed, which contain H-infinite filter, adaptive Kalman filter, student’s t-based Kalman filter, particle filter, and Huber-based Kalman filter. Chapter 3 introduces the information theoretic criteria including maximum correntropy criterion (MCC), minimum error entropy (MEE), and minimum error entropy with fiducial points (MEEF). Chapter 4 is the first chapter devoted to the ITC-KFs, which includes the maximum correntropy Kalman filter (MCKF), generalized maximum correntropy Kalman filter (GMC-KF), and minimum error entropy Kalman filter (MEE-KF). Other approaches for MCC-based Kalman filtering are also provided. Chapter 5 focuses on the extended ITC-KF, including the maximum correntropy extended Kalman filter (MCEKF) and the minimum error entropy extended Kalman filter (MEE-EKF). Chapter 6 summarizes the unscented ITC-KFs, which contains the maximum correntropy unscented filter (MCUF), maximum correntropy unscented Kalman filter (MCUKF), unscented Kalman filter with generalized correntropy loss (UKFGCL), and minimum error entropy unscented Kalman filter (MEE-UKF). Chapter 7 studies the cubature ITC-KF, which includes the maximum correntropy cubature Kalman filter (MCCKF), maximum correntropy square root cubature Kalman filter (MCSCKF), and cubature Kalman filter under minimum error entropy with fiducial points (MEEF-CKF). Chapter 8 extends the ITC-KF framework to state constraints, divided difference filter, and kernel Kalman filter (KKF) with conditional embedding. Based on MEEF, the dual extended Kalman filter is also presented.
Chapter 2
Kalman Filtering
Estimation of system parameters and system states has been one of the most important issues in signal processing, communications, optimal control, and robotics, with an enormous importance in the industry. The actual applications include parameter estimation [12], system identification [13], target tracking [14], simultaneous localization [15], and many others. The purpose of this chapter is to briefly review the foundations of statistical estimation. For linear dynamic systems, the estimation problem is usually solved by the Kalman filter (KF), which is, in essence, a least squares linear filter that accepts an optimal recursive solution [3, 7, 16, 17]. In order to solve the nonlinear filtering problem, researchers have made many additions to the KF family, including the extended Kalman filter (EKF) [18], the unscented Kalman filter (UKF) [20], the cubature Kalman filter (CKF) [24], and many others. EKF is a basic extension to the KF that approximates the nonlinear system by its first-order linearization at the operating point [18]. UKF was derived based on the unscented transform for probability density approximation [20], and CKF was developed by using a spherical–radial cubature rule for approximating the Bayesian filter [24]. At the same time, robust Kalman filters were also studied. Typical examples include the H-infinity filter (HIF) [60, 62], adaptive Kalman filter (AKF) [63, 64], robust student’s t-based Kalman filter (RSTKF) [74, 75], particle filter (PF) [9, 76], and Huber-based Kalman filter (HKF) [82, 83], etc. Specifically, the HIF can provide a robust estimate when noise characteristics of measurement are unknown [60, 62]. As for the AKF, Sage–Husa adaptive filtering algorithm is one of the most widely used methods, which uses a forgetting factor to weight the measurement noise covariance matrix for adaptively adjusting the Kalman gain [64]. To deal with the heavy-tailed noise, an RSTKF was proposed by using the student’s t distribution to approximate prior PDF and posterior PDF [74, 75]. In PF, the Monte Carlo random sampling method is used to generate a large number of random particles to approximate the posterior distribution [9, 76], and the HKF is a famous extension of the M-estimator to the Kalman filter setting, which utilizes a
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Chen et al., Kalman Filtering Under Information Theoretic Criteria, https://doi.org/10.1007/978-3-031-33764-2_2
11
12
2 Kalman Filtering
combined minimum .L1 -norm and .L2 -norm as a robust cost function [82, 83]. In the following, a brief review about these methods is presented.
2.1 Linear Kalman Filters KF provides a powerful tool to deal with state estimation for linear systems, which is an optimal estimator under linear and Gaussian assumptions. Consider a linear discrete-time dynamic system with unknown state vector .x(k) ∈ Rn×1 and available measurement vector .y(k) ∈ Rm×1 , where k is the index of the current time. To estimate the state .x(k), KF assumes a state space model described by the following state and measurement equations: .
x(k) = F(k − 1)x(k − 1) + q(k − 1).
(2.1)
y(k) = H(k)x(k) + r(k),
(2.2)
where .F(k − 1) ∈ Rn×n and .H(k) ∈ Rm×n are the state transition matrix and measurement matrix, at the past and current times, respectively. Here, the process noise .q(k − 1) ∈ Rn×1 also called the innovation and measurement noise .r(k) ∈ Rm×1 is mutually independent Gaussian white noise and satisfies .
E[q(k − 1)] = 0; E[r(k)] = 0.
(2.3)
E[q(k − 1)rT (k)] = 0.
(2.4)
E[q(k − 1)qT (k − 1)] = Q(k − 1).
(2.5)
E[r(k)rT (k)] = R(k).
(2.6)
E[q(k − 1)qT (j − 1)] = 0.
(2.7)
E[r(k)rT (j )] = 0,
(2.8)
where .E[·] is the expectation operator, .k = j , and .Q(k − 1) and .R(k) are the covariance matrices of .q(k − 1) and .r(k), respectively. Before deriving the KF, we give some basic definitions about the following important quantities: the state prediction error .x(k|k − 1) with the corresponding .x(k) with the error covariance matrix .P(k|k − 1), the state estimation error corresponding error covariance matrix .P(k), the measurement predication error .y(k|k − 1) with the corresponding error covariance matrix .Pyy (k|k − 1), and the cross-covariance .Pxy (k|k − 1) between .x(k|k − 1) and .y(k|k − 1): ˆ (k|k − 1). .x(k|k − 1) = x(k) − x P(k|k − 1) = E x(k|k − 1) xT (k|k − 1) .
(2.9) (2.10)
2.1 Linear Kalman Filters
13
Fig. 2.1 Diagram of KF algorithm
x(k) = x(k) − xˆ (k). P(k) = E x(k) xT (k) .
(2.11)
y(k|k − 1) = y(k) − yˆ (k|k − 1). Pyy (k|k − 1) = E y(k|k − 1) yT (k|k − 1) = Var(y(k), y(k)). Pxy (k|k − 1) = E x(k|k − 1) yT (k|k − 1) = Cov(x(k), y(k)),
(2.13)
(2.12)
(2.14) (2.15)
where .xˆ (k|k−1) and .xˆ (k) are the state prediction and state estimation at k; .yˆ (k|k−1) is the measurement prediction. Denote .y(1 : k − 1) = {y(1), y(2), . . . , y(k − 1)}. In general, the KF derivation is carried out in two successive steps: the state prediction .xˆ (k|k − 1) = E [x(k)|y(1 : k − 1)] and the measurement update .xˆ (k) = E [x(k)|y(1 : k)], as shown in Fig. 2.1: (1) State prediction: The priori estimates .xˆ (k|k − 1) and .P(k|k − 1) are calculated by .
xˆ (k|k − 1) = F(k − 1)ˆx(k − 1).
(2.16)
P(k|k − 1) = F(k − 1)P(k − 1)FT (k − 1) + Q(k − 1).
(2.17)
The predicted mean .xˆ (k|k−1) and covariance .P(k|k−1) of the state are obtained at the time step k before the measurement arrives. (2) Measurement update: The posteriori estimates .xˆ (k) and .P(k) are obtained by .
xˆ (k) = xˆ (k|k − 1) + K(k)[y(k) − yˆ (k|k − 1)].
(2.18)
yˆ (k|k − 1) = H(k)ˆx(k|k − 1).
(2.19)
14
2 Kalman Filtering
K(k) = P(k|k − 1)HT (k)[H(k)P(k|k − 1)HT (k) + R(k)]−1.
(2.20)
P(k) = [I − K(k)H(k)]P(k|k − 1)[I − K(k)H(k)]T + K(k)R(k)KT (k),
(2.21)
where .K(k) is the Kalman filter gain. The estimated mean .xˆ (k) and covariance P(k) of the state are obtained at the time step k after the measurement arrives. The KF provides a closed form recursive solution for the estimation of linear discrete-time dynamic systems. Note that the predicted and estimated state covariances at the current time do not depend on the current measurements, so that they could be calculated off-line before incorporating any measurements provided by the matrices .F(k), .Q(k), .R(k), and .H(k).
.
It is very important to understand the fundamental idea behind the KF formulation. Many methods exist to formulate the KF, including Bayesian estimation (BE) [9], maximum a posterior estimation (MAP) [9, 10], linear minimum variance estimation (LMV) [10], minimum variance estimation (MV) [16, 17], weighted least squares (WLS) [128], and batch-mode regression (BMR) [82, 83], which involve more complex knowledge of mathematical statistics. With the above KF representation form, although the derivation may not be rigorous enough, the other more formal formulations are easy to understand.
2.1.1 Bayesian Estimation (BE) Bayesian estimation (BE) is a very general and important state estimation method based on statistical probability integrals [9]. In the Bayesian sense, the system dynamics can be described by two conditional probability distributions: .
x(k) ∼ p(x(k)|x(k − 1)).
(2.22)
y(k) ∼ p(y(k)|x(k)),
(2.23)
where: • .p(x(k)|x(k − 1)) characterizes the state transition probability density function (PDF) of the system dynamics. • .p(y(k)|x(k)) characterizes the measurement transition PDF given the current state, which describes how the measurements depend on the current state. The state space model characterized by the probability densities in (2.22) and (2.23) corresponds to that described by difference equations (2.1) and (2.2): .
p(x(k)|x(k − 1)) = N (x(k); F(k − 1)x(k − 1), Q(k − 1)) .
(2.24)
p(y(k)|x(k)) = N (y(k); H(k)x(k), R(k)) ,
(2.25)
2.1 Linear Kalman Filters
15
where .N (x(k); F(k − 1)x(k − 1), Q(k − 1)) denotes .x(k) drawn from the Gaussian distribution with mean .F(k − 1)x(k − 1) and variance .Q(k − 1); .N (y(k); H(k)x(k), R(k)) denotes .y(k) drawn from the Gaussian distribution with mean .H(k)x(k) and variance .R(k). Generally, the state transition PDF obeys the Markov property [9], which means the state .x(k) is independent of previous states and measurement given the previous state .x(k − 1), i.e., p(x(k)|x(1 : k − 1), y(1 : k − 1)) = p(x(k)|x(k − 1)),
.
(2.26)
where the notation .x(1 : k − 1) means all the previous states 1 to .k − 1. Similarly, the measurement transition PDF obeys the Markov sequence when the measurement .y(k) is independent of the measurements and states history, which can be expressed as p(y(k)|x(1 : k), y(1 : k − 1)) = p(y(k)|x(k)).
.
(2.27)
In actual applications, what we care about for the algorithm implementation is the one-step prediction and the state estimation, which can be achieved from the prior conditional distribution .p(x(k)|y(1 : k − 1)) and the posterior conditional distribution .p(x(k)|y(1 : k)). We can show based on Markov dynamic models (2.26) and (2.27) that the recursive Bayesian solutions for the two conditional probability densities are obtained as follows: (1) State prediction: The prior conditional distribution .p(x(k)|y(1 : k − 1)) can be calculated by integrating over .x(k − 1): p(x(k)|y(1 : k − 1)) =
.
p(x(k)|x(k − 1))p(x(k − 1)|y(1 : k − 1))dx(k − 1). (2.28)
The joint distribution that corresponds to the product of marginals in the integrand is p(x(k − 1), x(k)|y(1 : k − 1)) = p(x(k)|x(k − 1))p(x(k − 1)|y(1 : k − 1)), (2.29)
.
where .p(x(k)|x(k − 1)) = N (x(k); F(k − 1)x(k − 1), Q(k − 1)) and .p(x(k − 1)|y(1 : k − 1)) = N x(k − 1); xˆ (k − 1), P(k − 1) . By Lemma A.1 on page 209 in reference [9], the joint distribution .p(x(k − 1), x(k)|y(1 : k − 1)) can be rewritten as p(x(k − 1), x(k)|y(1 : k − 1))
.
= N (x(k); F(k − 1)x(k − 1), Q(k − 1)) N x(k − 1); xˆ (k − 1), P(k − 1)
x(k − 1) m1 m1 , =N (2.30) ; xˆ , P x(k)
16
2 Kalman Filtering
where
xˆ (k − 1) ˆ = .x F(k − 1)ˆx(k − 1)
P(k − 1) P(k − 1)FT (k − 1) . Pm1 = F(k − 1)P(k − 1) F(k − 1)P(k − 1)FT (k − 1) + Q(k − 1) (2.31) m1
From (2.30) and Lemma A.1 in reference [9], the marginal distribution of .x(k) is p(x(k)|y(1 : k − 1)) = N x(k); xˆ (k|k − 1), P(k|k − 1)
= N x(k); F(k − 1)ˆx(k − 1), F(k − 1)P(k − 1)FT (k − 1) + Q(k − 1) .
.
(2.32) Therefore, from (2.32), the mean and variance of the prior conditional distribution .p(x(k)|y(1 : k − 1)) can be denoted by .
xˆ (k|k − 1) = F(k − 1)ˆx(k − 1).
(2.33)
P(k|k − 1) = F(k − 1)P(k − 1)FT (k − 1) + Q(k − 1).
(2.34)
(2) Measurement update: The a posteriori conditional distribution .p(x(k)|y(1 : k)) can be computed by using Bayes’ rule p(y(k)|x(k))p(x(k)|y(1 : k − 1)) . p(x(k)|y(1 : k)) = p(y(k)|x(k))p(x(k)|y(1 : k − 1))
.
(2.35)
The joint distribution of .x(k) and .y(k) is p(x(k), y(k)|y(1 : k − 1)) = p(y(k)|x(k))p(x(k)|y(1 : k − 1)).
.
(2.36)
Based on (2.25), (2.30) and Lemma A.1 on page 209 in reference [9], the joint distribution .p(x(k), y(k)|y(1 : k − 1)) is calculated by p(x(k), y(k)|y(1 : k − 1)) = N (y(k); H(k)x(k), R(k)) N (x(k); xˆ (k|k − 1), P(k|k − 1)
x(k) =N (2.37) ; xˆ m2 , Pm2 , y(k)
.
where
2.1 Linear Kalman Filters
xˆ (k|k − 1) ˆ = .x , H(k)ˆx(k|k − 1)
P(k|k − 1) P(k|k − 1)HT (k) . Pm2 = H(k)P(k|k − 1) H(k)P(k|k − 1)HT (k) + R(k)
17
m2
(2.38)
From (2.37) and Lemma A.1 in reference [9], the conditional distribution of x(k) is
.
p(x(k)|y(k), y(1 : k − 1)) = p(x(k)|y(1 : k)) = N x(k); xˆ (k), P(k) = N x(k); xˆ (k|k − 1) + K(k) y(k|k − 1), (I − K(k)H(k))P(k|k − 1) . (2.39)
.
Therefore, from (2.39), the mean and variance of posterior conditional distribution .p(x(k)|y(1 : k)) can be denoted by .
xˆ (k) = xˆ (k|k − 1) + K(k)[y(k) − Hˆx(k|k − 1)].
(2.40)
K(k) = P(k|k − 1)HT (k)[H(k)P(k|k − 1)HT (k) + R(k)]−1.
(2.41)
P(k) = (I − K(k)H(k))P(k|k − 1).
(2.42)
Although the original derivation of the KF is based on the least squares approach, the same equations can be derived from pure probabilistic Bayesian analysis. For the linear systems and Gaussian noises, the optimal Bayesian solution coincides with the least squares solution (with minimum mean squared error (MMSE) loss), that is, the optimal least squares solution is exactly the posterior mean. Very importantly, the recursive solution in Bayesian estimation provides a unified theoretical framework for solving nonlinear filtering problems.
2.1.2 Maximum a Posteriori Estimation (MAP) One of the statistical methods for estimating the state .x(k) is the maximum a posteriori (MAP) method. Different from Bayesian estimation, the MAP method considers an optimization problem, where .x(k) is estimated by maximizing the conditional distribution .p(x(k)|y(1 : k)) given the measurement .y(1 : k): (1) State prediction: The mean .xˆ (k|k−1) and the corresponding variance .P(k|k−1) of the a priori distribution .p(x(k)|y(1 : k − 1)) are calculated by using the same method of Bayesian estimation:
18
2 Kalman Filtering
.
xˆ (k|k − 1) = F(k − 1)ˆx(k − 1).
(2.43)
P(k|k − 1) = F(k − 1)P(k − 1)FT (k − 1) + Q(k − 1).
(2.44)
(2) Measurement update: The posterior distribution .p(x(k)|y(1 : k)) can be computed by using Bayes’ rule p(x(k)|y(1 : k)) =
.
p(y(k)|x(k))p(x(k)|y(1 : k − 1)) , p(y(1 : k))
(2.45)
where .p(y(1 : k)) can be considered as a normalization constant. Then, the posterior distribution of the state can be written as p(x(k)|y(1 : k)) ∝ p(y(k)|x(k))p(x(k)|y(1 : k − 1)).
.
(2.46)
Denote .Z(x(k)) = p(y(k)|x(k))p(x(k)|y(1 : k − 1)). The MAP estimate is obtained by maximum of .Z(x(k)) with respect to .x(k): xˆ (k) = arg max Z(x(k))
.
x(k)
= arg max p(y(k)|x(k))p(x(k)|y(1 : k − 1)) x(k)
= arg max N (y(k); H(k)x(k), R(k)) N x(k); xˆ (k|k − 1), P(k|k − 1) , x(k)
(2.47) where N (y(k); H(k)x(k), R(k)) 1 1 T −1 exp − (y(k) − H(k)x(k)) R (k)(y(k) − H(k)x(k)) . = 2 (2π )m/2 |R|1/2 (2.48) N x(k); xˆ (k|k − 1), P(k|k − 1) 1 1 exp − (x(k) − xˆ (k|k − 1))T P−1 (k|k − 1) = n/2 1/2 2 (2π ) |P| (2.49) ×(x(k) − xˆ (k|k − 1)) .
.
Taking the log of this function and taking the derivative about .x(k) yield
.
∂ ln (2π1)|P| ∂ ln Z(x(k)) ∂LD (x(k)) = − ∂x(k) ∂x(k) ∂x(k)
2.1 Linear Kalman Filters
19
⎛
⎞ ln (2π1)|P| + − 12 (x(k) − xˆ (k|k − 1))T P−1 (k|k − 1)(x(k) − xˆ (k|k − 1))
⎠ ∂⎝ 1 + − 2 (y(k) − H(k)x(k))T R−1 (k)(y(k) − H(k)x(k)) ∂x(k) −1
= H (k)R (k) (y(k) − H(k)x(k)) − P−1 (k|k − 1) × x(k) − F(k − 1)ˆx(k − 1) , T
(2.50)
where 1 LD (x(k)) = (x(k) − xˆ (k|k − 1))T P−1 (k|k − 1)(x(k) − xˆ (k|k − 1)) 2 1 + (y(k) − H(k)x(k))T R−1 (k)(y(k) − H(k)x(k)). (2.51) 2
.
Letting the partial derivative equal to zero, the optimal .x(k) can be solved by .
xˆ (k) = xˆ (k|k − 1) + K(k)[y(k) − H(k)ˆx(k|k − 1)].
(2.52)
K(k) = P(k|k − 1)HT (k)[H(k)P(k|k − 1)HT (k) + R(k)]−1 .
(2.53)
The error covariance matrix .P(k) can be updated by the same method of Bayesian estimation in (2.42). Note that when the distributions .p(y(k)|x(k)) and .p(x(k)|y(1 : k − 1)) are the Gaussian distributions, the MAP estimate can be equivalently computed as the minimum of the error function in Eq. (2.51).
2.1.3 Linear Minimum Variance Estimation (LMV) Linear minimum variance estimation (LMV) minimizes the mean of the sum of squares of the estimated error [10]. For the priori estimate of .x(k), the mean and variance take the form of .E [x(k)] and .E (x(k) − E [x(k)]) (x(k) − E [x(k)])T , which can be calculated in the following prediction process. The posterior estimate of .x(k) is carried out in the measurement update process from the derivation of LMV estimation: (1) State prediction: Based on Eq. (2.3), the a priori estimate .xˆ (k|k − 1) is obtained by taking the expectation for Eq. (2.1): xˆ (k|k − 1) = E [x(k)] = E [F(k − 1)x(k − 1) + q(k − 1)]
.
= F(k − 1)ˆx(k − 1).
(2.54)
20
2 Kalman Filtering
The corresponding error covariance matrix .P(k|k − 1) is calculated by P(k|k − 1) = E
.
T x(k) − xˆ (k|k − 1) x(k) − xˆ (k|k − 1)
F(k − 1)(x(k − 1) − xˆ (k − 1)) + q(k − 1) T =E × F(k − 1)(x(k − 1) − xˆ (k − 1)) + q(k − 1) T = F(k − 1)E x(k − 1) − xˆ (k − 1) x(k − 1) − xˆ (k − 1) × FT (k − 1) + E q(k − 1)qT (k − 1) = F(k − 1)P(k − 1)FT (k − 1) + Q(k − 1).
(2.55)
(2) Measurement update: In the measurement equation, we need to use the observed data .y(k) to estimate the truth state .x(k). To obtain the optimal estimate of .x(k) from available .y(k), the estimate .xˆ (k) can be modeled by a linear combination of measurements, denoted by xˆ (k) = Ay(k) + b,
.
(2.56)
where .A ∈ Rn×m is an undetermined coefficient matrix and .b ∈ Rn×1 is an undetermined vector. The performance index function of LMV estimation is established as follows: J (ˆx) = E (ˆx(k) − x(k))T (ˆx(k) − x(k)) .
.
(2.57)
Substitute (2.56) into (2.57) and take the following transformation: J (ˆx) = E (Ay(k) + b − x(k))T (Ay(k) + b − x(k))
= E Tr (Ay(k) + b − x(k))(Ay(k) + b − x(k))T = Tr E (Ay(k) + b − x(k))(Ay(k) + b − x(k))T .
.
(2.58)
To minimize the performance index function, we take the derivative for (2.58) with respect to .b and .A, respectively, to yield .
∂J = 2E [(b + Ay(k) − x(k))] = 0. ∂b ∂J = 2E (b + Ay(k) − x(k)) yT (k) = 0. ∂A
(2.59) (2.60)
2.1 Linear Kalman Filters
21
Then, we obtain .
b = E [x(k)] − AE [y(k)] .
(2.61)
A = Cov(x(k), y(k))Var(y(k))−1 ,
(2.62)
where .E [x(k)] = xˆ (k|k − 1), .E [y(k)] = yˆ (k|k − 1) = H(k)ˆx(k|k − 1), .Cov(x(k), y(k)) = E (x(k) − xˆ (k|k − 1))(y(k) − yˆ (k|k − 1))T , and ˆ (k|k − 1))(y(k) − yˆ (k|k − 1))T . Substitute the .Var(y(k), y(k)) = E (y(k) − y solution of .A and .b into (2.56) to obtain the LMV estimation xˆ (k) = E [x(k)] + Cov(x(k), y(k))Var(y(k))−1 (y(k) − E [y(k)]),
.
(2.63)
where Cov(x(k), y(k)) = E (x(k) − xˆ (k|k − 1))(y(k) − yˆ (k|k − 1))T = E (x(k) − xˆ (k|k − 1))(H(k) x(k|k − 1) + r(k))T
.
= P(k|k − 1)HT (k). (2.64) Var(y(k), y(k)) = E (y(k) − yˆ (k|k − 1))(y(k) − yˆ (k|k − 1))T = E (H(k) x(k|k − 1) + r(k))(H(k) x(k|k − 1) + r(k))T = H(k)E ( x(k|k − 1))( x(k|k − 1))T HT (k) + E r(k)rT (k) = H(k)P(k|k − 1)HT (k) + R(k).
(2.65)
Then, .xˆ (k) can be derived from (2.63) as follows: xˆ (k) = xˆ (k|k − 1) + K(k)[y(k) − H(k)ˆx(k|k − 1)]
.
K(k) = P(k|k − 1)HT (k)[H(k)P(k|k − 1)HT (k) + R(k)]−1 .
(2.66)
The corresponding error covariance matrix .P(k) is obtained as P(k) = E (x(k) − xˆ (k))(x(k) − xˆ (k))T
.
(x(k) − xˆ (k|k − 1) + K(k)[y(k) − H(k)ˆx(k|k − 1)) ×(x(k) − xˆ (k|k − 1) + K(k)[y(k) − H(k)ˆx(k|k − 1))T
x(k|k − 1) − K(k)r(k)) ((I − K(k)H(k)) =E × ((I − K(k)H(k)) x(k|k − 1) − K(k)r(k))T
=E
22
2 Kalman Filtering
= (I − K(k)H(k))E x(k|k − 1) xT (k|k − 1) (I − K(k)H(k))T + K(k)E r(k)rT (k) KT (k) − (I − K(k)H(k))E x(k|k − 1)rT (k) KT (k) − K(k)E r(k) xT (k|k − 1) (I − K(k)H(k))T = [I − K(k)H(k)]P(k|k − 1)[I − K(k)H(k)]T + K(k)R(k)KT (k).
(2.67)
From the point of view of LMV, the KF equations are again obtained. Compared with (2.42), where the .P(k) matrix can lose symmetry and positivity, the formula (2.67) has the advantage in ensuring a stable algorithm since .P(k) is a symmetric and positive definite matrix.
2.1.4 Minimum Variance Estimation (MV) Here, the representation form of minimum variance (MV) estimation is mainly used for reference, and the Kalman filtering derivation is carried out by an intuitive method. Although the derivation process is not rigorous, it is easy to understand: (1) State prediction: Similar to (2.54) and (2.55), the priori state estimate .xˆ (k|k −1) and the corresponding error covariance matrix .P(k|k − 1) can be obtained. (2) Measurement update: If one only uses the state one-step prediction .xˆ (k|k − 1) to estimate the optimal .x(k), the estimation accuracy will not be high as the system has uncertainties and no information of the measurement equation considered. From (2.13) and (2.19), the measurement one-step prediction error .y(k|k − 1) calculated using the system measurement equation also contains the information of state .xˆ (k|k −1). It can be seen that both .xˆ (k|k −1) and .y(k|k −1) contain state information. A natural idea is to comprehensively consider the .y(k|k − influence of the state equation and the measurement equation. Using 1) to correct .xˆ (k|k − 1) as an estimate of .x(k) helps to improve the accuracy of state estimation. Thus, the optimal estimate .xˆ (k) can be obtained by using measurement to correct the priori state .xˆ (k|k − 1), i.e., xˆ (k) = xˆ (k|k − 1) + K(k) y(k|k − 1),
.
(2.68)
where .K(k) is the undetermined Kalman gain, and the correction term .y(k|k−1) is the innovation that means that the measurement prediction error carries new information about the state estimation. Use the same method with (2.67) to obtain the posteriori error covariance matrix .P(k) as
2.1 Linear Kalman Filters
23
P(k) = [I − K(k)H(k)]P(k|k − 1)[I − K(k)H(k)]T + K(k)R(k)KT (k). (2.69)
.
To obtain the undetermined Kalman gain, the MV estimation defines the following index function: J (x) = E xT (k) x(k) = E (ˆx(k) − x(k))T (ˆx(k) − x(k))
= E Tr (ˆx(k) − x(k))(ˆx(k) − x(k))T = Tr E (ˆx(k) − x(k))(ˆx(k) − x(k))T
.
= Tr (E [P(k|k − 1)]) .
(2.70)
The Kalman gain .K(k) can be solved by minimizing the index function .J (x) and (x(k)) setting . ∂J∂K(k) = 0, i.e., .
∂J (x(k)) = 2[I − K(k)H(k)]P(k|k − 1)(−HT (k)) + 2K(k)R(k) = 0. ∂K(k)
(2.71)
Then the Kalman gain is K(k) = P(k|k − 1)HT (k)[H(k)P(k|k − 1)HT (k) + R(k)]−1 ,
.
(2.72)
which tells how much the predictions should be corrected on time step k. The meaning of the above formula reflects that the estimated .xˆ (k) makes comprehensive use of the information of state prediction .xˆ (k|k − 1) and measurement .y(k|k − 1). Therefore, the intuitive meaning of formula (2.68) is prediction error using the innovation .yˆ (k|k − 1) to modify the prior estimate .xˆ (k|k − 1) to obtain the posterior estimate .xˆ (k). The posterior estimate is more accurate than the prior estimate. In fact, using the LMV theory can also prove that the representation of optimal state estimation in formula (2.68) has the form of “prediction + correction”. Compared with LMV estimation, the calculation in MV estimation is simpler.
2.1.5 Weighted Least Squares (WLS) The derivation of KF based on weighted least squares method (WLS) can be regarded as a special case of the MAP method. Unlike MAP, the WLS can be achieved by direct optimization of the quadratic cost function while building a recursive estimator. To derive the KF, the quadratic objective function is
24
2 Kalman Filtering
1 L(x(k)) = (ˆx(k) − F(k − 1)ˆx(k − 1))T P−1 (k|k − 1)(ˆx(k) − F(k − 1)ˆx(k − 1)) 2 1 + (y(k) − H(k)ˆx(k))T R−1 (k)(y(k) − H(k)ˆx(k)). (2.73) 2
.
The formulation of KF can be derived by analytically solving xˆ (k) = arg min L(x(k)).
.
x(k)
(2.74)
Then, we have .
xˆ (k|k − 1) = F(k − 1)ˆx(k − 1).
(2.75)
P(k|k − 1) = F(k − 1)P(k − 1)FT (k − 1) + Q(k − 1).
(2.76)
xˆ (k) = xˆ (k|k − 1) + K(k)[y(k) − H(k)ˆx(k|k − 1)].
(2.77)
K(k) = P(k|k − 1)HT (k)[H(k)P(k|k − 1)HT (k) + R(k)]−1.
(2.78)
P(k) = [I − K(k)H(k)]P(k|k − 1)[I − K(k)H(k)]T + K(k)R(k)KT (k).
(2.79)
From the previous analysis, it can be seen that WLS estimation requires a linear relationship between the state and the observations and assumes that noise .r(k) is known to have zero mean variance, while it makes no requirements on the statistical properties of state .x(k). Note that the weighting matrices .R−1 (k) and .P−1 (k|k − 1) are embedded in the cost function; hence, the estimation process can reduce the influence of two sources of uncertainty in the dynamical system, i.e., the system noise (uncertainty) .q(k) and the measurement noise (uncertainty) .r(k).
2.1.6 Batch-Mode Regression Different from the above-mentioned methods, there exists a state estimate method that solves a multidimensional optimization problem via building a batch-mode regression estimator [82]. Then, the aim is minimizing or maximizing a cost function. According to this viewpoint, the cost function of the KF in a Gaussian noise environment is the .L2 -norm (mean square error). To derive the KF, one can construct a batch-mode regression model by combining the state prediction error .x(k|k − 1) with the measurement Eq. (2.2): .
xˆ (k|k − 1) I = n x(k) + μ(k), y(k) H(k)
(2.80)
2.1 Linear Kalman Filters
25
where .In denotes an .n × n identity matrix, and μ(k) =
.
− x(k|k − 1) r(k)
(2.81)
is the augmented noise vector comprising of the state and measurement errors. The covariance matrix of the augmented noise .E μ(k)μ(k)T is positive definite, and then we have E μ(k)μ(k)T = (k)(k)T P(k|k − 1) 0 = . (2.82) 0 R(k) p (k|k − 1)p (k|k − 1)T 0 = , 0 r (k)r (k)T
p (k|k − 1) 0 where .(k) = , .p (k|k − 1), and .r (k) are obtained by 0 r (k) the Cholesky decomposition of .E μ(k)μ(k)T , .P(k|k − 1) and .R(k), respectively. Multiplying both sides of (2.80) by .−1 (k) gives
xˆ (k|k − 1) In −1 x(k) . (k) = (k) H(k) y(k)
−(x(k) − xˆ (k|k − 1)) + −1 (k) . r(k) −1
(2.83)
Equation (2.83) can be expressed by d(k) = W(k)x(k) + e(k)
.
ˆ = d(k) + e(k),
(2.84)
where
xˆ (k|k − 1) .d(k) = (k) . y(k)
I W(k) = −1 (k) n . H(k)
−(x(k) − xˆ (k|k − 1)) e(k) = −1 (k) . r(k) −1
(2.85) (2.86) (2.87)
26
2 Kalman Filtering
ˆ d(k) = W(k)x(k),
(2.88)
with .d(k) = [d1 (k),d2 (k), . . . dL (k)]T , .W(k) = [w1 (k);w2 (k); . . . ; wL (k)], .e(k) = ˆ [e1 (k),e2 (k), . . . eL (k)]T , .d(k) = [dˆ1 (k),dˆ2 (k), . . . dˆL (k)]T , and .L = m + n. Therefore, the batch-mode regression form is achieved by e(k) = d(k) − W(k)x(k).
.
(2.89)
Then, for the above regression problem, the optimal solution to .xˆ (k) is achieved by minimizing the mean square error, that is, xˆ (k) = arg max J (x(k)) = arg max e(k)T e(k).
.
x(k)
(2.90)
After simplification, the above loss function is reduced to (2.73). In addition, the loss function can be rewritten as 1 2 ei (k), L L
J (x(k)) =
.
(2.91)
i=1
where .ei (k) is the i-th element of .e(k): ei (k) = di (k) − wi (k)x(k).
.
(2.92)
The optimal solution can thus be obtained by solving .
2 ∂J (x(k)) = − WT (k)e(k) = 0. ∂x(k) L
(2.93)
From (2.93), .x(k) can be solved by
−1
WT (k)d(k) . xˆ (k) = WT (k)W(k)
.
(2.94)
By the matrix inversion lemma (A + BCD)−1
.
= A−1 − A−1 B(C−1 + DA−1 B)−1 DA−1 ,
(2.95)
with the assignments .
(p −1 (k|k − 1))T p (k|k − 1) → A, HT (k) → B, .
(2.96)
H(k) → C, (r −1 (k))T r (k) → D,
(2.97)
2.1 Linear Kalman Filters
27
one can reformulate (2.94) as xˆ (k) = xˆ (k|k − 1) + K(k) y(k) − H(k)ˆx(k|k − 1) ,
(2.98)
−1 K(k) = P(k|k − 1)HT (k) H(k)P(k|k − 1)HT (k) + R(k) .
(2.99)
.
where .
Then, the posterior covariance matrix can be updated by
−1 P(k) = W(k)T W(k)
.
= [I − K(k)H(k)] P(k|k − 1).
(2.100)
The Kalman filter based on information theory criteria introduced in Chap. 8 of the book is based on the batch-mode regression framework. Actually, this framework leads to the same loss function as the WLS. However, when the M-estimator or information theoretic learning is used in deriving KF, this framework plays a very effective role, which will be introduced in the following in detail. Different estimation methods to derive KF are summarized in Table 2.1. Every estimation method represents a specific perspective, which yields different approaches for the advanced KF models described in this book. For example, Kalman filters can be framed via posterior probability approximation (Bayesian estimation) or optimization of a loss function as in WLS, where the former will capture nonGaussian noise distribution and the latter can be made to suppress outliers or impulsive noise.
Table 2.1 Summary of the main points of the existing estimation methods Estimation method Bayesian estimation
Optimization method None
Maximum a posterior estimate
Maximum of the state posterior distribution
Linear minimum variance estimate Minimum mean square of state estimation error Minimum variance estimate Minimum variance of state estimation error Weighted least squares Minimum weighted state prediction and measurement error Batch-mode regression Minimum mean square of normalized augmented error
Core technique Bayesian inference .• Statistical probability integrals .• Statistical probability integrals .• Optimization problem Use linear combinations of measurements to denote state Recursive state estimation
.•
Construct weighted loss function Construct batch-mode regression
28
2 Kalman Filtering
2.2 Nonlinear Kalman Filters For a nonlinear system, the state and measurement equations can be described by .
x(k) = f(x(k − 1)) + q(k − 1).
(2.101)
y(k) = h(x(k)) + r(k),
(2.102)
where .f(·) and .h(·) are the state transition function and measurement (observation) function, respectively. Similar to linear KF, the nonlinear KF includes the following two parts: (1) State prediction: The priori estimates .xˆ (k|k − 1) and .P(k|k − 1) are calculated by xˆ (k|k − 1) = E [x(k)]
.
= E [f(x(k − 1)) + q(k − 1)] = f(x(k − 1))p(x(k − 1)|y(1 : k − 1))dx(k − 1) =
f(x(k − 1))N x(k − 1); xˆ (k − 1), P(k − 1) dx(k − 1).
T P(k|k − 1) = E x(k) − xˆ (k|k − 1) x(k) − xˆ (k|k − 1) = f(x(k − 1))fT (x(k − 1))N x(k − 1); xˆ (k − 1),
(2.103)
P(k − 1)) dx(k − 1) − xˆ (k|k − 1)ˆxT (k|k − 1) + Q(k − 1).
(2.104)
(2) Measurement update: Based on the LMV estimation, the posteriori estimate ˆ (k) has the form of .x xˆ (k) = E [x(k)] + Cov(x(k), y(k))Var(y(k))−1 (y(k) − E [y(k)])
.
= xˆ (k|k − 1) + K(k)[y(k) − yˆ (k|k − 1)],
(2.105)
where E [y(k)] = yˆ (k|k − 1) =
.
h(x(k))N x(k); xˆ (k|k − 1), P(k|k − 1) dx(k). (2.106)
K(k) = Cov(x(k), y(k))Var(y(k))−1 = Pxy (k|k − 1)Pyy (k|k − 1)
(2.107)
2.2 Nonlinear Kalman Filters
29
with Pxy (k|k − 1) =
.
x(k)hT (x(k))N x(k); xˆ (k|k − 1), P(k|k − 1) dx(k)
(2.108) − xˆ (k|k − 1)ˆy(k|k − 1). Pyy (k|k − 1) = h(x(k))hT (x(k))N x(k); xˆ (k|k − 1), P(k|k − 1) dx(k) − yˆ (k|k − 1)ˆy(k|k − 1) + R(k).
(2.109)
The state error covariance matrix is x(k) − xˆ (k|k − 1) − K(k) y(k) − yˆ (k|k − 1) T .P(k) = E × x(k) − xˆ (k|k − 1) − K(k) y(k) − yˆ (k|k − 1) = P(k|k − 1) − K(k)Pyx (k|k − 1) − Pxy (k|k − 1)KT (k) + K(k)Pyy (k|k − 1)KT (k) = P(k|k − 1) − K(k)Pyy (k|k − 1)KT (k).
(2.110)
The equations described above provide a unified theoretical framework for the nonlinear KF theory. The state estimation can finally be equivalent to the calculation of multidimensional vector integral, in which the integral form can be summarized as the inner product between nonlinear function and probability density function, i.e., [16] .I (g) = g(x)N (x; x, Px )dx, (2.111) where .g(·) is any nonlinear function. In general, it is intractable to directly compute the above integral. Therefore, the key to nonlinear KF lies in how to use numerical methods to calculate the multidimensional vector integral in the above form [16]. Based on Bayesian filtering theory, the integral of multidimensional nonlinear function weighted by the Gaussian distribution in nonlinear KF requires to be approximated. At present, the common numerical methods include approximations of nonlinear function and probability density function (PDF), as shown in (2.112). The nonlinear function approximation mainly uses the polynomial expansion, and the filters based on such approach include EKF, divided difference filter (DDF) [16], and Chebyshev polynomial Kalman filter (CPKF) [16]. Especially, the EKF is a very popular nonlinear extension of KF. The PDF can usually be approximated by a sampling. The filters based on PDF approximation mainly include UKF, CKF, and square root CKF (SCKF) [16].
30
2 Kalman Filtering
⎧ ⎧ ⎪ ⎨ EKF ⎪ ⎪ ⎪ ⎪ Approximation of nonlinear function DDF ⎪ ⎪ ⎩ ⎨ CPKF ⎧ .Nonlinear Kalman filters ⎪ UKF ⎨ ⎪ ⎪ ⎪ ⎪ Approximation of PDF CKF ⎪ ⎪ ⎩ ⎩ SCKF (2.112)
2.2.1 Extended Kalman Filter (EKF) The EKF is an important method for dealing with nonlinear systems via approximating the nonlinear function by a linear model at the operating point. In EKF, the nonlinear functions .f(x(k − 1)) and .h(x(k)) are approximated by the first-order Taylor series expansion at .xˆ (k) and .xˆ (k|k − 1), i.e., .
f(x(k − 1)) ≈ f(ˆx(k − 1)) + F(k − 1)(x(k − 1) − xˆ (k − 1)).
(2.113)
h(x(k)) ≈ h(ˆx(k|k − 1)) + H(k)(x(k) − xˆ (k|k − 1)),
(2.114)
where .F(k − 1) and .H(k) are the Jacobian matrices of .f(·) and .h(·), i.e., ∂f(x(k − 1)) . .F(k − 1) = ∂x x=ˆx(k) ∂h(x(k)) H(k) = . ∂x x=ˆx(k|k−1)
(2.115) (2.116)
Substituting (2.113) and (2.114) into (2.101) and (2.102), one obtains x(k) = F(k − 1)x(k − 1) + f(ˆx(k − 1)) − F(k − 1)ˆx(k − 1) + q(k − 1). (2.117)
.
y(k) = H(k)x(k) + h(ˆx(k|k − 1)) − H(k)ˆx(k|k − 1) + r(k).
(2.118)
According to the calculation of linear KF, we can use (2.101) and (2.102) to obtain the prediction and update EKF equations: .
xˆ (k|k − 1) = f(ˆx(k − 1)).
(2.119)
P(k|k − 1) = F(k − 1)P(k − 1)FT (k − 1) + Q(k − 1).
(2.120)
K(k) = P(k|k − 1)HT (k)[H(k)P(k|k − 1)HT (k) + R(k)]−1.
(2.121)
xˆ (k) = xˆ (k|k − 1) + K(k)[y(k) − h(ˆx(k|k − 1))].
(2.122)
2.2 Nonlinear Kalman Filters
31
P(k) = [I − K(k)H(k)]P(k|k − 1)[I − K(k)H(k)]T + K(k)R(k)KT (k).
(2.123)
The Kalman gain and error covariance matrices for state prediction and estimation in EKF are the same as those of linear KF.
2.2.2 Unscented Kalman Filter (UKF) In general, approximating a PDF is relatively easier than approximating an arbitrary nonlinear function. The unscented transform (UT) is a method of PDF approximation, employing a deterministic sampling method. Then, the approximate calculation of the integral can be denoted by I (g) ≈
N
.
wi g(ξi ),
(2.124)
i=1
where .ξi and .wi are the sampling points and corresponding weights. According to the UT, the UKF can be achieved as follows: (1) State Prediction (i) Unscented transformation: At the time .k −1, one can adopt .2n+1 sampling points to approximate the state variables’ statistical characteristics. In the UT, a sequence of .2n + 1 sigma points can be generated by ξ 0 (k − 1) =ˆx(k − 1)
.
ξ i (k − 1) =ˆx(k − 1) +
(n + λ )P(k − 1)
i
(i = 1, 2, . . . , n)
(n + λ )P(k − 1) ξ i (k − 1) =ˆx(k − 1) −
i
(i = n + 1, n + 2, . . . , 2n),
(2.125)
√ where . (n + λ )P(k − 1) i represents the i-th column vector of the square root matrix of .(n + λ )P(k − 1). The scalar parameter .λ is given by .λ = (α )2 (n + κ ) − n, where .α determines the diffusion degree of the sigma point .ξ i (k − 1) around .xˆ (k − 1), and .κ is set to 0 in the state estimate. The corresponding weights of the sigma points are given by w0m =
.
λ n + λ
32
2 Kalman Filtering
λ + 1 − (α )2 + β n + λ 1 wic =wim = 2(n + λ ) w0c =
(i = 1, 2, . . . , 2n),
(2.126)
where .β is a parameter about the prior information of the distribution of m c .x(k); the weights .w i and .wi are used to calculate mean and covariance, respectively. The unscented transformed points are then propagated through the state transition function .
χ i (k|k − 1) = f(ξ i (k − 1)) (i = 0, 1, 2, . . . , 2n).
(2.127)
(ii) Predict: The priori state estimate and the corresponding error covariance matrix are calculated by xˆ (k|k − 1) =
2n
.
wim χ i (k|k − 1).
(2.128)
i=0
2n wic χ i (k|k − 1) − xˆ (k|k − 1) T P(k|k − 1) = × χ i (k|k − 1) − xˆ (k|k − 1) + Q(k). i=0
(2.129)
(2) Measurement Update (i) Unscented transformation: Similarly, a set of .2n + 1 sigma points is generated by the predicted state .xˆ (k|k−1) and covariance matrix .P(k|k−1): ξ 0 (k|k − 1) =ˆx(k|k − 1)
.
ξ i (k|k − 1) =ˆx(k|k − 1) +
(n + λ )P(k|k − 1)
(i = 1, 2, . . . , n)
(n + λ )P(k|k − 1) ξ i (k|k − 1) =ˆx(k|k − 1) − (i = n + 1, n + 2, . . . , 2n).
i
i
(2.130)
The corresponding sigma point weights are the same as (2.126). Through the measurement function, the sigma points are transformed as .
γ i (k) = h(ξ i (k|k − 1)) (i = 0, 1, 2, . . . , 2n).
(2.131)
2.2 Nonlinear Kalman Filters
33
(ii) Update: The priori measurement estimate and corresponding error covariance matrix are yˆ (k|k − 1) =
2n
.
wim γ i (k)
i=0
Pyy (k|k − 1) =
2n
T wic γ i (k) − yˆ (k) γ i (k) − yˆ (k) + R(k).
0=1
(2.132) The cross-covariance matrix between state and measurement can be obtained by Pxy (k|k − 1) =
2n
.
wic χ i (k|k − 1) − xˆ (k|k − 1)
i=0
T × γ i (k) − yˆ (k|k − 1) .
(2.133)
Compute the state estimate .xˆ (k) and its corresponding covariance matrix .P(k) by .
xˆ (k) = xˆ (k|k − 1) + K(k) y(k) − yˆ (k|k − 1) .
(2.134)
P(k) = Pxx (k|k − 1) − K(k)Pyy (k|k − 1)KT (k).
(2.135)
K(k) = Pxy (k|k − 1)P−1 yy (k|k − 1).
(2.136)
2.2.3 Cubature Kalman Filter (CKF) Different from the UKF, a spherical–radial rule is used in the CKF to numerically compute the multivariate integral, where 2n sampling points are adopted to approximate the state variables’ statistical characteristics: (1) State prediction: Calculate sampling points .ξ i (k − 1) by ξ i (k − 1) = xˆ (k − 1) + ϑ i
.
ξ i+n (k − 1) = xˆ (k − 1) − ϑ i ,
(i = 1, 2, . . . , n).
(2.137) (2.138)
√ where .ϑ i = na i ; .a i denotes the unit vector with the i-th element equal to .1 and other elements set as .0.
34
2 Kalman Filtering
Compute the propagated cubature points .
χ i (k|k − 1) = f(ξ i (k − 1)) (i = 1, 2, . . . , 2n).
(2.139)
Calculate the predicted state .xˆ (k|k − 1) and its corresponding covariance matrix P(k|k − 1)
.
1 χ i (k|k − 1). 2n
(2.140)
1 χˆ i (k|k − 1)χˆ i (k|k − 1)T + Q(k − 1), 2n
(2.141)
2n
xˆ (k|k − 1) =
.
i=1 2n
P(k|k − 1) =
i=1
where .χˆ i (k|k − 1) = χ i (k|k − 1) − xˆ (k|k − 1) and the CKF performs the state estimation on the assumption of a zero mean Gaussian distribution process noise .q(k − 1) in (2.101). (2) Measurement update: Calculate sampling points .ξ i (k|k − 1) by ξ i (k|k − 1) = S(k|k − 1)ϑ i + xˆ (k|k − 1),
.
(2.142)
where .S(k|k − 1) is the Cholesky decomposition of .Pxx (k|k − 1). Calculate the transformed cubature points .γ i (k) by γ i (k) = h(ξ i (k|k − 1)) (i = 1, 2, . . . , 2n).
.
(2.143)
Calculate the predicted measurement .yˆ (k|k − 1), its corresponding covariance matrix .Pyy (k|k−1), and the state-measurement cross-covariance matrix .Pxy (k|k−1) by 1 γ i (k). 2n
(2.144)
1 γˆi (k)γˆi (k)T + R(k). 2n
(2.145)
1 χˆi (k|k − 1)γˆi (k)T , 2n
(2.146)
2n
yˆ (k|k − 1) =
.
i=1 2n
Pyy (k|k − 1) =
i=1 2n
Pxy (k|k − 1) =
i=1
where .γˆi (k) = γ i (k) − yˆ (k|k − 1). Similar to the process noise .q(k − 1), the measurement noise .r(k) in CKF is also assumed to obey the Gaussian distribution with zero mean and covariance matrix .R(k).
2.2 Nonlinear Kalman Filters
35
Compute the state estimate .xˆ (k) and its corresponding covariance matrix .P(k) by .
xˆ (k) = xˆ (k|k − 1) + K(k) y(k) − yˆ (k|k − 1) .
(2.147)
P(k) = P(k|k − 1) − K(k)Pyy (k|k − 1)KT (k).
(2.148)
K(k) = Pxy (k|k − 1)P−1 yy (k|k − 1).
(2.149)
2.2.4 Others From a numerical stability point of view, it would be better to work with matrix square roots of covariances instead of plain covariance matrices. Therefore, the square root cubature Kalman filter (SCKF) is presented, which is a useful method to prevent the problem of filter divergence and the loss of positive definiteness. In SCKF, the error covariance matrix is decomposed by QR decomposition, and the square root matrix is finally used in the updates. Assume at time .k − 1, .SP (k − 1) is the square root of the covariance matrix .P(k−1), i.e., .P(k−1) = SP (k−1)STP (k−1), T .SQ (k − 1) is a square root of .Q(k − 1), i.e., .Q(k − 1) = SQ (k − 1)S (k − 1), .SR (k) Q is a square root of .R(k), i.e., .R(k) = SR (k)STR (k): (1) State prediction: Evaluate the cubature points ξ i (k − 1) = SP (k − 1)I(i) + xˆ (k − 1), i = 1, 2, . . . , 2n,
.
(2.150)
where I(i)
.
√ n[1]i , i = 1, 2, . . . , n √ − n[1]i−n , i = n + 1, n + 2, . . . , 2n
(2.151)
with .[1]i being the i-th column vector of the .n × n identity matrix .I. Compute the propagated cubature points χ i (k|k − 1) = f(ξ i (k − 1)), i = 1, 2, . . . , 2n.
.
(2.152)
Estimate the priori state and corresponding square root of the covariance matrix 1 χ i (k|k − 1). 2n
(2.153)
SP (k|k − 1) = Tria([χ ∗ (k|k − 1), SQ (k − 1)]),
(2.154)
2n
xˆ (k|k − 1) =
.
i=1
where .Tria(·) denotes a general triangularization algorithm and .χ ∗ (k|k − 1) is given by
36
2 Kalman Filtering
1 [χ (k|k − 1) − xˆ (k|k − 1), . . . , χ 2n (k|k − 1) 2n 1
χ ∗ (k|k − 1) =
.
− xˆ (k|k − 1)], i = 1, 2, . . . , 2n.
(2.155)
(2) Measurement update: Evaluate the cubature points ξ i (k|k − 1) = SP (k|k − 1)I(i) + xˆ (k|k − 1),
.
i = 1, 2, . . . , 2n.
(2.156)
Compute the propagated cubature points γ i (k|k − 1) = h(ξ i (k|k − 1)),
.
i = 1, 2, . . . , 2n.
(2.157)
Estimate the priori measurement and corresponding square root of the covariance matrix 1 ˆ (k|k − 1) = .y γ i (k|k − 1). 2n
(2.158)
Syy (k|k − 1) = Tria([γ ∗ (k|k − 1), SQ (k)]),
(2.159)
2n
i=1
where .γ ∗ (k|k − 1) is given by 1 [γ (k|k − 1) − yˆ (k|k − 1), . . . , γ 2n (k|k − 1) 2n 1
γ ∗ (k|k − 1) =
.
− yˆ (k|k − 1)], i = 1, 2, . . . , 2n.
(2.160)
Evaluate the cross-covariance matrix Pxy (k|k − 1) = χ ∗∗ (k|k − 1)γ ∗ (k|k − 1)T ,
.
(2.161)
where .χ ∗∗ (k|k − 1) is given by χ ∗∗ (k|k − 1) =
.
1 [ξ (k|k − 1) − xˆ (k|k − 1), . . . , ξ 2n (k|k − 1) 2n 1 − xˆ (k|k − 1)],
i = 1, 2, . . . , 2n.
(2.162)
Compute the Kalman gain K(k) = (Pxy (k|k − 1)/STyy (k|k − 1))/Syy (k|k − 1).
.
(2.163)
2.2 Nonlinear Kalman Filters
37
Estimate the posterior state and square root of the posterior covariance matrix .
xˆ (k) = xˆ (k|k − 1) + K(k)(y(k) − yˆ (k|k − 1)). ∗∗
∗
SP (k) = Tria([χ (k|k − 1) − K(k)γ (k|k − 1), K(k)SR (k)]).
(2.164) (2.165)
Table 2.2 provides the main points of different nonlinear KFs. Although every nonlinear KF has merits and weaknesses, the accuracy of estimation is not guaranteed because it is affected by many practical factors. For different scenarios, different nonlinear KFs should be adopted.
Table 2.2 Summary of main points of different nonlinear Kalman filters Filter EKF
Core technique First-order Taylor linearization
.•
UKF
.•
Unscented transform .• High-dimensional state estimation
.•
CKF
.•
.•
SCKF
.•
Spherical–radial cubature rule .• High-dimensional state estimation Spherical–radial cubature rule .• QR decomposition .• High-dimensional state estimation
Merit Computational simplicity .• Stable .• No free parameters
Appropriate for high nonlinearity .• Easier to implement .• High filtering accuracy
Appropriate for high nonlinearity .• Easier to implement .• No free parameters .• High filtering accuracy .• High-dimensional state state estimation .• Good numerical stability .• High filtering accuracy
Deficiency Crude approximations and loss of higher order .• Truncation error may result in divergence .• Implementation difficulties of Jacobian matrices .• Sensitive to strong nonlinearity and initial error .• Sensitive to outliers .• Performance depended on scaling parameter .• Sensitive to outliers .• Assumption of Gaussian distribution .• Noise covariance matrix may lose positivity .• Sensitive to outliers .• Assumption of Gaussian distribution .• Noise covariance matrix may lose positivity .• Sensitive to outliers .• Assumption of Gaussian distribution
.•
38
2 Kalman Filtering
2.3 Robust Kalman Filters The KF and its extensions in general perform well under Gaussian assumptions. The unknown or non-Gaussian noise situations are common in many real scenarios of automatic control and target tracking. Thus, the performances of KF and its extensions are likely to get worse when applied to the above-mentioned environments. Some reasons for the performance degradation are that the accurate noise distribution is difficult to be characterized and the KF updates cannot suppress impulsive noises. In recent years, some robust methods have been developed to solve the non-Gaussian filtering problem, which are reviewed next.
2.3.1 H-Infinite Filter (HIF) An H-infinity filter (HIF) is usually employed to deal with the situations where there are system model errors and unknown noise characteristics in the measurement [62]. In HIF, the state space equation can be expressed by .
x(k) = F(k)x(k − 1) + q(k − 1).
(2.166)
y(k) = H(k)x(k) + r(k).
(2.167)
z(k) = L(k)x(k),
(2.168)
where .L(k) is a user-specified matrix associating .z(k) with .x(k). Applying the game theory to the HIF method results in a cost function [62] N −1
J1 =
.
x(0) − xˆ (0)
2 P−1 (0)
z(k) − zˆ (k)
k=1 N −1
q(k
+
k=1
2 S(k)
− 1)2Q−1 (k−1)
+ r(k)2R−1 (k)
, (2.169)
where .P(0), Q(k − 1), R(k), and .S(k) are symmetric, positive definite matrices chosen by the engineers based on the specific problem. As it is very difficult to directly minimize .J1 , a performance boundary is chosen to seek an estimation strategy that satisfies the threshold [62]: J1
Rmax ⎩ ˆ i (k) + β(k)ρ i (k) (1 − β(k))R others
(2.189)
i ˆ i (k) and the upper limit where the lower limit condition .Rmin can ensure positive .R i condition .Rmax can quickly reduce the credibility of measurement .yi (k). Actually, i .Rmax can also be used to determine whether the measurement is abnormal. If the measurement is abnormal, it will be discarded. Through the above processing, the i ], thus ensuring better adaptive ˆ i (k) can be limited within the interval .[R i , Rmax .R min ability and filtering reliability [10]. ˆ Based on the estimated .R(k), the AKF can be implemented. The AKF can improve the filtering accuracy, in which the noise matrix .R(k) is updated with changes in time on the basis of some prior knowledge to approximate the practical noise distribution as accurately as possible. Actually, there are many algorithms to solve this problem in the literature. Here, we will not elaborate further.
2.3 Robust Kalman Filters
43
2.3.3 Student’s t-Based Kalman Filter (SKF) In the state space model, the process and measurement noises often follow nonGaussian distributions. The challenge is how to use a proper noise distribution to derive the KF. If approximate conditional means and covariances can be obtained, the algorithm will have good accuracy. The heavy-tailed noises are one of the nonGaussian noises, which are common in many real scenarios of automatic control and target tracking. A linear student’s t-based filter (STF) was proposed in [74], which uses a student’s t distribution to approximate heavy-tailed distribution. Assume that both measurement and process noises have heavy-tailed characteristics. The corresponding measurement PDF .p(y(k)|x(k)) can be assumed as [74] p(y(k)|x(k)) = St(y(k); H(k)x(k), R(k), ν),
.
(2.190)
where .St(y(k); H(k)x(k), R(k), ν) denotes a student’s t PDF with mean vector H(k)x(k), scale matrix .R(k) (the nominal covariance matrix), and degree of freedom (dof) parameter .ν. To model the heavy-tailed process noise, the state transfer PDF .p(x(k)|x(k − 1)) is also approximated as a student’s t distribution [74]: .
p(x(k)|x(k − 1)) = St(x(k); F(k − 1)x(k − 1), Q(k − 1), ν).
.
(2.191)
(1) Time update: According to the Bayesian theorem, the state prediction PDF .p(x(k)|y(1 : k − 1)) can be denoted by p(x(k)|y(1 : k − 1)) = St(x(k); x(k|k − 1), P(k|k − 1), ν(k − 1)), (2.192)
.
where state prediction and corresponding covariance matrix are .
xˆ (k|k − 1) = F(k − 1)ˆx(k − 1).
(2.193)
P(k|k − 1) = F(k − 1)P(k − 1)FT (k − 1) + Q(k − 1).
(2.194)
(2) Measurement update: The joint predicted PDF of state and measurement are assumed to be [74] p(x(k), y(k)|y(1 : k − 1)) =
xˆ (k) xˆ (k|k − 1) P(k|k − 1) P(k|k − 1)HT (k) , St ; , H(k)P(k|k − 1) Pyy (k) y(k) H(k)ˆx(k|k − 1)
.
ν(k − 1)) ,
(2.195)
44
2 Kalman Filtering
where .Pyy (k) = H(k)P(k|k − 1)HT (k) + R(k). The conditional PDF of state given measurements .p(x(k)|y(1 : k)) = St(x(k); xˆ (k), P(k), ν(k)) can be obtained using .
ν(k) = ν(k − 1) + m.
(2.196)
xˆ (k) = xˆ (k|k − 1) + K(k)[y(k) − H(k)ˆx(k|k − 1)].
(2.197)
K(k) = P(k|k − 1)HT (k)[H(k)P(k|k − 1)HT (k) + R(k)]−1.
(2.198)
P(k) =
ν(k − 1) + 2y (k) ν(k − 1) + m
[I − K(k)H(k)]P(k|k − 1)[I − K(k)H(k)]T
+ K(k)R(k)KT (k).
(2.199)
2y (k) = (y(k) − H(k)ˆx(k|k − 1))T (Pyy (k))−1 (y(k) − H(k)ˆx(k|k − 1)). (2.200) Compared with KF, the STF can correct the posterior covariance matrix via the dof parameter to improve the filtering accuracy. The STF not only exhibits good robustness to heavy-tailed process and measurement noises but also has almost identical computational complexity to the KF.
2.3.4 Particle Filter (PF) The derivation of KF based on Bayesian estimation is strongly depended on the Gaussian assumptions and linear system assumption. However, the filtering distributions may often be non-Gaussian and the system is nonlinear, in which case a closed form solution of the Bayesian estimation is not available. The particle filter (PF) based on sequential importance resampling (SIR) can be a better alternative, which is a method for providing Monte Carlo approximations to the solution of the Bayesian filtering equations [9, 76]. The PF is based on the state space model (2.101)–(2.102), which can also be expressed as probability space model with the state transition probability density .p(x(k)|x(k − 1)) and the likelihood probability density .p(y(k)|x(k)). For the nonlinear system, the state estimate can be expressed as xˆ (k) = E [x(k)|y(1 : k)] = f(x(k − 1))p(x(k − 1)|y(1 : k − 1))dx(k − 1).
.
(2.201)
Then, the main inference problem can often be reduced to the computation of the following expectation over the posterior distribution [9, 76]
2.3 Robust Kalman Filters
45
E [f(x(k))] =
f(x(k))p(x(k)|y(1 : k))dx(k).
.
(2.202)
For the above nonlinear function .f(x(k)) or non-Gaussian distribution .p(x(k)|y(1 : k)), the integral is difficult to calculate. So it is difficult to obtain the analytic solution of .xˆ (k). Monte Carlo approach provides a numerical method for calculating the integrals of the form (2.202) [9, 76]. Monte Carlo refers to a general methodology where the expectation can be calculated by using time averages instead of the integral. Assuming that N samples have been sampled from the posterior probability, the calculation of the posterior probability can be expressed as p(x(k)|y(1 ˆ : k)) =
.
N 1 δ(x(k) − xˆ i (k)) ≈ p(x(k)|y(1 : k)), N
(2.203)
i=1
where .δ(·) is Dirac delta function; .xˆ i (k) is the i-th sampling point. For the general function .f(x(k)), the expectation .E [f(x(k))] is E [f(x(k))] =
.
f(x(k))p(x(k)|y(1 ˆ : k))dx(k)
N 1 f(x(k))δ(x(k) − xˆ i (k))dx(k) = N i=1
=
N 1 i f(x (k)). N
(2.204)
i=1
However, the posteriori probability .p(x(k)|y(1 : k)) is unavailable due to its complicated functional form. Therefore, the sequential importance sampling (SIS) is adopted, which selects a known importance density to perform the Monte Carlo approximation [9, 76]. Using the SIS, the expectation of the state transfer function is rewritten as E [f(x(k))] = f(x(k))p(x(k)|y(1 : k))dx(k)
.
=
f(x(k))
p(x(0 : k)|y(1 : k))dx(0 : k − 1)dx(k)
=
f(x(k))p(x(0 : k)|y(1 : k))dx(0 : k)
=
f(x(k))
p(x(0 : k)|y(1 : k)) π(x(0 : k)|y(1 : k))dx(0 : k), π(x(0 : k)|y(1 : k))
(2.205)
46
2 Kalman Filtering
where .π(x(0 : k − 1)|y(1 : k − 1)) is an available importance density. By using the Markov properties of the model, we get the following recursion for the full posterior distribution of states .x(0 : k) given the measurements .y(1 : k) [9, 76]: .p(x(0
: k)|y(1 : k))
=
p(y(k)|x(0 : k), y(1 : k − 1))p(x(0 : k)|y(1 : k − 1)) p(y(k)|y(1 : k − 1))
=
p(y(k)|x(0 : k), y(1 : k − 1))p(x(k)|x(0 : k − 1), y(1 : k − 1))p(x(0 : k)|y(1 : k − 1)) p(y(k)|y(1 : k − 1))
=
p(y(k)|x(k))p(x(k)|x(k − 1))p(x(0 : k)|y(1 : k − 1)) p(y(k)|y(1 : k − 1))
∝ p(y(k)|x(k))p(x(k)|x(k − 1))p(x(0 : k)|y(1 : k − 1)).
(2.206)
After rearrangement, (2.205) can be decomposed into the expectation integral and form the approximation as follows: .E
[f(x(k))] p(x(0 : k)|y(1 : k)) π(x(0 : k)|y(1 : k))dx(0 : k) = f(x(0 : k)) π(x(0 : k)|y(1 : k)) W (x(k)) π(x(0 : k − 1)|y(1 : k − 1))dx(0 : k) = f(x(0 : k)) p(y(k)|y(1 : k − 1)) 1 = f(x(k))W (x(k))π(x(0 : k − 1)|y(1 : k − 1))dx(0 : k) p(y(k)|y(1 : k − 1)) f(x(0 : k))W (x(k))π(x(0 : k − 1)|y(1 : k − 1))dx(0 : k) = p(y(k)|x(k))p(x(k)|x(k − 1))p(x(0 : k)|y(1 : k − 1))dx(0 : k) f(x(0 : k))W (x(k))pi(x(0 : k − 1)|y(1 : k − 1))dx(0 : k) = W (x(k))π(x(0 : k − 1)|y(1 : k − 1))dx(0 : k) =
E [f(x(0 : k))W (x(k))] , E [W (x(k))]
(2.207)
where .W (x(k)) = p(y(k)|x(k))p(x(k)|x(k−1))p(x(0:k)|y(1:k−1)) . The above expectation π(x(0:k−1)|y(1:k−1)) can be solved by sampling N samples and use the sample average to find their expectations. Thus, Eq. (2.205) can be approximated by N 1 f(xi (k))W (xi (k)) N N 1 i i=1 (xi (k)), = f(x (k))W .E [f(x(k))] ≈ N N 1 i=1 W (xi (k)) N i=1
(2.208)
2.3 Robust Kalman Filters
(xi (k)) = where .W
47
W (xi (k)) . The particle weights can be expressed as the recursive N W (xi (k)) i=1
form: i (k)) ∝ p(x(0 : k)|y(1 : k))
.W (x
π(x(0 : k)|y(1 : k))
=
p(y(k)|x(k))p(x(k)|x(k − 1))p(x(0 : k)|y(1 : k − 1)) π(x(0 : k − 1)|y(1 : k − 1))π(x(k)|x(k − 1), y(1 : k))
= W (xi (k − 1))
p(y(k)|x(k))p(x(k)|x(k − 1)) , π(x(k)|x(k − 1), y(1 : k))
(2.209)
where .π(x(0
: k)|y(1 : k)) = π(x(0 : k − 1)|y(1 : k − 1))π(x(k)|x(0 : k − 1), y(1 : k)).
(2.210) When selecting .π(x(k)|x(k − 1), y(1 : k)) = p(x(k)|x(k − 1)), the particle weight is i (k)) ∝ W (xi (k − 1))p(y(k)|x(k)).
.W (x
(2.211)
There exists a degradation problem in the application of SIS filtering. After several iterations, the weights of many particles have become very small and can be ignored. Only a few particles have relatively large weights, and fewer particles in the state space are effective. With the increase of the number of invalid sampling particles, a lot of calculations are wasted on particles that hardly contribute to the estimation of the posterior probability distribution, which makes the estimation performance degrade. The degeneracy problem can be solved by using a resampling procedure. The idea of the resampling procedure is to remove particles with very small weights and duplicate particles with large weights [9, 76]. Therefore, one can add a resampling step to the SIS algorithm generating the sequential importance resampling (SIR). The PF based on the SIR can be summarized as follows: • Draw N particles .xi (0) from the prior distribution .p(xi (0)). • For each .k = 1, . . . , T , do the following:
(i) Draw N particles .xi (k) from the importance distributions .p(x(k)|xi (k − 1)), i.e., propagate the particles to the next time step using the process equation .f (·): i (k|k − 1) = f (xi (k)).
.x
(2.212)
48
2 Kalman Filtering
(ii) Calculate the weight according to i (k)) = p(y(k)|x(k)),
.W (x
(2.213)
where .p(y(k)|x(k|k − 1)) is based on PDF of the measurement noise. • Normalize the weights to sum to unity (x .W
i (k)) =
W (xi (k)) N
.
(2.214)
W (xi (k))
i=1
• Perform the state estimate . N1
N
(xi (k)). f(xi (k))W
i=1
• If the effective number of particles is too small, perform resampling.
Note that PF is a nonlinear filtering method for nonlinear and non-Gaussian systems. The PF is also a recursive algorithm, where the basic idea can be described as follows: first, sample a group of random samples from the state-transfer probability density, which are called particles, and then adjust the weights and positions of particles according to the measurement values. When the sample size is large enough, the particle filter can approximate the true posterior probability density of the state variable and realize the optimal state estimation. Compared with the aforementioned filters, the PF in general has a higher estimation accuracy. However, the improvement of precision comes at the cost of an increase of computation.
2.3.5 Huber-Based Kalman Filter (HKF) Here we discuss how to solve measurement update problem using the Huber technique [82]. The Huber filtering measurement update arises from the minimization of cost function .J (x(k))
=
L
ρ(ei (k)),
(2.215)
i=1
where .ei (k) refers to the i -th component of the normalized residual error in (2.89); is usually defined as a piecewise function
.ρ(·)
! .ρ(ei (k))
=
where o is a threshold parameter.
1 e2 (k) |ei (k)| < o 2 i , 1 2 o |ei (k)| − 2 ei (k) |ei (k)| ≥ o
(2.216)
2.3 Robust Kalman Filters
49
Similar to the batch-mode regression to derive the KF, the Huber-based KF (HKF) can be obtained by replacing (2.91) to (2.215). The solution of Huber regression can be determined from the derivative of the cost function L .
∂ei (k) = 0, ∂x(k)
(2.217)
|ei (k)| < o ei (k) o sgn |ei (k)| |ei (k)| ≥ o.
(2.218)
ϕ(ei (k))
i=1
where .ϕ(ei (k)) =
∂ρ(ei (k)) ∂ei (k)
is given by !
.ϕ(ei (k))
=
One can denote the function .(ei (k)) = ! .(ei (k))
=
ϕ(ei (k)) ei (k) ,
that is,
|ei (k)| < o 1 o sgn|ei (k)| |ei (k)| ≥ o. ei (k)
(2.219)
Equation (2.217) could be written in a matrix form T (k)(k)(W(k)x(k) − d(k)) = 0,
.W
(2.220)
where .(k) = diag{(ei (k))}. Equation (2.220) may be expanded to T (k)(k)W(k)x(k) = WT (k)(k)d(k).
.W
(2.221)
Therefore, the iterative solution to (2.221) could be given in the form of
j +1 (k) = WT (k)j (k)W(k) −1 WT (k)j (k)d(k) ,
ˆ .x
(2.222)
where the superscript j refers to the iteration index. The method can be initialized by using the least squares solution
0 (k) = WT (k)W(k) −1 WT (k)d(k) .
ˆ .x
(2.223)
The state estimate error covariance matrix is computed by .P(k)
= WT (k)j (k)W(k).
(2.224)
Note that as .o → ∞ we have . → I. In this case, the HKF measurement update is equal to the standard KF. As a key technology in statistical signal processing, the improvement of the robustness of KF is a research priority in communication, automatic control, and
50
2 Kalman Filtering
Table 2.3 Summary of main points of the Robust Kalman filters Filter Core technique HIF .• Performance bound .• Minimax problem .• Constrained optimization problem performed through the use of Lagrange multipliers AKF .R(k) is adaptive .• Theoretical .Pyy (k) are approximated by the lumped average of random sequences
STF Student’s t modeling for heavy-tailed noises
Merit Computational simplicity .• Stable .• Robustness
Deficiency
Computational simplicity
.•
.•
.•
High-dimensional state estimation .• VB inference .• High filtering accuracy
SIA method to approximate .• High filtering accuracy the posterior density .• Theoretically for any non-Gaussian noise .• No free parameters HKF .• Huber cost function .• Resistance to .• Nonlinear regression interference .• Good robustness PF
Convergence of .P(k) cannot be guaranteed .• Performance depended on the size of data window .• Not suitable for noise covariance matrix changed rapidly .• Performance depended on free parameter .• Performance may degrade when noise is of light-tailed or multimodal distribution .• Assumption of stationary noise High computational complexity .• Particle degeneracy .• Poor numerical stability .• Sensitive to free parameters .• Accuracy degradation in non-stationary cases due to the fixed parameters
aerospace technology. In addition to the aforementioned robust KFs, many variants are also developed to further improve the estimation accuracy and robustness. As the most typical representatives of robust estimation, HIF, AKF, RSTKF, PF, and HUKF are extensively investigated and have been widely used in various fields. Table 2.3 summarizes the main points of these robust KFs. Every robust KF is derived from different perspectives to implement different goals. For a specific problem, they have merits and deficiencies. How to estimate the desired signal or state efficiently and accurately is still a rich research direction.
2.4 Conclusion This chapter introduces the basic principles and derivations of Kalman filters based on the estimation theory. Different estimation methods including Bayesian estimation, maximum a posterior estimation, linear minimum variance estimation, least variance estimation, weighted least squares, and batch-model regression are
2.4 Conclusion
51
used to derive the linear KF. Based on the different approximations of the nonlinear functions and of the probability density functions, the EKF, UKF, CKF, and RSCKF are designed. In addition, to improve the robustness, some robust KFs are also presented including HIF, AKF, RSTKF, PF, and HUKF. In all these KFs, it is relatively easy to use the batch-model regression. Therefore, in the following chapters, the robust KFs based on batch-model regression are mostly investigated.
Chapter 3
Information Theoretic Criteria
The optimization criteria in information theoretic learning (ITL) have gained increasing attention over the past few years, which use the information theoretic quantities (e.g., entropy or correntropy) estimated directly from the data instead of the usual second-order statistical measures, such as variance and covariance, as the optimization costs for machine learning and signal processing [88, 89]. Information theoretic quantities can capture higher-order statistics of the signals and can offer potentially significant performance improvement in machine learning and signal processing applications especially under non-Gaussian noise environments (heavytailed noise or severe outliers). The ITL links information theory, nonparametric PDF estimators, and reproducing kernel Hilbert spaces (RKHS) in a simple and unconventional way. In particular, the correntropy as a nonlinear similarity measure in kernel space has its root in Renyi’s entropy. In ITL, correntropy is also a local similarity measure (only concerns about the local part of the error PDF falling within the kernel size), and it is naturally a robust cost for machine learning and signal processing [92–96]. In supervised learning, such as regression, the optimization problem can be formulated as that of maximizing the correntropy between model output and desired response. This optimization criterion in ITL is called the maximum correntropy criterion (MCC) [90]. The MCC is insensitive to the error modes that are far from the origin and fall outside the kernel size, which has been successfully used in robust adaptive filtering in impulsive (heavy-tailed) noise environments [97–99]. As another key principle in ITL, entropy measures the uncertainty of error; therefore, the optimization criterion is formulated as minimum error entropy (MEE) which aims to reduce the error uncertainty via minimizing Renyi’s quadratic entropy. The MEE weights the error PDF by itself such that the error modes are easily detected anywhere in the error space. The robustness and efficiency of MEE can be achieved by self-adjusting the localness of the weighting function based on the error distribution [88]. Recently, the MEE has been successfully applied in robust regression, classification, system identification, and adaptive filtering [91, 103–111]. Compared with MCC dealing with heavy© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Chen et al., Kalman Filtering Under Information Theoretic Criteria, https://doi.org/10.1007/978-3-031-33764-2_3
53
54
3 Information Theoretic Criteria
tailed non-Gaussian noises, the MEE improves the capability of model prediction when facing more complex non-Gaussian noises, such as noises from multimodal distributions. Numerous experimental results show that the MEE can outperform MCC in many situations although its computational complexity is higher [112, 113]. Sometimes, in order to obtain an optimal solution, the MEE needs to manually add a bias to the model to yield zero mean error [91]. To more naturally adjust the error mean, the MEE with fiducial points (MEEF) was proposed, which can automatically anchor the error mean around zero [88, 114]. In the following, the MCC, MEE, and MEEF are briefly reviewed.
3.1 Maximum Correntropy Criterion (MCC) 3.1.1 Correntropy 3.1.1.1
Definition
Correntropy is a local similarity measure between two arbitrary random variables [90]. Given two random variables .X ∈ R and .Y ∈ R on the same probability, .pXY (x, y) denotes the joint probability density function with .x, y being realizations of .X, Y . Define .κ : R × R → R as a continuous, symmetric (usually translation invariant), and positive definite kernel function. The correntropy .C (X, Y ) is defined by: C (X, Y ) = EX,Y [κ(X, Y )] =
.
κ (x, y)pXY (x, y) dxdy,
(3.1)
where .EX,Y is the expected value with respect to the joint distribution of .(X, Y ). If the kernel function .κ (·, ·) satisfies Mercer condition, correntropy .C (X, Y ) can also be expressed as a correlation measure in a functional Hilbert space .F: C (X, Y ) = E ϕ(X), ϕ(Y )F = E ϕ(X)T ϕ(Y )
.
= Tr(RXY ),
(3.2)
where .·, ·F denotes the inner product operation in Hilbert space .F, .RXY = E ϕ(X)ϕ(Y )T , and the trace of .RXY is equal to the sum of the eigenvalues. Specially, .ϕ(·) denotes a nonlinear mapping induced by .κ, which transforms its argument into a high-dimensional (infinite for Gaussian kernels) Hilbert space .F, satisfying .ϕ(X)T ϕ(Y ) = κ(X, Y ). Correntropy is therefore essentially a secondorder statistic of the mapped feature space data. The kernel function can be any
3.1 Maximum Correntropy Criterion (MCC)
55
positive definite and translation invariant kernel function. Generally, the Gaussian kernel with the universal approximation is widely utilized and takes the form of: (x − y)2 , .κσ (x, y) = √ exp − 2σ 2 2π σ 1
(3.3)
where .σ denotes the kernel size of the correntropy. In most practical situations, however, only limited numbers of data are available, and the joint distribution .pX,Y (X, Y ) is usually unknown. In these cases, one can estimate the correntropy using a sample mean estimator: N 1 ˆ κ(e(i)), C(X, Y) ≈ N
.
(3.4)
i=1
where .e(i) = x(i) − y(i), with .{x(i),y(i)}N i=1 being N samples drawn from .pX,Y (X, Y ). Below, we present several basic properties of the correntropy, the proofs of which can be found in [129]. Property 1: Correntropy is symmetric: .C (X, Y ) = C (Y, X). Property 2: For the Gaussian kernel, correntropy is positive and bounded: .0 < C (X, Y ) ≤ √ 1 , which reaches its maximum if and only if .X = Y . 2π σ Property 3: For the Gaussian kernel, correntropy involves all the even moments ∞
(X−Y )2i (−1)i of random variable: .κσ (X, Y ) = √ 1 E . 2i i! σ 2i 2πσ i=0
N Property 4: Assume that the data . xi, yi i=1 are drawn from the joint PDF .pXY (x, y). Let .p ˆ e (·) be the Parzen estimate of the error PDF from the data N .{ei = xi − yi } i=1 . Then .C (X, Y ) is the value of .pˆ e (e) at the point zero, that is: Cˆ (X, Y ) = pˆ e (0),
.
where .pˆ e (ε) =
1 N
N
(3.5)
κ(ε − ei ).
i=1
N Property 5: Assume i.i.d. data . xi, yi i=1 are drawn from the joint PDF .pXY (x, y). Then .C (X, Y ) is the integral of .p ˆ X,Y (x, y) along the line .x = y: Cˆ (X, Y ) =
.
pˆ X,Y |x=y=u (x, y)du.
(3.6)
Property 6: Correntropy induces a metric in the sample space. Given two vectors T T .X = (x1 , x2 , . . . , xN ) and .Y = (y1 , y2 , . . . , yN ) in the sample space, the function .CI M(X, Y ) = κ(0) − Cˆ (X, Y ) defines a metric in the sample space and is named as the correntropy-induced metric (CIM).
56
3 Information Theoretic Criteria
3.1.2 Maximum Correntropy Criterion 3.1.2.1
MCC Estimation
The correntropy can be used to build new robust cost functions in many applications of signal processing and machine learning. Consider a supervised learning setting where the goal is to optimize a model .f that receives a random variable U and outputs .Z = f (U ) to approximate a target variable (or teaching variable) D. Here, .f (·) denotes an unknown mapping from the input to output that needs to be learned. A central problem in this learning task is the definition of a loss function (or a similarity measure) to compare Z with D. The well-known minimum mean square error (MMSE) criterion has been the workhorse of supervised learning, which aims to minimize the MSE cost .E e2 with .e = D − Z being the error variable. The combination of the linear feedforward model and MSE yields a set of equations that can be solved analytically. However, MMSE is only optimal when the error variable is Gaussian-distributed, which is seldom the case in real-world applications. The error distributions tend to be skewed and with long tails, which create problems for MSE. Therefore, many optimal solutions are indeed not practical, simply because of the criterion that is used in the optimization. Many non-MSE optimization criteria were proposed in the literature to address the limitations of the MSE. The maximum correntropy criterion (MCC) is one of the hotspots of current research, which performs very well particularly when the error distribution is heavy-tailed. Under the MCC, the model is optimized (or trained) to maximize the correntropy between the target D and output Z: f∗ = arg max C(D, Z)
.
f∈
= arg max E [κσ (D, Z)] ,
(3.7)
f∈
where .f∗ denotes the optimal model and . stands for the hypothesis space. In a practical situation, given finite input-target samples .{u(i), d(i)}N i , the model can be trained through maximizing a sample estimator of correntropy: ˆ f∗ = arg max C(D, Z)
.
f∈
= arg max f∈
N 1 κ(e(i)), N
(3.8)
i=1
where .e(i) = d(i) − z(i) = d(i) − f (u(i)) is the i-th error sample and the input vector can be denoted by .u(i) = [u1 (i), u2 (i), · · · , uM (i)]T ∈ Rm . Note N
κ(e(i)) is neither concave nor convex but is a continuously that .JMCC = N1 i=1
differentiable function at any .e(i) .(i = 1, 2, · · · , N).
3.1 Maximum Correntropy Criterion (MCC)
57
Fig. 3.1 Surface and contour of the MCC cost function
2
JMCC
1.5 1 0.5 0 2
0
e(i)
1 N
-2
-2
0
2
e(i)
Figure 3.1 shows the surface and contour of the MCC cost function .JMCC = N
κ(e(i)). As one can see clearly, the MCC can achieve the global maximum i=1
when .e(i) = 0. Furthermore, from the observation of the CIM contours, we can divide the space in three regions, namely, the Euclidean region, transition region (around the value of the radius where the Gaussian function changes slope), and rectification region. The MCC behaves like the 2-norm distance (circular contours) in the Euclidean region when the error vector is close to zero, similarly like the 1norm distance (diamond contour) in the transition region when the error gets larger, and eventually approaches a zero-norm in the rectification region when the error is large the metric levels off and becomes insensitive to the value of the error vector, which clearly explains the robustness of correntropy for outliers. Of course when the instantaneous error changes amplitude from sample to sample, the weighting of each error sample in MCC is affected as described above, so large errors are effectively just counted, independent of their amplitude (the 0-norm limit).
3.1.2.2
Fixed-Point Solution
Consider a linear-in-parameter (LIP) model in which the i-th output sample is z(i) = wT u(i), where .w = [w1 , w2 , · · · , wM ]T ∈ Rm is the output weight vector (model parameter) to be learned. Based on the MCC, the optimal weight vector can be determined by maximizing the following cost function:
.
w∗ = arg max JMCC (e(i)) m w∈R
.
N 1 = arg max κ(e(i)), m N w∈R i=1
(3.9)
58
3 Information Theoretic Criteria
MCC where .e(i) = d(i) − wT u(i). Setting . ∂J∂w = 0, we have:
.
N N 1 1 κ(e(i))e(i)u(i) = κ(e(i))(d(i) − wT u(i))u(i) = 0. N N i=1
(3.10)
i=1
From (3.10), we have: −1 w = NMCC NMCC uu dx ,
(3.11)
.
where: ⎧ N 1
⎪ MCC ⎪ κ(e(i))u(i)uT (i) ⎨ Nuu = N i=1 . N 1
⎪ ⎪ ⎩ NMCC = κ(e(i))d(i)u(i). dx N i=1
(3.12)
MCC is a function of the error, which depends on The kernel function in .NMCC uu and .Ndx the weight vector .w (note that .e(i) = d(i) − wT u(i)). Thus, the solution of (3.11) accepts a fixed-point form, which can be expressed by .w = g(w) with:
−1 g(w) = NMCC NMCC uu dx
.
=
N 1 κ(d(i) − wT u(i))u(i)uT (i) N
×
−1
i=1
N 1 T κ(d(i) − w u(i))d(i)u(i) . N
(3.13)
i=1
Unlike least squares, .g(w) is a function of the weight vector, so we cannot solve (3.13) in a manner of closed form solution. One has to solve it using an iterative update manner. A fast constructive method to find the optimal solution is the following fixed-point iterative algorithm: wτ +1 = g(wτ ),
.
(3.14)
where .wτ denotes the solved weight vector at iteration .τ with initial condition .w0 . The fixed-point iterative algorithm is an efficient way to solve the MCC solution, which involves no step-size and may converge to the optimal solution faster than the gradient-based methods. Generally, the iteration is stopped until some convergence condition is satisfied. Here, we assume that the matrix .NMCC is invertible with uu MCC > 0, where .λmin [·] denotes the minimum eigenvalue of .NMCC .λmin Nuu uu .
3.1 Maximum Correntropy Criterion (MCC)
59
In the following, we will present a sufficient condition under which the fixedpoint algorithm (3.14) will surely converge to a unique fixed-point solution [102]. In mathematics, the contraction mapping theorem (also known as the Banach fixedpoint theorem) provides an important tool for proving the convergence of a fixedpoint algorithm [100]. By the contraction mapping theorem, the convergence of the fixed-point equation (3.14) is guaranteed if .∃βMCC > 0 and .0 < αMCC < 1 such that the initial weight vector . w0 p ≤ βMCC and .∀w ∈ w ∈ Rm : w p ≤ βMCC ; it holds: .
g(w) p ≤ βMCC ∇w g(w) p = ∂g(w) ∂w ≤ αMCC ,
(3.15)
p
where . · p denotes an .Lp -norm of a vector or an induced norm of a matrix defined AX p x p =0 A p
by . A p = max
with .p ≥ 1, .A ∈ Rm×m , .X ∈ Rm×1 , and .∇w g(w) denotes
the .m × m Jacobian matrix of .g(w) with respect to .w, given by: ∇w g(w) =
.
∂g(w) ∂g(w) ∂w1 ∂w2
···
∂g(w) ∂wm
,
(3.16)
where: ∂g(w) ∂wj
.
=
∂
NMCC uu
−1
NMCC dx
∂wj −1 MCC MCC −1 ∂NMCC MCC −1 ∂NMCC MCC uu dx Nuu = − Nuu Ndx + Nuu ∂wj ∂wj N MCC −1 1 ∂ T = − Nuu g(w) ∂wj κ(e(i))u(i)u (i) N i=1 N −1 1 ∂ T + NMCC uu ∂wj κ(e(i))d(i)u (i) N i=1N MCC −1 1
T (i) e(i)u (i)κ(e(i))u(i)u g(w) = − Nuu s Nσ2 i=1 N −1 1
e(i)us (i)κ(e(i))d(i)uT (i) . + NMCC 2 uu Nσ i=1
(3.17) The following two theorems hold: Theorem 3.1 If .βMCC > ξMCC =
N √
|d(i)| u(i) 1 m i=1 N
T λmin u(i)u (i)
solution of equation .ϕ(σ ) = βMCC with:
i=1
and .σ ≥ σ ∗ , where .σ ∗ is the
60
3 Information Theoretic Criteria N √
|d(i)| u(i) 1 m
ϕ(σ ) =
.
λmin
i=1 N
, σ ∈ (0, ∞),
(3.18)
κ(βMCC u(i) 1 + |d(i)|)u(i)uT (i)
i=1
then . g(w) 1 ≤ βMCC for all .w ∈ {w ∈ Rm : w 1 ≤ βMCC }.
Proof See Appendix 3.A in Sect. 3.4.1. A much stronger condition is presented in Corollary 3.1. Corollary 3.1 If
βMCC
>
.
max (βMCC u(i) 1 +|d(i)|)2
i σ ≥ 2 log(βMCC /ξMCC ) {w ∈ Rm : w 1 ≤ βMCC }.
.
ξMCC
N √
|d(i)| u(i) 1 m i=1 N
λmin u(i)uT (i)
=
and
i=1
, then . g(w) 1
≤
βMCC for all .w
Proof See Appendix 3.B in Sect. 3.4.2. Theorem 3.2 If .βMCC > ξMCC =
∈
N √
|d(i)| u(i) m i=1 N
λmin u(i)uT (i)
and .σ ≥ max σ ∗ , σ † , where
i=1
σ ∗ is the solution of equation .ϕ(σ ) = βMCC and .σ † is the solution of equation .ψ(σ ) = αMCC (0 < αMCC < 1) with: .
.ψ(σ )
=
N
√
m (βMCC u(i) 1 + |d(i)|) u(i) 1 βMCC u(i)uT (i)1 + |d(i)u(i)| i=1
σ 2 λmin
N
κ(βMCC u(i) 1 + |d(i)|)u(i)uT (i)
,
i=1
(3.19) then it holds that . g(w) 1 ≤ βMCC and . ∇w g(w) 1 ≤ αMCC for all .w ∈ {w ∈ Rm : w 1 ≤ βMCC }. Proof See Appendix 3.C in Sect. 3.4.3.
Remark 3.2 By Theorem 3.3 and Banach fixed-point theorem, given an initial weight vector with . w0 1 ≤ βMCC , the fixed-point algorithm will surely converge to a unique solution (the fixed point) in the range .w ∈ {w ∈ Rm : w 1 ≤ βMCC } provided that the kernel size is larger than a certain value. The value of .αMCC (.0 < αMCC < 1) guarantees fast convergence speed.
3.1 Maximum Correntropy Criterion (MCC)
3.1.2.3
61
Robustness Analysis
Since correntropy is a local similarity measure (hence less sensitive to outliers), it is naturally a robust cost. To provide the robustness analysis for system parameter identification, a linear errors-in-variables (EIV) model with scalar variables is considered in the following [130]. In EIV, .ui is the true but unobserved input of the unknown system at instant i, and .u˜ i = ui + viu is the observed input in which .viu stands for the input noise. The observed output and true input of the unknown system are related via .di = w0 ui + vio , where .vio denotes the output (observation) noise and .w0 is the true system parameter to be estimated. In general, both .vio and .viu are assumed to be independent of .ui . The model’s output .zi is given by .zi = wu˜ i = w(ui + viu ). Let .εu ≥ 0 and .εo ≥ 0 be two nonnegative numbers, 2, · · ·, N } be the sample index set, and u .IN = {1, o .IN (εu , εo ) = i : i ∈ IN , v ≤ εu , v ≤ εo be a subset of .IN satisfying .∀i ∈ i i IN (εu , εo ), .viu ≤ εu , and .vio ≤ εo . In addition, the following two assumptions are made: Assumption 3.1 .N > |IN (εu , εo )| = M > cardinality of the set .IN (εu , εo ).
N 2,
where .|IN (εu , εo )| denotes the
Assumption 3.2 .∃oMCC > 0 such that .∀i ∈ IN (εu , εo ), .|u˜ i | ≥ oMCC . N Remark 3.3 Assumption 3.1 means that there are M (more than u . 2 ) samples o in which the amplitudes of the input and output noises satisfy . vi ≤ εu , v i ≤ u εo and .N − M (at leastone) samplesthat may contain large outliers with . vi > εu o u o or . vi > εo (possibly . vi εu or . vi εo ). The Assumption 3.2 is reasonable since for a finite number of samples, the minimum amplitude is in general larger than zero.
With the above notations and assumptions, the following theorem holds [131]: Theorem 3.3 If .σ >
εo +|w0 |εu M 2 log N−M
, then the estimated optimal solution .wMCC under
MCC criterion satisfies .|wMCC − w0 | ≤ ξ , where: ξ=
.
1 oMCC
⎛" # 2 # |w | + ) (ε N − M ε o 0 u ⎝$−2σ 2 log exp − − M 2σ 2 ⎞
+ εo + |w0 | εu ⎠ . Proof See Appendix 3.D in Sect. 3.4.4. The following corollaries are direct consequences of Theorem 3.4:
(3.20)
62
3 Information Theoretic Criteria
Corollary 3.3 Suppose that .εo + |w0 | εu > 0, and let .σ =
ς(ε o +|w0 |εu ) M 2 log N−M
with .ς > 1.
Then, the optimal solution .wMCC under MCC satisfies .|wMCC − w0 | ≤ ξ , where:
ξ=
.
1 oMCC
⎛ " # # (εo +|w0 |εu )2 − ⎜ # log exp − 2 2σ ⎜ς $ ⎝ log N −M
N −M M
M
⎞ ⎟ + 1⎟ ⎠ (εo + |w0 | εu ) . (3.21)
Corollary 3.4 If .εo + |w0 | εu = 0, then optimal solution .wMCC under MCC satisfies .|wMCC − w0 | ≤ ξ , where: ξ=
.
σ oMCC
2 log
M . 2M − N
(3.22)
Remark 3.4 According to Corollary 3.3, if .εo + |w0 | εu > 0 and .σ is greater than a certain value, the absolute value of the estimation error .εMCC = wMCC − w0 will be upper bounded by (3.21). In particular, if both .εo and .εu are very small, the upper bound .ξ will also be very small. This implies that the MCC solution .wMCC can be very close to the true value .w0 even in the presence of .N − M outliers whose values can be arbitrarily large, provided that there are M (.M > N/2) samples disturbed by small noise amplitudes (bounded by .εv and .εu ). In the extreme case, as stated in Corollary 3.4, if .εo + |w0 | εu = 0, we have .ξ → 0+ as .σ → 0+. In this case, the MCC estimation is almost unbiased as the kernel size .σ is small enough. It is worth noting that due to the inequalities used in the derivation, the real errors in practical situations are usually much smaller and rather far from the upper bound .ξ . This fact will be confirmed by the simulation results provided in the following: Remark 3.5 Although the analysis results cannot be applied directly in practice to improve the estimation performance, they explain clearly why and how the MCC estimation is robust with respect to outliers especially those with large amplitudes. In addition, according to Theorem 3.3 and Corollaries 3.3 and 3.4, the kernel size .σ plays an important role in MCC, which should be set to a proper value (possibly close to the threshold . ς(εo +|w0M|εu ) so as to achieve the best performance. 2 log
N−M
Remark 3.6 In robust statistics theory, there is a very important concept called breakdown point, which quantifies the smallest proportion of “bad” data in a sample that a statistics can tolerate before returning arbitrary values. The MCC estimator is essentially a redescending M-estimator, whose breakdown point has been extensively studied in [132–136]. In particular, it has been shown that the redescending M-estimators’ breakdown point with a bounded objective function can be very close to 1/2 in the location estimation [132, 133]. We, however, investigate the robustness of a special redescending M-estimator, namely, the MCC estimator,
3.1 Maximum Correntropy Criterion (MCC)
63
Fig. 3.2 Optimal solutions .wMSE , .wLAD , .wTLS , and .wMCC and the region between .w0 ± ξ with (a) different .μu (.μv = 10.0) and (b) different .μv (.μu = 10.0) (from [131])
in different ways: (1) an EIV model is considered, and (2) a bound for the estimation error is provided. Here, the analysis results are verified in a simple linear EIV system with scalar variables, where input noise .nui and output noise .nvi are assumed to be the Gaussian mixture model [131], given by: a1 N(−μu , σu2 ) + (1 − a1 )N(0, σu2 ) + 2 b1 nvi ∼ N(−μv , σv2 ) + (1 − b1 )N(0, σv2 ) + 2 nui ∼
.
a1 N(μu , σu2 ). 2 b1 N(μv , σv2 ), 2
(3.23) (3.24)
where .0 ≤ a1 , b1 ≤ 1 are two weighting coefficients that control the proportions of outliers (located around .±μu or .±μv ) in the observed input and output signals. In the simulations below, unless otherwise specified the variances are .σu2 = σv2 = 0.001, and the weighting factors are set to .a1 = b1 = 0.15. Denote the solutions under mean square error (MSE), least absolute deviation (LAD), total least squares (TLS), and MCC by .wMSE , .wLAD , .wTLS , and .wMCC , respectively. Figure 3.2 demonstrates the optimal solutions .wMSE , .wLAD , .wTLS , and .wMCC and the region between .w0 ± ξ with different .μu (or .μv ), where .μv (or .μu ) is fixed at .μv = 10.0 (or .μu = 10.0). From these results, one can observe the following: • The MCC solution lies within the region between .w0 ± ξ , being rather close to the true value .w0 = 3.0 and very little affected by both the input and output outliers. • The estimation error .εMCC = wMCC − w0 could be much smaller in amplitude than the upper bound .ξ . • Solutions .wMSE , .wLAD , and .wTLS are sensitive to outliers and can go far beyond the region between .w0 ± ξ . Especially, the MSE solutions are rather sensitive to the input outliers.
64
3.1.2.4
3 Information Theoretic Criteria
Extended Versions of MCC
As a robust nonlinear similarity measure in kernel space, correntropy has received increasing attention in machine learning and signal processing. In particular, the maximum correntropy criterion (MCC) has recently been successfully applied in robust regression and filtering. The kernel function in correntropy is usually a Gaussian kernel, which is desirable due to its smoothness and strict positive definiteness. However, the Gaussian function is, of course, not always the best choice. Therefore, various extended versions of MCC are developed, including generalized maximum correntropy criterion (GMCC) [137], maximum mixture correntropy criterion (MMCC) [138], maximum correntropy criterion with variable center (MCC-VC) [139], and maximum multikernel correntropy criterion (MMKCC) [140]. Their cost functions are listed in Table 3.1, and key characteristics are summarized as follows: • In GMCC, a generalized Gaussian density (GGD) function is adopted as the kernel function to improve the generality and flexibility. The .α-order generalized correntropic loss (GC-loss) function behaves like different norms (from .Lα to .L0 ) of the data in different regions, which includes the original correntropy with Gaussian kernel as a special case [137]. There are, however, two free parameters to be selected. • In MMCC, the mixture correntropy uses a linear combination of two zero-mean Gaussian kernels with different kernel sizes as the kernel function to enhance the flexibility of single kernel function, which was successfully applied to random neural networks and kernel adaptive filters (KAFs) for function approximation and data regression [138].
Table 3.1 Extended versions of MCC Method GMCC [137]
MMCC [138]
Cost function .JGMCC
=
α1 2α2 (1/α1 )
i=1
2 exp(− e(i) α2 )
where .α1 > 0 is the shape parameter and .α2 > 0 is the scale (bandwidth) parameter N
2 e2 (i) .JMMCC = η exp(− e (i) 2 ) + (1 − η) exp(− 2 ) i=1
MCC-VC [139]
N
2σ1
2σ2
where .σ1 and .σ2 are the kernel sizes of the Gaussian functions and .0 ≤ η ≤ 1 is the mixture coefficient N
2 exp(− (e(i)−c) .JMCC−VC = ) 2 2σ i=1
MMKCC [140]
where c is the center location N
M 2
.JMMKCC = ρ(i) exp(− (e(j2σ)−c(i)) ) 2 (i) j =1 i=1
where .ρ = [ρ(1), ρ(2), · · · , ρ(M)]T is the mixture coefficient vector, T .c = [c(1), c(2), · · · , c(M)] is the center vector and T .σ = [σ (1), σ (2), · · · , σ (M)] is the kernel size vector
3.2 Minimum Error Entropy (MEE)
65
• Different from centering the Gaussian kernel function at zero, which computes the distance of the error from the origin, the MCC-VC adopts a variable center kernel function whose center can be located at a nonzero position according to error PDF matching to optimize the kernel size and center location [139]. • Combining the ideas of variable center correntropy and mixture correntropy, the MMKCC uses a general mixture Gaussian function as the kernel function, where each component of the mixture Gaussian function can be centered at a different location (not limited to zero mean). The MMKCC involves more free parameters, which can be determined by error PDF matching and clustering methods [140].
3.2 Minimum Error Entropy (MEE) 3.2.1 Renyi’s Entropy Define the error of two random variables X and Y as .e = Y − X. In ITL, entropy provides an approach to measure the information contained in error e [88]. As a well-known entropy, Shannon’s entropy is defined as: ∞ H (e) = −
.
pe (x)log pe (x)dx
−∞
= Ee −log pe (x) ,
(3.25)
where .p(·) denotes the probability distribution function of the error e and .Ee [·] denotes the expectation operator with respect to e. As a generalization of Shannon’s entropy, the Renyi’s entropy has been widely used in ITL, defined as: 1 log .Hα (e) = 1−α
∞ peα (x)dx −∞
1 = log Iα (e), 1−α
(3.26)
where .α is the entropy order, .α > 0, .α = 1, and .Iα (e) is referred to as the order-.α information potential [88], given by: Iα (e) =
∞
.
−∞
pα (x)dx = Ee [pα−1 (e)].
(3.27)
Renyi’s entropy equals Shannon’s entropy for .α → 1. In practice, the analytical expression for the error entropy is not available in general; thus, we need to estimate it nonparametrically from the samples. As entropy
66
3 Information Theoretic Criteria
is a functional of PDF, one common approach is to directly substitute an estimate of the PDF into the sample mean approximation for the expectation. The kernel density estimation (KDE) is a typical density estimation method, also referred to as the Parzen windowing method [141, 142], which is a well-understood and useful nonparametric technique. By the KDE approach, the nonparametric estimate of the error’s PDF is: pˆ e (x) =
.
N 1 κσ (x − e(j )), N
(3.28)
j =1
where .{e(1), e(2), · · · , e(N)} are the error samples and N is the sliding data length. In the following, we only consider Renyi’s entropy of order (.α = 2). In this case, minimizing .H (e) is equivalent to maximizing the quadratic information potential (QIP) .I2 . Plugging the estimated PDF (3.28) into the sample mean approximation, an empirical version of the QIP can be expressed as: N N N 1 1 p(e(i)) ˆ = 2 κσ (e(i) − e(j )). Iˆα (e) = N N
.
(3.29)
i=1 j =1
i=1
3.2.2 Minimum Error Entropy 3.2.2.1
MEE Estimation
The supervised learning under MEE aims to make the model output as close as possible to the target through minimizing the error’s entropy. For .α = 2, minimizing the entropy .H2 (e) is equivalent to maximizing the information potential .I2 (e). Therefore, under MEE, the optimal model can be solved by [88]: f∗ = arg max Iˆ2 (e)
.
f∈ N N 1 = arg max κσ (e(i) − e(j )) N2 f∈
(3.30)
i=1 j =1
The QIP is computed by a double summation over all error samples, which is essentially an adaptive cost function that can adapt to the error distribution. Figure 3.3 shows the surface and contour of the MEE cost function .JMEE = N N 1
κσ (e(i) − e(j )). The minimum error entropy (or maximum QIP) is N2 i=1 j =1
achieved when all errors are equal, i.e., .e(1) = e(2) = . . . c, which is independent to the error mean. In some specific applications, the zero-mean error under MEE can be achieved by adding a bias to the model output [88].
3.2 Minimum Error Entropy (MEE)
67
Fig. 3.3 Surface and contour of the MEE cost function
2
JMEE
1.5 1 0.5 0 2
0
e(j)
3.2.2.2
-2
-2
0
2
e(i)
Fixed-Point Solution
For a linear-in-parameter (LIP) model, the optimal weight vector .w in MEE is solved by: w∗ = arg max JMEE m w∈R
.
N N 1 = arg max 2 κσ (e(i) − e(j )). m N w∈R i=1 j =1
(3.31)
There is no closed form solution of (3.31) even for a linear model. One can apply a gradient-based iterative algorithm to search the solution, starting from an initial point. Below, we derive a fixed-point iterative algorithm, which is, in general, much faster than a gradient-based method (a gradient method can be viewed as a special case of the fixed-point methods but involves a step-size parameter). Taking the derivative of .JMEE about .w gives: ∂JMEE ∂w
.
=
∂ 1 ∂w N 2
=
1 2σ 2 N 2
=
1 2σ 2 N 2
N
N
i=1 j =1 N
N
i=1 j =1 N
N
κσ (e(i) − e(j )) κσ (e(i) − e(j )) (e(i) − e(j )) [u(i) − u(j )] κσ (e(i) − e(j )) (d(i) − d(j )) [u(i) − u(j )]
i=1 j =1 N
N
κσ (e(i) − e(j )) [u(i) − u(j )] [u(i) − u(j )]T w − 2σ 21N 2 i=1 j =1 MEE = 2σ1 2 NMEE du − Nuu w , (3.32)
68
3 Information Theoretic Criteria
where: ⎧ ⎪ MEE ⎪ ⎪ ⎨ Ndu = .
⎪ ⎪ MEE ⎪ ⎩ Nuu =
Let
.
∂ 1 ∂w N 2
1 N2 1 N2
N
N
i=1 j =1 N
N
i=1 j =1
N
N
i=1 j =1
κσ (e(i) − e(j )) (d(i) − d(j )) [u(i) − u(j )] (3.33) κσ (e(i) − e(j )) [u(i) − u(j )] [u(i) − u(j )]T .
κσ (e(i) − e(j )) = 0, and assume that the matrix .NMEE is uu
invertible. Then, we obtain the following solution: −1 w = NMEE NMEE uu du .
.
(3.34)
The above solution in form is very similar to the well-known Wiener solution and also similar to the MCC solution in (3.11). It is not a closed form solution, since both MEE matrix .NMEE uu and vector .Ndu depend on the weight vector .w (note that the error .e(i) depends on .w). Clearly, the solution of (3.34) can be updated by a fixed-point equation, expressed by .w = g(w), where: −1 g(w) = NMEE NMEE uu du .
.
(3.35)
Similar to the MCC, the solution (fixed-point) of the equation .w = g(w) can also be found by the fixed-point iterative algorithm: wτ +1 = g(wτ ),
.
(3.36)
with initial condition .w0 . The above algorithm is called the fixed-point MEE algorithm. Following the MCC analysis, the convergence of the fixed-point MEE can be proved by the contraction mapping theorem (Banach fixed-point theorem). According to the contraction mapping theorem, the convergence of the fixed-point MEE algorithm (3.36) is guaranteed if .∃βMEE > 0 and .0 < αMEE < 1 such that initial weight vector . w0 p ≤ βMEE , and .∀w ∈ w ∈ Rm : w p ≤ βMEE ; it holds that: .
g(w) p ≤ βMEE ∇w g(w) p = ∂g(w) ∂w ≤ αMEE , p
where:
(3.37)
3.2 Minimum Error Entropy (MEE)
∂g(W) ∂wj
.
=
∂
NMEE uu
−1
NMEE du
69
∂wj MEE MEE −1 ∂NMEE MEE −1 NMEE + NMEE −1 ∂Ndu uu N = − Nuu uu uu du ∂wj ∂wj N
N MEE −1 1
∂ = − Nuu ∂wj κσ (e(i) − e(j )) [u(i) − u(j )] N2 j =1 i=1 T × [u(i) − u(j )] g(w) N
N −1 1
∂ + NMEE uu ∂wj κσ (e(i) − e(j )) (d(i) − d(j )) N j =1 i=1
× [u(i) − u(j )] N
N MEE −1
1 = − Nuu (e(i) − e(j )) [us (i) − us (j )] κσ (e(i) − e(j )) 2 2Nσ i=1 j =1 × [u(i) − u(j )] [u(i) − u(j )]T g(w) N
N MEE −1
1 + Nuu (e(i) − e(j )) [us (i) − us (j )] κσ (e(i) − e(j )) 2Nσ 2 i=1 j =1 × (d(i) − d(j )) [u(i) − u(j )] .
(3.38) To obtain a sufficient condition to guarantee the convergence of the fixed-point algorithm (3.36), we provide two theorems below: Theorem 3.4 If N
N √
|d(i) − d(j )| u(i) − u(j ) 1 m j =1 i=1
βMEE > ξMEE =
.
λmin
N N
i=1 j =1
,
(3.39)
[u(i) − u(j )] [u(i) − u(j )]
T
and .σ is larger than .σ ∗ , where .σ ∗ denotes the solution of equation .ϕ(σ ) = βMEE with: .ϕ(σ ) N
N √
|d(i) − d(j )| u(i) − u(j ) 1 m
= λmin
j =1 i=1
N
N
j =1 i=1
exp
u(i)−u(j ) 1 +|d(i)−d(j )|)2 (β − MEE 4σ 2
,
σ ∈ (0, ∞),
[u(i) − u(j )] [u(i) − u(j )]T
(3.40)
70
3 Information Theoretic Criteria
then it holds that . g(w) 1 ≤ βMEE for all .w ∈ {w ∈ Rm : w 1 ≤ βMEE }. ∗ † Theorem 3.5 If .βMEE > ξMEE and .σ ≥ max σ , σ , where .σ † represents the solution of equation .ψ(σ ) = αMEE .(0 < αMEE < 1), with: Proof See Appendix 3.E in Sect. 3.4.5.
.ψ(σ )
= 2σ 2 λmin
√ γMEE m , N N
u(i)−u(j ) 1 +|d(i)−d(j )|)2 (β T − u(j )] − u(j )] exp − MEE [u(i) [u(i) 4σ 2
σ ∈ (0, ∞),
j =1 i=1
(3.41) and: .γMEE
=
N N j =1 i=1
βMEE u(i) − u(j ) 1 + |d(i) − d(j )| u(i) − u(j ) 1
, × βMEE [u(i) − u(j )] [u(i) − u(j )]T + (d(i) − d(j )) [u(i) − u(j )] 1 1
(3.42) then . g(w) 1 ≤ βMEE and . ∇w g(w) 1 ≤ αMEE are achieved for all .w ∈ {w ∈ Rm : w 1 ≤ βMEE }.
Proof See Appendix 3.F in Sect. 3.4.6.
Remark 3.7 According to Theorem 3.5 and Banach fixed-point theorem, if the kernel size is larger than a certain value, the fixed-point algorithm (3.36) will surely converge to a unique solution (the fixed point) in the range .w ∈ {w ∈ Rm : w 1 ≤ βMEE } when the initial weight vector satisfies . w0 p ≤ βMEE . The convergence speed of the fixed-point MEE is determined by .αMEE (.0 < αMEE < 1). It is worth noting that the derived sufficient condition will be, certainly, a little loose, due to the inequalities in the proof.
3.2.2.3
Robustness Analysis
Following the similar ideas of MCC, the robustness analysis of MEE estimation can be performed with EIV model of scalar variables (see [143] for details). Assume that .εu ≥ 0 and .εo ≥ 0 are two nonnegative numbers (usually very small) and: ) * SN (εu , εo ) = (i, j ) : (i, j ) ∈ SN , viu − vju ≤ εu , vio − vjo ≤ εo
.
(3.43)
is a subset of .SN satisfying .∀(i, j ) ∈ SN (εu , εo ), .|viu − vju | ≤ εu , and .|vio − vjo | ≤ εo . In addition, the following two assumptions are made: Assumption 3.3 .N(N − 1)/2 > |SN (εu , εo )| = M > |SN |/2 = .|SN (εu , εo )| denotes the cardinality of the set .SN (εu , εo ).
N (N −1) , 4
where
3.2 Minimum Error Entropy (MEE)
71
Assumption 3.4 .∃oMEE > 0 such that .∀(i, j ) ∈ SN (εu , εo ), .u˜ i − u˜ j ≥ oMEE . Remark 3.8 Assumption 3.3 means that there are M (more than . |S2N | ) samples (with indices belonging to .SN ) in which the differences between the input noise pairs and output noise pairs satisfy .|viu − vju | ≤ εu , |vio − vjo | ≤ εo , and .|SN | − M (at least one) sample pairs that have larger differences with .|viu − vju | > εu or o o .|v − v | > εo (hence possibly contain large outliers). Given a finite number of i j samples, Assumption 3.2 holds if there are no two identical observed input samples. Then, the following theorem holds: Theorem 3.6 If .σ ≥ ξ=
1 oMEE
εo +|w0 |εu , β1
then for any .wMEE that satisfies .|wMEE − w0 | >
(σ β2 + εo + |w0 | εu ), it holds that: JMEE (wMEE ) < JMEE (w0 ),
(3.44)
.
where .JMEE (w) =
κσ (ei − ej ), .wMEE is the optimal solution under MEE,
(i,j )∈SN
w0 is the true systemparameter, and .β1 and .β2 are two positive numberssatisfying (|SN |−M) (εo +|w0 |εu ) .κσ (β1 ) = κσ (0) and .κσ (β2 ) = κσ − (|SNM|−M) κσ (0). M σ .
+√ , Remark 3.9 For Gaussian kernel .κσ (x) = (1 2π ) exp(−x 2 2), the positive numbers .β1 and .β2 can be easily computed as: Proof See Appendix 3.G in Sect. 3.4.7.
β1 = .
β2 =
2 log |SNM |−M 2 0 |εu ) −2 log exp − (εo +|w − 2σ 2
|SN |−M M
(3.45) .
From Theorem 3.6, one can obtain Theorem 3.7. Theorem 3.7 If .σ ≥
εo +|w0 |εu , then the MEE solution .wMEE β1
satisfying .|wMEE − w0 | ≤ ξ , where .ξ =
1 oMEE
= arg max JMEE (w) w∈R (σ β2 + εo + |w0 | εu ).
Proof See Appendix 3.H in Sect. 3.4.8.
The following two corollaries are direct consequences of Theorem 3.8: 0 |εu ) Corollary 3.4 Assume that .εo + |w0 | εu > 0, and let .σ = ς(εo +|w , with .ς > 1. β1 Then, the MEE solution .wMEE = arg max JMEE (w) satisfies .|wMEE − w0 | ≤ ξ , w∈R where:
ξ=
.
ς β2 + β1 (εo + |w0 | εu ) . oMEE β1
(3.46)
72
3 Information Theoretic Criteria
Corollary 3.5 If .εo + |w0 | εu = 0, then MEE solution .wMEE = arg max JMEE (w) w∈R satisfies .|wMEE − w0 | ≤ ξ , where: ξ=
.
σβ2 . oMEE
(3.47)
Remark 3.10 By Corollary 3.4, if .εo + |w0 | εu > 0 and kernel size .σ is larger than a certain value, the estimation error .εMEE = wMEE − w0 will be bounded by .|εMEE | ≤ ξ , where .ξ is given by (3.46). If both .εo and .εu are very small, the bound .ξ will also be very small. Thus, the MEE solution .wMEE can be very close to the true value .w0 if there are M (.M > |SN |/2) sample pairs (with indices belonging to .SN ) in which the differences between the noises are very small (bounded by .εv and .εu ), no matter how large the noise differences in the remaining sample pairs. In the extreme case, as stated in Corollary 3.5, if .εo + |w0 | εu = 0, we have .ξ → 0+ as .σ → 0+. Therefore, the MEE estimate is unbiased as .σ → 0+ if there are M (.M > |SN |/2) sample pairs disturbed by the same noises (.viu = vju and .vio = vjo ). The condition that .M > |SN |/2 (namely, more than .|SN |/2 sample pairs satisfy |viu − vju | ≤ εu and .|vio − vjo | ≤ εv ) implies that most of the input and output observation noises (.viu and .vio ) are concentrated around several locations, with relatively few noises scattered in other places (whose values can be arbitrarily large). The most common case is that the majority of the noises are concentrated around zero, while others (usually viewed as outliers) significantly deviate from the bulk of data. It is worth noting that a larger M usually leads to a smaller bound .ξMEE . As shown in Fig. 3.4, the bound .ξMEE will decrease with M increasing, where .w0 = 1, .N = 500, .εo = εu = 0.05, .ς = 1.5, and .oMEE = 0.5. Similar to MCC, the robustness analysis of MEE is tested in the EIV model by simulations. Figure 3.5 demonstrates the optimal solutions .wMMSE , .wTLS , and .wMEE and the region between .w0 ±ξ with different .μu (or .μv ), where .μv (or .μu ) is fixed at .μv = 5.0 (or .μu = 20.0). From Fig. 3.5, we observe (1) the MEE solutions are very close to the true value .w0 ± ξ (almost unbiased), lying within the regions between .
Fig. 3.4 Values of .ξ with different M values
3.3 Minimum Error Entropy with Fiducial Points (MEEF)
73
Fig. 3.5 Optimal solutions .wMMSE , .wTLS , and .wMEE and the region between .w0 ± ξ with (a) different .μu (.μv = 5.0) and (b) different .μv (.μu = 20.0) (from [143])
w0 ± ξ ; (2) the MEE solutions are very little influenced by both input and output outliers (no matter how large their values), while the MMSE and TLS solutions are very sensitive to the outliers (especially the input outliers) and may go far beyond the regions between .w0 ± ξ ; and (3) the estimation error .εMEE = wMEE − w0 can be much smaller in amplitude than .ξ (due to the inequalities used in the derivation).
.
3.3 Minimum Error Entropy with Fiducial Points (MEEF) 3.3.1 MEEF Estimation The MEE aims to minimize the error entropy in the estimation problem. It is beneficial for considering a zero-valued reference in the error sequence in (3.31) so that errors are impelled to move toward zero to avoid the extra adaptive parameter in the mapper. Specifically, an augmented error .E[e(0), e] is formulated by introducing an extra zero-valued error .e(0) = 0 called the fiducial point to the error samples .e = [e(1), e(2), · · · , e(N )] [88, 114]. Therefore, on the basis of (3.31), the MEEF is formed in the consideration of the fiducial point [114]: Vˆ2 (e) =
.
N N 1 κσ (e(i) − e(j )) (N + 1)2 i=0 j =0
⎤ N N N 1 ⎣2 = κσ (e(j )) + κσ (e(i) − e(j )) + κσ (0)⎦ . (N + 1)2 ⎡
j =1
i=1 j =1
(3.48)
74
3 Information Theoretic Criteria
Equation (3.48) can be further simplified by neglecting constants .κσ (0) and .1/(N + 1)2 : Vˆ2 (e) = η
N
.
κσ (e(j )) + (1 − η)
j =1
N N
κσ [e(j ) − e(i)],
(3.49)
i=1 j =1
where .η ∈ [0, 1] is a balanced parameter. Therefore, with the MEEF, the model can be optimized by: f∗ = arg max Vˆ2 (e)
.
f∈
⎧ ⎫ N N N ⎨ ⎬ η κσ (e(j )) + (1 − η) κσ [e(j ) − e(i)] = arg max ⎩ ⎭ f∈ j =1
(3.50)
i=1 j =1
The first term in (3.50) represents the MCC term, which is caused by the fiducial point and is beneficial for anchoring the peak of the error PDF at zero in comparison with the MEE without fixed location of error PDF. The MEEF inherits the desirable features of the MCC and MEE and therefore works in an elegant and robust way [114]. Figure 3.6 shows the surface and contour of the MEEF cost function N N
N
.JMEEF = η κσ (e(j )) + (1 − η) κσ [e(i) − e(j )]. One can observe that j =1
i=1 j =1
MEEF can obtain the global maximum value at .e(i) = e(j ) = 0, and the contour becomes a combination of MEE (along the bisector of the joint error space) and MCC contours, which centers the error and tries to make all the errors close to zero along the other directions of the error space. Fig. 3.6 Surface and contour of the MEEF cost function
2
JMEEF
1.5 1 0.5 0 2
0
e(j)
-2
-2
0
e(i)
2
3.3 Minimum Error Entropy with Fiducial Points (MEEF)
75
3.3.2 Fixed-Point Solution According to the MEEF cost function in (3.50), the optimal weights of a linear adaptive filter can be solved by: w∗ = arg max JMEEF . m w∈R
.
(3.51)
Taking the derivative of .JMEEF with respect to .w, we have: .
1 ∂JMEEF = 2 NMEEF − NMEEF w , uu dx ∂w σ
(3.52)
where .NMEEF and .NMEEF are: uu dx NMEEF =η dx
N
κ(e(i))d(i)u(i)
i=1
+(1 − η) .
=η NMEEF uu
N
N
N κ(e(i) − e(j ))
i=1 j =1 × [d(i) − d(j )] [u(i) − u(j )]
(3.53)
κ(e(i))u(i)uT (i)
i=1
+(1 − η)
N
N κ(e(i) − e(j ))
T i=1 j =1 × [u(i) − u(j )] [u(i) − u(j )] .
Setting (3.52) to 0, one can solve .w by a fixed-point equation: wτ +1 = g(wτ ),
.
(3.54)
−1 MEEF where .g(wτ ) = NMEEF Ndx . uu Following the same ideas in MCC and MEE, Banach’s fixed-point theorem can be applied in analyzing the convergence of the fixed-point algorithm. The convergence of the fixed-point algorithm in the form of Eq. (3.54) is guaranteed if .∃βMEEF > 0 such that the initial weight vector satisfies . w0 p ≤ βMEEF and .∃0 < αMEEF < 1; it holds that: .
g(w) p ≤ βMEEF ∇w g(w) p = ∂g(w) ∂w ≤ αMEEF ,
(3.55)
p
for .w ∈ w ∈ Rm : w p ≤ βMEEF . According to Banach’s theorem, if the MEEF meets the above conditions, the fixed-point algorithm will be convergent. In the following, two theorems are presented:
76
3 Information Theoretic Criteria
Theorem 3.8 If .βMEEF > ξMEEF , where: ξMEEF
.
√
m η
N
|d(i)| u(i) 1 + (1 − η)
i=1
=
N
λmin η
u(i)uT (i) + (1 − η)
i=1
N
N
i=1 j =1
N
N
i=1 j =1
|d(i) − d(j )| u(i) − u(j ) 1
,
[u(i) − u(j )][u(i) − u(j )]T (3.56)
and .σ ≥ σ ∗ and .σ ∗ is the solution of equation .ϕ(σ ) = βMEEF with: .ϕ(σ )
N N
N
√ |d(i)| u(i) 1 + (1 − η) |d(i) − d(j )| u(i) − u(j ) 1 m η i=1 j =1
i=1
=
λmin
,
σ ∈ (0, ∞)
(3.57) and =η
N
.
κ(βMEEF u(i) 1 + |d(i)|)u(i)uT (i)
i=1
+ (1 − η)
N N
κ(βMEEF u(i) − u(j ) 1 + |d(i) − d(j )|)
i=1 j =1
× (u(i) − u(j ))(u(i) − u(j ))T ,
(3.58)
then . g(w) 1 ≤ βMEEF for all .w ∈ {w ∈ Rm : w 1 ≤ βMEEF }. Theorem 3.9 If .βMEEF > ξMEEF and .σ ≥ max σ ∗ , σ † , where .σ ∗ is the solution of equation .ϕ(σ ) = βMEEF and .σ † is the solution of equation .ψ(σ ) = αMEEF (0 < αMEEF < 1) with: .ψ(σ )
= √ γMEEF m
⎡ ⎢ ⎢ σ 2 λmin ⎢ ⎣
η +(1 − η)
N
N
i=1 j =1
N
κ(βMEEF u(i) 1 + |d(i)|)u(i)uT (i)
i=1
κ(βMEEF u(i) − u(j ) 1 + |d(i) − d(j )|) [u(i) − u(j )] [u(i) − u(j )]
⎤ ⎥ ⎥ ⎥ T⎦
(3.59) and:
3.4 Appendices
77
γMEEF = βMEEF η
N
i=1
(βMEEF u(i) 1 + |d(i)|) u(i) 1 u(i)uT (i)1
+βMEEF (1 − η)
N
N
) 1 + |d(i) − d(j )| βMEEF u(i) − u(j × T i=1 j =1 × u(i) − u(j ) 1 [u(i) − u(j )] [u(i) − u(j )] 1 N
+η (βMEEF u(i) 1 + |d(i)|) u(i) 1 d(i)uT (i)1 i=1
N
N
βMEEF u(i) − u(j ) 1 + |d(i) − d(j )| +(1 − η) T i=1 j =1 × u(i) − u(j ) 1 [d(i) − d(j )] [u(i) − u(j )] 1 , (3.60)
.
then it holds that . g(w) 1 ≤ βMEEF and . ∇w g(w) 1 ≤ αMEEF for all .w ∈ {w ∈ Rm : w 1 ≤ βMEEF }. The proofs of Theorems 3.8 and 3.9 are similar to those for MCC and MEE and hence will be omitted here.
3.4 Appendices 3.4.1 Appendix 3.A Proof Since the induced matrix norm is compatible with the corresponding vector Lp -norm, we have: −1 MCC −1 MCC MCC ≤ NMCC g(w) 1 = N N N dx dx , uu uu 1
.
1
(3.61)
1
−1 where NMCC is the 1-norm (also referred to as the column sum norm) of the uu 1 MCC −1 inverse matrix Nuu , which is simply maximizing absolute column sum of the matrix. According to the matrix theory, the following inequality holds: −1 MCC −1 √ MCC −1 √ MCC , . = mλmax Nuu Nuu ≤ m Nuu 1
(3.62)
2
−1 where NMCC is the 2-norm (also referred to as the spectral norm) of uu 2 MCC −1 Nuu , which equals the maximum eigenvalue of the matrix. Further, we have:
78
3 Information Theoretic Criteria
λmax
−1 NMCC = uu
1 λmin NMCC uu
=
λmin
N N
[κ(e(i))u(i)uT (i)]
i=1 .
(a1 )
≤
λmin
N N
(3.63)
,
κ(βMCC u(i) 1 +|d(i)|)u(i)uT (i)
i=1
where (a1 ) comes from: |e(i)| = d(i) − wT u(i) . ≤ w 1 u(i) 1 + |d(i)| ≤ βMCC u(i) 1 + |d(i)| .
(3.64)
In addition, it holds that: N
MCC N =1 [κ(e(i))d(i)u(i)] dx N 1 i=1
(a2 )
≤
.
(a3 )
≤
1 N 1 N
N
i=1 N
1
κ(e(i))d(i)u(i) 1
(3.65)
d(i) 1 u(i) 1 ,
i=1
where (a2 ) follows from the convexity of the vector 1-norm, and (a3 ) is because of 2 κσ (e(i)) = exp(− (e(i)) ) ≤ 1 for any e(i). 2σ 2 Combining (3.61)–(3.63) and (3.65), we derive: N √
|d(i)| u(i) 1 m
g(w) 1 ≤ ϕ(σ ) =
.
λmin
i=1 N
.
(3.66)
κ(βMCC u(i) 1 + |d(i)|)u(i)uT (i)
i=1
Since ϕ(σ ) is a continuous and monotonically decreasing function of σ and satisfies lim ϕ(σ ) = ∞: σ →0+
√ .
lim ϕ(σ ) = ξMCC =
σ →∞
N
|d(i)| u(i) 1 i=1 N .
T λmin u(i)u (i) i=1 m
(3.67)
Hence, if βMCC > ξMCC , the equation ϕ(σ ) = βMCC will have a unique solution σ ∗ over (0, ∞), and if σ > σ ∗ , we have ϕ(σ ) ≤ βMCC . This completes the proof.
3.4 Appendices
79
3.4.2 Appendix 3.B Proof We have: √
ϕ(σ ) ≤ .
m
N
|d(i)| u(i) 1
N
2 T λmin κ max (βMCC u(i) 1 +|d(i)|) u(i)u (i) i=1
i
=
(3.68)
i=1
ξMCC
. κ max (βMCC u(i) 1 +|d(i)|)2 i
If βMCC > ξMCC and σ ≥
max (βMCC u(i) 1 +|d(i)|)2 i
2log(βMCC /ξMCC )
, then:
max (βMCC u(i) 1 +|d(i)|)2 i 2 = exp − κ max (βMCC u(i) 1 + |d(i)|) 2σ 2 i . MCC ≥ exp − log βξMCC = βξMCC . MCC (3.69) It follows easily that g(w) 1 ≤ ϕ(σ ) ≤ βMCC , which completes the proof.
3.4.3 Appendix 3.C Proof By Theorem 3.2, we have g(w) 1 ≤ βMCC . To prove ∇w g(w) 1 ≤ αMCC , ∂g(w) it suffices to prove ∀j, ∂wj ≤ αMCC . By (3.17), we have: 1
N MCC −1 1
T − Nuu e(i)us (i)κ(e(i))u(i)u (i) g(w) 2 N σ ∂g(w) i=1 ∂wj = N
1 1 + NMCC −1 T e(i)us (i)κ(e(i))d(i)u (i) uu Nσ2 i=1 1 . N MCC −1 1
T e(i)us (i)κ(e(i))u(i)u (i) g(w) ≤ − Nuu Nσ2 i=1 1 N MCC −1 1
T e(i)us (i)κ(e(i))d(i)u (i) . + Nuu Nσ2 i=1
1
(3.70)
80
3 Information Theoretic Criteria
It is easy to derive: N MCC −1 1
T − N e(i)u (i)κ(e(i))u(i)u (i) g(w) s 2 uu Nσ i=1 1 N 1 MCC −1
T e(i)us (i)κ(e(i))u(i)u (i) ≤ N σ 2 Nuu g(w) 1
1 i=1 1
(a4 ) β −1 N MCC T MCC ≤ N σ 2 Nuu e(i)us (i)κ(e(i))u(i)u (i) 1 1 i=1 N (a5 ) β MCC −1
u(i)uT (i) , ≤ NMCC u(i) |d(i)|) u(i) N + (β MCC 1 1 2 uu 1 σ 1 i=1
.
(3.71) where (a4 ) follows from the convexity of the vector 1-norm and g(w) 1 ≤ βMCC 2 and (a5 ) is due to the fact that κ(e(i)) = exp(− (e(i)) ) ≤ 1 for any e(i) and 2σ 2 |e(i)us (i)| ≤ (βMCC u(i) 1 + |d(i)|) u(i) 1 . In a similar way, we derive: N MCC −1 1
T N e(i)us (i)κ(e(i))d(i)u (i) uu Nσ2 i=1
1 N −1
MCC T e(i)us (i)κ(e(i))u(i)u (i) Nuu ≤ 1 1 i=1 N −1
≤ N1σ 2 NMCC ((βMCC u(i) 1 + |d(i)|) u(i) 1 d(i)u(i) 1 ). uu 1 Nσ2
.
1 i=1
(3.72) Then, combining (3.70)–(3.72), (3.62), and (3.63), we have: ∂g(w) . ∂w ≤ ψ(σ ) j 1 √ =
m
N
(βMCC u(i) 1 + |d(i)|) u(i) 1 βMCC u(i)uT (i)1 + |d(i)u(i)|
i=1
σ 2λ
N
min
κ(βMCC u(i) 1
.
+ |d(i)|)u(i)uT (i)
i=1
(3.73) Obviously, ψ(σ ) is a continuous and monotonically decreasing function, satisfying lim ψ(σ ) = ∞ and lim ψ(σ ) = 0. Therefore, given 0 < αMCC < 1, the σ →0+
σ →∞
equation ψ(σ ) = αMCC (σ ∈ (0, ∞)) has a unique solution σ † , and if σ ≥ σ † , we have ψ(σ ) ≤ αMCC . This completes the proof.
3.4.4 Appendix 3.D Proof Since wMCC = arg max JMCC (w), we have JMCC (w0 ) ≤ JMCC (wMCC ). To prove |wMCC − w0 | ≤ ξ , it will suffice to prove JMCC (w) ≤ JMCC (w0 ) for
3.4 Appendices
81
any w satisfying |w − w0 | > ξ . Since N > M > σ > εo +|w0 |εMu , it follows easily that: 2 log
N 2,
we have 0
ξ , we have ∀i ∈ IN (εu , εo ): |ei | = (w0 − w)u˜ i + vio − w0 viu (a6 ) ≥ |w0 − w| × |u˜ i | − vio − w0 viu .
(a7 )
≥ ξ oMCC −(εo + |w0 | εu ) 2 0 |εu ) = −2σ 2 log exp − (εo +|w − 2σ 2
(3.75) N −M , M
where (a6 ) comes from (w0 − w)u˜ i + vio − w0 viu ≥ |(w0 − w)u˜ i | − vio − w0 viu and |(w0 − w)u˜ i | = |w0 − w| × |u˜ i | and (a7 ) follows from the Assumption 3.2 and |w − w0 | > ξ and vio − w0 viu ≤ εo + |w0 | εu . Thus, ∀i ∈ IN (εu , εo ): e2 exp − 2σi 2 ⎛ .
< exp ⎝
⎞ (ε +|w0 |εu )2 − N−M −2σ 2 log exp − o 2 M 2σ
2σ 2
2 0 |εu ) − = exp − (εo +|w 2 2σ
⎠
(3.76)
N −M M .
Then, we have JMCC (w) ≤ JMCC (w0 ) for any w satisfying |w − w0 | > ξ , since: 6
ei2 ei2 1 exp − 2σ 2 + JMCC (w) = N exp − 2σ 2 i∈IN (εu ,εo ) i ∈I / N (εu ,εo ) 6
ei2 (εo +|w0 |εu )2 N −M 1 exp − − M + exp − 2σ 2 ξMEE , the equation ϕ(σ ) = βMEE σ →∞
will have a unique solution σ ∗ over (0, ∞), and when σ ≥ σ ∗ , ϕ(σ ) ≤ βMEE holds. This completes the proof.
3.4.6 Appendix 3.F Proof By Theorem 3.5, we have g(w) 1 ≤ βMEE . To prove ∇w g(w) 1 ≤ αMEE , ∂g(w) it suffices to prove ∀j, ∂wj ≤ αMEE . By (3.38), we have: 1
∂g(w) ∂wj 1 N
N (e(i) − e(j )) [u (i) − u (j )] κ (e(i) − e(j )) −1
s s 1 MEE g(w) − Nuu 2 2 2N σ T i=1 j =1 × [u(i) − u(j )] [u(i) − u(j )] = N N
(e(i) − e(j )) [us (i) − us (j )] κ (e(i) − e(j )) −1 1 MEE + Nuu 2σ 2 2N . i=1 j =1 (d(i) − d(j )) [u(i) − u(j )] 1 N (e(i) − e(j )) [u (i) − u (j )] κ (e(i) − e(j )) N
−1
s s 1 ≤ NMEE g(w) 2N 2 σ 2 T uu i=1 j =1 × [u(i) − u(j )] [u(i) − u(j )] 1 N N −1
(e(i) − e(j )) [us (i) − us (j )] κ (e(i) − e(j )) 1 + NMEE . 2N 2 σ 2 uu i=1 j =1 (d(i) − d(j )) [u(i) − u(j )] 1
(3.84)
84
3 Information Theoretic Criteria
It is easy to derive: N
N (e(i) − e(j )) [u (i) − u (j )] κ (e(i) − e(j ))
MEE −1 s s 1 g(w) Nuu 2 2 T 2N σ × − u(j )] − u(j )] [u(i) [u(i) i=1j =1 1 −1 ≤ 2N12 σ 2 NMEE uu 1
N (e(i) − e(j )) [u (i) − u (j )] κ (e(i) − e(j )) N
s s × g(w) 1 i=1 j =1 × [u(i) − u(j )] [u(i) − u(j )]T 1 MEE −1 (a13 ) β MEE ˆ ≤ 2N 2 σ 2 Nuu . 1 N N
(e(i) − e(j )) [us (i) − us (j )] κ (e(i) − e(j )) × T × − u(j )] − u(j )] [u(i) [u(i) i=1 j =1 1 (a14 ) β −1 MEE MEE√ ≤ 4N 2 σ 3 π Nuu 1
N N
βMEE u(i) − u(j ) 1 + |d(i) − d(j )| u(i) − u(j ) 1 , T i=1 j =1 × [u(i) − u(j )] [u(i) − u(j )] 1
(3.85) where (a13 ) follows from the convexity of the vector 1-norm and g(w) 1 ≤ βMEE ))2 ) ≤ 2σ 1√π and (a14 ) is due to the fact that κ(e(i) − e(j )) = 2σ 1√π exp(− (e(i)−e(j 2σ 2
for any e and |(e(i) − e(j )) [us (i) − us (j )]| ≤ βMEE u(i)−u(j ) 1 +|d(i)−d(j )| u(i) − u(j ) 1 . In a similar way, we derive: N
N (e(i) − e(j )) [u (i) − u (j )] κ (e(i) − e(j ))
MEE −1 s s 1 Nuu 2N 2 σ 2 − d(j )) − u(j )] (d(i) [u(i) i=1 j =1 ≤ .
1 2N 2 σ 2
1
6 N
N (e(i) − e(j )) [u (i) − u (j )] κ (e(i) − e(j )) MEE −1
s s × Nuu 1 i=1 j =1 (d(i) − d(j )) [u(i) − u(j )] 1 βMEE√ MEE −1 ≤ 4N 2 σ 3 π Nuu 1 6
N N
βMEE u(i) − u(j ) 1 + |d(i) − d(j )| |d(i) − d(j )| × . 2 i=1 j =1 × u(i) − u(j ) 1 (3.86)
3.4 Appendices
85
Then, combining (3.84)–(3.86), (3.79), and (3.80), we have: ∂g(w) ∂wj ≤
.
MEE −1 Nuu 1 ⎛
N
N
βMEE u(i) − u(j ) 1 + |d(i) − d(j )| u(i) − u(j ) 1 ⎜ ⎜ i=1 j =1 ×[u(i) − u(j )] [u(i) − u(j )]T ⎜ 1 ⎜
⎜
N
N βMEE u(i) − u(j ) 1 + |d(i) − d(j )| |d(i) − d(j )| ⎝+ i=1 j =1 × u(i) − u(j ) 21
1 βMEE√ 4N 2 σ 3 π
≤ =
N N
i=1 j =1
2σ 2 λmin
⎟ ⎟ ⎟ ⎟ ⎟ ⎠
(3.87)
√ γMEE m/π
4σ 3 λmin
⎞
N N
i=1 j =1
κ βMEE u(i) − u(j ) 1 + |d(i) − d(j )| [u(i) − u(j )] [u(i) − u(j )]T √ γMEE m
2 u(i)−u(j ) 1 +|d(i)−d(j )|) β exp − ( MEE 4σ 2
[u(i) − u(j )] [u(i) − u(j )]T
= ψ(σ ).
As in MCC, ψ(σ ) in MEE is also a continuous and monotonically decreasing function of σ over (0, ∞), satisfying lim ψ(σ ) = ∞ and lim ψ(σ ) = 0. σ →∞
σ →0+
Therefore, given 0 < αMEE < 1, the equation ψ(σ ) = αMEE has a unique solution σ † over (0, ∞), and if σ ≥ σ † , ψ(σ ) ≤ αMEE holds. This completes the proof.
3.4.7 Appendix 3.G Proof Since |SN | > M > |S2N | , we have 0 < follows easily that σβ1 ≥ εo + |w0 | εu and: |SN |−M κ(0) M
|SN |−M M
< 1. As σ ≥
εo +|w0 |εu , β1
= κ(β1 ) = σ κ(σ β1 ) (a15 )
.
it
≤ σ κ(εo + |w 0 | εu ) εo +|w0 |εu =κ , σ
(3.88)
where (a15 ) follows from κ(x1 ) ≤ κ(x2 ) and ∀ |x1 | ≥ |x2 |. Hence, we have: .
|SN |−M M κ(0) < κ(0) |−M) εo +|w0 |εu − (|SNM κ(0) σ
0< 0 ξ = 1 oMEE (σ β2 + εo + |w0 | εu ), we have ∀(i, j ) ∈ SN (εu , εo ):
86
3 Information Theoretic Criteria
ei − ej = (w0 − w)(u˜ i − u˜ j ) + (v o − v o ) − w0 (v u − v u ) i j i j o (a16 ) o u ≥ |w0 − w| × u˜ i − u˜ j − vi − vj − w0 (vi − vju ) .
(3.90)
(a17 )
≥ ξ oMEE − (εo + |w0 | εu ) = σβ2 ,
where 3.4 and (a17 ) follows from |w − w0 | > ξ (a16 ) comes from the Assumption o o u u and (vi − vj ) − w0 (vi − vj ) ≤ εo + |w0 | εu . Thus, ∀(i, j ) ∈ SN (εu , εo ):
.
κ(ei − ei ) ≤ κ(σ β2 ) = σ1 κ(β ) 2 ) =
1 σ
κ
εo +|w0 |εu σ
(|SN |−M) κ(0) M (|SN |−M) κ(0). M
−
= κ(εo + |w0 | εu ) −
*
(3.91)
Then, we have: JMEE (w) =
1 N2
(i,j )∈SN (εu ,εo )
(e −e )2 exp − i2σ 2j
(ei −ej )2 = exp − 2σ 2 (i,j )∈SN (εu ,εo ) 6
(ei −ej )2 exp − 2σ 2 + (i,j )∈S / N (εu ,εo ) (a18 )
2 |SN |−M 0 |εu ) ≤ N12 exp − (εo +|w − κ(0) 2 M 2σ (i,j )∈SN (εu ,εo ) 6
+ κ(0) (i,j )∈S / N (εu ,εo )
2 0 |εu ) = N12 exp − (εo +|w 2σ 2 (i,j )∈SN (εu ,εo ) o (a19 )
(vi −vjo +w0 (viu −vju ))2 1 ≤ N2 exp − 2σ 2 (i,j )∈SN (εu ,εo ) o o u u N N (vi −vj +w0 (vi −vj ))2 1
< N2 exp − 2σ 2 1 N2
.
i=1 j =1
= JMEE (w0 ), (3.92) (e −e )2 < 1 and (a19 ) follows from where (a18 ) comes from exp − i2σ 2j εo + |w0 | εu ≥ (vio − vjo ) + w0 (viu − vju ), ∀(i, j ) ∈ SN (εu , εo ). This completes the proof.
3.4 Appendices
87
3.4.8 Appendix 3.H Proof Since wMEE = arg max JMEE (w), we have JMEE (w0 ) ≤ JMEE (wMEE ). According to Theorem 3.7, JMEE (w) < JMEE (w0 ) for any w satisfying |w − w0 | > ξMEE . Thus, it holds that |wMEE − w0 | ≤ ξMEE .
Chapter 4
Kalman Filtering Under Information Theoretic Criteria
Generally, the traditional Kalman filter (KF) is derived under the well-known minimum mean square error (MMSE) criterion, which is optimal under the Gaussian assumption. However, when the signals are non-Gaussian, especially when the system is disturbed by some heavy-tailed impulsive noises, the performance of KF will deteriorate seriously. To improve the robustness of KF to non-Gaussian noises, several robust Kalman filters are derived in this chapter based on a batch regression model presented in Chap. 2 and information theoretic criteria in Chap. 3. The first is the maximum correntropy Kalman filter (MCKF) [116], which is developed based on the maximum correntropy criterion (MCC) and a fixed-point iterative algorithm. Similar to the traditional KF, the MCKF not only retains the state mean propagation process but also preserves the covariance matrix propagation process. Thus, the filter also has a recursive structure and is suitable for online implementation. Some other approaches for MCC-based Kalman filtering are also briefly presented. To further improve the performance of MCKF, a Kalman filter algorithm based on the generalized maximum correntropy criterion (GMCKF) [117] is also derived, which adopts the generalized Gaussian density (GGD) function as the kernel in correntropy. The GMCKF is more general and flexible, which includes the MCKF with Gaussian kernel as a special case. In addition, when faced with more complicated non-Gaussian noises such as noises from multimodal distributions, the minimum error entropy (MEE) is usually a better choice to match the error distribution, which is beyond the capability of MCC. Thus, an enhanced Kalmantype filter, called minimum error entropy Kalman filter (MEE-KF) [125], is also developed in this chapter by using MEE criterion.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Chen et al., Kalman Filtering Under Information Theoretic Criteria, https://doi.org/10.1007/978-3-031-33764-2_4
89
90
4 Kalman Filtering Under Information Theoretic Criteria
4.1 Kalman Filter (KF) First, we briefly recall the set of update equations that constitute the Kalman filter. Consider a linear system described by: x(k) = F(k − 1)x(k − 1) + q(k − 1),
(4.1)
y(k) = H(k)x(k) + r(k),
(4.2)
.
.
where .x(k) ∈ Rn denotes the n-dimensional state vector at instant k and .y(k) ∈ Rm represents the m-dimensional measurement vector. .F(k) and .H(k) stand for, respectively, the system matrix (or state transition matrix) and the observation matrix. .q(k − 1) and .r(k) are mutually uncorrelated process noise and measurement noise, respectively, with zero mean and covariance matrices: E q(k − 1)qT (k − 1) = Q(k − 1), E r(k)rT (k) = R(k).
.
(4.3)
In general, the Kalman filter includes two steps:
4.1.1 Prediction The prior mean and covariance matrix are given by: xˆ (k|k − 1) = F(k − 1)ˆx(k − 1),
.
P(k|k − 1) = F(k − 1)P(k − 1)FT (k − 1) + Q(k − 1).
.
(4.4) (4.5)
4.1.2 Update The Kalman filter gain is computed as: −1 K(k) = P(k|k − 1)HT (k) H(k)P(k|k − 1)HT (k) + R(k) .
.
(4.6)
The posterior state is equal to the prior state plus the innovation weighted by the KF gain: xˆ (k) = xˆ (k|k − 1) + K(k) y(k) − H(k)ˆx(k|k − 1) .
.
(4.7)
4.2 Maximum Correntropy Kalman Filter (MCKF)
91
Additionally, the posterior covariance is recursively updated by: P(k) = (I − K(k)H(k)) P(k|k − 1)(I − K(k)H(k))T .
+ K(k)R(k)KT (k).
(4.8)
4.2 Maximum Correntropy Kalman Filter (MCKF) The MCKF is derived under MCC, which may perform much better than the traditional KF in impulsive (heavy-tailed) noise environments. Similar to the KF, the state mean vector and covariance matrix propagation equations are used to give prior estimations of the state and covariance matrix in MCKF. Unlike KF, a fixedpoint algorithm is, however, used in MCKF to update the posterior estimation. Based on a batch regression model, the MCKF is provided in the following:
4.2.1 MCKF Algorithm For a linear system, an augmented batch regression model can be established as: d(k) = W(k)x(k) + e(k),
(4.9)
.
xˆ (k|k − 1) I , .W(k) = −1 (k) , .e(k) = y(k) H(k) −1 (k)ν(k), and .(k) can be obtained by Cholesky decomposition of T .E ν(k)ν (k) with: where .d(k) = −1 (k)
E ν(k)ν T (k) =
.
=
P(k|k − 1) 0 0 R(k)
0 p (k|k − 1)Tp (k|k − 1) 0 r (k)Tr (k)
= (k)T (k), where: ν(k) =
.
− x(k) − xˆ (k|k − 1) . r(k)
(4.10)
92
4 Kalman Filtering Under Information Theoretic Criteria
Since .E e(k)eT (k) = I, the residual error .e(k) are white. An MCC-based cost function is then formulated by: 1 κσ (di (k) − wi (k)x(k)), L L
JMCKF (x(k)) =
.
(4.11)
i=1
where .di (k) is the i-th element of .d(k), .wi (k) is the i-th row of .W(k), and .L = n+m is the dimension of .d(k). Then, an optimal estimate of .x(k) is: xˆ (k) = arg max JMCKF (x(k)) = arg max
L
.
x(k)
x(k)
κσ (ei (k)) ,
(4.12)
i=1
where .ei (k) is the i-th element of .e(k): ei (k) = di (k) − wi (k)x(k).
.
(4.13)
The optimal solution can thus be obtained by setting: .
∂JMCKF (x(k)) = 0. ∂x(k)
(4.14)
It follows easily that: L −1 T κσ (ei (k)) wi (k)wi (k) x(k) = i=1
L T κσ (ei (k)) wi (k)di (k) . ×
.
(4.15)
i=1
Since .ei (k) = di (k) − wi (k)x(k), Eq. (4.15) is a fixed-point equation of .x(k) and can be rewritten as: x(k) = g (x(k)) ,
.
(4.16)
with: L −1 T κσ (di (k) − wi (k)x(k)) wi (k)wi (k) .g (x(k)) = i=1
L κσ (di (k) − wi (k)x(k)) wTi (k)di (k) . × i=1
(4.17)
4.2 Maximum Correntropy Kalman Filter (MCKF)
93
A fixed-point iterative algorithm can readily be obtained as: xˆ (k)τ = g xˆ (k)τ −1 ,
.
(4.18)
where .xˆ (k)τ denotes the updated solution at the fixed-point iteration .τ . Fixed-point equation (4.18) can also be expressed as: −1 xˆ (k) = WT (k)(k)W(k) WT (k)(k)D(k),
.
(4.19)
x (k) 0 where .(k) = , with .x (k) = diag{κσ (e1 (k)) , ..., κσ (en (k))}, 0 y (k) .y (k) = diag{κσ (en+1 (k)) , ..., κσ (en+m (k))}. Equation (4.19) can further be expressed as follows (see Appendix 4.A in Sect. 4.7.1 for a detailed derivation):
xˆ (k) = xˆ (k|k − 1) + K(k) y(k) − H(k)ˆx(k|k − 1) ,
(4.20)
.
−1 K(k) = P(k|k − 1)HT (k) H(k)P(k|k − 1)HT (k) + R(k) .
(4.21)
T P(k|k − 1) = p (k|k − 1)−1 x (k)p (k|k − 1).
(4.22)
T R(k) = r (k)−1 y (k)r (k).
(4.23)
.
where:
Sometimes, one can use .(k)+ I to replace .(k), where . is a small regularization parameter to avoid numerical problem in the matrix inversion. With the above derivations, we summarize the MCKF algorithm as Algorithm 1. Remark 4.1 Of course, Eq. (4.20) is also a fixed-point equation of .xˆ (k) because K(k) depends on .P(k|k − 1) and .R(k), both related to .xˆ (k) via .x (k) and .y (k), respectively. The optimal solution of Eq. (4.20) depends also on the prior estimate ˆ (k|k − 1), which can be calculated by (4.4) using the latest estimate .xˆ (k − 1). .x
.
Remark 4.2 Different from the traditional KF algorithm, the MCKF uses a fixedpoint algorithm to update the posterior estimate of the state. A small positive number .ε provides a stop condition (threshold) for the fixed-point iteration. Since the initial value of the fixed-point iteration is set at the prior estimate .xˆ (k|k − 1), the convergence to the optimal solution will be very fast (usually in several steps). The kernel size .σ is a key parameter in MCKF. In general, a smaller size makes the algorithm more robust (with respect to outliers) but converge more slowly. On the other hand, when .σ becomes larger and larger, the MCKF will behave more and more like the original KF algorithm. The following theorem holds:
94
4 Kalman Filtering Under Information Theoretic Criteria
Algorithm 1 Maximum correntropy Kalman filter (MCKF) [116] Step 1: Choose a proper kernel size σ and a small positive number ε; Set an initial estimate xˆ (0) and an initial covariance matrix P(0); let k = 1 Step 2: Use Eqs. (4.4) and (4.5) to obtain xˆ (k|k − 1) and P(k|k − 1); use Cholesky decomposition to obtain p (k|k − 1) and r (k) Step 3: Let τ = 1 and xˆ (k)0 = xˆ (k|k − 1), where xˆ (k)τ denotes the estimated state at the fixedpoint iteration τ Step 4: Use (4.24)–(4.30) to compute xˆ (k)τ xˆ (k)τ = xˆ (k|k − 1) + K(k) y(k) − H(k)ˆx(k|k − 1)
(4.24)
−1 K(k) = P(k|k − 1)HT (k) H(k) P(k|k − 1)HT (k) + R(k)
(4.25)
T −1 P(k|k − 1) = p (k|k − 1) x (k)p (k|k − 1)
(4.26)
with:
R(k) =
T −1 r (k) y (k)r (k)
(4.27)
x (k) = diag{κσ ( e1 (k)) , ..., κσ ( en (k))}
(4.28)
y (k) = diag{κσ ( en+1 (k)) , ..., Gn+m ( en+m (k))}
(4.29)
ei (k) = di (k) − wi (k)ˆx(k)τ −1 .
(4.30)
Step 5: Compare the estimation of the current step and the estimation of the last step. If (4.31) holds, set xˆ (k) = xˆ (k)τ and continue to 6. Otherwise, τ + 1 → τ , and go back to Step 4 xˆ (k)τ − xˆ (k)τ −1 ≤ε xˆ (k)τ −1
(4.31)
Step 6: Update the posterior covariance matrix by (4.32), k + 1 → k and go back to Step 2 T P(k|k − 1) I − K(k)H(k) P(k) = I − K(k)H(k) T (k). + K(k)R(k) K
(4.32)
Theorem 4.1 When the kernel size .σ → ∞, the MCKF will reduce to the original KF algorithm. Proof See Appendix 4.B in Sect. 4.7.2.
4.2.2 Computational Complexity The computational complexity is analyzed in terms of the floating point operations for the MCKF algorithm. The computational complexities of some basic equations are given in Table 4.1.
4.2 Maximum Correntropy Kalman Filter (MCKF)
95
Table 4.1 Computational complexities of some equations Equation (4.4) (4.5) (4.6) (4.7) (4.8) (4.24) (4.25) (4.26) (4.27) (4.28) (4.29) (4.30) (4.32)
Multiplication and addition/subtraction 2 .2n − n 3 2 .4n − n 2 2 .4n m + 4nm − 3nm 4nm 3 2 2 2 .4n + 6n m − 2n + 2nm − nm 4nm 2 2 .4n m + 4nm − 3nm 3 .2n 3 .2m 2 .2n 2nm 2n 3 2 2 2 .4n + 6n m − 2n + 2nm − nm
Division, matrix inversion, Cholesky decomposition, and exponentiation 0 0 3 .O(m ) 0 0 0 3 .O(m ) 3 .n + O(n ) 3 .m + O(m ) n m 0 0
The traditional KF algorithm involves equations (4.4)–(4.8). From Table 4.1, one can conclude that the computational complexity of KF is: SKF = 8n3 + 10n2 m − n2 + 6nm2 − n + O(m3 ),
.
(4.33)
where .O(m3 ) denotes the order of computational complexity with .m3 . The MCKF algorithm mainly involves equations (4.4), (4.5), (4.24)–(4.30), and x (k) and . y (k) are diagonal matrices, it is very easy to compute (4.32). Since . their inverse matrices. Assume that the average fixed-point iteration number is T . Then, according to Table 4.1, the computational complexity of the MCKF is: SMCKF =(2T + 8)n3 + (6 + 4T )T n2 m + (2T − 1)n2
.
+ (4T + 2)nm2 + (3T − 1)nm + (4T − 1)n + 2T m3 + 2T m + T O(n3 ) + 2T O(m3 ).
(4.34)
The fixed-point iteration number T is relatively small in general. Thus, the computational complexity of the MCKF is moderate compared with the traditional KF algorithm.
4.2.3 Convergence Issue The rigorous convergence analysis of the MCKF algorithm is very complicated. In the following, we present only a sufficient condition that guarantees the convergence
96
4 Kalman Filtering Under Information Theoretic Criteria
of the fixed-point iterations in MCKF. The result is a direct consequence of the convergence analysis for the fixed-point MCC in Chap. 3 and hence will not be proved here. According to the convergence analysis results of the fixed-point MCC in Chap. 3, the following theorem holds: L √ n wTi (k)1 |di (k)|
L
and Theorem 4.2 If .βMCKF > ζMCKF = λmin i=1 ∗ † .σ ≥ max σ , σ , in which .σ ∗ is the solution of the equation .φ(σ ) = βMCKF , where: i=1
L √ wT (k) |di (k)| n i 1
φ(σ ) =
.
λmin
i=1
L
i=1
wTi (k)wi (k)
, σ ∈ (0, ∞)
κσ (βMCKF wi (k) 1 + |di (k)|) wTi (k)wi (k) (4.35)
and .σ † is the solution of the equation .ψ (σ ) = αMCKF , (0 < αMCKF < 1), with
ψ(σ ) = .
L
√ n (βMCKF wi (k) 1 +|di (k)|) wi (k) 1 βMCKF wTi (k)wi (k)1 +wTi (k)di (k)1 i=1
, L σ 2 λmin κσ (βMCKF wi (k) 1 +|di (k)|)wTi (k)wi (k) i=1
σ ∈ (0, ∞) , (4.36) then it holds that . g (x(k)) 1 ≤ βMCKF and .∇x(k) f (x(k))1 ≤ αMCKF for all n .x(k) ∈ {x(k) ∈ R : x(k) 1 ≤ βMCKF }, where .∇x(k) g (x(k)) denotes the .n × n Jacobian matrix of .g (x(k)) with respect to .x(k), that is:
∂ ∂ g (x(k)) . . . g (x(k)) .∇x(k) g (x(k)) = ∂x1 (k) ∂xn (k)
(4.37)
with: ∂ g (x(k)) ∂xj (k) .
=
L 1 j T ei (k)wi (k)κσ (ei (k)) wi (k)wi (k) g (x(k)) σ2
−N−1 ww + N−1 ww
i=1
L 1 j T ei (k)wi (k)κσ (ei (k)) wi (k)di (k) , σ2 i=1
where .Nww =
L i=1
j
κσ (ei (k)) wTi (k)wi (k) and .wi (k) is the j -th element of .wi (k).
4.2 Maximum Correntropy Kalman Filter (MCKF)
97
By Theorem 4.2, if the kernel size .σ is larger than a certain value, we have:
g (x(k)) 1 βMCKF ∇x(k) g (x(k)) αMCKF < 1. 1
.
(4.38)
In general, in a fixed-point algorithm, the initial state .x(k)0 is set to predictive state xˆ (k|k − 1). By Banach fixed-point theorem, given an initial state estimate satisfying ˆ (k)0 1 βMCKF , the fixed-point iteration algorithm in MCKF will surely con.x verge to a unique fixed point in the range .x(k) ∈ {x(k) ∈ Rn : x(k) 1 βMCKF } provided that the kernel size .σ is larger than a certain value (e.g., .max σ ∗ , σ † ). Theorem 4.2 implies that the kernel size has significant influence on the convergence behavior of MCKF. If the kernel size .σ is too small, the algorithm will diverge or converge very slowly. A larger kernel size ensures a faster converge speed but usually leads to a poor performance in impulsive noises. In practical applications, the kernel size can be set manually by trial-and-error methods. .
4.2.4 Illustrative Example In this part, we present an illustrative example to demonstrate the performance of the MCKF algorithm and compare it with the traditional KF algorithm. The performance index is defined by the mean square error (MSE): MSE =
.
lmc 1
lmc
MSE2 (l),
(4.39)
l=1
ˆ (k)||22 , .xˆ (k) is the estimate of .x(k), and .lmc where .MSE2 (l) = N1 N k=1 ||x(k) − x and .N are the number of Monte Carlo runs and number of samples. Consider the following linear system: .
x1 (k) cos(θ ) − sin(θ ) x1 (k − 1) = + q(k − 1), x2 (k) x2 (k − 1) sin(θ ) cos(θ )
x1 (k) + r(k), y(k) = 1 1 x2 (k)
.
(4.40)
(4.41)
where .θ = π/18. First, we consider the case in which both noises are Gaussian distributions, that is: q(k − 1) ∼ N (0, 0.01) .
r(k) ∼ N (0, 0.01),
98
4 Kalman Filtering Under Information Theoretic Criteria
Table 4.2 MSEs of .x1 and .x2 in Gaussian noises (From [116]) Filter KF MCKF. σ MCKF. σ MCKF. σ MCKF. σ MCKF. σ
= 0.5, ε = 1.0, ε = 3.0, ε = 5.0, ε = 5.0, ε
= 10−6 = 10−6 = 10−6 = 10−6 = 10−6
MSE of .x1 0.035778 0.131361 0.103497 0.035885 0.035785 0.035784
MSE of .x2 0.030052 0.105729 0.096126 0.030139 0.030047 0.030051
where .N (0, 0.01) denotes the Gaussian distribution with zero mean and variance 0.01. Table 4.2 shows the MSEs of .x1 and .x2 for different filters. Here, the MSE is computed as an average over 100 independent Monte Carlo runs, and in each run, 1000 samples (time steps) are used to evaluate the MSE. Since the both noises are Gaussian, the original Kalman filter performs very well, and in this example, it achieves almost the best performance (i.e., the smallest MSEs). One can also see that when the kernel size is too small, the MCKF may achieve a worse performance, while when the size becomes larger, its performance will approach that of the KF. Actually, it has been proved that when .σ → ∞, the MCKF will reduce to the traditional KF. In general, one should choose a larger kernel size under Gaussian noises. Second, we consider the case in which the process noise is still Gaussian but the observation noise is a heavy-tailed (impulsive) non-Gaussian noise, with a mixedGaussian distribution, that is:
.
q(k − 1) ∼ N (0, 0.01) .
r(k) ∼ 0.9N (0, 0.01) + 0.1N (0, 100).
Figures 4.1 and 4.2 illustrate the probability densities of the estimation errors of x1 and .x2 . In the simulation, we set .ε = 10−6 . As one can see, in impulsive noises, when kernel size is too small or too large, the performance of MCKF will be not good. In this case, however, with a proper kernel size (say .σ = 2.0), the MCKF can outperform the KF significantly, achieving a desirable error distribution with a higher peak and smaller dispersion. Again, when .σ is very large, MCKF achieves almost the same performance as the KF. Figure 4.3 shows the fixed-point iteration numbers at the time step (or instant) .k = 1 for different kernel sizes. It is evident that the larger the kernel size, the faster the convergence speed. In particular, when the kernel size is large enough, the fixed-point algorithm in MCKF will converge to the optimal solution in just one or two iterations. In practical applications, to avoid slow convergence, the kernel size cannot be set at a very small value. Similar results can also be seen from Table 4.3, where the average fixed-point iteration numbers of every time step for different filters are shown, which are computed as averages over 100 independent Monte Carlo runs, with each run containing 1000 time steps. .
4.2 Maximum Correntropy Kalman Filter (MCKF)
99
Fig. 4.1 Probability densities of .x1 estimation errors with different filters (From [116])
Fig. 4.2 Probability densities of .x2 estimation errors with different filters (From [116])
Fig. 4.3 Fixed-point iteration numbers at time step .k = 1 for different kernel sizes
We further investigate the influence of the threshold .ε on the performance. Table 4.4 illustrates the MSEs of .x1 and .x2 with different .ε (where the kernel size is set at .σ = 2.0), and Table 4.5 presents the average fixed-point iteration numbers. One can see that a smaller .ε usually results in slightly lower MSEs but needs more iterations to converge. Obviously, the influence of .ε is not significant compared with the kernel size .σ .
100
4 Kalman Filtering Under Information Theoretic Criteria
Table 4.3 Average iteration numbers for every time step with different .σ (From [116]) Filter MCKF. σ MCKF. σ MCKF. σ MCKF. σ MCKF. σ MCKF. σ
= 0.2, ε = 10−6 = 0.5, ε = 10−6 = 1.0, ε = 10−6 = 2.0, ε = 10−6 = 3.0, ε = 10−6 = 10, ε = 10−6
Average iteration numbers 3.89826 2.78835 2.36406 2.12967 2.01343 1.66423
Table 4.4 MSEs of .x1 and .x2 with different .ε (From [116]) Filter MCKF. σ MCKF. σ MCKF. σ MCKF. σ MCKF. σ
= 2.0, ε = 2.0, ε = 2.0, ε = 2.0, ε = 2.0, ε
= 10−1 −2 = 10 = 10−4 −6 = 10 = 10−8
MSE of .x1 0.221182 0.220386 0.220326 0.220322 0.220322
MSE of .x2 0.168721 0.167958 0.167900 0.167899 0.167899
Table 4.5 Average iteration numbers for every time step with different .ε (From [116]) Filter MCKF. σ MCKF. σ MCKF. σ MCKF. σ MCKF. σ
= 2.0, ε = 2.0, ε = 2.0, ε = 2.0, ε = 2.0, ε
= 10−1 = 10−2 = 10−4 = 10−6 = 10−8
Average iteration numbers 1.03884 1.09134 1.44313 2.13328 2.78392
Fig. 4.4 True and estimated values of .x1
Figures 4.4 and 4.5 show the true and the estimated values of .x1 (k) and .x2 (k) with KF and MCKF (.σ = 2.0, ε = 10−6 ). The results clearly indicate that the MCKF can achieve much better tracking performance than the KF algorithm.
4.3 Other Approaches for MCC-Based Kalman Filtering
101
Fig. 4.5 True and estimated values of .x2
4.3 Other Approaches for MCC-Based Kalman Filtering 4.3.1 Correntropy Filter (C-Filter) Besides the above MCKF, the MCC has also been used in state estimation, leading to the correntropy filter (C-Filter) [124]. In the C-Filter, the cost function is established by: JC−Filter (x(k)) = κσ ( y(k) − H(k)x(k) ) + κσ ( x(k) − F(k)x(k − 1) ) . (4.42)
.
By taking the derivative of .JC−Filter (x(k)) about .x(k) to .0, the formulation of fixedpoint C-Filter can be achieved as: κσ y(k) − H(k)ˆx(k) HT (k)(y(k) − H(k)ˆx(k)), ˆ (k) = F(k)ˆx(k − 1) + .x κσ ˆx(k) − F(k)ˆx(k − 1) (4.43) where .xˆ (k) in the right side of Eq. (4.43) can be approximated by .xˆ (k|k − 1) with xˆ (k|k − 1) = F(k)ˆ x (k − 1). x (k) − F(k)ˆx(k − 1) ≈ Therefore, the term .κσ ˆ κσ ˆx(k|k − 1) − F(k)ˆx(k − 1) = κσ ( 0 ), which simplifies (4.43) to:
.
xˆ (k) = F(k)ˆx(k − 1) + K(k)(y(k) − H(k)ˆx(k|k − 1)) y(k) − H(k)ˆx(k|k − 1) 2 HT (k). K(k) = exp − 2σ 2 .
(4.44)
Here, the kernel size can be chosen by an adaptive value .σ (k) = y(k)−H(k)ˆx(k) . Note that the C-Filter involves no covariance propagation process.
102
4 Kalman Filtering Under Information Theoretic Criteria
4.3.2 Modified Correntropy Filter (MC-Filter) Considering that the C-Filter is derived by approximating .xˆ (k) ≈ xˆ (k|k − 1), the modified C-Filter (MC-Filter) [128] is derived by only using this approximation in the Gaussian kernel function rather than the innovation term in (4.43), yielding: xˆ (k) + κσ y(k) − H(k)ˆx(k|k − 1) HT (k)H(k)ˆx(k) = F(k)ˆx(k − 1) + κσ ˆx(k) − F(k)ˆx(k|k − 1) HT (k)y(k).
.
(4.45)
Then, the state update equation can be achieved as: xˆ (k) =ˆx(k|k − 1) + K(k)(y(k) − H(k)ˆx(k|k − 1)) K(k) =[I + κσ y(k) − H(k)ˆx(k|k − 1) )HT (k)H(k)]−1 × κσ ˆx(k) − F(k)ˆx(k − 1) HT (k). .
(4.46)
The covariance propagation equation is similar as that in KF: P(k) = [I − K(k)H(k)]P(k|k − 1)[I − K(k)H(k)]T
.
+ K(k)R(k)KT (k).
(4.47)
4.3.3 Maximum Correntropy Criterion Kalman Filter (MCC-KF) Inspired by the WLS method in deriving the standard KF in Chap. 2, the MCC can be applied in objective function (2.73), generating the objective function of maximum correntropy criterion Kalman filter (MCC-KF) [128]: JMCC−KF (x(k)) = κσ y(k) − H(k)x(k) R−1 (k) + κσ x(k) − F(k)x(k − 1) P−1 (k|k−1) ,
.
(4.48)
where . y(k)−H(k)x(k) R−1 (k) = (y(k)−H(k)x(k))T R−1 (k)(y(k)−H(k)x(k)) and T −1 (k|k − 1)(x(k) − . x(k) − F(k)x(k − 1) P−1 (k|k−1) = (x(k) − F(k − 1)x(k − 1)) P F(k − 1)x(k − 1)). After maximizing this objective function, the state estimate is: xˆ (k) =ˆx(k|k − 1) + K(k)(y(k) − H(k)ˆx(k|k − 1))
−1 κσ y(k) − H(k)ˆx(k|k − 1) −1 T −1 H (k)R (k)H(k) K(k) = P (k|k − 1) + κσ ˆx(k|k − 1) − F(k)ˆx(k − 1) .
4.3 Other Approaches for MCC-Based Kalman Filtering
103
κσ y(k) − H(k)ˆx(k|k − 1) HT (k)R−1 (k). × κσ ˆx(k|k − 1) − F(k)ˆx(k − 1)
(4.49)
Again, the covariance propagation equation can be borrowed from the standard KF algorithm: P(k) = [I − K(k)H(k)]P(k|k − 1)[I − K(k)H(k)]T
.
+ K(k)R(k)KT (k).
(4.50)
4.3.4 Measurement-Specific Correntropy Filter (MSCF) Different from MCC-KF, the measurement-specific correntropy filter (MSCF) [144] is introduced by modifying the objective function (4.48) as: m 1 κσ yj (k) − Hj (k)x(k) R−1 (k) .JMSCF (x(k)) = jj m j =1
+ κσ x(k) − x(k|k − 1) P−1 (k|k−1) ,
(4.51)
where .yj (k) is the j -th element of .y(k), .Hj (k) is the j -th row of matrix .H(k), and Rjj (k) is the j -th element on the diagonal of .R(k). Note that the elements in the innovation are weighted separately. Setting the derivative of .JMSCF (x(k)) about .xˆ (k) to .0, the state estimate can be expressed by:
.
xˆ (k) =ˆx(k|k − 1) + K(k)(y(k) − H(k)ˆx(k|k − 1)).
.
(4.52)
K(k) =[x (k)P−1 (k|k − 1) + HT (k)y (k)R−1 (k)H(k)]−1 ]−1 × HT (k)y (k)R−1 (k) x (k) =κσ ˆx(k) − xˆ (k|k − 1) P−1 (k|k−1) . y (k) =diag κσ y1 (k) − H1 (k)ˆx(k) R−1 (k) , ..., κσ ( ym (k) . −Hm (k)ˆx(k) R−1 mm (k)
(4.53)
11
(4.54)
A large outlier occurred in the j -th measurement will be suppressed by the j -th kernel function in the diagonal matrix .y (k). In fact, if the innovation term of the j -th measurement is very large, the value of the kernel function κσ yj (k) − Hj (k)ˆx(k) R−1 (k) will be very small.
.
jj
104
4 Kalman Filtering Under Information Theoretic Criteria
4.4 Generalized Maximum Correntropy Kalman Filter (GMCKF) The GMCKF is derived by using the generalized maximum correntropy criterion (GMCC) to deal with optimal filtering problem under various types of non-Gaussian noise. As an extension of Gaussian density function, the generalized Gaussian density (GGD) function is utilized as the kernel function in GMCC, taking the form of [137]: α α e e α exp − = γα,β exp − , .κα,β (e) = β β 2β 1 α
(4.55)
where e is the error, .α > 0 is the shape parameter describing the exponential rate of decay, .β > 0 is the scale parameter which models the dispersion of the distribution, . (.) denotes the Gamma function, and .γα,β = α/(2β (1/α)) is the normalization constant. As shown in Fig. 4.6, the GGD function reduces to Gaussian distribution function when .α = 2, and to Laplace distribution function when .α = 1, and converges to uniform distribution function when .α → ∞. When GGD function is used as the kernel function of correntropy, we obtain the generalized correntropy, defined by [117]: V (X, Y ) = E[κα,β (X − Y )],
(4.56)
.
which can easily be estimated from the sample data: 0.3
=1 (Laplace) =2 (Gaussian) =10 (Uniform)
value of GGD
0.25 0.2
0.15 0.1
0.05 0 -6
-4
-2
0
e
2
Fig. 4.6 GGD curves with .β = 2 and different .α (From [117])
4
6
4.4 Generalized Maximum Correntropy Kalman Filter (GMCKF) N 1 κα,β (e(i)) , Vˆα,β (X, Y ) = N
.
105
(4.57)
i=1
where .e(i) = x(i) − y(i) and .{x(i), y(i)}N i=1 are the N samples drawn from the joint distribution .FXY .
4.4.1 GMCKF Algorithm Based on the generalized correntropy (4.57) and the batch regression model in (4.9), the cost function in GMCKF is: JGMCKF (x (k)) =
L
.
κ α,β (di (k) − wi (k) x (k)) ,
(4.58)
i=1
where .di (k) denotes the i-th element of .d(k), .wi (k) denotes the i-th row of .W(k), and .L = n + m denotes the dimension of .d(k). Under the GMCC, the optimal estimate of the state .x(k) can be derived by maximizing .JGMCKF (x(k)): xˆ (k) = arg max JGMCKF (x(k)) = arg max
N
.
x(k)
x(k)
κα,β (ei (k)),
(4.59)
i=1
where .ei (k) denotes the i-th element of .e(k) and: ei (k) = di (k) − wi (k)x(k).
.
(4.60)
The optimal solution can be achieved by setting: .
∂JGMCKF (x(k)) = 0. ∂x(k)
(4.61)
Substituting (4.58) into (4.61), one can obtain the following equation:
.
N κα,β (ei (k))wTi (k)|ei (k)|α−1 sgn (ei (k)) = 0.
(4.62)
i=1
Let: ϑ(ei (k)) = κα,β (ei (k))|ei (k)|α−2 .
.
(4.63)
106
4 Kalman Filtering Under Information Theoretic Criteria
Then, (4.62) can be represented as: .
N ϑ(ei (k))wTi (k) (di (k) − wi (k)x(k)) = 0.
(4.64)
i=1
From (4.64), we derive: N N T −1 T x(k) = [ϑ(ei (k))wi (k)wi (k)] [ϑ(ei (k))wi (k)di (k)] .
i=1
i=1
= (WT (k)(k)W(k))
−1
(4.65)
WT (k)(k)D(k),
where:
x (k) 0 .(k) = 0 y (k)
(4.66)
with: x (k) = diag{ϑ(e1 (k)), ..., ϑ(en (k))}
(4.67)
y (k) = diag{ϑ(en+1 (k)), ..., ϑ(en+m (k))}.
(4.68)
.
.
Thus, similar to the MCKF, we obtain: ˆ (k) = xˆ (k |k − 1 ) + K(k) y(k) − H(k)ˆx (k |k − 1 ) , .x
(4.69)
where: −1 K (k) = P (k |k − 1 ) HT (k) H(k)P (k |k − 1 ) HT (k) + R (k)
.
(4.70)
T P (k |k − 1 ) = p (k) −1 x (k) p (k)
(4.71)
T R (k) = r (k) −1 y (k) r (k) .
(4.72)
.
.
Usually, one can use .(k) + I to replace .(k), where . is a small regularization parameter to avoid numerical problem in the matrix inversion. Finally, update the posterior covariance matrix by: T
P(k) =(I − K(k)H(k))P(k|k − 1)(I − K(k)H(k)) .
T
(4.73)
+ K(k)R(k)K (k). Updated Eq. (4.69) is in fact a fixed-point equation and can be solved by the following fixed-point iterative algorithm: xˆ (k)τ = g(ˆx(k)τ −1 ),
.
(4.74)
4.4 Generalized Maximum Correntropy Kalman Filter (GMCKF)
107
where: g(ˆx(k)τ −1 ) = xˆ (k|k − 1) + K(k) y(k) − H(k)ˆx (k|k − 1) .
.
(4.75)
xˆ (k)τ −ˆx(k)τ −1 ≤ ε with .ε denoting a threshold parameter, the fixed-point xˆ (k)τ −1 iteration is stopped, and we set .xˆ (k) = xˆ (k)τ . The pseudocode of GMCKF is shown in Algorithm 2.
When .
Algorithm 2 Generalized maximum correntropy Kalman filter (GMCKF) [117] Step 1: Choose a proper shape parameter α, kernel size σ , threshold for fixed-point iterative algorithm ε, initial state estimate xˆ (0), and initial covariance matrix P(0); let k = 1 Step 2: Use Eqs. (4.4)–(4.5) to obtain xˆ (k|k − 1); calculate (k) using − 1) and P(k|k Cholesky xˆ (k|k − 1) I −1 −1 ; calculate W (k) = (k) decomposition; calculate d (k) = (k) y (k) H(k) Step 3: Let τ = 1 and xˆ (k)0 = xˆ (k|k − 1), where xˆ (k)τ denotes the estimated state at the fixedpoint iteration τ ; Step 4: Use (4.76)–(4.82) to compute xˆ (k)τ : xˆ (k)τ = xˆ (k|k − 1) + K(k) y(k) − H(k)ˆx(k|k − 1)
(4.76)
−1 K(k) = P(k|k − 1)HT (k) H(k) P(k|k − 1)HT (k) + R(k)
(4.77)
T −1 P(k|k − 1) = p (k|k − 1) x (k)p (k|k − 1)
(4.78)
with
T −1 r (k) y (k)r (k)
(4.79)
x (k) = diag{ϑ(e1 (k)), ..., ϑ(en (k))}
(4.80)
y (k) = diag{ϑ(en+1 (k)), ..., ϑ(en+m (k))}
(4.81)
ei (k) = di (k) − wi (k)ˆx(k)τ −1
(4.82)
R(k) =
ϑ( ei (k)) = κα,β ( ei (k))| ei (k)|
α−2
.
(4.83)
Step 5: Compare the estimation of the current step and the estimation of the last step. If (4.84) holds, set xˆ (k) = xˆ (k)τ and continue to Step 6. Otherwise, τ + 1 → τ , and go back to (4.76). xˆ (k)τ − xˆ (k)τ −1 ≤ε xˆ (k)τ −1
(4.84)
Step 6: Update the posterior covariance matrix by (4.85), k + 1 → k and go back to Step 2: T P(k|k − 1) I − K(k)H(k) P(k) = I − K(k)H(k) T (k). + K(k)R(k) K
(4.85)
108
4 Kalman Filtering Under Information Theoretic Criteria
Remark 4.3 Similar to MCKF, the GMCKF updates the posterior state estimate using a fixed-point iteration algorithm. The initial value of the fixed-point iteration algorithm is obtained from prior estimation, and generally this leads to a fast convergence rate. The shape parameter .α and the scale parameter .β have important impacts on the performance of GMCKF. GMCKF involves only one extra parameter compared with MCKF.
4.4.2 Computational Complexity In the following, the computational complexity of GMCKF is analyzed based on the floating point operations. Table 4.6 gives the computational complexities of some basic equations. Suppose the average number of fixed-point iteration is T . According to Table 4.6, the computational complexity of GMCKF can be evaluated by: SGMCKF = (2T + 8) n3 + (4T + 6) n2 m + (2T − 1) n2 + (4T + 2) nm2 .
+ (3T − 1) nm + (5T − 1) n + 2T m3 + 3T m + 2T O n3 + 2T O m3 .
(4.86)
The fixed-point iteration number T is relatively small in general, so the computational complexity of GMCKF is moderate.
Table 4.6 Computational complexities of some basic equations Equation (4.60) (4.63) (4.67) (4.68) (4.69) (4.70) (4.71) (4.72) (4.73)
Multiplication and addition/subtraction 2n .n + m 2 .2n 2nm 4nm 2 2 .4n m + 4nm − 3nm 3 .2n 3 .2m 3 2 2 2 .4n + 6n m − 2n + 2nm − nm
Division, matrix inversion, Cholesky decomposition, and exponentiation 0 3 .O n n m 0 3 .O m 3 .n + O n 3 .m + O m 0
4.4 Generalized Maximum Correntropy Kalman Filter (GMCKF)
109
4.4.3 Parameter Selection Compared with the MCKF algorithm, the GMCKF involves an extra free parameter α. This section will present an approach to determine the value of .α, which is developed based on the estimation of the GGD parameters. There are many methods to estimate the parameters of a GGD distribution [145– 147]. Here, we only introduce the moment-matching estimators (MMEs), in which the assumed distribution can simply be matched by the underlying moments of the data set [148]. For a GGD distribution, the r-order absolute central moment is calculated as: +∞ r+1 α r , |x − μ|r κα,β (x)dx = β r = (4.87) . mr = E |x − μ| 1 α −∞
.
where .μ is the mean of variable .x, .κα,β (x) is the assumed GGD distribution, and .α and .β are the shape and scale parameters, respectively. The shape parameter .α can be expressed in terms of the kurtosis of the GGD distribution [149]. The reciprocal of the square root of a GGD distribution can be computed by: α3 m2 .K (α) = √ (4.88) = , m4 5 1 α α where .m2 denotes the standard deviation and .m4 is the fourth-order central moment of the GGD distribution. Therefore, an estimate of .α could be: m ˆ2 −1 (4.89) .α ˆ =K , m ˆ4 where .K −1 (·) denotes the inverse function of .K (·). Given random samples .x = {x1 , x2 , ..., xN }, we have: ⎧ N ⎪ 1 ⎪ ⎪ μ ˆ = xi ⎪ N ⎪ ⎪ i=1 ⎪ ⎨ N 2 . xi − μˆ m ˆ 2 = N1 ⎪ ⎪ i=1 ⎪ ⎪ N ⎪ 4 ⎪ 1 ⎪ xi − μˆ . ˆ4 = N ⎩m
(4.90)
i=1
To estimate the shape parameter .α, the inverse function .K −1 (·) needs to be solved. Since the inverse function has no explicit forms, cubic Hermite interpolation method is often used in practical applications. Due to the adverse effect of outliers
110
4 Kalman Filtering Under Information Theoretic Criteria
on parameter estimation, data far from the mean .μ can be removed to enhance the estimation precision. In addition, an upper bound of .αˆ can be set at a certain value (e.g., 10) to avoid very large gradient. The scale parameter .β (or kernel size .σ for Gaussian case) is usually set manually or optimized by trial-and-error methods, just like what has been done in the MCKF algorithm.
4.4.4 Illustrative Example Consider a dynamic system with state equation (4.40) and measurement equation (4.41). To validate the filtering performance in non-Gaussian noise, the observation noise model is set to: r(k) = (1 − a (k)) A (k) + a (k) B (k) ,
.
(4.91)
where .a (k) is a binary independent and identically distributed process with: .
Pr {a (k) = 1} = c, Pr {a (k) = 0} = 1 − c,
(4.92)
where .0 ≤ c ≤ 1 is an occurrence probability. In (4.91), .A (k) denotes a noise process with smaller variance which occupies the majority, and .B (k) denotes another noise process with larger variance to generate the outliers (or impulsive disturbances). The noise processes .A (k) and .B (k) are mutually independent, and both are independent of .a (k). In the following simulations, c is set at .0.1, and .B (k) obeys the Gaussian distribution with zero mean and variance 100. For the distribution of .A (k), we consider three different distributions: (a) .N (0, 0.01): Gaussian distribution with zero mean and variance 0.01 (b) .L(0, 1.0): Laplace distribution with position parameter zero and scale parameter .1.0 (c) .U (−1.0, 1.0): Uniform distribution over .[−1.0, 1.0] The performances of KF, MCKF, measurement-specific correntropy filter (MSCF) [144], robust student’s t-based Kalman filter (RSTKF) [74], and GMCKF algorithm are tested in the simulations. The scale parameter in GMCKF (or kernel size in MCKF) is manually set to guarantee a desirable performance. The threshold for the fixed-point iteration in MCKF and GMCKF is set to .ε = 10−6 . The kernel size in MSCF is experimentally set to .σ = 15, and the parameters in NRSTF are .ω = υ = τ = 5, N = 10. The mean square error (MSE) and PDFs of .x1 and .x2 are used to evaluate the performance of different algorithms. The MSEs are computed by averaging over 100 independent Monte Carlo simulations, and each simulation contains 1000 steps.
4.4 Generalized Maximum Correntropy Kalman Filter (GMCKF) Table 4.7 Selection of the shape parameter .α (From [117])
111 Value of .αˆ 1.93 1.016 10
Type of noise process .A(k) .N (0, 0.01) .L(0, 1.0) .U (−1.0, 1.0)
Fig. 4.7 PDFs of the estimation errors of .x1 and .x2 with different .α when .A(k) ∼ N (0, 0.01) (From [117])
0.04
KF MCKF NRSTF MSCF GMCKF( GMCKF( GMCKF( GMCKF( GMCKF( GMCKF(
0.035 0.03
pdf
0.025 0.02
=0.5) =1) =2) =4) =6) =1.93)
0.015 0.01 0.005 0 -1
-0.5
0
x
0.5
0.045
KF MCKF NRSTF MSCF GMCKF( GMCKF( GMCKF( GMCKF( GMCKF( GMCKF(
0.04 0.035 0.03
pdf
1
1
0.025 0.02
=0.5) =1) =2) =4) =6) =1.93)
0.015 0.01 0.005 0 -1
-0.5
0
x
0.5
1
2
The initial states are set at .x1 (0) = x2 (0) = 0. Table 4.7 gives the estimated shape parameter .αˆ under different distributions of .A(k). As one can see, when .A(k) ∼ N (0, 0.01), the shape parameter is close to .2.0, and when .A(k) ∼ L(0, 1.0), the shape parameter is close to .1.0. The upper bound of .αˆ is set to 10 in the simulations, so for the case of third noise, the estimated value of .α is 10. Figure 4.7 and Table 4.8 demonstrate the performance of different filters when .A(k) obeys the Gaussian distribution .N (0, 0.01), where Fig. 4.7 shows the PDFs of the state estimation errors of .x1 and .x2 , and Table 4.8 illustrates the MSEs of .x1 and .x2 . As one can see, the performance of MCKF, NRSTF, MSCF, and GMCKF
112
4 Kalman Filtering Under Information Theoretic Criteria
Table 4.8 MSEs of .x1 and with different .α when .A(k) ∼ N (0, 0.01) (From [117])
Filter KF MCKF NRSTF MSCF GMCKF(.α = 1.93)
.x2
Fig. 4.8 PDFs of the estimation errors of .x1 and .x2 with different .α when .A(k) ∼ L(0, 1.0) (From [117])
MSE of .x1 0.5038 0.0467 0.0463 0.1660 0.0467
KF MCKF NRSTF MSCF GMCKF( GMCKF( GMCKF( GMCKF( GMCKF( GMCKF(
0.015
0.0125
0.01
pdf
MSE of .x2 0.4684 0.0356 0.0329 0.1125 0.0356
0.0075
=0.5) =1) =2) =4) =6) =1.016)
0.005
0.0025
0 -2
-1
0
1
x
2
1
0.02
KF MCKF NRSTF MSCF GMCKF( GMCKF( GMCKF( GMCKF( GMCKF( GMCKF(
0.018 0.016 0.014
pdf
0.012 0.01
=0.5) =1) =2) =4) =6) =1.016)
0.008 0.006 0.004 0.002 0 -2.5
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
2.5
x2
can perform much better than that of the KF. In this case, the estimated shape parameter in GMCKF is around .1.93, with which the GMCKF achieves almost the same performance as MCKF. Figure 4.8 and Table 4.9 show the performance of different filters when .A(k) ∼ L(0, 1), from which we can see that with the estimated shape parameter (.α = 1.016) the GMCKF outperforms all other filters.
4.4 Generalized Maximum Correntropy Kalman Filter (GMCKF) Table 4.9 MSEs of .x1 and with different .α when .A(k) ∼ L(0, 1.0)
113
Filter KF MCKF NRSTF MSCF GMCKF(.α = 1.016)
.x2
Fig. 4.9 PDFs of the estimation errors of .x1 and .x2 with different .α when .A(k) ∼ U (−1.0, 1.0) (From [117])
MSE of .x1 0.6825 0.3375 0.3159 0.5227 0.2923
MSE of .x2 0.5469 0.2032 0.1983 0.3481 0.1835 KF MCKF NRSTF MSCF GMCKF( GMCKF( GMCKF( GMCKF( GMCKF( GMCKF(
0.025
0.02
pdf
0.015
=1) =2) =4) =6) =10) =100)
0.01
0.005
0 -1
-0.5
0
0.5
1
x1 0.035
KF MCKF NRSTF MSCF GMCKF( GMCKF( GMCKF( GMCKF( GMCKF( GMCKF(
0.03
0.025
pdf
0.02
0.015
=1) =2) =4) =6) =10) =100)
0.01
0.005
0 -1
-0.5
0
0.5
1
x2
Figure 4.9 and Table 4.10 show the performance of different filters when .A(k) ∼ U(−1.0, 1.0). Again, the GMCKF can outperform all other filters with a proper shape parameter (say, .α = 10 in this example).
114
4 Kalman Filtering Under Information Theoretic Criteria
Table 4.10 MSEs of .x1 and with different .α when .A(k) ∼ U (−1.0, 1.0) (From [117])
Filter KF MCKF NRSTF MSCF GMCKF(.α = 10)
.x2
MSE of .x1 0.4786 0.1225 0.1259 0.2949 0.1031
MSE of .x2 0.3783 0.0703 0.0691 0.2244 0.0589
4.5 Minimum Error Entropy Kalman Filter (MEE-KF) Although MCKF and GMCKF have the advantage in improving the estimation accuracy under heavy-tailed noise, both may result in performance degradation when faced with more complicated noise such as those of multimodal distributions. To address this issue, the MEE criterion is developed as an information theoretic alternative to traditional MMSE criterion in adaptive system training. The error’s entropy measures the average uncertainty contained in the error signal, whose minimization forces the error samples to concentrate. This motivates us to develop an MEE-based KF algorithm (MEE-KF), in which the PDF of error is estimated by kernel density estimation (KDE), resulting in a flexible functional form to approximate a large number of different signals, comprising a wide range of statistical distributions.
4.5.1 MEE-KF Algorithm Based on the batch regression model, the objective function of MEE-KF is given by: JMEE−KF (x(k)) =
.
L L 1 κσ (ej (k) − ei (k)). L2
(4.93)
i=1 j =1
Then, the state estimate .xˆ (k) can be achieved by solving the following optimization problem: .
xˆ (k) = arg max JMEE−KF (x(k)) x(k)
L L 1 κσ (ej (k) − ei (k)). = arg max 2 x(k) L
(4.94)
i=1 j =1
Setting the derivative of the objective function .JMEE−KF (x(k)) regarding .x(k) to zero, we have:
4.5 Minimum Error Entropy Kalman Filter (MEE-KF)
.
115
∂ JMEE−KF (x(k)) ∂x(k) L L 1 [ej (k) − ei (k)]κσ (ej (k) − ei (k))[wj (k) − wi (k)] 2 2 L σ
=
i=1 j =1
= 1 − 2 − 3 + 4 = 21 − 23 = 0,
(4.95)
where: 1 =
.
L L 1 ej (k)κσ (ej (k) − ei (k))wj (k)T. L2 σ 2
(4.96)
L L 1 ei (k)κσ (ej (k) − ei (k))wj (k)T. L2 σ 2
(4.97)
L L 1 ej (k)κσ (ej (k) − ei (k))wi (k)T. L2 σ 2
(4.98)
L L 1 ei (k)κσ (ej (k) − ei (k))wi (k)T . L2 σ 2
(4.99)
i=1 j =1
2 =
i=1 j =1
3 =
i=1 j =1
4 =
i=1 j =1
Equation (4.95) can be rewritten in a matrix form: 1 − 3
.
=
1 L2 σ 2
W(k)T (k)e(k) −
1 L2 σ 2
W(k)T (k)e(k) = 0,
(4.100)
where: [(k)]ij = κσ (ej (k) − ei (k)). # L $ L (k) = diag κσ (e1 (k) − ei (k)), ..., κσ (eL (k) − ei (k)) .
.
i=1
(4.101) (4.102)
i=1
From (4.100), .x(k) can be solved by a fixed-point iteration: xˆ (k)τ = g(ˆx(k)τ −1 ) −1 W(k)T (k)d(k) , = W(k)T (k)W(k)
.
where:
(4.103)
116
4 Kalman Filtering Under Information Theoretic Criteria
(k) =(k) − (k) =
.
x (k) yx (k) xy (k) y (k)
.
(4.104)
The explicit expressions of .x (k) ∈ Rn×n , .xy (k) ∈ Rm×n , .yx (k) ∈ Rn×m , and m×m are .y (k) ∈ R x (k) = i,j (k) n×n = i,j (k) n×n − i,j (k) n×n
.
(i = 1, 2 · · · n; j = 1, 2 · · · n). xy (k) = i,j (k) m×n = i,j (k) m×n − i,j (k) m×n
(4.105)
(i = n + 1, n + 2 · · · n + m; j = 1, 2 · · · n). yx (k) = i,j (k) n×m = i,j (k) n×m − i,j (k) n×m
(4.106)
(i = 1, 2 · · · n; j = n + 1, n + 2 · · · n + m). y (k) = i,j (k) m×m = i,j (k) m×m − i,j (k) m×m
(4.107)
(i = n + 1, n + 2 · · · n + m; j = n + 1, n + 2 · · · n + m).
(4.108)
In general, one can use .(k) + I to replace .(k), where . is a regularization parameter, to avoid numerical problem in the matrix inversion. Following (4.104) and the formulations of .W(k) and .d(k) in (4.9), we arrive at: T W(k) (k)W(k) −1 −1 T T T = −1 p (k|k − 1) x + H (k)r (k) xy p (k|k − 1) −1 −1 T T T + −1 p (k|k − 1) yx + H (k)r (k) y r (k)H(k) T . W(k) (k)d(k) −1 −1 T T T = −1 p (k|k − 1) x + H (k)r (k) xy p (k|k − 1) T + HT (k)−1 (k)T xˆ (k|k − 1) + −1 (k|k − 1) yx y p r
(4.109)
−1 r (k)y(k). By (4.109), the fixed-point update equation (4.103) can be rewritten as: xˆ (k) = (1 + 2 3 )−1 1 xˆ (k|k − 1) + 2 y(k)
.
= (1 + 2 3 )−1 1 xˆ (k|k − 1) + (1 + 2 3 )−1 2 y(k), where:
(4.110)
4.5 Minimum Error Entropy Kalman Filter (MEE-KF)
117
T + HT −1 (k)T −1 (k|k − 1) 1 = −1 (k|k − 1) x xy r p p −1 T −1 T . = −1 (k|k − 1)T 2 yx + H r (k) y r (k) p 3 = H(k). Using the matrix inversion lemma: .
(A + BCD)−1 = A−1 − A−1 B(C−1 + DA−1 B)−1 DA−1 ,
(4.111)
with the identifications: 1 → A, 2 → B, IL → C, 3 → D,
.
(4.112)
one can reformulate (4.110) as: .
xˆ (k) = xˆ (k|k − 1) + K(k) y(k) − H(k)ˆx(k|k − 1) ,
(4.113)
where:
T 1) P(k|k − 1) + H (k)Pxy (k|k − K(k) = + Pyx (k|k − 1) + HT (k)R(k) H × Pyx (k|k − 1) + HT (k)R(k) .
−1
−1 T P(k|k − 1) = −1 p (k|k − 1) x (k)p (k|k − 1) −1 −1 T Pxy (k|k − 1) = r (k) xy (k)p (k|k − 1) −1 T Pyx (k|k − 1) = −1 p (k|k − 1) yx (k)r (k) −1 −1 T R(k) = r (k) y (k)r (k).
(4.114)
The posterior covariance matrix can be updated by: T P(k) = I − K(k)H(k) P(k|k − 1) I − K(k)H(k)
.
+ K(k)R(k)K(k)T .
(4.115)
With the above derivations, the MEE-KF algorithm can be summarized as Algorithm 3.
4.5.2 Computational Complexity The updates of MEE-KF involve Eqs. (4.4), (4.5), (4.116)–(4.123), and (4.125), and the corresponding computational complexities of floating point operations are
118
4 Kalman Filtering Under Information Theoretic Criteria
Algorithm 3 Minimum error entropy Kalman filter (MEE-KF) [125] Step 1: Initialize the state a priori estimate xˆ (1|0) and state prediction error covariance matrix P(1|0); set a proper kernel size σ and a small positive number ε Step 2: Use Eqs. (4.4) and (4.5) to obtain xˆ (k|k − 1) and P(k|k − 1), respectively; use the Cholesky decomposition of (k) to obtain p (k|k − 1) and r (k) Step 3: Let τ = 1 and xˆ (k)0 = xˆ (k|k − 1), where xˆ (k)τ denotes the estimated state at the fixed-point iteration τ Step 4: Use available measurements {y(k)}N k=1 to update: xˆ (k)τ = xˆ (k|k − 1) + K(k)[y(k) − H(k)ˆx(k|k − 1)]
(4.116)
ei (k) = di (k) − w(k)ˆx(k)τ −1
x (k) yx (k) = (k) − (k) = (k) xy (k) y (k)
(4.117)
with:
(4.118)
T −1 P(k|k − 1) = −1 p (k|k − 1) x (k)p (k|k − 1)
(4.119)
Pxy (k|k − 1) =
− 1)
(4.120)
T −1 Pyx (k|k − 1) = −1 p (k|k − 1) yx (k)r (k)
(4.121)
T −1 R(k) = −1 r (k) y (k)r (k)
(4.122)
T −1 −1 r (k) xy (k)p (k|k
T P(k|k − 1) + H (k)PTxy (k|k − 1) + Pyx (k|k − 1) + H (k) R(k) H(k) × Pyx (k|k − 1) + HT (k) R(k) .
K(k) =
−1
(4.123)
Step 5: Compare xˆ (k)τ and xˆ (k)τ −1 ||ˆx(k)τ − xˆ (k)τ −1 || ≤ ε. ||ˆx(k)τ −1 ||
(4.124)
If the above condition holds, set xˆ (k) = xˆ (k)τ , and continue to Step 6. Otherwise, τ + 1 → τ , and return to Step 4 Step 6: Update k + 1 → k and the posterior error covariance matrix by: T P(k) = [I − K(k)H(k)]P(k|k − 1)[I − K(k)H(k)]
T (k), + K(k)R(k) K and return to Step 2
(4.125)
4.5 Minimum Error Entropy Kalman Filter (MEE-KF)
119
Table 4.11 Computational complexities of some equations Equation (4.116) (4.117) (4.118) (4.119) (4.120) (4.121) (4.122) (4.123) (4.125)
Addition/subtraction and multiplication 4nm 2n 3 2 .3(m + n) + (m + n) 3 2 .4n − 2n 2 2 .2n m + 2nm − 2nm 2 2 .2n m + 2nm − 2nm 3 2 .4m − 2m 2 2 .6n m + 2nm − nm 3 2 2 2 .4n + 6n m − 2n + 2nm − nm
Division, matrix inversion, Cholesky decomposition and exponentiation .0 0 2 .(m + n) 3 .n + O(n ) 0 0 3 .m + O(m ) 3 .O(m ) 0
shown in Table 4.11. According to Table 4.11, the computational complexity of MEE-KF is: CMEE−KF = (7T + 8)n3 + (7T )m3 + (19T + 6)n2 m − n2
.
+ (15T + 2)nm2 + T O(n3 ) + T O(mn) + T m + (5T − 1)n + (6T − 1)nm + 2T O(m3 ).
(4.126)
The MEE-KF has additional computational burden caused by the error entropy functions in comparison with the KF and has higher computational complexity than MCKF. However, the computational complexities of the MEE-KF, MCKF, and KF are similar in the sense of order of magnitude.
4.5.3 Convergence Issue This section provides a sufficient condition to ensure the convergence of the fixedpoint iteration in MEE-KF. The proof will be omitted here since similar proof can be found in Chap. 3. From Eq. (4.103), we can rewrite: g(x(k)) = M−1 ww Mdw ,
.
where .M−1 ww
=
(4.127)
L L κ (e (k) − e (k))[w (k) − w (k)]T σ j i j i and .Mdw i=1 j =1 ×[wj (k) − wi (k)]
L L κ (e (k) − e (k))][d (k) − d (k)] σ j i j i . T i=1 j =1 ×[wj (k) − wi (k)]
=
120
4 Kalman Filtering Under Information Theoretic Criteria
The .n × n Jacobian matrix of .g(x(k)) with respect to .x(k) is: ∂ M−1 Mdw ∇x(k) g(x(k)) = ∂x(k) ww ∂g(x(k)) ∂g(x(k)) · · · = ∂g(x(k)) ∂xL ∂x1 (k) ∂x2 (k) −1 L L ξ (k)[w (k) − w (k)]T M . 1 j i ww =− 2 2 g(x(k)) ×[wj (k) − wi (k)] 2L σ i=1 j =1 L L ξ (k)[d (k) − d (k)] M−1 1 j i , + 2ww2 T (k) − w (k)] ×[w 2L σ j i i=1 j =1
(4.128)
in which .ξ1 (k) = (ej (k) − ei (k))(wjs (k) − wis (k))κσ (e(j ) − e(i)). Define .|| · ||p as an .lp -norm (.p ≥ 1) of a vector or an induced norm of a matrix as .||A||p = max ||AW||p /||W||p . Then, the following theorem holds: ||W||p =0
|dj (k) − di (k)| i=1 j =1 × wj (k) − wi (k) 1
L L √ n
Theorem 4.3 If .βMEE−KF > ρMEE−KF =
⎡
kernel size satisfies .σ ≥ max{σ1 , σ2 }, we have: .
⎤
[wj (k) − wi (k)]T ⎦ i=1 j =1 ×[wj (k) − wi (k)]
λmin ⎣
L L
||g(x(k))||1 ≤ βMEE−KF ||∇x(k) g(x(k))||1 ≤ αMEE−KF < 1,
and the
(4.129)
where .λmin is the minimum eigenvalue of the matrix .Mww and .σ1 and .σ2 are the solutions of .ϕMEE−KF (σ ) and .ψMEE−KF (σ ), respectively, i.e.: L L |d (k) − d (k)| √ j i n (k) − wi (k) 1 × w j i=1 j =1
= ϕMEE−KF (σ ) .||g(x(k))||1 ≤ L L ξ (k)[w (k) − w (k)]T 2 j i λmin i=1 j =1 ×[wj (k) − wi (k)] (4.130)
with .ξ2 (k) = κσ (βMEE−KF ||wj (k) − wi (k)||1 + |dj (k) − di (k)|), and
||∇x(k) g(x(k))||1 ≤
.
σ 2λ
min
√ MEE−KF n L L ξ (k)[w (k) − w (k)]T 2 j i i=1 j =1 ×[wj (k) − wi (k)]
= ψMEE−KF (σ )
(4.131)
4.5 Minimum Error Entropy Kalman Filter (MEE-KF)
with .MEE−KF
121
(βMEE−KF ||wj (k) − wi (k)||1 + |dj (k) − di (k)|) T L L ||w (k) − w (k)|| (β j i 1 MEE−KF ||[wj (k) − wi (k)] = . i=1 j =1 [wj (k) − wi (k)]||1 + [dj (k) − di (k)] ||wj (k) − wi (k)||1 )
According to Theorem 4.3 and Banach fixed-point theorem, the fixed-point algorithm of (4.103) will converge to a unique fixed-point in the range .x(k) ∈ {||x(k)||1 ≤ βMEE−KF } provided that the kernel size .σ is larger than a certain value and the initial state vector satisfies .||x(k)0 ||1 ≤ βMEE−KF .
4.5.4 Illustrative Example In this section, the performance of the MEE-KF is demonstrated in the example of land vehicle navigation. The results are compared with KF and MCKF. In the simulations, the performance index MSE is computed by averaging over 100 independent Monte Carlo runs. In each run, we use 3000 samples (time steps) to compute the MSE. In a linear land vehicle navigation system, the state and measurement are described by: ⎡
1 ⎢0 . x(k) = ⎢ ⎣0 0 .
⎤ 0 T 0 1 0 T ⎥ ⎥ x(k − 1) + q(k − 1) 0 1 0 ⎦ 0 0 1
−1 0 −1 0 x(k − 1) + r(k), 0 −1 0 −1
y(k) =
(4.132)
(4.133)
where .x(k) = [x1 (k), x2 (k), x3 (k), x4 (k)]T is the state vector with components: north position, east position, north velocity, and east velocity; .T = 0.3. The real state, the prior estimate, and the error covariance matrix are initialized by: .
x(0) = [0, 0, 10tanθ, 10]T.
(4.134)
xˆ (1|0) = [1, 1, 1, 1]T.
(4.135)
P(1|0) = diag{900, 900, 4, 4},
(4.136)
where .θ = π/3. In the dynamic system, the process noises are assumed to be of the Gaussian distribution with .q(k) ∼ N (0, 0.001). First, we investigate the influence of the measurement noise on the MSE performance. For the distribution of .r(k), we consider four cases:
122
4 Kalman Filtering Under Information Theoretic Criteria
Table 4.12 Estimation results of different algorithms under different noises (From [125]) Noises Algorithms MSE of x1 Case (1) KF .0.0762 ± 0.0087 MCKF .0.0762 ± 0.0074 MEE-KF .0.0791 ± 0.0042 Case (2) KF .0.5011 ± 0.0502 MCKF .0.3303 ± 0.0524 MEE-KF .0.2785 ± 0.0368 Case (3) KF .0.4903 ± 0.0887 MCKF .0.3352 ± 0.0005 MEE-KF .0.2714 ± 0.0314 Case (4) KF .1.1973 ± 0.0369 MCKF .0.8420 ± 0.0789 MEE-KF .0.6087 ± 0.0524
MSE of x2 ± 0.0026 .0.0762 ± 0.0027 .0.0789 ± 0.0045 .0.4868 ± 0.0783 .0.3255 ± 0.0785 .0.1794 ± 0.0478 .0.4840 ± 0.0375 .0.3376 ± 0.0365 .0.1796 ± 0.0962 .1.1918 ± 0.0468 .0.8256 ± 0.0546 .0.4998 ± 0.0479 .0.0762
MSE of x3 ± 0.0089 .0.0627 ± 0.0080 .0.0711 ± 0.0010 .0.1595 ± 0.0812 .0.1495 ± 0.0867 .0.1377 ± 0.0467 .0.1572 ± 0.0363 .0.1527 ± 0.0770 .0.1364 ± 0.070 .0.3911 ± 0.0126 .0.3604 ± 0.0345 .0.3225 ± 0.0425 .0.0626
MSE of x4 ± 0.0027 .0.0593 ± 0.0059 .0.0725 ± 0.0085 .0.1396 ± 0.0189 .0.1295 ± 0.0192 .0.1155 ± 0.0245 .0.1292 ± 0.0889 .0.1348 ± 0.0699 .0.1161 ± 0.0368 .0.3640 ± 0.0469 .0.3330 ± 0.0549 .0.3161 ± 0.0245 .0.0593
Bold values highlight the best results Table 4.13 Parameter settings of MCKF and MEE-KF Algorithms Parameters Gaussian noise Gaussian noise with outliers Mixture Gaussian noise Mixture Gaussian noise with outliers
MCKF
MEE-KF
.σ
.ε
10 6.0 6.0 5.0
.10
−6
−6 .10 −6 .10 −6 .10
.σ
.ε
10 2.0 2.0 1.5
.10
−6 −6
.10
−6
.10
−6
.10
(1) Gaussian noise, .r(k) ∼ N (0, 0.05); (2) Gaussian noise with outliers, r(k) ∼ 0.99N (0, 0.001) + 0.01N (0, 1000); (3) mixture Gaussian noise, .r(k) ∼ 0.99N (−0.1, 0.001) + 0.01N (0.1, 1000); and (4) mixture Gaussian noise with outliers, .r(k) ∼ 0.48N (−0.1, 0.001) + 0.04N (0, 1000) + 0.48N (0.1, 0.001); the process noise is .q(k) ∼ N (0, 0.01) in this case. Table 4.12 gives the estimation results of KF, MCKF, and MEE-KF in the presence of different measurement noises. The parameter settings of MCKF and MEE-KF are given in Table 4.13, and all these parameters are set by trials to obtain the best performance. From Table 4.12, we observe (i) under Gaussian noises, the MCKF and KF achieve almost the same performance and both perform slightly better than the MEE-KF and (ii) when large outliers or mixture Gaussian noise are included in the measurement, MCKF and MEE-KF show better performance than KF, and especially the MEE-KF achieves the best performance among the three algorithms. Next, we investigate the influence of the kernel size on the estimation performance. Table 4.14 shows the MSEs of different filters with different kernel sizes, where the measurement noise is the same as Case (4). The thresholds in MCKF and MEE-KF are set to .10−6 . From Table 4.14, we find out that if the kernel size is too small or too large, the performance of MCKF and MEE-KF will become worse. .
4.6 Conclusion
123
Table 4.14 Estimation results of different algorithms with different Kernel sizes (From [125]) Algorithms .σ KF N/A MCKF .σ = 3.0 .σ = 4.0 .σ = 5.0 .σ = 10.0 .σ = 20 MEE-KF .σ = 1.0 .σ = 1.5 .σ = 2.0 .σ = 3.0 .σ = 5.0 .σ = 10
MSE of x1 ± 0.3572 .N/A .0.8213 ± 0.5663 .0.8423 ± 0.4875 .1.0576 ± 0.3515 .1.1668 ± 0.3709 .1.7206 ± 0.3869 .0.6258 ± 0.2942 .0.6833 ± 0.3411 .0.8410 ± 0.2367 .1.0381 ± 0.2719 .1.1912 ± 0.2995 .1.2031
MSE of x2 ± 0.2139 .N/A .0.8437 ± 0.3573 .0.8121 ± 0.2310 .1.0366 ± 0.2103 .1.1748 ± 0.2570 .0.5587 ± 0.3001 .0.5376 ± 0.4943 .0.6671 ± 0.6711 .0.9269 ± 0.2604 .1.1381 ± 0.3910 .1.2142 ± 0.3956 .1.1811
MSE of x3 ± 0.6712 .N/A .0.3773 ± 0.7391 .0.3610 ± 0.0495 .0.3766 ± 0.6477 .0.3895 ± 0.6414 .0.3593 ± 0.5045 .0.3296 ± 0.4930 .0.3490 ± 0.4930 .0.3945 ± 0.4748 .0.4391 ± 0.4750 .0.4628 ± 0.4698 .0.3901
MSE of x4 ± 0.2057 .N/A .0.3298 ± 0.2330 .0.3260 ± 0.0352 .0.3471 ± 0.2066 .0.3615 ± 0.2667 .0.2896 ± 0.1772 .0.2896 ± 0.2441 .0.4213 ± 0.2453 .0.5876 ± 0.3474 .0.7279 ± 0.2161 .0.7814 ± 0.2204 .0.3622
Bold values highlight the best results Table 4.15 Average computing time (sec) for each iteration (From [125]) Algorithms Computing time(sec)
KF 0.000066
MCKF 0.000204
MEE-KF 0.000232
Especially, with a small kernel size, the MCKF may diverge, while MEE-KF seems more stable compared with the MCKF. In this example, the best performance can be achieved by MEE-KF when the kernel size is around .1.5. Finally, the computing time of different filters at each iteration is summarized in Table 4.15, which is measured with MATLAB 2016b running on i5-4590 and .3.30 GHZ CPU. We conclude that the MEE-KF can achieve better performance at the cost of slightly higher computational burden than MCKF.
4.6 Conclusion A robust Kalman-type filtering algorithm, called maximum correntropy Kalman filter (MCKF), is provided by using the maximum correntropy criterion (MCC) as the optimality criterion, instead of using the well-known minimum mean square error (MMSE) criterion. The computational complexity of the MCKF is not expensive, and the convergence is ensured if the kernel size is larger than a certain value. When the kernel size is large enough, the MCKF will behave like the KF. With a proper kernel size, the MCKF can outperform the KF significantly, especially when the underlying system is disturbed by some heavy-tailed non-Gaussian noises. To further improve the performance of MCKF, the generalized maximum correntropy Kalman filter (GMCKF) is presented based on the generalized maximum correntropy criteria (GMCC) instead of maximum correntropy criterion. The
124
4 Kalman Filtering Under Information Theoretic Criteria
GMCKF is more flexible and can perform better than KF and MCKF via adjusting the shape parameter .α when the system is disturbed by heavy-tailed noise such as Laplace with outlier or uniform with outlier. The shape parameter of the GMCKF can be estimated from the data. In addition, the minimum error entropy Kalman filter (MEE-KF) is provided by using the minimum error entropy (MEE) criterion as the optimality criterion. With an appropriate kernel size, the MEE-KF algorithm can achieve better performance than the KF and MCKF especially when the underlying system is disturbed by some complicated non-Gaussian noises such as those of multimodal distributions. Simulations on land vehicle navigation have confirmed the excellent performance of the MEE-KF.
4.7 Appendices 4.7.1 Appendix 4.A −1
W (k) =
.
I (k) H (k)
−1 0 p (k|k − 1) = −1 0 r (k)
−1 p (k|k − 1) = −1 r (k) H (k) (k) =
.
−1
D (k) =
.
=
I H (k)
x (k) 0 0 y (k)
xˆ (k|k − 1) (k) y (k)
(4.137)
(4.138)
ˆ −1 x − 1) − 1) (k|k (k|k p . −1 r (k) y (k)
(4.139)
By (4.137) and (4.138), we have: −1‘ WT (k) (k) W (k) .
=
−1 T T −1 T −1 −1 −1 + H H , x p y r p r
(4.140)
4.7 Appendices
125
where we denote p (k|k − 1) by p , r (k) by r , x (k) by x , and y (k) for simplicity. Using the matrix inversion lemma with the identification: T T −1 x −1 p p → A, H → B, T y −1 H → C, −1 r r → D.
.
We arrive at: −1‘ WT (k) (k) W (k) T −1 T T −1 T −1 T T −1 = p −1 x p − p x p H (r y r + Hp x p H ) T Hp −1 (4.141) x p .
.
Further, by (4.137)–(4.139), we derive: WT (k) (k) D (k) T T . −1 T −1 ˆ x − 1) + H y −1 = −1 (k|k x p p r r y (k) .
(4.142)
Combining (4.19), (4.141), and (4.142), we obtain (4.20).
4.7.2 Appendix 4.B Proof: −1 . lim κσ ( ei (k)) σ →∞
= lim
σ →∞
e2 (k) exp − i 2 2σ
−1 =1
(4.143)
It follows easily that: −1 −1 lim e1 (k)) , . . . , κσ−1 ( en (k))} x (k) = lim diag{κσ (
σ →∞ .
σ →∞
= diag{1, . . . , 1} + ,- . n
(4.144)
126
4 Kalman Filtering Under Information Theoretic Criteria −1 −1 lim en+1 (k)) , ..., κσ−1 ( en+m (k))} y (k) = lim diag{κσ (
σ →∞
σ →∞
.
= diag{1, ..., 1} + ,- .
(4.145)
m T −1 P(k|k − 1) = lim p (k|k − 1) lim x (k)p (k|k − 1)
σ →∞
σ →∞
(4.144) = p (k|k − 1)Tp (k|k − 1)
.
(4.146)
(4.10) = P(k|k − 1) T −1 R(k) = lim r (k) lim y (k)r (k)
σ →∞
σ →∞
(4.145) = r (k)Tr (k)
.
(4.147)
(4.10) = R(k) = lim lim K(k)
σ →∞
σ →∞
−1 P(k|k − 1)HT (k) H(k) P(k|k − 1)HT (k) + R(k)
−1 (4.146)(4.147) = P(k|k − 1)HT (k) H(k)P(k|k − 1)HT (k) + R(k)
.
(4.6) = K(k)
y(k) − H(k)ˆx(k|k − 1) lim xˆ (k)t = lim xˆ (k|k − 1) + K(k)
σ →∞
σ →∞
(4.148) = xˆ (k|k − 1) + K(k) y(k) − H(k)ˆx(k|k − 1)
.
(4.148)
(4.149)
(4.7) = xˆ (k) P(k) lim
σ →∞
= lim
σ →∞
.
T T (k) I − K(k)H(k) P(k|k − 1) I − K(k)H(k) + K(k)R(k) K
(4.148) = (I − K(k)H(k)) P(k|k − 1)(I − K(k)H(k))T + K(k)R(k)KT (k) (4.8) = P(k) (4.150)
This completes the proof.
Chapter 5
Extended Kalman Filtering Under Information Theoretic Criteria
To improve the performance in nonlinear situations, the extended Kalman filter (EKF) is broadly applied in many areas, particularly in nonlinear target tracking. Compared to unscented Kalman filter (UKF) and cubature Kalman filter (CKF), the main advantages of EKF are its conceptual and computational simplicity. The maximum correntropy criterion (MCC) and minimum error entropy (MEE) can be applied to EKF to improve the robustness to non-Gaussian noises, resulting in two alternatives: the maximum correntropy extended Kalman filter (MCEKF) [118] and the minimum error entropy extended Kalman filter (MEE-EKF) [125].
5.1 Extended Kalman Filter (EKF) Consider a nonlinear system in the following form: .
x(k) = f(x(k − 1)) + q(k − 1).
(5.1)
y(k) = h(x(k)) + r(k),
(5.2)
where .x(k) ∈ Rn and .y(k) ∈ Rm are the n-dimensional state vector and mdimensional measurement vector at time k; .f (·), and .h (·) denote the continuously differentiable nonlinear state function and measurement function; and .q(k − 1) and .r(k) represent the process noise and measurement noise, respectively. .q(k − 1) and .r(k) are mutually uncorrelated with zero means and covariance matrices: E q(k − 1)qT (k − 1) = Q(k − 1), E r(k)rT (k) = R(k).
.
(5.3)
As an extended version of KF, the EKF consists of two steps:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Chen et al., Kalman Filtering Under Information Theoretic Criteria, https://doi.org/10.1007/978-3-031-33764-2_5
127
128
5 Extended Kalman Filtering Under Information Theoretic Criteria
5.1.1 Prediction The prior state mean and covariance matrix are: ˆ (k|k − 1) = f xˆ (k − 1) , .x P(k|k − 1) = F(k − 1)P(k − 1)FT (k − 1) + Q(k − 1),
.
(5.4) (5.5)
where the Jacobian matrix .F(k − 1) ∈ Rn×n is given by: ∂f(x(k − 1)) . .F(k − 1) = ∂x x=ˆx(k−1)
5.1.2 Update The Kalman filter gain is calculated as: −1 K(k) = P(k|k − 1)HT (k) H(k)P(k|k − 1)HT (k) + R(k) ,
.
(5.6)
where the Jacobian matrix .H(k) ∈ Rm×n is given by: H(k) =
.
∂h(x(k)) ∂x x=ˆx(k|k−1)
The posterior state mean and covariance matrix are then updated by: xˆ (k) = xˆ (k|k − 1) + K(k) y(k) − h xˆ (k|k − 1)
.
P(k) = (I − K(k)H(k)) P(k|k − 1)(I − K(k)H(k))T + K(k)R(k)KT (k).
.
(5.7) (5.8)
5.2 Maximum Correntropy Extended Kalman Filter (MCEKF) MCC can bring great improvement on filtering performance in heavy-tailed noise environments, and it has been utilized to develop two kinds of MCEKF, called linear regression maximum correntropy extended Kalman filter (LRMCEKF) [118] and nonlinear regression maximum correntropy extended Kalman filter (NRMCEKF) [118]. In both filters, the prediction step is similar to the original EKF, and the prior estimates of the state and covariance matrix are solved by using the state mean and covariance matrix propagation equations. Then the posterior estimates are updated by using a fixed-point iterative algorithm.
5.2 Maximum Correntropy Extended Kalman Filter (MCEKF)
129
5.2.1 Linear Regression MCEKF Measurement equation (5.2) can be approximated by: y(k) ≈ h xˆ (k|k − 1) + H(k) x(k) − xˆ (k|k − 1) + r(k).
.
(5.9)
Combining (5.9) with the state prediction error, we have: .
xˆ (k|k − 1) I = x(k) y(k) − h xˆ (k|k − 1) + H(k)ˆx(k|k − 1) H(k)
− x(k) − xˆ (k|k − 1) + , r(k) (5.10)
which can be rewritten in a normalized form: z(k) = W(k)x(k) + e(k),
(5.11)
.
xˆ (k|k − 1) with .z(k) = (k)−1 , .W(k) = y(k) − h xˆ (k|k − 1) + H(k)ˆx(k|k − 1)
I − x(k) − xˆ (k|k − 1) (k)−1 , .e(k) = (k)−1 , and .(k) can be H (k) r(k) obtained by Cholesky decomposition of T − x(k) − xˆ (k|k − 1) − x(k) − xˆ (k|k − 1) .E . r(k) r(k) The optimal estimate of .x(k) under MCC can be formulated as: xˆ (k) = arg max JLRMCEKF (x(k)) = arg max
L
.
x(k)
x(k)
κσ (ei (k)) ,
(5.12)
i=1
where: ei (k) = zi (k) − wi (k)x(k).
.
(5.13)
Accordingly, the solution of .x(k) can be found by solving: .
∂JLRMCEKF (x(k)) = 0. ∂x(k)
(5.14)
Similar to the MCKF, the state estimate .xˆ (k) based on the batch regression model can be solved by a fixed-point update equation: xˆ (k)τ = g xˆ (k)τ −1 ,
.
(5.15)
130
5 Extended Kalman Filtering Under Information Theoretic Criteria
where:
−1
x 0 x 0 W(k) z(k), g xˆ (k)τ −1 = W(k)T W(k)T 0 y 0 y
.
(5.16)
with: x (k) = diag{κσ (e1 (k)) , ..., κσ (en (k))},
(5.17)
y (k) = diag{κσ (en+1 (k)) , ..., κσ (en+m (k))}.
(5.18)
.
.
Note that (5.15) can be further written as follows (see Appendix 5.5.1 for detailed derivation): xˆ (k) = xˆ (k|k − 1) + K(k) y(k) − h xˆ (k|k − 1)
(5.19)
−1 , K(k) = P(k|k − 1)H(k)T H(k)P(k|k − 1)H(k)T + R(k)
(5.20)
.
with: .
P(k|k − 1) = p (k)(x (k))−1 Tp (k),
(5.21)
−1 R(k) = r (k) y (k) Tr (k),
(5.22)
.
.
where .p (k) and .r (k) can be obtained by the Cholesky decomposition of .P(k|k − 1) and .R(k). Furthermore, the posterior covariance matrix can be updated by: T T P(k) = I − K(k)H(k) P(k|k − 1) I − K(k)H(k) + K(k)R(k) K(k) . (5.23)
.
In practical applications, the measurement value may be very large when a system failure occurs. In this case, the LRMCEKF solution faces a numerical problem since .(k) is nearly singular. Aiming at this issue, the following approach is introduced: (k) = y(k) − h xˆ (k|k − 1)
(5.24)
Pyy (k) = H(k)P(k|k − 1)HT (k) + R(k)
(5.25)
o(k) = (k)T Pyy (k)−1 (k).
(5.26)
.
.
.
If .|o(k)| > othr with .othr > 0 being a preset positive threshold, the prediction step is carried out, i.e., .xˆ (k) = xˆ (k|k − 1), .P(k) = P(k|k − 1). If .|o(k)| ≤ othr , the prediction and update steps are both carried out.
5.2 Maximum Correntropy Extended Kalman Filter (MCEKF)
131
The LRMCEKF algorithm is summarized in Algorithm 4. Algorithm 4 Linear regression maximum correntropy extended Kalman filter (LRMCEKF) [118] Step 1: Initialize the state a priori estimate xˆ (1|0) and state prediction error covariance matrix P(1|0); set a proper kernel size σ and a small positive number ε Step 2: Calculate F(k − 1) and H(k); calculate .xˆ (k|k − 1) = f xˆ (k − 1) , P(k|k − 1) = F(k − 1)P(k − 1)F(k − 1)T + Q(k − 1); use the Cholesky decomposition of (k) to obtain p (k|k − 1) and r (k); calculate z(k) and W(k) Step 3: Update: (k) = y(k) − h xˆ (k|k − 1) Pyy (k) = H(k)P(k|k − 1)HT (k) + R(k) o(k) = (k)T Pyy (k)−1 (k)
(5.27)
xˆ (k) = xˆ (k|k − 1)
(5.28)
P(k) = P(k|k − 1)
(5.29)
if |o(k)| > othr then:
else: Let τ = 1 and xˆ (k)0 = xˆ (k|k − 1), where xˆ (k)τ denotes the estimated state at the fixed-point iteration τ : y(k) − h xˆ (k|k − 1) xˆ (k)τ = xˆ (k|k − 1) + K(k)
(5.30)
−1 P(k|k − 1)HT (k) + R(k) K(k) = P(k|k − 1)HT (k) H(k)
(5.31)
P(k|k − 1) = p (k|k − 1)(x (k))−1 Tp (k|k − 1)
(5.32)
−1 R(k) = r (k) y (k) Tr (k)
(5.33)
x (k) = diag{κσ (e1 (k)) , ..., κσ (en (k))}
(5.34)
y (k) = diag{κσ (en+1 (k)) , ..., κσ (en+m (k))}
(5.35)
ei (k) = zi (k) − w(k)ˆx(k)τ −1
(5.36)
with:
xˆ (k) −ˆx(k) if . xˆτ(k) τ−1 ≤ ε then go to Step 4. τ −1 else τ = τ + 1, go to step (5.30) Step 4: Update k + 1 → k and the posterior error covariance matrix by: T P(k) = [I − K(k)H(k)]P(k|k − 1)[I − K(k)H(k)]
T (k) + K(k)R(k) K and return to Step 2
(5.37)
132
5 Extended Kalman Filtering Under Information Theoretic Criteria
Remark 5.1 Unlike EKF, the LRMCEKF adopts a fixed-point iterative algorithm to update the posterior estimates. The small positive parameter .ε in Algorithm 4 provides a stop condition to obtain the converged solution, and the initial iteration value can be set to the prior estimate .xˆ (k|k − 1). The kernel size .σ is a key parameter in LRMCEKF. As discussed in Chapter 3, under certain conditions, if the kernel size is larger than a certain value, the convergence of the fixed-point algorithm under MCC can be guaranteed. For a convergent algorithm, the algorithm is more robust to heavy-tailed noise when the kernel size becomes small. If the kernel size .σ → ∞, then .(k) → I, and in this case, the LRMCEKF algorithm will reduce to the original EKF.
5.2.2 Nonlinear Regression MCEKF In this section, a nonlinear regression method is adopted to derive the NRMCEKF algorithm, which can further improve the filtering performance. If the measurement equation is not linearly approximated by (5.9), we have: .
xˆ (k|k − 1) x(k) − x(k) − xˆ (k|k − 1) = + . y(k) h (x(k)) r(k),
(5.38)
Similar to (5.11), we obtain the following nonlinear regression model: d(k) = h ¯ (x(k)) + e(k),
(5.39)
.
xˆ (k|k − 1) x(k) −1 where .d(k) = (k) , .h , .e(k) = ¯ (x(k)) = (k) y(k) h (x(k))
− x(k) − xˆ (k|k − 1) −1 (k) , and .(k) is obtained in the same way as that in r(k) (5.11). The optimal estimate under MCC can then be formulated by: −1
xˆ (k) = arg max JNRMCEKF (x(k)) = arg max
L
.
x(k)
x(k)
κσ (ei (k)) ,
(5.40)
i=1
where: ei (k) = di (k) − h ¯ i (x(k)) .
.
Accordingly, we are able to find the optimal solution by solving:
(5.41)
5.2 Maximum Correntropy Extended Kalman Filter (MCEKF)
.
T
∂h ¯ (x(k) ∂x(k)
x 0 d(k) − h ¯ (x(k)) = 0, 0 y
133
(5.42)
where .y and .y are obtained in the same way with that in (5.17) and (5.18). Linearizing .h ¯ (x(k)) around the estimate .xˆ (k)τ obtained at each iteration of the fixed-point algorithm, we have:
x 0 −1 T x 0 M xˆ (k)τ −1 xˆ (k)τ = MT xˆ (k)τ −1 M xˆ (k)τ −1 0 y 0 y . × M xˆ (k)τ −1 xˆ (k)τ −1 − h ¯ xˆ (k)τ −1 + d(k) , (5.43) where: ∂h ¯ (x(k)) ˆ (k)τ −1 = . (5.44) .M x ∂x x=ˆx(k)τ −1 Similar to (5.15), (5.43) can be further written as follows (see Appendix 5.B in Sect. 5.5.2 for detailed derivation): xˆ (k)τ =ˆx(k|k − 1) + K(k) y(k) − h xˆ (k)τ −1 . −H xˆ (k)τ −1 xˆ (k|k − 1) − xˆ (k)τ −1 ,
(5.45)
with:
H xˆ (k)τ −1
.
∂h(x(k)) = , ∂x x=ˆx(k)τ −1
K(k) = P(k|k − 1)HT xˆ (k)τ −1 −1 . × H xˆ (k)τ −1 P(k|k − 1)HT xˆ (k)τ −1 + R(k)
(5.46)
.
(5.47)
We can still use (5.17), (5.18), (5.21), and (5.22) to obtain .x (k), .y (k), .P(k|k −1), and .R(k), respectively. In addition, the posterior covariance matrix can be updated by: T P(k) = I − K(k)H xˆ (k)τ −1 P(k|k − 1) I − K(k)H xˆ (k)τ −1 ) .
T + K(k)R(k) K(k) .
(5.48)
Moreover, we can employ (5.24)–(5.26) to avoid the problem that .(k) is nearly singular.
134
5 Extended Kalman Filtering Under Information Theoretic Criteria
The NRMCEKF algorithm is then summarized in Algorithm 5. Algorithm 5 Nonlinear regression maximum correntropy extended Kalman filter (NRMCEKF) [118] Step 1: Initialize the state a priori estimate xˆ (1|0) and state prediction error covariance matrix P(1|0); set a proper kernel size σ and a small positive number ε Step 2: Calculate F(k − 1) and H(k); calculate .xˆ (k|k − 1) = f xˆ (k − 1) , P(k|k − 1) = F(k − 1)P(k − 1)F(k − 1)T + Q(k − 1); use the Cholesky decomposition of (k) to obtain p (k|k − 1) and r (k); calculate d(k) Step 3: Update: (k) = y(k) − h xˆ (k|k − 1) Pyy (k) = H(k)P(k|k − 1)HT (k) + R(k)
if |o(k)| > othr then:
o(k) = (k)T Pyy (k)−1 (k)
(5.49)
xˆ (k) = xˆ (k|k − 1)
(5.50)
P(k) = P(k|k − 1)
(5.51)
else: Let τ = 1 and xˆ (k)0 = xˆ (k|k − 1), where xˆ (k)τ denotes the estimated state at the fixed-point iteration τ : y(k) − h xˆ (k)τ −1 xˆ (k)τ =ˆx(k|k − 1) + K(k) (5.52) −H xˆ (k)τ −1 xˆ (k|k − 1) − xˆ (k)τ −1 with: ∂h(x(k)) H xˆ (k)τ −1 = ∂x x=ˆx(k)τ −1 −1 P(k|k − 1)HT xˆ (k)τ −1 + R(k) K(k) = P(k|k − 1)HT xˆ (k)τ −1 H xˆ (k)τ −1
(5.53)
P(k|k − 1) = p (k|k − 1)(x (k))−1 Tp (k|k − 1)
(5.54)
−1 R(k) = r (k) y (k) Tr (k)
(5.55)
x (k) = diag{κσ (e1 (k)) , ..., κσ (en (k))}
(5.56)
y (k) = diag{κσ (en+1 (k)) , ..., κσ (en+m (k))}
(5.57)
ei (k) = di (k) − h ¯ i xˆ (k)τ −1
(5.58)
T (k) xˆ (k)τ −1 ]T + K(k)R(k) P(k) = [I − K(k)H xˆ (k)τ −1 ]P(k|k − 1)[I − K(k)H K xˆ (k) −ˆx(k) if . xˆτ(k) τ−1 ≤ ε, then go to Step 4 τ −1 else τ = τ + 1, go to step (5.52) Step 4: Update k + 1 → k and return to Step 2
(5.59)
5.2 Maximum Correntropy Extended Kalman Filter (MCEKF)
135
Remark 5.2 Different from the LRMCEKF algorithm, the NRMCEKF adopts a nonlinear regression model to improve the estimation accuracy. The solution can be obtained by linearizing .d(k) around the estimate .xˆ (k)τ at each fixed-point iteration. Again, the kernel size .σ is an important parameter in NRMCEKF. If the kernel size .σ → ∞, the matrix .C(k) → I, and in this case, the NRMCEKF will reduce to the iterated EKF (IEKF) [150].
5.2.3 Illustrative Example In this section, an illustrative example is given to demonstrate the performances of the two MCEKF algorithms. We evaluate the performance based on the following measures: MSE1 (l) =
.
lmc 1
lmc
MSE2 (k) =
.
2
(x(k) − xˆ (l) (k)) , l = 1, . . . , lmc ,
(5.60)
l=1
N 1 2 (x(k) − x(k)) ˆ , k = 1, . . . , N, N
(5.61)
k=1
AMSE =
.
lmc N 1 1 MSE2 (l), MSE1 (k) = lmc N k=1
(5.62)
l=1
where N denotes the total time instants in one Monte Carlo run and .lmc denotes the total number of Monte Carlo runs. This example employs a practical tracking model [62] that deals with the estimation problem of the altitude, velocity, and ballistic coefficient of a vertically falling body. The model is derived by rectangle integral in the discrete time with sampling period .T and the measurements are achieved from radar every .0.1s. It can be described by: xk,1 =xk−1,1 + T xk−1,2 + qk−1,1 2 xk,2 =xk−1,2 + T ρ0 exp −xk−1,1 /a xk−1,2 xk−1,3 /2 .
− T g + qk−1,2
(5.63)
xk,3 =xk−1,3 + qk−1,3 yk1 =
b2 + (xk1 ,1 − H )2 .+ rk1 ,
(5.64)
136
5 Extended Kalman Filtering Under Information Theoretic Criteria
Fig. 5.1 Geometry of the example (From [118])
T where the state vector .x(k) = xk,1 xk,2 xk,3 contains the altitude, velocity, and ballistic coefficient, the sampling period is .T = 0.001s, the acceleration of gravity is .g = 32.2 ft/s2 , the altitude of radar is .H = 100,000 ft, the horizontal range between the body and the radar is .b = 100,000 ft, the constant .ρ0 is .ρ0 = 2, and the constant a that relates the air density with altitude is .a = 20,000. The geometry is described in Fig. 5.1. Several other filters are also tested for comparison, including EKF, IEKF, and Huber-EKF (HEKF) [87]. Similar to [62], suppose there is no process noise. The T initial state is .x(0) = 300,000, −20,000, 1/1000 , and the initial estimate and corresponding covariance matrix are: T xˆ (0) = 300,000, −20,000, 3 × 10−5 . . P(0) = diag{1,000,000, 4,000,000, 10−4 } First, we suppose that the measurement noise obeys Gaussian distribution: r(k) ∼ N (0, 10,000).
.
We consider the first 60s and make 100 independent Monte Carlo runs. Table 5.1 summarizes the .AMSEs of .xk,1 , .xk,2 , and .xk,3 , and Table 5.2 gives the average iteration numbers. One can see that when the kernel size is very large, the LRMCEKF and NRMCEKF show similar performance with EKF and IEKF, and both outperform the HEKF. Meanwhile, the NRMCEKF can achieve more accurate results than LRMCEKF; the average iteration numbers in LRMCEKF and NRMCEKF are slightly more than .2.0.
5.2 Maximum Correntropy Extended Kalman Filter (MCEKF)
137
Table 5.1 .AMSEs of .xk,1 , .xk,2 , and .xk,3 in Gaussian noise (From [118]) Filter EKF IEKF HEKF LRMCEKF.(σ = 2.0) LRMCEKF.(σ = 3.0) LRMCEKF.(σ = 5.0) LRMCEKF.(σ = 10) LRMCEKF.(σ = 20) NRMCEKF.(σ = 2.0) NRMCEKF.(σ = 3.0) NRMCEKF.(σ = 5.0) NRMCEKF.(σ = 10) NRMCEKF.(σ = 20) Table 5.2 Average iteration numbers in Gaussian noise (From [118])
.AMSE
.AMSE
.AMSE
.6.1896
of .xk,1 × 103 3 .6.1740 × 10 3 .7.0433 × 10 3 .7.9526 × 10 3 .6.5102 × 10 3 .6.2510 × 10 3 .6.1976 × 10 3 .6.1884 × 10 3 .7.9444 × 10 3 .6.4972 × 10 3 .6.2388 × 10 3 .6.1859 × 10 3 .6.1767 × 10
.7.3201
of .xk,2 × 103 3 .7.3122 × 10 3 .7.8864 × 10 3 .8.4617 × 10 3 .7.5578 × 10 3 .7.3705 × 10 3 .7.3281 × 10 3 .7.3203 × 10 3 .8.4571 × 10 3 .7.5519 × 10 3 .7.3647 × 10 3 .7.3224 × 10 3 .7.3146 × 10
.7.1917
Filter IEKF HEKF LRMCEKF.(σ = 2.0) LRMCEKF.(σ = 3.0) LRMCEKF.(σ = 5.0) LRMCEKF.(σ = 10) LRMCEKF.(σ = 20) NRMCEKF.(σ = 2.0) NRMCEKF.(σ = 3.0) NRMCEKF.(σ = 5.0) NRMCEKF.(σ = 10) NRMCEKF.(σ = 20)
of .xk,3 × 10−7 −7 .7.1916 × 10 −7 .6.9610 × 10 −7 .6.8376 × 10 −7 .6.8713 × 10 −7 .7.0575 × 10 −7 .7.1570 × 10 −7 .7.1832 × 10 −7 .6.8370 × 10 −7 .6.8708 × 10 −7 .7.0572 × 10 −7 .7.1567 × 10 −7 .7.1828 × 10 Average iteration number 2.0309 2.1206 2.3132 2.2134 2.1264 2.0538 2.0121 2.3154 2.2169 2.1339 2.0700 2.0406
Second, we suppose that the measurement noise is a heavy-tailed non-Gaussian noise, subjecting to a mixed-Gaussian distribution: r(k) ∼ (1 − a)N (0, 1000) + aN (0, 100,000),
.
where .a = 0.1. Figures 5.2, 5.3, 5.4, 5.5, 5.6 and 5.7 show the .MSE1 s of .xk,1 , .xk,2 , and .xk,3 for different filters. Tables 5.3 and 5.4 summarize the .AMSEs and average iteration numbers, respectively. Again, it can be observed that when the kernel size is large enough, the LRMCEKF and NRMCEKF achieve almost the same performance with EKF and IEKF, respectively. However, with an appropriate kernel size (say .σ = 2.0), the LRMCEKF and NRMCEKF can outperform the EKF, IEKF, and HEKF. Meanwhile, the NRMCEKF can achieve slightly better performance than
138
5 Extended Kalman Filtering Under Information Theoretic Criteria
Fig. 5.2 .MSE1 of .xk,1 with LRMCEKF and other filters in non-Gaussian noise (From [118])
Fig. 5.3 .MSE1 of .xk,2 with LRMCEKF and other filters in non-Gaussian noise (From [118])
LRMCEKF. Moreover, the fixed-point algorithm in LRMCEKF and NRMCEKF may converge to the optimal solution very quickly (after about two iterations).
5.2 Maximum Correntropy Extended Kalman Filter (MCEKF)
Fig. 5.4 .MSE1 of .xk,3 with LRMCEKF and other filters in non-Gaussian noise (From [118])
Fig. 5.5 .MSE1 of .xk,1 with NRMCEKF and other filters in non-Gaussian noise (From [118])
139
140
5 Extended Kalman Filtering Under Information Theoretic Criteria
Fig. 5.6 .MSE1 of .xk,2 with NRMCEKF and other filters in non-Gaussian noise (From [118])
Fig. 5.7 .MSE1 of .xk,3 with NRMCEKF and other filters in non-Gaussian noise (From [118])
5.3 Minimum Error Entropy Extended Kalman Filter (MEE-EKF)
141
Table 5.3 .AMSEs of .xk,1 , .xk,2 , and .xk,3 in Non-Gaussian noise (From [118]) Filter EKF IEKF HEKF LRMCEKF.(σ = 2.0) LRMCEKF.(σ = 3.0) LRMCEKF.(σ = 5.0) LRMCEKF.(σ = 10) LRMCEKF.(σ = 20) NRMCEKF.(σ = 2.0) NRMCEKF.(σ = 3.0) NRMCEKF.(σ = 5.0) NRMCEKF.(σ = 10) NRMCEKF.(σ = 20)
.AMSE
.AMSE
.AMSE
.6.8422
of .xk,1 × 103 3 .6.8141 × 10 3 .2.0418 × 10 3 .1.5241 × 10 3 .2.3346 × 10 3 .4.2752 × 10 3 .6.0764 × 10 3 .6.6460 × 10 3 .1.5236 × 10 3 .2.3319 × 10 3 .4.2483 × 10 3 .6.0362 × 10 3 .6.6109 × 10
.6.5163
.8.1397
Table 5.4 Average iteration numbers in non-Gaussian noise (From [118])
of .xk,2 × 103 3 .6.4999 × 10 3 .3.2517 × 10 3 .2.8791 × 10 3 .3.3560 × 10 3 .4.7198 × 10 3 .5.9862 × 10 3 .6.3811 × 10 3 .2.8790 × 10 3 .3.3551 × 10 3 .4.7071 × 10 3 .5.9651 × 10 3 .6.3615 × 10
Filter IEKF HEKF LRMCEKF.(σ = 2.0) LRMCEKF.(σ = 3.0) LRMCEKF.(σ = 5.0) LRMCEKF.(σ = 10) LRMCEKF.(σ = 20) NRMCEKF.(σ = 2.0) NRMCEKF.(σ = 3.0) NRMCEKF.(σ = 5.0) NRMCEKF.(σ = 10) NRMCEKF.(σ = 20)
of .xk,3 × 10−7 −7 .8.1394 × 10 −7 .3.1252 × 10 −7 .2.5917 × 10 −7 .3.4126 × 10 −7 .5.2029 × 10 −7 .7.2457 × 10 −7 .7.9140 × 10 −7 .2.5912 × 10 −7 .3.4124 × 10 −7 .5.2024 × 10 −7 .7.2452 × 10 −7 .7.9135 × 10
Average iteration number 1.9821 1.9984 2.0764 2.0566 2.0291 1.9935 1.9720 2.0777 2.0595 2.0343 2.0054 1.9905
5.3 Minimum Error Entropy Extended Kalman Filter (MEE-EKF) 5.3.1 MEE-EKF Algorithm Similar to LRMCEKF, a batch regression model is established with the same form as (5.10). Then, the optimal solution to .xˆ (k) can be achieved by maximizing the MEE cost function, that is:
142
5 Extended Kalman Filtering Under Information Theoretic Criteria
xˆ (k) = arg max JMEE−EKF (x(k))
.
x(k)
L L 1
κσ (ej (k) − ei (k)). L2
= arg max x(k)
(5.65)
i=1 j =1
Similar to MEE-KF, .xˆ (k) can be solved by a fixed-point iterative algorithm: xˆ (k)τ = g(ˆx(k)τ −1 ) −1 WT (k)(k)z(k) , = WT (k)(k)W(k)
.
(5.66)
with: (k) =(k) − (k) =
.
x (k) yx (k) xy (k) y (k)
,
(5.67)
where: [(k)]ij = κσ (ej (k) − ei (k)).
.
(k) = diag
L
(5.68)
κσ (e1 (k) − ei (k)), ...,
i=1 L
κσ (eL (k) − ei (k)) .
(5.69)
i=1
and .(k) ∈ RL×L , .x (k) ∈ Rn×n , .xy (k) ∈ Rm×n , .yx (k) ∈ Rn×m , and .y (k) ∈ Rm×m . The forms of .(k) and .W(k) are the same as those of MEE-KF, and .z(k) is the same as that of LRMCEKF. Using the derivation similar to that of MEE-KF, one can obtain the MEE-EKF, as provided in Algorithm 6.
5.3.2 Illustrative Example Prediction of the evolution of an infectious disease [151] is used to validate the effectiveness of the MEE-EKF. The center for disease control (CDC) provides the influenza-like illness (ILI) data [151] in the United States, a traditional information source for epidemic estimation. The state process for an epidemic is based on the susceptible-infected-recovered (SIR) model [151], given by:
5.3 Minimum Error Entropy Extended Kalman Filter (MEE-EKF)
143
Algorithm 6 Minimum error entropy extended Kalman filter (MEE-EKF) [125] Step 1: Similar to Step 1 in Algorithm 1 Step 2: Calculate F(k − 1) and H(k); calculate .xˆ (k|k − 1) = f xˆ (k − 1) , P(k|k − 1) = F(k − 1)P(k − 1)F(k − 1)T + Q(k − 1); use the Cholesky decomposition of (k) to obtain p (k|k − 1) and r (k); calculate z(k) and W(k) Step 3: Let τ = 1 and xˆ (k)0 = xˆ (k|k − 1), where xˆ (k)τ denotes the estimated state at the fixed-point iteration τ Step 4: Use the available measurements {y(k)}N k=1 to update: xˆ (k)τ = xˆ (k|k − 1) + K(k)[y(k) − h(ˆx(k|k − 1))]
(5.70)
ei (k) = zi (k) − w(k)ˆx(k)τ −1
(5.71)
T P(k|k − 1) + H (k)PTxy (k|k − 1) + Pyx (k|k − 1) + H (k) R(k) H(k) × Pyx (k|k − 1) + HT (k) R(k) ,
K(k) =
−1
(5.72)
T −1 P(k|k − 1) = −1 p (k|k − 1) x (k)p (k|k − 1)
(5.73)
T −1 Pxy (k|k − 1) = −1 r (k) xy (k)p (k|k − 1)
(5.74)
Pyx (k|k − 1) =
(5.75)
−1 p (k|k
− 1)
T
yx (k)−1 r (k)
T −1 R(k) = −1 r (k) y (k)r (k)
(5.76)
Step 5: Similar to the Step 1 in Algorithm 1 Step 6: Update k + 1 → k and the posterior covariance matrix: T P(k) = [I − K(k)H(k)]P(k|k − 1)[I − K(k)H(k)]
T (k) + K(k)R(k) K
(5.77)
and return to Step 2.
.
s(k + 1) = s(k) − δ1 s(k)i(k).
(5.78)
i(k + 1) = i(k) + δ1 s(k)i(k) − δ2 i(k).
(5.79)
b(k + 1) = b(k) + δ2 i(k),
(5.80)
where .s(k), .i(k), and .b(k) are the susceptible, infected, and recovered crowd density, respectively, with .s(k) + i(k) + b(k) = 1 for all k. The parameter .δ1 denotes the rate about the spread of illness, and .δ2 is the rate about recovery from infection. Since the recovered .b(k) can be obtained by .b(k) = 1 − s(k) − i(k), the state of epidemic at time k can be described by .x(k) = [s(k), i(k)]T . Therefore, the state equation can be written as: x(k) = f(x(k − 1)) + q(k),
.
(5.81)
144
5 Extended Kalman Filtering Under Information Theoretic Criteria
where the nonlinear function .(x(k − 1)) is given in Eqs. (5.78) and (5.79). In this example, we suppose .Q(k) = 10−8 In . The measurement equation specifies how the observed data depend on the state of epidemic. When monitoring an epidemic, the unknown .s(k), .i(k), and .b(k) are regarded as hidden states of the model, and the observed data are acquired via syndromic surveillance system. Because only .i(k) can be obtained by the syndromic surveillance system, the measurement equation can be established by: y(k) = Hx(k) + r(k),
(5.82)
.
where the matrix .H = [0 1] denotes the measurement matrix. Here, we suppose R(k) = 10−6 . Figure 5.8 plots 52 weeks density of people infected in the year 2010. Table 5.5 gives the estimated MSE by EKF, MCEKF (LRMCEKF), and MEE-EKF. In Fig. 5.8, a peak around the 25th week occurs, which means that the number of people infected arrives at the maximum. In the SIR model, the contact rate .δ1 and recovery time .δ2 are set to .0.35 and .0.11, respectively. The kernel sizes of MEE-EKF and MCEKF are .20.0 and .16.0, respectively. As one can see, the MEE-EKF provides a much better prediction for the density of people infected than EKF and MCEKF.
.
0.025
MCEKF MEE-EKF EKF Real
0.02
i(k)
Fig. 5.8 Estimation results for infected .i(k) by EKF, MCEKF, and MEE-EKF (From [125])
0.015
0.01
0.005
10
20
30
40
50
Table 5.5 Performance comparison of different algorithms for prediction of infectious disease epidemics (From [125]) Algorithms MSE
EKF 0.001729
MCEKF 0.001703
MEE-EKF 0.000825
5.5 Appendices
145
5.4 Conclusion Two extended Kalman-type filters, called the linear regression maximum correntropy extended Kalman filter (LRMCEKF) and nonlinear regression maximum correntropy extended Kalman filter (NRMCEKF), are presented in this chapter, which adopt the MCC as the optimization criterion and utilize a fixed-point iteration algorithm to obtain the posterior estimate. Simulation results confirm that when the kernel size is large enough, the LRMCEKF and NRMCEKF have similar performance with the original EKF and IEKF. Both filters can achieve better performance than the EKF, IEKF, and HEKF in the presence of heavy-tailed noises. Furthermore, the MEE-based extended Kalman filter (MEE-EKF) is developed to deal with the problem of state estimation of a nonlinear system in complicated noise environments. Simulation results on prediction of infectious disease epidemics have confirmed the desirable performance of the MEE-EKF.
5.5 Appendices 5.5.1 Appendix 5.A One can easily obtain the following equations: W(k) =−1 (k)
(k) =
.
.
I H(k)
−1 0 p (k) = −1 0 r (k) (k) −1 p = −1 r (k)H(k)
.
z(k) =−1 (k)
x (k) 0 0 y (k)
I H(k)
xˆ (k|k − 1) y(k) − h xˆ (k|k − 1) + H(k)ˆx(k|k − 1)
(5.83)
(5.84)
(k)ˆx(k|k − 1) −1 p . = ˆ (k|k − 1) + H(k)ˆx(k|k − 1) −1 r (k) y(k) − h x
(5.85)
146
5 Extended Kalman Filtering Under Information Theoretic Criteria
By (5.83) and (5.84), we have: −1‘ W(k)T (k)W(k) .
=
−1 p (k)
T
T x (k)−1 p (k) + H (k)
−1 T −1 −1 r (k) y (k)r (k)H(k) . (5.86)
Utilizing the matrix inversion lemma with the identification: T −1 T −1 p (k) x (k)p (k) → A, H (k) → B,
.
T −1 H(k) → C, −1 r (k) y (k)r (k) → D.
We arrive at: −1 W(k)T (k)W(k)
.
= p (k)(x (k))−1 p (k)T − p (k)(x (k))−1 p (k)T H(k)T −1 −1 × r (k) y (k) r (k)T + H(k)p (k)(x (k))−1 p (k)T H(k)T × H(k)p (k)(x (k))−1 p (k)T . (5.87)
Furthermore, by (5.83)–(5.85), we derive: W(k)T (k)z(k) T −1 = −1 . p (k) x (k)p (k)ˆx(k|k − 1) T −1 y(k) − h xˆ (k|k − 1) + H(k)ˆx(k|k − 1) . + H(k)T −1 r (k) y (k)r (k)
(5.88) Combining (5.15), (5.87), and (5.88), we have (5.19).
5.5.2 Appendix 5.B We have the following equations:
5.5 Appendices
147
I H xˆ (k)τ −1
−1 (k) 0 I p = H xˆ (k)τ −1 0 −1 r (k) (k) −1 p = ˆ (k)τ −1 −1 r (k)H(k) x
M xˆ (k)τ −1 =−1 (k)
.
x (k) 0 .(k) = 0 y (k) d(k) = −1 (k)
.
=
(5.89)
xˆ (k|k − 1) y(k)
(5.90)
−1 x(k|k − 1) p (k)ˆ . −1 r (k)y(k)
(5.91)
By (5.89) and (5.90), we get:
.
−1 MT xˆ (k)τ −1 (k)M xˆ (k)τ −1 T −1 T ˆ (k)τ −1 = −1 p (k) x (k)p (k) + H x
(5.92)
T −1 −1 −1 × r (k) y (k)r (k)H xˆ (k)τ −1 . Utilizing the matrix inversion lemma with the identification: T −1 T ˆ τ −1 → B, −1 p (k) x (k)p (k) → A, H x
.
T H xˆ τ −1 → C, p (k)−1 y (k)p (k)−1 → D.
We obtain:
.
−1 T xˆ (k)τ −1 (k) xˆ (k)τ −1 = p (k)(x (k))−1 p (k)T − p (k)(x (k))−1 p (k)T HT xˆ (k)τ −1 −1 × r (k) y (k) r (k)T + H xˆ (k)τ −1 p (k)(x (k))−1 −1 ×p (k)T HT xˆ (k)τ −1 × H xˆ (k)τ −1 p (k)(x (k))−1 p (k)T . (5.93)
148
5 Extended Kalman Filtering Under Information Theoretic Criteria
Moreover, by (5.89)–(5.91), we have: MT xˆ (k)τ −1 (k) M xˆ (k)τ −1 xˆ (k)τ −1 − h ¯ xˆ (k)τ −1 + d(k) T = −1 (k) x (k)−1 x(k|k − 1) p p (k)ˆ .
T −1 + H xˆ (k)τ −1 −1 r (k) y (k)r (k) × y(k) + H xˆ (k)τ −1 xˆ (k)τ −1 − h xˆ (k)τ −1 . T
Combining (5.43), (5.93), and (5.94), we have (5.45).
(5.94)
Chapter 6
Unscented Kalman Filter Under Information Theoretic Criteria
The unscented transformation (UT) is an efficient method to solve the state estimation problem for a nonlinear dynamic system, which utilizes a derivative-free higher-order approximation of a Gaussian distribution rather than approximating a nonlinear function. Applying the UT to a Kalman filter-type estimator leads to the well-known unscented Kalman filter (UKF). In general, the UKF works well in Gaussian noises; its performance may, however, deteriorate severely when the noises are non-Gaussian, especially when the system is disturbed by noises of heavy-tailed distribution. To improve the robustness of the UKF against heavytailed noises, two UKF-type filters based on the maximum correntropy criterion (MCC) are presented in this chapter, namely, the maximum correntropy unscented filter (MCUF) [120] and maximum correntropy unscented Kalman filter (MCUKF) [119]. Except for MCUK and MCUKF, the unscented Kalman filter with generalized correntropy loss (GCL-UKF) [29] is also developed, which can achieve even better performance. In addition, since the minimum error entropy (MEE) exhibits good robustness with respect to more complicated noises such as those of multimodal distributions, a unscented Kalman-type filter based on the MEE criterion, termed MEE-UKF [126], is also derived and discussed.
6.1 Unscented Kalman Filter (UKF) Consider a nonlinear system in the following form: .
x(k) = f(x(k − 1)) + q(k − 1).
(6.1)
y(k) = h(x(k)) + r(k).
(6.2)
Similar to the KF, the UKF includes two steps, namely, prediction and update:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Chen et al., Kalman Filtering Under Information Theoretic Criteria, https://doi.org/10.1007/978-3-031-33764-2_6
149
150
6 Unscented Kalman Filter Under Information Theoretic Criteria
6.1.1 Prediction The a priori state estimate and the corresponding error covariance matrix are: xˆ (k|k − 1) =
2n
.
wim χ i (k|k − 1).
(6.3)
i=0
2n wic χ i (k|k − 1) − xˆ (k|k − 1) T × χ i (k|k − 1) − xˆ (k|k − 1) + Q(k), i=0
(6.4)
χ i (k|k − 1) = f(ξ i (k − 1)) (i = 0, 1, 2, . . . , 2n).
(6.5)
P(k|k − 1) = where: .
The sigma points .ξ i and corresponding weight .wi are generated by the unscented transformation [29, 119, 120, 126].
6.1.2 Update The a priori measurement estimate, the measurement error covariance matrix, and the cross-covariance matrix between state and measurement are: yˆ (k|k − 1) =
2n
.
wim γ i (k).
(6.6)
i=0
Pyy (k|k − 1) =
2n
T wic γ i (k) − yˆ (k|k − 1) γ i (k) − yˆ (k|k − 1) + R(k) .
i=0
(6.7) Pxy (k|k − 1) =
2n
T wic χ i (k|k − 1) − xˆ (k|k − 1) γ i (k) − yˆ (k|k − 1) ,
i=0
(6.8) where: .
γ i (k) = h(ξ i (k|k − 1)) (i = 0, 1, 2, . . . , 2n).
(6.9)
6.2 Maximum Correntropy Unscented Filter (MCUF)
151
The state estimate .xˆ (k) and corresponding covariance matrix .P(k) are computed by: .
xˆ (k) = xˆ (k|k − 1) + K(k) y(k) − yˆ (k|k − 1) .
(6.10)
P(k) = Pxx (k|k − 1) − K(k)Pyy (k|k − 1)KT (k).
(6.11)
K(k) = Pxy (k|k − 1)P−1 yy (k|k − 1).
(6.12)
6.2 Maximum Correntropy Unscented Filter (MCUF) Similar to MCEKF, the MCC can be utilized to derive a new UKF, called the MCUF [120]. Different from the MCEKF, the MCUF adopts a statistical linear approximation instead of the first-order approximation as in LRMCEKF to approximate the nonlinear measurement function. A novel batch regression model based on MCC is used to obtain the posterior state and covariance of MCUF.
6.2.1 MCUF Algorithm MCUF algorithm is derived based on (6.3)–(6.6) and (6.8)–(6.9). First, define the measurement slope matrix as: T S(k) = P−1 (k|k − 1)Pxy (k) .
.
(6.13)
Then, measurement equation (6.2) can be approximated by: y(k) ≈ yˆ (k|k − 1) + S(k)(x(k) − xˆ (k|k − 1)) + r(k).
.
(6.14)
Combining state prediction and (6.14), we obtain the following batch-mode regression: .
xˆ (k|k − 1) y(k) − yˆ (k|k − 1) + S(k)ˆx(k|k − 1)
I − x(k) − xˆ (k|k − 1) = x(k) + . S(k) r(k)
(6.15)
After uncorrelating the state prediction errors of the batch-mode regression, we have: D(k) = W(k)x(k) + e(k),
.
(6.16)
152
6 Unscented Kalman Filter Under Information Theoretic Criteria
xˆ (k|k − 1) I −1 where .D(k) = (k) , .W(k) = (k) , y(k) − yˆ (k) + S(k)ˆx(k|k − 1) S(k)
ˆ (k|k − 1) −1 − x(k) − x .e(k) = , and .(k) is obtained in the same way as r(k) that in (5.11). Under the MCC, the optimal estimate of .x(k) arises from the following optimization: −1
xˆ (k) = arg max JMCUF (x(k)) = arg max
L
.
x(k)
x(k)
κσ (ei (k)) ,
(6.17)
i=1
where .ei (k) is the i-th element of .e(k), that is: ei (k) = Di (k) − wi (k)x(k).
.
(6.18)
The optimal solution of (6.17) can be solved through: .
∂JMCUF (x(k)) = 0. ∂x(k)
(6.19)
The estimate of .x(k) can then be obtained by a fixed-point iterative algorithm [120] xˆ (k)τ = g xˆ (k)τ −1 .
.
(6.20)
A detailed description of the MCUF algorithm is given in Algorithm 7. Remark 6.1 Since .K(k) relies on . P(k|k − 1) and . R(k), both related to .x(k) through x (k) and .y (k), respectively, the Eq. (6.20) is a fixed-point equation for .x(k). The initial value of the fixed-point iteration can be set to .xˆ (k)0 = xˆ (k|k − 1) or chosen −1 as the least-squares solution .xˆ (k)0 = WT (k)W(k) WT (k)D(k). The value after convergence is the posterior estimate of the state .xˆ (k).
.
Remark 6.2 In the MCUF, we use a statistical linear regression model and the MCC to obtain the posterior estimates of the state and covariance. With the UT and statistical linear approximation, the filter can achieve a more accurate solution than the first-order linearization-based filters. Moreover, the usage of MCC will improve the robustness of the filter against large outliers. Usually, the convergence of the fixed-point iteration to the optimal solution is very fast. Thus, the computational complexity of MCUF is not high in general. The kernel size .σ , which can be set manually or optimized by trial-and-error methods in practical applications, is a key parameter in MCUF. In general, a smaller kernel size makes the algorithm more robust (with respect to outliers), but a too small kernel size may lead to slow convergence or even divergence. From [102], we know that if the kernel size is larger than a certain value, the fixed-point algorithm (6.20) will surely converge to
6.2 Maximum Correntropy Unscented Filter (MCUF)
153
Algorithm 7 Maximum correntropy unscented filter (MCUF) [120] Step 1: Choose a proper kernel size σ and a small positive ε; set an initial estimate xˆ (0) and corresponding covariance matrix P(0); let k = 1 Step 2: Use Eqs. (6.3) and (6.4) to obtain prior estimate xˆ (k|k − 1) and covariance P(k|k − 1), and calculate p (k|k − 1) and r (k) by Cholesky decomposition Step 3: Use (6.6) to compute the prior measurement yˆ (k|k − 1), and use (6.4) and (6.8) to acquire the measurement slope matrix S(k), and construct the statistical linear regression model (6.16) Step 4: Generate the sigma points using the UT, and let τ = 1 and .xˆ (k)0 = T −1 W (k)W(k) WT (k)D(k) Step 5: Use (6.21)–(6.27) to compute xˆ (k)τ : xˆ (k)τ = xˆ (k|k − 1) + K(k) y(k) − yˆ (k|k − 1)
(6.21)
−1 K(k) P(k|k − 1)ST (k) + R(k) = P(k|k − 1)ST (k) S(k)
(6.22)
T −1 P(k|k − 1) = p (k|k − 1) x (k)p (k|k − 1)
(6.23)
T −1 R(k) = r (k) y (k)r (k)
(6.24)
x (k) = diag{κσ ( e1 (k)) , ..., κσ ( en (k))}
(6.25)
y (k) = diag{κσ ( en+1 (k)) , ..., κσ ( en+m (k))}
(6.26)
ei (k) = Di (k) − wi (k)ˆx(kτ −1 )
(6.27)
with:
Step 6: Compare the estimation at the current step and the estimation at the last step. If (6.28) holds, set xˆ (k) = xˆ (k)τ and continue to Step 7. Otherwise, τ + 1 → τ , and go back to Step 5: xˆ (k)τ − xˆ (k)τ −1 ≤ε xˆ (k)τ −1
(6.28)
Step 7: Update the posterior covariance matrix by (6.29), k + 1 → k and go back to Step 2: T P(k|k − 1) I − K(k)S(k) P(k) = I − K(k)S(k) T (k) + K(k)R(k) K
(6.29)
a unique fixed point. When .σ → ∞, the MCUF will reduce to the least-squares −1 solution . WT (k)W(k) WT (k)D(k).
154
6 Unscented Kalman Filter Under Information Theoretic Criteria
6.2.2 Illustrative Example In the following, we present an illustrative example to demonstrate the performance of the MCUF algorithm. The performance is measured using .MSE1 (k), .MSE2 (l), and .MSE. The performances of EKF, Huber-EKF (HEKF) [87], UKF, and HUKF [82] are also presented for comparison purpose. The goal is to estimate the position, velocity, and ballistic coefficient of a vertically falling body at a very high altitude. The measurements are taken by a radar system each .0.1s. By rectangle integral in the discrete time with sampling period .T , we obtain the following model [62]: x1 (k1 ) = x1 (k1 − 1) + T x2 (k1 − 1) + q1 (k1 − 1) x2 (k1 ) = x2 (k1 − 1) + T ρ0 exp (−x1 (k1 − 1)/a) .
× x2 (k1 − 1)2 x3 (k1 − 1)/2 − T g + q2 (k1 − 1)
(6.30)
x3 (k1 ) = x3 (k1 − 1) + q3 (k1 − 1)
y(k) = H 2 + (x1 (k) − b)2 + r(k), where the sampling time is .T = 0.001s, .k = 100k1 , the constant .ρ0 is .ρ0 = 2, the constant a that relates the air density with altitude is .a = 20,000, the acceleration of gravity is .g = 32.2f t/s 2 , the located altitude of radar is .H = 100,000 ft, and the horizontal range between the body and the radar is .b = 100,000 ft. T The state vector .x(k) = x1 (k) x2 (k) x3 (k) contains position, velocity, and ballistic coefficient. Similar to [62], we do not introduce any process noise in this T model. The initial state is assumed to be .x(0) = 300,000, −20,000, 1/1000 , T and the initial estimate is .xˆ (0) = 300,000, −20,000, 0.0009 with covariance matrix .P(0) = diag{1,000,000, 4,000,000, 1/1,000,000}. First, we assume that the measurement noise is Gaussian, that is: .
q(k1 ) ∼ 0 r(k) ∼ N(0, 10,000).
In the simulation, we consider the motion during the first 50s and make 100 independent Monte Carlo runs. Figures 6.1, 6.2 and 6.3 show the .MSE1 (as defined in (5.60)) of .x1 , .x2 , and .x3 for different filters in Gaussian noise. The corresponding .MSEs and average fixed-point iteration numbers are summarized in Tables 6.1 and 6.2. As one can see, in this case, since the noise is Gaussian, the UKF performs the best. However, the MCUF with large kernel size can outperform the other two robust Kalman-type filters, namely, HEKF and HUKF. One can also observe that the average iteration numbers of the MCUF are very small especially when the kernel size is large.
6.2 Maximum Correntropy Unscented Filter (MCUF) Fig. 6.1 .MSE1 of .x1 in Gaussian noise (From [120])
Fig. 6.2 .MSE1 of .x2 in Gaussian noise (From [120])
Fig. 6.3 .MSE1 of .x3 in Gaussian noise (From [120])
155
156
6 Unscented Kalman Filter Under Information Theoretic Criteria
Table 6.1 .MSEs of .x1 , .x2 , and .x3 in Gaussian noise (From [120]) Filter EKF HEKF. ε = 10−6 UKF HUKF. ε = 10−6 MCUF. σ = 2.0, ε = 10−6 MCUF. σ = 3.0, ε = 10−6 MCUF. σ = 5.0, ε = 10−6 MCUF. σ = 10, ε = 10−6 MCUF. σ = 20, ε = 10−6
.MSE
.MSE
.MSE
.7.3254
of .x1 × 103 3 .7.9266 × 10 3 .7.2630 × 10 3 .7.8343 × 10 3 .8.3984 × 10 3 .7.4680 × 10 3 .7.2943 × 10 3 .7.2674 × 10 3 .7.2642 × 10
.8.6071
of .x2 × 103 3 .8.9799 × 10 3 .8.5948 × 10 3 .8.9969 × 10 3 .9.4045 × 10 3 .8.7892 × 10 3 .8.6564 × 10 3 .8.6298 × 10 3 .8.6254 × 10
.1.1317
of .x3 × 10−8 −8 .1.1001 × 10 −8 .1.1308 × 10 −8 .1.0983 × 10 −8 .1.0864 × 10 −8 .1.0865 × 10 −8 .1.1098 × 10 −8 .1.1246 × 10 −8 .1.1288 × 10
Table 6.2 Average iteration numbers for every time step in Gaussian noise (From [120]) Filter HEKF. ε = 10−6 −6 HUKF. ε = 10 MCUF. σ = 2.0, ε = 10−6 MCUF. σ = 3.0, ε = 10−6 MCUF. σ = 5.0, ε = 10−6 MCUF. σ = 10, ε = 10−6 MCUF. σ = 20, ε = 10−6
Average iteration number 1.2342 1.2332 1.7881 1.5821 1.3808 1.1706 1.0521
Second, we consider the case in which the measurement noise is a heavy-tailed (impulsive) non-Gaussian noise, with a mixed-Gaussian distribution, that is: .
q(k1 ) ∼ 0 r(k) ∼ 0.7N(0, 1000) + 0.3N(0, 100,000).
Figures 6.4, 6.5 and 6.6 demonstrate the .MSE1 of .x1 , .x2 , and .x3 for different filters in non-Gaussian noise, and Tables 6.3 and 6.4 summarize the corresponding .MSEs and average iteration numbers, respectively. We can see clearly that the three robust Kalman-type filters (HEKF, HUKF, MCUF) are superior to their non-robust counterparts (EKF, UKF). When the kernel size is very large, the MCUF achieves almost the same performance as that of UKF. In contrast, with a smaller kernel size, the MCUF can outperform the UKF significantly. Especially, when .σ = 2.0, the MCUF exhibits the smallest .MSE among all the algorithms. Again, the fixed-point algorithm in MCUF will converge to the optimal solution in very few iterations.
6.2 Maximum Correntropy Unscented Filter (MCUF) Fig. 6.4 .MSE1 of .x1 in non-Gaussian noise (From [120])
Fig. 6.5 .MSE1 of .x2 in non-Gaussian noise (From [120])
Fig. 6.6 .MSE1 of .x3 in non-Gaussian noise (From [120])
157
158
6 Unscented Kalman Filter Under Information Theoretic Criteria
Table 6.3 .MSEs of .x1 , .x2 and .x3 in non-Gaussian noise (From [120]) Filter EKF HEKF. ε = 10−6 UKF HUKF. ε = 10−6 MCUF. σ = 2.0, ε = 10−6 MCUF. σ = 3.0, ε = 10−6 MCUF. σ = 5.0, ε = 10−6 MCUF. σ = 10, ε = 10−6 MCUF. σ = 20, ε = 10−6
.MSE
.MSE
.MSE
.2.9499
of .x1 × 104 4 .1.4068 × 10 4 .2.8772 × 10 4 .1.3996 × 10 4 .1.1457 × 10 4 .1.7612 × 10 4 .2.3818 × 10 4 .2.7448 × 10 4 .2.8428 × 10
.2.2283
of .x2 × 104 4 .1.4079 × 10 4 .2.2331 × 10 4 .1.4212 × 10 4 .1.2990 × 10 4 .1.5970 × 10 4 .1.9475 × 10 4 .2.1572 × 10 4 .2.2147 × 10
.1.5566
of .x3 × 10−8 −9 .8.6645 × 10 −8 .1.5497 × 10 −9 .8.6247 × 10 −9 .7.3965 × 10 −8 .1.0199 × 10 −8 .1.3083 × 10 −8 .1.4826 × 10 −8 .1.5323 × 10
Table 6.4 Average iteration numbers for every time step in non-Gaussian noise (From [120]) Filter HEKF. ε = 10−6 −6 HUKF. ε = 10 MCUF. σ = 2.0, ε = 10−6 MCUF. σ = 3.0, ε = 10−6 MCUF. σ = 5.0, ε = 10−6 MCUF. σ = 10, ε = 10−6 MCUF. σ = 20, ε = 10−6
Average iteration number 1.2280 1.2300 1.4809 1.3815 1.2771 1.1713 1.0917
6.3 Maximum Correntropy Unscented Kalman Filter (MCUKF) Different from MCUF, the MCUKF is derived without linear (statistical linear) approximation, which only amends the measurement error covariance by MCC, and the UKF is adopted to the measurement equation to obtain the filter state and covariance matrix.
6.3.1 MCUKF Algorithm MCUKF algorithm is derived based on (6.3)–(6.6). Combining measurement equation (6.2) and the state prediction error, one can recast the nonlinear regression model as:
− x(k) − xˆ (k|k − 1) x(k) xˆ (k|k − 1) . (6.31) + = . r(k) h(x(k)) y(k)
6.3 Maximum Correntropy Unscented Kalman Filter (MCUKF)
159
Left multiplying both sides of (6.31) by .−1 (k), we have the following equation: d(k) = h ¯ (x(k)) + e(k),
(6.32)
.
xˆ (k|k − 1) x(k) −1 where .d(k) = (k) , .h , .e(k) = ¯ (x(k)) = (k) y(k) h(x(k))
− x(k) − xˆ (k|k − 1) −1 (k) , and .(k) can be obtained in the same way as r(k) that in (5.11). Under the MCC, the optimal estimate of .x(k) is found by: −1
xˆ (k) = arg max JMCUKF (x(k)) = arg max
L
.
x(k)
x(k)
κσ (ei (k)) ,
(6.33)
i=1
with: ei (k) = di (k) − h ¯ i (x(k)).
.
(6.34)
Hence, the optimal solution for .x(k) can be solved from the following equation: .
∂JMCUKF (x(k)) = 0, ∂x(k)
(6.35)
which also can be expressed as: L .
ϕ(ei (k))
i=1
∂ei (k) = 0, ∂xi (k)
(6.36)
where .xi (k) is the i-th element of .x(k) and .ϕ(ei (k)) = κσ (ei (k))ei (k). We define
i (k) = ϕ(ei (k))/ei (k) and have:
.
i (k) = κσ (ei (k)).
.
(6.37)
Equation (6.36) can then be expressed in a matrix form as: .
h ¯ (x(k)) ∂x(k)
T
(k) d(k) − h ¯ (x(k)) = 0,
where:
x (k) 0 . .(k) = 0 y (k)
(6.38)
160
6 Unscented Kalman Filter Under Information Theoretic Criteria
Based on Eq. (6.32) with only one iteration [116], we had obtained similar results to the UKF framework by using .(k) to reformulate the measurement information. Similar to [83], there are two ways to do it. One is to re-weight the residual error covariance depended on the value of .ei (k) = di (k) − h ¯ i (x(k)); the other is to reconstruct the “pseudo-observation.” In general, the derived results based on both alternatives are the same. Here, we just describe the algorithm based on the first approach for simplicity. (k) as the modified covariance: Define . (k) = (k)−1 (k)T (k).
.
(6.39)
(k) into two portions as: Let us write . (k) = .
x (k) 0 y (k) . 0
(6.40)
ˆ (k|k − 1) = 0. Since the true state .x(k) is unknown, we set .x(k|k − 1) = x(k) − x In this case, we can observe that: x (k) = p (k|k − 1) ∗ I ∗ Tp (k|k − 1) .
= P(k|k − 1).
(6.41)
The modified measurement covariance can be obtained as: y (k). R(k) =
.
(6.42)
With above derivations, the MCUKF algorithm can be summarized as Algorithm 8. Remark 6.3 As one can see, different from the UKF algorithm, the MCUKF uses a nonlinear regression model combined with MCC to update the measurement information. Again, the kernel size .σ is a key parameter in MCUKF. In general, a smaller kernel size makes the algorithm more robust (with respect to outliers). However, when the kernel size is too small, it may result in filter divergence or accuracy deterioration. The reason for this is explained in [102]. On the other hand, when .σ becomes larger and larger, the MCUKF will behave more and more like the UKF algorithm. The following theorem holds: Theorem 6.1 If the kernel size .σ approaches infinity, the MCUKF will reduce to the traditional UKF. Proof As .σ → ∞, we have .(k) → I. In this case, the MCUKF is equal to the traditional UKF.
6.3 Maximum Correntropy Unscented Kalman Filter (MCUKF)
161
Algorithm 8 Maximum correntropy unscented Kalman filter (MCUKF) [119] Step 1: Choose a proper kernel size σ ; set an initial estimate xˆ (0) and corresponding covariance matrix P(0); let k = 1 Step 2: Calculate the sigma points and the propagated sigma points with respect to function f and h Step 3: Calculate predicted estimate xˆ (k|k − 1) and covariance matrix P(k|k − 1) by (6.3) and (6.4), and adopt Cholesky decomposition to get p (k|k − 1) and r (k) Step 4: Use (6.31)–(6.42) to obtain the modified R(k); compute the predicted measurement mean and cross-covariance matrix by (6.6) and (6.8); replace R(k) with R(k) in (6.7) and have the following covariance matrix: Pyy =
2n
T ˜ wci γ i (k) − yˆ (k) γ i (k) − yˆ (k) + R(k).
(6.43)
i=0
Step 5: Use (6.10)–(6.12) to compute the state estimate and covariance matrix; update k + 1 → k and return to Step 2
6.3.2 Illustrative Example We demonstrate the performances of the algorithms. Here, we mainly compare the estimation performance based on the MSE. Now, a univariate nonstationary growth model (UNGM) that is often used as a benchmark example for nonlinear filtering is considered, whose state and measurement equations are: x(k) = α1 x(k − 1) + α2
.
x(k − 1) + α3 cos(1.2(k − 1)) + q(k − 1), 1 + x 2 (k − 1) y(k) =
.
x 2 (k) + r(k), 20
(6.44)
(6.45)
where the parameters are .α1 = 0.5, .α2 = 25, and .α3 = 8. First, we consider the case in which all the noises are Gaussian: .
q(k − 1) ∼ N(0, 1) r(k) ∼ N(0, 1).
In this example, the time steps and total Monte Carlo runs are chosen as .N = 500, lmc = 100. Table 6.5 illustrates the .MSEs of x, defined in (4.39). Since all the noises are Gaussian, the UKF gives the smallest .MSE in all filters; it is noted that when the kernel size is small, the MCUKF may result in a worse estimation; in contrast when the kernel size becomes larger, its performance will approach that of the UKF. Usually, one should choose a larger kernel size in Gaussian noises. Second, we change the observation noise to a heavy-tailed non-Gaussian noise:
162
6 Unscented Kalman Filter Under Information Theoretic Criteria
Table 6.5 .MSEs of x in Gaussian noises (From [119])
Table 6.6 .MSEs of x in non-Gaussian measurement noise (From [119])
.
Filter UKF MCUKF.(σ MCUKF.(σ MCUKF.(σ MCUKF.(σ MCUKF.(σ Filter UKF MCUKF.(σ MCUKF.(σ MCUKF.(σ MCUKF.(σ MCUKF.(σ
.MSE
= 2.0) = 3.0) = 5.0) = 10) = 20)
of x 67.6974 87.6836 80.8406 74.0286 72.3362 68.6795
.MSE
= 1.0) = 2.0) = 3.0) = 5.0) = 10)
of x 85.8439 84.1944 82.6933 83.1098 84.7173 85.4411
q(k − 1) ∼ N(0, 1) r(k) ∼ 0.8N(0, 1) + 0.2N(0, 500).
Table 6.6 shows the .MSEs of x in non-Gaussian measurement noise. As one can see, in this case, when kernel size is too small or too large, the performance of MCUKF may be not good. However, with a proper kernel size (say .σ = 2.0), the MCUKF can outperform the UKF, achieving a smaller .MSE. Again, when .σ is quite large, MCUKF achieves almost the same performance as the UKF.
6.4 Unscented Kalman Filter with Generalized Correntropy Loss (GCL-UKF) The GCL-UKF is derived below, which combines the robustness of the GCL to address non-Gaussian interferences and the strength of the UKF in handling strong nonlinearities. In addition, considering the nontrivial influences of the noisy data on the innovation vector, an enhanced generalized correntropy loss UKF (EnGCLUKF) method is established. This will bring more degrees of freedom to adjust the covariance matrix so as to improve the state estimation accuracy under changes to the gain matrix during operation.
6.4 Unscented Kalman Filter with Generalized Correntropy Loss (GCL-UKF)
163
6.4.1 GCL-UKF Algorithm The derivation of GCL-UKF algorithm is similar to MCUF in (6.13)–(6.16). In MCUF, correntropy allows the approximation of the loss norm between .L2 and .L0 , but there are applications where we would prefer to work with .Lp norms (.p > 2). The generalized correntropy loss is: L 1 exp −λ|ei (k)|α , 1− L
JGCL−UKF (x(k)) = γα,β
.
(6.46)
i=1
where .λ =
1 βα
is the kernel parameter, .γα,β =
and .ei (k) is the i-th element of .e(k) given by:
α 2βΓ ( α1 )
is the normalization constant,
ei (k) = Di (k) − wi (k)x(k).
.
(6.47)
Based on (6.13)–(6.16) established batch regression model, the optimal estimate of .x(k) can be obtained:
xˆ (k) = arg min γα,β
.
x(k)
L 1 α . exp −λ|ei (k)| 1− L
(6.48)
k=1
Setting the gradient to be zero yields: .
∂JGCL−UKF (x(k)) = 0. ∂x(k)
(6.49)
One can easily get:
.
L γα,β ∂JGCL−UKF (x(k)) =− exp{−λ|ei (k)|α }|ei (k)|α−1 sign(ei (k))wTi (k) = 0. ∂x(k) L k=1
(6.50) Similar to the previous series of methods, .x(k) can be solved by a fixed-point iterative algorithm: xˆ (k)τ = g(ˆx(k)τ −1 ).
.
(6.51)
Here, (6.51) is further expressed in a matrix form as:
−1
Cx (k) 0 Cx (k) 0 T T ˆ (k)τ = W (k) W (k) W(k) D(k) , .x 0 Cy (k) 0 Cy (k) (6.52)
164
6 Unscented Kalman Filter Under Information Theoretic Criteria
with: Cx (k) = diag{ϑ (e1 (k)) , ..., ϑ (en (k))},
(6.53)
Cy (k) = diag{ϑ (en+1 (k)) , ..., ϑ (en+m (k))},
(6.54)
.
.
and .ϑ(ei (k)) = κα,β (ei (k))|ei (k)|α−2 and .κα,β (ei (k)) in (4.55). Similar to MCUF, the state estimate can be obtained via the matrix inversion theorem, and the resulting GCL-UKF is summarized in Algorithm 9. Remark 6.4 The gross errors caused by the outlier in the measurements may significantly affect the performance of the state estimation techniques [152], such as the traditional UKF with MSE loss. The GCL is on the other hand a local similarity function determined by the free parameter kernel size, and this can be the main reason why the GCL-UKF appears to be insensitive to the outliers.
6.4.2 Enhanced GCL-UKF (EnGCL-UKF) Algorithm In general, noises in measurements may affect significantly the innovation vector ψ ∗ (k) = y(k)− yˆ (k|k −1) that will effect cause changes in the state vector. To avoid such performance degradation, it is important to restrain the change of innovation vector caused by the noise (especially by non-Gaussian noise). Inspired by the idea in [152], an exponential function of the innovation vector is introduced into the GCL-UKF to update the covariance matrix by adjusting innovation vector:
.
ˆ R(k) = R(k) exp{−|ψ ∗ (k)|}
(6.55)
ˆ −1 (k). R(k) = R
(6.56)
.
.
From (6.55) and (6.56), one can see that once any raw measurements encounter a significant outlier that results in an increase of absolute residual vector, the inverse of the absolute residual vector can help suppress the outlier such that the estimation performance does not suffer much. In addition, when the residual vector is considerably small, the exponential output shall at most approach the value of one. The suppression function can be added to the system model without too much change, so it is very simple and applicable. Moreover, the modified GCL-UKF by adding all previous effect in absolute exponential form can reduce the effect of gain because the measurement noise covariance matrix is included in the gain matrix. Then an enhanced GCL-UKF (EnGCL-UKF) method can be obtained when ˆ (6.55) and (6.56) are used in GCL-UKF to perform the update .R(k). In addition, the enhanced versions of EKF and UKF can also be obtained when the covariance matrices of measurement noise are updated by (6.55) and (6.56), termed as EnEKF and EnUKF. The procedure of the EnGCL-UKF is also presented in Algorithm 9.
6.4 Unscented Kalman Filter with Generalized Correntropy Loss (GCL-UKF)
165
Algorithm 9 Unscented Kalman filter with generalized correntropy loss (GCLUKF) and enhanced generalized correntropy loss (EnGCL-UKF) [29] Step 1: Set proper free parameters, and choose a small positive ε; set an initial estimate xˆ (0) and the corresponding covariance matrix P(0); let k = 1 Step 2: Calculate the sigma points and the propagated sigma points with respect to function f and h Step 3: Calculate predicted estimate xˆ (k|k − 1) and covariance matrix P(k|k − 1) by (6.3) and (6.4), and adopt Cholesky decomposition to get p (k|k − 1) Step 4: Use (6.13) to derive the measurement slope matrix Si ; Compute the predicted measurement mean and cross-covariance matrix by (6.6) and (6.8) Step 5: Let τ = 1 and set the initial value x(k)τ at time instant τ = 0 as x(k)0 Step 6: Use the following equation to compute xˆ (k): xˆ (k)τ = xˆ (k|k − 1) + K(k)(y(k) − yˆ (k|k − 1))
(6.57)
−1 K(k) = P(k|k − 1)ST (k) S(k) P(k|k − 1)S(k)T + R(k)
(6.58)
T P(k|k − 1) = p (k|k − 1)C−1 x (k)p (k|k − 1)
(6.59)
R(k) =
(6.60)
with:
T r (k)C−1 y (k)r (k)
Cx (k) = diag{ϑ(e1 (k)), ..., ϑ(en (k))}
(6.61)
Cy (k) = diag{ϑ(en+1 (k)), ..., ϑ(en+m (k))}
(6.62)
In EnGCL-UKF, R(k) is further updated by: ˆ R(k) = R(k) exp{−|ψi |}
(6.63)
ˆ −1 (k). R(k) = R
(6.64)
Step 7: Compare the estimation at the current step and the estimation at the last step. If : ˆx(k)τ − xˆ (k)τ −1 ≤ε ˆxτ −1
(6.65)
holds, set xˆ (k) = xˆ (k)τ and continue to Step 8; otherwise, τ + 1 → τ , and go back to Step 6 Step 8: Update the posterior covariance matrix by: T T + K(k)R(k)S(k) , P(k) = (I − K(k)S(k))P(k|k − 1)(I − K(k)S(k))
(6.66)
and k + 1 → k and go back to Step 2
6.4.3 Application to Power System An illustrative example of the power system state estimation is used to demonstrate the performance of GCL-UKF. A dynamic model of the power system can be described by the state space equations [29, 153–156]:
166
6 Unscented Kalman Filter Under Information Theoretic Criteria
x(k) = f(x(k − 1)) + q(k − 1).
(6.67)
y(k) = h(x(k)) + r(k).
(6.68)
.
In the power system model, the state vector .x(k) is comprised of the sub-vectors of the voltage magnitude .V(k) ∈ R1×n1 and voltage angle .ϕ(k) ∈ R1×n2 , i.e., T .x(k) = [ϕ(k), V(k)] with .n = n1 + n2 . The measurement vector .y(k) is also represented by the sub-vectors, i.e., .y(k) = [V(k), T(k), Z(k), Tf (k), Zf (k)]T , where .T(k) ∈ R1×m1 , .Z(k) ∈ R1×m2 , .Tf (k) ∈ R1×m3 , and .Zf (k) ∈ R1×m4 are the real power injection, reactive power injection, real power flow, and reactive power flow at k-th time, respectively, with .m = n1 + m1 + m2 + m3 + m4 . In the power system state estimation, measurements .y(k) can be available from the PMUs like [155]. The inclusion of a prediction stage for estimation permits a better filtering of measurement noises so that the estimation results can be improved [152]. With the Holt’s two-parameter linear exponential smoothing technique [153, 155–157], .f(·) in general takes the form: f(x(k − 1)) = s(k − 1) + u(k − 1),
.
(6.69)
where: s(k − 1) = τ x(k − 1) + (1 − τ )x(k − 1)−.
(6.70)
.
u(k − 1) = η (s(k − 1) − s(k − 2)) + (1 − η )u(k − 2).
(6.71)
Here, .τ and .η are constants between 0 and .1.0, and .x(k − 1)− is the predicted state vector at the .(k − 1)-th instant [153, 155]. The state forecasting function (6.69) is used in (6.67) to predict the state vector in advance when the state prediction of Kalman filtering is executed. The effectiveness of the Holt’s method can be found in [155, 157]. With the standard real power and reactive power balance as well as power flow equations [153, 156], .h(·) takes the form: Ta = |Va |
N
.
|Vb |(Gab cosϕab + Bab sinϕab ).
(6.72)
|Vb |(Gab sinϕab − Bab cosϕab ).
(6.73)
b=1
Za = |Va |
N b=1
f
Tab = Va2 (gga + gab ) − |Va ||Vb |(gab cosϕab + bab sinϕab ). f Zab
(6.74)
= −Va2 (bga + bab ) −|Va ||Vb |(gab sinϕab − bab cosϕab ), ‘
(6.75)
6.4 Unscented Kalman Filter with Generalized Correntropy Loss (GCL-UKF)
167
where .Va is the voltage magnitude at bus a, .ϕab is the voltage angle between buses a and b, .Ta and .Za are the real power injection and reactive power injection at bus f f a, .Tab and .Zab are the real power flow and reactive power flow between buses a and b, .Gab + j Bab is the ab-th element of the complex bus admittance matrix with real part .Gab and imaginary part .Bab , .gab + j bab is the admittance of the series branch between buses a and b, and .gga + j bga is the admittance of the shunt branch at bus a [153, 156].
6.4.4 Illustrative Example The performance of the GCL-UKF and EnGCL-UKF methods is evaluated in the case study on the IEEE 14 bus, 30 bus, and 57 bus test systems, respectively [158]. To investigate the effectiveness and the robustness, the three test scenarios including non-Gaussian noise with outliers in measurements, abnormal situation of measurement gross error, and sudden load change condition are mainly considered in the following simulations. If not mentioned otherwise, the process noises are all assumed to be a Gaussian distribution with zero mean and variance 0.01. In the following experiments, 50 time-sample intervals are set to simulate the dynamic changes of power systems. The EKF, UKF, EnEKF, and EnUKF are included for comparisons. Test results are presented in terms of the performance index .J (in p.u) of the overall measurement and the mean absolute error (MAE) of the phase angle and amplitude of each node voltage [152]. All the results are obtained by averaging over 200 independent Monte Carlo runs to achieve more reliable simulation results and higher statistical significance. All the tests are executed with MATLAB 2016b running on i7-4510U and 2.60 GHZ CPU.
6.4.4.1
Case 1: Non-Gaussian Noise with Outliers in Measurements
First, we consider the case with mixed-Gaussian noises in measurements generated by (1 − a)N(μ1 , v12 ) + aN(μ2 , v22 ),
.
(6.76)
where .N(μi , vi2 ) (i = 1, 2) denotes the Gaussian distributions with mean values 2 .μi (i = 1, 2) and variances .σ i (i = 1, 2), and a is the mixture coefficient. The Gaussian distribution with a larger variance can model stronger impulsive noises (outliers). In this simulation, the parameter value .(μ1 , μ2 , v12 , v22 , a) of the mixed-Gaussian distribution is set at .(0, 0, 1, 80, 0.15) to model the measurement noises. The initial state error covariance matrix .P0 is set to .10−4 I. The free parameter .α is set as 1.6 and the kernel size parameter .λ = 0.0019. The overall performance indexes for the different test systems at each time instant k are given
168 Fig. 6.7 Performance index p.u) under mixed-Gaussian measurement noise (From [29]). (a) IEEE 14 bus test system. (b) IEEE 30 bus test system. (c) IEEE 57 bus test system
6 Unscented Kalman Filter Under Information Theoretic Criteria 0.6 EKF EnEKF UKF EnUKF GCL-UKF EnGCL-UKF
.J (in
0.5
J(in p.u)
0.4
0.3
0.2
0.1
0 5
10
15
20
25
30
35
40
45
50
time,i
(a) 0.6 EKF EnEKF UKF EnUKF GCL-UKF EnGCL-UKF
0.5
J(in p.u)
0.4
0.3
0.2
0.1
0 5
10
15
20
25
30
35
40
45
50
time,i
(b) 0.6 EKF EnEKF UKF EnUKF GCL-UKF EnGCL-UKF
0.5
J(in p.u.)
0.4
0.3
0.2
0.1
0 5
10
15
20
25
30
35
40
45
50
time,i
(c)
in Fig. 6.7a–c. It is observed that (1) GCL-UKF performs better than EKF and UKF due to the enhancement by GCL and (2) the performance of the EnGCLUKF algorithm has significant advantages over other methods in any test system.
6.4 Unscented Kalman Filter with Generalized Correntropy Loss (GCL-UKF) Fig. 6.8 Overall performance of EKF, UKF, GCL-UKF, and EnGCL-UKF (From [29])
169
0.45 EKF UKF EnEKF EnUKF GCL-UKF EnGCL-UKF
0.4 0.35
J(in p.u)
0.3 0.25 0.2 0.15 0.1 0.05 0 14
30
57
Bus
Such results illustrate that the EnGCL-UKF can achieve higher filtering capacities and more robust performance during slow dynamic changes under the non-Gaussian measurement noises, because the absolute exponential form can restrain the change in the innovation vector. Since PMU measurement errors of real and reactive power flows calculated from voltage and phasors may follow Laplace distribution with outliers, the real and reactive power measurement noises generated by the mixed-Gaussian and Laplace distributions represented by (6.77) are considered in this case to further verify the good performance of the GCL-UKF and EnGCL-UKF. We let the noises be presented as: 0.95N(0, 100) + 0.15L(0, 1),
.
(6.77)
where .L(0, 1) denotes the noise drawn from the Laplace distribution with zero mean and scale 1. The average value of the overall performance index under different test systems is shown in Fig. 6.8. It is observed from the figure that the EnGCL-UKF performs better than other methods under this case in all test systems. Figures 6.9 and 6.10 present the absolute error results of voltage amplitude and phase angle at the bus 3 in IEEE 14 bus test system. It is clear that results obtained by the EnGCLUKF are superior to those obtained by other methods. In addition, the MAE and maximum values of absolute voltage amplitude and phase angle errors (denoted as .MAEθ , .MaxEθ , .MaxEv , and .MAEv ) at bus 3 in IEEE 30-test system are given in Table 6.7. From this result, one can obtain the same conclusion.
6.4.4.2
Case 2: Bad Data Condition
The robustness of GCL-UKF and EnGCL-UKF is further tested in the scenario where bad data are included in the measurement. In this case, the real and the reactive power measurements are interrupted by some outliers which are modeled
6 Unscented Kalman Filter Under Information Theoretic Criteria
Fig. 6.9 Absolute amplitude error of estimating the voltage amplitude at bus 3 in IEEE 14 bus system (From [29])
8
Absolute error (voltage amplitude)
170
10-3 EKF UKF EnEKF EnUKF GCL-UKF EnGCL-UKF
7 6 5 4 3 2 1 0 5
10
15
20
25
30
35
40
45
50
time,i 0.014
Absolute error (voltage phase)
Fig. 6.10 Absolute amplitude error of estimating voltage phase angle at bus 3 in IEEE 14 bus system (From [29])
EKF UKF EnEKF EnUKF GCL-UKF EnGCL-UKF
0.012 0.01 0.008 0.006 0.004 0.002 0 5
10
15
20
25
30
35
40
45
50
time,i
Table 6.7 MAE and maximum results of voltage amplitude and phase angle (From [29]) Index .MAE θ .MaxE θ .MAE v .MaxE v
EKF 0.0103 0.0131 0.0057 0.0071
UKF 0.0092 0.0115 0.0041 0.0049
EnEKF 0.0064 0.0083 0.0036 0.0044
EnUKF 0.0058 0.0073 0.0032 0.0043
GCL-UKF 0.0044 0.0056 0.0029 0.0037
EnGCL-UKF 0.0039 0.0049 0.0023 0.0027
by expanding 20 at time index .i = 25. This test is only conducted in the IEEE 30 bus system. Figure 6.11 presents the overall performance index for all time samples considered. From such results, we see that the GCL-UKF and EnGCL-UKF perform better than EKF and UKF, thanks to the robust performance of the GCL. Another observation we can make from these results is that the EnGCL-UKF still achieves best performance in comparison to GCL-UKF, EnEKF, and EnUKF. It demonstrates that the enhanced insensitivity of EnGCL-UKF partly comes from the GCL and more significantly from the exponential updating mechanism in (6.63).
6.4 Unscented Kalman Filter with Generalized Correntropy Loss (GCL-UKF) Fig. 6.11 Overall performance in IEEE 30 bus with large outliers at .i = 25 (From [29])
171
EKF EnEKF UKF EnUKF GCL-UKF EnGCL-UKF
1
J(in p.u)
0.8
0.6
0.4
0.2
0 5
10
15
20
25
30
35
40
45
50
30
35
40
45
50
time,i 0.025
Absolute error (angle phase)
Fig. 6.12 Absolute voltage angle error at bus 3 in IEEE 30 bus test system (From [29])
EKF UKF EnEKF EnUKF GCL-UKF EnGCL-UKF
0.02
0.015
0.01
0.005
0 5
10
15
20
25
time,i
6.4.4.3
Case 3: Sudden Load Change Condition
In this case, a sudden load-changing scenario is simulated to investigate the robustness of GCL-UKF and EnGCL-UKF. The estimation of states at bus 3 in IEEE 30 bus test system at the 30-th time sample is taken as an example, where the power of load at bus 3 has a .20 load drop at that time. The measurement noises are still generated by mixed-Gaussian model. The absolute voltage angle and amplitude error results at bus 3 are presented in Figs. 6.12 and 6.13. One can observe that the EKF, UKF, EnEKF, and EnUKF show obvious fluctuations at the time sample when the sudden load change occurs. The performance of EnGCL-UKF algorithm is desirable, and this is expected as it restrains the bad effects of load sudden change by the exponential attenuation term in the state update. The estimation accuracy thus can be effectively improved.
6 Unscented Kalman Filter Under Information Theoretic Criteria
Fig. 6.13 Absolute voltage amplitude error at bus 3 in IEEE 30 bus test system (From [29])
0.08
Absolute error(voltage amplitude)
172
EKF UKF EnEKF EnUKF GCL-UKF EnGCL-UKF
0.07 0.06 0.05 0.04 0.03 0.02 0.01 0 5
10
15
20
25
time,i
30
35
40
45
50
6.5 Minimum Error Entropy Unscented Kalman Filter (MEE-UKF) In the following, the MEE-UKF will be presented. To develop the MEE-UKF, an augmented model is constructed with the statistical linearization approach, such that the state and measurement errors are involved in cost function simultaneously [29]. Instead of MMSE, MCC, and GMC, the MEE is used as the optimality criterion to derive MEE-UKF, in which a fixed-point iteration algorithm and a recursive method are used to online update the posterior estimates and covariance matrix. An important feature of this algorithm is the fact that it can deal with more complex nonGaussian disturbances especially for multimodal distributions. This improvement in performance, however, is achieved at the expense of an increase in computational complexity.
6.5.1 MEE-UKF Algorithm A regression model involving measurement equation (6.2) and state prediction error is constructed by: .
xˆ (k|k − 1) x(k) = + μ(k), ¯ y(k) h(x(k))
(6.78)
with .μ(k) ¯ ∈ Rn+m formulated by arraying the state prediction error .(k|k − 1) = ¯ = [−(k|k −1)T r(k)T ]T . x(k)− xˆ (k|k −1) and measurement noise .r(k), i.e., .μ(k) The batch regression model in (6.78) is not ideal since it makes the optimization of the hidden state .x(k) from .h(x(k)) difficult. An alternate approach is to linearize the
6.5 Minimum Error Entropy Unscented Kalman Filter (MEE-UKF)
173
nonlinear function by its first-order Taylor series expansion. Unfortunately, the firstorder Taylor series expansion causes large residual error in comparison with highorder Taylor series expansion, but the latter is hard or even impossible to calculate. The statistical linearization in [29] is preferable for approximating a nonlinear function accurately. The properties of statistical linearization can be found in [29]. Based on (6.3)–(6.9), applying the statistical linearization to (6.2) yields a linearized measurement function: y(k) = yˆ (k|k − 1) + S(k)(k|k − 1) + r(k) + ν(k),
.
(6.79)
where the linearized matrix S(k) relates to the covariance matrix of the state estimation error .Pxx (k|k − 1) and cross-covariance matrix .Pxy (k|k − 1): T S(k) = P−1 (k|k − 1)P (k|k − 1) . xy xx
.
(6.80)
Apart from the linearized matrix .S(k), the statistical linearization error .ν(k) is a major difference in comparison with the traditional linearization approach. The statistical linearization error .ν(k) has an essential role in compensating the error caused by linearization and is therefore referred to as compensation factor. .(r(k) + ν(k)) is the statistical linearization error with covariance .Γ (k) = Pyy (k|k − 1) − −1 T P (k|k − 1)Pxy (k|k − 1) Pxy (k|k − 1) [29]. Combining the above .(k|k − 1) with the measurement equation (6.79), one can obtain the augmented model: .
xˆ (k|k − 1) I = n x(k) + μ (k), y(k) − yˆ (k|k − 1) + S(k)ˆx(k|k − 1) H(k)
(6.81)
where .In denotes an .n × n identity matrix and: μ (k) =
.
−(k|k − 1) . r(k) + ν(k)
(6.82)
When .E μ (k)μ (k)T is positive definite, we have: E μ(k)μ (k)T = (k) (k)T P(k|k − 1) 0 = 0 R(k) + Γ (k) . p (k|k − 1)p (k|k − 1)T 0 = , 0 r (k)r (k)T
(6.83)
where .E (r(k) + ν(k))(r(k) + ν(k))T = R(k) + Γ (k); . (k), .p (k|k − 1) and T .r (k) are obtained by the Cholesky decomposition of .E μ (k)μ (k) , .P(k|k − 1) and .R(k), respectively.
174
6 Unscented Kalman Filter Under Information Theoretic Criteria
Multiplying . (k)−1 on both sides of Eq. (6.81), one can obtain a normalized regression equation: d(k) = W(k)x(k) + e (k),
(6.84)
.
xˆ (k|k − 1) where .d(k) = (k)−1 , .W(k) y(k) − yˆ (k|k − 1) + S(k)ˆx(k|k − 1)
I (k)−1 n , and .e (k) = (k)−1 μ (k). H(k) Under the MEE, the optimal solution of .xˆ (k) can be solved by: xˆ (k) = arg max JMEE−UKF (x(k)) = arg max
.
x(k)
x(k)
=
L L 1 Gσ [ej (k) − ei (k)]. L2 i=1 j =1
(6.85) Setting the gradient of the cost function .JMEE−UKF (x(k)) regarding .x(k) to zero, we have the solution:
L L Gσ2 [ej (k) − ei (k)]1 ˆ (k) = .x
−1
i=1 j =1
L L i=1 j =1
Gσ2 [ej (k) − ei (k)]2
−1 W(k)T (k)d(k) , = W(k)T (k)W(k)
(6.86)
where .1 = [wi (k) − wj (k)][wi (k) − wj (k)]T ; .2 = [di (k) − dj (k)][wi (k) − wj (k)]T : (k) = (k) − (k) =
.
x (k) yx (k) xy (k) y (k)
,
(6.87)
with the forms of . (k) ∈ RL×L , . x (k) ∈ Rn×n , . xy (k) ∈ Rm×n , . yx (k) ∈ Rn×m , and . y (k) ∈ Rm×m same as those in MEE-KF. Sometimes, we can use . (k) + I to replace . (k) to avoid numerical problem in the matrix inversion. The realization of MEE-UKF is summarized as Algorithm 10. Remark 6.5 Since .E e (k)(e (k))T = IL , the residual error .e (k) is white. The global minimum of MEE-UKF is achieved when all residual errors identically are equal to 0 [113, 143]. A fixed-point iteration algorithm and the augmented model with a statistical linearization approach contribute to the performance improvement of MEE-UKF. The threshold .ε provides a stop condition for the fixed-point iteration.
6.5 Minimum Error Entropy Unscented Kalman Filter (MEE-UKF)
175
Algorithm 10 Minimum error entropy unscented Kalman filter (MEE-UKF) [126] Step 1: Construct the state space model for power system Step 2: Initialize xˆ (1|0) and P(1|0), and set an appropriate kernel size σ and a small positive threshold ε Step 3: Use the unscented transformation to generate sigma points and corresponding weights; use Eqs. (6.3) and (6.4) to obtain xˆ (k|k − 1) and P(k|k − 1), respectively Step 4: Use Eqs. (6.6)–(6.8) to compute yˆ (k|k − 1), Pyy (k|k − 1) and Pxy (k|k − 1); construct the augmented model; use the Cholesky decomposition of (k) to obtain p (k|k − 1), and r (k); calculate d(k) and W(k), respectively Step 5: Let τ = 1 and xˆ (k)0 = xˆ (k|k − 1) Step 6: xˆ (k) can be estimated by a fixed-point iteration algorithm: − h(ˆx(k|k − 1))] xˆ (k)τ = xˆ (k|k − 1) + K(k)[y(k)
(6.88)
with: ei (k) = di (k) − w(k)ˆx(k)τ −1 x (k) yx (k) = (k) − (k) = (k) xy (k) y (k)
(6.89) (6.90)
T −1 x (k) p (k|k − 1) P(k|k − 1) = (p (k|k − 1))−1
(6.91)
T −1 xy (k) p (k|k − 1) Pxy (k|k − 1) = (r (k))−1
(6.92)
T yx (k) r (k) −1 Pyx (k|k − 1) = (p (k|k − 1))−1
(6.93)
T y (k) r (k) −1 R(k) = (r (k))−1
(6.94)
= K(k)
T P(k|k − 1) + S (k)PTxy (k|k − 1) + Pyx (k|k − 1) + S (k) R(k) S(k) × Pyx (k|k − 1) + ST (k) R(k) ,
−1
(6.95)
where xˆ (k)τ is the estimated state xˆ (k) at the τ -th fixed-point iteration ||ˆx(k)τ −ˆx(k)τ −1 || Step 7: Compare xˆ (k)τ and xˆ (k)τ −1 . If ≤ ε holds, set xˆ (k) = xˆ (k)τ and continue ||ˆx(k)τ −1 || to Step 8. Otherwise, τ + 1 → τ , and return to Step 6 Step 8: Calculate the posterior error covariance matrix by: T P(k) = I − K(k)S(k) P(k|k − 1) I − K(k)S(k) T. + K(k)R(k) K(k) Update k + 1 → k and return to Step 3
(6.96)
176
6 Unscented Kalman Filter Under Information Theoretic Criteria
The MEE-UKF retains the merit of MEE in dealing with multimodal distribution noises and large outliers. On one hand, through Parzen’s PDF estimator and Gaussian kernel function, the MEE has a strong modeling capability for error distribution especially for multimodal distributions. On the other hand, as the large outliers will result in abnormal .e (i) and .d(i) in .2 , the MEE can reduce the influence of outliers via the Gaussian kernel function .κσ [ej (k) − ei (k)] [143]. Remark 6.6 The double summation in MEE cost is a challenge when deriving the recursive MEE-UKF. The constructed matrices . and ., fortunately, benefit the derivation of the recursive MEE-UKF, which is completely different from the MCUF. In contrast with the MCUF based on a linearized approximation method, the MEE-UKF is developed in a more rigorous and accurate form due to the introduction of the statistical linearization error .ν(k). Remark 6.7 The estimation accuracy of the MEE-UKF is greatly affected by the kernel size .σ . A too small kernel size may result in filter divergence [103]. In addition, a too large kernel size also weakens the robustness of the MEE-UKF since errors in the presence of outliers cannot be suppressed by the Gaussian Kernel function. Remark 6.8 The statistical linearization in [29] is applied for yielding a linearized measurement function shown in (6.79). The traditional linearization technique may cause large residual errors for systems with strong nonlinearity since the nonlinear function is approximated crudely by a lower-order Taylor series expansion. The high-order Taylor series expansion is not recommended since a high-order Jacobian matrix is hard to estimate. By contrast, the statistical linearization includes a compensation factor or statistical linearization error .ν(k) for improving approximation accuracy.
6.5.2 Illustrative Example The filtering performance of the MEE-UKF is demonstrated in dynamic study of IEEE test systems [158] including IEEE 14 bus system, IEEE 30 bus system, and IEEE 57 bus system at different types of non-Gaussian noises. The detailed data of these IEEE test systems can be found at [158]. Nonlinear Kalman filters such as the EKF [18], UKF [154], HUKF [83], and MCUKF [119] are included for making a comparison with the MEE-UKF on the platform of MATLAB 2016b running on i5-4590 and .3.30 GHZ CPU. Table 6.8 provides parameter settings relevant to the state space model with m measurements (.m1 real power injection, .m2 reactive power injection, .m3 real power flow and .m4 reactive power flow), n states (voltage magnitude .n1 , and voltage angle .n2 ), lines l, and sampling N . Several performance metrics are introduced for evaluating the filtering precision of nonlinear Kalman filters from different perspectives:
6.5 Minimum Error Entropy Unscented Kalman Filter (MEE-UKF) Table 6.8 Parameter settings for IEEE 14, 30, 57 bus test systems (From [126])
Parameter m n l N .n1 .n2 .m1 .m2 .m3 .m4
177 14 bus 96 27 20 200 13 14 14 14 27 27
30 bus 172 59 41 200 29 30 30 30 41 41
57 bus 331 113 80 200 56 57 57 57 80 80
(1) The mean absolute error (MAE) measures the average magnitude of the errors: n 1 xˆ i (k) − xi (k) .MAE(k) = , |xi (k)| + ς n
(6.97)
i=1
where the small number .ς is included for avoiding numerical instability and xˆ i (k) and .xi (k) represent the ith estimated and actual states at k-th time sample, respectively. (2) The averaging MAE (AMAE) describes the steady-state MAE, calculated by averaging filtering errors over the last .No iterations: .
1 .AMAE = nNo
N
n xˆ i (k) − xi (k)
k=N−No +1 i=1
|xi (k)| + ς
.
(6.98)
(3) The mean absolute error of the voltage magnitude error (pu) and voltage angle error (degree): n1 1 ˆ .MAEV (k) = Vi (k) − Vi (k). n1
(6.99)
n2 1 ϕˆ i (k) − ϕ i (k), n2
(6.100)
i=1
MAEϕ (k) =
i=1
ˆ i (k) and .Vi (k) denote the estimated and actual voltage magnitudes; where .V ϕˆ i (k) and .ϕ i (k) represent the estimated and actual voltage angles, respectively. Define the maximum absolute error of the voltage magnitude error (pu) and voltage angle error (degree) by:
.
MMEV (k) = max
.
Vi (k)
ˆ Vi (k) − Vi (k).
(6.101)
178
6 Unscented Kalman Filter Under Information Theoretic Criteria
MMEϕ (k) = max
ϕ i (k)
ϕˆ i (k) − ϕ i (k).
(6.102)
All results are obtained by averaging over 200 independent Monte Carlo runs. The practical power system may suffer from abnormal process and measurement gross error, yielding unknown noise statistics. To validate the filtering effectiveness and robustness of the MEE-UKF, we consider the following scenario: the process and measurement noises are corrupted by non-Gaussian noise with random outliers and outliers located at a specific discrete time. In reference [29], the Pacific Northwest National Laboratory provides the PMU data about error distributions of the phasor voltage angle and magnitude, which shows that the PMU measurement error potentially follows multimodal or bimodal distributions. As a common case, the bimodal distribution noise is considered. The random scenario includes the Gaussian noise with random outliers and bimodal Gaussian mixture noise with random outliers. The specific scenario considers sudden load change condition and bad measurement data at a specific discrete time. Different approaches can be adopted for the calculation of the noise variance matrices. A straightforward approach is about calculating the variance matrices .Q and .R according to the statistical characteristic shown in [119]. However, the overall statistical characteristic of noise in real world may be unavailable. This motivates us to use another method where the noise variance matrices .Q and .R are determined by the principal component of the mixture noise for fair comparison [30]. It is in general easier to approximate the principal component of noise through measurement errors than to obtain the overall statistical characteristics of noise. In the experiments, 200 timesample intervals are set to simulate the dynamic changes of power systems.
6.5.2.1
Gaussian Noise with Random Outliers in Process and Measurement
This scenario considers that the power system is corrupted by Gaussian noise with random outliers. The process noise .q(k) and measurement noise .r(k) in (6.67) and (6.68) are assumed to obey .q(k) ∼ 0.1N(0, 0.05) + 0.9N(0, 10−5 ) and −5 ) .r(k) ∼ 0.1N(0, 20)+0.9N(0, 0.3). In the mixture distribution noise, .0.9N(0, 10 or .0.9N(0, 0.3) are referred as principal components. In addition, secondary components .0.1N(0, 0.05) or .0.1N(0, 20) are considered for modeling impulsive noises (outliers) such as random unexpected changes occurring on the states and measurements. All nonlinear Kalman filters are initialized by .P(1|0) = 10−3 I. For the UKF, HUKF, MCUKF, and MEE-UKF, the unscented transformation parameters are set as .α = 0.01, .β = 1, and .κ = 0. The kernel size .σ .α , .β , and .κ for MCUKF and MEE-UKF are .5.0 and .1.5, respectively. The threshold factor .ε for MEE-UKF is set as .ε = 10−6 . The small number .ς in MAE is .10−10 . The estimated values of .τ and .η for the IEEE test system are obtained as .1.0 and .0.5, respectively. All parameters are chosen in the principle of making nonlinear
6.5 Minimum Error Entropy Unscented Kalman Filter (MEE-UKF) -3
10 4
MAE(14-bus)
Fig. 6.14 MAE of nonlinear Kalman filters under Gaussian noise with random outliers, where .Q and .R are calculated by statistical characteristic (From [126])
179
3 2
EKF UKF HUKF MCUKF MEE-UKF
1 0
0
50
150
200
(a)
10-3
5
100
time k
MAE(30-bus)
4 3 2
EKF UKF HUKF MCUKF MEE-UKF
1 0
0
50
100
time k
150
200
(b)
-3
10
4
EKF UKF HUKF MCUKF MEE-UKF
MAE(57-bus)
3 2 1 0
0
50
100
time k
150
200
(c)
filters achieve a balanced filtering performance in terms of estimation accuracy and numerical stability. Figure 6.14 shows the instant MAE of nonlinear filters for case A, i.e., Gaussian noise with random outliers, where .Q and .R are calculated by statistical characteristic of noise. From Fig. 6.14, we observe:
180
6 Unscented Kalman Filter Under Information Theoretic Criteria
1. The UKF is usually better at dealing with a nonlinear system especially accompanying with strong nonlinearity in comparison with the EKF. The state space model given in (6.69) and (6.75) has mild nonlinearity, and this leads to almost similar filtering accuracy of the EKF and UKF in Fig. 6.14b–c. Since the nonlinearity of the system may be enhanced due to random additive highamplitude outliers, the UKF performs better in the dynamic study than the EKF as shown in Fig. 6.14a. 2. Compared with EKF and UKF based on MMSE, the robust Kalman filters including MCUKF, HUKF, and MEE-UKF achieve higher estimation accuracy. The HUKF is not very practical for dealing with non-Gaussian noises since the Huber function utilizes a threshold factor to detect the abnormal error, which is unknown a priori. By contrast, the MCUKF suppresses outliers in a soft manner based on the correntropy concept, yielding higher precision than that of the HUKF. 3. MEE-UKF achieves superior filtering accuracy when the process and measurement noises display a heavy-tailed distribution. Since the process and measurement errors are all included in MEE via the augmented model, the MEEUKF can process the predictions and observations simultaneously. The MEE inherits the nature of maximum correntropy and therefore suppresses outliers efficiently. The noise covariance in real world is unavailable, which means that the method of calculating .Q and .R in Fig. 6.14 is not applicable in practice. As a result, the noise covariance is set by the second method, i.e., principal component of the mixture noise, which is used for initializing .Q and .R. All nonlinear filters are performed under the same initialized noise covariance for fair comparison. Figure 6.15 gives the .AMAE results, where .No = 80 is used in this case. It can be observed that: 1. UKF achieves slightly higher filtering accuracy than EKF in IEEE 57 bus system and almost similar precision as the EKF in IEEE 14 and 30 bus systems. MCUKF
8 6
AMAE
Fig. 6.15 AMAE of nonlinear Kalman filters under Gaussian noise with random outliers, where −5 .Q = 10 I and .R = 0.3I (From [126])
10-3 EKF UKF HUKF MCUKF MEE-UKF
4 2 0
14
30
Bus
57
6.5 Minimum Error Entropy Unscented Kalman Filter (MEE-UKF)
181
and HUKF still behave better than EKF and UKF because of the use of robust criterion when .Q and .R deviate from overall statistical characteristics. 2. The performance of the MEE-UKF has significant advantages over other methods in any test system. When the covariance matrix is initialized by the principal component of noise, the MEE-UKF in fact updates .Q and .R in real time in a weighted form, which can contribute to the performance improvement in practice. The first method of calculating .Q and .R seems more reasonable than the second method; this, however, does not mean that it is always beneficial for yielding a more accurate estimator due to the inaccuracy of estimating nonpositive covariance matrix caused by outliers or bad data. The MEE-UKF is less affected by the two ways of selection covariance matrix in comparison with the UKF and EKF which are sensitive to noise covariance matrices. This in fact can be linked to the adaptive update of .Q and .R in MEE-UKF. To further evaluate the transient state performance, Table 6.9 gives the .MMEV , .MMEϕ , .MAEV , and .MAEϕ of several filters at sample time .k = 80. The settings of parameters and noises are the same as those in Fig. 6.14. These results shown in Figs. 6.14, 6.15, and Table 6.9 indicate that the MEE-UKF can achieve higher filtering capacities and more robust performance during slow dynamic changes when the process and measurement are disturbed by Gaussian noise with random outliers. Table 6.9 Estimation results of different algorithms at .K = 80 for case A (From [126])
Algorithm EKF
UKF
HUKF
MCUKF
MEE-UKF
Bus 14 bus 30 bus 57 bus 14 bus 30 bus 57 bus 14 bus 30 bus 57 bus 14 bus 30 bus 57 bus 14 bus 30 bus 57 bus
.MMEV
.MMEϕ
.MAEV
.0.0166
.0.0299
.0.0061
.MAEϕ .0.0113
.0.0294
.0.0337
.0.0072
.0.0107 .0.0069
.0.0161
.0.0266
.0.0045
.0.0163
.0.0321
.0.0060
.0.0115
.0.0291
.0.0324
.0.0071
.0.0106 .0.0068
.0.0161
.0.0264
.0.0046
.0.0147
.0.0298
.0.0058
.0.0109
.0.0150
.0.0428
.0.0054
.0.0113 .0.0058
.0.0137
.0.0220
.0.0041
.0.0096
.0.0236
.0.0046
.0.0091
.0.0142
.0.0358
.0.0049
.0.0100 .0.0043
.0.0101
.0.0161
.0.0033
.0.0104
.0.0138
.0.0037
.0.0071
.0.0208
.0.0284
.0.0049
.0.0073
.0.0076
.0.0087
.0.0023
.0.0025
Bold values highlight the best results
182
6.5.2.2
6 Unscented Kalman Filter Under Information Theoretic Criteria
Bimodal Gaussian Mixture Noise with Random Outliers in Process and Measurement
Since unexpected events may occur in power systems such as load variations and instrument failures, these undesired events can be modeled as a bimodal distribution [29, 30]. The bimodal noise distribution also has the potential in modeling varying heavy-tailed characteristics. In this scenario, process and measurement noises are assumed to obey the symmetrical bimodal distribution, i.e., .q(k) ∼ 0.5N(0.2, 10−4 )+0.5N(−0.2, 10−4 ) and .r(k) ∼ 0.5N(0.2, 0.3)+0.5N(−0.2, 0.3), respectively. Nonlinear Kalman filters are initialized by .P(1|0) = 10−3 I. The unscented transformation parameters .α , .β , and .κ are set as the same as those in Fig. 6.14. The kernel sizes .σ for the MCUKF and MEE-UKF are .5.0 and .1.5, respectively. The MEE-UKF sets the threshold parameter in MEE-UKF .ε = 10−6 for stopping the adaptation. Figures 6.16 and 6.17 describe the MAE and AMAE of nonlinear filters respectively. We can see: 1. EKF performs the worst filtering accuracy due to the linearization of a nonlinear function in Fig. 6.16 and IEEE 14 and 57 bus system of Fig. 6.17. 2. Figure 6.16 shows the improved filtering performance of the MCUKF in comparison with the HUKF since the MCUKF introduces the MCC for softly dealing with a bimodal Gaussian mixture noise. 3. MEE-UKF shows the potential to deal with the bimodal distribution due to the superiority of the MEE, i.e., strong ability of multimodal modeling and outlier suppression. As a result, the MEE-UKF behaves the best estimation precision among the nonlinear Kalman filters. 4. With different approaches of setting .Q and .R, the performance of MEE-UKF is superior to that of other filters. Consider further that the power system is disturbed by bimodal Gaussian mixture noises with random outliers, i.e., .q(k) ∼ 0.4N(0.2, 10−4 ) + 0.2N(0, 10−2 ) + 0.4N(−0.2, 10−4 ) and .r(k) ∼ 0.4N(0.2, 0.3) + 0.2N(0, 20) + 0.4N(−0.2, 0.3). In the multimodal distribution, the Gaussian distribution with a large variance and small probability like .0.2N(0, 10−2 ) and .0.2N(0, 20) can create impulsive noises. Figures 6.18 and 6.19, respectively, show the MAE and AMAE comparison among the nonlinear filters with bimodal Gaussian mixture noises with random outliers. In most cases, the UKF slightly outperforms the EKF. The MEE-UKF still achieves the best estimation performance among all compared algorithms.
6.5.2.3
Sudden State Change and Bad Measurement Data
Apart from the random outliers in Fig. 6.17, this section studies the filtering robustness in the presence of an undesired event located at specific discrete time [121, 154]. All nonlinear Kalman filters perform the state estimation in the IEEE 14 bus system. The settings of parameter and noises are the same as those in Fig. 6.18a:
6.5 Minimum Error Entropy Unscented Kalman Filter (MEE-UKF)
0.03
EKF UKF HUKF MCUKF MEE-UKF
0.025
MAE(14-bus)
Fig. 6.16 MAE of nonlinear Kalman filters under bimodal Gaussian mixture noise, where .Q and .R are calculated by statistical characteristic (From [126]). (a) IEEE 14 bus test system, (b) IEEE 30 bus test system, (c) IEEE 57 bus test system
183
0.02
0.015 0.01
0.005 0
0
50
100
time k
150
200
(a)
EKF UKF HUKF MCUKF MEE-UKF
0.06
MAE(30-bus)
0.05 0.04 0.03 0.02 0.01 0
0
50
100
time k
200
(b)
0.03
EKF UKF HUKF MCUKF MEE-UKF
0.025
MAE(57-bus)
150
0.02
0.015 0.01
0.005 0
0
50
100
time k
150
200
(c)
1. First, we consider the case of sudden load changes to investigate the robustness of the MEE-UKF. Bus 18 has a sudden change at .k = 80 where the power of load has a .20% drop at this time. Figure 6.20 compares the filtering precision of all nonlinear Kalman filters under this undesired event. From Fig. 6.20, we see that both the UKF and EKF have low estimation accuracy when the sudden
6 Unscented Kalman Filter Under Information Theoretic Criteria
Fig. 6.17 AMAE of nonlinear Kalman filters under bimodal Gaussian mixture noise, where −4 .Q = 10 I and .R = 0.3I (From [126])
0.012
EKF UKF HUKF MCUKF MEE-UKF
0.01 0.008
AMAE
184
0.006 0.004 0.002 0
14
30
Bus
57
load change occurs. This is because the EKF and UKF depended on the previous state estimation which is influenced by the sudden change. The HUKF, MCUKF, and MEE-UKF are little affected by the sudden load change due to the robust performance of the Huber function, MCC, and MEE acting on the abnormal error. The MEE-UKF performs better than HUKF and MCUKF since the weight matrices . x (k), . xy (k), . xy (k), . y (k) act on state prediction error covariance matrix, cross-covariance matrix, and measurement error covariance matrix to suppress the abnormal state and measurement. The MEE-UKF still achieves the best performance in comparison with HUKF and MCUKF no matter which method of calculating noise covariance matrix is adopted. It further demonstrates the enhanced insensitivity of the MEE-UKF for abnormal disturbance. 2. In addition, the robustness of MEE-UKF is further tested in the scenario where bad data are included in the measurement. The real and the reactive power measurements are reduced by .20% at .k = 100. Figure 6.21 provides the performance comparison of nonlinear Kalman filters in dealing with bad measurement data. It can be seen from Fig. 6.21 that the filtering precision of the EKF and UKF have a sharp decrease and obvious fluctuations. This shows that traditional Kalman filters indeed possess severe sensitivities to bad data. By contrast, MCUKF, HUKF, and MEE-UKF are all insensitive to this abnormal measurement since these robust filters have the potential in detecting and suppressing bad data. In comparison with the MCUKF and HUKF, the MEEUKF behaves the superior filtering precision in dealing with bad data.
6.5.2.4
Different Parameters of MEE-UKF
Figures 6.22 and 6.23 investigate the influences of kernel size .σ and initial state error covariance .P(1|0) on filtering accuracy for the IEEE 30 bus test system. In this scenario, the states and measurements are disturbed by bimodal Gaussian mixture
6.5 Minimum Error Entropy Unscented Kalman Filter (MEE-UKF)
0.03
EKF UKF HUKF MCUKF MEE-UKF
0.025
MAE(14-bus)
Fig. 6.18 MAE of nonlinear Kalman filters under bimodal Gaussian mixture noise with random outliers, where .Q and .R are calculated by statistical characteristic (From [126]). (a) IEEE 14 bus test system, (b) IEEE 30 bus test system, (c) IEEE 57 bus test system
185
0.02
0.015 0.01
0.005 0
0
50
100
time k
150
200
(a)
EKF UKF HUKF MCUKF MEE-UKF
MAE(30-bus)
0.03 0.02 0.01 0
0
50
100
time k
200
(b)
0.025
EKF UKF HUKF MCUKF MEE-UKF
0.02
MAE(57-bus)
150
0.015 0.01
0.005 0
0
50
100
time k
150
200
(c)
noises with outliers like that in Fig. 6.18. It is shown in Fig. 6.22 that the MEE-UKF suffers from performance degradation when the kernel size .σ is too small or too large. It is natural to think that a too small or too large kernel size .σ may make the minimum error entropy lose the ability to suppress outliers, since the minimum error entropy reduces to one or zero under very large or very small .σ . The MEE-
Fig. 6.19 AMAE of nonlinear Kalman filters under bimodal Gaussian mixture noise with random outliers, where .Q = 10−4 I and .R = 0.3I (From [126])
6 Unscented Kalman Filter Under Information Theoretic Criteria
0.012 0.01 0.008
AMAE
186
EKF UKF HUKF MCUKF MEE-UKF
0.006 0.004 0.002 0
30
57
Bus
0.01
EKF UKF HUKF MCUKF MEE-UKF
MAE(14-bus)
0.008 0.006 0.004 0.002 0
0
50
100
time k
150
200
(a)
0.05
EKF UKF HUKF MCUKF MEE-UKF
0.04
MAE(14-bus)
Fig. 6.20 MAE of nonlinear Kalman filters under sudden state change: (a) .Q = 10−4 I and .R = 0.3I; (b) .Q and .R are calculated by statistical characteristic (From [126])
14
0.03 0.02 0.01 0
0
50
100
time k
(b)
150
200
6.5 Minimum Error Entropy Unscented Kalman Filter (MEE-UKF)
0.006
EKF UKF HUKF MCUKF MEE-UKF
MAE(14-bus)
Fig. 6.21 MAE of nonlinear Kalman filters under bad measurement data: (a) −4 .Q = 10 I and .R = 0.3I; (b) .Q and .R are calculated by statistical characteristic (From [126])
187
0.004
0.002 0
50
100
time k
200
(a)
0.015
MAE(14-bus)
150
EKF UKF HUKF MCUKF MEE-UKF
0.01
0.005
0
0
50
100
time k
150
200
80
100
(b)
Fig. 6.22 Estimation results with different kernel sizes for IEEE 30 bus test system (From [126])
-3
10
MAE(30-bus)
6 4 2
=0.8 =1.0 =1.2 =1.5 =1.8 =2.2
MEE-UKF MEE-UKF MEE-UKF MEE-UKF MEE-UKF MEE-UKF
0
-2 0
20
40
60
time k
6 Unscented Kalman Filter Under Information Theoretic Criteria
Fig. 6.23 Estimation results with different initial state error covariances for IEEE 30 bus test system (From [126])
0.3
MEE-UKF P(1|0)=0.2I MEE-UKF P(1|0)=10-1I MEE-UKF P(1|0)=10-2I MEE-UKF P(1|0)=10-3I MEE-UKF P(1|0)=10-4I MEE-UKF P(1|0)=10-5I
MAE(30-bus)
188
0.2 0.1 0 0
Table 6.10 Average computing time (Sec) of the three filters for three test systems (From [126])
Bus 14 bus 30 bus 57 bus
EKF 0.038 0.101 0.352
50
time k UKF 0.202 1.286 7.075
HUKF 0.502 2.605 14.933
MCUKF 0.559 2.777 15.103
100
MEE-UKF 1.768 12.348 80.305
UKF achieves the highest filtering precision at around .σ = 1.0. From Fig. 6.23, the estimation performance (MAE) of MEE-UKF changes slightly despite a large difference of the convergence when the initial state error covariance is less than −1 I. .10
6.5.2.5
Computational Complexity
The average computing time of the EKF, UKF, HUKF, MCUKF, and MEE-UKF in the above three test systems is summarized in Table 6.10. It can be concluded that the EKF requires the least computing time, followed by the UKF, HUKF, and MCUKF. Due to the diagonal matrix calculation to improve the robustness, the HUKF, MCUKF, and MEE-UKF require more computing time. For a lowdimensional state, the computing time of HUKF, MCUKF, and MEE-UKF has no significant difference in terms of order of magnitude. Note that the MEEUKF requires more computing time than other algorithms determined by the size of matrix . (k). Therefore, the MEE-UKF brings a significant performance improvement in the complex non-Gaussian noises but at the cost of computational complexity. Actually, the computational complexity can be reduced via simplifying the calculation of matrix . (k) or sparsity techniques for large-scale power systems. To improve computational efficiency, the matrix . (k) can be kept constant during the consecutive time samples under normal operating conditions.
6.6 Conclusion
189
6.6 Conclusion In this chapter, two kinds of nonlinear Kalman-type filter based on maximum correntropy criterion (MCC) are discussed. The first filter is the maximum correntropy unscented filter (MCUF), in which the nonlinear measurement function is approximated by a statistical linearization method. According to this approximation, the MCC is used to update the state estimation. Simulation results demonstrate that the MCUF can achieve better performance compared with EKF, HEKF, UKF, and HUKF particularly when the underlying system is disturbed by some impulsive noises. The second is the maximum correntropy unscented Kalman filter (MCUKF), which is derived without linearization approximation. For simplicity, only the measurement error covariance is modified by the MCC, and then standard UKF is performed. Simulations also validate the robustness of MCUKF and higher estimation accuracy than EKF, HEKF, UKF, and NRUKF under heavy-tailed distribution noise. It is worth noting that both measurement and state error covariance in MCUK are modified by MCC, but a linear (statistical linear) approximation is required. Since no linearization approximation to the measurement function is included in the MCUKF, it provides robustness at the cost of computational burden, but the state error covariance is not amended. The MCUK and MCUKF have merits and demerits, respectively, and specific use can be applied according to specific basis. Benefiting from the high robustness and flexibility of the generalized correntropy loss (GCL), the GCL-UKF was derived to obtain the posterior state estimate, which manages to correctly estimate the state in the presence of Laplace noises with outliers. To further improve the estimation performance of the GCL-UKF, an exponential adjusting method is introduced to update the covariance matrix by adding all previous effect of the innovation vector in an absolute exponential form, which helps reduce the effect of gain. Numerical simulations have been conducted for power system state estimation on IEEE 14, 30, and 57 bus test systems, and satisfactory efficiency and robustness of the new methods have been verified. To deal with more complex distribution noise, the minimum error entropy unscented Kalman filter (MEE-UKF) is derived using the minimum error entropy (MEE) for parameter adaptation, a fixed-point iteration algorithm as the update method, and a statistical linearization approach to construct the augmented model. Simulations on IEEE 14, 30, and 57 bus test systems confirmed that the MEE-UKF algorithm can provide better estimate than the UKF and MCUKF especially when the process and measurement are disturbed by some complex disturbances, i.e., Gaussian noise with random outliers, bimodal Gaussian mixture noise with random outliers, bad measurement data, and sudden state change.
Chapter 7
Cubature Kalman Filtering Under Information Theoretic Criteria
In UKF, the unscented transformation (UT) sigma points are utilized to calculate the mean and covariance of a random variable after a nonlinear transformation using a deterministic set of sigma points [20, 26]. Hence, UKF provides higher-order approximation, which is more accurate than the linear approximate filters such as EKF [27]. However, the choice of scaling parameter in UKF is still a challenge in practice. To ease the burden on the selection of the scaling parameter, a nonlinear filtering algorithm named cubature Kalman filter (CKF) has been introduced, which is based on the spherical-radial cubature rule for numerical integration [24]. For a nonlinear system, the CKF can obtain good performance in Gaussian noises. Since by utilizing the maximum correntropy criterion (MCC) to improve the robust performance instead of traditional minimum mean square error (MMSE) criterion, the CKF and its square-root version based on MCC are provided in this chapter. The two algorithms are named as the maximum correntropy cubature Kalman filter (MCCKF) [122] and maximum correntropy square-root cubature Kalman filter (MCSCKF) [123]. The new filters not only retain the advantages of CKF but also exhibit robust performance against heavy-tailed non-Gaussian noises. However, when the dynamic system suffers from complex non-Gaussian disturbances, the estimates obtained by MCCKF may be obviously biased. To address this issue, the cubature Kalman filter under minimum error entropy with fiducial points (MEEFCKF) [127] is presented to improve the robustness against noises. Simulation results confirm that the MEEF-CKF can achieve high estimation accuracy and strong robustness in the complex non-Gaussian noises especially in large outliers and multimodal distribution noise.
7.1 Cubature Kalman Filter (CKF) Consider a nonlinear system in the following form: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Chen et al., Kalman Filtering Under Information Theoretic Criteria, https://doi.org/10.1007/978-3-031-33764-2_7
191
192
7 Cubature Kalman Filtering Under Information Theoretic Criteria
.
x(k) = f(x(k − 1)) + q(k − 1).
(7.1)
y(k) = h(x(k)) + r(k).
(7.2)
The CKF consists of the following two steps.
7.1.1 Prediction The predicted state .xˆ (k|k − 1) and associated covariance matrix .Pxx (k|k − 1) are calculated by 1 χ i (k|k − 1). 2n 2n
xˆ (k|k − 1) =
.
(7.3)
i=1
2n 1 χ i (k|k − 1) − xˆ (k|k − 1) T Pxx (k|k − 1) = 2n × χ i (k|k − 1) − xˆ (k|k − 1) + Q(k), i=1
(7.4)
in which .
χ i (k|k − 1) = f(ξ i (k − 1)) (i = 1, 2, . . . , 2n),
(7.5)
where cubature points .ξ i (k − 1) are generated by the third-degree spherical-radial rule.
7.1.2 Update The predicted measurement .yˆ (k|k − 1), measurement error covariance matrix Pyy (k|k − 1), and the state-measurement cross-covariance matrix .Pxy (k|k − 1) are calculated by
.
1 ˆ (k|k − 1) = .y γ i (k). 2n 2n
(7.6)
i=1
Pyy (k|k − 1) =
2n 1 γ i (k) − yˆ (k|k − 1) T . 2n ˆ (k) − y (k|k − 1) + R(k). × γ i i=1
(7.7)
Pxy (k|k − 1) =
2n 1 χ i (k|k − 1) − xˆ (k|k − 1) T 2n × γ i (k) − yˆ (k|k − 1) . i=1
(7.8)
7.2 Maximum Correntropy Cubature Kalman Filter (MCCKF)
193
Then, the state estimate .xˆ (k) and associated covariance matrix .P(k) are xˆ (k) = xˆ (k|k − 1)+K(k) y(k) − yˆ (k|k − 1) .
.
P(k) = Pxx (k|k − 1) − K(k)Pyy (k|k − 1)KT (k),
(7.9) (7.10)
where K(k) = Pxy (k|k − 1)P−1 yy (k|k − 1)
.
(7.11)
is the gain matrix.
7.2 Maximum Correntropy Cubature Kalman Filter (MCCKF) The MCC exhibits robust performance in the presence of non-Gaussian noises. In this section, we use the MCC to improve the robustness of the CKF. Although MCCKF is derived similarly to the MCUF, it is more applicable than MCUF since it doesn’t require the selection of the scaling parameter. MCCKF algorithm is derived based on (7.3)–(7.6) and (7.8). Similar to MCUF in Sect. 6.2, the MCCKF is derived by linearizing the measurement equation via a statistical linear method. Thus, a batch-regression model is given by D(k) = W(k)x(k) + e(k),
.
(7.12)
xˆ (k|k − 1) where .D(k) = (k) , .W(k) = y(k) − yˆ (k|k − 1) + S(k)ˆx(k|k − 1) − x(k) − xˆ (k|k − 1) I −1 (k) , .e(k) = −1 , and the calculations of r(k) S(k) .(k) and .S(k) are similar to those in MCUF. The cost function for the batch-regression (7.12) based on MCC can be expressed by −1
JMCCKF (x(k)) =
n+m
.
κσ (ei (k)) ,
(7.13)
i=1
with ei (k) = Di (k) − wi (k)x(k).
.
(7.14)
194
7 Cubature Kalman Filtering Under Information Theoretic Criteria
The optimal solution to .x(k) is obtained by setting the gradient of .JMCCKF (·) (x(k)) regarding .x(k) to zero, i.e.,. ∂JMCCKF = 0. A fixed-point algorithm to update the ∂x(k) estimate of .x(k) gives xˆ (k)τ = g xˆ (k)τ −1 .
.
(7.15)
Combining the optimal solution in (7.15) and the measurement update of Bayesian filtering, one can derive the MCCKF, summarized as Algorithm 11. Remark 7.1 Although the MCCKF is similar to an iterated extended Kalman filter (IEKF) [159], it has a different cost function from IEKF. The penalty imposed by the maximum correntropy criterion in MCCKF is less sensitive to outliers in comparison with the minimum mean square criterion used in IEKF. Furthermore, the Newton’s method used in IEKF relies on the first order continuity and second derivative of nonlinear functions, and so more complex calculations of the Hessian matrix and its inverse are required. MCCKF only requires an efficient fixed-point iteration, which will surely converge to a unique solution provided that the initial value is proper and the kernel size is larger than a certain value [102]. Remark 7.2 The kernel size .σ is a crucial parameter that will provide direct influence not only on convergence of fixed-point iteration but also on the overall numerical stability of filtering process. An adaptive kernel size selection strategy is required. However, for the systems with high nonlinearity, the adaptive selection methods often fail. In practical applications, one often uses manual setting or optimization with trial and error methods within a proper range [102]. P(k|k − 1) and . R(k) tend Remark 7.3 When the kernel size .σ approaches .∞, . to .P(k|k − 1) and .R(k) respectively, and MCCKF reduces to CKF. In addition, MCCKF can be extended to other CKF type nonlinear filters, e.g., high-degree CKF [160], and spherical simplex-radial cubature Kalman filter (SSRCKF) [161].
7.3 Maximum Correntropy Square-Root Cubature Kalman Filter (MCSCKF) To prevent the problem of filter divergence and the loss of positive definiteness owing to the computational errors of arithmetic operations performed on the finite word-length digital computers, the SCKF based on MCC (MCSCKF) is derived, which propagates the square-root of the a priori and posterior error covariance. The MCSCKF not only retains the advantage of SCKF but also exhibits robust performance against heavy-tailed non-Gaussian noises.
7.3 Maximum Correntropy Square-Root Cubature Kalman Filter (MCSCKF)
195
Algorithm 11 Maximum correntropy cubature Kalman filter (MCCKF) [122] Step 1: Choose a proper kernel size σ and a small positive ε. Set an initial estimate xˆ (0) and corresponding covariance matrix P(0). Let k = 1; Step 2: Use Eqs. (7.3) and (7.4) to obtain prior estimate xˆ (k|k − 1) and covariance P(k|k − 1), and calculate p (k|k − 1) by Cholesky decomposition; Step 3: Use (7.6) to compute the prior measurement yˆ (k|k − 1) and use (7.12) to construct the statistical linear regression model (7.12); Step 4: Generate 2n sigma points using the third-degree spherical-radial rule, and let τ = 1 and −1 ˆ (k)0 = WT (k)W(k) WT (k)D(k); .x Step 5: Use (7.16)–(7.22) to compute xˆ (k)τ xˆ (k)τ = xˆ (k|k − 1) + K(k) y(k) − yˆ (k|k − 1)
(7.16)
−1 K(k) P(k|k − 1)ST (k) + R(k) = P(k|k − 1)ST (k) S(k)
(7.17)
T −1 P(k|k − 1) = p (k|k − 1) x (k)p (k|k − 1)
(7.18)
T −1 R(k) = r (k) y (k)r (k)
(7.19)
x (k) = diag{κσ ( e1 (k)) , . . . , κσ ( en (k))}
(7.20)
y (k) = diag{κσ ( en+1 (k)) , . . . , κσ ( en+m (k))}
(7.21)
ei (k) = Di (k) − wi (k)ˆx(k)τ −1
(7.22)
with
Step 6: Compare the estimation at the current step and the estimation at the last step. If (7.23) holds, set xˆ (k) = xˆ (k)τ and continue to Step 7. Otherwise, τ + 1 → τ , and go back to Step 5. xˆ (k)τ − xˆ (k)τ −1 ≤ε xˆ (k)τ −1
(7.23)
Step 7: Update the posterior covariance matrix by (7.24), k + 1 → k and go back to Step 2. T P(k|k − 1) I − K(k)S(k) P(k) = I − K(k)S(k) T (k) + K(k)R(k) K
(7.24)
196
7 Cubature Kalman Filtering Under Information Theoretic Criteria
7.3.1 MCSCKF Algorithm Since the square-root of the error covariance matrix is used in the update of SCKF, the MCSCKF is established also employing the square-root matrix. Define .SP (k−1) as the square-root of the covariance matrix .P(k−1), i.e., .P(k−1) = SP (k−1)STP (k− 1), .SQ (k−1) as a square-root factor of .Q(k−1), i.e., .Q(k−1) = SQ (k−1)STQ (k−1), and likewise .SR (k) as a square-root factor of .R(k), i.e. .R(k) = SR (k)STR (k). Similar to MCUKF, the MCSCKF is derived with a nonlinear regression model, constructed by .
where .ϕ (k) =
xˆ (k|k − 1) x(k) = + ϕ (k). y(k) h(x(k))
(7.25)
− x(k) − xˆ (k|k − 1) with the square-root of covariance matrix r(k) (k) =
.
SP (k|k − 1) 0 . 0 SR (k)
Left multiplying both sides of (7.25) by .−1 (k) yields d(k) = h¯ (x(k)) + e(k),
.
(7.26)
xˆ (k|k − 1) x(k) −1 where .d(k) = (k) , .h¯ (x(k)) = (k) , and .e(k) = y(k) h(x(k)) − x(k) − xˆ (k|k − 1) −1 (k) with the covariance of identity matrix. r(k) The optimal estimate of .x(k) is obtained by −1
xˆ (k) = arg max
n+m
.
x(k)
κσ (ei (k)),
(7.27)
i=1
where .ei (k) is the i-th element of .e(k) and ei (k) = di (k) − h¯ i (x(k)).
.
Setting the first derivative of the cost function to zero, we get m+n .
ψ (ei (k))
i=1
where .ψ (ei (k)) = κσ (ei (k)) ei (k).
∂ei (k) = 0, ∂x(k)
(7.28)
7.3 Maximum Correntropy Square-Root Cubature Kalman Filter (MCSCKF)
197
We define . i (k) = ψ (ei (k)) /ei (k) = κσ (ei (k)) and x (k) 0 . .(k) = diag{ 1 (k), . . . , n+m (k)} = 0 y (k) Then, (7.28) can be written in a matrix form:
.
h¯ (x(k)) ∂x(k)
T
(k) d(k) − h¯ (x(k)) = 0.
(7.29)
The MCC utilizes .(k) to modify the measurement information and can be regarded as a re-weighting of the covariance matrix of the residual error .e(k) to reconstruct the measurement. (k) as the updated covariance matrix, given by Define .
(k) = (k)−1 (k)T (k).
.
(7.30)
(k) in a block-matrix form We write .
(k) =
.
x (k) 0
y (k) . 0
(7.31)
Since the true state .x(k) is unknown, we let .x(k) be equal to .xˆ (k|k − 1). Therefore, we could easily obtain the following equation: x (k) = Sp (k|k − 1) ∗ I ∗ STp (k|k − 1) = P(k|k − 1).
.
(7.32)
R(k) is given by Meanwhile, the updated measurement covariance matrix . y (k). R(k) =
.
(7.33)
The square-root updated measurement covariance matrix .r (k) can be calculated by R(k) = r (k)Tr (k).
.
(7.34)
It is noted that only one iteration is necessary to obtain the solution, saving on the computation time. If the measurement system is out of order to obtain extremely large measurement errors, the MCSCKF may face numerical problems since the matrix .y (k) will be nearly singular. Below, we give an approach to solve this problem. Let (k) = y(k) − yˆ (k|k − 1),
.
(7.35)
198
7 Cubature Kalman Filtering Under Information Theoretic Criteria
Pyy (k) = Syy (k|k − 1)STyy (k|k − 1),
(7.36)
o(k) = (k)T Pyy (k)−1 (k).
(7.37)
.
.
If .|o(k)| > othr , where .othr > 0 is a preset threshold, we only conduct the predict step, that is, .xˆ (k) = xˆ (k|k − 1), .SP (k) = SP (k|k − 1). If .|o(k)| ≤ othr , the two steps will be conducted. The MCSCKF algorithm is summarized in Algorithm 12. Algorithm 12 Maximum correntropy square-root cubature Kalman filter (MCSCKF) [123] Step 1: Select a proper kernel size σ ; assume an initial estimate state xˆ (0) and associated squareroot covariance matrix SP (0); set time k = 1; Step 2: Evaluate the cubature points and the propagated cubature points with respect to function f and h; Step 3: Use (7.3) and (7.6) to calculate the a priori estimate state xˆ (k|k − 1) and the priori measurement yˆ (k|k − 1); calculate the square-root covariance matrix by SP (k|k − 1) = Tria([χ ∗ (k|k − 1), SQ (k − 1)]),
(7.38)
1 where Tria(·) denotes a general triangularization algorithm with χ ∗ (k|k − 1) = 2n [χ1 (k|k − 1) − xˆ (k|k − 1), . . . , χ2n (k|k − 1) − xˆ (k|k − 1)], i = 1, 2, . . . , 2n. Step 4: Utilize (7.25)–(7.34) to derive the updated measurement square-root covariance matrix r ; calculate the corresponding square-root covariance matrix:
Syy (k|k − 1) = Tria γ ∗ (k|k − 1), r (k) .
(7.39)
1 with γ ∗ (k|k − 1) = 2n [γ1 (k|k − 1) − yˆ (k|k − 1), . . . , γ2n (k|k − 1) − yˆ (k|k − 1)]; evaluate the cross-covariance matrix
Pxy (k|k − 1) = χ ∗∗ (k|k − 1)γ ∗ (k|k − 1)T , with χ ∗∗ (k|k − 1) = Kalman gain
1 2n [ξ1 (k|k
(7.40)
− 1) − xˆ (k|k − 1), . . . , ξ2n (k|k − 1) − xˆ (k|k − 1)]; compute the
K(k) = (Pxy (k|k − 1)/STyy (k|k − 1))/Syy (k|k − 1).
(7.41)
Step 5: Compute (7.35)–(7.37); if |o(k)| > othr , xˆ (k) = xˆ (k|k−1), Sp (k) = Sp (k|k−1), k = k+1 and go back to Step 2; if |o(k)| ≤ othr , go to Step 6. Step 6: Calculate the posterior estimate state and corresponding square-root covariance matrix xˆ (k) = xˆ (k|k − 1) + K(k)(y(k) − yˆ (k|k − 1)) SP (k) = Tria χ ∗∗ (k|k − 1) − K(k)γ ∗ (k|k − 1), K(k)r (k) , update k = k + 1 and go back to Step 2.
(7.42) (7.43)
7.3 Maximum Correntropy Square-Root Cubature Kalman Filter (MCSCKF)
199
Remark 7.4 From the preceding derivation, the MCSCKF algorithm combines the efficiency of the SCKF with the robustness of MCC, which uses a nonlinear regression model based on MCC to modify the measurement information. It should be also noted that the kernel size has a key role in MCSCKF. On one hand, a smaller kernel size exhibits more robust characteristic. However, if the kernel size is too small, the filtering precision decreases, even badly to the point of causing filter diverging. Please refer to [102] for details. On the other hand, when the kernel size is large enough, the performance of MCSCKF is similar to that of SCKF. The following theorem holds. Theorem 7.1 If the kernel size .σ → ∞, the MCSCKF will reduce to the SCKF. Proof As .σ → ∞, we have .(k) → I. Thus, the MCSCKF is equivalent to SCKF. In practical applications, the kernel size may be set or optimized by means of different procedures, whose choice will be discussed in next section.
7.3.2 Illustrative Example The MCSCKF is applied to the SINS/GPS integrated system, which is used to predict the motion of an aircraft. First, the coordinate frames used in this example are defined as follows: i-frame is the geocentric inertial coordinate frame. b-frame is the body frame, whose origin is at the center of mass of vehicle, x-axis is along its longitudinal axis, z-axis is perpendicular to symmetrical longitudinal plane, and y-axis makes the frame into the righted-handed system. e-frame is the Earth frame, whose origin is at the center of the Earth, z-axis is along the Earth’s polar axis, and x-axis and y-axis are fixed within the Earth’s equatorial plane. n-frame is the navigation frame, which is a local geographic frame and whose origin is at the location of the navigation system, zaxis is parallel to the upward vertical at the local Earth surface referenced position location, and x-axis points toward east and y-axis points toward north. Next, the mechanical attitude matrix differential equations, velocity differential equations, and position differential equations of SINS are as follows [162, 163]: b Cnb = Cnb (ωnb ×),
(7.44)
n n vn = Cnb fbsf − (2ωie + ωen ) × vn + gn ,
(7.45)
.
.
.
L˙ =
n vN , RM + H
λ˙ =
n sec L vE , RN + H
H˙ = vUn ,
(7.46)
200
7 Cubature Kalman Filtering Under Information Theoretic Criteria
with b b n ωnb = ωib − Cnb ωin ,
.
n n n ωin = ωie + ωen ,
.
T n ωie = 0 ωie cos L ωie sin L ,
.
T n ωen = −L˙ λ˙ cos L λ˙ sin L ,
.
T gn = 0 0 −gL ,
.
where .(A×) denotes the skew symmetric form of .A; .Cnb is the direction cosine n is the angular rotation velocity of matrix (DCM) relating b-frame to n-frame; .ωin n is the Earth rotation the navigation coordinate system with respect to i-frame; .ωie b and .fb n velocity; .ωen is the rotation velocity from the e-frame to the n-frame; .ωib sf are the angular velocity that the gyroscopes output and the specific force that the n n n T accelerometers provide; .vn = vE vN vU is the velocity of SINS; L, .λ; and H denote the latitude, longitude, and height of SINS; .RM and .RN represent the local radius of curvature in meridian and radius of curvature in prime vertical; and .ωie and .gL are angular rate of the Earth and local gravitational acceleration. Due to the effect of the various error sources, actually there are some angular rotation errors between the real navigation frame (n-frame) and computed navigation frame (.n -frame). The Euler platform error angle (EPEA) between n-frame and T .n -frame is defined by .φ = φx φy φz . By neglecting some small quantities and simplifying the formula, the attitude, the velocity, and the position error equations are written as follows [162, 163]: n n b φ˙ = C−1 (I − Cnn ) , ωin + Cnn δωin − Cnb δωib ω
.
(7.47)
T fbsf + (Cnn )T Cnb δfbsf δvn = I − (Cnn ) Cnb n n n n − (2δωie + δωen ) × vn − (2 ωie + ωen ) × δvn
.
(7.48)
n n + (2δωie + δωen ) × δvn + δgn ,
δ L˙ = .
δ λ˙ =
n vN
M + H R
−
n − δv n vN N , M − δRM ) + (H − δH ) (R
n − δv n ) sec(L n sec L − δL) ( vE vE E − , (R N − δRN ) + (H − δH ) N + H R
δ h˙ = δvUn ,
(7.49)
7.3 Maximum Correntropy Square-Root Cubature Kalman Filter (MCSCKF) ⎡ n
.Cn
⎢ =⎢ ⎣
201
cos φy cos φz − sin φy sin φx sin φz cos φy sin φz + sin φy sin φx cos φz − sin φy cos φz − cos φx sin φz
cos φx cos φz
sin φx
sin φy cos φz + cos φy sin φx sin φz sin φy sin φz − cos φy sin φx cos φz cos φy cos φx
⎤ ⎥ ⎥, ⎦
(7.50) with C−1 ω
.
⎡ ⎤ cos φy cos φx 0 sin φy cos φx 1 ⎣ = sin φy sin φx cos φx − cos φy sin φx ⎦ , cos φx − sin φy 0 cos φy b δωib = εb + wbg ,
.
δfbsf = ∇ b + wba ,
.
and .Cnn is expressed in (7.47), where . A denotes the corresponding computed .A in .n frame, .δA stands for the error of the corresponding .A between .n -frame and n-frame, T T n = ωn + δωn , .ε b = b = wb wb wb b εb εb for instant, . ωin and . w ε g x y z gx gy gz in in T b b denote the gyroscope constant drifts and random drifts, .∇ = ∇x ∇yb ∇zb and T b b wb wb .wa = denote the accelerometer constant biases and random wax ay az biases. The state is defined as n δv n δv n δL δλ δh x = φx φy φz δvE N U . T εxb εyb εzb ∇xb ∇yb ∇zb ,
and the process noise is expressed as T b wb wb wb wb wb q = wgx . gy gz ax ay az
.
Accordingly, the SINS error propagation models can be represented by x = f(t, x) + T(t, x)q,
.
⎡ ⎢ where .T(t, x) = ⎣
n −C−1 ω Cb
03×3
(7.51)
⎤ 03×3
T ⎥ Cnn Cnb ⎦.
09×3 09×3 Then, subtracting the velocity and position of SINS with the velocity and position of GPS, we have the following measurement model:
202
7 Cubature Kalman Filtering Under Information Theoretic Criteria
Table 7.1 Aircraft maneuvers in the simulation Stage 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Period
Aircraft maneuver Uniform linear flight Constant acceleration Uniform linear flight Roll angle turn Turn left Roll angle turn Uniform linear flight Roll angle turn Turn right Roll angle turn Uniform linear flight Pitch angle turn Uniform linear flight Pitch angle turn Uniform linear flight Pitch angle turn Uniform linear flight Pitch angle turn Uniform linear flight Constant deceleration Uniform linear flight
.0∼90 s .90∼100 s .100∼200 s .200∼205 s .205∼250 s .250∼255 s .255∼355 s .355∼360 s .360∼450 s .450∼455 s .455∼555 s .555∼565 s .565∼615 s .615∼625 s .625∼725 s .725∼735 s .735∼785 s .785∼795 s .795∼895 s .895∼900 s .900∼1000 s
⎡
n vE,SI NS ⎢ vn ⎢ N,SI N S ⎢ n ⎢v . ⎢ U,SI N S ⎢ LSI N S ⎢ ⎣ λSI N S HSI N S
Parameters x = 5 m/s .v = 5 m/s to .v = 10 m/s .v = 10 m/s ◦ ◦ .r = 0 to .r = −2.04 .v = 10 m/s ◦ ◦ .r = −2.04 to .r = 0 .v = 10 m/s ◦ ◦ .r = 0 to .r = 1.02 .v = 10 m/s ◦ ◦ .r = 1.02 to .r = 0 .v = 10 m/s ◦ ◦ .θ = 0 to .θ = 20 .v = 10 m/s ◦ ◦ .θ = 20 to .θ = 0 .v = 10 m/s ◦ ◦ .θ = 0 to .θ = −20 .v = 10 m/s ◦ ◦ .θ = −20 to .θ = 0 .v = 10 m/s .v = 10 m/s to .v = 5 m/s .v = 5 m/s .v
⎤ ⎡ ⎤ ⎤ ⎡ n n − vE,GP r1 δvE S ⎥ ⎢ δv n ⎥ ⎢ r ⎥ n − vN,GP ⎥ ⎢ 2⎥ ⎢ S⎥ ⎢ ⎥ ⎥ ⎢ N n ⎥ n − vU,GP ⎥ ⎢ r3 ⎥ ⎢ δvU S⎥ ⎥ + ⎢ ⎥, ⎥=⎢ ⎢ δL ⎥ ⎢ r4 ⎥ − LGP S ⎥ ⎥ ⎢ ⎥ ⎥ ⎢ − λGP S ⎦ ⎣ δλ ⎦ ⎣ r5 ⎦ δH r6 − HGP S
(7.52)
where .r1 , .r2 , .r3 , .r4 , .r5 , and .r6 are white measurement noises. The aircraft has done a series of maneuvers in the simulation, which are listed in Table 7.1, and the real trajectory is shown in Fig. 7.1. Some other filters, including UKF, CKF, SCKF, and HSCKF, are used for comparison based on the following benchmarks: RMSE(k) =
.
lmc 2 1 x(k) − x(k) ˆ , k = 1, . . . , N,
lmc
mc =1
(7.53)
7.3 Maximum Correntropy Square-Root Cubature Kalman Filter (MCSCKF)
203
Fig. 7.1 True trajectory of aircraft (From [123])
k2 1 RMSE(k). .TRMSE = (k2 − k1 )
(7.54)
k=k1
The SINS system is performed every .0.04 s, and the GPS measurements are available every 1 s. The initial latitude is .L = 45.779◦ ; the longitude is .λ = 126.670◦ ; the height is .H = 100 m; the velocity T is .vn = 0 m/s 5 m/s 0 m/s ; the gyroscope constant drifts are .εb = T 0.05◦ /h 0.05◦ /h 0.05◦ /h ; the accelerometer constant biases are .∇ b = T −4 10 g 10−4 g 10−4 g ; the initial errors of the attitude, velocity, and position are T ◦ ◦ ◦ ◦ ◦ . 0.1 0.1 0.5 0.2 m/s 0.2 m/s 0.2 m/s (1800/(Re ∗ π )) (1800/(Re ∗ π )) 1 m ; the initial estimate states and corresponding covariance matrix are xˆ (0) = 0, P(0) = diag (1◦ )2 , (1◦ )2 , (5◦ )2 , (2 m/s)2 , (2 m/s)2 , 2 ◦ 2 ◦ 2 . (2 m/s) , ((18,000/(Re ∗ π )) ) , ((18,000/(Re ∗ π )) ) , 2 ◦ 2 ◦ 2 ◦ 2 2 (10 m) , (0.5 /h) , (0.5 /h) , (0.5 /h) , (0.001g) , (0.001g)2 , (0.001g)2 ; and the process noises are chosen as
204
7 Cubature Kalman Filtering Under Information Theoretic Criteria
Table 7.2 .TRMSEs of velocity and position in Gaussian noises (From [123])
Filters UKF CKF SCKF HSCKF MCSCKF.(σ MCSCKF.(σ MCSCKF.(σ MCSCKF.(σ MCSCKF.(σ
= 2.0) = 3.0) = 5.0) = 10) = 20)
.TRMSE
.TRMSE
of .vE .(m/s) 0.0175 0.0175 0.0175 0.0179 0.0181 0.0176 0.0175 0.0175 0.0175
of .vN .(m/s) 0.0164 0.0164 0.0164 0.0166 0.0166 0.0164 0.0164 0.0164 0.0164
Q = diag .
.TRMSE
of .vU
.TRMSE
.TRMSE
.TRMSE
.(m/s)
of L .(m) 0.7644 0.7644 0.7644 0.8046 0.8168 0.7793 0.7680 0.7651 0.7646
of .λ .(m) 0.9536 0.9536 0.9536 0.9866 0.9999 0.9623 0.9545 0.9535 0.9535
of H .(m) 0.1615 0.1615 0.1615 0.1697 0.1753 0.1656 0.1625 0.1617 0.1616
0.0051 0.0051 0.0051 0.0052 0.0052 0.0051 0.0051 0.0051 0.0051
√ 2 √ 2 √ 2 (0.02◦ / h) , (0.02◦ / h) , (0.02◦ / h) 2 2 2 (5 × 10−5 g) , (5 × 10−5 g) , (5 × 10−5 g) ,
where h denotes 1 hour, g represents the gravitational force, and .Re stands for the semimajor axis of the Earth. In this example, we have carried out 100 independent Monte Carlo experiments, and the total simulation in each experiment is 1000 s, i.e., .N = 1000, .lmc = 100. It is worth mentioning that the correntropy is a local similarity measure; thus, the MCSCKF might converge to the optimal solution slowly, particularly when the initial values deviate greatly from the true values. Therefore, we utilize the SCKF to make the process settle down during the initial 10 s, and then the MCSCKF starts after the time. First, we assume the measurement noises are Gaussian, i.e., r1 ∼ N 0, (0.1 m/s)2 , r2 ∼ N 0, (0.1 m/s)2 , 2 . r3 ∼ N 0, (0.1 m/s) , r4 ∼ N 0, ((1800/(Re ∗ π ))◦ )2 , r5 ∼ N 0, ((1800/(Re ∗ π ))◦ )2 , r6 ∼ N 0, (1 m)2 . Table 7.2 demonstrates the .TRMSE, defined in (7.54), with different filters in Gaussian noises (the parameter is set to .k1 = 10, .k2 = 1000). Those results illustrate that the performances of the non-robust filters (UKF, CKF, SCKF) are similar in this case, achieving the smallest .TRMSE. It is also seen that the robust filters perform worse than the non-robust filters under Gaussian noise condition. Next, we assume the measurement noises are heavy-tailed, satisfying the following mixed-Gaussian distributions, i.e.,
7.3 Maximum Correntropy Square-Root Cubature Kalman Filter (MCSCKF)
205
Fig. 7.2 .RMSE of .vE in non-Gaussian noises (From [123])
r1 ∼0.9N 0, (0.1 m/s)2 + 0.1N 0, (1 m/s)2 ,
r2 ∼0.9N 0, (0.1 m/s)2 + 0.1N 0, (1 m/s)2 ,
r3 ∼0.9N 0, (0.1 m/s)2 + 0.1N 0, (1 m/s)2 , .
r4 ∼0.9N 0, ((1800/(Re ∗ π ))◦ )2 + 0.1N 0, ((18,000/(Re ∗ π ))◦ )2 ,
r5 ∼0.9N 0, ((1800/(Re ∗ π ))◦ )2 + 0.1N 0, ((18,000/(Re ∗ π ))◦ )2 ,
r6 ∼0.9N 0, (1 m)2 + 0.1N 0, (10 m)2 . Figures 7.2, 7.3, 7.4, 7.5, 7.6 and 7.7 reveal the .RMSEs, defined in (7.53), with different filters in non-Gaussian noises, and the corresponding .TRMSEs are listed in Table 7.3. As one can observe, the robust filters can outperform the non-robust filters in heavy-tailed noises. As before, it is shown that the MCSCKF achieves similar estimation to the SCKF when the kernel size is very large. Nevertheless, with a proper kernel size, the MCSCKF is significantly superior to the non-robust filters due to the robustness of the correntropy. Especially, when .σ = 2.0, the MCSCKF exhibits the best performance in all filters. It is noted that the UKF sometimes halts its operation due to the problem of numerical instability, and is not suited to the high-dimensional nonlinear systems.
206
7 Cubature Kalman Filtering Under Information Theoretic Criteria
Fig. 7.3 .RMSE of .vN in non-Gaussian noises (From [123])
Fig. 7.4 .RMSE of .vU in non-Gaussian noises (From [123])
Furthermore, Table 7.4 illustrates the computational times of different filters. The computational costs of the CKF type filters are superior to that of UKF, and the computational cost of MCSCKF is not expensive compared to that of SCKF.
7.4 Cubature Kalman Filter Under Minimum Error Entropy with Fiducial. . .
207
Fig. 7.5 .RMSE of L in non-Gaussian noises (From [123])
Fig. 7.6 .RMSE of .λ in non-Gaussian noises (From [123])
7.4 Cubature Kalman Filter Under Minimum Error Entropy with Fiducial Points (MEEF-CKF) In MEEF-CKF, fiducial points are recommended for anchoring the error mean at zero automatically [88, 114], which actually plays a role of the correntropy and is
208
7 Cubature Kalman Filtering Under Information Theoretic Criteria
Fig. 7.7 .RMSE of H in non-Gaussian noises (From [123]) Table 7.3 .TRMSEs of velocity and position in non-Gaussian noises (From [123])
Filters UKF CKF SCKF HSCKF MCSCKF.(σ MCSCKF.(σ MCSCKF.(σ MCSCKF.(σ MCSCKF.(σ
= 2.0) = 3.0) = 5.0) = 10) = 20)
.TRMSE
.TRMSE
of .vE .(m/s) 0.0452 0.0452 0.0452 0.0270 0.0247 0.0288 0.0353 0.0416 0.0431
of .vN .(m/s) 0.0479 0.0479 0.0479 0.0295 0.0264 0.0317 0.0391 0.0450 0.0474
Table 7.4 Computational time comparisons (From [123])
.TRMSE
of .vU
.TRMSE
.TRMSE
.TRMSE
.(m/s)
of L .(m) 2.4008 2.4009 2.4009 1.5244 1.3915 1.5800 1.9128 2.2293 2.3880
of .λ .(m) 3.3565 3.3566 3.3565 1.7277 1.4871 1.8255 2.4312 3.0185 3.0897
of H .(m) 0.4285 0.4285 0.4286 0.2430 0.2264 0.2568 0.3175 0.3873 0.4160
0.0103 0.0103 0.0103 0.0068 0.0062 0.0070 0.0084 0.0096 0.0101
Filters UKF CKF SCKF HSCKF MCSCKF
Computation time (Second) 1.0000 0.9917 0.9746 0.9776 0.9751
beneficial for robustness enhancement. The MEEF-CKF inherits the superiority of the correntropy and error entropy.
7.4 Cubature Kalman Filter Under Minimum Error Entropy with Fiducial. . .
209
7.4.1 MEEF-CKF Algorithm MEEF-CKF algorithm is derived based on (7.3)–(7.8). Similar to MEE-UKF in Sect. 6.5, the batch-regression model is d(k) = W(k)x(k) + e(k),
(7.55)
.
xˆ (k|k − 1) where .d(k)=−1 (k) , .W(k) = −1 (k) y(k) − yˆ (k|k − 1) + S(k)ˆx(k|k − 1) In −(x(k) − xˆ (k|k − 1)) −1 . , .e(k) = (k) , and the calculations of .(k), S(k) r(k) + ν(k) .S(k), and .ν(k) are similar to those in MCUF. The MEEF-CKF performs state estimation by utilizing the minimum error entropy with fiducial point over the constructed batch-regression model. The optimal state estimation can be solved by xˆ (k) = arg max JMEEF
.
x(k)
⎧ L ⎨ = arg max η κσ1 (ej (k)) x(k) ⎩ j =1
+(1 − η)
L L i=1 j =1
⎫ ⎬ κσ2 [ej (k) − ei (k)] , ⎭ (7.56)
where .σ1 denotes the kernel size of MCC and .σ2 is the kernel size of MEE. Taking the derivative of .JMEEF−CKF with respect to .x(k) yields ∂JMEEF−CKF = η1 ej (k)κσ1 (ej (k))wj (k) + T1 − T2 − T3 + T4 . ∂x(k) L
j =1
= η1 W(k)T ϒ(k)e(k) + 2η2 WT (k) ( (k) − (k)) e(k),
(7.57)
¯ where .η1 = η/σ12 , η2 = 1 − η/σ22 and .-th row, .-column element of the matrix .¯ (k), .¯ (k) = κσ (e¯ (k) − e (k)). .ϒ(k), and . (k) are diagonal matrices with L % the .-th diagonal entry .ϒ (k) = κσ (e (k)) and . (k) = κσ (e (k) − ei (k)), Tab = η2 respectively. Let .
L % L % i=1 j =1
i=1
ea (k)κσ2 [ej (k) − ei (k)]wb (k). Scalars .T1 ,.T2 ,.T3
and .T4 in (7.57) are set as .T1 = Tjj , T2 = Tij , T3 = Tj i and .T4 = Tii , respectively. From (7.57), .x(k) can be estimated by a fixed-point iterative algorithm [127]
210
7 Cubature Kalman Filtering Under Information Theoretic Criteria
xˆ (k)τ +1 = g(ˆx(k)τ ) ⎡ ⎤−1 L % T (k) η κ (e (k))w (k)w σ1 j j j ⎢ 1 ⎥ ⎢ j =1 ⎥ =⎢ L L % ⎥ % ⎣ ⎦ +η2 κσ2 [ej (k) − ei (k)] 1
.
i=1 j =1
⎡
L %
⎤
κσ1 (ej (k))wj (k)dj (k) ⎢ η1 ⎥ ⎢ j =1 ⎥ ⎢ L L ⎥ % % ⎣ ⎦ +η2 κσ2 [ej (k) − ei (k)] 2 i=1 j =1
−1
= W(k)T (k)W(k) W(k)T (k)d(k) ,
(7.58)
where .xˆ (k)τ denotes the solution of .x(k) at the .τ -th fixed-point iteration; . 1 = [wi (k) − wj (k)][wi (k) − wj (k)]T ; . 2 = [di (k) − dj (k)][wi (k) − wj (k)];
(k) = 2η2 ( (k) − (k)) + η1 ϒ(k) xx (k) yx (k) = , xy (k) yy (k)
.
(7.59)
with . (k) ∈ RL×L , . xx (k) ∈ Rn×n , . xy (k) ∈ Rm×n , . yx (k) ∈ Rn×m and m×m . . yy (k) ∈ R Note that some extremely large outliers in the measurement may make the matrix .(k) singular such that the MEEF-CKF is confronted with numerical issue. We provide the following method to overcome this issue: .
(k) = y(k) − yˆ (k|k − 1).
(7.60)
o(k) = (k)T P−1 yy (k|k − 1) (k).
(7.61)
If .|o(k)| > othr , we update the state and covariance matrix by .xˆ (k) = xˆ ckf (k) and P(k) = Pckf (k), where .othr is a pre-set threshold, .xˆ ckf (k), and .Pckf (k) are obtained by performing a standard CKF. If .|o(k)| < othr , the MEEF-CKF is performed. With above derivations, the MEEF-CKF is summarized as Algorithm 13.
.
Remark 7.5 The threshold .othr is essential in detecting extremely large outliers. However, the positive number .othr is hard to be chosen since .othr is dependent on varying applications and the magnitude of measurements. An inner quartile range (IQR)-based approach can be used for alleviating this issue [164]. The (IQR)-based approach yields lower quartile .Q1 , second quartile .Q2 , and third quartile .Q3 . Then, .|o(k)| lower than .Q1 − 1.5(Q3 − Q1 ) or higher than .Q3 + 1.5(Q3 − Q1 ) is regarded as outlier. Since we focus on finding extremely large outliers for avoiding numerical ¯ 3 − Q1 ) with instability, the threshold parameter can be set to .othr = Q3 + k(Q ¯ positive constant .k > 1.5.
7.4 Cubature Kalman Filter Under Minimum Error Entropy with Fiducial. . .
211
Algorithm 13 Cubature Kalman filter under minimum error entropy with fiducial points (MEEF-CKF) [127] Step 1: Initialize xˆ (1|0) and P(1|0); set a small positive threshold ε. Step 2: Evaluate the cubature points and the propagated cubature points with respect to function f and h; Step 3: Use (7.3) and (7.4) to obtain xˆ (k|k − 1) and P(k|k − 1), respectively; use (7.6), (7.7), and (7.8) to compute yˆ (k|k − 1), Pyy (k|k − 1), and Pxy (k|k − 1); use (7.60) and (7.61) to calculate (k) and o(k); determine the extreme outliers and handle them; construct the regression model in (7.55). Step 4: Let τ = 1 and xˆ (k)0 = xˆ (k|k − 1); use available measurement y(k) to update xˆ (k) in a fixed-point form: xˆ (k)τ = xˆ (k|k − 1) + K(k)[y(k) − h(ˆx(k|k − 1))]
(7.62)
ei (k) = di (k) − wT (k)ˆx(k)τ −1 (k) = xx (k) yx (k) ,
xy (k) yy (k)
(7.63)
with
(7.64)
T −1 P(k|k − 1) = −1 p (k|k − 1) xx (k)p (k|k − 1)
(7.65)
Pxy (k|k − 1) =
− 1)
(7.66)
T −1 Pyx (k|k − 1) = −1 p (k|k − 1) yx (k)r (k)
(7.67)
T −1 R(k) = −1 r (k) yy (k)r (k)
(7.68)
= K−1 K(k) p (k)Kh (k)
(7.69)
T −1 −1 r (k) xy (k)p (k|k
˜ where Kp (k) = P(k|k − 1) + ST (k) Pxy (k|k − 1) + Pyx (k|k − 1) + ST (k) R(k) S(k) and Kh (k) = ||ˆx(k)τ −ˆx(k)τ −1 || R(k). σ1 , σ2 and η are determined by Algorithm 14. If ≤ε Pyx (k|k − 1)+ST (k) ||ˆx(k)τ −1 ||
holds, set xˆ (k) = xˆ (k)τ and continue to Step 5. Otherwise, τ + 1 → τ , and return to (7.62). Step 5: Update k + 1 → k and T P(k)= [I − K(k)S(k)]P(k|k − 1)[I − K(k)S(k)]
T (k), + K(k)R(k) K
(7.70)
and return to Step 2.
Remark 7.6 The MEEF-CKF retains all merits of the correntropy and error entropy in dealing with multimodal distribution noises and large outliers. Through the Parzen’s PDF estimator and Gaussian kernel function, the error entropy behaves strong modeling ability for error distribution especially for multimodal distribution. Although large outliers may result in abnormal .e(i) and .d(i) in . 2 , the state posterior estimate .xˆ (k) in (7.58) is still robust with the help of correntropy and error entropy. The correntropy suppresses outliers through Gaussian kernel function
212
7 Cubature Kalman Filtering Under Information Theoretic Criteria
κσ1 (ej (k)). By contrast, error entropy reduces the influence of outliers when .d(i) and/or .d(j ) contain larger outliers, also exhibiting strong robustness through the Gaussian kernel function .κσ2 [ej (k) − ei (k)].
.
7.4.2 Determination of the Parameters The estimation accuracy of the MEEF-CKF is greatly affected by parameters .η, .σ1 , and .σ2 . For example, the parameter .η is utilized for balancing correntropy and error entropy. More concretely, the correntropy or error entropy is switched off by setting .η = 0 or .η = 1. In addition, the MEEF-CKF will reduce to the traditional CKF by setting .η = 1 and .σ1 → ∞. In this section, an effective optimization method is introduced for determining these parameters. In MEEF, the IP with a fiducial point can be divided into three terms: V (η, σ1 , σ2 ) = ηVc +(1 − η)Ve = ηE[κσ1 (e)]+(1 − η)E[p(e)] & ∞ [ηκσ1 (x) + (1 − η)pe (x)]pe (x)dx =
.
−∞ & 1 ∞
=
2
−∞
[ηκσ1 (x) + (1 − η)pe (x)]2 dx + U (η, σ1 , σ2 ),
(7.71)
where .Vc = [κσ1 (e)] is the correntropy, .Ve = E[p(e)] is the quadratic Renyi’s information potential, and .U (η, σ1 , σ2 ) takes the following form: U (η, σ1 , σ2 ) =
.
1 2
&
∞
−∞
[pe (x)]2 dx −
1 2
&
∞
−∞
[ηκσ1 (x) + (1 − η)pe (x) − pe (x)]2 dx. (7.72)
Since the first term in (7.71) is independent on the error model for given .η, .σ1 , and σ2 , the maximization of (7.71) can be regarded as maximizing (7.72) [139, 140], i.e.,
.
(η , σ1 , σ2 ) =
.
arg max
η∈Sη ,σ1 ∈Sσ1 ,σ2 ∈Sσ2
U (η, σ1 , σ2 )
' & ∞ 1 [pe (x)]2 dx = arg max 2 −∞ η∈Sη ,σ1 ∈Sσ1 ,σ2 ∈Sσ2 ( & 1 ∞ [ηκσ1 (x) + (1 − η)pe (x) − pe (x)]2 dx , − 2 −∞
(7.73)
7.4 Cubature Kalman Filter Under Minimum Error Entropy with Fiducial. . .
213
where .Sη , .Sσ1 , and .Sσ2 denote the admissible sets of parameters .η, .σ1 , and .σ2 . When the PDF of errors is fixed at discrete time k, the optimal parameters are determined by & (η , σ1 , σ2 ) =
arg min η∈Sη ,σ1 ∈Sσ1 ,σ2 ∈Sσ2
=
arg min η∈Sη ,σ1 ∈Sσ1 ,σ2 ∈Sσ2
.
∞ −∞
'&
2 ηκσ1 (x) + (1 − η)pe (x) − pe (x) dx
∞ −∞
2 ηκσ1 (x) + (1 − η)pe (x) dx (
− 2E[ηκσ1 (x) + (1 − η)pe (x)]
(7.74)
.
Note that free parameters are obtained when the IP with a fiducial point .ηκσ1 (x) + (1 − η)pe (x) matches the error PDF .pe (x) as closely as possible. In MEE, N 1 % κσ2 (x − ei ), and thus we can denote the .pe (x) is approximated as .pe (x) ≈ N i=1 )∞ 2 optimization function as . −∞ ηκσ1 (x) + (1 − η)pe (x) dx − 2E[ηκσ1 (x) + (1 − 2 N )∞ % η)pe (x)] ≈ −∞ ηκσ1 (x) + (1 − η) N1 κσ2 (x − ei ) dx − 2E[ηκσ1 (x) + (1 − η) N1
N %
i=1
κσ2 (x − ei )]. The deformation of above optimization therefore gives
i=1
(η , σ1 , σ2 ) =
.
arg min
η∈Sη ,σ1 ∈Sσ1 ,σ2 ∈Sσ2
(η, σ1 , σ2 ) U
=
' & ∞ ( η¯ T GTσ1 ,σ2 (x)Gσ1 ,σ2 (x)dx η¯ − 2η¯ T h arg min ¯ −∞ η∈Sη ,σ1 ∈Sσ1 ,σ2 ∈Sσ2
=
¯ η¯ − 2η¯ T h { η¯ T G arg min ¯ }, η∈Sη ,σ1 ∈Sσ1 ,σ2 ∈Sσ2
1−η T (L+1)×1 and .h = where .η¯ = [η, 1−η ¯ L ,..., L ] ∈ R
1 N
(7.75) N %
Gσ1 ,σ2 (ei ) ∈ R(L+1)×1
i=1
with .Gσ1 ,σ2 (ei ) = )[κσ1 (ei ), κσ2 (ei − e1 ), . . . , κσ2 (ei − eN )] ∈ R(L+1)×1 . Thus, we ¯ = ∞ GTσ ,σ (x)Gσ ,σ (x)dx ∈ R(L+1)×(L+1) . can calculate .G 1 2 −∞ 1 2 Finally, the free parameters are optimized by maximizing the following objective function: ¯ η¯ T − 2η¯ T h. = η¯ G U ¯
.
(7.76)
The scaling factor .η¯ can be considered as the solution of maximizing (7.76), i.e., ¯ + λ¯ I)−1 h, η¯ = (G ¯
.
(7.77)
214
7 Cubature Kalman Filtering Under Information Theoretic Criteria
Algorithm 14 Determination of parameters [127] Step 1: Use the errors samples {ei (k)}L ¯ calculate i=1 in (7.63) to construct the Gσ1 ,σ2 (ei ), and h; )∞ T ¯ = Gσ1 ,σ2 (x)Gσ1 ,σ2 (x)dx; set a finite set of kernel sizes Sσ1 and Sσ2 ; set a definite integral .G −∞
small enough regularization parameter λ¯ . Step 2: By substituting (7.77) into (7.75), the two free parameters σ1 and σ2 can be optimized by (σ1 , σ2 ) =
arg min σ1 ∈Sσ1 ,σ2 ∈Sσ2
T¯ ¯ T ¯ + λ¯ I)−1 h] ¯ + λ¯ I)−1 h] [(G G + λ¯ I)−1 h ¯ G( ¯ − 2[(G ¯ h ¯ .
(7.78)
Step 3: After kernel sizes have been determined, the scaling factor η can then be calculated from = (G ¯ + λ¯ I)−1 h, ¯ i.e, η = η¯ (1). Step 4: The above steps can be repeated until fixed-point iteration convergence.
¯ .η
¯ Algorithm 14 summawhere λ . ¯ is the regularization parameter to avoid singular .G. rizes the optimization method for determining parameters. Remark 7.7 By solving above optimization issue, free parameters in MEEF-CKF are updated at each fixed-point iteration at discrete time k. Generally, a much smaller kernel size may result in filtering divergence (see convergence analysis in Sect. 7.4.3). In addition, a much larger kernel size also weakens the robustness of the MEEF-CKF since errors containing outliers cannot be detected. Therefore, the admissible sets of kernel sizes are experimentally selected. To further investigate the function of three parameters, the influences of .η, .σ1 , and .σ2 on filtering accuracy are also provided in the following.
7.4.3 Convergence Issue This section provides a sufficient condition to ensure the convergence of the fixedpoint iteration in MEEF-CKF where the proof is based on [102, 103]. For simplicity, we assume that kernel sizes satisfy .σ1 = cσ2 with nonnegative constant c. First, .xˆ (k) in (7.58) can be rewritten by .
where .Mww = η1 Mdw = η1
.
L % j =1
L % j =1
xˆ (k)τ +1 = g(ˆx(k)τ ) = M−1 ww Mdw ,
κσ1 (ej (k))wj (k)wj T (k) + η2
L % L % i=1 j =1
(7.79) κσ2 [ej (k) − ei (k)] 1 ;
L % L % κσ2 [ej (k) − ei (k)] 2 . Accordκσ1 (ej (k))wj (k)dj (k) + η2 i=1 j =1
ing to the proof in [102] and [103], we obtain
7.4 Cubature Kalman Filter Under Minimum Error Entropy with Fiducial. . .
215
√ n(η1 3 + η2 4 ) .||g(ˆ x(k))||1 ≤ φ1 (σ2 ) = , Mmin (η1 5 + η2 6 )
(7.80)
where .Mmin is the minimum eigenvalue of the matrix .Mww ; . 3 ||wj (k)||1 |dj (k)|; . 4 =
.
L L % % i=1 j =1
=
|di (k) − dj (k)|||wi (k) − wj (k)||1 ; . 5 =
(β||wj (k)||1 +|dj (k)|)wj (k)wj T (k); . 6 =
.
L % L % i=1 j =1
L % j =1
L % j =1
κσ1
κσ2 (β||wi (k)−wj (k)||1 +|di (k)−
dj (k)| ) 1 ; .β = φ1 (σ 2 ). In addition, we calculate the Jacobian matrix of .g(ˆx(k)) with respect to .xˆ (k) ∂ M−1 Mdw ∂x(k) ww ⎞ ⎛ s L η1 % ej (k)wj (k)κσ1 (ej (k)) ⎜ c2 σ22 ⎟ T ⎜ ⎟ j =1 ×wj (k)wj (k) = −M−1 ⎟ g(x(k)) ww ⎜ L % L ⎝ η2 % ⎠ + 2 ι1 1
∇xˆ (k) g(x(k)) =
.
σ2
i=1 j =1
⎞ L e (k)ws (k)κ (e (k)) % j σ1 j j ⎟ ⎜ ⎟ ⎜ j =1 ×wj (k)dj (k) + M−1 ⎟, ⎜ ww L % L ⎠ ⎝ η2 % + 2 ι1 2 σ ⎛
η1 c2 σ22
2
(7.81)
i=1 j =1
with .ι1 (k) = (ej (k) − ei (k))(wsj (k) − wsi (k))κσ2 (e(j ) − e(i)). Based on the proof in [102] and [103], we also obtain ||∇x(k) g(x(k))||1 ≤ φ2 (σ2 ) =
.
√ n(η1 7 + η2 8 ) 2 Mmin (c σ22 η1 5 + 2σ22 η2 6 )
(7.82)
where L (β||wj (k)||1 + |dj (k)|)|wj (k)||1
. 7 = β||wj (k)wj T (k)||1 × j =1 +||dj (k)wj (k)||1 4
(β||wi (k) − wj (k)||1 + |di (k) − dj (k)|) L L ||wi (k) − wj (k)||1 (β||[wi (k) − wj (k)] . 8 = [wi (k) − wj (k)]T ||1 + |dj (k) − di (k)| i=1 j =1 ||wi (k) − wj (k)||1 ) Therefore, the following Theorem 7.2 holds.
(7.83)
216
7 Cubature Kalman Filtering Under Information Theoretic Criteria
Theorem 7.2 If the kernel size satisfies .σ2 ≥ max{σ †1 , σ †2 } and
β>
.
Mmin η1
√ n η 1 3 + η 2 4 L % j =1
wj (k)wj
T (k) + η
L L % % 2
i=1 j =1
,
(7.84)
1
where .σ †1 and .σ †2 are the solutions of .φ1 (σ2 ) and .φ2 (σ2 ), we have 0 .
||g(x(k))||1 ≤ β
||∇x(k) g(x(k))||1 = ∂g(x(k)) ∂x(k) ≤ ς ≤ 1,
(7.85)
1
where .ς = φ2 (σ 2 ). The proof of Theorem 7.2 is omitted because it is similar to the proofs in [102] and [103]. According to Theorem 7.2 and Banach fixed-point theorem [102, 103], the fixedpoint algorithm in (7.58) guarantees .x(k) to converge to a unique fixed-point in the range .x(k) ∈ {||x(k)||1 ≤ β} under kernel size .σ2 ≥ max{σ †1 , σ †2 } and the initial state vector .||x(k)0 ||1 ≤ β.
7.4.4 Illustrative Example In this section, the performance of the algorithm is demonstrated by target tracking with INS/GPS integrated system. Different coordinate frames are used for data collection. Define the INS body frame (Right-Forth-Up) by b-frame, the navigation frame (East-North-Up, ENU) by t-frame, the computed reference navigation frame by .t -frame, the Earth-centered, Earth-fixed (ECEF) by e-frame (WGS84), and the geocentric inertial coordinate frame by i-frame (J2000) [165]. Denote the target position as .p = [λ, L, H ]T with the latitude .λ, longitude L, ve sec(L) ˙ vn , .L = , and .H˙ = vu , and height H of INS in the form of .λ˙ = Rn + H Rm + H where .vt = [ve , vn , vu ]T is the velocity in navigation frame (ENU) and .Rm and .Rn represent the main curvature radius of the prime meridian and the equator, respectively. Define .ˆfb = fb + fb , .ωˆ tit = ωtit + ωtit , .ωˆ tie = ωtie + ωtie , and ˆ tet = ωtet + ωtet , where .fb denotes the specific force measured by accelerometers; .ω t is the angular velocity from the i-frame to t-frame; .ωt is the rotation velocity .ω it ie of the Earth; .ωtet is the rotation velocity from the e-frame to the t-frame; .ˆfb , .ωˆ tit , ˆ tie , and .ωˆ tet are the corresponding computed reference values in the .t -frame; and .ω t t b t .f , .ω , .ω , and .ω et are the corresponding errors. Define the Euler platform it ie error angle (EPEA) between t-frame and .t -frame by .φ. The models of attitude error angle .φ = [φe , φn , φu ]T and velocity error .v = [ve , vn , vu ]T are presented in the following [165]:
7.4 Cubature Kalman Filter Under Minimum Error Entropy with Fiducial. . .
⎧ t t t t b ⎪ ⎨ φ˙ = Cω (I − Cb )ωˆ it + ωit − Cb ωib . ˙vt = (I − (Ctb )T )Ctb ˆfb + Ctb fb ⎪ ⎩ +(2ωˆ tie + ωˆ tet ) × vt + (2ωtie + ωtet ) × v,
217
(7.86)
⎡
⎤ cos φn cos φe 0 − sin φn cos φe where .Cω = cos1φe ⎣ sin φn cos φe cos φe − cos φn sin φe ⎦ ; .Ctb is the attitude − sin φn 0 cos φn matrix from b-frame to .t -frame. The model of position errors .p = ˙ H˙ T , gyroscope dynamic drifts .δ ζ˙ , and accelerometer dynamic biases ˙ L, λ, ve sec L vn ve L tan L sec L ˙ can be expressed as .L˙ = .δ a , , .λ˙ = + Rn + H Rn + H Rm + H ˙ = vu , .δ ζ˙ = − 1 δζ + δζ , and .δ a˙ = − 1 δa + δa [165], where .τg .H τg τa and .τa are the correlation time of gyroscope and accelerometer error drifts and T T .δζ = [Qge , Qgn , Qgu ] and .δa = [Qae , Qan , Qau ] are the zero-mean Gaussian white noise. The INS error propagation equation in discrete form is written as x(k) = f(x(k − 1)) + q(k − 1),
.
(7.87)
T where .x(k) = φ T , vT , pT , δζ T , δaT and the function .f(·) is obtained by (7.86). The process noise covariance matrix takes the form of .Q(k) = diag{01×9 , Qg , Qa } with .Qg = [Qge , Qgn , Qgu ] and .Qa = [Qae , Qan , Qau ]. Subtracting the position of GPS from INS, we have the measurement equation: [λ, L, H ]T = Hx(k) + r(k),
.
(7.88)
where .λ = λI N S − λGP S , .L = LI N S − LGP S , .H = HI N S − HGP S ; .H = [03×6 , I3×3 , 03×6 ]; .r(k) = [r1 (k), r2 (k), r3 (k)]T denotes the measurement noise. Here, the frequencies of INS and GPS are 10 HZ and 100 HZ, respectively. The random errors in gyro and accelerometer are .0.03◦ /h and .100 μg m/s2 . The gyro and accelerometer error drift correlation time are .τg = 300 s and .τa = 1000 s, respectively. The positioning error of the GPS system is .10−100 m. The robustness of the MEEF-CKF is compared with that of nonlinear Kalman filters including the EKF, UKF, CKF, MCCKF, MEE-EKF, stochastic integration student’s t filter (SISTF) [166], and robust student’s t based stochastic cubature filter (RSTSCF) [167]. In addition, key parameters including .η, .σ1 , and .σ2 are vital for improving filtering accuracy. Therefore, different methods for choosing parameters of the MEEF-CKF are considered. The MEEF-CKF (trial) represents that numerous experiments are run with different parameters so that desirable parameters are chosen for improving robustness. This approach is a hard manner for finding unknown parameters, which means parameters are predesigned and not adaptive. A soft manner, i.e., MEEF-CKF (optimization), is therefore presented for
218
7 Cubature Kalman Filtering Under Information Theoretic Criteria
avoiding this issue. In MEEF-CKF (optimization), those free parameters are chosen adaptively as shown in Algorithm 14. Several performance metrics including root mean square error (RMSE) [167, 168] and the root time averaged MSE (RTAMSE) [167, 168] are utilized for evaluating the filtering precision in complex non-Gaussian noises. The position variables .λ (latitude) and L (longitude) with units degree (.◦ ) are used in Kalman filtering. However, when calculating the RMSE and RTAMSE, we have performed the unit conversion. The units of .λ (latitude) and L, i.e., degree, can be converted into meter (m). Let us reformulate the position by .p = [p1 , p2 , p3 ]T for the same units m. The RMSE of position at discrete time k over Monte Carlo runs is then defined by 2 3 lmc np 3 3 1 mc .RMSEpos (k) = 4 (pj (k) − pˆ jmc (k))2 , lmc
(7.89)
mc =1 j =1
where .pjmc (k) and .pˆ jmc (k) represent the j -th element of desired and estimated position at discrete time k in the .mc -th Monte Carlo; notations .lmc and .np denote the number of Monte Carlo runs and dimensions of position. The estimated .pˆ jmc (k) is calculated by .pjI N S (k) − pˆ jmc (k), where .pjI N S (k) is measured by the INS and .p ˆ jmc (k) is estimated by the Kalman filtering. By contrast, the RTAMSE of position relates to calculating the root time averaged mean square error at all time instants over Monte Carlo runs. Therefore, the RTAMSE can be calculated by
RTAMSEpos
.
2 3 np M N 3 1 3 (pjmc (k) − pˆ jmc (k))2 , =4 NM
(7.90)
k=1 mc =k j =1
where notation N denotes the number of samples. Similar to the RMSE and RTAMSE in position, we can also write formula for the RMSE and RTAMSE in velocity. All simulation results are obtained by averaging over 300 independent Monte Carlo runs. We focus on performing all nonlinear Kalman filters under same initial condition for fair comparison. In simulations, observable state, i.e., o .x (k), is initialized by measurements for all nonlinear Kalman filters. As for the hidden state .xu (k), all nonlinear Kalman filters are initialized by zero value. The process noise is a zero-mean Gaussian distribution with the covariance matrix .Q(k) determined from the stochastic error of inertial sensors and set as .Q(k) = diag{01×9 , (0.03◦ /h)2 , (0.03◦ /h)2 , (0.03◦ /h)2 , (100 μg m/s2 )2 , (100 μg m/s2 )2 , 2 2 2 .(100 μg m/s ) }. Notation h denotes .1 h, and .g = 9.78 m/s represents the gravitational acceleration. The measurement noise matrix .R(k) is determined from the GPS’s stochastic measurement error. In this section, different measurement noises including three types of synthetic noises and one real-world noise are considered.
7.4 Cubature Kalman Filter Under Minimum Error Entropy with Fiducial. . .
7.4.4.1
219
Case 1: Gaussian Noise with Outliers
In this scenario, measurement noises are modelled by Gaussian noise with outliers and described by
5 5 rr (k) ∼ 0.96N 0, ((3 Re )◦ )2 + 0.04N 0, ((3000 Re )◦ )2
r3 (k) ∼ 0.96N 0, (1 m)2 + 0.04N 0, (1000 m)2 ,
.
(7.91)
5 ◦ 2 represents the Gaussian distribution with zerowhere r = 1, 2; N 0, ((3 Re ) )5 mean and standard deviation (3 Re )◦ ; Re stands for the semimajor axis of the Earth. The dof parameters of SISTF and RSTSCF are all set as 20, and the scaling factor η of MEEF-EKF (trial) is 0.9. At first, we investigate the effect of different parameters on the filtering performance of the MEEF-CKF. Figure 7.8 shows the RTAMSE of the MEEF-CKF versus σ2 and η under Gaussian noise with outliers. It
RTAMSEpos
300
200
100
0 1 0.5 0
10
8
6
4
2
0
12
(a) 2
250 200
RTAMSEvel
Fig. 7.8 RTAMSEs of MEEF-CKF versus .σ2 and .η under Gaussian noise with large outliers: (a) position; (b) velocity (From [127])
150 100 50 0 1 0.5 0
0
2
6
4 2
(b)
8
10
12
220
7 Cubature Kalman Filtering Under Information Theoretic Criteria
is necessary to note that all position units are converted into (m) and all velocity units are converted into (m/s). As shown in Fig. 7.8, a much larger or smaller σ2 and η may deteriorate the filtering performance of the MEEF-CKF. Then, Table 7.5 presents the effect of parameters σ1 and σ2 on the RTAMSE, where the RTAMSEs of p1 , p2 , p3 , ve , vn , and vu are calculated. From Table 7.5, the MCCKF with much small kernel size is more sensitive to outliers in comparison with the MEEF-CKF. The MEEF-CKF achieves the optimal estimate performance by choosing appropriate kernel sizes, i.e., σ1 is around 2.0 and σ2 is around 1.0. Apart from the RTAMSE shown in Fig. 7.8 and Table 7.5, RMSEs of position and velocity for all nonlinear Kalman filters are studied in Fig. 7.9. From Fig. 7.9, the MCCKF, MEE-EKF, SISTF, RSTSCF, and MEEF-CKF all have improved robustness in comparison with the EKF, UKF, and CKF. In addition, the MEEFCKF (optimization) and MEEF-CKF (trial) behave enhanced robustness against non-Gaussian noises in comparison with other Kalman filters. It is also necessary to note that the MEEF-CKF (optimization) has slightly higher estimation precision than the MEEF-CKF (trial).
7.4.4.2
Case 2: Bimodal Gaussian Mixture Noises
Apart from above Gaussian distribution with outliers, the bimodal Gaussian mixture noises are further considered here which are described by
5 5 rr (k) ∼0.96N 10−4 , ((3 Re )◦ )2 + 0.04N −10−4 , ((3000 Re )◦ )2
(7.92) r3 (k) ∼0.96N 10−4 , (1 m)2 + 0.04N −10−4 , (1000 m)2 .
.
The dof parameters of SISTF and RSTSCF and scaling factor of MEEF-EKF (trial) are as same as those in Case 7.4.4.1. Table 7.6 shows RTAMSEs of different algorithms under bimodal Gaussian mixture noises. From Table 7.6, the filtering accuracy of the MCCKF is degraded greatly by much small kernel size. By contrast, the MEEF-CKF always achieves the best estimate performance among all Kalman filters under the appropriate choice of kernel sizes and scaling factor. Figure 7.10 presents the RMSEs of position and velocity for different Kalman filters. From Fig. 7.10, the MEEF-CKF behaves superior filtering performance in comparison with other Kalman filters. In addition, the MEEF-CKF (optimization) performs better than the MEEF-CKF (trial) in dealing with non-Gaussian noises.
7.4.4.3
Case 3: Bimodal Gaussian Mixture Noises with Outliers
In this scenario, we consider the bimodal Gaussian mixture noises with outliers which are modelled by combining the Gaussian noise with outliers and bimodal Gaussian mixture noises.
= 1.0 = 1.5 = 2.0 = 3.0 = 5.0 = 10
Bold values highlight the best results
Kernel size N/A N/A N/A σ2 = 1 N/A N/A σ1 = 2.0 σ1 = 2.5 σ1 = 3.0 σ1 = 5.0 σ1 = 10 MEEF-CKF σ1 = 2.0, σ2 σ1 = 2.0, σ2 σ1 = 2.0, σ2 σ1 = 2.0, σ2 σ1 = 2.0, σ2 σ1 = 2.0, σ2
Algorithm EKF UKF CKF MEE-EKF SISTF RSTSCF MCCKF
RTAMSE of p1 (m) 543.218 360.622 359.123 128.958 273.129 273.129 N/A 136.684 166.959 259.119 342.632 78.777 100.621 127.706 148.157 209.576 314.408
RTAMSE of p2 (m) 555.682 367.792 366.643 131.283 285.274 285.274 N/A 150.655 177.304 276.697 353.983 93.240 110.648 122.355 156.152 222.425 328.797
RTAMSE of p3 (m) 303.693 137.021 136.052 57.090 119.229 119.229 N/A 71.181 81.984 117.761 135.416 50.650 55.265 61.778 77.162 107.431 137.768
RTAMSE of ve (m/s) 152.509 87.011 86.200 34.072 79.107 79.107 N/A 36.531 41.779 65.972 83.981 22.750 27.359 29.788 37.805 54.641 80.546
Table 7.5 RTAMSES of different algorithms under Gaussian noise with large outliers (From [127]) RTAMSE of vn (m/s) 174.838 101.641 100.794 37.369 88.353 88.353 N/A 38.692 46.359 73.365 96.916 21.478 28.126 35.370 42.492 60.827 91.836
RTAMSE of vu (m/s) 206.461 61.704 60.923 34.962 79.885 79.885 N/A 39.436 42.143 54.440 61.254 34.641 35.465 37.136 42.183 53.467 65.452
7.4 Cubature Kalman Filter Under Minimum Error Entropy with Fiducial. . . 221
7 Cubature Kalman Filtering Under Information Theoretic Criteria
Fig. 7.9 RMSEs of different algorithms under Gaussian noise with outliers: (a) position; (b) velocity (From [127])
1500
RMSEpos (m)
222
EKF UKF CKF MCCKF MEE-EKF SISTF RSTSCF MEEF-CKF (trial) MEEF-CKF (optimization)
1000
500
0
0
100
200
300
Time (s)
400
500
(a) 500
EKF UKF CKF MCCKF MEE-EKF SISTF RSTSCF MEEF-CKF (trial) MEEF-CKF (optimization)
RMSEvel (m/s)
400 300 200 100 0
0
100
200
300
Time (s)
400
500
(b)
5 5 rr (k) ∼0.48N 10−4 , ((3 Re )◦ )2 + 0.04N 0, ((3000 Re )◦ )2
5 + 0.48N −10−4 , ((3 Re )◦ )2
r3 (k) ∼0.48N 10−4 , (1 m)2 + 0.04N 0, (1000 m)2
+ 0.48N −10−4 , (1 m)2 .
.
(7.93)
The dof parameters of SISTF and RSTSCF and scaling factor of the MEEFEKF (trial) are as same as those in Case 7.4.4.1. Table 7.7 shows RTAMSEs of different Kalman filters under bimodal Gaussian mixture noises with outliers. Figure 7.11 presents the RMSEs of position and velocity for different algorithms. From Table 7.7 and Fig. 7.11, the same conclusions as those in Case 7.4.4.2 are obtained.
= 1.0 = 1.5 = 2.0 = 3.0 = 5.0 = 10
Bold values highlight the best results
Kernel size N/A N/A N/A σ2 = 1 N/A N/A σ1 = 2.0 σ1 = 2.5 σ1 = 3.0 σ1 = 5.0 σ1 = 10 MEEF-CKF σ1 = 2.0, σ2 σ1 = 2.0, σ2 σ1 = 2.0, σ2 σ1 = 2.0, σ2 σ1 = 2.0, σ2 σ1 = 2.0, σ2
Algorithm EKF UKF CKF MEE-EKF SISTF RSTSCF MCCKF
RTAMSE of p1 (m) 602.997 381.169 381.150 138.324 291.808 291.769 N/A 156.629 182.488 280.302 362.797 122.006 143.640 162.454 185.325 231.756 328.827
RTAMSE of p2 (m) 610.048 393.020 392.873 140.612 304.991 304.936 N/A 164.326 195.425 299.174 377.260 118.071 128.684 140.143 165.946 235.988 347.023
RTAMSE of p3 (m) 246.913 134.516 134.321 55.284 118.394 118.324 N/A 68.928 79.831 116.123 133.049 47.632 53.612 60.921 75.301 104.863 135.177
Table 7.6 RTAMSEs of different algorithms under bimodal Gaussian mixture noises (From [127]) RTAMSE of ve (m/s) 164.719 90.294 90.123 35.958 85.834 85.756 N/A 35.944 43.400 69.291 86.929 27.620 28.618 31.487 38.857 55.733 82.535
RTAMSE of vn (m/s) 194.396 103.873 103.653 40.086 92.297 92.134 N/A 38.675 47.498 75.958 99.104 26.322 34.732 40.893 46.289 62.977 92.659
RTAMSE of vu (m/s) 167.625 61.742 61.547 30.089 85.639 85.552 N/A 37.375 41.025 55.028 61.311 31.916 33.055 37.402 41.580 52.775 65.195
7.4 Cubature Kalman Filter Under Minimum Error Entropy with Fiducial. . . 223
7 Cubature Kalman Filtering Under Information Theoretic Criteria
Fig. 7.10 RMSEs of different algorithms under bimodal Gaussian mixture noises: (a) position; (b) velocity (From [127])
1500
RMSEpos (m)
224
EKF UKF CKF MCCKF MEE-EKF SISTF RSTSCF MEEF-CKF (trial) MEEF-CKF (optimization)
1000
500
0
0
100
200
300
Time (s)
400
500
(a) 500
EKF UKF CKF MCCKF MEE-EKF SISTF RSTSCF MEEF-CKF (trial) MEEF-CKF (optimization)
RMSEvel (m/s)
400 300 200 100 0
0
100
200
300
Time (s)
400
500
(b)
7.4.4.4
Case 4: Real-World Noise
The measurement noise in real world is in fact much more complex than noises in Sects. 7.4.4.1, 7.4.4.2, and 7.4.4.3. Apart from different noises studied in Sects. 7.4.4.1, 7.4.4.2, and 7.4.4.3, the filtering robustness of the MEEF-CKF is therefore further demonstrated in real-world noises. In this scenario, the process noise covariance matrix Q(k) is set by using the errors of the gyro and accelerometer. The measurement noise covariance matrix R(k) = dia{(1.52 × 10−2 )◦ , (1.60 × 10−2 )◦ , 6.90 × 101 m}2 is set by using the GPS’s stochastic measurement error. The dof parameters in SISTF and RSTSCF are all set as 0.005, and the scaling factor in MEEF-EKF (trial) is 0.85. Figure 7.12 presents the RMSEs of the position and velocity for different algorithms. From Fig. 7.12, we observe that the SISTF and RSTSCF achieve higher filtering accuracy than other filters in terms of position estimation in the early stage but diverge at the later stage. Similar conclusion can also be obtained for velocity estimation. As for the MEEF-CKF,
= 1.0 = 1.5 = 2.0 = 3.0 = 5.0 = 10
Bold values highlight the best results
Kernel size N/A N/A N/A σ2 = 1 N/A N/A σ1 = 2.0 σ1 = 2.5 σ1 = 3.0 σ1 = 5.0 σ1 = 10 MEEF-CKF σ1 = 2.0, σ2 σ1 = 2.0, σ2 σ1 = 2.0, σ2 σ1 = 2.0, σ2 σ1 = 2.0, σ2 σ1 = 2.0, σ2
Algorithm EKF UKF CKF MEE-EKF SISTF RSTSCF MCCKF
RTAMSE of p1 (m) 526.705 365.542 359.123 181.402 288.739 288.129 N/A 182.225 209.227 276.915 349.129 144.343 155.346 167.071 191.956 236.329 322.591
RTAMSE of p2 (m) 533.836 377.327 366.643 190.104 302.567 302.274 N/A 188.792 212.249 296.245 364.652 150.615 161.895 170.483 194.228 249.773 341.047
RTAMSE of p3 (m) 215.788 132.489 136.052 131.031 116.293 116.229 N/A 64.458 75.230 112.372 130.933 42.782 48.096 55.955 71.230 101.455 133.290
RTAMSE of ve (m/s) 138.136 88.781 88.200 48.353 84.654 84.107 N/A 43.958 50.092 69.965 86.059 35.772 38.937 40.874 46.598 60.553 82.996
Table 7.7 RTAMSEs of different algorithms under bimodal Gaussian mixture noises with outliers (From [127]) RTAMSE of vn (m/s) 164.283 101.255 100.794 51.920 92.527 92.353 N/A 48.654 57.849 77.390 97.059 38.152 42.423 46.480 54.337 67.102 92.120
RTAMSE of vu (m/s) 142.136 60.392 60.088 79.363 85.209 85.196 N/A 33.968 38.213 52.442 59.942 28.313 29.596 32.321 42.790 51.090 63.763
7.4 Cubature Kalman Filter Under Minimum Error Entropy with Fiducial. . . 225
Fig. 7.11 RMSEs of different algorithms under bimodal Gaussian mixture noises with outliers: (a) position; (b) velocity (From [127])
7 Cubature Kalman Filtering Under Information Theoretic Criteria EKF UKF CKF MCCKF MEE-EKF SISTF RSTSCF MEEF-CKF (trial) MEEF-CKF (optimization)
1400 1200
RMSEpos (m)
226
1000 800 600 400 200 0
100
500
300
Time (s) (a)
400
500
EKF UKF CKF MCCKF MEE-EKF SISTF RSTSCF MEEF-CKF (trial) MEEF-CKF (optimization)
400
RMSEvel (m/s)
200
300 200 100 0
0
100
200
300
Time (s)
400
500
(b)
the filtering performance of the MEEF-CKF is superior to that of other algorithms in complicated noises. Although the MEEF-CKF performs worse than other filters at small noises, all Kalman filters achieve desirable filtering precision. In addition, the MEEF-CKF (optimization) in general behaves better than the MEEF-CKF (trial). From the aspect of computational complexity, the average computing time of the UKF, CKF, MCCKF, MEEF-CKF (trial), and MEEF-CKF (optimization) is 2.40 × 10−4 s, 1.120 × 10−4 s, 3.39 × 10−4 s, 6.62 × 10−4 s, and 7.12 × 10−4 s, respectively. As the MEEF-CKF needs to calculate more matrices in (7.64), the MEEF-CKF requires more but completely acceptable computing time. It is also necessary to note that the computing time of all filters has no significant difference in terms of order of magnitude.
7.5 Conclusion
227 500
Fig. 7.12 RMSEs of different algorithms under real-world noise: (a) position; (b) velocity (From [127])
EKF UKF CKF MCCKF MEE-EKF SISTF RSTSCF MEEF-CKF (trial) MEEF-CKF (optimization)
RMSEpos (m)
400 300 200 100 0
0
100
200
300
Time (s)
400
500
(a) 60
EKF UKF CKF MCCKF MEE-EKF SISTF RSTSCF MEEF-CKF (trial) MEEF-CKF (optimization)
RMSEvel (m/s)
50 40 30 20 10 0
0
100
200
300
Time (s)
400
500
(b)
7.5 Conclusion In this chapter, the CKF based on MCC (MCCKF) is derived to solve the estimation problems in nonlinear system with Gaussian and Gaussian mixture noises. Similar to MCUF, a statistical linearization is applied to the nonlinear measurement equation, without calculation of the Jacobian matrix. Different from MCUF, the MCCKF requires no free scaling parameter to generate sigma points. Simulations demonstrate that compared with CKF, the derived MCCKF can achieve a higher filtering accuracy than CKF in nonlinear systems with Gaussian and impulsive noises. To avoid numerical stability problem in the state error covariance matrix, the MCCKF is extended as its square-root version, generating the maximum correntropy squareroot cubature Kalman filter (MCSCKF). In the MCSCKF, the nonlinear regression model is constructed, and the MCC is used to obtain the optimal estimates. The MCSCKF is applied to the SINS/GPS integrated system, which exhibits strong robustness against the heavy-tailed non-Gaussian noises.
228
7 Cubature Kalman Filtering Under Information Theoretic Criteria
In addition, the cubature Kalman filter under minimum error entropy with fiducial points (MEEF-CKF) is derived. The MEEF-CKF combines the minimum error entropy and cubature Kalman filter for handling complex non-Gaussian noises. A fiducial point is also considered in MEEF-CKF for anchoring the error mean at zero automatically, which is beneficial for impelling errors to move toward zero. A designed optimization approach is presented to determine multiple free parameters of the MEEF-CKF adaptively. The improved robustness of MEEF-CKF and effectiveness of parameter optimization are demonstrated by object tracking problem with INS/GPS integration in complex non-Gaussian noises.
Chapter 8
Additional Topics in Kalman Filtering Under Information Theoretic Criteria
Besides the previously mentioned Kalman filters, some other Kalman-type filters are derived based on information theoretic criteria for specific cases. For example, to address the problem of state estimation under equality constraints, the maximum correntropy Kalman filter with state constraints (MCKF-SC) was developed [169], which combines the MCC and constrained estimation methodology. In addition, two novel nonlinear filters, the correntropy-based first-order divided difference (CDD1) filter and the correntropy-based second-order divided difference (CDD2) filter [170– 172], are derived to solve the problem of numerical instability due to the propagation of a nonpositive definite covariance matrix. When the parameters of system model are unknown a priori or varying with time, the dual Kalman filtering under minimum error entropy with fiducial points (MEEF-DEKF) [173] provides an effective tool in estimating the model parameters and the hidden states under non-Gaussian noises. For nonlinear time-series prediction, the kernel Kalman filtering based on conditional embedding operator and maximum correntropy criterion (KKF-CEOMCC) was also derived [174], which shows significant performance improvements over traditional filters in noisy nonlinear time-series prediction.
8.1 Maximum Correntropy Kalman Filter with State Constraints (MCKF-SC) The MCKF can achieve better performance than the conventional MMSE-based Kalman filters, especially when the underlying dynamic system is disturbed by heavy-tailed non-Gaussian noise. If some prior information about the system is available, the performance of MCKF can be further improved. In the following, we will consider linear or nonlinear constraints on the system states. For simplicity, we omit the time step k.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Chen et al., Kalman Filtering Under Information Theoretic Criteria, https://doi.org/10.1007/978-3-031-33764-2_8
229
230
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
8.1.1 Linear Constraint For a linear state-space model, assume that the state elements are subject to an additional linear constraint Mx = m,
.
(8.1)
where .M is an .s × n matrix, .m is an .s × 1 vector, and .s ≤ n. Moreover, we assume that .M is of full rank. Next, two methods are presented to incorporate the above linear constraint.
8.1.1.1
Projection Estimation
Projection estimation is a common method to address the problem of constrained filtering. It projects the unconstrained estimate .xˆ onto the constrained surface. The constrained estimate .x can be given by the following equation:
x = arg min
.
x
T x − xˆ W x − xˆ ,
(8.2)
satisfying Mx = m,
.
(8.3)
where .W is a positive definite weighting matrix. Then, the Lagrangian function is T L = x − xˆ W x − xˆ + 2λT (Mx − m) .
.
(8.4)
The optimal solution can be found by solving ∂L =0 ∂x
(8.5)
∂L = 0. ∂λ
(8.6)
.
.
Thus, we have −1 Mˆx − m . x = xˆ − W−1 MT MW−1 MT
.
(8.7)
We often set .W = (P(k))−1 or .W = I to obtain the constrained estimate. Algorithm 15 describes all the steps of the MCKF with projection estimation algorithm.
8.1 Maximum Correntropy Kalman Filter with State Constraints (MCKF-SC)
231
Algorithm 15 MCKF with State Constraints [169] Step 1: Initialize the state a priori estimate xˆ (1|0) and state prediction error covariance matrix P(1|0); set a proper kernel size σ and a small positive number ε. Step 2: Calculate xˆ (k|k − 1) and P(k|k − 1); use the Cholesky decomposition of (k) to obtain p (k|k − 1) and r (k). Step 3: Let τ = 1 and xˆ 0 (k) = xˆ (k|k − 1). Step 4: Use available measurements {y(k)}N k=1 to update: xˆ τ (k) = xˆ (k|k − 1) + K(k)(y(k) − H(k)ˆx(k|k − 1)) −1 K(k) = P(k|k − 1)HT (k) H(k) P(k|k − 1)HT (k) + R(k)
(8.8) (8.9)
−1 P(k|k − 1) = Bp (k|k − 1) Cx (k) BTp (k|k − 1)
(8.10)
−1 R(k) = Br (k) Cy (k) BTr (k)
(8.11)
Cx (k) = diag{κσ ( e1 (k)) , . . . , κσ ( en (k))}
(8.12)
Cy (k) = diag{κσ ( en+1 (k)) , . . . , κσ ( en+m (k))}
(8.13)
ei (k) = di (k) − wi (k)ˆxτ −1 (k)
(8.14)
xˆ τ (k)−ˆxτ −1 (k) ≤ ε, go to Step 5; otherwise, set τ = τ + 1 and go to (8.8). xˆ τ −1 (k) Step 5: Obtain xˆ (k) and T T (k) + K(k)R(k) K P(k) = I − K(k)H(k) P(k|k − 1) I − K(k)H(k) Step 6: State constraint Mx(k) = m −1 ˆ (k) − W−1 MT MW−1 MT Mˆx(k) − m .x(k) = x Step 7: Set k = k + 1 and go to Step 2. If .
8.1.1.2
Probability Density Function Truncation
The probability density function (PDF) truncation method truncates the PDF of the state estimate at the constraint edges. The constrained state estimate is equal to the mean of the truncated PDF. The steps of this method are summarized as follows. First, we define .x(t) as the state estimate after the first t constraints of (8.1) have (t) been enforced and .P as its covariance matrix. Initialize (t)
t = 0, x(t) = xˆ , P
.
= P(k).
(8.15)
Make the following transformation: z(t) = ρS−1/2 UT (x − x(t) ),
.
(8.16)
where .ρ is an orthogonal matrix, .U is an orthogonal matrix, and .S is a diagonal matrix.
232
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria (t)
Solve the singular value decomposition (SVD) of .P , that is, (t)
P
.
= USVT ,
(8.17)
and obtain the orthogonal matrix .ρ by using the Gram-Schmidt orthogonalization that satisfies
T 1/2 ρS1/2 UT MTt+1 = (Mt+1 P(t) MT ) , 0, · · · , 0 , t+1
(8.18)
.
where .Mt+1 denotes the .t + 1-th row of .M. According to (8.1), we have Mt+1 US1/2 ρ T z(t) + Mt+1 x(t) = mt+1 ,
(8.19)
.
where .mt+1 denotes the .(t + 1)-th element of .m. (t) Dividing both sides of (8.19) by .(Mt+1 P MTt+1 )1/2 and rearranging it yield .
mt+1 − Mt+1 x(t)
Mt+1 US1/2 ρ T (t)
(Mt+1 P MTt+1 )
z(t) = 1/2
1/2
(t)
.
(8.20)
(Mt+1 P MTt+1 )
As shown by (8.16)–(8.18), .z(t) has a mean of zero and a covariance matrix of identity. We define ct+1 =
.
(mt+1 − Mt+1 x(t) ) (t)
(Mt+1 P MTt+1 )
1/2
,
(8.21)
and have .
1 0 · · · 0 z(t) = ct+1 .
(8.22)
The mean and covariance matrix of .z(t+1) are given by μ = ct+1
(8.23)
ξ 2 = 0,
(8.24)
.
.
and the mean and variance of the transformed state estimate after the first .t + 1 constraints are enforced can be written as T z(t+1) = μ 0 · · · 0
.
(8.25)
8.1 Maximum Correntropy Kalman Filter with State Constraints (MCKF-SC)
Cov(z(t+1) ) = diag{ ξ 2 1 · · · 1 }.
.
233
(8.26)
Making the inverse transformation of (8.16), the mean and variance of the constrained state estimate that match the .(t + 1)-th constraint can be computed by x(t+1) = US1/2 ρ T z(t+1) + x(t)
.
(t+1)
P
.
= US1/2 ρ T Cov(z(t+1) )ρS1/2 UT .
(8.27) (8.28)
The step stops when .t + 1 = length(m), where .length(m) denotes the dimension of .m.
8.1.2 Nonlinear Constraint Now we assume that the states are subject to the following nonlinear constraint: .
g(x) = m,
(8.29)
where .g(·) is a nonlinear function of .x and .m is an .s × 1 vector. Below we present two methods to address the nonlinear constraint (8.29).
8.1.2.1
Linear Approximation
We expand the nonlinear state constraints around the current constrained state estimate .x and get gi (x) − mi =gi (x) + gi (x)T (x − x) + .
1 (x − x)T gi (x)(x − x) + · · · − mi 2!
=0, (8.30) where .gi (·) is the i-th row of .g(·), .mi is the i-th element of .m, and .gi (·) and .gi (·) represent the first and second partial derivatives of .gi (·). Retaining only the first-order terms, and making some rearrangements, we have g (x)T x ≈ m − g(x) + g (x)T x,
.
(8.31)
which has the similar form to (8.1) by replacing .M and .m with .g (x)T and .m−g(x)+ g (x)T x. The problem of nonlinear constraints can thus be solved approximately by a similar process of linear constraints.
234
8.1.2.2
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
Second-Order Approximation
If the state constraint is second-order, it can be viewed as a second-order approximation of the nonlinearity:
T x T t x f(x(k)) = T 1 t t0 1 .
(8.32)
=xT Tx + tT x + xT t + t0 = 0. Like the idea of the constrained Kalman filter of [62], we can project an unconstrained state estimation onto a nonlinear constrained surface: .x = arg min (8.33) (z − Sx)T (z − Sx) , x
such that f(x) = 0.
(8.34)
.
Clearly, if we set .W = ST S and .z = Sˆx, Eqs. (8.33) and (8.34) become the same as (8.2) and (8.3). The solution of the constrained optimization problem in (8.33) and (8.34) can be obtained by using the Lagrangian multiplier technique: x = g−1 v(I + β T )−1 a(β)
.
q(β) =
.
a 2 (β)α 2 i i i
(1 + βαi2 )
2
+2
ai (β)cj i
1 + βαi2
+ t0 = 0,
(8.35) (8.36)
where .g is an upper right diagonal matrix solved by the Cholesky decomposition of W = HT H:
.
W = gT g.
.
(8.37)
In (8.35), .v is an orthonormal matrix, and . is a diagonal matrix with diagonal components denoted by .αi , which can be found by the singular value decomposition (SVD) of the matrix .Lg−1 as Lg−1 = UvT ,
.
(8.38)
where .U is the orthonormal matrix of the SVD, .L can be solved by the factorization of .T = LT L, and a(β) = [. . . ai (β) . . .]T = vT (gT )−1 (HT z − βt)
.
(8.39)
8.1 Maximum Correntropy Kalman Filter with State Constraints (MCKF-SC)
c = [. . . ci . . .]T = vT (gT )−1 t.
235
(8.40)
.
It is difficult to obtain a closed-form solution for the nonlinear equation .q(β) = 0 in (8.36), so one has to solve it using some numerical root-finding algorithms such as Newton’s method. Taking the derivative of .q(β) with respect to .β, we have q(β) ˙ =2
ai (β)a˙ i (1 + βα 2 )α 2 − a 2 (β)α 4 i
i 3 2 (1 + βαi )
i .
+2
i
i
a˙ i ci (1 + βα 2 ) − ai (β)ci α 2 i
i
2
(1 + βαi2 )
i
(8.41) ,
where a = [. . . a˙ i . . .]T = −vT (gT )−1 t.
.
(8.42)
Then, the solution of .β with Newton’s method can be updated by β (t+1) = β (t) − .
starting with β
q(β (t) ) q(β ˙ (t) )
(0)
(8.43)
= 0, t = 0.
(t+1) (t) β −β
≤ ε, or the number of iterations reaches a The iteration stops when . |β (t) | prespecified number. Substituting the solution of .β into (8.35), one can obtain the constrained optimal solution.
8.1.3 Illustrative Example Consider a ground vehicle that travels along a circular road. The turn center is assumed to be the origin of the coordinate system, and the turn radius is .r = 100 m. The angular turn rate of the vehicle is a constant .5.7 deg/s with an equivalent linear speed of .10 m/s. The true initial state is T x(0) = x1 , x2 , x3 , x4 . T = 100 m, 0 m/s, 0 m, 10 m/s , where the estimated initial state is assumed to be the same as the true initial state, and corresponding covariance matrix is .P(0) = diag{ 52 , 12 , 52 , 12 }. The vehicle is tracked by a sensor located at the origin with a sampling interval of .T = 1 s. The measurement equation is
236
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
1000 .y(k) = x(k) + r(k), 0010
(8.44)
where the measurement noises are assumed to follow the distributions r1 (k) ∼ 0.8N(0, 9) + 0.2N(0, 900) .
r2 (k) ∼ 0.8N(0, 9) + 0.2N(0, 900).
Moreover, the state equation is a discrete-time second-order kinematic model, given by ⎡
1 ⎢0 .x(k + 1) = ⎢ ⎣0 0
T 1 0 0
0 0 1 0
⎤ ⎡1 2 0 2T ⎥ ⎢ T 0⎥ x(k) + ⎢ ⎣ 0 T⎦ 0 1
⎤ 0 0 ⎥ ⎥ 1 2 ⎦ q(k), T 2 T
(8.45)
where the process noises are assumed to be Gaussian, with zero mean and covariance .Q = diag{ 2, 2 }. In the simulation, we carry out 100 independent Monte Carlo runs each lasting ◦ .60 s. Linear1 denotes the case where a linearizing point at .θ = 45 is selected to cover the entire curved road, and the linearized state constraint at .θ is given by
.
cos θ 0 sin θ 0 r x= . 0 cos θ 0 sin θ 0
(8.46)
Linear2 denotes the case in which four linearized points at .θ1 = 45◦ , .θ2 = 135◦ , ◦ ◦ .θ3 = 225 , and .θ4 = 315 are selected to cover the entire curved road with four linear segments. The switching points are .θ5 = 90◦ , .θ6 = 180◦ , and .θ7 = 270◦ , respectively. Nonlinear1 denotes a nonlinear constraint with respect to the position estimate, given by f1 (x(k)) = x12 (k) + x32 (k) = r 2 .
.
(8.47)
Nonlinear2 denotes the constraint with respect to both the position and velocity, and the constrained velocity estimate is obtained by the following projection: v = (ˆvτ )τ,
.
(8.48)
is the constrained velocity estimate, .vˆ = xˆ2 , xˆ4 is the T unconstrained velocity estimate, and .τ = − sin θ , cos θ is the constrained unit T direction vector associated with the constrained position estimate .p = x 1 , x 3 at −1 (x /x ). .θ = tan 3 1 where .v =
x2, x4
T
8.2 Correntropy-Based Divided Difference Filter (CDD) Trajectories
150
true values measurements MCKF MCKF−linear1 MCKF−linear2 MCKF−nonlinear1 MCKF−nonlinear2
100 50 y/m
237
0 −50 −100 −150 −150
−100
−50
0
50 x/m
100
150
200
250
Fig. 8.1 Trajectories of different filters (from [169]) Table 8.1 .ARMSEp and of different filters (From [169])
.ARMSEv
Filter KF MCKF KF-linear1 KF-linear2 MCKF-linear1 MCKF-linear2 KF-nonlinear1 KF-nonlinear2 MCKF-nonlinear1 MCKF-nonlinear2
.ARMSEp
.ARMSEv
10.9234 10.0285 104.0186 12.5875 103.9012 11.8897 5.3354 5.3354 4.3476 4.4060
4.1711 4.1291 6.9674 4.2057 7.0238 4.0933 2.6478 1.2226 2.4212 0.9674
Figure 8.1 shows the trajectories of different MCKF family filters, and Table 8.1 summarizes the .ARMSEp and .ARMSEv of different filters. In the simulation, the kernel size is set to .σ = 2.0. Again, the MCKF family filters achieve better performance than the KF family filters, and the constrained filters outperform the unconstrained ones. In addition, the second-order nonlinear approximation-based filters are superior to the linear approximation-based filters.
8.2 Correntropy-Based Divided Difference Filter (CDD) The divided difference filter (DD) [170, 171] is derived from polynomial approximations of the nonlinear transformations using multidimensional interpolation formula and can be classified into the first-order divided difference filter (DD1) and the second-order divided difference filter (DD2). The DD can guarantee a positive semidefiniteness of the covariance matrix. The DD2 filter not only has fewer parameters
238
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
but also has nearly the same performance as the UKF. Therefore, robust firstand second-order divided difference filtering algorithms based on correntropy are derived in the following, which not only retain the advantages of divided difference filters but also exhibit robustness in the presence of non-Gaussian noises, especially when the measurements are contaminated by heavy-tailed noises.
8.2.1 First-Order Divided Difference Filter (DD1) The nonlinear state and measurement equations are described by x(k) = f (x(k − 1), q(k − 1))
.
y(k) = h (x(k), r(k)) ,
(8.49)
where .x(k) ∈ Rn and .y(k) ∈ Rm are the n -dimensional state vector and m dimensional measurement vector at time step k. .f (·) and .h(·) are the continuous and differentiable state functions and the measurement function. .q(k − 1) and .r(k) are the process and measurement noises, which are assumed i.i.d and independent of the states, with means denoted by .q(k − 1) and .r(k) and covariance matrices by .Q(k − 1) and .R(k). The square-root decompositions of the predicted state error covariance .P(k), estiˆ mated state error covariance .P(k), process noise covariance .Q(k), and measurement noise covariance .R(k) are introduced as T
P(k) = Sx(k) Sx(k) ˆ P(k) = Sˆ x(k) Sˆ Tx(k) .
Q(k) = Sq(k) STq(k) R(k) = Sr(k) STr(k) . The DD1 filter uses the first-order divided differences to approximate the nonlinear equation. Let the element of i -th row, j -th column of .Sxˆx be .Sxˆx i,j, , i.e., k k .S = Sxˆx i,j , and similarly for other matrices. The four matrices containing xˆxk k first-order divided differences are defined by fi xˆ (k − 1) + cSˆ x(k−1),j , q(k − 1)
−fi xˆ (k − 1) − cSˆ x(k−1),j , q(k − 1) /2c . = fi xˆ (k − 1), q(k − 1) + cSq(k−1),j
Sxˆx(k−1) =
.
Sxq(k−1)
(8.50)
8.2 Correntropy-Based Divided Difference Filter (CDD)
Syx(k) Syr(k)
239
(8.51) −fi xˆ (k − 1), q(k − 1) − cSq(k−1),j /2c .
= hi x(k) + cSx(k),j , r(k) − hi x(k) − cSx(k),j , r(k) /2c . (8.52) = hi x(k), r(k) + cSr(k),j − hi x(k), r(k) − cSr(k),j /2c , (8.53)
where .Sˆ x(k−1),j is the j -th column of .Sˆ x(k−1) and c is the interval length generally set at .c2 = 3. The predicted state and the Cholesky factor of corresponding covariance are x(k) = f xˆ (k − 1), q(k − 1)
. Sx(k) = H Sxˆx(k−1) , Sxq(k−1) ,
(8.54)
where .H(·) denotes a Householder transformation of the augment matrix [170]. The predicted measurement and the Cholesky factor of corresponding covariance are given by
.
y(k) = h (x(k),
r(k)) Sy(k) = H Sy¯x(k) , Syr(k) .
(8.55)
The Kalman gain is −1 T K(k) = Sx(k) ST . yx(k) Sy(k) Sy(k)
.
(8.56)
The updated state and the Cholesky factor of corresponding covariance are
.
xˆ (k) = x(k)
+ K(k) (y(k) − y(k)) ˆSx(k) = H Sx(k) − K(k)S , K(k)S . yr(k) yx(k)
(8.57)
8.2.2 Second-Order Divided Difference Filter (DD2) The DD2 filter uses the second-order divided differences to approximate the nonlinear equation. Four matrices containing second-order divided differences are defined by √ .S xˆx(k−1)
=
c2 − 1 ˆ x(k−1),j , q(k − 1) ˆ f x (k − 1) + c S i 2c2
240
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
Sxq(k−1)
Syx(k)
Syr(k)
+ fi xˆ (k − 1) − cSˆ x(k−1),j , q(k − 1) . (8.58) −2fi xˆ (k − 1), q(k − 1) √ c2 − 1 fi xˆ (k − 1), q(k − 1) + cSq(k−1),j = 2 2c (8.59) + fi xˆ (k − 1), q(k − 1) − cSq(k−1),j . −2fi xˆ (k − 1), q(k − 1) √ c2 − 1 h x(k) + cS r(k) + h , = x(k) − cS r(k) . , i i x(k),j x(k),j 2c2 (8.60) −2hi (x(k), r(k))]} √ c2 − 1 hi x(k), r(k) + cSr(k),j + hi x(k), r(k) − cSr(k),j = 2 2c (8.61) −2hi (x(k), r(k))]} .
The predicted state and the Cholesky factor of corresponding covariance are given by
c2 − nx − nq .x(k) = c2 +
f xˆ (k − 1), q(k − 1)
nx 1 ˆ x(k−1),j , q(k − 1) ˆ f x (k − 1) + c S 2c2 j =1
+f xˆ (k − 1) − cSˆ x(k−1),j , q(k − 1) +
nq 1 f xˆ (k − 1), q(k − 1) + cSq(k−1),j 2 2c j =1
Sx(k)
+f xˆ (k − 1), q(k − 1) − cSq(k−1),j .
=H Sxˆx(k−1) , Sxq(k−1) , Sxˆx(k−1) , Sxq(k−1) .
(8.62) (8.63)
The predicted measurement and the Cholesky factor of corresponding covariance are given by
8.2 Correntropy-Based Divided Difference Filter (CDD)
c2 − nx − nr .y(k) = 2 +
241
h (x(k), r(k))
nx
1 h , x(k) − cS r(k) , x(k) + cS r(k) + h x(k),j x(k),j 2c2 j =1
+
nr 1 h x(k), r(k) + cSr(k),j + h x(k), r(k) − cSr(k),j . 2 2c j =1
Sy(k) =H
Syx(k) ,
Syr(k) ,
Syx(k) ,
Syr (k)
(8.64)
(8.65)
.
The Kalman gain is computed as −1 T K(k) = Sx(k) ST . yx(k) Sy(k) Sy(k)
(8.66)
.
The updated state and the Cholesky factor of corresponding covariance are
.
xˆ (k) = x(k) + K(k) (y(k) − y(k)) ˆSx(k) = H Sx(k) − K(k)S K(k)Syr(k) , yx(k) ,
K(k)Syx(k) ,
K(k)Syr(k)
.
(8.67)
8.2.3 Correntropy-Based DD1 and DD2 This section derives the DD1 and DD2 filters under maximum correntropy criterion (MCC). First, the measurement in Eq. (8.49), whose noise is additive, can be written as .y(k) = h (x(k)) + r(k). A nonlinear regression model is reconstructed:
.
x(k) x(k) = + ν(k), y(k) h (x(k))
(8.68)
the Cholesky factor of covariance of .ν(k) is given by Sx(k) 0 . . (k) = 0 Sr(k)
†
(8.69)
Left multiplying both sides of (8.68) by .† (k)−1 , we have the regression model: d† (k) = h¯ † (x(k)) + e(k).
.
(8.70)
242
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
The regression problem associated with (8.70) can be solved under MCC, and the corresponding cost function is J (x(k)) =
L
.
κσ (ei (k)) .
(8.71)
i=1
The solution can be found by setting the first derivation of (8.71) to be zero: m+n .
ψ (ei (k))
i=1
∂ei (k) = 0, ∂x(k)
(8.72)
where .ψ (ei (k)) = κσ (ei (k)) ei (k). Defining . i (k) = ψ (ei(k)) /ei (k) = κσ (ei (k)) and .(k) = diag{1 (k), . . ., x (k) 0 .n+m (k)} = , Eq. (8.72) can be written 0 y (k) .
h¯ † (x(k)) ∂x(k)
T
(k) d(k) − h¯ † (x(k)) = 0.
(8.73)
The MCC uses .(k) to reweight the covariance matrix of the residual error and reconstruct the measurement information. Thus, the updated covariance can be written as † (k) = † (k)−1 (k)† (k)T ,
.
(8.74)
decomposed into two portions: † = .
†x 0 . †y 0
(8.75)
The initial value can be set to .xˆ (k)(0) = x¯ (k) or be equal to the updated state computed from the standard DD1 or DD2. We obtain the following equations:
.
¯T †x = S¯ x(k) · −1 x · Sx(k) † T y = Sr(k) · −1 y · Sr(k) .
(8.76)
To reduce the computation time, only one iteration can be used to obtain the solution. A one-step correntropy update for the DD1 (CDD1) filter is [172]
8.2 Correntropy-Based Divided Difference Filter (CDD) (1)
=H
Sy(k) .
K(k)(1) = Sx(k) −1 x xˆ (k) Sˆ x(k)
−1/2
−1/2
, Syr(k) y (1) (1)T −1 S S ST yx(k) y(k) y(k)
Syx(k) x
(1) = x(k) + K(k) (y(k) − y(k))
=H
−1/2
Sx(k) x
243
−1/2
− K(k)(1) Syx(k) x
,
−1/2
K(k)(1) Syr(k) y
. (8.77)
Similarly, a one-step correntropy update for the DD2 (CDD2) filter is [172] (1)
Sy(k) = H
.
−1/2
Syx(k) x
,
−1/2
Syr(k) y
,
−1/2
Syx(k) x
,
−1/2
Syr(k) y
(1) (1)T −1 T S K(k)(1) = Sx(k) −1 S S x yxk y(k) y(k) xˆ (k) = x(k) + K(k)(1) (y(k) − y(k)) (8.78) −1/2 −1/2 −1/2 Sx(k) x − K(k)(1) Syx(k) x , K(k)(1) Syr(k) y , . Sˆ x(k) = H −1/2 −1/2 K(k)(1) Syx x , K(k)(1) Syr(k) y Remark 8.1 The correntropy-based divided difference filters utilize the correntropy to improve the performance in the presence of heavy-tailed non-Gaussian noises, in which a nonlinear regression model is applied to update the measurement information. It is noted that the kernel size plays a key role in the algorithm. When kernel size .σ → ∞ and the matrix . → I, the CDD1 and CDD2 filters will reduce to the original DD1 and DD2 filters. In practical applications, the measurement system may yield extremely large values. In this case, the CDD1 and CDD2 filters may face numerical problems since .y will be nearly singular. In view of this problem, a stabilizing method is introduced as −1 o(k) = (y(k) − y(k))T Sy(k) STy(k) (y(k) − y(k)) .
.
(8.79)
If .|o(k)| > othr , with .othr being a positive threshold, only the predicted step is performed, i.e., .xˆ (k) = x(k) and .Sˆ x(k) = Sx(k) . If .|o| othr , the CDD1 and CDD2 are worked.
8.2.4 Illustrative Example In this section, to demonstrate the performance of the above algorithms, we apply them to solve the problem of ship positioning. The model combines dead reckoning (DR) and GPS technology to improve the accuracy of positioning.
244
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
8.2.4.1
State and Measurement Models
The motion of a ship can be denoted by a nonlinear function in terms of many factors, such as speed, course, shape of the Earth, sea current, wind, and so on. There are two kinds of maneuvering motions of a ship, which are speed maneuver in a straight line and direction maneuver. Since the acceleration of the ship is generally small and the sampling period of the integrated positioning system is relatively short, the speed maneuver can be ignored or regarded as speed noise. The direction maneuver can be approximately described by uniform circular motion with a constant change rate of the ship’s course. Therefore, there are two kinds of states related to the motion of a ship, which are uniform linear motion and uniform circular motion. In view of the above, the state vector is chosen as x= ϕ
λ
.
vN
vE
s
K
T
,
(8.80)
where .ϕ and .λ denote the arc lengths of latitude and longitude, .vN and .vE are the northward velocity component and the eastward velocity component of the ocean current, s denotes the velocity of that ship relative to the water, K is the ship’s course, and . represents the change rate of the ship’s course. Correspondingly, the state equations of the ship are written as .
ϕ˙ = vN + s cos K + q1.
(8.81)
ν˙ = vE + s sin K + q2.
(8.82)
v˙N = −βvN + q3.
(8.83)
v˙E = −βvE + q4.
(8.84)
s˙ = q5.
(8.85)
K˙ = + q6.
(8.86)
˙ = q7 ,
(8.87)
where .β denotes the inverse correlation time of ocean current and .qi are independent Gaussian white noises. By discretizing these equations, we obtain the following equations: ϕ(k) =ϕ(k − 1) + β −1 (1 − exp(−βT ))vN (k − 1)
.
+ s(k − 1) cos (K(k − 1) + 0.5T (k − 1)) T + q1 (k − 1) λ(k) =λ(k − 1) + β −1 (1 − exp(−βT ))vE (k − 1) + s(k − 1) sin (K(k − 1) + 0.5T (k − 1)) T + q2 (k − 1) vN (k) = exp(−βT )vN (k − 1) + q3 (k − 1) vE (k) = exp(−βT )vE (k − 1) + q4 (k − 1)
8.2 Correntropy-Based Divided Difference Filter (CDD) Table 8.2 Parameters of the example (from [172])
245 Parameter .β
−1 ,
Value .27, 780 12
s
.T , s
Table 8.3 Initial conditions (from [172])
Parameter m .λ0 , m .vN,0 , m/s .vE,0 , m/s .s0 , m/s .K0 , rad .0 , rad/s
Value × 106 7 .1.2565 × 10 1 1 .10.289 .π/4 0
.ϕ0 ,
.2.2239
s(k) =s(k − 1) + q5 K(k) =K(k − 1) + (k − 1)T + q6 (k − 1) (k) =(k − 1) + q7 (k − 1),
(8.88)
where T is sampling period. The measurement vector is chosen as y(k) = ϕG (k)
.
λG (k) sL (k) KG (k)
T
,
(8.89)
where .ϕG and .λG denote the arc lengths of latitude and longitude provided by GPS, sL is the velocity of that ship relative to the water provided by electromagnetic log, and .KG represents the ship’s course provided by electric gyrocompass. Accordingly, the measurement equations are
.
ϕG (k) = ϕ(k) + r1 (k) λG (k) = λ(k) + r2 (k) .
sL (k) = s(k) + r3 (k)
(8.90)
KG (k) = K(k) + r4 (k), where .ri (k) are independent white noises. Some filters are tested for comparison, including EKF, DD1, UKF, DD2, the Huber-based first-order divided difference (HDD1) filter, and the Huber-based second-order divided difference (HDD2) filter. In the simulation, 100 independent Monte Carlo experiments were conducted, and the duration of each experiment was .1200 s. The parameters in this example and initial conditions are summarized in Tables 8.2 and 8.3, and the process noises satisfy the Gaussian distributions:
246
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
Table 8.4 TMSEs of position in Gaussian noises (from [172])
Filter EKF DD1 HDD1 CDD1 (.σ CDD1 (.σ CDD1 (.σ UKF DD2 HDD2 CDD2 (.σ CDD2 (.σ CDD2 (.σ
= 2) = 5) = 20)
= 2) = 5) = 20)
TMSEs of .ϕ 10.3197 10.3196 12.1968 23.8353 10.4517 10.3215 10.3093 10.3086 12.1613 23.8503 10.4366 10.3102
TMSEs of .λ 11.1708 11.1707 13.1074 22.6151 11.2693 11.1690 11.1557 11.1549 13.0604 22.3792 11.2484 11.1529
q1 ∼ N 0, 0.684 m2 , q2 ∼ N 0, 0.684 m2 . q3 ∼ N 0, 0.000158( m/s)2 , q4 ∼ N 0, 0.000158( m/s)2 . q5 ∼ N 0, 0.00158( m/s)2 , q6 ∼ N 0, 0.0026rad2 .
.
q7 ∼ 0.
8.2.4.2
(8.91) (8.92) (8.93) (8.94)
Simulation Results
First, we assume the measurement noises satisfy the Gaussian distributions: r1 ∼ N 0, 10000 m2 , r2 ∼ N 0, 10000 m2 . r3 ∼ N 0, 0.0423 (m/s)2 , r4 ∼ N 0, 0.0000395 rad2 .
(8.95)
The TMSEs of position in Gaussian noises are revealed in Table 8.4. It can be seen that the DD2 filter and UKF have a similar performance, likewise for the DD1 filter and EKF. The DD2 filter and UKF are superior to the DD1 filter and EKF, exhibiting slightly smaller errors. Meanwhile, the robust filters do not perform as well as their non-robust counterparts under Gaussian noise conditions. The CDD1 and CDD2 filters achieve almost the same performance as the DD1 and DD2 filters when the kernel size is large enough. It is noted that the UKF sometimes stops executing because the parameters of the UKF may not be finely tuned enough, bringing about the problem of propagation of the nonpositive definite covariance matrix. Second, we assume that the measurement noises are heavy-tailed non-Gaussian, satisfying the following distributions: r1 ∼ N 0, 10000 m2 + N 0, 1000000 m2
.
8.3 Dual Extended Kalman Filter Under Minimum Error Entropy with. . .
247
Fig. 8.2 .MSEs of .ϕ with first-order approximate filters in non-Gaussian measurement noises (from [172])
r2 ∼ N 0, 10000 m2 + N 0, 1000000 m2 r3 ∼ N 0, 0.0423(m/s)2 + N 0, 4.23 (m/s)2 r4 ∼ N 0, 0.0000395rad2 + N 0, 0.00395rad2 .
(8.96)
Figures 8.2, 8.3, 8.4, and 8.5 show the MSEs of different filters for the non-Gaussian case, and Table 8.5 summarizes the corresponding TMSEs. The performances of the DD2, UKF, DD1, and EKF follow the behavior from the Gaussian case. In the non-Gaussian case, the robust filters outperform the corresponding nonrobust filters. With a very large kernel size, the CDD1 and CDD2 filters achieve a similar estimation to the DD1 and DD2 filters; with a proper kernel size, the CDD1 and CDD2 give smaller errors than the non-robust filters; in particular, when the kernel size is set to 2, the CDD2 exhibits the smallest errors.
8.3 Dual Extended Kalman Filter Under Minimum Error Entropy with Fiducial Points (MEEF-DEKF) 8.3.1 State-Space Model Generally, a nonlinear MVAR model is constructed to fit the dynamical system: γ (k) = f(γ (k − 1), . . . , γ (k − p), w(k)) + q(k),
.
(8.97)
248
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
Fig. 8.3 .MSEs of .λ with first-order approximate filters in non-Gaussian measurement noises (from [172])
Fig. 8.4 .MSEs of .ϕ with second-order approximate in non-Gaussian measurement noises (from [172])
where .γ (k) ∈ Rm×1 is the variable of the dynamical system driven by nonlinear function .f(·) and white process noise .q(k) ∈ Rm×1 at discrete-time k; p is the order of MVAR model; and .w(k) ∈ Rnw ×1 is the parameter vector of .f(·). Equation (8.97) can be extended as a state-space model [175]: .
x(k) = A(x(k − 1), w(k)) + Bq(k).
(8.98)
y(k) = Hx(k) + r(k),
(8.99)
8.3 Dual Extended Kalman Filter Under Minimum Error Entropy with. . .
249
Fig. 8.5 .MSEs of .λ with second-order approximate in non-Gaussian measurement noises (from [172]) Table 8.5 TMSEs of position in non-Gaussian measurement noises (from [172])
Filter EKF DD1 HDD1 CDD1 (.σ CDD1 (.σ CDD1 (.σ UKF DD2 HDD2 CDD2 (.σ CDD2 (.σ CDD2 (.σ
= 2) = 5) = 20)
= 2) = 5) = 20)
TMSEs of .ϕ 91.7839 91.7776 44.2747 39.4284 64.5026 89.3531 91.4480 91.4269 43.8700 39.0126 64.1485 89.0010
TMSEs of .λ 84.7678 84.7557 37.6608 33.0989 56.9180 82.3220 84.5617 84.5294 37.4722 32.8825 56.7222 82.0980
where .x(k) = [γ T (k), γ T (k − 1), . . . , γ T (k − p + 1)]T ∈ Rnx ×1 reflects the hidden state; .A(x(k − 1), w(k)) = [fT (γ (k − 1), . . . , γ (k − p), w(k)), γ T (k − 1), · · · , T T T n ×m ; .γ (k − p + 1)] is the state transition function; .B = [Im×m , 0, . . . , 0] ∈ R x m×1 and .y(k) ∈ R represents the observation obtained by measurement transition function .H = BT and observation noise .r(k) ∈ Rm×1 . In addition, we can also construct a separate state-space model for the unknown parameter [175]: w(k) = w(k − 1) + v(k).
(8.100)
y(k) = f(x(k − 1), w(k)) + q(k) + r(k),
(8.101)
.
where the parameter .w(k) is a stationary process driven by process noise .v(k).
250
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
8.3.2 Dual Extended Kalman Filter To estimate the dynamical system states and the MVAR parameters simultaneously, the DEKF was developed to perform the dual estimation in an alternate iteration manner. To be specific, the estimation includes state estimation and parameter estimation [175]: (1) State Estimation: The a priori state estimate .xˆ (k|k −1) with corresponding error covariance matrix .Px (k|k − 1), the state gain matrix .Gx (k), and the a posteriori state estimate .xˆ (k) with corresponding error covariance matrix .Px (k) are given by ˆ xˆ (k|k − 1) = A(ˆx(k − 1), w(k|k − 1)).
.
(8.102)
Px (k|k − 1) = Fx (k)Px (k − 1)Fx (k)T + BQ(k)BT Gx (k) = Px (k|k − 1)Hx (k)T [Hx (k)Px (k|k − 1)Hx (k)T + R(k)]−1.
(8.103)
xˆ (k) = xˆ (k|k − 1) + Gx (k)(y(k) − Hx (k)ˆx(k|k − 1)).
(8.104)
P (k) = (I − G (k)H (k))P (k|k − 1),
(8.105)
x
x
where .Fx (k) =
x
x
∂A(ˆx(k − 1)) , .Q(k) = E[q(k)qT (k)], .Hx = H, and .R(k) = ∂ xˆ (k − 1)
E[r(k)rT (k)]. ˆ − 1) with cor(2) Parameter Estimation: The a priori parameter estimate .w(k|k responding error covariance matrix .Pw (k|k − 1), the parameter gain matrix w ˆ .G (k), and the a posteriori parameter estimate .w(k) with corresponding error covariance matrix .Pw (k) are calculated by ˆ ˆ − 1). w(k|k − 1) = w(k
(8.106)
P (k|k − 1) = P (k − 1) + V(k).
(8.107)
.
w
w
Gw (k) = Pw (k|k − 1)Hw (k)T [Hw (k)Pw (k|k − 1)Hw (k)T + R(k) + Q(k)]−1.
(8.108)
ˆ ˆ w(k) = w(k|k − 1) + Gw (k)(y(k) − Hx xˆ (k|k − 1)).
(8.109)
P (k) = (I − G (k)H (k))P (k|k − 1),
(8.110)
w
w
w
w
where .V(k) = (β −1 − 1)Pw (k − 1), .β ∈ (0, 1] refers to the forgetting factor ˆ ∂ xˆ (k|k − 1) f(ˆx(k − 1), w(k|k − 1)) = H(k) [175], and .Hw (k) = . ˆ ˆ ∂ w(k|k − 1) ∂ w(k|k − 1)
8.3 Dual Extended Kalman Filter Under Minimum Error Entropy with. . .
251
8.3.3 MEEF-DEKF 8.3.3.1
Batch Regression Model
ˆ The batch regression form for states .xˆ (k) or parameters .w(k) is given in the following. To avoid repetition, we denote .χ = {x, w}. Based on the state-space equations (8.98) (or (8.100)) and (8.99) (or (8.101)), we construct a framework to process the predictions and observations simultaneously:
.
χˆ (k|k − 1) I χ(k) + μχ (k), = y(k) Hχ
(8.111)
where
μχ (k) =
.
χˆ (k|k − 1) − χ (k) , rχ (k)
(8.112)
with Tχ (k)Tχ (k)T E μχ (k)μχ (k)T = χ χ . Tp (k|k − 1)Tp (k|k − 1)T 0 = . χ χ 0 Tr (k)Tr (k)T
(8.113)
To normalize the error .μχ (k), multiplying both sides of Eq. (8.111) by .Tχ (k)−1 yields dχ (k) = Uχ (k)χ(k) + eχ (k),
(8.114)
.
where
χˆ (k|k − 1) yχ (k)
. χ I χ −1 U (k) = T (k) Hχ χ χ −1 χ e (k) = T (k) μ (k), dχ (k) = Tχ (k)−1
χ
χ
χ
(8.115)
χ
with .dχ (k) = [d1 (k), · · · , dL (k)]T , .Uχ (k) = [U1 (k), · · · , UL (k)]T , .eχ (k) = χ χ [e1 (k), · · · , eL (k)]T , and .L = nχ + m. After the prewhitening, the variance of the normalized error is calculated as .E[eχ (k)eχ (k)T ] = I.
8.3.3.2
MEEF-DEKF
Combining the MEEF and Eq. (8.114), we have the MEEF-DEKF cost function:
252
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
JL = λ
L
.
χ
κσ1 (ej (k)) + (1 − λ)
j =1
L L
χ
χ
κσ2 [ej (k) − ei (k)].
(8.116)
i=1 j =1
Taking the partial derivative of Eq. (8.116) about .χ (k) and letting it equal to zero, we obtain .
∂ JL = 2λ2 Uχ (k)T χ (k)eχ (k) − Uχ (k)T χ (k)eχ (k) ∂χ(k) + λ1 Uχ (k)T ϒ χ (k)eχ (k) = 0,
where .λ1 = L i=1
λ , σ12
χ
λ2 =
.
1−λ , σ22
(8.117)
χ
χ
[χ (k)]ij = κσ2 (ej (k) − ei (k)), .[ χ (k)]ij =
.
χ
κσ2 (ej (k) − ei (k)), and .ϒ χ (k)
=
χ
χ
diag{κσ1 (e1 (k)), · · · , κσ1 (eL (k))}. In
Eq. (8.117), .χ(k) can be solved by a fixed-point iteration algorithm: ˆ t) χˆ (k)t+1 = g(χ(k) −1 Uχ (k)T χ (k)dχ (k) , = Uχ (k)T χ (k)Uχ (k)
.
(8.118)
where .χˆ (k)t represents the solution of .χ (k) at the t-th fixed-point iteration; and χ (k) = 2λ2 χ (k) − χ (k) + λ1 ϒ χ (k) χ χ χ xx (k) yx (k) x (k) 0 = 2λ2 + λ1 χ χ χ xy (k) yy (k) 0 y (k) χ χ xx (k) yx (k) = , χ χ xy (k) yy (k)
.
with χ χxx (k) = i,j (k)
.
nχ ×nχ
χ = i,j (k)
nχ ×nχ
(i = 1, 2, · · · , nχ ; j = 1, 2, · · · , nχ ) χ χ = i,j (k) χxy (k) = i,j (k) m×nχ
m×nχ
χ − i,j (k)
nχ ×nχ
χ − i,j (k)
m×nχ
(i = nχ + 1, nχ + 2, · · · , nχ + m; j = 1, 2, · · · , nχ ) χ χ χ = i,j (k) − i,j (k) χyx (k) = i,j (k) nχ ×m
nχ ×m
nχ ×m
(i = 1, 2, · · · , nχ ; j = nχ + 1, nχ + 2, · · · , nχ + m)
(8.119)
8.3 Dual Extended Kalman Filter Under Minimum Error Entropy with. . .
253
(i = 1, 2, · · · , nχ ; j = nχ + 1, nχ + 2, · · · , nχ + m) χ χ χ χ .yy (k) = (k) = (k) − (k) i,j i,j i,j m×m
m×m
(8.120)
m×m
(i = nχ + 1, nχ + 2, · · · , nχ + m; j = nχ + 1, nχ + 2, · · · , nχ + m) χx (k) = diag{κσ1 (e1 (k)), · · · , κσ1 (enχ (k))} χy (k) = diag{κσ1 (enχ +1 (k)), · · · , κσ1 (eL (k))} χxx (k) = 2λ2 χxx (k) + λ1 χx (k) χyx (k) = 2λ2 χyx (k)
χxy (k) = 2λ2 χxy (k)
χyy (k) = 2λ2 χyy (k) + λ1 χy (k). MEEF-DEKF algorithm thus proceeds as follows: ˆ Step 1 : Initialize .xˆ (1|0), .Px (1|0), .w(1|0) and .Pw (1|0); set .λ, .σ1 , .σ2 , .ε, and .τ = 1. ˆ − 1), .Pw (k|k − 1), Step 2 : Calculate .xˆ (k|k − 1), .Px (k|k − 1), .Fx (k), .Hx (k), .w(k|k and .Hw (k); construct the regression model in Eq. (8.111). Step 3 : Calculate .Txp (k|k − 1) and .Txr (k) using the Cholesky decomposition w of .Px (k|k − 1) and .R(k); calculate .Tw p (k|k − 1) and .Tr (k) using the w Cholesky decomposition of .P (k|k − 1) and .R(k) + Q(k); calculate x x −1 .d (k) = T (k) xˆ (k|k − 1)y(k) , .Ux (k) = Tx (k)−1 IHx , .dw (k) = Tw (k)−1 w(k|k ˆ − 1)y(k) and .Uw (k) = Tw (k)−1 IHw . ˆ (k) in a fixed-point form: Step 4 : Let .xˆ (k)0 = xˆ (k|k − 1); use .{y(k)}N k=1 to update .x x (k)[y(k) − Hx (k)ˆx(k|k − 1))] xˆ (k)τ = xˆ (k|k − 1) + K
.
(8.121)
x x x T ˆ (k)τ −1. .ej (k) = dj (k) − Uj (k) x x x yx (k) xx (k) x (k) = x ,. x yy (k) xy (k)
(8.122)
with
(8.123)
x Px (k|k − 1) = (Txp (k|k − 1)T )−1 xx (k)Txp (k|k − 1)−1.
(8.124)
x xy (k)Txp (k|k − 1)−1. Pxxy (k|k − 1) = (Txr (k)T )−1
(8.125)
x Pxyx (k|k − 1) = (Txp (k|k − 1)T )−1 yx (k)Txr (k)−1.
(8.126)
x Rx (k) = (Txr (k)T )−1 yy (k)Txr (k)−1
(8.127)
254
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
Px(k|k − 1) + (Hx (k))T Pxxy (k|k − 1) .K (k) = x x T x + Pyx (k|k − 1) + H (k) R (k) Hx (k)
−1
x
× Rx (k) . Pxyx (k|k − 1) + Hx (k)T
(8.128)
Step 5 : Compare .xˆ (k)τ and .xˆ (k)τ −1 .
||ˆx(k)τ − xˆ (k)τ −1 || ≤ ε. ||ˆx(k)τ −1 ||
(8.129)
If Eq. (8.129) holds, set .xˆ (k) = xˆ (k)τ and continue to Step 6. Otherwise, τ + 1 → τ , and return to Eq. (8.121). Step 6 : Update .k + 1 → k and the state posterior error covariance matrix by .
x (k)Hx (k)]Px (k|k − 1)[I − K x (k)Hx (k)]T Px (k) = [I − K
.
x (k)R(k)(K x (k))T . +K
(8.130)
ˆ 0 = w(k|k ˆ ˆ Step 7 : Let .w(k) − 1); use .{y(k)}N in a fixed-point k=1 to update .w(k) form: w (k)[y(k) − Hx (k)ˆx(k|k − 1)] ˆ τ = w(k|k ˆ w(k) − 1) + K
(8.131)
.
with w w w T ˆ τ −1. .ej (k) = dj (k) − Uj (k) w(k) w w (k) (k) w xx yx (k) = ,. w w yy (k) xy (k)
(8.132) (8.133)
T −1 w w −1 Pw (k|k − 1) = (Tw p (k|k − 1) ) xx (k)Tp (k|k − 1) .
(8.134)
w T −1 w w −1 Pw xy (k|k − 1) = (Tr (k) ) xy (k)Tp (k|k − 1) .
(8.135)
w Pw yx (k|k − 1) = (Tp (k|k
(8.136)
Rw (k) =
T −1 w w −1 (Tw r (k) ) yy (k)Tr (k)
w (k) = K
.
w −1 − 1)T )−1 yx (k)Tw r (k) .
Pw(k|k − 1) + (Hw (k))T Pw xy (k|k − 1) w w T w R (k) Hw (k) + Pyx (k|k − 1) + H (k)
w T w × Pw yx (k|k − 1) + H (k) R (k) .
(8.137) −1
(8.138)
8.3 Dual Extended Kalman Filter Under Minimum Error Entropy with. . .
255
ˆ τ and .w(k) ˆ τ −1 Step 8 : Compare .w(k) .
ˆ τ − w(k) ˆ τ −1 || ||w(k) ≤ ε. ˆ τ −1 || ||w(k)
(8.139)
ˆ ˆ τ and continue to Step 9. Otherwise, = w(k) If Eq. (8.139) holds, set .w(k) τ + 1 → τ , and return to Eq. (8.131). Step 9 : Update .k + 1 → k and the parameter posterior error covariance matrix by .
w (k)Hw (k)]Pw (k|k − 1)[I − K w (k)Hw (k)]T Pw (k) = [I − K
.
w (k)R(k)(K w (k))T , +K
(8.140)
and return to Step 2.
8.3.4 Illustrative Examples To demonstrate the capabilities of MEEF-DEKF in suppressing non-Gaussian noises, we perform state and parameter estimation on two illustrative examples, i.e., time-varying channel tracking and equalization and nonstationary signal prediction. As the dual versions of robust Kalman filters are not off the shelf, we extend the robust Kalman filters such as HIF [60], Huber-based Kalman filters [83], MCEKF [118], MEE-EKF [125], and ST-KF [74] to DEKF, generating the HIF-DEKF, H-DEKF, MC-DEKF, MEE-DEKF, and ST-DEKF. These robust algorithms were tested and compared in the examples. To quantitatively give the estimation results, define the performance metrics as follows: MSEw =
.
nw N 1 ˆ l (k))2. (wl (k) − w Nnw
(8.141)
l=1 k=1
nx N 1 MSEx = (xl (k) − xˆ l (k))2. Nnx
(8.142)
1 (yl (k) − yˆ l (k))2 , Nm
(8.143)
l=1 k=1 m
MSEy =
N
l=1 k=1
where .MSEw , .MSEx , and .MSEy are the mean square errors (MSEs) of the estimated parameter, state, and measurement and .wl (k), .xl (k), and .yl (k) are the l-th element ˆ l (k), .xˆ l (k), and .yˆ l (k) of the true parameter, state, and measurement vector, while .w correspond to the l-th element of the estimated vectors at time k. Simulation results are obtained by averaging over 50 independent runs.
256
8.3.4.1
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
Time-Varying Channel Tracking and Equalization
In the multi-input multi-output (MIMO) communication system, the signals are transmitted by the multipath fading channels. In order to suppress the influences of channel distortion, channel noise, and cross talk between users, the DEKF adopts the state-space model to model MIMO chaotic communication system and uses the dual estimation to realize the channel equalization and signal tracking effectively. In the communication system, the signal .x(k) = [x(k), x(k − 1), . . . , x(k − p + 1)]T ∈ Rp×1 with .p = 10 is modulated via the time-varying channel: ⎤ a(k) tanh(b(k)x(k − 1) x(k) ⎢ ⎥ +c(k)) + d(k) ⎥ ⎢ x(k − 1) ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ x(k − 1) .⎢ ⎥ + q(k), ⎥=⎢ .. ⎥ ⎣ ⎦ ⎢ . . .. ⎣ ⎦ x(k − p + 1) x(k − p + 1) ⎡
⎤
⎡
(8.144)
where .a(k) ∈ R1×4 , .b(k) ∈ R4×10 , .c(k) ∈ R4×1 , and .d(k) ∈ R1×1 are the unknown channel parameters; .q(k) denotes the process noise. The measurement equation is represented by y(k) = Hx(k) + r(k),
.
(8.145)
where .y(k) ∈ R1×1 is the first channel output including measurement noise .r(k); the measurement function is .H = [1, 0, . . . , 0] ∈ Rp×1 . In this example, the process noise is assumed to follow the Gaussian distribution .q(k) ∼ N(0, 0.01). The performance of MEEF-DEKF is investigated in the measurement noises with different distributions. Consider three cases: (1) Gaussian noise with outliers: .r(k) ∼ 0.999N(0, 0.25) + 0.001N(0, 1); (2) mixture Gaussian noise: .r(k) ∼ 0.5N(−0.1, 0.55) + 0.5N(0.1, 0.55); and (3) mixture Gaussian noise with outliers: .r(k) ∼ 0.48N(−0.01, 0.25) + 0.04N(0, 1) + 0.48N(0.01, 0.25). The filters are performed over 2500 time-sample intervals. Table 8.6 shows the parameters of MEEF-DEKF which are set by making a balance between filtering precision and robustness. Table 8.7 presents the state estimation results of HIF-DEKF, H-DEKF, MC-DEKF, MEE-DEKF, ST-DEKF, and MEEF-DEKF. From Table 8.7, we observe the following: (1) The performance of HIF-DEKF is not very good in Table 8.7. Since a userspecified matrix and a user-specified performance boundary are required, the filtering performance is more sensitive to the design of parameters especially in non-Gaussian noises. (2) Based on the non-MMSE criteria like MCC, Huber function, MEE, and the Kalman filters, including the H-DEKF, MC-DEKF, and MEE-DEKF. The filtering precision of H-DEKF is limited. In contrast, the MC-DEKF and MEE-DEKF can better suppress outliers than the H-DEKF and ST-DEKF.
8.3 Dual Extended Kalman Filter Under Minimum Error Entropy with. . . Table 8.6 Parameter settings for time-varying channel tracking and equalization under different noise distributions (from [173])
Parameters Gaussian noise with outliers mixture Gaussian noise mixture Gaussian noise with outliers
257 .σ1
.σ2
.λ
5.0 5.0 5.0
2.0 2.0 1.0
0.9 0.9 0.9
Table 8.7 State estimation results for time-varying channel tracking and equalization under different noise distributions (from [173]) Algorithms DEKF HIF-DEKF H-DEKF MC-DEKF MEE-DEKF ST-DEKF MEEF-DEKF
Gaussian noise with outliers 7.9073 22.6915 5.3598 1.9586 1.4396 2.4849 1.1428
mixture Gaussian noise 0.2162 6.4179 0.2097 0.1968 0.2009 1.9397 0.1876
mixture Gaussian noise with outliers 6.4944 15.1750 5.8995 2.4535 1.4448 2.4685 0.4264
In comparison with DEKF and HIF-DEKF, the ST-DEKF has high filtering accuracy but may be limited by the choice of multiple parameters. (3) For complex non-Gaussian distribution noise, the MEEF-DEKF achieves the superior filtering accuracy. Because the MEEF-DEKF combines the advantages of both MEE-DEKF and MC-DEKF, the capability to deal with non-Gaussian noise is powerful.
8.3.4.2
Nonstationary Signal Prediction
Consider a dynamic system, which is driven by two damped stochastically oscillators (.γ2 (k) and .γ3 (k)) and a stochastically relaxator .γ1 (k). The system variables .γ1 (k), .γ2 (k), and .γ3 (k) are considered as a set of time series, which can be fitted by a three-dimensional MVAR process: ⎧ ⎪ γ1 (k) = 0.59γ1 (k − 1) − 0.20γ2 (k − 2) ⎪ ⎨ +b(k)γ2 (k − 2) + c(k)γ3 (k − 1) + q1 (k) . ⎪ γ2 (k) = 1.58γ2 (k − 1) − 0.96γ2 (k − 2) + q2 (k) ⎪ ⎩ γ3 (k) = 0.60γ3 (k − 1) − 0.91γ3 (k − 2) + q3 (k) where c(k) =
.
% 2k N 2N −2k N
if k ≤ else ,
N 2
(8.146)
258
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
Table 8.8 Parameter settings for nonstationary signal prediction under different noise distributions (from [173])
Parameters Gaussian noise with outliers mixture Gaussian noise mixture Gaussian noise with outliers
.σ1
.σ2
.λ
1.0 1.0 1.0
0.3 0.08 0.1
0.95 0.95 0.95
Table 8.9 State and parameter estimation results for nonstationary signal prediction under different noise distributions (from [173]) .MSEw
Algorithms DEKF HIF-DEKF H-DEKF MC-DEKF MEE-DEKF ST-DEKF MEEF-DEKF
Gaussian noise with outliers 0.1179 0.0885 0.1126 0.0992 0.0992 0.0177 0.0134
mixture Gaussian noise 0.0807 0.0586 0.0789 0.0757 0.0504 0.0389 0.0135
.MSEx
mixture Gaussian noise with outliers 0.0268 0.0507 0.0239 0.0222 0.0208 0.0177 0.0138
Gaussian noise with outliers 16.0536 10.0806 15.2985 13.2368 11.4025 9.2369 0.9035
mixture Gaussian noise 9.4422 6.1239 9.3549 9.2495 7.5689 5.5556 1.3878
mixture Gaussian noise with outliers 1.7103 1.2395 0.9836 0.5595 0.4454 0.1243 0.0262
Bold values highlight the best results
and .b(k) is the function of exponential decay cosine. The parameters .b(k) and .c(k) denote the intensity of the influences on time series. Here, the DEKF can be used to track the fast time-varying MVAR parameters and estimate the noisy signals. In the dynamic system, the process noise obeys the Gaussian distribution .q(k) ∼ N(0, 0.01). To test the performance of DEKF in noisy nonstationary signal prediction, we consider the complex noises in the measurement: (1) Gaussian noise with outliers: .r(k) ∼ 0.99N(0, 0.5) + 0.01N(0, 100); (2) mixture Gaussian noise: .r(k) ∼ 0.5N(−0.2, 5.0) + 0.5N(0.2, 5.0); and (3) mixture Gaussian noise with outliers: .r(k) ∼ 0.49N(−0.1, 0.8) + 0.02N(0, 20) + 0.49N(0.1, 0.8). Table 8.8 presents the parameter settings of MEEF-DEKF in different measurement noises, and the corresponding evaluation results are given in Table 8.9. It can be seen from Table 8.9 that (1) The filtering performance of DEKF is more sensitive to the non-Gaussian noises compared with other algorithms. The HIF-DEKF shows better filtering performance than DEKF. Different from the ST-DEKF in Table 8.7, the results in Table 8.9 show a higher filtering accuracy in comparison with the HIF-DEKF, H-DEKF, MC-DEKF, and MEE-DEKF. (2) The MEEF-DEKF achieves significantly lower .MSEx and .MSEw in comparison with other dual Kalman filters, which suggests that the MEEF has the superior ability in dealing with non-Gaussian distributions such as bimodal distribution via the weight matrix .(k).
8.4 Kernel Kalman Filtering with Conditional Embedding Operator and. . .
259
8.4 Kernel Kalman Filtering with Conditional Embedding Operator and Maximum Correntropy Criterion (KKF-CEO-MCC) 8.4.1 Kernel Methods Kernel methods provide flexibility to establish nonlinear models based on linear approaches, through embedding the data .u ∈ Rd×1 in the original space to a high dimensional reproducing kernel Hilbert space (RKHS) .H, i.e., .ϕ : u → ϕ(u) [6]. In RKHS, the Mercer kernel .κ(·, ·) satisfies κ(u, ·) = ϕ(u). & ' κ(u, ·), κ(u , ·) H = κ(u, u ).
(8.147)
f (·), κ(u, ·)H = f (u),
(8.149)
.
(8.148)
where . ·, ·H denotes the inner product operation. Equation (8.149) suggests that the Mercer kernel has reproducing property [6]. The inner product between .ϕ(u) and .ϕ(u ) can be computed efficiently by the kernel trick [6]: κ(u, u ) = ϕ(u)T ϕ(u ) =
∞
.
ζj φj (u), φj (u ),
(8.150)
j =1
where .ζj is .φj is the eigenfunction, and .ϕ(u) can be constructed as √the eigenvalue, √ ϕ(u) = ζ1 φ1 (u), ζ2 φ2 (u), . . . . As a Mercer kernel, the Gaussian kernel has been widely used:
.
−||u − u ||2 .κσ1 (u, u ) = exp 2σ12
,
(8.151)
where .σ1 is the kernel size.
8.4.2 Statistical Embeddings in Reproducing Kernel Hilbert Space Assume that two random variables X and Y with instantiations x and y are defined in the same domain.
260
8.4.2.1
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
Marginal Distribution Embedding
The kernel embedding of probability distributions refers to the expectation of random variables X in the RKHS. When the random variable X with marginal distribution .PX (x) is mapped as .ϕ(X) in RKHS .H, the kernel embedding for .PX (x) is defined by ( μX := EX [ϕ(X)] =
.
ϕ(x)dPX (x).
(8.152)
The kernel embedding .μX can be estimated using finite samples: 1 ϕ(x(j )), m m
μˆ X =
.
(8.153)
j =1
where m is the number of samples.
8.4.2.2
Joint Distribution Embedding
Kernel embedding of the marginal distribution can be extended to the joint distribution .PXY (x, y) of two variables X and Y using tensor product feature spaces, i.e., XY : = EXY [ϕ(X) ⊗ ψ(Y )] ( = ϕ(x) ⊗ ψ(y)dPXY (x, y),
.
(8.154)
where .ψ(Y ) is the mapping of Y in RKHS .F and .⊗ denotes the tensor product of two column vectors with .a ⊗ b = abT . The corresponding estimate with finite samples is ˆ XY =
.
1 1 ϕ(x(j )) ⊗ ψ(y(j )) = ϒT , m m m
(8.155)
j =1
where .ϒ = [ϕ(x(1)), ϕ(x(2)), . . . , ϕ(x(m))] and . = [ψ(y(1)), ψ(y(2)), . . . , ψ(y(m))].
.
8.4.2.3
Conditional Distribution Embedding
Similarly, the embedding of conditional distribution .P (Y |x) can be expressed as
8.4 Kernel Kalman Filtering with Conditional Embedding Operator and. . .
261
( μY |x := EY |x [ϕ(Y )] =
.
ϕ(y)dPY |X (y|x).
(8.156)
The estimate with finite samples can be achieved by [176] −1 μˆ Y |x = ˆ Y |X ϕ(x) = ˆ Y X ˆ XX ϕ(x)
.
=
1 1 ϒ T ( ϒϒ T + ς I)−1 ϕ(x), m m
(8.157)
where .ς is a the regularization factor. Moreover, the conditional distribution .PY |X (y|x) embedding operator can be estimated by [176] ˆ Y |X = ˆ Y |X = ˆ Y X ˆ −1 U XX . = m1 ϒ T ( m1 ϒϒ T + ς I)−1 = (ϒ T ϒ+ς mIm )−1 ϒ T ,
(8.158)
where .Im denotes an .m × m identity matrix.
8.4.3 Kernel Kalman Filtering with Conditional Embedding Operator (KKF-CEO) Assume that the noisy time-series .{y(1), y(2), . . . , y(m+1)} is generated by .y(k) = u(k) + v(k), where .u(k) ∈ Rd×1 is a hidden clean time series, .y(k) ∈ Rd×1 is the observation corrupted by an additive zero-mean measurement noise .v(k) ∈ Rd×1 , and .v(k) is independent of .u(k). Let .ϕ(y(k)) be the mapping of .y(k) in RKHS and .μ(k) = E[ϕ(y(k))] be the noisy observation embedding in RKHS. To predict or estimate the clean time-series .u(k) from the noisy time-series .y(k), the KKF-CEO establishes a state-space model in RKHS, which takes the form [176] ˆ ˆ − 1) + q(k − 1) μ(k) = F(k − 1)μ(k
.
ˆ ϕ(y(k)) = μ(k) + r(k).
(8.159)
ˆ Here, .F(k − 1) is an embedding transition matrix, .μ(k) is the estimate of .μ(k), and .q(k − 1) and .r(k) with zero mean are mutually uncorrelated process noise and measurement noise in RKHS, which refer to the uncertainty in the model (8.159). The covariance matrices of .q(k − 1) and .r(k) are .
Q(k − 1) = qInk R(k) = rInk ,
262
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
where .Ink denotes an .nk × nk identity matrix with .nk being the dimension number of the RKHS and the constants q and r reflect the noisy intensity. The transition matrix .F(k − 1) can be estimated by using the conditional embedding operator in RKHS [176]: F(k − 1) = (ϒ T ϒ+ς mIm )−1 ϒ T
.
= (K+ς mIm )−1 ϒ T ,
(8.160)
where .ϒ = [ϕ(y0 (1)), ϕ(y0 (2)), . . . , ϕ(y0 (m))], . = [ϕ(y0 (2)), ϕ(y0 (3)), . . . , ϕ T 0 .(y (m + 1))], .K = ϒ ϒ and the superscript ."0" denotes the training data. Using the model (8.159) and the similar updates in Kalman filter, one can derive the KKFCEO predictor [176].
8.4.4 KKF-CEO-MCC 8.4.4.1
KKF-CEO-MCC in RKHS
Inspired by the MCC-based KFs [116, 124, 128], we introduce the following cost function of KKF-CEO-MCC in RKHS: ,2 ⎞ ⎛, , , , , , ⎟ ⎜, ˆ , .J = κσ2 ⎝,ϕ(y(k)) − μ(k) ⎠ 0, ./ ,, , r(k) R−1 (k) , ⎛, ⎞ , , , , , ⎟ ⎜, ˆ ˆ − 1), 2P−1 (k|k−1) ⎠ , − F(k − 1)μ(k + κσ2 ⎝,μ(k) ./ 0, ,, , q(k−1)
(8.161)
,2 , ˆ ,R−1 (k) where .R−1 (k) and .P−1 (k|k − 1) are the weighting matrices; .,ϕ(y(k)) − μ(k) , ,2 T R−1 (k)(ϕ(y(k)) − μ(k)); ˆ ,R−1 (k) = ˆ ˆ .= (ϕ(y(k)) − μ(k)) .κσ2 ,ϕ(y(k)) − μ(k) 2 ˆ R−1 (k) ϕ(y(k))−μ(k) exp − . Note that the process noise .q(k − 1) and measurement 2 2σ2
noise .r(k) are both embedded in the cost function (8.161), and the effects of the two uncertainties can be attenuated simultaneously. Under MCC, the optimal estimate of .μ(k) can be obtained by maximizing (8.161). Taking derivative of (8.161) with ˆ respect to .μ(k) and letting it equal to zero, we have .
∂J = 0. ˆ ∂ μ(k)
(8.162)
8.4 Kernel Kalman Filtering with Conditional Embedding Operator and. . .
263
ˆ The optimal estimate .μ(k) can be achieved by two steps: ˆ − 1) together with the corresponding error (1) Predict: The a priori estimate .μ(k|k covariance matrix .P(k|k − 1) is computed by .
ˆ ˆ − 1). μ(k|k − 1) = F(k − 1)μ(k
(8.163)
P(k|k − 1) = F(k − 1)P(k − 1)FT (k − 1) + Q(k − 1),
(8.164)
where .P(k|k −1) is the autocorrelation matrix of the a priori error . 1 (k|k −1) = ˆ μ(k) − μ(k|k − 1). ˆ (2) Update: The a posteriori estimate .μ(k) and the corresponding error covariance matrix .P(k) can be obtained by .
ˆ ˆ ˆ μ(k) = μ(k|k − 1) + G(k)(ϕ(y(k)) − μ(k|k − 1)).
(8.165)
P(k) = (Ink − G(k))P(k|k − 1),
(8.166)
where G(k) = λ(k)P(k|k − 1)[λ(k)P(k|k − 1) + R(k)]−1. , ,2 ˆ − 1),R−1 (k) κσ2 ,ϕ(y(k)) − μ(k , . λ(k) = , ˆ − 1) − μ(k|k ˆ − 1), 2 −1 κσ ,μ(k
.
2
P
(8.167) (8.168)
(k|k−1)
Here, .P(k) represents the autocorrelation matrix of the a posteriori error . 2 (k) = ˆ μ(k) − μ(k) (See Appendix 8.A in Sect. 8.6.1 for the detailed derivation). Since the ˆ is of high dimension, the above update equations cannot estimated embedding .μ(k) be calculated directly. To bypass the mathematical difficulties of high dimension calculation, one can transform the operators in RKHS into the original space through the “kernel trick” using pairwise kernel evaluations [6]. Then, the update equations can be further expressed as follows:
8.4.4.2
1-Step Predictor of KKF-CEO-MCC
In the following, a 1-step predictor of KKF-CEO-MCC will be derived in the subspace spanned by the training data. The calculations of Eqs. (8.163), (8.164), (8.165), (8.166), and (8.167) are dependent on .F(k − 1) = (K+ς mIm )−1 ϒ T , ˆ = ϕ(y(0)), and the following .Q(k − 1) = qInk , .R(k) = rInk , .P(0) = εInk , .μ(0) theorems. ˆ ˆ − 1) and .μ(k), First, in order to obtain the estimates .μ(k|k the gain matrix .G(k), a prior .P(k|k − 1), and a posterior .P(k) should be computed by operators in their original spaces.
264
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
Theorem 8.1 For .k = 1, the following equations are established: T ˜ P(1|0) = P(1|0) + qInk .
(8.169)
.
λ(1)r λ(1)q T ˜ . G(1) In k + λ(1)q + r λ(1)q + r rq T ˜ In + P(1) P(1) = , λ(1)q + r k G(1) =
(8.170) (8.171)
where ˜ P(1|0) = ε(ϒ T ϒ + ς mIm )−1 ϒ T ϒ[(ϒ T ϒ + ς mIm )−1 ]T.
−1 T ˜ ˜ ˜ . G(1) = (λ(1)q + r)Im + λ(1)P(1|0) P(1|0) λ(1)rq ˜ r ˜ G(1) P(1|0) − λ(1)q+r ˜ P(1) = λ(1)q+r . λ(1)r ˜ T ˜ − λ(1)q+r G(1) P(1|0)
.
(8.172) (8.173) (8.174)
Proof See Appendix 8.B in Sect. 8.6.2.
Assuming that Theorem 8.1 also holds for .k − 1, we obtain the following equations at k. Based on (8.164) and (8.171), .P(k|k − 1) takes the form T P(k|k − 1) = F(k − 1)P(k
− 1)F (k − 1) + qInk rq T ˜ − 1)T + = F(k − 1) P(k I n λ(k−1)q+r k F (k − 1) + qInk
. rq T ˜ − 1)T T + T + qInk = P(k λ(k−1)q+r ˜ = P(k|k − 1)T + qIn , k
(8.175) where . = (ϒ T ϒ+ς mIm )−1 ϒ T and ˜ − 1)T T + ˜ P(k|k − 1) = P(k
.
rq T λ(k−1)q+r
.
(8.176)
In addition, by (8.166), (8.171), and (8.174), .P(k) can be calculated by P(k) =
.
rq T ˜ In + P(k) , λ(k)q + r k
(8.177)
with ˜ P(k) =
.
λ(k)rq ˜ r ˜ λ(k)q+r P(k|k − 1) − λ(k)q+r G(k) λ(k)r ˜ ˜ − 1) − λ(k)q+r G(k)T P(k|k
.
(8.178)
8.4 Kernel Kalman Filtering with Conditional Embedding Operator and. . .
265
According to (8.167), (8.170), and (8.173), .G(k) is G(k) =
.
λ(k)r λ(k)q T ˜ , In k + G(k) λ(k)q + r λ(k)q + r
(8.179)
where −1
(λ(k)q + r)Im ˜ ˜ .G(k) = P(k|k − 1). ˜ +λ(k)P(k|k − 1)T
(8.180)
The operators in RKHS are all expressed in terms of kernel evaluations by virtue of ˜ kernel trick and matrix inverse theorem [6]. Therefore, the calculations of .P(k|k−1), ˜ ˜ .P(k), and .G(k) only involve matrices calculations and inner product operations in the original space. The sizes of these matrices in general are much smaller than the dimension .nk of the feature spaces. ˜ ˆ ˆ Then, the update of .μ(k|k−1) and .μ(k) for .k > 1 can be denoted with .P(k|k−1), ˜ ˜ .P(k), and .G(k). ˆ ˆ Theorem 8.2 For .k = 1, .μ(1|0) and .μ(1) satisfy ˆ μ(1|0) = w(1).
(8.181)
.
ˆ μ(1) = s(1) +
λ(1)q ϕ(y(1)), λ(1)q + r
(8.182)
where ˆ . w(1) = (ϒ T ϒ+ς mIm )−1 ϒ T μ(0) λ(1)r ˜ r T w(1) I − G(1) m λ(1)q+r λ(1)q+r s(1) = . T ϕ(y(1)) ˜ + λ(1)r G(1)
.
(8.183) (8.184)
λ(1)q+r
Proof See Appendix 8.C in Sect. 8.6.3. Assuming that Theorem 8.2 also holds at .k − 1, we arrive at ˆ μ(k) = s(k) +
.
λ(k)q ϕ(y(k)), λ(k)q + r
(8.185)
where s(k) =
.
λ(k)r ˜ r T λ(k)q+r Im − λ(k)q+r G(k) λ(k)r ˜ + λ(k)q+r G(k)T ϕ(y(k))
The following equation can also be obtained:
w(k)
.
(8.186)
266
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
ˆ ˆ − 1) μ(k|k − 1) = F(k − 1)μ(k
= (ϒ T ϒ+ς mIm )−1 ϒ T
.
λ(k−1)q λ(k−1)q+r ϕ(y(k
− 1))
+s(k − 1)
(8.187)
= w(k), with w(k) = (ϒ T ϒ+ς mIm )−1 ϒ T
.
λ(k−1)q λ(k−1)q+r ϕ(y(k
− 1))
+s(k − 1)
.
(8.188)
Next, in the process of propagating the embedding distribution from k to .k + 1, λ(k) is introduced into the update of embedding distribution, which plays a key role in improving the effectiveness and robustness. In the following, .λ(k) is calculated based on the aforementioned theorems.
.
Theorem 8.3 Define .
ˆ ˆ − 1). b01 (k) = μ(k|k − 1)T μ(k|k
(8.189)
ˆ − 1). ˆ − 1)T μ(k b02 (k) = μ(k
(8.190)
ˆ ˆ − 1). b03 (k) = μ(k|k − 1)T μ(k
(8.191)
ˆ − 1)T μ(k|k ˆ b04 (k) = μ(k − 1).
(8.192)
b1 (k) = e(k|k − 1)T e(k|k − 1).
(8.193)
b2 (k) = e(k|k − 1)T .
(8.194)
b3 (k) = T e(k|k − 1),
(8.195)
ˆ − 1) − μ(k|k ˆ where .e(k|k − 1) = μ(k − 1). Then, the calculations of .a(k) and .b(k) are performed by ,2 , ˆ − 1),R−1 (k) a(k) = ,ϕ(y(k)) − μ(k
1 ϕ(y(k))T ϕ(y(k)) − ϕ(y(k))T μ(k ˆ − 1) . = ˆ − 1)T ϕ(y(k)) + b02 (k) r −μ(k , , ˆ − 1) − μ(k|k ˆ − 1), 2P−1 (k|k−1) b(k) = ,μ(k
.
=
1 1 b1 (k) − b2 (k)(q P˜ −1 (k|k − 1) + T )−1 b3 (k). q q
b1 (k) = b01 (k) + b02 (k) − b03 (k) − b04 (k). Proof See Appendix 8.D in Sect. 8.6.4.
(8.196)
(8.197) (8.198)
8.4 Kernel Kalman Filtering with Conditional Embedding Operator and. . .
267
By Theorem 8.3, .λ(k) can be calculated by λ(k) =
.
κσ2 (a(k)) . κσ2 (b(k))
(8.199)
ˆ Finally, the state .u(k) is estimated from the embedding .μ(k) as follows. By virtue ˆ of the zero-mean measurement noise .v(k), the estimate .u(k) from the embedding ˆ .μ(k) can be expressed as
.
ˆ u(k) = E[u(k) + v(k)] = E[y(k)] = [E[y1 (k)], E[y2 (k)], . . . , E[yd (k)]]T = [E[f1 (y(k))], E[f2 (y(k))], . . . , E[fd (y(k))]]T ,
(8.200)
where .y(k) = [y1 (k), y2 (k), . . . , yd (k)]T . Considering the definition of the embeddings in the RKHS, we denote the relationship between the preimage .y(k) and its ˆ as image .μ(k) [E[f1 (y(k))], E[f2 (y(k))], . . . , E[fd (y(k))]]T
.
T ˆ ˆ ˆ = [ f1 , μ(k),
f2 , μ(k), . . . , fd , μ(k)]
ˆ = fT μ(k),
(8.201)
where .f = [f1 , f2 , . . . , fd ]. The function .f can be effectively estimated by using the regularized least squares, namely, f = (T + ηIm )−1 (Y0 )T ,
.
(8.202)
with .Y0 = [y0 (2), y0 (3), . . . , y0 (m + 1)], with .η being the regularization factor. ˆ ˆ According to .μ(k) given in (8.185), the estimate .u(k) can be achieved by ˆ ˆ u(k) = fT μ(k) = [s(k) + .
= =
λ(k)q T T −1 0T λ(k)q+r ϕ(y(k))] ( + ηIm ) Y T λ(k)q y(k) sT (k)T (T + ηIm )−1 Y0 + λ(k)q+r λ(k)q 0 −1 Y M(M + ηIm ) s(k) + λ(k)q+r y(k),
(8.203)
where .M = T . Theorem 8.4 For finite .a(k) and .b(k), when the kernel size .σ2 → ∞, the KKFCEO-MCC will reduce to the KKF-CEO algorithm. Proof As .σ2 → ∞, we have .λ(k) → 1. In this case, the KKF-CEO-MCC becomes the KKF-CEO.
268
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
ˆ Remark 8.2 In the KKF-CEO, the estimate .u(k) is influenced by q and r, and the inappropriate q and r will result in poor performance. However, the KKF-CEOMCC employs .λ(k) as an adaptive weighting factor for q and r. From the view of loss function, .λ(k) induced by correntropy involves high order statistics of error ˆ in the update of .s(k), which improves the robustness in the update of .u(k) against heavy-tailed noises.
8.4.4.3
t-Step Predictor of KKF-CEO-MCC
The predictor of KKF-CEO-MCC includes two processes. In the learning process, the training data from noisy measurements are used to establish the conditional embedding operator. During the filtering process, an online testing input sequence is used to estimate or predict the kernel embedding at each iteration. By virtue of the kernel trick and the regularized least squares, one can obtain the state estimate in the original space. Similar to the 1-step predictor, the t-step predictor of KKFCEO-MCC can be easily obtained based on the two processes. With the previous derivations, we summarize the t-step predictor of KKF-CEO-MCC in Algorithm 16. Remark 8.3 Different from the one-step prediction, Algorithm 16 gives the generalized t-step predictor of KKF-CEO-MCC. One can observe that . and .Y0 construct a new conditional embedding operator from the training data set, which associates with the current and future measurement embedding and describes how the embedding is propagated from k to .k + t. In addition, when .t = 1, the algorithm reduces to the 1-step prediction.
8.4.5 Simplified Versions of KKF-CEO-MCC In KKF-CEO-MCC, the learned model is constructed by the conditional embedding operator calculated by kernel matrices, and the filtering process requires the whole ˜ ˜ ˜ training data when computing the matrices .G(k), .P(k), and .P(k|k − 1). Therefore, the computational cost is very expensive especially for large scale data sets. To solve this problem, two approaches are introduced to curb the computational complexity of the model. In [176], an online approach is used to simplify the KKF-CEO (viz., the KKF-CEO-O) by selecting some important samples to compute the conditional embedding and kernel matrices. Similar to the KKF-CEO-O, we introduce an online KKF-CEO-MCC called KKF-CEO-MCC-O to simplify the computational complexity. The other is the Nyström approach [177], which uses a low-rank matrix to approximate a large kernel matrix to reduce the computational costs in kernel methods. We obtain the KKF-CEO-MCC-NA algorithm when using the Nyström approach to simplify the computational complexity of KKF-CEO-MCC.
8.4 Kernel Kalman Filtering with Conditional Embedding Operator and. . .
269
Algorithm 16 t-Step Predictor of KKF-CEO-MCC Initialization and learning: For k = 1, set ϒ = [ϕ(y0 (1)), ϕ(y0 (2)), . . . , ϕ(y0 (m))] = [ϕ(y0 (1 + t)), ϕ(y0 (2 + t)), . . . , ϕ(y0 (m + t))] Y0 = [y0 (1 + t), y0 (2 + t), . . . , y0 (m + t))] K = ϒ T ϒ; M = T ; T = ϒ T L = (K+ςmIm )−1 μ(0) ˆ = ϕ(y(0)) ˜ P(0) = εIm w(1) = Lϒ T μ(0) ˆ ˜ P(1|0) = εLKLT Filtering: For k = 1, 2 . . ., set Z(k) = T ϕ(y(k)) While k = 1 λ(k) = 1 Else
1 − ϕ(y(k))T μ(k ˆ − 1) a(k) = 1r T −μ(k ˆ − 1) ϕ(y(k)) + b02 (k) b(k) = q1 b1 (k) − q1 b2 (k)(q P˜ −1 (k|k − 1) + T )−1 b3 (k). λ(k) =
κσ2 (a(k)) κσ2 (b(k))
End
−1
(λ(k)q + r)Im ˜ ˜ P(k|k − 1) G(k) = ˜ +λ(k)P(k|k − 1)M λ(k)rq ˜ r ˜ ˜ P(k|k − 1) − λ(k)q+r G(k) P(k) = λ(k)q+r λ(k)r ˜ ˜ − λ(k)q+r G(k)MP(k|k − 1) λ(k)r ˜ r Im − λ(k)q+r G(k)M w(k) s(k) = λ(k)q+r ˜ + λ(k)r G(k)Z(k) λ(k)q+r
λ(k)q Lϒ T ϕ(y(k)) w(k + 1) = Lϒ T s(k) + λ(k)q+r λ(k)q 0 −1 ˆ u(k) = Y M(M + ηIm ) s(k) + λ(k)q+r y(k) rq T LT + ˜ + 1|k) = LTP(k)T ˜ LKLT P(k λ(k)q+r
End
8.4.5.1
KKF-CEO-MCC-O
In KKF-CEO-MCC, the conditional embedding operator is constructed with all training data. Here, KKF-CEO-MCC-O updates the operator with the subsets of training data at each iteration. With the help of the distance between the current ˆ estimated embedding .μ(k) and all the training samples .{ϕ(y0 (j ))}m j =1 in the RKHS,
270
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
the closest n samples and the corresponding measurements .{ϕ(z0 (k, j ))}nj=1 can be selected to estimate the conditional embedding operator. In KKF-CEO-MCC-O, the distance is computed by , ,2 ˆ − ϕ(y0 (j )), disj = ,μ(k) ,2 , , , λ(k)q ϕ(y(k)) − ϕ(y0 (j )), = ,(k − 1)s(k) + λ(k)q+r , ,2 , λ(k)q , T T , = s (k) (k − 1)(k − 1)s(k) + , ϕ(y(k)), , ./ 0 λ(k)q + r 0 ./ χ1 . χ2 , ,2 2λ(k)q , , s(k)T (k − 1)ϕ(y(k)) + ,ϕ(y0 (j )), + λ(k)q +r ./ 0 0 ./ χ3 χ4 2λ(k)q T − λ(k)q+r ϕ (y(k))ϕ(y0 (j )) − 2sT (k)T (k
(8.204)
− 1)ϕ(y0 (j )),
where .(k−1) = [ϕ(z0 (k−1, 1)), ϕ(z0 (k−1, 2)), . . . , ϕ(z0 (k−1, m))] at iteration 0 .k − 1; .χ2 and .χ3 are the constants; and .χ1 and .χ4 are unrelated with .ϕ(y (j )). After simplification, the distance .disj can be determined by 4j = dis
.
λ(k)q ϕ T (y(k))ϕ(y0 (j )) λ(k)q + r +sT (k)T (k − 1)ϕ(y0 (j )).
(8.205)
4 j . Then, Similar to the KKF-CEO-O, the KKF-CEO-MCC-O selects the n largest .dis 0 we use the selected n training data .Yo to construct the feature matrices .ϒ o , .o , and the conditional embedding operator .Fo at iteration k, with .n < m.
8.4.5.2
KKF-CEO-MCC-NA
The Nyström approach has been widely used to speed up various kernel methods, such as Gaussian process regression and kernel ridge regression [177]. In the Nyström approach, a low-rank matrix can approximate the original kernel matrix [6], resulting in computational and memory savings [177]. In order to construct the lowˆ the Nyström approach randomly samples n (.n < m) columns to rank matrix .K, ˆ ∈ Rn×n based form a matrix .Kb ∈ Rn×m and to derive a new small kernel matrix .K on the n columns. The original kernel matrix can be approximated by [177] ˆ = Kb V D −1 V T KT , K≈K b
.
(8.206)
ˆ is viewed as an approximation of .K; .V = [V ij ]m×m = (ˆv (1), . . . , vˆ (m)) where .K ˆ .D = diag{ξˆ (1), . . . , ξˆ (m)} denotes the denotes the eigenvector matrix of .K; ˆ diagonal matrix; .ξ (j ) is the eigenvalues; and .vˆ (j ) is the eigenvectors. Inspired by
8.4 Kernel Kalman Filtering with Conditional Embedding Operator and. . .
271
Table 8.10 Computational complexities of different algorithms Algorithms KRLS KKF-CEO KKF-CEO-MCC KKF-CEO-O KKF-CEO-MCC-O KKF-CEO-NA KKF-CEO-MCC-NA
Estimation of embeddings N/A 3 .O(m ) 3 .O(m ) 3 2 .O(n ) + O(mn ) 3 2 .O(n ) + O(mn ) 3 .O(n ) 3 .O(n )
Calculation of predictions 2) 3 .O(m ) 3 .O(m ) + O(m) 2 .O(m ) + O(mn) 2 .O(m ) + O(mn) + O(n) 2 .O(m ) + O(mn) 2 .O(m ) + O(mn) + O(n) .O(m
the Nyström approach, the kernel matrices .K = ϒ T ϒ in KKF-CEO-MCC can be approximated by .K ≈ ϒ Ts ϒ s , where .
ϒ s = [ϕ(y0s (1)), ϕ(y0s (2)), . . . , ϕ(y0s (n))].
(8.207)
is the embedding samples via random sampling. Then, we have .
s = [ϕ(y0s (1 + t)), ϕ(y0s (2 + t)), . . . , ϕ(y0s (n + t))] Y0s = [y0s (1 + t), y0s (2 + t), . . . , y0s (n + t)].
(8.208)
With .ϒ s , .s , and .Y0s , the conditional embedding operator can thus be approximated.
8.4.6 Computational Complexity In the KKF-CEO-MCC, KKF-CEO-MCC-O, and KKF-CEO-MCC-NA, the computational complexity is mainly incurred by the estimation of embeddings and calculation of predictions. Table 8.10 gives the comparison of the computational complexities among the MMSE-based algorithms, i.e., KRLS, KKF-CEO, KKF-CEO-O, and KKF-CEO-NA, and the MCC-based algorithms, namely, KKFCEO-MCC, KKF-CEO-MCC-O, and KKF-CEO-MCC-NA. In comparison with the MMSE-based algorithms, the MCC-based algorithms incur slightly higher computational burden due to the calculation of the correntropy-based function .λ(k). The complexity increases with the number of training data required to construct the estimated conditional embedding operators. However, the KKF-CEO-MCC-O and KKF-CEO-MCC-NA have lower computational complexity than the KKF-CEOMCC algorithm. Compared with the Nyström approximation, the online approach has higher computational complexity since it requires to update the conditional embedding operator at each iteration.
272
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
8.4.7 Illustrative Examples In this section, the performances of the KKF-CEO-MCC, KKF-CEO-MCC-O, and KKF-CEO-MCC-NA are demonstrated in several simulation examples. The first example is about the noisy IKEDA chaotic time-series estimation, which is used to validate the performance in different parameter settings including algorithm parameters and noise parameters. The second example predicts the noisy Lorenz time series and shows the superiority of KKF-CEO-MCC-O and KKF-CEO-MCCNA in reducing the computational complexity. The third example predicts the noisy real sunspot time series. The alpha-stable distribution has a sharper peak and thicker tails, which can be used to model the impulsive measurement noises. The characteristic function of the alpha-stable distribution is [88] g (k) = exp jρ4 k − ρ3 |k|ρ1 1 + jρ2 sgn (k) S (k, ρ3 ) ,
.
(8.209)
where the parameter vector .V = (ρ1 , ρ2 , ρ3 , ρ4 ) includes the characteristic factor ρ1 ∈ (0, 2], asymmetry parameter .ρ2 ∈ (−1, 1), dispersion parameter .ρ3 > 0, and location parameter .ρ4 ∈ (−∞, ∞), and
.
% .
S (k, ρ1 ) =
tan ρ12π , ρ1 = 1 2 π log |k|, ρ1 = 1.
(8.210)
To evaluate the filtering accuracy, we define the testing mean square error as Testing MSE =
.
lmc 1
lmc
l=1
ntest 1
ntest
ˆ ||u(k) − u(k)||
2
,
(8.211)
k=1
ˆ where .u(k) is the estimate of .u(k), .ntest is the number of testing data, and .lmc is the number of runs.
8.4.7.1
Noisy IKEDA Chaotic Time-Series Estimation
The IKEDA chaotic time-series [176] is generated by ϑ(k) = 0.4 −
.
6.0 1 + x12 (k) + x22 (k)
x1 (k + 1) = 1.0 + 0.84 (x1 (k) cos(ϑ(k)) − x2 (k) sin(ϑ(k))) x2 (k + 1) = 0.84 (x1 (k) sin(ϑ(k)) − x2 (k) cos(ϑ(k))) ,
(8.212)
where the state .u(k) = [x1 (k), x2 (k)]T with initiation .u(0) = [1, 0]T . Our goal is to estimate the clean state .u(k) from the alpha-stable noise corrupted data, denoted
8.4 Kernel Kalman Filtering with Conditional Embedding Operator and. . . Table 8.11 Testing MSE for noisy IKEDA time-series estimation (from [174])
Algorithms CKF UKF KKF-CEO KKF-CEO-MCC
Gaussian noise 0.1009 0.2091 0.0759 0.0753
273 alpha-stable noise 1.2409 2.2926 0.8303 0.6215
Table 8.12 Average computing time (sec) for noisy IKEDA time-series estimation (from [174]) Algorithms Computing time
UKF 0.0175
CKF 0.0123
KKF-CEO 0.582
KKF-CEO-MCC 0.656
by .y(k). In this example, the training and testing data number are 201 and 200, respectively. The simulations are carried out with MATLAB 2016b running in i54590 and 3.30 GHZ CPU. First, the additive alpha-stable noises with .ρ1 = 2.0 (viz., the Gaussian noises) are considered to evaluate the performance of KKF-CEO-MCC in comparison with KKF-CEO, UKF, and CKF. The signal noise ratio is set to .3dB, and the parameters q, r, .ς , .ε, .σ1 , and .σ2 are .1.0, 3.0, 10−3 , 10−3 , 1.0, 5.0, respectively. Table 8.11 gives the testing MSE of different filters. From Table 8.11, the KKF-CEO-MCC achieves slightly better performance than KKF-CEO under the Gaussian noises. To further show the performance superiority of the KKF-CEO-MCC, Table 8.11 also illustrates the performance in the presence of large outliers (viz., alpha-stable noises with .ρ1 < 2.0). The parameters .ρ1 , .ρ2 , .ρ3 , .ρ4 , .σ1 , and .σ2 are set at .1.2, 0, 0.06, 0, 1.0, 0.5, respectively. In Table 8.11, KKF-CEO-MCC significantly outperforms all other methods including KKF-CEO, UKF, and CKF in the alphastable noises. Table 8.12 presents the computing time of different filters averaged over 50 independent runs. In general, compared with KKF-CEO, the KKF-CEOMCC can achieve better performance at the cost of slightly increasing computational burden. We investigate the influences of different parameters on the filtering performance in alpha-stable noises: (1) The noise intensity q and r are important parameters, which reflect the uncertainty of established model. Figure 8.6 shows the testing MSE of KKFCEO-MCC under different pairs of q and r. In Fig. 8.6, we adopt the same parameter setting with the alpha-stable noise case in Table 8.11, except that q and r vary from 1 to .3.25. From Fig. 8.6, one can find out that a smaller r or a larger q adopted by KKF-CEO-MCC can achieve higher estimation accuracy. To further investigate this effect, the estimation results with different ratios of q and r are given in Fig. 8.7, which suggests that KKF-CEO-MCC can always achieve the higher filtering accuracy than KKF-CEO in this example. (2) Since the kernel size of correntropy is a crucial parameter in MCC, the influence of .σ2 on the filtering performance of KKF-CEO-MCC is demonstrated in Fig. 8.8. The parameter settings are the same as that of the alpha-stable noise case in Table 8.11, except that the kernel size .σ2 varies from .0.01 to .0.97. From
274
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
Fig. 8.6 Testing MSE with different q and r (from [174])
3.25 0.7
3 0.65
2.75
0.6
2.25
0.55
2
0.5
1.75
0.45
r
2.5
1.5
0.4
1.25 0.35
1 1
1.25
1.5
1.75
2
2.25
2.5
2.75
3
3.25
q
Fig. 8.7 Testing MSE with different .r/q (from [174])
20
KKF-CEO-MCC KKF-CEO
Testing MSE
15
10
5
0
1
2
r/q
3
4
5
Fig. 8.8, one can see that if the kernel size .σ2 is too small or too large, the performance of KKF-CEO-MCC will become worse. In this example, the KKFCEO-MCC achieves the best performance when .σ2 ≈ 0.35. In the subsequent simulation, the kernel size .σ2 is set to .σ2 = 0.5. (3) The influences of the noise parameters .ρ1 and .ρ3 are illustrated in Figs. 8.9 and 8.10, respectively. The parameter setting for Figs. 8.9 and 8.10 is the same as that of the alpha-stable noise case in Table 8.11, except that .ρ1 in Fig. 8.9 varies from .1.5 to .2.0 and .ρ3 in Fig. 8.10 varies from .0.01 to .2.0. Evidently, the KKF-CEO-MCC is more robust to impulsive noise and can achieve much lower testing MSE than the KKF-CEO with different values of .ρ1 and .ρ3 .
8.4 Kernel Kalman Filtering with Conditional Embedding Operator and. . . Fig. 8.8 Testing MSE with different kernel sizes .σ2 (from [174])
275
0.72 0.7
KKF-CEO-MCC KKF-CEO
Testing MSE
0.68 0.66 0.64 0.62 0.6
0
0.2
0.4
0.6
0.8
0.97
2
2.2
KKF-CEO-MCC KKF-CEO
2 1.8
Testing MSE
Fig. 8.9 Testing MSE with different .ρ1 under alpha-stable noise (from [174])
1.6 1.4 1.2 1 0.8 0.6 1.5
1.6
1.7
1.8
1.9
2
1
8 7
KKF-CEO-MCC KKF-CEO
6
Testing MSE
Fig. 8.10 Testing MSE with different .ρ3 under alpha-stable noise (from [174])
5 4 3 2 1 0
0.5
1
1.5 3
2
276
8.4.7.2
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
Noisy Lorenz Time-Series Prediction
8 1 The Lorenz system [5] is described by the differential equations: . dx dτ = − 3 x1 + dx3 dx2 x2 x3 ; dτ = 10(x2 − x3 ); dτ = −x1 x2 + 28x2 − x2 , where state .u(k) = [x1 (k), x2 (k), x3 (k)]T is obtained at sampling period .0.01. The alpha-stable noises are added to the time series. The goal is to predict the clean data from the noisy data. In the simulation, the training data number and testing data number are 1700 and 200, respectively. We compare the performance of the simplified versions of the KKF-CEO and KKF-CEO-MCC and also show the performance of the KRLS, a well-known kernel adaptive filter [6]. First, the testing MSEs with different network sizes are given in Table 8.13. Here, the parameters .ρ1 , .ρ2 , .ρ3 , and .ρ4 are configured as .1.3, 0, .0.3, and 0. For KKF-CEOMCC, the parameters q, r, .ς , .ε, .σ1 , and .σ2 are set to .0.1, 0.09, 10−5 , 10−3 , 1.0 and .0.6, respectively. For KRLS, the regularization factor, forgetting factor, and kernel size .σ1 are set to .10−3 , .1.0, and .1.0, respectively. As illustrated in Table 8.13, the algorithms with larger network sizes usually obtain better performance than those with smaller network sizes. In addition, KKF-CEO-MCC-NA and KKF-CEOMCC-O in general outperform KKF-CEO-NA and KKF-CEO-O, respectively. In this example, the KKF-CEO-MCC-NA performs the best among all the compared methods. Table 8.14 presents the average computing time of different filters with different network sizes over 50 runs. One can see that compared with KKF-CEONA and KKF-CEO-O, KKF-CEO-MCC-NA and KKF-CEO-MCC-O can achieve better performance at the cost of slightly increasing computational burden. The testing MSEs with different prediction steps and network sizes are shown in Figs. 8.11 and 8.12. One can see that a larger prediction step may degrade the filtering accuracy. Again, the KKF-CEO-MCC-NA exhibits the best performance among all compared algorithms.
8.4.7.3
Noisy Sunspot Time-Series Prediction
We consider an example of real data set prediction, namely, the sunspot time-series prediction [178, 179]. The data set is chosen as a set of sunspot numbers, recorded by the World Data Center [179]. The 25 most recent values in the past are used as the embedded input to predict the current value. Figure 8.13 plots 438 months’ sunspot number from May 1982 to Sep 2018. The first 200 months are used as the training data and the last 238 months as the testing data. An alpha-stable noise is added to the time series. To evaluate the prediction ability of KRLS, KKF-CEO, and KKF-CEO-MCC, the testing MSEs with different prediction steps t are shown in Fig. 8.14. Here, the parameters are configured as .ρ1 = 1.5, .ρ2 = 0, .ρ3 = 0.5, −3 , .ε = 10−4 , .σ = 1.0, and .σ = 0.3. In KRLS, .ρ4 = 0, .q = 5.0, .r = 5.0, .ς = 10 1 2 the regularization factor, forgetting factor, and kernel size .σ1 are .10−3 , .1.0, and .1.0, respectively.
.n
Network size = 30 .n = 40 .n = 50 .n = 60 .n = 70 .n = 80 .n = 90 .n = 100
.4.1183
± 0.0723 .4.1183 ± 0.0723 .4.1183 ± 0.0723 .4.1183 ± 0.0723 .4.1183 ± 0.0723 .4.1183 ± 0.0723 .4.1183 ± 0.0723 .4.1183 ± 0.0723
KRLS KKF-CEO-O ± 0.0662 .2.2014 ± 0.0817 .2.1622 ± 0.0012 .2.2230 ± 0.0453 .1.2691 ± 0.0820 .1.7469 ± 0.0923 .1.1460 ± 0.0426 .1.7050 ± 0.0077
.2.4168
Table 8.13 Testing MSE with different network sizes (from [174]) KKF-CEO-MCC-O ± 0.0437 .0.2561 ± 0.0417 .0.2533 ± 0.0406 .0.2515 ± 0.0408 .0.2522 ± 0.0397 .0.2266 ± 0.0348 .0.2260 ± 0.0351 .0.2200 ± 0.0347
.0.2608
KKF-CEO-NA ± 0.0495 .0.2904 ± 0.0338 .0.2769 ± 0.0396 .0.3026 ± 0.0318 .0.2731 ± 0.0472 .0.2361 ± 0.0290 .0.2164 ± 0.0313 .0.2529 ± 0.0323 .0.3231
KKF-CEO-MCC-NA ± 0.0352 .0.1872 ± 0.0378 .0.1852 ± 0.0362 .0.1850 ± 0.0362 .0.1833 ± 0.0364 .0.1830 ± 0.0365 .0.1817 ± 0.0362 .0.1806 ± 0.0345 .0.1952
8.4 Kernel Kalman Filtering with Conditional Embedding Operator and. . . 277
278
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
Table 8.14 Average computing time (s) for noisy Lorenz time-series prediction (from [174]) Network size .n = 30 .n = 40 .n = 50 .n = 60 .n = 70 .n = 80 .n = 90 .n = 100
KRLS 3.9620 3.9760 3.9577 3.9659 3.9730 3.9750 3.9600 3.9641
KKF-CEO-O 7.8583 10.5156 13.1572 16.0112 18.9590 22.1642 25.4742 28.9607
Fig. 8.11 Testing MSE with network size .n = 50 and different prediction steps t (from [174])
KKF-CEO-MCC-O 7.9874 11.4404 14.2475 16.1669 19.9952 23.1732 26.5350 29.9331
KKF-CEO-NA 0.0519 0.0846 0.1023 0.1273 0.1718 0.2458 0.2661 0.3004
KKF-CEO-NA KKF-CEO-MCC-NA KKF-CEO-O KKF-CEO-MCC-O
1.2 1 0.8
Testing MSE
KKF-CEOMCC-NA 0.0732 0.1045 0.1316 0.1632 0.2218 0.2528 0.2914 0.3299
0.6 0.4
0.2 0.17 1
0.7
Fig. 8.12 Testing MSE with network size .n = 100 and different prediction steps t (from [174])
10
15
20
10
15
20
Prediction step t
KKF-CEO-NA KKF-CEO-MCC-NA KKF-CEO-O KKF-CEO-MCC-O
0.5
Testing MSE
5
0.4 0.3
0.2 0.14
1
5
Prediction step t
8.5 Conclusion
279 300
Fig. 8.13 Monthly mean total sunspot number from May 1982 to Sep 2018 (from [174])
Sunspot number
250 200 150 100 50 0 1982
1.6 1.4
2000
Time(years)
2009
2018
KKF-CEO-MCC KKF-CEO KRLS
1.2
Testing MSE
Fig. 8.14 Testing MSE with different prediction steps t (from [174])
1991
1
0.8 0.6
0.4 1
5
10
15
20
Prediction step t
25
30
Simulation results confirm that KKF-CEO-MCC exhibits better prediction performance than both KKF-CEO and KRLS in alpha-stable noise.
8.5 Conclusion This chapter presents several other Kalman-type filters under information theoretic criteria. In order to address the problem of constrained state estimation in nonGaussian noise environments, the MCKF-SC is derived by combining the MCC and a constrained estimation methodology. Both linear and nonlinear state constraints are taken into account. Simulation results confirm the superior performance of the
280
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
MCKF-SC. To improve the numerical instability, two correntropy-based divided difference filtering methods, namely, CDD1 and CDD2, are derived, which show strong robustness against heavy-tailed non-Gaussian noises. To estimate the nonlinear system parameter and state in the complicated non-Gaussian measurement noises, the dual extended Kalman filtering based on minimum error entropy with fiducial points (MEEF-DEKF) is derived to improve the robustness and filtering precision. In addition, a novel kernel Kalman-type filter, namely, kernel Kalman filter with conditional embedding operator and maximum correntropy criterion (KKF-CEO-MCC), is derived to deal with the nonlinear time-series prediction. Compared with the KKF-CEO derived under the minimum mean square error (MMSE) criterion, the KKF-CEO-MCC is more robust and can achieve much better performance especially in impulsive noises.
8.6 Appendices 8.6.1 Appendix 8.A Equation (8.162) is calculated by .
, , ˆ ˆ − 1), 2P−1 (k|k−1) − F(k − 1)μ(k κσ2 ,μ(k)
ˆ ˆ − 1)) − F(k − 1)μ(k P−1 (k|k − 1)(μ(k) , ,2 ˆ ,R−1 (k) R−1 (k)(ϕ(y(k)) − μ(k)) ˆ − κσ2 ,ϕ(y(k)) − μ(k) = 0.
(8.213)
Rearrange (8.213) as ˆ ˆ − 1) μ(k) = F(k − 1)μ(k
.
ˆ + λ(k)P(k|k − 1)R−1 (k)(ϕ(y(k)) − μ(k)),
(8.214)
where , ,2 ˆ ,R−1 (k) κσ2 ,ϕ(y(k)) − μ(k) , . .λ(k) = , ˆ ˆ − μ(k|k − 1), 2P−1 (k|k−1) κσ2 ,μ(k)
(8.215)
8.6 Appendices
281
Substituting (8.163) into (8.214), one can derive ˆ ˆ ˆ μ(k) = μ(k|k − 1) + λ(k)P(k|k − 1)R−1 (k)(ϕ(y(k)) − μ(k))
.
ˆ + λ(k)P(k|k − 1)R−1 (k)μ(k|k − 1) ˆ − λ(k)P(k|k − 1)R−1 (k)μ(k|k − 1).
(8.216)
ˆ − 1) ≈ μ(k) ˆ To derive the KKF-CEO-MCC, we approximate .μ(k only in the Gaussian kernel functions of .λ(k), yielding , ,2 ˆ − 1),R−1 (k) κσ2 ,ϕ(y(k)) − μ(k , . .λ(k) ≈ , ˆ − 1) − μ(k|k ˆ − 1), 2P−1 (k|k−1) κσ2 ,μ(k
(8.217)
ˆ Then, the estimate .μ(k) can be expressed as ˆ ˆ ˆ μ(k) = μ(k|k − 1) + G(k)(ϕ(y(k)) − μ(k|k − 1)),
.
(8.218)
where G(k) =λ(k)[I + λ(k)P(k|k − 1)R−1 (k)]−1 P(k|k − 1)R−1 (k)
.
=λ(k)P(k|k − 1)[λ(k)P(k|k − 1) + R(k)]−1 .
(8.219)
Therefore, we obtain (8.165), (8.166) and (8.167).
8.6.2 Appendix 8.B Proof: First, from (8.160), (8.164), and .P(0) = εInk , we have P(1|0) =F(0)P(0)FT (0) + qInk
.
=ε(ϒ T ϒ + ς mIm )−1 ϒ T ϒ[(ϒ T ϒ + ς mIm )−1 ]T T + qInk T ˜ =P(1|0) + qInk ,
(8.220)
˜ where .P(1|0) = (ϒ T ϒ + ς mIm )−1 ϒ T ϒ[(ϒ T ϒ + ς mIm )−1 ]T . Then, according to the matrix inversion lemma [6], the Kalman gain G(1) is given by
282
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
−1 G(1) = λ(1)P(1|0) λ(1)P(1|0) + rInk
T −1 ˜ T + qI ) λ(1)P(1|0) ˜ = λ(1)(P(1|0) nk +(λ(1)q + r)Ink
λ(1)q T ˜ λ(1)q+r λ(1)P(1|0) + (λ(1)q + r)Ink = λ(1)r T ˜ + λ(1)q+r P(1|0) .
−1 T + (λ(1)q + r)I ˜ × λ(1)P(1|0) nk
−1 (λ(1)q + r)Im λ(1)q λ(1)r T ˜ Ink + λ(1)q+r = λ(1)q+r P(1|0) T ˜ +λ(1)P(1|0) λ(1)q λ(1)r T, ˜ = λ(1)q+r Ink + λ(1)q+r G(1)
(8.221)
−1 T ˜ ˜ ˜ where G(1) = (λ(1)q + r)Im + λ(1)P(1|0) P(1|0). Based on (8.221), we update the covariance matrix as P(1) = (Ink − G(1))P(1|0) λ(1)q Ink − = (Ink − λ(1)q+r .
= =
λ(1)r T T + qI ) ˜ ˜ nk λ(1)q+r G(1) )(P(1|0) λ(1)rq ˜ r ˜ P(1|0) − G(1) rq λ(1)q+r λ(1)q+r T λ(1)q+r Ink + − λ(1)r G(1) T P(1|0) ˜ ˜ λ(1)q+r rq T ˜ λ(1)q+r Ink + P(1) ,
˜ where P(1) = when k = 1.
λ(1)rq ˜ r ˜ λ(1)q+r P(1|0) − λ(1)q+r G(1) λ(1)r ˜ ˜ − λ(1)q+r G(1)T P(1|0)
(8.222)
. Therefore, Theorem 8.1 holds
8.6.3 Appendix 8.C ˆ Proof: First, we predict .μ(1|0) as ˆ ˆ μ(1|0) = F(0)μ(0)
.
ˆ = (ϒ T ϒ+ς mIm )−1 ϒ T μ(0) = w(1), ˆ where .w(1) = (ϒ T ϒ+ς mIm )−1 ϒ T μ(0).
(8.223)
8.6 Appendices
283
ˆ Then, update the state .μ(1) as ˆ ˆ ˆ μ(1) = μ(1|0) + G(1)(ϕ(y(1)) − μ(1|0)) λ(1)r T ˜ G(1) ˆ = w(1) + λ(1)q+r (ϕ(y(1)) − μ(1|0)) λ(1)q + λ(1)q+r Ink . λ(1)r ˜ r T λ(1)q λ(1)q+r Im − λ(1)q+r G(1) w(1) = + λ(1)q+r ϕ(y(1)) T ϕ(y(1)) ˜ + λ(1)r G(1) λ(1)q+r λ(1)q λ(1)q+r ϕ(y(1)),
= s(1) +
(8.224) where .s(1) =
λ(1)r ˜ r T λ(1)q+r Im − λ(1)q+r G(1) λ(1)r ˜ + λ(1)q+r G(1)T ϕ(y(1))
w(1)
. Therefore, Theorem 8.2
has been proven when .k = 1.
8.6.4 Appendix 8.D Proof: From above derivations, we have , , ˆ − 1) − μ(k|k ˆ − 1), 2P−1 (k|k−1) b(k) = ,μ(k
.
= e(k|k − 1)T P−1 (k|k − 1)e(k|k − 1) 1 1 ˜ −1 q Ink − q 2 (P (k|k − 1) T = e(k|k − 1) e(k|k − 1) + q1 T )−1 T 1 1 ˜ −1 (k|k − 1) (q P I − n = e(k|k − 1)T q kT q−1 T e(k|k − 1) + )
(8.225)
= q1 e(k|k − 1)T e(k|k − 1) − q1 e(k|k − 1)T (q P˜ −1 (k|k − 1) + T )−1 T e(k|k − 1) = q1 b1 (k) − q1 b2 (k)(q P˜ −1 (k|k − 1) + T )−1 b3 (k), where b1 (k) = e(k|k − 1)T e(k|k − 1) ˆ − 1) − μ(k|k ˆ ˆ − 1) − μ(k|k ˆ − 1)] = [μ(k − 1)]T [μ(k T ˆ − 1) − μ(k ˆ − 1)T μ(k|k ˆ ˆ − 1) μ(k − 1) = μ(k ˆ − 1) + μ(k|k ˆ ˆ ˆ − 1)T μ(k|k − 1) −μ(k|k − 1)T μ(k = b01 (k) + b02 (k) − b03 (k) − b04 (k), . b2 (k) = e(k|k − 1)T ˆ − 1) − μ(k|k ˆ = [μ(k − 1)]T λ(k−1)q = [s(k − 1) + λ(k−1)q+r ϕ(y(k − 1)) − w(k)]T = s(k − 1)T T + b3 (k) = b2 T (k),
λ(k−1)q T λ(k−1)q+r ϕ (y(k
− 1)) − wT (k)T ,
(8.226)
284
8 Additional Topics in Kalman Filtering Under Information Theoretic Criteria
with ˆ ˆ b01 (k) = μ(k|k − 1)T μ(k|k − 1)=wT (k)T w(k) ˆ − 1) ˆ − 1)T μ(k b02 (k) = μ(k
T λ(k−1)q = s(k − 1) + λ(k−1)q+r ϕ(y(k − 1))
λ(k−1)q s(k − 1) + λ(k−1)q+r ϕ(y(k − 1)) .
= s(k − 1)T T s(k − 1) + λ(k−1)q s(k + λ(k−1)q+r
λ(k−1)q T λ(k−1)q+r ϕ (y(k
− 1)T T ϕ(y(k
− 1)) +
− 1))s(k − 1) 2
λ(k−1)q λ(k−1)q+r
ˆ − 1) ˆ b03 (k) = μ(k|k − 1)T μ(k = wT (k)T [s(k − 1) + = wT (k)T s(k − 1) + b04 (k) = b03 T (k).
λ(k−1)q λ(k−1)q+r ϕ(y(k − 1))] λ(k−1)q T T λ(k−1)q+r w (k) ϕ(y(k
Therefore, Theorem 8.3 has been proven.
− 1))
(8.227)
Reference
1. R. Kalman, A new approach to linear filtering and prediction problems.Trans. ASME-J. Basic Eng. 82(1), 35–45 (1960) 2. R. Kalman, R. Bucy, New results in linear filtering and prediction theory. Trans. ASME-J. Basic Eng. 83(1), 95–108 (1961) 3. N.E. Nahi, in Estimation Theory and Applications (Wiley, New York, 1969) 4. S. Haykin, in Adaptive Filters Therory (Wiley, New York, 2001) 5. A. Sayed, in Fundamentals of Adaptive Filters (Wiley, New York, 2008) 6. W. Liu, J.C. Príncipe, S. Haykin, in Kernel Adaptive Filtering (Wiley, New York, 2010) 7. A. Bryson, Y. Ho, in Applied Optimal Control: Optimization, Estimation and Control (CRC Press, Boca Raton, 1975) 8. W. Edmonson, K. Srinivasan, C.Wang, J. Principe, A global least mean square algorithm for adaptive IIR filtering. IEEE Trans. Circuits Syst. 45, 379–38 (1998) 9. S. Särkkä, in Bayesian Filtering and Smoothing (Cambridge University Press, Cambridge, 2013) 10. G. Yan, J. Wong, in Strapdown Inertial Navigation Algorithm and Combined Navigation Principle (Northwestern Polytechnical University Press, Xi’an, 2019) 11. S. Mitra, Y. Kuo, in Digitial Signal Processing: A Computer-Based Approach (McGraw-Hill, New York, 2006) 12. G. Tambini, G.C. Montanari, M. Cacciari, The Kalman filter as a way to estimate the lifemodel parameters of insulating materials and system, in Proceedings the 4th International Conference on Conduction and Breakdown in Solid Dielectrics (1992), pp. 523–527 13. D. Unsal, M. Dogan, Implementation of identification system for IMUs based on Kalman filtering, in Proceedings IEEE/ION Position, Location and Navigation Symposium-PLANS (2014), pp. 236–240 14. A. Yadav, N. Naik, M.R. Ananthasayanam, A. Gaur, Y.N. Singh, A constant gain Kalman filter approach to target tracking in wireless sensor networks, in Proceedings the IEEE 7th International Conference on Industrial and Information Systems, ICIIS (2012), pp. 1–7 15. H. Chou, M. Traonmilin, E. Ollivier, M. Parent, A simultaneous localization and mapping algorithm based on Kalman filtering, in Proceedings IEEE Intelligent Vehicles Symposium (2004), pp. 631–635 16. S. Wang, J. Huang, Z. Xie, Y. Jiang, J. Feng, in Principles of Nonlinear Kalman Filters and ther Applications (Electronics industry Press, Beijing, 2015) 17. X. Huang, Y. Wang, in Principles of Kalman Filters and ther Applications (Electronics Industry Press, Beijing, 2015)
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Chen et al., Kalman Filtering Under Information Theoretic Criteria, https://doi.org/10.1007/978-3-031-33764-2
285
286
Reference
18. S. Julier, J. Uhlmann, A new extension of the Kalman filter to nonlinear systems, in Proceedings 11th International Symposium Aerospace/Defence Sensing, Simulation and Controls (1997), pp. 182–193 19. M. Roth, F. Gustafsson, An efficient implementation of the second order extended Kalman filter, in Proceedings 14th International Symposium on Information Fusion (2011), pp. 1–6 20. R. Van Der Merwe, Sigma-point kalman filters for probabilistic inference in dynamic statespace models, Thesis, 2004 21. S.J. Julier, J.K. Uhlmann, Unscented filtering and nonlinear estimation. Proc. IEEE 92(3), 401–422 (2004) 22. K. Ito, K. Xiong, Gaussian filters for nonlinear filtering problems. IEEE Trans. Autom. Control 45(5), 910–927 (2000) 23. I. Arasaratnam, S. Haykin, R.J. Elliott, Discrete-time nonlinear filtering algorithms using gausshermite quadrature. Proc. IEEE 95(5), 953–977 (2007) 24. I. Arasaratnam, S. Haykin, Cubature Kalman filters. IEEE Trans. Autom. Control 54(6), 1254–1269 (2009) 25. H. Leung, Z. Zhu, Z. Ding, An aperiodic phenomenon of the extended Kalman filter in filtering noisy chaotic signals. IEEE Trans. Signal Process. 48(6), 1807–1810 (2000) 26. K. Xiong, H. Zhang, C. Chan, Performance evaluation of UKF-based nonlinear filtering. Automatica 42, 261–270 (2006) 27. F. Gustafsson, G. Hendeby, Some relations between extended and unscented Kalman filters. IEEE Trans. Signal Process. 60(2), 545–555 (2012) 28. J. Morris, The Kalman filter: a robust estimator for some classes of linear quadratic problems. IEEE Trans. Inf. Theory. 22(5), 526–534 (1976) 29. J. Zhao, L. Mili, A robust generalized-maximum likelihood unscented Kalman filter for power system dynamic state estimation. IEEE J. Sel. Topics Signal Process. 12(4), 578–592 (2018) 30. J. Zhao, L. Mili, Robust unscented Kalman filter for power system dynamic state estimation with unknown noise statistics. IEEE Trans. Power Syst. 10(2), 1215–1224 (2019) 31. Z. Huang, N. Zhou, R. Diao, S. Wang, S. Elbert, D. Meng, S. Lu, Capturing real-time power system dynamics: opportunities and challenges, in Proceedings IEEE PES General Meeting (2015), pp. 1–5 32. Z. Huang, K. Schneider, J. Nieplocha, N. Zhou, Estimating power system dynamic states using extended Kalman Filter, in Proceedings IEEE PES General Meeting (2014), pp. 1–5 33. L. Mili, G. Steeno, F. Dobraca, D. French, A robust estimation method for topology error identification. IEEE Trans. Power Syst. 14(4), 1469–1476 (1999) 34. J. Zhao, Dynamic state estimation with model uncertainties using H-infinity extended Kalman filter. IEEE Trans. Power Syst. 33(1), 1099–1100 (2018) 35. A. Sharma, S.R. Samantaray, Power system tracking state estimator for smart grid under unreliable PMU data communication network. IEEE Sens. J. 18(5), 2107–2116 (2018) 36. J. Zhao, M. Netto, L. Mili, A robust iterated extended Kalman filter for power system dynamic state estimation. IEEE Trans. Power Syst. 32(4), 3205–3216 (2017) 37. A. Sharma, S.C. Srivastava, S. Chakrabarti, An iterative multiarea state estimation approach using area slack bus adjustment. IEEE Syst. J. 10(1), 69–77 (2014) 38. M.C. Graham, J.P. How, D.E. Gustafson, Robust state estimation with sparse outliers. J. Guid. Control Dyn. 38(7), 1229–1240 (2015) 39. Y. Huang, Y. Zhang, A new process uncertainty robust students’t based Kalman filter for SINS/GPS integration. IEEE Access 5, 14391–14404 (2017) 40. M. Zhong, J. Guo, Q. Cao, On designing PMI Kalman filter for INS/GPS integrated systems with unknown sensor errors. IEEE Sens. 15(1), 535–544 (2015) 41. P. Li, X. Huang, F. Li et al., Robust Granger analysis in Lp norm space for directed EEG network analysis. IEEE Trans. Neural Syst. Rehabilitation Eng. 25(11), 1959–1969 (2017) 42. F. Nie, H. Wang, H. Huang et al., Unsupervised and semi-supervised learning via L1-norm graph, in 2011 IEEE International Conference on Computer Vision (ICCV) (2011), pp. 2268– 2273
Reference
287
43. H. Wang, Q. Tang, W. Zheng, L1-norm-based common spatial patterns. IEEE. Trans. Biomed. Eng. 59(3), 653–662 (2012) 44. P. Li, P. Xu, R. Zhang et al., L1 norm based common spatial patterns decomposition for scalp EEG BCI. Biomed. Eng. Online 12, 1–12 (2013) 45. L. Zheng, S. Wang, Z. Liu et al., Lp-norm IDF for large scale image search, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2013), pp. 1626–1633 46. G.L. Plett, in Battery Management Systems, Volume I: Battery Modeling (Artech House, Norwood, 2015) 47. G.L. Plett, in Battery Management Systems, Volume II: Equivalent-Circuit Methods (Artech House, Norwood, 2015). http://mochajava.uccs.edu/BMS2/ 48. Y.M. Jeong, Y.K. Cho, J.H. Ahn, S.H. Ryu, B.K. Lee, Enhanced coulomb counting method with adaptive SOC reset time for estimating OCV, in Proceedings 2014 IEEE Energy Conversion Congress and Exposition (2014), pp. 4313–4318 49. S. Tong, M.P. Klein, J.W. Park, On-line optimization of battery open circuit voltage for improved state-of-charge and state-of-health estimation. J. Power Sources 293, 416–428 (2015) 50. J.A. Qahouq, Z. Xia, Single-perturbation-cycle online battery impedance spectrum measurement method with closed-loop control of power converter. IEEE Trans. Ind. Electron. 64(9), 7019–7029 (2017) 51. M. Cacciato, G. Nobile, G. Scarcella, G. Scelba, Real-time model based estimation of SOC and SOH for energy storage systems. IEEE Trans. Power Electron. 32(1), 794–803 (2017) 52. S. Muhammad et al., A robust algorithm for state-of-charge estimation with gain optimization. IEEE Trans. Ind. Inform. 14(5), 2983–2994 (2017) 53. G.L. Plett, Extended Kalman filtering for battery management systems of LiPB-based HEV battery packs Part 3. State and parameter estimation. J. Power Sources 134(2), 277–292 (2004) 54. S. Wang et al., Online state of charge estimation for the aerial lithiumion battery packs based on the improved extended Kalman filter method. J. Energy Storagy 9, 69–83 (2017) 55. C.S. Chin et al., Nonlinear temperature-dependent state model of cylindrical LiFePO4 battery for open-circuit voltage, terminal voltage and state-of-charge estimation with extended Kalman filter. Energy 11(9), 2467 (2018) 56. N. Ding et al., State of charge estimation of a composite lithium-based battery model based on an improved extended Kalman filter algorithm. Energy Storagy 9, 69–83 (2019) 57. S. Farahmand, G. Giannakis, D. Angelosante, Doubly robust smoothing of dynamical processes via outlier sparsity constraints. IEEE Trans. Signal Process. 59(10), 4529–4543 (2011) 58. H. Ohlsson, F. Gustafsson, L. Ljung, S. Boyd, Smoothed state estimates under abrupt changes using sum-of-norms regularization. Automatica 48(4), 595–605 (2012) 59. A. Aravkin, J. Burke, G. Pillonetto, Sparse/robust estimation and Kalman smoothing with non-smooth log-concave densities: modeling, computation, and theory. J. Mach. Learn. Res. 14(9), 2689–2728 (2013) 60. W. Zhang, B. Chen, C. Tseng, Robust H-infinity filtering for nonlinear stochastic systems. IEEE Trans. Signal Process. 53(2), 589–598 (2005) 61. W. Zhang, B. Chen, C. Tseng, Robust H-inf filtering for nonlinear stochastic systems. IEEE Trans. Signal Process. 53(2), 589–598 (2012) 62. D. Simon, in Optimal State Estimation: Kalman, H-infinity, and Nonlinear Approaches (Wiley-Interscience, New York, 2006) 63. C.K. Chui, G. Chen, in Kalman Filtering with Real-Time Applications (Springer-Verlag, Berlin, 2009) 64. A.P. Sage, G.W. Husa, Adaptive filtering with unknown prior statistics, in Proceedings of Proceedings of the Joint Automatic Control Conference (JACC) (1969), pp. 760–769 65. X. Gao, D. You, S. Katayama, Seam tracking monitoring based on adaptive Kalman filter embedded neural network during high power fiber laser welding. IEEE Trans. Ind. Electron. 59(11), 4315–4325 (2012)
288
Reference
66. W. Gao, J.C. Li, G.T. Zhou, Q. Li, Adaptive Kalman filtering with recursive noise estimator for integrated SINS/DVL systems. J. Navigat. 68(1), 142–161 (2015) 67. C. Hide, T. Moore, M. Smith, Adaptive Kalman filtering for low cost INS/GPS. J. Navigat. 56(1), 143–152 (2003) 68. A.H. Mohamed, K.P. Schwarz, Adaptive Kalman filtering for INS/GPS. J. Geod. 73(4), 193– 203 (1999) 69. Y. Geng, J. Wang, Adaptive estimation of multiple fading factors in Kalman filter for navigation applications. GPS Solut. 12(4), 273–279 (2008) 70. K.H. Kim, J.G. Lee, C.G. Park, Adaptive two-stage extended Kalman filter for a fault-tolerant INS-GPS loosely coupled system. IEEE Trans. Aerosp. Electron. Syst. 45(1), 125–137 (2009) 71. X.R. Li, Y. Bar-Shalom, A recursive multiple model approach to noise identification. IEEE Trans. Aerosp. Electron. Syst. 30(3), 671–684 (1994) 72. H. Qian, D. An, Q. Xia, IMM-UKF based land-vehicle navigation with low-cost GPS/INS, in Proceedings International Conference on Information and Automation (Harbin, 2010), pp. 2031–2035 73. D.L. Alspach, H. Sorenson, Nonlinear Bayesian estimation using Gaussian sum approximations. IEEE Trans. Autom. Control 17(4), pp. 439–448 (1972) 74. M. Roth, E. Özkan, F. Gustafsson, A Student’s filter for heavy-tailed process and measurement noise, in Proceedings 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (2013), pp. 5770-5774 75. S. Särkkä, A. Nummenmaa, Recursive noise adaptive Kalman filtering by variational Bayesian approximations. IEEE Trans. Automat. Control 54(3), 596–600 (2009) 76. M. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50(2), 174–188 (2002) 77. F. Sun, L. Tang, Cubature particle filter. Syst. Eng. Electron. 33(11), 2554–2557 (2011) 78. B. Cui, X. Chen, X. Tang, H. Huang, X. Liu, Robust cubature Kalman filter for GNSS/INS with missing observations and colored measurement noise. ISA Trans. 72, 138–146 (2018) 79. L.B. Chang, K.L. Li, B.Q. Hu, Huber M-estimation-based process uncertainty robust filter for integrated INS/GPS. IEEE Sens. J. 15(6), 3367–3374 (2015) 80. T. Chien-Hao, S.F. Lin, J. Dah-Jing, Robust Huber-based cubature Kalman filter for GPS navigation processing. J. Navigat. 229(7), 527–546 (2016). vol. 70 81. K. Li, B. Hu, L. Chang, Y. Li, Robust square-root cubature Kalman filter based on Huber’s mestimation methodology. Proc. Inst. Mech. Eng. G-J. Aer. 229(7), 1236–1245 (2015) 82. X. Wang, N. Cui, J. Guo, Huber-based unscented filtering and its application to vision-based relative navigation. IET Radar Sonar. Navig. 4(1), 134–141 (2010) 83. L. Chang, B. Hu, G. Chang et al., Huber-based novel robust unscented Kalman filter. IET Sci. Meas. Technol. 6(6), 502–509 (2012) 84. C.D. Karlgaard, H. Schaub, Huber-based divided difference filtering. J. Guid. Control Dyn. 30(3), 885–891 (2007) 85. L.B. Chang, B.Q. Hu, G.B. Chang, A. Li, Multiple outliers suppression derivative-free filter based on unscented transformation. J. Guid. Control Dyn. 35(6), 1902–1906 (2012) 86. C.D. Karlgaard, Nonlinear regression Huber Kalman filtering and fixed-interval smoothing. J. Guid. Control Dyn. 38(2), 322–330 (2015) 87. F. El-Hawary, Y. Jing, Robust regression-based EKF for tracking underwater targets. IEEE J. Ocean. Eng. 20(1), 31–41 (1995) 88. J.C. Príncipe, in Information Theoretic Learning: Renyi’s Entropy and Kernel Perspectives (Springer, New York, 2010) 89. B. Chen, Y. Zhu, J. Hu, J.C. Príncipe, in System Parameter Identification: Information Criteria and Algorithms (Newnes, New York, 2013) 90. A. Singh, J.C. Príncipe, Using correntropy as a cost function in linear adaptive filters, in Proceedings International Joint Conference on Neural Networks (IJCNN) (2009), pp. 2950– 2955
Reference
289
91. D. Erdogmus, J.C. Príncipe, An error-entropy minimization for supervised training of nonlinear adaptive systems. IEEE Trans. Signal Process. 50(7), 1780–1786 (2002) 92. R. He, B. Hu, W. Zheng, X. Kong, Robust principal component analysis based on maximum correntropy criterion. IEEE Trans. Image Process. 20(6), 1485–1494 (2011) 93. R. He, W. Zheng, B. Hu, Maximum correntropy criterion for robust face recognition. IEEE Trans. Pattern Anal. Intell. 33(8), 1561–1576 (2011) 94. R. Li, W. Liu, J.C. Príncipe, A unifying criterion for blind source separation based on correntropy. Signal Process. 8(78), 1872–1881 (2009). Special Issue on ICA 95. A. Singh, J.C. Príncipe, A loss function for classification based on a robust similarity metric, in Proceedings IEEE International Joint Conference on Neural Networks (IJCNN) (2010), pp. 1–6 96. S. Seth, J.C. Príncipe, Compressed signal reconstruction using the correntropy induced metric, in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2008), pp. 3845–3848 97. S. Zhao, B. Chen, J.C. Príncipe, Kernel adaptive filtering with maximum correntropy criterion, in Proceedings IEEE International Joint Conference on Neural Networks (IJCNN) (2011), pp. 2012–2017 98. Z. Wu, J. Shi, X. Zhang, W. Ma, B. Chen, Kernel recursive maximum correntropy. Signal Process. 117, 11–16 (2015) 99. L. Shi, Y. Lin, Convex combination of adaptive filters under the maximum correntropy criterion in impulsive interference. IEEE Signal Process. Lett. 21(11), 1385–1388 (2014) 100. R.P. Agarwal, M. Meehan, D. O’Regan, in Fixed Point Theory and Applications (Cambridge University Press, Cambridge, 2001) 101. A. Singh, J.C. Principe, A closed form recursive solution for maximum correntropy training, in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP (2010), pp. 2070–2073 102. B. Chen, J. Wang, H. Zhao, N. Zheng, J.C. Príncipe, Convergence of a fixed-point algorithm under maximum correntropy criterion. IEEE Signal Process. Lett. 22(10), 1723–1727 (2015) 103. Y. Zhang, B. Chen, X. Liu, Z. Yuan, J.C. Príncipe, Convergence of a fixed-point minimum error entropy algorithm. Entropy 17(8), 5549–5560 (2015) 104. D. Erdogmus, J.C. Príncipe, Generalized information potential criterion for adaptive system training. IEEE Trans. Neural Netw. 13(5), 1035–1044 (2002) 105. L.M. Silva, C.S. Felgueiras, L.A. Alexandre, de Sá J. Marques, Error entropy in classification problems: a univariate data analysis. Neural Computat. 18(9), 2036–2061 (2006) 106. J.P.M. de S’a, L.M. Silva, J.M Santos, L.A. Alexandre, in Minimum Error Entropy Classification (Springer, London, 2013) 107. I. Santamaria, D. Erdogmus, J.C. Príncipe, Entropy minimization for supervised digital communications channel equalization. IEEE Trans. Signal Process. 50(5), 1184–1192 (2002) 108. S. Han, S. Rao, D. Erdogmus, K-H. Jeong, J.C. Príncipe, A minimum-error entropy criterion with self-adjusting step-size. Signal Process. 87(11), 2733–2745 (2007) 109. B.J. Bessa, V. Miranda, J. Gama, Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting. IEEE Trans. Power Syst. 24(4), 1657–1666 (2009) 110. Y. Wang, Y. Tang, L. Li, Minimum error entropy based sparse representation for robust subspace clustering. IEEE Trans. Signal Process. 63(15), 4010–4021 (2015) 111. B. Chen, Y. Zhu, J. Hu, Mean-square convergence analysis of ADALINE training with minimum error entropy criterion. IEEE Trans. Neural Netw. 21(7), 1168–1179 (2010) 112. B. Chen, P. Zhu, J.C. Príncipe, Survival information potential: a new criterion for adaptive system training. IEEE Trans. Signal Process. 60(3), 1184–1194 (2012) 113. B. Chen, Z. Yuan, N. Zheng, J.C. Príncipe, Kernel minimum error entropy algorithm. Neurocomput. 121, 160–169 (2013) 114. W. Liu, P.P. Pokharel, J.C. Príncipe, Error entropy, correntropy and M-estimation, in Proceedings 16th IEEE Workshop Machine Learning for Signal Processing (2006), pp. 179– 184
290
Reference
115. A. Heravi, G. Hodtani, A new robust fixed-point algorithm and its convergence analysis. J. Fixed Point Theory Appl. 19, 3139–3215 (2017) 116. B. Chen, X. Liu, H. Zhao, J.C. Príncipe, Maximum correntropy Kalman filter. Automatica 76, 70–77 (2017) 117. Y. Mo, H. Li, H. Yang, B. Chen, Z. Jiang, Generalized maximum correntropy Kalman filter for target tracking in complex application. Space Sci. Technol. 2022, 15 (2022). paper ID: 9796015 118. X. Liu, Z. Ren, H. Lyu, Z. Jiang, P. Ren, B. Chen, Linear and nonlinear regression-based maximum correntropy extended Kalman filtering. IEEE Trans. Syst. Man Cybern. Syst. 51(5), 3093–3102 (2021) 119. X. Liu, H. Qu, J. Zhao, P. Yue, M. Wang, Maximum correntropy unscented kalman filter for spacecraft relative state estimation. Sensors 16(9), 1530 (2016) 120. X. Liu, B. Chen, B. Xu, Z. Wu, P. Honeine, Maximum correntropy unscented filter. Int. J. Syst. Sci. 48(8), 1607–1615 (2017) 121. W. Ma, J. Qiu, X. Liu, G. Xiao, J. Duan, B. Chen, Unscented Kalman filter with generalized correntropy loss for robust power system forecasting-aided state estimation. IEEE Trans. Ind. Inform. 15(11), 6091–6100 (2019) 122. S. Wang, C. Yin, G. Qian, Y. Feng, S. Kai, L. Wang, Maximum correntropy cubature Kalman filter based on statistical linear regression, C.N. Patent, 106487358 A, 8 Mar 2017. https:// patents.google.com/patent/CN106487358A/zh 123. X. Liu, H. Qu, J. Zhao, P. Yue, Maximum correntropy square-root cubature Kalman filter with application to SINS/GPS integrated systems. ISA Trans. 80, 195–202 (2018) 124. G.T. Cinar, J.C. Principe, Hidden state estimation using the correntropy filter with fixed point update and adaptive kernel size, in International Joint Conference on Neural Networks (IJCNN) (Brisbane, 2012), pp. 1–6 125. B. Chen, L. Dang, Y. Gu, N. Zheng, J.C. Príncipe, Minimum error entropy Kalman filter. IEEE Trans. Syst. Man Cybern. Syst. 51(9), 5819–5829 (2021) 126. L. Dang, B. Chen, S. Wang, W. Ma, P. Ren, Maximum error entropy unscented Kalman filter. IEEE Trans. Instrum. Meas. 66(8), 2036–2045 (2017) 127. L. Dang, B. Chen, Y. Huang, Y. Zhang, H. Zhao, Cubature Kalman filter under minimum error entropy with fiducial points. IEEE/CAA J. Autom. Sinica 9(3), 450–465 (2022) 128. R. Izanloo, S.A. Fakoorian, H. Sadoghi, D. Simon, Kalman filtering based on the maximum correntropy criterion in the presence of non-Gaussian noise, in 50th Annual Conference on Information Science and Systems (CISS) (Princeton, 2016), pp. 530–535 129. W. Liu, P.P. Pokharel, J.C. Príncipe, Correntropy: properties and applications in non-Gaussian signal processing. IEEE Trans. Signal Process. 55(11), 5286–5298 (2007) 130. T. S´’oderstr´’om, Errors-in-variables methods in system identification. Automatica 43(6), 939– 958 (2007) 131. B. Chen, L. Xing, H. Zhao, S. Du, J.C. Príncipe, Effects of outliers on the maximum correntropy estimation: a robustness analysis. IEEE Trans. Syst. Man Cybern. Syst. 51(6), 4007–4012 (2021) 132. P.J. Huber, Finite sample breakdown of M-and P-estimators. Ann. Stat. 12(1), 119–126 (1984) 133. Z. Chen, D.E. Tyler, On the finite sample breakdown points of redescending M-estimates of location. Stat. Probab. Lett. 69(3), 233–242 (2004) 134. V.J. Yohai, High breakdown-point and high efficiency robust estimates for regression. Ann. Stat. 15(2), 642–656 (1987) 135. R.A. Maronna, V.J. Yohai, The breakdown point of simultaneous general M estimates of regression and scale. J. Amer. Stat. Assoc. 86(415), 699–703 (1991) 136. X. He, D.G. Simpson, G. Wang, Breakdown points of t-type regression estimators. Biometrika 87(3), 675–687 (2000) 137. B. Chen, L. Xing, H. Zhao, N. Zheng, J.C. Príncipe, Generalized correntropy adaptive filtering. IEEE Trans. Signal Process. 64(13), 3376–3387 (2016) 138. B. Chen, X. Wang, N. Lu, S. Wang, J. Cao, J. Qin, Mixture correntropy for robust learning. Pattern Recognit. 79, 318–327 (2018)
Reference
291
139. B. Chen, X. Wang, Y. Li, J.C. Príncipe, Maximum correntropy criterion with variable center. IEEE Signal Process. Lett. 26(8), 1212–1216 (2019) 140. B. Chen, X. Wang, Z. Yuan, P. Ren, J. Qin, Multi-kernel correntropy for robust learning. IEEE Trans. Cybern. 52(12), 13500–13511 (2022) 141. E. Parzen, On the estimation of a probability density function and the mode. Ann. Math. Statist. 33, 1065–1067 (1962) 142. D. Erdogmus, J.C. Príncipe, An error-entropy minimization algorithm for supervised training of nonlinear adaptive systems. IEEE Trans. Signal Process. 50(7), 1780–1786 (2002) 143. B. Chen, L. Xing, B. Xu, H. Zhao, J.C. Príncipe, Insights into the robustness of minimum error entropy estimation. IEEE Trans. Neural Netw. Learn. Syst. 29(3), 731–737 (2018) 144. S.A. Fakoorian, A. Mohammadi, V. Azimi et al., Robust Kalman-type filter for non-Gaussian noise: performance analysis with unknown noise covariances. J. Dyn. Syst. Meas. Control 141(9), 091011 (2019) 145. M.K. Varanasi, B. Aazhang, Parametric generalized Gaussian density estimation. J. Acoust. Soc. Amer. 86(4), 1404–1415 (1989) 146. B. Chen, J.C. Principe, J. Hu, Y. Zhu, Stochastic information gradient algorithm with generalized Gaussian distribution model. J. Circuits Syst. Comput. 21(01), 1250006 (2012) 147. K. Kokkinakis, A.K. Nandi, Exponent parameter estimation for generalized Gaussian probability density functions with application to speech modeling. Signal Process. 85(9), 1852–1858 (2005) 148. S.G. Mallat, A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11(7), 674–693 (1989) 149. S. Choi, A. Cichocki, S.I. Amari, Flexible independent component analysis. J. VLSI Signal Process. Syst. Signal Image Video Technol. 26(1–2), 25–38 (2000) 150. L. Fan, Y. Wehbe, Extended Kalman filtering based real-time dynamic state and parameter estimation using PMU data. Elect. Power Syst. Res. 103, 168–177 (2013) 151. E. Ray, N. Reich, Prediction of infectious disease epidemics via weighted density ensembles. PLoS Computat. Biol. 14(2), e1005910 (2018) 152. K.R. Shih, S.J. Huang, Application of a robust algorithm for dynamic state estimation of a power system. IEEE Trans. Power Syst. 17(1), 141–147 (2002) 153. A. Sharma, S.C. Srivastava, S. Chakrabarti, A cubature Kalman filter based power system dynamic state estimator. IEEE Trans. Instrum. Meas. 66(8), 2036–2045 (2017) 154. G. Valverde, V. Terzija, Unscented Kalman filter for power system dynamic state estimation. IET Gener. Transm. Distrib. 5(1), 29–37 (2011) 155. J. Zhao, G. Zhang, K. Das, G.N. Korres, N.M. Manousakis, A.K. Sinha, Z. He, Power system real-time monitoring by using PMU based robust state estimation method. IEEE Trans. Smart Grid 7(1), 300–309 (2016) 156. A. Abur, A.G. Exposito, in Power System State Estimation: Theory and Implementation (Marcel Dekker, New York, 2004) 157. A. Leite da Silva, M. Do Coutto Filho, J. de Queiroz, State forecasting in electric power systems. IEE Proc. Gener. Transm. Distrib. 130(5), 237–244 (1983) 158. Power Systems Test Case Archive. http://www.ee.washington.edu/research/pstca/ 159. D.P. Bertsekas, Incremental least squares methods and the extended Kalman filter. SIAM J. Optim. 6, 807–822 (1996) 160. B. Jia, M. Xin, Y. Cheng, Sparse-grid quadrature nonlinear filtering. Automatica 48, 327–341 (2012) 161. S. Wang, J. Feng, C.K. Tse, Spherical simplex-radial cubature Kalman filter. IEEE Signal Process. Lett. 21, 43–46 (2014) 162. G. Yan, W. Yan, D. Xu, A SINS nonlinear error model reflecting better characteristics of SINS errors. J. Northwest Polytech Univ. 27(4), 511–516 (2009) 163. H. Li, J. Zhang, J. Shen, Error estimation method of SINS based on UKF in terrainaided navigation, in Proceedings International Conference on Mechatronic Science, Electric Engineering and Computer (MEC) (2011), pp. 2498–2501
292
Reference
164. C. Andreou, V. Karathanassi, Estimation of the number of endmembers using robust outlier detection method. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 7(1), 247–256 (2014) 165. K. Feng, J. Li, X. Zhang, X. Zhang, C. Shen, H. Cao, Y. Yang, J. Liu, An improved strong tracking cubature Kalman filter for GPS/INS integrated navigation systems. Sensors 18(6), 1919 (2018) 166. O. Straka, J. Duník, Stochastic integration Student’s-t filter,in Proceedings 20th International Conference on Information Fusion (2017), pp. 1–8 167. Y. Huang, Y. Zhang, Robust Student’s-t based stochastic cubature filter for nonlinear systems with heavy-tailed process and measurement noises.IEEE Access 5(5), 7964–7974 (2017). 168. B. Ristic, S. Arulampalam, N. Gordon, in Beyond the Kalman Filter: Particle Filters for Tracking Applications (Artech House, Norwood, 2003) 169. X. Liu, B. Chen, H. Zhao, J. Qin, J. Cao, Maximum correntropy Kalman filter with state constraints. IEEE Access 5, 25846–25853 (2017) 170. M. Nrgaard, N.K. Poulsen, O. Ravn, New developments in state estimation for nonlinear systems. Automatica 26(11), 1627–1638 (2000) 171. N. Subrahmanya, Y.C. Shin, Adaptive divided difference filtering for simultaneous state and parameter estimatio. Automatica 45(7), 1686–1693 (2009) 172. X. Liu, B. Chen, S. Wang, S. Du, Correntropy based divided difference filtering for the positioning of ships. Sensors 18, 4080 (2018) 173. L. Dang, B. Chen, Y. Xia, J. Lan, M. Liu, Dual extended Kalman filter under minimum error entropy with fiducial points. IEEE Trans. Syst. Man Cybern. Syst 52(12), 7588–7599 (2022) 174. L. Dang, B. Chen, S. Wang, Y. Gu, J.C. Principe, Kernel Kalman filtering based on conditional embedding operator and maximum correntropy criterion. IEEE Trans. Circuits Syst. I Regul. Pap. 66(11), 4265–4277 (2019) 175. S.S. Haykin, in Kalman Filtering and Neural Networks (Wiley, New York, 2001) 176. P. Zhu, B. Chen, J.C. Principe, learning nonlinear generative models of time series with a Kalman filter in RKHS. IEEE Trans. Signal Process. 62(1), 141–155 (2014) 177. J. Lu, S.C. Hoi, J. Wang, P. Zhao, Z. Liu, Large scale online kernel learning. J. Mach. Learn. Res. 17(1), 1613–1655 (2016) 178. Y.R. Park, T.J. Murray, C. Chen, Predicting sun spots using a layered perceptron neural network. IEEE Trans. Neural Netw. 7(2), 501–505 (1996) 179. Royal Observatory of Belgium, Brussels. SILSO, Sunspot Number Version 2.0. Accessed 11 May 2017. http://sidc.oma.be/silso/datafiles
Index
A Adaptive Kalman filter (AKF), 3, 11, 40–42, 50, 51
C Correntropy-based first-order divided difference (CDD1), 229, 237–239, 242, 243, 246–249, 280 Correntropy-based second-order divided difference (CDD2), 229, 239–241, 243, 246–249, 280 Cubature Kalman filter (CKF), 2, 5, 8, 9, 11, 29, 30, 33–35, 37, 51, 127, 191–194, 202, 207–227, 273 Cubature Kalman filter under minimum error entropy with fiducial points (MEEF-CKF), 8, 9, 191, 207–227
D Dual Kalman filtering under minimum error entropy with fiducial points (MEEF-DEKF), 229, 247–258, 280
E Extended Kalman filter (EKF), 2, 5, 9, 11, 30–31, 37, 51, 127–128, 132, 136, 137, 141, 144, 145, 154, 156, 158, 164, 167–172, 176, 179–189, 191, 217, 219–223, 225–227, 245–249, 255
G Generalized maximum correntropy Kalman filter (GMCKF) algorithm, 105–108
H H-infinity filter (HIF), 3, 11, 38–40, 50, 51, 255 Huber based Kalman filter (HKF), 5, 9, 11, 48–50, 255
I Information theoretic criteria, 6–7, 53–87, 89–284 Information theoretic learning (ITL), 6, 7, 27, 53, 65
K Kalman filter (KF), 2, 3, 5, 6, 8, 11–14, 17, 22–24, 27–31, 37, 38, 40, 43, 44, 49–51, 89–91, 93–95, 97, 98, 100, 102, 110–114, 119, 121–124, 127, 128, 149, 237, 262 Kalman filtering, 2, 7–9, 11–51, 89–148, 166, 191–284 Kernel Kalman filtering based on conditional embedding operator and maximum correntropy criterion (KKF-CEOMCC), 259–281
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Chen et al., Kalman Filtering Under Information Theoretic Criteria, https://doi.org/10.1007/978-3-031-33764-2
293
294 M Maximum correntropy criterion (MCC), 6–9, 53–65, 68, 70, 72, 74, 75, 77, 83, 85, 89, 91, 92, 96, 101–103, 123, 127–129, 132, 145, 149, 151, 152, 158–160, 172, 182, 184, 189, 191, 193, 194, 199, 209, 227, 241, 242, 256, 262, 271, 273, 279 Maximum correntropy cubature Kalman filter (MCCKF), 7, 9, 191, 193–195, 217–221, 223–227 Maximum correntropy extended Kalman filter (MCEKF), 7, 9, 127–141, 144, 151 Maximum correntropy Kalman filter (MCKF), 7, 9, 89, 91–101, 106–108, 110–114, 119, 121–124, 129, 229, 230, 237 Maximum correntropy Kalman filter with state constraints (MCKF-SC), 229–237, 280 Maximum correntropy square-root cubature Kalman filter (MCSCKF), 8, 9, 191, 194–208, 227 Maximum correntropy unscented filter (MCUF), 9, 149, 151–158, 163, 164, 176, 189, 193, 209, 227 Maximum correntropy unscented Kalman filter (MCUKF), 7, 9, 149, 158–162, 176, 178–189, 196 MEE with fiducial points (MEEF), 7, 8, 54, 73–77, 212, 251 Minimum error entropy (MEE), 6–8, 53, 54, 65–75, 77, 85, 89, 114, 124, 127, 141, 149, 172, 174, 175, 180, 182, 184, 188, 189, 209, 213, 256 Minimum error entropy extended Kalman filter (MEE-EKF), 8, 9, 127, 141–144, 217, 220–227, 255
Index Minimum error entropy Kalman filter (MEE-KF), 8, 9, 89, 114–123, 142, 174 Minimum error entropy unscented Kalman filter (MEE-UKF), 8, 9, 149, 172–188 Minimum mean square error (MMSE), 1, 5, 17, 56, 73, 89, 114, 123, 172, 180, 191, 229, 256, 271, 280
N Non-Gaussian noise, 2, 5, 7, 8, 38, 50, 53, 54, 89, 98, 110, 123, 124, 127, 137–141, 156–158, 161, 164, 167–169, 176, 178, 180, 188, 191, 193, 194, 205–208, 218, 220, 227–229, 238, 243, 257, 280
P Particle filter (PF), 4, 9, 11, 44–48, 50, 51
R Robust Kalman filtering, 3–8 Robust student’s t based Kalman filter (RSTKF), 5, 11, 50, 51, 110
U Unscented Kalman filter (UKF), 2, 9, 11, 29, 31–33, 37, 51, 127, 149–190, 202, 204, 205, 208, 209, 220–227, 238, 245–249, 273 Unscented Kalman filter with generalized correntropy loss (GCL-UKF), 7, 9, 149, 162–172, 189