English Pages 439 Year 2021
Applied Condition Monitoring
Fakher Chaari · Jacek Leskow · Agnieszka Wylomanska · Radoslaw Zimroz · Antonio Napolitano Editors
Nonstationary Systems: Theory and Applications Contributions to the 13th Workshop on Nonstationary Systems and Their Applications, February 3–5, 2020, Grodek nad Dunajcem, Poland
Applied Condition Monitoring Volume 18
Series Editors Mohamed Haddar, National School of Engineers of Sfax, Sfax, Tunisia Walter Bartelmus, Wroclaw, Poland Fakher Chaari, Mechanical Engineering Department, National School of Engineers of Sfax, Sfax, Tunisia Radoslaw Zimroz, Faculty of GeoEngineering, Mining and Geology, Wroclaw University of Science and Technology, Wroclaw, Poland
The book series Applied Condition Monitoring publishes the latest research and developments in the field of condition monitoring, with a special focus on industrial applications. It covers both theoretical and experimental approaches, as well as a range of monitoring conditioning techniques and new trends and challenges in the field. Topics of interest include, but are not limited to: vibration measurement and analysis; infrared thermography; oil analysis and tribology; acoustic emissions and ultrasonics; and motor current analysis. Books published in the series deal with root cause analysis, failure and degradation scenarios, proactive and predictive techniques, and many other aspects related to condition monitoring. Applications concern different industrial sectors: automotive engineering, power engineering, civil engineering, geoengineering, bioengineering, etc. The series publishes monographs, edited books, and selected conference proceedings, as well as textbooks for advanced students. ** Indexing: Indexed by SCOPUS, WTI Frankfurt eG, SCImago
More information about this series at http://www.springer.com/series/13418
Editors Fakher Chaari National School of Engineers of Sfax Sfax, Tunisia Agnieszka Wylomanska Institute of Mining Engineering Wroclaw University of Technology Wrocław, Poland Antonio Napolitano Centro Direzionale di Napoli Napoli, Italy
Jacek Leskow College of Telecommunications and Computer Science Cracow University of Technology Cracow, Poland Radoslaw Zimroz Faculty of GeoEngineering, Mining and Geology Wroclaw University of Technology Wrocław, Poland
ISSN 2363-698X ISSN 2363-6998 (electronic) Applied Condition Monitoring ISBN 978-3-030-82191-3 ISBN 978-3-030-82110-4 (eBook) https://doi.org/10.1007/978-3-030-82110-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Nonstationary systems are useful in many areas of interest, including the physical and natural sciences. They are also applied in the condition monitoring. In recent years, many research papers and books devoted to the applications of nonstationary systems have appeared in the scientific literature around the world. It is clear that different sources induce the nonstationary behavior of the signals related to the technical diagnostic area. From the mathematical and signal processing point of view, the nonstationarity can be seen in many different ways, depending on the source. Consequently, the corresponding model should have time-dependent parameters which includes, for example, changing distribution. This book presents nonstationary systems and processes as a solution. Such systems have found many practical applications, but their analysis is still a challenging task. That’s why in this volume we present papers where the new algorithms for the nonstationary systems analysis are introduced. Moreover, a significant part of this volume is dedicated to the specific applications of the nonstationary systems, especially in the condition monitoring. Recent research has amply demonstrated the benefits that can be obtained from modelling diagnostic signals as nonstationary processes. This is especially true for vibration or acoustic signals produced by mechanical systems due to their close link to nonstationary mechanisms. This volume is a summary of the recent results obtained by the authors conducting the interdisciplinary research in the application of the nonstationary systems in the condition monitoring. This volume is in line with the series “Applied Condition Monitoring”, where the new achievements of the dedicated methods for technical diagnostics are presented. We believe the presented results, and applications are unique and can be useful for practitioners working in the area of technical diagnostics. 
Moreover, the theoretical papers related to the nonstationary processes can be also interesting for the mathematicians working in this area. This volume is a result of the meeting in Grodek nad Dunajcem, Poland (February, 2020) of the group of researchers, PhD students, students and engineers interested in the nonstationary processes analysis and applications. The annual international workshop on Nonstationary Systems and Their Applications is a
chance for the academic audience to see real applications of this wide class of processes. On the other hand, engineers can learn new methods of analysis of nonstationary systems. The main idea of this meeting was to bring together pure and applied researchers from different disciplines and to start a discussion on possible cooperation in the analysis and applications of nonstationary systems. The presentations covered recent theoretical developments for such systems and their possible applications. We present more than 20 interesting papers from a wide area of nonstationary systems. In this volume, one can find theoretical papers as well as articles devoted to the practical aspects of applying nonstationary processes, especially in condition monitoring. In general, the presented papers belong to two groups: those related to the theoretical considerations of nonstationary processes and those related to their practical applications. It should be noted, however, that all articles represent interdisciplinary research at the frontier of mathematics, statistics, signal processing and engineering. The first group of papers contains the articles where nonstationary processes are considered in the context of their theoretical properties. Various examples of nonstationary processes are considered here (see Maraj et al., Garay et al.). However, the motivation for their analysis always comes from practical applications, and thus the obtained results have substantial application potential in various disciplines, especially machine condition monitoring. Special attention is directed towards processes with non-Gaussian behaviour (see Grzesiek et al., Grzesiek, Michalak et al., Szarek et al.) and cyclostationary systems (see Dudek et al., Javorskyj et al.). In this group, one can also find interesting applications for the financial markets (see Stawiarski, Szarek et al.).
The second group of papers in this volume is devoted to the practical applications of the theory of nonstationary processes. The main attention is focused on technical diagnostics and especially condition monitoring. Most of the articles propose methods based on the analysis of signals that contain information about the technical condition. Thus, the methods proposed in the papers from the "mathematical" part can be successfully applied in these applications. The cyclostationarity-based analysis is also highlighted in a few interesting articles devoted to machine condition monitoring (see Abboud et al., Choklati et al.). There are also articles where health monitoring for fault diagnostics is discussed and analysed by using various approaches. The authors propose automatic algorithms that can be applied in monitoring systems (see Jablonski et al., Lorenzo et al., Hoshyarmanesh et al., Laha et al., Rashidi et al., Razik et al., Wang et al.). In this volume, one can also find interesting approaches to decision-support techniques applicable in condition monitoring systems (see Elforjani et al., Shumelchyk et al., Gursky et al.). Mathematical modelling and computer simulations for condition monitoring are also discussed in the research papers presented in the current volume (Martynenko, Shapovalova et al.).
We believe that the papers presented in this volume will be important for the scientists who are interested in new advances in nonstationary process analysis. Moreover, we hope to attract the attention of practitioners who could apply algorithms in real-life problems. Finally, we would like to thank all the reviewers who contributed to the improvement of the quality of the chapters. A special thanks goes to Springer for the support in publishing this book. Fakher Chaari Jacek Leskow Agnieszka Wylomanska Radoslaw Zimroz Antonio Napolitano
Contents

Time-Averaged Statistics-Based Methods for Anomalous Diffusive Exponent Estimation of Fractional Brownian Motion
Katarzyna Maraj and Agnieszka Wyłomańska

First-Order Integer Valued AR Processes with Zero-Inflated Innovations
Aldo M. Garay, Francyelle L. Medina, Isaac Jales C. S., and Patrice Bertail

Asymptotics of Alternative Interdependence Measures for Bivariate α-Stable Autoregressive Model of Order 1
Aleksandra Grzesiek and Agnieszka Wyłomańska

How to Describe the Linear Dependence for Heavy-Tailed Distributed Data
Aleksandra Grzesiek, Anna Michalak, and Agnieszka Wyłomańska

Granger Causality and Cointegration During Stock Bubbles and Market Crashes
Bartosz Stawiarski

Non-Gaussian Regime-Switching Model in Application to the Commodity Price Description
Dawid Szarek, Łukasz Bielak, and Agnieszka Wyłomańska

Foundations of the Theory of Strongly Periodically Correlated Fields over Z^2
Anna E. Dudek, Dominique Dehay, Harry Hurd, and Andrzej Makagon

Component and the Least Square Estimation of Mean and Covariance Functions of Biperiodically Correlated Random Signals
Ihor Javorskyj, Roman Yuzefovych, and Oksana Dzeryn

The Synchronous Fitting of Cyclo-non-Stationary Signals: Definition and Theoretical Analysis
Dany Abboud, Amadou Assoumane, and Mohammed Elbadaoui

On the Modelling of Phonocardiogram Signals: Laplace Kernel and Cyclostationarity Based Approaches
Abdelouahad Choklati, Anas Had, and Khalid Sabri

Overview of Practical Aspects of Evaluation of Spectral Scalar Indicators for Trend Analysis in Condition Monitoring
Adam Jablonski and Tomasz Barszcz

Automatic Detection of Rolling Element Bearing Faults to Be Applied on Mechanical Systems Comprised by Gears
Andy Rodríguez, Fidel Hernández, and Mario Ruiz

Health Monitoring of Moving/Rotary Structures: An Electromechanical Impedance Approach Using Integrated Piezoceramic Transducers
Hamidreza Hoshyarmanesh and Ali Abbasi

Rub-Impact Fault Diagnosis of a Coal Crusher Machine by Using Ensemble Patch Transformation and Empirical Mode Decomposition
S. K. Laha, B. Swarnakar, Sourav Kansabanik, and K. J. Uke

Fault Detection of Non-stationary Processes Using a Modified PCA Approach
Bahador Rashidi and Qing Zhao

Contribution to Health Monitoring of Silicon Carbide MOSFET
Hubert Razik, Malorie Hologne-Carpentier, Bruno Allard, Guy Clerc, and Tianzhen Wang

The Use of Signal Intensity Estimator for Monitoring Real World Non-stationary Data
Mohamed Elforjani and David Mba

Model-Based Decision Support System for the Blast Furnace Charge of Burden Materials
Yevhen Shumelchyk, Yurii Semenov, Viktor Horupakha, Pavlo Krot, and Iryna Hulina

Optimization of the Vibrating Machines with Adjustable Frequency Characteristics
Volodymyr Gursky, Pavlo Krot, Ihor Dilay, and Radoslaw Zimroz

Mathematical Modelling and Computer Simulation of Rotors Dynamics in Active Magnetic Bearings on the Example of the Power Gas Turbine Unit
Gennadii Martynenko

Computer Method of Determining the Yield Surface of Variable Structure of Heterogeneous Materials Based on the Statistical Evaluation of Their Elastic Characteristics
Mariya Shapovalova and Oleksii Vodka

Diagnosis Methods on the Blade of Marine Current Turbine
Tianzhen Wang, Funa Zhou, Tao Xie, and Hubert Razik

Author Index
Time-Averaged Statistics-Based Methods for Anomalous Diffusive Exponent Estimation of Fractional Brownian Motion

Katarzyna Maraj and Agnieszka Wyłomańska

Faculty of Pure and Applied Mathematics, Hugo Steinhaus Center, Wroclaw University of Science and Technology, Wroclaw, Poland
Abstract. Anomalous diffusive processes are widely discussed in research papers. In contrast to diffusive processes, whose second moment is linear in time, they are characterized by a nonlinear variance. Anomalous diffusive processes exhibit many interesting properties that are not characteristic of diffusive systems, and thus they have found various applications in, among others, biology, physics and environmental engineering. The time-averaged statistics, which are based on a sample trajectory of the given process, are very useful in testing for anomalous diffusive behavior. Like the empirical second moment, they behave differently for anomalous diffusive and diffusive processes. Thus, they can be very effective tools for the estimation and statistical testing of anomalous diffusive behavior. One can find different theoretical anomalous diffusive processes. One of the classical examples is the fractional Brownian motion. In this chapter, we demonstrate how the selected time-averaged statistics behave for the fractional Brownian motion and show how they can be applied in order to estimate the Hurst exponent (responsible for the anomalous diffusive behavior). By using Monte Carlo simulations, we compare the effectiveness of the presented estimation methods for the considered stochastic process. The described methodology can be applied to any other anomalous diffusive process.

Keywords: Anomalous diffusive process · Hurst exponent · Mean square displacement · Detrended fluctuation analysis · Detrended moving average analysis · Monte Carlo simulations
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 1–18, 2022. https://doi.org/10.1007/978-3-030-82110-4_1

1 Introduction

Anomalous diffusive processes are widely discussed in the literature. They have also found many interesting practical applications [2,3,22,29,32,34,37,38,47,49]. Anomalous diffusive behavior is manifested by a nonlinear second moment of the given process. More precisely, if $\{X(t)\}$ is an anomalous diffusive process, then $\mathbb{E}X^2(t) \sim t^{\beta}$, where $\beta \neq 1$. When $\beta < 1$, the process is called subdiffusive, while for $\beta > 1$ it is called superdiffusive. In the case $\beta = 1$ we call the process diffusive. The parameter $\beta$ is called the anomalous diffusive exponent [5,7,20,27,41,46,48,50]. However, in real applications, when only a limited number of trajectories of a given process is available (or even only one trajectory), we cannot properly calculate the second moment of the process. It is therefore necessary to consider statistics that can also be useful in recognizing anomalous diffusive behavior. One of the classical examples, widely discussed in the literature, is the time-averaged mean square displacement (TAMSD) [21,43]. This statistic is especially well known in the physical sciences and has been applied to the analysis of different phenomena [19,29]. The theoretical properties of the TAMSD are also known [18,21,32,43]. Thus, one can describe anomalous diffusive behavior in the language of the TAMSD. As in the general definition of anomalous diffusive processes expressed by the second moment, one can consider a given process anomalous diffusive if the TAMSD behaves like a power function for large values of its argument. When the power exponent is smaller than one we call the process subdiffusive, while when it is larger than one we call it superdiffusive. Although the TAMSD seems to be the most widely used statistic for recognizing anomalous diffusive behavior, one can consider different time-averaged statistics that can be applied to the considered problem. We mention here the empirical autocovariance function [45], the square of the fluctuation function in the detrended fluctuation analysis [4,12,13,25,33,40,52] and the generalized variance in the detrended moving average analysis [1,11,42,51].
The mentioned time-averaged statistics are sensitive to changes of the parameters responsible for the anomalous diffusive behavior; thus, they are well suited to testing for anomalous diffusive behavior [43] and to estimating the anomalous diffusive parameter of the considered processes. In the literature, one can find different processes with anomalous diffusive behavior. The best known are the continuous-time random walk [20,26], obstructed diffusion [23,24], fractional Brownian motion (fBm) [28,35,46], fractional Lévy stable motion [17] and the fractional Langevin equation [8,27,28]. One can also find different modifications of the classical models, see e.g., [30,31]. In this study, we consider one of the most classical anomalous diffusive processes, namely the fBm. We demonstrate how the selected time-averaged statistics can be applied to the estimation of the Hurst exponent responsible for the anomalous diffusive behavior of the considered process. We compare the estimation results by using the Monte Carlo approach. The presented methodology is universal and can be applied to different anomalous diffusive processes.

The rest of the paper is organized as follows: in Sect. 2 we define the fBm and present its main properties. Moreover, we demonstrate how to simulate a sample trajectory of the process by using the Cholesky method. In Sect. 3 we describe the selected time-averaged statistics and demonstrate how to estimate the Hurst exponent of the fBm by applying the presented statistics. Next, in Sect. 4 we present a simulation study in order to compare the estimation results. The last section concludes the paper.
2 Fractional Brownian Motion as the Anomalous Diffusive Process
As was mentioned, in the literature one can find different stochastic processes which exhibit anomalous diffusive behavior. One of the classical examples is the fBm. In the following definition, we also present the main properties of the considered process.

Definition 1 [6,36]. Let $H$ be a constant such that $H \in (0, 1)$. The fBm $\{X^H(t)\}_{t \geq 0}$ with Hurst index $H$ is a continuous and centered Gaussian process with covariance function given by

$$\mathbb{E}\left[X^H(t)X^H(s)\right] = \frac{1}{2}\left(t^{2H} + s^{2H} - |t - s|^{2H}\right), \quad t, s \geq 0. \tag{1}$$

For $H = 1/2$, the fBm is a standard Brownian motion. One can show that the fBm $\{X^H(t)\}_{t \geq 0}$ has the following properties:

1. $X^H(0) = 0$ almost surely (a.s.), and $\mathbb{E}[X^H(t)] = 0$ for all $t \geq 0$.
2. $\{X^H(t)\}_{t \geq 0}$ has stationary increments, i.e., $X^H(t + s) - X^H(s)$ has the same distribution as $X^H(t)$ for $s, t \geq 0$.
3. $\{X^H(t)\}_{t \geq 0}$ is self-similar, meaning that $X^H(at)$ has the same distribution law as $a^H X^H(t)$ for all $t, a \geq 0$.
4. $\{X^H(t)\}_{t \geq 0}$ is a Gaussian process with the variance $\mathbb{E}[X^H(t)^2] = t^{2H}$, $t \geq 0$, for all $H \in (0, 1)$. Thus, for $H < 1/2$ this process is subdiffusive, while for $H > 1/2$ it exhibits superdiffusive behavior.
5. $\{X^H(t)\}_{t \geq 0}$ has continuous trajectories.

In this study, we use the Cholesky method to simulate the trajectory of the fractional Brownian motion [36]; therefore, we briefly recall the general idea of this algorithm. By using this method we simulate the increments of the fBm, namely the fractional Gaussian noise (fGn), which is defined as

$$Y^H(t) = X^H(t + \Delta) - X^H(t), \quad t \in \mathbb{R}_+. \tag{2}$$
Time takes values in the positive real numbers, but we discretize it and simulate the process at the time points $t_0, t_1, \ldots, t_{N-1}$ for $N \in \mathbb{N}$. It should be mentioned that, for every $t$, $Y^H(t)$ has a standard Gaussian distribution, and that $Y^H(t)$ and $Y^H(s)$ are not independent for $t \neq s$. The fGn is a stationary process. The autocovariance function of the fGn follows from the self-similarity property and is given by [14]

$$\gamma_{Y^H}(t) = \mathrm{cov}\left(Y^H(s), Y^H(s + t)\right) = \frac{1}{2}\left[|t - 1|^{2H} - 2|t|^{2H} + |t + 1|^{2H}\right], \tag{3}$$
for any $t, s \in \mathbb{N}$ and $\Delta = 1$. In the presented simulation method we use the Cholesky decomposition [16,36]. The main idea is to decompose the covariance matrix of the fGn into the product of a lower triangular matrix and its conjugate transpose, $\Gamma(n) = L(n)L(n)^{*}$, where

$$\Gamma(n) = \begin{pmatrix}
\gamma_{Y^H}(0) & \gamma_{Y^H}(1) & \gamma_{Y^H}(2) & \cdots & \gamma_{Y^H}(n) \\
\gamma_{Y^H}(1) & \gamma_{Y^H}(0) & \gamma_{Y^H}(1) & \cdots & \gamma_{Y^H}(n-1) \\
\gamma_{Y^H}(2) & \gamma_{Y^H}(1) & \gamma_{Y^H}(0) & \cdots & \gamma_{Y^H}(n-2) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma_{Y^H}(n) & \gamma_{Y^H}(n-1) & \gamma_{Y^H}(n-2) & \cdots & \gamma_{Y^H}(0)
\end{pmatrix}. \tag{4}$$
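If the covariance matrix $\Gamma(n)$ is positive-definite, then $L(n)$ has real entries and $\Gamma(n) = L(n)L(n)'$. Suppose that the matrix $\Gamma(n)$ can be represented as a product of the matrices $L(n)$ and $L(n)'$:

$$\Gamma(n) = L(n) \times L(n)' = \begin{pmatrix}
l_{00} & 0 & 0 & \cdots & 0 \\
l_{10} & l_{11} & 0 & \cdots & 0 \\
l_{20} & l_{21} & l_{22} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
l_{n0} & l_{n1} & l_{n2} & \cdots & l_{nn}
\end{pmatrix} \times \begin{pmatrix}
l_{00} & l_{10} & l_{20} & \cdots & l_{n0} \\
0 & l_{11} & l_{21} & \cdots & l_{n1} \\
0 & 0 & l_{22} & \cdots & l_{n2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & l_{nn}
\end{pmatrix}. \tag{5}$$

All entries are of the form $l_{ij}$. One can see that $l_{00}^2 = \gamma_{Y^H}(0)$, $l_{10}l_{00} = \gamma_{Y^H}(1)$ and $l_{10}^2 + l_{11}^2 = \gamma_{Y^H}(0)$. For $i \geq 1$, the entries of the lower triangular matrix can be determined by

$$l_{i0} = \frac{\gamma_{Y^H}(i)}{l_{00}}, \tag{6}$$

$$l_{ij} = \frac{1}{l_{jj}}\left(\gamma_{Y^H}(i - j) - \sum_{k=0}^{j-1} l_{ik}l_{jk}\right), \quad \text{for } 0 < j < i, \tag{7}$$

$$l_{ii}^2 = \gamma_{Y^H}(0) - \sum_{k=0}^{i-1} l_{ik}^2. \tag{8}$$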
Given independent, identically distributed (i.i.d.) standard Gaussian random variables $\{V(i)\}_{i=0,\ldots,N-1}$, the fGn sequence is generated by

$$y^H(n) = \sum_{k=0}^{n} l_{nk}V(k), \quad \text{for } n = 0, 1, \ldots, N-1. \tag{9}$$

To get the trajectory of the fBm for the times $t_0, t_1, \ldots, t_{N-1}$ we just calculate the cumulative sum of the sequence defined in (9). In Fig. 1 we present the example trajectories of the fractional Brownian motion with $H = 0.3$ and $H = 0.7$ obtained by using the Cholesky method.
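The Cholesky-based simulation described in this section can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names `fgn_autocovariance` and `simulate_fbm_cholesky` are hypothetical, the fGn autocovariance is taken in its standard form $\frac{1}{2}(|k-1|^{2H} - 2|k|^{2H} + |k+1|^{2H})$ with unit time step, and NumPy's built-in `numpy.linalg.cholesky` is assumed as a stand-in for the entry-by-entry recursion.

```python
import numpy as np


def fgn_autocovariance(lags, hurst):
    """Autocovariance of fGn with unit time step (standard form, an assumption here)."""
    k = np.abs(np.asarray(lags, dtype=float))
    p = 2.0 * hurst
    return 0.5 * (np.abs(k - 1.0) ** p - 2.0 * k ** p + (k + 1.0) ** p)


def simulate_fbm_cholesky(n_samples, hurst, rng):
    """Return (fgn, fbm) sample paths of length n_samples via the Cholesky method."""
    idx = np.arange(n_samples)
    gamma = fgn_autocovariance(idx, hurst)
    # Toeplitz covariance matrix: entry (i, j) equals gamma(|i - j|).
    cov = gamma[np.abs(idx[:, None] - idx[None, :])]
    lower = np.linalg.cholesky(cov)          # cov = lower @ lower.T
    noise = rng.standard_normal(n_samples)   # i.i.d. N(0, 1) variables V(k)
    fgn = lower @ noise                      # y(n) = sum_k l_nk V(k)
    fbm = np.cumsum(fgn)                     # cumulative sum of the increments
    return fgn, fbm


rng = np.random.default_rng(0)
fgn_path, fbm_path = simulate_fbm_cholesky(500, 0.7, rng)
```

The cost of the decomposition grows cubically with the trajectory length, so this sketch is only practical for moderately long trajectories such as the length 1000 used in the simulation study.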
Fig. 1. The example trajectories of the fBm with H = 0.3 and H = 0.7.

3 Time-Averaged Statistics in Application to the Estimation of the Anomalous Diffusive Exponent
In this section, we present four time-averaged statistics that can be useful in the problem of the anomalous diffusive exponent estimation. In our study, we demonstrate their behavior for the fractional Brownian motion; however, they can also be applied to most anomalous diffusive processes.

3.1 The Empirical Autocovariance Function Based Method
As the first time-averaged statistic we consider one of the most classical ones, namely the empirical autocovariance function (ACF). Recall that for the trajectory $X(0), X(1), \ldots, X(N-1)$ of a stationary process, the empirical ACF is the statistic defined as follows [45]

$$\hat{\gamma}(k) = \frac{1}{N}\sum_{n=0}^{N-k-1}\left(X(n + k) - \bar{X}\right)\left(X(n) - \bar{X}\right), \tag{10}$$

where $\bar{X} = \frac{1}{N}\sum_{i=0}^{N-1}X(i)$. If the considered process is the fGn, then its theoretical autocovariance function is given in Eq. (3). In the case $H \neq 1/2$ and $|k| \to +\infty$, the theoretical autocovariance function decreases hyperbolically, namely

$$\gamma_{Y^H}(k) \sim H(2H - 1)|k|^{2H-2}, \quad \text{as } |k| \to +\infty. \tag{11}$$
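As a small illustration of the empirical ACF, the sketch below computes it for a zero-indexed trajectory, so the sum over $n$ runs up to $N - k - 1$. The function name `empirical_acf` is hypothetical, and white Gaussian noise is used as the test signal on the assumption that it corresponds to the fGn with $H = 1/2$, for which all autocovariances at nonzero lags vanish.

```python
import numpy as np


def empirical_acf(x, max_lag):
    """Empirical autocovariance for lags 0..max_lag, normalised by N."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centered = x - x.mean()
    # For lag k, pair X(n + k) with X(n) for n = 0, ..., N - k - 1.
    return np.array([
        np.dot(centered[k:], centered[:n - k]) / n
        for k in range(max_lag + 1)
    ])


rng = np.random.default_rng(42)
white_noise = rng.standard_normal(10_000)   # fGn with H = 1/2 (assumed test signal)
acf_values = empirical_acf(white_noise, 5)
```

At lag zero the statistic equals the biased sample variance, and for this test signal the values at the remaining lags should be close to zero.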
For the fGn, which is a stationary process, one can consider the spectral density defined as a Fourier transform of the autocovariance function [15]

$$f(\lambda) = \sum_{k=0}^{N-1}\gamma_{Y^H}(k)\exp\left(-i\frac{2\pi}{N}\lambda k\right). \tag{12}$$

One can show that in the considered case we have the following [14]

$$f(\lambda) = 2c_{\lambda}(1 - \cos\lambda)\sum_{j \in \mathbb{Z}}|2\pi j + \lambda|^{-1-2H}, \quad \forall \lambda \in [0, 2\pi], \tag{13}$$

with the constant $c_{\lambda} = \frac{1}{2\pi}\sin(\pi H)\Gamma(2H + 1)$. Using all the above facts, one can present the log-periodogram method for estimating the parameter $H$ of the fGn. For the trajectory of the fGn $Y^H(0), Y^H(1), \ldots, Y^H(N-1)$ we use the fact that $f(\lambda) \sim c_f|\lambda|^{1-2H}$ for $\lambda$ close to zero. The periodogram is defined by
$$I_N(\lambda_k) = \frac{1}{2\pi N}\left|\sum_{t=0}^{N-1}Y^H(t)e^{-it\lambda_k}\right|^2, \quad \text{for } \lambda_k = \frac{2\pi k}{N}, \tag{14}$$

where $k \in [m_1, m_2]$, and $m_1$ and $m_2$ are values that we can choose subject to the condition $1 \leq m_1 < m_2 \leq \frac{N-1}{2}$. It should be mentioned that the periodogram is an asymptotically unbiased estimator of the spectral density [14]. One can show that

$$\mathbb{E}(I_N(\lambda)) \sim c_f|\lambda|^{1-2H}. \tag{15}$$

Applying the logarithm to the above equation we get

$$\log\mathbb{E}(I_N(\lambda)) \sim \log c_f + (1 - 2H)\log(|\lambda|). \tag{16}$$

In practical applications, we cannot directly apply formula (16) because the theoretical expected value is not given. Thus we use

$$\log(I_N(\lambda)) \sim \log c_f + (1 - 2H)\log(|\lambda|). \tag{17}$$

Applying the linear regression of $\log(I_N(\lambda))$ on $\log(\lambda)$ to the above equation, one can find the estimator $\hat{a}$ from the fitted linear function $\hat{a}\log(|\lambda|) + \hat{b}$. Using that, we get an estimator of the parameter $H$

$$\hat{H} = \frac{1}{2}(1 - \hat{a}). \tag{18}$$
In order to summarize the above-mentioned steps, below we present the step-by-step estimation procedure:
1. For the set of the realizations of the fGn, $y^H(0), y^H(1), \ldots, y^H(N-1)$, we calculate the periodogram $I_N(\lambda)$ according to formula (14).
2. We select the range of the frequencies $m_1$ and $m_2$. We choose $m_1 = 1$ and $m_2 = \frac{1}{4}N$.
3. We calculate the frequencies $\lambda$.
4. We fit the linear function using the least squares method according to formula (17).
5. We estimate $H$ from the fitted function $\hat{a}\log(\lambda) + \hat{b}$ as $\hat{H} = \frac{1}{2}(1 - \hat{a})$.
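The five steps above can be sketched as follows. This is an illustrative reading of the procedure rather than the authors' code: the function name `hurst_log_periodogram` is hypothetical, the periodogram at the frequencies $\lambda_k = 2\pi k/N$ is obtained from NumPy's FFT, and the example input is white Gaussian noise, assumed here to play the role of the fGn with $H = 1/2$.

```python
import numpy as np


def hurst_log_periodogram(fgn, m1=1, m2=None):
    """Estimate H from the slope of log-periodogram versus log-frequency."""
    y = np.asarray(fgn, dtype=float)
    n = y.size
    if m2 is None:
        m2 = n // 4                      # choice m2 = N/4 from step 2
    k = np.arange(m1, m2 + 1)
    freqs = 2.0 * np.pi * k / n          # lambda_k = 2 pi k / N
    # The FFT at index k equals sum_t y(t) exp(-i t lambda_k).
    dft = np.fft.fft(y)
    periodogram = np.abs(dft[k]) ** 2 / (2.0 * np.pi * n)
    # Least squares fit of log I_N(lambda) = b + a log(lambda).
    slope, _intercept = np.polyfit(np.log(freqs), np.log(periodogram), 1)
    return 0.5 * (1.0 - slope)


rng = np.random.default_rng(123)
white_noise = rng.standard_normal(4096)  # fGn with H = 1/2 (assumed test signal)
h_hat = hurst_log_periodogram(white_noise)
```

For this input the fitted slope should be close to zero, so `h_hat` should lie near 0.5; the estimate from a single trajectory is noisy, which is why the study averages over many trajectories.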
In order to demonstrate the usefulness of the method described above, we simulate 1000 trajectories of the fGn of length 1000 with $H = 0.3$ and $H = 0.7$ (the example trajectories are presented in Fig. 1) and calculate the mean of $\log(I_N(\lambda))$ over all trajectories. Then the regression of the log-periodogram on the log-frequencies is calculated. The linear function $(1 - 2\hat{H})\log(|\lambda|) + \hat{b}$ is fitted to the mean of the log-periodogram and the log-frequencies. The results are presented in Fig. 2. The parameter $\hat{H}$ is the mean of the estimated $H$ calculated for all trajectories.

Fig. 2. The comparison of the empirical (mean of the 1000 trajectories) and theoretical function $\log(I_N(\lambda))$ on $\log(|\lambda|)$ for the fGn with H = 0.3 and H = 0.7. The parameter $\hat{H}$ is the mean of the estimated H calculated for all trajectories.

As one can see in Fig. 2, the empirical and theoretical functions coincide. The estimated $\hat{H}$ parameter is close to the theoretical values.

3.2 Time-Averaged Mean Square Displacement Based Method
The TAMSD is one of the most popular tools for anomalous diffusive behavior recognition. For the trajectory of the fBm $X^H(0), X^H(1), \ldots, X^H(N-1)$ and $\tau = 1, 2, \ldots, \tau_{max}$ we define the TAMSD as [44]

$$TAMSD(\tau) = \frac{1}{N - \tau}\sum_{t=0}^{N-1-\tau}\left(X^H(t + \tau) - X^H(t)\right)^2. \tag{19}$$

It is known that for the fBm the following holds [10]

$$\mathbb{E}(TAMSD(\tau)) \sim \tau^{2H}. \tag{20}$$
As for the method based on the empirical autocovariance function, in practice we do not consider the theoretical expected value of the TAMSD statistic but the value of the statistic calculated for real data. Thus, using (20) one obtains

$$\log(TAMSD(\tau)) \sim 2H\log(\tau). \tag{21}$$
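The TAMSD statistic and the log-log regression implied by (21) can be sketched as below. The function names `tamsd` and `hurst_tamsd` are hypothetical, and the regression includes an intercept term, which is an assumption of this sketch. A random walk is used as the example input on the assumption that it represents the fBm with $H = 1/2$ sampled at integer times.

```python
import numpy as np


def tamsd(x, tau):
    """Time-averaged mean square displacement of trajectory x at lag tau."""
    x = np.asarray(x, dtype=float)
    increments = x[tau:] - x[:-tau]      # X(t + tau) - X(t), t = 0..N-1-tau
    return np.mean(increments ** 2)      # average over the N - tau terms


def hurst_tamsd(x, tau_max=10):
    """Estimate H as half the slope of log TAMSD(tau) versus log tau."""
    taus = np.arange(1, tau_max + 1)
    values = np.array([tamsd(x, tau) for tau in taus])
    slope, _intercept = np.polyfit(np.log(taus), np.log(values), 1)
    return 0.5 * slope


rng = np.random.default_rng(2024)
random_walk = np.cumsum(rng.standard_normal(5000))  # fBm with H = 1/2 (assumed)
h_hat = hurst_tamsd(random_walk)
```

Small lags are used because the TAMSD at lag $\tau$ averages $N - \tau$ strongly overlapping terms, and its variability increases with $\tau$.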
Finally, we fit the linear function to the left-hand side of (21) using the least squares method. As a final result, we obtain $\hat{H} = \frac{1}{2}\hat{a}$. To summarize the algorithm described above, the TAMSD-based estimation method proceeds as follows:
1. For the trajectory of the fBm $x^H(0), x^H(1), \ldots, x^H(N-1)$ we calculate $TAMSD(\tau)$ for $t = 0, 1, 2, \ldots, N-1$ and $\tau = 1, 2, \ldots, \tau_{max}$ according to formula (19). In this study, we assume $\tau_{max} = 10$.
2. The linear function is fitted using the least squares method to $\log(TAMSD(\tau))$ on $\log(\tau)$, see formula (21).
3. From the fitted linear function $\hat{a}\log(\tau) + \hat{b}$ we estimate the Hurst exponent as $\hat{H} = \frac{1}{2}\hat{a}$.

We simulate 1000 trajectories of the fBm of length 1000 with $H = 0.3$ and $H = 0.7$. For each trajectory we calculate $\log(TAMSD(\tau))$. Then the function $2\hat{H}\log(\tau) + \hat{b}$ is fitted according to (21) using the linear regression method. The parameter $\hat{H}$ is the mean of the estimated $H$ calculated for each trajectory. In Fig. 3 we present the mean of $\log(TAMSD(\tau))$ calculated for all trajectories and the mean of the fitted functions. One can see that the empirical and theoretical functions coincide and the parameter $H$ is correctly estimated.

3.3 Detrended Fluctuation Analysis Based Method
Detrended fluctuation analysis (DFA) is one of the most popular scaling methods for estimating power-law correlation exponents from random signals. The DFA procedure in the considered case is based on the trajectory of the fBm $X^H(0), X^H(1), \ldots, X^H(N-1)$. The first step in the DFA is to calculate the cumulative sum of the given trajectory

$$Z^H(t) = \sum_{i=0}^{t}X^H(i), \quad \text{where } t = 0, 1, \ldots, N-1. \tag{22}$$
Then the time axis $t = 0, 1, \ldots, N-1$ is divided into K segments of length n, $K = [N/n]$. In every segment v, where $v = 0, 1, \ldots, K-1$, we derive the squared error sum of the detrended process $f^2(v, n)$ as
\[ f^2(v, n) = \frac{1}{n}\sum_{t=1+d_v}^{n+d_v} \left[Z^H(t) - p_v(t)\right]^2, \tag{23} \]
Time-Averaged Statistics-Based Methods
Fig. 3. The comparison of the empirical (mean of the 1000 trajectories) and theoretical function $\log(\mathrm{TAMSD}(\tau))$ on $\log(\tau)$ for the fBm with H = 0.3 and H = 0.7. The parameter $\hat{H}$ is the mean of the estimated H calculated for all trajectories.
where $p_v(\cdot)$ is the deterministic function fitted in segment v. In our analysis, we take $p_v(\cdot)$ to be a polynomial of order 1, fitted in segment v by the ordinary least squares method. Moreover, $d_v = (v-1)n$. Finally, the square of the fluctuation function is calculated as the average over all squared error sums
\[ F^2(n) = \frac{1}{K}\sum_{v=0}^{K} f^2(v, n) = \frac{1}{[N/n]}\sum_{v=0}^{[N/n]} f^2(v, n). \tag{24} \]
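Equations (22)–(24) can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, the windows are taken as the K = [N/n] consecutive non-overlapping blocks of the cumulative sum with zero-based indexing (an assumption about how the index shift $d_v$ is meant), and any remainder of the trajectory beyond the last full window is ignored.

```python
import random


def linear_fit_residual_ss(ts, zs):
    """Sum of squared residuals of the least-squares line fitted to (ts, zs)."""
    m = len(ts)
    mt = sum(ts) / m
    mz = sum(zs) / m
    stt = sum((t - mt) ** 2 for t in ts)
    stz = sum((t - mt) * (z - mz) for t, z in zip(ts, zs))
    slope = stz / stt
    intercept = mz - slope * mt
    return sum((z - (intercept + slope * t)) ** 2 for t, z in zip(ts, zs))


def dfa_fluctuation_sq(x, n):
    """Squared DFA fluctuation function F^2(n) with linear detrending (n >= 2)."""
    # Eq. (22): cumulative sum of the trajectory.
    z = []
    total = 0.0
    for value in x:
        total += value
        z.append(total)
    k = len(z) // n  # number of complete windows, K = [N/n]
    f2 = []
    for v in range(k):
        ts = list(range(v * n, (v + 1) * n))
        zs = z[v * n:(v + 1) * n]
        # Eq. (23): mean squared residual around the line fitted in window v.
        f2.append(linear_fit_residual_ss(ts, zs) / n)
    # Eq. (24): average over the windows.
    return sum(f2) / k


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    for n in (4, 8, 16, 32):
        print(n, dfa_fluctuation_sq(noise, n))
```

Note that with a first-order polynomial the window size n = 2 always gives a perfect fit and hence a zero fluctuation, so in practice the regression of log F(n) on log n is only informative for larger windows.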
It should be highlighted that the square of the fluctuation function is a random quantity, and thus it is reasonable to consider its main characteristics, such as the expected value and the variance (if they exist). In [39] it is shown that for the fractional Brownian motion with the Hurst exponent H the following holds
\[ E(F(n)) \sim n^{H+1}, \tag{25} \]
for $n = 2, 3, \ldots, n_{\max}$. As in the previous cases, the theoretical expected value of the fluctuation function of DFA is not available in practice, thus we use the empirical value of $F(n)$. We get
\[ F(n) \sim n^{H+1}. \tag{26} \]
To summarize, the DFA-based estimation method proceeds as follows [9]:
1. For the trajectory of the fBm $x^H(0), x^H(1), \ldots, x^H(N-1)$ we calculate the vector of the cumulative sums $z^H(t) = \sum_{i=0}^{t} x^H(i)$, $t = 0, 1, \ldots, N-1$.
2. For each n the sequence $z^H(t)$ is segmented into $K = [N/n]$ windows of size n.
K. Maraj and A. Wylomańska
3. In each segment we fit the linear function $p_v(t)$ by the least squares method.
4. The squared error sum of the detrended process is calculated according to formula (23).
5. The mean-squared residuals $F(n)$ are found according to formula (24) for $n = 2, 3, \ldots, n_{\max}$. In our case we assume $n_{\max} = \frac{1}{4}N$.
6. A linear function is fitted by the least squares method to $\log(F(n))$ on $\log(n)$.
7. From the fitted linear function $\hat{a}\log(n) + \hat{b}$ the Hurst exponent is estimated as $\hat{H} = \hat{a} - 1$.
As for the previously presented methods, we simulate 1000 trajectories of the fBm of length 1000 with H = 0.3 and H = 0.7 and calculate the mean of $\log(F(n))$ over all trajectories. Then the linear function $(\hat{H} + 1)\log(n) + \hat{b}$ is fitted to this mean on $\log(n)$ using linear regression. The comparison is presented in Fig. 4. The parameter $\hat{H}$ is the mean of the estimated H calculated for all trajectories. As one can see, the Hurst parameters are correctly estimated: they are close to the theoretical values.

3.4 Detrended Moving Average Analysis Based Method
The last considered method of Hurst exponent estimation is the algorithm based on the Detrended Moving Average Analysis (DMA). This algorithm is widely known in the literature [1,11] and is also of practical importance because it is simple and fast. The estimation procedure is similar to the DFA-based method. The statistic used in the DMA algorithm is called the generalized variance $\sigma^2_{DMA}(n)$. We consider the trajectory of the fBm $X^H(0), X^H(1), \ldots, X^H(N-1)$. The analyzed statistic is defined as
\[ \sigma^2_{DMA}(n) = \frac{1}{N - n_{\max}} \sum_{i=n_{\max}}^{N} \left[X^H(i) - \tilde{X}^H_n(i)\right]^2, \tag{27} \]
where
\[ \tilde{X}^H_n(i) = \frac{1}{n}\sum_{k=0}^{n-1} X^H(i-k), \tag{28} \]
where $n = 2, 3, \ldots, n_{\max}$ and $n_{\max}$ is equal to some percentage of the length of the trajectory. We chose to set $n_{\max} = \frac{1}{4}N$, where N is the length of the trajectory. For processes with anomalous diffusive behavior, such as the fractional Brownian motion, we can see a power-law dependence of $\sigma^2_{DMA}(n)$ on the moving average window size n [11]. In our case we have
\[ E(\sigma^2_{DMA}(n)) \sim n^{H}. \tag{29} \]
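The statistic defined in (27)–(28) can be computed directly from a trajectory. The following sketch is illustrative and makes one explicit assumption: since the trajectory is indexed 0, ..., N − 1, the sum in (27) is taken over i = n_max, ..., N − 1, which gives exactly N − n_max terms and matches the normalising factor. The function name `dma_variance` is hypothetical.

```python
def dma_variance(x, n, n_max):
    """Generalized variance sigma^2_DMA(n) following Eqs. (27)-(28).

    Assumes 2 <= n <= n_max < len(x) and sums over i = n_max, ..., N - 1.
    """
    big_n = len(x)
    total = 0.0
    for i in range(n_max, big_n):
        # Eq. (28): backward moving average over the last n observations.
        moving_avg = sum(x[i - k] for k in range(n)) / n
        total += (x[i] - moving_avg) ** 2
    # Eq. (27): normalise by the number of summed terms, N - n_max.
    return total / (big_n - n_max)


if __name__ == "__main__":
    ramp = [float(i) for i in range(100)]
    # For a linear ramp the backward average lags by (n - 1) / 2.
    print(dma_variance(ramp, 5, 25))
```

For the linear ramp x(i) = i the moving average equals i − (n − 1)/2, so the statistic reduces to ((n − 1)/2)², which gives a simple deterministic check of an implementation.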
Fig. 4. The comparison of the empirical (mean of 1000 trajectories) and theoretical function $\log(F(n))$ on $\log(n)$ for the fBm with H = 0.3 and H = 0.7. The parameter $\hat{H}$ is the mean of the estimated H calculated for all trajectories.
In practical applications, the theoretical expected value of the considered statistic is not available and we use the empirical value of $\sigma^2_{DMA}(n)$ instead. In the real data analysis, we consider the natural logarithm of $\sigma^2_{DMA}(\cdot)$ and, using (29), we estimate the H parameter by the least squares method, based on the following relation
\[ \log(\sigma^2_{DMA}(n)) \sim H \log(n), \tag{30} \]
for $n = 2, 3, \ldots, n_{\max}$. Now, using the linear regression method, we can find the linear function $\hat{a}\log(n) + \hat{b}$, where $\hat{a} = \hat{H}$. Using (30) we fit the linear function of $\log(n)$ to $\log(\sigma^2_{DMA}(n))$ and thus we estimate the Hurst exponent. To summarize the above description, the DMA-based algorithm proceeds as follows:
1. For the set of realisations of the fBm $x^H(0), x^H(1), \ldots, x^H(N-1)$ we first calculate the moving averages $\tilde{x}^H_n(i) = \frac{1}{n}\sum_{k=0}^{n-1} x^H(i-k)$ for each window size $n = 2, 3, \ldots, n_{\max}$.
2. We calculate the statistic $\sigma^2_{DMA}(n)$ over the time interval $[n_{\max}, N_{\max}]$ according to formula (27).
3. Using formula (30) we calculate the estimator of the Hurst exponent.
As in the previously considered cases, we simulate 1000 trajectories of the fBm of length 1000 for H = 0.3 and H = 0.7. Then, the mean of $\log(\sigma^2_{DMA}(n))$ over all trajectories is calculated. After that, the linear function $\hat{H}\log(n) + \hat{b}$ is fitted to this mean on $\log(n)$. The parameter $\hat{H}$ is the mean of the estimated H calculated for all trajectories. In Fig. 5 one can see that the theoretical and empirical functions coincide and the parameter $\hat{H}$ is correctly estimated.
Fig. 5. The comparison of the empirical (mean of 1000 trajectories) and theoretical function $\log(\sigma_{DMA}(n))$ on $\log(n)$ for the fBm with H = 0.3 and H = 0.7. The parameter $\hat{H}$ is the mean of the estimated H calculated for all trajectories.
4 Simulation Study
To check the effectiveness of the estimation methods described in the previous section, in this part we compare their results for simulated data. We simulate 1000 trajectories of the fBm. To check how the Hurst exponent influences the estimation results, we consider nine values of the anomalous diffusive exponent, H = 0.1, 0.2, ..., 0.9. For different trajectory lengths, we estimate the H parameter by the four described methods based on the time-averaged statistics. We consider the trajectory lengths n = 100, 150, ..., 1000. The results are presented in Figs. 6, 7 and 8. As one can see in Fig. 6, which shows the mean of the estimated values of the H parameter, for small values of H (H < 0.5, the subdiffusive case) the method based on TAMSD seems to be superior to the other considered algorithms. In the superdiffusive case, namely for H > 0.5, two algorithms give the best results: the DFA-based method and the TAMSD-based technique. As expected, the longer the trajectories, the better the results. However, for the ACF-based method, even for longer trajectories the empirical means of the estimators are far from their theoretical values. If we take into consideration the variance of the estimators (Fig. 7), we observe that for small values of the anomalous diffusive exponent (H < 0.5, the subdiffusive case) the DMA-based and TAMSD-based methods give the best results: the variances of the estimators are closer to zero than for the other considered methods. In the superdiffusive case (H > 0.5) the ACF-based algorithm and the TAMSD-based method seem to be superior to the other methods. Again, the longer the trajectories, the better the results.
Fig. 6. The influence of the sample length on the mean of estimated values of the anomalous diffusive exponent for all considered methods. Here we consider the trajectories of the fBm with different values of the H parameter.
Fig. 7. The influence of the sample length on the variance of estimated values of the anomalous diffusive exponent for all considered methods. Here we consider the trajectories of the fBm with different values of H parameter.
In this paper, we also check the computational time of the considered estimation methods (Fig. 8). We present the results only for two cases, namely H = 0.3 and H = 0.7; for the other values of the Hurst exponent, the results are very similar. We can see that the fastest methods are the TAMSD-based and ACF-based algorithms, while the slowest is the DFA-based method.
Fig. 8. The influence of the sample length on the computational time for time-averaged statistics-estimation methods for the Hurst exponent of the fBm. We consider two exemplary values of the H parameter, namely H = 0.3 and H = 0.7.
5 Conclusions
In this chapter, we consider time-averaged statistics as tools for estimating the anomalous diffusive exponent of anomalous diffusive processes. We consider four time-averaged statistics which are based on the sample trajectory of the given process. We give the definition and a detailed description of the considered estimation methods for the classical anomalous diffusive process, namely the fractional Brownian motion. We demonstrate that all methods give acceptable results. However, when we compare the estimators, one can select the best algorithms. When we consider the mean value of the estimated anomalous diffusive exponent, the TAMSD-based method seems to be superior in the subdiffusive case, while in the superdiffusive case we can indicate two best algorithms, namely the TAMSD-based and DFA-based ones. When we analyze the variance of the estimators, the TAMSD-based and DMA-based algorithms are superior to the other methods in the subdiffusive case. In the superdiffusive case we can select two best algorithms, namely the TAMSD-based and ACF-based ones. In this paper, we have also considered the computational time of the algorithms. The worst results were obtained for the DFA-based algorithm, while the computational times of the other methods are comparable. Although we demonstrate the results for the fBm, the presented methodology is universal and can be applied to any other anomalous diffusive process.

Acknowledgements. We would like to acknowledge the support of the National Center of Science Opus Grant No. 2016/21/B/ST1/00929 “Anomalous diffusion processes and their applications in real data modeling”.
References
1. Alessio, E., Carbone, A., Castelli, G., Frappietro, V.: Second-order moving average and scaling of stochastic time series. Phys. Condens. Matter 27, 197–200 (2002)
2. Andreanov, A., Grebenkov, D.S.: Time-averaged MSD of Brownian motion. J. Stat. Mech. Theor. Exp. 2012(07), P07001 (2012)
3. Arcizet, D., Meier, B., Sackmann, E., Rädler, J.O., Heinrich, D.: Temporal analysis of active and passive transport in living cells. Phys. Rev. Lett. 101, 248103 (2008)
4. Bashan, A., Bartsch, R., Kantelhardt, J.W., Havlin, S.: Comparison of detrending methods for fluctuation analysis. Physica A 387, 5080–5090 (2008)
5. Bertseva, E., Grebenkov, D., Schmidhauser, P., Gribkova, S., Jeney, S., Forró, L.: Optical trapping microrheology in cultured human cells. Eur. Phys. J. E Soft Matter 35, 63 (2012)
6. Biagini, F., Hu, Y., Øksendal, B., Zhang, T.: Stochastic Calculus for Fractional Brownian Motion and Applications. Springer, London (2008). https://doi.org/10.1007/978-1-84628-797-8
7. Bressloff, P.C., Newby, J.M.: Stochastic models of intracellular transport. Rev. Mod. Phys. 85, 135–196 (2013)
8. Bronstein, I., et al.: Transient anomalous diffusion of telomeres in the nucleus of mammalian cells. Phys. Rev. Lett. 103, 018102 (2009)
9. Bryce, R., Sprague, K.B.: Revisiting detrended fluctuation analysis. Sci. Rep. 2, 315 (2012). https://doi.org/10.1038/srep00315
10. Burnecki, K., Kepten, E., Garini, Y., Sikora, G.: Estimating the anomalous diffusion exponent for single particle tracking data with measurement errors - an alternative approach. Sci. Rep. 5, 11306 (2015)
11. Carbone, A., Kiyono, K.: Detrending moving average algorithm: Frequency response and scaling performances. Phys. Rev. E 93, 063309 (2016)
12. Chen, Z., Hu, K., Carpena, P., Bernaola-Galvan, P., Stanley, H.E., Ivanov, P.C.: Effect of nonlinear filters on detrended fluctuation analysis. Phys. Rev. E 71, 011104 (2005)
13. Chen, Z., Ivanov, P.C., Hu, K., Stanley, H.E.: Effect of nonstationarities on detrended fluctuation analysis. Phys. Rev. E 65, 041107 (2002)
14. Coeurjolly, J.: Simulation and identification of the fractional Brownian motion: a bibliographical and comparative study. J. Stat. Softw. 05, 1–53 (2000)
15. Cohen, L.: Generalization of the Wiener-Khinchin theorem. IEEE Sig. Process. Lett. 5, 292–294 (1998)
16. Dieker, T.: Simulation of fractional Brownian motion. M.Sc. thesis, Vrije Universiteit Amsterdam (2013)
17. Fanelli, D., McKane, A.J.: Diffusion in a crowded environment. Phys. Rev. E 82, 021113 (2010)
18. Gajda, J., Wylomańska, A., Kantz, H., Chechkin, A., Sikora, G.: Large deviations of time-averaged statistics for Gaussian processes. Statist. Probab. Lett. 143, 47–55 (2018)
19. Gal, N., Lechtman-Goldstein, D., Weihs, D.: Particle tracking in living cells: a review of the mean square displacement method and beyond. Rheol. Acta 5, 425–443 (2013)
20. Golding, I., Cox, E.C.: Physical nature of bacterial cytoplasm. Phys. Rev. Lett. 96, 098102 (2006)
21. Grebenkov, D.S.: Probability distribution of the time-averaged mean-square displacement of a Gaussian process. Phys. Rev. E 84, 031124 (2011)
22. Grebenkov, D.S.: Optimal and suboptimal quadratic forms for noncentered Gaussian processes. Phys. Rev. E 88, 032140 (2013)
23. Guigas, G., Kalla, C., Weiss, M.: Probing the nanoscale viscoelasticity of intracellular fluids in living cells. Biophys. J. 93, 316 (2007)
24. Hellmann, M., Klafter, J., Heermann, D.W., Weiss, M.: Challenges in determining anomalous diffusion in crowded fluids. J. Phys. Condens. Matter 23(23), 234113 (2011)
25. Hu, K., Ivanov, P.C., Chen, Z., Carpena, P., Stanley, H.E.: Effect of trends on detrended fluctuation analysis. Phys. Rev. E 64, 011114 (2001)
26. Jeon, J.-H., Metzler, R.: Fractional Brownian motion and motion governed by the fractional Langevin equation in confined geometries. Phys. Rev. E 81, 021103 (2010)
27. Jeon, J.H., et al.: In vivo anomalous diffusion and weak ergodicity breaking of lipid granules. Phys. Rev. Lett. 106, 048103 (2011)
28. Kepten, E., Bronshtein, I., Garini, Y.: Ergodicity convergence test suggests telomere motion obeys fractional dynamics. Phys. Rev. E 83, 041919 (2011)
29. Kepten, E., Weron, A., Sikora, G., Burnecki, K., Garini, Y.: Guidelines for the fitting of anomalous diffusion mean square displacement graphs from single particle tracking experiments. PLOS ONE 10(2), 1–10 (2015)
30. Kumar, A., Gajda, J., Wylomańska, A.: Fractional Brownian motion delayed by tempered and inverse tempered stable subordinators. Methodol. Comput. Appl. Probab. 21, 185–202 (2019)
31. Kumar, A., Wylomańska, A., Poloczański, R., Sundar, S.: Fractional Brownian motion time-changed by gamma and inverse gamma process. Phys. A 468, 648–667 (2017)
32. Lanoiselée, Y., Sikora, G., Grzesiek, A., Grebenkov, D.S., Wylomańska, A.: Optimal parameters for anomalous-diffusion-exponent estimation from noisy data. Phys. Rev. E 98, 062139 (2018)
33. Ma, Q.D.Y., Bartsch, R.P., Bernaola-Galván, P., Yoneyama, M., Ivanov, P.C.: Effect of extreme data loss on long-range correlated and anticorrelated signals quantified by detrended fluctuation analysis. Phys. Rev. E 81, 031101 (2010)
34. Magdziarz, M., Weron, A.: Anomalous diffusion: testing ergodicity breaking in experimental data. Phys. Rev. E: Stat., Nonlin, Soft Matter Phys. 84, 051138 (2011)
35. Mandelbrot, B., van Ness, J.: Fractional Brownian motions, fractional noises and applications. SIAM Rev. 10, 422–437 (1968)
36. Masaaki, K., Chun, M.T.: Fractional Brownian motions in financial models and their Monte Carlo simulation. In: IntechOpen (2014)
37. Meroz, Y., Sokolov, I.M., Klafter, J.: Test for determining a subdiffusive model in ergodic systems from single trajectories. Phys. Rev. Lett. 110, 090601 (2013)
38. Metzler, R., et al.: Analysis of single particle trajectories: from normal to anomalous diffusion. Acta Physica Polonica Ser. B B40(5), 1315–1330 (2009)
39. Movahed, M.S., Jafari, G.R., Ghasemi, F., Rahvar, S., Tabar, M.R.R.: Multifractal detrended fluctuation analysis of sunspot time series. J. Stat. Mech. Theor. Exp. 2006(02), P02003–P02003 (2006)
40. Peng, C.-K., Buldyrev, S.V., Havlin, S., Simons, M., Stanley, H.E., Goldberger, A.L.: Mosaic organization of DNA nucleotides. Phys. Rev. E 49, 1685–1689 (1994)
41. Sackmann, E., Keber, F., Heinrich, D.: Physics of cellular movements. Ann. Rev. Condens. Matter Phys. 1(1), 257–276 (2010)
42. Sikora, G.: Statistical test for fractional Brownian motion based on detrending moving average algorithm. Chaos, Solitons Fractals 116, 54–62 (2018)
43. Sikora, G., Burnecki, K., Wylomańska, A.: Mean-squared-displacement statistical test for fractional Brownian motion. Phys. Rev. E 95, 032110 (2017)
44. Sikora, G., Teuerle, M., Wylomańska, A., Grebenkov, D.: Statistical properties of the anomalous scaling exponent estimator based on time-averaged mean-square displacement. Phys. Rev. E 96, 022132 (2017)
45. Subba Rao, T.: Time Series Analysis: Methods and Applications. Elsevier (2012)
46. Szymanski, J., Weiss, M.: Elucidating the origin of anomalous diffusion in crowded fluids. Phys. Rev. Lett. 103, 038102 (2009)
47. Tejedor, V., et al.: Quantitative analysis of single particle trajectories: mean maximal excursion method. Biophys. J. 98, 1364–1372 (2010)
48. Tolić-Nørrelykke, I.M., Munteanu, E.-L., Thon, G., Oddershede, L., Berg-Sørensen, K.: Anomalous diffusion in living yeast cells. Phys. Rev. Lett. 93, 078102 (2004)
49. Türkcan, S., Masson, J.-B.: Bayesian decision tree for the classification of the mode of motion in single-molecule trajectories. PLoS ONE 8(12), 1–14 (2013)
50. Wilhelm, C.: Out-of-equilibrium microrheology inside living cells. Phys. Rev. Lett. 101, 028101 (2008)
51. Xi, C., Zhang, S., Xiong, G., Zhao, H.: A comparative study of two-dimensional multifractal detrended fluctuation analysis and two-dimensional multifractal detrended moving average algorithm to estimate the multifractal spectrum. Phys. A 454, 34–50 (2016)
52. Xu, L., Ivanov, P.C., Hu, K., Chen, Z., Carbone, A., Stanley, H.E.: Quantifying signals with power-law correlations: a comparative study of detrended fluctuation analysis and detrended moving average techniques. Phys. Rev. E 71, 051101 (2005)
First-Order Integer Valued AR Processes with Zero-Inflated Innovations

Aldo M. Garay¹(✉), Francyelle L. Medina¹, Isaac Jales C. S.², and Patrice Bertail³

¹ Department of Statistics, Federal University of Pernambuco, Recife, Brazil, {agaray,francy}@de.ufpe.br
² Department of Mathematics and Statistics, State University of Rio Grande do Norte, Mossoró, Brazil, [email protected]
³ Université Paris Nanterre, MODAL'X, Paris, France, [email protected]

Abstract. To deal with time series processes with an excess of zeros, we extend the INAR(1) process by considering that the innovations follow different zero-inflated models, called the ZI-INAR(1) model. We present some of its theoretical properties, develop an efficient EM algorithm for parameter estimation and propose several bootstrap techniques to construct confidence intervals for the parameters. Finally, we present the relevance and applicability of the proposed ZI-INAR(1) model through simulation studies and an application to a real dataset.

Keywords: Count time series · ZINAR(1) process · Bootstrap · EM algorithm

1 Introduction
Recently the modeling and analysis of count time series have been carried out using the integer-valued autoregressive (INAR) process, introduced by [3,22,29]. In this process, the innovations are assumed to follow a Poisson distribution, but in practice the innovations may be overdispersed. A frequent manifestation of this is an incidence of zero counts greater than expected. In the context of non-negative integer-valued time series with excess zeros, [15] proposed the ZINAR(1) process, which assumes that the innovations of the process follow a zero-inflated Poisson distribution. The authors developed some structural properties of the process and estimated the unknown parameters by conditional or approximate full maximum likelihood. To analyze discrete time series with an excess of zeros, in this manuscript we extend the ZINAR(1) model by considering that the innovations follow a class of zero-inflated (ZI) models, called the ZI-INAR(1) model. Some properties, such as the mean, variance and joint distribution, are developed. We propose an EM algorithm to estimate the unknown parameters by maximizing the conditional likelihood function. For this purpose, we start by introducing some definitions, notations and properties of the first-order integer-valued autoregressive process INAR(1) and the zero-inflated (ZI) models.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 19–40, 2022. https://doi.org/10.1007/978-3-030-82110-4_2

Definition 1. Let X be a non-negative integer-valued random variable (r.v.) and $\alpha \in [0, 1]$. Then, the thinning operator ‘$\circ$’ [29] is defined as:
\[ \alpha \circ X = \sum_{i=1}^{X} Z_i, \tag{1} \]
where $\{Z_i\}_{i \geq 1}$ is a sequence of independent and identically distributed (iid) Bernoulli random variables, independent of X, with $P(Z_i = 1) = \alpha$. Thus, considering Definition 1, the INAR(1) process has the following stochastic structure:
\[ Y_t = \alpha \circ Y_{t-1} + V_t, \quad t \in \mathbb{Z}, \tag{2} \]
where $\{V_t\}_{t \in \mathbb{Z}}$ is a sequence of iid non-negative integer-valued random variables, called innovations, with $E(V_t) = \mu < \infty$, $Var(V_t) = \sigma^2 < \infty$ and independent of $Y_{t-1}$, for all t. It is important to note that given $Y_{t-1} = y_{t-1} > 0$, the r.v. $\alpha \circ Y_{t-1}$ follows a binomial distribution with parameters $y_{t-1}$ and $\alpha$, and given $Y_{t-1} = 0$, the r.v. $\alpha \circ Y_{t-1}$ is degenerate at zero. As discussed by [3,9], if $\alpha \in (0, 1)$, then $\{Y_t\}_{t \in \mathbb{Z}}$ is a stationary process, whereas $\alpha = 0$ and $\alpha = 1$ imply, respectively, independence and non-stationarity. Also, if $0 \leq \alpha < 1$, the INAR(1) process is second-order stationary.

Several zero-inflated models have been proposed in the literature to accommodate simultaneously both overdispersion and excess zeros: (i) the zero-inflated Poisson (ZIP) model [19], in which the zero counts can come from two sources: from the Poisson distribution (sampling zeros) or from the Bernoulli distribution (structural zeros); (ii) the zero-inflated negative binomial distribution (ZINB) [10,12,28], which is obtained by mixing a Bernoulli distribution with a baseline negative binomial (NB) distribution over the ZIP model, among others. In the following, we define the zero-inflated distributions through their hierarchical formulation and introduce some further properties and three particular cases.

Definition 2. The discrete r.v. V follows a zero-inflated (ZI) distribution, with parameters $\rho$ and $\lambda$, if it has the following stochastic representation: $V = BU$, $B \perp U$, where
– B is a Bernoulli r.v., with $P(B = 1) = 1 - \rho$ and $0 \leq \rho < 1$;
– U is a non-negative discrete r.v., with probability mass function (pmf) $h_U(u|\lambda)$;
– $\lambda$ can be a scalar or vector parameter, indexing the distribution of U;
– $B \perp U$ indicates that the r.v. B and U are independent.
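Definitions 1 and 2 together with the recursion (2) translate directly into a simulation routine. The sketch below is illustrative only: it uses a Poisson baseline for U (the ZIP case mentioned above), a simple multiplication-based Poisson sampler, and hypothetical function names and parameter values chosen for the example.

```python
import math
import random


def thin(alpha, x, rng):
    """Binomial thinning: alpha o x is a sum of x iid Bernoulli(alpha) variables."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def zi_draw(rho, draw_u, rng):
    """Draw V = B * U with P(B = 1) = 1 - rho and U produced by draw_u."""
    b = 0 if rng.random() < rho else 1
    return b * draw_u(rng)


def simulate_inar1(n, alpha, draw_v, rng, y0=0):
    """Simulate Y_t = alpha o Y_{t-1} + V_t for t = 1, ..., n, starting at y0."""
    path, y = [], y0
    for _ in range(n):
        y = thin(alpha, y, rng) + draw_v(rng)
        path.append(y)
    return path


if __name__ == "__main__":
    rng = random.Random(42)
    # Illustrative parameter values: alpha = 0.5, rho = 0.3, Poisson mean 2.
    draw_zip = lambda r: zi_draw(0.3, lambda q: poisson(2.0, q), r)
    print(simulate_inar1(20, 0.5, draw_zip, rng))
```

Passing the innovation sampler as a function keeps the recursion independent of the baseline distribution, so other choices of U only require a different `draw_u`.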
As a consequence of Definition 2, we can obtain the pmf of V, given by:
\[ P(V = v) = \begin{cases} \rho + (1 - \rho)\, h_U(0|\lambda), & v = 0, \\ (1 - \rho)\, h_U(v|\lambda), & v \geq 1, \end{cases} \tag{3} \]
where $h_U(v|\lambda) = P(U = v)$. We denote $V \sim ZI(\rho, \lambda; h_U(\cdot))$. Moreover,
\[ E[V] = (1 - \rho)E[U] \quad \text{and} \quad Var[V] = (1 - \rho)\left\{Var[U] + \rho E^2[U]\right\}. \tag{4} \]
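The pmf (3) and the moments (4) can be cross-checked numerically. The sketch below does this for a Poisson baseline, chosen purely as an illustration; the function names and the parameter values ρ = 0.3 and λ = 2 are assumptions made for the example.

```python
import math


def poisson_pmf(u, lam):
    """pmf of a Poisson(lam) variable, used here as the baseline h_U."""
    return math.exp(-lam) * lam ** u / math.factorial(u)


def zi_pmf(v, rho, base_pmf):
    """pmf of V = B * U from Eq. (3) for a baseline pmf base_pmf."""
    if v == 0:
        return rho + (1.0 - rho) * base_pmf(0)
    return (1.0 - rho) * base_pmf(v)


def zi_moments(rho, mean_u, var_u):
    """Mean and variance of V from Eq. (4)."""
    mean_v = (1.0 - rho) * mean_u
    var_v = (1.0 - rho) * (var_u + rho * mean_u ** 2)
    return mean_v, var_v


if __name__ == "__main__":
    rho, lam = 0.3, 2.0
    probs = [zi_pmf(v, rho, lambda u: poisson_pmf(u, lam)) for v in range(60)]
    mean = sum(v * p for v, p in enumerate(probs))
    var = sum(v * v * p for v, p in enumerate(probs)) - mean ** 2
    # The numerical moments should agree with the closed forms in Eq. (4).
    print(mean, var, zi_moments(rho, lam, lam))
```

Truncating the support at 60 is harmless here because the Poisson tail beyond that point is negligible for λ = 2.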
The distribution of the r.v. U determines the form of the ZI distribution. Thus, we briefly describe three particular cases of the flexible ZI models:
– The zero-inflated Poisson (ZIP) model: In this case, we assume that the r.v. U follows a Poisson distribution with mean $\lambda$. Thus, the pmf of the r.v. V, defined in Eq. (3), takes the form:
\[ P(V = v) = \begin{cases} \rho + (1 - \rho)\, e^{-\lambda}, & v = 0, \\ (1 - \rho)\, \dfrac{e^{-\lambda}\lambda^{v}}{v!}, & v \geq 1. \end{cases} \tag{5} \]
We use the notation $V \sim ZIP(\rho, \lambda)$, so that: $E[V] = (1 - \rho)\lambda$ and $Var[V] = (1 - \rho)\lambda(1 + \rho\lambda)$.
– The zero-inflated negative binomial (ZINB) model: The ZINB distribution is the result of considering that the r.v. U follows a negative binomial distribution, as in Eq. (3). We denote it by $V \sim ZINB(\rho, \mu, \phi)$ and its pmf is given by:
\[ P(V = v) = \begin{cases} \rho + (1 - \rho)\left(\dfrac{\phi}{\mu + \phi}\right)^{\phi}, & v = 0, \\ (1 - \rho)\, \dfrac{\Gamma(\phi + v)}{\Gamma(v + 1)\Gamma(\phi)} \left(\dfrac{\mu}{\mu + \phi}\right)^{v} \left(\dfrac{\phi}{\mu + \phi}\right)^{\phi}, & v \geq 1, \end{cases} \]
where $\mu \geq 0$ and $\phi > 0$ is the dispersion parameter. $\Gamma(\cdot)$ represents the gamma function. We have that:
\[ E[V] = (1 - \rho)\mu \quad \text{and} \quad Var[V] = (1 - \rho)\,\mu\left(1 + \frac{\mu}{\phi} + \rho\mu\right). \]
When $\rho = 0$, the r.v. V follows a negative binomial distribution with mean $\mu$ and dispersion parameter $\phi$, denoted by $V \sim NB(\mu, \phi)$. For more details, see [10–12,21,28].
– The zero-inflated Poisson inverse Gaussian (ZIPIG) model: This case arises when we consider that U, given in Eq. (3), follows a Poisson inverse Gaussian (PIG) distribution. The pmf of the r.v. V, denoted by $V \sim ZIPIG(\rho, \lambda, \phi)$, is:
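\[ P(V = v) = \begin{cases} \rho + (1 - \rho)\, e^{\phi - \sqrt{\phi(\phi + 2\lambda)}}, & v = 0, \\ (1 - \rho) \sqrt{\dfrac{2}{\pi}}\, \left[\phi(\phi + 2\lambda)\right]^{-\frac{(v - 1/2)}{2}} \dfrac{e^{\phi}(\lambda\phi)^{v}}{v!}\, K_{v - 1/2}\!\left(\sqrt{\phi(\phi + 2\lambda)}\right), & v \geq 1, \end{cases} \]
where $\mu > 0$ is the mean, $\phi$ is a dispersion parameter and $K_{\lambda}(t) = \frac{1}{2}\int_{0}^{\infty} u^{\lambda - 1} e^{-\frac{t}{2}\left(u + \frac{1}{u}\right)}\, du$ is the modified Bessel function of the third kind [1]. When $\rho = 0$, the r.v. V follows a PIG distribution, with mean $\lambda$ and dispersion parameter $\phi$, denoted by $V \sim PIG(\mu, \phi)$, and we have that: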
\[ E[V] = (1 - \rho)\mu \quad \text{and} \quad Var[V] = (1 - \rho)\,\mu\left(1 + \frac{\mu}{\phi} + \rho\mu\right). \]
The ZIPIG distribution is a particular case of the mixed Poisson (MP) distribution, with hierarchical representation $V | Z = z \sim \mathrm{Poisson}(\mu z)$, where Z follows an Inverse Gaussian (IG) distribution with mean 1 and dispersion parameter $\phi$, denoted by $Z \sim IG(1, \phi)$. More details are presented and discussed by [4,16].

This manuscript is organized as follows. Section 2 outlines the proposed ZI-INAR(1) model and discusses some mathematical properties, including the likelihood function. Section 3 presents the implementation of an EM-type algorithm to estimate the parameters of the model and proposes several bootstrap methods to construct confidence intervals for the parameters. The suitability and applicability of the process are illustrated in Sects. 4 and 5, through simulation studies and the analysis of a real dataset, respectively. Finally, Sect. 6 concludes with short remarks and some possible avenues for future research.
2 The ZI-INAR(1) Process

The ZI-INAR(1) process is an integer-valued first-order autoregressive process with ZI innovations, as presented in Eq. (2), given by:
\[ Y_t = \alpha \circ Y_{t-1} + V_t, \quad t \in \mathbb{Z}, \tag{6} \]
where $V_t \sim ZI(\rho, \lambda; h_U(\cdot))$.

2.1 Mathematical Properties
In this subsection, we present some mathematical and structural properties of the corresponding marginal distributions of the process:

Proposition 1. Let $\{Y_t\}_{t \in \mathbb{Z}}$ be a stationary ZI-INAR(1) process; then:
\[ E[Y_t] = \frac{(1 - \rho)E[U_t]}{1 - \alpha} \quad \text{and} \quad Var[Y_t] = \frac{(1 - \rho)\left\{\alpha E[U_t] + \rho E[U_t]^2 + Var[U_t]\right\}}{1 - \alpha^2}, \]
where $U_t$ denotes a r.v. with density function (or pmf) $h_U(u|\lambda)$, for all $t \in \mathbb{Z}$.

Proof. By using the expectation obtained in Eq. (4), we have that:
\[ E[Y_t | Y_{t-1}] = E[\alpha \circ Y_{t-1} + V_t | Y_{t-1}] = E[\alpha \circ Y_{t-1} | Y_{t-1}] + E[V_t | Y_{t-1}] = \alpha Y_{t-1} + (1 - \rho)E[U_t]. \tag{7} \]
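Thus,
\[ E[Y_t] = E[E(Y_t | Y_{t-1})] = E[\alpha Y_{t-1} + (1 - \rho)E(U_t)] = \alpha E[Y_{t-1}] + (1 - \rho)E[U_t]. \tag{8} \]
Since the process is stationary, that is, $E[Y_t] = \mu_Y$ for all $t \in \mathbb{Z}$, solving Eq. (8) for $\mu_Y$ gives the stated mean. To obtain the variance of the process, we use:
\[ Var[Y_t | Y_{t-1} = y_{t-1}] = Var[\alpha \circ Y_{t-1} + V_t | Y_{t-1} = y_{t-1}] = Var[\alpha \circ Y_{t-1} | Y_{t-1} = y_{t-1}] + Var[V_t | Y_{t-1} = y_{t-1}] = \alpha(1 - \alpha)y_{t-1} + (1 - \rho)\left\{Var[U_t] + \rho E^2[U_t]\right\}. \tag{9} \]
Thus, by (7) and (9) we obtain:
\[ \begin{aligned} Var[Y_t] &= Var[E(Y_t | Y_{t-1})] + E[Var(Y_t | Y_{t-1})] \\ &= \alpha^2 Var[Y_{t-1}] + \alpha(1 - \alpha)E[Y_{t-1}] + (1 - \rho)\left(Var[U_t] + \rho E^2[U_t]\right) \\ &= \alpha^2 Var[Y_{t-1}] + \alpha(1 - \rho)E[U_t] + (1 - \rho)\left(Var[U_t] + \rho E^2[U_t]\right), \end{aligned} \tag{10} \]
where the last equality in (10) is obtained from the expectation given in Eq. (8). Under stationarity, $Var[Y_t] = Var[Y_{t-1}]$, and solving (10) for $Var[Y_t]$ yields the stated variance.

2.2 The Likelihood Function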
Let $y = (y_1, y_2, \ldots, y_n)$ be realizations of the ZI-INAR(1) process. Then the likelihood function of the unknown parameters $\theta = (\alpha, \rho, \lambda)$, given y, can be written as:
\[ \begin{aligned} L(\theta | y) &= P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n) \\ &= P(Y_1 = y_1)\, P(Y_2 = y_2 | Y_1 = y_1) \cdots P(Y_n = y_n | Y_1 = y_1, \ldots, Y_{n-1} = y_{n-1}) \\ &= P(Y_1 = y_1) \prod_{t=2}^{n} P(Y_t = y_t | Y_{t-1} = y_{t-1}), \end{aligned} \tag{11} \]
where
\[ P(Y_t = y_t | Y_{t-1} = y_{t-1}) = \sum_{k=0}^{\min\{y_{t-1}, y_t\}} \binom{y_{t-1}}{k} \alpha^{k} (1 - \alpha)^{y_{t-1} - k} \times \left[\rho\, I_{\{0\}}(y_t - k) + (1 - \rho)\, h(y_t - k | \lambda)\right]. \tag{12} \]
$I_A(\cdot)$ denotes the indicator function, i.e., $I_A(y) = 1$ if $y \in A$ and $I_A(y) = 0$ otherwise. Note that Eq. (12) represents the transition probability of a stationary Markov chain from state $y_{t-1}$ to state $y_t$. Thus, the marginal probability function can be defined by:
\[ P(Y_t = y_t) = \sum_{y_{t-1} = 0}^{\infty} P(Y_{t-1} = y_{t-1})\, P(Y_t = y_t | Y_{t-1} = y_{t-1}). \tag{13} \]
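From Eqs. (11)–(13), we have that:
\[ L(\theta | y) = P(Y_1 = y_1) \prod_{t=2}^{n} \sum_{k=0}^{\min\{y_{t-1}, y_t\}} \binom{y_{t-1}}{k} \alpha^{k} (1 - \alpha)^{y_{t-1} - k} \times \left[\rho\, I_{\{0\}}(y_t - k) + (1 - \rho)\, h(y_t - k | \lambda)\right]. \tag{14} \]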
The marginal distribution is intractable, so a simple way to deal with this is to condition on the observed $Y_1$ and estimate the parameters by conditional maximum likelihood (CML); see [3,15]. Maximizing this conditional likelihood function directly for the ZI-INAR(1) process does not lead to analytical solutions. One alternative is to maximize the complete conditional likelihood using the expectation-maximization (EM) algorithm [6], which is stable and straightforward to implement, since the iterations converge monotonically and no second derivatives are required. In the next section, we discuss a technique to find the ML estimates of the parameter vector $\theta$, based on the EM algorithm.
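The transition probability (12) and the resulting conditional log-likelihood (the logarithm of (14) without the factor $P(Y_1 = y_1)$) can be evaluated as in the following sketch. It is an illustration under assumptions: the baseline pmf `h` is passed as a function of the count only (with λ fixed inside it), a Poisson baseline is used in the example, and the function names and the toy data are hypothetical.

```python
import math


def poisson_pmf(u, lam):
    """pmf of a Poisson(lam) variable, used here as an example baseline."""
    return math.exp(-lam) * lam ** u / math.factorial(u)


def transition_prob(y_t, y_prev, alpha, rho, h):
    """P(Y_t = y_t | Y_{t-1} = y_prev) from Eq. (12); h is the baseline pmf."""
    total = 0.0
    for k in range(min(y_prev, y_t) + 1):
        # Binomial probability that k of the y_prev units survive the thinning.
        thinning = math.comb(y_prev, k) * alpha ** k * (1.0 - alpha) ** (y_prev - k)
        innovation = (1.0 - rho) * h(y_t - k)
        if y_t - k == 0:
            innovation += rho  # structural zero, the indicator term in Eq. (12)
        total += thinning * innovation
    return total


def conditional_loglik(y, alpha, rho, h):
    """Log of the product over t = 2, ..., n in Eq. (14), conditioning on y[0]."""
    return sum(
        math.log(transition_prob(y[t], y[t - 1], alpha, rho, h))
        for t in range(1, len(y))
    )


if __name__ == "__main__":
    counts = [1, 0, 2, 0, 0, 3, 1, 0]  # toy data, for illustration only
    h = lambda u: poisson_pmf(u, 1.5)
    print(conditional_loglik(counts, 0.4, 0.2, h))
```

A function of this kind could be handed to a generic numerical optimizer, which is the direct maximization that the EM approach discussed next is meant to avoid.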
3 Maximum Likelihood Estimation and Bootstrap Resampling Methods

3.1 Parameter Estimation via the EM Algorithm
In this section we develop an EM-type algorithm for maximum likelihood estimation of the parameters of the ZI-INAR(1) process. The key to the development of our EM-type algorithm is to consider the presence of latent variables and treat the problem as if these variables were in fact observed. As suggested by [12,25], we define the latent variables $W = (W_1, \ldots, W_n)$ and $S = (S_1, \ldots, S_n)$ for all $t \geq 1$, where:
– $S_t$ is defined by $S_t = \alpha \circ Y_{t-1}$. Thus $S_t | Y_{t-1} = y_{t-1}, \alpha \sim \mathrm{Bin}(y_{t-1}; \alpha)$ if $y_{t-1} > 0$, and it has a degenerate distribution at zero if $y_{t-1} = 0$, where $\mathrm{Bin}(y; \alpha)$ represents the binomial distribution with parameters y and $\alpha \in [0, 1]$.
– $W_t$ is a latent dichotomous variable, that is:
\[ W_t = \begin{cases} 1 & \text{if } V_t \text{ is from the zero state}, \\ 0 & \text{if } V_t \sim h(\cdot | \lambda), \end{cases} \]
with $P(W_t = 1) = \rho$ and $h(\cdot | \lambda)$ as presented in Definition 2.
Let $Y_c = (Y, W, S)$ be the complete data vector, where Y and $\{W, S\}$ represent the observed data and the missing data, respectively, with $Y = (Y_1, \ldots, Y_n)$. Then the joint probability function for $Y_{ct} = (Y_t, S_t, W_t)$ is given by:
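\[ \begin{aligned} P(Y_{ct} = y_{ct}) &= P(Y_t = y_t, S_t = s_t, W_t = w_t | Y_{t-1} = y_{t-1}) \\ &= P(Y_t = y_t, W_t = w_t | S_t = s_t, Y_{t-1} = y_{t-1})\, P(S_t = s_t | Y_{t-1} = y_{t-1}) \\ &= \rho^{w_t} \left((1 - \rho)\, h(y_t - s_t | \lambda)\right)^{1 - w_t} \times \binom{y_{t-1}}{s_t} \alpha^{s_t} (1 - \alpha)^{y_{t-1} - s_t}, \quad \text{for all } t \geq 1. \end{aligned} \]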
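Thus, the complete likelihood function is defined by:
\[ \begin{aligned} L_c(\theta | y_c) &= P(Y_1 = y_1) \prod_{t=2}^{n} P(Y_t = y_t, S_t = s_t, W_t = w_t | Y_{t-1} = y_{t-1}) \\ &\propto \prod_{t=2}^{n} \rho^{w_t} \left((1 - \rho)\, h(y_t - s_t | \lambda)\right)^{1 - w_t} \binom{y_{t-1}}{s_t} \alpha^{s_t} (1 - \alpha)^{y_{t-1} - s_t}. \end{aligned} \]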
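Hence, the complete log-likelihood function is given by:
\[
\begin{aligned}
\ell_c(\theta \mid y_c) = \log L_c(\theta \mid y_c) \propto{} & \sum_{t=2}^{n} w_t \log(\rho) + \sum_{t=2}^{n} (1-w_t) \log(1-\rho) + \sum_{t=2}^{n} (1-w_t) \log h(y_t - s_t \mid \lambda) \\
& + \sum_{t=2}^{n} s_t \log(\alpha) + \sum_{t=2}^{n} (y_{t-1} - s_t) \log(1-\alpha).
\end{aligned}
\]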
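As an illustration of the complete log-likelihood above, the following Python sketch evaluates it for the special case in which $h(\cdot \mid \lambda)$ is the Poisson pmf. The function name and the list-based data layout are assumptions made for this example only; as in the formula, the binomial coefficient is dropped because it does not depend on θ.

```python
import math

def complete_loglik_zip(y, s, w, alpha, rho, lam):
    """Complete-data log-likelihood (up to terms free of theta) of a
    ZI-INAR(1) model with Poisson h(.|lam).

    y: observed counts y_1..y_n; s and w: latent S_t and W_t aligned with y.
    Index 0 of s and w is ignored because the sum starts at t = 2.
    """
    total = 0.0
    for t in range(1, len(y)):
        v = y[t] - s[t]  # innovation implied by the thinning outcome
        total += w[t] * math.log(rho) + (1 - w[t]) * math.log(1 - rho)
        if w[t] == 0:
            # (1 - w_t) * log h(y_t - s_t | lam) with h the Poisson pmf
            total += -lam + v * math.log(lam) - math.lgamma(v + 1)
        total += s[t] * math.log(alpha) + (y[t - 1] - s[t]) * math.log(1 - alpha)
    return total
```

In the EM algorithm the latent $s_t$ and $w_t$ are of course not available, which is why they are replaced by conditional expectations.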
The EM algorithm has several appealing properties relative to other iterative algorithms such as Newton-Raphson and Fisher's scoring method for finding MLEs [23]. One of them is that the sequence of estimates from the EM algorithm increases the likelihood function $\ell(\theta \mid y)$ at each iteration, and under standard regularity conditions the sequence converges to a stationary point of the likelihood. The EM algorithm proceeds in two steps:
– E-step: Let $\hat{\theta}^{(k)}$ be the current k-th step estimate of θ. By using the property of conditional expectation, we compute the function $Q(\theta \mid \hat{\theta}^{(k)})$ given by:
\[
Q(\theta \mid \hat{\theta}^{(k)}) = E\big[\ell_c(\theta \mid y_c) \mid y, \hat{\theta}^{(k)}\big]. \tag{15}
\]
– M-step: Maximize $Q(\theta \mid \hat{\theta}^{(k)})$ with respect to θ, obtaining $\hat{\theta}^{(k+1)}$.
Observe that the expression of the Q-function in Eq. (15) is determined by the following expectations:
\[
\hat{s}_t^{(k)} = E\big[S_t \mid y, \hat{\theta}^{(k)}\big], \qquad \hat{w}_t^{(k)} = E\big[W_t \mid y, \hat{\theta}^{(k)}\big]
\]
and
\[
Q_t^{*}(\lambda \mid \hat{\theta}^{(k)}) = E\big[(1 - W_t) \log h(y_t - S_t \mid \lambda) \mid y, \hat{\theta}^{(k)}\big].
\]
A. M. Garay et al.
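Thus, the Q-function can be written in a simpler form as:
\[
\begin{aligned}
Q(\theta \mid \hat{\theta}^{(k)}) \propto{} & \sum_{t=2}^{n} \hat{w}_t^{(k)} \log(\rho) + \sum_{t=2}^{n} \big(1 - \hat{w}_t^{(k)}\big) \log(1-\rho) + \sum_{t=2}^{n} Q_t^{*}(\lambda \mid \hat{\theta}^{(k)}) \\
& + \sum_{t=2}^{n} \hat{s}_t^{(k)} \log(\alpha) + \sum_{t=2}^{n} \big(y_{t-1} - \hat{s}_t^{(k)}\big) \log(1-\alpha).
\end{aligned}
\]
At each step, $\hat{s}_t^{(k)}$ and $\hat{w}_t^{(k)}$ can be obtained by using the following results:
\[
\begin{aligned}
P\big(S_t = s_t \mid Y_t = y_t, Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)
&= \frac{P\big(S_t = s_t, Y_t = y_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)}{P\big(Y_t = y_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)} \\
&= \frac{P\big(S_t = s_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)\, P\big(Y_t = y_t \mid S_t = s_t, Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)}{\sum_{s_t=0}^{\min\{y_{t-1}, y_t\}} P\big(Y_t = y_t, S_t = s_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)} \\
&= \frac{\binom{y_{t-1}}{s_t} (\hat{\alpha}^{(k)})^{s_t} (1 - \hat{\alpha}^{(k)})^{y_{t-1}-s_t} \big[\hat{\rho}^{(k)} I_{\{0\}}(y_t - s_t) + (1 - \hat{\rho}^{(k)})\, h(y_t - s_t \mid \hat{\lambda}^{(k)})\big]}{\sum_{s_t=0}^{\min\{y_{t-1}, y_t\}} P\big(Y_t = y_t, S_t = s_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)}
\end{aligned} \tag{16}
\]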
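and
\[
\begin{aligned}
P\big(W_t = w_t \mid Y_t = y_t, Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)
&= \frac{P\big(W_t = w_t, Y_t = y_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)}{P\big(Y_t = y_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)} \\
&= \frac{P\big(W_t = w_t, Y_t = y_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)}{\sum_{s_t=0}^{\min\{y_{t-1}, y_t\}} P\big(Y_t = y_t, S_t = s_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)}.
\end{aligned}
\]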
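Then, for all $t \ge 1$, we have that:
\[
\hat{s}_t^{(k)} = \sum_{s_t=0}^{\min\{y_{t-1}, y_t\}} s_t\, P\big(S_t = s_t \mid Y_t = y_t, Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big), \tag{17}
\]
\[
\hat{w}_t^{(k)} = 1 \times P\big(W_t = 1 \mid y, \hat{\theta}^{(k)}\big) = \frac{P\big(W_t = 1, Y_t = y_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)}{\sum_{s_t=0}^{\min\{y_{t-1}, y_t\}} P\big(Y_t = y_t, S_t = s_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)}. \tag{18}
\]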
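The quantities in Eqs. (17)–(18) can be sketched in Python for a single transition $y_{t-1} \to y_t$. This is a minimal illustration that assumes Poisson $h(\cdot \mid \lambda)$; the function names are hypothetical. It uses the fact that $W_t = 1$ forces $V_t = 0$, so that $P(W_t = 1, Y_t = y_t \mid Y_{t-1} = y_{t-1}) = \rho\, P(S_t = y_t \mid Y_{t-1} = y_{t-1})$.

```python
import math

def poisson_pmf(v, lam):
    """Poisson pmf, used here as an example of h(.|lam)."""
    return math.exp(-lam + v * math.log(lam) - math.lgamma(v + 1))

def e_step_one(y_prev, y_t, alpha, rho, lam):
    """Return (s_hat, w_hat) for one transition y_prev -> y_t."""
    joint = []  # joint[s] = P(Y_t = y_t, S_t = s | Y_{t-1} = y_prev)
    for s in range(min(y_prev, y_t) + 1):
        thin = math.comb(y_prev, s) * alpha**s * (1 - alpha)**(y_prev - s)
        innov = rho * (1.0 if y_t == s else 0.0) \
            + (1 - rho) * poisson_pmf(y_t - s, lam)
        joint.append(thin * innov)
    denom = sum(joint)  # P(Y_t = y_t | Y_{t-1} = y_prev)
    s_hat = sum(s * p for s, p in enumerate(joint)) / denom
    # W_t = 1 requires S_t = y_t, which is only possible if y_t <= y_prev
    if y_t <= y_prev:
        p_w1 = rho * math.comb(y_prev, y_t) * alpha**y_t \
            * (1 - alpha)**(y_prev - y_t)
    else:
        p_w1 = 0.0
    return s_hat, p_w1 / denom
```

Applying `e_step_one` to every pair $(y_{t-1}, y_t)$, $t = 2, \ldots, n$, gives the vectors needed in the M-step.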
Therefore, our EM algorithm for the ZI-INAR(1) process can be summarized in the following way:
– E-step: Given $\hat{\theta}^{(k)}$, for $t \ge 1$ we compute $\hat{s}_t^{(k)}$ and $\hat{w}_t^{(k)}$, as given in Eqs. (17)–(18), respectively, and $Q_t^{*}(\lambda \mid \hat{\theta}^{(k)})$.
– M-step: Update $\hat{\theta}^{(k)}$ by maximizing $Q(\theta \mid \hat{\theta}^{(k)})$ over θ, which leads to the following expressions:
\[
\hat{\alpha}^{(k+1)} = \frac{\sum_{t=2}^{n} \hat{s}_t^{(k)}}{\sum_{t=2}^{n} y_{t-1}}, \qquad
\hat{\rho}^{(k+1)} = \frac{\sum_{t=2}^{n} \hat{w}_t^{(k)}}{n-1}, \qquad
\hat{\lambda}^{(k+1)} = \arg\max_{\lambda} \sum_{t=2}^{n} Q_t^{*}(\lambda \mid \hat{\theta}^{(k)}). \tag{19}
\]
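In the following, we develop the procedure to obtain the expressions for $Q_t^{*}(\lambda \mid \hat{\theta}^{(k)})$ and $\hat{\lambda}^{(k+1)}$, considering the three particular cases of the ZI models seen before, i.e., when $V \sim \mathrm{ZI}(\rho, \lambda)$. For this, we define the expectation $\widehat{b_t s_t}^{(k)}$ with $B_t = 1 - W_t$, given by:
\[
\widehat{b_t s_t}^{(k)} = E\big[B_t S_t \mid y, \hat{\theta}^{(k)}\big]
= \frac{\sum_{s_t=0}^{\min\{y_{t-1}, y_t\}} s_t \big(1 - \hat{\rho}^{(k)}\big)\, h\big(y_t - s_t \mid \hat{\lambda}^{(k)}\big) \binom{y_{t-1}}{s_t} (\hat{\alpha}^{(k)})^{s_t} (1 - \hat{\alpha}^{(k)})^{y_{t-1}-s_t}}{P\big(Y_t = y_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)}.
\]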
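Thus,
– If $V_t \sim \mathrm{ZIP}(\rho, \lambda)$, then from Eq. (5) $h(\cdot \mid \lambda)$ represents the pmf of the Poisson distribution with parameter λ. Consequently,
\[
Q_t^{*}(\lambda \mid \hat{\theta}^{(k)}) \propto -\lambda \big(1 - \hat{w}_t^{(k)}\big) + \log(\lambda) \big(1 - \hat{w}_t^{(k)}\big) y_t - \log(\lambda)\, \widehat{b_t s_t}^{(k)} \tag{20}
\]
and from Eqs. (19) and (20):
\[
\hat{\lambda}^{(k+1)} = \frac{\sum_{t=2}^{n} \big(1 - \hat{w}_t^{(k)}\big) y_t - \sum_{t=2}^{n} \widehat{b_t s_t}^{(k)}}{\sum_{t=2}^{n} \big(1 - \hat{w}_t^{(k)}\big)}.
\]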
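For the ZIP case, the M-step updates of Eq. (19) together with the closed-form update of λ can be sketched as follows. The function name and the convention that index 0 of the latent-expectation lists is unused are assumptions of this example; the expectations themselves are taken as given inputs.

```python
def m_step_zip(y, s_hat, w_hat, bs_hat):
    """One M-step for the ZIP-INAR(1) model.

    y: observed series y_1..y_n (list of ints).
    s_hat, w_hat, bs_hat: E-step expectations aligned with y; the entry at
    index 0 is ignored because all sums run over t = 2, ..., n.
    Returns the updated (alpha, rho, lam).
    """
    n = len(y)
    idx = range(1, n)
    alpha = sum(s_hat[t] for t in idx) / sum(y[t - 1] for t in idx)
    rho = sum(w_hat[t] for t in idx) / (n - 1)
    numerator = sum((1 - w_hat[t]) * y[t] - bs_hat[t] for t in idx)
    lam = numerator / sum(1 - w_hat[t] for t in idx)
    return alpha, rho, lam
```

The update of λ is the expected total of the non-degenerate innovations divided by the expected number of non-degenerate innovations.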
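– If $V_t \sim \mathrm{ZINB}(\rho, \mu, \phi)$, then $h(\cdot \mid \lambda)$ represents the pmf of the negative binomial distribution with parameters $\lambda = (\mu, \phi)$. Thus,
\[
\begin{aligned}
Q_t^{*}(\lambda \mid \hat{\theta}^{(k)}) \propto{} & \hat{g}_t^{(k)}(\phi) + \big[\log(\mu) - \log(\mu + \phi)\big] \Big[\big(1 - \hat{w}_t^{(k)}\big) y_t - \widehat{b_t s_t}^{(k)}\Big] \\
& + \big(1 - \hat{w}_t^{(k)}\big) \big[-\log \Gamma(\phi) + \phi \big(\log(\phi) - \log(\mu + \phi)\big)\big]
\end{aligned} \tag{21}
\]
with
\[
\hat{g}_t^{(k)}(\phi) = E\big[B_t \log \Gamma(y_t - S_t + \phi) \mid y, \hat{\theta}^{(k)}\big]
= \frac{\big(1 - \hat{\rho}^{(k)}\big) \sum_{s_t=0}^{\min\{y_{t-1}, y_t\}} \log \Gamma(y_t - s_t + \phi)\, P\big(S_t = s_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)\, h\big(y_t - s_t \mid \hat{\lambda}^{(k)}\big)}{P\big(Y_t = y_t \mid Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)}. \tag{22}
\]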
When the M-step turns out to be analytically intractable, it can be replaced by a sequence of conditional maximization (CM) steps. The procedure is known as the ECM algorithm [24]. Thus, from Eqs. (19), (21) and (22), we have that for the ZINB-INAR(1) process, $\hat{\lambda}^{(k+1)} = (\hat{\mu}^{(k+1)}, \hat{\phi}^{(k+1)})$ are given by:
\[
\hat{\mu}^{(k+1)} = \frac{\sum_{t=2}^{n} \big(1 - \hat{w}_t^{(k)}\big) y_t - \sum_{t=2}^{n} \widehat{b_t s_t}^{(k)}}{\sum_{t=2}^{n} \big(1 - \hat{w}_t^{(k)}\big)}, \qquad
\hat{\phi}^{(k+1)} = \arg\max_{\phi} \sum_{t=2}^{n} Q_t^{*}\big(\hat{\mu}^{(k+1)}, \phi \mid \hat{\theta}^{(k)}\big).
\]
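$\hat{\phi}^{(k+1)}$ is obtained using the "optim" routine in the R software [27].
– If $V_t \sim \mathrm{ZIPIG}(\rho, \mu, \phi)$, then $\lambda = (\mu, \phi)$ and its hierarchical representation is:
\[
V_t \mid Z_t = z_t, W_t = 0 \sim \mathrm{Poisson}(\mu z_t), \qquad V_t \mid Z_t = z_t, W_t = 1 \text{ follows a degenerate distribution at zero}; \tag{23}
\]
\[
Z_t \mid W_t = 0 \sim \mathrm{IG}(1, \phi), \tag{24}
\]
\[
W_t \sim \mathrm{Bin}(1, \rho), \qquad t = 2, 3, \ldots, \tag{25}
\]
and the density function of $Z_t \mid W_t = w_t$ is:
\[
f(z_t \mid w_t) = \left[ \frac{\phi^{1/2} z_t^{-3/2}}{\sqrt{2\pi}} \exp\left( -\frac{\phi (z_t - 1)^2}{2 z_t} \right) \right]^{1 - w_t}. \tag{26}
\]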
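In order to develop the EM algorithm for the ZIPIG-INAR model, we add the latent variable $Z_t$. Thus, the complete data are defined by $Y_c = (Y, W, S, Z)$, and
\[
\begin{aligned}
Q_t^{*}(\mu, \phi \mid \hat{\theta}^{(k)}) \propto{} & -\mu\, \widehat{b_t z_t}^{(k)} - \log(\mu)\, \widehat{b_t s_t}^{(k)} + \log(\mu) \big(1 - \hat{w}_t^{(k)}\big) y_t \\
& + \frac{\log(\phi)}{2} \big(1 - \hat{w}_t^{(k)}\big) + \phi \big(1 - \hat{w}_t^{(k)}\big) - \frac{\phi}{2} \Big( \widehat{b_t z_t}^{(k)} + \widehat{b_t / z_t}^{(k)} \Big),
\end{aligned} \tag{27}
\]
where the expectations $\widehat{b_t z_t}^{(k)}$ and $\widehat{b_t / z_t}^{(k)}$ are defined by:
\[
\begin{aligned}
\widehat{b_t z_t}^{(k)} &= E\big[ E\big( B_t Z_t \mid S_t, y, \hat{\theta}^{(k)} \big) \big] \\
&= \sum_{s_t=0}^{\min\{y_{t-1}, y_t\}} \frac{\big(1 - \hat{\rho}^{(k)}\big) (y_t - s_t + 1)\, h\big(y_t - s_t + 1 \mid \hat{\lambda}^{(k)}\big)}{\hat{\mu}^{(k)} \big[ \hat{\rho}^{(k)} I_{\{y_t = s_t\}} + \big(1 - \hat{\rho}^{(k)}\big) h\big(y_t - s_t \mid \hat{\lambda}^{(k)}\big) \big]}\, P\big(S_t = s_t \mid Y_t = y_t, Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)
\end{aligned} \tag{28}
\]
and
\[
\begin{aligned}
\widehat{b_t / z_t}^{(k)} &= E\big[ E\big( B_t Z_t^{-1} \mid S_t, y, \hat{\theta}^{(k)} \big) \big] \\
&= \sum_{s_t=0}^{\min\{y_{t-1}, y_t\}} \Bigg[ \frac{\hat{\mu}^{(k)} h\big(y_t - s_t - 1 \mid \hat{\lambda}^{(k)}\big)}{(y_t - s_t)\, h\big(y_t - s_t \mid \hat{\lambda}^{(k)}\big)} I_{\{y_t > s_t\}} \\
&\qquad + \frac{\Big( \sqrt{\hat{\phi}^{(k)} \big(\hat{\phi}^{(k)} + 2 \hat{\mu}^{(k)}\big)} + 1 \Big) \big(1 - \hat{\rho}^{(k)}\big) h\big(0 \mid \hat{\lambda}^{(k)}\big)}{\hat{\phi}^{(k)} \big[ \hat{\rho}^{(k)} + \big(1 - \hat{\rho}^{(k)}\big) h\big(0 \mid \hat{\lambda}^{(k)}\big) \big]} I_{\{y_t = s_t\}} \Bigg] P\big(S_t = s_t \mid Y_t = y_t, Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big),
\end{aligned} \tag{29}
\]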
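where $P\big(S_t = s_t \mid Y_t = y_t, Y_{t-1} = y_{t-1}, \hat{\theta}^{(k)}\big)$ is given in Eq. (16). Thus, from Eqs. (19) and (27), we have that:
\[
\hat{\mu}^{(k+1)} = \frac{\sum_{t=2}^{n} \big(1 - \hat{w}_t^{(k)}\big) y_t - \sum_{t=2}^{n} \widehat{b_t s_t}^{(k)}}{\sum_{t=2}^{n} \widehat{b_t z_t}^{(k)}}, \qquad
\hat{\phi}^{(k+1)} = \frac{\sum_{t=2}^{n} \big(1 - \hat{w}_t^{(k)}\big)}{\sum_{t=2}^{n} \widehat{b_t z_t}^{(k)} + \sum_{t=2}^{n} \widehat{b_t / z_t}^{(k)} - 2 \sum_{t=2}^{n} \big(1 - \hat{w}_t^{(k)}\big)}.
\]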
This process is iterated until some convergence rule is satisfied. Here, we use the Aitken acceleration-based stopping criterion [23] as a convergence rule. This criterion is based on the fact that the limit of the sequence $\ell^{(k+1)} = \ell(\hat{\theta}^{(k+1)} \mid y)$, denoted by $\ell_{\infty}^{(k+1)}$, can be approximated by $\ell_{\infty}^{(k+1)} = \ell^{(k)} + (\ell^{(k+1)} - \ell^{(k)}) / (1 - c^{(k)})$, where $c^{(k)} = (\ell^{(k+1)} - \ell^{(k)}) / (\ell^{(k)} - \ell^{(k-1)})$. As suggested by [31], we decided to stop the algorithm when $\big| \ell_{\infty}^{(k+1)} - \ell^{(k+1)} \big| < \varepsilon = 10^{-5}$.
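The Aitken-based stopping rule can be sketched in Python as below. The function name is hypothetical, and the handling of the two degenerate cases (a zero denominator and $c^{(k)} = 1$) is a choice made for this example, not something specified in the text.

```python
def aitken_converged(ll_prev2, ll_prev, ll_curr, eps=1e-5):
    """Aitken acceleration-based stopping rule.

    ll_prev2, ll_prev, ll_curr are the log-likelihood values at
    iterations k-1, k and k+1, respectively.
    """
    denom = ll_prev - ll_prev2
    if denom == 0.0:
        return True   # assumption: no change between iterations counts as converged
    c = (ll_curr - ll_prev) / denom
    if c == 1.0:
        return False  # assumption: the extrapolated limit is undefined, keep iterating
    ll_inf = ll_prev + (ll_curr - ll_prev) / (1.0 - c)
    return abs(ll_inf - ll_curr) < eps
```

For a log-likelihood sequence that converges geometrically, the extrapolated value coincides with the true limit, so the rule stops as soon as the current value is within ε of that limit.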
3.2 Bootstrap Resampling Methods
In this section, we propose several bootstrap alternatives to construct confidence intervals for the parameter. Actually, we can indifferently use the posterior mean of the Bayesian approach or the standard maximum likelihood estimators (mle), or any other efficient estimators. In any case, the standard error can be difficult to compute, so several bootstrap approaches can be used to solve this issue. Here we propose two types of approaches, based respectively on some strong parametric assumptions, on the mixing properties of the INAR(p) process and on the Markov property of this process.
(a) Parametric bootstrap approach: Let $\theta = (\alpha, \rho, \lambda)$ be the parameters of the ZI-INAR(1) process, where we denote by $\hat{\theta}_n = (\hat{\alpha}_n, \hat{\rho}_n, \hat{\lambda}_n)$ the mle obtained using the EM algorithm developed in Sect. 3.1. The properties of the mle for general INAR(p) models are studied, for instance, in [9], who showed strong consistency of the mle, and [20], who studied the efficiency and asymptotic normality in Proposition 6.1. Other estimators, based on a saddle point approximation of the likelihood, which may be easier to implement, were proposed in [26]. In the following, we denote by $\hat{\theta}_n$ any kind of asymptotically efficient estimator of the parameter θ. The parametric bootstrap method simply consists of generating new data in the model with estimated parameters. That is:
Step 1: Generate the random variables $V_t^{*}$ from the ZI distribution with parameters $\hat{\theta}_n$.
Step 2: Generate a ZI-INAR(1) model recursively as follows:
\[
Y_1^{*} = y_1, \qquad Y_t^{*} = \hat{\alpha}_n \circ Y_{t-1}^{*} + V_t^{*}, \quad t = 2, \ldots, n,
\]
meaning that if $Y_{t-1}^{*} > 0$, then $\hat{\alpha}_n \circ Y_{t-1}^{*} \mid Y_{t-1}^{*} \overset{d}{=} \sum_{i=1}^{Y_{t-1}^{*}} Z_i^{*}$ with $Z_i^{*} \sim \mathrm{Bin}(1, \hat{\alpha}_n)$, or $\hat{\alpha}_n \circ Y_{t-1}^{*} \mid Y_{t-1}^{*}$ is degenerate at zero if $Y_{t-1}^{*} = 0$. Notice that the first observation of the bootstrap ZI-INAR(1) process is set to the first observation of the observed data: even if the generated process is not stationary, this will not perturb the asymptotic properties of the bootstrap process. Now the bootstrap counterparts of $\hat{\theta}_n = (\hat{\alpha}_n, \hat{\rho}_n, \hat{\lambda}_n)$, say $\hat{\theta}_n^{*} = (\hat{\alpha}_n^{*}, \hat{\rho}_n^{*}, \hat{\lambda}_n^{*})$, are the mle obtained using our proposed EM algorithm, considering the bootstrap process $Y^{*} = (Y_1^{*}, \ldots, Y_n^{*})$.
Step 3: Since the exact bootstrap distribution of $\hat{\theta}_n^{*}$ may be difficult (and time consuming) to compute, it is replaced by a Monte Carlo approximation: the procedure in Steps 1–2 is repeated B times, generating time series $y^{*(b)}$, $b = 1, \ldots, B$, as realizations of the bootstrap process $Y^{*}$. Then we can compute the corresponding estimators $\hat{\theta}_n^{*(b)}$, $b = 1, \ldots, B$.
The centered and normalized bootstrap distribution is then given by, for $u = (u_1, \ldots, u_4) \in \mathbb{R}^4$:
\[
K_n^{*(B)}(u) = \frac{1}{B} \sum_{b=1}^{B} I_{\{\sqrt{n} (\hat{\theta}_n^{*(b)} - \hat{\theta}_n) \le u\}}.
\]
If the bootstrap distribution is asymptotically valid, that is, if it converges (at least in probability) to $P(\sqrt{n} (\hat{\theta}_n - \theta) \le u)$, then it is easy to use the quantiles of the Monte Carlo bootstrap distribution to construct (simultaneous) confidence intervals for the parameter θ or any regular (at least differentiable) functional of the parameter.
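Steps 1–3 can be sketched in Python for ZIP innovations as follows. The function names, the simple Poisson sampler and the `fit` argument are assumptions of this sketch: `fit` stands for any estimator that maps a series to an estimate (in the setting above, the EM algorithm of Sect. 3.1), and only the standard library is used.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_zip_inar1(n, alpha, rho, lam, y1, rng):
    """Generate Y*_1 = y1 and Y*_t = alpha o Y*_{t-1} + V*_t for t = 2..n."""
    y = [y1]
    for _ in range(n - 1):
        # binomial thinning: each of the Y*_{t-1} units survives with prob. alpha
        survivors = sum(rng.random() < alpha for _ in range(y[-1]))
        # zero-inflated Poisson innovation: zero state with probability rho
        innovation = 0 if rng.random() < rho else rpois(lam, rng)
        y.append(survivors + innovation)
    return y

def parametric_bootstrap(y, theta_hat, fit, B=999, seed=0):
    """Return the B bootstrap replications fit(y*) of the estimator."""
    rng = random.Random(seed)
    alpha, rho, lam = theta_hat
    return [fit(simulate_zip_inar1(len(y), alpha, rho, lam, y[0], rng))
            for _ in range(B)]
```

For example, `parametric_bootstrap(y, (0.3, 0.3, 2.0), fit=my_em_fit)` would return 999 replications, where `my_em_fit` is a placeholder for a user-supplied estimation routine.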
Concerning the choice of B, it is known from the work by [13] that one needs to choose B so that it has at least the same order as n and such that $(B + 1)(1 - \gamma)/2$ is an integer (when constructing asymptotic two-sided confidence intervals of level γ). So typically for γ = 95% (for n smaller than 1000) we choose B = 999.

Proposition 2. Assume that the parameter space of $\theta = (\alpha, \rho, \lambda)$ with $\lambda = (\mu, \phi)$ is given by $\Theta = \,]0, 1[\, \times \,]0, 1[\, \times \,]0, \infty[\, \times \,]0, \infty[$ and consider $\hat{\theta}_n$ to be the mle of the ZI-INAR(1) process. Then the parametric bootstrap is asymptotically correct, meaning that almost surely along the sample,
\[
\sup_{u \in \mathbb{R}^4} \Big| P\big(\sqrt{n} (\hat{\theta}_n^{*} - \hat{\theta}_n) \le u\big) - P\big(\sqrt{n} (\hat{\theta}_n - \theta) \le u\big) \Big| \to 0 \quad \text{when } n \to \infty.
\]
Proof: In [5], Theorem 2.1 gives necessary and sufficient conditions for the validity of the bootstrap of mle estimators or efficient estimators in regular models (in the LeCam sense). These conditions essentially reduce to two conditions: 1) the LAN (locally asymptotically normal) property of the log-likelihood at the true parameter (see Definition 2.1 in [5]); and 2) the LAE (locally asymptotically equivariant) condition for the estimator $\hat{\theta}_n$ (see Definition 2.2 in [5]). In the open domain Θ, we have that $\ell(\theta \mid y)$ is twice differentiable with a non-degenerate information matrix, which implies differentiability in quadratic mean in the LeCam sense (see [30], Lemma 7.6, p. 95). As a consequence, it has the LAN property (see 7.14 and 7.15 in [30], p. 104). As far as the LAE condition is concerned, this follows from the fact that the ZI-INAR(1) model may be written as a regular model on the domain Θ (see 7.16, p. 104 as well as the references in [26]): notice that any efficient estimator has the LAE property, so that one may use either the mle or the estimators proposed in [26] in the construction.

(b) Moving block bootstrap approach: When no specific assumption is made about the distribution of the residuals, it is still possible to implement semiparametric estimators of the ZI-INAR process, as described in [26]. In that case, a parametric bootstrap process cannot be used. Moreover, to assess the robustness of the parametric assumptions, it may also be interesting to implement a more robust version of the bootstrap process. The more general method is based on splitting the original time series into overlapping blocks which are then resampled to reconstruct a time series of the original length. The procedure in our case is as follows:
Step 1: Choose a block length b (which will typically be of size $b = o(\sqrt{n})$). Define the overlapping blocks $B_1 = (Y_1, \ldots, Y_b)$, $B_2 = (Y_2, \ldots, Y_{b+1})$, ..., $B_{n-b+1} = (Y_{n-b+1}, \ldots, Y_n)$. In place of the moving block bootstrap one may use the circular block bootstrap or, even better, the stationary block bootstrap (see [7]), which allows simulating a stationary version of the moving block bootstrap.
Step 2: Draw with replacement $[n/b] + 1$ blocks, which are joined together (and possibly truncated at the end) to form a new time series of size n. After this, compute the statistics of interest, either the mle or estimators based on approximations (see [26]), for the corresponding time series.
Step 3: Just like in Step 3 of the parametric bootstrap, use a Monte Carlo method to obtain an approximation of the bootstrap distribution.
It has been shown in [17] that under some strong mixing conditions and provided that $b_n \to \infty$ and $b_n / \sqrt{n} \to 0$, the moving block bootstrap is asymptotically valid. Many variations and modifications have been proposed in the literature to obtain valid second-order approximations (see for instance [18] for a complete overview and references). However, it should be noticed that, depending on the distribution of $V_t$, the INAR(p) process may not be strongly mixing. The process can be shown to be strongly mixing when the residuals $V_t$ have a Poisson distribution. For more general distributions, it can only be shown that the process is weakly mixing, or ψ-weak mixing; see [8]. This includes the ZI cases studied here, when 0 < α < 1, since the moments of all residuals are finite. The validity of the stationary moving block bootstrap method (as well as variations including the circular block bootstrap) in the weak mixing case has been studied in [14], see Theorem 3.2. As noticed by the authors, their theorem can be easily extended to functionals of means, Frechet differentiable functionals (which can be simply linearized) or even Hadamard differentiable functionals (including M-estimators). As a consequence, this method will be asymptotically valid for the mle, which is LAE (it is less obvious for the estimators proposed in [26]).
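The block resampling of Steps 1–2 can be sketched in Python as follows. The function name is hypothetical; blocks are drawn independently and uniformly, i.e. with replacement, which is the usual moving block bootstrap convention and is assumed here.

```python
import random

def moving_block_bootstrap(y, b, rng):
    """Return one moving-block-bootstrap series of the same length as y.

    y: observed series (list); b: block length; rng: a random.Random instance.
    """
    n = len(y)
    # overlapping blocks B_1, ..., B_{n-b+1}
    blocks = [y[i:i + b] for i in range(n - b + 1)]
    resampled = []
    for _ in range(n // b + 1):
        resampled.extend(rng.choice(blocks))  # uniform draw, with replacement
    return resampled[:n]                      # truncate to the original length
```

Repeating this B times and re-estimating the parameters on each resampled series gives the Monte Carlo approximation of Step 3.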
3.3 Confidence Intervals
In the second simulation study, we use and compare two different types of asymptotically valid confidence intervals, namely bootstrap variance based asymptotic intervals and the standard percentile method. We briefly describe these methods in our context. Consider $\hat{\theta}_k$, the components of the mle of $\theta = (\theta_1, \ldots, \theta_4) = (\alpha, \rho, \mu, \phi)$, and denote by $\hat{\theta}_{k,l}^{*(b)}$, $b = 1, \ldots, B$, the corresponding bootstrap replications obtained at the Monte Carlo step, either by the parametric (l = 1) or moving block (l = 2) techniques. In the simulation we will choose B such that $(B + 1)(1 - \gamma)/2$ is an integer, as suggested by [13].

3.3.1 Variance Based Asymptotic Interval

The bootstrap variance of $\hat{\theta}_k$ is simply given by:
\[
V_{k,l}^{*} = \frac{1}{B} \sum_{b=1}^{B} \left( \hat{\theta}_{k,l}^{*(b)} - \frac{1}{B} \sum_{i=1}^{B} \hat{\theta}_{k,l}^{*(i)} \right)^2.
\]
Thus, the asymptotic normal approximation leads to an asymptotic γ confidence interval of the type
\[
\left[ \hat{\theta}_k - u_{\frac{1+\gamma}{2}} V_{k,l}^{*1/2}, \; \hat{\theta}_k + u_{\frac{1+\gamma}{2}} V_{k,l}^{*1/2} \right],
\]
where $u_{\frac{1+\gamma}{2}}$ is the quantile of order $\frac{1+\gamma}{2}$ of a standard normal distribution.
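The variance based interval, together with the percentile interval of Sect. 3.3.2, can be sketched in Python for a single component of θ. The function name is hypothetical, and taking the $(B+1)(1-\gamma)/2$-th and $(B+1)(1+\gamma)/2$-th order statistics as the empirical quantiles is an assumed convention, which is well defined when $(B + 1)(1 - \gamma)/2$ is an integer.

```python
import math
from statistics import NormalDist

def bootstrap_intervals(theta_hat, reps, gamma=0.95):
    """Return (normal_ci, percentile_ci) for one parameter component.

    theta_hat: point estimate; reps: list of B bootstrap replications.
    """
    B = len(reps)
    mean = sum(reps) / B
    var = sum((r - mean) ** 2 for r in reps) / B        # bootstrap variance
    u = NormalDist().inv_cdf((1 + gamma) / 2)           # standard normal quantile
    half = u * math.sqrt(var)
    normal_ci = (theta_hat - half, theta_hat + half)
    ordered = sorted(reps)
    # order statistics used as empirical quantiles (assumed convention)
    lo_rank = round((B + 1) * (1 - gamma) / 2)
    hi_rank = round((B + 1) * (1 + gamma) / 2)
    percentile_ci = (ordered[lo_rank - 1], ordered[hi_rank - 1])
    return normal_ci, percentile_ci
```

With B = 999 and γ = 0.95, the percentile interval uses the 25th and 975th ordered replications.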
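3.3.2 Standard Percentile Method

Define respectively $\hat{\theta}_{k,l}^{*}\big(\tfrac{1-\gamma}{2}\big)$ and $\hat{\theta}_{k,l}^{*}\big(\tfrac{1+\gamma}{2}\big)$ as the $\tfrac{1-\gamma}{2}$ and $\tfrac{1+\gamma}{2}$ quantiles of the empirical distribution of $\hat{\theta}_{k,l}^{*(b)}$, $b = 1, \ldots, B$. Then the standard percentile confidence interval (which has the same asymptotic property as the preceding one) is given by:
\[
\left[ \hat{\theta}_{k,l}^{*}\big(\tfrac{1-\gamma}{2}\big), \; \hat{\theta}_{k,l}^{*}\big(\tfrac{1+\gamma}{2}\big) \right].
\]

4 Simulation Study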
In this section we show the performance of the estimation procedure for the ZI-INAR(1) process, using the proposed EM algorithm and the bootstrap resampling methods described before. We present examples considering artificial datasets in different scenarios. All the computational codes were implemented using the R software [27] and the program codes are available from us on request. We present two simulation studies. The first one investigates the asymptotic properties of the maximum likelihood estimation. In the second one, we use and compare different types of asymptotically valid confidence intervals.

4.1 Simulation 1: Asymptotic Properties

We consider the set of sample sizes n ∈ {100, 300, 500, 1000} to analyze the consistency of the mle of the parameters. We fix α = 0.3 and ρ ∈ {0.3, 0.6}. Thus, by considering the eight different combinations of n and ρ, we simulate N = 300 Monte Carlo replicates for the following ZI-INAR(1) processes: (i) ZIP-INAR(1) with λ = 2; (ii) ZINB-INAR(1) and ZIPIG-INAR(1) with μ = 2 and φ ∈ {0.75, 1.5, 2.5}. We compute the relative bias (RB) and root relative square error (RRSE), defined by:
\[
\mathrm{RB}(\theta_i) = \frac{1}{N} \sum_{j=1}^{N} \frac{\hat{\theta}_{ij} - \theta_i}{\theta_i}
\quad \text{and} \quad
\mathrm{RRSE}(\theta_i) = \sqrt{ \frac{1}{N} \sum_{j=1}^{N} \left( \frac{\hat{\theta}_{ij} - \theta_i}{\theta_i} \right)^2 },
\]
where $\hat{\theta}_{ij}$ is the mle of parameter $\theta_i$, computed in the j-th sample. Tables 1, 2 and 3 show that both RB and RRSE decrease when the sample size increases, for all the parameter estimates.
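The two performance measures can be computed from a list of Monte Carlo estimates as in the following Python sketch; the function name is an assumption of this example.

```python
import math

def rb_rrse(estimates, true_value):
    """Relative bias and root relative square error of Monte Carlo estimates.

    estimates: the N estimates of one parameter; true_value: its true value.
    """
    rel = [(e - true_value) / true_value for e in estimates]
    rb = sum(rel) / len(rel)
    rrse = math.sqrt(sum(r * r for r in rel) / len(rel))
    return rb, rrse
```

RB can be close to zero even when individual estimates are far from the truth, because positive and negative errors cancel, whereas RRSE also reflects their spread.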
Table 1. RB and RRSE of the parameter estimates of ZIP-INAR(1) processes, with α = 0.3 and λ = 2.

ρ     n       α̂ RB       α̂ RRSE    ρ̂ RB       ρ̂ RRSE    λ̂ RB       λ̂ RRSE
0.3   100     −0.0668    0.3141    −0.0764    0.3436    −0.0035    0.1163
0.3   300     −0.0166    0.1635    −0.0090    0.1943     0.0058    0.0777
0.3   500     −0.0229    0.1188    −0.0128    0.1372     0.0033    0.0579
0.3   1000    −0.0020    0.0870    −0.0150    0.0995    −0.0024    0.0395
0.6   100     −0.0433    0.2362    −0.0212    0.1337     0.0007    0.1582
0.6   300     −0.0118    0.1289    −0.0068    0.0707    −0.0058    0.0895
0.6   500     −0.0117    0.0985    −0.0050    0.0488     0.0021    0.0650
0.6   1000    −0.0095    0.0750    −0.0031    0.0388    −0.0040    0.0472
Table 2. RB and RRSE of the parameter estimates of ZINB-INAR(1) processes, with α = 0.3 and μ = 2.

φ      ρ     n       α̂ RB       α̂ RRSE    ρ̂ RB       ρ̂ RRSE    μ̂ RB       μ̂ RRSE    φ̂ RB       φ̂ RRSE
0.75   0.3   100      0.0051    0.2080    −0.0001    0.7010     0.0971    0.3614     0.3417    1.8875
0.75   0.3   300      0.0042    0.1117    −0.0365    0.5423     0.0344    0.2355     0.1165    0.6861
0.75   0.3   500      0.0022    0.0842    −0.0238    0.4571     0.0143    0.1873     0.0696    0.4845
0.75   0.3   1000     0.0019    0.0595    −0.0424    0.3500    −0.0048    0.1376     0.0111    0.2943
0.75   0.6   100      0.0052    0.1893    −0.1443    0.4576     0.0306    0.4736     0.2758    3.2307
0.75   0.6   300     −0.0148    0.1091    −0.0688    0.2908     0.0106    0.2980     0.0990    0.9289
0.75   0.6   500      0.0083    0.0830    −0.0684    0.2552    −0.0158    0.2584     0.0125    0.6428
0.75   0.6   1000    −0.0021    0.0612    −0.0242    0.1472    −0.0076    0.1692     0.0602    0.3940
1.5    0.3   100     −0.0157    0.2241    −0.0666    0.6120     0.0183    0.2399     0.2144    1.4996
1.5    0.3   300     −0.0015    0.1219    −0.0834    0.4179    −0.0147    0.1604     0.0298    0.6910
1.5    0.3   500     −0.0019    0.0899    −0.0185    0.3245     0.0084    0.1294     0.0585    0.4348
1.5    0.3   1000     0.0020    0.0687    −0.0333    0.2436    −0.0043    0.0950    −0.0039    0.2898
1.5    0.6   100     −0.0200    0.2105    −0.1236    0.3800    −0.0118    0.3557     0.0959    2.1109
1.5    0.6   300      0.0005    0.1204    −0.0407    0.1930    −0.0130    0.2056     0.0401    1.1545
1.5    0.6   500     −0.0109    0.0949    −0.0256    0.1351    −0.0167    0.1607     0.0213    0.6549
1.5    0.6   1000    −0.0004    0.0603    −0.0218    0.0883    −0.0184    0.1161    −0.0314    0.3832
2.5    0.3   100     −0.0388    0.2513    −0.1162    0.5348     0.0030    0.2104     0.0319    0.9771
2.5    0.3   300     −0.0050    0.1370    −0.0642    0.3803    −0.0077    0.1390     0.1027    0.9715
2.5    0.3   500     −0.0212    0.1089    −0.0502    0.2699    −0.0061    0.1037     0.0178    0.5092
2.5    0.3   1000    −0.0066    0.0764    −0.0281    0.1829    −0.0082    0.0755    −0.0169    0.3569
2.5    0.6   100     −0.0169    0.2278    −0.1465    0.3334    −0.0959    0.2975    −0.1273    1.3087
2.5    0.6   300     −0.0125    0.1194    −0.0165    0.1262    −0.0176    0.1730     0.0525    1.0700
2.5    0.6   500     −0.0030    0.0889    −0.0088    0.0954    −0.0022    0.1286     0.0730    0.8590
2.5    0.6   1000    −0.0006    0.0697    −0.0065    0.0690    −0.0025    0.0977

CASES A)                      CD(X1(t), X1(t − h)) = CD(X1(t), X1(t + h))
|λ1| > |λ2|                   ∼ α D1 λ1^h
|λ1| < |λ2|                   ∼ α D2 λ2^h
λ1 = λ2 = λ                   ∼ α D3 h λ^h
λ1 = −λ2 and h even           ∼ α (D1 + D2) λ1^h
λ1 = −λ2 and h odd            ∼ α (D1 − D2) λ1^h
Table 2. Asymptotic formulas of the auto-codifference function for the time series {X2(t)} expressed with the eigenvalues of the coefficients matrix Θ denoted as λ1 and λ2.

CASES B)                      CD(X2(t), X2(t − h)) = CD(X2(t), X2(t + h))
|λ1| > |λ2|                   ∼ α E1 λ1^h
|λ1| < |λ2|                   ∼ α E2 λ2^h
λ1 = λ2 = λ                   ∼ α E3 h λ^h
λ1 = −λ2 and h even           ∼ α (E1 + E2) λ1^h
λ1 = −λ2 and h odd            ∼ α (E1 − E2) λ1^h
where "∼" denotes the asymptotic behavior for h → +∞ and the constants $D_k$ and $E_k$ for k = 1, 2, 3 are given in Eqs. (32), (34), (42) and (44), (47), (51), respectively (Tables 1, 2).

Proof. See Appendix B.

Theorem 2. For t ∈ Z let {X(t)} = {X1(t), X2(t)} be the bounded solution of the system given by Eq. (3) formulated in Eq. (5), where the coefficients a2, a3 in Eq. (2) are nonzero and 1 < α < 2. In this instance, for the auto-covariation functions we obtain that
Asymptotics of Alternative Interdependence Measures
Table 3. Asymptotic formulas of the auto-covariation function for the time series {X1(t)} expressed with the eigenvalues of the coefficients matrix Θ denoted as λ1 and λ2.

CASES A)                      CV(X1(t), X1(t − h))        CV(X1(t), X1(t + h))
|λ1| > |λ2|                   ∼ D1 λ1^h                   ∼ D4 (λ1^h)^⟨α−1⟩
|λ1| < |λ2|                   ∼ D2 λ2^h                   ∼ D5 (λ2^h)^⟨α−1⟩
λ1 = λ2 = λ                   ∼ D3 h λ^h                  ∼ D6 (h λ^h)^⟨α−1⟩
λ1 = −λ2 and h even           = (D1 + D2) λ1^h            = D7 (λ1^h)^⟨α−1⟩
λ1 = −λ2 and h odd            = (D1 − D2) λ1^h            = D8 (λ1^h)^⟨α−1⟩

Table 4. Asymptotic formulas of the auto-covariation function for the time series {X2(t)} expressed with the eigenvalues of the coefficients matrix Θ denoted as λ1 and λ2.

CASES B)                      CV(X2(t), X2(t − h))        CV(X2(t), X2(t + h))
|λ1| > |λ2|                   ∼ E1 λ1^h                   ∼ E4 (λ1^h)^⟨α−1⟩
|λ1| < |λ2|                   ∼ E2 λ2^h                   ∼ E5 (λ2^h)^⟨α−1⟩
λ1 = λ2 = λ                   ∼ E3 h λ^h                  ∼ E6 (h λ^h)^⟨α−1⟩
λ1 = −λ2 and h even           = (E1 + E2) λ1^h            = E7 (λ1^h)^⟨α−1⟩
λ1 = −λ2 and h odd            = (E1 − E2) λ1^h            = E8 (λ1^h)^⟨α−1⟩
where "∼" denotes the asymptotic behavior for h → +∞, "=" denotes the exact formula and the constants $D_k$ and $E_k$ for k = 1, ..., 8 are given in Eqs. (32), (34), (42), (61), (63), (72), (65), (67) and in Eqs. (44), (47), (51), (76), (79), (83), (81), (82), respectively (Tables 3, 4).

Proof. See Appendix C.

Remark 1. In Theorems 1 and 2, by the asymptotic behavior we mean the asymptotic equivalence of the functions f(h) and g(h) for h → +∞, that is, f(h) ∼ g(h) for h → +∞ if and only if
\[
\lim_{h \to +\infty} \frac{f(h)}{g(h)} = 1.
\]
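The notion of asymptotic equivalence in Remark 1 can be illustrated numerically. In the Python sketch below, the function f(h) = d1·λ1^h + d2·λ2^h and the values of d1, d2, λ1, λ2 are purely hypothetical; they only mimic the situation |λ1| > |λ2|, in which the term with the dominant eigenvalue determines the asymptotics.

```python
def equivalence_ratio(h, d1=1.0, d2=3.0, lam1=0.5, lam2=0.1):
    """Ratio f(h)/g(h) for f(h) = d1*lam1**h + d2*lam2**h and g(h) = d1*lam1**h.

    All constants are illustrative assumptions, with |lam1| > |lam2|.
    """
    f = d1 * lam1**h + d2 * lam2**h
    g = d1 * lam1**h
    return f / g

# The ratio equals 1 + (d2/d1) * (lam2/lam1)**h, which tends to 1 as h grows.
ratios = [equivalence_ratio(h) for h in (1, 5, 10, 30)]
```

The ratio approaches 1 geometrically at rate |λ2/λ1|, so the approximation is already accurate for moderate lags when the eigenvalues are well separated.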
In the following part of this section, we consider an exemplary twodimensional AR(1) time series with a particular spectral measure of the α−stable noise. For this model, we present the exact expressions of some constants presented in Theorems 1–2 by specifying the constants corresponding to the case of |λ1 | > |λ2 |, which is also considered in Sect. 4.
A. Grzesiek and A. Wylomańska
Example 1. For t ∈ Z let {X(t)} = {X1(t), X2(t)} be a two-dimensional α-stable AR(1) model with the discrete spectral measure of the following form
\[
\Gamma(\cdot) = \gamma_1 \delta((z_1, z_2)) + \gamma_2 \delta((-z_1, -z_2)) + \gamma_3 \delta((-z_1, z_2)) + \gamma_4 \delta((z_1, -z_2)), \tag{11}
\]
where
\[
\gamma_1 = \gamma_2 = \nu > 0, \qquad \gamma_3 = \gamma_4 = \xi > 0. \tag{12}
\]
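Moreover, let us assume that $z_1 = \cos\varphi$ and $z_2 = \sin\varphi$, where $0 < \varphi < \pi/2$, $1 < \alpha < 2$ and the coefficients $a_2$, $a_3$ in Eq. (2) are nonzero. This model was considered in the authors' previous paper, see Grzesiek et al. (2020). For this time series, in the case of $|\lambda_1| > |\lambda_2|$, the formulas given in Theorems 1–2 simplify to the following expressions.

A)
\[
\begin{aligned}
& CD(X_1(t), X_1(t-h)) = CD(X_1(t), X_1(t+h)) \\
& \quad \sim \alpha \Bigg[ S_1 \sum_{j=0}^{+\infty} \lambda_1^j \Big( \lambda_1^j \big(z_1(\lambda_2 - a_1) - a_2 z_2\big) + \lambda_2^j \big(z_1(a_1 - \lambda_1) + z_2 a_2\big) \Big)^{\langle \alpha - 1 \rangle} \\
& \qquad\quad + S_2 \sum_{j=0}^{+\infty} \lambda_1^j \Big( \lambda_1^j \big(z_1(a_1 - \lambda_2) - a_2 z_2\big) + \lambda_2^j \big(z_1(\lambda_1 - a_1) + z_2 a_2\big) \Big)^{\langle \alpha - 1 \rangle} \Bigg] \lambda_1^h,
\end{aligned}
\]
\[
\begin{aligned}
& CV(X_1(t), X_1(t-h)) \\
& \quad \sim \Bigg[ S_1 \sum_{j=0}^{+\infty} \lambda_1^j \Big( \lambda_1^j \big(z_1(\lambda_2 - a_1) - a_2 z_2\big) + \lambda_2^j \big(z_1(a_1 - \lambda_1) + z_2 a_2\big) \Big)^{\langle \alpha - 1 \rangle} \\
& \qquad\quad + S_2 \sum_{j=0}^{+\infty} \lambda_1^j \Big( \lambda_1^j \big(z_1(a_1 - \lambda_2) - a_2 z_2\big) + \lambda_2^j \big(z_1(\lambda_1 - a_1) + z_2 a_2\big) \Big)^{\langle \alpha - 1 \rangle} \Bigg] \lambda_1^h,
\end{aligned}
\]
\[
\begin{aligned}
& CV(X_1(t), X_1(t+h)) \\
& \quad \sim \Bigg[ S_3 \sum_{j=0}^{+\infty} \big(\lambda_1^j\big)^{\langle \alpha - 1 \rangle} \Big( \lambda_1^j \big(z_1(\lambda_2 - a_1) - a_2 z_2\big) + \lambda_2^j \big(z_1(a_1 - \lambda_1) + z_2 a_2\big) \Big) \\
& \qquad\quad + S_4 \sum_{j=0}^{+\infty} \big(\lambda_1^j\big)^{\langle \alpha - 1 \rangle} \Big( \lambda_1^j \big(z_1(a_1 - \lambda_2) - a_2 z_2\big) + \lambda_2^j \big(z_1(\lambda_1 - a_1) + z_2 a_2\big) \Big) \Bigg] \big(\lambda_1^h\big)^{\langle \alpha - 1 \rangle},
\end{aligned}
\]
where
\[
S_1 = \frac{2\nu \big(z_1(\lambda_2 - a_1) - z_2 a_2\big)}{|\lambda_2 - \lambda_1|^{\alpha}}, \qquad
S_2 = \frac{2\xi \big(z_1(a_1 - \lambda_2) - z_2 a_2\big)}{|\lambda_2 - \lambda_1|^{\alpha}},
\]
\[
S_3 = \frac{2\nu \big(z_1(\lambda_2 - a_1) - z_2 a_2\big)^{\langle \alpha - 1 \rangle}}{|\lambda_2 - \lambda_1|^{\alpha}}, \qquad
S_4 = \frac{2\xi \big(z_1(a_1 - \lambda_2) - z_2 a_2\big)^{\langle \alpha - 1 \rangle}}{|\lambda_2 - \lambda_1|^{\alpha}}.
\]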
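B)
\[
\begin{aligned}
& CD(X_2(t), X_2(t-h)) = CD(X_2(t), X_2(t+h)) \\
& \quad \sim \alpha \Bigg[ S_5 \sum_{j=0}^{+\infty} \lambda_1^j \Big( \lambda_1^j \big(z_2(\lambda_2 - a_4) - z_1 a_3\big) + \lambda_2^j \big(z_2(a_4 - \lambda_1) + z_1 a_3\big) \Big)^{\langle \alpha - 1 \rangle} \\
& \qquad\quad + S_6 \sum_{j=0}^{+\infty} \lambda_1^j \Big( \lambda_1^j \big(z_2(\lambda_2 - a_4) + z_1 a_3\big) + \lambda_2^j \big(z_2(a_4 - \lambda_1) - z_1 a_3\big) \Big)^{\langle \alpha - 1 \rangle} \Bigg] \lambda_1^h,
\end{aligned}
\]
\[
\begin{aligned}
& CV(X_2(t), X_2(t-h)) \\
& \quad \sim \Bigg[ S_5 \sum_{j=0}^{+\infty} \lambda_1^j \Big( \lambda_1^j \big(z_2(\lambda_2 - a_4) - z_1 a_3\big) + \lambda_2^j \big(z_2(a_4 - \lambda_1) + z_1 a_3\big) \Big)^{\langle \alpha - 1 \rangle} \\
& \qquad\quad + S_6 \sum_{j=0}^{+\infty} \lambda_1^j \Big( \lambda_1^j \big(z_2(\lambda_2 - a_4) + z_1 a_3\big) + \lambda_2^j \big(z_2(a_4 - \lambda_1) - z_1 a_3\big) \Big)^{\langle \alpha - 1 \rangle} \Bigg] \lambda_1^h,
\end{aligned}
\]
\[
\begin{aligned}
& CV(X_2(t), X_2(t+h)) \\
& \quad \sim \Bigg[ S_7 \sum_{j=0}^{+\infty} \big(\lambda_1^j\big)^{\langle \alpha - 1 \rangle} \Big( \lambda_1^j \big(z_2(\lambda_2 - a_4) - z_1 a_3\big) + \lambda_2^j \big(z_2(a_4 - \lambda_1) + z_1 a_3\big) \Big) \\
& \qquad\quad + S_8 \sum_{j=0}^{+\infty} \big(\lambda_1^j\big)^{\langle \alpha - 1 \rangle} \Big( \lambda_1^j \big(z_2(\lambda_2 - a_4) + z_1 a_3\big) + \lambda_2^j \big(z_2(a_4 - \lambda_1) - z_1 a_3\big) \Big) \Bigg] \big(\lambda_1^h\big)^{\langle \alpha - 1 \rangle},
\end{aligned}
\]
where
\[
S_5 = \frac{2\nu \big(z_2(\lambda_2 - a_4) - z_1 a_3\big)}{|\lambda_2 - \lambda_1|^{\alpha}}, \qquad
S_6 = \frac{2\xi \big(z_2(\lambda_2 - a_4) + z_1 a_3\big)}{|\lambda_2 - \lambda_1|^{\alpha}},
\]
\[
S_7 = \frac{2\nu \big(z_2(\lambda_2 - a_4) - z_1 a_3\big)^{\langle \alpha - 1 \rangle}}{|\lambda_2 - \lambda_1|^{\alpha}}, \qquad
S_8 = \frac{2\xi \big(z_2(\lambda_2 - a_4) + z_1 a_3\big)^{\langle \alpha - 1 \rangle}}{|\lambda_2 - \lambda_1|^{\alpha}}.
\]

4 Simulations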
Here we illustrate the theoretical results given in Sect. 3 by presenting the graphs corresponding to an exemplary two-dimensional α-stable AR(1) time series specified in Example 1 in the previous section. For this model the spectral measure is defined in Eq. (11) and we assume
\[
z_1 = \frac{\sqrt{3}}{2}, \quad z_2 = \frac{1}{2}, \quad \nu = 0.1 \quad \text{and} \quad \xi = 0.2.
\]
Furthermore, let us take α = 1.5 and
\[
\Theta = \begin{pmatrix} 0.4 & -0.3 \\ -0.1 & 0.2 \end{pmatrix}.
\]
Consequently, we have λ1 = 0.5 and λ2 = 0.1. Since both eigenvalues are less than 1 in absolute value, the bounded solution exists and has the form presented in Eq. (5). Since we consider the case of |λ1| > |λ2|, the formulas given in Theorems 1–2 are stated more precisely in Example 1. In Figs. 1–2 we present the auto-codifference (upper panel) and the auto-covariation (middle and bottom panels) functions corresponding to the first, {X1(t)}, and the second, {X2(t)}, time series, respectively. Both figures present the comparisons of the theoretical formulas (red solid line), the asymptotic formulas (dashed line with black dots) and the empirical counterparts derived from the simulated trajectories (blue circles). It is important to mention here that for the empirical auto-dependence measures we apply the estimators commonly used in the literature, see Rosadi and Deistler (2011) and Kruczek et al. (2017). As we can see in Figs. 1–2, the lines representing the asymptotic behavior approach those representing the theoretical formulas as h increases, whereas the empirical values oscillate around the theoretical ones.
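The eigenvalues of the coefficient matrix Θ can be checked with a few lines of Python. The helper below is a hypothetical illustration that solves the characteristic polynomial of a 2×2 matrix and assumes real eigenvalues.

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of the matrix [[a, b], [c, d]], larger one first."""
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4.0 * det
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return (trace + root) / 2.0, (trace - root) / 2.0

# Coefficient matrix used in the simulations
lam1, lam2 = eigenvalues_2x2(0.4, -0.3, -0.1, 0.2)
# Both values are below 1 in absolute value, so a bounded solution exists
stationary = abs(lam1) < 1 and abs(lam2) < 1
```

Up to floating-point rounding, the two values are 0.5 and 0.1, in agreement with the text.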
[Figure 1: panels (a) CD(h) = CD(−h), (b) CV(−h) and (c) CV(h), each plotted against the lag h from 0 to 50 with an inset for h from 20 to 50; legend: asymptotic, theoretical, empirical.]
Fig. 1. The comparison of the theoretical, asymptotical and empirical values taken by the auto-codifference (top panel) and the auto-covariation (middle and bottom panel) function corresponding to the first component, denoted as {X1 (t)}, of an exemplary two-dimensional α−stable AR(1) model considered in Sect. 4. The trajectory length is 20000.
[Figure 2: panels (a) CD(h) = CD(−h), (b) CV(−h) and (c) CV(h), each plotted against the lag h from 0 to 50 with an inset for h from 20 to 50; legend: asymptotic, theoretical, empirical.]
Fig. 2. The comparison of the theoretical, asymptotical and empirical values taken by the auto-codifference (top panel) and the auto-covariation (middle and bottom panel) function corresponding to the second component, denoted as {X2 (t)}, of an exemplary two-dimensional α−stable AR(1) model considered in Sect. 4. The trajectory length is 20000.
5 Conclusions

In this paper, we concentrated on the auto-dependence measures applied to describe the interdependence of the two-dimensional α-stable AR(1) model. We considered here two widely used measures which are well-defined in the α-stable case, namely the auto-codifference and the auto-covariation functions. Both measures were applied to the components of the two-dimensional model treated as one-dimensional processes. The main results of the paper are Theorems 1–2, where the asymptotic formulas for the considered measures of interdependence are presented. The results can be considered as an extension of the ones presented in Nowicka (1997), where the asymptotics of the auto-dependence measures for the one-dimensional autoregressive models were examined. This work is also a complement to the paper (Grzesiek and Wylomańska 2019), where the authors considered the asymptotics of cross-dependence measures for the same model.
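Appendix A

Lemma 1. Let {X(t)} = {X1(t), X2(t)} with t ∈ Z be the bounded solution of Eq. (3) given by Eq. (5).
1. For two different eigenvalues of Θ indicated as λ1 and λ2, |λ1| < 1 and |λ2| < 1, let us introduce the following notation
\[
A_1 = A_1(s_1, s_2, a_1, a_2, \lambda_1, \lambda_2, j) = \frac{\lambda_1^j (\lambda_2 s_1 - a_1 s_1 - a_2 s_2)}{\lambda_2 - \lambda_1},
\]
\[
B_1 = B_1(s_1, s_2, a_1, a_2, \lambda_1, \lambda_2, j) = \frac{\lambda_2^j (-\lambda_1 s_1 + a_1 s_1 + a_2 s_2)}{\lambda_2 - \lambda_1},
\]
\[
C_1 = C_1(s_1, s_2, a_1, a_2, \lambda_1, \lambda_2, j) = \frac{\lambda_1^j (-a_2 s_2 + \lambda_2 s_1 - a_1 s_1) + \lambda_2^j (a_2 s_2 - \lambda_1 s_1 + a_1 s_1)}{\lambda_2 - \lambda_1}.
\]
Then, for $h \in \mathbb{N}_0$ we obtain that
(a) for 0 < α < 2
\[
CD(X_1(t), X_1(t-h)) = CD(X_1(t), X_1(t+h)) = \sum_{j=0}^{+\infty} \int_{S_2} \Big( \big|\lambda_1^h A_1 + \lambda_2^h B_1\big|^{\alpha} + |C_1|^{\alpha} - \big|C_1 - (\lambda_1^h A_1 + \lambda_2^h B_1)\big|^{\alpha} \Big) \Gamma(ds), \tag{13}
\]
(b) for 1 < α < 2
\[
CV(X_1(t), X_1(t-h)) = \sum_{j=0}^{+\infty} \int_{S_2} \big(\lambda_1^h A_1 + \lambda_2^h B_1\big)\, C_1^{\langle \alpha - 1 \rangle}\, \Gamma(ds) \tag{14}
\]
and
\[
CV(X_1(t), X_1(t+h)) = \sum_{j=0}^{+\infty} \int_{S_2} C_1 \big(\lambda_1^h A_1 + \lambda_2^h B_1\big)^{\langle \alpha - 1 \rangle}\, \Gamma(ds). \tag{15}
\]
2. For equal eigenvalues of Θ indicated as λ1 = λ2 = λ, |λ| < 1, let us introduce the following notation
\[
\begin{aligned}
A_2 &= A_2(s_1, s_2, a_1, a_2, \lambda, j) = j\lambda^{j-1} a_1 s_1 - j\lambda^{j} s_1 + \lambda^{j} s_1 + j\lambda^{j-1} a_2 s_2, \\
B_2 &= B_2(s_1, s_2, a_1, a_2, \lambda, j) = \lambda^{j-1} a_1 s_1 - \lambda^{j} s_1 + \lambda^{j-1} a_2 s_2, \\
C_2 &= C_2(s_1, s_2, a_1, a_2, \lambda, j) = j\lambda^{j-1} a_1 s_1 + j\lambda^{j-1} a_2 s_2 - j\lambda^{j} s_1 + \lambda^{j} s_1.
\end{aligned}
\]
Then, for $h \in \mathbb{N}_0$ we obtain that
(a) for 0 < α < 2
\[
CD(X_1(t), X_1(t-h)) = CD(X_1(t), X_1(t+h)) = \sum_{j=0}^{+\infty} \int_{S_2} \Big( \big|\lambda^h A_2 + h\lambda^h B_2\big|^{\alpha} + |C_2|^{\alpha} - \big|C_2 - (\lambda^h A_2 + h\lambda^h B_2)\big|^{\alpha} \Big) \Gamma(ds), \tag{16}
\]
(b) for 1 < α < 2
\[
CV(X_1(t), X_1(t-h)) = \sum_{j=0}^{+\infty} \int_{S_2} \big(\lambda^h A_2 + h\lambda^h B_2\big)\, C_2^{\langle \alpha - 1 \rangle}\, \Gamma(ds) \tag{17}
\]
and
\[
CV(X_1(t), X_1(t+h)) = \sum_{j=0}^{+\infty} \int_{S_2} C_2 \big(\lambda^h A_2 + h\lambda^h B_2\big)^{\langle \alpha - 1 \rangle}\, \Gamma(ds). \tag{18}
\]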
Proof. The proof is analogous to the ones presented in the authors' previous papers, see Grzesiek et al. (2019) for the codifference function and Grzesiek et al. (2020) for the covariation function. We also use the formulas given in Eqs. (7–8).

Lemma 2. Let {X(t)} = {X1(t), X2(t)} with t ∈ Z be the bounded solution of Eq. (3) given by Eq. (5).
1. For two different eigenvalues of Θ indicated as λ1 and λ2, |λ1| < 1 and |λ2| < 1, let us introduce the following notation
\[
A_3 = A_3(s_1, s_2, a_3, a_4, \lambda_1, \lambda_2, j) = \frac{\lambda_1^j (\lambda_2 s_2 - a_3 s_1 - a_4 s_2)}{\lambda_2 - \lambda_1},
\]
\[
B_3 = B_3(s_1, s_2, a_3, a_4, \lambda_1, \lambda_2, j) = \frac{\lambda_2^j (-\lambda_1 s_2 + a_3 s_1 + a_4 s_2)}{\lambda_2 - \lambda_1},
\]
\[
C_3 = C_3(s_1, s_2, a_3, a_4, \lambda_1, \lambda_2, j) = \frac{\lambda_1^j (-a_3 s_1 + \lambda_2 s_2 - a_4 s_2) + \lambda_2^j (a_3 s_1 - \lambda_1 s_2 + a_4 s_2)}{\lambda_2 - \lambda_1}.
\]
Then, for $h \in \mathbb{N}_0$ we obtain that
(a) for 0 < α < 2
\[
CD(X_2(t), X_2(t-h)) = CD(X_2(t), X_2(t+h)) = \sum_{j=0}^{+\infty} \int_{S_2} \Big( \big|\lambda_1^h A_3 + \lambda_2^h B_3\big|^{\alpha} + |C_3|^{\alpha} - \big|C_3 - (\lambda_1^h A_3 + \lambda_2^h B_3)\big|^{\alpha} \Big) \Gamma(ds), \tag{19}
\]
(b) for 1 < α < 2
\[
CV(X_2(t), X_2(t-h)) = \sum_{j=0}^{+\infty} \int_{S_2} \big(\lambda_1^h A_3 + \lambda_2^h B_3\big)\, C_3^{\langle \alpha - 1 \rangle}\, \Gamma(ds) \tag{20}
\]
and
\[
CV(X_2(t), X_2(t+h)) = \sum_{j=0}^{+\infty} \int_{S_2} C_3 \big(\lambda_1^h A_3 + \lambda_2^h B_3\big)^{\langle \alpha - 1 \rangle}\, \Gamma(ds). \tag{21}
\]
2. For equal eigenvalues of Θ indicated as λ1 = λ2 = λ, |λ| < 1, let us introduce the following notation
\[
\begin{aligned}
A_4 &= A_4(s_1, s_2, a_3, a_4, \lambda, j) = j\lambda^{j-1} a_3 s_1 - j\lambda^{j} s_2 + \lambda^{j} s_2 + j\lambda^{j-1} a_4 s_2, \\
B_4 &= B_4(s_1, s_2, a_3, a_4, \lambda, j) = \lambda^{j-1} a_3 s_1 - \lambda^{j} s_2 + \lambda^{j-1} a_4 s_2, \\
C_4 &= C_4(s_1, s_2, a_3, a_4, \lambda, j) = j\lambda^{j-1} a_3 s_1 + j\lambda^{j-1} a_4 s_2 - j\lambda^{j} s_2 + \lambda^{j} s_2.
\end{aligned}
\]
Then, for $h \in \mathbb{N}_0$ we obtain that
(a) for 0 < α < 2
\[
CD(X_2(t), X_2(t-h)) = CD(X_2(t), X_2(t+h)) = \sum_{j=0}^{+\infty} \int_{S_2} \Big( \big|\lambda^h A_4 + h\lambda^h B_4\big|^{\alpha} + |C_4|^{\alpha} - \big|C_4 - (\lambda^h A_4 + h\lambda^h B_4)\big|^{\alpha} \Big) \Gamma(ds), \tag{22}
\]
(b) for 1 < α < 2
\[
CV(X_2(t), X_2(t-h)) = \sum_{j=0}^{+\infty} \int_{S_2} \big(\lambda^h A_4 + h\lambda^h B_4\big)\, C_4^{\langle \alpha - 1 \rangle}\, \Gamma(ds) \tag{23}
\]
and
\[
CV(X_2(t), X_2(t+h)) = \sum_{j=0}^{+\infty} \int_{S_2} C_4 \big(\lambda^h A_4 + h\lambda^h B_4\big)^{\langle \alpha - 1 \rangle}\, \Gamma(ds). \tag{24}
\]
Proof. The proof is analogous to the ones presented in the authors' previous papers, see Grzesiek et al. (2019) for the codifference function and Grzesiek et al. (2020) for the covariation function. We also use the formulas given in Eqs. (7–8).
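Appendix B: Proof.

A) Let us consider the auto-codifference function of $\{X_1(t)\}$ given in Lemma 1.

– For $\lambda_1 \neq \lambda_2$, and $|\lambda_1| < 1$, $|\lambda_2| < 1$, we examine the auto-codifference given in Eq. (13).

I) Let us assume that $|\lambda_1| > |\lambda_2|$. In this case, one can show that

$$\begin{aligned}
&\lim_{h\to+\infty}\sum_{j=0}^{+\infty}\int_{S_2}\frac{\left|\lambda_1^h A_1 + \lambda_2^h B_1\right|^{\alpha} + |C_1|^{\alpha} - \left|C_1 - (\lambda_1^h A_1 + \lambda_2^h B_1)\right|^{\alpha}}{\lambda_1^h}\,\Gamma(ds)\\
&\quad\overset{(\ast)}{=}\sum_{j=0}^{+\infty}\lim_{h\to+\infty}\int_{S_2}\frac{\left|\lambda_1^h A_1 + \lambda_2^h B_1\right|^{\alpha} + |C_1|^{\alpha} - \left|C_1 - (\lambda_1^h A_1 + \lambda_2^h B_1)\right|^{\alpha}}{\lambda_1^h}\,\Gamma(ds)\\
&\quad\overset{(\ast\ast)}{=}\sum_{j=0}^{+\infty}\int_{S_2}\lim_{h\to+\infty}\frac{\left|\lambda_1^h A_1 + \lambda_2^h B_1\right|^{\alpha} + |C_1|^{\alpha} - \left|C_1 - (\lambda_1^h A_1 + \lambda_2^h B_1)\right|^{\alpha}}{\lambda_1^h}\,\Gamma(ds).
\end{aligned}\tag{25}$$

From the dominated convergence theorem (Weir 1973), let us notice that the first equality in Eq. (25), marked $(\ast)$, holds if the sum over $j \in \mathbb{N}_0$ in Eq. (25) converges uniformly. Now, from the below inequalities

$$\big||a|^{\alpha} + |b|^{\alpha} - |a+b|^{\alpha}\big| \le (\alpha+1)|a|^{\alpha} + \alpha|a||b|^{\alpha-1}, \qquad |a+b|^{\alpha} \le 2^{\alpha-1}\left(|a|^{\alpha} + |b|^{\alpha}\right), \tag{26}$$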
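satisfied for $a, b \in \mathbb{R}$ and $1 < \alpha < 2$, see Maejima and Yamamoto (2003), for all $j \in \mathbb{N}_0$ we can show that

$$\begin{aligned}
&\left|\int_{S_2}\frac{\left|\lambda_1^h A_1 + \lambda_2^h B_1\right|^{\alpha} + |C_1|^{\alpha} - \left|C_1 - (\lambda_1^h A_1 + \lambda_2^h B_1)\right|^{\alpha}}{\lambda_1^h}\,\Gamma(ds)\right|\\
&\quad\le 2^{\alpha-1}(\alpha+1)\left(\int_{S_2}|A_1|^{\alpha}\,\Gamma(ds) + \int_{S_2}|B_1|^{\alpha}\,\Gamma(ds)\right) + \alpha\left(\int_{S_2}|A_1||C_1|^{\alpha-1}\,\Gamma(ds) + \int_{S_2}|B_1||C_1|^{\alpha-1}\,\Gamma(ds)\right) = M_j,
\end{aligned}$$

which means that each component of the infinite sum is bounded by an expression which does not depend on $h$. As a consequence, the sum over $j \in \mathbb{N}_0$ given in Eq. (25) converges uniformly if the sum of $M_j$ over $j \in \mathbb{N}_0$ is finite, i.e. if the below conditions are satisfied

$$\begin{aligned}
&\sum_{j=0}^{+\infty}\int_{S_2}|A_1|^{\alpha}\,\Gamma(ds) < +\infty, &&\sum_{j=0}^{+\infty}\int_{S_2}|B_1|^{\alpha}\,\Gamma(ds) < +\infty,\\
&\sum_{j=0}^{+\infty}\int_{S_2}|A_1||C_1|^{\alpha-1}\,\Gamma(ds) < +\infty, &&\sum_{j=0}^{+\infty}\int_{S_2}|B_1||C_1|^{\alpha-1}\,\Gamma(ds) < +\infty,
\end{aligned}\tag{27}$$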
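which is true (see Remark 1 in Appendix D). Now, to prove that the second equality in Eq. (25), marked $(\ast\ast)$, holds we again apply the dominated convergence theorem. From the inequalities in Eq. (26), for the integrand given in Eq. (25) we have that

$$\left|\frac{\left|\lambda_1^h A_1 + \lambda_2^h B_1\right|^{\alpha} + |C_1|^{\alpha} - \left|C_1 - (\lambda_1^h A_1 + \lambda_2^h B_1)\right|^{\alpha}}{\lambda_1^h}\right| \le 2^{\alpha-1}(\alpha+1)\left(|A_1|^{\alpha} + |B_1|^{\alpha}\right) + \alpha\left(|A_1||C_1|^{\alpha-1} + |B_1||C_1|^{\alpha-1}\right)$$

for a fixed $s = (s_1, s_2) \in S_2$. Let us notice that the dominating function does not depend on $h$. Moreover, it is integrable if for all $j \in \mathbb{N}_0$ the below conditions are true

$$\int_{S_2}|A_1|^{\alpha}\,\Gamma(ds) < +\infty,\quad \int_{S_2}|B_1|^{\alpha}\,\Gamma(ds) < +\infty,\quad \int_{S_2}|A_1||C_1|^{\alpha-1}\,\Gamma(ds) < +\infty,\quad \int_{S_2}|B_1||C_1|^{\alpha-1}\,\Gamma(ds) < +\infty,$$

which is satisfied (see Remark 1 in Appendix D). Then, let us notice that for a fixed $s = (s_1, s_2) \in S_2$ and $h \to +\infty$ we have

$$\begin{aligned}
&\left|\lambda_1^h A_1 + \lambda_2^h B_1\right|^{\alpha} + |C_1|^{\alpha} - \left|C_1 - (\lambda_1^h A_1 + \lambda_2^h B_1)\right|^{\alpha}\\
&\quad= \left|\lambda_1^h\left(A_1 + (\lambda_2/\lambda_1)^h B_1\right)\right|^{\alpha} + |C_1|^{\alpha} - \left|C_1 - \lambda_1^h\left(A_1 + (\lambda_2/\lambda_1)^h B_1\right)\right|^{\alpha}\\
&\quad\sim \left|\lambda_1^h A_1\right|^{\alpha} + |C_1|^{\alpha} - \left|C_1 - \lambda_1^h A_1\right|^{\alpha}
\end{aligned}$$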
and the following limit holds

$$\lim_{x\to 0}\frac{|dx|^{\alpha} + |c|^{\alpha} - |c - dx|^{\alpha}}{x} = \alpha\, d\, c^{\langle\alpha-1\rangle}\quad\text{for } 1 < \alpha < 2,\ d, c \in \mathbb{R}. \tag{28}$$
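The limit in Eq. (28) can be checked numerically. The following is only a minimal sketch, assuming the usual signed-power convention $a^{\langle p\rangle} = |a|^{p}\,\mathrm{sign}(a)$; the function names and the test values of $\alpha$, $d$ and $c$ are illustrative choices, not taken from the paper.

```python
import math


def signed_power(a: float, p: float) -> float:
    """Signed power |a|^p * sign(a) (assumed convention for a^<p>)."""
    if a == 0.0:
        return 0.0
    return math.copysign(abs(a) ** p, a)


def difference_quotient(x: float, d: float, c: float, alpha: float) -> float:
    """Left-hand side of Eq. (28) before taking the limit x -> 0."""
    return (abs(d * x) ** alpha + abs(c) ** alpha - abs(c - d * x) ** alpha) / x


def limit_value(d: float, c: float, alpha: float) -> float:
    """Right-hand side of Eq. (28): alpha * d * c^<alpha-1>."""
    return alpha * d * signed_power(c, alpha - 1.0)


if __name__ == "__main__":
    alpha, d, c = 1.5, 2.0, -3.0  # illustrative values with 1 < alpha < 2
    for x in (1e-2, 1e-4, 1e-6, 1e-8):
        print(x, difference_quotient(x, d, c, alpha), limit_value(d, c, alpha))
```

The gap between the quotient and the limit is dominated by the term $|d|^{\alpha}|x|^{\alpha-1}$, so the convergence is slow when $\alpha$ is close to 1.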
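Moreover, we have $\lambda_1^h \to 0$ for $h \to +\infty$. Thus we can write that

$$\lim_{h\to+\infty}\frac{\left|\lambda_1^h A_1 + \lambda_2^h B_1\right|^{\alpha} + |C_1|^{\alpha} - \left|C_1 - (\lambda_1^h A_1 + \lambda_2^h B_1)\right|^{\alpha}}{\lambda_1^h} = \lim_{h\to+\infty}\frac{\left|\lambda_1^h A_1\right|^{\alpha} + |C_1|^{\alpha} - \left|C_1 - \lambda_1^h A_1\right|^{\alpha}}{\lambda_1^h} = \alpha A_1 C_1^{\langle\alpha-1\rangle} \tag{29}$$

for a fixed $s = (s_1, s_2) \in S_2$. Finally, from Eq. (25) we have that

$$\lim_{h\to+\infty}\sum_{j=0}^{+\infty}\int_{S_2}\frac{\left|\lambda_1^h A_1 + \lambda_2^h B_1\right|^{\alpha} + |C_1|^{\alpha} - \left|C_1 - (\lambda_1^h A_1 + \lambda_2^h B_1)\right|^{\alpha}}{\lambda_1^h}\,\Gamma(ds) = \alpha D_1, \tag{30}$$

which is equivalent to

$$\mathrm{CD}(X_1(t), X_1(t-h)) = \mathrm{CD}(X_1(t), X_1(t+h)) \sim \alpha D_1 \lambda_1^h\quad\text{for } h \to +\infty, \tag{31}$$

where

$$D_1 := \sum_{j=0}^{+\infty}\int_{S_2} A_1 C_1^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{32}$$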
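II) Let us assume that $|\lambda_1| < |\lambda_2|$. Proceeding similarly as above we have

$$\mathrm{CD}(X_1(t), X_1(t-h)) = \mathrm{CD}(X_1(t), X_1(t+h)) \sim \alpha D_2 \lambda_2^h\quad\text{for } h \to +\infty, \tag{33}$$

where

$$D_2 := \sum_{j=0}^{+\infty}\int_{S_2} B_1 C_1^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{34}$$

III) Let us assume that $\lambda_1 = -\lambda_2$ and $h$ is even. Similarly as above, we obtain

$$\mathrm{CD}(X_1(t), X_1(t-h)) = \mathrm{CD}(X_1(t), X_1(t+h)) \sim \alpha(D_1 + D_2)\lambda_1^h\quad\text{for } h \to +\infty,$$

where

$$D_1 + D_2 = \sum_{j=0}^{+\infty}\int_{S_2}(A_1 + B_1) C_1^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{35}$$

IV) Let us assume that $\lambda_1 = -\lambda_2$ and $h$ is odd. Similarly as above, we have

$$\mathrm{CD}(X_1(t), X_1(t-h)) = \mathrm{CD}(X_1(t), X_1(t+h)) \sim \alpha(D_1 - D_2)\lambda_1^h\quad\text{for } h \to +\infty, \tag{36}$$

where

$$D_1 - D_2 = \sum_{j=0}^{+\infty}\int_{S_2}(A_1 - B_1) C_1^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty.$$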
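– For $\lambda_1 = \lambda_2 = \lambda$, and $|\lambda| < 1$, we examine the auto-codifference function given in Eq. (16). Similarly to the previous case, we show that

$$\begin{aligned}
&\lim_{h\to+\infty}\sum_{j=0}^{+\infty}\int_{S_2}\frac{\left|\lambda^h A_2 + h\lambda^h B_2\right|^{\alpha} + |C_2|^{\alpha} - \left|C_2 - (\lambda^h A_2 + h\lambda^h B_2)\right|^{\alpha}}{h\lambda^h}\,\Gamma(ds)\\
&\quad\overset{(\ast)}{=}\sum_{j=0}^{+\infty}\lim_{h\to+\infty}\int_{S_2}\frac{\left|\lambda^h A_2 + h\lambda^h B_2\right|^{\alpha} + |C_2|^{\alpha} - \left|C_2 - (\lambda^h A_2 + h\lambda^h B_2)\right|^{\alpha}}{h\lambda^h}\,\Gamma(ds)\\
&\quad\overset{(\ast\ast)}{=}\sum_{j=0}^{+\infty}\int_{S_2}\lim_{h\to+\infty}\frac{\left|\lambda^h A_2 + h\lambda^h B_2\right|^{\alpha} + |C_2|^{\alpha} - \left|C_2 - (\lambda^h A_2 + h\lambda^h B_2)\right|^{\alpha}}{h\lambda^h}\,\Gamma(ds).
\end{aligned}\tag{37}$$

From the dominated convergence theorem, to justify the first equality in Eq. (37), marked $(\ast)$, one has to prove that the sum over $j \in \mathbb{N}_0$ in Eq. (37) converges uniformly. From the inequalities in Eq. (26) we have that

$$\begin{aligned}
&\left|\int_{S_2}\frac{\left|\lambda^h A_2 + h\lambda^h B_2\right|^{\alpha} + |C_2|^{\alpha} - \left|C_2 - (\lambda^h A_2 + h\lambda^h B_2)\right|^{\alpha}}{h\lambda^h}\,\Gamma(ds)\right|\\
&\quad\le 2^{\alpha-1}(\alpha+1)M\left(\int_{S_2}|A_2|^{\alpha}\,\Gamma(ds) + \int_{S_2}|B_2|^{\alpha}\,\Gamma(ds)\right) + \alpha\left(\int_{S_2}|A_2||C_2|^{\alpha-1}\,\Gamma(ds) + \int_{S_2}|B_2||C_2|^{\alpha-1}\,\Gamma(ds)\right) = N_j,
\end{aligned}$$

for all $j \in \mathbb{N}_0$, where $M$ is a bound of $\{h\lambda^h\}$ over $h \in \mathbb{N}_0$. Let us notice that $N_j$ does not depend on $h$. Consequently, the sum over $j \in \mathbb{N}_0$ in Eq. (37) converges uniformly if the sum of $N_j$ over $j \in \mathbb{N}_0$ converges, i.e. if the below sums are finite

$$\begin{aligned}
&\sum_{j=0}^{+\infty}\int_{S_2}|A_2|^{\alpha}\,\Gamma(ds) < +\infty, &&\sum_{j=0}^{+\infty}\int_{S_2}|B_2|^{\alpha}\,\Gamma(ds) < +\infty,\\
&\sum_{j=0}^{+\infty}\int_{S_2}|A_2||C_2|^{\alpha-1}\,\Gamma(ds) < +\infty, &&\sum_{j=0}^{+\infty}\int_{S_2}|B_2||C_2|^{\alpha-1}\,\Gamma(ds) < +\infty,
\end{aligned}\tag{38}$$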
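which is true (see Remark 1 in Appendix D). As the second step, to prove that the second equality in Eq. (37), marked $(\ast\ast)$, holds we again use the dominated convergence theorem. Namely, from the inequalities given in Eq. (26), for the integrand in Eq. (37) we obtain that

$$\left|\frac{\left|\lambda^h A_2 + h\lambda^h B_2\right|^{\alpha} + |C_2|^{\alpha} - \left|C_2 - (\lambda^h A_2 + h\lambda^h B_2)\right|^{\alpha}}{h\lambda^h}\right| \le 2^{\alpha-1}(\alpha+1)M\left(|A_2|^{\alpha} + |B_2|^{\alpha}\right) + \alpha\left(|A_2||C_2|^{\alpha-1} + |B_2||C_2|^{\alpha-1}\right), \tag{39}$$

for a fixed $s = (s_1, s_2) \in S_2$. Let us notice that the dominating function does not depend on $h$. Moreover, it is integrable if for all $j \in \mathbb{N}_0$ the below integrals are finite

$$\int_{S_2}|A_2|^{\alpha}\,\Gamma(ds) < +\infty,\quad \int_{S_2}|B_2|^{\alpha}\,\Gamma(ds) < +\infty,\quad \int_{S_2}|A_2||C_2|^{\alpha-1}\,\Gamma(ds) < +\infty,\quad \int_{S_2}|B_2||C_2|^{\alpha-1}\,\Gamma(ds) < +\infty,$$

which is true (see Remark 1 in Appendix D). Then, let us notice that for a fixed $s = (s_1, s_2) \in S_2$ and $h \to +\infty$ we have

$$\begin{aligned}
&\left|\lambda^h A_2 + h\lambda^h B_2\right|^{\alpha} + |C_2|^{\alpha} - \left|C_2 - (\lambda^h A_2 + h\lambda^h B_2)\right|^{\alpha}\\
&\quad= \left|h\lambda^h\left(A_2/h + B_2\right)\right|^{\alpha} + |C_2|^{\alpha} - \left|C_2 - h\lambda^h\left(A_2/h + B_2\right)\right|^{\alpha}\\
&\quad\sim \left|h\lambda^h B_2\right|^{\alpha} + |C_2|^{\alpha} - \left|C_2 - h\lambda^h B_2\right|^{\alpha}.
\end{aligned}$$

From the limit in Eq. (28) and since $h\lambda^h \to 0$ for $h \to +\infty$, we have that

$$\lim_{h\to+\infty}\frac{\left|\lambda^h A_2 + h\lambda^h B_2\right|^{\alpha} + |C_2|^{\alpha} - \left|C_2 - (\lambda^h A_2 + h\lambda^h B_2)\right|^{\alpha}}{h\lambda^h} = \lim_{h\to+\infty}\frac{\left|h\lambda^h B_2\right|^{\alpha} + |C_2|^{\alpha} - \left|C_2 - h\lambda^h B_2\right|^{\alpha}}{h\lambda^h} = \alpha B_2 C_2^{\langle\alpha-1\rangle}$$

for a fixed $s = (s_1, s_2) \in S_2$. Finally, from Eq. (37) we have

$$\lim_{h\to+\infty}\sum_{j=0}^{+\infty}\int_{S_2}\frac{\left|\lambda^h A_2 + h\lambda^h B_2\right|^{\alpha} + |C_2|^{\alpha} - \left|C_2 - (\lambda^h A_2 + h\lambda^h B_2)\right|^{\alpha}}{h\lambda^h}\,\Gamma(ds) = \alpha D_3, \tag{40}$$

which is equivalent to the fact that

$$\mathrm{CD}(X_1(t), X_1(t-h)) = \mathrm{CD}(X_1(t), X_1(t+h)) \sim \alpha D_3\, h\lambda^h\quad\text{for } h \to +\infty, \tag{41}$$

where

$$D_3 := \sum_{j=0}^{+\infty}\int_{S_2} B_2 C_2^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{42}$$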
B) Let us consider the auto-codifference function of $\{X_2(t)\}$ given in Lemma 2.

– For $\lambda_1 \neq \lambda_2$, and $|\lambda_1| < 1$, $|\lambda_2| < 1$ we examine the auto-codifference function given in Eq. (19).

I) Let us assume that $|\lambda_1| > |\lambda_2|$. Proceeding as in A) leads to

$$\mathrm{CD}(X_2(t), X_2(t-h)) = \mathrm{CD}(X_2(t), X_2(t+h)) \sim \alpha E_1 \lambda_1^h\quad\text{for } h \to +\infty, \tag{43}$$

where

$$E_1 := \sum_{j=0}^{+\infty}\int_{S_2} A_3 C_3^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{44}$$
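We mention here that, similarly as in A), to use the dominated convergence theorem we need to guarantee that

$$\begin{aligned}
&\sum_{j=0}^{+\infty}\int_{S_2}|A_3|^{\alpha}\,\Gamma(ds) < +\infty, &&\sum_{j=0}^{+\infty}\int_{S_2}|B_3|^{\alpha}\,\Gamma(ds) < +\infty,\\
&\sum_{j=0}^{+\infty}\int_{S_2}|A_3||C_3|^{\alpha-1}\,\Gamma(ds) < +\infty, &&\sum_{j=0}^{+\infty}\int_{S_2}|B_3||C_3|^{\alpha-1}\,\Gamma(ds) < +\infty,
\end{aligned}\tag{45}$$

that is true (see Remark 1 in Appendix D).

II) Let us assume that $|\lambda_1| < |\lambda_2|$. Again, proceeding as in A) we obtain

$$\mathrm{CD}(X_2(t), X_2(t-h)) = \mathrm{CD}(X_2(t), X_2(t+h)) \sim \alpha E_2 \lambda_2^h\quad\text{for } h \to +\infty, \tag{46}$$

where

$$E_2 := \sum_{j=0}^{+\infty}\int_{S_2} B_3 C_3^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{47}$$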
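III) Let us assume that $\lambda_1 = -\lambda_2$ and $h$ is even. Proceeding as in A) we have

$$\mathrm{CD}(X_2(t), X_2(t-h)) = \mathrm{CD}(X_2(t), X_2(t+h)) \sim \alpha(E_1 + E_2)\lambda_1^h\quad\text{for } h \to +\infty, \tag{48}$$

where

$$E_1 + E_2 = \sum_{j=0}^{+\infty}\int_{S_2}(A_3 + B_3) C_3^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty.$$

IV) Let us assume that $\lambda_1 = -\lambda_2$ and $h$ is odd. Proceeding as in A) leads to

$$\mathrm{CD}(X_2(t), X_2(t-h)) = \mathrm{CD}(X_2(t), X_2(t+h)) \sim \alpha(E_1 - E_2)\lambda_1^h\quad\text{for } h \to +\infty, \tag{49}$$

where

$$E_1 - E_2 = \sum_{j=0}^{+\infty}\int_{S_2}(A_3 - B_3) C_3^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty.$$

– For $\lambda_1 = \lambda_2 = \lambda$, and $|\lambda| < 1$, we examine the auto-codifference function given in Eq. (22). Proceeding as in A) leads to

$$\mathrm{CD}(X_2(t), X_2(t-h)) = \mathrm{CD}(X_2(t), X_2(t+h)) \sim \alpha E_3\, h\lambda^h\quad\text{for } h \to +\infty, \tag{50}$$

where

$$E_3 := \sum_{j=0}^{+\infty}\int_{S_2} B_4 C_4^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{51}$$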
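As in A), to use the dominated convergence theorem we need to guarantee that

$$\begin{aligned}
&\sum_{j=0}^{+\infty}\int_{S_2}|A_4|^{\alpha}\,\Gamma(ds) < +\infty, &&\sum_{j=0}^{+\infty}\int_{S_2}|B_4|^{\alpha}\,\Gamma(ds) < +\infty,\\
&\sum_{j=0}^{+\infty}\int_{S_2}|A_4||C_4|^{\alpha-1}\,\Gamma(ds) < +\infty, &&\sum_{j=0}^{+\infty}\int_{S_2}|B_4||C_4|^{\alpha-1}\,\Gamma(ds) < +\infty,
\end{aligned}\tag{52}$$

that is true (see Remark 1 in Appendix D).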
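As an illustration of the asymptotics in Eqs. (43)–(44), the series in Eq. (19) can be evaluated numerically and compared with $\alpha E_1\lambda_1^h$. The sketch below makes several assumptions that are not in the paper: the spectral measure is taken to be discrete (two atoms with equal weights), the infinite sum over $j$ is truncated, the signed power is $a^{\langle p\rangle} = |a|^{p}\,\mathrm{sign}(a)$, and the parameter values $a_3 = 0.1$, $a_4 = 0.5$, $\lambda_1 = 0.8$, $\lambda_2 = 0.3$, $\alpha = 1.5$ are purely hypothetical.

```python
import math


def signed_power(a, p):
    """Signed power |a|^p * sign(a) (assumed convention for a^<p>)."""
    return math.copysign(abs(a) ** p, a) if a != 0.0 else 0.0


def abc3(s1, s2, a3, a4, lam1, lam2, j):
    """Coefficients A3, B3, C3 of Lemma 2 for distinct eigenvalues."""
    den = lam2 - lam1
    a = lam1 ** j * (lam2 * s2 - a3 * s1 - a4 * s2) / den
    b = lam2 ** j * (-lam1 * s2 + a3 * s1 + a4 * s2) / den
    return a, b, a + b  # C3 is the sum of the two terms above


def codifference(h, atoms, a3, a4, lam1, lam2, alpha, n_terms=400):
    """Truncated version of Eq. (19) for a discrete spectral measure."""
    total = 0.0
    for j in range(n_terms):
        for (s1, s2), weight in atoms:
            a, b, c = abc3(s1, s2, a3, a4, lam1, lam2, j)
            x = lam1 ** h * a + lam2 ** h * b
            total += weight * (abs(x) ** alpha + abs(c) ** alpha - abs(c - x) ** alpha)
    return total


def constant_e1(atoms, a3, a4, lam1, lam2, alpha, n_terms=400):
    """Truncated version of the constant E1 in Eq. (44)."""
    total = 0.0
    for j in range(n_terms):
        for (s1, s2), weight in atoms:
            a, _, c = abc3(s1, s2, a3, a4, lam1, lam2, j)
            total += weight * a * signed_power(c, alpha - 1.0)
    return total


# Hypothetical symmetric discrete spectral measure on the unit circle.
ATOMS = [((0.0, 1.0), 0.5), ((0.0, -1.0), 0.5)]
PARAMS = dict(a3=0.1, a4=0.5, lam1=0.8, lam2=0.3, alpha=1.5)


def ratio(h):
    """CD(h) divided by its claimed asymptotic alpha * E1 * lam1^h."""
    e1 = constant_e1(ATOMS, **PARAMS)
    return codifference(h, ATOMS, **PARAMS) / (PARAMS["alpha"] * e1 * PARAMS["lam1"] ** h)


if __name__ == "__main__":
    for h in (5, 20, 40, 60):
        print(h, ratio(h))
```

Under these assumptions the ratio approaches 1 as $h$ grows, with an error that decays roughly like $|\lambda_1|^{h(\alpha-1)}$.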
Appendix C: Proof.

A) Let us consider the auto-covariation function of the time series $\{X_1(t)\}$ given in Lemma 1.

(a) At first, we examine the function $\mathrm{CV}(X_1(t), X_1(t-h))$.

– Let us assume that $\lambda_1 \neq \lambda_2$, and $|\lambda_1| < 1$, $|\lambda_2| < 1$. The auto-covariation function given in Eq. (14) can be written as

$$\mathrm{CV}(X_1(t), X_1(t-h)) = \lambda_1^h D_1 + \lambda_2^h D_2, \tag{53}$$

which directly leads to the formulas given in Theorem 2. The constants $D_1$ and $D_2$ are specified in Eqs. (32) and (34), respectively.

– Let us assume that $\lambda_1 = \lambda_2 = \lambda$, and $|\lambda| < 1$. The auto-covariation function given in Eq. (17) can be written as

$$\mathrm{CV}(X_1(t), X_1(t-h)) = \lambda^h D^{*} + h\lambda^h D_3, \tag{54}$$

which directly leads to the formula given in Theorem 2. The constant $D_3$ is given in Eq. (42) and

$$D^{*} := \sum_{j=0}^{+\infty}\int_{S_2} A_2 C_2^{\langle\alpha-1\rangle}\,\Gamma(ds).$$
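(b) Now, we examine the function $\mathrm{CV}(X_1(t), X_1(t+h))$.

– For $\lambda_1 \neq \lambda_2$, and $|\lambda_1| < 1$, $|\lambda_2| < 1$, we consider the auto-covariation function given in Eq. (15).

I) Let us assume that $|\lambda_1| > |\lambda_2|$. In this case, one can show that

$$\begin{aligned}
\lim_{h\to+\infty}\sum_{j=0}^{+\infty}\int_{S_2}\frac{C_1\left(A_1\lambda_1^h + B_1\lambda_2^h\right)^{\langle\alpha-1\rangle}}{\left(\lambda_1^h\right)^{\langle\alpha-1\rangle}}\,\Gamma(ds)
&\overset{(\ast)}{=}\sum_{j=0}^{+\infty}\lim_{h\to+\infty}\int_{S_2}\frac{C_1\left(A_1\lambda_1^h + B_1\lambda_2^h\right)^{\langle\alpha-1\rangle}}{\left(\lambda_1^h\right)^{\langle\alpha-1\rangle}}\,\Gamma(ds)\\
&\overset{(\ast\ast)}{=}\sum_{j=0}^{+\infty}\int_{S_2}\lim_{h\to+\infty}\frac{C_1\left(A_1\lambda_1^h + B_1\lambda_2^h\right)^{\langle\alpha-1\rangle}}{\left(\lambda_1^h\right)^{\langle\alpha-1\rangle}}\,\Gamma(ds).
\end{aligned}\tag{55}$$

From the dominated convergence theorem, the first equality in Eq. (55), marked $(\ast)$, holds if the sum over $j \in \mathbb{N}_0$ in Eq. (55) converges uniformly. From the following inequality

$$|a + b|^{\alpha-1} \le |a|^{\alpha-1} + |b|^{\alpha-1} \tag{56}$$

satisfied for $a, b \in \mathbb{R}$, $1 < \alpha < 2$, for all $j \in \mathbb{N}_0$ we have

$$\left|\int_{S_2}\frac{C_1\left(A_1\lambda_1^h + B_1\lambda_2^h\right)^{\langle\alpha-1\rangle}}{\left(\lambda_1^h\right)^{\langle\alpha-1\rangle}}\,\Gamma(ds)\right| \le \int_{S_2}|C_1||A_1|^{\alpha-1}\,\Gamma(ds) + \int_{S_2}|C_1||B_1|^{\alpha-1}\,\Gamma(ds) = K_j. \tag{57}$$

Let us notice that $K_j$ does not depend on $h$. Consequently, the sum over $j \in \mathbb{N}_0$ in Eq. (55) converges uniformly if the sum of $K_j$ over $j \in \mathbb{N}_0$ converges, which leads to the below conditions

$$\sum_{j=0}^{+\infty}\int_{S_2}|C_1||A_1|^{\alpha-1}\,\Gamma(ds) < +\infty,\qquad \sum_{j=0}^{+\infty}\int_{S_2}|C_1||B_1|^{\alpha-1}\,\Gamma(ds) < +\infty \tag{58}$$

that are true, see Remark 1 in Appendix D. In the next step, we prove that the second equality in Eq. (55), marked $(\ast\ast)$, holds by the use of the dominated convergence theorem. From the inequality in Eq. (56) we have that

$$\left|\frac{C_1\left(A_1\lambda_1^h + B_1\lambda_2^h\right)^{\langle\alpha-1\rangle}}{\left(\lambda_1^h\right)^{\langle\alpha-1\rangle}}\right| \le |C_1||A_1|^{\alpha-1} + |C_1||B_1|^{\alpha-1}$$

for a fixed $s = (s_1, s_2) \in S_2$. Let us notice that the dominating function does not depend on $h$ and is integrable if for all $j \in \mathbb{N}_0$ we have that

$$\int_{S_2}|C_1||A_1|^{\alpha-1}\,\Gamma(ds) < +\infty,\qquad \int_{S_2}|C_1||B_1|^{\alpha-1}\,\Gamma(ds) < +\infty,$$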
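which is true, see Remark 1 in Appendix D. Then, let us notice that for a fixed $s = (s_1, s_2) \in S_2$ the following limit holds

$$\lim_{h\to+\infty}\frac{C_1\left(A_1\lambda_1^h + B_1\lambda_2^h\right)^{\langle\alpha-1\rangle}}{\left(\lambda_1^h\right)^{\langle\alpha-1\rangle}} = C_1 A_1^{\langle\alpha-1\rangle}$$

and finally, from Eq. (55) we have

$$\lim_{h\to+\infty}\sum_{j=0}^{+\infty}\int_{S_2}\frac{C_1\left(A_1\lambda_1^h + B_1\lambda_2^h\right)^{\langle\alpha-1\rangle}}{\left(\lambda_1^h\right)^{\langle\alpha-1\rangle}}\,\Gamma(ds) = D_4, \tag{59}$$

which is equivalent to the fact that

$$\mathrm{CV}(X_1(t), X_1(t+h)) \sim D_4\left(\lambda_1^h\right)^{\langle\alpha-1\rangle}\quad\text{for } h \to +\infty, \tag{60}$$

where

$$D_4 := \sum_{j=0}^{+\infty}\int_{S_2} C_1 A_1^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{61}$$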
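II) Let us assume that $|\lambda_1| < |\lambda_2|$. Proceeding similarly as above, we obtain

$$\mathrm{CV}(X_1(t), X_1(t+h)) \sim D_5\left(\lambda_2^h\right)^{\langle\alpha-1\rangle}\quad\text{for } h \to +\infty, \tag{62}$$

where

$$D_5 := \sum_{j=0}^{+\infty}\int_{S_2} C_1 B_1^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{63}$$

III) Let us assume that $\lambda_1 = -\lambda_2$ and $h$ is even. In this case, we have the exact formula

$$\mathrm{CV}(X_1(t), X_1(t+h)) = D_7\left(\lambda_1^h\right)^{\langle\alpha-1\rangle}, \tag{64}$$

where

$$D_7 := \sum_{j=0}^{+\infty}\int_{S_2} C_1 (A_1 + B_1)^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{65}$$

IV) Let us assume that $\lambda_1 = -\lambda_2$ and $h$ is odd. In this case, as in the case given above, we have the exact formula

$$\mathrm{CV}(X_1(t), X_1(t+h)) = D_8\left(\lambda_1^h\right)^{\langle\alpha-1\rangle}, \tag{66}$$

where

$$D_8 := \sum_{j=0}^{+\infty}\int_{S_2} C_1 (A_1 - B_1)^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{67}$$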
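– For $\lambda_1 = \lambda_2 = \lambda$, and $|\lambda| < 1$, we consider the auto-covariation function given in Eq. (18). Similarly to the previous case, one can show that

$$\begin{aligned}
\lim_{h\to+\infty}\sum_{j=0}^{+\infty}\int_{S_2}\frac{C_2\left(A_2\lambda^h + B_2 h\lambda^h\right)^{\langle\alpha-1\rangle}}{\left(h\lambda^h\right)^{\langle\alpha-1\rangle}}\,\Gamma(ds)
&\overset{(\ast)}{=}\sum_{j=0}^{+\infty}\lim_{h\to+\infty}\int_{S_2}\frac{C_2\left(A_2\lambda^h + B_2 h\lambda^h\right)^{\langle\alpha-1\rangle}}{\left(h\lambda^h\right)^{\langle\alpha-1\rangle}}\,\Gamma(ds)\\
&\overset{(\ast\ast)}{=}\sum_{j=0}^{+\infty}\int_{S_2}\lim_{h\to+\infty}\frac{C_2\left(A_2\lambda^h + B_2 h\lambda^h\right)^{\langle\alpha-1\rangle}}{\left(h\lambda^h\right)^{\langle\alpha-1\rangle}}\,\Gamma(ds).
\end{aligned}\tag{68}$$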
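From the dominated convergence theorem, to justify the first equality in Eq. (68), marked $(\ast)$, the sum over $j \in \mathbb{N}_0$ in Eq. (68) has to converge uniformly. From the inequality in Eq. (56) for all $j \in \mathbb{N}_0$ we can show that

$$\left|\int_{S_2}\frac{C_2\left(A_2\lambda^h + B_2 h\lambda^h\right)^{\langle\alpha-1\rangle}}{\left(h\lambda^h\right)^{\langle\alpha-1\rangle}}\,\Gamma(ds)\right| \le \int_{S_2}|C_2||A_2|^{\alpha-1}\,\Gamma(ds) + \int_{S_2}|C_2||B_2|^{\alpha-1}\,\Gamma(ds) = L_j,$$

where $L_j$ does not depend on $h$. Consequently, the sum over $j \in \mathbb{N}_0$ in Eq. (68) converges uniformly if the sum of $L_j$ over $j \in \mathbb{N}_0$ converges, which leads to the below conditions

$$\sum_{j=0}^{+\infty}\int_{S_2}|C_2||A_2|^{\alpha-1}\,\Gamma(ds) < +\infty,\qquad \sum_{j=0}^{+\infty}\int_{S_2}|C_2||B_2|^{\alpha-1}\,\Gamma(ds) < +\infty, \tag{69}$$

which are true (see Remark 1 in Appendix D). Now, to justify the second equality in Eq. (68), marked $(\ast\ast)$, we again use the dominated convergence theorem. From the inequality in Eq. (56) we have

$$\left|\frac{C_2\left(A_2\lambda^h + B_2 h\lambda^h\right)^{\langle\alpha-1\rangle}}{\left(h\lambda^h\right)^{\langle\alpha-1\rangle}}\right| \le |C_2||A_2|^{\alpha-1} + |C_2||B_2|^{\alpha-1},$$

for a fixed $s = (s_1, s_2) \in S_2$. Let us notice that the integrand in Eq. (68) is dominated by a function that does not depend on $h$. Moreover, this function is integrable if for all $j \in \mathbb{N}_0$ the integrals given below are finite

$$\int_{S_2}|C_2||A_2|^{\alpha-1}\,\Gamma(ds) < +\infty,\qquad \int_{S_2}|C_2||B_2|^{\alpha-1}\,\Gamma(ds) < +\infty,$$

which is true (see Remark 1 in Appendix D). Then, let us notice that for a fixed $s = (s_1, s_2) \in S_2$ the following limit holds

$$\lim_{h\to+\infty}\frac{C_2\left(A_2\lambda^h + B_2 h\lambda^h\right)^{\langle\alpha-1\rangle}}{\left(h\lambda^h\right)^{\langle\alpha-1\rangle}} = C_2 B_2^{\langle\alpha-1\rangle}$$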
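and finally from Eq. (68) we obtain

$$\lim_{h\to+\infty}\sum_{j=0}^{+\infty}\int_{S_2}\frac{C_2\left(A_2\lambda^h + B_2 h\lambda^h\right)^{\langle\alpha-1\rangle}}{\left(h\lambda^h\right)^{\langle\alpha-1\rangle}}\,\Gamma(ds) = D_6, \tag{70}$$

which is equivalent to the fact that

$$\mathrm{CV}(X_1(t), X_1(t+h)) \sim D_6\left(h\lambda^h\right)^{\langle\alpha-1\rangle}\quad\text{for } h \to +\infty, \tag{71}$$

where

$$D_6 := \sum_{j=0}^{+\infty}\int_{S_2} C_2 B_2^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{72}$$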
B) Let us consider the auto-covariation function of the time series $\{X_2(t)\}$ given in Lemma 2.

(a) At first, let us examine the function $\mathrm{CV}(X_2(t), X_2(t-h))$.

– Let us assume that $\lambda_1 \neq \lambda_2$, and $|\lambda_1| < 1$, $|\lambda_2| < 1$. The auto-covariation function given in Eq. (20) can be written as

$$\mathrm{CV}(X_2(t), X_2(t-h)) = \lambda_1^h E_1 + \lambda_2^h E_2, \tag{73}$$

which directly leads to the formulas given in Theorem 2. The constants $E_1$ and $E_2$ are specified in Eqs. (44) and (47), respectively.

– Let us assume that $\lambda_1 = \lambda_2 = \lambda$, and $|\lambda| < 1$. The auto-covariation function given in Eq. (23) can be written as

$$\mathrm{CV}(X_2(t), X_2(t-h)) = \lambda^h E^{*} + h\lambda^h E_3, \tag{74}$$

which directly leads to the formulas given in Theorem 2. The constant $E_3$ is given in Eq. (51) and

$$E^{*} := \sum_{j=0}^{+\infty}\int_{S_2} A_4 C_4^{\langle\alpha-1\rangle}\,\Gamma(ds).$$
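(b) Now, we examine the function $\mathrm{CV}(X_2(t), X_2(t+h))$.

– For $\lambda_1 \neq \lambda_2$, and $|\lambda_1| < 1$, $|\lambda_2| < 1$, we consider the auto-covariation function given in Eq. (21).

I) Let us assume that $|\lambda_1| > |\lambda_2|$. Proceeding as in A) we obtain

$$\mathrm{CV}(X_2(t), X_2(t+h)) \sim E_4\left(\lambda_1^h\right)^{\langle\alpha-1\rangle}\quad\text{for } h \to +\infty, \tag{75}$$

where

$$E_4 := \sum_{j=0}^{+\infty}\int_{S_2} C_3 A_3^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{76}$$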
We mention here that similarly as in A), to use the dominated convergence theorem we need to guarantee that

$$\sum_{j=0}^{+\infty}\int_{S_2}|C_3||A_3|^{\alpha-1}\,\Gamma(ds) < +\infty,\qquad \sum_{j=0}^{+\infty}\int_{S_2}|C_3||B_3|^{\alpha-1}\,\Gamma(ds) < +\infty, \tag{77}$$

which is true (see Remark 1 in Appendix D).

II) Let us assume that $|\lambda_1| < |\lambda_2|$. Again, proceeding as in A) we obtain

$$\mathrm{CV}(X_2(t), X_2(t+h)) \sim E_5\left(\lambda_2^h\right)^{\langle\alpha-1\rangle}\quad\text{for } h \to +\infty, \tag{78}$$

where

$$E_5 := \sum_{j=0}^{+\infty}\int_{S_2} C_3 B_3^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{79}$$
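III) Let us assume that $\lambda_1 = -\lambda_2$ and $h$ is even. In this case, we have the exact formula

$$\mathrm{CV}(X_2(t), X_2(t+h)) = E_7\left(\lambda_1^h\right)^{\langle\alpha-1\rangle}\quad\text{for } h \to +\infty, \tag{80}$$

where

$$E_7 := \sum_{j=0}^{+\infty}\int_{S_2} C_3 (A_3 + B_3)^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty.$$

IV) Let us assume that $\lambda_1 = -\lambda_2$ and $h$ is odd. Similarly as above, we have the exact formula

$$\mathrm{CV}(X_2(t), X_2(t+h)) = E_8\left(\lambda_1^h\right)^{\langle\alpha-1\rangle}\quad\text{for } h \to +\infty, \tag{81}$$

where

$$E_8 := \sum_{j=0}^{+\infty}\int_{S_2} C_3 (A_3 - B_3)^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty.$$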
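– For $\lambda_1 = \lambda_2 = \lambda$, and $|\lambda| < 1$, we consider the auto-covariation function given in Eq. (24). Proceeding as in A) we obtain

$$\mathrm{CV}(X_2(t), X_2(t+h)) \sim E_6\left(h\lambda^h\right)^{\langle\alpha-1\rangle}\quad\text{for } h \to +\infty, \tag{82}$$

where

$$E_6 := \sum_{j=0}^{+\infty}\int_{S_2} C_4 B_4^{\langle\alpha-1\rangle}\,\Gamma(ds) < +\infty. \tag{83}$$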
As in A), to use the dominated convergence theorem we need to guarantee that

$$\sum_{j=0}^{+\infty}\int_{S_2}|C_4||A_4|^{\alpha-1}\,\Gamma(ds) < +\infty,\qquad \sum_{j=0}^{+\infty}\int_{S_2}|C_4||B_4|^{\alpha-1}\,\Gamma(ds) < +\infty, \tag{84}$$

which is true (see Remark 1 in Appendix D).
Appendix D

Remark 1. Let us notice that the constants $A_i$, $B_i$, $C_i$ for $i = 1, 2, 3, 4$ can be upper-bounded by $M \max(|\lambda_1|, |\lambda_2|)^j$ or by $M j \max(|\lambda_1|, |\lambda_2|)^j$ with the constant $M$ independent of $j$. Since $\max(|\lambda_1|, |\lambda_2|) < 1$ and the measure $\Gamma(\cdot)$ is finite, the conditions given in Eqs. (27), (38), (45), (52), (58), (69), (77) and (84) are always satisfied.
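The summability argument of Remark 1 can be illustrated numerically. The sketch below is only an illustration under assumed values: it takes the bound $M j \rho^j$ (or $M\rho^j$) with $\rho = \max(|\lambda_1|, |\lambda_2|)$, raises it to the power $\alpha$, and checks that the partial sums over $j$ stabilise. The function name and the values $\rho = 0.9$, $\alpha = 1.5$, $M = 1$ are hypothetical.

```python
def partial_sum(rho, alpha, n_terms, with_j=True, m=1.0):
    """Partial sum of (m * j * rho**j)**alpha, or (m * rho**j)**alpha if with_j is False."""
    total = 0.0
    for j in range(n_terms):
        bound = m * (j if with_j else 1.0) * rho ** j
        total += bound ** alpha
    return total


if __name__ == "__main__":
    rho, alpha = 0.9, 1.5  # hypothetical values with rho < 1 and 1 < alpha < 2
    for n in (50, 100, 200, 400):
        print(n, partial_sum(rho, alpha, n), partial_sum(rho, alpha, n, with_j=False))
```

Without the factor $j$ the series is geometric with ratio $\rho^{\alpha}$, so its sum is $1/(1-\rho^{\alpha})$; the factor $j^{\alpha}$ only slows the convergence and does not destroy it.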
References

Ansley, C.F.: Computation of the theoretical autocovariance function for a vector ARMA process. J. Stat. Comput. Simul. 12(1), 15–24 (1980)
Ansley, C.F., Kohn, R.: A note on reparameterizing a vector autoregressive moving average model to enforce stationarity. J. Stat. Comput. Simul. 24(2), 99–106 (1986)
Mainassara, Y.B.: Selection of weak VARMA models by modified Akaike's information criteria. J. Time Ser. Anal. 33, 121–130 (2012)
Brockwell, P.J., Davis, R.A.: Introduction to Time Series and Forecasting. Springer, New York (2002). https://doi.org/10.1007/978-3-319-29854-2
Cambanis, S., Miller, G.: Linear problems in pth order and stable processes. SIAM J. Appl. Math. 41(1), 43–69 (1981)
Chan, H., Chinipardaz, R., Cox, T.: Discrimination of AR, MA and ARMA time series models. Commun. Stat. - Theor. Meth. 25(6), 1247–1260 (1996)
de Silva, B.M.: A class of multivariate symmetric stable distributions. J. Multivar. Anal. 8(3), 335–345 (1978)
Fama, E.F.: The behavior of stock-market prices. J. Bus. 38(1), 34–105 (1965)
Gallagher, C.M.: A method for fitting stable autoregressive models using the autocovariation function. Stat. Probab. Lett. 53, 381–390 (2001)
Grzesiek, A., Wylomańska, A.: Asymptotic behavior of the cross-dependence measures for bidimensional AR(1) model with α-stable noise. Accepted in Banach Center Publications 2019 (2019). https://arxiv.org/abs/1911.10894
Grzesiek, A., Teuerle, M., Wylomańska, A.: Cross-codifference for bidimensional VAR(1) models with infinite variance. Communications in Statistics - Simulation and Computation, published online (2019). 27. Located at: arXiv:1902.02142
Grzesiek, A., Teuerle, M., Sikora, G., Wylomańska, A.: Spatial-temporal dependence measures for α-stable bivariate AR(1). J. Time Ser. Anal. 41, 454–475 (2020)
Hong-Zhi, A., Zhao-Guo, C., Hannan, E.J.: A note on ARMA estimation. J. Time Ser. Anal. 4(1), 9–17 (1983)
Jabłońska-Sabuka, M., Teuerle, M., Wylomańska, A.: Bivariate sub-Gaussian model for stock index returns. Physica A: Stat. Mech. Appl. 486, 628–637 (2017)
Kozubowski, T.J., Panorska, A.K.: Multivariate geometric stable distributions in financial applications. Math. Comput. Modell. 29(10–12), 83–92 (1999)
Kozubowski, T.J., Panorska, A.K., Rachev, S.T.: Statistical issues in modeling multivariate stable portfolios. In: Rachev, S.T. (ed.) Handbook of Heavy Tailed Distributions in Finance, volume 1 of Handbooks in Finance, pp. 131–167. North-Holland, Amsterdam (2003)
Kruczek, P., Wylomańska, A., Teuerle, M., Gajda, J.: The modified Yule-Walker method for alpha-stable time series models. Physica A 469, 588–603 (2017)
Lii, K.-S., Rosenblatt, M.: An approximate maximum likelihood estimation for non-Gaussian non-minimum phase moving average processes. J. Multivar. Anal. 43(2), 272–299 (1992)
Luetkepohl, H.: Forecasting cointegrated VARMA processes. Humboldt Universitaet Berlin, Sonderforschungsbereich 373 (2007)
Maejima, M., Yamamoto, K.: Long-memory stable Ornstein-Uhlenbeck processes. Electron. J. Probab. 8(19), 1–18 (2003)
Mandelbrot, B.: The variation of certain speculative prices. J. Bus. 36(4), 394–419 (1963)
Mauricio, J.A.: Exact maximum likelihood estimation of stationary vector ARMA models. J. Am. Stat. Assoc. 90(429), 282–291 (1995)
McKenzie, E.: A note on the derivation of theoretical autocovariances for ARMA models. J. Stat. Comput. Simul. 24, 159–162 (1986)
Miller, G.: Properties of certain symmetric stable distributions. J. Multivar. Anal. 8(3), 346–360 (1978)
Mittnik, S., Rachev, S.T.: Alternative multivariate stable distributions and their applications to financial modeling. In: Cambanis, S., Samorodnitsky, G., Taqqu, M.S. (eds.) Stable Processes and Related Topics, vol. 25 of Progress in Probability. Birkhäuser Boston (1991). https://doi.org/10.1007/978-1-4684-6778-9_6
Niglio, M., Vitale, C.D.: Threshold vector ARMA models. Commun. Stat. - Theor. Meth. 44(14), 2911–2923 (2015)
Nolan, J.P., Panorska, A.K.: Data analysis for heavy tailed multivariate samples. Commun. Stat. Stoch. Mod. 13(4), 687–702 (1997)
Nowicka, J.: Asymptotic behavior of the covariation and the codifference for ARMA models with stable innovations. Commun. Stat. Stoch. Mod. 13(4), 673–685 (1997)
Nowicka, J., Weron, A.: Measures of dependence for ARMA models with stable innovations. Annales Universitatis Mariae Curie-Sklodowska. Sectio A – Mathematica, LI 1, 14, 133–144 (1997)
Nowicka, J., Wylomańska, A.: The dependence structure for PARMA models with α-stable innovations. Acta Physica Polonica B 37(11), 3071–3081 (2006)
Nowicka-Zagrajek, J., Wylomańska, A.: Measures of dependence for stable AR(1) models with time-varying coefficients. Stoch. Mod. 24(1), 58–70 (2008)
Peters, G., Sisson, S., Fan, Y.: Likelihood-free Bayesian inference for alpha-stable models. Comput. Stat. Data Anal. 56(11), 3743–3756 (2012)
Press, S.: Multivariate stable distributions. J. Multivar. Anal. 2(4), 444–462 (1972)
Rosadi, D.: Testing for independence in heavy-tailed time series using the codifference function. Comput. Stat. Data Anal. 53, 4516–4529 (2009)
Rosadi, D., Deistler, M.: Estimating the codifference function of linear time series models with infinite variance. Metrika: Int. J. Theor. Appl. Stat. 73(3), 395–429 (2011)
Samorodnitsky, G., Taqqu, M.S.: Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, New York (1994)
Stoyanov, S.V., Samorodnitsky, G., Rachev, S., Ortobelli, S.: Computing the portfolio conditional Value-at-Risk in the alpha-stable case. Probab. Math. Stat. 26, 1–22 (2006)
Tsai, H., Chan, K.S.: A note on non-negative ARMA processes. J. Time Ser. Anal. 28, 350–360 (2007)
Weir, A.J.: Lebesgue Integration and Measure. Cambridge University Press, Cambridge (1973)
Weron, A.: Stable processes and measures; a survey. In: Szynal, D., Weron, A. (eds.) Probability Theory on Vector Spaces III, pp. 306–364. Springer, Heidelberg (1984). https://doi.org/10.1007/BFb0099806
Williams, K.S.: The nth power of a 2 × 2 matrix. Math. Mag. 65(5), 336 (1992)
Wylomańska, A., Chechkin, A., Sokolov, I., Gajda, J.: Codifference as a practical tool to measure interdependence. Physica A 421, 412–429 (2015)
Żak, G., Obuchowski, J., Wylomańska, A., Zimroz, R.: Application of ARMA modelling and alpha-stable distribution for local damage detection in bearings. Diagnostyka 15, 01 (2014)
Żak, G., Wylomańska, A., Zimroz, R.: Data driven iterative vibration signal enhancement strategy using alpha stable distribution. Shock Vib. 2017, 11 (2017)
Żak, G., Wylomańska, A., Zimroz, R.: Periodically impulsive behaviour detection in noisy observation based on generalised fractional order dependency map. Appl. Acoust. 144, 31–39 (2019)
Zolotarev, V.M.: One-dimensional stable distributions. Translations of Mathematical Monographs. Providence: American Mathematical Society (1986)
How to Describe the Linear Dependence for Heavy-Tailed Distributed Data

Aleksandra Grzesiek¹, Anna Michalak², and Agnieszka Wylomańska¹(✉)

¹ Faculty of Pure and Applied Mathematics, Hugo Steinhaus Center, Wroclaw University of Science and Technology, Wroclaw, Poland
{aleksandra.grzesiek,agnieszka.wylomanska}@pwr.edu.pl
² Faculty of Geoengineering, Mining and Geology, Wroclaw University of Science and Technology, Wroclaw, Poland
[email protected]

Abstract. Many real datasets exhibit a non-Gaussian distribution, which is mainly manifested by impulsive behavior that is hardly visible in Gaussian-based models. One can recognize such behavior in both one-dimensional and multi-dimensional datasets. In this paper, we study the problem of linear dependence for two-dimensional non-Gaussian (infinite-variance) random variables. In the Gaussian case, the perfect measure is the correlation, which clearly indicates the linear dependence between the considered random variables. However, when the random variables are infinite-variance distributed, alternative dependence measures need to be considered. In this paper, we recall the best-known alternative dependence measures which are adequate for infinite-variance systems, and we calculate the selected measures for two example two-dimensional random variables which are linearly dependent. Moreover, by using Monte Carlo simulations, we check how the empirical dependence measures tend to their theoretical values. In the simulation study, we check how the parameters of the considered random variables influence the behavior of the empirical correlation and the empirical scov measure. The results presented in the simulation study clearly indicate that the empirical correlation should not be considered as the measure of dependence for infinite-variance random vectors.

Keywords: α-stable distribution · Linear dependence · Alternative dependence measures · Estimation · Monte Carlo approach
1 Introduction
In this paper, we study the linear dependence of two random variables. The problem is relatively easy when the random variables under consideration are second-order, i.e., when their variance is finite. The simplest case is when both of them are Gaussian distributed. In that case, the most straightforward and effective tool for checking whether linear dependence between the random variables exists is the correlation coefficient. This statistic clearly indicates whether there is a dependence between the considered variables and how strong it is. However, when the considered random variables come from a distribution with infinite variance, the problem is much more complicated. In that case, the classical correlation coefficient cannot be considered a proper tool for recognizing linear dependence because it is infinite. Thus, in the infinite-variance case, alternative dependence measures are considered. The most classical are covariation [1–3], codifference [2–7] and fractional lower order covariance (FLOC) [8–10]. In the literature, one can also find modified versions of the mentioned measures of dependence [11,12], as well as theoretical considerations of the alternative measures of dependence for heavy-tailed-based processes [5,6,11–14]. It should be highlighted that the mentioned dependence measures can give different information about the considered random variables.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 69–92, 2022. https://doi.org/10.1007/978-3-030-82110-4_4

It has been observed that the classical Gaussian distribution is inappropriate for datasets which exhibit impulsive and non-symmetric characteristics. Non-Gaussian distributions and models can be found in various applications, such as finance [15,16], physics [17,18], the electricity market [19], technical diagnostics [20–22], geophysical science [23,24], telecommunication [25], speech signal analysis [26], medical signals [27] and many others.

As mentioned, the non-Gaussian behavior of real datasets is manifested by their impulsiveness. One of the classical distributions well suited to modeling such datasets is the α-stable one (also called stable). The α-stable distribution was introduced in 1925 by Paul Lévy [28]. Stable probability laws are important because, by the Generalized Central Limit Theorem, they attract distributions of sums of random variables with diverging variance, similarly to the Gaussian law, which attracts distributions with finite variance. The first widely known application of the stable distribution appeared in the famous work of Mandelbrot [29] on modeling financial time series, which are perfect examples of heavy-tailed datasets. In 1995, with the book of Shao and Nikias [1] (as well as their paper [30]), which defined the initial signal processing framework, and the book by Samorodnitsky and Taqqu [2], which provided a unified and solid mathematical treatment, this interest became public, and in recent years many journal and conference papers have appeared on various applications of the α-stable distribution and α-stable-based models. The α-stable distribution can be considered an extension of the Gaussian one because in a special case it reduces to this classical distribution. The α-stable distribution, in general, has infinite variance (except in the Gaussian case) and belongs to the so-called heavy-tailed class of distributions, for which large observations are more probable than in the Gaussian case. Thus it is suitable for modeling data with large observations. The α-stable distribution can be considered in a one-dimensional as well as a multi-dimensional version [31–34].

In this paper, we study the linear dependence for α-stable random variables. The main purpose of the paper is to indicate that in the considered infinite-variance case the classical measure, namely the correlation, is not effective when the data are dependent. In that case the non-Gaussian behavior (mostly manifested by large observations in the time series) influences the estimator of the classical measure in such a way that the information about the dependence may be disturbed. Here we consider two example cases in which linear dependence between the random variables exists. We demonstrate how the selected theoretical alternative dependence measures behave for the considered models. Moreover, we check how the empirical alternative dependence measures tend to their theoretical counterparts. In the simulation part, we make a comparative study and check how the parameters of the considered models and the sample length influence the results for two dependence measures, namely the empirical correlation and the empirical scov measure (which is defined in the following sections). It should be highlighted that, although the theoretical correlation coefficient is not defined for α-stable random variables, the empirical correlation (based on real data vectors) can always be calculated, and in many research papers this measure is still considered the appropriate one for linear dependence analysis. In the simulation study, we indicate that the empirical correlation should not be used to describe linear dependence for infinite-variance random variables.

This paper can be considered a guide for practitioners who analyze data in the context of their dependency. We indicate here that even if the considered time series belongs to the heavy-tailed family of distributions, the classical measure of dependence can be used; however, we need to be careful with the conclusions. In that case we suggest considering other dependency measures, which may indicate the linear dependence more clearly when the data are heavy-tailed. This is the main message of the paper. The proposed dependency measures, adequate for non-Gaussian distributed data, can be applied to any bi-dimensional vectors with linear dependence between the components. Even the simple models considered in this paper could be used, for instance, to describe financial data, like the metal prices and currency exchange rate data considered in [35], which exhibit strongly non-Gaussian behavior.

The rest of the paper is organized as follows: in Sect. 2 we define the one- and multidimensional α-stable distribution and the alternative dependence measures considered in this paper. In Sect. 3 we consider two exemplary two-dimensional α-stable random variables with linear dependence between the components and calculate the theoretical alternative dependence measures. Moreover, by using Monte Carlo simulations we check how the empirical alternative dependence measures tend to their theoretical counterparts as a function of the stability index α. In the Simulation Study, we analyze the empirical correlation and the empirical scov measure for the example two-dimensional α-stable random variables for different values of the parameters. In this part we indicate that the empirical correlation cannot be considered as the dependence measure for identifying linear dependence. The last section concludes the paper.
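The statements that the α-stable law reduces to the Gaussian one in a special case and has infinite variance otherwise can be illustrated by simulation. The sketch below is not taken from the paper: it assumes the Chambers-Mallows-Stuck representation for a symmetric α-stable variable with unit scale, and the function names, seed and sample sizes are illustrative. Under this parametrization the case α = 2 is Gaussian with variance 2, while for α < 2 the running sample variance does not settle as the sample grows.

```python
import math
import random


def sample_symmetric_stable(alpha, rng):
    """One draw from a symmetric alpha-stable law with unit scale
    (Chambers-Mallows-Stuck representation, assumed here)."""
    u = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    w = rng.expovariate(1.0)
    while w == 0.0:  # guard against a zero exponential draw
        w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)  # Cauchy case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def second_moment(alpha, n, seed=0):
    """Empirical second moment of n simulated symmetric alpha-stable draws."""
    rng = random.Random(seed)
    return sum(sample_symmetric_stable(alpha, rng) ** 2 for _ in range(n)) / n


if __name__ == "__main__":
    for alpha in (2.0, 1.5):
        for n in (1_000, 10_000, 100_000):
            print(alpha, n, second_moment(alpha, n))
```

For α = 2 the printed values stay close to 2, whereas for α = 1.5 they are driven by a few very large observations, which is the behavior that makes moment-based statistics such as the empirical correlation unreliable.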
2 One- and Multidimensional α-Stable Distribution
The α-stable random variables, also called stable, can be characterized in four equivalent ways, see [2,36,37]. According to one of the definitions, the univariate
A. Grzesiek et al.
α-stable distribution approximates the distribution of the normalized sums of independent and identically distributed (i.i.d.) random variables, which leads to the formulation of the Generalized Central Limit Theorem, see [2]. Since the α-stable cumulative distribution function and the α-stable probability density function are not given in closed form in the general case, the distribution of an α-stable random variable X is commonly defined using the characteristic function of the following form

$$
E[\exp\{i\theta X\}] =
\begin{cases}
\exp\left\{-\sigma^{\alpha}|\theta|^{\alpha}\left(1 - i\beta\,\mathrm{sign}(\theta)\tan(\pi\alpha/2)\right) + i\mu\theta\right\} & \text{for } \alpha \neq 1,\\
\exp\left\{-\sigma|\theta|\left(1 + i\beta\,\tfrac{2}{\pi}\,\mathrm{sign}(\theta)\log|\theta|\right) + i\mu\theta\right\} & \text{for } \alpha = 1,
\end{cases}
\tag{1}
$$

where the parameter α ∈ (0, 2] is a stability index, σ > 0 is a scale parameter, β ∈ [−1, 1] is a skewness parameter, and μ ∈ ℝ is a shift parameter. For β = 0 and μ = 0 the α-stable distribution is said to be symmetric and the characteristic function given in Eq. (1) takes a simpler form as a real-valued function. It is worth mentioning that for α = 2 the α-stable distribution becomes Gaussian with the mean equal to μ and the standard deviation equal to √2 σ, whereas for α < 2 the distribution has so-called heavy tails, which converge to zero slower than exponentially. Moreover, for an α-stable random variable with α < 2 the second moment is infinite. The multivariate α-stable distribution, that is, the distribution of an α-stable d-dimensional random vector X = (X1, . . . , Xd), can be defined similarly to the univariate case using the following characteristic function [2,38]

$$
E[\exp\{i\langle\theta, X\rangle\}] =
\begin{cases}
\exp\left\{-\int_{S_d} |\langle\theta, s\rangle|^{\alpha}\left(1 - i\,\mathrm{sign}(\langle\theta, s\rangle)\tan(\pi\alpha/2)\right)\Gamma(ds) + i\langle\theta, \mu_0\rangle\right\} & \text{for } \alpha \neq 1,\\
\exp\left\{-\int_{S_d} |\langle\theta, s\rangle|\left(1 + i\,\tfrac{2}{\pi}\,\mathrm{sign}(\langle\theta, s\rangle)\log|\langle\theta, s\rangle|\right)\Gamma(ds) + i\langle\theta, \mu_0\rangle\right\} & \text{for } \alpha = 1,
\end{cases}
\tag{2}
$$
where ⟨·, ·⟩ is the inner product. The definition of the multivariate α-stable distribution involves the stability index α ∈ (0, 2], the shift vector μ0 ∈ ℝ^d and a finite measure Γ(·) on the unit sphere in ℝ^d denoted as S_d. The measure Γ(·) is called a spectral measure, and it includes information about the skewness and the shape of the distribution. The pair (Γ, μ0) is unique and together with the parameter α fully describes the multidimensional α-stable distribution. Moreover, the α-stable random vector X is symmetric if and only if μ0 = 0_d and the spectral measure Γ(·) is symmetric. As in the univariate case, for symmetric α-stable random vectors the characteristic function given in Eq. (2) simplifies to a real-valued function. An example of a multivariate symmetric α-stable distribution is the sub-Gaussian one, which arises as the distribution of a combination of a zero-mean Gaussian vector in ℝ^d denoted as G = (G1, G2, . . . , Gd) and a random variable A, namely

$$
X = (X_1, \dots, X_d) = (A^{1/2} G_1, A^{1/2} G_2, \dots, A^{1/2} G_d),
\tag{3}
$$
How to Describe the Linear Dependence for Heavy-Tailed Distributed Data
where A is α/2-stable with $\sigma = (\cos(\pi\alpha/4))^{2/\alpha}$, β = 1 and μ = 0. For the sub-Gaussian random vector the characteristic function given in Eq. (2) takes the following form [2]

$$
E[\exp\{i\langle\theta, X\rangle\}] = \exp\left\{-\left|\frac{1}{2}\sum_{i=1}^{d}\sum_{j=1}^{d}\theta_i\theta_j R_{ij}\right|^{\alpha/2}\right\},
\tag{4}
$$

where R_ij denotes the covariance between the random variables G_i and G_j. In this paper, we focus on the one- and two-dimensional α-stable distribution. Moreover, in the models considered in Sects. 3 and 4 we assume that the distribution is symmetric. In the case of α = 2, the α-stable random vectors are Gaussian, and their dependence can be described using the covariance or correlation function, denoted as cov and corr, respectively. However, for α < 2 the measures based on the second moment are not defined and need to be replaced by alternative functions adequate to the general α-stable case. The first function considered in this context is the covariation, introduced in [39] for symmetric α-stable random vectors. The measure is presented in Definition 1 given below.

Definition 1 [2,36]. Let (X1, X2) be a bidimensional symmetric α-stable random vector with 1 < α < 2 and with the spectral measure Γ(·). The covariation of X1 on X2 is the real number defined as

$$
\mathrm{CV}(X_1, X_2) = \int_{S_2} s_1 s_2^{\langle\alpha-1\rangle}\,\Gamma(ds),
$$
where $a^{\langle p\rangle} = |a|^{p}\,\mathrm{sign}(a)$. The covariation function shares some properties of the classical covariance function and can be treated as its generalization because for α = 2 it reduces to the covariance up to a constant. For jointly independent components of the random vector (X1, X2) the covariation function is equal to zero. However, the converse implication is not true. In general, the covariation function is additive only in the first argument. It becomes additive in the second argument only when the random variables being summed are independent. One important disadvantage of this measure is that the covariation is not symmetric in its arguments. As a consequence, the covariation coefficient, also called the normalized covariation [12,40], defined as

$$
\mathrm{NCV}(X_1, X_2) = \frac{\mathrm{CV}(X_1, X_2)}{\sigma_{X_2}^{\alpha}},
\tag{5}
$$

where $\sigma_{X_2}$ is the scale parameter of the random variable X2, is not symmetric and may be unbounded. The calculation of the empirical covariation measure involves the estimation of the spectral measure, see for example [41], which can pose a
substantial problem; therefore, in many applications the estimator of the normalized covariation is used instead, see [40,41]. To overcome the disadvantages of the covariation function, in [12,42] the authors proposed the signed symmetric covariation coefficient (denoted as scov), which satisfies most of the properties of the classical correlation coefficient. The measure is presented in Definition 2 given below.

Definition 2. Let (X1, X2) be a bidimensional symmetric α-stable random vector with 1 < α < 2 and with the spectral measure Γ(·). The signed symmetric covariation coefficient between X1 and X2 is defined as

$$
\mathrm{scov}(X_1, X_2) = \kappa_{X_1,X_2}\left|\frac{\mathrm{CV}(X_1, X_2)\,\mathrm{CV}(X_2, X_1)}{\sigma_{X_1}^{\alpha}\,\sigma_{X_2}^{\alpha}}\right|^{1/2},
$$

where

$$
\kappa_{X_1,X_2} =
\begin{cases}
\mathrm{sign}(\mathrm{CV}(X_1, X_2)) & \text{if } \mathrm{sign}(\mathrm{CV}(X_1, X_2)) = \mathrm{sign}(\mathrm{CV}(X_2, X_1)),\\
-1 & \text{if } \mathrm{sign}(\mathrm{CV}(X_1, X_2)) = -\mathrm{sign}(\mathrm{CV}(X_2, X_1)).
\end{cases}
$$
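As an illustration of how Definition 2 can be turned into a statistic, the sketch below builds a plug-in estimate from the moment representation of the normalized covariation, NCV(X1, X2) = E[X1 sign(X2)] / E|X2|, which holds for symmetric α-stable vectors with 1 < α < 2 (fractional lower-order moments with p = 1). The function names `ncv_hat` and `scov_hat` are hypothetical, and it is an assumption that this plug-in form resembles the estimator of [12]; that reference should be consulted for the exact definition.

```python
import numpy as np


def ncv_hat(x1, x2):
    """FLOM-type (p = 1) estimate of NCV(X1, X2) = CV(X1, X2) / sigma_{X2}^alpha."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.sum(x1 * np.sign(x2)) / np.sum(np.abs(x2))


def scov_hat(x1, x2):
    """Plug-in estimate of scov(X1, X2) following the structure of Definition 2."""
    n12 = ncv_hat(x1, x2)
    n21 = ncv_hat(x2, x1)
    # kappa: the common sign if both normalized covariations agree, -1 otherwise.
    kappa = np.sign(n12) if np.sign(n12) == np.sign(n21) else -1.0
    # NCV(X1,X2) * NCV(X2,X1) equals CV(X1,X2) CV(X2,X1) / (sigma_X1^a sigma_X2^a).
    return kappa * np.sqrt(abs(n12 * n21))
```

By construction the statistic is symmetric in its arguments; in finite samples it is not guaranteed to stay inside [−1, 1].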
The properties of the signed symmetric covariation coefficient and the formula for the scov estimator are given in [12]. Here we mention that scov is symmetric in its arguments, and it is bounded, namely −1 ≤ scov(X1, X2) ≤ 1. Moreover, |scov(X1, X2)| = 1 if and only if X2 = λX1 for λ ∈ ℝ and λ ≠ 0. If the components of the random vector (X1, X2) are independent, then scov(X1, X2) = 0. For α = 2, scov(X1, X2) reduces to the classical correlation coefficient. Another measure of dependence that can replace the covariance for α < 2 is the codifference function, which is defined not only in the case of 1 < α < 2 but for all values of the stability index, see [2]. The basic definition of this measure is given below in Definition 3.

Definition 3 [2]. Let (X1, X2) be a bidimensional symmetric α-stable random vector. Then the codifference between X1 and X2 has the following form

$$
\mathrm{CD}(X_1, X_2) = \sigma_{X_1}^{\alpha} + \sigma_{X_2}^{\alpha} - \sigma_{X_1 - X_2}^{\alpha},
$$
where $\sigma_{X_1 - X_2}$, $\sigma_{X_1}$ and $\sigma_{X_2}$ denote the scale parameters of X1 − X2, X1 and X2, respectively. By generalizing Definition 3, the codifference can be expressed in terms of the characteristic function. Therefore, the measure can be used to quantify the dependence for a more general class of random variables, e.g., infinitely divisible random variables.

Definition 4 [4]. Let us take a random vector (X1, X2). The codifference between X1 and X2 is given by

$$
\mathrm{CD}(X_1, X_2) = \log E\exp\{i(X_1 - X_2)\} - \log E\exp\{iX_1\} - \log E\exp\{-iX_2\}.
$$
The codifference function, similarly to the covariation measure, reduces to the classical covariance in the Gaussian case. For a random vector (X1, X2) with independent components the codifference function is equal to zero. The converse implication is true only for 0 < α < 1 and α = 2. However, in contrast to the covariation function, the codifference is symmetric in its arguments for symmetric random vectors (X1, X2). Moreover, the codifference function is bounded, i.e.,

$$
\begin{aligned}
0 \le \mathrm{CD}(X_1, X_2) &\le \sigma_{X_1}^{\alpha} + \sigma_{X_2}^{\alpha} && \text{for } 0 < \alpha \le 1,\\
(1 - 2^{\alpha-1})(\sigma_{X_1}^{\alpha} + \sigma_{X_2}^{\alpha}) \le \mathrm{CD}(X_1, X_2) &\le \sigma_{X_1}^{\alpha} + \sigma_{X_2}^{\alpha} && \text{for } 1 < \alpha \le 2.
\end{aligned}
$$
Consequently, one can consider a normalized version of the codifference function, given by [43]

$$
\mathrm{NCD}(X_1, X_2) = \frac{\mathrm{CD}(X_1, X_2)}{\sigma_{X_1}^{\alpha} + \sigma_{X_2}^{\alpha}},
\tag{6}
$$

which inherits all the mentioned properties of the codifference function except for the upper and lower bounds, which are the following

$$
\begin{aligned}
0 \le \mathrm{NCD}(X_1, X_2) &\le 1 && \text{for } 0 < \alpha \le 1,\\
1 - 2^{\alpha-1} \le \mathrm{NCD}(X_1, X_2) &\le 1 && \text{for } 1 < \alpha \le 2.
\end{aligned}
$$
The estimators of the codifference and the normalized codifference functions are based on the estimation of the characteristic function, see [5,43].
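To make the characteristic-function route concrete, the following is a minimal plug-in sketch that replaces each expectation in Definition 4 by an empirical mean. The names `ecf`, `codifference_hat` and `ncd_hat` are hypothetical, and the estimators studied in [5,43] may differ in detail (for example in how the argument of the characteristic function is chosen). The normalization assumes symmetric α-stable margins, for which log E exp{iX} = −σ_X^α.

```python
import numpy as np


def ecf(x, theta):
    """Empirical characteristic function of the sample x evaluated at theta."""
    return np.mean(np.exp(1j * theta * np.asarray(x, dtype=float)))


def codifference_hat(x1, x2):
    """Plug-in version of Definition 4 with expectations replaced by sample means."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    cd = np.log(ecf(x1 - x2, 1.0)) - np.log(ecf(x1, 1.0)) - np.log(ecf(x2, -1.0))
    # For symmetric vectors the theoretical value is real; drop the sampling noise
    # in the imaginary part.
    return cd.real


def ncd_hat(x1, x2):
    """Plug-in version of Eq. (6), using -log|ecf(1)| as the estimate of sigma^alpha."""
    s1 = -np.log(ecf(x1, 1.0)).real
    s2 = -np.log(ecf(x2, 1.0)).real
    return codifference_hat(x1, x2) / (s1 + s2)
```

In the Gaussian case (α = 2) with σ1 = 1, σ2 = 0.5 and Y = 1.5X + S, the output should be close to CD = 3 and NCD = 3/3.5, which gives a quick sanity check of the implementation.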
3 Example Two-Dimensional α-Stable Random Variables
In this section, we consider two example models based on the two-dimensional α-stable distribution. For both models, we derive the theoretical formulas for all dependence measures mentioned in Sect. 2, namely the covariation, the normalized covariation, the signed symmetric covariation coefficient, the codifference, and the normalized codifference.
3.1 Model I
Let us consider the random vector (X, Y) with the random variable Y satisfying the following equation

$$
Y = aX + S,
\tag{7}
$$

where a ∈ ℝ \ {0} is a constant and X and S are one-dimensional independent symmetric α-stable random variables with the scale parameters equal to σ1 and σ2, respectively.
[Figure 1: two scatter plots, panels (a) and (b), of yi against xi.]
Fig. 1. Example realizations of the random vector (X, Y ) considered as Model I. The parameters are a = 1.5, σ1 = 1, σ2 = 0.5, α = 1.2 for panel (a) and α = 1.8 for panel (b). The length of the sample is equal to n = 1000.
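Realizations like those in Fig. 1 can be generated with a few lines of code. The sketch below is not taken from the paper: it assumes the Chambers-Mallows-Stuck representation for symmetric α-stable variables, and the helper names `sas_rvs` and `simulate_model_one` are hypothetical.

```python
import numpy as np


def sas_rvs(alpha, sigma, size, rng):
    """Symmetric alpha-stable sample (beta = 0, mu = 0), Chambers-Mallows-Stuck method."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    w = rng.exponential(1.0, size)                 # unit-mean exponential
    if alpha == 1.0:
        return sigma * np.tan(u)                   # Cauchy case
    x = (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
         * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return sigma * x


def simulate_model_one(a, sigma1, sigma2, alpha, n, rng):
    """Draw n pairs (x_i, y_i) from Model I, Y = a X + S, see Eq. (7)."""
    x = sas_rvs(alpha, sigma1, n, rng)
    s = sas_rvs(alpha, sigma2, n, rng)
    return x, a * x + s
```

For example, `simulate_model_one(1.5, 1.0, 0.5, 1.2, 1000, np.random.default_rng(0))` uses the parameter values quoted for panel (a); the seed is arbitrary, so the draw will not reproduce the published figure point by point.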
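In Fig. 1 we present example realizations of the random vector (X, Y) denoted by {(xi, yi) for i = 1, . . . , n}. Panels (a) and (b) correspond to α = 1.2 and α = 1.8, respectively. In both cases one can recognize the linear dependence between the random variables X and Y. To quantify the dependence we use the functions presented in Sect. 2. The theoretical formulas for the considered measures in this case are as follows

$$
\mathrm{CV}(X, Y) = a^{\langle\alpha-1\rangle}\sigma_1^{\alpha},
\tag{8}
$$

$$
\mathrm{CV}(Y, X) = a\,\sigma_1^{\alpha},
\tag{9}
$$

$$
\mathrm{NCV}(X, Y) = \frac{a^{\langle\alpha-1\rangle}\sigma_1^{\alpha}}{|a|^{\alpha}\sigma_1^{\alpha} + \sigma_2^{\alpha}},
\tag{10}
$$

$$
\mathrm{NCV}(Y, X) = a,
\tag{11}
$$

$$
\mathrm{scov}(X, Y) = \mathrm{scov}(Y, X) = \mathrm{sign}(a)\left(\frac{|a|^{\alpha}\sigma_1^{\alpha}}{|a|^{\alpha}\sigma_1^{\alpha} + \sigma_2^{\alpha}}\right)^{1/2},
\tag{12}
$$

$$
\mathrm{CD}(X, Y) = \mathrm{CD}(Y, X) = \sigma_1^{\alpha}\left(1 + |a|^{\alpha} - |1 - a|^{\alpha}\right),
\tag{13}
$$

$$
\mathrm{NCD}(X, Y) = \mathrm{NCD}(Y, X) = \frac{\sigma_1^{\alpha}\left(1 + |a|^{\alpha} - |1 - a|^{\alpha}\right)}{\sigma_1^{\alpha}\left(1 + |a|^{\alpha}\right) + \sigma_2^{\alpha}}.
\tag{14}
$$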
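The closed-form expressions in Eqs. (8)–(14) are easy to evaluate numerically, which is convenient when comparing them with simulation output. The helper below is a direct transcription of those formulas; the function names and the dictionary keys are illustrative choices, not notation from the paper.

```python
import numpy as np


def signed_power(a, p):
    """Signed power a^<p> = |a|^p * sign(a) used in the covariation."""
    return np.abs(a) ** p * np.sign(a)


def model_one_measures(a, sigma1, sigma2, alpha):
    """Theoretical dependence measures of Model I, Eqs. (8)-(14)."""
    s1 = sigma1 ** alpha
    s2 = sigma2 ** alpha
    scale_y = np.abs(a) ** alpha * s1 + s2            # sigma_Y^alpha
    cv_xy = signed_power(a, alpha - 1.0) * s1          # Eq. (8)
    cd = s1 * (1.0 + np.abs(a) ** alpha - np.abs(1.0 - a) ** alpha)  # Eq. (13)
    return {
        "CV(X,Y)": cv_xy,
        "CV(Y,X)": a * s1,                             # Eq. (9)
        "NCV(X,Y)": cv_xy / scale_y,                   # Eq. (10)
        "NCV(Y,X)": a,                                 # Eq. (11)
        "scov": np.sign(a) * np.sqrt(np.abs(a) ** alpha * s1 / scale_y),  # Eq. (12)
        "CD": cd,
        "NCD": cd / (s1 + scale_y),                    # Eq. (14)
    }
```

With a = 1.5, σ1 = 1, σ2 = 0.5 and α = 2, the scov value equals 1.5/√2.5, the correlation coefficient of the corresponding Gaussian model, in line with the remark that scov reduces to the classical correlation for α = 2.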
[Figure 2: six boxplot panels (a)–(f) of the estimated measures against the stability index α = 1.1, . . . , 2.0; panels (c)–(f) also mark the theoretical values.]
Fig. 2. Comparison of the theoretical formulas of the dependence measures with the performance of the estimators for Model I. Panels (a)–(f) correspond to cov(X, Y ), corr(X, Y ), NCV(X, Y ), scov(X, Y ), CD(X, Y ) and NCD(X, Y ), respectively. The parameters are a = 1.5, σ1 = 1, σ2 = 0.5. The stability index changes from α = 1.1 to α = 2.0. The length of the sample is equal to n = 1000 and the number of repetitions is equal to M = 1000.
Figure 2 presents the comparison of the theoretical expressions given in Eqs. (8)–(14) (the corresponding proofs are presented in Appendix A, see Lemmas 1–5) with the performance of the estimators. Panels (c)–(f) correspond to the normalized covariation NCV(X, Y), the signed symmetric covariation coefficient scov(X, Y), the codifference CD(X, Y) and the normalized codifference NCD(X, Y), respectively. Additionally, in panels (a)–(b) we present the values taken by the covariance and correlation estimators, for which the theoretical equivalents are not defined but the empirical functions can still be calculated from the data. The empirical values are presented as boxplots for M = 1000 repetitions and for various values of the stability index from α = 1.1 to α = 2.0. The sample length is equal to n = 1000. In all cases, the medians of the empirical values are close to the theoretical values of the considered measures. Note that the variance of the estimators decreases as the stability index α increases. Moreover, the values taken by the covariance and correlation estimators are characterized by greater variability than the values taken by the other estimators, which is evidenced by a large number of outliers. This is an argument against using these two measures for models based on the α-stable distribution. A further discussion on this topic is presented in Sect. 4.
3.2 Model II
Let us consider the sub-Gaussian random vector in ℝ² denoted as (X, Y) with the characteristic function given in Eq. (4), the stability index α and the parameters R11, R22 and R12. Moreover, let us assume that

$$
Z = aY + S,
\tag{15}
$$

where a ∈ ℝ \ {0} is a constant, (X, Y) and S are independent, and S is a symmetric α-stable random variable with the scale parameter equal to σ. For this model, we are interested in describing the dependence between the random variables X and Z. In Fig. 3 we present example realizations of the random vector (X, Z), denoted by {(xi, zi) for i = 1, . . . , n}. Similarly to the previous model, we present the cases α = 1.2 and α = 1.8 in panels (a) and (b), respectively. Again, we quantify the dependence using the measures presented in Sect. 2. The theoretical formulas for the considered measures in this case are as follows

$$
\mathrm{CV}(X, Z) = 2^{-\alpha/2}\,a^{\langle\alpha-1\rangle} R_{12} R_{22}^{(\alpha-2)/2},
\tag{16}
$$

$$
\mathrm{CV}(Z, X) = 2^{-\alpha/2}\,a R_{12} R_{11}^{(\alpha-2)/2},
\tag{17}
$$

$$
\mathrm{NCV}(X, Z) = \frac{2^{-\alpha/2}\,a^{\langle\alpha-1\rangle} R_{12} R_{22}^{(\alpha-2)/2}}{2^{-\alpha/2}|a|^{\alpha} R_{22}^{\alpha/2} + \sigma^{\alpha}},
\tag{18}
$$
[Figure 3: two scatter plots, panels (a) and (b), of zi against xi.]
Fig. 3. Example realizations of the random vector (X, Z) considered as Model II. The parameters are a = 2.5, σ = 0.1, R11 = 1, R22 = 0.5, R12 = 0.5, α = 1.2 for panel (a) and α = 1.8 for panel (b). The length of the sample is equal to n = 1000.
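$$
\mathrm{NCV}(Z, X) = \frac{a R_{12}}{R_{11}},
\tag{19}
$$

$$
\mathrm{scov}(X, Z) = \mathrm{scov}(Z, X) = \mathrm{sign}(R_{12}a)\left(\frac{2^{-\alpha}|a|^{\alpha} R_{12}^{2}\,(R_{11}R_{22})^{(\alpha-2)/2}}{2^{-\alpha/2} R_{11}^{\alpha/2}\left(2^{-\alpha/2}|a|^{\alpha} R_{22}^{\alpha/2} + \sigma^{\alpha}\right)}\right)^{1/2},
\tag{20}
$$

$$
\mathrm{CD}(X, Z) = \mathrm{CD}(Z, X) = 2^{-\alpha/2}\left(R_{11}^{\alpha/2} + |a|^{\alpha} R_{22}^{\alpha/2} - \left|R_{11} - 2aR_{12} + a^{2}R_{22}\right|^{\alpha/2}\right),
\tag{21}
$$

$$
\mathrm{NCD}(X, Z) = \mathrm{NCD}(Z, X) = \frac{2^{-\alpha/2}\left(R_{11}^{\alpha/2} + |a|^{\alpha} R_{22}^{\alpha/2} - \left|R_{11} - 2aR_{12} + a^{2}R_{22}\right|^{\alpha/2}\right)}{2^{-\alpha/2} R_{11}^{\alpha/2} + 2^{-\alpha/2}|a|^{\alpha} R_{22}^{\alpha/2} + \sigma^{\alpha}}.
\tag{22}
$$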
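As for Model I, the Model II formulas can be evaluated directly. The sketch below transcribes Eqs. (16)–(22), with the exponent α/2 on the term |R11 − 2aR12 + a²R22| as implied by the characteristic function in Eq. (4). The function and key names are illustrative only.

```python
import numpy as np


def signed_power(a, p):
    """Signed power a^<p> = |a|^p * sign(a)."""
    return np.abs(a) ** p * np.sign(a)


def model_two_measures(a, sigma, r11, r22, r12, alpha):
    """Theoretical dependence measures of Model II, Eqs. (16)-(22)."""
    c = 2.0 ** (-alpha / 2.0)
    scale_x = c * r11 ** (alpha / 2.0)                                   # sigma_X^alpha
    scale_z = c * np.abs(a) ** alpha * r22 ** (alpha / 2.0) + sigma ** alpha  # sigma_Z^alpha
    cv_xz = c * signed_power(a, alpha - 1.0) * r12 * r22 ** ((alpha - 2.0) / 2.0)
    cv_zx = c * a * r12 * r11 ** ((alpha - 2.0) / 2.0)
    quad = r11 - 2.0 * a * r12 + a ** 2 * r22          # theta' R theta for theta = (1, -a)
    cd = c * (r11 ** (alpha / 2.0) + np.abs(a) ** alpha * r22 ** (alpha / 2.0)
              - np.abs(quad) ** (alpha / 2.0))
    scov_sq = (c ** 2 * np.abs(a) ** alpha * r12 ** 2
               * (r11 * r22) ** ((alpha - 2.0) / 2.0)) / (scale_x * scale_z)
    return {
        "CV(X,Z)": cv_xz,                              # Eq. (16)
        "CV(Z,X)": cv_zx,                              # Eq. (17)
        "NCV(X,Z)": cv_xz / scale_z,                   # Eq. (18)
        "NCV(Z,X)": cv_zx / scale_x,                   # Eq. (19), equals a*R12/R11
        "scov": np.sign(r12 * a) * np.sqrt(scov_sq),   # Eq. (20)
        "CD": cd,                                      # Eq. (21)
        "NCD": cd / (scale_x + scale_z),               # Eq. (22)
    }
```

Evaluating the formulas at α = 2 is a useful consistency check: the codifference becomes aR12 and scov becomes aR12 / √(R11 (a²R22 + 2σ²)), the covariance and correlation of the Gaussian limit of the formulas.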
Similarly to Model I, we compare the theoretical formulas of the dependence measures given in Eqs. (16)–(22) (the corresponding proofs are given in Appendix A, see Lemmas 6–10) with the values taken by the estimators. The comparison is presented in Fig. 4, where panels (c)–(f) correspond to the normalized covariation NCV(X, Z), the signed symmetric covariation coefficient scov(X, Z), the codifference CD(X, Z) and the normalized codifference NCD(X, Z), respectively, and panels (a)–(b) present the values taken by the covariance and correlation estimators (without the theoretical equivalents, which are not defined). The number of repetitions is equal to M = 1000 and the length of the sample is n = 1000. We consider various values of the stability index from α = 1.1 to α = 1.9. Similarly to Model I, the medians of the values taken by the estimators are close to the theoretical ones. As the parameter α decreases, the variance of the estimators increases, and the variability of the values is greatest for the covariance and correlation estimators (a large number of outliers). A further discussion on this topic is presented in Sect. 4.
[Figure 4: six boxplot panels (a)–(f) of the estimated measures against the stability index α = 1.1, . . . , 1.9; panels (c)–(f) also mark the theoretical values.]
Fig. 4. Comparison of the theoretical formulas of the dependence measures with the performance of the estimators for Model II. Panels (a)–(f) correspond to cov(X, Z), corr(X, Z), NCV(X, Z), scov(X, Z), CD(X, Z) and NCD(X, Z), respectively. The parameters are a = 2.5, σ = 0.1, R11 = 1, R22 = 0.5, R12 = 0.5. The stability index changes from α = 1.1 to α = 1.9. The length of the sample is equal to n = 1000 and the number of repetitions is equal to M = 1000.
4 Simulation Study
In this section, we consider the two example models described in Sects. 3.1 and 3.2. For both models, we compare two measures of dependence that take values in [−1, 1]: the correlation and the signed symmetric covariation coefficient. The scov measure is well defined for the considered models and properly quantifies the dependence present in the data. However, because correlation is used more often, we decided to show it as well and compare the results. It should be mentioned that the empirical correlation can always be calculated, but its theoretical equivalent exists only in the case α = 2.
4.1 Model I
In Fig. 5 example realizations of the random vector (X, Y) from Model I are presented. The columns show the data corresponding to the stability index α = 1.1, α = 1.5 and α = 2, respectively. The rows correspond to the cases a = −1, a = 0.01 and a = 1. The other parameters are σ1 = 1, σ2 = 0.5. The length of the sample is equal to n = 1000. In the first and third rows, linear dependence can be observed. To calculate the empirical counterpart of the scov measure we used the estimator presented in [12]. The empirical correlation is the classical statistic available in every statistical package (e.g., in Matlab it is the ‘corrcoef’ function). In Figs. 6–8 we present the comparison of two measures of dependence: scov(X, Y) (left panel) and corr(X, Y) (right panel). In all considered cases, the number of repetitions is M = 1000. For each repetition, we calculate both measures, scov(X, Y) and corr(X, Y). The center line corresponds to the median of the obtained values. The whiskers represent the first (Q1) and third (Q3) quartiles. They are calculated from all data, without removing the outliers. As was shown in Sect. 3.1, the medians of the empirical values are close to the theoretical ones. We therefore compare the quartiles Q1 and Q3 and, on this basis, show the differences between the two considered measures. Based on Figs. 6–8 we can analyze one measure of dispersion, namely the interquartile range (IQR). In Fig. 7 we can observe that when the scale parameter σ1 increases (σ2 = 0.5 and a = 1), the IQR of both measures decreases. The opposite behavior is observed in Fig. 8 when the parameter σ2 increases (σ1 = 1 and a = 1). The highest IQR is observed when the stability index α is 1.1, and it becomes smaller as α increases. When there is no dependence in the data (the parameter a = 0.01), we observe a smaller IQR for scov(X, Y) than for corr(X, Y).
For σ1 = 0.1 and a small sample size, the IQR of the scov(X, Y) measure is larger than that of corr(X, Y). For higher values of σ1, the opposite behavior can be observed (the IQR of scov(X, Y) is smaller than that of corr(X, Y)). When the sample length n increases, the IQR decreases in all analyzed cases. This means that the scov measure can be used to show that dependence in the data exists, and in most of the considered cases it describes the dependence in the data better.
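The Monte Carlo procedure described above (repeat the simulation M times, compute both statistics, then read off the median and the IQR) can be sketched as follows. This is an illustration under stated assumptions rather than the code used for Figs. 6–8: the sampler uses the Chambers-Mallows-Stuck representation, and `scov_hat` is a fractional lower-order moment plug-in (p = 1) that is assumed, not confirmed, to resemble the estimator of [12]. All function names are hypothetical.

```python
import numpy as np


def sas_rvs(alpha, sigma, size, rng):
    """Symmetric alpha-stable sample via the Chambers-Mallows-Stuck method."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    if alpha == 1.0:
        return sigma * np.tan(u)
    x = (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
         * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return sigma * x


def scov_hat(x1, x2):
    """FLOM-type plug-in for scov (assumed form, see the lead-in)."""
    n12 = np.sum(x1 * np.sign(x2)) / np.sum(np.abs(x2))
    n21 = np.sum(x2 * np.sign(x1)) / np.sum(np.abs(x1))
    kappa = np.sign(n12) if np.sign(n12) == np.sign(n21) else -1.0
    return kappa * np.sqrt(abs(n12 * n21))


def median_and_iqr(values):
    """Return (median, Q3 - Q1) computed from all values, outliers included."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return med, q3 - q1


def compare_measures(alpha, a, sigma1, sigma2, n, reps, seed=0):
    """Monte Carlo comparison of corr and scov for Model I, Y = a X + S."""
    rng = np.random.default_rng(seed)
    corr = np.empty(reps)
    scov = np.empty(reps)
    for m in range(reps):
        x = sas_rvs(alpha, sigma1, n, rng)
        y = a * x + sas_rvs(alpha, sigma2, n, rng)
        corr[m] = np.corrcoef(x, y)[0, 1]
        scov[m] = scov_hat(x, y)
    return {"corr": median_and_iqr(corr), "scov": median_and_iqr(scov)}
```

Calling `compare_measures(1.1, 1.0, 1.0, 0.5, 1000, 1000)` mirrors one configuration of the study (α = 1.1, a = 1, σ1 = 1, σ2 = 0.5, n = 1000, M = 1000); looping over α, n and the model parameters would give the quantities summarized in the figures, up to differences between this plug-in and the estimator of [12].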
Fig. 5. Example realizations of the random vector (X, Y) considered as Model I for different values of the parameters a and α. The other parameters are σ1 = 1, σ2 = 0.5.
[Figure 6: two panels, scov (left) and corr (right), plotted against α from 1.1 to 2 on a vertical scale from −1 to 1; legend: n = 50, 100, 1000 for each of a = −1, a = 0.01 and a = 1.]
Fig. 6. The comparison for two measures of dependence scov(X, Y ) (left panel) and corr(X, Y ) (right panel) for different values of a parameter and the different length n of the sample (see the legend). The other parameters are σ1 = 1, σ2 = 0.5 and the stability index α increases from 1.1 to 2.
[Figure 7: two panels, scov (left) and corr (right), plotted against α from 1.1 to 2 on a vertical scale from −1 to 1; legend: n = 50, 100, 1000 for each of σ1 = 0.1, σ1 = 1 and σ1 = 3.]
Fig. 7. The comparison for two measures of dependence scov(X, Y) (left panel) and corr(X, Y) (right panel) for different values of the σ1 parameter and different lengths n of the sample (see the legend). The other parameters are a = 1, σ2 = 0.5 and the stability index α increases from 1.1 to 2.
[Figure 8: two panels, scov (left) and corr (right), plotted against α from 1.1 to 2 on a vertical scale from 0 to 1; legend: n = 50, 100, 1000 for each of σ2 = 0.1, σ2 = 1 and σ2 = 3.]
Fig. 8. The comparison for two measures of dependence scov(X, Y ) (left panel) and corr(X, Y ) (right panel) for different values of σ2 parameter and the different length n of the sample (see the legend). The other parameters are a = 1, σ1 = 1 and the stability index α increases from 1.1 to 2.
4.2 Model II
In Fig. 9 example realizations of the random vector (X, Z) from Model II are presented. The columns show the data corresponding to the stability index α = 1.1, α = 1.5 and α = 1.9, respectively. Here lies a difference between Model I and Model II, because Model II is not well defined for α = 2. The rows, as in the previous model, correspond to a = −1, a = 0.01 and a = 1. The other parameters are σ = 0.1, R11 = 1, R12 = 0.5, R22 = 0.5 and the length of the sample is equal to n = 1000. In the first and third rows, the linear dependence can be observed.
Fig. 9. Exemplary realizations of the random vector (X, Z) considered as Model II for different values of a and α parameters (see labels). The other parameters are σ = 0.1, R11 = 1, R12 = 0.5, R22 = 0.5.
In Figs. 10–11 the comparison of two measures of dependence, scov(X, Z) (left panel) and corr(X, Z) (right panel), is presented. In all cases, the number of repetitions is M = 1000. Similarly to Model I, it was shown in Sect. 3.2 that the medians of the empirical measures are close to the theoretical values. Based on this, we compare scov(X, Z) and corr(X, Z) in the same way as in the previous case. In Fig. 11 we can observe that for different scale parameters σ (a = 1, R11 = 1, R12 = 0.5, R22 = 0.5), the IQR of each measure takes similar values. As in Model I, when the stability index α increases, the IQR decreases in all cases for both measures. The same behavior can be observed when the sample size n is increased. According to Figs. 10–11, when strong linearity exists in the data (a = 1 and a = −1), we observe a smaller interquartile range for scov(X, Z) than for corr(X, Z) in all cases. This means that scov(X, Z) is the better choice for analyzing data when linear dependence is expected.
[Figure 10: two panels, scov (left) and corr (right), plotted against α from 1.1 to 1.9 on a vertical scale from −1 to 1; legend: n = 50, 100, 1000 for each of a = −1, a = 0.01 and a = 1.]
Fig. 10. The comparison for two measures of dependence scov(X, Z) (left panel) and corr(X, Z) (right panel) for different values of the a parameter and different lengths n of the sample (see the legend). The other parameters are σ = 0.1, R11 = 1, R12 = 0.5, R22 = 0.5 and the stability index α increases from 1.1 to 1.9.
[Figure 11: two panels, scov (left) and corr (right), plotted against α from 1.1 to 1.9 on a vertical scale from −1 to 1; legend: n = 50, 100, 1000 for each of σ = 0.1, σ = 0.5 and σ = 1.]
Fig. 11. The comparison for two measures of dependence scov(X, Z) (left panel) and corr(X, Z) (right panel) for different values of σ parameter and the different length n of the sample (see the legend). The other parameters are a = 1, R11 = 1, R12 = 0.5, R22 = 0.5 and the stability index α increases from 1.1 to 1.9.
Summarizing this section, we can specify three main results. Firstly, the value of the stability index α affects the estimators of both measures in Model I and Model II. When α increases, the estimators of both measures have a smaller IQR.
Secondly, the sample length n also affects the estimators. In both Model I and Model II we can observe a stronger influence of the sample size on the scov measure than on corr. However, only in one case (for σ1 = 0.1) does the sample length n cause the scov measure to have a higher IQR than corr. In the other cases, it does not change which measure performs better. Thirdly, for both models, when strong linearity is present in the data, we observe a smaller IQR for the scov measure than for the correlation. However, if there is no dependence in the data, the results based on corr are better. This means that scov is the better choice when dependence in the analyzed data is expected.
5 Conclusions
This paper is devoted to the problem of measuring linear dependence when the random variables under consideration have infinite variance. We recall the best-known dependence measures that are properly defined for infinite-variance random variables. Then we calculate them for two example two-dimensional α-stable random variables with linear dependence. The α-stable distribution is considered here as the classical member of the infinite-variance class of distributions. Using Monte Carlo simulations, we check the effectiveness of the empirical alternative dependence measures with respect to the stability index α. In the simulation study, we consider two empirical dependence measures, namely the correlation and the scov measure. The theoretical correlation for the considered random variables is not defined; however, the empirical one can always be calculated. Moreover, there are research papers where the empirical correlation is used to recognize the linear dependence between two vectors of observations in the case of an infinite-variance distribution. Our simulation study clearly indicates that, in the cases considered in this paper, the empirical correlation is not as effective as the empirical alternative dependence measure (scov), which is properly defined for α-stable random variables.
Acknowledgment. AG and AW would like to acknowledge the support of the National Center of Science Opus Grant No. 2016/21/B/ST1/00929 “Anomalous diffusion processes and their applications in real data modelling”. AM would like to acknowledge the support of the EIT RawMaterials GmbH under Framework Partnership Agreement No. 18253 (OPMO - Operation monitoring of mineral crushing machinery).
Appendix A Theoretical Formulas of the Dependence Measures Presented in Sect. 3 for Model I and Model II

Lemma 1. For Model I defined in Eq. (7) the covariation function given in Definition 1 takes the following form

$$
\mathrm{CV}(X, Y) = a^{\langle\alpha-1\rangle}\sigma_1^{\alpha} \quad\text{and}\quad \mathrm{CV}(Y, X) = a\,\sigma_1^{\alpha}.
$$

Proof.

$$
\begin{aligned}
\mathrm{CV}(X, Y) &= \mathrm{CV}(X, aX + S) = \mathrm{CV}(X, aX) + \mathrm{CV}(X, S) = a^{\langle\alpha-1\rangle}\,\mathrm{CV}(X, X) = a^{\langle\alpha-1\rangle}\sigma_1^{\alpha},\\
\mathrm{CV}(Y, X) &= \mathrm{CV}(aX + S, X) = \mathrm{CV}(aX, X) + \mathrm{CV}(S, X) = a\,\mathrm{CV}(X, X) = a\,\sigma_1^{\alpha}.
\end{aligned}
$$

Lemma 2. For Model I defined in Eq. (7) the normalized covariation function given in Eq. (5) takes the following form

$$
\mathrm{NCV}(X, Y) = \frac{a^{\langle\alpha-1\rangle}\sigma_1^{\alpha}}{|a|^{\alpha}\sigma_1^{\alpha} + \sigma_2^{\alpha}} \quad\text{and}\quad \mathrm{NCV}(Y, X) = a.
$$

Proof. The expressions follow from the formulas for the covariation functions given in Lemma 1 and the fact that $\mathrm{CV}(X, X) = \sigma_1^{\alpha}$ and

$$
\mathrm{CV}(Y, Y) = \mathrm{CV}(aX + S, aX + S) = |a|^{\alpha}\,\mathrm{CV}(X, X) + \mathrm{CV}(S, S) = |a|^{\alpha}\sigma_1^{\alpha} + \sigma_2^{\alpha}.
$$

Lemma 3. For Model I defined in Eq. (7) the signed symmetric covariation coefficient given in Definition 2 takes the following form

$$
\mathrm{scov}(X, Y) = \mathrm{scov}(Y, X) = \mathrm{sign}(a)\left(\frac{|a|^{\alpha}\sigma_1^{\alpha}}{|a|^{\alpha}\sigma_1^{\alpha} + \sigma_2^{\alpha}}\right)^{1/2}.
$$

Proof. The expression follows from the formulas for the covariation functions in Lemma 1.

Lemma 4. For Model I defined in Eq. (7) the codifference function given in Definition 3 takes the following form

$$
\mathrm{CD}(X, Y) = \mathrm{CD}(Y, X) = \sigma_1^{\alpha}\left(1 + |a|^{\alpha} - |1 - a|^{\alpha}\right).
$$

Proof. Since

$$
\begin{aligned}
\log E\exp\{iX\} &= -\sigma_1^{\alpha},\\
\log E\exp\{-iY\} &= \log E\exp\{-i(aX + S)\} = \log E\exp\{-iaX\} + \log E\exp\{-iS\} = -|a|^{\alpha}\sigma_1^{\alpha} - \sigma_2^{\alpha},\\
\log E\exp\{i(X - Y)\} &= \log E\exp\{i(X - aX - S)\} = \log E\exp\{i(1 - a)X\} + \log E\exp\{-iS\} = -|1 - a|^{\alpha}\sigma_1^{\alpha} - \sigma_2^{\alpha},
\end{aligned}
$$

using Definition 4 we have that

$$
\mathrm{CD}(X, Y) = \sigma_1^{\alpha}\left(1 + |a|^{\alpha} - |1 - a|^{\alpha}\right)
$$

and CD(X, Y) = CD(Y, X) since the codifference function is symmetric.

Lemma 5. For Model I defined in Eq. (7) the normalized codifference function given in Eq. (6) takes the following form

$$
\mathrm{NCD}(X, Y) = \mathrm{NCD}(Y, X) = \frac{\sigma_1^{\alpha}\left(1 + |a|^{\alpha} - |1 - a|^{\alpha}\right)}{\sigma_1^{\alpha}\left(1 + |a|^{\alpha}\right) + \sigma_2^{\alpha}}.
$$
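Proof. The expression follows from the formula for the codifference function in Lemma 4.

Lemma 6. For Model II defined in Eq. (15) the covariation function given in Definition 1 takes the following form

$$
\mathrm{CV}(X, Z) = 2^{-\alpha/2}\,a^{\langle\alpha-1\rangle} R_{12} R_{22}^{(\alpha-2)/2} \quad\text{and}\quad \mathrm{CV}(Z, X) = 2^{-\alpha/2}\,a R_{12} R_{11}^{(\alpha-2)/2}.
$$

Proof.

$$
\begin{aligned}
\mathrm{CV}(X, Z) &= \mathrm{CV}(X, aY + S) = \mathrm{CV}(X, aY) + \mathrm{CV}(X, S) = a^{\langle\alpha-1\rangle}\,\mathrm{CV}(X, Y) = 2^{-\alpha/2}\,a^{\langle\alpha-1\rangle} R_{12} R_{22}^{(\alpha-2)/2},\\
\mathrm{CV}(Z, X) &= \mathrm{CV}(aY + S, X) = \mathrm{CV}(aY, X) + \mathrm{CV}(S, X) = a\,\mathrm{CV}(Y, X) = 2^{-\alpha/2}\,a R_{12} R_{11}^{(\alpha-2)/2}.
\end{aligned}
$$

Lemma 7. For Model II defined in Eq. (15) the normalized covariation function given in Eq. (5) takes the following form

$$
\mathrm{NCV}(X, Z) = \frac{2^{-\alpha/2}\,a^{\langle\alpha-1\rangle} R_{12} R_{22}^{(\alpha-2)/2}}{2^{-\alpha/2}|a|^{\alpha} R_{22}^{\alpha/2} + \sigma^{\alpha}} \quad\text{and}\quad \mathrm{NCV}(Z, X) = \frac{a R_{12}}{R_{11}}.
$$

Proof. The expressions follow from the formulas for the covariation functions given in Lemma 6 and the fact that $\mathrm{CV}(X, X) = 2^{-\alpha/2} R_{11}^{\alpha/2}$ and

$$
\mathrm{CV}(Z, Z) = \mathrm{CV}(aY + S, aY + S) = |a|^{\alpha}\,\mathrm{CV}(Y, Y) + \mathrm{CV}(S, S) = 2^{-\alpha/2}|a|^{\alpha} R_{22}^{\alpha/2} + \sigma^{\alpha}.
$$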
How to Describe the Linear Dependence for Heavy-Tailed Distributed Data
89
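Lemma 8. For Model II defined in Eq. (15) the signed symmetric covariation coefficient given in Definition 2 takes the following form
\[ \mathrm{scov}(X, Z) = \mathrm{scov}(Z, X) = \mathrm{sign}(R_{12} a) \left[ \frac{2^{-\alpha} |a|^{\alpha} R_{12}^{2} (R_{11} R_{22})^{(\alpha-2)/2}}{2^{-\alpha/2} R_{11}^{\alpha/2} \left( 2^{-\alpha/2} |a|^{\alpha} R_{22}^{\alpha/2} + \sigma^{\alpha} \right)} \right]^{1/2}. \]

Proof. The expression follows from the formulas for the covariation functions in Lemma 6.

Lemma 9. For Model II defined in Eq. (15) the codifference function given in Definition 3 takes the following form
\[ \mathrm{CD}(X, Z) = \mathrm{CD}(Z, X) = 2^{-\alpha/2} \left( R_{11}^{\alpha/2} + |a|^{\alpha} R_{22}^{\alpha/2} - \left| R_{11} - 2aR_{12} + a^{2} R_{22} \right|^{\alpha/2} \right). \]

Proof. Since
\[ \log E \exp\{iX\} = -2^{-\alpha/2} R_{11}^{\alpha/2}, \]
\[ \log E \exp\{-iZ\} = \log E \exp\{-i(aY + S)\} = \log E \exp\{-iaY\} + \log E \exp\{-iS\} = -2^{-\alpha/2} |a|^{\alpha} R_{22}^{\alpha/2} - \sigma^{\alpha}, \]
\[ \log E \exp\{i(X - Z)\} = \log E \exp\{i(X - aY - S)\} = \log E \exp\{i(X - aY)\} + \log E \exp\{-iS\} = -2^{-\alpha/2} \left| R_{11} - 2aR_{12} + a^{2} R_{22} \right|^{\alpha/2} - \sigma^{\alpha}, \]
using Definition 4 we have that
\[ \mathrm{CD}(X, Z) = 2^{-\alpha/2} \left( R_{11}^{\alpha/2} + |a|^{\alpha} R_{22}^{\alpha/2} - \left| R_{11} - 2aR_{12} + a^{2} R_{22} \right|^{\alpha/2} \right) \]
and CD(X, Z) = CD(Z, X) since the codifference function is symmetric.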
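Lemma 10. For Model II defined in Eq. (15) the normalized codifference function given in Eq. (6) takes the following form
\[ \mathrm{NCD}(X, Z) = \mathrm{NCD}(Z, X) = \frac{2^{-\alpha/2} \left( R_{11}^{\alpha/2} + |a|^{\alpha} R_{22}^{\alpha/2} - \left| R_{11} - 2aR_{12} + a^{2} R_{22} \right|^{\alpha/2} \right)}{2^{-\alpha/2} R_{11}^{\alpha/2} + 2^{-\alpha/2} |a|^{\alpha} R_{22}^{\alpha/2} + \sigma^{\alpha}}. \]

Proof. The expression follows from the formula for the codifference function in Lemma 9.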
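The closed forms in Lemmas 9 and 10 are easy to evaluate numerically. The following is a minimal sketch, not the authors' code; the function names and the test values of α, a, R11, R12, R22 and σ are illustrative assumptions. It also checks two sanity cases: for a = 0 the codifference vanishes, and for α = 2 it reduces to a·R12.

```python
def codifference_model_ii(alpha, a, r11, r12, r22):
    """CD(X, Z) for Model II, following the closed form of Lemma 9."""
    half = alpha / 2.0
    mixed = abs(r11 - 2.0 * a * r12 + a * a * r22)
    return 2.0 ** (-half) * (
        r11 ** half + abs(a) ** alpha * r22 ** half - mixed ** half
    )


def normalized_codifference_model_ii(alpha, a, r11, r12, r22, sigma):
    """NCD(X, Z) for Model II, following the closed form of Lemma 10."""
    half = alpha / 2.0
    numerator = codifference_model_ii(alpha, a, r11, r12, r22)
    denominator = (
        2.0 ** (-half) * r11 ** half
        + 2.0 ** (-half) * abs(a) ** alpha * r22 ** half
        + sigma ** alpha
    )
    return numerator / denominator


# Hypothetical parameter values, chosen only for illustration.
cd_no_link = codifference_model_ii(alpha=1.5, a=0.0, r11=1.0, r12=0.5, r22=1.0)
cd_gauss = codifference_model_ii(alpha=2.0, a=0.7, r11=1.0, r12=0.5, r22=1.0)
ncd_gauss = normalized_codifference_model_ii(2.0, 0.7, 1.0, 0.5, 1.0, sigma=1.0)
print(cd_no_link, cd_gauss, ncd_gauss)
```

The α = 2 case is a convenient regression check, because the absolute value term then cancels against the first two terms and leaves only the cross term.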
A. Grzesiek et al.
References
1. Nikias, C.L., Shao, M.: Signal Processing with Alpha-Stable Distributions and Applications. Adaptive and Learning Systems for Signal Processing, Communications, and Control. Wiley, New York (1995)
2. Samorodnitsky, G., Taqqu, M.S.: Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman & Hall, New York (1994)
3. Nowicka, J.: Asymptotic behavior of the covariation and the codifference for ARMA models with stable innovations. Commun. Stat. Stoch. Models 13(4), 673–685 (1997)
4. Wylomańska, A., Chechkin, A., Sokolov, I.M., Gajda, J.: Codifference as a practical tool to measure interdependence. Phys. A 421, 412–429 (2015)
5. Rosadi, D., Deistler, M.: Estimating the codifference function of linear time series models with infinite variance. Metrika 73(3), 395–429 (2011)
6. Rosadi, D.: Order identification for Gaussian moving averages using the codifference function. J. Stat. Comput. Simul. 76(6), 553–559 (2006)
7. Kokoszka, P.S., Taqqu, M.S.: Fractional ARIMA with stable innovations. Stoch. Process. Appl. 60(1), 19–47 (1995)
8. Liu, T.-H., Mendel, J.M.: A subspace-based direction finding algorithm using fractional lower order statistics. IEEE Trans. Sig. Process. 49(8), 1605–1613 (2001)
9. Chen, Z., Geng, X., Yin, F.: A harmonic suppression method based on fractional lower order statistics for power system. IEEE Trans. Ind. Electron. 63(6), 3745–3755 (2016)
10. Aalo, V.A., Ackie, A.-B.E., Mukasa, C.: Performance analysis of spectrum sensing schemes based on fractional lower order moments for cognitive radios in symmetric α-stable noise environments. Sig. Process. 154, 363–374 (2019)
11. Damarackas, J., Paulauskas, V.: Properties of spectral covariance for linear processes with infinite variance. Lith. Math. J. 54(3), 252–276 (2014)
12. Kodia, B., Garel, B.: Estimation and comparison of signed symmetric covariation coefficient and generalized association parameter for alpha-stable dependence modeling. Commun. Stat. Theor. Meth. 43(24), 5156–5174 (2014)
13. Kharisudin, I., Rosadi, D., Abdurakhman, Suhartono, S.: The asymptotic property of the sample generalized codifference function of stable MA(1). Far East J. Math. Sci. 99, 1297–1308 (2016)
14. Mikosch, T., Gadrich, T., Kluppelberg, C., Adler, R.J.: Parameter estimation for ARMA models with infinite variance innovations. Ann. Stat. 23(1), 305–326 (1995)
15. Mittnik, S., Rachev, S.T.: Stable Paretian Models in Finance. Wiley, New York (2000)
16. McCulloch, J.H.: 13 financial applications of stable distributions. In: Statistical Methods in Finance. Handbook of Statistics, vol. 14, pp. 393–425. Elsevier (1996)
17. Takayasu, H.: Stable distribution and Lévy process in fractal turbulence. Progress Theoret. Phys. 72(3), 471–479 (1984)
18. Annibaldi, S.V., Manfredi, G., Dendy, R.O.: Non-Gaussian transport in strong plasma turbulence. Phys. Plasmas 9(3), 791–799 (2002)
19. Nowicka-Zagrajek, J., Weron, R.: Modeling electricity loads in California: ARMA models with hyperbolic noise. Sig. Process. 82(12), 1903–1915 (2002)
20. Żak, G., Wylomańska, A., Zimroz, R.: Periodically impulsive behaviour detection in noisy observation based on generalised fractional order dependency map. Appl. Acoust. 144, 31–39 (2019)
21. Żak, G., Wylomańska, A., Zimroz, R.: Data driven iterative vibration signal enhancement strategy using alpha-stable distribution. Shock Vib. 2017, 1–11 (2017)
22. Chen, Z., Ding, S.X., Peng, T., Yang, C., Gui, W.: Fault detection for non-Gaussian processes using generalized canonical correlation analysis and randomized algorithms. IEEE Trans. Industr. Electron. 65(2), 1559–1567 (2018)
23. Palacios, M.B., Steel, M.F.J.: Non-Gaussian Bayesian geostatistical modeling. J. Am. Stat. Assoc. 101(474), 604–618 (2006)
24. Gosoniu, L., Vounatsou, P., Sogoba, N., Smith, T.: Bayesian modelling of geostatistical malaria risk data. Geospat. Health 1(1), 127–139 (2006)
25. Middleton, D.: Non-Gaussian noise models in signal processing for telecommunications: new methods and results for class A and class B noise models. IEEE Trans. Inf. Theor. 45, 1129–1149 (1999)
26. Yellin, D., Weinstein, E.: Criteria for multichannel signal separation. IEEE Trans. Sig. Process. 42(8), 2158–2168 (1994)
27. Chua, K.C., Chandran, V., Acharya, U.R., Lim, C.M.: Application of higher order statistics/spectra in biomedical signals-a review. Med. Eng. Phys. 32(7), 679–689 (2010)
28. Lévy, P.: Calcul des Probabilites. Gauthier-Villars, Paris (1925)
29. Mandelbrot, B.: The Pareto-Lévy Law and the distribution of income. Int. Econ. Rev. 1(2), 79–106 (1960)
30. Shao, M., Nikias, C.L.: Signal processing with fractional lower order moments: stable processes and their application. Proc. IEEE 81, 986–1010 (1993)
31. Press, S.J.: Estimation in univariate and multivariate stable distributions. J. Am. Stat. Assoc. 67(340), 842–846 (1972)
32. Kozubowski, T.J., Panorska, A.K., Rachev, S.T.: Statistical issues in modeling multivariate stable portfolios. In: Rachev, S.T. (ed.) Handbook of Heavy Tailed Distributions in Finance. Handbooks in Finance, vol. 1, pp. 131–167. North-Holland, Amsterdam (2003)
33. Nolan, J.P., Panorska, A.K.: Data analysis for heavy tailed multivariate samples. Commun. Stat. Stoch. Models 13(4), 687–702 (1997)
34. Miller, G.: Properties of certain symmetric stable distributions. J. Multivar. Anal. 8(3), 346–360 (1978)
35. Grzesiek, A., Mrozińska, M., Giri, P., Sundar, S., Wylomańska, A.: The covariation-based Yule-Walker method for multidimensional autoregressive time series with α-stable distributed noise (2020, in preparation)
36. Weron, A.: Stable processes and measures; a survey. In: Szynal, D., Weron, A. (eds.) Probability Theory on Vector Spaces III. LNM, vol. 1080, pp. 306–364. Springer, Heidelberg (1984). https://doi.org/10.1007/BFb0099806
37. Zolotarev, V.M.: One-dimensional Stable Distributions. Translations of Mathematical Monographs. American Mathematical Society, Providence (1986)
38. Press, S.: Multivariate stable distributions. J. Multivar. Anal. 2(4), 444–462 (1972)
39. Cambanis, S., Miller, G.: Linear problems in pth order and stable processes. SIAM J. Appl. Math. 41(1), 43–69 (1981)
40. Gallagher, C.M.: A method for fitting stable autoregressive models using the autocovariation function. Stat. Probab. Lett. 53, 381–390 (2001)
41. Kruczek, P., Wylomańska, A., Teuerle, M., Gajda, J.: The modified Yule-Walker method for α-stable time series models. Physica A 469, 588–603 (2017)
42. Garel, B., Kodia, B.: Signed symmetric covariation coefficient for alpha-stable dependence modeling. C.R. Math. 347(5), 315–320 (2009)
43. Rosadi, D.: Measuring dependence of random variables with finite and infinite variance using the codifference and the generalized codifference function. AIP Conf. Proc. 1755(1), 1755 (2016)
Granger Causality and Cointegration During Stock Bubbles and Market Crashes
Bartosz Stawiarski
Institute of Computer Science and Telecommunication, Cracow University of Technology, Cracow, Poland
[email protected]
Abstract. We explore Granger causality and cointegration between the main stock indices, macroeconomic indicators (PMI) and central banks' monetary expansion for US data in the presence of extreme market movements: bubbles and crashes. Two stock indices are caused in the Granger sense either by economic fundamentals or by the money supply provided by the Federal Reserve's monetary policy. The causation is found to be dynamic: vanishing during moderate expansions and recurring around long–term market peaks followed by market crashes. Cointegration between the time series dynamics, here considered within the Vector Autoregressive framework, has been empirically shown to be a time–variant, recurrent phenomenon, too.

Keywords: Granger causality · Cointegration · Vector autoregression · Asset bubble · Market crash · Quantitative easing
1 Introduction

Economic and mechanical systems evolve in time according to their specific multivariate and often nonlinear dynamics, measured by a set of underlying variables. As such systems are mostly stochastic, these variables are modelled by a collection of stochastic processes exhibiting intertemporal dependencies. Some of the processes can either exert a leading or lagging influence on one another (causation) or feature a subtle joint evolution pattern (cointegration). The framework is general enough to study multivariate empirical data stemming from a number of scientific disciplines. In this paper we study the dynamics of the US economy and financial market by means of selected driving processes, aiming to capture possible causality or cointegration, which themselves can evolve over time.

Overall US stock market capitalization exceeds 30 trillion (3 × 10^13) USD. According to classic rules, long–term price trends (bull vs. bear market) coincide with the current macroeconomic background measured by, e.g., GDP dynamics, inflation, unemployment rate and industrial output. Recently, however, monetary expansion employed by central banks has found its place among the main drivers of stock price dynamics, represented here by two benchmark indices: S&P500 and NASDAQ Composite. Vast empirical research of the last decade shows that ample monetary stimuli lead to massive decoupling of stock prices from underlying fundamentals, resulting in asset bubbles followed by crashes. Therefore it seems important to inspect the cause and effect dynamics, namely which sets of inputs influence which other variables. Some classic techniques are used both to detect sources of causation (transmitted via the time axis) and to capture contemporaneous dependencies between seemingly unrelated variables. Tracking these interactions between financial markets, the real economy and central banks' balance sheets is still more challenging in the presence of bubbles and crashes (known as boom–bust cycles).

The paper is organized as follows. In Sect. 2 we introduce the empirical data sets to be processed throughout the paper and perform preliminary transformations ensuring stationarity. Section 3 deals with testing Granger causality with respect to the two stock indices, considered in moving 2–year time windows to track the presence or absence of causality. In Sect. 4 we focus on verifying cointegration between indices, economic fundamentals and central banks' balance sheets. Section 5 concludes the paper and provides promising topics for further research.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 93–107, 2022. https://doi.org/10.1007/978-3-030-82110-4_5
2 Data Sets Description and Preliminary Processing

2.1 Empirical Data Sets
In our empirical study we focus on specific econometric data from the United States throughout the years 2003–2019, sampled monthly. Each original series consists of n = 204 entries recorded at month ends, provided by the Saint Louis FED database [7]. The data are as follows:
I. Manufacturing PMI index – a measure of activity at US factories,
II. Federal Reserve Bank (FED) balance sheet – the overall value of assets held by the US central bank, which can be treated as a market liquidity proxy,
III. S&P 500 index quotes,
IV. NASDAQ Composite index quotes,
V. Aggregate balance sheet of the three main central banks (FED, European Central Bank, Bank of Japan) expressed in USD based on the respective month–end currency crosses, namely EURUSD and USDJPY.
Accordingly, the entire empirical data set constitutes a 204 × 5 matrix or, equivalently, a 5–dimensional uniformly sampled econometric time series arranged columnwise. The data sets are visualized in Figs. 1 and 2 below.
Fig. 1. US PMI, S&P500 and NASDAQ composite data (2003–2019)
Fig. 2. FED and (FED + ECB + BoJ) assets monthly data (2003–2019)
The US Manufacturing PMI is rather range–bound, with faint low–frequency cyclical behavior. Stock indices and central banks' balance sheets are evidently nonstationary because of distinct trends, sometimes even faster than linear. Since 2009 the two US indices have experienced a secular bull market, accompanied by explosive growth of the assets of the FED and other main central banks. The fundamental environment is described by the Manufacturing PMI index, which is recorded monthly. GDP and corporate profits are measured only quarterly, and due to this weak data resolution we exclude these series here. Indeed, the KPSS test proposed in [5] strongly rejects the stationarity hypothesis (narrowly only in the case of PMI), whereas the augmented Dickey–Fuller (ADF) test does not reject the unit–root hypothesis in the case of the stock indices and balance sheet data. Granger causality analysis requires covariance–stationary data, therefore we need to transform the data according to well–known techniques aimed at eliminating trends and tapering heteroscedasticity. For each univariate time series {X_t}, 1 ≤ t ≤ n, we define a transformed series {ΔX_t} as either the logreturn or the common difference:
\[ \Delta X_t \overset{\mathrm{def}}{=} \begin{cases} \log \dfrac{X_t}{X_{t-1}}, \\[4pt] X_t - X_{t-1}. \end{cases} \tag{1} \]
The latter transform is applied only to the low–volatility PMI series, which has no linear trend and exhibits only a slowly varying cyclical component. The remaining four empirical time series are subject to the transform yielding logreturns. Both schemes are additive over larger time horizons, in case temporal aggregation were necessary. Stationarity tests are now comfortably passed and no unit roots are detected. The sudden spike in the FED balance sheet in late 2008 is a distinct outlier. For convenience, we will denote the transformed series as {Δ(Nasdaq)_t}, {Δ(PMI)_t} and so on. Now we proceed to explore the Granger causality between fundamental and monetary data on one hand, and financial time series on the other.
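The two transforms in Eq. (1) can be sketched in a few lines. This is a minimal illustration, not the author's code; the function names and the sample values are hypothetical. It also demonstrates the additivity over longer horizons mentioned above: the sum of consecutive logreturns equals the logreturn over the whole period.

```python
import math


def log_returns(series):
    """Logreturn transform: log(X_t / X_{t-1}) for consecutive observations."""
    return [math.log(cur / prev) for prev, cur in zip(series, series[1:])]


def first_differences(series):
    """Common differencing: X_t - X_{t-1} for consecutive observations."""
    return [cur - prev for prev, cur in zip(series, series[1:])]


# Hypothetical month-end values, used only to illustrate the transforms.
index_quotes = [100.0, 110.0, 99.0, 121.0]   # trending series -> logreturns
pmi_values = [52.0, 53.5, 51.0, 50.5]        # range-bound series -> differences

returns = log_returns(index_quotes)
differences = first_differences(pmi_values)

# Additivity: the sum of logreturns equals the logreturn over the whole span.
total_return = sum(returns)
print(returns, differences, total_return)
```

Each transformed series is one observation shorter than the original, so a 204-entry series yields 203 transformed values.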
3 Granger Causality Study: What Drives Stock Indices Returns?

3.1 Growing Impact of Monetary Stimuli on Financial Markets
In the classic study of economic systems dynamics, especially in the context of stock market performance relative to macro input variables, the role of leading and lagging variables driving the economy and long–term stock returns within a given cycle phase has been well understood. For instance, the unemployment rate is a typical lagging indicator, whereas the yield curve has performed quite well as a leading indicator (especially inversions preceding recessions by a 1–2 year margin). These interdependencies used to be modelled by a wide variety of time series models, e.g. regression, vector ARIMA models, conditional copulae, GARCH–type models (Fig. 3). Currently, especially since the financial crisis of 2008–09, there is a growing impact of monetary interventions upon stock price dynamics. The US Federal Reserve Bank has engaged in four large–scale asset purchase programs known as quantitative easing (QE). Simultaneously, the European Central Bank, Bank of Japan and Bank of England have been carrying out their own QE programs. This contributes to releveraging the overall financial system and to artificial risk suppression. The issue of markets decoupling from economic fundamentals has been raised since around 2012–13. Typical economic cycles have become over–extended and stock valuations have run well above commonly approved long–term averages.

Fig. 3. Three time series after “stationarizing” transforms

3.2 Granger Causality Test
A crucial question facing asset managers and creators of macroeconomic policies is which econometric and/or financial time series tend to lead (here: cause) which other time series. Granger (1969) developed a testing procedure devised for detecting causality between time series, see [2].

Definition 31. A wide–sense stationary (w.–s. s.) time series {X_t}, t ∈ Z, is said to Granger cause another w.–s. s. time series {Y_t}, t ∈ Z, if for any fixed t ∈ Z
\[ \sigma^2\left(\hat{Y}_t \mid Y_{t-1}, Y_{t-2}, \ldots, X_{t-1}, X_{t-2}, \ldots\right) < \sigma^2\left(\hat{Y}_t \mid Y_{t-1}, Y_{t-2}, \ldots\right), \tag{2} \]
where
\[ \sigma^2(\hat{Y}_t \mid \mathcal{F}_{t-1}) = E\left[\left(\hat{Y}_t - E(\hat{Y}_t \mid \mathcal{F}_{t-1})\right)^2 \,\Big|\, \mathcal{F}_{t-1}\right] \]
is the variance of the optimal linear forecast \(\hat{Y}_t\) based on the filtration \(\mathcal{F}_{t-1}\), i.e. the σ–algebra generated by the history of the processes {X_t} and {Y_t}.
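The variance comparison in Eq. (2) can be illustrated with two least-squares autoregressions, one using only the past of Y and one that also uses the past of X. The sketch below is an illustration under stated assumptions, not the testing procedure used in the paper: the helper names, the lag order of 2 and the synthetic data (where X drives Y with a one-step lag by construction) are all hypothetical, and a formal test would compare the two fits with an F-type statistic.

```python
import numpy as np


def lagged_design(series_list, lags):
    """Intercept plus lags 1..lags of every series, aligned with y[lags:]."""
    n = len(series_list[0])
    columns = [np.ones(n - lags)]
    for series in series_list:
        for k in range(1, lags + 1):
            columns.append(series[lags - k:n - k])
    return np.column_stack(columns)


def forecast_error_variance(y, regressors, lags):
    """In-sample residual variance of the linear forecast of y from past values."""
    design = lagged_design(regressors, lags)
    target = y[lags:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    return residuals.var()


# Synthetic example: x leads y by one step (an assumption of this demo).
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.5 * rng.standard_normal()

var_restricted = forecast_error_variance(y, [y], lags=2)   # past of Y only
var_full = forecast_error_variance(y, [y, x], lags=2)      # past of Y and X
print(var_restricted, var_full)
```

Because the restricted model is nested in the full one, the in-sample variance can never increase when the lags of X are added; Granger causality corresponds to a reduction that is too large to be explained by chance.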
The above definition states that including {Xs }s 0, if x = 0, if x < 0.
with parameters μ ∈ ℝ, σ > 0, −1 < λ < 1, p > 0, q > 0, which are responsible (in order) for shift, scale and skewness, while p and q jointly control the kurtosis and the “weight” of the tails. It is important to remember that we use the non-centered and non-scaled version of the SGT distribution. We use this distribution to model the noise in the data (instead of the normal distribution).
3 Estimation of the Model's Parameters
For the estimation procedure we first use the discretized version of Eq. (2) with dt → 1 (∀i t_{i+1} − t_i = 1):
\[ Y_{t_i} := X_{t_{i+1}} - X_{t_i} = \alpha_1(t_i) + \alpha_2(t_i) X_{t_i} + (1 - H_{t_i}) S^1_{t_i} + H_{t_i} S^2_{t_i}, \tag{5} \]
where \(\{S^k_{t_i}\}\) (k = 1, 2; i = 1, 2, …, n) are independent random variables with the following distributions:
\[ S^1_{t_i} \sim SGT(\mu_1, \sigma_1, \lambda_1, p_1, q_1), \qquad S^2_{t_i} \sim SGT(\mu_2, \sigma_2, \lambda_2, p_2, q_2). \tag{6} \]
Non-Gaussian Regime-Switching Model in Application
In real applications we actually consider a realization of the process {X_t} given in (5). In the further analysis we denote the vector of realizations of the process {X_t} by x = {x_i} and its increments by y = {y_i} = {x_{i+1} − x_i}, i = 1, 2, …, n. For the estimation of α_1(t_i) + α_2(t_i) x_{t_i} we use a modified method of least squares. Let us define the loss function of the classic least squares method [46] for our model (5):
\[ L(x, y; \{{}_k\alpha_w\}) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left( y_i - (\alpha_1(t_i) + \alpha_2(t_i) x_{t_i}) \right)^2. \]
However, this choice of loss function is not well suited for noise with heavy tails. Thus we use a loss function which makes the estimates less sensitive to outliers. We could utilize the weighted least squares method [47]; however, choosing proper weights is problematic, especially when we assume that the residuals do not have to have a finite first moment. Another loss function that tries to handle outliers is the Huber loss function [48]:
\[ L(y, \hat{y}; \delta) = \begin{cases} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, & \text{for } |y_i - \hat{y}_i| \le \delta; \\ \sum_{i=1}^{n} \left( 2\delta |y_i - \hat{y}_i| - \delta^2 \right), & \text{for } |y_i - \hat{y}_i| > \delta. \end{cases} \tag{7} \]
Combining the absolute and squared loss functions lets the Huber loss function achieve better performance when dealing with heavy-tailed data. The parameter δ in Eq. (7) defines the distance from ŷ beyond which y_i is treated as an outlier. In our research, we use the Charbonnier (pseudo-Huber) loss function [49]. This function is a modification of the Huber loss function that avoids combining |x| and x² explicitly: for small values of x the Charbonnier loss function behaves like x², and for large values like |x|. It is given by the following formula:
\[ L(y, \hat{y}; \delta) = \sum_{i=1}^{n} \delta^2 \left( \sqrt{1 + \left( \frac{y_i - \hat{y}_i}{\delta} \right)^2} - 1 \right). \tag{8} \]
Here the parameter δ has a meaning similar to the one in the standard Huber loss function (7). To capture local changes of the trend in the data we use the local regression method [50]. First let us define the pseudo-Huber function as follows:
\[ f_H(y - \hat{y}; \delta) = \delta^2 \left( \sqrt{1 + \left( \frac{y - \hat{y}}{\delta} \right)^2} - 1 \right). \tag{9} \]
Then, to find local estimates of the parameters, we minimize the loss function of the following form:
\[ L_{t^*}(x, y; \{{}_k\alpha_w\}) = \sum_{i=1}^{n} f_H\left( y_i - (\alpha_1(t_i) + \alpha_2(t_i) x_{t_i}); \delta \right) K_{b,b_r}(t_i - t^*) + \tau \sum_{w=1}^{2} \sum_{k=0}^{d_w} {}_k\alpha_w^2. \tag{10} \]
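The behavior of the pseudo-Huber function (9) in the two regimes can be checked directly. The snippet below is a minimal sketch with hypothetical function names; it evaluates (9) and its derivative, and confirms numerically that the function is approximately quadratic (close to r²/2) for small residuals and approximately linear (close to δ|r|) for large ones.

```python
import math


def pseudo_huber(residual, delta):
    """Pseudo-Huber (Charbonnier) function of Eq. (9)."""
    return delta ** 2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)


def pseudo_huber_derivative(residual, delta):
    """Derivative of the pseudo-Huber function with respect to the residual."""
    return residual / math.sqrt(1.0 + (residual / delta) ** 2)


delta = 2.0
small = pseudo_huber(0.01, delta)            # quadratic regime
large_ratio = pseudo_huber(1e6, delta) / 1e6  # linear regime, slope near delta

# Central finite difference as an independent check of the derivative.
r, h = 1.3, 1e-6
numeric = (pseudo_huber(r + h, delta) - pseudo_huber(r - h, delta)) / (2.0 * h)
analytic = pseudo_huber_derivative(r, delta)
print(small, large_ratio, numeric, analytic)
```

The bounded derivative is what limits the influence of a single outlying increment on the fitted trend: however large the residual, its contribution to the gradient never exceeds δ in absolute value.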
114
D. Szarek et al.
The functions α_1(·) and α_2(·) (from Eq. (5)) are locally approximated by Taylor polynomials [51] of degree d_1 and d_2 (we assume that α_1 ∈ C^{d_1} and α_2 ∈ C^{d_2}):
\[ \alpha_w(t_i) \approx \sum_{k=0}^{d_w} \frac{\alpha_w^{(k)}(t^*)}{k!} (t_i - t^*)^k = \sum_{k=0}^{d_w} {}_k\alpha_w\, t_i^k, \qquad w = 1, 2. \tag{11} \]
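The function K_{b,b_r}(·) is an asymmetric kernel function (proposed in [29]). In this paper, we use the asymmetric tricube kernel function [52] given by the following equation:
\[ K_{b,b_r}(t) = \frac{2}{b} \left[ K\!\left( \frac{t}{b - b_r} \right) \mathbb{1}_{t \le 0} + K\!\left( \frac{t}{b_r} \right) \mathbb{1}_{t > 0} \right], \tag{12} \]
where K(·) is the tricube kernel function [52]:
\[ K(t) = \frac{70}{81} \left( 1 - |t|^3 \right)^3 \mathbb{1}_{t \in (-1, 1)}. \]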
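The kernel (12) is straightforward to implement. The following sketch uses hypothetical function names and, for the check, the values b = 200 and b_r = 75; it assumes the reading of Eq. (12) in which the left branch is scaled by b − b_r and the right branch by b_r. A simple Riemann sum confirms that, with the factor 2/b, the kernel integrates to approximately one over its support (−(b − b_r), b_r).

```python
def tricube(t):
    """Tricube kernel: 70/81 * (1 - |t|^3)^3 on (-1, 1), zero elsewhere."""
    if -1.0 < t < 1.0:
        return 70.0 / 81.0 * (1.0 - abs(t) ** 3) ** 3
    return 0.0


def asymmetric_tricube(t, b, b_r):
    """Asymmetric tricube kernel of total width b with right root at b_r."""
    if t <= 0.0:
        value = tricube(t / (b - b_r))   # left branch, root at -(b - b_r)
    else:
        value = tricube(t / b_r)         # right branch, root at b_r
    return 2.0 / b * value


b, b_r = 200.0, 75.0
step = 0.01
grid = [-(b - b_r) + step * i for i in range(int(b / step) + 1)]
integral = sum(asymmetric_tricube(t, b, b_r) for t in grid) * step
peak = asymmetric_tricube(0.0, b, b_r)
print(integral, peak)
```

Choosing b_r smaller than b/2 gives more weight to observations before t* than after it, which is the purpose of the asymmetry.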
The hyperparameters b and b_r are (in order) the width of the kernel and the distance from 0 to the right root of the kernel. The last term in the loss function (10) is the Tikhonov regularization [53]. We exploit this regularization (similarly to ridge regression [54]) with one hyperparameter τ ∈ ℝ_+. We minimize the loss function (10) using numerical minimization methods such as the Broyden-Fletcher-Goldfarb-Shanno algorithm [55]. To ease computations, we can pass the Jacobian of the loss function (10). Knowing the derivative of the pseudo-Huber function (9):
\[ f'_H(y - \hat{y}; \delta) = \frac{y - \hat{y}}{\sqrt{1 + \left( \frac{y - \hat{y}}{\delta} \right)^2}}, \]
we can easily compute the Jacobian:
\[ \frac{\partial L_{t^*}}{\partial\, {}_k\alpha_1} = \sum_{i=1}^{n} -f'_H(y_i - \hat{y}_i; \delta) K_{b,b_r}(t_i - t^*)\, t_i^k + 2\tau\, {}_k\alpha_1; \qquad \frac{\partial L_{t^*}}{\partial\, {}_k\alpha_2} = \sum_{i=1}^{n} -f'_H(y_i - \hat{y}_i; \delta) K_{b,b_r}(t_i - t^*)\, x_i t_i^k + 2\tau\, {}_k\alpha_2. \tag{13} \]
Let us define
\[ W_{t_i} := (1 - H_{t_i}) S^1_{t_i} + H_{t_i} S^2_{t_i} \]
as the detrended time series (5). Then, using the estimates of \(\{{}_k\alpha_w\}\), we remove the drift from the data:
\[ \hat{w}_i = y_i - \sum_{k=0}^{d_1} {}_k\hat{\alpha}_1 t_i^k - \sum_{k=0}^{d_2} {}_k\hat{\alpha}_2 x_i t_i^k \approx (1 - h_i) s^1_i + h_i s^2_i. \]
Assuming that {H_{t_i}} is a Markov chain [56] and that for any measurable set A the following equation holds [57]:
\[ P(W_{t_i} \in A \mid H_{t_1} = h_1, \ldots, H_{t_i} = h_i) = P(W_{t_i} \in A \mid H_{t_i} = h_i), \]
we can use the estimation methodology for Hidden Markov Models with continuous distributions and two hidden states. Let us define:
\[ \zeta_{t_i|t_j}(k) = P(H_{t_i} = k \mid W^{(t_j)}; \mathcal{M}). \tag{14} \]
Namely, it is the probability of {W_{t_i}} being in state k at time t_i, conditional on the data up to time t_j (\(W^{(t_j)} = \{W_t\}_{t=t_1}^{t_j}\)) and on the set of model parameters \(\mathcal{M} = \{\mu_1, \sigma_1, \lambda_1, p_1, q_1, \mu_2, \sigma_2, \lambda_2, p_2, q_2, P, \eta\}\). In the set \(\mathcal{M}\), P is the matrix of transition probabilities P = {p_{ij}}: ∀k p_{ij} = P(H_{t_{k+1}} = i | H_{t_k} = j), i, j = 0, 1, and η is the vector of the initial distribution, namely η_i = P(H_{t_1} = i), i = 0, 1. The other parameters that need to be estimated correspond to the SGT distributions for the states C_1 and C_2. To estimate ζ_{t_i|t_i} (t_i = 1, 2, …, T) we use the following equations [58]:
\[ \hat{\zeta}_{t_i|t_i} = \frac{\hat{\zeta}_{t_i|t_{i-1}} \odot \xi_{t_i}}{\mathbf{1}'\left( \hat{\zeta}_{t_i|t_{i-1}} \odot \xi_{t_i} \right)}, \tag{15} \]
\[ \hat{\zeta}_{t_{i+1}|t_i} = P\, \hat{\zeta}_{t_i|t_i}; \tag{16} \]
with the starting condition \(\hat{\zeta}_{t_1|t_0} = \eta\). Here \(\mathbf{1}'\) denotes the transpose of the vector \(\mathbf{1}\). In Eq. (15), the symbol ⊙ stands for the Hadamard product (element-wise multiplication), \(\mathbf{1}\) is a vector of ones, namely \(\mathbf{1} = [1\ 1]'\), and ξ_{t_i} is given by:
\[ \xi_{t_i} = \{ f_{SGT}(w_i \mid h_i = j - 1; \mu_j, \sigma_j, \lambda_j, p_j, q_j) \}_{j=1,2}. \]
Using these vectors we can compute the log-likelihood function [59]:
\[ l(\mathcal{M}) = \sum_{t_i = t_1}^{t_n} \ln\left( \mathbf{1}'\left( \hat{\zeta}_{t_i|t_{i-1}} \odot \xi_{t_i} \right) \right). \tag{17} \]
Using Kim's algorithm [60] we can compute the probabilities conditioned on the whole data (calculated starting from t_{n−1} down to t_1):
\[ \hat{\zeta}_{t_i|t_n} = \hat{\zeta}_{t_i|t_i} \odot \left( P' \left( \hat{\zeta}_{t_{i+1}|t_n} \oslash \hat{\zeta}_{t_{i+1}|t_i} \right) \right), \tag{18} \]
where ⊘ is the Hadamard division operator. Then, using Hamilton's estimator [61] for the transition probabilities, we have:
\[ \hat{p}_{ij} = \frac{\sum_{t=t_2}^{t_n} P(h_t = j \mid w^{(t_n)}; \hat{\mathcal{M}})\, \dfrac{p_{ij}\, P(h_{t-1} = i \mid w^{(t-1)}; \hat{\mathcal{M}})}{P(h_t = j \mid w^{(t-1)}; \hat{\mathcal{M}})}}{\sum_{t=t_2}^{t_n} P(h_{t-1} = i \mid w^{(t_n)}; \hat{\mathcal{M}})}. \tag{19} \]
Afterwards, we estimate the parameters of the SGT distributions for both states using the maximum likelihood estimation method. For each subset
\[ W_0 = \{ \hat{w}_i : \zeta^{(0)}_{t_i|t_n} > \zeta^{(1)}_{t_i|t_n} \} \quad \text{and} \quad W_1 = \{ \hat{w}_i : \zeta^{(0)}_{t_i|t_n} \le \zeta^{(1)}_{t_i|t_n} \} \]
(\(\zeta^{(k)}_{t_i|t_n}\) indicates the kth element of the vector \(\zeta_{t_i|t_n}\)) we maximize the log-likelihood function:
\[ l(W_{k-1}; \mu_k, \sigma_k, \lambda_k, p_k, q_k) = \sum_{w \in W_{k-1}} \ln\left( f_{SGT}(w; \mu_k, \sigma_k, \lambda_k, p_k, q_k) \right), \qquad k = 1, 2, \tag{20} \]
using e.g. the Nelder-Mead algorithm [62]. The whole procedure is repeated until convergence, i.e. until the change in the log-likelihood function (17) is less than some chosen small constant ε. All of the estimation steps are gathered in pseudo-code (Algorithm 1).
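The filtering recursion (15)-(16), the log-likelihood (17) and the backward smoothing step (18) can be sketched compactly. The code below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, zero-mean Gaussian densities are used as a stand-in for the SGT density f_SGT (which is not reproduced here), and the transition matrix follows the column convention of Eq. (16), so that P[i, j] is the probability of moving to state i from state j.

```python
import numpy as np


def hamilton_filter(xi, P, eta):
    """Forward pass: filtered and one-step predicted state probabilities."""
    n, m = xi.shape
    filtered = np.zeros((n, m))
    predicted = np.zeros((n + 1, m))
    predicted[0] = eta                       # starting condition
    loglik = 0.0
    for i in range(n):
        joint = predicted[i] * xi[i]         # Hadamard product
        norm = joint.sum()                   # 1'(...)
        filtered[i] = joint / norm           # Eq. (15)
        predicted[i + 1] = P @ filtered[i]   # Eq. (16)
        loglik += np.log(norm)               # summand of Eq. (17)
    return filtered, predicted, loglik


def kim_smoother(filtered, predicted, P):
    """Backward pass: state probabilities conditioned on the whole sample."""
    n = filtered.shape[0]
    smoothed = np.zeros_like(filtered)
    smoothed[-1] = filtered[-1]
    for i in range(n - 2, -1, -1):
        ratio = smoothed[i + 1] / predicted[i + 1]   # Hadamard division
        smoothed[i] = filtered[i] * (P.T @ ratio)    # Eq. (18)
    return smoothed


# Synthetic two-state data with Gaussian noise as a stand-in for SGT noise.
rng = np.random.default_rng(1)
n = 1000
P = np.array([[0.986, 0.0225],
              [0.014, 0.9775]])              # columns sum to one
eta = np.array([1.0, 0.0])
scales = np.array([1.0, 3.0])
states = np.zeros(n, dtype=int)
for i in range(1, n):
    states[i] = rng.choice(2, p=P[:, states[i - 1]])
w = rng.standard_normal(n) * scales[states]

# Emission densities xi[i, j]: density of w[i] under state j.
xi = np.exp(-0.5 * (w[:, None] / scales) ** 2) / (scales * np.sqrt(2.0 * np.pi))

filtered, predicted, loglik = hamilton_filter(xi, P, eta)
smoothed = kim_smoother(filtered, predicted, P)
accuracy = (smoothed.argmax(axis=1) == states).mean()
print(loglik, accuracy)
```

In the full procedure these two passes would be run with the current parameter estimates inside the loop, with the emission densities recomputed after each maximum likelihood update of the state distributions.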
4 Simulation Study
To check the effectiveness of the proposed estimation procedure, we have analyzed simulated data that resemble the real data considered in the next section. The trajectory of the model was generated using the Euler method [63] for the following SDE:
\[ \begin{aligned} Y_t &= \alpha_1(\gamma t) + \alpha_2(\gamma t) X_t + (1 - H_t) S^1_t + H_t S^2_t, \quad X_0 = 0, \ \gamma = 10^{-3}; \\ \alpha_1(t) &= 1.87t - 28.9t^2 + 104.8t^3 - 133.3t^4 + 55.5t^5, \quad t \in [0, 1]; \\ \alpha_2(t) &= 0.1 - 0.2t + 0.15t^2, \quad t \in [0, 1]; \end{aligned} \tag{21} \]
with the initial distribution P(H_1 = 0) = 1 − P(H_1 = 1) = 1 and
\[ S^1_t \sim SGT(0, 1, 0.1, 1, 4), \qquad S^2_t \sim SGT(0, 3, -0.2, 2.5, 2), \qquad P = \begin{bmatrix} 0.986 & 0.014 \\ 0.0225 & 0.9775 \end{bmatrix}. \]
For such a set of parameters the SGT distribution is skewed and heavy-tailed. A vector of 1001 samples was generated for t ∈ {0, 1, …, 1000}. An exemplary trajectory is shown in Fig. 1. We then used the estimation methodology proposed in Sect. 3. Using the grid search method, we picked the following hyperparameters:
– δ = 2 (parameter of the pseudo-Huber function (9));
– d_1 = 2 (degree of the local polynomial approximation of α_1(·) (11));
– d_2 = 1 (degree of the local polynomial approximation of α_2(·) (11));
– b = 200 (bandwidth of the asymmetric kernel function (12));
– b_r = 75 (distance from 0 to the right bound of the asymmetric kernel function (12));
– τ = 0.1 (Tikhonov regularization).
Algorithm 1. Estimation algorithm
δ ← picked hyperparameter defined alongside the Huber loss function (7)
d_1, d_2 ← picked hyperparameters for the α_1(·), α_2(·) functions (11)
b, b_r ← picked hyperparameters defined for the asymmetric kernel function (12)
η ← picked initial distribution, i.e. [1 0]'
P ← picked initial transition matrix, i.e. [[1/2, 1/2], [1/2, 1/2]]
ε ← some small constant greater than zero
n ← length(y)
α̂ ← []
for t* ← [t_1, t_2, …, t_n] do
    {_kα̂_w} ← arg min over {_kα_w} of L_{t*}(x, y; {_kα_w})        ▷ Minimize L (10) using the Jacobian (13)
    α̂[t*] ← Σ_{k=0}^{d_1} _kα̂_1 t*^k + Σ_{k=0}^{d_2} _kα̂_2 x_i t*^k
ŵ ← y − α̂
μ̂_1, μ̂_2 ← random data point from y
σ̂_1 ← std(y); σ̂_2 ← 2 · std(y)
λ̂_1, λ̂_2 ← 0; p̂_1, p̂_2 ← 2; q̂_1, q̂_2 ← ∞
L(M) ← 0; L(M̂) ← −∞
while |L(M) − L(M̂)| > ε do
    L(M) ← L(M̂); L(M̂) ← 0
    ζ̂_{t+1|t} ← [η]; ζ̂_{t|t} ← []
    for i ← 1 : n do
        ζ̂_{t|t}[i] ← ζ̂_{t_i|t_i}                                    ▷ Calculate using Eq. (15)
        ζ̂_{t+1|t}[i] ← ζ̂_{t_{i+1}|t_i}                              ▷ Calculate using Eq. (16)
        L(M̂) ← L(M̂) + ln(1'(ζ̂_{t_i|t_{i−1}} ⊙ ξ_{t_i}))
    ζ̂_{t|T} ← []
    ζ̂_{t|T}[n] ← ζ̂_{t|t}[−1]                                        ▷ Get the last element, ζ̂_{t_n|t_n}
    for i ← n − 1 : 0 do
        ζ̂_{t|T}[i] ← ζ̂_{t_i|t_n}                                    ▷ Calculate using Eq. (18), utilizing previously calculated elements of ζ̂_{t|T}
    P ← P̂                                                           ▷ Update the transition matrix P using Eq. (19)
    for k ← 1 : 2 do
        μ̂_k, σ̂_k, λ̂_k, p̂_k, q̂_k ← arg max over μ_k, σ_k, λ_k, p_k, q_k of l(W_{k−1}; μ_k, σ_k, λ_k, p_k, q_k)        ▷ See Eq. (20)
For those hyperparameters, we calculated the local estimates of the α_1(·) and α_2(·) functions at every time point. We show the results in Fig. 2a. One can observe the robustness of the estimation method to outliers (see the behavior near t = 500, where abnormal values can be found), arising from the use of the pseudo-Huber loss function. Then we performed the estimation procedure for the Hidden Markov Chain part with the following results:
Fig. 1. The exemplary realization of the stochastic process defined by Eq. (21).
\[ \hat{\mu}_1 = 0.0582, \ \hat{\sigma}_1 = 1.1268, \ \hat{\lambda}_1 = 0.1454, \ \hat{p}_1 = 1.3525, \ \hat{q}_1 = 2.0168, \]
\[ \hat{\mu}_2 = -0.1841, \ \hat{\sigma}_2 = 2.9016, \ \hat{\lambda}_2 = -0.1892, \ \hat{p}_2 = 3.4400, \ \hat{q}_2 = 1.0538, \]
\[ \hat{P} = \begin{bmatrix} 0.992646 & 0.007354 \\ 0.012827 & 0.987173 \end{bmatrix}. \]
In Fig. 2b we also present the probabilities of the data being in the more volatile and in the more stable state, together with the accompanying vector of differences of {x_{t_i}}. Let us note that we determine the process's state by picking the one with the highest conditional probability ζ_{t_i|t_n} (14). On this single example, we see that the estimates are very close to the theoretical ones. To further validate the method we performed 250 Monte Carlo simulations and visualized, using box plots, the distributions of the estimators of the SGT distributions' parameters and the distribution of the hidden state accuracy (the fraction of time points at which the estimated and theoretical states match); the results are presented in Fig. 3. We observe that the method estimates the regimes with high accuracy. However, from Fig. 3c we see that the parameter σ (responsible for the variance) is sometimes overestimated. This happens because, when the sample size is relatively small and many unlikely values occur (the “heavy tail” property), a large variance is a more likely explanation than heavy tails; the parameters p and q, which model the “heavy tail” property, are then also wrongly estimated. This problem can be mitigated by using trajectories of a larger sample size. Thus one can conclude that the estimation methodology is correct and provides reasonable results.
Fig. 2. Resulting estimates for simulated data for SDE given by Eq. (21).

Fig. 3. Accuracy of regimes estimates with distributions of model's (5) parameters' estimators.

5 Real Data Analysis

In this section, we check the performance of the proposed model in a real-life application by modelling daily copper price data. The data consist of 2525 data points from the beginning of 2004 until the end of 2013. In Fig. 4a we observe a variance that changes in time. Due to this fact we transformed the data using the Box-Cox transformation [64], here the natural logarithm of the data. In the next step, we scaled the data by 1000 to reduce numerical errors. From Fig. 4b, where the differenced data are presented, we see periods when the variance is significantly higher than in other periods, and the series resembles the realization of a random variable with heavier tails than the Gaussian distribution. Thus the proposition of using this model seems to be reasonable. The estimation method requires selecting the values of the hyperparameters, which we find using grid search; we chose the ones with the lowest loss. For the local estimation of the α_1(·) and α_2(·) functions we picked the following values:
– δ = 75 (parameter of the pseudo-Huber function (9));
– d_1 = 0 (degree of the local polynomial approximation of α_1(·) (11));
– d_2 = 1 (degree of the local polynomial approximation of α_2(·) (11));
– b = 750 (bandwidth of the asymmetric kernel function (12));
– b_r = 187.6 (distance from 0 to the right bound of the asymmetric kernel function (12));
– τ = 0.7 (Tikhonov regularization).

Fig. 4. Copper price data used for evaluation of the proposed methodology.

Taking the above-mentioned parameters, we minimized the loss function (10). The results are presented in Fig. 5. After that, we estimated the parameters of the Hidden Markov Chain with the SGT distribution as the noise. The estimated parameters are as follows:
\[ \hat{\mu}_1 = 0.12, \ \hat{\sigma}_1 = 36.71, \ \hat{\lambda}_1 = -0.0270, \ \hat{p}_1 = 3.8919, \ \hat{q}_1 = 1.1947, \]
\[ \hat{\mu}_2 = 1.05, \ \hat{\sigma}_2 = 17.12, \ \hat{\lambda}_2 = -0.0300, \ \hat{p}_2 = 2.0268, \ \hat{q}_2 = 5.8758, \]
\[ \hat{P} = \begin{bmatrix} 0.97713 & 0.02287 \\ 0.01223 & 0.98777 \end{bmatrix}. \]
Fig. 5. Resulting trend estimates for copper price data.
The final results are presented in Fig. 6. We also tested the fit for the residuals. Because we presumed that they are independent, we can test the residuals separately for each state of the Hidden Markov Chain (the orange and blue parts of the vector in Fig. 6a). We used the Kolmogorov-Smirnov (KS) test [65] to validate the null hypothesis that the vector of residuals constitutes a sample from the SGT distribution with the estimated parameters. For both states, the p-value is greater than 0.95, which is far above the commonly used significance level α = 0.05. The KS test returned the test statistics K_1 = 0.01267, K_2 = 0.01267 with p-values 0.9572 and 0.9601 for the states C_1 and C_2, respectively. Thus we cannot reject the null hypothesis. To further review the ability of the model to explain the randomness in the data, especially the heavy tails, we performed a visual test of goodness of fit, the Q–Q plot [66] presented in Fig. 7. The Q–Q plot is a visual test which compares the empirical quantiles of the real data with the theoretical ones corresponding to the tested distribution. Based on the results of both tests, we conclude that the model properly describes the data. From Fig. 6a we see that the estimation procedure found the presumed two states, which can be easily labelled as calm and volatile. This fact finally confirms our assumption of the existence of such hidden states and thus justifies the application of the model to the copper price data.
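The KS statistic used above is the largest distance between the empirical distribution function of the sample and the hypothesized distribution function. The sketch below computes it by hand; the function names are hypothetical, and a uniform distribution on [0, 1] with a three-point sample is used purely as a stand-in, since the SGT distribution function is not reproduced here.

```python
def ks_statistic(sample, cdf):
    """Two-sided KS statistic: sup |F_n(x) - F(x)| evaluated at the sample points."""
    ordered = sorted(sample)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        distance = max(distance, i / n - f, f - (i - 1) / n)
    return distance


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (stand-in for the SGT CDF)."""
    return min(1.0, max(0.0, x))


toy_sample = [0.7, 0.1, 0.4]   # hypothetical residuals mapped to [0, 1]
statistic = ks_statistic(toy_sample, uniform_cdf)
print(statistic)
```

For the toy sample the largest gap occurs at the last point, where the empirical distribution function reaches 1 while the uniform one equals 0.7, so the statistic is about 0.3. In the setting of the paper the same computation would be applied separately to the residuals of each state with the fitted SGT distribution function.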
Fig. 6. Resulting Hidden Markov Chain estimates for copper price data.
Fig. 7. Q–Q plots of residuals for each estimated hidden state.
6
Conclusions
In recent decades, the commodities market has faced fundamental changes, which justifies the use of regime-switching models. From the application perspective, models reflecting heteroscedastic parameter behavior and a non-Gaussian return distribution can be extremely useful in assessing the state of the commodity markets and forecasting potential price distributions. The approach presented in this paper can be used for practical purposes, especially when one analyzes commodity prices in risk management applications. A theoretical model that properly reflects the dynamics of commodity prices can be used for simulations over a horizon of a few years. This is extremely important for business decisions in which the analysis of price scenarios is crucial. In this paper, we have proposed a stochastic model which takes into account the specific behavior of the real data, namely regime switching, time-dependent characteristics and a non-Gaussian distribution. We have proposed a novel estimation procedure for the parameters of the considered model and checked its efficiency using Monte Carlo simulations. Finally, we have applied the proposed methodology to real data describing copper prices, one of the main risk factors of the KGHM mining company. Although the proposed model was applied to the description of financial data, it is universal and could be applied to any other time series with the specific behavior mentioned above. The current paper is a continuation of the authors’ previous research, in which a model with time-dependent parameters and non-Gaussian behavior was proposed. Here, we extend the methodology to the case of possible changes in the volatility of the analyzed metal’s prices.
References
1. Cortez, C.T., Saydam, S., Coulton, J., Sammut, C.: Alternative techniques for forecasting mineral commodity prices. Int. J. Min. Sci. Technol. 28, 309–322 (2018)
2. Gambaro, A.M., Secomandi, N.: A discussion of non-gaussian price processes for energy and commodity operations. Prod. Oper. Manage. (2020). https://doi.org/10.1111/poms.13250
3. Benth, F.E.: Cointegrated commodity markets and pricing of derivatives in a non-gaussian framework. In: Kallsen, J., Papapantoleon, A. (eds.) Advanced Modelling in Mathematical Finance. SPMS, vol. 189, pp. 477–496. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45875-5_20
4. Tully, E., Lucey, B.M.: A power garch examination of the gold market. Res. Int. Bus. Finance 21(2), 316–325 (2007)
5. Cortez, C.T., Saydam, S., Coulton, J., Sammut, C.: Alternative techniques for forecasting mineral commodity prices. Int. J. Min. Sci. Technol. 28(2), 309–322 (2018)
6. Obuchowski Jakub, W.A.: The ornstein-uhlenbeck process with non-gaussian structure. Acta Phys. Polon B 44(5), 1123–1136 (2013)
7. Obuchowski, J., Wyłomańska, A.: The Ornstein-Uhlenbeck process with non-Gaussian structure. Acta Phys. Pol. B. 44(5), 1123–1136 (2013)
8. Brockwell, P.: Recent results in the theory and applications of CARMA processes. Ann. Inst. Stat. Math. 66(4), 647–685 (2014)
9. Brockwell, P.J.: Lévy-driven CARMA processes. Ann. Inst. Stat. Math. 53(1), 113–124 (2001)
10. Brockwell, P.J., Davis, R.A., Yang, Y.: Estimation for non-negative Lévy-driven CARMA processes. J. Bus. Econ. Stat. 29(2), 250–259 (2011)
11. Janczura, J., Orzeł, S., Wyłomańska, A.: Subordinated α-stable Ornstein-Uhlenbeck process as a tool for financial data description. Phys. A Stat. Mech. Appl. 390(23–24), 4379–4387 (2011)
12. Wyłomańska, A.: Measures of dependence for Ornstein-Uhlenbeck process with tempered stable distribution. Acta Phys. Pol. B. 42(10), 2049–2062 (2011)
13. Salhi, K., Deaconu, M., Lejay, A., Champagnat, N., Navet, N.: Regime switching model for financial data: empirical risk analysis. Phys. A Stat. Mech. Appl. 461, 148–157 (2016)
14. Haldrup, N., Ørregaard Nielsen, M.: A regime switching long memory model for electricity prices. J. Econ. 135(1), 349–376 (2006)
15. Hamilton, J.: Regime-Switching Models. Palgrave McMillan Ltd, London (01 2008)
16. Cai, J.: A markov model of switching-regime arch. J. Bus. Econ. Stat. 12(3), 309–316 (1994)
17. Alizadeh, A., Nomikos, N., Pouliasis, P.: A markov regime switching approach for hedging energy commodities. J. Bank. Finance 32, 1970–1983 (2008)
18. Ho, T.S.Y., Lee, S.-B.: Term structure movements and pricing interest rate contingent claims. J. Finance 41(5), 1011–1029 (1986)
19. Hull, J., White, A.: Pricing interest-rate-derivative securities. Rev. Financ. Stud. 3(4), 573–592 (1990)
20. Black, F., Derman, E., Toy, W.: A one-factor model of interest rates and its application to treasury bond options. Financ. Anal. J. 46(1), 33–39 (1990)
21. Black, F., Karasinski, P.: Bond and option pricing when short rates are lognormal. Financ. Anal. J. 47(4), 52 (1991)
22. Theodossiou, P.: Financial data and the skewed generalized t distribution. Manage. Sci. 44(12-part-1), 1650–1661 (1998)
23. BenSaïda, A., Slim, S.: Highly flexible distributions to fit multiple frequency financial returns. Phys. A Stat. Mech. Appl. 442, 203–213 (2016)
24. BenSaïda, A., Boubaker, S., Nguyen, D.K., Slim, S.: Value-at-Risk under market shifts through highly flexible models. J. Forecast. 37(8), 790–804 (2018)
25. Slim, S., Koubaa, Y., Bensaida, A.: Value-at-risk under Lévy GARCH models: evidence from global stock markets. J. Int. Financ. Mark. Inst. Money 46, 30–53 (2017)
26. Hansen, C., McDonald, J., Theodossiou, P.: Some flexible parametric models for partially adaptive estimators of econometric models. Econ. E-J. 1(7), 1–20 (2007)
27. McDonald, J.B., Michelfelder, R.A., Theodossiou, P.: Robust estimation with flexible parametric distributions: estimation of utility stock betas. Quant. Financ. 10(4), 375–387 (2010)
28. Sikora, G., Michalak, A., Bielak, Ł., Miśta, P., Wyłomańska, A.: Stochastic modeling of currency exchange rates with novel validation techniques. Phys. A Stat. Mech. Appl. 523, 1202–1215 (2019)
29. Dawid Szarek, A.W., Bielak, Ł.: Long-term prediction of the metals’ prices using non-gaussian time-inhomogeneous stochastic process. Phys. A Stat. Mech. Appl. 555, 124659 (2020)
30. Fan, J., Jiang, J., Zhang, C., Zhou, Z.: Time-dependent diffusion models for term structure dynamics. Stat. Sinica 13(4), 965–992 (2003)
31. Su, Y.Y., Cui, H.J., Li, K.C.: Parameter estimation of varying coefficients structural ev model with time series. Acta Math. Sinica English Ser. 33(5), 607–619 (2017)
32. Cui, H.: Estimation in partial linear ev models with replicated observations. Sci. China Ser. A Math. 47(1), 144 (2004)
33. Sophocleous, C., Hara, J., Leach, P.: A model of stochastic volatility with time-dependent parameters. J. Comput. Appl. Math. 235, 05 (2011)
34. Reichert, P., Mieleitner, J.: Analyzing input and structural uncertainty of nonlinear dynamic models with stochastic, time dependent parameters. Water Resour. Res. 45(10), 1–19 (2009)
35. Janczura, J., Weron, R.: Efficient estimation of markov regime-switching models: an application to electricity spot prices. AStA Adv. Stat. Anal. 96, 07 (2011)
36. Kim, C.-J., Piger, J., Startz, R.: Estimation of markov regime-switching regression models with endogenous switching. J. Econ. 143(2), 263–273 (2008)
37. Kruczek, P., Żuławiński, W., Pagacz, P., Wyłomańska, A.: Fractional lower order covariance based-estimator for ornstein-uhlenbeck process with stable distribution. Mathematica Applicanda 47, 259–292 (2019)
38. Kitagawa, G.: Non-gaussian state-space modeling of nonstationary time series. J. Am. Stat. Assoc. 82(400), 1032–1041 (1987)
39. Fridman, M., Harris, L.: A maximum likelihood approach for non-gaussian stochastic volatility models. J. Bus. Econ. Stat. 16(3), 284–291 (1998)
40. Vasicek, O.: An equilibrium characterization of the term structure. J. Financ. Econ. 5(2), 177–188 (1977)
41. Black, F., Scholes, M.: The pricing of options and corporate liabilities. J. Polit. Econ. 81(3), 637 (1973)
42. Weron, A., Weron, R.: Inżynieria finansowa: Wycena instrumentów pochodnych. Symulacje komputerowe. Statystyka rynku, WNT (1998)
43. Theodossiou, P.: Financial data and the skewed generalized t distribution. Manage. Sci. 44(12), 1650–1661 (1998)
44. Hansen, C., McDonald, J.B., Newey, W.K.: Instrumental variables estimation with flexible distributions. J. Bus. Econ. Stat. 28(1), 13 (2010)
45. Andrews, R.R.G.E., Askey, R.: Special functions, ser. Encyclopedia of mathematics and its applications 71. Cambridge University Press (1999)
46. Charnes, A., Frome, E.L., Yu, P.L.: The equivalence of generalized least squares and maximum likelihood estimates in the exponential family. J. Am. Stat. Assoc. 71(353), 169–171 (1976)
47. Kiers, H.A.L.: Weighted least squares fitting using ordinary least squares algorithms. Psychometrika 62(2), 251–266 (1997)
48. Huber, P.J.: Robust estimation of a location parameter. Ann. Math. Stat. 35(1), 73–101 (1964)
49. Charbonnier, P., Blanc-Feraud, L., Aubert, G., Barlaud, M.: Deterministic edge-preserving regularization in computed imaging. IEEE Trans. Image Process. 6(2), 298–311 (1997)
50. Trevor Hastie, J.F., Tibshirani, R.: The Elements of Statistical Learning. Springer, Cham (2017)
51. Marsden, A.W.J.: Calculus II. Springer, New York (1985). https://doi.org/10.1007/978-1-4612-5026-5
52. Jaditz, T., Riddick, L.A.: Time-series near-neighbor regression. Stud. Nonlinear Dyn. Econ. 4(1), 35–44 (2000)
53. Cont, R.: Encyclopedia of Quantitative Finance. Wiley, vol. 4, pp. 1807–1811 (2010)
54. Saleh, A.K.M.E., Arashi, M., Tabatabaey, S.M.M.: Statistical Inference for Models with Multivariate T-distributed Errors. Wiley, pp. 133–170 (2014)
55. Shanno, D.F.: Conditioning of quasi-newton methods for function minimization. Math. Comput. 24(111), 647–656 (1970)
56. Graham, C.: Markov Chains: Analytic and Monte Carlo Computations. Wiley, Chichester (2014)
57. Walter Zucchini, I.L.M.: Hidden Markov Models for Individual Time Series. Chapman and Hall/CRC, Boca Raton (2009)
58. Hamilton, J.: Time Series Analysis. Princeton University Press, Princeton (1994)
59. Ephraim, Y., Merhav, N.: Hidden markov processes. IEEE Trans. Inf. Theor. 48(6), 1518–1569 (2002)
60. Kim, C.-J.: Dynamic linear model with markov switching. J. Econ. 60, 1–22 (1991)
61. Hamilton, J.D.: Analysis of time series subject to changes in regime. J. Econ. 45(1), 39–70 (1990)
62. Nelder, J.A., Mead, R.: A simplex method for function minimization. Comput. J. 7(4), 308–313 (1965)
63. Fox, L., Mayers, D.F.: Numerical Solution of Ordinary Differential Equations. Chapman and Hall, London (1987)
64. Box, G.E.P., Cox, D.R.: An analysis of transformations. J. Royal Stat. Soc. Series B (Methodological) 26(2), 211–252 (1964)
65. Stephens, M.A.: Edf statistics for goodness of fit and some comparisons. J. Am. Stat. Assoc. 69(347), 730–737 (1974)
66. Wilk, M.B., Gnanadesikan, R.: Probability plotting methods for the analysis of data. Biometrika 55(1), 1–17 (1968)
Foundations of the Theory of Strongly Periodically Correlated Fields over $\mathbb{Z}^2$
Anna E. Dudek1, Dominique Dehay2, Harry Hurd3, and Andrzej Makagon4(B)
1 Department of Applied Mathematics, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Krakow, Poland, [email protected]
2 Universite Rennes 2, CNRS, IRMAR - UMR 6625, Rennes, France
3 University of North Carolina, Chapel Hill, USA
4 Hampton University, Hampton, VA, USA, [email protected]
Abstract. The aim of this paper is to provide readers with the basic concepts and techniques for the analysis of strongly periodically correlated fields (SCF) over $\mathbb{Z}^2$. We show that every SCF over $\mathbb{Z}^2$ can be transformed into a coordinate-wise SCF (Fact 3.1) of the kind studied in [13]. The main result of the paper, however, is a specific decomposition of a strongly periodically correlated field (Theorem 4.1), which was not available for coordinate-wise SCFs. As consequences of the latter we obtain a description and an easy proof of the existence of the spectral measures of an SCF (Theorem 5.1), as well as a functional description of an absolutely continuous SCF (Theorem 6.1). Most of the facts are explained in detail and proved, with the exception of Theorem 6.1, whose proof was too long for this publication and is left for a forthcoming paper.
Keywords: Random fields · Periodically correlated · Cyclostationary
1
Introduction
The letter $\mathbb{Z}$ will denote the set of all integers and $\mathbb{Z}^2 = \mathbb{Z} \times \mathbb{Z}$. Elements of $\mathbb{Z}^2$ will be written as row vectors $p = (m, n)$, $m, n \in \mathbb{Z}$. Let $T_1 = (T_{11}, T_{12}) \in \mathbb{Z}^2$, $T_2 = (T_{21}, T_{22}) \in \mathbb{Z}^2$ and let $T = [T_1; T_2]$ be the $2 \times 2$ matrix whose first row is $T_1$ and whose second row is $T_2$. A function $f$ on $\mathbb{Z}^2$ is called strongly periodic with period $T = [T_1; T_2]$ [13] if $T_1, T_2$ are linearly independent (that is, $\det(T) \neq 0$) and for every $m, n, p, q \in \mathbb{Z}$
$$f(m, n) = f(m + pT_{11} + qT_{21},\; n + pT_{12} + qT_{22}).$$
Note that the equation above can be written as $f(m, n) = f((m, n) + (p, q)T)$, $m, n, p, q \in \mathbb{Z}$. A random field $X$ over $\mathbb{Z}^2$ is a family $X = (X(m, n))$, $(m, n) \in \mathbb{Z}^2$, of elements of a complex Hilbert space $\mathcal{H}$ with an inner product $(\cdot, \cdot)$, indexed by $\mathbb{Z}^2$. In the probabilistic setting $\mathcal{H} = L_0^2(\Omega)$ is the space of all complex, zero-mean, finite-variance random variables and $(Y, Z) = E\,Y\overline{Z}$, $Y, Z \in L_0^2(\Omega)$. The auto-covariance of a random field $X$ is defined as
$$R_X((j, k), (m, n)) = (X(j, k), X(m, n)), \quad (j, k) \in \mathbb{Z}^2,\ (m, n) \in \mathbb{Z}^2.$$

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 127–144, 2022. https://doi.org/10.1007/978-3-030-82110-4_7
Two fields $X$ and $Y$ over $\mathbb{Z}^2$, in possibly different Hilbert spaces $\mathcal{H}$ and $\mathcal{K}$, are said to be equivalent if $R_X = R_Y$. For the spectral analysis of fields it is convenient to define the family of functions associated with $R_X$,
$$B_{(j,k)}(m, n) = R_X((j + m, k + n), (m, n)), \quad (m, n) \in \mathbb{Z}^2,\ (j, k) \in \mathbb{Z}^2. \tag{1}$$
Definition 1.1. Suppose that $T_1 = (T_{11}, T_{12})$ and $T_2 = (T_{21}, T_{22})$ are linearly independent. We say that a random field $X$ is strongly periodically correlated or strongly cyclostationary (SCF) with a period $T = [T_1; T_2]$ if for every $(j, k) \in \mathbb{Z}^2$ the function $B_{(j,k)}(m, n)$ is a strongly periodic function of $(m, n)$ with the same period $T = [T_1; T_2]$. An SCF such that $B_{(j,k)}(m, n)$ is constant in $(m, n)$ is called stationary. From Corollary 2.1 in Sect. 2 it follows that an SCF with period $T$ is stationary iff $|\det(T)| = 1$. For the literature on stationary random fields we refer to [19].

In the more general setting of an arbitrary discrete group, strongly periodically correlated fields were examined in [4]. In this paper we restrict our attention to the group $\mathbb{Z}^2$. For this group we introduce an approach completely different from that of [4], based on a description of the structure of an SCF. This approach produces simpler proofs and allows us to obtain results not available with the technique of [4].

SCFs are a natural extension of periodically correlated (PC) sequences. The latter have been studied for the last 50 years or more. Thanks to the contributions of Hurd (e.g. [10,12]), Javorskij (e.g. [5,15]) and many other authors (e.g. [1,2,6–8,21–24]), the theory and statistics of PC sequences are almost complete, so it seems natural to turn attention to PC fields.

The paper is organized as follows. First, in Sect. 2, we recall some tools useful in the sequel. We present some features of the group structure of $\mathbb{Z}^2$, and we define the notion of a period of a subgroup of $\mathbb{Z}^2$. This is illustrated by an example. Section 3 is devoted to the introduction of strongly periodic functions on $\mathbb{Z}^2$. Then we study the structure of a strongly periodically correlated field (SCF) in Sect. 4, and its spectrum in Sect. 5. In Sect. 6 we extend the notion of transfer functions to an absolutely continuous SCF. Finally, in Sect. 7, we consider the shift group and the multivariate stationary fields associated with an SCF.
2
Integer Matrices and Groups
All matrices in this paper will be $2 \times 2$. A matrix with integer coefficients will be called an integer matrix. To save space, a $2 \times 2$ integer matrix $A = [a_{ij}]$ will be written as $A = [a_{11}, a_{12}; a_{21}, a_{22}]$, for example $[1, 2; 3, 4] = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$. The determinant of every square integer matrix is an integer. Any integer matrix with determinant equal to $\pm 1$ will be called unimodular. Unimodular matrices are obtained from the identity matrix by elementary row or column operations. If $A$ is a unimodular integer matrix then $\det(A) = \pm 1$, and hence $A^{-1}$ exists, has integer entries, and is unimodular. If $A$ is a nonzero integer matrix or vector then $\gcd(A)$ is the (positive) greatest common divisor of all entries of $A$. If $A$ is a matrix or vector with real entries and $r$ is a positive number, then $[A]_r$ denotes the matrix or vector whose entries are the non-negative remainders in the division of the corresponding entries of $A$ by $r$, for example $[(5\pi/2, 2\pi)]_{2\pi} = (\pi/2, 0)$.

Fact 2.1. Let $T$ be a $2 \times 2$ integer matrix. Then:
1. There exist a diagonal integer matrix $D$ and unimodular integer matrices $A$ and $B$ such that $T = ADB$. None of the matrices $A$, $D$, or $B$ is uniquely determined by $T$.
2. The diagonal matrix $D$ above can be chosen in such a way that $D_{11} = \gcd(T)$ and $D_{22} = |\det(T)|/\gcd(T)$, but even then the matrices $A$ and $B$ are not unique.

This fact is well known and can be proved by reducing $T$ to a diagonal form by elementary operations on rows and columns, as shown in an example below. The matrix $D$ in part 2 is called the Smith normal form of $T$. A detailed proof of part 2 for $2 \times 2$ matrices is in [20].

The set $\mathbb{Z}^2$ is a group. The dual $\widehat{\mathbb{Z}^2}$ of $\mathbb{Z}^2$ will be realized as $[0, 2\pi)^2$ equipped with coordinate-wise addition modulo $2\pi$. With this identification the action of a character $(s, t) \in [0, 2\pi)^2$ on $\mathbb{Z}^2$ is of the form
$$(s, t)(m, n) = \exp(-i(ms + nt)) = \exp(-i(m, n)(s, t)'), \quad (m, n) \in \mathbb{Z}^2,$$
where $'$ denotes the transpose. For normalized Haar measures we take the counting measure on $\mathbb{Z}^2$, and the Lebesgue measure divided by $4\pi^2$ on $\widehat{\mathbb{Z}^2} = [0, 2\pi)^2$.

A subgroup of $\mathbb{Z}^2$ is a subset $K$ of $\mathbb{Z}^2$ which itself is a group with respect to coordinate-wise addition.
A subgroup $K$ of $\mathbb{Z}^2$ is said to be strong if it has two linearly independent generators, that is, there are linearly independent $T_1 = (T_{11}, T_{12}) \in \mathbb{Z}^2$, $T_2 = (T_{21}, T_{22}) \in \mathbb{Z}^2$ such that $K = \{pT_1 + qT_2 : p, q \in \mathbb{Z}\}$. For any two $T_1 = (T_{11}, T_{12}) \in \mathbb{Z}^2$, $T_2 = (T_{21}, T_{22}) \in \mathbb{Z}^2$ we define $T = [T_1; T_2]$ to be the $2 \times 2$ matrix whose first row is $T_1$ and whose second row is $T_2$, and we denote $K_T := (\mathbb{Z}^2)T = \{(p, q)T : (p, q) \in \mathbb{Z}^2\}$. For a given subgroup $K$, linearly independent vectors $T_1, T_2 \in \mathbb{Z}^2$ are generators of $K$ iff $K_T = (\mathbb{Z}^2)T = K$. If $K_T = K$, then the matrix $T = [T_1; T_2]$ will be called a period of $K$.

Fact 2.2. If $T = [T_1; T_2]$ and $S = [S_1; S_2]$ are two periods of $K$ then there is a unimodular $2 \times 2$ integer matrix $A$ such that $T = AS$.

Proof. The "if" part follows from the fact that the range of each unimodular matrix $A$ is $\mathbb{Z}^2$, and hence $(\mathbb{Z}^2)T = (\mathbb{Z}^2)AS = (\mathbb{Z}^2)S$. To prove the "only if" part note first that since $(\mathbb{Z}^2)T = (\mathbb{Z}^2)S$, the rows of $T$ are integer linear combinations of the rows of $S$, and vice versa. Therefore $T = AS$ and $S = BT$ for some integer matrices $A$, $B$. Consequently $S = BAS$. Since $S$ is invertible (over the field of real numbers), we conclude that $BA = I$. This implies that $\det(B)\det(A) = 1$. Since both $\det(A)$ and $\det(B)$ are integers, they must be equal to $\pm 1$. Hence $A$ (and $B$) is unimodular.

A period $T$ is called a simple period if $T = DB$, where $B$ is unimodular and $D$ is a diagonal integer matrix with positive diagonal entries, i.e. $D = [H, 0; 0, V]$, where $H$, $V$ are some positive integers.

Fact 2.3. Every strong subgroup $K$ of $\mathbb{Z}^2$ has a simple period.

Proof. Let $T$ be any period of $K$. Then from Fact 2.1 it follows that $T = ADB$, where $A$, $B$ are unimodular and $D = [H, 0; 0, V]$, with $H, V > 0$. Since $A$ is unimodular, from Fact 2.2 we conclude that $S = A^{-1}T = DB$ is an equivalent simple period of $K$.

Note that from part 2 of Fact 2.1 it follows that in the proof above we can always choose $H = \gcd(T)$ and $V = |\det T|/\gcd(T)$, where $T$ is any equivalent period of $K$.

A strong subgroup $K$ is called a product subgroup if there are positive integers $H$, $V$ such that $K = K^{H,V} := \mathbb{Z}H \times \mathbb{Z}V = \{(pH, qV) : p, q \in \mathbb{Z}\}$. $K$ is a product subgroup iff $S = [H, 0; 0, V]$ is a period of $K$.

Fact 2.4. Let $K$ be a strong subgroup of $\mathbb{Z}^2$. Then there exist a product subgroup $K^{H,V}$ and a unimodular integer matrix $B$ such that $K = K^{H,V}B$.

Proof. From Fact 2.3 it follows that $K$ has a simple period $T = DB$, where $D = [H, 0; 0, V]$ and $B$ is unimodular. Define $K^{H,V} = (\mathbb{Z}^2)D = \mathbb{Z}H \times \mathbb{Z}V$. Then $K = (\mathbb{Z}^2)T = ((\mathbb{Z}^2)D)B = K^{H,V}B$. The isomorphism $\Phi$ of $K$ onto $K^{H,V}$ is given by $\Phi(m, n) = (m, n)B^{-1}$.

Note that if $T = [T_1; T_2]$ and $S = [S_1; S_2]$ are equivalent periods of $K$, then $|\det(T)| = |\det(S)|$. This common value is called the index of $K$ and will be denoted $d_K$.

Corollary 2.1. If $K$ has a period $T$ with $|\det(T)| = 1$, then $K = \mathbb{Z}^2$.

Proof.
Indeed, if $|\det(T)| = 1$ then, from the proof of Fact 2.4, we obtain that $|\det(T)| = |\det(D)||\det(B)| = |\det(D)| = HV = 1$, and hence $H = V = 1$. Therefore $K^{H,V} = \mathbb{Z}^2$ and consequently $K = K^{H,V}B = \mathbb{Z}^2B = \mathbb{Z}^2$, because $B$ is unimodular and hence invertible with an integer inverse.

The following notions will be illustrated by an example at the end of the section. Given a strong subgroup $K$ of $\mathbb{Z}^2$, let $\mathbb{Z}^2/K$ denote its quotient group, that is, $\mathbb{Z}^2/K$ is the group of all distinct cosets (i.e. elements of the family $\{(p, q) + K : (p, q) \in \mathbb{Z}^2\}$). The mapping $\iota_K : \mathbb{Z}^2 \to \mathbb{Z}^2/K$ defined by $\iota_K(m, n) = (m, n) + K$ is a homomorphism from $\mathbb{Z}^2$ onto $\mathbb{Z}^2/K$. The number of elements in $\mathbb{Z}^2/K$ is called the index of $K$ in $\mathbb{Z}^2$ and is equal to $d_K$. The group $\mathbb{Z}^2/K$ can be represented as a subset of $\mathbb{Z}^2$ by means of a cross-section. A cross-section is a set $\Gamma \subset \mathbb{Z}^2$ such that $(0, 0) \in \Gamma$ and the mapping $\iota_K$ restricted to $\Gamma$ is one-to-one and onto. If we denote the inverse of $(\iota_K|_\Gamma)$ by $\xi_\Gamma$ then $\xi_\Gamma$ induces an addition $\dot{+}$ on $\Gamma$, $(m, n) \dot{+} (j, k) = \xi_\Gamma(m + j, n + k)$, $(m, n), (j, k) \in \Gamma$. Note that if $(m + j, n + k)$ happens to belong to $\Gamma$, then $(m, n) \dot{+} (j, k) = (m + j, n + k) = (m, n) + (j, k)$, so within $\Gamma$ the addition $\dot{+}$ coincides with the standard addition on $\mathbb{Z}^2$. With this addition the mapping $\xi_\Gamma$ is an isomorphism (i.e. it is one-to-one, onto, and preserves group operations) from $\mathbb{Z}^2/K$ onto $(\Gamma, \dot{+})$.

There are many ways to choose a cross-section. Given a period $T = [T_1; T_2]$ of $K$, a natural cross-section $\Gamma_T$ for $\mathbb{Z}^2/K$ associated with the period $T$ is the set of all points in the parallelogram with vertices $\{(0, 0), T_1, T_1 + T_2, T_2\}$, excluding the edges $T_1 \leftrightarrow (T_1 + T_2)$ and $(T_1 + T_2) \leftrightarrow T_2$. This $\Gamma_T$ is easy to visualize but not easy to parameterize. If $K = K^{H,V} = \mathbb{Z}H \times \mathbb{Z}V$, $H, V > 0$, is a product subgroup, then a natural cross-section for $K$ is $\Gamma^{H,V} = \{(p, q) : 0 \le p < H,\ 0 \le q < V\}$. If $K$ is strong and is not a product subgroup then a cross-section for $\mathbb{Z}^2/K$ can be found as follows: choose an equivalent period $T$ of $K$, factor $T = ADB$, $D = [H, 0; 0, V]$, as in part 1 of Fact 2.1, and then take
$$\Gamma = \Gamma^{H,V}B = \{(p, q)B : 0 \le p < H,\ 0 \le q < V\}. \tag{2}$$
From Fact 2.4 it follows that $\Gamma$ is a cross-section for $\mathbb{Z}^2/K$.

Fact 2.5. If $K$ is strong, then $\mathbb{Z}^2/K$ is finite and for every cross-section $\Gamma$ for $\mathbb{Z}^2/K$ and for every period $T$ of $K$
$$n(\Gamma) = n(\mathbb{Z}^2/K) = |\det(T)| = d_K < \infty. \tag{3}$$
(Here $n(\Delta)$ stands for the number of elements in the set $\Delta$.)

The dual group $\widehat{\mathbb{Z}^2/K}$ of $\mathbb{Z}^2/K$ is by definition the group of all homomorphisms from $\mathbb{Z}^2/K$ into the unit circle of the complex plane equipped with multiplication. If $K$ is a strong subgroup of $\mathbb{Z}^2$ then $\mathbb{Z}^2/K$ is finite, and hence $\widehat{\mathbb{Z}^2/K}$ is finite and $n(\widehat{\mathbb{Z}^2/K}) = d_K$ (e.g. [9], (23.27)(d)). Because every $\lambda \in \widehat{\mathbb{Z}^2/K}$ defines a character on $\mathbb{Z}^2$, we can realize $\widehat{\mathbb{Z}^2/K}$ as a subgroup $\Lambda_K$ of $[0, 2\pi)^2 = \widehat{\mathbb{Z}^2}$, namely
$$\widehat{\mathbb{Z}^2/K} = \Lambda_K = \{(s, t) \in [0, 2\pi)^2 : [(s, t)T']_{2\pi} = (0, 0)\}, \tag{4}$$
where $[(u, v)]_{2\pi}$ is the pair of remainders in the division of $u$ and $v$ by $2\pi$, respectively. The addition in $\Lambda_K$ is inherited from $[0, 2\pi)^2$, that is $(s, t) + (u, v) = [(s + u, t + v)]_{2\pi}$. The group $\Lambda_K$ will be referred to as the frequency group of $\mathbb{Z}^2/K$ and its elements will be called frequencies. The lemma below gives a simple parametrization of a cross-section $\Gamma$ for $\mathbb{Z}^2/K$ and of the dual $\Lambda_K$ of $K$ in the case when $K$ has a simple period.
Lemma 2.1. Let $K$ be a strong subgroup of $\mathbb{Z}^2$ and let $S = [H, 0; 0, V]B$ be a simple period of $K$. Then
$$\Gamma = \{(p, q)B : 0 \le p < H,\ 0 \le q < V\} \tag{5}$$
is a cross-section for $\mathbb{Z}^2/K$, and
$$\Lambda_K = \{(s, t) \in [0, 2\pi)^2 : (s, t) = [(2\pi p/H, 2\pi q/V)(B')^{-1}]_{2\pi},\ 0 \le p < H,\ 0 \le q < V\}. \tag{6}$$
The value of a character $(s, t) \in \Lambda_K$ at $(p, q) \in \Gamma$ is then equal to $\exp(-i(p, q)(s, t)')$. Moreover, for any period $T$ of $K$ and any cross-section $\Gamma$ for $\mathbb{Z}^2/K$ we have $n(\Lambda_K) = n(\Gamma) = d_K = |\det(T)|$.

To illustrate the previous notions consider the following example.

Example 2.1. Let $K = \{(j, k) = (4p + 10q, 5p + 14q) : p, q \in \mathbb{Z}\}$. Then $T = [4, 5; 10, 14]$ is a period of $K$. $T$ can be written as
$$T = [4, 5; 10, 14] = [2, -1; 5, -2]\,[2, 0; 0, 3]\,[1, 2; 0, 1] = ADB \tag{7}$$
with $A = [2, -1; 5, -2]$, $D = [2, 0; 0, 3]$, and $B = [1, 2; 0, 1]$. Define $S = DB = [2, 4; 0, 3]$. Then $S$ is also a period of $K$ and gives a much easier description of $K$, namely
$$K = \{(p, q)S = (2p, 4p + 3q) = p(2, 4) + q(0, 3) : p, q \in \mathbb{Z}\}.$$
A cross-section $\Gamma$ for $\mathbb{Z}^2/K$ as defined in (5) of Lemma 2.1 is
$$\Gamma = \{(p, q)B = (p, 2p + q) : p = 0, 1,\ q = 0, 1, 2\} = \{(0, 0), (0, 1), (0, 2), (1, 2), (1, 3), (1, 4)\}.$$
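The factorization (7) and the cross-section of Example 2.1 are easy to check by machine. The sketch below uses only integer arithmetic; the helper names `matmul`, `det` and `row_times` are illustrative and not part of the paper.

```python
def matmul(X, Y):
    """Product of two 2x2 integer matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def det(X):
    """Determinant of a 2x2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def row_times(v, X):
    """Row vector v = (p, q) times a 2x2 matrix X."""
    return (v[0] * X[0][0] + v[1] * X[1][0], v[0] * X[0][1] + v[1] * X[1][1])


A = [[2, -1], [5, -2]]
D = [[2, 0], [0, 3]]
B = [[1, 2], [0, 1]]

T = matmul(matmul(A, D), B)      # should reproduce the period T of K
S = matmul(D, B)                 # simple period S = DB
H, V = D[0][0], D[1][1]
gamma = [row_times((p, q), B) for p in range(H) for q in range(V)]

print("T =", T)
print("S =", S)
print("cross-section:", gamma)
```

The number of points in the cross-section equals |det(T)| = 6, in agreement with Lemma 2.1.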
Since in this example $H = 2$ and $V = 3$, $(B')^{-1} = [1, 0; -2, 1]$, and in view of (6) of Lemma 2.1
$$\widehat{\mathbb{Z}^2/K} = \Lambda_K = \{(s, t) = 2\pi(p/2, q/3)\,[1, 0; -2, 1] \text{ modulo } 2\pi : 0 \le p < 2,\ 0 \le q < 3\}$$
$$= \{(0, 0), (\pi, 0), (2\pi/3, 2\pi/3), (5\pi/3, 2\pi/3), (4\pi/3, 4\pi/3), (\pi/3, 4\pi/3)\}.$$
A cross-section $\beta$ for $[0, 2\pi)^2/\Lambda_K$ can be obtained from a simple period $S = [H, 0; 0, V]B$ as follows: $\beta = ([0, 2\pi/H) \times [0, 2\pi/V))(B')^{-1}$. In our case $H = 2$, $V = 3$, and $(B')^{-1} = [1, 0; -2, 1]$, and hence
$$\beta = \{[(u, v)\,[1, 0; -2, 1]]_{2\pi} = [(u - 2v, v)]_{2\pi} : 0 \le u < \pi,\ 0 \le v < 2\pi/3\}.$$

NOTE: The diagonal matrix $D$ above is not the Smith form of $T$. The MATLAB smithForm function produces $A = [1, 0; -2, -1]$, $B = [4, 5; -3, -4]$, $D = [1, 0; 0, 6]$, that is $H = 1$, $V = 6$, and an equivalent simple period $S = DB = [4, 5; -18, -24]$. Although this does not change $\Lambda_K$, the cross-section $\Gamma = \{(p, q)B : 0 \le p < H,\ 0 \le q < V\}$ will now be different (and strange), namely $\Gamma = \{(0, 0), (-3, -4), (-6, -8), (-9, -12), (-12, -16), (-15, -20)\}$. A practical consequence is that it seems better to do the factorization by hand, rather than use the MATLAB smithForm function, in order to get $B$ as simple as possible.
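The frequency group of Example 2.1 can be cross-checked in two ways: directly from definition (4), and from the parametrization (6). The sketch below is one possible way to do this with integer arithmetic only. It relies on an encoding that is an assumption of this illustration, not of the paper: an integer pair (a, b) stands for the frequency (2πa/d, 2πb/d), where d = |det T|. It also computes the diagonal of the Smith form from part 2 of Fact 2.1.

```python
from math import gcd

T = [[4, 5], [10, 14]]      # period from Example 2.1
B = [[1, 2], [0, 1]]        # unimodular factor of the simple period S = DB
H, V = 2, 3                 # diagonal entries of D
d = abs(T[0][0] * T[1][1] - T[0][1] * T[1][0])   # index d_K = |det T| = H * V

# Definition (4): (s, t) T' must vanish modulo 2*pi. With (s, t) = (2*pi/d) * (a, b)
# this becomes a pair of congruences modulo d.
by_definition = sorted(
    (a, b) for a in range(d) for b in range(d)
    if (a * T[0][0] + b * T[0][1]) % d == 0
    and (a * T[1][0] + b * T[1][1]) % d == 0)

# Formula (6): (2*pi*p/H, 2*pi*q/V) (B')^{-1} modulo 2*pi, in the same units,
# so that 2*pi*p/H corresponds to p*V and 2*pi*q/V corresponds to q*H.
Bt = [[B[0][0], B[1][0]], [B[0][1], B[1][1]]]              # transpose of B
det_Bt = Bt[0][0] * Bt[1][1] - Bt[0][1] * Bt[1][0]         # +1 or -1
Bt_inv = [[det_Bt * Bt[1][1], -det_Bt * Bt[0][1]],
          [-det_Bt * Bt[1][0], det_Bt * Bt[0][0]]]
by_formula = sorted(
    ((p * V * Bt_inv[0][0] + q * H * Bt_inv[1][0]) % d,
     (p * V * Bt_inv[0][1] + q * H * Bt_inv[1][1]) % d)
    for p in range(H) for q in range(V))

# Part 2 of Fact 2.1: diagonal of the Smith form, (gcd(T), |det T| / gcd(T)).
g = gcd(gcd(T[0][0], T[0][1]), gcd(T[1][0], T[1][1]))
smith_diagonal = (g, d // g)

print(by_definition)
print(by_formula)
print(smith_diagonal)
```

Both lists contain the same six pairs, which in units of π/3 are exactly the frequencies listed in the example, and the Smith diagonal (1, 6) agrees with the matrix D reported in the note above.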
In the next lemma we list some other properties of $\Lambda_K$ and $\Gamma$ that may be needed later in the paper.

Lemma 2.2. Let $K$ be a strong subgroup of $\mathbb{Z}^2$, $d_K$ be the index of $K$, $\Lambda_K = \widehat{\mathbb{Z}^2/K}$ be as in (4), and let $\Gamma$ be any cross-section for $\mathbb{Z}^2/K$. Then:
1. The family of sets $\Gamma_{j,k} := \Gamma + (j, k) = \{(p, q) + (j, k) : (p, q) \in \Gamma\}$, $(j, k) \in K$, forms a partition of $\mathbb{Z}^2$.
2. For every $(m, n) \in \mathbb{Z}^2$ there are unique $(p, q) \in \Gamma$ and $(j, k) \in K$ such that $(m, n) = (p, q) + (j, k)$. In the sequel we denote $[(m, n)]_\Gamma = (p, q)$ and $q_\Gamma(m, n) = (j, k)$.
3. If $(m, n) \in \mathbb{Z}^2$, then
$$\sum_{(s,t) \in \Lambda_K} \exp(-i(m, n)(s, t)') = \begin{cases} 0 & \text{if } (m, n) \notin K, \\ d_K & \text{if } (m, n) \in K. \end{cases}$$
4. If $(s, t) \in \Lambda_K$, then
$$\sum_{(m,n) \in \Gamma} \exp(-i(m, n)(s, t)') = \begin{cases} 0 & \text{if } (s, t) \neq (0, 0), \\ d_K & \text{if } (s, t) = (0, 0). \end{cases}$$

Proof. 1. Since by the definition of $\Gamma$ the sets $(p, q) + K$, $(p, q) \in \Gamma$, form a partition of $\mathbb{Z}^2$,
$$\bigcup_{(j,k) \in K} (\Gamma + (j, k)) = \bigcup_{(p,q) \in \Gamma} ((p, q) + K) = \mathbb{Z}^2.$$
To show that they are disjoint, suppose that $(p, q) \in \Gamma_{j_1,k_1} \cap \Gamma_{j_2,k_2}$, that is $(p, q) = (p_1, q_1) + (j_1, k_1) = (p_2, q_2) + (j_2, k_2)$, where $(p_1, q_1), (p_2, q_2) \in \Gamma$ and $(j_1, k_1), (j_2, k_2) \in K$. This implies that $(p_1, q_1) = (p_2, q_2)$, because both $(p_1, q_1), (p_2, q_2) \in \Gamma$ and, as we mentioned, the sets $(p, q) + K$, $(p, q) \in \Gamma$, are distinct. Consequently $(j_1, k_1) = (j_2, k_2)$.
2. This part is an immediate consequence of 1.
3. and 4. If $(m, n) \in K$ or $(s, t) = (0, 0)$, then $\exp(-i(m, n)(s, t)') = 1$ and both sums are equal to the number of elements added, that is to $d_K$ (Lemma 2.1). To prove that otherwise the sums are 0 we may use the fact that the group $\mathbb{Z}^2/K$, and hence $\Gamma$, being finite, is isomorphic to a product group of the form $G = Z(m_1) \times \cdots \times Z(m_s)$, where $Z(m) = \{0, 1, \ldots, m - 1\}$ with addition modulo $m$ ([9], A.27). As can be easily seen from [9], (23.27)(d), for any such $G$, $\sum_{\lambda \in \widehat{G}} \lambda(g) = 0$ if $g \neq 0$, and $\sum_{g \in G} \lambda(g) = 0$ if $\lambda \neq 0$.
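Parts 3 and 4 of Lemma 2.2 can be checked numerically for the subgroup K of Example 2.1. The following sketch hard-codes the cross-section and the frequency group from that example; the frequencies are stored as integer pairs (a, b) meaning (2πa/6, 2πb/6), which is a convention of this illustration only.

```python
import cmath

d = 6                                                        # index d_K of K
gamma = [(0, 0), (0, 1), (0, 2), (1, 2), (1, 3), (1, 4)]     # cross-section
lam = [(0, 0), (3, 0), (2, 2), (5, 2), (4, 4), (1, 4)]       # units of 2*pi/6


def character(point, freq):
    """Value exp(-i (m, n)(s, t)') with (s, t) = (2*pi/d) * freq."""
    m, n = point
    a, b = freq
    return cmath.exp(-2j * cmath.pi * (m * a + n * b) / d)


def in_K(point):
    """Membership in K = {(2p, 4p + 3q)}: m is even and n - 2m is a multiple of 3."""
    m, n = point
    return m % 2 == 0 and (n - 2 * m) % 3 == 0


def sum_over_frequencies(point):
    """Left-hand side of part 3 of Lemma 2.2."""
    return sum(character(point, freq) for freq in lam)


def sum_over_cross_section(freq):
    """Left-hand side of part 4 of Lemma 2.2."""
    return sum(character(point, freq) for point in gamma)


print(abs(sum_over_frequencies((2, 4))), abs(sum_over_frequencies((1, 0))))
print([round(abs(sum_over_cross_section(freq)), 10) for freq in lam])
```

Up to rounding errors, the first sum equals 6 on K and 0 elsewhere, and the second sum equals 6 only for the zero frequency.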
3
Strongly Periodic Functions on $\mathbb{Z}^2$
A definition of a strongly periodic function on $\mathbb{Z}^2$ was given in the Introduction. A strongly periodic function $f$ with period $T$ is said to be coordinate-wise periodic if $f$ has a period of the form $S = [H, 0; 0, V]$, $H > 0$, $V > 0$, that is, if there are $H, V > 0$ such that $f(m, n) = f(m + jH, n + kV)$ for all $m, n, j, k \in \mathbb{Z}$.

Fact 3.1. Let $f$ be strongly periodic with a period $T$ and let $T = ADB$, $D = [H, 0; 0, V]$, be a factorization of $T$ from Fact 2.1 part 1. Then $g(m, n) = f((m, n)B)$ is coordinate-wise periodic with a period $D$.
Proof. Recall that $A$ and $B$ are unimodular, so $A^{-1}$ exists, is an integer matrix, and $I = A^{-1}A$. Since $f(m, n) = f((m, n) + (k, j)T)$ for every $m, n, j, k \in \mathbb{Z}$, we conclude that
$$g((m, n) + (k, j)D) = f(((m, n) + (k, j)D)B) = f((m, n)B + (k, j)A^{-1}ADB) = f((m, n)B + (k, j)A^{-1}T) = f((m, n)B) = g(m, n).$$

Corollary 3.1. Let $f$ be strongly periodic with a period $T = [T_1; T_2]$. If $|\det(T)| = 1$ then $f$ is constant.

Proof. By Corollary 2.1, $f$ is periodic with period group $K = \mathbb{Z}^2$, that is $f(m, n) = f(m + p, n + q)$ for every $m, n, p, q \in \mathbb{Z}$.

Fourier analysis is an important tool in the study of PC sequences. The idea is to represent a function as a mixture (sum, series, or integral) of fundamental harmonics, which in the case of $\mathbb{Z}^2$ are the functions of the form $\mathbb{Z}^2 \ni (m, n) \to \exp(-i(m, n)(s, t)')$, $s, t \in [0, 2\pi)$. For strongly periodic functions on $\mathbb{Z}^2$ with period group $K$ this is done in the next lemma.

Lemma 3.1. Let $f$ be strongly periodic with period group $K$, $\Lambda_K$ be as in (4), $d_K$ be the index of $K$, and let $\Gamma$ be any cross-section for $\mathbb{Z}^2/K$. For every $(s, t) \in \Lambda_K$ define
$$a_f(s, t) = \sum_{(j,k) \in \Gamma} f(j, k) \exp(-i(j, k)(s, t)'). \tag{8}$$
Then for every $(m, n) \in \mathbb{Z}^2$,
$$f(m, n) = \frac{1}{d_K} \sum_{(s,t) \in \Lambda_K} \exp(i(m, n)(s, t)')\, a_f(s, t). \tag{9}$$
Proof. Denoting the right-hand side of (9) by $R(m, n)$ and using (8) we obtain that
$$R(m, n) = \sum_{(j,k) \in \Gamma} f(j, k) \left( \frac{1}{d_K} \sum_{(s,t) \in \Lambda_K} \exp(-i(j - m, k - n)(s, t)') \right).$$
By part 3 of Lemma 2.2 the sum in parentheses is nonzero only if $(j - m, k - n) \in K$. Write $(m, n) = (p, q) + (l, r)$ where $(p, q) \in \Gamma$ and $(l, r) \in K$. Then $(j - m, k - n) = (j - p, k - q) - (l, r) \in K$ iff $(j - p, k - q) = (j, k) - (p, q) \in K$. But both $(j, k)$ and $(p, q)$ are from $\Gamma$, so $(j, k)$ must be equal to $(p, q)$. Therefore the sum over $(j, k) \in \Gamma$ reduces to one term, namely the one with $(j, k) = (p, q)$. Summing up we obtain that
$$R(m, n) = \frac{1}{d_K} f(p, q) \sum_{(s,t) \in \Lambda_K} \exp(i(l, r)(s, t)') = f(p, q),$$
because, by the definition (4) of $\Lambda_K$, $\exp(i(l, r)(s, t)') = 1$ for every $(s, t) \in \Lambda_K$ and $(l, r) \in K$. To finish the proof we notice that periodicity of $f$ with respect to $K$ implies that $f(p, q) = f((p, q) + (l, r)) = f(m, n)$.

Explicit descriptions of both $\Lambda_K$ and $\Gamma$ above are given in Lemma 2.1. Recall that if we write $T = ADB$ as in part 1 of Fact 2.1, then for the simple period of $K = K_T$ needed in Lemma 2.1 we can take $S = DB$.
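The Fourier representation (8)–(9) can be tried out on the subgroup K of Example 2.1. In the sketch below the function `f` is a hypothetical K-periodic function chosen only for illustration, and the frequencies are stored as integer pairs (a, b) meaning (2πa/6, 2πb/6); neither choice comes from the paper.

```python
import cmath

d = 6                                                        # index d_K of K
gamma = [(0, 0), (0, 1), (0, 2), (1, 2), (1, 3), (1, 4)]     # cross-section
lam = [(0, 0), (3, 0), (2, 2), (5, 2), (4, 4), (1, 4)]       # units of 2*pi/6


def f(m, n):
    """Hypothetical K-periodic function: it depends only on m mod 2 and (m + n) mod 3."""
    return 1.0 + 2.0 * (m % 2) + 0.5 * ((m + n) % 3) ** 2


def phase(point, freq, sign):
    """exp(sign * i (m, n)(s, t)') with (s, t) = (2*pi/d) * freq."""
    m, n = point
    a, b = freq
    return cmath.exp(sign * 2j * cmath.pi * (m * a + n * b) / d)


# Fourier coefficients a_f(s, t) as in (8): a sum over the cross-section.
coeff = {freq: sum(f(j, k) * phase((j, k), freq, -1) for (j, k) in gamma)
         for freq in lam}


def reconstruct(m, n):
    """Right-hand side of (9): a sum over the frequency group divided by d_K."""
    return sum(phase((m, n), freq, +1) * coeff[freq] for freq in lam) / d


print(f(7, -3), reconstruct(7, -3))
```

The reconstructed values agree with f at every point of the grid up to rounding errors, including points far from the cross-section.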
4
Structure of an SCF
Let $X = (X(m,n))$ be an SCF in $\mathcal{H}$ with period $T$. Let $K = K_T$, $\Gamma$ be any cross-section for $\mathbb{Z}^2/K$, $\Lambda_K$ be the frequency group of $\mathbb{Z}^2/K$, and $R_X$ and $B$ be as in the Introduction. Since all $B_{(j,k)}(m,n)$ are strongly periodic in $(m,n)$ with the same period $T$, they have the Fourier expansion established in Lemma 3.1:
$$R((j+m,k+n),(m,n)) = B_{(j,k)}(m,n) = \frac{1}{d_K}\sum_{(s,t)\in\Lambda_K} a_{(s,t)}(j,k)\exp(i(m,n)(s,t)'), \tag{10}$$
$j, k, m, n \in \mathbb{Z}$, where for every $(s,t) \in \Lambda_K$ and $(j,k) \in \mathbb{Z}^2$ the Fourier coefficients $a_{(s,t)}(j,k)$ are defined by
$$a_{(s,t)}(j,k) = \sum_{(p,q)\in\Gamma} B_{(j,k)}(p,q)\exp(-i(p,q)(s,t)'), \quad (j,k) \in \mathbb{Z}^2. \tag{11}$$
Recall that $d_K = |\det(T)|$. Formula (10) shows that the set of functions $\{a_{(s,t)} : (s,t) \in \Lambda_K\}$ completely determines the auto-covariance function of $(X(m,n))$.

Lemma 4.1. Let $X = (X(m,n))$ be an SCF with a period $T = [T_1; T_2]$. Then there are $H, V > 0$ and a unimodular matrix $B$ such that $Y(m,n) = X((m,n)B)$, $(m,n) \in \mathbb{Z}^2$, has a period $S = [H, 0; 0, V]$.

Proof. Let $T = ADB$, $D = [H, 0; 0, V]$, be a factorization of $T$ as in Fact 2.1 part 1. Define $Y(m,n) = X((m,n)B)$, $(m,n) \in \mathbb{Z}^2$. Then
$$R_Y((m,n)+(p,q)D,\ (j,k)+(p,q)D) = R_X(((m,n)+(p,q)D)B,\ ((j,k)+(p,q)D)B) = R_X((m,n)B+(p,q)A^{-1}T,\ (j,k)B+(p,q)A^{-1}T) = R_X((m,n)B,\ (j,k)B) = R_Y((m,n),(j,k)).$$
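The change of variables in Lemma 4.1 can be illustrated on a deterministic function instead of a covariance, since the argument only uses the identity $DB = A^{-1}T$. The following sketch is an assumption-laden toy: the unimodular matrices `A`, `B` and the diagonal `D` are hypothetical, and the $T$-periodic function `f` is built from the integer vector $(m,n)\,\mathrm{adj}(T)$ reduced modulo $|\det T|$, which is invariant under shifts by rows of $T$.

```python
import numpy as np

# Hypothetical factorization T = A D B with unimodular A and B.
A = np.array([[1, 2], [0, 1]])
D = np.array([[2, 0], [0, 6]])          # D = [H, 0; 0, V]
B = np.array([[1, 0], [3, 1]])
T = A @ D @ B

d = abs(int(T[0, 0] * T[1, 1] - T[0, 1] * T[1, 0]))          # |det T| = H * V
adj_T = np.array([[T[1, 1], -T[0, 1]], [-T[1, 0], T[0, 0]]])  # T @ adj_T = det(T) * I

rng = np.random.default_rng(0)
table = rng.integers(0, 1000, size=(d, d))

def f(m, n):
    """A function with period T: it depends only on (m, n) adj(T) mod |det T|."""
    x = np.array([m, n]) @ adj_T
    return table[x[0] % d, x[1] % d]

def g(m, n):
    """The transformed function g(m, n) = f((m, n) B)."""
    y = np.array([m, n]) @ B
    return f(y[0], y[1])

pts = range(-4, 5)
shifts = [(-2, 1), (1, 0), (0, 1), (3, -2)]

# f is invariant under shifts (k, j) T ...
f_periodic = all(
    f(m + (np.array(s) @ T)[0], n + (np.array(s) @ T)[1]) == f(m, n)
    for m in pts for n in pts for s in shifts)

# ... and g is invariant under the coordinate-wise shifts (p, q) D.
g_periodic = all(
    g(m + p * D[0, 0], n + q * D[1, 1]) == g(m, n)
    for m in pts for n in pts for (p, q) in shifts)

print(f_periodic, g_periodic)
```

Exact integer arithmetic is used throughout, so the comparison is an equality test rather than a tolerance check.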
The lemma shows that it is enough to study coordinate-wise periodically correlated fields to derive the corresponding properties of any other strongly periodically correlated field. We will use it in the proof of our first theorem, which describes the structure of an SCF.

Theorem 4.1. Let $X(m,n)$ be an SCF with a period $T = [T_1; T_2]$, $K = K_T$, and let $\Lambda_K$ be the frequency group of $\mathbb{Z}^2/K$. Then there are a Hilbert space $\mathcal{K}$, an $x_o \in \mathcal{K}$, and unitary representations $U(m,n)$ of $\mathbb{Z}^2$ and $V(s,t)$ of $\Lambda_K$ in $\mathcal{K}$ satisfying the following canonical commutation relation (CCR)
$$U(m,n)V(s,t) = V(s,t)U(m,n)\exp(i(m,n)(s,t)'), \quad (m,n) \in \mathbb{Z}^2,\ (s,t) \in \Lambda_K, \tag{12}$$
such that for every $(m,n) \in \mathbb{Z}^2$
$$X(m,n) = \frac{1}{d_K}\sum_{(s,t)\in\Lambda_K}\exp(-i(m,n)(s,t)')\,U(m,n)V(s,t)x_o. \tag{13}$$
Proof. The proof is a slight modification of the proof given in [17] for PC sequences. First we assume that $X(m,n)$ is a coordinate-wise periodically correlated field, that is $K = (\mathbb{Z}H) \times (\mathbb{Z}V)$. Let $\mathcal{H} = M_X$. Consider two commuting unitary operators $W_1$ and $W_2$ on $\mathcal{H}$ defined as linear extensions of the mappings $W_1X(m,n) = X(m+H,n)$ and $W_2X(m,n) = X(m,n+V)$, respectively. Define $\mathcal{K} = \mathcal{H}^{H\times V}$. Elements of $\mathcal{K}$ are $H \times V$ matrices $f = (f^{j,k})$ with entries $f^{j,k} \in \mathcal{H}$. We number rows $j$ and columns $k$ of these matrices from 0, so that $0 \le j < H$, $0 \le k < V$. Define a unitary operator $U_1$ on $\mathcal{K}$ as follows: if $f = (f^{j,k})$ then $(U_1f)^{j,k} = f^{j+1,k}$ if $j < H-1$, and $(U_1f)^{H-1,k} = W_1f^{0,k}$, $0 \le k \le V-1$. In a similar way define $U_2$: $(U_2f)^{j,k} = f^{j,k+1}$ if $k < V-1$, and $(U_2f)^{j,V-1} = W_2f^{j,0}$. For example, if $H = 2$ and $V = 3$,
$$U_1\begin{pmatrix} f^{0,2} & f^{1,2}\\ f^{0,1} & f^{1,1}\\ f^{0,0} & f^{1,0}\end{pmatrix} = \begin{pmatrix} f^{1,2} & W_1f^{0,2}\\ f^{1,1} & W_1f^{0,1}\\ f^{1,0} & W_1f^{0,0}\end{pmatrix}, \qquad U_2\begin{pmatrix} f^{0,2} & f^{1,2}\\ f^{0,1} & f^{1,1}\\ f^{0,0} & f^{1,0}\end{pmatrix} = \begin{pmatrix} W_2f^{0,0} & W_2f^{1,0}\\ f^{0,2} & f^{1,2}\\ f^{0,1} & f^{1,1}\end{pmatrix}.$$
Further define $V_1$ and $V_2$ on $\mathcal{K}$ as follows: if $f = (f^{j,k})$ then $(V_1f)^{j,k} = \exp(i2\pi j/H)f^{j,k}$ and $(V_2f)^{j,k} = \exp(i2\pi k/V)f^{j,k}$. It is obvious that $V_1^H = I$ and $V_2^V = I$. We show that $U_1V_1 = V_1U_1\exp(i2\pi/H)$. Indeed, if $f = (f^{j,k})$ then for $0 \le j < H-1$ we have
$$(U_1V_1f)^{j,k} = \exp(i2\pi(j+1)/H)f^{j+1,k} = \exp(i2\pi/H)\exp(i2\pi j/H)(U_1f)^{j,k} = \exp(i2\pi/H)(V_1U_1f)^{j,k},$$
and for $j = H-1$ we have
$$(U_1V_1f)^{H-1,k} = \exp(i2\pi(0)/H)W_1f^{0,k} = \exp(i2\pi/H)\exp(i2\pi(H-1)/H)W_1f^{0,k} = \exp(i2\pi/H)(V_1U_1f)^{H-1,k}.$$
Similarly, $U_2V_2 = V_2U_2\exp(i2\pi/V)$. Because $U_1, U_2, V_1, V_2$ in $\mathcal{K}$ satisfy $U_1U_2 = U_2U_1$, $V_1V_2 = V_2V_1$, $U_1V_2 = V_2U_1$, $V_1U_2 = U_2V_1$, we conclude that for every $m, n, j, k \in \mathbb{Z}$,
$$U_1^mU_2^nV_1^jV_2^k = V_1^jV_2^kU_1^mU_2^n\exp(i(m,n)(2\pi j/H, 2\pi k/V)'). \tag{14}$$
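The commutation relations used in this part of the proof can be tried out in a small finite-dimensional model. The sketch below is a toy under explicit assumptions: the space $\mathcal{H}$ is taken to be one-dimensional, so the unitary operators $W_1$, $W_2$ become the hypothetical unit-modulus scalars `w1`, `w2`, and elements of $\mathcal{K}$ become complex $H \times V$ arrays. It checks the two single-step relations, relation (14) for one choice of exponents, and the fact that averaging the powers of $V_1$ keeps only the row $p = 0$.

```python
import numpy as np

H, V = 2, 3
# Hypothetical stand-ins for W1, W2: unitary operators on a one-dimensional space.
w1, w2 = np.exp(0.7j), np.exp(-1.3j)

def U1(f):
    g = np.roll(f, -1, axis=0)      # g[j, k] = f[j+1, k]; row H-1 receives f[0, k]
    g[H - 1, :] *= w1               # ... and is multiplied by W1
    return g

def U2(f):
    g = np.roll(f, -1, axis=1)      # g[j, k] = f[j, k+1]; column V-1 receives f[j, 0]
    g[:, V - 1] *= w2               # ... and is multiplied by W2
    return g

def V1(f):
    return np.exp(2j * np.pi * np.arange(H) / H)[:, None] * f

def V2(f):
    return np.exp(2j * np.pi * np.arange(V) / V)[None, :] * f

def power(op, n, f):
    """Apply the operator op to f n times (n >= 0)."""
    for _ in range(n):
        f = op(f)
    return f

rng = np.random.default_rng(0)
f = rng.normal(size=(H, V)) + 1j * rng.normal(size=(H, V))

# Single-step commutation relations.
ccr_1 = np.allclose(U1(V1(f)), np.exp(2j * np.pi / H) * V1(U1(f)))
ccr_2 = np.allclose(U2(V2(f)), np.exp(2j * np.pi / V) * V2(U2(f)))

# Relation (14) for one choice of exponents.
m, n, j, k = 3, 2, 1, 2
lhs = power(U1, m, power(U2, n, power(V1, j, power(V2, k, f))))
rhs = power(V1, j, power(V2, k, power(U1, m, power(U2, n, f))))
rhs = rhs * np.exp(1j * (m * 2 * np.pi * j / H + n * 2 * np.pi * k / V))
relation_14 = np.allclose(lhs, rhs)

# Averaging the powers of V1 projects onto the row p = 0.
avg = sum(power(V1, a, f) for a in range(H)) / H
keeps_row_zero = np.allclose(avg[0], f[0]) and np.allclose(avg[1:], 0)

print(ccr_1, ccr_2, relation_14, keeps_row_zero)
```

The arrays are indexed as `f[j, k]`, whereas the example matrices in the text are displayed with $k$ increasing upward; only the indexing matters for the relations.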
Define $x_o = (x_o^{j,k}) \in \mathcal{K}$ by $x_o^{j,k} = X(j,k)$, $0 \le j \le H-1$, $0 \le k \le V-1$, and define a field $(f(m,n))$ in $\mathcal{K}$ by
$$f(m,n) = \frac{1}{HV}\sum_{j=0}^{H-1}\sum_{k=0}^{V-1}\exp(-i(m,n)(2\pi j/H, 2\pi k/V)')\,U_1^mU_2^nV_1^jV_2^kx_o. \tag{15}$$
From (14) we obtain that
$$f(m,n) = \frac{1}{HV}\sum_{j=0}^{H-1}\sum_{k=0}^{V-1}V_1^jV_2^kU_1^mU_2^nx_o = \Bigg(\frac{1}{H}\sum_{j=0}^{H-1}V_1^j\Bigg)\Bigg(\frac{1}{V}\sum_{k=0}^{V-1}V_2^k\Bigg)(U_1^mU_2^nx_o). \tag{16}$$
Recall that if $f = (f^{j,k})$ then $(V_1^jf)^{p,q} = \exp(i2\pi pj/H)f^{p,q}$, and hence
$$\Bigg(\frac{1}{H}\sum_{j=0}^{H-1}V_1^jf\Bigg)^{p,q} = \Bigg(\frac{1}{H}\sum_{j=0}^{H-1}\exp(i2\pi pj/H)\Bigg)f^{p,q} = 1_{\{p=0\}}(p,q)f^{p,q},$$
where $1_{\{p=0\}}(p,q) = 0$ if $p \ne 0$ and $1_{\{p=0\}}(p,q) = 1$ if $p = 0$. Similarly
$$\Bigg(\frac{1}{V}\sum_{k=0}^{V-1}V_2^kf\Bigg)^{p,q} = 1_{\{q=0\}}(p,q)f^{p,q}.$$
Summing up,
$$\Bigg(\Bigg(\frac{1}{H}\sum_{j=0}^{H-1}V_1^j\Bigg)\Bigg(\frac{1}{V}\sum_{k=0}^{V-1}V_2^k\Bigg)f\Bigg)^{p,q} = 1_{\{(p,q)=(0,0)\}}(p,q)f^{p,q} = \begin{cases} 0 & \text{if } (p,q) \ne (0,0),\\ f^{0,0} & \text{if } (p,q) = (0,0).\end{cases}$$
Consequently, applying this to (16) we obtain that $f(m,n)^{p,q} = 0$ if $(p,q) \ne (0,0)$ and $f(m,n)^{p,q} = (U_1^mU_2^nx_o)^{0,0}$ if $(p,q) = (0,0)$. Recall that $x_o^{p,q} = X(p,q)$, $0 \le p \le H-1$, $0 \le q \le V-1$, so that $(U_1^mU_2^nx_o)^{p,q} = X(p+m,q+n)$. Therefore $f(m,n)^{p,q} = X(m,n)$ if $(p,q) = (0,0)$, and otherwise $f(m,n)^{p,q} = 0$. The $\mathcal{K}$-norm of such $f(m,n) = (f^{p,q}(m,n))$ satisfies $\|f(m,n)\|_{\mathcal{K}} = \|X(m,n)\|_{\mathcal{H}}$, so the fields $(f(m,n))$ and $(X(m,n))$ are unitarily equivalent. Being equivalent, the field $(X(m,n))$ also has a representation (15). To finish this part note that $U(m,n) = U_1^mU_2^n$ is a representation of $\mathbb{Z}^2$ and $V(2\pi j/H, 2\pi k/V) = V_1^jV_2^k$ is a representation of $\Lambda_K$, because if $K = (\mathbb{Z}H) \times (\mathbb{Z}V)$ then $\Lambda_K = \{(2\pi j/H, 2\pi k/V) : 0 \le j < H,\ 0 \le k < V\}$ and $d_K = HV$. We have therefore proved the representation (13) for coordinate-wise strongly periodically correlated fields.

Now let $(X(m,n))$ be an arbitrary strongly periodically correlated field, let $T$ be its period, and let $T = ADB$, $D = [H, 0; 0, V]$, be a factorization of $T$ as in Fact 2.1 part 1. From Lemma 4.1 it follows that the field $Y(m,n) = X((m,n)B)$, $(m,n) \in \mathbb{Z}^2$, is coordinate-wise periodically correlated with period $S = [H, 0; 0, V]$. Therefore, from what we have just proven,
$$Y(m,n) = \frac{1}{d_{K_c}}\sum_{(u,v)\in\Lambda_{K_c}}\exp(-i(m,n)(u,v)')\,U_c(m,n)V_c(u,v)x_o,$$
where $K_c = (\mathbb{Z}H) \times (\mathbb{Z}V)$, $\Lambda_{K_c} = \{(2\pi j/H, 2\pi k/V) : 0 \le j < H,\ 0 \le k < V\}$, $d_{K_c} = HV = |\det(T)| = d_K$, and $U_c$ and $V_c$ are representations of $\mathbb{Z}^2$ and $\Lambda_{K_c}$. Therefore
$$X(m,n) = Y((m,n)B^{-1}) = \frac{1}{d_K}\sum_{(u,v)\in\Lambda_{K_c}}\exp(-i(m,n)B^{-1}(u,v)')\,U_c((m,n)B^{-1})V_c(u,v)x_o.$$
Recall that by (6), $\Lambda_K = \{(s,t) = [(u,v)(B')^{-1}]_{2\pi} : (u,v) \in \Lambda_{K_c}\}$. Define $U(m,n) = U_c((m,n)B^{-1})$, $(m,n) \in \mathbb{Z}^2$, and for each $(s,t) \in \Lambda_K$, $V(s,t) = V_c([(s,t)B']_{2\pi})$. Then $U(m,n)$ is a representation of $\mathbb{Z}^2$, $V(s,t)$ is a representation of $\Lambda_K$, and substituting $(s,t) = [(u,v)(B')^{-1}]_{2\pi}$ in the above sum we obtain that
$$X(m,n) = \frac{1}{d_K}\sum_{(s,t)\in\Lambda_K}\exp(-i(m,n)(s,t)')\,U(m,n)V(s,t)x_o.$$
5 Spectrum of an SCF
It is known [13] that each strongly periodically correlated sequence is strongly harmonizable, that is its auto-covariance function $R$ is a Fourier transform of a certain measure on the dual of $\mathbb{Z}^2 \times \mathbb{Z}^2$, that is on $[0,2\pi)^2 \times [0,2\pi)^2$. In particular it is shown in [13] that each $a_{(s,t)}$ defined in (11) is the Fourier transform of some complex measure $\gamma_{(s,t)}$ on $[0,2\pi)^2$. Below we show how to obtain this fact using spectral resolutions of the representations $V(s,t)$ and $U(m,n)$ appearing in Theorem 4.1:
$$U(m,n) = \int_0^{2\pi}\!\!\int_0^{2\pi}\exp(-i(m,n)(s,t)')\,dE(s,t), \quad (m,n) \in \mathbb{Z}^2, \tag{17}$$
$$V(s,t) = \sum_{(m,n)\in\Gamma}\exp(i(m,n)(s,t)')\,P(m,n), \quad (s,t) \in \Lambda_K. \tag{18}$$
Here $\Gamma$ is a cross-section for $\mathbb{Z}^2/K_T$ (recall that $\Gamma$ is isomorphic to $\mathbb{Z}^2/K$ and hence to $\widehat{\Lambda}_K$).

Theorem 5.1. Let $X(m,n)$ be an SCF with a period $T$, $K = K_T$, and let $\mathcal{K}$, $x_o$, $U(m,n)$ and $V(s,t)$ be as in Theorem 4.1. Further let $E$ and $P$ be the spectral resolutions of $U(m,n)$ and $V(s,t)$ given in (17) and (18). For every $(s,t) \in \Lambda_K$, let $\gamma_{(s,t)}$ be a measure on $[0,2\pi)^2$ defined by
$$\gamma_{(s,t)}(\Delta) = (E(\Delta)x_o, V(s,t)x_o)_{\mathcal{K}}. \tag{19}$$
Then for every $(s,t) \in \Lambda_K$ and $(m,n) \in \mathbb{Z}^2$
$$a_{(s,t)}(m,n) = \int_0^{2\pi}\!\!\int_0^{2\pi}\exp(-i(m,n)(u,v)')\,d\gamma_{(s,t)}(u,v). \tag{20}$$
Moreover, if we define a $\mathcal{K}$-valued Borel measure $G$ on $[0,2\pi)^2$ by
$$G(\Delta) = \frac{1}{d_K}\sum_{(s,t)\in\Lambda_K}V(s,t)E(\Delta)x_o = \frac{1}{d_K}\sum_{(s,t)\in\Lambda_K}E(\Delta-(s,t))V(s,t)x_o, \tag{21}$$
then
$$X(m,n) = \int_0^{2\pi}\!\!\int_0^{2\pi}\exp(-i(m,n)(u,v)')\,dG(u,v). \tag{22}$$
Proof. Write (13) as $X(m,n) = U(m,n)Y(m,n)$, where here $Y(m,n) = \frac{1}{d_K}\sum_{(s,t)\in\Lambda_K}\exp(-i(m,n)(s,t)')V(s,t)x_o$. From (13) and the CCR it follows that
$$\begin{aligned}
B_{(j,k)}(m,n) &= R((j+m,k+n),(m,n)) = (X((m,n)+(j,k)), X(m,n))\\
&= (U(m,n)U(j,k)Y((m,n)+(j,k)),\ U(m,n)Y(m,n)) = (U(j,k)Y((m,n)+(j,k)),\ Y(m,n))\\
&= \frac{1}{d_K^2}\sum_{(s,t)\in\Lambda_K}\sum_{(u,v)\in\Lambda_K}\exp(-i(m,n)(s,t)')\exp(-i(j,k)(s,t)')\exp(i(m,n)(u,v)')\,(U(j,k)V(s,t)x_o, V(u,v)x_o)\\
&= \frac{1}{d_K^2}\sum_{(s,t)\in\Lambda_K}\sum_{(u,v)\in\Lambda_K}\exp(-i(m,n)(s,t)')\exp(i(m,n)(u,v)')\,(U(j,k)x_o, V(u-s,v-t)x_o),
\end{aligned}$$
and hence
$$\begin{aligned}
a_{(w,z)}(j,k) &= \sum_{(m,n)\in\Gamma}B_{(j,k)}(m,n)\exp(-i(m,n)(w,z)')\\
&= \frac{1}{d_K^2}\sum_{(s,t)\in\Lambda_K}\sum_{(u,v)\in\Lambda_K}\Bigg[\sum_{(m,n)\in\Gamma}\exp(-i(m,n)(s-u+w,\,t-v+z)')\Bigg](U(j,k)x_o, V(u-s,v-t)x_o).
\end{aligned}$$
The expression in square brackets is 0 unless $(s-u+w,\,t-v+z) = (0,0)$ modulo $2\pi$, that is unless $(u-s,\,v-t) = (w,z)$. Consequently
$$a_{(w,z)}(j,k) = (U(j,k)x_o, V(w,z)x_o) = \int_0^{2\pi}\!\!\int_0^{2\pi}\exp(-i(j,k)(u,v)')\,(dE(u,v)x_o, V(w,z)x_o) = \int_0^{2\pi}\!\!\int_0^{2\pi}\exp(-i(j,k)(u,v)')\,d\gamma_{(w,z)}(u,v).$$
The formula (22) follows from (13) and the CCR, which yield
$$X(m,n) = \frac{1}{d_K}\sum_{(s,t)\in\Lambda_K}V(s,t)U(m,n)x_o.$$
Substituting (17) into the above and noticing that the CCR implies that $V(s,t)E(\Delta) = E(\Delta-(s,t))V(s,t)$ finishes the proof.

Definition 5.1. Measures $\gamma_{(s,t)}$, $(s,t) \in \Lambda_K$, defined in (19) and satisfying (20), will be referred to as the spectral measures of a strongly periodically correlated field $(X(m,n))$. The family $\gamma = \{\gamma_{(s,t)} : (s,t) \in \Lambda_K\}$ will be called the spectrum of $(X(m,n))$.

Note that $|\gamma_{(s,t)}(\Delta)| = |(E(\Delta)x_o, V(s,t)x_o)| \le \|V(s,t)x_o\|\,\|E(\Delta)x_o\| = \|x_o\|\sqrt{\gamma_{(0,0)}(\Delta)}$, and hence

Corollary 5.1. $\gamma_{(0,0)} \ge 0$ and for every $(s,t) \in \Lambda_K$, $\gamma_{(s,t)}$ is absolutely continuous with respect to $\gamma_{(0,0)}$.

The construction given in Theorem 5.1 allows us to prove the uniqueness of the representation (13).
Theorem 5.2. Suppose that $X_i(m,n)$, $i = 1, 2$, are strongly PC fields with the same period group $K$, and
$$X_i(m,n) = \frac{1}{d_K}\sum_{(s,t)\in\Lambda_K}\exp(-i(m,n)(s,t)')\,U_i(m,n)V_i(s,t)x_i, \quad i = 1, 2,$$
are their representations as in Theorem 4.1. Then $X_1$ and $X_2$ are unitarily equivalent iff there is an isometry $\Phi$ from $\mathcal{K}_1 = \mathrm{sp}\{U_1(m,n)V_1(s,t)x_1 : (m,n) \in \mathbb{Z}^2, (s,t) \in \Lambda_K\}$ onto $\mathcal{K}_2 = \mathrm{sp}\{U_2(m,n)V_2(s,t)x_2 : (m,n) \in \mathbb{Z}^2, (s,t) \in \Lambda_K\}$, such that $\Phi x_1 = x_2$, $\Phi U_1(m,n) = U_2(m,n)\Phi$, $(m,n) \in \mathbb{Z}^2$, and $\Phi V_1(s,t) = V_2(s,t)\Phi$, $(s,t) \in \Lambda_K$. In other words, the triple $(U, V, x_o)$ in (13) is uniquely determined by $X$ in the sense of unitary equivalence.

Proof. The if part is obvious. Now assume that $X_1$ and $X_2$ are unitarily equivalent. Then they have the same auto-covariance function and hence the same Fourier coefficients $a_{(s,t)}(j,k)$ (see (10)) and the same spectral measures. From (19) it therefore follows that $(E_1(\Delta)x_1, V_1(s,t)x_1)_{\mathcal{K}_1} = (E_2(\Delta)x_2, V_2(s,t)x_2)_{\mathcal{K}_2}$, and hence $(U_1(m,n)x_1, V_1(s,t)x_1)_{\mathcal{K}_1} = (U_2(m,n)x_2, V_2(s,t)x_2)_{\mathcal{K}_2}$, for all $(m,n) \in \mathbb{Z}^2$ and $(s,t) \in \Lambda_K$. Consequently, the CCR property implies that
$$(U_1(m,n)V_1(s,t)x_1,\ U_1(j,k)V_1(u,v)x_1)_{\mathcal{K}_1} = (U_2(m,n)V_2(s,t)x_2,\ U_2(j,k)V_2(u,v)x_2)_{\mathcal{K}_2}.$$
This shows that the mapping $\Phi$ defined as $\Phi(U_1(m,n)V_1(s,t)x_1) = U_2(m,n)V_2(s,t)x_2$ extends linearly to an isometry from $\mathcal{K}_1 = \mathrm{sp}\{U_1(m,n)V_1(s,t)x_1 : (m,n) \in \mathbb{Z}^2, (s,t) \in \Lambda_K\}$ onto $\mathcal{K}_2 = \mathrm{sp}\{U_2(m,n)V_2(s,t)x_2 : (m,n) \in \mathbb{Z}^2, (s,t) \in \Lambda_K\}$.
6 Transfer Function of an a.c. SCF
Let $(X(m,n))$ be an SCF with the period $T$, $K = K_T$, and let $\gamma_{(s,t)}$, $(s,t) \in \Lambda_K$, be its spectral measures. We say that the field $(X(m,n))$ is absolutely continuous (a.c.) if $\gamma_{(0,0)}$ is absolutely continuous with respect to the Lebesgue measure $du\,dv$ on $[0,2\pi)^2$. As we have already noticed, if this is the case, then each $\gamma_{(s,t)}$ is absolutely continuous with respect to the Lebesgue measure. If $(X(m,n))$ is a.c., then the functions on $[0,2\pi)^2$ defined as
$$g_{(s,t)}(u,v) = \frac{d\gamma_{(s,t)}}{du\,dv}(u,v), \quad (s,t) \in \Lambda_K,\ (u,v) \in [0,2\pi)^2,$$
will be referred to as spectral densities of $(X(m,n))$.

The notion of a transfer function plays an important role in the prediction of stationary and PC sequences, since it allows one to construct a functional model for the sequence, that is to find a unitarily equivalent representation of the sequence in some standard function space. A transfer function for a PC sequence was first introduced in [16] under a different name of a square factor, and then explored further in [18]. Below is an extension of the concept of transfer function to the case of a.c. SCFs.
Definition 6.1. Let $(X(m,n))$ be an SCF with a period $T$, $K = K_T$, $\Lambda_K$ be the frequency group for $K$, and $d_K$ be the index of $K$ (recall that $d_K = n(\Lambda_K)$). Suppose further that $(X(m,n))$ is a.c., and let $g_{(s,t)}(u,v)$, $(s,t) \in \Lambda_K$, be the spectral densities of $(X(m,n))$. Let $\Theta$ be any set with $n(\Theta) = d_K$ (for example $\Theta = \{1, 2, \dots, d_K\}$). A function $f = (f^\theta) \in L^2(\mathbb{C}^\Theta)$ is said to be a transfer function of $(X(m,n))$ if for each $(s,t) \in \Lambda_K$
$$g_{(s,t)}(u,v) = f(u,v)f(u+s,v+t)^*, \quad (u,v)\text{-a.e.} \tag{23}$$
$\mathbb{C}^\Theta$ above is the set of all complex functions on the set $\Theta$. The phrase $f(u,v)f(u+s,v+t)^*$ means
$$f(u,v)f(u+s,v+t)^* = (f(u,v), f(u+s,v+t))_{\mathbb{C}^\Theta} = \sum_{\theta\in\Theta}f^\theta(u,v)\overline{f^\theta(u+s,v+t)},$$
and its value does not depend on the way that we enumerate the elements of $\Theta$, and in fact does not depend on $\Theta$ but only on $d_K$. Every a.c. SCF has at least one transfer function. A transfer function is not unique. One of the consequences of the existence of a transfer function is the following functional description of a.c. SCFs.

Theorem 6.1. Let $(X(m,n))$ be an a.c. strongly periodically correlated field with a period subgroup $K$ and let $\Lambda_K$ be the frequency group for $K$. Let $\Theta$ be any finite set with $n(\Theta) = d_K$ and let $f \in L^2(\mathbb{C}^\Theta)$ be any transfer function of $(X(m,n))$. Define
$$H(m,n)(u,v) = \frac{1}{d_K}\sum_{(s,t)\in\Lambda_K}\exp(-i(m,n)(u+s,v+t)')\,f(u+s,v+t), \quad u, v \in [0,2\pi). \tag{24}$$
Then $(H(m,n))$ is a strongly PC field in $L^2(\mathbb{C}^\Theta)$ with period group $K$, unitarily equivalent to $(X(m,n))$. Moreover, we can always choose $\Theta = \{1, 2, \dots, d_K\}$ and then $H(m,n)$ is a field in $L^2(\mathbb{C}^{d_K})$.

The proof of this theorem, as well as the proof of the existence of a transfer function, is too long to be given here. We include them in a forthcoming paper which will deal with prediction, statistics, and bootstrapping techniques for SCFs. The proofs make use of the notion of associated multivariate stationary fields, which are introduced in the next section.
7 Shift Groups and Associated Stationary Fields
If $X(n)$, $n \in \mathbb{Z}$, is a periodically correlated sequence with period $T > 0$, then the shift operator $W$ of $(X(n))$ is defined as a linear extension of the mapping $WX(n) = X(n+T)$. The powers $W^n$, $n \in \mathbb{Z}$, of $W$ can be viewed as a representation of $\mathbb{Z}T = \{nT : n \in \mathbb{Z}\}$, $W(nT) = W^n$, $n \in \mathbb{Z}$, or as a representation of $\mathbb{Z}$, $W_T(n) = W^n$, $n \in \mathbb{Z}$. Consequently there are two ways to define the shift group and associated stationary fields for a strongly periodically correlated field with period $T$: over $K = (\mathbb{Z}^2)T$ or over $\mathbb{Z}^2$.
Definition 7.1. Let $(X(m,n))$ be an SCF with period $T$, $K = K_T$, and let $M_X = \mathrm{sp}\{X(m,n) : m, n \in \mathbb{Z}\}$. The $K$-shift group of $(X(m,n))$ is the representation $(W_K(j,k))$ of $K$ in $M_X$ defined by
$$W_K(j,k)\Bigg(\sum_{(p,q)\in\mathbb{Z}^2}a(p,q)X(p,q)\Bigg) = \sum_{(p,q)\in\mathbb{Z}^2}a(p,q)X(p+j,q+k). \tag{25}$$
Further let $\Gamma$ be any cross-section for $\mathbb{Z}^2/K$. Define $X_K(0,0) = (X_K^{(p,q)})$, where $X_K^{(p,q)} = X(p,q)$, $(p,q) \in \Gamma$, and
$$X_K(j,k) = W_K(j,k)X_K(0,0) = (W_K(j,k)X_K^{(p,q)}), \quad (j,k) \in K. \tag{26}$$
The field $(X_K(j,k))$ defined above is called a stationary $\Gamma$-variate $K$-field associated with the field $(X(m,n))$.

Note that $X_K(0,0)$ depends on the choice of $\Gamma$, and hence so does the field $(X_K(j,k))$. Namely, $X_K(0,0)$ is the part of the PC field $(X(m,n))$ visible in the "window" $\Gamma$ and $X_K(j,k)$ is the part of the field $(X(m,n))$ visible in the "window" $\Gamma + (j,k)$. The field $(X_K(j,k))$ is stationary over $K$ because $W_K$ is a unitary representation of $K$. The spectral resolution $E_\beta$ of the representation $(W_K(j,k))$ sits on $\widehat{K}$, the dual to $K$. $\widehat{K}$ can be represented as a subset $\beta$ of $[0,2\pi)^2$, namely a cross-section for $[0,2\pi)^2/\Lambda_K$, so we can write
$$W_K(j,k) = \int_\beta\exp(-i(j,k)(u,v)')\,E_\beta(du,dv).$$
Consequently the spectral representation of $X_K$ is
$$X_K^{(p,q)}(j,k) = \int_\beta\exp(-i(j,k)(u,v)')\,Z_K^{(p,q)}(du,dv), \tag{27}$$
where $Z_K^{(p,q)}(\Delta) = E_\beta(\Delta)X_K^{(p,q)}(0,0)$. The spectral measure $F_K = (F_K^{(p,q),(l,r)})$ of $X_K$ is $F_K^{(p,q),(l,r)}(\Delta) = (Z_K^{(p,q)}(\Delta), Z_K^{(l,r)}(\Delta))$ and also sits on $\beta$.

Having defined the representation $(W_K(j,k))$ of $K$ and the stationary field $X_K(j,k)$, $(j,k) \in K$, we may define
$$W_T(m,n) = W_K((m,n)T) \quad\text{and}\quad X_T(m,n) = X_K((m,n)T), \tag{28}$$
$(m,n) \in \mathbb{Z}^2$. Then clearly $W_T$ is a representation of $\mathbb{Z}^2$ and $X_T$ is a $\Gamma$-variate stationary field over $\mathbb{Z}^2$. This is a second way to associate a multivariate stationary field with an SCF. Given $X_T$, the original PC field $(X(m,n))$ can be recovered by the formula
$$X((p,q)+(m,n)T) = X_T^{(p,q)}(m,n), \quad (p,q) \in \Gamma,\ (m,n) \in \mathbb{Z}^2. \tag{29}$$
An advantage of $(X_T(j,k))$ over $(X_K(j,k))$ is that the former is over $\mathbb{Z}^2$, not over $K$, and that the spectrum of $(X_T(j,k))$ sits on $[0,2\pi)^2$ while the other sits on the cross-section $\beta$. A disadvantage is that it depends on the choice of a period $T$, while the other depends only on $K_T$. Both $(X_K(j,k))$ and $(X_T(j,k))$ depend additionally on $\Gamma$.
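In array terms, the field $X_T$ is a blocking of $X$ into windows of the shape of $\Gamma$. The sketch below illustrates formula (29) only for values on a finite patch and under the assumption of a diagonal period $T = [H, 0; 0, V]$ with $\Gamma = \{0,\dots,H-1\}\times\{0,\dots,V-1\}$; the patch size and the random data are hypothetical, and no claim about stationarity is tested here.

```python
import numpy as np

H, V = 2, 3                      # assumed diagonal period T = [H, 0; 0, V]
blocks_m, blocks_n = 4, 5        # number of windows along each axis (hypothetical)

rng = np.random.default_rng(1)
# A finite patch of one realization X(m, n), 0 <= m < 4H, 0 <= n < 5V.
X = rng.normal(size=(blocks_m * H, blocks_n * V))

# X_T[p, q, m, n] = X((p, q) + (m, n) T): component (p, q) of the
# Gamma-variate field evaluated at the block index (m, n).
X_T = X.reshape(blocks_m, H, blocks_n, V).transpose(1, 3, 0, 2)

# Recovery formula (29): read X back from the components of X_T.
X_back = np.empty_like(X)
for p in range(H):
    for q in range(V):
        for m in range(blocks_m):
            for n in range(blocks_n):
                X_back[p + m * H, q + n * V] = X_T[p, q, m, n]

recovered = np.array_equal(X, X_back)
print(X_T.shape, recovered)
```

For a non-diagonal $T$ the windows are no longer rectangular and a reshape is not enough; one would index with $(p,q) + (m,n)T$ directly.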
Associated stationary fields provide an essential tool for studying prediction properties of SCFs. They allow one to use techniques known for multivariate stationary fields, provided that relations between the spectral measures and transfer functions of an SCF and of an associated stationary field are found. We will not address this problem in this paper.

8 Conclusion
This paper deals exclusively with the theory of strongly periodically correlated fields indexed by $\mathbb{Z}^2$ (SCF). We have shown that any SCF can be transformed into a coordinate-wise SCF by a simple and explicit transformation of the parameter set $\mathbb{Z}^2$. Using this fact we have obtained a general structure-type theorem for SCFs and a complete description of the spectrum of an SCF in terms of spectral resolutions of the involved unitary representations. A limited discussion of a transfer function of an SCF and of associated multivariate stationary fields is also included. Direct motivation for this work came from previous papers by Hurd and his collaborators [4,13], where theoretical properties of various types of periodically correlated random fields were studied over different groups. See [11] for a simple example of an SCF constructed by randomly jittering the variables $(m,n)$ in a two-dimensional periodic function $f(m,n)$, which might arise in modeling crops in a field, positioning of sensors in the sea, or stitching in the weave of a fabric. Elements of statistical analysis of SCFs, and hence a certain applied motivation for this work, can be found in [14,15] or [3], among others.

Acknowledgements. Anna Dudek acknowledges support from the King Abdullah University of Science and Technology (KAUST) Research Grant OSR-2019-CRG84057.2.
References

1. Dehay, D.: Spectral analysis of the covariance of the almost periodically correlated processes. Stoch. Process. Appl. 50(2), 315–330 (1994)
2. Dehay, D.: Limiting distributions for explosive PAR(1) time series with strongly mixing innovation. arXiv:1501.02151 (2015)
3. Dehay, D., Hurd, H.: Spectral estimation for strongly periodically correlated random fields defined on R2. Math. Methods Stat. 11, 135–151 (2002)
4. Dehay, D., Hurd, H., Makagon, A.: Spectrum of periodically correlated fields. Eur. J. Pure Appl. Math. 7(3), 343–368 (2014)
5. Dragan, Y.P., Javors'kyj, I.: Statistical analysis of periodic random processes (Russian). Otbor i Peredacha Informatsii 71, 20–29 (1985)
6. Dudek, A., Leskow, J., Paparoditis, E., Politis, D.N.: A generalized block bootstrap for seasonal time series. J. Time Ser. Anal. 5(2), 89–114 (2014)
7. Dudek, A.: Block bootstrap for periodic characteristics of periodically correlated time series. J. Nonparametric Stat. 30(1), 87–124 (2018)
8. Gladyshev, E.G.: Periodically correlated random sequences. Sov. Math. 2, 385–388 (1961)
9. Hewitt, E., Ross, K.A.: Abstract Harmonic Analysis I, 2nd edn. Springer-Verlag, New York (1979). https://doi.org/10.1007/978-1-4419-8638-2
10. Hurd, H.L.: Correlation theory of almost periodically correlated processes. J. Multivar. Anal. 37(1), 24–45 (1991)
11. Hurd, H.L.: Spectral correlation of randomly jittered periodic functions of two variables. In: Hurd, H.L. (ed.) Twenty Ninth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA (1995)
12. Hurd, H.L., Miamee, A.G.: Periodically Correlated Random Sequences. Spectral Theory and Practice. John Wiley & Sons, Hoboken (2007)
13. Hurd, H.L., Kallianpur, G., Farshidi, J.: Correlation and spectral theory for periodically correlated random fields indexed on Z2. J. Multivar. Anal. 90, 359–383 (2004)
14. Iversen, H., Lonnerstad, L.: An evaluation of stochastic models for analysis and synthesis of gray scale texture. Pattern Recogn. Lett. 15, 573–585 (1994)
15. Javors'kyj, I., Yuzefovych, R., Kravets, I., Matsko, I.: Methods of Periodically Correlated Random Processes and Their Generalizations. In: Chaari, F., Leśkow, J., Napolitano, A., Sanchez-Ramirez, A. (eds.) Cyclostationarity: Theory and Methods. Lecture Notes in Mechanical Engineering. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-04187-2_6
16. Makagon, A., Miamee, A.G.: Spectral representation of periodically correlated sequences. Probab. Math. Stat. 33(1), 175–188 (2013)
17. Makagon, A., Miamee, A.G.: Structure of PC Sequences and the 3rd Prediction Problem. In: Chaari, F., Leśkow, J., Napolitano, A., Sanchez-Ramirez, A. (eds.) Cyclostationarity: Theory and Methods. Lecture Notes in Mechanical Engineering, pp. 53–72. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-04187-2_5
18. Makagon, A.: Periodically correlated sequences with rational spectra and PARMA systems. In: Chaari, F., Leskow, J., Napolitano, A., Zimroz, R., Wylomanska, A. (eds.) Cyclostationarity: Theory and Methods III. Applied Condition Monitoring, vol. 6, pp. 151–172. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-51445-1_9
19. Mandrekar, V.S., Redett, D.A.: Weakly Stationary Random Fields, Invariant Subspaces and Applications. CRC Press Taylor & Francis Group, Boca Raton (2018)
20. Morandi, P.J.: The Smith Normal Form of a Matrix (2005). http://sierra.nmsu.edu/morandi/notes/SmithNormalForm.pdf
21. Napolitano, A.: Generalizations of Cyclostationary Signal Processing: Spectral Analysis and Applications. John Wiley & Sons, Hoboken (2012)
22. Napolitano, A.: Cyclostationary Processes and Time Series. Theory, Applications, and Generalizations. Elsevier, Amsterdam (2019)
23. Wylomanska, A.: Spectral measures of PARMA sequences. J. Time Ser. Anal. 29(1), 1–13 (2008)
24. Wylomanska, A., Obuchowski, J., Zimroz, R., Hurd, H.: Periodic autoregressive modeling of vibration time series from planetary gearbox used in bucket wheel excavator. In: Chaari, F., Leśkow, J., Napolitano, A., Sanchez-Ramirez, A. (eds.) Cyclostationarity: Theory and Methods. Lecture Notes in Mechanical Engineering, pp. 171–186. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-04187-2_12
Component and the Least Square Estimation of Mean and Covariance Functions of Biperiodically Correlated Random Signals

Ihor Javorskyj1,2, Roman Yuzefovych1,3(B), and Oksana Dzeryn1

1 Department of Methods and Facilities for Acquisition and Processing Diagnostic Signals, Karpenko Physico-Mechanical Institute of National Academy of Sciences of Ukraine, Naukova Street 5, Lviv 79060, Ukraine [email protected]
2 Institute of Telecommunication and Computer Sciences, UTP University of Sciences and Technology, Al. Prof. S. Kaliskiego 7, 85796 Bydgoszcz, Poland
3 Department of Applied Mathematics, Lviv Polytechnic National University, Bandera Street 12, Lviv 79000, Ukraine
Abstract. The component and the least square (LS) estimators of the mean and covariance functions of biperiodically correlated random processes (BPCRPs), as a model of signals with binary stochastic recurrence, are analyzed. The formulae for the biases and the variances of the estimators are obtained, and a sufficient condition for the mean square consistency of the estimators of the mean function and of the covariance function of a Gaussian BPCRP is given. It is shown that leakage errors are absent for the LS estimators, in contrast to the component ones. The bias and variance of the component and the LS estimators are compared for a particular case of BPCRP.

Keywords: Biperiodically correlated random processes · Mean and covariance functions · Component and the least square estimators · Consistency · Asymptotically unbiased
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 145–177, 2022. https://doi.org/10.1007/978-3-030-82110-4_8

1 Introduction

Recurrence and randomness are characteristic features of numerous oscillations which occur in different fields of science and technology. Telecommunication and telemetric signals, atmospheric and ocean noises, geophysical and radiophysical phenomena, vibrations, and others have these properties [1–10]. Periodically correlated random processes are adequate mathematical models for the analysis of such oscillations [1, 2, 5, 8, 9]. Using such models in problems of signal transformation, processing, diagnostics, prediction, and others improves the efficiency of the obtained results [3, 4, 8–18]. However, situations often occur when both natural and man-made signals have the properties of birhythmic changeability, namely the stochastic recurrence of some period interacts with the stochastic recurrence of another period. In telecommunication signals, for example, the recurrence caused by the periodicity of the carrier can be accompanied by the rhythmic recurrence of modulating signals [4, 7–9, 12, 13]. There are daily, weekly, and annual rhythms in electric energy consumption [4, 8, 12, 13]. In the vibration signals of electric machines, polyrhythmicity is caused by the different rotation velocities of the separate units [10, 13, 15, 16]. To analyze the properties of binary recurrence, mathematical models of the signals in the form of biperiodically correlated random processes (BPCRPs) can be used [8, 9, 19, 20]. The BPCRP's mean $m(t) = E\xi(t)$ and covariance function $b(t,u) = E\mathring{\xi}(t)\mathring{\xi}(t+u)$, $\mathring{\xi}(t) = \xi(t) - m(t)$, where $E$ is the mean operator, are determined by the formulae
$$m(t) = \sum_{l,n\in\mathbb{Z}}m_{ln}e^{i\lambda_{ln}t}, \quad \lambda_{ln} = l\frac{2\pi}{P_1} + n\frac{2\pi}{P_2}, \tag{1}$$
$$b(t,u) = \sum_{l,n\in\mathbb{Z}}B_{ln}(u)e^{i\lambda_{ln}t}, \tag{2}$$
where $P_1$ and $P_2$ are positive numbers called the periods of the binonstationarity, and $\mathbb{Z}$ is the set of integers. The BPCRPs are a subclass of almost PCRPs [8, 9, 21–23]. The quantities $B_{ln}(u)$ are called the covariance components [1, 3, 8, 17] or cyclic functions [2, 4, 9]. It follows from (1) and (2) that the series representations for the BPCRP's mean and covariance functions involve the harmonics with the multiple frequencies $\lambda_{l0} = l\frac{2\pi}{P_1}$ and $\lambda_{0n} = n\frac{2\pi}{P_2}$ and also the combination frequencies $\lambda_{ln}$. Coherent (synchronized) averaging allows one to extract and analyze only the additive periodic components of (1) and (2) with period $P_1$ or $P_2$ [8, 9, 20]. To estimate the mean and covariance functions taking into account all possible harmonic components, we can use the component or the least square methods, which are considered in this paper. We shall obtain the conditions of asymptotic unbiasedness and consistency and the formulae for the biases and the variances when the process realization length is finite; for a particular case of BPCRP, we shall compare the component and LS estimators.
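The grid of combination frequencies in (1) is easy to tabulate. The following sketch assumes a truncated series with $|l|, |n| \le N$ and two hypothetical incommensurate periods; the coefficients `m_coef` are random and are only constrained by the conjugate symmetry $m_{-l,-n} = \bar{m}_{ln}$ that a real-valued mean function requires.

```python
import numpy as np

P1, P2 = 10.0, 10.0 * np.sqrt(2.0)   # hypothetical incommensurate periods
N = 2                                # truncation order (assumption)

idx = np.arange(-N, N + 1)
# lam[l + N, n + N] = lambda_{ln} = l * 2*pi/P1 + n * 2*pi/P2
lam = 2 * np.pi * (idx[:, None] / P1 + idx[None, :] / P2)

rng = np.random.default_rng(0)
c = rng.normal(size=lam.shape) + 1j * rng.normal(size=lam.shape)
# Enforce m_{-l,-n} = conj(m_{ln}) so that the mean function is real.
m_coef = 0.5 * (c + np.conj(c[::-1, ::-1]))

def mean_function(t):
    """Truncated version of series (1) evaluated at the times t."""
    t = np.atleast_1d(t)
    return np.einsum('ln,lnt->t', m_coef, np.exp(1j * lam[:, :, None] * t))

values = mean_function(np.linspace(0.0, 50.0, 201))
print(np.max(np.abs(values.imag)))   # imaginary part vanishes up to rounding
```

Reversing both index axes maps $(l,n)$ to $(-l,-n)$, which is why `lam[::-1, ::-1]` equals `-lam`.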
2 Component Estimator

We suppose that the number of harmonics with the basic and the combination frequencies is finite, and that $l$ and $n$ are integers belonging to the interval $[-N_1, N_1]$ for the mean function and to the interval $[-N_2, N_2]$ for the covariance function. It is obvious that for real BPCRPs $\lambda_{-l,-n} = -\lambda_{ln}$, $m_{-l,-n} = \bar{m}_{ln}$, and $B_{-l,-n} = \bar{B}_{ln}$, where the overbar is the conjugation sign. BPCRPs in this case can be represented by the following stochastic series:
$$\xi(t) = \sum_{l,n=-N_1}^{N_1}\xi_{ln}(t)e^{i\lambda_{ln}t},$$
where $\xi_{ln}(t)$ are jointly stationary random processes with means $m_{ln} = E\xi_{ln}(t)$ and covariance functions
$$R_{lnpq}(u) = E\,\overline{\mathring{\xi}_{ln}(t)}\,\mathring{\xi}_{pq}(t+u), \quad \mathring{\xi}_{ln}(t) = \xi_{ln}(t) - m_{ln}.$$
Then,
$$B_{ln}(u) = \sum_{p,q\in M}R_{p-l,q-n,p,q}(u)e^{i\lambda_{pq}u},$$
where $M = \{k-N_1, \dots, N_1\}$ for $k \ge 0$ and $M = \{-N_1, \dots, N_1+k\}$ for $k < 0$. We shall suppose that the auto- and cross-covariance functions of the modulating processes $\xi_{ln}(t)$ vanish as the lag increases. Hence,
$$\lim_{|u|\to\infty}B_{ln}(u) = 0 \quad \forall\, l, n \in [-N_2, N_2]. \tag{3}$$
At first, we consider the estimators of the mean function and the covariance function which have the forms of the trigonometric polynomials
$$\hat{m}(t) = \sum_{l,n=-N_1}^{N_1}\hat{m}_{ln}e^{i\lambda_{ln}t}, \tag{4}$$
$$\hat{b}(t,u) = \sum_{l,n=-N_2}^{N_2}\hat{B}_{ln}(u)e^{i\lambda_{ln}t}, \tag{5}$$
and their coefficients are calculated using the formulae
$$\hat{m}_{ln} = \frac{1}{T}\int_0^T\xi(t)e^{-i\lambda_{ln}t}\,dt, \tag{6}$$
$$\hat{B}_{ln}(u) = \frac{1}{T}\int_0^T\big[\xi(t)-\hat{m}(t)\big]\big[\xi(t+u)-\hat{m}(t+u)\big]e^{-i\lambda_{ln}t}\,dt, \tag{7}$$
where $T$ is the realization length. Statistics (6) and (7) are called the component [8, 23] or cyclic [9, 12] estimators.

Proposition 2.1. For the BPCRP, the mean function of which is represented by the finite Fourier series
$$m(t) = \sum_{l,n=-N_1}^{N_1}m_{ln}e^{i\lambda_{ln}t}, \tag{8}$$
statistics (4) and (6) define an asymptotically unbiased estimator of the mean function, and when conditions (3) are satisfied it is mean square consistent. For the finite realization length, its bias $\varepsilon[\hat{m}(t)] = E\hat{m}(t) - m(t)$ is determined by the expression
$$\varepsilon[\hat{m}(t)] = \sum_{l,n=-N_1}^{N_1}\varepsilon[\hat{m}_{ln}]e^{i\lambda_{ln}t}, \tag{9}$$
where
$$\varepsilon[\hat{m}_{ln}] = \sum_{\substack{k,r=-N_1\\ (k,r)\ne(l,n)}}^{N_1}m_{kr}\,\varphi(\lambda_{k-l,r-n}T), \quad \varphi(\lambda_{kr}T) = \frac{\sin(\lambda_{kr}T/2)}{\lambda_{kr}T/2}\,e^{i\lambda_{kr}T/2}, \tag{10}$$
and the variance has the form
$$\mathrm{Var}[\hat{m}(t)] = \sum_{k,r=-N_1}^{N_1}\sum_{l,n=-N_1}^{N_1}e^{i\lambda_{k-r,l-n}t}\Bigg[\frac{1}{T}\int_0^T\Big(1-\frac{u}{T}\Big)B_{k-r,l-n}(u)\big(e^{i\lambda_{rn}u}+e^{-i\lambda_{kl}u}\big)\,du\Bigg]. \tag{11}$$
Proof of Proposition 2.1 is given in Appendix A.

Proposition 2.2. When conditions (3) are satisfied, the component estimator (5) and (7) of the covariance function
$$b(s,u) = \sum_{p,q=-N_2}^{N_2}B_{pq}(u)e^{i\lambda_{pq}s} \tag{12}$$
is asymptotically unbiased, and its bias $\varepsilon[\hat{b}(t,u)] = E\hat{b}(t,u) - b(t,u)$ for the finite realization length is determined by the expression
$$\varepsilon[\hat{b}(t,u)] = \sum_{l,n=-N_2}^{N_2}\varepsilon[\hat{B}_{ln}(u)]e^{i\lambda_{ln}t} - \sum_{l,n=-N_2}^{N_2}e^{i\lambda_{ln}t}\Bigg[\frac{1}{T}\int_0^T\Big(1-\frac{u_1}{T}\Big)\Big(\big[B_{ln}(u_1-u)e^{i\lambda_{ln}u}+B_{ln}(u_1+u)e^{-i\lambda_{ln}u_1}\big]h(N_1,u_1) + B_{ln}(u)\tilde{h}_{ln}(N_1,u,u_1)\Big)\,du_1\Bigg], \tag{13}$$
where
$$\varepsilon[\hat{B}_{ln}(u)] = \sum_{\substack{p,q=-N_2\\ (p,q)\ne(l,n)}}^{N_2}B_{pq}(u)\,\varphi(\lambda_{p-l,q-n}T), \tag{14}$$
$$h(N_1,u_1) = \sum_{l,n=-N_1}^{N_1}\cos\lambda_{ln}u_1,$$
$$\tilde{h}_{ln}(N_1,u,u_1) = \sum_{r\in L_1,\,k\in L_2}e^{i\lambda_{rk}u}e^{-i\lambda_{rk}u_1} + e^{-i\lambda_{r-l,k-n}u},$$
and $L_1 = \{-N_1, \dots, N_1+r-1\}$, $L_2 = \{-N_1, \dots, N_1+k-1\}$ as $r < 0$ and $k < 0$, $L_1 = \{N_1-r+1, \dots, N_1\}$, $L_2 = \{N_1-k+1, \dots, N_1\}$ as $r \ge 0$ and $k \ge 0$. Proof of Proposition 2.2 is given in Appendix B.

Now we analyze the properties of the variance of the covariance function estimator for a Gaussian BPCRP. In computing it, we neglect the components caused by the preliminary estimation of the mean function. These components, as follows from their analysis, lead to additional summands in the expression for the variance that have a higher order of smallness. Thus, we have
$$\mathrm{Var}[\hat{b}(t,u)] = \sum_{l,n=-N_2}^{N_2}\sum_{p,q=-N_2}^{N_2}e^{i\lambda_{l-p,n-q}t}\Bigg[\frac{1}{T^2}\int_0^T\!\!\int_0^T G(s_1,s_1+u,s_2,s_2+u)\,e^{i(\lambda_{pq}s_2-\lambda_{ln}s_1)}\,ds_1\,ds_2\Bigg],$$
where
$$G(s_1,s_2,s_3,s_4) = E\,\mathring{\xi}(s_1)\mathring{\xi}(s_2)\mathring{\xi}(s_3)\mathring{\xi}(s_4) - E\,\mathring{\xi}(s_1)\mathring{\xi}(s_2)\,E\,\mathring{\xi}(s_3)\mathring{\xi}(s_4).$$
Proposition 2.3. Statistics (5) and (7) define a mean square consistent estimator of the covariance function of the Gaussian BPCRP. When conditions (3) are satisfied, in the first approximation the variance of this estimator is determined by the formula
$$\mathrm{Var}[\hat{b}(t,u)] = \sum_{l,n=-N_2}^{N_2}\sum_{p,q=-N_2}^{N_2}e^{i\lambda_{l-p,n-q}t}\Bigg[\frac{1}{T}\int_0^T\Big(1-\frac{|u_1|}{T}\Big)\tilde{B}_{l-p,n-q}(u_1,u)\big(e^{i\lambda_{pq}u_1}+e^{-i\lambda_{pq}u_1}\big)\,du_1\Bigg] + O(T^{-2}), \tag{15}$$
where $\tilde{B}_{ln}(u_1,u)$ are the Fourier coefficients of the function $G(s, s+u, s+u_1, s+u+u_1)$. Proof of Proposition 2.3 is given in Appendix C.
3 Least Square Estimation

It follows from (9) and (10) that statistics (4) is biased because the coefficients $m_{ln}$ are overlapped by the coefficients with other indices. The overlapping values are determined by the weight function $\varphi(\lambda_{k-l,r-n}T)$, which depends on the difference between the combination frequencies. This effect is called leakage. The components caused by leakage are also present in expression (14) for the bias of the component estimators of the covariance function. To avoid leakage, the realization length $T$ has to be chosen so that the equality $T = M_1P_1 = M_2P_2$ holds, where $M_1$ and $M_2$ are integers. Then $\varphi(\lambda_{k-l,r-n}T) = 0$ $\forall\, k,l\in[-N_1,N_1]$ or $k,l\in[-N_2,N_2]$. However, this condition is difficult to satisfy in practice. Below we shall prove that leakage is absent if the least square estimation is used.

3.1 Mean Function Estimation

Rewrite the series for the mean function in the real form
$$m(t) = m_{00} + \sum_{l=0}^{N_1}\sum_{n=1}^{N_1}\left[m^c_{ln}\cos(\lambda_{ln}t) + m^s_{ln}\sin(\lambda_{ln}t)\right] + \sum_{n=0}^{N_1}\sum_{l=1}^{N_1}\left[m^c_{l,-n}\cos(\lambda_{l,-n}t) + m^s_{l,-n}\sin(\lambda_{l,-n}t)\right].$$
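The leakage condition stated at the start of this section can be illustrated numerically. The sketch below is not taken from the chapter: it assumes that the weight function has the form $\varphi(\lambda T) = \frac{1}{T}\int_0^T e^{i\lambda s}\,ds$ (which is what the derivation in Appendix A implies) and that the combination frequencies are $\lambda_{ln} = 2\pi(l/P_1 + n/P_2)$. The function names and the chosen periods are hypothetical.

```python
import numpy as np

def leakage_weight(lam, T):
    """Assumed weight phi(lam*T) = (1/T) * int_0^T exp(i*lam*s) ds, in closed form."""
    x = lam * T
    if np.isclose(x, 0.0):
        return 1.0 + 0.0j
    return (np.exp(1j * x) - 1.0) / (1j * x)

# Hypothetical periods with P2 = 1.5 * P1
P1, P2 = 10.0, 15.0

def comb_freq(l, n):
    """Combination frequency lambda_{l,n} = 2*pi*(l/P1 + n/P2) (assumed form)."""
    return 2.0 * np.pi * (l / P1 + n / P2)

# Difference between two neighbouring combination frequencies
lam = comb_freq(1, -1)

for T in (25.0, 30.0, 100.0):
    # T = 30 is a common multiple of P1 and P2, so the weight vanishes there
    print(f"T = {T:6.1f}  |phi| = {abs(leakage_weight(lam, T)):.4f}")
```

Under these assumptions the weight is zero only when $T$ is a common multiple of both periods; for other lengths it decays slowly, roughly as $1/(\lambda T)$.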
To simplify the following analysis, rename the harmonic frequencies as well as their amplitudes (Table 1 and Table 2).

Table 1. Harmonic frequencies $\lambda_{l,n}$

λ_{N1,−N1}     λ_{N1,−N1+1}     …   λ_{N1,−1}     λ_{N1,0}     λ_{N1,1}     λ_{N1,2}     …   λ_{N1,N1}
λ_{N1−1,−N1}   λ_{N1−1,−N1+1}   …   λ_{N1−1,−1}   λ_{N1−1,0}   λ_{N1−1,1}   λ_{N1−1,2}   …   λ_{N1−1,N1}
…              …                …   …             …            …            …            …   …
λ_{2,−N1}      λ_{2,−N1+1}      …   λ_{2,−1}      λ_{2,0}      λ_{2,1}      λ_{2,2}      …   λ_{2,N1}
λ_{1,−N1}      λ_{1,−N1+1}      …   λ_{1,−1}      λ_{1,0}      λ_{1,1}      λ_{1,2}      …   λ_{1,N1}
                                                               λ_{0,1}      λ_{0,2}      …   λ_{0,N1}
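The renaming of the two-index frequencies $\lambda_{l,n}$ into single-index frequencies $\omega_r$ can be sketched in code. The numbering rule used below is an assumption inferred from the entries of Table 2 (the block with $n \ge 1$ is numbered row by row first, then the block with $n \le 0$); it is not stated explicitly in the text, and the function name is hypothetical.

```python
def renamed_index(l, n, N1):
    """Return r such that omega_r = lambda_{l,n} (assumed row-by-row numbering)."""
    if 0 <= l <= N1 and 1 <= n <= N1:
        # block with positive second index: omega_1 ... omega_{N1*(N1+1)}
        return l * N1 + n
    if 1 <= l <= N1 and -N1 <= n <= 0:
        # block with non-positive second index, numbered after the first block
        return N1 * (N1 + 1) + (l - 1) * (N1 + 1) + (-n) + 1
    raise ValueError("index pair (l, n) is not present in Table 1")

N1 = 2
pairs = [(l, n) for l in range(0, N1 + 1) for n in range(1, N1 + 1)]
pairs += [(l, -n) for l in range(1, N1 + 1) for n in range(0, N1 + 1)]
all_indices = sorted(renamed_index(l, n, N1) for l, n in pairs)
print(all_indices)   # every index from 1 to L1 = 2*N1*(N1+1) appears once
```

With this rule the total number of renamed frequencies equals $L_1 = 2N_1(N_1+1)$, in agreement with the number of cells in Table 1.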
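The quadratic functional for the Fourier coefficients of the mean function has the form
$$F\left(\hat m_0,\hat m^c_1,\dots,\hat m^c_{L_1},\hat m^s_1,\dots,\hat m^s_{L_1}\right) = \int_0^T\left[\xi(t) - \left[\hat m_0 + \sum_{r=1}^{L_1}\left(\hat m^c_r\cos\omega_r t + \hat m^s_r\sin\omega_r t\right)\right]\right]^2 dt, \tag{16}$$
where $L_1 = 2N_1(N_1+1)$. Rewrite the necessary conditions for the existence of the minimum of functional (16),
$$\frac{\partial F}{\partial \hat m_0} = 0,\qquad \frac{\partial F}{\partial \hat m^c_r} = 0,\qquad \frac{\partial F}{\partial \hat m^s_r} = 0,\qquad r\in[1,L_1],$$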
Table 2. Renamed frequencies (rows and columns arranged as in Table 1)

ω_{2N1(N1+1)}    …   ω_{N1(2N1+1)+1}   ω_{N1(2N1+1)}     ω_{N1²+1}        ω_{N1²+2}        …   ω_{N1²+N1−1}   ω_{N1(N1+1)}
ω_{N1(2N1+1)−1}  …   ω_{2N1²}          ω_{2N1²−1}        ω_{(N1−1)N1+1}   ω_{(N1−1)N1+2}   …   ω_{N1²−1}      ω_{N1²}
…                …   …                 …                 …                …                …   …              …
ω_{N1(N1+3)+2}   …   ω_{N1(N1+2)+3}    ω_{N1(N1+2)+2}    ω_{2N1+1}        ω_{2N1+2}        …   ω_{3N1−1}      ω_{3N1}
ω_{N1(N1+2)+1}   …   ω_{N1(N1+1)+2}    ω_{N1(N1+1)+1}    ω_{N1+1}         ω_{N1+2}         …   ω_{2N1−1}      ω_{2N1}
                                                         ω_1              ω_2              …   ω_{N1−1}       ω_{N1}
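in the matrix form
$$\mathbf{M}\hat{\mathbf{m}} = \tilde{\mathbf{m}},$$
where
$$\mathbf{M} = \begin{bmatrix}
1 & c_{01} & \dots & c_{0L_1} & a_{01} & \dots & a_{0L_1}\\
c_{10} & c_{11} & \dots & c_{1L_1} & a_{11} & \dots & a_{1L_1}\\
\vdots & \vdots & & \vdots & \vdots & & \vdots\\
c_{L_10} & c_{L_11} & \dots & c_{L_1L_1} & a_{L_11} & \dots & a_{L_1L_1}\\
a_{01} & a_{11} & \dots & a_{L_11} & s_{11} & \dots & s_{1L_1}\\
\vdots & \vdots & & \vdots & \vdots & & \vdots\\
a_{0L_1} & a_{1L_1} & \dots & a_{L_1L_1} & s_{L_11} & \dots & s_{L_1L_1}
\end{bmatrix} = [m_{rk}],\qquad r,k\in\overline{1,2L_1+1},$$
$$\hat{\mathbf{m}} = \left[\hat m_0,\ \hat m^c_1,\ \dots,\ \hat m^c_{L_1},\ \hat m^s_1,\ \dots,\ \hat m^s_{L_1}\right]^T,\qquad \tilde{\mathbf{m}} = \left[\tilde m_0,\ \tilde m^c_1,\ \dots,\ \tilde m^c_{L_1},\ \tilde m^s_1,\ \dots,\ \tilde m^s_{L_1}\right]^T, \tag{17}$$
and
$$c_{rk} = \frac1T\int_0^T\cos\omega_r t\,\cos\omega_k t\,dt,\qquad s_{rk} = \frac1T\int_0^T\sin\omega_r t\,\sin\omega_k t\,dt,\qquad a_{rk} = \frac1T\int_0^T\cos\omega_r t\,\sin\omega_k t\,dt,$$
$$\tilde m_0 = \frac1T\int_0^T\xi(t)\,dt,\qquad \tilde m^c_k = \frac1T\int_0^T\xi(t)\cos\omega_k t\,dt,\qquad \tilde m^s_k = \frac1T\int_0^T\xi(t)\sin\omega_k t\,dt.$$
If the rank of the matrix $\mathbf{M}$ equals the rank of the expanded matrix $(\mathbf{M}\,|\,\tilde{\mathbf{m}})$ and $\mathbf{M}$ is nonsingular, then the matrix equation has a unique solution, namely
$$\hat{\mathbf{m}} = \mathbf{M}^{-1}\tilde{\mathbf{m}}, \tag{18}$$
where $\mathbf{M}^{-1} = [M_{rk}]^T/|\mathbf{M}|$ is the inverse matrix and $M_{rk}$ are the algebraic complements of the elements $m_{rk}$. Taking into account (18), we obtain
$$\hat m(t) = \frac{1}{|\mathbf{M}|}\sum_{j=0}^{2L_1}\tilde m_j f_j(t), \tag{19}$$
where
$$f_j(t) = M_{j+1,1} + \sum_{r=1}^{L_1}\left[M_{j+1,r+1}\cos\omega_r t + M_{j+1,L_1+r+1}\sin\omega_r t\right].$$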
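The least square solution (18) and its difference from harmonic-by-harmonic averaging can be sketched for sampled data. The example below is an illustration under stated assumptions, not the authors' implementation: the integrals over $[0,T]$ are replaced by sample means, the signal is a noise-free mean function with two hypothetical frequencies $2\pi/P_1$ and $2\pi/P_2$, and the realization length is deliberately not a common multiple of the periods. All names and parameter values are hypothetical.

```python
import numpy as np

def design_matrix(t, omegas):
    """Columns 1, cos(w_r t), sin(w_r t): the basis of the real-form mean series."""
    cols = [np.ones_like(t)]
    cols += [np.cos(w * t) for w in omegas]
    cols += [np.sin(w * t) for w in omegas]
    return np.column_stack(cols)

def ls_mean_coefficients(xi, t, omegas):
    """Solve the normal equations M m_hat = m_tilde (sample analogue of (18))."""
    X = design_matrix(t, omegas)
    M = X.T @ X / t.size           # sample analogue of the matrix M
    m_tilde = X.T @ xi / t.size    # sample analogue of the vector m~
    return np.linalg.solve(M, m_tilde)

def component_mean_coefficients(xi, t, omegas):
    """Harmonic-by-harmonic averaging written in real (cos/sin) form."""
    m0 = [xi.mean()]
    mc = [2.0 * np.mean(xi * np.cos(w * t)) for w in omegas]
    ms = [2.0 * np.mean(xi * np.sin(w * t)) for w in omegas]
    return np.array(m0 + mc + ms)

P1, P2 = 10.0, 15.0
omegas = [2.0 * np.pi / P1, 2.0 * np.pi / P2]
T, dt = 25.0, 0.01                 # T is not a common multiple of P1 and P2
t = np.arange(0.0, T, dt)

true = np.array([1.0, 2.0, 0.0, 0.0, 0.5])   # m0, mc_1, mc_2, ms_1, ms_2
xi = design_matrix(t, omegas) @ true         # noise-free test signal

ls = ls_mean_coefficients(xi, t, omegas)
comp = component_mean_coefficients(xi, t, omegas)
print("LS error       :", np.max(np.abs(ls - true)))
print("component error:", np.max(np.abs(comp - true)))
```

For this noise-free signal the least square coefficients coincide with the true ones up to rounding, while the averaged coefficients are shifted by the leakage terms.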
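Proposition 3.1. Statistics (19) is an unbiased and, when conditions (3) are satisfied, also a mean square consistent estimator of the BPCRP mean function. Its variance for a finite realization length is determined by the formula
$$\operatorname{Var}[\hat m(t)] = \frac{1}{|\mathbf{M}|^2}\sum_{l,n=0}^{2L_1} R_{\tilde m_l\tilde m_n} f_l(t) f_n(t),$$
where
$$R_{\tilde m_l\tilde m_n} = \frac1T\sum_{p,q=-N_2}^{N_2}\int_0^T B_{pq}(u)\left[\frac1T\int_0^{T-u} e^{i\lambda_{pq}s}\left(\begin{Bmatrix}\cos\omega_l(s+u)\\ \sin\omega_l(s+u)\end{Bmatrix}\begin{Bmatrix}\cos\omega_n s\\ \sin\omega_n s\end{Bmatrix} + \begin{Bmatrix}\cos\omega_l s\\ \sin\omega_l s\end{Bmatrix}\begin{Bmatrix}\cos\omega_n(s+u)\\ \sin\omega_n(s+u)\end{Bmatrix}\right)ds\right]du. \tag{20}$$

Proof of Proposition 3.1 is given in Appendix D.

3.2 Covariance Function Estimator

Now consider the estimator of the covariance function that has the form of a trigonometric polynomial whose coefficients are found by minimization of the quadratic functional
$$F\left(\hat B_0(u),\hat B^c_1(u),\dots,\hat B^c_{L_2}(u),\hat B^s_1(u),\dots,\hat B^s_{L_2}(u)\right) = \int_0^T\left[\zeta(t,u) - \left[\hat B_0(u) + \sum_{r=1}^{L_2}\left(\hat B^c_r(u)\cos\omega_r t + \hat B^s_r(u)\sin\omega_r t\right)\right]\right]^2 dt, \tag{21}$$
where $\zeta(t,u) = [\xi(t)-\hat m(t)][\xi(t+u)-\hat m(t+u)]$, $\omega_r$ are the frequencies renamed by analogy with Tables 1 and 2, and $L_2 = 2N_2(N_2+1)$ is the number of harmonics of the renamed Fourier series. Write the necessary conditions for the existence of the minimum of functional (21) in the form of the matrix equation
$$\mathbf{D}\hat{\mathbf{B}}(u) = \tilde{\mathbf{B}}(u), \tag{22}$$
where $\mathbf{D}$ is a square $(2L_2+1)\times(2L_2+1)$ matrix similar to (17), and
$$\hat{\mathbf{B}}(u) = \left[\hat B_0(u)\ \hat B^c_1(u)\dots\hat B^c_{L_2}(u)\ \hat B^s_1(u)\dots\hat B^s_{L_2}(u)\right]^T,\qquad \tilde{\mathbf{B}}(u) = \left[\tilde B_0(u)\ \tilde B^c_1(u)\dots\tilde B^c_{L_2}(u)\ \tilde B^s_1(u)\dots\tilde B^s_{L_2}(u)\right]^T,$$
$$\tilde B_0(u) = \frac1T\int_0^T\zeta(t,u)\,dt,\qquad \tilde B^c_r(u) = \frac1T\int_0^T\zeta(t,u)\cos\omega_r t\,dt,\qquad \tilde B^s_r(u) = \frac1T\int_0^T\zeta(t,u)\sin\omega_r t\,dt,\qquad r\in[1,L_2].$$
The solution of matrix Eq. (22) has the form
$$\hat{\mathbf{B}}(u) = \mathbf{D}^{-1}\tilde{\mathbf{B}}(u) = \frac{[D_{rk}]^T}{|\mathbf{D}|}\tilde{\mathbf{B}}(u), \tag{23}$$
where $[D_{rk}]^T$ is the transposed matrix of the algebraic complements. The mathematical expectations of the elements of the matrix $\tilde{\mathbf{B}}(u)$ are equal to
$$E\tilde B_0(u) = \frac1T\int_0^T\left[b(t,u)-\varepsilon_\zeta(t,u)\right]dt,$$
$$E\tilde B^c_r(u) = \frac1T\int_0^T\left[b(t,u)-\varepsilon_\zeta(t,u)\right]\cos\omega_r t\,dt,\qquad E\tilde B^s_r(u) = \frac1T\int_0^T\left[b(t,u)-\varepsilon_\zeta(t,u)\right]\sin\omega_r t\,dt,\qquad r\in[1,L_2],$$
where
$$\varepsilon_\zeta(t,u) = E\left[\overset{\circ}{\hat m}(t)\overset{\circ}{\xi}(t+u) + \overset{\circ}{\xi}(t)\overset{\circ}{\hat m}(t+u) - \overset{\circ}{\hat m}(t)\overset{\circ}{\hat m}(t+u)\right].$$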
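Proposition 3.2. When conditions (3) are satisfied, the elements of the matrix equation solution (23) are asymptotically unbiased estimators of the BPCRP covariance components. Their biases have no leakage components, and for a finite realization length they are determined by the formulae
$$\varepsilon[\hat B_0(u)] = -\frac{1}{|\mathbf{D}|}\sum_{k=0}^{2L_2}\varepsilon[\tilde B_k(u)]D_{k+1,1},\qquad \varepsilon[\hat B^c_l(u)] = -\frac{1}{|\mathbf{D}|}\sum_{k=0}^{2L_2}\varepsilon[\tilde B_k(u)]D_{k+1,l+1},\qquad \varepsilon[\hat B^s_l(u)] = -\frac{1}{|\mathbf{D}|}\sum_{k=0}^{2L_2}\varepsilon[\tilde B_k(u)]D_{k+1,l+L_2+1}, \tag{24}$$
where
$$\varepsilon[\tilde B_0(u)] = \frac1T\int_0^T\varepsilon_\zeta(t,u)\,dt,\qquad \begin{Bmatrix}\varepsilon[\tilde B_k(u)]\\ \varepsilon[\tilde B_{k+L_2}(u)]\end{Bmatrix} = \frac1T\int_0^T\varepsilon_\zeta(t,u)\begin{Bmatrix}\cos\omega_k t\\ \sin\omega_k t\end{Bmatrix}dt. \tag{25}$$
Here the elements of $\tilde{\mathbf{B}}(u)$ are numbered consecutively, so that $\tilde B_k = \tilde B^c_k$ and $\tilde B_{k+L_2} = \tilde B^s_k$ for $k\in[1,L_2]$.

Proof of Proposition 3.2 is given in Appendix E.

It follows from Proposition 3.2 that the least square estimator of the covariance function is also asymptotically unbiased and its bias
$$\varepsilon[\hat b(t,u)] = \varepsilon[\hat B_0(u)] + \sum_{k=1}^{L_2}\left[\varepsilon[\hat B^c_k(u)]\cos\omega_k t + \varepsilon[\hat B^s_k(u)]\sin\omega_k t\right]$$
is not caused by the leakage effect; it is caused only by the previous estimation of the mean function.

To calculate the variance of the covariance function estimator, $\operatorname{Var}[\hat b(t,u)] = E\left[\hat b(t,u) - E\hat b(t,u)\right]^2$, rewrite the estimator in the form
$$\hat b(t,u) = \frac{1}{|\mathbf{D}|}\sum_{l=0}^{2L_2}\tilde B_l(u)g_l(t), \tag{26}$$
where
$$g_l(t) = D_{l+1,1} + \sum_{r=1}^{L_2}\left[D_{l+1,r+1}\cos\omega_r t + D_{l+1,L_2+r+1}\sin\omega_r t\right].$$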
Proposition 3.3. When conditions (3) are satisfied, statistics (26) for the Gaussian BPCRP is mean square consistent, and in the first approximation its variance is determined by the formula
$$\operatorname{Var}[\hat b(t,u)] = \frac{1}{|\mathbf{D}|^2}\sum_{k,l=0}^{2L_2}R_{\tilde B_k\tilde B_l}(u)\,g_l(t)g_k(t) + O(T^{-2}), \tag{27}$$
where
$$R_{\tilde B_k\tilde B_l}(u) = \sum_{p,q=-2N_2}^{2N_2}\left[\frac1T\int_{-T}^{0}\tilde B_{pq}(u_1,u)f^{(pq)}_{kl}(-u_1,T)\,du_1 + \frac1T\int_0^T\tilde B_{pq}(u_1,u)f^{(pq)}_{kl}(0,T-u_1)\,du_1\right], \tag{28}$$
and
$$f^{(pq)}_{kl}(a,b) = \frac1T\int_a^b e^{i\lambda_{pq}t}\begin{Bmatrix}\cos\omega_k t\\ \sin\omega_k t\end{Bmatrix}\begin{Bmatrix}\cos\omega_l(t+u)\\ \sin\omega_l(t+u)\end{Bmatrix}dt. \tag{29}$$
Proof. Proceeding from (26), for the variance $\operatorname{Var}[\hat b(t,u)] = E\left[\hat b(t,u) - E\hat b(t,u)\right]^2$ we have (27), where the covariance $R_{\tilde B_k\tilde B_l}(u) = E\tilde B_k(u)\tilde B_l(u) - E\tilde B_k(u)\,E\tilde B_l(u)$ is equal to
$$R_{\tilde B_k\tilde B_l}(u) = \frac{1}{T^2}\int_0^T\!\!\int_0^T\begin{Bmatrix}\cos\omega_k t\\ \sin\omega_k t\end{Bmatrix}\begin{Bmatrix}\cos\omega_l s\\ \sin\omega_l s\end{Bmatrix}\left[E\,\overset{\circ}{\xi}(t)\overset{\circ}{\xi}(t+u)\overset{\circ}{\xi}(s)\overset{\circ}{\xi}(s+u) - b(t,u)b(s,u)\right]dt\,ds.$$
For the Gaussian BPCRP
$$E\,\overset{\circ}{\xi}(t)\overset{\circ}{\xi}(t+u)\overset{\circ}{\xi}(s)\overset{\circ}{\xi}(s+u) - b(t,u)b(s,u) = b(t,s-t)\,b(t+u,s-t) + b(t,s-t+u)\,b(t+u,s-t-u).$$
This function is biperiodic in time. Representing it in the form of series (C.1), we come to (27)–(29). It follows from conditions (3) and relations (C.2) and (C.3) that
$$\lim_{|u_1|\to\infty}\tilde B_{pq}(u_1,u) = 0\qquad \forall\, p,q\in[-2N_2,2N_2].$$
Since functions (29) are bounded, $R_{\tilde B_k\tilde B_l}(u)\to 0$ as $T\to\infty$. Thus, estimator (26) is mean square consistent.

The advantage of the least square estimation over the component estimation is the absence of leakage effects, which can cause significant errors when the combination frequencies have close values. The other summands of the biases and the variances converge asymptotically to the values for the component estimator. To analyze the difference between the errors of the component and the least square estimators, we must consider concrete BPCRP models.
4 Additional Technical Results

Let us concretize the results obtained above for the quadrature model of the BPCRP
$$\xi(t) = \xi_c(t)\cos\lambda_{11}t + \xi_s(t)\sin\lambda_{11}t, \tag{30}$$
where $\xi_c(t)$ and $\xi_s(t)$ are jointly stationary random processes, $E\xi_c(t) = m_c$, $E\xi_s(t) = m_s$, $\lambda_{11} = \frac{2\pi}{P_1} + \frac{2\pi}{P_2}$, and $P_1$ and $P_2$ are the periods of the nonstationarity. The mean and the covariance function of process (30) are the following:
$$m(t) = m_c\cos\lambda_{11}t + m_s\sin\lambda_{11}t,\qquad b(t,u) = B_{00}(u) + B^c_{22}(u)\cos\lambda_{22}t + B^s_{22}(u)\sin\lambda_{22}t. \tag{31}$$
Here,
$$B_{00}(u) = \frac12\left[R_c(u)+R_s(u)\right]\cos\lambda_{11}u + R^-_{cs}(u)\sin\lambda_{11}u, \tag{32}$$
$$B^c_{22}(u) = \frac12\left[R_c(u)-R_s(u)\right]\cos\lambda_{11}u + R^+_{cs}(u)\sin\lambda_{11}u, \tag{33}$$
$$B^s_{22}(u) = R^+_{cs}(u)\cos\lambda_{11}u + \frac12\left[R_s(u)-R_c(u)\right]\sin\lambda_{11}u, \tag{34}$$
and $R_c(u) = E\,\overset{\circ}{\xi}_c(t)\overset{\circ}{\xi}_c(t+u)$, $R_s(u) = E\,\overset{\circ}{\xi}_s(t)\overset{\circ}{\xi}_s(t+u)$, $\overset{\circ}{\xi}_c(t) = \xi_c(t)-m_c$, $\overset{\circ}{\xi}_s(t) = \xi_s(t)-m_s$; $R^+_{cs}(u)$ and $R^-_{cs}(u)$ are the even and odd parts of the cross-covariance function $R_{cs}(u) = E\,\overset{\circ}{\xi}_c(t)\overset{\circ}{\xi}_s(t+u)$. Since the expressions for the moment functions of the estimators are cumbersome, below we consider only the estimators of the covariance components.

4.1 Biases of the Covariance Components Estimators

For conciseness, rename covariance components (32)–(34):
$$B_1(u) = B_{00}(u),\qquad B_2(u) = B^c_{22}(u),\qquad B_3(u) = B^s_{22}(u).$$
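Relations (31)–(34) can be checked numerically by comparing the series with the covariance function written directly from model (30). In the sketch below the covariance functions of the modulating processes are hypothetical test functions (the cross-covariance is deliberately made non-even so that its odd part is nonzero), and the identity $E\,\overset{\circ}{\xi}_s(t)\overset{\circ}{\xi}_c(t+u) = R_{cs}(-u)$, which follows from joint stationarity, is used. Function names and parameter values are assumptions made for illustration.

```python
import numpy as np

P1, P2 = 10.0, 15.0
lam11 = 2.0 * np.pi / P1 + 2.0 * np.pi / P2
lam22 = 2.0 * lam11

# Hypothetical covariance functions of the modulating processes
def Rc(u):  return 1.0 * np.exp(-0.6 * np.abs(u))
def Rs(u):  return 0.5 * np.exp(-0.4 * np.abs(u))
def Rcs(u): return 0.3 * np.exp(-0.3 * np.abs(u - 1.0))   # not even on purpose

def Rcs_even(u): return 0.5 * (Rcs(u) + Rcs(-u))
def Rcs_odd(u):  return 0.5 * (Rcs(u) - Rcs(-u))

def b_direct(t, u):
    """Covariance of the quadrature model written term by term from (30)."""
    c0, s0 = np.cos(lam11 * t), np.sin(lam11 * t)
    c1, s1 = np.cos(lam11 * (t + u)), np.sin(lam11 * (t + u))
    return Rc(u) * c0 * c1 + Rs(u) * s0 * s1 + Rcs(u) * c0 * s1 + Rcs(-u) * s0 * c1

def b_series(t, u):
    """Covariance assembled from the components (32)-(34) as in (31)."""
    B00 = 0.5 * (Rc(u) + Rs(u)) * np.cos(lam11 * u) + Rcs_odd(u) * np.sin(lam11 * u)
    B22c = 0.5 * (Rc(u) - Rs(u)) * np.cos(lam11 * u) + Rcs_even(u) * np.sin(lam11 * u)
    B22s = Rcs_even(u) * np.cos(lam11 * u) + 0.5 * (Rs(u) - Rc(u)) * np.sin(lam11 * u)
    return B00 + B22c * np.cos(lam22 * t) + B22s * np.sin(lam22 * t)

tt, uu = np.meshgrid(np.linspace(0.0, 30.0, 61), np.linspace(-5.0, 5.0, 41))
max_diff = np.max(np.abs(b_direct(tt, uu) - b_series(tt, uu)))
print("max |direct - series| =", max_diff)
```

The two expressions agree to rounding error, which confirms that only the zeroth and the $\lambda_{22} = 2\lambda_{11}$ covariance components are present in this model.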
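The LS estimators of the covariance components are then determined by the relation
$$\hat B_k(u) = \frac{1}{|\mathbf{D}|}\sum_{l=1}^{3}D_{lk}\tilde B_l(u),\qquad k=\overline{1,3}, \tag{35}$$
where
$$\tilde B_0(u) = \frac1T\int_0^T\zeta(t,u)\,dt,\qquad \begin{Bmatrix}\tilde B^c_{22}(u)\\ \tilde B^s_{22}(u)\end{Bmatrix} = \frac1T\int_0^T\zeta(t,u)\begin{Bmatrix}\cos\lambda_{22}t\\ \sin\lambda_{22}t\end{Bmatrix}dt.$$
For simplicity, consider the case $u=0$. Then the biases of the covariance components estimators are determined by the formula
$$\varepsilon[\hat B_k(0)] = -\frac{1}{|\mathbf{D}|}\left[D_{1k}\left[\frac1T\int_0^T\varepsilon_\zeta(t,0)\,dt\right] + D_{2k}\left[\frac1T\int_0^T\varepsilon_\zeta(t,0)\cos\lambda_{22}t\,dt\right] + D_{3k}\left[\frac1T\int_0^T\varepsilon_\zeta(t,0)\sin\lambda_{22}t\,dt\right]\right],$$
where
$$\varepsilon_\zeta(t,0) = \frac{1}{|\mathbf{M}|}\big[2h_c(t,T)(M_{11}\cos\lambda_{11}t + M_{12}\sin\lambda_{11}t) + 2h_s(t,T)(M_{21}\cos\lambda_{11}t + M_{22}\sin\lambda_{11}t) - H_1(T)\cos\lambda_{22}t - H_2(T)\sin\lambda_{22}t - H_3(T)\big], \tag{36}$$
and
$$\begin{Bmatrix}h_c(t,T)\\ h_s(t,T)\end{Bmatrix} = \frac1T\int_0^T b(t,s-t)\begin{Bmatrix}\cos\lambda_{11}s\\ \sin\lambda_{11}s\end{Bmatrix}ds,$$
$$H_1(T) = \frac{1}{2|\mathbf{M}|}\left[I_c(T)\left(M_{11}^2-M_{12}^2\right) + I_s(T)\left(M_{21}^2-M_{22}^2\right) + 2I_{cs}(T)\left(M_{11}M_{21}-M_{12}M_{22}\right)\right],$$
$$H_2(T) = \frac{1}{|\mathbf{M}|}\left[M_{11}M_{12}I_c(T) + M_{22}M_{21}I_s(T) + I_{cs}(T)\left(M_{11}M_{22}+M_{12}M_{21}\right)\right],$$
$$H_3(T) = \frac{1}{2|\mathbf{M}|}\left[I_c(T)\left(M_{11}^2+M_{12}^2\right) + I_s(T)\left(M_{22}^2+M_{21}^2\right) + 2I_{cs}(T)\left(M_{11}M_{21}+M_{22}M_{12}\right)\right],$$
$$\begin{Bmatrix}I_c(T)\\ I_s(T)\end{Bmatrix} = \frac{1}{T^2}\int_0^T\!\!\int_0^T b(s_1,s_2-s_1)\begin{Bmatrix}\cos\lambda_{11}s_1\cos\lambda_{11}s_2\\ \sin\lambda_{11}s_1\sin\lambda_{11}s_2\end{Bmatrix}ds_1\,ds_2, \tag{37}$$
$$I_{cs}(T) = \frac{1}{T^2}\int_0^T\!\!\int_0^T b(s_1,s_2-s_1)\cos\lambda_{11}s_1\sin\lambda_{11}s_2\,ds_1\,ds_2. \tag{38}$$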
After transformation of double integrals (37) and (38), taking into account representation (31) and integrating with respect to the variable $s_1$, in the first approximation we obtain
$$I_c(T) = \frac{1}{2T}\int_0^T\left(1-\frac uT\right)\left[2B_{00}(u)\cos\lambda_{11}u + B^c_{22}(u)\cos\lambda_{11}u - B^s_{22}(u)\sin\lambda_{11}u\right]du, \tag{39}$$
$$I_s(T) = \frac{1}{2T}\int_0^T\left(1-\frac uT\right)\left[2B_{00}(u)\cos\lambda_{11}u - B^c_{22}(u)\cos\lambda_{11}u + B^s_{22}(u)\sin\lambda_{11}u\right]du, \tag{40}$$
$$I_{cs}(T) = \frac{1}{2T}\int_0^T\left(1-\frac uT\right)\left[B^s_{22}(u)\cos\lambda_{11}u + B^c_{22}(u)\sin\lambda_{11}u\right]du. \tag{41}$$
Taking into account (36), we have
$$\varepsilon_0(T) = \frac1T\int_0^T\varepsilon_\zeta(t,0)\,dt = \frac{1}{|\mathbf{M}|}\big[2(M_{11}I_c(T) + M_{12}I_{sc}(T)) + 2(M_{21}I_{cs}(T) + M_{22}I_s(T)) - H_1(T)c_{02} - H_2(T)a_{02} - H_3(T)\big], \tag{42}$$
$$\varepsilon_c(T) = \frac1T\int_0^T\varepsilon_\zeta(t,0)\cos\lambda_{22}t\,dt = \frac{1}{|\mathbf{M}|}\big[M_{11}[I_c(T)+I_1(T)] + M_{22}[I_1(T)-I_s(T)] + M_{12}[I_{sc}(T)+I_2(T)] + M_{12}[I_2(T)-I_{cs}(T)] - H_1(T)c_{22} - H_2(T)a_{22} - H_3(T)c_{02}\big], \tag{43}$$
$$\varepsilon_s(T) = \frac1T\int_0^T\varepsilon_\zeta(t,0)\sin\lambda_{22}t\,dt = \frac{1}{|\mathbf{M}|}\big[M_{11}[I_2(T)+I_{cs}(T)] + M_{22}[I_{sc}(T)+I_2(T)] + M_{12}[I_c(T)-I_1(T)] + M_{21}[I_2(T)+I_s(T)] - H_1(T)a_{22} - H_2(T)s_{22} - H_3(T)a_{02}\big]. \tag{44}$$
Here, in the first approximation,
$$I_1(T) = \frac{1}{4T}\int_{-T}^{T}\left(1-\frac{|u|}{T}\right)\left[B^c_{22}(u)\cos\lambda_{11}u + B^s_{22}(u)\sin\lambda_{11}u\right]du, \tag{45}$$
$$I_2(T) = \frac{1}{4T}\int_{-T}^{T}\left(1-\frac{|u|}{T}\right)\left[B^s_{22}(u)\cos\lambda_{11}u - B^c_{22}(u)\sin\lambda_{11}u\right]du. \tag{46}$$
Thus, for the biases of estimators (35), we have
$$\varepsilon[\hat B_k(0)] = -\frac{1}{|\mathbf{D}|}\left[D_{1k}\varepsilon_0(T) + D_{2k}\varepsilon_c(T) + D_{3k}\varepsilon_s(T)\right],\qquad k=1,2,3.$$
They are caused only by the previous calculation of the mean function of the BPCRP. The dependencies of these quantities on the realization length are determined by relations (42)–(46) and also by the determinants $|\mathbf{M}|$, $|\mathbf{D}|$ and their algebraic complements. The asymptotic values of these quantities are the following:
$$\lim_{T\to\infty}|\mathbf{M}| = \frac14,\qquad \lim_{T\to\infty}|\mathbf{D}| = \frac14,\qquad \lim_{T\to\infty}D_{11} = \frac14,\qquad \lim_{T\to\infty}D_{22} = \lim_{T\to\infty}D_{33} = \frac12,\qquad \lim_{T\to\infty}M_{11} = \lim_{T\to\infty}M_{22} = \frac12.$$
The limits of the other complements are equal to zero. Then, for the biases in the asymptotics $T\to\infty$, we get
$$\varepsilon[\hat B_{00}(0)] = -2[I_c(T)+I_s(T)],\qquad \varepsilon[\hat B^c_{22}(0)] = -2[I_c(T)-I_s(T)+4I_1(T)],\qquad \varepsilon[\hat B^s_{22}(0)] = -4[I_{cs}(T)+2I_2(T)].$$
The same expressions are obtained by analyzing the biases of the component statistics
$$\hat B_{00}(0) = \frac1T\int_0^T\left[\overset{\circ}{\xi}(t)-\overset{\circ}{\hat m}(t)\right]^2dt,\qquad \begin{Bmatrix}\hat B^c_{22}(0)\\ \hat B^s_{22}(0)\end{Bmatrix} = \frac1T\int_0^T\left[\overset{\circ}{\xi}(t)-\overset{\circ}{\hat m}(t)\right]^2\begin{Bmatrix}\cos\lambda_{22}t\\ \sin\lambda_{22}t\end{Bmatrix}dt.$$
Substituting the formulae for covariance components (32)–(34) into (39)–(41), (45) and (46), we come to the relations
$$I_c(T) = \frac{1}{2T}\int_0^T\left(1-\frac uT\right)\left[R_c(u) + \frac12[R_c(u)+R_s(u)]\cos\lambda_{22}u\right]du,$$
$$I_s(T) = \frac{1}{2T}\int_0^T\left(1-\frac uT\right)\left[R_s(u) + \frac12[R_c(u)+R_s(u)]\cos\lambda_{22}u\right]du,$$
$$I_{cs}(T) = \frac{1}{2T}\int_0^T\left(1-\frac uT\right)R^+_{cs}(u)\,du,$$
$$I_1(T) = \frac{1}{4T}\int_0^T\left(1-\frac uT\right)[R_c(u)-R_s(u)]\cos\lambda_{22}u\,du,$$
$$I_2(T) = \frac{1}{2T}\int_0^T\left(1-\frac uT\right)R^+_{cs}(u)\cos\lambda_{22}u\,du.$$
Suppose that
$$R_c(u) = A_1e^{-\alpha_1|u|},\qquad R_s(u) = A_2e^{-\alpha_2|u|},\qquad R_{cs}(u) = A_3e^{-\alpha_3|u|}, \tag{47}$$
and introduce the functions
$$r_k(\alpha_i,T) = \frac1T\int_0^T\left(1-\frac uT\right)e^{-\alpha_iu}\cos\lambda_{kk}u\,du. \tag{48}$$
Then,
$$I_c(T) = \frac12\left\{A_1r_0(\alpha_1,T) + \frac12\left[A_1r_2(\alpha_1,T)+A_2r_2(\alpha_2,T)\right]\right\},\qquad I_s(T) = \frac12\left\{A_2r_0(\alpha_2,T) + \frac12\left[A_1r_2(\alpha_1,T)+A_2r_2(\alpha_2,T)\right]\right\},$$
$$I_{cs}(T) = \frac{A_3}{2}r_0(\alpha_3,T),\qquad I_1(T) = \frac{A_1}{4}r_2(\alpha_1,T).$$
The derived relations allow us to calculate the bias values of the least square and the component estimators of the covariance components for chosen covariance parameters of the modulating processes $\xi_c(t)$ and $\xi_s(t)$, and to analyze their dependence on the realization length. The results of such calculations are shown below (Table 3a–c and Fig. 1). As can be seen from the presented results, the bias values of the component estimators and the LS estimators differ insignificantly even for a short realization length. For $T = 10P_1$, the bias of the component estimator of the cosine component is 1.06 times greater than the bias of the LS estimator, and for the sine component it is 1.04 times greater. The biases of the estimators decrease as the damping coefficients of the covariance increase; however, the difference between the biases of the two estimation methods changes insignificantly in this case. The difference between the biases can be neglected when $T \ge 25P_1$.
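The functions $r_k(\alpha,T)$ of (48) are easy to evaluate numerically. The sketch below uses a plain trapezoidal rule and assumes $\lambda_{kk} = k\lambda_{11}$ (which follows from $\lambda_{ln} = 2\pi(l/P_1+n/P_2)$); the closed form for $k=0$ is obtained by direct integration and serves only as a cross-check. The function names, the grid size and the parameter values are hypothetical, and the numbers produced are not meant to reproduce Table 3.

```python
import numpy as np

def r_k(k, alpha, T, lam11, n=200_001):
    """r_k(alpha, T) = (1/T) int_0^T (1 - u/T) exp(-alpha u) cos(k lam11 u) du."""
    u = np.linspace(0.0, T, n)
    f = (1.0 - u / T) * np.exp(-alpha * u) * np.cos(k * lam11 * u)
    du = u[1] - u[0]
    integral = du * (f.sum() - 0.5 * (f[0] + f[-1]))   # trapezoidal rule
    return integral / T

def r0_closed(alpha, T):
    """Closed form of r_0(alpha, T), used here only as a check."""
    return (1.0 / alpha - (1.0 - np.exp(-alpha * T)) / (alpha**2 * T)) / T

P1, P2 = 10.0, 15.0
lam11 = 2.0 * np.pi / P1 + 2.0 * np.pi / P2

def I_cs(A3, alpha3, T):
    """I_cs(T) = (A3 / 2) * r_0(alpha3, T), as in the relations following (48)."""
    return 0.5 * A3 * r_k(0, alpha3, T, lam11)

for T in (100.0, 250.0, 500.0):
    print(T, r_k(0, 0.6, T, lam11), r0_closed(0.6, T), r_k(2, 0.6, T, lam11))
```

Since $r_0(\alpha,T)\approx 1/(\alpha T)$ for $\alpha T\gg 1$, the bias terms built from these functions decay as $1/T$, which is consistent with the asymptotic unbiasedness stated in Proposition 3.2.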
Table 3. The bias values for the different realization lengths, P2 = 1.5P1; P1 = 10 s

a) α_c = α_s = 0.02; α_cs = 0.01

        LS estimators                              Component estimators
T, s    ε[B̂00(0)]   ε[B̂c22(0)]   ε[B̂s22(0)]     ε[B̂00(0)]   ε[B̂c22(0)]   ε[B̂s22(0)]
10      0.695348     0.092899      0.279155        0.075049     0.194351      0.490785
25      0.654656     0.131864      0.382010        0.682842     0.171640      0.460037
50      0.579925     0.165710      0.390541        0.588926     0.147481      0.426452
75      0.514373     0.128630      0.395318        0.514373     0.128660      0.395421
100     0.448205     0.106702      0.354270        0.454245     0.113632      0.367978
150     0.364470     0.091133      0.321391        0.364470     0.091147      0.321437
200     0.300180     0.077822      0.279255        0.301877     0.075492      0.283868
250     0.254819     0.062550      0.250033        0.256466     0.064133      0.253159
500     0.143619     0.036454      0.159394        0.144016     0.036011      0.160280

b) α_c = α_s = 0.6; α_cs = 0.3

        LS estimators                              Component estimators
T, s    ε[B̂00(0)]   ε[B̂c22(0)]   ε[B̂s22(0)]     ε[B̂00(0)]   ε[B̂c22(0)]   ε[B̂s22(0)]
10      0.215947     0.036090      0.163220        0.245417     0.073327      0.245498
25      0.172781     0.022969      0.103240        0.108103     0.030936      0.121612
50      0.101215     0.016656      0.058910        0.055715     0.015741      0.065074
75      0.054231     0.010065      0.042469        0.037512     0.010554      0.044332
100     0.037512     0.007144      0.031371        0.028273     0.007938      0.033605
150     0.027764     0.005075      0.021728        0.018941     0.005307      0.022641
200     0.018941     0.003922      0.016174        0.014240     0.003986      0.017069
250     0.014139     0.002984      0.013016        0.011409     0.003191      0.013698
500     0.011325     0.001549      0.006587        0.005721     0.001598      0.006892

c) α_c = α_s = 1.0; α_cs = 0.5

        LS estimators                              Component estimators
T, s    ε[B̂00(0)]   ε[B̂c22(0)]   ε[B̂s22(0)]     ε[B̂00(0)]   ε[B̂c22(0)]   ε[B̂s22(0)]
10      0.154963     0.028613      0.113292        0.175572     0.061418      0.185687
25      0.069226     0.017066      0.065062        0.073620     0.024602      0.082842
50      0.036476     0.011945      0.036187        0.037375     0.012307      0.042867
75      0.025042     0.007259      0.025955        0.025042     0.008206      0.028899
100     0.018516     0.005154      0.019026        0.018829     0.006155      0.021795
150     0.012584     0.003643      0.013155        0.012584     0.004103      0.014610
200     0.009389     0.002804      0.009758        0.009449     0.003078      0.010988
250     0.007514     0.002140      0.007842        0.007565     0.002462      0.008804
500     0.003778     0.001106      0.003961        0.003788     0.001231      0.004416
Fig. 1. The dependencies of the biases on the realization length: P2 = 1.5P1; P1 = 10 s, α_c = α_s = 1.0 (black line: component estimators, red line: LS estimators).
4.2 Variances of the Covariance Component Estimators

To simplify the calculation of the variances for (35), introduce the functions
$$\hat C^{(k)}_0(u) = \frac{D_{1k}}{|\mathbf{D}|}\left[\frac1T\int_0^T\overset{\circ}{\xi}(t)\overset{\circ}{\xi}(t+u)\,dt\right], \tag{49}$$
$$\hat C^{(k)}_2(u) = \frac{D_{2k}+iD_{3k}}{|\mathbf{D}|}\left[\frac1T\int_0^T\overset{\circ}{\xi}(t)\overset{\circ}{\xi}(t+u)e^{i\lambda_{22}t}\,dt\right],\qquad \hat C^{(k)}_{-2} = \overline{\hat C^{(k)}_2}. \tag{50}$$
Then estimators (35) have the form
$$\hat B_k(u) = \sum_{r=0,\pm2}\hat C^{(k)}_r(u),$$
and hence
$$\operatorname{Var}[\hat B_k(u)] = \operatorname{Var}[\hat C^{(k)}_0(u)] + 2\operatorname{Var}[\hat C^{(k)}_2(u)] + 2\operatorname{Re}\left\{E\left[\hat C^{(k)}_2(u)\right]^2 - \left[E\hat C^{(k)}_2(u)\right]^2\right\} + 4\operatorname{Re}\left\{E\hat C^{(k)}_0(u)\hat C^{(k)}_2(u) - E\hat C^{(k)}_0(u)\,E\hat C^{(k)}_2(u)\right\}. \tag{51}$$
Taking into consideration relations (49) and (50), we shall calculate each of the summands of the last expression. For the first of them we have
$$\operatorname{Var}[\hat C^{(k)}_0(u)] = \frac{|D_{1k}|^2}{T^2|\mathbf{D}|^2}\int_0^T\!\!\int_0^T b_\eta(t,s-t,u)\,dt\,ds,$$
where $b_\eta(t,s-t,u)$ is the covariance function of the random process $\eta(t,u) = \overset{\circ}{\xi}(t)\overset{\circ}{\xi}(t+u)$:
$$b_\eta(t,s-t,u) = E\eta(t,u)\eta(s,u) - b(t,u)b(s,u).$$
For a Gaussian process
$$b_\eta(t,s-t,u) = b(t,s-t)\,b(t+u,s-t) + b(t,s-t+u)\,b(t+u,s-t-u). \tag{52}$$
Let us introduce a new variable $u_1 = s-t$ and change the order of integration. Taking into account the equality $b_\eta(t,-\tau,u) = b_\eta(t-\tau,\tau,u)$, we have
$$\operatorname{Var}[\hat C^{(k)}_0(u)] = \frac{2|D_{1k}|^2}{T^2|\mathbf{D}|^2}\int_0^T\!\!\int_0^{T-u_1}b_\eta(t,u_1,u)\,dt\,du_1.$$
Proceeding from (31) and (52), we get
$$b_\eta(t,u_1,u) = \sum_{r=0,\pm2,\pm4}\tilde B_r(u_1,u)\,e^{i\lambda_{rr}t}, \tag{53}$$
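where
$$\tilde B_0(u_1,u) = B_{00}^2(u_1) + B_{00}(u_1+u)B_{00}(u_1-u) + 2|B_{22}(u_1)|^2\cos\lambda_{22}u + B_{-2,-2}(u_1+u)B_{22}(u_1-u)e^{-i\lambda_{22}u} + B_{22}(u_1+u)B_{-2,-2}(u_1-u)e^{-i\lambda_{22}u}, \tag{54}$$
$$\tilde B_2(u_1,u) = B_{00}(u_1)B_{22}(u_1)\left[1+e^{i\lambda_{22}u}\right] + B_{00}(u_1+u)B_{22}(u_1-u)e^{i\lambda_{22}u} + B_{22}(u_1+u)B_{00}(u_1-u)e^{-i\lambda_{22}u}, \tag{55}$$
$$\tilde B_4(u_1,u) = B_{22}^2(u_1)e^{i\lambda_{22}u} + B_{22}(u_1+u)B_{22}(u_1-u)e^{i\lambda_{22}u}, \tag{56}$$
and $B_{22}(u) = \frac12\left[B^c_{22}(u) - iB^s_{22}(u)\right]$, $\tilde B_4(u_1,u) = \frac12\left[\tilde B^c_4(u_1,u) - i\tilde B^s_4(u_1,u)\right]$. Using (53), in the first approximation we have
$$\operatorname{Var}[\hat C^{(k)}_0(u)] = \frac{2|D_{1k}|^2}{T|\mathbf{D}|^2}\int_0^T\left(1-\frac{u_1}{T}\right)\tilde B_0(u_1,u)\,du_1. \tag{57}$$
After the analogous transformation of the remaining components of expression (51), we come to the formulae
$$\operatorname{Var}[\hat C^{(k)}_2(u)] = \frac{D_{2k}^2+D_{3k}^2}{2|\mathbf{D}|^2T}\int_0^T\left(1-\frac{u_1}{T}\right)\tilde B_0(u_1,u)\cos\lambda_{22}u_1\,du_1, \tag{58}$$
$$2\operatorname{Re}\left\{E\left[\hat C^{(k)}_2(u)\right]^2 - \left[E\hat C^{(k)}_2(u)\right]^2\right\} = \frac{1}{2|\mathbf{D}|^2T}\left[\left(D_{2k}^2-D_{3k}^2\right)\int_0^T\left(1-\frac{u_1}{T}\right)\left[\tilde B^c_4(u_1,u)\cos\lambda_{22}u_1 + \tilde B^s_4(u_1,u)\sin\lambda_{22}u_1\right]du_1 - 2D_{2k}D_{3k}\int_0^T\left(1-\frac{u_1}{T}\right)\left[\tilde B^c_4(u_1,u)\sin\lambda_{22}u_1 + \tilde B^s_4(u_1,u)\cos\lambda_{22}u_1\right]du_1\right], \tag{59}$$
$$4\operatorname{Re}\left\{E\hat C^{(k)}_0(u)\hat C^{(k)}_2(u) - E\hat C^{(k)}_0(u)\,E\hat C^{(k)}_2(u)\right\} = \frac{D_{1k}}{|\mathbf{D}|^2T}\left[A_{2k}\int_{-T}^{T}\left(1-\frac{|u_1|}{T}\right)\tilde B^c_2(u_1,u)\,du_1 + A_{3k}\int_{-T}^{T}\left(1-\frac{|u_1|}{T}\right)\tilde B^s_2(u_1,u)\,du_1\right]. \tag{60}$$
Summing expressions (57)–(60), we obtain the formula for variances (51).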
The asymptotic expressions for variance (51) for the different $k$ as $T\to\infty$ have the forms
$$\operatorname{Var}[\hat B_1(u)] = \operatorname{Var}[\hat B_{00}(u)] = \frac2T\int_0^T\left(1-\frac{u_1}{T}\right)\tilde B_0(u_1,u)\,du_1, \tag{61}$$
$$\operatorname{Var}[\hat B_2(u)] = \operatorname{Var}[\hat B^c_{22}(u)] = \frac2T\int_0^T\left(1-\frac{u_1}{T}\right)\left[2\tilde B_0(u_1,u)\cos\lambda_{22}u_1 + \tilde B^c_4(u_1,u)\cos\lambda_{22}u_1 - \tilde B^s_4(u_1,u)\sin\lambda_{22}u_1\right]du_1, \tag{62}$$
$$\operatorname{Var}[\hat B_3(u)] = \operatorname{Var}[\hat B^s_{22}(u)] = \frac2T\int_0^T\left(1-\frac{u_1}{T}\right)\left[2\tilde B_0(u_1,u)\cos\lambda_{22}u_1 - \tilde B^c_4(u_1,u)\cos\lambda_{22}u_1 + \tilde B^s_4(u_1,u)\sin\lambda_{22}u_1\right]du_1. \tag{63}$$
The obtained formulae are the same as the expressions for the variances of the component estimators
$$\hat B_{00}(u) = \frac1T\int_0^T\overset{\circ}{\xi}(t)\overset{\circ}{\xi}(t+u)\,dt,\qquad \begin{Bmatrix}\hat B^c_{22}(u)\\ \hat B^s_{22}(u)\end{Bmatrix} = \frac1T\int_0^T\overset{\circ}{\xi}(t)\overset{\circ}{\xi}(t+u)\begin{Bmatrix}\cos\lambda_{22}t\\ \sin\lambda_{22}t\end{Bmatrix}dt. \tag{64}$$
To calculate numerical values of variances (61)–(63), we use approximation (47) for the covariance functions of the modulating processes. For an arbitrary value of the lag $u$ the formulae for the variances are cumbersome; that is why we shall consider only the case $u=0$. Following from relations (54)–(56), we have
$$\tilde B_0(u_1,0) = 2B_{00}^2(u_1) + 4|B_{22}(u_1)|^2,\qquad \tilde B^c_2(u_1,0) = 4B_{00}(u_1)B^c_{22}(u_1),\qquad \tilde B^s_2(u_1,0) = 4B_{00}(u_1)B^s_{22}(u_1),$$
$$\tilde B^c_4(u_1,0) = \left[B^c_{22}(u_1)\right]^2 - \left[B^s_{22}(u_1)\right]^2,\qquad \tilde B^s_4(u_1,0) = 2B^c_{22}(u_1)B^s_{22}(u_1).$$
Taking into account (32)–(34), (47), and (48), for variances (61)–(63) we get
$$\operatorname{Var}[\hat B_{00}(0)] = \frac12\Big\{A_1^2[2r_0(2\alpha_1,T)+r_2(2\alpha_1,T)] + A_2^2[2r_0(2\alpha_2,T)+r_2(2\alpha_2,T)] + 2A_1A_2r_2(\alpha_1+\alpha_2,T) + 4A_3^2r_0(2\alpha_3,T)\Big\}, \tag{65}$$
$$\operatorname{Var}[\hat B^c_{22}(0)] = \frac12\Big\{A_1^2[2r_0(2\alpha_1,T)+4r_2(2\alpha_1,T)+r_4(2\alpha_1,T)] + A_2^2[2r_0(2\alpha_2,T)+4r_2(2\alpha_2,T)+r_4(2\alpha_2,T)] + 4A_3^2[2r_2(2\alpha_3,T)-r_0(2\alpha_3,T)] + 2A_1A_2r_4(\alpha_1+\alpha_2,T)\Big\}, \tag{66}$$
$$\operatorname{Var}[\hat B^s_{22}(0)] = \frac12\Big\{A_1^2[4r_2(2\alpha_1,T)+r_4(2\alpha_1,T)] + A_2^2[4r_2(2\alpha_2,T)+r_4(2\alpha_2,T)] + 2A_1A_2[2r_0(\alpha_1+\alpha_2,T)+r_4(\alpha_1+\alpha_2,T)] + 4A_3^2[r_0(2\alpha_3,T)+2r_2(2\alpha_3,T)]\Big\}. \tag{67}$$
The obtained expressions describe the dependence of the variances of covariance component estimators (64) on the realization length and on the covariance parameters of the modulating processes, namely their variances, cross-variances, and the rate at which the correlations vanish. The calculations were carried out using these expressions; their results are shown below. Proceeding from expressions (57)–(60), for the variances of LS estimators (35) we obtain
$$\begin{aligned}
\operatorname{Var}[\hat B_k(0)] = \frac{1}{2|\mathbf{D}|^2}\Big\{ & D_{1k}^2\Big[A_1^2[2r_0(2\alpha_1,T)+r_2(2\alpha_1,T)] + A_2^2[2r_0(2\alpha_2,T)+r_2(2\alpha_2,T)] + 2A_1A_2r_2(\alpha_1+\alpha_2,T) + 4A_3^2r_0(2\alpha_3,T)\Big]\\
& + \left(D_{2k}^2+D_{3k}^2\right)\Big[\tfrac14\big[A_1^2[r_0(2\alpha_1,T)+4r_2(2\alpha_1,T)+r_4(2\alpha_1,T)] + A_2^2[r_0(2\alpha_2,T)+4r_2(2\alpha_2,T)+r_4(2\alpha_2,T)]\\
& \qquad + 2A_1A_2[r_0(\alpha_1+\alpha_2,T)+r_4(\alpha_1+\alpha_2,T)]\big] + 2A_3^2r_2(2\alpha_3,T)\Big]\\
& + 2D_{1k}\Big[D_{2k}\big[A_1^2[r_0(2\alpha_1,T)+r_2(2\alpha_1,T)] - A_2^2[r_0(2\alpha_2,T)+r_2(2\alpha_2,T)]\big]\\
& \qquad + 2D_{3k}A_3\big[A_1[r_0(\alpha_1+\alpha_3,T)+r_2(\alpha_1+\alpha_3,T)] + A_2[r_0(\alpha_2+\alpha_3,T)+r_2(\alpha_2+\alpha_3,T)]\big]\Big]\\
& + \left(D_{2k}^2-D_{3k}^2\right)\Big[\tfrac14\big[A_1^2r_0(2\alpha_1,T) + A_2^2r_0(2\alpha_2,T) - 2A_1A_2r_0(\alpha_1+\alpha_2,T)\big] - A_3^2r_0(2\alpha_3,T)\Big]\\
& - A_3D_{2k}D_{3k}\big[A_1r_0(\alpha_1+\alpha_3,T) - A_2r_0(\alpha_2+\alpha_3,T)\big]\Big\},\qquad k=1,2,3.
\end{aligned} \tag{68}$$
Quantity (68) tends to (65) for $k=1$, and to (66) and (67) for $k=2,3$, respectively, as $T\to\infty$. Calculations for specific signal parameters and realization lengths make it possible to estimate the difference between quantities (65)–(67) and (68). The results of the calculations are illustrated below by the graphs and the tables (Figs. 2 and 3 and Tables 4 and 5) obtained for the chosen covariance parameters.
Table 4. The dependencies of the variances of the covariance components estimators on the realization length: P2 = 1.5P1; P1 = 10 s; α1 = α2 = 0.6; α3 = 0.3

        LS estimators                                    Component estimators
T, s    Var[B̂00(0)]   Var[B̂c22(0)]   Var[B̂s22(0)]     Var[B̂00(0)]   Var[B̂c22(0)]   Var[B̂s22(0)]
10      0.234751       0.212913        0.184284          0.206223       0.220992        0.232242
15      0.175696       0.139942        0.156238          0.138810       0.139942        0.156238
25      0.102161       0.083206        0.084402          0.086026       0.084765        0.093104
50      0.053220       0.041802        0.045052          0.043765       0.041367        0.047385
75      0.036211       0.027696        0.031370          0.029342       0.027696        0.031370
100     0.026868       0.020719        0.022881          0.022087       0.020826        0.023463
150     0.018173       0.013830        0.015693          0.014770       0.013830        0.015693
200     0.013582       0.010380        0.011643          0.011097       0.010353        0.011791
250     0.010864       0.008286        0.009311          0.008886       0.008304        0.009406
As follows from the graphs, the variances of the component and the least square estimators differ insignificantly even for short realization lengths. Thus, the variance of the least square estimator of the zeroth covariance component (Table 5) for T = 20P1 is only 1.21 times greater than the variance of the component estimator, and the variance of the least square estimator of the second cosine covariance component for T = 5P1 is only 1.01 times greater than the variance of the component estimator. The difference between the variances increases as the damping rate of the signal correlations becomes greater (Fig. 3 and Table 5). For example, the variance of the least square estimator of the zeroth covariance component for T = 10P1 and α1 = α2 = 1.0; α3 = 0.5 is 1.57 times greater than the variance of the component estimator. The difference between the variances decays slowly as the realization length increases, and it can be neglected when T > 30P1.
Fig. 2. Dependencies of the estimator variances for the covariance components on the realization length: P2 = 1.5P1; P1 = 10 s; α1 = α2 = 0.6; α3 = 0.3 (black line: component estimators, red line: LS estimators).
Table 5. Values of the variances of the covariance component estimators for the different realization lengths: P1 = 10 s; P2 = 1.5P1; α1 = α2 = 1.0; α3 = 0.5

        LS estimators                                    Component estimators
T, s    Var[B̂00(0)]   Var[B̂c22(0)]   Var[B̂s22(0)]     Var[B̂00(0)]   Var[B̂c22(0)]   Var[B̂s22(0)]
10      0.203072       0.476206        0.189502          0.143953       0.440942        0.224116
15      0.146896       0.212809        0.149424          0.095315       0.218720        0.149424
25      0.085178       0.139024        0.082836          0.058328       0.139024        0.088947
50      0.043801       0.083078        0.043409          0.029404       0.084198        0.045033
75      0.029640       0.041546        0.029818          0.019655       0.041235        0.029818
100     0.022006       0.027588        0.021901          0.014773       0.027588        0.022305
150     0.014836       0.020663        0.014905          0.009865       0.020739        0.014905
200     0.011089       0.013781        0.011091          0.007406       0.013781        0.011194
250     0.008867       0.010339        0.008867          0.005928       0.010320        0.008932
Fig. 3. Dependencies of the estimator variances for the covariance components on the realization length: P2 = 1.5P1; P1 = 10 s; α1 = α2 = 1.0; α3 = 0.5 (black line: component estimators, red line: LS estimators).
5 Conclusions

BPCRPs are an adequate model for signals that have the properties of birhythmic stochastic recurrence. These properties are, for example, clearly manifested in the vibration of a rolling bearing when localized faults are initiated. For early fault diagnostics, the vibration processing results must have appropriate accuracy. The required errors of the covariance analysis can be ensured by the choice of appropriate values of the processing parameters. The analysis of the properties of the estimators of the BPCRP mean and covariance functions and of their Fourier coefficients is the theoretical basis for such a choice. These estimators can be calculated by the component and the least square methods. The vanishing of the covariance function as the lag increases is a sufficient condition for the mean square consistency of the mean function estimators and also for the asymptotic unbiasedness of the covariance function estimator. This condition is also sufficient for the mean square consistency of the covariance function estimator for a Gaussian BPCRP. The advantage of the least square estimators over the component estimators is the absence of leakage effects, which can cause significant systematic errors. The bias of the least square estimator of the covariance function is caused only by the previous estimation of the mean function. The structure of the least square estimators differs from that of the component estimators by the presence of products of harmonic functions with close frequencies. These correction quantities change the values of the biases and the variances. However, as follows from the results of the calculations carried out for the BPCRP quadrature model, both the systematic and the root-mean-square errors of the component and the least square estimators rapidly converge to each other as the realization length increases.
Appendix A

Proof of Proposition 2.1

For the mathematical expectation of (4), we have
$$E\hat m(t) = \sum_{l,n=-N_1}^{N_1}e^{i\lambda_{ln}t}\left[\frac1T\int_0^Tm(s)e^{-i\lambda_{ln}s}\,ds\right].$$
Taking into account representation (8) and integrating, we obtain
$$\frac1T\int_0^Tm(s)e^{-i\lambda_{ln}s}\,ds = m_{ln} + \sum_{\substack{k,r=-N_1\\ k\neq l,\ r\neq n}}^{N_1}m_{kr}\,\varphi(\lambda_{k-l,r-n}T).$$
Hence, we have (9) and (10). The variance of estimator (4) is equal to
$$\operatorname{Var}[\hat m(t)] = \sum_{l,n=-N_1}^{N_1}\ \sum_{k,r=-N_1}^{N_1}e^{i\lambda_{l-k,n-r}t}\left[\frac{1}{T^2}\int_0^T\!\!\int_0^Tb(s_1,s_2-s_1)\,e^{i(\lambda_{ln}s_1-\lambda_{kr}s_2)}\,ds_1\,ds_2\right].$$
Introduce a new variable $u = s_2-s_1$, change the order of integration, and take into consideration the equality $b(s,-u) = b(s-u,u)$. Then,
$$\frac{1}{T^2}\int_0^T\!\!\int_0^Tb(s_1,s_2-s_1)\,e^{i(\lambda_{ln}s_1-\lambda_{kr}s_2)}\,ds_1\,ds_2 = \frac{1}{T^2}\int_0^T\!\!\int_{-s}^{T-s}b(s,u)\,e^{i\lambda_{l-k,n-r}s}e^{-i\lambda_{kr}u}\,du\,ds = \frac{1}{T^2}\int_0^T\!\!\int_0^{T-u}b(s,u)\,e^{i\lambda_{l-k,n-r}s}\left[e^{i\lambda_{ln}u}+e^{-i\lambda_{kr}u}\right]ds\,du.$$
After substituting the representation
$$b(s,u) = \sum_{p,q=-N_2}^{N_2}B_{pq}(u)\,e^{i\lambda_{pq}s}$$
into the last expression and integrating over time, in the first approximation we come to (11). It follows from (11) that $\operatorname{Var}[\hat m(t)]\to0$ as $T\to\infty$, i.e., estimator (4) is mean square consistent.
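Appendix B

Proof of Proposition 2.2

Rewrite statistics (7) in the form
$$\hat B_{ln}(u) = \frac1T\int_0^T\left[\overset{\circ}{\xi}(s)\overset{\circ}{\xi}(s+u) - \overset{\circ}{\hat m}(s)\overset{\circ}{\xi}(s+u) - \overset{\circ}{\xi}(s)\overset{\circ}{\hat m}(s+u) + \overset{\circ}{\hat m}(s)\overset{\circ}{\hat m}(s+u)\right]e^{-i\lambda_{ln}s}\,ds,$$
where $\overset{\circ}{\hat m}(s) = \hat m(s)-m(s)$. The mathematical expectation of every component is equal to
$$E\left\{\frac1T\int_0^T\overset{\circ}{\xi}(s)\overset{\circ}{\xi}(s+u)e^{-i\lambda_{ln}s}\,ds\right\} = B_{ln}(u) + \varepsilon[B_{ln}(u)],$$
$$E\left\{\frac1T\int_0^T\overset{\circ}{\hat m}(s)\overset{\circ}{\xi}(s+u)e^{-i\lambda_{ln}s}\,ds\right\} = \frac{1}{T^2}\int_0^T\!\!\int_0^Tb(s_1,s_2-s_1+u)\,q_{ln}(N_1,s_1,s_2)\,ds_1\,ds_2, \tag{B.1}$$
$$E\left\{\frac1T\int_0^T\overset{\circ}{\hat m}(s+u)\overset{\circ}{\xi}(s)e^{-i\lambda_{ln}s}\,ds\right\} = \frac{1}{T^2}\int_0^T\!\!\int_0^Tb(s_1,s_2-s_1)\,h_{ln}(N_1,s_1,s_2,u)\,ds_1\,ds_2, \tag{B.2}$$
$$E\left\{\frac1T\int_0^T\overset{\circ}{\hat m}(s)\overset{\circ}{\hat m}(s+u)e^{-i\lambda_{ln}s}\,ds\right\} = \frac1T\sum_{m,k=-N_1}^{N_1}\ \sum_{r,s=-N_1}^{N_1}\varphi(\lambda_{m+r-l,k+s-n}T)\int_0^T\!\!\int_0^Tb(s_1,s_2-s_1)\,e^{-i(\lambda_{mk}s_1+\lambda_{rs}(s_2+u))}\,ds_1\,ds_2. \tag{B.3}$$
Taking into consideration representation (12), after transformation we obtain in the first approximation formulas (13) and (14). Since the functions $h(N_1,u_1)$ and $\tilde h(N_1,u,u_1)$ are bounded, $\varepsilon[\hat b(t,u)]\to0$ as $T\to\infty$ when conditions (3) are satisfied.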
Appendix C

Proof of Proposition 2.3

For the Gaussian BPCRP
$$G(s_1,s_1+u,s_2,s_2+u) = b(s_1,s_2-s_1)\,b(s_1+u,s_2-s_1) + b(s_1,s_2-s_1+u)\,b(s_1+u,s_2-s_1-u).$$
Introduce a new variable $u_1 = s_2-s_1$. The function $G(s,s+u,s+u_1,s+u+u_1)$ is a biperiodic function of the variable $s$ and it can be represented by the Fourier series
$$G(s,s+u,s+u_1,s+u_1+u) = \sum_{k,r=-2N_2}^{2N_2}\tilde B_{kr}(u_1,u)\,e^{i\lambda_{kr}s}. \tag{C.1}$$
Proceeding from (12), we get
$$\tilde B_{kr}(u_1,u) = \begin{cases}
\displaystyle\sum_{p_1=k-N_2}^{N_2}\ \sum_{q_1=r-N_2}^{N_2}B^{(k,r)}_{p_1q_1}(u_1,u), & k\le0,\ r\le0,\\[2ex]
\displaystyle\sum_{p_1=k-N_2}^{N_2}\ \sum_{q_1=-N_2}^{N_2-r}B^{(k,r)}_{p_1q_1}(u_1,u), & k\le0,\ r>0,\\[2ex]
\displaystyle\sum_{p_1=-N_2}^{N_2-k}\ \sum_{q_1=r-N_2}^{N_2}B^{(k,r)}_{p_1q_1}(u_1,u), & k>0,\ r\le0,\\[2ex]
\displaystyle\sum_{p_1=-N_2}^{N_2-k}\ \sum_{q_1=-N_2}^{N_2-r}B^{(k,r)}_{p_1q_1}(u_1,u), & k>0,\ r>0,
\end{cases} \tag{C.2}$$
where
$$B^{(k,r)}_{pq}(u_1,u) = \left[B_{k+p,r+q}(u_1)B_{pq}(u_1) + B_{k+p,r+q}(u_1+u)B_{pq}(u_1-u)\right]e^{-i\lambda_{pq}u}. \tag{C.3}$$
After reducing the double integral
$$\frac{1}{T^2}\int_0^T\!\!\int_0^TG(s_1,s_1+u,s_2,s_2+u)\,e^{i(\lambda_{pq}s_2-\lambda_{ln}s_1)}\,ds_1\,ds_2$$
to an iterated integral and integrating with respect to $s$, in the first approximation we obtain formula (15). If conditions (3) are satisfied, then $\tilde B_{pq}(u_1,u)\to0$ as $|u_1|\to\infty$. Thus, $\operatorname{Var}[\hat b(t,u)]\to0$ as $T\to\infty$, i.e., estimator (5) is mean square consistent.
Appendix D

Proof of Proposition 3.1. Taking into account the property

$$\sum_{j=1}^{2L_1+1} m_{jr} M_{jk} = \begin{cases} |M|, & r = k,\\ 0, & r \ne k,\end{cases}$$

we conclude that the mathematical expectation of the elements of the matrix $\hat m$ is $E\hat m_j = m_j$, $j \in [0, 2L_1]$, i.e., the estimators of the Fourier coefficients of the mean function are unbiased. Then, estimator (19) is also unbiased. Proceeding from (19), for the variance of the estimator we obtain

$$\operatorname{Var}\hat m(t) = \frac{1}{|M|^2}\sum_{l,n=0}^{2L_1} R_{\tilde m_l \tilde m_n}\, f_l(t)\, f_n(t),$$

where

$$R_{\tilde m_l \tilde m_n} = \frac{1}{\theta^2}\int_0^\theta\!\!\int_0^\theta b(s_1, s_2 - s_1) \begin{Bmatrix} \cos\omega_l s_1 \cos\omega_n s_2 \\ \sin\omega_l s_1 \sin\omega_n s_2 \end{Bmatrix} ds_1\, ds_2. \tag{D.1}$$

After transformation of the double integral (D.1), we obtain expression (20). Since the inner integral is bounded, $R_{\tilde m_l \tilde m_n} \to 0$ as $T \to \infty$, i.e., the least square estimator of the BPCRP mean function is mean square consistent.
Appendix E

Proof of Proposition 3.2. Taking into account the series

$$b(t, u) = B_0(u) + \sum_{k=1}^{L_c}\left[B_k^c(u)\cos\omega_k t + B_k^s(u)\sin\omega_k t\right]$$

and the property

$$\sum_{r=1}^{2L_2+1} d_{rk} D_{rj} = \begin{cases} |D|, & k = j,\\ 0, & k \ne j,\end{cases}$$

we have

$$\frac{[D_{1k}]^T}{|D|}\left[\frac{1}{\theta}\int_0^\theta b(t, u)\, dt\right] = B_0(u),$$

$$\frac{[D_{p+1,k}]^T}{|D|}\left[\frac{1}{\theta}\int_0^\theta b(t, u)\cos\omega_r t\, dt\right] = B_p^c(u),$$

$$\frac{[D_{p+L_2+1,k}]^T}{|D|}\left[\frac{1}{\theta}\int_0^\theta b(t, u)\sin\omega_r t\, dt\right] = B_p^s(u), \quad p \in [1, L_2].$$

Thus, the leakage error is absent. It follows from (B.1) and (B.2) that the remaining components of the biases (24) and (25) of the covariance component estimators are defined by integrals of the form

$$I(T) = \frac{1}{T^2}\int_0^T\!\!\int_0^T b(s_2, s_2 - s_1) \begin{Bmatrix} \cos\omega_p s_1 \cos\omega_q s_2 \\ \sin\omega_p s_1 \sin\omega_q s_2 \end{Bmatrix} ds_1\, ds_2.$$

If conditions (3) are satisfied, these integrals vanish as $T \to \infty$. So, the LSM-estimators of the covariance components are asymptotically unbiased.
The Synchronous Fitting of Cyclo-non-Stationary Signals: Definition and Theoretical Analysis

Dany Abboud1(B), Amadou Assoumane1, and Mohammed Elbadaoui1,2

1 Safran Tech, Rue des Jeunes Bois - Châteaufort, 78772 Magny-les-Hameaux, France
[email protected], [email protected]
2 Univ Lyon, Univ Jean-Monnet of Saint-Etienne, LASPI, EA3059, 42023 Saint-Etienne, France
Abstract. This paper addresses the problem of deterministic/random separation in vibration signals when the machine operates under a nonstationary regime. The solution to this problem is well established in the stationary regime, where the deterministic component is simply periodic. In this regard, the synchronous average provides an optimal way to separate a deterministic synchronous source from other interferences. However, synchronous averaging theoretically requires the machine to operate under a stationary regime (i.e. the related vibration signals are cyclostationary) and is otherwise jeopardized by the presence of amplitude and phase modulations. The local synchronous fitting is a powerful generalization of the synchronous average to the nonstationary regime (i.e. the related vibration signals are cyclo-non-stationary). The idea is to replace the (cyclic) empirical average operation by a (cyclic) local curve fitting using the Savitzky-Golay algorithm. This paper studies the temporal and spectral properties of this filter and demonstrates its potential on real-world helicopter data recorded under varying operating speed.

Keywords: Gears · Bearing · Diagnosis · Nonstationary conditions · Vibration signal · Savitzky–Golay filter · Signal separation

1 Introduction
Vibration-based health monitoring of rotating machines has been widely used in industry for decades. It consists in analyzing the vibration signal in order to extract machine health indicators. Through its rotary movement, a machine generates a vibration signal rich in information on the health state of each of its components, including gears, bearings, shafts, fan blades, etc. These signals are cyclic in nature and can be described within the cyclo-non-stationary (CNS) framework. The CNS framework extends the cyclostationary theory to the case where machine signals are recorded under varying operating regimes (varying load and/or speed).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 178–192, 2022. https://doi.org/10.1007/978-3-030-82110-4_9
They can be classified mainly into two categories: CNS at order 1 (CNS1) and CNS at order 2 or higher (CNS2). Bearing signals are CNS2 in nature. In fact, defective bearings generate a series of random but pseudo-periodic pulses due to the slip in the motion of the rolling elements [1]. In the case of gearboxes, the meshing signal is intrinsically deterministic and, thus, CNS1 in nature. Such a signal can be modeled as a sum of sinusoids modulated by time-varying amplitudes and phases related to the regime. For an accurate diagnosis, the separation of the CNS1 and CNS2 signals is required.

The few separation approaches in the literature that exploit the cyclic behavior of mechanical components are based on the estimation of the CNS1 part, generally described by sinusoidal components. Synchronous averaging (SA) is probably the most widely used technique for extracting sinusoidal components because of its simplicity and ease of implementation [2,3]. Despite its widespread use, it is only valid in the stationary regime (constant speed and load). In a variable regime, the synchronous average fails due to the structural change in the signal statistics across the cycles. In fact, the change in operating regime induces relatively slow amplitude modulations in the signal as well as phase modulations.

Methods adapted to the analysis of nonstationary signals are present in the literature. Daher et al. [4] attempted to generalize the SA using a parametric approach. In detail, the authors decomposed the deterministic components onto a set of periodic functions multiplied by speed-dependent functions capable of capturing the long-term evolution over consecutive cycles. This approach uses a higher-order polynomial to estimate the sinusoidal component throughout the cycles contained in the raw vibration signal. It was shown that this technique suffers from side effects and underestimates the entire dynamics of the sinusoidal component. It is also worth mentioning that the authors in [5] proposed the so-called "Generalized Synchronous Average" (GSA) for the variable speed condition. A particular difference between the GSA and the proposed approach is that the applicability of the former is confined to the case where only one CNS agent varies during the acquisition (e.g. speed or torque) and its profile has to be known. In fact, the GSA decomposes the CNS agent profile into a given number of regimes and performs the average for each regime. A commonly encountered scenario in rotating machines is when the speed and load vary simultaneously. In this case, the GSA is not applicable, whereas the proposed method can deal with this issue.

Recently, the so-called local synchronous fitting (LSF) [6] has been proposed. The LSF uses a lower-order polynomial to estimate the complex envelope across the cycles. In reference [6], the LSF was shown to perform better than the method proposed by Daher et al. [4]. Other parametric methods such as Vold-Kalman filtering [7], H∞ filtering [8], and biquadratic filtering [9] have been proposed to monitor rotating machines under variable speed conditions.
To date, the properties of the LSF have not been studied. The aim of this paper is to fill this gap by studying the time and frequency properties of the LSF filter. The paper is organized as follows. In Sect. 2 we recall the LSF for the estimation of sinusoidal components. In Sect. 3 the time and frequency domain properties of the LSF filter are studied. In Sect. 4 the LSF filter is applied to analyze a helicopter vibration signal. The last section concludes the paper.
2 Polynomial Generalization of the Synchronous Average

2.1 Cyclo-non-Stationary Signals
Cyclo-non-stationary (CNS) signals were first proposed to describe vibration signals recorded under strongly varying regimes. These signals, even after being resampled in the angular domain, present a strong long-term statistical change in their structure that prevents them from being cyclostationary. As will be shown in this paper, the CNS class can encompass other types of signals generated by different applications, not only vibrations recorded under nonstationary regimes. As this paper is concerned with the estimation of the deterministic component, first-order CNS signals are of concern. It is proposed to define a first-order CNS signal, say $x[n]$, with a fundamental period $N$ (i.e. the cycle, $1/N$ being the normalized frequency) and a length $L$ as:

$$(\forall n \in \{1, \dots, L\}) \quad x[n] = d[n] + w[n] = \sum_k d_k[n]\, e^{j2\pi kn/N} + w[n] \tag{1}$$

where $k$ is an integer, $d_k[n] \in \mathbb{C}$ are deterministic smooth functions (whose real and imaginary parts are continuous and differentiable) whose bandwidths, denoted $B_k$, are much smaller than half of the fundamental frequency, i.e. $(\forall k)\ B_k \ll 1/(2N)$, and $w[n]$ is a random noise that can possibly comprise other higher-order CNS components. The fundamental period $N$ can also be referred to as "revolution" or "cycle". As periodicity is generally defined in the angular domain, the index $n$ generally refers to the discrete angular variable, that is $x(\theta_n) = x(n\Delta\theta) = x[n]$, where $\theta$ and $\Delta\theta$ respectively denote the continuous angle variable and its increment (angular sampling period). According to the Weierstrass theorem, the complex envelopes $d_k[n]$ can be approximated by a $P$-order polynomial function, i.e.:

$$(\forall k) \quad d_k[n] \approx \sum_{p=0}^{P} d_{pk}\, n^p \tag{2}$$

where $d_{pk} \in \mathbb{C}$. By inserting Eq. (2) into the expression of $d[n]$, one obtains:

$$(\forall n \in \{1, \dots, L\}) \quad d[n] = \sum_{p=0}^{P} c_p[n]\, n^p \tag{3}$$

where $c_p[n] = \sum_k d_{pk}\, e^{j2\pi kn/N}$ is a periodic function of period $N$. Equation (3) indicates that the deterministic component can be approximated by a sum of periodic functions multiplied by the polynomial basis: it is actually a polynomial with periodic coefficients. Let us first define $\bar n = ((n - 1) \bmod N) + 1$ as the sample location within the period $N$ (i.e. one plus the remainder of the division of $n - 1$ by $N$). Since $c_p[n]$ is periodic with period $N$, we have $c_p[\bar n] = c_p[\bar n + (q - 1)N]$ for every integer $q \in \{1, \dots, Q\}$ ($Q$ is the number of cycles). Thus, Eq. (3) can be equivalently written as follows:

$$(\forall q \in \{1, \dots, Q\})\ (\forall \bar n \in \{1, \dots, N\}) \quad d[\bar n + (q - 1)N] = \sum_{p=0}^{P} c_p[\bar n]\, (\bar n + (q - 1)N)^p \tag{4}$$

Using the binomial theorem, $(\bar n + (q - 1)N)^p = \sum_{i=0}^{p} C_i^p N^i (\bar n - N)^{p-i} q^i$ (where $C_i^p$ is the binomial coefficient), one can deduce from Eq. (4) that the samples associated with the same location $\bar n$ in the period, $s_q[\bar n] = d[\bar n + (q - 1)N]$ for every integer $q \in \{1, \dots, Q\}$, define a polynomial of order $P$ with constant coefficients, i.e.:

$$(\forall q \in \{1, \dots, Q\})\ (\forall \bar n \in \{1, \dots, N\}) \quad s_q[\bar n] = \sum_{p=0}^{P} b_p[\bar n]\, q^p \tag{5}$$

where $b_p[\bar n] = N^p \sum_{j=p}^{P} C_p^j (\bar n - N)^{j-p} c_j[\bar n]$. Note that $b_p[\bar n]$ is parametrized by $\bar n$. Equation (5) means that, for each position within the cycle, the data evolution along the cycles follows a smooth curve defined by a $P$th-order polynomial.
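The statement of Eq. (5) can be checked numerically. The following is a minimal sketch, not the authors' code: the values of N, Q, P, the two harmonics and the random coefficients are arbitrary illustrative assumptions. It builds a noise-free first-order CNS component with polynomial envelopes, stacks the samples sharing the same position in the cycle, and verifies that a degree-P polynomial in the cycle index fits them exactly while a degree P-1 polynomial does not.

```python
import numpy as np

N, Q, P = 16, 40, 2              # samples per cycle, number of cycles, envelope degree
L = Q * N
n = np.arange(1, L + 1)          # angular sample index n = 1..L

rng = np.random.default_rng(0)
d = np.zeros(L, dtype=complex)
for k in (1, 2):                 # two harmonics of the fundamental order (assumed)
    d_pk = rng.normal(size=P + 1) + 1j * rng.normal(size=P + 1)
    # degree-P polynomial envelope in n (n is rescaled by L for conditioning only)
    envelope = sum(c * (n / L) ** p for p, c in enumerate(d_pk))
    d += envelope * np.exp(2j * np.pi * k * n / N)

s = d.reshape(Q, N)              # s[q - 1, nbar - 1] = d[nbar + (q - 1) N]
q = np.arange(1, Q + 1) / Q      # rescaled cycle index

def max_fit_residual(degree):
    """Largest error of a least-squares polynomial fit in q, over all cycle positions."""
    V = np.vander(q, degree + 1, increasing=True).astype(complex)
    b, *_ = np.linalg.lstsq(V, s, rcond=None)   # one coefficient set per position nbar
    return np.max(np.abs(V @ b - s))

residual_full = max_fit_residual(P)       # expected to be at round-off level
residual_low = max_fit_residual(P - 1)    # expected to be clearly non-zero
print(residual_full, residual_low)
```

Rescaling the indices by L and Q does not change the polynomial degree; it is only a numerical convenience chosen for this sketch.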
2.2 Local Synchronous Fitting
As previously pointed out, the estimation of the deterministic component amounts to finding a curve that fits the data points related to the same location in the cycle. While the proposed approach also seeks a polynomial fit, the latter is made locally and for each data point. The adopted fitting method is borrowed from the "Savitzky-Golay filter", a widely known method to smooth or fit data based on the least-squares solution of a local polynomial fit [10]. Precisely, for every $q \in \{1, \dots, Q\}$, let us consider the data set $x_q[\bar n]$ as a function of $q$ parametrized by $\bar n$; we seek the best least-squares polynomial fit, with a fixed order $P$ at the point $\bar n$, from the $(2M + 1)$-sample subset centered at $q$, i.e. $\{x_{q-M}[\bar n], \dots, x_{q+M}[\bar n]\}$. This problem can be stated in a similar way as in the previous subsection, i.e.:

$$(\forall q \in \{1, \dots, Q\})\ (\forall \bar n \in \{1, \dots, N\}) \quad \hat{\mathbf b}^{(q)}[\bar n] = \underset{\mathbf b^{(q)}[\bar n]}{\operatorname{argmin}} \left\| \mathbf J\, \mathbf b^{(q)}[\bar n] - \mathbf x^{(q)}[\bar n] \right\|^2 \tag{6}$$

where $\mathbf x^{(q)}[\bar n] = [x_{q-M}[\bar n], \dots, x_{q+M}[\bar n]]^T$ represents the $q$th subset, $\mathbf b^{(q)}[\bar n] = [b_0^{(q)}[\bar n], \dots, b_P^{(q)}[\bar n]]^T$ are the $P + 1$ polynomial coefficients associated with the $q$th subset, and $\mathbf J$ is the $(2M + 1) \times (P + 1)$ matrix such that, for every $m \in \{1, \dots, 2M + 1\}$ and $p \in \{1, \dots, P + 1\}$, $J_{m,p} = (m - M - 1)^{p-1}$. The $(2M + 1)$-length curve that best fits the $q$th subset writes:

$$s_m^{(q)}[\bar n] = \sum_{p=0}^{P} b_p^{(q)}[\bar n]\, (m - M - 1)^p \tag{7}$$

It is worth noting that the window does not have to be symmetric in general, but it is decided in this paper to adopt a symmetric window centered on the sample itself, considering $M$ samples on the left and $M$ on the right. The Savitzky-Golay method suggests estimating the deterministic component at the $q$th data point by retaining the value of the polynomial at the central point, i.e. at $m = M + 1$, where the abscissa $m - M - 1$ vanishes and only the constant term remains:

$$(\forall q \in \{1, \dots, Q\})\ (\forall \bar n \in \{1, \dots, N\}) \quad \hat d[\bar n + (q - 1)N] = \hat s_{M+1}^{(q)}[\bar n] = \hat b_0^{(q)}[\bar n] \tag{8}$$

It can be shown that the solution of (6) writes:

$$(\forall q \in \{1, \dots, Q\})\ (\forall \bar n \in \{1, \dots, N\}) \quad \hat{\mathbf b}^{(q)}[\bar n] = \mathbf H\, \mathbf x^{(q)}[\bar n] \tag{9}$$

where $\mathbf H = (\mathbf J^T \mathbf J)^{-1} \mathbf J^T$ is a matrix of size $(P + 1) \times (2M + 1)$ whose elements are independent of $\bar n$ and $q$. The first element of the above vector, namely $\hat b_0^{(q)}[\bar n]$, is actually a linear combination of the $2M + 1$ elements of $\mathbf x^{(q)}[\bar n]$ weighted by the first row, $\mathbf h^T = [h_{-M}, \dots, h_M]$, of $\mathbf H$, which is independent of $q$ and $\bar n$:

$$\hat b_0^{(q)}[\bar n] = \mathbf h^T \mathbf x^{(q)}[\bar n] \tag{10}$$
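As an illustration of Eqs. (9) and (10), the sketch below computes the weight vector h from the pseudo-inverse of the design matrix. The helper name `sg_center_weights` is hypothetical, and the abscissae are centered on the window and rescaled (an assumption made here for numerical conditioning, which does not affect the constant term of the fit). The 5-point quadratic case is compared against the classical Savitzky-Golay smoothing weights (-3, 12, 17, 12, -3)/35.

```python
import numpy as np

def sg_center_weights(M, P):
    """Weights h[-M..M] giving the centre value of a degree-P least-squares fit."""
    if 2 * M + 1 <= P:
        raise ValueError("the window must contain more than P samples")
    t = np.arange(-M, M + 1) / max(M, 1)        # centred, rescaled abscissae
    J = np.vander(t, P + 1, increasing=True)    # J[m, p] = t_m ** p
    H = np.linalg.pinv(J)                       # equals (J^T J)^{-1} J^T for full-rank J
    return H[0]                                 # row that returns the constant term b_0

h_quadratic = sg_center_weights(2, 2)
h_cubic = sg_center_weights(2, 3)
print(h_quadratic * 35)   # close to the classical 5-point weights (-3, 12, 17, 12, -3)
```

Because the window is symmetric, the odd-order columns of the design matrix do not contribute to the centre value, which is why the quadratic and cubic weights coincide in this example.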
Considering Eqs. (8) and (10), one can write the estimate of the deterministic component, henceforth called the "local synchronous fit" (LSF), as:

$$\hat d[\bar n + (q - 1)N] = \sum_{m=-M}^{M} x_{q-m}[\bar n]\, h_m = \sum_{m=-M}^{M} x[\bar n + (q - 1)N - mN]\, h_m = \sum_{i=-MN}^{MN} x[\bar n + (q - 1)N - i]\, \tilde h_i \tag{11}$$

The above equation can be explained as follows: the deterministic component estimate at a given location $\bar n$ and cycle $q$ is a weighted sum of the data located at the same cycle position in the neighboring cycles. The number of considered cycles equals the length of the S-G filter, $2M + 1$: one for the sample located at the same position, $M$ for the data located on the left (previous cycles) and $M$ for the data located on the right (following cycles). With that being said, one can reformulate Eq. (11) as follows:

$$\hat d[\bar n + (q - 1)N] = \sum_{i=-MN}^{MN} x[\bar n + (q - 1)N - i]\, \tilde h_i \tag{12}$$
where $\tilde{\mathbf h}^T = [\tilde h_{-MN}, \dots, \tilde h_{MN}]$ is obtained by zero-padding the S-G filter $\mathbf h$ as follows:

$$\tilde h_i = \begin{cases} h_m & \text{if } i = mN,\ -M \le m \le M,\\ 0 & \text{elsewhere.} \end{cases} \tag{13}$$

Now applying the change of variable $n = \bar n + (q - 1)N$, Eq. (12) writes as:

$$\hat d[n] = \sum_{i=-MN}^{MN} x[n - i]\, \tilde h_i \tag{14}$$

The above formulation gives an interesting insight into the LSF method: it is equivalent to an LTI filtering with a $(2MN + 1)$-length filter $\tilde{\mathbf h}$ whose coefficients are those of the associated $(2M + 1)$-length S-G filter $\mathbf h$ with $N - 1$ zeros inserted between consecutive coefficients. In other words, the LSF amounts to an LTI filtering of the original signal $x[n]$ with the $(2MN + 1)$-length filter $\tilde{\mathbf h}$:

$$\hat d[n] = \sum_{i=-MN}^{MN} x[n - i]\, \tilde h_i \tag{15}$$

The fact that the LSF turns into a linear time-invariant convolution with $\tilde{\mathbf h}$ instead of fitting $QN$ polynomials has a major impact on the computation time of the algorithm. Actually, this is a basic feature of the Savitzky-Golay (S-G) algorithm which, when applied to equally spaced data, is equivalent to an LTI convolution. It is important to note that, just like the S-G filter, the LSF filter $\tilde{\mathbf h}$ is the same for $P$ ($P$ even) and $P + 1$, meaning that for a fixed $M$, $\tilde{\mathbf h}$ is the same for $P = 0$ and $P = 1$, or $P = 2$ and $P = 3$.
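The equivalence between Eq. (15) and the explicit local fits can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions, not the authors' code: the helper names `sg_center_weights`, `lsf_kernel` and `lsf` are hypothetical, the test signal is random, and the signal borders are implicitly zero-padded by the convolution, so only the interior cycles are compared.

```python
import numpy as np

def sg_center_weights(M, P):
    """Centre-point weights of a degree-P least-squares fit over 2M+1 samples."""
    t = np.arange(-M, M + 1) / M
    J = np.vander(t, P + 1, increasing=True)
    return np.linalg.pinv(J)[0]

def lsf_kernel(M, P, N):
    """Zero-padded filter of Eq. (13): h_tilde[i] = h[m] when i = m * N, else 0."""
    h_tilde = np.zeros(2 * M * N + 1)
    h_tilde[::N] = sg_center_weights(M, P)
    return h_tilde

def lsf(x, M, P, N):
    """LSF estimate as a single convolution, as in Eq. (15) (zero-padded borders)."""
    return np.convolve(x, lsf_kernel(M, P, N), mode="same")

rng = np.random.default_rng(1)
N, Q, M, P = 8, 30, 4, 3                  # illustrative values
x = rng.normal(size=Q * N)
d_hat = lsf(x, M, P, N).reshape(Q, N)

# Reference: explicit local polynomial fit across cycles, one fit per data point
X = x.reshape(Q, N)                       # X[q, nbar]: cycle q, position nbar (0-based)
J = np.vander(np.arange(-M, M + 1) / M, P + 1, increasing=True)
direct = np.empty((Q - 2 * M, N))
for row, q in enumerate(range(M, Q - M)):
    b, *_ = np.linalg.lstsq(J, X[q - M:q + M + 1], rcond=None)
    direct[row] = b[0]                    # fitted value at the window centre

max_diff = np.max(np.abs(d_hat[M:Q - M] - direct))
print(max_diff)
```

The loop performs one fit per cycle and position, whereas the convolution computes the same values in a single pass, which is the computational benefit mentioned above.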
3 Study of the LSF Filter

The previous section showed an intimate relationship between the S-G and LSF filters. The aim of this section is to study the time (or angle) and frequency (or order) domain properties of this filter.

3.1 Impulse Response
This section is concerned with the study of the impulse response (IR) of the LSF filter. A numerical simulation is displayed in Fig. 1, showing the IRs of the LSF filter for three polynomial orders ((a) P = 1, (b) P = 3 and (c) P = 5) and three window lengths (M = 20 (blue plot), M = 50 (red plot) and M = 100 (green plot)). The observation of these plots leads to the following properties:
– For P = 1, the associated S-G filter is actually a moving average filter, and the LSF in this case can be called the "synchronous moving average".
– The non-zero coefficients are only located at integer multiples of the fundamental period (since the plots are represented w.r.t. the fundamental period, it is equal to 1).
– The maximum value is located at zero, meaning that the biggest weight is always associated with the center of the moving window. This is also a feature of a low-pass filter. For lower values of P, the first lobe tends to be flatter. The flattest one is that associated with P = 1, where the non-zero coefficients are all equal.
– For a fixed window width given by M, the maximum value of the coefficients and the variability of the coefficient envelope both increase with P. The reason is that a higher-order local fit concentrates the weights closer to the window center and introduces oscillating side weights, so that relatively higher-frequency modulations are retained. This pattern is consistent across the other values of M and P.
– For a fixed P, the shape of the envelope of the coefficients stays the same (to see this, compare the three plots of a given panel), yet further samples are naturally considered as the window width gets larger with M. Of course, the coefficient weights and locations are adapted so that the coefficients sum to 1.

Now having evaluated the IR shape, the next subsection evaluates its spectral properties.
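These impulse response properties can be checked with a short numerical sketch. The helper names are hypothetical, N = 4 is an arbitrary illustrative choice, and the weights are those of the unit-DC-gain least-squares fit of Sect. 2.2; the figures of the paper are not reproduced here.

```python
import numpy as np

def sg_center_weights(M, P):
    """Centre-point weights of a degree-P least-squares fit over 2M+1 samples."""
    t = np.arange(-M, M + 1) / M
    J = np.vander(t, P + 1, increasing=True)
    return np.linalg.pinv(J)[0]

def lsf_kernel(M, P, N):
    """S-G weights with N-1 zeros inserted between consecutive coefficients."""
    h_tilde = np.zeros(2 * M * N + 1)
    h_tilde[::N] = sg_center_weights(M, P)
    return h_tilde

M, N = 20, 4
kernels = {P: lsf_kernel(M, P, N) for P in (1, 3, 5)}
centre = M * N                                 # array index of lag i = 0

# Non-zero taps sit only at multiples of the period N
support_on_multiples = all(
    np.all(np.flatnonzero(np.abs(k) > 1e-12) % N == 0) for k in kernels.values())
# P = 1 is a synchronous moving average: all non-zero taps are equal
flat_for_P1 = np.allclose(kernels[1][::N], 1.0 / (2 * M + 1))
# Unit gain at DC, and largest weight at the window centre
unit_sum = all(np.isclose(k.sum(), 1.0) for k in kernels.values())
peak_at_centre = all(np.isclose(k.max(), k[centre]) for k in kernels.values())
# Centre weight for P = 1, 3, 5, and the sum of squared weights (white-noise gain)
centre_weights = [float(kernels[P][centre]) for P in (1, 3, 5)]
noise_gains = [float(np.sum(kernels[P] ** 2)) for P in (1, 3, 5)]
print(centre_weights, noise_gains)
```

In this sketch the centre weight equals the sum of the squared weights, because h is one row of a least-squares projection matrix. The centre weight therefore also measures how much white noise passes through the filter.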
3.2 Frequency Response Function
Reference [11] has studied the frequency response function (FRF) of the S-G filter which, as expected, turns out to be a low-pass filter whose cutoff frequency depends on both the polynomial order P and the window length through M. Equation (12) shows a strong relationship between the S-G filter $\mathbf h$ and the LSF filter $\tilde{\mathbf h}$. The spectral interpretation of this relationship is studied in this subsection, as well as the spectral properties of the LSF filter. Let us define $H(\alpha)$ as the FRF of the S-G filter $\{h_i\}_{i=-M}^{M}$, that is

$$H(\alpha) = \sum_{i \in \mathbb{Z}} h_i\, e^{j2\pi i\alpha/N}, \qquad -N/2 < \alpha \le N/2 \tag{16}$$

where $\alpha$ is a frequency variable expressed with respect to the fundamental order (i.e. associated with the fundamental cycle or period $N$) with unit [evt/rev], i.e. one event per revolution/cycle/period, and $i$ is the normalized angular index, which represents $i/N$ revolutions of the fundamental period. The FRF of the LSF filter writes as:

$$\tilde H(\alpha) = \sum_{k \in \mathbb{Z}} H\big(N(\alpha - k/N)\big) = \sum_{k \in \mathbb{Z}} H(N\alpha - k), \qquad -N/2 < \alpha \le N/2 \tag{17}$$
Fig. 1. The IRs of the LSF filter for three polynomial orders: (a) P = 1, (b) P = 3 and (c) P = 5. For each polynomial order, the IR is shown for three window lengths defined by M = 20 (blue plot), M = 50 (red plot) and M = 100 (green plot).
The above equation gives an interesting insight into the interpretation of the LSF filter: it is actually a comb filter located at the fundamental order and all its harmonics, i.e. the set of central frequencies is defined as Ω = {α ∈ Z s.t. −N/2 < α ≤ N/2} (if N is odd; otherwise one sample on the left bound must be added, following the convention adopted in digital signal processing). Similarly to the synchronous average, the comb filter is made of N identical elementary low-pass filters H(Nα) which, when shifted, turn into band-pass filters. This is due to the fact that the mean operation (which is the empirical average in the synchronous average case and the fitting in the LSF case) is made synchronously (across cycles). Interestingly, the elementary filter is nothing but the S-G filter FRF, defined for each couple (P, M), compressed by a factor N (which is the number of samples per fundamental period). The reason for this compression in the spectral domain is the zero-padding in Eq. (13), which spaces the samples by N (i.e. the IR of the S-G filter is dilated). A numerical simulation is displayed in Fig. 2, showing the FRFs of the LSF filter for three polynomial orders ((a) P = 1, (b) P = 3 and (c) P = 5) and three window lengths (M = 20 (blue plot), M = 50 (red plot) and M = 100 (green plot)). The plots show a comb filter having a unit gain at the central frequencies and whose properties clearly depend on both P and M.

Fig. 2. The FRFs of the LSF filter for three polynomial orders: (a) P = 1, (b) P = 3 and (c) P = 5. For each polynomial order, the FRF is shown for three window lengths defined by M = 20 (blue plot), M = 50 (red plot) and M = 100 (green plot).

In order to study the effect of these parameters on the filter properties, it is more convenient to analyze the elementary filter, which can be observed over half of an order (for instance from 0 to 0.5) because of its symmetry. To do so, the FRFs of the LSF filter are computed for a fixed polynomial order P = 3 and for several window widths defined by M = 20 (blue plot), M = 50 (red plot), M = 100 (green plot) and M = 100 (black plot). The FRFs of the LSF filter are also computed for M = 50: P = 1 (blue plot), P = 3 (red plot), P = 5 (green plot) and P = 7 (black plot). The results are displayed in Fig. 3. The observation of these plots leads to the following properties of the elementary filter:
– Since the IR is real and symmetric, H̃(α) is real and symmetric, and so is H(Nα).
– For P = 1, the associated S-G FRF is actually a sinc function, having the narrowest passband and the highest side lobes in the stopband region for a fixed M.
– The cutoff frequency increases with P and decreases with M. The reason is that by increasing P for a given M, or decreasing M for a given P, larger variations in the estimated component are allowed, which results in higher modulation frequencies.
– The magnitude of the secondary lobes decreases with P and increases with M.

Fig. 3. (a) The FRFs of the LSF filter for P = 3: M = 20 (blue plot), M = 50 (red plot), M = 100 (green plot) and M = 100 (black plot). (b) The FRFs of the LSF filter for M = 50: P = 1 (blue plot), P = 3 (red plot), P = 5 (green plot) and P = 7 (black plot).

Since the filter passband is the most important element in choosing M and P, the 3-dB cutoff frequency is calculated w.r.t. M for five polynomial orders: P = 1, P = 3, P = 5, P = 7 and P = 9. Results are displayed in Fig. 4. The results are generally consistent with what was stated previously. As expected, the cutoff frequency curves of higher polynomial orders are located above those associated with lower polynomial orders. It is remarkable how strongly the cutoff frequency decreases as M increases for lower values of M, and how this change gets slower for higher values until stabilizing asymptotically. Another feature not fully represented in these plots is the fact that almost any desired bandwidth can be obtained by multiple couples (M, P), disregarding the noise rejection properties in the stopband region. In detail, if one of these parameters is fixed, almost any cutoff frequency can be obtained by simply sweeping the other.
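The comb structure of the LSF response can be verified numerically with the sketch below. The helper names and the values M = 20, P = 3, N = 16 are illustrative assumptions. The code evaluates the discrete-time Fourier transform of the zero-padded filter on the order axis and compares it with the response of the underlying S-G filter, which repeats itself around every integer order.

```python
import numpy as np

def sg_center_weights(M, P):
    """Centre-point weights of a degree-P least-squares fit over 2M+1 samples."""
    t = np.arange(-M, M + 1) / M
    J = np.vander(t, P + 1, increasing=True)
    return np.linalg.pinv(J)[0]

def frf(weights, freqs):
    """DTFT of weights w[-K..K] at the given frequencies (cycles per sample)."""
    K = (len(weights) - 1) // 2
    idx = np.arange(-K, K + 1)
    return np.exp(-2j * np.pi * np.outer(freqs, idx)) @ weights

M, P, N = 20, 3, 16
h = sg_center_weights(M, P)
h_tilde = np.zeros(2 * M * N + 1)
h_tilde[::N] = h                             # zero-padded filter of Eq. (13)

alpha = np.linspace(0.0, N / 2, 801)         # order axis [evt/rev]
H_lsf = frf(h_tilde, alpha / N)              # one order = 1/N cycles per angular sample
H_sg = frf(h, alpha)                         # S-G response, periodic with period 1 order
gain_at_harmonics = np.abs(frf(h_tilde, np.arange(0, N // 2 + 1) / N))
print(np.max(np.abs(H_lsf - H_sg)), gain_at_harmonics)
```

Since the weights are symmetric, the response is real up to round-off, and the passband around each harmonic has the same shape as the S-G low-pass response around order zero.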
3.3 Authors' Recommendations on Parameter Setting
While the technique of Daher et al. [4] is parametrized by the polynomial order P only, the LSF method is additionally parametrized by the window width through M. The optimal parameter setting for a given problem depends on (i) the deterministic component itself, through the variability of its complex envelope, and (ii) the nature of the noise. In practice, neither the actual signal nor the noise statistics are known. This makes it hard to establish a systematic method for optimal parameter setting. However, according to the authors' experience, it is proposed to fix the polynomial order to P = 3. The reason is that for P = 1, which corresponds to a synchronous moving average, the data tendency within the local window is not considered, while higher values of P are not useful since the window length defined by M can be shortened if higher polynomial orders are required. Another way to see this is the fact that almost any desired cutoff frequency can be obtained with a couple (M, P = 3). The choice of M can be set empirically by the user by simply looking at the long-term modulations in the signal envelope: the variability of the latter within the local window has to be expressible with a third-order polynomial.
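One possible way to turn this recommendation into a procedure is sketched below. It is an illustration only: the function names, the frequency grid and the target cutoff of 0.02 order are assumptions of this sketch and not values taken from the paper. The polynomial order is fixed to P = 3 and M is swept until the 3-dB cutoff of the elementary filter falls below the desired modulation bandwidth.

```python
import numpy as np

def sg_center_weights(M, P):
    """Centre-point weights of a degree-P least-squares fit over 2M+1 samples."""
    t = np.arange(-M, M + 1) / M
    J = np.vander(t, P + 1, increasing=True)
    return np.linalg.pinv(J)[0]

def cutoff_3db(M, P, grid=4001):
    """First frequency (in orders, evt/rev) where the elementary filter drops below -3 dB."""
    h = sg_center_weights(M, P)
    f = np.linspace(0.0, 0.5, grid)
    m = np.arange(-M, M + 1)
    mag = np.abs(np.cos(2 * np.pi * np.outer(f, m)) @ h)   # symmetric weights: real FRF
    below = np.flatnonzero(mag < 1 / np.sqrt(2))
    return f[below[0]] if below.size else np.nan

def smallest_M(target, P=3, M_max=500):
    """Smallest half-window M (in cycles) whose 3-dB cutoff does not exceed `target`."""
    for M in range(P, M_max + 1):
        if cutoff_3db(M, P) <= target:
            return M
    raise ValueError("no window length up to M_max reaches the target cutoff")

cut_vs_M = [cutoff_3db(M, 3) for M in (20, 50, 100)]       # fixed P, growing window
cut_vs_P = [cutoff_3db(50, P) for P in (1, 3, 5)]          # fixed M, growing order
M_selected = smallest_M(0.02)                              # hypothetical target bandwidth
print(cut_vs_M, cut_vs_P, M_selected)
```

The cutoff is expressed in orders relative to each harmonic of the comb, so the same value applies around every central frequency.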
Fig. 4. (a) The 3-dB cutoff frequency w.r.t. M for various values of the polynomial order: P = 1 (blue plot and ∗ markers), P = 3 (red plot and + markers), P = 5 (orange plot and × markers), P = 7 (purple plot) and P = 9 (green plot). (b) The dB representation of the plots in (a).
4 Application to Real Vibration Signal

In this section, the proposed approach is applied to analyze a helicopter vibration signal. The signal is measured on a two-stage gearbox with an input pinion (L), two intermediate pinions (M) and (N), and an output pinion (O). The signal was acquired with a sampling frequency of fs = 65536 Hz during 4 s. The kinematics of the gearbox are presented in Table 1. The corresponding signal is displayed in Fig. 5 below. The rotating frequency increases from 63% to 100%.

Table 1. Characteristic orders of the different pinions of the gearbox and of the meshings.

Pinion           (L)      (M)      (N)      (O)
Order            1        0.293    0.293    0.1532
Meshing order        29                9.959
The order spectrum of the vibration signal is displayed in Fig. 6. It mainly exhibits the characteristic orders of the four pinions and two meshing orders located at 2 × 9.959 and 29. The presence of these spectral lines confirms that the gear signal is sinusoidal. Here, we focus on the meshing component at order 29, which is estimated by the LSF filter. The filter is applied with a third-order polynomial and window lengths of M = 21, M = 101 and M = 401, respectively. The resulting estimates in the time domain and in the time-frequency domain (spectrogram) are presented in Fig. 7. Whatever the window length M, the LSF filter estimates the component around order 29 and its harmonics. In the time domain, the estimate is smoother for a larger window size: as M decreases, the LSF filter takes into account the modulations and higher frequencies.
Fig. 5. Measured signals on the helicopter gearbox.
Fig. 6. Order spectrum of the raw gearbox vibration signal.
Fig. 7. Estimation of the meshing order 29 provided by the LSF filter with a third-order polynomial and different window lengths (M = 21, 101, 401).
The spectrogram in Fig. 7 (c) shows a wide-band estimate around order 29 and its harmonics when the window length is M = 21. The lateral bands come from the modulation of the meshing components by the rotation of the pinions (L) and (M). In the spectrograms of Fig. 7 (e) and (g), the estimate provided by the LSF filter is narrow-band: it is dominated by the meshing components, and the lateral bands are strongly attenuated. In addition, the noise level is attenuated more strongly as the window length increases. These results on real signals are consistent with the theoretical study of the LSF filter in Sect. 3. Based on these results, further analysis can be carried out, such as pinion health monitoring by analyzing the residual signal after removing the estimated meshing components.
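The principle of the LSF estimate can be illustrated with a minimal sketch. It assumes that the signal has already been resampled to the angle domain with an integer number of samples per cycle, and it fits, for each angular position, a polynomial across neighbouring cycles. The function names, the normal-equation solver and the shifted windows used at the record edges are choices made for this illustration only; they are not taken from the chapter.

```python
def polyfit_at_zero(ts, ys, order):
    """Least-squares polynomial fit of ys versus ts; returns the fitted value at t = 0."""
    n = order + 1
    # Normal equations (A^T A) c = A^T y for the monomial basis 1, t, ..., t^order.
    ata = [[sum(t ** (r + c) for t in ts) for c in range(n)] for r in range(n)]
    aty = [sum(y * t ** r for t, y in zip(ts, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, n):
            factor = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= factor * ata[col][c]
            aty[r] -= factor * aty[col]
    # Back substitution.
    coef = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(ata[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (aty[r] - tail) / ata[r][r]
    return coef[0]  # a polynomial evaluated at t = 0 equals its constant term


def lsf_filter(x, n_per_cycle, window, order=3):
    """Local synchronous fit: smooth each angular position across `window` cycles."""
    n_cycles = len(x) // n_per_cycle
    if window > n_cycles or window <= order:
        raise ValueError("need order < window <= number of complete cycles")
    half = window // 2
    out = [0.0] * (n_cycles * n_per_cycle)
    for k in range(n_per_cycle):
        # Samples located at the same angle in every cycle.
        column = [x[c * n_per_cycle + k] for c in range(n_cycles)]
        for c in range(n_cycles):
            # Centered window, shifted inwards near the edges (assumption of this sketch).
            start = min(max(c - half, 0), n_cycles - window)
            ts = [j - c for j in range(start, start + window)]
            ys = column[start:start + window]
            out[c * n_per_cycle + k] = polyfit_at_zero(ts, ys, order)
    return out


# Toy example: a 4-sample pattern whose amplitude drifts quadratically with the cycle index.
pattern = [0.0, 1.0, 0.5, -1.0]
x = [(1 + 0.1 * c + 0.01 * c * c) * p for c in range(12) for p in pattern]
y = lsf_filter(x, n_per_cycle=4, window=7, order=3)
```

Because the amplitude drift in this toy signal is a polynomial of degree lower than the fit order, the synchronous component passes through unchanged. Components that are not synchronous with the cycle are attenuated, and more strongly so for larger windows.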
5 Conclusions
This paper has studied the properties of the local synchronous fitting filter. It was shown that the impulse response of this filter is that of the classical Savitzky-Golay (SG) filter stretched by the number of points per cycle N and zero-padded at all samples that are not integer multiples of N. The frequency response is that of a comb filter made of N identical elementary low-pass filters which, when shifted, turn into narrow band-pass filters. The elementary filter is nothing but the SG filter with the same parameters, shrunk by a factor N. Further, the effect of the two main parameters, namely the window length M and the polynomial order P, was studied. It was shown that the cutoff frequency decreases strongly as M increases for low values of M, and that this change gets slower for higher values until an asymptote is reached. Another interesting finding is that any desired bandwidth can be obtained by fixing P = 3 and varying only M. This fact simplifies the use of the method in practical applications.
Acknowledgement. Acknowledgement is made for the measurements used in this work provided through data-acoustics.com Database.
References
1. Ho, D., Randall, R.: Optimisation of bearing diagnostic techniques using simulated and actual bearing fault signals. Mech. Syst. Signal Process. 14(5), 763–788 (2000)
2. Braun, S.: The synchronous (time domain) average revisited. Mech. Syst. Signal Process. 25(4), 1087–1102 (2011)
3. McFadden, P., Toozhy, M.: Application of synchronous averaging to vibration monitoring of rolling element bearings. Mech. Syst. Signal Process. 14(6), 891–906 (2000)
4. Daher, Z., Sekko, E., Antoni, J., Capdessus, C., Allam, L.: Estimation of the synchronous average under varying rotating speed condition for vibration monitoring. In: Proceedings of ISMA (2010)
5. Abboud, D., Antoni, J., Sieg-Zieba, S., Eltabach, M.: Deterministic-random separation in nonstationary regime. J. Sound Vib. 362, 305–326 (2016)
6. Abboud, D., Assoumane, A., Marnissi, Y., El Badaoui, M.: Synchronous fitting for deterministic signal extraction in non-stationary regimes: application to helicopter vibrations. In: Surveillance, Vishno and AVE conferences. INSA-Lyon, Université de Lyon, Lyon, France (Jul 2019). https://hal.archives-ouvertes.fr/hal-02188704
7. Pan, M.-C., Wu, C.-X.: Adaptive Vold-Kalman filtering order tracking. Mech. Syst. Signal Process. 21(8), 2957–2969 (2007)
8. Assoumane, A., Sekko, E., Antoni, J., Ravier, P.: Bearing signal enhancement using Taylor-h∞ estimator under variable speed condition. IEEE Trans. Instrum. Meas. 67(11), 2538–2547 (2018)
9. Roussel, J., Assoumane, A., Capdessus, C., Sekko, E.: Estimation of cyclic cumulants of machinery vibration signals in non-stationary operation. In: Timofiejczuk, A., Chaari, F., Zimroz, R., Bartelmus, W., Haddar, M. (eds.) Advances in Condition Monitoring of Machinery in Non-Stationary Operations. CMMNO 2016. Applied Condition Monitoring, vol. 9, pp. 21–31. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-61927-9_3
10. Savitzky, A., Golay, M.J.: Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 36(8), 1627–1639 (1964)
11. Schafer, R.W.: On the frequency-domain properties of Savitzky-Golay filters. In: 2011 Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE), pp. 54–59. IEEE (Jan 2011)
On the Modelling of Phonocardiogram Signals: Laplace Kernel and Cyclostationarity Based Approaches
Abdelouahad Choklati1(B), Anas Had1,2, and Khalid Sabri1
1 STIC Laboratory, Faculty of Sciences, Chouaïb Doukkali University, El Jadida, Morocco {choklati.a,had.a,sabri.k}@ucd.ac.ma
2 LASPI, IUT de Roanne, UJM-Saint-Etienne, Université de Lyon, Lyon, France
Abstract. A phonocardiogram is a recording of heart sounds and murmurs. This acoustic recording helps to reveal important information that the human ear cannot recognize easily. A phonocardiogram signal, in the healthy case, consists of two fundamental sounds s1 and s2 which are derived from the mechanical functioning of the heart. Any change, even a small one, in the heart sounds might indicate heart valve problems, hence the need to correctly analyze and characterize phonocardiogram signals. Recently, the analysis of phonocardiogram signals has become an interesting field of research. Several tools have been studied and presented in the literature. The majority of these studies are based on time-frequency analysis and only partially exploit the periodic character of the phonocardiogram signal due to the heart functioning. The objective of this research is to propose a coherent mathematical model and an analytical framework based on cyclostationarity. This allows the use of cyclostationary tools for the characterization and the analysis of phonocardiogram signals, which are analyzed and discussed in detail on synthetic and experimental datasets. The simulation shows promising results that can help with the early detection of some heart diseases.
Keywords: Phonocardiogram · Cyclostationarity · Cyclic statistics · Heart diseases
1 Introduction
The sound of the beating heart has a functional representation in the time domain. This representation, which expresses the mechanical actions of the heart, yields a signal called the phonocardiogram (PCG). It is a solid source of information whose analysis leads to the detection and the identification of a variety of pathologies and cardiac dysfunctions.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 193–206, 2022. https://doi.org/10.1007/978-3-030-82110-4_10
In the case of good health, the PCG signal consists of two major sounds, namely the first heart sound s1 and the second heart sound s2. Both are derived from the dynamic functioning of the heart: they are due to the closing of the heart valves and the turbulent passage of blood through them [1,5]. In comparison with s2, the first heart sound s1 is recognized by its longer duration and higher amplitude. With a 95% confidence interval, the mean durations of s1 and s2 are 122 ± 32 ms and 92 ± 28 ms, respectively [27]. In normal cases, each of the two heart sounds s1 and s2 is composed of two components that are separated by less than 30 ms during expiration and by 50 to 60 ms during inspiration [14]. The time difference between the components of each heart sound is an essential indicator of heart diseases such as s1 or s2 splitting. Besides s1 and s2, the presence of other heart sounds may indicate problems or abnormalities in the heart valves. Unlike classical methods, all abnormalities can be reflected in PCG signals, which allows cardiologists to suspect heart abnormalities early upon thorough analysis. Several models have been proposed in the literature to reproduce the shape of heart sounds. Among these models we cite the linear transient chirp signal model [31], the non-linear transient chirp signal model [32] and the exponentially damped sinusoidal models [2,26]. However, these models are limited to a single cardiac cycle and do not provide information on the behavior of cardiac sounds over the other cycles, on which a reliable cardiac diagnosis depends. Moreover, the majority of these models are based on time-frequency or time-scale analysis [6,17,29]. In a recent work, a mathematical model based on the Gabor kernel was proposed to describe the heart functioning over multiple cycles [4]. Nevertheless, it was observed that the shape of the heart sound sometimes differs from the Gabor kernel.
In practice, a large number of observed processes result from periodic phenomena. These processes give rise to random data whose statistics vary periodically with time; they are called cyclostationary processes [11,21,24]. Cyclostationary processes are found in various domains such as telecommunications, telemetry, mechanics, astronomy, and econometrics [13,22,28]. In medicine, cyclostationarity was first used for tissue characterization [7,8]. Afterwards, other authors employed cyclostationarity for other purposes, such as the discrimination of ventricular tachycardia from sinus rhythm [9], the identification of significant frequencies in surface EMG signals [19], heart sound cancellation from lung sounds [18], and the analysis of respiratory signals [23]. Indeed, heartbeats take the form of a series of repeated mechanical actions. The repetition is not exactly periodic but can be considered quasi-periodic. This means that the recorded vibration waves are, in a sense, cyclostationary or quasi-cyclostationary. Therefore, mathematical models based on cyclostationarity appear well suited to representing the heart sounds and to the analysis and characterization of PCG signals. Hence, the aim of this study is to provide a coherent cyclostationary model with a more realistic kernel, the Laplace kernel, for PCG signals. This model is discussed at length in order to evaluate how well it adapts to the heart functioning. The main motivation behind the new model is to propose a framework for an accurate characterization of PCG signals and thereby an early detection of certain abnormal heart functioning.
This chapter is organized as follows. Section 2 concerns the modelling and the representation of PCG signals. Section 3 is dedicated to the analytic development of the proposed model; some simulation results are also presented in order to confirm the theoretical analysis. In Sect. 4, we give examples of synthetic and real PCG signals. Finally, Sect. 5 summarizes the whole study.
2 Analytical Model of PCG Signals
2.1 Background
As mentioned before, a PCG signal is said to be normal (healthy) when it consists only of the s1 and s2 heart sounds. The need for an adequate framework for the detection and the classification of abnormalities in the heart functioning has pushed researchers to take an interest in the modelling of PCG signals. Hence, several models have been proposed to reproduce the shape of heart sounds, such as the chirp model [32,33], the damped sinusoidal model [2,16] and the modified Prony model [26]. To the best of our knowledge, all related models are limited to a single cardiac cycle and give no information about the behavior of heart sounds over the remaining cycles. Thus, the motivation of this work is to build a mathematical model that can jointly reproduce the shape and the quasi-periodic character of heart sounds.
2.2 Proposed Model
Among the existing PCG models, the ones based on Laplace and Gabor kernels [4,5,15,25,30,32,34] seem to be more realistic. The Laplace kernel, which is actually an exponentially damped sinusoidal wave, offers the possibility, through five adjustable parameters ($a_i$, $\beta_i$, $\mu_i$, $f_i$ and $\phi_i$), to exactly reproduce the shape of any heart beat. The model of Eq. (1) makes use of two Laplace kernels to represent each heart sound s1 and s2. Therefore, one heart beat of a PCG signal can be modeled with four kernels as follows (Fig. (1)-(a)):
$$\sum_{i \in [s_1^{\pm};\, s_2^{\pm}]} a_i \exp\left(-\beta_i |t - \mu_i|\right) \cos\left(2\pi f_i t - \phi_i\right) \tag{1}$$
where
– $f_i$ and $\phi_i$ are respectively the frequency and the phase shift of the sinusoidal terms.
– $a_i$, $\mu_i$ and $\beta_i$ are respectively the amplitude, the location and the scale parameters of the Laplacian terms.
– ± is a superscript indicating the two Laplace kernels which are used for modelling each heart sound, with $[s_1^{\pm}; s_2^{\pm}] = [s_1^{-}, s_1^{+}; s_2^{-}, s_2^{+}]$.
The model of Eq. (1) represents a PCG signal for a single cardiac cycle. Unfortunately, this representation is not enough to fully characterize the heart in a limited time. Hence, the idea of the proposed model is to combine Laplace kernels, which model the shape of heart sounds, with some randomness that reproduces the fluctuations occurring in the heart functioning from one cardiac cycle to the next. This combination leads to a complex model for characterizing the heart functioning over multiple cycles, which is expressed as follows:
$$z(t) = \sum_{i \in [s_1^{\pm},\, s_2^{\pm}],\, n} a_{i,n} \exp\left(-\beta_i |t - \mu_i - nT|\right) \cos\left(2\pi f_i (t - nT) - \phi_{i,n}\right) \tag{2}$$
where
– the index n stands for the cardiac cycle.
– T is the cardiac cycle duration.
The random behavior in z(t) comes simultaneously from the parameters $a_{i,n}$ and $\phi_{i,n}$. This means that the amplitude and the phase of each heart sound may change from one cardiac cycle to the next. Here $a_{i,n}$ is the amplitude of the Laplace kernel for the ith heart sound and the nth cardiac cycle; it follows a Gaussian law $\mathcal{N}(\mu_{a_i}, \sigma_{a_i}^2)$. The phase $\phi_{i,n}$ follows a uniform law on the interval $[\phi_{i,0} - \Delta\phi, \phi_{i,0} + \Delta\phi]$ with $\Delta\phi \in [0, \pi]$, where $\phi_{i,0}$ is the ith initial phase. An example of the proposed model of Eq. (2) is given in Fig. (1)-(b), where the parameters are listed in Table (1). Moreover, T = 1 s, K = 12 cardiac cycles and the sampling frequency fs is set to 1000 Hz.
Table 1. Parameters to generate the realistic PCG signal according to Eq. (2) (1st data set).
Kernel  μai (mV)  σai (mV)  μi (s)   βi     ϕi,0 (rad)  Δϕi (rad)  fi (Hz)
s1+     0.8       0.02      0.0414   75     2.77        0.314      66.66
s1−     0.8       0.15      0.0716   62.5   1.73        0.314      78.85
s2+     0.8       0.10      0.3836   60     3.14        0.314      66.92
s2−     0.9       0.07      0.3883   70     3.14        0.314      71.19
Fig. 1. Example of a healthy PCG signal. (a) Single cardiac cycle. (b) Multiple cardiac cycles.
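A signal of the kind shown in Fig. 1 can be generated directly from Eq. (2). The following is a minimal sketch using the values of Table 1; the function name `synth_pcg`, the tuple layout and the seeded random generator are assumptions of this illustration, and β is taken to be expressed in 1/s.

```python
import math
import random

# Rows of Table 1: (mu_a [mV], sigma_a [mV], mu [s], beta [assumed 1/s], phi0 [rad], dphi [rad], f [Hz])
TABLE_1_KERNELS = [
    (0.8, 0.02, 0.0414, 75.0, 2.77, 0.314, 66.66),  # s1+
    (0.8, 0.15, 0.0716, 62.5, 1.73, 0.314, 78.85),  # s1-
    (0.8, 0.10, 0.3836, 60.0, 3.14, 0.314, 66.92),  # s2+
    (0.9, 0.07, 0.3883, 70.0, 3.14, 0.314, 71.19),  # s2-
]


def synth_pcg(kernels, period=1.0, n_cycles=12, fs=1000.0, seed=0):
    """Sample z(t) of Eq. (2) on [0, n_cycles * period) at rate fs."""
    rng = random.Random(seed)
    n_samples = int(round(n_cycles * period * fs))
    z = [0.0] * n_samples
    for mu_a, sigma_a, mu, beta, phi0, dphi, f in kernels:
        for n in range(n_cycles):
            # One random amplitude and one random phase per kernel and per cycle.
            a = rng.gauss(mu_a, sigma_a)
            phi = rng.uniform(phi0 - dphi, phi0 + dphi)
            for k in range(n_samples):
                t = k / fs - n * period  # time relative to the start of cycle n
                z[k] += a * math.exp(-beta * abs(t - mu)) * math.cos(2 * math.pi * f * t - phi)
    return z
```

Calling `synth_pcg(TABLE_1_KERNELS)` with the defaults reproduces the setting quoted in the text (T = 1 s, K = 12, fs = 1000 Hz); the exact waveform depends on the random draws and will not match Fig. 1 sample by sample.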
3 Cyclic Analysis
It is well known that coupling randomness, produced for example by fluctuations of the amplitude and/or the phase, with a deterministic periodic phenomenon gives rise to a cyclostationary process. To verify the cyclostationarity hypothesis for the proposed model of Eq. (2), a theoretical study is conducted. Let us first recall the basics of cyclostationary analysis.
3.1 Definitions
A stochastic signal $x(t)$ with expectation $E\{x(t)\}$ and Instantaneous Autocorrelation Function (IAF) $R_x(t,\tau) = E\{x(t-\tau/2)\,x^*(t+\tau/2)\}$, where the superscript $*$ denotes complex conjugation, is said to be wide-sense cyclostationary with period $T_0$ if both $E\{x(t)\}$ and $R_x(t,\tau)$ are periodic in t with period $T_0$ [10], i.e. $E\{x(t)\} = E\{x(t+T_0)\}$ for all t and $R_x(t,\tau) = R_x(t+T_0,\tau)$ for all t, τ. The IAF is thus periodic in t and can be expanded in a Fourier series:
$$R_x(t,\tau) = \sum_{n=-\infty}^{+\infty} R_x^{n/T_0}(\tau)\, e^{j2\pi \frac{n}{T_0} t}$$
where $R_x^{n/T_0}(\tau)$ is known as the Cyclic Autocorrelation Function (CAF) and is given by:
$$R_x^{n/T_0}(\tau) = \frac{1}{T_0}\int_{-T_0/2}^{T_0/2} R_x(t,\tau)\, e^{-j2\pi \frac{n}{T_0} t}\, dt \tag{3}$$
where $n/T_0$, $n \in \mathbb{Z}$, are the cyclic frequencies. The Fourier transform of the CAF at the cyclic frequency $\alpha = n/T_0$, taken with respect to the lag τ, gives the Spectral Correlation Density function (SCD):
$$S_x^{\alpha}(f) = \int_{-\infty}^{+\infty} R_x^{\alpha}(\tau)\, e^{-j2\pi f \tau}\, d\tau \tag{4}$$
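In practice the expectation in the CAF is replaced by a time average over a finite record. The sketch below is one simple way to do this for a sampled signal, keeping the symmetric lag convention of the IAF above; the function name and the choice of a plain time average are assumptions of this illustration, not the estimator used by the authors.

```python
import cmath
import math


def cyclic_autocorr(x, alpha, lag, fs):
    """Time-average estimate of the CAF at cyclic frequency alpha (Hz) and lag >= 0 (samples)."""
    n_terms = len(x) - lag
    acc = 0j
    for n in range(n_terms):
        # x[n] plays the role of x(t - tau/2) and x[n + lag] of x(t + tau/2),
        # so the product is referred to the mid-point time t = (n + lag/2) / fs.
        t_mid = (n + lag / 2) / fs
        acc += x[n] * complex(x[n + lag]).conjugate() * cmath.exp(-2j * math.pi * alpha * t_mid)
    return acc / n_terms


# Toy check: an amplitude modulation with period 1 s, observed for 10 s at fs = 100 Hz.
fs = 100.0
x = [1 + 0.5 * math.cos(2 * math.pi * n / fs) for n in range(1000)]
caf_on = cyclic_autocorr(x, alpha=1.0, lag=0, fs=fs)   # alpha equal to 1/T0
caf_off = cyclic_autocorr(x, alpha=0.5, lag=0, fs=fs)  # alpha not a multiple of 1/T0
```

For this deterministic toy signal, the squared modulation contains a component at 1 Hz with Fourier coefficient 0.5, so the estimate at α = 1 Hz is 0.5 while it vanishes at α = 0.5 Hz. For a random signal, a longer record or averaging is needed before the estimate becomes reliable.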
3.2 1st-order and 2nd-order Moments of the Proposed Model
Let us check the wide-sense cyclostationarity of the proposed model using the previous definitions. We first compute the 1st-order moment of $z(t)$ and the IAF. The mean $E\{z(t)\}$ is given by:
$$m_z(t) = \frac{\sin \Delta\phi}{\Delta\phi} \sum_{i \in [s_1^{\pm},\, s_2^{\pm}]} \mu_{a_i} \exp\left(-\beta_i |t - \mu_i|\right) \cos\left(2\pi f_i t - \phi_{i,0}\right) * \sum_{n} \delta(t - nT) \tag{5}$$
$m_z(t)$ is T-periodic, which indicates that $z(t)$ is 1st-order cyclostationary. It should be noted that $m_z(t)$ converges to 0 when $\Delta\phi$ moves toward π, as $\Delta\phi \in [0, \pi]$. The following relationship gives the IAF of the PCG signal after removing the synchronous mean:
$$R_z(t,\tau) = \sum_{i} \frac{\sigma_{a_i}^2}{2} \left[ \cos(2\pi f_i \tau) + \frac{\sin(2\Delta\phi)}{2\Delta\phi} \cos\left(4\pi f_i t - 2\phi_{i,0}\right) \right] \exp\left(-\beta_i |t - \mu_i - \tau/2|\right) \exp\left(-\beta_i |t - \mu_i + \tau/2|\right) * \sum_{n} \delta(t - nT) \tag{6}$$
$R_z(t,\tau)$ and $E\{z(t)\}$ are both periodic with the same period T, as illustrated in Fig. (2). Hence, the signal of the proposed model of Eq. (2) is wide-sense cyclostationary.
Fig. 2. A numerical estimate of the time-varying autocorrelation function Rz (t, τ ), for τ = 0 s, of a synthetic signal.
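The attenuation factor sin Δϕ/Δϕ in Eq. (5) comes from averaging the cosine term over a phase that is uniformly distributed on [ϕi,0 − Δϕ, ϕi,0 + Δϕ]. A short numerical check of this step is sketched below; the function names and the midpoint integration rule are choices of this illustration.

```python
import math


def mean_cos_uniform_phase(theta, phi0, dphi, n_steps=20000):
    """Midpoint-rule average of cos(theta - phi) for phi uniform on [phi0 - dphi, phi0 + dphi]."""
    h = 2 * dphi / n_steps
    total = 0.0
    for k in range(n_steps):
        phi = phi0 - dphi + (k + 0.5) * h
        total += math.cos(theta - phi)
    return total / n_steps


def closed_form(theta, phi0, dphi):
    """Analytical value of the same average: (sin(dphi) / dphi) * cos(theta - phi0)."""
    return math.sin(dphi) / dphi * math.cos(theta - phi0)


# Values of the first row of Table 1 (phi0 = 2.77 rad, dphi = 0.314 rad) and an arbitrary angle.
numeric = mean_cos_uniform_phase(theta=1.0, phi0=2.77, dphi=0.314)
analytic = closed_form(theta=1.0, phi0=2.77, dphi=0.314)
```

The two values agree to numerical precision, and both tend to zero when Δϕ approaches π, which is the behavior of the mean noted in the text.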
3.3 Cyclic Autocorrelation Function $R_z^{\alpha}(\tau)$
According to Gardner [10], the CAF can be defined as the Fourier transform of $R_z(t,\tau)$ with respect to t. The Fourier transform of $R_z(t,\tau)$ of Eq. (6) leads to:
$$R_z^{\alpha}(\tau) = \sum_{i,n} \frac{\sigma_{a_i}^2}{2T}\, e^{-j2\pi\mu_i\alpha}\, \delta(\alpha - nT^{-1}) \left\{ \cos(2\pi f_i \tau)\, H_{R_i}(\alpha,\tau) + \frac{\sin(2\Delta\phi)}{4\Delta\phi} \left[ e^{2j\phi_{i,0}} H_{R_i}(\alpha + 2f_i, \tau) + e^{-2j\phi_{i,0}} H_{R_i}(\alpha - 2f_i, \tau) \right] \right\} \tag{7}$$
where
$$H_{R_i}(\alpha,\tau) = \frac{e^{-\frac{|\tau|}{2}(\beta_i + j2\pi\alpha)}}{\beta_i - j2\pi\alpha} + \frac{e^{-\frac{|\tau|}{2}(\beta_i - j2\pi\alpha)}}{\beta_i + j2\pi\alpha} + e^{-\beta_i |\tau|}\, \frac{\sin(\pi\alpha|\tau|)}{\pi\alpha}$$
and δ(·) denotes the Dirac delta. The first thing to note is that $R_z^{\alpha}(\tau)$ is α-discrete and nonzero only for the harmonics of $T^{-1}$. This point confirms the second-order cyclostationarity of the model of Eq. (2). Also, the term $\sigma_{a_i}^2/(2T)$ increases with $\sigma_{a_i}$, which increases the cyclostationarity. However, $\sin(2\Delta\phi)/(4\Delta\phi)$ decreases when $\Delta\phi$ goes to π, which leads to a decrease of cyclostationarity. Figure (3) reports the estimated $R_z^{\alpha}(\tau)$ of the synthetic signal of Fig. (1). In accordance with the theoretical results, the CAF is nonzero only for the harmonics of $T^{-1}$.
Fig. 3. A numerical estimate of the CAF of a synthetic PCG signal: (a) $R_z^{\alpha}(\tau)$ as a function of α and τ. (b) Up-scaled $R_z^{\alpha}(\tau)$ in the α-plane for τ = 0 s.
3.4 Spectral Correlation Density Function $S_z^{\alpha}(f)$
$S_z^{\alpha}(f)$ represents another important second-order cyclic statistic, allowing the characterization in the (f, α)-plane. As defined by Gardner [12], the SCD function of a wide-sense cyclostationary random process is the Fourier transform of its CAF with respect to τ. The Fourier transform of $R_z^{\alpha}(\tau)$ of Eq. (7) leads to:
$$S_z^{\alpha}(f) = \sum_{i,n} \frac{\sigma_{a_i}^2}{4T}\, e^{-j2\pi\alpha\mu_i} \left\{ H_{S_i}(\alpha, f - f_i) + H_{S_i}(\alpha, f + f_i) + \frac{\sin(2\Delta\phi)}{2\Delta\phi} \left[ H_{S_i}(\alpha + 2f_i, f)\, e^{j2\phi_{i,0}} + H_{S_i}(\alpha - 2f_i, f)\, e^{-j2\phi_{i,0}} \right] \right\} \delta(\alpha - nT^{-1}) \tag{8}$$
where
$$H_{S_i}(\alpha, f) = \int_{-\infty}^{+\infty} H_{R_i}(\alpha,\tau)\, e^{-j2\pi f \tau}\, d\tau \tag{9}$$
It should be noted that for the zero cyclic frequency, i.e. α = 0 Hz, relationship (8) reduces to the power spectral density. As shown by relationship (8), the SCD $S_z^{\alpha}(f)$ is α-discrete and is nonzero only for $\alpha = nT^{-1}$, with resonances around $\pm 2f_i$. Furthermore, $S_z^{\alpha}(f)$ is f-continuous and presents peaks at the frequencies $\pm f_i$, with $i \in [s_1^{\pm}, s_2^{\pm}]$, where the $f_i$ represent the characteristic frequencies of the signal. Moreover, Fig. (4) reports a numerical estimate of $S_z^{\alpha}(f)$ for the synthetic signal of Fig. (1), which confirms the theoretical results mentioned previously.
Fig. 4. A numerical estimate of the spectral correlation density $S_z^{\alpha}(f)$: (a) $S_z^{\alpha}(f)$ as a function of α and f. (b) Up-scaled $S_z^{\alpha}(f)$ along the α axis for f = 71.16 Hz.
In the simulation studies, cyclostationary tools such as the CAF and the SCD are estimated from signals of limited duration. More details about the cyclic statistic estimators are presented in [20].
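As an illustration of estimation from a finite record, the sketch below evaluates a raw cyclic periodogram at a single (f, α) point for a real-valued signal. It is only meant to show the principle; the function names are hypothetical, no smoothing or averaging is applied, and this is not claimed to be the estimator used for the figures of this chapter.

```python
import cmath
import math


def dft_at(x, freq, fs):
    """Discrete Fourier sum of x evaluated at an arbitrary frequency (Hz)."""
    return sum(v * cmath.exp(-2j * math.pi * freq * n / fs) for n, v in enumerate(x))


def cyclic_periodogram(x, f, alpha, fs):
    """Raw (unsmoothed) SCD estimate at spectral frequency f and cyclic frequency alpha."""
    duration = len(x) / fs
    x_plus = dft_at(x, f + alpha / 2, fs) / fs   # approximates X(f + alpha/2)
    x_minus = dft_at(x, f - alpha / 2, fs) / fs  # approximates X(f - alpha/2)
    return x_plus * x_minus.conjugate() / duration


# Toy check: a 50 Hz cosine observed for 1 s at fs = 1000 Hz.
fs = 1000.0
x = [math.cos(2 * math.pi * 50.0 * n / fs) for n in range(1000)]
s_on = cyclic_periodogram(x, f=0.0, alpha=100.0, fs=fs)  # alpha = 2 * 50 Hz
s_off = cyclic_periodogram(x, f=0.0, alpha=90.0, fs=fs)
```

The sinusoid at 50 Hz produces a spectral correlation at α = 100 Hz, i.e. at twice its frequency, which is the same mechanism behind the resonances around ±2fi mentioned for Eq. (8). For random signals a raw value such as this one has a large variance and has to be smoothed or averaged.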
4 Tests on Synthetic and Real Data Sets
4.1 Realistic Synthetic Data Sets
Additional simulations have been carried out in order to confirm the cyclostationary behavior of the mathematical model of Eq. (2) with respect to its parameters. The parameters $\mu_{a_i}$, $\sigma_{a_i}$, $\mu_i$, $\beta_i$, $\phi_{i,0}$, $\Delta\phi_i$ and $f_i$ introduced in Subsect. 2.2 may vary from beat to beat and from person to person, i.e. these parameters represent the functioning of a unique heart: two healthy hearts cannot have exactly the same parameters. The objective of this simulation is to understand the influence of these parameters on the PCG model, in order to evaluate to what extent cyclic statistics provide a coherent signature even for different hearts (different parameters), and thereby a suitable tool for recognizing healthy hearts. The parameters for the simulations are listed in Table (2), where K represents the number of cardiac cycles. Moreover, additive Gaussian noise is added such that the SNR is set to the desired value. The sampling frequency for the three signals is set to 1000 Hz. The second-order statistics reported in Fig. (5, 6 and 7) confirm the cyclic behavior of the three PCG signals even though the cyclic periods are different. It should be noted that the cyclic statistics are not sensitive to the noise, since the noise is assumed to be stationary.
4.2 Experimental Data Sets
The estimation of second-order statistics for real PCG signals indicates how well the proposed model of Eq. (2) matches reality. To do this, we make use of the data sets provided in [3], which were gathered in a clinical trial in hospitals using the DigiScope digital stethoscope.
Table 2. Parameters to generate three sets of realistic PCG signals according to Eq. (2).
Data set  Kernel  μai (mV)  σai (mV)  μi (s)   βi     ϕi,0 (rad)  fi (Hz)  Δϕi (rad)  T (s)  SNR (dB)  K
2nd       s1+     0.45      0.025     0.0446   77.50  2.83        67.86    0.523      0.67   30        12
          s1−     0.93      0.17      0.0748   65.00  3.14        76.59
          s2+     0.70      0.20      0.3676   63.75  3.14        69.12
          s2−     0.51      0.09      0.3915   67.50  0.00        62.83
3rd       s1+     0.33      0.03      0.0366   72.50  2.00        63.21    0.785      0.75   20        12
          s1−     0.80      0.19      0.0700   67.50  3.14        68.05
          s2+     0.53      0.02      0.3804   61.25  0.18        62.27
          s2−     0.52      0.05      0.3756   71.25  3.14        66.98
4th       s1+     0.43      0.04      0.0398   76.25  2.23        65.97    1.047      0.80   10        12
          s1−     0.76      0.15      0.0796   65.00  3.14        71.94
          s2+     0.43      0.03      0.3756   66.25  3.14        67.36
          s2−     0.33      0.08      0.3899   73.75  3.13        66.98
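The SNR values of Table 2 can be imposed on a synthetic signal by scaling a Gaussian noise realization before adding it. The sketch below is one possible way of doing so; it assumes that the SNR is defined as the ratio of the average signal power to the average noise power over the record, a definition that the text does not state explicitly, and the function name is hypothetical.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise scaled to the requested SNR (in dB)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Scale this noise realization so that p_signal / (gain^2 * p_noise) = 10^(snr_db / 10).
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(signal, noise)]


# Example: a 70 Hz tone sampled at 1000 Hz, degraded to an SNR of 10 dB.
clean = [math.sin(2 * math.pi * 70.0 * n / 1000.0) for n in range(2000)]
noisy = add_noise_at_snr(clean, snr_db=10.0)
```

Scaling by the measured power of the realization makes the SNR of the record equal to the target value rather than only equal on average.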
The second-order statistics reported in Fig. (8, 9 and 10) show that the three real PCG signals are indeed wide-sense cyclostationary. This result matches the one obtained for the synthetic PCG signals, which supports the effectiveness of the proposed model of Eq. (2). It should be noted that the last two real PCG signals are very noisy. The perturbations in the CAF and the spectral correlation density of the last two real PCG signals may be explained by the strong fluctuation of the cyclic period, i.e. the cardiac cycle. Furthermore, the spectral correlation density functions are thresholded in order to exclude estimator noise, since the signals have a small number of samples.
Fig. 5. IAFs of each synthetic PCG signal: (a) 2nd data set. (b) 3rd data set. (c) 4th data set.
Fig. 6. A numerical estimate of the CAF $R_z^{\alpha}(\tau)$: (a, b, c) $R_z^{\alpha}(\tau)$ as a function of α and τ. (d, e, f) Up-scaled $R_z^{\alpha}(\tau)$ along the α axis for τ = 0 s.
Fig. 7. A numerical estimate of the spectral correlation density $S_z^{\alpha}(f)$: (a, b, c) $S_z^{\alpha}(f)$ as a function of α and f. (d, e, f) Up-scaled $S_z^{\alpha}(f)$ along the α axis for (d) f = 71.77 Hz, (e) f = 63.23 Hz, (f) f = 65.91 Hz.
Fig. 8. (a, b, c) Time representation of the real PCG signals. (d, e, f) IAF for each real PCG signal.
Fig. 9. The CAF $R_z^{\alpha}(\tau)$ of the three real PCG signals: (a, b, c) $R_z^{\alpha}(\tau)$ as a function of α and τ. (d, e, f) Up-scaled $R_z^{\alpha}(\tau)$ along the α axis for τ = 0 s.
Fig. 10. The SCD $S_z^{\alpha}(f)$ of the three real PCG signals: (a, b, c) $S_z^{\alpha}(f)$ as a function of α and f. (d, e, f) Up-scaled $S_z^{\alpha}(f)$ along the α axis for (d) f = 37.68 Hz, (e) f = 26.91 Hz, (f) f = 29.29 Hz.
5 Conclusion
In this chapter, a new PCG model based on Laplace kernels is presented. This model exploits the cyclostationarity of the process and allows a better description of heart beats over several cardiac cycles. The theoretical study proves the wide-sense cyclostationarity of the proposed model. Simulations on synthetic and experimental signals were performed to confirm the proposed model and its cyclostationarity. The cyclostationary property provides researchers and cardiologists with robust tools to detect many disorders of the heart functioning, as cyclic statistics are not perturbed by stationary additive noise.
References
1. Almasi, A., Shamsollahi, M.B., Senhadji, L.: Bayesian denoising framework of phonocardiogram based on a new dynamical model. IRBM 34(3), 214–225 (2013)
2. Baykal, A., Ider, Y.Z., Koymen, H.: Distribution of aortic mechanical prosthetic valve closure sound model parameters on the surface of the chest. IEEE Trans. Biomed. Eng. 42(4), 358–370 (1995)
3. Bentley, P., Nordehn, G., Coimbra, M., Mannor, S.: The PASCAL Classifying Heart Sounds Challenge 2011 (CHSC2011) Results. http://www.peterjbentley.com/heartchallenge/index.html
4. Choklati, A., Sabri, K.: Cyclic analysis of extra heart sounds: Gauss kernel based model. Int. J. Image Graph. Signal Process. 10(5), 1–14 (2018)
5. Choklati, A., Sabri, K., Lahlimi, M.: Cyclic analysis of phonocardiogram signals. Int. J. Image Graph. Signal Process. 9(10), 1 (2017)
6. Debbal, S., Bereksi-Reguig, F.: Time-frequency analysis of the first and the second heartbeat sounds. Appl. Math. Comput. 184(2), 1041–1052 (2007)
7. Donohue, K., Varghese, T.: Spectral crosscorrelation for tissue characterization. In: IEEE 1992 Ultrasonics Symposium Proceedings. IEEE (1992)
8. Fellingham, L., Sommer, F.: Ultrasonic characterization of tissue structure in the in vivo human liver and spleen. IEEE Trans. Sonics Ultras. 31(4), 418–428 (1984)
9. Finelli, C., Jenkins, J.: A cyclostationary least mean squares algorithm for discrimination of ventricular tachycardia from sinus rhythm. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 13. IEEE (1991)
10. Gardner, W.: Two alternative philosophies for estimation of the parameters of time-series. IEEE Trans. Inf. Theory 37(1), 216–218 (1991)
11. Gardner, W.A.: Introduction to Random Processes with Applications to Signals and Systems, p. 447. MacMillan Co., New York (1986)
12. Gardner, W.A.: Cyclostationarity in communications and signal processing. Technical report, Statistical Signal Processing Inc, CA (1994)
13. Had, A., Sabri, K.: A two-stage blind deconvolution strategy for bearing fault vibration signals. Mech. Syst. Signal Process. 134, 106307 (2019)
14. Had, A., Sabri, K., Aoutoul, M.: Detection of heart valves closure instants in phonocardiogram signals. Wirel. Pers. Commun. 112(3), 1569–1585 (2020)
15. Jabloun, M., Ravier, P., Buttelli, O., Lédée, R., Harba, R., Nguyen, L.D.: A generating model of realistic synthetic heart sounds for performance assessment of phonocardiogram processing algorithms. Biomed. Signal Process. Control 8(5), 455–465 (2013)
16. Koymen, H., Altay, B.K., Ider, Y.Z.: A study of prosthetic heart valve sounds. IEEE Trans. Biomed. Eng. BME-34(11), 853–863 (1987)
17. Leung, T., Salmon, A., Collis, W., White, P., Brown, E., Cook, J.: Analysis of the second heart sound for diagnosis of paediatric heart disease. IEE Proc. Sci. Meas. Technol. 145(6), 285–290 (1998)
18. Li, T., Tang, H., Qiu, T., Park, Y.: Heart sound cancellation from lung sound record using cyclostationarity. Med. Eng. Phys. 35(12), 1831–1836 (2013)
19. LoPresti, E., Jesinger, R., Stonick, V.: Identifying significant frequencies in surface EMG signals for localization of neuromuscular activity. In: Proceedings of 17th International Conference of the Engineering in Medicine and Biology Society. IEEE (1995)
20. Napolitano, A.: Cyclostationarity: new trends and applications. Signal Process. 120, 385–408 (2016)
21. Napolitano, A.: Cyclostationary signal processing and its generalizations. In: IEEE Statistical Signal Processing Workshop, Gold Coast, Australia (2014)
22. Nassar, M., Dabak, A., Kim, I.H., Pande, T., Evans, B.L.: Cyclostationary noise modeling in narrowband powerline communication for smart grid applications. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE (2012)
23. Saatci, E., Saatci, E., Akan, A.: Cyclostationary analysis of respiratory signals with application of rate determination. In: IFMBE Proceedings, pp. 265–269. Springer, Singapore (2018). https://doi.org/10.1007/978-981-10-9038-7_49
24. Sabri, K., Badaoui, M.E., Guillet, F., Belli, A., Millet, G., Morin, J.B.: Cyclostationary modeling of ground reaction force signals. Signal Process. 90(4), 1146–1152 (2010)
25. Sava, H., Pibarot, P., Durand, L.G.: Application of the matching pursuit method for structural decomposition and averaging of phonocardiographic signals. Med. Biol. Eng. Comput. 36(3), 302–308 (1998)
26. Sava, H., McDonnell, J.: Modified forward-backward overdetermined prony method and its application in modelling heart sounds. IEE Proc. Vision Image Signal Process. 142(6), 375 (1995)
27. Schmidt, S.E., Holst-Hansen, C., Graff, C., Toft, E., Struijk, J.J.: Segmentation of heart sound recordings by a duration-dependent hidden markov model. Physiol. Meas. 31(4), 513–529 (2010)
28. Serpedin, E., Panduru, F., Sarı, I., Giannakis, G.B.: Bibliography on cyclostationarity. Signal Process. 85(12), 2233–2303 (2005)
29. Shervegar, M.V., Bhat, G.V.: Automatic segmentation of phonocardiogram using the occurrence of the cardiac events. Inf. Med. Unlocked 9, 6–10 (2017)
30. Smoot, S.R., Rowe, L.A., Roberts, E.: Laplacian model for ac DCT terms in image and video coding. In: Ninth Image and Multidimensional Signal Processing workshop, Citeseer (1996)
31. Tran, T., Jones, N.B., Fothergill, J.C.: Heart sound simulator. Med. Biol. Eng. Comput. 33(3), 357–359 (1995)
32. Xu, J., Durand, L., Pibarot, P.: Nonlinear transient chirp signal modeling of the aortic and pulmonary components of the second heart sound. IEEE Trans. Biomed. Eng. 47(10), 1328–1335 (2000)
33. Xu, J., Durand, L.G., Pibarot, P.: Extraction of the aortic and pulmonary components of the second heart sound using a nonlinear transient chirp signal model. IEEE Trans. Biomed. Eng. 48(3), 277–283 (2001)
34. Zhang, X., Durand, L., Senhadji, L., Lee, H., Coatrieux, J.L.: Analysis-synthesis of the phonocardiogram based on the matching pursuit method. IEEE Trans. Biomed. Eng. 45(8), 962–971 (1998)
Overview of Practical Aspects of Evaluation of Spectral Scalar Indicators for Trend Analysis in Condition Monitoring
Adam Jablonski(B) and Tomasz Barszcz
AGH University of Science and Technology, Cracow, Poland {ajab,tbarszcz}@agh.edu.pl
Abstract. The problem of calculation of parameters based on spectra might seem trivial, but it is so only for MATLAB® (or other advanced software) users. Commercial condition monitoring systems work on pure platforms and they use basic mathematical functions. For this reason, even such a simple thing as evaluation of scalar indicators from spectral signal representation might raise several problems and possible errors during industrial implementations, especially for untypical signal parameters. The paper presents selected guidelines on how to cope with this problem in practical implementations. After reading this paper, one will learn some practical recipes which enable conversion of spectral data to trend data.
Keywords: Trend analysis · Health indicator · Signal feature · Diagnostic feature · Diagnostic indicator · Condition monitoring systems
1 Introduction
1.1 The Concept of Trend Analysis
Trend analysis is one of the most favored methods in condition monitoring of rotary machinery, simply because the historical evolution of a machine parameter is easy to present and understand. Nevertheless, one needs to keep in mind that every trend plot illustrates just a single particular characteristic of the machine's technical condition. In other words, it is impossible to define a single, “universal” technical health marker for a given machine. As a result, commercial monitoring systems define dozens, hundreds or even thousands of trends in the hope that they cover all possible machine faults, i.e. that no fault will be overlooked. However, even when a large number of trend analyses are defined and all thresholds are set, a second problem with trend analysis concerns the actual mathematical conversion from a raw vibration signal to a scalar value. Although some scalar indicators are defined in the time or angle domain, this paper focuses on the most commonly used indicators, which are calculated from the signal’s spectral representation.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 207–216, 2022. https://doi.org/10.1007/978-3-030-82110-4_11
A. Jablonski and T. Barszcz
1.2 Dictionary
Trend analysis uses a set of scalar values calculated via some mathematical equation. Historically, these series have been named in various ways by researchers from different fields of science and engineering. Typically, in the field of mathematics and statistics, these points are called “time series”. In signal processing and data science, trend data is typically referred to as a “signal feature”. On the other hand, condition monitoring developers usually refer to “health indicators” (HIs), “diagnostic indicators” or “diagnostic features”. Finally, diagnostic engineers and machine operators simply talk about “trends”. In this paper, all the terms mentioned above are used interchangeably, as best suits the particular context.
1.3 Classification
Condition monitoring systems offer a variety of scalar indicators. Moreover, a large part of these indicators is calculated from relatively long signals (i.e. 10 s and more).
Fig. 1. Classification of common health indicators used in commercial condition monitoring systems.
Overview of Practical Aspects of Evaluation
As a result, condition monitoring systems frequently offer averaged, broadband and wideband HIs, many narrowband HIs, as well as residual, compound, and postprocessed HIs. Figure 1 illustrates the classification of HIs commonly used in general-purpose vibration-based Condition Monitoring Systems (CMS). According to Fig. 1, the first class of broadband indicators, like peak-to-peak (PP) or root-mean-square (RMS), is calculated directly from a time waveform, i.e. the raw vibration signal. After transformation of the signal into the frequency domain, narrowband indicators are calculated. Next, the envelope of the signal enables calculation of broadband envelope HIs, typically Envelope PP and Envelope RMS, which are efficient in detecting components generated by faulty rolling element bearings (REBs), especially those caused by distributed faults. After transformation into the frequency domain, the envelope signal enables tracking of predefined cyclostationary components generated by local REB faults, like the ball passing frequency of the outer race (BPFO), the ball passing frequency of the inner race (BPFI), the fundamental train frequency (FTF, cage) or the ball spin frequency (BSF). Note that frequency-domain narrowband indicators are generally used for condition monitoring of machinery operating under stationary operational conditions. At the same time, the acceleration signal could be resampled, i.e. converted to the angle domain. Most commonly, this operation requires an extra phase marker (PM) signal, also called a “tacho” signal or “keyphasor®” (the latter is a registered trademark of Bently Nevada). Analogously to frequency-domain health indicators, the resampled signal is used to calculate broadband and narrowband HIs, as well as their envelope counterparts.
Narrowband HIs are used to track deterministic, phase-locked components, like shaft harmonics, blade harmonics or gear meshing frequency (GMF) harmonics, while narrowband HIs from the envelope of the resampled signal are widely used to track predefined, phase-locked second-order cyclostationary components (like components generated by local REB faults) [1]. Thus, order-domain narrowband HIs are generally used for technical assessment of machinery operating under non-stationary operational conditions, especially non-stationary rotational speed (but preferably varying by no more than a few percent). When extraction of periodic signal components from a noisy background is of greater interest than the resolution of the signals, the resampled signal is averaged, resulting in a new, so-called Time-Synchronous Averaged (TSA) waveform, which is much shorter than the original one. As shown in [2], almost any “regular” statistical broadband or narrowband health indicator could be calculated from the TSA signal. The TSA signal is also used to compute indicators from so-called “residual” signals, which are defined as TSA signals from which clear, phase-locked components, mainly shaft harmonics and gearbox-generated harmonics, have been removed [2, 3]. Naturally, any broadband or narrowband scalar indicator might be calculated from the residual signal as well. However, in contrast to the other paths, time-synchronous averaging requires the user to define the averaging cycle, which is not obvious and typically calls for manual data analysis. As illustrated in Fig. 1, all but the envelope HIs could be generated from both acceleration and velocity signals. However, in practice, because the conversion from acceleration to velocity scales the amplitude of each component by 1/(2πf), where f is the frequency of the component, velocity signals are analyzed only up to a few hundred Hz. Further integration of the velocity signal to a displacement signal is typically not done because of inevitable numerical errors, especially
when relatively long signals (e.g. longer than 10 s at the commonly used industrial sampling frequencies of 20–30 kHz) are used. The total number of scalar health indicators proposed by researchers is much larger [4], and covers standard deviation (STD), kurtosis, shape factor (SF), energy ratio (ER), energy operator (EOP or EO), Zero Order Figure of Merit (FM0), Fourth Order Figure of Merit (FM4), M6A, M8A, NA4 and NA4*, NB4, delta RMS sideband level factor (SLF or SMLF), sideband index CAL4, clearance factor, impulse indicator, Correlated Kurtosis (CK), mean frequency (MN), frequency center (FC), Root mean square frequency (RMSF), Standard deviation frequency (STDF), Spectral Kurtosis (SK), Shannon Entropy, and finally Fourth Order Normalized Power (NP4). Some algorithms for these indicators are found in [5]. Other indicators, including G2 (ratio of the amplitude of the GMFx2 to the amplitude of GMFx1) and DAM (Derivative of the Amplitude Modulation Analysis), are given in [6]. Finally, rational (ratio-based) indicators including Dynamic Energy Index (DEI), Sideband Power Factor (SBPF), and Sideband Energy Ratio (SER)™ are found in [5, 6], and [7], respectively. From a practical point of view, it is worth noting that any health indicator which either requires integration (i.e. all velocity signal-based HIs) or includes a denominator (like crest, kurtosis or SER™ HIs) is more sensitive to random signal disturbances than other indicators, resulting in a higher false alarm rate.
2 Generation of Trend Plots
Regardless of the underlying algorithm, trend plots always draw points (or lines) along a horizontal time axis. Because condition monitoring typically requires analysis of long-term trend data, these graphs are frequently called “historians”. Trend plots are generally
Fig. 2. A schematic illustration of trend plots generation.
recognized as the most practical tools in condition monitoring, because a proper selection of the indicator and the signal domain, combined with representative input data, results in reliable fault detection, fault identification and monitoring of fault development. Figure 2 illustrates how a set of predefined indicators (here two HIs) is calculated from a discontinuous series of continuous vibration signals. Frequently, all types of indicators in a set (broadband, wideband and narrowband) are calculated simultaneously from a given signal. Next, individual HIs are stored asynchronously, because they are calculated from signals which are buffered according to the machine operational conditions. Nevertheless, for stationary operational conditions, condition monitoring systems typically “try” to store these sets of health indicators at fixed time intervals, e.g. every 10 min. From a practical point of view, the more repeatable the storage is, the easier the data analysis.
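As a rough illustration of the scheme in Fig. 2, the sketch below computes two broadband HIs (RMS and peak-to-peak) for each buffered signal and stores them with a time stamp. The function names, the fixed 10-min storage period and the toy buffers are assumptions made only for this example; they are not taken from any particular CMS.

```python
import math

def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def peak_to_peak(x):
    """Difference between the largest and the smallest sample."""
    return max(x) - min(x)

def build_trend(buffers, period_s=600):
    """Return one (time_s, RMS, PP) trend point per buffered signal.

    Assumes (hypothetically) that buffers are stored at a fixed period.
    """
    return [(i * period_s, rms(buf), peak_to_peak(buf))
            for i, buf in enumerate(buffers)]

# Toy buffers: five full sine cycles whose amplitude grows between buffers
N = 1000
buffers = [[a * math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
           for a in (1.0, 1.5, 2.0)]
trend = build_trend(buffers)
```

Each buffer yields one point of each trend, so a growing vibration amplitude shows up as a rising RMS trend and a rising PP trend on the same time axis.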
3 Scalar Evaluation
3.1 Overview
Health indicators are calculated from different types of input data, including time waveforms, envelope waveforms, order waveforms, and their spectral representations, from both acceleration and (partially) velocity signals. In the case of spectral analysis, the number of possible bandwidths is virtually unlimited; therefore, many systems operate on a tremendous total number of indicators. From the point of view of the source, HIs are calculated either from the time signal or from its spectral representation. The current paper deals with the latter. Generally, multiple narrowband HIs are calculated from the same spectral data by selecting spectral points corresponding to certain ranges of spectral indexes; therefore, this process is frequently referred to as “narrowband component extraction”. The complete procedure requires three independent operations. The first operation is the extraction of the narrowband signal, the second is the determination of spectral indexes, and the third is the calculation of spectral amplitudes, i.e. conversion from a relatively short sequence of numbers to a single scalar value.
3.2 Signal Extraction
Figure 3 illustrates a fragment of a noisy, raw simulated vibration signal and its narrowband-filtered, deterministic component. The signal consists of a single 100 Hz component with amplitude 0.35 and scaled white random noise. The narrowband component shown in the figure is filtered in the frequency domain from 90 Hz to 110 Hz. Figure 4 illustrates the spectral representation of the narrowband component. Clearly, the analytical selection of spectral points corresponding to the frequency band from 90 Hz to 110 Hz almost completely covers the component of interest. Typically, condition monitoring systems do not display such filtered signals, but operate solely on spectral indexes and amplitudes, so the “extraction” step is a part of the internal system calculations.
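The extraction step described above can be sketched as follows. This is a minimal illustration and not the code used to produce Figs. 3 and 4: the sampling frequency, the signal length, the noise level and the random seed are assumptions, and a naive DFT is used instead of an FFT so that only basic mathematical functions are needed.

```python
import cmath
import math
import random

def dft(x):
    """Naive discrete Fourier transform (O(N^2)), adequate for short signals."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Naive inverse DFT returning the real part of the result."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

fs, N = 1000.0, 200        # assumed values, giving df = fs/N = 5 Hz
df = fs / N
rng = random.Random(0)     # fixed seed so the sketch is repeatable
x = [0.35 * math.sin(2 * math.pi * 100.0 * n / fs) + 0.1 * rng.uniform(-1, 1)
     for n in range(N)]

X = dft(x)

def in_band(k, f_lo=90.0, f_hi=110.0):
    """True for spectral lines (and their mirror images) inside the band."""
    f = min(k, N - k) * df
    return f_lo <= f <= f_hi

# Zero everything outside 90-110 Hz and go back to the time domain
X_band = [X[k] if in_band(k) else 0.0 for k in range(N)]
narrowband = idft(X_band)

# Scaled one-sided amplitude of the 100 Hz line
amp_100 = 2.0 * abs(X[round(100.0 / df)]) / N
```

With these settings the 100 Hz component falls exactly on a spectral line, so `amp_100` stays close to the simulated amplitude of 0.35, and `narrowband` is the filtered waveform of the kind plotted in Fig. 3.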
Fig. 3. Sinusoidal component merged in noise.
Fig. 4. Spectral representation of the narrowband component.
3.3 Determination of Spectral Indexes
When a narrowband spectral indicator is calculated, it is necessary to select a certain set of amplitude values from which it is computed. This set of values is selected on the basis of the spectral range, or bandwidth, which is typically calculated in advance (with respect to an analytical center frequency) and stored in the system's configuration. The bandwidth is initially calculated as an analytical value, and it is finally represented either as a range, e.g. here 90 Hz–110 Hz, or as a single component name, like for instance “GMFx3”, with some additional bandwidth definition. Definition of a relatively wide band for tracking the development of machine faults is easily justified, because such a definition handles variations of speed. Moreover, it is more stable. Furthermore, it compensates for the mismatch between theoretical values and true frequencies/orders of components due to all kinds of slips or drive train transmission errors. Finally, it enables definition of HIs for frequencies or orders which are not accurately represented due to limited spectral resolution. On the other hand, definition of relatively narrow ranges (theoretically) enables more precise fault identification. This is useful for components which are phase-locked and relatively close to each other, like for instance multiple harmonics of multiple shafts generating multiple sidebands of gear meshing harmonics in multi-stage gearboxes. Typically, the final spectral band is calculated in three simple steps. First, a single order/frequency corresponding to a given machine element is calculated. Typically, it is called the “center (or central) frequency” CF. Secondly, the analytical (theoretical, symbolic) bandwidth is calculated using either a percentage of the frequency/order (here Band Width BW) or a number
of “resolution points” Nres, respectively:
band_definition_1 = [CF − CF ∗ BW/2 : CF + CF ∗ BW/2]
band_definition_2 = [CF − k ∗ Nres : CF + k ∗ Nres]
Thirdly, when a vibration signal is recorded, the final indexes of the spectral lines are calculated. Every spectral component, like shaft harmonic, GMF, BPFO, BPF, and others, is calculated on the basis of the kinetostatic analysis of the drive train of a rotary machine. The order of any component is speed-independent, so it does not change. Clearly, in both cases, the perfect matching of a single spectral component with the output shaft is practically impossible, as it would require infinite signal length (or an infinite number of revolutions Nrot). Practical considerations about spectral resolution and the spectral bandwidth for the frequency and order domains are illustrated in Table 1.

Table 1. Parameters for frequency spectrum and order spectrum

| | Frequency domain | Order domain |
|---|---|---|
| Counterparts | Sampling frequency: fs | Number of samples per rotation: Nsamp |
| Counterparts | Signal total length: T | Number of revolutions in a signal: Nrot |
| Counterparts | Number of points in the signal: N | Number of points in the signal: Nres |
| Relationship | T = N/fs | Nrot = Nres/Nsamp |
| Spectral resolution | df = 1/T (or fs/N) | d_ord = 1/Nrot (or Nsamp/Nres) |
| Spectral bandwidth | fs/2 | Nsamp/2 |
| Spectral axis | 0:df:fs/2 | 0:d_ord:Nsamp/2 |
| Index calculation | i = frequency[Hz]/df + 1 | i = order[ord]/d_ord + 1 |
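The index calculation in Table 1 can be sketched as below. The function names are hypothetical, and rounding to the nearest spectral line is only one possible convention. Table 1 uses 1-based (MATLAB-style) indexes, so the sketch returns 0-based indexes by default and adds 1 on request.

```python
def frequency_to_index(freq_hz, fs, n_points, one_based=False):
    """Nearest spectral-line index for a frequency, using df = fs / n_points."""
    index = round(freq_hz * n_points / fs)
    return index + 1 if one_based else index

def order_to_index(order, n_samp, n_res, one_based=False):
    """Nearest spectral-line index for an order, using d_ord = n_samp / n_res."""
    index = round(order * n_res / n_samp)
    return index + 1 if one_based else index

def band_to_indexes(cf_hz, bw_fraction, fs, n_points):
    """0-based (start, stop) indexes of the band [CF - CF*BW/2, CF + CF*BW/2]."""
    half = cf_hz * bw_fraction / 2
    return (frequency_to_index(cf_hz - half, fs, n_points),
            frequency_to_index(cf_hz + half, fs, n_points))

# Assumed example: 10 s signal sampled at 25 kHz, so df = 0.1 Hz
fs, N = 25000, 250000
i_100 = frequency_to_index(100.0, fs, N)            # 0-based index
i_100_matlab = frequency_to_index(100.0, fs, N, one_based=True)
band = band_to_indexes(100.0, 0.2, fs, N)           # the 90-110 Hz band
# Assumed order-domain example: 1024 samples per revolution, 100 revolutions
i_ord = order_to_index(1.0, 1024, 102400)
```

Multiplying by the number of points before dividing, instead of dividing by a precomputed df, keeps the intermediate value exact for typical integer settings and avoids off-by-one indexes caused by floating-point rounding.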
One method to determine the direction in which the nearest index is sought is to require that both edges of the bandwidth select their nearest spectral component and add this point to the selected data. Although this could be the “left” or the “right” direction, only opposite directions for the two edges guarantee that they will not select the very same point. Additional care needs to be taken for bandwidths whose left edge is smaller than or equal to the first df or d_ord value (beginning of the spectrum axis) and whose right edge is equal to or greater than the fs/2 − df or Nsamp/2 − d_ord values. In practice, the latter case is hardly ever met. In addition to the selection of nearest neighbors, one simple method to improve the selection of a spectral band is to force a minimum number of selected spectral points. On the other hand, definition of a band consisting of a single point for tracking the amplitude of a single theoretical component is justified when the data acquisition and data processing system together assure a high level of synchronous, i.e. phase-locked, data and a reliable angular (i.e. order) domain data representation. Such data
is obtained using high-resolution encoders. Analysis of single-point bandwidths is sometimes used for tracking shaft harmonics. In this case, the band definition could be treated similarly to the previous case, with the additional condition index_start = index_stop. A typical solution for determination of spectral ranges for narrowband indicators implemented in CMS is the 3% bandwidth, which means that for every defined frequency or order, the band covers a range from −1.5% to +1.5% of the frequency value. Considering a typical 10 s vibration signal, the bandwidth and the resulting number of frequency components vary with the value of the center frequency, as illustrated in Table 2.

Table 2. Results of spectral indexing for typical CMS settings

| Frequency value | Bandwidth definition | Total bandwidth | Theoretical bandwidth | Theoretical/practical No. of points |
|---|---|---|---|---|
| 1 Hz | 3% | 0.03 [Hz] | 0.985–1.015 [Hz] | 1/3 |
| 10 Hz | 3% | 0.30 [Hz] | 9.850–10.15 [Hz] | 3/3 |
| 100 Hz | 3% | 3.00 [Hz] | 98.500–101.5 [Hz] | 31/31 |
| 1000 Hz | 3% | 30.00 [Hz] | 985.000–1015 [Hz] | 301/301 |
| 10000 Hz | 3% | 300.00 [Hz] | 9850.000–10150 [Hz] | 3001/3001 |
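The point counts in Table 2 can be reproduced with the short sketch below. It assumes that the “practical” number of points is obtained by forcing a minimum of three selected points, which is consistent with the table but is an interpretation rather than a documented rule; the function name and the tolerance are hypothetical.

```python
import math

def lines_in_band(cf_hz, bw_fraction, duration_s, min_points=3, tol=1e-6):
    """Return (theoretical, practical) numbers of spectral lines in a band.

    Since df = 1/T, a frequency f corresponds to line position f*T.
    The tolerance guards against floating-point noise at the band edges.
    """
    lo = cf_hz * (1 - bw_fraction / 2) * duration_s
    hi = cf_hz * (1 + bw_fraction / 2) * duration_s
    i_start = math.ceil(lo - tol)    # first line at or above the left edge
    i_stop = math.floor(hi + tol)    # last line at or below the right edge
    theoretical = max(i_stop - i_start + 1, 0)
    practical = max(theoretical, min_points)   # assumed minimum of 3 points
    return theoretical, practical

# 3 % bandwidth, 10 s signal (df = 0.1 Hz), as in Table 2
table = {cf: lines_in_band(cf, 0.03, 10.0) for cf in (1, 10, 100, 1000, 10000)}
# e.g. table[100] -> (31, 31)
```

The 1 Hz row shows why low-frequency bands need special care: the analytical band contains a single spectral line, so any extension to three points is an arbitrary design decision.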
Clearly, in the case of low frequencies, which is a typical range for many phase-locked components, the decision concerning the number of components to be taken into account in the calculations might require some arbitrary user decisions on extensions to neighboring indexes.
3.4 Calculation of Spectral Amplitudes
When the narrowband signal is extracted in the frequency or order domain, as in Fig. 4, i.e. when a sequence of numbers is extracted on the basis of the calculated indexes, the question is always how to process these numbers. Sometimes it is the peak value, sometimes RMS, sometimes power, and at other times another algorithm. If the selected HI is to be domain-independent, typically spectral energy, power or RMS calculated using Parseval's theorem is used. If the evaluation is to have a commonly used scientific meaning, an additional power density algorithm is typically used. When trend dynamics are of major interest, peak-based algorithms are used. In contrast, when the reduction of false alerts is the main concern, which translates into minimization of the HI variance, algorithms based on the square root function are favored. Table 3 shows how to calculate popular broadband health indicators. For the calculation of narrowband spectral indicators, let us assume that a variable “selected_y” refers to scaled, one-sided, full-resolution spectral amplitudes extracted on the basis of spectral indexes. For such a variable, Table 4 illustrates selected possible calculations.
Table 3. Broadband health indicators

| Name | Time domain | Frequency domain |
|---|---|---|
| Energy | sum(x.^2) | 1/N * sum(abs(fft(x)).^2) |
| Power | 1/N * sum(x.^2) | 1/N^2 * sum(abs(fft(x)).^2) |
| RMS | sqrt(1/N * sum(x.^2)) | 1/N * sqrt(sum(abs(fft(x)).^2)) |
Table 4. Narrowband calculations

| Name | Code in frequency domain |
|---|---|
| Energy | N/2 * sum(selected_y.^2) |
| Power | 1/2 * sum(selected_y.^2) |
| RMS | sqrt(1/2 * sum(selected_y.^2)) |
| Peak | max(selected_y) |
| Peak squared | max(selected_y.^2) |
| Sum | sum(selected_y) |
| Sum of squares | sum(selected_y.^2) |
| Normalized sum 1 | 1/length(selected_y) * sum(selected_y) |
| Normalized sum 2 | 1/bandwidth * sum(selected_y) |
| Normalized sum of squares 1 | 1/length(selected_y) * sum(selected_y.^2) |
| Normalized sum of squares 2 | 1/bandwidth * sum(selected_y.^2) |
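The expressions of Table 4 translate directly into the following sketch. The function name, the dictionary keys and the example band are hypothetical; `selected_y` is assumed, as in the text, to hold scaled, one-sided spectral amplitudes.

```python
import math

def narrowband_indicators(selected_y, bandwidth, n_samples):
    """Evaluate the Table 4 expressions for one extracted band."""
    s = sum(selected_y)
    ss = sum(v * v for v in selected_y)
    m = len(selected_y)
    return {
        "energy": n_samples / 2 * ss,
        "power": ss / 2,
        "rms": math.sqrt(ss / 2),
        "peak": max(selected_y),
        "peak_squared": max(v * v for v in selected_y),
        "sum": s,
        "sum_of_squares": ss,
        "normalized_sum_1": s / m,
        "normalized_sum_2": s / bandwidth,
        "normalized_sum_of_squares_1": ss / m,
        "normalized_sum_of_squares_2": ss / bandwidth,
    }

# Assumed example: a three-point band with a single 0.35 component
his = narrowband_indicators([0.0, 0.35, 0.0], bandwidth=0.3, n_samples=1000)
```

For a single sinusoid of amplitude A that falls exactly on one spectral line, the Power entry equals A²/2 and the RMS entry equals A/√2, which matches the time-domain values of the same sinusoid.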
From a practical point of view, energy is not used, simply because “the longer the signal, the higher its value”. For acceleration signals, the selection between RMS and Peak is a tradeoff between higher stability (less trend variance) and higher dynamics. Sum is convenient for embedded designs due to its quick calculation; however, when some physical meaning is required, Power is preferred. The last evaluation function, i.e. the sum of squares of the scaled, one-sided spectral amplitudes of the narrowband signal divided by its bandwidth, is a common spectral power density estimator. Taking the above into account, it is easy to understand why even the simplest condition monitoring units calculate a common set of indicators (at least both Peak and RMS, in both the Acceleration and Velocity domains).
4 Additional Discussion One way or the other, almost everyone working with vibration data calculates many scalar indicators. Note that in practice, the final range of values of individual HIs depends
on the selected units in the system. When the system operates in [g] units, all but the high-amplitude shaft harmonics almost always have initial spectral amplitudes smaller than 1. In this case, any “power”-related representation inevitably produces a final value closer to 0, while any square root produces a value closer to 1. This is the reason why, apart from signal energy, the sum and peak values are typically characterized by the highest values. Naturally, for original amplitudes greater than 1, the consequences are opposite. In the case of Velocity representations, typical significant values are greater than 1. However, when the sensitivity of an individual HI is considered, i.e. when a relative change is tracked, practically any evaluation function which squares the data increases the HI dynamics, while any evaluation which includes normalization of the sum of values decreases the HI dynamics. As a consequence, the more sensitive the trend value is, the larger its variance inevitably is, and vice versa.
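The effects described above can be checked on a toy band. The numbers below are assumptions chosen only for illustration: a three-point band in [g] whose center line doubles in amplitude while its neighbors stay at the noise level.

```python
import math

# Amplitudes below 1: squaring pushes values toward 0, square root toward 1
a = 0.1
squared, rooted = a ** 2, math.sqrt(a)

# Assumed band before and after a fault develops (center line doubles)
before = [0.02, 0.10, 0.02]
after = [0.02, 0.20, 0.02]

def sum_of_squares(y):
    return sum(v * v for v in y)

# Relative change of three evaluation functions
ratio_peak = max(after) / max(before)
ratio_sum = sum(after) / sum(before)
ratio_sum_sq = sum_of_squares(after) / sum_of_squares(before)
```

In this toy band the squared evaluation reacts most strongly and the sum reacts least, with the peak in between. Any normalized sum differs from the sum only by a constant factor, so its relative change is the same as that of the sum.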
5 Conclusion The issue of evaluation of trend data from spectral data could be analyzed from various perspectives. The current paper has shown that definition of more than one path of calculations is practically inevitable, because different signal processing algorithms work better for different cases. The paper could be a friendly start for academic researchers or could be used as a reference for practitioners, who want to understand more about data processing in embedded condition monitoring systems. Acknowledgement. The book is partially supported by the Grant No. POIR.04.01.04-00-0115/17 funded by The National Centre for Research and Development, Poland.
References 1. Randall, R.B., Antoni, J.: Rolling element bearing diagnostics—a tutorial. Mech. Syst. Sig. Process. 25(2), 485–520 (2011) 2. Yoon, J., et al.: Planetary gearbox fault diagnosis using a single piezoelectric strain sensor. In: Proceedings of the Annual Conference on the Prognostics and Health Management Society, pp. 118–127 (2014) 3. Bechhoefer, E., Duke, A., Mayhew, E.: A case for health indicators vs. condition indicators in mechanical diagnostics. Annu. Forum Proc. AHS Int. 63(2), 1468 (2007) 4. Sharma, V., Parey, A.: A review of gear fault diagnosis using various condition indicators. Proc. Eng. 144, 253–263 (2016) 5. Dynamic Energy Index (DEI): GE Energy Fact Sheet Template, Bently Nevada Asset Condition Monitoring (2011) 6. Zapalla, D., et al.: Sideband Algorithm for Automatic Wind Turbine Gearbox Fault Detection and Diagnosis (Preprint), European Wind Energy Association 2013 Annual Event, Vienna, Austria (2013). https://www.nrel.gov/docs/fy13osti/57395.pdf 7. Hanna, J., et al.: Detection of Wind Turbine Gear Tooth Defects Using Sideband Energy Ratio™, China Wind Power, Beijing, China, Oct. 2011, pp. 19–21 (2011)
Automatic Detection of Rolling Element Bearing Faults to Be Applied on Mechanical Systems Comprised by Gears
Andy Rodríguez1, Fidel Hernández2(B), and Mario Ruiz3
1 Faculty of Physics, University of Havana, Havana, Cuba
2 Faculty of Telecommunications and Electronics, Technological University of Havana, Havana, Cuba [email protected]
3 Faculty of Technical Sciences, University of Pinar del Rio, Pinar del Rio, Cuba [email protected]
Abstract. This research aims to develop and validate a method for tacho-less automatic detection of faults in rolling bearings for mechanical systems comprised by gears. The proposed method was based on the application of some mode decomposition technique in order to extract monocomponent signals from the vibration and to calculate an indicator of the modulation produced by the rolling element bearing fault. The computation of this indicator was performed by means of Lock-in Amplifiers, which are used in order to extract, through a synchronous approach, spectral components from non-stationary signals. A novel algorithm, previously applied on gear fault detection, was adapted and used in order to estimate the rotational speed. The effectiveness of the method was assessed through experiments with real signals. Besides, the capability of the indicator for serving as a relative measure of the fault severity was verified. Keywords: Gear vibration · Singular Spectrum Decomposition · Modulation detection · Lock-in Amplifiers · Rolling bearing fault · Fault detection
List of Symbols, Abbreviations and Acronyms ak Aa Ai Ao Xk P(t) θi (n) θr (n) wj
Amplitude modulation function related to the shaft rotational speed Amplitude modulation of the vibration produced by a faulty rolling element bearing Amplitude of the modulation produced by the input gear Amplitude of the modulation produced by the output gear Amplitude of the k-th harmonic of the gear mesh frequency Angle of a frequency modulation signal Angle of the analytical signal of the monocomponent signal i Angle of the rotational speed signal Average angular speed between the j-th and the (j − 1)-th impulse produced by a faulty rolling element bearing
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 217–234, 2022. https://doi.org/10.1007/978-3-030-82110-4_12
| Symbol | Description |
|---|---|
| | Average occurrence period in angular domain |
| αc | Characteristic coefficient for the cage fault |
| αi | Characteristic coefficient for the inner race fault |
| αo | Characteristic coefficient for the outer race fault |
| αr | Characteristic coefficient for the rolling element fault |
| β | Contact angle |
| x(n) | Digital model of the gear vibration |
| Ψi | Equivalent jitter in the angular domain |
| E{·} | Expected value operator |
| α | Fault characteristic coefficient |
| fres | Frequency of a monocomponent signal |
| fc | Frequency of the carrier signal |
| fmod | Frequency of the modulating signal |
| mV | Gear ratio |
| g̃i(n) | Hilbert transform of the monocomponent signal i |
| s(n) | Impulse response stimulated by a local rolling element bearing defect |
| ϕk | Initial phase of the k-th harmonic of the gear mesh frequency |
| F(t) | Instantaneous frequency of a frequency modulation signal |
| LIA | Lock-in Amplifier |
| Pd | Mean diameter |
| g̃i(n) | Monocomponent signal i |
| DN(fc, fmod)3 | Normalized modulation detection index |
| K | Number of harmonics of the digital model of the gear vibration |
| Nb | Number of rolling elements |
| fo | Rotational speed of the output gear |
| fr | Rotational speed |
| Ts | Sampling period |
| SNR | Signal-to-Noise Rate |
| SSD | Singular Spectrum Decomposition |
| DN(fc, fmod)3 | Smoothed version of the normalized modulation detection index |
| TARSE | Tacho-less Automatic Rotational Speed Estimation |
| T | Teeth number of the gear |
| Ti | Time instant of impulse occurrence due to a vibration produced by a faulty rolling element bearing |
| w(n) | Vibration background noise |
| x(n) | Vibration produced by a faulty rolling element bearing |
1 Introduction Several techniques have been developed in order to detect faults in rolling element bearings. Techniques based on signal deconvolution (Wang et al. 2019a, b), matching pursuit (Qin et al. 2020), spectral coherence (Wang et al. 2019a, b), wavelet transform (Qin 2018), sparse feature identification (Du et al. 2015) and deep learning (Guo et al. 2019), have exhibited suitable effectiveness under constant rotational speed condition.
When the rotational speed is variable, the effectiveness of such techniques is degraded, and computational order tracking (COT) is more suitable (Wang et al. 2020). The implementation of COT requires the rotational speed, which can be measured by means of a tachometer. When a tachometer cannot be used, the rotational speed could be estimated from the vibration signal. Several algorithms have addressed the estimation of the rotational speed from the vibration signal (Lu et al. 2019; Hou et al. 2019; Peeters et al. 2019). For example, in (Bonnardot et al. 2005), an algorithm based on the demodulation of a harmonic of the gear mesh frequency and the subsequent signal resampling was proposed. In (Urbanek et al. 2013), the estimation of the instantaneous frequency was based on phase demodulation and a joint time-frequency analysis. Other works have also addressed the application of monocomponent signal decomposition. In (Zhao et al. 2013), this signal decomposition was used in order to apply phase demodulation to a harmonic of the gear mesh frequency for large variability of the rotational speed. However, in that work, the relation between the resulting monocomponent signals and the corresponding vibration source is very difficult to establish. Most of the aforementioned techniques require the participation of an expert, which introduces subjectivity into the procedure. Obviously, objective automation of the fault detection technique is desirable. In (Wang et al. 2020), an automatic method able to detect rolling element bearing faults under variable rotational speeds was proposed. This technique performs the speed estimation by working with the main component, or a certain harmonic, of the rotational speed, that is, the “baseband rotational speed signal”. The detection of this component and the identification of the harmonic order are carried out through surrogate tests.
The problem here is that such a baseband rotational speed signal (vibration produced by unbalances, misalignments, etc.) can have a very low magnitude. This work focuses on developing and validating an automated method for rolling element bearing fault detection under variable rotational speed conditions. The rotational speed is estimated from the vibration signal, and the estimation does not depend on the baseband rotational speed signal. This proposal requires that the rotating machinery include gear pairs, since it adapts the TARSE (Tacho-less Automatic Rotational Speed Estimation) technique (Ruiz et al. 2019) in order to estimate the rotational speed signal.
2 Vibration Produced by a Faulty Rolling Element Bearing When a rolling element bearing has a local fault, an impact is produced whenever each rolling element passes over the local fault (Hernández and Caveda 2007). Then, the resulting vibration is comprised by short and attenuated resonant impulses that occur whenever a rolling element passes over the fault. The detection of a faulty component can be accomplished through the identification of a vibration spectral pattern that involves spectral components separated by the frequency of such resonant impulses. The frequencies of the resonant impulses produced by a fault in some of the rolling element bearing components, known as fault characteristic frequencies, are determined as the product of the shaft rotational speed and a coefficient known as fault characteristic
coefficient. The main fault characteristic coefficients, αo for the outer race fault, αi for the inner race fault, αr for the rolling element fault, and αc for the cage fault, are determined as follows (Hernández and Atxa 2007):

$$\alpha_o = \frac{N_b}{2}\left(1 - \frac{B_d}{P_d}\cos\beta\right) \tag{1}$$

$$\alpha_i = \frac{N_b}{2}\left(1 + \frac{B_d}{P_d}\cos\beta\right) \tag{2}$$

$$\alpha_r = \frac{P_d}{2B_d}\left(1 - \frac{B_d^2}{P_d^2}\cos^2\beta\right) \tag{3}$$

$$\alpha_c = \frac{1}{2}\left(1 - \frac{B_d}{P_d}\cos\beta\right) \tag{4}$$

where Pd is the mean diameter, Nb is the number of rolling elements, Bd is the rolling element diameter and β is the contact angle. The spectrum of the vibration produced by a faulty rolling element bearing, working under a constant shaft rotational speed, corresponds to the spectrum of an amplitude modulation signal, which consists of spectral lines located around the excited resonant frequencies and separated by the fault characteristic frequency. Most techniques implemented in order to detect rolling element bearing faults are based on the fact that, if a fault is present, such a characteristic spectrum is obtained. However, in the case of variable rotational speeds, the spectrum of the modulation smears around the resonant frequencies, which makes the identification of spectral components by traditional techniques a very difficult task. The vibration produced by a faulty rolling element bearing can be characterized by the following expression (Borghesani et al. 2013):

$$x(n) = \sum_{i=-\infty}^{+\infty} A_a\, s(n - T_i) + w(n) \tag{5}$$
where Aa is the amplitude modulation, s(n) depicts the impulse response stimulated by the local defect and w(n) is the background noise. The time instant of impulse occurrence is characterized by Ti and calculated as i + Ψi (6) Ti = j=1 w j where the average angular speed between the j-th and the (j − 1)-th impulse is depicted as wj , is the average occurrence period in angular domain, and Ψi is the equivalent jitter in the angular domain.
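The fault characteristic coefficients of expressions (1)–(4) can be evaluated with the short sketch below. The function name and the bearing geometry are hypothetical and serve only as an illustration; a fault characteristic frequency is then obtained as the product of the corresponding coefficient and the shaft rotational speed.

```python
import math

def fault_coefficients(n_b, b_d, p_d, beta_rad):
    """Fault characteristic coefficients following expressions (1)-(4).

    n_b: number of rolling elements, b_d: rolling element diameter,
    p_d: mean diameter (same length unit as b_d), beta_rad: contact angle [rad].
    """
    r = b_d / p_d * math.cos(beta_rad)
    return {
        "outer": n_b / 2 * (1 - r),                  # expression (1)
        "inner": n_b / 2 * (1 + r),                  # expression (2)
        "rolling": p_d / (2 * b_d) * (1 - r * r),    # expression (3)
        "cage": 0.5 * (1 - r),                       # expression (4)
    }

# Hypothetical geometry: 9 rolling elements, Bd = 7.94 mm, Pd = 39.04 mm, beta = 0
coeffs = fault_coefficients(n_b=9, b_d=7.94, p_d=39.04, beta_rad=0.0)

# Fault characteristic frequencies for an assumed shaft speed of 25 Hz
f_r = 25.0
fault_freqs = {name: c * f_r for name, c in coeffs.items()}
```

Two simple consistency checks follow from the expressions themselves: αo + αi = Nb, and αo = Nb·αc.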
3 Model of the Gear Vibration Signal
The gear vibration signal can be described by the sum of modulated sinusoids at frequencies equal to multiples of the gear mesh frequency, as follows (Bonnardot et al. 2005; Ruiz et al. 2019; Villa et al. 2011; Ruiz et al. 2017):

$$x(n) = \sum_{k=1}^{K} X_k\,[1 + a_k(n)]\cos\{2\pi k T f_r(n)\, n + \varphi_k\} \tag{7}$$
where Xk and ϕk are the amplitude and initial phase of the k-th harmonic of the gear mesh frequency, respectively, K is the number of harmonics, fr(n) is the rotational speed, T is the teeth number of the gear, Tfr(n) is the gear mesh frequency, and ak(n) is the amplitude modulation function related to the shaft rotational speed, expressed as follows (Ruiz et al. 2019):

$$a_k(n) = A_i \sin(2\pi f_r(n)\, n) + A_o \sin(2\pi m_V f_r(n)\, n) \tag{8}$$
where $A_i$ and $A_o$ are the amplitudes of the modulation produced by the input and the output gear, respectively, and $m_V$ is the gear ratio. Thus, $f_o(n) = m_V f_r(n)$ is the rotational speed of the output gear.

3.1 A Gear Vibration Model Update Proposal

It is well known that the instantaneous frequency, $F(t)$, of a frequency-modulated signal is determined as $F(t) = \frac{1}{2\pi}\frac{dP(t)}{dt}$, where $P(t)$ is the angle of the modulated sinusoid (Carlson 2002). If the instantaneous frequencies of the modulated sinusoids in expression (7) are computed, it will be noticed that such sinusoids are not located at the expected multiples of the gear mesh frequency; instead, their instantaneous frequencies involve terms proportional to $df_r(n)/dn$. Only if $f_r(n)$ is constant do the instantaneous frequencies equal the expected ones. Likewise, the instantaneous frequencies corresponding to the first and the second terms in expression (8) do not equal multiples of $f_r(n)$ and $m_V f_r(n)$, respectively. That is why a modification of the gear vibration model is proposed in this work:

$$x(n) = \sum_{k=1}^{K} X_k\,[1 + a_k(n)]\cos\left\{2\pi k T\,T_s \sum_{i=0}^{n} f_r(i) + \phi_k\right\} \tag{9}$$

$$a_k(n) = \sum_{j=1}^{J}\left[A_{ij}\sin\left(2\pi j\,T_s \sum_{i=0}^{n} f_r(i)\right)\right] + \sum_{l=1}^{L}\left[A_{ol}\sin\left(2\pi m_V\,l\,T_s \sum_{i=0}^{n} f_r(i)\right)\right] \tag{10}$$
where $T_s$ is the sampling period. Since several harmonics of the first and the second sinusoidal terms in expression (8) could appear, $J$ and $L$ denote the number of such harmonics, respectively. Clearly, in the case of expression (9), $F(n) = kTf_r(n)$, and the instantaneous frequencies of the sinusoidal terms in expression (10) are the expected ones.
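The difference between the two phase definitions can be checked numerically. The sketch below, under assumed values (a hypothetical linear speed ramp, an 18-tooth gear and a 1 ms sampling period, with the sample index in expression (7) read as the time $nT_s$), builds the phase of the first mesh harmonic both ways and estimates the instantaneous frequency with a backward difference.

```python
import math

def phase_product(f_r, k, teeth, ts):
    """Phase in the style of expression (7): 2*pi*k*T*f_r(n)*(n*Ts)."""
    return [2 * math.pi * k * teeth * f * n * ts for n, f in enumerate(f_r)]

def phase_cumulative(f_r, k, teeth, ts):
    """Phase in the style of expression (9): 2*pi*k*T*Ts*sum_{i=0}^{n} f_r(i)."""
    phase, total = [], 0.0
    for f in f_r:
        total += f
        phase.append(2 * math.pi * k * teeth * ts * total)
    return phase

def instantaneous_frequency(phase, ts):
    """Backward difference of the phase divided by 2*pi*Ts, in Hz."""
    return [(phase[n] - phase[n - 1]) / (2 * math.pi * ts) for n in range(1, len(phase))]

# Assumed values, for illustration only.
ts, teeth, k = 1e-3, 18, 1
f_r = [10.0 + 0.01 * n for n in range(1000)]  # speed ramp from 10 Hz to about 20 Hz

expected = [k * teeth * f for f in f_r[1:]]   # k * T * f_r(n)
f_prod = instantaneous_frequency(phase_product(f_r, k, teeth, ts), ts)
f_cum = instantaneous_frequency(phase_cumulative(f_r, k, teeth, ts), ts)

err_prod = max(abs(a - b) for a, b in zip(f_prod, expected))
err_cum = max(abs(a - b) for a, b in zip(f_cum, expected))
print(err_prod, err_cum)
```

With the product form the deviation from $kTf_r(n)$ grows with $n$ times the speed slope, whereas the accumulated form reproduces $kTf_r(n)$ up to rounding error, which is the behavior the updated model is meant to provide.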
4 The Automatic Method for Rolling Element Bearing Fault Detection for Systems Comprised by Gears

In this work, a method for automatic and tacho-less detection of faults in rolling element bearings under variable rotational speed conditions, suitable for mechanical systems that include gears, is proposed. This method is based on the application of monocomponent signal decomposition techniques and Lock-in Amplifiers (LIA) (Ruiz et al. 2017) in order to identify the modulation produced by a fault in the rolling element bearing. Since the method must work under variable rotational speed conditions, the rotational speed signal is estimated by processing the vibration produced by the gear pair. The method consists of two main steps:

1. Application of a monocomponent signal decomposition procedure to the vibration signal. In this work, Singular Spectrum Decomposition (SSD) (Bonizzi et al. 2014) was used, since this technique has been successfully applied to vibration analysis (Ruiz et al. 2019; Calzadilla et al. 2020).
2. Identification of the delivered monocomponent signal that corresponds to the modulation produced by the rolling element bearing fault (if such a fault exists, some delivered monocomponent signal is expected to be closely related to the vibration it produces). If this modulation is not detected, a no-fault condition is assumed.

The modulation detection is implemented by computing the normalized modulation detection index presented in (Ruiz et al. 2019) and expressed as follows:
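$$D_N(f_c, f_{mod})_3 = \frac{\left|E\left\{Z(f_c - f_{mod})\,Z(f_c + f_{mod})\,|Z(f_c)|\,e^{-j2\,\mathrm{ang}[Z(f_c)]}\right\}\right|^2}{E\left\{\left|Z(f_c - f_{mod})\,Z(f_c + f_{mod})\right|^2\right\}\,E\left\{|Z(f_c)|^2\right\}} \tag{11}$$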
where $E\{\cdot\}$ is the expected value operator, $Z(f)$ is the value of the spectral component of the signal under analysis at frequency $f$, $f_c$ and $f_{mod}$ are the frequencies of the carrier signal and the modulating signal, respectively, and $\mathrm{ang}[Z(f_c)]$ is the phase of the spectral component of the carrier signal. This index is computed for every signal delivered by the monocomponent signal decomposition procedure. For a given monocomponent signal, the idea behind expression (11) is to detect the modulation whose carrier frequency equals the frequency of the monocomponent signal, $f_{res}$, and whose modulating frequency equals the product of the shaft rotational speed, $f_r$, and a fault characteristic coefficient, $\alpha$. That is, if the monocomponent signal is the modulation produced by a fault in any of the rolling element bearing components, then $D_N(f_{res}, \alpha \cdot f_r)_3$ yields a high value (close to 1). The calculation is performed for the four possible values of $\alpha$.

This method requires the values of the spectral components at frequencies $f_{res} - \alpha \cdot f_r$, $f_{res}$ and $f_{res} + \alpha \cdot f_r$. Since the shaft rotational speed can vary, the Fourier transform is not suitable for this purpose. That is why these spectral values are determined by using LIA (Gaspar et al. 2004), in the same way as in (Ruiz et al. 2019). In order to obtain the value of the spectral component at frequency $f_{res}$, the LIA's reference signal is a sinusoid whose angle equals the angle of the analytic signal of the monocomponent signal under analysis. This angle is expressed as follows:
$$\theta_i(n) = \tan^{-1}\left(\frac{\tilde{g}_i(n)}{g_i(n)}\right) \tag{12}$$

where $\tilde{g}_i(n)$ is the Hilbert transform of the monocomponent signal $g_i(n)$.
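To make the structure of the index in expression (11) concrete, here is a minimal sketch that estimates it from a set of complex spectral values, one triple per realization (for instance, per analysis segment). It assumes the reading of expression (11) in which the carrier enters the numerator through its magnitude and twice its phase, and it replaces the expectation by a sample mean. The function name and the synthetic phasors are hypothetical.

```python
import cmath
import random

def modulation_index(z_lower, z_carrier, z_upper):
    """Sample-mean estimate of the normalized modulation detection index.

    z_lower, z_carrier, z_upper: sequences of complex spectral values at
    fc - fmod, fc and fc + fmod, one entry per realization.
    """
    n = len(z_carrier)
    num, den_side, den_carrier = 0j, 0.0, 0.0
    for zl, zc, zu in zip(z_lower, z_carrier, z_upper):
        side = zl * zu
        # Removing twice the carrier phase aligns the sideband product
        # when the three components form an amplitude modulation.
        num += side * abs(zc) * cmath.exp(-2j * cmath.phase(zc))
        den_side += abs(side) ** 2
        den_carrier += abs(zc) ** 2
    return abs(num / n) ** 2 / ((den_side / n) * (den_carrier / n))

rng = random.Random(0)
carrier, lower, upper, lower_rand, upper_rand = [], [], [], [], []
for _ in range(500):
    phi = rng.uniform(-cmath.pi, cmath.pi)   # carrier phase of this realization
    psi = rng.uniform(-cmath.pi, cmath.pi)   # modulating-signal phase
    carrier.append(cmath.rect(1.0, phi))
    lower.append(cmath.rect(0.3, phi - psi))  # sidebands phase-locked to the carrier
    upper.append(cmath.rect(0.3, phi + psi))
    lower_rand.append(cmath.rect(0.3, rng.uniform(-cmath.pi, cmath.pi)))
    upper_rand.append(cmath.rect(0.3, rng.uniform(-cmath.pi, cmath.pi)))

d_modulated = modulation_index(lower, carrier, upper)
d_unrelated = modulation_index(lower_rand, carrier, upper_rand)
print(d_modulated, d_unrelated)
```

By the Cauchy-Schwarz inequality the estimate stays between 0 and 1. Phase-locked sidebands push it toward 1, while components with unrelated phases average out and leave it near 0, which matches the interpretation given in the text.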
The spectral components at frequencies $f_{res} - \alpha \cdot f_r$ and $f_{res} + \alpha \cdot f_r$ are calculated by using two LIA. The reference signal of the first LIA is a sinusoid whose angle is the angle of the analytic signal of the monocomponent signal under analysis minus the angle of a sinusoid with frequency $\alpha \cdot f_r$. The reference signal of the second LIA is a sinusoid whose angle is the angle of the analytic signal of the monocomponent signal plus the angle of a sinusoid with frequency $\alpha \cdot f_r$. The angle of the sinusoid whose frequency equals the rotational frequency is estimated through the TARSE algorithm (Ruiz et al. 2019), which is capable of providing the rotational speed signal from the vibration produced by gear pairs.

The proposed method processes the vibration produced by a gear pair and the vibration produced by a faulty rolling element bearing separately. Accordingly, it can be convenient to apply a filtering procedure that separates the spectral bands of these signals whenever possible. Figure 1 shows a diagram of the proposed method.
Fig. 1. Method for automatic and tacho-less detection of rolling element bearing fault in mechanical systems comprised by gears
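The role of the three reference angles can be illustrated with a very simple digital lock-in sketch: the signal is mixed with a complex exponential that follows the reference angle, and the mean of the product acts as the low-pass filter. This is an assumption-laden illustration, not the implementation used in the chapter; the function names, the constant 1 kHz carrier, the speed ramp and the modulation depth are all hypothetical.

```python
import cmath
import math

def accumulate_phase(freq_hz, ts):
    """Angle (rad) obtained by accumulating an instantaneous frequency in Hz."""
    phase, total = [], 0.0
    for f in freq_hz:
        total += f
        phase.append(2.0 * math.pi * ts * total)
    return phase

def lock_in(x, theta_ref):
    """Complex amplitude of the component of x that follows theta_ref.

    Mixing with exp(-j*theta_ref) moves that component to DC; averaging
    over the whole record plays the role of the low-pass filter.
    """
    acc = sum(xn * cmath.exp(-1j * th) for xn, th in zip(x, theta_ref))
    return 2.0 * acc / len(x)

# Hypothetical test signal: carrier modulated at alpha * f_r(n).
ts, n_samples, alpha = 1e-4, 20000, 4.0
f_r = [10.0 + 2.0 * n / n_samples for n in range(n_samples)]   # 10 Hz to 12 Hz
theta_mod = accumulate_phase([alpha * f for f in f_r], ts)
theta_car = accumulate_phase([1000.0] * n_samples, ts)
x = [(1.0 + 0.5 * math.cos(tm)) * math.cos(tc) for tm, tc in zip(theta_mod, theta_car)]

z_carrier = lock_in(x, theta_car)
z_lower = lock_in(x, [tc - tm for tc, tm in zip(theta_car, theta_mod)])
z_upper = lock_in(x, [tc + tm for tc, tm in zip(theta_car, theta_mod)])
print(abs(z_carrier), abs(z_lower), abs(z_upper))
```

Because the references track the varying angles sample by sample, the sidebands are recovered with amplitude close to half the modulation depth (0.25 here) even though the modulating frequency drifts, which is the reason a fixed-frequency Fourier analysis is replaced by LIA in this setting.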
4.1 On the Application of TARSE's Approach

The TARSE algorithm is a method designed for the estimation of the rotational speed signal from the vibration produced by a gear pair (Ruiz et al. 2019). The algorithm works in three stages. The first stage decomposes the vibration signal into monocomponents. Since the mechanical system includes gear pairs, one monocomponent signal is expected to correspond to expression (9), that is, to the vibration produced by the gear pair. The second stage addresses the detection of this monocomponent signal, which must consist of an amplitude modulation whose carrier frequency equals the gear mesh frequency or any of its harmonics. This detection is accomplished by computing the index $D_N(f_c, f_{mod})_3$, with $f_c$ corresponding to the gear mesh frequency and $f_{mod}$ corresponding to the expected rotational speed. In the third stage, the angle of the selected monocomponent signal, $\theta_k(n)$, is determined by means of expression (12). Finally, the angle of the rotational speed signal is computed as follows:
$$\theta_r(n) = \frac{\theta_k(n)}{kT} \tag{13}$$
However, the harmonic order of the gear mesh frequency represented by the selected monocomponent signal is difficult to identify, because high values of $D_N(f_c, f_{mod})_3$ can be obtained for different values of $k$. This issue can lead TARSE to an incorrect estimate of the rotational speed signal. That is why, in this work, the signal obtained at the third stage of TARSE is not taken directly as the rotational speed signal; instead, the selected monocomponent signal is assumed, in turn, to be the first, the second or the third harmonic of the meshing frequency. When this is applied to the detection of the amplitude modulation produced by a fault in the rolling element bearing, the modulation will be detected only if the right harmonic number of the meshing frequency has been assumed.
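The three harmonic-order assumptions amount to applying expression (13) with $k = 1, 2, 3$ to the same monocomponent angle. The sketch below shows this with a hypothetical, noise-free example in which the selected monocomponent is really the second mesh harmonic of an 18-tooth gear turning at a constant 10 Hz; the function names and all numerical values are assumptions made for illustration.

```python
import math

def candidate_rotation_angles(theta_k, teeth, harmonics=(1, 2, 3)):
    """Apply expression (13) once per assumed harmonic order k."""
    return {k: [th / (k * teeth) for th in theta_k] for k in harmonics}

def mean_frequency(theta, ts):
    """Average frequency (Hz) implied by an angle sequence sampled every ts seconds."""
    return (theta[-1] - theta[0]) / (2 * math.pi * ts * (len(theta) - 1))

# Hypothetical scenario: the monocomponent is the 2nd mesh harmonic.
ts, teeth, f_r, true_k = 1e-3, 18, 10.0, 2
theta_k = [2 * math.pi * true_k * teeth * f_r * ts * n for n in range(1000)]

candidates = candidate_rotation_angles(theta_k, teeth)
speeds = {k: mean_frequency(th, ts) for k, th in candidates.items()}
print(speeds)
```

Only the assumption that matches the true harmonic order returns the actual shaft speed; the others are off by the ratio of the true to the assumed order, so the modulating frequency $\alpha \cdot f_r$ tested afterwards is also off by that ratio.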
5 Working with Simulation Signals

In order to verify the performance of the proposed method, several experiments were carried out with simulated signals. The simulated signal combined the vibration produced by a fault in the outer race of a rolling element bearing (modeled by expressions (5) and (6)), the vibration produced by a gear pair (modeled by expressions (9) and (10)) and Gaussian background noise. The fault characteristic coefficient, $\alpha_o$, was chosen to be 4, and the rotational speed profile shown in Fig. 2 was generated and used. The resonant frequency excited by the rolling element bearing fault was set to 9 kHz. The gear was simulated with an 18-tooth pinion, and the gear vibration consisted of the first and second harmonics of the meshing frequency, modulated by the first harmonic of the rotational speed. Gaussian noise was added to the signal to obtain a Signal-to-Noise Ratio (SNR) of 10 dB. The work was performed by using Matlab.
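The chapter's simulations were done in Matlab and the noise-generation code is not given. As a hedged illustration of how noise can be scaled to reach a prescribed SNR, the following sketch uses a placeholder sinusoid as the clean signal; the function names, the seed and the test signal are hypothetical.

```python
import math
import random

def add_noise_for_snr(signal, target_db, rng):
    """Add zero-mean Gaussian noise scaled so that the realized SNR equals target_db."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    # SNR(dB) = 10*log10(p_signal / p_noise_scaled), solved for the noise gain.
    gain = math.sqrt(p_signal / (p_noise * 10.0 ** (target_db / 10.0)))
    scaled = [gain * v for v in noise]
    return [s + v for s, v in zip(signal, scaled)], scaled

def measure_snr_db(signal, noise):
    """SNR in dB from the average powers of the clean signal and the noise."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(v * v for v in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)

rng = random.Random(1)
clean = [math.sin(2 * math.pi * 0.01 * n) for n in range(5000)]  # placeholder signal
noisy, noise = add_noise_for_snr(clean, 10.0, rng)
measured = measure_snr_db(clean, noise)
print(measured)
```

Scaling by the measured noise power, rather than by its nominal variance, makes the realized SNR exact for the generated record, which is convenient when comparing results at 10 dB and 2 dB.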
Fig. 2. Profile of the rotational speed signal for the simulation signal
The first task implemented was the estimation of the rotational speed through the gear pair vibration in the simulation signal.
Since the maximum meshing frequency was about 848 Hz (see Fig. 2), a low-pass filter with a cutoff frequency of 2 kHz was applied in order to isolate the gear pair vibration and deliver it to TARSE. The first stage of TARSE performed the monocomponent decomposition of the signal, and the index $D_N(f_c, f_{mod})_3$ was calculated for all delivered signals. For each delivered monocomponent signal, this index was computed with $f_c$ set to the frequency obtained by applying the Hilbert transform to the monocomponent signal and $f_{mod}$ set to $f_c$ divided by the number of gear teeth. Figure 3 reveals that two monocomponent signals were delivered and that the highest value of $D_N(f_c, f_{mod})_3$ was achieved by monocomponent signal number 2. Thus, this monocomponent signal was assumed to be the one related to the meshing frequency (i.e., the signal related to the rotational speed signal) and was used to estimate the rotational speed signal. However, it is not possible to know which harmonic order of the meshing frequency the chosen monocomponent signal represents. That is why, according to the algorithm, the detection of the modulation produced by the rolling element bearing fault is performed by assuming that this monocomponent signal is the first, the second or the third harmonic of the meshing frequency. If the modulation is not detected under any of these assumptions, a no-fault condition is declared.
Fig. 3. Values of $D_N(f_c, f_{mod})_3$ for the monocomponent signals delivered by the TARSE's monocomponent signal decomposition technique (simulation signal, SNR = 10 dB)
Afterwards, the SSD algorithm was applied to the unfiltered vibration signal and the index $D_N(f_c, f_{mod})_3$ was computed for the delivered monocomponent signals. In this case, $f_c$ was set to the frequency obtained by applying the Hilbert transform to the monocomponent signal, and $f_{mod}$ was set to (1) the product of $\alpha_o = 4$ and the rotational speed signal already estimated; (2) the product of $\alpha_o = 4$ and half of the estimated rotational speed; and (3) the product of $\alpha_o = 4$ and a third of the estimated rotational speed. These three choices account for the possibility that the rotational speed estimate is based on the first, the second, or the third harmonic of the meshing frequency. For the three choices of $f_{mod}$, the highest values of the computed $D_N(f_c, f_{mod})_3$ are presented in Table 1. Table 1 reveals that a very high value (the highest one) of $D_N(f_c, f_{mod})_3$ was obtained by assuming that the monocomponent signal chosen by the TARSE algorithm corresponded to the first harmonic of the meshing frequency. Since a very high value of $D_N(f_c, f_{mod})_3$ was achieved at this stage, the modulation produced by the rolling element bearing fault is assumed to exist, and the existence of the fault is confirmed. The values of $D_N(f_c, f_{mod})_3$, computed under the assumption that the selected monocomponent signal corresponds to the first harmonic of the meshing frequency, are shown in Fig. 4.

Table 1. Highest values of $D_N(f_c, f_{mod})_3$ obtained for each harmonic assumption (simulation signal, SNR = 10 dB).

Order of the assumed meshing frequency harmonic (TARSE algorithm) | Highest value of $D_N(f_c, f_{mod})_3$ (detection of the modulation produced by the faulty bearing)
1st | 0.9978
2nd | 0.6004
3rd | 0.7319
Fig. 4. Values of $D_N(f_c, f_{mod})_3$ for the monocomponent signals delivered in the stage of detection of faulty bearing modulation (simulation signal, SNR = 10 dB)
In order to study the behavior of $D_N(f_c, f_{mod})_3$, in particular when combinations of $f_c$ and $f_{mod}$ do or do not coincide with spectral components of a modulation, this index was computed for $f_{mod} = \alpha \cdot f_r$, where $\alpha$ was varied from 1 to 15.5 (a range that includes the value of the fault characteristic coefficient, $\alpha_o = 4$). Figure 5 shows the result of this computation. Despite the high variability exhibited by the values of $D_N(f_c, f_{mod})_3$, the highest values can be observed around $\alpha = 4$, 8 and 12, that is, at multiples of $\alpha_o$. The variability of the curve of $D_N(f_c, f_{mod})_3$ as a function of $\alpha$ can be smoothed by a filtering procedure. In this work, a 9th-order average filter was used. This specific filter was also used for the rest of the experimental work. The smoothed version of $D_N(f_c, f_{mod})_3$, denoted as $\overline{D}_N(f_c, f_{mod})_3$, is shown in Fig. 6. Like $D_N(f_c, f_{mod})_3$, $\overline{D}_N(f_c, f_{mod})_3$ shows its highest values at multiples of $\alpha_o$.

The next experiment analyzed the performance of the proposed method at a lower SNR. The same procedure was repeated for SNR = 2 dB.
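The smoothing step can be sketched as a plain moving average over the curve of index values versus $\alpha$. The sketch assumes that the "9th-order average filter" is a centered 9-point window that shrinks at the ends of the curve; the chapter does not specify these details, and the function name and test data are hypothetical.

```python
def moving_average(values, taps=9):
    """Centered moving average; the window shrinks near the ends of the sequence."""
    half = taps // 2
    smoothed = []
    for n in range(len(values)):
        lo = max(0, n - half)
        hi = min(len(values), n + half + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Hypothetical curve: a single isolated spike is spread over nine points.
curve = [0.0] * 20
curve[10] = 9.0
smoothed = moving_average(curve)
print(smoothed)
```

An isolated spike is attenuated by the window length, while a peak that persists over several neighboring values of $\alpha$ keeps most of its height, which is why the maxima at multiples of the fault coefficient remain visible after smoothing.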
Fig. 5. Plot of $D_N(f_c, f_{mod})_3$ for the monocomponent signal number 1, delivered in the stage of detection of faulty bearing modulation (simulation signal, SNR = 10 dB)
Fig. 6. Plot of $\overline{D}_N(f_c, f_{mod})_3$ for the monocomponent signal number 1, delivered in the stage of detection of faulty bearing modulation (simulation signal, SNR = 10 dB)
Four monocomponent signals were delivered by the monocomponent signal decomposition procedure involved in the estimation of the rotational speed. The values of $D_N(f_c, f_{mod})_3$ obtained for these signals are shown in Fig. 7. This figure shows that the highest value of $D_N(f_c, f_{mod})_3$ was achieved by monocomponent signal number 2. This monocomponent signal was therefore chosen for the detection of the modulation produced by the rolling element bearing fault ($\alpha_o = 4$). As in the previous simulation, this monocomponent signal was assumed to be the first, the second or the third harmonic of the meshing frequency, and the index $D_N(f_c, f_{mod})_3$ was computed for every monocomponent signal delivered by the monocomponent decomposition technique. The highest values of the computed $D_N(f_c, f_{mod})_3$ for each assumption of harmonic order were then analyzed. These values are presented in Table 2.
Fig. 7. Values of $D_N(f_c, f_{mod})_3$ for the monocomponent signals delivered by the TARSE's monocomponent signal decomposition technique (simulation signal, SNR = 2 dB)

Table 2. Highest values of $D_N(f_c, f_{mod})_3$ obtained for each harmonic assumption (simulation signal, SNR = 2 dB).

Order of the assumed meshing frequency harmonic (TARSE algorithm) | Highest value of $D_N(f_c, f_{mod})_3$ (detection of the modulation produced by the faulty bearing)
1st | 0.9575
2nd | 0.6129
3rd | 0.6414
As in the previous experiment, Table 2 reveals that a very high value (the highest one) of $D_N(f_c, f_{mod})_3$ was obtained by assuming that the chosen monocomponent signal corresponded to the first harmonic of the meshing frequency. Such a high value of $D_N(f_c, f_{mod})_3$ indicates the existence of the modulation produced by the rolling element bearing fault and, therefore, the existence of the fault. The values of $D_N(f_c, f_{mod})_3$ computed (in order to detect the modulation produced by the faulty bearing) for each monocomponent signal delivered by the SSD algorithm, under the assumption that the previously selected monocomponent signal corresponded to the first harmonic of the gear mesh frequency, are shown in Fig. 8. The behavior of the smoothed version of $D_N(f_c, f_{mod})_3$, $\overline{D}_N(f_c, f_{mod})_3$, for monocomponent signal number 2 delivered by the SSD algorithm, with $f_{mod} = \alpha \cdot f_r$, is shown in Fig. 9. The behavior was very similar to that obtained in the previous experiment: modulations were detected for $\alpha = \alpha_o$, $\alpha = 2\alpha_o$ and $\alpha = 3\alpha_o$. Comparing Fig. 6 with Fig. 9 reveals a decrease of the top values. This means that as the SNR diminishes, the values of $D_N(f_c, f_{mod})_3$ tend to diminish as well.
Fig. 8. Values of $D_N(f_c, f_{mod})_3$ for the monocomponent signals delivered in the stage of detection of faulty bearing modulation (simulation signal, SNR = 2 dB)
Fig. 9. Plot of $\overline{D}_N(f_c, f_{mod})_3$ for the monocomponent signal number 2, delivered in the stage of detection of faulty bearing modulation (simulation signal, SNR = 2 dB)
6 Working with Real Signals

The performance of the proposed algorithm was also validated with real signals. These signals were provided by data-acoustics.com and came from a 2 MW wind turbine whose high-speed shaft was driven by a 20-tooth pinion and was supported by a rolling element bearing with an inner race fault. Several vibration measurements were gathered over 50 days. The sampling frequency was 97.656 kHz. The fault characteristic coefficient for the inner race, $\alpha_i$, was 9.46. Since the spectral components of the resonant bearing vibrations are located in a spectral band between 9 kHz and 11 kHz and the maximum rotational speed is about 32 Hz (hence, the top meshing frequency is about 20 · 32 Hz = 640 Hz) (Bechhoefer et al. 2013), the gear and bearing vibrations can be separated spectrally by filtering. The method was applied to the vibration signal registered on day 50, when the rolling element bearing fault was expected to be most severe.

As in the work with simulation signals, the first step addressed the selection of the decomposed monocomponent signal related to the gear vibration. Thus, the rolling element bearing vibration was first filtered out of the vibration signal (cutoff frequency equal to 2 kHz). Then, the SSD algorithm was applied to the resulting signal and the values of $D_N(f_c, f_{mod})_3$ were computed for each delivered monocomponent signal (see Fig. 10). The highest value of $D_N(f_c, f_{mod})_3$ was achieved by monocomponent signal number 1, which was therefore chosen for the next stage.
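The spectral separation argument can be made explicit with the figures quoted above. The following worked arithmetic uses only values stated in the text (20 teeth, a maximum speed of about 32 Hz, $\alpha_i = 9.46$); the subscript "max" is notation introduced here for illustration.

```latex
\begin{aligned}
f_{mesh,\max} &= T\, f_{r,\max} = 20 \times 32\ \mathrm{Hz} = 640\ \mathrm{Hz},
  \qquad 3 f_{mesh,\max} = 1920\ \mathrm{Hz} < 2\ \mathrm{kHz} \\
f_{mod,\max} &= \alpha_i\, f_{r,\max} = 9.46 \times 32\ \mathrm{Hz} \approx 302.7\ \mathrm{Hz} \\
f_{res} - f_{mod,\max} &\gtrsim 9\ \mathrm{kHz} - 0.3\ \mathrm{kHz} = 8.7\ \mathrm{kHz} \gg 2\ \mathrm{kHz}
\end{aligned}
```

Under these figures, the first three mesh harmonics stay below the 2 kHz cutoff, while the bearing resonance band and its first-order sidebands remain far above it, so a single low-pass filter is enough to isolate the gear vibration for the speed estimation stage.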
Fig. 10. Values of $D_N(f_c, f_{mod})_3$ for the monocomponent signals delivered by the TARSE's monocomponent signal decomposition technique (real vibration signal, day 50)
By assuming that monocomponent signal number 1 was the first, second or third harmonic of the meshing frequency, the detection of the modulation produced by the faulty bearing was performed. The unfiltered vibration signal was decomposed into monocomponent signals and the values of $D_N(f_c, f_{mod})_3$ were computed in order to find the modulation produced by a bearing fault with characteristic coefficient $\alpha_i = 9.46$. For the three harmonic order assumptions, the highest values of $D_N(f_c, f_{mod})_3$ are presented in Table 3.

Table 3. Highest values of $D_N(f_c, f_{mod})_3$ obtained for each harmonic assumption (real vibration signal, day 50).

Order of the assumed meshing frequency harmonic (TARSE algorithm) | Highest value of $D_N(f_c, f_{mod})_3$ (detection of the modulation produced by the faulty bearing)
1st | 0.4924
2nd | 0.4935
3rd | 0.7393
Table 3 reveals that the highest value of $D_N(f_c, f_{mod})_3$, 0.7393, was achieved when the chosen monocomponent signal was assumed to be the third harmonic of the meshing frequency. It can therefore be concluded that the modulation produced by a fault in the rolling element bearing exists, and the existence of the fault is confirmed. The values of $D_N(f_c, f_{mod})_3$, computed for every decomposed monocomponent signal under the assumption that the previously selected monocomponent signal corresponded to the third harmonic of the gear mesh frequency, are shown in Fig. 11. This figure shows that the monocomponent signal corresponding to the vibration produced by the faulty bearing was monocomponent number 4 (this signal yielded the highest value of $D_N(f_c, f_{mod})_3$).
Fig. 11. Values of $D_N(f_c, f_{mod})_3$ for the monocomponent signals delivered in the stage of detection of faulty bearing modulation (real vibration signal, day 50)
The behavior of $D_N(f_c, f_{mod})_3$, applied to this monocomponent signal, for $f_{mod} = \alpha \cdot f_r$, is shown in Fig. 12. This figure shows that the modulation was detected for $\alpha \approx \alpha_i = 9.46$, which is consistent with the results obtained with simulation signals.
Fig. 12. Plot of $D_N(f_c, f_{mod})_3$ for the monocomponent signal number 4, delivered in the stage of detection of faulty bearing modulation (real vibration signal, day 50)
In order to study the behavior of the computed $D_N(f_c, f_{mod})_3$ as the fault severity increases, the same procedure was applied to the vibration signals gathered on previous days. The behavior of $D_N(f_c, f_{mod})_3$ for days 20, 33, 48 and 50 is shown in Fig. 13. Figure 13 shows that the value of $D_N(f_c, f_{mod})_3$ for $\alpha = \alpha_i = 9.46$ increases as the operation time of the rolling element bearing increases. This means that as the operation time increases, the fault severity increases too; thus the magnitude of the arising modulation is larger. In this case, the method began to detect the rolling element bearing fault more clearly from day 33 onwards. This demonstrates that the method is also suitable as a relative fault severity measure.

Fig. 13. Plots of $D_N(f_c, f_{mod})_3$ (real vibration signal, days 20, 33, 48 and 50)

Another interesting issue is revealed in Fig. 13: local maximum values were obtained around $\alpha = 2, 3, \ldots$ In the context of an inner race fault, these maxima indicate the detection of a modulation whose modulating frequency equals the rotational speed and its multiples. Such a modulation also appears in the vibration produced by a fault in the inner race of a rolling element bearing. That is why maximum values of $D_N(f_c, f_{mod})_3$ can be obtained at multiples of the rotational frequency: $\alpha = 1, 2, 3, \ldots$ (the value of $D_N(f_c, f_{mod})_3$ for $\alpha = 1$ is not shown in Fig. 13 because it is not needed for fault detection purposes).
7 Conclusions

In this work, a new method was proposed for tacho-less detection of rolling element bearing faults, to be applied to mechanical systems that include gears under variable rotational speed conditions. Since the proposed method uses the TARSE algorithm, information about the rotational speed is not required. The TARSE algorithm (based on the vibration produced by a gear pair) constitutes a useful tool to estimate the rotational speed signal whenever the baseband rotational speed signal cannot be feasibly retrieved. The proposed method was able to detect the modulation produced by a faulty rolling element bearing, and hence the presence of the fault, through the application of a monocomponent signal decomposition technique and Lock-in Amplifiers for computing an indicator of modulation existence, $D_N(f_c, f_{mod})_3$.
Several experiments were carried out in order to validate the proposed method. In this work, a limitation of the TARSE algorithm was stated: the signal it delivers is not exactly the rotational speed signal, because the harmonic order of the selected mesh component is unknown. However, the effective application of the TARSE algorithm to the detection of rolling element bearing faults was confirmed. The capability of the indicator $D_N(f_c, f_{mod})_3$ to act as a relative measure of the fault severity was shown; it was verified that the value of this indicator increases as the fault becomes more severe. In addition, a new description of the gear vibration model was proposed as a solution to some limitations found in previously published models.

Acknowledgements. The authors would like to thank data-acoustics.com for supplying the real vibration signals used in this work.
References

Bechhoefer, E., Van Hecke, B., He, D.: Processing for improved spectral analysis. In: Annual Conference of the Prognostics and Health Management Society, pp. 14–17 (2013). https://www.phmsociety.org/sites/phmsociety.org/files/phm_submission/2013/phmc_13_006.pdf. Accessed 1 Feb 2021
Bonizzi, P., Karel, J.M.H., Meste, O., Peeters, R.L.M.: Singular spectrum decomposition: a new method for time series decomposition. Adv. Adapt. Data Anal. 06(04), 1450011 (2014). https://doi.org/10.1142/S1793536914500113
Bonnardot, F., El Badaoui, M., Randall, R.B., Danière, J., Guillet, F.: Use of the acceleration signal of a gearbox in order to perform angular resampling (with limited speed fluctuation). Mech. Syst. Signal Process. 19(4), 766–785 (2005). https://doi.org/10.1016/j.ymssp.2004.05.001
Borghesani, P., Ricci, R., Chatterton, S., Pennacchi, P.: A new procedure for using envelope analysis for rolling element bearing diagnostics in variable operating conditions. Mech. Syst. Signal Process. 38(1), 23–35 (2013). https://doi.org/10.1016/j.ymssp.2012.09.014
Calzadilla, A., Catalá, A., Hernández, F.E., Rodríguez, A., Ruiz, M.L.: Assessing the monocomponent decomposition technique able to more accurately deliver the vibration produced by a gear. J. Appl. Res. Technol. 18(3) (2020). https://doi.org/10.22201/icat.24486736e.2020.18.3.1087
Carlson, B., Crilly, P.: Communication Systems: An Introduction to Signals and Noise in Electrical Communication, 5th edn. McGraw Hill (2010)
Chapin, L. (ed.): Communication Systems. ITIFIP, vol. 92. Springer, Boston, MA (2002). https://doi.org/10.1007/978-0-387-35600-6
Du, Z., Chen, X., Zhang, H., Yan, R.: Sparse feature identification based on union of redundant dictionary for wind turbine gearbox fault diagnosis. IEEE Trans. Industr. Electron. 62(10), 6594–6605 (2015). https://doi.org/10.1109/TIE.2015.2464297
Gaspar, J., Chen, S.F., Gordillo, A., Hepp, M., Ferreyra, P., Marqués, C.: Digital lock in amplifier: study, design and development with a digital signal processor. Microprocess. Microsyst. 28(4), 157–162 (2004). https://doi.org/10.1016/j.micpro.2003.12.002
Guo, L., Lei, Y., Xing, S., Yan, T., Li, N.: Deep convolutional transfer learning network: a new method for intelligent fault diagnosis of machines with unlabeled data. IEEE Trans. Industr. Electron. 66(9), 7316–7325 (2019). https://doi.org/10.1109/TIE.2018.2877090
Hernández Montero, F.E., Atxa Uribe, V.: Aplicación de técnicas clásicas y avanzadas de procesamiento de vibraciones al diagnóstico de cojinetes. Análisis experimental. Ingeniería Mecánica 10(1), 71–82 (2007). https://www.redalyc.org/pdf/2251/225117649010.pdf. Accessed 1 Feb 2021
Hernández Montero, F.E., Caveda Medina, O.: Consideraciones para la aplicación del procesamiento ciclo estacionario avanzado al diagnóstico de cojinetes de rodamientos. Ingeniería Mecánica 10(2), 41–46 (2007). https://www.redalyc.org/pdf/2251/225117646005.pdf. Accessed 1 Feb 2021
Hou, B., Wang, Y., Tang, B., Qin, Y., Chen, Y., Chen, Y.: A tacholess order tracking method for wind turbine planetary gearbox fault detection. Measurement 138, 266–277 (2019). https://doi.org/10.1016/j.measurement.2019.02.010
Lu, S., Yan, R., Liu, Y., Wang, Q.: Tacholess speed estimation in order tracking: a review with application to rotating machine fault diagnosis. IEEE Trans. Instrum. Meas. 68(7), 2315–2332 (2019). https://doi.org/10.1109/TIM.2019.2902806
Peeters, C., et al.: Review and comparison of tacholess instantaneous speed estimation methods on experimental vibration data. Mech. Syst. Signal Process. 129, 407–436 (2019). https://doi.org/10.1016/j.ymssp.2019.02.031
Qin, Y.: A new family of model-based impulsive wavelets and their sparse representation for rolling bearing fault diagnosis. IEEE Trans. Industr. Electron. 65(3), 2716–2726 (2018). https://doi.org/10.1109/TIE.2017.2736510
Qin, Y., Zou, J., Tang, B., Wang, Y., Chen, H.: Transient feature extraction by the improved orthogonal matching pursuit and K-SVD algorithm with adaptive transient dictionary. IEEE Trans. Industr. Inf. 16(1), 215–227 (2020). https://doi.org/10.1109/TII.2019.2909305
Ruiz Barrios, M.L., Hernández Montero, F.E., Gómez, J.C., Palomino Marín, E.: Tacho-less automatic rotational speed estimation (TARSE) for a mechanical system with gear pair under non-stationary conditions. Measurement 145, 480–494 (2019). https://doi.org/10.1016/j.measurement.2019.05.085
Ruiz Barrios, M.L., Hernández Montero, F.E., Gómez, J.C., Palomino, M.E.: Application of lock-in amplifier on gear diagnosis. Measurement 107, 120–127 (2017). https://doi.org/10.1016/j.measurement.2017.05.015
Urbanek, J., Barszcz, T., Antoni, J.: A two-step procedure for estimation of instantaneous rotational speed with large fluctuations. Mech. Syst. Signal Process. 38(1), 96–102 (2013). https://doi.org/10.1016/j.ymssp.2012.05.009
Villa, L.F., Reñones, A., Perán, J.R., de Miguel, L.J.: Angular resampling for vibration analysis in wind turbines under non-linear speed fluctuation. Mech. Syst. Signal Process. 25(6), 2157–2168 (2011). https://doi.org/10.1016/j.ymssp.2011.01.022
Wang, D., Zhao, X., Kou, L.-L., Qin, Y., Zhao, Y., Tsui, K.-L.: A simple and fast guideline for generating enhanced/squared envelope spectra from spectral coherence for bearing fault diagnosis. Mech. Syst. Signal Process. 122, 754–768 (2019a). https://doi.org/10.1016/j.ymssp.2018.12.055
Wang, Y., Tang, B., Qin, Y., Huang, T.: Rolling bearing fault detection of civil aircraft engine based on adaptive estimation of instantaneous angular speed. IEEE Trans. Industr. Inf. 16(7), 4938–4948 (2020). https://doi.org/10.1109/TII.2019.2949000
Wang, Z., et al.: Research and application of improved adaptive MOMEDA fault diagnosis method. Measurement 140, 63–75 (2019b). https://doi.org/10.1016/j.measurement.2019.03.033
Zhao, M., Lin, J., Wang, X., Lei, Y., Cao, J.: A tacho-less order tracking technique for large speed variations. Mech. Syst. Signal Process. 40(1), 76–90 (2013). https://doi.org/10.1016/j.ymssp.2013.03.024
Health Monitoring of Moving/Rotary Structures: An Electromechanical Impedance Approach Using Integrated Piezoceramic Transducers

Hamidreza Hoshyarmanesh1(B) and Ali Abbasi2
1 Project neuroArm, Health Research Innovation Center, Cumming School of Medicine, University of Calgary, Calgary, AB T2N4Z6, Canada [email protected]
2 Advanced Engineering Systems for Industry, Algoritmi Research Center, School of Engineering, University of Minho, Braga, Portugal [email protected]
Abstract. This chapter presents the development of a novel structural health monitoring methodology for incipient damage detection in moving structures. The method builds on low-cost deposition of piezoceramic transducers on crucial substrates together with electromechanical impedance (EMI) measurement to provide a practical solution for damage/fault detection in moving structures that suffer from mechanical fatigue, thermal fatigue or corrosion, before any catastrophic failure. Important steps towards the application of such technology are: i) chemo-physical fabrication of piezoceramic transducers, i.e., deposition of precursor solutions or piezoelectric materials, on geometrically irregular structures; ii) development of a portable impedance-based transceiver and signal processing algorithm capable of transmitting actuating signals to, and receiving and analyzing response signals from, piezoelectric transducers; iii) development of a monitoring algorithm; and iv) verification of the methodology through a series of semi-field tests on a prototype of a moving structure. Multiple characterizations are performed to assure the micro- and macro-structural functionality, accuracy, and precision of the fabricated transducers. To verify the functionality of the custom-built system, the electromechanical response of the transducers and the results obtained from the transceiver are compared with commercially available piezoelectric wafers and standard impedance analyzers. The frequency response of the transducers, measured over a wide bandwidth, shows an obvious frequency shift and a change in the admittance/impedance amplitude corresponding to the resonance/antiresonance peaks at pristine vs. damaged conditions. Furthermore, for rotary structures, rotational speed and temperature play an important role in this method. Application of this method could be extended to moving structures such as airplane engine blades, fuselage frames, wing ribs, helicopter main rotor assembly, critical parts of exploration rovers, satellite loaded modules and thrusters, moving links/joints on the international space station, rotors in hydroelectric or nuclear power plants, autonomous underwater vehicles, submarine propellers, hydro/diving planes, wind turbines, hot/cold rollers in steel production lines, drones, mobile robots, etc.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 235–264, 2022. https://doi.org/10.1007/978-3-030-82110-4_13
H. Hoshyarmanesh and A. Abbasi
Keywords: Structural health monitoring · Moving/rotary structures · Electromechanical impedance · Piezoceramic transducer · Thin/thick film deposition · Image processing · Portable transceiver · High temperature · Turbine blade
1 Introduction
Structural health monitoring (SHM) of moving structures has recently been in demand in different industries in the context of predictive maintenance, prognosis and diagnosis. SHM tasks can be classified as damage detection, damage evaluation (size and shape), damage localization, and life-span estimation. Investigators around the world have studied this field in three specific categories: i) preparation, deposition, and characterization of smart sensors [1–9]; ii) innovations in wave propagation and non-destructive SHM methods [10–16]; and iii) implementation of SHM methods in semi-field applications [17–20]. Although many challenges have been addressed for stationary SHM, major shortcomings still limit the use of conventional techniques for moving SHM (MSHM), where crucial structures suffer from fatigue, corrosion, or chemical disruption.
1.1 Importance of MSHM
Figure 1 illustrates the importance of SHM in moving structures in different domains. Figure 1A depicts the catastrophic uncontained compressor failure on the 8-year-old McDonnell Douglas MD-88 in July 1996 [21]. Figure 1B shows the Qantas A380 engine failure in December 2010. Australian investigators found the engine turbine blades had either fractured or broken away causing an explosion in the Boeing 747’s Rolls Royce RB211 engine. The distinctive double-decker Airbus cost £156 m and came into service in September 2007 [22]. Figure 1C shows the emergency landing of a Southwest Airlines plane in Arizona after a section of fuselage tore away during flight in April 2011. Investigators said the rip began where two outer panels were riveted together, and that the area around it showed evidence of pre-existing cracking due to fatigue [23]. Figure 1D shows the corroded hull of the Alliance, a Royal Navy submarine, laid down towards the end of the Second World War and completed in 1947 [24].
Figure 1E illustrates the damage to the Boeing 777 Pratt & Whitney PW4077 engine of United Airlines flight UA328, which failed in February 2021 due to metal fatigue. Based on a preliminary assessment, two fan blades were fractured [25]. Figure 1F depicts the wheel damage sustained by NASA’s Mars rover “Curiosity” in 2014 [26].
1.2 Failure Sources in Moving Structures
In addition to errors originating from or associated with design, manufacturing, assembly, inspection, and maintenance, the most important factors that cause failure in moving components are:
Health Monitoring of Moving/Rotary Structures
Fig. 1. A) Catastrophic failure of McDonnell Douglas MD-88 in 1996, B) Qantas A380 engine failure in 2010, C) Southwest Airlines plane emergency landing in Arizona in April 2011, D) Corrosion in submarine Alliance, E) Fan blade fracture in the Pratt & Whitney PW4077 engine of United Airlines flight UA328, and F) Damage to Curiosity’s left-middle wheel.
• All types of wear: abrasive, adhesive, erosive, third-body contact, fatigue, corrosive, electrical discharge, fretting, and cavitation
• Oxidation: gets worse as temperature increases
• Surface cracks: due to mechanical vibrations, fatigue, or transient thermal stresses
• Hot corrosion: emerges at high temperatures through penetration and chemical reaction in the presence of impurities and active gases
Once damage is initiated in a crucial structure in the form of a micro-crack, micro-hole, or thinning, it expands over time to some depth below the surface and may reduce the mechanical strength and eventually cause failure. Such a failure may end in an irrecoverable disaster. This could be avoided by using an efficient predictive SHM technique.
1.3 Conventional Methods Used for MSHM
A number of non-destructive SHM methods exist for stationary structures; nevertheless, only a few methods have thus far been used for health monitoring of moving structures. Table 1 presents three examples of such methods together with their advantages and disadvantages.
Table 1. Comparison of three conventional MSHM methods
| Method | Advantages | Disadvantages |
|---|---|---|
| High speed thermal imaging | Non-destructive method; wide scan imaging; damage detection and localization | Costly method that requires an ultra-high-speed camera (1 M frame/s); low accuracy at high speeds; installation and implementation complexity |
| Tip-timing | Non-destructive method; ease of installation and implementation | Unable to localize the damage; cannot detect incipient damages, e.g., micro-cracks; not an applicable solution for prognostic MSHM; used for foreign object damage (FOD) detection |
| Fiber optics [27] | High accuracy and repeatability; high sensitivity | Fragile, not a robust method for MSHM; cannot resist high temperatures |
In thermal imaging, developed and used by Siemens, ultra-high-speed IR cameras (1 million frames per second) are deployed to capture abnormal thermal radiation and distribution on the surface of crucial structures, which are caused by a higher heat dissipation rate in damage initiation zones [20, 28]. This is not yet a successful MSHM method in practice, as a structure moving at thousands of revolutions per minute would cause blurry images and probably prevent the visual monitoring system from detecting incipient damages in time. A stationary counterpart of this method was SIEMAT, developed by Siemens to work at low temperatures. This method uses ultrasound wave propagation to heat the structure. As the temperature distribution is inhomogeneous around the damage, IR cameras can detect the damage, its size, and its location in stationary mode [29]. The tip-timing method was unveiled by the Qinetiq industrial group in the UK in 2004. In this method, acoustic emission vibration sensors and high-temperature, contamination-immune eddy current sensors were deployed to monitor rotary structures and turbine engines working at high speeds [19]. The system was controlled through a digital signal processor (DSP). Siemens Industrial Turbomachinery Ltd. is one of the largest industrial groups that has recently been testing this method on gas turbine engines. In this
method, eddy current sensors are fitted into the fan casing and detect the blade passing. The DSP then computes the speed signal (engine speed) from the tip-timing data and detects foreign objects or consequent damages. As a foreign object enters the engine and hits the blades, it will cause some deflection of the blades and therefore change the speed of rotation. Figure 2 shows online health monitoring of turbine engines based on the tip-timing method. This method is not able to detect incipient damages on the blades either, as there is a gap between the sensors and the moving structure under test. Abnormal vibration usually occurs only when some damage has grown sufficiently in a part and hence engages neighboring parts of the structure. Detection of such faults may be too late to avoid catastrophic failures.
Fig. 2. Tip-timing sensors fitted in the Spey fan casing of a turbine engine, Qinetiq [19]
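The speed computation that the text attributes to the DSP can be sketched as follows. This is a minimal illustration under stated assumptions, not the Qinetiq implementation: a single casing-mounted probe is assumed, the function names are hypothetical, and the blade count and timestamps are synthetic.

```python
def rotor_speed_rpm(arrival_times_s, n_blades):
    """Estimate rotor speed from consecutive blade-arrival times at one probe."""
    if n_blades < 1 or len(arrival_times_s) < 2:
        raise ValueError("need at least one blade and two arrival times")
    intervals = [t1 - t0 for t0, t1 in zip(arrival_times_s, arrival_times_s[1:])]
    mean_interval = sum(intervals) / len(intervals)
    # One revolution corresponds to n_blades blade passings.
    return 60.0 / (n_blades * mean_interval)


def arrival_deviations_s(arrival_times_s, n_blades, rpm):
    """Deviation of each arrival from its evenly spaced expected time."""
    pitch_s = 60.0 / (rpm * n_blades)  # expected time between two blades
    t0 = arrival_times_s[0]
    return [t - (t0 + k * pitch_s) for k, t in enumerate(arrival_times_s)]


# Synthetic data (assumption): 20 blades at 3000 rpm, i.e. 1 ms between blades.
arrivals = [k * 0.001 for k in range(41)]
arrivals[7] += 5e-6  # one blade arrives 5 microseconds late (deflected tip)

rpm = rotor_speed_rpm(arrivals, n_blades=20)
deviations = arrival_deviations_s(arrivals, n_blades=20, rpm=rpm)
late_blades = [k for k, d in enumerate(deviations) if abs(d) > 1e-6]
print(rpm, late_blades)
```

A late or early arrival only reveals a deflection that is already large enough to shift the blade tip, which is consistent with the limitation noted above for incipient damage.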
To address the issues associated with the methods mentioned, here we present the development of a novel low-cost MSHM method based on the electromechanical impedance spectrum of integrated piezoelectric transducers. This method is able to continuously monitor crucial stationary and moving structures during operation, with no need to shut the machines down. It mitigates the challenges of conventional methods while enhancing the ability of monitoring systems to control losses, decrease maintenance/repair costs, reduce life-threatening failures, and save lives.
1.4 Constituent Elements of EMI-Based MSHM
An effective MSHM system should entirely or partially include the following elements:
• Reliable, light-weight, and integrated sensors/actuators resistant to wear, corrosion, and temperature
• An accurate and low-cost transceiver to be used as both wave generator and impedance analyzer
• A precise software algorithm for damage detection, evaluation, localization, and lifetime estimation
Complementary equipment, including but not limited to measurement devices, coating machines, characterization machines, amplifier, signal processor, human machine
interface (HMI), data logger, and wireless data transmission ICs, are also employed in MSHM. Researchers have so far used several types of sensors for non-destructive SHM, such as strain gauges, electroactive polymers, fiber optics, piezoelectric wafer active sensors (PWAS), magnetostrictive sensors, resistive sensors, eddy current sensors, and laser holographic sensors [27]. Among these, piezoelectric sensors have gained the most attention due to their higher accuracy and simpler implementation. However, the use of piezoelectric sensors in MSHM poses considerable challenges.
For MSHM, a network of active transducers is distributed at different spots on the substrate structure. The transducers can actively stimulate the structure (indirect actuation effect) and at the same time sense the signals reflected back from the host structure (direct sensory effect). A transceiver is used to dynamically excite the substrate structure in the neighborhood of a specific transducer. The transceiver records the transmitted and received signals via an embedded microcontroller and passes the data to the processing algorithm. Because the transceiver is always attached to and moves with the moving structure, it must be mounted securely and have a robust architecture.
An intelligent algorithm processes the paired transmitted and received signals of each transducer in the network, which are selected in sequential order by a multiplexer. This algorithm can predict the presence, size, shape, and location of the damage by analyzing the frequency response of the paired signals. Transducers may be located close to or far from the damage location. The distance between the transducer and the damage location, together with the damage size and shape/depth, has a significant impact on the frequency response of each transducer.
A similar impact on the frequency response will be introduced by the speed of movement or temperature of the moving structures. The algorithm is trained by the data collected from transducers, located in the neighborhood (a circle around each transducer determined by the piezoelectric coefficients of the transducers) or far from a potential damage. The algorithm could be programmed on a DSP chip, which is onboard as part of the transceiver electrical circuit or installed on a PC. In the former method, the DSP processes the signals and sends out a damage recognition message. In the latter method, the data logger sends the analog signals out to a PC-based processing unit [30–34].
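The comparison of pristine and damaged frequency responses described above can be sketched with a root-mean-square deviation (RMSD) index, a metric widely used in EMI studies. This is a minimal illustration, not the authors' algorithm: the Lorentzian test spectra, the function names, and the 5% alarm threshold are all assumptions made for the example.

```python
import math


def rmsd_damage_index(baseline, measured):
    """RMSD (%) between two spectra sampled at the same frequencies."""
    if len(baseline) != len(measured) or not baseline:
        raise ValueError("spectra must be non-empty and equally long")
    num = sum((m - b) ** 2 for b, m in zip(baseline, measured))
    den = sum(b ** 2 for b in baseline)
    return 100.0 * math.sqrt(num / den)


def peak_frequency(freqs_hz, spectrum):
    """Frequency at which the spectrum reaches its maximum."""
    i = max(range(len(spectrum)), key=spectrum.__getitem__)
    return freqs_hz[i]


def lorentzian_peak(freqs_hz, f0_hz, width_hz, height):
    """Hypothetical resonance peak used only to build synthetic spectra."""
    return [height / (1.0 + ((f - f0_hz) / width_hz) ** 2) for f in freqs_hz]


# Synthetic admittance magnitudes (assumption): 100-200 kHz in 100 Hz steps.
freqs = [100e3 + 100.0 * i for i in range(1001)]
pristine = lorentzian_peak(freqs, 150e3, 1e3, 1.0)
damaged = lorentzian_peak(freqs, 148e3, 1e3, 0.9)  # shifted, lower peak

index = rmsd_damage_index(pristine, damaged)
shift_hz = peak_frequency(freqs, damaged) - peak_frequency(freqs, pristine)
ALARM_THRESHOLD_PERCENT = 5.0  # hypothetical value, to be calibrated per transducer
damage_flag = index > ALARM_THRESHOLD_PERCENT
print(index, shift_hz, damage_flag)
```

Because speed and temperature also shift the spectrum, a baseline recorded under matching operating conditions would be needed before such an index could be attributed to damage.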
2 Smart Piezoceramic Transducers
In this section we introduce the synthesis of piezoceramic solutions together with fabrication of transducers on a sample part, followed by characterization and poling of the fabricated transducers.
2.1 Synthesis Process
There are several methods to fabricate smart piezoceramic transducers. Selecting the most suitable method depends on the application of the transducers. Among the various methods, physical vapor deposition (PVD), pulsed laser deposition (PLD), reactive magnetron pulsed DC sputtering (RMPDS), screen printing, and sol-gel seem to be of
most interest to researchers and experts in this field. These methods, compared in Table 2, allow for fabrication of thick film transducers (greater than 1 μm) as well as thin films. To attain higher stimulation strength in MSHM, the thickness of the transducers is a key factor. Sol-gel is applied in three ways: spin coating, spray coating, and hybrid. Spin coating can only accommodate a small flat substrate and is hence not suitable for structures that comprise curved elements/features. Spray coating, though it simplifies implementation and speeds up the deposition rate, cannot provide a dense uniform film on curved surfaces. Therefore, we believe hybrid sol-gel is the best candidate for fabrication of piezoceramic transducers on moving/rotary substrates [33]. In our study, fabrication of thick piezoceramic films on various substrates with different geometrical shapes using the hybrid sol-gel method has shown superior results compared to spin coating [32, 33, 35].
Table 2. Conventional methods for fabrication of smart piezoceramic thin/thick film transducers
Fabrication methods
Advantages
Disadvantages
Physical vapor deposition (PVD)
• Ability to fabricate thick films • Thickness uniformity • Low porosity
• Deposition speed 650 °C [12]
Piezoelectric coefficient
Relatively high piezoelectric coefficient [5, 33]
Relatively low piezoelectric coefficient [6]
To prepare the PZT solution, lead 2-ethylhexanoate, zirconium 2-ethylhexanoate (96%), and titanium isopropoxide (97%), with Pb:Zr:Ti = 1.1:0.52:0.48, were dissolved in hexane (96%). The BiT solution was also made by dissolving titanium isopropoxide, lead 2-ethylhexanoate and zirconium 2-ethylhexanoate (99.9%), with Bi:Ti = 4.4:3, in hexane [35].
[Fig. 3 flowchart. Precursor preparation: 0.225 ml titanium isopropoxide, dissolved in hexane and stirred for 10 minutes at 80 °C; + 0.55 ml lead 2-ethylhexanoate, stirred for 10 minutes at 80 °C; + 0.568 g zirconium 2-ethylhexanoate, stirred for 4 hours at 80 °C. Subsequent steps: PZT solution preparation, masking and dropping, drying, pyrolyzing + UV curing, annealing.]
Fig. 3. Sol-gel preparation process for A) PZT and B) BiT
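The molar ratios quoted for the two solutions can be compared with the nominal formulas Pb(Zr,Ti)O3 and Bi4Ti3O12 in a few lines. The sketch below is only an arithmetic check; the function name is hypothetical, and the interpretation of the surplus as a deliberate excess is an assumption, since the text does not state the reason for it.

```python
def excess_percent(a_moles, b_moles, a_nominal, b_nominal):
    """Percent surplus of cation A relative to the nominal A:B ratio."""
    return 100.0 * ((a_moles / b_moles) / (a_nominal / b_nominal) - 1.0)


# PZT: Pb:Zr:Ti = 1.1:0.52:0.48, nominal Pb:(Zr+Ti) = 1:1
pzt_pb_excess = excess_percent(1.1, 0.52 + 0.48, 1.0, 1.0)
pzt_zr_fraction = 0.52 / (0.52 + 0.48)  # Zr share of the Zr+Ti content

# BiT: Bi:Ti = 4.4:3, nominal Bi:Ti = 4:3 in Bi4Ti3O12
bit_bi_excess = excess_percent(4.4, 3.0, 4.0, 3.0)

print(round(pzt_pb_excess, 6), round(pzt_zr_fraction, 6), round(bit_bi_excess, 6))
```

Both recipes come out at roughly the same relative surplus of Pb and Bi, which suggests a common margin was applied to the two solutions.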
2.2 Deposition of Thick Films on the Substrate
For semi-field tests, pristine turbine blades of a jet engine compressor were selected. To deposit thick-film piezoceramic transducers on the blades, the parts must first be rinsed in ethanol, methanol, and acetone to remove surface contamination. This is followed by taping and masking the borders while leaving the film zone uncovered. At this stage, gold or platinum bottom electrodes are formed on the exposed areas of the substrate using thermal evaporation or DC sputtering deposition (Fig. 4). The piezoceramic layers are deposited on top of the bottom electrodes, which will later be hard-wired to the transceiver. It was observed that conventional sol-gel deposition of PZT and BiT solutions did not yield the desired thick films of up to 100–200 μm, even with successive repetition of the process. Therefore, the infiltration technique was deployed. In this technique, composites of PZT/PZT (PZT powder suspended in PZT solution) and PZT/BiT (PZT powder suspended in BiT solution) were used. To conduct the infiltration process accurately, considerable attention must be paid to the appropriate mixing of the powder and piezoelectric solution, i.e., nano-scale powder size, stoichiometric ratio, stirring time, and dispersion quality, in order to form a dense and uniform suspension, strengthen the surface tension forces, suspend the powder homogeneously, and avoid agglomeration before and after deposition.
Fig. 4. A) Masking the sample parts (superalloy blades IN718 and 738) and B) Deposition of gold bottom electrodes
Thus, we combined the hybrid sol-gel method with ultraviolet (UV) curing, which is categorized under the photochemical metal-organic techniques. This method takes advantage of UV irradiation to help break the large ligands in the microstructural grain network of the material. This reduces the probability of surface crack formation in the films and improves the electrophysical properties of the piezoceramic transducer [35]. In conventional methods, polyvinylpyrrolidone (PVP) polymer is mostly used to create a crack-free, low-porosity thick piezoelectric film. Although many properties improve when the porosity is reduced, adding polymeric impurities has a negative impact on the performance of the transducer. We have not used any additives in the chemo-physical structure of the proposed films. The diagram in Fig. 5 illustrates the consecutive steps of the photochemical hybrid sol-gel deposition used in our experiments.
Fig. 5. Photochemical hybrid sol-gel deposition method for fabrication of piezoceramic transducers
The dropping method was chosen, instead of spray or spin coating, to deposit the solutions on the convex surface of the blades in two or three spots. Upon completion of the post-processing steps, gold electrodes are deposited on top of the piezoceramic thick films using thermal evaporation. The top electrodes are also hard-wired to the transceiver for data transmission. Figure 6 shows sample PZT thick films deposited on the curved surface of iron-nickel-based superalloy blades IN718 and IN738, with gold top electrodes, using the hybrid sol-gel infiltration method.
After deposition, the liquid/gel films undergo a post-processing procedure composed of several steps: drying, pyrolyzing, optical curing, and annealing. The process is then repeated multiple times to obtain the desired thickness. The main challenges associated with this process were the formation of surface cracks, non-homogeneity, and non-uniform thickness, all of which were resolved by controlling the dilution of the PZT solution and optimizing the post-processing time and temperature.
Fig. 6. PZT thick films deposited on nickel-based superalloy blades A) IN718 and B) IN738, with gold top electrodes, using the hybrid sol-gel infiltration technique; C) Masked blades after deposition of top gold electrodes
2.3 Poling Phenomenon
Once the piezoelectric thick film transducers and top electrodes are deposited, the tape mask is removed and the transducers are subjected to a process called poling. This process enhances the piezoelectric properties by aligning the dipoles. There are two conventional methods, contact and non-contact poling, in both of which the piezoelectric material is exposed to a high electric field. As we believed the contact method was easier to implement, safer, and required lower voltage levels, it was used in our tests to polarize the deposited piezoceramic transducers. Two groups of parameters are taken into account: i) poling temperature, applied electric field, and polarizing time, and ii) material type, thickness, porosity, film size, and number of layers. Figure 7 shows the schematic view of the contact poling circuit. An alternating current (AC) power supply feeds the circuit. The poling process begins with laying the samples in a silicone oil bath. The generated voltage is rectified and directly applied to the poling electrodes, which are in contact with the top electrodes of the thick films. A temperature sensor, a thermostat, and a heater are used to control the temperature of the silicone oil bath.
Fig. 7. A) Schematic view of the contact poling circuit, and B) Experimental setup for contact poling.
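For contact poling, the voltage that must be applied across a film follows from the target field and the film thickness, V = E · δ, assuming a uniform field between the electrodes. The sketch below only performs this unit conversion; the function name is hypothetical and the numerical values are illustrative rather than taken from the chapter's poling table.

```python
def poling_voltage_v(field_kv_per_cm, thickness_um):
    """Voltage (V) across a film for a uniform field E = V / thickness."""
    field_v_per_m = field_kv_per_cm * 1e3 / 1e-2  # kV/cm -> V/m
    thickness_m = thickness_um * 1e-6             # micrometres -> metres
    return field_v_per_m * thickness_m


# Illustrative values (assumption): 60 kV/cm across 50 um and 100 um films.
v_50um = poling_voltage_v(60.0, 50.0)
v_100um = poling_voltage_v(60.0, 100.0)
print(v_50um, v_100um)
```

For a fixed field, the required voltage scales linearly with thickness, so thicker films call for a proportionally higher supply voltage.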
[Table: poling parameters of the fabricated transducers. Columns: Sample; Pos.; δ (μm); Material; Size (mm²); Layers; Porosity (%); tp ± 5 s (min); Tp ± 1 (°C); Ep (kV/cm). Recoverable entries: samples 1, 2, and 3 at positions A and B; film thickness δ of 50 ± 1, 60 ± 1, and 100 ± 2 μm; materials PZT/PZT and PZT/BiT; film sizes 10 × 10 and 15 × 15 mm².]
[...] 350 °C seems to be challenging, if not impossible.
o Solution: High-temperature piezoelectric materials such as orthorhombic bismuth titanate (BiT) and gallium orthophosphate (GaPO4) are ideal candidates for high-temperature applications. The piezoelectric response (coefficients) of such materials is not as high as that of the PZT family; however, composite compounds of HT piezoceramics, e.g. BiT, and PZT have shown promising results up to 650 °C. Although this temperature range is far below the temperatures at which gas turbines and rocket propulsion systems work, there are still many crucial components in such systems that operate below 650 °C (e.g., the low-pressure and high-pressure compressors of a gas turbine engine).
• Macroscopic vibration of the structure and environmental noise may affect the signals and the accuracy of health monitoring.
o Solution: The test setup was dynamically balanced and macroscopic vibrations were minimized; in field applications, however, macroscopic displacements occur due to aerodynamic pressure, drag force, etc. These cause low-frequency mechanical vibrations that lie outside the monitoring frequency range.
• Abrasive and corrosive effects of high-energy particles colliding with the piezoelectric surface.
o Solution: Piezoceramics are wear resistant and could survive for the lifetime of the substrate. However, electrodes and contacts should be well protected.
To create a platform for semi-field tests and to verify/validate the proposed method, a prototype of the compressor of a jet engine was developed, as shown in Fig. 17. In this prototype, an electric motor rotates the blade-disk-rotor assembly at up to 3000 rpm. The housing of the transceiver is fixed to the moving rotor. The cooling tank and recirculating pump are located behind the disk. All transducers are hard-wired to the transceiver through the central hole of the disk, machined axially on the face surface, and the cooling tank. A fan-cooled copper radiator installed under the rotary assembly cools down the recirculating coolant. Thermal energy is transferred to the blades by means of an industrial forced-air electric heater. A control panel installed on the rear of the apparatus provides control features to adjust the rotational speed, measure the temperature, enable the cooling system, turn the transceiver on/off, and recharge the batteries.
Fig. 17. A) Temperature and pressure diagram of a turbine jet engine (fan and low pressure compressor compartments work at temperatures [...]
[...] 2000 rpm, and the monitoring station should be established and verified for sending encrypted packets on a specified bandwidth that carry brief information about fault detection, size, location, and perhaps estimated life-span.
• Developing a machine learning algorithm to automate the monitoring software and generate alarms before any dangerous failure.
References
1. Chen, X.: Preparation of PZT material with different Zi/Ti ratio by sol-gel method. Appl. Mech. Mater. 238, 105–108 (2012). https://doi.org/10.4028/www.scientific.net/AMM.238.105
2. Kažys, R., Voleišis, A., Voleišienė, B.: High temperature ultrasonic transducers: review. ULTRAGARSAS (ULTRASOUND) 63, 7–17 (2008)
3. Kobayashi, M., Olding, T.R., Sayer, M., Jen, C.K.: Piezoelectric thick film ultrasonic transducers fabricated by a sol-gel spray technique. Ultrasonics 39, 675–680 (2002)
4. Annamdas, V.G.M., Soh, C.K.: Multiple piezoceramic transducers (PZT): structure interaction model. In: Smart Structures and Materials 2006: Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems, p. 61743G. International Society for Optics and Photonics (2006)
5. Pandey, S.K., et al.: Structural, ferroelectric and optical properties of PZT thin films. Phys. B Condens. Matter. 369, 135–142 (2005)
6. McAughey, K., Burrows, S., Edwards, R.S., Dixon, S.: Investigation into the use of Bismuth Titanate as a High Temperature Piezoelectric Transducer, 8
7. Jeon, Y., Chung, J., No, K.: Fabrication of PZT thick films on silicon substrates for piezoelectric actuator. J. Electroceram. 4, 195–199 (2000)
8. Pérez, J., Vyshatko, N.P., Vilarinho, P.M., Kholkin, A.L.: Electrical properties of lead zirconate titanate thick films prepared by hybrid sol–gel method with multiple infiltration steps. Mater. Chem. Phys. 101, 280–284 (2007)
9. de la Cruz, J.P.: Piezoelectric thick films: preparation and characterization. Microelectromech. Syst. Devices (2012)
10. Giurgiutiu, V., Zagrai, A.N.: Embedded self-sensing piezoelectric active sensors for on-line structural identification. J. Vib. Acoust. 124, 116–125 (2001)
11. Park, G., Cudney, H.H., Inman, D.J.: An integrated health monitoring technique using structural impedance sensors. J. Intell. Mater. Syst. Struct. 11, 448–455 (2000)
12. Annamdas, V.G.M., Soh, C.K.: Application of electromechanical impedance technique for engineering structures: review and future issues. J. Intell. Mater. Syst. Struct. 21, 41–59 (2010)
13. Damage Prognosis: For Aerospace, Civil and Mechanical Systems | Wiley. In: Wiley.com
14. Giurgiutiu, V.: Structural Health Monitoring with Piezoelectric Wafer Active Sensors. Elsevier (2014)
15. Engineering Vibration, 4th edn. https://www.content/one-dot-com/one-dot-com/us/en/higher-education/program.html. Accessed 7 Mar 2021
16. Mook, G., Pohl, J., Michel, F.: Non-destructive characterization of smart CFRP structures. Smart Mater. Struct. 12, 997–1004 (2003)
17. Xu, B., Giurgiutiu, V.: A low-cost and field portable electromechanical (E/M) impedance analyzer for active structural health monitoring. South Carolina University of Columbia Department of Mechanical Engineering (2005)
18. Ouahabi, A., Thomas, M., Kobayashi, M., Jen, C.K.: Structural health monitoring of aerospace structures with sol-gel spray sensors. Key Eng. Mater. (2007)
19. Cardwell, D.N., Chana, K.S., Russhard, P.: The use of eddy current sensors for the measurement of rotor blade tip timing: sensor development and engine testing. American Society of Mechanical Engineers Digital Collection, pp. 179–189 (2009)
20. Andersson, O., Navrotsky, D.V., Santamaria, S.: Siemens’ medium size gas turbine continued product and operation improvement program, 25
21. The Final Report: Uncontained Engine Failure, Delta Air Lines Flight 1288, McDonnell Douglas MD-88, N927DA, Pensacola, Florida (1996)
22. The Final Report: In-Flight Uncontained Engine Failure Airbus A380-842, VH-OQA. Australian Transport Safety Bureau, Overhead Batam Island, Indonesia (2013)
23. The Final Safety Report: Southwest Airlines Flight 812 Failure. National Transportation Safety Board, Washington, D.C., Yuma, Arizona (2013)
24. Royal Navy Submarine Museum. In: National Museum of the Royal Navy. https://www.nmrn.org.uk/submarine-museum. Accessed 7 Mar 2021
25. Shepardson, D., Freed, J.: Damage to United Boeing 777 engine consistent with metal fatigue: NTSB. Report in Washington (2021). https://www.reuters.com/article/us-boeing-777-ntsb-idUSKBN2AN03S
26. Haggart, S., Waydo, J.: The mobility system wheel design for NASA’s mars science laboratory mission. In: 11th European Conference of the International Society for Terrain-Vehicle Systems. Torino, Italy, p. 19 (2008)
27. Addington, M., Schodek, D.: Smart Materials and Technologies: For the Architecture and Design Professions. Routledge, CRC Press
28. LeMieux, D.H.: On-line thermal barrier coating monitoring for real-time failure protection and life maximization. Semi-Annual Report-Siemens Westinghouse Power Corporation (United States) (2002)
29. SIEMAT: Florida Turbine Technologies, Inc. (2014). www.fttinc.com
30. Hoshyarmanesh, H., Abbasi, A.: Structural health monitoring of rotary aerospace structures based on electromechanical impedance of integrated piezoelectric transducers. J. Intell. Mater. Syst. Struct. 29, 1799–1817 (2018)
31. Hoshyarmanesh, H., Abbasi, A., Moein, P., Ghodsi, M., Zareinia, K.: Design and implementation of an accurate, portable, and time-efficient impedance-based transceiver for structural health monitoring. IEEE ASME Trans. Mechatron. 22, 2809–2814 (2017)
32. Hoshyarmanesh, H., Nehzat, N., Salehi, M., Ghodsi, M., Lee, H.-S., Park, H.-H.: Thickness and thermal processing contribution on piezoelectric characteristics of Pb(Zr-Ti)O3 thick films deposited on curved IN738 using sol–gel technique. Proc. Inst. Mech. Eng. Part J. Mater. Des. Appl. 229, 511–521 (2015)
33. Hoshyarmanesh, H., Ebrahimi, N., Jafari, A., Hoshyarmanesh, P., Kim, M., Park, H.-H.: PZT/PZT and PZT/BiT Composite piezo-sensors in aerospace SHM applications: photochemical metal organic + infiltration deposition and characterization. Sensors 19(1), 13 (2018)
34. Hoshyarmanesh, H., Ghodsi, M., Kim, M., Cho, H.H., Park, H.-H.: Temperature effects on electromechanical response of deposited piezoelectric sensors used in structural health monitoring of aerospace structures. Sensors 19, 2805 (2019)
35. Hoshyarmanesh, H., Ghodsi, M., Park, H.-H.: Electrical properties of UV-irradiated thick film piezo-sensors on superalloy IN718 using photochemical metal organic deposition. Thin Solid Films 616, 673–679 (2016)
36. Lazarevic, Z., Stojanovic, B.D., Varela, J.: An Approach to Analyzing Synthesis, Structure and Properties of Bismuth Titanate Ceramics (2005)
37. Hoshyarmanesh, H., Maddahi, Y.: Poling process of composite piezoelectric sensors for structural health monitoring: a pilot comparative study. IEEE Sens. Lett. 2, 1–4 (2018)
38. Bhalla, S., Yang, Y.W., Annamdas, V.G.M., Lim, Y.Y., Soh, C.K.: Impedance models for structural health monitoring using piezo-impedance transducers. In: Soh, C.-K., Yang, Y., Bhalla, S. (eds.) Smart Materials in Structural Health Monitoring, Control and Biomechanics, pp. 53–128. Springer, Berlin, Heidelberg (2012)
39. Zhou, S.-W., Liang, C., Rogers, C.A.: An impedance-based system modeling approach for induced strain actuator-driven structures. J. Vib. Acoust. 118, 323–331 (1996)
40. Bhalla, S., Soh, C.K.: Structural health monitoring by piezo-impedance transducers. I: Modeling. J. Aerosp. Eng. 17, 154–165 (2004)
41. Bhalla, S.K., Moharana, S.: Modelling of piezo-bond structure system for structural health monitoring using EMI technique. Key Eng. Mater. (2013). https://www.scientific.net/KEM.569-570.1234. Accessed 7 Mar 2021
Rub-Impact Fault Diagnosis of a Coal Crusher Machine by Using Ensemble Patch Transformation and Empirical Mode Decomposition
S. K. Laha, B. Swarnakar, Sourav Kansabanik, and K. J. Uke
CSIR-Central Mechanical Engineering Research Institute, MG Avenue, Durgapur 713209, West Bengal, India
[email protected]
Abstract. In this paper, a newly developed signal decomposition technique called Ensemble Patch Transformation (EPT) is used for the first time to identify mechanical faults of a hammer-type coal crusher machine. Rub-impact faults of a rotary machine result in amplitude modulation of the vibration signal. In this scenario, mode decomposition is essential, because the measured signal is a combination (either convolutive or additive) of many simpler signals. EPT is a multi-resolution signal analysis method inspired by the multi-scale concept of scale-space theory. Just as the Empirical Mode Decomposition (EMD) method decomposes a complex signal into a number of simpler signals called intrinsic mode functions (IMFs), the EPT process decomposes a complex signal into a main frequency component (FC) and a residual signal. The residual signal can in turn be decomposed into an FC and a residual, and this process can be repeated a number of times. Initially, the performance of the EPT is investigated on a simulated signal. Then, the same method is applied to detect a rub-impact fault of a hammer-type coal crusher machine in a steel plant. Finally, the superiority of the EPT-based method over the EMD-based method in extracting multiple signals is demonstrated.
Keywords: Empirical mode decomposition · Ensemble patch transform · Signal decomposition · Rub-impact · Fault diagnosis · Coal hammer crusher
1 Introduction
A coal crusher machine uses hammers to crush coal to the size required for better calorific value during combustion. This machine is at the heart of the Coal Handling Unit of any thermal power plant or steel plant. The pulverized coal is then fed into a furnace for combustion. Apart from the impact force from the hammers, dynamic forces are also present in the machine; these may be caused by mass unbalance, misalignment, looseness, a bent rotor, bearing faults, electrical problems, etc. High and abnormal vibration can be observed in the crusher in the presence of such faults.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 265–278, 2022. https://doi.org/10.1007/978-3-030-82110-4_14
When the radial clearance between the crusher rotor and the stator is reduced beyond an optimum limit, rubbing starts. Rubbing may cause dynamic instability of the rotor and may ultimately lead to catastrophic failure of the system. When rub and impact occur in a crusher machine, the vibration signal measured on it may consist of high synchronous, sub-synchronous and super-synchronous components along with noise. Therefore, conventional signal processing techniques such as the FFT may not be sufficient to isolate the rub-impact signature from the complicated vibration signal. Obuchowski et al. [1] studied the feasibility of detecting local damage of a rolling element bearing on a hammer crusher based on cyclic spectral coherence. Wyłomańska et al. [2] proposed a new segmentation method based on the regime switching model to cancel out large impulsive noise and enhance the signal-to-noise ratio. This methodology was used to detect a rolling element bearing fault in a copper ore crusher. Wyłomańska et al. [3] also proposed a stochastic model of source identification from crusher vibration signals to detect rolling element bearing faults. Lin et al. [4] calculated the dynamic response by using a simplified model of a rotor rubbing in its housing. Sun et al. [5] detected rubbing impacts on a high-speed rotor system by using non-linear dynamics. Choy [6] explained a modeling methodology to analyze the interaction of a complex system consisting of rotor, stator, bearings and blades. Oks et al. [7] identified rubbing by detecting higher harmonics. Smalley [8] examined the behaviour of rub-induced thermal bow vibration of a steam turbine during acceleration and deceleration. Chu [9] identified the location of rubbing in a rotary system with the help of acoustic emission and wavelet transform. Hu et al. [10] identified rubbing impacts at their early stages by using stochastic resonance.
Chu and Lu [11] demonstrated that the dynamic stiffness at a rubbing position is higher than at non-rubbing positions. They applied a dynamic stiffness identification method to locate rubbing positions in a multi-disk rotor. Peng et al. [12] investigated the characteristics of rub-impact by using the scalogram and wavelet phase spectrum on three signals. Auger et al. [13] developed the reassigned scalogram to overcome the limitations of the scalogram. Peng et al. [14] modeled and studied the rubbing fault between a rotor and stator by using the reassigned scalogram. They detected increased amplitude and a larger number of impacts in the high-frequency region as the rubbing increased. Huang et al. [15] proposed that Empirical Mode Decomposition (EMD) can be used successfully to decompose a complex signal into a set of Intrinsic Mode Functions (IMFs). EMD is a self-adaptive decomposition method in which the signal is decomposed according to its local characteristic time scale. It is one of the most popular and widely applied signal decomposition techniques for fault diagnosis of rotating machines. However, it suffers from the mode-mixing problem. Cheng et al. [16] applied EMD to separate the rub-impact signal from the background and noise signals. Wu et al. [17] proposed ensemble empirical mode decomposition (EEMD), a new signal decomposition technique, to overcome the limitations of EMD. In EEMD the signal characteristics are extracted accurately by adding white noise of finite amplitude, which uniformly fills the entire time-frequency space. Lei et al. [18] successfully applied the EEMD technique to identify the rub-impact fault of a power generator and the early rub-impact fault of a heavy oil catalytic cracking machine set. Dragomiretskiy et al. [19] proposed the Variational Mode Decomposition (VMD) method, which is a non-recursive, adaptive and variational method. In VMD a signal
can be decomposed into its principal modes by convex optimization. In that paper, they demonstrated their method on a series of real, artificial and complicated signals and found that it is superior to EMD and other decomposition methods for tone detection, tone separation and denoising. Wang et al. [20] demonstrated the superiority of VMD over other conventional decomposition techniques on a numerically simulated rubbing signal as well as on a practical vibration signal from a gas turbine rotor. Daubechies et al. [21] proposed the synchrosqueezed wavelet transform, which is a combination of wavelet analysis and a reallocation method. In this process, the current instantaneous frequency of each mode can be selected by appropriate wavelet scales. They also successfully applied the methodology to real and artificial data. Gilles [22] proposed the empirical wavelet transform (EWT). In EWT the different modes of a signal can be extracted by designing adaptive wavelet filter banks. Kim et al. [23] proposed the ensemble patch transform (EPT), a recently developed method for decomposing and filtering complicated signals that cannot be effectively decomposed by the existing decomposition methods. The ‘patch process’ and the ‘ensemble’ are the two main components of EPT. They successfully demonstrated the effectiveness of EPT on various artificial and real non-stationary, noisy signals.
The rest of the paper is structured as follows. EMD and EPT are described in Sect. 2. The performance of EPT is initially investigated on a simulated signal in Sect. 3. Then, the same method is applied to diagnose a rub-impact fault causing high vibration in a coal crusher unit in a steel plant. The machine specification is also described in Sect. 3. Further, a comparison between EPT and EMD is carried out in that section. Finally, some concluding remarks and the future scope of work are outlined in Sect. 4.
2 Mathematical Background
2.1 Empirical Mode Decomposition
Empirical Mode Decomposition (EMD) is a time-domain signal processing method in which the signal is decomposed into a number of simpler signals called Intrinsic Mode Functions (IMFs). By definition, an IMF has the following properties: (a) only one extremum between successive zero crossings and (b) a mean value of zero. EMD is well suited to analyzing non-linear and non-stationary signals. The algorithm for extracting IMFs through the so-called sifting procedure is outlined below:
1. Given a univariate signal $X(t)$, let $m_1$ be the mean of its upper and lower envelopes as determined from a cubic-spline interpolation of local maxima and minima.
2. The first component $h_1$ is computed as $h_1 = X(t) - m_1$.
3. In the second sifting process, $h_1$ is treated as the data, and $m_{11}$ is the mean of the upper and lower envelopes of $h_1$; thus $h_{11} = h_1 - m_{11}$.
4. This sifting procedure is repeated $k$ times, until $h_{1k}$ is an IMF, such that $h_{1(k-1)} - m_{1k} = h_{1k}$.
5. It is then designated as $c_1 = h_{1k}$, the first IMF component of the signal, which contains the shortest-period component of the signal. It is separated from the rest of the data by subtracting it from the original data: $X(t) - c_1 = r_1$. Since the residue $r_1$ still contains longer-period variations in the data, it is treated as the new data and subjected to the same sifting process as described above. The procedure is repeated on successive residues $r_j$ as follows: $r_1 - c_2 = r_2, \ldots, r_{n-1} - c_n = r_n$.
6. The sifting process finally stops when the residue $r_n$ becomes a monotonic function from which no more IMFs can be extracted. The original signal can be reconstructed by summing all the IMFs and the residue:
$$X(t) = \sum_{j=1}^{n} c_j + r_n.$$
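The sifting steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the envelopes are piecewise-linear rather than the cubic splines named in step 1, a fixed number of sifting passes replaces a formal IMF test, the stopping rule checks for too few extrema rather than strict monotonicity, and the function names (`local_extrema`, `envelope`, `sift_once`, `emd`) are hypothetical.

```python
import math

def local_extrema(x):
    """Return index lists of interior local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, knots):
    """Piecewise-linear envelope through x at the knot indices (ends held flat).
    Assumption: linear interpolation stands in for the cubic spline of step 1."""
    n = len(x)
    pts = [(0, x[knots[0]])] + [(k, x[k]) for k in knots] + [(n - 1, x[knots[-1]])]
    env = [0.0] * n
    for (i0, y0), (i1, y1) in zip(pts, pts[1:]):
        for i in range(i0, i1 + 1):
            env[i] = y0 + (y1 - y0) * (i - i0) / (i1 - i0)
    return env

def sift_once(h):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    maxima, minima = local_extrema(h)
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema to build envelopes
    upper, lower = envelope(h, maxima), envelope(h, minima)
    return [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]

def emd(x, max_imfs=5, passes=10):
    """Return (imfs, residue) such that x = sum(imfs) + residue, sample by sample."""
    residue, imfs = list(x), []
    for _ in range(max_imfs):
        h = sift_once(residue)
        if h is None:              # residue has too few extrema: stop (step 6)
            break
        for _ in range(passes - 1):
            nxt = sift_once(h)
            if nxt is None:
                break
            h = nxt
        imfs.append(h)                                   # c_j (step 5)
        residue = [r - c for r, c in zip(residue, h)]    # r_j = r_{j-1} - c_j
    return imfs, residue

# Usage: a slow 3 Hz tone plus a faster 30 Hz tone, sampled at 200 Hz for 2 s.
fs = 200
signal = [math.sin(2 * math.pi * 3 * i / fs) + 0.5 * math.sin(2 * math.pi * 30 * i / fs)
          for i in range(2 * fs)]
imfs, residue = emd(signal)
```

By construction, adding the extracted components and the final residue returns the input signal, which mirrors the reconstruction formula of step 6.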
Thus, following the above sifting procedure, a number of IMFs and a residue $r_n$ can be obtained.
2.2 Ensemble Patch Transformation
Kim et al. [23] proposed a new multi-scale method for signal analysis and decomposition called the Ensemble Patch Transform (EPT), derived from multi-scale scale-space theory in computer vision. At the heart of this method are two key concepts: (i) the patch process, which is defined as the data-dependent patch of data at a particular time $t$, and (ii) the ensemble process, which is obtained by shifting the time point $t$ of the patch. The ensemble process enhances the temporal resolution of the signal.
2.2.1 Multi-scale Patch Transform
For a given univariate time series $(x_t)_t$, a patch at $(t, x_t)$ is a polygon containing the neighbors of the point $(t, x_t)$. The shape and size of the patch control the multi-resolution properties of the signal. Let $\mathcal{T} = \{\tau_i\}_i$ be a set of size parameters of the patch. For $\tau \in \mathcal{T}$, let $P_t^{\tau}(x_t)$ denote the patch process at the location $(t, x_t)$. Further, the multi-scale patch transform $MP_t^{\mathcal{T}}(x_t)$ at that location is defined as the sequence of patches over all the elements of the set $\mathcal{T}$:
$$MP_t^{\mathcal{T}}(x_t) := \left\{P_t^{\tau_i}(x_t)\right\}_{i=1,\ldots,|\mathcal{T}|}$$
Kim et al. [23] considered two types of patches in their analysis, which are as follows.
Rectangle Patch
This patch at the location $(t, x_t)$ is a closed rectangle formed by the points $\left(t + k,\ \min_{k \in [-\tau/2, \tau/2]}\{x_{t+k}\} - 0.5\gamma\tau\right)$ and $\left(t + k,\ \max_{k \in [-\tau/2, \tau/2]}\{x_{t+k}\} + 0.5\gamma\tau\right)$. For this kind of patch the width of the patch is $\tau$ and the height is
$$h_t^{\tau} = \max_{k \in [-\tau/2, \tau/2]}\{x_{t+k}\} - \min_{k \in [-\tau/2, \tau/2]}\{x_{t+k}\} + \gamma\tau$$
where $\gamma$ is a scale factor.
Oval Patch
This kind of patch at $(t, x_t)$ is oval shaped with boundaries
$$\left(t + k,\ x_{t+k} \pm \gamma\sqrt{\tau^2/4 - k^2}\right), \quad k \in [-\tau/2, \tau/2]$$
where $\gamma$ is again the scale factor and $\tau$ is the width of the patch.
A number of statistical measures such as central tendencies and dispersion can be obtained from the patch process $P_t^{\tau}(x_t)$ and the multi-scale patch process $MP_t^{\mathcal{T}}(x_t)$. Some common statistical measures for a fixed patch $P_t^{\tau}(x_t)$ are as follows:
• $Ave_t^{\tau}(x_t) = \mathrm{Average}\{x_{t_i}\}$ is the average of the patch, where $x_{t_i}$ is an observation in the patch $P_t^{\tau}(x_t)$.
• $M_t^{\tau}(x_t) = \frac{1}{2}\left(L_t^{\tau}(x_t) + U_t^{\tau}(x_t)\right)$, where $M_t^{\tau}(x_t)$, $L_t^{\tau}(x_t)$ and $U_t^{\tau}(x_t)$ are the mean envelope, lower envelope and upper envelope, respectively.
• $SD_t^{\tau}(x_t) = \sqrt{\mathrm{Var}(x_{t_i})}$ is the standard deviation in the particular patch.
• $R_t^{\tau}(x_t) = U_t^{\tau}(x_t) - L_t^{\tau}(x_t)$ is the difference between the upper envelope and the lower envelope.
2.2.2 Ensemble Patch Transform
Once the patch process is introduced, the Ensemble Patch Transform can be defined by shifting the patch at time point $t$. The definition of the Ensemble Patch Transform is as follows: for a univariate time series $(x_t)_t$, let $\mathcal{T}$ be the set of size parameters for the patch. For any $\tau \in \mathcal{T}$, the $l$th shifted patch at time $t$ is given by $P_{t+l}^{\tau}(x_t)$, $l \in [-\tau/2, \tau/2]$.
The ensemble patch is the collection of all possible shifted patches for a fixed size parameter $\tau$ and is given by
$$EP_t^{\tau}(x_t) := \left\{P_{t+l}^{\tau}(x_t) : l \in [-\tau/2, \tau/2]\right\}$$
Further, the multi-scale ensemble patch transform is obtained by combining the ensemble patches over all size parameters in the set $\mathcal{T}$:
$$MEP_t^{\mathcal{T}}(x_t) := \left\{EP_t^{\tau}(x_t) : \tau \in \mathcal{T}\right\}$$
Similar to the patch process $P_t^{\tau}(x_t)$, statistical measures such as central tendencies and dispersions can be obtained for the Ensemble Patch Transform $EP_t^{\tau}(x_t)$ as well.
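As a rough illustration of the patch statistics and the ensemble of shifted patches defined above, the following Python sketch computes them for a rectangle patch on a sampled signal. Several points are assumptions rather than statements from the source: time is discretized to integer sample indices, the patch is clipped at the signal ends, the lower and upper envelopes are taken to be the bottom and top edges of the rectangle, the population standard deviation is used, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def patch(x, t, tau):
    """Observations x[t+k] for k in [-tau/2, tau/2], clipped at the signal ends."""
    half = tau // 2
    return x[max(0, t - half): min(len(x), t + half + 1)]

def patch_stats(x, t, tau, gamma=0.0):
    """Statistics of the rectangle patch at (t, x[t]) with scale factor gamma."""
    obs = patch(x, t, tau)
    lower = min(obs) - 0.5 * gamma * tau   # bottom edge (assumed lower envelope)
    upper = max(obs) + 0.5 * gamma * tau   # top edge (assumed upper envelope)
    return {
        "ave": mean(obs),                  # average of the patch
        "mean_env": 0.5 * (lower + upper), # mean envelope
        "sd": pstdev(obs),                 # standard deviation within the patch
        "range": upper - lower,            # patch height: max - min + gamma * tau
    }

def ensemble_average(x, t, tau, stat="ave", gamma=0.0):
    """Average of one patch statistic over all shifted patches P_{t+l}."""
    half = tau // 2
    shifts = [l for l in range(-half, half + 1) if 0 <= t + l < len(x)]
    return mean(patch_stats(x, t + l, tau, gamma)[stat] for l in shifts)

# Usage on a short ramp: the patch at t = 4 with tau = 4 covers samples 2..6.
ramp = [0, 1, 2, 3, 4, 5, 6, 7, 8]
stats_at_4 = patch_stats(ramp, 4, 4, gamma=0.5)
smooth_at_4 = ensemble_average(ramp, 4, 4)
```

Averaging a statistic over the shifted patches is what distinguishes the ensemble version from the single-patch version; for the patch average it amounts to smoothing the signal twice with the same window.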
2.2.3 Ensemble Patch Filtering and Decomposition
For a multi-component signal, EPT can be adopted as a low-pass or a high-pass filter. Kim et al. [23] also proposed a signal decomposition algorithm using the patch filtering process. The decomposition algorithm is as follows.
1. Let $\varsigma_t^{\tau}(x_t)$ be some central measure, such as the average or the mean envelope, of $\{P_{t+l}^{\tau}(x_t)\}_l$, where $P_{t+l}^{\tau}(x_t)$ is the $l$th shifted patch at $t$ for a fixed $\tau$. Assume that the signal $x_t$ consists of a high-frequency component $h_t$ and a low-frequency component $g_t$ such that $x_t = g_t + h_t$.
2. Obtain the initial component $\hat{h}_t^{(0)} = x_t - \varsigma^{\tau}(x_t)$.
3. Iterate until convergence, for $k = 0, 1, \ldots$:
$$\hat{h}_t^{(k+1)} = \hat{h}_t^{(k)} - \varsigma^{\tau}\left(\hat{h}_t^{(k)}\right)$$
4. Upon convergence, $\hat{h}_t^{(k)}$ is the estimate of $h_t$.
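The iteration in steps 2 to 4 can be sketched as follows, taking the central measure to be the ensemble average of rectangle-patch averages, which on a sampled signal reduces to a moving average applied twice. This is a hedged sketch, not the reference implementation of [23]: the convergence test (relative size of the last update), the default iteration cap, the edge clipping and all function names are assumptions made for illustration.

```python
import math

def moving_average(x, tau):
    """Average of the rectangle patch of width tau centred at each t (edges clipped)."""
    half = tau // 2
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    out = []
    for t in range(len(x)):
        lo, hi = max(0, t - half), min(len(x), t + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out

def central_measure(x, tau):
    """Ensemble average of patch averages: the patch average, averaged over shifts."""
    return moving_average(moving_average(x, tau), tau)

def ept_component(x, tau, tol=1e-2, max_iter=4):
    """Estimate the high-frequency component h and the residual g = x - h."""
    h = [xi - ci for xi, ci in zip(x, central_measure(x, tau))]   # step 2
    for _ in range(max_iter):                                     # step 3
        c = central_measure(h, tau)
        h = [hi - ci for hi, ci in zip(h, c)]
        update = math.sqrt(sum(ci * ci for ci in c))
        size = math.sqrt(sum(hi * hi for hi in h)) or 1.0
        if update / size < tol:        # assumed convergence rule
            break
    residual = [xi - hi for xi, hi in zip(x, h)]                  # g = x - h
    return h, residual

def ept_decompose(x, tau, stages=3):
    """Repeat the split on successive residuals, one component per stage."""
    components, residual = [], list(x)
    for _ in range(stages):
        h, residual = ept_component(residual, tau)
        components.append(h)
    return components, residual

# Usage: separate a 50 Hz tone from a 2 Hz tone sampled at 1000 Hz;
# tau = 20 samples is one period of the 50 Hz tone.
fs = 1000
low = [math.sin(2 * math.pi * 2 * i / fs) for i in range(fs)]
high = [0.5 * math.sin(2 * math.pi * 50 * i / fs) for i in range(fs)]
mixed = [a + b for a, b in zip(low, high)]
h_est, g_est = ept_component(mixed, tau=20)
```

Away from the signal ends, `h_est` follows the 50 Hz tone closely and `g_est` the 2 Hz tone; near the ends the clipped windows smooth less, so some leakage is to be expected there.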
3 Results and Discussion
3.1 Simulated Signal Analysis
In order to compare the performance of the Ensemble Patch Transform (EPT) vis-à-vis Empirical Mode Decomposition (EMD), a synthetic signal is created. The mathematical model of the simulated signal is given by
$$x(t) = \sum_{j=1}^{\infty} A_j\, h(t - jT_{im}) + \varepsilon$$
where $A_j$ is the amplitude of the $j$th impulse, $T_{im} = 1/f_{im}$ is the time period corresponding to the characteristic impulse frequency, $h(t)$ is the impact impulse function and $\varepsilon$ is additive Gaussian noise. The impact impulse function $h(t)$ is given by
$$h(t) = e^{-\beta t} \sin(2\pi f_r t)$$
where $\beta$ is the decay parameter and $f_r$ is the excited resonant frequency. This simulated signal is characteristic of a bearing defect frequency. In this paper the following parameter values of the simulated signal are considered (Table 1).
Table 1. Parameter values of the simulated signal
Parameters   A_j   f_im (Hz)   ε            β (Hz)   f_r (Hz)
Values       5     13          N(0, 0.2)    20       182
The signal is simulated for a period of 4 s with a sampling frequency of 2000 Hz. Figures 1 and 2 show the Ensemble Patch Transform (EPT) decomposition results and their corresponding FFT spectra. The EPT decomposition of a signal results in a main high-frequency component (FC) and a residual component. The residual component is then decomposed again by the EPT process, which again results in an FC and a residual component. This EPT decomposition can be repeated a number of times until no further decomposition is possible. In this paper, the average patch process and the average ensemble process with a rectangular patch are considered. Figure 1 shows the FC components in 5 stages of decomposition, whereas Fig. 2 shows the residual components along with their corresponding spectra.
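The simulated signal described above (Table 1, 4 s at 2000 Hz) can be generated with a short Python sketch such as the one below. Some choices are assumptions not fixed by the text: the impulse response is taken to be zero before each impact, the 0.2 in N(0, 0.2) is read as a variance, the noise seed is arbitrary, and the helper `dft_magnitude` is a hypothetical single-frequency check rather than the FFT used for the figures.

```python
import math
import random

def impulse_response(t, beta=20.0, f_r=182.0):
    """h(t) = exp(-beta * t) * sin(2 * pi * f_r * t), taken as zero for t < 0."""
    return math.exp(-beta * t) * math.sin(2 * math.pi * f_r * t) if t >= 0 else 0.0

def simulate(duration=4.0, fs=2000.0, amp=5.0, f_im=13.0,
             beta=20.0, f_r=182.0, noise_var=0.2, seed=0):
    """x(t) = sum_j A_j h(t - j T_im) + eps, sampled at fs for `duration` seconds."""
    rng = random.Random(seed)
    t_im = 1.0 / f_im                      # period of the impulse train
    n_impulses = int(duration * f_im)      # impulses j = 1, 2, ... within the record
    sigma = math.sqrt(noise_var)           # assumption: 0.2 is the noise variance
    x = []
    for i in range(int(duration * fs)):
        t = i / fs
        s = sum(amp * impulse_response(t - j * t_im, beta, f_r)
                for j in range(1, n_impulses + 1))
        x.append(s + rng.gauss(0.0, sigma))
    return x

def dft_magnitude(x, f, fs):
    """Magnitude of the discrete Fourier sum of x at a single frequency f (Hz)."""
    re = sum(v * math.cos(2 * math.pi * f * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * f * i / fs) for i, v in enumerate(x))
    return math.hypot(re, im) / len(x)

x_sim = simulate()
peak_at_resonance = dft_magnitude(x_sim, 182.0, 2000.0)
level_off_resonance = dft_magnitude(x_sim, 400.0, 2000.0)
```

Because 182 Hz is an exact multiple of 13 Hz, successive impulses reinforce one another at the resonance, so the spectral line at 182 Hz stands well above the level at a frequency away from it.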
Fig. 1. Ensemble patch transformation decomposition results of the simulated signal: FC components
It can be observed from Fig. 1 that the first two extracted components are impulse-type signals similar to the original impulse signal. The resonance frequency at 182 Hz ($f_r$), which is also the carrier frequency, can be clearly seen. In the last three decomposition results, however, peaks are observed at 26 Hz ($2f_{im}$), which is twice the modulating frequency. In the Fourier spectrum of the residual component after the first decomposition, both the carrier frequency at 182 Hz and the modulating frequency of 13 Hz ($f_{im}$) and its harmonics are observed (Fig. 2). Also, it is evident from Fig. 2 that the residual components at the 2nd, 3rd and 4th stages consist of only 13 Hz ($f_{im}$) and 26 Hz ($2f_{im}$). Finally, only the impulse frequency of 13 Hz is present in the residual component after the 5th decomposition. Thus, it can be concluded that the EPT decomposition process successfully separates the impulse frequency and the resonance frequency. One of the most crucial parameters in the EPT decomposition process is the patch size. Kim et al. [23] suggested two methods for selecting the size parameter: (a) a priori selection, in which the size parameter is based on the approximate period of the signal, and (b) a posteriori selection, in which τ is chosen so as to give minimum correlation between the decomposed signals. In this study, τ is selected by estimating the approximate time period between two consecutive impulses.
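The two ways of choosing the patch size mentioned above can be illustrated with a small Python sketch. It is only a sketch under assumptions: the a priori rule converts a period to a sample count by rounding, the correlation-based rule uses a single centred moving average as a stand-in decomposition instead of the full EPT, and all function names are hypothetical.

```python
import math

def tau_from_period(fs, f_char):
    """A priori patch size: samples per period of the characteristic frequency."""
    return round(fs / f_char)

def correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def split(x, tau):
    """Stand-in decomposition: centred moving average as the low-frequency part."""
    half = tau // 2
    low = []
    for t in range(len(x)):
        window = x[max(0, t - half): t + half + 1]
        low.append(sum(window) / len(window))
    high = [xi - li for xi, li in zip(x, low)]
    return high, low

def tau_by_min_correlation(x, candidates):
    """Pick the candidate size whose two components are least correlated."""
    def score(tau):
        high, low = split(x, tau)
        return abs(correlation(high, low))
    return min(candidates, key=score)

# A priori choice for the simulated signal: impulses every 1/13 s at 2000 Hz.
tau_sim = tau_from_period(2000, 13)

# Correlation-based choice on a toy two-tone signal.
toy = [math.sin(2 * math.pi * 2 * i / 500) + 0.5 * math.sin(2 * math.pi * 50 * i / 500)
       for i in range(500)]
tau_toy = tau_by_min_correlation(toy, [5, 10, 20, 40])
```

For the simulated signal the a priori rule gives a patch of about 154 samples, i.e. roughly one impulse period of 1/13 s.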
Fig. 2. Ensemble patch transformation decomposition results of the simulated signal: Residual components
Fig. 3. Empirical mode decomposition result of the simulated signal
For comparison with EPT, the simulated signal is decomposed using Empirical Mode Decomposition (EMD); the result is shown in Fig. 3. The first three IMFs are mainly impulse-type signals resembling the original signal. The resonance frequency of 182 Hz ($f_r$) and the fault frequency of 13 Hz ($f_{im}$) as modulation are evident from Fig. 3. IMF3 consists of peaks at $4f_{im}$, $5f_{im}$ and $6f_{im}$, whereas IMF4 consists of peaks at $2f_{im}$, $3f_{im}$ and $4f_{im}$, which are essentially harmonics of the modulating frequency. Finally, IMF5 has only one peak at $2f_{im}$. Unlike the EPT process, EMD fails to separate the carrier frequency and the modulating frequency.
3.2 Coal Crusher Vibration Signal Analysis
A hammer-type coal crusher with a capacity of 400 tonnes/hour was suffering from a high vibration problem. The crusher was running in the coal handling plant of a steel plant. A simple schematic diagram of the coal crusher with the sensor orientation on the bearing housing, a 3D crusher model and a photograph of the crusher NDE bearing with two sensors fixed in the radial and horizontal directions are shown in Figs. 4, 5 and 6, respectively. The crusher assembly was seated directly on a concrete floor and held by four foundation bolts. The motor was supported on a steel frame and fixed by four foundation bolts, as shown in Fig. 7. The concrete floor was situated at a height of 10 m in a coal handling unit and supported by beams and columns. The crusher rotor, shown in Fig. 8, was placed inside the two bearing blocks and fixed by two cylindrical roller bearings (R3240). It was directly coupled by a gear coupling to a 3-phase, 3.3 kV, 800 kW slip ring induction motor. The system was run at 998 rpm ($f_r$ = 16.63 Hz). The crusher rotor had 42 ring hammers in six rows, as shown in Fig. 9. The weight of each ring hammer was 23.8 ± 0.2 kg.
Fig. 4. A schematic diagram of coal crusher with sensor orientation on bearing housing
Fig. 5. A 3D model of coal crusher assembly
Fig. 6. Crusher NDE bearing housing with two sensors fixed in Radial and Horizontal direction
Fig. 7. Motor
Fig. 8. Crusher rotor
Fig. 9. Ring hammers
Vibration signals were collected from the motor and crusher bearing housings using the VibXpert–II system with VIB 6.142 accelerometers (sensitivity 9.8 μA/g) in three mutually perpendicular directions with a sampling frequency of 4096 Hz. The three acceleration signals at the crusher non-drive end and their spectra are shown in Fig. 10. In the spectrum of the original signal, a dominant frequency at 100 Hz and side-bands around it are observed. This carrier frequency, at six times the running frequency ($6f_r$), corresponds to the so-called “hammer-pass frequency”, which arises because there are six rows of hammers. Further, the modulating frequency is observed at half of the running frequency, i.e., at 8.3 Hz ($= f_r/2$). For further analysis using the EPT and EMD processes, the acceleration signal along the axial direction (thrust signal) is considered in this chapter. The main high-frequency FC components extracted through the EPT process are depicted in Fig. 11. The impact signals are visible in the first two decomposed components. However, in the 4th decomposed signal the $\frac{3}{2}f_r$ and $\frac{5}{2}f_r$ components are visible, and finally in the 5th decomposed signal the $\frac{3}{2}f_r$ component is visible.
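The characteristic frequencies quoted above follow directly from the shaft speed and the number of hammer rows. The short Python sketch below reproduces that arithmetic; the function name and dictionary keys are hypothetical, and the first pair of side-bands is taken to lie at the hammer-pass frequency plus and minus the half-order modulating frequency.

```python
def crusher_frequencies(rpm=998.0, hammer_rows=6):
    """Characteristic frequencies (Hz) from shaft speed and number of hammer rows."""
    f_run = rpm / 60.0                 # running frequency f_r
    f_hammer = hammer_rows * f_run     # hammer-pass frequency, 6 * f_r for six rows
    f_half = f_run / 2.0               # half-order modulating frequency, f_r / 2
    return {
        "running_hz": f_run,
        "hammer_pass_hz": f_hammer,
        "half_order_hz": f_half,
        "first_sidebands_hz": (f_hammer - f_half, f_hammer + f_half),
    }

freqs = crusher_frequencies()
```

With 998 rpm and six rows this gives a running frequency of about 16.63 Hz, a hammer-pass frequency of about 99.8 Hz (the dominant line near 100 Hz) and a modulating frequency of about 8.3 Hz.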
Fig. 10. Acceleration signals at the crusher non-drive end and their spectra
Fig. 11. Ensemble patch transformation decomposition results of the coal crusher vibration signal: FC components
The residual components of the coal crusher vibration signal extracted by the EPT process are shown in Fig. 12. In all the decomposed residual signals, the running frequency can be seen along with side-bands spaced at the sub-harmonic $\frac{1}{2}f_r$. Finally, the EMD results for the same vibration signal of the coal crusher are shown in Fig. 13. It is evident that EMD fails to extract the sub-harmonic component characteristic of a rub-impact fault.
Fig. 12. Ensemble patch transformation decomposition results of the coal crusher vibration signal: Residual components
Fig. 13. Empirical mode decomposition results of the coal crusher vibration signal
4 Conclusions
In this paper, a recently developed multi-resolution signal analysis method called Ensemble Patch Transformation (EPT) is adopted for detecting a rub-impact fault of a coal crusher unit in a steel plant. EPT can successfully extract the impact component along with the sub-harmonic and super-harmonic components of a typical rub-impact fault signature, and in this respect it performs better than EMD. However, EPT results are critically dependent on the patch size parameter, τ. In the present study, a fixed patch size is considered for all consecutive decompositions, although the patch size could be selected adaptively depending on the decomposition stage. Selecting the optimal patch size may be an open research problem for vibration-based fault diagnosis of rotating systems.
References
1. Obuchowski, J., Zimroz, R., Wyłomańska, A.: Identification of cyclic components in presence of non-Gaussian noise – application to crusher bearings damage detection. J. Vibroengineering 17, 1242–1252 (2015)
2. Wyłomańska, A., Zimroz, R., Janczura, J., Obuchowski, J.: Impulsive noise cancellation method for copper ore crusher vibration signals enhancement. IEEE Trans. Industr. Electron. 63(9), 5612–5621 (2016)
3. Wyłomańska, A., Zimroz, R., Janczura, J.: Identification and stochastic modelling of sources in copper ore crusher vibrations. J. Phys.: Conf. Ser. 628(1), 012125 (2015)
4. Lin, F., Schoen, M.P., Korde, U.A.: Numerical investigation with rub-related vibration in rotating machinery. J. Vib. Control 7, 833–848 (2001)
5. Sun, Z.C., Xu, J.X., Zhou, T.: Analysis on complicated characteristics of a high-speed rotor system with rubbing caused impacts. Mech. Mach. Theory 37, 659–672 (2002)
6. Choy, F.K., Padovan, J., Batur, C.: Rub interactions of flexible casing rotor systems. J. Eng. Gas Turbines Power Trans. ASME 111, 652–658 (1989)
7. Oks, A.B., Iwatsubo, T., Arii, S.: Detection of the point where rubbing occurs in a multi supported rotor system. JSME Int. J. 36, 312–318 (1993)
8. Smalley, A.J.: Dynamic response of rotors to rubs during startup. J. Vib. Acoust. Stress. Reliab. Des. 111, 226–233 (1989)
9. Chu, F., Zhang, Z.: Bifurcation and chaos in a rub-impact Jeffcott rotor system. J. Sound Vib. 210, 1–18 (1998)
10. Hu, N.Q., Chen, M., Wen, X.S.: The application of stochastic resonance theory for early detecting rubbing caused impacts fault of rotor system. Mech. Syst. Signal Process. 17, 883–895 (2003)
11. Chu, F., Lu, W.: Determination of the rubbing location in a multi-disk rotor system by means of dynamic stiffness identification. J. Sound Vib. 248, 235–246 (2001)
12. Peng, Z., He, Y., Lu, Q., Chu, F.: Feature extraction of the rub-impact rotor system by means of wavelet analysis. J. Sound Vib. 259(4), 1000–1010 (2003)
13. Auger, F., Flandrin, P.: Improving the readability of time–frequency and time-scale representations by the reassignment method. IEEE Trans. Signal Process. 43, 1068–1089 (1995)
14. Peng, Z.K., Chu, F.L., Tse, P.W.: Detection of the rubbing-caused impacts for rotor–stator fault diagnosis using reassigned scalogram. Mech. Syst. Signal Process. 19(2), 391–409 (2005)
15. Huang, N.E., et al.: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. Roy. Soc. London. Ser. A: Math. Phys. Eng. Sci. 454(1971), 903–995 (1998)
16. Cheng, J., et al.: Local rub-impact fault diagnosis of the rotor systems based on EMD. Mech. Mach. Theory 44(4), 784–791 (2009)
17. Wu, Z., Huang, N.E.: Ensemble empirical mode decomposition: a noise-assisted data analysis method. Adv. Adapt. Data Anal. 1(01), 1–41 (2009)
18. Lei, Y., He, Z., Zi, Y.: Application of the EEMD method to rotor fault diagnosis of rotating machinery. Mech. Syst. Signal Process. 23(4), 1327–1338 (2009)
19. Dragomiretskiy, K., Zosso, D.: Variational mode decomposition. IEEE Trans. Signal Process. 62(3), 531–544 (2013)
20. Wang, Y., et al.: Research on variational mode decomposition and its application in detecting rub-impact fault of the rotor system. Mech. Syst. Signal Process. 60, 243–251 (2015)
21. Daubechies, I., Lu, J., Wu, H.-T.: Synchrosqueezed wavelet transforms: an empirical mode decomposition-like tool. Appl. Comput. Harmon. Anal. 30(2), 243–261 (2011)
22. Gilles, J.: Empirical wavelet transform. IEEE Trans. Signal Process. 61(16), 3999–4010 (2013)
23. Kim, D., Choi, G., Oh, H.-S.: Ensemble patch transformation: a flexible framework for decomposition and filtering of signal. EURASIP J. Adv. Signal Process. 2020(1), 1–27 (2020). https://doi.org/10.1186/s13634-020-00690-7
Fault Detection of Non-stationary Processes Using a Modified PCA Approach
Bahador Rashidi and Qing Zhao
Department of Electrical and Computer Engineering, University of Alberta, 116 Street and 85 Avenue, Edmonton, AB, Canada
Abstract. Fault detection in non-stationary processes is a timely research topic in industrial process monitoring. The core objective of this research is to tackle anomaly detection in non-stationary industrial processes with manipulated set-point changes and uncertainties in the prior knowledge about the statistical nature of the measurements. In this research, the fault detection problem is investigated from an unsupervised perspective and a modified PCA approach is proposed. This method utilizes the base-line loading matrices and an upper bound, to be determined, on the variation range of the time series to relax the stationarity assumption. Hence, the means used for normalizing the time series are adaptively updated (using soft-calculation) without any need for the high-complexity recalibration procedure required by other existing adaptive/recursive PCA methods. Moreover, first- and second-order error indices are introduced to monitor the statistical behaviour of process measurements. To develop a more reliable system condition indicator, an overall health index is given based on the proposed features using a non-parametric kernel density estimator (KDE). The proposed approach does not require heavy online calculation in comparison with the existing adaptive solutions, and it can successfully distinguish faults from mean changes in healthy measurements. Finally, an alarm generator algorithm is presented which, utilizing the proposed overall health index, generates two alarm types for process operators: caution and actual fault. The effectiveness of the modified PCA approach is validated by both numerical examples and industrial case studies.
Keywords: Fault detection · Nonstationary process · Principal component analysis · Feature engineering · Alarm generation
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 279–306, 2022. https://doi.org/10.1007/978-3-030-82110-4_15
Nomenclature
$\alpha$    Confidence interval
$\bar{X}^{*}$    Calibrated mean of measured variables
α Tails percentile threshold
$\hat{S}$    Matrix of principal singular values
$\hat{U}, \tilde{U}$    Left singular vectors in SVD
$\hat{V}, \tilde{V}$    Right singular vectors in SVD
$\Lambda$    Matrix of singular values
$E_0$    Proposed zero-order error index
$E_1$    Proposed first-order error index
$E_2$    Proposed second-order error index
$F_{\alpha}$    F-distribution
$M$    Correlation matrix derived by PCA
$V$    Variation range (upper bound)
$\mu$    Non-stationary random time-series
$\nu$    i.i.d. inputs
$\varphi_0$    Combined index derived from (dynamic) PCA
$\tau$    Tuning parameter
$a$    Tuning parameter
$D$    Distance measure for mean calibration
$F_p$    Estimated probability density function
$G$    General form of a (non-)linear function
$h$    Kernel bandwidth
$H_0$    Null hypothesis
$H_1$    Alternate hypothesis
$k(\cdot)$    Kernel function (e.g. RBF)
$R$    Proposed overall health index
$SPE$    Square prediction error
$T^2$    Hotelling statistics
$U$    All system inputs
$UCL$    Upper Control Limit
$w$    Additive measurement noise
$W_I$    Differencing moving-window length
$W_L$    Mean calibration moving-window length
$X$    System measurements
$X^{d1}$    First-order difference of $X$
$X^{d2}$    Second-order difference of $X$
$X_f$    Proposed feature vector
1 Introduction
Process health monitoring techniques have been widely applied in industrial processes to effectively enhance safety/reliability and reduce maintenance costs by detecting anomalies in time. This line of work also plays a prominent role in the design and implementation of a reliable control system. Initiated in the early
1970s, model-based fault detection and diagnosis methods, also known as quantitative approaches, have been significantly developed since then. Subsequently, qualitative data-driven methods, which do not require prior model knowledge about the system, have been introduced for the same purpose. Many qualitative and quantitative methods have been developed, and they are summarized in the surveys [1,2] and [3]. The application of model-based techniques [2,4–6] may run into difficulties for complex industrial processes when there is a lack of information on their model structures. On the other hand, data-driven methods [7–10] are relatively easy to implement and can be performed effectively without a priori knowledge of the process model. Hence, the focus of this research is on the application of a data-driven method for fault detection in non-stationary processes. An industrial system can be classified with respect to different properties such as linearity and time-invariance [11]. Similarly, based on the properties of time series, one can treat process measurements that follow a multivariate normal distribution with constant mean and variance as one common type of stationary process. On the other hand, most industrial process measurements belong to the category of time series that manifest large variations even during their normal operating conditions and consequently show non-stationary statistical behaviour. One way to model such a non-stationary process is to assume that its non-stationary measurements still follow a multivariate Gaussian distribution but with time-variant means and variances. However, several well-established methods (e.g. PCA, KPCA and probabilistic PCA) have underlying stationarity and Gaussian distribution assumptions for the process measurements. Hence, when they are used for anomaly detection in a non-stationary process, inaccurate fault detection results may be rendered.
One reason for such a problem is that most of these approaches use monitoring statistics such as $T^2$ or the square prediction error (SPE), which require the data itself and the calculated errors (actual values minus their estimated counterparts) to follow a Gaussian distribution with constant mean and variance [12,13]. To tackle this particular limitation, one can consider deep learning-based approaches for process monitoring [14,15], which do not require the measured data to follow multivariate normal distributions. For instance, in [16], variational autoencoders (VAE) are proposed for fault detection by mapping the input space to a multivariate normal distribution and using the $T^2$ statistic with its control limit according to a predefined F distribution. Non-stationary measurements may be the result of a time-varying dynamic system whose parameters are subject to variations over time. Consequently, for this non-stationary behaviour, one can assume that there exists a time-varying transformation which can be applied to the system measurements such that they follow a tractable Gaussian distribution. Hence, one can try to derive a set of steps to determine such a time-varying transformation and then utilize the existing monitoring indices suitable for Gaussian distributions to monitor the statistical changes in the system measurements. Other factors leading to non-stationary measurements are manipulated inputs (e.g. intentional changes or closed-loop compensation effects) and/or certain internal process changes such as
B. Rashidi and Q. Zhao
material degradation (e.g. the catalyst degradation in a CSTR process [17]), corrosion and valve/nozzle plugging (e.g. residue clotting in high-speed centrifuges [18]), etc. In [19], co-integration is assumed for non-stationary time-series in order to provide a viable solution. Other research works that adopt a co-integrated structure for non-stationary time-series to create health monitoring indices can be found in [11,20].
Among all available data-driven methods, principal component analysis (PCA) [21], canonical variate analysis (CVA) [22], independent component analysis (ICA) [23] and partial least-squares (PLS) [24] have been frequently used for fault detection. From the implementation perspective, each of the aforementioned methods has pros and cons in comparison with its counterparts [7]. PCA [25] and its various modified versions [21,26,27] have been utilized for different types of processes. For multivariate non-linear cases, kernel PCA (KPCA) is widely used [28–31]. By utilizing the kernel trick [32], KPCA first maps the process variables with non-linear relations onto a high-dimensional feature space and then applies standard PCA to generate useful statistical error indices for process health monitoring. Ordinary PCA [21] and PLS [33] were initially proposed to monitor linear stationary processes in which the relationships among different measurements follow a static and linear pattern. To capture the dynamic relationships between process time-series, the augmentation approach was proposed in [21] to take into account the auto- and cross-correlation with time delay, which led to dynamic PCA (DPCA) for identifying the base-line model of the process. PCA is a simple yet powerful method for fault detection and has been implemented in many process monitoring products. Its modification for performance improvement is of great interest.
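The augmentation idea behind DPCA can be illustrated with a short sketch. The helper name `augment_with_lags` and the toy data are hypothetical and are not taken from [21]; the sketch only shows how each sample is stacked with its `h` previous samples, so that ordinary PCA applied to the augmented matrix also sees auto- and cross-correlations with time delay.

```python
from typing import List


def augment_with_lags(X: List[List[float]], h: int) -> List[List[float]]:
    """Return rows [x(i), x(i-1), ..., x(i-h)] for i = h, ..., N-1."""
    if h < 0 or h >= len(X):
        raise ValueError("h must satisfy 0 <= h < number of samples")
    rows: List[List[float]] = []
    for i in range(h, len(X)):
        row: List[float] = []
        for lag in range(h + 1):
            row.extend(X[i - lag])  # append the sample delayed by `lag` steps
        rows.append(row)
    return rows


# Toy data: 4 samples of 2 variables, augmented with one lag (h = 1).
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
Xa = augment_with_lags(X, h=1)
```

With `h = 1` each augmented row has twice as many columns as the original data, and the first `h` samples are dropped because they have no complete history.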
However, this approach imposes stationarity and Gaussian assumptions, which may not be satisfied in many industrial processes with non-stationary variations. Although a system whose measurements follow a non-Gaussian distribution can have stationary statistical behaviour, a non-stationary process generally cannot produce normally distributed measurements with constant mean and variance.
In this paper, the problem under study is fault detection for non-stationary processes. The assumption is that there exists a time-varying transformation which can map the actual measurements into a reduced feature space such that they follow a multivariate normal distribution with relatively constant means and variances. With this aim, one can utilize modern ANN-based approaches (e.g. AE, DAE, VAE), which usually involve complex training. On the other hand, there also exist several enhanced versions of the more conventional approaches that utilize real-time (e.g. batch-wise) adaptation to treat the non-stationary behaviour of time-series. Several modified versions of PCA such as adaptive PCA [28,34], moving window PCA [35] and recursive PCA [36] have been proposed. In these approaches, upon receiving test data subject to non-stationary changes, a set of algorithms is activated to update the base-line model. More specifically, the mean, the covariance matrix and the number of principal components are updated in a block- or element-wise manner. Hence, a singular value decomposition (SVD) must be conducted upon arrival of each new block/sample of the test data, which can be computationally involved and also requires an enormous amount of attention from the operators for parameter tuning. The other limitation of adaptive PCA is that it attempts to update the base-line model with any change in the process unless the change is relatively abrupt and violates a tuning threshold. Furthermore, the more complex parameter tuning steps required for model updating defy the simplicity of the original PCA approach and hence hinder its industrial implementation.
Fault Detection of Non-stationary Processes Using a Modified PCA
The limitations of the aforementioned methods motivated the authors to propose a new PCA-based technique that not only handles the non-stationary changes in the time-series but also avoids a real-time updating structure, which reduces the computational complexity and simplifies the implementation. The core objective of the proposed method is to distinguish between actual faults and normal process variations due to intentional/induced changes of the manipulated inputs. The proposed strategy can be applied to both stationary and non-stationary cases, and provides feasible features about the health status of the process under study.
The remainder of this chapter is organized as follows. In Sect. 2, the proposed modified PCA approach and three feature indices are introduced. In Sect. 2.2, a non-parametric learning approach is presented to reconcile the features and generate an overall health index indicating the process health status. Section 3 presents an alarm generator algorithm for more reliable industrial application of the proposed modified PCA technique. In Sect. 4, a thorough comparative simulation study is given to demonstrate the effectiveness of the proposed framework using a numerical example. Finally, in Sect. 4.3, the proposed approach is applied to an industrial compressor dataset and its advantages are highlighted.
2 A Modified PCA Approach for Nonstationary Process Fault Detection
The systems under study are described below; their process measurements may have non-stationary statistical behaviour. Equation (1) is a generic definition of a (non-)linear process with the time-series $X \in \mathbb{R}^m$ as the measurement,
$$X = G(U) + w \tag{1}$$
where $U = [\nu, \mu]^T \in \mathbb{R}^{n+l}$, $\nu \in \mathbb{R}^n$ is the i.i.d. input and $\mu \in \mathbb{R}^l$ is the non-stationary time-series with a non-deterministic trend (e.g. a bounded random walk with time-variant mean and variance). If such a non-deterministic trend is known and eliminated, the remaining residual will follow a multivariate Gaussian distribution. The measurement noise $w$ can be assumed to be white noise, and $G(\cdot)$ can in general be a linear or non-linear function. According to Eq. (1) and the aforementioned definition of non-stationarity, it can be readily assumed that $X$ follows a multivariate Gaussian distribution but with non-deterministic time-varying mean and variance under normal operating conditions. However, this study focuses only on the case of mean variations, and the variances of the process variables are assumed to remain relatively constant. In dynamic systems, normal operating conditions often include inevitable process variations such as set-point changes, control compensation effects and
equipment degradation, etc. On the other hand, fault scenarios that can be detected by the proposed approach include, but are not limited to, constant bias with different magnitudes and other deterministic trends such as ramps with slow or steep slopes. In addition, stochastic random drift can also be handled by the proposed modified PCA. To distinguish between normal operating conditions with non-stationary mean changes and actual faulty ones, it is assumed that the statistical characteristics of the fault-induced changes are different from the normal non-stationary characteristics.
As can be seen in Fig. 1, the proposed strategy includes two main steps: 1) feature generation according to the proposed error indices, and 2) unsupervised probability distribution analysis to distinguish the normal time-varying behaviour of process measurements from actual faults. For implementation, it is suggested that 30% of the entire training dataset be used for learning the base-line dynamics and calculating the feature indices, while the remaining 70% is utilized for probabilistic characterization to create a reliable hypothesis test for fault detection. The following subsections present details of the proposed approach.
Fig. 1. Schematic diagram of the proposed non-stationary fault detection approach using modified-PCA
2.1 Feature Engineering Using PCA
To conduct effective anomaly detection, three feature indices are defined and extracted for a given process; they carry key information about the non-stationary behaviour of the process measurements. It should be noted that the formulation in this section is given for a linear process, but the non-linear extension of the proposed features can be similarly derived by using kernel PCA, which was recapitulated by the authors in [18].
2.1.1 Mean Updating Scheme and the Zero-Order Error Index $E^0$
This index indicates the zero-order (constant) trend of a time-series according to the base-line model derived in the training step. To make this feature robust to time-varying mean changes of the process variables, a moving-mean calculation procedure is proposed as follows.
1) At first, pre-processing is performed. A complete training data-set is collected to construct the process base-line model. If the process is assumed to be linear, dynamic PCA [21] with a proper number of augmentation shifts ($h$) may be applied to reduce the dimensionality and extract the principal components used in the proposed framework. On the other hand, if the process is non-linear, kernel PCA [29] with a proper choice of kernel function (e.g. Gaussian, polynomial, sigmoid, etc.) can be utilized. Then the initial mean $m_0$ and variance $v_0$ of the entire training data are determined, and standardization is conducted accordingly.
2) Algorithm 1 is given to calculate the transformation matrix $M_{\phi^0} = \frac{M_{T^2}}{UCL_{T^2}} + \frac{M_{SPE}}{UCL_{SPE}}$ using the PCA approach, which is required for calculating a combined index $\phi^0$ [37] for the training data. Note that the superscript 0 in the combined index $\phi^0$ indicates that the zero-order difference of the time-series $X \in \mathbb{R}^{N \times m}$ is used in PCA to derive the loading matrices. The combined index is computed as $\phi^0(i) = x(i) M_{\phi^0} x^T(i)$ using the base-line correlation matrix $M_{\phi^0}$ for a given new test measurement $x \in \mathbb{R}^m$.
Algorithm 1. Principal component analysis (PCA) based on conducting SVD on the standardized data [21]
1: For training data $X \in \mathbb{R}^{N \times m}$, conduct mean centring and standardization.
2: Perform the singular value decomposition (SVD) $\frac{1}{\sqrt{N-1}} X = [\hat{U}\ \tilde{U}]\,\Lambda\,[\hat{V}\ \tilde{V}]^T$.
3: Following the cumulative percentage variance criterion, choose the first $r$ columns of the matrix $U$, which capture 95% (a tuning parameter) of the variance of the variables.
4: Build the projection matrices $M_{SPE} = \tilde{V}\tilde{V}^T$ and $M_{T^2} = \hat{U}\Lambda^{-0.5}\hat{U}^T$ from the loading vectors for generating the proper residual signals.
5: The upper control limit (UCL) for Hotelling's $T^2$ statistic is calculated based on the fact that, under normal operating conditions, $T^2$ follows an F distribution: $UCL_{T^2} = \frac{N-m}{m(N-l)} F_\alpha(m, N-m)$.
6: The upper control limit (UCL) for the SPE index is calculated as $UCL_{SPE} = \theta_1\left(\frac{c_\alpha\sqrt{2\theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2}\right)^{1/h_0}$, where $c_\alpha$ is the confidence limit corresponding to the $1-\alpha$ percentile of the normal distribution. Also, $\theta_i = \sum_{j=r+1}^{m} \lambda_j^{2i}$, $i = 1, 2, 3$, and $h_0 = 1 - 2\theta_1\theta_3/(3\theta_2^2)$.
7: The projection matrix for the combined index is calculated as $M_\phi = \frac{M_{SPE}}{UCL_{SPE}} + \frac{M_{T^2}}{UCL_{T^2}}$.
3) A mean updating scheme is introduced and utilized in the proposed modified PCA. The objective of this scheme is to relocate/shift the origin of the original multivariate signal space when the variations of the data mean are within the normal/expected operating zone. This origin relocation retains the stationarity assumption of PCA approaches for transforming the signal space to scores while the mean varies. An upper bound of the operating zone is defined according to the difference between the combined index $\phi^0$ and its threshold. To this end, a moving window of length $W_L$ is defined, in which weighted average filtering is conducted with respect to the difference between the combined residual index $\phi^0$ and its nominal base-line threshold for the entire training data. As a result, the mean of the variables used in the mean-centring step is updated at each sample time, and the index resists exceeding the threshold as long as the process is within the normal operating zone. However, if a significant and abrupt mode change occurs due to a malfunction in the process that drives one or some of the variables out of the normal operating zone, the mean values will instead reflect the fault-induced changes, leading to successful detection of faults. For this purpose, a scalar measure is calculated as $D(i) = \phi^0(i) - UCL_\phi$, which represents the distance between the current combined index $\phi^0$ and its upper control limit (i.e. the threshold) $UCL_\phi = 2$. This number is selected as a less conservative threshold based on the definition $\phi^0(i) = \frac{SPE(i)}{UCL_{SPE}} + \frac{T^2(i)}{UCL_{T^2}}$.
Remark 1. The upper control limit $UCL_\phi$ is considered as a tuning parameter in the proposed modified PCA, which can be alternatively selected using the approximate distribution of $\phi^0$ in [38].
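The construction of the combined-index matrix can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the $T^2$ matrix uses the common loading-based form $\hat{V}\,\mathrm{diag}(1/\lambda^2)\,\hat{V}^T$, and the two control limits are replaced by empirical quantiles of the training statistics instead of the F-distribution and SPE formulas of Algorithm 1. All of these are assumptions made to keep the example self-contained.

```python
import numpy as np


def fit_combined_index(X, cpv=0.95, quantile=0.99):
    """Fit a PCA base-line on X (N x m) and return (mean, std, M_phi)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                       # mean centring and standardization
    N = Z.shape[0]
    _, s, Vt = np.linalg.svd(Z / np.sqrt(N - 1), full_matrices=False)
    eig = s ** 2                               # variance along each principal direction
    cum = np.cumsum(eig) / eig.sum()
    r = int(np.searchsorted(cum, cpv)) + 1     # smallest r reaching the CPV target
    r = min(r, len(eig) - 1)                   # keep at least one residual direction
    V_hat, V_til = Vt[:r].T, Vt[r:].T
    M_t2 = V_hat @ np.diag(1.0 / eig[:r]) @ V_hat.T
    M_spe = V_til @ V_til.T
    t2 = np.einsum("ij,jk,ik->i", Z, M_t2, Z)
    spe = np.einsum("ij,jk,ik->i", Z, M_spe, Z)
    # Assumption: empirical quantiles stand in for the analytical control limits.
    ucl_t2, ucl_spe = np.quantile(t2, quantile), np.quantile(spe, quantile)
    M_phi = M_spe / ucl_spe + M_t2 / ucl_t2
    return mean, std, M_phi


def combined_index(x, mean, std, M_phi):
    """Quadratic form phi = z M_phi z^T for one sample or a batch of samples."""
    z = (np.atleast_2d(x) - mean) / std
    return np.einsum("ij,jk,ik->i", z, M_phi, z)


# Toy data: 5 variables driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
W = rng.normal(size=(2, 5))
X_train = latent @ W + 0.1 * rng.normal(size=(500, 5))

mean, std, M_phi = fit_combined_index(X_train)
phi_train = combined_index(X_train, mean, std, M_phi)
x_fault = X_train[0] + np.array([0.0, 10.0 * std[1], 0.0, 0.0, 0.0])
phi_fault = combined_index(x_fault, mean, std, M_phi)[0]
```

Because each term of the combined index is scaled by its own limit, a sample whose $T^2$ and SPE both stay below their limits yields a value of at most 2, which is the reasoning behind the threshold $UCL_\phi = 2$ used in the text.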
Setting $UCL_\phi = 2$ assumes that the mean variations of the process measurements should be significant enough to distort both the $T^2$ and SPE indices beyond their upper control limits in order to activate the proposed mean updating rule. A nonlinear function is introduced in Eq. (2) to activate the updating rule of the mean for new test data; it is designed to adjust the original training data mean according to the normal/expected operating zone as follows,
$$\bar{X}^*(i) = \left(\frac{2}{\left(\dfrac{1}{1+e^{-aD(i)}} - \dfrac{1}{1+e^{-a(D(i)-\bar{V})}}\right) + 1} - 1\right) m_0 + \left(\frac{1}{1+e^{-aD(i)}} - \frac{1}{1+e^{-a(D(i)-\bar{V})}}\right) \sum_{j=0}^{W_L-1} \frac{W_L - j}{0.5\,W_L(W_L+1)}\, x^*(i-j) \tag{2}$$
where $a \ge 10$ is the tuning parameter that adjusts the sharpness of the switching function and $\bar{V}$ is the variation range with respect to $D(i)$ used to properly activate and deactivate the updating rule. $W_L$ is the length of the moving window for smoothing the measurement samples. The above formulation normalizes the new test data using the updated mean and the same standard deviation. It should be noted that the standard deviation of the measurements is assumed to be almost constant. In Eq. (2), $\bar{V}$ stands for the upper bound of variation of the normal operating zone, which is also the upper bound of changes for the combined index $\phi^0$. This value can be calculated based on the operator's knowledge about the normal/expected range of variation of each process measurement. If the upper bounds of expected variations of all process measurements $V_i \in V_X$, $i = 1, \ldots, m$ are known, $\bar{V}$ can be determined as $\bar{V} = V_X^T M_{\phi^0} V_X$.
Remark 2. In practice, knowledge about the expected range of variations for all the process measurements might not be available. If the upper bounds of expected variations of $a$ process measurements are known ($a < m$), it may still be possible to calculate the other $m - a$ unknown $V_i$ using SVD as follows,
$$X_{N \times m} = \begin{bmatrix} \hat{U}_{N \times r} & \tilde{U}_{N \times (N-r)} \end{bmatrix} \begin{bmatrix} S_{r \times r} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \hat{V}_{m \times r} & \tilde{V}_{m \times (m-r)} \end{bmatrix}^T. \tag{3}$$
According to Eq. (3), $\tilde{V}_{m \times (m-r)}$ spans the right null space of $X_{N \times m}$ and contains the columns of $V$ corresponding to the zero singular values. The auto-regressive relationship between the process measurements $X = [x_1, x_2, \ldots, x_m]^T$ can be captured using the columns of $\tilde{V}_{m \times (m-r)}$ such that $\tilde{V}^T X = 0$. Using this homogeneous equation and knowing that $(m-a)$ upper bounds of the measurements' variations are unknown, we can write $\tilde{V}^T V_X = 0$, in which $V_X$ is an array of all process measurements' upper bounds. The problem is then recast as solving a set of linear equations for the $(m-a)$ unknown upper bounds. To this aim, by rearranging the columns of $\tilde{V}^T$ according to the rearranged upper bound vector $V_X = [V_X^{unknown}, V_X^{known}]^T \in \mathbb{R}^m$, the matrix $A \in \mathbb{R}^{(m-r) \times (m-a)}$ is built such that the problem is redefined as solving
$$A V_X^{unknown} = C, \tag{4}$$
where $V_X^{unknown} \in \mathbb{R}^{(m-a)}$, and $C \in \mathbb{R}^{(m-r)}$ is calculated by multiplying $V_X^{known}$ by the columns of $\tilde{V}^T$ corresponding to the $a$ known upper bounds. The columns of $\tilde{V}^T$ corresponding to the $(m-a)$ unknown upper bounds are put together in the matrix $A$. In general, Eq. (4) is consistent, i.e. it has at least one solution $V_X^{unknown}$, if the rank of the augmented matrix $[A \mid C] \in \mathbb{R}^{(m-r) \times (m-a+1)}$ is equal to the rank of the coefficient matrix $A \in \mathbb{R}^{(m-r) \times (m-a)}$. This solution is unique if this rank is equal to $(m-a)$. Finally, when the upper bounds of variations for all $m$ process measurements are determined, the upper bound required in Eq. (2) is calculated as $\bar{V} = V_X^T M_{\phi^0} V_X$.
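The procedure of Remark 2 can be sketched with NumPy on a toy example. The function `complete_bounds`, the rank tolerance and the data are hypothetical; the sign of $C$ is chosen here so that the known terms of $\tilde{V}^T V_X = 0$ are moved to the right-hand side, which is an assumption about how Eq. (4) is meant to be read.

```python
import numpy as np


def complete_bounds(X, known_idx, known_vals, tol=1e-10):
    """Solve for unknown upper bounds from the right null space of X."""
    _, s, Vt = np.linalg.svd(X, full_matrices=True)
    r = int(np.sum(s > tol * s.max()))          # numerical rank of X
    V_til_T = Vt[r:]                            # rows span the right null space
    m = X.shape[1]
    unknown_idx = [k for k in range(m) if k not in known_idx]
    A = V_til_T[:, unknown_idx]
    # Known terms moved to the right-hand side: A v_unknown = -B v_known.
    C = -V_til_T[:, known_idx] @ np.asarray(known_vals, dtype=float)
    sol, *_ = np.linalg.lstsq(A, C, rcond=None)
    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.column_stack([A, C]))
    consistent = bool(rank_A == rank_aug)       # at least one solution exists
    unique = consistent and bool(rank_A == len(unknown_idx))
    return dict(zip(unknown_idx, sol)), consistent, unique


# Toy data: the third variable is exactly the sum of the first two.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
X = np.column_stack([x1, x2, x1 + x2])

bounds, consistent, unique = complete_bounds(X, known_idx=[0, 1], known_vals=[10.0, 7.0])
```

In this example the null-space relation is $x_1 + x_2 - x_3 = 0$, so the recovered bound for the third variable is the sum of the two known bounds.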
Finally, as shown in Eq. (5), the zero-order error index $E^0$ is defined as the first feature index for the given test data $X_{test}(i)$,
$$\bar{X}_{test}(i) = \frac{X_{test}(i) - \bar{X}^*(i)}{v_0} \in \mathbb{R}^m, \qquad E^0(i) = \bar{X}_{test}(i)\, M_{\phi^0}\, \bar{X}_{test}(i)^T \tag{5}$$
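One reading of the switching rule in Eq. (2) is sketched below for a single variable. It is an interpretation, not the authors' code: the function names are hypothetical, and the sketch assumes that the difference of the two sigmoids acts as a window that is close to 1 when $0 < D(i) < \bar{V}$ and close to 0 otherwise, so that the mean follows the weighted moving average inside the window and falls back to the training mean $m_0$ outside it.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def switch(D: float, a: float, V_bar: float) -> float:
    """Window function: about 1 for 0 < D < V_bar, about 0 elsewhere."""
    return sigmoid(a * D) - sigmoid(a * (D - V_bar))


def weighted_moving_average(recent: Sequence[float]) -> float:
    """recent[0] is x*(i), recent[1] is x*(i-1), ...; newer samples weigh more."""
    W = len(recent)
    return sum((W - j) * x for j, x in enumerate(recent)) / (0.5 * W * (W + 1))


def updated_mean(D: float, recent: Sequence[float], m0: float,
                 a: float = 10.0, V_bar: float = 298.0) -> float:
    """Blend the training mean m0 and the moving average according to D."""
    s = switch(D, a, V_bar)
    return (2.0 / (s + 1.0) - 1.0) * m0 + s * weighted_moving_average(recent)


recent = [3.0, 3.0, 3.0]
below_limit = updated_mean(-1.0, recent, m0=0.0)    # phi below its limit: keep m0
inside_zone = updated_mean(5.0, recent, m0=0.0)     # inside the zone: follow the data
beyond_zone = updated_mean(400.0, recent, m0=0.0)   # beyond V_bar: keep m0
```

The weights $(W_L - j)/(0.5\,W_L(W_L+1))$ sum to one, so the moving average is a convex combination of the most recent samples.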
By following this updating rule for the variable means, it is assumed that the structure of the process is intact, which implies that the principal directions remain the same during the process. Any relatively slow changes in the mean or oscillations due to normal operating variations are compensated by the mean updating rule. On the other hand, if there is a severe malfunction in the process which drives the combined index $\phi^0$ to significantly exceed its threshold ($UCL_{\phi^0} = 2$), the engineered error index $E^0$ will not get updated and it will detect that malfunction.
2.1.2 The First-Order Error Index $E^1$
This index is defined to monitor the first-order difference (rate of change) of the process variables. Although the non-stationary mean variations of the process variables are random, it is expected that their rate of change is bounded in many cases. For instance, in a continuous stirred tank reactor (CSTR) process, there exists a catalyst that degrades over time and induces a first-order (ramp) change in two variables during operation. In the acetylene hydrogen reactor [39], some of the variables are subjected to drifting mean changes due to the degradation. Moreover, in the distillation column process, variables have a trend similar to a random walk signal, but the rate of change of the variations is bounded. The following shows the detailed steps to calculate the proposed $E^1$. Considering a block-wise approach with block length $W_I$, determine the average rate of change of the variables $X(i) \in \mathbb{R}^m$ as follows,
$$X^{d1}(i) = \frac{1}{W_I}\left(\sum_{j=1}^{W_I} X(i-j) - \sum_{j=W_I}^{2W_I} X(i-j)\right) \tag{6}$$
Then one can conduct PCA on $X^{d1} \in \mathbb{R}^{N \times m}$ for $N$ training samples in order to extract the principal transformation matrix $M_{\phi^1}$ by following Algorithm 1. Then the first-order error index is determined,
$$E^1(i) = X^{d1}(i)\, M_{\phi^1}\, X^{d1}(i)^T \tag{7}$$
In the non-linear case, kernel PCA is applied to the time-series $X^{d1}$ to extract the kernel transformation matrix $M_{\phi^1}^{kernel}$ accordingly, and the first-order error index can be similarly calculated as $E^1 = k(x^{d1})\, M_{\phi^1}^{KPCA}\, k(x^{d1})^T$.
2.1.3 The Second-Order Error Index $E^2$
This index is defined to monitor the second-order difference of the process variations. Similar to $E^1$, this index can also be determined accordingly. Using a block-wise approach, we obtain
$$X^{d2}(i) = \frac{1}{W_I}\left(\sum_{j=1}^{W_I} X^{d1}(i-j) - \sum_{j=W_I}^{2W_I} X^{d1}(i-j)\right). \tag{8}$$
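The block-wise differences of Eqs. (6) and (8) can be sketched for a single variable as follows. The function names are hypothetical, and one detail is an assumption: the older block is taken over $j = W_I + 1, \ldots, 2W_I$ so that both blocks have the same length and a constant signal gives a zero difference, whereas the printed lower limit of the second sum is $j = W_I$.

```python
from typing import List


def block_difference(x: List[float], i: int, W: int) -> float:
    """Mean-scaled difference between the latest block and the block before it."""
    if i - 2 * W < 0:
        raise ValueError("need at least 2*W past samples")
    recent = sum(x[i - j] for j in range(1, W + 1))
    older = sum(x[i - j] for j in range(W + 1, 2 * W + 1))   # assumed equal-length block
    return (recent - older) / W


def difference_series(x: List[float], W: int) -> List[float]:
    """Apply the block difference at every index with enough history."""
    return [block_difference(x, i, W) for i in range(2 * W, len(x))]


# A ramp with slope 0.5: the first-order feature is constant, the second is zero.
ramp = [0.5 * k for k in range(30)]
d1 = difference_series(ramp, W=3)   # first-order feature series
d2 = difference_series(d1, W=3)     # second-order feature series
```

For a ramp of slope $s$ this block difference equals $s\,W_I$, so a fault that only changes the slope shows up as a level shift in the first-order series and leaves the second-order series near zero.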
It should be noted that the training data $X^{d1}$ is already standardized before it is used to generate the training data for the second-order feature $E^2$. After applying PCA to $X^{d2} \in \mathbb{R}^{N \times m}$ for the linear case and deriving the corresponding transformation matrix, the proposed feature is determined as follows,
$$E^2(i) = X^{d2}(i)\, M_{\phi^2}\, X^{d2}(i)^T \tag{9}$$
Similarly, for the non-linear case, it is defined as $E^2 = k(x^{d2})\, M_{\phi^2}^{KPCA}\, k(x^{d2})^T$, where $k(x^{d2})$ represents the test data transformed using the kernel transformation function.
Remark 3. The first-order error index $E^1$ monitors the rate of change of the process time-series. Accordingly, the second-order error index $E^2$ monitors the acceleration of the changes. It should be noted that higher-order indices may also be derived and incorporated as additional features, but as a rule of thumb, the zero-, first- and second-order indices convey three useful physical aspects of variations in most dynamic processes.
2.2 Unsupervised Non-parametric Learning for Combined Health Index
Based on the three monitoring indices defined above, a non-parametric probability-based approach is proposed in this section and used to learn the normal behaviour of the system under no-fault operating conditions. Assuming that $X \in \mathbb{R}^{N \times m}$ with $N$ observations is utilized for generating the feature indices and that $N_c$ is the number of sample points, the feature matrix is built as follows,
$$X_f = [\log(E^0), \log(E^1), \log(E^2)] \in \mathbb{R}^{N_c \times 3} \tag{10}$$
The feature indices $E^n$, $n = 0, 1, 2$ are presented in Sect. 2.1. The logarithm is applied in order to transform the feature density function into a distribution that more closely resembles the Gaussian for further analysis. Based on $X_f$, an overall condition monitoring index is to be determined and used in hypothesis testing for fault detection. In this case, a null hypothesis is defined for the no-fault case, including non-stationary mean variations and normal mode changes, while the alternative hypothesis is defined for the faulty case, i.e. the occurrence of process anomalies. To achieve this, it is suggested to obtain the probability density function(s) (PDF) of the feature indices for normal operating conditions. Then, given a new process observation, the PDF can be utilized to estimate how likely it is that the new observation belongs to the normal operating condition. To estimate the PDF of the time-series, we suggest using a conventional non-parametric approach, the kernel density estimator (KDE) [40]. KDE is used to approximate the individual PDF for each column of the feature matrix, $x_f^j$, $j = 1, 2, 3$:
$$F_j^p(x) = \frac{1}{N_c h} \sum_{i=1}^{N_c} K\!\left(\frac{x - x_f^j(i)}{h}\right), \quad j = 1, 2, 3 \tag{11}$$
where $F_j^p$ is the estimated probability density function (PDF), $K(\cdot)$ is a kernel function (e.g. Gaussian, spherical, Epanechnikov, etc.) satisfying Mercer's conditions, and $h$ is the bandwidth of the KDE, which introduces a smoothing effect on its shape. A large value of $h$ leads to a smoother kernel distribution function, and a small value produces a sharper one with a higher level of fluctuations. The bandwidth $h$ can be selected adaptively using the maximum likelihood method or the k-nearest neighbour approach, which updates $h$ according to the Euclidean distance from the $k$-th nearest observation [41,42]. The estimated probability of the feature index $x_f^j$, $\hat{P}_j(x_f(i))$, $j = 1, 2, 3$, is then calculated; it represents the probability that the process measurement corresponds to the normal operating condition, i.e.
$$\hat{P}_j(x_f(i)) = \int_{x_f(i) - \tau_j/2}^{x_f(i) + \tau_j/2} F_j^p(x)\,dx \approx \tau_j F_j^p(x_f(i)). \tag{12}$$
To determine the parameter $\tau_j$ in Eq. (12), first the minimum and maximum values corresponding to the upper and lower 99.99% percentiles of $F_j^p(x_f^j)$ are calculated. Then, according to the property of a density function,
$$\int_{\underline{x}_f^j}^{\bar{x}_f^j} F_j^p(x)\,dx \approx 1$$
where $\underline{x}_f^j = \min(x_f^j)$ and $\bar{x}_f^j = \max(x_f^j)$, the interval between $\underline{x}_f^j$ and $\bar{x}_f^j$ can be divided into $N_b$ subintervals. By using the Newton-Cotes formula, $\tau_j$ is approximated as
$$\tau_j = 1 \Big/ \sum_{k=1}^{N_b - 1} F_j^p\!\left(\underline{x}_f^j + k\,\frac{\bar{x}_f^j - \underline{x}_f^j}{N_b}\right).$$
When the process measurements are subjected to a fault, the feature indices will deviate from their normal operating conditions; hence, the estimated probability $\hat{P}_j(x_f^j(i)) \to 0$, depending on the magnitude of the fault-induced changes. On the other hand, under normal operating conditions, each feature should be near the maximum possible probability of $x_f^j$, which can be determined as $\gamma_j = \tau_j \max_k\left(F_j^p(x_f^j(k))\right)$, $k = 1, \ldots, N_b$. The estimated $\hat{P}_j(\cdot)$ for all three feature indices are used to calculate the combined health index $R$ as follows,
$$R(i) = a\left(1 - \frac{\prod_{j=1}^{3} \hat{P}_j(x_f^j(i))}{\prod_{j=1}^{3} \gamma_j}\right) \tag{13}$$
where $a$ is a tuning parameter representing the upper bound of the overall index $R$. $R$ in Eq. (13) is constructed to have certain desirable properties. For example, when there is a malfunction in the process and the $\hat{P}_j$ values are small (or close to zero), the value of $R$ should be close to the upper bound, i.e. $R \approx a$. This feature is preferred by industrial users because they mostly desire to work with a bounded health index which reaches its maximum for faulty conditions but
has relatively negligible values for normal operations. Therefore, $R$ is defined in such a way that it mostly generates values close to zero or to its maximum limit. As a final step, it is required to determine the upper control limit (UCL) of the proposed health index $R$ for proper thresholding. For each feature index, the $\alpha$ tail percentile of the corresponding KDE is calculated and its corresponding value is considered as $\epsilon$ for that feature. In other words, for the $j$-th feature index, $\epsilon_j$ is determined as
$$\epsilon_j = \max\left(F_j^p(\underline{x}_f^j),\, F_j^p(\bar{x}_f^j)\right) \quad \text{s.t.} \quad \int_{\underline{x}_f^j}^{\bar{x}_f^j} F_j^p(x)\,dx = \alpha \tag{14}$$
Hence, the overall UCL for $R$ in Eq. (13) is as follows,
$$UCL_R = a\left(1 - \frac{\prod_{j=1}^{3} \epsilon_j}{\prod_{j=1}^{3} \gamma_j}\right) \tag{15}$$
Upon observation of a new test process measurement $x_f \in \mathbb{R}^3$, the proposed health index is calculated as in Eq. (13). Then the null hypothesis is defined as $H_0: R < UCL_R$ (fault free), and the alternative hypothesis is $H_1: R \ge UCL_R$ (faulty process). If the overall health index $R$ reaches or exceeds its threshold, the null hypothesis is rejected and an alarm flagging the fault can be generated.
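The chain from the KDE of Eq. (11) to the hypothesis test can be sketched in plain Python. This is an illustrative sketch under several assumptions that are not stated in the text: a Gaussian kernel with a fixed bandwidth, the sample minimum and maximum as integration limits, empirical sample quantiles for the tail points, and a tail value scaled by $\tau_j$ so that it is comparable with $\gamma_j$. All function names and the synthetic features are hypothetical.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Model = Tuple[Callable[[float], float], float, float, float]  # (F, tau, gamma, eps)


def gaussian_kde(sample: Sequence[float], h: float) -> Callable[[float], float]:
    """Kernel density estimate with a Gaussian kernel and bandwidth h."""
    c = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return lambda x: c * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)


def fit_feature(sample: Sequence[float], h: float, n_bins: int = 200,
                alpha: float = 0.01) -> Model:
    F = gaussian_kde(sample, h)
    lo, hi = min(sample), max(sample)
    dens = [F(lo + k * (hi - lo) / n_bins) for k in range(1, n_bins)]
    tau = 1.0 / sum(dens)                    # width for which sum(F) * tau = 1
    gamma = tau * max(dens)                  # largest attainable probability
    s = sorted(sample)
    q_lo = s[int(0.5 * alpha * (len(s) - 1))]
    q_hi = s[int((1.0 - 0.5 * alpha) * (len(s) - 1))]
    eps = tau * max(F(q_lo), F(q_hi))        # assumed tail probability
    return F, tau, gamma, eps


def health_index(x_f: Sequence[float], models: Sequence[Model], a: float = 1.0) -> float:
    p = math.prod(tau * F(x) for x, (F, tau, _, _) in zip(x_f, models))
    g = math.prod(m[2] for m in models)
    return a * (1.0 - p / g)


def control_limit(models: Sequence[Model], a: float = 1.0) -> float:
    return a * (1.0 - math.prod(m[3] for m in models) / math.prod(m[2] for m in models))


# Synthetic stand-ins for the three log-feature columns under normal operation.
rng = random.Random(0)
features = [[rng.gauss(0.0, 1.0) for _ in range(300)] for _ in range(3)]
models = [fit_feature(col, h=0.5) for col in features]

ucl = control_limit(models)
r_normal = health_index([0.0, 0.0, 0.0], models)   # all features near their mode
r_fault = health_index([0.0, 0.0, 8.0], models)    # third feature far in the tail
alarm = r_fault >= ucl
```

A single feature far from its normal density is enough to drive the product of probabilities towards zero, which pushes the index towards its upper bound `a`.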
3 Alarm Generation for Process Monitoring
Alarm-based fault detection in chemical processes is investigated in a great number of research studies. In some cases, the end-user prefers a binary alarm signal indicating whether the process has a malfunction. With this aim, an alarm generator can be designed especially when the process is subjected to an oscillatory type of fault that leads to fluctuation in the proposed health index [43,44]. Almost all the fault alarm techniques are based on a predetermined trade-off between “false alarm rate” and “missed alarm rate”. For this purpose, receiver operating characteristic (ROC) curve is utilized to show the probability of missed alarm versus the probability of false alarm while the trip point varies from −∞ to +∞, [45]. Three conventional techniques to reduce false alarms and missed alarms include filtering, delayed alarm and alarm deadband [46]. Although the proposed general health index itself in Eq. (13) has certain favourable advantages as mentioned above, various uncertainties and disturbances such as occasional missing data, a surge in data acquisition system and sensor noise might create unwanted spikes that should not be detected as a process fault. To tackle this challenge, a rule-based alarm generator is proposed and given in Algorithm 2.
Algorithm 2. Alarm generating procedure using the proposed monitoring index $R$
1: INPUTS of Algorithm:
   $R(i)$ := the overall health index
   $UCL_R$ := the upper control limit of the overall health index
   $b$ := weighting parameter for marking up the faulty observations
   $w_1$ := window size for weighted averaging of the overall health index
   $w_2$ := window size for the fault continuity test
2: (Assign more weight to the residual samples greater than the UCL)
3: if $R(i) < UCL_R$ then
4:    $R_c(i) = R(i)$
5: else
6:    $R_c(i) = b\,R(i)$
7: (Define a window of length $w_1$ to store the previous weighted $R_c$)
8: $R_1(i) = \frac{1}{w_1} \sum_{j=0}^{w_1} R_c(i-j)$
   (First Layer Alarm Generator → $Alarm_1$)
9: if $R_1(i) > UCL_R$ then
10:    $Alarm_1(i) = True$
11: else
12:    $Alarm_1(i) = False$
13: (Second Layer Alarm Generator → $Alarm_2$)
14: if $Alarm_1(i) == True$ then
15:    if $\frac{1}{w_2} \sum_{j=i-w_2}^{i} Bool_{val}(Alarm_1(j)) > 0.75$ then
16:       $Alarm_2(i) = True$
17:    else
18:       $Alarm_2(i) = False$
19: else
20:    $Alarm_2(i) = False$
21: (Third Layer Alarm Generator → $Alarm_3$)
22: if $Alarm_1(i) == True$ & $Alarm_2(i-1) == True$ then
23:    $Alarm_3(i) = True$
24: else if $Alarm_1(i) == True$ & $Alarm_2(i) == False$ & $Alarm_3(i-1) == True$ then
25:    $Alarm_3(i) = True$
26: else
27:    $Alarm_3(i) = False$
28: OUTPUT of Algorithm: $Alarm_2$ ⇒ Caution, $Alarm_3$ ⇒ Fault
The main idea of Algorithm 2 is a three-step processing of alarm signals using a moving average filter and an alarm delay technique. First, a higher weight is assigned to the faulty residual samples so that they outweigh the normal samples in the moving average. Second, the duration of the fault is required to be greater than a predefined window to ensure that the alarm is not active
for an outlier measurement or a surge of the DAQ card due to digitization. Finally, a rule-based approach is used to connect the entire faulty zone and create a continuous alarm for the detected malfunction. In addition to the final fault alarm ($Alarm_3$), Algorithm 2 also generates a caution signal ($Alarm_2$) to notify operators of possible maintenance actions before the process reaches a more severe faulty condition.
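The three layers of Algorithm 2 can be sketched in plain Python as follows. The function name and default parameters are hypothetical, and two details are assumptions: the weighting test is applied to the raw index $R(i)$, and both window averages are divided by the number of samples actually in the window, which also handles the first few samples where the window is not yet full.

```python
from typing import List, Sequence, Tuple


def generate_alarms(R: Sequence[float], ucl: float, b: float = 2.0,
                    w1: int = 5, w2: int = 10,
                    ratio: float = 0.75) -> Tuple[List[bool], List[bool], List[bool]]:
    """Return (alarm1, alarm2, alarm3) sequences for a health index series R."""
    # Mark up the samples at or above the control limit.
    Rc = [r if r < ucl else b * r for r in R]
    a1: List[bool] = []
    a2: List[bool] = []
    a3: List[bool] = []
    for i in range(len(R)):
        # Layer 1: moving average of the weighted index.
        win = Rc[max(0, i - w1): i + 1]
        a1.append(sum(win) / len(win) > ucl)
        # Layer 2: continuity test over the last w2 + 1 first-layer alarms.
        hist = a1[max(0, i - w2): i + 1]
        a2.append(a1[i] and sum(hist) / len(hist) > ratio)
        # Layer 3: latch the alarm so that the faulty zone is reported as one block.
        prev2 = a2[i - 1] if i > 0 else False
        prev3 = a3[i - 1] if i > 0 else False
        a3.append(a1[i] and (prev2 or (not a2[i] and prev3)))
    return a1, a2, a3


# Synthetic index: normal level 0.1, one isolated spike, then a sustained fault.
R = [0.1] * 100
R[20] = 3.0
R[50:] = [3.0] * 50
alarm1, alarm2, alarm3 = generate_alarms(R, ucl=1.0)
```

In this example the isolated spike raises only the first-layer alarm for a few samples, while the caution and fault signals appear only after the sustained change has persisted for most of the continuity window.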
4 Implementation and Case Studies
In this section, firstly, a numerical example is presented and simulations are conducted to evaluate the performance of the proposed method under various common non-stationary conditions. Secondly, a comparison study is carried out between some of the existing approaches in a similar line of work (e.g. recursive PCA and moving-window PCA) and the proposed modified PCA to investigate their performance under the selected fault scenarios. Lastly, the proposed approach is implemented and tested on an industrial compressor data set to demonstrate its applicability and accuracy for an actual industrial system.
4.1 Numerical Example and Simulation Case Study
The following shows a numerical process model using a mathematical representation similar to Eq. (1),
$$X = UP + w, \qquad P = \begin{bmatrix} 3 & 2 & 3 & -5 & 0 & -3 & 0 & 0 & 0 & 0 \\ 0 & 0 & -2 & 0 & -1.5 & 0 & 2 & 0 & 0 & 0 \\ 1 & 0 & -3 & 0 & -2 & -5 & 0 & -2 & 8 & 3 \end{bmatrix} \tag{16}$$
where $U = [\nu, \mu]^T \in \mathbb{R}^3$, $\nu \sim N(0, 0.05)$, $\delta \sim N(0, 0.005)$ and $w \sim N(0, 0.0005) \in \mathbb{R}^{10}$. Also, $\mu(i) = \mu(i-1) + \delta(i-1)$ acts as a random walk noise that introduces the non-stationary behaviour into the process measurements. In Fig. 2, the process variables of the above system are plotted. The simulation lasts for 400 s with sampling time $T = 0.01$ s, which yields $N = 40{,}000$ observations. The data points are split into two halves, with the first half (200 s) used for training/cross-validation and the second half (200 s) for testing. Furthermore, of the first half, 30% of the data (corresponding to 60 s) is used for training and 70% (i.e. 140 s) for cross-validation. The entire simulation generates four different modes, which are common in a controlled process. They manifest in four trends shown in all process variables, namely the steady state, the random variations which mimic closed-loop control transients, and the variations due to step and ramp changes (i.e. changes of set-point). For verification purposes, only two of them are considered in the training of the base-line features and the KDE calculation, and the other two modes are assumed to be unknown to the proposed method during the learning process. The window lengths in Eqs. (2) and (6) are selected as $W_L = 50$ and $W_I = 100$, respectively. The expected upper bound of the normal variation
range of the time-series is known as $V = [10, 7, 14, 19, 22, 10, 14, 15, 2, 3, 3]^T$, then $\bar{V} = V M_{\phi^0} V^T = 298$. When training the base-line models while applying PCA, the cumulative percentage variance $CPV = 95\%$ is chosen for all features. Figure 3(a) shows the indices $E^0$ and $\phi^0$ (i.e. the combined index from DPCA [37]) for the normal operating condition and shows how robust the proposed engineered error index is to the mode changes and non-stationary variations. The objective of this simulation study is to demonstrate that the proposed scheme and the overall health index $R$ can be used to effectively distinguish the normal operating mode changes (or non-stationary variations) from actual process malfunctions causing similar trends in the process variables. In many existing methods, the non-stationary normal trends are very often flagged as faults, resulting in an excessive number of false alarms. In Fig. 3(b), the overall index $R$ with tuning parameter a = 105 is shown for the normal operating condition, and it can be observed that, although there are two significant mode changes and non-stationary mean variations of four process variables, the overall index successfully indicates the normal operating condition. Only a few isolated false alarms are generated, compared with the constant false alarms over an extended time interval shown in Fig. 3(a). Figure 3(c) shows the three feature indices proposed in this work under normal operating conditions, which, as expected, do not show significant deviations despite the non-stationary mode change.
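A data set with the structure of Eq. (16) can be generated with a few lines of plain Python. The sketch below rests on assumptions that the text does not settle: the first two inputs are taken as the i.i.d. components $\nu$ and the third as the random walk $\mu$, the second argument of each $N(\cdot,\cdot)$ is read as a variance, and the mode changes (step, ramp, transients) of the original simulation are not reproduced. The function name `simulate` is hypothetical.

```python
import math
import random
from typing import List

P = [
    [3, 2, 3, -5, 0, -3, 0, 0, 0, 0],
    [0, 0, -2, 0, -1.5, 0, 2, 0, 0, 0],
    [1, 0, -3, 0, -2, -5, 0, -2, 8, 3],
]


def simulate(n: int, seed: int = 0, var_nu: float = 0.05,
             var_delta: float = 0.005, var_w: float = 0.0005) -> List[List[float]]:
    """Generate n samples of X = U P + w with a random-walk third input."""
    rng = random.Random(seed)
    sd_nu, sd_delta, sd_w = (math.sqrt(v) for v in (var_nu, var_delta, var_w))
    mu = 0.0
    X: List[List[float]] = []
    for _ in range(n):
        u = [rng.gauss(0.0, sd_nu), rng.gauss(0.0, sd_nu), mu]
        X.append([sum(u[k] * P[k][c] for k in range(3)) + rng.gauss(0.0, sd_w)
                  for c in range(10)])
        mu += rng.gauss(0.0, sd_delta)      # mu(i) = mu(i-1) + delta(i-1)
    return X


X = simulate(1000)
X_noise_free = simulate(200, var_w=0.0)     # used to check the structure of P
```

Under these assumptions the last three variables depend only on the random walk, so in the noise-free case they are exact multiples of each other (for example $x_9 = -4\,x_8$).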
Fig. 2. Plot of all 10 process variables of the numerical example showing 4 different normal operating trends: steady-state, random variation (transient), step change, and ramp change
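The random-walk noise and the data split described above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' simulation: the process model itself is not reproduced, the seed and the initial value μ(0) = 0 are choices made here, and the second argument of N(0, 0.005) is assumed to be a variance.

```python
import numpy as np

T = 0.01          # sampling time in seconds (from the text)
DURATION = 400.0  # simulation length in seconds (from the text)
N = int(round(DURATION / T))  # number of observations

rng = np.random.default_rng(0)  # fixed seed, chosen here for repeatability

# Assumption: 0.005 in N(0, 0.005) is a variance, so the std is its square root.
delta = rng.normal(0.0, np.sqrt(0.005), size=N)

# Random walk mu(i) = mu(i-1) + delta(i-1), with mu(0) = 0 assumed.
mu = np.concatenate(([0.0], np.cumsum(delta[:-1])))

# First half: training + cross-validation; second half: testing.
half = N // 2
n_train = int(round(0.30 * half))    # 30% of the first half (60 s)
train_idx = np.arange(0, n_train)
cv_idx = np.arange(n_train, half)    # remaining 70% of the first half (140 s)
test_idx = np.arange(half, N)        # last 200 s

print(N, train_idx.size, cv_idx.size, test_idx.size)
```

With T = 0.01 s, the three index sets contain 6,000, 14,000 and 20,000 samples, matching the 60 s, 140 s and 200 s segments quoted in the text.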
Next, four different fault scenarios are defined and introduced to the process to evaluate the performance of the proposed method in detecting faults. It is
Fig. 3. The error indices using modified PCA and the DPCA approach in the numerical example under normal operating conditions
worth mentioning that all fault scenarios chosen in the simulation should satisfy the detectability criteria [41]. The proposed MPCA approach is mainly designed to perform fault detection in both stationary and non-stationary processes. As a special case, the proposed method acts as an ordinary PCA when non-stationary changes do not exist.

4.1.1 First Fault Scenario: Bias

For this fault scenario, a bias of F = [0, 8, 0, 0, 0, 0, 0, 0, 0, 0]T is added to the process measurements at the 290th second, and the overall health index R is shown in Fig. 4(b). Although the magnitude of the additive fault is within the safe range of variation for the second variable, the SPE portion of the combined index φ0 reacts aggressively and exceeds $\bar{V} = 298$. Therefore, as shown in Fig. 4(a), the mean updating rule is deactivated and the error index E0 shows the fault. Both the proposed method and DPCA can detect the fault. However, DPCA generates an excessive number of false alarms before correctly detecting the actual bias fault, while the proposed method detects the actual fault almost precisely.
Fig. 4. The error indices using modified PCA and the DPCA approach in the numerical example under the first fault scenario (additive bias)
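As an illustration of how additive faults of this kind can be injected into simulated data, the following Python sketch adds a bias vector, or a ramp on a single variable, from a given start time onward. The helper names (`start_index`, `add_bias`, `add_ramp`) are hypothetical, the zero matrix stands in for the fault-free measurements, and the ramp slope is assumed to be expressed per second, which the text does not specify.

```python
import numpy as np

T = 0.01  # sampling time in seconds (from the text)

def start_index(t_start, T=T):
    """Sample index at which a fault starting at t_start seconds begins."""
    return int(round(t_start / T))

def add_bias(X, F, t_start):
    """Return a copy of X (samples x variables) with bias vector F added from t_start on."""
    Xf = X.copy()
    Xf[start_index(t_start):, :] += np.asarray(F, dtype=float)
    return Xf

def add_ramp(X, var, slope, t_start):
    """Return a copy of X with a ramp added to column `var` from t_start on.

    Assumption: `slope` is expressed per second.
    """
    Xf = X.copy()
    k0 = start_index(t_start)
    elapsed = (np.arange(k0, X.shape[0]) - k0) * T  # seconds since fault onset
    Xf[k0:, var] += slope * elapsed
    return Xf

X = np.zeros((40000, 10))            # placeholder for the fault-free measurements
F = [0, 8, 0, 0, 0, 0, 0, 0, 0, 0]   # bias on the second variable
X_bias = add_bias(X, F, t_start=290.0)
X_ramp = add_ramp(X, var=9, slope=0.05, t_start=290.0)  # x10 is column 9
```

Both helpers leave the input array untouched, so the same fault-free data can be reused for every scenario.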
4.1.2 Second Fault Scenario: Slow Ramp Variation

In this case, an additive slow ramp variation with a slope of 0.05 is introduced to x10 at the 290th second. The challenge in this scenario is that the fault does not cause the process variables to exceed their upper bounds; hence, as shown in Fig. 5(a), the moving-mean updating rule remains active, E0 does not reject the null hypothesis, and it is blind to the fault. For this type of fault, the first-order error index E1 plays a prominent role in detection, because the ramp appears as a fault in the first-order difference of the time-series. Figure 5(b) shows the index R for this scenario, which precisely detects the presence of the fault.

4.1.3 Third Fault Scenario: Steep Ramp

In this simulation case, x3 and x6 are subjected to a fault with a steep ramp trend with slopes of −0.2 and 0.4, respectively, which drag the mean variations out of the expected range of the time-series $\bar{V}$. In this case, due to the steeper variation, both E0 and E1 can detect this fault-induced change; hence the proposed overall index R shown in Fig. 6(b) successfully flags the fault with only a few isolated false alarms. To save space, only E0 is plotted in Fig. 6(a). It should be noted that, for the DPCA method, the fault is detected together with an excessive number of false alarms before and after the actual fault incidents.

4.1.4 Fourth Fault Scenario: Random Drift

In this case, a random drift fault is simulated, for which d(i) = d(i − 1) + ε(i − 1), ε ∼ N(0, 0.1), is added to the process variable x4 at the 300th second. It can be seen from Fig. 7(a) that such a fault causes drastic fluctuations of the zero-order index E0. Generally speaking, the challenge of a random drift fault is that it lacks a deterministic trend of the kind that can usually be removed by ordinary regression-based detrending. On the contrary, random drift faults frequently drive the mean of the process measurements across their upper bounds $\bar{V}$ in a random manner, which can increase the false alarm rate if only the zero-order index E0 is used. To deal with such a fault, all three feature indices E0,
Fig. 5. The error indices using modified PCA and the DPCA approach in the numerical example under the second fault scenario (slow ramp)
Fig. 6. The error indices using modified PCA and the DPCA approach in the numerical example under the third fault scenario (steep ramp)
E1 and E2 play an important role, as shown in Fig. 7(c), and are combined in the overall index R. As shown in Fig. 7(b), the index R successfully flags the fault with rare false alarms.

4.2 Comparison Case-Study
In the above, to validate the performance of the proposed fault detection scheme, a comparison between the proposed modified PCA and the ordinary PCA methods was conducted, and their differences were investigated in detail for multiple
Fig. 7. The error indices using modified PCA and the DPCA approach in the numerical example under the fourth fault scenario (random drift)
fault scenarios. In this section, a comparison case study between the proposed modified PCA approach and several recursive and adaptive PCA approaches is presented, and the advantages of the proposed approach are further demonstrated. Two aspects, online calculation complexity and fault detection accuracy, are used in the comparison. With respect to the calculation complexity, the proposed modified PCA relies on cascaded feature engineering and does not require any online adaptation, such as performing an SVD on the updated covariance matrix or updating the loading vectors and the number of principal components when the updating mechanism is triggered [47]. To clarify this difference, Table 1 shows the online calculation complexity of ordinary and fast MWPCA [48] as well as Recursive PCA (RPCA) [49] and compares them with the proposed modified PCA approach. Since several different types of Recursive PCA exist, the rank-one modification scheme is selected for the comparison, as it is considered the fastest among similar approaches. It can be seen from Table 1 that, for the fast MWPCA, the overall complexity does not depend on the length of the moving window, and it can be shown that, as a rule of thumb, if the length of the moving window satisfies W > 3m, the overall online calculation complexity of fast MWPCA is lower than that of ordinary MWPCA [48]. According to Table 1, RPCA is faster than MWPCA in general, but the modified PCA proposed in this paper still outperforms RPCA in terms of online calculation complexity. Intuitively, if WL, WI and W are considered equal, the online complexity of the proposed modified PCA simplifies to O(4(W + m)), which is lower than the RPCA online complexity of O((W + 1)m).
Table 1. Comparison of the online training calculation complexity of the proposed modified PCA, ordinary moving-window PCA, fast moving-window PCA and recursive PCA

Approach | Online calculation complexity
Modified PCA | O(2(WL + WI) + 4m)
Recursive PCA (rank-one modification) [49] | O((W + 1)m)
Ordinary moving-window PCA [48] | O(2Wm² + 8Wm + 4m)
Fast moving-window PCA [48] | O(6m³ + 20m² + 11m)
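To make the comparison in Table 1 concrete, the expressions can be evaluated for specific values of W and m. The Python sketch below treats the big-O arguments as literal operation counts, which is an assumption made purely for illustration; the function names are hypothetical, and the values W = 100 and m = 10 are chosen here to echo the window length and number of variables of the numerical example.

```python
def modified_pca_ops(W_L, W_I, m):
    """Operation count from Table 1 for the proposed modified PCA."""
    return 2 * (W_L + W_I) + 4 * m

def rpca_ops(W, m):
    """Operation count from Table 1 for recursive PCA (rank-one modification)."""
    return (W + 1) * m

def ordinary_mwpca_ops(W, m):
    """Operation count from Table 1 for ordinary moving-window PCA."""
    return 2 * W * m**2 + 8 * W * m + 4 * m

def fast_mwpca_ops(m):
    """Operation count from Table 1 for fast moving-window PCA (independent of W)."""
    return 6 * m**3 + 20 * m**2 + 11 * m

W, m = 100, 10  # illustrative values; W_L = W_I = W as in the text's simplification
counts = {
    "modified PCA": modified_pca_ops(W, W, m),
    "recursive PCA": rpca_ops(W, m),
    "ordinary MWPCA": ordinary_mwpca_ops(W, m),
    "fast MWPCA": fast_mwpca_ops(m),
}
for name, ops in counts.items():
    print(f"{name}: {ops}")
```

For these values the counts are 440, 1010, 28040 and 8110, respectively, which reproduces the ordering discussed in the text, and W = 100 exceeds 3m = 30, so the fast MWPCA count is below the ordinary one. Read literally, the simplified expressions also imply that 4(W + m) falls below (W + 1)m only when W(m − 4) exceeds 3m, i.e. for more than four variables and a sufficiently long window.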
Fig. 8. Fault detection accuracy comparison study between the proposed Modified PCA (the rightmost bar) and ordinary moving-window PCA (MWPCA, the leftmost bar), fast MWPCA (the second bar from the left) and Recursive PCA (RPCA, the third bar from the left) using the numerical example for all four fault scenarios. Low FAR is desirable under normal condition and high FDR is desirable under faulty conditions
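The FAR and FDR values reported in this comparison are ratios of alarm counts to sample counts. A minimal Python sketch of how such rates could be computed from per-sample boolean arrays is given below; the function name `fdr_far` and the toy data are illustrative, and it is assumed here that, for the FDR, only alarms raised on faulty samples are counted.

```python
import numpy as np

def fdr_far(alarms, faulty):
    """Compute (FDR, FAR) from boolean per-sample arrays.

    alarms[i] is True when the monitoring index raises an alarm at sample i;
    faulty[i] is True when sample i belongs to a faulty period.
    """
    alarms = np.asarray(alarms, dtype=bool)
    faulty = np.asarray(faulty, dtype=bool)
    n_faulty = faulty.sum()
    n_normal = (~faulty).sum()
    # FDR: alarms on faulty samples / number of faulty samples
    fdr = (alarms & faulty).sum() / n_faulty if n_faulty else float("nan")
    # FAR: alarms on fault-free samples / number of fault-free samples
    far = (alarms & ~faulty).sum() / n_normal if n_normal else float("nan")
    return fdr, far

# Toy example: 4 fault-free samples followed by 4 faulty samples.
faulty = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=bool)
alarms = np.array([0, 1, 0, 0, 0, 1, 1, 1], dtype=bool)
fdr, far = fdr_far(alarms, faulty)
print(fdr, far)
```

In the toy data, three of the four faulty samples and one of the four fault-free samples raise an alarm, giving FDR = 0.75 and FAR = 0.25.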
To compare the fault detection accuracy of the proposed modified PCA with that of its competitors MWPCA, fast MWPCA and RPCA, the false alarm rate (FAR) and fault detection rate (FDR) indices defined in Eq. (17) are utilized.

$$
\begin{cases}
FDR = \dfrac{\text{Number of alarms}}{\text{Number of faulty samples}} \\[6pt]
FAR = \dfrac{\text{Number of false alarms}}{\text{Number of fault-free samples}}
\end{cases}
\qquad (17)
$$

For this comparison study, the same numerical example presented in Sect. 4.1 is used. Firstly, all four selected approaches (the proposed modified PCA, ordinary MWPCA, fast MWPCA and RPCA) are applied under the non-stationary normal condition, for which the measurements are shown in Fig. 2. In addition, these methods are tested under all four fault scenarios (i.e. bias, slow ramp, steep ramp and random drift). With respect to the accuracy metrics, FAR is utilized to
measure the so-called Type I error of each approach for normal (fault-free) operating conditions, and for the fault scenarios, FDR is utilized to measure the Type II error (the missed-detection rate is 1 − FDR). To perform a fair comparison among the selected methods, for the RPCA approach the online adaptation mechanism is applied in a sample-wise format. Also, for ordinary and fast MWPCA, the block-wise approach with a window length of W = 100 samples is considered. In these three methods, according to the criteria defined in [47], the length of the exponentially weighted moving average (EWMA) filter is chosen as 50 samples. It should be noted that, to make the fault detection results of all methods comparable, the combined index $\phi = \frac{SPE}{UCL_{SPE}} + \frac{T^2}{UCL_{T^2}}$ is used, while the UCLs are not constant and may change according to the update mechanism. Although the combined index is initially defined for ordinary PCA, its definition can be used unchanged for the updated batches in the MWPCA and RPCA approaches. The result of this comparison study is summarized in Fig. 8. One important difference between the proposed modified PCA approach and its counterparts is that the principal components (i.e. the dynamics of the process) of the system under study are considered to be consistent even during the time-varying non-stationary changes in normal operating conditions. On the other hand, RPCA, ordinary MWPCA and fast MWPCA constantly update the structure of the system when the adaptation step is triggered and, most importantly, they are resilient to sensor outliers and abrupt changes in the mean and standard deviation of the measurements. As can be seen in Fig. 8, although all three approaches have a relatively better FDR for random drift, since the sudden mean changes in this scenario are mostly captured as outliers and successfully detected as anomalies, the proposed modified PCA still outperforms them. 
Moreover, for the bias and slow/steep ramp scenarios, the mean change keeps confusing the trigger mechanism in the RPCA and MWPCA methods. The EWMA filter in the trigger mechanism frequently switches on and off and hence cannot consistently detect the presence of faults. It should be noted that, for RPCA and both MWPCA approaches, sensor fault reconstruction is not used in this study, for the sake of a fair comparison. In addition, RPCA and both MWPCA approaches have a significantly higher FAR in the normal operating condition, as they capture some of the intrinsic non-stationary changes in the measurements as faulty readings. Also, for the slow ramp fault scenario, in which an additive fault starts to accumulate, the trigger mechanism is constantly active and the loading matrices/vectors keep being updated, so these methods are unable to capture this type of anomaly. The reason behind this shortcoming is that these methods are open to updating the principal loadings as long as the adaptation threshold is activated and do not have any measure of structural changes.

4.3 Industrial Case-Study
An industrial compressor data set including 26 sensor measurements categorized in Table 2 is investigated for validating the performance of the proposed scheme
in this paper. The main reason for considering this industrial dataset is that the compressor is generally a complex non-linear and time-varying system, and the measurement data has regular non-stationary mean variations, which is the case of interest for this research.

Table 2. Sensor measurement classification for the industrial compressor; each class belongs to a different component.

Vibration measurement | Sealing system | Lubrication oil | Process measurement
x1 → x9 | x10, x11, x12 | x13, x14, x15 | x16 → x26
In this industrial application, only the process variables x16 to x26 are considered for monitoring the process health condition. Based on inspection and the prior knowledge of domain experts, some normal batches of data are collected and used for training the non-linear base-line model and extracting the proposed features. The sampling rate for all measurements is 1 sample/min. The indices of the time-series measurements selected for training the base-line models are 1000 to 2000, 4500 to 5500, 15000 to 15500, 26000 to 27000 and 96000 to 97000. The length of the weighted moving average filter in Eq. (2) is chosen as WL = 10. For generating the first- and second-order feature indices shown in Eqs. (7) and (9), a window of size WI = 5 is used, acting as a moving average filter on the measurement increments. Considering that compressor measurements exhibit non-linear relationships, kernel PCA (KPCA) is adopted, in which the RBF kernel function $k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{5}\right)$ is used. To decompose the principal directions of the kernel matrix for all three features, CPV = 98% is considered, and the number of principal components is determined to be 5, 1 and 4 for the three feature indices, respectively. Figure 9 shows the standardized training data used for KPCA in the training mode. Also, the first- and second-order rates of variation for extracting the feature indices are shown in Figs. 9(b) and 9(c), respectively. Figure 10 presents the result of KPCA fault detection on the compressor data utilizing the proposed feature engineering method along with Algorithm 2 for generating caution and alarm signals. The red circles represent fault alarms, and the purple star shows the caution alarm, suggesting that the health status of the process should be examined. The idea behind generating a caution flag is to account for situations where a fault is developing at an early stage, or where a disturbance arises from a data acquisition surge, sensor spikes (i.e. oscillation) or missing data. Figure 11(a) shows the proposed health index R for the compressor data, obtained by fitting a multivariate kernel density estimator. As can be seen, the health index is mostly zero for the normal condition and is maximum for the faulty periods. This attribute of the health index helps operators to separate the normal condition from anomalies much more easily than with similar methods.
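The kernel step described above (RBF kernel with a denominator of 5, and retention of components up to CPV = 98%) can be sketched in Python as follows. This is a generic kernel PCA illustration under stated assumptions, not the authors' implementation: the kernel matrix is centred in feature space in the usual way, CPV is computed from the eigenvalues of the centred kernel matrix, the function names are hypothetical, and random data stands in for the standardized compressor measurements.

```python
import numpy as np

def rbf_kernel_matrix(X, width=5.0):
    """k(x_i, x_j) = exp(-||x_i - x_j||^2 / width), with width = 5 as in the text."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / width)  # clip tiny negative round-off

def center_kernel(K):
    """Centre the kernel matrix in feature space."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

def components_for_cpv(K_centered, cpv=0.98):
    """Smallest number of leading eigenvalues whose cumulative share reaches cpv."""
    eigvals = np.linalg.eigvalsh(K_centered)[::-1]   # descending order
    eigvals = np.clip(eigvals, 0.0, None)            # drop negative round-off
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratio, cpv) + 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 11))   # stand-in for standardized x16..x26 samples
K = rbf_kernel_matrix(X)
n_pc = components_for_cpv(center_kernel(K), cpv=0.98)
print(n_pc)
```

The number of retained components depends entirely on the data, so the value printed for the random stand-in data bears no relation to the 5, 1 and 4 components reported for the compressor features.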
Fig. 9. The error indices for the industrial compressor training data-set.
Fig. 10. The error indices (Φ0 of a kernel PCA approach and the engineered zero-order feature index E0 of the proposed method) and the generated alarm log-history using Algorithm 2 for 5 months of the compressor data. This figure compares the difference of the proposed modified PCA with kernel tricks and a conventional kernel PCA (KPCA)
Although the manipulated inputs of the compressor within the range of 30,000 to 40,000 introduce a mean change to all process variables, the health index R successfully recognizes this as a normal operating variation and stays within the threshold. It should be noted that there are several spikes and outlier samples of the proposed health index around the upper control limit, which are mainly due to the presence of uncertainties, disturbances or missing data. This issue can be handled by applying an alarm generator such as the one proposed in Algorithm 2. For closer examination, zoomed-in snapshots of two events are given in Figs. 11(b) and 11(c), which show the response of the overall index R at higher resolution. They clearly show the detection of several fault events on record.
Fig. 11. Process monitoring result of the compressor data using the proposed modified PCA and alarm Algorithm 2
5 Conclusion
In this paper, a modified PCA method is proposed that is applicable to both stationary and non-stationary processes. The key idea of this approach is to update the mean value of the variables adaptively, based on the upper bound of their expected range of variation. Three error indices are defined as features that are proper indicators of the mechanical-statistical behaviour of the process measurements. Moreover, a non-parametric approach based on a kernel density estimator is used to generate a new health index that reconciles the features. A fault alarm generation algorithm is also given for practical industrial implementation of the proposed method. The effectiveness of the modified PCA approach is demonstrated in a thorough comparison case study using a numerical example with different fault scenarios. A real industrial compressor data set is also used to test the proposed strategy for fault detection under several non-stationary operating conditions, and the performance is successfully verified.
References

1. Hwang, I., Kim, S., Kim, Y., Seah, C.E.: A survey of fault detection, isolation, and reconfiguration methods. IEEE Trans. Control Syst. Technol. 18(3), 636–653 (2010)
2. Venkatasubramanian, V., Rengaswamy, R., Yin, K., Kavuri, S.: A review of process fault detection and diagnosis: part I: quantitative model-based methods. Comput. Chem. Eng. 27(3), 293–311 (2003)
3. Venkatasubramanian, V., Rengaswamy, R., Kavuri, S.: A review of process fault detection and diagnosis: part II: qualitative models and search strategies. Comput. Chem. Eng. 27(3), 313–326 (2003)
4. Li, L., Chadli, M., Ding, S.X., Qiu, J., Yang, Y.: Diagnostic observer design for T-S fuzzy systems: application to real-time weighted fault detection approach. IEEE Trans. Fuzzy Syst. 26, 805–816 (2017)
5. Youssef, T., Chadli, M., Karimi, H., Wang, R.: Actuator and sensor faults estimation based on proportional integral observer for TS fuzzy model. J. Franklin Inst. 354(6), 2524–2542 (2017)
6. Chibani, A., Chadli, M., Shi, P., Braiek, N.B.: Fuzzy fault detection filter design for T-S fuzzy systems in finite frequency domain. IEEE Trans. Fuzzy Syst. 25, 1051–1061 (2016)
7. Yin, S., Ding, S., Haghani, A., Hao, H., Zhang, P.: A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process. J. Process Control 22(9), 1567–1581 (2012)
8. Wang, Y., Ma, G., Ding, S., Li, C.: Subspace aided data-driven design of robust fault detection and isolation systems. Automatica 47(11), 2474–2480 (2011)
9. Ding, S., Zhang, P., Naik, A., Ding, E., Huang, B.: Subspace method aided data-driven design of fault detection and isolation systems. J. Process Control 19(9), 1496–1510 (2009)
10. Qin, S.: Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control. 36(2), 220–234 (2012)
11. Li, G., Qin, J., Yuan, T.: Nonstationarity and cointegration tests for fault detection of dynamic processes. In: IFAC Proceedings of the 19th World Congress, pp. 24–29 (2014)
12. Box, G.E.P.: Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification. Ann. Math. Stat. 25(2), 290–302 (1954)
13. Kourti, T., MacGregor, J.F.: Process analysis, monitoring and diagnosis, using multivariate projection methods. Chemom. Intell. Lab. Syst. 28(1), 3–21 (1995)
14. Hu, Y., Palmé, T., Fink, O.: Fault detection based on signal reconstruction with auto-associative extreme learning machines. Eng. Appl. Artif. Intell. 57, 105–117 (2017)
15. Lu, C., Wang, Z.-Y., Qin, W.-L., Ma, J.: Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification. Signal Process. 130, 377–388 (2017)
16. Lee, S., Kwak, M., Tsui, K.-L., Kim, S.B.: Process monitoring using variational autoencoder for high-dimensional nonlinear processes. Eng. Appl. Artif. Intell. 83, 13–27 (2019)
17. Choi, S.W., Martin, E.B., Morris, A.J., Lee, I.-B.: Fault detection based on a maximum-likelihood principal component analysis (PCA) mixture. Ind. Eng. Chem. Res. 44(7), 2316–2327 (2005)
18. Rashidi, B., Singh, D.S., Zhao, Q.: Data-driven root-cause fault diagnosis for multivariate non-linear processes. Control. Eng. Pract. 70, 134–147 (2018)
19. Johansen, S.: Statistical analysis of cointegration vectors. J. Econ. Dyn. Control 12, 231–254 (1988)
20. Chen, Q., Kruger, U., Leung, A.-Y.-T.: Cointegration testing method for monitoring nonstationary processes. Ind. Eng. Chem. Res. 7(48), 3533–3543 (2009)
21. Ku, W., Storer, R., Georgakis, C.: Disturbance detection and isolation by dynamic principal component analysis. Chemom. Intell. Lab. Syst. 30(1), 179–196 (1995)
22. Ma, L., Dong, J., Peng, K., Zhang, K.: A novel data-based quality-related fault diagnosis scheme for fault detection and root cause diagnosis with application to hot strip mill process. Control. Eng. Pract. 67, 43–51 (2017)
23. Hyvärinen, A., Oja, E.: A fast fixed-point algorithm for independent component analysis. Neural Comput. 9(7), 1483–1492 (1997)
24. Yin, S., Ding, S.X., Zhang, P., Haghani, A., Naik, A.: Study on modifications of PLS approach for process monitoring. IFAC Proc. Vol. 44(1), 12389–12394 (2011)
25. Wold, S., Esbensen, K., Geladi, P.: Proceedings of the multivariate statistical workshop for geologists and geochemists principal component analysis. Chemom. Intell. Lab. Syst. 2, 37–52 (1987)
26. Rato, T., Reis, M.: Fault detection in the Tennessee Eastman benchmark process using dynamic principal components analysis based on decorrelated residuals (DPCA-DR). Chemom. Intell. Lab. Syst. 125(1), 101–108 (2013)
27. Schölkopf, B., et al.: Input space versus feature space in kernel-based methods. IEEE Trans. Neural Netw. 10(5), 1000–1017 (1999)
28. Ding, M., Tian, Z., Xu, H.: Adaptive kernel principal component analysis. J. Sig. Process. 90(5), 1542–1553 (2010)
29. Lee, J.-M., Yoo, C., Choi, S.W., Vanrolleghem, P.A., Lee, I.-B.: Nonlinear process monitoring using kernel principal component analysis. Chem. Eng. Sci. 59(1), 223–234 (2004)
30. Hoffmann, H.: Kernel PCA for novelty detection. Pattern Recogn. 40(3), 863–874 (2007)
31. Cho, J.-H., Lee, J.-M., Choi, S.W., Lee, D., Lee, I.-B.: Fault identification for process monitoring using kernel principal component analysis. Chem. Eng. Sci. 60(1), 279–288 (2005)
32. Kwak, N.: Nonlinear projection trick in kernel methods: an alternative to the kernel trick. IEEE Trans. Neural Netw. Learn. Syst. 24(12), 2113–2119 (2013)
33. Geladi, P., Kowalski, B.R.: Partial least-squares regression: a tutorial. Analytica Chimica Acta 185(Supplement C), 1–17 (1986)
34. Hu, Z., Chen, Z., Gui, W., Jiang, B.: Adaptive PCA based fault diagnosis scheme in imperial smelting process. ISA Trans. 53(5), 1446–1455 (2014)
35. Liu, X., Kruger, U., Xie, L., Wang, S.: Moving window kernel PCA for adaptive monitoring of nonlinear process. Chemom. Intell. Lab. Syst. 96(2), 132–143 (2009)
36. Stork, C., Veltkamp, D., Kowalski, B.: Identification of multiple sensor disturbances during process monitoring. Anal. Chem. 69(24), 5031–5036 (1997)
37. Jolliffe, I.: Principal Component Analysis. Wiley, Hoboken (2014)
38. Yue, H.H., Qin, S.J.: Reconstruction-based fault identification using a combined index. Ind. Eng. Chem. Res. 40(20), 4403–4414 (2001)
39. Gao, Y., Wang, X., Wang, Z., Zhao, L.: Fault detection in time-varying chemical process through incremental principal component analysis. Chemom. Intell. Lab. Syst. 158, 102–116 (2016)
40. Silverman, B.W.: Density Estimation for Statistics and Data Analysis, in Monographs on Statistics and Applied Probability, Chapman and Hall/CRC, London (1986)
41. Ding, S.X.: Model-Based Fault Diagnosis Techniques: Design Schemes, Algorithms and Tools. Springer, London (2013). https://doi.org/10.1007/978-1-4471-4799-2
42. Silverman, B.: Density Estimation for Statistics and Data Analysis. Routledge, New York (1986)
43. Cheng, Y.: Data-driven techniques on alarm system analysis and improvement. Ph.D. thesis, University of Alberta, ECE (2013)
44. Lai, S.: Data-driven methods for industrial alarm flood analysis. Ph.D. thesis, University of Alberta, ECE (2017)
45. Montgomery, D.: Introduction to Statistical Quality Control. Wiley, New York (2004)
46. Izadi, I., Shah, S.L., Shook, D.S., Kondaveeti, S.R., Chen, T.: A framework for optimal design of alarm systems. IFAC Proc. Vol. 42(8), 651–656 (2009)
47. Dunia, R., Qin, S.J., Edgar, T.F., McAvoy, T.J.: Identification of faulty sensors using principal component analysis. Process Syst. Eng. 42(10), 2797–2812 (1996)
48. Wang, X., Kruger, U., Irwin, G.W.: Process monitoring approach using fast moving window PCA. Ind. Eng. Chem. Res. 44(15), 5691–5702 (2005)
49. Li, W., Yue, H., Valle-Cervantes, S., Qin, S.: Recursive PCA for adaptive process monitoring. J. Process Control 10(5), 471–486 (2000)
Contribution to Health Monitoring of Silicon Carbide MOSFET

Hubert Razik1,3, Malorie Hologne-Carpentier2, Bruno Allard1, Guy Clerc1, and Tianzhen Wang3

1 Univ Lyon, Université Claude Bernard Lyon 1, INSA Lyon, Ecole Centrale de Lyon, CNRS, Ampère, UMR5005, 69622 Villeurbanne, France
{hubert.razik,guy.clerc}@univ-lyon1.fr
2 Ecole Catholique d'Arts et Métiers, Lyon, France
3 Logistics Engineering College, Shanghai Maritime University, Shanghai, China
Abstract. The use of power converters is ever expanding in industrial applications, as they provide flexibility, a high level of performance and new functionalities. However, with increased complexity come new constraints with respect to reliability. This chapter covers a study on the reliability of a lab-scale power electronic module taken here as a vehicle. The downsizing of converters and new application-related operating constraints are accompanied by an increase in current density. The use of Silicon Carbide wide-bandgap technology in power modules is therefore attractive but remains a challenge, because this technology is not yet mature and does not benefit from the deep knowledge established about its Silicon counterpart. Therefore, health monitoring has naturally emerged as an effective way to implement a reliability assessment. After a brief description of the expected failure modes, an experimental failure monitoring bench is presented. The choice and implementation of failure indicators through a classification using a neural network are then discussed.
Keywords: MOSFET SiC · Health monitoring · Classification · Neural network

1 Introduction
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 307–329, 2022. https://doi.org/10.1007/978-3-030-82110-4_16

In the context of more electric mobility, many projects have focused on the more electric aircraft [1]. The aim is to gradually replace hydraulic transmission systems with equivalent electric actuators to meet environmental, maintainability and compactness constraints, even if the reliability of power electronic systems under harsh operating conditions is far from that of traditional power equipment [2]. The electrical actuators will be integrated into more powerful and flexible electrical networks. The advantages of the transition to a more electric infrastructure are multiple. Electrical systems are supposedly more compact and lighter for
the same level of power. They are also easily controllable and offer better immediate security. They are considered more reliable for several reasons: their small size and properties allow for safer and simpler redundancy, and their maintenance is less restrictive than that of hydraulic and pneumatic systems. Therefore, the reliability of a complete system is linked to its complexity. However, the electronic supply of electromagnetic actuators could introduce some weaknesses into the whole system, especially in harsh environments (high temperature, vibration, ...). Some applications, such as avionics, do require a high level of reliability, and an assessment is mandatory. Thus, monitoring of electric actuators is considered essential in a large number of publications. Since power converters are directly connected to these actuators, their reliability is an important issue. Converters are key components in many applications but are also responsible for failures [3]. Generally speaking, the power module is the core of a converter. For the sake of power density, so-called planar power modules are preferred. The geometry of a planar module, as well as the assembly of technologies, makes it difficult to access the temperature map or the power losses. This is the essential difference from, for example, an electro-mechanical system, where successful monitoring approaches have been reported mainly because the observation of failure-related signals is possible. In practice, a rainflow approach [4] is mostly considered when performing the monitoring in an actual power electronic system. The rainflow algorithm is fed by power loss computations and, thus, junction temperature estimates. Even if successfully demonstrated, the approach requires some calibration and many tests to evaluate the number of cycles before failure according to a targeted junction temperature, thermal excursion and cycle time, or some ageing laws [5,6]. 
Other approaches are based on the extrapolation of time-series built from thermal sensitive parameters [7]. This kind of approach seems less expensive to implement and will be preferred here. Most of the publications concerning SiC (Silicon Carbide) components focus on chip-level performance and are carried out in laboratory conditions. Since the behaviour of SiC technology is quite different from that of the previous one (Si), the well-known results about Silicon MOSFETs may not carry over. We propose, in this chapter, to study as many failure mechanisms as possible, from the chip to the interconnections, inside a lab-scale module taken as a demonstrative vehicle. The module considered here is innovative in terms of technology. Special attention has been paid to its compactness and power density. Consequently, no model is available for this device. Therefore, an experimental approach through accelerated testing has been adopted in order to characterize its behaviour under different stresses. As a result, high-stress cycles of operation are implemented to accelerate the ageing process [8]. Unfortunately, this approach generates a large amount of data, because the ageing conditions must be varied to allow an extrapolation of the consumed lifetime or the RUL (Remaining Useful Lifetime) [9–11]. It must also be ensured that they do not address the same defects. The literature shows that the estimation
of the junction temperature is recommended, as it can be considered a relevant indicator [12–16]. However, its evaluation must be done indirectly by monitoring thermo-sensitive electrical parameters (TSEPs). Therefore, these parameters should be analysed and the failure modes then identified against the selected TSEPs. In a first step, a database is created, producing about fifty potential indicators, each representative of a fault or not. A composite failure signature is built in order to follow the evolution of the defects in a classification space where the different areas define intervals of lifespan. The signature is composed of failure indicators chosen through a systematic selection method to ensure the best sensitivity of the signature to ageing while reducing the complexity and redundancy of the required information. A learning phase has been carried out using 6 modules in order to create 4 classes as follows: healthy state, 30% of the lifespan (already over), 60% of the lifespan (already over) and end of the lifespan (nearby). We first present below an overview of the expected failure modes and faults. Then, we discuss the accelerated ageing process with a description of the test bench and the active power cycling test. Finally, once the signatures are generated, we suggest monitoring the most sensitive selected signatures to establish the percentage of life elapsed using a neural network [17].
2 Overview of Expected Failure Modes and Faults
In the literature, [18] indicates, based on a study of 200 products, that 34% of failures come from semiconductor and soldering failures in device modules, while [19] indicates that 38% of faults in variable-speed AC drives are due to failures inside power devices. Furthermore, [20,21] highlight the distribution of faults in power electronic converters. In order of importance, these are distributed as follows:
– 30% for capacitors;
– 26% for printed circuit boards (PCBs);
– 21% for semiconductor devices;
– 13% for solders;
– 3% for connectors, and
– 7% for the rest.
According to other surveys [3,22], the sources of stress are distributed as follows:
– 55% are strongly related to temperature;
– 20% are strongly related to vibrations;
– 19% are related to humidity and moisture, and
– 6% are due to contaminants.
H. Razik et al.
Fig. 1. Distribution of failure modes [25]
Regardless of the technology implemented in a power module (SiC, Si), the failure mechanisms are associated not only with the dies but also, to a significant extent, with the packaging. They depend, however, on a multitude of parameters such as temperature, current level, voltage amplitude and switching frequency, as well as on the converter topology, as mentioned in [23,24]. [25] has shown that packaging induces failures during the module lifespan. The distribution of the different modes is shown in Fig. 1. For instance, wire bonds represent a significant part of failures (about 26%). They are followed by interconnection problems such as chip attachment and metallization (about 15%) and the package (about 5%). It should be noted that contamination or diffusion aspects take a significant part in the failure modes (18%). However, the rate of test errors is unfortunately high (19%). The two main failure modes observed are:
– an open circuit: a conductive part is cracked or lifted off;
– a short circuit: an insulator is perforated, or a conductive part has melted around an insulator and created an electrical contact.
In any case, the most expected failure modes shown in Fig. 1 are:
– Bonding wire lift-off or cracking [25,26]
– Electromigration in connectors [25]
– Metallization reconstruction [27]
– Cracks in die attach [28,29]
– Direct Bonded Copper cracking [26,30]
Wire Bond Lift-Off or Cracking: Numerous studies have been conducted over the past decades on this type of fault because it was one of the most important. It is mainly due to the detachment of the bonding wire in a progressive mechanism.
Health Monitoring of SiC
Several phenomena can occur at the interface between the bond wire and the metallization of the upper pad. The first is observed when the metal of the bond wire and the metallization are dissimilar.
Electromigration in Connectors: Power modules require metallic parts to connect the chips to the external connectors. These parts can undergo electromigration under high-temperature and high-current conditions. This mechanism can create voids in the metallic conductors and consequently a local increase in resistivity, which induces localized overheating and thus an increase in the failure rate.
Metallization Reconstruction: In a wire-bond-free module, failure of the pad connection leads to another type of degradation [27]. The coefficients of thermal expansion (CTEs) are mismatched, and the effects can be observed during the PAC test. During a thermal cycle, the metallization layer undergoes compressive stresses between several relaxation phases, which induce plastic deformation at the grain boundaries. The consequence is the creation of cracks between the semiconductor and the metallization, which corresponds to a reconstruction of the metallization. The resistivity also increases locally, leading to a localized increase in temperature and thus to the appearance of a failure mode.
Cracks in Die Attach: Beyond its mechanical function of ensuring an electrical contact between the connectors and the chip, the die attach is also the path that evacuates heat to the base plate. This layer, very close to the chip, undergoes considerable thermo-mechanical stress and is, according to [28], the main failure mode in IGBT (insulated-gate bipolar transistor) power modules. A lifetime estimation in [29] focused on solder failure modes and concluded that the major fault is the formation of cracks in the solder layer between the chip and the DBC (Direct Bonded Copper). This failure mechanism relies on a progressive, cumulative degradation which leads to a general crack, an increase in resistivity and overheating, as previously stated. [26] studied solder technologies with respect to reliability. With silicon carbide, lead-based solutions are no longer competitive: silicon carbide dies will have a reduced contact surface and a higher operating temperature, so other technical solutions must be studied. Four solutions are described below:
– a lead, silver and tin alloy, which does not show any deterioration after 1000 cycles;
– silver nanoparticles, which show cracks after 100–300 cycles;
– silver sintering, which shows delamination after 1000 cycles;
– a gold alloy, which does not show any deterioration through 1000 cycles.
The first and fourth solutions seem to be equivalent in terms of reliability. For economic reasons, the first is considered to improve the die-attach reliability in modules with SiC chips.
DBC Cracking: [26] and [30] present studies on DBC issues. The choice of a substrate is essential because it must satisfy strong constraints: it must ensure efficient heat transfer, and the DBC must maintain electrical continuity while being subjected to mechanical stresses generated by thermal effects. The CTE of the DBC must be similar to that of the substrate (silicon and silicon carbide). Among the most common materials are Al2O3 (aluminum oxide) and AlN (aluminum nitride). Preference is given to AlN because its CTE is close to that of Si as well as SiC; moreover, it is well suited for high-temperature applications. The weakness is mainly located at the edge of the interface between the copper and the substrate, where cracks appear because of the various mechanical constraints. As component technology develops, the power density is constantly increasing (this is the case for SiC), and the thermo-mechanical constraints become increasingly important. Moreover, SiC presents another weakness, located at the gate level. Studies [31] show that, under the same level of stress, SiC technologies are more fragile than Si. This weakness is due to the use of SiO2 (silicon dioxide). This finding is consistent with the failures observed throughout our experiments.
3 Accelerated Ageing Process
In the literature, many accelerated life tests have been performed either on single chips, packaged or unpackaged, or on single modules. Few studies deal with a complete system, and few are dedicated to the failure mechanisms of SiC MOSFET-based power modules, as summarized in [32]. As mentioned before, the factors accelerating the degradation of the MOSFET gate are high temperature, a high drain current (short circuit) or the continuous application of a high voltage on the gate. The DC gate voltage method will be considered to obtain an acceleration factor for the failure mode of the gate and not of the other parts of the chip and module. Concerning the failure modes of the module, the literature shows that thermal cycling seems to accelerate most of the expected failures. Two approaches are possible to impose thermal cycling: passive cycling and active cycling.
– Passive cycling [33,34] consists in placing the module in a controlled climatic enclosure with a temperature varying from −55 °C to 180 °C. This method imposes thermo-mechanical constraints on the components and causes the failure modes expected in a power module;
– Active cycling [10,35] is more representative of the operating constraints because the temperature distribution in the module is created by the heating of the chip. This study shows that the upper and lower temperature limits, as well as the duration of the cycles, make it possible to excite some failure modes more than others.
3.1 Description of the Test Bench for the Ageing Process
Reliability testing is performed on a module consisting of a single converter leg with one MOSFET per switch. Two tests are mainly used in the literature to stress SiC power MOSFETs: HTGB (High Temperature Gate Bias) and HTRB (High Temperature Reverse Bias). These tests were created to stimulate the degradation of the MOSFET gate oxide. The HTGB test was therefore chosen to initiate the failure modes specific to the MOSFET gate. Two types of HTGB test will be performed: the first with a positive bus voltage higher than the nominal voltage and the second with a negative bus voltage lower than the nominal voltage. Each HTGB test will be carried out on 4 identical modules in order to check the repeatability of the behavior of the indicators. The indicators observed during these tests will be the gate threshold voltage, the drain leakage current, the drain-to-source on-state resistance and the gate leakage current. The HTGB test consists in applying a constant gate bias at a high ambient temperature. During this test, no current flows through the MOSFET. Temperature and electrical stresses have to be chosen according to several parameters: the nominal operating values for the gate bias and the maximum temperature allowed.
3.2 Power Active Cycling Tests
The Power Active Cycling (PAC) tests, building on the HTGB test, are carried out to learn about the behavior of a SiC MOSFET suffering from gate oxide ageing under thermal cycling. The PAC tests have been designed to stimulate different failure modes and to try to decorrelate gate failure modes, metallization failure modes and transfer element failure modes. The first test consists of long thermal cycles created by the self-heating of the MOSFET under a nominal drain current. The aim is to trigger heat diffusion through all the layers of the module during each cycle; this heat diffusion should trigger the failure modes concerning the metallization, the solder transfer and the substrate. The second test is composed of short thermal cycles created by self-heating of the MOSFET with a drain current much higher than its nominal value. This test will trigger charge trapping in the gate oxide as well as the reconstruction of the upper metallization. During both types of PAC test, the gate leakage current, the saturation current, the gate threshold voltage, the on-state resistance, the reverse voltage and the thermal resistance are recorded. The saturation current is used as a TSEP (Thermal Sensitive Electrical Parameter) to estimate the temperature in the conduction channel. A calibration phase has been carried out and already shows that a rigorous protocol is necessary to obtain an accurate correlation between the saturation current and temperature. Indeed, the state of charge of the gate biases the value of the saturation current, so each calibration measurement is preceded by an OFF state with a gate voltage of −5 V in order to reset the state of charge of the gate. Considering all these indicators together in both types of test should make it possible to dissociate the different failure modes. For the sake of repeatability, the PAC tests will also be carried out on 10 sample modules. Only the low-side MOSFET in the converter is tested and monitored. A last test will be performed with a specific protocol monitoring the junction temperature, the drain leakage current and the reverse voltage to track the reliability of the internal diode.
Fig. 2. Synoptic of the power active cycling test bench
As shown in Fig. 2, the PAC test bench is composed of a power circuit that triggers the self-heating of the MOSFET under study. A second circuit measures the value of the saturation current, and a third circuit generates a double pulse. This step is used to evaluate the threshold voltage via the Miller plateau and to measure the on-state resistance. Figure 3 shows the logical sequences of the driver signals. The self-heating pulses last a few seconds and the double pulse needs a few microseconds. During the test, the power module undergoes a long heating phase (+1000 s) followed by a short cooling phase (4 s). The die self-heating during DC current conduction produces the heating phase; the die turn-off and a cooling system ensure the cooling phase. The temperature swing is imposed between 40 °C and a maximum limit which lies between 110 °C and 150 °C. This is an accelerated ageing test in which an electrical stress successively creates self-heating and cooling phases in the SiC MOSFET. The temperature and the electrical parameters are evaluated during the cooling phase. Many dynamic parameters (rise time of the gate voltage, gate current peak, ...) as well as static parameters (on-state voltage, on-state resistance, ...) are recorded during the double-pulse phase. A total of 50 potential ageing-related parameters are collected over the lifespan of the module (see Table 3 in Annex).
Fig. 3. Typical waveforms during the TSEP phase and the double-pulse measurement: (a) Measurement of IDS and VGS to obtain a temperature evaluation, (b) Measurement of IDS and VDS to extract an on-state resistance RDS ON evaluation, (c) Measurement of the Miller effect to extract the threshold voltage VTH.
The following figures show the configurations when the transistors are ON:
– Figure 4: typical circuit when T2 is ON;
– Figure 5: typical circuit when T4 and T2 are ON, the SiC MOSFET is OFF;
– Figure 6: typical circuit when T3 is ON;
– Figure 7: typical circuit when T1 and T2 are ON.
During a switching transient, the following process is observed. Signals VGS and IGS are shown in Fig. 8, and signals VGS and IDS are shown in Fig. 9. The waveform in Fig. 8 is particularly interesting because of the pseudo Miller effect, which is linked to the threshold voltage, a good indicator of the gate state of health. The transient response during a commutation from off-state to on-state can be decomposed as shown in Fig. 9:
– During stage 1, VGS increases and IGS decreases steadily: this is the charging of the gate-to-source capacitance. The MOSFET is off and the drain current is nearly zero. The drain voltage remains mostly constant.
– During stage 2, a drain current appears and the drain voltage decreases, because the drain-to-gate capacitance reaches the end of its charge. The drain-to-gate capacitance is charged with the current flowing in the gate-to-source capacitance. This negative feedback creates a "pseudo-plateau" which lasts 47 ns. The features of this pseudo-plateau constitute a reference in this study because they are linked to the health of the gate oxide.
– During stage 3, the gate-to-source voltage increases up to the driver operating point. The drain current reaches its nominal value and the drain-to-source voltage reaches its lowest value.
Figures 10 and 11 show the test bench, which makes it possible to accelerate the ageing process of SiC MOSFETs [36].
Fig. 4. Typical circuit when T2 is ON
Fig. 5. Typical circuit when T4 and T2 are ON
The feature selection is based on the following steps:
1. Collection of the parameters obtained from the 10 cycling tests;
2. Feature selection based on the monotonic behavior of the parameters with ageing and on a sufficient information level;
3. Selection of the 10 most significant parameters for classification purposes;
4. Signature extraction at determined percentages of the lifespan;
5. Classification into 4 classes using a supervised signature approach;
6. Monitoring and estimation of the percentage of life elapsed.
Fig. 6. Typical circuit when T3 is ON
Fig. 7. Typical circuit when T1 and T2 are ON
Test campaigns have produced a database of 50 parameters recorded every 3 min during the lifespan of 10 modules, each module having a lifespan between 20,000 and 176,000 stress cycles (see Table 3 in Annex). All these parameters have shown a monotonic evolution with ageing over the lifespan. Parameter values have been normalized in time and in amplitude. As the on-state resistance of the MOSFETs drifted by more than 10% during the lifespan of the modules, the degradation was judged sufficient to select ageing parameters, i.e. to reduce this set by discarding the least pertinent parameters [37].
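The normalization in time and in amplitude mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the chapter does not give the normalization formulas, so the resampling onto a common 0–100% lifespan grid and the min-max amplitude scaling are assumed here, and the function name `normalize_trace` and its arguments are hypothetical.

```python
import numpy as np

def normalize_trace(cycles, values, n_points=101):
    """Map one recorded parameter trace onto a common lifespan axis.

    Assumptions (not stated in the chapter): time is normalized as the
    fraction of cycles elapsed before failure, and amplitude is min-max
    scaled to [0, 1].
    """
    cycles = np.asarray(cycles, dtype=float)
    values = np.asarray(values, dtype=float)
    # Time normalization: 0 at the first record, 1 at the last one (failure).
    life = (cycles - cycles[0]) / (cycles[-1] - cycles[0])
    grid = np.linspace(0.0, 1.0, n_points)
    resampled = np.interp(grid, life, values)
    # Amplitude normalization: min-max scaling of the resampled trace.
    span = resampled.max() - resampled.min()
    if span == 0.0:
        return grid, np.zeros_like(resampled)
    return grid, (resampled - resampled.min()) / span

# Two hypothetical modules with very different lifespans end up on the same grid.
grid_a, trace_a = normalize_trace([0, 10_000, 20_000], [1.00, 1.10, 1.30])
grid_b, trace_b = normalize_trace([0, 88_000, 176_000], [2.00, 2.05, 2.40])
```

With such a mapping, a module that failed after 20,000 cycles and one that failed after 176,000 cycles can be compared point by point as a function of the percentage of life elapsed.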
Fig. 8. Typical electrical signals during a SiC power MOSFET turn-on
Fig. 9. Typical electrical signals during a SiC power MOSFET turn-on: stage 1 to 3
Fig. 10. Test bench: global view
Fig. 11. Test bench: close-up view
4 Failure Mode Signature
The conduction path degradation and a partial perforation of the gate oxide lead to the drift of several parameters. In each test, the duration of ageing has been different and the definitive failure mode has been either a deterioration of the gate or a deterioration of the drain-to-source path. Given this variability, the chosen parameters must be sensitive to the related failure modes. To judge the sensitivity of the parameters, two mathematical tools are used:
– the Spearman correlation coefficient, to check the monotonic behavior of the signal with ageing [38];
– the Shannon entropy, to check the sensitivity [39].
Once the monotonic parameters are isolated, their sensitivities are compared. As the ageing has led to different failure modes with different lifespans, the Shannon entropy coefficient varies for each tested device. The aim is to gather a batch of parameters sufficiently sensitive to both possible failure modes. The selection of parameters is based on two steps. The first step consists in evaluating the Spearman correlation, which detects parameters linked to ageing [40]. The Spearman approach looks for a monotonic behavior by comparing the ranks of the recordings. The coefficient is given by:
$$\rho_S = 1 - \frac{6 \sum_{i=1}^{n} \left( \mathrm{rg}(X_i) - \mathrm{rg}(y_i) \right)^2}{n (n^2 - 1)} \qquad (1)$$
where rg(X_i) − rg(y_i) is the difference between the two ranks (two recordings) of each point and n is the number of points. As a consequence, the Spearman approach helps to reduce the number of candidates from 50 down to 20. The second step is based on a Shannon entropy calculation, which highlights the most informative parameters [41]. It is a good tool for finding the best information carried by the correlated failure precursors. The Shannon entropy, H, evaluates the level of information contained in a signal as [41]:
$$H(X) = - \sum_{i=1}^{n} P_i \log_2 (P_i) \qquad (2)$$
where X is the studied signal and P_i is the probability of meeting the ith class of the signal. In our application, the Shannon entropy calculated for the 20 pre-selected parameters gives:
– a maximum entropy value of 3.2 (100%);
– a minimum entropy value of 0.5 (0%).
Table 1 shows the entropy coefficient of the 10 selected parameters. We are not able to rank them by order of importance, which is why we keep the whole 10-parameter batch and then apply a PCA (Principal Component Analysis) to reduce the order of the problem handled by the neural network used afterwards.
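The two selection criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the number of histogram classes (`n_bins=10`), the use of the 65% correlation level as a threshold on |ρS|, and the helper names `ranks`, `spearman_rho`, `shannon_entropy` and `select_parameters` are all assumptions made for the example. The rank formula is the simplified one of Eq. (1), which is exact only when there are no tied values.

```python
import numpy as np

def ranks(x):
    """Ranks 1..n of the values in x (ties receive consecutive ranks, an approximation)."""
    x = np.asarray(x)
    order = np.argsort(x, kind="stable")
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(1, len(x) + 1)
    return r

def spearman_rho(x, y):
    """Spearman coefficient from rank differences, as in Eq. (1)."""
    d = ranks(x) - ranks(y)
    n = len(d)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

def shannon_entropy(signal, n_bins=10):
    """Shannon entropy in bits, as in Eq. (2); P_i is estimated from a histogram.

    n_bins is an assumption: the chapter does not state how the classes are built.
    """
    counts, _ = np.histogram(signal, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def select_parameters(ageing, traces, rho_min=0.65, keep=10):
    """Keep monotonic traces (|rho| >= rho_min), then rank them by entropy."""
    monotonic = {name: v for name, v in traces.items()
                 if abs(spearman_rho(ageing, v)) >= rho_min}
    by_entropy = sorted(monotonic,
                        key=lambda name: shannon_entropy(monotonic[name]),
                        reverse=True)
    return by_entropy[:keep]

# Hypothetical traces: one drifts with ageing, the other only alternates.
ageing = np.arange(20.0)
traces = {"drift": ageing ** 2, "toggle": np.tile([0.0, 1.0], 10)}
selected = select_parameters(ageing, traces)
```

With 10 histogram classes the entropy is bounded by log2(10) ≈ 3.32 bits, which is of the same order as the maximum value of 3.2 reported above; this is only a plausibility check on the assumed binning, not a confirmation of it.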
Table 1. Entropy calculation for correlated parameters of each tested module
Parameters | TM1 | TM2 | TM3 | TM4 | TM5 | TM6 | TM7 | TM8 | TM9 | TM10
Temperature | 2.5 | 2.8 | 2.3 | 3.2 | 3.1 | 2.6 | 2.6 | 2.2 | 2.4 | 3.1
IdVg | 2 | X | 2.25 | 3 | 2.9 | 2.6 | X | 2.3 | 2.8 | 2.9
IdP | 2 | X | 2.25 | 3 | 2.95 | 2.5 | X | 2.3 | 2.8 | 2.6
RiseTime | 1.1 | 2 | X | 2.7 | X | 2.3 | X | X | X | 3.2
Tmeanplateau | 0.5 | 1.6 | X | 2.3 | X | 2 | X | X | X | 3
RDSON | X | 3 | 2.5 | 2.8 | 3 | 2.5 | 2.9 | 2.5 | 2.6 | 3
VDSON | X | 3 | 2.4 | 2.9 | 3 | 2.5 | 2.5 | 2.3 | 2.5 | 3.1
PMiller | X | 2.7 | 2.3 | 2.5 | 3 | 2.3 | 2.5 | 2.3 | 2.3 | 3
Meanplateau | X | 2.7 | X | 2.9 | X | X | X | X | X | X
Coeflinear | 3 | X | 2.7 | X | X | X | X | X | X | X
The 10 parameters are selected because they show a satisfactory correlation level (65%) with ageing and cover more than one failure mode: TJ, IDS∗VGS, IDS, VDS ON, PMiller, TMean Plateau, VGS Slope, Rise Time, Pinst and RDS ON (see Table 2). Consequently, the 10-dimensional signature is composed of parameters whose drift is linked to the gate oxide failure mechanisms and of other parameters whose drift is linked to interconnection issues.
Table 2. Ten parameter candidates composing the failure signatures
Abbreviation | Description
TJ | Estimated temperature at the third pulse
IDS∗VGS | Instantaneous power during the second pulse
IDS | Current during the second pulse
VDS ON | Voltage during the second pulse
PMiller | Mean voltage of Miller plateau
TMean Plateau | Mean time of Miller plateau
VGS Slope | Slope of the Miller plateau
Rise Time | Gate voltage rise-time
Pinst | Instantaneous power
RDS ON | On-state resistance
4.1 Failure Signature Classification
Once the signature is constituted, a classification helps to match the ageing of the power module with the signature value. Six modules have been chosen to constitute a learning database [37]. For each module, we have extracted the value of the signature at different stages of the lifespan of the power module (0%, 30%, 60% and 100%), as pictured in Fig. 12. The modules have shown various failures under different stress levels, so they constitute a basis on which to construct a classification model.
– Class 1: 5 signature values are extracted from healthy samples;
– Class 2: 5 signature values are extracted around 30% of the lifespan;
– Class 3: 5 signature values are extracted around 60% of the lifespan;
– Class 4: 5 signature values are extracted before the end of the lifespan.
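The construction of the four classes can be sketched as below. This is an illustrative sketch under assumptions: the signatures are random placeholders rather than measured data, the 5% window is assumed to be centred on each reference point and shifted inwards at the two ends of the lifespan, and the names `CLASS_REFERENCES` and `extract_class_signatures` are hypothetical.

```python
import numpy as np

# Reference points of the four classes, as fractions of the lifespan.
CLASS_REFERENCES = {1: 0.00, 2: 0.30, 3: 0.60, 4: 1.00}

def extract_class_signatures(life, signatures, n_values=5, window=0.05):
    """Pick n_values signatures inside a window around each class reference.

    life: (T,) fraction of life elapsed for each record, in [0, 1].
    signatures: (T, D) one D-dimensional signature per record.
    The window placement (centred, clipped to [0, 1]) is an assumption.
    """
    life = np.asarray(life, dtype=float)
    signatures = np.asarray(signatures, dtype=float)
    rows, labels = [], []
    for label, ref in CLASS_REFERENCES.items():
        start = min(max(ref - window / 2.0, 0.0), 1.0 - window)
        for target in np.linspace(start, start + window, n_values):
            nearest = int(np.argmin(np.abs(life - target)))
            rows.append(signatures[nearest])
            labels.append(label)
    return np.vstack(rows), np.array(labels)

# Six hypothetical modules, each with 1001 records of a 10-dimensional signature.
rng = np.random.default_rng(0)
life = np.linspace(0.0, 1.0, 1001)
database = [extract_class_signatures(life, rng.normal(size=(1001, 10)))
            for _ in range(6)]
X = np.vstack([x for x, _ in database])
y = np.concatenate([labels for _, labels in database])
print(X.shape, y.shape)  # (120, 10) (120,)
```

Five values for each of the four classes over six modules give 6 × 4 × 5 = 120 labelled signatures, 30 per class.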
Fig. 12. Signature Classes’ construction
Fig. 13. Structure of a neural network
Four learning classes have been constructed from the values of the signatures at various stages of module lifespan. For each module, a range of 5% of the lifespan has been defined around the reference values, and 5 values of the signature have been extracted for each class. With 5 points for each of the 4 classes over the lifespan of 6 modules, a total of 120 signatures is available in the learning database. The learning phase has been carried out with 84 of these 120 signatures, the validation test with 18 signatures and, finally, an attribution test with the remaining 18 signatures. The results are presented in Fig. 14.
4.2 Neural Network
Many schemes or types of neural networks exist [42–46]. The simplest one is a neural network based on three layers, which is widely used and covers a large number of applications. Each layer is composed of neurons which are connected to the previous layer via weights. The first layer is connected to the input variables, and the output layer is composed of one neuron per output. In our application, we have 10 input variables and 4 output variables, as shown in Fig. 13. A bias is connected to all layers via weights. The relationship between inputs and outputs in a multilayer NN (neural network) is generally based on nonlinear functions. The activation function used in the hidden layer is a symmetric sigmoid transfer function, described as follows:
$$f^{h}_{j}(\cdot) = \frac{2}{1 + \exp(-2 y_j)} - 1 = \tanh(y_j) \qquad (3)$$
where
$$y_j = b_j \cdot \mathrm{bias} + \sum_{i=1}^{n} w_{i,j}\, x_i \qquad (4)$$
with x_i the input, w_{i,j} the weight between the input x_i and neuron j, b_j the weight between the bias and neuron j, and n the number of inputs. This type of function is interesting for the training process because its derivative is a continuous function (f'(x) = 1 − f(x)^2). For the output layer, the choice is a softmax function. We therefore introduce an intermediate variable:
$$\Delta_k = b_k \cdot \mathrm{bias} + \sum_{j=1}^{n} w_{j,k}\, f^{h}_{j}(\cdot) - \max_{k'=1,\dots,4} \left( b_{k'} \cdot \mathrm{bias} + \sum_{j=1}^{n} w_{j,k'}\, f^{h}_{j}(\cdot) \right) \qquad (5)$$
Finally, the output layer is described as follows:
$$f^{o}_{k}(\cdot) = \frac{\exp \Delta_k}{\sum_{k'=1}^{4} \exp \Delta_{k'}} \qquad (6)$$
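The forward pass described by Eqs. (3) to (6) can be sketched as follows. This is a sketch under assumptions: the weights are random and untrained, the bias input is taken equal to 1, the hidden layer is given 10 neurons, and the names `hidden_activation` and `forward` are hypothetical. The training step (scaled conjugate gradient back-propagation) is not shown.

```python
import numpy as np

def hidden_activation(y):
    """Symmetric sigmoid of Eq. (3); numerically equal to tanh(y)."""
    return 2.0 / (1.0 + np.exp(-2.0 * y)) - 1.0

def forward(x, w_hidden, b_hidden, w_out, b_out, bias=1.0):
    """Forward pass of a three-layer network with a softmax output.

    x: (n_in,), w_hidden: (n_in, n_hid), b_hidden: (n_hid,),
    w_out: (n_hid, n_out), b_out: (n_out,). bias=1.0 is an assumption.
    """
    y = b_hidden * bias + x @ w_hidden      # Eq. (4)
    h = hidden_activation(y)                # Eq. (3)
    z = b_out * bias + h @ w_out            # weighted sum seen by each output
    delta = z - np.max(z)                   # Eq. (5): shift by the largest value
    e = np.exp(delta)
    return e / e.sum()                      # Eq. (6): softmax

# Random, untrained weights for a 10-10-4 network (illustration only).
rng = np.random.default_rng(1)
x = rng.normal(size=10)
w_hidden, b_hidden = rng.normal(size=(10, 10)), rng.normal(size=10)
w_out, b_out = rng.normal(size=(10, 4)), rng.normal(size=4)

probs = forward(x, w_hidden, b_hidden, w_out, b_out)
predicted_class = int(np.argmax(probs)) + 1   # classes are numbered 1 to 4
```

Subtracting the maximum in Eq. (5) does not change the softmax result; it only keeps every exponent at or below zero, so that the exponentials cannot overflow.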
[Figure 14: four confusion matrices with target classes 1–4 against output classes 1–4; all off-diagonal entries are 0. Diagonal counts for classes 1 to 4: (a) training 20, 18, 22, 24 (23.8%, 21.4%, 26.2%, 28.6%); (b) validation 6, 6, 2, 4 (33.3%, 33.3%, 11.1%, 22.2%); (c) test 4, 6, 6, 2 (22.2%, 33.3%, 33.3%, 11.1%); (d) combined 30, 30, 30, 30 (25% each).]
Fig. 14. Neural Network confusion matrices issued from the learning phase (84 signatures), from the validation phase (18 signatures) and from the test phase (18 signatures) gathered in the combined confusion matrix view.
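The figures reported in Fig. 14 can be cross-checked with a short script. The diagonal counts below are read from the figure; the row and column convention (rows for target classes, columns for output classes) and the helper names `accuracy` and `percentages` are assumptions made for this sketch.

```python
import numpy as np

# Diagonal confusion matrices read from Fig. 14 (all off-diagonal entries are 0).
training = np.diag([20, 18, 22, 24])
validation = np.diag([6, 6, 2, 4])
test = np.diag([4, 6, 6, 2])
combined = training + validation + test

def accuracy(cm):
    """Fraction of signatures attributed to their target class."""
    return np.trace(cm) / cm.sum()

def percentages(cm):
    """Each cell as a percentage of all signatures in the matrix."""
    return 100.0 * cm / cm.sum()

# 70% / 15% / 15% of the 120 signatures.
split = [round(f * 120) for f in (0.70, 0.15, 0.15)]

print(split)                                  # [84, 18, 18]
print(np.diag(combined))                      # [30 30 30 30]
print(accuracy(combined))                     # 1.0
print(np.round(np.diag(percentages(training)), 1))
```

The three subsets contain 84, 18 and 18 signatures, and summing them restores the balanced database of 30 signatures per class.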
The learning algorithm is based on scaled conjugate gradient back-propagation, and our NN design is composed of one input layer with 10 nodes, one hidden layer with 10 nodes and one output layer with 4 nodes. The goal of the training process is to define all the weights of the NN. The learning phase was carried out on 70% of the database, i.e. using 84 signatures out of the total of 120. A further 15% of the signatures are dedicated to the validation phase and 15% to the test. The results are shown in Fig. 14. The detection performed by the condition monitoring system is effective. The detailed classification rates are also presented: healthy state (1), 30% of the lifespan (2), 60% of the lifespan (3), and close to the end of the lifespan (4). The results show an efficiency of 100% in class attribution. In order to validate this NN-based classification, a test on a separate sample module has been performed. The result is promising, as 96% of the signatures are attributed to the correct class. Classes 1, 3 and 4 are well discriminated with this model. Only Class 2 is not determined for the last signature. One can suppose that this class is not discriminating enough for the module under test (i.e. the lab-scale technology selected to provide ageing data). In conclusion, the neural network shows promising results and the signatures are well discriminated. The proposed approach to estimating the remaining useful lifetime is a first draft and is a promising track to explore. The results obtained with few samples are encouraging. The next step would be to consider a realistic mission profile in order to estimate the remaining lifetime in a given environment, and not only the estimated percentage of elapsed lifespan. Rainflow-type approaches could be used for this purpose.
5 Conclusion
This chapter offers a synopsis of a larger work on the estimation of the percentage of life consumed in the operation of a power module with SiC power MOSFETs. The results are based on a lab-scale planar power module of non-mature technology. One interest of this choice was to give rise to a large variety of failure modes. The chapter is thus a complete example of the steps involved in selecting a monitoring strategy. Failure modes have to be identified, as well as the parameters able to sense them. The most informative precursors are selected from the large number recorded during the tests, using the Spearman correlation and the Shannon entropy. These mathematical tools have made it possible to systematically extract the precursors that evolve monotonically with ageing. The number of contenders was then reduced to the 10 best candidates, which best represent the variations with ageing whatever the failure mechanism in progress. Based on the evolution of this signature, 4 classes have been created through a learning process using 6 modules. The first class represents a healthy signature, the second a signature around 30% of the lifespan, the third a signature around 60% of the lifespan and the fourth a signature at the end of the lifespan. One efficient model to discriminate these classes is a neural network. The model was tested on a separate module and attributed the signature to the true class with a probability of 96%. Finally, an estimation method has been proposed for the percentage of consumed life, and it has been implemented. The important conclusion is the applicability of the strategy detailed in the chapter: it is repeatable with almost any planar module technology based on SiC MOSFETs.
Acknowledgment. The authors acknowledge the financial support of the EU H2020 project I2MPECT, grant no. 636170.
Annex
Table 3. 50 ageing parameters extracted or calculated from the online measurement files
Parameters | Description | Number of parameters obtained
TP(i) | Estimated temperature at pulse 1, 2 and 3 | 3
IdVg(i) | IDS·VGS at pulse 1, 2 and 3 | 3
IdP(i) | IDS at pulse 1, 2 and 3 | 3
MId | (1/3) Σ(i=1..3) IDSP(i) at pulse 1, 2 and 3 | 1
RDSON | On-state resistance | 1
PMiller | Current power during the second pulse | 1
VDSON | Voltage during the second pulse | 1
IDSON | Current during the second pulse | 1
EON | Injected energy in the gate | 1
EONN | Injected energy in the gate normed by ID | 1
EONN2 | Injected energy in the gate normed by ID^2 | 1
IGmax | Maximum gate current | 1
VGmax | Maximum gate voltage | 1
AreaIG | Area under IG during turn ON | 1
FIG | Pseudo-frequency of IG during turn ON | 1
RiseTime_i | Rise time to reach i V with i from 1 to 15 | 15
AreaVGS | Area under VGS during turn ON | 1
Areaplateau | Area under VGS plateau during turn ON | 1
Maxplateau | Maximum point of VGS plateau | 1
Minplateau | Minimum point of VGS plateau | 1
Meanplateau | Mean level of VGS plateau | 1
Lengthplateau | Length of VGS plateau | 1
Tmeanplateau | Mean time of VGS plateau | 1
SlopeVGS | Slope of VGS curve to reach the plateau | 1
SDplateau | Standard deviation of VGS plateau points | 1
Coeflinear | Linear function fitting on VGS plateau | 1
Coefpoly | 3rd-order function fitting on VGS plateau | 4
References
1. Harikumaran, J., et al.: Failure modes and reliability oriented system design for aerospace power electronic converters. IEEE Open J. Ind. Electron. Soc. 2, 53–64 (2021). https://doi.org/10.1109/OJIES.2020.3047201
2. Wang, B., Cai, J., Du, X., Zhou, L.: Review of power semiconductor device reliability for power converters. CPSS Trans. Power Electron. Appl. 2(2), 101–117 (2017). https://doi.org/10.24295/CPSSTPEA.2017.00011
3. Wang, H., Liserre, M., Blaabjerg, F.: Toward reliable power electronics: challenges, design tools, and opportunities. IEEE Ind. Electron. Mag. 7(2), 17–26 (2013). https://doi.org/10.1109/MIE.2013.2252958
4. Musallam, M., Johnson, C.M., Yin, C., Bailey, C., Mermet-Guyennet, M.: Real-time life consumption power modules prognosis using on-line rainflow algorithm in metro applications. In: IEEE Energy Conversion Congress and Exposition, September 2010, pp. 970–977 (2010). https://doi.org/10.1109/ECCE.2010.5617883
5. Bayerer, R., Herrmann, T., Licht, T., Lutz, J., Feller, M.: Model for power cycling lifetime of IGBT modules - various factors influencing lifetime. In: 5th International Conference on Integrated Power Electronics Systems, pp. 1–6 (2008)
6. Ciappa, M.: Lifetime modeling and prediction of power devices. In: 2008 5th International Conference on Integrated Power Systems (CIPS), March 2008, pp. 1–9 (2008)
7. Haque, M.S., Shahedd, M.N.B., Choi, S.: RUL estimation of power semiconductor switch using evolutionary time series prediction. In: IEEE Transportation Electrification Conference and Expo (ITEC), June 2018, pp. 564–569 (2018). https://doi.org/10.1109/ITEC.2018.8450131
8. Ciappa, M., Carbognani, F., Fichtner, W.: Lifetime prediction and design of reliability tests for high-power devices in automotive applications. IEEE Trans. Device Mater. Reliab. 3, 191–196 (2003). https://doi.org/10.1109/TDMR.2003.818148
9. Tian, B., Qiao, W., Wang, Z., Gachovska, T., Hudgins, J.: Monitoring IGBT's health condition via junction temperature variations. In: Applied Power Electronics Conference and Exposition (APEC), 2014 Twenty-Ninth Annual IEEE, pp. 2550–2555 (2014). https://doi.org/10.1109/APEC.2014.6803662
10. Choi, U.M., Blaabjerg, F., Jorgensen, S.: Power cycling test methods for reliability assessment of power device modules in respect to temperature stress. IEEE Trans. Power Electron. 33(3), 2531–2551 (2018). https://doi.org/10.1109/TPEL.2017.2690500
11. Zhao, S., Chen, S., Yang, F., Ugur, E., Akin, B., Wang, H.: A composite failure precursor for condition monitoring and remaining useful life prediction of discrete power devices. IEEE Trans. Industr. Inf. 17(1), 688–698 (2021). https://doi.org/10.1109/TII.2020.2991454
12. Anderson, J.M., Cox, R.W.: On-line condition monitoring for MOSFET and IGBT switches in digitally controlled drives. In: IEEE Energy Conversion Congress and Exposition, pp. 3920–3927 (2011). https://doi.org/10.1109/ECCE.2011.6064302
13. Beczkowski, S., Ghimre, P., de Vega, A.R., Munk-Nielsen, S., Rannestad, B., Thogersen, P.: Online VCE measurement method for wear-out monitoring of high power IGBT modules. In: 15th European Conference on Power Electronics and Applications (EPE), pp. 1–7 (2013). https://doi.org/10.1109/EPE.2013.6634390
14. Hiller, S., Beier-Moebius, M., Frankeser, S., Lutz, J.: Using the Zth(t) - power pulse measurement to detect a degradation in the module structure. In: International Exhibition and Conference for Power Electronics, Intelligent Motion, Renewable Energy and Energy Management, pp. 1–7 (2015)
15. Nikolaidis, E., Ghiocel, D.M., Singhal, S.: Engineering Design Reliability Applications for the Aerospace, Automotive, and Ship Industries. CRC Press, Boca Raton (2008)
328
H. Razik et al.
16. Dusmez, S., Heydarzadeh, M., Nourani, M., Akin, B.: Remaining useful lifetime estimation for power MOSFETs under thermal stress with RANSAC outlier removal. IEEE Trans. Industr. Inf. 13(3), 1271–1279 (2017). https://doi.org/10. 1109/TII.2017.2665668 17. Hologne, M.: Contribution to condition monitoring of Silicon Carbide MOSFET based power module. Dissertation, Universit´e Claude Bernard Lyon 1 (2018) 18. Wolfgang, E.: Examples for failures in power electronics systems. In: ECPE Tutorial ‘Rel. Power Electron. Syst.’, Nuremberg, Germany, April 2007 19. Fuchs, F.W.: Some diagnosis methods for voltage source inverters in variable speed drives with induction machines–a survey. In: IECON 2003. 29th Annual Conference of the IEEE Industrial Electronics Society (IEEE Cat. No. 03CH37468), vol. 2, pp. 1378–1385 (2003). https://doi.org/10.1109/IECON.2003.1280259 20. Yang, S., Xiang, D., Bryant, A., Mawby, P., Ran, L., Tavner, P.: Condition monitoring for device reliability in power electronic converters: a review. IEEE Trans. Power Electron. 25(11), 2734–2752 (2010). https://doi.org/10.1109/TPEL.2010. 2049377 21. Yang, S., Bryant, A., Mawby, P., Xiang, D., Ran, L., Tavner, P.: An industry-based survey of reliability in power electronic converters. IEEE Trans. Ind. Appl. 47(3), 1441–1451 (2011). https://doi.org/10.1109/TIA.2011.2124436 22. Manohar, S.S., Sahoo, A., Subramaniam, A., Panda, S.K.: Condition monitoring of power electronic converters in power plants – a review. In: 20th International Conference on Electrical Machines and Systems (ICEMS), pp. 1–5 (2017). https:// doi.org/10.1109/ICEMS.2017.8056371 23. Nel, B.J., Perinpanayagam, S.: A brief overview of SiC MOSFET failure modes and design reliability. Procedia CIRP 29 (2017). https://doi.org/10.1016/j.procir. 2016.09.025 24. GopiReddy, L.R., Tolbert, L., Ozpineci, B.: Power cycle testing of power switches: a literature survey. IEEE Trans. Power Electron. 30(5) (2015). https://doi.org/10. 1109/TPEL.2014.2359015 25. 
Bower, G., Rogan, P., Kozlowski, J., Zugger, M.: SiC power electronics packaging prognostics. In: IEEE Aerospace Conference, Big Sky, MT, USA, March 2008. https://doi.org/10.1109/AERO.2008.4526605 26. Ji, B., Pickert, V., Zahawi, B.: In-situ bond wire and solder layer health monitoring circuit for IGBT power modules. In: 7th International Conference on Integrated Power Electronics Systems (CIPS), Nuremberg, Germany, March 2012, pp. 1–6 (2012) 27. Durand, C., Klingler, M., Coutellier, D., Naceur, H.: Confrontation of failure mechanisms observed during active power cycling tests with finite element analyze performed on a MOSFET power module. In: 14th International Conference on Thermal, Mechanical and Multi-Physics Simulation and Experiments in Microelectronics and Microsystems (EuroSimE), Cardiff, UK, April 2013, pp. 1–4 (2013) 28. Chen, Y., et al.: Study on the effects of small swing of junction temperature cycles on solder layer in an IGBT module. In: 2016 IEEE 8th International Power Electronics and Motion Control Conference (IPEMC-ECCE Asia), Heifei, China, May 2016, pp. 3236–3240 (2016) 29. Musallam, M., Johnson, C., Yin, C., Bailey, C., Mermet-Guyennet, M.: Real-time life consumption power modules prognosis using on-line rainflow algorithm in metro applications. In: Energy Conversion Congress and Exposition (ECCE), Atlanta, GA, USA, September 2010, pp. 970–977 (2010)
Health Monitoring of SiC
329
30. Xu, L., Zhou, Y., Liu, S.: DBC substrate in Si- and SiC-based power electronics modules: design, fabrication and failure analysis. In: 2013 IEEE 63rd Electronic Components and Technology Conference (ECTC), Las Vegas, NV, USA, May 2013, pp. 1341–1345 (2013) 31. Nguyen, T.T., Ahmed, A., Thang, T.V., Park, J.H.: Gate oxide reliability issues of SiC MOSFETs under short-circuit operation. IEEE Trans. Power Electron. 30(5), 2445–2455 (2015). https://doi.org/10.1109/TPEL.2014.2353417 32. Ni, Z., Lyu, X., Yadav, O. P., Cao, D.: Review of SiC MOSFET based three-phase inverter lifetime prediction. In: IEEE Applied Power Electronics Conference and Exposition (APEC), Tampa, FL, USA, March 2017, pp. 1007–1014 (2017). https:// doi.org/10.1109/APEC.2017.7930819 33. Zhang, L.: Feasibility study on SiC-based power modules for high-temperature applications. PhD report, chapter 3, Sciences and technologies University of Bordeaux, France (2012) 34. Chen, L., Lai, Z., Cheng, Z., Liu, J.: Reliability investigations for encapsulated isotropic conductive adhesives flip chip interconnection. In: Proceeding of the Sixth IEEE CPMT Conference on High Density Microsystem Design and Packaging and Component Failure Analysis, Shangai, China, June 2004, pp. 134–140 (2004) 35. Jiang, N., Chen, M., Xu, S., Lai, W., Bing, G., Chen, Y.: Lifetime evaluation of solder layer in an IGBT module under different temperature levels. In: 8th International power electronics and motion control IEEE conference (IPEMC-ECCE Asia), pp. 3137-3141 (2016). https://doi.org/10.1109/IPEMC.2016.7512797 36. Hologne, M., et al.: An experimental approach to the health-monitoring of a silicon carbide MOSFET-based power module. In: IEEE International Electric Machines and Drives Conference (IEMDC) (2017). https://doi.org/10.1109/IEMDC.2017. 8002028 37. Hologne, M., Bevilacqua, P., Allard, B., Clerc, G., Razik, H.: Test bench and data analysis towards an on-line Health Monitoring for emerging power modules. 
In: IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society (2018). https://doi.org/10.1109/IECON.2018.8592696 38. Spearman, C.: The proof and measurement of association between two things. Am. J. Psychol. 15, 72–101 (1904) https://doi.org/10.2307/1412159 39. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–432 (1948) 40. Myers, L., Sirois, M.J.: Encyclopedia of Statistical Sciences (2004). https://doi. org/10.1002/0471667196.ess5050.pub2 41. Shannon, C.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948). https://doi.org/10.1002/j.1538-7305.1948.tb01338.x 42. Picton, P.: Introduction to Neural Networks. The MacMillan Press (1994). https:// doi.org/10.1007/978-1-349-13530-1 43. Chen, C.H.: Fuzzy Logic and Neural Network Handbook. Editor McGraw-Hill (1996). ISBN: 0070111898, 9780070111899 44. Bose, B.K.: Neural network applications in power electronics and motor drives – an introduction and perspective. IEEE Trans. Ind. Electron. 54(1) (2007). https:// doi.org/10.1109/TIE.2006.888683 45. Bose, B.K.: Artificial intelligence techniques: how can it solve problems in power electronics?: an advancing frontier. IEEE Power Electron. Mag. 7(4) (2020). https://doi.org/10.1109/MPEL.2020.3033607 46. Zhao, S., Blaabjerg, F., Wang, H. : An overview of artificial intelligence applications for power electronics. IEEE Trans. Power Electron. 36(4) (2021). https://doi.org/ 10.1109/TPEL.2020.3024914
The Use of Signal Intensity Estimator for Monitoring Real World Non-stationary Data
Mohamed Elforjani1,2 and David Mba2
1 SupervisoryEye, Cranfield MK43 0JA, UK
2 De Montfort University, Leicester LE1 9BH, UK
Abstract. Robustness and operational reliability are among the advantages that make rotating machine components indispensable parts of most mechanical engineering systems. To keep rotating machines functioning at optimal conditions, control and maintenance of machine components must be properly applied. Improper analysis of highly modulated non-stationary data acquired from machine components (e.g. gears and bearings) may lead to complete machine breakdown. Despite the body of research available in the literature, most existing condition monitoring (CM) methods have proved inefficient for real world applications. This suggests an obvious need for fundamental modifications and/or the development of new CM techniques. Here we aim to tackle this issue by proposing the Signal Intensity Estimator (SIE) as an alternative technique, tailored to the task of monitoring highly modulated data. Our main interest lies in introducing the idea behind the SIE method and its previous successful applications for monitoring Suzlon and Repower (wind energy companies) wind machines.
Keywords: Condition monitoring · Non-stationary data · Wind turbines · SIE method
1 Introduction
Machine components such as bearings and gears have been utilised for various machining and industrial processes since the ancient era. The feasibility of condition monitoring (CM) of bearings and gears using signal processing techniques has already been established (Elforjani 2020). Observational studies have demonstrated that most of the well-established CM tools are highly relevant/sensitive only in detecting the early presence of machine faults, while their potential for monitoring well-advanced damage remains low (Elforjani and Bechhoefer 2018). Another important constraint on the work in this area is that the reliability of these techniques in handling and characterizing data measured from machines under non-stationary operating conditions has not yet been sufficiently proved (Elforjani 2020). Most of the commonly used time and frequency analysis methods assume that the operating conditions (e.g. speed and load) can be held reasonably constant.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 330–339, 2022. https://doi.org/10.1007/978-3-030-82110-4_17
However, in real world applications, machine data is very often acquired under non-stationary operating conditions. This type of data contains extremely high modulation rates that need capable, efficient and workable CM tools for a thorough analysis. Practical failings of the existing CM methods in tackling highly modulated data were highlighted in reports provided by some wind energy companies (e.g. Suzlon & Repower) (Elforjani 2020; Elforjani and Bechhoefer 2018). Due to non-stationarity, information on bearing faults from Repower wind machines, for instance, was overlaid by the gear meshing signals. As a consequence, a so-called high order cyclostationary process (signals with varying properties, e.g. statistical variance) was generated. The applied CM tools were therefore unable to observe and locate the carrier frequencies of the bearings under diagnosis. To tackle the non-stationarity issue, the authors have recently proposed the Signal Intensity Estimator (SIE), which has inspired numerous subsequent works. SIE exhibited excellent performance in almost all cases and was shown to be more reliable than the existing methods. The SIE method has since passed through major changes, updates and modifications, not only to ease its computational complexity but also to improve its overall accuracy. It should be noted that early versions of the SIE algorithm were limited to the analysis of experimental data, and there was no definitive, universally suitable method for obtaining the values of required parameters such as the maximum frequency of analysis and the window size. Consequently, a tedious trial and error method remained the only way to solve this problem (Elforjani 2016, 2018). Although a similar SIE idea is still applied, the most updated SIE version differs significantly in the way it handles the selection of such parameters.
It was reported that the updated version does not suffer from the issues raised earlier, and that its algorithm provides the flexibility to tackle very complex challenges in a simple, robust and reliable computational process that requires no further calculation steps. However, to our knowledge, there are no published documents that cover the concepts of the SIE method in sufficient and comprehensive detail. For the sake of completeness, the main objective of the present chapter is to bring together and thoroughly explain both the basic and advanced concepts of the most updated SIE method in a single standard document that comes in handy when machine diagnostics is performed. The chapter starts with a thorough overview of the SIE method and then progresses to highlight its applications for the analysis of data from real world wind machines (Repower & Suzlon wind machines).
2 Signal Intensity Estimator (SIE)
Statistical parameters such as Root Mean Square (RMS), kurtosis (KU), and Crest Factor (CF) are broadly used to extract information from vibration signals. In general, these fault indicators are attractive to field analysts, who are more practice than theory oriented, because they are very fast to compute and do not require extensive knowledge. They can also be incorporated into monitoring systems at very low cost. However, due to the nature of today's CM data, these fault indicators suffer from issues such as high dimensionality, high modulation rates, and rapid transient changes. When KU or CF is applied in practice to vibration data from a machine with well-advanced damage, it becomes insignificant/unresponsive and its level drops to that of undamaged machines. Moreover, KU, CF, and RMS are essentially measured over a predefined signal chunk (one indicator value per chunk) and, hence, their values were reported to be unaffected by transient changes that typically occur over microseconds. Further, these parameters provide only time domain information, which is not always sufficient to obtain an exact match between the failure mode and the faulty component.
Recently, the Spectral Kurtosis (SK) algorithm was proposed to handle this issue (Antoni 2006). Faulty machines often show significantly high KU levels, which give rise to impulsive responses. This leads the acquired data to have varying statistical content, in particular if the machine is working under varying operating conditions. Thus, the idea behind the SK method is to locate the varying and/or non-stationary events in the frequency domain. The SK algorithm first calculates the KU values in different frequency bands and then identifies the location of the maximum KU. The frequency band of the maximum SK level is eventually used to design a signal filter that extracts the high level impulsive signals. Though several attempts were made to improve the overall calculation procedure and ease the difficulty of investigating the entire plane, it should be noted that the method is still highly dependent on an iterative and complicated computational process. Moreover, improper selection of the size of the frequency bands has a substantial impact on the overall results. Also, this method is not always applicable to highly modulated data, since the calculated KU values are by nature very noisy and the SK, as a result, is unable to resolve the high modulation rates in the analyzed signals; the vibration spectrum is very often overwhelmed by broad and noisy frequency spikes (Elforjani and Bechhoefer 2018). This motivates the need for an alternative approach that remains applicable even in cases of highly modulated data.
The Signal Intensity Estimator (SIE) method is distinct from other tools in that it employs the cumulative sum (CUSUM) for monitoring sequential samples (Elforjani 2020). First, the use of CUSUM allows the total to be identified at any time interval without having to sum the entire sequence. Second, if particular activities are not individually important, CUSUM avoids having to record the sequence itself. Third, processing several samples from failure histories using CUSUMs results in greater sensitivity for detecting transient shifts or variations in trends over time. In practice, SIE is essentially a piecewise windowing method that analyzes any CM data, regardless of its physical units, using a two-step process. It is important to note that the SIE method is fundamentally different from the classical envelope approaches: the process first segments the entire signal into equal width segments, and CUSUM is then used to statistically chart the signal intensity of current and preceding values. To apply the SIE, the sum of the cumulative sum (SCS_segment) of a predefined segment (window) in a given time domain signal is normalized by the overall root mean square (RMS_overall) of the same signal.
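The conventional indicators named at the start of this section can be illustrated with a minimal Python sketch. The function name `fault_indicators` is hypothetical, and the kurtosis convention (non-excess, so that Gaussian data gives about 3) is an assumption, since the chapter does not state which definition is used.

```python
import numpy as np

def fault_indicators(x):
    """Return (RMS, kurtosis, crest factor) for one signal chunk."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    centered = x - np.mean(x)
    variance = np.mean(centered ** 2)
    # Assumption: non-excess (Pearson) kurtosis, about 3 for Gaussian data.
    kurtosis = np.mean(centered ** 4) / variance ** 2
    # Crest factor: peak absolute value relative to the RMS level.
    crest_factor = np.max(np.abs(x)) / rms
    return rms, kurtosis, crest_factor

# Usage: a pure 50 Hz sine sampled at 1 kHz over whole periods.
t = np.arange(1000) / 1000.0
sine = np.sin(2 * np.pi * 50 * t)
rms, ku, cf = fault_indicators(sine)
```

Each call returns a single value per chunk, which is the "one value indicator" limitation discussed above: a short transient inside a long chunk barely moves any of the three numbers.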
As is well known, the rotational speed (RPM) is one of the causal factors of non-stationarity, and the RPM is therefore employed by the SIE algorithm to calculate the Maximum Frequency of Analysis (F_max). Using this approach for the calculation of segment sizes could significantly alleviate the issue of losses in time localization and frequency localization. With the RPM known at any time and the Machine Component Constant (β), F_max can simply be calculated using Eq. 1.

$$F_{\max} = \beta \cdot \frac{\mathrm{RPM}}{60} \qquad (1)$$

It is worth mentioning that each β value in Eq. 2 is assigned to a specific machine component (e.g. shafts, bearings, gears, etc.). After a careful iterative selection process, the validation results confirmed that the process produced β values within appropriate tolerance levels.

$$\beta = \begin{cases} 20, & \text{for shafts} \\ 40, & \text{for bearings} \\ 60, & \text{for pumps} \\ 80, & \text{for gears} \\ 100, & \text{for slow rotational speeds} \end{cases} \qquad (2)$$

The calculated F_max values are compared with published SIE Standard Frequencies (SSFs), which have been validated quantitatively and qualitatively in experiments to ensure reproducible results. The SSF is a very important factor in the SIE algorithm, as it allows for a robust calculation of the most appropriate window (segment) size and helps to avoid indefinite settings. Table 1 illustrates an example of these SIE Standard Frequencies used for bearings and gears. Once the SSF value has been selected, the SIE algorithm calculates the required number of segments (n) in the analysed signal using the Sampling Rate Frequency (F_S):

$$n_{\mathrm{segments}} = \frac{F_S}{\mathrm{SSF}} \qquad (3)$$
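Equations 1 to 3 can be sketched in a few lines of Python. The dictionary keys, the function names and the rounding of the segment count to an integer are assumptions made for illustration; the SSF is passed in by the caller rather than looked up, because the selection rule is given by Table 1.

```python
# Machine Component Constant from Eq. 2 (key names are hypothetical labels).
BETA = {
    "shafts": 20,
    "bearings": 40,
    "pumps": 60,
    "gears": 80,
    "slow_rotational_speeds": 100,
}

def max_analysis_frequency(component, rpm):
    """Eq. 1: F_max = beta * RPM / 60, in Hz."""
    return BETA[component] * rpm / 60.0

def number_of_segments(sampling_rate_hz, ssf_hz):
    """Eq. 3: n = F_S / SSF, rounded to an integer (rounding is an assumption)."""
    return int(round(sampling_rate_hz / ssf_hz))

# Usage: a bearing on a shaft turning at 1500 RPM, signal sampled at 100 kHz.
f_max = max_analysis_frequency("bearings", 1500)   # 40 * 1500 / 60
n = number_of_segments(100_000, 2_000)             # SSF chosen by the caller
```

Because F_max follows the measured RPM, the segment size adapts when the speed changes, which is the point of tying the windowing to the rotational speed.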
Mathematically, SIE can be computed using one of two approaches. The first approach is a time domain method in which SIE values are calculated directly from the original data (Eq. 4). In the second approach, the SIE values are extracted from the Fast Fourier transform (FFT) applied to every individual segment (Eq. 5).

$$\mathrm{SIE}_T\big|_{\mathrm{segment}} = \frac{\mathrm{SCS}_{\mathrm{segment}}}{\mathrm{RMS}_{\mathrm{overall}}} \qquad (4)$$

$$\mathrm{SIE}_F\big|_{\mathrm{segment}} = \frac{\mathrm{FFT\,CS}_{\mathrm{segment}}}{\mathrm{RMS}_{\mathrm{overall}}} \qquad (5)$$
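A minimal sketch of Eqs. 4 and 5 in Python follows. It is an illustration under stated assumptions, not the authors' implementation: the samples are rectified before the cumulative sum, trailing samples that do not fill a whole segment are dropped, and the frequency domain variant applies the cumulative sum to the FFT magnitude of each segment. The function names are hypothetical.

```python
import numpy as np

def _segments(signal, n_segments):
    """Split a signal into equal width segments and return them with the overall RMS."""
    x = np.asarray(signal, dtype=float)
    rms_overall = np.sqrt(np.mean(x ** 2))
    seg_len = len(x) // n_segments
    # Assumption: samples beyond the last full segment are discarded.
    return x[: seg_len * n_segments].reshape(n_segments, seg_len), rms_overall

def sie_time(signal, n_segments):
    """Eq. 4: sum of the cumulative sum of each segment over the overall RMS."""
    segs, rms_overall = _segments(signal, n_segments)
    # Assumption: rectified samples, so that oscillations do not cancel out.
    scs = np.cumsum(np.abs(segs), axis=1).sum(axis=1)
    return scs / rms_overall

def sie_freq(signal, n_segments):
    """Eq. 5: the same construction applied to the FFT magnitude of each segment."""
    segs, rms_overall = _segments(signal, n_segments)
    spectra = np.abs(np.fft.rfft(segs, axis=1))
    fft_cs = np.cumsum(spectra, axis=1).sum(axis=1)
    return fft_cs / rms_overall

# Usage: a steady sine, and the same sine with one amplified segment.
fs = 1000
t = np.arange(1000) / fs
steady = np.sin(2 * np.pi * 50 * t)
burst = steady.copy()
burst[500:600] *= 3.0

steady_sie = sie_time(steady, 10)
burst_sie = sie_time(burst, 10)
ratios = burst_sie[1:] / burst_sie[:-1]   # SIE ratio between adjacent segments
```

For the steady signal every segment yields the same SIE value, while the amplified segment produces an adjacent-segment ratio well above one.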
When non-transient signals are analyzed, statistically identical SIE charts are observed and the ratio of SIE between any two adjacent segments approaches one. For signals with transient characteristics, this ratio is greater than one. It is of interest to mention that the SIE algorithm is also integrated with another algorithm that performs a statistical test for normality to identify the
Table 1. Example of SIE standard frequencies for the analysis frequency (Hz).

| Machine component (type of application) | β | Analysis frequency Fmax^a | SSF^b |
|---|---|---|---|
| Bearings | 40 | Fmax < 800 | Equal to Fmax^c |
| Bearings | 40 | 800 ≤ Fmax ≤ 1500 | 1000–2000 |
| Bearings | 40 | Fmax > 1500 | 2000–4000 |
| Bearings | 40 | Fmax > FS^d 1500 | 4000–6000 |
| Gears | 80 | Fmax > FS^d | |
$$\left.\begin{array}{l} \ldots > 0,\quad m_w > 0 \\ 0 < l_1 < L/2 \end{array}\right\}, \qquad (22),\ (23)$$
where $f_{01}^*$ is the required value of the frequency of free oscillations and M(S) is the total mass of the beam system. In particular, for a beam with a rectangular cross-section of thickness h and width given by the functional dependence b(h) = 8h, we obtain the following system of equations:

$$M(h, m_w) = \rho\, b(h)\, h\, L + m_w \to \min \qquad (24)$$

$$\left.\begin{array}{l} f_{01}(h, m_w, l_1) = 0.046\,\lambda(h, m_w, l_1)^2\, c_0\, h \equiv f_{01}^* \\ h > 0,\quad m_w > 0 \\ 0 < l_1 < L/2 \end{array}\right\} \qquad (25)$$
For example, using an excitation frequency of f01* = 25 Hz, we obtain the optimum design parameters h = 5.51 × 10⁻³ m, mw = 1.12 kg, l1 = 0.112 m, which give M = 3 kg. For the synthesis of vibrating systems, the mass mw can be prescribed, so that problem (22)–(23) reduces to determining the design parameters of the cross-section S and the position l1 of the intermediate supports. The optimization problem (24)–(25) can be supplemented by force analysis and additional constraints on stress-strain characteristics, in particular the strength and durability of the vibration system under operating conditions [19]. This will allow a more comprehensive optimization of vibration systems, taking into account operating conditions and adapting the existing means of regulation to the operating modes.
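The objective function (24) can be evaluated for the reported optimum with a short Python sketch. The material density and beam length are not stated in this section, so the values used below (steel, 7850 kg/m³, and L = 1 m) are assumptions chosen only to illustrate the arithmetic; the function name is hypothetical.

```python
def beam_system_mass(h, m_w, rho, length, width_ratio=8.0):
    """Eq. 24: M = rho * b(h) * h * L + m_w, with b(h) = width_ratio * h."""
    b = width_ratio * h
    return rho * b * h * length + m_w

# Reported optimum: h = 5.51e-3 m, m_w = 1.12 kg.
# ASSUMED (not given in the text): rho = 7850 kg/m^3, length = 1.0 m.
total_mass = beam_system_mass(h=5.51e-3, m_w=1.12, rho=7850.0, length=1.0)
```

Under these assumed values the total mass comes out close to the reported M = 3 kg; a different density or beam length would shift the result accordingly.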
4 Adjustment of Frequency Characteristics
The automated adjustment of the frequency characteristics of the vibrating machine is proposed according to the scheme shown in Fig. 5. The automatic vibration control system includes a vibration sensor 1 (single-, dual- or three-axis) and a controller 2, which receives the signal from the sensor and generates control signals for the actuator 3. The rotary actuator 3 is equipped with an angular position sensor (encoder), which stabilizes its position through internal feedback. Depending on the mounting configuration of the actuator, its angular position is calibrated in the controller 2 to the respective linear positions of the intermediate support 4. The actuator 3 is connected to the intermediate support 4 by a lever system; by rotating within 0–180°, it moves the support 4, which is mounted in the guides 5, along the x-axis.
Fig. 5. Scheme of automatic control of the position of intermediate supports: 1 - vibration sensor; 2 - controller; 3 - the actuator of the rotary type, 4 - intermediate support, 5 - guides.
The oscillation amplitude of the mass mw is stabilized whenever the motion parameters deviate from the values specified in the controller for each type of processed material and its properties (fraction size, humidity, density). In general, the controller 2 can determine from the measurement data any parameter of the vibrational motion of the machine on which the efficiency of material processing depends. In particular, the following parameters can be adjusted: the root mean square or peak value of the oscillation amplitude, the shape of the trajectory (orbit), and others.
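The outer feedback loop described above can be sketched as follows. This is a hypothetical proportional update, not the control law of the described system: the gain, its sign, and the function names are assumptions, and only the 0–180° actuator range and the RMS/peak amplitude measures come from the text.

```python
import math

def rms_and_peak(samples):
    """RMS and peak absolute value of a block of vibration samples."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    return rms, peak

def next_actuator_angle(current_angle_deg, measured_rms, target_rms, gain_deg_per_unit):
    """One proportional step of the outer loop (hypothetical control law).

    The angle is clamped to the 0-180 degree range of the rotary actuator.
    """
    error = target_rms - measured_rms
    angle = current_angle_deg + gain_deg_per_unit * error
    return min(180.0, max(0.0, angle))

# Usage: the measured RMS is below the set point stored for the current material.
rms, peak = rms_and_peak([3.0, -4.0])
angle = next_actuator_angle(90.0, measured_rms=1.0, target_rms=1.5, gain_deg_per_unit=20.0)
```

In a real installation the set point would be selected per material type, and the commanded angle would be handed to the actuator's internal encoder loop, which holds the position.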
5 Conclusions
Resonant and out-of-resonance vibrating machines are considered, and the advantages and disadvantages of each type when operating under non-stationary loads and changing properties of the processed material are noted. It is proposed to adjust the frequency characteristics of resonant machines with the help of elastic intermediate supports. The problem of determining the natural frequencies of free bending oscillations of a beam with intermediate supports and finite mass is solved. By changing the stiffness factor and the location of the intermediate supports, the frequency of free transverse oscillations of the system can be increased by up to 6 times. This approach makes it possible to compensate for the influence of technological factors on the frequency of oscillations, as well as to change other kinematic characteristics during the operation of the vibrating machine. The solution of an optimization problem that minimizes the mass while providing the required frequency of free oscillations is given. The problem is reduced to determining rational cross-sections of an elastic beam under the condition that intermediate supports of arbitrary stiffness are used. A scheme is developed for the automatic vibration control system of the sieving machine, with external feedback from the vibration signal and internal feedback from the angular position signal of the rotary actuator. The system stabilizes the machine's motion parameters depending on the type and parameters of the material being processed, which significantly improves the efficiency of bulk raw material processing.
The proposed solutions open up the possibility of further optimization of structures under conditions of unsteady dynamic loads, in particular providing established frequency-mass and power characteristics, which can have wide practical application in various industrial machines and technical devices.
Funding Statement. This activity has partly received funding from the European Institute of Innovation and Technology (EIT), a body of the European Union, under Horizon 2020, the EU Framework Programme for Research and Innovation, under Framework Partnership Agreement No. 18253 (OPMO - Operation monitoring of mineral crushing machinery).
Conflicts of Interest. The authors declare that there is no conflict of interest regarding the publication of this paper.
References
1. Nazarenko, I., Slipetskyi, V.: Development of the organizational principles of formation of the optimal diagram and parameters of vibration system. Technol. Audit Prod. Reserves 5, 29–31 (2019). https://doi.org/10.15587/2312-8372.2019.183874
2. Palacios, J.L., Balthazar, J.M., Brasil, R.M.L.R.F.: A short note on a nonlinear system vibrations under two non-ideal excitations. J. Brazilian Soc. Mech. Sci. Eng. 25(4), 391–395 (2003). https://doi.org/10.1590/S1678-58782003000400011
3. Ultimate Screener. http://www.kb-intel.com.ua/product/15/
4. Jiang, Y.-Z., He, K.-F., Dong, Y.-L., et al.: Influence of load weight on dynamic response of vibrating screen. Shock. Vib. 2019, 4232730 (2019). https://doi.org/10.1155/2019/4232730
5. Lekic, Ð.M., Despotovic, Z.V.: Control of half-bridge resonant PWM converter for electromagnetic vibratory actuator. In: 18th International Symposium on INFOTEH-JAHORINA (INFOTEH 2019), East Sarajevo, Bosnia and Herzegovina, pp. 1–6 (2019). https://doi.org/10.1109/INFOTEH.2019.8717773
6. Despotovic, D.Z., Pavlovic, M.A., Radakovic, J.: Regulated drive of vibratory screens with unbalanced motors. In: XV International Scientific Professional Symposium, vol. 15, pp. 155–60. INFOTEH-JAHORINA, Jahorina, Bosnia and Herzegovina (2016)
7. Magnetic Vibrator. https://www.aviteq.com/en/products/drive-technology/magnetic-vibrator/
8. Nazarenko, I., Gaidaichuk, V., Dedov, O., Diachenko, O.: Determination of stresses and strains in the shaping structure under spatial load. Eastern-European J. Enterp. Technol. 6, 13–18 (2018). https://doi.org/10.15587/1729-4061.2018.147195
9. Peng, L.-P., Liu, C.-S., Song, B.-C., Wu, J., Wang, S.: Improvement for design of beam structures in large vibrating screen considering bending and random vibration. J. Central South Univ. 22(9), 3380–3388 (2015). https://doi.org/10.1007/s11771-015-2878
10. Lingyun, W., Mei, Z., Guangming, W., et al.: Truss optimization on shape and sizing with frequency constraints based on genetic algorithm. Comput. Mech. 35, 361–368 (2015). https://doi.org/10.1007/s00466-004-0623-8
11. Sedaghati, R., Suleman, A., Tabarrok, B.: Structural optimization with frequency constraints using the finite element force method. AIAA J. 40, 382–388 (2002). https://doi.org/10.2514/2.1657
12. Pukach, P.Ya., Kuzio, I.V., Nytrebych, Z.M., et al.: Asymptotic method for investigating resonant regimes of nonlinear bending vibrations of elastic shaft. Nauk Visnyk Nat. Hirnychoho Univ. 1, 68–73 (2018). https://doi.org/10.29202/nvngu/2018-1/9
13. Hong, J., Dodson, J., Laflamme, S., et al.: Transverse vibration of clamped-pinned-free beam with mass at free end. Appl. Sci. 9(15), 2996 (2019). https://doi.org/10.3390/app9152996
14. Krot, P., Zimroz, R.: Methods of springs failures diagnostics in ore processing vibrating screens. In: IOP Conference Series: Earth and Environmental Science, Prague, Czech Republic, 9–13 September 2019, vol. 362, pp. 1–9 (2019)
15. Krot, P., Zimroz, R., Michalak, A., et al.: Development and verification of the diagnostic model of the sieving screen. Shock. Vib. 2020, 8015465 (2020). https://doi.org/10.1155/2020/8015465
16. Ozden, R.C., Anik, M.: Enhancement of the mechanical properties of EN52CrMoV4 spring steel by deep cryogenic treatment. Materialwiss. Werkstofftech. 51, 422–431 (2020). https://doi.org/10.1002/mawe.201900122
17. Krot, P., Bobyr, S., Dedik, M.: Simulation of backup rolls quenching with experimental study of deep cryogenic treatment. Int. J. Microstruct. Mater. Prop. 12(3/4), 259–275 (2017). https://doi.org/10.1504/IJMMP.2017.10012128
18. Piersol, A.G., Harris, C.M.: Harris’ Shock and Vibration Handbook, 5th edn. Standardsmedia, New York (2002)
19. Gursky, V., Kuzio, I.: Dynamic analysis of a rod vibro-impact system with intermediate supports. Acta Mechanica et Automatica 12, 127–134 (2018). https://doi.org/10.2478/ama2018-0020
Mathematical Modelling and Computer Simulation of Rotors Dynamics in Active Magnetic Bearings on the Example of the Power Gas Turbine Unit
Gennadii Martynenko
Department of Dynamics and Strength of Machines, NTU “KhPI”, National Technical University “Kharkiv Polytechnic Institute”, 2, Kyrpychova Street, Kharkiv 61002, Ukraine
Abstract. The paper considers the use of mathematical and computer modelling methods for the analysis of the technical state of complex rotary machines with controlled electronic components, namely active magnetic bearings (AMBs) with control systems. Assessment of the vibrational state of a rotor system is one of the most important tasks in its design and synthesis. Both stationary and non-stationary processes take place in such systems. The main focus of the research is the use of a specially designed mathematical apparatus and software to implement the proposed techniques for assessing the dynamic behaviour of rotors of power engineering turbomachines with active magnetic bearings over the entire range of excitation frequencies. The mathematical modelling is based on an analytical representation using the Lagrange-Maxwell magnetomechanical equations. The Runge-Kutta procedure is applied for numerical simulation. These methods and computer tools have several advantages over existing ones, since they take into account the complete nonlinear relationship between the mechanical, magnetic, and electrical processes occurring in the system, including continuous or discrete control actions. The analytical and numerical approaches created made it possible to model the rotor dynamics of an energy gas turbine installation taking the AMBs into account. Design calculation studies and analysis of the rotor dynamics of the turbocompressor and turbogenerator of this installation show the advantages of the proposed method for systems including AMBs. All the results of the numerical studies are verified by comparison with numerical and experimental data known from open sources of information.
Keywords: Rotor dynamics · Active magnetic bearing · Stationary and non-stationary processes · Computer simulation
1 Introduction
One of the areas of applying mathematical modelling and modern information technologies is the informatization of the design of a wide variety of technological and artificial human environment objects. They help to perform research, design, organizational and managerial activities. In the design process, information technologies are used for computer modelling of complex objects, processes, and phenomena in various applied fields. Such simulation includes geometric modelling and analysis systems, invariant design procedures, information processing algorithms, programming, database and calculation tools, modern software for computer-aided design of engineering objects, animation and computer graphics systems, as well as other tools. In this case, the subject considered is the application of mathematical modelling and information technologies to the design of complex objects of power machine building and to the modelling of stationary and non-stationary processes and phenomena occurring in them. A low-power gas-turbine power plant based on gas-turbine units is considered as an example of this application.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 364–377, 2022. https://doi.org/10.1007/978-3-030-82110-4_20
2 Literature Review Active magnetic bearings (AMBs) are an alternative type of elastic-damper bearings of rotors, which, compared to other ones (rolling element, fluid journal, gas-dynamic bearings), have advantages, such as the lack of lubrication systems, reduction of friction losses, a relatively large clearance, etc. [1]. In a mathematical description of the dynamic behaviour of the rotor-in-AMBs system, an important issue is a correct mathematical formulation of force and stiffness characteristics of the AMBs. These characteristics depend on geometrical and physical parameters of stator and rotor parts, as well as on parameters of a control law which an automatic control system of a rotor position operates for. These are voltages applied to windings of electromagnets, currents in them, and their active resistances [2]. The values of voltages and currents vary following the control law and depend on a state of the rotor-in-AMBs system, i.e. on a deviation of the rotor from its nominal position in a bearing clearance [3]. Physically, these deviations are limited by clearances in touchdown rolling element bearings, which are usually 2 times less than the nominal air (magnetic) clearance between stator poles of AMBs and a rotor [4]. However, the information on the force characteristics in an entire range of deviations in air (magnetic) clearances improves the adequacy of mathematical models of dynamics of the rotor in the AMBs [5, 6]. The use of these models increases the validity of a choice of AMBs and control laws based on mathematical modelling, rather than experimental refinements [7]. Methods for analysing the dynamics of rotors in elastic-damping supports are well developed [8–10]. However, such a specific bearing unit as an AMB introduces its characteristic features into the dynamic behaviour of the rotor. 
To describe the dynamics of rotors in AMBs, simplified analytical models are often used, either with one degree of freedom, like a mechanical oscillator, or with two degrees of freedom reflecting the position of the rotor in one of the planes [11]. Moreover, the search for critical speeds can be performed both with and without taking into account rotor rotation, and electromagnetic forces can be represented by linearized relations, which also leads to modelling inaccuracies [12]. The adequacy of mathematical modelling can be improved by including the dependences of force on current and displacement, of flux linkage on current and displacement, and of the first derivatives of flux linkage with respect to current or displacement on current and displacement, predefined by a numerical analysis (for example, the
finite-element one), in the linearized analytical model [2]. These models can be used, for example, to analyse a strategy and laws of control of active magnetic bearings. Improving the quality of mathematical models and accuracy of calculation results can be achieved by taking into account the following effects: connected processes in orthogonal directions [13]; an interaction of various processes, such as electromagnetic and mechanical ones, and a performance of a coupled analysis [14]; various nonlinear effects [15]. One of the main drawbacks of these mathematical models is linearization, which introduces errors significantly affecting the result and can be applied only in a small area of an equilibrium position for all design variants of axial and radial AMBs. Commercial finite element codes allow various computational models to be used, for example, three-dimensional solid-state [16] and beam ones [17], and take into account some of the features of rotor systems with magnetic fields, for example, a complex structure of a long shaft [18], elastic-damping couplings, modelling of AMBs, the rotor associated with them and the machine body [19] and others. However, these software tools do not enable utilizing finite element models to find solutions which take into account the entirety of processes occurring in such a complex electromechanical system as a rotor in AMBs. These are, for example, coupled electro- and magneto-mechanical phenomena (the dependence of magnetic forces on gaps between moving and stationary parts and currents in the windings of the AMB electromagnets), a time delay of the current in the windings of AMB electromagnets due to the inductance of coils, as well as laws of the AMB control system. At the same time, such approaches are successfully used to analyse the dynamics of rotors in other types of bearings [20, 21]. 
One of the means of adequate modelling of dynamic processes in rotor machines is an analytical description with subsequent software implementation [22]. The use of information technologies is relevant in this case, for example, applying information processing, database and calculation algorithms, such as a characteristic analysis [23] or an analysis using artificial neural networks [24]. In any case, experimental verification of the used approaches and tools is a necessary condition [25]. The result of using information design technologies is the technical implementation of created objects or machines at the prototype level [26]. The concept proposed in [5, 6] for constructing analytical mathematical models of the dynamics of rigid rotors in AMBs allows most of the disadvantages of the existing approaches to be eliminated. It can be used reasonably to analyse the dynamic behaviour of not only the simplest rotor systems with AMBs, but also to simulate the dynamics of rigid and flexible rotors of real industrial rotary machines. This work is devoted to the proof of this fact using information technologies.
3 The Object of Research and Problem Formulation 3.1 Gas Turbine Unit Design At the moment, the use of low-power gas-turbine power plants based on a gas-turbine installation for generation of electricity and heat for domestic and industrial consumers located in areas with an impossibility or a difficulty of centralized supply from large cogeneration plants is becoming increasingly urgent.
The paper considers gas turbine heat and power plant (GT HPP), designed to generate electricity and heat [27]. The object of the research is the dynamics of the rotor of a gas turbine unit (GTU) GTE-009M with a single electric capacity of 9 MW, based on which a GT HPP was made. The station includes a high-speed turbogenerator TFE-10–2(3 × 2)/6000U3 with a frequency of 101.6 Hz, and a thyristor frequency converter is used to supply electricity to an external network with a frequency of 50 Hz. Two gas turbine installations, consisting of gas turbine units of the GTE-009M type, recuperative air heaters, heat recovery boilers and hot water boilers, are installed at the GT HPP [27]. The GTE-009M gas turbine unit has a single shaft with a rotational speed of 6096 rpm supported by magnetic bearings including sectional combustion chambers and with the axial gas output after the turbine. The use of an electromagnetic suspension system, namely, active magnetic bearings, replaces oil bearings in the turbo unit, which increases the durability and environmental friendliness of the power plant. The appearance of this installation is shown in Fig. 1, and the view of the support units (magnetic and safety bearings) in Fig. 2. The shaft line of the installation is shown in Fig. 3. It consists of a rotor of a turbocompressor (1), a rotor of a generator (2) and an intermediate shaft (3) [27, 28].
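As a quick consistency check on the figures quoted above, the shaft speed and the generator frequency are related by the usual conversion from revolutions per minute to hertz. The angular velocity in the second expression is derived here and is not stated in the source:

```latex
f = \frac{n}{60} = \frac{6096\ \text{rpm}}{60\ \text{s/min}} = 101.6\ \text{Hz},
\qquad
\omega = 2\pi f \approx 638.4\ \text{rad/s}.
```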
Fig. 1. An appearance of the gas turbine unit GTE-009M (Source: [27])
Fig. 2. Radial AMB (diameter 400 mm) and safety bearing (Source: [27])
Fig. 3. The shaft line of the installation (Source: [27])
The shaft line is installed in active magnetic bearings (Fig. 2) – one radial (4) and one radial-axial (5) AMBs of the turbocompressor rotor, as well as two radial (6 and 7) AMBs of the generator rotor. The intermediate shaft does not have any bearings. There are the sensors of radial displacement of the rotor to the left and the right of each radial AMB. Radial AMBs are additionally equipped with two touchdown bearings with ceramic rolling elements without lubrication (Fig. 2). They are necessary to ensure the run-out of the rotor in case of failure of the magnetic suspension system [27]. The design of GTU rotors in AMBs was performed according to the results of dynamic calculations of the turbine and generator rotors separately and of the entire shaft line as a whole [27]. These calculations were executed using a beam finite element model (FEM). The results are represented as frequencies and modes of natural vibrations of non-rotating rotors, as well as critical speeds determined by Campbell diagrams. The calculated data were compared with the results of the experimental determination of natural frequencies of the free shaft [27]. This confirmed their correctness and the possibility of using new mathematical models for verification. 3.2 Research Objectives and Initial Data Beam-mass finite element models give general information about the dynamic parameters of a linear system. However, it is necessary to create more accurate calculation models for testing various algorithms and laws of AMB control, as well as modelling of nonlinear phenomena of rotor dynamics characteristic of such systems. For the reliability of the analysis and the validity of the conclusions, the error of the results should not exceed 1–2%. Therefore, the models should be able to take into account both the dynamic characteristics of the considered rotor system and the controlled stiffness and damping properties of AMBs. 
For this purpose, the technique proposes to use analytical models [5, 6] with and without taking into account the deformability of the rotor. The task is to perform computational studies to verify the adequacy of these analytical models for studying the dynamics of rotors of a gas turbine installation with the rotors in the active magnetic bearings. The work aims to create a nonlinear simulation computational model of the dynamics of a rotating rotor in passive and/or active magnetic bearings (SCM-DRMB-N) to implement analytical models and conduct a comprehensive dynamic analysis based on their use. The base for this model is the application of the information technologies of design and computer modelling of objects and processes. The input data for solving the stated technical problem are the scheme of the entire GTU rotor (Fig. 4); schemes of rotors of the turbocompressor and the generator; the mass
of the turbocompressor with a part of the intermediate shaft (~6300 kg), the generator with a part of the intermediate shaft (~4700 kg), the entire GTU shaft line (~11000 kg); equatorial (J_e) and polar (J_p) moments of inertia and centres of gravity, set separately for each rotor and for the rotor assembly (J_e = 93130 kg·m² and J_p = 606 kg·m²) [27, 29]. The real AMB control law is usually an industrial secret and is individual for each AMB design and rotor system. The known parameters are the natural frequencies of the non-rotating rotor in the range up to one and a half times the operating frequency (6096 rpm or 101.6 Hz), together with the corresponding modes. These data characterize the design at a test value of the stiffness of all active magnetic bearings equal to 1 MN/m. For the final verification of the model, the dependences of the natural frequencies on the rotor speed (Campbell diagram) and the rotor critical speeds are used.
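To make the role of the Campbell diagram and of the two moments of inertia concrete, the following minimal sketch uses the textbook conical mode of an axisymmetric rigid rotor on isotropic supports, whose whirl frequencies p satisfy J_e p² − J_p ω p − k_θ = 0. It is not the author's model: the moments of inertia and the 1 MN/m stiffness are taken from the text, while the two-bearing layout and the half-span `HALF_SPAN` are hypothetical values chosen only for illustration.

```python
import math

J_E = 93130.0        # equatorial moment of inertia from the text, kg*m^2
J_P = 606.0          # polar moment of inertia from the text, kg*m^2
K_BEARING = 1.0e6    # test stiffness of each AMB from the text, N/m
HALF_SPAN = 3.0      # hypothetical distance from mass centre to each bearing, m

# Angular stiffness of two identical supports placed symmetrically (assumption)
K_THETA = 2.0 * K_BEARING * HALF_SPAN ** 2


def whirl_frequencies(omega, j_e=J_E, j_p=J_P, k_theta=K_THETA):
    """Return (forward, backward) whirl frequencies in rad/s at spin speed omega.

    These are the magnitudes of the two roots of
    j_e * p**2 - j_p * omega * p - k_theta = 0.
    """
    disc = math.sqrt((j_p * omega) ** 2 + 4.0 * j_e * k_theta)
    forward = (disc + j_p * omega) / (2.0 * j_e)
    backward = (disc - j_p * omega) / (2.0 * j_e)
    return forward, backward


def critical_speeds(j_e=J_E, j_p=J_P, k_theta=K_THETA):
    """Return (forward, backward) critical speeds where |p| equals omega."""
    forward = math.sqrt(k_theta / (j_e - j_p))
    backward = math.sqrt(k_theta / (j_e + j_p))
    return forward, backward


if __name__ == "__main__":
    fwd, bwd = critical_speeds()
    print(f"direct precession critical speed:  {fwd * 60 / (2 * math.pi):.1f} rpm")
    print(f"reverse precession critical speed: {bwd * 60 / (2 * math.pi):.1f} rpm")
```

In this idealization the gyroscopic term raises the forward branch and lowers the backward branch as the speed grows, so the direct-precession critical speed is always the higher of the two. The numbers it prints depend on the assumed geometry and should not be compared with the published results.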
Fig. 4. Calculation scheme of the GTU rotor (sectional lengths in mm) (Source: [27, 29])
4 Mathematical Modelling of the Dynamics of GTU Rotors
4.1 Analysis of Linear Vibrations of GTU Rotors
The main objective of these computational studies in a linear formulation is to obtain all possible results for the verification of an analytical mathematical computational model, in order to confirm the applicability and advantages of the proposed method by using a general-purpose engineering software package. These studies were performed in a finite element formulation using a beam-mass model and elastic linear elements K for modelling the AMBs. The results of calculations of natural frequencies and modes for the entire shaft line at a test stiffness of all supports equal to 1 MN/m are presented in Fig. 5. Numerical data of the natural frequencies (NF) of a non-rotating rotor (ω = 0) are summarized in Table 1. The results of the analysis of the dynamic behaviour of a gas turbine rotor under the action of harmonic forces caused by its imbalance (6.3·10⁻⁶ kg·m for the turbocompressor and 4.7·10⁻⁶ kg·m for the generator) with relative damping of 4% are presented in Fig. 6 in the form of amplitude-frequency characteristics (AFC) and orbits of axis points corresponding to critical and operating frequencies. Modal and harmonic analyses yielded the spectra of natural frequencies and modes, the critical speeds taking into account the gyroscopic moment, and the frequency response of the entire GTU shaft line and of the isolated turbocompressor and generator rotors. These dynamic characteristics are compared with experimental and calculated data (Table 1). They are suitable for verification of analytical and computational models of the dynamics of GTU rotors, built on the principles described in [5, 6].
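The shape of an amplitude-frequency characteristic under unbalance excitation can be sketched with a single-degree-of-freedom oscillator. This is a deliberate simplification and not the beam finite element model used in the paper: the mass, unbalance and 4% relative damping are the values quoted above for the turbocompressor, while the natural frequency `OMEGA_N` is an illustrative assumption.

```python
import math

MASS = 6300.0                    # turbocompressor rotor mass from the text, kg
UNBALANCE = 6.3e-6               # unbalance from the text, kg*m
ZETA = 0.04                      # relative damping from the text
OMEGA_N = 2.0 * math.pi * 2.9    # illustrative natural frequency, rad/s (assumption)


def unbalance_amplitude(omega, mass=MASS, unbalance=UNBALANCE,
                        omega_n=OMEGA_N, zeta=ZETA):
    """Steady-state amplitude (m) of a 1-DOF rotor driven by the force U*omega^2."""
    force = unbalance * omega ** 2
    stiffness_term = mass * (omega_n ** 2 - omega ** 2)
    damping_term = 2.0 * zeta * mass * omega_n * omega
    return force / math.hypot(stiffness_term, damping_term)


def sweep(omega_min, omega_max, points):
    """Return a list of (omega, amplitude) pairs on a uniform frequency grid."""
    step = (omega_max - omega_min) / (points - 1)
    grid = [omega_min + i * step for i in range(points)]
    return [(omega, unbalance_amplitude(omega)) for omega in grid]


if __name__ == "__main__":
    curve = sweep(0.5 * OMEGA_N, 1.5 * OMEGA_N, 2001)
    peak_omega, peak_amplitude = max(curve, key=lambda point: point[1])
    print(f"peak at {peak_omega / (2 * math.pi):.3f} Hz, "
          f"amplitude {peak_amplitude:.3e} m")
```

For this model the amplitude at ω = ω_n equals U/(2ζm), and the peak lies slightly above ω_n, at ω_n/√(1 − 2ζ²), because the exciting force grows with ω².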
Fig. 5. Modes of natural vibrations of the shaft line with the stiffness of the supports 1 MN/m.
Table 1. The results of an analysis of the GTU rotor critical speeds (critical speeds in rpm)

# | NF, Hz (ω = 0) | Reverse precession: Calculation | Reference | Error, % | Direct precession: Calculation | Reference | Error, %
1 | 2.9 | 175.5 | 177.8 | 1.3 | 175.8 | 178.3 | 1.4
2 | 3.5 | 206.8 | 211.1 | 2.0 | 208.1 | 212.4 | 2.0
3 | 5.6 | 327.1 | 319.4 | 2.4 | 342.3 | 335.1 | 2.1
4 | 21.9 | 1284.9 | 1299.4 | 1.1 | 1341.4 | 1363.1 | 1.6
5 | 83.9 | 4883.0 | 4768.8 | 2.4 | 5195.6 | 5067.2 | 2.5
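The error columns of Table 1 can be recomputed from the calculated and reference critical speeds. The short check below assumes that the error is the absolute difference relative to the reference value; the source does not state the formula, so this is an assumption that happens to reproduce the tabulated figures to within rounding.

```python
# Rows of Table 1: (mode, reverse calc, reverse ref, reverse error %,
#                   direct calc, direct ref, direct error %)
TABLE_1 = [
    (1, 175.5, 177.8, 1.3, 175.8, 178.3, 1.4),
    (2, 206.8, 211.1, 2.0, 208.1, 212.4, 2.0),
    (3, 327.1, 319.4, 2.4, 342.3, 335.1, 2.1),
    (4, 1284.9, 1299.4, 1.1, 1341.4, 1363.1, 1.6),
    (5, 4883.0, 4768.8, 2.4, 5195.6, 5067.2, 2.5),
]


def relative_error_percent(calculated, reference):
    """Absolute deviation from the reference value, in percent (assumed definition)."""
    return abs(calculated - reference) / reference * 100.0


def recomputed_errors(rows=TABLE_1):
    """Return (mode, reverse error, direct error) recomputed for every row."""
    return [
        (mode,
         relative_error_percent(calc_rev, ref_rev),
         relative_error_percent(calc_dir, ref_dir))
        for mode, calc_rev, ref_rev, _, calc_dir, ref_dir, _ in rows
    ]


if __name__ == "__main__":
    for mode, err_rev, err_dir in recomputed_errors():
        print(f"mode {mode}: reverse {err_rev:.2f} %, direct {err_dir:.2f} %")
```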
Fig. 6. AFC of isolated rotors of a turbocompressor (left) and a generator (right) and orbits of the rotor axis points corresponding to critical and operational rotational frequencies
4.2 Analytical and Computational Models of the Dynamics of GTU Rotors The method proposes to form an analytical mathematical model for analysing the dynamics of a gas turbine rotor in AMBs based on the approaches described in [5, 6]. These approaches were tested by an analysis of the dynamics of model rigid rotors [5, 6], and their reliability was confirmed by numerical and laboratory experiments [30, 31]. The mathematical model is based on one of the representations of the Lagrange-Maxwell
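equations. In the case when the conduction currents are closed and there are no capacitors in the electrical circuits, electromechanical systems can be described by equations similar to the Routh equations in mechanics. They have the form:

\[
\begin{cases}
\dfrac{d}{dt}\dfrac{\partial T}{\partial \dot q_j} - \dfrac{\partial T}{\partial q_j} + \dfrac{\partial \Pi}{\partial q_j} + \dfrac{\partial D}{\partial \dot q_j} = -\dfrac{\partial W}{\partial q_j} + Q_j & (j = 1, \dots, M),\\[2mm]
\dfrac{\partial \Psi_k}{\partial t} + \displaystyle\sum_{s=1}^{N} r^{C}_{ks}\,\dfrac{\partial W}{\partial \Psi_s} = E_k & (k = 1, \dots, N).
\end{cases}
\tag{1}
\]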
In (1), W = W(Ψ_1, …, Ψ_N, q_1, …, q_M) is the magnetic field energy; Ψ_k are the fluxes of induction (flux linkages); r^C_ks are the active resistances of the electrical circuits; E_k is the algebraic sum of the external electromotive forces; N is the number of closed unbranched contours; the terms −∂W/∂q_j are ponderomotive forces, namely generalized forces due to the mechanical action of a magnetic or electromagnetic field. Writing the expressions for the kinetic and potential energies and applying these equations to describe the dynamics of the rotor in the AMBs yields a coupled differential equation system (DES) for M generalized coordinates q_j and N flux linkages Ψ_k. The expression for the magnetic energy of an AMB is written on the basis of an analysis of the magnetic circuits, taking into account the magnetic resistances (or conductivities) of both the air clearances and the sections of the magnetic cores [5, 6, 31]. This approach avoids singularities and gives finite values of the magnetic forces when the rotor is displaced by an amount close to the nominal clearance. In this case, the forces in the AMB account for the control law and the characteristics of the electromagnetic circuits. This approach was verified by comparing its force characteristics with those obtained by other methods and experimentally [32]. In the case of a rigid (non-deformable) rotor, the mathematical model consists of 5 nonlinear differential equations of motion and N equations describing the total current law for each k-th circuit of the system (i.e. all AMB coils) according to Eqs. (1) [5]. It has the form [6]:
(2)
Here m is the overall rotor mass; J_1 and J_3 are the rotor moments of inertia; l_1 and l_2 (l_1 + l_2 = l) are the distances from the centre of the coordinate system to the centres of the radial contact sections, and l_3 is the distance to the centre of the axial stop section; q_j = {x_1, y_1, x_2, y_2, z_3} are the generalized coordinates (centres of the supporting sections of the rotor in the magnetic bearings); the terms −∂Π/∂q_j represent potential forces that depend only on the generalized mechanical coordinates, for example, restoring magnetic forces in the PMBs or forces caused by the action of elastic coupling halves; P_qj = −∂W/∂q_j are the electromagnetic reactions of the AMBs; H_q(t) are forces explicitly dependent on time, namely external periodic loads caused by the dynamic rotor unbalance; Q_j are other non-potential generalized forces; f_qj(q_i), f_qj(q_i) are nonlinear terms of the equations of motion, caused by inertia forces and a potential field of the second and third order; b_x1, …, b_z3 are viscosity coefficients; r_c1, …, r_cN are the active resistances in the winding circuits; u_c1, …, u_cN are the control voltages supplied to the AMB windings, the values of which are formed in accordance with the adopted control law; W(Ψ_1, …, Ψ_N, x_1, …, z_3) is the magnetic field energy of the AMBs, which is formed by the magnetic field energies of each circuit section of each AMB and includes the magnetic resistances of the circuit sections and the magnetic fluxes through them. In the case of a flexible rotor, the first part of (1), i.e. the equations of motion of the rotor, is formed taking into account the expressions for the potential and kinetic energies of the shaft itself, as well as of the disks and supports (AMBs) located on it:

\[
T = T_S + T_{D1} + T_{D2} + T_{D3} + \dots, \qquad \Pi = \Pi_S + \Pi_{CF}. \tag{3}
\]
Here T is the kinetic energy of the system; T_S and T_D are the kinetic energies of the shaft and the discs, respectively; Π is the potential energy of the system; Π_S and Π_CF are the total internal energy of the system and the energy of conservative forces (gravity, restoring elastic forces), respectively. They are formed as, for example, in [33]. Such an approach allows a mathematical model to be constructed in analytical form and analytical or numerical methods to be used for solving the systems of coupled differential equations.
4.3 Simulation Computational Model of the Dynamics of a Rotor in MBs
Computer mathematical systems were used to implement the analytical mathematical model. They helped to create a nonlinear simulation computational model of the dynamics of a rotor in magnetic bearings (MBs). This model allows variant calculations of dynamic stability to be performed and rational control actions to be sought when the geometric and electromagnetic parameters of the rotor and suspension are changed in the design. The block diagram of the simulation computational model is presented in Fig. 7. Algorithmically, it consists of three computational blocks which can be used either sequentially or as separate modules (if the necessary information was already entered during previous program launches), since communication between them is provided by databases. The simulation computational model is implemented using a computer algebra system oriented towards complex mathematical computations and modelling. SCM-DRMB-N performs symbolic computations, numerical solution of the DES and visualization of the results. SCM-DRMB-N allows the stable motions of the rotor in PMBs and AMBs to be studied for analogue or discrete automatic control systems with the implementation of almost any control algorithm. The calculation results obtained using this simulation model give a complete picture of the dynamic processes of various nature occurring in the system. Its accuracy is confirmed by comparison with laboratory experiments [6, 31].
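The circuit part of system (1) and its numerical solution can be illustrated on the smallest possible case: one electromagnet (N = 1) with the rotor held at a fixed gap. The sketch below is not SCM-DRMB-N and does not reproduce the author's magnetic circuit. It assumes a magnetic energy W = Ψ²R(g)/(2n²), with R(g) the sum of two air-gap reluctances and a constant core reluctance, and integrates dΨ/dt = E − r ∂W/∂Ψ with the classical fourth-order Runge-Kutta scheme. All numerical parameters (`TURNS`, `POLE_AREA`, `R_IRON`, and the voltage, resistance and gap in the usage part) are hypothetical.

```python
import math

MU0 = 4.0e-7 * math.pi   # vacuum permeability, H/m
TURNS = 200              # coil turns (hypothetical)
POLE_AREA = 1.0e-3       # pole face area, m^2 (hypothetical)
R_IRON = 1.0e5           # constant reluctance of the core path, 1/H (hypothetical)


def reluctance(gap):
    """Total reluctance: two air gaps in series plus the core path."""
    return 2.0 * gap / (MU0 * POLE_AREA) + R_IRON


def field_energy(psi, gap):
    """Magnetic field energy W(psi, gap) of the single circuit, J."""
    return psi ** 2 * reluctance(gap) / (2.0 * TURNS ** 2)


def coil_current(psi, gap):
    """Coil current obtained as dW/dpsi, A."""
    return psi * reluctance(gap) / TURNS ** 2


def pull_force(psi, gap):
    """Magnitude of the attraction, dW/dgap; the ponderomotive force is -dW/dgap."""
    return psi ** 2 / (TURNS ** 2 * MU0 * POLE_AREA)


def flux_rate(psi, gap, voltage, resistance):
    """Circuit equation of (1) for one coil: dpsi/dt = E - r * dW/dpsi."""
    return voltage - resistance * coil_current(psi, gap)


def rk4_flux(psi0, gap, voltage, resistance, dt, steps):
    """Integrate the circuit equation at a fixed gap with classical RK4."""
    psi = psi0
    for _ in range(steps):
        k1 = flux_rate(psi, gap, voltage, resistance)
        k2 = flux_rate(psi + 0.5 * dt * k1, gap, voltage, resistance)
        k3 = flux_rate(psi + 0.5 * dt * k2, gap, voltage, resistance)
        k4 = flux_rate(psi + dt * k3, gap, voltage, resistance)
        psi += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return psi


if __name__ == "__main__":
    gap, voltage, resistance = 1.0e-3, 10.0, 2.0       # hypothetical operating point
    tau = TURNS ** 2 / (reluctance(gap) * resistance)  # electrical time constant L/r
    psi = rk4_flux(0.0, gap, voltage, resistance, tau / 100.0, 100)
    print(f"current after one time constant: {coil_current(psi, gap):.4f} A")
    print(f"attraction at that instant:      {pull_force(psi, gap):.2f} N")
```

After a voltage step the current approaches E/r with the time constant L/r, which is the coil-inductance delay that a coupled model has to carry into the force acting on the rotor. In a full model the gap would itself be a state variable driven by the mechanical equations of (1), and the voltage would come from the control law.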
Fig. 7. The enlarged block diagram of the simulation computational model SCM-DRMB-N
4.4 Verification of Simulation Model of Dynamics of GTU Rotors in AMBs A check of the adequacy of modelling and applicability to the analysis of the dynamics of rotors of industrial power machines is carried out on the example of a gas turbine installation. To verify and ensure an accuracy, a comparative analysis of the results obtained using the SCM, the finite element approach and the known results was performed. Figure 8 shows Campbell diagrams for determining the critical speeds of the gas turbine rotors. These results of linear analysis of the rotor dynamics for the turbocompressor and the generator with the stiffness of all AMBs equal to 1 MN/m (Fig. 8) allow determining the critical speeds taking into account the gyroscopic moments at which the rotors make movements of cylindrical and conical precession types. Here p are the critical speeds corresponding to the direct and reverse precessions, p1x , p1y , p2x , p2y are the natural frequencies of non-rotating rotors, ω is the angular velocity of rotation, ω1x , ω1y , ω2x , ω2y are the critical rotational speeds, k is the frequency of the response harmonic. Figure 9 shows the amplitude-frequency characteristics for an intrinsic unbalance and relative damping equal to the values at which the analysis was carried out in a finite
Fig. 8. Campbell resonance diagram of the isolated rotors of the turbocompressor (left) and the generator (right) and orbits of the rotor axis on critical rotational frequencies
element formulation (Fig. 6). Here, Ax1 , Ay1 , Ax2 , Ay2 are the oscillation amplitudes of the left and right supporting sections of the rotors in the horizontal and vertical directions, respectively, ω is the excitation frequency (angular velocity).
Fig. 9. Dependences of the amplitudes of the fundamental harmonic on the frequency of the excitation force (AFC) with the stiffness of all supports (AMBs) 1 MN/m
In order to conduct a comparative analysis, the research results together with the known reference values are summarized in Table 2 (NF in Hz, CS in 2π rad/s). The discrepancy between the values of the critical frequencies found using the SCM and the solutions in the finite element formulation and reference values does not exceed 2.6 and 4.0%, respectively. This allows concluding that the analysis of the real rotor dynamics of GTU units is reliable using the proposed approach to the formation of an analytical mathematical model and a SCM. A comparative analysis of the frequency response for the same parameters (Fig. 6 and Fig. 9) showed the coincidence of the resonant frequencies values (I and II) and their respective amplitudes with an accuracy of 1 and 2%, respectively. The accuracy of calculating the values of resonance frequencies and amplitudes of forced vibrations using the proposed technique is determined by the analytical description of the rotor-in-AMBs system. This enables reducing the error in determining the dynamic characteristics in comparison with modern FE systems and achieving a discrepancy with true values of less than 1%.
Table 2. Comparison of the calculation results (verification of the SCM)

Type of rotor movement | Turbo unit machine | Estimated value, SCM: NF | CS | Estimated value, FEM: NF | CS | Reference value [27–29]: NF | CS | Discrepancy, %: FEM | Ref
Cylindrical precession | Compressor | 2.83 | 2.85 | 2.8 | 2.9 | 2.9 | 2.95 | 1.7 | 3.3
Cylindrical precession | Generator | 3.29 | 3.31 | 3.3 | 3.4 | 3.4 | 3.45 | 2.6 | 4.0
Conical precession | Compressor | 4.26 | 4.38 | 4.3 | 4.4 | 4.4 | 4.45 | 0.5 | 1.6
Conical precession | Generator | 5.71 | 5.85 | 5.9 | 5.9 | 5.9 | 5.95 | 0.8 | 1.7
5 Conclusion and Discussion The advantages of the described approach are that it allows searching for dynamic parameters of rotors in various types of magnetic bearings, taking into account AMB control algorithms and a large set of nonlinearities inherent in such systems. The implementation of the analytical mathematical model was performed in the form of a simulation computational model of the dynamics of a rotating rotor in magnetic bearings. This made it possible to construct vibrograms, spectrograms, motion trajectories, phase trajectories and stroboscopic Poincare sections, three-dimensional spectra and AFCs for evaluating the vibrational state and the stability of motion for rotation frequencies in a given range. Confirmation of the adequacy of the simulation was performed by comparing the analysis results with calculated and experimental data. The use of information technologies made it possible to create a specialized computer system which allows performing the described modelling of processes and phenomena. In particular, the use of symbolic mathematics, numerical analysis, database elements and calculations, as well as visualization tools allowed a numerical-graphical representation of the results to be implemented. This approach can significantly improve the quality of perception of the simulated phenomena. The use of a simulation model increases the reliability of the calculated determination of the parameters of rotors, magnetic bearings and control systems of active magnetic bearings. This significantly reduces the volume of experimental studies.
References
1. Schweitzer, G.: Applications and research topics for active magnetic bearings. In: Gupta, K. (ed.) IUTAM Symposium on Emerging Trends in Rotor Dynamics 2009, IUTAM Bookseries, vol. 25, pp. 263–273. Springer, Dordrecht (2011)
2. Polajžer, B.: Magnetic Bearings, Theory and Applications. Sciyo, Rijeka (2010)
3. Maslen, E.H.: Magnetic Bearings. University of Virginia Department of Mechanical, Aerospace, and Nuclear Engineering. Charlottesville, Virginia (2000)
4. Bleuler, H., Cole, M., Keogh, P., et al.: Magnetic Bearings. Theory, Design, and Application to Rotating Machinery. G. Schweitzer and E.H. Maslen (eds). Springer, Berlin (2009). https://doi.org/10.1007/978-3-642-00497-1
5. Martynenko, G.: The interrelated modelling method of the nonlinear dynamics of rigid rotors in passive and active magnetic bearings. Eastern-Eur. J. Enterp. Technol. 2(5(80)), 4 (2016). https://doi.org/10.15587/1729-4061.2016.65440
6. Martynenko, G.: Accounting for an interconnection of electrical, magnetic and mechanical processes in modeling the dynamics of turbomachines rotors in passive and controlled active magnetic bearings. In: 2018 IEEE 3rd International Conference on Intelligent Energy and Power Systems (IEPS 2018), Kharkiv, Ukraine, pp. 326–331. IEEE (2018). https://doi.org/10.1109/IEPS.2018.8559518
7. Jiong, W., Hu, C., Hourao, L.: The relationship between active magnetic bearing system’s stiffness and bias current. In: 3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015), Shenzhen, China, pp. 1049–1055. Atlantis Press (2015)
8. Childs, D.: Turbomachinery rotordynamics: phenomena, modeling, and analysis. John Wiley & Sons, New York (1993)
9. Gasch, R., Nordmann, R., Pfützner, H.: Rotordynamik. 2., vollständig neu bearbeitete und erweiterte Auflage. Springer, Berlin (2002)
10. Ehrich, F.: Observations of nonlinear phenomena in rotordynamics. J. Syst. Des. Dyn. 2(3), 641–651 (2008)
11. Larsonneur, R.: Modeling and analysis of dynamic mechanical systems with a special focus on rotordynamics and active magnetic bearing (AMB) systems. Short lecture course. Mecos Traxler AG, Winterthur (2006)
12. Velandia, E.F.R., Santisteban, J.A., Pedroza, B.C.: A displacement estimator for magnetic bearing. In: 18th International Congress of Mechanical Engineering (COBEM 2005), ABCM Symposium Series in Mechatronics, Ouro Preto, vol. 2, pp. 68–75. MG (2006)
13. Okoro, O.I.: Transient state analysis of an Active Magnetic Bearing (AMB) system with six degree of freedoms using MATLAB. Pac. J. Sci. Technol. 6(1), 56–63 (2005)
14. Chen, S.-L.: Smooth stabilizing controllers for a 3-pole active magnetic bearing system. In: 2005 CACS Automatic Control Conference Tainan, Tainan, Taiwan, pp. 1–6 (2005)
15. Ji, J.C., Hansen, C.H., Zander, A.C.: Nonlinear dynamics of magnetic bearing systems. J. Intell. Mater. Syst. Struct. 19(12), 1471–1491 (2008)
16. Rusanov, A., Martynenko, G., Avramov, K., Martynenko, V.: Detection of accident causes on turbine-generator sets by means of numerical simulations. In: 2018 IEEE 3rd International Conference on Intelligent Energy and Power Systems (IEPS 2018), Kharkiv, Ukraine, pp. 51–54. IEEE (2018). https://doi.org/10.1109/IEPS.2018.8559546
17. Nordmann, R., Aenis, M.: Fault diagnosis in a centrifugal pump using active magnetic bearings. Int. J. Rotating Mach. 10(3), 183–191 (2004)
18. Shi, L., Zhao, L., Yang, L., Gu, H., Diao, X., Yu, S.: Design and experiments of the active magnetic bearing system for the HTR-10. In: 2nd International Topical Meeting on High Temperature Reactor Technology, #Paper D04, Beijing, China, pp. 1–16 (2004)
19. Fu, H.Y., Liua, P.F., Zhang, Q.C., Wang, Y.T.: Vibration modal analysis of the active magnetic bearing system. Key Eng. Mater. 458, 137–142 (2011)
20. Kärkkäinen, A., Sopanen, J., Mikkola, A.: Dynamic simulation of a flexible rotor during drop on retainer bearings. J. Sound Vib. 306(3–5), 601–617 (2007). https://doi.org/10.1016/j.jsv.2007.05.047
21. Pavlenko, I.V., Simonovskiy, V.I., Demianenko, M.M.: Dynamic analysis of centrifugal machines rotors supported on ball bearings by combined application of 3D and beam finite element models. In: 15th International Scientific and Engineering Conference Hermetic Sealing, Vibration Reliability and Ecological Safety of Pump and Compressor Machinery (HERVICON+PUMPS 2017), Sumy State University, Sumy, Ukraine (2017)
22. Boyaci, A., Hetzler, H., Seemann, W., Proppe, C., Wauer, J.: Analytical bifurcation analysis of a rotor supported by floating ring bearings. Nonlinear Dyn. 57(4), 497–507 (2009). https://doi.org/10.1007/s11071-008-9403-x
23. Guojun, Y., Yang, X., Zhengang, S., Huidong, G.: Characteristic analysis of rotor dynamics and experiments of active magnetic bearing for HTR-10GT. Nucl. Eng. Des. 237(12–13), 1363–1371 (2007). https://doi.org/10.1016/j.nucengdes.2006.09.040
24. Pavlenko, I., Neamtu, C., Verbovyi, A., Pitel, J., Ivanov, V., Pop, G.: Using computer modeling and artificial neural networks for ensuring the vibration reliability of rotors. In: 2nd International Workshop on Computer Modeling and Intelligent Systems (CMIS 2019), Zaporizhzhia, Ukraine (2019)
25. Chasalevris, A., Dohnal, F., Chatzisavvas, I.: Experimental detection of additional harmonics due to wear in journal bearings using excitation from a magnetic bearing. Tribol. Int. 71, 158–167 (2014). https://doi.org/10.1016/j.triboint.2013.12.002
26. Martynenko, G.Y., Marusenko, O.M., Ulyanov, Y.M., Rozova, L.V.: The use of information technology for the design of a prototype engine with rotor in magnetic bearings. In: Nechyporuk, M., Pavlikov, V., Kritskiy, D. (eds.) Integrated Computer Technologies in Mechanical Engineering. AISC, vol. 1113, pp. 301–309. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37618-5_26
27. Anurov, Y.M., Litvinov, E.V.: Development and operation of serial energy gas turbines with magnetic bearings. Eastern-Eur. J. Enterp. Technol. 4(40), 20–24 (2009)
28. SKF-S2M Magnetic Bearings for Combined Heat and Power Generation Plant. Products leaflets PUB MT/S9 15571 EN, SKF Group, Marcel (2015)
29. Kashtanov, D.: SKF-S2M. Magnetic systems. Technology. General presentation S2M, SKF/S2M (2010)
30. Martynenko, G.: Resonance mode detuning in rotor systems employing active and passive magnetic bearings with controlled stiffness. Int. J. Automot. Mech. Eng. 13(2), 3293–3308 (2016). https://doi.org/10.15282/ijame.13.2.2016.2.0274
31. Martynenko, G., Ulianov, Y.: Combined rotor suspension in passive and active magnetic bearings as a prototype of bearing systems of energy rotary turbomachines. In: 2019 IEEE International Conference on Modern Electrical and Energy Systems (MEES 2019), Kremenchuk, Ukraine, pp. 90–93. IEEE (2019). https://doi.org/10.1109/MEES.2019.8896571
32. Martynenko, G., Martynenko, V.: Numerical determination of active magnetic bearings force characteristics taking into account control laws based on parametric modeling. In: 2019 IEEE International Conference on Modern Electrical and Energy Systems (MEES 2019), Kremenchuk, Ukraine, pp. 358–361. IEEE (2019). https://doi.org/10.1109/MEES.2019.8896501
33. Van Osch, M.M.E.: Rotor Dynamics of a Centrifugal Pump. Rapportnummer WPC 2006.04. Van Esch, B.P.M. (ed.). Technische Universiteit Eindhoven, Eindhoven (2006)
Computer Method of Determining the Yield Surface of Variable Structure of Heterogeneous Materials Based on the Statistical Evaluation of Their Elastic Characteristics
Mariya Shapovalova and Oleksii Vodka
Department of Dynamics and Strength of Machines, National Technical University, Kharkiv Polytechnic Institute, Kharkiv, Ukraine
Abstract. The study of the material microstructure allows obtaining information about the state of a part without additional tests and full-scale experiments. The paper offers computer methods for constructing parametric, statistically equivalent models of the microstructure of cast iron with spheroidal graphite inclusions. The studied material has a transient microstructure that exhibits variability at various material points. To analyze the unsteadiness of deformations, the Monte Carlo method is used. A finite element model is constructed to find the elastic characteristics of the material. The stress state is considered based on plane models. Numerical experiments are carried out for various concentrations of inclusions. The results obtained for the elastic constants are statistically averaged, and the dependences of the Poisson's ratio, the moduli of elasticity, and the shear moduli on the concentration of inclusions are established. To assess their validity, the values obtained are compared with those given by the mixture rule. This comparison confirms the correctness of the constructed models. The yield surfaces are found; going beyond the yield surface indicates the appearance of plastic strains in the material.
Keywords: High strength cast iron · Microstructure · Finite element method · Material properties · Yield surface
1 Introduction
The use of composite materials requires a detailed study of their internal structure. To understand the behavior of a structure during operation, it is necessary to know the mechanical properties of the material and the limiting stress values at which failure-free operation is possible. To assess the internal structure of a sample at the micro-level, analysis of microstructure images is widely used. High-strength cast iron has found application in mechanical engineering [31, 32]. It is used in critical assemblies, such as gears, gearboxes, suspension arms, etc. A feature of such cast iron is its relatively simple microstructure. The microstructure of cast iron with spheroidal graphite inclusions is shown in Fig. 1.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 378–392, 2022. https://doi.org/10.1007/978-3-030-82110-4_21
Several approaches to the analysis of the material microstructure are known in the literature. The first group comprises experimental, or direct, studies [1–6]. They require samples to be prepared and experiments to be conducted. Such studies have examined the influence of the microstructure on crack development [1], phase composition [2], and the hardness of the test sample [6].
Fig. 1. The microstructure of high-strength cast iron [25]
The second group of studies [7–13] applies computer vision technology. Image pattern recognition is used to classify the structure [7, 10], to assess the number of defects [8], to segment complex microstructures [11], to find particle sizes and their distribution on the plane, and to predict the properties of a material from an image of its microstructure [12, 13]. Other works [14–18] propose modeling the studied microstructure by the finite element method. This paper proposes a methodology for studying the microstructure without additional full-scale experiments that uses the advantages of computer vision for microstructure recognition. The proposed technique relies on generating a statistically equivalent microstructure geometry that is independent of any particular image. The studied material is characterized by a transition microstructure, which exhibits variability at various points on the surface. For the analysis, a method of averaging the material characteristics is used. The position and orientation of graphite inclusions on the ferrite plane are generated randomly by a numerical method. The statistical data obtained make it possible to describe the influence of microstructure variability on the mechanical properties of the sample, including an analysis of the stress-strain state and of the equivalent elastic constants by the finite element method. Elements of the same technique can be found in [26–30].
2 Objectives
Images of the microstructure of cast iron (Fig. 1) are taken as the initial data. It is assumed that the structure of cast iron is modeled synthetically, based on actual images of its microstructure. It is necessary to take into account the random position of inclusions on the plane and the possibility of varying their concentration depending on the size of the graphite inclusions. The objectives are to determine the elastic properties of the investigated material using a finite element model; to obtain the elastic moduli, shear moduli, and Poisson's ratio as functions of the concentration of inclusions; to find the invariants of the elastic moduli, which provide information on the elastic characteristics of the material; and to build the yield surface, within which plastic strains are absent.
3 Image Processing and Generation of the Statistically Equivalent Artificial Microstructure
Image processing and artificial microstructure generation for high-strength cast iron have been implemented in previous works [19, 20]. A statistically equivalent microstructure of cast iron can be generated once the dependence of the size of inclusions on their concentration is established. For each concentration case, data have been obtained on the number and size of graphite inclusions on the plane. From the data on the mathematical expectation and variance of the inclusion radii and on the number of inclusions per unit area, the dependence of the inclusion size on the concentration has been obtained as (1):

$$M[R] = 18.308\,(\psi - 0.048)^{0.123}; \quad D[R] = 9.683\,(\psi - 0.045)^{0.314}. \tag{1}$$
The inclusion radii obey the normal distribution law of a random variable. Radii of graphite inclusions are generated randomly, one at a time, as long as the total inclusion area remains below the area corresponding to the required concentration. The concentration (ψ) is the ratio of the area of inclusions to the area of the sample; it varies in the range [0.054, 0.3]. The positions of the inclusion centers on the plane of the simulated cast iron microstructure are also random and are drawn from a uniform distribution. The result of the artificial generation of the microstructure is shown in Fig. 2.
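The generation procedure described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the sample size (`width`, `height`), the random seed, and the function names are hypothetical, the units of R are taken as given by Eq. (1), and overlaps between inclusions or clipping at the sample border are ignored.

```python
import numpy as np


def radius_moments(psi):
    """Mean M[R] and variance D[R] of the inclusion radius, Eq. (1)."""
    mean = 18.308 * (psi - 0.048) ** 0.123
    variance = 9.683 * (psi - 0.045) ** 0.314
    return mean, variance


def generate_inclusions(psi, width=1000.0, height=1000.0, seed=0):
    """Return an (n, 3) array of circles (x, y, r) whose summed area
    just reaches the fraction psi of the sample area.

    width, height and seed are hypothetical; overlaps and clipping at
    the border are ignored, so the covered fraction may be lower.
    """
    rng = np.random.default_rng(seed)
    mean, variance = radius_moments(psi)
    target_area = psi * width * height
    circles, area = [], 0.0
    while area < target_area:
        r = rng.normal(mean, np.sqrt(variance))  # normally distributed radius
        if r <= 0.0:  # discard non-physical draws from the normal tail
            continue
        x = rng.uniform(0.0, width)   # uniformly distributed centre
        y = rng.uniform(0.0, height)
        circles.append((x, y, r))
        area += np.pi * r ** 2
    return np.array(circles)


circles = generate_inclusions(psi=0.1)
fraction = np.pi * np.sum(circles[:, 2] ** 2) / (1000.0 * 1000.0)
print(len(circles), round(fraction, 4))
```

A production version would likely also reject overlapping circles, which the text does not discuss.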
Fig. 2. Artificially generated microstructure with an appropriate concentration of inclusions
4 Finite Element Model
The construction of the finite element model is based on the geometry (Fig. 2) obtained after the artificial generation of the cast iron microstructure. To create the mesh, a two-dimensional 8-node finite element with two degrees of freedom at each node is used. Typical meshing is shown in Fig. 3. For the calculations, ferrite is assumed to be an isotropic material with the mechanical properties shown in Table 1, and graphite is assumed to have a hexagonal crystal lattice with the elastic constants given in Table 2.
Fig. 3. The typical meshing of cast iron microstructure
Table 1. Properties of ferrite material

| E, GPa | ν | Yield strength, MPa |
|---|---|---|
| 210 | 0.3 | 125 |
Table 2. Properties of graphite material

| Ex, Ez, GPa | Ey, GPa | νxy | νyz | νxz | Gxy, Gxz, GPa | Gyz, GPa | Yield strength (compression), MPa | Yield strength (tensile), MPa |
|---|---|---|---|---|---|---|---|---|
| 1025 | 36 | 0.34 | 0.012 | 0.16 | 0.18 | 4.35 | 480 | 100 |
5 Homogenization Procedure and Elastic Constant Determination
At the macro level, structural elements are considered to be homogeneous anisotropic materials with averaged elastic characteristics. Hooke's law for the case of general anisotropy can be written as (2):

$$\langle\varepsilon_{ij}\rangle = A_{ijkl}\,\langle\sigma_{kl}\rangle, \quad (i, j, k, l = 1, 2, 3), \tag{2}$$

where $A_{ijkl}$ are the elastic constants of the equivalent homogeneous material; $\langle\sigma_{ij}\rangle$, $\langle\varepsilon_{ij}\rangle$ are the mean stress and strain tensors, averaged as integrals over the volume (3):

$$\langle\sigma_{ij}\rangle = \frac{1}{V}\int_V \sigma_{ij}\,dV; \quad \langle\varepsilon_{ij}\rangle = \frac{1}{V}\int_V \varepsilon_{ij}\,dV. \tag{3}$$
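In a finite element model, the volume integrals of Eq. (3) reduce to a weighted sum over elements. The sketch below assumes that one stress (or strain) vector and one volume (an area, for a plane model) are available per element; the function name and the sample values are illustrative only.

```python
import numpy as np


def volume_average(field, volumes):
    """Discrete form of Eq. (3): sum(field_e * V_e) / sum(V_e).

    field   -- (n_elements, n_components) element values of stress or strain
    volumes -- (n_elements,) element volumes (areas for a plane model)
    """
    field = np.asarray(field, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    return volumes @ field / volumes.sum()


# Two hypothetical elements with components (x, y, xy)
stress = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
mean_stress = volume_average(stress, [1.0, 3.0])
print(mean_stress)
```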
Using the Voigt notation [23] and the concepts introduced above, the symmetric 4th-rank tensor from Eq. (2) can be written as a square matrix. In an arbitrarily chosen orthogonal coordinate system, Hooke's law can be represented in the matrix form (4) [17]:

$$\begin{bmatrix} \varepsilon_x \\ \varepsilon_y \\ \varepsilon_z \\ \gamma_{yz} \\ \gamma_{zx} \\ \gamma_{xy} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & a_{36} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & a_{46} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} \\ a_{61} & a_{62} & a_{63} & a_{64} & a_{65} & a_{66} \end{bmatrix} \times \begin{bmatrix} \sigma_x \\ \sigma_y \\ \sigma_z \\ \tau_{yz} \\ \tau_{zx} \\ \tau_{xy} \end{bmatrix}, \tag{4}$$

where $a_{ik}$ are the elastic constants; in the general case the number of independent elastic constants is 21, since by symmetry $a_{ik} = a_{ki}$. Because the test samples are plane, artificially modeled images of the cast iron microstructure, it is appropriate to calculate the stresses using plane models. For the plane stress state, Eq. (4) takes the form (5):

$$\begin{bmatrix} \varepsilon_x \\ \varepsilon_y \\ \gamma_{xy} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{16} \\ a_{21} & a_{22} & a_{26} \\ a_{61} & a_{62} & a_{66} \end{bmatrix} \times \begin{bmatrix} \sigma_x \\ \sigma_y \\ \tau_{xy} \end{bmatrix}. \tag{5}$$

In technical applications, the notation (6) is often used [21]:

$$a_{11} = \frac{1}{E_x};\; a_{22} = \frac{1}{E_y};\; a_{66} = \frac{1}{G_{xy}};\; a_{12} = -\frac{\nu_{xy}}{E_x};\; a_{21} = -\frac{\nu_{yx}}{E_y};\; a_{16} = \frac{\eta_{x,xy}}{G_{xy}};\; a_{26} = \frac{\eta_{y,xy}}{G_{xy}};\; a_{61} = \frac{\eta_{xy,x}}{E_x};\; a_{62} = \frac{\eta_{xy,y}}{E_y}, \tag{6}$$

where $E_x$, $E_y$ are Young's moduli; $\nu_{xy}$, $\nu_{yx}$ are Poisson's ratios; $G_{xy}$ is the shear modulus; $\eta_{xy,x}$, $\eta_{xy,y}$ are the 1st order coefficients of interaction, which characterize the displacement in the directions parallel to the coordinate axes under the action of normal stresses; $\eta_{x,xy}$, $\eta_{y,xy}$ are the 2nd order interaction coefficients, which characterize the elongation in directions parallel to the coordinate axes caused by the shear stresses. Then Hooke's law (5), taking into account the notation (6), takes the form (7):

$$\begin{bmatrix} \varepsilon_x \\ \varepsilon_y \\ \gamma_{xy} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{E_x} & -\dfrac{\nu_{xy}}{E_x} & \dfrac{\eta_{x,xy}}{G_{xy}} \\ -\dfrac{\nu_{yx}}{E_y} & \dfrac{1}{E_y} & \dfrac{\eta_{y,xy}}{G_{xy}} \\ \dfrac{\eta_{xy,x}}{E_x} & \dfrac{\eta_{xy,y}}{E_y} & \dfrac{1}{G_{xy}} \end{bmatrix} \times \begin{bmatrix} \sigma_x \\ \sigma_y \\ \tau_{xy} \end{bmatrix}. \tag{7}$$

Because the matrix is symmetric, six of the nine elastic constants are independent, and the remaining three depend linearly on them through Eqs. (8):

$$E_x \nu_{yx} = E_y \nu_{xy}; \quad E_x \eta_{x,xy} = G_{xy}\,\eta_{xy,x}; \quad E_y \eta_{y,xy} = G_{xy}\,\eta_{xy,y}. \tag{8}$$
To obtain the matrix, all the constants must be found, which requires four numerical experiments. The loading schemes of the model, whose results yield a system of linear algebraic equations for the Poisson's ratios and the elastic and shear moduli, are represented in Fig. 4.
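One way to turn the averaged results of such numerical experiments into the constants of Eq. (6) is a least-squares fit of the plane compliance matrix of Eq. (5). The sketch below is an illustration under stated assumptions: the text does not specify the four load cases, so the unit loads used here (uniaxial x, uniaxial y, pure shear, biaxial) and both function names are hypothetical.

```python
import numpy as np


def compliance_from_experiments(mean_stress, mean_strain):
    """Least-squares fit of the 3x3 matrix a in strain = a @ stress.

    mean_stress, mean_strain -- (n_cases, 3) arrays whose rows hold the
    averaged components (x, y, xy) of each load case.
    """
    stress = np.asarray(mean_stress, dtype=float)
    strain = np.asarray(mean_strain, dtype=float)
    # Row-wise the relation reads strain = stress @ a.T, so lstsq returns a.T
    a_t, *_ = np.linalg.lstsq(stress, strain, rcond=None)
    return a_t.T


def engineering_constants(a):
    """Invert the notation of Eq. (6) for a plane compliance matrix."""
    ex, ey, gxy = 1.0 / a[0, 0], 1.0 / a[1, 1], 1.0 / a[2, 2]
    return {
        "Ex": ex, "Ey": ey, "Gxy": gxy,
        "nu_xy": -a[0, 1] * ex, "nu_yx": -a[1, 0] * ey,
        "eta_x_xy": a[0, 2] * gxy, "eta_y_xy": a[1, 2] * gxy,
        "eta_xy_x": a[2, 0] * ex, "eta_xy_y": a[2, 1] * ey,
    }


# Synthetic check with hypothetical unit load cases (x, y, shear, biaxial)
a_true = np.array([[1 / 195.58, -0.3156 / 195.58, 0.0],
                   [-0.3156 / 195.58, 1 / 196.13, 0.0],
                   [0.0, 0.0, 1 / 74.54]])
loads = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
constants = engineering_constants(
    compliance_from_experiments(loads, loads @ a_true.T))
print(constants["Ex"], constants["Gxy"], constants["nu_xy"])
```

With exactly three independent load cases the system is determined; a fourth case, as in the text, makes the fit overdetermined and gives a consistency check.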
Fig. 4. Von Mises equivalent stresses under different types of loads
6 Results of the Elastic Constant Determination
The stress-strain state of the microstructure of cast iron with spheroidal graphite inclusions under four types of load is represented in Fig. 4. Since the orientation of the inclusions is considered arbitrary, a series of numerical experiments is needed to obtain the elastic constants. To determine the elastic constants, the Monte Carlo method is used. According to this method, the position and orientation of inclusions on the plane are set randomly. For each concentration, 200 Monte Carlo iterations are performed. The results obtained for the elastic constants are statistically averaged, and their dependence on the concentration of inclusions is established (Fig. 5). The interval (9) is taken as the confidence interval for the calculated data; for a normally distributed random variable, it contains the results with a probability of 99.73%:

$$\alpha_{int} = M \pm 3\sqrt{D}, \tag{9}$$
where M and D are the mathematical expectation and variance of the corresponding elastic constants. The averaged results of 200 numerical experiments for the elastic moduli, Poisson's ratios, shear moduli, and the 1st and 2nd order coefficients of the mutual influence of stresses for 17 concentrations of graphite inclusions are given in Table 3.

Table 3. Elastic characteristics of the studied material

| ψ | Ex, GPa | Ey, GPa | Gxy, GPa | νxy, ×10⁻² | νyx, ×10⁻² | ηx,xy, ×10⁻³ | ηy,xy, ×10⁻³ | ηxy,x, ×10⁻³ | ηxy,y, ×10⁻³ |
|---|---|---|---|---|---|---|---|---|---|
| 0.054 | 195.58 | 196.13 | 74.54 | 31.56 | 31.65 | −0.70 | | | |
| 0.057 | 195.70 | 195.59 | 74.14 | 31.54 | 31.52 | 1.28 | −1.04 | 3.32 | −2.76 |
| 0.060 | 194.32 | 194.63 | 73.81 | 31.73 | 31.78 | 0.15 | −0.48 | 0.59 | −1.52 |
| 0.065 | 193.37 | 193.68 | 73.05 | 31.75 | 31.81 | 1.36 | −0.11 | 3.59 | −0.27 |
| 0.070 | 192.70 | 193.13 | 72.24 | 31.71 | 31.78 | 1.57 | −1.24 | 4.32 | −3.24 |
| 0.075 | 190.87 | 191.63 | 72.04 | 31.98 | 32.11 | −0.21 | 2.44 | −0.45 | 6.55 |
| 0.080 | 190.02 | 190.50 | 71.19 | 32.07 | 32.15 | −1.35 | −1.14 | −3.60 | −2.98 |
| 0.085 | 189.56 | 189.72 | 70.71 | 32.07 | 32.11 | −1.98 | 0.17 | −5.23 | |
| 0.090 | 187.94 | 188.33 | 70.64 | 32.32 | 32.38 | 0.65 | | | |
| 0.100 | 186.16 | 186.16 | 68.93 | 32.53 | 32.53 | −1.64 | 0.49 | −4.95 | 1.78 |
| 0.135 | 178.43 | 178.75 | 65.59 | 33.27 | 33.35 | −1.83 | 2.40 | −5.25 | 6.52 |
| 0.150 | 174.98 | 174.17 | 65.06 | 34.19 | 34.03 | 0.76 | 2.12 | 2.89 | 3.89 |
| 0.170 | 170.40 | 168.81 | 63.64 | 34.99 | 34.68 | −2.81 | 5.41 | −6.61 | 14.32 |
| 0.185 | 166.62 | 166.56 | 62.19 | 35.21 | 35.18 | 1.69 | 5.09 | −7.02 | |
| 0.200 | 163.98 | 163.95 | 60.76 | 35.46 | 35.46 | −0.98 | 3.55 | −3.24 | 9.11 |
| 0.250 | 155.47 | 156.33 | 56.25 | 36.13 | 36.34 | −0.40 | −3.51 | −0.07 | −10.15 |
| 0.300 | 147.21 | 144.73 | 53.40 | 38.09 | 37.53 | 1.13 | 0.31 | −0.71 | −2.32 |

In the rows for ψ = 0.054, 0.085, 0.090, and 0.185 the source layout does not show which η columns the listed values belong to. The eight values detached from these rows are 2.68, −1.87, 1.90, 2.67, 0.85, 0.18, −2.20, and 6.62.
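The three-sigma interval of Eq. (9) is straightforward to evaluate for a set of Monte Carlo results. The sketch below uses synthetic numbers in place of the 200 finite element results, and it assumes the population variance (`ddof=0`); the text does not say which variance estimator was used.

```python
import numpy as np


def three_sigma_interval(samples):
    """Return (M - 3*sqrt(D), M + 3*sqrt(D)) for a 1-D sample, Eq. (9)."""
    samples = np.asarray(samples, dtype=float)
    m = samples.mean()
    d = samples.var()  # population variance (ddof=0); an assumption
    half_width = 3.0 * np.sqrt(d)
    return m - half_width, m + half_width


# Hypothetical stand-in for 200 Monte Carlo values of Ex, in GPa
rng = np.random.default_rng(1)
ex_samples = rng.normal(186.0, 2.0, size=200)
low, high = three_sigma_interval(ex_samples)
inside = np.mean((ex_samples >= low) & (ex_samples <= high))
print(round(low, 2), round(high, 2), inside)
```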
To assess their validity, the results have been compared with those obtained using the mixture rule (10). This approach makes it possible to estimate the upper and lower bounds of the elastic moduli. These estimates correspond to parallel and perpendicular arrangements of the structural elements (Fig. 6). An analysis of the results shows that the mathematical expectation of the equivalent moduli of elasticity lies between the upper and lower bounds given by the mixture rule, which confirms the correctness of the constructed models. However, it is seen that the upper boundary of the confidence interval exceeds the upper estimate of the elastic moduli. This is because the mixture rule does not take into account the random orientation of the principal axes of the graphite crystals, so a comparison is possible only for average values. It is also because the real properties of graphite are much more complicated than the isotropic behavior assumed by the mixture rule.
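The two mixture-rule estimates of Eq. (10) can be evaluated directly. In this sketch E_f = 210 GPa comes from Table 1. The graphite modulus E_g = 36 GPa (the E_y value of Table 2) is an assumption made for illustration, because the text does not state which graphite modulus enters Eq. (10).

```python
def mixture_bounds(psi, e_graphite, e_ferrite):
    """Lower (series) and upper (parallel) mixture-rule estimates, Eq. (10)."""
    e_max = psi * e_graphite + (1.0 - psi) * e_ferrite
    e_min = 1.0 / (psi / e_graphite + (1.0 - psi) / e_ferrite)
    return e_min, e_max


# E_f = 210 GPa from Table 1; E_g = 36 GPa is an assumed choice (E_y of Table 2)
e_min, e_max = mixture_bounds(0.100, e_graphite=36.0, e_ferrite=210.0)
print(f"{e_min:.1f} GPa <= E <= {e_max:.1f} GPa")
```

Under this assumption the bounds at ψ = 0.100 are about 141.6 and 192.6 GPa, and the averaged E_x = 186.16 GPa from Table 3 lies between them.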
Fig. 5. The dependence of the elastic characteristics of the material on the concentration of inclusions
$$E_{max} = \psi E_g + (1 - \psi) E_f; \quad E_{min} = \left(\frac{\psi}{E_g} + \frac{1 - \psi}{E_f}\right)^{-1}, \tag{10}$$
where ψ is the concentration in the range [0.054, 0.300]; $E_g$ is the graphite elastic modulus; $E_f$ is the ferrite elastic modulus.

Fig. 6. The upper and lower boundaries of the elastic moduli of the generated microstructure

On the other hand, the problem of finding the invariants of the elastic moduli tensor is often considered in the literature [22]. Such invariants have a mechanical meaning, provide information on the elastic properties of the material under study, and require a smaller number of independent constants to be established. To find the corresponding invariants, it is necessary to introduce the eigenvalues λ and the eigentensors of the second rank $q_{ij}$. Then, for a plane stress state, the eigenvalue problem takes the form (11):

$$A_{ij}\, q_j = \lambda\, q_i, \quad (i, j = 1, 2, 3). \tag{11}$$

$$\begin{cases} (A_{11} - \lambda)\, q_1 + A_{12}\, q_2 + A_{13}\, q_3 = 0 \\ A_{21}\, q_1 + (A_{22} - \lambda)\, q_2 + A_{23}\, q_3 = 0 \\ A_{31}\, q_1 + A_{32}\, q_2 + (A_{33} - \lambda)\, q_3 = 0 \end{cases} \tag{12}$$
The system of linear Eqs. (12) gives three orthonormal eigentensors: $q_i^{(1)}$, $q_i^{(2)}$, $q_i^{(3)}$. Stresses and strains are represented by their expansion in the basis of eigentensors (13):

$$\begin{aligned} \sigma_i &= k_1^{(\sigma)} q_i^{(1)} + k_2^{(\sigma)} q_i^{(2)} + k_3^{(\sigma)} q_i^{(3)} \Rightarrow \sigma_i^{*}; \\ \varepsilon_i &= k_1^{(\varepsilon)} q_i^{(1)} + k_2^{(\varepsilon)} q_i^{(2)} + k_3^{(\varepsilon)} q_i^{(3)} \Rightarrow \varepsilon_i^{*}; \\ k_1^{(\sigma)} &= \sigma_i\, q_i^{(1)};\; k_2^{(\sigma)} = \sigma_i\, q_i^{(2)};\; k_3^{(\sigma)} = \sigma_i\, q_i^{(3)}; \\ k_1^{(\varepsilon)} &= \varepsilon_i\, q_i^{(1)};\; k_2^{(\varepsilon)} = \varepsilon_i\, q_i^{(2)};\; k_3^{(\varepsilon)} = \varepsilon_i\, q_i^{(3)}. \end{aligned} \tag{13}$$

Given Eq. (13), Hooke's law in matrix form becomes diagonal (14), and the elastic moduli are given by three positive eigenvalues $\lambda_i > 0$, $(i = 1, 2, 3)$:

$$\begin{bmatrix} \sigma_1^{*} \\ \sigma_2^{*} \\ \sigma_3^{*} \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} \times \begin{bmatrix} \varepsilon_1^{*} \\ \varepsilon_2^{*} \\ \varepsilon_3^{*} \end{bmatrix}. \tag{14}$$

Using the linear algebra library numpy.linalg, the eigenvalues and the right eigenvectors of the square array of elastic moduli are calculated. The mathematical expectation and variance of the three invariants of the elastic moduli for various concentrations of inclusions are shown in Table 4, and the dependence is shown graphically in Fig. 7.
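The eigenvalue step can be sketched with `numpy.linalg.eig`. Since Eq. (14) relates stress to strain through λ, the sketch assumes that the moduli λ are eigenvalues of the stiffness matrix, i.e. of the inverse of the plane compliance matrix of Eq. (7). The function names and the isotropic test values (E = 200, ν = 0.3) are illustrative and are not taken from the chapter.

```python
import numpy as np


def plane_compliance(ex, ey, gxy, nu_xy, eta_x_xy=0.0, eta_y_xy=0.0):
    """Symmetric plane compliance matrix in the notation of Eqs. (6)-(8)."""
    a12 = -nu_xy / ex
    a16 = eta_x_xy / gxy
    a26 = eta_y_xy / gxy
    return np.array([[1.0 / ex, a12, a16],
                     [a12, 1.0 / ey, a26],
                     [a16, a26, 1.0 / gxy]])


def stiffness_invariants(compliance):
    """Eigenvalues (descending) and eigenvectors of the stiffness matrix."""
    stiffness = np.linalg.inv(compliance)
    stiffness = 0.5 * (stiffness + stiffness.T)  # remove round-off asymmetry
    values, vectors = np.linalg.eig(stiffness)
    order = np.argsort(values)[::-1]
    return values[order], vectors[:, order]


# Illustrative isotropic case: E = 200, nu = 0.3, G = E / (2 * (1 + nu))
a = plane_compliance(ex=200.0, ey=200.0, gxy=200.0 / 2.6, nu_xy=0.3)
lam, q = stiffness_invariants(a)
print(lam)
```

For this isotropic case the three eigenvalues are E/(1 − ν), E/(1 + ν), and G, which gives a quick sanity check on any implementation.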
Table 4. Statistical data of the invariants of the elastic moduli

| ψ | M[λ1], ×10¹¹ | D[λ1], ×10¹¹ | M[λ2], ×10¹⁰ | D[λ2], ×10⁹ | M[λ3], ×10⁹ | D[λ3], ×10⁹ |
|---|---|---|---|---|---|---|
| 0.054 | 2.86 | 1.49 | 7.45 | 0.53 | 1.74 | 1.20 |
| 0.057 | 2.86 | 1.49 | 7.41 | 0.64 | 2.19 | 1.61 |
| 0.060 | 2.85 | 1.48 | 7.38 | 0.75 | 2.53 | 1.61 |
| 0.065 | 2.84 | 1.47 | 7.30 | 1.09 | 3.15 | 1.86 |
| 0.070 | 2.83 | 1.47 | 7.22 | 1.08 | 3.02 | 2.15 |
| 0.075 | 2.82 | 1.45 | 7.20 | 1.19 | 3.13 | 2.07 |
| 0.080 | 2.81 | 1.44 | 7.11 | 1.12 | 3.55 | 2.20 |
| 0.085 | 2.80 | 1.44 | 7.06 | 1.31 | 3.83 | 2.49 |
| 0.090 | 2.79 | 1.42 | 7.05 | 1.42 | 3.61 | 2.35 |
| 0.100 | 2.76 | 1.40 | 6.90 | 1.61 | 4.07 | 2.63 |
| 0.135 | 2.69 | 1.34 | 6.54 | 2.26 | 4.88 | 3.18 |
| 0.150 | 2.66 | 1.31 | 6.49 | 2.56 | 5.57 | 3.84 |
| 0.170 | 2.61 | 1.26 | 6.34 | 2.97 | 5.34 | 3.20 |
| 0.185 | 2.58 | 1.24 | 6.19 | 3.08 | 5.61 | 4.10 |
| 0.200 | 2.55 | 1.22 | 6.05 | 3.06 | 4.99 | 3.58 |
| 0.250 | 2.46 | 1.15 | 5.59 | 3.55 | 7.29 | 4.04 |
| 0.300 | 2.36 | 1.07 | 5.30 | 4.05 | 6.89 | 4.95 |
7 The Yield Surface Calculation
One of the tasks of materials engineering is to establish the loading conditions that cause plastic deformation. This is important for determining the load combinations that lead to a transition from the elastic to the plastic state, and hence for finding a "safe" loading that does not lead to plastic deformation. In the case of uniaxial loading, this task is not particularly difficult. It is enough to have a relation between stress and strain. Such data can be obtained from experiments on simple tension and compression. However, for materials under two- and three-dimensional stress states, the situation is less clear. In such situations, predicting the onset of plasticity requires additional information. In the case of a three-dimensional stress state, determining the yield surface is a difficult task. This is due to several technical difficulties caused, on the one hand, by the complexity of the experimental setup and, on the other hand, by the huge number of samples that need to be tested. This problem is especially acute for composite and heterogeneous materials. To solve this problem, computer simulation methods are used.
Fig. 7. The dependence of the invariants of the elasticity moduli on the concentration of inclusions
Computer modeling uses yield hypotheses for complex loading conditions [24]. All hypotheses are based on the assumption that a material in a multiaxial stress state yields when a certain quantity reaches or exceeds the value obtained from a simple uniaxial test. The yield surface in this work is found on the basis of the hypothesis of the specific distortion energy (the Huber–Mises–Hencky hypothesis) [24]. According to this hypothesis, plastic strains in a sample under a complex stress state occur when the specific distortion energy equals or exceeds the specific distortion energy of the material at yield under a uniaxial stress state. For the microstructure, which consists of two materials (ferrite and graphite), the maximum stresses in each phase are found. For graphite, the tensile and compressive strengths differ significantly; therefore, the ratios of the maximum stresses to the corresponding allowable strength are found separately for each type of stress state. The yield surface is determined by the ratio of the principal stresses to the safety factor. The calculation results for some concentrations of graphite inclusions in the ferrite structure are presented graphically in Table 5.
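For orientation, the Huber–Mises–Hencky criterion alone, applied to a homogeneous material in plane stress, gives the familiar yield ellipse in the (σ1, σ2) plane. The sketch below computes that ellipse for the ferrite yield strength of Table 1. It deliberately leaves out the graphite phase and the finite element step of the chapter, so it is a simplified reference curve and not the yield surface of the microstructure; the function names are hypothetical.

```python
import numpy as np


def von_mises_plane(s1, s2):
    """Equivalent stress for principal stresses s1, s2 with s3 = 0."""
    return np.sqrt(s1 ** 2 - s1 * s2 + s2 ** 2)


def yield_locus(yield_strength, n_points=360):
    """Points (s1, s2) on the von Mises yield curve of a homogeneous material."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    d1, d2 = np.cos(theta), np.sin(theta)  # unit load directions
    # Load multiplier at which the equivalent stress reaches the yield strength
    scale = yield_strength / von_mises_plane(d1, d2)
    return scale * d1, scale * d2


s1, s2 = yield_locus(125.0)  # ferrite yield strength from Table 1, MPa
print(s1[0], s2[0])    # uniaxial tension along axis 1
print(s1[45], s2[45])  # equal biaxial tension
```

The same load-multiplier idea, the yield strength divided by the largest equivalent stress under a reference load, corresponds to the safety factor mentioned in the text.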
Table 5. Yield surface for different concentrations of inclusions
Columns: concentration ψ; calculation results; KDE processed data. Rows: ψ = 0.185, ψ = 0.100, ψ = 0.054. [Plots not reproduced.]
The first column of Table 5 lists selected concentrations of graphite inclusions. The second column contains the calculation results, presented as a set of points with σ1 on the abscissa and σ2 on the ordinate. These quantities are the maximum allowable values of the principal stresses for the ferrite and graphite materials, respectively.
Kernel density estimation (KDE) is a non-parametric way to estimate the probability density function (PDF) of a random variable. The third column of Table 5 presents the PDF of the principal stress values obtained in the calculations. The yield surfaces for different concentrations of inclusions are built with the scipy.stats library of statistical functions, using gaussian_kde, which includes automatic determination of the estimator bandwidth.
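A two-dimensional KDE of this kind can be sketched with `scipy.stats.gaussian_kde`. The point cloud below is synthetic and only stands in for the computed limit stresses. The grid limits are arbitrary, and the default bandwidth (Scott's rule) is used because the text does not name a specific rule.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical limit principal stresses (MPa) standing in for the FE results
sigma1 = rng.normal(100.0, 10.0, size=200)
sigma2 = rng.normal(80.0, 15.0, size=200)

# gaussian_kde expects data of shape (n_dims, n_points);
# the bandwidth defaults to Scott's rule
kde = gaussian_kde(np.vstack([sigma1, sigma2]))

# Evaluate the estimated PDF on a regular grid in the (sigma1, sigma2) plane
g1, g2 = np.meshgrid(np.linspace(50.0, 150.0, 60), np.linspace(20.0, 140.0, 60))
density = kde(np.vstack([g1.ravel(), g2.ravel()])).reshape(g1.shape)
print(density.shape, density.max())
```

The resulting `density` array can be passed to a contour plot to obtain pictures of the kind described for the third column of Table 5.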
8 Conclusions
The paper discusses an algorithm for studying the elastic mechanical properties of cast iron. A finite element analysis of the created structural model is carried out, and formulas for determining the elastic moduli, Poisson's ratio, and shear moduli are obtained. The dependence of the elastic characteristics on the content of graphite inclusions is analyzed. To evaluate the results, the mixture rule is applied to the averaged elastic moduli. The results of numerical modeling show good agreement of the calculated values of the Poisson's ratio, elastic moduli, and shear moduli with reference data. In addition, the maximum allowable values of the principal stresses for the ferrite and graphite materials are calculated. From these data, yield surfaces for various concentrations of inclusions have been found and constructed. Going beyond the yield surface indicates the appearance of plastic strains in the part.
Acknowledgment. This work has been supported by the Ministry of Education and Science of Ukraine in the framework of the realization of the research projects: «Development of methods for mathematical modeling of the behavior of new and composite materials aims to structural elements lifetime estimation and prediction of engineering designs reliability» (State Reg. Num. 0117U004969), and «Development of methods of computational intelligence in problems of synthesis of characteristics of responsible elements, increase of reliability and efficiency of innovative equipment» (State Reg. Num. 0121U100730).
References
1. Sikora, P., Elrahman, M., Chung, S.-Y., Cendrowski, K., Mijowska, E., Stephan, D.: Mechanical and microstructural properties of cement pastes containing carbon nanotubes and carbon nanotube-silica core-shell structures, exposed to elevated temperature. Cement Concr. Compos. 95, 193–204 (2019). https://doi.org/10.1016/j.cemconcomp.2018.11.006
2. Salinas, A., Celentano, D., Carvajal, L., Artigas, A., Monsalve, A.: Microstructure-based constitutive modelling of low-alloy multiphase TRIP steels. Metals 9(2), 250 (2019). https://doi.org/10.3390/met9020250
3. Xu, H., Zhu, M., Marcicki, J., Yang, X.: Mechanical modeling of battery separator based on microstructure image analysis and stochastic characterization. J. Power Sources 345, 137–145 (2017). https://doi.org/10.1016/j.jpowsour.2017.02.002
4. Son, S., et al.: Investigation of the microstructure of laser-arc hybrid welded boron steel. JOM 70(8), 1548–1553 (2018). https://doi.org/10.1007/s11837-018-2876-2
5. Zhang, Y., et al.: Influence of graphite morphology on phase, microstructure, and properties of hot dipping and diffusion aluminizing coating on flake/spheroidal graphite cast iron. Metals 9(4), 450 (2019). https://doi.org/10.3390/met9040450
6. Ramakrishnan, G., Dinda, P.: Microstructure and mechanical properties of direct laser metal deposited Haynes 282 superalloy. Mater. Sci. Eng. 748(4), 347–356 (2019). https://doi.org/10.1016/j.msea.2019.01.101
7. DeCost, B., Holm, E.: A computer vision approach for automated analysis and classification of microstructural image data. Comput. Mater. Sci. 110, 126–133 (2015). https://doi.org/10.1016/j.commatsci.2015.08.011
8. Pereira, R.F., da Silva Filho, V.E.R., Moura, L.B., Kumar, N.A., de Alexandria, A.R., de Albuquerque, V.H.C.: Automatic quantification of spheroidal graphite nodules using computer vision techniques. J. Supercomput. 76(2), 1212–1225 (2018). https://doi.org/10.1007/s11227-018-2579-z
9. Campbell, A., Murray, P., Yakushina, E., Marshall, S., Ion, W.: New methods for automatic quantification of microstructural features using digital image processing. Mater. Des. 141, 395–406 (2018). https://doi.org/10.1016/j.matdes.2017.12.049
10. Kwon, O., et al.: A deep neural network for classification of melt-pool images in metal additive manufacturing. J. Intell. Manuf. 31(2), 375–386 (2018). https://doi.org/10.1007/s10845-018-1451-6
11. DeCost, B., Lei, B., Francis, T., Holm, E.: High throughput quantitative metallography for complex microstructures using deep learning: a case study in ultrahigh carbon steel. Microsc. Microanal. 25(1), 21–29 (2019). https://doi.org/10.1017/S1431927618015635
12. Fragassa, C., Babic, M., Bergmann, C., Minak, G.: Predicting the tensile behavior of cast alloys by a pattern recognition analysis on experimental data. Metals 9(5), 557 (2019). https://doi.org/10.3390/met9050557
13. Shapovalova, M., Vodka, O.: Image microstructure estimation algorithm of heterogeneous materials for identification their chemical composition. In: IEEE 2nd Ukraine Conference on Electrical and Computer Engineering (UKRCON), Institute of Electrical and Electronics Engineers Inc., Ukraine, Lviv pp. 975–979 (2019). https://doi.org/10.1109/UKRCON.2019.8879861
14. Hua, F., Yang, Y., Guo, D., Tong, W., Hu, Z.: Cailiao Kexue Yu Jishu Elasto-plastic FEM analysis of residual stress in spun tube. J. Mater. Sci. Technol. 20, 379–382 (2004)
15. Seriacopi, V., Fukumasu, N., Souza, R., Machado, I.: Finite element analysis of the effects of thermo-mechanical loadings on a tool steel microstructure. Eng. Fail. Anal. 97, 383–398 (2019). https://doi.org/10.1016/j.engfailanal.2019.01.006
16. Park, H., Jung, J., Kim, H.: Three-dimensional microstructure modeling of particulate composites using statistical synthetic structure and its thermo-mechanical finite element analysis. Comput. Mater. Sci. 126, 265–271 (2017). https://doi.org/10.1016/j.commatsci.2016.09.033
17. Fischer, C., Reichenbacher, A., Metzger, M., Schweizer, C.: Computational assessment of the microstructure-dependent thermomechanical behaviour of AlSi12CuNiMg-T7—methods and microstructure-based finite element analyses. In: Naumenko, K., Krüger, M. (eds.) Advances in Mechanics of High-Temperature Materials. ASM, vol. 117, pp. 35–56. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-23869-8_2
18. Vodka, O.: Processing microsection images to determine elastic characteristics of cast iron. IEEE Ukraine SYW-2018 Congress. Student, Young Professional and Women in Engineering, Kyiv, Ukraine (2018)
19. Shapovalova, M., Vodka, O.: Computer methods for constructing parametric statistically equivalent models of high-strength cast iron microstructure to analyze its elastic characteristics. Notes of the Tavrida National University V.I. Vernadsky. Series: Technical Sciences, vol. 30(69), 6, pp. 179–187. (in Ukrainian) (2019). https://doi.org/10.32838/2663-5941/2019.61/33
20. Shapovalova, M., Vodka, O.: Computer methods for modeling the synthetic structure of cast iron for statistical evaluation of its mechanical properties and strength characteristics. BNTU Minsk: 277–284 ISSN (online): 2310-7405 (2020). (in Russian)
21. Ambatsumian, S.: Theory of Anisotropic Plates. Nayka. Moscow (1967). 268 p. (in Russian)
22. Ostrosablin, N.: About the invariants of the fourth-rank tensor of elastic moduli. Sib. Jorn. Industr. Mach. 1(1), 155–163 (1998). (in Russian)
23. Annin, B., Ostrosablin, N.: Anisotropy of the elastic properties of materials. Appl. Mech. Tech. Physic. 49(6), 131–151 (2008). (in Russian)
24. Beliaev, N.: Strength of materials. Science, Chap. (ed.) Physical and Mathematical Literature (1965). 856 p. (in Russian)
25. GOST 3443–87: Castings of Cast Iron of Various Shapes of Graphite. Methods for determining the structure (ISO 945–75*). [Instead of GOST 3443–77]. M.: Standardinform. (2005). (in Russian)
26. Kudii, D., Khrypunov, M., Zaitsev, R., Khrypunova, A.: Physical and technological foundations of the chloride treatment of cadmium telluride layers for thin-film photoelectric converters. J. Nano. Electron. Phys. 10(3), 03007 (2018). https://doi.org/10.21272/jnep.10(3).03007
27. Zaitsev, R., Kirichenko, M., Khrypunov, G., Prokopenko, D., Zaitseva, L.: Hybrid solar generating module development for high-efficiency solar energy station. J. Nano. Electron. Phys. 10(6), 06017 (2018). https://doi.org/10.21272/jnep.10(6).06017
28. Avdieieva, O., Usatyi, O., Vodka, O.: Development of the typical design of the high-pressure stage of a steam turbine. In: Ivanov, V., Pavlenko, I., Liaposhchenko, O., Machado, J., Edl, M. (eds.) DSMIE 2020. LNME, pp. 271–281. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50491-5_26
29. Lytvynenko, O., Tarasov, O., Mykhailova, I., Avdieieva, O.: Possibility of using liquid-metals for gas turbine cooling system. In: Ivanov, V., Pavlenko, I., Liaposhchenko, O., Machado, J., Edl, M. (eds.) DSMIE 2020. LNME, pp. 312–321. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50491-5_30
30. Shapovalova, M., Vodka, O.: A data-driven approach to the prediction of spheroidal graphite cast iron yield surface probability characteristics. In: Nechyporuk, M., Pavlikov, V., Kritskiy, D. (eds.) ICTM 2020. LNNS, vol. 188, pp. 565–576. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-66717-7_48
31. Kelin, A., Larin, O., Naryzhna, R., Trubayev, O., Vodka, O., Shapovalova, M.: Mathematical modelling of residual lifetime of pumping units of electric power stations. In: Nechyporuk, M., Pavlikov, V., Kritskiy, D. (eds.) Integrated Computer Technologies in Mechanical Engineering. AISC, vol. 1113, pp. 271–288. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37618-5_24
32. Kelin, A., Larin, O., Naryzhna, R., Trubayev, O., Vodka, O., Shapovalova, M.: Estimation of residual life-time of pumping units of electric power stations. In: IEEE 14th International Conference on Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine. 1, 153–159 (2019). https://doi.org/10.1109/STC-CSIT.2019.8929748
Diagnosis Methods on the Blade of Marine Current Turbine
Tianzhen Wang1, Funa Zhou1, Tao Xie1, and Hubert Razik1,2
1 School of Logistics Engineering, Shanghai Maritime University, Shanghai, China
{tzwang,zhoufn}@shmtu.edu.cn, [email protected]
2 UMR5005, Univ Lyon, Université Claude Bernard Lyon 1, INSA Lyon, Ecole Centrale de Lyon, CNRS, Ampère, 69622 Villeurbanne, France
Abstract. The global energy crisis has drawn worldwide attention to marine current energy. A marine current turbine (MCT) is deep-sea equipment that converts marine current energy into electric power, and its safe and reliable operation is very important. To facilitate monitoring of MCT blade status in multiple scenarios, this chapter addresses the diagnosis of MCT blade biofouling with two different methods. First, a review of MCT blade fault diagnosis methods is presented. Then two methods are discussed. One is a signal processing method based on the stator current, which includes VMD denoising and a proposed novel LDA classifier; the other is an image-based method applied to MCT images, which includes semantic segmentation and an adaptive recognition technique. The two methods can be combined to cover different biofouling cases. When the biofouling is too light to affect the output torque of the turbine, the image-based method is useful. Otherwise, the method based on the current signal is more convenient and effective. The experimental results show that the two methods proposed in this chapter can effectively detect MCT biofouling in different scenarios. The conclusion also outlines several trends for handling the biofouling problem.
Keywords: Marine current energy · Biofouling fault · VMD denoising · LDA classifier · Semantic segmentation · Adaptive biofouling recognition
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 393–426, 2022. https://doi.org/10.1007/978-3-030-82110-4_22
1 Introduction
Due to concerns such as global warming, environmental pollution, and diminishing resources, recent power generation strategies have gradually shifted to renewable energy, e.g., wind, solar, and marine current energy [1–3]. As countries pursue carbon neutrality and build safe and efficient energy systems, marine current energy has attracted worldwide attention. Compared with wind and solar energy, marine current energy has three merits: better stability and predictability, relatively few prerequisites for site selection, and considerably higher energy density than wind. Therefore, marine current power generation has received extensive attention due to its unique advantages [4, 5]. MCT is a low-noise, non-polluting, and reliable device
for power generation [6]. As marine current power generation technology matures, fault diagnosis technology that ensures the safe and efficient operation of the marine current turbine (MCT) becomes essential [7–9]. Since MCTs work on the seabed, they are susceptible to corrosion by water and aquatic organisms, which causes generator failures and affects the power generation performance of the entire system [10–12]. Moreover, an MCT breaks down easily because it is commonly installed in a harsh marine environment [13–16]. Since MCTs remain underwater for a long time, tiny marine organisms are likely to reproduce on the machinery, which is known as the biofouling problem [17–19]. In general, uniform and non-uniform biofouling on MCT blades result in balanced and imbalance faults, respectively. The non-uniform case requires more attention, as it can distort the output current and voltage and ultimately reduce power generation efficiency [20]. It is thus important to monitor MCT health conditions so that maintenance can be performed promptly when required [21]. It is common to detect and diagnose MCT faults by analyzing the signals collected by sensors [22–25]. Xie et al. [26] reviewed several MCT fault cases and proposed future trends; the methods covered, both stator-current-based and sensor-based, are used for fault detection of MCT blades. In [27, 28], variational mode decomposition (VMD) is employed to decompose the stator current signal into several intrinsic mode functions (IMFs). The target IMF is then selected by the maximal information coefficient (MIC). The proposed method detects blade biofouling effectively. Hall and speed sensors are used to detect generator faults; however, poor installation of a sensor also introduces noise into the useful signals [29, 30]. Furthermore, these sensors need waterproofing and long-term maintenance and are difficult to install under complex marine conditions, whereas the stator current sensor is not exposed to the marine environment [31].
Therefore, fault detection and diagnosis methods based on stator current signals are relevant for MCT [32, 33]. The biofouling of an MCT causes phase modulation of the stator current, so diagnosing biofouling from the phase modulation characteristics of the stator current has become very popular [34, 35]. In [36], a least squares estimator (LSE) is employed to demodulate the amplitude modulation of the stator current. The degree of biofouling can be judged from the modulation amplitude. In [37], linear discriminant analysis (LDA) was used to classify the mechanical and electrical faults of generators, achieving good results. In [38, 39], the Hilbert transform (HT) was applied to the demodulation of the stator current signal, and its demodulation error was analyzed. Similarly, the instantaneous frequency of the voltage signal is obtained by the Hilbert transform in [40]. In addition, the angular frequency and complex phasor are obtained with a maximum likelihood estimator in [41]. In [42], Li et al. proposed another strategy that applies wavelet threshold denoising and then, under different fixed operating conditions, detects biofouling faults by calculating statistical indicators in the principal and residual subspaces of principal component analysis (PCA). In [43, 44], denoising methods based on empirical mode decomposition (EMD) are used. In [45], a new adaptive proportional sampling frequency (APSF) was proposed for sampling the modulus signal so that the fault frequency in the modulus signal became a stable value.
When the biofouling is very light but covers a large area, the image of the MCT may be a better option. In recent years, the semantic segmentation technique has been successfully applied to underwater scene understanding [46, 47] and medical diagnosis [48, 49]. King et al. [50] compared the performance of five patch-based convolutional neural networks (CNNs) and four fully convolutional neural networks (FCNNs) on the semantic segmentation of coral reef images. Ultimately, FCNNs obtained higher localization accuracy but lower classification accuracy than patch-based CNNs. O'Byrne et al. [51] initially segmented marine growth on real submerged structures with a deep segmentation network that was trained on synthetic imagery. After that, an iterative support vector machine was employed to refine the initial segmentation. This procedure is effective because the synthetic and real data share similar features. In [52], a diabetic foot ulcer diagnosis method based on FCNNs and two-tier transfer learning was proposed; this method can automatically segment ulcers and their surrounding skin. Zhao et al. [53] integrated FCNNs and conditional random fields to achieve precise brain tumor segmentation.
In this chapter, two biofouling diagnosis methods based on different sensors are proposed to classify the biofouling severities. The first one is a fault classification method based on the stator current signal, which has substantial advantages over the vibration signal. VMD can effectively suppress the interference frequencies that flow velocity changes introduce into the stator current signal, and the feature dimension filter module of the LDA classifier is designed by using the cut-off frequency of VMD to achieve high classification accuracy. The second method is based on image processing and solves the problem of weak attachments.
The image segmentation technique is applied to the MCT scene for the first time, and improvements have been made. When the amount of biofouling is too small for the first method, the second method can be considered. The second method is an image semantic segmentation technique based on an underwater camera. The two proposed methods can not only detect the fault but also classify it effectively. The experimental results show that the two proposed methods have good consistency and reliability. The rest of this chapter is organized as follows: Sect. 2 introduces the diagnosis method based on VMD and S-LDA, Sect. 3 illustrates the diagnosis method based on adaptive coarse-fine semantic segmentation, and Sect. 4 concludes the chapter.
2 Diagnosis Through Process of Stator Current Using VMD-Denoising and S-LDA Classifier
This section presents a method that combines VMD denoising and S-LDA to classify the degree of blade biofouling fault. The method comprises three parts: the Hilbert transform (HT), VMD denoising, and S-LDA fault classification. Even when the flow velocity fluctuates, this method can still diagnose blade imbalance faults based on the single-phase stator current signal.
2.1 VMD-Denoising for MCT Stator Current Signal
The proposed method only needs to acquire the single-phase stator current signal, which can be roughly regarded as a narrowband signal over a short period, so HT can be used for phase demodulation. The stator current under the biofouling fault condition can be expressed as [54]:
$$i_A = I \sin(p\omega_r t + p\omega_r + \phi) \tag{1.1}$$
where $I$ is the amplitude of the current signal, $p$ is the number of pole pairs, and $\phi$ is the initial angle. The Hilbert transform of the stator current is defined as follows:
$$H[i_A(t)] = \int_{-\infty}^{+\infty} \frac{i_A(\tau)}{\pi(t-\tau)}\,d\tau = i_A(t) * \frac{1}{\pi t} \tag{1.2}$$
HT converts the acquired current signal $i_A(t)$ into a complex signal, as follows:
$$I_A(t) = i_A(t) + jH[i_A(t)] = A(t)e^{j\varphi(t)} \tag{1.3}$$
where $A(t)$ is the instantaneous amplitude and $\varphi(t)$ is the phase of the original signal in the complex exponential signal:
$$\varphi(t) = \arctan\frac{H[i_A(t)]}{i_A(t)} \tag{1.4}$$
$$A(t) = |I_A(t)| = \sqrt{i_A^2(t) + H^2[i_A(t)]} \tag{1.5}$$
The instantaneous frequency $f_e(t)$ is finally obtained as follows:
$$f_e(t) = \frac{1}{2\pi}\times\frac{d\varphi(t)}{dt} \tag{1.6}$$
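The demodulation chain of Eqs. (1.2) to (1.6) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the analytic signal is built with an FFT-based construction, the phase is taken with a four-quadrant angle plus unwrapping instead of the plain arctan of Eq. (1.4), and the 50 Hz test current, the sampling rate `fs`, and the function names are assumptions made only for the example.

```python
import numpy as np

def analytic_signal(x):
    """Return x + j*H[x] (Eq. 1.3) via the FFT construction of the analytic signal."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC bin once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def instantaneous_frequency(x, fs):
    """Eq. (1.6): derivative of the unwrapped phase divided by 2*pi, in Hz."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)

# Hypothetical single-phase current: 50 Hz, amplitude 2, sampled at 1 kHz for 1 s.
fs = 1000.0
t = np.arange(0.0, 1.0, 1.0 / fs)
current = 2.0 * np.sin(2.0 * np.pi * 50.0 * t)

f_e = instantaneous_frequency(current, fs)        # instantaneous frequency
amplitude = np.abs(analytic_signal(current))      # instantaneous amplitude, Eq. (1.5)
```

For a measured stator current, the record is generally not periodic within the window, so edge effects in `f_e` should be expected near the start and end of the record.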
VMD decomposes the multi-component signal $f_e(t)$ into multiple IMF components by constructing a constrained variational model that minimizes the sum of the bandwidths of all components. The augmented Lagrangian of the VMD algorithm is as follows:
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha\sum_{k=1}^{K}\left\|\partial_t\left[\left(\sigma(t)+\frac{j}{\pi t}\right) * u_k(t)\right]e^{-j\omega_k t}\right\|_2^2 + \left\|f(t)-\sum_{k=1}^{K}u_k(t)\right\|_2^2 + \left\langle\lambda(t),\, f(t)-\sum_{k=1}^{K}u_k(t)\right\rangle \tag{1.7}$$
where $\alpha$ is the quadratic penalty factor; $\sigma(t)$ denotes the Dirac distribution; $\lambda(t)$ is the Lagrangian multiplier; $u_k$ and $\omega_k$ ($k = 1, \ldots, K$) represent the component and center frequency of IMF$_k$, respectively. For convex optimization, the alternating direction method of multipliers (ADMM) is a good choice. It can be used to update the components $u_k$ and the center frequencies $\omega_k$. The frequency-domain component is updated as follows:
$$\hat{u}_k(\omega) = \frac{\hat{f}(\omega) - \sum_{m\neq k}\hat{u}_m(\omega) + \hat{\lambda}(\omega)/2}{1+2\alpha(\omega-\omega_k)^2} \tag{1.8}$$
where $\hat{u}_k(\omega)$, $\hat{f}(\omega)$ and $\hat{\lambda}(\omega)$ are the Fourier transforms of $u_k(t)$, $f(t)$ and $\lambda(t)$, respectively. $\hat{u}_k(\omega)$ can be regarded as the output of Wiener filtering applied to $\hat{f}(\omega) - \sum_{m\neq k}\hat{u}_m(\omega)$. Similarly, the center frequency of each IMF is derived as the center of gravity of its power spectrum, as shown in (1.9):
$$\omega_k = \frac{\int_0^{\infty}\omega\,|\hat{u}_k(\omega)|^2\,d\omega}{\int_0^{\infty}|\hat{u}_k(\omega)|^2\,d\omega} \tag{1.9}$$
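The two update rules in Eqs. (1.8) and (1.9) can be illustrated with a short sketch. It is not a complete VMD implementation: the multiplier $\hat{\lambda}(\omega)$ is held at zero (the dual update is omitted), the spectrum is a synthetic one-sided example with two narrow peaks, and the grid, the value of `alpha`, the initial center frequencies, and the function names are assumptions chosen only for illustration.

```python
import numpy as np

def update_mode(f_hat, u_hat, lam_hat, omega, omega_k, k, alpha):
    """Eq. (1.8): Wiener-filter-like update of mode k in the frequency domain."""
    others = u_hat.sum(axis=0) - u_hat[k]          # sum over all modes m != k
    return (f_hat - others + lam_hat / 2.0) / (1.0 + 2.0 * alpha * (omega - omega_k) ** 2)

def update_center_frequency(u_hat_k, omega):
    """Eq. (1.9): center of gravity of the mode's power spectrum (omega >= 0)."""
    power = np.abs(u_hat_k) ** 2
    return float(np.sum(omega * power) / np.sum(power))

# Synthetic one-sided spectrum with peaks at normalized frequencies 0.1 and 0.3.
omega = np.linspace(0.0, 0.5, 501)
f_hat = (np.exp(-0.5 * ((omega - 0.1) / 0.005) ** 2)
         + np.exp(-0.5 * ((omega - 0.3) / 0.005) ** 2))

K, alpha = 2, 2000.0                                # assumed number of modes and penalty
u_hat = np.zeros((K, omega.size), dtype=complex)
lam_hat = np.zeros(omega.size, dtype=complex)       # multiplier kept at zero in this sketch
centers = [0.05, 0.35]                              # deliberately offset initial guesses

for _ in range(20):
    for k in range(K):
        u_hat[k] = update_mode(f_hat, u_hat, lam_hat, omega, centers[k], k, alpha)
        centers[k] = update_center_frequency(u_hat[k], omega)
```

In this toy setting each center frequency drifts toward the nearest spectral peak, which is the behavior the center-of-gravity rule is meant to produce. A larger `alpha` narrows the filter in Eq. (1.8) and therefore the bandwidth of each recovered mode.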
Generality and equitability are two fundamental properties of the maximal information coefficient (MIC), which can effectively measure the relationship between two variables. Compared with the Pearson correlation coefficient, MIC is more robust and is suitable for processing nonlinear signals. Based on the joint probability of the scatter points falling in each grid cell, the mutual information is computed as follows:
$$I(x; y) = \iint p(x, y)\log_2\frac{p(x, y)}{p(x)p(y)}\,dx\,dy \tag{1.10}$$
where x and y are two variables. The MIC value is calculated under different grid scales, and the largest value is selected as the final MIC value. The specific expression is as follows: MIC(x; y) = max
a·b 1 2 ||γ ||2 where ε > 0 is a small value that prevents numerical problems; to ensure that γ is greater than zero, γ = |γold| should be executed before the constraint.
3.4.3 Procedure of the Biofouling Recognition
As shown in Fig. 16, biofouling recognition consists of three stages: data preparation, training, and testing (biofouling recognition). During the data preparation stage, a normalized image and a pixel-level label are first obtained for each biofouling class. Second, the rotation augmentation scheme is used to extend the obtained dataset. To confirm the CSSN's generalization ability, several data distributions are generated for the training, validation, and testing datasets by setting different seeds. Table 5 shows the parameters of rotation augmentation. In the training stage, the training data is used to update all the trainable parameters of the CSSN until the iteration number reaches a predefined maximum value. In the testing stage (biofouling recognition), 25 Monte Carlo samplings (MCSs) with 50% dropout are conducted. Initially, 25 softmax segmentation probability maps are retrieved. Then, the mean and the variance of these maps are computed at the pixel level. After that, the maximum probability of each pixel and its label are obtained. Finally, the pixel-to-color relationship is used to generate the segmentation map (SM), and an image of the computed variance values is regarded as the uncertainty map (UM). Based on the SM, BAP can be calculated by the following equation:
$$\mathrm{BAP} = \frac{\mathrm{count}(\text{attachment regions})}{\mathrm{count}(\text{attachment regions}) + \mathrm{count}(\text{healthy blade regions})}\times 100\% \tag{1.26}$$
where count(·) computes the number of pixels in the specified regions.
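The testing-stage aggregation and Eq. (1.26) can be sketched as below. The stochastic forward passes themselves are not reproduced; the input is assumed to be an array of softmax maps of shape (T, H, W, C) from T dropout passes. The pixel labels follow the convention used with Table 8 (0 background, 1 blade, 2 biofouling). How the per-class variances are reduced to a single uncertainty value per pixel is not stated in the text, so taking the variance of the winning class is an assumption of this sketch, as are the function names.

```python
import numpy as np

BACKGROUND, BLADE, BIOFOULING = 0, 1, 2   # pixel labels as used with Table 8

def aggregate_mc_samples(prob_maps):
    """Fuse T stochastic softmax maps (T, H, W, C) into labels and an uncertainty map."""
    prob_maps = np.asarray(prob_maps, dtype=float)
    mean_prob = prob_maps.mean(axis=0)            # pixel-level mean over the T samples
    var_prob = prob_maps.var(axis=0)              # pixel-level variance over the T samples
    labels = mean_prob.argmax(axis=-1)            # label with the maximum mean probability
    # Assumption: the uncertainty of a pixel is the variance of its winning class.
    uncertainty = np.take_along_axis(var_prob, labels[..., None], axis=-1)[..., 0]
    return labels, uncertainty

def compute_bap(labels):
    """Eq. (1.26): share of attachment pixels among all blade pixels, in percent."""
    attachment = np.count_nonzero(labels == BIOFOULING)
    healthy = np.count_nonzero(labels == BLADE)
    if attachment + healthy == 0:
        return 0.0                                # no blade visible in the image
    return 100.0 * attachment / (attachment + healthy)

# Toy input: T = 2 samples of a 1 x 3 image with C = 3 classes.
samples = np.array([
    [[[0.8, 0.1, 0.1], [0.2, 0.2, 0.6], [0.1, 0.7, 0.2]]],
    [[[0.6, 0.3, 0.1], [0.1, 0.1, 0.8], [0.1, 0.9, 0.0]]],
])
labels, uncertainty = aggregate_mc_samples(samples)
bap = compute_bap(labels)
```

In the setting described above, T would be 25 and each map would come from one forward pass with 50% dropout kept active.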
Fig. 16. Procedure of the biofouling recognition
Table 5. Rotation augmentation configurations for each biofouling category

| Dataset | B (deg) | seed | K | Total number |
|---|---|---|---|---|
| Training dataset | [0, 355] | 40 | 99 | 100 |
| Validation dataset | [0, 355] | 400 | 10 | 10 |
| Testing dataset | [0, 60] | 100 | 10 | 10 |
3.5 Evaluation of Segmentation Performance
Three commonly used performance metrics [70], i.e., pixel accuracy (PA), mean pixel accuracy (MPA), and mean intersection over union (MIoU), are employed to evaluate the segmentation quality. Besides, the following metrics are used:
(1) Sensitivity: the ratio of true positives (TP) to all positives (TP + false negatives (FN)). This metric is equal to the recall or the true positive rate (TPR), which measures the rate of correctly recognized positives.
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{1.27}$$
(2) Specificity: the counterpart of sensitivity for negatives, i.e., the ratio of true negatives (TN) to all negatives (TN + false positives (FP)). This metric is equal to 1 − false positive rate (FPR).
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{1.28}$$
(3) Receiver operating characteristic (ROC) curve: a robust tool to evaluate the trade-off between TPR and FPR; the area under the curve (AUC) is the area under the ROC curve.
(4) Confusion matrix: intuitively shows the classification accuracy for each pixel label.
As there is no existing SSN for biofouling recognition, in the four experiments (i.e., Experiment 1 to Experiment 4), three state-of-the-art segmentation networks (SegNet, Unet and Deeplabv3+ [47]) are compared with the proposed CSSN. From experiment No. 1 to experiment No. 4, the speed of the MCT blades gradually increases. As shown in Fig. 17, the higher the rotation speed, the lower the quality of the acquired MCT images.
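The metrics of Sect. 3.5 can all be derived from a pixel-level confusion matrix, as the following sketch shows. The chapter cites [70] for PA, MPA and MIoU without restating the formulas, so the commonly used definitions are assumed here; sensitivity and specificity follow Eqs. (1.27) and (1.28) in a one-versus-rest sense for each class. The example matrix is made up, and every class is assumed to occur at least once so that no division by zero arises.

```python
import numpy as np

def segmentation_metrics(conf):
    """conf[i, j] = number of pixels with true label i that were predicted as label j."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                       # correctly labeled pixels per class
    fn = conf.sum(axis=1) - tp               # pixels of the class that were missed
    fp = conf.sum(axis=0) - tp               # pixels wrongly assigned to the class
    tn = conf.sum() - tp - fn - fp           # everything else
    return {
        "PA": tp.sum() / conf.sum(),                  # pixel accuracy
        "MPA": float(np.mean(tp / (tp + fn))),        # mean per-class pixel accuracy
        "MIoU": float(np.mean(tp / (tp + fn + fp))),  # mean intersection over union
        "sensitivity": tp / (tp + fn),                # Eq. (1.27), per class
        "specificity": tn / (tn + fp),                # Eq. (1.28), per class
    }

# Made-up 3-class confusion matrix (rows: true labels 0, 1, 2; columns: predictions).
example = [[50, 0, 0],
           [5, 40, 5],
           [0, 10, 90]]
metrics = segmentation_metrics(example)
```

Because the background usually dominates an MCT image, PA can stay high even when the blade and biofouling classes are segmented less well, which is why MPA and MIoU are reported alongside it.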
Fig. 17. Examples of biofouling 1 acquired under four experiments (Experiment No. 1: 20 r/min; Experiment No. 2: 80 r/min; Experiment No. 3: 130 r/min; Experiment No. 4: 185 r/min)
Due to the lack of MCT image datasets for benchmarking, the three networks are further compared with the CSSN on the CamVid dataset [71]. CamVid, a public road scene dataset, is designed for quantitatively evaluating new segmentation algorithms. This public dataset contains 701 day and dusk scene images (367 training, 101 validation and 233 testing images) at 360 × 480 resolution and is more challenging than ours, as 11 classes (pedestrian, road, car, etc.) need to be segmented. Therefore, the evaluation results on CamVid can be more convincing.
3.6 Application in MCT Blade
Since the validation process involves no gradient back-propagation, it requires less GPU memory than the training process. Therefore, to make full use of the GPU's parallel computing capability, the validation batch size is set to 10, which is twice the training batch size. To obtain a robust network, a shuffling operation is applied in the training stage. Several trials showed that the training accuracy is boosted significantly during the early iterations. Therefore, the number of iterations is set to 50 for examining the training performance. Table 6 shows all the parameter configurations. 25 MCSs with 50% dropout are applied to output the testing results. To reduce the effect of randomness, all the experimental results are averaged over ten repeated trials.

Table 6. Training parameters

| Parameter | Value |
|---|---|
| The number of iterations | 50 |
| Training batch size | 5 |
| Validation batch size | 10 |
| Shuffle | True |
| Optimizer | adadelta |
| Initial learning rate | 1.0 |
| Initial γ | 1.0 |
| λ | 0.01 |
| ε | 1e-7 |
3.6.1 Comparisons of Network Convergence
Figure 18 shows the training accuracy curves. In the early iterations, the training accuracy of Unet improves the slowest, and Deeplabv3+ improves faster than CSSN. In addition, Unet shows an apparent accuracy fluctuation throughout the whole training process. After 50 training iterations, the accuracy of CSSN is close to 99.6%. However, Unet remains clearly behind SegNet and CSSN, since Unet is too shallow to train well. Figure 19 displays how γ changes: at the beginning, it is about 1.0, and it decreases rapidly after
15 training iterations. In the end, γ reaches the target value (0.5), which verifies the efficacy of the proposed adaptive algorithm. The average training time per iteration is around 42 s, as shown in Table 9.
Fig. 18. Training processes of the four networks under experiment No. 1
Fig. 19. Optimization process of γ under experiment No. 1
3.6.2 Recognition Results on Testing Dataset Under Slow Rotation Speed
In experiment No. 1, the testing dataset acquired under slow rotation speed is used. As shown in Table 7, PA stays at or above 99%; nevertheless, MPA and MIoU tend to decrease as the biofouling degree increases. From the qualitative results in Fig. 20, CSSN accurately recognizes the background, healthy blade, and biofouling areas. The analysis of the ground truth masks, SMs and UMs shows that the uncertainty increases where CSSN predicts a wrong label. Additionally, the uncertainty at class boundaries is apparent, which implies the negative effect of fuzzy labels on precise recognition. In addition, the possibility of a wrong prediction is high
as the biofouling pixels are relatively few compared to the background within an entire image. The above analysis is supported by the confusion matrix shown in Table 8. Even though the percentage of correctly recognized pixels is the largest in each row, the blade class (labeled "1") and the biofouling class (labeled "2") are most likely to be misrecognized as the background (pixel label "0").

Table 7. Quantitative recognition results of the CSSN under experiment No. 1. True BAP and Pred BAP are the ground truth and predicted BAP, respectively

| Category | True BAP (%) | Pred BAP (%) | PA (%) | MPA (%) | MIoU (%) |
|---|---|---|---|---|---|
| Healthy | 0.0 | 0.0 | 99.4 | 98.3 | 97.1 |
| Biofouling 1 | 22.1 | 22.9 | 99.3 | 97.4 | 96.0 |
| Biofouling 2 | 31.0 | 30.9 | 99.4 | 97.7 | 96.3 |
| Biofouling 3 | 40.1 | 39.8 | 99.2 | 96.8 | 94.7 |
| Biofouling 4 | 68.9 | 69.9 | 99.0 | 94.9 | 92.5 |
Fig. 20. Qualitative recognition results of the CSSN under experiment No. 1 (panels: input image, ground truth mask, segmentation map, uncertainty map; categories: healthy, biofouling 1 to biofouling 4)
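Table 8. Confusion matrix of the five biofouling categories using the CSSN under experiment No. 1. The best percentage in each row is boldfaced

| Category | True class (pixel label) | Predicted 0 (%) | Predicted 1 (%) | Predicted 2 (%) |
|---|---|---|---|---|
| Healthy | 0 | **99.7** | 0.3 | 0.0 |
| Healthy | 1 | 3.1 | **96.9** | 0.0 |
| Healthy | 2 | 0.0 | 0.0 | 0.0 |
| Biofouling 1 | 0 | **99.8** | 0.2 | 0.0 |
| Biofouling 1 | 1 | 4.1 | **95.5** | 0.4 |
| Biofouling 1 | 2 | 2.1 | 0.8 | **97.1** |
| Biofouling 2 | 0 | **99.8** | 0.2 | 0.0 |
| Biofouling 2 | 1 | 3.3 | **96.3** | 0.4 |
| Biofouling 2 | 2 | 2.1 | 0.9 | **97.0** |
| Biofouling 3 | 0 | **99.7** | 0.2 | 0.1 |
| Biofouling 3 | 1 | 3.6 | **95.4** | 1.0 |
| Biofouling 3 | 2 | 2.3 | 2.5 | **95.2** |
| Biofouling 4 | 0 | **99.8** | 0.1 | 0.1 |
| Biofouling 4 | 1 | 8.6 | **88.8** | 2.6 |
| Biofouling 4 | 2 | 2.3 | 1.5 | **96.2** |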
3.6.3 Recognition Results on Testing Dataset Under Fast Rotation Speed
The testing dataset acquired under fast rotation speed is used in experiment No. 4. As seen from Table 9, the segmentation ability of the four networks degrades as the MCT speed increases. Comparing the MIoU variance over the four experiments shows that the CSSN and Deeplabv3+ are more stable, while Unet exhibits the lowest robustness. In terms of comprehensive performance, the proposed CSSN obtains the highest mean MIoU and the highest MIoU in all the experiments except experiment No. 1, where the CSSN slightly underperforms Deeplabv3+. The improvement of the proposed CSSN over SegNet proves the importance of the inserted fine segmentation branch. Since the CSSN loads the pre-trained weights of VGG16, its testing results are much better than those of the CSSN without pre-trained weights, which is denoted CSSN*. Figure 21 shows the qualitative comparison results under experiment No. 4. It can be observed that the CSSN outputs the sharpest contours and the best biofouling recognition map among the four networks. Unet predicts the biofouling regions with poor global recognition and coarse contours, which causes the lowest MIoU (66.70%). Figure 22 also verifies the superiority of the CSSN because of its highest AUC (0.9995). Meanwhile, the training efficiency can be examined by comparing the average training time per iteration. Unet is the fastest model (12 s) due to its shallow structure, and the CSSN takes the longest time (42 s) due to the fine branch added to SegNet. The efficient encoder structure Xception used in Deeplabv3+ speeds up the
network convergence. With our proposed biofouling recognition method, inferring one image takes less than 0.8 s with UM and 0.06 s without, which meets the requirement of real-time biofouling recognition in practical MCT monitoring systems. In addition, configuring a smaller number of MCSs can contribute to a faster uncertainty estimation (Fig. 21).

Table 9. Quantitative recognition results of the four networks under the four experiments. CSSN* represents the CSSN without pre-trained VGG16. The last column records the averaged training time per iteration. The boldfaced value represents the best result in each column

| Networks | Experiment No. 1 (%) | Experiment No. 2 (%) | Experiment No. 3 (%) | Experiment No. 4 (%) | Mean (%) | Variance | Time (s) |
|---|---|---|---|---|---|---|---|
| Unet | 94.81 | 92.99 | 87.88 | 66.70 | 85.60 | 125.46 | **12** |
| SegNet | 95.17 | 94.32 | 90.15 | 85.67 | 91.33 | 14.28 | 34 |
| Deeplabv3+ | **95.51** | 93.62 | 92.02 | 88.45 | 92.40 | 6.72 | 32 |
| CSSN* | 94.56 | 94.37 | 91.77 | 73.22 | 88.48 | 78.84 | 42 |
| CSSN | 95.32 | **94.66** | **93.04** | **91.76** | **93.70** | **1.94** | 42 |
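The Mean and Variance columns of Table 9 can be recomputed from the four per-experiment MIoU values, which also clarifies how the stability comparison is made. In the sketch below the values are copied from the table; treating the variance as a population variance (division by 4 rather than 3) is an assumption that appears consistent with the reported figures to within rounding.

```python
from statistics import mean, pvariance

# MIoU (%) of each network under experiments No. 1 to No. 4, copied from Table 9.
miou = {
    "Unet":       [94.81, 92.99, 87.88, 66.70],
    "SegNet":     [95.17, 94.32, 90.15, 85.67],
    "Deeplabv3+": [95.51, 93.62, 92.02, 88.45],
    "CSSN*":      [94.56, 94.37, 91.77, 73.22],
    "CSSN":       [95.32, 94.66, 93.04, 91.76],
}

# Mean and population variance per network.
summary = {name: (mean(values), pvariance(values)) for name, values in miou.items()}

# Networks ordered from most to least stable (smallest variance first).
stability_ranking = sorted(summary, key=lambda name: summary[name][1])

for name in stability_ranking:
    avg, var = summary[name]
    print(f"{name:<11s} mean = {avg:6.2f} %   variance = {var:7.2f}")
```

The ranking places the CSSN first and Unet last, in line with the discussion of robustness in this subsection.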
Fig. 21. Qualitative recognition results of the four networks under experiment No. 4 (panels: input image, Unet, SegNet, Deeplabv3+, CSSN; categories: healthy, biofouling 1 to biofouling 4). Blue and red colors represent the correctly classified and falsely classified pixels, respectively
Fig. 22. ROC of the four networks under experiment No. 4
The pre-trained VGG16 includes five encoder blocks, and the encoded features in each encoder block form a three-dimensional tensor (height, width, and depth). Here, the grayscale map of a selected depth dimension is visualized. Figure 23(a) provides the visualization maps of the five encoder blocks in experiment No. 4. The shallow encoder blocks capture brightness, contrast, and contour information, while the deep encoder blocks focus on relatively abstract features with high-level semantic information. Figure 23(b) shows the global SM and a contour map obtained by the coarse branch and fine branch, respectively. The same visualization method confirms the efficacy of the coarse and fine branches in the CSSN: the coarse branch accomplishes the global segmentation, and the fine branch refines the object contours.
Fig. 23. Visualization results of the CSSN under experiment No. 4: (a) feature visualization of the five encoder blocks (input image and encoded features 1 to 5); (b) output visualization of the coarse and fine branches (coarse branch output and fine branch output)
3.6.4 Parameter Analysis
To examine whether the CSSN itself causes the relatively poor result in experiment No. 4 (91.76%), we increase the number of iterations to 100 to reduce the possibility of insufficient training. After 100 iterations, the MIoU of the CSSN improves from 91.76% to 92.11% (a gain of 0.35 percentage points, or about 0.38% in relative terms), indicating that the CSSN can precisely recognize biofouling under turbid submerged conditions and that sufficient training enhances its recognition ability.
Subsequently, to investigate the influence of training data, the training dataset is enlarged (to sizes of 750, 1000, and 1500) and the CSSN is retrained in experiment No. 4. The number of iterations is still set to 50. As shown in Fig. 24, feeding more training data to the CSSN gradually improves the segmentation accuracy, as better feature extraction with sufficient data helps precise pixel-wise classification. However, increasing the amount of training data inevitably consumes extra training time.
Fig. 24. Effect of number of training data on the CSSN under experiment No. 4

Table 10. Testing results of the four networks on CamVid. C1 to C11 represent the 11 segmentation classes of sky, building, pole, road, pavement, tree, sign symbol, fence, car, pedestrian and bicyclist. The boldfaced value represents the best result in each column

| Networks | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | MIoU (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unet | **85.8** | 56.4 | **16.5** | 84.3 | 61 | 42.8 | 19.9 | 13.4 | 59.4 | 21.2 | **9.8** | 47.9 |
| SegNet | 82.9 | **60.6** | 16.4 | **85.7** | 62.1 | 45.1 | 20.8 | 13.6 | 58 | **21.8** | 5.1 | 47.1 |
| Deeplabv3+ | 74.8 | 55.2 | 8.1 | 77.6 | 44.1 | 42.3 | 15 | 12.7 | 40.3 | 14.4 | 3.2 | 43.8 |
| CSSN | 83.4 | 59.8 | 15 | 84.6 | **64.2** | **49.1** | **21.6** | **20.5** | **59.8** | 18 | 7.4 | **48.6** |
To highlight the advantages of the algorithm proposed in this chapter, SegNet, Unet, Deeplabv3+, and the proposed method are tested on the public CamVid dataset. With the MCS technique not applied, Table 10 shows the accuracy of the four networks for each segmentation class. Table 10 shows that the proposed method achieves better results in several categories. The MIoU results indicate that the method proposed in this chapter is more practical than the other three networks.
4 Conclusion and Future Work
The blade is the critical component of marine current energy conversion. To address the biofouling of MCT blades, this chapter introduces two methods and their application to fault diagnosis. In the first proposed framework, HT is utilized to obtain the instantaneous frequency of the single-phase stator current. Then, the component with the fault characteristic frequency (1P frequency) is obtained by VMD denoising. Finally, S-LDA is applied to classify the severities of the blade biofouling fault based on samples of the power signal obtained by PSD analysis. When the amount of biofouling is too small for the first method, the image semantic segmentation technique can be considered. Based on the MCT image, an adaptive coarse-fine semantic segmentation method is proposed to recognize biofouling from blurry MCT images. This method contains two main parts: (i) a coarse-fine semantic segmentation network (CSSN) that fuses coarse and fine segmentation branches and (ii) an adaptive training method for the CSSN. Specifically, the coarse branch with a deep network (a modified SegNet) conducts the global segmentation, while the fine branch inserted in the coarse branch is an external network for refining the local contours. The experimental results from a 230 W MCT test platform show that the two methods classify the biofouling degree effectively under specific conditions. The current signal-based method is simple to operate and does not require additional sensors. The proposed CSSN effectively recognizes the precise location and size of biofouling under one slow and three faster MCT rotation speeds. Therefore, it will be beneficial to combine image sensors and signal-based methods to enhance fault diagnosis performance. However, significant effort is still needed to develop these technologies into cost-effective blade condition monitoring systems.
References
1. Rivera, G., Felix, A., Mendoza, E.: A review on environmental and social impacts of thermal gradient and tidal currents energy conversion and application to the case of Chiapas Mexico. Int. J. Environ. Res. Public Health 17, 7791–7808 (2020)
2. Freeman, B., Tang, Y., Huang, Y., Van Zwieten, J.: Rotor blade imbalance fault detection for variable-speed marine current turbines via generator power signal analysis. Ocean Eng. 22, 108666 (2021)
3. Batten, W., Bahaj, A., Molland, F., Chaplin, R.: Hydrodynamics of marine current turbines. Renew. Energy 31(2), 249–256 (2006)
4. Qian, P., Feng, B., Liu, H., Tian, X., Si, Y., Zhang, D.: Review on configuration and control methods of tidal current turbines. Renew. Sustain. Energy Rev. 108, 125–139 (2019)
5. Nachtane, M., Tarfaoui, M., Mohammed, M., Saifaoui, D., El Moumen, A.: Effects of environmental exposure on the mechanical properties of composite tidal current turbine. Renew. Energy 156, 1132–1145 (2020)
6. Begg, S., Fowkes, N., Stemler, T., Cheng, L.: Fault detection in vibration systems: Identifying damaged moorings. Ocean Eng 2018(06), 006 (2018)
7. Vinod, A., Banerjee, A.: Performance and near-wake characterization of a tidal current turbine in elevated levels of free stream turbulence. Appl. Energy 254, 113639 (2019)
8. Nachtane, M., Tarfaoui, M., Goda, I., Rouway, M.: A review on the technologies, design considerations and numerical models of tidal current turbines. Renewable Energy 157, 1274–1288 (2020)
9. Zhou, Z., Benbouzid, M., Charpentier, J., Scuiller, Tang, T.: Developments in large marine current turbine technologies-a review. Renewable Sustainable Energy Rev. 71, 852–858 (2017)
10. Hua-Ming, W., Xiao-Kun, Q., Lin, C., Lu-Qiong, T., Qiao-Rui, W.: Numerical study on energy-converging efficiency of the ducts of vertical axis tidal current Turbine in restricted water. Ocean Eng. 210, 107320 (2020)
11. Wang, S., Zhang, Y., Xie, Y., Xu, G., Liu, K., Zheng, Y.: The effects of surge motion on hydrodynamics characteristics of horizontal-axis tidal current turbine under free surface condition. Renewable Energy 170, 773–784 (2021)
12. Scherelis, C., Penesis, I., Hemer, M., Cossu, R., Wright, J., Guihen, D.: Investigating biophysical linkages at tidal energy candidate sites: a case study for combining environmental assessment and resource characterisation. Renewable Energy 159, 399–413 (2020)
13. Zamudio-Ramirez, I., Antonino-Daviu, J., Osornio-Rios, R., de Jesus, R.-T., Razik, H.: Detection of winding asymmetries in wound-rotor induction motors via transient analysis of the external magnetic field. IEEE Trans. Industr. Electron. 67(6), 5050–5059 (2019)
14. Antonino-Daviu, J., Quijano-López, A., Climente-Alarcon, V., Garín-Abellán, C.: Reliable detection of rotor winding asymmetries in wound rotor induction motors via integral current analysis. IEEE Trans. Ind. Appl. 53(3), 2040–2048 (2017)
15. Ma, H., Zhang, Y., Wei, H., Fu, M., Huang, C.: Diagnosis of stator winding inter-turn short circuit in DFIG based on instantaneous average power in rotor side. Electric Power Autom. Equipment 4, 151–156 (2018)
16. Sharifi, R., Ebrahimi, M.: Detection of stator winding faults in induction motors using three-phase current monitoring. ISA Trans. 50(1), 14–20 (2011)
17. Zhang, J., Hang, J., Ding, S., Cheng, M.: Online diagnosis and localization of high-resistance connection in PMSM with improved fault indicator. IEEE Trans. Power Electron. 32(5), 3585–3594 (2016)
18. Hang, J., Zhang, J., Cheng, M., Huang, J.: Online interturn fault diagnosis of permanent magnet synchronous machine using zero-sequence components. IEEE Trans. Power Electron. 30(12), 6731–6741 (2015)
19. Ibrahim, R., Watson, S., Djurović, S., Crabtree, C.: An effective approach for rotor electrical asymmetry detection in wind Turbine DFIGs. IEEE Trans. Industr. Electron. 65(11), 8872–8881 (2018)
20. Hassanzadeh, R., bin Yaakob, O., Taheri, M., Hosseinzadeh, M., Ahmed, Y.: An innovative configuration for new marine current turbine. Renewable Energy 120, 413–422 (2018)
21. Elghali, S., Benbouzid, M.E.H., Charpentier, J.F.: Modelling and control of a marine current Turbine-driven doubly fed induction generator. IET Renew. Power Gener. 4(1), 1–11 (2010)
22. Pham, H., Bourgeot, J., Benbouzid, M.E.H.: Comparative investigations of sensor fault-tolerant control strategies performance for marine current turbine applications. IEEE J. Oceanic Eng. 43(4), 1024–1036 (2017)
23. Hernandez, C., Luis, J., Ledesma, A.-O., Quaternion, S.: Signal analysis algorithm for induction motor fault detection. IEEE Trans. Industr. Electron. 66, 1 (2019)
24. Helmi, H., Forouzantabar, A.: Rolling bearing fault detection of electric motor using time domain and frequency domain features extraction and ANFIS. IET Electr. Power Appl. 13(5), 662–669 (2019)
25. Hassan, O.E., Amer, M., Abdelsalam, A.K.: Induction motor broken rotor bar fault detection techniques based on fault signature analysis - a review. IET Electr. Power Appl. 12(7), 895–907 (2018)
26. Xie, T., Wang, T., He, Q., Diallo, D., Claramunt, C.: A review of current issues of marine current turbine blade fault detection. Ocean Eng. 218, 108194 (2020)
27. Wei, J., Xie, T., Shi, M., He, Q., Wang, T., Amirat, Y.: Imbalance fault classification based on VMD denoising and S-LDA for variable-speed marine current turbine. J. Marine Sci. Eng. 9(3), 248 (2021)
28. Wei, J., Xie, T., Wang, T.: A VMD denoising-based imbalance fault detection method for marine current turbine. In: IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society, pp. 2813–2818 (2020)
29. Zandi, O., Poshtan, J.: Fault diagnosis of brushless DC motors using built-in hall sensors. IEEE Sens. J. 19(18), 8183–8190 (2019)
30. Yang, M., Chai, N., Liu, Z., Ren, B., Xu, D.: Motor speed signature analysis for local bearing fault detection with noise cancellation based on improved drive algorithm. IEEE Trans. Industr. Electron. 67(5), 4172–4182 (2020)
31. Wang, T., Liu, L., Zhang, J., Schaeffer, E., Wang, Y.: A M-EKF fault detection strategy of insulation system for marine current turbine. Mech. Syst. Signal Process. 115, 269–280 (2019)
32. Zhang, M., Wang, T., Tang, T., Liu, Z., Claramunt, C.: A synchronous sampling based harmonic analysis strategy for marine current turbine monitoring system under strong interference conditions. Energies 12(11), 2117 (2019)
33. Huang, Y., Tang, Y., Van Zwieten, J., Jiang, G., Ding, T.: Remaining useful life estimation of hydrokinetic turbine blades using power signal. In: 2019 IEEE Power & Energy Society General Meeting (PESGM), pp. 1–5 (2019)
34. Saidi, L., Benbouzid, M., Diallo, D., Amirat, Y., Elbouchikhi, E., Wang, T.: PMSG-based tidal current turbine biofouling diagnosis using stator current bispectrum analysis. In: IECON 2019–45th Annual Conference of the IEEE Industrial Electronics Society 1, pp. 6998–7003 (2019)
35. Saidi, L., Benbouzid, M., Diallo, D., Amirat, Y., Elbouchikhi, E., Wang, T.: Higher-order spectra analysis-based diagnosis method of blades biofouling in a PMSG driven tidal stream turbine. Energies 13(11), 2888 (2020)
36. Trachi, Y., Elbouchikhi, E., Choqueuse, V.: Induction machines fault detection based on subspace spectral estimation. IEEE Trans. Industr. Electron. 63(9), 5641–5651 (2016)
37. Haddad, R.Z., Strangas, E.G.: On the accuracy of fault detection and separation in permanent magnet synchronous machines using MCSA/MVSA and LDA. IEEE Trans. Energy Convers. 31(3), 924–934 (2016)
38. Bouchikhi, E.H.El, Choqueuse, V., Benbouzid, M., Charpentier, J.F.: Induction machine fault detection enhancement using a stator current high-resolution spectrum. In: IECON 2012 38th Annual Conference on IEEE Industrial Electronics Society, pp. 3913–3918 (2012)
39. Yao, G., Pang, S., Ying, T.: VPSO-SVM based open-circuit faults diagnosis of five-phase marine current generator sets. Energies 13(22), 1–28 (2020)
40. Xie, T., Li, Z., Wang, T., Shi, M., Wang, Y.: An integration fault detection method using stator voltage for marine current turbines. Ocean Eng. 226, 108808 (2021)
41. Tajik, M., Movasagh, S., Shoorehdeli, M.A.: Gas turbine shaft unbalance fault detection by using vibration data and neural networks. In: 2015 3rd RSI International Conference on Robotics and Mechatronics (ICROM), pp. 308–313 (2016)
42. Li, Z., Wang, T., Wang, Y., Amirat, Y., Benbouzid, M., Diallo, D.: A wavelet threshold denoising-based imbalance fault detection method for marine current turbines. In: IEEE Access, pp. 29815–29825 (2020)
43. Benelghali, S., Benbouzid, M.E.H., Charpentier, J.F.: Generator systems for marine current turbine applications: a comparative study. IEEE J. Oceanic Eng. 37(3), 554–563 (2012)
44. Zhang, M., Wang, T., Tang, T.: Blade imbalance fault detection method for direct-driven marine current turbine with permanent magnet synchronous generator. Trans. China Electrotech. Soc. 33(1), 38–47 (2018)
45. Xie, T., Wang, T., Diallo, D., Razik, H.: Imbalance fault detection based on the integrated analysis strategy for marine current turbines under variable current speed. Entropy 22(10), 1069 (2020)
46. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495 (2017)
47. Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision, pp. 801–818 (2018)
48. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
49. Na, A., Aa, B., Eh, B.: Efficient 3D deep learning model for medical image semantic segmentation. Alex. Eng. J. 60(1), 1231–1239 (2021)
50. King, A., Bhandarkar, S.M., Hopkinson, B.M.: A comparison of deep learning methods for semantic segmentation of coral reef survey images. In: IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Work., pp. 1475–1483 (2018)
51. O’Byrne, M., Pakrashi, V., Schoefs, F., Ghosh, B.: Semantic segmentation of underwater imagery using deep networks trained on synthetic imagery. J. Marine Sci. Eng. 6(3), 93 (2018)
52. Goyal, M., Yap, M.H., Reeves, N.D., Rajbhandari, S., Spragg, J.: Fully convolutional networks for diabetic foot ulcer segmentation. In: IEEE International Conference on Systems, Man, and Cybernetics, pp. 618–623 (2017)
53. Zhao, X., Wu, Y., Song, G., Li, Z., Zhang, Y., Fan, Y.: A deep learning model integrating FCNNs and CRFs for brain tumor segmentation. Med. Image Anal. 43, 98–111 (2018)
54. Dudzik, M., Mielnik, R., Wróbel, Z.: Preliminary analysis of the effectiveness of the use of artificial neural networks for modelling time-voltage and time-current signals of the combination wave generator. In: 2018 International Symposium on Power Electronics, Electrical Drives, Automation and Motion (SPEEDAM), pp. 1095–1100 (2018)
55. Gemechu, A., Cui, G., Kong, L.: Beampattern synthesis with sidelobe control and applications. IEEE Trans. Antennas Propag. 68(1), 297–310 (2020)
56. Zhang, M., Wang, T., Tang, T.: An imbalance fault detection method based on data normalization and EMD for marine current turbines. ISA Trans. 68, 302–312 (2017)
57. Masci, J., Giusti, A., Dan, C., Fricout, G., Schmidhuber, J.: A fast-learning algorithm for image segmentation with max-pooling convolutional networks. In: 2013 IEEE International Conference on Image Processing. IEEE, pp. 2713–2717 (2014)
58. Luo, W., Li, Y., Urtasun, R., Zemel, R.: Understanding the effective receptive field in deep convolutional neural networks. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 4905–4913 (2016)
59. Youssef, A.: Image downsampling and upsampling methods. International conference on imaging science, systems, and technology. In: CISST’99, pp. 132–138 (1999)
60. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556
61. Pellegrini, T.: Comparing SVM, Softmax, and shallow neural networks for eating condition classification. In: INTERSPEECH 2015, pp. 899–903 (2015)
62. Neven, D., Brabandere, B.D., Georgoulis, S., Proesmans, M., Gool, L.V.: Fast scene understanding for autonomous driving (2017). arXiv preprint arXiv:1708.02550
63. Bottou, L.: Large-scale machine learning with stochastic gradient descent. Physica-Verlag HD 177–186 (2010)
64. Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2650–2658 (2015)
65. Yim, J., Sohn, K.A.: Investigating the feature collection for semantic segmentation via single skip connection (2017). arXiv preprint arXiv:1710.08192
66. Shi, W., Caballero, J., Theis, L., Huszar, F., Wang, Z.: Is the deconvolution layer the same as a convolutional layer? (2016). arXiv preprint arXiv:1609.07009
67. Zheng, Y., Wang, T., Xin, B., Xie, T., Wang, Y.: A sparse autoencoder and softmax regression-based diagnosis method for the attachment on the blades of marine current turbine. Sensors 19(4), 826 (2019)
68. Kendall, A., Badrinarayanan, V., Cipolla, R.: Bayesian SegNet: model uncertainty in deep convolutional encoder-decoder architectures for scene understanding (2015). arXiv preprint arXiv:1511.02680
69. Zeiler, M.: Adadelta: an adaptive learning rate method (2012). arXiv preprint arXiv:1212.5701
70. Yu, H., et al.: Methods and datasets on semantic segmentation: a review. Neurocomputing 304, 82–103 (2018)
71. Brostow, G., Fauqueur, J., Cipolla, R.: Semantic object classes in video: a high-definition ground truth database. Pattern Recognit. Lett. 30(2), 88–97 (2008)
Author Index
A
Abbasi, Ali, 235
Abboud, Dany, 178
Allard, Bruno, 307
Assoumane, Amadou, 178
B
Barszcz, Tomasz, 207
Bertail, Patrice, 19
Bielak, Łukasz, 108
C
Choklati, Abdelouahad, 193
Clerc, Guy, 307
D
Dehay, Dominique, 127
Dilay, Ihor, 352
Dudek, Anna E., 127
Dzeryn, Oksana, 145
E
Elbadaoui, Mohammed, 178
Elforjani, Mohamed, 330
G
Garay, Aldo M., 19
Grzesiek, Aleksandra, 41, 69
Gursky, Volodymyr, 352
H
Had, Anas, 193
Hernández, Fidel, 217
Hologne-Carpentier, Malorie, 307
Horupakha, Viktor, 340
Hoshyarmanesh, Hamidreza, 235
Hulina, Iryna, 340
Hurd, Harry, 127
J
Jablonski, Adam, 207
Jales C. S., Isaac, 19
Javorskyj, Ihor, 145
K
Kansabanik, Sourav, 265
Krot, Pavlo, 340, 352
L
Laha, S. K., 265
M
Makagon, Andrzej, 127
Maraj, Katarzyna, 1
Martynenko, Gennadii, 364
Mba, David, 330
Medina, Francyelle L., 19
Michalak, Anna, 69
R
Rashidi, Bahador, 279
Razik, Hubert, 307, 393
Rodríguez, Andy, 217
Ruiz, Mario, 217
S
Sabri, Khalid, 193
Semenov, Yurii, 340
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 F. Chaari et al. (Eds.): WNSTA 2021, ACM 18, pp. 427–428, 2022. https://doi.org/10.1007/978-3-030-82110-4
Shapovalova, Mariya, 378
Shumelchyk, Yevhen, 340
Stawiarski, Bartosz, 93
Swarnakar, B., 265
Szarek, Dawid, 108
U
Uke, K. J., 265
V
Vodka, Oleksii, 378
W
Wang, Tianzhen, 307, 393
Wyłomańska, Agnieszka, 1, 41, 69, 108
X
Xie, Tao, 393
Y
Yuzefovych, Roman, 145
Z
Zhao, Qing, 279
Zhou, Funa, 393
Zimroz, Radoslaw, 352