Progress in Geophysics
Victor Privalsky
Practical Time Series Analysis in Natural Sciences
Progress in Geophysics Series Editor Alexander B. Rabinovich, P. P. Shirshov Institute of Oceanology, Russian Academy of Sciences, Moskva, Russia
The “Progress in Geophysics” book series seeks to publish a broad portfolio of scientific books aimed at researchers and students in geophysics. The series includes peer-reviewed monographs, edited volumes, textbooks, and conference proceedings. It covers the entire research area including, but not limited to, applied geophysics, computational geophysics, electrical and electromagnetic geophysics, geodesy, geodynamics, geomagnetism, gravity, lithosphere research, paleomagnetism, planetology, tectonophysics, thermal geophysics, and seismology.
Victor Privalsky
Space Dynamics Laboratory (Retd.)
VEP Consulting
Logan, UT, USA
ISSN 2523-8388 ISSN 2523-8396 (electronic)
Progress in Geophysics
ISBN 978-3-031-16890-1 ISBN 978-3-031-16891-8 (eBook)
https://doi.org/10.1007/978-3-031-16891-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To the memory of Julius S. Bendat and Allan G. Piersol
Acknowledgements
The author is sincerely indebted to Max Malkin for his valuable help. Special thanks go to Professor Emeritus Randall Allemang for providing the engineering data for this book and to Alexander Rabinovich for recommending the book to Springer.
Contents
1 Introduction . . . . . 1
References . . . . . 16
2 Analysis of Scalar Time Series . . . . . 17
2.1 Introduction . . . . . 17
2.2 Preliminary Processing . . . . . 20
2.2.1 No Preliminary Processing Required . . . . . 23
2.2.2 Linear Trend . . . . . 28
2.2.3 The Hopping Averaging . . . . . 33
2.2.4 Seasonal Trend Removal . . . . . 37
2.2.5 Linear Filtering . . . . . 39
2.3 Time Domain Analysis . . . . . 43
2.4 Frequency Domain Analysis . . . . . 48
2.5 Statistical Predictability and Prediction . . . . . 50
2.6 Verification of GCM-Simulated Climate. The Scalar Case . . . . . 66
2.7 Engineering Time Series . . . . . 74
2.8 Conclusions . . . . . 78
References . . . . . 105
3 Bivariate Time Series Analysis . . . . . 107
3.1 Introduction . . . . . 107
3.2 Products of Bivariate Time Series Analysis with AVESTA3 . . . . . 112
3.3 Finding Dependence Between Time Series with AVESTA3 . . . . . 117
3.4 Teleconnection Between Global Temperature and ENSO . . . . . 140
3.5 Time Series Reconstruction . . . . . 154
3.6 Verification of GCM-Simulated Climate. The Bivariate Case . . . . . 159
3.7 Bivariate Analysis of Mechanical Engineering Time Series . . . . . 165
3.8 Conclusions . . . . . 169
References . . . . . 171
4 Analysis of Trivariate Time Series . . . . . 173
4.1 Products of Trivariate Time Series Analysis with AVESTA3 . . . . . 173
4.2 Application to Geophysical Data . . . . . 176
4.3 Analysis of Global, Hemispheric, Oceanic, and Terrestrial Data Sets . . . . . 181
4.4 Application to Engineering Data . . . . . 190
4.5 Conclusions . . . . . 192
References . . . . . 192
5 Conclusions and Recommendations . . . . . 193
References . . . . . 199
Abbreviations
AGST  Annual global surface temperature
AICc  Akaike's corrected information criterion
AR    Autoregressive
AR    Multivariate autoregressive
BIC   Bayesian information criterion
CAT   Criterion of autoregressive transfer function
ENSO  El Niño–Southern Oscillation
ESM   Electronic supplementary materials
GCM   Global circulation model
IPCC  Intergovernmental Panel on Climate Change
KFLT  See Table 2.1
KWT   Kolmogorov–Wiener theory
LFLT  See Table 2.1
MESA  Maximum entropy spectral analysis
MFLT  See Table 2.1
MTM   Multitapering method
NINO  Oceanic component of ENSO
PDF   Probability density function
PSI   Hannan–Quinn criterion
QBO   Quasi-biennial oscillation
RMS   Root mean square (value)
RPC   Relative predictability criterion
SSA   Singular spectral analysis
SSN   Sunspot numbers
SST   Sea surface temperature
UEA   University of East Anglia
Chapter 1
Introduction
The goal of this book is to provide researchers and students who do not have a thorough knowledge of the theory of random processes and the respective methods of analysis with a tool that allows one to easily obtain the statistical information which is necessary for studying natural processes and which is still almost unknown in natural sciences.

We live in a world where all processes generated by nature are random. This means that any natural process is a random (or stochastic) process whose behavior is controlled by probabilistic laws. In order to study these processes, understand them, and learn how to predict them, we need to apply methods of analysis developed within the framework of the theory of random processes. The current mathematical apparatus offers efficient methods for studying time series (the sample records of random processes) in both time and frequency domains, for building their stochastic models, which reveal the statistical properties of sample records, for classifying them as belonging to different types of random processes and, in particular, for time series prediction. The only non-random, or deterministic, process that we know on Earth is the tides, which are generated by the lunar and solar gravity forces; theoretically, they can be forecast without error at any lead time.

The methods developed within the theory of random processes are intended for working with sample records of scalar and multivariate random processes, that is, for the analysis of scalar and multivariate time series. The methods can be quite complicated, and learning them may be a problem for researchers in natural sciences. The book and its attachments are intended to greatly simplify the computational part of time series analysis and, what is especially important, to put it upon a proper mathematical basis.

Each time series (also called a random function of time) is a time-dependent sequence of random variables; its analysis includes some obligatory tasks which the researcher has to solve irrespective of his or her final goal. If the time series is scalar (a single record), such tasks include:
● determining the degree of proximity of its probability distribution function to the Gaussian (normal) curve; a Gaussian time series has some important properties not possessed by other processes;
● obtaining a sample estimate of the correlation function and its autoregressive extension;
● describing the time domain behavior of the time series explicitly with a stochastic difference equation;
● estimating the spectral density of the time series; if you do not know the time series spectrum, you know practically nothing about the time series;
● analyzing the statistical predictability of the time series and forecasting it in agreement with the theory of random processes.

Building a time domain model of the time series in the form of an equation is important because it gives you an explicit picture of its internal structure within the time domain. It is also necessary if you intend to predict the time series, that is, to determine its probable trajectory and the variance of the forecast error as functions of lead time. Scalar time series are analyzed and forecast here with the executable program AVESTA1.

If your time series is multivariate (consists of more than one scalar time series), the number of compulsory tasks increases. A time series analyzed with the executable program AVESTA3 attached to the book is treated as a sample record of a linear stochastic system having one output time series and one or two input processes. Aside from information about its probability distribution, correlation functions, and time domain models, you will get:

● quantitative estimates of the relationships between the time series components; the respective functions are frequency dependent because the relationship between time series varies with frequency (or with the time scale), and this is true for all other characteristics in the multivariate case; the quantitative indicator of the dependence between time series is called the coherence function;
● the response of the output to each input (defined earlier with time domain models in the form of multivariate stochastic difference equations), described now with functions of frequency: spectral densities, coherence functions, the coherent spectrum, frequency response functions (gain and phase factors), plus the previously unknown time lag factor.

All of these statistical characteristics are estimated and given to the user in a single printout. In the scalar case, this is done in one run of the program AVESTA1; multivariate time series are dealt with in a single run of the program AVESTA3. The time series that can be processed with AVESTA1 and AVESTA3 may contain from a few dozen to a million time units. On a home computer, one run of a program takes seconds or a couple of minutes when the time series length does not exceed 10⁴ and can take hours or longer when a multivariate time series is long. This seems to be acceptable for any natural science, from turbulence to climatology.

Obviously, the degree of detail and the statistical reliability of results depend upon the time series length. If you have a hundred observations of a time-dependent
random variable, you will not be able to obtain acceptable estimates of, roughly, more than ten coefficients in its scalar time domain model, and your estimates of frequency dependent functions (spectra, coherence, etc.) will be on the brink of being unreliable, especially at low frequencies. All estimates are given with respective confidence intervals or error variances.

The practical procedures of analysis with AVESTA1 and AVESTA3 are illustrated in this book with numerous examples using observed and simulated time series belonging to different natural phenomena, plus relatively short examples of analysis of engineering time series. The latter are included to give the reader an idea of the differences between engineering time series and data about natural phenomena. To the best of this author's knowledge, the two AVESTA programs are unique in natural sciences.

The book and the programs are intended to constitute easily applicable tools that allow one to obtain important statistical information about scalar, bivariate, and trivariate time series, including (in the scalar case) their statistical predictability properties and mathematically proper forecasting, and about interdependences between the scalar time series considered as components of multivariate stochastic systems.

The reader will find here many warnings against mathematically improper and at the same time dominant methods of time series analysis and forecasting which are used in natural sciences. Such warnings are repeated again and again in what follows because the level of mathematical knowledge in natural sciences lags many decades behind the current state of the theory of random processes and the respective methods of analysis so productively used in applied mathematics and engineering. However, in order to understand what a specific function of frequency is, there is no need to know how to calculate its estimate. Therefore, the reader may simply disregard the critical comments about the mathematical flaws in the current Earth, solar, and other natural sciences in the area of time series analysis and go straight to the instructions about practical analysis with the attached programs AVESTA1 and AVESTA3. Hopefully, the simplicity of learning how to use the programs and the large volume of useful information about statistical properties of time series provided by them may eventually convince at least some users that mathematical problems should be resolved by using proper mathematical tools. Studying and predicting natural processes is a mathematical task that belongs to the theory of random processes, and the attached programs help you to solve this task.

All random processes are controlled by probabilistic laws. Normally, every time series is a sample record of some random process and should be analyzed using the tools that have been developed within the framework of the theory of random processes. Each time series is a time-dependent sequence of random variables; the time series containing observations of natural phenomena constitute a common and highly valuable type of information about the state of nature-created systems. In particular, time series are especially important in climatology as indicators of climate variability at time scales from years to millennia, depending upon the volume of initial data.
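The dependence of reliability upon record length is easy to feel numerically. In the sketch below (plain Python with numpy, not part of the AVESTA programs; the AR(1) process and the record lengths are assumptions made for the illustration), the single coefficient of an AR(1) model is estimated from records of different lengths; the scatter of the estimates shrinks roughly as one over the square root of the length.

import numpy as np

rng = np.random.default_rng(4)
phi = 0.6                                  # true AR(1) coefficient (assumed)

def simulate(n):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

for n in (100, 1000, 10000):
    est = []
    for _ in range(200):                   # repeat the experiment 200 times
        x = simulate(n)
        x = x - x.mean()
        # least-squares estimate of the lag-1 coefficient
        est.append(np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1]))
    est = np.array(est)
    print(f"N = {n:5d}: mean estimate {est.mean():.3f}, "
          f"std of estimate {est.std():.3f}")

With N = 100 the standard deviation of even this single-coefficient estimate is already close to 0.08, which illustrates why a model with ten or more coefficients estimated from a hundred observations cannot be trusted.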
Time series must be analyzed and, if required and possible, forecast using methods developed in agreement with the theory of random processes. AVESTA1 is the tool for obtaining estimates of the most important statistical characteristics of a scalar time series: its spectral density, its predictability properties, and its mathematically proper forecast.

In many Earth sciences, time series are analyzed with the purpose of determining statistical properties of specific natural processes, including their relationships with each other, such as the dependence of sea level variations upon atmospheric conditions, the response of surface temperature to solar irradiance, the reaction of local weather or the seasonal cycle of precipitation to variations of the most famous phenomenon, the El Niño-Southern Oscillation (ENSO), etc. These tasks are solved with the AVESTA3 program, which provides information about dependences between scalar time series in both time and frequency domains: bivariate or trivariate stochastic difference equations, feedback indicators, coherence functions, coherent spectra, and frequency response functions.

A particularly important part of time series research in natural sciences is forecasting, which is supposed to be made on the basis of the time series behavior in the past. In all cases, it is necessary to remember that we are dealing with time series, that is, with time-dependent sequences of random variables. The attached programs provide information about the statistical predictability of the time series, including, in the scalar case, its optimal linear prediction within the classical Kolmogorov-Wiener theory of extrapolation. This theory, developed about 80 years ago, provides a mathematically strict solution of the forecasting problem; regrettably, the theory and the respective methods of extrapolation remain practically unknown in natural sciences.

For any time series, scalar or multivariate, we need to decide, first of all, whether it can be regarded as a sample of some stationary random process, that is, whether its statistical properties remain the same irrespective of the time origin of the ensemble of sample records (time series) belonging to the random process. If this condition is not satisfied, we are dealing with a nonstationary random process and, strictly speaking, it cannot be understood by studying a single sample of the process. The assumption of stationarity works quite often for time series, especially if the series does not contain a strong deterministic component. In this book, all time series are analyzed as stationary or as stationary plus a deterministic trend. This assumption is verified through the results of analysis, and it turns out to be acceptable in many situations.

The major properties of any stationary scalar time series are characterized with two quantities: the spectral density, or the spectrum, which describes the frequency domain distribution of the time series energy, and the probability density function (PDF). These characteristics are independent of each other: any two stationary time series with different PDFs may have identical spectral density estimates and, vice versa, any two stationary time series with the same PDF may have different spectra.

An important question in the frequency domain analysis of scalar and multivariate time series is what method should be used for estimating the time series spectrum. Historically, the first mathematically proper method was developed over 60 years ago and published in the book by R. Blackman and J. Tukey in 1958. The method is based upon a Fourier transform of the estimated covariance function
with subsequent smoothing of the result (e.g., the so-called Hanning operation). This approach was used in the first book by Bendat and Piersol (1966), where the Blackman and Tukey method was extended to the multivariate case, so that the spectrum estimate obtained in the scalar case becomes a spectral matrix. This matrix was used to obtain all frequency dependent characteristics of a multivariate time series whose scalar components were regarded as the inputs and outputs of a linear stochastic system.

The next step in the estimation of frequency dependent statistical characteristics was to split the initial time series into a number of subrecords, obtain a spectrum estimate of each of them through a Fourier transform, and then average the estimates over the entire ensemble of subrecords. The scalar version of this approach was proposed by Welch (1967) and extended to the multivariate case by Bendat and Piersol in 1971.

In the classical book by G. Box and G. Jenkins, first published in 1970, the spectral estimate is obtained through a parametric time domain model of the stationary time series (autoregressive, moving average, or mixed autoregressive and moving average). In the programs AVESTA1 and AVESTA3, the spectra and spectral matrices are calculated through a Fourier transform of the autoregressive time domain model fitted to the time series. If the time series is Gaussian, this approach is called maximum entropy spectral analysis, or MESA. In this book, the autoregressive approach is extended to multivariate time series. In the Gaussian case, the autoregressive estimate of the spectral matrix also presents a maximum entropy estimate. Thomson's multitapering method (MTM; Thomson 1982) works well in the scalar case, but this author is not aware of its applications in natural sciences for analyzing multivariate time series.

These four methods are mathematically correct. In practice, the Bendat and Piersol approach requires long initial time series. It seems to be dominant in engineering. In natural sciences, really long time series (10⁵–10⁶ time intervals) are relatively rare, thus leaving us with the Blackman and Tukey and the parametric approaches. In the multivariate case, the autoregressive frequency domain analysis used in the book has a weak spot: a mathematically strict method to determine confidence intervals for frequency domain functions does not seem to be available. An approximate approach suggested by this author many years ago is briefly explained in Chap. 2.

The frequency domain analysis in this book is always done through the time domain autoregressive models. This approach is selected because:

● it is physically reasonable: the time series behavior is studied as a function of its past plus noise,
● it provides a time domain model of the time series, and
● it can be used with really short time series (e.g., of length N = 50 in the scalar case).

In natural sciences, scalar spectral estimates are often obtained with methods which cannot be regarded as proper and reliable. For example, the Schuster periodogram, introduced at the end of the nineteenth century, should not be
used for spectral analysis because the estimate produced by it is statistically inconsistent: its variance does not decrease as the length of the time series increases (e.g., Bendat and Piersol, 2010, Sect. 5.2.2). A special case is the so-called singular spectral analysis (SSA). First of all, SSA is not a method of spectral analysis because the result of the entire procedure does not include a spectral estimate. Essentially, SSA is close to a filter designed for detecting a signal against a noisy background. The tools used by it (covariance matrices, eigenvalues, and eigenvectors) belong mostly to classical mathematical statistics rather than to the theory of random processes. This method cannot be recommended for frequency domain analysis of univariate (scalar) and multivariate time series, and one should be very cautious with the estimates of frequency dependent functions obtained independently, with spectral analysis methods, after the last SSA stage.

In addition to estimating the proximity of the time series PDF to the Gaussian (normal) probability distribution, the program AVESTA1 produces a sample estimate of the time series correlation function, which is then augmented with its autoregressive (maximum entropy in the Gaussian case) extension. It also produces a quantitative description of the dependence of the time series upon its behavior in the past in the form of a stochastic difference equation, and of the properties of the innovation sequence (the random component of the equation). The AVESTA1 program also describes, in accordance with the user's instructions, the degree of the time series predictability and produces its forecast at lead times from a single time step to a predictability limit that depends upon the prediction error variance.

The time domain information about the statistical properties of the time series is given for each autoregressive order as a stochastic difference equation, which describes in a very simple form the dependence of the time series upon its past. These equations and the respective spectral estimates are given by AVESTA1 under the condition that the time series can remember its behavior in the past from a single time step to a user-prescribed maximum. A spectrum estimate is given for each step within this interval, and then the optimal model is selected and described in more detail. The mathematically proper forecast can be obtained for the time series model selected with one of the five criteria developed in information theory.

If the time series is multivariate, that is, if we have a set of potentially interdependent scalar time series, the goal of analysis is to describe the dependence of the output time series upon the input scalar components. The quantitative information provided by the executable program AVESTA3 is much more detailed. It includes estimates of

● covariance and cross-covariance functions; correlation and cross-correlation functions,
● multivariate stochastic difference equations that describe the time domain dependence of the output time series upon its own past and upon the past of all other scalar time series that belong to the multivariate system,
● statistical properties of innovation sequences and respective predictability criteria,
● spectral density of each scalar component of the multivariate time series,
● ordinary and multiple coherent spectra and ordinary, multiple, and partial coherence functions,
● frequency response functions in the form of gain and phase factors for each output/input track,
● time lag factors corresponding to the phase factors.

These functions are necessary for studying relations between time series, and most of them remain unknown in natural sciences. Examples of tasks that can be treated in full or in part with the information provided by AVESTA3 include studies of the so-called teleconnections in the Earth system, causality and feedback properties, and time series reconstructions. If the structure of the time series is complicated and its memory extends into a deeper past, the time domain information, which is still available for all models, may be too cumbersome. In those cases, the analysis should be concentrated upon the frequency domain information given by the program. All time and frequency domain estimates are given with respective confidence intervals.

Unfortunately, the current situation with the analysis of data presented as time series, first of all in the Earth and solar sciences (but excluding the solid Earth science), is deplorable due to the lack of knowledge of the theory of random processes, information theory, and classical mathematical statistics. Consequently, the proper methods that should be applied to perform analysis of scalar and multivariate time series in both time and frequency domains are practically unknown in natural sciences. These flaws are most severe and harmful in the science of climatology, and this is especially regrettable today due to the importance of statistical analysis of the time series data used within the Intergovernmental Panel on Climate Change (IPCC) project intended to predict the effects of human activities upon the Earth climate.

Briefly, the most common and painful mistakes in the analysis of scalar time series characterizing phenomena with time scales from seconds to millennia and longer include:

● the lack of interest in probability distributions of the data,
● low interest in the most important statistical characteristic of any stationary time series, its spectral density, and the use of improper methods for spectrum estimation,
● the lack of confidence intervals for statistical estimates, especially for the spectrum estimates, or the use of a mathematically incorrect criterion of statistical significance, and
● the practically complete ignorance of the classical Kolmogorov-Wiener theory of extrapolation (prediction, forecasting) of stationary random processes.

In the multivariate case, the most harmful flaw seems to be the lack of understanding that the properties of multivariate time series are frequency dependent, so that the methods of mathematical statistics are not applicable for their analysis. As a result of this ignorance, at least two of the most common and probably most important
tasks of multivariate time series analysis in natural sciences, the time series reconstruction (used, first of all, for restoring the climate of previous decades, centuries, and millennia) and the so-called teleconnection analysis, are being treated with improper methods, and the respective results should be regarded as mathematically incorrect.

The lack of interest in the probability distributions of data, both scalar and multivariate, is also highly regrettable. If a time series belonging to a stationary random process is Gaussian, its best possible prediction (forecast, extrapolation) can be obtained only with methods that agree with the theory of extrapolation developed about 80 years ago by Andrey Kolmogorov and Norbert Wiener. The Kolmogorov-Wiener theory (KWT) proves that its solution ensures the smallest variance of the forecast error within the class of linear methods of extrapolation and for any method, linear or nonlinear, in the Gaussian case. Obviously, this means that if you are trying to forecast a stationary time series using a linear method, you must do it in accordance with the KWT. It also means that if your stationary time series is Gaussian, you must forecast it with a linear method that agrees with the KWT. Forecasting a time series is a mathematical problem successfully resolved in the theory of stationary random processes many decades ago, and it must be executed with tools created for this purpose in accordance with this theory. Besides, a Gaussian stationary time series is ergodic, which means that its statistical properties estimated from a single time series describe the properties of the random process that generated it. A non-Gaussian stationary time series does not necessarily have this property.

Another statistic which is highly important for the extrapolation of time series is the spectral density. It serves as a basis for the first practically useful method of extrapolation (Yaglom 1962), and it gives the researcher an immediate idea about the statistical predictability of the time series. If the spectrum is concentrated in a narrow low-frequency band, the statistical predictability of the time series may be high. If the spectrum is concentrated within a relatively narrow band at intermediate frequencies, the time series may contain quasi-periodic random vibrations. If the spectrum is concentrated at high frequencies (not a common case in natural sciences), its predictability is low. Finally, if the spectral density is close to a constant, the time series is close to a white noise, a sequence of identically distributed and mutually independent (uncorrelated in the non-Gaussian case) random variables, and it is unpredictable. Of course, a reliable prediction always requires a sufficiently long time series.

Numerous methods of time series forecasting which are used in natural sciences, such as neural networks, machine learning, etc., suffer from a critical flaw: the lack of a mathematical theory as a basis for the forecast. This flaw regularly appears in methods of forecasting applied to time series in natural sciences, which makes all of them purely empirical. The almost complete lack of references to the classical Kolmogorov-Wiener theory and the general misunderstanding of the methods which can and should be used for extrapolation of stationary random processes is puzzling and presents a serious lapse in natural sciences. This comment includes both scalar and multivariate cases.
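To make the connection between the spectrum and statistical predictability concrete, here is a minimal sketch in Python (an illustration of the Kolmogorov-Wiener idea for the simplest Gaussian AR(1) model; it is not part of the AVESTA programs, and the coefficient and variance values are assumptions chosen for the example). For AR(1) the optimal linear forecast and its error variance are known in closed form, so the sketch shows how fast the forecast error grows toward the process variance as the lead time increases.

import numpy as np

# Kolmogorov-Wiener (optimal linear) extrapolation of a Gaussian AR(1)
# process x_t = phi*x_{t-1} + a_t: the k-step forecast is phi**k * x_t
# and its error variance is var_x * (1 - phi**(2k)).
phi, var_a = 0.8, 1.0                  # assumed AR coefficient, innovation variance
var_x = var_a / (1.0 - phi**2)         # variance of the process itself

rng = np.random.default_rng(0)
x = np.zeros(1000)
for t in range(1, len(x)):             # simulate one sample record
    x[t] = phi * x[t - 1] + rng.normal(scale=np.sqrt(var_a))

for k in (1, 2, 5, 10):
    forecast = phi**k * x[-1]                  # optimal linear k-step forecast
    err_var = var_x * (1.0 - phi**(2 * k))     # its error variance
    print(f"lead {k:2d}: forecast {forecast:+.3f}, "
          f"error variance {err_var:.3f} of {var_x:.3f}")

With phi close to one, the spectrum is concentrated at low frequencies and the error variance grows slowly, that is, the predictability is high; with phi = 0 the process is white noise and the error variance equals the process variance at any lead time.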
The methods traditionally used in natural sciences for the analysis of multivariate time series suffer from the same major flaw: they are mathematically incorrect. They are mostly based upon the regression equation, that is, upon a method that belongs to classical mathematical statistics, which has nothing to do with time series. The reliability of regression analysis results is measured with the cross-correlation coefficient, which is correct for sets of random variables and inapplicable to time series, the time-dependent sequences of random variables. If we have two sets of random variables which do not depend upon time, and if the linear cross-correlation coefficient between them is sufficiently high, we can build a regression equation which will allow us to calculate the linear part of set one through set two and vice versa. If we change the order of the pairs of variables in the same manner in each set (say, number the pairs and put the pairs with even numbers after the pairs with odd numbers), the cross-correlation coefficient and the linear regression equation will still be the same (a short numerical sketch of this invariance is given at the end of this passage). Being independent of time, the sets of random variables have no correlation and cross-correlation functions, no spectra, and no other functions of frequency that characterize dependences between time series. Also, random variables cannot be predicted, because the idea of predicting something that does not depend upon time is absurd. These considerations also hold for nonlinear regression between time-invariant random variables. Historically, linear regression analysis belongs to the nineteenth century; it remains correct and valuable for time-independent random vectors but not for multivariate time series. Yet, in natural sciences, it is still practically the only tool for analyzing interdependences between time series. This is happening in spite of the fact that the proper methods for analysis of multivariate time series, based upon the 66-year-old publication by Gelfand and Yaglom (1957), became available more than 50 years ago and are widely used in other areas of science and technology.

In order to understand a possible reason or reasons for this backwardness of natural sciences, it seems reasonable to see what is happening in engineering applications of the methods developed within the framework of the theory of stationary random processes. In this book, we will give examples of time series encountered in mechanical engineering and examples of their scalar, bivariate, and trivariate analysis. A major reason for the startling difference between what engineers do in order to study the effects of random loads upon their constructions or devices and what is being done in natural sciences for studying dependences between time series seems to be the fact that engineers are accountable for what they are doing to practically all of us, the consumers. The consumers will not pay for poorly designed goods: coffee grinders, cars, airplanes, spacecraft, and other items which are supposed to be built to last and to stay safe and reliable for a sufficient length of time. The same is true for all areas of technology that serve billions of users who do not belong to the same community as the designers and manufacturers of the respective products. As consumers, we show our attitude to unreliable and unsatisfactory products by not purchasing them, thus acting as reliable, independent, and honest reviewers.
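As promised above, the invariance of the cross-correlation coefficient to the reordering of pairs is easy to verify numerically. The sketch below (plain Python with numpy; the two series and their one-step lag are invented purely for illustration) shuffles both series with the same permutation: the simultaneous correlation coefficient is unchanged, while the lagged, time-dependent structure disappears.

import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = np.zeros(n)
for t in range(1, n):                     # a strongly autocorrelated series
    x[t] = 0.9 * x[t - 1] + rng.normal()
y = np.roll(x, 1) + 0.5 * rng.normal(size=n)   # y follows x with a one-step lag

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

perm = rng.permutation(n)                 # the same reordering for both series
print("corr(x, y), original data :", round(corr(x, y), 3))
print("corr(x, y), shuffled pairs:", round(corr(x[perm], y[perm]), 3))  # identical
print("lag-1 cross-corr, original:", round(corr(x[:-1], y[1:]), 3))
print("lag-1 cross-corr, shuffled:", round(corr(x[perm][:-1], y[perm][1:]), 3))

The simultaneous correlation survives the shuffle exactly; the lagged dependence, which is what distinguishes time series from time-independent random vectors, is destroyed.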
Among the Earth sciences, the role of the engineers seems to belong to the meteorologists, who have learned how to forecast random variations of weather with rather high reliability on the basis of fluid dynamics models and observation data. Their forecasts are given in a probabilistic form and are usually quite satisfactory at lead times
up to at least a week, and we trust them because they have proved the reliability of their predictions many times.

In contrast with this situation, we, in other natural sciences and especially in climatology, are the ones who publicize the results of our research by issuing books and articles, and we are the ones who review and approve them, the ones who are supposed to read these books and articles and respond to them by producing new books and articles to be read by the members of the same community. We are the producers, reviewers, and consumers of our products, with no really independent referees. We are responsible to nobody but ourselves. And our students seem to be taught the science in accordance with what we know, and then the cycle is repeated. If our results and predictions are incorrect, the consequences will tell upon everybody. We know that we cannot produce reliable forecasts of nature-caused variations of climate, and waiting for the results of long-term forecasts that include external forcing will take decades. Therefore, we have to initiate public campaigns to convince everybody that we are right in our views and that the entire scientific community or its overwhelming majority agrees with us. (This latter argument does not belong to science.) Could this be the reason, or one of the reasons, why we lag behind the engineers in the area of time series analysis, and what is the way or ways to improve the situation?

Similar problems may exist in other sciences, but in this specific case of time series analysis we are involved in an area which belongs to a mathematical discipline: the probability theory and its part that is called the theory of random processes. It looks like our knowledge in this area is not solid enough even to select mathematically proper tools for applications in our research. Obviously, for this huge climate change program dealing, first of all, with natural, that is, random processes, we need independent reviewers whose areas of expertise cover the theory of random processes and the methods of their analysis and prediction.

To be specific, the area of time series analysis in natural sciences suffers from several serious flaws caused by a misunderstanding of the role played by classical mathematical statistics. On the one hand, mathematical statistics is not used by us when it must be used; on the other hand, it is used for solving tasks that have nothing to do with this important discipline. For us, mathematical statistics must be used, in particular, for answering the following two questions of primary importance: how close is the probability density function of my time series to the Gaussian (normal) distribution, and how reliable are my estimates of the time series statistical properties?

If a time series, scalar or multivariate, can be regarded as Gaussian, it means that applying any nonlinear method for its analysis and forecasting will be wrong. For example, it means that transforming the original data by squaring it or by taking its logarithm (if the data are strictly positive) would be senseless and would result in incorrect conclusions about the time series properties. It also means that if I intend to forecast a Gaussian time series, I must do it in accordance with the theory of extrapolation created by A. Kolmogorov and N. Wiener. No other approach, linear or nonlinear, can produce better results (also see the preface to Sect. 2.5).
Many if not most of our publications on time series analysis do not even contain information about the probability density function of the time series which is being studied.
The cases when estimates of statistical characteristics are given without properly built confidence bounds are numerous, especially when the estimated statistic is the spectral density. All mathematically proper methods of spectral analysis provide the user with a way to determine the confidence limits corresponding to different levels of statistical significance. Not showing the reliability of statistical estimates makes them practically useless. These are some examples of cases when the rules of mathematical statistics are improperly ignored.

This book is an attempt to correct at least some of the above-described omissions by offering two mathematically proper, easily applicable, and fast working tools for the analysis of scalar and multivariate time series, including their predictability and prediction (the latter only in the scalar case). These tools, the executable programs AVESTA1 and AVESTA3, lie within the framework of the theory of stationary random processes, mathematical statistics, and information theory. They allow one to easily obtain time and frequency domain information about scalar time series and about time series which consist of several scalar components. Another goal of the book is to describe, in numerous examples, how to interpret the results of analysis with AVESTA1 and AVESTA3 in order to understand the statistical and, if possible, physical properties of the time series. The examples from mechanical engineering show how complicated random processes can be, and they are intended to convince the reader that proper methods of time series analysis lead to achievements which are vital for the normal existence of the entire society.

The information required for understanding random processes that occur in nature and which is provided by the attached programs includes a quantitative description of time series properties in both time and frequency domains. The results of computations contain some simple equations that describe the statistically averaged behavior of the time series as a function of time and a single function of frequency in the scalar case, or a set of such functions, which allow one to see how the statistical properties of the process vary along the frequency axis, that is, at different time scales. The function that gives the researcher a good idea about the time series behavior in the time domain and quantitatively describes it in the frequency domain is the spectrum, or the spectral density. The other functions of frequency, which define the properties of two or more scalar time series treated as a single stochastic system, will be described below. Again, the results of analysis with the executable programs attached to this book include:

● estimates of statistical moments to test the time series probability density function (PDF),
● covariance (in the multivariate case) and correlation functions,
● optimal time domain autoregressive (AR) models of the time series expressed with scalar or multivariate stochastic difference equations,
● predictability properties and prediction (the latter for scalar time series only),
● in the multivariate case, data about feedbacks, information rate, and causality, and
● respective estimates of time series properties in the frequency domain:
– spectra,
– ordinary and multiple coherent spectra,
– ordinary, multiple, and partial coherence functions,
– information rates according to Gelfand and Yaglom (1957),
– gain, phase, and time lag factors for each input process.
All these functions, with the exception of sample covariances and correlations, are obtained through fitting autoregressive models to scalar or multivariate time series, selecting the optimal models, and transforming them into the spectral density (the scalar case) or the spectral matrix (the multivariate case). The frequency domain functions listed above are obtained from the spectral matrix, and their meaning is explained in Sect. 3.2.

Autoregressive modeling is known in natural sciences, but a simple explanation is given below for the user who is not familiar with the parametric approach to time series analysis. The autoregressive model presents the values of a discrete time series as a linear function of its past values plus noise. The time series x_t, t = 1, ..., N, having a zero mean value is presented as

x_t = ϕ_1 x_{t-1} + ... + ϕ_p x_{t-p} + a_t,

where ϕ_j, j = 1, ..., p, are the AR coefficients, p is the AR order, and a_t is the noise component: a zero mean sequence of mutually independent (or uncorrelated) random variables. In the multivariate case, there will be several equations (two for a bivariate time series, etc.), and each equation may contain contributions from all other scalar time series. Details will be given in the following chapters.

To the best of the author's knowledge, the current software used in natural sciences for time series analysis, both free and commercial, does not have the ability of the programs AVESTA1 and AVESTA3 to provide both time and frequency domain information about scalar and multivariate time series in one run of the program. At the same time, this approach is known in engineering, but there the autoregressive models are very complicated and their parametric analysis in the time domain is practically impossible. In Earth and solar sciences, especially in climatology, the models often have a low AR order, which allows one to study them in the time domain as well.

The numerical analysis of time series with the attached programs AVESTA1 and AVESTA3 consists of two stages which are taken care of in one run of the program:

● preliminary processing of the time series by the program to obtain the version that will be analyzed,
● obtaining estimates of time and frequency domain quantities of the time series (scalar or multivariate).

All statistical estimates are given with respective reliability indicators in the form of random error variances and/or confidence bounds. Without these quantities, the knowledge of the time series properties is severely flawed and should be ignored.
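For readers who want to see the parametric chain of steps in open code, the following sketch in plain Python with numpy fits AR(p) models by the Yule-Walker/Levinson-Durbin recursion, selects the order with AIC, and Fourier-transforms the optimal model into a spectral density estimate (the maximum entropy estimate in the Gaussian case). It is an independent illustration of the approach, not the AVESTA algorithm itself; the simulated AR(2) test series and the maximum candidate order of 20 are assumptions made for the example.

import numpy as np

def autocov(x, maxlag):
    """Biased sample autocovariances r_0 ... r_maxlag."""
    x = x - x.mean()
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(maxlag + 1)])

def levinson(r, p):
    """Yule-Walker AR(p) fit via the Levinson-Durbin recursion.
    Returns the coefficients phi_1..phi_p and the innovation variance."""
    phi = np.zeros(p + 1)
    prev = np.zeros(p + 1)
    sigma2 = r[0]
    for k in range(1, p + 1):
        acc = r[k] - sum(prev[j] * r[k - j] for j in range(1, k))
        ref = acc / sigma2                    # reflection coefficient
        phi[k] = ref
        for j in range(1, k):
            phi[j] = prev[j] - ref * prev[k - j]
        sigma2 *= 1.0 - ref**2
        prev = phi.copy()
    return phi[1:], sigma2

rng = np.random.default_rng(2)
n = 500
x = np.zeros(n)
for t in range(2, n):                         # simulated AR(2) test series
    x[t] = 0.9 * x[t - 1] - 0.5 * x[t - 2] + rng.normal()

r = autocov(x, 20)
aic = lambda p: n * np.log(levinson(r, p)[1]) + 2 * p
best = min(range(1, 21), key=aic)             # order selection by AIC
phi, sigma2 = levinson(r, best)
print("selected AR order:", best, " coefficients:", np.round(phi, 2))

f = np.linspace(0.0, 0.5, 256)                # frequency, cycles per time step
z = np.exp(-2j * np.pi * np.outer(f, np.arange(1, best + 1)))
s = sigma2 / np.abs(1.0 - z @ phi)**2         # autoregressive spectral density
print("spectral peak near f =", round(float(f[np.argmax(s)]), 3))

A production tool would, like AVESTA1, add confidence bounds for the spectrum and compare several order selection criteria (AICc, BIC, etc.); the point of the sketch is only the route from the time domain model to the frequency domain.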
The resulting information includes a time and frequency domain statistical description of time series models for autoregressive orders from 0 (or 1 in the multivariate case) to the maximum order M given by the user. The time series length should not exceed 10⁶; the order M should not be higher than 99 (scalar) or 50 (multivariate). In the scalar case, the program AVESTA1 requires slightly over 2 s to obtain all AR models from 0 through 99 for a time series of length 10⁵. This includes more detailed results for the optimal models selected by one of the five information criteria used in the programs. All that the user needs to do in order to run a program is to have a time series and a short file CAT.DAT with the initial parameters that control the program. The more complicated multivariate program AVESTA3 requires less than 5 min to calculate bivariate AR models from 1 through 50 for a time series of length 10⁵. It takes seconds for time series of length of several thousand but can take many hours for really long (close to 10⁶) trivariate time series. The computation results include detailed information about the time series properties in the time domain for each autoregressive order and in the frequency domain for the models selected by the five order selection criteria.

The book contains examples of computations and analysis of their results in Earth sciences, solar research, and engineering. It cannot be regarded as a substitute for such important and almost universal tools of time series analysis as, for example, the monograph by Shumway and Stoffer (2017) or the Signal Processing Toolbox in Matlab. This book deals with a number of tasks designed for researching the most important properties of time series in the time and frequency domains; this restriction allows one to create executable programs that provide the necessary information in a single run for a sequence of models, with subsequent selection of the optimal models recommended by the order selection criteria and in accordance with the user's instructions. The information obtained for the optimal model is more detailed in both the scalar and multivariate programs.
of the frequency response functions connecting the components of multivariate time series to each other. The analysis of models of autoregressive orders from zero to a maximum value prescribed by the user is conducted in one run of the program. The book contains some theoretical information given in a straightforward form with just a few simple equations and, hopefully, sufficient for giving the reader an understanding of the character of random processes generated by nature. The easy-to-use tools of time series analysis offered in this book help the reader and user to avoid approaches that lead to mathematically erroneous results and to incorrect decisions based upon them. In accordance with the considerations given in this chapter, the book contains three more chapters dedicated to scalar, bivariate, and trivariate time series analysis in time and frequency domains. Each chapter contains a short and simple explanation of the theoretical basis, a description of respective computer program, and examples of its applications with comments and conclusions. The programs produce a large volume of information but the accent in our approach is made upon analyzing the results obtained through the use of the program rather than upon describing how to obtain these results. The results are obtained here quite straightforwardly due to the convenience in using the programs. This approach makes the user’s task much easier. Additionally, Chap. 2 contains sections dedicated to the preliminary processing of scalar time series. This stage includes different types of transformation of the initial time series through data averaging (for example, switching from hourly to daily data by averaging over 24 h, or seasonal trend deletion) and filtering with low-pass, highpass, or band-pass linear filters. The time series is analyzed for the presence of a linear trend which can be deleted in accordance with its intensity and/or with user’s instructions. These options are available in both AVESTA1 and AVESTA3 programs. Chapters 3 and 4 deal with bi- and trivariate time series regarded as linear stochastic systems with one output and one or two input time series. Such data are normally analyzed in natural sciences through regression equations and, consequently, lead to incorrect results. The scalar program AVESTA1 provides, among other things, detailed information about statistical predictability of the time series and can forecast it according to the Kolmogorov-Wiener theory of extrapolation. Chapters 3 and 4 also contain some information about statistical predictability but do not include prediction examples. The time series forecasting (prediction, extrapolating) is the task that is regularly mistreated in natural sciences by applying methods that are not based upon any mathematically proper theory. In natural sciences including climatology and other areas of geophysics or solar physics, we always deal with random processes, even in the cases when the forecast is based upon numerical solutions of fluid dynamics equations (e.g., weather prediction or climate “projections”). A proper mathematical theory as a foundation for time series forecasting exists and is absolutely necessary for practical applications. It presents an important part of the probability theory but stays practically unknown in natural sciences. The lists of references in this book are intentionally short. 
If they were to include references to the different methods of analysis and, especially, forecasting used in natural sciences, they would probably have to list dozens of monographs and thousands of
articles. Unfortunately, many or even most of them suffer from the lack of a theoretical foundation, and this drawback can be healed only through acquaintance with the respective mathematical theory. The reference list here includes several classical monographs that have been written at a more or less engineering level. The requirement regarding close knowledge of mathematical issues becomes less important when the analysis is conducted with the programs AVESTA1 and AVESTA3.

The AVESTA executable programs given here were written in FORTRAN 35–45 years ago, with subsequent updates by the author, who is not a professional programmer. To the best of the author's knowledge, the programs do not contain errors, but there is an unsolved problem, related to the error variances of estimated frequency dependent quantities in the bivariate and trivariate cases, that requires the attention of professional mathematicians. The problem is caused by the lack of a mathematically strict solution for the error variances of frequency domain characteristics obtained for multivariate time series through time domain autoregressive models. The solution suggested by this author is approximate. Another problem is the unreliable estimation of the partial coherent spectra in the trivariate case related to the ordering of the input processes; it is discussed in Chap. 4. The linear extrapolation of multivariate time series is not included in the AVESTA3 program.

Yet, having in mind the disregard of the theory of random processes and the respective methods of analysis that exists in Earth sciences and in solar physics, the tools offered here may draw the attention of natural science researchers to the classical works published for applied scientists and engineers since the early 1960s (Yaglom 1962, 1986; Bendat and Piersol 1966, 2010; Box et al. 2015). Actually, this is not the first document containing software for autoregressive analysis of scalar and multivariate time series in the time and frequency domains: similar programs were published by the Utah Climate Center almost 30 years ago (see Privalsky and Jensen 1993).

To conclude this introductory chapter, this book does not contain a list of problems for the reader. The reason is very simple: the entire book is designed as a manual for using the executable programs AVESTA1 and AVESTA3. Every example given here can be regarded as a problem; by repeating the examples and making sure that the results coincide with what is given in the book, the user finds the answers to the problems and gains experience in applying the programs to practical analysis of time series. The data that are not easily available on the Internet are given in the Electronic Supplementary Material (ESM) attached to the book. Sections 2.6 and 3.6, dedicated to the verification of climate modeling results, are written in co-authorship with V. P. Yushkov.
References

Bendat J, Piersol A (1966) Measurement and analysis of random data. Wiley, New York
Bendat J, Piersol A (1971) Random data: analysis and measurement procedures. Wiley-Interscience, New York
Bendat J, Piersol A (2010) Random data: analysis and measurement procedures, 4th edn. Wiley, Hoboken
Blackman R, Tukey J (1958) The measurement of power spectra. Dover Publications, New York
Box GEP, Jenkins GM (1970) Time series analysis: forecasting and control. Holden-Day, San Francisco
Box G, Jenkins G, Reinsel G, Ljung G (2015) Time series analysis: forecasting and control, 5th edn. Wiley, Hoboken
Gelfand I, Yaglom A (1957) Calculation of the amount of information about a random function contained in another such function. Uspekhi Matematicheskikh Nauk 12:3–52. English translation: American Mathematical Society Translation Series 2(12):199–246, 1959
Granger C, Hatanaka M (1964) Spectral analysis of economic time series. Princeton University Press, Princeton
Privalsky V, Jensen D (1993) Time series analysis package. Utah State University, Utah Climate Center
Shumway R, Stoffer D (2017) Time series analysis and its applications. Springer, Switzerland
Thomson D (1982) Spectrum estimation and harmonic analysis. Proc IEEE 70:1055–1096
Welch P (1967) The use of Fast Fourier Transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Trans Audio Electroacoust AU-15:70–73. https://doi.org/10.1109/tau.1967.1161901
Yaglom A (1962) An introduction to the theory of stationary random functions. Prentice Hall, Englewood Cliffs
Yaglom A (1986) Correlation theory of stationary and related random functions: basic results. Springer, New York
Chapter 2
Analysis of Scalar Time Series
2.1 Introduction

All nature-generated phenomena on the Earth are controlled by randomness. These can be random events such as the number of lightning strikes per day, random phenomena such as hurricanes, or random processes such as hourly or annual variations of surface temperature. This entire book is dedicated to the analysis of time series, that is, of time-dependent sequences of random variables; each time series, if it is not simulated, presents a sample of a natural process and should be treated as a source of information about the respective nature-generated process. This chapter is dedicated to the analysis and forecasting of scalar, or univariate, time series.

If a process running in time can be predicted without error, it is deterministic. A process running in time that does not possess this property of errorless predictability is random. A stricter definition is that a process occurring in time and controlled by probabilistic laws is random. All processes generated by the Earth's nature are random. The only deterministic processes on our planet are the tides, but they are caused by the lunar and solar gravity forces, not by the Earth system. In engineering, there may be both deterministic and random processes but, with one exception, we will be dealing in this book with random processes.

The processes studied here are always discrete, that is, they present sequences of random values separated from each other by a constant time interval. The distance between consecutive values of the process defines the sampling rate (for example, one observation per day or sixty observations per minute). A time series is a time-dependent sequence of random variables, and it always presents a sample record of some random process that generated the time series. It should always be remembered that a time series is a random function of time, and this definition makes it fundamentally different from sets of random variables, which do not
depend upon time. Consequently, the methods of time series analysis differ from the methods of analysis of random variables. When studying a time series, we understand that it has been generated by some random process such as air temperature variations (geophysics), daily, monthly, or annual changes in the sunspot numbers (solar science), or vibrations of an artificial structure caused by wind (engineering). All observed or calculated time series have a finite length, and it is normally assumed that the characteristics estimated from a time series of finite length describe the behavior of the process to which the time series belongs both in its past and in its future, or in the form of a different sample record. Random variables, in contrast, are not time dependent and have no past and no future.

Moreover, we assume that if we were to have, or do have, many time series generated by the same process, its statistical properties obtained by averaging over repeated observations in the form of time series obtained at different initial times would be independent of those time origins. For example, let the process being studied be the sea waves caused by storms at a specific geographical point, and let us measure its characteristics for many storms, say, for 60 min after each storm becomes well developed. In engineering, it could be the response of new cars of the same model to the same uneven road measured at the same moments after each car reaches a constant speed. If we measure the statistical characteristics of the sea waves or of the car response by averaging over a set of sample records at the same moments of time for many individual experiments and find that the results do not depend upon the initial moment, the random process is stationary. A simplified definition is that a process is stationary if its statistical properties do not depend upon the time origin.

If it also turns out that the results of averaging over an ensemble of experimental records at different initial times coincide with the results obtained by averaging over any individual time series from a large series of experiments, the process is ergodic. In the ergodic case, it is proper to say that the correctly and reliably estimated statistical characteristics have been the same in the past and will be the same in the future. In other words, by analyzing a single sample record of an ergodic process, we obtain statistical characteristics of the random process as a whole. As our experiments can be repeated only a limited number of times, and as the length of the time series obtained during each experiment is finite, getting absolutely correct estimates is not possible, and the issue of coincidence of estimates is to be resolved within the framework of classical mathematical statistics. This means that estimates are regarded as equivalent if it can be shown that the differences between them stay within the range of sampling variability of the respective estimates. Building confidence intervals for estimates of statistics is a major function of classical mathematical statistics.

In this book, as in the majority of other applied studies of time series, the properties of stationarity and ergodicity are accepted by default. A nonstationary process cannot be ergodic by definition, and estimates of its statistical properties are time dependent; their analysis requires special methods and, if studied with a single sample record, the record must be very long (also see Bendat and Piersol 2010).
If the process is ergodic, the properties estimated over a single time series of observations characterize the entire process; therefore, the assumption of ergodicity is a very important step. At the same time, we usually have no way to verify this property, which puts us in a difficult position. Yet, all of us make this assumption when we believe that the information obtained from one time series is correct for other similar data, other observation times, and other space coordinates. However, this assumption may become strictly justified if we study the probability distribution of our time series: if the process is stationary and if its probability distribution function is Gaussian (normal), the process is also ergodic (e.g., Yaglom 1962). This is probably the most important reason why one should always estimate the probability density function of the time series under study. The assumption of ergodicity in the non-Gaussian case cannot be proved mathematically but "it is often assumed without proof, referring to physical intuition" (Yaglom 1987, Vol. 2, pp. 69, 76).

Another important concept in the theory of random processes is the linear process. Its definition is based upon the properties of the process that presents a time-dependent sequence of identically distributed and mutually independent random variables – the white noise. A stationary random process obtained by a time-invariant linear transformation of white noise is called a linear, or linearly regular, process. In our analysis of stationary random time-dependent phenomena, we always deal with linear processes. A linear process cannot contain any strictly periodic components, but it may have components that are arbitrarily close to periodic.

Thus, in what follows we shall be working with time series that present samples of ergodic and linear random processes. Moreover, the probability distribution of the process is generally assumed to be Gaussian. This assumption is important, in particular, for the task of time series forecasting (prediction, extrapolation). Otherwise, all statistical characteristics such as the mean value, variance, skewness and kurtosis, correlation functions, and spectra are calculated in the same way irrespective of the time series probability distribution. The issue of non-Gaussianity does not affect the analysis of statistical moments such as correlation functions and spectral densities. The role of non-Gaussianity in time series forecasting is discussed in Attachment 2.2 to this chapter. Note also that a nonstationary random process cannot be studied on the basis of a single sample record but requires an ensemble of sample records; this is true because different samples of a nonstationary process may have different statistical properties.

In this book, it is always assumed that the time series being analyzed is a sample of some stationary random process whose current value depends upon a finite number of its past values and upon a disturbance represented at every time step by a single white noise variable, also called the innovation sequence. The coefficients that define the dependence upon the past values of the time series are constant. This is the autoregressive (AR) model of a time series; it is quite reasonable physically and can be regarded as a digital analog of a differential equation plus an additional random disturbance term. Such an equation is called a stochastic difference equation.
The first of several examples of stochastic difference equations is given in Example 2.1.
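In generic notation, the stochastic difference equation of an AR model of order p for a zero-mean time series x_t reads

x_t = φ_1 x_{t−1} + φ_2 x_{t−2} + … + φ_p x_{t−p} + a_t,

where φ_1, …, φ_p are the constant autoregressive coefficients and a_t is the white noise innovation sequence. This general form is quoted here only for reference; Example 2.1 below deals with the case p = 2.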
The number of previous values of the time series that affect its current value is called the autoregressive order. This mathematically and physically sensible approach puts no limits upon the analysis or upon the physical nature of the time series: any time series generated by a linearly regular stationary process can be approximated with an AR model. Consider now the stages of analysis of scalar (univariate) time series with the executable program AVESTA1, which deals only with scalar time series.
2.2 Preliminary Processing

The initial time series may require some preliminary processing, which is executed by the program in accordance with the user's instructions. It includes, optionally, linear trend analysis and removal, averaging, analysis and removal of the seasonal or diurnal trend, and low-pass, high-pass, or band-pass linear filtering. All preliminary processing, and especially the linear filtering, should be done on the basis of rational considerations. Examples of preliminary processing of real or simulated time series are discussed below and illustrated with figures that show the effects of processing upon the time series spectrum. Such figures serve as a basis for understanding and interpreting the time series properties.

The initial time series presents a set of random variables distributed in time. Its length (the number of terms) is N, and the constant interval between neighboring terms (the sampling interval) is DT. These and all other parameters are contained in the file CAT.DAT, which is required by the AVESTA1 program. The complete explanation of the CAT.DAT file is given in Table 2.1.

Thus, the stage of preliminary processing consists of the following options of time series transformation prior to its analysis in the time and frequency domains:

• linear trend analysis (applied in all cases) and trend removal (if desired); this operation is conducted at the last stage of the preliminary processing, and the trend may be deleted or not deleted according to the value of the parameter K in CAT.DAT,
• parameter R is not used but should always be present in CAT.DAT,
• averaging the time series values over a given time interval with no overlapping, such as transforming a time series of average hourly values into average daily values (the "hopping" averaging),
• removal of a quasi-periodic trend, for example, the seasonal trend in monthly observations,
• linear filtering of the time series.

The entire set of preliminary actions discussed in this chapter can also be made by the program AVESTA3.EXE designed for analysis of multivariate time series.
Table 2.1 Initial parameters for time series processing

500  50  501  1  0  1  1   1   0     0     0     500
N    M   NF   K  R  L  LS  DT  MFLT  KFLT  LFLT  ENDDATE

N—the number of terms in the time series (should not exceed 10^6)
M—the maximum autoregressive order (should not exceed 99, or 50 for AVESTA3)
NF—the number of frequencies in the spectral estimate (should not exceed 5001)
K, R—parameters of subroutine TREND (R is always zero)
  K = 0—the trend will not be deleted
  K = 1—the trend will be deleted if it is statistically significant
  K = 2—the trend will be deleted
L—the interval of the "hopping" averaging (averaging is not performed if L = 1)
LS—the period of the seasonal trend (it will not be deleted if LS = 1)
DT—the sampling interval (DT > 0)
MFLT—the half-length of the filter's weighting function (no filtering performed if MFLT = 0)
KFLT—determines the type of filtering: low-pass, high-pass, or band-pass (0, 1, or 2, respectively)
LFLT—determines the filter's type:
  0 equal-weights running averaging
  1 Gaussian
  2 Bartlett
  3 Tukey
  4 'bell-shaped' band-pass (only for KFLT = 2)
ENDDATE—the extrapolation parameter: the time corresponding to the last known value of the time series (no predictability analysis and no extrapolation if ENDDATE = 0); for example, if you want to predict the time series from year 2020, ENDDATE should be equal to 2020
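Readers who generate many CAT.DAT files may prefer to script their creation. The sketch below is not part of the AVESTA distribution; it is a hypothetical Python helper that merely assumes the twelve-parameter, whitespace-separated layout shown in Table 2.1.

# write_catdat.py - minimal sketch; assumes CAT.DAT holds the twelve
# parameters of Table 2.1 on one whitespace-separated line
def write_catdat(path, n, m, nf, k, r=0, l=1, ls=1, dt=1.0,
                 mflt=0, kflt=0, lflt=0, enddate=0):
    params = [n, m, nf, k, r, l, ls, dt, mflt, kflt, lflt, enddate]
    with open(path, "w") as f:
        f.write(" ".join(str(p) for p in params) + "\n")

# reproduce the sample file of Table 2.1
write_catdat("CAT.DAT", n=500, m=50, nf=501, k=1, enddate=500)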
The preliminary processing may include only one of the above-listed options plus the trend analysis and removal. After the preliminary processing, or when no preliminary processing is required, the AVESTA1 results will include:

• initial information from the CAT.DAT file,
• the time series and its major statistical moments (mean value, variance, root mean square value (RMS), skewness, kurtosis, and the standardized versions of skewness and kurtosis),
• the linear trend slope and the RMS error of its estimate, with a conclusion about its statistical significance at a 95% confidence level and the action taken according to the value of K,
• a sample correlation function estimate,
• AR orders from order 1 to the maximum order M given in CAT.DAT (values of M exceeding N/10 are not recommended),
• AR coefficients for each AR order,
• a spectral density estimate for each AR order at frequencies from 0 to 1/2DT cycles per time interval (the Nyquist frequency, e.g., 0.5 cph or 0.5 cpy if DT = 1 h or DT = 1 year) in accordance with the parameter NF,
• information needed for selecting an optimal AR order: variances of the innovation sequence, values of five order selection criteria, and the order selected by each criterion.

It should be noted that if the variance of the time series is close to the variance of the innovation sequence, the time series is close to a white noise. This may happen for some criteria even if the time series spectrum is not monotonic.

Further information is given for the autoregressive model of order p selected by the corrected Akaike information criterion AICc. It includes

• the autoregressive extension of the first p values of the sample correlation function (if the time series is Gaussian, both the correlation function extension and the spectrum are maximum entropy estimates),
• the numbers of mutually independent (uncorrelated in the non-Gaussian case) observations and the respective confidence intervals for the estimates of the time series mean value and RMS,
• estimated AR coefficients with respective confidence intervals and RMS errors,
• the number of degrees of freedom N/p for the spectral estimate,
• the spectral estimate with approximate 90% and 95% confidence limits,
• predictability parameters: the relative predictability criterion and the correlation coefficient between the predicted value and the unknown true future value; these are given for lead times from 0 to the value at which the criteria become approximately equal to 1 and 0, respectively,
• the time series forecast at lead times from one DT to the lead time indicated by the predictability criteria.

The last two items are given only if the parameter ENDDATE in the CAT.DAT file shows the time of the last observation used for extrapolation of the time series. If ENDDATE = 0, there will be no information on predictability properties and no prediction. All operations required for preliminary processing can be executed by both the AVESTA1 and AVESTA3 programs. In the latter case, the preliminary processing operations should be identical for all scalar components of the time series.
2.2.1 No Preliminary Processing Required

Example 2.1 Data with DT = 1

Consider an example with no preliminary transformations: the time series X.TXT of length N = 145 presents a simulated sample of an autoregressive model of order p = 2 (see Fig. 2.1a). We will analyze and predict the time series using its first 100 values. The CAT.DAT file parameters in this example are:

100  10  501  1  0  1  1   1   0     0     0     100
N    M   NF   K  R  L  LS  DT  MFLT  KFLT  LFLT  ENDDATE
It is strongly recommended that the maximum order of autoregressive analysis should not exceed N/10; therefore, we have M = 10 here. The parameter NF defines the frequency resolution of the spectrum, and the value NF = 501 is recommended for cases when the spectral estimate does not contain narrow peaks. The value NF = 501 means that all spectra will be given at frequencies 0, 0.001, 0.002, …, 0.5. The entire interval of frequencies always lies between 0 and 1/2DT; the sampling interval DT here is 1 (e.g., 1 h), so that the total interval of frequencies is from 0 to 0.5 cycles per sampling interval, that is, from 0 to 0.5 cycles per hour (cph) in this case. When the time series dimension is millimeters and the dimension of DT is hours, the spectrum dimension is mm^2/cph, or mm^2 × h.

The trend parameter K = 1 means that the trend will be deleted if it is statistically significant, that is, if the absolute value of the trend exceeds the RMS error of the trend estimate multiplied by 2. If K = 0, the trend will not be deleted.

Fig. 2.1 Time series X.TXT and its sample correlation function according to AVESTA1 (black line)
The parameter R = 0 is not used by the program, but it should always be included in CAT.DAT. In this case, the trend is statistically insignificant. When L = 1 or LS = 1, there will be no hopping averaging and no deletion of the "seasonal" trend (for example, the daily trend). If the parameter MFLT is zero, the time series will not be filtered; more detailed information about the filtering parameters will be given later in this chapter. The parameter ENDDATE = 100 means that the time series will be extrapolated with the last known value coinciding with the last value of the time series given in CAT.DAT. If, for example, the last value at N = 100 was observed in 2020, ENDDATE should be set to 2020. If ENDDATE is zero, there will be no prediction. For further issues regarding the extrapolation, see Sect. 2.5.

Run the program. The absolute values of the standardized skewness and kurtosis are less than 2, which means that the time series PDF can be regarded as Gaussian. The trend estimate is found to be statistically insignificant (it is smaller than the doubled RMS error) and will not be deleted; it would be deleted irrespective of its statistical significance if the parameter K were equal to 2.

The next printout in X_100.RES is a nonparametric estimate of the time series correlation function. It is given for lags up to N/5 but not greater than 500 (Fig. 2.1b). The gray line will be explained later in this example. The N/10 or fewer following printouts (10 in this case) show the current autoregressive order, estimates of the autoregressive coefficients for that model, and the respective spectral density estimate. These parts of the output file contain intermediate results for the user's convenience.

The following printout contains information about the innovation sequence variances and the order selection criteria for each order from 0 to M. It also tells one what AR order was selected by each criterion. In this case, three criteria recommend an autoregressive model of order 2, that is, AR(2). Actually, the optimal order in this program is always selected in accordance with the corrected Akaike criterion AICc, which usually selects the highest order among the five criteria. If the correct or desired order is smaller than what is given by the AICc, the computation should be repeated with a smaller maximum order M. In this case, the AICc's choice is 7, while three criteria prefer order 2. Therefore, we change M in CAT.DAT to M = 2 and run AVESTA1 again. The new file with the results can be named, for example, X_100_p=2.RES.

The information about the selected model begins with the analytical extension of its correlation function in accordance with the first p values of the sample correlation function given at the beginning stage of the computations. The parameter p here is the order of the selected model, that is, p = 2. The first p values of the extension will always coincide with the sampling estimates at the first p lags. The gray line in Fig. 2.1b shows the maximum entropy extension of the correlation function.

The values of the "integral" (actually, a sum) of the correlation function and of its square are given for determining the number of mutually uncorrelated (independent, in the Gaussian case) terms in the time series. If N is the total length of the time series, the numbers of independent terms for estimates of the mean value and variance are N/a and N/b, respectively, where a and b are the above-given "integrals". In the file X_100_p=2.RES, these numbers are 39 and 26; they are used to determine the confidence bounds for the estimates of the mean value and variance (see Yaglom 1987, Vol. 1, Chap. 3).
Table 2.2 AR coefficients, their 90% confidence bounds and RMS errors

No  Lower bound  COEFFS    Upper bound  RMS
1   0.1455       0.3101    0.4748       0.1004
2   −0.7008      −0.5362   −0.3715      0.1004
The model selected for the X.TXT time series by all criteria is now AR(2), and the following information will be given for this model. The AR coefficients describe the dependence of the time series upon its past values; in this case, that dependence lasts for two hours. The respective equation for the time series x_t is

x_t = 0.3101 x_{t−1} − 0.5362 x_{t−2} + a_t,

where a_t is a white noise innovation sequence; in this case, its variance is 0.717541 (see the printout under the title "WHITE NOISE VARIANCES AND CRITERIA …"). The variance and the RMS error of the coefficient estimates given after the title "AR COEFFICIENTS, THEIR 90% …" show that in this case both coefficients are statistically different from zero (Table 2.2).

The coefficients tell us, in particular, about the possible presence of damped quasi-periodic oscillations in the time series model, about its damping coefficient, etc. (see Box et al. 2015, Sect. 3.2). In this case, the frequency of these pseudo-periodic oscillations is found from the roots of the characteristic equation

1 − 0.3101 x + 0.5362 x^2 = 0.

The roots are complex-valued and lie outside the unit circle, as required for stationarity with this form of the characteristic equation. This means that the time series belongs to a stationary random process which also contains damped oscillations. Their frequency is close to 0.21 cph (cycles per hour), while the damping coefficient is 0.707. (This information is not given by AVESTA1.) Thus, the time series is both stationary and Gaussian; consequently, it is ergodic, so that the estimates of its statistics can be regarded as estimates of the statistical properties of the random process which generated the time series X_100.TXT. Moreover, in the case of a Gaussian stationary process, the autoregressive estimate of the spectrum is also a maximum entropy estimate.

The order p = 2 selected now by all five criteria is low, so that the number of degrees of freedom N/p is 50 (see the printout). It means that the statistical reliability of the spectral estimate is quite good. The spectrum estimate is given after that in the fourth column, while the other columns contain the respective 95% and 90% confidence intervals. The spectral estimate with its 90% confidence interval is shown in Fig. 2.2 in linear (a) and logarithmic (b) scales on the vertical axis; it is obviously dominated by quasi-periodic oscillations at 0.2 cph, but this does not mean that such oscillations are so strong that they can be detected visually (see Fig. 2.1a).
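Since AVESTA1 does not print the frequency and damping of the pseudo-periodic oscillations, they can be recovered from the estimated AR(2) coefficients with a few lines of code. A minimal Python sketch (an independent check, not part of AVESTA1; the damping convention used here may differ slightly from the one quoted above):

import numpy as np

phi1, phi2 = 0.3101, -0.5362   # AR(2) coefficients from Example 2.1
dt = 1.0                       # sampling interval, hours

# roots of the characteristic equation 1 - phi1*x - phi2*x**2 = 0;
# np.roots expects coefficients from the highest power down
roots = np.roots([-phi2, -phi1, 1.0])
assert np.all(np.abs(roots) > 1.0)   # stationarity: roots outside the unit circle

# standard AR(2) relations for complex roots:
# cos(2*pi*f0*dt) = phi1/(2*sqrt(-phi2)); damping factor sqrt(-phi2)
f0 = np.arccos(phi1 / (2.0 * np.sqrt(-phi2))) / (2.0 * np.pi * dt)
print(f"f0 = {f0:.3f} cph")          # about 0.21-0.22 cph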
Fig. 2.2 Spectrum estimate of the time series X.TXT at N = 100 in linear (a) and logarithmic (b) scales on the vertical (spectrum) axis and linear scale on the horizontal (frequency) axis
The role of the peak in the energy balance of the time series can be measured by the ratio of the innovation sequence variance to the time series variance. In this case, it is 0.717541/1.02862 ≈ 0.70, which means that the contribution of the two deterministic terms – the AR coefficients – to the model's variance is about 30%; the rest is supplied by the innovation sequence, that is, by noise. In other words, the behavior of this time series in the frequency domain does not differ much from white noise behavior.

The predictability properties of the time series are characterized by two criteria: the relative predictability and the correlation coefficient between the predicted and the unknown true future values. The respective results in the printout are given in Table 2.3. The table shows quantitatively that the predictability of this time series is low: the relative predictability criterion, which coincides with the square root of the ratio of the prediction error variance at a given lead time to the variance of the time series, is close to unity at all lead times in this case.

Table 2.3 Predictability parameters

No  Lead time  REL_PRED-TY  COR_COEFF
0   0.0000     0.0000       1.0000
1   1.0000     0.8352       0.5499
2   2.0000     0.8745       0.4851
3   3.0000     0.9485       0.3167
4   4.0000     0.9817       0.1907
5   5.0000     0.9888       0.1493
6   6.0000     1.0000       0.0000
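For readers who want to verify Table 2.3 independently: for a stationary AR model, the prediction error variance at lead time τ equals the innovation variance multiplied by the sum of the squared first τ coefficients ψ_j of the model's moving average representation; these are the standard Kolmogorov-Wiener results for linear extrapolation. A minimal sketch (not the AVESTA1 code) that reproduces the table:

import math

phi = [0.3101, -0.5362]   # AR(2) coefficients from Example 2.1
sig2_a = 0.717541         # innovation variance
sig2_x = 1.02862          # time series variance

# psi-weights of the moving average representation x_t = sum_j psi_j a_{t-j}
psi = [1.0]
for j in range(1, 7):
    psi.append(sum(phi[k] * psi[j - 1 - k] for k in range(min(j, len(phi)))))

for tau in range(1, 7):
    err_var = sig2_a * sum(p * p for p in psi[:tau])
    rel_pred = math.sqrt(min(err_var / sig2_x, 1.0))
    cor = math.sqrt(max(1.0 - err_var / sig2_x, 0.0))
    print(tau, round(rel_pred, 4), round(cor, 4))
# prints 0.8352/0.5499 at tau = 1, 0.8745/0.4851 at tau = 2, etc.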
Yet, the future values of the time series are known to us, and we can produce the forecast of X.TXT from time 100 at lead times from 1 to 6 and compare it with the actual data that have not been used in the analysis of the time series X.TXT (Fig. 2.3). In this case, the extrapolation results are satisfactory for the simple reason that the statistical predictability of this time series is low and the confidence limits for the forecast are close to the maximum value even at the smallest lead time, that is, one hour (see Table 2.3).

The forecast of this or any other time series through its stationary AR model lies within the framework of the Kolmogorov-Wiener theory of extrapolation and, as the time series is both stationary and Gaussian, its linear extrapolation (forecasting, prediction) has the smallest possible error variance among all linear and nonlinear methods of extrapolation. In other words, with this time series there can be no way to obtain predictions with a smaller error variance. The issues related to time series forecasting are discussed in more detail and with more examples in Sect. 2.5.

Thus, our analysis of the time series X.TXT of length N = 100 showed that:

• being both stationary and Gaussian, the time series belongs to an ergodic random process, and we can state that similar results will be obtained for any other sample record of the process,
• the current value of the time series is affected by two previous values, that is, the process which generated the time series has a memory of two sampling intervals,
• the coefficients relating the current value to the past values are statistically significant,

Fig. 2.3 Extrapolated (gray) and observed (black) values of the time series X.TXT. The dashed lines show the 90% confidence limits for the forecast
• the spectral density of the autoregressive model of order p = 2 [that is, AR(2)] is smooth and has a statistically significant peak at about 0.2 cycles per unit time (hour), which corresponds to a time scale of about 5 h,
• in spite of the presence of the peak, the white noise innovation sequence plays a dominant role in the time domain behavior of this time series,
• in particular, it means that the statistical predictability of the time series X.TXT is low, and the confidence limits for its forecast quickly grow to their maximum value, equal to the time series RMS multiplied by a confidence level coefficient (1.64 and 1.96 for confidence levels 0.90 and 0.95, respectively).

In concluding this example, compare the results produced by AVESTA1 with the results of analysis conducted with the Matlab Signal Processing Toolbox. First, the AR coefficients for the AR(2) models of the time series X.TXT and its spectra are close to each other. The Matlab confidence interval for the spectral estimates is slightly wider than the approximate confidence interval given by AVESTA1. Certainly, the Matlab intervals are mathematically correct, but the small difference is intentional and will be explained later in connection with determining the confidence bounds for autoregressive estimates of frequency dependent functions in AVESTA1 and AVESTA3. This ends Example 2.1.
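A similar cross-check can be made with open-source tools. The sketch below (an illustration, neither the AVESTA nor the Matlab code) fits an AR(2) model by least squares and evaluates the one-sided autoregressive spectral density s(f) = 2 σ_a^2 DT/|1 − φ_1 e^{−i2πfDT} − φ_2 e^{−i4πfDT}|^2; for the X.TXT data, the coefficients should come out close to 0.3101 and −0.5362, with a spectral peak near 0.21 cph.

import numpy as np

def fit_ar2_and_spectrum(x, dt=1.0, nf=501):
    # least-squares AR(2) fit: x_t ~ phi1*x_{t-1} + phi2*x_{t-2}
    x = np.asarray(x, dtype=float) - np.mean(x)
    A = np.column_stack([x[1:-1], x[:-2]])
    phi, *_ = np.linalg.lstsq(A, x[2:], rcond=None)
    sig2_a = np.var(x[2:] - A @ phi)        # innovation variance
    # one-sided AR spectrum on [0, 1/(2*dt)]
    f = np.linspace(0.0, 0.5 / dt, nf)
    z = np.exp(-2j * np.pi * f * dt)
    s = 2.0 * dt * sig2_a / np.abs(1.0 - phi[0] * z - phi[1] * z**2) ** 2
    return phi, sig2_a, f, s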
2.2.2 Linear Trend

The presence of high-energy components at very low frequencies is a common phenomenon in time series in the Earth sciences. Usually, it is caused by natural factors, because the time scales of many processes that happen within the Earth system atmosphere–ocean–land–cryosphere are not limited and can be longer than millions of years. Such long-term phenomena cannot be studied quantitatively if the time series covers only several hundred or even several thousand years. A simple practical rule is that the largest time scale that can be studied with an acceptable degree of reliability should be about an order of magnitude shorter than the time series length. For example, if you have a 50-year long time series and use it to reconstruct climate hundreds of years into the past, your results will probably be incomplete, to say the least, because the reconstruction will not include any long-term variability. Obviously, the length of the time series under investigation puts a limit upon the maximum time scales that can be studied with it.

A low-frequency trend can also be caused by external factors. For example, a river streamflow or a lake water level can be affected by growing water withdrawal for agriculture. Certainly, the most famous trend of this sort is the trend in the global annual temperature, possibly caused by anthropogenic factors. If one is studying the nature-caused variability of climate at scales not exceeding several years or even
decades, the long-scale variations should probably be suppressed irrespective of their provenance. The trend is usually assumed to be linear, though it can actually be just a short part of a long-term oscillation. In any case, one has to make a decision about keeping the time series as is or deleting the trend. The presence of a strong linear trend can make the time series nonstationary; a way to test the time series for stationarity has been proposed by this author (Privalsky 2021, Chap. 4), but the autoregressive modeling done with the executable program AVESTA1 attached to this book will usually find a stationary solution. The final decision is to be made by the user, and it should be based upon rational considerations.

The first task related to the trend in a time series is, of course, estimating its slope and determining whether the estimate is statistically different from zero. The linear trend slope is calculated here under the assumption that the time series presents a sequence of mutually independent (or uncorrelated) and identically distributed random variables, that is, a white noise. Strictly speaking, this is incorrect because the estimated slope of the trend line depends upon the degree of dependence between the sequential terms of the time series (serial correlation). However, the effect of serial correlation upon the trend slope estimate does not seem to be large. This traditional white noise assumption is always used for the trend removal task, and it is also used here. The program tests the time series for the presence of a linear trend and removes or does not remove it in accordance with the trend's statistical significance and/or the user's instruction (see Table 2.1).

Example 2.2 Monthly Global Surface Temperature, 1857–2020

All nine time series of spatially averaged surface temperature, from global to hemispheric, terrestrial, and oceanic, available at the web site of the University of East Anglia and at other similar sources contain a strong trend that can be approximated with a straight line. Before continuing with the example, it should be noted that recently the University of East Anglia's data set of surface temperature variations, consisting of the files HadCRUT4 (global), CRUTEM4 (land), and HadSST3 (ocean), has been revised by introducing a new data set (HadCRUT5, CRUTEM5, and HadSST4) to improve the previous version; these efforts have resulted "in greater warming of the global average" (Morice et al. 2021) over the entire set from 1857 (in our work) through 2020. The new set is given at the UEA web site https://crudata.uea.ac.uk/cru/data/temperature.

After the update, the trend of the monthly global temperature has increased by about 15% for the interval from 1857 through 2020 and by 30% for the last 15 years. The upsurge in the warming rate in the recent past due to the transfer to the new data set is especially strong over the southern hemisphere ocean: the trend rate has increased by over 80% between January 2005 and December 2020 against approximately 10% between January 1857 and December 2020. The new, corrected set thus shows faster global warming during the last 15 years than the previous data, that is, during the time when the amount and quality of data should have been at least not worse than
Fig. 2.4 Time series of a monthly global temperature HadCRUT4 (gray) and HadCRUT5 (black) and b SST temperature HadSST3 (gray) and HadSST4 (black) for the southern hemisphere ocean; January 2005–June 2021
for all other parts of the time series. An example of the changes in the linear trend values for the last 15 years in the UEA data sets is shown in Fig. 2.4. According to the figure, the rates of warming have increased due to the corrections of the data by 30% for the global temperature and by 84% for the southern hemisphere ocean. This is a significant warming of the planet and of the southern hemisphere ocean caused by anthropogenic activity. For the time series of annual global and southern hemisphere sea surface temperature, the increase in the rate of growth during the last 100 years amounts to about 20% and 24%, respectively (Fig. 2.5). In what follows, we will be using this improved version of the global temperature given at the UEA website.

Our task in this example is to estimate the effects of linear trend deletion upon stochastic models of spatially averaged monthly surface temperature from 1857 through 2020; we will do it with the globally averaged time series HadCRUT5 and with the time series of sea surface temperature averaged over the entire southern hemisphere ocean (OSH5). The CAT.DAT file for calculating the linear trend in the monthly time series should be

1968  99  501  0  0  1  1   0.083333333  0     0     0     0
N     M   NF   K  R  L  LS  DT           MFLT  KFLT  LFLT  ENDDATE
Fig. 2.5 Time series of a annual global temperature HadCRUT4 (gray) and HadCRUT5 (black) and b annual SST temperature HadSST3 (gray) and HadSST4 (black) for the southern hemisphere ocean; 1920–2020
when the trend is not removed, and with K = 1 for the version with the linear trend deleted. Both time series before and after the trend removal are shown in Fig. 2.6. The AR models of the global (HadCRUT5) and oceanic (OSH5) temperature turn out to be rather complicated, with optimal orders changing between 10 and 48 for HadCRUT5 and between 5 and 62 for OSH5. Simple analysis shows that the low-order models are too simple. However, the time series in this case are quite long as compared to even the highest order of p = 62: the ratio N/p = 31 is high enough for obtaining statistically reliable spectral estimates. Eventually, the order selected for the time series HadCRUT5, before and after the trend deletion, was set to p = 36, and to p = 62 for the time series OSH5.
Fig. 2.6 Time series HadCRUT5 (a) and OSH5 (b) before and after trend deletion (gray and black)
Fig. 2.7 Spectra of monthly global temperature a and oceanic temperature over the southern hemisphere surface b before (black) and after (gray) trend removal. The dashed lines show the 90% confidence limits
These orders are too high for analyzing the time series in the time domain. The respective spectra are shown, for clarity, on logarithmic scales on both axes (Fig. 2.7). Obviously, the removal of the trend affects the spectra only at very low frequencies and does not change anything at frequencies higher than 0.03 cpy (time scales shorter than about 30 years). As seen from the figure, both time series contain rather sharp peaks at the seasonal trend frequency of 1 cpy, but their energy is relatively small. Another feature of these spectral estimates is a statistically significant smooth bump at about 0.2–0.3 cpy. This phenomenon will be discussed in Chap. 3, but here we need to discuss a problem that one may encounter in autoregressive spectral analysis.

The analysis of the HadCRUT5 file with AVESTA1 and the CAT.DAT file given above shows that the optimal order of the best AR model should be equal to 10: the model recommended by the criteria RICc and BIC. The other three criteria prefer different orders: 36 (AICc), 24 (PSI), and 48 (CAT). The spectral density of the formally best model AR(10) is monotonic, and the two statistically significant peaks at about 0.25 cpy and at 1 cpy disappear. Having in mind the presence of the physically explainable peak in the spectrum of the global temperature at the seasonal frequency 1 cpy and of another such peak at about 0.25 cpy related to ENSO (e.g., Privalsky 2021, Sect. 8.2), and the fact that the other three criteria have preferred much higher orders, we may select the average of the three higher-order models, that is, AR(36). The respective spectral density estimate is shown in black in Fig. 2.7a. Note also that the three criteria differ from each other by 12 time units (12 months), so that the differences are probably related to the seasonal cycle. Thus, there may be cases when the AR order shown by the majority of order selection criteria can be disregarded; however, it should be
done only under the condition that such a decision is supported by physical and/or computational arguments. Testing for the possible effects of linear trend removal upon the statistical properties of the time series is recommended in most cases.
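The slope estimate and the doubled-RMS significance rule described above are easy to reproduce for one's own data. A minimal sketch under the same white noise assumption (illustrative only; the exact formulas used inside AVESTA1 may differ in details):

import numpy as np

def linear_trend(x, dt=1.0):
    # OLS slope, its RMS error under the white noise assumption,
    # and a significance flag using the |slope| > 2*RMS rule
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) * dt
    slope, intercept = np.polyfit(t, x, 1)
    resid = x - (slope * t + intercept)
    se = np.sqrt(np.sum(resid**2) / (len(x) - 2) / np.sum((t - t.mean())**2))
    return slope, se, abs(slope) > 2.0 * se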
2.2.3 The Hopping Averaging

The term "hopping" is used here for the operation in which the neighboring intervals of averaging do not overlap. For example, it may be a transformation of data with a 5-min sampling interval (DT = 5 min) into hourly data (DT = 1 h), so that the averaging interval will be L = 12. The amount of data diminishes by L times while the unit time interval becomes L times longer. Usually, the goal of such averaging is to see the low-frequency part of the process spectrum, such as moving from monthly to annual data. The analysis of monthly global temperature data would not be informative if one is interested in climatic phenomena, that is, when the time scales of interest begin with years (climate variability) rather than with months (seasonality).

Example 2.3 Switching to a Larger DT

Consider what happens when daily data (DT = 1 day) are transformed into monthly data, which means that the unit time interval changes from 1 day to 1 month (the respective Nyquist frequencies are 0.5 cpd and 0.5 cpm). The initial data present daily evaporation values at Key West, Florida, USA, taken from the site https://www.epa.gov/ceam/meteorological-data-florida and shown in Fig. 2.8a. The time series covers an interval of 30 years (1961–1990, N = 10,957). The CAT.DAT file in this case is

10957  99  501  0  0  1  1   1   0     0     0     0
N      M   NF   K  R  L  LS  DT  MFLT  KFLT  LFLT  ENDDATE
The AVESTA1 results for the daily time series show that its PDF is not Gaussian. The time series does not contain a significant linear trend, and the optimal model for it is AR(63); it has been selected by two of the five order selection criteria: the corrected Akaike information criterion (AICc) and the Parzen criterion of autoregressive transfer function (CAT). The spectral density estimate is shown in Fig. 2.8b with gray curves. Yet, in spite of the large amount of data in the time series, most AR coefficients in that model are statistically insignificant (the 90% confidence interval for their estimates includes zero), so that it seems reasonable to try the other criteria, which selected orders 16, 18, and 23. The smallest order was recommended by the Schwarz-Rissanen Bayesian information criterion (BIC), and the respective spectral estimate is given in Fig. 2.8b with a black curve. The differences between the two estimates occur not only at low frequencies but also at about 0.025 cpd and higher. It means that the selection of the AR(16) model would lead to a loss of some statistically significant features. That is why, generally, it is recommended to select the model indicated by the majority of order selection criteria.
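The hopping (non-overlapping) averaging itself is a one-line operation in most environments. A sketch of what the parameter L presumably does inside AVESTA1 (illustrative, not the actual FORTRAN code):

import numpy as np

def hopping_average(x, L):
    # average x over consecutive non-overlapping blocks of length L;
    # the result has len(x)//L terms and sampling interval L*DT
    x = np.asarray(x, dtype=float)
    n = (len(x) // L) * L          # drop the incomplete last block
    return x[:n].reshape(-1, L).mean(axis=1)

# e.g., 5-min data to hourly data: hourly = hopping_average(five_min, 12)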
Fig. 2.8 Time series of mean daily evaporation at Key West, 1961–1990 (a) and its spectral estimates (b)
Averaging the daily data over 30 days requires the CAT.DAT file

10957  36  501  0  0  30  1   0.00273785  0     0     0     0
N      M   NF   K  R  L   LS  DT          MFLT  KFLT  LFLT  ENDDATE
where DT is set to 1/365.25. Certainly, the averaging over 30-day intervals is an approximation, but it is used here to demonstrate how the spectrum of the daily data is transformed into a spectrum that contains practically nothing but a low-frequency part and a strong seasonal trend. Running AVESTA1 with this initial file, we get the time series of monthly evaporation values (Fig. 2.9a) and a set of AR models for it, for AR orders from 0 (when the time series is supposed to be a white noise) to the maximum AR order M = 36. The spectral density estimate (Fig. 2.9b) contains a sharp peak at the seasonal trend frequency and several higher frequency harmonics.

Example 2.4 Hopping Averaging for Spectral Analysis

Consider the case of a time series given at a high sampling rate. Sometimes, the sampling rate in the original time series is too fast for proper analysis. For example, let us assume that we have a stationary time series of length N = 2 × 10^5 at the sampling interval DT = 0.05 s. If we are interested in time scales exceeding, say, ten seconds, we may lose the frequency resolution in the spectrum of this data: the frequencies which we intend to study will be located at the initial, that is, the low-frequency, part of the spectrum, while most of the frequency axis will be taken by the higher frequency variability, which is of no interest to us.
Fig. 2.9 Time series of monthly evaporation at the Key West station (a) and its spectral estimate (b)
The sampling interval should be chosen in such a way that the frequencies of interest will be far enough from zero on the frequency axis. At the same time, one should take into account the behavior of the time series spectrum at higher frequencies; otherwise, if the process contains strong high-frequency energy, the spectral estimate obtained from the averaged data may be distorted by the contribution from that high-frequency variability. This phenomenon is called the aliasing error: the high-frequency variability "pretends" to be a part of the spectral density at lower frequencies (see Subsection 10.2.3 in Bendat and Piersol 2010). Then, averaging the data may result in a distorted spectrum estimate.

The data for this example is a time series of length N = 2 × 10^5 at the sampling interval DT = 0.05 s, which presents a simulated record of the so-called microbaroms: "infrasonic waves generated by nonlinear interaction of surface waves in nearly opposite directions with similar frequencies" (Willis et al. 2004). According to Willis et al. (2004), the frequency of interest lies close to 0.2 Hz, that is, the period of those waves is about 5 s. The time series is simulated as an autoregressive sequence of order p = 99. The CAT.DAT file for analysis of the original time series should be

200000  99  501  0  0  1  1   0.05  0     0     0     0
N       M   NF   K  R  L  LS  DT    MFLT  KFLT  LFLT  ENDDATE
Figure 2.10 shows the original time series of length 10^4 s (200,000 observations); it can be regarded as stationary, and its optimal autoregressive model is AR(98). Obviously, this AR order is too high to analyze the time series in the time domain.
Fig. 2.10 Time series of atmospheric pressure observations at DT = 0.05 s (a) and its spectral estimate (b)
The statistical reliability of this spectral estimate is very high because the number of equivalent degrees of freedom used here to obtain the approximate confidence limits for the estimate is N/p = 2040. The small sharp peak at about 0.2 Hz is seen in Fig. 2.10b at the lowest frequency part of the scale, and the estimate contains too much information about the high-frequency part of the spectrum, which is generally of no interest, with one exception: it shows that the values of the high-frequency components are smaller than what happens at about 0.2 Hz by orders of magnitude. Therefore, averaging the data at L = 20 will leave 5 observations per 5-s period, which is enough, and it will not cause any aliasing effects. Now, the file CAT.DAT will look like this:

200000  99  501  0  0  20  1   0.05  0     0     0     0
N       M   NF   K  R  L   LS  DT    MFLT  KFLT  LFLT  ENDDATE
The graph of the new time series (Fig. 2.11a) shows that it can be regarded as a sample of a stationary random process. Its length after the averaging is still 10^4 s (now 10^4 terms at DT = 1 s), and the optimal autoregressive order for it is AR(6); it has been selected by all order selection criteria. The spectral estimate of the time series corresponding to the model AR(6) is shown in Fig. 2.11b. The 90% confidence interval is very narrow due to the high reliability criterion N/p = 1666. Obviously, the smooth maximum in the spectrum close to 0.2 Hz now occupies a wide frequency band and is much easier to study than in the version with the small original sampling interval. These quasi-oscillations generated by 'storms' in this simulated data produce infrasound whose spatial distribution is studied by the authors of the publication referred to above.
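The approximate confidence limits mentioned here follow the standard chi-square recipe for spectral estimates with a given number of equivalent degrees of freedom (EDF). A sketch of that textbook calculation (the approximation actually used inside AVESTA1 may differ slightly, as noted at the end of Example 2.1):

import numpy as np
from scipy import stats

def spectrum_ci(s, edf, level=0.90):
    # chi-square confidence interval for a spectral estimate s
    # with edf equivalent degrees of freedom
    alpha = 1.0 - level
    lower = edf * np.asarray(s) / stats.chi2.ppf(1.0 - alpha / 2.0, edf)
    upper = edf * np.asarray(s) / stats.chi2.ppf(alpha / 2.0, edf)
    return lower, upper

# with EDF = N/p = 1666, the 90% interval is very narrow, as in Fig. 2.11b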
Fig. 2.11 The time series of atmospheric pressure observations with DT = 1 s (a) and its spectral estimate (b)
Another example of the same type, not given here and left for the user, concerns the behavior of the global temperature at climatic time scales; it can be built with the UEA data. If we use the original monthly temperature averaged over the Northern Hemisphere land (see https://crudata.uea.ac.uk/cru/data/temperature) and analyze it at DT = 0.08333…, we will see that this option with monthly data gives us practically no visible information about the behavior of temperature at the climate scale; its spectrum will show a number of high-frequency peaks, including one at the frequency of the seasonal trend f = 1 cpy. If the averaging parameter L is set to 12, the AR order selected by all order selection criteria will be p = 4, and the shape of the spectrum corresponding to the model AR(4) will look very close to what we see in Fig. 2.11b, except that the dimension of the horizontal axis will now be cycles per year (cpy) and the values and dimension of the spectrum will be (°C)^2/cpy. This exercise will be useful for further work with AVESTA1 and AVESTA3. This concludes Example 2.4.
2.2.4 Seasonal Trend Removal

The presence of a strong seasonal trend makes it impossible to study other statistical properties of the time series. The technique used here to remove the seasonal trend is based upon the understanding that the trend presents a sample of a random process containing a strong irregular cycle rather than a strictly periodic function of time. To remove a seasonal trend (daily or monthly) with a known "period" LS, the seasonal
trend removal subroutine calculates the average amplitude and the RMS of the trend and then subtracts the average amplitude from the entire time series.

Example 2.5 Seasonal Trend Analysis and Removal

The previous Example 2.3 allows one to understand how to get rid of the seasonal trend. Averaging the daily data in accordance with the actual number of days per month during 1961–1990, we get the time series of monthly mean evaporation values of length N = 360 shown in Fig. 2.9a (it contains minor errors due to the 30-day averaging interval used earlier). The shape of the resulting spectrum does not allow one to understand other properties of the time series. Therefore, the new time series of monthly values is analyzed now by processing it with AVESTA1 and the CAT.DAT file

360  36  501  0  0  1  12  0.083333  0     0     0     0
N    M   NF   K  R  L  LS  DT        MFLT  KFLT  LFLT  ENDDATE
The results of the seasonal trend analysis of this 30-year long time series of monthly evaporation at the Key West station from 1961 through 1990 are reproduced below as Table 2.4. The subtraction of this trend (the fourth column of the table), shown in Fig. 2.12a, from the entire time series changes its behavior in such a way that its spectral density completely loses the signs of quasi-periodic oscillation (Fig. 2.12b). Moreover, the absolute values of the standardized higher statistical moments given for the file with the seasonal trend deleted are small enough to regard the probability density of this time series as close to Gaussian. The optimal AR model recommended by the order selection criteria for the resulting time series is AR(1), with the AR coefficient equal to about 0.44. Its spectral estimate is shown in Fig. 2.12b along with the spectrum obtained for the initial time series shown in Fig. 2.9a.

Table 2.4 Seasonal trend's average, its RMS, and its 90% confidence limits

Month  RMS           Lower limit  Average  Upper limit
1      0.580157E-01  0.38447      0.47961  0.57476
2      0.659158E-01  0.45779      0.56589  0.67399
3      0.678345E-01  0.60627      0.71752  0.82877
4      0.605240E-01  0.72733      0.82659  0.92585
5      0.806795E-01  0.66732      0.79963  0.93194
6      0.647457E-01  0.65693      0.76311  0.86929
7      0.566531E-01  0.70511      0.79802  0.89093
8      0.603253E-01  0.63930      0.73824  0.83717
9      0.538918E-01  0.56499      0.65337  0.74175
10     0.579384E-01  0.49798      0.59300  0.68802
11     0.595282E-01  0.40738      0.50500  0.60263
12     0.575218E-01  0.35609      0.45043  0.54477
Fig. 2.12 The seasonal trend of evaporation at Key West (a) and the spectral estimates of the time series with the seasonal trend (gray) and with the trend removed (b). The black line in a is the seasonal trend’s RMS values
Some traces of the seasonal trend remain in the autoregressive models of higher orders, for example, in AR(36), but all order selection criteria reject AR(36) along with all other models except AR(1). Moreover, the resulting AR(1) approximation is physically reasonable because a Markov model is supposed to be typical at larger time scales. In this case, the removal of the seasonal trend was successful, but this does not mean that it would happen with any time series whose spectrum contains a seasonal trend. An approximate criterion that defines the efficiency of the trend removal operation is the ratio of the trend's amplitude to its RMS averaged over all LS values; in this case, the ratio was slightly over six. This concludes Example 2.5.
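The removal step itself amounts to subtracting the mean seasonal cycle from every year of data. A sketch of the idea for monthly data (an illustration of the operation described above, not the AVESTA subroutine, which also reports the RMS and the confidence limits of Table 2.4):

import numpy as np

def remove_seasonal_trend(x, ls=12):
    # subtract the average seasonal cycle of period ls from x;
    # returns the deseasonalized series and the cycle itself
    x = np.asarray(x, dtype=float)
    cycle = np.array([x[k::ls].mean() for k in range(ls)])
    return x - np.tile(cycle, len(x) // ls + 1)[:len(x)], cycle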
2.2.5 Linear Filtering

The goal of the linear filtering operations is to isolate time series variations belonging to specific frequency bands. There are three types of filters: low-pass, high-pass, and band-pass. This operation may be useful in some cases, but it can never be ideal: all filters affect the spectrum of the time series at the filter's input within the entire frequency band from zero to the Nyquist frequency 1/2DT. The length of the filter's weighting function is 2MFLT + 1, so that the time series at the filter's output loses MFLT values at its beginning and at its end. The frequency band selected by the filter is inversely related to the length 2MFLT + 1 of the filter's weighting function. This means that an attempt to isolate variations within a narrow band of frequencies requires a long weighting function and, consequently, the loss of a large part of the time series.
The programs AVESTA1 and AVESTA3 offer several low-pass filters and one band-pass filter. The high-pass filtering is done by subtracting the results of the low-pass filtering from the initial time series. Before applying a filter to the time series, one should have a convincing reason for this transformation. Both the theory of autoregressive modeling and experience show that the autoregressive approach to time series analysis in the time and frequency domains is quite reliable in revealing the structure of the time series spectrum, provided that the time series is long enough for obtaining a reliable spectral estimate. If the time series is short, filtering will only worsen the situation.

In this section, we will show what happens to the spectrum of a time series after running it through a linear filter: low-pass, high-pass, or band-pass. The effects of filtering upon the spectrum are seen most clearly if the initial time series is a white noise, that is, if the true initial spectral density is constant within the entire frequency band from zero to the Nyquist frequency 1/2DT. If the variance of the initial white noise time series is 1, its true spectrum equals 2 at all frequencies.

Example 2.6 Low-Pass and High-Pass Filtering

In accordance with these considerations, the initial time series in this section always presents a white noise sample; its length is N = 10^5 and its variance equals 1. It has been filtered with a low-pass equal-weights filter with a half-length equal to 2 (MFLT = 2). The file is given in the Electronic Supplementary Material. The CAT.DAT file is

100000  99  501  0  0  1  1   1   2     0     0     0
N       M   NF   K  R  L  LS  DT  MFLT  KFLT  LFLT  ENDDATE
Small parts of the initial, low-pass, and high-pass filtered time series are shown in Fig. 2.13a. The true spectrum is given in Fig. 2.13b with a thick horizontal line. The filtering procedure made the initial time series shorter by 4 values, and a part of the output is also given in Fig. 2.13a. The spectral estimate of this time series is shown in Fig. 2.13b with a gray line. As follows from the figure, the part of the spectrum above approximately 0.15 cycles per DT is suppressed by orders of magnitude, while the low-frequency part (below 0.15 cycles per DT) significantly differs from the true spectrum. In other words, the low-frequency components are strongly distorted. Increasing the MFLT parameter will help, but the length of the time series will decrease.
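The equal-weights low-pass filter and the complementary high-pass output are easy to emulate. A sketch of the operation just described (illustrative; the weighting functions actually offered by AVESTA1 are listed in Attachment 2.1):

import numpy as np

def lowpass_equal_weights(x, mflt):
    # equal-weights running mean of length 2*mflt + 1;
    # the output loses mflt values at each end of the series
    w = np.ones(2 * mflt + 1) / (2 * mflt + 1)
    return np.convolve(np.asarray(x, dtype=float), w, mode="valid")

def highpass(x, mflt):
    # high-pass output: input minus low-pass, aligned to the trimmed ends
    low = lowpass_equal_weights(x, mflt)
    return np.asarray(x, dtype=float)[mflt:-mflt] - low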
Fig. 2.13 A white noise time series (black) and results of its low-pass (dashed) and high-pass (gray) filtering (a); the true spectrum (black) and the spectral estimates of the low-pass and high-pass (dashed and gray) filtered time series (b)
The high-frequency part of the time series can be obtained as the difference between the initial and the low-pass filtered time series (the solid gray line in Fig. 2.13a), while its spectrum is calculated as the difference between the true and the low-pass filtered spectra (Fig. 2.13b). The CAT.DAT file in this case should be

100000  99  501  0  0  1  1   1   2     1     0     0
N       M   NF   K  R  L  LS  DT  MFLT  KFLT  LFLT  ENDDATE
The small wiggling at very low frequencies appears due to the sampling variability of the spectral estimates. Note that the high-pass filtering will remove any linear trend if it exists in the time series.

Example 2.7 Band-Pass Filtering

In concluding this section, consider an example of band-pass filtering of the same white noise time series. The first 50 values of the initial and the filtered time series are given in Fig. 2.14a. The spectral estimate of the time series at the filter's output, built with the parameters MFLT = 4 and LFLT = 2, shows that the filter has suppressed the time series variations at frequencies below approximately 0.13 cycles per DT and above 0.37 cycles per DT by an order of magnitude. In this version, the file CAT.DAT should be

100000  99  501  0  0  0  1   1   4     0     2     0
N       M   NF   K  R  L  LS  DT  MFLT  KFLT  LFLT  ENDDATE
Fig. 2.14 A white noise time series (black) and results of its band-pass (gray) filtering (a); the true spectrum (black) and the spectral estimate of the filtered time series (b)
The variability within the band between those borders has not been distorted significantly. In some sense, the results of this band-pass filtering are relatively good, especially for the band between 0.2 and 0.3 cycles per DT.

If one wants to filter a short time series with a filter whose weighting function is long, the loss of data can be avoided through the following technique (a sketch of it is given after this list):

• build an optimal autoregressive model of the initial time series with AVESTA1,
• add data simulated in accordance with this model at the beginning and end of the initial time series,
• execute the filtering operation and then remove the remaining simulated data from both ends of the time series.

The resulting time series will have the same length as the initial one.

Another important consequence of filtering is a cardinal change in the time series properties, which is clearly seen in Figs. 2.13 and 2.14. It automatically means that the confidence bounds for all statistical characteristics should be recalculated, taking into account the change in the time series variance, covariance and correlation functions, as well as in the spectral density. After a filtering operation, the confidence limits for estimates of the time series' statistical moments become different from what they were initially. Last but not least, by using a band-pass filter with a long weighting function, one can obtain an output time series containing "strong cyclicity" even when the input time series presents a white noise. Such a result would be a pure artifact. The time and frequency domain properties of the linear filters used here are given in Attachment 2.1.
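A minimal sketch of the padding technique is given below. It assumes that the AR coefficients phi and the innovation RMS sigma have already been estimated (e.g., with AVESTA1); the function name is illustrative, and the sketch is not a reproduction of the AVESTA programs.

import numpy as np

def ar_pad_filter(x, phi, sigma, weights, rng):
    """Filter x with `weights`, padding both ends with AR-simulated data
    so that the output has the same length as the input."""
    p, half = len(phi), (len(weights) - 1) // 2
    # forward extension simulated from the last p observed values
    fwd = list(x[-p:])
    for _ in range(half):
        fwd.append(np.dot(phi, fwd[-p:][::-1]) + sigma * rng.standard_normal())
    # backward extension: a stationary AR model has the same second-order
    # properties in reversed time, so the same recursion is run on the
    # time-reversed beginning of the series
    bwd = list(x[:p][::-1])
    for _ in range(half):
        bwd.append(np.dot(phi, bwd[-p:][::-1]) + sigma * rng.standard_normal())
    padded = np.concatenate([np.array(bwd[p:])[::-1], x, np.array(fwd[p:])])
    return np.convolve(padded, weights, mode="valid")   # length = len(x)

# usage: y = ar_pad_filter(x, phi, sigma, np.ones(5) / 5, np.random.default_rng(0))

The "valid" convolution trims exactly the simulated values, so the filtered output covers the observed interval only.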
2.3 Time Domain Analysis

In what follows, we will assume that the reader is familiar with all types of preliminary processing that can be applied to the time series before its autoregressive analysis. As mentioned above, the same transformations of the time series can be executed by the AVESTA3 program.

Example 2.8 Correlation Function and Autoregressive Modeling

To explain the computation process of the AVESTA1 program, we will again use the recently updated file of the monthly global surface temperature anomalies HadCRUT5 created at the University of East Anglia and available at the web site https://crudata.uea.ac.uk/cru/data/temperature/. This part of the book was completed in 2021, when the time series available to us covered the interval from 1857 through 2020; the first seven annual values from 1850 are not used here. In this example, we are interested in climatic time scales, so using monthly data would produce a large amount of information redundant for climate research. Therefore, the monthly data will be averaged over 12 months, that is, L = 12 in the CAT.DAT file. The length of the time series consisting of the annual data will be 164, which means that the maximum AR order M should be set to 16. At this time, we are not interested in extrapolation of this time series, so the ENDDATE should be equal to zero. The original sampling rate is 12 observations per year. In order to obtain the time series of annual surface temperature, we will run AVESTA1 with the following CAT.DAT parameters:

1968  16  501  0  0  12  1  0.083333333  0    0    0    0
N     M   NF   K  R  L   LS DT           MFLT KFLT LFLT ENDDATE
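The averaging step itself (L = 12) is elementary; the following sketch shows it for readers who prefer to prepare the annual series outside AVESTA1. The file name GL5 repeats the example given below in the text, and the file is assumed to hold one anomaly value per line.

import numpy as np

monthly = np.loadtxt("GL5")      # 1968 monthly anomalies, 1857-2020
L = 12
annual = monthly[: monthly.size // L * L].reshape(-1, L).mean(axis=1)
print(annual.size)               # 164 annual values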
As seen from Fig. 2.15, the time series of annual anomalies contains a strong positive trend, which will be regarded here as an integral part of the series irrespective of the trend's provenance. A strong linear trend exists in all nine time series of global, hemispheric, oceanic, and terrestrial temperature published by the University of East Anglia and by other sources. With the time series of annual data, the file CAT.DAT will look as shown below.

164  16  501  0  0  1  1  1  0    0    0    0
N    M   NF   K  R  L  LS DT MFLT KFLT LFLT ENDDATE
Obviously, the calculations can also be done with the monthly data, and then the CAT.DAT file will be

1968  16  501  0  0  12  1  1  0    0    0    0
N     M   NF   K  R  L   LS DT MFLT KFLT LFLT ENDDATE
Fig. 2.15 Anomalies of annual global surface temperature HadCRUT5, 1857–2020, according to the UEA data
The linear trend in the time series will not be deleted in this example (K = 0). The frequency interval from 0 cpy to 0.5 cpy (cycles per year) will contain 501 values of spectral density (NF = 501). Preparing the CAT.DAT file takes a few minutes, and the run of AVESTA1 takes slightly over 0.01 s. The program will ask the user for the names of the initial time series file (e.g., GL5) and the resulting file (e.g., GL5_nodt.res). After running the program with monthly data, the output file will contain the following information:

• the parameters from CAT.DAT,
• the initial time series of monthly values of length N = 1968,
• its basic statistical moments (mean value, variance, root mean square value, skewness, kurtosis, and the standardized values of the last two; see the sketch after this list),
• the time series after averaging and its basic statistical moments,
• the estimated trend slope and (in this case) a warning that a statistically significant trend will not be deleted,
• a sample estimate of the correlation function for lags from 0 to N/5 (but not more than 200),
• AR orders, AR coefficients, and NF = 501 values of the spectral density estimate for each autoregressive order from 1 through M,
• white noise (innovation sequence) variance estimates for models of orders from 0 through 16,
• values of the five order selection criteria, and
• the optimal AR orders recommended by the order selection criteria.
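The basic statistical moments in this list can be sketched as follows; the "standardized" skewness and kurtosis are assumed here to be the sample values divided by their large-sample RMS errors, which is a common convention, though AVESTA1's exact definition may differ.

import numpy as np
from scipy.stats import skew, kurtosis

def basic_moments(x):
    n = x.size
    return {
        "mean": x.mean(),
        "variance": x.var(ddof=1),
        "rms": x.std(ddof=1),
        "skewness": skew(x),
        "kurtosis": kurtosis(x),                        # excess kurtosis
        "std_skewness": skew(x) / np.sqrt(6.0 / n),     # assumed convention
        "std_kurtosis": kurtosis(x) / np.sqrt(24.0 / n),
    }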
The following additional information is given only for the optimal model selected by the corrected Akaike criterion AICc:

• the autoregressive estimate of the extended correlation function for lags from 0 through 200,
• information required for determining the confidence bounds for the estimates of the mean value and RMS, including the respective numbers of mutually independent observations (see Yaglom, 1987, Vol. 1, Chap. 3),
• 90% and 95% confidence limits for the estimated mean value and RMS,
• estimates of the AR coefficients for the model selected by the corrected Akaike criterion AICc, their 90% confidence limits, and the RMS of the estimation errors,
• the number of degrees of freedom for determining the approximate confidence bounds for the selected spectral estimate,
• the autoregressive (maximum entropy in the Gaussian case) spectral estimate for the optimal model with the approximate 90% and 95% confidence limits,
• predictability criteria: the relative predictability and the correlation coefficient between the unknown true and predicted values of the time series as functions of the prediction lead time,
• extrapolation results from the initial point coinciding with the last term (ENDDATE) of the time series that is being extrapolated, and the forecast values with 90% and 95% confidence limits.

The predictability information is given only when the parameter ENDDATE is not zero. As with the monthly data, the statistics for the annual temperature data show that it is not Gaussian (the standardized skewness exceeds 5). Usually, the criterion AICc selects the model of the highest order, so the way to make AICc select a smaller order is to set M to the desired value. As mentioned above, the printout contains the first 200 values of the maximum entropy extension of the HadCRUT5 correlation function. The extension is built by using the first p values of the sample correlation function, where p is the selected AR order (p = 4 being the order of the optimal model), and the analytical expression for the correlation function at higher values of the lag (Box et al., 2015, Chap. 3). As seen from Fig. 2.16, the two correlation function estimates are very different, and the analytical approximation describes HadCRUT5 variations as a process with a rather longer memory than follows from the sample estimate of the correlation function given at the beginning. Generally, an estimated correlation function is not regarded as a convenient characteristic of time series properties, especially when the time series is not long. However, the maximum entropy estimate of the correlation function is used in AVESTA1 to determine the number of mutually uncorrelated (independent, in the Gaussian case) observations in the time series. There are two such numbers: one for the mean value estimation and one for the estimation of RMS; both depend upon the behavior of the entire correlation function. The availability of an analytical recurrent expression for the maximum entropy extension of the correlation function allows one to calculate the respective confidence bounds for these statistical moments (Yaglom, 1987, Vol. 1, Chap. 3).
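AVESTA1 computes these numbers from the maximum entropy extension of the correlation function following Yaglom (1987); the exact formulas are not reproduced here, but the idea can be illustrated with the common approximation for the mean-value estimate:

import numpy as np

def n_eff_mean(rho, n):
    # rho[k-1] is the correlation function at lag k = 1..K; for AVESTA1
    # this would be the maximum entropy extension discussed above
    return n / (1.0 + 2.0 * np.sum(rho))

For an AR(1) process with coefficient phi, rho(k) = phi**k and the sum gives n_eff = N(1 − phi)/(1 + phi): the stronger the serial correlation, the fewer effectively independent observations remain.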
Fig. 2.16 The sample (black) and maximum entropy estimates of HadCRUT5 correlation function
This information is given at the confidence levels 0.90 and 0.95. Note that in this case the maximum entropy (more accurately, autoregressive) estimate of the correlation function decreases very slowly, so that the numbers of independent observations required for finding the confidence intervals for estimates of the mean value and variance are very small (1 and 3). This is caused by the dominance of low frequencies in the HadCRUT5 spectrum, which will be discussed later. It should be noted that the information about the confidence limits for the mean value and variance estimates is given to show the user that time series generally possess internal (serial) correlations which affect the quality of statistical estimates, and that the quality becomes worse when the serial correlation is high. In a way, this can be regarded as a decrease in the time series length. In any practical case, the algorithm of autoregressive time series analysis always looks for a solution that satisfies the stationarity requirements. The information about the number of mutually independent (uncorrelated) values in the time series and the confidence intervals for the mean value and variance estimates shows the degree of the time series' closeness to nonstationarity. The quantitative information is given by the sequence of autoregressive equations in the time domain model and its characteristic equation: the roots of the characteristic equation should lie outside the unit circle. The AVESTA1 program does not provide the latter information, but each model of orders from 1 to M contains a list of coefficient estimates which can be used to find the roots of the characteristic equation and related quantities (see Box et al., 2015 and the text below). The printout contains information about the statistical reliability of the estimates of the AR coefficients.
Table 2.5 AR coefficients, their 90% confidence bounds and RMS errors

No   Lower bound   COEFFS    Upper bound   RMS
1     0.4932        0.6143    0.7354       0.0739
2    −0.1557       −0.0043    0.1470       0.0923
3    −0.0410        0.1104    0.2617       0.0923
4     0.1272        0.2481    0.3693       0.0739
In this case, two of the four AR coefficients are statistically significant (Table 2.5); they equal approximately 0.61 and 0.25 and indicate a positive effect of the past upon the current value of the time series. This means that the dependence of HadCRUT5 upon its past is quite strong. Note that all four coefficients should be taken into account when working with the time and frequency domain properties of the time series. Specifically, the time domain model of this time series is given by the following stochastic difference equation:

x_t = ϕ_1 x_{t−1} + ϕ_2 x_{t−2} + ϕ_3 x_{t−3} + ϕ_4 x_{t−4} + a_t,

where ϕ_j, j = 1, …, 4 are the coefficients given in the third column of the table and a_t is the innovation sequence, which presents a white noise with the variance (in this case) 0.158608E−01 given in the printout after the spectral estimates corresponding to the maximum AR order (16 in this case). Obviously, the AR coefficients describe the dependence of the current value of the time series upon its past values. For this time series, the dependence is stronger for the previous year t − 1 and for the year t − 4; the values of temperature in the years t − 2 and t − 3 play a smaller role. All but one of the terms on the right side of the above equation are deterministic: at time t they are known from observations, and their contribution to the current value of the time series is also known. The only term which is not deterministic is, of course, the value a_t of the white noise innovation sequence, and it obviously presents the error of prediction (or extrapolation) of the time series x_t at the unit lead time. Therefore, the variance of the innovation sequence defines the variance of the prediction error, while the ratio of the respective RMS to the time series RMS presents a criterion characterizing the time series predictability. Moreover, it also means that no method of linear extrapolation can have an error variance smaller than the innovation sequence variance in the above equation. The issue of statistical predictability of stationary random processes is discussed in detail in Sect. 2.5; here, we are just stressing the fact that the results of time series analysis with AVESTA1 contain some important information about the time series predictability even when the user is not interested in predicting the time series (ENDDATE = 0). Useful additional information about the statistical properties of annual global temperature can be obtained by analyzing the roots of the equation that describes the autoregressive model of the HadCRUT5 time series. They can be obtained by finding the roots of the polynomial with coefficients [1, −ϕ_1, …, −ϕ_4].
In this case, the polynomial has four roots: two real and two complex-conjugate. One real root is close to 1 (0.98), which shows that the time series is close to being a sample of a nonstationary random process. The complex-valued roots reveal the presence of oscillations with a frequency close to 0.23 cpy, but their damping coefficient is only about 0.68, so the oscillations decay quickly. The program AVESTA1 provides the autoregressive coefficients, while more detailed information about the oscillatory behavior can be obtained with the Matlab procedure roots (also see Box et al. 2015, Chap. 3). In particular, if G_i is the i-th root of the polynomial [1, −ϕ_1, −ϕ_2, −ϕ_3, −ϕ_4], the frequency of the damped oscillation will be f_i = atan[Im(G_i)/Re(G_i)]/2π and the damping coefficient will be |G_i|. In this case, the oscillations with a time scale of about 4 years do not play a noteworthy role in the balance of climate variability.
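This root analysis is easy to reproduce; the following sketch (equivalent to the Matlab call mentioned above) uses the coefficient estimates from Table 2.5 and should return the values quoted in this section: a real root near 0.98 and a complex pair with f close to 0.23 cpy and |G| close to 0.68.

import numpy as np

phi = [0.6143, -0.0043, 0.1104, 0.2481]      # AR(4) coefficients, Table 2.5
G = np.roots([1.0, *(-c for c in phi)])      # as Matlab roots([1 -phi1 ... -phi4])

for g in G:
    if g.imag > 1e-8:                        # one member of each complex pair
        f = np.angle(g) / (2.0 * np.pi)      # frequency, cycles per DT (here cpy)
        print(f"oscillation: f = {f:.2f} cpy, damping |G| = {abs(g):.2f}")
    elif abs(g.imag) <= 1e-8:
        print(f"real root: {g.real:.2f}")

This concludes Sect. 2.3.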
2.4 Frequency Domain Analysis

A spectral estimate is necessary for describing any stationary time series. If you do not know the spectrum of the time series, you know next to nothing about its properties as a function of either time or frequency. The spectral density shows whether the random process (climate, in the case of annual global temperature) that generated the time series contains any cyclic oscillations, shows the degree of low-frequency dominance, and gives one an idea of the process' statistical predictability, that is, tells the user to what extent one may hope to get a reliable probabilistic forecast of the time series. If the time series is not stationary, the task of its analysis and prediction cannot be solved unless the dependence of its statistical properties upon time is known in advance. Nonstationary random processes are not discussed in this book.

The reliability of frequency domain information is defined by the number of degrees of freedom of the spectral estimate. It is calculated here as the ratio of the time series length to the AR order; in the current case, it is N/p, that is, 41. This approach is approximate, and it follows the solution given in older editions of the Bendat and Piersol books. The decision to use this approximation is related to the multivariate autoregressive spectral analysis: a mathematically strict way to calculate confidence intervals for estimates of frequency-dependent functions does not exist because the approach to multivariate spectral analysis through multivariate time domain models seems to have never been applied (or published) before in natural sciences.

Example 2.9 Spectrum of Annual Global Surface Temperature

As mentioned above, getting an estimate of the spectral density must be a part of the analysis of any stationary time series. Besides, the autoregressive time domain models of some geophysical time series have high autoregressive orders, which makes it impossible to study the time series using only the time domain information. The spectral estimate corresponding to the AR(4) model of the global annual temperature that contains a statistically significant linear trend is shown in Fig. 2.17a.
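For reference, the autoregressive spectral density itself is a simple closed-form function of the AR coefficients and the innovation variance. The sketch below assumes the one-sided scaling used in this book (a white noise of unit variance and DT = 1 has the constant spectrum 2); it is an illustration, not the AVESTA1 code.

import numpy as np

def ar_spectrum(phi, sigma2, dt=1.0, nf=501):
    """One-sided AR spectral density on 0 <= f <= 1/(2*dt)."""
    f = np.linspace(0.0, 0.5 / dt, nf)
    lags = np.arange(1, len(phi) + 1)
    e = np.exp(-2j * np.pi * np.outer(f, lags) * dt)
    denom = np.abs(1.0 - e @ np.asarray(phi)) ** 2
    return f, 2.0 * sigma2 * dt / denom

# AR(4) model of annual HadCRUT5 with the innovation variance quoted earlier
f, s = ar_spectrum([0.6143, -0.0043, 0.1104, 0.2481], 0.0158608)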
Fig. 2.17 Autoregressive spectral estimates of annual global surface temperature HadCRUT5, 1857–2020, with the linear trend present (a) and removed (b)
The estimated spectrum decreases with frequency by about three orders of magnitude and contains a statistically significant swelling at the frequency of about 0.22 cpy. If the linear trend is deleted, the time series HadCRUT5 becomes Gaussian, its variance drops almost fourfold, and the spectral density diminishes at frequencies below 0.02–0.03 cpy, but the shape of the spectrum remains unchanged, though the smooth peak at about 0.22 cpy becomes slightly more reliable (Fig. 2.17b). The change of the spectrum at low frequencies is not very important because the properties of global temperature cannot be described reliably at time scales comparable to the time series length. On the whole, the almost monotonic shape of the spectral density characterizes the ergodic time series HadCRUT5, and the random process that generated it, as a rather simple stochastic system with no striking features such as sharp peaks or troughs. The statistically significant smooth peak near 0.2 cpy has been mentioned in the time domain part of this example and will be discussed in Chap. 3. This is all that can be said at this time about the spectrum of the time series HadCRUT5.

In concluding this example and Sect. 2.4, the major probabilistic and physical properties of the time series of annual global surface temperature represented with the time series HadCRUT5 can be formulated as follows:

• the original HadCRUT5 time series is not Gaussian,
• it contains a statistically significant linear trend (which in this case is regarded as an integral part of its variability); the dominance of low frequencies in variations of temperature is a characteristic physical feature of the climate presented with this time series,
• within the set of the 16 AR models fitted to the time series HadCRUT5, all order selection criteria show the AR(4) model as optimal,
• the model indicates a strong dependence of the current HadCRUT5 value upon what happened especially one and four years ago,
• the time series is stationary; its natural frequency is close to 0.23 cpy and the damping coefficient is close to 0.68; therefore, the oscillations at that frequency disappear very quickly,
• as the estimated spectral density is almost monotonic and increases by orders of magnitude toward the lowest frequencies, the contribution of the higher frequencies to variations of the global temperature, including the maximum at about 0.23 cpy, is small,
• the predominance of low frequencies in variations of temperature shows that the time series will probably have a relatively high statistical predictability.

This latter property is discussed below in detail. This concludes Example 2.9 and Sect. 2.4.
2.5 Statistical Predictability and Prediction

A reliable prediction of nature-generated processes represented with time series is probably the most desirable goal in natural sciences. The term prediction (extrapolation, forecast) means obtaining future values of a time series at times that exceed the time of the last observation. It also means that the predicted trajectory of the time series will be supplemented with confidence limits given at some confidence level. The time interval between the last observation and the predicted value is called the lead time. Both common sense and the theory of random processes tell us that any proper prediction must solve two tasks: determine the probable trajectory of the time series over the prediction interval and determine the prediction error at every lead time. Any prediction that does not contain all this information is useless. The first requirement is clearly related to the probability density function of the process that is being predicted: if you do not know that function, you cannot determine the probable future trajectory of the process. The estimate of the forecast error also depends upon the time series PDF. Obviously, any prediction with an unknown accuracy is worthless.

The desire to have a reliable method of prediction may explain, at least partially, the fact that we have so many methods of prediction in natural sciences that the choice of the best method becomes a difficult problem by itself. A good example in this respect is a relatively recent publication by Papacharalampous et al. (2018), where the efficiency of extrapolation at the unit lead time was tested experimentally for twenty different methods and several other methods were mentioned in their review of the literature. A partial list of the methods of forecasting used or mentioned by the authors of that publication includes regression modeling with or without exponential smoothing, neural and neurofuzzy networks, machine learning, farmer's and naïve methods, ARIMA, ARFIMA, etc. The necessity to test those numerous methods
clearly tells us that the authors of the article believe that the problem of time series prediction has not been resolved in the theory of random processes. The foremost reason for this ungainly situation in natural sciences is much simpler: it is the insufficient knowledge of the theory of random processes revealed by numerous authors in their never-ending attempts to construct a better method of time series forecasting and apply it to real data. The forecasting of random data in the form of time series is a mathematical problem, and it must be treated within the framework of mathematics, specifically the theory of random processes, rather than on the basis of considerations which have no reliable mathematical foundation or upon the results of simulations. All methods of forecasting must have analytical solutions for the two functions mentioned at the beginning of this section: the probable future trajectory of the time series and the respective prediction error characteristic, given in the form of equations. Any method of forecasting that is not based upon a proper mathematical theory and does not produce this information is senseless. Comparisons of different methods using examples of forecasts cannot be regarded as a proof of a method's efficiency or inefficiency. The lack of a mathematical theory for a method of forecasting makes it unreliable, to say the least.

Actually, the problem of extrapolation of stationary random processes was solved once and forever over 80 years ago by Andrey Kolmogorov (1939, 1941) and Norbert Wiener (1949) independently of each other; their theory of extrapolation constitutes a fully accomplished part of the theory of random processes. A. Kolmogorov published his theoretical works in 1939 and 1941, while N. Wiener, who was unaware of Kolmogorov's publications, developed the same theory of extrapolation in 1942 and augmented it with a prediction method. Their pioneering works constitute the foundation of the theory of extrapolation of time series generated by stationary random processes. To our shame, this theory is practically unknown in natural sciences. In most books and articles partially or fully dedicated to forecasting, the Kolmogorov-Wiener theory (KWT in what follows) is not even mentioned. All nature-generated processes occurring in the Earth system (with the exception of tides) are random, and the ignorance of the theory and methods of time series prediction created in probability theory does not do honor to our sciences.

The Kolmogorov-Wiener theory was developed by its distinguished authors for stationary random processes, that is, for processes whose statistical properties do not depend upon the time origin. In particular, for a discrete stationary random process to be extrapolable on the basis of KWT, the integral of the logarithm of its spectral density over the frequency band from −1/2DT through 1/2DT must be finite. This, by the way, is one of the reasons why knowledge of the spectral density is mandatory for understanding and predicting the time series and the processes that generated them. If this theoretical requirement upon the spectral density is met (and it is often met in real situations), it turns out to be possible to develop an equation for extrapolating the probable trajectory of the time series into the future as a function of its history and to build a formula for the variance of the extrapolation error as a function of
the lead time. The latter quantity presents the most common measure of time series predictability, and it will be used in this book. As has been said before, any prediction without mathematically correct information about its error variance is worthless. It should also be stressed that the only requirement upon the random process that one wants to extrapolate into the future is that the process is stationary. The future behavior of a nonstationary process (that is, a process whose statistical characteristics vary with time and depend upon the time origin) cannot be predicted reliably without knowledge of its statistical properties in the future, which is generally impossible when we have a time series of a finite length.

The Kolmogorov and Wiener extrapolation theory ensures the smallest possible extrapolation error variance for any stationary random process having a Gaussian (normal) probability distribution. It means that if one has a Gaussian time series, its forecast must be made in accordance with the KWT, because no other approach, linear or nonlinear, can produce better results than a method based upon this theory. Moreover, there cannot be any linear method of extrapolation that has an error variance smaller than what follows from the KWT; this statement is true irrespective of the probability density function of the time series that is being extrapolated. Therefore, if your time series is not Gaussian, the only way to get an extrapolation error variance smaller than what follows from the linear theory of Kolmogorov and Wiener is to use a nonlinear approach. The problem of the best possible linear extrapolation was solved many decades ago and, if a proper linear method of forecasting is used, the respective error variance is the smallest. Building a mathematically proper nonlinear theory and methods of extrapolation presents a separate task, and it is relevant only for non-Gaussian random processes. Yet, in numerous publications dedicated to time series forecasting, the issue of the probability distribution function is not even mentioned. Moreover, the "modern and approved" methods of forecasting are supposed to work without any information about the time series spectrum.

The above statements regarding the absolute superiority of KWT-based extrapolation of Gaussian stationary random processes need further comment. In any specific case, an individual empirical forecast, successful or unsuccessful, is not important mathematically. When we operate with concepts lying within the theory of random processes, such as mean values, variances, and other statistical moments including the spectral density, we mean the mathematical expectations rather than the results of individual tests or experiments. It means that one should have a mathematically proper equation for calculating the most probable future trajectory of the time series and another equation to determine the prediction error variance as a function of the lead time. This is why the results of Monte Carlo experiments cannot be regarded as a mathematical proof; such a proof does exist in the case of time series extrapolation within the Kolmogorov-Wiener theory. And there is absolutely no sense in inventing new methods of probabilistic forecasting of stationary time series not supported with a mathematical foundation. Such a foundation exists in the form of the KWT in all
stationary linear and Gaussian cases, while no single method of extrapolation can be created for all non-Gaussian processes; each of them requires a solution taking into account the specific probability density function and spectral density. Thus, if I am forecasting a Gaussian time series and use a nonlinear method, I must prove that the Kolmogorov and Wiener theory is incorrect; otherwise, I am doing senseless work. If a time series belongs to a non-Gaussian process, it is theoretically possible to obtain a mathematically strict solution which would be more accurate than what is achievable within KWT, but in any such case the mathematical basis and the results of extrapolation should be compared with what follows from the Kolmogorov-Wiener theory. Unfortunately, this powerful and at the same time simple theory created by the two giants of mathematics over 80 years ago is still not known in natural sciences, including the Earth and solar sciences.

The methods of extrapolation of stationary Gaussian time series have been evolving from a very cumbersome approach through the correlation function developed by N. Wiener in 1942 to a simpler method based upon the spectral density (Yaglom 1962) and then to the ARIMA methods, which became available for practice over half a century ago (Box and Jenkins, 1970, with four later editions through 2015). The method of time series forecasting used in the AVESTA1 program is autoregressive extrapolation; it completely agrees with the Kolmogorov-Wiener theory and is quite simple in practical applications.

Many natural processes have a Gaussian PDF, but non-Gaussian cases are also numerous. Thus, if the time series is not Gaussian, one may hope that nonlinear methods of extrapolation will produce better results. The problem here is that the number of distributions that differ from the Gaussian is large and it is hardly possible to create a single theory of extrapolation for all of them; indeed, the Kolmogorov-Wiener theory provides the optimal extrapolation only for Gaussian processes. Every non-Gaussian time series has to be studied separately: its probability density function and spectrum must be analyzed, and the optimal extrapolation and error variance equations must be formulated for it. Moreover, extrapolation in accordance with the KWT ensures the minimal error variance of linear extrapolation irrespective of the time series PDF.

In the general (not necessarily Gaussian) case, the task of predicting the future behavior of a time series requires a function of the past that describes the probable future trajectory and an equation for the prediction error variance. These quantities will be specific for each type of non-Gaussian process, and the predictor will be "a very complicated statistical characteristic of the [time series], which cannot be expressed in terms of the correlation function" (Yaglom 1962, p. 99). The nonlinear approach to extrapolation of non-Gaussian stationary random processes is complicated; the gain in the accuracy of its results may not be large, and it is desirable, as recommended, for example, in the monograph by J. De Gooijer (2017, Chap. 4, p. 119), to begin "from a linear model and abandon it only if sufficiently strong evidence for a nonlinear alternative can be found".
To demonstrate the importance and correctness of this opinion, this chapter contains the first English translation of a unique study presented by A.M. Yaglom and his two young colleagues in 1960 to a conference dedicated to probability theory and mathematical statistics (Attachment 2.2). The conference took place in the former Soviet Union, and the report was published in the Russian language in 1962 in the former Soviet republic of Lithuania. Since then, it has seemingly remained unknown in the world literature and even seems to be unavailable on the Internet. The report describes 13 non-Gaussian stationary random processes, defines the respective PDFs and spectra, and provides equations for their optimal extrapolation and for determining the error variance. It demonstrates that in all 13 cases the results of mathematically strict nonlinear extrapolation are better than the results of the optimal linear extrapolation within the framework of the Kolmogorov-Wiener theory by just a few per cent. The participants of the conference were mathematicians, and understanding this work in detail is not easy. However, it gives us a clear example of how to properly approach the task of time series extrapolation. Besides, it seems that this publication is not known even to the mathematical community.

As seen from Yaglom's work, in order to build a method for extrapolation, execute it, determine its accuracy and, if necessary, compare it with a different solution, one needs to satisfy several requirements. They are listed below in a simple form, without any mathematics; a more rigorous treatment, illustrated with examples, is given by A. Yaglom in Attachment 2.2. The simple form:

• you must have a mathematical model of your process, including its non-Gaussian probability distribution function and its spectrum,
• using the model, define the respective extrapolation method and determine its accuracy, in particular through a predictability criterion (it can be the relative predictability criterion (RPC) used in AVESTA1 and in the extrapolation examples given below),
• extrapolate the time series,
• compare the results of linear and nonlinear forecasts.

No empirical approach can serve as a substitute for the mathematical way of solving the problems of extrapolation. As mentioned before, no tests of Monte Carlo type can prove that this or that method of extrapolation is better or worse than some other method. It is a mathematical problem, and it must be solved using the analytical tools provided by mathematics. Certainly, the results obtained by A. Yaglom do not mean that efforts to forecast non-Gaussian random processes are useless. According to him, more solutions of this problem are highly desirable, but they must be obtained within the theory of random processes and by no means can they be based upon empirical methods.

In the light of this unique work by A. Yaglom and of the recent recommendation by J. De Gooijer, no nonlinear extrapolation should be published without comparing it with the classical linear approach. Besides, getting "sufficiently strong evidence for a nonlinear alternative" (De Gooijer 2017, p. 119) would hardly be an easy task.
An example of such a process in solar science is the sunspot numbers known since the 17th Century; a satisfactory linear forecast of this process does not seem to be possible, at least for the monthly data. Such nonlinear predictions do exist, but they do not seem to include quantitative comparisons with the results of linear extrapolation within the Kolmogorov-Wiener theory. Currently, linear predictions can be easily obtained by building an autoregressive model of the respective time series and then extrapolating it in accordance with the optimal AR model (see Box et al. 2015). This is actually what the AVESTA1 program does with scalar time series belonging to stationary random processes.

Thus, the exhaustive list of actions for time series forecasting must include the following stages:

• evaluate the probability density function (PDF) of the time series,
• estimate the spectral density,
• extrapolate the time series and determine the extrapolation error variance in accordance with the KWT,
• if the PDF is Gaussian, the task is complete,
• if the PDF is not Gaussian and you believe that the results of your KWT extrapolation are not satisfactory, develop a mathematically correct method for your specific non-Gaussian time series, apply it, and
• compare your new results with the results of the linear extrapolations based upon the KWT.

For details, see the classical book by Yaglom (1962, Chap. 4) and Attachment 2.2. The forecasts with the AVESTA1 program are linear and obtained in accordance with the Kolmogorov and Wiener theory of extrapolation.

Example 2.10 Extrapolation of Annual Global Surface Temperature

The time series of annual global surface temperature HadCRUT5 for 1857–2020 contains 164 terms; we will produce and discuss examples of its extrapolation starting with the ENDDATEs 2010 and 2016 and then compare the forecasts with independent observations obtained after those dates. The initial file of the annual global temperature is HadCRUT5, and the results of its analysis and forecasting will be HadCRUT5_frcst_2010.RES and HadCRUT5_frcst_2016.RES. For the case of no trend removal and the last known value of HadCRUT5 in 2010, the CAT.DAT file will be

154  15  501  0  0  1  0  1  0    0    0    2010
N    M   NF   K  R  L  LS DT MFLT KFLT LFLT ENDDATE
A run of AVESTA1 with this CAT.DAT file shows that the time series HadCRUT5, with the linear trend not removed, is not Gaussian, though the deviation from a normal PDF is not very large: the standardized skewness is about twice the threshold of 2.0. The time series is best presented with an autoregressive model AR(4) selected by all five order selection criteria.

Actually, some important information about the potential predictability can be obtained from the printout of analysis results at the stage of selecting the optimal AR order for the time series. The second column of Table 2.6 contains the values of the innovation sequence (W.N.) variance for every model, calculated for the HadCRUT5 time series from 1857 through 2010 during the run of the program. These values are the error variances of the optimal linear extrapolation with the AR models of the order given in the first column at the lead time equal to the sampling interval DT, whatever it is: minute, hour, year, etc. (also see the text below Table 2.5). The error variance always monotonically increases with the lead time until it reaches the variance of the process. According to these results, the time series approximated with the AR(4) model has rather high predictability because the RMS of the prediction error at the unit lead time amounts to about 0.12 (that is, √0.0144819), while the time series RMS is 0.30 (that is, √0.091296). As the trend is not deleted in this case, the relatively high predictability extends to almost 20 years (Table 2.7).

Table 2.6 Innovation sequence variances and order selection criteria

Order  W.N. variance   AICc       RIC        BIC        PSI        CAT
0      0.912960E−01    −1.37367   −2.37360   −2.36746   −2.37917   −10.8823
1      0.172135E−01    −3.03513   −4.02184   −4.00972   −4.03314   −57.6455
2      0.166150E−01    −3.06343   −4.03674   −4.01900   −4.05413   −59.3475
3      0.154876E−01    −3.12646   −4.08620   −4.06320   −4.11004   −63.3091
4      0.144819E−01    −3.18623   −4.13221   −4.10432   −4.16287   −67.3448
5      0.145794E−01    −3.17200   −4.10404   −4.07164   −4.14190   −66.4377
Table 2.7 Predictability criteria

No   Lead time   REL_PRED-TY   COR_coeff
0     0.00000     0.0000        1.0000
1     1.00000     0.3983        0.9173
2     2.00000     0.4626        0.8866
12   12.0000      0.6924        0.7215
20   20.0000      0.7965        0.6046
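The predictability criteria in Table 2.7 follow from the AR model itself: the error variance of the optimal linear forecast at lead time tau is sigma_a^2 (psi_0^2 + … + psi_{tau−1}^2), where the psi-weights are computed recursively from the AR coefficients (Box et al., 2015). The sketch below uses the AR(4) coefficients of Table 2.5 as a stand-in for the 1857–2010 model, so only the lead-1 values, which depend on the two variances alone, should reproduce Table 2.7 exactly.

import numpy as np

def predictability(phi, sigma_a2, var_x, max_lead=20):
    """Relative predictability and forecast-truth correlation for an AR
    model with coefficients phi and innovation variance sigma_a2."""
    p = len(phi)
    psi = [1.0]
    for j in range(1, max_lead):
        psi.append(sum(phi[k] * psi[j - 1 - k] for k in range(min(j, p))))
    rows = []
    for tau in range(1, max_lead + 1):
        err_var = sigma_a2 * sum(w * w for w in psi[:tau])
        rpc = np.sqrt(err_var / var_x)                  # REL_PRED-TY
        rows.append((tau, rpc, np.sqrt(max(0.0, 1.0 - rpc ** 2))))
    return rows

# lead-1 check: rpc = sqrt(0.0144819 / 0.091296) = 0.3983, corr = 0.9173
rows = predictability([0.6143, -0.0043, 0.1104, 0.2481], 0.0144819, 0.091296)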
Fig. 2.18 Predictability criteria for the HadCRUT5 time series, 1857–2010: relative predictability (black) and correlation coefficient between the predicted and unknown future values (gray)
The forecast given below is done for the case when the limit of statistical predictability is defined as the lead time that corresponds to the year when the relative predictability criterion (RPC), equal to the ratio of the prediction error RMS to the time series RMS, becomes equal to 0.7 or 0.8. As seen from the table, these limits are reached at lead times of 12 and 20 years. A partial graph of the criteria values as a function of lead time is given in Fig. 2.18. The results of extrapolation are given in Table 2.8 and in Fig. 2.19a. As seen from the figure, the results are satisfactory for the first four years and then go out of the 90% confidence interval. This is not a very satisfactory forecast, but only three observations lie above the 95% confidence level. The forecast from 2016 is quite satisfactory: the independent observations for 2017–2020 stay within the 90% confidence interval, and it looks like the average temperature in 2021 will also be there (these forecasts were made in September 2021, and the observed temperature for 2021 was added in May 2022). The extrapolation of this time series under the assumption that the trend is caused by external factors (e.g., by the anthropogenic forcing) makes the statistical predictability of HadCRUT5 much weaker because of the disappearance of the linear trend effect. If we follow the criterion of the threshold relative predictability of 0.8 introduced here, the predictability limit will be just four years. On top of the forecast of the detrended time series, we would have to add the effect of the deterministic artificial linear trend. After that, the results of such extrapolations with the last observed values in 2010 and 2016, shown in Fig. 2.20, will barely differ from what we have in Fig. 2.19.
Table 2.8 Linear extrapolation results for HadCRUT5, 1857–2010: least-squares linear predictions according to the AR model of order 4 (significance levels for the prediction confidence bounds are 90% and 95%)

Time      Lower95         Lower90      Forecast     Upper90      Upper95      RMS error
2007.000  0.59200000      0.59200000   0.59200000   0.59200000   0.59200000   0.0000
2008.000  0.46600000      0.46600000   0.46600000   0.46600000   0.46600000   0.0000
2009.000  0.59700000      0.59700000   0.59700000   0.59700000   0.59700000   0.0000
2010.000  0.68000000      0.68000000   0.68000000   0.68000000   0.68000000   0.0000
2011.000  0.36571825      0.40422733   0.60158634   0.79894534   0.83745442   0.1203
2012.000  0.26055728      0.30528746   0.53452962   0.76377179   0.80850197   0.1398
2013.000  0.25087667      0.29764380   0.53732530   0.77700679   0.82377392   0.1461
2014.000  0.25780989      0.30605471   0.55330941   0.80056410   0.84880892   0.1508
2015.000  0.21672607      0.26876690   0.53547608   0.80218526   0.85422608   0.1626
2016.000  0.16815073      0.22353825   0.50739926   0.79126027   0.84664779   0.1731
2017.000  0.13972762      0.19740123   0.49297839   0.78855556   0.84622916   0.1802
2018.000  0.12245705      0.18195409   0.48687641   0.79179873   0.85129577   0.1859
2019.000  0.98825898E−01  0.16036737   0.47576733   0.79116730   0.85270876   0.1923
2020.000  0.70897868E−01  0.13447521   0.46030905   0.78614289   0.84972023   0.1987
2021.000  0.46311736E−01  0.11167930   0.44668801   0.78169673   0.84706429   0.2043
2022.000  0.02582029      0.09276766   0.43587286   0.77897806   0.84592543   0.2092
Fig. 2.19 Extrapolation of HadCRUT5 (gray) from 2010 (a) and from 2016 (b) with 90% confidence limits (dashed). The trend is not removed. Later independent observations are shown with black circles
Fig. 2.20 Extrapolation of HadCRUT5 (gray) from 2010 (a) and from 2016 (b) with 90% confidence limits (dashed). The trend is removed as caused by an external forcing. The later independent observations are shown with black circles
In other words, the provenance of the trend (whether it is natural or artificial) is not important. The new features in Fig. 2.20 include a slightly narrower confidence band, satisfactory predictions for 2018 and 2021, and a slightly slower rate of tending of the forecast to the time series mean value or close to it, depending upon the shape of the probability density function. The error variance tends to the variance of the process. An important change of statistical properties of the detrended time series is that it became Gaussian. Therefore, the extrapolation shown in Fig. 2.20b has the absolutely minimal error variance, and no other method, linear or nonlinear, based upon the past values of annual temperature and disagreeing with the Kolmogorov-Wiener theory can produce better results. This concludes Example 2.10.

Example 2.11 Extrapolation of Sunspot Numbers

This example begins with extrapolation of annual sunspot data. The initial data for the monthly sunspot numbers (SSN) is taken from the KNMI web site https://climexp.knmi.nl/data/isunspots.dat. The CAT.DAT file for analyzing the entire time series of annual SSN from 1749 through 2020 should be

3264  27  501  0  0  12  0  0.0833333  0    0    0    2020
N     M   NF   K  R  L   LS DT         MFLT KFLT LFLT ENDDATE
The resulting annual time series of length N = 272 is shown in Fig. 2.21a; the optimal model for it is AR(9) and the respective spectral estimate contains a sharp peak at f = 0.091 cpy (Fig. 2.21b).
Fig. 2.21 Sunspot numbers, 1749–2020 (a) and its spectral estimate with 90% confidence bands (b)
The predictability of annual SSN data is rather high (Table 2.9): the limit of predictability corresponding to the threshold of 0.8 is achieved in 12 years. However, the limit is achieved much sooner if the relative predictability criterion is changed to 0.7: it happens in just three years. The RPC threshold of 0.7 means that the 90% confidence interval for the prediction becomes equal to the RMS of the entire time series. To compare forecasts with independent observation data, consider the cases with the last known values of SSN in 2011 and in 2016. The values of N in the CAT.DAT file should be changed for these years to 3156 and 3216, and the ENDDATE parameter to 2011 and 2016. The PDFs of the annual time series are not Gaussian due to high asymmetry (standardized skewness close to 5, while the absolute values of kurtosis stay less than 2). Both examples of extrapolation of the annual SSN time series in Fig. 2.22 are quite satisfactory. Thus, the trajectories of the SSN predicted from 2011 and 2016 for lead times of 10 and 5 years proved to be good in spite of the deviation of the SSN's probability density function from Gaussian.

Table 2.9 Predictability parameters

#    Lead time   REL_PRED-TY   COR_COEFF
0     0.00000     0.0000        1.0000
1     1.00000     0.3909        0.9205
2     2.00000     0.5994        0.8004
3     3.00000     0.7048        0.7094
4     4.00000     0.7300        0.6835
10   10.0000      0.7521        0.6591
11   11.0000      0.7674        0.6412
12   12.0000      0.8059        0.5920
Fig. 2.22 Extrapolation of annual sunspot numbers with the initial dates 2011 (a) and 2016 (b) and 90% confidence bounds. The forecast was made in September 2021; the value for 2021 was added in June 2022. The observed value for 2022 is 83.0.
A recent (May 2022) update showed that the forecasts were correct for 2021 as well (the data for 2021 is available at https://www.sidc.be/silso/datafiles/). According to that source, the mean annual sunspot number in 2021 was 29.6. A nonlinear approach would hardly produce much better results for the extrapolation trajectory, but, theoretically, it may lead to a narrower confidence interval. The forecasts follow the 11-yr cycle of sunspot numbers and, formally, remain quite satisfactory even at lead times up to 15–20 years in spite of the sharp difference between the PDF of the annual SSN and the normal probability distribution. The "satisfactory" extrapolations at large lead times are good because all independent data lie within the 90% confidence interval, whose width is ±1.64σ(τ), where σ(τ) is the RMS prediction error at lead time τ (also see Fig. 2.23). Obviously, the very strong difference of the best-fitting PDF for this time series (a six-parametric phased bi-Weibull distribution according to the Kolmogorov criterion) from the Gaussian does not make our forecasts of the annual SSN data incorrect; consequently, a nonlinear solution of the extrapolation task would hardly be much better than what is obtained here within the KWT. It should also be mentioned that the SSN amplitude was diminishing in both cases from the initial prediction point (1979 and 1984) to 2020, and the forecasts reproduced this long-term phenomenon.

More detailed information about the statistical predictability of sunspot numbers is obtained from the monthly data from 1749 through 2020 (Fig. 2.24). The CAT.DAT file is:

3264  99  501  0  0  1  0  0.0833333  0    0    0    2020.91666
N     M   NF   K  R  L  LS DT         MFLT KFLT LFLT ENDDATE
Fig. 2.23 Long-term extrapolations of the solar cycle at lead times 42 years (a) and 37 years (b)
Fig. 2.24 Time series of monthly SSN, 1749–2020 (a) and its spectral estimate with 90% confidence limits (b)
The optimal model for the entire time series and for most other time intervals is AR(34); that is, the dependence of monthly values upon the past behavior of the time series extends to almost three years. The statistical predictability of monthly SSN for the entire time series is quite high: the relative predictability criterion stays below 0.7 for 19 months and below 0.8 for 27 months (Table 2.10). The forecast from December 2020 through September 2021 (Fig. 2.25) at lead times up to 9 months is quite acceptable. The probability density functions of the entire time series and of its shorter versions are different from Gaussian and have large asymmetry. The best-fitting PDFs include the monotonically decreasing three-parameter Pert and the general Pareto distributions. This may be the reason for the lower-quality forecasts of monthly data, especially in the vicinity of maxima and minima.
Table 2.10 Predictability parameters

No   Lead time      REL_PRED-TY   COR_COEFF
0    0.00000        0.0000        1.0000
1    0.833333E-01   0.3582        0.9336
2    0.166667       0.4056        0.9141
3    0.250000       0.4273        0.9041
19   1.58333        0.7004        0.7137
20   1.66667        0.7148        0.6993
21   1.75000        0.7290        0.6845
22   1.83333        0.7401        0.6725
23   1.91667        0.7512        0.6601
24   2.00000        0.7650        0.6440
25   2.08333        0.7751        0.6318
26   2.16667        0.7889        0.6146
27   2.25000        0.8030        0.5960
Fig. 2.25 Extrapolation of monthly SSN from December 2020 (gray) and independent observations in 2021 (black circles). The observed data for 2022 was added in June 2022
Yet, the figures given below indicate that good and poor forecasts may occur both during the intervals of high and of low values of SSN (Figs. 2.26 and 2.27). The quality of these two forecasts turned out to be radically different. It looks like the highly asymmetric PDF of monthly SSN data does not necessarily degrade the forecasts near its maxima or minima. These extrapolations are made within the framework of the KWT, and we can only say that no linear approach disagreeing with it can produce a better forecast.
Fig. 2.26 Extrapolations of monthly SSN near the maxima of the 11-year cycle
Fig. 2.27 Extrapolations of monthly SSN near the minima of the 11-year cycle
More accurate predictions of monthly SSN should be sought through nonlinear approaches and should be obtained following the required stages:

• describe the probability density function and the spectral density,
• define the respective most probable future trajectory of the time series,
• show the equation that defines the quality criterion (most commonly, the error variance as a function of the lead time), and
• compare your results with the results of linear extrapolation.

Any forecast that does not satisfy these requirements is useless. This ends Example 2.11.
Example 2.12 Extrapolation of Quasi-biennial Oscillation

The Quasi-biennial oscillation (QBO) is a very special random process which occurs in the equatorial stratosphere at elevations from 18 to 31 km. It consists of steady oscillatory variations of zonal wind speed between approximately −55 m/s (easterly) and 55 m/s (westerly). The process begins at the highest level of 31 km (10 hPa) and moves downward to the altitude of 18 km (70 hPa). In this example, we will consider the time series of monthly data given at the site of the Free University of Berlin, https://www.geo.fu-berlin.de/met/ag/strat/produkte/qbo/qbo.dat. The wind speed is measured at seven levels: 10, 15, 20, 30, 40, 50, and 70 hPa, from January 1953 through December 2020 (plus eleven months in 2021). The length of the time series is N = 816 months, except that the time series at 10 hPa is three years shorter (it begins from January 1956, N = 780). The time series consist of monthly values obtained through direct observations, and our task is to study the statistical predictability of QBO at different levels by conducting some forecasting experiments. The CAT.DAT file for analysis from 1953 through 2020 with AVESTA1 should be:

816  81  501  0  0  1  1  0.0833333  0    0    0    2020
N    M   NF   K  R  L  LS DT         MFLT KFLT LFLT ENDDATE
When AVESTA1 is run with the above given CAT.DAT file, the model AR(12) selected by the Akaike criterion AICc is not optimal. To get the optimal model AR(10) selected by two criteria, one needs to run AVESTA1 again, setting the maximum order M to 10. To correctly detect the frequency of the random vibration clearly seen in Fig. 2.28a, the parameter NF should be changed to 5001. The calculations show that this QBO time series is not Gaussian and does not contain a significant linear trend. The strongest version of QBO seems to be the time series at 15 hPa; its optimal autoregressive model is AR(10).
Fig. 2.28 Time series of QBO at 15 hPa (a) and its spectral estimate (b), 1953–2020
Table 2.11 Predictability parameters

No   Lead time   REL_PRED-TY   COR_COEFF
0    0.0000      0.0000        1.0000
1    0.0833      0.2570        0.9664
2    0.1667      0.4170        0.9089
6    0.5000      0.6978        0.7163
7    0.5833      0.7238        0.6900
16   1.3333      0.8013        0.5983
The time series and its spectral density estimate are shown in Fig. 2.28. The spectral estimate corresponding to the AR(10) model shown in Fig. 2.28b has a sharp peak at f = 0.4272 cpy, which corresponds to the time scale of 28.09 months. All other time series of QBO behave in the same manner, with the frequency of the spectral maximum changing from 0.4236 cpy to 0.4344 cpy (a "period" between 28.3 and 27.6 months) and with the average frequency 0.4279 cpy (28.04 months). According to Table 2.11, the limit of QBO's statistical predictability, characterized here with the entire time series shown in Fig. 2.28, reaches 7 and 16 months at the RPC thresholds 0.7 and 0.8, respectively. In spite of the oscillatory shape of QBO, and with the contribution of the spectral band from 0.2 cpy to 0.6 cpy exceeding 80% of the time series variance, the limit of predictability is rather low: just six or seven months. Yet, all seven forecasts of QBO starting from the last known value in December 2020 (the time when this text was being written) proved to be satisfactory (Fig. 2.29). This does not mean, however, that all forecasts made with other initial points will also be good (Fig. 2.30). This ends Example 2.12 and Sect. 2.5.
2.6 Verification of GCM-Simulated Climate. The Scalar Case

In addition to the above-described applications, the AVESTA1 program can be used for verification of climate data generated with numerical general circulation models (GCMs). Such models are designed for determining possible changes in the nature-generated climate due to external forces caused by anthropogenic activities. The results obtained for situations with a possible presence of external forces, and the forecasts of climate response to them, can be trusted only under the condition that the models are able to correctly reproduce the properties of the climate created by nature. An especially important part of this task is the ability of GCMs to correctly generate the climate variability at decadal time scales, which defines what will be happening with climate within the 21st Century.
Fig. 2.29 Examples of QBO forecasts at levels 10, 15, 40, and 50 hPa starting from December 2020
The IPCC project aimed at predicting the anthropogenic effects upon the natural climate includes a number of experiments (Coupled Model Intercomparison Projects, or CMIPs), with CMIP6 being the latest. The results of the project include time-dependent random fields of global climate elements (temperature, atmospheric pressure, precipitation, and other important elements of climate) given on a spatial grid which covers the entire planet. This data set allows one to calculate the behavior of practically any climate element for the period of instrumental observations from 1850 through 2014, including, in particular, the time series of annual global, hemispheric, oceanic, and terrestrial surface temperature. Such data can be used to compare statistics of the observed and simulated climate elements in order to verify that the models do have the ability to produce simulated data whose statistical properties do not differ significantly from the respective properties determined from climate observations.
Fig. 2.30 QBO forecasts at 20 hPa (a) and 30 hPa (b) from December 2019
Having two time series (observed and simulated) of a specific climate element and using the AVESTA1 program, one can compare different statistical moments of the respective time series. In the scalar case, the comparison includes:

• mean values, variances, and higher statistical moments,
• probability density functions, or PDFs (more accurately, the degree of closeness of the PDFs to the Gaussian curve),
• correlation functions,
• autoregressive time domain models of orders p from p = 0 to the maximum order M ≤ N/10, where N is the time series length,
• spectral densities,
• statistical predictability, and, if necessary, practically any statistic listed earlier in this chapter.

The statistic that is especially important for testing the achievability of IPCC's final goal, projecting the climate change, is the predictability of the observed and simulated natural climates. The results of simulations can be regarded as reasonably reliable if the criteria of predictability obtained for the observed and simulated data do not disagree with each other. The time scales of such comparisons of observed and simulated climate data should not exceed the scales allowed by the length of the observed data sets, that is, about 150 years (also see IPCC 2013, Box TS.3). The numerical general circulation models, the principal tool of climate projections for several future decades, are based upon fluid dynamics equations. The equations cannot be integrated analytically, so that our knowledge of both natural and anthropogenic variations of climate is based upon approximate numerical solutions. By themselves, the equations do not contain any random components, but the results of climate simulations and climate projections with different GCMs are stochastic.
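As an illustration of the first items in the comparison list above, the following Python sketch compares the basic moments and the Gaussianity of an observed series with its simulated counterparts; the names obs and simulated are hypothetical placeholders for the observed record and a collection of model-generated series.

import numpy as np
from scipy import stats

def basic_stats(x):
    """First four moments plus approximately standardized skewness and
    kurtosis; |standardized value| < 2 is roughly consistent with a
    Gaussian PDF (the informal criterion used in this chapter)."""
    x = np.asarray(x, float)
    n = len(x)
    sk, ku = stats.skew(x), stats.kurtosis(x)   # ku is excess kurtosis
    return {"mean": x.mean(), "var": x.var(ddof=1),
            "skew": sk, "kurt": ku,
            "std_skew": sk / np.sqrt(6.0 / n),  # large-sample approximation
            "std_kurt": ku / np.sqrt(24.0 / n)}

# print(basic_stats(obs))                       # hypothetical observed array
# for name, x in simulated.items():             # hypothetical dict of
#     print(name, basic_stats(x))               # simulated series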
The numerically simulated climate acquires this new essential property because of at least two factors: the discrete character of the computational grid in the time and spatial domains, and the imperfect knowledge of many parameters that need to be prescribed for general circulation models. Due to the unavoidable errors in setting the parameters and the initial conditions, numerical climate models generate random errors that grow with time, so that eventually the range of the errors becomes close to the range of the simulated process; further computations become meaningless. This cardinally important problem was described by the outstanding American meteorologist and mathematician Edward Lorenz in a series of classical publications, e.g., Lorenz (1963, 1975, 1995). Another source of randomness in numerical simulations lies in the differences between individual climate models developed and used by the participants of climate projection programs. Significant and sometimes cardinal dissimilarities between the results produced with different numerical GCMs based on the same theoretical platform are well known and well understood in principle; they make it necessary to treat the simulation results produced by various models as sample records of random processes. This is quite proper because climate is a random process. Whenever the results of simulations can be presented as time series of climate indices such as, for example, the annual global surface temperature (AGST), we have to apply time series analysis as the main tool for climate model verification. Certainly, any verification of GCMs as a means for climate projections should begin with studying their ability to reproduce the nature-caused climate variability. A model that is capable of ensuring a reasonably good statistical description of previously observed climate can be trusted and, consequently, can be regarded as a promising tool for the next task: projections of climate response to external forces. Supposedly, at least one of the reasons why the term 'projection' was introduced by the participants of the IPCC programs is the inability of GCMs to predict the natural variations of climate in the sense dictated by the theory of random processes. In particular, the models do not have the ability to tell us whether, under natural conditions, the next decade, year, season, or even month will be warmer or colder than the previous one. Thus, the first goal in verifying a numerical climate model is to make sure that the climate generated by it in the absence of artificial external forcing has characteristics statistically similar to what had been observed in the past. The most important climate index is, of course, the annual surface temperature averaged over the globe. The following example is designed to verify whether the statistical properties of the simulated AGST agree with the properties of the observed data. Comparisons of properties of observed and simulated time series should include the most important statistical characteristics: first of all, the probability distribution functions, the first four statistical moments (mean value, variance, skewness, and kurtosis), and the frequency domain properties of the observed and simulated time series. The time domain characteristics may also include the correlation function, but this function does not seem to be a convenient subject for the task of comparison.
With the autoregressive approach, we also have the scalar and multivariate stochastic difference equations, but they can hardly be regarded as statistical moments and their
comparisons with the stochastic difference equation for the observed time series may not reveal the relative advantages or disadvantages of individual numerical models of climate. The time series used in this section are taken for the interval from 1870 through 2014, having in mind the similar verification of observation and simulation data in the bivariate case, when one of the components of the observed bivariate time series is known only for that interval (see Chapter 3).

Example 2.13 Verification of Climate Simulations within CMIP6

Thus, the time series length N = 145 years, the time step DT = 1 year, and the Nyquist frequency f_N = 1/2DT = 0.5 cpy (cycles per year). The global temperature data HadCRUT5 are taken from the web site https://crudata.uea.ac.uk/cru/data/temperature/ (also see Morice et al. 2021 and Osborn et al. 2021). The observed AGST time series HadCRUT5 may contain a contribution from external sources, but such a contribution is also included in the numerical models of climate, and this is exactly what we need for verifying the models' ability to reproduce statistics of the observed climate. The simulated globally averaged AGST and NINO3.4 data are calculated from the CMIP6 information at https://www.wcrp-climate.org/wgcm-cmip/wgcm-cmip6. The general circulation models used here for the verification are listed in Table 2.12; the total number of time series to be verified is 35. Examples of simulations given in Fig. 2.31 show that observed and simulated time series generally behave rather similarly. A verification of simulated data means comparisons of major statistical properties of the observed and simulated time series. The absolute values of the observed AGST are given here as the average of the absolute temperature for 1961–1990, which is 287.15 K, with deviations from it (see https://en.wikipedia.org/wiki/Instrumental_temperature_record).

Table 2.12 List of general circulation models

#   Name             #   Name               #   Name
1   ACCESS-CM2       13  MCM-UA-1-0         25  EC-EARTH3-Veg
2   ACCESS-ESM1-5    14  MRI-ESM2-0         26  EC-EARTH3-Veg-LR
3   BCC-ESM1         15  NESM3              27  GISS-E2-1-G
4   CAMS-CSM1-0      16  TaiESM1            28  GISS-E2-1-H
5   CESM2            17  AWI-CM-1-1-MR      29  IITM-ESM
6   CESM2-WACCM      18  AWI-ESM-1-1-LR     30  INM-CM4-8
7   CMCC-CM2-HR4     19  CESM2-FV2          31  INM-CM5-0
8   CMCC-CM2-SR5     20  CESM2-WACCM-FV2    32  MIROC6
9   FIO-ESM-2-0      21  E3SM-1-0           33  MPI-ESM-1-2-HAM
10  IPSL-CM6A-LR     22  E3SM-1-1           34  MPI-ESM1-2-h
11  KACE-1-0-G       23  E3SM-ECA           35  MPI-ESM1-2-LR
12  KIOST-ESM        24  EC-EARTH3-AerChem
Fig. 2.31 HadCRUT5 (black) and its GCM simulations (gray)
In a few cases, the average simulated temperature has a noticeable positive bias and a larger variance than the observed temperature. In this example, we will concentrate upon two statistical characteristics of the observed and simulated AGST: its spectral density and its statistical predictability. The first is one of the most important statistical properties of any stationary random process; the second presents an indicator of the IPCC program's ability to "project" the behavior of climate for the forthcoming decades. If the statistical predictability of the simulated data does not differ significantly from the predictability of the observed AGST, the term "projection" (which, probably, is understood by almost everybody as "prediction") becomes more reliable. The term "time series projection" does not exist in the theory of random processes and, anyway, we will need a quantitative measure of predictability as a statistical moment to evaluate the quality of climate simulations with GCMs. As shown earlier in this chapter, the theory of extrapolation of stationary random processes was developed 80 years ago by Andrey Kolmogorov and Norbert Wiener, while practically applicable methods were offered in 1962 (A. Yaglom) and in 1970 (G. Box and G. Jenkins). Yet, in natural sciences, including climatology, the current state of the theory of extrapolation of stationary random processes remains practically unknown. Regretfully, the problem of mathematically proper extrapolation of the AGST time series is not discussed in the IPCC products. In particular, it means that the estimated predictability presented in the IPCC report (2013) has no mathematical foundation. All simulated time series of AGST data obtained in CMIP6 for the interval from 1870 through 2014 contain a statistically significant linear trend, which may also include some external forcing (e.g., the anthropogenic contribution).
Fig. 2.32 Spectral estimates of AGST time series, 1870–2014, with the linear trend present (a) and deleted (b). The solid black line and the line with symbols show the spectrum of the observed AGST and the average spectrum of simulated time series
As we are also interested in the natural variability of climate represented by the time series of AGST, the comparisons should be conducted in two versions: with the trend present (the time series may contain contributions from external sources) and with the trend deleted (purely natural variability with suppressed nature- and externally-caused low-frequency components). The AGST time series generated by the models contain a trend and, eventually, the task is to predict the behavior of climate under whatever external contribution existed during the respective time span. The calculations for the case without trend deletion are conducted with the AVESTA1 program and the following CAT.DAT file:

145  14  501  0  0  1  1  1   0     0     0     0
N    M   NF   K  R  L  LS DT  MFLT  KFLT  LFLT  ENDDATE
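Outside AVESTA1, the same with/without-trend comparison can be sketched in a few lines of Python; the array name agst is a hypothetical placeholder for an annual AGST series, and the AR order 4 is only an example (cf. the p column of Table 2.13).

import numpy as np
from scipy.signal import detrend
from scipy.linalg import solve_toeplitz

def yule_walker(x, p):
    """Yule-Walker AR(p) fit: coefficients phi and innovation variance."""
    x = np.asarray(x, float) - np.mean(x)
    r = np.array([x[: len(x) - k] @ x[k:] for k in range(p + 1)]) / len(x)
    phi = solve_toeplitz(r[:p], r[1 : p + 1])
    return phi, r[0] - phi @ r[1 : p + 1]

def ar_spectrum(phi, sig2, f, dt=1.0):
    """One-sided AR spectral density at frequencies f (cycles per DT)."""
    k = np.arange(1, len(phi) + 1)
    A = 1.0 - np.exp(-2j * np.pi * np.outer(f * dt, k)) @ phi
    return 2.0 * sig2 * dt / np.abs(A) ** 2

# f = np.linspace(0.0, 0.5, 501)                  # up to the Nyquist, 0.5 cpy
# for x in (agst, detrend(agst, type="linear")):  # with / without the trend
#     s = ar_spectrum(*yule_walker(x, 4), f)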
The spectral density estimates for this case are shown in Fig. 2.32a. The spectra obtained from the individual simulated time series and the spectrum averaged over the estimates from all 35 simulated time series (the line with symbols) show that the models generate AGST time series whose spectra agree with the estimate obtained from observations. Most spectra do not differ much from the spectrum of the observed AGST; the same is true for the spectrum averaged over the ensemble of simulated data. This can be regarded as a definite achievement of numerical simulations of annual global surface temperature. When the linear trend is treated as an integral part of AGST variability irrespective of its provenance, the values of spectral density at frequencies below approximately 0.030 cpy (time scales longer than about 30 years) exceed the spectra of the time series with the deleted trend. At higher frequencies, the differences are negligibly small. The trend removal makes the low-frequency variations weaker by about an order of magnitude, which means that the statistical predictability of the original data, which contain a trend, is much higher.
Table 2.13 Comparison of basic statistics of AGST time series with and without linear trend (values given as trend present/trend deleted)

Name      PDF  p     τ0.7  |  Name  PDF  p     τ0.7  |  Name  PDF  p    τ0.7
HadCRUT5  N/Y  4/4   15/2  |  G12   N/Y  3/3   10/1  |  G24   N/N  3/3  6/4
G1        N/Y  2/2   3/1   |  G13   N/N  1/1   5/1   |  G25   N/Y  4/1  12/1
G2        N/N  3/3   7/5   |  G14   N/Y  3/3   6/1   |  G26   N/Y  3/3  7/1
G3        N/Y  3/1   3/2   |  G15   N/Y  1/1   3/1   |  G27   N/Y  4/4  4/1
G4        Y/Y  3/3   3/0   |  G16   N/N  3/3   4/2   |  G28   N/Y  3/3  7/2
G5        N/Y  3/3   8/2   |  G17   N/Y  5/5   19/1  |  G29   N/Y  3/1  7/0
G6        N/Y  5/5   7/1   |  G18   N/Y  6/1   2/0   |  G30   N/Y  4/1  9/1
G7        N/Y  5/3   15/1  |  G19   N/Y  5/5   6/1   |  G31   N/Y  3/1  7/1
G8        N/N  3/3   7/1   |  G20   N/Y  13/7  9/2   |  G32   N/Y  5/5  2/1
G9        N/Y  3/1   14/1  |  G21   N/N  1/1   3/1   |  G33   N/Y  5/5  2/1
G10       N/N  4/3   10/1  |  G22   N/Y  3/3   4/2   |  G34   N/Y  5/1  10/1
G11       N/N  3/3   13/2  |  G23   N/N  3/3   5/2   |  G35   N/Y  6/3  17/1
The spectra for the detrended case are shown in Fig. 2.32b, and the predictability estimates for both cases are given in Table 2.13. If the trend is not deleted, the probability density function of the time series is practically never Gaussian (letter 'N' in the PDF column). The deletion of the trend results in the dominance of the Gaussian PDF: 26 out of the 35 cases (letter 'Y'). The trend deletion leads to smaller optimal autoregressive orders (column p in the table). This is quite understandable, but a more important statistical parameter is the relative predictability criterion (RPC) τα: the lead time (in years) at which the ratio of the prediction error standard deviation to the time series standard deviation becomes equal to α. At α = 0.7, the share of the prediction error variance becomes equal to about 50% of the time series variance, and the width of the 90% confidence interval for the prediction trajectory becomes equal to the doubled standard deviation of the time series. In our case, this predictability limit for the detrended time series does not exceed 5 years, and its average value is about 3 years. The original time series possess significantly higher statistical predictability: it is as high as 19 years for the model G17 (AWI-CM-1-1-MR) and exceeds 9 years in ten cases. Thus, the high predictability of the original time series occurs due to the high values of spectral density at time scales longer than about 30 years, which is comparable to the time series length of 145 years. Formally, the simulation results related to the statistical predictability of the annual global surface temperature do not seem to disagree with what is known from observations. Specifically, the predictability limit τ0.7 = 15 years for the observed time series does not disagree with the average predictability limit τ0.7 ≈ 8 years of the simulated series, whose standard deviation is approximately 4.5 years. This means, in particular, that the statistical predictability of climate estimated at 10 to 20 years in the IPCC report agrees with observations presented with the time series HadCRUT5. On the whole, the simulated climate does not seem to be very different from observations.
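A hedged sketch of how τα can be read off a relative-predictability curve (such as the one produced by the AR forecast sketch at the end of Sect. 2.5):

import numpy as np

def tau_alpha(rel_pred, dt=1.0, alpha=0.7):
    """First lead time at which the ratio of the prediction error standard
    deviation to the series standard deviation reaches alpha; rel_pred[j]
    is that ratio at lead (j + 1) * dt.  Returns None if never reached."""
    rel_pred = np.asarray(rel_pred, float)
    hits = np.nonzero(rel_pred >= alpha)[0]
    return None if hits.size == 0 else (int(hits[0]) + 1) * dt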
This ends Example 2.13 and Sect. 2.6.
2.7 Engineering Time Series

It is understandable that members of the natural science community are not familiar with the tasks that have to be solved by engineers. A major task in mechanical engineering is to obtain quantitative information about the response of mechanical constructions to external forcing that has the form of random vibrations. The engineers solve such tasks by analyzing their data mostly within the frequency domain, while it would hardly be wrong to assume that few of us within the natural science community have ever had a chance to analyze time series of natural phenomena within the frequency domain. The goal of this and two other sections about mechanical engineering is to make more of us familiar with the type of problems that have to be resolved by engineers. This seems to be important because most products created in the areas of technology satisfy the users' requirements; it means that designers of engineering products are well informed about the properties of potential random loads and about the response of the devices manufactured according to their designs. In engineering, the main goal of studying the effect of random loading upon engineering devices, be it a lawn mower or an intercontinental ballistic missile, is to get quantitative information about the response of the device to external loads as a function of frequency. The quantity that defines this response is called the frequency response function. It describes the degree of damping or amplification of the input forcing, and the delay between the load application and the response of the device, as functions of frequency. This information is necessary for a proper design of mechanical devices, and it can be obtained only in the bi- or trivariate cases; in this section, we will show the reader what type of spectral density may be characteristic for engineering data. The initial data present a multivariate engineering time series kindly given to this author by Professor Randall J. Allemang. The data are supposed to be used by students within the Applied Fourier Transform Techniques tutorial. We will see how cumbersome the engineering tasks are when compared with the same type of analysis in natural sciences. Judging by the shape of the spectral density, the engineering problems related to random vibrations look very difficult, especially if compared with what is observed in nature. Also, engineers have some advantages over researchers in natural sciences: their time series are usually much longer and they can repeat their experiments at will. (Obviously, this is not always the case.) We usually have shorter time series and rarely have a chance to repeat the experiment. Generally, our situation looks more difficult, which means, in particular, that we must follow the rules of time series analysis, information theory, and mathematical statistics especially accurately, in particular, to avoid situations when the number of parameters that need to be estimated is comparable to the number of terms in our
time series. Let us see now what type of data the engineers have for designing and testing their devices and constructions.

Example 2.14 Spectra of Engineering Data

The engineering data used in this book were created for an exercise to be completed by students of mechanical engineering: the data set contains records of several sources of random vibrations applied to an aluminum disk and records of the disk's response to those excitations at 33 locations on the disk. The set of input and output processes forms a multivariate linear stochastic system, and in this example we will be using the records of the input processes and of the response to them registered at one point on the disk. Obviously, the random vibration sample shown in Fig. 2.33 starkly differs from what we normally have in natural sciences. In particular, it is extremely long (meaning that its length exceeds the largest time scale of interest by orders of magnitude) and, visually, it behaves in a manner similar to the behavior of tides. However, the tidal contribution to the Earth's atmosphere, ocean, and solid body is a sum of a finite number of strictly periodic functions, and the net tidal effect at any given point varies in time according to well-known astronomical factors and can be predicted precisely at any lead time. The vibration record shown in the figure below presents a sample of a stationary random process, and it must be analyzed with methods based upon the theory of random processes. The same is true, of course, with respect to the time series generated by nature, and we should also stay within the framework of that theory. The time series here contain observations of the response of an aluminum disk to external forcings in the form of random vibrations. Each time series consists of 773,120 observations measured at DT = 0.390625 × 10⁻³ s, that is, at the sampling rate of 2560 measurements per second (2560 Hz). The total duration of the experiment is 302 s.
Fig. 2.33 The input time series IN1 (a) and its first 0.04 s (b)
In this example, we will discuss only the spectral estimates of three time series: one output and two inputs, denoted OUT, IN1, and IN2. The data are given in the attachment ESM, Example 3.8. The analysis of time domain models is not possible because of the high autoregressive orders prescribed by all order selection criteria; it means that the statistical structure of the time series is quite complicated. A graph of the first input time series IN1 is given in Fig. 2.33. The other two engineering time series that we have look similar to what is shown in the figure. In contrast to tides, the external load upon the disk does not contain any deterministic components and presents a sample of a regular random process, meaning that its 'components' fill up the entire frequency band from zero through the Nyquist frequency 1/2DT without any finite intervals between them. Strictly speaking, such a process cannot contain any periodic elements such as sines and cosines. Similar regular random processes of the vibration type are rare in the Earth and solar systems; a well-known example of this type is the Quasi-biennial oscillation in the Earth's upper troposphere and lower stratosphere (see Example 2.12). As seen from Fig. 2.33, the time series can be treated as stationary; analysis with AVESTA1 shows that its probability density function is normal (Gaussian): the absolute values of the standardized asymmetry and kurtosis do not exceed 2. The latter statement is true for all engineering time series discussed in this book. An important final goal of multivariate time series analysis conducted in mechanical engineering is to estimate the frequency response functions that connect the output processes to the inputs. The FRF defines the response of devices to external forcings as a function of frequency; it will be discussed later for both natural and engineering systems, such as the global temperature and ENSO, and the response of a disk to an external forcing. In our engineering sections, we will show statistical properties of scalar and multivariate processes that are studied by engineers to help them build reliable mechanical devices and constructions. Similar tasks should be solved in natural sciences for describing stochastic input/output systems created by nature. The initial file CAT.DAT for AVESTA1 in this case should be:

773120  99  1001  0  0  1  1  0.000390625  0     0     0     0
N       M   NF    K  R  L  LS DT           MFLT  KFLT  LFLT  ENDDATE
The parameter NF is increased to 1001 to show the features of the spectral density in more detail. The trend, if it is present, will not be deleted because it constitutes a proper part of the time series. The ENDDATE parameter is set to zero because we are not interested in forecasting these time series. The time required for estimating the spectral density of a scalar time series of length N = 773,120 using AVESTA1 with the maximum autoregressive order M = 99 on a desktop computer is about 10 s. The correlation functions of natural phenomena usually decrease quite fast with the lag unless the process contains a “periodic” trend or trends such as daily and seasonal cycles caused by astronomical factors or by the environment (e.g., the QBO).
Fig. 2.34 Correlation functions of the input time series IN1 (gray) and Quasi-biennial oscillation at 30 hPa level (black)
Then, the spectral density of such time series contains a sharp peak at the respective frequency (1 cycle per day, per year, or per 28 months) and smaller peaks at the respective harmonics (two or more cycles per day or per year). The situation with this engineering data is different: the correlation function of the input time series varies very fast, changing its sign every one or two time units (Fig. 2.34); the other inputs and outputs behave in the same manner. The much smoother behavior of QBO is seen from its correlation function, which demonstrates the presence of a single quasi-periodic oscillation. The optimal spectral density estimates of the time series correspond to the maximum possible autoregressive order allowed by the AVESTA1 program: according to the five order selection criteria, the best-fitting autoregressive model in the scalar case is always AR(99). The spectra contain many peaks, and most of them are statistically significant. With the time series length exceeding three quarters of a million units, the number of equivalent degrees of freedom (subrecords) used here for computing the approximate confidence interval is close to 8000, so that the estimates are computed at a very high level of reliability; adding the confidence bounds for them is not necessary. A comparison with the results obtained with the Matlab procedure pburg at AR order p = 1000 produces spectral estimates that barely differ from what is shown in Fig. 2.35. The range of the spectral densities in these processes amounts to 13 orders of magnitude (the figure does not show the high-frequency part of the spectra), and the spectra contain many statistically significant peaks.
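As a rough illustration (not the AVESTA1 code), the following Python sketch implements Burg's recursion, the same estimator family as Matlab's pburg mentioned above; the record name in1 and the order 99 are assumptions for illustration.

import numpy as np

def burg_ar(x, order):
    """Burg (maximum entropy) estimate of the AR prediction-error filter
    A(z) = 1 + a1*z**-1 + ... + ap*z**-p and of the innovation variance."""
    x = np.asarray(x, float) - np.mean(x)
    f, b = x[1:].copy(), x[:-1].copy()            # forward / backward errors
    a, e = np.zeros(order), x @ x / len(x)
    for m in range(order):
        k = -2.0 * (f @ b) / (f @ f + b @ b)      # reflection coefficient
        a[:m], a[m] = a[:m] + k * a[:m][::-1], k  # Levinson-type update
        e *= 1.0 - k * k
        f, b = f[1:] + k * b[1:], b[:-1] + k * f[:-1]
    return a, e

def burg_psd(a, e, freqs, dt):
    """One-sided AR spectral density (frequencies in Hz)."""
    kk = np.arange(1, len(a) + 1)
    A = 1.0 + np.exp(-2j * np.pi * np.outer(freqs * dt, kk)) @ a
    return 2.0 * e * dt / np.abs(A) ** 2

# a, e = burg_ar(in1, 99)                  # hypothetical IN1 record
# s = burg_psd(a, e, np.linspace(0.0, 1280.0, 1001), dt=0.000390625)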
Fig. 2.35 Spectra of time series OUT, IN1, and IN2 (a, b, c, respectively)
As mentioned before, the examples from mechanical engineering are given here in order to show the readers working in natural sciences the art of statistical analysis of engineering time series, which differ impressively from what we have in our areas. The obvious disadvantage is the complicated statistical structure of the engineering time series, which does not seem to exist in nature. At the same time, the successful results attained in engineering are seen in our everyday life in the form of properly working devices and structures. In the following chapters, we will trace the changes in some time series statistics caused by the increase of the time series dimension D to D = 2 and D = 3 (one and two inputs). This ends Example 2.14 and Sect. 2.7.
2.8 Conclusions

The executable program AVESTA1 presents a simple and efficient tool for obtaining important statistical information about scalar time series of length from several dozen to 10⁶ time units. The spectral estimates are given for each model, and the information about the model selected by one of the five selection criteria is then used, within the same run of the program, to give additional information about the selected model. The run can be concluded with a time series forecast in accordance with the classical theory of extrapolation of stationary random processes by Andrey Kolmogorov and Norbert Wiener. All basic estimates given by AVESTA1, including the forecast trajectory, are given with respective confidence bounds. The forecasting part can be waived in accordance with the user's instructions. The program allows the user to transform the time series in accordance with instructions given in the file CAT.DAT. The transformations include linear filtering of the time series, but one needs to remember that this procedure should only be done on the basis of credible preliminary considerations. Operating AVESTA1 is very easy; it is quite productive and can be applied even by users not familiar with the theory of stationary random processes. The program automatically generates a rather long list
of statistical properties that characterize the time series both in the time and frequency domains, in complete agreement with the theory of stationary random processes. This program makes it possible for the user to avoid solving time series analysis tasks one by one, from estimating the first statistical moments and the closeness of the PDF to the Gaussian probability distribution up to spectral estimation and the time series forecast. With this mathematically correct tool, the user does not have to go through a series of efforts separately for every time series analysis task and can practically immediately start to interpret the statistical properties of the time series and of the random process that generated it, as well as to make conclusions about its physical properties. The classical theory of forecasting of stationary time series is practically unknown in today's natural sciences, so it seems reasonable to repeat that no other linear approach can produce a forecast with a smaller error variance. If the time series is Gaussian (normal), this approach produces the forecast with the absolutely minimal error variance among all linear and nonlinear methods of extrapolation. The program AVESTA1 seems to be unique among the time series analysis software used in natural sciences, both in the time and frequency domains.
Attachment 2.1: Weights and Frequency Response Functions of Linear Filters

The programs AVESTA1 and AVESTA3 contain nine filters: four low-pass, four high-pass, and one band-pass. Ideally, the goal of using a linear filter is to isolate a specific frequency band of the time series spectrum and completely suppress all other components. This goal can never be achieved because a filter possessing this property is not physically realizable. All physically realizable filters affect time series behavior at all frequencies. The linear filtering operation is described with the following equation:

$$\tilde{x}_t = \sum_{m=-M_{FLT}}^{M_{FLT}} h_m x_{t-m},$$

so that the length of the filtered time series becomes N − 2M_FLT, where N is the length of the initial series. The spectral densities s(f) and s̃(f) of the initial and filtered time series are related to each other as s̃(f) = |H(f)|²s(f), where H(f) is the filter's frequency response function (FRF). The weights and frequency response functions of the filters used here are shown below.

1. Moving equal-weight averaging:
weight function:
$$h_m = \begin{cases} \dfrac{1}{2M_{FLT}+1}, & |m| \le M_{FLT}, \\ 0, & |m| > M_{FLT}, \end{cases}$$

frequency response function:

$$H(f) = \frac{\sin \pi f (2M_{FLT}+1)}{\pi f (2M_{FLT}+1)}.$$
This is a very common filter with a quickly decreasing frequency response function but with relatively large side lobes.

2. Tukey filter:
weight function:

$$h_m = \begin{cases} \dfrac{1 + \cos\left(\frac{2\pi m}{2M_{FLT}+1}\right)}{2M_{FLT}+1}, & |m| \le M_{FLT}, \\ 0, & |m| > M_{FLT}, \end{cases}$$

frequency response function:

$$H(f) = \frac{\sin \pi f (2M_{FLT}+1)}{\pi f (2M_{FLT}+1)} \cdot \frac{1}{1 - \left[f(2M_{FLT}+1)\right]^2}.$$
The side lobes of the frequency response function of this filter are small but the main lobe decreases slowly. It means that the high-frequency leakage is small but the main lobe covers a wide frequency range.

3. Triangular (Bartlett) filter:
weight function:

$$h_m = \begin{cases} \dfrac{2}{2M_{FLT}+1} - \dfrac{4|m|}{(2M_{FLT}+1)^2}, & |m| \le M_{FLT}, \\ 0, & |m| > M_{FLT}, \end{cases}$$

frequency response function:

$$H(f) = \left[\frac{2\sin\left(\pi f (2M_{FLT}+1)/2\right)}{\pi f (2M_{FLT}+1)}\right]^2.$$
This function is not negative and the maximum of the first side lobe is just 0.05.

4. Normal filter:
weight function:
$$h_m = \begin{cases} \dfrac{1}{\sigma\sqrt{2\pi}}\exp\left(-m^2/2\sigma^2\right), & |m| \le M_{FLT}, \\ 0, & |m| > M_{FLT}, \end{cases}$$

frequency response function:

$$H(f) = \exp\left(-2\pi^2\sigma^2 f^2\right).$$

This function is not negative and monotonic. Both properties are useful.

5. Bell-shaped filter:
weight function:

$$h_m = \begin{cases} \dfrac{\sqrt{2}}{M}\exp\left(-\dfrac{\pi m^2}{2M^2}\right), & |m| \le M_{FLT}, \\ 0, & |m| > M_{FLT}, \end{cases}$$

frequency response function:

$$H(f) \approx \exp\left[-2\pi/(f M_{FLT} - 1)^2\right] + \exp\left[-2\pi/(f M_{FLT} + 1)^2\right].$$

The FRFs of these filters are shown in Fig. 2.36. Note that the frequency arguments of the normal and bell-shaped filters are σ²f² and fM_FLT, respectively.
Fig. 2.36 Frequency response functions of filters: equal weights (1), Tukey (2), and Bartlett (3); normal (4), bell-shaped (5), normal high-pass (6)
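The weights and FRFs are easy to tabulate numerically. The sketch below, with an arbitrary illustrative half-width MFLT = 12 and σ = MFLT/3 (both values are assumptions, not AVESTA defaults), evaluates the equal-weight, Tukey, and normal weights and the FRF of the first of them.

import numpy as np
from scipy.signal import freqz

MFLT = 12                                   # illustrative half-width
m = np.arange(-MFLT, MFLT + 1)

h_uniform = np.full(2 * MFLT + 1, 1.0 / (2 * MFLT + 1))   # equal weights
h_tukey = (1.0 + np.cos(2 * np.pi * m / (2 * MFLT + 1))) / (2 * MFLT + 1)
h_normal = np.exp(-m ** 2 / (2.0 * (MFLT / 3.0) ** 2))    # sigma = MFLT/3
h_normal /= h_normal.sum()                  # unit gain at f = 0

w, H = freqz(h_uniform, worN=1024)          # complex frequency response
f = w / (2.0 * np.pi)                       # f in cycles per time step
# |H| follows sin(pi f (2*MFLT+1)) / (pi f (2*MFLT+1)), cf. Fig. 2.36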
Attachment 2.2: Examples of Optimal Nonlinear Extrapolation of Stationary Random Processes

A. M. Yaglom

A part of a presentation with E.G. Gladyshev and M.I. Fortus.¹
Introduction

The major goal of the theory of extrapolation of stationary random processes is the solution of the following problem, which is very important for practical purposes: let ξ(s), −∞ < s < ∞, be a stationary random process whose "past" values (upon the semi-axis −∞ < s ≤ t) are known; the task is to produce the best prediction of the future value ξ(t + τ), τ > 0, using these known values. Usually, the 'best prediction' is understood as the value of the functional ξ̃(t, τ) = ξ̃_{t,τ}{ξ(s), −∞ < s ≤ t} of all past values of the process for which the mean square value of its prediction error

$$\sigma^2(\tau) = \mathrm{M}\left|\xi(t+\tau) - \tilde{\xi}(t,\tau)\right|^2 \tag{2.1}$$

is the smallest; besides, the functionals ξ̃(t, τ) to be studied are linear. Finding the respective "optimal" linear functional ξ̃(t, τ) (and the mean square value σ²(τ) of the respective prediction error) is the content of the profound theory of linear extrapolation of stationary random processes created about 20 years ago by A.N. Kolmogorov (Kolmogorov 1941) and developed further by M.G. Krein (Krein 1945) and N. Wiener (Wiener 1949). Currently, the theory has achieved a significant degree of completeness. Yet, it is clear that the practical task of "the best prediction" cannot be solved completely within the framework of a linear theory of extrapolation. Actually, the limitation of using only linear functionals ξ̃(t, τ) is just a consequence of the fact that the respective solution becomes much simpler, and it is by no means related to the substance of the task. Moreover, the condition of the minimal mean error variance is just a specific example belonging to a very large set of other permissible (and equally natural from the common-sense point of view) conditions for optimal forecasting; the selection of that specific condition is dictated, first of all, by its being the most convenient for applying analytical methods of solution and for obtaining viewable ultimate results. In this article, we will follow the usual way of characterizing the "quality of prediction" ξ̃(t, τ) with the mean square value σ²(τ) of its error, but the requirement of linearity of the functional ξ̃(t, τ) will be removed. It is well known that in the important case of Gaussian processes, this removal cannot result in a
¹ Translated from Russian and rearranged in accordance with the Springer publication standard.
better solution of the prediction task than the one provided by the theory of linear extrapolation: the lowest value of σ²(τ) within the class of arbitrary functionals ξ̃(t, τ) = ξ̃_{t,τ}{ξ(s), −∞ < s ≤ t} is always achieved with some linear functional ξ̃_{t,τ}.² If, however, not all univariate and multivariate probability distributions are Gaussian, there are, generally, some reasons to hope that the mean square error of linear extrapolation can be made smaller due to the transfer to nonlinear functionals ξ̃(t, τ). In this respect, suggestions have been made repeatedly in the engineering literature to look for the prediction ξ̃(t, τ) within the class of some specific nonlinear functionals defined with a finite number of "arbitrary functions" (playing the role of parameters in this task), which are more general than the class of linear functionals (which can formally be presented as $\int_0^\infty \xi(t-\tau)\,\omega(\tau)\,d\tau$ and contains the only "arbitrary function" ω(τ)); in this case, the minimum mean square condition (2.1) allows one to obtain a system of equations (usually integral or integro-differential) with respect to the unknown "arbitrary functions" that define the optimal functional ξ̃(t, τ), and which can possibly be solved, e.g., numerically (cf. (Zade 1953; Kuznetsov et al. 1954; Lubbok 1959; Pugatchev 1960)³). A different approach to the task of optimal nonlinear prediction is based upon the circumstance that the conditional mathematical expectation

$$\tilde{\xi}_0(t,\tau) = \mathrm{M}\left[\xi(t+\tau)\,\middle|\,\xi(s),\ s \le t\right] \tag{2.2}$$

of the future value of the process under the condition that all its past values are known (if this mathematical expectation exists) will be exactly the same functional of ξ(s), −∞ < s ≤ t, for which the mean square (2.1) turns out to be minimal (see, e.g., (Yaglom 1952), p. 77, or (Pugatchev 1960), Sect. 140). Therefore, the conditional mathematical expectation ξ̃0(t, τ) presents the best possible nonlinear forecast of the value ξ(t + τ), while the mean square

$$\sigma_0^2(\tau) = \mathrm{M}\left|\xi(t+\tau) - \tilde{\xi}_0(t,\tau)\right|^2 \tag{2.3}$$
defines the lower limit of mean square error for all possible methods of forecasting. For the stationary random processes ξ(n) with a discrete argument (stationary random sequences), which have moments of all orders and satisfy several more general conditions, it is possible to indicate a general method that, in principle, allows one to calculate, using the values ξ(m) at m ≤ n, on the basis of functions of statistical moments of different orders, the conditional expectation M[ξ(n + k)|ξ(m), m ≤ n] at any given degree of accuracy [see (Masani and Wiener 1959)]. However, this 2
² Note, by the way, that in the case of Gaussian processes the choice of that same condition of the minimal mean square error also turns out to be immaterial; in this case, all reasonable conditions of optimality lead to one and the same solution of the prediction task [see, e.g., (Yaglom 1952), p. 77, or (Sherman 1958)].
³ In most of those works, however, the subject was not the task of predicting a future value of a random process but rather a more general task of filtering of random processes, which includes the prediction task as a specific case.
method requires extremely cumbersome computations even when the required degree of accuracy is quite modest, and it is barely effective in practice even when modern high-speed computers are used. The methods of nonlinear extrapolation proposed in (Zade 1953; Kuznetsov et al. 1954; Lubbok 1959; Pugatchev 1960) also require burdensome calculations, which strongly impede their practical applications. As for specific examples of nonlinear extrapolation, they are practically impossible to find in the literature.⁴ Therefore, the answer to the question of what can be achieved by moving from linear to nonlinear extrapolation of some process that exists in practice remains unknown. It should be stressed that, strictly speaking, this question does not belong to the mathematical probability theory; it is even difficult to formulate within the theory's framework. From the theoretician's point of view, the solution of the question of possible gains due to the use of nonlinear methods of extrapolation as compared to linear methods is given with the well-known example of a stationary process

$$\xi(\tau) = a\cos(\lambda\tau + \varphi). \tag{2.4}$$
Here, a is a constant, φ is a random variable uniformly distributed over the interval (0, 2π), and λ is a random variable independent of φ and having an arbitrary probability distribution F(λ) (see, for example, Doob 1953, Chap. X, example 4, and Chap. XI, Sect. 3, example 4). Indeed, all sample records of such a process ξ(t) will have a strictly sinusoidal shape, so that the nonlinear prediction ξ̃0(t, τ) at any τ will be absolutely accurate (that is, σ0²(τ) ≡ 0); on the other hand, the spectral function of the process will be (1/4)a²[1 + F(λ) − F(−λ)], meaning that it can be chosen at will, so that for any τ > 0 there are processes described with (2.4) for which the mean square error σ²(τ) of linear extrapolation can be arbitrarily close to M|ξ(t + τ)|² = B(0). However, from the applier's point of view, the process (2.4) is not an "actual" stationary random process: the only random quantities here are the phase and frequency of the respective harmonic oscillation, and only those two variables need to be estimated instead of applying the theory of random processes to ξ(t). Note also that a practical application of this theory to ξ(t) turns out to be extremely difficult because the process is not ergodic, so that its statistical characteristics cannot be determined from one sample record (or a small number of samples). Clearly, Eq. (2.4) can be easily used to construct an ergodic process with very close properties: it is sufficient to assume, for example, that ξ(t) consists of long slices of sinusoids (2.4) that end at random "discontinuity points" (distributed, for example, according to the Poisson law with a very small parameter), where the transfer to a new slice of the sinusoid occurs with the values of φ and λ chosen in accordance with their probability distributions within the process described with (2.4). However, from the applier's point of view, this process looks exceedingly exotic; besides, in order for it to be "very close" to the process described with (2.4), the discontinuity points should be distributed "very rarely", so
The situation seems to be a little bit better in the area of filtering of stationary processes; yet, even in that area the number of published examples of applying the nonlinear methods is still very insignificant.
that the ergodicity of the process will become purely conditional because it will be almost impossible to utilize. However, it is not difficult to build a series of simple examples of stationary random processes which are ergodic and not too exotic from the point of view of practical application, and for which it is possible to write the best nonlinear prediction and compare the respective root mean square error σ0(τ) with the root mean square error σ(τ) of the best linear prediction; the depiction of several examples of this type constitutes the main content of this work. Clearly, no number of such examples will allow us to precisely determine the gain which we can obtain by switching to nonlinear extrapolation in any specific new case; yet, some general idea about the magnitude of the expected gain can still be obtained in this manner. In particular, the examples given below show that quite frequently the difference between σ0²(τ) and σ²(τ) turns out to be quite small, though the extrapolation formulae defining the linear and nonlinear predictions are very dissimilar. This feature agrees very well with the fact that even in the case of linear extrapolation a significant change in the extrapolation formulae often leads to very small changes in the prediction error; this shows that in practical tasks the expediency of converting to a new and more complicated method of forecasting requires a thorough and dedicated study in all cases.
Continuous Markov Random Processes

The Markov random processes ξ(s) are characterized by the property that their conditional probability distribution of the variable ξ(t + τ) with the known past values ξ(s), −∞ < s ≤ t, depends only upon ξ(t), the latest of the known values. Clearly, this circumstance makes the task of optimal nonlinear extrapolation much simpler for such processes; here

$$\tilde{\xi}_0(t,\tau) = \mathrm{M}\left[\xi(t+\tau)\,\middle|\,\xi(t)\right], \tag{2.5}$$
so that the functional ξ̃0(t, τ) = ξ̃_{0t,τ}{ξ(s), −∞ < s ≤ t} becomes a function ξ̃_{0t,τ}(ξ(t)) of a single variable, and all that is needed in order to find it are the bivariate probability distributions of our process.

(a) Processes that present single-valued functions of the Ornstein–Uhlenbeck process
The simplest stationary Markov random process is the so-called Ornstein–Uhlenbeck process (Doob 1942): a real-valued Gaussian process η(s) with Mη(s) = 0 and the correlation function B_ηη(τ) = Mη(s)η(s + τ) = e^{−|τ|} (that is, with the spectral density f_ηη(λ) = 1/[π(λ² + 1)]). As the process is Gaussian, its best linear prediction is also the best prediction in general:

$$\tilde\eta(t,\tau) = \tilde\eta_0(t,\tau) = e^{-\tau}\eta(t). \tag{2.6}$$
Simple examples of non-Gaussian stationary Markov processes can be obtained by assuming ξ(s) = φ(η(s)), where η(s) is an Ornstein–Uhlenbeck process and y = φ(x) is a known nonlinear function that has a single-valued inverse function x = φ⁻¹(y). In the specific case when φ(x) = ax^{2n+1}, the correlation function of the process ξ(s) is easily calculated following the rule of calculating the higher moments of the Gaussian distribution and has a rational Fourier transform; consequently, for such a process ξ(s), the best linear extrapolation function is easy to obtain. Moreover, the conditional distribution for ξ(t + τ) with the known ξ(t) is given here by a plain explicit formula, which also allows one to easily obtain an explicit expression for the function ξ̃_{0t,τ}(ξ(t)).

1. Let

$$\xi(t) = \frac{1}{\sqrt{15}}\,\eta^3(t), \tag{2.7}$$
where η(s) is an Ornstein–Uhlenbeck process. In this case,

$$B(\tau) = \frac{1}{15}\,\mathrm{M}\,\eta^3(t)\,\eta^3(t+\tau) = \frac{1}{15}\left[9B_{\eta\eta}^2(0)B_{\eta\eta}(\tau) + 6B_{\eta\eta}^3(\tau)\right] = \frac{3}{5}e^{-\tau} + \frac{2}{5}e^{-3\tau}. \tag{2.8}$$

(The multiplier 1/√15 in (2.7) is intentionally selected to ensure that B(0) = 1.) The spectral density that corresponds to the correlation function (2.8) is

$$f(\lambda) = \frac{3}{5\pi}\left(\frac{1}{\lambda^2+1} + \frac{2}{\lambda^2+9}\right) = \frac{9}{5\pi}\,\frac{\left|\lambda + i\sqrt{11/3}\right|^2}{\left|(\lambda+i)(\lambda+3i)\right|^2}. \tag{2.9}$$
It is easily obtained from this that the best linear forecast of the process ξ(s) is given with the formula

$$\tilde{\xi}(t,\tau) = D_0\,\xi(t) + D_1\int_0^{\infty}\exp\left(-\sqrt{\tfrac{11}{3}}\,s\right)\xi(t-s)\,ds, \tag{2.10}$$

where

$$D_0 = \frac{1}{2}\left[\left(\sqrt{\tfrac{11}{3}} - 1\right)e^{-\tau} + \left(3 - \sqrt{\tfrac{11}{3}}\right)e^{-3\tau}\right], \qquad D_1 = \frac{2}{3}\left(3\sqrt{\tfrac{11}{3}} - 5\right)\left(e^{-\tau} - e^{-3\tau}\right). \tag{2.11}$$
The mean square error of this prediction is

$$\sigma^2(\tau) = \left(1 - e^{-2\tau}\right)\left[1 + \frac{3\sqrt{33}-11}{10}\,e^{-2\tau} + \frac{19 - 3\sqrt{33}}{10}\,e^{-4\tau}\right] \approx \left(1-e^{-2\tau}\right)\left[1 + 0.62e^{-2\tau} + 0.18e^{-4\tau}\right]. \tag{2.12}$$
Clearly, the process ξ(s) is not Gaussian; the univariate probability density of ξ(s) will obviously be

$$p(x) = \frac{\sqrt[6]{15}}{3\sqrt{2\pi}\,x^{2/3}}\exp\left(-\frac{\sqrt[3]{15}\,x^{2/3}}{2}\right), \tag{2.13}$$
which becomes infinite at x = 0. The conditional probability density of ξ(t + τ) for a known ξ(t) will be equal to

$$p_\tau\bigl(x\,|\,\xi(t)\bigr) = \frac{\sqrt[6]{15}}{3\sqrt{2\pi\left(1-e^{-2\tau}\right)}\,x^{2/3}}\exp\left(-\frac{\sqrt[3]{15}\left[x^{1/3} - e^{-\tau}\xi^{1/3}(t)\right]^2}{2\left(1-e^{-2\tau}\right)}\right). \tag{2.14}$$
The functional ξ̃0(t, τ), which defines the best nonlinear forecast, can be found by calculating the mean value of the distribution given with (2.14) (or by calculating the mean value of the variable (1/√15)η³ using the known conditional distribution of the variable η); it turns out to be

$$\tilde{\xi}_0(t,\tau) = \frac{3}{\sqrt[3]{15}}\,e^{-\tau}\left(1 - e^{-2\tau}\right)\xi^{1/3}(t) + e^{-3\tau}\xi(t). \tag{2.15}$$
Substituting (2.15) into (2.3) will show that the mean square error of extrapolation according to this formula is

$$\sigma_0^2(\tau) = \left(1 - e^{-2\tau}\right)\left[1 + 0.4e^{-2\tau} + 0.4e^{-4\tau}\right]. \tag{2.16}$$
Note also that in this example we have one more way to build an optimal extrapolation of the process ξ(s), which may well be the most natural: as the best linear forecast of the variable η(t + τ) under the known values η(s), s ≤ t, is, unarguably, η̃(t, τ) = e^{−τ}η(t) [see (2.6)], the best prediction of the variable ξ(t + τ) = (1/√15)η³(t + τ) can also be given as

$$\tilde{\xi}_1(t,\tau) = \frac{1}{\sqrt{15}}\,\tilde\eta^3(t,\tau) = e^{-3\tau}\xi(t). \tag{2.17}$$
However, one should remember that the meaning of the expression "the best" as applied to the formula (2.17) and to the formula (2.15) is different. The forecast η̃(t, τ) will be the best possible in the sense that for it the value M|η(t + τ) − η̃(t, τ)|² = min; it clearly shows that (2.17) is the version of the extrapolation formulae for which the quantity that takes the minimal value is

$$\hat\sigma^2_{2/3}(\tau) = \mathrm{M}\left|\xi^{1/3}(t+\tau) - \hat{\xi}^{1/3}(t,\tau)\right|^2. \tag{2.18}$$
The condition of minimum value of (2.18) is also a reasonable way for optimal extrapolation, but it has a drawback: it is very inconvenient for calculations in all cases but the one that we are discussing here. As for the mean square error of extrapolation made according to (2.17), it will be equal to

$$\sigma_1^2(\tau) = \mathrm{M}\left|\xi(t+\tau) - e^{-3\tau}\xi(t)\right|^2 = \left(1 - e^{-2\tau}\right)\left(1 + e^{-2\tau} - 0.2e^{-4\tau}\right), \tag{2.19}$$
which is, of course, higher than the mean square error (2.16) of the best (in the mean square sense) nonlinear extrapolation ξ̃0(t, τ), as well as greater than the mean square error (2.12) of the best (in the same sense) linear extrapolation ξ̃(t, τ) (the latter is true because the extrapolation formula (2.17) is also linear). However, it will be interesting to compare quantitatively the error values for extrapolations in accordance with the formulae (2.10), (2.15), and (2.17). The solid lines in Fig. 2.37 show the values of σ²(τ), σ0²(τ), and σ1²(τ) at different values of τ; in Fig. 2.38, the solid lines show the ratios σ0(τ)/σ(τ) and σ1(τ)/σ(τ). It is seen that in this case the best linear forecast ξ̃(t, τ) has a mean square error which exceeds the mean square error of the best nonlinear forecast ξ̃0(t, τ) by 2% or less; moreover, even the linear forecast ξ̃1(t, τ), which is optimal in the sense of the rather strange criterion (2.18), has a mean square error that exceeds the mean square error of the best nonlinear forecast ξ̃0(t, τ) (in the mean square sense) by just 5%. As for comparing only the linear methods of forecasting with each other, a switch from the minimum of the mean square error to the condition of minimizing the formula (2.18) results in an increase of the mean square forecast error by not more than 3%.
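These closed-form expressions are easy to verify numerically. The following Monte Carlo sketch (an addition for this book, not part of the original paper) simulates stationary pairs (η(t), η(t + τ)) of the Ornstein–Uhlenbeck process and compares the empirical mean square errors of the predictors (2.15) and (2.17) with the formulas (2.16) and (2.19).

import numpy as np

rng = np.random.default_rng(0)
n, tau = 1_000_000, 0.5
r = np.exp(-tau)                       # correlation of eta(t) and eta(t+tau)

eta = rng.standard_normal(n)           # stationary Ornstein-Uhlenbeck pairs
eta_fut = r * eta + np.sqrt(1.0 - r * r) * rng.standard_normal(n)
xi, xi_fut = eta ** 3 / np.sqrt(15.0), eta_fut ** 3 / np.sqrt(15.0)

pred_nl = 3.0 / 15.0 ** (1 / 3) * r * (1 - r * r) * np.cbrt(xi) \
          + r ** 3 * xi                                     # formula (2.15)
pred_ln = r ** 3 * xi                                       # formula (2.17)

print(np.mean((xi_fut - pred_nl) ** 2),                     # cf. (2.16):
      (1 - r * r) * (1 + 0.4 * r * r + 0.4 * r ** 4))
print(np.mean((xi_fut - pred_ln) ** 2),                     # cf. (2.19):
      (1 - r * r) * (1 + r * r - 0.2 * r ** 4))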
2. Consider now the process

$$\xi(s) = \frac{1}{3\sqrt{105}}\,\eta^5(s). \tag{2.20}$$
The coefficient 1/(3√105) is selected in the way that makes the mean square value of ξ(s) equal to 1. One may think that this process will be "more non-Gaussian" than the process from Example 1, so that the benefit due to the transfer from linear methods to nonlinear ones will be greater than in the previous case. Actually, this assumption turns out to be incorrect. The correlation function of the process ξ(s) is

$$B_{\xi\xi}(\tau) = \frac{1}{945}\,\mathrm{M}\,\eta^5(s)\,\eta^5(s+\tau) = \frac{5}{21}e^{-\tau} + \frac{40}{63}e^{-3\tau} + \frac{8}{63}e^{-5\tau}, \tag{2.21}$$

so that
Fig. 2.37 .
Fig. 2.38 .
$$f_{\xi\xi}(\lambda) = \frac{5}{63\pi}\left[\frac{3}{\lambda^2+1} + \frac{24}{\lambda^2+9} + \frac{8}{\lambda^2+25}\right] = \frac{5}{63\pi}\,\frac{35\lambda^4 + 806\lambda^2 + 1347}{(\lambda^2+1)(\lambda^2+9)(\lambda^2+25)}. \tag{2.22}$$
90
2 Analysis of Scalar Time Series
Hence, it is easy to obtain that the best linear prediction of the process (2.20) can be written as

$$\tilde{\xi}(t,\tau) = D_0\,\xi(t) + \int_0^{\infty}\left[D_1 e^{-\alpha_1 s} + D_2 e^{-\alpha_2 s}\right]\xi(t-s)\,ds, \tag{2.23}$$
where α₁ = √(−x₁) ≈ 1.34 and α₂ = √(−x₂) ≈ 4.61 (x₁ and x₂ are the roots of the quadratic equation 35x² + 806x + 1347 = 0), while D₀, D₁, and D₂ present some linear combinations of the exponential functions e^{−τ}, e^{−3τ}, and e^{−5τ} with numerical coefficients. The mean square error of extrapolation of ξ(s) using the formula (2.23) is

$$\sigma^2(\tau) = \left(1 - e^{-2\tau}\right)\left[\beta_0 + \beta_1 e^{-2\tau} + \beta_2 e^{-4\tau} + \beta_3 e^{-6\tau} + \beta_4 e^{-8\tau}\right], \tag{2.24}$$
where β₀, …, β₄ are constants which will not be given here; we limit ourselves to the graph of the function σ²(τ) (the dashed lines in Fig. 2.37). In order to determine the best nonlinear forecast ξ̃0(t, τ) of the process (2.20), one should calculate the mean value of the variable (1/√945)η⁵, where η is distributed in accordance with the Gauss law with the mean value ¹⁰√945 e^{−τ} ξ^{1/5}(t) and the variance 1 − e^{−2τ}; it easily leads to the formula

$$\tilde{\xi}_0(t,\tau) = \frac{15}{945^{2/5}}\,e^{-\tau}\left(1-e^{-2\tau}\right)^2\xi^{1/5}(t) + \frac{10}{945^{1/5}}\,e^{-3\tau}\left(1-e^{-2\tau}\right)\xi^{3/5}(t) + e^{-5\tau}\xi(t). \tag{2.25}$$
According to this formula, the mean square error of extrapolation is

$$\sigma_0^2(\tau) = \left(1 - e^{-2\tau}\right)\left[1 + \frac{16}{21}e^{-2\tau} + \frac{16}{21}e^{-4\tau} + \frac{8}{63}e^{-6\tau} + \frac{8}{63}e^{-8\tau}\right]. \tag{2.26}$$
Finally, we can study a third method of extrapolating the process (2.20) using the formula

$$\tilde{\xi}_1(t,\tau) = e^{-5\tau}\xi(t), \tag{2.27}$$
which has the property that the quantity taking the smallest possible value for it is

$$\tilde\sigma^2_{2/5}(\tau) = \mathrm{M}\left|\xi^{1/5}(t+\tau) - \tilde{\xi}_1^{1/5}(t,\tau)\right|^2; \tag{2.28}$$

in this case, the mean square error of extrapolation
$$\sigma_1^2(\tau) = \mathrm{M}\left|\xi(t+\tau) - \tilde{\xi}_1(t,\tau)\right|^2 = \left(1-e^{-2\tau}\right)\left[1 + e^{-2\tau} + e^{-4\tau} + \frac{11}{21}e^{-6\tau} - \frac{47}{63}e^{-8\tau}\right] \tag{2.29}$$

will naturally be greater than (2.26) and even greater than (2.24) [cf. Fig. 2.37, where all three functions (2.24), (2.26), and (2.29) are shown with dashed lines]. Figure 2.37 shows that the differences between the three functions σ²(τ), σ0²(τ), and σ1²(τ) in Example 2 turn out to be as small as in Example 1. This conclusion is also supported by Fig. 2.38, where the dashed lines show the ratios σ0(τ)/σ(τ) and σ1(τ)/σ(τ) at different values of τ. We see that in our second example σ(τ) exceeds σ0(τ) by no more than 2%, while the root mean square error σ1(τ) of the forecast according to (2.27), which is optimal according to the criterion (2.28), exceeds σ0(τ) by not more than 5% and σ(τ) by not more than 3-4%.

(b) Diffusion-type Markov processes

Consider a stationary diffusion Markov process ξ(s) having a stationary distribution w(x) and transition probabilities p(τ, x, y) that satisfy a Fokker-Planck equation of the form

$$\frac{\partial p}{\partial \tau} = L_y p, \qquad L_y p(y) = -\frac{\partial}{\partial y}\left[A(y)\,p(y)\right] + \frac{\partial^2}{\partial y^2}\left[B(y)\,p(y)\right]. \tag{2.30}$$
If the elliptic differential operator L_y has an infinite sequence of eigenvalues λ₀ = 0, −λ₁, −λ₂, … (0 < λ₁ ≤ λ₂ ≤ …) with a full orthonormalized system of eigenfunctions l₀ = aw(y), l₁(y), l₂(y), …, then p(τ, x, y) for such a process will be given with the formula

$$p(\tau, x, y) = \sum_{k=0}^{\infty} e^{-\lambda_k \tau}\, l_k(x)\, l_k(y). \tag{2.31}$$
Therefore, the optimal nonlinear forecast ξ̃0(t, τ) (which coincides with the mean value of the probability distribution having the density p(τ, ξ(t), y)) will be given with the formula

$$\tilde{\xi}_0(t,\tau) = \sum_{k=0}^{\infty} a_k e^{-\lambda_k \tau}\, l_k[\xi(t)], \tag{2.32}$$
where a_k are constants, and the correlation function B_ξξ(τ) = Mξ(t)ξ(t + τ) and the respective spectral density f_ξξ(λ) will be

$$B_{\xi\xi}(\tau) = \sum_{k=0}^{\infty} b_k e^{-\lambda_k|\tau|}, \qquad f_{\xi\xi}(\lambda) = \sum_{k=0}^{\infty} \frac{b_k \lambda_k}{\pi\left[\lambda^2 + \lambda_k^2\right]}, \tag{2.33}$$
where b_k are (nonnegative) constants not coinciding with a_k. The spectral density f_ξξ(λ) is generally a meromorphic function of λ having an infinite number of poles, which means that finding an explicit formula for the best linear forecast ξ̃(t, τ) will hardly be possible; however, we can always cut off the B_ξξ(τ) series after a small finite number of its terms and investigate a linear extrapolation formula which would be optimal for this "truncated" correlation function. If it then turns out that even with the linear extrapolation following this formula [which is not "the best of the best" for our process ξ(s) but seemingly leads to a mean square error just slightly higher than σ(τ)], the mean square error of extrapolation differs by just a small value from the mean square error of the best nonlinear prediction (2.32), it will prove that, as compared with the best linear forecast, the best nonlinear forecast cannot give us a noticeable gain. These general considerations will now be illustrated with a simple example.

3. Consider a stationary diffusion process within a finite interval [−π/2, π/2] described with the common diffusion equation

$$\frac{\partial p}{\partial \tau} = a^2 \frac{\partial^2 p}{\partial y^2} \tag{2.34}$$
with the boundary conditions

$$\left.\frac{\partial p}{\partial y}\right|_{y=-\pi/2} = \left.\frac{\partial p}{\partial y}\right|_{y=\pi/2} = 0. \tag{2.35}$$
Obviously, the density of the stationary distribution here is a constant:

$$w(x) = \frac{1}{\pi}, \qquad -\frac{\pi}{2} \le x \le \frac{\pi}{2}. \tag{2.36}$$
The orthonormalized system of eigenfunctions of the operator L_y and the respective system of eigenvalues are given in this case by the formulae

$$l_0(x) = \frac{1}{\sqrt{\pi}}, \qquad l_n(x) = \begin{cases} \sqrt{\dfrac{2}{\pi}}\,\sin nx, & n = 1, 3, 5, \ldots, \\[2mm] \sqrt{\dfrac{2}{\pi}}\,\cos nx, & n = 2, 4, 6, \ldots \end{cases} \tag{2.37}$$

and

$$\lambda_n = a^2 n^2, \qquad n = 0, 1, 2, \ldots \tag{2.38}$$
It is easy to obtain from this that
$$p(\tau,x,y) = \frac{1}{\pi} + \frac{2}{\pi}\sum_{k=0}^{\infty}\exp\left[-(2k+1)^2 a^2\tau\right]\sin(2k+1)x\,\sin(2k+1)y + \frac{2}{\pi}\sum_{k=1}^{\infty}\exp\left[-(2k)^2 a^2\tau\right]\cos 2kx\,\cos 2ky, \tag{2.39}$$

$$\tilde{\xi}_0(t,\tau) = \tilde{\xi}_{0t,\tau}(\xi(t)) = \frac{4}{\pi}\sum_{k=0}^{\infty}\frac{(-1)^k}{(2k+1)^2}\exp\left[-(2k+1)^2 a^2\tau\right]\sin[(2k+1)\xi(t)], \tag{2.40}$$

$$B_{\xi\xi}(\tau) = \frac{8}{\pi^2}\sum_{k=0}^{\infty}\frac{\exp\left[-(2k+1)^2 a^2|\tau|\right]}{(2k+1)^4}, \tag{2.41}$$

$$f_{\xi\xi}(\lambda) = \frac{8a^2}{\pi^3}\sum_{k=0}^{\infty}\frac{1}{(2k+1)^2\left[\lambda^2 + (2k+1)^4 a^4\right]}. \tag{2.42}$$
It is not difficult to calculate that the mean square extrapolation error of the process ξ(s) in accordance with the formula (2.40) equals

$$\sigma_0^2(\tau) = \int_{-\pi/2}^{\pi/2}\int_{-\pi/2}^{\pi/2}\left[y - \tilde{\xi}_{0t,\tau}(x)\right]^2 w(x)\,p(\tau,x,y)\,dx\,dy = \frac{\pi^2}{12} - \frac{8}{\pi^2}\sum_{k=0}^{\infty}\frac{e^{-2(2k+1)^2 a^2\tau}}{(2k+1)^4}. \tag{2.43}$$
As for the best linear prediction that corresponds to the correlation function (2.41), it cannot be presented explicitly; yet, as the absolute values of consecutive terms of the series in the right side of (2.41) diminish very fast, one can hope that the linear extrapolation formula which is optimal for the case when the correlation function is given only with the first term of (2.41) will result in a mean square error close to the mean square error of the best linear extrapolation. Therefore, it will be rather interesting to compare the mean square error σ0²(τ) with the mean square error σ1²(τ) of linear extrapolation of ξ(s) according to the equation

$$\tilde{\xi}_1(t,\tau) = e^{-a^2\tau}\xi(t), \tag{2.44}$$
which is equal to

$$\sigma_1^2(\tau) = M\left|\xi(t+\tau) - e^{-a^2\tau}\xi(t)\right|^2 = \left(1 + e^{-2a^2\tau}\right)B_{\xi\xi}(0) - 2e^{-a^2\tau}B_{\xi\xi}(\tau) = \frac{\pi^2}{12}\left(1 + e^{-2a^2\tau}\right) - \frac{16}{\pi^2}\,e^{-a^2\tau}\sum_{k=0}^{\infty}\frac{\exp\left[-(2k+1)^2 a^2\tau\right]}{(2k+1)^4}. \qquad (2.45)$$
By computing the values of the right-hand sides of (2.43) and (2.45) at different aτ, it is easy to see that even at the most "unproductive" value of aτ (which turns out to be close to 0.05), σ₁²(τ) exceeds σ₀²(τ) by only about 1.5% of the latter value; at most other values of aτ, the deviation of σ₁²(τ) from σ₀²(τ) is much smaller than 1.5%. It becomes clear now that the mean square error σ²(τ) of the best linear forecast (which must, of course, lie somewhere between σ₀²(τ) and σ₁²(τ)) will barely differ in this case from σ₀²(τ).
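As a quick numerical illustration of this comparison (a sketch added by us, not part of the original text), the right-hand sides of (2.43) and (2.45), as reconstructed above, can be evaluated with a few lines of Python; the truncation level kmax and the trial values of a²τ are arbitrary choices.

```python
import numpy as np

def sigma0_sq(x, kmax=200):
    # Eq. (2.43); here x stands for a^2 * tau
    k = np.arange(kmax)
    return np.pi**2 / 12 - (8 / np.pi**2) * np.sum(
        np.exp(-2 * (2*k + 1)**2 * x) / (2*k + 1)**4)

def sigma1_sq(x, kmax=200):
    # Eq. (2.45); here x stands for a^2 * tau
    k = np.arange(kmax)
    s = np.sum(np.exp(-(2*k + 1)**2 * x) / (2*k + 1)**4)
    return (np.pi**2 / 12) * (1 + np.exp(-2*x)) - (16 / np.pi**2) * np.exp(-x) * s

# Tabulate the relative excess of the simple linear forecast error (2.45)
# over the best nonlinear forecast error (2.43)
for x in (0.02, 0.05, 0.1, 0.2, 0.5, 1.0):
    d = sigma1_sq(x) / sigma0_sq(x) - 1
    print(f"a^2*tau = {x:4.2f}: relative excess = {100*d:5.2f}%")
```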
Disconnected Random Processes

In issues related to applications, there are some processes which, along with continuous variations, may contain suddenly appearing and very fast "jumpy" changes of the respective quantities. The mathematical models of such processes are the disconnected random processes that have discontinuities of the first kind ("jumps") at some random sequence of points and obey some dynamic or stochastic law that controls their continuous variability between the points of discontinuity. In this section, we will discuss a number of examples of such disconnected processes which allow one to find explicit solutions of the tasks of linear and nonlinear optimal extrapolation.

(a) Piecewise constant random processes

Consider first some processes belonging to the class of point processes with adjoint random variables [cf. Grenander (1950)], that is, processes that include jumps at a random sequence of points …, t₋₁, t₀, t₁, …, which defines a "point random process" along the axis −∞ < t < ∞; within the intervals between the jump points, the values of the process are constant and coincide with the values of some sequence of mutually independent and identically distributed random variables. Moreover, we may assume, without any loss of generality, that the mean value of those variables is zero and their variance equals one, that is, M{ξ(s)} = 0, M{ξ(s)}² = 1.

4. Let us begin with the case of a Poisson process with adjoint random variables, for which the probability of having exactly m discontinuity points within any interval of length T on the axis s equals (βT)ᵐe^{−βT}/m! (where β is the positive average density of the Poisson point process {tₙ}). It is easy to see that in this case

$$B(\tau) = M\{\xi(s)\xi(s+\tau)\} = e^{-\beta\tau} \qquad (2.46)$$
(see, e.g., [13, p. 224]), so that

$$\tilde{\xi}(t,\tau) = e^{-\beta\tau}\,\xi(t), \qquad \sigma^2(\tau) = 1 - e^{-2\beta\tau}. \qquad (2.47)$$
Now, in this case,

$$M\{\xi(t+\tau)\mid \xi(s),\, s\le t\} = P\{\text{no points } t_i \text{ on the interval } (t,\,t+\tau)\}\cdot \xi(t) + P\{\text{the interval } (t,\,t+\tau) \text{ contains points } t_i\}\cdot 0 = e^{-\beta\tau}\,\xi(t), \qquad (2.48)$$
that is,

$$\tilde{\xi}_0(t,\tau) = \tilde{\xi}(t,\tau), \qquad \sigma_0^2(\tau) = \sigma^2(\tau) \qquad (2.49)$$
(the property that ξ̃₀(t, τ) can depend only upon ξ(t) follows immediately from the fact that ξ(s) is a Markov process). Thus, in spite of the fact that the process we are studying is not Gaussian, its best prediction turns out to be linear. Note that for this same process with an unknown mean value Mξ(s), the variance of the optimal nonlinear estimate of the mean value obtained from the values of ξ(s) on the interval 0 ≤ s ≤ T (that is, the mean square error of the optimal nonlinear filter with a T-long "memory", set for separating the constant summand from the stationary "noise" ξ′(s) = ξ(s) − Mξ(s)) will, according to Grenander (1950), be almost twice smaller for T ≫ 1/β than the error variance of the respective best linear estimate.

5. Let us obtain the random sequence of points …, t₋₁, t₀, t₁, … from some Poisson random sequence {tᵢ′} (with the mean density β) by selecting every second point:

$$t_i = t'_{2i}, \qquad i = \ldots, -1, 0, 1, \ldots \qquad (2.50)$$
In this case, for the process ξ(s) with adjoint random variables that corresponds to the sequence {tᵢ},

$$M\{\xi(s)\xi(s+\tau)\} = P\{\text{no points } t_i \text{ on the interval } (s, s+\tau)\}\cdot M[\xi(t)]^2 = \bigl(P\{\text{no points } t_i' \text{ on the interval } (s, s+\tau)\} + P\{\text{exactly one point } t_i' \text{ with an odd } i \text{ on the interval } (s, s+\tau)\}\bigr)\cdot M[\xi(t)]^2 = e^{-\beta\tau}\left(1 + \frac{\beta\tau}{2}\right). \qquad (2.51)$$
Consequently,

$$f(\lambda) = \frac{\beta}{2\pi}\,\frac{\lambda^2 + 3\beta^2}{\left(\lambda^2+\beta^2\right)^2} \qquad (2.52)$$
and

$$\tilde{\xi}(t,\tau) = \left[1 + (\sqrt{3}-1)\beta\tau\right]e^{-\beta\tau}\,\xi(t) + 2(2-\sqrt{3})\,\beta^2\tau\, e^{-\beta\tau}\int_0^{\infty}\exp(-\sqrt{3}\,\beta s)\,\xi(t-s)\,ds, \qquad (2.53)$$
$$\sigma^2(\tau) = 1 - e^{-2\beta\tau}\left[1 + \beta\tau + (2-\sqrt{3})\,\beta^2\tau^2\right] \approx 1 - e^{-2\beta\tau}\left[1 + \beta\tau + 0.268\,\beta^2\tau^2\right]. \qquad (2.54)$$

In this case, the optimal nonlinear prediction ξ̃₀(t, τ) will depend only upon the value of ξ(t) and upon the distance τ₀ from the point t to the latest observed jump of the process ξ(s) (at the point t − τ₀ = tᵢ = t′₂ᵢ). Having in mind that, with probability 1/(1 + βτ₀), the interval (t − τ₀, t) contains no points tⱼ′ and, with probability βτ₀/(1 + βτ₀), it contains one such point t′₂ᵢ₊₁, we will have

$$M\{\xi(t+\tau)\mid \xi(s),\, s\le t\} = M\{\xi(t+\tau)\mid \xi(t), \tau_0\} = \frac{1}{1+\beta\tau_0}\,e^{-\beta\tau}(1+\beta\tau)\,\xi(t) + \frac{\beta\tau_0}{1+\beta\tau_0}\,e^{-\beta\tau}\,\xi(t),$$

that is,

$$\tilde{\xi}_0(t,\tau) = \frac{1+\beta(\tau+\tau_0)}{1+\beta\tau_0}\,e^{-\beta\tau}\,\xi(t). \qquad (2.55)$$
It also needs to be noticed that, for calculating σ₀²(τ), the probability density function of the random variable τ₀ is h(τ₀) = ½β(1 + βτ₀)e^{−βτ₀}; this makes it easy to determine that

$$\sigma_0^2(\tau) = \int_0^{\infty} M\left\{\left[\xi(t+\tau) - \tilde{\xi}_0(t,\tau)\right]^2 \,\middle|\, \tau_0\right\} h(\tau_0)\,d\tau_0 = 1 - e^{-2\beta\tau}\left[1 + \beta\tau - \frac{e\,\mathrm{Ei}(-1)}{2}\,(\beta\tau)^2\right] \approx 1 - e^{-2\beta\tau}\left[1 + \beta\tau + 0.298\,\beta^2\tau^2\right] \qquad (2.56)$$
[here, Ei(x) is the integral exponential function (Gradsteyn and Ryzhik 1980)]. We see now that the formula for σ²(τ) differs from the formula for σ₀²(τ) in this last case only in the coefficient at β²τ² in the brackets, which is slightly bigger in (2.56). Calculations of σ²(τ) and σ₀²(τ) according to the formulae (2.54) and (2.56) show that at any βτ the first quantity does not exceed the second one by more than 0.1–0.2%. This approach also allows one to study the case when tᵢ = t′ₖᵢ, where {tᵢ′} is a Poisson sequence of points and k ≥ 3; yet, in this case the difference between σ²(τ) and σ₀²(τ) turns out to be more noticeable. Similar results are also obtained for the processes which have jumps of only ±2 at the points tᵢ while the values between the jumps are equal consecutively to either −1 or +1 [such processes are sometimes used in engineering as models of "infinitely clipped white noise", where the points tᵢ play the role of zeroes of the initial noise that is being clipped; see, e.g., McFadden (1958)].
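The piecewise constant processes of cases 4 and 5 are easy to simulate, which gives a direct empirical check of the correlation functions (2.46) and (2.51). The following Monte-Carlo sketch is our illustration (all names and parameter values are arbitrary), not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, T, dt = 2.0, 20000.0, 0.01      # jump density, record length, sampling step

# One Poisson stream of density beta on [0, T]
pts = np.sort(rng.uniform(0.0, T, rng.poisson(beta * T)))

def sample_process(jumps):
    # Piecewise constant process with i.i.d. N(0, 1) adjoint random variables
    vals = rng.standard_normal(jumps.size + 1)
    t = np.arange(0.0, T, dt)
    return vals[np.searchsorted(jumps, t)]

xi4 = sample_process(pts)             # case 4: all jump points kept
xi5 = sample_process(pts[1::2])       # case 5: every second point kept, Eq. (2.50)

for tau in (0.1, 0.25, 0.5, 1.0):
    k = int(round(tau / dt))
    b4 = np.mean(xi4[:-k] * xi4[k:])  # empirical B(tau), case 4
    b5 = np.mean(xi5[:-k] * xi5[k:])  # empirical B(tau), case 5
    print(f"tau={tau:4.2f}  case 4: {b4:6.3f} vs {np.exp(-beta*tau):6.3f}   "
          f"case 5: {b5:6.3f} vs {np.exp(-beta*tau)*(1 + beta*tau/2):6.3f}")
```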
6. In the case when {tᵢ} is a Poisson sequence, the process ξ(s) described above is usually called a "random telegraph signal" [e.g., see Rice (1944)]. For this process,

$$B_{\xi\xi}(\tau) = e^{-2\beta|\tau|} \qquad (2.57)$$
[see Rice (1944)] and

$$M\{\xi(t+\tau)\mid \xi(s),\, s\le t\} = M\{\xi(t+\tau)\mid \xi(t)\} = \xi(t)\,e^{-\beta\tau}\left(1 - \beta\tau + \frac{(\beta\tau)^2}{2!} - \ldots\right) = e^{-2\beta\tau}\,\xi(t).$$

It is seen that here

$$\tilde{\xi}_0(t,\tau) = \tilde{\xi}(t,\tau), \qquad \sigma_0^2(\tau) = \sigma^2(\tau), \qquad (2.58)$$
that is, similar to case 4, the best forecast turns out to be linear.

7. Consider now one more "infinitely clipped noise", for which the sequence {tᵢ} is obtained from a Poisson sequence {tᵢ′} (with the average density β) in accordance with formula (2.50). In this case, for τ > 0, we have

$$B(\tau) = M\{\xi(t)\xi(t+\tau)\} = \frac{1}{2}e^{-\beta\tau}\left(1 + \beta\tau - \frac{(\beta\tau)^2}{2!} - \frac{(\beta\tau)^3}{3!} + \ldots\right) + \frac{1}{2}e^{-\beta\tau}\left(1 - \beta\tau - \frac{(\beta\tau)^2}{2!} + \frac{(\beta\tau)^3}{3!} + \ldots\right) = e^{-\beta\tau}\cos\beta\tau \qquad (2.59)$$
[also see McFadden (1958)], that is,

$$f(\lambda) = \frac{\beta}{\pi}\,\frac{\lambda^2 + 2\beta^2}{\lambda^4 + 4\beta^4}, \qquad (2.60)$$

$$\tilde{\xi}(t,\tau) = \left[\cos\beta\tau + (\sqrt{2}-1)\sin\beta\tau\right]e^{-\beta\tau}\,\xi(t) - 2(2-\sqrt{2})\,\beta\sin\beta\tau\; e^{-\beta\tau}\int_0^{\infty} e^{-\sqrt{2}\,\beta s}\,\xi(t-s)\,ds, \qquad (2.61)$$
$$\sigma^2(\tau) = 1 - e^{-2\beta\tau}\left[1 - 2(\sqrt{2}-1)\sin^2\beta\tau\right] \approx 1 - e^{-2\beta\tau}\left[1 - 0.828\,\sin^2\beta\tau\right]. \qquad (2.62)$$
The best nonlinear forecast can be defined in this case in the same manner as it was done in case 5:

$$\tilde{\xi}_0(t,\tau) = M\{\xi(t+\tau)\mid \xi(t), \tau_0\} = \xi(t)\left(\frac{e^{-\beta\tau}}{1+\beta\tau_0}\left[1 + \beta\tau - \frac{(\beta\tau)^2}{2!} - \frac{(\beta\tau)^3}{3!} + \ldots\right] + \frac{\beta\tau_0\, e^{-\beta\tau}}{1+\beta\tau_0}\left[1 - \beta\tau - \frac{(\beta\tau)^2}{2!} + \frac{(\beta\tau)^3}{3!} + \ldots\right]\right) = e^{-\beta\tau}\left(\cos\beta\tau + \frac{1-\beta\tau_0}{1+\beta\tau_0}\,\sin\beta\tau\right)\xi(t). \qquad (2.63)$$
It is easy to obtain now that

$$\sigma_0^2(\tau) = \int_0^{\infty} M\left\{\left[\xi(t+\tau) - \tilde{\xi}_0(t,\tau)\right]^2\,\middle|\,\tau_0\right\} h(\tau_0)\,d\tau_0 = 1 - e^{-2\beta\tau}\left[1 - 2\bigl(1 + e\,\mathrm{Ei}(-1)\bigr)\sin^2\beta\tau\right] \approx 1 - e^{-2\beta\tau}\left[1 - 0.810\,\sin^2\beta\tau\right] \qquad (2.64)$$
[ ] = 1 − e−2βτ 1 − 2(1 + eEi(−1)) sin2 βτ ≈ 1 − e−2βτ [1 − 0.810 sin2 βτ ] (2.64) Comparison of (2.64) with (2.62) shows that at any βτ, the value σ (τ ) does not differ from σ 0 (τ ) by more than 0.3%. (b) Piecewise Gaussian random processes An extension of “point processes with adjoint random variables” is “point processes with adjoint continuous random processes”—the random processes ξ (s) that have discontinuities of the first kind within the random sequence of points …,t -1 ,t 0 , t 1 ,…; within the intervals between the jump points, the process coincides with pieces of continuous stationary random processes ξ i (s) with known statistical properties. In what follows, we will be discussing only the case when within all intervals t i ≤ s < t i+1 , the process coincides with a segment of a Gaussian stationary random process ξ i (s) with zero mean value and correlation function Bi (τ ); these processes are not correlated with all processes ξ j (s) at j /= i. Let’s begin with the case when all correlation functions Bi (τ ) are identical (with Bi (τ ) = B0 = const and also if β → ∞, such processes obviously turn into processes with adjoint random values). The correlation function of the process ξ (s) will now be B(τ ) = p0 (τ )Bi (τ ),
(2.65)
where p₀(τ) is the probability of having no points tᵢ within an interval of length τ. If Bᵢ(τ) has a rational Fourier transform and p₀(τ) is expressed with a combination of power and exponential functions, the spectral density f(λ) corresponding to such B(τ) will be rational, so that the quantities ξ̃(t, τ) and σ²(τ) can be defined with explicit formulae. As for the best nonlinear forecast, it will be described with a formula of the type

$$\tilde{\xi}_0(t,\tau) = p(t,\,t+\tau)\,\tilde{\xi}_i(t,\tau,\tau_0), \qquad (2.66)$$
where τ₀ has the same meaning as in the formula (2.55). Also, the quantity ξ̃ᵢ(t, τ, τ₀) is the best linear forecast at lead time τ of the process having the correlation function Bᵢ(τ), obtained by using the values of the process at t − τ₀ ≤ s ≤ t, while p(t, t + τ) is the conditional probability of having no points tᵢ within the interval (t, t + τ) given the behavior of the process on the semi-axis −∞ < s ≤ t.

8. If Bᵢ(τ) = e^{−|τ|} and {tᵢ} is a Poisson sequence (with the mean density β), then p₀(τ) = p(t, t + τ) = e^{−βτ} and
$$B(\tau) = e^{-(\beta+1)|\tau|}, \qquad \tilde{\xi}_0(t,\tau) = e^{-(\beta+1)\tau}\,\xi(t) = \tilde{\xi}(t,\tau). \qquad (2.67)$$
Thus, similar to cases 4 and 6, the best prediction turns out to be linear.

9. If Bᵢ(τ) = e^{−|τ|}, as in the previous example, but {tᵢ} is obtained from a Poisson sequence in accordance with (2.50), then, obviously,

$$B(\tau) = \left(1 + \frac{\beta|\tau|}{2}\right)e^{-(\beta+1)|\tau|}, \qquad f(\lambda) = \frac{\beta+2}{2\pi}\cdot\frac{\lambda^2 + \dfrac{(\beta+1)^2(3\beta+2)}{\beta+2}}{\left[\lambda^2+(\beta+1)^2\right]^2}, \qquad (2.68)$$
which means that in this case ξ̃(t, τ) is

$$\tilde{\xi}(t,\tau) = A\,\xi(t) + B\int_0^{\infty} e^{-\gamma s}\,\xi(t-s)\,ds, \qquad (2.69)$$
where A and B are functions of β and τ, γ is a function of β, and

$$\sigma^2(\tau) = 1 - e^{-2(\beta+1)\tau}\left\{1 + \beta\tau + (\beta+1)\left[2(\beta+1) - \sqrt{(3\beta+2)(\beta+2)}\right]\tau^2\right\}. \qquad (2.70)$$

Now, similarly to obtaining (2.55) and (2.56), we get

$$\tilde{\xi}_0(t,\tau) = \frac{1+\beta(\tau+\tau_0)}{1+\beta\tau_0}\,e^{-(\beta+1)\tau}\,\xi(t), \qquad (2.71)$$

$$\sigma_0^2(\tau) = 1 - e^{-2(\beta+1)\tau}\left(1 + \beta\tau - \frac{e\,\mathrm{Ei}(-1)}{2}\,\beta^2\tau^2\right) \approx 1 - e^{-2(\beta+1)\tau}\left\{1 + \beta\tau + 0.298\,\beta^2\tau^2\right\}. \qquad (2.72)$$
When comparing the equalities (2.70) and (2.72), it is useful to have in mind that at all nonnegative values of β

$$0.250\,\beta^2 \le (\beta+1)\left[2(\beta+1) - \sqrt{(3\beta+2)(\beta+2)}\right] < 0.268\,\beta^2. \qquad (2.73)$$

Comparing the values of σ²(τ) and σ₀²(τ) shows that at any values of β and τ the first value exceeds the second one by not more than 1%.

10. If {tᵢ} is a Poisson sequence (with a mean density β) and Bᵢ(τ) = (1+|τ|)e^{−|τ|}, then p₀(τ) = e^{−βτ} and
$$B(\tau) = (1+|\tau|)\,e^{-(\beta+1)|\tau|}, \qquad f(\lambda) = \frac{\beta}{\pi}\cdot\frac{\lambda^2 + \dfrac{(\beta+1)^2(\beta+2)}{\beta}}{\left[\lambda^2+(\beta+1)^2\right]^2}. \qquad (2.74)$$
It follows from here that ξ̃(t, τ) is defined again with (2.69) and

$$\sigma^2(\tau) = 1 - e^{-2(\beta+1)\tau}\left\{1 + 2\tau + 2(\beta+1)\left[(\beta+1) - \sqrt{\beta(\beta+2)}\right]\tau^2\right\}. \qquad (2.75)$$

On the other hand, according to (2.66), in this case⁵

$$\tilde{\xi}_0(t,\tau) = e^{-(\beta+1)\tau}\left[(1+\tau)\,\xi(t) + \tau\,\xi'(t)\right] \qquad (2.76)$$
so that one can easily see that

$$\sigma_0^2(\tau) = 1 - e^{-2(\beta+1)\tau}\left\{1 + 2\tau + 2\tau^2\right\}. \qquad (2.77)$$
(In order to make it easy to compare this formula with (2.75), note that 1 < 2(β+1)[(β+1) − √(β(β+2))] < 2 for all positive values of β.) Using (2.75) and (2.77), one can verify that in this example the ratio σ₀²(τ)/σ²(τ) will always stay between 1 and 0.95 at any β and τ.

11. If {tᵢ} is a Poisson sequence of points with the average density β but Bᵢ(τ) = e^{−|τ|} cos τ, then
⁵ Seemingly, the formula (2.76) shows that the best prediction ξ̃₀(t, τ) is linear; however, it then becomes difficult to understand how it can be different from ξ̃(t, τ). Actually, the process ξ(s) that we are discussing now is not differentiable in the mean square sense [which follows immediately from (2.74)]; therefore, the quantity ξ′(s) cannot be obtained with linear operations over the set of random variables ξ(s), s ≤ t, and, consequently, (2.76) cannot be regarded as a linear prediction. At the same time, the process ξ(s) is differentiable almost surely; therefore, the formula (2.76) makes sense here.
$$B(\tau) = e^{-(\beta+1)|\tau|}\cos\tau, \qquad f(\lambda) = \frac{\beta+1}{\pi}\cdot\frac{\lambda^2 + (\beta^2+2\beta+2)}{\lambda^4 + 2\beta(\beta+2)\lambda^2 + (\beta^2+2\beta+2)^2}. \qquad (2.78)$$
In this case, the best linear prediction ξ̃(t, τ) is again given with (2.69) while the best prediction ξ̃₀(t, τ) is

$$\tilde{\xi}_0(t,\tau) = a\,\xi(t) + b\,\xi(t-\tau_0) + \int_0^{\tau_0}\left(c\,e^{-\sqrt{2}\,s} + d\,e^{\sqrt{2}\,s}\right)\xi(t-s)\,ds, \qquad (2.79)$$
where a, b, c, and d are some specific functions of β, τ, and τ₀ [cf. formulae (1.54) in Yaglom (1955)]. The mean square errors of extrapolation with (2.69) and (2.79) in this case will be

$$\sigma^2(\tau) = 1 - e^{-2(\beta+1)\tau}\left\{\cos^2\tau + \left(\sqrt{\beta^2+2\beta+2} - \beta - 1\right)^2\sin^2\tau\right\} \qquad (2.80)$$

and
$$\sigma_0^2(\tau) = 1 - e^{-2(\beta+1)\tau}\left\{\cos^2\tau + \frac{4\,F\!\left(1,\ \dfrac{\beta}{2\sqrt{2}};\ \dfrac{\beta}{2\sqrt{2}}+2;\ 17-12\sqrt{2}\right)}{(4+3\sqrt{2})\,(\beta+2\sqrt{2})}\,\sin^2\tau\right\}, \qquad (2.81)$$

where F(α, β; γ; z) is the hypergeometric function. These formulae can be used to verify that the ratio σ₀²(τ)/σ²(τ) in this case is not smaller than 0.99 at all values of β and τ.

Consider now the case when the sequence {tᵢ} is again a Poisson sequence with the average density β but the correlation function Bᵢ(τ) takes two different values alternately: B₀(τ) = B₂ₖ(τ) (within the intervals tᵢ ≤ s < tᵢ₊₁, where i = 0, ±2, ±4, …) and B₁(τ) = B₂ₖ₊₁(τ) (where i = ±1, ±3, …). Then, the correlation function of the process ξ(s) will be
$$B(\tau) = \frac{1}{2}\,e^{-\beta\tau}\left[B_0(\tau) + B_1(\tau)\right], \qquad (2.82)$$
and if the Fourier transforms f₀(λ) and f₁(λ) of both B₀(τ) and B₁(τ) are rational, then B(τ) will also have a rational Fourier transform f(λ), which will allow one to obtain explicit formulae for the respective best linear prediction ξ̃(t, τ) and for the respective mean square error σ²(τ). In the case when the functions f₀(λ) and f₁(λ) are such that lim f₀(λ)/f₁(λ) ≠ 1 for λ → ∞, then, according to the well-known result obtained by Slepian (1958), for any (arbitrarily small) continuous piece of a sample record of the process ξ(s) it can be exactly established whether the piece belongs to a process with the correlation function B₀(τ) or B₁(τ). Therefore, the best nonlinear prediction will still be given with
formula (2.66), where p(t, t + τ) = e^{−βτ}, while ξ̃ᵢ(t, τ, τ₀) should now be understood as the best linear prediction of the process with the correlation function Bᵢ(τ), which can be B₀(τ) or B₁(τ) depending upon which correlation function corresponds to the last observed continuous piece of our process ξ(s). Also, it is easy to show that, if the values of ξ(s) are known to us over the entire semiaxis −∞ < s ≤ t, this last result will stay valid even in the case when lim f₀(λ)/f₁(λ) = 1 for λ → ∞; however, we will not dwell on this here. It should also be noted that in the limit β → 0 the process ξ(s) becomes a non-Gaussian and nonergodic process which presents, with probability ½, either a Gaussian process with the correlation function B₀(τ) or, with the same probability, a Gaussian process with the correlation function B₁(τ). However, the process ξ(s) will be ergodic for any positive value of β, though this ergodicity will be very difficult to make useful if β is small (see the first two pages of this article).

12. If B₀(τ) = e^{−|τ|} and B₁(τ) = e^{−|τ|}(1 + |τ|), then
(2.83)
Therefore, the best linear forecast ξ˜ (t, τ ) is again expressed with (2.69) while the respective mean square error is { [ ] } / σ 2 (τ ) = 1 − e−2(β+1)τ 1 + τ + (β + 1) (2β + 1) − (2β + 1)(2β + 3 τ 2 . (2.84) Then, in this case, ( −(β+1)|τ | ) e ξ(t) if t2i < t < t2i+1 ξ˜0 (t, τ ) = [ ] e−(β+1)|τ | (1 + τ )ξ(t) + τ ξ ' (t) if t2i−1 < t < t2i
(2.85)
and ] 1[ ] 1[ 1 − e−2(β+1)τ + 1 − e−2(β+1)τ (1 + 2τ + 2τ 2 2 2 = 1 − e−2(β+1)τ [1 + τ + τ 2 ]
and

$$\sigma_0^2(\tau) = \frac{1}{2}\left[1 - e^{-2(\beta+1)\tau}\right] + \frac{1}{2}\left[1 - e^{-2(\beta+1)\tau}\left(1 + 2\tau + 2\tau^2\right)\right] = 1 - e^{-2(\beta+1)\tau}\left[1 + \tau + \tau^2\right]. \qquad (2.86)$$
According to formulae (2.84) and (2.86), if β = 0 [that is, in the case of a nonergodic process ξ(s)], the ratio σ₀²(τ)/σ²(τ) changes as a function of τ between 1 and 0.915; however, even for β = 0.3 (which should be regarded as small in this case), these limits become 1 and 0.955 (see Fig. 2.39, where the behavior of the ratio σ₀²(τ)/σ²(τ) is shown for β = 0 and β = 0.3).

13. In conclusion, consider the case when the correlation functions B₀(τ) and B₁(τ) have different decrease rates and different initial variances B(0). Specifically, let
Fig. 2.39 The ratio σ₀²(τ)/σ²(τ) as a function of τ for β = 0 and β = 0.3
$$B_0(\tau) = \frac{2}{C+1}\,e^{-|\tau|}, \qquad B_1(\tau) = \frac{2C}{C+1}\,e^{-\alpha|\tau|}. \qquad (2.87)$$

Then, obviously,

$$B(\tau) = \frac{e^{-(\beta+1)|\tau|} + C\,e^{-(\beta+\alpha)|\tau|}}{C+1} \qquad (2.88)$$

and

$$f(\lambda) = \frac{1 + C\alpha + (C+1)\beta}{(C+1)\,\pi}\cdot\frac{\lambda^2 + \dfrac{(1+\beta)(\alpha+\beta)\,[C+\alpha+(C+1)\beta]}{1+C\alpha+(C+1)\beta}}{\left[\lambda^2+(1+\beta)^2\right]\left[\lambda^2+(\alpha+\beta)^2\right]}. \qquad (2.89)$$
The best linear extrapolation formula corresponding to the spectral density (2.89) is similar to (2.69) (but with A and B being functions of α, β, C, and τ, while γ is a function of α, β, and C), and the mean square extrapolation error with this formula is

$$\sigma^2(\tau) = 1 - \frac{e^{-2\beta\tau}}{C+1}\left\{ e^{-2\tau} + C\,e^{-2\alpha\tau} - \frac{2\sqrt{(1+\beta)(\alpha+\beta)}}{(\alpha-1)^2}\left[\sqrt{\bigl(1+C\alpha+(C+1)\beta\bigr)\bigl(C+\alpha+(C+1)\beta\bigr)} - (C+1)\sqrt{(1+\beta)(\alpha+\beta)}\right]\left(e^{-\tau} - e^{-\alpha\tau}\right)^2 \right\}. \qquad (2.90)$$
At the same time, we have here

$$\tilde{\xi}_0(t,\tau) = \begin{cases} e^{-(\beta+1)\tau}\,\xi(t) & \text{if } t_{2i} < t < t_{2i+1}, \\ e^{-(\beta+\alpha)\tau}\,\xi(t) & \text{if } t_{2i-1} < t < t_{2i}, \end{cases} \qquad (2.91)$$
and

$$\sigma_0^2(\tau) = 1 - \frac{e^{-2\beta\tau}}{C+1}\left(e^{-2\tau} + C\,e^{-2\alpha\tau}\right). \qquad (2.92)$$
It is easy to verify that expression (2.90) is always not smaller than (2.92); however, the difference between them will usually be very minor. In fact, it easily follows from (2.91) [and also from (2.90) and (2.92)] that when α = 1 the best prediction turns out to be linear and therefore σ²(τ) = σ₀²(τ). (If, in addition, C = 1, the case that we are discussing now obviously coincides with case 8.) Now, when α and β are fixed and C → ∞ or C → 0, the ratio σ₀²(τ)/σ²(τ) tends to 1 for all τ; the situation is the same for fixed values of C, α, and β and for τ → ∞ or τ → 0. Finally, if C and α are fixed but β → ∞, the ratio σ₀²(τ)/σ²(τ) will also tend to 1. All these properties of the formulae (2.90) and (2.92) make it natural that even at intermediate values of the parameters contained in these formulae the ratio σ₀²(τ)/σ²(τ) will be very close to 1. Indeed, our direct calculations show that if, for example, C = 1 and α = 2 (or α = ½), the ratio σ₀²(τ)/σ²(τ) stays between 1 and 0.99 for all values of β and τ, while if C = 1 and α = 4 (or α = ¼) the ratio stays between 1 and 0.97 for all values of β and τ. It is only when α is very different from 1 (that is, when the damping rates of B₀(τ) and B₁(τ) are very different) and when "the most unproductive" values of β and τ are intentionally selected that it becomes possible to get smaller ratios σ₀²(τ)/σ²(τ), tending towards 0.9 (which is not insignificant for practical applications). Thus, if C = 1 and α = 9 (or α = 1/9), the smallest value of this ratio (reached at β ≈ 0.03 and τ ≈ 3) turns out to be just a little bit smaller than 0.93.

We saw in cases 1–3 that for a number of non-Gaussian continuous processes the difference between the mean square errors of the best linear and nonlinear predictions turns out to be very small; therefore, the results obtained in cases 8–13 make it natural to assume that for disconnected processes consisting of fragments of non-Gaussian continuous processes ξᵢ(s), the improvement obtained by switching from the optimal linear extrapolation to the nonlinear extrapolation will in many cases be very meager. Of course, one may try to validate this assumption by constructing specific examples with processes consisting of fragments of the Markov processes discussed here in paragraph 2; but we will not do it here.
The author is grateful to R.L. Dobrushin, A.N. Kolmogorov, and A.M. Obukhov for their useful comments on the issues discussed here. He is also grateful to N.D. Morozova, E.Ya. Svirina, and G.I. Simonova, who conducted the numerical calculations used to obtain the results of this work.
References

Bendat J, Piersol A (2010) Random data. Analysis and measurement procedures, 4th edn. Wiley, Hoboken
Box G, Jenkins G (1970) Time series analysis. Forecasting and control. Wiley, Hoboken
Box G, Jenkins G, Reinsel G, Ljung G (2015) Time series analysis. Forecasting and control, 5th edn. Wiley, Hoboken
De Gooijer J (2017) Elements of nonlinear time series analysis and forecasting. Springer, Berlin
IPCC (2013) Climate change 2013. The physical science basis. Cambridge University Press, Cambridge
Kolmogorov AN (1939) Sur l'interpolation et extrapolation des suites stationnaires. Comptes Rendus de l'Académie des Sciences, Paris 208:2043–2046
Kolmogorov AN (1941) Interpolation and extrapolation of stationary random sequences. Izv AN SSSR, Mathematics 5(1):3–14
Lorenz E (1963) The predictability of hydrodynamic flow. Trans New York Acad Sci Ser II 25:409–432
Lorenz E (1975) Climatic predictability. In: The physical basis of climate and climate modelling. WMO, Geneva, Appendix 2.1, pp 132–136
Lorenz E (1995) Predictability: a problem partly solved. In: Proceedings of seminar on predictability, vol 1. ECMWF, Reading, UK, pp 1–18
Morice C, Kennedy J, Rayner N, Winn et al (2021) An updated assessment of near-surface temperature change from 1850: the HadCRUT5 dataset. J Geophys Res 126. https://doi.org/10.1029/2019JD032361
Osborn T, Jones P, Lister D et al (2021) Land surface air temperature variations across the globe updated to 2019: the CRUTEM5 dataset. J Geophys Res Atmos 126(2)
Papacharalampous G, Tyralis H, Koutsoyiannis D (2018) One-step ahead forecasting of geophysical processes with a purely statistical framework. Geosci Lett. https://doi.org/10.1186/s40562-018-0111-1
Privalsky V (2021) Time series analysis in climatology and related sciences. Springer
Wiener N (1949) Extrapolation, interpolation, and smoothing of stationary time series. Wiley, New York
Willis M, Garces M, Hetzer C, Businger S (2004) Infrasonic observations of open ocean swells in the Pacific: deciphering the song of the sea. Geophys Res Lett. https://doi.org/10.1029/2004GL020684
Yaglom A (1962) An introduction to the theory of stationary random functions. Prentice Hall, Englewood Cliffs
Yaglom A (1987) Correlation theory of stationary and related random functions, vols 1, 2. Springer, New York
References to Attachment 2.2

Doob J (1942) The Brownian movement and stochastic equations. Ann Math 48(2):361–369
Doob J (1953) Stochastic processes. Wiley, New York
Gradsteyn I, Ryzhik I (1980) Table of integrals, series, and products. Academic Press, San Diego
Grenander U (1950) Stochastic processes and statistical inference. Ark Mat 1(3):195–227
Kolmogorov AN (1941) Interpolation and extrapolation of stationary random sequences. Izv AN SSSR, Mathematics 5(1):3–14
Krein M (1945) On an A.N. Kolmogorov's extrapolation problem. Dokl AN SSSR 48(8):339–342
Kuznetsov P, Stratonovich R, Tikhonov V (1954) Transmission of random functions through nonlinear systems. Autom Telemechanics 15(3):200–205
Lubbok I (1959) The optimization of a class of non-linear filters. Proc Instn Electr Engrs C 344E:1–15
Masani P, Wiener N (1959) Non-linear prediction. In: Probability and statistics (The Harald Cramér volume). Stockholm–New York, pp 190–212
McFadden J (1958) The fourth product moment of the infinitely clipped noise. IRE Trans Inform Theory IT-4(4):159–162
Pugatchev V (1960) Theory of random functions and its applications to the problems of automatic control, 2nd edn. Moscow, 1960. English translation: Pergamon Press, Oxford, 1965
Rice S (1944) Theory of fluctuation noise. Bell Syst Tech J 24(1):282–332
Sherman S (1958) Non-mean-square error criteria. IRE Trans Inform Theory IT-4(3):125–126
Slepian D (1958) Some comments on the detection of Gaussian signals in Gaussian noise. IRE Trans Inform Theory IT-4(2):65–68
Wiener N (1949) Extrapolation, interpolation, and smoothing of stationary time series. Wiley, New York
Yaglom A (1952) Introduction to the theory of stationary random functions. Adv Math Sci (Uspekhi Mat Nauk) 7(5):3–168
Yaglom A (1955) Extrapolation, interpolation and filtering of stationary processes with rational spectral density. Contr Moscow Math Soc 4:333–374
Zadeh L (1953) Optimum nonlinear filters. J Appl Phys 24(4):396–404
Chapter 3
Bivariate Time Series Analysis
3.1 Introduction

Bivariate time series contain two scalar components and are regarded here as the input and output of a linear stochastic system. In natural sciences as well as in engineering, the goal of the computations, which are conducted in this case with the executable program AVESTA3, is to estimate and understand the major statistical and, if possible, physical properties of the input and output time series and of the interaction between them. The information about this interaction is contained in a bivariate linear stochastic equation in the time domain and in a number of functions of frequency in the frequency domain.

Bivariate time series models are used in natural sciences mostly for two tasks: finding teleconnections and reconstructing time series into the past. Teleconnections are understood as dependences between time series obtained at different geographical coordinates and/or between time series consisting of different physical data. A typical case would be looking for the effect of the El Niño–Southern Oscillation (ENSO) phenomenon upon monthly precipitation at some point in South or North America during one or several seasons or at climatic scales. Another major task is reconstruction of the output data missing over a usually long initial time span of a bivariate time series. This operation is conducted by using the dependence of the output upon the input, on the conditions that the latter is known over the entire time span of interest and that both components are known simultaneously over a relatively short interval at the end of the bivariate time series. This is a typical task in many sciences, mostly at climatic time scales from years to millennia.

The methods developed within the theory of random processes can be applied to bivariate time series through the parametric and nonparametric approaches.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-16891-8_3.
The goal of the nonparametric analysis is to estimate functions that characterize the time series behavior in the frequency domain. These functions are found directly from the time series, without constructing their time domain models: the frequency domain characteristics are determined through the spectral matrix obtained by applying a Fourier transform to the time series themselves. The nonparametric approach, traditional in engineering and other areas, is linear, and it is also applied to natural processes. The approach is described in detail for scalar, bivariate, and multivariate time series in the classical books by Julius Bendat and Allan Piersol, published several times under slightly different titles starting from 1966 and ending with the fourth edition of Random data (Bendat and Piersol 2010). The books present valuable sources of mathematical knowledge for applications involving random processes. They are practically unknown in Earth and solar sciences, which is highly regrettable and can be regarded as a cause of the mathematically incorrect approach to time series analysis that dominates natural sciences.

The nonparametric approach provides estimates of frequency-dependent characteristics, and therefore it cannot be used directly for reconstructing the trajectory of a time series component within the time domain. The solution of the time series reconstruction task should be sought through the parametric analysis because it provides both time and frequency domain estimates of time series statistics. The parametric approach begins with building a linear time domain model of the multivariate time series in the form of stochastic difference equations (two equations for a bivariate time series) which describe the dependence of each scalar time series upon its own past and upon the past of the other component(s), plus some noise.

A bivariate time series contains two scalar time series as its components. The current value of each component depends upon its past values and may depend upon the past values of the other component. In other words, the components may affect each other. This is called the autoregressive model of a bivariate time series, and it is expressed in the time domain with the following stochastic difference equations:

$$x_{1,t} = \varphi_{11}^{(1)} x_{1,t-1} + \varphi_{12}^{(1)} x_{2,t-1} + \varphi_{11}^{(2)} x_{1,t-2} + \varphi_{12}^{(2)} x_{2,t-2} + \cdots + \varphi_{11}^{(p)} x_{1,t-p} + \varphi_{12}^{(p)} x_{2,t-p} + a_{1,t},$$
$$x_{2,t} = \varphi_{21}^{(1)} x_{1,t-1} + \varphi_{22}^{(1)} x_{2,t-1} + \varphi_{21}^{(2)} x_{1,t-2} + \varphi_{22}^{(2)} x_{2,t-2} + \cdots + \varphi_{21}^{(p)} x_{1,t-p} + \varphi_{22}^{(p)} x_{2,t-p} + a_{2,t}.$$

Here x₁,t, x₂,t are scalar time series, t is discrete time, p is the maximum order of autoregression, and φ⁽ⁱ⁾ⱼₖ are the autoregressive coefficients (j, k = 1, 2; i = 1, …, p). The time series x₁,t and x₂,t contain a random disturbance in the form of the zero mean innovation sequences a₁,t, a₂,t. Therefore, the mean values of the time series x₁,t, x₂,t are also equal to zero. The maximum number p of past values of the time series contained in this bivariate equation is called the model's order, and the system described with it is a bivariate autoregressive model AR(p). The bold font is used here and later to identify bi- or trivariate models of time series.
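For readers who want to see this model in code, a minimal sketch of fitting a bivariate AR(p) model in Python is given below. The book itself works with AVESTA3, so the statsmodels-based fit here is only an independent illustration on synthetic data, with assumed coefficients and an arbitrary maximum order.

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
N = 1000

# Synthetic bivariate AR(1): the output x1 depends upon its own past
# and upon the past of the input x2; x2 depends only upon its own past
x = np.zeros((N, 2))
a = rng.standard_normal((N, 2))       # innovation sequences a_{1,t}, a_{2,t}
for t in range(1, N):
    x[t, 0] = 0.6 * x[t-1, 0] + 0.3 * x[t-1, 1] + a[t, 0]
    x[t, 1] = 0.5 * x[t-1, 1] + a[t, 1]

# Fit bivariate AR models up to a maximum order and let an information
# criterion select the optimal order, in the spirit of the parametric approach
results = VAR(x).fit(maxlags=5, ic='aic')
print(results.k_ar)                   # selected order p
print(results.coefs[0])               # estimated AR coefficient matrix at lag 1
print(results.sigma_u)                # covariance matrix of the innovation sequence
```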
Then, the equation is converted with a Fourier transform into the spectral matrix of the time series, and the matrix is used to calculate the same frequency dependent characteristics as in the nonparametric case. What is proposed and realized here with the help of the AVESTA3 program is the autoregressive parametric approach, and it allows one to solve both tasks: detecting teleconnections and reconstructing the missing part of a time series component. The term "autoregressive" clarifies the principle of the approach: in this chapter, the time series is represented with a bivariate linear equation that shows its "memory": how far the current value of the scalar output time series depends upon its own past and upon the past of the scalar input time series. The respective difference equation is physically reasonable, and it is stochastic because it contains a bivariate term that describes random deviations from an otherwise deterministic relationship with the past values. The random component shows how much or how little the current value of a bivariate time series deviates from the deterministic part of the equation.

As seen from the equation given above, the input process may also have a memory of its own past and can be affected by the past values of the output time series. If this latter property is found to be statistically significant, we have a linear stochastic system with a closed feedback loop: the output and input time series influence each other.

The autoregressive model presents an extension of the linear regression equation used in classical mathematical statistics, which describes connections between time-invariant random variables. Consequently, the regression cannot be used for reconstructing missing time-dependent data or for detecting interdependences between time series, that is, between functions of time. When you say "time series", you exclude by definition the tools belonging to classical mathematical statistics: a time series is a sequence of time dependent random variables. Time series research belongs to a different branch of probability theory—the theory of random processes.

In the time domain, the optimal autoregressive model (selected in accordance with respective criteria) describes the linear interactions of each scalar component with its own past and the role of the past values of the other component(s) contained in the autoregressive equation. In the bivariate case, each of the two scalar equations generally contains a linear combination of its past values and a linear combination of the past values of the other component. The quantitative indices of these relationships in the form of autoregressive coefficients and respective error variances define the interdependences within the time series and describe its important statistical properties such as the degree of interdependence, its statistical predictability, feedback relations, and the degree of causality. The latter concept, introduced into econometrics about 60 years ago by the Nobel laureate C. W. J. Granger, can also serve as an indicator of statistical predictability of bivariate time series. Some explanations of this concept will be given below. A full list of time domain results produced by AVESTA3 is given later.

The frequency domain information about the time series is obtained through the Fourier transform of the stochastic difference equation. It is more complicated than what can be obtained for random variables through the cross-correlation coefficients
and through a linear regression equation, but it is mathematically proper and is not difficult to understand. This is an obvious advantage of the approach based upon a ready-to-use executable program (AVESTA3 in our case) that allows the user to avoid programming in the R language, Matlab, or whatever else requires additional preliminary operations and/or knowledge.

After the time domain analysis, the program calculates the spectra of the scalar time series belonging to the bivariate (in this chapter) system. The spectrum is the time series characteristic that defines the most important features in the behavior of time series components, first of all the distribution of the time series energy over the frequency axis (different frequencies mean different time scales). It also tells the researcher about the presence of random vibrations, which appear in the spectrum as statistically significant peaks.

Before further explanations of how to analyze bivariate time series, it seems necessary to return to the comments about the approach to analysis of bivariate and multivariate time series that became traditional in many if not all natural sciences. Over a hundred years ago, a remarkable American scientist, Andrew Douglass (astronomer, archaeologist, climatologist), discovered a dependence between precipitation and the widths of annual tree rings, thus becoming the founder of dendrochronology. His approach, created a century ago, is still practically the only one used in climatology and in other natural sciences for teleconnection research and for reconstruction of time series behavior in the past, especially for reconstructions of climate. The approach suggested by A. Douglass is based upon the linear regression equation and the cross-correlation coefficient between the time series of a climate index (precipitation, temperature, or whatever else) and an indirect climate indicator in the form of tree ring widths. At that time, it was a pioneering work. Its author saw important drawbacks in his method and tried to find a more efficient solution, but that was impossible at the time.

The development of the theory of random processes in the twentieth century resulted in the appearance of methods of analysis based upon that theory and made it possible to avoid the outdated correlation/regression approach. The new mathematically proper methods of multivariate time series analysis in the time and frequency domains started to appear about 60 years ago and quickly became practically universal tools in the areas related to engineering, including the design, testing, and application of devices that may be influenced by random vibrations, from kitchen devices to skyscrapers, aircraft, intercontinental missiles, and spaceships. With some rare exceptions (solid Earth physics), the natural sciences seem to have stayed unaware of what had been happening for many previous decades in the theory of random processes and in methods of time series analysis. Therefore, the previous and current research on climate reconstructions, if based upon the correlation/regression approach, should not be trusted. The respective results are incorrect.

As for studying the dependence between time series (teleconnection research), it also must be done with the tools based upon the theory of random processes and information theory. But this is still not happening. Suffice it to notice that the very popular journal Nature published about 75 articles on teleconnections in 2020 and
during the first four months of 2021, and in most if not all of them the methods developed within the theory of random processes were ignored: the mathematical tools used by the authors were the cross-correlation coefficients and regression equations. Similar to the time series reconstruction case, the results of such studies are incorrect. The refusal to use the mathematically proper methods developed during the last 50–60 years by professionals leads to incorrect solutions of important problems (such as climate reconstructions intended to support climate projections) and does not do much honor to natural sciences.

The move from scalar time series analysis to the simplest multivariate case leads to analysis of bivariate time series, and it presents a cardinally different stage in time series research. As shown in Chap. 2, scalar time series possess statistical properties which need to be revealed and studied within the framework of the theory of random processes, information theory, and mathematical statistics. The most important properties in the scalar case are the probability density function and the spectral density. The stochastic time domain models are important for understanding the behavior of the time series and are necessary for time series forecasting. The spectral density is closely related to the property of statistical predictability and, if found satisfactory, the time domain autoregressive model can be used for forecasting the time series within the Kolmogorov–Wiener theory of extrapolation.

A bivariate system has some time and frequency domain properties that do not exist in the scalar case. Both physically and mathematically, it is reasonable to treat a bivariate time series as a stochastic system with one input and one output, plus an additive noise. This approach was the basis of the groundbreaking book by Julius Bendat and Allan Piersol first published in 1966. The 2010 edition summed up the content of their previous publications, and it deals in detail with the mathematical and statistical issues related to the frequency domain properties of multivariate time series, which are practically never applied for analysis and prediction of multivariate time series in natural sciences. To the best of this author's knowledge, the last Bendat and Piersol book has no analogs; it is practically unknown in natural sciences but widely known and regularly applied in engineering, in particular for tasks related to the response of engineering devices to different forcing functions such as wind, atmospheric turbulence, thermal factors, highway or railway roughness, etc.

Another radical event in the area of time series analysis and forecasting was the publication of the fundamental monograph by George E. P. Box and Gwilym M. Jenkins in 1970—the book that summed up the knowledge of parametric time series analysis and forecasting at the engineering level. In contrast to the Bendat and Piersol monographs, the books by G. Box and his co-authors G. Jenkins, G. Reinsel, and G. Ljung, in the latest—the fifth!—edition of 2015, are known in natural sciences, but it seems that the importance of this remarkable book is not valued as it deserves: it is treated in natural sciences as if it were just another means of time series analysis and forecasting, on a par with numerous mathematically baseless methods such as neural networks, machine learning, etc.
On the whole, the natural sciences continue to exist under the belief that the classical monographs on time series analysis by Bendat and Piersol (2010) and by Box with co-authors (2015), which became famous in applied sciences, have never been published. Among the many fundamentally important achievements contained in the books by G. Box and co-authors are the parametric approach to time series modeling in the time domain and a relatively brief description of the frequency domain analysis, still barely known in natural sciences in the multivariate case. Though the mathematical basis of the books by Bendat and Piersol is relatively complicated, the books are widely used in many areas of technology and engineering. The number of citations of each of the books amounts to 40–50 thousand.

The executable programs attached to this book combine the autoregressive analysis by Box et al. (2015) with the nonparametric frequency domain analysis by Bendat and Piersol (2010). This is achieved by building a spectral matrix of the multivariate time series through a Fourier transform of its multivariate time domain autoregressive model. This approach leads to parametric (autoregressive) estimates of frequency domain properties of multivariate time series. Probably the most important goal of this and the following chapter is to show the reader how many important properties of multivariate time series can be discovered and studied by using the approach based upon these two classical books. It can be done with the computational tools given here, without the necessity to immerse oneself into the theory of random processes, information theory, and the time series analysis tools offered by programming languages and other powerful sources such as Matlab.

In concluding this introduction, it should be mentioned that the approach used here for time and frequency domain analysis of multivariate time series differs from the approach described in Box et al. (2015, Chap. 14), in particular with respect to the use of correlation matrices, which usually do not play an important role in our case. The time domain characteristics are studied directly from the stochastic difference equations, while the frequency domain properties are obtained through a Fourier transform of those equations and contain some information which is not discussed in that classical volume.
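A minimal sketch of that step (our illustration, not the AVESTA3 code): given the estimated AR coefficient matrices Φ₁, …, Φ_p and the innovation covariance matrix Σ, the spectral matrix follows from a Fourier transform of the autoregressive equation as s(f) ∝ A(f)⁻¹ Σ A(f)⁻ᴴ, where A(f) = I − Σₖ Φₖ e^{−2πifk}.

```python
import numpy as np

def spectral_matrix(phi, sigma, freqs):
    # Parametric (autoregressive) spectral matrix of a D-variate AR(p) model.
    #   phi   : array (p, D, D) of AR coefficient matrices (e.g., results.coefs)
    #   sigma : (D, D) innovation covariance matrix (e.g., results.sigma_u)
    #   freqs : frequencies in cycles per time step, 0 < f <= 0.5
    # Normalization constants (time step, 2*pi) are omitted here.
    p, D, _ = phi.shape
    s = np.empty((len(freqs), D, D), dtype=complex)
    for n, f in enumerate(freqs):
        z = np.exp(-2j * np.pi * f)
        A = np.eye(D, dtype=complex)
        for k in range(p):               # A(f) = I - sum_k Phi_k z^k
            A -= phi[k] * z**(k + 1)
        Ainv = np.linalg.inv(A)
        s[n] = Ainv @ sigma @ Ainv.conj().T
    return s
```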
3.2 Products of Bivariate Time Series Analysis with AVESTA3

The time domain properties of the bivariate time series that can be seen explicitly from its stochastic difference equation include information about

• how far the memory of the output time series x₁,t extends into its own past, that is, how many past values x₁,t−k affect the current value x₁,t of the output;
• how many past values of the input process x₂,t affect the output time series x₁,t at time t;
• how long the memory of the input time series x₂,t is with respect to its own past, that is, how many past values of the input time series x₂,t are contained in the equation for the input x₂,t;
• how many past values x₁,t−k of the output process x₁,t affect the input time series x₂,t at time t.

This information allows one to understand how the components of a bivariate time series interact with each other and how one can obtain quantitative estimates of the mutual influence, in this case through the approach introduced in Privalsky (2021, Chaps. 8, 10, 13) for natural sciences. If the equation for the output includes a term or terms from the input equation and vice versa, one can say that the bivariate output/input system contains a closed feedback loop. The strength of the feedback can be determined through both time and frequency domain analysis of the time series; some information on using the AVESTA3 program for these purposes will be given later. The role of the unpredictable innovation sequences a₁,t, a₂,t will be seen by comparing the innovation sequence variances with the variances of the time series x₁,t and x₂,t. The innovation sequences can be correlated with each other, and that relation may play a significant role in the behavior of the entire time series.

The traditional approach through the regression equation does not allow one to study the dependence between time series because it assumes that the relationship between them can be expressed, after the removal of the mean value, with a proportionality equation x₁ = cx₂ plus a noise component whose role is defined by the cross-correlation coefficient between the time-invariant random variables x₁ and x₂. This approach is dominant in natural sciences, and the above-made statement remains true if the equation of proportionality (regression equation) is built not between the initial time series but between some quantities obtained through their preliminary analysis.

The time domain statistical moments of a bivariate time series include the correlation functions of the input and output processes and the cross-correlation function between them. This information is not absolutely required within the autoregressive approach, but it gives a good qualitative idea about the complexity of time series behavior. The output of the AVESTA3 program contains both covariance and correlation matrices of the bivariate time series consisting of the scalar time series x₁,t and x₂,t.

The time domain part of the analysis results is given sequentially for each order of the bivariate model at autoregressive orders from p = 1 through p = M. It includes

• the bivariate stochastic equation with estimates of AR coefficients and respective random errors;
• the covariance matrix of the innovation sequence with the variances and covariances of a₁,t, a₂,t (the first and second terms in the first and third lines of the covariance matrix; the second and fourth lines show the RMS of estimation errors);
• the determinant of the matrix (should not be too small);
• the cross-correlation coefficient between a₁,t and a₂,t;
• statistical predictability criteria for extrapolation of the time series component at the lead time DT (the unit time step);
• values of bivariate order selection criteria.

The diagonal terms of the covariance matrix show the variances of extrapolation of the time series x₁,t and x₂,t at the lead time DT. The frequency domain information about bivariate time series is obtained here from the time domain equation given at the beginning of Sect. 3.1. Briefly, it is found through a Fourier transform of that equation and subsequent calculations of several functions that characterize the time series behavior in the frequency domain. This type of information cannot be obtained through the traditionally used correlation/regression approach because random variables do not depend upon time and, consequently, do not have correlation functions and spectral densities. The functions of frequency to be estimated for a bivariate time series and provided by AVESTA3 are
spectral densities, coherence function, coherent spectral density, gain factor, phase factor, and time lag between the output and input as a function of frequency.
The latter quantity has been introduced by this author having in mind that the time lags between time series components as a function of frequency may be useful in natural sciences. The time lag estimates should be treated with caution. The spectral density describes the distribution of time series energy over the frequency, that is, over different time scales. In natural sciences, the spectral density (or the spectrum) usually diminishes with frequency, which means that the energy of time series variability decreases as the time scales become shorter. In some cases, the spectral densities are not monotonic and may contain statistically significant peak or peaks. At climatic scales, that is, at time scales longer than one year, such behavior is rare. The standard notation for the spectra of time series x 1,t and x 2,t are s11 (f ) and s22 (f ). In analyzing spectral estimates of bivariate time series, one should have in mind that they will not necessarily coincide with the estimates obtained for the components of the bivariate time series regarded as scalar functions of time. Therefore, the spectra of the time series x 1,t obtained with the programs AVESTA1 and AVESTA3 are not necessarily identical. The physical cause of this phenomenon is the fact that in the bivariate case the time series x 1,t presents the output of a linear physical system that has an input process; the interaction between the output and input affects the spectra of both components. The other reason is the sampling variability of estimates. This phenomenon occurs with the input spectrum as well. The optimal autoregressive orders for two scalar time series and for one bivariate time series generally do not coincide with each other. Normally, the optimal order of a bivariate system AR(p) is smaller than the orders of the scalar models AR(p) selected for the two scalar cases. The spectrum dimension is the squared dimension of the time series divided by frequency (or multiplied by time), e.g., mm2 /cpy, or mm2 year. If the process does
3.2 Products of Bivariate Time Series Analysis with AVESTA3
115
not have a dimension, the dimension of the spectra will be one divided by frequency, that is, time. The coherence function γ 12 (f ), also called coherence or coherency, shows how the degree of linear dependence between the time series varies with frequency and it can be regarded as a frequency dependent extension of the cross-correlation coefficient between two random variables. The concept of the coherence function seems to have been borrowed from physics; it had been known to G. Granger almost 60 years ago (Granger and Hatanaka 1964) and used in that book for describing linear dependence between time series and for Granger’s causality research. It had also been used implicitly by Gelfand and Yaglom in their pioneering publication in 1957. The coherence is a dimensionless function of frequency that plays a key role in any study of bivariate time series. It varies with frequency but does not depend upon the role played by the time series, meaning that it is the same for the output/input pairs [x 1,t , x 2,t ] and for [x 2,t , x 1,t ]. In the multivariate case, the ordinary coherence function is transformed into multiple and partial coherences (see Chap. 4). Similar to the cross-correlation coefficient, the coherence function is dimensionless and does not depend upon the variances of the components x 1,t and x 2,t . In frequency domain analysis of time series, the coherence function is usually squared. 2 The coherent spectrum or coherent spectral density s11.2 ( f ) = γ12 ( f )s11 ( f ) is the part of the total spectrum s11 (f ) generated by linear dependence between the time series. Obviously, the dimension of the coherent spectrum coincides with the dimension of the output spectrum. The assumption that the time series x 1,t and x 2,t are related to each other through a linear stochastic system means that the output of the system is the result of a linear transformation of the input. That transformation is characterized with the frequency response function (FRF)—a complex-valued function of frequency. The absolute value of the frequency response function is called the gain factor and it can be regarded as a frequency dependent extension of the regression coefficient relating two random variables to each other. It describes how the input process is transformed into the output at different frequencies so that its dimension coincides with the ratio of the output and input dimensions. The phase factor is the element of the frequency response function which defines the phase shift between the input and output time series as a function of frequency. It is determined as the arctangent of the frequency dependent ratio of the imaginary and real parts of the cross-spectrum density relating the input and output processes. The phase factor does not have an analog in the traditional correlation/regression analysis because random variables do not depend upon time and, consequently, upon frequency. In a physically realizable system, the output process lags behind the input so that the phase factor of the system should be positive. However, the sign may change along the frequency axis, in particular, if the system contains feedback—a dependence of the input and output processes upon each other. The phase factor is measured in radians. The notations for the gain and phase factors here will be g12 (f ) and φ 12 (f ). The concept of the phase shift used in analysis of time series which present sample records of linearly regular processes can be understood as a phase shift between two
116
3 Bivariate Time Series Analysis
periodical functions of the same frequency. Such idea would be erroneous because a regular random process does not contain strictly periodical functions. However, it may contain components that are arbitrarily close to such functions but still take the frequency interval of a final width. In natural sciences, it seems convenient to transform the phase shift into a frequency dependent time lag. If φ 12 (f ) is the phase shift between x 1,t and x 2,t at the frequency f , the time lag τ12 ( f ) at that frequency will be τ12 ( f ) = ϕ12 ( f )/2π f . As frequency is a part of the denominator in the last expression, the estimates of time lags at very low frequencies will be unreliable. The coherence function, coherent spectrum, gain and phase factors, and the time lag factor contain quantitative information about the linear dependence between the time series, which is very useful for understanding their properties and for physical interpretations of analysis results. Yet, to the best of the author’s knowledge, they have never (or very rarely) been used in natural sciences. All these characteristics of bivariate time series can be obtained, usually almost instantaneously, with the program AVESTA3. The key stage of this approach is how to interpret the results and understand the phenomenon which is being studied and it can generally be started for time and frequency domains right after having run the time series through AVESTA3. Certainly, all estimates of time series parameters in both time and frequency domains must be accompanied with respective estimates of random errors in the form of the error variance, RMS, or confidence bounds at a given confidence level. The lack of this information makes all estimates absolutely useless. The situation with the reliability of frequency domain estimates obtained with the AVESTA3 program is problematic due to the lack of a mathematically proper method to determine the error variances of frequency dependent quantities estimated through autoregressive models. The error variances of time domain parameters, first of all, of autoregressive coefficients and white noise variances, were obtained here through extension of error variances for the scalar case (Box et al. 2015) and the error estimates are included in the results of analysis with AVESTA3. However, the autoregressive spectral analysis of multivariate time series does not seem to include a technique to get mathematically strict error variances for the estimated spectra, coherent spectra, coherence function, gain, phase, and time lag factors. The solution suggested by this author and used in AVESTA3 is approximate: the confidence bounds for estimates of all frequency dependent characteristics are calculated in accordance with the approach used by Bendat and Piersol (2010, Chaps. 7, 9). In the bivariate case, the confidence intervals for the estimated frequency functions are defined as shown in Table 9.6 of the Bendat and Piersol book and their parameter √ n d is defined for a D-variate case as N/D2 p, where N and p are the time series length and the autoregressive order selected for the time series by AVESTA3. The sequence of steps to be taken by the program is still defined with the file CAT.DAT. The ENDDATE parameter is not used for all non-scalar cases but still should be kept in CAT.DAT. 
All other parameters have the same meaning as before except that they are to be regarded here as the controlling parameters for bivariate time series analysis intended for obtaining the optimal, in the information theory sense, parametric model (or models) of the bivariate time series xt that consists of the scalar time series x1,t and x2,t. In particular,

• N is the length of each time series component,
• M is the maximum order of AR models that will be fitted to the time series.

The maximum length N of the time series is still 1,000,000 while the maximum AR order is decreased to M = 50. The value of the maximum order M should be selected in the bivariate case from the relation M ≤ N/40. This is a bivariate version of the maximum order M ≤ N/10 recommended for scalar time series. If the time series is scalar (that is, when its dimension D = 1), the number of autoregressive coefficients that must be estimated coincides with the maximum order M, which is not recommended to be more than N/10. In the bivariate case, D = 2 and the number of coefficients to be estimated for autoregressive order p will be D²p. Therefore, the recommended maximum order is to be determined from the condition N/10 ≥ D²M, or M ≤ N/40. If a scalar time series of length N = 100 is long enough for obtaining statistically reasonable estimates for AR orders up to p = 10, a bivariate time series of the same length should be regarded as very short, and the maximum AR order for that time series should not exceed M = 2. If the user decides to disregard this recommendation, the order selection criteria will probably take care of the reliability problem by selecting only low-order models. The frequency domain information is not given for models of exceedingly high orders.
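Although AVESTA3 implements this procedure internally, the order-selection logic itself is easy to try out with general-purpose software. The sketch below is a minimal illustration, not AVESTA3: it assumes Python with numpy and statsmodels, uses a placeholder array `data` instead of a real bivariate series, and applies the M ≤ N/(10D²) rule described above before letting the usual criteria choose the order.

import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
data = rng.standard_normal((250, 2))   # placeholder for a real N x 2 bivariate series

N, D = data.shape
M = max(1, N // (10 * D**2))           # the M <= N/40 rule for D = 2 described above

model = VAR(data)
selection = model.select_order(maxlags=M)   # AIC, BIC, FPE, HQIC for orders 0..M
print(selection.summary())
print("orders chosen by the criteria:", selection.selected_orders)

# fit the model of the order preferred by, e.g., AIC
result = model.fit(maxlags=M, ic="aic")
print("fitted AR order:", result.k_ar)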
3.3 Finding Dependence Between Time Series with AVESTA3

The main goal of this section is to demonstrate how the methods of multivariate time series analysis allow one to detect a connection between two time series, in particular, even when the connection plays just a tiny role in the energy balance of the time series. Also, it will be shown that the task of climate reconstruction should be solved with a method lying within the framework of the theory of random processes rather than through regression equations.

This section contains seven examples of applying AVESTA3 to the study of bivariate time series. Examples 3.1 and 3.2 show the failure of the cross-correlation coefficient as an indicator of dependence between time series. Example 3.3 demonstrates that the bivariate autoregressive approach presents an efficient tool for detecting teleconnections even in the presence of a very powerful background noise. Examples 3.4 and 3.5 are dedicated to detection and analysis of relations between the time series of the annual global surface temperature (HadCRUT5) and the oceanic component NINO3.4 of the ENSO phenomenon as the output and input of a linear stochastic system; they show a strong interdependence between these two random processes within a limited range of time scales and, at the same time, demonstrate that this strong dependence just barely affects the energy budget of the global temperature variations—the most important indicator of climate variability. Example 3.6 shows how to properly restore a time series into the past using a bivariate time series, which allows one to evaluate relations between its components and then use the acquired knowledge to restore the missing past of the time series of interest. One of the time series (the proxy) is known for the entire time span while the other one (the target), which needs to be restored (reconstructed), is known only over the latest part of the entire time interval. Finally, Example 3.7 shows how the autoregressive time series analysis conducted with AVESTA3 can be used for verification of climate simulated with general circulation models designed for projecting the future climate.

Before continuing with the examples based upon observed and simulated data, we will demonstrate the inability of the traditional correlation/regression approach to describe relations between time series—the fact revealed more than six decades ago in information theory. We will start with a figure that shows the cross-correlation function between observations of two natural phenomena and ask ourselves whether it should make the researcher think twice before calculating a linear regression equation to restore the total solar irradiation over centuries into the past. In particular, which value of the cross-correlation function should be selected in this case as the cross-correlation coefficient for the regression equation? The cross-correlation function given in Fig. 3.1 contains many high values, thus providing more information than can be carried by a single cross-correlation coefficient. How can one justify the use of regression equations, scalar or multivariate, which cannot even take into account the entire cross-correlation function? A mathematically proper response will be given in this and the following examples, thus proving the helplessness of the cross-correlation coefficient and regression equation in time series analysis. Note also that the latter statement does not need to be verified through experimental examples because it has been proved mathematically; the examples are given here for researchers who are still faithful to the correlation/regression approach in time series analysis.

According to numerous publications in natural sciences, the proper indicator of linear dependence between two time series is the cross-correlation coefficient. It is practically always used for detecting teleconnections between time series and for time series reconstructions, especially in climatology and related sciences. We will show now that this traditional approach is wrong.

Example 3.1 Low Cross-Correlation Coefficient Misses Strictly Linear Relation

The fact that the cross-correlation coefficient cannot serve as an indicator of linear dependence between time series was discovered in information theory 66 years ago (Gelfand and Yaglom 1957); the authors of that classical work showed that the quantity which characterizes the relationship between time series is the coherence function—a function of frequency. The seemingly first applications of the coherence function to show such dependences appeared in econometrics (Granger and Hatanaka 1964) and in engineering (Bendat and Piersol 1966).
Fig. 3.1 Cross-correlation function between sunspot numbers and total solar irradiation of Earth
The squared coherence function has been widely used in engineering since that time but remains barely known in natural sciences. Unfortunately, these facts belonging to information theory and widely used in methods of multivariate time series analysis are still stubbornly ignored in natural sciences. It means that results obtained through applying the linear cross-correlation coefficient and regression equation to time series are wrong. In particular, it means that the climate reconstructions which are used to support the results of climate simulations with general circulation models within the framework of the IPCC climate project must be completely revised.

The fact that the cross-correlation coefficient, which characterizes the dependence between time-invariant random vectors, cannot be used for studying time series and should be replaced with the coherence function has also been pointed out in natural sciences several times starting from 1988 (Privalsky 1988, 2015, 2018, 2021; Privalsky and Jensen 1995). Having in mind the rock-solid belief in the cross-correlation coefficient (ordinary, partial, or multiple) as the critically important tool of multivariate time series analysis that has existed for about a century in many natural sciences, we will try to persuade the reader, with the help of some practical examples, to avoid the solution traditional in natural sciences. A very simple case of the cross-correlation coefficient's inability to serve as an indicator of linear dependence between two time series, given many years ago in Privalsky and Jensen (1995) and then reproduced in Thomson and Emery (2014), has been disregarded in Earth and solar sciences. Taking into account this unshakable refusal to accept and use what was achieved decades ago in information theory and
Fig. 3.2 Components x 1,t , x 2,t of bivariate time series (a) and linear dependence between them (b)
in time series analysis, we will discuss here two strictly experimental cases that will tell us that the cross-correlation coefficient cannot serve as a quantitative measure of linear dependence between time series. Then it will be shown that the approach based upon the theory of stationary random processes provides a mathematically correct and quite satisfactory solution of the task. This example will be the first occasion when the reader of this book will see how the information provided by AVESTA3 and listed earlier can be used for studying bivariate time series.

First, we will consider a bivariate time series of a moderate length (N = 250 time units) shown in Fig. 3.2. The scalar components of this bivariate time series present sample records of white noise with variance very close to 1. The components of the time series are given in the ESM attachment to the book. The traditional analysis through the cross-correlation coefficient shows the lack of linear dependence between the time series: the correlation coefficient r12 between x1,t and x2,t equals −0.05 (see the black line in Fig. 3.2b). An almost zero cross-correlation for this bivariate set regarded as two random variables tells us, of course, that its components are mutually independent, both linearly and nonlinearly (because the time series is Gaussian). It is safe to assume that further analysis of this time series, if conducted within the traditional linear (or even nonlinear) regression approach, would be abandoned at this stage. However, the analysis with AVESTA3 based upon the theory of random processes produces very different results. The CAT.DAT file in this case should be
250     6      501    0      0      1      1      1      0       0       0       0
N       M      NF     K      R      L      LS     DT     MFLT    KFLT    LFLT    ENDDATE
As the length N of the time series is 250, the maximum AR order M should be set to M = 6 (the maximum value of M at which 4M ≤ N/10). When AVESTA3 is run, it will ask the user
• for the name of the file that will contain the results of calculations (e.g., X1T&X2T.RES),
• for how many input processes you have (1 in the bivariate case),
• for the names of your output and input time series (X1T.TXT, X2T.TXT).

We run AVESTA3 with this CAT.DAT and get the following time domain information:

• the CAT.DAT parameters;
• the output time series x1,t and the same statistical moments which are provided by AVESTA1 (mean value, variance, RMS, asymmetry and kurtosis, standardized asymmetry and kurtosis);
• respective information about the input time series x2,t;
• covariance and correlation functions in a matrix form for lags from 0 to N/10;
• correlation and cross-correlation functions r11(k), r12(k), r21(k), and r22(k) for |k| ≤ 25;
• estimates of autoregressive coefficients for models of orders p = 1, …, M fitted to the time series and respective root mean square errors (the first and second numerical columns); the estimates with absolute values smaller than the RMS error multiplied by 1.64 are statistically insignificant at the 0.90 confidence level;
• the covariance matrix of the bivariate white noise sequences a1,t and a2,t (see the equation at the beginning of Sect. 3.1) and its determinant; lines 2 and 4 of this matrix show the standard errors of the estimates above them;
• the cross-correlation coefficient between a1,t and a2,t;
• two predictability criteria as described in Chap. 2; and
• values of order selection criteria.

If the dependence between time series is strictly linear, the predictability criteria may not be given and the program will just stop. A mathematically convenient notation for a bivariate time series with components x1,t and x2,t is xt = [x1,t, x2,t]′. The prime means vector transposition (switching from rows to columns or vice versa). The frequency domain information produced by the AVESTA3 program is given only for the bivariate AR models selected by at least one order selection criterion. If the maximum order AR(M) was not selected by any criterion, the frequency domain results will be given for that model as well. If the number N/(4M) is less than 10, the program will tell you about the insufficient statistical reliability. The information about the frequency dependent functions estimated by AVESTA3 includes respective 90% confidence limits.

In our case, we have estimates of six bivariate autoregressive models, and the program will let us know which of them is the best according to each of the five order selection criteria. The user can find the model selected by each criterion by looking for the word "chosen" in the printout. The program tells us that both time series are Gaussian: the absolute values of their standardized asymmetry and kurtosis are less than 2. It is important to remember that the components of a bivariate Gaussian (normally distributed) time series cannot be related to each other nonlinearly because a nonlinear transformation of a Gaussian
time series is not Gaussian. This means, in particular, that using nonlinear methods in a Gaussian case is mathematically senseless. The estimated covariance and correlation matrices are given after the input’s statistical moments; generally, this information is supplementary and may be used for some specific purposes, for example, for determining the contributions of the time series components to each other (e.g., Privalsky 2021, Chap. 7). A convenient presentation of the correlation and cross-correlation functions is given a bit later and now we return to Example 3.1. The estimated correlation functions r 11 (k) and r 22 (k) and the cross-correlation function branches r 12 (k) and r 21 (k) between x 1,t and x 2,t are shown in Fig. 3.3 and they tell us that the time series x 1,t and x 2,t are close to white noise sequences (very low correlation functions) while the cross-correlation function reaches almost 1.0 at the lag k = -5. This means that the output time series is proportional to the input and that it lags behind x 2,t by five time units. It should also be noted that the task of determining the reliability of correlation and cross-correlation functions is quite difficult and requires the knowledge of the true correlation matrices of the time series xt = [x 1,t , x 2,t ]' for all values of the lag. In other words, the correlation functions present rather inconvenient tools for studying multivariate time series. The AVESTA3 program provides information about the autoregressive models of orders from 1 to M. The orders preferred by the order selection criteria may not be identical for all criteria but in our case all of them prefer the model AR(5). As seen from the printout part under the title “AR MODEL OF ORDER 5”, the output value of x 1,t is practically equal to the input x 2,t −5 that occurred five steps ago: the autoregressive coefficient at x 2,t −5 in the first equation is 0.99830. This is the only AR coefficient estimate that is statistically different from zero. The input time series x 2,t does not depend upon its past, that is, it presents a Gaussian white noise and the white noise variance practically coincides with the variance of the input time series.
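The situation described in this example is easy to reproduce outside AVESTA3. The following minimal sketch, assuming Python with numpy and scipy, simulates two practically identical white-noise records shifted by five time units and shows that the lag-zero cross-correlation coefficient stays near zero while the estimated squared coherence stays close to one; it illustrates the same effect, but it is not the book's computation.

import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(42)
n, lag = 250, 5

base = rng.standard_normal(n + lag)      # one sample record of Gaussian white noise
x2 = base[lag:]                          # input: the record shifted by five time units
x1 = base[:n] + 0.05 * rng.standard_normal(n)   # output: x1,t = x2,t-5 plus weak noise

# the traditional indicator: the lag-zero cross-correlation is close to zero
r12 = np.corrcoef(x1, x2)[0, 1]
print(f"cross-correlation coefficient r12 = {r12:.3f}")

# the frequency domain indicator: the squared coherence is close to one everywhere
f, gamma2 = coherence(x1, x2, fs=1.0, nperseg=64)
print(f"median squared coherence = {np.median(gamma2):.3f}")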
Fig. 3.3 Correlation (a) and cross-correlation (b) functions of the time series xt = [x 1,t , x 2,t ]'
These results completely agree with the true properties of the simulated time series x1,t and x2,t: they were generated as a single sample of a Gaussian white noise (x1,t) and the same time series shifted with respect to it by five time units (x2,t), plus a weak white noise added for computational stability. And this is exactly what the model AR(5) tells us: x1,t = 0.9983x2,t−5 + a1,t with a 90% confidence interval for the AR coefficient ±1.64 × 0.00333. This interval contains 1 while, statistically, all other coefficients do not differ from zero. The variance of the innovation sequence a1,t is negligibly small as compared to the variance of the input time series innovation sequence: ~0.003 against 1.016. Finally, the cross-correlation between the scalar innovation sequences a1,t and a2,t is very weak, and this property confirms the above-given connection between the input and output time series.

Thus, the traditional approach through correlation and regression missed an almost strictly linear relation between two time series. Note also that within the framework of the traditional approach one is not supposed to move the scalar time series x1,t and x2,t with respect to each other in the time domain because, being random vectors, they do not depend upon time. If you make them time-dependent, you get a time series by definition and, consequently, you must use methods of time series analysis.

Summing up, we can say that the time series x1,t and x2,t are linearly dependent upon each other or, more accurately, the time series x1,t is linearly dependent upon x2,t; the dependence is described with the equation x1,t = x2,t−5 + a1,t, and the variance of its error a1,t is two orders of magnitude smaller than the variance of the time series x1,t.

We already have a good idea about what our time series xt is: a bivariate white noise with an almost strict linear dependence between its components. Consider now how this is reflected in the frequency domain. First, the spectra s11(f) and s22(f) of the output x1,t and the input x2,t practically coincide with each other (Fig. 3.4a); it happens because the time series x1,t and x2,t differ from each other in just 10 terms (five at the beginning and five at the end). The peaks at frequencies close to 0.15 cpΔt (or cpDT) and at 0.30 cpΔt are statistically insignificant: a horizontal line at the ordinate equal to 2 (the doubled variance of the time series) lies inside the confidence interval. The time series x1,t and x2,t have been simulated as white noise. (This result shows the importance of having confidence levels for sampling estimates.) The coherence function is shown in Fig. 3.4b, and it barely differs from one over the entire frequency axis, which means a linear dependence between x1,t and x2,t. The gray horizontal line is the upper 90% limit for estimated coherence when the true coherence is zero. The cross symbols in Fig. 3.4a mark the coherent spectrum s11.2(f) = γ12²(f)s11(f) that shows the share of the output spectrum s11(f) generated by the linear dependence of the output time series x1,t upon the input x2,t. In this case, it is practically 100%.

The linear stochastic system connecting the output time series to the input transforms the input time series into the output, and the transformation is described by the frequency response function, which consists of the gain factor g12(f) and the phase factor φ12(f).
Fig. 3.4 Spectra, coherent spectrum (a), and coherence function squared (b) of the bivariate time series xt = [x 1,t , x 2,t ]'
Generally, the transformation is frequency-dependent, but in this case the role of the FRF is minimal: it just adds a little bit of noise to the input process x2,t. The behavior of the FRF components is shown in Fig. 3.5. The gain factor shows how the output process is obtained from the input as a function of frequency. In this case the transformation is not frequency dependent because the values of g12(f) differ from one by just one to two per cent. The phase factor (the black line in Fig. 3.5b) changes strictly linearly with frequency f, which means a constant time shift between the time series components. The confidence limits for the estimates of these functions are not shown because they are too close to the estimates shown in the figure. The time lag function is defined as τ12(f) = φ12(f)/2πf.
Fig. 3.5 Frequency response function: gain factor (a), phase factor and time lag (b)
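The phase-to-lag conversion defined just above is a one-line computation; the sketch below is a hypothetical numpy helper (not part of AVESTA3) that also makes the low-frequency unreliability visible.

import numpy as np

def time_lag(phase_rad, freq):
    """Convert a phase factor (radians) to a time lag tau = phi / (2*pi*f)."""
    freq = np.asarray(freq, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        tau = np.asarray(phase_rad, dtype=float) / (2.0 * np.pi * freq)
    return tau   # unreliable (and undefined at f = 0) at very low frequencies

# a phase factor that grows strictly linearly with frequency corresponds to a
# constant lag; with phi(f) = 2*pi*f*5 the lag is 5 time units:
f = np.linspace(0.0, 0.5, 6)
phi = 2.0 * np.pi * f * 5.0
print(time_lag(phi, f))    # [nan, 5., 5., 5., 5., 5.]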
Its estimates at very low frequencies should not be trusted. This function does not seem to be important for engineers (at least, it is not used in the Bendat and Piersol book published in 2010) but it may be of interest in natural sciences. In this case, the lag is five time units at all but very low frequencies.

The results of frequency domain analysis clearly confirm the true properties of this simulated time series:

• its scalar components are obtained from the same sample record, the output being shifted with respect to the input;
• both components have a variance close to one, but the input process contains an additive white noise with a small variance;
• the spectra show that the sample record used here belongs to white noise;
• the coherence function is equal to one at all frequencies;
• as the result, the coherent spectrum of the output process coincides with the input's spectrum;
• the frequency response function estimate quantifies the linear relation between the components, and the time lag between them is five time intervals.

Obviously, the properties of this time series are very simple, and the main goal of this example was to show how to analyze the information obtained with the program AVESTA3 from bivariate time series; the example also proves that the traditional correlation/regression approach should not be used for research involving time series. More complicated cases will be discussed in other examples given in this chapter. This ends Example 3.1.

Example 3.2 High Cross-Correlation Coefficient Misses Strictly Linear Relation

Let x2,t be a white noise sequence of length N = 100, having a zero mean value and unit variance, and let x1,t be the time series obtained through the following linear operation:

x1,t = 0.5x1,t−1 − 0.6x1,t−2 + x2,t.

The time series are shown in Fig. 3.6a (also see the ESM). For convenience, the mean value of x1,t was changed in this figure from 0 to 4. The correlation/regression approach shows that the coefficient of determination amounts to 0.620 and the regression coefficient is close to 1.11 (Fig. 3.6b). It means that the regression equation can explain or restore, through the input time series x2,t, 62% of the variance of the output time series x1,t. This is pretty good, but the correlation/regression approach is inapplicable to time series, and their analysis made in accordance with the theory of random processes is more efficient. Let us see if this last statement is true. We have a bivariate time series of length N = 100, so the maximum autoregressive order should not exceed 2. Therefore, the file CAT.DAT should be
Fig. 3.6 Time series x 1,t and x 2,t (a) and linear regression between them (b)
100     2      501    0      0      1      1      1      0       0       0       0
N       M      NF     K      R      L      LS     DT     MFLT    KFLT    LFLT    ENDDATE
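For readers without AVESTA3 at hand, the model of this example can be simulated and fitted with general-purpose software; the sketch below, assuming Python with numpy and statsmodels, recovers coefficients close to the true values 0.5 and −0.6 in the first equation, in line with the results discussed next.

import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(7)
n = 100

x2 = rng.standard_normal(n)        # input: zero-mean unit-variance white noise
x1 = np.zeros(n)
for t in range(2, n):              # output: x1,t = 0.5*x1,t-1 - 0.6*x1,t-2 + x2,t
    x1[t] = 0.5 * x1[t - 1] - 0.6 * x1[t - 2] + x2[t]

data = np.column_stack([x1, x2])
res = VAR(data).fit(2)             # bivariate AR(2), estimated by least squares

# res.coefs has shape (lag, equation, regressor); the x1 equation should show
# roughly 0.5 at x1,t-1 and -0.6 at x1,t-2, with all other coefficients near 0
print(np.round(res.coefs, 3))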
Running the time series through AVESTA3, we see that both components are Gaussian and their correlation functions r11(k) and r22(k) show that the input time series x2,t is close to a white noise while the output x1,t has a more complicated structure (Fig. 3.7a). The cross-correlation function r12(k) is maximal at k = 0 but also seems to differ from zero at several lags (Fig. 3.7b).
Fig. 3.7 Correlation (a) and cross-correlation (b) functions of the bivariate time series [x 1,t , x 2,t ]
The time domain results show the AR(2) model as

X1(T) =  0.59195 X1(T−1) + 0.13458 X2(T−1) − 0.53228 X1(T−2) − 0.22026 X2(T−2)
         (0.09195)         (0.13457)          (0.07951)          (0.17394)
X2(T) =  0.07954 X1(T−1) + 0.17395 X2(T−1) + 0.11755 X1(T−2) − 0.22030 X2(T−2)
         (0.06773)         (0.17300)          (0.11755)          (0.17299)
in which the RMS errors of the estimates are given in parentheses and only two autoregressive coefficients, both belonging to the output time series x1,t, significantly differ from zero. Without the statistically insignificant coefficients, the resulting dependence between the time series, according to the AVESTA3 run, is

x1,t ≈ 0.59x1,t−1 − 0.53x1,t−2 + x2,t
x2,t = at,

which agrees with the AR(2) model given at the beginning of this example. Deviations from the true values 0.5 and −0.6 are caused by the sampling variability of estimates. The confidence interval for the estimates covers the true values of the coefficients.

Actually, the relation between the time series components is strictly linear, so its analysis at higher AR orders will be strongly affected by the lack of any noise within the system, which makes the matrices of autoregressive coefficients poorly defined. In particular, if we disregard the rule about the maximum AR order and put M = 4 into CAT.DAT, we will receive no frequency domain information, and the models AR(3) and AR(4) will show that orders not equal to 2 are not acceptable. If the time series were long, say, N = 10⁵, we would have lost even the AR(2) model to computational instability. Note, in particular, that the determinant of the matrix containing the variances and covariances of the white noise is very small: DET = 0.497089134E-05. In other words, the results of our calculations show that the two scalar time series are related to each other strictly linearly and that the AR(2) model describes 100% of the output variance.

Consider now the properties of the AR(2) model in the frequency domain. The spectral densities of the components x1,t and x2,t are very different and show that the input time series x2,t presents a white noise, which is then transformed into a time series whose spectrum contains a statistically significant peak at the frequency close to 0.2 cpΔt (Fig. 3.8a).

Fig. 3.8 a Spectra of time series x1,t (black), x2,t (gray); b coherent spectrum s11.2(f)

The coherent spectrum s11.2(f) = γ12²(f)s11(f) shown in Fig. 3.8b coincides with the spectrum s11(f), which means that the coherence function γ12(f) equals 1 at all frequencies (Fig. 3.9a). Thus, the stochastic system with the time series x1,t as the output and x2,t as the input is proved to be strictly linear, and it describes the transformation of x2,t into x1,t in full. This is better than the 62% that would have been obtained with the traditional approach.

The AVESTA3 printout provides information about the coherence function squared γ12²(f) and the gain factor g12(f), the absolute value of the frequency response function, which describes how the input process x2,t is transformed into the output x1,t (Fig. 3.9b). The coherence function does not differ from 1 while the gain factor, in agreement with a strictly linear model having a white noise at the input, reproduces the shape of the square root of the ratio of the spectral densities s11(f)/s22(f).
Fig. 3.9 a Coherence function between x1,t and x2,t; b gain factor g12(f)
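The shape of the gain factor in Fig. 3.9b can also be cross-checked analytically: for the strictly linear system of this example, the frequency response function from x2,t to x1,t is H(f) = 1/(1 − 0.5e^(−i2πf) + 0.6e^(−i4πf)). A minimal scipy sketch (an illustration, not AVESTA3 output) evaluates the gain |H(f)| and locates its peak near 0.2 cpΔt, in agreement with the spectrum of x1,t:

import numpy as np
from scipy.signal import freqz

# transfer function of x1,t = 0.5*x1,t-1 - 0.6*x1,t-2 + x2,t, input -> output
w, h = freqz(b=[1.0], a=[1.0, -0.5, 0.6], worN=2048)
f = w / (2.0 * np.pi)              # frequency in cycles per time step, 0..0.5
gain = np.abs(h)                   # the gain factor |H(f)|

print(f"peak gain {gain.max():.2f} at f = {f[gain.argmax()]:.3f}")
# the theoretical peak frequency satisfies cos(2*pi*f0) = 1/3, i.e. f0 ~ 0.196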
Fig. 3.10 Phase factor (a) and time lag (b)
The width of the confidence band for the gain factor estimate is close to zero and cannot be shown in the figure. The phase factor is not monotonic (Fig. 3.10), but it corresponds to time lags which are always less than DT. This seems to be the result of the specific form of dependence between the input and output time series, which belong to one and the same scalar model AR(2). Usually, the coherence functions, coherent spectra, gain and phase factors, and the time lag do not behave in the same manner.

Concluding this example, we can state that

• even with a high cross-correlation coefficient between the components of a bivariate time series, the correlation/regression approach produces incorrect results, being incapable of revealing even a strictly linear dependence between time dependent random variables,
• such dependence is easily found through the approach based upon the theory of random processes.

This ends Example 3.2.

Example 3.3 Discovering Elusive Phenomena: Tides in Lake Michigan

The previous examples showed that the approach to analysis of dependences between time series based upon the theory of random processes is capable of finding and describing such phenomena (also called teleconnections). Our task here is to show how powerful and accurate the autoregressive approach can be when it is used for finding and studying connections between time series. This will be done by describing a phenomenon which plays an essentially negligible role in the energy balance of the entire process and which is concentrated within several very narrow frequency bands. The general process is water level variations in Lake Michigan, and the specific diminutive components of those variations to be studied here are the lunar and solar tides.
If water level variations in the Great Lakes contain a contribution from tides—the unique deterministic process in the Earth system—their influence should be reflected in relations between the time series of water levels in the lakes and the time series of water level observations obtained at stations in any oceanic area with noticeable tidal amplitudes. In particular, it should be seen in their spectra, in respective coherent spectra, and in the coherence functions between the time series of water levels in Lake Michigan and sea levels at any point where tides play a noticeable role. If the connection does exist, the AVESTA3 program should be able to provide information that would allow one to determine the amplitudes of tidal constituents in the lake's water level variations.

According to the NOAA (https://oceanservice.noaa.gov/facts/gltides.html), "The true tides – changes in water level caused by the gravitational forces of the sun and moon – do occur in the semi-diurnal (twice daily) pattern on the Great Lakes… [but] …the spring tide, the largest tides caused by the combined forces of the sun and moon, is less than five centimeters in height." As the total range of water level variations in the lake can be as large as several meters, "the Great Lakes are considered to be non-tidal".

Having in mind this tiny role the tides play in the lake, we need to accurately formulate the task of our analysis and the general method used for its solution. Thus, we want to

• determine with a high degree of reliability whether the water level variations in Lake Michigan contain a contribution from tides caused by the lunar and solar gravitational forces and, if the tides are found to be present in the lake,
• determine what tidal constituents exist in the lake's water level observations, and
• calculate, using the tools of multivariate (in this case, bivariate) time series analysis, the net contribution of tides to the lake's water level variations and, if possible, estimate the share of different tidal constituents found in the lake.

Obviously, this task is quite delicate because, according to the NOAA, the net tidal range amounts to just several centimeters while the range of water level changes is about two orders of magnitude higher. For example, the total range of water level elevations at the station #9087044 in Calumet (Lake Michigan) between January 1, 1967 and June 30, 2021 exceeds 6 m. The coordinates of the Calumet station are 41.730 N and 87.538 W.

Using the AVESTA3 program as a tool for analyzing relations between two scalar time series, we intend to solve the above-listed tasks. It will be shown in what follows that the time domain models of water level variations containing tidal components are very cumbersome. Therefore, we will concentrate our efforts upon the frequency domain characteristics (spectra, coherent spectra, coherence functions, and gain factors) and also upon the correlation and cross-correlation functions of respective bivariate time series. The data source is given at the web site shown in the footnote.¹
¹ https://tidesandcurrents.noaa.gov/waterlevels.html?id=9087044&units=metric&bdate=20200101&edate=20201231&timezone=LST/LDT&datum=IGLD&interval=h&action=data
The sampling interval of observations is one hour and the time series length N is 8784 h: from January 1, 2020 through December 31, 2020. This year-long time series consists of very accurate (up to 1 mm) hourly observations (DT = 1 h) and, hopefully, it will help us to obtain statistically reliable results. The time series of sea level at oceanic stations used here are obtained for both the Atlantic and Pacific Oceans, and the sea level variations at those sites at the time scales from hours to about one day are dominated by tides. These stations are

• Charleston, S. Carolina, USA (station #8665530, latitude 32.782 N, longitude 79.925 W),
• San Francisco, California, USA (station #9414290, latitude 37.807 N, longitude 122.465 W), and
• Seward, Alaska, USA (station #9455090, latitude 60.120 N, longitude 149.427 W).

The data source for sea level records is the website http://uhslc.soest.hawaii.edu/data/ belonging to the University of Hawaii. The tidal data for the oceanic stations can be found at the web site https://tidesandcurrents.noaa.gov/stations.html?type=Harmonic+Constituents. The true frequencies of the constituents used in what follows can be calculated from the information available at that site: a constituent's frequency is given by the ratio of its speed in degrees per hour to 360. For example, the frequency of the semidiurnal constituent M2 is 28.984104/360 = 0.0805114 cph or just 0.0805 cph; this frequency does not change in time or from station to station. (A short numerical illustration of this conversion is given after the CAT.DAT listing below.)

Firstly, we analyze the Calumet data with AVESTA1 as a scalar time series. The CAT.DAT file for it will be
8784    99     5001   0      0      1      1      1      0       0       0       0
N       M      NF     K      R      L      LS     DT     MFLT    KFLT    LFLT    ENDDATE
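As promised above, the speed-to-frequency conversion is trivial to script. In the sketch below, the speed of M2 is the value quoted in the text; the speeds of S2, K1, and O1 are the standard published values and should be treated as assumptions to be verified against the NOAA harmonic-constituent tables.

# frequencies of tidal constituents: speed (degrees per hour) / 360 = cycles per hour
speeds_deg_per_hour = {
    "M2": 28.984104,   # quoted in the text above
    "S2": 30.000000,   # standard published speeds; verify against the NOAA tables
    "K1": 15.041069,
    "O1": 13.943036,
}

for name, speed in speeds_deg_per_hour.items():
    print(f"{name}: {speed / 360.0:.4f} cph")
# M2: 0.0805, S2: 0.0833, K1: 0.0418, O1: 0.0387 -- the frequencies used below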
Note that the frequency resolution in what follows is set to NF = 5001 in order to deal with the results of analysis of tides whose frequencies are known precisely. The entire time series of water level at Calumet is shown in Fig. 3.11a. The parameter K is set to zero because the time scales of the linear trend greatly exceed the scales of the short-term tidal variations that we plan to study. The ENDDATE parameter is irrelevant here and is set to zero. The water levels in the lake are given in millimeters with respect to the lake’s elevation of 177 m. The correlation function of this time series (not shown) decreases from about 0.93 at k = 1 h to 0.56 at k = 199 h. It also shows weak oscillation features in the area of semidiurnal tides. The optimal autoregressive model for the Calumet time series is either AR(35) or AR(89). The figure below shows the spectrum corresponding to the AR(35) model. Both orders are too high for time domain analysis. As noted above, the longer lasting phenomena with time scales exceeding many days are orders of magnitude more powerful than what is occurring at higher frequencies; that part of the spectrum does not show any peaks (Fig. 3.11b).
Fig. 3.11 Hourly water level variations in Lake Michigan at Calumet, 01/01/2020 through 12/31/2020 (a) and its autoregressive spectral estimate with 90% confidence levels (b)
At the same time, the spectrum contains a number of statistically significant peaks caused by some higher frequency phenomena. The largest such peak occurs at about the frequency of semidiurnal tides (0.08 cph), and then there are several smaller peaks at higher frequencies related to other phenomena that are of no interest to us (for a detailed analysis of the spectrum of the Great Lakes water levels see the fundamental work by Mortimer and Fee 1976).

The tides present a deterministic process and they should exist in all sufficiently large water basins. Being caused by the same external forces, they must be closely related to each other irrespective of the distance between the observation points. Therefore, we will be looking for tides in the Calumet time series through the standard observations at some randomly selected sea level stations. For the sake of brevity, we will discuss in detail only two cases: the connections of water levels at Calumet with sea level variations at Charleston and at San Francisco. As the low-frequency variations are of no interest to us here, the frequency domain characteristics will be shown in what follows for the frequency band from 0.03 cph to 0.3 cph or even to 0.1 cph. The task of determining the tidal components in Lake Michigan by using the time series recorded in Seward is left to the reader as an additional problem.

An example of the time series is shown in Fig. 3.12, where the time series of water levels and the autoregressive spectral estimates for stations Calumet and Charleston calculated with AVESTA1 are shown simultaneously. Neither time series is Gaussian. The Calumet spectrum and the Charleston spectrum, shown with solid black and gray lines, present the spectral estimates obtained for the two scalar time series. The scalar AR models for them can be selected in two ways each: as mentioned before, AR(35) or AR(89) for Calumet and AR(40) or AR(99) for Charleston. In all these cases, the statistical reliability of spectral estimates is high (the ratios N/p are large).
Fig. 3.12 Water levels in Lake Michigan at Calumet (black) and at Charleston (gray) in 2020 (a) with respective spectral estimates (b)
As seen from the figure, the diurnal tides (at approximately 0.04 cph) do not seem to be present at Calumet, while the semidiurnal tides (at about 0.08 cph), even if they exist, are weak. The change from the AR(35) model to AR(89) barely affects the spectrum. The Charleston models AR(40) and AR(99) have practically the same spectra, both showing the strong diurnal tidal constituent K1 (at 0.0419 cph, with the true frequency 0.0418 cph) and the semidiurnal constituent M2 at f = 0.0805 cph (which coincides with the true frequency of this constituent).

Our task is to find tidal constituents in the Calumet time series and estimate their amplitudes. It will be done in three stages:

1. create the bivariate time series Calumet/oceanic stations,
2. determine whether they contain contributions from tides,
3. use the results of analyses that describe the dependence between the components at tidal frequencies and transform the oceanic tides into tides at the Calumet station.

As the tidal components of water level variations in Lake Michigan and at Charleston are generated by the same forcing, which is external to both processes, the time series should be related to each other at respective frequencies. These dependences should be found with AVESTA3. The CAT.DAT file for it should be
8784    50     5001   0      0      1      1      1      0       0       0       0
N       M      NF     K      R      L      LS     DT     MFLT    KFLT    LFLT    ENDDATE
In this case, the responses to requests from the program can be CALUM&CHARL.RES, 1, CALUM, and CHARL (the names of respective data files). For the Calumet/San Francisco analysis, the term CHARL should be changed to SANFR.
Further analysis will be done for both bivariate systems: Calumet with Charleston and Calumet with San Francisco. It should be noted that ascribing the "output" and "input" labels to the time series is formal in this situation because both of them are affected simultaneously by the same external gravity forces. On the other hand, we intend to get quantitative estimates of the constituents' amplitudes, unknown to us for Calumet and well known for the oceanic stations. The knowledge of an oceanic constituent's amplitude will be the basis for estimating the amplitude of the same constituent at Calumet.

Stage 1. Continuing with the analysis of results given by AVESTA3 in the file CALUM&CHARL.RES for the first bivariate time series, we will see that the traditional approach through the cross-correlation coefficient shows the lack of dependence between the time series because the cross-correlation coefficient equals 0.09. The cross-correlation remains below 0.11 even if the time series are shifted against each other by several hours (as is done sometimes when the traditional correlation/regression approach is used). If we judged the result by these facts, we would have to conclude that the assumption of any linear dependence between these time series is not supported with observations and that we cannot reach our goal with the data available to us. Such a conclusion would be incorrect.

The program AVESTA3 provides estimates of the correlation and cross-correlation functions; these time-dependent functions do not exist for random variables because random variables do not depend upon time. According to Fig. 3.13a, the correlation function of water level variations at Calumet (the black line) is almost monotonic and diminishes rather slowly, while the correlation function of the sea level time series at Charleston (the gray line) is completely dominated by semidiurnal tides. Both features should have been expected. The unexpected result is the behavior of the cross-correlation function (Fig. 3.13b): its absolute values are so small that one could say that the time series are not correlated to each other if it were not for the shape of the cross-correlation
Fig. 3.13 Correlation (a) and cross-correlation (b) functions for time series of water levels at Calumet and Charleston, 2020
function, which undeniably shows that the time series are mutually dependent. In spite of the very low absolute values of the cross-correlation function, the connection is obvious because it presents an almost periodical function of the lag. In other words, the cross-correlation function definitely shows that water level variations in Lake Michigan contain a seemingly periodic component and that its period amounts to about 12 h. It should be stressed that this result has been obtained for a case when the role of tides is insignificant, to say the least. Moreover, it shows that the standard cross-correlation analysis can bring interesting and unexpected results if the time series are treated as functions of time rather than as time-invariant random variables. A very similar picture of the correlation functions is obtained with AVESTA3 for the bivariate time series at Calumet and San Francisco. Thus, we seem to have just proved an already known fact—the presence of the semidiurnal tide in both time series; yet, we are trying to verify the above-given NOAA statement both qualitatively—what tidal harmonics exist in water level variations in Lake Michigan—and quantitatively—what their amplitudes are. The optimal autoregressive models for the system are AR(40) for the Calumet and Charleston data and AR(50) for the Calumet and San Francisco data. The time domain analysis given after the correlation functions is not possible for these models, and we will switch to the frequency domain results.

Stage 2. The spectra of water level variations at Calumet and Charleston regarded as components of a bivariate time series are shown in Fig. 3.14. The changes that have occurred with respect to the estimate obtained for the spectrum of the scalar Calumet time series (Fig. 3.12b) include a much sharper peak at the frequency of the semidiurnal tide. There is also a very tiny 'something' close to 0.04 cph, which is statistically insignificant but still deserves attention because it happens at the frequency 0.0409 cph, which differs by about 2% from the frequency of the diurnal tidal constituent K1. The spectrum of the Charleston time series shows two sharp and statistically significant peaks: at f = 0.0409 cph (the diurnal constituent K1) and at 0.0805 cph (the semidiurnal constituent M2).

The results for the bivariate time series Calumet with San Francisco (Fig. 3.14b) are more complicated. In this case, we have four tidal harmonics in the spectrum of the San Francisco data: at frequencies 0.0388 cph (very close to O1), 0.0418 cph (K1), 0.0804 cph (very close to M2), and 0.0828 cph (very close to S2). Moreover, four peaks at exactly the same frequencies (though shockingly small at the frequencies of the diurnal harmonics) are also seen in the spectrum of the Calumet time series. Thus, the qualitative task seems to have been solved: water level variations in Lake Michigan do contain a contribution from tides, specifically, from the semidiurnal constituents S2 and M2 and possibly from the diurnal constituents O1 and K1. Note that the tiny peaks at some frequencies must be taken into account because here we are dealing with a deterministic process: variations of water level caused by the gravitational forces.

Stage 3. Our final task is to estimate the amplitudes of those constituents in Lake Michigan. It should be done through analysis of the quantitative frequency domain characteristics that describe the linear connection between the time series. First of
Fig. 3.14 Spectra of water levels at Calumet (black), Charleston and San Francisco (gray) in bivariate AR models Calumet/Charleston (a) and Calumet/San Francisco (b)
all, we need to know how closely the Calumet time series is related to the two oceanic time series at the tidal frequencies. In the bivariate case, it means that we need to have estimates of the squared coherence functions (frequency dependent squared cross-correlations between time series). The AVESTA3 program provides estimates of the coherence function squared; their values define the percentage of the spectrum of the output time series generated by its linear dependence upon the input. The squared coherence function between the time series at Calumet and Charleston is shown in Fig. 3.15a. The values lying above the dashed line are statistically significant at the confidence level 0.90. As seen from the figure, the function contains two significant peaks; they occur at 0.0409 cph and at 0.0805 cph. The first frequency differs from the frequency of the tidal constituent K1 by about 2%, and this deviation has probably been caused by the sampling variability of estimates. The part of the Calumet spectrum linearly related to the tidal sea level variations at Charleston, that is, the coherent spectrum, is shown in Fig. 3.15b. At the diurnal frequency, the share of the tide in water level variations in Lake Michigan amounts to 25%, while at f = 0.0805 cph the coherence function squared amounts to 0.988, meaning that the tide's share constitutes 98.8% of the total variability at that frequency in the spectrum of water level variations at Calumet (the gray line in Fig. 3.15b).

The final step in the analysis of this model is to determine the amplitudes of the tidal components. AVESTA3 provides estimates of the frequency response function that connects the Calumet and Charleston time series. The absolute value of this function is called the gain factor. The gain factor relating variations of the two time series at a specific frequency f is defined as the coherence function multiplied by the square root of the ratio of the output spectrum to the input spectrum. The amplitude of variations at the frequency f in the output series (Calumet in this case) is found as the
Fig. 3.15 Coherence function squared between water level variations at Calumet and Charleston (a) and coherent spectrum of water level variations at Calumet (b); the gray line is the Calumet water level spectrum
product of the amplitude at the frequency f in the input series (Charleston) and the gain factor.

Before continuing the analysis, it should be noted that in the bivariate case, the reliability criterion (it can also be called the number of equivalent degrees of freedom) is N/4p, and it amounts to 54 and 43 for the AR(40) and AR(50) models. This is not bad, but, in addition, our time series are more or less closely related to each other only within several narrow frequency bands around the tidal frequencies. This is why the figures below that show the gain factors look rather exotic. But they allow one to understand that the results for the diurnal and semidiurnal tides are not equally reliable: the lower frequency part of the gain factor is less trustworthy. The distance between the confidence limits is wider or narrower depending upon the squared coherence function. When it is close to unity (the semidiurnal tides), the estimates are quite reliable.

In the case of the Calumet/Charleston time series, the amplitude of the diurnal constituent K1 at Charleston is 0.105 m (http://tidesandcurrents.noaa.gov/harcon.html?id=8665530) while the value of the gain factor at f = 0.0418 cph is 0.0230 (see your printout by AVESTA3 and Fig. 3.16a). Therefore, the amplitude of the K1 harmonic at Calumet is 0.0024 m and its full range is 0.0048 m. As seen from the printout and the figure, the gain factor estimate is not very reliable because it has a rather wide 90% confidence interval: from 0.0064 to 0.040. The semidiurnal tide is much stronger at Charleston, and the coherence function is very close to one. The semidiurnal harmonic M2 has an amplitude of 0.767 m at Charleston and a very reliable gain factor estimate at f = 0.0805 cph, also equal to 0.0230 (see the printout and Fig. 3.16b); the resulting estimates of its amplitude and range at Calumet are 0.018 m and 0.036 m. Thus, the maximum (spring) tide in Lake Michigan, defined as the sum of the diurnal and semidiurnal ranges, can be as high as
Fig. 3.16 Calumet/Charleston gain factor estimates at frequencies of diurnal (a) and semidiurnal (b) constituents
0.041 m, which is not too far from the NOAA estimate of 5 cm for the tides in Lake Michigan. The situation with the bivariate time series of water levels at Calumet and San Francisco is more interesting (Fig. 3.17); obviously, in this case we have the same four constituents as in Fig. 3.14, but now we can say that the relative contributions of constituents O1 and S2 are smaller than contributions from K1 and M2 and that the coherence function is much higher at the semidiurnal frequencies. Thus, we know now that water level variations in Lake Michigan contain contributions from four tidal constituents and we can calculate their amplitudes in the same manner as with the Calumet/Charleston time series.
Fig. 3.17 Coherence function squared between water level variations in Calumet and San Francisco (a) and coherent spectrum of water level variations in Calumet (b). The gray line is the Calumet water level spectrum
Fig. 3.18 Calumet/San Francisco gain factor estimates at frequencies of diurnal (a) and semidiurnal (b) constituents
The amplitudes of the diurnal constituents K1 and O1 at San Francisco are 0.37 and 0.23 m, while the gain factor values at frequencies 0.0418 cph and 0.0388 cph are 0.00652 and 0.01173 (see the printout and Fig. 3.18a); therefore, the amplitudes of K1 and O1 in Lake Michigan at the Calumet station amount to 0.0024 and 0.0027 m. The semidiurnal harmonics S2 and M2, with their amplitudes of 0.137 and 0.576 m at San Francisco and the gain factor values 0.05934 and 0.02909 (see the printout and Fig. 3.18b), have the relatively large amplitudes of 0.0081 and 0.0167 m. The total range of water level variations in the lake that can be created by these four constituents reaches 0.06 m and, according to the figure, the estimates for the semidiurnal constituents are more reliable due to the higher values of the coherence function. The results obtained for these two bivariate time series are summarized in Table 3.1 below.

According to the table, the range of tides at Lake Michigan amounts to 4 cm or 6 cm when calculated through the tidal constituents at Charleston or San Francisco, respectively. This example shows that the autoregressive approach to analysis of bivariate time series presents a powerful tool that allows one to get characteristics of a signal even when the signal is very weak and very complicated. Obtaining such results through filtering or on the basis of the traditional approach through regression equations and cross-correlation coefficients is absolutely impossible. Our results confirm the general NOAA estimate of the role of tides in the Great Lakes and, at the same time, demonstrate the ability to determine amplitudes of very weak individual tidal constituents (in this case, we proved the presence of diurnal constituents with a range of 1 cm), thus improving our knowledge of tides in the Great Lakes. It should be stressed again that this could have been done only due to the accuracy of water level measurements.
Table 3.1 Estimated range of tidal constituents at Calumet station (Lake Michigan)

Constituent   Frequency (cph)   Amplitude (m)   Gain factor   Range (m)

Charleston (# 8665530), optimal model AR(40)
K1            0.0418            0.105           0.0230        0.0048
M2            0.0805            0.767           0.0230        0.0353

San Francisco (# 9414290), optimal model AR(50)
K1            0.0418            0.370           0.00652       0.0048
O1            0.0387            0.230           0.01173       0.0054
S2            0.0833            0.137           0.05934       0.0162
M2            0.0805            0.576           0.02909       0.0336
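The arithmetic behind the Range column of Table 3.1 is simple enough to script; the following sketch (an illustration in Python, with the numbers taken directly from the table) multiplies each oceanic amplitude by the corresponding gain factor and doubles the product to obtain the range at Calumet:

# (constituent, oceanic amplitude in m, gain factor) as listed in Table 3.1
charleston = [("K1", 0.105, 0.0230), ("M2", 0.767, 0.0230)]
san_francisco = [("K1", 0.370, 0.00652), ("O1", 0.230, 0.01173),
                 ("S2", 0.137, 0.05934), ("M2", 0.576, 0.02909)]

for station, rows in (("Charleston", charleston), ("San Francisco", san_francisco)):
    total = 0.0
    for name, amplitude, gain in rows:
        calumet_range = 2.0 * amplitude * gain   # amplitude at Calumet, doubled
        total += calumet_range
        print(f"{station} {name}: range at Calumet = {calumet_range:.4f} m")
    print(f"{station}: total tidal range at Calumet ~ {total:.3f} m")
# totals: ~0.04 m via Charleston and ~0.06 m via San Francisco, as stated in the text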
These new results may not be important from a practical point of view, but the main goal of this book is methodical: to show the reader how efficient the bivariate autoregressive analysis can be. This ends Example 3.3 and Sect. 3.3.
3.4 Teleconnection Between Global Temperature and ENSO

The goal of this section is to show in detail how to detect and estimate relations between two time series, that is, how the AVESTA3 program should be used for finding and studying what is called teleconnections in natural sciences. A similar example on this subject was given in the previous section, but there we could not consider the task of time domain analysis because the orders of the optimal autoregressive models were too high.

Example 3.4 Global Temperature and ENSO. Time Domain Analysis

In this case, the differences from the previous example include the time scales of the phenomena that are being studied and the amount of information (151 annual observations against 8784 hourly observations). The initial data here include two important time series: the annual surface temperature averaged over the globe, HadCRUT5, and the annual sea surface temperature NINO3.4 within the El Niño—Southern Oscillation (ENSO) region. The choice of the annual sampling interval is logical because we are interested now in climatic variability, and the time scales smaller than one year can hardly be regarded as climate. Actually, the teleconnection between the previous version of the annual global temperature, HadCRUT4, and the ENSO component NINO3.4 was detected in Privalsky (2021), but here we will pay more attention to the global temperature time series and ENSO within both time and frequency domains, including the feedback between these two well-known time series.

Initially, the analyses within this example will be conducted with the time series of global temperature that contains a linear trend. (The trend in the NINO3.4 data is
statistically insignificant.) The reason why the trend in the global temperature should stay as it has been observed is simple: if we are looking for interdependence between two time series, we need to remember that the processes that may indeed be mutually related respond to each other in accordance with what they are. In other words, interactions between time series should be studied using the observed data, while the possible presence of any external influences, such as anthropogenic effects or whatever else, cannot generally be treated as external to the time series. The observed behavior of the output time series describes its response to the input irrespective of the output's composition, and the same is true for the dependence of the output process upon the input.

The initial data for this example are taken from the website of the University of East Anglia https://crudata.uea.ac.uk/cru/data/temperature and the KNMI website https://climexp.knmi.nl/start.cgi. The first stage should be a verification of the initial data acceptability for this type of research, which requires analysis of the initial time series of the annual global temperature HadCRUT5 and the annual sea surface temperature time series NINO3.4; the latter time series is available from 1870, so the common interval for them is 1870–2020, meaning that N = 151. The test should be done using the AVESTA1 program with the respective values of DT. For the initial monthly data from January 1870 through December 2020, CAT.DAT should be
1812   15   501   0   0   12   1   0.083333333   0      0      0      0
N      M    NF    K   R   L    LS  DT            MFLT   KFLT   LFLT   ENDDATE
The time series of the global temperature HadCRUT5 and sea surface temperature NINO3.4 are shown in Fig. 3.19. The time series HadCRUT5 contains a definite positive trend while NINO3.4 does not contain any visible evidence of nonstationarity (and, indeed, it will be found
Fig. 3.19 Anomalies of annual global temperature (a) and ENSO’s sea surface temperature (b), 1870–2020
stationary if we decide to test this property). The analysis of HadCRUT5 with AVESTA1 shows that the best autoregressive model for it is AR(4). The stationarity of the HadCRUT5 scalar time series can be verified by making sure that the roots of the polynomial with coefficients [−1 0.6326 −0.0279 0.1308 0.2321] lie inside the unit circle, where the sequence [0.6326 −0.0279 0.1308 0.2321] consists of the coefficients of the time series' AR(4) model. In this case, one of the roots equals 0.98, but it still means that the time series is stationary. (This operation is not performed by AVESTA1 or AVESTA3; a minimal sketch of such a check is given below.) Thus, the initial data and the results of their analysis as scalar time series seem to be quite regular, with no unusual behavior.
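For readers who want to repeat this check outside AVESTA, here is a minimal sketch in Python (the variable names are illustrative):

import numpy as np

# Stationarity check for the AR(4) model of HadCRUT5: the roots of the
# polynomial with coefficients [-1, 0.6326, -0.0279, 0.1308, 0.2321]
# must lie inside the unit circle.
phi = [0.6326, -0.0279, 0.1308, 0.2321]    # estimated AR(4) coefficients
roots = np.roots([-1.0] + phi)             # characteristic roots
print(np.abs(roots))                       # one modulus is close to 0.98
print(bool(np.all(np.abs(roots) < 1.0)))   # True => the series is stationary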
The CAT.DAT file for the bivariate annual data will be

151   3   501   0   0   1   1   1   0      0      0      0
N     M   NF    K   R   L   LS  DT  MFLT   KFLT   LFLT   ENDDATE
The maximum AR order M is set to 3 because at higher orders the number of autoregressive coefficients that have to be estimated is too high for the time series length N = 151. When M = 3, there will be D²p = 12 autoregressive coefficients to estimate (D = 2 is the dimension of the time series and p the order), which satisfies the condition D²p ≤ N/10.

The components of the bivariate time series HadCRUT5/NINO3.4 are rather short, and one may suspect that if the length were longer, the optimal order could have been higher. To find out whether such a situation may be realistic, one may set the maximum order of autoregression higher than recommended just above, for example, M = 5. If the best order exceeds 3, the time domain results will still be available for all orders in spite of their poor reliability, and the program will tell you what the optimal order would be if the time series were longer. In this case, all order selection criteria indicate the same model AR(2) even if M = 5.

The printout shows that the time series HadCRUT5 is not Gaussian (though its PDF is not very different from Gaussian) while NINO3.4 can be regarded as Gaussian: the absolute values of its standardized skewness and kurtosis stay below 2 (0.97 and −0.86).

The estimates of the correlation and cross-correlation functions of the global surface temperature HadCRUT5 and the ENSO's oceanic component NINO3.4 are shown in Fig. 3.20. This information is useful for getting a general idea of the system's complexity. As seen from the figure, the correlation function of the global temperature stays relatively high for several lags while the NINO3.4 correlation function diminishes very quickly and then stays close to zero. This means that the time series HadCRUT5 is generated by a relatively slow process while NINO3.4 is probably close to white noise. Besides, the correlation function of the HadCRUT5 time series decreases slowly due to the presence of a significant linear trend.

The maximum absolute value of the cross-correlation function barely exceeds 0.24, so that a conclusion about the lack of a significant linear dependence between the global temperature HadCRUT5 and the ENSO's oceanic component NINO3.4, if made on the basis of the cross-correlation coefficient, would look quite reasonable. However, it will be found starkly wrong in this case and, in particular, this is why the
Fig. 3.20 Correlation (a) and cross-correlation (b) functions of time series HadCRUT5 and NINO3.4
correlation functions should be used, first of all, for getting a first and not necessarily correct impression of the system's structure.

Continuing with the time domain analysis, we look for the text "CHOSEN" in the output file (e.g., HC5&N34.RES) and find that all five order selection criteria prefer the AR(2) model. Going to the text "AR MODEL OF ORDER 2", we have

X1(T) =  0.63754 X1(T−1) + 0.06606 X2(T−1) + 0.37476 X1(T−2) − 0.11386 X2(T−2)
        (0.08169)          (0.01745)          (0.08192)          (0.01516)

X2(T) = −1.03266 X1(T−1) + 0.41423 X2(T−1) + 1.18720 X1(T−2) − 0.33070 X2(T−2)
        (0.43324)          (0.09255)          (0.43443)          (0.08039)
where the numbers in parentheses are the root mean square errors of the coefficient estimates (in the AVESTA3 printout, the first numerical column contains the estimated AR coefficients and the second one their RMS errors). All coefficients are statistically different from zero even at the confidence level of 95% (meaning that the absolute values of the estimated coefficients exceed the doubled RMS of the estimate). This high reliability is another positive feature of the AR(2) model obtained for this short time series.
Leaving aside the RMS errors and including the innovation sequence at = [a1,t, a2,t]′, the bivariate time series xt = [x1,t, x2,t]′ can be expressed with the following stochastic difference equations:

x1,t ≈ 0.64x1,t−1 + 0.07x2,t−1 + 0.37x1,t−2 − 0.11x2,t−2 + a1,t
x2,t ≈ −1.03x1,t−1 + 0.41x2,t−1 + 1.19x1,t−2 − 0.33x2,t−2 + a2,t

The prime denotes vector transposition (switching from rows to columns or from columns to rows). Here, the quantities a1,t and a2,t are the scalar innovation sequences; their variances and roles will be discussed below.

As follows from the first equation, the global temperature HadCRUT5 depends upon two of its past values and upon two previous values of NINO3.4, though with smaller absolute values of AR coefficients. A similar statement is true for NINO3.4: it depends upon its own two previous values and is also affected by two previous values of HadCRUT5. All these relations are statistically significant. This situation means that the linear stochastic system HadCRUT5/NINO3.4 contains a closed feedback loop: the processes affect each other.

The next step in the time domain analysis is a comparison between the statistical predictability of HadCRUT5 as a scalar process and as the output of the HadCRUT5/NINO3.4 linear stochastic system. The result for the scalar time series of annual HadCRUT5 values was already obtained earlier when calculating the properties of HadCRUT5 as a scalar time series: the variance of the innovation sequence a1,t for the AR(4) model is 0.0164 (it is given in the file HC5.RES). In the bivariate case, it is the first element of the innovation covariance matrix and, for this bivariate time series, it is 0.00992. This information is given for each bivariate model after the text "ESTIMATE OF INNOVATION COVARIANCE MATRIX…". The first and third lines there contain the elements of the innovation sequence covariance matrix while the second and fourth lines show the RMS errors of those estimates.

The variance of the HadCRUT5 time series is 0.135. Therefore, the relative predictability criterion for HadCRUT5 as a component of the bivariate time series, which is defined as the ratio of the respective RMS values, is √(0.00992/0.135) ≈ 0.27. This is the relative error of extrapolation of this time series within the framework of the Kolmogorov-Wiener theory at the lead time of one year. In the scalar case, this criterion is √(0.0164/0.135) ≈ 0.35, which is about 30% worse than in the bivariate case. In other words, the quality of predicting the annual global temperature at a one-year lead time as the output of a bivariate linear stochastic system is better than its prediction as a scalar time series by about 30%. Another measure, the predictability quality criterion, which actually presents the correlation coefficient between the unknown true and the predicted values at the unit lead time (1 year), is equal in this case to 0.963, a rather high value. In the scalar case, for the time interval from 1870 through 2020, it is 0.879.
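The criteria are easy to compute from the printed variances. A minimal sketch, assuming the relation PQC = √(1 − RPC²), which reproduces the bivariate value 0.963 quoted above:

import numpy as np

# Relative predictability criterion (RPC) from the innovation variance and
# the time series variance quoted in the text.
rpc_bivariate = np.sqrt(0.00992 / 0.135)          # ~0.27, bivariate case
rpc_scalar = np.sqrt(0.0164 / 0.135)              # ~0.35, scalar case
pqc_bivariate = np.sqrt(1.0 - rpc_bivariate**2)   # ~0.963
print(rpc_bivariate, rpc_scalar, pqc_bivariate)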
It seems probable that the bivariate approach would improve the statistical predictability of the annual global surface temperature represented with the time series HadCRUT5 at lead times up to a few years. Thus, a shift to the bivariate approach brings a positive result, but it would hardly improve the predictability in a cardinal manner.

The predictability properties of the time series NINO3.4 as a component of the respective bivariate time series can be estimated by switching the input and output processes in the system. Besides, the statistical predictability of NINO3.4 can be seen from the file HC5&N34.RES through comparing the variance or RMS value of NINO3.4 with the variance or RMS value of the white noise a2,t (0.279 in line 3 after the text "ESTIMATE OF INNOVATION COVARIANCE…" for the AR(2) model). The predictability criteria RPC (relative predictability criterion) and PQC (prediction quality criterion) are 0.922 and 0.386 against 0.939 and 0.345 in the scalar case. In other words, the optimal AR(2) model for the bivariate time series shows that the component NINO3.4 is almost unpredictable and quite close to white noise.

The forecasting of bivariate time series is done similarly to the scalar case but using a bivariate equation of the type shown above for the optimal AR model of the system HadCRUT5/NINO3.4. With the exception of getting estimates of the forecast error variance at the unit lead time and the predictability criteria, the task of multivariate forecasting is not discussed in this book. A more detailed mathematical analysis is given in Reinsel (2003, Chap. 2) and in Box et al. (2015, Chap. 14).

The knowledge of prediction error variances in the bivariate and scalar cases allows one to determine the Granger predictability and feedback criteria. In both cases, that is, whether HadCRUT5 and NINO3.4 are regarded as the output and input or vice versa, the Granger predictability and feedback criteria are small. Respective equations are given in Granger and Hatanaka (1964) and reproduced in Privalsky (2021, Sect. 7.2).

Another characteristic of bivariate systems is the cross-correlation coefficient between the innovation sequences of the output and input time series. It is also given in the AVESTA3 printout. A high correlation between the innovation sequences may lead to a high cross-correlation between the time series; then, the deterministic mutual connection between them, expressed by the terms containing contributions from the past input and output values, becomes less significant. This is an important point which also affects the multivariate statistical predictability.

Summing up this example, one may say that

• the cross-correlation between the global temperature HadCRUT5 and ENSO's oceanic component NINO3.4 is low,
• the linear stochastic system HadCRUT5/NINO3.4 is best described with a bivariate autoregressive model AR(2),
• the model contains a closed feedback loop (mutual dependence between HadCRUT5 and NINO3.4),
• the statistical predictability of the annual global temperature as a component of the bivariate time series HadCRUT5/NINO3.4 is higher than in the scalar case though the improvement is not large; the predictability of NINO3.4 is low in both the scalar and bivariate cases,
• the Granger criteria of causality and feedback strength indicate a weak dependence between the global temperature and ENSO's oceanic component.

The latter issue will be discussed below. This ends Example 3.4.

Example 3.5 Global Temperature and ENSO. Frequency Domain Analysis

The time domain analysis of the bivariate time series HadCRUT5/NINO3.4 given in the previous example showed that the time series are mutually dependent but the degree of interdependence expressed through the cross-correlation function and other characteristics does not seem to be strong. In this example, we will study the relation between these time series expressed with functions of frequency.

The frequency domain analysis describes the behavior of the time series at the input and output of a linear stochastic system and provides quantitative information about their interaction with several functions of frequency. The results of such analysis are supposed to help the researcher understand how the energy transfer and/or exchange occurs at the time scales where the relations between the time series become especially strong.

The frequency dependent functions are given by AVESTA3 for the model selected by each order selection criterion, if it does not coincide with a previously selected one. For example, if three criteria select order p = 3 and two criteria select order p = 2, the frequency dependent information will be given for the models AR(3) and AR(2). In the current case, the best model is AR(2).

The list of functions required for frequency domain analysis and provided by the AVESTA3 program has been given earlier in Sect. 3.2; in all cases the first frequency domain characteristics are the spectral densities s11(f) and s22(f) of the output HadCRUT5 and the input NINO3.4 (Fig. 3.21). The spectral densities describe the distribution of time series energy at frequencies from f = 0 through the Nyquist frequency fN = 1/(2DT).
Fig. 3.21 Bivariate spectral estimates s11 (f ) and s22 (f ) of annual global temperature HadCRUT5 (a) and ENSO’s oceanic component NINO3.4 (b) shown in the same scale
The estimated values of spectral density at very low frequencies are relatively unreliable because the respective time scales may be comparable to the time series length or even exceed it. Improving their quality is possible only by increasing the length N, which, of course, is not possible for any given time series. A higher frequency resolution through increasing the parameter NF cannot do it. These comments remain true for all other functions of frequency.

One should also remember that the spectral estimates obtained for the same time series may differ between the scalar and bivariate versions. This is normal because in the latter case the time series become elements of a bivariate system, and changes in their spectral estimates reflect possible interaction between the input and output time series. This statement is true for any multivariate case. An important role in this respect belongs to the optimal orders indicated by order selection criteria: they are usually higher in the scalar case, so that the spectra estimated for a multivariate system usually contain fewer features.

The estimated spectrum s11(f) of the global surface temperature represented with the HadCRUT5 time series x1,t covers several orders of magnitude and shows a general dominance of low-frequency variations whose time scales exceed 5–10 years (Fig. 3.21a). In contrast to this, the ENSO spectral density represented with the sea surface temperature NINO3.4 as a bivariate time series component is less variable (one order of magnitude) and contains a maximum at approximately 0.25 cpy; yet, the confidence limits for the spectrum of NINO3.4 allow one to draw a monotonic line within them (Fig. 3.21b). Thus, the time series belonging to the stochastic system HadCRUT5/NINO3.4 have noticeably different properties: the spectrum of the output HadCRUT5 is controlled by long-term variations while the spectrum of the input process NINO3.4 contains a smooth maximum at time scales of 3–5 years. Its statistical significance is arguable and the spectrum of NINO3.4 is close to a constant. The differences between the spectra of HadCRUT5 and NINO3.4 are clearly seen in Fig. 3.22, where the spectra are shown on logarithmic scales along both axes. This behavior of the NINO3.4 spectrum is the reason for the low predictability of NINO3.4 mentioned in the previous example.

The degree of interdependence between the global surface temperature HadCRUT5 and the oceanic surface temperature NINO3.4 is described with the squared coherence function γ12²(f), which, as noticed before, can be regarded as the squared frequency-dependent correlation coefficient between two time series. In particular, the squared coherence shown in Fig. 3.23a plays in the frequency domain the same role as the squared cross-correlation coefficient plays in determining the share of a random variable's variance explained by its linear dependence upon another random variable.

As seen from Fig. 3.23a, the squared coherence function between the time series HadCRUT5 and NINO3.4 exceeds the upper 90% confidence limit for the true zero coherence and thus should be regarded as statistically significant within the frequency band from 0.1 cpy to 0.4 cpy. Its maximum values above 0.6 belong to the frequency band between 0.2 cpy and 0.3 cpy; its highest value γ12²(f) ≈ 0.680 is achieved at f = 0.25 cpy. The gray horizontal lines define the upper 90% confidence limit for coherence estimates when the true coherence equals zero. Taking this into account, the
Fig. 3.22 Bivariate spectral estimates s11 (f ) and s22 (f ) of annual global temperature HadCRUT5 (a) and ENSO’s oceanic component NINO3.4 (b) on identical logarithmic scales
Fig. 3.23 Coherence function between HadCRUT5 and NINO3.4 and coherent spectrum s11.2 (f )
squared coherence should be regarded as statistically different from zero at frequencies between 0.1 cpy and 0.4 cpy.

According to this figure, the input process NINO3.4 generates from 25% to almost 70% of the global temperature spectrum within the frequency band between 0.1 cpy and 0.4 cpy. This contribution is described with the coherent spectrum s11.2(f) = γ12²(f)s11(f). In this time series, the coherence and the coherent spectrum increase to high values at very low frequencies due to the presence of the trend. The coherent spectrum s11.2(f), shown in Fig. 3.23b with a black line, is quite close to the spectrum s11(f) of HadCRUT5 at frequencies between 0.15 cpy and 0.35 cpy and much smaller than s11(f) at low frequencies, where it can hardly be regarded as reliable.
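The relation s11.2(f) = γ12²(f)s11(f) is easy to reproduce with standard tools. Below is a minimal sketch using Welch-type estimates; it is not the autoregressive algorithm used by AVESTA3, and the names and segment length are illustrative:

import numpy as np
from scipy import signal

# Squared coherence and coherent spectrum of an output series x1 and an
# input series x2 sampled at interval dt (in years here, so f is in cpy).
def coherent_spectrum(x1, x2, dt=1.0, nperseg=64):
    fs = 1.0 / dt
    f, s11 = signal.welch(x1, fs=fs, nperseg=nperseg)           # output spectrum
    _, coh2 = signal.coherence(x1, x2, fs=fs, nperseg=nperseg)  # squared coherence
    return f, s11, coh2, coh2 * s11                             # s11.2(f)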
Fig. 3.24 Coherence function between HadCRUT5 and NINO3.4 and coherent spectrum s11.2 (f ) with a logarithmic scale of the frequency axis
The contribution of NINO3.4 to HadCRUT5 and vice versa becomes statistically insignificant within the frequency band from roughly 0.1 cpy through 0.01 cpy, that is, at time scales from 10 to 100 years. At lower frequencies, it increases due to the presence of the trend (the gray line in Fig. 3.24a shows the coherence for the detrended HadCRUT5). In other words, the spectrum of global temperature within the frequency band from 0.1 cpy to 0.01 cpy is not affected by the ENSO. The contribution of ENSO's oceanic component to the annual global temperature is highest at the frequencies where the spectrum s11(f) is small, that is, between approximately 0.1 cpy and 0.4 cpy. This is clearly seen in Fig. 3.24, where the frequency axis scale is logarithmic. At frequencies below 0.007 cpy, the coherence quickly increases to statistically significant values, but the respective time scales exceed 140 years. Such time scales are longer than the time series length and their estimates cannot be trusted.

This minor effect of ENSO upon HadCRUT5 is caused by two factors: the inability of NINO3.4 to affect the global temperature at frequencies where the coherence function is insignificant and the presence of a strong linear trend in the HadCRUT5 time series. The first factor is unavoidable because the AR models are linear, while the presence of a trend cannot play a key role in processes that happen at higher frequencies. This issue will be discussed later.

Continuing with the frequency domain analysis, we also estimate other quantitative indicators of the global temperature response to changes of oceanic temperature within the ENSO's area. The response is described with the frequency response function, which consists of two components: the gain and phase factors (Fig. 3.25). According to the gain factor behavior, if the sea surface temperature within the ENSO's area NINO3.4 at frequencies from 0.1 cpy to 0.4 cpy becomes cooler or warmer by 1 °C, the global temperature variations within the same frequency band will become cooler or warmer by approximately 0.12 °C.
Fig. 3.25 Response of global temperature to ENSO’s oceanic component: gain factor (a) and phase factor (b)
The phase factor ϕ12(f) shown in Fig. 3.25b is positive, which means that changes of the global temperature lag behind changes of the NINO3.4 temperature; the almost linear increase of the phase factor with frequency shows a change from 0.44 rad to 1.20 rad. It can be transformed into a time lag τ12(f) using the relation τ12(f) = ϕ12(f)/2πf at frequencies that are not close to zero. In this case, the time lag corresponding to the frequency band where the coherence function differs from zero (Fig. 3.26) stays close to about half a year, the time required for a change in NINO3.4 to be felt by the global temperature; this result is more reliable within the frequency band from 0.2 cpy to 0.3 cpy, where the linear dependence between the time series is the strongest. (A numerical check of this conversion is sketched below.)

Thus, we have studied a bivariate linear stochastic system containing the observed time series of the annual global temperature (HadCRUT5) and the ENSO's oceanic component NINO3.4 as the output and input. It was found that the time series of global temperature HadCRUT5 containing a strong trend is not Gaussian while the oceanic temperature data NINO3.4 is. The input and output processes form a linear teleconnection between the global temperature and the ENSO phenomenon. Our analysis of the system showed that the teleconnection is quite solid because the coherence between the input and output processes at frequencies from 0.2 cpy to 0.3 cpy can be as high as 0.82 (0.68 for the squared coherence); the ENSO affects the annual global surface temperature but it happens only at the frequencies where the spectrum of the global annual temperature is weak. At longer time scales, the two time series are not mutually correlated. We also learned in Example 3.4 that the stochastic system HadCRUT5/NINO3.4 contains a closed feedback loop: the ENSO affects the global temperature, which, in its turn, affects the ENSO.
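Here is a minimal numerical check of the phase-to-lag conversion; the phase value of about 0.8 rad near f = 0.25 cpy is an interpolation of the almost linear phase factor described above:

import numpy as np

# Time lag tau12(f) = phi12(f) / (2*pi*f), valid away from f = 0.
def phase_to_lag(phi_rad, f_cpy):
    return phi_rad / (2.0 * np.pi * f_cpy)

print(phase_to_lag(0.8, 0.25))   # ~0.51 years, i.e., about half a year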
Fig. 3.26 Time lag between HadCRUT5 and NINO3.4
Up to now, the effects of the time series upon each other have been studied here only for the system with the global temperature HadCRUT5 as the output and ENSO's oceanic component NINO3.4 as the input. Therefore, the next step should be an investigation of the system NINO3.4/HadCRUT5, which will tell us how the ENSO's component NINO3.4 depends upon the global temperature HadCRUT5. Some characteristics of the conversion of HadCRUT5 into NINO3.4 will be the same as before, including the spectral densities and the coherence function, except that s11(f) will become s22(f) and vice versa. But all other functions, that is, the coherent spectrum, frequency response function, and time lag, will change.

An AVESTA3 run with the same CAT.DAT file and the time series NINO3.4 and HadCRUT5 as the output and input shows that the coherent spectrum, that is, the part of the NINO3.4 spectrum generated by the global temperature HadCRUT5, is statistically different from zero only within the frequency band 0.15–0.35 cpy (Fig. 3.27a). This function of frequency depends upon the spectral density and squared coherence and loses its statistical reliability if their product is not reliable. As the squared coherence function remains the same as in the system HadCRUT5/NINO3.4, the contribution of HadCRUT5 to the spectrum of NINO3.4 will be the same as before: from 25% to almost 70% between 0.1 cpy and 0.4 cpy. In contrast to the system with HadCRUT5 as the output, this is the band where the spectrum of NINO3.4 reaches its maximum values. This allows one to state that the global annual temperature HadCRUT5 is responsible for a large part of the smooth spectral maximum of the sea surface temperature NINO3.4.

The gain factor (Fig. 3.27b) shows a strong amplification of the output process NINO3.4 when the global temperature HadCRUT5 is transformed into the ENSO's oceanic component NINO3.4. Within the frequency band from 0.15 cpy to 0.35 cpy, the gain factor exceeds 4 and increases to more than 5 in the band from 0.2 cpy to 0.3 cpy. It means that a 1 °C change of the global temperature within this frequency band will make NINO3.4 change by 4–5 °C. This response of the ENSO's oceanic
Fig. 3.27 Coherent spectrum of NINO3.4 (a) and the gain factor g12 (f ) showing the response of NINO3.4 to HadCRUT5 (b). The gray line is the spectrum of NINO3.4
component to variations of the global temperature is 35–45 times stronger than the response of the global temperature to ENSO. At the same time, we need to remember that we are discussing only a frequency band where the random variations of the global temperature would hardly be as high as 1 °C.

We will not continue with further frequency domain analysis of the NINO3.4/HadCRUT5 time series, leaving it to the reader, and return now to the effect of ENSO upon the global temperature and vice versa. It is a common belief that the ENSO plays the dominant role in interannual climate variability, and we will now try to get quantitative estimates of the interaction between the two phenomena. The contribution of the input process to the output is described by the ratio of the sum of the coherent spectrum values within the frequency band of interest to the sum of the spectral density values of the output process within the entire frequency band. As stated above, the NINO3.4 contribution to the total HadCRUT5 spectrum amounts to less than one per cent. This is true, but the estimate was obtained for the case when the linear trend in the averaged global temperature expressed with HadCRUT5 had not been deleted. We know that this effect exists only within the frequency band roughly between 0.1 cpy and 0.4 cpy. The spectral density at these frequencies cannot be affected by the trend, so that the ENSO's effect upon HadCRUT5 obtained in this way was artificially undervalued.

The proper approach in this situation should consist of the following operations:

• delete the linear trend from both time series,
• find the sum of the output spectral density within the entire spectral band and the sum of coherent spectral density values within the frequency band where the coherence function is statistically significant; this band is supposed to be from 0.15 cpy through 0.35 cpy,
• find the contribution of the input to the output as the ratio of the above two quantities.

To correctly determine this value for the global temperature affected by ENSO, we will instruct the program to delete the linear trend from both time series and increase the frequency resolution to obtain more accurate estimates. The CAT.DAT file for processing the original monthly data will be
1812   3   5001   2   0   12   1   0.083333333   0      0      0      0
N      M   NF     K   R   L    LS  DT            MFLT   KFLT   LFLT   ENDDATE
Feeding this CAT.DAT file to AVESTA3, we will receive the printout file HC5&N34_K = 2.RES with the following important features:

• both time series are Gaussian,
• the sum of the HadCRUT5 spectral densities equals 0.0368,
• the sum of coherent spectral density values within the interval from 0.1501 cpy through 0.3500 cpy equals 26.26; it should be multiplied by the frequency resolution and by DT (0.0001 and 1 in this case).

Thus, the contribution of NINO3.4 to the entire spectrum of the global temperature without the linear trend, found as the ratio 0.002626/0.0368, amounts to 7% (a sketch of this arithmetic is given below). Executing the same operations using the file N34&HC5_K = 2.RES, we will find that the global temperature is responsible for 34% of the ocean temperature spectrum within the ENSO area. Obviously, the influence of the ENSO phenomenon plays a small role in the behavior of the global temperature while the latter is responsible for one third of the NINO3.4 variance. Within the interval from 0.15 cpy through 0.35 cpy, the contribution of the ENSO component NINO3.4 and of the global temperature HadCRUT5 as inputs to the respective bivariate systems is defined by the average squared coherence function in this band; it equals 0.6, that is, the input components are responsible for 60% of the output spectrum within this frequency band. These results confirm the opinion of the ENSO's important role in interannual climate variations and at the same time show the small (7%) role of ENSO in the generation of the annual global temperature as a whole. The effect of the global temperature upon the ENSO is much stronger (34%).
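A minimal sketch of this arithmetic (the numbers are those printed by AVESTA3):

# Contribution of the input to the output spectrum: the band sum of the
# coherent spectrum, scaled by the frequency resolution and DT, divided by
# the total sum of the output spectral densities.
coherent_band_sum = 26.26 * 0.0001 * 1.0       # = 0.002626
output_total_sum = 0.0368
print(coherent_band_sum / output_total_sum)    # ~0.071, i.e., about 7%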
The results of this frequency domain analysis of the observed annual global temperature and the ENSO's oceanic component show that

• a linear dependence between the global temperature and the ENSO sea surface temperature does exist and it varies with frequency,
• it is quite strong within the frequency band from 0.15 cpy to 0.35 cpy and especially from 0.2 cpy to 0.3 cpy (at time scales from 3 to 5 years) and gradually disappears at higher and lower frequencies,
• a 1 °C change in the ENSO temperature within that frequency band causes a change of approximately 0.12 °C in the global temperature in the same band and is practically nonexistent outside of it,
• a 1 °C change in the global temperature within that frequency band seems to cause a change of approximately 5 °C in the ENSO's oceanic temperature within the same band from 0.2 cpy to 0.3 cpy,
• the ENSO temperature leads the global temperature, and the time lag amounts to approximately 0.50–0.55 years within the frequency band with high coherence,
• overall, the variability of the global temperature at time scales exceeding 10 years is independent of the ENSO behavior.

This ends Example 3.5 and Sect. 3.4.
3.5 Time Series Reconstruction

The time series reconstruction is understood here as restoring the missing part of a scalar time series. It means that we have a multivariate time series whose output process is known only over a short time span at the end of the total record while the other components are known over the entire span. The subject will be discussed here for the bivariate case. The input and output time series are called, in accordance with the tradition, the proxy and the target.

In solving the task, one must remember that a time series is a time dependent sequence of random variables and that each time series presents a sample of some random process. In most cases, the reconstruction task appears in natural sciences at climatic time scales. It can be a reconstruction of missing data on a climate index such as temperature, precipitation, river flow, or something else with the help of various observed proxy time series (ice cores, tree rings, corals, sediment cores, etc.) that are assumed to be correlated with the target data.

The first reconstruction experiments were conducted about a century ago by the outstanding American scientist Andrew Douglass, who is justly regarded as the founder of dendrochronology; his approach to restoring precipitation through tree ring width data is still used as the only tool in many natural sciences for the same or other climate indices. At the time when Andrew Douglass was conducting his experiments, the theory of random processes did not exist and the term 'time series' was unknown. In the current terminology, he was reconstructing the target time series (precipitation) using a proxy time series (tree ring widths) known over a long time interval, with both processes observed simultaneously over a relatively short time span.

The Douglass method was to find a correlation coefficient between the target and proxy data and, if the coefficient turned out to be statistically significant, to
reconstruct the target over the time span with no target observations using a linear regression equation obtained for the short interval of simultaneous observations. That was a century ago, but since then we have understood that the regression equation is applicable to random variables but not to time series. The reason lies in the fact that random variables do not depend upon time and, consequently, they do not possess such statistical moments as correlation functions and spectra, along with other frequency-dependent functions. Incidentally, A. Douglass suspected his own method of low efficiency and tried to correct it, but could not do it because a mathematical basis for such improvement did not exist (also see Privalsky 2021, p. 140).

The number of publications in natural sciences dedicated to reconstructions through the correlation/regression method (CRR) is measured in thousands and, unfortunately, all of them are mathematically incorrect. The only case when the CRR method may work is when both the target and proxy time series present white noise. Such occasions may exist, but the method can be used only upon the condition that a white noise model is true for both time series. To the best of this author's knowledge, this is never done just because, judging by the current and previous publications, building time series models in either the time or frequency domain does not seem to be a standard stage preceding the reconstruction. This shocking situation exists in spite of the fact that all methods of time series analysis based upon the theory of random processes do take into account the time argument. These methods allow one to solve the reconstruction problem in a mathematically proper way. One such method is described below.

Example 3.6 Autoregressive Reconstruction of Time Series

In this example, we will discuss a method of time series reconstruction based upon the theory of random processes; it was proposed five years ago and recently described in more detail (Privalsky 2018, 2021). To the best of this author's knowledge, the method (autoregressive reconstruction, or ARR) has never been used in natural sciences. Its application through using AVESTA3 plus a minor additional analysis will be discussed here.

A natural approach to proving a method would be to know the answer to your task in advance, apply the method, and compare the results of reconstruction with the known true data. It will be done here with a simulated bivariate time series of length N = 1500; the sampling interval can be years, decades, or whatever is needed. To be specific, it is measured here in years. The components of this time series are x1,t (the target) and x2,t (the proxy). The relationship between them is expressed with the following simple equations:

x1,t = 0.75x1,t−1 + x2,t−1 + a1,t
x2,t = a2,t

where a1,t and a2,t are the components of a bivariate innovation sequence, that is, time series consisting of identically distributed and mutually independent Gaussian random variables. This equation is a bivariate AR(1) model of order p = 1 (a simulation sketch is given below).
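A minimal sketch of such a simulation (the unit innovation variances are an assumption; the book does not specify them):

import numpy as np

# Simulate the bivariate AR(1) system: x1 is the target, x2 the proxy.
rng = np.random.default_rng(12345)
N = 1500
a1 = rng.standard_normal(N)   # innovations of the target equation
a2 = rng.standard_normal(N)   # innovations of the proxy equation
x1 = np.zeros(N)
x2 = a2.copy()                # the proxy is white noise: x2,t = a2,t
for t in range(1, N):
    x1[t] = 0.75 * x1[t - 1] + x2[t - 1] + a1[t]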
Fig. 3.28 Observed output (black) and input (gray) time series (a), regression between output and input (b)
We simulate N = 1500 values of this time series and pretend that the target time series is known to us only at values of t from 1351 through 1500. This short bivariate time series is shown in Fig. 3.28a.

First, the correlation/regression method of reconstruction does not work at all in this case: the squared correlation coefficient between the output (target) and input (proxy) is close to 0.03 (Fig. 3.28b). Obviously, the linear regression equation is capable in this case of restoring only 3% of the output time series variance. The CRR method in this case would not even be applied by researchers who use the traditional approach, just because the time series are not correlated with each other.

Consider now what needs to be done. To properly reconstruct the target time series x1,t, one should go through the following two simple steps:

• determine the autoregressive order p and parameters of the bivariate time series at values of t from 1351 through 1500 and
• use them to reconstruct the target x1,t for t = p + 1, …, 1350.

This will complete the process of reconstruction. According to stage one, we run the bivariate time series target/proxy at values of t from 1351 through 1500 through AVESTA3. The CAT.DAT file should be
150   3   501   0   0   1   1   1   0      0      0      0
N     M   NF    K   R   L   LS  DT  MFLT   KFLT   LFLT   ENDDATE
The maximum AR order is chosen so that the number of AR coefficients to be estimated stays close to one tenth of the time series length N. The resulting file will be called TAR&PRO_150.res. All five order selection criteria choose the model AR(1), which can be written as

x1,t ≈ 0.77x1,t−1 + 1.03x2,t−1 + a1,t
x2,t ≈ −0.12x1,t−1 − 0.11x2,t−1 + a2,t
All AR coefficients but −0.11 are statistically significant. Note also that the selected model differs from the true model given just above, where the proxy time series did not depend upon the target time series and just presented a white noise. This ends the first stage of our reconstruction. It took about one minute to prepare the CAT.DAT file and about 0.1 s for AVESTA3 to obtain the results.

The second step is to obtain the reconstruction xˆ1,t of the target time series x1,t from t = 2 through t = 1350. It should be done as

xˆ1,2 = 1.02588x2,1,
xˆ1,3 = 0.772281xˆ1,2 + 1.02588x2,2, …

All values of the proxy time series x2,t are known and the target values can be obtained from the first equation, with the initial value of t being equal to p + 1, where p is the order of the AR model. The true and restored time series x1,t are shown in Fig. 3.29 for two randomly selected short time spans because the true and reconstructed time series are very similar and showing them for the entire time span from t = 2 through t = 1350 looks messy. Both files were found to be Gaussian.

Visually, the quality of reconstruction is quite satisfactory. For a quantitative comparison, we can use AVESTA3 to see the connection between the true and restored time series for the first 1349 values. A quality criterion can be defined as the ratio of the reconstructed and true time series variances. In this case, it is approximately 2.36/3.40, that is, the autoregressive reconstruction method resulted in restoring 69% of the true time series variance (against 3% if the traditional correlation/regression method were applied). Thus, the task of reconstructing the assumedly unknown longer component of the time series using a short time series containing both components is solved; the solution was easy to obtain and it is quite satisfactory.
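A sketch of this second stage with the coefficients estimated above (the names are illustrative):

import numpy as np

# Reconstruct the target from the known proxy x2 with the estimated AR(1)
# coefficients; x2 holds the full proxy record (t = 1, ..., 1350 in the text).
def reconstruct_target(x2, phi11=0.772281, phi12=1.02588):
    n = len(x2)
    x1_hat = np.zeros(n)
    x1_hat[1] = phi12 * x2[0]   # the first reconstructed value, t = p + 1
    for t in range(2, n):
        x1_hat[t] = phi11 * x1_hat[t - 1] + phi12 * x2[t - 1]
    return x1_hat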
Fig. 3.29 The true (gray) and restored (black) time series in Example 3.6
The method of reconstruction described above belongs to the time domain only, so that the spectral analysis of the resulting time series (reconstructed and reconstruction error) is not required. However, it is strongly recommended, especially when the share of the time series that had been reconstructed is not as large as in this case. Such analysis should include a time domain AR model and the spectra of the reconstructed time series and reconstruction error. The frequency domain analysis shows that the spectrum of the restored time series has the same shape as the spectrum of the time series that was being restored and as the spectrum of the reconstruction error (Fig. 3.30a). The model of each of these scalar time series is a Gaussian AR(1) process. Besides, the coherence between the initial output and reconstructed time series stays practically constant at 0.68, with the gain factor close to one (Fig. 3.30b). Finally, the coherence between the restored time series and its error is practically zero (not shown). All these properties confirm that the reconstruction was successful.

At least two problems may exist with this or any other method of reconstruction when the properties of a short bivariate time series are automatically assumed to be correct over the entire time span and applied to the reconstructed data. First, though the bivariate time series consisting of a proxy and a target is assumed to be stationary (actually, even ergodic), its sample properties will vary with time due to the sampling variability. The true statistics of the processes whose samples are used for the reconstruction are not known, and it may happen that the statistics of the shorter bivariate time series are much different from their unknown true values. This would mean distorted statistical properties of the reconstructed time series. Secondly, a short proxy time series cannot describe the low-frequency behavior of the random process that generated it. Assuming, for example, that the low-frequency variability of a climate index which is being restored had been statistically the same centuries ago may well be wrong.
Fig. 3.30 a Spectra of the time series that is being reconstructed, the reconstructed time series, and the reconstruction error (black, gray, dashed); b coherence and gain between the original and reconstructed time series
The EMS directory for this chapter (Example 3.6) contains information for verification of this reconstruction example, as well as a verification for the case when the model of the time series is AR(2). This concludes Example 3.6 and Sect. 3.5.
3.6 Verification of GCM-Simulated Climate. The Bivariate Case

In this section, we will show how the AVESTA3 program can be used for verifying results of climate simulations. The data for the example given in this section belong to the research stage CMIP6 conducted by the participants of the IPCC program. The necessary information about the experiments has been given in Sect. 2.6.

The bivariate analysis allows one to study dependences between GCM-simulated climate components in both time and frequency domains and to compare them with respective results of analysis of the same dependences obtained from observations. Comparing properties of bivariate difference equations is cumbersome and the results of such comparisons would be difficult to understand. The functions that should be studied for evaluating the quality of climate simulations are the standard frequency-dependent characteristics describing relationships within the time series: coherence functions, coherent spectra, and frequency response functions. The climate spectra generated by GCMs have been analyzed as scalar time series in Chap. 2 and their basic properties were shown to be satisfactory. In this case, the switch from scalar to bivariate time series analysis will hardly change the spectral estimates in a significant manner, but it allows one to verify other functions of frequency.

The main goal of this verification effort is to establish whether the GCMs have the ability to correctly reproduce the interdependence between the simulated annual global temperature (HadCRUT5) and the elements of the Earth's climate that may affect the global temperature. To the best of this author's knowledge, the frequency dependent statistics that characterize time series as samples of bivariate random processes have not been used in the literature dedicated to verification of simulated climates. By itself, this is a serious drawback because we are dealing with multivariate time series whose properties must be studied in the frequency domain. Our comparisons will include the coherent spectra, coherence, and frequency response functions.

The observation data are presented with the time series of the annual global surface temperature HadCRUT5 while the simulated results include the 35 climate models that had been used for the scalar case in Chap. 2. The verification described in Chap. 2 is necessary for GCM evaluations but it is by no means sufficient for checking their ability to take into account interactions that exist within the climate system. Our choice of the factor possibly affecting the global temperature is the El Niño–Southern Oscillation system, which is believed to influence many elements of climate
and was recently shown to be closely related to the global annual temperature variations within the frequency band from approximately 0.1 cpy to 0.4 cpy and to have a closed feedback loop with it (Privalsky 2021 and Sect. 3.4).

Example 3.7 Verification of Climate Models with AVESTA3

Using the AVESTA3 program, we calculated the optimal AR models for each pair of simulated HadCRUT5 and NINO3.4 time series to obtain 36 bivariate files: one for the observed data and 35 for the simulated time series of HadCRUT5 and NINO3.4 for the time interval from 1870 through 2014. All computations for the observed and simulated time series of annual values in this chapter are conducted with the CAT.DAT file
145   3   501   0   0   1   1   1   0      0      0      0
N     M   NF    K   R   L   LS  DT  MFLT   KFLT   LFLT   ENDDATE
In what follows, the linear trend in the time series of global temperature HadCRUT5 and sea surface temperature NINO3.4 will not be deleted. If the processes are interacting with each other, they can hardly distinguish between variations caused by nature and by external factors, including the anthropogenic signal. At the same time, the major results of verifications conducted with the time series without the linear trend turned out to be very close to what will be shown below.

As we plan to study possible interactions between the simulated global temperature and the oceanic component of ENSO, our interest will be concentrated upon the coherence functions and coherent spectra. According to the observed HadCRUT5 and NINO3.4 data, the maximum value of the estimated squared coherence function at frequencies above 0.01 cpy is 0.67 at about 0.25 cpy, and the coherence stays statistically significant within the frequency band 0.1–0.4 cpy (Fig. 3.31a). The limit of statistical significance for coherence estimates is shown with the black horizontal line in the figure; it amounts to 0.168. Values of coherence below this threshold are statistically insignificant.

The coherent spectrum shown in Fig. 3.31b is close to the spectrum of HadCRUT5 in the frequency band between 0.2 cpy and 0.3 cpy, showing a strong (up to 67%) contribution of NINO3.4 to HadCRUT5 at those frequencies; at lower and higher frequencies this effect becomes smaller and increases again at very low frequencies. This increase still leaves the coherent spectrum at least an order of magnitude below the spectrum of HadCRUT5; besides, it happens at frequencies corresponding to time scales of many decades. According to the observed data, the behavior of the coherent spectrum is not related to the coherence function at frequencies below 0.15 cpy and above 0.35 cpy.

The gain and phase factors (Fig. 3.32) are given for the entire frequency range, but the values below approximately 0.05 cpy and over 0.45 cpy are statistically insignificant due to the low coherence function at those frequencies (also see Sect. 3.4). Altogether, the observation data show that the behavior of the global annual temperature depends upon ENSO only at frequencies between 0.1 cpy and 0.4 cpy. The energy
Fig. 3.31 Estimated squared coherence function (a) and coherent spectrum (b) for the bivariate time series with components HadCRUT5 and NINO3.4. The gray line in b is the HadCRUT5 spectrum
source or sources for low-frequency variations of global temperature are not known and may be related, in particular, to the presence of the strong positive linear trend.

The estimated frequency domain statistics of the bivariate stochastic system with the output HadCRUT5 and input NINO3.4 given from 1870 through 2014 will now be compared with respective statistics obtained from the simulated data. The number of degrees of freedom for all those estimates (the number of subrecords nd in Bendat and Piersol 2010) is most frequently 18, and 36 in six cases, which means that the estimates are reasonably accurate.

The following steps will be taken to determine the degree of dependence of individual simulated HadCRUT5 time series upon respective simulations of ENSO's
Fig. 3.32 Estimated gain (a) and phase (b) factors between HadCRUT5 and NINO3.4
oceanic component NINO3.4. This feature is described with the squared coherence functions. As shown in Fig. 3.31a, the contribution of ENSO to HadCRUT5 estimated from the observed bivariate time series occurs only within the frequency band from roughly 0.1 cpy to 0.4 cpy.

The comparison results for the simulated time series shown in Fig. 3.33 are rather unexpected. The lines with symbols in the figure show the coherence function and coherent spectrum averaged over the 35 estimates of each function. For convenience, the figures are shown with both linear and logarithmic scales along the frequency axis. The frequency scale in this and the following figures begins from 0.01 cpy while the coherence estimates obtained from observations become statistically significant only at frequencies of 0.1 cpy and higher. This is an obvious case of a steady positive bias that exists in almost all of the 35 climate models that we have.
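A hypothetical sketch of this averaging (the file names, format, and loader are assumptions; AVESTA3 itself reports the individual estimates):

import numpy as np

# Average the squared-coherence estimates of the 35 models; each file is
# assumed to hold one model's estimate on a common frequency grid.
coh2 = np.vstack([np.loadtxt(f"model_{k:02d}_coh2.txt") for k in range(35)])
mean_coh2 = coh2.mean(axis=0)   # the averaged curve plotted in Fig. 3.33a, c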
Fig. 3.33 Coherence functions squared (a, c) and coherent spectra (b, d) between the observed (black) and simulated (gray) time series of HadCRUT5 and NINO3.4
Within the band of ENSO's natural frequency from 0.2 cpy to 0.3 cpy, the contributions of NINO3.4 to the global temperature HadCRUT5 are more or less similar for observations and simulations (50–75%). The unexpected result occurs at the lower frequencies. In contrast to observations, the numerical models ascribe to the ENSO's oceanic component NINO3.4 a major role in the variability of the annual global surface temperature within the frequency band below approximately 0.16 cpy. According to the numerical general circulation models, the contribution of ENSO to the global temperature expressed with the coherence function (Fig. 3.33a, c) strongly exceeds the estimates obtained from observations.

The overestimated coherence function affects the coherent spectra of the simulated time series, making them much higher than the coherent spectrum estimated from observations (Fig. 3.33b, d). The thin gray lines show the share of the global temperature spectrum generated by NINO3.4. As seen from Fig. 3.33c, d, the contribution of the simulated ENSO's component NINO3.4 to the HadCRUT5 coherent spectra is much greater than what is observed in nature. According to the CMIP6 models, the NINO3.4 contribution to HadCRUT5 is especially large at the frequencies below about 0.15 cpy (time scales 7 years and longer).

Thus, the behavior of GCM-simulated global annual temperature cardinally differs from observations at low frequencies. This strong positive bias happens at the frequencies that define the decadal and multidecadal variations of climate; specifically, at the time scales of 10 and 20 years the contribution of ENSO to the global temperature simulated with GCMs exceeds the ENSO's share in the HadCRUT5 obtained from observations by 1.7 and 15 times, respectively. At lower frequencies, the difference increases. Obviously, this behavior of simulated climate starkly disagrees with what is happening in nature. If we agree with this heavy dependence of the global temperature upon the ENSO, which presents just a local phenomenon behaving similarly to white noise, we would be admitting that all other factors, including the global atmospheric and oceanic circulations and the interdependences of global temperature with them, are at best as relevant as the ENSO for the generation of the global annual temperature, the main indicator of climate. In other words, we would agree that the global atmospheric and oceanic circulation plays a role secondary to the ENSO's in the generation of the Earth's climate.

The analysis of the frequency response function (FRF) that describes the transformation of NINO3.4 into HadCRUT5 agrees with the previous results. Thus, the gain factor (transformation coefficient) at frequencies below 0.06 and 0.10 cpy exceeds the values obtained from the observed data by as much as 5.2 and 1.7 times, respectively (Fig. 3.34). The simulation results for the FRF's argument (not shown) are quite satisfactory, confirming that the output process HadCRUT5 lags behind the input NINO3.4.

The global temperature during the last several decades could have been modified by external forces. Therefore, we also conducted the same experiment using the observed and simulated global and ENSO's oceanic temperature with shorter time series: from 1870 through 1970. During that time interval, the global temperature could hardly have been strongly affected by anthropogenic activities; the HadCRUT5 time series contains a relatively weak linear trend obviously caused by
Fig. 3.34 Gain factor of the system HadCRUT5/NINO3.4
natural factors. The observed and simulated time series of length N = 101 years are long enough to perform the same experiment as the one described above. Its results confirm the previous conclusion: in contrast to observations, the 35 general circulation models are built in such a way that the ENSO's component NINO3.4 presents a strong source of decadal variations of climate.

Last but not least, the same result obtained for the simulated bivariate time series of HadCRUT5 and NINO3.4 over the time interval from 850 through 1849 proved to be very close to what has been shown here for the short interval of 145 years. The estimates obtained from these long simulated time series are reliable because the number of degrees of freedom for the time series of length N = 1000 years increases, in most cases, from 18 to 125. The dependence of the simulated global temperature upon the Southern Oscillation Index (SOI) from 1976 through 2014 was also shown to be in conflict with observations.

Thus, the results shown here mean that the numerical models of general circulation are built in such a way that the components of the ENSO phenomenon become responsible for a large (probably dominating) share of the global temperature variations. On the whole, the conclusion that the long-term variations of global temperature with time scales exceeding decades are caused by the ENSO strongly disagrees with observations and with physical reasoning.

Finally, the feedback between the global temperature HadCRUT5 and ENSO's oceanic component NINO3.4 affects the NINO3.4 spectrum in the same way as it does the HadCRUT5 spectrum, that is, the spectral densities of the simulated NINO3.4 time series increase with decreasing frequency starting from about 0.08 cpy. This feature of the simulated data also disagrees with observations. This ends Example 3.7.
In this section, we tried to show how to employ AVESTA3 for conducting research that requires many runs of the program. It also shows that the numerical models of general circulation listed in Table 2.12 erroneously ascribe a large or even the dominant part of low-frequency natural variations of climate to the ENSO. This means that the current numerical models of climate are not capable of properly reproducing the most important feature of the Earth's climate variations caused by nature. This erroneous result occurs, in particular, at the time scales that are most important for the entire IPCC project's goal of producing reliable projections of climate for the upcoming decades. This flaw in generating the annual global surface temperature, the main indicator of climate, makes unreliable the final results of the IPCC climate analysis, including the projection part of the IPCC project.
3.7 Bivariate Analysis of Mechanical Engineering Time Series

In engineering, time series are analyzed for obtaining quantitative information about dependences between the components of engineering constructions subject to random excitations. The final goal of such analyses is to design reliable, safe, and long-lasting products, be it a vacuum cleaner, a skyscraper, or a spaceship. The analysis of engineering data is concentrated upon frequency domain dependences and properties because the time domain models are too complicated for their explicit examination. As we know from our everyday experience, the results of the efforts applied by engineers are positive, and we practically always can have engineering constructions and devices that satisfy our requirements. This happens with sophisticated constructions and devices that contain many mutually interacting parts with different responses to the same random forcing. A typical example would be a car that smoothly responds to an uneven road.

In natural sciences, a major goal of studying bivariate time series is discovering the so-called teleconnections, that is, statistically significant dependences between two (or more) elements of the planet's environment at different geographical coordinates or between its different elements. The goal of this section is to show the reader what type of data engineers have and use for studying dependences between time series for the subsequent design and manufacture of numerous products. In this chapter, we will discuss the bivariate case.

Example 3.8 Bivariate Analysis of Engineering Time Series

Consider an example where the initial data present a bivariate time series taken from the set of records kindly given to this author by Professor Emeritus Randall J. Allemang. The time series consists of a single input IN1 and the output OUT, which presents the response of an aluminum disk to the excitation caused by IN1. The length of the time series is still 773,120 at the rate of one measurement per 0.000390625 s. The respective Nyquist frequency fN = 1/(2DT) is 1280 Hz. An example of a scalar
engineering time series was given earlier in Chap. 2 (Figs. 2.33, 2.34 and 2.35). Here, we will continue with a description of basic statistical properties of the data that can be encountered in mechanical engineering, in the hope that the knowledge accumulated in engineering and so successfully used by engineers to supply the world with reliable products may persuade at least some of us to analyze bivariate and multivariate time series in the manner corresponding to the nature of the processes, that is, as sample records of random processes. Up to now, it seems that the natural science community (with the same exception of solid Earth physics) is not familiar with the theory of random processes and with the methods of analysis and forecasting based upon it. The total number of terms in a time series exceeding three quarters of a million is very large, so that, in contrast to natural sciences and especially to climatology, the statistical reliability of results that can be obtained for engineering data is very high, at least in the stationary cases. This, of course, is a definite advantage that engineers often have over researchers working in natural sciences. Our goal here is to give an example of bivariate time series analysis based upon engineering data in order to show the differences and similarities between the time series analysis tasks typical for these two areas of knowledge. So, the CAT.DAT file in examples with bivariate data will be

773120   50   1001   0   0   1   1   0.000390625   0      0      0      0
N        M    NF     K   R   L   LS  DT            MFLT   KFLT   LFLT   ENDDATE
The frequency domain resolution is doubled from 501 to 1001 frequencies in order not to miss possible interactions between the input and output time series. The maximum order of autoregression is M = 50. Thus, the responses to the AVESTA3 questions will be OUT&IN1.RES, 1, OUT, IN1. The first results of the analysis show that the time series are Gaussian and do not contain any significant linear trends. The time required for the calculations by a desktop computer with an Intel® Core™ i5-10400 CPU (up to 4.30 GHz) was 33 min. The detailed information about the covariance, cross-covariance, correlation, and cross-correlation functions is produced by the program to give the user an idea of the degree of the task's complexity and, as seen from Fig. 3.35, the bivariate process we are dealing with presents a combination of many vibration components. Figure 3.35a shows the correlation functions of the input and output time series (left and right parts of the figure); Fig. 3.35b shows the cross-correlation function. [Remember that the correlation function r(k) is even, that is, r(k) = r(−k).] A detailed analysis of these statistics is hardly required, but the functions demonstrate that we are dealing with a rather complicated case. The entire linear stochastic system described by AVESTA3 for autoregressive models from AR(1) through AR(50) contains a lot of information that is probably useless for engineers; the complexity of the process is very high and the order selection criteria indicate that it is best described with the maximum order model AR(50). Engineers are hardly interested in these time domain models because their normal approach concentrates on the frequency domain analysis that we will describe below.
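AVESTA3 performs this order selection internally; for readers who want to see the mechanics with generally available tools, here is a minimal sketch using the VAR class of the Python package statsmodels. The synthetic arrays merely stand in for the real OUT and IN1 records, which are not publicly distributed, and the filter coefficients are invented for illustration.

```python
import numpy as np
from statsmodels.tsa.api import VAR

# Synthetic stand-ins for the output OUT and input IN1 records.
rng = np.random.default_rng(0)
n = 20_000
in1 = rng.standard_normal(n)
out = np.convolve(in1, [0.5, 0.3, 0.1], mode="same") + 0.2 * rng.standard_normal(n)

model = VAR(np.column_stack([out, in1]))
selection = model.select_order(maxlags=50)             # AIC, BIC, FPE, HQIC per order
print(selection.selected_orders)                       # e.g. {'aic': ..., 'bic': ...}
fitted = model.fit(selection.selected_orders["aic"])   # fit the AIC-selected order
```

The criteria may disagree; as with AVESTA3, the user still has to decide which one to trust for a process as complex as this one.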
Fig. 3.35 Correlation (a) and cross-correlation (b) functions of the bivariate time series with input IN1 and output OUT
Consider now the results of frequency domain analysis given by AVESTA3. The spectral density estimates of the input and output for the model AR(50), shown in Fig. 3.36, are very similar to what we had in the scalar case (Fig. 2.35), but the spectra do not coincide. The coherence function between the output and input time series (Fig. 3.37a) shows a number of frequency bands with either very high or very low coherence; for example, the coherence reaches 0.998 at f = 377.6 Hz and drops to 0.02 at f = 462.8 Hz. This means that within those bands the coherent spectrum either hardly differs from the output spectrum or falls below it by an order of magnitude or more (Fig. 3.37b).
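AVESTA3 obtains these functions from the fitted autoregressive model; purely for orientation, the same quantities can be approximated nonparametrically with SciPy's Welch-type estimators. A minimal sketch on synthetic stand-ins for OUT and IN1 (the arrays and the filter are invented; only the sampling rate is taken from the text):

```python
import numpy as np
from scipy.signal import coherence, welch

fs = 1 / 0.000390625                     # 2560 samples per second; Nyquist 1280 Hz
rng = np.random.default_rng(0)
n = 200_000
in1 = rng.standard_normal(n)             # synthetic input
out = np.convolve(in1, [1.0, 0.6, -0.3], mode="same") + rng.standard_normal(n)

f, g2 = coherence(out, in1, fs=fs, nperseg=4096)   # coherence function squared
_, s_out = welch(out, fs=fs, nperseg=4096)         # output spectrum
coh_spec = g2 * s_out                              # coherent output spectrum
```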
Fig. 3.36 Spectra of the output (a) and input (b) processes
Fig. 3.37 Coherence function squared (a) and coherent spectrum (b) between time series OUT and IN1. The black line is the output spectrum
These features illustrate the complicated character of the frequency dependent functions for this system, but some especially interesting information is contained in the components of the frequency response function. At frequencies where the coherence function is close to unity, the gain factor g12(f) stays close to unity, meaning that the response to the excitation by IN1 more or less coincides with the excitation value. However, at frequencies close to 700 and 730 Hz, the response exceeds the excitation value by about 300–350% (Fig. 3.38a). It means that if the data were to be used for designing some structure, the resonance response at those frequencies would have to be studied with special attention.
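Continuing the same synthetic sketch, the gain, phase, and time lag factors follow from the cross-spectrum between input and output; this is the nonparametric analogue of what AVESTA3 computes from the AR model:

```python
import numpy as np
from scipy.signal import csd, welch

fs = 1 / 0.000390625
rng = np.random.default_rng(0)
n = 200_000
in1 = rng.standard_normal(n)                   # same synthetic stand-ins as before
out = np.convolve(in1, [1.0, 0.6, -0.3], mode="same") + rng.standard_normal(n)

f, s_xy = csd(in1, out, fs=fs, nperseg=4096)   # cross-spectrum of input and output
_, s_xx = welch(in1, fs=fs, nperseg=4096)      # input spectrum
gain = np.abs(s_xy) / s_xx                     # gain factor g12(f)
phase = np.angle(s_xy)                         # phase factor, radians
tau = phase[1:] / (2 * np.pi * f[1:])          # time lag factor, seconds (f = 0 skipped)
```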
Fig. 3.38 Frequency response function between time series OUT and IN1: gain (a) and phase (b) factors
The phase factor within the band from 700 to 730 Hz is practically constant, which means that the time lag between the input IN1 and output OUT within this band amounts to 0.001 s. This information may not be important for engineering tasks but can be quite interesting for natural data. The disk's response to two input processes should take into account the possible dependence between the inputs; it will be investigated in Chap. 4. Now that we have an idea of what the bivariate analysis of engineering data is, we can sum up the differences and similarities between it and the analysis of data generated by nature. The differences between natural and engineering data and their analyses appear to include

• the sampling rate, which may be faster by orders of magnitude (e.g., 10⁻⁴ s vs. 1 year);
• the larger number of terms in the time series (e.g., close to 10⁶ vs. 100 or 1000);
• the complexity of the frequency dependent characteristics (e.g., a white noise spectrum vs. a spectrum containing many sharp peaks, a smooth coherence function vs. a coherence function varying several times between zero and one, a smooth gain factor vs. gain factors containing several sharp peaks).

No doubt more differences can be found, but the major and currently the only similarity between natural and engineering data is that all of them present sample records of random processes. The engineers understand the nature of their data very well, apply methods based upon the theory of random processes, and obtain useful results. We generally do not have a detailed frequency domain description of our bivariate time series just because we are not trying to get it. It is about time to wake up and admit that all our processes are random, and working with data generated by random processes cannot produce useful results until we start using the proper methods of random data analysis. This ends Example 3.8 and Sect. 3.7.
3.8 Conclusions

This chapter describes the tool necessary for studying statistical properties of two time series suspected to be related to each other and illustrates it with practical examples based upon results of analysis with the program AVESTA3. The time series obtained from observations carry reliable information and, essentially, Chaps. 2 and 3 almost completely cover the list of probabilistic properties that should normally be studied in natural sciences. The AVESTA3 program produces both time and frequency domain information about bivariate time series. The frequency domain information can also be obtained through the nonparametric method described in the classical book by Bendat and Piersol (2010), but the nonparametric approach generally requires more data than autoregressive modeling. The latter approach provides statistically acceptable time domain results for time series of length about a hundred time units and longer. These results present a stochastic difference equation which allows one to study relations
between the time series explicitly, for example, for detecting feedback systems, thus discovering physically important time-domain features in the bivariate data. The time domain model is transformed into a spectral matrix and eventually into functions of frequency, including statistically reasonable estimates of spectra, coherent spectra, coherence functions, and frequency response functions. The use of order selection criteria does not allow the user to break the laws of mathematical statistics. The eight examples in this chapter cover a wide range of situations with bivariate time series. The mathematically proved inapplicability of the regression equation to bivariate time series had been steadily ignored in natural sciences for many decades; now, the researchers working in this area have a chance to obtain information about properties of bivariate time series in a proper way. In addition to experimental proofs of regression's inability to discover the true dependence between two time series (Examples 3.1 and 3.2), the reader can learn what is necessary to know about bivariate time series and how to obtain the required information without a deep knowledge of the theory of random processes. The regression approach is completely helpless if the signal contained in a time series is weak, while the autoregressive frequency domain analysis with AVESTA3 demonstrates that it can detect a tiny (two to three centimeters in amplitude) signal immersed in high-accuracy measurements of a noisy background with a range of several meters (Example 3.3). The combination of long and highly accurate time series with the proper method of analysis allows one to discover the diminutive signal and correctly estimate its amplitude. The ENSO phenomenon presents an almost universal basis for teleconnection research, usually conducted at monthly or seasonal time scales, but it is still regarded as a factor of climate variability; it is shown here that NINO3.4—the oceanic component of ENSO—is closely related to the annual global temperature. This climatic scale dependence is examined and shown to be strong only within a rather narrow frequency band where the global temperature spectrum is weak and where the spectrum of ENSO's oceanic component has its only smooth peak. It turns out that the two time series are connected to each other through a closed feedback loop. The global temperature and the ENSO are interrelated, but the influence of ENSO upon the Earth's climate is small while the global temperature is responsible for a large part of ENSO's variability (Examples 3.4 and 3.5). The climate reconstructions into a distant past (up to millennia) are "almost always" (Christiansen and Ljungqvist 2017) based upon regression equations. As the regression equation cannot describe relations between time series, reconstructions of climate obtained in this way are almost always incorrect. Again, this inability of the regression approach became known in mathematics many decades ago but remains completely unknown in climatology. The task is easily solved with bivariate (or multivariate, if necessary) autoregressive analysis, and the results of such reconstructions with AVESTA3 given in Example 3.6 show experimentally that the regression equation is absolutely helpless in a simple case common in climatology, while the autoregressive reconstruction restores up to 70% of the missing time series variance. The ability of the traditional correlation/regression approach in this respect is limited to about 3%.
Section 3.6 and Example 3.7 demonstrate how the autoregressive frequency domain analysis can be used for verification of climate models designed for prediction (projection) of climate behavior over the current century. Foreseeing the response of the very complex stochastic system that generates the Earth's climate to external forcing requires, first of all, a definite understanding of the system's internal mechanism. The time series of annual global temperature generated by 35 GCMs were analyzed for the interval from 1870 through 2014 and the results were compared with observations. The properties of the simulated global temperature proved to be very different from what had been observed over that time interval. In the simulated data, the role of ENSO was dominant and steadily growing at time scales over 6–7 years. It means that the GCMs are built in such a way that the sea surface temperature within the ENSO area is responsible for decadal and possibly longer variations of climate. This result starkly disagrees with observations and requires a revision; it does not allow one to regard the results of climate simulations as reliable. According to Sect. 3.7 and Example 3.8, the engineering time series have a much more complicated structure than what we see in natural data. The goal there was to show that the design of different engineering structures can be successful even with highly variable frequency-dependent functions. All nature-created processes are random, but examples of applying time and frequency domain analysis in natural sciences (e.g., for detecting teleconnections or for climate reconstruction) are practically nonexistent, showing that we are illiterate in the theory of random processes and in the respective methods of analysis.
References

Bendat J, Piersol A (1966) Measurement and analysis of random data. Wiley, New York
Bendat J, Piersol A (2010) Random data. Analysis and measurement procedures, 4th edn. Wiley, Hoboken
Box GEP, Jenkins GM (1970) Time series analysis. Forecasting and control. Wiley, Hoboken
Box G, Jenkins G, Reinsel G, Ljung G (2015) Time series analysis. Forecasting and control, 5th edn. Wiley, Hoboken
Christiansen B, Ljungqvist F (2017) Challenges and perspectives for large-scale temperature reconstructions of the past two millennia. Rev Geophys 55(1):40–97
Gelfand I, Yaglom A (1957) Calculation of the amount of information about a random function contained in another such function. Uspekhi Matematicheskikh Nauk 12:3–52; English translation: Am Mathem Soc Transl Ser 2(12):199–246
Granger C, Hatanaka M (1964) Spectral analysis of economic time series. Princeton University Press, Princeton
Mortimer C, Fee E (1976) Free surface oscillations and tides of Lakes Michigan and Superior. Phil Trans R Soc Lond 281:1–61
Privalsky V (1988) Stochastic models and spectra of interannual variability of mean annual sea surface temperature in the North Atlantic. Dynam Atmos Ocean 12:1–18
Privalsky V (2015) On studying relations between time series in climatology. Earth Syst Dynam 6:389–397
Privalsky V (2018) A new method for reconstruction of solar irradiance. JASTP 172:138–142
Privalsky V (2021) Time series analysis in climatology and related sciences. Springer
Privalsky V, Jensen D (1995) Assessment of the influence of ENSO on annual global air temperature. Dynam Atmos Ocean 22:161–178
Reinsel G (2003) Elements of multivariate time series analysis, 3rd edn. Springer, New York
Thomson R, Emery W (2014) Data analysis methods in physical oceanography, 3rd edn. Elsevier
Chapter 4
Analysis of Trivariate Time Series
In addition to the bivariate case, the AVESTA3 program can process time series with three components, each in the form of a scalar time series. The algorithm of analysis in the trivariate case presents an extension of the bivariate algorithm. It is described in detail in Chap. 7 of the Bendat and Piersol book (2010). Mathematically, it is quite cumbersome, but using AVESTA3 is easy and it allows one to obtain results for trivariate systems. A switch from the bivariate (D = 2) to the multivariate (D > 2) case provides the user with richer information. It is probably safe to assume that the products of AVESTA3 in the multivariate case contain a number of critically important frequency dependent functions which are completely unknown in natural sciences. However, if the reader/user of this book is familiar with the initial methodological part and with the practical examples given in the previous chapters, understanding and testing a trivariate example will not be difficult. The lack of practical examples of autoregressive multivariate time and/or frequency domain analysis in natural sciences makes the AVESTA3 program a tool for more advanced time and frequency domain probabilistic analysis in this area. Seemingly, the only publication on the subject was given by this author (Privalsky 2021, Chap. 14).
4.1 Products of Trivariate Time Series Analysis with AVESTA3

In the time domain, a trivariate time series produces a system of three-dimensional stochastic difference equations for each autoregressive order (still from p = 1 to p ≤ 50). As in the bivariate case, the program will estimate statistical moments of the time
series including time-dependent covariance matrices along with the correlation and cross-correlation functions. The autoregressive equations present a simple extension of the bivariate system discussed in the previous chapter. They can be important and useful but are often too bulky for explicit analysis. Therefore, we will concentrate in this chapter upon the frequency domain quantities and upon examples of practical multivariate analysis. Suppose that we have a trivariate time series xt = [x1,t, x2,t, x3,t]'. The frequency domain analysis will produce estimates of the respective spectral densities s11(f), s22(f), s33(f); the spectra have, of course, the same meaning as in the scalar and bivariate cases. The spectral estimates for multivariate time series are built with account for possible interdependences between the output and input scalar time series and for nonzero linear relations between the inputs. It means, in particular, that the spectral estimates obtained for a trivariate case and for three scalar time series will not necessarily be identical. As we already know, the linear relation between two time series is described with the coherence function γ²12(f). If the system has two scalar inputs and one output, we are interested in the dependence of the output time series x1,t upon the inputs x2,t and x3,t taken together. This is what happens in mathematical statistics when we study linear relations between three or more random variables: we want to know the multiple correlation coefficient and two partial correlation coefficients. The crucial difference from analysis of time-independent random variables is that our "correlation coefficients" are frequency dependent. The three coherence functions are:

• the multiple coherence γ²1:23(f), which describes the degree of linear interdependence between the output x1,t and both inputs x2,t and x3,t taken together;
• the partial coherence functions γ²12.3(f) and γ²13.2(f), which describe the degree of linear interdependence between the output x1,t and the inputs x2,t and x3,t individually.
The three new coherence functions imply three new coherent spectra:

• the multiple output coherent spectrum s1:23(f), which reflects the dependence of the output x1,t upon both inputs x2,t and x3,t;
• the partial coherent spectra s12.3(f) and s13.2(f), which describe the parts of the output spectrum s11(f) created due to the linear dependences of the output x1,t upon the inputs x2,t and x3,t individually, with account for possible dependence between the inputs x2,t and x3,t. However, the partial coherent spectra calculated with AVESTA3 are generally relatively unreliable and should be treated with caution because of a problem with estimates of partial coherent spectra (see below).

As we now have two input time series, AVESTA3 calculates the respective frequency response functions:

• gain factors g12.3(f) and g13.2(f), and
• phase factors φ12.3(f) and φ13.2(f).
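AVESTA3 derives all these functions from the fitted autoregressive model. For readers who want to see the algebra behind the multiple coherence defined above, it can also be computed directly from the cross-spectral matrix through a standard determinant identity; a minimal sketch (the array layout is an assumption of this illustration):

```python
import numpy as np

def multiple_coherence(S):
    """Multiple coherence of channel 0 (the output) on the remaining channels.

    S: complex array of shape (nf, D, D) holding one Hermitian
    cross-spectral matrix per frequency.  Uses the identity
    gamma2(f) = 1 - det S(f) / (s11(f) * det S_inputs(f)).
    """
    det_full = np.linalg.det(S)
    det_inputs = np.linalg.det(S[:, 1:, 1:])
    return 1.0 - np.real(det_full / (S[:, 0, 0] * det_inputs))
```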
The final printout of frequency dependent functions is the time lag factor—a function that describes the time shift between the output and each input process as a function of frequency. This quantity does not seem to be used in engineering (in particular, in the books by J. Bendat and A. Piersol). In the trivariate case, there are two time lag factors: τ12.3(f) and τ13.2(f). These functions were suggested by this author for use in natural sciences and the experience with the time lag factor is scanty (Privalsky 2021, Chap. 14). It should be applied with caution, especially at low frequencies. If we have a multivariate linear system with q input processes and one output, the system can be described with q! equally correct models and some results of analysis will depend upon the specific order of the input processes. In particular, one may test the correctness of AVESTA3 results by verifying the equation for the multiple coherent spectrum

s1:23(f) = s12.3(f) + s13.2(f)

and for the output spectrum

s11(f) = s1:23(f)/γ²1:23(f).
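These two identities give a quick numerical check of any trivariate printout; a minimal sketch, assuming the five arrays have already been read from the .RES file (the names are illustrative, not AVESTA3 output variables):

```python
import numpy as np

def check_identities(s11, s1_23, s12_3, s13_2, g2_1_23, rtol=1e-4):
    """Verify s1:23 = s12.3 + s13.2 and s11 = s1:23 / gamma2_1:23."""
    ok_sum = np.allclose(s1_23, s12_3 + s13_2, rtol=rtol)
    ok_out = np.allclose(s11, s1_23 / g2_1_23, rtol=rtol)
    return ok_sum and ok_out
```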
Both equations are correct irrespective of the input order choice and the multiple coherent spectrum will be the same, though the partial coherent spectra may differ between models. For a complete explanation, see Chap. 7 of Bendat and Piersol (2010). Still, the estimated partial coherent spectra produced by AVESTA3 cannot be fully trusted. Increasing the dimension D of the time series from two to three may cause reliability problems related to the time series length. According to the requirement that the ratio N/(D²M) should not be less than 10, the length N of a trivariate time series should be at least 900 if the maximum autoregressive order M of the model that we want to have does not exceed 10. If M = 50, the minimum length increases to N = 4500; the minimal length required for getting results only for M = 1 is N = 90 (see the sketch at the end of this section). This requirement upon the time series length in the multivariate case puts an obstacle in the way of studying natural processes represented with multivariate time series at the climatic time scale—longer than one year. The use of monthly data to study these climate-related features makes the situation even worse because it introduces irrelevant results for the shorter time scales. Thus, working with multidimensional time series may be too demanding of the size of the user's time series. The sciences that may regularly have enough observations for such analysis include meteorology, oceanography, hydrology, and, possibly, other areas. Meteorology is a well-developed science which has achieved definite success in weather forecasting through a dynamic approach based upon numerical integration of fluid dynamics equations. It can hardly need any help in that respect from probabilistic considerations, but one has to remember that weather is regarded as a time-dependent random field; indeed, its forecasts obtained through the dynamic method are published with additional probabilistic information.
On the other hand, the researchers working in natural sciences may not be acquainted with autoregressive multivariate time series analysis and, as the main goal of this book is methodological, we will start our examples with analysis of meteorological time series with an accent upon the information provided by AVESTA3 rather than upon interpretation of the results. Consider now some examples based upon trivariate time series of meteorological, climatological, and engineering data.
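As promised above, the sample-size rule N/(D²M) ≥ 10 is easy to encode; a minimal sketch reproducing the numbers quoted earlier:

```python
def max_ar_order(n, d, ratio=10):
    """Largest maximum AR order M satisfying N / (D**2 * M) >= ratio."""
    return n // (ratio * d * d)

print(max_ar_order(900, 3))    # 10: N = 900 suffices for M = 10 when D = 3
print(max_ar_order(4500, 3))   # 50: M = 50 requires N = 4500
print(max_ar_order(90, 3))     # 1:  N = 90 allows only M = 1
```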
4.2 Application to Geophysical Data

Example 4.1 Daily temperature as a function of solar radiation and precipitation

This three-dimensional example is based upon time series of direct observations at the Salt Lake City, UT station (see https://www.epa.gov/ceam/meteorological-data-utah). The Weather Bureau Army Navy (WBAN) number of the station is 24127 and the respective file contains data on surface temperature (the output), solar radiation, and precipitation (the inputs). The time span of the time series is from January 1, 1961 through December 31, 1990 at the daily rate (DT = 1 day). The length of these time series is N = 10,957. With D = 3, the maximum autoregressive order for experiments should be set to 50, which is the maximum for AVESTA3. The CAT.DAT file for this run should be

10957   50   501   0   0   1   1   1    0      0      0      0
N       M    NF    K   R   L   LS  DT   MFLT   KFLT   LFLT   ENDDATE
with the output time series of temperature (TEMP) and the inputs SOL (solar radiation) and PREC (precipitation). The three scalar time series are shown in Fig. 4.1. In what follows, we will use the notation xt = [x1,t, x2,t, x3,t]' for the trivariate time series, meaning that x1,t is the output process (the temperature, in this case) and x2,t, x3,t are the inputs (the solar radiation and precipitation). With DT = 1 day, we will be interested in meteorological time scales starting from a few days, with the Nyquist frequency fN = 0.5 cpd (cycles per day). The AVESTA3 output file will be named TEMP&SOL_PREC.RES. For analysis of the scalar time series (recommended), the value of M should be 99. In the scalar case, the order of the optimal model may be very high and the respective spectra will reveal not only the seasonal trend but also the natural synoptic period and other high-frequency peaks; in the trivariate case, the best autoregressive order is always lower, meaning that the spectra will have fewer details. After the information about the basic statistical moments of each component of the time series, the program provides estimates of covariance and correlation functions (not shown); they can be used for subsequent research, but a simple look at
Fig. 4.1 Time series of the trivariate system xt = [x1t, x2t, x3t]': (a) temperature, (b) solar radiation and precipitation at the Salt Lake City station
the printout of the correlation and cross-correlation function estimates reveals that this trivariate series has a rather complex structure. The correlation function of the output decreases more slowly than the input correlations, while the cross-correlations with solar radiation stay relatively high for more than 50 lags (days). The printout file of the AVESTA3 run with this trivariate time series shows the model AR(8) as the best. Analyzing such a time domain model, consisting of three stochastic difference equations each containing 24 terms, is too cumbersome, but taking a look at it will do no harm. With this long time series, many of the 72 coefficients are significantly different from zero, and some of them are quite large. However, the main analysis is to be done with the frequency dependent functions. For many readers, this will probably be the first acquaintance with frequency domain statistics of a multivariate time series, so we will discuss it here in more detail. The spectrum of the temperature series s11(f) is almost monotonic (Fig. 4.2a) and contains a very smooth maximum at f = 0.1 cpd (the time scale of 10 days). The spectra s22(f) and s33(f) of the input processes (solar radiation and precipitation) are also smooth, and the precipitation spectrum is close to a constant, meaning a white noise model for precipitation, which is typical of this process in general (Fig. 4.2b). The multiple coherence γ²1:23(f) and the two partial coherences γ²12.3(f) and γ²13.2(f) are shown in Fig. 4.3. According to the figure, the multiple coherence is quite high at very low frequencies and achieves a local maximum at about 0.10 cpd, probably related to the spectral maximum shown in Fig. 4.2a; the maximum is statistically insignificant. The horizontal dashed black line in Fig. 4.3a shows the upper 90% confidence limit for the true zero multiple coherence. It is very close to zero, which means that almost all coherences in this figure are meaningful. Note that small but statistically significant coherences may be quite important, as was shown in Example 3.3, where a very low but significant coherence helped us to determine the amplitudes of tidal constituents in Lake Michigan.
Fig. 4.2 Estimated spectra of temperature (a), solar radiation (b, black), and precipitation (b, gray)
Fig. 4.3 Coherence functions for the trivariate time series xt: (a) multiple coherence γ²1:23(f), (b) partial coherences with solar radiation γ²12.3(f) and with precipitation γ²13.2(f)
These results are statistically reliable because, with N = 10,957 and M = 8, the equivalent number of degrees of freedom N/(D²M) ≈ 152, which is quite high. The total contribution of solar radiation and precipitation to the variance of temperature and their individual contributions are defined by the sums of the respective frequency dependent functions. The AVESTA3 program sums up the individual spectral densities of all three spectra s11(f), s22(f), s33(f) and of the coherent spectra s1:23(f), s12.3(f), s13.2(f). The respective values of the multiple and partial coherent spectra shown at the bottom of the spectral estimates in AVESTA3 (look for "SUM …" in the printout) define the contributions of solar radiation and precipitation to the summed-up spectral density of the output time series. If the frequency resolution of the estimates is high, these values are close to the respective variances, so that the ratios of the summed-up multiple and partial coherent spectra to the sum of the output spectra show the
relative share of these contributions to the output variance, as seen from the file TEMP&SOL_PREC.RES. In particular, the sum of the output spectrum values is 108.47 while the sum of the multiple coherent spectrum values is 87.88. Their ratio defines the contribution of solar radiation and precipitation to the temperature variations and, in this case, it is as high as 80% (verified in the sketch below). Before continuing with the analysis of partial coherent spectra, one should take into account an important feature that distinguishes multivariate systems from bivariate ones. In the latter case, we have a single function—the coherent spectrum—that presents the input's linear contribution to the output. If the number of inputs exceeds one, we have to decide in what order the input time series should be given to the processing program. The general situation is that, in order to obtain proper estimates of the contributions from individual inputs, we need to transform the inputs in such a way that they become independent of each other. Details can be found in Chap. 7 of the book by Bendat and Piersol (2010) and in Phillips and Allemang (2022). The choice of input order can be based upon both physical and statistical considerations. Examples of the respective versions of analysis for this three-dimensional time series are shown in Fig. 4.4. If we decide that the solar radiation input x2,t should be first in line because solar radiation affects temperature directly, we will get the distribution of partial coherent spectra shown in Fig. 4.4a. Otherwise, the distribution is as shown in Fig. 4.4b: the contribution of precipitation (solid gray line) to the multiple coherent spectrum will be higher, but the partial coherent spectrum related to solar radiation (solid black line) will become less accurate. In this case, changing the order of the input time series leads to sharply different results for the precipitation partial coherent spectrum (solid black line) while the radiation partial spectrum (dashed black) remains relatively the same, especially at lower frequencies.
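For the record, the figure of about 80% follows directly from the two sums quoted above:

```python
sum_output = 108.47     # sum of output spectral density values (TEMP&SOL_PREC.RES)
sum_multiple = 87.88    # sum of multiple coherent spectrum values
print(f"{sum_multiple / sum_output:.0%}")   # 81%, i.e. roughly 80% of the variance
```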
Fig. 4.4 Spectra of temperature estimated with trivariate models TEMP&SOL_PREC.RES (a) and TEMP&PREC_SOL.RES (b). Radiation- and precipitation-caused partial spectra—black and dashed black
The problem with arranging the inputs seems to exist only for the partial coherent spectra; all other functions of frequency do not show any dependence upon the input order. The gain factors g12.3(f) and g13.2(f) define the response of temperature to solar radiation and to precipitation individually, with account for the possible relation between the two input processes. The latter relation is described with the ordinary coherence function γ²23(f), which is not shown here. The gain factor can be regarded as a frequency dependent regression coefficient. The gain factors describe the response of temperature to a unit change of radiation (a) or precipitation (b). According to Fig. 4.5a, the response of temperature to a change of daily solar radiation by 1 Ly/day decreases almost monotonically from about 0.02–0.05 °C per 1 Ly/day at lower frequencies to 0.01 °C at higher frequencies. A change of daily precipitation by 1 cm causes a response of 4 to 8 °C at lower frequencies and of 4 °C and less at high frequencies. Both functions generally decrease with increasing frequency. The cause of the wide confidence interval is the small values of the partial coherence function relating temperature to precipitation. The phase factor φ12.3(f) between temperature and solar radiation is positive at all frequencies. This is a physically normal situation: the output process lags behind the input (Fig. 4.6a). The phase factor φ13.2(f) between temperature and precipitation (Fig. 4.6b) is negative, which means that the input process lags behind the output, that is, changes of precipitation lag behind variations of temperature. The time lag τ1.23(f), defined as τ1.23(f) = φ12.3(f)/2πf, amounts to about 1–2 days at frequencies above 0.1 cpy (Fig. 4.7). The length of these time series is 30 years, so we need to remember that results of analysis at frequencies below approximately 0.2–0.3 cpy cannot be trusted.
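The conversion from phase factor to time lag is simple arithmetic; a sketch with hypothetical values chosen only to illustrate the order of magnitude:

```python
import numpy as np

f = 0.1                        # frequency in cycles per day (hypothetical)
phi = 1.0                      # phase factor at that frequency, radians (hypothetical)
tau = phi / (2 * np.pi * f)    # time lag of about 1.6 days
print(tau)
```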
Fig. 4.5 Gain factors of the trivariate system xt : temperature’s response to solar radiation (a) and to precipitation (b)
Fig. 4.6 Phase factors of the trivariate system xt = [x1t, x2t, x3t]: temperature, solar radiation, and precipitation
Fig. 4.7 Time lag factors of the trivariate system xt
The entire situation needs attention but the main goal of this example is to show what information is provided by the AVESTA3 program in trivariate cases. This concludes Example 4.1.
4.3 Analysis of Global, Hemispheric, Oceanic, and Terrestrial Data Sets

The first goal of the two following examples is still methodological: to show how the AVESTA3 program should be applied for analyzing trivariate time series (one output and two inputs). This will also allow us to verify frequency domain properties of the
new set of time series obtained by spatial averaging of temperature data over very large areas, up to the entire surface of the globe. An additional task is a test of the relationships between the output time series and the inputs. The two data sets analyzed here have the annual global temperature as the output, with the hemispheric time series (Example 4.2) or the oceanic and terrestrial time series (Example 4.3) as inputs. Both cases are strictly linear and we know that in the first case the inputs contribute equally to the global temperature, while in the second case their contributions should be proportional to the oceanic and terrestrial areas of the globe. Finding strong interdependences between natural processes in a multivariate case may not be easy. The time series selected for Examples 4.2 and 4.3 present rather obvious candidates that describe the behavior of the surface temperature averaged over very large spatial areas: global, hemispheric, oceanic, and terrestrial. The initial data include the time series recently provided by the University of East Anglia (UEA); its previous version was partially analyzed as a trivariate system in Privalsky (2021). The data are available at the UEA web site https://crudata.uea.ac.uk/cru/data/temperature/. The UEA data have a monthly time resolution, but we will transform them into sequences of annual temperature—the best indicators of climate variability.

Example 4.2 Analysis of the recent UEA set: global and hemispheric temperature

In this example, the output process is the annual global temperature and the inputs are the time series of the northern and southern hemispheric temperatures; the time interval for all time series is from 1857 through 2021 (Fig. 4.8). The initial year is selected in this way because, in contrast to the other time series, the terrestrial data for the southern hemisphere are not available for the interval from 1850 through 1856. The time series length is N = 165 yrs. First, we analyze these time series in the scalar version.
Fig. 4.8 Anomalies of global (a) and hemispheric (b) annual temperature, 1857–2021
The CAT.DAT file for the annual data is

165   16   501   0   0   1   1   1    0      0      0      0
N     M    NF    K   R   L   LS  DT   MFLT   KFLT   LFLT   ENDDATE
The maximum order is increased to M = 16 in agreement with the previous recommendation about selecting the maximum autoregressive order M ≤ N/(10D); in the scalar case, the dimension of the time series is D = 1. For convenience, the global and hemispheric time series will be called GL, NH, and SH. The optimal scalar models of all these time series built with the AVESTA1 program are AR(4) and the roots of the respective characteristic equations lie outside the unit circle, confirming that the time series belong to stationary random processes. All spectra contain a smooth peak at f = 0.22 cpy, that is, at the time scale of 4.5 years. The CAT.DAT file for the trivariate case will be

165   4   501   0   0   1   1   1    0      0      0      0
N     M   NF    K   R   L   LS  DT   MFLT   KFLT   LFLT   ENDDATE
The value of the maximum order M is intentionally increased to M = 4 in order to see whether the optimal autoregressive order for this time series can be higher than 1. The estimated scalar spectra are shown in Fig. 4.9. All three spectra contain a maximum at about 0.22 cpy (approximately 4.5 yrs). When attempting to analyze the trivariate time series with the scalar components GL, NH, and SH, we need to keep in mind that the global temperature is obtained as the mean value between the hemispheres: GL = (NH + SH)/2. It is a strictly linear transformation and one should expect that the resulting AR model will be unstable.
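The expected instability is easy to illustrate: a strictly linear combination makes the covariance matrix of the trivariate series singular. A minimal sketch with synthetic stand-ins for the hemispheric series:

```python
import numpy as np

rng = np.random.default_rng(1)
nh = rng.standard_normal(165)          # synthetic stand-ins for NH and SH
sh = rng.standard_normal(165)
gl = 0.5 * (nh + sh)                   # GL = (NH + SH)/2, a strictly linear combination

cov = np.cov(np.column_stack([gl, nh, sh]), rowvar=False)
print(np.linalg.matrix_rank(cov))      # 2, not 3: the covariance matrix is singular
```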
Fig. 4.9 Scalar versions of spectra of annual global (a), northern and southern hemispheric temperature (b, black and gray, respectively). The line with crosses in (a) is the bivariate version of the global temperature spectrum
When this happens, the estimates of the autoregressive coefficients for every or some AR orders become statistically unreliable and the estimation error of some coefficients cannot even be calculated (the "NaN" sign). Therefore, we turn to the frequency domain analysis, which does not show any signs of instability in this case, probably due to some minor noise components introduced during the process of time series generation at the UEA. The maximum AR order M for the multivariate time series is determined from the relation N/(D²M) ≥ 10 and in this case it cannot be higher than M = 1. The AR(1) model turned out to be optimal because it was indicated by all five order selection criteria when the parameter M was set to M = 4. The order of the input processes is defined from a simple physical consideration: the oceanic area is much larger than the terrestrial one and, consequently, the effect of the oceans upon the global surface temperature is stronger. This is especially true for the southern hemisphere (SH), so the SH time series should be set as the first input. The respective AVESTA3 file is named GL&SH_NH.RES. Now, the quantities s11(f), s22(f), and s33(f) are the spectra of the global, southern hemispheric, and northern hemispheric time series of annual temperature. Due to the change of the model from scalar to trivariate, and especially to the very small AR order in the latter case, the spectral estimates lose their maxima at 0.22 cpy and become strictly monotonic. This is quite understandable because the number of parameters to be estimated in the trivariate case is nine times higher than in the scalar case. In this example, we are interested, first of all, in the relations between the global and hemispheric temperatures. The total contribution of the hemispheric time series to the spectral density of global temperature given by AVESTA3 is shown in Fig. 4.10. The practically complete coincidence between the global and multiple coherent spectra in Fig. 4.10a confirms that the system is very close to being strictly linear and that the sum of the contributions from the hemispheric data coincides with the global temperature. As follows from Fig. 4.10b, the contributions of the first and second inputs differ by up to two orders of magnitude and become comparable only at frequencies higher than 0.1 cpy (time scales shorter than 10 years). By ascribing the first role to the temperature x3,t of the southern hemisphere, we rob the northern hemisphere time series x2,t of the linear contribution of x2,t to x3,t, which can be estimated through a bivariate model SH&NH.RES (or NH&SH.RES). If we change the order of the input time series to obtain results from GL5&NH_SH.RES, the major role will be played by the northern hemisphere temperature data (dashed black) while the contribution of the southern hemisphere (dashed gray) becomes negligibly small. These results are unacceptable and in what follows we will estimate the individual contributions of the hemispheric time series through the gain factors rather than through the partial coherent spectra. Going to the estimated coherence functions (multiple and two partial), we find that all three are very close to 1 over the entire frequency band from 0 to 0.5 cpy (Fig. 4.11).
Fig. 4.10 Spectrum of annual temperature (a, black), multiple coherent spectrum (a, symbols), and partial coherent spectra (b, from GL5&SH_NH.RES: black and gray). The dashed lines are from GL5&NH_SH.RES
Fig. 4.11 Multiple (a, with 90% confidence limits) and partial (b) coherence functions between the global and hemispheric temperature time series
This result confirms that the transformation of hemispheric data into global data was strictly linear (we knew it in advance but the program did not) and that the program works correctly in this case of an almost ideal linear transformation. If it were strictly linear, we would not have received any information about the time and frequency domain properties of this trivariate time series. The global temperature is defined as the mean of the two hemispheric temperatures, so the gain factor connecting the hemispheric temperature to the global one should be equal to 0.5; that is, if a hemispheric temperature changes by 1 °C, the global temperature should change by 0.5 °C. And this is what happens with the gain
Fig. 4.12 Gain factors between annual global and hemispheric temperature time series
factors shown in Fig. 4.12. The deviations from the theoretical value of 0.5 do not exceed 0.2%. The phase factor should be equal to zero, and this is what AVESTA3 shows to the user; the deviations from zero are so small that showing them is hardly possible. On the whole, deviations from a strictly linear dependence between the output (the global temperature) and the inputs (the hemispheric temperatures) are insignificant; the time series between 1857 and 2021 do not contain any serious unexpected properties and look quite reliable. This ends Example 4.2. Another way to study and evaluate the data set produced by the UEA is to analyze the trivariate stochastic system with the global temperature as the output and the oceanic and terrestrial time series as inputs.

Example 4.3 Analysis of the recent UEA set: global, oceanic, and terrestrial temperature

The time series of annual global, oceanic, and terrestrial temperature are shown in Fig. 4.13. The fastest increase of annual temperature occurs on land. Our task here will be to analyze the time and frequency domain statistical characteristics of the new annual data set HC5, OCN5, and LND5 and to compare the properties of this set with the respective properties of the previous set HC4, OCN4, and LND4 as elements of respective trivariate time series. The scalar versions of the global, oceanic, and terrestrial spectra contain statistically significant smooth peaks at about 0.24 cpy (Fig. 4.14), while the trivariate system is described with an AR(1) model selected by four order selection criteria; consequently, all spectral densities decrease with frequency strictly monotonically (Fig. 4.15). The differences between the spectra estimated from the HadCRUT4 and HadCRUT5 data sets are small or invisible (terrestrial data) and all spectra contain
Fig. 4.13 Anomalies of global (a), oceanic, and terrestrial annual temperature (HadCRUT5), 1857–2021
Fig. 4.14 Scalar versions of spectra of global (a), oceanic (black) and terrestrial (gray) annual temperature (b). The HadCRUT5 and HadCRUT4 versions are shown with solid and dashed curves and with crosses
a statistically significant peak at f = 0.23 cpy, which corresponds to approximately 4.4 years. Obviously, a relation to the ENSO exists for the oceanic and terrestrial temperature; further analysis of this feature is left to the reader. Also, the trivariate versions of the spectral estimates are similar for the HadCRUT5 and HadCRUT4 data sets (Fig. 4.15). Consider now the behavior of the coherence functions which describe the dependence of the global temperature upon the oceanic and terrestrial temperatures. We have three such functions: the multiple coherence γ²1:23(f) and the partial coherences γ²12.3(f) and γ²13.2(f). The transformation of the oceanic and terrestrial time series into the global temperature
Fig. 4.15 HadCRUT5 and HadCRUT4 estimates of global (black), oceanic (dashed), and terrestrial (gray) annual surface temperature as components of the trivariate system
is strictly linear and the trivariate system should describe it completely, which means that all coherence functions should be equal to one. For the time series belonging to the latest set HadCRUT5, the multiple coherence γ²1:23(f) mostly stays above 0.9 and eventually decreases to 0.89 (Fig. 4.16a). The partial coherences γ²12.3(f) and γ²13.2(f) slightly exceed 0.9 at low frequencies and decrease to 0.73 and 0.66 at f = 0.5 cpy. Analysis of the time series x1,t, x2,t, and x3,t reduced to zero mean values shows that the dependence between them often deviates from the linear transformation x1,t = 0.71x2,t + 0.29x3,t, which seems to be the reason for the unexpectedly low values of the coherence functions.
Fig. 4.16 Multiple (solid black) and partial (dashed and gray) coherence functions describing the dependence of global temperature upon the oceanic and terrestrial temperatures
The coherence functions that correspond to the data set HadCRUT4 behave much closer to a strictly linear relation between the scalar components of the respective trivariate time series (Fig. 4.16b), with the multiple coherence staying above 0.975 and with partial coherences between 0.957 and 0.895. This is better than what we see with HadCRUT5 and the comparisons should be continued. The multiple coherent spectra of the global temperature calculated for the two data sets show more similarity, with some small differences at frequencies higher than 0.05 cpy; the time series HadCRUT5 and HadCRUT4 are practically identical in this respect. The partial coherent spectra will not be considered here due to their doubtful reliability. Consider now the estimates of the frequency response functions. Let the gain factors for the pairs global/oceanic and global/terrestrial temperature be, respectively, g12.3(f) and g13.2(f). As seen from Fig. 4.17a, both lines deviate from the constants 0.71 and 0.29 corresponding to the shares of the Earth's surface taken by oceans and land. The positive or negative deviations from the constants in the HadCRUT5 data can be as high as 15–25%. To compare with the previous results, we analyzed the older UEA data set and found that the deviations from the expected values 0.71 and 0.29 are noticeably smaller for the older set (Fig. 4.17b). Again, it looks like the data set HadCRUT4 agrees with the coefficients 0.71 and 0.29 better than HadCRUT5. The phase and time lag factors show that all three processes are synchronous. This ends Example 4.3. The two latter examples should probably be repeated using the monthly data.
Fig. 4.17 Gain factor estimates for relationship of global temperature with oceanic and terrestrial temperature (the newer and previous UEA data sets)
4.4 Application to Engineering Data

Continuing with the mechanical engineering data, consider the case of a trivariate time series which consists of the output OUT and the inputs IN1 and IN2. As in the previous chapters, our goal is to acquaint the reader/user with the structure of time series analysis that helps engineers to build reliable and long-lasting devices and structures.

Example 4.4 Frequency domain analysis of trivariate time series

An increase in the number of input processes results in a new stochastic linear system which will probably differ in its frequency domain properties from the system with fewer input processes. Computations with AVESTA3 show that changes do occur: the spectra of the output estimated by AVESTA1 in the scalar case and by AVESTA3 in the bivariate and trivariate cases are not identical, and similar changes occur in the other frequency dependent functions. This evolution is described below by comparing the coherence functions and gain factors obtained for systems with one and two inputs (time series dimensions D = 2 and D = 3). When D = 2, we have one coherence function that shows the dependence between the output and the first input IN1 (Fig. 3.37a). In the system with two inputs, there are three coherences: one multiple and two partial. Having a system with one input and one output, we will always have just one gain factor; in the bivariate case, it is shown in Fig. 3.38. In order to see what happens to our results when D is changed from D = 2 to D = 3, we will compare the coherence function γ²12(f) from the example with one input with the multiple coherence γ²1:23(f), which tells us about the relation between the output and both inputs; see also the sketch below. The evolution of coherence caused by switching from the bivariate case to the trivariate one is shown in Fig. 4.18a. The multiple coherence is higher than the simple bivariate coherence, which means that by adding another input we have improved our knowledge of the system. This behavior of the coherence tells upon the coherent spectrum: with two inputs, the coherent spectrum moves closer to the output spectrum (cf. Figs. 3.37b and 4.18b). The partial coherent spectra will not be discussed here due to the problem with the ordering of the input time series. However, the partial coherence estimates produced by AVESTA3 are correct and do not depend upon the input ordering. The output's response to the inputs is described with the gain factor; in the trivariate case, we have the two gain factors shown in Fig. 4.19. Thus, adding an input has resulted in a significant change in the response of the output process to the input IN1: previously, the response of the output to the input IN1 had maxima at frequencies close to 700 Hz and 750 Hz, while the addition of the second input (IN2) drastically changed the picture, leaving just one, but much stronger, peak per input time series: the two amplification coefficients changed from less than 4 to 13 and to 8. Getting new information resulted in a much different gain factor. Actually, the inputs IN1 and IN2 are coherent with each other and this connection must have influenced the gain factors.
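For readers who want to reproduce such a comparison outside AVESTA3, here is a nonparametric sketch: build the 3 × 3 cross-spectral matrix with SciPy's Welch-type estimator and apply the determinant identity for the multiple coherence. The three synthetic channels are invented for illustration; this is not the autoregressive method the program uses:

```python
import numpy as np
from scipy.signal import coherence, csd

rng = np.random.default_rng(0)
n, fs = 200_000, 2560.0
in1 = rng.standard_normal(n)
in2 = 0.5 * in1 + rng.standard_normal(n)        # inputs deliberately coherent
out = (np.convolve(in1, [0.7, 0.3], "same") + 0.4 * in2
       + rng.standard_normal(n))
chans = [out, in1, in2]

f, _ = csd(out, out, fs=fs, nperseg=4096)
S = np.empty((len(f), 3, 3), dtype=complex)     # cross-spectral matrix per frequency
for i in range(3):
    for j in range(3):
        _, S[:, i, j] = csd(chans[i], chans[j], fs=fs, nperseg=4096)

g2_multiple = 1.0 - np.real(np.linalg.det(S)
                            / (S[:, 0, 0] * np.linalg.det(S[:, 1:, 1:])))
_, g2_simple = coherence(out, in1, fs=fs, nperseg=4096)
print(g2_multiple.mean() >= g2_simple.mean())   # multiple coherence is never smaller
```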
Fig. 4.18 Multiple coherence (a), output spectrum (gray) and coherent output spectrum (b)
Fig. 4.19 Gain factors g12.3(f), g13.2(f) and gain factors g12(f) of the bivariate systems OUT/IN1 and OUT/IN2 (black and gray)
Our examples of engineering data analysis do not lead to any scientifically important conclusions, but our goal here was just to show what type of time series analysis is typical in mechanical engineering. This is what an engineering student studying mechanical systems subjected to random vibrations has to deal with, and what type of information needs to be obtained and studied. This ends Example 4.4 and Sect. 4.4.
4.5 Conclusions

This short chapter contains information for the reader and/or user of AVESTA3 that opens the door to analysis of more complicated stochastic systems than a system with one input and one output. Currently, the approach to studying relations between natural phenomena presented with time series is mostly based upon the method of linear regression, which is inapplicable to time series. Some tools that allow one to study multivariate relations are known in climatology, specifically the same regression equations combined with the principal component method and the closely related empirical orthogonal function method. However, both methods are not designed for time series analysis, especially in the frequency domain, which is an absolutely necessary condition because statistical properties of time series are frequency dependent. The most common task in natural sciences in this respect is studying the relation between two time series. That issue is discussed here in Chap. 3 in detail and, hopefully, the reader of the book gets enough information to at least suspect that the regression approach, which belongs neither to the theory of random processes nor to its methods, should not be used for finding dependences between time series. The examples with trivariate time series belonging to natural sciences were given to show the potential of multivariate analysis in meteorology and climatology. Hopefully, the availability of the AVESTA3 program will allow researchers to at least learn that the tools created by mathematicians allow us to start studying complicated geophysical and other natural data. This chapter is written in the optimistic belief that the remarkable achievements reached by engineers, which make our life easier and more efficient, will tempt at least some of us to reconsider the erroneous theoretical basis of time series analysis that currently exists in natural sciences.
References

Bendat J, Piersol A (2010) Random data, 4th edn. Wiley, Hoboken
Phillips A, Allemang R (2022) Frequency response function estimation. Handbook of experimental structural dynamics. https://doi.org/10.1007/978-1-4939-6503-8_8-1
Privalsky V (2021) Time series analysis in climatology and related sciences. Springer
Chapter 5
Conclusions and Recommendations
Once again, all processes generated by nature on the Earth and, seemingly, on the Sun, are random. Consequently, the tools for analysis of observed and simulated time series should agree with the theory of random processes. This book gives researchers working in natural sciences a tool that allows them to study the behavior of nature's indicators over time through stochastic models of that behavior as a set of random processes rather than as sets of time-invariant random variables. By using the instruments of this tool, which include the executable programs AVESTA1 and AVESTA3, the researcher can avoid the necessity of going into the depths of the theory of random processes and into methods of time series analysis built within the framework of this theory. The basic theoretical information about random processes required for this purpose is given in a simple form at the beginning of each chapter: general information in the introductory Chap. 1 and some specific information about scalar, bivariate, and trivariate time series in Chaps. 2, 3, and 4. This information is supposed to be sufficient for understanding the results of analysis provided by the programs, but it also requires experiments with practical application of the programs. In what follows in this chapter, we will assume that the reader is already familiar with the programs, has gained some experience in using them by repeating the examples given in the book, and can use the programs AVESTA1 and AVESTA3 for conducting original research independently of this book. Actually, the book is an attempt to supply natural science researchers with easy-to-use instruments which will keep practical research within the mathematically proper methods of time series analysis. Some of the functions given by AVESTA1 and AVESTA3 are known in natural sciences (e.g., the spectral density and, to a lesser extent, the coherence function); some (stochastic difference equations, mathematically proper extrapolation of stationary time series, statistical predictability criteria, the coherent spectrum, ordinary, multiple, and partial coherence functions, gain and phase factors) must be new for many if not most of us, but the meaning of all those functions should not be difficult to understand. Hopefully, the reader has already acquired some experience in analyzing scalar and multivariate time series and learned how to use the information provided by the programs for solving various research tasks.
The book can be regarded as a methodological text illustrated with specific tools of analysis and with many practical examples; these should be regarded as problems, or exercises, given with questions and answers, helping the reader to acquire knowledge of autoregressive scalar and multivariate time series analysis and experience in applying that knowledge. The urgent need for this or similar means of analysis of natural processes is caused by the current direction of research in natural sciences, which does not follow the theory of random processes and its methods and tries to substitute mathematical statistics for it.

The programs contained in this book produce results of scalar and multivariate time series analysis in agreement with the theory of random processes and, when necessary, with classical mathematical statistics and information theory. They provide data about the time series probability distribution and, whenever possible, quantitative information about the statistical reliability of estimated scalar parameters and of all relevant functions of time and frequency. Results of analysis of individual scalar and multivariate time series given by AVESTA1 and AVESTA3 are supposed to characterize the respective scalar or multivariate processes that occur in nature. By running the programs, the user obtains, in addition to the mean value and variances, estimates of the most important statistics of any time series: the agreement or disagreement of its probability distribution with the Gaussian (normal) distribution, correlation and cross-correlation functions, time domain models, and, what is especially important, estimates of the time series spectra and other functions of frequency.

In what follows, we will try to sum up what new and useful information about time series the researcher of natural processes could have acquired from this book and from its executable programs. We will briefly summarize the abilities of the programs and the essentials of the examples of their application to real and simulated time series. Two issues, the extrapolation of scalar time series and the results of analysis of climate simulations with general circulation models, will be discussed in more detail.

The programs are easy to use when the user knows his or her requirements for the optional preliminary processing of the time series. Combining these preliminary steps and the entire analysis of the resulting time series within a single run of the program makes the process of analysis quite simple. Among other tasks, the examples given in Chap. 2 describe simple transformations of the time series prior to analysis, but we will not discuss them in this concluding chapter. Just make sure that the preliminary processing you intend to do is necessary.

The AVESTA1 program fits a user-prescribed number of autoregressive models to the time series, selects the best models in accordance with special order selection criteria, and provides detailed time and frequency domain information about the series. This includes the autoregressive estimate of the most important statistic of any stationary time series, the spectral density. All estimates of autoregressive coefficients and the selected spectrum estimate are given in a way that prevents the user from committing the widespread sin of showing estimates naked, without respective confidence bounds.
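To make the preceding description concrete, the following sketch reproduces the core of that workflow in Python. It is not AVESTA1 itself: the function name ar_spectrum, the choice of AIC as the order selection criterion, and the chi-squared confidence heuristic at the end are all illustrative assumptions rather than the program's actual internals.

    import numpy as np
    from scipy.stats import chi2
    from statsmodels.tsa.ar_model import AutoReg

    def ar_spectrum(x, max_order=10, dt=1.0, nfreq=500):
        # Fit AR(p) models for p = 1..max_order and keep the one with the lowest AIC.
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        fits = [AutoReg(x, lags=p, trend='n').fit() for p in range(1, max_order + 1)]
        best = min(fits, key=lambda r: r.aic)
        phi = best.params                       # AR coefficients phi_1 .. phi_p
        p = len(phi)
        # One-sided AR spectrum: s(f) = 2*sigma2*dt / |1 - sum_k phi_k exp(-i 2 pi f k dt)|^2
        f = np.linspace(0.0, 0.5 / dt, nfreq)   # from zero to the Nyquist frequency
        h = 1.0 - np.exp(-2j * np.pi * dt * np.outer(f, np.arange(1, p + 1))) @ phi
        s = 2.0 * best.sigma2 * dt / np.abs(h) ** 2
        # Rough 90% confidence factors from a chi-squared approximation with
        # nu ~ 2N/p equivalent degrees of freedom (a heuristic, not AVESTA1's method).
        nu = 2.0 * len(x) / p
        lo, hi = nu / chi2.ppf(0.95, nu), nu / chi2.ppf(0.05, nu)
        return f, s, s * lo, s * hi, p

Whatever tool is used, the point stands: the selected spectrum estimate should never be reported without its confidence bounds.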
With the exception of the PDF characteristics and the forecasting ability, the information provided by AVESTA1 is rather traditional for applied mathematics and
engineering, and it can be obtained in one run of the program. Similar information can also be acquired from other sources, but at the price of more effort. In this author's opinion, the AVESTA1 program is unique in natural sciences because it solves in a single run a number of tasks obligatory for the analysis of scalar time series.

The AVESTA1 executable program tells the user whether the data contained in a given stationary time series can be regarded as belonging to a Gaussian probability distribution. Having this information, the user knows, in particular, whether the time series belongs to an ergodic random process and whether its extrapolation requires additional research. If the sample record is Gaussian, it describes the properties of the entire random process that generated it and, additionally, its linear extrapolation in accordance with the Kolmogorov-Wiener theory produces the best possible forecast, the one with the smallest error variance. Any effort to improve these results is pointless.

Hundreds if not thousands of mathematically incorrect articles on time series forecasting are published every year in our sciences. Many if not most articles do not even mention the time series PDF. Our colleagues participating in these activities have no idea that they are trying again and again to solve a mathematical problem that was completely resolved close to a century ago. Approach your task in agreement with the Kolmogorov-Wiener theory of extrapolation of stationary random processes and you will always get the best possible linear prediction; if the time series is Gaussian, it will be the prediction with the least possible error variance. Hopefully, this theoretical information, supported with practical examples, may make some researchers think twice before offering and trying a new method of time series forecasting.

If your time series is not Gaussian, a higher prediction quality is possible; all you have to do is select the proper PDF, use Yaglom's spectral function for extrapolation (Yaglom 1962) or develop a new method of nonlinear extrapolation specifically for that PDF and the time series model, extrapolate at lead times from one to a prescribed maximum, and define the prediction error variance as a function of lead time. The book contains a remarkable presentation by A. Yaglom and his young colleagues about nonlinear forecasting of non-Gaussian processes, published in Russian 61 years ago and translated in this book into English seemingly for the first time. The main conclusion, supported with 13 mathematically strict examples of nonlinear extrapolation, is: if you are extrapolating a non-Gaussian time series in agreement with the type of its probability distribution and its spectral density, your result can be better than the linear Gaussian extrapolation by just a few per cent. Compare your solution with the classical linear result and decide whether the game is worth the effort. Attachment 2.2 contains Yaglom's paper as prepared by Springer.
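To illustrate what "the best possible linear prediction" means in practice, here is a minimal sketch, assuming a zero-mean AR(p) model whose coefficients phi and innovation variance sigma2 have already been estimated (for example, by AVESTA1); the function name ar_predict is hypothetical. For a Gaussian series, this recursive forecast is exactly the Kolmogorov-Wiener optimal predictor.

    import numpy as np

    def ar_predict(x, phi, sigma2, max_lead):
        # Recursive linear extrapolation of a zero-mean AR(p) series.
        p = len(phi)
        hist = list(x[-p:])
        forecast = []
        for _ in range(max_lead):
            xhat = sum(phi[k] * hist[-1 - k] for k in range(p))
            forecast.append(xhat)
            hist.append(xhat)     # unknown future values are replaced by their forecasts
        # Error variance at lead m: sigma2 * sum_{j=0}^{m-1} psi_j^2,
        # where psi_j are the moving-average (psi) weights of the AR model.
        psi = [1.0]
        for j in range(1, max_lead):
            psi.append(sum(phi[k - 1] * psi[j - k] for k in range(1, min(j, p) + 1)))
        err_var = sigma2 * np.cumsum(np.square(psi))
        return np.array(forecast), err_var

As the lead time grows, err_var approaches the process variance and the forecast decays to the mean value, which is exactly the limited statistical predictability discussed in Chap. 2.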
In the multivariate case, the bi- or trivariate time domain models are used to obtain a number of frequency-dependent functions which characterize the features of the multivariate time series related to possible interdependences between its components, which are treated as the output and input(s) of a linear stochastic system. Chapter 3, dedicated to bivariate time series analysis, is probably the most useful part of the book.

The position that dominates natural sciences, climatology first of all, is that the relation between two time series should be studied on the basis of the
cross-correlation coefficient and the regression equation, possibly with some additional decorations. The fact that the correct approach was described in information theory 66 years ago (Gelfand and Yaglom 1957), confirmed in econometrics no later than 59 years ago (Granger and Hatanaka 1964), and developed into mathematically proper engineering methods in a series of classical books between 1966 and 2015 (e.g., Bendat and Piersol 1966; Box et al. 2015) is still ignored in natural sciences. The same fate befell the few publications by this author between 1988 and 2021 (Privalsky 1988, 2015, 2018, 2021; Privalsky and Jensen 1995) describing the correct method with examples from climatology and solar research. This problem with the lack of a correct mathematical basis is known to climatologists, but it can hardly be fixed at this time (e.g., see IPCC 2013).

If the reader using the traditional regression approach to time series reconstruction was not familiar with this situation, Examples 3.1 and 3.2, which illustrate the inability of the traditional approach, should have convinced him or her that the traditional approach is wrong. Hopefully, the example of time series reconstruction conducted with the mathematically proper method suggested by this author in 2018 may persuade some researchers that the currently dominant approach should itself be reconstructed.

The high sensitivity of bivariate autoregressive analysis to the presence of a very weak but regular signal in a bivariate time series is proved in Example 3.3, while the following two examples present cases in which the dependence between the global temperature and ENSO is detected and analyzed at climatic time scales. The mutual dependence between these phenomena does exist, but it cannot affect the low-frequency part of the global temperature variations.

Section 3.4 should have convinced the reader that a proper approach to teleconnection research may bring important results: applying AVESTA3 to the bivariate time series HadCRUT5/NINO3.4 resulted in the detection of a statistically reliable teleconnection between the globally averaged surface temperature and the oceanic component of ENSO. The important feature of this effort is that it demonstrates an interconnection at climatic scales, from years to decades. Examples 3.4 and 3.5 demonstrate a reliable linear dependence between the global temperature and ENSO within the frequency band from roughly 0.1 cpy to 0.4 cpy (time scales from 2.5 to 10 years), especially strong at time scales from 3 to 5 years. It also includes a closed feedback loop, which shows a weak dependence of the global temperature upon ENSO and a strong effect of the global temperature upon the oceanic component of ENSO. The connection remains strong at intermediate frequencies and disappears at time scales longer than 10 years. These results present an updated and expanded version of previous research (Privalsky 2021).

The mathematically proper and rather simple method of time series reconstruction described in Sect. 3.5 (Example 3.6) should convince the reader that the suggested method of autoregressive reconstruction (ARR) is much more efficient than the traditional approach through a regression equation: the respective shares of the reconstruction amount to 69% and 3% of the reconstructed time series variance. The ARR method has been known for five years, but it has not yet displaced the mathematically incorrect traditional methods used, in particular, in the IPCC climate change programs (e.g., IPCC 2013 and later).
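Readers who want a quick, independent cross-check of such frequency-dependent links before turning to AVESTA3 can estimate the squared coherence nonparametrically. The sketch below uses Welch's method from SciPy rather than the autoregressive estimate advocated in this book, so it should be treated only as a rough consistency check; the synthetic series, the segment length, and the significance rule of thumb are all illustrative assumptions.

    import numpy as np
    from scipy.signal import coherence

    rng = np.random.default_rng(0)
    # Placeholders: substitute the real monthly series, e.g. HadCRUT5 and NINO3.4.
    temp = rng.standard_normal(1560)            # 130 years of monthly values
    nino = 0.5 * temp + rng.standard_normal(1560)

    fs = 12.0                                   # samples per year, so f is in cpy
    nper = 256
    f, coh2 = coherence(temp, nino, fs=fs, nperseg=nper)

    # Approximate 95% threshold for zero true coherence with n independent
    # segments (a common rule of thumb): coh2 > 1 - alpha**(1/(n - 1)).
    nseg = len(temp) // nper                    # conservative count of segments
    thresh = 1.0 - 0.05 ** (1.0 / (nseg - 1))
    band = (f >= 0.1) & (f <= 0.4)              # the frequency band discussed above
    print('coherence significant in band:', bool(np.any(coh2[band] > thresh)))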
Simulating the Earth's climate on the basis of fluid dynamics is an attempt to build models of a random process through numerical integration of deterministic equations. It may be the only way to build models capable of reproducing properties of the observed climate and then using them for projecting climate's response to external forcings. The simulated time series of global temperature obtained with these numerical versions of the dynamic equations present sample records of an artificially produced climate, which becomes random due to the changes in the initial conditions, the discrete character of the computational grid, and, possibly, some other reasons. The ability of the AVESTA1 and AVESTA3 programs to serve as instruments for estimating the quality of climate simulations with general circulation models is applied in this book to validating simulated climate consisting of a set of scalar time series (Chap. 2) and of a set of bivariate time series connecting the global temperature to the ENSO phenomenon (Chap. 3).

The attempt to verify the results of climate description with the general circulation models used in CMIP6 brought an unexpected result. In contrast with the results of analysis of observation data, which show a connection between the global temperature and ENSO within a limited band of frequencies from 0.1 cpy to 0.4 cpy, the GCMs extend this link to frequencies below 0.1 cpy, so that ENSO becomes the dominant source of climate variability at decadal and longer time scales. This feature of simulated climate disagrees with observations and can hardly be explained physically.

Unfortunately, the latest IPCC report available for this book, published in 2013, shows no interest in the fact that climate presents a multivariate random process. Moreover, this entire fundamental scientific document, titled The Physical Science Basis (1535 pages), does not even contain the term 'random process' and does not seem to pay enough attention to the analysis of current observations that describe multivariate processes. Application of AVESTA3 to simulation data made this critical defect obvious. According to the report, "[t]here is high confidence that the El Niño-Southern Oscillation (ENSO) will remain a dominant mode of natural climate variability in the twenty-first century with global influences ..." (IPCC 2013, pp. 106, 1243). This statement is supported by climate simulations with GCMs in the CMIP5 and CMIP6 experiments (also see Privalsky and Yushkov 2014). However, those results disagree with observations, which show that the ENSO influence upon the average global temperature is small and concentrated within the frequency interval where the energy of climate variability is weak. According to observations, ENSO is not related to climate variations at time scales longer than about 7–10 years.

This error in the results of GCM simulations means that the mechanism generating the variability of climate at lower frequencies (time scales longer than one decade) is reproduced incorrectly. The frequency band below 0.1 cycles per year is the most important for explaining the current climate warming and, in particular, for obtaining projections of climate's behavior under the external anthropogenic forcing. The statement quoted above regarding the dominant role of ENSO in the generation of climate variations strongly disagrees with what is observed in nature but is present in GCM simulations.
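If the reader wishes to repeat this kind of check, the comparison reduces to estimating the spectra of the observed and simulated global temperature series and comparing them below 0.1 cpy. A minimal sketch, reusing the hypothetical ar_spectrum function from the earlier sketch in this chapter and random placeholders in place of the real annual series, might look as follows.

    import numpy as np

    rng = np.random.default_rng(1)
    obs = rng.standard_normal(170)   # placeholder: observed annual global temperature
    sim = rng.standard_normal(170)   # placeholder: one GCM-simulated realization

    f_obs, s_obs, *_ = ar_spectrum(obs, max_order=5, dt=1.0)
    f_sim, s_sim, *_ = ar_spectrum(sim, max_order=5, dt=1.0)

    low = f_obs < 0.1                # the band below 0.1 cpy discussed above
    ratio = np.mean(s_sim[low] / s_obs[low])
    print('mean simulated/observed spectral ratio below 0.1 cpy:', round(ratio, 2))
    # Ratios far above 1, with coherence extending into this band, would reproduce
    # the disagreement between GCM output and observations described in the text.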
In this author's opinion, this flaw in the simulated ENSO-climate link makes the numerical climate models incorrect, and all projections made on their basis should be regarded as unreliable. In addition, reconstructions of past climates play an important role in supporting the results obtained with numerical models of climate. However, as stated in the fundamental review of large-scale temperature reconstructions by Christiansen and Ljungqvist (2017, p. 13), "[a] variety of different methods has been applied throughout the literature, but they have almost always been linear and based on some form of univariate or multivariate linear regression". The reader of this book already knows that this approach is wrong (Sect. 3.5). This combination of erroneous results makes the conclusions about the probable behavior of climate in the twenty-first century untrustworthy. To resolve the problem, the conclusions made in the IPCC report and in other similar documents should be reviewed by experts who do not participate in any climate change programs and whose expertise covers, in particular, aero- and hydrodynamics (e.g., aircraft design and naval architecture) and mechanical engineering (e.g., car, aircraft, and spacecraft design).

On the whole, the book has given the reader and user two mathematically correct and easy-to-use instruments for the analysis of scalar, bivariate, and trivariate time series. It also contains a number of practical examples which show the abilities of the programs in different areas of natural sciences. The examples are given, first of all, for practical analysis of time series by the reader, but some of them show interesting original results. Hopefully, the book will be found useful by researchers studying natural phenomena.

A number of recommendations regarding practical time series analysis were given recently in the previous book by this author (Privalsky 2021). Judging by what can be seen in current publications, those recommendations remain useful or even mandatory; they are partially reproduced below:

– do not use estimates of statistical parameters and functions without respective confidence intervals at a specified confidence level, which are always provided within the methods developed by professionals; estimates without confidence bounds have absolutely no value;
– do not forget to estimate the probability density function of your time series;
– do not filter your time series without first proving the necessity of the operation;
– do not use any linear method of time series extrapolation (prediction, forecasting) not based upon the Kolmogorov-Wiener theory; if you use a nonlinear method, prove that it is more efficient than the linear method;
– do not estimate too many statistical parameters when analyzing a time series;
– do not apply methods of classical mathematical statistics to describe relations between time series.

Actually, all these and some other worries about applying improper methods and obtaining incorrect results become irrelevant if the user follows the foremost recommendation for anyone interested in time series analysis: study and forecast time series with methods developed in agreement with the theory of random processes. Good luck!
References

Bendat J, Piersol A (1966) Measurement and analysis of random data. Wiley, New York
Box G, Jenkins G, Reinsel G, Ljung G (2015) Time series analysis: forecasting and control, 5th edn. Wiley, Hoboken
Christiansen B, Ljungqvist F (2017) Challenges and perspectives for large-scale temperature reconstructions of the past two millennia. Rev Geophys 55:40–96
Gelfand I, Yaglom A (1957) Calculation of the amount of information about a random function contained in another such function. Uspekhi Matematicheskikh Nauk 12:3–52. English translation: American Mathematical Society Translation Series 2(12):199–246, 1959
Granger C, Hatanaka M (1964) Spectral analysis of economic time series. Princeton University Press, Princeton
IPCC (2013) Climate change 2013: the physical science basis. In: Stocker TF, Qin D, Plattner G-K, Tignor M, Allen SK, Boschung J, Nauels A, Xia Y, Bex V, Midgley PM (eds) Contribution of Working Group I to the fifth assessment report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge and New York, 1535 pp
Privalsky V (1988) Stochastic models and spectra of interannual variability of mean annual sea surface temperature in the North Atlantic. Dyn Atmos Ocean 12:1–18
Privalsky V, Jensen D (1995) Assessment of the influence of ENSO on annual global air temperature. Dyn Atmos Ocean 22:161–178
Privalsky V, Yushkov V (2014) ENSO influence upon global temperature in nature and in CMIP5 simulations. Atmos Sci Lett. https://doi.org/10.1002/asl2.548
Privalsky V (2015) On studying relations between time series in climatology. Earth Syst Dynam 6:389–397. https://doi.org/10.5194/esd-6-389-2015
Privalsky V (2018) A new method for reconstruction of solar irradiance. J Atmos Sol-Terr Phys 172:138–142
Privalsky V (2021) Time series analysis in climatology and related sciences. Springer
Yaglom A (1962) Introduction to the theory of stationary random functions. Prentice Hall