Viacheslav Karmalita
Stochastic Dynamics of Economic Cycles
Mathematics Subject Classification 2010: 37N40

Author
Dr. Viacheslav Karmalita
255 Boylan Avenue
Dorval, QC H9S 5J7
Canada
ISBN 978-3-11-070698-7
e-ISBN (PDF) 978-3-11-070702-1
e-ISBN (EPUB) 978-3-11-070703-8
Library of Congress Control Number: 2020941877

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2020 Walter de Gruyter GmbH, Berlin/Boston
Cover image: duncan1890/E+/Getty Images
Typesetting: Integra Software Services Pvt. Ltd.
Printing and binding: CPI books GmbH, Leck
www.degruyter.com
Preface

People's acquaintance with cyclic phenomena begins from the start of their conscious life. We are faced with natural cycles such as the change from day to night. Intuitively, cyclic behavior is perceived by us as repetition of the same event. The time interval between two consecutive occurrences of the event is called the cycle period, hereinafter denoted as T_0. In the example of the change from day to night, T_0 = 24 hours. Another name for cyclic processes is periodic processes, meaning that the event in question appears after a defined time period.

In our development, we begin to recognize the presence of causal relationships in the observed natural cycles. For example, we learn that the change from day to night occurs due to the rotation (motion) of the earth (object) in outer space (environment). Our acquaintance with cyclic phenomena significantly expands upon subsequent exposure to a variety of machines and structures which are the products of human activity. This occurs when we meet artificially created objects that are present in our everyday life. Thus, an understanding emerges that, first, cyclic processes are a universal concept inherent in phenomena both natural and artificial. Second, they can have various reasons for their existence.

For example, we can watch a rider swing with the motorcycle up and down on rough roads. The source of the swaying motion (oscillations) is the external force on the motorcycle wheels due to road irregularities. The motorcycle is usually equipped with shock absorbers, which are a kind of elastic system. If such a system is temporarily removed (deformed) from its equilibrium state by an external force, then the system oscillates for some time around its equilibrium state. Such oscillations are called natural (free) oscillations. Their period T_0 is determined by the oscillating mass m_s and the spring constant k_s of the shock absorber. The latter is a measure of the stiffness (resistance to deformations) of the shock absorber, whereas m_s is equal to the mass of the rider plus the motorcycle over the shock absorbers. Therefore, a sudden impact to the wheel results in oscillations of the motorcycle with a certain frequency f_0:

f_0 = \frac{1}{T_0} = \frac{1}{2\pi}\sqrt{\frac{k_s}{m_s}}.

The natural frequency is measured in the number of cycles during a given time period. Oscillations caused by a single impact gradually fade away. The reason for their attenuation is the resistance force due to internal friction in the components of the shock absorber. The property of an elastic system to absorb cyclic strain energy is called the damping ability, which is characterized by the damping factor hereinafter
denoted as h. In this way, the shock absorbers smooth out the impact of road bumps, providing for safe handling and a comfortable ride.

If we turn to economic systems, being the products of human activities, they are also characterized by the presence of fluctuations in their output. These fluctuations are called economic cycles. This book deals with cycle models, the availability of which allows us to describe the behavior (dynamics) of economic cycles over time. Such knowledge provides an opportunity to determine a strategy to control the system in order to achieve the desired economic results.

Since the economy is an environment involving the interaction of numerous entities, it is quite fair to admit the presence of stochastic elements in the results of their activities. Therefore, elements of probability theory and mathematical statistics are presented in Chapters I and II. The scope and depth of their presentation are quite sufficient for understanding the approaches and methods discussed in the following text. Chapter III discusses continuous and discrete models of oscillatory systems. The conditions that provide a discrete adequacy of those systems are determined as well. This chapter concludes with discussions of statistical methods to estimate the parameters of the concerned models. Chapter IV introduces a probabilistic description of the investment function as well as appropriate models which correspond to the stochastic nature of economic cycles. The suitability of the models for macroeconomic analysis and econometric estimations is discussed as well. Chapter V is devoted to detailed consideration of the features of estimating cycle characteristics. The necessity for such discussion arises due to the absence of direct estimates of the income function and its multicomponent nature, as well as the nonstationarity of economic systems. At the end of this chapter, the accuracy of the results of estimation procedures is analyzed.

This book is intended for professionals dealing with macroeconomic studies and econometric estimations as well as for graduate and postgraduate students specializing in these branches of knowledge.
Contents

Preface

Introduction

Chapter I Elements of Probability Theory
1.1 Random Variable
1.2 Function of Random Variables
1.2.1 Function of One Argument
1.2.2 Function of Multiple Arguments
1.3 Random Process
1.4 Linear Time Series

Chapter II Adaptation of Probabilistic Models
2.1 Processing of Experimental Data
2.2 Criterion of Maximum Likelihood
2.3 Properties of Maximum Likelihood Estimates
2.4 Least-Squares Method
2.5 Statistical Inferences

Chapter III Stochastic Oscillatory Processes
3.1 Random Oscillations
3.2 Yule Series
3.3 Discrete Coincidence of Yule Series and Random Oscillations
3.4 Estimation of Yule Model Factors
3.4.1 Estimates of Maximum Likelihood
3.4.2 Least-Squares Estimation

Chapter IV Modelling of Economic Cycles
4.1 Probabilistic Description of Investment Function
4.2 Stochastic Models of Economic Cycles
4.3 Prospects of Applying Cycle Models

Chapter V Features of Estimation Procedure
5.1 Recovery of the Income Function from GDP Estimates
5.1.1 Ill-Posed Recovery Task
5.1.2 Discrete Implementation of the Recovery Procedure
5.2 Decomposition of the Income Function
5.3 Pseudo-Stationarity of Economic Cycles
5.4 Accuracy of Estimates
5.4.1 Statistical Errors
5.4.2 Bias of Estimates

Summary

References

Index
Introduction

An inalienable part of human activity is cognitive action. It is realized on the basis of formulated cognitive tasks that allow a research problem to be decomposed into sequential steps. Cognitive tasks may be classified as theoretical and empirical. An example of theoretical tasks in macroeconomics is the development and research of models of certain economic systems (regional, national and global). In this case, the model is understood to be a theoretical (mathematical) image of a real phenomenon (process) created for in-depth study of reality. A system model is created based on the concept of its wholeness [1]. An economy, being a material object, can be modeled as a geographically (two-coordinate configured) distributed multidimensional (several inputs/outputs) system. Therefore, its output X(t) and input Y(t) may be represented as vectors:

X(t) = \begin{pmatrix} X_1(t) \\ \vdots \\ X_k(t) \end{pmatrix} = \{X_j(t)\}, \quad j = 1, \dots, k;

Y(t) = \begin{pmatrix} Y_1(t) \\ \vdots \\ Y_m(t) \end{pmatrix} = \{Y_l(t)\}, \quad l = 1, \dots, m.

Any economic system is dynamic, that is, the output X(t) at any point in time is defined by the input Y(t) at the same instant. In addition, there is a rule that determines the evolution of a system's initial state over time [2]. In the general case, such a dynamic system is nonlinear. In other words, for different input values, the same increment of input may lead to a different increment of output. As an economic system has external interactions with its operating environment (market) in the form of matter and information, the system can be classified as open. Moreover, it is dissipative [3] because it supplies (loses) a product (matter) to the market. Such a system includes localized and interconnected objects (business entities). They can arise and disappear in distributed nonlinear dissipative systems due to self-organization mechanisms inherent to this system [4]. In the case of economic systems, the elements of these mechanisms are establishment, bankruptcy, merger, acquisition, restructuring and so on. If each output X_j(t) and input Y_l(t) is integrated inside of a designated area (region, nation and world), the corresponding systems will be lumped. Models of such systems can be written in terms of partial differential equations (PDE) as follows:

\Lambda X(t) = Y(t). \qquad (1)
The mathematical operator Λ (a system of differential equations) relates the input Y(t) to a system's state X(t) under initial and boundary (interactions with a market) conditions. When changes in time of the system's outputs are described in terms of their values as well as their velocity and acceleration, then the equations of operator Λ in (1) will include PDEs of the second order. In other words, they will contain the first and second partial derivatives of the output X(t). If we consider the system with only one input Y_l and one output X_j, then its model will be one-dimensional. Such a model can be described by an ordinary differential equation of the second order. The latter is written in the following form:

F(t, Y_l, X_j, \dot{X}_j, \ddot{X}_j) = 0, \qquad (2)

where \dot{X}_j and \ddot{X}_j are the first and the second derivatives, respectively:

\dot{X}_j(t) = \frac{dX_j(t)}{dt}; \qquad \ddot{X}_j(t) = \frac{d^2 X_j(t)}{dt^2}.
Empirical tasks consist of disclosure and examination of facts related to the studied economic systems. Solutions of such tasks are obtained by means of a specific cognitive method called estimation, which provides quantified data about the researched system. The application of this method is carried out by econometricians with the use of mathematical methods and economic data. The latter are the result of observing Y_l(t) and X_j(t) over time, which provides an ability to estimate the coefficients of a system model. In other words, these coefficient values provide a quantitative description of the correlation between the system's input and output. Thus, macroeconomic models can be tested and verified. The presence of a verified model provides abilities to analyze its current state and predict the system's behavior over time. In turn, this knowledge forms the basis for subsequent development of an appropriate governing strategy to achieve desired economic results. The economic data is represented as the values of calculated indicators (characteristics). Their specificity is due to the fact that such data are available only at discrete instances of time t_i = Δt·i (i = 1, . . ., n). That is, data have the form of a time series represented by a sequence of real numbers y_i = Y(t_i) and x_i = X(t_i). Therefore, the system models have to have a discrete form [5], and for their development, economists use finite differences of the input and output of the system:

\Delta y_i = y_i - y_{i-1}; \qquad \Delta x_i = x_i - x_{i-1}.

Essentially, the difference equations are approximations of differential equations (2) obtained by replacing the derivatives with corresponding finite differences. As a result,
the difference model, being an approximation of the original continuous model, features a methodical error. From the definition of the derivative of a variable in the form

\dot{X}(t_i) = \lim_{\Delta t \to 0} \frac{\Delta x_i}{\Delta t},

it follows that if Δt is finite, then

\dot{X}(t_i) \approx \frac{\Delta x_i}{\Delta t}.

Therefore, due to the discrete nature of the economic data, the approximation error cannot be eliminated completely. Meanwhile, there is a long-term practice of using a time series which discretely coincides (no approximation error) with an ordinary differential equation [6]. The advantages of time-series models lie in the presence of well-developed procedures to estimate their factors [7]. This fact results in the ability to create a straightforward mathematical description of the estimation procedure from time series values and to obtain the estimates of coefficients of a differential equation. In addition, matching continuous and discrete models can provide a one-to-one conversion of econometric characteristics to coefficients of differential equations. This conversion will give an opportunity to analyze the behavior and properties of the economic models in terms that are generally accepted in human practice (efficiency, loss, gain, natural frequency and so on). It should be noted that until now, the deterministic approach has dominated the modeling of economic cycles [8]. This fact is not surprising because the economy is a megasystem with very slow processes that can last for years or even decades. Accordingly, the human mentality perceives them as pseudo-deterministic. However, these processes are a result of activities of millions of investors whose actions are not synchronized. Furthermore, they may also be opposite, for instance, while one makes investments another divests them. The random nature of such activities motivated the author to use stochastic approaches for the study of economic cycles. This cognitive action will include both the development of cycle models as well as methods to estimate their parameters.
Chapter I Elements of Probability Theory

Stochastic approaches presented in this book make it necessary to discuss mathematical models related to the concept of probability. This chapter deals with models that are mathematical descriptions of random phenomena. The following material is devoted to presentation of such models as well as examination of their properties.
1.1 Random Variable

Let us turn to a certain phenomenon whose observation results form a sample space A. Elements (points) of this space may be grouped in different ways into subspaces A_1, . . ., A_i, . . ., A_k referred to as events. Appearance of an experimental result inside any subspace implies the occurrence of a specific event. That is to say, the experiment always results in the event:

A = A_1 + A_2 + \dots + A_k.

A certain event A_i may be given a quantitative characteristic through the frequency of this event's occurrence in n experiments. Let m(A_i) be the number of experiments in which the event A_i was observed. Then the frequency ν(A_i) of this event (event frequency) can be determined by the following expression:

\nu(A_i) = m(A_i)/n.

It is evident that the event frequency can be calculated only at the experiment's completion and, generally speaking, depends on the kind of experiments and their number. Therefore, in mathematics, an objective measure P(A_i) of the event frequency is postulated. The measure P(A_i) is called the probability of the event A_i and is independent of the results in individual experiments. It is possible to state that:

P(A_i) = \lim_{n \to \infty} \nu(A_i).
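To make the link between frequency and probability tangible, here is a minimal simulation sketch (assuming Python with NumPy; the die example and all numbers are illustrative, not from the text): the event frequency ν(A_i) approaches the probability P(A_i) as the number of experiments n grows.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Event A_i: rolling a six with a fair die, P(A_i) = 1/6.
for n in (100, 10_000, 1_000_000):
    rolls = rng.integers(1, 7, size=n)   # n independent experiments
    m = np.count_nonzero(rolls == 6)     # m(A_i): occurrences of the event
    print(f"n = {n:>9}: nu(A_i) = {m / n:.4f}")
# The printed frequencies converge to P(A_i) = 1/6 ≈ 0.1667.
```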
If the experiment result is represented by a real number Ξ called a random variable, one may represent events in the form of conditions Ξ ≤ ξ, where ξ is a certain number. In other words, an event may be determined as a multitude of possible outcomes satisfying the inequality Ξ ≤ ξ. The probability of such an event is a function of ξ and is called the cumulative distribution (or just distribution) function F(ξ) of the random variable Ξ:

F(\xi) = P(\Xi \le \xi).
It is clear that if a ≤ b, then

F(a) \le F(b); \qquad F(-\infty) = 0; \qquad F(+\infty) = 1.

Any distribution function is monotonic and nondecreasing. An example of such a function is represented in Fig. 1.
Fig. 1: A view of probability distribution function.
If the probability distribution function F(ξ) is continuous and differentiable, its first derivative of the form

f(\xi) = \frac{dF(\xi)}{d\xi}

is termed the probability density function (PDF) of the random variable Ξ. Note that:

P(\Xi \le a) = F(a) = \int_{-\infty}^{a} f(\xi)\,d\xi;

P(a \le \Xi \le b) = \int_{a}^{b} f(\xi)\,d\xi = F(b) - F(a);

\int_{-\infty}^{\infty} f(\xi)\,d\xi = 1.

There are parameters of the distribution function that are often used instead of the function itself. One of these parameters is the mathematical expectation of the variable Ξ:

\mu_\xi = M[\Xi] = \int_{-\infty}^{\infty} \xi \cdot f(\xi)\,d\xi.
The expectation of any real, single-valued, continuous function g(Ξ) may be expressed in a similar way:

M[g(\Xi)] = \int_{-\infty}^{\infty} g(\xi) \cdot f(\xi)\,d\xi.
Note that mathematical expectations are not random but deterministic values. Of particular interest are functions of the type:

g_l(\Xi) = (\Xi - \mu_\xi)^l,

whose expectations are referred to as the lth-order central moments denoted as

\alpha_l = M\left[(\Xi - \mu_\xi)^l\right].

Specifically, the value α_2 = D_ξ = σ_ξ² is the lowest-order moment which evaluates the mean squared deviation of a random variable around its expectation. This central moment is called the variance, and σ_ξ is referred to as the root-mean-square (rms) deviation.

As an example, let us examine the probability density of random variables referred to in this text. The first example is related to a probability scheme characterized by maximum uncertainty of the results. It is a case when all values of Ξ in the range a . . . b have the same probability. The corresponding probability density (called uniform) of such a random variable Ξ is

f(\xi) = \begin{cases} \dfrac{1}{b-a}, & a \le \xi \le b; \\ 0, & \xi < a,\ \xi > b. \end{cases}

A view of the uniform PDF is represented in Fig. 2.
Fig. 2: Uniform PDF.
The uniformly distributed variable Ξ has the mathematical expectation

\mu_\xi = \int_{a}^{b} \xi \cdot f(\xi)\,d\xi = \frac{a+b}{2},

and variance

D_\xi = \int_{a}^{b} (\xi - \mu_\xi)^2 \cdot f(\xi)\,d\xi = \frac{(b-a)^2}{12}.
The uniform distribution has its merit when one is looking for maximum entropy of empirical data. Another type of probability density under examination is called the normal (Gaussian) distribution. The distribution of the normal value is described by the Gauss law:

f(\xi) = \frac{1}{\sqrt{2\pi}\,\sigma_\xi} \exp\left[-\frac{(\xi - \mu_\xi)^2}{2D_\xi}\right].

A view of the PDF of the normal random value with μ_ξ = 0 is represented in Fig. 3.
Fig. 3: The Gaussian PDF.
Here, ξ_α is the α-probability value of the random variable that meets the following condition:

P(\Xi \le \xi_\alpha) = \int_{-\infty}^{\xi_\alpha} f(\xi)\,d\xi = \alpha.

In other words, this value determines the PDF's segment located to the left of ξ_α whose area equals the probability P = α. The Gauss distribution function is completely defined by two moments: μ_ξ and D_ξ. In this case, the expectation is a center of a grouping of random variable values, with the variance being a measurement of their scattering around the expectation. When the variance is small, random variable values are grouped in the neighborhood of the expectation μ_ξ, and if σ_ξ is large, generally speaking, the values will be more spread around the mathematical expectation. The importance of the Gaussian distribution in probability theory is based on the central limit theorem. Its simplified interpretation states that summation (action) of a large number of independent random values (factors) with similar distributions produces a random value (result) with a distribution tending to the normal distribution.
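The following is a minimal numerical sketch of this statement (assuming Python with NumPy; the choice of 12 uniform terms is arbitrary): a sum of independent uniform variables is compared against the Gauss law.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Sum 12 independent uniform variables on [0, 1) many times.
n_terms, n_samples = 12, 100_000
sums = rng.random((n_samples, n_terms)).sum(axis=1)

# Theoretical moments of the sum: mu = n/2, D = n/12.
print(f"sample mean     = {sums.mean():.3f}  (theory: {n_terms / 2})")
print(f"sample variance = {sums.var():.3f}  (theory: {n_terms / 12:.3f})")

# Compare the histogram with the Gauss law.
mu, sigma = n_terms / 2, np.sqrt(n_terms / 12)
hist, edges = np.histogram(sums, bins=50, density=True)
centers = (edges[:-1] + edges[1:]) / 2
gauss = np.exp(-(centers - mu) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
print(f"max |histogram - Gauss PDF| = {np.abs(hist - gauss).max():.4f}")
```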
In practical applications, there is often a need to simultaneously consider a set of random variables characterizing the observed phenomenon. In this case, it is more practical to use a random vector Ξᵀ = (Ξ_1, . . ., Ξ_n) instead of several random variables Ξ_1, . . ., Ξ_n. The symbol "T" denotes the transposition of the vector Ξ:

\Xi = \begin{pmatrix} \Xi_1 \\ \vdots \\ \Xi_n \end{pmatrix}.

When a vector variable is used, one has to deal with a multivariate distribution function of the kind:

F(\xi^T) = F(\xi_1, \dots, \xi_n) = P(\Xi_1 \le \xi_1, \dots, \Xi_n \le \xi_n).

If the function F(ξᵀ) has partial derivatives with respect to ξ_i, the joint probability density of variables Ξ_1, . . ., Ξ_n has the form:

f(\xi^T) = f(\xi_1, \dots, \xi_n) = \frac{\partial^n F(\xi_1, \dots, \xi_n)}{\partial \xi_1 \dots \partial \xi_n}.
Probability densities of the type

f(\xi_i) = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} f(\xi_1, \dots, \xi_n)\,d\xi_1 \dots d\xi_{i-1}\,d\xi_{i+1} \dots d\xi_n

are referred to as marginal. Random variables Ξ_1, . . ., Ξ_n are called independent if

f(\xi_1, \dots, \xi_n) = f(\xi_1) \cdot \dots \cdot f(\xi_n).

When variables are dependent, that is, the probability of Ξ_i depends on the remaining variables' magnitude, then:

f(\xi_1, \dots, \xi_n) = f(\xi_i / \xi_1, \dots, \xi_{i-1}, \xi_{i+1}, \dots, \xi_n) \times f(\xi_1, \dots, \xi_{i-1}, \xi_{i+1}, \dots, \xi_n).

Here f(ξ_i/ξ_1, . . ., ξ_{i−1}, ξ_{i+1}, . . ., ξ_n) is a conditional probability density determining the probability of an event ξ_i < Ξ_i ≤ ξ_i + dξ_i when the values of the remaining (n−1) variables are known. A statistical relationship between variables Ξ_i and Ξ_j is characterized by the second-order moment. It is called the cross-covariance (covariance) and is defined as follows:

\gamma_{ij} = M[(\Xi_i - \mu_i)(\Xi_j - \mu_j)] = M[(\Xi_j - \mu_j)(\Xi_i - \mu_i)].
As it follows from its definition, the covariance is positive if values Ξ_i > μ_i (Ξ_i < μ_i) appear most often along with values Ξ_j > μ_j (Ξ_j < μ_j); otherwise the covariance is negative. It is more convenient to quantify the relationships between variables through the use of the normalized version of the cross-covariance. These are called the cross-correlation coefficients and are equal to

\rho_{ij} = \frac{\gamma_{ij}}{\sigma_i \sigma_j}.
Their values lie within the −1 . . . +1 range. The range limits (values ±1) correspond to a linear dependence of the two variables; the correlation coefficient and covariance are null when the variables are independent. Statistical relationships between n random variables are described by a covariance matrix of the form:

\Gamma_\Xi = \begin{pmatrix} \gamma_{11} & \cdots & \gamma_{1n} \\ \vdots & \ddots & \vdots \\ \gamma_{n1} & \cdots & \gamma_{nn} \end{pmatrix}.

The normalized version of the matrix Γ is a correlation matrix:

P_\Xi = \begin{pmatrix} 1 & \cdots & \rho_{1n} \\ \vdots & \ddots & \vdots \\ \rho_{n1} & \cdots & 1 \end{pmatrix},
where ρ_ii = γ_ii/(σ_i σ_i) = 1. From their definition, γ_ij = γ_ji and ρ_ij = ρ_ji, so the covariance and correlation matrices are symmetric. For example, consider n random Gauss-distributed variables whose probability density is given as follows:

f(\xi^T) = (2\pi)^{-n/2} |H|^{1/2} \exp\left[-\frac{1}{2}(\xi - \mu)^T H (\xi - \mu)\right],

where (ξ−μ)ᵀ = (ξ_1−μ_1, . . ., ξ_n−μ_n). H is a matrix inverse to the covariance matrix, |H| is its determinant, and they are written as follows:

H = \Gamma^{-1} = \{\eta_{ij}\}, \quad i, j = 1, \dots, n; \qquad |H| = |\Gamma|^{-1}.

Therefore, the exponent index in the expression of the probability density f(ξᵀ) may be represented in terms of matrix H elements:

(\xi - \mu)^T H (\xi - \mu) = \sum_{i=1}^{n} \sum_{j=1}^{n} \eta_{ij}(\xi_i - \mu_i)(\xi_j - \mu_j).
In particular, for two random variables Ξ_1 and Ξ_2 the covariance matrix Γ and inverse matrix H are:

\Gamma = \begin{pmatrix} \sigma_1^2 & \gamma_{12} \\ \gamma_{21} & \sigma_2^2 \end{pmatrix};

H = \frac{1}{\sigma_1^2 \sigma_2^2 - \gamma_{12}^2} \begin{pmatrix} \sigma_2^2 & -\gamma_{12} \\ -\gamma_{21} & \sigma_1^2 \end{pmatrix}.

The joint probability density of the bivariate Gaussian distribution is written as follows:

f(\xi_1, \xi_2) = \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2 - \gamma_{12}^2}} \times \exp\left[-\frac{\sigma_2^2(\xi_1 - \mu_1)^2 + \sigma_1^2(\xi_2 - \mu_2)^2 - 2\gamma_{12}(\xi_1 - \mu_1)(\xi_2 - \mu_2)}{2(\sigma_1^2\sigma_2^2 - \gamma_{12}^2)}\right].

A sample PDF of such a distribution is represented in Fig. 4.
Fig. 4: The bivariate Gaussian PDF.
1.2 Function of Random Variables

The result of observations of a certain phenomenon, represented by random variables, can be the object of subsequent mathematical processing. Therefore, it is logical to consider a function Ψ of random variables Ξ_i with μ_i and σ_i such that:

\Psi = \varphi(\Xi_1, \dots, \Xi_i, \dots, \Xi_n).

Clearly, values of Ψ will be random as well. In principle, a random function Ψ can have one or many arguments.
1.2.1 Function of One Argument

Consider a random function in the form of a linear transformation:

\Psi = a \cdot \Xi + b,

where a and b are constant factors. In the sample space A, such a transformation is reduced to the relocation of subspace A_i (event) and its scaling. However, the stencil (nature) of elements (points) of the space A will be the same. In other words, the probabilistic scheme for constructing the sample space A does not change. Therefore, the type of the distribution law of Ψ will correspond to that of the variable Ξ. However, the parameters of the probability distribution function of Ψ will be different:

\mu_\psi = a\mu_\xi + b; \qquad D_\psi = a^2 D_\xi.

Consider the case when the function Ψ is nonlinear. The probability of occurrence of a value ξ_i in a small interval dξ can be determined as

P_{d\xi} = f(\xi_i)\,d\xi.

The corresponding interval of the function Ψ will be equal to

d\psi = \left|\frac{d\psi}{d\xi}\right|_{\xi_i} d\xi.

The derivative module is used because only the sizes of the intervals are considered. Since the probabilities P_{dξ} and P_{dψ} must be the same, it follows that

f(\psi_i)\,|d\psi| = f(\xi_i)\,d\xi,

where ψ_i = φ(ξ_i). From the above equality, it follows that the PDF of the k-valued function Ψ is defined by the following expression:

f(\psi) = k f(\xi)\left|\frac{d\xi}{d\psi}\right| = \frac{k f(\xi)}{\left|\dfrac{d\psi}{d\xi}\right|}.

As an example of the described approach, let us consider a random observation of values of harmonic oscillations:

\Psi = A \cdot \sin(\omega_0 t + \Xi),
where ω_0 = 2πf_0 is the angular frequency of oscillations. The random variable Ξ has the uniform distribution:

f(\xi) = \begin{cases} \dfrac{1}{2\pi}, & 0 \le \xi \le 2\pi; \\ 0, & \xi < 0,\ \xi > 2\pi, \end{cases}

which means equiprobability of its values. The derivative dψ/dξ has the following expression:

\frac{d\psi}{d\xi} = A \cdot \cos(\omega_0 t + \xi) = A\sqrt{1 - \sin^2(\omega_0 t + \xi)} = \sqrt{A^2 - \psi^2}.

The harmonic function has the same value for two different values of its argument in the interval 0 . . . 2π. Therefore, the PDF of the function Ψ can be defined as follows:

f(\psi) = \frac{2 f(\xi)}{\left|\dfrac{d\psi}{d\xi}\right|} = \begin{cases} \dfrac{1}{\pi\sqrt{A^2 - \psi^2}}, & |\psi| < A; \\ 0, & |\psi| \ge A. \end{cases}

Its view is represented in Fig. 5.
Fig. 5: The PDF of harmonic oscillations.
The mathematical expectation μ_ψ of this distribution is equal to zero:

\mu_\psi = M[\Psi] = M[A \cdot \sin(\omega_0 t + \Xi)] = A \cdot \frac{1}{2\pi}\int_{0}^{2\pi} \sin(\omega_0 t + \xi)\,d\xi = 0.

In turn, the expression for the variance D_ψ of the function Ψ can be obtained as follows:

D_\psi = M[\Psi^2] = M[A \cdot \sin(\omega_0 t + \Xi) \cdot A \cdot \sin(\omega_0 t + \Xi)] = \frac{A^2}{2} M[\cos(0) - \cos(2\omega_0 t + 2\Xi)] = \frac{A^2}{2} - \frac{A^2}{4\pi}\int_{0}^{2\pi} \cos(2\omega_0 t + 2\xi)\,d\xi = \frac{A^2}{2}.
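These two results are easy to check numerically. The sketch below (assuming Python with NumPy; the amplitude, frequency and observation instant are arbitrary) samples a uniformly distributed phase and verifies μ_ψ = 0 and D_ψ = A²/2.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

A, omega0, t = 2.0, 1.0, 0.7          # amplitude, frequency, observation instant
xi = rng.uniform(0.0, 2 * np.pi, size=1_000_000)  # uniformly distributed phase
psi = A * np.sin(omega0 * t + xi)     # random observations of the oscillation

print(f"sample mean     = {psi.mean():+.4f}  (theory: 0)")
print(f"sample variance = {psi.var():.4f}  (theory: A^2/2 = {A**2 / 2})")

# The histogram peaks near +/-A, mirroring f(psi) = 1 / (pi * sqrt(A^2 - psi^2)).
hist, edges = np.histogram(psi, bins=40, range=(-A, A), density=True)
print("density near psi = 0:", round(hist[20], 4),
      "vs theory 1/(pi*A) =", round(1 / (np.pi * A), 4))
```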
1.2.2 Function of Multiple Arguments

First of all, we turn to the case when the function Ψ is a sum of two independent variables Ξ_1 and Ξ_2, that is, Ψ = Ξ_1 + Ξ_2. The distribution function of Ψ may be found in the following way:

F(\psi) = P(\Psi \le \psi) = P(\Xi_1 + \Xi_2 \le \psi) = \iint_{A} f(\xi_1, \xi_2)\,d\xi_1\,d\xi_2,

with the integration region A represented in Fig. 6.
Fig. 6: The integration region A.
Due to the independence of variables Ξ_1 and Ξ_2, the joint distribution density f(ξ_1, ξ_2) will be simply a product of the PDFs of these two variables:

f(\xi_1, \xi_2) = f(\xi_1) \cdot f(\xi_2).

In this case, the expression of F(ψ) may be transformed into the following expression:

F(\psi) = \iint_{A} f(\xi_1) \cdot f(\xi_2)\,d\xi_1\,d\xi_2.

In the case when Ξ_1 and Ξ_2 have a uniform distribution on the interval of their values 0 . . . 1, the distribution function of Ψ will be equal to the following expression:

F(\psi) = \iint_{A} f(\xi_1) f(\xi_2)\,d\xi_1\,d\xi_2 = \iint_{A \cap R} d\xi_1\,d\xi_2.

That is, it equals the area of the intersection of the region A and square R (Fig. 6). In other words, the value of F(ψ) is the area of the zone of R located under the line ψ = ξ_1 + ξ_2. This area can be defined as follows:

a) F(\psi) = 0, \quad \psi \le 0;
b) F(\psi) = \psi^2/2, \quad 0 < \psi \le 1;
c) F(\psi) = 1 - (2 - \psi)^2/2, \quad 1 < \psi \le 2;
d) F(\psi) = 1, \quad \psi > 2.

Differentiation of the above distribution function allows us to determine the PDF in the following form:

a) f(\psi) = 0, \quad \psi \le 0;
b) f(\psi) = \psi, \quad 0 < \psi \le 1;
c) f(\psi) = 2 - \psi, \quad 1 < \psi \le 2;
d) f(\psi) = 0, \quad \psi > 2.

Therefore, f(ψ) has the shape of an isosceles triangle (Fig. 7) with a base and height of 2 and 1, respectively. Such a type of distribution law is called symmetrically triangular and has parameters μ_ψ = 1 and D_ψ = 1/6.
Fig. 7: The triangular PDF.
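The triangular law can likewise be verified by simulation; in the following sketch (assuming Python with NumPy; the values are illustrative), the sum of two uniform variables reproduces μ_ψ = 1, D_ψ = 1/6 and the triangular density.

```python
import numpy as np

rng = np.random.default_rng(seed=4)

# Psi = Xi1 + Xi2 with independent uniform variables on [0, 1).
psi = rng.random(1_000_000) + rng.random(1_000_000)

print(f"sample mean     = {psi.mean():.4f}  (theory: 1)")
print(f"sample variance = {psi.var():.4f}  (theory: 1/6 = {1/6:.4f})")

# The empirical density follows the triangle f(psi) = psi on (0, 1]
# and f(psi) = 2 - psi on (1, 2].
hist, edges = np.histogram(psi, bins=4, range=(0.0, 2.0), density=True)
print("bin densities:", np.round(hist, 3), "(theory: 0.25, 0.75, 0.75, 0.25)")
```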
The distribution function of a sum of two independent random variables may be transformed to the following form:

F(\psi) = \iint_{A} f(\xi_1) \cdot f(\xi_2)\,d\xi_1\,d\xi_2 = \int_{-\infty}^{\infty} f(\xi_1)\,d\xi_1 \int_{-\infty}^{\psi - \xi_1} f(\xi_2)\,d\xi_2.

The corresponding probability density of the function Ψ will be equal to

f(\psi) = \frac{dF(\psi)}{d\psi} = \int_{-\infty}^{\infty} f(\xi_1) \cdot f(\psi - \xi_1)\,d\xi_1.
Such a type of integral is called the convolution integral, which can be noted conditionally as:

f(\psi) = f(\xi_1) * f(\xi_2).

The corresponding mathematical operation is called a convolution. As an example, let us define the probability density of a sum of two independent variables with the Gauss law using the convolution integral:

f(\psi) = \int_{-\infty}^{\infty} f(\xi_1) \cdot f(\psi - \xi_1)\,d\xi_1 = \frac{1}{2\pi\sigma_1\sigma_2}\int_{-\infty}^{\infty} \exp\left[-\frac{(\xi_1 - \mu_1)^2}{2\sigma_1^2} - \frac{(\psi - \xi_1 - \mu_2)^2}{2\sigma_2^2}\right] d\xi_1 = \frac{1}{2\pi\sigma_1\sigma_2}\int_{-\infty}^{\infty} \exp(-A\xi_1^2 + 2B\xi_1 - C)\,d\xi_1,

where

A = \frac{\sigma_1^2 + \sigma_2^2}{2\sigma_1^2\sigma_2^2}, \qquad B = \frac{\mu_1}{2\sigma_1^2} + \frac{\psi - \mu_2}{2\sigma_2^2}, \qquad C = \frac{\mu_1^2}{2\sigma_1^2} + \frac{(\psi - \mu_2)^2}{2\sigma_2^2}.
A solution to the integral of an exponential function is well known to be as follows:

\int_{-\infty}^{\infty} \exp(-Ax^2 \pm 2Bx - C)\,dx = \sqrt{\frac{\pi}{A}}\exp\left(-\frac{AC - B^2}{A}\right).

Taking this fact into account, the PDF of a sum of two normal random values can be rewritten in the following form:

f(\psi) = \frac{1}{\sqrt{2\pi(\sigma_1^2 + \sigma_2^2)}} \cdot \exp\left\{-\frac{[\psi - (\mu_1 + \mu_2)]^2}{2(\sigma_1^2 + \sigma_2^2)}\right\}.

Therefore, the function Ψ = Ξ_1 + Ξ_2 has a Gaussian distribution with the mathematical expectation:

\mu_\psi = \mu_1 + \mu_2,

and the square root of the variance:

\sigma_\psi = \sqrt{D_\psi} = \sqrt{\sigma_1^2 + \sigma_2^2}.

If the function Ψ is equal to:

\Psi = a_1\Xi_1 + a_2\Xi_2,

its first and second moments are as follows:

\mu_\psi = a_1\mu_1 + a_2\mu_2, \qquad \sigma_\psi = \sqrt{a_1^2\sigma_1^2 + a_2^2\sigma_2^2}.
In the case of dependent variables, their sum still follows the Gaussian law with the same expectation, but the variance will include the cross-covariance of the variables Ξ_1 and Ξ_2:

D_\psi = M[(\Psi - \mu_\psi)^2] = M[(\Xi_1 + \Xi_2 - \mu_1 - \mu_2)^2] = M[(\Xi_1 - \mu_1)^2 + (\Xi_2 - \mu_2)^2 + 2(\Xi_1 - \mu_1)(\Xi_2 - \mu_2)] = D_1 + D_2 + 2\gamma_{12} = \sigma_1^2 + \sigma_2^2 + 2\rho_{12}\sigma_1\sigma_2.

If Ψ has a number of arguments n > 2 and is not a linear function, it may be linearized at the point μᵀ = (μ_1, . . ., μ_i, . . ., μ_n), yielding the following expression:

\Psi \approx \varphi(\mu_1, \dots, \mu_n) + \sum_{i=1}^{n} a_i(\Xi_i - \mu_i),

where a_i = ∂ψ/∂ξ_i |_μ. We proceed to consider the variable E with the probabilistic properties equal to the properties of Ψ:

E = \Psi - \varphi(\mu_1, \dots, \mu_i, \dots, \mu_n) = \sum_{i=1}^{n} a_i(\Xi_i - \mu_i) = \sum_{i=1}^{n} a_i Z_i.
The variable Z_i has parameters μ_ζ = 0 and D_i. The commutative and associative properties of the convolution integral can be written in the following forms:

[f(\zeta_1) * f(\zeta_2)] * \dots * f(\zeta_n) = f(\zeta_1) * \dots * [f(\zeta_{n-1}) * f(\zeta_n)],

and

f(\zeta_1) * f(\zeta_2) * \dots * f(\zeta_n) = f(\zeta_2) * f(\zeta_1) * \dots * f(\zeta_n).

They allow us to determine f(ψ) via the following step-by-step approach:

E_1 = a_1 Z_1 + a_2 Z_2,
E_2 = E_1 + a_3 Z_3,
\dots
E = E_{n-1} + a_n Z_n.

In the case when the variables Z_i have the normal distribution, the function E will also be normal with the following moments:

\mu_\varepsilon = 0; \qquad D_\varepsilon = \sum_{i=1}^{n} a_i^2 D_i + 2\sum_{i<j} a_i a_j \gamma_{ij}.
Accordingly, the function Ψ will have the normal distribution as well with parameters:

\mu_\psi = \varphi(\mu_1, \dots, \mu_n); \qquad D_\psi = D_\varepsilon.

Very often, data processing may be represented as a system of linear equations:

\Psi_1 = \sum_{i=1}^{n} a_{1i}\Xi_i + b_1,
\dots
\Psi_k = \sum_{i=1}^{n} a_{ki}\Xi_i + b_k,

which may be written in the matrix form as Ψ = A · Ξ + B, where

\Psi = \begin{pmatrix} \Psi_1 \\ \vdots \\ \Psi_k \end{pmatrix}, \quad \Xi = \begin{pmatrix} \Xi_1 \\ \vdots \\ \Xi_n \end{pmatrix}, \quad A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kn} \end{pmatrix}, \quad B = \begin{pmatrix} b_1 \\ \vdots \\ b_k \end{pmatrix}.

The mathematical expectation of the vector Ψ is

M[\Psi] = \mu_\Psi = A \cdot \mu_\Xi + B,

and its covariance matrix has the following form:

\Gamma_\Psi = \begin{pmatrix} D_1 & \cdots & \gamma_{1k} \\ \vdots & \ddots & \vdots \\ \gamma_{k1} & \cdots & D_k \end{pmatrix}.
The ijth element of the matrix Γ_Ψ may be represented as the mathematical expectation of the ijth element of a product of the vector (Ψ−μ_Ψ) and its transposition (Ψ−μ_Ψ)ᵀ. The above vector is represented as follows:

(\Psi - \mu_\Psi) = \begin{pmatrix} \Psi_1 - \mu_1 \\ \Psi_2 - \mu_2 \\ \vdots \\ \Psi_k - \mu_k \end{pmatrix}.

Therefore, the matrix Γ_Ψ can be written as

\Gamma_\Psi = M\left[(\Psi - \mu_\Psi)(\Psi - \mu_\Psi)^T\right] = M\left[(A\Xi + B - A\mu_\Xi - B)(A\Xi + B - A\mu_\Xi - B)^T\right] = M\left[A(\Xi - \mu_\Xi)(\Xi - \mu_\Xi)^T A^T\right] = A \cdot M\left[(\Xi - \mu_\Xi)(\Xi - \mu_\Xi)^T\right] \cdot A^T = A\Gamma_\Xi A^T. \qquad (3)

From (3) it follows that knowledge of the processing algorithm (matrix A) and the covariance matrix of the variables Ξ allows one to calculate the covariance matrix of the processing results Ψ.
1.3 Random Process By a random process, one implies a continuous function E(t) whose instantaneous values are those of random variables. At discrete moments ti = Δt·i (i = 1, . . ., n), these values may be viewed as a set of observations of a n-dimensional random vector E, that is 0 1 ε1 B C E = @ · A = fεi g. εn The sequence εi (i = 1, n) representing the values of E(t) at equally spaced points in time is called a discrete process or a time series. Note that the order in the sequences ε1, . . ., εn is of importance for a random process whereas the vector Ξ presented in Section 1.1 had indexes for notational convenience only. To simplify the subsequent representation of the discrete processes, we introduce the notation Δt = 1, making time dimensionless. The random process is referred to as stationary if the distribution function F(εi+1, . . ., εi+n) of any of its n values is independent of “i.” In this case, any n consecutive variables have the same distribution regardless of their position in the series. The mathematical expectation of εi will be equal to ∞ ð
με =
εi f ðεi Þdεi ,
−∞
and their variance ∞ ð
Dε = σ2ε =
ðεi − με Þ2 f ðεi Þdεi .
−∞
The relationship between two values εi and εi+k of the stationary series separated by a lag k is expressed by a covariance of the type: h i γk = M εi − με εi + k − με = γ − k ,
as well as by a correlation factor:

\rho_k = \frac{\gamma_k}{D_\varepsilon} = \rho_{-k}.
A set (in sequence) of process covariances is termed the covariance function and a set of correlations is termed the correlation function. The Fourier transformation of the covariance function of the stationary process is called a one-sided power spectral density:

S(\theta) = \frac{1}{\pi}\sum_{i=-\infty}^{\infty} \gamma_i \cos(\theta \cdot i), \qquad 0 \le \theta \le \pi.

The expression for covariances represented in terms of S(θ) is

\gamma_k = \int_{0}^{\pi} \cos(\theta \cdot k) S(\theta)\,d\theta.

Such relationships between the covariance function and the power spectral density are established by the Wiener–Khinchin theorem. Variance, in particular, can be expressed from the above equation as

D_\varepsilon = \gamma_0 = \int_{0}^{\pi} S(\theta)\,d\theta.
Therefore, the power spectral density shows the distribution of the random process variance (intensity) within a continuous frequency range 0 . . . π. The value S(θ)·dθ may be interpreted as an approximate portion of the process variance (intensity) within the frequency range θ . . . (θ + dθ). As the power spectral density, which will simply be called the spectrum going forward, is the Fourier cosine transformation of the covariance function, the knowledge of the latter is mathematically equivalent to the spectrum knowledge and vice versa. As an example, consider the process that will be referred to in the subsequent text. This process, called "white" noise E(t), has a finite bandwidth and a uniform spectrum in the frequency range of 0 . . . π (Fig. 8).
Fig. 8: Spectrum of “white” noise.
Using the above expression for γ_k, the covariance function of such a process can be determined in terms of S(θ) as

\gamma_k = \int_{0}^{\pi} \frac{D_\varepsilon}{\pi} \cdot \cos(\theta \cdot k)\,d\theta = D_\varepsilon \cdot \frac{\sin(\pi k)}{\pi k}.

Since for k ≥ 1 the function sin(πk)/πk = 0, the corresponding values of γ_k are also zero. Therefore, there is only the variance γ_0 = D_ε. In other words, the values ε_i = E(t_i) of the "white" noise are uncorrelated.
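A short numerical sketch of this property (assuming Python with NumPy; the sample size is arbitrary): sample covariances of simulated Gaussian white noise vanish for all lags k ≥ 1.

```python
import numpy as np

rng = np.random.default_rng(seed=6)

# Discrete Gaussian "white" noise with D_eps = 1.
eps = rng.standard_normal(100_000)

# Sample covariance function gamma_k for a few lags.
n = eps.size
for k in range(4):
    gamma_k = np.mean((eps[: n - k] - eps.mean()) * (eps[k:] - eps.mean()))
    print(f"gamma_{k} = {gamma_k:+.4f}")
# gamma_0 approaches D_eps = 1 while gamma_k for k >= 1 stays near zero,
# confirming that white-noise samples are uncorrelated.
```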
1.4 Linear Time Series

The use of random series models is based on an assumption that these series are generated by a sequence of random values ε_i. The latter represents the discrete Gaussian "white" noise E with zero expectation and variance D_ε. The values ε_i, as established in the previous section, can be interpreted as samples of the continuous "white" noise E(t). It is considered that ε_i transforms into the process x_i = X(t_i) by means of a linear filter that computes a weighted sum of preceding values of E in the following way:

x_i = \varepsilon_i + c_1\varepsilon_{i-1} + c_2\varepsilon_{i-2} + \dots

The introduction of a back-shift operator B set as Bx_i = x_{i−1} allows x_i to be expressed in a compact form:

x_i = C(B) \cdot \varepsilon_i. \qquad (4)

The operator C(B) of the linear filter (4) is equal to

C(B) = 1 + c_1 B + c_2 B^2 + \dots = \sum_{j=0}^{\infty} c_j B^j, \quad (c_0 = 1).

In such an approach, B is taken as a dummy variable and has different values including complex ones. Its jth power is a factor at c_j. Theoretically, a series c_1, c_2, . . . can be finite or infinite. If this series converges at |B| ≤ 1, the filter is called stable and the series x_i is stationary [7]. Let us review several types of filter models having practical applications and directly linked with the matter in this book.
The autoregressive model represents a current value x_i of a process in terms of a linear combination of preceding values x_{i−j} (j > 0) and an impulse ε_i as

x_i = a_1 x_{i-1} + a_2 x_{i-2} + \dots + a_p x_{i-p} + \varepsilon_i. \qquad (5)

The model given by (5) describes a random process referred to as the p-order autoregressive process and is denoted AR(p). In the following text, a process generated by the p-order autoregressive model will be denoted as the "AR(p) process." The term "autoregression" may be explained as follows. It is known that a linear model of the kind:

X = a_1 Y_1 + a_2 Y_2 + \dots + a_p Y_p + E,

relating the dependent variable X with independent variables Y_1, . . ., Y_p plus the term E (error), is referred to as a regression model. A specific term "a regression of X on values Y_1, . . ., Y_p" is applicable in this case. As in equation (5) x_i is related to its preceding values, the term "autoregression" was chosen for this case. If the p-order autoregression operator is defined as

A(B) = 1 - a_1 B - a_2 B^2 - \dots - a_p B^p,

the AR(p) model may be written in a more compact form:

A(B)x_i = \varepsilon_i.

The autoregressive model is a particular form of the linear filter (4). In fact, acting in a formal way, from the above presented expression:

x_i = C(B)\varepsilon_i, \quad \text{where } C(B) = A^{-1}(B).

The AR(p) process can be stationary or nonstationary. Conditions for the stationarity of the process may be defined from the condition specified above regarding the convergence of the series C(B). To do this, let us make use of the process characteristic equation [9] that for the p-order autoregression has the form:

z^p - a_1 z^{p-1} - \dots - a_p = 0. \qquad (6)

Equation (6) has p roots that may be written as z_1, . . ., z_p, and the autoregression operator may be represented in terms of these roots in the following form:

A(B) = \prod_{j=1}^{p}\left(1 - \frac{B}{B_j}\right) = \prod_{j=1}^{p}(1 - z_j B),
because in the equation A(B) = 0, the roots B_j = 1/z_j. Decomposing A(B) into common fractions allows us to write:

x_i = A^{-1}(B)\varepsilon_i = \sum_{j=1}^{p}\frac{Q_j}{1 - z_j B}\,\varepsilon_i = \sum_{j=1}^{p} Q_j \sum_{l=0}^{\infty}(z_j B)^l \varepsilon_i.

It follows that to make the series C(B) = A^{-1}(B) converge at |B| < 1, |z_j| must be less than 1. In other words, the roots of the characteristic equation (6) have to be inside the unit circle. Let us consider the covariance function of the stationary autoregressive process. Multiplying equation (5) by x_{i−j} yields the following expression:

x_i x_{i-j} = a_1 x_{i-1}x_{i-j} + a_2 x_{i-2}x_{i-j} + \dots + a_p x_{i-p}x_{i-j} + \varepsilon_i x_{i-j}.

The mathematical expectation of both parts of this equation provides the following expression for the covariance function (j ≥ 1):

\gamma_j = a_1\gamma_{j-1} + a_2\gamma_{j-2} + \dots + a_p\gamma_{j-p},

because preceding values x_{i−j} are not correlated with subsequent ε_i, that is, M[x_{i−j}ε_i] = 0. In the case j = 0, the expression for the covariance function will be transformed to

\gamma_0 = D_x = a_1\gamma_{-1} + a_2\gamma_{-2} + \dots + a_p\gamma_{-p} + D_\varepsilon,

as M[x_i ε_i] = M[ε_i²] = D_ε. Taking into account γ_{−i} = γ_i and moving the covariance members to the left part of the expression yields the following expression for the AR(p) process variance:

D_x = \frac{D_\varepsilon}{1 - a_1\rho_1 - a_2\rho_2 - \dots - a_p\rho_p}. \qquad (7)
In the case j ≥ 1, the autoregressive process covariances divided by γ_0 may be related in the difference equation for process correlations:

\rho_j = a_1\rho_{j-1} + \dots + a_p\rho_{j-p}.

Substituting values j = 1, . . ., p in this equation will generate a system of linear equations for a_i:

\rho_1 = a_1 + a_2\rho_1 + \dots + a_p\rho_{p-1};
\dots
\rho_p = a_1\rho_{p-1} + a_2\rho_{p-2} + \dots + a_p, \qquad (8)

referred to as a system of Yule–Walker equations.
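As an illustration of system (8), the following sketch (assuming Python with NumPy; the AR(2) coefficients are arbitrary) recovers autoregression coefficients from the theoretical correlations of an AR(2) process.

```python
import numpy as np

# An AR(2) process x_i = a1*x_{i-1} + a2*x_{i-2} + eps_i with known coefficients.
a_true = np.array([0.75, -0.5])

# Theoretical correlations from the difference equation for rho_j:
# rho_1 = a1 / (1 - a2), rho_2 = a1*rho_1 + a2.
rho1 = a_true[0] / (1 - a_true[1])
rho2 = a_true[0] * rho1 + a_true[1]

# Yule-Walker system (8) for p = 2:
#   rho_1 = a1 + a2*rho_1
#   rho_2 = a1*rho_1 + a2
R = np.array([[1.0, rho1],
              [rho1, 1.0]])
rhs = np.array([rho1, rho2])
a_est = np.linalg.solve(R, rhs)
print("recovered coefficients:", a_est)   # -> [0.75, -0.5]
```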
The next type of reviewed models describes a random series whose value x_i is a linear function of a finite number of preceding ε_{i−j} and the current ε_i. Such a process is described by the following difference equation:

x_i = \varepsilon_i - d_1\varepsilon_{i-1} - d_2\varepsilon_{i-2} - \dots - d_q\varepsilon_{i-q} \qquad (9)

and is called the q-order moving average process, MA(q). The term "moving average" can lead to a misunderstanding as the sum of weights 1, d_1, . . ., d_q is not necessarily equal to 1 (as is the case when evaluating an average value). Nevertheless, this term has found use in literature, and we shall keep it. The moving average operator may be defined as

D(B) = 1 - d_1 B - d_2 B^2 - \dots - d_q B^q,

and the MA(q) model can be briefly written as

x_i = D(B)\varepsilon_i.

It follows that the moving average process may be presented as the response of a linear filter with the transfer function D(B) to its input noise E. Note the MA(q) process is always stationary due to the fact that the series C(B) = D(B) is finite. Acting in the same way as in the case of the AR(p) process, a covariance function of the MA process may be obtained from (9):

\gamma_j = \begin{cases} D_\varepsilon(-d_j + d_1 d_{j+1} + \dots + d_{q-j} d_q), & j = 1, \dots, q; \\ 0, & j > q. \end{cases}

The corresponding correlation function has the form:

\rho_j = \begin{cases} \dfrac{-d_j + d_1 d_{j+1} + \dots + d_{q-j} d_q}{1 + d_1^2 + d_2^2 + \dots + d_q^2}, & j = 1, \dots, q; \\ 0, & j > q, \end{cases}

because the MA process variance is

D_x = \gamma_0 = (1 + d_1^2 + \dots + d_q^2)D_\varepsilon.

Therefore, the correlation function of the MA(q) process stops with the lag q. To achieve a greater flexibility in the observed series description, the AR and MA processes may be combined into one model. The combined process of the type:

x_i = a_1 x_{i-1} + \dots + a_p x_{i-p} + \varepsilon_i - d_1\varepsilon_{i-1} - \dots - d_q\varepsilon_{i-q}

or

A(B)x_i = D(B)\varepsilon_i \qquad (10)
is referred to as the (p,q)-order autoregressive moving average process and is denoted ARMA(p,q). Equation (10) may be written as

x_i = \frac{D(B)}{A(B)}\,\varepsilon_i = \frac{1 - d_1 B - \dots - d_q B^q}{1 - a_1 B - \dots - a_p B^p}\,\varepsilon_i,

and the ARMA process may be interpreted as a response to white noise E of a linear filter with a rational transfer function. As was said above, the MA process is always stationary. Therefore, the corresponding MA(q) term in (10) has no impact on conclusions defining conditions for the autoregressive process stationarity. Hence, the model ARMA(p,q) describes a stationary process if the roots of the characteristic equation of its AR(p) term are inside the unit circle. An expression for the ARMA process covariance function may be obtained in a way already used for the AR and MA processes:

\gamma_j = a_1\gamma_{j-1} + \dots + a_p\gamma_{j-p} + \gamma_{x\varepsilon}(j) - d_1\gamma_{x\varepsilon}(j-1) - \dots - d_q\gamma_{x\varepsilon}(j-q). \qquad (11)

The cross-covariance function of random processes X and E is given as follows:

\gamma_{x\varepsilon}(j) = M[x_{i-j}\varepsilon_i].

It follows from (10) that x_{i−j} is a function of input noise impulses observed up to the instant (i−j). As such, the cross-covariance function γ_{xε}(j) = 0 for j > 0 whereas γ_{xε}(j) ≠ 0 for j ≤ 0. It means that for the ARMA(p,q) process there are q correlations ρ_q, . . ., ρ_1 whose values are related to parameters d of the MA process and parameters a of the AR process. However, for j ≥ q + 1

\rho_j = a_1\rho_{j-1} + \dots + a_p\rho_{j-p},

that is, the correlation function of the ARMA process is entirely determined by the autoregression parameters. The reviewed models of random series allow to describe a large class of random processes. These models have found a practical application because they require a small number of parameters to represent a linear process. In the general case, the models are employed in practice for approximations of observed data. However, as will be shown later, there are cases when these models can describe observed phenomena and model factors can have a physical interpretation.
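The ARMA correlation property derived above can be observed in simulation. The sketch below (assuming Python with NumPy; the ARMA(2,1) coefficients are illustrative and keep the AR term stationary) generates the process by direct recursion of (10) and checks that ρ_j satisfies the autoregressive difference equation for j ≥ q + 1.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Simulate an ARMA(2,1) process by direct recursion of equation (10):
# x_i = a1*x_{i-1} + a2*x_{i-2} + eps_i - d1*eps_{i-1}.
a1, a2, d1 = 0.75, -0.5, 0.3   # illustrative coefficients (stationary AR part)
n = 100_000
eps = rng.standard_normal(n)
x = np.zeros(n)
for i in range(2, n):
    x[i] = a1 * x[i - 1] + a2 * x[i - 2] + eps[i] - d1 * eps[i - 1]

# For j >= q + 1 = 2 the correlations must satisfy rho_j = a1*rho_{j-1} + a2*rho_{j-2}.
rho = [np.corrcoef(x[: n - k], x[k:])[0, 1] for k in range(4)]
print("rho_3 from data:      ", round(rho[3], 4))
print("a1*rho_2 + a2*rho_1 = ", round(a1 * rho[2] + a2 * rho[1], 4))
```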
Chapter II Adaptation of Probabilistic Models

This chapter deals with the elements of mathematical statistics which comprise problems related to adaptation of probability models to empirical data. Such data is represented by a sequence of real numbers (series) x_1, . . ., x_n of a variable X. If all x_i are assumed to be independent random values with an identical f(x), then the series is characterized by the following probability density:

f(x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i).
An adaptation procedure is realized by utilizing a probability model as a working hypothesis and its subsequent comparison with experimental data. This procedure has a random character due to utilizing a limited number of xi from an infinite population of random value X. Methods of mathematical statistics provide guidance regarding better approaches to utilize experimental data as well as to evaluate reliability of inferences pertaining to the adapted models.
2.1 Processing of Experimental Data

Processing of the series x_1, . . ., x_n and representation of these results in a form suitable for making inferences are critical issues of statistical analysis. Usually, this starts with a preliminary processing of data which reduces the latter to a few statistics. Under the name of statistic T_n is understood a result of processing the series x_1, . . ., x_n:

T_n = T(x_1, \dots, x_n).

There are a few widely used statistics in practice. The random variable Z, called the standard normal variable, plays a basic role:

Z = \frac{X - \mu_x}{\sigma_x},

where X is a Gaussian random variable. It is an obvious fact that Z has the mathematical expectation μ_z = 0 and σ_z = 1. The statistic T_n in the form:

T_n = \sum_{i=1}^{n} z_i^2 = \chi_n^2

is called a chi-squared variable with n degrees of freedom. In general, the degrees of freedom of a statistic are equal to the number of independent components used
in its calculation. A mathematical description of the χ²_n probability density function is

f(\chi_n^2) = \begin{cases} \dfrac{(\chi_n^2)^{n/2-1} \cdot \exp(-\chi_n^2/2)}{2^{n/2} \cdot \Gamma(n/2)}, & \chi_n^2 > 0; \\ 0, & \text{otherwise}, \end{cases}

where Γ(·) is the gamma function, whose value for an integer argument m (m ≥ 2) coincides with the factorial (m−1)!, and n is the number of degrees of freedom. Plots of f(χ²_n) are represented in Fig. 9.
Fig. 9: The PDF of the chi-squared variable.
The first two moments of the χ²_n distribution are

\mu_{\chi_n^2} = n \quad \text{and} \quad D_{\chi_n^2} = 2n.

The χ²_n distribution relates to a sum of independent variables z_i² with the same variance. This distribution approaches the Gauss law when n increases due to the central limit theorem. In particular, if n > 30, the variable √(2χ²_n) has a distribution similar to normal with μ = √(2n − 1) and D = 1. The statistic

T_n = \frac{Z}{\sqrt{\chi_n^2/n}} = t_n

is a variable with Student's t-distribution. Its PDF is given by the following expression:

f(t_n) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{\pi n} \cdot \Gamma(n/2)} \cdot \left(1 + \frac{t_n^2}{n}\right)^{-(n+1)/2},

where n is the number of degrees of freedom. Plots of f(t_n) are represented in Fig. 10. The characteristics of this distribution are μ_{t_n} = 0 and D_{t_n} = n/(n − 2) for n > 2. Asymptotically (n → ∞), the t-distributed variable approaches the standard normal variable Z. In fact, a good approximation can be obtained with n > 30.
Fig. 10: The Student’s PDF.
The final statistic under examination is

T_n = \frac{\chi_{n_1}^2 \cdot n_2}{\chi_{n_2}^2 \cdot n_1} = F_{n_1, n_2}.

This statistic is called the F(Fisher–Snedecor)-statistic with n_1 and n_2 degrees of freedom. Its PDF is described by the expression:

f(F_{n_1,n_2}) = \frac{\Gamma\left(\frac{n_1+n_2}{2}\right)}{\Gamma\left(\frac{n_1}{2}\right)\Gamma\left(\frac{n_2}{2}\right)} \cdot \left(\frac{n_1}{n_2}\right)^{n_1/2}\frac{F_{n_1,n_2}^{(n_1/2)-1}}{\left(1 + \frac{n_1}{n_2}F_{n_1,n_2}\right)^{(n_1+n_2)/2}}, \quad F_{n_1,n_2} \ge 0.

This PDF with n_1 = n_2 = 100 is represented in Fig. 11.
Fig. 11: The PDF of the F-statistic.
Parameters of the F-statistic are:

\mu_F = \frac{n_2}{n_2 - 2}, \qquad \sigma_F^2 = \frac{2n_2^2(n_1 + n_2 - 2)}{n_1(n_2 - 2)^2(n_2 - 4)}; \quad n_2 > 4.

Asymptotically (n_1, n_2 → ∞), the F-statistic approaches the normal distribution with

\mu_F = 1 \quad \text{and} \quad \sigma_F^2 = \frac{2(n_1 + n_2)}{n_1 n_2}.

An acceptable (≈ 10%) approximation is achieved with n_1, n_2 > 60.
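In applied work, the moments and α-probability values of these statistics are usually obtained from library routines rather than tables. A sketch assuming Python with SciPy (degrees of freedom chosen arbitrarily):

```python
from scipy import stats

n, n1, n2 = 20, 100, 100

# Moments of chi-squared: mu = n, D = 2n.
print("chi2 mean, var:", stats.chi2.mean(n), stats.chi2.var(n))

# Student's t variance: n / (n - 2) for n > 2.
print("t variance:", stats.t.var(n), "theory:", n / (n - 2))

# F-statistic expectation: n2 / (n2 - 2).
print("F mean:", stats.f.mean(n1, n2), "theory:", n2 / (n2 - 2))

# 0.95-probability values (quantiles) used for statistical inferences.
print("chi2_0.95:", stats.chi2.ppf(0.95, n))
print("t_0.95:   ", stats.t.ppf(0.95, n))
print("F_0.95:   ", stats.f.ppf(0.95, n1, n2))
```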
2.2 Criterion of Maximum Likelihood

The task of model factor (parameter) estimation can be stated as follows: there is a random variable X with a known probability density f(x/a) featuring parameters aᵀ = (a_1, . . ., a_k). Actual values of these parameters in provided observations xᵀ = (x_1, . . ., x_n) are unknown although these values are fixed. It is required to estimate parameters a and consider that some values of a provide "most likely" observations x. In 1925, R.A. Fisher, a British statistician and geneticist, formulated the maximum likelihood criterion which holds a key position in the statistical estimation theory. Its essence may be described in the following manner. Prior to being observed, the possible values of a random variable X are described by the probability density f(x/a). Once observations x_1, . . ., x_n are obtained, it is appropriate to proceed with considering the possible values of parameters a that provided those observations. To do this, one constructs a likelihood function L(a/x) which is proportional to f(x/a) with observations x and unknown values of a. The maximum likelihood estimates (MLEs) of parameters a, designated as ã, provide the maximum value of the function L(a/x):

L(\tilde{a}/x) \to \max.
In other words, the MLEs correspond to the highest probability of appearance of observations x_1, . . ., x_n. The likelihood function plays a fundamental role in the estimation theory since it carries all information about model parameters obtainable from empirical data [10]. Often it is more convenient to use the logarithmic likelihood function

l(a/x) = \ln L(a/x),

which comprises additive constants instead of proportionality coefficients. Furthermore, l(a/x) takes on a simple view for exponential forms of distribution laws which are the most often used case in practical applications. To illustrate the principle of maximum likelihood, let us take an example of estimating parameters for the PDF of a normal variable X with μ_x and D_x. The results of this variable measurement are considered as a vector x of independent readouts x_1, . . ., x_n with the joint probability density:

f(x_1, \dots, x_n) = (2\pi D_x)^{-n/2} \cdot \exp\left\{-\sum_{i=1}^{n}\frac{(x_i - \mu_x)^2}{2D_x}\right\}.
As mentioned previously, the likelihood function is derived from the probability density providing fixed x_i and variable parameters μ_x and D_x. In particular, the logarithmic likelihood function will have the form:

l(\mu_x, D_x/x) = -\frac{n}{2}\ln(2\pi D_x) - \frac{1}{2D_x}\sum_{i=1}^{n}(x_i - \mu_x)^2.

The condition of a maximum of the logarithmic likelihood function with respect to the parameter μ_x will be

\frac{\partial l}{\partial \mu_x} = \frac{1}{2D_x}\sum_{i=1}^{n} 2(x_i - \mu_x) = 0,

which will result in the following expression:

\tilde{\mu}_x = \frac{\sum_{i=1}^{n} x_i}{n}. \qquad (12)

Now, let us represent the condition of a maximum of the logarithmic likelihood function with respect to the parameter D_x:

\frac{\partial l}{\partial D_x} = -\frac{n}{2D_x} + \frac{1}{2D_x^2}\sum_{i=1}^{n}(x_i - \mu_x)^2 = 0.

This equation leads to the following expression for calculating the MLE of the variance:

\tilde{D}_x = \tilde{\sigma}_x^2 = \frac{\sum_{i=1}^{n}(x_i - \mu_x)^2}{n}.
In addition to the point estimates of the parameters, interval estimates are also used in statistics. One of them is the confidence interval, which is obtained on the basis of statistics computed from observed data. The confidence interval will contain the true value of the unknown estimated parameter with a probability that is specified by a given confidence level. Usually, a confidence level P = 0.95 is utilized. The practical interpretation of the confidence interval with a confidence level, say P = 0.95, is as follows. Let us assume a very large number of independent experiments with a similar construction of a confidence interval. Then in 95% of the experiments, the confidence interval will contain the true value of the estimated parameter. In the remaining 5% of experiments, the confidence interval may not contain that value.
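A minimal numerical sketch of expression (12) and the variance MLE (assuming Python with NumPy; the synthetic sample parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=8)

# Synthetic measurements of a normal variable with mu_x = 5, D_x = 4.
x = rng.normal(loc=5.0, scale=2.0, size=10_000)

mu_mle = x.sum() / x.size                   # expression (12)
d_mle = ((x - mu_mle) ** 2).sum() / x.size  # MLE of the variance

print(f"MLE of mu_x: {mu_mle:.3f}  (true: 5)")
print(f"MLE of D_x:  {d_mle:.3f}  (true: 4)")
```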
2.3 Properties of Maximum Likelihood Estimates

A wide application of the maximum likelihood criterion is based on a property of MLE invariance: if ã is the MLE of parameter a, then g(ã) will be the MLE of any function of parameter a (not necessarily a one-to-one function) [10]. To demonstrate that property, let us examine an injective (one-to-one) function g(a) instead of parameter a. A derivative of this function may be presented in the following form:

\frac{\partial l}{\partial a} = \frac{\partial l}{\partial g(a)} \cdot \frac{\partial g(a)}{\partial a}.

Respectively, the condition of a stationary point (extremum) of the logarithmic likelihood function

\frac{\partial l}{\partial g(a)} = 0

corresponds to the following condition:

\frac{\partial l}{\partial a} = 0, \quad \text{if } \frac{\partial g(a)}{\partial a} \ne 0.

The invariance property of the MLE is particularly important for the estimation of model factors. This property substantially simplifies the calculation of their MLEs when the relationship between the model factors and the statistical characteristics of the series is known. In this case, it suffices to obtain only the MLEs of these characteristics. Subsequently, the knowledge of a functional link between the statistical characteristics and model factors allows to calculate the MLEs of the latter. In themselves, the parameter estimates cannot be correct or incorrect because they are somewhat arbitrary. Nevertheless, some estimates could be considered as "better" than others. To compare them, one can make use of the mean square error of ã:

M\left[(\tilde{a} - a)^2\right] = M\{[\tilde{a} - M(\tilde{a})]^2\} + M\{[M(\tilde{a}) - a]^2\}.

The first term on the right-hand side of this expression is the estimate variance which is the measure of a "random" fraction of the error:

D_{\tilde{a}} = M\{[\tilde{a} - M(\tilde{a})]^2\}.

The second term, the square of the estimate bias, gives a systematic deviation:

b_{\tilde{a}}^2 = M\{[M(\tilde{a}) - a]^2\}.

Depending on the properties of the components of the error, estimates are subdivided into several categories.
First, if the estimate expectation equals the parameter which is to be estimated, such as:

M(\tilde{a}) = a,

that is, b_ã = 0, then the estimate is called unbiased. Second, if the variance of the estimate â is less than the variance of any other estimate ã, that is:

D_{\hat{a}} < D_{\tilde{a}},

then the estimate â is referred to as an efficient estimate. And finally, if with the increase of the series size n the estimate approaches the parameter a with a probability tending to 1, in other words, at any small c > 0

\lim_{n \to \infty} P(|\tilde{a} - a| \ge c) = 0,

then the estimate is called consistent. From the Chebyshev inequality in the form:

P(|\tilde{a} - a| \ge c) \le \frac{D_{\tilde{a}}}{c^2},

it follows that a sufficient (but not required) condition of consistency is:

\lim_{n \to \infty} D_{\tilde{a}} = 0.

In other words, the accuracy of the estimate must increase with a corresponding increase of n. Both conditions of the estimate's consistency are, in fact, requirements for the convergence in probability and the mean square.
n 1X n M½xi = · μx = μx , n i=1 n
that is, this estimate is unbiased. To obtain any inference of the consistency of esti~x may mates, it requires evaluation of their variances. A variance of the estimate μ be determined as Pn Dμ~ = Var
i = 1 xi
n
Pn
=
Var½xi nDx Dx . = 2 = n2 n n
i=1
The presence of the denominator n in this expression will decrease the variance Dμ~ with increasing n, that is to say, the estimate of μx is consistent.
In the case of an unknown expectation $\mu_x$, its estimate $\tilde{\mu}_x$ may be used for the calculation of $\tilde{D}_x$:
$$\tilde{D}_x = \frac{\sum_{i=1}^{n}(x_i - \tilde{\mu}_x)^2}{n}.$$
First of all, we will modify the numerator of the fraction:
$$\sum_{i=1}^{n}(x_i - \tilde{\mu}_x)^2 = \sum_{i=1}^{n}(x_i - \mu_x + \mu_x - \tilde{\mu}_x)^2 = \sum_{i=1}^{n}(x_i - \mu_x)^2 - 2(\tilde{\mu}_x - \mu_x)\sum_{i=1}^{n}(x_i - \mu_x) + n(\tilde{\mu}_x - \mu_x)^2.$$
Since $\sum_{i=1}^{n}(x_i - \mu_x) = \sum_{i=1}^{n}x_i - n\mu_x = n(\tilde{\mu}_x - \mu_x)$, this yields
$$\sum_{i=1}^{n}(x_i - \tilde{\mu}_x)^2 = \sum_{i=1}^{n}(x_i - \mu_x)^2 - 2(\tilde{\mu}_x - \mu_x) \cdot n(\tilde{\mu}_x - \mu_x) + n(\tilde{\mu}_x - \mu_x)^2 = \sum_{i=1}^{n}(x_i - \mu_x)^2 - n(\tilde{\mu}_x - \mu_x)^2.$$
The use of this expression allows us to determine the mathematical expectation of $\tilde{D}_x$ as follows:
$$M[\tilde{D}_x] = \frac{1}{n}M\left[\sum_{i=1}^{n}(x_i - \mu_x)^2\right] - \frac{1}{n}M\{n(\tilde{\mu}_x - \mu_x)^2\} = \frac{1}{n}(nD_x - nD_{\tilde{\mu}}) = \frac{1}{n}(nD_x - D_x) = \frac{n-1}{n}D_x.$$
Therefore, the unbiased estimate $\tilde{D}_x$ has to be calculated in accordance with the following expression:
$$\tilde{D}_x = \frac{\sum_{i=1}^{n}(x_i - \tilde{\mu}_x)^2}{n-1}.$$
The variance of this estimate may be represented as
$$D_{\tilde{D}_x} = \mathrm{Var}\left[\frac{\sum_{i=1}^{n}(x_i - \tilde{\mu}_x)^2}{n-1}\right] = \frac{\mathrm{Var}\left[\sum_{i=1}^{n}(x_i - \mu_x)^2\right]}{(n-1)^2}.$$
It is clear that the numerator of this fraction has an order of magnitude $n$, while the denominator has an order of $n^2$. Therefore, the value of $D_{\tilde{D}_x}$ will decrease with increasing $n$, which indicates the consistency of $\tilde{D}_x$.
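To make the bias of the divisor-$n$ estimate tangible, here is a minimal simulation sketch (not part of the original derivation; it assumes NumPy is available and all values are synthetic). It averages both variance estimates over many realizations and compares them with the known variance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 20, 100_000
x = rng.normal(loc=0.0, scale=2.0, size=(trials, n))   # true variance Dx = 4

mu = x.mean(axis=1, keepdims=True)                     # estimate of mu_x per realization
d_biased = ((x - mu) ** 2).sum(axis=1) / n             # divisor n
d_unbiased = ((x - mu) ** 2).sum(axis=1) / (n - 1)     # divisor n - 1

# expectations: (n-1)/n * Dx = 3.8 for the biased form, Dx = 4 for the unbiased one
print(d_biased.mean(), d_unbiased.mean())
```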
An inference about the efficiency of the MLEs can be obtained in the following fashion. The lower bound of the variance of an unbiased estimate of any parameter $a$ is determined by the Cramer–Rao inequality:
$$D_{\tilde{a}} \geq \frac{1}{I(a)}.$$
The quantity $I(a)$ is called the Fisher information (or just information) contained in the series $x_1, \ldots, x_n$ about an unknown parameter $a$. In the case of MLEs, the Fisher information value is determined by the following expression:
$$I(a) = -M\left[\frac{\partial^2 l(a/(x_1, \ldots, x_n))}{\partial a^2}\right] = -M[l'']. \qquad (13)$$
The efficient estimates have variances corresponding to the lower bound of the Cramer–Rao inequality:
$$D_{\hat{a}} = \frac{1}{I(a)}.$$
Let us consider the value of $l'$. Due to the existence of the second derivative of the likelihood function, $l'$ may be linearized at a point $\hat{a}$:
$$l' = l'(\hat{a}) + (a - \hat{a}) \cdot l'' + \cdots.$$
Taking into account that $l'(\hat{a}) = 0$ for the MLE and ignoring the higher derivatives, the following expression can be written:
$$l' = (a - \hat{a}) \cdot l''.$$
Asymptotically $(n \to \infty)$, $l'' = M[l'']$, so taking into account (13) the above expression transforms into the following:
$$l' = -(a - \hat{a}) \cdot I(\hat{a}) = -\frac{a - \hat{a}}{D_{\hat{a}}}.$$
An integration of this expression yields the following form of the logarithmic likelihood function:
$$l(a) = -\frac{(a - \hat{a})^2}{2D_{\hat{a}}} + \mathrm{const}.$$
Therefore, the likelihood function will correspond to the normal distribution
$$L(a) = C \cdot \exp\left[-\frac{(a - \hat{a})^2}{2D_{\hat{a}}}\right]$$
with the expectation $\hat{a}$ and variance $D_{\hat{a}} = 1/I(a)$. As this conclusion was achieved for $n \to \infty$, all MLEs are efficient asymptotically.
2.4 Least-Squares Method

The least-squares method dates back to the works of Adrien-Marie Legendre and Carl Gauss. Conceptually, it can be interpreted easily as follows: "Results of $n$ repeated and independent measurements $\tilde{x}_i$ may be represented as a sum of an unknown value $x$ and a measurement error $\xi_i$, that is, $\tilde{x}_i = x + \xi_i$. Value $x$ is estimated in such a way as to minimize the sum of squares of the errors:
$$\sum_{i=1}^{n}\xi_i^2 = \sum_{i=1}^{n}(\tilde{x}_i - x)^2 \to \min_x."$$
The corresponding estimate of value $x$ is called the least-squares estimate (LSE) and will be marked in the following text as $\breve{x}$. Let us examine the results of measurements $\tilde{x}_i = x + \xi_i$ featuring $\mu_\xi = 0$ and $D_\xi$. As was demonstrated in the previous section, asymptotically $(n \to \infty)$ the likelihood function can be represented via fixed values of $\tilde{x}_i$ and an unknown variable $x$ in the following form:
$$L(x) = C \cdot \exp\left[-\sum_{i=1}^{n}\frac{(\tilde{x}_i - x)^2}{2D_\xi}\right].$$
Therefore, the logarithmic likelihood function will be
$$l(x) = \mathrm{const} - \frac{1}{2D_\xi}\sum_{i=1}^{n}(\tilde{x}_i - x)^2 = \mathrm{const} - \frac{S(x)}{2D_\xi}.$$
If $n \to \infty$, the maximum of the likelihood function corresponds to the minimum of the sum $S(x)$:
$$\max_x l(x) \Rightarrow \min_x S(x).$$
An extreme value of $S(x)$ is determined from the condition
$$\frac{\partial S(x)}{\partial x} = 0,$$
which leads to the following equation:
$$2\sum_{i=1}^{n}(\tilde{x}_i - x) = 0.$$
Thus, the MLE of a value $x$ is asymptotically equal to its LSE:
$$\tilde{x} \sim \breve{x} = \frac{\sum_{i=1}^{n}\tilde{x}_i}{n}.$$
The symbol "$\breve{\phantom{x}}$" will indicate the least-squares estimates in the subsequent text.
2.5 Statistical Inferences

In addition to estimating the parameters of distributions, statistics is also used to infer conclusions regarding data properties [10]. Usually, a statistical inference is the result of testing competing hypotheses $H_i$ that declare certain data properties. Such a test procedure can be described in the following way:
– formulating the property (null hypothesis $H_0$) that needs to be confirmed;
– setting the significance level α (usually, α = 0.05) for the validation of the null hypothesis;
– calculating a statistic $T_n$ whose value depends on the empirical data and which is used for an inference about the truth of the hypothesis;
– determining an acceptance region of this statistic where the equality $P(T_{\alpha/2} \leq T_n \leq T_{1-\alpha/2}) = 1 - \alpha$ is satisfied;
– deciding on the truth of the null hypothesis: if the value of the statistic $\hat{T}_n$ is inside (outside) of this region, the hypothesis is accepted (rejected).

It is noteworthy that the nature of statistical inference is such that if the hypothesis is accepted, this does not mean that it has been verified with a given probability. All it means is that there is no reason to reject this hypothesis. The rejection of a hypothesis is accompanied by two kinds of errors:
– nonacceptance of a hypothesis even though it is true (the error of the first kind);
– acceptance of a hypothesis even though it is false (the error of the second kind).

If the hypothesis is not accepted, then it is possible to predetermine the probability $P_\alpha$ of the error of the first kind. If the hypothesis is accepted, one can determine the probability $P_\beta$ of the error of the second kind (acceptance of a wrong hypothesis) for the alternative hypothesis. The probability $(1 - P_\beta)$ is also called the criterion power.

Consider a few tests that may be referred to in the following text. The first one is the t-test applied to verify the equality of the mean values of two normal series $x_{1i}$ $(i = 1, \ldots, n_1)$ and $x_{2j}$ $(j = 1, \ldots, n_2)$, that is, to verify the null hypothesis $H_0$: $\mu_1 = \mu_2$. This test implementation starts with calculations of the mean and variance estimates of these realizations:
$$\hat{\mu}_{x_1} = \frac{\sum_{i=1}^{n_1}x_{1i}}{n_1}, \quad \hat{\mu}_{x_2} = \frac{\sum_{j=1}^{n_2}x_{2j}}{n_2};$$
$$\hat{D}_{x_1} = \frac{\sum_{i=1}^{n_1}(x_{1i} - \hat{\mu}_{x_1})^2}{n_1 - 1}, \quad \hat{D}_{x_2} = \frac{\sum_{j=1}^{n_2}(x_{2j} - \hat{\mu}_{x_2})^2}{n_2 - 1}.$$
If $\hat{\mu}_{x_1} > \hat{\mu}_{x_2}$, the following statistic is calculated:
$$\hat{T}_n = \frac{\hat{\mu}_1 - \hat{\mu}_2}{\sqrt{\hat{D}_{x_1}/n_1 + \hat{D}_{x_2}/n_2}} = \hat{t}_n,$$
where the degrees of freedom $n$ equal
$$n = \frac{(\hat{D}_{x_1}/n_1 + \hat{D}_{x_2}/n_2)^2}{\dfrac{(\hat{D}_{x_1}/n_1)^2}{n_1 - 1} + \dfrac{(\hat{D}_{x_2}/n_2)^2}{n_2 - 1}}.$$
If $D_{x_1} = D_{x_2}$, then $n = n_1 + n_2 - 2$. The obtained value of $\hat{t}_n$ is compared with the tabular $t_{1-\alpha, n}$ corresponding to the given significance level α. The null hypothesis is rejected if $\hat{t}_n > t_{1-\alpha, n}$.

Another test is the F-test of variance homogeneity. In this case, the null hypothesis declares that two normal variables have the same variance ($H_0$: $D_{x_1} = D_{x_2}$). If $\hat{D}_{x_1} > \hat{D}_{x_2}$, then the statistic $T_n$ can be calculated as follows:
$$\hat{T}_n = \frac{\hat{D}_{x_1}}{\hat{D}_{x_2}} = \hat{F}_{n_1 - 1, n_2 - 1}.$$
The calculated statistic $\hat{F}_{n_1 - 1, n_2 - 1}$ is compared with the tabular value $F_{1-\alpha, n_1 - 1, n_2 - 1}$. Whenever $\hat{T}_n > F_{1-\alpha, n_1 - 1, n_2 - 1}$, the null hypothesis is rejected.

The last test to be considered is used to verify hypotheses about the type of distribution law of random variables. The basis of the test is Pearson's chi-squared test, which evaluates the difference between empirical and theoretical frequencies of an event's occurrence. Suppose that the series $x_i$ $(i = 1, \ldots, n)$ is a result of observing a random variable $X$. For the null hypothesis, an assumption has to be made that the empirical data correspond to a certain theoretical distribution. Sometimes, there is sufficient a priori knowledge about an observed phenomenon to make such an assumption. In its absence, a histogram of $x_i$ values is constructed. The histogram is a stepped figure consisting of rectangles whose bases are the boundaries of the intervals $x_{(j-1)} \ldots x_{(j)}$ $(j = 1, \ldots, k)$. The heights of the rectangles correspond to the numbers $l_j$ or frequencies $\nu_j = l_j/n$ of the results falling into specific intervals. The image of the histogram (Fig. 12) may help with making an assumption about the distribution law of the values of $x_i$. The image of the histogram represented in Fig. 12 forms the basis of an assumption of the normal distribution of the variable $X$ for the null hypothesis. As the next step, the estimates of the parameters of the assumed law are calculated using $x_i$. Let us suppose that the number of unknown parameters is equal to $m$.
Fig. 12: Histogram image.
Subsequently, the range $x_{\min} \ldots x_{\max}$ is divided into $k$ intervals so that on average each one has $l_j \geq 5$ readouts. The obtained estimates of the distribution parameters are used to calculate the theoretical probabilities $p_j$, which must not be negligibly small. The statistic $T_n$ is formed as follows:
$$\hat{T}_n = \sum_{j=1}^{k}\frac{(l_j - np_j)^2}{np_j} = n\sum_{j=1}^{k}\frac{(\nu_j - p_j)^2}{p_j}.$$
According to Pearson's theorem, this statistic asymptotically has the $\chi^2$ distribution with $(k - m - 1)$ degrees of freedom, that is,
$$\hat{T}_{n \to \infty} = \chi^2_{k-m-1}.$$
As before, the obtained value $\hat{T}_n$ is compared with the tabular value $\chi^2_{1-\alpha, k-m-1}$ corresponding to the given significance level α. The null hypothesis is rejected if $\hat{T}_n > \chi^2_{1-\alpha, k-m-1}$.
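As an illustration of this test procedure, the following sketch (assuming NumPy and SciPy are available; the series, the number of intervals and all variable names are illustrative only) fits a normal law to a series, builds $k$ intervals, and compares Pearson's statistic with the tabular χ² value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(size=500)              # observed series x_i
n, m, k = x.size, 2, 10               # m = 2 fitted parameters (mu, Dx), k intervals

mu, sd = x.mean(), x.std(ddof=1)      # estimates of the assumed normal law
edges = np.quantile(x, np.linspace(0, 1, k + 1))  # ~n/k readouts per interval
l_j, _ = np.histogram(x, bins=edges)  # empirical counts l_j

cdf = stats.norm.cdf(edges, loc=mu, scale=sd)
cdf[0], cdf[-1] = 0.0, 1.0            # extend the edge intervals to the whole axis
p_j = np.diff(cdf)                    # theoretical probabilities p_j

T = ((l_j - n * p_j) ** 2 / (n * p_j)).sum()      # Pearson's statistic
T_crit = stats.chi2.ppf(0.95, df=k - m - 1)       # alpha = 0.05
print("reject H0" if T > T_crit else "no reason to reject H0")
```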
Chapter III Stochastic Oscillatory Processes

As mentioned in the Introduction, a dynamic system is an object whose output at any point in time is defined by the values of its input at the same time instant. In addition, there is a rule which determines the evolution of the system's initial state over time. In principle, a dynamic system can be both linear and nonlinear.
3.1 Random Oscillations

A dynamic system is called linear if, under the influence of an input process $Y(t)$, its output $X(t)$ at the moment of time $t$ is determined by the following convolution equation:
$$\int_0^t h(t - \tau)Y(\tau)d\tau = \int_0^t h(\tau)Y(t - \tau)d\tau = X(t).$$
Here $h(t)$ is a certain function, assumed to be piecewise continuous. Usually, it is called the weight function of the system. The function $h(t)$ describes the state of the system at the moment $t$ after exposure to Dirac's delta (impulse) function $\delta(t - t_0)$ from a state of rest. The delta function has zero width and infinite height:
$$\delta(t - t_0) = \begin{cases} +\infty, & t = t_0, \\ 0, & t \neq t_0, \end{cases}$$
and its integral (area) equals 1:
$$\int_{-\infty}^{\infty}\delta(t - t_0)dt = 1.$$
Therefore, another name for $h(t)$ is the impulse-response function (IRF). It follows from the essence of the convolution integral that $h(t)$ defines the influence of the system input at the moment $(t - \tau)$ on the system output at time $t$. Such a linear system is called time invariant because the weight function is a function of the time difference. A physically realizable system should only respond to previous values of the input $Y(t)$, that is, the following condition must be met:
$$h(t) = 0, \quad t < 0.$$
A system is said to be stable if, with any input limited in value, its output is also limited in value. In other words, a stable linear system must have an absolutely integrable weight function:
$$\int_{-\infty}^{\infty}|h(t)|dt < \infty.$$
In the frequency domain, a linear dynamic system can be characterized by the frequency response function (FRF) $H(f)$. The FRF is defined as the Fourier transform of the weight function:
$$H(f) = \int_{-\infty}^{\infty}h(t) \cdot e^{-j2\pi ft}dt,$$
where $f$ is the cyclic frequency and $j = \sqrt{-1}$. As the FRF is a complex function, it has two equivalent representations:
$$H(f) = U(f) + jV(f) = A(f) \cdot e^{j\Phi(f)}.$$
$U(f)$ and $V(f)$ are the real and imaginary parts of the FRF; $A(f)$ and $\Phi(f)$ are the magnitude (amplitude) and angle (phase) components of the vector representation of the FRF. The relationships between these two forms of $H(f)$ are described by the following formulas:
$$A(f) = \sqrt{U(f)^2 + V(f)^2};$$
$$\Phi(f) = \tan^{-1}\frac{V(f)}{U(f)}.$$
The FRF can be given the following physical interpretation. Let the input of a linear system be a harmonic oscillation of the form
$$Y(t) = A \cdot \sin(2\pi f_0 t + \varphi),$$
whose second derivative is
$$\frac{d^2Y(t)}{dt^2} = -(2\pi f_0)^2 \cdot Y(t).$$
Taking into account that
$$\frac{d^nX(t)}{dt^n} = \int_0^t h(\tau)\frac{d^nY(t - \tau)}{dt^n}d\tau,$$
one can write:
$$\frac{d^2X(t)}{dt^2} = -\int_0^t h(\tau) \cdot (2\pi f_0)^2 \cdot Y(t - \tau)d\tau = -(2\pi f_0)^2 \cdot X(t).$$
This means that the output process of the linear system is also harmonic. Consequently, the ratio of the amplitudes of the input and output processes determines the $A(f)$ component, and the difference of their phases determines the $\Phi(f)$ component.

Note that the convolution of two functions in the time domain corresponds to the multiplication of the Fourier images of these functions. As a result, the frequency spectrum of the output process of a linear system can be defined as the product of the input spectrum and the system's FRF.

As an example, consider the dynamic system described by ordinary differential equation (2) with constant coefficients:
$$\ddot{X}(t) + 2h\dot{X}(t) + (2\pi f_0)^2X(t) = Y(t). \qquad (14)$$
Such a model describes a linear elastic system with the natural frequency $f_0$ and damping factor $h$ which characterizes, for instance, losses of energy in mechanical systems. A typical view of the amplitude component $A(f)$ is shown in Fig. 13 (in a logarithmic scale).
Fig. 13: Amplitude component of the FRF.
As it was mentioned earlier, the phase component Φ(f ) of the system’s FRF determines the difference of phases (time delay) of the input and output processes corresponding to the same frequency (Fig. 14).
Fig. 14: Phase component of the FRF.
In the case when the system input is "white" noise, that is, $Y(t) = E(t)$, the system's output $X(t)$ consists of random oscillations [11]. To determine their quantitative characteristics, the notion of a "dynamic shaping system" can be implemented. This approach allows one to consider the observed process as the output of a hypothetical system whose input is "white" noise. Accordingly, the parameters of this system are used to characterize (describe) the observed process. This means that the random oscillations may be exhaustively described in terms of the parameters of a linear elastic system: its natural frequency $f_0$ and damping factor $h$. A time-domain representative characteristic of random oscillations is given by a correlation function of the form [12]:
$$\rho(\tau) = \frac{e^{-h|\tau|}\sin(2\pi f_h|\tau| + \varphi)}{\sin\varphi}. \qquad (15)$$
Here $f_h = \sqrt{f_0^2 - (h/2\pi)^2}$ is the natural frequency corrected for damping and $\sin\varphi = f_h/f_0$. As follows from Fig. 13, the output oscillations are mainly concentrated in the neighborhood of $f_0$. Due to this fact, there is another name for random oscillations – a narrowband random process. Such a process is characterized by a bandwidth $\Delta f$ at the points where $A(f)$ equals $0.707 \cdot A(f_0)$, as shown in Fig. 13. The value of this bandwidth is related to the damping factor $h$ in the following manner [13]:
$$\Delta f = \frac{h}{\pi}.$$
The expression for the spectral density (at $f > 0$) of a process generated by a linear elastic system under the effect of "white" noise $E(t)$ is [11]
$$S(f) = \frac{2D_xhf_0^2}{\pi^2(f_0^2 - f^2)^2 + h^2f^2}.$$
Everything mentioned earlier allows us to assert that there is a well-developed set of mathematical tools for the quantitative description and analysis of random oscillations.
3.2 Yule Series

Let us review the properties of a process generated by the second-order autoregressive model of the form
$$x_i - a_1x_{i-1} - a_2x_{i-2} = \varepsilon_i,$$
which is called the Yule series. The stationarity conditions of this model can be determined by examining its characteristic equation (6). For the Yule series, this equation will have the following form:
$$z^2 - a_1z - a_2 = 0. \qquad (16)$$
It was stipulated earlier that the stationarity conditions of the AR process are formulated as the roots of the characteristic equation (16) being inside the unit circle, that is, $|z_i| < 1$. It is known that the roots of the quadratic equation are related to factors $a_1$ and $a_2$ by the expressions:
$$z_1 \cdot z_2 = -a_2; \quad z_1 + z_2 = a_1. \qquad (17)$$
This fact allows us to identify the stationarity conditions of the Yule series with respect to the factors of the AR(2) model. From the first equation of expression (17), it follows that $|-a_2| = |z_1| \cdot |z_2|$. Taking into account the requirements $|z_1| < 1$ and $|z_2| < 1$, the stationarity condition for $a_2$ is determined as
$$-1 < a_2 < 1.$$
A substitution of $z_2 = -a_2/z_1$, obtained from the first equation of (17), into the second equation allows us to transform the latter into the following form:
$$a_1 + \frac{a_2}{z_1} = z_1, \quad |z_1| < 1.$$
From this expression, two additional stationarity conditions are deduced as algebraic relationships between the factors $a_1$ and $a_2$:
$$a_2 + a_1 < 1; \quad a_2 - a_1 < 1.$$
Together with the previously obtained condition $-1 < a_2 < 1$, they define the stationarity area of the AR(2) factors in the form of a triangle in the $(a_1, a_2)$ plane (Fig. 15); a numerical sketch for checking these conditions follows Fig. 15.
Fig. 15: Region of factors a1 and a2 stationarity.
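The stationarity conditions can be checked mechanically; the following minimal sketch (illustrative helper names, not from the book) tests whether a pair of AR(2) factors lies inside the triangle and whether the roots are complex, that is, below the parabolic boundary:

```python
def ar2_is_stationary(a1: float, a2: float) -> bool:
    """Check whether AR(2) factors lie inside the stationarity triangle:
    a2 + a1 < 1, a2 - a1 < 1, -1 < a2 < 1 (roots of z**2 - a1*z - a2
    inside the unit circle)."""
    return (a2 + a1 < 1) and (a2 - a1 < 1) and (-1 < a2 < 1)

def ar2_roots_are_complex(a1: float, a2: float) -> bool:
    """True below the parabolic boundary a1**2 + 4*a2 < 0 of Fig. 15."""
    return a1 ** 2 + 4 * a2 < 0

print(ar2_is_stationary(1.52, -0.9), ar2_roots_are_complex(1.52, -0.9))  # True True
```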
Let us consider the properties of the Yule series in the stationarity area utilizing its correlation function described by the second-order difference equation of the kind
$$\rho_i = a_1\rho_{i-1} + a_2\rho_{i-2}. \qquad (18)$$
A common solution to this equation is obtained through the use of the roots of the characteristic equation (16):
$$\rho_i = A_1z_1^i + A_2z_2^i,$$
where $A_1$ and $A_2$ are arbitrary constants derived from the initial conditions:
$$\rho_0 = 1 = A_1 + A_2; \quad \rho_1 = A_1z_1 + A_2z_2.$$
From equation (18) for $i = 1$ one obtains $\rho_1 = a_1/(1 - a_2)$. Taking into account the relationships (17) between the factors $a_1$, $a_2$ and the roots $z_1$, $z_2$ of the characteristic equation, the constants may be expressed as
$$A_1 = \frac{z_1(1 - z_2^2)}{(1 + z_1z_2)(z_1 - z_2)}; \quad A_2 = \frac{-z_2(1 - z_1^2)}{(1 + z_1z_2)(z_1 - z_2)}.$$
Now, the common solution to equation (18) can be written in the form
$$\rho_i = \frac{(1 - z_2^2)z_1^{i+1} - (1 - z_1^2)z_2^{i+1}}{(1 + z_1z_2)(z_1 - z_2)}. \qquad (19)$$
Let us analyze this expression from the standpoint of possible values of $z_1$ and $z_2$ defined by the well-known expression
$$z_{1,2} = \frac{a_1 \pm \sqrt{a_1^2 + 4a_2}}{2}.$$
First of all, we shall consider the case where roots $z_1$ and $z_2$ are real values, that is, when the following condition is fulfilled:
$$a_1^2 + 4a_2 \geq 0.$$
This condition is respected in the zone of the stationarity triangle above the parabolic boundary in Fig. 15. In region "1," the correlation function tends to zero while still remaining positive, whereas in region "2" the correlations are sign-alternating [7]. The positive root is dominant in region "1" and the negative root in region "2."

If the roots are complex (the area below the parabolic boundary in Fig. 15), they may be represented in equation (19) as follows:
$$z_1 = A \cdot e^{j\theta}; \quad z_2 = A \cdot e^{-j\theta},$$
where $\theta = 2\pi f$ is the angular frequency. As a result, the expression for the correlation function takes the following form:
$$\rho_i = \frac{[\mathrm{sgn}(a_1)]^i(-a_2)^{i/2}\sin(\theta \cdot i + \varphi)}{\sin\varphi}, \qquad (20)$$
where
$$\mathrm{sgn}(a_1) = \begin{cases} +1, & a_1 > 0; \\ -1, & a_1 < 0. \end{cases}$$
Equation (20) describes a sinusoid with the damping factor $(-a_2)^{1/2}$. Parameters $\theta$ and $\varphi$ are expressed through the autoregressive model factors in the following form [7]:
$$\cos\theta = \frac{a_1}{2\sqrt{-a_2}}; \quad \tan\varphi = \frac{1 - a_2}{1 + a_2}\tan\theta. \qquad (21)$$
Specifically, in region "3" (Fig. 15), the correlation function always changes its sign on passage from $\rho_0$ to $\rho_1$; phase $\varphi$ has a value within the $\pi/2 \ldots \pi$ range. In region "4," where $\varphi < \pi/2$, the initial correlations are always positive and reverse the sign with growing lags.

For completeness of the AR(2) model examination, the spectrum of the Yule series has to be estimated. Let us take advantage of the fact that the spectrum of the output of the linear system described by the operator $C(B)$ is related to the uniform white noise spectrum $D_\varepsilon/\pi$ through the square of the operator module in which $B = e^{-j\theta}$ [12]:
$$S(\theta) = \frac{D_\varepsilon}{\pi}|C(e^{-j\theta})|^2.$$
The AR(2) model operator is as follows:
$$C(B) = \frac{1}{1 - a_1B - a_2B^2},$$
therefore, the process spectrum will be equal to
$$S(\theta) = \frac{D_\varepsilon}{\pi[1 + a_1^2 + a_2^2 - 2a_1(1 - a_2)\cos\theta - 2a_2\cos(2\theta)]}.$$
The Yule series variance determined by (7) takes the following form:
$$D_x = \frac{D_\varepsilon}{1 - a_1\rho_1 - a_2\rho_2}.$$
For the first two correlations of the Yule series, it is possible to compose the Yule–Walker system of equations:
$$\rho_1 = a_1 + a_2\rho_1; \quad \rho_2 = a_1\rho_1 + a_2.$$
This system of equations can be represented in the matrix form as
$$\mathbf{B} \cdot \mathbf{A} = \mathbf{P}, \qquad (22)$$
where
$$\mathbf{A} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}; \quad \mathbf{P} = \begin{pmatrix} \rho_1 \\ \rho_2 \end{pmatrix}; \quad \mathbf{B} = \begin{pmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{pmatrix}.$$
Solving this system with respect to vector $\mathbf{P}$ yields the following expressions for the first two correlations:
$$\rho_1 = \frac{a_1}{1 - a_2}; \quad \rho_2 = a_2 + \frac{a_1^2}{1 - a_2}.$$
Their substitution into the earlier expression for $D_x$ enables us to complete the variance description with model factors:
$$D_x = \frac{(1 - a_2)D_\varepsilon}{(1 + a_2)[(1 - a_2)^2 - a_1^2]}. \qquad (23)$$
Let us give an example materializing the form of the correlation function of the Yule series described by the following model:
$$x_i = 1.52x_{i-1} - 0.9x_{i-2} + \varepsilon_i.$$
The theoretical correlation function of this process is calculated from (18) with the initial conditions $\rho_0 = 1$ and $\rho_1 = a_1/(1 - a_2) = 0.8$. Its graphical representation is given in Fig. 16, from which it follows that the basic period of the correlation function equals 10; a numerical sketch reproducing this behavior follows Fig. 16.
Fig. 16: Correlation function of Yule series.
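The theoretical correlations of this example are easy to reproduce from the difference equation (18); the following sketch (assuming NumPy is available) iterates the recursion and exhibits the basic period of about 10 lags:

```python
import numpy as np

a1, a2 = 1.52, -0.9
rho = np.empty(40)
rho[0] = 1.0
rho[1] = a1 / (1.0 - a2)           # = 0.8
for i in range(2, 40):             # difference equation (18)
    rho[i] = a1 * rho[i - 1] + a2 * rho[i - 2]

# the damped oscillation repeats with a basic period of about 10 lags
print(np.round(rho[:12], 3))
```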
This example shows that for the model factor values
$$0 \leq a_1 \leq 2; \quad -1 < a_2 \leq 0, \qquad (24)$$
corresponding to region "4" of the stationarity triangle (Fig. 15), the behavior of the AR(2) process is pseudo-periodic.
3.3 Discrete Coincidence of Yule Series and Random Oscillations

The earlier fact alludes to the physical interpretation of the Yule model as a discrete analog of the linear differential equation (14):
$$\ddot{X}(t) + 2h\dot{X}(t) + \omega_0^2X(t) = E(t).$$
Recall expression (15) for the correlation function of random oscillations in the following form:
$$\rho(\tau) = \frac{e^{-h|\tau|}\sin(\omega_h|\tau| + \varphi)}{\sin\varphi}.$$
This equation corresponds to expression (20), which for the values of factors $a_1$ and $a_2$ defined by region (24) takes the following form:
$$\rho_i = \frac{(-a_2)^{i/2}\sin(\theta \cdot i + \varphi)}{\sin\varphi}. \qquad (25)$$
In fact, the AR(2) model factors may be linked to the oscillation characteristics (damping factor $h$ and natural frequency $\omega_0$) from the following condition:
$$\rho_i = \rho(\tau = \Delta t \cdot i).$$
Here $\Delta t = t_i - t_{i-1}$ is the time interval between neighboring members of the Yule series. Such an equality sets the condition of statistical equivalency between sampled random oscillations $X(t)$ and the AR(2) process. Substitution of (15) and (25) into that equality gives the following:
$$(-a_2)^{1/2} = e^{-h\Delta t}; \quad \theta = \omega_h\Delta t.$$
Taking into account the fact that
$$\cos\theta = \frac{a_1}{2\sqrt{-a_2}},$$
we finally obtain the expressions for $h$ and $f_h = \omega_h/2\pi$ as follows:
$$h = -\frac{\ln(-a_2)}{2 \cdot \Delta t}; \quad f_h = \frac{1}{2\pi\Delta t} \cdot \cos^{-1}\frac{a_1}{2\sqrt{-a_2}}. \qquad (26)$$
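Expressions (26) translate directly into code; the sketch below (illustrative function name, not from the book) converts AR(2) factors into the damping factor and the damped natural frequency, here applied to the example factors from Section 3.2:

```python
import math

def yule_to_oscillation(a1: float, a2: float, dt: float):
    """Convert AR(2) factors to damping factor h and damped frequency f_h
    via expressions (26); valid for complex roots, i.e. -a2 > 0."""
    h = -math.log(-a2) / (2.0 * dt)
    f_h = math.acos(a1 / (2.0 * math.sqrt(-a2))) / (2.0 * math.pi * dt)
    return h, f_h

h, f_h = yule_to_oscillation(1.52, -0.9, dt=1.0)
print(h, f_h)   # h ≈ 0.053, f_h ≈ 0.102 (period of about 10 samples)
```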
Since the interval Δt is present in expressions (26), a question arises regarding the rule for choosing its value, in other words, the rule for transforming a continuous-time realization $x(t)$ into a time series. Terms $x_i$ of this time series are readouts (samples) of $x(t)$ at points in time $t_i = \Delta t \cdot i$. The interval Δt is referred to as the sampling interval. The theoretical base of the discretization procedure is the Nyquist–Shannon–Kotelnikov (sampling) theorem. In particular, Shannon formulated it in the following terms: if a function $x(t)$ contains no frequencies higher than $f_m$, then this function is completely determined by giving its ordinates at a series of points spaced $\Delta t = 1/(2f_m)$ apart. The value $2f_m$ is called the Nyquist frequency (rate).
This theorem sets up the maximum value of the interval Δt. However, for practical purposes its value is obtained from the expression of the form
$$\Delta t = \frac{1}{2\kappa \cdot f_m},$$
where $\kappa > 1$ is a margin coefficient whose value depends on how closely a real signal is approximated by one featuring a spectrum limited by $f_m$, as well as on the purposes of subsequent processing.

As mentioned earlier, the energy of random oscillations is concentrated within a narrow bandwidth around the natural frequency $f_0$. This fact allows us to consider random oscillations as a process with a pseudo-limited spectrum. Due to the absence of a strictly determined value of the frequency $f_m$, the definition $f_m = f_0$ is more characteristic for the oscillation's spectrum. In this case, as follows from Fig. 13, the value of the margin coefficient must be intentionally $\kappa > 1.4$. Its value can be determined with a reference to the subsequent processing of random oscillations, that is, with features of their model adaptation.

As noted in chapter II, adaptation of the statistical model to empirical data is reduced to the evaluation of parameters of that model. In the case of the Yule series, it reduces to estimating the vector $\mathbf{A}$ of the Yule–Walker system (22). Formally, a solution of that system related to the factors of the Yule model can be represented as follows:
$$\mathbf{A} = \mathbf{B}^{-1} \cdot \mathbf{P}.$$
To provide stability of such a solution, the matrix $\mathbf{B}$ has to be well-conditioned [14]. A matrix is not well-conditioned if its properties approach the properties of a singular matrix whose determinant is zero. The fact is that the formation of the inverse matrix $\mathbf{B}^{-1}$ involves dividing by the value of the determinant $|\mathbf{B}|$. Therefore, the larger the value of the determinant, the more stable the solution of the matrix equation. Taking this fact into account, the criterion for choosing the value of κ may be written as follows:
$$|\mathbf{B}| \to \max_\kappa.$$
It must be noted that the absolute value of the determinant may be increased simply by multiplying equation (22) by a certain number. Following that, a value of κ which provides an absolute maximum of the determinant $|\mathbf{B}|$ needs to be found. The matrix $\mathbf{B}$ of the Yule–Walker system has the following form:
$$\mathbf{B} = \begin{pmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{pmatrix}.$$
Its determinant $|\mathbf{B}| = 1 - \rho_1^2$ obviously reaches a maximum when $\rho_1 = 0$. Taking into account expression (25), the condition of the determinant maximum can be rewritten as
$$\rho_1 = \frac{(-a_2)^{1/2}\sin(\theta + \varphi)}{\sin\varphi} = \frac{(-a_2)^{1/2}\sin(2\pi f_h \cdot \Delta t + \varphi)}{\sin\varphi} = \frac{(-a_2)^{1/2}\sin(\pi/\kappa + \varphi)}{\sin\varphi} = 0,$$
due to $\theta = \omega_h\Delta t$ and $\Delta t = 1/(2\kappa \cdot f_0)$. Therefore, the fulfillment of the condition $\rho_1 = 0$ may be reached when $\sin(\pi/\kappa + \varphi) = 0$, or $\pi/\kappa + \varphi = \pi$. Earlier, it was mentioned that $\sin\varphi = \omega_h/\omega_0$; therefore, for small values of $h$ ($\omega_h \approx \omega_0$), the argument φ will be equal to
$$\varphi = \sin^{-1}(\omega_h/\omega_0) \approx \sin^{-1}(1) = \pi/2.$$
This means that the condition $\rho_1 = 0$ stated earlier is fulfilled with $\kappa = 2$. In addition to $\rho_1$, all of the following odd correlations are equal to zero as well. The value $\kappa = 2$ corresponds to the diagonal type of matrix $\mathbf{B}$, confirming the known fact that orthogonal and diagonal matrices are always well-conditioned. Notice that in the case of optimal sampling, the frequency ratio $\omega_h/\omega_0 > 0.99$ for values $h < 0.2$. In other words, for practical purposes the value of the natural frequency and its damped value may be assumed to be equal. Thus, the optimal ($\kappa = 2$) discretization of random oscillations is provided when the sampling rate equals four times the value of the natural frequency. Changing κ in either direction causes $\rho_1 \neq 0$ and leads the determinant of matrix $\mathbf{B}$ to decrease. However, the range $1.6 \leq \kappa \leq 2.5$ provides a value of the determinant $|\mathbf{B}| \geq 0.9$. Therefore, this particular range of κ is characterized by a degradation of accuracy of less than 10%. This fact allows us to declare that the condition $\kappa = 2$ is not too strict, as a sampling interval
$$\frac{T_0}{5} \leq \Delta t < \frac{T_0}{3}$$
may still be considered pseudo-optimal. Having established a one-to-one conversion of Yule model factors to parameters of the elastic system allows us to transform the adaptation of models of random oscillations into a linear task of estimating these model factors. In turn, these estimates can be used to calculate the parameters ($h$ and $f_0$) of random oscillations by means of expressions (26).
3.4 Estimation of Yule Model Factors

This section deals with adapting the Yule series models to empirical data, which yields the values of the model factors. In practice, only a few observations of the concerned process (phenomenon) are available, and sometimes there is only one. Statistical properties of random processes, on the other hand, are determined for a multitude of their realizations. Nevertheless, this fact will not limit the possibility of an estimation procedure if the observed processes satisfy certain assumptions.

Suppose there is a sequence of values $x_1, \ldots, x_n$ of a random process. Each value $x_i$ features an expectation $\mu_x$ and a variance $D_x$. Let us describe the time average value $\bar{x}$ of a realization as
$$\bar{x} = \lim_{n \to \infty}\frac{1}{n}\sum_{i=1}^{n}x_i.$$
In accordance with the Birkhoff–Khinchin theorem [9], for a stationary series with a finite mathematical expectation, the probability of existence of the variable $\bar{x}$ is $P = 1$. Additionally, if a stationary series meets the following condition:
$$\lim_{n \to \infty}\frac{1}{n}\sum_{i=1}^{n}\rho_i = 0, \qquad (27)$$
then such a series features the ergodic property, which signifies the equality of the space and time averages: $\mu_x = \bar{x}$. Condition (27) is not too hard to realize since in practice it is reduced to the requirement that the correlation function tend to zero with an increase of the time lag. Actually, the stationary AR(2) process may be considered ergodic because its correlation function is a damped sinusoid tending to zero. This fact enables the estimation of the Yule model factors $a_1$ and $a_2$ based on a single realization.
3.4.1 Estimates of Maximum Likelihood

As introduced in Section 3.2, correlations of the Yule series may be composed via the system of Yule–Walker equations (22):
$$\mathbf{B} \cdot \mathbf{A} = \mathbf{P}.$$
Implementation of Cramer's rule for solving this system yields a linkage of the Yule model factors with the series correlations:
$$a_1 = \frac{\rho_1(1 - \rho_2)}{1 - \rho_1^2}; \quad a_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}. \qquad (28)$$
These expressions allow us to determine the maximum likelihood estimates (MLE) of factors $a_1$ and $a_2$ if the estimates of correlations $\rho_1$ and $\rho_2$ are already known:
$$\tilde{a}_1 = \frac{\tilde{\rho}_1(1 - \tilde{\rho}_2)}{1 - \tilde{\rho}_1^2}; \quad \tilde{a}_2 = \frac{\tilde{\rho}_2 - \tilde{\rho}_1^2}{1 - \tilde{\rho}_1^2}.$$
Using equations (26), which relate $a_1$ and $a_2$ to the random oscillation characteristics, it is not complicated to find the MLEs $\tilde{h}$ and $\tilde{f}_0$. Recall that due to the invariance property of the MLE, the processing algorithms can be developed without considering the likelihood function for the model factors. Therefore, the task of adapting the Yule model is reduced to estimating its correlations.

Consider the task of estimating $\rho_1$ and $\rho_2$, assuming there is a realization of the Yule series in the form of an $n$-dimensional vector $\mathbf{x}^T = (x_1, \ldots, x_n)$. This vector is characterized by $\mu_x = 0$ and the joint Gauss probability density in the following form:
$$f(x_1, \ldots, x_n) = (2\pi)^{-n/2}|\mathbf{H}|^{1/2}\exp\left\{-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\eta_{ij}x_ix_j\right\}.$$
In this expression, $|\mathbf{H}|$ is the determinant of the matrix $\mathbf{H}$ inverse to the covariance matrix:
$$\mathbf{H} = \|\eta_{ij}\| = \boldsymbol{\Gamma}^{-1}; \quad \boldsymbol{\Gamma} = \|\gamma_{ij}\|; \quad i, j = 1, \ldots, n.$$
The element $\eta_{ij}$ of the inverse matrix equals the algebraic complement $\Gamma_{ji}$ to the element $\gamma_{ji}$ divided by the determinant $|\boldsymbol{\Gamma}|$. In turn, $\Gamma_{ji}$ is the determinant of the matrix derived from $\boldsymbol{\Gamma}$ by crossing out the $j$th row and $i$th column, multiplied by $(-1)^{j+i}$. The statistical relationship between the terms of a stationary series is determined by their relative positions within the realization $x_1, \ldots, x_n$, that is, for a stationary series
$$\boldsymbol{\Gamma} = \begin{pmatrix} \gamma_0 & \cdots & \gamma_{-(n-1)} \\ \vdots & \ddots & \vdots \\ \gamma_{-(n-1)} & \cdots & \gamma_0 \end{pmatrix} = \|\gamma_k\|,$$
where $k = i - j = -(n-1), \ldots, (n-1)$. Accordingly, the elements of the matrix $\mathbf{H}$ can be represented as follows:
$$\eta_k = \Gamma_{-k}/|\boldsymbol{\Gamma}|,$$
and the exponent of the exponential function in the expression of $f(x_1, \ldots, x_n)$ transforms into
$$\sum_{i=1}^{n}\sum_{j=1}^{n}\eta_{ij}x_ix_j = \sum_{k=-(n-1)}^{n-1}\eta_k\sum_{j=1}^{n-|k|}x_jx_{j+|k|}.$$
The likelihood function is derived from the probability density providing fixed $\mathbf{x}$ and variable parameters $\eta_k$. In particular, the logarithmic likelihood function will have the form
$$l(\eta_k/\mathbf{x}) = -\frac{n}{2}\ln(2\pi) + \frac{1}{2}\ln|\mathbf{H}| - \frac{1}{2}\sum_{k=-(n-1)}^{n-1}\eta_k\sum_{j=1}^{n-|k|}x_jx_{j+|k|}.$$
(29)
Now, let us represent a derivative of the determinant |Η|. It may be done through the algebraic complement Hk to elements ηk lying on the same diagonal as follows: n − jk j ∂jH j X = H kj , ∂ηk j=1
where j is an ordinal number attributed to diagonal elements ηk. Finally, expression (29) takes the form: nX − jk j j=1
n − jk j H kj X xj xj + jkj . = jH j j=1
As Η–1 = Γ, the algebraic complement Hk to the determinant |H| is a corresponding element of matrix Γ: Hk = γ − k. jH j
Thus, the expression for calculating the covariance MLE of a stationary series will be determined as
$$\tilde{\gamma}_k = \tilde{\gamma}_{-k} = \frac{1}{n-k}\sum_{j=1}^{n-k}x_jx_{j+k}. \qquad (30)$$
In fact, this expression is obtained from the condition of the maximum of the likelihood function with respect to the parameter $\eta_k$ but not to $\gamma_k$. However, the invariance property allows us to define the covariance estimates calculated in accordance with (30) as the MLE. Accordingly, the MLE of correlations of a stationary series can be obtained using the following expression:
$$\tilde{\rho}_k = \frac{\tilde{\gamma}_k}{\tilde{\gamma}_0} = \frac{n}{n-k} \cdot \frac{\sum_{i=1}^{n-k}x_ix_{i+k}}{\sum_{i=1}^{n}x_i^2}.$$
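The estimation chain just described (sample correlations via (30), then factors via (28)) can be sketched as follows, assuming NumPy is available; the simulated realization and function names are illustrative:

```python
import numpy as np

def mle_correlation(x: np.ndarray, k: int) -> float:
    """MLE of the k-th correlation of a stationary zero-mean series, per (30)."""
    n = x.size
    return (n / (n - k)) * (x[:n - k] * x[k:]).sum() / (x * x).sum()

def mle_yule_factors(x: np.ndarray):
    """MLEs of the AR(2) factors via the invariance property and (28)."""
    r1, r2 = mle_correlation(x, 1), mle_correlation(x, 2)
    return (r1 * (1 - r2) / (1 - r1 ** 2),
            (r2 - r1 ** 2) / (1 - r1 ** 2))

# simulate a Yule series x_i = 1.52 x_{i-1} - 0.9 x_{i-2} + eps_i, then re-estimate
rng = np.random.default_rng(3)
x = np.zeros(5000)
eps = rng.normal(size=5000)
for i in range(2, 5000):
    x[i] = 1.52 * x[i - 1] - 0.9 * x[i - 2] + eps[i]
print(mle_yule_factors(x))     # should land close to (1.52, -0.9)
```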
3.4.2 Least-Squares Estimation

It is necessary to note that the method of least squares may be used for the estimation of model factors as well. This statement is based on the fact that the MLE and least-squares estimates (LSE) are asymptotically equivalent, that is, they converge to the same value when the realization size tends to infinity. To prove this statement, let us examine the realization $x_1, \ldots, x_n$ featuring the mathematical expectation $\mu_x = 0$. If a process is generated by the AR(2) model, one may pass from $\mathbf{x}^T = (x_1, \ldots, x_n)$ to the set $(x_1, x_2, \varepsilon_3, \ldots, \varepsilon_n)$ using
$$\varepsilon_i = x_i - a_1x_{i-1} - a_2x_{i-2}, \quad i = 3, \ldots, n.$$
The probability density of that set can be represented in the form of the following product:
$$f(x_1, x_2, \varepsilon_3, \ldots, \varepsilon_n) = f(\varepsilon_3, \ldots, \varepsilon_n/x_1, x_2) \cdot f(x_1, x_2).$$
The white noise $E$ has a Gauss distribution with $\mu_\varepsilon = 0$ and $D_\varepsilon = 1$; therefore,
$$f(\varepsilon_3, \ldots, \varepsilon_n/x_1, x_2) = (2\pi)^{-(n-2)/2}\exp\left[-\frac{1}{2}\sum_{i=3}^{n}\varepsilon_i^2\right] = (2\pi)^{-(n-2)/2}\exp\left[-\frac{1}{2}\sum_{i=3}^{n}(x_i - a_1x_{i-1} - a_2x_{i-2})^2\right].$$
As the AR(2) model is linear, $x_1$ and $x_2$ have the Gauss distribution too. Statistical relationships between these values are represented by the following covariance matrix:
$$\boldsymbol{\Gamma} = \begin{pmatrix} \gamma_0 & \gamma_1 \\ \gamma_1 & \gamma_0 \end{pmatrix} = D_x\begin{pmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{pmatrix}.$$
In view of matrix $\boldsymbol{\Gamma}$, the probability density of terms $x_1$ and $x_2$ may be written as
$$f(x_1, x_2) = (2\pi)^{-1}|\mathbf{H}|^{1/2}\exp\left[-\frac{1}{2}(x_1, x_2)\mathbf{H}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right].$$
Let us recall that matrix $\mathbf{H}$ is the inverse of the covariance matrix; therefore, its expression is as follows:
$$\mathbf{H} = \boldsymbol{\Gamma}^{-1} = \frac{1}{D_x(1 - \rho_1^2)} \cdot \begin{pmatrix} 1 & -\rho_1 \\ -\rho_1 & 1 \end{pmatrix}.$$
In order to construct the likelihood function, the probability density has to be represented as an explicit function of the model factors. In Section 3.2, the expression for $\rho_1$ was derived in the following form:
$$\rho_1 = \frac{a_1}{1 - a_2}.$$
Additionally, the variance of the Yule series was determined via factors $a_1$ and $a_2$ by expression (23):
$$D_x = \frac{(1 - a_2)}{(1 + a_2)[(1 - a_2)^2 - a_1^2]},$$
assuming $D_\varepsilon = 1$. Making use of these two latter expressions as well as the expression for matrix $\mathbf{H}$, the logarithmic likelihood function may be constructed by setting the values of $x_1, \ldots, x_n$ as fixed and the factors $a_1$, $a_2$ as variable:
$$l(a_1, a_2) = \mathrm{const} + \frac{1}{2}\ln(1 - a_2^2) + x_1x_2a_1(1 + a_2) - \frac{1}{2}(1 - a_2^2)(x_1^2 + x_2^2) - \frac{1}{2}\sum_{i=3}^{n}(x_i - a_1x_{i-1} - a_2x_{i-2})^2.$$
In this expression, only the following term depends on $n$:
$$S(a_1, a_2) = \frac{1}{2}\sum_{i=3}^{n}(x_i - a_1x_{i-1} - a_2x_{i-2})^2.$$
That is why asymptotically ($n \to \infty$) the maximum of the likelihood function corresponds to the minimum of the sum $S(a_1, a_2)$:
$$\max_{a_1, a_2}l(a_1, a_2) \sim \min_{a_1, a_2}S(a_1, a_2).$$
The extreme value of $S(a_1, a_2)$ is determined by the conditions
$$\frac{\partial S(a_1, a_2)}{\partial a_j} = 0, \quad j = 1, 2,$$
which lead to the following system of equations:
$$\sum_{i=3}^{n}(x_i - a_1x_{i-1} - a_2x_{i-2}) \cdot (-x_{i-1}) = 0;$$
$$\sum_{i=3}^{n}(x_i - a_1x_{i-1} - a_2x_{i-2}) \cdot (-x_{i-2}) = 0.$$
The second derivatives of $S(a_1, a_2)$ are
$$\frac{\partial^2S(a_1, a_2)}{\partial a_j^2} = \sum_{i=3}^{n}x_{i-j}^2 > 0; \quad j = 1, 2.$$
Therefore, the extreme value of $S(a_1, a_2)$ is a minimum.
a1
xi2− 1 + a2
i=3
a1
n X
xi − 1 + a2
i=3
n X
xi − 2 =
n X
i=3 n X i=3
xi xi − 1 ;
i=3
xi2− 2 =
n X
xi xi − 2 ,
i=3
which can be written in a matrix form A · B = C. Here A=
a1 a2
bij =
! ; B=
m X l=3
b11
b12
b21
b22
xl − i xl − j ;
! ;
C=
b01 b02
i = 0 , 1 , 2; j = 1, 2.
! ;
Implementation of Cramer's formula leads to the LSE of the factors $a_1$ and $a_2$ in the following forms:
$$\breve{a}_1 = \frac{b_{01}b_{22} - b_{02}b_{12}}{b_{11}b_{22} - b_{12}^2}; \quad \breve{a}_2 = \frac{b_{11}b_{02} - b_{01}b_{12}}{b_{11}b_{22} - b_{12}^2}.$$
Let us mark $\breve{\rho}_1$ and $\breve{\rho}_2$ as the LSE:
$$\breve{\rho}_1 = \frac{\breve{\gamma}_1}{\breve{\gamma}_0} = \frac{\sum_{i=1}^{n-2}x_ix_{i+1}}{\sum_{i=1}^{n-2}x_i^2}; \quad \breve{\rho}_2 = \frac{\breve{\gamma}_2}{\breve{\gamma}_0} = \frac{\sum_{i=1}^{n-2}x_ix_{i+2}}{\sum_{i=1}^{n-2}x_i^2}.$$
As a result, the earlier expressions for $\breve{a}_1$ and $\breve{a}_2$ may be written in the following form:
$$\breve{a}_1 = \frac{\breve{\rho}_1(1 - \breve{\rho}_2)}{1 - \breve{\rho}_1^2}; \quad \breve{a}_2 = \frac{\breve{\rho}_2 - \breve{\rho}_1^2}{1 - \breve{\rho}_1^2}. \qquad (31)$$
The expressions (31) correspond to the expressions for the MLEs of factors $a_1$ and $a_2$. Let us recall that the MLE of correlations is given by the expression
$$\tilde{\rho}_k = \frac{n}{n-k}\frac{\sum_{i=1}^{n-k}x_ix_{i+k}}{\sum_{i=1}^{n}x_i^2},$$
that is, the LSE and MLE differ by the factor $n/(n-k)$, and $\tilde{\rho}_k \to \breve{\rho}_k$ when $n \to \infty$. Hence, both types of estimates are consistent. In practice, the difference between the MLE and LSE is insignificant for $n > 100$.
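A corresponding least-squares sketch (assuming NumPy; the helper name is illustrative) solves the normal equations $\mathbf{B} \cdot \mathbf{A} = \mathbf{C}$ directly and, on a long realization, reproduces the factors obtained by the MLE route:

```python
import numpy as np

def lse_yule_factors(x: np.ndarray):
    """LSE of the AR(2) factors from the normal equations B·A = C,
    where b_ij sums products of lagged terms over i = 3..n."""
    xi, x1, x2 = x[2:], x[1:-1], x[:-2]          # x_i, x_{i-1}, x_{i-2}
    B = np.array([[(x1 * x1).sum(), (x1 * x2).sum()],
                  [(x1 * x2).sum(), (x2 * x2).sum()]])
    C = np.array([(xi * x1).sum(), (xi * x2).sum()])
    return np.linalg.solve(B, C)                 # (a1, a2)

rng = np.random.default_rng(3)
x = np.zeros(5000)
eps = rng.normal(size=5000)
for i in range(2, 5000):
    x[i] = 1.52 * x[i - 1] - 0.9 * x[i - 2] + eps[i]
print(lse_yule_factors(x))   # ≈ (1.52, -0.9), matching the MLE route for large n
```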
Chapter IV Modelling of Economic Cycles

Consider the economic model "investments→income" whose input determining the state of the system is the investment function I(t). The output X(t) of this system, called the income function, is a monetary estimate of manufactured products and provisioned services. Usually, it is represented as a gross domestic product (GDP), which is the total value of X(t) for a certain period of time.
4.1 Probabilistic Description of the Investment Function

First of all, we shall analyze the nature (phenomenon) of the investment function. The value of I(t) represents the result of independent actions of N diverse investors. This independence implies random starts of investment cycles, which lead to a random position of the time moment ti inside a certain (jth) investment cycle Ij(t). In this case, it seems quite reasonable to assume equally probable values of ti inside the cycle duration Tj. In addition, we assume that the cycle is characterized by invested capital Cj and its return ΔCj, and shall postulate that the growth of the return in time occurs linearly (Fig. 17).
Fig. 17: Investment cycle in the time domain.
Here, ζji is the value of a random variable Zj(t) at moment ti, that is, ζji = Zj(ti). Its nature is stochastic due to random position of ti inside the range 0 . . . Tj. The variable Zj will also have a uniform probability distribution due to the linearity of the function Ij(t). The probability density function f(ζj) is represented in Fig. 18.
Fig. 18: The PDF of random variable Zj.
This PDF is characterized by a mathematical expectation
$$\mu_{\zeta_j} = \frac{\Delta C_j}{2}$$
and a variance
$$D_{\zeta_j} = \frac{\Delta C_j^2}{12}.$$
As such, the investment function of the given cycle can be written as follows:
$$I_j(t) = C_j + \mu_{\zeta_j} + E_j(t),$$
where $E_j(t) = Z_j(t) - \mu_{\zeta_j}$. In principle, investments can end up with losses too, that is, the value of the mathematical expectation $\mu_{\zeta_j}$ can be either positive or negative. In turn, the investment function $I(t)$ resulting from the actions of $N$ investors can be written as follows:
$$I(t) = \sum_{j=1}^{N}I_j(t) = \sum_{j=1}^{N}(C_j + \mu_{\zeta_j}) + \sum_{j=1}^{N}E_j(t) = M(t) + E(t).$$
The random process $E(t)$ has a mathematical expectation equal to zero and variance $D_\varepsilon$ as follows:
$$D_\varepsilon = \sum_{j=1}^{N}D_{\varepsilon_j} = \sum_{j=1}^{N}\frac{\Delta C_j^2}{12}.$$
The value of $D_\varepsilon$ is always limited due to the finite values of both $\Delta C_j$ and $N < \infty$. In accordance with the central limit theorem, summing a large number of independent random values $E_j$ with similar (uniform) distributions produces a random value $E$ with a PDF tending to the Gaussian distribution (Fig. 19); a numerical illustration of this tendency follows the figure.
Fig. 19: A view of the PDF of E(t).
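The tendency of E(t) toward the Gaussian law can be checked numerically; the following sketch (assuming NumPy; the number of investors and the return values are synthetic) sums centred uniform contributions of N hypothetical investors and compares the variance with $\sum_j \Delta C_j^2/12$:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1000                                    # hypothetical number of investors
dC = rng.uniform(0.5, 2.0, size=N)          # illustrative returns dC_j

# E = sum_j (Z_j - mu_j), with Z_j uniform on [0, dC_j] (Fig. 18), at 10000 moments
E = (rng.uniform(0.0, dC, size=(10_000, N)) - dC / 2).sum(axis=1)

print(E.mean())                             # ≈ 0
print(E.var(), (dC ** 2).sum() / 12)        # ≈ D_eps = sum(dC_j^2) / 12
```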
The function $M(t)$ is a sum of the invested capitals $C_j$ and the expectations $\mu_{\zeta_j}$, that is, it is deterministic because mathematical expectations are not random values. The values of $M(t)$ can change due to variation in the number of investors ($N$), their invested capitals ($C_j$) and the appropriate returns ($\Delta C_j$). This means that the long-term trend of the income function is determined by the effect of the deterministic component $M(t)$ of the investments on the economic system. If we turn to the frequency domain, then a concept of frequency has to be defined for the economic system. Suppose that the variable part of the investment cycle (Fig. 17) is a fragment of the so-called sawtooth function F(t) represented in Fig. 20.
Fig. 20: Fourier analysis of a sawtooth process.
This periodic function can be represented by the infinite Fourier series
$$F(t) = \frac{2A}{\pi}\sum_{k=1}^{\infty}\frac{1}{k}\sin(2\pi kt/T).$$
In particular, the first ($k = 1$) harmonic $\sin(2\pi ft)$ with frequency $f = 1/T$ is shown in Fig. 20. Therefore, the concept of cyclic frequency is quite appropriate to the processes occurring in economic systems. In fact, investment cycles can have a large variance in their time spans: some of them are measured in days, others may last years or even decades. This means that $E(t)$ can include harmonics with a wide spectrum of frequencies, that is, $E(t)$ is a wideband random process with values that are independent in time and that have a limited variance. A mathematical model of such a random process can be the Gauss-distributed "white" noise. Therefore, the phenomenon of a particular economic cycle can be reduced to applying random fluctuations ("white" noise) of the investment function to a corresponding system.
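For illustration, a partial sum of this Fourier series can be evaluated numerically (a sketch assuming NumPy; the amplitude A, period T and number of harmonics are arbitrary choices):

```python
import numpy as np

A, T, K = 1.0, 8.0, 50                      # amplitude, period and harmonics kept
t = np.linspace(0.0, 2 * T, 400)

k = np.arange(1, K + 1)[:, None]
F = (2 * A / np.pi) * (np.sin(2 * np.pi * k * t / T) / k).sum(axis=0)

# F approximates the sawtooth of Fig. 20; the k = 1 term alone is the fundamental
# harmonic sin(2*pi*f*t) with cyclic frequency f = 1/T
```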
4.2 Stochastic Models of Economic Cycles

Existence of different economic cycles was quantitatively verified in reference [15] by performing spectral analysis of the rate of GDP changes. This analysis demonstrated the presence of the following cycles:
– Kitchin cycle (3 . . . 4 years)
– Juglar cycle (7 . . . 9 years)
– Kondratiev cycle (52 . . . 53 years)
Herein, the Kuznets swing was interpreted as the third harmonic of the Kondratiev wave. The earlier-mentioned GDP dynamics are represented in Fig. 21.
Fig. 21: Rates of the global GDP change.
It is clear that fluctuations of the GDP rates are measured by several percent. Directing investments into cycles with periods T > Tmax will increase the value of M(t). The corresponding growth in the income value related to the long-term trend may offset a decrease in income due to cycle contraction. The discussed strategy relates to controlling the economic system's input (investments), that is, the economic environment. As a figurative analogy of that approach, we can imagine handling a bicycle ride on rough terrain. Skillful handling allows the rider to avoid pits and bumps in the terrain. In addition, a safe ride on a bicycle is ensured by its shock absorbers appropriate for the terrain conditions, in other words, by upgrading the bicycle. Therefore, there is a potential opportunity for future development of methods to change the characteristics (parameters) of the system in order to affect economic cycles. For instance, one can establish a correlation of Kσ with certain economic indicators whose variations lead to changes in
system efficiency. This will make it possible to affect cycle intensity, such as increasing Kσ at the beginning of a downward trend of income function values.

Another hypothetical scenario affecting the cycle can be proposed using the concept of the natural frequency of the system. It is possible to mitigate the negative effect of cycles by changing the value of $f_0$, namely, by increasing the natural frequency of the system during a decrease in income. Thus, the time interval related to cycle contraction can be decreased.

Actual use of the discussed approaches for controlling cycles can be feasible only when there is a quantitative description of the links between the parameters of the system model and economic characteristics/indicators. The possibility of establishing such correlations is justified by the following example. In reference [5], a multiplier-accelerator model of the business cycle was introduced using the market-clearing assumption. In other words, the target function of the system is the equilibrium of demand and income. The existence of induced (with the acceleration coefficient $k$) and autonomous (public expenditure constant) investments was also postulated. The yielded result of such modelling was a linear difference equation of the second order with a homogeneous form as follows:
$$\xi_i - b(1 + k)\xi_{i-1} + bk\xi_{i-2} = 0,$$
where $b$ is the marginal propensity to consume. If we compare this equation with the model of the Yule series, then the following notations can be made:
$$a_1 = b(1 + k); \quad a_2 = -bk.$$
Respectively, equations (26) take the following form:
$$h = -\frac{\ln(bk)}{2 \cdot \Delta t}; \quad f_0 = \frac{1}{2\pi\Delta t} \cdot \cos^{-1}\frac{b(1 + k)}{2\sqrt{bk}}.$$
Therefore, a one-to-one tie of the acceleration coefficient and the marginal propensity to consume with the characteristics of the stochastic model of economic cycles is available. Moreover, it is possible to define the correlations of factors $a_1$ and $a_2$ with the acceleration coefficient and the marginal propensity as follows:
$$k = -a_2/(a_1 + a_2); \quad b = a_1 + a_2.$$
As a result, estimating the economic parameters $k$ and $b$ is possible via knowledge of the estimates of the factors of the AR(2) model. In turn, $\hat{a}_1$ and $\hat{a}_2$ can be calculated in accordance with expressions (31) using values $\xi_i$ of the cycles of interest. The discussed examples do not exhaust the possibilities of using the stochastic approach for macroeconomic and econometric purposes. They have only represented a few of the more obvious potential applications of the introduced models.
Chapter V Features of Estimation Procedure

As noted earlier, econometricians actually operate with an indicator of the income function known as the GDP, which is designated in the following text as G(t). Recall that its value is a monetary estimate of manufactured goods and provisioned services for a certain period of time Δt. Meanwhile, adaptation of cycle models to empirical data requires the presence of values of the income function X(t).
5.1 Recovery of the Income Function from GDP Estimates

The GDP function can be described mathematically as follows:
$$G(t) = \int_{t-\Delta t}^{t}X(\tau)d\tau.$$
Simple mathematical operations transform this expression to the following form:
$$G(t) = \int_{t-\Delta t}^{t}X(\tau)d\tau = \int_{0}^{t}h(\tau)X(t - \tau)d\tau, \qquad (34)$$
where
$$h(\tau) = \begin{cases} 1, & 0 \leq \tau \leq \Delta t; \\ 0, & \tau < 0; \ \tau > \Delta t. \end{cases}$$
In metrology [17], convolution equation (34) mathematically describes the result of measuring the process X(t) with an inertial measurement means (MM). The inertial properties of the latter are described by its IRF h(τ). Thus, estimating the GDP function can be interpreted as a measurement of the income function with an inertial MM (estimator). In the frequency domain, the estimator can be characterized by the frequency response function H(f):
$$H(f) = A_h(f) \cdot e^{j\Phi_h(f)} = \int_{-\infty}^{\infty}h(t) \cdot e^{-j2\pi ft}dt = \Delta t \cdot \mathrm{sinc}(\pi\Delta tf) \cdot e^{-j\pi\Delta tf},$$
where
$$\mathrm{sinc}(\pi\Delta tf) = \begin{cases} 1, & f = 0; \\ \dfrac{\sin\pi\Delta tf}{\pi\Delta tf}, & f \neq 0. \end{cases}$$
A view of the amplitude spectrum Ah(f) is shown in Fig. 23.
Fig. 23: The amplitude component of the estimator’s FRF.
Note that the convolution of two functions in the time domain corresponds to the multiplication of their Fourier images. Therefore, equation (34) can be represented in the frequency domain as
$$G(f) = H(f) \cdot X(f),$$
where X(f) and G(f) are the Fourier images of the income and GDP functions, accordingly. From the form of $A_h(f)$ represented in Fig. 23, it follows that the GDP estimates of the economic cycles are not equal to the values of X(t): the values of G(t) for cycles with large time spans are increased, while they are reduced for small ones. Therefore, the values of GDP cannot be used directly for adapting the "investments→income" models.
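The attenuation pattern can be quantified with a short sketch (assuming NumPy; note that numpy.sinc(x) computes sin(πx)/(πx), so the argument is Δt·f), evaluating $A_h(f)$ at the frequencies of the cycles listed in Section 4.2 for annual GDP data:

```python
import numpy as np

def estimator_amplitude(f: np.ndarray, dt: float) -> np.ndarray:
    """Amplitude component A_h(f) = dt*|sinc(pi*dt*f)| of the GDP estimator."""
    return dt * np.abs(np.sinc(dt * f))

dt = 1.0                                  # annual GDP, i.e. Δt = 1 year
periods = np.array([3.5, 8.0, 52.5])      # Kitchin, Juglar, Kondratiev (years)
print(estimator_amplitude(1.0 / periods, dt))
# the short Kitchin cycle is attenuated noticeably more than the Kondratiev wave
```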
5.1.1 Ill-Posed Recovery Task

Let us turn to estimating the values of GDP with a sample diagram represented in Fig. 24. This diagram shows the error Z(t) describing the imperfect performance of the estimation procedure. This error arises due to unreliable statistical data, its incompleteness, the performer's skill and so on. Therefore, the available data related to the values of GDP are actually the estimates $\tilde{G}(t)$ determined as follows:
$$\tilde{G}(t) = G(t) + Z(t).$$
Fig. 24: Diagram of GDP estimation.
The function Z(t) is called the measurement (instrumental) noise, which is usually broadband and is often modeled as "white" noise. In principle, knowledge of the weight function h(t) and estimates of the GDP provide an opportunity to solve equation (34) with respect to the income function. The search for such a solution is called the recovery task, which, as a rule, is incorrect (ill-posed). The French mathematician Jacques Hadamard suggested classifying a mathematical task as correct (well-posed) if (1) its solution exists; (2) this solution is unique; (3) the solution is stable, that is, small deviations in the source data lead to small deviations in the solution. In reference [17], it was shown that the recovery task for measurements obtained by an inertial MM does not satisfy the first and third Hadamard conditions. This fact allows us to classify it as an ill-posed task.

To solve such tasks, the Russian mathematician A.N. Tikhonov introduced [18] an approach for their approximate solution using a regularizing algorithm (recovery operator). The basis of the regularization method is Tikhonov's theorem, with a generally understood interpretation regarding the concerned task as follows. Under certain conditions, there is an approximate solution $X_\alpha(t)$ to the ill-posed task with an accuracy corresponding to the error of the estimates of Z(t). In other words, while it is not possible to solve the task exactly, it is possible to get its approximate solution with any desired accuracy. The variable $\alpha > 0$ is called the regularization parameter, which is an increasing function of Z(t). In the case when h(t) and a realization $\tilde{g}(t)$ of the function $\tilde{G}(t)$ are available, according to Tikhonov's method, the following regularizing operator can be constructed:
$$\Psi[x_\alpha(t)] = \int_{-\infty}^{\infty}\left|\int_{0}^{t}h(t - \tau) \cdot x_\alpha(\tau)d\tau - \tilde{g}(t)\right|^2dt + \alpha\int_{-\infty}^{\infty}\left\{b_0 \cdot [x_\alpha(t)]^2 + \sum_{l=1}^{p}b_l \cdot \left[\frac{d^lx_\alpha(t)}{dt^l}\right]^2\right\}dt.$$
The latter term with $b_l$ $(l = 0, \ldots, p)$ essentially characterizes the smoothness of the process X(t), and it is called the task stabilizer. The search for the approximate solution $x_\alpha(t)$ is formulated as a minimization of the earlier functional Ψ:
$$\Psi \to \min_{x_\alpha(t)}.$$
The regularization method has proven itself in solving a variety of ill-posed problems, including the recovery of measured functions (processes). In particular, the method discussed in reference [17] utilizes the frequency response H(f), that is, the functional Ψ is minimized in the frequency domain. For this, its Fourier image is used, which has the following form:
$$\Psi[X_\alpha(f)] = \int_{-\infty}^{\infty}|H(f)X_\alpha(f) - \tilde{G}(f)|^2df + \alpha \cdot \int_{-\infty}^{\infty}\sum_{l=0}^{p}b_l \cdot f^{2l} \cdot |X_\alpha(f)|^2df,$$
where $X_\alpha(f)$ and $\tilde{G}(f)$ are the Fourier images of $x_\alpha(t)$ and $\tilde{g}(t)$, respectively. The solution $X_\alpha(f)$ providing a minimum of the functional $\Psi[X_\alpha(f)]$ is determined by the following extremum condition:
$$\frac{\partial\Psi[X_\alpha(f)]}{\partial X_\alpha(f)} = 0.$$
From its fulfillment, an approximate solution $X_\alpha(f)$ can be found in the form of the following expression:
$$X_\alpha(f) = \frac{H^*(f)\tilde{G}(f)}{A_h^2(f) + \alpha B_p(f)}. \qquad (35)$$
Here $A_h(f)$ is the amplitude component of the estimator's FRF, the index "*" is the sign of the conjugate, and $B_p(f) = \sum_{l=0}^{p}b_l \cdot f^{2l}$. From expression (35), it follows that the approximate solution $x_\alpha(t)$ is obtained from the source data $\tilde{g}(t)$ by a recovery operator (filter) with the frequency response $H_\alpha(f)$ equal to
$$H_\alpha(f) = \frac{H^*(f)}{A_h^2(f) + \alpha \cdot B_p(f)}.$$
It should be noted that in addition to the recovery function, this filter will suppress the noise Z(t) that accompanies the realization $\tilde{g}(t)$. This statement is justified by
the fact that $H_\alpha(f)$ correlates with the frequency response $A_h(f)$ of the estimator. Substitution of the approximate solution $X_\alpha(f)$ into the functional Ψ transforms the latter into a function of the parameter α:
$$\Psi(\alpha) = \int_{-\infty}^{\infty}\frac{\alpha B_p(f) \cdot |\tilde{G}(f)|^2}{A_h^2(f) + \alpha \cdot B_p(f)}df.$$
Therefore, the recovery task is reduced to searching for a minimum of a function of only one variable, α. The procedure for finding the optimal value of α can be realized as a numeric search (computational trials) for the minimum of the function Ψ(α). The determined value $\alpha_{\mathrm{opt}}$ is used to calculate $X_\alpha(f)$. The inverse Fourier transform of the latter will provide the approximate solution $x_\alpha(t)$.

The computational trials require the selection of a set of possible values of α. The upper limit of its value can be found from the condition that the $H_\alpha(f)$ of the recovering filter must have a passband greater than the bandwidth of the estimator. In other words, the recovering filter should not affect the process X(t) that is being recovered. The estimator's amplitude component $A_h(f)$ of the frequency response decreases monotonically as $f$ increases. Therefore, the earlier-mentioned condition can be formulated for values of $A_h(f) < 1$ in the following form:
$$A_\alpha(f) = \frac{A_h(f)}{A_h^2(f) + \alpha \cdot B_p(f)} \geq A_h(f).$$
It follows from this condition that
$$A_h^2(f) + \alpha \cdot B_p(f) \leq 1.$$
This inequality allows us to determine the range of possible values of the regularization parameter:
$$0 < \alpha \leq \frac{1 - A_h^2(f)}{B_p(f)}.$$
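Putting the pieces together, a minimal frequency-domain sketch of the recovery filter (35) might look as follows (assuming NumPy; the circular-convolution FFT model, the first-order stabilizer and all parameter values are simplifying assumptions for illustration, not the book's procedure):

```python
import numpy as np

def recover_income(g: np.ndarray, dt: float, alpha: float, b=(1.0, 1.0)):
    """Sketch of recovery (35) with stabilizer B_p(f) = b0 + b1*f**2 (p = 1);
    g holds GDP estimates sampled at interval dt."""
    f = np.fft.fftfreq(g.size, d=dt)
    H = dt * np.sinc(dt * f) * np.exp(-1j * np.pi * dt * f)   # estimator FRF
    Bp = b[0] + b[1] * f ** 2
    H_alpha = np.conj(H) / (np.abs(H) ** 2 + alpha * Bp)      # recovery filter
    return np.fft.ifft(np.fft.fft(g) * H_alpha).real

# alpha itself would be chosen by a numeric search for the minimum of Psi(alpha)
```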